SQLarray is a thin wrapper around pysqlite SQL tables. The main features ares that SELECT queries can return numpy.recarrays and the SQLarray.selection() method returns a new SQLarray instance.
numpy arrays can be stored in sql fields which allows advanced table aggregate functions such as histogram.
A number of additional SQL functions are defined.
TODO: |
|
---|
See also
PyTables is a high-performance interface to table data.
Ring buffer with key lookup.
Basically a ringbuffer for the keys and a dict (k,v) that is cleaned up to reflect the keys in the Ringbuffer.
x.append(k,v)
Reinitialize the KRingbuffer to empty.
Ring buffer of size capacity; ‘pushes’ data from left and discards on the right.
A SQL table that returns (mostly) rec arrays.
Arguments : |
|
---|---|
Bugs : |
|
Execute a simple SQL SELECT statement and returns values as new numpy rec array.
The arguments fields and the additional optional arguments are simply concatenated with additional SQL statements according to the template:
SELECT <fields> FROM __self__ [args]
The simplest fields argument is "*".
Create a recarray in which students with average grade less than 3 are listed:
result = T.SELECT("surname, subject, year, avg(grade) AS avg_grade",
"WHERE avg_grade < 3", "GROUP BY surname,subject",
"ORDER BY avg_grade,surname")
The resulting SQL would be:
SELECT surname, subject, year, avg(grade) AS avg_grade FROM __self__
WHERE avg_grade < 3
GROUP BY surname,subject
ORDER BY avg_grade,surname
Note how one can use aggregate functions such avg().
The string ‘__self__’ is automatically replaced with the table name (T.name); this can be used for cartesian products such as
LEFT JOIN __self__ WHERE ...
Note
See the documentation for sql() for more details on the available keyword arguments and the use of ? parameter interpolation.
Return minimum and maximum of variable across all rows of data.
Merge another recarray with the same columns into this table.
Arguments : |
|
---|---|
Returns : | n number of inserted rows |
Raises : | Raises an exception if duplicate and incompatible data exist in the main table and the new one. |
Merge an existing table in the database with the __self__ table.
Executes as 'INSERT INTO __self__ SELECT * FROM <name>'. However, this method is probably used less often than the simpler merge().
Arguments : | name name of the table in the database (must be compatible with __self__) |
---|---|
Returns : | n number of inserted rows |
Return underlying SQL table as a read-only record array.
Return a new SQLarray from a SELECT selection.
This is a very useful method because it allows one to build complicated selections and essentially new tables from existing data.
Examples:
s = selection('a > 3')
s = selection('a > ?', (3,))
s = selection('SELECT * FROM __self__ WHERE a > ? AND b < ?', (3, 10))
Execute sql statement.
Arguments : |
|
---|
Warning
There are no sanity checks applied to the SQL.
If possible, the returned list of tuples is turned into a numpy record array, otherwise the original list of tuples is returned.
Warning
Potential BUG: if there are memory issues then it can happen that we just silently fall back to a tuple even though calling code expects a recarray; because we swallowed ANY exception the caller will never know
The last cachesize queries are cached (for cache=True) and are returned directly unless the table has been modified.
Note
‘__self__’ is substituted with the table name. See the doc string of the SELECT() method for more details.
Add a named index on given columns to improve performance.
Execute a simple SQL SELECT statement and returns values as new numpy rec array.
The arguments fields and the additional optional arguments are simply concatenated with additional SQL statements according to the template:
SELECT <fields> FROM __self__ [args]
The simplest fields argument is "*".
Create a recarray in which students with average grade less than 3 are listed:
result = T.SELECT("surname, subject, year, avg(grade) AS avg_grade",
"WHERE avg_grade < 3", "GROUP BY surname,subject",
"ORDER BY avg_grade,surname")
The resulting SQL would be:
SELECT surname, subject, year, avg(grade) AS avg_grade FROM __self__
WHERE avg_grade < 3
GROUP BY surname,subject
ORDER BY avg_grade,surname
Note how one can use aggregate functions such avg().
The string ‘__self__’ is automatically replaced with the table name (T.name); this can be used for cartesian products such as
LEFT JOIN __self__ WHERE ...
Note
See the documentation for sql() for more details on the available keyword arguments and the use of ? parameter interpolation.
Create a SQLarray from filename.
Arguments : |
|
---|
Turn a restructured text simple table into a numpy array. See the Example below for how the table must look like. The module allows inclusion of parameters and data in the documentation itself in a natural way. Thus the parameters are automatically documented and only exist in a single place. The idea is inspired by literate programming and is embodied by the DRY (“Do not repeat yourself”) principle.
Note that not the full specifications of the original restructured text simple table are supported. In order to keep the parser simple, the following additional restriction apply:
The following table is converted:
Table[laureates]: Physics Nobel prize statistics.
============= ========== =========
name age year
============= ========== =========
A. Einstein 42 1921
P. Dirac 31 1933
R. P. Feynman 47 1965
============= ========== =========
with
>>> import recsql.rest_table as T
>>> P = T.Table2array(T.__doc__)
>>> P.recarray()
rec.array([(u'A. Einstein', 42, 1921), (u'P. Dirac', 31, 1933),
(u'R. P. Feynman', 47, 1965)],
dtype=[('name', '<U52'), ('age', '<i4'), ('year', '<i4')])
The only class that the user really needs to know anything about is recsql.rest_table.Table2array.
Primitive parser that converts a simple reST table into numpy.recarray.
The table must be the only table in the text. It must look similar to the example below (variable parts in angle brackets, optional in double brackets, everything else must be there, matching is case sensitive, ‘....’ signifies repetition in kind):
Table[<NAME>]: <<CAPTION>>
============ =========== ====================== ....
<COLNAME 1> <COLNAME 2> .... ....
============ =========== ====================== ....
<VALUE> <VALUE> <VALUE> <VALUE> ....
....
....
============ =========== ====================== ....
Rows may not span multiple lines. The column names must be single words and legal python names (no spaces, no dots, not starting with a number).
Field values are converted to one of the following python types: int, float, or str.
If a value is quote with single or double quotation marks then the outermost quotation marks are stripped and the enclosed value treated as a string.
Note
Values such as 001 must be quoted as ‘001’ or they will be interpreted as integers (1 in this case).
Table2array(string) –> parser
Arguments : |
|
---|
Return a recarray from the (parsed) string.
Signifies a failure to parse.
Turn a CSV table into a numpy array.
Uses csv (requires python 2.6 or better).
Read a csv file and provide conversion to a numpy.recarray.
Arguments : |
|
---|
Returns data as numpy.recarray.
Returns a unicode string that can be used as a legal python identifier.
Arguments : |
|
---|
Automatically convert an input value to a special python object.
The Autoconverter.convert() method turns the value into a special python value and casts strings to the “best” type (see besttype()).
The defaults for the conversion of a input field value to a special python value are:
value python ‘—’ None ‘none’ ‘None’ ‘’ ‘True’ True ‘x’ ‘X’ ‘yes’ ‘False’ False ‘-‘ ‘no’
If the sep keyword is set to a string instead of False then values are split into tuples. Probably the most convenient way to use this is to set sep = True (or None) because this splits on all white space whereas sep = ‘ ‘ would split multiple spaces.
Initialize the converter.
Arguments : |
|
---|
Convert x (if in the active state)
If set to True then conversion takes place; False just returns besttype() applid to the value.
Convert string x to the most useful type, i.e. int, float or unicode string.
If x is a quoted string (single or double quotes) then the quotes are stripped and the enclosed string returned.
Note
Strings will be returned as Unicode strings (using unicode()), based on the encoding argument, which is utf-8 by default.
Convert obj to unicode (if it can be be converted)
This module contains new SQL functions to be added to a SQLite database that can be used in the same way as the builtin functions.
Example:
Add the functions to an existing connection in the following way (assuming that the db connection is available in self.connection):
from sqlfunctions import * self.connection.create_function("sqrt", 1, _sqrt) self.connection.create_function("pow", 2, _pow) self.connection.create_function("match", 2, _match) # implements MATCH self.connection.create_function("regexp", 2, _regexp) # implements REGEXP self.connection.create_function("fformat",2,_fformat) self.connection.create_aggregate("std",1,_Stdev) self.connection.create_aggregate("median",1,_Median) self.connection.create_aggregate("array",1,_NumpyArray) self.connection.create_aggregate("histogram",4,_NumpyHistogram) self.connection.create_aggregate("distribution",4,_NormedNumpyHistogram) self.connection.create_aggregate("meanhistogram",5,_MeanHistogram) self.connection.create_aggregate("stdhistogram",5,_StdHistogram) self.connection.create_aggregate("minhistogram",5,_MinHistogram) self.connection.create_aggregate("maxhistogram",5,_MaxHistogram) self.connection.create_aggregate("medianhistogram",5,_MedianHistogram) self.connection.create_aggregate("zscorehistogram",5,_ZscoreHistogram)
Compute func() over data aggregated in bins.
(x,y) –> (x’, func(Y’)) with Y’ = {y: y(x) where x in x’ bin}
First the data is collected in bins x’ along x and then func is applied to all data points Y’ that have been collected in the bin.
Arguments : |
|
---|---|
Returns : |
|
Helper functions that are used throughout the recsql package.
Declare types as ‘NumpyArray’:
cur.execute("CREATE TABLE test(a NumpyArray)")
cur.execute("INSERT INTO test(a) values (?)", (my_array,))
or as column types:
cur.execute('SELECT a as "a [NumpyArray]" from test')
Pseudo recarray that is used to feed SQLarray:
Must only implement:
recarray.dtype.names sequence of column names iteration yield records
adapter: store numpy arrays in the db as ascii pickles
adapter: store python objects in the db as ascii pickles
converter: retrieve numpy arrays from the db as ascii pickles
convertor: retrieve python objects from the db as ascii pickles