Package recsql
Source Code for Package recsql

  1  # RecSQL -- a simple mash-up of sqlite and numpy.recsql 
  2  # Copyright (C) 2007-2010 Oliver Beckstein <orbeckst@gmail.com> 
  3  # Released under the GNU Public License, version 3 or higher (your choice) 
  4  """ 
  5  ================ 
  6   RecSQL package 
  7  ================ 
  8   
  9  RecSQL is a simple module that provides a numpy.record array frontend 
 10  to an underlying SQLite table. 
 11   
 12  The :class:`SQLarray` object populates a SQL table from a numpy record array, a 
 13  iterable that supplies table records, or a string that contains an 
 14  especially simple reStructured text table. The SQL table is held in memory 
 15  and functions are provided to run SQL queries and commands on the 
 16  underlying database. Queries return record arrays if possible (although a 
 17  flag can explicitly change this). 
 18   
 19  Query results are cached to improve performance. This can be disabled 
 20  (which is recommened for large data sets). 
 21   
 22  The SQL table is named on initialization. Later one can refer to this table 
 23  by the name or the magic name *__self__* in SQL statements. Additional 
 24  tables can be added to the same database (by using the connection keyword 
 25  of the constructor) 
 26   
 27  The :mod:`recsql.rest_table` module uses the base functionality to 
 28  parse a restructured text table from a string (such as a doc string) 
 29  and returns a nicely structured table. This allows for use of 
 30  parameters that are documented in the doc strings. 
 31   
 32  .. SeeAlso:: PyTables_ is a high-performance interface to table 
 33               data. In most cases you will probably better off in the 
 34               long run using PyTables than recSQL. 
 35   
 36  .. _PyTables: http://www.pytables.org 
 37   
 38   
 39  Important functions and classes 
 40  =============================== 
 41   
 42  A :class:`SQLarray` can be constructed by either reading data from a 
 43  CSV file or reST table with the :func:`SQLarray_fromfile` function or 
 44  constructed directly from a :class:`numpy.recarray` via the 
 45  :class:`SQLarray` constructor. 
 46   
 47  .. autofunction:: SQLarray_fromfile 
 48  .. autoclass:: SQLarray 
 49     :members: 
 50   
 51  Example 
 52  ======= 
 53   
 54     >>> from recsql import SQLarray 
 55     >>> import numpy 
 56     >>> a = numpy.rec.fromrecords(numpy.arange(100).reshape(25,4), names='a,b,c,d') 
 57     >>> Q = SQLarray('my_name', a) 
 58     >>> print repr(Q.recarray) 
 59     rec.array([(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, 15), 
 60            (16, 17, 18, 19), (20, 21, 22, 23), (24, 25, 26, 27), 
 61            (28, 29, 30, 31), (32, 33, 34, 35), (36, 37, 38, 39), 
 62            (40, 41, 42, 43), (44, 45, 46, 47), (48, 49, 50, 51), 
 63            (52, 53, 54, 55), (56, 57, 58, 59), (60, 61, 62, 63), 
 64            (64, 65, 66, 67), (68, 69, 70, 71), (72, 73, 74, 75), 
 65            (76, 77, 78, 79), (80, 81, 82, 83), (84, 85, 86, 87), 
 66            (88, 89, 90, 91), (92, 93, 94, 95), (96, 97, 98, 99)], 
 67           dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4'), ('d', '<i4')]) 
 68     >>> Q.SELECT('*', 'WHERE a < 10 AND b > 5') 
 69     rec.array([(8, 9, 10, 11)], 
 70         dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4'), ('d', '<i4')]) 
 71     # creating new SQLarrays: 
 72     >>> R = Q.selection('a < 20 AND b > 5') 
 73     >>> print R 
 74     <recsql.sqlarray.SQLarray object at 0x...> 
 75   
 76   
 77   
 78  Additional SQL functions 
 79  ======================== 
 80   
 81  Note that the SQL database that is used as the backend for 
 82  :class:`SQLarray` has a few additional functions defined in addition 
 83  to the `standard SQL available in sqlite`_. These can be used in 
 84  ``SELECT`` statements and often avoid post-processing of record arrays 
 85  in python. It is relatively straightforward to add new functions (see 
 86  the source code and in particular the 
 87  :meth:`recsql.sqlarray.SQLarray._init_sql_functions` method; the 
 88  functions themselves are defined in the module :mod:`recsql.sqlfunctions`). 
 89   
 90  .. _standard SQL available in sqlite: http://www.sqlite.org/lang.html 
 91   
 92   
 93  Simple SQL functions 
 94  -------------------- 
 95   
 96  Simple functions transform a single input value into a single output value: 
 97   
 98  =====================   ============================================= 
 99  Expression              SQL equivalent 
100  =====================   ============================================= 
101    y = f(x)               SELECT f(x) AS y 
102  =====================   ============================================= 
103   
104  Two additional simple functions have been defined: 
105   
106  =====================   ============================================= 
107  Simple SQL f()           description 
108  =====================   ============================================= 
109  sqrt(x)                  square root math.sqrt(x) 
110  fformat(format,x)        string formatting of a single value format % x 
111  =====================   ============================================= 
112   
113   
114  Aggregate SQL functions 
115  ----------------------- 
116   
117  Aggregate functions combine data from a query; they are typically used with 
118  a 'GROUP BY col' clause. They can be thought of as numpy ufuncs: 
119   
120  =====================   ============================================= 
121  Expression              SQL equivalent 
122  =====================   ============================================= 
123    y = f(x1,x2,...xN)     SELECT f(x) AS y ... GROUP BY x 
124  =====================   ============================================= 
125   
126  For completeness, the tablke also lists sqlite built-in aggregate 
127  functions: 
128   
129  =====================   ============================================= 
130  Simple aggregate f()     description 
131  =====================   ============================================= 
132  avg(x)                   mean [sqlite builtin] 
133  std(x)                   standard deviation (using N-1 variance) 
134  median(x)                median of the data (see ``numpy.median``) 
135  min(x)                   minimum [sqlite builtin] 
136  max(x)                   maximum [sqlite builtin] 
137  =====================   ============================================= 
138   
139   
140  PyAggregate SQL functions 
141  ------------------------- 
142   
143  PyAggregate functions act on a list of data points in the same way as 
144  ordinary aggregate functions but they return python objects such as numpy 
145  arrays, or tuples of numpy arrays (eg bin edges and histogram). In order to 
146  make this work, specific types have to be declared when returning the 
147  results: 
148   
149  For instance, the histogram() function returns a python Object, the tuple 
150  (histogram, edges):: 
151   
152     a.sql('SELECT histogram(x) AS "x [Object]" FROM __self__', asrecarray=False) 
153   
154  The return type ('Object') needs to be declared with the ``'AS "x [Object]"'`` 
155  syntax (note the quotes). (See more details in the `sqlite documentation`_ 
156  under `adapters and converters`_.) The following table lists all *PyAggregate* 
157  functions that have been defined: 
158   
159  .. _sqlite documentation: http://docs.python.org/library/sqlite3.html 
160  .. _adapters and converters:  
161     http://docs.python.org/library/sqlite3.html#using-adapters-to-store-additional-python-types-in-sqlite-databases 
162   
163  ===============  ==============  ============================================================== 
164  PyAggregate      type            signature; description 
165  ===============  ==============  ============================================================== 
166  array             NumpyArray     array(x); 
167                                   a standard numpy array 
168   
169  histogram         Object         histogram(x,nbins,xmin,xmax);  
170                                   histogram x in nbins evenly spaced bins between xmin and xmax 
171   
172  distribution      Object         distribution(x,nbins,xmin,xmax); 
173                                   normalized histogram whose integral gives 1 
174   
175  meanhistogram     Object         meanhistogram(x,y,nbins,xmin,xmax);  
176                                   histogram data points y along x and average all y in each bin 
177   
178  stdhistogram      Object         stdhistogram(x,y,nbins,xmin,xmax);  
179                                   give the standard deviation (from N-1 variance) 
180                                   std(y) = sqrt(Var(y)) with Var(y) = <(y-<y>)^2> 
181   
182  medianhistogram   Object         medianhistogram((x,y,nbins,xmin,xmax); 
183                                   median(y) 
184   
185  minhistogram      Object         minhistogram((x,y,nbins,xmin,xmax); 
186                                   min(y) 
187   
188  maxhistogram      Object         maxhistogram((x,y,nbins,xmin,xmax); 
189                                   max(y) 
190   
191  zscorehistogram   Object         zscorehistogram((x,y,nbins,xmin,xmax); 
192                                   <abs(y-<y>)>/std(y) 
193  ===============  ==============  ============================================================== 
194   
195   
196   
197  Examples of using types in tables 
198  ================================= 
199   
200  The following show how to use the special types. 
201   
202  Declare types as 'NumpyArray':: 
203   
204     a.sql("CREATE TABLE __self__(a NumpyArray)") 
205   
206  Then you can simply insert python objects (type(my_array) == numpy.ndarray):: 
207   
208     a.sql("INSERT INTO __self__(a) values (?)", (my_array,)) 
209   
210  When returning results of declared columns one does not have to do anything :: 
211   
212     (my_array,) = a.sql("SELECT a FROM __self__") 
213   
214  although one can also do :: 
215   
216     (my_array,) = q.sql('SELECT a AS "a [NumpyArray]" FROM __self__') 
217   
218  but when using a PyAggregate the type *must* be declared:: 
219   
220     a.sql('SELECT histogram(x,10,0.0,1.5) as "hist [Object]" FROM __self__') 
221  """ 
222  VERSION = 0,7,3 
223   
224  __all__ = ['sqlarray'] 
225   
226  from sqlarray import SQLarray, SQLarray_fromfile 
227   
228 -def get_version(): 
229      """Return current package version as a string.""" 
230      return ".".join(map(str,VERSION)) 
231   
232 -def get_version_tuple(): 
233      """Return current package version as a (MAJOR,MINOR,PATCHLEVEL).""" 
234      return tuple(VERSION) 
235