1
2
3
4 """
5 ================
6 RecSQL package
7 ================
8
9 RecSQL is a simple module that provides a numpy.record array frontend
10 to an underlying SQLite table.
11
12 The :class:`SQLarray` object populates a SQL table from a numpy record array, a
13 iterable that supplies table records, or a string that contains an
14 especially simple reStructured text table. The SQL table is held in memory
15 and functions are provided to run SQL queries and commands on the
16 underlying database. Queries return record arrays if possible (although a
17 flag can explicitly change this).
18
19 Query results are cached to improve performance. This can be disabled
20 (which is recommened for large data sets).
21
22 The SQL table is named on initialization. Later one can refer to this table
23 by the name or the magic name *__self__* in SQL statements. Additional
24 tables can be added to the same database (by using the connection keyword
25 of the constructor)
26
27 The :mod:`recsql.rest_table` module uses the base functionality to
28 parse a restructured text table from a string (such as a doc string)
29 and returns a nicely structured table. This allows for use of
30 parameters that are documented in the doc strings.
31
32 .. SeeAlso:: PyTables_ is a high-performance interface to table
33 data. In most cases you will probably better off in the
34 long run using PyTables than recSQL.
35
36 .. _PyTables: http://www.pytables.org
37
38
39 Important functions and classes
40 ===============================
41
42 A :class:`SQLarray` can be constructed by either reading data from a
43 CSV file or reST table with the :func:`SQLarray_fromfile` function or
44 constructed directly from a :class:`numpy.recarray` via the
45 :class:`SQLarray` constructor.
46
47 .. autofunction:: SQLarray_fromfile
48 .. autoclass:: SQLarray
49 :members:
50
51 Example
52 =======
53
54 >>> from recsql import SQLarray
55 >>> import numpy
56 >>> a = numpy.rec.fromrecords(numpy.arange(100).reshape(25,4), names='a,b,c,d')
57 >>> Q = SQLarray('my_name', a)
58 >>> print repr(Q.recarray)
59 rec.array([(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11), (12, 13, 14, 15),
60 (16, 17, 18, 19), (20, 21, 22, 23), (24, 25, 26, 27),
61 (28, 29, 30, 31), (32, 33, 34, 35), (36, 37, 38, 39),
62 (40, 41, 42, 43), (44, 45, 46, 47), (48, 49, 50, 51),
63 (52, 53, 54, 55), (56, 57, 58, 59), (60, 61, 62, 63),
64 (64, 65, 66, 67), (68, 69, 70, 71), (72, 73, 74, 75),
65 (76, 77, 78, 79), (80, 81, 82, 83), (84, 85, 86, 87),
66 (88, 89, 90, 91), (92, 93, 94, 95), (96, 97, 98, 99)],
67 dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4'), ('d', '<i4')])
68 >>> Q.SELECT('*', 'WHERE a < 10 AND b > 5')
69 rec.array([(8, 9, 10, 11)],
70 dtype=[('a', '<i4'), ('b', '<i4'), ('c', '<i4'), ('d', '<i4')])
71 # creating new SQLarrays:
72 >>> R = Q.selection('a < 20 AND b > 5')
73 >>> print R
74 <recsql.sqlarray.SQLarray object at 0x...>
75
76
77
78 Additional SQL functions
79 ========================
80
81 Note that the SQL database that is used as the backend for
82 :class:`SQLarray` has a few additional functions defined in addition
83 to the `standard SQL available in sqlite`_. These can be used in
84 ``SELECT`` statements and often avoid post-processing of record arrays
85 in python. It is relatively straightforward to add new functions (see
86 the source code and in particular the
87 :meth:`recsql.sqlarray.SQLarray._init_sql_functions` method; the
88 functions themselves are defined in the module :mod:`recsql.sqlfunctions`).
89
90 .. _standard SQL available in sqlite: http://www.sqlite.org/lang.html
91
92
93 Simple SQL functions
94 --------------------
95
96 Simple functions transform a single input value into a single output value:
97
98 ===================== =============================================
99 Expression SQL equivalent
100 ===================== =============================================
101 y = f(x) SELECT f(x) AS y
102 ===================== =============================================
103
104 Two additional simple functions have been defined:
105
106 ===================== =============================================
107 Simple SQL f() description
108 ===================== =============================================
109 sqrt(x) square root math.sqrt(x)
110 fformat(format,x) string formatting of a single value format % x
111 ===================== =============================================
112
113
114 Aggregate SQL functions
115 -----------------------
116
117 Aggregate functions combine data from a query; they are typically used with
118 a 'GROUP BY col' clause. They can be thought of as numpy ufuncs:
119
120 ===================== =============================================
121 Expression SQL equivalent
122 ===================== =============================================
123 y = f(x1,x2,...xN) SELECT f(x) AS y ... GROUP BY x
124 ===================== =============================================
125
126 For completeness, the tablke also lists sqlite built-in aggregate
127 functions:
128
129 ===================== =============================================
130 Simple aggregate f() description
131 ===================== =============================================
132 avg(x) mean [sqlite builtin]
133 std(x) standard deviation (using N-1 variance)
134 median(x) median of the data (see ``numpy.median``)
135 min(x) minimum [sqlite builtin]
136 max(x) maximum [sqlite builtin]
137 ===================== =============================================
138
139
140 PyAggregate SQL functions
141 -------------------------
142
143 PyAggregate functions act on a list of data points in the same way as
144 ordinary aggregate functions but they return python objects such as numpy
145 arrays, or tuples of numpy arrays (eg bin edges and histogram). In order to
146 make this work, specific types have to be declared when returning the
147 results:
148
149 For instance, the histogram() function returns a python Object, the tuple
150 (histogram, edges)::
151
152 a.sql('SELECT histogram(x) AS "x [Object]" FROM __self__', asrecarray=False)
153
154 The return type ('Object') needs to be declared with the ``'AS "x [Object]"'``
155 syntax (note the quotes). (See more details in the `sqlite documentation`_
156 under `adapters and converters`_.) The following table lists all *PyAggregate*
157 functions that have been defined:
158
159 .. _sqlite documentation: http://docs.python.org/library/sqlite3.html
160 .. _adapters and converters:
161 http://docs.python.org/library/sqlite3.html#using-adapters-to-store-additional-python-types-in-sqlite-databases
162
163 =============== ============== ==============================================================
164 PyAggregate type signature; description
165 =============== ============== ==============================================================
166 array NumpyArray array(x);
167 a standard numpy array
168
169 histogram Object histogram(x,nbins,xmin,xmax);
170 histogram x in nbins evenly spaced bins between xmin and xmax
171
172 distribution Object distribution(x,nbins,xmin,xmax);
173 normalized histogram whose integral gives 1
174
175 meanhistogram Object meanhistogram(x,y,nbins,xmin,xmax);
176 histogram data points y along x and average all y in each bin
177
178 stdhistogram Object stdhistogram(x,y,nbins,xmin,xmax);
179 give the standard deviation (from N-1 variance)
180 std(y) = sqrt(Var(y)) with Var(y) = <(y-<y>)^2>
181
182 medianhistogram Object medianhistogram((x,y,nbins,xmin,xmax);
183 median(y)
184
185 minhistogram Object minhistogram((x,y,nbins,xmin,xmax);
186 min(y)
187
188 maxhistogram Object maxhistogram((x,y,nbins,xmin,xmax);
189 max(y)
190
191 zscorehistogram Object zscorehistogram((x,y,nbins,xmin,xmax);
192 <abs(y-<y>)>/std(y)
193 =============== ============== ==============================================================
194
195
196
197 Examples of using types in tables
198 =================================
199
200 The following show how to use the special types.
201
202 Declare types as 'NumpyArray'::
203
204 a.sql("CREATE TABLE __self__(a NumpyArray)")
205
206 Then you can simply insert python objects (type(my_array) == numpy.ndarray)::
207
208 a.sql("INSERT INTO __self__(a) values (?)", (my_array,))
209
210 When returning results of declared columns one does not have to do anything ::
211
212 (my_array,) = a.sql("SELECT a FROM __self__")
213
214 although one can also do ::
215
216 (my_array,) = q.sql('SELECT a AS "a [NumpyArray]" FROM __self__')
217
218 but when using a PyAggregate the type *must* be declared::
219
220 a.sql('SELECT histogram(x,10,0.0,1.5) as "hist [Object]" FROM __self__')
221 """
222 VERSION = 0,7,3
223
224 __all__ = ['sqlarray']
225
226 from sqlarray import SQLarray, SQLarray_fromfile
227
229 """Return current package version as a string."""
230 return ".".join(map(str,VERSION))
231
233 """Return current package version as a (MAJOR,MINOR,PATCHLEVEL)."""
234 return tuple(VERSION)
235