Python supports several ways of persisting data. One way, serialization, views data as a collection of Python objects. These objects can be serialized (saved) to a byte stream, and later deserialized (loaded and re-created) back from the byte stream. Object persistence lives on top of serialization, adding features such as object naming. This chapter covers the Python modules that support serialization and object persistence.
Another way to make data persistent is to store it in a database (DB). One simple category of DBs is just a family of file formats that use keyed access to enable selective reading and updating of relevant parts of the data. This chapter covers Python standard library modules that support several variations of such a file format, known as DBM.
A relational DB management system (RDBMS), such as PostgreSQL or Oracle, offers a more powerful approach to storing, searching, and retrieving persistent data. Relational DBs rely on dialects of Structured Query Language (SQL) to create and alter a DB’s schema, insert and update data in the DB, and query the DB with search criteria. (This book does not provide reference material on SQL. We recommend SQL in a Nutshell, by Kevin Kline [O’Reilly].) Unfortunately, despite the existence of SQL standards, no two RDBMSes implement exactly the same SQL dialect.
The Python standard library does not come with an RDBMS interface. However, many third-party modules let your Python programs access a specific RDBMS. Such modules mostly follow the Python Database API 2.0 standard, also known as the DBAPI. This chapter covers the DBAPI standard and mentions a few of the most popular third-party modules that implement it.
A DBAPI module that is particularly handy—because it comes with every standard installation of Python—is sqlite3, which wraps SQLite, “a self-contained, server-less, zero-configuration, transactional SQL DB engine,” the most widely deployed relational DB engine in the world. We cover sqlite3 in “SQLite”.
Besides relational DBs, and the simpler approaches covered in this chapter, there exist Python-specific object DBs such as ZODB, as well as many NoSQL DBs, each with Python interfaces. We do not cover advanced nonrelational DBs in this book.
Python supplies several modules to serialize (save) Python objects to various kinds of byte streams and deserialize (load and re-create) Python objects back from streams. Serialization is also known as marshaling, or as “formatting for data interchange.”
Serialization approaches span the range from the (by now) language-independent JSON and the (low-level, Python-version-specific) marshal, both limited to elementary data types, through the richer but Python-specific pickle, to rich cross-language formats such as XML, YAML, protocol buffers, and MessagePack.

In this section, we cover JSON and pickle. We cover XML in Chapter 23. marshal is too low-level to use in applications; should you need to maintain old code using it, refer to the online docs. As for protocol buffers, MessagePack, YAML, and other data-interchange/serialization approaches (each with specific advantages and weaknesses), we cannot cover everything in this book; we recommend studying them via the resources available on the web.
The standard library’s json module supplies four key functions:

dump
json.dump(value, fileobj) writes the JSON serialization of value to the file-like object fileobj, which must be opened for writing. You can optionally pass named arguments to customize the serialization: for example, indent, to produce human-readable output, or sort_keys=True, to emit dictionary keys in sorted order.

dumps
json.dumps(value) returns the string that is the JSON serialization of value; it accepts the same optional named arguments as dump.

load
json.load(fileobj) creates and returns the value previously serialized to the file-like object fileobj, which must be opened for reading. You can optionally pass named arguments to customize the deserialization.

JSON serializes just one object per file
JSON is not what is known as a framed format: this means it is not possible to call dump more than once to serialize several objects into the same file, nor to later call load more than once to deserialize them back. To serialize more than one object, wrap them all in a single container, such as a list.

loads
json.loads(s) creates and returns the value previously serialized into the string s; it accepts the same optional named arguments as load.
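For instance, here is a quick round trip through dumps and loads (a sketch; the record's contents are ours, and any JSON-serializable value works the same way):

```python
import json

record = {'name': 'pi', 'digits': [3, 1, 4, 1, 5]}
s = json.dumps(record)      # serialize to a str
back = json.loads(s)        # deserialize back to equal Python objects
print(back == record)       # True
```

Note that JSON has no tuple type: tuples serialize as JSON arrays and come back as lists, which is why the example in the next section works with either tuples or lists.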
Say you need to read several text files, whose names are given as your program’s arguments, recording where each distinct word appears in the files. What you need to record for each word is a list of (filename, linenumber) pairs. The following example uses json to encode lists of (filename, linenumber) pairs as strings and store them in a DBM-like file (as covered in “DBM Modules”). Since these lists contain tuples, each containing a string and a number, they are within json’s abilities to serialize.
import collections, fileinput, json, dbm
word_pos = collections.defaultdict(list)
for line in fileinput.input():
    pos = fileinput.filename(), fileinput.filelineno()
    for word in line.split():
        word_pos[word].append(pos)
dbm_out = dbm.open('indexfilem', 'n')
for word in word_pos:
    dbm_out[word] = json.dumps(word_pos[word])
dbm_out.close()
(In v3 only, dbm.open is a context manager, so we could indent the second for loop as the body of a statement with dbm.open('indexfilem', 'n') as dbm_out: and omit the dbm_out.close(); however, the example, as coded, works in both v2 and v3, except that in v2, to ensure the example works across platforms, you’d import and use the module anydbm instead of the package dbm; that also applies to the following example.) We also need json to deserialize the data stored in the DBM-like file indexfilem, as shown in the following example:
import sys, json, dbm, linecache
dbm_in = dbm.open('indexfilem')
for word in sys.argv[1:]:
    if word not in dbm_in:
        print('Word {!r} not found in index file'.format(word),
              file=sys.stderr)
        continue
    places = json.loads(dbm_in[word])
    for fname, lineno in places:
        print('Word {!r} occurs in line {} of file {}:'.format(
              word, lineno, fname))
        print(linecache.getline(fname, lineno), end='')
The pickle and, in v2, cPickle modules supply factory functions, named Pickler and Unpickler, to generate objects that wrap file-like objects and supply Python-specific serialization mechanisms. Serializing and deserializing via these modules is also known as pickling and unpickling.

In v2 only, the difference between the modules is that, in pickle, Pickler and Unpickler are classes, so you can inherit from these classes to create customized serializer objects, overriding methods as needed. In cPickle, on the other hand, Pickler and Unpickler are factory functions that generate instances of nonsubclassable types, not classes. Performance is much better with cPickle, but inheritance is not feasible. In the rest of this section, we’ll be talking about the module pickle, but everything applies to cPickle too. v3 supplies only pickle, which is quite fast and supplies Pickler and Unpickler as classes.
Serialization shares some of the issues of deep copying, covered in “The copy Module”. The pickle module deals with these issues in much the same way as the copy module does. Serialization, like deep copying, implies a recursive walk over a directed graph of references. pickle preserves the graph’s shape: when the same object is encountered more than once, the object is serialized only the first time, and other occurrences of the same object serialize references to that single value. pickle also correctly serializes graphs with reference cycles. However, this means that if a mutable object o is serialized more than once to the same Pickler instance p, any changes to o after the first serialization of o to p are not saved.

For clarity, correctness, and simplicity, don’t alter objects that are being serialized while serialization to a Pickler instance is in progress.
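A minimal sketch of this pitfall (the variable names are ours): the second dump finds the list already in the Pickler's memo and writes only a back-reference to the state saved the first time.

```python
import io
import pickle

buf = io.BytesIO()
p = pickle.Pickler(buf)
lst = [1, 2]
p.dump(lst)          # first serialization memoizes lst with its current state
lst.append(3)        # mutate lst after that first serialization...
p.dump(lst)          # ...so this writes only a memo reference, not [1, 2, 3]

buf.seek(0)
u = pickle.Unpickler(buf)
first = u.load()
second = u.load()
print(first, second)   # [1, 2] [1, 2]
```

The two loads through the same Unpickler return the very same object, still holding the state from the first dump.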
pickle can serialize with a legacy ASCII protocol or with one of a few compact binary ones. In v2, the ASCII protocol 0 is the default, for backward compatibility, but you should normally explicitly request binary protocol 2, the v2-supported protocol that’s most parsimonious of time and storage space. In v3, protocols range from 0 to 4, inclusive; the default is 3, which is usually a reasonable choice, but you may explicitly specify protocol 2 (to ensure that your saved pickles can be loaded by v2 programs), or protocol 4, incompatible with earlier versions but with performance advantages for very large objects.

Always specify at least protocol 2. The size and speed savings can be substantial, and the binary format has essentially no downside except loss of compatibility of the resulting pickles with truly ancient versions of Python.
When you reload objects, pickle transparently recognizes and uses any protocol that the Python version you’re currently using supports.
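To see the savings, compare the pickle sizes of the same object under different protocols (exact sizes vary by Python version, so treat the printed numbers as merely indicative):

```python
import pickle

data = list(range(1000))
for proto in (0, 2, pickle.HIGHEST_PROTOCOL):
    size = len(pickle.dumps(data, proto))
    print('protocol {}: {} bytes'.format(proto, size))
# the ASCII protocol 0 is by far the largest; binary protocols are much smaller
```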
pickle serializes classes and functions by name, not by value. pickle can therefore deserialize a class or function only by importing it from the same module where the class or function was found when pickle serialized it. In particular, pickle can normally serialize and deserialize classes and functions only if they are top-level names for their module (i.e., attributes of their module). For example, consider the following:
def adder(augend):
    def inner(addend, augend=augend):
        return addend + augend
    return inner
plus5 = adder(5)
This code binds a closure to the name plus5 (as covered in “Nested functions and nested scopes”)—a nested function inner plus an appropriate outer scope. Therefore, trying to pickle plus5 raises a pickle.PicklingError exception (in v2; just AttributeError in v3): a function can be pickled only when it is top-level, and the function inner, whose closure is bound to the name plus5 in this code, is not top-level but rather nested inside the function adder. Similar issues apply to pickling nested functions and nested classes (i.e., classes that are not top-level).
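A sketch of the failure, and one common workaround: build the callable from picklable top-level pieces, for example with functools.partial (a partial object pickles by reference to its underlying top-level function):

```python
import functools
import operator
import pickle

def adder(augend):
    def inner(addend, augend=augend):
        return addend + augend
    return inner

try:
    pickle.dumps(adder(5))           # inner is nested: not pickleable
except (pickle.PicklingError, AttributeError) as exc:
    print('cannot pickle:', exc)

# workaround: a partial of a top-level callable pickles fine
plus5 = functools.partial(operator.add, 5)
clone = pickle.loads(pickle.dumps(plus5))
print(clone(10))                     # 15
```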
The pickle module (and, in v2 only, the module cPickle) exposes the following functions and classes:

dump, dumps
pickle.dump(value, fileobj, protocol=None) writes the serialization of value to the file-like object fileobj, which must be opened for writing in binary mode; pickle.dumps(value, protocol=None) returns the serialization as a bytes object (in v2, a plain string). Pass protocol=2 or higher to use a compact binary protocol, as recommended earlier in this section.

load, loads
pickle.load(fileobj) reads from the file-like object fileobj, which must be opened for reading in binary mode, and returns the value previously serialized there; pickle.loads(s) does the same from the bytes object (in v2, string) s. Both transparently recognize whatever protocol the pickle was written with.

Don’t unpickle untrusted data
Unpickling from an untrusted data source is a security risk; an attacker could exploit this vulnerability to execute arbitrary code.

Pickler
pickle.Pickler(fileobj, protocol=None) creates and returns an object p such that calling p.dump(value) is equivalent to calling pickle.dump(value, fileobj, protocol). When you must serialize several objects to the same file, a Pickler instance is handier and faster.

Unpickler
pickle.Unpickler(fileobj) creates and returns an object u such that calling u.load() is equivalent to calling pickle.load(fileobj). When you must deserialize several objects from the same file, an Unpickler instance is handier and faster.
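One common mitigation for the untrusted-data risk (a sketch; the class name and whitelist are ours) is to subclass Unpickler and override find_class, the documented hook through which unpickling looks up any global it wants to load:

```python
import io
import pickle

class SafeUnpickler(pickle.Unpickler):
    # whitelist of (module, name) pairs that unpickling may import
    ALLOWED = {('collections', 'OrderedDict')}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(
            'forbidden global {}.{}'.format(module, name))

data = pickle.dumps([1, 2, 3])    # plain containers need no global lookups
print(SafeUnpickler(io.BytesIO(data)).load())    # [1, 2, 3]
```

A malicious pickle that tries to load, say, os.system trips find_class and raises UnpicklingError instead of executing code.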
The following example handles the same task as the json example shown earlier, but uses pickle instead of json to serialize lists of (filename, linenumber) pairs as strings:
import collections, fileinput, pickle, dbm
word_pos = collections.defaultdict(list)
for line in fileinput.input():
    pos = fileinput.filename(), fileinput.filelineno()
    for word in line.split():
        word_pos[word].append(pos)
dbm_out = dbm.open('indexfilep', 'n')
for word in word_pos:
    dbm_out[word] = pickle.dumps(word_pos[word], 2)
dbm_out.close()
We can then use pickle to read back the data stored in the DBM-like file indexfilep, as shown in the following example:
import sys, pickle, dbm, linecache
dbm_in = dbm.open('indexfilep')
for word in sys.argv[1:]:
    if word not in dbm_in:
        print('Word {!r} not found in index file'.format(word),
              file=sys.stderr)
        continue
    places = pickle.loads(dbm_in[word])
    for fname, lineno in places:
        print('Word {!r} occurs in line {} of file {}:'.format(
              word, lineno, fname))
        print(linecache.getline(fname, lineno), end='')
In v2, in both examples, to ensure the code works across platforms, you’d import and use the module anydbm instead of the package dbm.
In order for pickle to reload an instance x, pickle must be able to import x’s class from the same module in which the class was defined when pickle saved the instance. Here is how pickle saves the state of instance object x of class T and later reloads the saved state into a new instance y of T (the first step of the reloading is always to make a new empty instance y of T, except where we explicitly say otherwise in the following):
- When T supplies the method __getstate__, pickle saves the result d of calling T.__getstate__(x).
  - When T supplies the method __setstate__, d can be of any type, and pickle reloads the saved state by calling T.__setstate__(y, d).
  - Otherwise, d must be a dictionary, and pickle just sets y.__dict__ = d.
- Otherwise, when T supplies the method __getnewargs__, and pickle is pickling with protocol 2 or higher, pickle saves the result t of calling T.__getnewargs__(x); t must be a tuple. pickle, in this one case, does not start with an empty y, but rather creates y by executing y = T.__new__(T, *t), which concludes the reloading.
- Otherwise, by default, pickle saves as d the dictionary x.__dict__.
  - When T supplies the method __setstate__, pickle reloads the saved state by calling T.__setstate__(y, d).
  - Otherwise, pickle just sets y.__dict__ = d.
All the items in the d or t object that pickle saves and reloads (normally a dictionary or tuple) must in turn be instances of types suitable for pickling and unpickling (AKA pickleable objects), and the procedure just outlined may be repeated recursively, if necessary, until pickle reaches primitive pickleable built-in types (dictionaries, tuples, lists, sets, numbers, strings, etc.).
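For instance (a sketch with invented names), a class can drop an ephemeral attribute in __getstate__ and rebuild it in __setstate__:

```python
import pickle

class Cached:
    def __init__(self, value):
        self.value = value
        self.cache = {}              # ephemeral: not worth pickling

    def __getstate__(self):
        d = self.__dict__.copy()
        del d['cache']               # exclude the ephemeral part of the state
        return d

    def __setstate__(self, d):
        self.__dict__.update(d)
        self.cache = {}              # rebuild it afresh on unpickling

x = Cached(42)
x.cache['hit'] = 1
y = pickle.loads(pickle.dumps(x))
print(y.value, y.cache)              # 42 {}
```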
As mentioned in “The copy Module”, the special methods __getnewargs__, __getstate__, and __setstate__ also control the way instance objects are copied and deep-copied. If a class defines __slots__, and therefore its instances do not have a __dict__, pickle does its best to save and restore a dictionary equivalent to the names and values of the slots. However, such a class should define __getstate__ and __setstate__; otherwise, its instances may not be correctly pickleable and copyable through such best-effort endeavors.
You can control how pickle serializes and deserializes objects of an arbitrary type by registering factory and reduction functions with the module copy_reg (renamed copyreg in v3). This is particularly, though not exclusively, useful when you define a type in a C-coded Python extension. The copy_reg module supplies the following functions:

constructor
copy_reg.constructor(fcon) adds fcon to the table of safe constructors that unpickling is allowed to call; fcon must be callable.

pickle
copy_reg.pickle(type, fred, fcon=None) registers fred as the reduction function for objects of the given type. To serialize any object o of that type, pickle calls fred(o), which must return a tuple of the form (fcon, t), where fcon is a constructor and t a tuple of arguments, so that deserialization can rebuild the object by calling fcon(*t). When you pass the optional argument fcon, it is also registered as a safe constructor, as if by a call to copy_reg.constructor(fcon).
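A minimal sketch of such a registration (the class and function names are ours; we use the v3 spelling copyreg):

```python
import copyreg
import pickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def reduce_point(p):
    # reduction function: returns (factory, args) used to re-create the object
    return Point, (p.x, p.y)

copyreg.pickle(Point, reduce_point)

q = pickle.loads(pickle.dumps(Point(1, 2)))
print(q.x, q.y)    # 1 2
```

Here registration is redundant (instances of this simple class pickle fine by default); the mechanism earns its keep for types, such as C-coded extension types, that pickle cannot handle on its own.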
The shelve module orchestrates the modules pickle (or cPickle, in v2, when available), io (in v3; in v2, cStringIO when available, and otherwise StringIO), and dbm (and its underlying modules for access to DBM-like archive files, as discussed in “DBM Modules”; that’s in v3—anydbm in v2) in order to provide a simple, lightweight persistence mechanism.
shelve supplies a function open that is polymorphic to anydbm.open. The mapping s returned by shelve.open is less limited than the mapping a returned by anydbm.open. a’s keys and values must be strings. s’s keys must also be strings, but s’s values may be of any pickleable type. pickle customizations (copy_reg, __getnewargs__, __getstate__, and __setstate__) also apply to shelve, as shelve delegates serialization to pickle.
Beware of a subtle trap when you use shelve with mutable objects: when you operate on a mutable object held in a shelf, the changes don’t “take” unless you assign the changed object back to the same index. For example:
import shelve
s = shelve.open('data')
s['akey'] = list(range(4))
print(s['akey'])       # prints: [0, 1, 2, 3]
s['akey'].append(9)    # trying direct mutation
print(s['akey'])       # doesn't "take"; prints: [0, 1, 2, 3]
x = s['akey']          # fetch the object
x.append(9)            # perform mutation
s['akey'] = x          # key step: store the object back!
print(s['akey'])       # now it "takes", prints: [0, 1, 2, 3, 9]
You can finesse this issue by passing the named argument writeback=True when you call shelve.open, but beware: if you do pass that argument, you may seriously impair the performance of your program.
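Here is the same trap handled via writeback=True (a sketch; we open a throwaway file in a temporary directory). With writeback, the shelf caches every entry you fetch and writes the cache back on sync or close, which is exactly where the memory and speed costs come from:

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'wbdata')
s = shelve.open(path, writeback=True)
s['akey'] = list(range(4))
s['akey'].append(9)     # direct mutation now "takes"...
s.sync()                # ...and sync (or close) writes the cache back
print(s['akey'])        # [0, 1, 2, 3, 9]
s.close()
```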
The following example handles the same task as the earlier json and pickle examples, but uses shelve to persist lists of (filename, linenumber) pairs:
import collections, fileinput, shelve
word_pos = collections.defaultdict(list)
for line in fileinput.input():
    pos = fileinput.filename(), fileinput.filelineno()
    for word in line.split():
        word_pos[word].append(pos)
sh_out = shelve.open('indexfiles', 'n')
sh_out.update(word_pos)
sh_out.close()
We must use shelve to read back the data stored in the DBM-like file indexfiles, as shown in the following example:
import sys, shelve, linecache
sh_in = shelve.open('indexfiles')
for word in sys.argv[1:]:
    if word not in sh_in:
        print('Word {!r} not found in index file'.format(word),
              file=sys.stderr)
        continue
    places = sh_in[word]
    for fname, lineno in places:
        print('Word {!r} occurs in line {} of file {}:'.format(
              word, lineno, fname))
        print(linecache.getline(fname, lineno), end='')
These two examples are the simplest and most direct of the various equivalent pairs of examples shown throughout this section. This reflects the fact that the module shelve is higher-level than the modules used in previous examples.
DBM, a long-time Unix tradition, is a family of libraries supporting data files with pairs of strings (key, data), with fast fetching and storing of the data given a key, a usage pattern known as keyed access. Keyed access, while nowhere near as powerful as the data-access functionality of relational DBs, imposes less overhead, yet may suffice for a given program’s needs. If DBM-like files are sufficient, you may end up with a program that is smaller and faster than one using a relational DB.
v3 organizes DBM support in Python’s standard library in a clean and elegant way: the package dbm
exposes two general functions, and within the same package live other modules supplying specific implementations. v2’s support, while functionally equivalent, evolved over time in a less organized fashion, thus ending up in a less elegant arrangement, as a collection of top-level modules not organized into a package. This section briefly presents v3’s organization first, and then v2’s, with references from the latter back to the former to show equivalent functionality.
v3’s dbm package supplies the following two top-level functions:

open
dbm.open(filepath, flag='r', mode=0o666) opens or creates the DBM file named by filepath and returns a mapping object m corresponding to it. flag is a one-character string: 'r' opens an existing DBM file read-only; 'w' opens an existing DBM file for reading and writing; 'c' opens a DBM file for reading and writing, creating it if needed; 'n' always creates a new, empty DBM file, open for reading and writing. mode gives the file's permission bits, used only when the file gets created. m supplies a subset of dictionary functionality; its keys and values are bytes (str keys and values you pass get implicitly encoded).

One extra method that m supplies, beyond a dict's, is m.close, with the same semantics as the close method of a file object; m is also a context manager, so you can use it in a with statement to ensure it gets closed.

whichdb
dbm.whichdb(filename) opens and reads the file specified by filename to discover which DBM submodule created it. whichdb returns None when the file does not exist or cannot be opened and read; '' when the file exists but its format cannot be determined; and, when the format can be determined, a string naming the dbm submodule that can read it, such as 'dbm.ndbm', 'dbm.dumb', or 'dbm.gnu'.
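For example (a sketch writing a throwaway file in a temporary directory; which submodule name gets printed depends on the platform):

```python
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'probe')
db = dbm.open(path, 'n')      # create with whatever submodule dbm picks
db['key'] = 'value'           # str gets encoded; stored as bytes
db.close()
print(dbm.whichdb(path))      # e.g. 'dbm.gnu' or 'dbm.dumb'
```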
In addition to these two top-level functions, the v3 package dbm contains specific modules, such as ndbm, gnu, and dumb, which supply various implementations of DBM functionality, and which you normally access only via the two top-level functions of the package dbm. Third-party packages can install further implementation modules in dbm.
The only implementation module of the dbm package that’s guaranteed to exist on all platforms is dumb. The dumb submodule of dbm has minimal DBM functionality and mediocre performance. dumb’s only advantage is that you can use it anywhere, since dumb does not rely on any library. You don’t normally import dbm.dumb: rather, import dbm, and let dbm.open supply the best DBM module available, defaulting to dumb if nothing else is available on the current Python installation. The only case in which you import dumb directly is the rare one in which you need to create a DBM-like file guaranteed to be readable from any Python installation. The dumb module supplies an open function polymorphic to dbm’s.
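A sketch of that rare direct use (file names are ours; dbm.dumb stores the archive as a pair of files with .dat and .dir suffixes):

```python
import dbm.dumb
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'portable_index')
db = dbm.dumb.open(path, 'n')   # creates path.dat and path.dir
db['key'] = 'value'             # str keys/values get encoded to bytes
print(db['key'])                # b'value'
db.close()
```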
v2’s anydbm module is a generic interface to any other DBM module. anydbm supplies a single factory function, open, equivalent to v3’s dbm.open.

v2’s whichdb module supplies a single function, whichdb, equivalent to v3’s dbm.whichdb.
DBM implementation modules, in v2, are top-level ones, named dbm, gdbm, dumbdbm, and so forth, otherwise paralleling v3’s dbm submodules.
DBM’s keyed access is suitable when your program needs to record persistently the equivalent of a Python dictionary, with strings as both keys and values. For example, suppose you need to analyze several text files, whose names are given as your program’s arguments, and record where each word appears in those files. In this case, the keys are words and, therefore, intrinsically strings. The data you need to record for each word is a list of (filename, linenumber) pairs. However, you can encode the data as a string in several ways—for example, by exploiting the fact that the path separator string os.pathsep (covered in “Path-String Attributes of the os Module”) does not normally appear in filenames. (More solid, general, and reliable approaches to the issue of encoding data as strings are covered with the same example in “Serialization”.) With this simplification, the program that records word positions in files might be as follows, in v3:
import collections, fileinput, os, dbm
word_pos = collections.defaultdict(list)
for line in fileinput.input():
    pos = '{}{}{}'.format(fileinput.filename(), os.pathsep,
                          fileinput.filelineno())
    for word in line.split():
        word_pos[word].append(pos)
sep2 = os.pathsep * 2
with dbm.open('indexfile', 'n') as dbm_out:
    for word in word_pos:
        dbm_out[word] = sep2.join(word_pos[word])
We can read back the data stored to the DBM-like file indexfile in several ways. The following example accepts words as command-line arguments and prints the lines where the requested words appear, in v3:
import sys, os, dbm, linecache
dbm_in = dbm.open('indexfile')
sep = os.pathsep
sep2 = sep * 2
for word in sys.argv[1:]:
    if word not in dbm_in:
        print('Word {!r} not found in index file'.format(word),
              file=sys.stderr)
        continue
    places = dbm_in[word].decode().split(sep2)  # dbm values are bytes in v3
    for place in places:
        fname, lineno = place.split(sep)
        print('Word {!r} occurs in line {} of file {}:'.format(
              word, lineno, fname))
        print(linecache.getline(fname, int(lineno)), end='')
In v2, in both examples, import and use the module anydbm instead of the package dbm.
v2 comes with the bsddb package, which wraps the Berkeley Database (also known as BSD DB) library if that library is installed on your system and your Python installation is built to support it. However, bsddb is deprecated in v2, and not present in v3, so we cannot recommend it. If you do need to interface to a BSD DB archive, we recommend instead the excellent third-party package bsddb3.
As we mentioned earlier, the Python standard library does not come with an RDBMS interface (except for sqlite3, covered in “SQLite”, which is a rich implementation, not just an interface). Many third-party modules let your Python programs access specific DBs. Such modules mostly follow the Python Database API 2.0 standard, also known as the DBAPI, as specified in PEP 249.
After importing any DBAPI-compliant module, call the module’s connect function with DB-specific parameters. connect returns x, an instance of Connection, which represents a connection to the DB. x supplies commit and rollback methods to deal with transactions, a close method to call as soon as you’re done with the DB, and a cursor method to return c, an instance of Cursor. c supplies the methods and attributes used for DB operations. A DBAPI-compliant module also supplies exception classes, descriptive attributes, factory functions, and type-description attributes.
A DBAPI-compliant module supplies the exception classes Warning, Error, and several subclasses of Error. Warning indicates anomalies such as data truncation on insertion. Error’s subclasses indicate various kinds of errors that your program can encounter when dealing with the DB and the DBAPI-compliant module that interfaces to it. Generally, your code uses a statement of the form:
try:
    ...
except module.Error as err:
    ...

to trap all DB-related errors that you need to handle without terminating.
When a DBAPI-compliant module has a threadsafety attribute greater than 0, the module is asserting some level of thread safety for DB interfacing. Rather than relying on this, it’s usually safer, and always more portable, to ensure that a single thread has exclusive access to any given external resource, such as a DB, as outlined in “Threaded Program Architecture”.
A DBAPI-compliant module has an attribute called paramstyle that identifies the style of markers used as placeholders for parameters. Insert such markers in SQL statement strings that you pass to methods of Cursor instances, such as the method execute, to use runtime-determined parameter values. Say, for example, that you need to fetch the rows of DB table ATABLE where field AFIELD equals the current value of Python variable x. Assuming the cursor instance is named c, you could theoretically (but ill-advisedly) perform this task with Python’s string formatting:
c.execute('SELECT * FROM ATABLE WHERE AFIELD={!r}'.format(x))
String formatting is not the recommended approach. It generates a different string for each value of x, requiring statements to be parsed and prepared anew each time; it also opens up the possibility of security weaknesses such as SQL injection vulnerabilities. With parameter substitution, you pass to execute a single statement string, with a placeholder instead of the parameter value. This lets execute parse and prepare the statement just once, for better performance; more importantly, parameter substitution improves solidity and security.
For example, when a module’s paramstyle attribute (described next) is 'qmark', express the preceding query as:
c.execute('SELECT * FROM ATABLE WHERE AFIELD=?', (some_value,))
The read-only string attribute paramstyle tells your program how it should use parameter substitution with that module. The possible values of paramstyle are:
format
The marker is %s, as in old-style string formatting (always with s: never use other type indicator letters, whatever the data’s type). A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=%s', (some_value,))
named
The marker is :name, and parameters are named. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=:x', {'x': some_value})
numeric
The marker is :n, giving the parameter’s number, 1 and up. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=:1', (some_value,))
pyformat
The marker is %(name)s, and parameters are named. Always use s: never use other type indicator letters, whatever the data’s type. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=%(x)s', {'x': some_value})
qmark
The marker is ?. A query looks like:

c.execute('SELECT * FROM ATABLE WHERE AFIELD=?', (x,))
When parameters are named (i.e., when paramstyle is 'pyformat' or 'named'), the second argument of the execute method is a mapping. Otherwise, the second argument is a sequence.
The only valid type indicator letter for format or pyformat is s; neither accepts any other type indicator—for example, never use %d nor %(name)d. Use %s or %(name)s for all parameter substitutions, regardless of the type of the data.
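sqlite3, for instance, uses paramstyle 'qmark' and also accepts named parameters, so both of the following queries are safe (a sketch using an in-memory DB; the table and value are ours):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE atable (afield TEXT)')
c.execute('INSERT INTO atable VALUES (?)', ('aval',))
c.execute('SELECT * FROM atable WHERE afield=?', ('aval',))       # qmark
rows_qmark = c.fetchall()
c.execute('SELECT * FROM atable WHERE afield=:x', {'x': 'aval'})  # named
rows_named = c.fetchall()
print(rows_qmark == rows_named == [('aval',)])   # True
conn.close()
```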
Parameters passed to the DB via placeholders must typically be of the right type: this means Python numbers (integers or floating-point values), strings (bytes or Unicode), and None to represent SQL NULL. There is no type universally used to represent dates, times, and binary large objects (BLOBs). A DBAPI-compliant module supplies factory functions to build such objects. The types used for this purpose by most DBAPI-compliant modules are those supplied by the modules datetime or mxDateTime (covered in Chapter 12), and strings or buffer types for BLOBs. The factory functions specified by the DBAPI are as follows:
Binary
Binary(string) returns an object representing the given string of bytes as a BLOB.

Date
Date(year, month, day) returns an object representing the specified date.

DateFromTicks
DateFromTicks(s) returns an object representing the date s seconds after the epoch of the module time, covered in Chapter 12.

Time
Time(hour, minute, second) returns an object representing the specified time.

TimeFromTicks
TimeFromTicks(s) returns an object representing the time s seconds after the epoch of the module time.

Timestamp
Timestamp(year, month, day, hour, minute, second) returns an object representing the specified date and time.

TimestampFromTicks
TimestampFromTicks(s) returns an object representing the date and time s seconds after the epoch of the module time.
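sqlite3, for instance, exposes these DBAPI factories as thin aliases over the datetime module's types:

```python
import datetime
import sqlite3

d = sqlite3.Date(2024, 1, 31)        # a datetime.date
t = sqlite3.Time(12, 30, 0)          # a datetime.time
ts = sqlite3.TimestampFromTicks(0)   # a datetime.datetime at the Unix epoch
print(d, t)                          # 2024-01-31 12:30:00
```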
A Cursor instance’s attribute description describes the types and other characteristics of each column of the SELECT query you last executed on that cursor. Each column’s type (the second item of the tuple describing the column) equals one of the following attributes of the DBAPI-compliant module:
BINARY
Describes columns containing BLOBs

DATETIME
Describes columns containing dates, times, or both

NUMBER
Describes columns containing numbers of any kind

ROWID
Describes columns containing a row-identification number

STRING
Describes columns containing text of any kind
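For example, with sqlite3 (whose description tuples fill in only the column name, leaving the other six items None):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE t (n INTEGER, s TEXT)')
c.execute('SELECT n, s FROM t')
names = [col[0] for col in c.description]   # first item of each 7-tuple
print(names)                                # ['n', 's']
conn.close()
```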
A cursor’s description, and in particular each column’s type, is mostly useful for introspection about the DB your program is working with. Such introspection can help you write general modules and work with tables using different schemas, including schemas that may not be known at the time you are writing your code.
A DBAPI-compliant module’s connect function accepts arguments that depend on the kind of DB and the specific module involved. The DBAPI standard recommends that connect accept named arguments. In particular, connect should at least accept optional arguments with the following names:
database
Name of the specific database to connect to

dsn
Data-source name to use for the connection

host
Hostname on which the database is running

password
Password to use for the connection

user
Username for the connection
A DBAPI-compliant module’s connect function returns an object x that is an instance of the class Connection. x supplies the following methods:

close
x.close() terminates the DB connection and releases all related resources. Call close as soon as you’re done with the DB: keeping DB connections needlessly open can be costly.

commit
x.commit() commits the current transaction in the DB. If the DB does not support transactions, x.commit() is an innocuous no-op.

cursor
x.cursor() returns a new instance of the class Cursor, covered next.

rollback
x.rollback() rolls back the current transaction in the DB. If the DB does not support transactions, x.rollback() raises an exception.
A Connection instance provides a cursor method that returns an object c that is an instance of the class Cursor. A SQL cursor represents the set of results of a query and lets you work with the records in that set, in sequence, one at a time. A cursor as modeled by the DBAPI is a richer concept, since it’s the only way your program executes SQL queries in the first place. On the other hand, a DBAPI cursor allows you only to advance in the sequence of results (some relational DBs, but not all, also provide higher-functionality cursors that are able to go backward as well as forward), and does not support the SQL clause WHERE CURRENT OF CURSOR. These limitations of DBAPI cursors enable DBAPI-compliant modules to easily provide DBAPI cursors even on RDBMSes that provide no real SQL cursors at all. An instance c of the class Cursor supplies many attributes and methods; the most frequently used ones are the following:
close
c.close() closes the cursor and releases all related resources.

description
A read-only attribute that is a sequence of seven-item tuples, one per column in the last query executed: name, typecode, displaysize, internalsize, precision, scale, nullable. c.description is None when the last operation on c was not a SELECT query.

execute
c.execute(statement, parameters=None) executes a SQL statement string on the DB, with the given parameters: a sequence when the module's paramstyle is 'format', 'numeric', or 'qmark'; a mapping when it's 'named' or 'pyformat'.

executemany
c.executemany(statement, parameter_seq) executes a SQL statement on the DB, once for each item of parameter_seq, a sequence of parameter collections. For example, when a module's paramstyle is 'qmark', the call

c.executemany('UPDATE atable SET x=? WHERE y=?', [(12, 23), (23, 34)])

is equivalent to, but faster than, the two calls

c.execute('UPDATE atable SET x=12 WHERE y=23')
c.execute('UPDATE atable SET x=23 WHERE y=34')

fetchall
c.fetchall() returns all remaining result rows from the last query as a sequence of tuples. Raises an exception if the last operation was not a SELECT query.

fetchmany
c.fetchmany(n) returns up to n result rows from the last query as a sequence of tuples. Raises an exception if the last operation was not a SELECT query.

fetchone
c.fetchone() returns the next result row from the last query as a tuple. Raises an exception if the last operation was not a SELECT query.

rowcount
A read-only attribute that specifies the number of rows fetched or affected by the last operation, or -1 when the module is unable to determine that number.
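Putting these pieces together, with sqlite3 standing in as the DBAPI-compliant module (a sketch; table and data are ours):

```python
import sqlite3

conn = sqlite3.connect(':memory:')   # the module's connect -> a Connection
c = conn.cursor()                    # Connection.cursor -> a Cursor
c.execute('CREATE TABLE lang (name TEXT, year INTEGER)')
c.executemany('INSERT INTO lang VALUES (?, ?)',
              [('C', 1972), ('Python', 1991), ('Go', 2009)])
conn.commit()                        # commit the insert transaction
c.execute('SELECT name FROM lang WHERE year > ? ORDER BY year', (1980,))
rows = c.fetchall()
print(rows)                          # [('Python',), ('Go',)]
c.close()
conn.close()
```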
Whatever relational DB you want to use, there’s at least one (often more than one) Python DBAPI-compliant module downloadable from the Internet. There are so many DBs and modules, and the set of possibilities is so constantly shifting, that we couldn’t possibly list them all, nor (importantly) could we maintain the list over time. Rather, we recommend you start from the community-maintained wiki page, which has at least a fighting chance to be complete and up-to-date at any time.
What follows is therefore only a very short, time-specific list of a very few DBAPI-compliant modules that, at the time of writing, are very popular themselves, and interface to very popular open source DBs.
Open DataBase Connectivity (ODBC) is a standard way to connect to many different DBs, including a few not supported by other DBAPI-compliant modules. For a DBAPI-compliant module interfacing to ODBC under a liberal open source license, use pyodbc; for a commercially supported one, use mxODBC.
MySQL is a popular open source RDBMS, currently owned by Oracle. Oracle’s own “official” DBAPI-compliant interface to it is MySQL Connector/Python.
PostgreSQL is an excellent open source RDBMS. The most popular DBAPI-compliant interface to it is psycopg2.
SQLite is “a self-contained, server-less, zero-configuration, transactional SQL database engine,” which is the most widely deployed DB engine in the world; it’s a C-coded library that implements a DB within a single file, or even in memory for sufficiently small and transient cases. Python’s standard library supplies the package sqlite3, which is a DBAPI-compliant interface to SQLite.
SQLite has rich advanced functionality, with many options you can choose; sqlite3 offers access to much of that functionality, plus further possibilities to make interoperation between your Python code and the underlying DB even smoother and more natural. In this book, we don’t cover every nook and cranny of these two powerful software systems; we focus on the subset that is most commonly used and most useful. For a greater level of detail, including examples and tips about best practices, see SQLite’s documentation and sqlite3’s online documentation. If you want a book on the subject, we recommend O’Reilly’s Using SQLite.
Package sqlite3 supplies the following functions:
connect |
connect(filepath, ...) opens (creating it if needed) the SQLite DB in the file named by filepath (which can be ':memory:' to keep the DB in memory) and returns a Connection instance. Optional named parameters include timeout (the number of seconds to wait, when another connection keeps the DB locked, before raising an exception; default 5), detect_types (controlling type name detection, as covered next; default 0, meaning no detection), isolation_level (controlling transactional behavior; also available later as the connection's isolation_level attribute), and check_same_thread (covered below; default True).
To allow type name detection, pass as detect_types either sqlite3.PARSE_DECLTYPES (detecting each column's type from the type declared for it in the CREATE TABLE statement), or sqlite3.PARSE_COLNAMES (detecting each column's type from the column's name in the query, spelled in the form 'name [typename]'), or both constants combined with the | bitwise-or operator.
By default, a connection object can be used only in the Python thread that created it, to avoid accidents that could easily corrupt the DB due to minor bugs in your program; minor bugs are common in multithreaded programming. If you’re entirely confident about your threads’ use of locks and other synchronization mechanisms, and do need to reuse a connection object among multiple threads, you can pass check_same_thread=False, and sqlite3 then performs no such checks.
|
register_adapter |
register_adapter(type, callable) registers callable as the adapter from any object of the Python type type to a value of one of the types SQLite handles natively: int, float, str, or bytes. callable accepts a single argument, the object to adapt, and returns the natively supported value. |
register_converter |
register_converter(typename, callable) registers callable as the converter from any value identified in SQL as being of the type named typename to a corresponding Python object. callable accepts a single argument, the bytestring form of the value, and returns the Python object. Converters take effect only when type name detection is enabled via the detect_types argument to connect. |
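As a sketch of how these pieces fit together, here is a hypothetical Point class (our own, purely for illustration) adapted to TEXT on the way into the DB and converted back to a Point on the way out, with detection enabled via PARSE_DECLTYPES:

```python
import sqlite3

class Point:  # hypothetical type, for illustration only
    def __init__(self, x, y):
        self.x, self.y = x, y

# adapter: Point -> a natively supported type (here, str)
sqlite3.register_adapter(Point, lambda p: f'{p.x};{p.y}')

# converter: bytestring from a column declared 'point' -> Point
def convert_point(b):
    x, y = b.split(b';')
    return Point(float(x), float(y))

sqlite3.register_converter('point', convert_point)

conn = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
c = conn.cursor()
c.execute('CREATE TABLE pts (p point)')  # declared type triggers the converter
c.execute('INSERT INTO pts VALUES (?)', (Point(1.0, 2.5),))
c.execute('SELECT p FROM pts')
pt = c.fetchone()[0]
print(pt.x, pt.y)
conn.close()
```

The round trip goes through TEXT transparently: what the SELECT yields is a Point instance again, not the string '1.0;2.5'.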
In addition, sqlite3 supplies the classes Connection, Cursor, and Row. Each can be subclassed for further customization; however, this is an advanced issue that we do not cover further in this book. The Cursor class is a standard DBAPI cursor class, except for an extra convenience method, executescript, accepting a single argument: a string of multiple SQL statements separated by ; (with no parameters). The other two classes are covered in the following sections.
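For instance, executescript lets you set up a small schema in a single call (the table names here are made up for the sketch):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
# one string, several ;-separated statements, no parameters
c.executescript('''
    CREATE TABLE a (x INT);
    CREATE TABLE b (y INT);
    INSERT INTO a VALUES (1);
    INSERT INTO b VALUES (2);
''')
c.execute('SELECT x FROM a')
result = c.fetchone()
conn.close()
print(result)
```

Since executescript accepts no parameters, it's best reserved for fixed DDL and setup statements like these, never for statements built from untrusted input.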
In addition to the methods common to all Connection classes of DBAPI-compliant modules, covered in “Connection Objects”, sqlite3.Connection supplies the following methods and other attributes:
create_aggregate |
create_aggregate(name, num_params, aggregate_class) makes a user-defined SQL aggregate function name, with num_params parameters, available to SQL statements on this connection. aggregate_class must be a class supplying a method step (called once per row, with num_params arguments) and a method finalize (called with no arguments, returning the aggregate's final result). |
create_collation |
create_collation(name, callable) makes a collation named name available to SQL statements on this connection. callable accepts two strings and returns -1 when the first must sort before the second, 1 when it must sort after, and 0 when the two strings sort as equal. |
create_function |
create_function(name, num_params, func) makes a user-defined SQL function name, with num_params parameters, available to SQL statements on this connection. func accepts num_params arguments and returns a value of a type SQLite handles natively, or None. |
interrupt |
Call from any other thread to abort all queries executing on this connection (raising an exception in the thread using the connection). |
isolation_level |
A read-only attribute that's the value given as the isolation_level argument to the connect function. |
iterdump |
Returns an iterator that yields strings: the SQL statements that build the current DB from scratch, including both schema and contents. Useful, for example, to persist an in-memory DB to one on disk for future reuse. |
row_factory |
A callable that accepts the cursor and the original row as a tuple, and returns an object to use as the real result row. A common idiom is c.row_factory = sqlite3.Row, to use the highly optimized Row class covered in the next section. |
text_factory |
A callable that accepts a single bytestring parameter and returns the object to use for that TEXT column value; by default, str, meaning TEXT values are returned as Unicode strings, decoded as UTF-8. |
total_changes |
The total number of rows that have been modified, inserted, or deleted since the connection was created. |
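To give the flavor of these methods, here is a minimal sketch (the function and table names are our own) that registers a one-argument scalar SQL function and then uses iterdump to render the resulting in-memory DB as SQL text:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# make halve(...) callable from SQL on this connection
conn.create_function('halve', 1, lambda n: n / 2)
c = conn.cursor()
c.execute('CREATE TABLE nums (n INT)')
c.executemany('INSERT INTO nums VALUES (?)', [(2,), (10,)])
c.execute('SELECT halve(n) FROM nums ORDER BY n')
halves = c.fetchall()
# iterdump yields the SQL statements that rebuild the whole DB
dump = '\n'.join(conn.iterdump())
conn.close()
print(halves)
```

Writing dump to a .sql file (or feeding it to executescript on another connection) is exactly the "persist an in-memory DB to disk" use mentioned for iterdump above.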
sqlite3 supplies the class Row, which is mostly like a tuple but also supplies the method keys(), returning a list of column names, and supports indexing by a column name as an extra alternative to indexing by column number.
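For example, combining Row with the row_factory attribute mentioned in the previous section:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.row_factory = sqlite3.Row  # each result row is now a Row instance
c = conn.cursor()
c.execute("SELECT 'Alice' AS name, 30 AS age")
row = c.fetchone()
names = row.keys()       # list of column names
by_name = row['name']    # index by column name...
by_number = row[1]       # ...or by column number, as with a tuple
conn.close()
print(names, by_name, by_number)
```

Here names is ['name', 'age'], by_name is 'Alice', and by_number is 30.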
The following example handles the same task as the examples shown earlier in the chapter, but uses sqlite3 for persistence, without creating the index in memory:
import fileinput, sqlite3

connect = sqlite3.connect('database.db')
cursor = connect.cursor()
cursor.execute('CREATE TABLE IF NOT EXISTS Words '
               '(Word TEXT, File TEXT, Line INT)')
for line in fileinput.input():
    f, l = fileinput.filename(), fileinput.filelineno()
    cursor.executemany('INSERT INTO Words VALUES (:w, :f, :l)',
                       [{'w': w, 'f': f, 'l': l} for w in line.split()])
connect.commit()
connect.close()
We can then use sqlite3 to read back the data stored in the DB file database.db, as shown in the following example:
import sys, sqlite3, linecache

connect = sqlite3.connect('database.db')
cursor = connect.cursor()
for word in sys.argv[1:]:
    cursor.execute('SELECT File, Line FROM Words '
                   'WHERE Word=?', [word])
    places = cursor.fetchall()
    if not places:
        print('Word {!r} not found in index file'.format(word),
              file=sys.stderr)
        continue
    for fname, lineno in places:
        print('Word {!r} occurs in line {} of file {}:'
              .format(word, lineno, fname))
        print(linecache.getline(fname, lineno), end='')