The ability to store data in a manageable database dramatically increases the options regarding the types of applications that can be created by Python. The Python language has built-in modules, as well as add-on modules, that provide an extensive platform for the persistent storage of data in various database formats.
This chapter familiarizes you with phrases used to create generic DBM files for simple persistent storage of data, as well as some advanced concepts such as pickling data to files and shelves. Most basic database needs can be handled by the DBM, pickle, and shelve modules. The advantage of those modules is that they do not require a backend database server.
This chapter also covers connecting to and using a MySQL server as the backend database engine for persistent storage. MySQLdb, available at http://www.mysql.org/, is an add-on Python package that conforms to the Python DB-API 2.0 specification. Python provides the DB-API specification to accommodate the numerous forms of SQL servers available. The specification provides the necessary framework to access most of the available SQL databases via add-on modules such as MySQLdb.
There are other SQL modules available for other SQL servers such as Oracle, JDBC, Sybase, and DB2, as well as others. Thanks to the Python DB-API spec, the phrases listed for MySQL can be used to access those SQL databases as well. You simply need to install the appropriate module and use that module’s connect
function to connect to the database.
Example .
import anydbm cityDB = anydbm.open("city.dbm", 'n') for flight in flights: cityDB[flight] = cities[i] cityDB.close()
The anydbm module provides a generic interface, allowing you to open databases based on several different lower-level packages that can be installed on the system. When imported, the anydbm module searches for the dbm, gdbm, and bsddb packages that provide access to the UNIX dbm, GNU DBM, and Berkely DB libraries, respectively. If none of those packages are available, then the dumbdbm module is loaded to provide access to a simple DBM-style database library.
The adybdm module provides the open
(filename [,flag [, mode]])
function that allows you to open and create databases (see the “Opening and Closing Files” phrase of Chapter 4, “Managing Files,” for more details).
When creating a new database, anydbm will try to use the database module that was first installed on the system.
The open
function returns a database object that behaves much the same as a dictionary. Entries can be added to the database by assigning a value to a key using the d
[key]
= value
syntax. The key
must be a standard string, and the value
must also be a standard string, except in the shelve module discussed in later phrases.
import anydbm cities = ["Dallas", "Los Angeles", "New York"] flights = ["1144", "1045", "1520"] times = ["230pm", "320pm", "420pm"] #Create DBM file cityDB = anydbm.open("city.dbm", 'n') timeDB = anydbm.open("time.dbm", 'n') #Add entries i = 0 for flight in flights: cityDB[flight] = cities[i] i += 1 i = 0 for flight in flights: timeDB[flight] = times[i] i += 1 print cityDB.items() print timeDB.items() #Close DBM file cityDB.close() timeDB.close()
add_dbm.py
[('1144', 'Dallas'), ('1045', 'Los Angeles'), ('1520', 'New York')] [('1144', '230pm'), ('1045', '320pm'), ('1520', '420pm')]]
Example .
import anydbm cityDB = anydbm.open("city.dbm", 'r') flights = cityDB.keys() for flight in flights: print ("Flight %s arrives from %s at %s" % (flight, cityDB[flight], timeDB[flight])) cityDB.close()
The anydbm module provides a generic interface allowing you to open databases based on several different lower-level packages that can be installed on the system. When imported, the anydbm module searches for the dbm, gdbm, or bsddb package. If none of those packages are available, the dumbdbm module is loaded and used for database I/O.
The anydbm module provides the open
(filename [,flag [, mode]])
function that allows you to open and create databases (see the “Opening and Closing Files” phrase of Chapter 4 for more details).
When opening an existing database, anydbm uses the whichdb module to determine which database module to use when opening the database.
Once the database has been opened, you can use the database object similarly to a dictionary. You can use the keys()
and values()
functions to retrieve a list of keys or values, respectively. You can also access a specific value by referencing using the corresponding key.
import anydbm #Open DBM file for reading cityDB = anydbm.open("city.dbm", 'r') timeDB = anydbm.open("time.dbm", 'r') #Get keys flights = cityDB.keys() #Use keys to get values print "Arrivals" print "=============================================" for flight in flights: print ("Flight %s arrives from %s at %s" % (flight, cityDB[flight], timeDB[flight])) #Close DBM file cityDB.close() timeDB.close()
get_dbm.py
Arrivals ============================================= Flight 1144 arrives from Dallas at 230pm Flight 1045 arrives from Los Angeles at 320pm Flight 1520 arrives from New York at 420pm
Output from get_dbm.py code
Example .
import anydbm cityDB = anydbm.open("city.dbm", 'w') flights = timeDB.keys() for flight in flights: if c == flight: timeDB[flight] = "CANCELLLED" if d == flight: del timeDB[flight]
After the database has been opened, you can use the database object similarly to a dictionary. To change a value of an object in the database, assign a new value to the corresponding key using d
[key]
= value
. To remove an object from the database, use del d
[key]
to reference the object by its specific key.
The d.has_key
(key)
function can be extremely useful if you are not certain whether a specific key exists in the database.
import anydbm flights = [] cancelled = ["1520", "1544"] deleted = ["1144"] def displayArrivals(header): print header print "========================================" for flight in flights: print ("Flight %s from %s arrives at %s" % (flight, cityDB[flight], timeDB[flight])) #Open DBM file for reading cityDB = anydbm.open("city.dbm", 'w') timeDB = anydbm.open("time.dbm", 'w') #Get keys flights = timeDB.keys() #Display arrivals displayArrivals("Arrivals") #Update DBM for flight in flights: for c in cancelled: if c == flight: timeDB[flight] = "CANCELLED" break for d in deleted: if d == flight: del timeDB[flight] del cityDB[flight] break #Display updataed arrivals flights = timeDB.keys() displayArrivals("Updated Arrivals") #Close DMB file cityDB.close() timeDB.close()
update_dbm.py
Arrivals ============================================= Flight 1144 from Dallas arrives at 230pm Flight 1045 from Los Angeles arrives at 320pm Flight 1520 from New York arrives at 420pm Updated Arrivals ============================================= Flight 1045 from Los Angeles arrives at 320pm Flight 1520 from New York arrives at CANCELLED
Example .
import cPickle f = open("pickled.dat", "w") p = cPickle.Pickler(f) p.dump(flights) p.dump(times) f.close()
Pickling data to files is one of the simplest ways to get around the limitation that DBM files have of only allowing simple text string storage. The pickle and cPickle modules included with Python provide a simple-to-use interface to pickle entire objects to a file for persistent storage.
The cPickler object is much faster than the pickler object; however, it will not allow you to subclass the pickler and unpickler objects for advanced handling of data.
The idea of pickling is to take an existing Python object and structure the data in such a way that it can be easily written out to an existing file and read back again.
The first step in pickleing Python objects is to open a file with the write permission. Once the file has been opened, use the Pickler
(file)
method to create a pickler object. The Pickler
method accepts a standard file object as its only parameter and returns the pickler object that is used to write objects to the file.
Once the pickler object has been created, you can use the dump
(object)
method to write almost any Python object to the file. The dump
method pickles the object and writes it to the file. As the output of the sample code illustrates, the pickled object is not a standard Python object.
If the same object is dumped to a pickler object twice, only the first object is saved, even if the object has been modified.
import cPickle flights = {"1144":"Dallas", "1045":"Los Angeles", "1520":"New York"} times = ["230pm", "320pm", "420pm"] #Create the pickle file f = open("pickled.dat", "w") #Create the pickler object p = cPickle.Pickler(f) #Pickle data to the file p.dump(flights) p.dump(times) f.close() #Display the file contents f = open("pickled.dat", "r") data = f.read() print data f.close()
pickle_data.py
(dp1 S'1520' p2 S'New York' p3 sS'1045' p4 S'Los Angeles' p5 sS'1144' p6 S'Dallas' p7 s.(lp8 S'230pm' p9 aS'320pm' p10 aS'420pm' p11 a.
Pickling data to files is one of the simplest ways to get around the limitation that DBM files have of only allowing simple text string storage. The pickle and cPickle modules included with Python provide a simple-to-use interface to pickle entire objects to a file for persistent storage.
The cPickler object is much faster than the pickler object; however, it will not allow you to subclass the pickler and unpickler objects for advanced handling of data.
The idea of unpickling is to read pickled objects from an existing pickle file and convert those pickled objects back to standard Python objects.
The first step to unpickle Python objects is to open the pickle file with the read permission. Once the file has been opened, use the UnPickler
(file)
method to create an unpickler object. The UnPickler
method accepts a standard file object as its only parameter and returns the unpickler object that is used to read pickled objects from the file.
Once the unpickler object has been created, you can use the load()
method to read a pickled object from the file. The object will be restructured and returned as a standard Python object.
import cPickle #Open the pickle file f = open("pickled.dat", "r") #Create the unpickler object p = cPickle.Unpickler(f) #Unpickle an object from the file data = p.load() print "Flight Dictionary:" print data #Unpickle an object from the file data = p.load() print " Time List:" print data f.close()
Flight Dictionary: {'1520': 'New York', '1144': 'Dallas', '1045': 'Los Angeles'} Time List: ['230pm', '320pm', '420pm']
Output from unpickle_data.py code
Example .
import shelve db = shelve.open("shelved.dat", "n") db['flights'] = flights db['times'] = times print db.keys()
Although pickling is great to store complex Python objects that DBMs cannot, it does not provide the direct entry access that is available with DBMs. Python provides the shelve module to bridge the gap and provide direct access to stored entries, as well as the ability to store complex Python objects. The shelve module accomplishes this by pickling the objects behind the scenes as they are added to the shelve file.
The shelve module provides its own open
(filename [, flags [, protocol [, writeback]]])
method to create and open shelve files. The optional flags
parameter accepts an r
, w
, c
, or n
character to determine whether the shelve will be read, write, created if it doesn’t already exist, or truncated to zero length if it does exist. The optional protocol
parameter accepts 0, 1, or 2 to determine whether the objects will be pickled as text based, binary, or a newer, faster method, respectively. The writeback
parameter, which defaults to false, is a Boolean that, when set to true, causes changes to be cached until the database is closed.
The open
method of the shelve module returns a shelve object that behaves much the same as a dictionary. Entries can be added to the shelve by assigning a value to a key using d
[key]
= value
. The key
must be a standard string; however, the value can be almost any Python object.
The output from the sample code shows what the contents of the shelve file looks like. You can see the objects in pickled form because the file was created using the default text-based protocol for pickling.
import shelve flights = {"1144":"Dallas", "1045":"Los Angeles", "1520":"New York"} times = ["230pm", "320pm", "420pm"] #Create shelve db = shelve.open("shelved.dat", "n") #Store objects in shelve db['flights'] = flights db['times'] = times #Display added keys print db.keys() db.close() #Display the file contents f = open("shelved.dat", "r") data = f.read() print data f.close()
shelve_store.py
['times', 'flights'] |(lp1 S'230pm' p2 aS'320pm' p3 aS'420pm' p4 a.|times|(dp1 S'1520' p2 S'New York' p3 sS'1045' p4 S'Los Angeles' p5 sS'1144' p6 S'Dallas' p7 s.|flights
Output from shelve_store.py code
Example .
import shelve db = shelve.open("shelved.dat", "r") for k in db.keys(): obj = db[k] flightDB = db['flights'] flights = flightDB.keys() cities = flightDB.values() times = db['times']
The shelve module provides its own open
(filename [, flags [, protocol [, writeback]]])
method to create and open shelve files. The optional flags
parameter accepts an r
, w
, c
, or n
character to determine whether the shelve will be read, write, created if it doesn’t already exist, or truncated to zero length if it does exist. The optional protocol
parameter accepts 0, 1, or 2 to determine whether the objects will be pickled as text based, binary, or a newer, faster method, respectively. The writeback
, which defaults to false, is a Boolean that, when set to true, causes changes to be cached until the database is closed.
The optional protocol
parameter accepts 0, 1, or 2 to determine whether the objects will be pickled as text based, binary, or a newer, faster method, respectively. When you open the shelve file to read objects, you must specify the correct protocol to properly unpickle the objects.
The open
method of the shelve module opens a shelve file and returns a shelve object that behaves much the same as a dictionary. Once the shelve object has been created, you can use the shelve object similarly to a dictionary.
The keys()
and values()
functions retrieve a list of keys or values, respectively. You can also access a specific value by referencing using the corresponding key.
When working with shelve files, the values that are returned can be almost any object type. You will need to keep this in mind when managing shelves that have multiple object types stored in them.
import shelve #Open shelve file db = shelve.open("shelved.dat", "r") #Get the keys from the shelve for k in db.keys(): obj = db[k] print "%s: %s" % (k, obj) #Use keys to get values flightDB = db['flights'] flights = flightDB.keys() cities = flightDB.values() times = db['times'] print " Departures" print "=============================================" x = 0 for flight in flights: print ("Flight %s leaves for %s at %s" % (flight, cities[x], times[x])) x+=1 db.close()
shelve_get.py
times: ['230pm', '320pm', '420pm'] flights: {'1520': 'New York', '1144': 'Dallas', '1045': 'Los Angeles'} Departures ============================================= Flight 1520 leaves for New York at 230pm Flight 1144 leaves for Dallas at 320pm Flight 1045 leaves for Los Angeles at 420pm
Example .
import shelve db = shelve.open("shelved.dat", "w", writeback=1) flights = db['flights'] del flights['1144'] flights['1145'] = "Dallas" db['times'] = newtimes db.sync()
Once the shelve file has been opened, you can use the shelve object similarly to a dictionary. If you want to replace an existing object in the shelve with a new one, assign the new value to the corresponding key using d
[key]
= value
. To remove an object from the database, use del d
[key]
to reference the object by its specific key.
Changing the value of specific parts of an object is where the power of using shelves rather than DBMs becomes very apparent. First, retrieve the object from the shelve by referencing its key using obj = d
[key]
. Once the object has been retrieved, values of the object can be modified using standard Python. The changes to the object are written back to the shelve file automatically.
In the example, we open the shelve with writeback set to true, so we use the sync()
method of the shelve module to force the changes to be flushed to disk.
import shelve newtimes = ["110pm", "220pm", "300pm", "445pm"] #Open shelve file db = shelve.open("shelved.dat", "w", writeback=1) #Get the keys for k in db.keys(): obj = db[k] print "%s: %s" % (k, obj) print " " #Use keys to get values flights = db['flights'] times = db['times'] #Update contents of old object del flights['1144'] flights['1145'] = "Dallas" flights['1709'] = "Orlando" #Replace old object with a new object db['times'] = newtimes #Add a new object db['oldtimes'] = times #Flush data to disk db.sync() for k in db.keys(): obj = db[k] print "%s: %s" % (k, obj) db.close()
times: ['230pm', '320pm', '420pm'] flights: {'1520': 'New York', '1144': 'Dallas', '1045': 'Los Angeles'} times: ['110pm', '220pm', '300pm', '445pm'] flights: {'1709': 'Orlando', '1520': 'New York', '1045': 'Los Angeles', '1145': 'Dallas'} oldtimes: ['230pm', '320pm', '420pm']
Output from shelve_edit.py code
Example .
import MySQLdb myDB = MySQLdb.connect(host="127.0.0.1", / port=3306) cHandler = myDB.cursor()
The MySQLdb module provides the standard Python DB-API 2.0 specification connect([host= [, port= [, user= [, passwd= [, db= [ , ...]]]]]])
function to connect to MySQL database servers. All the parameters to the connect
function are optional. The most common parameters used are the host
, port
, user
, passwd
, and db
.
Once you have successfully connected to the MySQL server, you need to get a cursor handle to send SQL requests to the server. The cursor()
function returns a cursor object that can be used to execute SQL commands on the server and obtain the results.
To execute a SQL command on the server, use the execute
(operation [, parameters])
function of the cursor object, where operation
is basically any properly formatted SQL command string.
To retrieve the results from executing the command, use the fetchall()
function of the cursor object. The fetchall
function returns the results of the SQL request in a series of one or more lists depending on the data being returned.
Once you have the cursor object and are able to execute SQL commands, you can use the SHOW DATABASES
SQL command to get a list of databases available on the server. To switch to a specific database, use the USE
<database>
SQL command.
To find out which database is currently active, use the SELECT DATABASE()
command to return the current database name.
import MySQLdb #Connect to MySQL Server myDB = MySQLdb.connect(host="127.0.0.1", port=3306) cHandler = myDB.cursor() #Display available databases cHandler.execute("SHOW DATABASES") results = cHandler.fetchall() print"Databases =====================" for item in results: print item[0] #Display current database cHandler.execute("SELECT DATABASE()") results = cHandler.fetchall() print " Current Database =======================" for item in results: print item[0] #Select database cHandler.execute("USE schedule") #Display current database cHandler.execute("SELECT DATABASE()") results = cHandler.fetchall() print " Current Database =======================" for item in results: print item[0] myDB.close()
MySQL_conn.py
Databases ===================== information_schema airport mysql schedule test testy Current Database ======================= None Current Database ======================= schedule
Example .
import MySQLdb myDB = MySQLdb.connect(host="127.0.0.1", port=3306) cHandler = myDB.cursor() cHandler.execute("CREATE DATABASE schedule") cHandler.execute("CREATE TABLE Arrivals (city TEXT, flight TEXT, time TEXT)")
Once you have connected to a MySQL database and got a SQL command cursor object, creating databases and tables is just a matter of sending the appropriately formatted SQL commands to the server.
To create a new database, use the execute
(operation [, parameters])
function of the cursor object to initiate the CREATE DATABASE
<database>
SQL command. To create a new table, use the execute()
function of the cursor object to initiate the CREATE Table
<tablename>
(<column name> <column type>, ...)
SQL command.
To verify that the table has been created, use the SHOW TABLES
SQL command to return a list of table entries available in the database.
The table entries that are returned are in the form of a list. The first entry in the list is the table name.
To verify structure of a specific table, use the DESCRIBE
<tablename>
SQL command to return a list of field entries included in the table.
The field entries that are returned are in the form of a list. The first entry in the list is the field name and the second is field type.
import MySQLdb #Connect to MySQL Server myDB = MySQLdb.connect(host="127.0.0.1", port=3306) #Get the cursor object cHandler = myDB.cursor() #Create database cHandler.execute("CREATE DATABASE schedule") #Select database cHandler.execute("USE schedule") #Create table cHandler.execute("CREATE TABLE Arrivals (city TEXT, flight TEXT, time TEXT)") #Show created table cHandler.execute("SHOW TABLES") results = cHandler.fetchall() print results #Describe the table cHandler.execute("DESCRIBE Arrivals") results = cHandler.fetchall() print results myDB.close()
(('arrivals',),) (('city', 'text', 'YES', '', None, ''), ('flight', 'text', 'YES', '', None, ''), ('time', 'text', 'YES', '', None, ''))
Output from MySQL_create.py code
Example .
import MySQLdb myDB = MySQLdb.connect(host="127.0.0.1", port=3306, db="schedule") cHandler = myDB.cursor() sqlCommand = "INSERT INTO Arrivals VALUES('%s', '%s', '%s')" % (city, flights[x], times[x]) cHandler.execute(sqlCommand) myDB.commit()
Once you have connected to a MySQL database and got a SQL command cursor object, adding entries to the database is just a matter of sending the appropriately formatted SQL commands to the server.
First, connect to the server using the MySQLdb modules connect
function, and then use the MySQL database object to get a cursor object. In the sample code, entries are added one at a time by executing the INSERT INTO
<tablename>
VALUES
(<data value>)
SQL command using the execute
function of the cursor object.
Remember to use the commit()
function of the cursor object to flush pending requests to the SQL database so that the changes will be written to disk.
import MySQLdb cities = ["Dallas", "Los Angeles", "New York"] flights = ["1144", "1045", "1520"] times = ["230pm", "320pm", "420pm"] #Connect to database myDB = MySQLdb.connect(host="127.0.0.1", port=3306, db="schedule") #Get cursor object cHandler = myDB.cursor() #Add entries to database x = 0 for city in cities: sqlCommand = "INSERT INTO Arrivals VALUES('%s', '%s', '%s')" % (city, flights[x], times[x]) cHandler.execute(sqlCommand) x += 1 #View added entries sqlCommand = "SELECT cities, flights, times FROM Arrivals" cHandler.execute(sqlCommand) results = cHandler.fetchall() print results #Commit changes to database myDB.commit() myDB.close()
MySQL_add.py
(('Dallas', '1144', '230pm'), ('Los Angeles', '1045', '320pm'), ('New York', '1520', '420pm'))
Output from MySQL_add.py code.
Example .
import MySQLdb myDB = MySQLdb.connect(host="127.0.0.1", port=3306, db="schedule") cHandler = myDB.cursor() sqlCommand = "SELECT * FROM Arrivals" cHandler.execute(sqlCommand) results = cHandler.fetchall() for row in results: cityList.append(row[0])
Once you have connected to a MySQL database and got a SQL command cursor object, retrieving entries from the database is just a matter of sending the appropriately formatted SQL commands to the server.
First, connect to the server using the MySQLdb modules connect
function, and then use the MySQL database object to get a cursor object. In the sample code, all entries are retrieved together by executing the SELECT * FROM
<tablename>
SQL command using the execute
function of the cursor object.
The SELECT
SQL command returns entries as a list of lists. Because we know that the field structure of the table is “city, flight, time,” each field can be accessed directly using index 0, 1, and 2, respectively.
import MySQLdb #Connect to database myDB = MySQLdb.connect(host="127.0.0.1", port=3306, db="schedule") #Get cursor object cHandler = myDB.cursor() #Send select request for specific entries sqlCommand = "SELECT * FROM Arrivals WHERE city = 'Dallas'" cHandler.execute(sqlCommand) #View results results = cHandler.fetchall() print results #Send select request for all entries sqlCommand = "SELECT * FROM Arrivals" cHandler.execute(sqlCommand) #View results results = cHandler.fetchall() print results #Process rows into lists cityList = [] flightList = [] timeList = [] for row in results: cityList.append(row[0]) flightList.append(row[1]) timeList.append(row[2]) print " Arrivals" print "=============================================" x = 0 for flight in flightList: print ("Flight %s arrives from %s at %s" % (flight, cityList[x], timeList[x])) x+=1 myDB.close()
MySQL_get.py
(('Dallas', '1144', '230pm'),) (('Dallas', '1144', '230pm'), ('Los Angeles', '1045', '320pm'), ('New York', '1520', '420pm')) Arrivals ============================================= Flight 1144 arrives from Dallas at 230pm Flight 1045 arrives from Los Angeles at 320pm Flight 1520 arrives from New York at 420pm
Output from MySQL_get.py code