You talk to the database in the form of transactions.[1] A database transaction is one or more database statements that must be executed together, or not at all. A bank account transfer is a good example of a complex transaction. In short, an account transfer is actually two separate events: a debit of one account and a credit to another. Should the database crash after the debit but before the credit, the application should be able to back out of the debit. A database transaction enables a programmer to mark when a transaction begins, when it ends, and what should happen if one of the pieces of the transaction fails.
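The transfer example can be sketched with Python's database API (shown here with the built-in sqlite3 module so it is self-contained; MySQL APIs follow the same begin/commit/rollback pattern, and the table layout and amounts are illustrative assumptions):

```python
import sqlite3

# Hypothetical account table for the transfer example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO account VALUES (1, 100.00), (2, 0.00)")
conn.commit()

def transfer(conn, from_id, to_id, amount):
    """Debit one account and credit another as a single transaction."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE account_id = ?",
                     (amount, from_id))
        conn.execute("UPDATE account SET balance = balance + ? WHERE account_id = ?",
                     (amount, to_id))
        conn.commit()      # both statements take effect together...
    except sqlite3.Error:
        conn.rollback()    # ...or neither does
        raise

transfer(conn, 1, 2, 25.00)
balances = dict(conn.execute("SELECT account_id, balance FROM account"))
print(balances)  # {1: 75.0, 2: 25.0}
```

If the process dies between the two UPDATE statements, the debit is never committed, so the database never sees a half-finished transfer.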
Until recently, MySQL had no support for transactions. In other words, when you executed an SQL statement under old versions of MySQL, it took effect immediately. This behavior is still the default for MySQL. Newer versions of MySQL, however, support the ability to use transactions with certain tables in the database. Specifically, the table must use a transaction-safe table format. Currently, MySQL supports two transaction-safe table types: BDB (Berkeley DB) and InnoDB. Instructions for configuring MySQL to use these table types can be found on the MySQL web site.
In Chapter 3, we described the MySQL syntax for managing transactions from the MySQL client command line. Managing transactions from within applications is often very different. In general, each API will provide a mechanism for beginning, committing, and rolling back transactions. If it does not, you can likely follow the command-line SQL syntax to get the desired effect.
Managing transactions may seem simple, but there are many issues you need to consider when using transactions in a multiuser environment. First of all, transactions come with a heavy price in terms of performance. MySQL did not originally support transactions because MySQL’s goal was to provide a fast database engine. Transactions seriously impact database performance. To understand how this works, you need to have a basic understanding of transaction isolation levels.
A transaction isolation level basically determines what other people see when you are in the middle of a transaction. To understand transaction isolation levels, however, you first need to understand a few common terms:
A dirty read occurs when one transaction views the uncommitted changes of another transaction. If the original transaction rolls back its changes, the one that read the data is said to have “dirty” data.
A repeatable read occurs when one transaction always reads the same data from the same query no matter how many times the query is made or how many changes other transactions make to the rows read by the first transaction. In other words, a transaction that mandates repeatable reads will not see the committed changes made by another transaction. An application needs to start a new transaction to see those changes.
A phantom read deals with changes occurring in other transactions that would result in new rows matching your transaction’s WHERE clause. Consider, for example, a situation in which you have a transaction that reads all accounts with a balance of less than $100. Your transaction performs two reads of that data. Between the two reads, another transaction adds a new account to the database with no balance. That account will now match your query. If your transaction isolation allows phantom reads, you will see the new “phantom” row. If it does not allow phantom reads, you will see the same set of rows each time.
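The phantom read scenario looks like this in SQL (the ACCOUNT table here is hypothetical, matching the example used later in this section):

```sql
-- Transaction A reads all low-balance accounts:
SELECT * FROM ACCOUNT WHERE BALANCE < 100.00;

-- Meanwhile, transaction B commits a new account with no balance:
INSERT INTO ACCOUNT (BALANCE) VALUES (0.00);

-- Transaction A repeats the same query. If its isolation level allows
-- phantom reads, the new account appears as a "phantom" row:
SELECT * FROM ACCOUNT WHERE BALANCE < 100.00;
```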
MySQL supports the following transaction isolation levels:
READ UNCOMMITTED
The transaction allows dirty, nonrepeatable, and phantom reads.
READ COMMITTED
The transaction does not allow dirty reads, but allows nonrepeatable and phantom reads.
REPEATABLE READ
The transaction allows only committed, repeatable reads; dirty and nonrepeatable reads are not allowed, but phantom reads may occur.
SERIALIZABLE
The transaction allows only committed, repeatable reads. Phantom reads are specifically not allowed.
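MySQL lets you choose among these levels with the SET TRANSACTION statement; a brief sketch (the ACCOUNT table is the hypothetical example used later in this section):

```sql
-- Applies to the next transaction only; add SESSION or GLOBAL after
-- SET to change the default instead:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;
SELECT BALANCE FROM ACCOUNT WHERE ACCOUNT_ID = 5;
COMMIT;
```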
As you climb the transaction isolation chain, from no transactions to serializable transactions, you decrease the performance of your application. You therefore need to balance your data integrity needs with your performance needs. In general, READ COMMITTED is as high as an application wants to go, except in a few cases.
One mechanism for getting the performance of READ UNCOMMITTED and the data integrity of READ COMMITTED is to make a row’s primary key the normal primary key plus a timestamp reflecting the time in milliseconds when the row was last updated. When an application performs an update on the underlying row in the database, it updates that timestamp but uses the old one in the WHERE clause:
UPDATE ACCOUNT
SET BALANCE = 5.00, LAST_UPDATE_TIME = 996432238000
WHERE ACCOUNT_ID = 5
AND LAST_UPDATE_TIME = 996432191119
If another transaction has updated the row since this transaction read it, the timestamps will not match and the update will affect zero rows. The application can detect the failed update by checking the affected row count and then requery the database for the new data.
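The check-and-retry logic around that UPDATE can be sketched in Python (using sqlite3 so the example is self-contained; the SQL is the same under MySQL, and the ACCOUNT layout mirrors the hypothetical example above):

```python
import sqlite3
import time

# Hypothetical account row with a last-update timestamp in milliseconds.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY,"
             " balance REAL, last_update_time INTEGER)")
conn.execute("INSERT INTO account VALUES (5, 10.00, 996432191119)")
conn.commit()

def update_balance(conn, account_id, new_balance, old_ts):
    """Update only if the row is unchanged since we last read it."""
    new_ts = int(time.time() * 1000)
    cur = conn.execute(
        "UPDATE account SET balance = ?, last_update_time = ?"
        " WHERE account_id = ? AND last_update_time = ?",
        (new_balance, new_ts, account_id, old_ts))
    conn.commit()
    return cur.rowcount == 1   # zero rows matched means our data was stale

assert update_balance(conn, 5, 5.00, 996432191119)       # timestamps match
assert not update_balance(conn, 5, 7.50, 996432191119)   # stale timestamp, no-op
```

When the update affects zero rows, the caller rereads the row to pick up the current balance and timestamp before trying again.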
[1] Even if you are using a version of MySQL without support for transactions, each statement you send to the database can, in a sense, be thought of as an individual transaction. You simply have no option to abort or package multiple statements together in a complex transaction.