10.
Using Transactions with InnoDB Tables

In this chapter, we will cover transactions in MySQL. When dealing with practical examples, we will primarily look at the InnoDB table type, but nearly everything said here applies equally well to BerkeleyDB tables.

Specifically, we will look at the following:

• What transactions are

• Using transactions in MySQL

• The InnoDB transaction model and ACID compliance

What Are Transactions?

The first thing to address in this chapter is what exactly is a transaction? In the context of a database management system, a transaction is a sequence of related instructions that must be treated as one indivisible unit. That is, all the work in the transaction must be done, or all of it must be left undone. This concept is known as atomicity. A transaction is atomic because it cannot be broken down into parts—it all gets processed or it all gets ignored.

This has particular implications when considering concurrent access by multiple users, processes, or threads and also for recovery. Modern computers are often accessed by more than one person at a time. They usually have more than one program running at one time, and they often run programs such as Web server software that create many processes or threads that run at the same time. Each of these users, programs, processes, or threads may need access to the database server.

Multiple threads must not interfere with each other when running concurrently. If an error occurs, the database must honor transactions when recovering. This means returning the database to the state it was in before the error occurred or finishing the whole transaction. It is generally better to lose an entire transaction of related changes than to recover to a state partway through a sequence of updates that might result in the database being in an inconsistent state.

This concept can be expressed more formally, but an example is probably the easiest way to come to grips with it. Consider a very simple (but potentially very important) database that records bank account details. Each account has, at minimum, a unique identifier and a balance.

We can create this table as follows:


create table account
(
  number int not null auto_increment primary key,
  balance float
) type = InnoDB;


It probably is not a shock that we can create two new accounts with these queries:


insert into account (balance) values (0.0);
insert into account (balance) values (1000.0);
insert into account (balance) values (2000.0);


There is not very much that can go wrong here, and the result should be this data:

image

As long as we are not making assumptions about the account numbers generated by the auto-increment column, it does not matter in what order these queries are eventually run. It does not matter if we are running them from three different interactive clients simultaneously and cannot be sure which will be processed first.

This is often not true for more complex queries or sets of queries. Consider the following pair of statements intended to deposit $500 into account 2:


# first check balance
select balance from account where number = 2;
# query gives us a result of $1000
# now store updated balance
update account set balance = 1500 where number = 2;


These queries are related. They need to be run together. If other clients can update the balance of this account between our balance check and our balance update, we may not get the result we expected.

If two clients were running pairs of queries like this at the same time, our final result would depend on timing. If we were attempting to deposit $500 with these queries and another client was attempting to deposit $100 with the following pair, the end result could be a balance of $1100 or $1500—neither of which is the right result.


# first check balance
select balance from account where number = 2;
# query gives us a result of $1000
# now store updated balance
update account set balance = 1100 where number = 2;


This is obviously not desirable, but in this case the problem is easily solved. Making our updates relative rather than absolute will make them into single, indivisible units and will solve the problem. The following query will run correctly, regardless of what other queries are running at the same time:


update account set balance = balance + 500 where number = 2;


A single update statement in MySQL is always atomic. It cannot be interrupted by another query or half succeed. It will complete or will completely fail on an error.

More complex scenarios are harder to address. Consider the following pair of queries intended to transfer $1000 from account 2 to account 1:


update account set balance = balance - 1000 where number = 2;
update account set balance = balance + 1000 where number = 1;


Our updates are both relative, but it is important that these two queries be run together for sensible results. The total amount of money in the system should be the same after the queries as before. Money should be moved from account to account, not created or destroyed. If a power failure happened between running the first query and running the second query, our data would no longer be consistent.

In simple cases, a workaround is to collapse the queries into one SQL statement. In this case we could write this:


update account as source, account as dest
  set source.balance = source.balance – 1000,
    dest.balance = dest.balance + 1000
  where source.number = 2 and dest.number = 1;


By using two aliases to the account table (source and dest), we have ensured that this is one atomic update that will either succeed or fail. We do not need to worry about the server stopping between parts of our operation.

In this case the only casualty is readability. The combined query is harder to read and debug than our first attempt.

In many cases, it may not be possible to collapse all the related queries into one like this. The solution in those cases, and anytime you want more readable code, is to use MySQL's transaction syntax. By marking a set of statements as a transaction, you inform the database that they are a related, indivisible set. They should be treated as an atomic unit and either all succeed or all have no effect. You can run these two queries as a single transaction using the following SQL statements:


start transaction;
update account set balance = balance - 1000 where number = 2;
update account set balance = balance + 1000 where number = 1;
commit;


An important property of transactions is that they are not visible to other sessions until they are complete and committed. No other thread can read inconsistent data from the table(s) while you are in the process of updating it.

Another benefit of transactions is that partially performed transactions can be undone. As long as we attempt to roll back the transaction before we have committed it, then any changes made by queries that are part of the transaction will be undone.

In the case of our transfer example, if we added a SELECT statement to check that we were not removing more money from the source account than it contained, we could use the keyword ROLLBACK if we wanted to cancel the whole transaction. The syntax would look like this:


start transaction;
update account set balance = balance - 1000 where number = 2;
update account set balance = balance + 1000 where number = 1;
select balance from account where number = 2;
# select tells us that account #2 has a negative balance!
# we'd better abort
rollback;


Calling ROLLBACK aborts the transaction and undoes any changes it would have made. A transaction that was rolled back instead of committed leaves no trace in the data. Because partial results were never visible to other sessions, it is exactly as though it never happened.

Using Transactions in MySQL

To use transactions as demonstrated in the preceding section, you must be using a transaction-safe table type—either InnoDB or BDB. There are various pieces of syntax that will get the same effect.

The syntax START TRANSACTION is synonymous with BEGIN or BEGIN WORK. You may like to use one of these forms to make your code more compatible with another database you use, but because START TRANSACTION is the SQL-99 syntax, it is generally recommended.

Setting the Autocommit Mode

Normally, MySQL runs in autocommit mode. Each query you run is effectively isolated in a transaction. You can think of it as adding START TRANSACTION and COMMIT to each of your queries for you. A couple of simple queries like these


update account set balance = balance - 1000 where number = 2;
update account set balance = balance + 1000 where number = 1;


will be run as though they were written like this:


start transaction;
update account set balance = balance - 1000 where number = 2;
commit;
start transaction;
update account set balance = balance + 1000 where number = 1;
commit;


Note that if you manually type


start transaction;


normally nothing will be committed until you manually type


commit;


You can disable the autocommit behavior using the SET command as follows:


set autocommit=0;


As you would probably guess, the following command will put MySQL back into autocommit mode:


set autocommit=1;


The autocommit variable is local to a single session, so changing the mode will affect only queries run from your session and only for as long as your session is connected.

If you turn autocommit off, you will not need to call START TRANSACTION to start a transaction. It is very important, though, that you remember to call COMMIT periodically to commit any changes you have made to the database.

Regardless of whether autocommit is on or off, there are times when your changes will automatically be committed. If you are using a non-transaction-safe table type, such as MyISAM, all of your changes will be committed immediately, regardless of the autocommit setting. You can happily group your statements with START TRANSACTION and COMMIT; it is just that this will have no effect on the non-transaction-safe tables. You can even call ROLLBACK. It will not give an error—it will just have no effect on anything you have altered in a non-transaction-safe table. This might be useful if you are testing code or reloading a dump file on servers with different table types.

For transaction-safe tables, there are actions (other than typing COMMIT) that will automatically trigger a COMMIT. Requesting a lock will implicitly commit any outstanding queries.

Using Locks

An alternative way of obtaining some of the benefits of a transaction is to manually lock and unlock tables.

If we really wanted to write our bank account deposit code as two separate transactions, we could implement it as follows:


lock tables account write;
select balance from account where number = 2;
update account set balance = 1500 where number = 2;
unlock tables;


A call to LOCK TABLES tries to lock any tables you list so that the current thread can work with it without interference. A call to UNLOCK TABLES releases any locks that this thread holds. Unlocking is straightforward. The only thing to note about it is that if you have locked a table, you should unlock it as soon as possible to limit the impact on other threads. Locking is a more complicated issue.

You need to request all the locks you need at once. The preceding example requested only one, but if we intended to access multiple tables or even multiple aliases to the one table, we would need to add them to the same call, for example:


lock tables account write, account as a read, othertable low_priority write;


Calling LOCK TABLES releases all locks you currently hold, so if you attempt to collect the locks you need over multiple statements, you will release all the early ones and will only actually hold the locks requested in the final statement.

There are two main types of locks: read and write. If you need access to a table to write, and you cannot allow other threads to use the table at the same time, a write lock will stop any other thread from reading or writing to the table until you release it. A read lock is less extreme. If you only intend to read from a table, there is no harm in allowing other threads to read at the same time. A read lock bars other threads only from writing to the table during the period that your thread holds the lock.

A write lock can also be marked as low_priority. Any system that distributes locks, including MySQL, needs a policy to decide who gets locks first when there are conflicting demands. MySQL generally gives write lock requests priority over read lock requests to ensure that updates to the stored data are made as soon as possible. If you do not want this behavior, you can request a low-priority write lock, as we did for the table named othertable in the preceding example. There is a catch, though. Whenever you request a lock, you may have to wait before it is granted. A low-priority lock will be granted only if there are no other threads requesting read or write locks on that table. It is possible on a busy server that this might never happen.

You will probably not often manually control locking in this way, but there are some reasons to do it. If you have an application that requires very high performance but needs transaction-like behavior only occasionally, it might be worth using a fast non-transaction-safe table type and using locks to solve your transaction issue.

Another common instance in which you would call LOCK TABLES is while manipulating MySQL's data files directly. For instance, if you wanted to ensure that the disk files stayed consistent and unmodified while you backed them up, you would need to lock them.

The most important thing to bear in mind about locking is that you should release your locks as soon as possible because other systems and users will be kept waiting. Some tasks you may lock while performing, such as reindexing or backing up large files, can take significant amounts of time.

The InnoDB Transaction Model

Transactional database management systems are generally striving for the same goals, using differing approaches. To isolate transactions, InnoDB uses a fine-grained, row-level locking mechanism. This means that different transactions can run on the same table at the same time as long as they are all only reading or do not use the same rows if they are writing.

Uncommitted changes lock other threads out of only affected rows, not a whole table. This is one of the features that gives InnoDB high performance while delivering the kinds of features you expect from a modern RDBMS. One of these features, or sets of features, is ACID compliance.

ACID Compliance

An important database term that we have not yet defined is the acronym ACID. ACID stands for Atomicity, Consistency, Isolation, and Durability. Much used to be made of the fact that MySQL using MyISAM tables did not pass the “ACID Test.” Using InnoDB tables, MySQL is ACID compliant.

Atomicity means that transactions are atomic and indivisible. Either all of a transaction's changes are stored in the database, or none of them are stored. In the event of an external error, it is obviously ideal if the recovery process can complete any transactions that were in progress at the time; however, it is also acceptable for those transactions to be completely rolled back.

Consistency means that operations transform the database from one valid state to another. There are no intermediate stages where the data is inconsistent. The database should also disallow operations that violate consistency constraints. If you are storing bank accounts that relate to bank customers, it should not be possible to create an account for a customer who does not exist, and it should not be possible to delete a customer from the customers table if there are still accounts referring to them in the accounts table.

Isolation means that transactions do not affect each other while they are running. Each transaction should be able to view the world as though it is the only one reading and altering things. In practice this is not usually the case, but locks are used to achieve the illusion. Depending on the database and option settings, you will have different levels of isolation in practice. (See the “Transaction Isolation” section in this chapter for more detail.)

Durability means that after a transaction has been committed to the database, its effects are permanent. This would be a fairly simple requirement to satisfy in a simple program, but in a complex RDBMS that uses locking and multiversioning to allow concurrent multiuser access and caching to improve performance, it is a minefield. In addition, durability implies that we should be able to recover the current state of the database in the event of a failure. If a power failure, hard-disk crash, or other catastrophe occurs between a client sending a transaction to the database and that transaction being recorded on disk, then we should be able to combine a backup and a log to bring the database back to its precrash state and perhaps process transactions that had been logged but not yet executed or committed.

If you are using InnoDB tables (or BerkeleyDB tables), MySQL is ACID compliant. Using the transaction syntax gives you atomicity. Transactions and foreign key constraints give you consistency. You can choose the level of isolation that transactions have from one another. The binary log and repair tools provide durability. (Using replication, you can have a highly durable system without any single point of failure.)

Transaction Isolation

InnoDB tables can run in four different transaction isolation levels. In order from strongest to weakest, they are

• Serializable

• Repeatable read

• Read committed

• Read uncommitted

As with many options, you have a trade-off between robustness and performance.

Serializable isolation is the ideal from a purity and robustness angle. With serializable isolation, reads and writes on the database should appear to be happening in a sequence, with changes from a write being completely recorded before the next read starts. Transactions will not always have to be performed in a noninterleaved sequence to achieve this appearance because many do not interfere with each other, but in cases in which there are clashes, they will. This locking and waiting, combined with the overhead of predicting which combinations of transactions will interfere, makes serializable isolation the slowest isolation mode. If you want to use this mode, this is the command to run:


set transaction isolation level serializable;


The default level for InnoDB is repeatable read. In this isolation mode, each transaction gets to work in an isolated version of the table where each row remains as it was when the transaction started. Reading a row is guaranteed to be repeatable. If you call


select * from account where number=1;


at the start of the transaction and perform the same query later in the transaction, you will get the same results both times. You can, however, get what are called phantom reads. It is possible that another transaction which commits before yours is adding new rows to the table. If you perform the same query with a condition twice, such as


select * from account where balance>1000;


it is possible that you will get new rows—phantom rows—the second time.

In practice you should very rarely see phantom reads from MySQL. InnoDB uses an algorithm called next key locking to solve the problem, as long as the column that your condition applies to is indexed. You probably already know that InnoDB has row-level locking. When a transaction uses a row, it locks that row so that the transaction can be isolated from others. As well as locking the rows used, next key locking also locks the gaps between rows found in the index. Because phantom reads are addressed in this way, few systems really need to be put in serialized isolation mode.

If you set the server to read committed, your transactions are no longer very isolated. If you perform a query and repeat it later in the same transaction, you will get different results the second time if another transaction has modified the data in the meantime and committed. Should you want to do this, the command is


set transaction isolation level read committed;


At the weakest isolation level, read uncommitted, it is distinctly arguable not only that your transactions are no longer isolated, consistent, and therefore ACID compliant, but that you no longer really have transactions. In this mode, it is possible for transactions to read changes that other transactions have made before the changes have been committed. This is called a dirty read. You would tolerate this only in fairly unusual circumstances, such as at a time when you know all active threads will be reading or writing, but not both. To enable read uncommitted mode, use this:


set transaction isolation level read uncommitted;


Table 10.1 summarizes the characteristics of each mode.

Table 10.1. Transaction Isolation Level Characteristics

Table 10.1. Transaction Isolation Level Characteristics

Summary

• A transaction is a related set of SQL queries treated as a single atomic unit. It can be either entirely committed or entirely rolled back.

• The standard SQL way to express a transaction is


start transaction;
# do work
commit;


but there are many equivalent ways to get the same effect.

ACID stands for Atomicity, Consistency, Isolation, and Durability. You should understand what these terms mean or other geeks will refuse to talk to you.

• In order from strongest to weakest, the transaction isolation levels are serializable, repeatable read, read committed, and read uncommitted. The InnoDB default is repeatable read.

Quiz

1.

With autocommit off, a transaction will be committed

a) when you call COMMIT

b) when you request a lock

c) both a) and b)

d) none of the above

2.

Atomicity means

a) either all of a transaction is performed or none of it is performed

b) operations transform the database from one consistent state to another

c) transactions do not interfere with each other

d) committed transactions should be permanent

3.

Isolation means

a) either all of a transaction is performed or none of it is performed

b) operations transform the database from one consistent state to another

c) transactions do not interfere with each other

d) committed transactions should be permanent

4.

Durability means

a) either all of a transaction is performed or none of it is performed

b) operations transform the database from one consistent state to another

c) transactions do not interfere with each other

d) committed transactions should be permanent

5.

In repeatable read mode

a) you may experience a dirty read

b) you may experience a nonrepeatable read

c) you may experience a phantom read

d) none of the above

Exercises

The MySQL/InnoDB manual contains many hints for improving the performance of your system when using InnoDB tables. Apply as many of these as appropriate to your system.

You can start by looking here:

www.innodb.com/ibman.html#InnoDBTuning

On a noncritical server, cause MySQL to stop while you are halfway through a transaction. You should not need to drop your hard drives from a great height or kick the power plug out of the wall. Killing the process should be sufficient. Examine the binary log and see what happens when you restart the server.

Answers

Quiz

1.

c

2.

a

3.

c

4.

d

5.

c

Next

In Chapter 11, “Managing User Privileges,” we will examine MySQL's advanced user privilege system. You have a great deal of choice about what power you give to users of the database. In this chapter you'll learn about these options and how to control them.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset