Transaction isolation and multiversion concurrency control

There is a fundamental concept is called transaction in relational databases. Transaction is a unit of data change. PostgreSQL implements a set of principles related to transaction processing, which is called ACID:

  • Atomicity: Every transaction can be fixed (saved or committed) only as a whole. If one part of a transaction fails, the whole transaction is aborted and no changes are saved at all.
  • Consistency: All data integrity constraints are checked and all triggers are processed before a transaction is fixed. If any constraint is violated, then the transaction cannot be fixed.
  • Isolation: A database system manages concurrent access to the data objects by parallel transactions in a way that guarantees that the results of their execution is the same as if they were executed serially, one after the other.
  • Durability: If a transaction is fixed, the changes it has made to the data are written to the non-volatile memory (disk). So, even if the system crashes or is powered-off immediately after that, the result of the transaction will persist in the database.

Simply speaking, transactions are a mechanism that allows the database user to save or discard any changes he has made in the data. To start a transaction, one should use the SQL command BEGIN and COMMIT to save the result of one's work. To abort a transaction without saving, the user can use ROLLBACK. A transaction can include only one statement or it can include many of them. The size of the transaction has no defined limit, they can be very big. If no transaction is started explicitly, PostgreSQL will automatically run each single SQL statement within its own transaction. For example, if an UPDATE statement has updated several rows and then fails to update another one, then all the rows that have been updated already will be returned back to their original versions as if the UPDATE was never executed.

PostgreSQL implements a special mechanism that maintains data consistency and transaction isolation. It is called multiversion concurrency control (MVCC). In short, it means that each transaction works with a consistent snapshot of data as it was some time ago, without taking into account the changes made by other transactions that are not finished yet.

When it comes to accessing the same data object by several concurrent transactions, the way they interact with each other is determined by the level of transaction isolation. There are four levels of transaction isolation defined in SQL standard: Read uncommitted, Read committed, Repeatable read, and Serializable. The difference between them is the phenomena that they prevent from appearing. Those phenomena are:

  • Dirty read: A transaction can read data that was written by another transaction, which is not committed yet.
  • Nonrepeatable read: When rereading data, a transaction can find that the data has been modified by another transaction that has just committed. The same query executed twice can return different values for the same rows.
  • Phantom read: This is similar to nonrepeatable read, but it is related to new data, created by another transaction. The same query executed twice can return different numbers of records.

The following table shows which phenomena are not allowed by the SQL standard at different isolation levels:

Isolation level

Dirty read

Nonrepeatable read

Phantom read

Read uncommitted

   

Read committed

Not allowed

  

Repeatable read

Not allowed

Not allowed

 

Serializable

Not allowed

Not allowed

Not allowed

PostgreSQL does not explicitly support the Read uncommitted isolation level. Furthermore, Phantom reads do not actually appear at the Repeatable read isolation level, which makes it quite similar to Serializable. The actual difference between Repeatable read and Serializable is that Serializable guarantees that the result of the concurrent transactions will be exactly the same as if they were executed serially one after another, which is not always true for Repeatable read.

The isolation level for the transaction can be set when the transaction is started. This command: BEGIN ISOLATION LEVEL REPEATABLE READ; will start a transaction with a Repeatable read isolation level. The default isolation level in PostgreSQL is Read committed.

Transactions will block each other when they try to access the same data, and the result of the statement will depend on whether the concurrent transaction commits or aborts. For example, suppose there are two transactions, A and B. Transaction A has updated a record, increasing the value of a field by one. Transaction B tries to do the same with the same record. But the result of the update of transaction B will depend on what transaction A will do further. If transaction A commits, then B should increment a new value. If transaction A aborts, then B should use the old value. So transaction B will wait until transaction A is finished.

It is possible that this blocking will lead to a situation of deadlock. It can happen if the transaction A, in turn, tries to update another record which has been previously updated by transaction B. In that case, A will wait for B while it is waiting for A. PostgreSQL detects deadlocks and automatically terminates the transactions causing those situations.

Another more complicated issue related to the concurrent access is serialization errors. Again, suppose there are two transactions that run at the Serializable isolation level. Transaction A updates a record, increasing the value of a field by 1. Transaction B tries to do the same with the same record. Again, it will be blocked until transaction A is finished. If A is aborted, then B will continue. But if A is committed, it will not allow B to continue. That is because both A and B work with the snapshots of data at the moment of the start of the transactions. Both of them will update a field from its original state and set it to the same new value. But the result of these operations will be different from what could have been if the transactions were executed one after another. In this situation, transaction B will fail with an error "could not serialize access".

To block transactions and monitor serialization anomalies, PostgreSQL uses the mechanism of locks. Locks are special objects in the database that are obtained by transactions and associated with data objects: rows or tables. A data object cannot be locked by more than one transaction. So, when a transaction wants to update a record, it will first lock it. If another transaction tries to update the same record, it will also try to lock it but since it is already locked, the transaction will wait. Locks are created automatically by the database when it processes SQL statements and released when the transaction is finished.

Database applications can manage concurrent access to shared resources by obtaining locks explicitly. To lock a record or several records, an application should use the SELECT statement with the locking clause. The locking clause, that is, FOR UPDATE is added to the very end of the statement. This will not only return the records as any other SELECT statement, but also make PostgreSQL lock those records against concurrent updates or deletions until the current transaction is finished:

SELECT * FROM car_portal_app.car WHERE car_model_id = 5 FOR UPDATE;

This will lock all the records in the table car that satisfy the condition. If another instance of the application tries to execute the same query, it will be blocked until the transaction is committed or aborted. If an application does not want to wait, the locking clause can be supplemented by the NOWAIT keyword. In that case, if a lock cannot be acquired, then the statement will immediately report an error.

It is also possible to explicitly create table-level locks by using the SQL LOCK command.

The third option is using the so-called advisory locks. They are not related to a data object, but are associated with a certain key value specified by the application. They are created and released by the application, and can be used when, for some reason, the default MVCC model is not suitable to manage concurrent access.

Detailed information about concurrency control is available in the documentation at http://www.postgresql.org/docs/current/static/mvcc.html.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset