When you make a change to a PostgreSQL database, the PostgreSQL server records your changes in the shared-buffer pool, the write-ahead log (WAL), and eventually, in the file that holds the table that you've changed. The WAL stores a complete record of every change that you make. PostgreSQL's point-in-time recovery mechanism (PITR) uses the modification history stored in the WAL files to roll-forward changes made since the most recent cluster backup. You can think of PITR as an incremental backup scheme. You start with a complete backup and then, periodically, archive the changes. To recover from a crash, you restore the complete backup and then apply the changes, in sequence, until you've recovered all of the data that you want to restore.
Point-in-time recovery (PITR) can seem very intimidating if you start out reading the documentation (not that the documentation is bad, it just assaults you with a lot of detail before you get the big picture). To give you a broad overview of PostgreSQL's PITR, I'll cook up a new database, make it crash, and then recover the data using the PITR mechanism. You can follow along if you want, but you'll need some spare disk space.
I'll start by creating a new database cluster so I can experiment without harming any real data:
$ export PGDATA=/usr/local/pgPITR/data
$ initdb
The files belonging to this database system will be owned by user "pg".
This user must also own the server process.
...
Success. You can now start the database server using:

    postmaster -D /usr/local/pgPITR/data
or
    pg_ctl -D /usr/local/pgPITR/data -l logfile start
Next, I'll change the $PGDATA/postgresql.conf configuration file to enable PITR. The only change that I have to make is to define the archive_command parameter. archive_command tells PostgreSQL how to archive the WAL files generated by the PostgreSQL server. Since I don't really care about the data in this cluster, I'll just squirrel away the WAL files in the /tmp/wals directory:
archive_command = 'cp %p /tmp/wals/%f'
PostgreSQL will execute the archive_command instead of simply deleting the WAL files as it would normally do. The %p and %f macros expand to the complete pathname of a WAL file (%p) and the filename component of that file (%f).
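A bare cp is fine for this experiment, but a real archive_command should refuse to overwrite a segment that's already been archived and should never leave a half-written file in the archive. Here's a minimal sketch of a safer approach, written as a shell function; the function name and the /tmp/wals path are my own choices, not anything PostgreSQL requires. You'd invoke it from postgresql.conf as something like archive_command = '/usr/local/bin/archive_wal.sh %p %f' (a hypothetical script path):

```shell
#!/bin/sh
# Sketch of a safer archive_command. PostgreSQL passes %p (full path
# of the WAL segment) and %f (its file name) as the two arguments.
archive_wal() {
    wal_path="$1"    # expansion of %p
    wal_name="$2"    # expansion of %f
    archive_dir=/tmp/wals

    # Never overwrite an existing archive file; a non-zero exit tells
    # PostgreSQL the attempt failed, and it will retry the segment later.
    [ ! -e "$archive_dir/$wal_name" ] || return 1

    # Copy under a temporary name, then rename, so the archive never
    # contains a partially written segment.
    cp "$wal_path" "$archive_dir/$wal_name.tmp" || return 1
    mv "$archive_dir/$wal_name.tmp" "$archive_dir/$wal_name"
}
```

The important property is the exit status: PostgreSQL treats a non-zero exit as "not archived yet" and keeps the WAL file around, so a full archive disk or a duplicate file name can't silently lose data.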
Now I'll create the archive directory (/tmp/wals), start the postmaster, and create a database that I can work in:
$ mkdir /tmp/wals
$ pg_ctl -l /tmp/pg.log start
postmaster starting
$ createdb test
CREATE DATABASE
At this point, I have a fully functional PostgreSQL cluster, based in $PGDATA, with a sacrificial database named test. When I make changes to the database, those changes are recorded in the WAL files stored in $PGDATA/pg_xlog (just like any other PostgreSQL cluster). When a WAL file fills, PostgreSQL will copy the file into the /tmp/wals directory for safe-keeping (if this were a real live database, I would employ a much more secure archive method). The only difference between a regular PostgreSQL cluster and a PITR-enabled cluster is that PostgreSQL archives the WAL files instead of deleting them.
Because the PITR mechanism works by applying changes recorded in the WAL files, I'll generate some WAL data by creating some dummy tables. It really doesn't matter what I put in the dummy tables; they just have to generate enough WAL data to overflow a few WAL files (each WAL file is 16MB). Since the whole point of PITR is to recover to a specific point in time, I'll make note of the time that I COMMIT each transaction:
$ psql test
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
test=# BEGIN WORK;
BEGIN
test=# CREATE TABLE dummy1 AS SELECT * FROM pg_class, pg_attribute;
SELECT
test=# COMMIT;    -- executed at 12:30:00pm
COMMIT
test=# \q
The CREATE TABLE command produced a table that holds more than 245,000 records and produced enough WAL data to overflow a few segments. You can see the archived segments by looking in the /tmp/wals directory:
$ ls /tmp/wals
000000010000000000000000
000000010000000000000001
000000010000000000000002
000000010000000000000003
000000010000000000000004
Next, I'll create a complete backup of my database cluster. To simplify this example, I'll simply create a tarball and save it in the /tmp directory instead of writing the whole thing to tape. Before I start the backup, I'll tell PostgreSQL what I'm about to do by calling the pg_start_backup() function:
$ psql test
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
test=# select pg_start_backup( 'full backup - Monday' );
 pg_start_backup
-----------------
 0/52EA2B8
(1 row)

test=# \q
$ tar -zcvf /tmp/pgdata.tgz $PGDATA
tar: Removing leading '/' from member names
/usr/local/pgPITR/data/
/usr/local/pgPITR/data/pg_hba.conf
/usr/local/pgPITR/data/pg_subtrans/
...
The argument that you give to pg_start_backup() is simply a label that helps you remember where the archive came from—you'll find a file named $PGDATA/backup_label after calling this function and the label is stored inside of that file.
When the backup completes, I'll tell PostgreSQL that I'm finished by calling the pg_stop_backup() function:
$ psql test
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
test=# select pg_stop_backup();
 pg_stop_backup
----------------
 0/52EA2F4
(1 row)

test=# \q
At this point, I have a complete backup of the database cluster, PostgreSQL is still running (I did not have to stop the postmaster), any database changes are recorded in the WAL files, and the WAL files are still being archived in /tmp/wals. Pretend that I've locked the backup (/tmp/pgdata.tgz) in a safe place somewhere.
To generate a few more WAL files, I'll create another dummy table:
$ psql test
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
test=# BEGIN WORK;
BEGIN
test=# CREATE TABLE dummy2 AS SELECT * FROM pg_class, pg_attribute;
SELECT
test=# COMMIT;    -- executed at 12:38:00pm
COMMIT
Now, just to make things more interesting, I'll add a third dummy table and then drop it again. If I later recover all changes, you would expect that table to disappear:
test=# BEGIN WORK;
BEGIN
test=# CREATE TABLE dummy3 AS SELECT * FROM pg_class, pg_attribute;
SELECT
test=# COMMIT;    -- executed at 12:39:00pm
COMMIT
test=# BEGIN WORK;
BEGIN
test=# DROP TABLE dummy3;
DROP TABLE
test=# COMMIT;    -- executed at 12:40:00pm
COMMIT
test=# \q
As you would expect, PostgreSQL has copied a few more WAL files to /tmp/wals:
$ ls /tmp/wals
000000010000000000000000
000000010000000000000001
000000010000000000000002
000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000005.002EA2B8.backup
000000010000000000000006
000000010000000000000007
000000010000000000000008
000000010000000000000009
00000001000000000000000A
00000001000000000000000B
00000001000000000000000C
00000001000000000000000D
00000001000000000000000E
At this point, disaster strikes. My power goes out, an unexpected hurricane arrives, or my computer bursts into flames (and everything except my /tmp directory is destroyed—hey, we're just pretending). To simulate disaster, I'll kill off the postmaster:
$ kill -9 $(head -1 $PGDATA/postmaster.pid)
Now it's time to recover the entire cluster. I'll start by renaming the damaged cluster:
$ mv $PGDATA $PGDATA.old
Next, I'll restore the backup from its archive:
$ pushd / && tar -zxvf /tmp/pgdata.tgz && popd
That leaves me with the damaged cluster in $PGDATA.old and the backup cluster in $PGDATA. I'll clean up the (freshly restored) cluster by removing the old WAL files and the postmaster.pid file:
$ rm -f $PGDATA/pg_xlog/0* $ rm -f $PGDATA/postmaster.pid
To ensure that I can recover as much data as possible, I'll copy the WAL files from the damaged cluster into the restored cluster:
$ cp $PGDATA.old/pg_xlog/0* $PGDATA/pg_xlog/
If the damaged cluster is not available (maybe it burned up when my computer caught fire), I can still recover; I just can't recover as much. I can recover all transactions committed before the most recent WAL file was archived. In a busy database, that window might be only a few minutes. In a quiescent database, the interval is longer, but the amount of unarchived data is the same (16MB or so).
Now I can start the PostgreSQL recovery process, but before I do, here's another look at the timeline that I've followed:
12:30:00pm: I created the dummy1 table and committed the change
12:38:00pm: I created the dummy2 table and committed the change
12:39:00pm: I created the dummy3 table and committed the change
12:40:00pm: I dropped dummy3 and committed the change
Sometime between 12:30 and 12:38, I backed up the entire cluster. When I start the recovery process, I can recover all changes, or I can tell PostgreSQL to stop at a certain point in time. If the recovery stops before 12:38, I should find the dummy1 table, but not the dummy2 (or dummy3) table. If the recovery stops before 12:39, I should find dummy1 and dummy2 (but not dummy3). If the recovery stops before 12:40, I should find all three tables. If I let PostgreSQL recover all changes, the dummy3 table should disappear (because I dropped that table before the lights went out).
To control the recovery process, I'll create a file named $PGDATA/recovery.conf that tells PostgreSQL how to proceed. recovery.conf looks like this:
$ cat $PGDATA/recovery.conf
restore_command = 'cp /tmp/wals/%f %p'
recovery_target_time = '2005-06-22 12:39:01 EST'
The restore_command parameter tells PostgreSQL how to restore the WAL files that it tucked away in /tmp/wals (in a real-world environment, the restore_command might be a shell script that asks you to mount a specific tape). The recovery_target_time parameter tells PostgreSQL when to stop. If you want to recover all changes, simply omit the recovery_target_time parameter. In this case, I've asked PostgreSQL to stop the recovery process after it restores the 12:39pm change, so I expect to find dummy1, dummy2, and dummy3 in the database when recovery completes.
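Just as with archive_command, a real restore_command would likely be a script rather than a bare cp. Here's a minimal sketch as a shell function; the function name and the /tmp/wals archive path are my own. The one rule that matters is the exit status: restore_command must fail (exit non-zero) when asked for a segment that isn't in the archive, because that's how PostgreSQL knows it has reached the end of the archived WAL:

```shell
#!/bin/sh
# Sketch of a restore_command helper. PostgreSQL passes %f (the file
# name it wants) and %p (the path where it wants the file delivered).
restore_wal() {
    wal_name="$1"    # expansion of %f
    wal_path="$2"    # expansion of %p
    archive_dir=/tmp/wals

    # A missing segment is a normal condition at the end of recovery;
    # signal it with a non-zero exit rather than creating an empty file.
    [ -f "$archive_dir/$wal_name" ] || return 1

    cp "$archive_dir/$wal_name" "$wal_path"
}
```

In a tape-based setup, the body of the function would prompt an operator to mount the right tape before copying the segment into place.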
PostgreSQL starts the recovery process as soon as you start the postmaster (the postmaster knows it has extra work to do because it finds the $PGDATA/recovery.conf file):
$ pg_ctl -l /tmp/pg.log start
postmaster starting
If you're running on a Linux/Unix system (or you have the Cygwin tools installed), you can follow the postmaster's progress by watching the server's log file:
$ tail -f /tmp/pg.log
LOG: starting archive recovery
LOG: restore_command = "cp /tmp/wals/%f %p"
LOG: recovery_target_time = 2005-06-22 12:39:01-05
...
LOG: restored log file "000000010000000000000006" from archive
LOG: restored log file "000000010000000000000007" from archive
LOG: restored log file "000000010000000000000008" from archive
LOG: restored log file "000000010000000000000009" from archive
...
LOG: archive recovery complete
LOG: database system is ready
When the recovery process completes, PostgreSQL renames the recovery.conf file (to recovery.done) to avoid an accidental recovery the next time you start the postmaster.
I can now connect to the database and see that the dummy1, dummy2, and dummy3 tables are in place:
$ psql test
Welcome to psql 8.0.0, the PostgreSQL interactive terminal.
...
test=# \d
        List of relations
 Schema |  Name  | Type  | Owner
--------+--------+-------+-------
 public | dummy1 | table | pg
 public | dummy2 | table | pg
 public | dummy3 | table | pg
(3 rows)
If I had omitted the recovery_target_time parameter, the dummy3 table would disappear because I dropped that table (and, most importantly, committed the change) before my system crashed.
You can see that PITR is easy to configure (just define an archive_command in the postgresql.conf file). The process of actually recovering from a crash can be tricky, but as long as you follow the procedure I've outlined, you should be in good shape. I strongly encourage you to practice the whole process before you really need it. Just create a sacrificial cluster, configure PITR recovery, create some data, and fake a few crashes. It won't take much time and you'll be happy that you've learned the procedure if your disk drives start making funny noises.
To summarize the setup procedure:

1. Define an archive_command in $PGDATA/postgresql.conf and create the archive directory.
2. Call pg_start_backup() with a descriptive label.
3. Back up the entire $PGDATA directory (the postmaster can keep running).
4. Call pg_stop_backup() and store the backup somewhere safe.
To summarize the recovery procedure:

1. If the postmaster is still running, stop it, then move the damaged cluster out of the way.
2. Restore the most recent complete backup into $PGDATA.
3. Remove the restored postmaster.pid file and the stale WAL files in $PGDATA/pg_xlog.
4. If you still have the damaged cluster, copy its WAL files into $PGDATA/pg_xlog.
5. Create a $PGDATA/recovery.conf that defines a restore_command (and, optionally, a recovery target).
6. Start the postmaster and watch the recovery progress in the server's log file.
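Since the recovery steps are mechanical, they lend themselves to a rehearsal script. Here's a sketch that strings the steps in this article together as one shell function; the function name is my own invention, and the paths (and the command used to start the postmaster) are parameters so you can practice against a scratch directory:

```shell
#!/bin/sh
# Sketch: the recovery procedure from this article as one function.
# Arguments: the cluster directory, the base-backup tarball (created
# relative to /, as tar did above), and the command that starts the
# postmaster. Not a production tool; just a way to rehearse the steps.
restore_cluster() {
    pgdata="$1"      # cluster directory to rebuild
    backup="$2"      # tarball made between pg_start/stop_backup
    start_cmd="$3"   # e.g. "pg_ctl -l /tmp/pg.log start"

    # 1. Set the damaged cluster aside.
    mv "$pgdata" "$pgdata.old" || return 1

    # 2. Restore the base backup (tar stripped the leading /).
    ( cd / && tar -zxf "$backup" ) || return 1

    # 3. Remove stale WAL segments and the old lock file.
    rm -f "$pgdata"/pg_xlog/0* "$pgdata"/postmaster.pid

    # 4. Salvage the newest WAL from the damaged cluster, if present.
    cp "$pgdata.old"/pg_xlog/0* "$pgdata"/pg_xlog/ 2>/dev/null || true

    # 5. Tell the postmaster how (and how far) to recover.
    cat > "$pgdata"/recovery.conf <<'EOF'
restore_command = 'cp /tmp/wals/%f %p'
recovery_target_time = '2005-06-22 12:39:01 EST'
EOF

    # 6. Start the postmaster; it finds recovery.conf and recovers.
    $start_cmd
}
```

Practicing with a script like this (against a sacrificial cluster) is a good way to make sure every step works before a real disaster forces you to improvise.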
The most important part of the PITR scheme is the archive_command. The PostgreSQL reference documentation (Chapter 22) provides a few guidelines regarding the archive_command (and the restore_command). You'll also find a detailed explanation of the parameters that you can include in the recovery.conf file. Be sure to read that chapter before you try to configure a real-world server that contains important data.