Schema evolution

When working with an SQL database, we have to address the problem of schema evolution. Our objects have a dynamic state and a (relatively) static class definition. We can easily persist the dynamic state of an object. Our class defines the schema for the persistent data. The ORM provides mappings from our class to an SQL implementation.

When we change a class definition, how can we fetch objects from the database? If the database must change, how do we upgrade the Python mappings and still access the data? A good design often involves some combination of several techniques.

Changes to the methods and properties of a Python class do not change the mapping to SQL rows. These can be termed minor changes, since the tables in the database remain compatible with the changed class definition. A new software release can have a new minor version number.

Some changes to Python class attributes will not change the persisted object state, either. Adding an index, for example, doesn't change the underlying table. SQL can be somewhat flexible when converting data types from the database to Python objects, and an ORM layer can add further flexibility. In some cases, we can make class or database changes and call the result a minor version update because the existing SQL schema still works with the new class definitions.
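As a minimal sketch of such minor changes, assume a hypothetical Blog class mapped with SQLAlchemy's declarative system; adding a computed property and adding an index both change the class and the database, but leave the persisted rows compatible:

    from sqlalchemy import Column, Index, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class Blog(Base):
        __tablename__ = "blog"

        id = Column(Integer, primary_key=True)
        title = Column(String(200))

        @property
        def title_words(self):
            # New behavior only: the stored row is unchanged.
            return self.title.split()

    # An added index changes the database, but not the persisted rows.
    blog_title_index = Index("ix_blog_title", Blog.title)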

Other changes to the SQL table definitions will require modifying the persisted objects. These can be called major changes: the existing database rows will no longer be compatible with the new class definition. Such changes should not be made by modifying the original Python class definitions. Instead, they should be made by defining a new subclass and providing an updated factory function to create instances of either the old or the new class.
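Here is a minimal sketch of the subclass-and-factory approach; the Blog_V1 and Blog_V2 classes and the schema_version parameter are hypothetical:

    class Blog_V1:
        """Original class, mapped to the old schema."""
        def __init__(self, title):
            self.title = title

    class Blog_V2(Blog_V1):
        """Revised class; requires a new table definition."""
        def __init__(self, title, subtitle=""):
            super().__init__(title)
            self.subtitle = subtitle

    def blog_factory(schema_version, **fields):
        """Create an instance appropriate to the schema in use."""
        if schema_version == 1:
            return Blog_V1(**fields)
        return Blog_V2(**fields)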

Tools such as Alembic (see https://pypi.org/project/alembic/) and Migrate (https://sqlalchemy-migrate.readthedocs.io/en/latest/) can help manage schema evolution. A disciplined history of schema migration steps is often essential for properly converting from old data to new data. 

There are two kinds of techniques for transforming the schema from one version to the next. Any given application will use some combination of them, as follows:

  • SQL ALTER statements modify a table in place. A database imposes a number of constraints and restrictions on what an ALTER statement can change; this technique generally covers minor changes.
  • Creating new tables and dropping old tables. Often, a schema change requires building new versions of tables populated with data from the old tables. For a large database, this can be a time-consuming operation. For some kinds of structural changes, it's unavoidable. Both techniques are sketched after this list.
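The following is a minimal sketch of both techniques, assuming a hypothetical blog table in a SQLite database; the column names and the specific changes are illustrative only:

    import sqlite3

    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE blog (id INTEGER PRIMARY KEY, title TEXT)"
    )

    # Technique 1: ALTER modifies the table in place. SQLite, for example,
    # restricts ALTER to operations such as adding or renaming columns.
    connection.execute("ALTER TABLE blog ADD COLUMN subtitle TEXT")

    # Technique 2: create a new table, copy the data across, and drop the
    # old table. This handles structural changes that ALTER can't express,
    # such as adding a NOT NULL constraint to an existing column.
    connection.execute(
        "CREATE TABLE blog_2 ("
        "id INTEGER PRIMARY KEY, title TEXT NOT NULL, subtitle TEXT)"
    )
    connection.execute(
        "INSERT INTO blog_2 (id, title, subtitle) "
        "SELECT id, COALESCE(title, ''), subtitle FROM blog"
    )
    connection.execute("DROP TABLE blog")
    connection.execute("ALTER TABLE blog_2 RENAME TO blog")
    connection.commit()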

SQL database schema changes typically involve running a one-time conversion script. This script uses the old schema to query the existing data, transforms it into the new form, and uses the new schema to insert the data into the database. Of course, this must be tested on a backup database before being run on the user's live operational database. Once the schema change has been accomplished, the old schema can be safely ignored and later dropped to free up storage.
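As a sketch of such a conversion script, assume hypothetical old_blog.db and new_blog.db SQLite files, where the new database already has the revised blog table; the transformation shown (splitting a combined title into a title and a subtitle) is purely illustrative:

    import sqlite3

    old = sqlite3.connect("old_blog.db")  # hypothetical old-schema database
    new = sqlite3.connect("new_blog.db")  # hypothetical new-schema database

    # Query with the old schema, transform, insert with the new schema.
    for row_id, title in old.execute("SELECT id, title FROM blog"):
        title_part, _, subtitle = title.partition(": ")
        new.execute(
            "INSERT INTO blog (id, title, subtitle) VALUES (?, ?, ?)",
            (row_id, title_part, subtitle),
        )

    new.commit()
    old.close()
    new.close()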

Tools such as Alembic require two conversion steps for each migration. The upgrade step moves the schema forward to the new state; the downgrade step moves the schema back to the previous state. Debugging a migration may involve doing an upgrade, finding a problem, doing a downgrade, and using the previous software until the problem can be sorted out.
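In Alembic, both steps live in a single generated migration script, as an upgrade() function and a matching downgrade() function. The following skeleton is a sketch only; the revision identifiers are placeholders, and the blog table and subtitle column are hypothetical:

    from alembic import op
    import sqlalchemy as sa

    # Revision identifiers; Alembic generates real values.
    revision = "000000000001"
    down_revision = None

    def upgrade():
        # Move the schema forward: add the new column.
        op.add_column("blog", sa.Column("subtitle", sa.String(200)))

    def downgrade():
        # Move the schema back: remove the column added by upgrade().
        op.drop_column("blog", "subtitle")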
