Data difference

The easiest way to create a database abstraction interface is to use views to access the data. In this case, if one wants to change the table structure, it can be done without changing the code of the external applications. The only thing necessary is to update the definitions of the interface views. Moreover, if one would, for example, remove a column from a table that is used in a view, PostgreSQL will not allow this. This way, the objects in the database that are used by applications will be protected.

Nevertheless, if the database structure was changed and the view definitions were updated accordingly, it is important to check whether the new views return the same data as the old.

Sometimes it is possible to implement the new view in the same database. In this case one just needs to create a copy of the production database in the test environment, or prepare a test database containing all possible combinations of the attributes of the business entities. Then the new version of the view can be deployed with a different name. The following query can then be used to see if the new view returns the same data:

WITH 
n AS (SELECT * FROM new_view),
o AS (SELECT * FROM old_view)
SELECT 'new', * FROM (SELECT * FROM n EXCEPT ALL SELECT * FROM o) a
UNION ALL
SELECT 'old', * FROM (SELECT * FROM o EXCEPT ALL SELECT * FROM n) b;

Here, the names, new_view and old_view, refer to the names of the respective relations. The query returns no rows if both views return the same result.

However, this works only when both views are in the same database, and the old view works as it worked before the refactoring. In case the structure of underlying tables changes, the old view cannot work as it did before and comparison is not applicable. This problem can be solved by creating a temporary table from the data returned by the old view before refactoring, and then comparing that temporary table with the new view.

This can also be done by comparing the data from different databases; the old one before refactoring and the new one. One can use external tools to do so. For example, data from both the databases can be dumped into files using psql, and then these files can be compared using diff (this will work only if the rows have the same order). There are also some commercial tools that provide this functionality.

Another approach is connecting two databases, making queries, and making the comparison inside the database. This might seem complicated, but in fact, it is the fastest and most reliable way. There are a couple of methods to connect two databases: through the extensions, dblink (database link) or postgres_fdw (foreign data wrapper).

Using the dblink extension may seem easier than using postgres_fdw, and it allows performing different queries for different objects. However, this technology is older, and uses a syntax that is not standard-compliant and has performance issues, especially when big tables or views are queried.

On the other hand, postgres_fdw requires creating an object in the local database for each object in the remote database that is going to be accessed, which is not that convenient. However, this makes it easy to use the remote tables together with the local tables in queries, and it is faster.

In the example in the previous section, another database was created from the original database car_portal, and another field was added to the tablecar_portal_app.car.

Let's try to find out if that operation caused changes in the data. The following would need to be done:

  1. Connect to the new database as a super user:
user@host:~$ psql -h localhost -U postgres car_portal_new
psql (10beta3)
Type "help" for help.
car_portal_new=#
  1. Then create an extension for the foreign data wrapper. The binaries for the extension are included in the PostgreSQL server package:
car_portal_new=# CREATE EXTENSION postgres_fdw;
CREATE EXTENSION
  1. Once the extension has been created, create a server object and a user mapping:
car_portal_new=# CREATE SERVER car_portal_original FOREIGN DATA WRAPPER postgres_fdw 
OPTIONS (host 'localhost', dbname 'car_portal');
CREATE SERVER
car_portal_new=# CREATE USER MAPPING FOR CURRENT_USER SERVER car_portal_original;
CREATE USER MAPPING
  1. Create a foreign table and check whether we can query it:
car_portal_new=# CREATE FOREIGN TABLE car_portal_app.car_orignal (car_id int, number_of_owners int, 
registration_number text, manufacture_year int, number_of_doors int, car_model_id int, mileage int)
SERVER car_portal_original OPTIONS (table_name 'car');
CREATE FOREIGN TABLE
car_portal_new=# SELECT car_id FROM car_portal_app.car_orignal limit 1;
car_id
--------
1
(1 row)

Now the table that is in fact in a different database can be queried as if it was a normal table. It can be used in joins, filtering, grouping, everything that you would do in SQL will work.

Now the table is ready and it can be queried. To compare the data, the same query can be used as in the example we just saw for the old and new views:

car_portal_new=# WITH n AS (
SELECT car_id, number_of_owners, registration_number, manufacture_year, number_of_doors,
car_model_id, mileage
FROM car_portal_app.car),
o AS (SELECT * FROM car_portal_app.car_orignal)
SELECT 'new', * FROM (SELECT * FROM n EXCEPT ALL SELECT * FROM o) a
UNION ALL
SELECT 'old', * FROM (SELECT * FROM o EXCEPT ALL SELECT * FROM n) b;
?column? | car_id | number_of_owners | registration_number | manufacture_year | number_of_doors | car_model_id | mileage
----------+--------+------------------+---------------------+------------------+-----------------+
--------------+---------
(0 rows)

The result is empty. This means that the data in both tables is the same.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset