Chapter 5. Temporal data management and analytics in an operational warehouse 231
5.3.2 System-period temporal tables and history table storage
Although the benefits of using system-period temporal tables in InfoSphere
Warehouse 10.1 have been thoroughly discussed at this point in the chapter, you
also must consider the trade-offs. The introduction and management of history
tables for every system-period temporal tables carries with it the performance
and storage overhead associated with managing that history table. In this
section, we consider these trade-offs and discuss various best practices to use
when selecting and designing for system-period temporal tables.
Selecting tables that require history
When selecting tables that require history, the first step is to consider the
implications of introducing system-period temporal tables and history tables and
make a careful selection of which tables truly require history. Avoid introducing
the history logic if there is no legitimate business justification that outweighs the
overhead.
Consider the following
7
resource implications for history tables:
???? History tables consume storage. Although mitigated or reduced by use of
DB2 compression, history tables will increase storage consumption by
retaining a complete copy of every changed row in the base table.
???? INSERT operations into the history table are subject to logging and impact
logging I/O bandwidth.
???? INSERTs into the history table use buffer pool resources and increase the
need for page cleaning
???? History tables usually have at least one index that carries its own storage and
maintenance overhead.
???? History tables impact elapsed time and resource requirements for database
backups.
The scale of impact of the resource utilization areas depends on several factors,
most important among them:
???? Frequency of UPDATE and DELETE operations on your base table
In traditional data warehouse environments, the primary table operation other
than SELECT is INSERT; most rows are inserted and left alone until removal
to archive or deletion. In these environments, the ratio of history records to
base table records will be quite low.
However, with the increasing emergence of operational data warehousing for
which InfoSphere Warehouse 10.1 is so well suited, we expect a much higher
7
Source: Matthias Nicola, “Best practices: Temporal data management with DB2”, IBM Corp., April
2012.
232 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
frequency of UPDATE and DELETE activities, and a high ratio of history to
base table rows. Thus, in the more active optional warehouse environments,
we expect to see history tables proportionally much larger than traditional
data warehouse environments.
???? Average number of rows affected by UPDATE and DELETE operations
This factor goes along with the previous point, but it also speaks more to data
model and application complexity. In the operational warehouse environment,
the more sophisticated business and client self-service applications and
functions tend to impact more tables in the schema and thus can compound
the impact on history table resource consumption for temporal tables.
???? Number of indexes defined on the history table and number of columns in
those indexes
This is again related to the previous point. The more activity on the history
table and the more complex the operations, then the more dependency on
indexes and the more resources they require.
Given these implications, business need and requirements determine where
history tracking and temporal data management is applied. Although it can
theoretically be applied everywhere, use it only where business necessity
dictates. Having made that determination, you can use the expected number of
history inserts (based on workload characteristics) to estimate the size of history
tables. Additional workload and business application analysis can help estimate
the number and complexity of indexes on the history tables required to support
the business requirements.
Vertical table splitting to reduce history table resources
8
Building on the this discussion, we understand that the use of history tables and
system-period temporal features introduces resource overhead to manage the
history tables. Because we consider that DB2 will write the
entire before-change
row contents from the base table into the history table, we can foresee that even
though an update on the base table might only change a single column such as a
2-character field, the history table requires space to store the entire row. With
large row sizes, which is common in operational data warehouse
implementations, we can see that there are fixed overhead costs for each row of
the history table even if only a small proportion of the row is actually changed.
Important: Business need should be the primary driver determining the need
for history tracking.
8
Source: Matthias Nicola, “Best practices: Temporal data management on DB2”, IBM Corp., April
2012.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset