176 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
home and business address, phone numbers, education level, marital status,
income bracket, employment status, credit rating, buying preferences, and so on.
A temporal-aware data warehouse platform requires a mechanism to handle
changes to these attributes over time.
Consider what happens when an existing customer in the USA relocates to a
new business address in a different state and continues to do business with your
organization. Over time your data warehouse will contain history of transactions
for this customer at both the old and the new address. This will cause an update
to be made to the customer’s home address attributes in the customer dimension
table (or tables). But what about the old address? What about the transactions
that occurred at the old address?
The impact of this change of address will vary based on the nature and scope of
queries that involve this customer. For example, if we report on sales revenue
aggregated by product line, or by quarter for the past two years, the change in
the customer’s business address will have no impact on the results. All of the
customer’s transactions will be summarized, both those before and those after
the relocation, up to the quarter or to the product line level.
However, if the report shows aggregated sales revenue geographically by state,
then you expect that the transactions at the old business address will appear
under one state summary, and the transactions that occurred after the relocation
will appear in a second state summary. That is, the customer’s transaction data
is now split, since the relocation.
Changes to dimensional attributes such as this are not uncommon and can
involve multiple attributes for the same dimension member at different points in
time. A robust data warehouse solution must provide some means to represent
and manage these types of changes over time. Queries for reports and
applications must be written in such a way to ensure that data that is split in time
across changed attributes returns accurate results.
One common way this type of change is handled is through a modeling and
implementation technique known as
slowly changing dimensions that was first
put forth and popularized by Ralph Kimball.
1
This technique is widely accepted
as the standard approach to managing these types of changes, but in the
following sections we describe how this can be handled in a new way with
InfoSphere Warehouse 10.
Changes in dimension hierarchy
Another class of changes that must be managed in the dimensional structure
over time is the hierarchy. A useful example of this in an operational retail
industry scenario can be found in product hierarchies.
1
Ralph Kimball et al.The Data Warehouse Lifecycle Toolkit, John Wiley & Sons, 2008