Chapter 5. Temporal data management and analytics in an operational warehouse 175
Before looking into the technology, though, we first examine the concepts
involved and then consider several example use cases.
5.1.1 Temporal use cases for the operational data warehouse
There are numerous use cases in an operational data warehousing environment
where temporal awareness is useful or even required to manage and query the
data properly. We consider several here that illustrate key aspects of temporal
analysis.
Time-based business results analysis
One of the most common and most useful activities that can be performed in any
data warehouse environment is to report on some business results or metrics for
some specific point in time or to make a comparison between time periods. For
example, you might want to examine revenue totals for a product line for the
previous quarter and to compare those with results from the previous quarter or
the same quarter last year. Another example is the comparison of customer
churn rates before and after a special promotion or retention campaign. A third
example is reporting on trends of call failure rates in a telecommunication
network over the previous six months. There are any number of well-known
examples like these.
The common theme for all these examples is that the data warehouse solution
requires some means of capturing and tracking when a relevant business event
occurs. In many cases this is the date and time that a transaction took place, for
example, a withdrawal was made at a bank ATM; a shipment delivery was
confirmed at the destination; an insurance claim was paid to a provider; a user
clicked on a particular URL link on a website. Simply knowing when a
business-relevant event took place allows the data warehouse solution to easily
support the kinds of temporally-based analyses described here. Indeed, data
warehouses have been supporting these use cases for many years.
This use case gets more interesting, though, when you consider what happens
when information changes over time. For example, how does the system handle
changes in the dimensional information?
Changes in dimension attributes
Dimensions and dimension tables are used universally in data warehousing to
manage all of the contextual information “about” the business metrics. Common
dimensions include customer, product, sales organization, supplier, and so on.
For each dimension, several (or perhaps many) attributes are maintained for
each member of the dimension. For example, within a customer dimension, it is
common to find attributes that describe characteristics of each customer: their
176 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
home and business address, phone numbers, education level, marital status,
income bracket, employment status, credit rating, buying preferences, and so on.
A temporal-aware data warehouse platform requires a mechanism to handle
changes to these attributes over time.
Consider what happens when an existing customer in the USA relocates to a
new business address in a different state and continues to do business with your
organization. Over time your data warehouse will contain history of transactions
for this customer at both the old and the new address. This will cause an update
to be made to the customer’s home address attributes in the customer dimension
table (or tables). But what about the old address? What about the transactions
that occurred at the old address?
The impact of this change of address will vary based on the nature and scope of
queries that involve this customer. For example, if we report on sales revenue
aggregated by product line, or by quarter for the past two years, the change in
the customer’s business address will have no impact on the results. All of the
customer’s transactions will be summarized, both those before and those after
the relocation, up to the quarter or to the product line level.
However, if the report shows aggregated sales revenue geographically by state,
then you expect that the transactions at the old business address will appear
under one state summary, and the transactions that occurred after the relocation
will appear in a second state summary. That is, the customer’s transaction data
is now split, since the relocation.
Changes to dimensional attributes such as this are not uncommon and can
involve multiple attributes for the same dimension member at different points in
time. A robust data warehouse solution must provide some means to represent
and manage these types of changes over time. Queries for reports and
applications must be written in such a way to ensure that data that is split in time
across changed attributes returns accurate results.
One common way this type of change is handled is through a modeling and
implementation technique known as
slowly changing dimensions that was first
put forth and popularized by Ralph Kimball.
1
This technique is widely accepted
as the standard approach to managing these types of changes, but in the
following sections we describe how this can be handled in a new way with
InfoSphere Warehouse 10.
Changes in dimension hierarchy
Another class of changes that must be managed in the dimensional structure
over time is the hierarchy. A useful example of this in an operational retail
industry scenario can be found in product hierarchies.
1
Ralph Kimball et al.The Data Warehouse Lifecycle Toolkit, John Wiley & Sons, 2008
Chapter 5. Temporal data management and analytics in an operational warehouse 177
In a typical retail enterprise, individual products are organized in product lines
that might then also aggregate up to product categories and brands. Product
lines provide convenient summarizations of products for reporting and analysis
purposes, and for support of manufacturing operations, marketing campaigns,
and other areas. Products can be realigned from one product line or category to
another at any time for any number of business reasons to support increased
sales, more efficient distribution, manufacturing realignment, and so on.
Commonly such product realignment occurs frequently, and operational data
warehouse solutions must take these product realignments into account,
especially when reporting is mixed with historical data.
For example, consider a manufacturer and distributor of sports shoes. Assume
they market shoes for specific sports, including a product line of basketball
shoes. They might sell shoes in this product line for some period of time. Then
one day the company signs a big endorsement agreement with a major
professional basketball star. Having a number of shoes for different sports with
similar promotional deal, our vendor decides to manage these products under a
distinct product line called Signature Series. So from that day onward, they are
selling the exact same shoe, only now it is tracked under the Signature Series
product line instead of the Basketball product line.
This is a similar situation to the changing dimensional attributes discussed, but it
is different in that we are now talking about aggregation paths and parent and
child relationships in the dimensional hierarchy. We are changing the logical
structure of the product dimension.
Now, when the shoe company runs reports on sales of shoes by product line,
there are different ways that this product realignment can be handled. If we query
for sales of shoes in the Signature Series for the past 24 hours, then our
aggregate will include the new addition of the basketball shoe. But what about
queries that span back in time to include a period prior to the product
realignment?
For example, if we were to compare aggregate sales of Signature Series shoes
in the past 24 hours with daily aggregate daily sales over the past 14 days, how
can we handle the fact that the Signature Series did not always include the
basketball shoe?
The answer depends on what insights we are trying to learn from the query
results. If we want to see how the Signature Series is performing relative to the
exact same product mix over the past 14 days, then the basketball shoe will be
included in the aggregate results for each daily summarization as through it had
always been a part of that product line.
Alternatively, if we are more interested in seeing the impact that the addition of
the new basketball shoe has had on the Signature Series results, then we want
178 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
each daily aggregate to include only products that were truly in the Signature
Series for each particular day in the report. In this second scenario, the product
mix in the Signature Series might be different for each daily aggregate. There are
subtle and not-so-subtle nuances in the nature of each of these example
business queries that have to be handled by either the data warehouse platform
itself or coded into the business application logic for each application.
Information with effective start and end dates
It is quite common for activities and information of an enterprise to have validity
dates or effective start and end dates (or date stamps). In fact, all of the example
use cases given so far are actually specific examples of this general use case.
Even when no effective begin and end time or date is stated, it is implicitly
understood to being “now” and to extend “indefinitely” into the future. Many other
examples fall into this category of temporal awareness, such as:
???? Effective dates for an insurance policy
???? Start and end dates for a marketing promotion
???? Effective start and end dates for a lease agreement or any legal contract
???? Grace period for a credit card monthly payment
???? Period of time that a technology patent is in force
???? Effective start date for a new salary or employment position
???? Start and end dates for a tax filing or payment period
Many other examples can be listed, and they all have applicability across data
warehousing, including operational warehousing. In an operational scenario, you
might have to respond to a new sales promotion, review current account
conditions or take action prior to a deadline. Comparison of current conditions
must be made with previous offers or year-to-date summarizations.
What each of these examples has in common is the need for the system to
understand the business logic of the effective start and end dates (or dates and
times) and what values are in force during that time. The policies of the business
or organization will dictate what activities, operations, values, or transactions are
possible or valid within the effective start-end date range.
Handling changes in the database context
A completely different category of temporal awareness deals with changes to the
data itself in the context of the database or data warehouse system. For these
use cases, it is of most interest to note the following two items:
???? Some piece of data changed within a data record
???? The time stamp of when that change occurred
Further, it is useful to track the history of changes to data records.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset