Chapter 9. Managing data lifecyle with InfoSphere Warehouse 339
???? DB2 Workload Manager
DB2 Workload Manager (DB2 WLM) can be configured and used in
conjunction with storage group data tags to manage the resources assigned
to workloads that access data in a multi-temperature database.
For example, by creating a tag on a storage group, DB2 WLM can be
configured to assign a workload to a certain service subclass depending on
the storage group accessed, and then effect the resources and concurrency
rules applied to that workload. Using this approach you can change the
workload priority of a query based on the storage groups it touches. This
gives the administrator more control over queries that might access both hot
and cold data.
???? Temporal data
Creating a temporal table allows you to pass the processing task of
maintaining dimensional history over to the database engine. Deploying
temporal features for dimension tables further enhances your ability to
manage the data lifecycle, because temporal history data can be managed
separately from active data.
A change to a data row in a temporal table generates data in the associated
temporal history table. The temporal history table can be physically
implemented independently of the parent table. This means that it can be
partitioned, stored, and maintained (pruned) separately based on your
requirements.
For example, although a parent table might be range partitioned, a history
table with a smaller volume of rows does not have to be range partitioned,
correlating the maintenance effort with the table volume. Similarly, data can
be archived from the history table independently of the parent table.
???? Federated database
In situations where you need to archive aged or inactive data out of the
primary production database but retain data availability, you can migrate the
data to a federated database from where it can be referenced.
9.2.1 Using multi-temperatures features in a sample scenario
This section describes how to implement the concepts discussed to effectively
manage the cost of storage. The example uses a standard FACT table, which is
typically the largest table in a data warehouse.
The target environment has three distinct types of storage available to the
database which represent distinct cost groups for storage. The object of the
exercise as discussed in this chapter is to place data for the fact table on the
appropriate storage based on age. Figure 9-1 on page 340 shows the target
340 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
environment where SSD, SAS, and SATA storage is utilized in placing data for a
single fact table across multiple storage tiers.
The approach to the data lifecycle is through the implementation of storage
group, table partitioning, and table space objects for a sales transactions fact
table. Each data partition and table space is designed to hold data for a calendar
quarter. Data for the current quarter is always maintained on the fastest storage.
As data ages, the table space associated with the data partition is moved to
warm storage and then cold storage.
Figure 9-1 Physical implementation of a fact table to support multi-temperature storage through the data
lifecycle
The following steps show how this environment was created, using the example
of the sales transactions table CUSTOMER_TXN as shown in Figure 9-1. Each
row in the table will be placed in a data partition, table space and storage group
based on the transaction date.
1. Create a storage group.
A storage group is characterized by having one or more storage paths and
performance statistics for the DB2 optimizer to reference. Using IBM Data
sg_coldsg_hot
SSD RAID Array FC/SAS RAID Array SATA RAID Array
2011Q1 2010Q4 2010Q3 2010Q2 2010Q1 2009Q4 2007Q3
sg_warm
TbSpc14
TbSpc13 TbSpc12 TbSpc11 TbSpc10 TbSpc9 TbSpc1
...
...
Data partition
Table: Sales
Range
Partitions
Table
Spaces
Storage
Groups
Legend
...
Chapter 9. Managing data lifecyle with InfoSphere Warehouse 341
Studio, a separate storage group for hot, warm, and cold storage was created
where the storage paths were already created on the target storage
subsystems.
Each storage group is assigned an OVERHEAD and DEVICE READ RATE
value. These values help the DB2 optimizer understand the speed of the
underlying storage when compiling an access plan. The values for
OVERHEAD and DEVICE_READ_RATE can reflect the actual storage
capability, or you can alter these to reflect the temperature of data that you
intend to place on that storage.
For example, in an environment where all storage devices are the same but
where you still want DB2 to compile different access plans based on the
temperature of data, you can create different storage groups and assign
different parameter values to reflect temperature you want to assign to the
data on that storage device. Refer to the DB2 Information Center for a
complete description:
http://pic.dhe.ibm.com/infocenter/db2luw/v10r1/index.jsp?topic=%2Fcom.ibm.d
b2.luw.admin.perf.doc%2Fdoc%2Fc0005051.html
The hot storage group is set as the default storage group for the database;
any new table space created is assigned to the default hot storage group.
Figure 10-2 shows hot, warm, and cold storage groups as created in our test
environment.
Figure 9-2 Storage groups represented in IBM Data Studio
When you deploy these changes, click Advanced Options and include the
tasks of flushing the package cache and issuing the RUNSTATS command
342 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
so that the optimizer can build a new access plan for queries that access
these objects. Figure 9-3 shows an example.
Figure 9-3 Advanced options available when generating DDL in IBM Data Studio
2. Create table spaces for each data partition to be created.
Using Data Studio, create a table space for each data partition to be attached
to the table. Figure 9-4 shows an excerpt of table spaces created for each
quarter of 2010 and Q1 2011.
Figure 9-4 Table space list including FACTDATA table spaces for partitioned table
Chapter 9. Managing data lifecyle with InfoSphere Warehouse 343
When creating each table space, click Storage Options to assign the table
space to a storage group. Click I/O Settings to inherit Overhead, Transfer
rate, and Data tag values from the storage group (Figure 9-5).
Figure 9-5 Set table spaces for partitioned table to Inherit from storage group
At this point, the storage group and table spaces have been created. The
range partitioned table can now be created.
3. Create data partitions for the fact table.
In this example, a partitioned table is created to hold the sales transactions. In
our scenario, the table already exists as a regular table and has to be
migrated to a range partitioned table. We achieve this by doing the following:
Generate the DDL for the original table.
Modify the DDL to incorporate the data ranges you need and table spaces
to be used. Use a new table name.
Create the new table with required indexes.
Use the INSERT with subselect statement to copy the data from the old to
the new table.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset