286 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
2. Issue and adjust the number of instances of each component in operation to
address the area experiencing a performance or data bottleneck.
Create separate components for separate data sources.
For example, where source files with the same format arrive from
individual branches, process them separately using a different set of
staging tables. Where service level objectives require, you can process
the data in parallel.
Separate processes to select ranges of data in same staging table.
Define multiple components that uniquely identify different ranges of data
and insert select in parallel.
For example, create two fact table data flow components that select from
staging and insert into target production table in commit ranges of 1000
rows.
7.5.1 How Ingest and MQT tables correlate with data availability and
data latency
MQTs can greatly improve query performance by pre-computing the results of a
query or queries for some or all the columns in the underlying table or tables.
Additional storage and resources are required to compute the query and store
the results.
The DB2 optimizer rewrites all or a portion of a query to take advantage of an
MQT when it is determined that the MQT offers a better execution plan. The
query workload does not need to be rewritten to take advantage of the MQT. This
means that you can continue to identify new and refine existing MQTs to react to
changes in how data is accessed throughout the lifecycle of your data
warehouse.
An MQT can be populated automatically as the underlying data changes or
controlled manually; the manual method is recommended in a warehouse
environment to ensure control over all aspects of data processing and minimize
the number of MQT refreshes during data ingest cycles. It is best that all MQTs
are refreshed as part of the ETL application to help ensure data availability in line
with data refresh and service level objectives. Again, this avoids unnecessary
MQT refresh operations.
The features and functionality available through MQTs is well documented
elsewhere, for example:
http://www.ibm.com/developerworks/data/bestpractices/databasedesign/
Chapter 7. Understand and address data latency requirements 287
Example 7-1 represents one type of MQT, and shows how to create a partitioned
MQT for “customer sales by product line by day.
Example 7-1 Creating partitioned MQT
CREATE TABLE CSTINSIGHT.MQT_CUST_PD_SALES_DATE AS
(SELECT C.CST_NUM, C.FULL_NM, F.DATE_ID, P.PD_GRP, SUM(QTY_SOLD*UNIT_PRICE) AS
SALES_TOTAL_DAY
FROM CSTINSIGHT.CUSTOMER C, CSTINSIGHT.CUSTOMER_TXN F, CSTINSIGHT.PRODUCT P
WHERE F.PD_ID= F.PD_ID AND C.CST_NUM = F.CUST_NUM
GROUP BY F.CST_NUM, P.PD_ID)
DATA INITIALLY DEFERRED REFRESH DEFERRED
ENABLE QUERY OPTIMIZATION MAINTAINED BY SYSTEM
COMPRESS YES
DISTRIBUTE BY HASH(PD_ID)
PARTITION BY RANGE(DATE_ID)
(PART PART_PAST STARTING(MINVALUE)
ENDING(’2010-06-30’) EXCLUSIVE IN TS_PROD_MQT_SALES,
PART PART_2010_Q3 STARTING (’2010-07-01’)
ENDING(’2010-09-30’) EXCLUSIVE IN TS_ PROD _MQT_SALES,
PART PART_2010_Q4 STARTING (’2010-09-01’)
ENDING(’2011-12-31’) EXCLUSIVE IN TS_ PROD _MQT_SALES,
PART PART_2011_Q1 STARTING (’2011-01-01’)
ENDING(’2011-03-30’) EXCLUSIVE IN TS_ PROD _MQT_SALES)
ORGANIZE BY (DATE_ID);
After the MQT table is created, perform these tasks:
1. Populate the CSTINSIGHT.MQT_CUST_PD_SALES_DATE by issuing the
following SQL statement; this operation is logged in the database transaction
logs.
Example 7-2 Refresh MQT table
REFRESH TABLE CSTINSIGHT.MQT_CUST_PD_SALES_DATE;
2. After populating the MQT, create an index on product and date columns to
improve the performance of operations on the MQT by issuing the DDL
statements shown in Example 7-3.
Example 7-3 Create index on MQT table
CREATE INDEX CSTINSIGHT.IDX_MQT_SALES_PD_DATE_PROD ON
CSTINSIGHT.MQT_CUST_PD_SALES_DATE (PD_ID, DATE_ID);
3. Collect statistics on the MQT by issuing the SQL statement shown in
Example 7-4 on page 288.
288 Solving Operational Business Intelligence with InfoSphere Warehouse Advanced Edition
Example 7-4 Issue RUNSTATS process
RUNSTATS ON TABLE CSTINSIGHT.IDX_MQT_SALES_DATE_PROD WITH DISTRIBUTION AND
SAMPLED DETAILED INDEXES ALL;
The table created in Example 7-1 on page 287 must be refreshed each time you
want to accommodate changes to the underlying tables. In addition, MQT tables
can be layered. For example, to create an MQT similar to that this example but
for a granularity of Month rather than day, the SELECT statement would
incorporate the MQT shown here rather than the base table. In this way the
number of rows to be read to produce the new MQT are significantly reduced.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset