50 High-Function Business Intelligence in e-business
Example 2-18 Nullable columns or expressions in GROUP BY
CREATE TABLE S2 AS
(SELECT Period.year,
Product.id,
Fact.DeliveryCode,
GROUPING(Period.year) AS gpyear,
GROUPING(Product.id) AS gpprodid,
GROUPING(Fact.DeliveryCode) AS gpfactDC,
SUM(Fact.Quantity) AS Quantity,
SUM(Fact.Amount) AS Amount,
SUM(Fact.Quantity * Product.Price) AS QP_Amount
FROM Fact,
Product,
Period
WHERE Fact.prod_id = Product.id and
Fact.period_id = Period.id
GROUP BY rollup(Period.year,
Product.id,
Fact.DeliveryCode ))
DATA INITIALLY DEFERRED REFRESH DEFERRED
In this example, columns Period.Year, Product.id and Fact.DeliveryCode in
the GROUP BY are nullable. Therefore, the materialized view will have rows
with NULL values that may either be due to a rolled up aggregation
(sub-total), or due to NULL values from the base table itself. The GROUPING
identifier enables you to differentiate between these two cases. A value of one
in this column indicates that the row was the result of a sub-total from the
GROUP BY function, while a value of zero indicates otherwise. An example of
the results of using the GROUPING identifier is shown in GROUPING,
GROUP BY and CUBE example on page 138.
Attention: The assumption in the above scenarios is that the columns or
expressions in the GROUP BY list are defined as NOT NULL. In the event that
this is not true, that is, the GROUP BY list items are nullable, then the
materialized view
must have GROUPING identifiers defined as highlighted in
Example 2-18, in order for the DB2 optimizer to consider matching the user
query to the materialized view. The reason for this requirement is that the DB2
optimizer needs to distinguish between NULL values within a materialized
view that could either be due to a rolled up aggregation involving missing rows,
or due to actual NULL values in the base tables itself.
The GROUPING function
only needs to be defined for all nullable columns
when super aggregates (ROLLUP, CUBE and grouping sets) are involved.
Chapter 2. DB2 UDBs materialized views 51
The query in Example 2-19 has a ROLLUP on fewer columns (only
Period.Year, and Product.id) than in the materialized view.
Example 2-19 Nullable columns or expressions in GROUP BY user query
SELECT Period.year,
Product.id,
SUM(Fact.Quantity) AS Quantity,
SUM(Fact.Amount) AS Amount,
SUM(Fact.Quantity * Product.Price) AS QP_Amount
FROM Fact,
Product,
Period
WHERE Fact.prod_id = Product.id and
Fact.period_id = Period.id
GROUP BY rollup(Period.year,
Product.id)
Assuming this query matches the materialized view in Example 2-18 on
page 50, then the query would be rewritten as shown here in Example 2-20,
to filter out the unwanted rows.
Example 2-20 Rewritten query
SELECT year,
id,
Quantity,
Amount,
QP_Amount
FROM S2
WHERE gpfactDC = 1
The gpfactDC = 1 predicate in the rewritten query only selects those rows in the
materialized view returned by the GROUP BY function, and ignores those
relating to NULLs in the base table itself.
52 High-Function Business Intelligence in e-business
Figure 2-9 provides additional examples of matching conditions. It shows that the
GROUP BY expressions
must be derivable from the materialized view.
Figure 2-9 Matching GROUP BY and aggregate functions
3. Extra tables in the query:
DB2 is able to match user queries that contain more tables than those defined
in the materialized view, when the join predicates to the base tables can be
replaced by join predicates between the materialized view and the additional
tables.
Consider the materialized view shown in Example 2-21:
Example 2-21 Extra tables in the query materialized view
CREATE SUMMARY TABLE dba.trans_agg AS
(
SELECT ti.pgid, t.locid, t.acctid, t.status, YEAR(pdate) AS year,
MONTH(pdate) AS month, SUM(ti.amount) AS amount, COUNT(*) AS count
FROM transitem AS ti, trans AS t
WHERE ti.transid = t.transid
GROUP BY YEAR(pdate), MONTH(pdate), ti.pgid, t.locid, t.acctid, t.status
)
DATA INITIALLY DEFERRED REFRESH IMMEDIATE
A user query that looks like the one shown in Example 2-22 can be
considered matching the above materialized view even though it has an
additional location table
loc
included.
Example 2-22 Extra tables in the query matching query
SELECT YEAR(pdate) AS year, loc.country, SUM(ti.amount) AS amount, COUNT(*)
AS count
FROM transitem AS ti, trans AS t, loc AS loc
WHERE ti.transid = t.transid AND t.locid = loc.locid
AND YEAR(pdate) BETWEEN 1990 and 1999
GROUP BY YEAR(pdate), loc.country
QUERY Materialized View
Valid ?
... GROUP BY store_id ... GROUP BY cust_id, store_id
... GROUP BY ((cust_age + 5) / 10) ... GROUP BY cust_age / 10
... CUBE(cust_id, store_id) ... GROUPING SETS (cust_id, store_id)
... GROUPING SETS (cust_id, store_id) ... CUBE(cust_id, store_id)
... AVG(sales) ... ... SUM(sales), COUNT(*) ...
... COUNT(DISTINCT cust_id) ... ... cust_id, COUNT(*) ....
Yes
No
No
No
Yes
Yes
Chapter 2. DB2 UDBs materialized views 53
The
loc
is joined on the locid column to
trans
. The locid column is one of the
GROUP BY columns of the materialized view. DB2 can use this column to join
the relevant rows of the materialized view
after applying the YEAR predicate
with the
loc
table. The aggregated results can then be further consolidated by
grouping on the YEAR(pdate) and country.
4. Extra tables in the materialized view:
DB2 is able to match user queries against materialized views that have more
tables than defined in the query, in certain cases where referential integrity is
known to exist.
Consider the materialized view shown in Example 2-23.
Example 2-23 Extra tables in the materialized view
CREATE TABLE dba.PG_SALESSUM AS
(
SELECT l.lineid AS prodline, pg.pgid AS pgroup, loc.country, loc.state,
YEAR(pdate) AS year, MONTH(pdate) AS month, SUM(ti.amount) AS amount,
COUNT(*) AS count
FROM transitem AS ti, trans AS t, loc AS loc,
pgroup AS pg, prodline AS l
WHERE ti.transid = t.transid AND ti.pgid = pg.pgid
AND pg.lineid =l.lineid AND t.locid = loc.locid
GROUP BY loc.country, loc.state, year(pdate),month(pdate), l.lineid,
pg.pgid
)
DATA INITIALLY DEFERRED REFRESH IMMEDIATE
A user query as shown in Example 2-24 can be considered as matching the
above materialized view even though the materialized view has two more
tables
pgroup
and
prodline
than in the user query:
Example 2-24 Extra tables in the materialized view matching query
SELECT YEAR(pdate) AS year, loc.country,
SUM(ti.amount) AS amount, COUNT(*) AS count
FROM transitem AS ti, trans AS t, loc AS loc
WHERE ti.transid = t.transid AND t.locid = loc.locid
AND YEAR(pdate) BETWEEN 1990 and 1999
GROUP BY YEAR(pdate), loc.country
54 High-Function Business Intelligence in e-business
The query in Example 2-24 references three tables, while the materialized
view in Example 2-23 has five. It would appear at first glance that these
additional tables
pgroup
and
prodline
would affect the result of the above
query, if that materialized view were used in the query rewrite. This would be
true unless DB2 was aware of referential integrity relationships being
involved. For instance, if the
pgroup
and
prodline
tables were related to the
other tables through referential integrity, it will not affect the number of rows in
the result. They could be considered as look-up tables that are merely adding
columns to the output.
Example 2-25 Informational and system-maintained referential integrity constraints
-- INFORMATIONAL REFERENTIAL INTEGRITY CONSTRAINT
CREATE TABLE transitem (
.....
.....
pgid INT,
....
CONSTRAINT fk_pgid FOREIGN KEY (pgid) REFERENCES pgroup
ON DELETE CASCADE NOT ENFORCED
.....
-- SYSTEM-MAINTAINED REFERENTIAL INTEGRITY CONSTRAINT
CREATE TABLE pgroup (
.....
.....
lineid INT,
....
CONSTRAINT fk_lineid FOREIGN KEY (lineid) REFERENCES prodline
ON DELETE CASCADE ENFORCED
System-maintained referential integrity (where the constraint attribute is
ENFORCED), as well as informational referential integrity constraints let the
query rewrite component know of the existence of referential integrity. In such
cases, the additional tables in the materialized view are guaranteed not to add
or remove rows in the result, and the query rewrite engine can proceed with
the materialized view matching optimization, and ignore these tables.
Important: Referential integrity may either be system maintained, or
provided as informational constraints (see the NOT ENFORCED constraint
attribute in Example 2-25). With informational referential integrity
constraints, the onus is on the DBA to guarantee the integrity of reference,
since DB2 makes no attempt to enforce referential integrity. Informational
referential integrity constraints help the DB2 optimizer make superior
decisions about matching user queries that have fewer tables than those
defined in the materialized view.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset