Chapter 3. DB2 UDBs statistics, analytic, and OLAP functions 127
CUBE
The CUBE super-group is the other extension to the GROUP BY clause that
produces a result set that contains all the sub-total rows of a ROLLUP
aggregation and, in addition, contains cross-tabulation rows. Cross-tabulation
rows are additional super-aggregate rows. They are, as the name implies,
summaries across columns if the data were represented as a spreadsheet.
Like ROLLUP, a CUBE group can also be thought of as a series of grouping-sets.
In the case of a CUBE, all permutations of the cubed grouping-expression are
computed along with the grand total. Therefore, the n elements of a CUBE
translate to 2
n
(2 to the power n) grouping-sets. For instance, a specification of:
GROUP BY CUBE (a,b,c) is equivalent to
GROUP BY GROUPING SETS
(
(a,b,c)
(a,b)
(a,c)
(b,c)
(a)
(b)
(c)
()
)
Notice that the 3 elements of the CUBE translate to 8 grouping sets.
Unlike ROLLUP, the order of specification of elements does not matter for CUBE.
CUBE (DayOfYear, Sales_Person) is the same as CUBE (Sales_Person,DayOfYear)
CUBE is an extension of the Rollup clause. The CUBE clause not only provides
the column summaries we saw in rollup but also calculates the row summaries
and grand totals for the various dimensions.
3.3.3 Ranking, numbering, aggregation examples
These functions are useful in determining ranks, positioning, sequences and
medians. They have been used by:
? Financial institutions to identify top profitable customers
? International Olympic Committee to rank contestants, assign medals and
leading country medal rankings
Note: The order in which the
grouping-expressions
is specified is significant
for ROLLUP.
128 High-Function Business Intelligence in e-business
Medians are useful in applications where the average is greatly influenced by a
few extreme values, called outliers. Companies want to build sales campaigns
that hit the largest segment of their target population and not the average.
There is no DB2 function to compute the median of a set of data elements.
However, the following SQL can be used to compute the median of a set of data
elements.
Computing the median
The computation of the median value depends upon whether there are an odd
number, or an even number of data points in the set.
? For an odd number, the median is the middle element of the sorted rows. In
Example 3-12, the median value is 25.70.
? For an even number, the median is the average of the two middle elements of
the sorted rows. In Example 3-13, the average of the sum of (25.7 + 32.0) =
28.85. There are other ways of dealing with an even number of data elements
for example, return (n/2), or [(n+2) + 1] smallest value.
Example 3-12 Table containing an odd number of rows
1, 50.2
2, 25.7
3, 32.0
4, 17.2
5, 18.4
6, 19.6
7, 44.3
8, 22.5
9, 1000.7
Example 3-13 Table containing an even number of rows
1, 50.2
2, 25.7
3, 32.0
4, 17.2
5, 18.4
6, 19.6
7, 44.3
8, 1000.7
Note: (The median is the midpoint of a set of data elements. Take following
sequence of numbers: 3, 5, 7, 8, 37. The median is 7. The Average or mean is
12.
Chapter 3. DB2 UDBs statistics, analytic, and OLAP functions 129
The SQL shown in Example 3-14 determines the median value for a table that
may include either an odd number of rows, or an even number of rows. In the
case of an odd number of rows, the virtual table dt3 has only a single row in it,
while in the case of an even number of rows, dt3 has 2 rows in it. When the
following query is issued against the table shown in Example 3-12, the median
value is 25.70, and when executed against the table shown in Example 3-13, the
median value is computed as 28.85.
Example 3-14 Compute median value with an even number of data points in the set
WITH
dt1 AS (SELECT purchases, ROWNUMBER() OVER (ORDER BY purchases) AS num
FROM cust_data),
dt2 AS (SELECT COUNT(purchases) + 1 AS count FROM dt1),
dt3 AS (SELECT purchases FROM dt1,dt2
WHERE num = FLOOR(count/2e0) OR num = CEILING(count/2e0))
SELECT DECIMAL(AVG(purchases),10,2) AS median FROM dt3
RANK example
Assume we would like to rank employees by total compensation and listed
alphabetically.
The SQL in Example 3-15 shows that the ORDER BY clause in the RANK ()
OVER statement controls only the ranking sequence and not output sequence.
Important: Note that there are no nulls in the set of data points. If nulls are
present, then they should be
excluded from the evaluation by adding to the
above query a predicate such as ...WHERE purchases IS NOT NULL.
Note: It is worth observing that the value for average purchases is distorted by
the customer making a purchase of 1000.70. This may or may not be a data
entry error.
Note: When specifying an OLAP function like Rank, Dense_Rank,
Row_Number, a window is specified that defines the rows over which the
function is applied, and in what order. This window is specified via the OVER()
clause.
130 High-Function Business Intelligence in e-business
Example 3-15 RANK() OVER example
SELECT EMPNO, LASTNAME, FIRSTNME, SALARY+BONUS AS TOTAL_SALARY,
RANK() OVER (ORDER BY SALARY+BONUS DESC) AS RANK_SALARY
FROM EMPLOYEE
WHERE SALARY+BONUS > 30000
ORDER BY LASTNAME
The result of this query is shown in Figure 3-9.
Figure 3-9 Employee rank by total salary
One can see gaps and duplicates in the ranks in the above result. If
DENSE_RANK is specified there will be no gaps in the sequential rank
numbering.
DENSE_RANK example
Non-distinct rows get the same rank but the row following a tie is given the next
sequential rank.
Rewriting the above SQL with DENSE_RANK would look as shown in
Example 3-16.
Example 3-16 DENSE_RANK() OVER example
SELECT EMPNO, LASTNAME, FIRSTNME, SALARY+BONUS AS TOTAL_SALARY,
DENSE_RANK() OVER (ORDER BY SALARY+BONUS DESC) AS RANK_SALARY
FROM EMPLOYEE
WHERE SALARY+BONUS > 30000
ORDER BY LASTNAME
Chapter 3. DB2 UDBs statistics, analytic, and OLAP functions 131
The result of this query is shown in Figure 3-10.
Figure 3-10 Employee DENSE_RANK by total salary
2
ROW_NUMBER, RANK and DENSE_RANK example
Since ROW_NUMBER creates a sequential numbering to the rows in the
window, it can be used with an ORDER BY clause in the window to eliminate
gaps or duplicates.
Without the ORDER BY clause, ROW_NUMBER assigns sequential numbers to
rows arbitrarily as retrieved by the subselect. Such a result is not related to
ranking but merely assigning
arbitrary numbers to rows.
Example 3-17 shows the differences between RANK, DENSE_RANK and
ROW_NUMBER.
Example 3-17 ROW_NUMBER, RANK, DENSE_RANK example
SELECT EMPNO, LASTNAME, FIRSTNME, SALARY+BONUS AS TOTAL_SALARY,
RANK() OVER (ORDER BY SALARY+BONUS DESC) AS RANK_SALARY,
DENSE_RANK() OVER (ORDER BY SALARY+BONUS DESC) AS DENSERANK,
ROW_NUMBER() OVER (ORDER BY SALARY+BONUS DESC) AS ROW_NUMBER
FROM EMPLOYEE
WHERE SALARY+BONUS > 30000
Note: Since Nulls collate high, nulls in RANK and DENSERANK functions are
ranked first for descending rankings. This can be overridden with the nulls
last parameter, RANK () OVER (ORDER BY salary desc nulls last) as rank-
ing.
..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset