Ranking, numbering, aggregation examples (1/3)

Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 3. DB2 UDB’s statistics, analytic, and OLAP functions 127

CUBE

The CUBE super-group is the other extension to the GROUP BY clause that

produces a result set that contains all the sub-total rows of a ROLLUP

aggregation and, in addition, contains ’cross-tabulation’ rows. Cross-tabulation

rows are additional ’super-aggregate’ rows. They are, as the name implies,

summaries across columns if the data were represented as a spreadsheet.

Like ROLLUP, a CUBE group can also be thought of as a series of grouping-sets.

In the case of a CUBE, all permutations of the cubed grouping-expression are

computed along with the grand total. Therefore, the “n” elements of a CUBE

translate to 2

(2 to the power n) grouping-sets. For instance, a specification of:

GROUP BY CUBE (a,b,c) is equivalent to

GROUP BY GROUPING SETS

(

(a,b,c)

(a,b)

(a,c)

(b,c)

(a)

(b)

(c)

()

)

Notice that the 3 elements of the CUBE translate to 8 grouping sets.

Unlike ROLLUP, the order of specification of elements does not matter for CUBE.

CUBE (DayOfYear, Sales_Person) is the same as CUBE (Sales_Person,DayOfYear)

CUBE is an extension of the Rollup clause. The CUBE clause not only provides

the column summaries we saw in rollup but also calculates the row summaries

and grand totals for the various dimensions.

3.3.3 Ranking, numbering, aggregation examples

These functions are useful in determining ranks, positioning, sequences and

medians. They have been used by:

? Financial institutions to identify top profitable customers

? International Olympic Committee to rank contestants, assign medals and

leading country medal rankings

Note: The order in which the

grouping-expressions

is specified is significant

for ROLLUP.

128 High-Function Business Intelligence in e-business

Medians are useful in applications where the average is greatly influenced by a

few extreme values, called outliers. Companies want to build sales campaigns

that hit the largest segment of their target population and not the average.

There is no DB2 function to compute the median of a set of data elements.

However, the following SQL can be used to compute the median of a set of data

elements.

Computing the median

The computation of the median value depends upon whether there are an odd

number, or an even number of data points in the set.

? For an odd number, the median is the middle element of the sorted rows. In

Example 3-12, the median value is 25.70.

? For an even number, the median is the average of the two middle elements of

the sorted rows. In Example 3-13, the average of the sum of (25.7 + 32.0) =

28.85. There are other ways of dealing with an even number of data elements

for example, return (n/2), or [(n+2) + 1] smallest value.

Example 3-12 Table containing an odd number of rows

1, 50.2

2, 25.7

3, 32.0

4, 17.2

5, 18.4

6, 19.6

7, 44.3

8, 22.5

9, 1000.7

Example 3-13 Table containing an even number of rows

1, 50.2

2, 25.7

3, 32.0

4, 17.2

5, 18.4

6, 19.6

7, 44.3

8, 1000.7

Note: (The median is the midpoint of a set of data elements. Take following

sequence of numbers: 3, 5, 7, 8, 37. The median is 7. The Average or mean is

12.

Chapter 3. DB2 UDB’s statistics, analytic, and OLAP functions 129

The SQL shown in Example 3-14 determines the median value for a table that

may include either an odd number of rows, or an even number of rows. In the

case of an odd number of rows, the virtual table ‘dt3’ has only a single row in it,

while in the case of an even number of rows, ‘dt3’ has 2 rows in it. When the

following query is issued against the table shown in Example 3-12, the median

value is 25.70, and when executed against the table shown in Example 3-13, the

median value is computed as 28.85.

Example 3-14 Compute median value with an even number of data points in the set

WITH

dt1 AS (SELECT purchases, ROWNUMBER() OVER (ORDER BY purchases) AS num

FROM cust_data),

dt2 AS (SELECT COUNT(purchases) + 1 AS count FROM dt1),

dt3 AS (SELECT purchases FROM dt1,dt2

WHERE num = FLOOR(count/2e0) OR num = CEILING(count/2e0))

SELECT DECIMAL(AVG(purchases),10,2) AS median FROM dt3

RANK example

Assume we would like to rank employees by total compensation and listed

alphabetically.

The SQL in Example 3-15 shows that the ORDER BY clause in the RANK ()

OVER statement controls only the ranking sequence and not output sequence.

Important: Note that there are no nulls in the set of data points. If nulls are

present, then they should be

excluded from the evaluation by adding to the

above query a predicate such as ...WHERE purchases IS NOT NULL.

Note: It is worth observing that the value for average purchases is distorted by

the customer making a purchase of 1000.70. This may or may not be a data

entry error.

Note: When specifying an OLAP function like Rank, Dense_Rank,

Row_Number, a window is specified that defines the rows over which the

function is applied, and in what order. This window is specified via the OVER()

clause.

130 High-Function Business Intelligence in e-business

Example 3-15 RANK() OVER example

SELECT EMPNO, LASTNAME, FIRSTNME, SALARY+BONUS AS TOTAL_SALARY,

RANK() OVER (ORDER BY SALARY+BONUS DESC) AS RANK_SALARY

FROM EMPLOYEE

WHERE SALARY+BONUS > 30000

ORDER BY LASTNAME

The result of this query is shown in Figure 3-9.

Figure 3-9 Employee rank by total salary

One can see gaps and duplicates in the ranks in the above result. If

DENSE_RANK is specified there will be no gaps in the sequential rank

numbering.

DENSE_RANK example

Non-distinct rows get the same rank but the row following a tie is given the next

sequential rank.

Rewriting the above SQL with DENSE_RANK would look as shown in

Example 3-16.

Example 3-16 DENSE_RANK() OVER example

SELECT EMPNO, LASTNAME, FIRSTNME, SALARY+BONUS AS TOTAL_SALARY,

DENSE_RANK() OVER (ORDER BY SALARY+BONUS DESC) AS RANK_SALARY

FROM EMPLOYEE

WHERE SALARY+BONUS > 30000

ORDER BY LASTNAME

Chapter 3. DB2 UDB’s statistics, analytic, and OLAP functions 131

The result of this query is shown in Figure 3-10.

Figure 3-10 Employee DENSE_RANK by total salary

ROW_NUMBER, RANK and DENSE_RANK example

Since ROW_NUMBER creates a sequential numbering to the rows in the

window, it can be used with an ORDER BY clause in the window to eliminate

gaps or duplicates.

Without the ORDER BY clause, ROW_NUMBER assigns sequential numbers to

rows arbitrarily as retrieved by the subselect. Such a result is not related to

ranking but merely assigning

arbitrary numbers to rows.

Example 3-17 shows the differences between RANK, DENSE_RANK and

ROW_NUMBER.

Example 3-17 ROW_NUMBER, RANK, DENSE_RANK example

SELECT EMPNO, LASTNAME, FIRSTNME, SALARY+BONUS AS TOTAL_SALARY,

RANK() OVER (ORDER BY SALARY+BONUS DESC) AS RANK_SALARY,

DENSE_RANK() OVER (ORDER BY SALARY+BONUS DESC) AS DENSERANK,

ROW_NUMBER() OVER (ORDER BY SALARY+BONUS DESC) AS ROW_NUMBER

FROM EMPLOYEE

WHERE SALARY+BONUS > 30000

Note: Since Nulls collate high, nulls in RANK and DENSERANK functions are

ranked first for descending rankings. This can be overridden with the “nulls

last” parameter, RANK () OVER (ORDER BY salary desc nulls last) as rank-

ing.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for Ranking, numbering, aggregation examples (1/3)

Create new playlist

Sign In

Sign Up

Table of Contents for
Ranking, numbering, aggregation examples (1/3)