This chapter introduces queries you may find helpful for creating reports. These typically involve reporting-specific formatting considerations along with different levels of aggregation. Another focus of this chapter is on transposing or pivoting result sets, converting rows into columns: reshaping the data.
In general,these recipes have in common that they allow you to present data in formats or shapes different to the way they are stored. As your comfort level increases with pivoting, you’ll undoubtedly find uses for it outside of what are presented in this chapter.
You wish to take values from groups of rows and turn those values into columns in a single row per group. For example, you have a result set displaying the number of employees in each department:
DEPTNO CNT ------ ---------- 10 3 20 5 30 6
You would like to reformat the output such the result set looks as follows:
DEPTNO_10 DEPTNO_20 DEPTNO_30 --------- ---------- ---------- 3 5 6
This is a classic example of data presented in a different shape to the way it is stored.
Transpose the result set using CASE and the aggregate function SUM:
1 select sum(case when deptno=10 then 1 else 0 end) as deptno_10, 2 sum(case when deptno=20 then 1 else 0 end) as deptno_20, 3 sum(case when deptno=30 then 1 else 0 end) as deptno_30 4 from emp
This example is an excellent introduction to pivoting. The concept is simple: for each row returned by the unpivoted query, use a CASE expression to separate the rows into columns. Then, because this particular problem is to count the number of employees per department, use the aggregate function SUM to count the occurrence of each DEPTNO. If you’re having trouble understanding how this works exactly, execute the query with the aggregate function SUM and include DEPTNO for readability:
select deptno,
case when deptno=10 then 1 else 0 end as deptno_10,
case when deptno=20 then 1 else 0 end as deptno_20,
case when deptno=30 then 1 else 0 end as deptno_30
from emp
order by 1
DEPTNO DEPTNO_10 DEPTNO_20 DEPTNO_30 ------ ---------- ---------- ---------- 10 1 0 0 10 1 0 0 10 1 0 0 20 0 1 0 20 0 1 0 20 0 1 0 20 0 1 0 30 0 0 1 30 0 0 1 30 0 0 1 30 0 0 1 30 0 0 1 30 0 0 1
You can think of each CASE expression as a flag to determine which DEPTNO a row belongs to. At this point, the “rows to columns” transformation is already done; the next step is to simply sum the values returned by DEPTNO_10, DEPTNO_20, and DEPTNO_30, and then to group by DEPTNO. Following are the results:
select deptno,
sum(case when deptno=10 then 1 else 0 end) as deptno_10,
sum(case when deptno=20 then 1 else 0 end) as deptno_20,
sum(case when deptno=30 then 1 else 0 end) as deptno_30
from emp
group by deptno
DEPTNO DEPTNO_10 DEPTNO_20 DEPTNO_30 ------ ---------- ---------- ---------- 10 3 0 0 20 0 5 0 30 0 0 6
If you inspect this result set, you see that logically the output makes sense; for example, DEPTNO 10 has 3 employees in DEPTNO_10 and zero in the other departments. Since the goal is to return one row, the last step is to remove the DEPTNO and GROUP BY clause, and simply sum the CASE expressions:
select sum(case when deptno=10 then 1 else 0 end) as deptno_10,
sum(case when deptno=20 then 1 else 0 end) as deptno_20,
sum(case when deptno=30 then 1 else 0 end) as deptno_30
from emp
DEPTNO_10 DEPTNO_20 DEPTNO_30 --------- ---------- ---------- 3 5 6
Following is another approach that you may sometimes see applied to this same sort of problem:
select max(case when deptno=10 then empcount else null end) as deptno_10 max(case when deptno=20 then empcount else null end) as deptno_20, max(case when deptno=10 then empcount else null end) as deptno_30 from ( select deptno, count(*) as empcount from emp group by deptno ) x
This approach uses an inline view to generate the employee counts per department. CASE expressions in the main query translate rows to columns, getting you to the following results:
DEPTNO_10 DEPTNO_20 DEPTNO_30 --------- ---------- ---------- 3 NULL NULL NULL 5 NULL NULL NULL 6
Then the MAX functions collapses the columns into one row:
DEPTNO_10 DEPTNO_20 DEPTNO_30 --------- ---------- ---------- 3 5 6
You want to turn rows into columns by creating a column corresponding to each of the values in a single given column. However, unlike in the previous recipe, you need multiple rows of output. Like the earlier recipe, pivoting into multiple rows is a fundamental method of reshaping data.
For example, you want to return each employee and their position (JOB), and you currently use a query that returns the following result set:
JOB ENAME --------- ---------- ANALYST SCOTT ANALYST FORD CLERK SMITH CLERK ADAMS CLERK MILLER CLERK JAMES MANAGER JONES MANAGER CLARK MANAGER BLAKE PRESIDENT KING SALESMAN ALLEN SALESMAN MARTIN SALESMAN TURNER SALESMAN WARD
You would like to format the result set such that each job gets its own column:
CLERKS ANALYSTS MGRS PREZ SALES ------ -------- ----- ---- ------ MILLER FORD CLARK KING TURNER JAMES SCOTT BLAKE MARTIN ADAMS JONES WARD SMITH ALLEN
Unlike the first recipe in this chapter, the result set for this recipe consists of more than one row. Using the previous recipe’s technique will not work for this recipe, as the MAX(ENAME) for each JOB would be returned, which would result in one ENAME for each JOB (i.e., one row will be returned as in the first recipe). To solve this problem, you must make each JOB/ENAME combination unique. Then, when you apply an aggregate function to remove NULLs, you don’t lose any ENAMEs.
Use the ranking function ROW_NUMBER OVER to make each JOB/ENAME combination unique. Pivot the result set using a CASE expression and the aggregate function MAX while grouping on the value returned by the window function:
1 select max(case when job='CLERK' 2 then ename else null end) as clerks, 3 max(case when job='ANALYST' 4 then ename else null end) as analysts, 5 max(case when job='MANAGER' 6 then ename else null end) as mgrs, 7 max(case when job='PRESIDENT' 8 then ename else null end) as prez, 9 max(case when job='SALESMAN' 10 then ename else null end) as sales 11 from ( 12 select job, 13 ename, 14 row_number()over(partition by job order by ename) rn 15 from emp 16 ) x 17 group by rn
The first step is to use the window function ROW_NUMBER OVER to help make each JOB/ENAME combination unique:
select job,
ename,
row_number()over(partition by job order by ename) rn
from emp
JOB ENAME RN --------- ---------- ---------- ANALYST FORD 1 ANALYST SCOTT 2 CLERK ADAMS 1 CLERK JAMES 2 CLERK MILLER 3 CLERK SMITH 4 MANAGER BLAKE 1 MANAGER CLARK 2 MANAGER JONES 3 PRESIDENT KING 1 SALESMAN ALLEN 1 SALESMAN MARTIN 2 SALESMAN TURNER 3 SALESMAN WARD 4
Giving each ENAME a unique “row number” within a given job prevents any problems that might otherwise result from two employees having the same name and job. The goal here is to be able to group on row number (on RN) without dropping any employees from the result set due to the use of MAX. This step is the most important step in solving the problem. Without this first step, the aggregation in the outer query will remove necessary rows. Consider what the result set would look like without using ROW_NUMBER OVER, using the same technique as seen in the first recipe:
select max(case when job='CLERK'
then ename else null end) as clerks,
max(case when job='ANALYST'
then ename else null end) as analysts,
max(case when job='MANAGER'
then ename else null end) as mgrs,
max(case when job='PRESIDENT'
then ename else null end) as prez,
max(case when job='SALESMAN'
then ename else null end) as sales
from emp
CLERKS ANALYSTS MGRS PREZ SALES ---------- ---------- ---------- ---------- ---------- SMITH SCOTT JONES KING WARD
Unfortunately, only one row is returned for each JOB: the employee with the MAX ENAME. When it comes time to pivot the result set, using MIN or MAX should serve as a means to remove NULLs from the result set, not restrict the ENAMEs returned. How this works will be come clearer as you continue through the explanation.
The next step uses a CASE expression to organize the ENAMEs into their proper column (JOB):
select rn,
case when job='CLERK'
then ename else null end as clerks,
case when job='ANALYST'
then ename else null end as analysts,
case when job='MANAGER'
then ename else null end as mgrs,
case when job='PRESIDENT'
then ename else null end as prez,
case when job='SALESMAN'
then ename else null end as sales
from (
select job,
ename,
row_number()over(partition by job order by ename) rn
from emp
) x
RN CLERKS ANALYSTS MGRS PREZ SALES -- ---------- ---------- ---------- ---------- ---------- 1 FORD 2 SCOTT 1 ADAMS 2 JAMES 3 MILLER 4 SMITH 1 BLAKE 2 CLARK 3 JONES 1 KING 1 ALLEN 2 MARTIN 3 TURNER 4 WARD
At this point, the rows are transposed into columns and the last step is to remove the NULLs to make the result set more readable. To remove the NULLs use the aggregate function MAX and group by RN. (You can use the function MIN as well. The choice to use MAX is arbitrary, as you will only ever be aggregating one value per group.) There is only one value for each RN/JOB/ENAME combination. Grouping by RN in conjunction with the CASE expressions embedded within the calls to MAX ensures that each call to MAX results in picking only one name from a group of otherwise NULL values:
select max(case when job='CLERK'
then ename else null end) as clerks,
max(case when job='ANALYST'
then ename else null end) as analysts,
max(case when job='MANAGER'
then ename else null end) as mgrs,
max(case when job='PRESIDENT'
then ename else null end) as prez,
max(case when job='SALESMAN'
then ename else null end) as sales
from (
select job,
ename,
row_number()over(partition by job order by ename) rn
from emp
) x
group by rn
CLERKS ANALYSTS MGRS PREZ SALES ------ -------- ----- ---- ------ MILLER FORD CLARK KING TURNER JAMES SCOTT BLAKE MARTIN ADAMS JONES WARD SMITH ALLEN
The technique of using ROW_NUMBER OVER to create unique combinations of rows is extremely useful for formatting query results. Consider the query below that creates a sparse report showing employees by DEPTNO and JOB:
select deptno dno, job,
max(case when deptno=10
then ename else null end) as d10,
max(case when deptno=20
then ename else null end) as d20,
max(case when deptno=30
then ename else null end) as d30,
max(case when job='CLERK'
then ename else null end) as clerks,
max(case when job='ANALYST'
then ename else null end) as anals,
max(case when job='MANAGER'
then ename else null end) as mgrs,
max(case when job='PRESIDENT'
then ename else null end) as prez,
max(case when job='SALESMAN'
then ename else null end) as sales
from (
Select deptno,
job,
ename,
row_number()over(partition by job order by ename) rn_job,
row_number()over(partition by job order by ename) rn_deptno
from emp
) x
group by deptno, job, rn_deptno, rn_job
order by 1
DNO JOB D10 D20 D30 CLERKS ANALS MGRS PREZ SALES --- --------- ------ ----- ------ ------ ----- ----- ---- ------ 10 CLERK MILLER MILLER 10 MANAGER CLARK CLARK 10 PRESIDENT KING KING 20 ANALYST FORD FORD 20 ANALYST SCOTT SCOTT 20 CLERK ADAMS ADAMS 20 CLERK SMITH SMITH 20 MANAGER JONES JONES 30 CLERK JAMES JAMES 30 MANAGER BLAKE BLAKE 30 SALESMAN ALLEN ALLEN 30 SALESMAN MARTIN MARTIN 30 SALESMAN TURNER TURNER 30 SALESMAN WARD WARD
By simply modifying what you group by (hence the nonaggregate items in the SELECT list above), you can produce reports with different formats. It is worth the time of changing things around to understand how these formats change based on what you include in your GROUP BY clause.
You want to transform columns to rows. Consider the following result set:
DEPTNO_10 DEPTNO_20 DEPTNO_30 ---------- ---------- ---------- 3 5 6
You would like to convert that to:
DEPTNO COUNTS_BY_DEPT ------ -------------- 10 3 20 5 30 6
Some readers may have noticed that the first listing is the output from the first recipe in this chapter. To make this output available for this recipe, we can store it in a view with the following query:
create view emp_cnts as ( select sum(case when deptno=10 then 1 else 0 end) as deptno_10, sum(case when deptno=20 then 1 else 0 end) as deptno_20, sum(case when deptno=30 then 1 else 0 end) as deptno_30 from emp )
In the solution and discussion that follow, the queries will refer to the EMP_CNTS view created by the preceding query.
Examining the desired result set, it’s easy to see that you can execute a simple COUNT and GROUP BY on table EMP to produce the desired result. The object here, though, is to imagine that the data is not stored as rows; perhaps the data is denormalized and aggregated values are stored as multiple columns.
To convert columns to rows, use a Cartesian product. You’ll need to know in advance how many columns you want to convert to rows because the table expression you use to create the Cartesian product must have a cardinality of at least the number of columns you want to transpose.
Rather than create a denormalized table of data, the solution for this recipe will use the solution from the first recipe of this chapter to create a “wide” result set. The full solution is as follows:
1 select dept.deptno, 2 case dept.deptno 3 when 10 then emp_cnts.deptno_10 4 when 20 then emp_cnts.deptno_20 5 when 30 then emp_cnts.deptno_30 6 end as counts_by_dept 7 from emp_cnts cross join 8 (select deptno from dept where deptno <= 30) dept
The view EMP_CNTS represents the denormalized view, or “wide” result set that you want to convert to rows, and is shown below:
DEPTNO_10 DEPTNO_20 DEPTNO_30 --------- ---------- ---------- 3 5 6
Because there are three columns, you will create three rows. Begin by creating a Cartesian product between inline view EMP_CNTS and some table expression that has at least three rows. The following code uses table DEPT to create the Cartesian product; DEPT has four rows:
The Cartesian product enables you to return a row for each column in inline view EMP_CNTS. Since the final result set should have only the DEPTNO and the number of employees in said DEPTNO, use a CASE expression to transform the three columns into one:
select dept.deptno,
case dept.deptno
when 10 then emp_cnts.deptno_10
when 20 then emp_cnts.deptno_20
when 30 then emp_cnts.deptno_30
end as counts_by_dept
from (
emp_cnts
cross join (select deptno from dept where deptno <= 30) dept
DEPTNO COUNTS_BY_DEPT ------ -------------- 10 3 20 5 30 6
You want to return all columns from a query as just one column. For example, you want to return the ENAME, JOB, and SAL of all employees in DEPTNO 10, and you want to return all three values in one column. You want to return three rows for each employee and one row of white space between employees. You want to return the following result set:
EMPS ---------- CLARK MANAGER 2450 KING PRESIDENT 5000 MILLER CLERK 1300
The key is to use a recursive CTE combined with Cartesian product to return four rows for each employee. The recursive CTE needed is seen in Chapter 10 and explored further in Appendix B. Using the Cartesian join lets you return one column value per row and have an extra row for spacing between employees.
Use the window function ROW_NUMBER OVER to rank each row based on EMPNO (1–4). Then use a CASE expression to transform three columns into one (the keyword RECURSIVE is needed after the first WITH in PostgreSQL and MySQL) :
1 with four_rows (id) 2 as 3 ( 4 select 1 5 union all 6 select id+1 7 from four_rows 8 where id < 4 9 ) 10 , 11 x_tab (ename,job,sal,rn ) 12 as 13 ( select e.ename,e.job,e.sal, 14 row_number()over(partition by e.empno 15 order by e.empno) 16 from emp e 17 join four_rows on 1=1 18 ) 19 20 select 21 case rn 22 when 1 then ename 23 when 2 then job 24 when 3 then cast(sal as char(4)) 25 end emps 26 from x_tab
The first step is to use the window function ROW_NUMBER OVER to create a ranking for each employee in DEPTNO 10:
select e.ename,e.job,e.sal, row_number()over(partition by e.empno order by e.empno) rn from emp e where e.deptno=10 ENAME JOB SAL RN ---------- --------- ---------- ---------- CLARK MANAGER 2450 1 KING PRESIDENT 5000 1 MILLER CLERK 1300 1
At this point the ranking doesn’t mean much. You are partitioning by EMPNO, so the rank is 1 for all three rows in DEPTNO 10. Once you add the Cartesian product, the rank will begin to take shape, as can be seen in the following results:
[listing]
with four_rows (id) as (select 1 union all select id+1 from four_rows where id < 4 ) select e.ename,e.job,e.sal, row_number()over(partition by e.empno order by e.empno) from emp e join four_rows on 1=1 ENAME JOB SAL RN ---------- --------- ---------- ---------- CLARK MANAGER 2450 1 CLARK MANAGER 2450 2 CLARK MANAGER 2450 3 CLARK MANAGER 2450 4 KING PRESIDENT 5000 1 KING PRESIDENT 5000 2 KING PRESIDENT 5000 3 KING PRESIDENT 5000 4 MILLER CLERK 1300 1 MILLER CLERK 1300 2 MILLER CLERK 1300 3 MILLER CLERK 1300 4
You should stop at this point and understand two key points:
RN is no longer 1 for each employee; it is now a repeating sequence of values from 1 to 4, the reason being, window functions are applied after the FROM and WHERE clauses are evaluated. So, partitioning by EMPNO causes the RN to reset to 1 when a new employee is encountered.
We’ve used a recursive CTE to ensure that for each employee there are four rows. We don’t need the RECURSIVE keyword in SQL Server or DB2 but we do for Oracle, MySQL and PostgreSQL.
The hard work is now done and all that is left is to use a CASE expression to put ENAME, JOB, and SAL into one column for each employee (you need to use CAST to convert SAL to a string to keep CASE happy):
with four_rows (id) as (select 1 union all select id+1 from four_rows where id < 4 ) , x_tab (ename,job,sal,rn ) as (select e.ename,e.job,e.sal, row_number()over(partition by e.empno order by e.empno) from emp e join four_rows on 1=1) select case rn when 1 then ename when 2 then job when 3 then cast(sal as char(4)) end emps from x_tab EMPS ---------- CLARK MANAGER 2450 KING PRESIDENT 5000 MILLER CLERK 1300
You are generating a report, and, when two rows have the same value in a column, you wish to display that value only once. For example, you want to return DEPTNO and ENAME from table EMP, you wish to group all rows for each DEPTNO, and you wish to display each DEPTNO only one time. You want to return the following result set:
DEPTNO ENAME ------ --------- 10 CLARK KING MILLER 20 SMITH ADAMS FORD SCOTT JONES 30 ALLEN BLAKE MARTIN JAMES TURNER WARD
This is a simple formatting problem that is easily solved by the window function LAG OVER.
1 select 2 case when 3 lag(deptno)over(order by deptno) = deptno then null 4 else deptno end DEPTNO 5 , ename 6 from emp
Oracle users can also use DECODE as an alternative to CASE:
1 select to_number( 2 decode(lag(deptno)over(order by deptno), 3 deptno,null,deptno) 4 ) deptno, ename 5 from emp
The first step is to use the window function LAG OVER to return the prior DEPTNO for each row:
select lag(deptno)over(order by deptno) lag_deptno, deptno, ename from emp LAG_DEPTNO DEPTNO ENAME ---------- ---------- ---------- 10 CLARK 10 10 KING 10 10 MILLER 10 20 SMITH 20 20 ADAMS 20 20 FORD 20 20 SCOTT 20 20 JONES 20 30 ALLEN 30 30 BLAKE 30 30 MARTIN 30 30 JAMES 30 30 TURNER 30 30 WARD
If you inspect the result set above, you can easily see where DEPTNO matches LAG_ DEPTNO. For those rows, you want to set DEPTNO to NULL. Do that by using DECODE (TO_NUMBER is included to cast DEPTNO as a number):
select to_number(
CASE WHEN (lag(deptno)over(order by deptno) = deptno THEN null else deptno END deptno ,
deptno,null,deptno)
) deptno, ename
from emp
DEPTNO ENAME ------ ---------- 10 CLARK KING MILLER 20 SMITH ADAMS FORD SCOTT JONES 30 ALLEN BLAKE MARTIN JAMES TURNER WARD
=== Pivoting a Result Set to Facilitate Inter-Row Calculations
==== Problem
You wish to make calculations involving data from multiple rows. To make your job easier, you wish to pivot those rows into columns such that all values you need are then in a single row.
In this book’s example data, DEPTNO 20 is the department with the highest combined salary, which you can confirm by executing the following query:
select deptno, sum(sal) as sal
from emp
group by deptno
DEPTNO SAL ------ ---------- 10 8750 20 10875 30 9400
You want to calculate the difference between the salaries of DEPTNO 20 and DEPTNO 10 and between DEPTNO 20 and DEPTNO 30.
The final result will look like this:
d20_10_diff d20_30_diff ------------ ---------- 2125 1475
==== Solution
Transpose the totals using the aggregate function SUM and a CASE expression. Then code your expressions in the select list:
1 select d20_sal - d10_sal as d20_10_diff, 2 d20_sal - d30_sal as d20_30_diff 3 from ( 4 select sum(case when deptno=10 then sal end) as d10_sal, 5 sum(case when deptno=20 then sal end) as d20_sal, 6 sum(case when deptno=30 then sal end) as d30_sal 7 from emp 8 ) totals_by_dept
It is also possible to write this query using a Common Table Expression, which some people may find more readable:
with
totals_by_dept
(
d10_sal
,
d20_sal
,
d30_sal
)
as
(
select
sum
(
case
when
deptno
=
10
then
sal
end
)
as
d10_sal
,
sum
(
case
when
deptno
=
20
then
sal
end
)
as
d20_sal
,
sum
(
case
when
deptno
=
30
then
sal
end
)
as
d30_sal
from
emp
)
select
d20_sal
-
d10_sal
as
d20_10_diff
,
d20_sal
-
d30_sal
as
d20_30_diff
from
totals_by_dept
==== Discussion
The first step is to pivot the salaries for each DEPTNO from rows to columns by using a CASE expression:
select case when deptno=10 then sal end as d10_sal,
case when deptno=20 then sal end as d20_sal,
case when deptno=30 then sal end as d30_sal
from emp
D10_SAL D20_SAL D30_SAL ------- ---------- ---------- 800 1600 1250 2975 1250 2850 2450 3000 5000 1500 1100 950 3000 1300
The next step is to sum all the salaries for each DEPTNO by applying the aggregate function SUM to each CASE expression:
select sum(case when deptno=10 then sal end) as d10_sal,
sum(case when deptno=20 then sal end) as d20_sal,
sum(case when deptno=30 then sal end) as d30_sal
from emp
D10_SAL D20_SAL D30_SAL ------- ---------- ---------- 8750 10875 9400
The final step is to simply wrap the above SQL in an inline view and perform the subtractions.
=== Creating Buckets of Data, of a Fixed Size
==== Problem
You wish to organized data into evenly sized buckets, with a predetermined number of elements in each bucket. The total number of buckets may be unknown, but you want to ensure that each bucket has five elements. For example, you want to organize the employees in table EMP into groups of five based on the value of EMPNO, as shown in the following results:
GRP EMPNO ENAME --- ---------- ------- 1 7369 SMITH 1 7499 ALLEN 1 7521 WARD 1 7566 JONES 1 7654 MARTIN 2 7698 BLAKE 2 7782 CLARK 2 7788 SCOTT 2 7839 KING 2 7844 TURNER 3 7876 ADAMS 3 7900 JAMES 3 7902 FORD 3 7934 MILLER
==== Solution
The solution to this problem is greatly simplified by functions for ranking rows. Once rows are ranked, creating buckets of five is simply a matter of dividing and then taking the mathematical ceiling of the quotient.
Use the window function ROW_NUMBER OVER to rank each employee by EMPNO. Then divide by 5 to create the groups (SQL Server users will use CEILING, not CEIL):
1 select ceil(row_number()over(order by empno)/5.0) grp, 2 empno, 3 ename 4 from emp
==== Discussion
The window function ROW_NUMBER OVER assigns a rank or “row number” to each row sorted by EMPNO:
select row_number()over(order by empno) rn,
empno,
ename
from emp
RN EMPNO ENAME -- ---------- ---------- 1 7369 SMITH 2 7499 ALLEN 3 7521 WARD 4 7566 JONES 5 7654 MARTIN 6 7698 BLAKE 7 7782 CLARK 8 7788 SCOTT 9 7839 KING 10 7844 TURNER 11 7876 ADAMS 12 7900 JAMES 13 7902 FORD 14 7934 MILLER
The next step is to apply the function CEIL (or CEILING) after dividing ROW_ NUMBER OVER by five. Dividing by five logically organizes the rows into groups of five, i.e., five values less than or equal to 1, five values greater than 1 but less than or equal to 2, the remaining group (composed of the last four rows since 14, the number of rows in table EMP, is not a multiple of 5) has a value greater than 2 but less than or equal to 3.
The CEIL function will return the smallest whole number greater than the value passed to it; this will create whole number groups. The results of the division and application of the CEIL are shown below. You can follow the order of operation from left to right, from RN to DIVISION to GRP:
select row_number()over(order by empno) rn,
row_number()over(order by empno)/5.0 division,
ceil(row_number()over(order by empno)/5.0) grp,
empno,
ename
from emp
RN DIVISION GRP EMPNO ENAME -- ---------- --- ----- ---------- 1 .2 1 7369 SMITH 2 .4 1 7499 ALLEN 3 .6 1 7521 WARD 4 .8 1 7566 JONES 5 1 1 7654 MARTIN 6 1.2 2 7698 BLAKE 7 1.4 2 7782 CLARK 8 1.6 2 7788 SCOTT 9 1.8 2 7839 KING 10 2 2 7844 TURNER 11 2.2 3 7876 ADAMS 12 2.4 3 7900 JAMES 13 2.6 3 7902 FORD 14 2.8 3 7934 MILLER
=== Creating a Predefined Number of Buckets
==== Problem
You want to organize your data into a fixed number of buckets. For example, you want to organize the employees in table EMP into four buckets. The result set should look similar to the following:
GRP EMPNO ENAME --- ----- --------- 1 7369 SMITH 1 7499 ALLEN 1 7521 WARD 1 7566 JONES 2 7654 MARTIN 2 7698 BLAKE 2 7782 CLARK 2 7788 SCOTT 3 7839 KING 3 7844 TURNER 3 7876 ADAMS 4 7900 JAMES 4 7902 FORD 4 7934 MILLER
This is a common way to organise categorical data as dividing a set into a number of smaller equal sized sets is an important first step for many kinds of analysis. For example, taking the averages of these groups on salary or any other value may reveal a trend that is concealed by variability when looking at the cases individually.
This problem is the opposite of the previous recipe, where you had an unknown number of buckets but a predetermined number of elements in each bucket. In this recipe, the goal is such that you may not necessarily know how many elements are in each bucket, but you are defining a fixed (known) number of buckets to be created.
==== Solution
The solution to this problem is simple now the NTILE function is widely available. NTILE organizes an ordered set into the number of buckets you specify, with any stragglers distributed into the available buckets starting from the first bucket. The desired result set for this recipe reflects this: buckets 1 and 2 have four rows while buckets 3 and 4 have three rows.
Use the NTILE window function to create four buckets:
1 select ntile(4)over(order by empno) grp, 2 empno, 3 ename 4 from emp
==== Discussion
All the work is done by the NTILE function. The ORDER BY clause puts the rows into the desired order, and the function itself then assigns a group number to each row e.g. so that the first quarter (in this case) are put into group one, the second into group 2 etc.
=== Creating Horizontal Histograms
==== Problem
You want to use SQL to generate histograms that extend horizontally. For example, you want to display the number of employees in each department as a horizontal histogram with each employee represented by an instance of “*”. You want to return the following result set:
DEPTNO CNT ------ ---------- 10 *** 20 ***** 30 ******
==== Solution
The key to this solution is to use the aggregate function COUNT, and use GROUP BY DEPTNO to determine the number of employees in each DEPTNO. The value returned by COUNT is then passed to a string function that generates a series of “*” characters.
===== DB2
Use the REPEAT function to generate the histogram:
1 select deptno, 2 repeat('*',count(*)) cnt 3 from emp 4 group by deptno
===== Oracle, PostgreSQL, and MySQL
Use the LPAD function to generate the needed strings of “*” characters:
1 select deptno, 2 lpad('*',count(*),'*') as cnt 3 from emp 4 group by deptno
===== SQL Server
Generate the histogram using the REPLICATE function:
1 select deptno, 2 replicate('*',count(*)) cnt 3 from emp 4 group by deptno
==== Discussion
The technique is the same for all vendors. The only difference lies in the string function used to return a “*” for each employee. The Oracle solution will be used for this discussion, but the explanation is relevant for all the solutions.
The first step is to count the number of employees in each department:
select deptno,
count(*)
from emp
group by deptno
DEPTNO COUNT(*) ------ ---------- 10 3 20 5 30 6
The next step is to use the value returned by COUNT to control the number of " * " characters to return for each department. Simply pass COUNT( * ) as an argument to the string function LPAD to return the desired number of " * “s:
select deptno,
lpad('*',count(*),'*') as cnt
from emp
group by deptno
DEPTNO CNT ------ ---------- 10 *** 20 ***** 30 ******
For PostgreSQL users, you may need to use CAST to ensure that COUNT(*) returns an integer as can be seen below:
select deptno,
lpad('*',count(*)::integer,'*') as cnt
from emp
group by deptno
DEPTNO CNT ------ ---------- 10 *** 20 ***** 30 ******
This CAST is necessary because PostgreSQL requires the numeric argument to LPAD to be an integer.
=== Creating Vertical Histograms
==== Problem
You want to generate a histogram that grows from the bottom up. For example, you want to display the number of employees in each department as a vertical histogram with each employee represented by an instance of “*”. You want to return the following result set:
D10 D20 D30 --- --- --- * * * * * * * * * * * * * *
==== Solution
The technique used to solve this problem is built on a technique used earlier in this chapter: use the ROW_NUMBER OVER function to uniquely identify each instance of “*” for each DEPTNO. Use the aggregate function MAX to pivot the result set and group by the values returned by ROW_NUMBER OVER (SQL Server users should not use DESC in the ORDER BY clause):
1 select max(deptno_10) d10, 2 max(deptno_20) d20, 3 max(deptno_30) d30 4 from ( 5 select row_number()over(partition by deptno order by empno) rn, 6 case when deptno=10 then '*' else null end deptno_10, 7 case when deptno=20 then '*' else null end deptno_20, 8 case when deptno=30 then '*' else null end deptno_30 9 from emp 10 ) x 11 group by rn 12 order by 1 desc, 2 desc, 3 desc
==== Discussion
The first step is to use the window function ROW_NUMBER to uniquely identify each instance of “*” in each department. Use a CASE expression to return a “*” for each employee in each department:
select row_number()over(partition by deptno order by empno) rn,
case when deptno=10 then '*' else null end deptno_10,
case when deptno=20 then '*' else null end deptno_20,
case when deptno=30 then '*' else null end deptno_30
from emp
RN DEPTNO_10 DEPTNO_20 DEPTNO_30 -- ---------- ---------- --------- 1 * 2 * 3 * 1 * 2 * 3 * 4 * 5 * 1 * 2 * 3 * 4 * 5 * 6 *
The next and last step is to use the aggregate function MAX on each CASE expression, grouping by RN to remove the NULLs from the result set. Order the results ASC or DESC depending on how your RDBMS sorts NULLs:
select max(deptno_10) d10,
max(deptno_20) d20,
max(deptno_30) d30
from (
select row_number()over(partition by deptno order by empno) rn,
case when deptno=10 then '*' else null end deptno_10,
case when deptno=20 then '*' else null end deptno_20,
case when deptno=30 then '*' else null end deptno_30
from emp
) x
group by rn
order by 1 desc, 2 desc, 3 desc
D10 D20 D30 --- --- --- * * * * * * * * * * * * * *
=== Returning Non-GROUP BY Columns
==== Problem
You are executing a GROUP BY query, and you wish to return columns in your select list that are not also listed in your GROUP BY clause. This is not normally possible, as such ungrouped columns would not represent a single value per row.
Say that you want to find the employees who earn the highest and lowest salaries in each department, as well as the employees who earn the highest and lowest salaries in each job. You want to see each employee’s name, the department he works in, his job title, and his salary. You want to return the following result set:
DEPTNO ENAME JOB SAL DEPT_STATUS JOB_STATUS ------ ------ --------- ----- --------------- -------------- 10 MILLER CLERK 1300 LOW SAL IN DEPT TOP SAL IN JOB 10 CLARK MANAGER 2450 LOW SAL IN JOB 10 KING PRESIDENT 5000 TOP SAL IN DEPT TOP SAL IN JOB 20 SCOTT ANALYST 3000 TOP SAL IN DEPT TOP SAL IN JOB 20 FORD ANALYST 3000 TOP SAL IN DEPT TOP SAL IN JOB 20 SMITH CLERK 800 LOW SAL IN DEPT LOW SAL IN JOB 20 JONES MANAGER 2975 TOP SAL IN JOB 30 JAMES CLERK 950 LOW SAL IN DEPT 30 MARTIN SALESMAN 1250 LOW SAL IN JOB 30 WARD SALESMAN 1250 LOW SAL IN JOB 30 ALLEN SALESMAN 1600 TOP SAL IN JOB 30 BLAKE MANAGER 2850 TOP SAL IN DEPT
Unfortunately, including all these columns in the SELECT clause will ruin the grouping. Consider the following example. Employee “KING” earns the highest salary. You want to verify this with the following query:
select ename,max(sal) from empgroup by ename
Instead of seeing “KING” and KING’s salary, the above query will return all 14 rows from table EMP. The reason is because of the grouping: the MAX(SAL) is applied to each ENAME. So, it would seem the above query can be stated as “find the employee with the highest salary” but in fact what it is doing is “find the highest salary for each ENAME in table EMP.” This recipe explains a technique for including ENAME without the need to GROUP BY that column.
==== Solution
Use an inline view to find the high and low salaries by DEPTNO and JOB. Then keep only the employees who make those salaries.
Use the window functions MAX OVER and MIN OVER to find the highest and lowest salaries by DEPTNO and JOB. Then keep the rows where the salaries are those that are highest or lowest by DEPTNO or JOB:
1 select deptno,ename,job,sal, 2 case when sal = max_by_dept 3 then 'TOP SAL IN DEPT' 4 when sal = min_by_dept 5 then 'LOW SAL IN DEPT' 6 end dept_status, 7 case when sal = max_by_job 8 then 'TOP SAL IN JOB' 9 when sal = min_by_job 10 then 'LOW SAL IN JOB' 11 end job_status 12 from ( 13 select deptno,ename,job,sal, 14 max(sal)over(partition by deptno) max_by_dept, 15 max(sal)over(partition by job) max_by_job, 16 min(sal)over(partition by deptno) min_by_dept, 17 min(sal)over(partition by job) min_by_job 18 from emp 19 ) emp_sals 20 where sal in (max_by_dept,max_by_job, 21 min_by_dept,min_by_job)
==== Discussion
The first step is to use the window functions MAX OVER and MIN OVER to find the highest and lowest salaries by DEPTNO and JOB:
select deptno,ename,job,sal,
max(sal)over(partition by deptno) maxDEPT,
max(sal)over(partition by job) maxJOB,
min(sal)over(partition by deptno) minDEPT,
min(sal)over(partition by job) minJOB
from emp
DEPTNO ENAME JOB SAL MAXDEPT MAXJOB MINDEPT MINJOB ------ ------ --------- ----- ------- ------ ------- ------ 10 MILLER CLERK 1300 5000 1300 1300 800 10 CLARK MANAGER 2450 5000 2975 1300 2450 10 KING PRESIDENT 5000 5000 5000 1300 5000 20 SCOTT ANALYST 3000 3000 3000 800 3000 20 FORD ANALYST 3000 3000 3000 800 3000 20 SMITH CLERK 800 3000 1300 800 800 20 JONES MANAGER 2975 3000 2975 800 2450 20 ADAMS CLERK 1100 3000 1300 800 800 30 JAMES CLERK 950 2850 1300 950 800 30 MARTIN SALESMAN 1250 2850 1600 950 1250 30 TURNER SALESMAN 1500 2850 1600 950 1250 30 WARD SALESMAN 1250 2850 1600 950 1250 30 ALLEN SALESMAN 1600 2850 1600 950 1250 30 BLAKE MANAGER 2850 2850 2975 950 2450
At this point, every salary can be compared with the highest and lowest salaries by DEPTNO and JOB. Notice that the grouping (the inclusion of multiple columns in the SELECT clause) does not affect the values returned by MIN OVER and MAX OVER. This is the beauty of window functions: the aggregate is computed over a defined “group” or partition and returns multiple rows for each group. The last step is to simply wrap the window functions in an inline view and keep only those rows that match the values returned by the window functions. Use a simple CASE expression to display the “status” of each employee in the final result set:
select deptno,ename,job,sal,
case when sal = max_by_dept
then 'TOP SAL IN DEPT'
when sal = min_by_dept
then 'LOW SAL IN DEPT'
end dept_status,
case when sal = max_by_job
then 'TOP SAL IN JOB'
when sal = min_by_job
then 'LOW SAL IN JOB'
end job_status
from (
select deptno,ename,job,sal,
max(sal)over(partition by deptno) max_by_dept,
max(sal)over(partition by job) max_by_job,
min(sal)over(partition by deptno) min_by_dept,
min(sal)over(partition by job) min_by_job
from emp
) x
where sal in (max_by_dept,max_by_job,
min_by_dept,min_by_job)
DEPTNO ENAME JOB SAL DEPT_STATUS JOB_STATUS ------ ------ --------- ----- --------------- -------------- 10 MILLER CLERK 1300 LOW SAL IN DEPT TOP SAL IN JOB 10 CLARK MANAGER 2450 LOW SAL IN JOB 10 KING PRESIDENT 5000 TOP SAL IN DEPT TOP SAL IN JOB 20 SCOTT ANALYST 3000 TOP SAL IN DEPT TOP SAL IN JOB 20 FORD ANALYST 3000 TOP SAL IN DEPT TOP SAL IN JOB 20 SMITH CLERK 800 LOW SAL IN DEPT LOW SAL IN JOB 20 JONES MANAGER 2975 TOP SAL IN JOB 30 JAMES CLERK 950 LOW SAL IN DEPT 30 MARTIN SALESMAN 1250 LOW SAL IN JOB 30 WARD SALESMAN 1250 LOW SAL IN JOB 30 ALLEN SALESMAN 1600 TOP SAL IN JOB 30 BLAKE MANAGER 2850 TOP SAL IN DEPT
=== Calculating Simple Subtotals
==== Problem
For the purposes of this recipe, a “simple subtotal” is defined as a result set that contains values from the aggregation of one column along with a grand total value for the table. An example would be a result set that sums the salaries in table EMP by JOB, and that also includes the sum of all salaries in table EMP. The summed salaries by JOB are the subtotals, and the sum of all salaries in table EMP is the grand total. Such a result set should look as follows:
JOB SAL --------- ---------- ANALYST 6000 CLERK 4150 MANAGER 8275 PRESIDENT 5000 SALESMAN 5600 TOTAL 29025
==== Solution
The ROLLUP extension to the GROUP BY clause solves this problem perfectly. If ROLLUP is not available for your RDBMS, you can solve the problem, albeit with more difficulty, using a scalar subquery or a UNION query.
===== DB2 and Oracle
Use the aggregate function SUM to sum the salaries, and use the ROLLUP extension of GROUP BY to organize the results into subtotals (by JOB) and a grand total (for the whole table):
1 select case grouping(job) 2 when 0 then job 3 else 'TOTAL' 4 end job, 5 sum(sal) sal 6 from emp 7 group by rollup(job)
===== SQL Server and MySQL
Use the aggregate function SUM to sum the salaries, and use WITH ROLLUP to organize the results into subtotals (by JOB) and a grand total (for the whole table). Then use COALESCE to supply the label TOTAL for the grand total row (which will otherwise have a NULL in the job column):
1 select coalesce(job,'TOTAL') job, 2 sum(sal) sal 3 from emp 4 group by job with rollup
With SQL Server, you also have the option to use the GROUPING function shown in the Oracle/DB2 recipe rather than COALESCE to determine the level of aggregation.
===== PostgreSQL
Similar to the SQL Server and MySQL solutions use the ROLLUP extension to GROUP BY with slightly different syntax:
select coalesce(job,'TOTAL') job, sum(sal) sal from emp group by rollup(job)
==== Discussion
===== DB2 and Oracle
The first step is to use the aggregate function SUM, grouping by JOB in order to sum the salaries by JOB:
select job, sum(sal) sal
from emp
group by job
JOB SAL --------- ----- ANALYST 6000 CLERK 4150 MANAGER 8275 PRESIDENT 5000 SALESMAN 5600
The next step is to use the ROLLUP extension to GROUP BY to produce a grand total for all salaries along with the subtotals for each JOB:
select job, sum(sal) sal
from emp
group by rollup(job)
JOB SAL --------- ------- ANALYST 6000 CLERK 4150 MANAGER 8275 PRESIDENT 5000 SALESMAN 5600 29025
The last step is to use the GROUPING function in the JOB column to display a label for the grand total. If the value of JOB is NULL, the GROUPING function will return 1, which signifies that the value for SAL is the grand total created by ROLLUP. If the value of JOB is not NULL, the GROUPING function will return 0, which signifies the value for SAL is the result of the GROUP BY, not the ROLLUP. Wrap the call to GROUPING(JOB) in a CASE expression that returns either the job name or the label TOTAL, as appropriate:
select case grouping(job)
when 0 then job
else 'TOTAL'
end job,
sum(sal) sal
from emp
group by rollup(job)
JOB SAL --------- ---------- ANALYST 6000 CLERK 4150 MANAGER 8275 PRESIDENT 5000 SALESMAN 5600 TOTAL 29025
===== SQL Server and MySQL
The first step is to use the aggregate function SUM, grouping the results by JOB to generate salary sums by JOB:
select job, sum(sal) sal
from emp
group by job
JOB SAL --------- ----- ANALYST 6000 CLERK 4150 MANAGER 8275 PRESIDENT 5000 SALESMAN 5600
The next step is to use GROUP BY’s ROLLUP extension to produce a grand total for all salaries along with the subtotals for each JOB:
select job, sum(sal) sal
from emp
group by job with rollup
JOB SAL --------- ------- ANALYST 6000 CLERK 4150 MANAGER 8275 PRESIDENT 5000 SALESMAN 5600 29025
The last step is to use the COEALESCE function against the JOB column. If the value of JOB is NULL, the value for SAL is the grand total created by ROLLUP. If the value of JOB is not NULL, the value for SAL is the result of the “regular” GROUP BY, not the ROLLUP:
select coalesce(job,'TOTAL') job,
sum(sal) sal
from emp
group by job with rollup
JOB SAL --------- ---------- ANALYST 6000 CLERK 4150 MANAGER 8275 PRESIDENT 5000 SALESMAN 5600 TOTAL 29025
===== PostgreSQL
The solution is the same in its manner of operation as the preceeding solution for MySQL and SQL Server. The only difference is the syntax for the ROLLUP clause: write ROLLUP(JOB) after GROUP BY.
=== Calculating Subtotals for All Possible Expression Combinations
==== Problem
You want to find the sum of all salaries by DEPTNO, and by JOB, for every JOB/ DEPTNO combination. You also want a grand total for all salaries in table EMP. You want to return the following result set:
DEPTNO JOB CATEGORY SAL ------ --------- --------------------- ------- 10 CLERK TOTAL BY DEPT AND JOB 1300 10 MANAGER TOTAL BY DEPT AND JOB 2450 10 PRESIDENT TOTAL BY DEPT AND JOB 5000 20 CLERK TOTAL BY DEPT AND JOB 1900 30 CLERK TOTAL BY DEPT AND JOB 950 30 SALESMAN TOTAL BY DEPT AND JOB 5600 30 MANAGER TOTAL BY DEPT AND JOB 2850 20 MANAGER TOTAL BY DEPT AND JOB 2975 20 ANALYST TOTAL BY DEPT AND JOB 6000 CLERK TOTAL BY JOB 4150 ANALYST TOTAL BY JOB 6000 MANAGER TOTAL BY JOB 8275 PRESIDENT TOTAL BY JOB 5000 SALESMAN TOTAL BY JOB 5600 10 TOTAL BY DEPT 8750 30 TOTAL BY DEPT 9400 20 TOTAL BY DEPT 10875 GRAND TOTAL FOR TABLE 29025
==== Solution
Extensions added to GROUP BY in recent years make this a fairly easy problem to solve. If your platform does not supply such extensions for computing various levels of subtotals, then you must compute them manually (via self joins or scalar subqueries).
===== DB2
For DB2, you will need to use CAST to return from GROUPING as the CHAR(1) data type:
1 select deptno, 2 job, 3 case cast(grouping(deptno) as char(1))|| 4 cast(grouping(job) as char(1)) 5 when '00' then 'TOTAL BY DEPT AND JOB' 6 when '10' then 'TOTAL BY JOB' 7 when '01' then 'TOTAL BY DEPT' 8 when '11' then 'TOTAL FOR TABLE' 9 end category, 10 sum(sal) 11 from emp 12 group by cube(deptno,job) 13 order by grouping(job),grouping(deptno)
===== Oracle
Use the CUBE extension to the GROUP BY clause with the concatenation operator ||:
1 select deptno, 2 job, 3 case grouping(deptno)||grouping(job) 4 when '00' then 'TOTAL BY DEPT AND JOB' 5 when '10' then 'TOTAL BY JOB' 6 when '01' then 'TOTAL BY DEPT' 7 when '11' then 'GRAND TOTALFOR TABLE' 8 end category, 9 sum(sal) sal 10 from emp 11 group by cube(deptno,job) 12 order by grouping(job),grouping(deptno)
===== SQL Server
Use the CUBE extension to the GROUP BY clause. For SQL Server, you will need to CAST the results from GROUPING to CHAR(1), and you will need to use the + operator for concatenation (as opposed to Oracle’s || operator):
1 select deptno, 2 job, 3 case cast(grouping(deptno)as char(1))+ 4 cast(grouping(job)as char(1)) 5 when '00' then 'TOTAL BY DEPT AND JOB' 6 when '10' then 'TOTAL BY JOB' 7 when '01' then 'TOTAL BY DEPT' 8 when '11' then 'GRAND TOTAL FOR TABLE' 9 end category, 10 sum(sal) sal 11 from emp 12 group by deptno,job with cube 13 order by grouping(job),grouping(deptno)
===== PostgreSQL
PostgreSQL is similar to the preceding, but with slightly different syntax for the CUBE operator and the concatenation.
select deptno,job ,case concat( cast (grouping(deptno) as char(1)),cast (grouping(job) as char(1)) ) when '00' then 'TOTAL BY DEPT AND JOB' when '10' then 'TOTAL BY JOB' when '01' then 'TOTAL BY DEPT' when '11' then 'GRAND TOTAL FOR TABLE' end category , sum(sal) as sal from emp group by cube(deptno,job)
===== MySQL
Although part of the functionality is available, it is not complete, as MySQL has no CUBE fucntion. Hence, use multiple UNION ALLs, creating different sums for each:
1 select deptno, job, 2 'TOTAL BY DEPT AND JOB' as category, 3 sum(sal) as sal 4 from emp 5 group by deptno, job 6 union all 7 select null, job, 'TOTAL BY JOB', sum(sal) 8 from emp 9 group by job 10 union all 11 select deptno, null, 'TOTAL BY DEPT', sum(sal) 12 from emp 13 group by deptno 14 union all 15 select null,null,'GRAND TOTAL FOR TABLE', sum(sal) 16 from emp
==== Discussion
===== Oracle, DB2, and SQL Server
The solutions for all three are essentially the same. The first step is to use the aggregate function SUM and group by both DEPTNO and JOB to find the total salaries for each JOB and DEPTNO combination:
select deptno, job, sum(sal) sal
from emp
group by deptno, job
DEPTNO JOB SAL ------ --------- ------- 10 CLERK 1300 10 MANAGER 2450 10 PRESIDENT 5000 20 CLERK 1900 20 ANALYST 6000 20 MANAGER 2975 30 CLERK 950 30 MANAGER 2850 30 SALESMAN 5600
The next step is to create subtotals by JOB and DEPTNO along with the grand total for the whole table. Use the CUBE extension to the GROUP BY clause to perform aggregations on SAL by DEPTNO, JOB, and for the whole table:
select deptno,
job,
sum(sal) sal
from emp
group by cube(deptno,job)
DEPTNO JOB SAL ------ --------- ------- 29025 CLERK 4150 ANALYST 6000 MANAGER 8275 SALESMAN 5600 PRESIDENT 5000 10 8750 10 CLERK 1300 10 MANAGER 2450 10 PRESIDENT 5000 20 10875 20 CLERK 1900 20 ANALYST 6000 20 MANAGER 2975 30 9400 30 CLERK 950 30 MANAGER 2850 30 SALESMAN 5600
Next, use the GROUPING function in conjunction with CASE to format the results into more meaningful output. The value from GROUPING(JOB) will be 1 or 0 depending on whether or not the values for SAL are due to the GROUP BY or the CUBE. If the results are due to the CUBE, the value will be 1, otherwise it will be 0. The same goes for GROUPING(DEPTNO). Looking at the first step of the solution, you should see that grouping is done by DEPTNO and JOB. Thus, the expected values from the calls to GROUPING when a row represents a combination of both DEPTNO and JOB is 0. The query below confirms this:
select deptno,
job,
grouping(deptno) is_deptno_subtotal,
grouping(job) is_job_subtotal,
sum(sal) sal
from emp
group by cube(deptno,job)
order by 3,4
DEPTNO JOB IS_DEPTNO_SUBTOTAL IS_JOB_SUBTOTAL SAL ------ --------- ------------------ --------------- ------- 10 CLERK 0 0 1300 10 MANAGER 0 0 2450 10 PRESIDENT 0 0 5000 20 CLERK 0 0 1900 30 CLERK 0 0 950 30 SALESMAN 0 0 5600 30 MANAGER 0 0 2850 20 MANAGER 0 0 2975 20 ANALYST 0 0 6000 10 0 1 8750 20 0 1 10875 30 0 1 9400 CLERK 1 0 4150 ANALYST 1 0 6000 MANAGER 1 0 8275 PRESIDENT 1 0 5000 SALESMAN 1 0 5600 1 1 29025
The final step is to use a CASE expression to determine which category each row belongs to based on the values returned by GROUPING(JOB) and GROUPING(DEPTNO) concatenated:
select deptno,
job,
case grouping(deptno)||grouping(job)
when '00' then 'TOTAL BY DEPT AND JOB'
when '10' then 'TOTAL BY JOB'
when '01' then 'TOTAL BY DEPT'
when '11' then 'GRAND TOTAL FOR TABLE'
end category,
sum(sal) sal
from emp
group by cube(deptno,job)
order by grouping(job),grouping(deptno)
DEPTNO JOB CATEGORY SAL ------ --------- --------------------- ------- 10 CLERK TOTAL BY DEPT AND JOB 1300 10 MANAGER TOTAL BY DEPT AND JOB 2450 10 PRESIDENT TOTAL BY DEPT AND JOB 5000 20 CLERK TOTAL BY DEPT AND JOB 1900 30 CLERK TOTAL BY DEPT AND JOB 950 30 SALESMAN TOTAL BY DEPT AND JOB 5600 30 MANAGER TOTAL BY DEPT AND JOB 2850 20 MANAGER TOTAL BY DEPT AND JOB 2975 20 ANALYST TOTAL BY DEPT AND JOB 6000 CLERK TOTAL BY JOB 4150 ANALYST TOTAL BY JOB 6000 MANAGER TOTAL BY JOB 8275 PRESIDENT TOTAL BY JOB 5000 SALESMAN TOTAL BY JOB 5600 10 TOTAL BY DEPT 8750 30 TOTAL BY DEPT 9400 20 TOTAL BY DEPT 10875 GRAND TOTAL FOR TABLE 29025
This Oracle solution implicitly converts the results from the GROUPING functions to a character type in preparation for concatenating the two values. DB2 and SQL Server users will need to explicitly CAST the results of the GROUPING functions to CHAR(1) as shown in the solution. In addition, SQL Server users must use the + operator, and not the || operator, to concatenate the results from the two GROUPING calls into one string.
For Oracle and DB2 users, there is an additional extension to GROUP BY called GROUPING SETS; this extension is extremely useful. For example, you can use GROUPING SETS to mimic the output created by CUBE as is done below (DB2 and SQL Server users will need to use CAST to ensure the values returned by the GROUPING function are in the correct format in the same way as in the CUBE solution):
select deptno,
job,
case grouping(deptno)||grouping(job)
when '00' then 'TOTAL BY DEPT AND JOB'
when '10' then 'TOTAL BY JOB'
when '01' then 'TOTAL BY DEPT'
when '11' then 'GRAND TOTAL FOR TABLE'
end category,
sum(sal) sal
from emp
group by grouping sets ((deptno),(job),(deptno,job),())
DEPTNO JOB CATEGORY SAL ------ --------- --------------------- ------- 10 CLERK TOTAL BY DEPT AND JOB 1300 20 CLERK TOTAL BY DEPT AND JOB 1900 30 CLERK TOTAL BY DEPT AND JOB 950 20 ANALYST TOTAL BY DEPT AND JOB 6000 10 MANAGER TOTAL BY DEPT AND JOB 2450 20 MANAGER TOTAL BY DEPT AND JOB 2975 30 MANAGER TOTAL BY DEPT AND JOB 2850 30 SALESMAN TOTAL BY DEPT AND JOB 5600 10 PRESIDENT TOTAL BY DEPT AND JOB 5000 CLERK TOTAL BY JOB 4150 ANALYST TOTAL BY JOB 6000 MANAGER TOTAL BY JOB 8275 SALESMAN TOTAL BY JOB 5600 PRESIDENT TOTAL BY JOB 5000 10 TOTAL BY DEPT 8750 20 TOTAL BY DEPT 10875 30 TOTAL BY DEPT 9400 GRAND TOTAL FOR TABLE 29025
What’s great about GROUPING SETS is that it allows you to define the groups. The GROUPING SETS clause in the preceding query causes groups to be created by DEPTNO, by JOB, by the combination of DEPTNO and JOB, and finally the empty parenthesis requests a grand total. GROUPING SETS gives you enormous flexibility for creating reports with different levels of aggregation; for example, if you wanted to modify the preceding example to exclude the GRAND TOTAL, simply modify the GROUPING SETS clause by excluding the empty parentheses:
/* no grand total */select deptno,
job,
case grouping(deptno)||grouping(job)
when '00' then 'TOTAL BY DEPT AND JOB'
when '10' then 'TOTAL BY JOB'
when '01' then 'TOTAL BY DEPT'
when '11' then 'GRAND TOTAL FOR TABLE'
end category,
sum(sal) sal
from emp
group by grouping sets ((deptno),(job),(deptno,job))
DEPTNO JOB CATEGORY SAL ------ --------- --------------------- ---------- 10 CLERK TOTAL BY DEPT AND JOB 1300 20 CLERK TOTAL BY DEPT AND JOB 1900 30 CLERK TOTAL BY DEPT AND JOB 950 20 ANALYST TOTAL BY DEPT AND JOB 6000 10 MANAGER TOTAL BY DEPT AND JOB 2450 20 MANAGER TOTAL BY DEPT AND JOB 2975 30 MANAGER TOTAL BY DEPT AND JOB 2850 30 SALESMAN TOTAL BY DEPT AND JOB 5600 10 PRESIDENT TOTAL BY DEPT AND JOB 5000 CLERK TOTAL BY JOB 4150 ANALYST TOTAL BY JOB 6000 MANAGER TOTAL BY JOB 8275 SALESMAN TOTAL BY JOB 5600 PRESIDENT TOTAL BY JOB 5000 10 TOTAL BY DEPT 8750 20 TOTAL BY DEPT 10875 30 TOTAL BY DEPT 9400
You can also eliminate a subtotal, such as the one on DEPTNO, simply by omitting (DEPTNO) from the GROUPING SETS clause:
/* nosubtotals by DEPTNO */select deptno,
job,
case grouping(deptno)||grouping(job)
when '00' then 'TOTAL BY DEPT AND JOB'
when '10' then 'TOTAL BY JOB'
when '01' then 'TOTAL BY DEPT'
when '11' then 'GRAND TOTAL FOR TABLE'
end category,
sum(sal) sal
from emp
group by grouping sets ((job),(deptno,job),())
order by 3
DEPTNO JOB CATEGORY SAL ------ --------- --------------------- ---------- GRAND TOTAL FOR TABLE 29025 10 CLERK TOTAL BY DEPT AND JOB 1300 20 CLERK TOTAL BY DEPT AND JOB 1900 30 CLERK TOTAL BY DEPT AND JOB 950 20 ANALYST TOTAL BY DEPT AND JOB 6000 20 MANAGER TOTAL BY DEPT AND JOB 2975 30 MANAGER TOTAL BY DEPT AND JOB 2850 30 SALESMAN TOTAL BY DEPT AND JOB 5600 10 PRESIDENT TOTAL BY DEPT AND JOB 5000 10 MANAGER TOTAL BY DEPT AND JOB 2450 CLERK TOTAL BY JOB 4150 SALESMAN TOTAL BY JOB 5600 PRESIDENT TOTAL BY JOB 5000 MANAGER TOTAL BY JOB 8275 ANALYST TOTAL BY JOB 6000
As you can see, GROUPING SETS makes it very easy indeed to play around with totals and subtotals in order to look at your data from different angles.
===== MySQL
The first step is to use the aggregate function SUM and group by both DEPTNO and JOB:
select deptno, job,
'TOTAL BY DEPT AND JOB' as category,
sum(sal) as sal
from emp
group by deptno, job
DEPTNO JOB CATEGORY SAL ------ --------- --------------------- ------- 10 CLERK TOTAL BY DEPT AND JOB 1300 10 MANAGER TOTAL BY DEPT AND JOB 2450 10 PRESIDENT TOTAL BY DEPT AND JOB 5000 20 CLERK TOTAL BY DEPT AND JOB 1900 20 ANALYST TOTAL BY DEPT AND JOB 6000 20 MANAGER TOTAL BY DEPT AND JOB 2975 30 CLERK TOTAL BY DEPT AND JOB 950 30 MANAGER TOTAL BY DEPT AND JOB 2850 30 SALESMAN TOTAL BY DEPT AND JOB 5600
The next step is to UNION ALL the sum of all the salaries by JOB:
select deptno, job,
'TOTAL BY DEPT AND JOB' as category,
sum(sal) as sal
from emp
group by deptno, job
union all
select null, job, 'TOTAL BY JOB', sum(sal)
from emp
group by job
DEPTNO JOB CATEGORY SAL ------ --------- --------------------- ------- 10 CLERK TOTAL BY DEPT AND JOB 1300 10 MANAGER TOTAL BY DEPT AND JOB 2450 10 PRESIDENT TOTAL BY DEPT AND JOB 5000 20 CLERK TOTAL BY DEPT AND JOB 1900 20 ANALYST TOTAL BY DEPT AND JOB 6000 20 MANAGER TOTAL BY DEPT AND JOB 2975 30 CLERK TOTAL BY DEPT AND JOB 950 30 MANAGER TOTAL BY DEPT AND JOB 2850 30 SALESMAN TOTAL BY DEPT AND JOB 5600 ANALYST TOTAL BY JOB 6000 CLERK TOTAL BY JOB 4150 MANAGER TOTAL BY JOB 8275 PRESIDENT TOTAL BY JOB 5000 SALESMAN TOTAL BY JOB 5600
The next step is to UNION ALL the sum of all the salaries by DEPTNO:
select deptno, job,
'TOTAL BY DEPT AND JOB' as category,
sum(sal) as sal
from emp
group by deptno, job
union all
select null, job, 'TOTAL BY JOB', sum(sal)
from emp
group by job
union all
select deptno, null, 'TOTAL BY DEPT', sum(sal)
from emp
group by deptno
DEPTNO JOB CATEGORY SAL ------ --------- --------------------- ------- 10 CLERK TOTAL BY DEPT AND JOB 1300 10 MANAGER TOTAL BY DEPT AND JOB 2450 10 PRESIDENT TOTAL BY DEPT AND JOB 5000 20 CLERK TOTAL BY DEPT AND JOB 1900 20 ANALYST TOTAL BY DEPT AND JOB 6000 20 MANAGER TOTAL BY DEPT AND JOB 2975 30 CLERK TOTAL BY DEPT AND JOB 950 30 MANAGER TOTAL BY DEPT AND JOB 2850 30 SALESMAN TOTAL BY DEPT AND JOB 5600 ANALYST TOTAL BY JOB 6000 CLERK TOTAL BY JOB 4150 MANAGER TOTAL BY JOB 8275 PRESIDENT TOTAL BY JOB 5000 SALESMAN TOTAL BY JOB 5600 10 TOTAL BY DEPT 8750 20 TOTAL BY DEPT 10875 30 TOTAL BY DEPT 9400
The final step is to UNION ALL the sum of all salaries in table EMP:
select deptno, job,
'TOTAL BY DEPT AND JOB' as category,
sum(sal) as sal
from emp
group by deptno, job
union all
select null, job, 'TOTAL BY JOB', sum(sal)
from emp
group by job
union all
select deptno, null, 'TOTAL BY DEPT', sum(sal)
from emp
group by deptno
union all
select null,null, 'GRAND TOTAL
FOR TABLE', sum(sal)
from emp
DEPTNO JOB CATEGORY SAL ------ --------- --------------------- ------- 10 CLERK TOTAL BY DEPT AND JOB 1300 10 MANAGER TOTAL BY DEPT AND JOB 2450 10 PRESIDENT TOTAL BY DEPT AND JOB 5000 20 CLERK TOTAL BY DEPT AND JOB 1900 20 ANALYST TOTAL BY DEPT AND JOB 6000 20 MANAGER TOTAL BY DEPT AND JOB 2975 30 CLERK TOTAL BY DEPT AND JOB 950 30 MANAGER TOTAL BY DEPT AND JOB 2850 30 SALESMAN TOTAL BY DEPT AND JOB 5600 ANALYST TOTAL BY JOB 6000 CLERK TOTAL BY JOB 4150 MANAGER TOTAL BY JOB 8275 PRESIDENT TOTAL BY JOB 5000 SALESMAN TOTAL BY JOB 5600 10 TOTAL BY DEPT 8750 20 TOTAL BY DEPT 10875 30 TOTAL BY DEPT 9400 GRAND TOTAL FOR TABLE 29025
=== Identifying Rows That Are Not Subtotals
==== Problem
You’ve used the CUBE extension of the GROUP BY clause to create a report, and you need a way to differentiate between rows that would be generated by a normal GROUP BY clause and those rows that have been generated as a result of using CUBE or ROLLUP.
Following is the result set from a query using the CUBE extension to GROUP BY to create a breakdown of the salaries in table EMP:
DEPTNO JOB SAL ------ --------- ------- 29025 CLERK 4150 ANALYST 6000 MANAGER 8275 SALESMAN 5600 PRESIDENT 5000 10 8750 10 CLERK 1300 10 MANAGER 2450 10 PRESIDENT 5000 20 10875 20 CLERK 1900 20 ANALYST 6000 20 MANAGER 2975 30 9400 30 CLERK 950 30 MANAGER 2850 30 SALESMAN 5600
This report includes the sum of all salaries by DEPTNO and JOB (for each JOB per DEPTNO), the sum of all salaries by DEPTNO, the sum of all salaries by JOB, and finally a grand total (the sum of all salaries in table EMP). You want to clearly identify the different levels of aggregation. You want to be able to identify which category an aggregated value belongs to (i.e., does a given value in the SAL column represent a total by DEPTNO? By JOB? The grand total?). You would like to return the following result set:
DEPTNO JOB SAL DEPTNO_SUBTOTALS JOB_SUBTOTALS ------ --------- ------- ---------------- ------------- 29025 1 1 CLERK 4150 1 0 ANALYST 6000 1 0 MANAGER 8275 1 0 SALESMAN 5600 1 0 PRESIDENT 5000 1 0 10 8750 0 1 10 CLERK 1300 0 0 10 MANAGER 2450 0 0 10 PRESIDENT 5000 0 0 20 10875 0 1 20 CLERK 1900 0 0 20 ANALYST 6000 0 0 20 MANAGER 2975 0 0 30 9400 0 1 30 CLERK 950 0 0 30 MANAGER 2850 0 0 30 SALESMAN 5600 0 0
==== Solution
Use the GROUPING function to identify which values exist due to CUBE’s or ROLLUP’s creation of subtotals, or superaggregate values. The following is an example for PostgreSQL,DB2 and Oracle:
1 select deptno, jo) sal, 2 grouping(deptno) deptno_subtotals, 3 grouping(job) job_subtotals 4 from emp 5 group by cube(deptno,job)
The only difference between the SQL Server solution and that for DB2 and Oracle lies in how the CUBE/ROLLUP clauses are written:
1 select deptno, job, sum(sal) sal, 2 grouping(deptno) deptno_subtotals, 3 grouping(job) job_subtotals 4 from emp 5 group by deptno,job with cube
This recipe is meant to highlight the use of CUBE and GROUPING when working with subtotals. As of the time of this writing, MySQL doesn’t support neither CUBE.
==== Discussion
If DEPTNO_SUBTOTALS is 0 and JOB_SUBTOTALS is 1 (in which case JOB is NULL), the value of SAL represents a subtotal of salaries by DEPTNO created by CUBE. If JOB_SUBTOTALS is 0 and DEPTNO_SUBTOTALS is 1 (in which case DEPTNO is NULL) the value of SAL represents a subtotal of salaries by JOB created by CUBE. Rows with 0 for both DEPTNO_SUBTOTALS and JOB_SUBTOTALS represent rows created by regular aggregation (the sum of SAL for each DEPTNO/JOB combination).
=== Using Case Expressions to Flag Rows
==== Problem
You want to map the values in a column, say, the EMP table’s JOB column, into a series of “Boolean” flags. For example, you wish to return the following result set:
ENAME IS_CLERK IS_SALES IS_MGR IS_ANALYST IS_PREZ ------ -------- -------- ------ ---------- ------- KING 0 0 0 0 1 SCOTT 0 0 0 1 0 FORD 0 0 0 1 0 JONES 0 0 1 0 0 BLAKE 0 0 1 0 0 CLARK 0 0 1 0 0 ALLEN 0 1 0 0 0 WARD 0 1 0 0 0 MARTIN 0 1 0 0 0 TURNER 0 1 0 0 0 SMITH 1 0 0 0 0 MILLER 1 0 0 0 0 ADAMS 1 0 0 0 0 JAMES 1 0 0 0 0
Such a result set can be useful for debugging and to provide yourself a view of the data different from what you’d see in a more typical result set.
==== Solution
Use a CASE expression to evaluate each employee’s JOB, and return a 1 or 0 to signify her JOB. You’ll need to write one CASE expression, and thus create one column for each possible job:
1 select ename, 2 case when job = 'CLERK' 3 then 1 else 0 4 end as is_clerk, 5 case when job = 'SALESMAN' 6 then 1 else 0 7 end as is_sales, 8 case when job = 'MANAGER' 9 then 1 else 0 10 end as is_mgr, 11 case when job = 'ANALYST' 12 then 1 else 0 13 end as is_analyst, 14 case when job = 'PRESIDENT' 15 then 1 else 0 16 end as is_prez 17 from emp 18 order by 2,3,4,5,6
==== Discussion
The solution code is pretty much self-explanatory. If you are having trouble understanding it, simply add JOB to the SELECT clause:
select ename,
job,
case when job = 'CLERK'
then 1 else 0
end as is_clerk,
case when job = 'SALESMAN'
then 1 else 0
end as is_sales,
case when job = 'MANAGER'
then 1 else 0
end as is_mgr,
case when job = 'ANALYST'
then 1 else 0
end as is_analyst,
case when job = 'PRESIDENT'
then 1 else 0
end as is_prez
from emp
order by 2
ENAME JOB IS_CLERK IS_SALES IS_MGR IS_ANALYST IS_PREZ ------ --------- -------- -------- ------ ---------- ------- SCOTT ANALYST 0 0 0 1 0 FORD ANALYST 0 0 0 1 0 SMITH CLERK 1 0 0 0 0 ADAMS CLERK 1 0 0 0 0 MILLER CLERK 1 0 0 0 0 JAMES CLERK 1 0 0 0 0 JONES MANAGER 0 0 1 0 0 CLARK MANAGER 0 0 1 0 0 BLAKE MANAGER 0 0 1 0 0 KING PRESIDENT 0 0 0 0 1 ALLEN SALESMAN 0 1 0 0 0 MARTIN SALESMAN 0 1 0 0 0 TURNER SALESMAN 0 1 0 0 0 WARD SALESMAN 0 1 0 0 0
=== Creating a Sparse Matrix
==== Problem
You want to create a sparse matrix, such as the following one transposing the DEPTNO and JOB columns of table EMP:
D10 D20 D30 CLERKS MGRS PREZ ANALS SALES ---------- ---------- ---------- ------ ----- ---- ----- ------ SMITH SMITH ALLEN ALLEN WARD WARD JONES JONES MARTIN MARTIN BLAKE BLAKE CLARK CLARK SCOTT SCOTT KING KING TURNER TURNER ADAMS ADAMS JAMES JAMES FORD FORD MILLER MILLER
==== Solution
Use CASE expressions to create a sparse row-to-column transformation:
1 select case deptno when 10 then ename end as d10, 2 case deptno when 20 then ename end as d20, 3 case deptno when 30 then ename end as d30, 4 case job when 'CLERK' then ename end as clerks, 5 case job when 'MANAGER' then ename end as mgrs, 6 case job when 'PRESIDENT' then ename end as prez, 7 case job when 'ANALYST' then ename end as anals, 8 case job when 'SALESMAN' then ename end as sales 9 from emp
==== Discussion
To transform the DEPTNO and JOB rows to columns, simply use a CASE expression to evaluate the possible values returned by those rows. That’s all there is to it. As an aside, if you want to “densify” the report and get rid of some of those NULL rows, you would need to find something to group by. For example, use the window function ROW_NUMBER OVER to assign a ranking for each employee per DEPTNO, and then use the aggregate function MAX to rub out some of the NULLs:
select max(case deptno when 10 then ename end) d10,
max(case deptno when 20 then ename end) d20,
max(case deptno when 30 then ename end) d30,
max(case job when 'CLERK' then ename end) clerks,
max(case job when 'MANAGER' then ename end) mgrs,
max(case job when 'PRESIDENT' then ename end) prez,
max(case job when 'ANALYST' then ename end) anals,
max(case job when 'SALESMAN' then ename end) sales
from (
select deptno, job, ename,
row_number()over(partition
by deptno order by empno) rn
from emp
) x
group by rn
D10 D20 D30 CLERKS MGRS PREZ ANALS SALES ---------- ---------- ---------- ------ ----- ---- ----- ------ CLARK SMITH ALLEN SMITH CLARK ALLEN KING JONES WARD JONES KING WARD MILLER SCOTT MARTIN MILLER SCOTT MARTIN ADAMS BLAKE ADAMS BLAKE FORD TURNER FORD TURNER JAMES JAMES
=== Grouping Rows by Units of Time
==== Problem
You want to summarize data by some interval of time. For example, you have a transaction log and want to summarize transactions by 5-second intervals. The rows in table TRX_LOG are shown below:
select trx_id,
trx_date,
trx_cnt
from trx_log
TRX_ID TRX_DATE TRX_CNT ------ -------------------- ---------- 1 28-JUL-2020 19:03:07 44 2 28-JUL-2020 19:03:08 18 3 28-JUL-2020 19:03:09 23 4 28-JUL-2020 19:03:10 29 5 28-JUL-2020 19:03:11 27 6 28-JUL-2020 19:03:12 45 7 28-JUL-2020 19:03:13 45 8 28-JUL-2020 19:03:14 32 9 28-JUL-2020 19:03:15 41 10 28-JUL-2020 19:03:16 15 11 28-JUL-2020 19:03:17 24 12 28-JUL-2020 19:03:18 47 13 28-JUL-2020 19:03:19 37 14 28-JUL-2020 19:03:20 48 15 28-JUL-2020 19:03:21 46 16 28-JUL-2020 19:03:22 44 17 28-JUL-2020 19:03:23 36 18 28-JUL-2020 19:03:24 41 19 28-JUL-2020 19:03:25 33 20 28-JUL-2020 19:03:26 19
You want to return the following result set:
GRP TRX_START TRX_END TOTAL --- -------------------- -------------------- ---------- 1 28-JUL-2020 19:03:07 28-JUL-2020 19:03:11 141 2 28-JUL-2020 19:03:12 28-JUL-2020 19:03:16 178 3 28-JUL-2020 19:03:17 28-JUL-2020 19:03:21 202 4 28-JUL-2020 19:03:22 28-JUL-2020 19:03:26 173
==== Solution
Group the entries into five row buckets. There are several ways to accomplish that logical grouping; this recipe does so by dividing the TRX_ID values by 5, using a technique shown earlier in “Creating Buckets of Data, of a Fixed Size.”
Once you’ve created the “groups,” use the aggregate functions MIN, MAX, and SUM to find the start time, end time, and total number of transactions for each “group” (SQL Server users should use CEILING instead of CEIL):
1 select ceil(trx_id/5.0) as grp, 2 min(trx_date) as trx_start, 3 max(trx_date) as trx_end, 4 sum(trx_cnt) as total 5 from trx_log 6 group by ceil(trx_id/5.0)
==== Discussion
The first step, and the key to the whole solution, is to logically group the rows together. By dividing by 5 and taking the smallest whole number greater than the quotient, you can create logical groups. For example:
select trx_id,
trx_date,
trx_cnt,
trx_id/5.0 as val,
ceil(trx_id/5.0) as grp
from trx_log
TRX_ID TRX_DATE TRX_CNT VAL GRP ------ -------------------- ------- ------ --- 1 28-JUL-2020 19:03:07 44 .20 1 2 28-JUL-2020 19:03:08 18 .40 1 3 28-JUL-2020 19:03:09 23 .60 1 4 28-JUL-2020 19:03:10 29 .80 1 5 28-JUL-2020 19:03:11 27 1.00 1 6 28-JUL-2020 19:03:12 45 1.20 2 7 28-JUL-2020 19:03:13 45 1.40 2 8 28-JUL-2020 19:03:14 32 1.60 2 9 28-JUL-2020 19:03:15 41 1.80 2 10 28-JUL-2020 19:03:16 15 2.00 2 11 28-JUL-2020 19:03:17 24 2.20 3 12 28-JUL-2020 19:03:18 47 2.40 3 13 28-JUL-2020 19:03:19 37 2.60 3 14 28-JUL-2020 19:03:20 48 2.80 3 15 28-JUL-2020 19:03:21 46 3.00 3 16 28-JUL-2020 19:03:22 44 3.20 4 17 28-JUL-2020 19:03:23 36 3.40 4 18 28-JUL-2020 19:03:24 41 3.60 4 19 28-JUL-2020 19:03:25 33 3.80 4 20 28-JUL-2020 19:03:26 19 4.00 4
The last step is to apply the appropriate aggregate functions to find the total number of transactions per 5 seconds along with the start and end times for each transaction:
select ceil(trx_id/5.0) as grp,
min(trx_date) as trx_start,
max(trx_date) as trx_end,
sum(trx_cnt) as total
from trx_log
group
by ceil(trx_id/5.0)
GRP TRX_START TRX_END TOTAL --- -------------------- -------------------- ---------- 1 28-JUL-2020 19:03:07 28-JUL-2005 19:03:11 141 2 28-JUL-2020 19:03:12 28-JUL-2005 19:03:16 178 3 28-JUL-2020 19:03:17 28-JUL-2005 19:03:21 202 4 28-JUL-2020 19:03:22 28-JUL-2005 19:03:26 173
If your data is slightly different (perhaps you don’t have an ID for each row), you can always “group” by dividing the seconds of each TRX_DATE row by 5 to create a similar grouping. Then you can include the hour for each TRX_DATE and group by the actual hour and logical “grouping,” GRP. Following is an example of this technique (using Oracle’s TO_CHAR and TO_NUMBER functions, you would use the appropriate date and character formatting functions for your platform):
select trx_date,trx_cnt,
to_number(to_char(trx_date,'hh24')) hr,
ceil(to_number(to_char(trx_date-1/24/60/60,'miss'))/5.0) grp
from trx_log
TRX_DATE 20 TRX_CNT HR GRP -------------------- ---------- ---------- ---------- 28-JUL-2020 19:03:07 44 19 62 28-JUL-2020 19:03:08 18 19 62 28-JUL-2020 19:03:09 23 19 62 28-JUL-2020 19:03:10 29 19 62 28-JUL-2020 19:03:11 27 19 62 28-JUL-2020 19:03:12 45 19 63 28-JUL-2020 19:03:13 45 19 63 28-JUL-2020 19:03:14 32 19 63 28-JUL-2020 19:03:15 41 19 63 28-JUL-2020 19:03:16 15 19 63 28-JUL-2020 19:03:17 24 19 64 28-JUL-2020 19:03:18 47 19 64 28-JUL-2020 19:03:19 37 19 64 28-JUL-2020 19:03:20 48 19 64 28-JUL-2020 19:03:21 46 19 64 28-JUL-2020 19:03:22 44 19 65 28-JUL-2020 19:03:23 36 19 65 28-JUL-2020 19:03:24 41 19 65 28-JUL-2020 19:03:25 33 19 65 28-JUL-2020 19:03:26 19 19 65
Regardless of the actual values for GRP, the key here is that you are grouping for every 5 seconds. From there you can apply the aggregate functions in the same way as in the original solution:
select hr,grp,sum(trx_cnt) total
from (
select trx_date,trx_cnt,
to_number(to_char(trx_date,'hh24')) hr,
ceil(to_number(to_char(trx_date-1/24/60/60,'miss'))/5.0) grp
from trx_log
) x
group
by hr,grp
HR GRP TOTAL -- ---------- ---------- 19 62 141 19 63 178 19 64 202 19 65 173
Including the hour in the grouping is useful if your transaction log spans hours. In DB2 and Oracle, you can also use the window function SUM OVER to produce the same result. The following query returns all rows from TRX_LOG along with a running total for TRX_CNT by logical “group,” and the TOTAL for TRX_CNT for each row in the “group”:
select trx_id, trx_date, trx_cnt,
sum(trx_cnt)over(partition by ceil(trx_id/5.0)
order by trx_date
range between unbounded preceding
and current row) runing_total,
sum(trx_cnt)over(partition by ceil(trx_id/5.0)) total,
case when mod(trx_id,5.0) = 0 then 'X' end grp_end
from trx_log
TRX_ID TRX_DATE TRX_CNT RUNING_TOTAL TOTAL GRP_END ------ -------------------- ---------- ------------ ---------- ------- 1 28-JUL-2020 19:03:07 44 44 141 2 28-JUL-2020 19:03:08 18 62 141 3 28-JUL-2020 19:03:09 23 85 141 4 28-JUL-2020 19:03:10 29 114 141 5 28-JUL-2020 19:03:11 27 141 141 X 6 28-JUL-2020 19:03:12 45 45 178 7 28-JUL-2020 19:03:13 45 90 178 8 28-JUL-2020 19:03:14 32 122 178 9 28-JUL-2020 19:03:15 41 163 178 10 28-JUL-2020 19:03:16 15 178 178 X 11 28-JUL-2020 19:03:17 24 24 202 12 28-JUL-2020 19:03:18 47 71 202 13 28-JUL-2020 19:03:19 37 108 202 14 28-JUL-2020 19:03:20 48 156 202 15 28-JUL-2020 19:03:21 46 202 202 X 16 28-JUL-2020 19:03:22 44 44 173 17 28-JUL-2020 19:03:23 36 80 173 18 28-JUL-2020 19:03:24 41 121 173 19 28-JUL-2020 19:03:25 33 154 173 20 28-JUL-2020 19:03:26 19 173 173 X
=== Performing Aggregations over Different Groups/Partitions Simultaneously
==== Problem
You want to aggregate over different dimensions at the same time. For example, you want to return a result set that lists each employee’s name, their department, the number of employees in their department (themselves included), the number of employees that have the same job as he does (themselves included in this count as well), and the total number of employees in the EMP table. The result set should look like the following:
ENAME DEPTNO DEPTNO_CNT JOB JOB_CNT TOTAL ------ ------ ---------- --------- -------- ------ MILLER 10 3 CLERK 4 14 CLARK 10 3 MANAGER 3 14 KING 10 3 PRESIDENT 1 14 SCOTT 20 5 ANALYST 2 14 FORD 20 5 ANALYST 2 14 SMITH 20 5 CLERK 4 14 JONES 20 5 MANAGER 3 14 ADAMS 20 5 CLERK 4 14 JAMES 30 6 CLERK 4 14 MARTIN 30 6 SALESMAN 4 14 TURNER 30 6 SALESMAN 4 14 WARD 30 6 SALESMAN 4 14 ALLEN 30 6 SALESMAN 4 14 BLAKE 30 6 MANAGER 3 14
==== Solution Use the COUNT OVER window function while specifying different partitions, or groups of data on which to perform aggregation:
select ename, deptno, count(*)over(partition by deptno) deptno_cnt, job, count(*)over(partition by job) job_cnt, count(*)over() total from emp
==== Discussion
This example really shows off the power and convenience of window functions. By simply specifying different partitions or groups of data to aggregate, you can create immensely detailed reports without having to self join over and over, and without having to write cumbersome and perhaps poorly performing subqueries in your SELECT list. All the work is done by the window function COUNT OVER. To understand the output, focus on the OVER clause for a moment for each COUNT operation:
count(*)over(partition by deptno) count(*)over(partition by job) count(*)over()
Remember the main parts of the OVER clause: the PARTITION BY sublclause, dividing the query into partitions: and the ORDER BY subclause, defining the logical order. Look at the first COUNT, which partitions by DEPTNO. The rows in table EMP will be grouped by DEPTNO and the COUNT operation will be performed on all the rows in each group. Since there is no frame or window clause specified (no ORDER BY), all the rows in the group are counted. The PARTITION BY clause finds all the unique DEPTNO values, and then the COUNT function counts the number of rows having each value. In the specific example of COUNT(*)OVER(PARTITION BY DEPTNO), The PARTITION BY clause identifies the partitions or groups to be values 10, 20, and 30.
The same processing is applied to the second COUNT, which partitions by JOB. The last count does not partition by anything, and simply has an empty parenthesis. An empty parenthesis implies “the whole table.” So, whereas the two prior COUNTs aggregate values based on the defined groups or partitions, the final COUNT counts all rows in table EMP.
Keep in mind that window functions are applied after the WHERE clause. If you were to filter the result set in some way, for example, excluding all employees in DEPTNO 10, the value for TOTAL would not be 14, it would be 11. To filter results after window functions have been evaluated, you must make your windowing query into an inline view and then filter on the results from that view.
=== Performing Aggregations over a Moving Range of Values
==== Problem
You want to compute a moving aggregation, such as a moving sum on the salaries in table EMP. You want to compute a sum for every 90 days, starting with the HIREDATE of the first employee. You want to see how spending has fluctuated for every 90-day period between the first and last employee hired. You want to return the following result set:
HIREDATE SAL SPENDING_PATTERN ----------- ------- ---------------- 17-DEC-200 800 800 20-FEB-2011 1600 2400 22-FEB-2011 1250 3650 02-APR-2011 2975 5825 01-MAY-2011 2850 8675 09-JUN-2011 2450 8275 08-SEP-2011 1500 1500 28-SEP-2011 1250 2750 17-NOV-2011 5000 7750 03-DEC-2011 950 11700 03-DEC-2011 3000 11700 23-JAN-2012 1300 10250 09-DEC-2012 3000 3000 12-JAN-2013 1100 4100
==== Solution
Being able to specify a moving window in the framing or windowing clause of window functions makes this problem very easy to solve, if your RDBMS supports such functions. The key is to order by HIREDATE in your window function and then specify a window of 90 days starting from the earliest employee hired. The sum will be computed using the salaries of employees hired up to 90 days prior to the current employee’s HIREDATE (the current employee is included in the sum). If you do not have window functions available, you can use scalar subqueries, but the solution will be more complex.
===== DB2 and Oracle
For DB2 and Oracle, use the window function SUM OVER and order by HIREDATE. Specify a range of 90 days in the window or “framing” clause to allow the sum to be computed for each employee’s salary and to include the salaries of all employees hired up to 90 days earlier. Because DB2 does not allow you to specify HIREDATE in the ORDER BY clause of a window function (line 3 below), you can order by DAYS(HIREDATE) instead:
1 select hiredate, 2 sal, 3 sum(sal)over(order by days(hiredate) 4 range between 90 preceding 5 and current row) spending_pattern 6 from emp e
The Oracle solution is more straightforward than DB2’s, because Oracle allows window functions to order by datetime types:
1 select hiredate, 2 sal, 3 sum(sal)over(order by hiredate 4 range between 90 preceding 5 and current row) spending_pattern 6 from emp e
===== MySQL Use windown function with slightly altered syntax.
1 select hiredate, 2 sal, 3 sum(sal)over(order by hiredate 4 range interval 90 day preceding ) spending_pattern 5 from emp e
===== PostgreSQL, and SQL Server
Use a scalar subquery to sum the salaries of all employees hired up to 90 days prior to the day each employee was hired:
1 select e.hiredate, 2 e.sal, 3 (select sum(sal) from emp d 4 whered.hiredate between e.hiredate-90 5 and e.hiredate) as spending_pattern 6 from emp e 7 order by 1
==== Discussion
===== DB2, MySQL and Oracle
DB2, MySQL and Oracle share the same lofical solution. The only minor differences between the solutions, lie in how you specify HIREDATE in the ORDER BY clause of the window function and the syntax of specifying the time intervale in MySQL. At the time of this book’s writing, DB2 doesn’t allow a DATE value in such an ORDER BY clause if you are using a numeric value to set the window’s range. (For example, RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW allows you to order by a date, but RANGE BETWEEN 90 PRECEDING AND CURRENT ROW does not.)
To understand what the solution query is doing, you simply need to understand what the window clause is doing. The window you are defining orders the salaries for all employees by HIREDATE. Then the function computes a sum. The sum is not computed for all salaries. Instead, the processing is as follows:
The salary of the first employee hired is evaluated. Since no employees were hired before the first employee, the sum at this point is simply the first employee’s salary.
The salary of the next employee (by HIREDATE) is evaluated. This employee’s salary is included in the moving sum along with any other employees who were hired up to 90 days prior.
The HIREDATE of the first employee is December 17, 2010, and the HIREDATE of the next hired employee is February 20, 2011. The second employee was hired less than 90 days after the first employee, and thus the moving sum for the second employee is 2400 (1600 + 800). If you are having trouble understanding where the values in SPENDING_PATTERN come from, examine the following query and result set:
select distinct
dense_rank()
over(order by e.hiredate) window,
e.hiredate current_hiredate,
d.hiredate hiredate_within_90_days,
d.sal sals_used_for_sum
from emp e,
emp d
where d.hiredate between e.hiredate-90 and e.hiredate
WINDOW CURRENT_HIREDATE HIREDATE_WITHIN_90_DAYS SALS_USED_FOR_SUM ------ ---------------- ----------------------- ----------------- 1 17-DEC-2010 17-DEC-2010 800 2 20-FEB-2011 17-DEC-2010 800 2 20-FEB-2011 20-FEB-2011 1600 3 22-FEB-2011 17-DEC-2010 800 3 22-FEB-2011 20-FEB-2011 1600 3 22-FEB-2011 22-FEB-2011 1250 4 02-APR-2011 20-FEB-2011 1600 4 02-APR-2011 22-FEB-2011 1250 4 02-APR-2011 02-APR-2011 2975 5 01-MAY-2011 20-FEB-2011 1600 5 01-MAY-2011 22-FEB-2011 1250 5 01-MAY-2011 02-APR-2011 2975 5 01-MAY-2011 01-MAY-2011 2850 6 09-JUN-2011 02-APR-2011 2975 6 09-JUN-2011 01-MAY-2011 2850 6 09-JUN-2011 09-JUN-2011 2450 7 08-SEP-2011 08-SEP-2011 1500 8 28-SEP-2011 08-SEP-2011 1500 8 28-SEP-2011 28-SEP-2011 1250 9 17-NOV-2011 08-SEP-2011 1500 9 17-NOV-2011 28-SEP-2011 1250 9 17-NOV-2011 17-NOV-2011 5000 10 03-DEC-2011 08-SEP-2011 1500 10 03-DEC-2011 28-SEP-2011 1250 10 03-DEC-2011 17-NOV-2011 5000 10 03-DEC-2011 03-DEC-2011 950 10 03-DEC-2011 03-DEC-2011 3000 11 23-JAN-2012 17-NOV-2011 5000 11 23-JAN-2012 03-DEC-2011 950 11 23-JAN-2012 03-DEC-2011 3000 11 23-JAN-2012 23-JAN-2012 1300 12 09-DEC-2012 09-DEC-2012 3000 13 12-JAN-2013 09-DEC-2012 3000 13 12-JAN-2013 12-JAN-2013 1100
If you look at the WINDOW column, only those rows with the same WINDOW value will be considered for each sum. Take for example, WINDOW 3. The salaries used for the sum for that window are 800, 1600, and 1250, which total 3650. If you look at the final result set in the “Problem” section, you’ll see the SPENDING_PATTERN for February 22, 2011 (WINDOW 3) is 3650. As proof, to verify that the above self join includes the correct salaries for the windows defined, simply sum the values in SALS_USED_FOR_SUM and group by CURRENT_DATE. The result should be the same as the result set shown in the “Problem” section (with the duplicate row for December 3, 2011, filtered out):
select current_hiredate,
sum(sals_used_for_sum) spending_pattern
from (
select distinct
dense_rank()
over(order by e.hiredate) window,
e.hiredate current_hiredate,
d.hiredate hiredate_within_90_days,
d.sal sals_used_for_sum
from emp e,
emp d
where d.hiredate between e.hiredate-90 and e.hiredate
) x
group by current_hiredate
CURRENT_HIREDATE SPENDING_PATTERN ---------------- ---------------- 17-DEC-2010 800 20-FEB-2011 2400 22-FEB-2011 3650 02-APR-2011 5825 01-MAY-2011 8675 09-JUN-2011 8275 08-SEP-2011 1500 28-SEP-2011 2750 17-NOV-2011 7750 03-DEC-2011 11700 23-JAN-2012 10250 09-DEC-2012 3000 12-JAN-2013 4100
===== PostgreSQL, and SQL Server
The key to this solution is to use a scalar subquery (a self join will work as well) while using the aggregate function SUM to compute a sum for every 90 days based on HIREDATE. If you are having trouble seeing how this works, simply convert the solution to a self join and examine which rows are included in the computations. Consider the result set below, which returns the same result set as that in the solution:
select e.hiredate,
e.sal,
sum(d.sal) as spending_pattern
from emp e, emp d
where d.hiredate
between e.hiredate-90 and e.hiredate
group by e.hiredate,e.sal
order by 1
HIREDATE SAL SPENDING_PATTERN ----------- ----- ---------------- 17-DEC-2010 800 800 20-FEB-2011 1600 2400 22-FEB-2011 1250 3650 02-APR-2011 2975 5825 01-MAY-2011 2850 8675 09-JUN-2011 2450 8275 08-SEP-2011 1500 1500 28-SEP-2011 1250 2750 17-NOV-2011 5000 7750 03-DEC-2011 950 11700 03-DEC-2011 3000 11700 23-JAN-2012 1300 10250 09-DEC-2012 3000 3000 12-JAN-2013 1100 4100
If it is still unclear, simply remove the aggregation and start with the Cartesian product. The first step is to generate a Cartesian product using table EMP so that each HIREDATE can be compared with all the other HIREDATEs. [Only a snippet of the result set is shown below because there are 196 rows (14x14) returned by a Cartesian of EMP.]
select e.hiredate,
e.sal,
d.sal,
d.hiredate
from emp e, emp d
HIREDATE SAL SAL HIREDATE ----------- ----- ----- ----------- 17-DEC-2010 800 800 17-DEC-2010 17-DEC-2010 800 1600 20-FEB-2011 17-DEC-2010 800 1250 22-FEB-2011 17-DEC-2010 800 2975 02-APR-2011 17-DEC-2010 800 1250 28-SEP-2011 17-DEC-2010 800 2850 01-MAY-2011 17-DEC-2010 800 2450 09-JUN-2011 17-DEC-2010 800 3000 09-DEC-2012 17-DEC-2010 800 5000 17-NOV-2011 17-DEC-2010 800 1500 08-SEP-2011 17-DEC-2010 800 1100 12-JAN-2013 17-DEC-2010 800 950 03-DEC-2011 17-DEC-2010 800 3000 03-DEC-2011 17-DEC-2010 800 1300 23-JAN-2012 20-FEB-2011 1600 800 17-DEC-2010 20-FEB-2011 1600 1600 20-FEB-2011 20-FEB-2011 1600 1250 22-FEB-2011 20-FEB-2011 1600 2975 02-APR-2011 20-FEB-2011 1600 1250 28-SEP-2011 20-FEB-2011 1600 2850 01-MAY-2011 20-FEB-2011 1600 2450 09-JUN-2011 20-FEB-2011 1600 3000 09-DEC-2012 20-FEB-2011 1600 5000 17-NOV-2011 20-FEB-2011 1600 1500 08-SEP-2011 20-FEB-2011 1600 1100 12-JAN-2013 20-FEB-2011 1600 950 03-DEC-2011 20-FEB-2011 1600 3000 03-DEC-2011 20-FEB-2011 1600 1300 23-JAN-2012
If you examine the result set above, you’ll notice that there is no HIREDATE 90 days earlier or equal to December 17, except for December 17. So, the sum for that row should be only 800. If you examine the next HIREDATE, February 20, you’ll notice that there is one HIREDATE that falls within the 90-day window (within 90 days prior), and that is December 17. If you sum the SAL from December 17 with the SAL from February 20 (because we are looking for HIREDATEs equal to each HIREDATE or within 90 days earlier) you get 2400, which happens to be the final result for that HIREDATE.
Now that you know how it works, use a filter in the WHERE clause to return for each HIREDATE and HIREDATE that is equal to it or is no more than 90 days earlier:
select e.hiredate,
e.sal,
d.sal sal_to_sum,
d.hiredate within_90_days
from emp e, emp d
where d.hiredate
between e.hiredate-90 and e.hiredate
order by 1
HIREDATE SAL SAL_TO_SUM WITHIN_90_DAYS ----------- ----- ---------- -------------- 17-DEC-2010 800 800 17-DEC-2010 20-FEB-2011 1600 800 17-DEC-2010 20-FEB-2011 1600 1600 20-FEB-2011 22-FEB-2011 1250 800 17-DEC-2010 22-FEB-2011 1250 1600 20-FEB-2011 22-FEB-2011 1250 1250 22-FEB-2011 02-APR-2011 2975 1600 20-FEB-2011 02-APR-2011 2975 1250 22-FEB-2011 02-APR-2011 2975 2975 02-APR-2011 01-MAY-2011 2850 1600 20-FEB-2011 01-MAY-2011 2850 1250 22-FEB-2011 01-MAY-2011 2850 2975 02-APR-2011 01-MAY-2011 2850 2850 01-MAY-2011 09-JUN-2011 2450 2975 02-APR-2011 09-JUN-2011 2450 2850 01-MAY-2011 09-JUN-2011 2450 2450 09-JUN-2011 08-SEP-2011 1500 1500 08-SEP-2011 28-SEP-2011 1250 1500 08-SEP-2011 28-SEP-2011 1250 1250 28-SEP-2011 17-NOV-2011 5000 1500 08-SEP-2011 17-NOV-2011 5000 1250 28-SEP-2011 17-NOV-2011 5000 5000 17-NOV-2011 03-DEC-2011 950 1500 08-SEP-2011 03-DEC-2011 950 1250 28-SEP-2011 03-DEC-2011 950 5000 17-NOV-2011 03-DEC-2011 950 950 03-DEC-2011 03-DEC-2011 950 3000 03-DEC-2011 03-DEC-2011 3000 1500 08-SEP-2011 03-DEC-2011 3000 1250 28-SEP-2011 03-DEC-2011 3000 5000 17-NOV-2011 03-DEC-2011 3000 950 03-DEC-2011 03-DEC-2011 3000 3000 03-DEC-2011 23-JAN-2012 1300 5000 17-NOV-2011 23-JAN-2012 1300 950 03-DEC-2011 23-JAN-2012 1300 3000 03-DEC-2011 23-JAN-2012 1300 1300 23-JAN-2012 09-DEC-2012 3000 3000 09-DEC-2012 12-JAN-2013 1100 3000 09-DEC-2012 12-JAN-2013 1100 1100 12-JAN-2013
Now that you know which SALs are to be included in the moving window of summation, simply use the aggregate function SUM to produce a more expressive result set:
select e.hiredate, e.sal, sum(d.sal) as spending_pattern from emp e, emp d where d.hiredate between e.hiredate-90 and e.hiredate group by e.hiredate,e.sal order by 1
If you compare the result set for the query above and the result set for the query below (which is the original solution presented), you will see they are the same:
select e.hiredate, e.sal, (select sum(sal) from emp d where d.hiredate between e.hiredate-90 and e.hiredate) as spending_pattern from emp e order by 1 HIREDATE SAL SPENDING_PATTERN ----------- ----- ---------------- 17-DEC-2010 800 800 20-FEB-2011 1600 2400 22-FEB-2011 1250 3650 02-APR-2011 2975 5825 01-MAY-2011 2850 8675 09-JUN-2011 2450 8275 08-SEP-2011 1500 1500 28-SEP-2011 1250 2750 17-NOV-2011 5000 7750 03-DEC-2011 950 11700 03-DEC-2011 3000 11700 23-JAN-2012 1300 10250 09-DEC-2012 3000 3000 12-JAN-2013 1100 4100
=== Pivoting a Result Set with Subtotals
==== Problem
You want to create a report containing subtotals, then transpose the results to provide a more readable report. For example, you’ve been asked to create a report that displays for each department, the managers in the department along with a sum of the salaries of the employees who work for those managers. Additionally, you want to return two subtotals: the sum of all salaries in each department for those employees who have managers, and a sum of all salaries in the result set (the sum of the department subtotals). You currently have the following report:
DEPTNO MGR SAL ------ ---------- ---------- 10 7782 1300 10 7839 2450 10 3750 20 7566 6000 20 7788 1100 20 7839 2975 20 7902 800 20 10875 30 7698 6550 30 7839 2850 30 9400 24025
You want to provide a more readable report and wish to transform the above result set to the following, which makes the meaning of the report much more clear:
MGR DEPT10 DEPT20 DEPT30 TOTAL ---- ---------- ---------- ---------- ---------- 7566 0 6000 0 7698 0 0 6550 7782 1300 0 0 7788 0 1100 0 7839 2450 2975 2850 7902 0 800 0 3750 10875 9400 24025
==== Solution
The first step is to generate subtotals using the ROLLUP extension to GROUP BY. The next step is to perform a classic pivot (aggregate and CASE expression) to create the desired columns for your report. The GROUPING function allows you to easily determine which values are subtotals (that is, exist because of ROLLUP and otherwise would not normally be there). Depending on how your RDBMS sorts NULL values, you may need to add an ORDER BY to the solution to allow it to look like the target result set above.
===== DB2 and Oracle
Use the ROLLUP extension to GROUP BY then use a CASE expression to format the data into a more readable report:
1 select mgr, 2 sum(case deptno when 10 then sal else 0 end) dept10, 3 sum(case deptno when 20 then sal else 0 end) dept20, 4 sum(case deptno when 30 then sal else 0 end) dept30, 5 sum(case flag when '11' then sal else null end) total 6 from ( 7 select deptno,mgr,sum(sal) sal, 8 cast(grouping(deptno) as char(1))|| 9 cast(grouping(mgr) as char(1)) flag 10 from emp 11 where mgr is not null 12 group by rollup(deptno,mgr) 13 ) x 14 group by mgr
===== SQL Server
Use the ROLLUP extension to GROUP BY then use a CASE expression to format the data into a more readable report:
1 select mgr, 2 sum(case deptno when 10 then sal else 0 end) dept10, 3 sum(case deptno when 20 then sal else 0 end) dept20, 4 sum(case deptno when 30 then sal else 0 end) dept30, 5 sum(case flag when '11' then sal else null end) total 6 from ( 7 select deptno,mgr,sum(sal) sal, 8 cast(grouping(deptno) as char(1))+ 9 cast(grouping(mgr) as char(1)) flag 10 from emp 11 where mgr is not null 12 group by deptno,mgr with rollup 13 ) x 14 group by mgr
===== PostgreSQL
Use the ROLLUP extension to GROUP BY then use a CASE expression to format the data into a more readable report:
1 select mgr, 2 sum(case deptno when 10 then sal else 0 end) dept10, 3 sum(case deptno when 20 then sal else 0 end) dept20, 4 sum(case deptno when 30 then sal else 0 end) dept30, 5 sum(case flag when '11' then sal else null end) total 6 from ( 7 select deptno,mgr,sum(sal) sal, 8 concat(cast (grouping(deptno) as char(1)), 9 cast(grouping(mgr) as char(1))) flag 10 from emp 11 where mgr is not null 12 group by rollup (deptno,mgr) 13 ) x 14 group by mgr
===== MySQL
Use the ROLLUP extension to GROUP BY then use a CASE expression to format the data into a more readable report:
1 select mgr, 2 sum(case deptno when 10 then sal else 0 end) dept10, 3 sum(case deptno when 20 then sal else 0 end) dept20, 4 sum(case deptno when 30 then sal else 0 end) dept30, 5 sum(case flag when '11' then sal else null end) total 6 from ( 7 select deptno,mgr,sum(sal) sal, 8 concat( cast(grouping(deptno) as char(1)) , 9 cast(grouping(mgr) as char(1))) flag 10 from emp 11 where mgr is not null 12 group by deptno,mgr with rollup 13 ) x 14 group by mgr;
==== Discussion
The solutions provided above are identical except for the string concatenation and how GROUPING is specified. Because the solutions are so similar, the discussion below will refer to the SQL Server solution to highlight the intermediate result sets (the discussion is relevant to DB2 and Oracle as well).
The first step is to generate a result set that sums the SAL for the employees in each DEPTNO per MGR. The idea is to show how much the employees make under a particular manager in a particular department. For example, this query below will allow you to compare the salaries of employees who work for KING in DEPTNO 10 compared with those who work for KING in DEPTNO 30:
select deptno,mgr,sum(sal) sal from emp where mgr is not null group by mgr,deptno order by 1,2 DEPTNO MGR SAL ------ ---------- ---------- 10 7782 1300 10 7839 2450 20 7566 6000 20 7788 1100 20 7839 2975 20 7902 800 30 7698 6550 30 7839 2850
The next step is to use the ROLLUP extension to GROUP BY to create subtotals for each DEPTNO and across all employees (who have a manager):
select deptno,mgr,sum(sal) sal from emp where mgr is not null group by deptno,mgr with rollup DEPTNO MGR SAL ------ ---------- ---------- 10 7782 1300 10 7839 2450 10 3750 20 7566 6000 20 7788 1100 20 7839 2975 20 7902 800 20 10875 30 7698 6550 30 7839 2850 30 9400 24025
With the subtotals created, you need a way to determine which values are in fact subtotals (created by ROLLUP) and which are results of the regular GROUP BY. Use the GROUPING function to create bitmaps to help identify the subtotal values from the regular aggregate values:
select deptno,mgr,sum(sal) sal, cast(grouping(deptno) as char(1))+ cast(grouping(mgr) as char(1)) flag from emp where mgr is not null group by deptno,mgr with rollup DEPTNO MGR SAL FLAG ------ ---------- ---------- ---- 10 7782 1300 00 10 7839 2450 00 10 3750 01 20 7566 6000 00 20 7788 1100 00 20 7839 2975 00 20 7902 800 00 20 10875 01 30 7698 6550 00 30 7839 2850 00 30 9400 01 24025 11
If it isn’t immediately obvious, the rows with a value of 00 for FLAG are the results of regular aggregation. The rows with a value of 01 for FLAG are the results of ROLLUP aggregating SAL by DEPTNO (since DEPTNO is listed first in the ROLLUP; if you switch the order, for example, “GROUP BY MGR, DEPTNO WITH ROLLUP”, you’d see quite different results). The row with a value of 11 for FLAG is the result of ROLLUP aggregating SAL over all rows.
At this point you have everything you need to create a beautified report by simply using CASE expressions. The goal is to provide a report that shows employee salaries for each manager across departments. If a manager does not have any subordinates in a particular department, a zero should be returned; otherwise, you want to return the sum of all salaries for that manager’s subordinates in that department. Additionally, you want to add a final column, TOTAL, representing a sum of all the salaries in the report. The solution satisfying all these requirements is shown below:
select mgr, sum(case deptno when 10 then sal else 0 end) dept10, sum(case deptno when 20 then sal else 0 end) dept20, sum(case deptno when 30 then sal else 0 end) dept30, sum(case flag when '11' then sal else null end) total from ( select deptno,mgr,sum(sal) sal, cast(grouping(deptno) as char(1))+ cast(grouping(mgr) as char(1)) flag from emp where mgr is not null group by deptno,mgr with rollup ) x group by mgr order by coalesce(mgr,9999) MGR DEPT10 DEPT20 DEPT30 TOTAL ---- ---------- ---------- ---------- ---------- 7566 0 6000 0 7698 0 0 6550 7782 1300 0 0 7788 0 1100 0 7839 2450 2975 2850 7902 0 800 0 3750 10875 9400 24025
=== Summing Up
Databases are for storing data, but eventually someone needs to retrieve the data, and present it somewhere. The recipes in this chapter show a variety of important ways that data can be re-shaped or formatted to meet the needs of users. Apart from their general usefulness in giving users data in the form they need, these techniques play an important role in giving a database owner the ability to create a datawarehouse. As you gain more experience in supporting users in the business, you will become more adept, and extend the ideas here into more elaborate presentations.