The last thing we'll do today is look at a method for generating a pivot table using aggregates and grouping. Pivot tables, also called cross-tabs, are different from regular result sets because the column headings are drawn from the data itself. One column in the data is selected to become the headings for the cross-tab report. To understand this better, you need to see the result set for a pivot table. Then we will step through the process of building it. The result set we will produce analyzes current (1998) and prior (1997) year orders for the first five months of the year:
Month Orders 1998 Orders 1997 Percent Change --------- ----------- ----------- ------------------------- January 55 33 66.666666 February 54 29 86.206896 March 73 30 143.333333 April 74 31 138.709677 May 14 32 -56.250000 NULL 270 155 74.193548
The months (extracted from the export date) are displayed on the left. The years (also extracted from the export date) are displayed at the top. Each cell (intersection of a month and year) contains the result of an aggregate based on that month and year. At the right is the percent change from the prior year to the current year for each month. At the bottom is a year-to-date total.
Note
Everything we have learned so far will end up in this query. Make sure that you feel pretty good about the material. You might even want to work on some of the exercises at the end before you go on.
We will take this example slowly, as we did the rest. Work through each stage of the code. Test it, change it, break it: It's the only way to learn.
The key to this query is the CASE operator. You learned about this yesterday, and now it's time to put it to good use.
We'll use CASE to examine the year of each order and decide whether the order belongs in a count of orders for 1998 or 1997. This query looks at 10 individual rows, performing that analysis:
set rowcount 10 select OrderDate, datepart(yy, OrderDate) as 'Year', case datepart(yy, OrderDate) when 1998 then 1 else 0 end as 'is1998' from Orders where OrderDate >= '12/30/97' set rowcount 0
Results:
OrderDate Year 1998 --------------------------- ----------- ----------- 1997-12-30 00:00:00.000 1997 0 1997-12-30 00:00:00.000 1997 0 1997-12-30 00:00:00.000 1997 0 1997-12-31 00:00:00.000 1997 0 1997-12-31 00:00:00.000 1997 0 1998-01-01 00:00:00.000 1998 1 1998-01-01 00:00:00.000 1998 1 1998-01-01 00:00:00.000 1998 1 1998-01-02 00:00:00.000 1998 1 1998-01-02 00:00:00.000 1998 1
Note
The first and last lines set a limit on the number of rows returned by a query. I wanted to limit the output to just 10 rows.
The WHERE condition was specifically chosen to find rows that select rows in 1997 and 1998. This enables you to see that the CASE expression returns 0 for rows that do not fall in 1998 and 1 for rows that do fall in 1998.
The expression to find rows in 1997 uses the same logic. Here is a query that lists the orders and includes a column for each year. The left column displays the month of the order.
set rowcount 10 select datename(mm, OrderDate) as 'Month', case datepart(yy, OrderDate) when 1998 then 1 else 0 end as 'is1998', case datepart(yy, OrderDate) when 1997 then 1 else 0 end as 'is1997' from Orders where OrderDate >= '12/30/97' set rowcount 0
Results:
Month is1998 is1997 ------------------------------ ----------- ----------- December 0 1 December 0 1 December 0 1 December 0 1 December 0 1 January 1 0 January 1 0 January 1 0 January 1 0 January 1 0
Notice that there is still no grouping clause, so each row in the result reflects a single row in the Orders table.
If we had a table with the columns Month, is1998, and is1997, it would be a simple matter to produce the grouped result. We would add up all the ones and zeros in each column to come up with a total number of rows. Here's the query we would write:
/* query we would write if we could */ select Month, sum(is1998) as 'Orders 1998', sum(is1997) as 'Orders 1997' from Orders where OrderDate >= '1/1/97'and OrderDate < '1/1/99' group by MonthNumber, Month with rollup having MonthNumber is null or MonthName is not null
You should add the date condition to avoid working with rows that are outside the range of either year column. That will save processing time.
To get the overall totals, I've added the ROLLUP operator. Make certain to group first by the month number rather than the month name so that the data is sorted chronologically, not alphabetically. (You learned about that problem earlier today.) I've added the HAVING clause to suppress rows created by the extra grouping level.
Unfortunately, we don't have those columns in our table, but we already have expressions for them. Just write the same query, but substitute the expressions from the preceding code. Listing 4.10 has the integrated code.
The result set displays the month with the orders for each of the years:
Month Orders 1998 Orders 1997 ------------------------------ ----------- ----------- January 55 33 February 54 29 March 73 30 April 74 31 May 14 32 June 0 30 July 0 33 August 0 33 September 0 37 October 0 38 November 0 34 December 0 48 NULL 270 408
Now we need to perform the year-to-year comparison. We'll limit the query to the first five months of the year (there's no data after May for 1998). To do that, use a WHERE clause.
We can calculate the percent change with this expression:
100 * Orders 1998 / Orders 1997 - 100 as 'Percent Change'
Again, it's just a matter of substituting the expressions we already have for Orders 1998 and Orders 1998 into this formula and adding it to the select list as shown in Listing 4.11.
In this code, it's worth noting that all the numbers in the percentage calculation have been converted to the float data type by adding a decimal point (1., 0., 100.). Otherwise, the percentage would be calculated by using integer arithmetic and would be incredibly inaccurate.
We will come back to this code again later in the book. This query and queries like it are very common requirements of SQL programmers. Tomorrow, after you have learned to join tables, you will be asked to produce a similar analysis of the total dollar activity for the company.
Later in the book, you will learn to use a number of methods that will simplify queries like this one. User-defined functions and computed columns will enable users to invoke the detailed CASE logic more simply. Views will enable you to predefine whole queries to access similar data more easily. Stored procedures will enable you to execute a query like this one by name.