Chapter 7
Grouping Data

Introduction

In this chapter you will learn how to use the GROUP BY and HAVING clauses to group and filter data.

Keywords

GROUP BY

HAVING

ORDER BY

WHERE

Definitions

GROUP BY clause — Used with aggregate functions to combine groups of records into a single functional record.

HAVING clause — Used with the GROUP BY clause to set conditions on groups of data calculated from aggregate functions.

The GROUP BY Clause

In Chapter 4 we covered the ORDER BY clause, which affects the results of a query by returning records in either descending or ascending order. In this chapter we will be covering the GROUP BY clause, which is used with aggregate functions to combine groups of records into a single record. We briefly mentioned a method of grouping records in Chapter 6 with the discussion of aggregate functions. Recall that they are used to return a single value based on values stored in a column. Examples of aggregate functions include the following: AVG (), COUNT (), MAX (), MIN (), and SUM (). The GROUP BY clause is far more powerful since it provides a means for grouping specific subsets of records and presenting calculations on each of the subsets.

Before we get started using the GROUP BY clause, let’s take a moment to discuss the rules for using the GROUP BY clause. To use the GROUP BY clause the following must apply:

The GROUP BY clause can only be used in queries that contain at least one aggregate function. (Otherwise there is no need for the GROUP BY!)

All column names retrieved from the database (specified after the SELECT keyword) must be present in the GROUP BY clause. Note that this does not include column names that are specified within functions or alternate column names (aliases).

You have probably used the GROUP BY clause without realizing it since every time you run a query in the query grid with Totals turned on, you are in effect running an aggregate query (see Figure 7-1).

Figure 7-1

Enabling Totals modifies the query grid by adding the Total row to the grid. Using the Totals option (located under the Design menu) for each field presents the user with several functions, including Group By (the default that is used when totals are not desired).

Example 1

Figure 7-2. Transactions table

Figure 7-2 shows a table named Transactions. The Transactions table represents sales at a company. The TransactionID column is the primary key column. The ProductID column represents a unique ID for products, and each product ID contains a corresponding customer ID that represents a customer. Customer IDs that appear more than once represent customers who purchased multiple products. The DateSold column represents the date a product was sold.

Suppose you want to count the total number of products each customer purchased. Using the query grid (Figure 7-3) you would start with the basic select query with two columns: CustomerID and ProductID. (Notice that the full field name is TotalProductsPurchased:ProductID. The text to the left of the colon is the alias, and the text to the right is the actual field name.) Select Totals from the Design menu, and select Count in the Total row for the TotalProductsPurchased column.

Figure 7-3. Query Design view

Changing the view to SQL (Figure 7-4) produces the following result:

Figure 7-4. SQL view

With a bit of cleanup and changing the name of the result of the count to the more descriptive TotalProductsPurchased, we produce the following script:


SELECT CustomerID, COUNT (ProductID) AS TotalProductsPurchased
FROM Transactions
GROUP BY CustomerID;

The preceding script displays two columns (CustomerID and TotalProductsPurchased). The COUNT (ProductID) function is used to count each product ID. The GROUP BY clause groups the results from the aggregate function COUNT (ProductID) per each customer ID. Take a look at Figure 7-5, which shows each customer ID and the total number of products purchased.

Figure 7-5. Results (output)

Note: As you have probably discovered from using the query grid, the GROUP BY clause can also be used to group multiple columns. In the SQL statement, the fields you are grouping by are separated with commas. Using our previous example and grouping by both the CustomerID and the DateSold fields produces the following:


SELECT Sales.CustomerID, Count(Sales.ProductID) AS CountOfProductID,
Sales.DateSold
FROM Sales
GROUP BY Sales.CustomerID, Sales.DateSold

Using the GROUP BY Clause with the ORDER BY Clause

The GROUP BY clause can also be used in conjunction with the ORDER BY clause to sort the query results. Take a look at the following rules for using the GROUP BY clause with the ORDER BY clause.

The ORDER BY clause cannot be used in a query containing an aggregate function and no GROUP BY clause.

The GROUP BY clause must appear before the ORDER BY clause.

Example 2 implements a query using the GROUP BY and ORDER BY clauses.

Example 2

Suppose you want to duplicate the query in Example 1, but this time you want to sort the output by the total amount of purchases per customer. In Design view, add the Sort option on the ProductID column to the previous query.

Figure 7-6. Query Design view

Changing to SQL view and simplifying produces the following:


SELECT CustomerID, COUNT (ProductID) AS TotalProductsPurchased
FROM Transactions
GROUP BY CustomerID
ORDER BY COUNT (ProductID);

The preceding script uses the ORDER BY clause to sort the output by the total amount of purchases per customer COUNT (ProductID). Take a look at the results in Figure 7-7.

Figure 7-7. Results (output)

The HAVING Clause

The HAVING clause is used with the GROUP BY clause to set conditions on groups of data calculated from aggregate functions. The HAVING clause uses the same operators as the WHERE clause and has the same syntax. Refer to Chapter 5 to refresh your memory on the WHERE clause syntax and the operators used with the WHERE clause. Example 3 shows a query using the HAVING clause.

Example 3

Suppose you want to display the customer ID and the total number of products purchased for customers who purchased two or more products. In Design view you would represent this as shown in Figure 7-8:

Figure 7-8. Query Design view

In SQL view (Figure 7-9) this produces the following result:

Figure 7-9. SQL view

When simplified it looks like this:


SELECT CustomerID, COUNT (ProductID) AS TotalProductsPurchased
FROM Transactions
GROUP BY CustomerID
HAVING COUNT (ProductID) >= 2;

The preceding script uses the COUNT (ProductID) function to count the product IDs. The GROUP BY clause groups the results of the aggregate function (COUNT (ProductID)) per each customer ID. The HAVING clause specifies the display of only the total counts that are greater than or equal to 2. Notice that the HAVING clause follows the GROUP BY clause. If you try to place the HAVING clause before the GROUP BY clause, you will receive an error. Refer to Figure 7-10 to see the results.

Figure 7-10. Results (output)

Using the HAVING Clause with the WHERE Clause

The WHERE clause can be used with the HAVING clause since the WHERE clause filters rows before any data is grouped and the HAVING clause filters rows after data is grouped. This comes in handy when you want to filter groups and items that are not in the same query.

Note: Whenever you use the GROUP BY clause with a WHERE clause, the GROUP BY clause must appear after the WHERE clause.

Take a look at Example 4, which shows a query using both the HAVING and the WHERE clauses.

Example 4

Suppose you want to count the total number of products purchased for customer IDs less than or equal to 6 with a total count of products purchased that is greater than or equal to 2. Take a look at the following script:


SELECT CustomerID, COUNT (ProductID) AS TotalProductsPurchased
FROM Transactions
WHERE CustomerID <= 6
GROUP BY CustomerID
HAVING COUNT (ProductID) >= 2;

This script uses the WHERE clause to instruct Microsoft Access to only include customer IDs less than or equal to 6, while the HAVING clause is used to instruct Microsoft Access to include only the total products purchased greater than or equal to 2. Figure 7-11 shows the results from the query.

Figure 7-11. Results (output)

Notice that for this example we started with the SQL statement and have not shown the query in Design view. This is to stress a point. The query grid for this query is as follows (Figure 7-12):

Figure 7-12. Query Design view

As the query gets more complex, you will rapidly discover that the SQL statement is far more descriptive and easier to interpret than the query grid. While one of the authors spends most of his time in Design view with only occasional initial query writing in SQL view, he often has to change over to SQL view to see how Access is really interpreting the query. Sometimes what you think you have written in Design view is not what actually is occurring. SQL view removes possible ambiguity and shows what is really going to happen!

Summary

In this chapter, you have learned how to use the GROUP BY clause in queries that contain aggregate functions. You have additionally learned how to use the GROUP BY clause with the ORDER BY and HAVING clauses. You also learned how to use the HAVING clause with the WHERE clause.

Quiz 7

1. True or False. The GROUP BY clause can only be used in queries that contain at least two aggregate functions.

2. Will the following query work?


SELECT DATE () AS TodaysDate
FROM Transactions
GROUP BY CustomerID;

3. True or False. When using the GROUP BY clause with a WHERE clause, the GROUP BY clause must appear before the WHERE clause.

4. True or False. The GROUP BY clause must appear before the ORDER BY clause.

5. True or False. The HAVING clause filters rows before any data is grouped.

Project 7

Use the Transactions table in Figure 7-2 to display the customer IDs and the total number of products purchased by customers who only purchased one product.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset