In the SQL procedure, a CASE expression provides a way of determining what the resulting value will be from all rows in a table (or view). Similar to a DATA step SELECT statement (or IF-THEN/ELSE statement), a CASE expression is based on some condition and the condition uses a WHEN-THEN clause to determine what the resulting value will be. An optional ELSE expression can be specified to handle an alternative action if none of the expression(s) identified in the when-condition(s) is satisfied.
A CASE expression must be a valid PROC SQL expression and conform to syntax rules similar to DATA step SELECT-WHEN statements. Even though this topic is best explained by example, let’s take a quick look at the syntax.
CASE <column-name> WHEN when-condition THEN result-expression <WHEN when-condition THEN result-expression> ... <ELSE result-expression> END
A column name can optionally be specified as part of the CASE expression. If present, it is automatically made available to each WHEN-condition. When it is not specified, the column name must be coded in each WHEN-condition. Let’s examine how a CASE expression works.
If a WHEN-condition is satisfied by a row in a table (or view), then it is considered “true” and the result expression following the THEN expression is processed. The remaining WHEN conditions in the CASE expression are skipped. If a WHEN-condition is “false,” the next WHEN-condition is evaluated. SQL evaluates each WHEN-condition until a “true” condition is found; or in the event all WHEN-conditions are “false,” it then executes the ELSE expression and assigns its value to the CASE expression’s result. A missing value is assigned to a CASE expression when an ELSE expression is not specified and each WHEN-condition is “false.”
In the next example, let’s see how a CASE expression actually works. Suppose a value of “West”, “East”, “Central”, or “Unknown” is desired for each of the manufacturers. Using the manufacturer’s state of residence (MANUSTAT) column, a CASE expression could be constructed to assign the desired value in a unique column for each row of data. A value of “West” is assigned to the manufacturers in California, “East” for manufacturers in Florida, “Central” for manufacturers in Texas, and for all other manufacturers a value of “Unknown” is assigned to represent missing state values. A column heading of Region is assigned to the new derived output column using the AS keyword.
SQL Code
PROC SQL; SELECT MANUNAME, MANUSTAT, CASE WHEN MANUSTAT = 'CA' THEN 'West' WHEN MANUSTAT = 'FL' THEN 'East' WHEN MANUSTAT = 'TX' THEN 'Central' ELSE 'Unknown' END AS Region FROM MANUFACTURERS; QUIT;
Results
The SAS System Manufacturer Manufacturer Name State Region ________________________________________________ Cupid Computer TX Central Global Comm Corp CA West World Internet Corp FL East Storage Devices Inc CA West KPL Enterprises CA West San Diego PC Planet CA West |
Let’s look at another example. In the PRODUCTS table a column called PRODTYPE contains the type of product (for example, Laptop, Phone, Software, and Workstation) as a text string. Using a CASE expression the assignment of a new data value is derived from the coded values in the PRODTYPE column: “Laptop” = “ Hardware”, “ Phone” = “Hardware”, “Software” = “Software”, and “Workstation” = “Hardware”. A column heading of Product_Classification is assigned to the derived output column with the AS keyword.
PROC SQL; SELECT PRODNAME, CASE PRODTYPE WHEN 'Laptop' THEN 'Hardware' WHEN 'Phone' THEN 'Hardware' WHEN 'Software' THEN 'Software' WHEN 'Workstation' THEN 'Hardware' ELSE 'Unknown' END AS Product_Classification FROM PRODUCTS; QUIT;
Results
The SAS System Product_ Product Name Classification _________________________________________ Dream Machine Hardware Business Machine Hardware Travel Laptop Hardware Analog Cell Phone Hardware Digital Cell Phone Hardware Office Phone Hardware Spreadsheet Software Software Database Software Software Wordprocessor Software Software Graphics Software Software |
A popular convention among SQL users is to specify a COALESCE function in an expression to perform case logic. As described in Chapter 2, “Working with Data in PROC SQL,” the COALESCE function permits a new value to be substituted for one or more missing column values. When you specify COALESCE in an expression, PROC SQL evaluates each argument from left to right for the occurrence of a nonmissing value.
The first nonmissing value found in the list of arguments is returned; otherwise a missing value, or assigned value, is returned. This approach not only saves programming time; it makes code simpler to maintain.
Expressing logical expressions in one or more WHEN-THEN/ELSE statements is generally easy, and the expressions are easy to code, understand, and maintain. But as the complexities associated with case logic increase, the amount of coding also increases. In the following example a simple CASE expression is presented to illustrate how a value of “Unknown” is assigned and displayed when CUSTCITY is missing.
SQL Case Logic
PROC SQL; SELECT CUSTNAME, CASE WHEN CUSTCITY IS NOT NULL THEN CUSTCITY ELSE 'Unknown' END AS Customer_City FROM CUSTOMERS; QUIT;
To illustrate the usefulness of the COALESCE function as an alternative to case logic, the same query can be modified to achieve the same results as before. By replacing the case logic with a COALESCE expression as follows, the value of CUSTCITY is automatically displayed unless it is missing. In cases of character data a value of “Unknown” is displayed. This technique makes the COALESCE function a very useful and shorthand approach indeed.
SQL COALESCE Logic
PROC SQL; SELECT CUSTNAME, COALESCE(CUSTCITY,'Unknown') AS Customer_City FROM CUSTOMER; QUIT;
In cases where a COALESCE expression is used with numeric data, the value assigned or displayed must be of the same type as the expression. The next example shows a value of “0” (zero) being assigned and displayed when units (UNITS) from the PURCHASES table are processed.
PROC SQL;
SELECT ITEM,
COALESCE(UNITS, 0)
FROM PURCHASES;
QUIT;
Results
The SAS System
item Units
________________________
Chair 1
Pens 12
Paper 4
Stapler 1
Mouse Pad 1
Pens 24
Markers 0 |
Assigning data values and group data based on the existence of distinct values for specified table columns is a popular and frequently useful operation. Suppose you want to assign a specific data value and then group the output based on this assigned value. As a savvy SAS user you are probably thinking, “Hey, this is easy — I’ll just create a user-defined format and use it in the PRINT or REPORT procedure.”
In the next example, PROC FORMAT is used to assign temporary formatted values based on a range of values for INVENQTY. The result from executing this simple three-step (non-SQL procedure) program shows that the actual INVENQTY value is temporarily replaced with the “matched” value in the user-defined format. The FORMAT statement performs a look-up process to determine how the data should be displayed. The actual data value being looked up is not changed during the process, but a determination is made as to how its value should be displayed. The BY statement specifies how BY-group processing is to be constructed. The displayed results show the product numbers in relation to their respective inventory quantity status.
Non-SQL Code
PROC FORMAT; VALUE INVQTY 0 – 5 = 'Low on Stock – Reorder' 6 – 10 = 'Stock Levels OK' 11 – 99 = 'Plenty of Stock' 100 – 999 = 'Excessive Quantities'; RUN; PROC SORT DATA=INVENTORY; BY INVENQTY; RUN; PROC PRINT DATA=INVENTORY(KEEP=PRODNUM INVENQTY) NOOBS; FORMAT INVENQTY INVQTY.; RUN;
Results
The SAS System prodnum invenqty 5001 Low on Stock - Reorder 5002 Low on Stock - Reorder 5001 Low on Stock - Reorder 1700 Stock Levels OK 5003 Stock Levels OK 1110 Plenty of Stock 5004 Plenty of Stock |
The same results can also be derived using a CASE expression in the SQL procedure. In the next example, a CASE expression is constructed using the INVENTORY table to assign values to the user-defined column Inventory_Status. The biggest difference between the FORMAT procedure approach and a CASE expression is that the latter uses one step and does not replace the actual data value with the recoded result. Instead, it creates a new column containing the result of the CASE expression.
PROC SQL; SELECT PRODNUM, CASE WHEN INVENQTY LE 5 THEN 'Low on Stock - Reorder' WHEN 6 LE INVENQTY LE 10 THEN 'Stock Levels OK' WHEN 11 LE INVENQTY LE 99 THEN 'Plenty of Stock' ELSE 'Excessive Quantities' END AS Inventory_Status FROM INVENTORY ORDER BY INVENQTY; QUIT;
Results
The SAS System Product Number Inventory_Status ________________________________ 5001 Low on Stock - Reorder 5002 Low on Stock - Reorder 5001 Low on Stock - Reorder 1700 Stock Levels OK 5003 Stock Levels OK 1110 Plenty of Stock 5004 Plenty of Stock |
The existence of null values frequently introduces complexities for programmers. Instead of coding two-valued logic conditions, such as true and false, logic conditions must be designed to handle three-valued logic: true, false, and unknown. When developing logic conditions, you need to be ready to deal with the possibility of having null values. Program logic should test whether the current value of an expression contains a value or is empty (null).
Let’s examine a CASE expression that is meant to handle the possibility of having missing values in a table. Returning to an example presented earlier in this chapter, suppose we want to assign a value of “South West”, “South East”, “Central”, “Missing”, or “Unknown” to each of the manufacturers based on their state of residence.
SQL Code
PROC SQL; SELECT MANUNAME, MANUSTAT, CASE WHEN MANUSTAT = 'CA' THEN 'South West' WHEN MANUSTAT = 'FL' THEN 'South East' WHEN MANUSTAT = 'TX' THEN 'Central' WHEN MANUSTAT = ' ' THEN 'Missing' ELSE 'Unknown' END AS Region FROM MANUFACTURERS; QUIT;
Results
The SAS System Manufacturer Manufacturer Name State Region ___________________________________________________ Cupid Computer TX Central Global Comm Corp CA South West World Internet Corp FL South East Storage Devices Inc CA South West KPL Enterprises CA South West San Diego PC Planet CA South West |
The results indicate that there were no missing or null values in our database for the column being tested. But, suppose a new row of data were added containing null values in the manufacturer’s city and state of residence columns so our new row looked something like the following:
Manufacturer Number: 800 Manufacturer Name: Spring Valley Products Manufacturer City: <Missing> Manufacturer State: <Missing>.
The result would look something like the following if we ran the previous code again.
SQL Code
PROC SQL; SELECT MANUNAME, MANUSTAT, CASE WHEN MANUSTAT = 'CA' THEN 'South West' WHEN MANUSTAT = 'FL' THEN 'South East' WHEN MANUSTAT = 'TX' THEN 'Central' WHEN MANUSTAT = ' ' THEN 'Missing' ELSE 'Unknown' END AS Region FROM MANUFACTURERS; QUIT;
Results
The SAS System Manufacturer Manufacturer Name State Region ___________________________________________________ Cupid Computer TX Central Global Comm Corp CA South West World Internet Corp FL South East Storage Devices Inc CA South West KPL Enterprises CA South West San Diego PC Planet CA South West Spring Valley Products Missing |