SQL programmers have a number of ways to accomplish their objectives, particularly when the goal is to retrieve and work with data. The SELECT statement is an extremely powerful statement in the SQL language. Its syntax can be somewhat complex because of the number of ways that columns, tables, operators, functions, and predicates can be combined into executable statements.
There are several types of operators and functions in PROC SQL: 1) comparison operators, 2) logical operators, 3) arithmetic operators, 4) character string operators, 5) summary functions, and 6) predicates. Operators and functions provide value-added features for PROC SQL programmers. Each will be presented below.
Comparison operators are used in the SQL procedure to compare one character or numeric value to another. As in the DATA step, SQL comparison operators, mnemonics, and their descriptions appear in the following table.
SAS Operator | Mnemonic Operator | Description |
---|---|---|
= | EQ | Equal to |
^= or ¬= | NE | Not equal to |
< | LT | Less than |
<= | LE | Less than or equal to |
> | GT | Greater than |
>= | GE | Greater than or equal to |
Suppose you want to select only those products from the PRODUCTS table costing more than $300.00. The example below illustrates the use of the greater than sign (>) in a WHERE clause to select products meeting the condition.
SQL Code
PROC SQL; SELECT PRODNAME, PRODTYPE, PRODCOST FROM PRODUCTS WHERE PRODCOST > 300; QUIT;
Results
The SAS System Product Product Name Product Type Cost ________________________________________________ ______ Dream Machine Workstation $3 ,200.00 Business Machine Workstation $3 ,300.00 Travel Laptop Laptop $3 ,400.00 Database Software Software $399.00 |
PROC SQL also supports the use of truncated string comparison operators. These operators work by first truncating the longer string to the same length as the shorter string, and then performing the specified comparison. Using any of the comparison operators has no permanent effect on the strings themselves. The list of truncated string comparison operators and their meanings appears below.
Truncated String Comparison Operator | Description |
---|---|
EQT | Equal to |
GTT | Greater than |
LTT | Less than |
GET | Greater than or equal to |
LET | Less than or equal to |
NET | Not equal to |
Logical operators are used to connect two or more expressions together in a WHERE or HAVING clause. The available logical operators are AND, OR, and NOT. Suppose you want to select only those software products costing more than $300.00. The example illustrates how the AND operator is used to ensure that both conditions are true.
SQL Code
PROC SQL; SELECT PRODNAME, PRODTYPE, PRODCOST FROM PRODUCTS WHERE PRODTYPE = 'Software' AND PRODCOST > 300; QUIT;
Results
The SAS System Product Product Name Product Type Cost ________________________________________________ _____ Database Software Software $399.00 |
The next example illustrates the use of the OR logical operator to select software products or products that cost more than $300.
SQL Code
PROC SQL; SELECT PRODNAME, PRODTYPE, PRODCOST FROM PRODUCTS WHERE PRODTYPE = 'Software' OR PRODCOST > 300; QUIT;
Results
The SAS System Product Product Name Product Type Cost ________________________________________________ _____ Dream Machine Workstation $3 ,200.00 Business Machine Workstation $3 ,300.00 Travel Laptop Laptop $3 ,400.00 Spreadsheet Software Software $299.00 Database Software Software $399.00 Wordprocessor Software Software $299.00 Graphics Software Software $299.00 |
The next example illustrates the use of the NOT logical operator to select products that are not software products and do not cost more than $300.
PROC SQL; SELECT PRODNAME, PRODTYPE, PRODCOST FROM PRODUCTS WHERE NOT PRODTYPE = 'Software' AND NOT PRODCOST > 300; QUIT;
Results
The SAS System Product Product Name Product Type Cost ________________________________________________ _____ Analog Cell Phone Phone $35.00 Digital Cell Phone Phone $175.00 Office Phone Phone $130.00 |
The arithmetic operators used in PROC SQL are the same as those used in the DATA step as well as those found in other languages like C, Pascal, FORTRAN, and COBOL. The arithmetic operators available in the SQL procedure appear below.
Operator | Description |
---|---|
+ | Addition |
- | Subtraction |
* | Multiplication |
/ | Division |
** | Exponent (raises to a power) |
= | Equals |
To illustrate how arithmetic operators are used, suppose you want to apply a discount of 20% to the product price (PRODCOST) in the PRODUCTS table and display the results in ascending order by the discounted price. Note that the computed column (PRODCOST * 0.80) does not automatically create a column header on output.
SQL Code
PROC SQL; SELECT PRODNAME, PRODTYPE, PRODCOST * 0.80 FROM PRODUCTS; QUIT;
Results
The SAS System Product Name Product Type ________________________________________________ _____ Dream Machine Workstation 2560 Business Machine Workstation 2640 Travel Laptop Laptop 2720 Analog Cell Phone Phone 28 Digital Cell Phone Phone 140 Office Phone Phone 104 Spreadsheet Software Software 239.2 Database Software Software 319.2 Wordprocessor Software Software 239.2 Graphics Software Software 239.2 |
In the next example, suppose you wanted to reference a column that was calculated in the SELECT statement. PROC SQL allows references to a computed column in the same SELECT statement (or a WHERE clause) using the CALCULATED keyword. Note that the computed columns have column aliases created for them using the AS keyword. If the CALCULATED keyword were not specified preceding the calculated column, an error would have been generated.
SQL Code
PROC SQL;
SELECT PRODNAME,
PRODTYPE,
PRODCOST * 0.80 AS DISCOUNT_PRICE
FORMAT=DOLLAR9.2,
PRODCOST – CALCULATED DISCOUNT_PRICE AS LOSS
FORMAT=DOLLAR7.2
FROM PRODUCTS
ORDER BY 3;
QUIT;
Results
The SAS System DISCOUNT_ Product Name Product Type PRICE LOSS _______________________________________________ __________________ Analog Cell Phone Phone $28.00 $7.00 Office Phone Phone $104.00 $26.00 Digital Cell Phone Phone $140.00 $35.00 Spreadsheet Software Software $239.20 $59.80 Graphics Software Software $239.20 $59.80 Wordprocessor Software Software $239.20 $59.80 Database Software Software $319.20 $79.80 Dream Machine Workstation $2,560.00 $640.00 Business Machine Workstation $2,640.00 $660.00 Travel Laptop Laptop $2,720.00 $680.00 |
Character string operators and functions are typically used with character data. Numerous operators are presented to make you aware of the power available with the SQL procedure. As you become familiar with each of the operators, you’ll find their real strength as you begin to nest functions within each other.
The following example illustrates a basic concatenation operator that is used to concatenate two columns and a text string. Note that the created column is not labeled. The concatenation operator will be discussed in greater detail in Chapter 3, “Formatting Output.”
SQL Code
PROC SQL; SELECT MANUCITY || "," || MANUSTAT FROM MANUFACTURERS; QUIT;
The SAS System _______________________ Houston ,TX San Diego ,CA Miami ,FL San Mateo ,CA San Diego ,CA San Diego ,CA |
The LENGTH function is used to obtain the length of a character string column. LENGTH returns a number equal to the number of characters in the argument. Note that the computed column (LENGTH(PRODNAME)) has a column header created for it called Length by specifying the AS keyword. This example illustrates using the LENGTH function to determine the length of data values.
PROC SQL; SELECT PRODNUM, PRODNAME, LENGTH(PRODNAME) AS Length FROM PRODUCTS; QUIT;
Results
The SAS System Product Number Product Name Length _____________________________________________ 1110 Dream Machine 13 1200 Business Machine 16 1700 Travel Laptop 13 2101 Analog Cell Phone 17 2102 Digital Cell Phone 18 2200 Office Phone 12 5001 Spreadsheet Software 20 5002 Database Software 17 5003 Wordprocessor Software 22 5004 Graphics Software 17 |
As in the DATA step, many functions can be used in the SQL procedure. To modify one or more existing rows in a table, the UPDATE statement is used (see Chapter 6, “Modifying and Updating Tables and Indexes,” for more details). The UPDATE statement with SET clause changes the contents of a data value (functioning the same way as a DATA step assignment statement) by assigning a new value to the column identified to the left of the equal sign by a constant or expression referenced to the right of the equal sign.
The UPDATE statement does not automatically produce any output, except for the log messages based on the operation results itself. To illustrate the use of DATA step functions and operators in the SQL procedure, the next example shows a SCAN function that isolates the first piece of information from product name (PRODNAME), a TRIM function to remove trailing blanks from product type (PRODTYPE), and a concatenation operator “||” that concatenates the first character expression with the second expression. Exercise care when using the SCAN function because it returns a 200-byte string.
SQL Code
PROC SQL; UPDATE PRODUCTS SET PRODNAME = SCAN(PRODNAME,1) || TRIM(PRODTYPE); QUIT;
SAS Log Results
PROC SQL; UPDATE PRODUCTS SET PRODNAME = SCAN(PRODNAME,1) || TRIM (PRODTYPE); NOTE: 10 rows were updated in PRODUCTS. QUIT; |
An optional WHERE clause can be specified limiting the number of rows that modifications will be applied to. The next example illustrates using a WHERE clause to restrict the number of rows that are updated in the previous example to just “phones”, excluding all the other rows.
SQL Code
PROC SQL; UPDATE PRODUCTS SET PRODNAME = SCAN(PRODNAME,1) || TRIM(PRODTYPE) WHERE PRODTYPE IN ('Phone'), QUIT;
PROC SQL; UPDATE PRODUCTS SET PRODNAME = SCAN(PRODNAME,1) || TRIM (PRODTYPE) WHERE PRODTYPE IN ('Phone'), NOTE: 3 rows were updated in PRODUCTS. QUIT; |
The default alignment for character data is to the left. However, character columns or expressions can also be aligned to the right. Two functions are available for character alignment: LEFT and RIGHT. The next example combines the concatenation operator “||” and the TRIM function with the LEFT function to left align a character expression while inserting a comma “,” and blank between the columns.
SQL Code
PROC SQL; SELECT LEFT(TRIM(MANUCITY) || ", " || MANUSTAT) FROM MANUFACTURERS; QUIT;
Results
The SAS System _______________________ Houston, TX San Diego, CA Miami, FL San Mateo, CA San Diego, CA San Diego, CA |
The next example illustrates how character data can be right aligned using the RIGHT function.
PROC SQL; SELECT RIGHT(MANUCITY) FROM MANUFACTURERS; QUIT;
Results
The SAS System ____________________ Houston San Diego Miami San Mateo San Diego San Diego |
To find the occurrence of a pattern, the INDEX function can be used. Frequently, requirements call for a column to be searched using a specific character string. The INDEX function can be used in the SQL procedure to search for patterns in a character string. The character string is searched from left to right for the first occurrence of the specified value. If the desired string is found, the column position of the first character is returned. Otherwise a value of zero (0) is returned. The following arguments are used to search for patterns in a column: the character column or expression and the character string to search for. To find all products with the characters “phone” in the product name (PRODNAME) column, the following code can be specified:
PROC SQL; SELECT PRODNUM, PRODNAME, PRODTYPE FROM PRODUCTS WHERE INDEX(PRODNAME, 'phone') > 0; QUIT;
Results
PROC SQL; SELECT PRODNUM, PRODNAME, PRODTYPE FROM PRODUCTS WHERE INDEX(PRODNAME, 'phone') > 0; NOTE: No rows were selected. QUIT; |
As in the DATA step, no rows were selected because the search is case-sensitive and “phone” is specified as all lowercase characters.
The SAS System provides two functions that enable you to change the case of a string’s characters: LOWCASE and UPCASE. The LOWCASE function converts all of the characters in a string or expression to lowercase characters. The UPCASE function converts all of the characters in a string or expression to uppercase characters.
In the previous example, the results of the search were negative even though the character string “phone” appeared multiple times in more than one row. In order to make this search recognize all the possible lower- and uppercase variations of the word “phone”, the search criteria in the WHERE clause could be made “smarter” by combining an UPCASE function with the INDEX function as follows.
SQL Code
PROC SQL; SELECT PRODNUM, PRODNAME, PRODTYPE FROM PRODUCTS WHERE INDEX(UPCASE(PRODNAME), 'PHONE') > 0; QUIT;
Results
The SAS System Product Number Product Name Product Type _______________________________________________ _____ 2101 Analog Cell Phone Phone 2102 Digital Cell Phone Phone 2200 Office Phone Phone |
In the next example, the LOWCASE function is combined with the INDEX function to produce the identical output from the previous example.
SQL Code
PROC SQL; SELECT PRODNUM, PRODNAME, PRODTYPE FROM PRODUCTS WHERE INDEX(LOWCASE(PRODNAME), 'phone') > 0; QUIT;
The SAS System Product Number Product Name Product Type _______________________________________________ _____ 2101 Analog Cell Phone Phone 2102 Digital Cell Phone Phone 2200 Office Phone Phone |
Occasionally, processing requirements call for specific pieces of information to be extracted from a column. In these situations the SUBSTR function can be used with a character column by specifying a starting position and the number of characters to extract. The following example illustrates how the SUBSTR function is used to capture the first 4 bytes from the product type (PRODTYPE) column.
SQL Code
PROC SQL; SELECT PRODNUM, PRODNAME, PRODTYPE, SUBSTR(PRODTYPE,1,4) FROM PRODUCTS WHERE PRODCOST > 100.00; QUIT;
The SAS System Product Number Product Name Product Type _______________________________________________ _______________ 1110 Dream Machine Workstation Work 1200 Business Machine Workstation Work 1700 Travel Laptop Laptop Lapt 2102 Digital Cell Phone Phone Phon 2200 Office Phone Phone Phon 5001 Spreadsheet Software Software Soft 5002 Database Software Software Soft 5003 Wordprocessor Software Software Soft 5004 Graphics Software Software Soft |
A technique for finding names that sound alike or have spelling variations is available in the SQL procedure. This frequently used technique is referred to as phonetic matching and is performed using the Soundex algorithm. In Joe Celko’s book, SQL for Smarties: Advanced SQL Programming (pages 83-87), he traced the origins of the Soundex algorithm to the developers Margaret O’Dell and Robert C. Russell in 1918. Developed before the first computer, this technique was often used by clerks to manually search for similar sounding names.
Although not technically a function, the sounds-like operator “=*” searches and selects character data based on two expressions: the search value and the matched value. Anyone that has looked for a last name in a local telephone directory is quickly reminded of the possible phonetic variations.
To illustrate how the sounds-like operator works, we will search on last name in a table called CUSTOMERS2. The CUSTOMERS2 table is illustrated below. Although each name has phonetic variations and sounds the same, the results of “Laughler,” “Loffler,” and “Laffler” are spelled differently (illustrated below). The following PROC SQL code uses the sounds-like operator to find all customers that sound like “Lafler”.
CUSTNUM CUSTNAME CUSTCITY ______________________________________ 1 Smith San Diego 7 Lafler Spring Valley 11 Jones Carmel 13 Thompson Miami 7 Loffler Spring Valley 1 Smithe San Diego 7 Laughler Spring Valley 7 Laffler Spring Valley
PROC SQL; SELECT CUSTNUM, CUSTNAME, CUSTCITY FROM CUSTOMERS2 WHERE CUSTNAME =* 'Lafler'; QUIT;
Results
The SAS System CUSTNUM CUSTNAME CUSTCITY _________________________________________ 7 Lafler Spring Valley 7 Loffler Spring Valley 7 Laffler Spring Valley |
Readers familiar with the DATA step SOUNDEX(argument) function to search a string are cautioned that it cannot be used in an SQL WHERE clause. Instead the sounds-like operator “=*” must be specified; otherwise a result of no rows will be selected.
Notice that only three of the four possible phonetic matches were selected in the preceding example (that is, Lafler, Loffler, and Laffler). The fourth possibility “Laughler” was not chosen as a “matched” value in the search by the sounds-like algorithm. In an attempt to overcome the inherent limitation with the sounds-like operator, as described in Celko’s SQL for Smarties (see earlier reference), and to derive a broader list of “matched” values, programmers should make every attempt to develop a comprehensive list of search values to widen the scope of possibilities. We can expand our original search criteria in the previous example to include the missing possibilities using OR logic.
SQL Code
PROC SQL; SELECT CUSTNUM, CUSTNAME, CUSTCITY FROM CUSTOMERS2 WHERE CUSTNAME =* 'Lafler' OR CUSTNAME =* 'Laughler' OR CUSTNAME =* 'Lasler'; QUIT;
Results
The SAS System CUSTNUM CUSTNAME CUSTCITY _________________________________________ 7 Lafler Spring Valley 7 Loffler Spring Valley 7 Laughler Spring Valley 7 Laffler Spring Valley |
The first example provides a way to find the first nonmissing value in a column or list. Specified in a SELECT statement, the COALESCE function inspects a column, or, in the case of a list, scans the arguments from left to right, and returns the first nonmissing or non-null value. If all values are missing, the result is missing. To take advantage of the COALESCE function, use arguments all of the same data type. The next example illustrates one approach to computing the total cost for each product purchased based on the number of units and unit costs columns in the PURCHASES table. If either the UNITS or UNITCOST columns contain a missing value, a zero is assigned to prevent the propagation of missing values.
PROC SQL; SELECT CUSTNUM, ITEM, UNITS, UNITCOST, (COALESCE(UNITS, 0) * COALESCE(UNITCOST, 0)) AS Totcost FORMAT=DOLLAR6.2 FROM PURCHASES; QUIT;
Results
The SAS System Custnum Item Units Unitcost Totcost ________________________________________________ ______ 1 Chair 1 $179.00 179.00 1 Pens 12 $0.89 $10.68 1 Paper 4 $6.95 $27.80 1 Stapler 1 $8.95 $8.95 7 Mouse Pad 1 $11.79 $11.79 7 Pens 24 $1.59 $38.16 13 Markers . $0.99 $0.00 |
A unique undocumented, but unsupported, feature for producing a row (observation) count can be obtained with the MONOTONIC( ) function. Similar to the row numbers produced and displayed on output from the PRINT procedure (without the NOOBS option specified), the MONOTONIC() function displays row numbers too. The MONOTONIC() function automatically creates a column (variable) in the output results or in a new table. Because this is an undocumented feature and not supported in the SQL procedure, you are cautioned that it is possible to obtain duplicates or missing values with the MONOTONIC() function. The next example illustrates the creation of a row number using the MONOTONIC() function in a SELECT statement.
PROC SQL; SELECT MONOTONIC() AS Row_Number FORMAT=COMMA6., ITEM, UNITS, UNITCOST FROM PURCHASES; QUIT;
Results
The SAS System Row_Number Item Units Unitcost ______________________________________________ 1 Chair 1 $179.00 2 Pens 12 $0.89 3 Paper 4 $6.95 4 Stapler 1 $8.95 5 Mouse Pad 1 $11.79 6 Pens 24 $1.59 7 Markers . $0.99 |
A row number can also be produced with the documented and supported SQL procedure option NUMBER. Unlike the MONOTONIC() function, the NUMBER option does not create a new column in a new table. The NUMBER option is illustrated below.
SQL Code
PROC SQL NUMBER; SELECT ITEM, UNITS, UNITCOST FROM PURCHASES; QUIT;
The SAS System Row Item Units Unitcost __________________________________________ 1 Chair 1 $179.00 2 Pens 12 $0.89 3 Paper 4 $6.95 4 Stapler 1 $8.95 5 Mouse Pad 1 $11.79 6 Pens 24 $1.59 7 Markers . $0.99 |
The SQL procedure is a wonderful tool for summarizing (or aggregating) data. It provides a number of useful summary (or aggregate) functions to help perform calculations, descriptive statistics, and other aggregating operations in a SELECT statement or HAVING clause. These functions are designed to summarize information and not display detail about data.
Without the availability of summary functions, you would have to construct the necessary logic using somewhat complicated SQL programming constructs. When using a summary function without a GROUP BY clause (see Chapter 3 for details), all the rows in a table are treated as a single group. Consequently, the results are often a single row value.
A number of summary functions are available including facilities to count nonmissing values; determine the minimum and maximum values in specific columns; return the range of values; compute the mean, standard deviation, and variance of specific values; and perform other aggregating functions. The following table is an alphabetical list of the available summary functions. When multiple names for the same function are available, the ANSI-approved name appears first.
The next example uses the COUNT function with the (*) argument to produce a total number of rows, regardless if data is missing. The asterisk (*) is specified as the argument to the COUNT function to count all rows in the PURCHASES table.
SQL Code
PROC SQL; SELECT COUNT(*) AS Row_Count FROM PURCHASES; QUIT;
Results
The SAS System Row_Count _________ 7 |
Unlike the COUNT(*) function syntax that counts all rows, regardless if data is missing or not, the next example uses the COUNT function with the (column-name) argument to produce a total number of nonmissing rows based on the column UNITS.
SQL Code
PROC SQL; SELECT COUNT(UNITS) AS Non_Missing_Row_Count FROM PURCHASES; QUIT;
Results
The SAS System Non_Missing_ Row_Count ____________ 6 |
The MIN summary function can be specified to determine what the least expensive product is in the PRODUCTS table.
PROC SQL; SELECT MIN(prodcost) AS Cheapest Format=dollar9.2 Label='Least Expensive' FROM PRODUCTS; QUIT;
Results
The SAS System Least Expensive _________ $35.00 |
In the next example, the SUM function is specified to sum numeric data types for a selected column. Suppose you wanted to determine the total costs of all purchases by customers who bought pens and markers. You could construct the following query to sum all nonmissing values for customers who purchased pens and markers in the PURCHASES table as follows.
SQL Code
PROC SQL; SELECT SUM((UNITS) * (UNITCOST)) AS Total_Purchases FORMAT=DOLLAR6.2 FROM PURCHASES WHERE UPCASE(ITEM)='PENS' OR UPCASE(ITEM)='MARKERS'; QUIT;
Total_ Purchases ____________ $48.84 |
Data can also be summarized down rows (observations) as well as across columns (variables). This flexibility gives SAS users an incredible range of power and the ability to take advantage of several SAS-supplied (or built-in) summary functions. These techniques permit the average of quantities rather than the set of all quantities. Without the ability to summarize data in PROC SQL, users would be forced to write complicated formulas and/or routines or even write and test DATA step programs to summarize data. Two examples illustrate how SQL can be constructed to summarize data: 1) summarizing data down rows and 2) summarizing data across columns.
Summarizing data down rows
The SQL procedure can be used to produce a single aggregate value by summarizing data down rows (or observations). The advantage of using a summary function in PROC SQL is that it will generally compute the aggregate quicker than if a user-defined equation were constructed. It also saves the effort of having to construct and test a program containing the user-defined equation in the first place. Suppose you wanted to know the average product cost for all software in the PRODUCTS table containing a variety of products. The following query computes the average product cost and produces a single aggregate value using the AVG function.
SQL Code
PROC SQL; SELECT AVG(PRODCOST) AS Average_Product_Cost FROM PRODUCTS WHERE UPCASE(PRODTYPE) IN ('SOFTWARE'), QUIT;
Average_
Product_Cost
324 |
Summarizing data across columns
When a computation is needed on two or more columns in a row, the SQL procedure can be used to summarize data across columns. Suppose you wanted to know the average cost of products in inventory. The next example computes the average inventory cost for each product without using a summary function, and once computed displays the value for each row as Average_Price.
SQL Code
PROC SQL; SELECT PRODNUM, (INVPRICE / INVQTY) AS Averge_Price FORMAT=DOLLAR8.2 FROM INVOICE; QUIT;
Results
Product Number Averge_Price ______________________ 5001 $299.00 6001 $799.00 2101 $35.00 1110 $3200.00 5002 $399.00 6000 $99.00 1200 $3300.00 |
Predicates are used in PROC SQL to perform direct comparisons between two conditions or expressions. Six predicates will be looked at:
BETWEEN
IN
IS NULL, IS MISSING
LIKE
EXISTS
The BETWEEN predicate is a way of simplifying a query by selecting column values within a designated range of values. BETWEEN is equivalent to one LE (less than or equal) and one GE (greater than or equal) condition being ANDed together. It is extremely flexible because it works with character, numeric, and date values. Programmers can also combine two or more BETWEEN predicates with AND or OR operators for more complicated conditions. In the next example a range of products costing between $200 and $500 inclusively are selected from the PRODUCTS table.
SQL Code
PROC SQL; SELECT PRODNAME, PRODTYPE, PRODCOST FROM PRODUCTS WHERE PRODCOST BETWEEN 200 AND 500; QUIT;
The SAS System Product Product Name Product Type Cost ________________________________________________ _____ Spreadsheet Software Software $299.00 Database Software Software $399.00 Wordprocessor Software Software $299.00 Graphics Software Software $299.00 |
In the next example, products are selected from the INVENTORY table that were ordered between the years 1999 and 2000. The YEAR function returns the year portion from a SAS date value and is used as the range of values in the WHERE clause.
SQL Code
PROC SQL; SELECT PRODNUM, INVENQTY, ORDDATE FROM INVENTORY WHERE YEAR(ORDDATE) BETWEEN 1999 AND 2000; QUIT;
The SAS System Date Inventory Product Inventory Last Number Quantity Ordered _______________________________ 1110 20 09/01/2000 1700 10 08/15/2000 5001 5 08/15/2000 5002 3 08/15/2000 5003 10 08/15/2000 5004 20 09/01/2000 5001 2 09/01/2000 |
The BETWEEN predicate and OR operator are used together in the next example to select products ordered between 1999 and 2000 or where inventory quantities are greater than 15. The YEAR function returns the year portion from a SAS date value and is used as the range of values in the WHERE clause.
SQL Code
PROC SQL; SELECT PRODNUM, INVENQTY, ORDDATE FROM INVENTORY WHERE (YEAR(ORDDATE) BETWEEN 1999 AND 2000) OR INVENQTY > 15; QUIT;
The SAS System Date Inventory Product Inventory Last Number Quantity Ordered _______________________________ 1110 20 09/01/2000 1700 10 08/15/2000 5001 5 08/15/2000 5002 3 08/15/2000 5003 10 08/15/2000 5004 20 09/01/2000 5001 2 09/01/2000 |
The IN predicate selects one or more rows based on the matching of one or more column values in a set of values. The IN predicate creates an OR condition between each value and returns a Boolean value of true if a column value is equal to one or more of the values in the expression list. Although the IN predicate can be specified with single column values, it may be less costly to specify the “=” sign instead. In the next example, the “=” sign is used rather than the IN predicate to select phones from the PRODUCTS table.
SQL Code
PROC SQL; SELECT PRODNAME, PRODTYPE, PRODCOST FROM PRODUCTS WHERE UPCASE(PRODTYPE) = 'PHONE'; QUIT;
Results
The SAS System Product Product Name Product Type Cost ________________________________________________ _____ Analog Cell Phone Phone $35.00 Digital Cell Phone Phone $175.00 Office Phone Phone $130.00 |
In the next example, both phones and software products are selected from the PRODUCTS table. To avoid having to specify two OR conditions, you can specify the IN predicate.
SQL Code
PROC SQL; SELECT PRODNAME, PRODTYPE, PRODCOST FROM PRODUCTS WHERE UPCASE(PRODTYPE) IN ('PHONE', 'SOFTWARE'), QUIT;
The SAS System Product Product Name Product Type Cost ________________________________________________ _____ Analog Cell Phone Phone $35.00 Digital Cell Phone Phone $175.00 Office Phone Phone $130.00 Spreadsheet Software Software $299.00 Database Software Software $399.00 Wordprocessor Software Software $299.00 Graphics Software Software $299.00 |
The IS NULL predicate is the ANSI approach to selecting one or more rows by evaluating whether a column value is missing or null (see earlier section on null values). The next example selects products from the INVENTORY table that are out-of-stock in inventory.
SQL Code
PROC SQL; SELECT PRODNUM, INVENQTY, INVENCST FROM INVENTORY WHERE INVENQTY IS NULL; QUIT;
PROC SQL; SELECT PRODNUM, INVENQTY, INVENCST FROM INVENTORY WHERE INVENQTY IS NULL; NOTE: No rows were selected. QUIT; NOTE: PROCEDURE SQL used: real time 0.05 seconds |
In the next example products are selected from the INVENTORY table that are currently stocked in inventory. Note that the predicates NOT IS NULL or IS NOT NULL can be specified to produce the same results.
SQL Code
PROC SQL; SELECT PRODNUM, INVENQTY, INVENCST FROM INVENTORY WHERE INVENQTY IS NOT NULL; QUIT;
The SAS System Product Inventory Inventory Number Quantity Cost ______________________________ 1110 20 $45,000.00 1700 10 $28,000.00 5001 5 $1,000.00 5002 3 $900.00 5003 10 $2,000.00 5004 20 $1,400.00 5001 2 $1,200.00 |
The IS MISSING predicate performs identically to the IS NULL predicate by selecting one or more rows containing a missing value (null). The only difference is that specifying IS NULL is the ANSI standard way of expressing the predicate and IS MISSING is commonly used in the SAS System.
The next example uses the IS MISSING predicate with the NOT predicates to select products from the INVENTORY table that are stocked in inventory.
SQL Code
PROC SQL; SELECT PRODNUM, INVENQTY, INVENCST FROM INVENTORY WHERE INVENQTY IS NOT MISSING; QUIT;
Constructing specific search patterns in string expressions is a simple process with the LIKE predicate. The % acts as a wildcard character representing any number of characters, including any combination of upper- or lowercase characters. Combining the LIKE predicate with the % (percent sign) permits case-sensitive searches and is a popular technique used by savvy SQL programmers to find patterns in their data.
Using the LIKE operator with the % (percent sign) provides a wildcard capability enabling the selection of table rows that match a specific pattern. The LIKE predicate is case-sensitive and should be used with care. The wildcard character % preceding and following the search word selects all product types with “Soft” in the name. The following WHERE clause finds patterns in product name (PRODNAME) containing the uppercase character “A” in the first position followed by any number of characters.
SQL Code
PROC SQL; SELECT PRODNAME FROM PRODUCTS WHERE PRODNAME LIKE 'A%'; QUIT;
Results
The SAS System Product Name _________________________ Analog Cell Phone |
The next example selects products whose name contains the word “Soft”. The resulting output contains product types such as “Software” and any other products containing the word “Soft”.
SQL Code
PROC SQL; SELECT PRODNAME, PRODTYPE, PRODCOST FROM PRODUCTS WHERE PRODTYPE LIKE '%Soft%'; QUIT;
The SAS System Product Product Name Product Type Cost ________________________________________________ _____ Spreadsheet Software Software $299.00 Database Software Software $399.00 Wordprocessor Software Software $299.00 Graphics Software Software $299.00 |
In the next example, the LIKE predicate is used to check a column for the existence of trailing blanks. The wildcard character % followed by a blank space is specified as the search argument.
SQL Code
PROC SQL; SELECT PRODNAME FROM PRODUCTS WHERE PRODNAME LIKE '% '; QUIT;
The SAS System Product Name _________________________ Dream Machine Business Machine Travel Laptop Analog Cell Phone Digital Cell Phone Office Phone Spreadsheet Software Database Software Wordprocessor Software Graphics Software |
When a pattern search for a specific number of characters is needed, using the LIKE predicate with the underscore (_) provides a way to pattern match character by character. Thus, a single underscore (_) in a specific position acts as a wildcard placement holder for that position only. Two consecutive underscores (__) act as a wildcard placement holder for those two positions. Three consecutive underscores act as a wildcard placement holder for those three positions. And so forth. In the next example, the first position used to search product type contains the character “P” and the next five positions (represented with five underscores) act as a placeholder for any value.
SQL Code
PROC SQL; SELECT PRODNAME, PRODTYPE, PRODCOST FROM PRODUCTS WHERE UPCASE(PRODTYPE) LIKE 'P_____'; QUIT;
The SAS System Product Product Name Product Type Cost ________________________________________________ _____ Analog Cell Phone Phone $35.00 Digital Cell Phone Phone $175.00 Office Phone Phone $130.00 |
The next example illustrates a pattern search of product name (PRODNAME) where the first three positions are represented as a wildcard; the fourth position contains the lowercase character “a”, followed by any combination of upper- or lowercase characters.
SQL Code
PROC SQL; SELECT PRODNAME FROM PRODUCTS WHERE PRODNAME LIKE '___a%'; QUIT;
Results
The SAS System Product Name _________________________ Dream Machine Database Software |
The EXISTS predicate is used to test for the existence of a set of values. In the next example, a subquery is used to check for the existence of customers in the CUSTOMERS table with purchases from the PURCHASES table. More details on subqueries will be presented in Chapter 7, “Coding Complex Queries.”
PROC SQL; SELECT CUSTNUM, CUSTNAME, CUSTCITY FROM CUSTOMERS2 C WHERE EXISTS (SELECT * FROM PURCHASES P WHERE C.CUSTNUM = P.CUSTNUM); QUIT;
Results
The SAS System Customer Number Customer Name Customer's Home City _____________________________________________ _____________ 1 Smith San Diego 7 Lafler Spring Valley 13 Thompson Miami 7 Loffler Spring Valley 1 Smithe San Diego 7 Laughler Spring Valley 7 Laffler Spring Valley |