Now that we have seen how tables are combined with join queries and subqueries, we turn our attention to another type of complex query. The SQL procedure provides users with several table operators: INTERSECT, UNION, OUTER UNION, and EXCEPT, commonly referred to as set operators. In contrast to joins and subqueries where query results are combined horizontally, the purpose of each set operator is to combine or concatenate query results vertically.
The INTERSECT operator creates query results consisting of all the unique rows from the intersection of the two queries. Put another way, the intersection of two queries (A and B) is represented by C, indicating that the rows that are produced occur in both A and in B. As the following figure shows, the intersection of both queries is represented in the shaded area (C).
To see all products that cost less than $300.00 and product types classified as “phone”, you could construct a simple query with a WHERE clause or specify the intersection of two separate queries. The next example illustrates a simple query that specifies a WHERE clause to display phones that cost less than $300.
SQL Code
PROC SQL; SELECT * FROM PRODUCTS WHERE prodcost < 300.00 AND prodtype = 'Phone'; QUIT;
Results
The SAS System Product Manufacturer Product Number Product Name Number Product Type Cost __________________________________________________ ____________________ 2101 Analog Cell Phone 210 Phone $35.00 2102 Digital Cell Phone 210 Phone $175.00 2200 Office Phone 220 Phone $130.00 |
The INTERSECT approach can be constructed to produce the same results as in the previous example. The INTERSECT process assumes that the tables in each query are structurally identical to each other. It overlays the columns from both queries based on position in the SELECT statement. Should you attempt to intersect two queries with different table structures, the process may fail due to differing column types, or the produced results may contain data integrity issues.
The most significant distinction between the two approaches, and one that may affect large table processing, is that the first query example (using the AND operator) takes less time to process: 0.05 seconds versus 0.17 seconds for the second approach (using the INTERSECT operator). The next example shows how the INTERSECT operator achieves the same result less efficiently.
SQL Code
PROC SQL; SELECT * 1 FROM PRODUCTS WHERE prodcost < 300.00 INTERSECT 2 SELECT * 1 FROM PRODUCTS WHERE prodtype = "Phone"; QUIT;
| It is assumed that the tables in both queries are structurally identical because the wildcard character “*” is specified in the SELECT statement. |
| The INTERSECT operator produces rows common to both queries. |
Results
The SAS System Product Manufacturer Product Number Product Name Number Product Type Cost ________________________________________________ _____________________ 2101 Analog Cell Phone 210 Phone $35.00 2102 Digital Cell Phone 210 Phone $175.00 2200 Office Phone 220 Phone $130.00 |
The UNION operator preserves all the unique rows from the combination of queries. The result is the same as if an OR operator is used to combine the results of each query. Put another way, the union of two queries (A and B) represents rows in A or in B or in both A and B. As illustrated in the figure below, the union represents the entire shaded area (A, B, and C).
UNION automatically eliminates duplicate rows from the results, unless the ALL keyword is specified as part of the UNION operator. The column names assigned to the results are derived from the names in the first query.
In order for the union of two or more queries to be successful, each query must specify the same number of columns of the same or compatible types. Type compatibility means that column attributes are defined the same way. Because column names and attributes are derived from the first table, data types must be of the same type. The data types of the result columns are derived from the source table(s).
To see all products that cost less than $300.00 or products classified as a workstation, you have a choice between using OR as in the following query or UNION as in the next. As illustrated in the output from both queries, the results are identical no matter which query is used.
SQL Code
PROC SQL; SELECT * FROM PRODUCTS WHERE prodcost < 300.00 OR prodtype = "Workstation"; QUIT;
Results
The SAS System Product Manufacturer Product Number Product Name Number Product Type Cost __________________________________________________ ____________________ 1110 Dream Machine 111 Workstation $3,200.00 1200 Business Machine 120 Workstation $3,300.00 2101 Analog Cell Phone 210 Phone $35.00 2102 Digital Cell Phone 210 Phone $175.00 2200 Office Phone 220 Phone $130.00 5001 Spreadsheet Software 500 Software $299.00 5003 Wordprocessor Software 500 Software $299.00 5004 Graphics Software 500 Software $299.00 |
In the next example, the UNION operator is specified to combine the results of both queries.
SQL Code
PROC SQL; SELECT * FROM PRODUCTS WHERE prodcost < 300.00 UNION 1 SELECT * FROM PRODUCTS WHERE prodtype = 'Workstation'; QUIT;
| The UNION operator combines the results of two queries. |
Results
The SAS System Product Manufacturer Product Number Product Name Number Product Type Cost __________________________________________________ ___________________ 1110 Dream Machine 111 Workstation $3,200.00 1200 Business Machine 120 Workstation $3,300.00 2101 Analog Cell Phone 210 Phone $35.00 2102 Digital Cell Phone 210 Phone $175.00 2200 Office Phone 220 Phone $130.00 5001 Spreadsheet Software 500 Software $299.00 5003 Wordprocessor Software 500 Software $299.00 5004 Graphics Software 500 Software $299.00 |
The OUTER UNION operator concatenates the results of two queries. As with a DATA step or PROC APPEND concatenation the results consist of rows combined vertically. Put another way, the outer union of two queries (A and B) represents all rows in both A and B with no overlap. As illustrated below, the outer union represents the entire shaded area (A and B).
The next example concatenates the results of two queries. As illustrated in the output, the results show the rows from both queries are concatenated.
SQL Code
PROC SQL; SELECT prodnum, prodname, prodtype, prodcost FROM PRODUCTS OUTER UNION 1 SELECT prodnum, prodname, prodtype, prodcost FROM PRODUCTS; QUIT;
| The OUTER UNION operator concatenates the results of both queries. |
The SAS System Product Product Product Type Cost Product Type Cost _____________________________________________________ Workstation $3,200.00 . Workstation $3,300.00 . Laptop $3,400.00 . Phone $35.00 . Phone $175.00 . Phone $130.00 . Software $299.00 . Software $399.00 . Software $299.00 . Software $299.00 . . Workstation $3,200.00 . Workstation $3,300.00 . Laptop $3,400.00 . Phone $35.00 . Phone $175.00 . Phone $130.00 . Software $299.00 . Software $399.00 . Software $299.00 . Software $299.00 |
The OUTER UNION operator automatically concatenates rows from two queries with no overlap, unless the CORRESPONDING (CORR) keyword is specified as part of the operator. The column names assigned to the results are derived from the names in the first query. In the next example, the CORR keyword enables columns with the same name and attributes to be overlaid.
SQL Code
PROC SQL; SELECT prodnum, prodname, prodtype, prodcost FROM PRODUCTS OUTER UNION CORR 1 SELECT prodnum, prodname, prodtype, prodcost FROM PRODUCTS; QUIT;
| The OUTER UNION operator with the CORR keyword concatenates and overlays the results of both queries. |
Results
The SAS System Product Product Number Product Name Product Type Cost _______________________________________________ _______________ 1110 Dream Machine Workstation $3,200.00 1200 Business Machine Workstation $3,300.00 1700 Travel Laptop Laptop $3,400.00 2101 Analog Cell Phone Phone $35.00 2102 Digital Cell Phone Phone $175.00 2200 Office Phone Phone $130.00 5001 Spreadsheet Software Software $299.00 5002 Database Software Software $399.00 5003 Wordprocessor Software Software $299.00 5004 Graphics Software Software $299.00 1110 Dream Machine Workstation $3,200.00 1200 Business Machine Workstation $3,300.00 1700 Travel Laptop Laptop $3,400.00 2101 Analog Cell Phone Phone $35.00 2102 Digital Cell Phone Phone $175.00 2200 Office Phone Phone $130.00 5001 Spreadsheet Software Software $299.00 5002 Database Software Software $399.00 5003 Wordprocessor Software Software $299.00 5004 Graphics Software Software $299.00 |
The EXCEPT operator compares rows from two queries to determine the changes made to the first table that are not present in the second table. The result below shows new and changed rows in the first table that are not in the second table, but not rows that have been deleted from the second table. As illustrated in figure 7-4, the results of specifying the EXCEPT operator represent the shaded area (A) in the diagram.
When working with two tables consisting of similar information, you can use the EXCEPT operator to determine new and modified rows. The EXCEPT operator compares rows in both tables to identify the rows existing in the first table but not in the second table. It also uniquely identifies rows that have changed from the first to the second tables. Columns are compared in the order they appear in the SELECT statement.
If the wildcard character “*” is specified in the SELECT statement, it is assumed that the tables are structurally identical to one another. Let’s look at an example.
Suppose you have master and backup tables of the CUSTOMERS file, and you want to compare them to determine the new and changed rows. The EXCEPT operator as illustrated in the next example returns all new or changed rows from the CUSTOMERS table that do not appear in the CUSTOMERS_BACKUP table. As illustrated by the output, three new customer rows are added to the CUSTOMERS table that had not previously existed in the CUSTOMERS_BACKUP table.
SQL Code
PROC SQL; SELECT * FROM CUSTOMERS EXCEPT 1 SELECT * FROM CUSTOMERS_BACKUP; QUIT;
| The EXCEPT operator compares rows in both tables to identify the rows existing in the first table but not the second table. |
Results
The SAS System Customer Number Customer Name Customer's Home City _______________________________________________ ____________ 1302 Software Intelligence Cor Spring Valley 1901 Shipp Consulting San Pedro 1902 Gupta Programming Simi Valley |