As the previous examples in this chapter have shown, an inner join disregards any rows where the search condition is not met. This differs significantly from the way an outer join groups tables. In contrast with an inner join, an outer join keeps rows that match the ON (search) condition, as well as preserving some or all of the unmatched data from one or both of the tables. Essentially, an outer join retains rows from one table even when they do not match rows in the second table. This distinction is critical because this is what truly differentiates an outer join from an inner join.
Next, an outer join is capable of processing a maximum of two tables at a time, whereas (under the SAS implementation) an inner join is able to process a maximum of 32 tables.
Another difference has to do with how you specify outer join syntax. The comma used to designate or delimit one table from the other in the FROM clause of inner joins is replaced with one of the following keywords: LEFT JOIN, RIGHT JOIN, or FULL JOIN in outer joins Additionally, the WHERE clause expression used to restrict what rows are kept in the result table is replaced with the ON keyword.
Finally, an outer join is considered to be an asymmetric join (Lorie, Raymond A. and Jean-Jacques Daudenarde, SQL & Its Applications, page 87). Unlike inner joins, an outer join does not select rows proportionally from its parts or tables.
Let’s look at how a left join is applied in a real-world situation. Suppose you want to see a list of all manufacturers, their city locations, manufacturer numbers, their product types, and product costs (if available) without leaving out those manufacturers that do not have products yet. This means that the MANUFACTURERS table (left table) acts as the master table having its rows preserved while the PRODUCTS table (right table) acts as the contributing table (subordinate table). The following left outer join example effectively retains those matched rows from both tables as well as those rows from the left table that have no match in the right table.
SQL Code
PROC SQL; SELECT manuname, manucity, manufacturers.manunum, products.prodtype, products.prodcost FROM MANUFACTURERS LEFT JOIN PRODUCTS 1 ON manufacturers.manunum = 2 products.manunum; QUIT;
| The LEFT JOIN specification preserves all the rows in the left table (MANUFACTURERS) even when there are no matching rows in the right table (PRODUCTS). |
| The ON clause acts as a WHERE clause to select the desired rows in the join results. |
As the results from the left outer join illustrate, the rows in the left (MANUFACTURERS) table that match rows in the right (PRODUCTS) table are included in the result table. As a result, eight rows match as evidenced by the value assigned to product type and product cost. Additionally, two rows from the left table that do not match rows in the right table (based on the search condition) are also retained (bolded). Therefore, each row from the MANUFACTURERS table that does not have a matching value in the PRODUCTS table is added to the resulting virtual table, accompanied by null values in the product type and product cost columns.
The SAS System Manufacturer Product Manufacturer Name Manufacturer City Number Product Type Cost __________________________________________________ _______________________________ Cupid Computer Houston 111 Workstation $3,200.00 Storage Devices Inc San Mate 120 Workstation $3,300.00 Global Comm Corp San Diego 210 Phone $175.00 Global Comm Corp San Diego 210 Phone $35.00 KPL Enterprises San Diego 500 Software $299.00 KPL Enterprises San Diego 500 Software $299.00 KPL Enterprises San Diego 500 Software $299.00 KPL Enterprises San Diego 500 Software $399.00 World Internet Corp Miami 600 . San Diego PC Planet San Diego 700 . |
To provide greater subsetting capabilities as well as added flexibility, the SQL procedure also permits the specification of an optional WHERE clause in addition to an ON clause when constructing outer joins. The ability to specify a WHERE clause in conjunction with an ON clause permits greater control over the subsetting of rows. An example will help illustrate how a WHERE clause is used in an outer join. Suppose you want to limit the results from the previous left outer join to only those products costing less than $300. In this example, the left outer join syntax uses a WHERE clause to subset row results to nonmissing products that cost less than $300.
SQL Code
PROC SQL; SELECT manuname, manucity, manufacturers.manunum, products.prodtype, products.prodcost FROM MANUFACTURERS LEFT JOIN PRODUCTS ON manufacturers.manunum = products.manunum WHERE prodcost < 300 AND 1 prodcost NE .; QUIT;
| The optional WHERE clause specified in addition to an ON clause in an outer join further subsets the joined results. |
Results
The SAS System
1
Manufacturer Product
Manufacturer Name Manufacturer City
Number Product Type Cost
__________________________________________________
_________________________________
Global Comm Corp San Diego
210 Phone $175.00
Global Comm Corp San Diego
210 Phone $35.00
KPL Enterprises San Diego
500 Software $299.00
KPL Enterprises San Diego
500 Software $299.00
KPL Enterprises San Diego
500 Software $299.00 |
Suppose you need to produce a monthly report consisting of a total invoice amount by manufacturer. An aggregate function can be specified with outer join syntax to perform a group computation using a GROUP BY clause. In the next example, a left join computes the total invoice amount for each manufacturer with a SUM function and GROUP BY clause.
SQL Code
PROC SQL; SELECT manuname, SUM(invoice.invprice) AS Total_Invoice_Amt 1 FORMAT=DOLLAR10.2 FROM MANUFACTURERS LEFT JOIN INVOICE ON manufacturers.manunum = invoice.manunum GROUP BY MANUNAME; 2 QUIT;
| The SUM function computes the total invoice amount for each manufacturer. |
| The GROUP BY clause groups all rows associated with a manufacturer into a single row. |
The results show that manufacturers with no activity have a null or missing value in the aggregated Total_Invoice_Amt column.
Results
The SAS System 1 2 Total_ Manufacturer Name Invoice_Amt _______________________________________ Cupid Computer $9,600.00 Global Comm Corp $245.00 KPL Enterprises $25,789.00 San Diego PC Planet . Storage Devices Inc . World Internet Corp $1,598.00 |
Right joins are similar to left joins, except the rows in the right (second) table are preserved. Consequently, the results will contain the rows of the symmetric join plus a row for each unmatched row in the right table. Nulls are automatically substituted for values from the left table. Suppose you want to see all manufacturers and their respective products. In the next example, a simple report containing products, product type, manufacturer number, and manufacturer name is produced from the PRODUCTS and MANUFACTURERS tables using a right outer join construct.
SQL Code
PROC SQL; SELECT prodname, prodtype, products.manunum, manuname FROM PRODUCTS RIGHT JOIN MANUFACTURERS 1 ON products.manunum = manufacturers.manunum; QUIT;
| The RIGHT JOIN specification preserves all the rows in the right table (MANUFACTURERS) even when there are no matching rows in the left table (PRODUCTS). |
The results show that manufacturers appearing in the MANUFACTURERS table with no products listed in the PRODUCTS table have null or missing values in the Product Name, Product Type, and Manufacturer Number columns.
Note: | To remove rows with missing values in the results, a WHERE clause could be specified. |
Results
The SAS System
1
Manufacturer Manufacturer
Product Name Product Type
Number Name
_____________________________________________
___________________________
Dream Machine Workstation
111 Cupid Computer
Business Machine Workstation
120 Storage Devices Inc
Digital Cell Phone Phone
210 Global Comm Corp
Analog Cell Phone Phone
210 Global Comm Corp
Spreadsheet Software Software
500 KPL Enterprises
Graphics Software Software
500 KPL Enterprises
Wordprocessor Software Software
500 KPL Enterprises
Database Software Software
500 KPL Enterprises
. World Internet Corp
. San Diego PC Planet |
Full outer joins combine the power of left and right joins by preserving rows from both the left and right tables. Although a full join is not used as frequently as left join or right join constructs, it can be useful when information from both tables is missing. In the next example, a full outer join is specified to produce a report containing manufacturers with no products and products with no known manufacturers.
SQL Code
PROC SQL; SELECT prodname, prodtype, products.manunum, manuname FROM PRODUCTS FULL JOIN MANUFACTURERS 1 ON products.manunum = manufacturers.manunum; QUIT;
| The full join specification preserves all the rows in the left table (PRODUCTS) as well as all rows in the right table (MANUFACTURERS) even when there are no matching rows. |
Results
The SAS System
1
Manufacturer
Product Name Product Type
Number Manufacturer Name
__________________________________________________
___________________________
Dream Machine Workstation
111 Cupid Computer
Business Machine Workstation
120 Storage Devices Inc
Travel Laptop Laptop
170
Digital Cell Phone Phone
210 Global Comm Corp
Analog Cell Phone Phone
210 Global Comm Corp
Office Phone Phone
220
Spreadsheet Software Software
500 KPL Enterprises
Graphics Software Software
500 KPL Enterprises
Wordprocessor Software Software
500 KPL Enterprises
Database Software Software
500 KPL Enterprises
. World Internet Corp
. San Diego PC Planet |