As was mentioned earlier, inner joins can handle a maximum of 32 tables at a time, and are the most recognized and widely used type of join. They are principally used to restrict rows where the specific search condition is not met. As a result, only rows satisfying the conditions specified in the WHERE clause are kept. This is in direct contrast with outer joins (discussed in a later section).
The most common form of inner join often referred to as an equijoin uses an equal sign “=” in the WHERE clause to indicate equality between the columns in two or more tables. Suppose you wanted to match products with their corresponding manufacturers so that all products from each manufacturer would be listed. An equijoin is performed to equate the manufacturer number from tables PRODUCTS and MANUFACTURERS.
SQL Code
PROC SQL; SELECT prodname, prodcost, manufacturers.manunum, manuname 1 2 FROM PRODUCTS, MANUFACTURERS WHERE products.manunum = 3 manufacturers.manunum; QUIT;
| The PRODUCTS table is the first table specified in the FROM clause. |
| The MANUFACTURERS table is the second table specified in the FROM clause. |
| The specification of an equal sign “=” in a WHERE clause between the columns in the tables indicates an equality type of join. |
The SAS System Product Manufacturer Product Name Cost Number Manufacturer Name ________________________________________________ _________________ Dream Machine $3,200.00 111 Cupid Computer Business Machine $3,300.00 120 Storage Devices Inc Analog Cell Phone $35.00 210 Global Comm Corp Digital Cell Phone $175.00 210 Global Comm Corp Spreadsheet Software $299.00 500 KPL Enterprises Database Software $399.00 500 KPL Enterprises Wordprocessor Software $299.00 500 KPL Enterprises Graphics Software $299.00 500 KPL Enterprises |
The previous example can be further qualified by adding another condition in the WHERE clause. For example, suppose you wanted to display only those products from the manufacturer KPL Enterprises. The following join identifies all the products manufactured by KPL Enterprises as specified in the WHERE clause (all rows not meeting the condition of the WHERE clause are automatically excluded from the results of the join).
Note: | This join assumes you know KPL Enterprises’s unique manufacturer number. |
SQL Code
PROC SQL; SELECT prodname, prodcost, manufacturers.manunum, manuname FROM PRODUCTS, MANUFACTURERS WHERE products.manunum = 1 manufacturers.manunum AND products.manunum = 500; QUIT;
| The specification of the AND logical operator in the WHERE clause indicates that both conditions must be true in order to retrieve rows from both tables. |
The SAS System Product Manufacturer Product Name Cost Number Manufacturer Name ________________________________________________ _______________ Spreadsheet Software $299.00 500 KPL Enterprises Database Software $399.00 500 KPL Enterprises Wordprocessor Software $299.00 500 KPL Enterprises Graphics Software $299.00 500 KPL Enterprises |
Let’s extend our knowledge of equijoins a bit further by identifying how much money is tied up with products manufactured by KPL Enterprises. To accomplish this, you need to do two things. First, you need to sum the product cost (PRODCOST) column across all rows that match the WHERE clause condition. Because the objective of the equijoin is to compute a total amount for products manufactured by KPL Enterprises, you need to prevent duplicate rows from displaying in the result. To do so, specify the DISTINCT keyword.
SQL Code
PROC SQL; SELECT DISTINCT SUM(prodcost) AS Total_Cost 1 FORMAT=DOLLAR10.2, manufacturers.manunum FROM PRODUCTS, MANUFACTURERS WHERE products.manunum = manufacturers.manunum AND manufacturers.manuname = 'KPL Enterprises'; QUIT;
| The DISTINCT keyword prevents duplicate rows from appearing in the result. |
The SAS System Total_Cost Manufacturer Name _____________________________________ $1,296.00 KPL Enterprises |
Another type of inner join is known as a non-equijoin. As you might guess from its name, a non-equijoin does not have an equal sign “=” specified in its WHERE clause. For example, suppose you want to display products manufactured by KPL Enterprises that cost more than $299.00. The use of the greater than “>”operator gives this type of join its name.
Note: | When the SQL procedure optimizer is unable to optimize a join query by reducing the Cartesian product, a message is displayed in the SAS log indicating that the join requires performing one or more Cartesian product joins and cannot be optimized. |
SQL Code
PROC SQL; SELECT prodname, prodtype, prodcost, manufacturers.manunum, manufacturers.manuname FROM PRODUCTS, MANUFACTURERS WHERE manufacturers.manunum = 500 AND prodtype = 'Software' AND prodcost > 299.00; 1 QUIT;
| The specification of the greater than “>”operator in the WHERE clause indicates a non-equijoin scenario. |
PROC SQL; SELECT prodname, prodtype, prodcost, manufacturers.manunum, manufacturers .manuname FROM PRODUCTS, MANUFACTURERS WHERE manufacturers.manunum = 500 AND prodtype = 'Software' AND prodcost > 299.00; NOTE: The execution of this query involves performing one or more Cartesian product joins that can not be optimized. QUIT; NOTE: PROCEDURE SQL used: real time 0.01 seconds cpu time 0.01 seconds |
Results
The SAS System Product Product Manufacturer Manufacturer Product Name Type Cost Number Name ________________________________________________ ________________________ Database Software Software $399.00 500 KPL Enterprises |
The final type of inner join is referred to as a reflexive join, or as it is sometimes called by practitioners a self join. As its name implies, a self join makes an internal copy of a table and joins the copy to itself. Essentially a join of this type joins one copy of a table to itself for the purpose of exploiting and illustrating comparisons between table values. For example, suppose you want to compare the prices of products side-by-side by product type with the less expensive product appearing first (in the first three columns of example result below).
PROC SQL; SELECT products.prodname, products.prodtype, products.prodcost, products_copy.prodname, products_copy.prodtype, products_copy.prodcost 1 2 FROM PRODUCTS, PRODUCTS PRODUCTS_COPY WHERE products.prodtype = 3 products_copy.prodtype AND products.prodcost < products_copy.prodcost; QUIT;
| The PRODUCTS table is the primary table specified in the FROM clause. |
| A copy of the PRODUCTS table called PRODUCTS_COPY is joined with the PRODUCTS table. |
| The WHERE clause requests the same type of products to be compared side-by-side with the less expensive product appearing first. |
Results
The SAS System 1 Product 2 Product Product Name Product Type Cost Product Name Product Type Cost _________________________________________________ _____________________________________________ 3 Dream Machine Workstation $3,200.00 Business Machine Workstation $3,300.00 Analog Cell Phone Phone $35.00 Digital Cell Phone Phone $175.00 Analog Cell Phone Phone $35.00 Office Phone Phone $130.00 Office Phone Phone $130.00 Digital Cell Phone Phone $175.00 Spreadsheet Software Software $299.00 Database Software Software $399.00 Wordprocessor Software Software $299.00 Database Software Software $399.00 Graphics Software Software $299.00 Database Software Software $399.00 |
Looking at another example, suppose you want to find out the names and invoice amounts where, for each customer, you list the names and invoice amounts of each customer with larger invoice amounts. The next example illustrates a very useful application of a self join.
SQL Code
PROC SQL; SELECT invoice.custnum, invoice.invprice, invoice_copy.custnum, invoice_copy.invprice 1 2 FROM INVOICE, INVOICE INVOICE_COPY WHERE invoice.invprice < 3 invoice_copy.invprice; QUIT;
| The INVOICE table is the primary table specified in the FROM clause. |
| A copy of the INVOICE table called INVOICE_COPY is joined with the INVOICE table. |
| The WHERE clause produces names of customers with larger invoice amounts. |
Results
The SAS System Customer Invoice Unit Customer Invoice Unit Number Price Number Price _______________________________________________ 201 $1,495.00 1301 $1,598.00 201 $1,495.00 501 $9,600.00 201 $1,495.00 401 $23,100.00 1301 $1,598.00 501 $9,600.00 1301 $1,598.00 401 $23,100.00 101 $245.00 201 $1,495.00 101 $245.00 1301 $1,598.00 101 $245.00 501 $9,600.00 101 $245.00 801 $798.00 101 $245.00 901 $396.00 101 $245.00 401 $23,100.00 501 $9,600.00 401 $23,100.00 801 $798.00 201 $1,495.00 801 $798.00 1301 $1,598.00 801 $798.00 501 $9,600.00 801 $798.00 401 $23,100.00 901 $396.00 201 $1,495.00 901 $396.00 1301 $1,598.00 901 $396.00 501 $9,600.00 901 $396.00 801 $798.00 901 $396.00 401 $23,100.00 |
Every table in a SAS library must have a unique name to reference it. Table names must conform to valid SAS naming conventions having a maximum length of 32 characters and starting with a letter or underscore (see the SAS Language Reference: Concepts for further details).
To minimize the number of keystrokes when referencing the tables specified in a join query, you can assign an alias or temporary table name reference to each table. When assigned, these arbitrary aliases provide a short-cut method to the tables themselves and are in effect for the duration of the join query but no longer. In the next example, the table alias “P” is assigned to the PRODUCTS table and the alias “M” is assigned to the MANUFACTURERS table in the FROM clause. Table name references in the SELECT statement and WHERE clause are made easier as well.
SQL Code
PROC SQL; SELECT prodnum, prodname, prodtype, M.manunum FROM PRODUCTS P, MANUFACTURERS M 1 WHERE P.manunum = M.manunum AND M.manuname = 'KPL Enterprises'; QUIT;
| The assignment of the table alias “P” and the table alias “M” in the FROM clause provides a short-cut method of referencing the longer table names PRODUCTS and MANUFACTURERS. |
The SAS System Product Manufacturer Number Product Name Product Type Number ______________________________________________ __________________ 5001 Spreadsheet Software Software 500 5002 Database Software Software 500 5003 Wordprocessor Software Software 500 5004 Graphics Software Software 500 |
Join queries, as with simpler queries, can take full advantage of the power of the SQL procedure. Logical and arithmetic operators, predicates, and summary functions are all available for you to use. The join query is an essential component because stored information is not always available in the form we need.
PROC SQL provides the ability to perform basic arithmetic operations such as addition, subtraction, multiplication, and division with columns containing numeric values. Essentially, this enables any query to perform column addition, subtraction, multiplication, and division. Suppose you had to compute the sales tax of 7.75% for all manufactured products sold in the state of California. In the next example, the SELECT statement shows the California sales tax (using the product cost column and the fixed sales tax percentage) computation, assigns a column alias to the result column as well as a format and label to enhance the readability of the result.
SQL Code
PROC SQL; SELECT prodname, prodtype, prodcost, prodcost * .0775 AS SalesTax 1 FORMAT=dollar10.2 LABEL='California Sales Tax' FROM PRODUCTS P, MANUFACTURERS M WHERE P.manunum = M.manunum AND M.manustat = 'CA'; QUIT;
| The ability to perform basic arithmetic operations in a SELECT statement as well as assign a column alias to the result is part of the SQL ANSI standard. |
Results
The SAS System
1
Product California
Product Name Product Type
Cost Sales Tax
_______________________________________________
_________________
Business Machine Workstation $3
,300.00 $255.75
Analog Cell Phone Phone
$35.00 $2.71
Digital Cell Phone Phone
$175.00 $13.56
Spreadsheet Software Software
$299.00 $23.17
Database Software Software
$399.00 $30.92
Wordprocessor Software Software
$299.00 $23.17
Graphics Software Software
$299.00 $23.17 |
Up to this point, our examples have been limited to two-table joins. But what if more information is needed than the two tables can provide? To extract the required information, access to a third table may be necessary. A join with three tables is a fairly simple extension of a two-table join.
As before, each joinable column must possess the same column attributes and contain the same type of information. Besides listing all required tables in the FROM clause, the WHERE clause would need to include any and all restrictions to subset only the rows desired. For example, suppose you want to display only those products along with their invoice quantity that appear in the INVOICE table for the manufacturer KPL Enterprises (manunum=500).
PROC SQL; SELECT P.prodname, P.prodcost, M.manuname, I.invqty FROM PRODUCTS P, MANUFACTURERS M, INVOICE I WHERE P.manunum = M.manunum AND P.prodnum = I.prodnum AND M.manunum = 500; QUIT;
Results
The SAS System Invoice Quantity Product - Units Product Name Cost Manufacturer Name Sold _______________________________________________ __________________ Spreadsheet Software $299.00 KPL Enterprises 5 Database Software $399.00 KPL Enterprises 2 |
Let’s examine the construction of the WHERE clause for this three-way join a bit further. The column containing the manufacturer number from the PRODUCTS, MANUFACTURERS, and INVOICE tables is joined using an AND logical operator in the WHERE clause. Additionally, the WHERE clause restricts the resulting table to only product invoices for manufacturer (manunum=500). In the next example, a three-way join lists the product names and costs, along with the customer who bought each product.
PROC SQL; SELECT P.prodname, P.prodcost, C.custname, I.invprice FROM PRODUCTS P, INVOICE I, CUSTOMERS C WHERE P.prodnum = I.prodnum AND I.custnum = C.custnum; QUIT;
Results
The SAS System Product Invoice Product Name Cost Customer Name Price ________________________________________________ ______________________ Analog Cell Phone $35.00 La Mesa Computer Land $245.00 Spreadsheet Software $299.00 Vista Tech Center $1,495.00 Business Machine $3,300.00 La Jolla Computing $23,100.00 Dream Machine $3,200.00 Alpine Technical Center $9,600.00 Database Software $399.00 Jamul Hardware & Software $798.00 |
Occasionally, information needs to be extracted from four, five, or more tables (up to a maximum of 32 tables). Joins of four or more tables can be constructed just like those accessing two or three tables. The only difference is the number of table references in the FROM clause and the level of complexity in the WHERE clause to restrict what rows are kept. Suppose you want to know, based on invoices, the number of products ordered before September 1, 2000. One way to find this information is to perform a join with four tables.
SQL Code
PROC SQL; SELECT sum(inventory.invenqty) AS Products_Ordered_Before_09012000 FROM PRODUCTS, INVOICE, CUSTOMERS, INVENTORY WHERE inventory.orddate < mdy(09,01,00) AND products.prodnum = invoice.prodnum AND invoice.custnum = customers.custnum AND invoice.prodnum = inventory.prodnum; QUIT;
Results
The SAS System Products_ Ordered_Before_ 09012000 _________ 8 |
If you were wondering whether this result could have been derived another way, you would be correct. You could also determine, based on invoices, the number of products ordered before September 1, 2000, with the following two-way join code. As can be seen, there is often more than one way to construct a join to extract the information you want.
SQL Code
PROC SQL; SELECT sum(inventory.invenqty) AS Products_Ordered_Before_09012000 FROM INVOICE I, INVENTORY I2 WHERE inventory.orddate < mdy(09,01,00) AND invoice.prodnum = inventory.prodnum; QUIT;
Results
The SAS System Products_ Ordered_Before_ 09012000 _________ 8 |
To expand your understanding of joins with more than three tables, we will illustrate a four-table join. Suppose you want to know the products being purchased and who is purchasing them. The next example shows a four-way inner join that combines data from the MANUFACTURERS, PRODUCTS, INVOICE, and CUSTOMERS tables.
SQL Code
PROC SQL; SELECT products.prodname, products.prodtype, customers.custname, manufacturers.manuname FROM MANUFACTURERS, PRODUCTS, INVOICE, CUSTOMERS WHERE manufacturers.manunum = products.manunum AND manufacturers.manunum = invoice.manunum AND products.prodnum = invoice.prodnum AND invoice.custnum = customers.custnum; QUIT;
The SAS System Product Name Product Type Customer Name Manufacturer Name -------------------------------------------------- --------------------------------- Analog Cell Phone Phone La Mesa Computer Land Global Comm Corp Spreadsheet Software Software Vista Tech Center Incredible Software Dream Machine Workstation Alpine Technical Center Cupid Computer Database Software Software Jamul Hardware & Software KPL Enterprises |