CHAPTER 5

image

Joining Tables

Now that you know how to write simple queries using one table and how to use functions and expressions in queries, it is time to learn how to write queries involving two or more tables. In a properly designed relational database, a table contains data about one thing or entity. For example, an order-entry application will have a table storing customer information, a table containing data about orders, and a table containing detail information about each item ordered. The order table has a column, called a foreign key, which refers to a row in the customer table. The detail table has a foreign key column that refers to the order table. By using joins, you can link these tables together within the query so you can display columns from each table in the same result set.

You can also think about joining two tables in terms of a parent-child relationship. The parent row has one or more matching rows in the child table. The child table matches back to just one row in the parent table. In the previous example, the customer table is a parent to the orders table and the orders table is a parent to the details table.

Learning how to join tables is a critical skill for T-SQL developers because it allows you to combine the relational data stored in multiple tables and present it as a single result set. There are also multiple types of joins. You will start this chapter learning about the most common type, the INNER JOIN. Make sure you understand all the example code and complete the exercises in this chapter before moving on to the next chapter.

Using INNER JOIN

Most of the time, to join tables together, you will use INNER JOIN. When connecting two tables with INNER JOIN, only the rows from the tables that match on the joining columns will show up in the results. If you join the customer and order tables, the query will return only the customers who have placed orders, along with the orders that have been placed. Only the rows where the customer ID is common in both tables will show up in the results.

Joining Two Tables

To join tables together, it might seem like another major clause would need to be added to the SELECT statement. This is not the case. Instead, the FROM clause contains information about how the tables join together. Here is the syntax for joining two tables (the keyword INNER is optional):

SELECT <select list>
FROM <table1>
[INNER] JOIN <table2> ON <table1>.<col1> = <table2>.<col2>

Figure 5-1 shows how the Sales.SalesOrderHeader and Sales.SalesOrderDetail tables connect and shows some of the columns in the tables. You will see these tables joined in the first example query, so make sure you understand how they connect before typing the code in Listing 5-1.

9781484200476_Fig05-01.jpg

Figure 5-1. The Sales.SalesOrderHeader and Sales.SalesOrderDetail tables

The Sales.SalesOrderHeader table has a primary key called SalesOrderID. The Sales.SalesOrderDetail table has a composite primary key, one that is made up of more than one column, consisting of SalesOrderDetailID and SalesOrderID. The SalesOrderID column in the Sales.SalesOrderDetail table is also a foreign key pointing back to the Sales.SalesOrderHeader table. The arrow points from the foreign key in the Sales.SalesOrderDetail table to the primary key in the Sales.SalesOrderHeader table.

Take a look at the code in Listing 5-1. Type in and execute the code to learn how to join the two tables.

Listing 5-1. Joining Two Tables

SELECT s.SalesOrderID, s.OrderDate, s.TotalDue, d.SalesOrderDetailID,
    d.ProductID, d.OrderQty
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail AS d ON s.SalesOrderID = d.SalesOrderID;

Figure 5-2 displays the partial results. The SELECT list may contain columns from either of the tables. In the FROM clause, you list one of the tables followed by the words INNER JOIN and the second table name. To define how the two tables join together, use the keyword ON and an equality expression. Each Sales.OrderHeader row contains a unique SalesOrderID. Each Sales.SalesOrderDetail row contains a SalesOrderID column that determines to which order the detail belongs. When you join these two tables together, the query displays every row from the Sales.SalesOrderHeader table that matches one or more rows in the Sales.SalesOrderDetail table.

9781484200476_Fig05-02.jpg

Figure 5-2. The partial results of joining two tables

Take a look at the data from the Sales.SalesOrderHeader columns in the query results. The information from the Sales.SalesOrderHeader table repeats for each matching row in the Sales.SalesOrderDetail table. If a row exists in the Sales.SalesOrderHeader table with no matches in the Sales.SalesOrderDetail table, the Sales.SalesOrderHeader row will not show up in the results.

Because the column name, SalesOrderID, is the same in both tables, it must be fully qualified with the table name anywhere it is used in the query. To save typing, use an alias for each table. Notice that the query uses the table alias for all the columns in the SELECT list. Fully qualifying the column name is not required except for the columns with the same name; however, fully qualifying all of the column names will make the query more readable. Six months after you write a query, you can immediately see which table each column came from without spending a lot of time trying to figure it out. There are many schools of thought represented here. Some recommend aliasing even one table query so you don’t have to go back to fix it if another table is added later. You will also see recommendations to use abbreviations for each table name. The worst recommendation I have seen in the real world is lettering tables, such as “A,” “B,” “C,” and so on. These letters mean nothing in regard to your tables.

Avoiding an Incorrect Join Condition

Although you must specify join criteria with ON in the FROM clause when using INNER JOIN, nothing can prevent you from writing the join incorrectly. Take a look at Listing 5-2. If you decide to run the code, you may have to click the red square Cancel Executing Query icon to the right of the Execute icon to stop query execution, or the query will run for several minutes and return almost 4 billion rows. Not only is this a waste of your time, but it also affects all other operations on the SQL Server.

Listing 5-2. Writing an Incorrect Query

SELECT s.SalesOrderID, OrderDate, TotalDue, SalesOrderDetailID,
    d.ProductID, d.OrderQty
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesOrderDetail d ON 1 = 1;

Figure 5-3 displays a portion of the results after scrolling down more than 3,000 rows. When comparing the results to those in Figure 5-2, you will see that the rows from Sales.SalesOrderHeader join inappropriate rows from Sales.SalesOrderDetail. Both sets of results show SalesOrderID 43659, but the results are correct only in Figure 5-2. Because 1 = 1 is always true, every row from the first table joins every row from the second table to produce these incorrect results, which is also called a Cartesian product.

9781484200476_Fig05-03.jpg

Figure 5-3. The partial results of an incorrect join

Whenever you write a query with INNER JOIN, make sure you understand the relation between the two tables. For example, you could join the OrderQty column from the Sales.SalesOrderDetail table to the SalesOrderID column in the Sales.SalesOrderHeader table. The query would run, but the results would not make any sense at all.

Joining on a Different Column Name

In the previous two examples, the key column names happen to be the same, but this is not a requirement. The Person.Person table contains information about people from several tables in the AdventureWorks database. Figure 5-4 shows how the Person.Person and the Sales.Customer table connect. The PersonID from the Sales.Customer table joins to the BusinessEntityID in the Person.Person table. The PersonID column in the Sales.Customer table is the foreign key.

9781484200476_Fig05-04.jpg

Figure 5-4. How to connect the Sales.Customer and Person.Person tables

Listing 5-3 shows an example that joins these two tables.

Listing 5-3. Joining Two Tables with Different Column Names

SELECT c.CustomerID, c.PersonID, p.BusinessEntityID, p.LastName
FROM Sales.Customer AS c
INNER JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID;

Figure 5-5 shows the partial results. The Person.Person table contains information about people from several tables in the database. In this case, the columns joining the two tables have different names. The PersonID from the Sales.Customer table joins to the BusinessEntityID in the Person.Person table. This works even though the columns have different names.

9781484200476_Fig05-05.jpg

Figure 5-5. The partial results of joining tables with different key column names

Joining on More Than One Column

Although a JOIN frequently involves joining a column from one table to a column from another table, sometimes you must join multiple columns. The AdventureWorks database contains only one example in which multiple columns must be used in a single JOIN: Sales.SalesOrderDetail to Sales.SpecialOfferProduct. Figure 5-6 shows how these two tables connect.

9781484200476_Fig05-06.jpg

Figure 5-6. How to connect the Sales.SalesOrderDetail table to the Sales.SpecialOfferProduct table

The Sales.SpecialOfferProduct table has a composite primary key composed of SpecialOfferID plus ProductID. To identify a row in this table, you must use both columns. When joining Sales.SalesOrderDetail to the Sales.SpecialOfferProduct table, you specify both columns in the join. Here is the syntax for joining more than one column:

SELECT <SELECT list>
FROM <table1>
[INNER] JOIN <table2> ON <table1>.<col1> = <table2><col2>
    AND <table1>.<col3> = <table2>.<col4>

Type in and execute the code in Listing 5-4 to learn how to join two columns.

Listing 5-4. Joining Two Columns

SELECT sod.SalesOrderID, sod.SalesOrderDetailID,
    so.ProductID, so.SpecialOfferID, so.ModifiedDate
FROM Sales.SalesOrderDetail AS sod
INNER JOIN Sales.SpecialOfferProduct AS so
    ON so.ProductID = sod.ProductID AND
    so.SpecialOfferID = sod.SpecialOfferID
WHERE sod.SalesOrderID IN (51116,51112);

Take a look at the results (see Figure 5-7). Two columns, ProductID and SpecialOfferID, comprise the join condition. To determine which row matches the rows from Sales.SalesOrderDetail, both columns are used in the join condition. If the join contained only one of the columns, the results would be similar to the incorrect results presented earlier in the section “Avoiding an Incorrect Join Condition.” If the join contained only the ProductID, the results would show every possible SpecialOfferID row for each ProductID, not just the correct rows. Try modifying the join yourself by leaving out one of the conditions to see what happens.

9781484200476_Fig05-07.jpg

Figure 5-7. The partial results of joining on two columns

Joining Three or More Tables

Sometimes you will need to join only two tables together in a query, but more frequently, you will need to join three or more tables. You will often join three tables when there is a many-to-many relationship between two of the tables. For example, suppose you have a table listing college courses and a table listing students. You would need a third table that records which students take which courses. To join courses to students, your query will join all three tables.

In the AdventureWorks database, you will find many reasons to join more than two tables in one query. For example, suppose you want to see a list of the product names for each order, along with the OrderDate column. This query requires the Sales.SalesOrderHeader, Sales.SalesOrderDetail, and Production.Product tables. Figure 5-8 shows how to connect these three tables.

9781484200476_Fig05-08.jpg

Figure 5-8. How to join Sales.SalesOrderDetail to Production.Product and Sales.SalesOrderHeader

To add a third or more tables, just continue the FROM clause. Take a look at the syntax:

SELECT <SELECT list> FROM <table1>
[INNER] JOIN <table2> ON <table1>.<col1> = <table2>.<col2>
[INNER] JOIN <table3> ON <table2>.<col2> = <table3>.<col3>

Type in and execute the query in Listing 5-5 to learn how to join more than two tables in one query.

Listing 5-5. Joining Three Tables

SELECT soh.SalesOrderID, soh.OrderDate, p.ProductID, p.Name
FROM Sales.SalesOrderHeader as soh
INNER JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID
INNER JOIN Production.Product AS p ON sod.ProductID = p.ProductID
ORDER BY soh.SalesOrderID;

Figure 5-9 shows the results. Notice that even though the query joins three tables, the query displays columns from only two of the tables. To get from Sales.SalesOrderHeader to the names of the products ordered in the Production.Product table, the query must include the Sales.SalesOrderDetail table to connect the other two tables. Depending on the goal of the query, you may want to include columns from all tables involved in the query.

9781484200476_Fig05-09.jpg

Figure 5-9. The partial results of joining three tables

Take another look at the FROM clause. The Sales.SalesOrderHeader table joins to the Sales.SalesOrderDetail table on the SalesOrderID column. Then the Sales.SalesOrderDetail table joins the Production.Product table on the ProductID column. If you have trouble figuring out how to join the tables, take it a step at a time. Join two tables first, and then add the third table.

Joining tables is a very important skill for T-SQL developers. Before you move on to the next section, make sure you are comfortable with what the chapter has covered so far by completing Exercise 5-1.

EXERCISE 5-1

Use the AdventureWorks database to complete this exercise. You can find the solutions at the end of the chapter.

  1. The HumanResources.Employee table does not contain the employee names. Join that table to the Person.Person table on the BusinessEntityID column. Display the job title, birth date, first name, and last name.
  2. The customer names also appear in the Person.Person table. Join the Sales.Customer table to the Person.Person table. The BusinessEntityID column in the Person.Person table matches the PersonID column in the Sales.Customer table. Display the CustomerID, StoreID, and TerritoryID columns along with the name columns.
  3. Extend the query written in question 2 to include the Sales.SalesOrderHeader table. Display the SalesOrderID column along with the columns already specified. The Sales.SalesOrderHeader table joins the Sales.Customer table on CustomerID.
  4. Write a query that joins the Sales.SalesOrderHeader table to the Sales.SalesPerson table. Join the BusinessEntityID column from the Sales.SalesPerson table to the SalesPersonID column in the Sales.SalesOrderHeader table. Display the SalesOrderID along with the SalesQuota and Bonus.
  5. Add the name columns to the query written in question 4 by joining on the Person.Person table. See whether you can figure out which columns will be used to write the join.
  6. The catalog description for each product is stored in the Production.ProductModel table. Display the columns that describe the product such as the color and size, along with the catalog description for each product.
  7. Write a query that displays the names of the customers along with the product names they have purchased. Hint: Five tables will be required to write this query!

Using OUTER JOIN

When joining two tables with INNER JOIN, there must be an exact match between the two tables for a row to show up in the results. Occasionally, you’ll need to retrieve all the rows from one of the tables even if the other table doesn’t contain a match for every row. For example, you may want to display all the customers along with their orders, including the customers who have not placed orders yet. By using OUTER JOIN, you can retrieve all the rows from one table along with any rows that match from the other table.

Using LEFT OUTER JOIN

When writing OUTER JOIN, you must specify either LEFT or RIGHT. If the main table, the table you want to see all the rows from even if there is not a match, is on the left side of the join, you will specify LEFT. Figure 5-10 shows how the Sales.Customer and Sales.SalesOrderHeader tables connect when using LEFT OUTER JOIN so that all customers show up in the results even if they have not placed any orders. The tan area of the Venn diagram illustrates how all the CustomerID values in the Sales.Customer table will be returned whether or not there is a matching CustomerID value in the Sales.SalesOrderHeader table. Additionally, all matching CustomerID values (this is where the circles intersect in the diagram) will also be returned.

9781484200476_Fig05-10.jpg

Figure 5-10. How to perform LEFT OUTER JOIN

Here is the syntax for LEFT OUTER JOIN:

SELECT <SELECT list>
FROM <table1>
LEFT [OUTER] JOIN <table2> ON <table1>.<col1> = <table2>.<col2>

Type in and execute the code in Listing 5-6 to learn how to write a LEFT OUTER JOIN query. Note that the word OUTER is optional.

Listing 5-6. Using LEFT OUTER JOIN

SELECT c.CustomerID, s.SalesOrderID, s.OrderDate
FROM Sales.Customer AS c
LEFT OUTER JOIN Sales.SalesOrderHeader AS s ON c.CustomerID = s.CustomerID
WHERE c.CustomerID IN (11028,11029,1,2,3,4);

Figure 5-11 displays the results. Just like INNER JOIN, you must determine which column or columns from one table join the column or columns from the other table. All the rows from the table on the left side of the join, the Sales.Customer table, that meet the criteria in the WHERE clause show up in the results. The query returns rows from the right side of the join, the Sales.SalesOrderHeader table, only if they match on CustomerID. All of the columns from the Sales.SalesOrderHeader rows that don’t match return NULL values. The query returns the customers along with the orders for customers with no orders (customers 1–4).

9781484200476_Fig05-11.jpg

Figure 5-11. The results of using LEFT OUTER JOIN

Using OUTER JOIN is not difficult, but it seems to be confusing when someone first tries to use it. If the tables have the primary and foreign keys defined, the table joining with the primary key will usually be the table on the left side in a LEFT OUTER JOIN. Figure out which table must have rows returned even if there is not a match. That table must show up on the left side of a LEFT OUTER JOIN.

Using RIGHT OUTER JOIN

RIGHT OUTER JOIN differs from LEFT OUTER JOIN in just the location of the tables. If the main table, the table in which you want to see all the rows, even if there is not a match, is on the right side of the join, you will specify RIGHT. Here is the syntax:

SELECT <SELECT list>
FROM <table2>
RIGHT [OUTER] JOIN <table1> ON <table1>.<col1> = <table2>.<col2>

Type in and execute the code in Listing 5-7 to learn how to write a query using RIGHT OUTER JOIN. This query is just like the query in Listing 5-6, only the position of the tables has been switched.

Listing 5-7. Using RIGHT OUTER JOIN

SELECT c.CustomerID, s.SalesOrderID, s.OrderDate
FROM Sales.SalesOrderHeader AS s
RIGHT OUTER JOIN Sales.Customer AS c ON c.CustomerID = s.CustomerID
WHERE c.CustomerID IN (11028,11029,1,2,3,4);

Figure 5-12 shows the results; they are identical to the results shown in Figure 5-11. The only difference between this query and the one from Listing 5-6 is the order of the tables within the FROM clause and the direction keyword. Again, all of the customers who meet the criteria are displayed along with any orders that were placed. For customers with no orders, NULL values are returned in the SalesOrderID and OrderDate columns.

9781484200476_Fig05-12.jpg

Figure 5-12. Result of using RIGHT OUTER JOIN

Using OUTER JOIN to Find Rows with No Match

Sometimes it’s useful to find all the rows in one table that don’t have corresponding rows in another table. For example, you may want to find all the customers who have never placed an order. Because the columns from the nonmatching rows contain NULL values, you can use OUTER JOIN to find rows with no match by checking for NULL. The syntax is as follows:

SELECT <SELECT list>
FROM <table1>
LEFT [OUTER] JOIN <table2> ON <table1>.<col1> = <table2>.<col2>
WHERE <col2> IS NULL

Type in and execute the code in Listing 5-8 to see how this works.

Listing 5-8. Using LEFT OUTER JOIN to Find the Rows with No Matches

SELECT c.CustomerID, s.SalesOrderID, s.OrderDate
FROM Sales.Customer AS c
LEFT OUTER JOIN Sales.SalesOrderHeader AS s ON c.CustomerID = s.CustomerID
WHERE s.SalesOrderID IS NULL;

Figure 5-13 shows the partial results. The query in Listing 5-8 returns a list of all customers who have not placed an order. After you run the query, scroll down to see that every row in the results contains NULL in the SalesOrderID column. Actually, the SalesOrderID and OrderDate columns are not needed in the results and can decrease the performance of this query. In this case, I have left them in the query so that you can verify the NULL values.

9781484200476_Fig05-13.jpg

Figure 5-13. The partial results of finding rows with no match

The LEFT JOIN returns all rows from Sales.Customer even if the customer has no orders. The customer rows with no orders contain NULL in the SalesOrderID and OrderDate columns. By checking for NULL, the customers with no orders show up in the results. Again, this might be complicated to understand at first. Just take it a step at a time when writing your own queries. You will learn another way to find rows with no match in Chapter 6.

Adding a Table to the Right Side of a LEFT JOIN

The next step is to understand what to do when additional tables are added to the query. For example, you might want to display all the customers and their orders even if an order has not been placed, along with the ProductID from those orders that were placed. To keep the customers with no orders from dropping out of the results, you must continue to use LEFT JOIN. Figure 5-14 shows how these three tables can be joined to produce the correct results. Notice the Venn diagram shows the SalesOrderHeader and the SalesOrderDetail tables joining on the SalesOrderID. Those results are matched with the CustomerID to get a result set that includes all of the customers in the Customer table including those without orders. The diagram in Figure 5-14 shows the SalesOrderHeader as the circle linking the CustomerID and SalesOrderDetail together because it is the only table containing both the CustomerID and the SalesOrderID. These types of linking tables are normally referred to as junction tables, which allow you to combine in a single query output SalesOrderDetail and Customer data.

9781484200476_Fig05-14.jpg

Figure 5-14. How to connect the tables with two LEFT OUTER JOINs

Take a look at the syntax:

SELECT <SELECT list>
FROM <table1> LEFT [OUTER]JOIN <table2> ON <table1>.<col1> = <table2>.<col2>
LEFT [OUTER] JOIN <table3> ON <table2>.<col3> = <table3>.<col4>

Listing 5-9 contains an example query. Type in and execute the code to learn how to write this type of join.

Listing 5-9. Joining Three Tables with LEFT OUTER JOIN

SELECT C.CustomerID, SOH.SalesOrderID, SOD.SalesOrderDetailID, SOD.ProductID
FROM Sales.Customer AS C
LEFT OUTER JOIN Sales.SalesOrderHeader AS SOH ON C.CustomerID = SOH.CustomerID
LEFT OUTER JOIN Sales.SalesOrderDetail AS SOD ON SOH.SalesOrderID = SOD.SalesOrderID
WHERE C.CustomerID IN (11028,11029,1,2,3,4);

Figure 5-15 shows the results. Because the columns from the nonmatching rows from Sales.SalesOrderHeader contain NULL, they can’t join to the Sales.SalesOrderDetail table. If you must join another table to the Sales.SalesOrderHeader table, you must use LEFT OUTER JOIN because you can’t join on the NULL values. On your own, change the query by removing the words LEFT OUTER in the join between Sales.SalesOrderHeader and Sales.SalesOrderDetail. The customers with no orders will drop out of the results.

9781484200476_Fig05-15.jpg

Figure 5-15. The results of querying multiple tables with LEFT OUTER JOIN

I prefer listing the main, or parent, table first and using LEFT JOIN over RIGHT JOIN. In fact, this always makes the query more readable regardless of the type of join. If you list the main table first and you start down the LEFT OUTER JOIN path, you can continue to use LEFT. If you start out with RIGHT, you may have to switch to LEFT when you add more tables, which can be confusing.

Adding a Table to the Main Table of a LEFT JOIN

You may be wondering what kind of join you would use if you wanted to join another table to the main table of a left join. To be on the safe side, use LEFT OUTER JOIN to ensure that you will not lose any rows from the main table. In some cases it will not matter if you use INNER JOIN or LEFT OUTER JOIN, and the following example is one of those cases. Because the Territory table is joining directly to the Customer table and every customer must have a territory, you won’t lose any rows with an INNER JOIN. With more experience, you will know when it is safe to use INNER JOIN with situations like these.

Listing 5-10 shows an example query that joins another table to the main table. Type in and execute the code.

Listing 5-10. Adding Another Table to the Left Side of the Join

SELECT C.CustomerID, SOH.SalesOrderID, SOD.SalesOrderDetailID,
    SOD.ProductID, T.Name
FROM Sales.Customer AS C
LEFT OUTER JOIN Sales.SalesOrderHeader AS SOH ON C.CustomerID = SOH.CustomerID
LEFT OUTER JOIN Sales.SalesOrderDetail AS SOD ON SOH.SalesOrderID = SOD.SalesOrderID
LEFT OUTER JOIN Sales.SalesTerritory AS T ON C.TerritoryID = T.TerritoryID
WHERE C.CustomerID IN (11028,11029,1,2,3,4);

Figure 5-16 shows the partial results. The Sales.SalesTerritory table joins the Sales.Customer table on TerritoryID.

9781484200476_Fig05-16.jpg

Figure 5-16. The partial results of joining to the left side of LEFT OUTER JOIN

FULL OUTER JOIN

FULL OUTER JOIN is similar to LEFT OUTER JOIN and RIGHT OUTER JOIN, but in this case, all the rows from each side of the join are returned. In other words, all rows from the left side of the join, even if there is not a match, and all rows from the right side, even if there is not a match, show up in the results. This type of join is rare and could indicate some problems with the database design or the data. For example, this type of join might be necessary if the Sales.SalesOrderHeader table contains orders with invalid CustomerID values. Here is the syntax:

SELECT <column list>
FROM <table1>
FULL [OUTER] JOIN <table2> ON <table1>.<col1>  = <table2>.<col2>

Because no good example exists in the AdventureWorks database, Listing 5-11 includes a script that creates and populates a table of colors that can be used in the Production.Product table. After populating the table, it contains colors that don’t appear in the Production.Product table, and it is missing a color that should be there. Don’t worry about understanding the table creation and population part of the script at this point.

Listing 5-11. FULL OUTER JOIN Demonstration

IF OBJECT_ID('Production.ProductColor') IS NOT NULL BEGIN
    DROP TABLE Production.ProductColor;
END
CREATE table Production.ProductColor
    (Color nvarchar(15) NOT NULL PRIMARY KEY);
GO
--Insert most of the existing colors
INSERT INTO Production.ProductColor
SELECT DISTINCT Color
FROM Production.Product
WHERE Color IS NOT NULL and Color <> 'Silver';
--Insert some additional colors
INSERT INTO Production.ProductColor
VALUES ('Green'),('Orange'),('Purple'),

--Here is the query:
SELECT c.Color AS "Color from list", p.Color, p.ProductID
FROM Production.Product AS p
FULL OUTER JOIN Production.ProductColor AS c ON p.Color = c.Color
ORDER BY p.ProductID;

Figure 5-17 displays the results. When colors from the Production.ProductColor table have no matches in the Production.Product table, the query returns NULL values in the second and third columns, which are from Production.Product (rows 1–3). When colors from the Production.Product table don’t match the Production.ProductColor table (in this case, silver) or no color for a product is specified, the query returns NULL values in the first column, which is from Production.ProductColor (row 12). Finally, when a product has a color that matches one found in the Production.ProductColor table, the query returns all non-NULL values (rows 9–11). A query like this might be used to find problems in data so that it can be cleaned up before loading it into a production system or data warehouse.

9781484200476_Fig05-17.jpg

Figure 5-17. The partial results of using FULL OUTER JOIN

CROSS JOIN

Another type of rarely used join is CROSS JOIN. This is actually the same as the Cartesian product mentioned earlier in the section “Avoiding an Incorrect Join Condition.” In this case, use CROSS JOIN when you intend to multiply two tables together—every row from one table matched to every row from another table. You might write a CROSS JOIN query to populate a table for a special purpose such as an inventory worksheet. You may need a list of every product in every possible location to create forms for the inventory crew. Here is the syntax:

SELECT <SELECT list> FROM <table1> CROSS JOIN <table2>

Notice that the FROM clause doesn’t contain a join condition. Every possible row from one table joins every possible row from another table, so you don’t have to specify a join condition. Listing 5-12 demonstrates how to write this type of query. Type in and execute the code.

Listing 5-12. A CROSS JOIN

--1
SELECT p.ProductID, l.LocationID
FROM Production.Product AS p
CROSS JOIN Production.Location AS l
ORDER BY ProductID;

--2
SELECT p.ProductID, l.LocationID
FROM Production.Product AS p
CROSS JOIN Production.Location AS l
ORDER BY LocationID;

Figure 5-18 shows the partial results. These queries, just sorted differently, each produce a row for every possible product and every possible location. Query 1 shows that product 1 displays along with every location. Query 2 shows that location 1 displays along with every product.

9781484200476_Fig05-18.jpg

Figure 5-18. The partial results of a CROSS JOIN

Self-Joins

A self-join is a special type of query that joins a table back to itself. In this example, you will first create a temporary table named #Employee. Normally the EmployeeID would be a primary key column and the ManagerID would be a foreign key pointing back to the same table. This would ensure that only an existing EmployeeID could be added to the ManagerID column. Every employee, except for one, has a manager—another employee appearing in the same table. The one employee with no manager is the CEO of Adventure Works. The SQL Server team chose to eliminate the self-join when creating recent versions of AdventureWorks in favor of a new feature first introduced with SQL Server 2008, the HIERARCHYID data type. You will learn about HIERARCHYID in Chapter 16.

You can actually join any table to itself even if it doesn’t have a foreign key pointing back to the primary key. This relationship is called a unary relationship. Here is the syntax for a self-join:

SELECT <a.col1>, <b.col1>
FROM <table1> AS a
LEFT [OUTER] JOIN <table1> AS b ON a.<col1> = b.<col2>

Listing 5-13 demonstrates how to write a self-join by creating and populating a temporary table. Type in and execute the code to learn about this.

Listing 5-13. Using Self-Join

CREATE TABLE #Employee (
EmployeeID  int,
ManagerID int,
Title nvarchar(50));

INSERT INTO #Employee
VALUES (1, NULL, 'Chief Executive Officer'),
INSERT INTO #Employee
VALUES (2, 1, 'Engineering Manager'),
INSERT INTO #Employee
VALUES (3, 2, 'Senior Tool Designer'),
INSERT INTO #Employee
VALUES (4, 2, 'Design Engineer'),
INSERT INTO #Employee
VALUES (5, 2, 'Research and Development'),
INSERT INTO #Employee
VALUES (6, 1, 'Marketing Manager'),
INSERT INTO #Employee
VALUES (7, 6, 'Marketing Specialist'),

SELECT a.EmployeeID AS Employee,
    a.Title AS EmployeeTitle,
    b.EmployeeID AS ManagerID,
    b.Title AS ManagerTitle
FROM #Employee AS a
LEFT OUTER JOIN #Employee AS b ON a.ManagerID = b.EmployeeID;

DROP TABLE #Employee;

Take a look at the results shown in Figure 5-19. Each employee, except for one, has a manager who is also an employee listed in the same table. The table has ManagerID, which points back to the Employee. Because employee 1 doesn’t have a manager, the query uses LEFT OUTER JOIN. Be sure to keep track of which table each column is supposed to come from. Even though the query uses the same table twice, it has two separate roles.

9781484200476_Fig05-19.jpg

Figure 5-19. The results of using a self-join

The important thing to remember is that one table is used twice in the query. At least one of the table names must be aliased; this is not optional because you can’t have two tables with the same name in the query. You will have to qualify all the column names, so you may want to alias both table names to save typing.

This section covered several advanced joining techniques. Understanding how the techniques work and when to use them are very important skills. Practice what you have learned by completing Exercise 5-2.

EXERCISE 5-2

Use the AdventureWorks database to complete this exercise. You can find the solutions at the end of the chapter.

  1. Write a query that displays all the products along with the SalesOrderID even if an order has never been placed for that product. Join to the Sales.SalesOrderDetail table using the ProductID column.
  2. Change the query written in question 1 so that only products that have not been ordered show up in the query.
  3. Write a query that returns all the rows from the Sales.SalesPerson table joined to the Sales.SalesOrderHeader table along with the SalesOrderID column even if no orders match. Include the SalesPersonID, SalesYTD and SalesOrderID columns in the results.
  4. Change the query written in question 3 so that the salesperson’s name also displays from the Person.Person table.
  5. The Sales.SalesOrderHeader table contains foreign keys to the Sales.CurrencyRate and Purchasing.ShipMethod tables. Write a query joining all three tables, and make sure it contains all rows from Sales.SalesOrderHeader. Include the CurrencyRateID, AverageRate, SalesOrderID, and ShipBase columns.
  6. Write a query that returns the BusinessEntityID column from the Sales.SalesPerson table along with every ProductID from the Production.Product table.

Thinking About Performance

When joining two tables, the database engine can choose from three ways to join the tables: Merge Join, Hash Match, and Nested Loop. There is no one way that is the best; SQL Server will select the best choice for each situation. The component of SQL Server that chooses a good execution plan for each query is called the optimizer. To learn more about this topic, view the Technet article found at http://technet.microsoft.com/en-us/library/ms191426(v=sql.105).aspx.

Merge Join

The Merge Join operator will be used when the optimizer estimates that a relatively large number of rows will be returned from both sides of the join, and the data on each side is sorted by the join key. In the AdventureWorks database, the Sales.SalesOrderHeader table joins to the Sales.SalesOrderDetail table on SalesOrderID. This column is the primary key of the Sales.SalesOrderHeader table and the first column of the primary key of the  Sales.SalesOrderDetail table. The primary keys are also used as the clustered index keys, therefore, each of those tables is sorted by the SalesOrderID column. The query in Listing 5-14 uses a Merge Join.

Listing 5-14. Query Using a Merge Join

SELECT SOH.SalesOrderID, SOD.OrderQty, SOD.ProductID
FROM Sales.SalesOrderHeader AS SOH
INNER JOIN Sales.SalesOrderDetail AS SOD
ON SOH.SalesOrderID = SOD.SalesOrderID;

Figure 5-20 shows the execution plan for this query with the Merge Join operator. As this query is processed, the SalesOrderID of the two sides is compared. When there is a match, the combined row is returned as part of the results.

9781484200476_Fig05-20.jpg

Figure 5-20. The execution plan showing a Merge Join

Figure 5-21 illustrates how this works. The two inputs are sorted. In step 1 the first two values are compared and they match. Now the first value of the top input is compared to the second value of the bottom input in step 2 for another match. In step three, the first value of the top row is compared to the third value of the bottom input. In this case, it is not a match. Now the process can move to the second value of the top input in step 4 where there is another match. In step 5, the inputs do not match so the process moves to the last two items in step 6.

9781484200476_Fig05-21.jpg

Figure 5-21. How a Merge Join works

Nested Loop

If the optimizer estimates that one side of the join has a small number of rows to join, it may choose to use a Nested Loop. It doesn’t matter if the two sides of the join are sorted. With a Nested Loop, SQL Server will loop through all the rows from the small side of the join looking for a match in the rows of the larger side. Listing 5-15 will produce a Nested Loop.

Listing 5-15. Query Using a Nested Loop

SELECT SOH.SalesOrderID, SOD.OrderQty, SOD.ProductID
FROM Sales.SalesOrderHeader AS SOH
INNER JOIN Sales.SalesOrderDetail AS SOD
ON SOH.SalesOrderID = SOD.SalesOrderID
WHERE SOH.CustomerID = 11000;

Figure 5-22 shows the execution plan featuring Nested Loop. In this case, the database engine will loop through the clustered index of the Sales.SalesOrderDetail table for every row found in the IX_SalesOrderHeader_CustomerID index. Even if there is not an index on the joining columns, the optimizer may select this type of join.

9781484200476_Fig05-22.jpg

Figure 5-22. The execution plan showing a Nested Loop

To see the estimated and actual number of rows, mouse-over the Seek or Scan operators to see the properties and read the values. Figure 5-23 shows the properties of the seek of the SalesOrderHeader index.

9781484200476_Fig05-23.jpg

Figure 5-23. The properties of the index seek

To illustrate this join operator, take a look at Figure 5-24. In a Nested Loop, the process loops through the smaller side of the join, comparing each value to the values of the larger side of the join. The process loops through the smaller input one time, but loops through the larger input one time for each iteration of the small loop. Notice that this process takes 12 steps compared with only six steps in the Merge Join example because the data doesn’t have to be sorted in this type of operation.

9781484200476_Fig05-24.jpg

Figure 5-24. How a Nested Loop works

Hash Match

Finally, the optimizer may select the Hash Match join type. This type of join is used when the optimizer estimates that a large number of rows will be returned from each side of the join and the input is not sorted on the joining columns. The database engine actually creates hash tables in memory to get this to work. Listing 5-16 shows a query that uses a Hash Match.

Listing 5-16. Query Using Hash Match

SELECT C.CustomerID, TotalDue
FROM Sales.Customer AS C
JOIN Sales.SalesOrderHeader S
ON C.CustomerID = S.CustomerID;

Figure 5-25 shows the execution plan. Because there is not an index on the Sales.SalesOrderHeader table containing both the CustomerID and the TotalDue, the optimizer decides to use Sales.SalesOrderHeader’s clustered index (table). It is sorted by SalesOrderID so a Merge Join can’t be used because CustomerID is the joining column. If TotalDue was added to the IX_SalesOrderHeader_CustomerID index, then the query could perform much better and also use a Merge Join instead. There are two ways the TotalDue column could be added to the index: either as a key column or as an included column. The included column doesn’t help with ordering or searching, but it would allow the index to be used instead of the actual table.

9781484200476_Fig05-25.jpg

Figure 5-25. The execution plan showing a Hash Match

The Hash Match join type builds one or more hash tables, like buckets, with the smaller input. Then the larger input probes the hash table looking for matches. The process will loop through the larger input once, and it will be efficient to find the matches in the hash table.

Indexes can affect the performance of queries joining more than one table. Although this book is not meant to teach index tuning, it is a best practice to add indexes to foreign keys. To learn more about index tuning, see Expert Performance Indexing for SQL Server 2012 by Jason Strate and Ted Kruger (Apress 2012).

Summary

For the data to make sense in reports and applications, tables must be joined together. As you can see from the number of topics in this chapter, there are many ways to do this. Most queries will use the INNER JOIN syntax, but for returning all the rows, even if there is not a match, use an OUTER JOIN. For queries using INNER JOIN, there must be a match on columns from both sides of the join, usually in a parent–child relationship. The OUTER JOIN syntax allows you to join tables when there is not a match in some cases. In Chapter 6 you will learn even more ways to use more than two tables in a query, such as common table expressions and subqueries.

Answers to the Exercises

This section provides solutions to the exercises found on writing queries joining tables.

Solutions to Exercise 5-1: Using INNER JOIN

Use the AdventureWorks database to complete this exercise.

  1. The HumanResources.Employee table does not contain the employee names. Join that table to the Person.Person table on the BusinessEntityID column. Display the job title, birth date, first name, and last name.

    SELECT E.JobTitle, E.BirthDate, P.FirstName, P.LastName
    FROM HumanResources.Employee AS E
    INNER JOIN Person.Person AS P ON
              E.BusinessEntityID = P.BusinessEntityID;

  2. The customer names also appear in the Person.Person table. Join the Sales.Customer table to the Person.Person table. The BusinessEntityID column in the Person.Person table matches the PersonID column in the Sales.Customer table. Display the CustomerID, StoreID, and TerritoryID columns along with the name columns.

    SELECT C.CustomerID, C.StoreID, C.TerritoryID,
              P.FirstName, P.MiddleName, P.LastName
    FROM Sales.Customer AS C
    INNER JOIN Person.Person AS P
              ON C.PersonID = P.BusinessEntityID;

  3. Extend the query written in question 2 to include the Sales.SalesOrderHeader table. Display the SalesOrderID column along with the columns already specified. The Sales.SalesOrderHeader table joins the Sales.Customer table on CustomerID.

    SELECT C.CustomerID, C.StoreID, C.TerritoryID,
              P.FirstName, P.MiddleName,
              P.LastName, S.SalesOrderID
    FROM Sales.Customer AS C
    INNER JOIN Person.Person AS P
              ON C.PersonID = P.BusinessEntityID
    INNER JOIN Sales.SalesOrderHeader AS S
              ON S.CustomerID = C.CustomerID;

  4. Write a query that joins the Sales.SalesOrderHeader table to the Sales.SalesPerson table. Join the BusinessEntityID column from the Sales.SalesPerson table to the SalesPersonID column in the Sales.SalesOrderHeader table. Display the SalesOrderID along with the SalesQuota and Bonus.

    SELECT S.SalesOrderID, SP.SalesQuota, SP.Bonus
    FROM Sales.SalesOrderHeader AS S
    INNER JOIN Sales.SalesPerson AS SP
              ON S.SalesPersonID = SP.BusinessEntityID;

  5. Add the name columns to the query written in question 4 by joining on the Person.Person table. See whether you can figure out which columns will be used to write the join.

    SELECT SalesOrderID, SalesQuota, Bonus, FirstName,
              MiddleName, LastName
    FROM Sales.SalesOrderHeader AS S
    INNER JOIN Sales.SalesPerson AS SP
              ON S.SalesPersonID = SP.BusinessEntityID
    INNER JOIN Person.Person AS P
              ON SP.BusinessEntityID = P.BusinessEntityID;

  6. The catalog description for each product is stored in the Production.ProductModel table. Display the columns that describe the product such as the color and size, along with the catalog description for each product.

    SELECT PM.CatalogDescription, P.Color, P.Size
    FROM Production.Product AS P
    INNER JOIN Production.ProductModel AS PM
              ON P.ProductModelID = PM.ProductModelID;

  7. Write a query that displays the names of the customers along with the product names that they have purchased. Hint: Five tables will be required to write this query!

    SELECT FirstName, MiddleName, LastName, Prod.Name
    FROM Sales.Customer AS C
    INNER JOIN Person.Person AS P
              ON C.PersonID = P.BusinessEntityID
    INNER JOIN Sales.SalesOrderHeader AS SOH
              ON C.CustomerID = SOH.CustomerID
    INNER JOIN Sales.SalesOrderDetail AS SOD
              ON SOH.SalesOrderID = SOD.SalesOrderID
    INNER JOIN Production.Product AS Prod
              ON SOD.ProductID = Prod.ProductID;

Solutions to Exercise 5-2: Using OUTER JOIN

Use the AdventureWorks database to complete this exercise.

  1. Write a query that displays all the products along with the SalesOrderID even if an order has never been placed for that product. Join to the Sales.SalesOrderDetail table using the ProductID column.

    SELECT SalesOrderID, P.ProductID, P.Name
    FROM Production.Product AS P
    LEFT OUTER JOIN Sales.SalesOrderDetail
              AS SOD ON P.ProductID = SOD.ProductID;

  2. Change the query written in question 1 so that only products that have not been ordered show up in the query.

    SELECT SalesOrderID, P.ProductID, P.Name
    FROM Production.Product AS P
    LEFT OUTER JOIN Sales.SalesOrderDetail
             AS SOD ON P.ProductID = SOD.ProductID
    WHERE SalesOrderID IS NULL;

  3. Write a query that returns all the rows from the Sales.SalesPerson table joined to the Sales.SalesOrderHeader table along with the SalesOrderID column even if no orders match. Include the SalesPersonID, SalesYTD and SalesOrderID  columns in the results.

    SELECT SalesOrderID, SalesPersonID, SalesYTD, SOH.SalesOrderID
    FROM Sales.SalesPerson AS SP
    LEFT OUTER JOIN Sales.SalesOrderHeader AS SOH
          ON SP.BusinessEntityID = SOH.SalesPersonID;

  4. Change the query written in question 3 so that the salesperson’s name also displays from the Person.Person table.

    SELECT SalesOrderID, SalesPersonID, SalesYTD, SOH.SalesOrderID,
                FirstName, MiddleName, LastName
    FROM Sales.SalesPerson AS SP
    LEFT OUTER JOIN Sales.SalesOrderHeader AS SOH
            ON SP.BusinessEntityID = SOH.SalesPersonID
    LEFT OUTER JOIN Person.Person AS P
              ON P.BusinessEntityID = SP.BusinessEntityID;

  5. The Sales.SalesOrderHeader table contains foreign keys to the Sales.CurrencyRate and Purchasing.ShipMethod tables. Write a query joining all three tables, and make sure it contains all rows from Sales.SalesOrderHeader. Include the CurrencyRateID, AverageRate, SalesOrderID, and ShipBase columns.

    SELECT CR.CurrencyRateID, CR.AverageRate,
              SM.ShipBase, SalesOrderID
    FROM Sales.SalesOrderHeader AS SOH
    LEFT OUTER JOIN Sales.CurrencyRate AS CR
              ON SOH.CurrencyRateID = CR.CurrencyRateID
    LEFT OUTER JOIN Purchasing.ShipMethod AS SM
              ON SOH.ShipMethodID = SM.ShipMethodID;

  6. Write a query that returns the BusinessEntityID column from the Sales.SalesPerson table along with every ProductID from the Production.Product table.

    SELECT SP.BusinessEntityID, P.ProductID
    FROM Sales.SalesPerson AS SP
    CROSS JOIN Production.Product AS P;

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset