So far, you've learned about inner joins and cross joins. Cross joins include all the combinations of rows in the tables. Inner joins include only rows where tables include common values. Outer joins include rows where the tables don't include common values.
Why do you need to find rows where tables fail to intersect? Here's a common problem: Find the worst performing customers to target them for a sales campaign.
The example that we will create will start by first developing the example using inner joins, and then modifying the example to use an outer join to see how they differ.
We're interested in customer information, so we'll display the names and total sales for each customer. I'll limit the output to the first five rows.
select top 5 CompanyName from Customers
Results:
CompanyName ---------------------------------------- Alfreds Futterkiste Ana Trujillo Emparedados y helados Antonio Moreno Taquerià Around the Horn Berglunds snabbköp
The next step is to integrate the Orders table into the query. Performance will be based on dollars in the Order Details table, but the Customers table doesn't connect directly to Order Details, so we need to join to Orders first. For the time being, just to include some data, we'll include the OrderID in the output:
select top 5 c.CompanyName, o.OrderID from Customers as c inner join Orders as o on o.CustomerID = c.CustomerID
Results:
CompanyName OrderID ---------------------------------------- ----------- Alfreds Futterkiste 10643 Alfreds Futterkiste 10692 Alfreds Futterkiste 10702 Alfreds Futterkiste 10835 Alfreds Futterkiste 10952
All the rows apply to Alfreds Futterkiste. That's because there are multiple orders for that customer.
Now we can connect to the Order Details table. We'll display the dollar amount for each item in the order. That will demonstrate that we have the right join conditions and prepare us for the next step.
select top 5 c.CompanyName, o.OrderID, od.ProductID, od.UnitPrice * od.Quantity * (1.0 - od.Discount) as 'Total Dollars' from Customers as c inner join Orders as o on o.CustomerID = c.CustomerID inner join [Order Details] as od on od.OrderID = o.OrderID
Results:
CompanyName OrderID ProductID Total Dollars ----------------- ----------- ----------- --------------- Alfreds Futterkis 10643 28 513.0 Alfreds Futterkis 10643 39 283.5 Alfreds Futterkis 10643 46 18.0 Alfreds Futterkis 10692 63 878.0 Alfreds Futterkis 10702 3 60.0
Notice that there are now multiple rows for the first order, number 10643. Each product included in the order requires a separate order detail row.
All that's required now is to total the dollars for each customer. The query will omit the OrderID and ProductID because I'm particularly interested in the total dollars for each customer, so I will group only on CompanyName. The last step is to order the customers based on the total dollars of business. Listing 5.4 shows the complete code for selecting customers by total sales.
Results:
CompanyName Total Dollars ---------------------------------------- ------------------ Centro comercial Moctezuma 100.79999923706055 Lazy K Kountry Store 357.0 Laughing Bacchus Wine Cellars 522.5 North/South 649.0 Galerià del gastrònomo 836.69999694824219 … Rattlesnake Canyon Grocery 51097.800333023071 Save-a-lot Markets 104361.9499206543 Ernst Handel 104874.97871398926 QUICK-Stop 110418.65497779846
This query displays customers in order by total dollars sold, as we wanted. The query returns only 89 rows, but a quick check of the Customers table shows that there are 91 total rows (use SELECT COUNT(*)). What happened to the other two rows?
The extra two rows have no sales at all. If we decide to focus on customers with low sales, the two customers with no sales might be the best place to start. We need a way to force the customers with no sales into the list.
That's where the outer join comes in. To get a complete list of customers, regardless of whether there are associated orders, you need to write an outer join.
Outer joins relate to the order of tables in your join clause. There are three types of outer joins: LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. A LEFT OUTER JOIN asks that all data on the left of the join clause be included. A RIGHT OUTER JOIN asks that all data on the right of the join clause be included. A FULL OUTER JOIN asks that all data on both sides of the join clause be included. Figure 5.9 depicts which customers from the Customers table are included in an inner join and which are included in an outer join.
We'll change both inner joins in the original query to left outer joins (See lines 8 and 11) to force all rows from the Customers table into the final result set. Listing 5.5 shows how the left outer join replaces the inner join.
Results:
CompanyName Total Dollars ---------------------------------------- ------------------- FISSA Fabrica Inter. Salchichas S.A. NULL Paris spécialités NULL Centro comercial Moctezuma 100.79999923706055 Lazy K Kountry Store 357.0 Laughing Bacchus Wine Cellars 522.5
This time, the query includes two rows with NULL values for Total Dollars. These are the customers we need to focus on, before customers with small but real sales.
How did I know to use a left outer join instead of a right outer join in the query? Because the Customers table was listed first, it was considered to be on the left side of the query. If I had written the query as a right outer join, the server would have reported Order Details where there was no corresponding order, and Orders where there was no corresponding customer. These situations both violate basic referential integrity, so they should not occur.