Writing Outer Joins

So far, you've learned about inner joins and cross joins. Cross joins include all the combinations of rows in the tables. Inner joins include only rows where tables include common values. Outer joins include rows where the tables don't include common values.

Why do you need to find rows where tables fail to intersect? Here's a common problem: Find the worst performing customers to target them for a sales campaign.

The example that we will create will start by first developing the example using inner joins, and then modifying the example to use an outer join to see how they differ.

We're interested in customer information, so we'll display the names and total sales for each customer. I'll limit the output to the first five rows.

select top 5
        CompanyName
from
        Customers

Results:

CompanyName
----------------------------------------
Alfreds Futterkiste
Ana Trujillo Emparedados y helados
Antonio Moreno Taquerià
Around the Horn
Berglunds snabbköp

The next step is to integrate the Orders table into the query. Performance will be based on dollars in the Order Details table, but the Customers table doesn't connect directly to Order Details, so we need to join to Orders first. For the time being, just to include some data, we'll include the OrderID in the output:

select top 5
        c.CompanyName,
        o.OrderID
from
        Customers as c
    inner join
        Orders as o
            on o.CustomerID = c.CustomerID

Results:

CompanyName                              OrderID
---------------------------------------- -----------
Alfreds Futterkiste                            10643
Alfreds Futterkiste                            10692
Alfreds Futterkiste                            10702
Alfreds Futterkiste                            10835
Alfreds Futterkiste                            10952

All the rows apply to Alfreds Futterkiste. That's because there are multiple orders for that customer.

Now we can connect to the Order Details table. We'll display the dollar amount for each item in the order. That will demonstrate that we have the right join conditions and prepare us for the next step.

select top 5
        c.CompanyName,
        o.OrderID,
        od.ProductID,
        od.UnitPrice * od.Quantity *
                (1.0 - od.Discount)
                as 'Total Dollars'
from
        Customers as c
    inner join
        Orders as o
            on o.CustomerID = c.CustomerID
    inner join
        [Order Details] as od
            on od.OrderID = o.OrderID

Results:

CompanyName       OrderID     ProductID   Total Dollars
----------------- ----------- ----------- ---------------
Alfreds Futterkis       10643          28           513.0
Alfreds Futterkis       10643          39           283.5
Alfreds Futterkis       10643          46            18.0
Alfreds Futterkis       10692          63           878.0
Alfreds Futterkis       10702           3            60.0

Notice that there are now multiple rows for the first order, number 10643. Each product included in the order requires a separate order detail row.

All that's required now is to total the dollars for each customer. The query will omit the OrderID and ProductID because I'm particularly interested in the total dollars for each customer, so I will group only on CompanyName. The last step is to order the customers based on the total dollars of business. Listing 5.4 shows the complete code for selecting customers by total sales.

Code Listing 5.4. Customers by Total Dollar Sales
 1: select top 5
 2:         c.CompanyName,
 3:         sum(od.UnitPrice * od.Quantity *
 4:                 (1.0 - od.Discount))
 5:                 as 'Total Dollars'
 6: from
 7:         Customers as c
 8:     inner join
 9:         Orders as o
10:             on o.CustomerID = c.CustomerID
11:     inner join
12:         [Order Details] as od
13:             on od.OrderID = o.OrderID
14: group by
15:         c.CompanyName
16: order by
17:         'Total Dollars'

Results:

CompanyName                              Total Dollars
---------------------------------------- ------------------
Centro comercial Moctezuma               100.79999923706055
Lazy K Kountry Store                                  357.0
Laughing Bacchus Wine Cellars                         522.5
North/South                                           649.0
Galerià del gastrònomo                   836.69999694824219
…
Rattlesnake Canyon Grocery               51097.800333023071
Save-a-lot Markets                        104361.9499206543
Ernst Handel                             104874.97871398926
QUICK-Stop                               110418.65497779846
					

This query displays customers in order by total dollars sold, as we wanted. The query returns only 89 rows, but a quick check of the Customers table shows that there are 91 total rows (use SELECT COUNT(*)). What happened to the other two rows?

The extra two rows have no sales at all. If we decide to focus on customers with low sales, the two customers with no sales might be the best place to start. We need a way to force the customers with no sales into the list.

That's where the outer join comes in. To get a complete list of customers, regardless of whether there are associated orders, you need to write an outer join.

Outer joins relate to the order of tables in your join clause. There are three types of outer joins: LEFT OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN. A LEFT OUTER JOIN asks that all data on the left of the join clause be included. A RIGHT OUTER JOIN asks that all data on the right of the join clause be included. A FULL OUTER JOIN asks that all data on both sides of the join clause be included. Figure 5.9 depicts which customers from the Customers table are included in an inner join and which are included in an outer join.

Figure 5.9. An outer join includes rows that have no match in the joined table.


We'll change both inner joins in the original query to left outer joins (See lines 8 and 11) to force all rows from the Customers table into the final result set. Listing 5.5 shows how the left outer join replaces the inner join.

Code Listing 5.5. Customers by Total Dollar Sales, Includes All Customers
 1: select top 5
 2:         c.CompanyName,
 3:         sum(od.UnitPrice * od.Quantity *
 4:                 (1.0 - od.Discount))
 5:                 as 'Total Dollars'
 6: from
 7:         Customers as c
 8:     left outer join
 9:         Orders as o
10:             on o.CustomerID = c.CustomerID
11:     left outer join
12:         [Order Details] as od
13:             on od.OrderID = o.OrderID
14: group by
15:         c.CompanyName
16: order by
17:         'Total Dollars'

Results:

CompanyName                              Total Dollars
---------------------------------------- -------------------
FISSA Fabrica Inter. Salchichas S.A.     NULL
Paris spécialités                        NULL
Centro comercial Moctezuma               100.79999923706055
Lazy K Kountry Store                                  357.0
Laughing Bacchus Wine Cellars                         522.5
					

This time, the query includes two rows with NULL values for Total Dollars. These are the customers we need to focus on, before customers with small but real sales.

How did I know to use a left outer join instead of a right outer join in the query? Because the Customers table was listed first, it was considered to be on the left side of the query. If I had written the query as a right outer join, the server would have reported Order Details where there was no corresponding order, and Orders where there was no corresponding customer. These situations both violate basic referential integrity, so they should not occur.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset