Writing Outer Joins in T-SQL

By Kathi Kellenberger on 11 February 2008 | Tags: Queries , SELECT


Occasionally someone will ask for my help with a query and say that both a right outer join and a left outer join was tried, and still the expected results were not achieved. That made me realize that some developers do not completely understand outer joins and that an article explaining how to use them might help.

Inner Join Review

The most commonly used join is an INNER JOIN. This type of join combines rows from two tables only when they match on the joining condition. Usually the primary key from one table matches a foreign key on another table, but join conditions can be more complex than that.

(Note: Most of the information in this article can be applied to views as well as tables. For simplicity, the word "table" will be used to mean table or view unless stated otherwise. Keys are not defined on views, but the underlying table’s key column or columns are often included in the view. To keep things simple, let's assume that is the case.)

INNER JOIN will retrieve a results row only where there is a perfect match between the two tables in the join condition. You will also often see one row from one of the tables matching multiple rows in the other table. For example, one customer can have many orders. One order can have many order details. The data on the one side will be repeated for each row on the many side. The following query is an example showing how the information from the Sales.SalesOrderHeader is repeated on each matching row:

SELECT s.SalesOrderID, OrderDate,ProductID
FROM Sales.SalesOrderHeader AS s 
INNER JOIN Sales.SalesOrderDetail AS d ON s.SalesOrderID = d.SalesOrderID
ORDER BY s.SalesOrderID, ProductID

Outer Join Introduction

OUTER JOIN is used to join two tables even if there is not a match. An OUTER JOIN can be used to return a list of all the customers and the orders even if no orders have been placed for some of the customers. A keyword, RIGHT or LEFT, is used to specify which side of the join returns all possible rows. I like using LEFT because it makes sense to me to list the most important table first. Except for one example demonstrating RIGHT OUTER JOIN, this article will use left joins. Just a note: the keywords INNER and OUTER are optional.

The next example returns a list of all the customers and the SalesOrderID for the orders that have been placed, if any.

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID

It uses the LEFT keyword because the Sales.Customer table is located on the left side and we want all rows returned from that table even if there is no match in the Sales.SalesOrderHeader table. This is an important point. Notice also that the CustomerID column is the primary key of the Sales.Customer table and a foreign key in the Sales.SalesOrderHeader table. This means that there must be a valid customer for every order placed. Writing a query that returns all orders and the customers if they match doesn’t make sense. The LEFT table should always be the primary key table when performing a LEFT OUTER JOIN.

If the location of the tables in the query are switched, the RIGHT keyword is used and the same results are returned:

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.SalesOrderHeader s 
RIGHT OUTER JOIN Sales.Customer c ON c.CustomerID = s.CustomerID

Notice that I didn’t change the join condition at all. It doesn’t matter which side of the equal sign the columns are listed; only where the tables are named is it important.

If I have a LEFT OUTER JOIN, what is returned from the table on the right side of the join where there is not a match? Each column from the right side will return a NULL. Try this query which lists the non-matching rows first:

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
ORDER BY s.SalesOrderID

By adding a WHERE clause to check for a NULL SalesOrderID, you can find all the customers who have not placed an order. My copy of AdventureWorks returns 66 customers with no orders:

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID
WHERE s.SalesOrderID IS NULL

Occasionally, you will need to be more specific. How can you find all the customers who have not placed an order in 2002? There are several ways to solve this problem. You could create a view of all the orders placed in 2002 and join the view on the Sales.Customer table. Another option is to create a CTE, or Common Table Expression, of the orders placed in 2002. This example shows how to use a CTE to get the required results:

WITH s AS
(   SELECT SalesOrderID, customerID
    FROM Sales.SalesOrderHeader
    WHERE OrderDate between '1/1/2002' and '12/31/2002'
)
SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c 
LEFT OUTER JOIN s ON c.customerID = s.customerID
WHERE s.SalesOrderID IS NULL

My favorite technique to solve this problem is much simpler. Additional criteria, in this case filtering on the OrderDate, can be added to the join condition. The query joins all customers to the orders placed in 2002. Then the results are restricted to those where there is no match. This query will return exactly the same results as the previous, more complicated query:

SELECT c.CustomerID, s.SalesOrderID
FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID 
                              and s.OrderDate between '1/1/2002' and '12/31/2002' 
WHERE s.SalesOrderID IS NULL

Using Aggregates with Outer Joins

Aggregate queries introduce another pitfall watch out for. The following example is an attempt to list all the customers and the count of the orders that have been placed. Can you spot the problem?

SELECT c.CustomerID, count(*) OrderCount
FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID 
GROUP BY c.CustomerID
ORDER BY OrderCount

Now the customers with no orders look like they have placed one order. That is because this query is counting the rows returned. To solve this problem, count the SalesOrderID column. NULL values are eliminated from the count.

SELECT c.CustomerID, count(SalesOrderID) OrderCount
FROM Sales.Customer c LEFT OUTER JOIN Sales.SalesOrderHeader s
ON c.CustomerID = s.CustomerID 
GROUP BY c.CustomerID
ORDER BY OrderCount

Multiple Joins

Once more than two tables are involved in the query, things get a bit more complicated. When a table is joined to the RIGHT table, a LEFT OUTER JOIN must be used. That is because the NULL rows from the RIGHT table will not match any rows on the new table. An INNER JOIN causes the non-matching rows to be eliminated from the results. If the Sales.SalesOrderDetail table is joined to the Sales.SalesOrderHeader table and an INNER JOIN is used, none of the customers without orders will show up. NULL cannot be joined to any value, not even NULL.

To illustrate this point, when I add the Sales.SalesOrderDetail table to one of the previous queries that checked for customers without orders, I get back no rows at all.

SELECT c.CustomerID, s.SalesOrderID, d.SalesOrderDetailID
FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID 
INNER JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.SalesOrderID IS NULL

To get correct results, change the INNER JOIN to a LEFT JOIN.

SELECT c.CustomerID, s.SalesOrderID, d.SalesOrderDetailID
FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID 
LEFT OUTER JOIN Sales.SalesOrderDetail d ON s.SalesOrderID = d.SalesOrderID
WHERE s.SalesOrderID IS NULL

What about additional tables joined to Sales.Customer, the table on the left? Must outer joins be used? If it is possible that there are some rows without matches, it must be an outer join to guarantee that no results are lost. The Sales.Customer table has a foreign key pointing to the Sales.SalesTerritory table. Every customer’s territory ID must match a valid value in Sales.SalesTerritory. This query returns 66 rows as expected because it is impossible to eliminate any customers by joining to Sales.SalesTerritory:

SELECT c.CustomerID, s.SalesOrderID, t.Name
FROM Sales.Customer c 
LEFT OUTER JOIN Sales.SalesOrderHeader s ON c.CustomerID = s.CustomerID 
INNER JOIN Sales.SalesTerritory t ON c.TerritoryID = t.TerritoryID
WHERE SalesOrderID IS NULL

Sales.SalesTerritory is the primary key table; every customer must match a valid territory. If you wanted to write a query that listed all territories, even those that had no customers, an outer join will be used. This time, Sales.Customers is on the right side of the join.

SELECT t.Name, CustomerID
FROM Sales.SalesTerritory t 
LEFT OUTER JOIN Sales.Customer c ON t.TerritoryID = c.TerritoryID

Conclusion

Queries with outer joins can be tricky to write. Extra time and care must be spent making sure the results are correct. Think about the relationship between the tables and make sure that the outer join is continued down the path.  This article covered almost everything you need to know about outer joins.


Related Articles

Using Dynamic SQL in Stored Procedures (7 March 2011)

Joining to the Next Sequential Row (2 April 2008)

Aggregating Correlated Sub-Queries (23 October 2007)

How to Use GROUP BY with Distinct Aggregates and Derived tables (31 July 2007)

How to Use GROUP BY in SQL Server (30 July 2007)

Returning Complex Data from User-Defined Functions with CROSS APPLY (11 June 2007)

SQL Server 2005: Using OVER() with Aggregate Functions (21 May 2007)

Returning a week number for any given date and starting fiscal month (2 May 2007)

Other Recent Forum Posts

Troubleshooting Deadlocks in SQL Server (1d)

Last Login date and time (2d)

Negative effects of High VLF counts (2d)

Need to return a value that indicates that a record has been added, but not when a record is modified (3d)

Indexex on low cardinality fields (3d)

Error in stored procedure (3d)

Spam post flagging (4d)

Update Microsoft SQL Server (RTM) 12.0.2000.8 to latest v14 (12.0.6449.1) (4d)

- Advertisement -