Aggregating Correlated Sub-Queries

By Jeff Smith on 23 October 2007 | Tags: Queries

Every now and then you may attempt to calculate an aggregate function -- such as SUM() -- on a correlated subquery, only to encounter the following error:

Msg 130, Level 15, State 1, Line 24
Cannot perform an aggregate function on an 
expression containing an aggregate or a subquery.

This can be a tricky situation to resolve depending on how your query is structured and how complicated it is, but here's a couple of ideas to help you out.

Summing up a correlated SUM()

Let's start by looking at a typical example. Suppose you have written the following SELECT:

select Company.Name, Company.Region, 
  (select sum(Amount) from Orders where Orders.CompanyID = Company.CompanyID) as Total
from Company

Notice that we have used a correlated sub-query to return the total Order Amount per Company. Working from that, let's say that you instead would like to the return the total Order Amount by Region. Typically, you would remove Company from the SELECT list, add GROUP BY Region to the end, and wrap what you'd like to total in in a SUM() aggregate function:

select Company.Region, 
  sum(select sum(Amount) from Orders 
            where Orders.CompanyID = Company.CompanyID) as Total

from Company
group by Company.Region

Unfortunately, that is not a valid SQL statement, since you are trying to directly SUM() a correlated sub-query (which also happens to have a SUM() in it already). This is not allowed in SQL Server; if you try to execute that, you'll be greeted with the error mentioned above.

There are two ways to fix this:

Option 1: Use a dervied table

Perhaps the easiest solution is to simply wrap your original SELECT in a derived table, and then select from that derived table and do the grouping in the outer select:

select Region, sum(Total) as Total
from
(select Company.Name, Company.Region, (select sum(Amount) from Orders where Orders.CompanyID
    = Company.CompanyID) as Total from Company ) x
group by Region

This is logically equivalent to what we tried to do earlier, except it is now a valid SQL statement and it will return the correct results without an error. This is a very quick and easy way to solve this situation in general, though it is not always the optimal solution in terms of performance and readability.

Option 2: Rewrite your SELECT without a Correlated Subquery

Often the best solution is to rewrite your SELECT without using a correlated sub-query at all. In our example, we could write the original as:

select Company.Name, Company.Region, sum(Orders.Amount) as Total
from Company
left outer Orders on Orders.CompanyID = Company.CompanyID
group by Company.Name, Company.Region

Once the SELECT is re-written, we can now easily adjust that to return only totals by Region simply by removing Company.Name from the SELECT and GROUP BY clauses:

select Company.Region, sum(Orders.Amount) as Total
from Company
left outer Orders on Orders.CompanyID = Company.CompanyID
group by Company.Region

This is usually a good way to solve the issue because the end result is very efficient and well-structured, so it is easier to understand and maintain.

Nesting Mixed Aggregate Functions

The previous example was fairly easy to rewrite because all we were using was a SUM() aggregate; it is quite easy just to adjust the grouping to get the totals that you need. However, what happens if you'd like to calculate the SUM() of the MAX() Order Amount per company? That is, let's say you have this:

select Company.Name, Company.Region, max(orders.Amount) as MaxOrderAmount
from Company
left outer join Orders on Orders.CompanyID = Company.CompanyID
group by Company.Name, Company.Region

That returns each Company along with their Region and their top Order Amount. Now, let's say we'd like to summarize that by Region, so that we are returning the SUM() of each customer's maximum Order Amount per Region. We can try to remove Company.Name from the GROUP BY and the SELECT, and add a SUM() around the MAX() expression like this:

select Company.Region, sum(max(orders.Amount)) as TotalMaxOrderAmount

from Company
left outer join Orders on Orders.CompanyID = Company.CompanyID
group by Company.Region

... but that will result in an error, since an aggregate function cannot be contained within another aggregate function. It also doesn't make much logical sense, anyway: We have a MAX() function nested within the SUM() function, but we never indicate anywhere what the MAX() is for.

To solve this, just wrap the original SELECT in a derived table, GROUP BY Region and calculate the SUM() of the MaxOrderAmount:

select Region, sum(MaxOrderAmount) as TotalCustomerMaxOrders
from
(
  select Company.Name, Company.Region, max(orders.Amount) as MaxOrderAmount
  from Company
  left outer join Orders on Orders.CompanyID = Company.CompanyID
  group by Company.Name, Company.Region
) x

Now it is clear that we are first summarizing the Orders by Company, and then summarizing those results by Region. The derived table allows us to break down the problem into smaller, simpler parts and to calculate one aggregate function within another.

A More Complicated Situation

Another example that I've seen recently is replacing a hard-coded list of values in an IN() clause with values selected from a table. For example, suppose you'd like to return a grand total of all items in a table along with a subtotal of just a select few; you can use a SUM(CASE ...) expression like this:

select sum(Amount) as Total,
       sum(case when Code in ('a','b','d') then Amount else 0 end) as SubTotal
from yourtable

Now, what if that list of Codes actually comes from another table? That is, instead of hard-coding

when Code in ('a','b','d')

we'd like it to be

when Code in (select Code from SubTotalCodes)

If we try to make that change in our previous SQL statement:

select sum(Amount) as Total,
       sum(case when Code in (select Code from SubTotalCodes)
              then Amount else 0 end) as SubTotal
from SomeTable

we get that dreaded error once again, because we are trying to apply a SUM() on an expression containing a sub-query. This one seems a bit more tricky to solve!

We can handle this by removing the SUM() aggregate functions completely, wrapping that in a derived table, and then totaling it up:

select sum(Total) as Total, sum(SubTotal) as Total
from
( select Amount as Total, case when Code in (select Code from SubTotalCodes) then
    Amount else 0 end as SubTotal from SomeTable ) x

We also could rewrite the previous statement using JOINS instead of an IN() clause like this:

select sum(Amount) as Total, 
  sum(case when SubTotalCodes.Code is null then 0 else Amount end) as SubTotal
from SomeTable

left outer join SubTotalCodes on SubTotalCodes.Code = SomeTable.Code

We'd have to be sure the primary key of SubTotalCodes is on the "Code" column, otherwise the JOIN may cause duplicate values to be aggregated in our results.

Conclusion

Resolving this situation can sometimes be tricky, but it can always be done. If you rewrite your query using standard JOINS and/or derived tables, it will not only fix the error, but it will also result in a clearer and more structured SELECT statement.