Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Other SQL Server 2008 Topics
 Distinct Count Aggregation: Slowness

Author  Topic 

ddamico
Yak Posting Veteran

76 Posts

Posted - 2011-01-27 : 09:53:50
Good day,

I am working with a very larged data set and am running into performance slowness introducing aggregation of count distinct and wondering if there is a work around

Large table has .5 billion rows

Account Table
Account
TransactionType
TranCount
TranValue
TransactionDate

Household Table
Household
Account

TimePeriod
Period
StartDate
EndDate


A household can have multiple accounts with transactions for each day. I am trying to get a count of the distinct transaction days
for a household across all accounts.

SELECT H.Household
, A.TransactionType
, P.Period
, TranValue = COUNT(DISTINCT(TransactionDate))
FROM Household H
JOIN TimePeriod P
ON P.Period IN ('M0','M1','M2','M3')
JOIN Account A
ON A.TransactionType = 3
AND H.Account = A.Account
AND A.TransactionDate BETWEEN P.StartDate AND P.EndDate
GROUP BY H.Household
, A.TransactionType
, P.Period

Because multiple accounts for a household could have transaction on the same day i don't want them counted more than once hence the distinct.

Is there another/better way to accomplish this with new functionality
Thanks.


nigelrivett
Master Smack Fu Yak Hacker

3385 Posts

Posted - 2011-01-27 : 10:11:26
FROM Household H
JOIN TimePeriod P
ON P.Period IN ('M0','M1','M2','M3')

Appears to be missing the hoin criteria here - is it
FROM Household H
JOIN Account A
ON A.TransactionType = 3
AND H.Account = A.Account
JOIN TimePeriod P
ON A.TransactionDate BETWEEN P.StartDate AND P.EndDate
AND P.Period IN ('M0','M1','M2','M3')


How many rows do you get back?
How long does it take to insert them into a temp table? i.e.

SELECT H.Household
, A.TransactionType
, P.Period
, TransactionDate
FROM Household H
JOIN Account A
ON A.TransactionType = 3
AND H.Account = A.Account
JOIN TimePeriod P
ON A.TransactionDate BETWEEN P.StartDate AND P.EndDate
AND P.Period IN ('M0','M1','M2','M3')
GROUP BY H.Household
, A.TransactionType
, P.Period
, TransactionDate

could it be because there is an index that doesn't include the transactiondate?

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-01-27 : 11:31:10
COUNT(DISTINCT TransactionDate) is a slow aggregate operation compared to this.
;WITH cteSource(Household, TransactionType, Period, TransactionDate)
AS (
SELECT h.Household,
a.TransactionType,
tp.Period,
a.TransactionDate
FROM dbo.TimePeriod AS tp
INNER JOIN dbo.Account AS a ON a.TransactionDate BETWEEN tp.StartDate AND tp.EndDate
AND a.TransactionType = 3
INNER JOIN dbo.Household AS h ON h.Account = a.Account
WHERE tp.Period IN ('M0', 'M1', 'M2',' M3')
GROUP BY h.Household,
a.TransactionType,
tp.Period,
a.TransactionDate
)
SELECT Household,
TransactionType,
Period,
COUNT(*) AS TranValue
FROM cteSource
GROUP BY Household,
TransactionType,
Period



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page
   

- Advertisement -