Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Other SQL Server 2008 Topics
 Performance Tuning

Author  Topic 

ddamico
Yak Posting Veteran

76 Posts

Posted - 2011-01-17 : 17:00:38
I am trying to increase performance of aggregation of data from a large table and am
looking for recommendations in indexing and generally restructuring my query.

Environment: SQL Server 2008 64bit Standard Edition
Windows Server 2008 Standard SP2


I have the following table

CREATE TABLE DailyTransactionSummary (
TransactionDate smalldatetime,
Account int,
TransactionType int,
TranCount money,
TranAmount money
)
SELECT '01/01/2011', 90008, 1, 2.00 513.98 UNION
SELECT '01/01/2011', 90008, 2, 2.00 513.98 UNION
SELECT '01/01/2011', 90010, 1, 1.00 1000.00 UNION
SELECT '01/01/2011', 90008, 2, 1.00 1000.00


On any given date each account can have up to a possible of 5 TransactionType records. The current
table has approximately 650,000,000 rows. The goal that has to be achieved is to aggregate the accounts into 4 monthly
rollups defined as M3, M2, M1 and M0 which are define as approximately 30 day windows x to y.

We are converting each transaction into two possible outputs Frequence based on TranCount and Amount(Sum). The
below query depicts the example of what is trying to be achieved.

SELECT A.MMGAccountID
, TriggerBucketValue = GetTriggerTypeValueFrequencyID(MMGTransactionTypeID)
, TransactionValue = CONVERT(money,COUNT(TranCount))
FROM M3EligibleAccount X
JOIN DailyTransactionSummary A
ON X.MMGAccountID = A.MMGAccountID
WHERE A.TransactionDate BETWEEN (SELECT StartDate FROM TriggerPeriod WHERE Period = 'M3')
AND (SELECT EndDate From TriggerPeriod WHERE Period = 'M3')
GROUP BY A.MMGAccountID,MMGTransactionTypeID -- Indicator Values
UNION
SELECT A.MMGAccountID
, TriggerBucketValue = GetTriggerTypeValueAmountID(MMGTransactionTypeID)
, TransactionValue = SUM(TranAmount)
FROM M3EligibleAccount X
JOIN DailyTransactionSummary A
ON X.MMGAccountID = A.MMGAccountID
WHERE A.TransactionDate BETWEEN (SELECT StartDate FROM TriggerPeriod WHERE Period = 'M3')
AND (SELECT EndDate From TriggerPeriod WHERE Period = 'M3')
GROUP BY A.MMGAccountID,MMGTransactionTypeID -- Sum Values


An order for an account to be considered it must have had at least one transaction in any of the monthly windows defined.
"M3EligibleAccount". The query runs extremely slow so I tried to make use of Filtered Indexes defining indexes as subsets
of date ranges for each window. However, the only way to make use of the these indicies is to create a one query with hard-coded
variables. which is done below as an example. This is actually a Table Valued Function that is called via a synonym.


SELECT A.MMGAccountID
, TriggerBucketValue = GetTriggerTypeValueFrequencyID(MMGTransactionTypeID)
, TransactionValue = CONVERT(money,COUNT(TranCount))
FROM M3EligibleAccount X
JOIN DailyTransactionSummary A
ON X.MMGAccountID = A.MMGAccountID
WHERE A.TransactionDate BETWEEN '09/13/2010' AND '10/12/2010'
GROUP BY A.MMGAccountID,MMGTransactionTypeID -- Indicator Values
UNION
SELECT A.MMGAccountID
, TriggerBucketValue = GetTriggerTypeValueAmountID(MMGTransactionTypeID)
, TransactionValue = SUM(TranAmount)
FROM M3EligibleAccount X
JOIN DailyTransactionSummary A
ON X.MMGAccountID = A.MMGAccountID
WHERE A.TransactionDate BETWEEN '09/13/2010' AND '10/12/2010'
GROUP BY A.MMGAccountID,MMGTransactionTypeID -- Sum Values

That said the query is still performing poorly. I am wondering if anyone has recommendations for working with a large table
like this and maybe enhancing the query to do a better job or indexes that might be beneficial. Thanks alot

ddamico

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2011-01-18 : 04:26:55
There are several parts here that blow the performance and running any query on a 650 mill rows table will require some strict discipline.

Some questions:
- is there an index on transactiondate?
- is there an index on X.MMGAccountID and A.MMGAccountID? Are they clustered?
- what do these user defined functions do?
- can you change the UNION to a UNION ALL?
- can you post the execution plan of one of the queries? (set showplan_text on)
- is there an option to upgrade to enterprise version? Enterprise has partitioning which would *really* help in situations like this...

I don't think changing the dates to hardcoded values will have much impact, but it does of course help to some degree.

- Lumbago

My blog (yes, I have a blog now! just not that much content yet)
-> www.thefirstsql.com
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-01-18 : 05:57:58
I would look at the function GetTriggerTypeValueAmountID.



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2011-01-18 : 06:09:43
Try this. It minimized the number of calls to the function.
;WITH cteSource(MMGAccountID, MMGTransactionTypeID, Period, theCount, theAmount)
AS (
SELECT a.MMGAccountID,
a.MMGTransactionTypeID,
t.Period,
CAST(SUM(x.TranCount) AS MONEY) AS theCount,
SUM(x.TranAmount) AS theAmount
FROM dbo.M3EligibleAccount AS x
INNER JOIN dbo.TriggerPeriod AS t ON t.Period IN ('M0', 'M1', 'M2', 'M3')
INNER JOIN dbo.DailyTransactionSummary AS a ON a.MMGAccountID = x.MMGAccountID
AND a.TransactionDatet.StartDate BETWEEN t.StartDate AND t.EndDate
GROUP BY a.MMGAccountID,
a.MMGTransactionTypeID
)
SELECT u.MMGAccountID,
dbo.GetTriggerTypeValueFrequencyID(u.MMGTransactionTypeID) AS TriggerBucketValue,
u.Period,
u.TransactionValue
FROM cteSource AS s
UNPIVOT (
TransactionValue
FOR theColumn IN (s.theCount, s.theAmount)
) AS u



N 56°04'39.26"
E 12°55'05.63"
Go to Top of Page

ddamico
Yak Posting Veteran

76 Posts

Posted - 2011-01-19 : 12:43:50
Lumbago,

1-2) I am intending on trying Peso query but here are few facts about my table as it pertains to indexes.

My current table is indexed primary key clustered
TransactionDate,TransactionType,Account

I did it this way because each day we have new transactions for accounts. Maybe, i shouldn't have the primary key clustered.

I created a filtered index for every Monthly period example

idx_dailytransactionsummary_M0
(Account,TransactionType) INCLUDE(TranAmount,TranCount)
where TransactionDate >= <StartDate> AND TransactionDate <= <EndDate>
.
.
.

As for the function each customer transaction type can yield two internal transaction one based on TranCount and the other based upon TranAmount therefore every record yields two records.

3) I should use "union all" didn't think about that

4) The hardcoded dates were the only way I was able to have the query engine use the filtered indexes I had created.

5) Maybe its worth setting the the value of the internal transaction types in the table upon loading therefore excluding the need to use the function call, or use a computed columns to store them.

6) I would love to used enterprise edition just not an option at the moment. The old system was using separated tables for each days transactions (poor mans partitioning).

I want to be as disciplined as possible because of the sheer volume
of data being dealt with. I will post the query execution plan shortly but would like to look at Peso query to get a feeling on how it works.

Thanks and i will post an update soon
Go to Top of Page

ddamico
Yak Posting Veteran

76 Posts

Posted - 2011-01-27 : 09:59:37
I wanted to thank everyone for the advice. After building a couple of indexes and breaking the logic up to do one transaction type at a time instead of all at once I was able to get the process time from 3 hours to a mere 3 minutes (6 per transaction type)

Go to Top of Page

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2011-01-28 : 03:55:59
Great news! That is a substantial amount of time saving and the first time this happened to me I was totally mind blown and have loved working with databases ever since :)

- Lumbago

My blog-> www.thefirstsql.com
Go to Top of Page
   

- Advertisement -