Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
ddamico
Yak Posting Veteran
76 Posts |
Posted - 2011-01-17 : 17:00:38
|
I am trying to increase performance of aggregation of data from a large table and am looking for recommendations in indexing and generally restructuring my query.Environment: SQL Server 2008 64bit Standard Edition Windows Server 2008 Standard SP2I have the following table CREATE TABLE DailyTransactionSummary ( TransactionDate smalldatetime, Account int, TransactionType int, TranCount money, TranAmount money)SELECT '01/01/2011', 90008, 1, 2.00 513.98 UNIONSELECT '01/01/2011', 90008, 2, 2.00 513.98 UNIONSELECT '01/01/2011', 90010, 1, 1.00 1000.00 UNIONSELECT '01/01/2011', 90008, 2, 1.00 1000.00On any given date each account can have up to a possible of 5 TransactionType records. The current table has approximately 650,000,000 rows. The goal that has to be achieved is to aggregate the accounts into 4 monthlyrollups defined as M3, M2, M1 and M0 which are define as approximately 30 day windows x to y.We are converting each transaction into two possible outputs Frequence based on TranCount and Amount(Sum). Thebelow query depicts the example of what is trying to be achieved. SELECT A.MMGAccountID , TriggerBucketValue = GetTriggerTypeValueFrequencyID(MMGTransactionTypeID) , TransactionValue = CONVERT(money,COUNT(TranCount)) FROM M3EligibleAccount X JOIN DailyTransactionSummary A ON X.MMGAccountID = A.MMGAccountID WHERE A.TransactionDate BETWEEN (SELECT StartDate FROM TriggerPeriod WHERE Period = 'M3') AND (SELECT EndDate From TriggerPeriod WHERE Period = 'M3') GROUP BY A.MMGAccountID,MMGTransactionTypeID -- Indicator Values UNION SELECT A.MMGAccountID , TriggerBucketValue = GetTriggerTypeValueAmountID(MMGTransactionTypeID) , TransactionValue = SUM(TranAmount) FROM M3EligibleAccount X JOIN DailyTransactionSummary A ON X.MMGAccountID = A.MMGAccountID WHERE A.TransactionDate BETWEEN (SELECT StartDate FROM TriggerPeriod WHERE Period = 'M3') AND (SELECT EndDate From TriggerPeriod WHERE Period = 'M3') GROUP BY A.MMGAccountID,MMGTransactionTypeID -- Sum ValuesAn order for an account to be considered it must have had at least one transaction in any of the monthly windows defined."M3EligibleAccount". The query runs extremely slow so I tried to make use of Filtered Indexes defining indexes as subsetsof date ranges for each window. However, the only way to make use of the these indicies is to create a one query with hard-coded variables. which is done below as an example. This is actually a Table Valued Function that is called via a synonym. SELECT A.MMGAccountID , TriggerBucketValue = GetTriggerTypeValueFrequencyID(MMGTransactionTypeID) , TransactionValue = CONVERT(money,COUNT(TranCount)) FROM M3EligibleAccount X JOIN DailyTransactionSummary A ON X.MMGAccountID = A.MMGAccountID WHERE A.TransactionDate BETWEEN '09/13/2010' AND '10/12/2010' GROUP BY A.MMGAccountID,MMGTransactionTypeID -- Indicator Values UNION SELECT A.MMGAccountID , TriggerBucketValue = GetTriggerTypeValueAmountID(MMGTransactionTypeID) , TransactionValue = SUM(TranAmount) FROM M3EligibleAccount X JOIN DailyTransactionSummary A ON X.MMGAccountID = A.MMGAccountID WHERE A.TransactionDate BETWEEN '09/13/2010' AND '10/12/2010' GROUP BY A.MMGAccountID,MMGTransactionTypeID -- Sum ValuesThat said the query is still performing poorly. I am wondering if anyone has recommendations for working with a large tablelike this and maybe enhancing the query to do a better job or indexes that might be beneficial. Thanks alot ddamico |
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2011-01-18 : 04:26:55
|
There are several parts here that blow the performance and running any query on a 650 mill rows table will require some strict discipline.Some questions:- is there an index on transactiondate?- is there an index on X.MMGAccountID and A.MMGAccountID? Are they clustered?- what do these user defined functions do? - can you change the UNION to a UNION ALL?- can you post the execution plan of one of the queries? (set showplan_text on)- is there an option to upgrade to enterprise version? Enterprise has partitioning which would *really* help in situations like this...I don't think changing the dates to hardcoded values will have much impact, but it does of course help to some degree.- LumbagoMy blog (yes, I have a blog now! just not that much content yet) -> www.thefirstsql.com |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2011-01-18 : 05:57:58
|
I would look at the function GetTriggerTypeValueAmountID. N 56°04'39.26"E 12°55'05.63" |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2011-01-18 : 06:09:43
|
Try this. It minimized the number of calls to the function.;WITH cteSource(MMGAccountID, MMGTransactionTypeID, Period, theCount, theAmount)AS ( SELECT a.MMGAccountID, a.MMGTransactionTypeID, t.Period, CAST(SUM(x.TranCount) AS MONEY) AS theCount, SUM(x.TranAmount) AS theAmount FROM dbo.M3EligibleAccount AS x INNER JOIN dbo.TriggerPeriod AS t ON t.Period IN ('M0', 'M1', 'M2', 'M3') INNER JOIN dbo.DailyTransactionSummary AS a ON a.MMGAccountID = x.MMGAccountID AND a.TransactionDatet.StartDate BETWEEN t.StartDate AND t.EndDate GROUP BY a.MMGAccountID, a.MMGTransactionTypeID)SELECT u.MMGAccountID, dbo.GetTriggerTypeValueFrequencyID(u.MMGTransactionTypeID) AS TriggerBucketValue, u.Period, u.TransactionValueFROM cteSource AS sUNPIVOT ( TransactionValue FOR theColumn IN (s.theCount, s.theAmount) ) AS u N 56°04'39.26"E 12°55'05.63" |
|
|
ddamico
Yak Posting Veteran
76 Posts |
Posted - 2011-01-19 : 12:43:50
|
Lumbago,1-2) I am intending on trying Peso query but here are few facts about my table as it pertains to indexes.My current table is indexed primary key clustered TransactionDate,TransactionType,AccountI did it this way because each day we have new transactions for accounts. Maybe, i shouldn't have the primary key clustered.I created a filtered index for every Monthly period exampleidx_dailytransactionsummary_M0 (Account,TransactionType) INCLUDE(TranAmount,TranCount)where TransactionDate >= <StartDate> AND TransactionDate <= <EndDate>...As for the function each customer transaction type can yield two internal transaction one based on TranCount and the other based upon TranAmount therefore every record yields two records.3) I should use "union all" didn't think about that4) The hardcoded dates were the only way I was able to have the query engine use the filtered indexes I had created. 5) Maybe its worth setting the the value of the internal transaction types in the table upon loading therefore excluding the need to use the function call, or use a computed columns to store them.6) I would love to used enterprise edition just not an option at the moment. The old system was using separated tables for each days transactions (poor mans partitioning).I want to be as disciplined as possible because of the sheer volumeof data being dealt with. I will post the query execution plan shortly but would like to look at Peso query to get a feeling on how it works.Thanks and i will post an update soon |
|
|
ddamico
Yak Posting Veteran
76 Posts |
Posted - 2011-01-27 : 09:59:37
|
I wanted to thank everyone for the advice. After building a couple of indexes and breaking the logic up to do one transaction type at a time instead of all at once I was able to get the process time from 3 hours to a mere 3 minutes (6 per transaction type) |
|
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2011-01-28 : 03:55:59
|
Great news! That is a substantial amount of time saving and the first time this happened to me I was totally mind blown and have loved working with databases ever since :)- LumbagoMy blog-> www.thefirstsql.com |
|
|
|
|
|
|
|