Author |
Topic |
alejandrolepetittomas
Starting Member
9 Posts |
Posted - 2011-02-08 : 09:50:03
|
Hello everyone! I'm dealing with a table in SQL Server of about 5 million records (and it's getting bigger) with this structure:

date                | kwh  | period
2010-06-05 8:00:00  | 1222 | 1
2010-06-05 8:00:05  | 1222 | 1
2010-06-05 8:00:10  | 1223 | 1
2010-06-05 8:00:15  | 1224 | 1
2010-06-05 8:00:20  | 1224 | 5
2010-06-05 8:00:25  | 1227 | 5
2010-06-05 8:00:30  | 1227 | 3
2010-06-05 8:00:35  | 1233 | 3
2010-06-05 8:00:40  | 1233 | 1
2010-06-05 8:00:45  | 1237 | 1
2010-06-05 8:00:50  | 1239 | 2
.....               | .... | ..

Well, the point is that I have to calculate the increment of kwh for a given period (1, 2, 3, 4, 5 or 6). In the table above, for period 1, it would be 1224 - 1222 + 1237 - 1233 = 2 + 4 = 6.

The problem is that the T-SQL queries I have designed take too long to execute: I have used cursors (it takes about 3 minutes to return the value), and I have also tried splitting the table by the number of variations of the period (to reduce the number of iterations in the query), but surprisingly, that's even slower.

I rely on the potential of SQL Server, and I think there must be a better and faster way to do it... I feel like I'm missing something or forgetting a T-SQL tool that could be used. Any ideas are welcome. Thank you for your time!

Alejandro |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2011-02-08 : 10:56:27
|
Huh?

Here's the data in a sample table:

DECLARE @dataTab TABLE
    (
      [dateStamp] DATETIME
    , [kwh] INT
    , [period] INT
    , PRIMARY KEY ([period], [dateStamp])
    )

INSERT @dataTab ([dateStamp], [kwh], [period])
          SELECT '2010-06-05 8:00:00', 1222, 1
UNION     SELECT '2010-06-05 8:00:05', 1222, 1
UNION     SELECT '2010-06-05 8:00:10', 1223, 1
UNION     SELECT '2010-06-05 8:00:15', 1224, 1
UNION     SELECT '2010-06-05 8:00:20', 1224, 5
UNION     SELECT '2010-06-05 8:00:25', 1227, 5
UNION     SELECT '2010-06-05 8:00:30', 1227, 3
UNION     SELECT '2010-06-05 8:00:35', 1233, 3
UNION     SELECT '2010-06-05 8:00:40', 1233, 1
UNION     SELECT '2010-06-05 8:00:45', 1237, 1
UNION     SELECT '2010-06-05 8:00:50', 1239, 2

SELECT * FROM @dataTab ORDER BY [period], [dateStamp]

/*
well, the point is that i have to calculate the increment of kwh
for a given period (1, 2, 3, 4, 5 or 6). in the table above, for period 1,
it would be 1224 - 1222 + 1237 - 1233 = 2+4 = 6.
*/

I don't understand the rules. You mention only the values [1224 - 1222 + 1237 - 1233] for period 1 -- what happens to the other values?

Does period 1 model two distinct time segments? From [2010-06-05 08:00:00 to 2010-06-05 08:00:15] and [2010-06-05 08:00:40 to 2010-06-05 08:00:45]?

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
alejandrolepetittomas
Starting Member
9 Posts |
Posted - 2011-02-08 : 11:34:38
|
Thank you for your reply, Transact Charlie.

I have to calculate the kwh spent when period = 1, but the kwh values in the table are absolute (they refer to the kwh consumed since 2010-01-01), so I have to calculate the difference within groups of consecutive rows. In the example:

2 kwh have been consumed in period 1,
later, 3 kwh have been consumed in period 5,
later, 6 kwh have been consumed in period 3,
later, 4 kwh have been consumed in period 1.

That makes 6 kwh consumed in period 1 (and 3 kwh in period 5, and 6 kwh in period 3).

The periods refer to the different prices that General Electric establishes per kwh depending on the season of the year and the time of day. So the goal is to know how many kwh have been consumed in period 1, to calculate the costs later.

I hope it's clearer now... |
 |
|
Ifor
Aged Yak Warrior
700 Posts |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2011-02-08 : 12:10:08
|
OK -- I think this might work. It uses a quirky update, so it is the equivalent of black magic, but...

All the usual caveats apply -- you need to copy the information to a table you can completely lock and apply the clustered index over.

There is almost certainly a better way to do this but I'm too tired to think of it.

DECLARE @dataTab TABLE
    (
      [dateStamp] DATETIME
    , [kwh] INT
    , [period] INT
    , PRIMARY KEY ([period], [dateStamp])
    )

INSERT @dataTab ([dateStamp], [kwh], [period])
          SELECT '2010-06-05 8:00:00', 1222, 1
UNION     SELECT '2010-06-05 8:00:05', 1222, 1
UNION     SELECT '2010-06-05 8:00:10', 1223, 1
UNION     SELECT '2010-06-05 8:00:15', 1224, 1
UNION     SELECT '2010-06-05 8:00:20', 1224, 5
UNION     SELECT '2010-06-05 8:00:25', 1227, 5
UNION     SELECT '2010-06-05 8:00:30', 1227, 3
UNION     SELECT '2010-06-05 8:00:35', 1233, 3
UNION     SELECT '2010-06-05 8:00:40', 1233, 1
UNION     SELECT '2010-06-05 8:00:45', 1237, 1
UNION     SELECT '2010-06-05 8:00:50', 1239, 2

SELECT * FROM @dataTab ORDER BY [period], [dateStamp]

/*
well, the point is that i have to calculate the increment of kwh
for a given period (1, 2, 3, 4, 5 or 6). in the table above, for period 1,
it would be 1224 - 1222 + 1237 - 1233 = 2+4 = 6.
*/

IF OBJECT_ID('tempdb..#quirky') IS NOT NULL DROP TABLE #quirky

CREATE TABLE #quirky
    (
      [dateStamp] DATETIME
    , [kwh] INT
    , [period] INT
    , [diff] INT
    , PRIMARY KEY CLUSTERED ([dateStamp], [kwh], [period])
    )

INSERT #quirky ([dateStamp], [kwh], [period])
SELECT * FROM @dataTab

SELECT * FROM #quirky

DECLARE @periodMark INT   SET @periodMark = -1
DECLARE @rnk BIGINT       SET @rnk = 0

-- Rank the table
UPDATE q SET
      [diff]      = @rnk
    , @rnk        = CASE WHEN [period] = @periodMark THEN @rnk ELSE @rnk + 1 END
    , @periodMark = [period]
FROM #quirky AS q

-- Do the math
SELECT
      [period]
    , SUM([kwused]) AS [kwused]
FROM (
      SELECT
            [period]
          , MAX([kwh]) - MIN([kwh]) AS [kwused]
      FROM #quirky
      GROUP BY [period], [diff]
     ) AS subParts
GROUP BY [period]

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2011-02-08 : 12:11:38
|
Final Results
===================
period    kwused
1         6
2         0
3         6
5         3

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
alejandrolepetittomas
Starting Member
9 Posts |
Posted - 2011-02-09 : 05:52:46
|
Ifor, using a CTE is a good idea, but in this case I don't have a common id for each group. I only know that the group has changed because the next record has a different period (so, at first sight, I can't partition the set, or at least I can't see the way to do it). Same thing for the OUTER APPLY shown in [url]http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=155770[/url].

In my case:

DECLARE @dataTab TABLE
    (
      [dateStamp] DATETIME
    , [kwh] INT
    , [period] INT
    , PRIMARY KEY ([period], [dateStamp])
    )

INSERT @dataTab ([dateStamp], [kwh], [period])
          SELECT '2010-06-05 8:00:00', 1222, 1
UNION     SELECT '2010-06-05 8:00:05', 1222, 1
UNION     SELECT '2010-06-05 8:00:10', 1223, 1
UNION     SELECT '2010-06-05 8:00:15', 1224, 1
UNION     SELECT '2010-06-05 8:00:20', 1224, 5
UNION     SELECT '2010-06-05 8:00:25', 1227, 5
UNION     SELECT '2010-06-05 8:00:30', 1227, 3
UNION     SELECT '2010-06-05 8:00:35', 1233, 3
UNION     SELECT '2010-06-05 8:00:40', 1233, 1
UNION     SELECT '2010-06-05 8:00:45', 1237, 1
UNION     SELECT '2010-06-05 8:00:50', 1239, 2

It would be great if I could partition the table into 5 groups: for period 1, for period 5, for period 3, for period 1 and for period 2. If that were achieved, I would only have to take the difference between the biggest and the lowest kwh in each group. Using a CTE and partitioning seems to be the right way to do it; but, again, I feel like I'm missing something. |
 |
|
Ifor
Aged Yak Warrior
700 Posts |
Posted - 2011-02-09 : 06:22:37
|
You just have to find two different orders to make the group:

;WITH dataGrp
AS
(
    SELECT *
        ,ROW_NUMBER() OVER (ORDER BY dateStamp)
            - ROW_NUMBER() OVER (PARTITION BY period ORDER BY dateStamp) AS Grp
    FROM @dataTab
)
, PeriodGrp
AS
(
    SELECT period, Grp
        ,MAX(kwh) - MIN(kwh) AS KWUsed
    FROM dataGrp
    GROUP BY period, Grp
)
SELECT period, SUM(KWUsed) AS KWUsed
FROM PeriodGrp
GROUP BY period
ORDER BY period |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2011-02-09 : 06:47:35
|
If you do implement that, I'd be interested to see the performance of the two ROW_NUMBER() ranking functions vs the quirky update (which is effectively doing a ranking over the data).

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
Lumbago
Norsk Yak Master
3271 Posts |
Posted - 2011-02-09 : 07:07:08
|
Wow, this grouping technique was really cool Ifor! I've been looking at it for the past 30 mins to fully understand what's going on (I'm not as sharp as I like to imagine) and I don't think I've seen anything like this before.

EDIT: I think it will be vastly outperformed by the quirky update but still a nice trick for the bag'o'tricks!

- Lumbago
My blog -> www.thefirstsql.com |
 |
|
Ifor
Aged Yak Warrior
700 Posts |
Posted - 2011-02-09 : 09:43:26
|
quote: Originally posted by Lumbago Wow, this grouping technique was really cool Ifor!
That's what I thought when I first saw the technique.

To understand it you only have to eye-ball the results of the following query:

SELECT *
    ,ROW_NUMBER() OVER (ORDER BY dateStamp) AS TemporalOrder
    ,ROW_NUMBER() OVER (PARTITION BY period ORDER BY dateStamp) AS TemporalByPeriodOrder
    ,ROW_NUMBER() OVER (ORDER BY dateStamp)
        - ROW_NUMBER() OVER (PARTITION BY period ORDER BY dateStamp) AS Grp
FROM @dataTab
ORDER BY dateStamp

I would also be interested in knowing how it compares with the quirky update. I suspect the quirky update will be slower as it has to write a temp table and then update it. |
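For anyone eye-balling along: worked out by hand on the 11 sample rows above (so treat it as an illustrative sketch rather than captured output), that query should return something like this --

dateStamp            kwh   period  TemporalOrder  TemporalByPeriodOrder  Grp
2010-06-05 08:00:00  1222  1       1              1                      0
2010-06-05 08:00:05  1222  1       2              2                      0
2010-06-05 08:00:10  1223  1       3              3                      0
2010-06-05 08:00:15  1224  1       4              4                      0
2010-06-05 08:00:20  1224  5       5              1                      4
2010-06-05 08:00:25  1227  5       6              2                      4
2010-06-05 08:00:30  1227  3       7              1                      6
2010-06-05 08:00:35  1233  3       8              2                      6
2010-06-05 08:00:40  1233  1       9              5                      4
2010-06-05 08:00:45  1237  1       10             6                      4
2010-06-05 08:00:50  1239  2       11             1                      10

Grp stays constant while a period runs unbroken and jumps whenever another period interrupts it, so (period, Grp) identifies each island of consecutive rows. MAX(kwh) - MIN(kwh) per island gives 2 and 4 for the two period-1 islands, which sum to the expected 6.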
 |
|
alejandrolepetittomas
Starting Member
9 Posts |
Posted - 2011-02-09 : 11:58:19
|
First of all, thank you both, Ifor and Transact Charlie, for the time you have dedicated to this issue; I really appreciate it, sirs.

The method proposed by Transact Charlie has one problem: the update of the [diff] column in the quirky table (which establishes the groups) doesn't happen in order. It doesn't update the records in date order, so the number of groups created is bigger than it should be. We can check that simply by executing the queries: each execution returns different results for the same data. The execution takes approximately 15 seconds.

About the technique proposed by Ifor, I can only feel amazed. It's awesome how the two ROW_NUMBER() calls work together to determine the different groups that at first sight had nothing in common: Grp in conjunction with period always determines distinct groups, since the two ROW_NUMBER values are always increasing for each period and for all the data.

Definitely, the CTE is the most efficient solution: everything is done in just one transaction, and its execution only takes 5 seconds!! I have lots of places where I have to split data in a similar way, so it's really helpful for me. Thanks for sharing your knowledge, Ifor.

Alejandro |
 |
|
dataguru1971
Master Smack Fu Yak Hacker
1464 Posts |
Posted - 2011-02-09 : 12:04:31
|
Quirky updates are wickedly fast, but they can cause trouble if not done properly. The table HAS to have a clustered index, and you HAVE to be careful of MAXDOP: parallelism can cause the quirky update to misfire. If it isn't updating in the correct order, you can add OPTION (MAXDOP 1) to the statement, and you can also force the use of the index with WITH (INDEX("IX_NAME")). The data needs to be processed in the correct order. That said, a lot of the time the CTE methods, combined with ranking/ROW_NUMBER functions, can be exploited to do similar things.

Nice thread.

Poor planning on your part does not constitute an emergency on my part. |
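A minimal sketch of those safeguards applied to Charlie's #quirky update from earlier in the thread -- the hints are standard T-SQL, but the block as a whole is untested here, so verify it against your own data:

DECLARE @periodMark INT   SET @periodMark = -1
DECLARE @rnk BIGINT       SET @rnk = 0

UPDATE q SET
      [diff]      = @rnk
    , @rnk        = CASE WHEN [period] = @periodMark THEN @rnk ELSE @rnk + 1 END
    , @periodMark = [period]
FROM #quirky AS q WITH (TABLOCKX, INDEX(1))   -- lock the table and force the clustered index
OPTION (MAXDOP 1)                             -- single-threaded, so rows are visited in index order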
 |
|
alejandrolepetittomas
Starting Member
9 Posts |
Posted - 2011-02-09 : 12:15:04
|
In addition, the CTE methods are much faster and more efficient. In my case, a client is logged in to a web page and is waiting for the results while they are being calculated (thanks to those excellent queries). It wouldn't be right to make him wait longer than necessary. |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2011-02-09 : 16:51:50
|
Well - thanks for testing!

I'm surprised the dual ROW_NUMBER() approach is sufficiently fast. I'm definitely (but very pleasantly) surprised that it outperforms a quirky update for ranking as well.

Just out of interest, how many rows are you performing an average query over? I take it from the statement

"In my case, a client is logged on a webpage, and he is waiting for the results while they are being calculated"

that the data set is some subset of your 5 million row table?

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
dataguru1971
Master Smack Fu Yak Hacker
1464 Posts |
Posted - 2011-02-09 : 17:07:11
|
A CTE is really quick for the ROW_NUMBER function, even on millions of rows (I use it frequently). However, I suspect that the quirky update will fare far better on the actual full data set. I haven't found much that will outperform the quirky update for actual updates to large data sets. I have used it for lightning-fast updates with multiple anchors and conditional evaluations against tables of 90 million rows and above, and it is uber-fast.

The same operation using CTE ranking was slower in a massive way (it was expected to be slower)... this was a while back though. All I remember is that the quirky update did all 90 million records faster than the CTE did one year's worth of records (< 9 million); the 90 million was 15 years' worth of monthly data, for 600K accounts each month. Wish I still had those results.

Moden's article here is quite good:
http://www.sqlservercentral.com/articles/T-SQL/68467/

The multiple ROW_NUMBER rankings/orderings I have seen used here on a number of problems, and I like that a lot... it comes in very, very handy when you can utilize it.

Poor planning on your part does not constitute an emergency on my part. |
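For anyone who hasn't read the article: its core is the three-part "variable = column = expression" assignment. Roughly, and only as a sketch -- the dbo.Ledger table, its columns, and the running-total scenario below are invented purely for illustration:

DECLARE @running INT
SET @running = 0

UPDATE l
SET @running = l.RunningTotal = @running + l.Amount   -- anchor column and variable updated together, row by row
FROM dbo.Ledger AS l WITH (TABLOCKX, INDEX(1))        -- walk the clustered index with the table locked
OPTION (MAXDOP 1)                                     -- no parallelism, so the running order is preserved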
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2011-02-09 : 17:20:15
|
Sounds good. I love ROW_NUMBER() -- I think it's probably my favourite 2005+ feature.

This was a nice thread!

Charlie
===============================================================
Msg 3903, Level 16, State 1, Line 1736
The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
alejandrolepetittomas
Starting Member
9 Posts |
Posted - 2011-02-10 : 02:40:49
|
Well, in my case the CTE combined with ROW_NUMBER() is about 3 times faster than the quirky update. In this case, the result shown to the client is a single value (I return just the amount of kwh consumed in the selected period), and the query has to go through the 5-million-row table (5 million rows is approximately the number of rows we store in each table per year). The website I have designed allows the client to get subsets of the table correlated with other tables, to have a subset sent periodically by email, to export it to a .csv file...

Alejandro |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2011-02-10 : 06:23:06
|
Here is another approach (using the same sample data as provided above):

SELECT      t1.Period,
            COALESCE(SUM(t1.kwh - t2.kwh), 0) AS Consumption
FROM        @dataTab AS t1
LEFT JOIN   @dataTab AS t2 ON t2.Period = t1.Period
                          AND t2.dateStamp = DATEADD(SECOND, -5, t1.dateStamp)
GROUP BY ALL t1.Period
ORDER BY    t1.Period

 N 56°04'39.26"E 12°55'05.63" |
 |
|
alejandrolepetittomas
Starting Member
9 Posts |
Posted - 2011-02-10 : 10:46:48
|
Well, in my real table the interval between consecutive dateStamps isn't always 5 seconds; sometimes it's 4 or 6 seconds... maybe the sample wasn't as clear as it should have been. But I guess a similar left join could be done by adding an integer auto-increment id field to the table. Thanks for the idea! |
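Something like this, maybe -- a minimal, untested sketch against the sample @dataTab, using ROW_NUMBER() as a stand-in for the auto-increment id:

;WITH numbered AS
(
    SELECT *, ROW_NUMBER() OVER (ORDER BY dateStamp) AS rn   -- stand-in for an identity column
    FROM @dataTab
)
SELECT      t1.period,
            COALESCE(SUM(t1.kwh - t2.kwh), 0) AS Consumption
FROM        numbered AS t1
LEFT JOIN   numbered AS t2 ON t2.period = t1.period
                          AND t2.rn = t1.rn - 1              -- the immediately preceding reading, whatever its time gap
GROUP BY    t1.period
ORDER BY    t1.period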
 |
|