Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
SamC
White Water Yakist
3467 Posts |
Posted - 2005-01-31 : 20:02:31
|
| I'm sure there's a generic "name" for this problem, but I can't put my finger on it.Given a tableCREATE TABLE MyTable (SomeSetOfKeys INT ,KeyDate DATETIME )I need to order the keys by KeyDate, and on each row, have access to KeyDate and KeyDate(+1) - the next Date.My understanding of the least expensive way to do this is:SELECT A.SomeSetOfKeys, A.KeyDate, MIN(B.KeyDate) As NextKeyDate FROM MyTable A INNER JOIN MyTable B ON B.SomeSetOfKeys = A.SomeSetOfKeys AND B.KeyDate > A.KeyDate GROUP BY SomeSetOfKeys, KeyDateThis can be an expensive way to get the job done, even with optimized clustered indexes, if the number of rows is large.Any other tricks to improve performance??? |
|
|
eyechart
Master Smack Fu Yak Hacker
3575 Posts |
Posted - 2005-01-31 : 20:49:02
|
| yes, this code is like a correlated subquery. Does the optimizer force nested loops when this executes? That could be bad (as you point out) if your table is large.seems like a fun puzzle. I'll have to grab a pint of guinness or two and ponder it.-ec |
 |
|
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2005-01-31 : 20:51:19
|
maybe:SELECT A.SomeSetOfKeys, A.KeyDate, (select top 1 KeyDate from MyTable where SomeSetOfKeys = A.SomeSetOfKeysand KeyDate > A.KeyDate) As NextKeyDateFROM MyTable Abut if this is faster i'd be very surprised...Go with the flow & have fun! Else fight the flow |
 |
|
|
eyechart
Master Smack Fu Yak Hacker
3575 Posts |
Posted - 2005-01-31 : 22:05:23
|
neither one of these solutions actually works as posted. I tweaked spirit's to make it work (eliminated a join)here it is:SELECT A.SomeSetOfKeys, A.KeyDate, (select top 1 KeyDate from test where KeyDate > A.KeyDate) As NextKeyDateFROM test A I tried a solution joining on a virtual rownum, but that was much slower than spirit's solution. I won't even bother posting it.My test case only had ~7000 rows too. I have no idea what would happen if you had millions of rows.-ec |
 |
|
|
eyechart
Master Smack Fu Yak Hacker
3575 Posts |
Posted - 2005-01-31 : 22:19:21
|
btw here is the little test case I created:CREATE TABLE [dbo].[test] ( [somesetofkeys] [int] IDENTITY (50, 5) NOT NULL , [keydate] [datetime] NULL ) ON [PRIMARY]GOALTER TABLE [dbo].[test] WITH NOCHECK ADD CONSTRAINT [PK_test] PRIMARY KEY CLUSTERED ( [somesetofkeys] ) ON [PRIMARY] GOALTER TABLE [dbo].[test] ADD CONSTRAINT [DF_test_keydate] DEFAULT (getdate()) FOR [keydate]GO CREATE INDEX [test2] ON [dbo].[test]([keydate], [somesetofkeys]) ON [PRIMARY]GO I inserted into this table some test data in a loop (I put a waitfor delay in the loop in order to get different getdate() values).I tweaked Sam's code to work (eliminate join) and here is showplan when executed:StmtText --------------------------------------- SELECT A.SomeSetOfKeys, A.KeyDate, MIN(B.KeyDate) As NextKeyDateFROM test AINNER JOIN test B ON B.KeyDate > A.KeyDateGROUP BY a.SomeSetOfKeys, a.KeyDate(1 row(s) affected)StmtText ---------------------------------------------- |--Parallelism(Gather Streams) |--Stream Aggregate(GROUP BY:([A].[somesetofkeys]) DEFINE:([Expr1002]=MIN([partialagg1003]), [A].[keydate]=ANY([A].[keydate]))) |--Parallelism(Repartition Streams, PARTITION COLUMNS:([A].[somesetofkeys]), ORDER BY:([A].[somesetofkeys] ASC)) |--Stream Aggregate(GROUP BY:([A].[somesetofkeys]) DEFINE:([partialagg1003]=MIN([B].[keydate]), [A].[keydate]=ANY([A].[keydate]))) |--Nested Loops(Inner Join, OUTER REFERENCES:([A].[keydate]) WITH PREFETCH) |--Clustered Index Scan(OBJECT:([DBA].[dbo].[test].[PK_test] AS [A]), ORDERED FORWARD) |--Index Seek(OBJECT:([DBA].[dbo].[test].[test2] AS [B]), SEEK:([B].[keydate] > [A].[keydate]) ORDERED FORWARD)(7 row(s) affected) Here is the statistics io and statistics time output for Sam's querySQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.SQL Server parse and compile time: CPU time = 13 ms, elapsed time = 13 ms.(7279 row(s) affected)Table 'test'. Scan count 7282, logical reads 72521, physical reads 0, read-ahead reads 0.SQL Server Execution Times: CPU time = 31250 ms, elapsed time = 15761 ms.SQL Server Execution Times: CPU time = 31250 ms, elapsed time = 15761 ms.SQL Server Execution Times: CPU time = 31250 ms, elapsed time = 15762 ms.SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. Here is the same info for spirit's codeStmtText ----------------------------------------------------------- SELECT A.SomeSetOfKeys, A.KeyDate, (select top 1 KeyDate from test where KeyDate > A.KeyDate) As NextKeyDateFROM test A(1 row(s) affected)StmtText ------------------------------------------------------------ |--Compute Scalar(DEFINE:([test].[keydate]=[test].[keydate])) |--Nested Loops(Left Outer Join, OUTER REFERENCES:([A].[keydate])) |--Index Scan(OBJECT:([DBA].[dbo].[test].[test2] AS [A])) |--Hash Match(Cache, HASH:([A].[keydate]), RESIDUAL:([A].[keydate]=[A].[keydate])) |--Top(1) |--Index Seek(OBJECT:([DBA].[dbo].[test].[test2]), SEEK:([test].[keydate] > [A].[keydate]) ORDERED FORWARD) here is the statistics io and statistics time info:SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 3 ms.(7280 row(s) affected)Table 'test'. Scan count 7281, logical reads 14594, physical reads 0, read-ahead reads 0.SQL Server Execution Times: CPU time = 120 ms, elapsed time = 773 ms.SQL Server Execution Times: CPU time = 120 ms, elapsed time = 773 ms.SQL Server Execution Times: CPU time = 120 ms, elapsed time = 774 ms.SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. I just noticed that the two queries differ by one row. Not sure why, time to hit the pub though.-ec |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2005-01-31 : 23:49:42
|
quote: Originally posted by eyechart CREATE INDEX [test2] ON [dbo].[test]([keydate], [somesetofkeys]) ON [PRIMARY]
What led you to add somesetofkeys after keydate when somesetofkeys is already a clustered index? (I think this may be my performance problem.) |
 |
|
|
eyechart
Master Smack Fu Yak Hacker
3575 Posts |
Posted - 2005-02-01 : 01:36:33
|
quote: What led you to add somesetofkeys after keydate when somesetofkeys is already a clustered index? (I think this may be my performance problem.)
I think I was experimenting a bit. I had tried several combinations of indexes and wanted to see if a covered index helped any. this was really for the test that I was trying using a pseudo rownum. I essentially did a self join on some derived tables using this rownum (the 2nd table being rownum -1) to get the offset you were looking for. The performance was better than your query, but not as good as spirit's. It was cool looking though. :)I can post that code if you are interested, but it doesn't seem to be a path worth exploring at this point.I'll retest your query without the covered index to see what the outcome is.-ec |
 |
|
|
eyechart
Master Smack Fu Yak Hacker
3575 Posts |
Posted - 2005-02-01 : 01:55:48
|
I re-ran without using the covered index and got the following numbers:SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms.SQL Server parse and compile time: CPU time = 39 ms, elapsed time = 39 ms.(7279 row(s) affected)Table 'Worktable'. Scan count 3547, logical reads 64049, physical reads 0, read-ahead reads 0.Table 'test'. Scan count 4, logical reads 75, physical reads 0, read-ahead reads 0.Table 'Worktable'. Scan count 3731, logical reads 66993, physical reads 0, read-ahead reads 0.SQL Server Execution Times: CPU time = 53830 ms, elapsed time = 27344 ms.SQL Server Execution Times: CPU time = 53830 ms, elapsed time = 27344 ms.SQL Server Execution Times: CPU time = 53830 ms, elapsed time = 27345 ms.SQL Server parse and compile time: CPU time = 0 ms, elapsed time = 0 ms.SQL Server Execution Times: CPU time = 0 ms, elapsed time = 0 ms. here is the execution plan:StmtText ------------------------------------------------------- SELECT A.SomeSetOfKeys, A.KeyDate, MIN(B.KeyDate) As NextKeyDateFROM test AINNER JOIN test B ON B.KeyDate > A.KeyDateGROUP BY a.SomeSetOfKeys, a.KeyDate(1 row(s) affected)StmtText ------------------------------------------------------------------- |--Parallelism(Gather Streams) |--Stream Aggregate(GROUP BY:([A].[somesetofkeys]) DEFINE:([Expr1002]=MIN([partialagg1003]), [A].[keydate]=ANY([A].[keydate]))) |--Parallelism(Repartition Streams, PARTITION COLUMNS:([A].[somesetofkeys]), ORDER BY:([A].[somesetofkeys] ASC)) |--Stream Aggregate(GROUP BY:([A].[somesetofkeys]) DEFINE:([partialagg1003]=MIN([B].[keydate]), [A].[keydate]=ANY([A].[keydate]))) |--Nested Loops(Inner Join, WHERE:([B].[keydate]>[A].[keydate])) |--Clustered Index Scan(OBJECT:([DBA].[dbo].[test].[PK_test] AS [A]), ORDERED FORWARD) |--Table Spool |--Clustered Index Scan(OBJECT:([DBA].[dbo].[test].[PK_test] AS [B]))(8 row(s) affected) It is clearly much faster using the composite index, at least in this test case. -ec |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-02-01 : 02:26:52
|
| > I just noticed that the two queries differ by one row.ec, of course, it must be one row less -- the maximal KeyDate joins to nothingand its row gets lost (because of INNER (not LEFT) JOIN).> SELECT A.SomeSetOfKeys, A.KeyDate,> (select top 1 KeyDate from test where KeyDate > A.KeyDate) As NextKeyDate> FROM test Aand I don't see ORDER BY in (select top 1 ... ). |
 |
|
|
eyechart
Master Smack Fu Yak Hacker
3575 Posts |
Posted - 2005-02-01 : 04:16:28
|
quote: Originally posted by Stoad > I just noticed that the two queries differ by one row.ec, of course, it must be one row less -- the maximal KeyDate joins to nothingand its row gets lost (because of INNER (not LEFT) JOIN).> SELECT A.SomeSetOfKeys, A.KeyDate,> (select top 1 KeyDate from test where KeyDate > A.KeyDate) As NextKeyDate> FROM test Aand I don't see ORDER BY in (select top 1 ... ).
You are right on both accounts. I see that now after a visit to the pub.Actually, both queries could benefit from an order by. Good catch on spirit's code though. TOP without an ORDER BY is asking for trouble.-ec |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2005-02-01 : 08:35:44
|
I get a 13 second execution time in this proc, and maybe that's the best it can do.I've isolated the join in the proc that we're discussing in this thread. It is slightly more complicated because there are 3 keys to join...SELECT -- Join each row to find the next required module timestamp P1.CourseID, P1.UserID, P1.ModuleID, DATEDIFF(ss, P1.URLDate, MIN(P2.URLDate)) As Seconds FROM dbo.PageHits P1 INNER JOIN dbo.PageHits P2 ON P2.URLDate > P1.URLDate AND P2.CourseID = P1.CourseID AND P2.UserID = P1.UserID AND P2.ModuleID > 0 WHERE P1.CourseID = 5932 AND P1.ModuleID > 0 GROUP BY P1.CourseID, P1.UserID, P1.ModuleID, P1.URLDate Now, this isn't that bad. Lemme 'splain:The table holds entries for all courses, all users, and records every module in the course visited with a DATETIME stamp. We want to calculate the time spent in the course for every user, so the time differential is what we are after in this thread."SomeSetOfKeys" is: CourseID, UserID. I chose to eliminate any row where ModuleID=0 (these are optional modules) which makes the calculation find a time spent on a "required module" basis.That wasn't so hard.My execution plan may be as good as it gets? |--Compute Scalar(DEFINE:([Expr1003]=datediff(second, [P1].[UrlDate], [Expr1002]))) |--Stream Aggregate(GROUP BY:([P1].[UserID], [P1].[ModuleID], [P1].[UrlDate]) DEFINE:([Expr1002]=MIN([P2].[UrlDate]), [P1].[CourseID]=ANY([P1].[CourseID]))) |--Nested Loops(Inner Join, OUTER REFERENCES:([P1].[UserID], [P1].[UrlDate])) |--Clustered Index Seek(OBJECT:([nimc].[dbo].[pagehits].[IDX_PateHitsCluster2] AS [P1]), SEEK:([P1].[CourseID]=5932), WHERE:([P1].[ModuleID]>0) ORDERED FORWARD) |--Index Spool(SEEK:([P2].[UserID]=[P1].[UserID] AND [P2].[UrlDate] > [P1].[UrlDate])) |--Clustered Index Seek(OBJECT:([nimc].[dbo].[pagehits].[IDX_PateHitsCluster2] AS [P2]), SEEK:([P2].[CourseID]=5932), WHERE:([P2].[ModuleID]>0) ORDERED FORWARD) The graphic execution plan shows an "index spool/eager spool". Maybe I'm not readhing the SHOWPLAN_TEXT right, but I don't see the eager spool there.Anyone know why these spools are called "eager"?CREATE TABLE dbo.PageHits2 ( P_ID INT NOT NULL IDENTITY (1,1) , [UrlDate] [datetime] NOT NULL DEFAULT (getdate()), [UserID] [int] NOT NULL , [CourseID] [int] NULL , [ModuleID] [int] NULL , CONSTRAINT [IDX_PateHitsCluster2] UNIQUE CLUSTERED ( CourseID, UserID, ModuleID, URLDate ) WITH FILLFACTOR = 80 ON [PRIMARY] ) ON [PRIMARY]GOCREATE INDEX IDX_PageHitsURLDate2 on dbo.PageHits2 (URLDate, CourseID, UserID, ModuleID) |
 |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2005-02-01 : 10:47:12
|
Sam.... let me ask you something about thisIn regards to quote: "SomeSetOfKeys" is: CourseID, UserID. I chose to eliminate any row where ModuleID=0 (these are optional modules) which makes the calculation find a time spent on a "required module" basis.
Is that true?? If I am in course1 (required) and I then got to course2 (not required) and I then go to course2 (required) and I then go to course4 (required)Isn't it true that the time spent on a course is the differential to the next chronological course, and not the next 'required' course. The required should not matter for determining time spent. (IMHO)With the index I think I would use spirit's solution as you are only looking up 1 value per row.Good Luck Corey "If the only tool you have is a hammer, the whole world looks like a nail." - Mark Twain |
 |
|
|
|
|
|
|
|