Author |
Topic |
AskSQLTeam
Ask SQLTeam Question
0 Posts |
Posted - 2002-07-16 : 09:21:52
|
neil submitted "A TSQL tutorial on spotting breaks in data sequences" Article Link. |
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-07-16 : 09:33:07
|
In almost all cases the WHERE NOT EXISTS correlated sub-query will outperform the LEFT JOIN WHERE therightside IS NULL....Run this in QA and Show Execution Plan...create table testTable (i int) goinsert testTable select 1 union select 2 union select 3 union select 4 union select 6 union select 7 union select 10 union select 11 union select 15 union select 21goselect 'there is a gap between ' + convert(varchar, l.i) + ' and ' + (select convert(varchar, min(i)) from testTable where i > l.i)from testTable lleft join (select i from testTable ) r on l.i + 1 = r.i where r.i IS NULLand l.i < (select max(i) from testTable)order by l.iselect 'there is a gap between ' + convert(varchar, l.i) + ' and ' + (select convert(varchar, min(i)) from testTable where i > l.i)from testTable lwhere not exists ( select 47 from testTable r where l.i + 1 = r.i )and l.i < (select max(i) from testTable)order by l.igodrop table testTablego SubTree cost of .266 vs. .196 . . .[edit]Although, this is highly dependend on indexing. A clustered index on testTable(i) gives the LEFT JOIN query a .0578 and the NOT EXISTS query a .0747.[/edit]<O>Edited by - Page47 on 07/16/2002 10:09:01 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2002-07-16 : 13:13:08
|
If i is a clustered primary key (etc.) then (SQL Server 2000 certainly) is able to use a merge join between l and r, which is handy. This applies to both the LEFT JOIN and NOT EXISTS versions. Unfortunately, if you try r.i - l.i = 1 it is unable to rearrange this!BTW, the number of gaps for unique values 1 .. n where each value has an independent probability of being present of p is n*p - n*p^2. |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2002-11-08 : 16:43:23
|
OK, here's an alternate and shorter very quick way to spot gaps: SELECT "Gap between " & x.Value & " and " & Max(y.value) AS MessageFROM Vals AS y, Vals AS xWHERE y.Value < x.[value]GROUP BY x.ValueHAVING Max(y.Value) <> x.value - 1If you are using dates, just edit the "x.value - 1" as needed. For example, DateAdd(Day,-1,x.value).I think this takes care of all cases ..... Have fun! |
|
|
billsox
Yak Posting Veteran
74 Posts |
Posted - 2002-11-08 : 22:30:07
|
jsmith8858, this code is golden! Can you explain exactly what it's doing?Billquote: OK, here's an alternate and shorter very quick way to spot gaps: SELECT "Gap between " & x.Value & " and " & Max(y.value) AS MessageFROM Vals AS y, Vals AS xWHERE y.Value < x.[value]GROUP BY x.ValueHAVING Max(y.Value) <> x.value - 1If you are using dates, just edit the "x.value - 1" as needed. For example, DateAdd(Day,-1,x.value).I think this takes care of all cases ..... Have fun!
|
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2002-11-10 : 18:01:38
|
SELECT "Gap between " & x.Value & " and " & Max(y.value) AS Message FROM Vals AS y, Vals AS x WHERE y.Value < x.[value] GROUP BY x.Value HAVING Max(y.Value) <> x.value - 1 WHERE y.Value < x.[value] get all y values that are less than the x in this groupHAVING Max(y.Value) <> x.value - 1 Exclude groups where there is a a y = x - 1 i.e. no gapSELECT "Gap between " & x.Value & " and " & Max(y.value)give the x and max(y) = greatest value less than the x where there is a gap.same as (hope this works - removed some strange " and &)SELECT 'Gap between ' + x.Value + ' and ' + y.value AS Message FROM Vals AS y, Vals AS x WHERE y.Value = (select max(z.Value) from Vals z where z.Value < x.Value)and y.Value <> x.Value - 1==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2002-11-11 : 15:04:49
|
sorry, my SQL was kind of a mix of SQL server and Access.NR, you are pretty right on about explaining how my query works. However, it is very different from your example for mostly 1 reason: your example needs to enumerate through the "Values" table 3 times, and mine only twice. If that table is really a slow view or huge table, the performance difference can be very important. Also, I am not sure what your example is doing, or why it needs the subquery in the WHERE clause. All of the logic components are provided just by doing a cartesian of the table to itself 1 time.Another way of wording my query in english:"For each value in the table (call this "x"), get the highest value ("y") less than x. If x <> (y + 1), there must be a gap between x and y."Examples like this illustrate the big difference between WHERE and HAVING clauses which confused me for quite some time as I was beginning SQL.Edited by - jsmith8858 on 11/15/2002 13:50:59 |
|
|
BusmasterJones
Starting Member
4 Posts |
Posted - 2004-01-15 : 11:42:30
|
I ran a comparision test between jsmith's HAVING version of the query and a slightly modified (to fix the datatype problem on concatenating the output) version of nr's MAX query. I ran these queries on a table with approx 14000 rows and 33 gaps. The column checked is an IDENTITY column functioning as the PK and has a nonclustered index on it. Here are the results:nr query: 33 gaps found in < 1 sec execution timejsmith query: 33 gaps found in 1 min 49 sec execution timeThe query plans are radically different. The faster query really only uses the indexes to solve the problem. The slower query gets a cartesian product of the table being searched joined to itself (in this case 14000 x 14000 rows). Because of some filtering first due to the WHERE clause, it is actually more like 14000 x 7000 rows, but this still deals with almost 100,000,000 rows.Clearly "small" differences in query construction can mean BIG differences in performance. |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2004-01-15 : 12:36:21
|
you know, i think i wrote that one before i fully grasped the concepts of subqueries in SQL Server -- i was used to doing things more of the old Access way.And, of course, as time goes on we get much wiser -- when I revisit this problem, I then realize that this might be the most efficient way:select v1,v2 from(SELECT x.i v1, (select max(y.i) from vals y where y.i < x.i) as v2from vals x) awhere v1 <> v2+1which is a little bit faster than NR's method, and requires a little less work.And MUCH MUCH more efficient than my old way of doing it !!!!Great job, Busmaster, thanks for the info and doing the actual test. FYI -- here's how I tested:create table vals(i int primary key, d varchar(10))GOdeclare @i int;set @i=0set nocount onwhile @i < 14000 begin if @i % 424 <> 0 insert into vals(i, d) values(@i, 'dummy data') set @i = @i+1 endset nocount offGO-- my "new" way:select v1,v2 from(SELECT x.i v1, (select max(y.i) from vals y where y.i < x.i) as v2from vals x) awhere v1 <> v2+1-- NR's method:SELECT x.i, y.iFROM Vals y, Vals x WHERE y.i = (select max(z.i) from Vals z where z.i < x.i)and y.i <> x.i - 1GOdrop table vals - Jeff |
|
|
CactusJuice
Starting Member
46 Posts |
Posted - 2005-04-15 : 12:25:39
|
What happened to the original article? I'd like to see it. Get a 404 error and Google's cache is empty so I assume it's been missing for a while. |
|
|
Eejit
Starting Member
1 Post |
Posted - 2005-07-14 : 10:33:48
|
Ditto Cactus Juice! Cannot find original article, which seems to cover the exact problem I've been struggling with all day. Googled it, tried different variations on DNS, etc. Any chance someone could fix the broken link please? |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2005-07-18 : 15:56:08
|
Don't know what the article said but have a look athttp://www.mindsdoor.net/SQLTsql/FindGapsInSequence.html==========================================Cursors are useful if you don't know sql.DTS can be used in a similar way.Beer is not cold and it isn't fizzy. |
|
|
|