Author |
Topic |
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2007-10-03 : 08:11:21
|
I'm wondering if anyone has a better Effective Date Pattern than what I've been doing.Simply put, I'm working with a database that has an "Effective Date" (and sometimes an "Effective Sequence", as well) on just about every table.The two scenarios I most frequently encounter are:1.) I need to query on the "most recent":select ...from Source.Table twhere not exists ( select 1 from Source.Table t1 where t.PK = t1.PK and ((t.EffectiveDate < t1.EffectiveDate) or (t.EffectiveDate = t1.EffectiveDate and t.EffectiveSequence < t1.EffectiveSequence))) 2.) I need to join two tables based on a Foreign Key and an Effective Date (Sequence):select ...from Source.Table t left join Source.OtherTable ot on t.FK = ot.PK and ot.EffectiveDate <= t.EffectiveDate and not exists ( select 1 from Source.OtherTable ot1 where ot.PK = ot1.PK and ot1.EffectiveDate <= t.EffectiveDate and ((ot.EffectiveDate < ot1.EffectiveDate) or (ot.EffectiveDate = ot1.EffectiveDate and ot.EffectiveSequence < ot1.EffectiveSequence))) Now, this works fine, but I'm wondering if there is a better way. Once I start joining together 15 or so Effective Dated Source tables sprinkled in with many 5+ column candidate keys and some pretty heavy business rules ... my queries a) perform poorly and b) become very difficult to debug.I'll admit that I haven't spent a ton of time investigating a better pattern.Let me also say, I'm well aware of alternatives to the "not exists(<correlated subquery>)" approach, such as the "left join where the right-side is null" and the "in(select max())". I've found that in my domain it is so often true the that "not exists(<correlated subquery>)" is the superior performer, that I hardly even bother to test alternatives anymore.I first started down the path of creating a CTE based view for some of these Effective Dated Source tables using a row_number() partition over Effective Date. This works well for simple situations (1. above); however, I'm finding that when applying this CTE Pattern to 2. above, I'm getting some heavy-duty sort operations in my plan that are killing performance. Plus, I can't use the CTE Pattern for my Source.OtherTable since the desired row may not be the most recent. I've tabled the CTE approach for now pending further thought.I was think yesterday about Partitioned Indexes, but I haven't tried anything with that.I'm tempted to think that due to my problem domain and the massive queries I need to write (I'm talking about Data Mart transformations here), my existing pattern may be all I have to work with. I just wanted to throw this out there to see if anyone can spark some new ideas for me.ThanksJayto here knows when |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-03 : 08:15:22
|
Something like this?-- Prepare sample dataDECLARE @Sample TABLE (PkCol INT, Section VARCHAR(5), EffectiveDate DATETIME)INSERT @SampleSELECT 1, 'Home', '20070910 15:00' UNION ALLSELECT 2, 'Home', '20071003 23:19' UNION ALLSELECT 7, 'Away', '20071003 15:00' UNION ALLSELECT 22, 'Home', '20071002 08:44' UNION ALLSELECT 55, 'Away', '20071003 15:01'-- Show the expected outputSELECT PkCol, Section, EffectiveDateFROM ( SELECT PkCol, Section, EffectiveDate, ROW_NUMBER() OVER (PARTITION BY Section ORDER BY EffectiveDate DESC) AS RecID FROM @Sample ) AS dWHERE RecID = 1PkCol Section EffectiveDate----- ------- -----------------------55 Away 2007-10-03 15:01:00.0002 Home 2007-10-03 23:19:00.000 E 12°55'05.25"N 56°04'39.16" |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-10-03 : 08:20:00
|
I sort of have this two in multi-part-keys where I have a Record Card [in my application] which shows a single record, and has Next/Back options which do:SELECT TOP 1 *FROM MyTableWHERE PKey1 < @MyPKey1 OR (PKey1 = @MyPKey1 AND PKey2 < @MyPKey2 OR (PKey1 = @MyPKey1 AND PKey2 = @MyPKey2 AND PKey3 < @MyPKey3) )ORDER BY PKey1, PKey2, PKey3 an if I just had a "sequence number" column, that was always in order by PKey1, PKey2, PKey3 I would be larfing.We do maintain a Sequence Number column on some of our hierarchical tables, so that we have "walking order" to display the tree, but if anything changes we have to renumber all/part of the table.(Actually, we do have some reasonably intelligent stuff for that which only renumbers up to the next-higher record such that there is enough space to accommodate the renumber ...), but it still generates huge amounts of TLog when we are rapidly inserting in our Hierarchy tables (and stupidly we do NOT have a "Do not renumber until all the following inserts are done" method)Sorry, just thinking out loud ...Kristen |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-10-03 : 08:20:52
|
I was thinking that "ROW_NUMBER() OVER (PARTITION BY ..." might help.Any ideas how performant that is Peso?Kristen |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-03 : 08:22:48
|
It is very fast. And do not add many extra reads at all to the query.I recently compared ROW_NUMBER() with DENSE_RANK() and found that ROW_NUMBER is superior in getting sequence numbers. E 12°55'05.25"N 56°04'39.16" |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-03 : 08:33:01
|
Please post some proper and accurate sample data together with expected output. E 12°55'05.25"N 56°04'39.16" |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-10-03 : 08:49:37
|
"It is very fast"That sounds faster than inserting into @TableVar with IDENTITY column Kristen |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-03 : 08:50:59
|
The drawback with that method is that you need to READ again from @TableVar.Also TOP n records for each group must be self-joined to @TableVar. E 12°55'05.25"N 56°04'39.16" |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2007-10-03 : 09:32:17
|
quote: Originally posted by Peso Please post some proper and accurate sample data together with expected output. E 12°55'05.25"N 56°04'39.16"
I can't easily post due to IP issues, sorry.You've sparked some thoughts for me ... one of the pieces that I struggled with when I first looked at a row_number partion is that when I go to join to Source.OtherTable, I need to get the most recent Effective Date that is less than the Source.Table Effective Date. Obviously, you can't reference Source.Table.PK inside the Source.OtherTable derived query ... however, what I just realized is that the partitioning should be done over Source.Table and Source.OtherTable at the same time.I'm trying to re-engineer one of my queries to give this a try ... if I get stuck, I'll concoct some sample data and expected result that will highlight where I'm stuck ...Thanks for you help.Jayto here knows when |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-03 : 09:52:27
|
[code]DECLARE @Source TABLE (Pk INT, dt DATETIME)INSERT @SourceSELECT 1, '20070910' UNION ALLSELECT 1, '20071003' UNION ALLSELECT 2, '20071002' UNION ALLSELECT 2, '20070822'DECLARE @Other TABLE (Pk INT, dt DATETIME)INSERT @OtherSELECT 1, '20071004' UNION ALLSELECT 1, '20070101' UNION ALLSELECT 2, '20070901';WITH Yak (Pk, dt, RecID)AS ( SELECT s.Pk, s.dt, ROW_NUMBER() OVER (PARTITION BY s.Pk ORDER BY s.dt DESC) AS RecID FROM @Source AS s INNER JOIN @Other AS o ON o.Pk = s.Pk WHERE s.dt < o.dt)SELECT Pk, dtFROM YakWHERE RecID = 1[/code] E 12°55'05.25"N 56°04'39.16" |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2007-10-03 : 10:51:27
|
Peso, in your above query, I'll also need to order the partition by o.dt desc ...Should the partition then be on s.PK and o.PK?Jayto here knows when |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-03 : 10:55:18
|
No, not since they are used for the JOIN operation, which means they are the same. E 12°55'05.25"N 56°04'39.16" |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-10-03 : 11:55:11
|
Some thoughts:1) ROWNUMBER() is the way to go, it saves a self-join and is very efficient. I recommend creating simple "Latest" views for all of your enities using ROWNUMBER() and you can use that where you need it.2) for point in time reporting as of an effective date, remember that you can use table-valued UDF's that might accept an "effective date" parameter. That, combined with the new CROSS APPLY and OUTER APPLY features in SQL 2005, this should allow you to do whatever you need fairly efficiently. So, in addition to your "Latest" views, you could create "As-Of" user-defined-functions for each of your tables as well.In scenario #2, that means you can write something like this:select A.*, B.*from dbo.TableA_AsOf(@EffectiveDate) Ainner join dbo.TableB_AsOf(@EffectiveDate) B on A.ID= B.IDtest for performance, of course, but certainly if you can create standard views/UDF's like this it will make your SQL very clean and clear and easy to work with.- Jeffhttp://weblogs.sqlteam.com/JeffS |
|
|
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2007-10-04 : 08:11:04
|
Thanks for you input, Jeff ... my main problem is when dbo.TableB_AsOf needs to consume a.EffectiveDate ... this puts performance in the toilet. I think I have a good handle on using the row_number partition across the joins of all my tables ordered by the various EffectiveDates ... thanks everyone for you input.Jayto here knows when |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2007-10-04 : 09:18:51
|
I've found the best thing for effective dates is a sequence id for the rest of the key. I waited a long time to add this column in, but I finally decided to add it about 6 months ago... and speed has been up . To me the key to adding the sequence id was getting a trigger in place so I didn't have to manage it.Granted, I'm still working in 2000 without all of the fancy shmancy row_number options .CoreysnSQL on previous signature "...Oh and by the way Seventhnight, your signature is so wrong! On so many levels, morally, gramatically and there is a typo!" |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-10-05 : 09:16:18
|
"I've found the best thing for effective dates is a sequence id for the rest of the key"Do you mean that you have a "walking order sequence" number column?We have that on a Hierarchy table. But we store it in a separate, parallel, one:one table.The main table is "audited". We don't want updates caused only by changes to sequence number to be audited.Also, on bulk-inserts we turn OFF the automatic-renumber thingie, and then rebuild the whole Sequence table afterwards. Reduces the number of hits on the TLog significantly!(e.g. if we move a record to a different part of the hierarchy and it has loads of child items that need to be moved too)Kristen |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2007-10-05 : 09:40:41
|
quote: Originally posted by Page47 Thanks for you input, Jeff ... my main problem is when dbo.TableB_AsOf needs to consume a.EffectiveDate ... this puts performance in the toilet. I think I have a good handle on using the row_number partition across the joins of all my tables ordered by the various EffectiveDates ... thanks everyone for you input.Jayto here knows when
You actually can have TableB consume the effective date of TableA via a UDF if you use cross apply:select *from dbo.TableA(@EffectiveDate) Across apply dbo.TableB(A.EffectiveDate, A.ID)but I am not sure of the performance implications.- Jeffhttp://weblogs.sqlteam.com/JeffS |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-10-05 : 16:26:44
|
There is not only CROSS APPLY.You also have OUTER APPLY, which is pretty much like "LEFT JOIN WHERE IS NULL". E 12°55'05.25"N 56°04'39.16" |
|
|
|