AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2005-07-20 : 14:42:01
After searching the forums (and enduring a temporary SQLTeam outage) I have not found anything in the last 60 days to address this. I'm seeking input on a more efficient way to JOIN my history tables.

Here's the short story: I'm building custom reports on a commercial application. Therefore, I have no option to restructure the database. In order to protect the interested parties, I have created an example focusing on the specific issue, which is the JOIN.

The question is: Is there a better way to join the history table (such as a technique that does not use a subquery)? Unfortunately, my actual query will be referencing nigh-on to a dozen of these suckers, and I'm just a little nervous on general principles. Actual performance is acceptable for now, but we're early yet in the database's life and record counts are low.

Here's the example:

CREATE TABLE #LineItem (
    ItemID int NOT NULL,
    ItemDate datetime NOT NULL,
    ProductID int NOT NULL
)

CREATE TABLE #ProductHistory (
    ProductID int NOT NULL,
    UpdateDate datetime NOT NULL,
    Price money
)
GO

INSERT INTO #LineItem (ItemID, ItemDate, ProductID) VALUES (1, '20050401', 1)
INSERT INTO #LineItem (ItemID, ItemDate, ProductID) VALUES (2, '20050401', 2)
INSERT INTO #LineItem (ItemID, ItemDate, ProductID) VALUES (3, '20050515', 1)

INSERT INTO #ProductHistory (ProductID, UpdateDate, Price) VALUES (1, '20050101', 7.99)
INSERT INTO #ProductHistory (ProductID, UpdateDate, Price) VALUES (1, '20050320', 10.00)
INSERT INTO #ProductHistory (ProductID, UpdateDate, Price) VALUES (1, '20050514', 15.00)
INSERT INTO #ProductHistory (ProductID, UpdateDate, Price) VALUES (1, '20050720', 22.22)
INSERT INTO #ProductHistory (ProductID, UpdateDate, Price) VALUES (2, '20050101', 18.00)
INSERT INTO #ProductHistory (ProductID, UpdateDate, Price) VALUES (2, '20050219', 19.50)
INSERT INTO #ProductHistory (ProductID, UpdateDate, Price) VALUES (2, '20050401', 25.50)
INSERT INTO #ProductHistory (ProductID, UpdateDate, Price) VALUES (2, '20050623', 37.77)
GO

SELECT L.ItemID, CONVERT(varchar(12), L.ItemDate, 101) AS ItemDate, P.Price
FROM #LineItem L
    JOIN #ProductHistory P
        ON L.ProductID = P.ProductID
        AND P.UpdateDate = (
            SELECT MAX(UpdateDate)
            FROM #ProductHistory innerP
            WHERE innerP.ProductID = L.ProductID
                AND innerP.UpdateDate <= L.ItemDate
        )
GO

DROP TABLE #LineItem
DROP TABLE #ProductHistory

The expected result is:

ItemID      ItemDate     Price
----------- ------------ ---------------------
1           04/01/2005   10.0000
2           04/01/2005   25.5000
3           05/15/2005   15.0000

---------------------------
EmeraldCityDomains.com
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2005-07-20 : 15:08:56
Here's an option (same results without the correlated subquery):

select b.ItemID
    ,convert(varchar, b.ItemDate, 101) ItemDate
    ,Price
from #ProductHistory a
join (
    select ItemID
        ,ItemDate
        ,a.ProductID
        ,max(updateDate) UpdateDate
    from #lineitem a
    join #producthistory b
        on a.productid = b.productid
    where UpdateDate <= ItemDate
    group by ItemID
        ,ItemDate
        ,a.ProductID
) b
    on a.ProductID = b.ProductID
    and a.UpdateDate = b.UpdateDate

If the reporting doesn't need to be real time and the tables get huge, you could go to either summary tables rebuilt on some acceptable schedule, or a warehousing scenario where every <timeperiod> the next period of data is added to the warehouse.

Be One with the Optimizer
TG
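As a rough illustration of the summary-table idea above, here is a minimal sketch assuming permanent LineItem and ProductHistory tables shaped like the temp-table example; the LineItemPrice_Summary table name and the nightly rebuild are invented for the example, not part of the original application:

-- Hypothetical summary table, rebuilt on whatever schedule the reports can tolerate (e.g. nightly).
IF OBJECT_ID('dbo.LineItemPrice_Summary') IS NOT NULL
    DROP TABLE dbo.LineItemPrice_Summary

SELECT L.ItemID,
       L.ItemDate,
       L.ProductID,
       P.Price
INTO   dbo.LineItemPrice_Summary
FROM   dbo.LineItem L
       JOIN dbo.ProductHistory P
         ON P.ProductID = L.ProductID
        AND P.UpdateDate = (SELECT MAX(UpdateDate)
                            FROM   dbo.ProductHistory innerP
                            WHERE  innerP.ProductID = L.ProductID
                              AND  innerP.UpdateDate <= L.ItemDate)

-- Reports then read the precalculated price directly:
-- SELECT ItemID, ItemDate, Price FROM dbo.LineItemPrice_Summary

The payoff is that the expensive history lookup runs once per rebuild instead of once per report.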
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-07-20 : 18:11:52
Here is another option:

SELECT L.ItemID, CONVERT(varchar(12), L.ItemDate, 101) AS ItemDate, priceInterval.Price
FROM #LineItem L
    JOIN (
        SELECT frmd.ProductId
            ,frmd.UpdateDate
            ,(
                SELECT MIN(tod.UpdateDate) - 1
                FROM #ProductHistory tod
                WHERE frmd.ProductId = tod.ProductId
                    AND frmd.UpdateDate < tod.UpdateDate
            ) AS ValidTo
            ,frmd.Price
        FROM #ProductHistory frmd
    ) AS priceInterval
        ON L.ProductId = priceInterval.ProductId
        AND L.ItemDate BETWEEN priceInterval.UpdateDate AND ISNULL(priceInterval.ValidTo, L.ItemDate)

As a plain join it will most probably do worse than the other options. You could, however, maintain the ValidTo column in the history tables themselves (with triggers). The part in red (the priceInterval derived table) is basically just a history table with to-from dates.

rockmoose
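A minimal sketch of what maintaining that ValidTo column with a trigger might look like, assuming a permanent ProductHistory table with the same columns as the example; the column and trigger names are invented, and it assumes at most one new price row per product per INSERT statement:

ALTER TABLE ProductHistory ADD ValidTo datetime NULL
GO

CREATE TRIGGER trg_ProductHistory_ValidTo ON ProductHistory
AFTER INSERT
AS
    -- Close the previously open price row(s) for each product that just received
    -- a new row: the old row's ValidTo becomes one day before the new UpdateDate.
    UPDATE ph
    SET    ValidTo = DATEADD(day, -1, i.UpdateDate)
    FROM   ProductHistory ph
           JOIN inserted i
             ON i.ProductID = ph.ProductID
            AND ph.UpdateDate < i.UpdateDate
    WHERE  ph.ValidTo IS NULL
GO

With that column in place, the line-item lookup becomes a plain range join: L.ItemDate BETWEEN P.UpdateDate AND ISNULL(P.ValidTo, L.ItemDate), with no derived table.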
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2005-07-20 : 18:48:15
>> Therefore, I have no option to restructure the database

Can you create views or TVFs or indexes if needed?

DavidM
A front-end is something that tries to violate a back-end.
Kristen
Test
22859 Posts |
Posted - 2005-07-21 : 01:57:30
Any way to create [even by adding another table etc.] a "Flag" column on #ProductHistory for the row, for a given product, which is the most recently dated (and enforce it with, say, a trigger)?

Kristen
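For completeness, a rough sketch of that flag idea (the column and trigger names are invented; assumes a permanent ProductHistory table, and that the database's recursive-triggers option is off, which is the default). As a later reply notes, a "current price" flag doesn't by itself answer "what was the price on the order date":

ALTER TABLE ProductHistory ADD IsCurrent bit NOT NULL DEFAULT 0
GO

CREATE TRIGGER trg_ProductHistory_IsCurrent ON ProductHistory
AFTER INSERT, UPDATE
AS
    -- Re-flag the most recently dated row for each product touched by this statement.
    UPDATE ph
    SET    IsCurrent = CASE WHEN ph.UpdateDate = m.MaxDate THEN 1 ELSE 0 END
    FROM   ProductHistory ph
           JOIN (SELECT ProductID, MAX(UpdateDate) AS MaxDate
                 FROM   ProductHistory
                 WHERE  ProductID IN (SELECT ProductID FROM inserted)
                 GROUP BY ProductID) m
             ON m.ProductID = ph.ProductID
GO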
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-07-21 : 05:15:15
^^^^^^^^^^^^^^^^^^^^^^
Did not read the specs
----------------------
Tell them that joining to the history tables and retrieving the correct price for each LineItem is slow by design (SBD).

rockmoose
AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2005-07-21 : 11:58:38
quote:
Originally posted by byrmol

>> Therefore, I have no option to restructure the database

Can you create views or TVFs or indexes if needed?

DavidM
A front-end is something that tries to violate a back-end.

I must need more coffee because I can't figure out what a TVF is right now. But in answer to your question, yes, I can add objects to the database so long as I don't blow up existing processing. I have written several stored procedures and functions for our custom reporting needs.

Are you suggesting creating a view that would retrieve only the desired set of history records? Or what did you have in mind?

---------------------------
EmeraldCityDomains.com
AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2005-07-21 : 12:03:00
quote:
Originally posted by Kristen

Any way to create [even by adding another table etc.] a "Flag" column on #ProductHistory for the row, for a given product, which is the most recently dated (and enforce it with, say, a trigger)?

Kristen

I could, but unfortunately I don't want "the most recent" record. I want the one that was the most recent at the time the record in the other table was created.

Think of it like an order entry system where, instead of creating a snapshot of the order with its pricing at the time it was created like they should have done, the "developer" just kept a history of all price changes with dates. So the only way to know what price you charged for an item is to join it to the history table and figure out what price was in effect at the time the order was created.

---------------------------
EmeraldCityDomains.com
AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2005-07-21 : 12:38:28
TG and rockmoose, would you like to add any commentary on what you like about your approaches over mine? They may not be correlated subqueries, which I agree in general are slower than non-correlated ones, but interestingly, when I convert the temp tables to real tables in my sandbox and then paste all three of our solutions into one window and show the execution plan, the "Query cost (relative to the batch)" shows:

Mine: 29.37%
TG: 36.69%
rockmoose: 33.94%

Of course these could all be radically skewed due to the small table sizes and lack of indexing, since I literally just converted the example to use real tables instead of temps.

---------------------------
EmeraldCityDomains.com
emilg
Starting Member
13 Posts |
Posted - 2005-07-21 : 12:43:36
The interesting fact is that the EXECUTION PLAN shows 44% for the SUBQUERY version and 55% for the JOIN version of the query.

Try running both in the same batch on your database, from Query Analyzer with SHOW EXECUTION PLAN on, and see what the execution plan says. The results might be different from mine due to indexes and statistics, but this is a good start in optimising the query.

--Emil
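For reference, a sketch of that side-by-side comparison: both versions pasted into one Query Analyzer batch, assuming the example's temp tables have been converted to permanent LineItem and ProductHistory tables. The STATISTICS settings are just one way to get actual I/O and timing numbers alongside the estimated plan costs:

SET STATISTICS IO ON
SET STATISTICS TIME ON
GO

-- Version 1: correlated subquery (original post)
SELECT L.ItemID, CONVERT(varchar(12), L.ItemDate, 101) AS ItemDate, P.Price
FROM LineItem L
    JOIN ProductHistory P
        ON L.ProductID = P.ProductID
        AND P.UpdateDate = (SELECT MAX(UpdateDate)
                            FROM ProductHistory innerP
                            WHERE innerP.ProductID = L.ProductID
                              AND innerP.UpdateDate <= L.ItemDate)

-- Version 2: derived table join (TG's post)
SELECT b.ItemID, CONVERT(varchar(12), b.ItemDate, 101) AS ItemDate, Price
FROM ProductHistory a
    JOIN (SELECT ItemID, ItemDate, a.ProductID, MAX(UpdateDate) AS UpdateDate
          FROM LineItem a
              JOIN ProductHistory b ON a.ProductID = b.ProductID
          WHERE UpdateDate <= ItemDate
          GROUP BY ItemID, ItemDate, a.ProductID) b
        ON a.ProductID = b.ProductID
        AND a.UpdateDate = b.UpdateDate
GO

SET STATISTICS IO OFF
SET STATISTICS TIME OFF
GO

With "Show Execution Plan" turned on, both plans appear in the same Execution Plan pane and the relative batch cost is shown above each query.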
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-07-21 : 12:58:26
Could you convert the part in red from my example to a real table, join to that instead of the derived version, and run a comparison test again? (I did compare plans (each said 0%), but I have no data to speak of and can't say anything decisive about your situation.)

However, if you can "denormalize for speed", creating the additional column on the history table, or creating a precalculated table with that column, would I think be the easiest solution to maintain. And "the biggest" bang for the buck (relatively easy to maintain, small impact, facilitates queries, improves performance).

rockmoose
Kristen
Test
22859 Posts |
Posted - 2005-07-21 : 13:02:37
"I don't want 'the most recent' record"

Ah, sorry. I'll read the question next time!

Kristen
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2005-07-21 : 13:10:31
>> TG and rockmoose, would you like to add any commentary on what you like about your approaches over mine?

I try not to "like" one way over another; I was just offering an alternative. I wouldn't make a decision until testing was done on real DDL loaded with data. Then let the results determine the decision.

I still like the idea of summary tables or warehousing the data with a time dimension. That can make any of the reporting queries way faster than anything we could come up with against the raw tables.

Be One with the Optimizer
TG
AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2005-07-21 : 17:08:33
Unfortunately we don't have much data in this yet, even in production, and so I suspect that any test results will be skewed by the small sizes of the tables. Given that nobody has smacked me upside the head yet, saying, "You should use method X, which is unquestionably better than your approach," I'll stick with what I've got for now, but will know to look back here if/when we have performance issues as the data size grows.

TG, I agree that a separate reporting database or warehouse would be a great solution, but unfortunately, work demands and project resources do not lend themselves to that approach at this time. Again, something I'll keep in mind for future reference.

Thanks for the help!

---------------------------
EmeraldCityDomains.com
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2005-07-21 : 17:18:51
Don't forget about the idea of generating a few million rows for testing in your testbed or sandbox or whatever we call it these days.

Good luck.

Be One with the Optimizer
TG
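One rough way to generate that kind of volume, as a sketch: the row counts, the _Test table names, and the cross-join-of-syscolumns trick are all just illustrative; any sufficiently large row source would do.

-- Roughly a million random line items over 2005, against 1000 products.
SELECT TOP 1000000
       IDENTITY(int, 1, 1)                                     AS ItemID,
       DATEADD(day, ABS(CHECKSUM(NEWID())) % 365, '20050101')  AS ItemDate,
       ABS(CHECKSUM(NEWID())) % 1000 + 1                       AS ProductID
INTO   LineItem_Test
FROM   master.dbo.syscolumns c1
       CROSS JOIN master.dbo.syscolumns c2

-- A few thousand price-change rows spread over the same products and dates.
-- (Duplicate ProductID/UpdateDate pairs are possible; clean up or add a unique index as needed.)
SELECT TOP 5000
       ABS(CHECKSUM(NEWID())) % 1000 + 1                       AS ProductID,
       DATEADD(day, ABS(CHECKSUM(NEWID())) % 365, '20050101')  AS UpdateDate,
       CONVERT(money, ABS(CHECKSUM(NEWID())) % 10000) / 100    AS Price
INTO   ProductHistory_Test
FROM   master.dbo.syscolumns c1
       CROSS JOIN master.dbo.syscolumns c2

Index the test tables the way production will be indexed before comparing the plans.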
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-07-21 : 17:42:03
quote:
Originally posted by TG

Don't forget about the idea of generating a few million rows for testing in your testbed or sandbox or whatever we call it these days.

Good luck.

Be One with the Optimizer
TG

Concur; work done now might save work later. (Or be a waste of time...)

rockmoose
AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2005-07-21 : 19:11:43
Yes, good point. I'll have to give some serious thought to the best way to produce a good, large set of test data for this monster. Rockmoose, I like your caveat.

BTW, TG, what I was referring to earlier as my sandbox is a database on my local SQL Server that is used for testing ideas, especially table structures, that does not need to be cleaned up. It's a database that I can do anything I want to, including DROPPING it at any time, without any consequences to actual dev/test/live environments. But hey, that's some of the "fun" of IT... reusing acronyms and jargon to mean completely different things and then wondering why we can't seem to communicate clearly with others.

---------------------------
EmeraldCityDomains.com
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2005-07-21 : 19:21:35
>> I must need more coffee because I can't figure out what a TVF is right now.

No, I need to stop being lazy on the keyboard... Table Valued Function. Easier to type than "Table Valued UDF". The irony....

I think your original solution is the easiest to understand and would probably stick to that until things hit fans... I was thinking more about making the reporting queries easier to understand/maintain. By moving the scalar subquery value into the "core" table (LineItem), we have a central location for "history logic":

create view LineItem_Historical
as
SELECT L.ItemID, L.ItemDate, L.ProductID
    ,(
        SELECT MAX(UpdateDate)
        FROM ProductHistory innerP
        WHERE innerP.ProductID = L.ProductID
            AND innerP.UpdateDate <= L.ItemDate
    ) as PriceDate
    --Repeat other historical values
FROM LineItem L
go

SELECT L.ItemID, L.ItemDate, L.ProductID, P.Price
from LineItem_Historical AS L
INNER join ProductHistory P
    ON L.ProductID = P.ProductID
    AND P.UpdateDate = L.PriceDate

Hopefully SQL Server would be smart enough to use the identical plan to your original query.

DavidM
A front-end is something that tries to violate a back-end.
AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2005-07-22 : 13:12:51
David, I really like the view idea for a couple of reasons. One is that it makes the main SELECT easier to read. It also simplifies the reusability of this concept, and we will definitely need to reuse it. Thanks for the push.

More irony is that the grand SELECT statement I'm working on will probably be placed inside a TVF.

---------------------------
EmeraldCityDomains.com
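For reference, a minimal sketch of what wrapping that kind of SELECT in an inline table-valued function could look like; the function name and the date-range parameters are invented for the example, and it builds on the LineItem_Historical view from the previous post:

CREATE FUNCTION dbo.fn_LineItemPrices (@FromDate datetime, @ToDate datetime)
RETURNS TABLE
AS
RETURN
(
    SELECT L.ItemID,
           L.ItemDate,
           L.ProductID,
           P.Price
    FROM   dbo.LineItem_Historical L
           JOIN dbo.ProductHistory P
             ON P.ProductID = L.ProductID
            AND P.UpdateDate = L.PriceDate
    WHERE  L.ItemDate BETWEEN @FromDate AND @ToDate
)
GO

-- Used like a table:
-- SELECT * FROM dbo.fn_LineItemPrices('20050401', '20050630')

Because an inline TVF is expanded into the calling query by the optimizer, it should plan much like the view-based SELECT above.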