Author |
Topic |
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-11 : 16:58:55
|
Details for #sqlhelp question:I am working my way through long-running, high CPU and high Reads stored procedures and am puzzled by this one. Original code:UPDATE t1SET [status] = 'C'FROM dbo.table1 t1WHERE t1.planned_status = 'Y' AND t1.[status] = 'N' AND t1.[wave_type_id] = 1 AND NOT EXISTS ( SELECT * FROM dbo.table2 t2 JOIN dbo.table3 t3 ON t3.order_number = t2.order_number AND t3.wh_id = t2.wh_id WHERE t2.[status] <> 'SHIPPED' AND t3.wave_id = t1.wave_id AND t2.[type_id] = 1);UPDATE t1SET [status] = 'N'FROM dbo.table1 t1WHERE t1.planned_status = 'Y' AND t1.[status] = 'C' AND t1.wave_type_id = 1 AND EXISTS ( SELECT * FROM dbo.table2 t2 JOIN dbo.table3 t3 ON t3.order_number = t2.order_number AND t3.wh_id = t2.wh_id WHERE t2.[status] <> 'SHIPPED' AND t3.wave_id = t1.wave_id AND t2.[type_id] = 1); They looked so similar and since it's scanning table3, I figured I'd combine them. Refactored:UPDATE t1SET [status] = CASE WHEN t1.[status] = 'N' AND dt.wave_id IS NULL THEN 'C' WHEN t1.[status] = 'C' AND dt.wave_id IS NOT NULL THEN 'N' ENDFROM dbo.table1 t1LEFT JOIN ( SELECT DISTINCT t3.wave_id FROM dbo.table2 t2 JOIN dbo.table3 t3 ON t3.order_number = t2.order_number AND t3.wh_id = t2.wh_id WHERE t2.[status] <> 'SHIPPED' AND t2.[type_id] = 1) dtON t1.wave_id = dt.wave_idWHERE t1.planned_status = 'Y' AND t1.wave_type_id = 1; CPU has improved a little, Reads have improved a lot, Duration has improved a bit...but Writes have gone up significantly.Here's the aggregated results from Profiler:StoredProcVersion DurationAvgMs DurationMinMs DurationMaxMs ReadsAvg ReadsMin ReadsMax WritesAvg WritesMin WritesMax CpuAvg CpuMin CpuMax----------------- ------------- ------------- ------------- -------- -------- -------- --------- --------- --------- ------ ------ ------Baseline 13441 7238 39801 2654503 2610548 2707097 11 0 56 7688 6770 8642Refactored 7443 6106 16633 611615 603664 613221 4461 580 4605 6410 6068 6973 Why would Writes go up SO much?If you see a more efficient way to do this, let me know. I'm all ears.Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-11 : 17:26:27
|
I've added "AND t1.[status] IN ('C', 'N')" to the refactored code's WHERE clause. We are re-running a load test to see how it does. The status column can be C, N or NULL.2nd code change:UPDATE t1SET [status] = CASE WHEN t1.[status] = 'N' AND dt.wave_id IS NULL THEN 'C' WHEN t1.[status] = 'C' AND dt.wave_id IS NOT NULL THEN 'N' ENDFROM dbo.table1 t1LEFT JOIN ( SELECT DISTINCT t3.wave_id FROM dbo.table2 t2 JOIN dbo.table3 t3 ON t3.order_number = t2.order_number AND t3.wh_id = t2.wh_id WHERE t2.[status] <> 'SHIPPED' AND t2.[type_id] = 1) dtON t1.wave_id = dt.wave_idWHERE t1.planned_status = 'Y' AND t1.[status] IN ('C', 'N') AND t1.wave_type_id = 1; Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-11 : 17:57:53
|
3rd code change as 1st and 2nd produced some inaccurate results:UPDATE t1SET [status] = CASE WHEN t1.[status] = 'N' AND dt.wave_id IS NULL THEN 'C' WHEN t1.[status] = 'C' AND dt.wave_id IS NOT NULL THEN 'N' ELSE t1.[status] ENDFROM dbo.table1 t1LEFT JOIN ( SELECT DISTINCT t3.wave_id FROM dbo.table2 t2 JOIN dbo.table3 t3 ON t3.order_number = t2.order_number AND t3.wh_id = t2.wh_id WHERE t2.[status] <> 'SHIPPED' AND t2.[type_id] = 1) dtON t1.wave_id = dt.wave_idWHERE t1.planned_status = 'Y' AND ( (t1.[status] = 'N' AND dt.wave_id IS NULL) OR (t1.[status] = 'C' AND dt.wave_id IS NOT NULL) ) AND t1.wave_type_id = 1; Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2014-12-11 : 23:34:56
|
I don't have a better suggestion, but I'm guessing the DISTINCT is spooling somewhere and probably causing the writes to go up. However, once spooled, it would have fewer pages to read and process, leading to the CPU and read reductions you're seeing.I imagine if you put this on #sqlhelp you had someone weigh in who knows what they're talking about. (I haven't checked). |
|
|
ScottPletcher
Aged Yak Warrior
550 Posts |
Posted - 2014-12-12 : 12:46:33
|
I suggest this:UPDATE t1SET [status] = CASE WHEN t1.[status] = 'C' THEN 'N' ELSE 'C' ENDFROM dbo.table1 t1OUTER APPLY ( SELECT TOP (1) 1 AS check_exists FROM dbo.table2 t2 JOIN dbo.table3 t3 ON t3.order_number = t2.order_number AND t3.wh_id = t2.wh_id WHERE t2.[status] <> 'SHIPPED' AND t3.wave_id = t1.wave_id AND t2.[type_id] = 1) AS ca1WHERE t1.planned_status = 'Y' AND t1.[status] IN ('C', 'N') AND t1.[wave_type_id] = 1 AND ((t1.[status] = 'C' AND ca1.check_exists IS NULL) OR (t1.[status] = 'N' AND ca1.check_exists IS NOT NULL)) |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-12 : 12:59:26
|
We just completed a load test of my 3rd code change above, and here are the results:StoredProcVersion DurationAvgMs DurationMinMs DurationMaxMs ReadsAvg ReadsMin ReadsMax WritesAvg WritesMin WritesMax CpuAvg CpuMin CpuMax----------------- ------------- ------------- ------------- -------- -------- -------- --------- --------- --------- ------ ------ ------Baseline 13441 7238 39801 2654503 2610548 2707097 11 0 56 7688 6770 8642Refactored 7796 5126 22099 60355 52910 62064 3 0 23 5495 5070 6209 Scott, I will try out your suggestion! I like the idea of TOP 1. I've seen OUTER APPLY many times, but it isn't something I understand yet. Will dig into that.I'll schedule a load test for your solution. Thank you!Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-18 : 13:37:23
|
We've finally completed a load test that used Scott's solution. I'm a little confused by the results. The OUTER APPLY solution has lower duration and CPU but much higher reads than the LEFT JOIN/DISTINCT solution (but still lower reads than the baseline). StoredProcVersion DurationAvgMs DurationMinMs DurationMaxMs ReadsAvg ReadsMin ReadsMax WritesAvg WritesMin WritesMax CpuAvg CpuMin CpuMax-------------------------- ------------- ------------- ------------- -------- -------- --------- --------- --------- --------- ------ ------ -----------Baseline 13441 7238 39801 2654503 2610548 2707097 11 0 56 7688 6770 8642RefactoredLeftJoinDistinct 7796 5126 22099 60355 52910 62064 3 0 23 5495 5070 6209RefactoredOuterApply 1435 481 4595 624327 396091 859621 6 0 26 863 483 1310 Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
mandm
Posting Yak Master
120 Posts |
Posted - 2014-12-18 : 15:50:27
|
First off let me say that you guys and gals rock when it comes to this stuff and I am not in the same league as you when it comes to SQL Server knowledge. That said I do sometimes find that CTE's work well for situations like this so here is my stab at it.WITH Sub_Query_CTE (wave_id)AS( SELECT DISTINCT t3.wave_id FROM dbo.table2 t2 JOIN dbo.table3 t3 ON t3.order_number = t2.order_number AND t3.wh_id = t2.wh_id WHERE t2.[status] <> 'SHIPPED' AND t2.[type_id] = 1 ) UPDATE t1 SET [status] = CASE WHEN t1.[status] = 'N' AND dt.wave_id IS NULL THEN 'C' WHEN t1.[status] = 'C' AND dt.wave_id IS NOT NULL THEN 'N' ELSE t1.[status] END FROM dbo.table1 t1LEFT JOIN Sub_Query_CTE AS dt ON t1.wave_id = dt.wave_id WHERE t1.planned_status = 'Y' AND ( (t1.[status] = 'N' AND dt.wave_id IS NULL) OR (t1.[status] = 'C' AND dt.wave_id IS NOT NULL) ) AND t1.wave_type_id = 1; |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-18 : 15:55:40
|
The CTE solution is the same as the LEFT JOIN/DISTINCT solution. It's just a different way of writing it: CTE vs derived tables. So yes it'll be better than the baseline and should be equivalent to the solution I wrote and tested last week, but the OUTER APPLY solution from Scott beats all solutions thus far.Editing this post as it sounds like I am dismissing your idea. I wanted to say THANK YOU for offering another solution. I will do a test to make sure that what I wrote above regarding CTE vs. derived tables being the same.Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
ScottPletcher
Aged Yak Warrior
550 Posts |
Posted - 2014-12-18 : 16:02:39
|
Very interesting. I'd have to see the query plan for my query. I'd hope/expect that SQL would use a lazy/table spool (Edit: or maybe, less hopefully, a work table) for the OUTER APPLY to prevent "re-lookups". That will generate more "reads" (logical reads), but the spool reads will be much more efficient vs. re-running the entire lookup query again.Thus, if a spool is being used, I can see a fixed LEFT JOIN having technically less overall reads but still possibly taking more time to run. |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-18 : 16:20:37
|
I did post the showplan xml but have now deleted it because I realized it was for a different stored procedure that has very similar code. The whoisactive table doesn't contain the info for your query as it's too efficient now. I'll grab it through a trace or XE session the next time we have a load test run. We're refreshing the PERF environment now, which will take at least a day.Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
namman
Constraint Violating Yak Guru
285 Posts |
Posted - 2014-12-29 : 18:53:27
|
Why not doing simple like this :UPDATE t1SET [status] = case when t1.status='C' then 'N' when t1.status='N' then 'C' else t1.status endFROM dbo.table1 t1WHERE t1.planned_status = 'Y' AND t1.[wave_type_id] = 1 AND NOT EXISTS ( SELECT * FROM dbo.table2 t2 JOIN dbo.table3 t3 ON t3.order_number = t2.order_number AND t3.wh_id = t2.wh_id WHERE t2.[status] <> 'SHIPPED' AND t3.wave_id = t1.wave_id AND t2.[type_id] = 1); |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-29 : 19:20:22
|
quote: Originally posted by namman Why not doing simple like this :UPDATE t1SET [status] = case when t1.status='C' then 'N' when t1.status='N' then 'C' else t1.status endFROM dbo.table1 t1WHERE t1.planned_status = 'Y' AND t1.[wave_type_id] = 1 AND NOT EXISTS ( SELECT * FROM dbo.table2 t2 JOIN dbo.table3 t3 ON t3.order_number = t2.order_number AND t3.wh_id = t2.wh_id WHERE t2.[status] <> 'SHIPPED' AND t3.wave_id = t1.wave_id AND t2.[type_id] = 1);
I did a brief test on it as I was suspicious it wasn't functionally equivalent to the original or the OUTER APPLY that we went with. When I switch it to a SELECT for testing purposes, it returns no data but the original does. I didn't dig into it.Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
namman
Constraint Violating Yak Guru
285 Posts |
Posted - 2014-12-29 : 20:21:51
|
Surprise ... if returning no data. I Just remove 1 condition from your original post to include all statusUPDATE t1SET [status] = 'C'SET [status] = case when t1.status='C' then 'N' when t1.status='N' then 'C' else t1.status endFROM dbo.table1 t1WHERE t1.planned_status = 'Y' AND t1.[status] = 'N' AND t1.[wave_type_id] = 1 AND NOT EXISTS ( SELECT * FROM dbo.table2 t2 JOIN dbo.table3 t3 ON t3.order_number = t2.order_number AND t3.wh_id = t2.wh_id WHERE t2.[status] <> 'SHIPPED' AND t3.wave_id = t1.wave_id AND t2.[type_id] = 1); Perhaps, I miss something .... |
|
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2014-12-29 : 20:52:45
|
The original is now returning no data (due to the time of day), but yours is returning 118,000 rows. I think the issue is with status=C needs to be paired with EXISTS in the WHERE clause, while status=N needs to be paired with NOT EXISTS. Yours is both statuses (only two that exist in the table) and NOT EXISTS.I don't understand the business logic of this stored procedure to be able to explain what it's doing. I can from a T-SQL standpoint, but I wouldn't be adding any value to this thread since those replying are T-SQL geniuses.Tara KizerSQL Server MVP since 2007http://weblogs.sqlteam.com/tarad/ |
|
|
namman
Constraint Violating Yak Guru
285 Posts |
Posted - 2014-12-29 : 22:39:43
|
Got it, that what I miss.... |
|
|
|