Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-07-25 : 17:55:42
|
I have seen questions posted a number of times where someone wants to find the maximum or minimum value from a set of columns in a single row of a table.This script demonstrates two methods for finding the maximum value in a row across a set of columns in the row when any or all of the columns are allowed to be null or equal.Method 1 uses a UNION ALL sub query for all the columns with a MAX. It is much simpler to code and test, especially when you get much past 4 columns. Adding another column is as simple as adding one more SELECT to the subquery.Method 2 uses a CASE statement to determine the MAX. It is much more complex to code (and test), and gets exponentially harder to code as the number of columns goes up. I think 5 or 6 columns may be about the limit of complexity of coding that you would want to take on. One advantage of this script is that you can use the simpler to code Method 1 to test the more complex code for the Method 2 if you choose to implement it as a CASE statement.If you have another method you would like to contribute, feel free. Also, if anyone wants to post performance test results, that would be nice.print 'Create table to hold test data'create table #t ( number int not null primary key clustered, Val1 int, Val2 int, Val3 int, Val4 int)GOprint 'Load test data'insert into #tselect number, -- Generate random numbers -- with about 1/7th null case when abs(checksum(newid()))%7 = 0 then null else checksum(newid())%1000000 end, case when abs(checksum(newid()))%7 = 0 then null else checksum(newid())%1000000 end, case when abs(checksum(newid()))%7 = 0 then null else checksum(newid())%1000000 end, case when abs(checksum(newid()))%7 = 0 then null else checksum(newid())%1000000 endfrom -- Load one million rows of test data. -- Number table function here -- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685 dbo.F_TABLE_NUMBER_RANGE(1,1000000)goprint 'Find rows that do not match for Method 1 and Method 2'select out1.*, out2.*from(-- Method 1, using a subquery with a maxselect a.number, a.Val1, a.Val2, a.Val3, a.Val4, [Max_of_Val1_to_Val4] = ( select X1= max(bb.xx) from ( select xx = a.Val1 where a.Val1 is not null union all select xx = a.Val2 where a.Val2 is not null union all select xx = a.Val3 where a.Val3 is not null union all select xx = a.Val4 where a.Val4 is not null ) bb )from #t a) out1join(-- Method 2, using a caseselect a.number, a.Val1, a.Val2, a.Val3, a.Val4, [Max_of_Val1_to_Val4] = case when a.Val1 is not null and (a.Val1 >= a.Val2 or a.Val2 is null) and (a.Val1 >= a.Val3 or a.Val3 is null) and (a.Val1 >= a.Val4 or a.Val4 is null) then a.Val1 when a.Val2 is not null and (a.Val2 >= a.Val1 or a.Val1 is null) and (a.Val2 >= a.Val3 or a.Val3 is null) and (a.Val2 >= a.Val4 or a.Val4 is null) then a.Val2 when a.Val3 is not null and (a.Val3 >= a.Val1 or a.Val1 is null) and (a.Val3 >= a.Val2 or a.Val2 is null) and (a.Val3 >= a.Val4 or a.Val4 is null) then a.Val3 when a.Val4 is not null and (a.Val4 >= a.Val1 or a.Val1 is null) and (a.Val4 >= a.Val2 or a.Val2 is null) and (a.Val4 >= a.Val3 or a.Val3 is null) then a.Val4 else null endfrom #t a) out2on out1.number = out2.numberwhere -- Look for results that do not match (out1.[Max_of_Val1_to_Val4] is null and out2.[Max_of_Val1_to_Val4] is not null) or (out1.[Max_of_Val1_to_Val4] is not null and out2.[Max_of_Val1_to_Val4] is null) or out1.[Max_of_Val1_to_Val4] <> out2.[Max_of_Val1_to_Val4]goprint 'Find count of rows with different columns null'print 'Should have a rowcount of 16 to test all conditions'select Null_Column_Conditions = case when Val1 is null then 0 else 1000 end+ case when Val2 is null then 0 else 0100 end+ case when Val3 is null then 0 else 0010 end+ case when Val4 is null then 0 else 0001 end, count(*)from #tgroup by case when Val1 is null then 0 else 1000 end+ case when Val2 is null then 0 else 0100 end+ case when Val3 is null then 0 else 0010 end+ case when Val4 is null then 0 else 0001 endorder by 1godrop table #t Results:Create table to hold test dataLoad test data(1000000 row(s) affected)Find rows that do not match for Method 1 and Method 2(0 row(s) affected)Find count of rows with different columns nullShould have a rowcount of 16 to test all conditionsNull_Column_Conditions ---------------------- ----------- 0 3951 244410 256011 14760100 2400101 14955110 14843111 902061000 25181001 148571010 149891011 902561100 151001101 896591110 897831111 540275(16 row(s) affected) CODO ERGO SUM |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-07-25 : 18:31:26
|
RMHCOMMON? E 12°55'05.76"N 56°04'39.42" |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-07-26 : 18:08:26
|
I ran the following script with 10,000, 100,000, and 1,000,000 rows of test data to test the relative performance of the Subquery method vs. the CASE method.It looks like there is a large difference, 5 to 1, in run time with 1,000,000 rows.With 10,000 rows, and 100,000 rows the Subquery method sometimes is faster than the CASE method. This may be because it takes longer to compile a query plan; it is slower on the first run, but faster on the second run.drop table #tgoprint 'Create table to hold test data'create table #t ( number int not null primary key clustered, Val1 int, Val2 int, Val3 int, Val4 int, MaxVal int)GOprint 'Load test data'declare @t table ( number int not null primary key clustered, Val1 int, Val2 int, Val3 int, Val4 int)insert into @tselect top 100 percent number, -- Generate random numbers -- with about 1/7th null Val1 = case when abs(checksum(newid()))%7 = 0 then null else checksum(newid())%1000000 end, Val2 = case when abs(checksum(newid()))%7 = 0 then null else checksum(newid())%1000000 end, Val3 = case when abs(checksum(newid()))%7 = 0 then null else checksum(newid())%1000000 end, Val4 = case when abs(checksum(newid()))%7 = 0 then null else checksum(newid())%1000000 endfrom -- Load one million rows of test data. -- Number table function here -- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685 dbo.F_TABLE_NUMBER_RANGE(1,1000000) aorder by a.numberinsert into #tselect top 100 percent aa.number, aa.Val1, aa.Val2, aa.Val3, aa.Val4, [MaxVal] = ( select X1= max(bb.xx) from ( select xx = aa.Val1 where aa.Val1 is not null union all select xx = aa.Val2 where aa.Val2 is not null union all select xx = aa.Val3 where aa.Val3 is not null union all select xx = aa.Val4 where aa.Val4 is not null ) bb )from @t aaorder by aa.numberGOprint 'Scan table to cache data'declare @st datetimedeclare @count intset @st = getdate()select @count = max(isnull(aa.MaxVal,0))+ max(isnull(aa.Val1,0))+ max(isnull(aa.Val2,0))+ max(isnull(aa.Val3,0))+ max(isnull(aa.Val4,0))from #t aaselect Elapsed_Time_SCAN = right(convert(varchar(30),getdate()-@st,121),12)goprint 'Test Elapsed time for Method 1, using Subquery'declare @st datetimedeclare @count intset @st = getdate()select @count = count(*)from ( select a.*, [Max_of_Val1_to_Val4] = ( select X1= max(bb.xx) from ( select xx = a.Val1 where a.Val1 is not null union all select xx = a.Val2 where a.Val2 is not null union all select xx = a.Val3 where a.Val3 is not null union all select xx = a.Val4 where a.Val4 is not null ) bb ) from #t a ) aawhere aa.MaxVal <> aa.[Max_of_Val1_to_Val4]select Elapsed_Time_Subquery = right(convert(varchar(30),getdate()-@st,121),12)goprint 'Test Elapsed time for Method 2, using CASE'declare @st datetimedeclare @count intset @st = getdate()select @count = count(*)from ( select a.*, [Max_of_Val1_to_Val4] = case when a.Val1 is not null and (a.Val1 >= a.Val2 or a.Val2 is null) and (a.Val1 >= a.Val3 or a.Val3 is null) and (a.Val1 >= a.Val4 or a.Val4 is null) then a.Val1 when a.Val2 is not null and (a.Val2 >= a.Val1 or a.Val1 is null) and (a.Val2 >= a.Val3 or a.Val3 is null) and (a.Val2 >= a.Val4 or a.Val4 is null) then a.Val2 when a.Val3 is not null and (a.Val3 >= a.Val1 or a.Val1 is null) and (a.Val3 >= a.Val2 or a.Val2 is null) and (a.Val3 >= a.Val4 or a.Val4 is null) then a.Val3 when a.Val4 is not null and (a.Val4 >= a.Val1 or a.Val1 is null) and (a.Val4 >= a.Val2 or a.Val2 is null) and (a.Val4 >= a.Val3 or a.Val3 is null) then a.Val4 else null end from #t a ) aawhere aa.MaxVal <> aa.[Max_of_Val1_to_Val4]select Elapsed_Time_CASE = right(convert(varchar(30),getdate()-@st,121),12)goprint 'Test Elapsed time for Method 1, using Subquery'declare @st datetimedeclare @count intset @st = getdate()select @count = count(*)from ( select a.*, [Max_of_Val1_to_Val4] = ( select X1= max(bb.xx) from ( select xx = a.Val1 where a.Val1 is not null union all select xx = a.Val2 where a.Val2 is not null union all select xx = a.Val3 where a.Val3 is not null union all select xx = a.Val4 where a.Val4 is not null ) bb ) from #t a ) aawhere aa.MaxVal <> aa.[Max_of_Val1_to_Val4]select Elapsed_Time_Subquery = right(convert(varchar(30),getdate()-@st,121),12)goprint 'Test Elapsed time for Method 2, using CASE'declare @st datetimedeclare @count intset @st = getdate()select @count = count(*)from ( select a.*, [Max_of_Val1_to_Val4] = case when a.Val1 is not null and (a.Val1 >= a.Val2 or a.Val2 is null) and (a.Val1 >= a.Val3 or a.Val3 is null) and (a.Val1 >= a.Val4 or a.Val4 is null) then a.Val1 when a.Val2 is not null and (a.Val2 >= a.Val1 or a.Val1 is null) and (a.Val2 >= a.Val3 or a.Val3 is null) and (a.Val2 >= a.Val4 or a.Val4 is null) then a.Val2 when a.Val3 is not null and (a.Val3 >= a.Val1 or a.Val1 is null) and (a.Val3 >= a.Val2 or a.Val2 is null) and (a.Val3 >= a.Val4 or a.Val4 is null) then a.Val3 when a.Val4 is not null and (a.Val4 >= a.Val1 or a.Val1 is null) and (a.Val4 >= a.Val2 or a.Val2 is null) and (a.Val4 >= a.Val3 or a.Val3 is null) then a.Val4 else null end from #t a ) aawhere aa.MaxVal <> aa.[Max_of_Val1_to_Val4]select Elapsed_Time_CASE = right(convert(varchar(30),getdate()-@st,121),12)go Results:---------------------------------------------------------------------- Test with 10,000 rows of data--------------------------------------------------------------------Create table to hold test dataLoad test data(10000 row(s) affected)(10000 row(s) affected)Scan table to cache dataElapsed_Time_SCAN ----------------- 00:00:00.017(1 row(s) affected)Test Elapsed time for Method 1, using SubqueryElapsed_Time_Subquery --------------------- 00:00:00.080(1 row(s) affected)Test Elapsed time for Method 2, using CASEElapsed_Time_CASE ----------------- 00:00:00.110(1 row(s) affected)Test Elapsed time for Method 1, using SubqueryElapsed_Time_Subquery --------------------- 00:00:00.063(1 row(s) affected)Test Elapsed time for Method 2, using CASEElapsed_Time_CASE ----------------- 00:00:00.017(1 row(s) affected)---------------------------------------------------------------------- Test with 100,000 rows of data--------------------------------------------------------------------Create table to hold test dataLoad test data(100000 row(s) affected)(100000 row(s) affected)Scan table to cache dataElapsed_Time_SCAN ----------------- 00:00:00.093(1 row(s) affected)Test Elapsed time for Method 1, using SubqueryElapsed_Time_Subquery --------------------- 00:00:00.843(1 row(s) affected)Test Elapsed time for Method 2, using CASEElapsed_Time_CASE ----------------- 00:00:00.907(1 row(s) affected)Test Elapsed time for Method 1, using SubqueryElapsed_Time_Subquery --------------------- 00:00:00.657(1 row(s) affected)Test Elapsed time for Method 2, using CASEElapsed_Time_CASE ----------------- 00:00:00.140(1 row(s) affected)---------------------------------------------------------------------- Test with 1,000,000 rows of data--------------------------------------------------------------------Create table to hold test dataLoad test data(1000000 row(s) affected)(1000000 row(s) affected)Scan table to cache dataElapsed_Time_SCAN ----------------- 00:00:00.923(1 row(s) affected)Test Elapsed time for Method 1, using SubqueryElapsed_Time_Subquery --------------------- 00:00:06.607(1 row(s) affected)Test Elapsed time for Method 2, using CASEElapsed_Time_CASE ----------------- 00:00:02.327(1 row(s) affected)Test Elapsed time for Method 1, using SubqueryElapsed_Time_Subquery --------------------- 00:00:06.640(1 row(s) affected)Test Elapsed time for Method 2, using CASEElapsed_Time_CASE ----------------- 00:00:01.373(1 row(s) affected) CODO ERGO SUM |
|
|
DLTaylor
Posting Yak Master
136 Posts |
Posted - 2009-03-27 : 10:07:26
|
Thanks to both of you.I will read into replies and let you know how i get on.Thanks |
|
|
deviji
Starting Member
7 Posts |
Posted - 2009-08-07 : 05:33:28
|
Nice Post.say if i want to insert / update the result of the above query how can i do it?that is calculating max / min or a row and update the same into the same row.create table #t(v1 int,v2 int,v3 int,max1 int,min1 int,tot int,avg int)in the above table calculate the max of the 3 values and store in the same table.and i want to know which column either v1 or v2 or v3 score the maximum...Thanks,RegardsViji |
|
|
deviji
Starting Member
7 Posts |
Posted - 2009-08-07 : 05:33:38
|
Nice Post.say if i want to insert / update the result of the above query how can i do it?that is calculating max / min or a row and update the same into the same row.create table #t(v1 int,v2 int,v3 int,max1 int,min1 int,tot int,avg int)in the above table calculate the max of the 3 values and store in the same table.and i want to know which column either v1 or v2 or v3 score the maximum...Thanks,RegardsViji |
|
|
|
|
|
|
|