Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Script Library
 MIN/MAX Across Multiple Columns

Author  Topic 

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-07-25 : 17:55:42
I have seen questions posted a number of times where someone wants to find the maximum or minimum value from a set of columns in a single row of a table.

This script demonstrates two methods for finding the maximum value in a row across a set of columns in the row when any or all of the columns are allowed to be null or equal.

Method 1 uses a UNION ALL sub query for all the columns with a MAX. It is much simpler to code and test, especially when you get much past 4 columns. Adding another column is as simple as adding one more SELECT to the subquery.

Method 2 uses a CASE statement to determine the MAX. It is much more complex to code (and test), and gets exponentially harder to code as the number of columns goes up. I think 5 or 6 columns may be about the limit of complexity of coding that you would want to take on. One advantage of this script is that you can use the simpler to code Method 1 to test the more complex code for the Method 2 if you choose to implement it as a CASE statement.


If you have another method you would like to contribute, feel free. Also, if anyone wants to post performance test results, that would be nice.



print 'Create table to hold test data'
create table #t (
number int not null primary key clustered,
Val1 int,
Val2 int,
Val3 int,
Val4 int
)
GO
print 'Load test data'
insert into #t
select
number,
-- Generate random numbers
-- with about 1/7th null
case
when abs(checksum(newid()))%7 = 0
then null
else checksum(newid())%1000000
end,
case
when abs(checksum(newid()))%7 = 0
then null
else checksum(newid())%1000000
end,
case
when abs(checksum(newid()))%7 = 0
then null
else checksum(newid())%1000000
end,
case
when abs(checksum(newid()))%7 = 0
then null
else checksum(newid())%1000000
end
from
-- Load one million rows of test data.
-- Number table function here
-- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685
dbo.F_TABLE_NUMBER_RANGE(1,1000000)
go
print 'Find rows that do not match for Method 1 and Method 2'
select
out1.*,
out2.*
from
(
-- Method 1, using a subquery with a max
select
a.number,
a.Val1,
a.Val2,
a.Val3,
a.Val4,
[Max_of_Val1_to_Val4] =
(
select
X1= max(bb.xx)
from
(
select xx = a.Val1 where a.Val1 is not null union all
select xx = a.Val2 where a.Val2 is not null union all
select xx = a.Val3 where a.Val3 is not null union all
select xx = a.Val4 where a.Val4 is not null
) bb
)

from
#t a
) out1
join
(
-- Method 2, using a case
select
a.number,
a.Val1,
a.Val2,
a.Val3,
a.Val4,
[Max_of_Val1_to_Val4] =
case
when a.Val1 is not null and
(a.Val1 >= a.Val2 or a.Val2 is null) and
(a.Val1 >= a.Val3 or a.Val3 is null) and
(a.Val1 >= a.Val4 or a.Val4 is null)
then a.Val1
when a.Val2 is not null and
(a.Val2 >= a.Val1 or a.Val1 is null) and
(a.Val2 >= a.Val3 or a.Val3 is null) and
(a.Val2 >= a.Val4 or a.Val4 is null)
then a.Val2
when a.Val3 is not null and
(a.Val3 >= a.Val1 or a.Val1 is null) and
(a.Val3 >= a.Val2 or a.Val2 is null) and
(a.Val3 >= a.Val4 or a.Val4 is null)
then a.Val3
when a.Val4 is not null and
(a.Val4 >= a.Val1 or a.Val1 is null) and
(a.Val4 >= a.Val2 or a.Val2 is null) and
(a.Val4 >= a.Val3 or a.Val3 is null)
then a.Val4
else null
end
from
#t a
) out2
on out1.number = out2.number
where
-- Look for results that do not match
(out1.[Max_of_Val1_to_Val4] is null and out2.[Max_of_Val1_to_Val4] is not null) or
(out1.[Max_of_Val1_to_Val4] is not null and out2.[Max_of_Val1_to_Val4] is null) or
out1.[Max_of_Val1_to_Val4] <> out2.[Max_of_Val1_to_Val4]
go
print 'Find count of rows with different columns null'
print 'Should have a rowcount of 16 to test all conditions'
select
Null_Column_Conditions =
case when Val1 is null then 0 else 1000 end+
case when Val2 is null then 0 else 0100 end+
case when Val3 is null then 0 else 0010 end+
case when Val4 is null then 0 else 0001 end,
count(*)
from
#t
group by
case when Val1 is null then 0 else 1000 end+
case when Val2 is null then 0 else 0100 end+
case when Val3 is null then 0 else 0010 end+
case when Val4 is null then 0 else 0001 end
order by
1
go
drop table #t




Results:


Create table to hold test data
Load test data

(1000000 row(s) affected)

Find rows that do not match for Method 1 and Method 2

(0 row(s) affected)

Find count of rows with different columns null
Should have a rowcount of 16 to test all conditions
Null_Column_Conditions
---------------------- -----------
0 395
1 2444
10 2560
11 14760
100 2400
101 14955
110 14843
111 90206
1000 2518
1001 14857
1010 14989
1011 90256
1100 15100
1101 89659
1110 89783
1111 540275

(16 row(s) affected)






CODO ERGO SUM

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-07-25 : 18:31:26
RMHCOMMON?



E 12°55'05.76"
N 56°04'39.42"
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2007-07-26 : 18:08:26
I ran the following script with 10,000, 100,000, and 1,000,000 rows of test data to test the relative performance of the Subquery method vs. the CASE method.

It looks like there is a large difference, 5 to 1, in run time with 1,000,000 rows.

With 10,000 rows, and 100,000 rows the Subquery method sometimes is faster than the CASE method. This may be because it takes longer to compile a query plan; it is slower on the first run, but faster on the second run.


drop table #t
go
print 'Create table to hold test data'
create table #t (
number int not null primary key clustered,
Val1 int,
Val2 int,
Val3 int,
Val4 int,
MaxVal int
)
GO
print 'Load test data'
declare @t table (
number int not null primary key clustered,
Val1 int,
Val2 int,
Val3 int,
Val4 int
)

insert into @t
select top 100 percent
number,
-- Generate random numbers
-- with about 1/7th null
Val1 =
case
when abs(checksum(newid()))%7 = 0
then null
else checksum(newid())%1000000
end,
Val2 =
case
when abs(checksum(newid()))%7 = 0
then null
else checksum(newid())%1000000
end,
Val3 =
case
when abs(checksum(newid()))%7 = 0
then null
else checksum(newid())%1000000
end,
Val4 =
case
when abs(checksum(newid()))%7 = 0
then null
else checksum(newid())%1000000
end
from
-- Load one million rows of test data.
-- Number table function here
-- http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685
dbo.F_TABLE_NUMBER_RANGE(1,1000000) a
order by
a.number

insert into #t
select top 100 percent
aa.number,
aa.Val1,
aa.Val2,
aa.Val3,
aa.Val4,
[MaxVal] =
(
select
X1= max(bb.xx)
from
(
select xx = aa.Val1 where aa.Val1 is not null union all
select xx = aa.Val2 where aa.Val2 is not null union all
select xx = aa.Val3 where aa.Val3 is not null union all
select xx = aa.Val4 where aa.Val4 is not null
) bb
)
from
@t aa
order by
aa.number
GO
print 'Scan table to cache data'
declare @st datetime
declare @count int
set @st = getdate()

select
@count =
max(isnull(aa.MaxVal,0))+
max(isnull(aa.Val1,0))+
max(isnull(aa.Val2,0))+
max(isnull(aa.Val3,0))+
max(isnull(aa.Val4,0))
from
#t aa


select Elapsed_Time_SCAN = right(convert(varchar(30),getdate()-@st,121),12)

go
print 'Test Elapsed time for Method 1, using Subquery'
declare @st datetime
declare @count int
set @st = getdate()

select
@count = count(*)
from
(
select
a.*,
[Max_of_Val1_to_Val4] =
(
select
X1= max(bb.xx)
from
(
select xx = a.Val1 where a.Val1 is not null union all
select xx = a.Val2 where a.Val2 is not null union all
select xx = a.Val3 where a.Val3 is not null union all
select xx = a.Val4 where a.Val4 is not null
) bb
)

from
#t a
) aa
where
aa.MaxVal <> aa.[Max_of_Val1_to_Val4]

select Elapsed_Time_Subquery = right(convert(varchar(30),getdate()-@st,121),12)

go
print 'Test Elapsed time for Method 2, using CASE'
declare @st datetime
declare @count int
set @st = getdate()

select
@count = count(*)
from
(
select
a.*,
[Max_of_Val1_to_Val4] =
case
when a.Val1 is not null and
(a.Val1 >= a.Val2 or a.Val2 is null) and
(a.Val1 >= a.Val3 or a.Val3 is null) and
(a.Val1 >= a.Val4 or a.Val4 is null)
then a.Val1
when a.Val2 is not null and
(a.Val2 >= a.Val1 or a.Val1 is null) and
(a.Val2 >= a.Val3 or a.Val3 is null) and
(a.Val2 >= a.Val4 or a.Val4 is null)
then a.Val2
when a.Val3 is not null and
(a.Val3 >= a.Val1 or a.Val1 is null) and
(a.Val3 >= a.Val2 or a.Val2 is null) and
(a.Val3 >= a.Val4 or a.Val4 is null)
then a.Val3
when a.Val4 is not null and
(a.Val4 >= a.Val1 or a.Val1 is null) and
(a.Val4 >= a.Val2 or a.Val2 is null) and
(a.Val4 >= a.Val3 or a.Val3 is null)
then a.Val4
else null
end
from
#t a
) aa
where
aa.MaxVal <> aa.[Max_of_Val1_to_Val4]

select Elapsed_Time_CASE = right(convert(varchar(30),getdate()-@st,121),12)
go
print 'Test Elapsed time for Method 1, using Subquery'
declare @st datetime
declare @count int
set @st = getdate()

select
@count = count(*)
from
(
select
a.*,
[Max_of_Val1_to_Val4] =
(
select
X1= max(bb.xx)
from
(
select xx = a.Val1 where a.Val1 is not null union all
select xx = a.Val2 where a.Val2 is not null union all
select xx = a.Val3 where a.Val3 is not null union all
select xx = a.Val4 where a.Val4 is not null
) bb
)

from
#t a
) aa
where
aa.MaxVal <> aa.[Max_of_Val1_to_Val4]

select Elapsed_Time_Subquery = right(convert(varchar(30),getdate()-@st,121),12)

go
print 'Test Elapsed time for Method 2, using CASE'
declare @st datetime
declare @count int
set @st = getdate()

select
@count = count(*)
from
(
select
a.*,
[Max_of_Val1_to_Val4] =
case
when a.Val1 is not null and
(a.Val1 >= a.Val2 or a.Val2 is null) and
(a.Val1 >= a.Val3 or a.Val3 is null) and
(a.Val1 >= a.Val4 or a.Val4 is null)
then a.Val1
when a.Val2 is not null and
(a.Val2 >= a.Val1 or a.Val1 is null) and
(a.Val2 >= a.Val3 or a.Val3 is null) and
(a.Val2 >= a.Val4 or a.Val4 is null)
then a.Val2
when a.Val3 is not null and
(a.Val3 >= a.Val1 or a.Val1 is null) and
(a.Val3 >= a.Val2 or a.Val2 is null) and
(a.Val3 >= a.Val4 or a.Val4 is null)
then a.Val3
when a.Val4 is not null and
(a.Val4 >= a.Val1 or a.Val1 is null) and
(a.Val4 >= a.Val2 or a.Val2 is null) and
(a.Val4 >= a.Val3 or a.Val3 is null)
then a.Val4
else null
end
from
#t a
) aa
where
aa.MaxVal <> aa.[Max_of_Val1_to_Val4]

select Elapsed_Time_CASE = right(convert(varchar(30),getdate()-@st,121),12)
go


Results:

--------------------------------------------------------------------
-- Test with 10,000 rows of data
--------------------------------------------------------------------
Create table to hold test data
Load test data

(10000 row(s) affected)


(10000 row(s) affected)

Scan table to cache data
Elapsed_Time_SCAN
-----------------
00:00:00.017

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery
---------------------
00:00:00.080

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE
-----------------
00:00:00.110

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery
---------------------
00:00:00.063

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE
-----------------
00:00:00.017

(1 row(s) affected)



--------------------------------------------------------------------
-- Test with 100,000 rows of data
--------------------------------------------------------------------
Create table to hold test data
Load test data

(100000 row(s) affected)


(100000 row(s) affected)

Scan table to cache data
Elapsed_Time_SCAN
-----------------
00:00:00.093

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery
---------------------
00:00:00.843

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE
-----------------
00:00:00.907

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery
---------------------
00:00:00.657

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE
-----------------
00:00:00.140

(1 row(s) affected)



--------------------------------------------------------------------
-- Test with 1,000,000 rows of data
--------------------------------------------------------------------

Create table to hold test data
Load test data

(1000000 row(s) affected)


(1000000 row(s) affected)

Scan table to cache data
Elapsed_Time_SCAN
-----------------
00:00:00.923

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery
---------------------
00:00:06.607

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE
-----------------
00:00:02.327

(1 row(s) affected)

Test Elapsed time for Method 1, using Subquery
Elapsed_Time_Subquery
---------------------
00:00:06.640

(1 row(s) affected)

Test Elapsed time for Method 2, using CASE
Elapsed_Time_CASE
-----------------
00:00:01.373

(1 row(s) affected)




CODO ERGO SUM
Go to Top of Page

DLTaylor
Posting Yak Master

136 Posts

Posted - 2009-03-27 : 10:07:26
Thanks to both of you.
I will read into replies and let you know how i get on.
Thanks
Go to Top of Page

deviji
Starting Member

7 Posts

Posted - 2009-08-07 : 05:33:28
Nice Post.
say if i want to insert / update the result of the above query how can i do it?
that is calculating max / min or a row and update the same into the same row.

create table #t(v1 int,
v2 int,
v3 int,
max1 int,
min1 int,
tot int,
avg int)

in the above table calculate the max of the 3 values and store in the same table.
and i want to know which column either v1 or v2 or v3 score the maximum...

Thanks,
Regards
Viji
Go to Top of Page

deviji
Starting Member

7 Posts

Posted - 2009-08-07 : 05:33:38
Nice Post.
say if i want to insert / update the result of the above query how can i do it?
that is calculating max / min or a row and update the same into the same row.

create table #t(v1 int,
v2 int,
v3 int,
max1 int,
min1 int,
tot int,
avg int)

in the above table calculate the max of the 3 values and store in the same table.
and i want to know which column either v1 or v2 or v3 score the maximum...

Thanks,
Regards
Viji
Go to Top of Page
   

- Advertisement -