Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-02-05 : 18:20:01
|
SQL Server User defined functions can be a powerful tool, but they can also create a substantial performance penalty in a query where they are called a large number of times. Sometimes it is something that must be accepted to get the job done, but there is often an alternative of putting the code from the function “in-line” in a SQL query. That also has a penalty in development time, so judgment about which way to go is called for.I did some testing on three different methods of converting combinations of integer values of Year, Month, Day, Hour, Minute, and Second to Datetime values and compared the runtime of each. In the first method, I did the conversion in-line in the query. In the second method, I used a UDF to do the conversion using the same algorithm as the in-line query. In the third method, I used a UDF that called two more UDFs to do the conversion.To perform the test, I loaded a table with 3,999,969 randomly generated date/times, along with the matching year, month, day, hour, minute, and seconds, in the range of 1753-01-01 to 9999-12-31. I re-indexed the table with fill factor of 100 to make the physical size as small as possible.For the actual test, I ran queries that converted the year, month, day, hour, minute, and second on each row to a datatime, and compared it to the datetime from that row. I ran the query using the in-line conversion, single UDF (DateTime1), and with the UDF (DateTime2) that called two more UDFs (Date and Time). I ran the tests several times, and saw only minor variations in run time. The single UDF took over 8 times as long to run as the in-line conversion. The test with the UDF that called other UDFs took over 36 times as long to run as the in-line conversion, and took over 4 times as long to run as the single UDF.These results show that there can be a substantial performance penalty for using a UDF in place of in-line code, and that UDFs that call other UDFs can also have a substantial performance penalty compared to a UDF that does not call other UDFs.Code to load table with test data. The functions used in the script to load the test data can be found on these links:Random Datetime Function:http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=69499Number Table Function:http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685create table T_DATE ([DATE] datetime not null primary key clustered ,[YEAR] smallint not null ,[MONTH] tinyint not null ,[DAY] tinyint not null ,[HOUR] tinyint not null ,[MINUTE] tinyint not null ,[SECOND] tinyint not null )insert into T_DATEselect distinct top 100 percent [DATE] = dateadd(ms,-datepart(ms,a.[DATE]),a.[DATE]), [YEAR] = year(a.[DATE]), [MONTH] = month(a.[DATE]), [DAY] = day(a.[DATE]), [HOUR] = datepart(hour,a.[DATE]), [MINUTE] = datepart(minute,a.[DATE]), [SECOND] = datepart(second,a.[DATE])from(select top 100 percent [DATE] = [dbo].[F_RANDOM_DATETIME]( '17530101', '99991231',newid() )from f_table_number_range(1,4000000) aaorder by 1) aorder by a.[DATE]dbcc dbreindex(T_DATE,'',100)exec sp_spaceused 'T_DATE','true'select count(*) from T_DATE Code to create functions used in the test. These functions are based on functions that Jeff posted in his blog on this link, modified with some suggestions of mine:http://weblogs.sqlteam.com/jeffs/archive/2007/01/02/56079.aspxcreate function DateTime1(@Year int, @Month int, @Day int, @Hour int, @Minute int, @Second int)-- returns a dateTime value for the date and time specified.returns datetimeasbeginreturn dateadd(month,((@Year-1900)*12)+@Month-1, dateadd(ss,(@Hour*3600)+(@Minute*60)+@Second,@Day-1))endgocreate function Date(@Year int, @Month int, @Day int)-- returns a datetime value for the specified year, month and dayreturns datetimeasbeginreturn dateadd(month,((@Year-1900)*12)+@Month-1,@Day-1)endgocreate function Time(@Hour int, @Minute int, @Second int)-- Returns a datetime value for the specified time at the "base" date (1/1/1900)returns datetimeasbeginreturn dateadd(ss,(@Hour*3600)+(@Minute*60)+@Second,0)endgocreate function DateTime2(@Year int, @Month int, @Day int, @Hour int, @Minute int, @Second int)-- returns a dateTime value for the date and time specified.returns datetimeasbeginreturn dbo.Date(@Year,@Month,@Day) + dbo.Time(@Hour, @Minute,@Second)endgo Test code:set nocount ongoselect [T_DATE Rowcount] = count(*) from T_DATEgodeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> 0+a.[DATE]select [MS No Action] = datediff(ms,0,getdate()-@st)godeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> dateadd(month,((a.YEAR-1900)*12)+a.MONTH-1, dateadd(ss,(a.HOUR*3600)+(a.MINUTE*60)+a.SECOND,a.DAY-1))select [MS No Function] = datediff(ms,0,getdate()-@st)godeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> dbo.DateTime1(a.YEAR,a.MONTH,a.DAY,a.HOUR,a.MINUTE,a.SECOND)select [MS DateTime1] = datediff(ms,0,getdate()-@st)godeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> dbo.DateTime2(a.YEAR,a.MONTH,a.DAY,a.HOUR,a.MINUTE,a.SECOND)select [MS DateTime2] = datediff(ms,0,getdate()-@st)go Sample test results:T_DATE Rowcount --------------- 3999969MS No Action ------------ 1773MS No Function -------------- 9923MS DateTime1 ------------ 82213MS DateTime2 ------------ 357683 CODO ERGO SUM |
|
Kristen
Test
22859 Posts |
Posted - 2007-02-06 : 10:42:40
|
MVJ: Very useful, if somewhat scary!I'm interested to know the effect in the SELECT, rather than the WHERE. Could you run a test something like this please?declare @MyDate datetimedeclare @st datetimeselect @st = getdate()select @MyDate = dbo.DateTime2(a.YEAR,a.MONTH,a.DAY,a.HOUR,a.MINUTE,a.SECOND)from T_DATE aselect [Test Label] = datediff(ms,0,getdate()-@st)go ThanksKristen |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-02-06 : 22:53:37
|
Here is the test code and the results of two test runs. This testing was conducted on my fairly old home computer.As you can see, it didn't make much difference in the relative runtime.I'm sure that a fast, modern server could do much better with the UDFs, but I thought this was a good illustration of the potential impact. The results would certainly make me think twice before using a UDF in a 10 million row ETL process. I would want to at least test the in-line alternative. And definitely watch out for functions that call other functions or are recursive.set nocount ongoselect [T_DATE Rowcount] = count(*) from T_DATEgodeclare @MyDate datetimedeclare @st datetimeselect @st = getdate()select @MyDate = 0+a.[DATE]from T_DATE aselect [MS No Action] = datediff(ms,0,getdate()-@st)godeclare @MyDate datetimedeclare @st datetimeselect @st = getdate()select @MyDate = dateadd(month,((a.YEAR-1900)*12)+a.MONTH-1, dateadd(ss,(a.HOUR*3600)+(a.MINUTE*60)+a.SECOND,a.DAY-1))from T_DATE aselect [MS No Function] = datediff(ms,0,getdate()-@st)godeclare @MyDate datetimedeclare @st datetimeselect @st = getdate()select @MyDate = dbo.DateTime1(a.YEAR,a.MONTH,a.DAY,a.HOUR,a.MINUTE,a.SECOND)from T_DATE aselect [MS DateTime1] = datediff(ms,0,getdate()-@st)godeclare @MyDate datetimedeclare @st datetimeselect @st = getdate()select @MyDate = dbo.DateTime2(a.YEAR,a.MONTH,a.DAY,a.HOUR,a.MINUTE,a.SECOND)from T_DATE aselect [MS DateTime2] = datediff(ms,0,getdate()-@st)go Test results:Test #1T_DATE Rowcount --------------- 3999962MS No Action ------------ 3086MS No Function -------------- 13366MS DateTime1 ------------ 80890MS DateTime2 ------------ 338186Test #2T_DATE Rowcount --------------- 3999962MS No Action ------------ 3083MS No Function -------------- 13060MS DateTime1 ------------ 80830MS DateTime2 ------------ 337833 CODO ERGO SUM |
|
|
Jeff Moden
Aged Yak Warrior
652 Posts |
Posted - 2007-02-07 : 00:53:52
|
I'll have to dig out the examples I did in an "exchange of ideas" with another... on a modern server, many of the UDF's beat the inline code depending, of course, on how the UDF was written and some made it worse.--Jeff Moden |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-02-07 : 02:29:05
|
Thanks for that MVJ"As you can see, it didn't make much difference in the relative runtime"Am I reading it wrongly? Looked like a big difference to me.Kristen |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-02-07 : 02:49:53
|
[code]Test #1 Test #2 Diff in percentT_DATE Rowcount T_DATE Rowcount --------------- --------------- --------------- 3999962 3999962 0.00%MS No Action MS No Action ------------ ------------ --------------- 3086 3083 0.09%MS No Function MS No Function -------------- ------------ --------------- 13366 13060 2.29%MS DateTime1 MS DateTime1 ------------ ------------ --------------- 80890 80830 0.07%MS DateTime2 MS DateTime2 ------------ ------------ --------------- 338186 337833 0.10%[/code]Peter LarssonHelsingborg, Sweden |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-02-07 : 03:58:42
|
Ah, thanks Peso.I was comparing [MS No Function] and [MS DateTime1] and deciding that this type of function in the SELECT was rubbish!I had already decided it was Rubbish in the WHERE clause ...So, UDFs are a bad idea then ... except for the ones that Jeff mentions.We've centralised quite a lot of critical code in UDFs, I can see I need to go review that decision. Blast!Kristen |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-02-07 : 14:17:11
|
I would not go as far as to say that UDFs are bad ideas, only that there are situations where the performance penalty may justify using inline code. Also, there may be situations where it is just not practical to use inline code.By the way, I think this really only applies to scalar UDFs, not table valued UDFs.CODO ERGO SUM |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-02-07 : 17:01:42
|
In order to see if the server made much difference, I ran the test script below on a more powerful system: Dual 3.4 GHz Xeon CPU, 3.5 GB of RAM, Windows Server 2003 SP1, SQL Server 2000 Enterprise Edition, SP4.I increased the size of the test table to 9,999,792 rows to give it more data to work with. These tests were run on an idle system. The tests all ran much faster; no surprise since the CPU and other resources are much faster. I ran the test 3 times, and there was very little variation in the results.The results show that there was an even larger difference between the run time of the inline code, compared to the two functions. The run time using the DateTime1 function was over 26 times as long as the inline code, and run time using the DateTime2 function was over 106 times as long as the inline code. The run time using DateTime2, the function that calls two more functions, was almost 4 times as long a DateTime1.This test confirms the results of my first series of tests on my desktop computer, and shows an even greater performance penalty for the functions compared to the inline code.Test Script:set nocount ongoselect [T_DATE Rowcount] = count(*) from T_DATEgodeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> 0+a.[DATE]select [MS No Action] = datediff(ms,0,getdate()-@st), [Error Count] = @count godeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> dateadd(month,((a.YEAR-1900)*12)+a.MONTH-1, dateadd(ss,(a.HOUR*3600)+(a.MINUTE*60)+a.SECOND,a.DAY-1))select [MS No Function] = datediff(ms,0,getdate()-@st), [Error Count] = @count godeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> dbo.DateTime1(a.YEAR,a.MONTH,a.DAY,a.HOUR,a.MINUTE,a.SECOND)select [MS DateTime1] = datediff(ms,0,getdate()-@st), [Error Count] = @count godeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> dbo.DateTime2(a.YEAR,a.MONTH,a.DAY,a.HOUR,a.MINUTE,a.SECOND)select [MS DateTime2] = datediff(ms,0,getdate()-@st), [Error Count] = @count go Test Results:Test #1T_DATE Rowcount --------------- 9999792MS No Action Error Count ------------ ----------- 593 0MS No Function Error Count -------------- ----------- 2860 0MS DateTime1 Error Count ------------ ----------- 77186 0MS DateTime2 Error Count ------------ ----------- 304840 0Test #2T_DATE Rowcount --------------- 9999792MS No Action Error Count ------------ ----------- 580 0MS No Function Error Count -------------- ----------- 2856 0MS DateTime1 Error Count ------------ ----------- 78720 0MS DateTime2 Error Count ------------ ----------- 304353 0Test #3T_DATE Rowcount --------------- 9999792MS No Action Error Count ------------ ----------- 580 0MS No Function Error Count -------------- ----------- 2873 0MS DateTime1 Error Count ------------ ----------- 77203 0MS DateTime2 Error Count ------------ ----------- 305573 0--Calculate relative runtimes:select [Inline to Datetime1] = 77186./2860., [Inline to Datetime2] = 304840./2860., [Datetime1 to Datetime2] = 304840./77186. union allselect 78720./2856., 304353./2856., 304353./78720. union allselect 77203./2873., 305573./2873., 305573./77203.Inline to Datetime1 Inline to Datetime2 Datetime1 to Datetime2 ------------------- ------------------- ---------------------- 26.988111 106.587412 3.94942027.563025 106.566176 3.86627226.871910 106.360250 3.958045 CODO ERGO SUM |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-02-07 : 17:32:41
|
quote: Originally posted by Jeff Moden I'll have to dig out the examples I did in an "exchange of ideas" with another... on a modern server, many of the UDF's beat the inline code depending, of course, on how the UDF was written and some made it worse.--Jeff Moden
I think I may know the thread you are talking about; at least I saw something like it on SQLServerCentral a while ago. I don't have the link to it, so I can't look back at it right now.If that is the one you are talking about, I was unconvinced from the examples that UDFs could beat the inline code in a controlled test. Most of the examples that I saw on that thread seemed to be tests using live data on a live system, so it was hard to eliminate the impact of other factors. I had run some quick tests before, and saw a big difference in run time, so I decided to do my own investigation.That is why I tried to eliminate any other factors in these tests. Nothing else was running on the systems. The SQL was just a straight table scan with all the data cached in memory and returning only a single row result set into a local variable. The only variation in the SQL code was the use of the inline code vs. the functions.I would be interested in seeing a test that shows that a UDF can perform as well as identical inline code.I would also be interested in seeing this test done under SQL Server 2005. I don’t have a SQL 2005 server available that I can “idle” out for a test, and the load of the test data table takes substantial resources, so I prefer not to have angry users after me.CODO ERGO SUM |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-02-07 : 23:10:24
|
I tested again with SQL Server 2005 Developer Edition on a desktop computer. This test showed the least difference between inline code and the 2 functions, but the runtime with function Datetime1 was still 5 times as long as the inline code, and the runtime with function Datetime2 that calls 2 other functions was 26 times as long as the inline code.It is still a very substantial performance difference. With the inline code, it was able to process 935,000 rows/second, using function Datetime1 dropped the speed to 185,000 rows/second, and using function Datetime2 dropped the speed to 35,000 rows/second.Test Results Analysis:select Test, [Inline to Datetime1] = D1/NF, [Inline to Datetime2] = D2/NF, [Datetime1 to Datetime2] =D2/D1from(select Test= '1',NF=5346.0,D1=27046.0,D2=143450.0 union allselect '2',5266.0,26766.0,142296.0 union allselect '3',5283.0,26843.0,142703.0 union allselect '4',5280.0,26826.0,142420.0) aTest Inline to Datetime1 Inline to Datetime2 Datetime1 to Datetime2 ---- ------------------- ------------------- ---------------------- 1 5.0591096 26.8331462 5.303926642 5.0827952 27.0216483 5.316296793 5.0810145 27.0117357 5.316209064 5.0806818 26.9734848 5.30902855 Test Results:Test 1T_DATE Rowcount --------------- 4999950MS No Action Error Count ------------ ----------- 1140 0MS No Function Error Count -------------- ----------- 5346 0MS DateTime1 Error Count ------------ ----------- 27046 0MS DateTime2 Error Count ------------ ----------- 143450 0Test 2T_DATE Rowcount --------------- 4999950MS No Action Error Count ------------ ----------- 1123 0MS No Function Error Count -------------- ----------- 5266 0MS DateTime1 Error Count ------------ ----------- 26766 0MS DateTime2 Error Count ------------ ----------- 142296 0Test 3T_DATE Rowcount --------------- 4999950MS No Action Error Count ------------ ----------- 1123 0MS No Function Error Count -------------- ----------- 5283 0MS DateTime1 Error Count ------------ ----------- 26843 0MS DateTime2 Error Count ------------ ----------- 142703 0Test 4T_DATE Rowcount --------------- 4999950MS No Action Error Count ------------ ----------- 1126 0MS No Function Error Count -------------- ----------- 5280 0MS DateTime1 Error Count ------------ ----------- 26826 0MS DateTime2 Error Count ------------ ----------- 142420 0 Test script:set nocount ongoprint 'Test 1'goselect [T_DATE Rowcount] = count(*) from T_DATEgodeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> 0+a.[DATE]select [MS No Action] = datediff(ms,0,getdate()-@st), [Error Count] = @count godeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> dateadd(month,((a.YEAR-1900)*12)+a.MONTH-1, dateadd(ss,(a.HOUR*3600)+(a.MINUTE*60)+a.SECOND,a.DAY-1))select [MS No Function] = datediff(ms,0,getdate()-@st), [Error Count] = @count godeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> dbo.DateTime1(a.YEAR,a.MONTH,a.DAY,a.HOUR,a.MINUTE,a.SECOND)select [MS DateTime1] = datediff(ms,0,getdate()-@st), [Error Count] = @count godeclare @count intdeclare @st datetimeselect @st = getdate()select @count = count(*)from T_DATE awhere a.[DATE] <> dbo.DateTime2(a.YEAR,a.MONTH,a.DAY,a.HOUR,a.MINUTE,a.SECOND)select [MS DateTime2] = datediff(ms,0,getdate()-@st), [Error Count] = @count go CODO ERGO SUM |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-02-08 : 07:44:32
|
Very useful post MVJ, thanks.Kristen |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-02-08 : 08:57:04
|
quote: Originally posted by Kristen Very useful post MVJ, thanks.Kristen
So, are you redoing all your systems to replace functions with inline code? CODO ERGO SUM |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-02-08 : 09:02:36
|
Well ... that part of your findings was NOT so helpful [:-(] |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2007-02-08 : 11:32:42
|
quote: Originally posted by Kristen Well ... that part of your findings was NOT so helpful [:-(]
Can’t you just assign that to one of your lackeys?CODO ERGO SUM |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-02-08 : 11:59:22
|
I'm now worried about the number of CPU cycles that will generate. So I rather feel I may have to fix the SQL before I use the Lackey Assignment Tool any more ... |
|
|
|
|
|
|
|