Intro to User Defined Functions (Updated)

By Garth Wells on 8 January 2001 | Tags: Application Design , User Defined Functions , Functions

Garth Wells sends us this article on User Defined Functions. It's taken mainly from his book, Code-Centric: T-SQL Programming with Stored Procedures and Triggers , about programming Microsoft SQL Server. He also has a web site for the book with sample chapter downloads. Thanks for the article Garth!

Introduction

The ability to create a user-defined function (UDF) is a new feature added to SQL Server 2000. Developers have been asking Microsoft to add this feature for several versions of the product, so let's take a quick look at how to create a few simple UDFs to see if they can help you in your programming endeavors.

Creating a Simple UDF

A user-defined function is a database object that encapsulates one or more Transact-SQL statements for reuse. This definition is similar to the one for stored procedures, but there are many important differences between user-defined functions and stored procedures—the most pronounced being what types of data they can return. Let’s create one so you can see how easy they are to create and reference.

The following statement shows how to create a function that accepts two input parameters, sums them together and then returns the sum to the calling statement.

CREATE FUNCTION fx_SumTwoValues
( @Val1 int, @Val2 int )
RETURNS int
AS
BEGIN
  RETURN (@Val1+@Val2)
END

The structure of the CREATE FUNCTION statement is fairly straightforward. You provide an object name (fx_SumTwoValues), input parameters (@Val1 and @Val2), the type of data the function will return () and the statement(s) the function executes are located between the BEGIN…END block. The following SELECT statement calls the function. Note that the two-part name (owner.object_name) is required when calling this function.

SELECT dbo.fx_SumTwoValues(1,2) AS SumOfTwoValues

SumOfTwoValues
--------------
3

When the SELECT is executed, the input parameters 1 and 2 are added together and the sum 3 is returned. You can use any values that either are, or can be, implicitly converted to an int data type for the input parameters. Keep in mind, though, that only an int can be returned, so the following statement will not produce the desired results.

SELECT dbo.fx_SumTwoValues(1.98,2.78) AS SumOfTwoValues

SumOfTwoValues
--------------
3

The function returns a 3, which indicates the decimal portion of the parameters are truncated before the calculation occurs.

SQL Server’s ability to implicitly convert data allows the following to execute successfully.

SELECT dbo.fx_SumTwoValues('7','7') AS SumOfTwoValues

SumOfTwoValues
--------------
14

When values that cannot be converted to an int are passed to the function, the following error message is generated.

SELECT dbo.fx_SumTwoValues('Y','7') AS SumOfTwoValues

Server:Msg 245,Level 16,State 1,Line 1
Syntax error converting the varchar value 'Y'to a column of data type int.

Three Types of User-Defined Functions

Now that you have seen how easy it is to create and implement a simple function, let’s cover the three different types of user-defined functions and some of the nuances of how they are implemented.

Scalar Functions

A scalar function returns a single value of the data type referenced in the RETURNS clause of the CREATE FUNCTION statement. The returned data can be of any type except text, ntext, image, cursor, or timestamp.

The example we covered in the previous section is a scalar function. Although the previous example only contained one statement in the BEGIN…END block, a scalar function can contain an unlimited number of statements as long as only one value is returned. The following example uses a WHILE construct to demonstrate this.

CREATE FUNCTION fx_SumTwoValues2
( @Val1 int, @Val2 int )
RETURNS int
AS
BEGIN
 WHILE @Val1 <100
  BEGIN
   SET @Val1 =@Val1 +1
  END
  RETURN (@Val1+@Val2)
END
go

SELECT dbo.fx_SumTwoValues2(1,7) AS SumOfTwoValues

SumOfTwoValues
--------------
107

The @Val1 input parameter is set to 1 when the function is called, but the WHILE increments the parameter to 100 before the RETURN statement is executed. Note that the two-part name (owner.object_name) is used to call the function. Scalar functions require that their two-part names be used when they are called. As you will see in the next two sections, this is not the case with the other two types of functions.

Inline Table-Valued Functions

An inline table-valued function returns a variable of data type table whose value is derived from a single SELECT statement. Since the return value is derived from the SELECT statement, there is no BEGIN/END block needed in the CREATE FUNCTION statement. There is also no need to specify the table variable name (or column definitions for the table variable) because the structure of the returned value is generated from the columns that compose the SELECT statement. Because the results are a function of the columns referenced in the SELECT, no duplicate column names are allowed and all derived columns must have an associated alias.

The following uses the Customer table in the Northwind database to show how an inline table-valued function is implemented.

USE Northwind
go
CREATE FUNCTION fx_Customers_ByCity
( @City nvarchar(15) )
RETURNS table
AS
RETURN (
        SELECT CompanyName
        FROM Customers
        WHERE City =@City
		 )
go
SELECT * FROM fx_Customers_ByCity('London')

CompanyName
----------------------------------------
Around the Horn
. . . 
Seven Seas Imports

Multi-Statement Table-Valued Functions

The multi-statement table-valued function is slightly more complicated than the other two types of functions because it uses multiple statements to build the table that is returned to the calling statement. Unlike the inline table-valued function, a table variable must be explicitly declared and defined. The following example shows how to implement a multi-statement table-valued function that populates and returns a table variable.

USE Northwind
go
CREATE FUNCTION fx_OrdersByDateRangeAndCount
( @OrderDateStart smalldatetime, 
  @OrderDateEnd smalldatetime, 
  @OrderCount smallint )
RETURNS @OrdersByDateRange TABLE
  (  CustomerID nchar(5),
     CompanyName nvarchar(40),
     OrderCount smallint,
     Ranking char(1) )
AS
BEGIN
 --Statement 1
 INSERT @OrdersByDateRange
 SELECT a.CustomerID,
        a.CompanyName,
        COUNT(a.CustomerID)AS OrderCount,
       'B'
 FROM Customers a
 JOIN Orders b ON a.CustomerID =b.CustomerID
 WHERE OrderDate BETWEEN @OrderDateStart AND @OrderDateEnd
 GROUP BY a.CustomerID,a.CompanyName
 HAVING COUNT(a.CustomerID)>@OrderCount

--Statement 2
UPDATE @OrdersByDateRange
SET Ranking ='A'
WHERE CustomerID IN (SELECT TOP 5 WITH TIES CustomerID
                     FROM (SELECT a.CustomerID,
                                  COUNT(a.CustomerID)AS OrderTotal
                           FROM Customers a
                           JOIN Orders b ON a.CustomerID =b.CustomerID
                           GROUP BY a.CustomerID) AS DerivedTable
                           ORDER BY OrderTotal DESC)
RETURN
END

The main difference between this example and the one in the previous section is that we were required to specify the structure of the @OrdersByDateRange table variable used to hold the resultset and list @OrdersByDateRange in the RETURNS clause. As you can see from the input parameter list, the function accepts a start date, an end date and an order count value to filter the resultset.

The first statement (--Statement 1) uses the input parameters to populate the table variable with customers who meet the specified criteria. The second statement (-Statement 2) updates the rows in table variable to identify the top five overall order placers. The IN portion of the UPDATE may seem a little confusing at first glance, but all its doing is using a derived table to select the CustomerID values of the top five order producers. Derived tables are discussed in Chapter 4. You can use the following to find the companies who have submitted more than two orders between 1/1/96 and 1/1/97.

SELECT *
FROM fx_OrdersByDateRangeAndCount ('1/1/96','1/1/97',2)
ORDER By Ranking

CustomerID  CompanyName                      OrderCount   Ranking
----------  ------------------------------   ----------   -------
ERNSH       Ernst Handel                      6           A
FOLKO       Folk och fä HB                    3           A
HUNGO       Hungry Owl All-Night Grocers      5           A
QUICK       QUICK-Stop                        6           A
SAVEA       Save-a-lot Markets                3           A
SEVES       Seven Seas Imports                3           B
SPLIR       Split Rail Beer &Ale              5           B
...

The rows ranking values of ‘A’ indicate the top five order placers of all companies. The function allows you to perform two operations with one object. Retrieve the companies who have placed more than two orders between 1/1/96 and 1/1/97 and let me know if any of these companies are my top five order producers.

One of the advantages of using this type of function over a view is that the body of the function can contain multiple SQL statements to populate the table variable, whereas a view is composed of only one statement. The advantage of using multi-statement table-valued function versus a stored procedure is that the function can be referenced in the FROM clause of a SELECT statement while a stored procedure cannot. Had a stored procedure been used to return the same data, the resultset could only be accessed with the EXECUTE command.

A Real-World Example

Now that you have an idea of the different types of functions available in SQL Server 2000, let's wrap up this article with an example you might be able to use on one of your projects. The following statements create a function that determines the last day of the month (LDOM) for a given date parameter.

CREATE FUNCTION fx_LDOM
(  @Date varchar(20) )
RETURNS datetime
AS
BEGIN
 --ensure valid date
 IF ISDATE(@Date) = 1
  BEGIN
   --determine first day of month
   SET @Date = DATEADD(day,-DAY(@Date)+1,@Date)
   --determine last day of month
   SET @Date = DATEADD(day,-1,DATEADD(month,1,@Date))
  END
 ELSE
  SET @Date = '1/1/80'

  RETURN @Date
END

The function's parameter (@Date) is defined as varchar(20), so error-checking code can be implemented. The ISDATE function makes sure the supplied value is a valid date. When an invalid date value is supplied, the function returns '1/1/80' to the calling statement. If you do not use this type of error-checking, the call to the function will fail when an invalid date is supplied.

When a valid date value is supplied, the DATEADD function is used to:

Determine the first day of the month, and
Determine the last day of the month.

If you have never used DATEADD before this may seem a little confusing, but a quick explanation should eliminate any that might exist. You use DATEADD to add or substract a date/time unit from a given date. The first parameter (in this case 'day') indicates the portion of the date that should be incremented. You can also specify year, quarter, month...millisecond. The second parameter is the number of units to add or substract. When subtracting, you simply make the value negative as shown in the example. The third parameter is the date value on which the calculation is performed.

The first day of the month is determined by calculating the number of elapsed days in the supplied parameter with the DAY function, adding 1 and then substracting it from the parameter. For an @Date value of 1/15/01, it simply subtracts 14 (15-1) days to get 1/1/01.

The last day of the month is determined by adding 1 month to the current month value and subtracting one day. So, 1/1/01 plus 1 month is equal to 2/1/01 and when you substract one day you get: 1/31/01.

The following shows how fx_LDOM is used in a SELECT statement to calculate the number of days remaining in a month.

CREATE TABLE fx_Testing (DateValue datetime)
go
INSERT fx_Testing VALUES ('1/1/01')
INSERT fx_Testing VALUES ('2/15/01')
INSERT fx_Testing VALUES ('2/15/02')
INSERT fx_Testing VALUES ('2/15/03')
INSERT fx_Testing VALUES ('2/15/04')

SELECT DateValue,
       dbo.fx_LDOM(DateValue) AS LDOM,
       DATEDIFF(day,DateValue,dbo.fx_LDOM(DateValue)) AS DaysLeftInMonth
FROM fx_Testing

DateValue                 LDOM                    DaysLeftInMonth
------------------------  ----------------------- ---------------
2001-01-01 00:00:00.000   2001-01-31 00:00:00.000  30
2001-02-15 00:00:00.000   2001-02-28 00:00:00.000  13
2002-02-15 00:00:00.000   2002-02-28 00:00:00.000  13
2003-02-15 00:00:00.000   2003-02-28 00:00:00.000  13
2004-02-15 00:00:00.000   2004-02-29 00:00:00.000  14

The DATEDIFF function is used to determine the difference between two dates. In this case, the number of days between the value in the DateValue column and the last day of the month calculated by fx_LDOM.