Returning Complex Data from User-Defined Functions with CROSS APPLYBy Jeff Smith on 11 June 2007 | Tags: Queries , User Defined Functions SQL Server User-Defined Functions (UDFs) can return either a single value or virtual tables. However, sometimes we might like for a User-Defined Function to simply return more than 1 piece of information, but an entire table is more than what we need. For example, suppose we want a function that parses a single VARCHAR() containing a street address and returns:
In other words, it would accept "100 Main St #44" and return 3 distinct, separate values:
As you can see, each value would have its own label and also potentially its own data type. Other examples would be a function that accepts an email address and returns the username and the domain separately, or a function that accepts a full name and parses it into separate First, Middle and Last values. In general, if a single function call can return multiple pieces of information all at once, it reduces the number of function calls you need to make, resulting in shorter code and allowing you to put more complex business logic into fewer functions. Returning multiple values from a UDF can be done fairly easily in SQL Server, but we must return those values as a virtual table. The following very simple ParseEmail() function accepts an email address as an argument, parses it, and returns the username and domain separately: create function EmailParse (@email varchar(1000)) returns @t table (UserName varchar(20), Domain varchar(20)) as begin declare @i int select @i = charindex('@', @email,1); if (@i > 1) insert into @t values (left(@email,@i-1), substring(@email,@i+1,20)) else insert into @t values (null,null) return end Now, technically, the above UDF returns a table; we have just written it so that it will always return exactly 1 row. An entire table is more than we need, but since we want to return multiple values, a single-row table with multiple columns is really our only option. Remember that because the return value is a table, it cannot be called like this: select dbo.EmailParse('jeff@jeff.com') Msg 4121, Level 16, State 1, Line 1 Cannot find either column "dbo" or the user-defined function or aggregate "dbo.EmailParse", or the name is ambiguous. That syntax is not valid; even if it was, there is no way to reference the different columns in the table that are returned. In addition, even though we know our function will always return exactly one row, that is not always or usually the case for table-valued UDFs, so a more generic approach must be used. The proper way to is to select from the results of the function just like you would from any other table, like this: select username, domain from dbo.EmailParse('jeff@jeff.com') username domain -------------------- -------------------- jeff jeff.com (1 row(s) affected) That works beautifully, and using this technique, we are able to return different pieces of information all from one user-defined function call. It appears to be just what we need! However, there is a catch: what if we wish to call this user-defined function for each row in a table within a set-based SQL statement? For example, suppose we have a table of email addresses to parse, and we wish to parse them all at once in a single SELECT? Consider the following table: create table emails (ID int primary key, EmailAddress varchar(100)) insert into emails select 1,'jeff@jeff.com' union all select 2,'yak@sqlteam.com' union all select 3,'billg@microsoft.com' We'd like to parse this data using our EmailParse() function to split each email address into username and domain for each row of the data. How would we write this? Well, if we try this: select emails.id, dbo.EmailParse(emails.EmailAddress) from emails We get: Msg 4121, Level 16, State 1, Line 1 Cannot find either column "dbo" or the user-defined function or aggregate "dbo.EmailParse", or the name is ambiguous. Which is to be expected; it’s the same error message we got previously, and looking at that statement, you can see it doesn’t really make logical sense if you consider that EmailParse() is table-valued function that returns multiple rows and columns Since EmailParse() always returns a table with a single row, if there were a way to CROSS JOIN our emails table with calls to the function, it seems like that might do the trick. Perhaps something like this: select emails.ID, s.Username, s.Domain from emails cross join EmailParse(emails.EmailAddress) s That seems to make more logical sense, and the syntax looks like it allows us to reference the columns returned by the UDF just fine. Unfortunately, we get an error again: Msg 4104, Level 16, State 1, Line 1 The multi-part identifier "emails.EmailAddress" could not be bound. It might seem like we are stuck, but SQL Server 2005 introduces a featured called CROSS APPLY that basically allows us to call a table-valued user-defined over and over for each row in a SELECT. This is just what we were looking for, and we can use it to solve our problem: select emails.ID, s.Username, s.Domain from emails cross apply EmailParse(emails.EmailAddress) s ID Username Domain ----------- -------------------- -------------------- 1 jeff jeff.com 2 yak sqlteam.com 3 billg microsoft.com (3 row(s) affected) Voila! There it all is -- our function is called for each row in our data, and we can access the columns returned by the function to retrieve the username and the domain from the email address passed in. The above syntax basically says: “Select each row from the emails table, call ParseEmail() on the EmailAddress column, and cross join that row with the rows returned from the function call.” Because we always return exactly 1 row with multiple columns from our function, the CROSS APPLY doesn't affect the number of rows returned -- it just adds additional columns to the result, which is exactly what we need. This means that using CROSS APPLY in conjunction with table-valued UDFs that return exactly one row is the perfect way for to return complex multi-field values from a User-Defined Function! In addition to parsing email addresses, we can return multiple date characteristics all at once, split strings into a fixed number of segments, parse schemes and domains and paths from URLs, return extra bit columns to indicate errors or other flags, and so on. I know I’ve come across situations in the past where I’ve thought that returning a few extra columns in a single-value UDF would be helpful, but I didn’t want to forgo the benefits of set-based processing so other, more complicated techniques were necessary. Using CROSS APPLY with functions that return a single row allows you to handle this situation perfectly, and write more powerful UDFs that result in fewer function calls and more efficient and more powerful functions.
|
- Advertisement - |