Returning Complex Data from User-Defined Functions with CROSS APPLY

By Jeff Smith on 11 June 2007 | Tags: Queries , User Defined Functions


SQL Server User-Defined Functions (UDFs) can return either a single value or virtual tables. However, sometimes we might like for a User-Defined Function to simply return more than 1 piece of information, but an entire table is more than what we need. For example, suppose we want a function that parses a single VARCHAR() containing a street address and returns:

  • Street Number
  • Street Name
  • Unit Number

In other words, it would accept "100 Main St #44" and return 3 distinct, separate values:

  • Street Number: "100"
  • Street Name: "Main St"
  • Unit Number: "#44"

As you can see, each value would have its own label and also potentially its own data type. Other examples would be a function that accepts an email address and returns the username and the domain separately, or a function that accepts a full name and parses it into separate First, Middle and Last values. In general, if a single function call can return multiple pieces of information all at once, it reduces the number of function calls you need to make, resulting in shorter code and allowing you to put more complex business logic into fewer functions.

Returning multiple values from a UDF can be done fairly easily in SQL Server, but we must return those values as a virtual table. The following very simple ParseEmail() function accepts an email address as an argument, parses it, and returns the username and domain separately:

create function EmailParse (@email varchar(1000))
returns @t table (UserName varchar(20), Domain varchar(20))
as
begin
	declare @i int

	select @i = charindex('@', @email,1);

	if (@i > 1)
		insert into @t values (left(@email,@i-1), substring(@email,@i+1,20))
	else
		insert into @t values (null,null)

	return 
end

Now, technically, the above UDF returns a table; we have just written it so that it will always return exactly 1 row. An entire table is more than we need, but since we want to return multiple values, a single-row table with multiple columns is really our only option.

Remember that because the return value is a table, it cannot be called like this:

select dbo.EmailParse('jeff@jeff.com')

Msg 4121, Level 16, State 1, Line 1
Cannot find either column "dbo" or the user-defined function 
or aggregate "dbo.EmailParse", or the name is ambiguous.

That syntax is not valid; even if it was, there is no way to reference the different columns in the table that are returned. In addition, even though we know our function will always return exactly one row, that is not always or usually the case for table-valued UDFs, so a more generic approach must be used. The proper way to is to select from the results of the function just like you would from any other table, like this:

select username, domain 
from   dbo.EmailParse('jeff@jeff.com')

username             domain
-------------------- --------------------
jeff                 jeff.com

(1 row(s) affected)

That works beautifully, and using this technique, we are able to return different pieces of information all from one user-defined function call. It appears to be just what we need!

However, there is a catch: what if we wish to call this user-defined function for each row in a table within a set-based SQL statement? For example, suppose we have a table of email addresses to parse, and we wish to parse them all at once in a single SELECT? Consider the following table:

create table emails (ID int primary key, EmailAddress varchar(100))

insert into emails
select 1,'jeff@jeff.com' union all
select 2,'yak@sqlteam.com' union all
select 3,'billg@microsoft.com'

We'd like to parse this data using our EmailParse() function to split each email address into username and domain for each row of the data. How would we write this? Well, if we try this:

select emails.id, dbo.EmailParse(emails.EmailAddress)
from   emails

We get:

Msg 4121, Level 16, State 1, Line 1
Cannot find either column "dbo" or the user-defined function 
or aggregate "dbo.EmailParse", or the name is ambiguous.

Which is to be expected; it’s the same error message we got previously, and looking at that statement, you can see it doesn’t really make logical sense if you consider that EmailParse() is table-valued function that returns multiple rows and columns

Since EmailParse() always returns a table with a single row, if there were a way to CROSS JOIN our emails table with calls to the function, it seems like that might do the trick. Perhaps something like this:

select 
	emails.ID, s.Username, s.Domain
from 
	emails
cross join 
	EmailParse(emails.EmailAddress) s 

That seems to make more logical sense, and the syntax looks like it allows us to reference the columns returned by the UDF just fine. Unfortunately, we get an error again:

Msg 4104, Level 16, State 1, Line 1
The multi-part identifier "emails.EmailAddress" could not be bound.

It might seem like we are stuck, but SQL Server 2005 introduces a featured called CROSS APPLY that basically allows us to call a table-valued user-defined over and over for each row in a SELECT. This is just what we were looking for, and we can use it to solve our problem:

select 
	emails.ID, s.Username, s.Domain
from 
	emails
cross apply 
EmailParse(emails.EmailAddress) s

ID          Username             Domain
----------- -------------------- --------------------
1           jeff                 jeff.com
2           yak                  sqlteam.com
3           billg                microsoft.com

(3 row(s) affected)

Voila! There it all is -- our function is called for each row in our data, and we can access the columns returned by the function to retrieve the username and the domain from the email address passed in. The above syntax basically says:

“Select each row from the emails table, call ParseEmail() on the EmailAddress column, and cross join that row with the rows returned from the function call.”

Because we always return exactly 1 row with multiple columns from our function, the CROSS APPLY doesn't affect the number of rows returned -- it just adds additional columns to the result, which is exactly what we need. This means that using CROSS APPLY in conjunction with table-valued UDFs that return exactly one row is the perfect way for to return complex multi-field values from a User-Defined Function!

In addition to parsing email addresses, we can return multiple date characteristics all at once, split strings into a fixed number of segments, parse schemes and domains and paths from URLs, return extra bit columns to indicate errors or other flags, and so on. I know I’ve come across situations in the past where I’ve thought that returning a few extra columns in a single-value UDF would be helpful, but I didn’t want to forgo the benefits of set-based processing so other, more complicated techniques were necessary. Using CROSS APPLY with functions that return a single row allows you to handle this situation perfectly, and write more powerful UDFs that result in fewer function calls and more efficient and more powerful functions.


Related Articles

Using Dynamic SQL in Stored Procedures (7 March 2011)

Joining to the Next Sequential Row (2 April 2008)

Writing Outer Joins in T-SQL (11 February 2008)

Aggregating Correlated Sub-Queries (23 October 2007)

How to Use GROUP BY with Distinct Aggregates and Derived tables (31 July 2007)

How to Use GROUP BY in SQL Server (30 July 2007)

Returning a week number for any given date and starting fiscal month (2 May 2007)

Common Table Expressions (15 August 2006)

Other Recent Forum Posts

I have a sql query which returns a sum of total amount by user and I am trying to add another column which uses the total and multiply it by a number (21h)

Troubleshooting Deadlocks in SQL Server (10d)

Last Login date and time (11d)

Negative effects of High VLF counts (11d)

Need to return a value that indicates that a record has been added, but not when a record is modified (12d)

Indexex on low cardinality fields (12d)

Error in stored procedure (13d)

Spam post flagging (13d)

- Advertisement -