
 Fastest method of parsing CSV string into table?


R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-01-31 : 08:36:07
I have stored procedures where I pass large CSV strings as parameter values and parse them into a table variable. Currently I use the function below, but can anyone please advise whether there is a faster method available?


CREATE FUNCTION [dbo].[myApp_CreateIdTable]
(
    @Ids nvarchar(max),
    @separator char(1)
)
RETURNS @tbl_Ids TABLE
(
    IdVal int NOT NULL
)
AS
BEGIN
    -- check @Ids string ends with separator character
    IF NOT RIGHT(@Ids, 1) = @separator
    BEGIN
        SET @Ids = @Ids + @separator
    END

    DECLARE @separator_position INT
    DECLARE @array_value nvarchar(10)

    WHILE PATINDEX('%' + @separator + '%', @Ids) <> 0
    BEGIN -- feed each integer value into the @tbl_Ids table variable
        SELECT @separator_position = PATINDEX('%' + @separator + '%', @Ids)
        SELECT @array_value = LEFT(@Ids, @separator_position - 1)
        INSERT INTO @tbl_Ids (IdVal) VALUES (@array_value)
        SELECT @Ids = STUFF(@Ids, 1, @separator_position, '')
    END
    RETURN
END
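
For reference, a typical call with an illustrative ID list looks like this:

SELECT t.IdVal
FROM dbo.myApp_CreateIdTable(N'5,6,7897,564', ',') AS t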

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2012-01-31 : 08:51:24
What is your definition of a "large CSV string"?
hmmm...
Maybe the optimization should happen in the use of the function and not in changing the function itself...?


No, you're never too old to Yak'n'Roll if you're too young to die.

khtan
In (Som, Ni, Yak)

17689 Posts

Posted - 2012-01-31 : 08:53:21
refer to http://www.sqlservercentral.com/articles/Tally+Table/72993/
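
The core idea in that article, sketched from scratch here for illustration (this is not Jeff Moden's exact code, just the shape of it, kept SQL 2005 compatible): generate a "tally" of numbers, then slice the string at every separator position in one set-based pass instead of looping.

DECLARE @Ids varchar(8000)
SET @Ids = '5,6,7897,564,487874,54,125,4'

;WITH E1(n) AS (SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1), -- 10 rows
E4(n) AS (SELECT 1 FROM E1 a, E1 b, E1 c, E1 d), -- 10,000 rows, enough for varchar(8000)
Tally(N) AS (SELECT TOP (LEN(@Ids)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4)
SELECT CAST(SUBSTRING(@Ids, N, CHARINDEX(',', @Ids + ',', N) - N) AS int) AS IdVal
FROM Tally
WHERE SUBSTRING(',' + @Ids, N, 1) = ',' -- a separator just before position N means an element starts at N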


KH
Time is always against us


R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-01-31 : 10:06:32
Hi

My CSV strings vary in size: maybe 50 characters at the lower end, right up to 50,000 characters, in both cases with variable lengths between commas (the values are always numeric IDs, e.g. '5,6,7897,564,487874,54,125,4')

I tried DelimitedSplit8K but it stops working on larger strings. Is there a way to configure it without adding more SELECT 1 UNION ALL commands?

webfred
Master Smack Fu Yak Hacker

8781 Posts

Posted - 2012-01-31 : 10:43:08
I can't believe my eyes...
If you are able to pass 50,000 characters as a CSV parameter to a stored procedure then you should be able to go another way.
Maybe you can fill a table with those values instead of passing them as a parameter?


No, you're never too old to Yak'n'Roll if you're too young to die.

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-01-31 : 10:53:58
Okay here's what happens. In an ASP.NET application I have a control that allows a user to select other users to report against. Some of our customers have only, say, 100 users, while others have 5000.
The chosen users are collected in ASP.NET as a CSV list of database ID values, and passed to SQL to use in the report.
So if the user selects 100 users, SQL gets a fairly short CSV string. If they select all the users, the string might well be 50,000 characters long depending on the length of each user's ID value.

If you know of a better way to do this I am extremely open to suggestions.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-01-31 : 11:23:16
If it's a matter of selecting all of 5000 IDs, and never 4000 or 3000, then pass 'ALL' as the parameter value and modify your function to return everything. I agree that Jeff Moden's technique is the best way to go, short of using a CLR UDF to parse the string.
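
(Illustratively, the branch in the calling procedure could look like this; the table and column names are hypothetical:)

IF @Ids = 'ALL'
    SELECT u.* FROM dbo.Users u WHERE u.CustomerID = @CustomerID
ELSE
    SELECT u.*
    FROM dbo.Users u
    INNER JOIN dbo.myApp_CreateIdTable(@Ids, ',') t ON t.IdVal = u.UserID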

Another method you may want to investigate is using fixed-length values instead of comma-separated ones. You can store int values in 4 bytes of binary and pass the entire set as a varbinary(max); the length of the value divided by 4 is the number of ints passed. Here's an example:
-- create and populate sample table
CREATE TABLE #a(i INT NOT NULL, alpha AS CHAR(64+i))
;WITH n(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM n WHERE n<100)
INSERT #a SELECT n FROM n

-- set up variable (parameter) and append ID values as binary(4)
DECLARE @param VARBINARY(MAX);
SET @param = 0x; -- inline DECLARE initialization needs SQL 2008; split into two statements for 2005
SET @param=@param + CAST(4 AS BINARY(4))
SET @param=@param + CAST(23 AS BINARY(4))
SET @param=@param + CAST(19 AS BINARY(4))
SET @param=@param + CAST(47 AS BINARY(4))
SET @param=@param + CAST(87 AS BINARY(4))

-- parse fixed-length binary values and convert to int, then join to table
;WITH n(n) AS (SELECT 1 UNION ALL SELECT n+4 FROM n WHERE n < DATALENGTH(@param)-4),
i(i) AS (SELECT CAST(SUBSTRING(@param, n, 4) AS INT) FROM n)
SELECT a.* FROM i
INNER JOIN #a a ON i.i = a.i
OPTION (MAXRECURSION 0) -- lift the default 100-level recursion cap for long parameter strings
The technique is practically identical to the one in Jeff's article and avoids having to search for commas. It's also more space-efficient, particularly if you were using GUIDs instead of ints (which was the reason I started using this technique).

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-01-31 : 11:59:54
Hi Rob

Thanks for that. I was having problems getting Jeff Moden's technique to work with lengthy strings. His example seems limited to varchar(8000) and my T-SQL isn't hot enough to expand on it (other than the guess I made above)...

Your method looks interesting, but surely having the application convert all the records to a byte array would outweigh the speed gains from the function you provided, right?

I will certainly modify my procedure to use the 'ALL' technique, but unfortunately I then end up heading down a catch-all query path, which I really wanted to avoid (as ALL would require a CustomerID lookup instead of a userID lookup, and I don't fancy duplicating this query for all combinations of parameter values...).

Unless of course there's a better way...

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-01-31 : 16:48:46
quote:
Your method looks interesting, but surely getting the application to convert all the records to a byte array would surely outweigh the speed gains from the function you provided, right??
Not really; CLR code to convert int to binary is compiled and will be very fast. You'll take a bigger performance hit by keeping this in a SQL UDF, especially a multi-statement table function, and especially if you're JOINing or CROSS APPLYing to it. That has to be interpreted by the query processor and won't optimize well, whereas a single-statement TVF or stored procedure can be optimized.
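
To make the contrast concrete, here is the shape of an inline (single-statement) TVF; treat it as a sketch, since it assumes a pre-built numbers table dbo.Tally(N) rather than being a drop-in function:

CREATE FUNCTION dbo.myApp_SplitIds_Inline (@Ids varchar(max))
RETURNS TABLE
AS
RETURN
(
    SELECT CAST(SUBSTRING(@Ids, N, CHARINDEX(',', @Ids + ',', N) - N) AS int) AS IdVal
    FROM dbo.Tally -- assumes an existing numbers table dbo.Tally(N)
    WHERE N <= LEN(@Ids)
      AND SUBSTRING(',' + @Ids, N, 1) = ','
)

Because there is no BEGIN...END body, the optimizer can expand this into the calling query instead of materializing a table variable first.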

You can avoid most or all of this by using a SQL CLR function to split strings. There's a great analysis of various methods here:

http://sqlblog.com/blogs/aaron_bertrand/archive/2010/07/07/splitting-a-list-of-integers-another-roundup.aspx

And Adam's CLR version here:

http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx
quote:
I will certainly modify my procedure to use the 'ALL' technique, but unfortunately I then end up heading down a catch-all query path, which I really wanted to avoid (as ALL would require a CustomerID lookup instead of userID lookup, and I don't fancy duplicating this query for all combinations of parameter values...).

Unless of course there's a better way...
If you've got a lot of parameters, dynamic SQL may be the answer. Check Gail's post:

http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/
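
(Very roughly, the pattern from that post looks like the following, inside a procedure whose parameters are @CustomerID and @StartDate; the parameter and column names are made up for illustration:)

DECLARE @sql nvarchar(max)
SET @sql = N'SELECT u.* FROM dbo.Users u WHERE 1 = 1'
IF @CustomerID IS NOT NULL
    SET @sql = @sql + N' AND u.CustomerID = @CustomerID'
IF @StartDate IS NOT NULL
    SET @sql = @sql + N' AND u.CreatedOn >= @StartDate'
EXEC sp_executesql @sql,
    N'@CustomerID int, @StartDate datetime',
    @CustomerID, @StartDate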

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-02-01 : 06:21:56
Hi Rob

Thank you very much for those links. I had read Gail's post a few times before, but the first two links are priceless.

I've never created a CLR function before. From my Googling this morning, it seems the compiled .NET code must be referenced inside a SQL stored procedure in order to run. Could it also be referenced inside a SQL function?

robvolk
Most Valuable Yak

15732 Posts

Posted - 2012-02-01 : 06:51:05
Sure, once the assembly is registered and the function/procedure/data type created, CLR objects work like any other database object.
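
For example (the assembly, class and method names below are placeholders, not a real deployed assembly):

CREATE ASSEMBLY SqlStringUtils
FROM 'C:\deploy\SqlStringUtils.dll'
WITH PERMISSION_SET = SAFE
GO

CREATE FUNCTION dbo.clr_SplitInts (@list nvarchar(max))
RETURNS TABLE (IdVal int)
AS EXTERNAL NAME SqlStringUtils.[StringUtils].SplitInts
GO

-- after which it can be queried like any other table-valued function:
SELECT IdVal FROM dbo.clr_SplitInts(N'5,6,7897')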

Kristen
Test

22859 Posts

Posted - 2012-02-01 : 12:34:51
quote:
Originally posted by R

His example seems limited to varchar(8000) and my T-SQL isn't hot enough to expand on it (other than the guess I made above)...


You should (probably!) be able to just change

varchar(8000)
to
varchar(max)

to get handling for unlimited-size strings.

There is a skilful discussion of the various methods here:

http://www.sommarskog.se/arrays-in-sql-2005.html

and also table-valued parameters, which you may very well be able to use easily from ASP.NET (in essence passing a ready-formed database "table" instead of a CSV "list")
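
(For reference, since table-valued parameters need SQL 2008 or later, here is a sketch of what they look like; the names are illustrative:)

CREATE TYPE dbo.IdList AS TABLE (IdVal int NOT NULL PRIMARY KEY)
GO

CREATE PROCEDURE dbo.myApp_Report
    @Ids dbo.IdList READONLY
AS
SELECT u.*
FROM dbo.Users u
INNER JOIN @Ids i ON i.IdVal = u.UserID
GO

ADO.NET can then pass a DataTable (or IEnumerable<SqlDataRecord>) straight into @Ids, with no string parsing needed.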

I have several SPLIT functions that I use: different ones depending on whether the function returns INTs or strings, and on whether the input string is < 8,000 or > 8,000 characters.

The performance is vastly different, and I use different algorithms for Small vs. Large string splitting. Make sure you time yours on both Small and Large string tests; we didn't realise how disastrously our original code performed on large strings until we fell over the problem by accident!

R
Constraint Violating Yak Guru

328 Posts

Posted - 2012-02-01 : 12:52:57
Hi Kristen

Thanks for your valued input. Unfortunately our live machine is SQL 2005, so Table-Valued Parameters won't work on that version. Safe to say I'll be giving them my full attention when we upgrade, as that would make life so much easier.

I've been researching the CLR methods in depth today. Although I've never actually used or created one, it does seem that these far outperform any purely T-SQL function or approach. So... I'm currently figuring out how to get my hacked .NET CLR function to be available in SSMS... :-)
   
