Parsing CSV Values Into Multiple Rows

By Rob Volk on 19 February 2001 | Tags: SELECT


There have been quite a few requests on how to take a string of comma-separated values (CSV), and parse out the values individually. graz's article uses a stored procedure to extract each value into a variable. However, if you have a table with a column of CSV values, and you want to translate them into a normalized table, this technique becomes pretty hairy. You would need to use a cursor to loop through your CSV table and process each row. (A loop within another loop...I'm getting queasy)

Fortunately there's a way to do this using just a regular SELECT. It requires a table of numeric values, but it does the job beautifully; you can INSERT normalized CSV values into a table with one SQL statement!

The tricky part is that each word can be a different length, and each CSV can have a different number of words in it. We'll need a separate table (Tally), with a single column of sequential integer values. The "tally table" is mentioned in Joe Celko's SQL for Smarties -- the best SQL book ever written -- as a handy source for a numeric sequence. Here, it will replace the loop needed to parse out each value from the CSV string. (I'll use an upper limit of 8,000; this is the maximum size of a SQL Server varchar value)

Here's the table of CSV values I want to normalize (Quotes): I want it to look like this (OnlyWords): (Tally)
AuthorPhrase
ShakespeareA,rose,by,any,other,name,smells,just,as,sweet
KiplingAcross,the,valley,of,death,rode,the,six,hundred
ColeridgeIn,Xanadu,did,Kubla,Khan,...,,,,damn,I,forgot,the,rest,of,it
DescartesI,think,therefore,I,am
VolkI,think,therefore,I,need,another,beer
FeldmanNo,it's,pronounced,I,gor
SimpsonMmmmmm,donuts
FuddBe,vewwy,vewwy,quiet,I'm,hunting,wabbits
AuthorWord
ShakespeareA
Shakespearerose
Shakespeareby
Shakespeareany
Shakespeareother
Shakespearename
Shakespearesmells
...etc.
ID
1
2
3
4
5
6
...
8,000

Here's the SQL statement that will chop up the Phrase column into each Word (and include the Author) and place them in separate rows:

SELECT Author, 
NullIf(SubString(',' + Phrase + ',' , ID , CharIndex(',' , ',' + Phrase + ',' , ID) - ID) , '') AS Word 
FROM Tally, Quotes 
WHERE ID <= Len(',' + Phrase + ',') AND SubString(',' + Phrase + ',' , ID - 1, 1) = ',' 
AND CharIndex(',' , ',' + Phrase + ',' , ID) - ID > 0

The best way to explain how this works is to use a single row from Quotes. The Shakespeare row contains the Phrase A,rose,by,any,other,name,smells,just,as,sweet. We'll use the SQL function SubString() to extract part of the Phrase. We'll begin with the first word, starting from the 1st character, and extending to the first comma in the string. To get A, the following values would work:

SELECT SubString('A,rose,by,any,other,name,smells,just,as,sweet', 1, 2) AS Word

However, I don't know the position of the first (or any) comma. The CharIndex() function will tell me:

SELECT SubString('A,rose,by,any,other,name,smells,just,as,sweet', 1, 
CharIndex(',' , 'A,rose,by,any,other,name,smells,just,as,sweet', 1 ) - 1) AS Word

I subtract 1 from the CharIndex() value to eliminate the comma from the end (A instead of A,).

Here's where the Tally table values come into play. Instead of using a 1, I can substitute ID from Tally. This will let me extract each character, up to the comma:

SELECT SubString('A,rose,by,any,other,name,smells,just,as,sweet', ID, 
CharIndex(',' , 'A,rose,by,any,other,name,smells,just,as,sweet', ID ) - ID) AS Word 
FROM Tally

A few problems come up. The last word of the phrase will not be parsed, because it appears after the last comma in the string. We'll get an error with ID numbers that are greater than the length of the string. For each ID value (3,4,5,6) this formula will read from that character position up to the next comma, resulting in 4 rows for rose, ose, se, e, when all we want is one row for the second word, rose.

To get around these issues, we'll concatenate a comma to the beginning and end of the string. We'll use a WHERE clause to restrict ID values to less than or equal to the length of the string. For each word we extract, we will check the character immediately to the left of that word. If it is a comma, we know that we have started a new word; if not, it is a word fragment, and the WHERE clause will discard it:

SELECT SubString(',' + Phrase + ',' , ID , 
CharIndex(',' , ',' + Phrase + ',' , ID) - ID) AS Word 
FROM Tally, Quotes 
WHERE ID <= Len(',' + Phrase + ',') 
AND SubString(',' + Phrase + ',' , ID - 1, 1) = ','

Almost done! Suppose there are missing words (consecutive comma delimiters), but I want to include a row for them -- Coleridge is missing a few :) I'll use NullIf() to replace empty strings ('') with NULL. I'll add a condition to the WHERE clause to prevent NULL rows in the results; remove this condition if you want NULL rows included:

SELECT Author, 
NullIf(SubString(',' + Phrase + ',' , ID , CharIndex(',' , ',' + Phrase + ',' , ID) - ID) , '') AS Word 
FROM Tally, Quotes 
WHERE ID <= Len(',' + Phrase + ',') AND SubString(',' + Phrase + ',' , ID - 1, 1) = ',' 
AND CharIndex(',' , ',' + Phrase + ',' , ID) - ID > 0  --remove this line to keep NULL rows

Finally, I use this SELECT as part of my INSERT statement to put them into the OnlyWords table (I removed the last WHERE condition to include the missing words from the Coleridge row):

INSERT INTO OnlyWords SELECT Author, 
NullIf(SubString(',' + Phrase + ',' , ID , CharIndex(',' , ',' + Phrase + ',' , ID) - ID) , '') AS Word 
FROM Tally, Quotes 
WHERE ID <= Len(',' + Phrase + ',') AND SubString(',' + Phrase + ',' , ID - 1, 1) = ','

You can wrap this up in a stored procedure that accepts any delimiter character (like graz's code) and parses using that delimiter. If you combine it with a little Dynamic SQL, you can pass a delimiter, source table(s), source column(s), and destination table/column, and use this for any table in your database! You could also modify it to count words in a string, or as an alternate way to extract complete words from a string.

This link will list all of the CSV related articles on this site, including one by Garth on how to combine rows into one CSV value.

-robvolk

P.S. - Hopefully someone may apply the Tally table solution to the question that got me hooked on SQLTeam.com!


Related Articles

Joining to the Next Sequential Row (2 April 2008)

Writing Outer Joins in T-SQL (11 February 2008)

How to Use GROUP BY with Distinct Aggregates and Derived tables (31 July 2007)

How to Use GROUP BY in SQL Server (30 July 2007)

SQL Server 2005: Using OVER() with Aggregate Functions (21 May 2007)

Server Side Paging using SQL Server 2005 (4 January 2007)

Using XQuery, New Large DataTypes, and More (9 May 2006)

Counting Parents and Children with Count Distinct (10 January 2006)

Other Recent Forum Posts

Using the CASE Clause with a date range for just Month and DAy (18h)

SQL problem to be solved (7d)

Align column in SQL Query output (8d)

I have a sql query which returns a sum of total amount by user and I am trying to add another column which uses the total and multiply it by a number (9d)

Troubleshooting Deadlocks in SQL Server (18d)

Last Login date and time (20d)

Negative effects of High VLF counts (20d)

Need to return a value that indicates that a record has been added, but not when a record is modified (21d)

- Advertisement -