Author | Topic
AskSQLTeam
Ask SQLTeam Question
0 Posts |
Posted - 2001-02-14 : 20:59:55
There have been quite a few requests on how to take a string of comma-separated values (CSV) and parse out the values individually. graz's article uses a stored procedure to extract each value into a variable. However, if you have a table with a column of CSV values, and you want to translate them into a normalized table, this technique becomes pretty hairy. You would need to use a cursor to loop through your CSV table and process each row. (A loop within another loop ... I'm getting queasy.) Fortunately there's a way to do this using just a regular SELECT. It requires a table of numeric values, but it does the job beautifully; you can INSERT normalized CSV values into a table with one SQL statement! Article Link.
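For readers who want the shape of the trick outside T-SQL, here is a rough Python sketch of the same set-based idea (the function name and row layout are invented for illustration; the article itself drives this with a join against a table of numbers):

```python
def parse_csv_rows(rows, sep=","):
    """Sketch of the Numbers-table trick: pad each CSV string with
    separators, then treat every character position n as a candidate
    start -- a value begins at n exactly when the previous character
    is a separator.  rows is a list of (id, csv_string) pairs."""
    out = []
    for row_id, csv_string in rows:
        padded = sep + csv_string + sep
        for n in range(1, len(padded)):      # n plays the role of the numbers table
            if padded[n - 1] == sep:         # a value starts right after a separator
                end = padded.index(sep, n)   # the separator that closes this value
                if end > n:                  # skip empty elements (NULLIF in the SQL)
                    out.append((row_id, padded[n:end]))
    return out

print(parse_csv_rows([(1, "Toronto,Ontario,Canada")]))
```

In the SQL version the `for n in range(...)` loop is replaced by a join against the numbers table, which is what lets one SELECT normalize every row at once.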
AjarnMark
SQL Slashing Gunting Master
3246 Posts |
Posted - 2002-08-19 : 20:24:26
Rob, thanks for a GREAT article! I have referred several people here in the past, but it wasn't until just today that I needed to use the technique myself. In fact, I wrapped some other code around it such that I was able to actually process two correlated CSV arrays and insert their values into a table such that the pairings were together in their own rows. Worked like a charm! Thanks again!
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-09-18 : 14:50:44
Hey Rob, I have spent the majority of the morning working on adapting this to a situation where the separator is variable. I'm stuck ...

declare @csv varchar(100), @sep varchar(5)
select @csv = 'this, , is, a, test', @sep = ', '

select nullif(substring(@sep+@csv+@sep, n+len(@sep), charindex(@sep, @sep+@csv+@sep, n)-n-len(@sep)), '') as word
from numbers
where n <= len(@sep+@csv+@sep)
  and substring(@sep+@csv+@sep, n-1, 1) = @sep
  and charindex(@sep, @sep+@csv+@sep, n)-n-len(@sep) > 0

This works great, until I change the separator!! Any advice?

Jay White
{0}
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-09-18 : 15:14:17
Does this fix it?

declare @csv varchar(100), @sep varchar(5)
select @csv = 'this, , is, a, test', @sep = ', '

select nullif(substring(@sep+@csv+@sep, n+len(@sep), charindex(@sep, @sep+@csv+@sep, n)-n-len(@sep)), '') as word
from numbers
where n <= len(@sep+@csv+@sep)
  and substring(@sep+@csv+@sep, n-1, len(@sep)) = @sep
  and charindex(@sep, @sep+@csv+@sep, n)-n-len(@sep) > 0
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-09-18 : 15:27:10
quote: LEN (T-SQL)
Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.

...so...

declare @csv varchar(100), @sep varchar(5)
select @csv = 'thishellhellishellahelltest', @sep = 'hell'

select nullif(substring(@sep+@csv+@sep, n, charindex(@sep, @sep+@csv+@sep, n)-n), '') as word
from numbers
where n <= len(@sep+@csv+@sep)
  and substring(@sep+@csv+@sep, n-datalength(@sep), datalength(@sep)) = @sep
  and charindex(@sep, @sep+@csv+@sep, n)-n > 0

I think the only thing that needs to change (from the original article) is the WHERE filter that picks the n. The line that changes is in red above. Pardon me, while I go whittle my keyboard into the shape of a spike and fall on it ... datalength ... eff me

Jay White
{0}
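The datalength-based fix boils down to comparing a separator-width window at each candidate position. A rough Python sketch of that windowed test (the helper name is invented; the comment notes why DATALENGTH, not LEN, is needed in the T-SQL version):

```python
def split_multi(csv_string, sep):
    """Split on a multi-character separator using the position-scan
    approach from the thread: a value starts at position n when the
    sep-width window ending just before n equals the separator."""
    padded = sep + csv_string + sep
    w = len(sep)          # in T-SQL this must be DATALENGTH(@sep): LEN()
                          # strips trailing blanks, so LEN(', ') is 1
    words = []
    for n in range(w, len(padded)):
        if padded[n - w:n] == sep:        # full separator-width comparison
            end = padded.index(sep, n)    # separator closing this value
            if end > n:                   # skip empty elements
                words.append(padded[n:end])
    return words
```

The same scan handles both the comma-space and the 'hell' separators from the posts above.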
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-09-18 : 15:42:01
Oooooooooooooops, yeah, that sounds like that might've been the problem.
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-09-20 : 12:54:13
I did some testing between Rob's method in this article and Graz's sp_parsearray proc, and got some interesting results that I thought I might share ...

First off, I created a table of random-length (up to 10 characters) strings. I then wrote a proc that uses this COALESCE method for building a CSV string out of a variable number of random records from my strings table. I then converted my code snippet above (adapted from Rob's article) and Graz's proc into UDFs for the testing.

For the testing, I ran each function in a while loop while @i <= 800 and created the CSV out of @i random elements from strings. I then passed that CSV to my two parser functions and captured execution-time metrics (simple set @t = getdate() before the execution and captured datediff(ms, @t, getdate()) after it) ... I ran through this loop several times to get a decent population.

First things first. On average, Rob's method is a) a good bit faster and b) more consistent ...

method | Avg(ms/element) | Stdev
----------------------------------
Rob    | 0.87404         | 0.92330
Graz   | 2.05991         | 1.98419

One thing to note, though. The couple of outliers in the testing of Rob's method were really out there. For example, when testing a CSV of 684 random elements (of random length), Graz's method scored {470, 480, 490, 503, 523} and Rob's scored {200, 210, 220, 220, 1240}. There are a handful of these cases (not enough to really affect STDEV, but interesting nonetheless) ...

All tests run on a Dell laptop, XP Pro, SQL 2K SP2, 1.7GHz P4 . . . Scripts available on request ...

Jay White
{0}
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-09-20 : 13:29:17
If I had to guess, I'd say that the sequence table might cache differently from one execution to the next. Some of the CSVs you're parsing might require an extra data page or two to be retrieved ... especially if that table spans more than one extent. You did clear the data cache between each run, right???? (DBCC DROPCLEANBUFFERS)

graz's code has a huge benefit because it works with scalar, singular values and doesn't require any I/O or caching beyond whatever the optimizer might do in creating work tables, and that's highly unlikely. There could also be subtle differences relating to PatIndex (graz) and CharIndex (mine) performance. Have you tried clearing the procedure cache between each run too?
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-09-20 : 13:51:20
The sequence table used is a real table, and not a temp table/table variable generated on each run ... I'm running it again (just 3 iterations this time), clearing both the data and proc cache before each call to each function. I don't think it will change the results, but I'll post back in about 30 minutes ...

Jay White
{0}
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-09-20 : 14:10:13
quote: The sequence table used is a real table, and not a temp table/table variable generated on each run ...

Wouldn't matter; it would depend entirely on how the table is physically stored. Especially if it spanned multiple extents, a set of values needed to process one run may exist on one page/extent, and another run may require rows on another page/extent not previously read into cache, and would need some extra I/O to retrieve it.

Again, that's just a guess, but it's the kind of thing that can make the same code and the same data run differently on different machines.
Page47
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2002-09-20 : 15:01:00
Rob, you were right ...

With clearing data and proc buffers between each run ...

method | Avg(ms/element) | Stdev
----------------------------------
Rob    | 2.47138         | 12.9338
Graz   | 1.45296         | 7.08893

I am curious as to how and why Graz's method improved ...

Jay White
{0}
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2002-09-20 : 15:21:34
The earlier, better results for my method benefited from data caching. Possibly, graz's code was hampered earlier by an old(er) execution plan kept in cache; recompiling each time might've helped, but again I think data caching played the biggest role here. And yes, the conventional wisdom that "recompiles are always bad" isn't always true. And if you created a new set of data, physical storage differences could completely skew the results compared to the earlier run.

Don't you just HATE how that physical implementation shit keeps f---ing up a perfectly good theory? BTW, you have reached the segment of the performance evaluation process I lovingly call the WTF??? Stage. I have JOYOUS (not!) memories of running the same friggin' code HUNDREDS of times and getting wildly different results each time. The only solution I found to be reliable was DBCC DROPCLEANBUFFERS and FREEPROCCACHE and FLUSHPROCINDB before each run. Since all of these were undocumented until I got Ken Henderson's The Guru's Guide to Transact-SQL, you can understand a little better why I exalt Ken above all other SQL Server authors. Without him I'd be an even bigger babbling drooling idiot than I am today (shut up Femenella!)
nguyen
Starting Member
8 Posts |
Posted - 2003-08-26 : 11:54:36
I tinkered with the code and used derived tables to come up with a slightly cleaner version. Here Quotes contains only one row. The derived table determines s (position of the starting comma) and e (position of the ending comma) for each word. The rows are filtered by where the ending comma position is greater than the starting comma position.

select word = substring(phrase, s, (e-s-1))
from
(
  SELECT id, phrase,
         s = CharIndex(',', ',' + Phrase + ',', ID),
         e = CharIndex(',', ',' + Phrase + ',', ID + 1)
  FROM Tally, Quotes
  where CharIndex(',', ',' + Phrase + ',', ID) < CharIndex(',', ',' + Phrase + ',', ID + 1)
) as a
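nguyen's start/end-comma filtering can be sketched in Python like so (helper names invented; `charindex` mimics T-SQL's 1-based CHARINDEX, returning 0 when not found):

```python
def charindex(sub, s, start=1):
    """T-SQL-style CHARINDEX: 1-based positions, 0 when not found."""
    pos = s.find(sub, start - 1)
    return pos + 1

def parse_derived(phrase, max_id):
    """For each Tally ID, s is the first comma at or after ID and e is
    the first comma at or after ID+1; a word exists only where e > s."""
    padded = "," + phrase + ","
    words = []
    for i in range(1, max_id + 1):              # Tally table IDs
        s = charindex(",", padded, i)
        e = charindex(",", padded, i + 1)
        if 0 < s < e:
            words.append(phrase[s - 1:e - 2])   # SUBSTRING(phrase, s, e-s-1)
    return words
```

IDs that land mid-word find the same next comma for both s and e, so the e > s filter drops them, which is what makes the variant work without the "previous character is a comma" test.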
AndrewGail
Starting Member
3 Posts |
Posted - 2004-03-26 : 09:51:33
Hi, this site has quickly become my favourite SQL-related site, and for good reason. I've used your code found in this article to great effect before, but I've spent all day trying to rework it to achieve a slightly different goal. I understand that putting things in context helps, so excuse the long-winded post.

I've an old SQL Server database whose main table stores links and link titles (amongst other things). The links/titles are stored as follows:

LINKS
-----
http://www.bbc.co.uk,http://www.sqlteam.com,http://www.google.com

LINKTITLES
----------
BBC Online,SQLTeam,Google

I'm upgrading the web applications that use these fields, and at the same time I want to put these two fields into their own table:

Table: Links
------------
UIN    - int      - {Unique Identifier}
Parent - int      - {reference to UIN from main table}
Title  - nvarchar - {the link title}
Link   - nvarchar - {the link itself}

When I run your code against a single column (e.g. LINKTITLES) it all works perfectly, returning 1033 rows. I then tried a modification of your code:

SELECT UIN,
       NullIf(SubString(',' + LINKTITLES + ',', ID, CharIndex(',', ',' + LINKTITLES + ',', ID) - ID), '') AS Title,
       NullIf(SubString(',' + LINKS + ',', ID, CharIndex(',', ',' + LINKS + ',', ID) - ID), '') AS Link
FROM #Tally, linksTemp
WHERE (ID <= Len(',' + LINKTITLES + ',') AND SubString(',' + LINKTITLES + ',', ID - 1, 1) = ',')
  AND (ID <= Len(',' + LINKS + ',') AND SubString(',' + LINKS + ',', ID - 1, 1) = ',')
  AND CharIndex(',', ',' + LINKTITLES + ',', ID) - ID > 0
ORDER BY UIN

It works, but it's only returning 630 rows. I'm only getting one link/title per UIN. Could anyone shed some light on where I'm going wrong here?

Thanks,
Andrew
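One way to read the symptom above: the query pairs Title and Link by character position, so a row is produced only where a value happens to start at the same Tally ID in both LINKTITLES and LINKS (which is guaranteed only for the first element of each row). What's wanted is pairing by element ordinal, sketched here in Python (hypothetical shapes, not the actual tables):

```python
def pair_links(rows):
    """Pair the nth title with the nth link for each parent row, by
    ordinal position rather than by character offset.  rows is a list
    of (uin, titles_csv, links_csv) tuples."""
    out = []
    for uin, titles, links in rows:
        for title, link in zip(titles.split(","), links.split(",")):
            out.append((uin, title, link))
    return out
```

In T-SQL the usual equivalent is to compute an element number for each parsed value (for example, by counting commas before its start position) in each column, then join the two result sets on UIN and that element number.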
ESquared
Starting Member
4 Posts |
Posted - 2005-08-29 : 00:06:32
Heck, I know the original article is more than four years old ... my apologies! I see a way to simplify some of the expressions, by removing unnecessary concatenation and arithmetic:

-- Include empty strings as nulls, for direct comparison
SELECT Author,
       Word = NullIf(SubString(Phrase, ID, CharIndex(',', Phrase + ',', ID) - ID), '')
FROM Tally T
INNER JOIN Quotes Q
  ON T.ID <= Len(Q.Phrase)
 AND Substring(',' + Q.Phrase, T.ID, 1) = ','

-- Don't include empty strings (changes in blue)
SELECT Author,
       Word = SubString(Phrase, ID, CharIndex(',', Phrase + ',', ID) - ID)
FROM Tally T
INNER JOIN Quotes Q
  ON T.ID <= Len(Q.Phrase)
 AND Substring(',' + Q.Phrase, T.ID, 2) LIKE ',[^,]'
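The second form's two-character LIKE window (a comma followed by a non-comma) is what skips empty elements without a NULLIF. The same predicate, sketched in Python (hypothetical function name):

```python
def parse_skip_empty(phrase):
    """ESquared's second variant: a value starts at 1-based position ID
    when the two-character window of ',' + phrase at ID is a comma
    followed by a non-comma (the LIKE ',[^,]' test)."""
    padded = "," + phrase
    words = []
    for i in range(1, len(phrase) + 1):           # Tally IDs, T.ID <= LEN(phrase)
        window = padded[i - 1:i + 1]              # SUBSTRING(','+phrase, ID, 2)
        if len(window) == 2 and window[0] == "," and window[1] != ",":
            end = (phrase + ",").index(",", i - 1)  # CHARINDEX(',', phrase+',', ID)
            words.append(phrase[i - 1:end])
    return words
```

Note that empty elements are dropped entirely here, whereas the first variant keeps them as NULLs.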
ESquared
Starting Member
4 Posts |
Posted - 2005-08-29 : 00:34:47
CREATE TABLE Quotes (Author varchar(100), Phrase varchar(7000))

INSERT Quotes
SELECT 'Shakespeare', 'A,rose,by,any,other,name,smells,just,as,sweet'
UNION SELECT 'Kipling', 'Across,the,valley,of,death,rode,the,six,hundred'
UNION SELECT 'Coleridge', 'In,Xanadu,did,Kubla,Khan,...,,,,damn,I,forgot,the,rest,of,it'
UNION SELECT 'Descartes', 'I,think,therefore,I,am'
UNION SELECT 'Volk', 'I,think,therefore,I,need,another,beer'
UNION SELECT 'Feldman', 'No,it''s,pronounced,I,gor'
UNION SELECT 'Simpson', 'Mmmmmm,donuts'
UNION SELECT 'Fudd', 'Be,vewwy,vewwy,quiet,I''m,hunting,wabbits'
ESquared
Starting Member
4 Posts |
Posted - 2005-09-09 : 17:53:26
CREATE TABLE Tally (ID int identity(1,1))
INSERT Tally DEFAULT VALUES
WHILE SCOPE_IDENTITY() < 8000 INSERT Tally DEFAULT VALUES
ashish_khanolkar
Starting Member
1 Post |
Posted - 2006-10-03 : 18:57:23
This solution for parsing CSV values into multiple rows is excellent, but I want to use it in Sybase. The only problem is CHARINDEX: in Sybase the CHARINDEX syntax is CHARINDEX(expression1, expression2), with no starting-position argument. How can I use this query in Sybase?
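One common workaround when CHARINDEX takes no starting position is to search a SUBSTRING suffix and re-base the result, i.e. something along the lines of CHARINDEX(x, SUBSTRING(s, n, ...)) + n - 1 (treat the exact Sybase rewrite as an assumption to verify against your server's documentation). The identity, sketched in Python with invented helper names:

```python
def charindex2(sub, s):
    """Two-argument CHARINDEX, as in the Sybase syntax described above:
    1-based position, 0 when not found."""
    return s.find(sub) + 1

def charindex3(sub, s, start):
    """Emulate the three-argument form: search the suffix beginning at
    'start' and re-base the hit back to the full string's positions."""
    pos = charindex2(sub, s[start - 1:])
    return pos + start - 1 if pos else 0
```

So every charindex(',', padded, n) in the article's query becomes a suffix search plus an offset.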
bryonpierce
Starting Member
1 Post |
Posted - 2008-04-18 : 13:16:16
-- I loved this solution, but here is a single executable script to see
-- it in action using table variables ... have fun :0

DECLARE @Quotes TABLE (Author varchar(100), Phrase varchar(7000))

INSERT INTO @Quotes
SELECT 'Shakespeare', 'A,rose,by,any,other,name,smells,just,as,sweet'
UNION SELECT 'Kipling', 'Across,the,valley,of,death,rode,the,six,hundred'
UNION SELECT 'Coleridge', 'In,Xanadu,did,Kubla,Khan,...,,,,damn,I,forgot,the,rest,of,it'
UNION SELECT 'Descartes', 'I,think,therefore,I,am'
UNION SELECT 'Volk', 'I,think,therefore,I,need,another,beer'
UNION SELECT 'Feldman', 'No,it''s,pronounced,I,gor'
UNION SELECT 'Simpson', 'Mmmmmm,donuts'
UNION SELECT 'Fudd', 'Be,vewwy,vewwy,quiet,I''m,hunting,wabbits'

DECLARE @Tally TABLE (ID int identity(1,1))
INSERT @Tally DEFAULT VALUES
WHILE SCOPE_IDENTITY() < 8000 INSERT @Tally DEFAULT VALUES

DECLARE @OnlyWords TABLE (Author varchar(100), Word varchar(100), Phrase varchar(7000))

INSERT INTO @OnlyWords
SELECT Author,
       NullIf(SubString(',' + Phrase + ',', ID, CharIndex(',', ',' + Phrase + ',', ID) - ID), '') AS Word,
       Phrase
FROM @Tally, @Quotes
WHERE ID <= Len(',' + Phrase + ',')
  AND SubString(',' + Phrase + ',', ID - 1, 1) = ','

--select top 2 * from @Quotes
--select top 3 * from @Tally

select * from @OnlyWords
order by Author, Word, Phrase

Bryon Pierce
bryonpierce at geezmailz
kamalee
Starting Member
1 Post |
Posted - 2009-03-10 : 12:46:10
Hi, I implemented the code exactly as you show in your example, but the output only presents a single value ...

create table ParseWords1 (ID integer primary key, Location varchar(50))
insert into parsewords1 (ID, Location) values (1, 'Toronto,Ontorio,Canada')
insert into parsewords1 (ID, Location) values (2, 'Boston,Massachusetts,USA')
insert into parsewords1 (ID, Location) values (3, 'Vancouver,British Columbia,Canada')

SELECT NullIf(SubString(',' + Location + ',', ID, CharIndex(',', ',' + Location + ',', ID) - ID), '') AS Localidades
FROM parsewords1
WHERE ID <= Len(',' + Location + ',')
  AND SubString(',' + Location + ',', ID - 1, 1) = ','
  AND CharIndex(',', ',' + Location + ',', ID) - ID > 0

The output:

  Localidades
1 Boston

I'm using SQL Server 2005.

kamalee
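The query above appears to reuse parsewords1's own primary-key ID in place of the article's separate table of numbers, so each row is tested at only one character position (its own ID); for row 2, position 2 happens to start 'Boston', which is why a single value comes back. The technique needs a cross join against a numbers table so every position is tried. In Python terms (hypothetical helper; positions are 1-based to mirror the SQL):

```python
def parse_with_numbers(rows, numbers):
    """Cross-join every row against a numbers sequence (1..N) so that
    each character position is tested, instead of only the row's own ID.
    rows is a list of (id, location_csv) pairs."""
    out = []
    for _id, loc in rows:
        padded = "," + loc + ","
        for n in numbers:                 # one candidate start per number
            if (n <= len(padded)
                    and padded[n - 2:n - 1] == ","):   # char before n is a comma
                end = padded.index(",", n - 1)         # closing comma
                if end - (n - 1) > 0:                  # non-empty value
                    out.append(padded[n - 1:end])
    return out
```

With the numbers sequence in place, every row yields all of its values, not just the row whose ID coincides with a value boundary.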
psycotech
Starting Member
5 Posts |
Posted - 2009-04-27 : 01:54:24
Here is the nastiest SQL code I have ever written. It takes your CSV file as a parameter, and you can insert the outcome wherever you like. I post this to perhaps save someone else the trouble of writing it. If anyone who needs it sees any potential improvement, all is welcome. Note some of the other solutions don't take into account double quotes and commas in the fields; this one does.

CREATE TABLE #TEST
(
  Line int,
  The varchar(255)
)

DECLARE @err int

INSERT INTO #TEST
/* Execute dbo.Mailing_CSV_Read @CSV the file you want, 'Table Columns you need from csv' */
EXEC @err = dbo.Mailing_CSV_Read 'CSVFILE Goes HERE', 'Line,The'

SELECT * FROM #TEST

CREATE PROCEDURE dbo.Mailing_CSV_Read
  @CSV varchar(max),
  @ColumnsReturn varchar(max)
AS

CREATE TABLE #Temp
(
  Line int,
  Col int,
  Val varchar(255)
)

DECLARE @Cols varchar(max)
DECLARE @Row varchar(max)
DECLARE @Col varchar(255)
DECLARE @RowStart int
DECLARE @RowEnd int
DECLARE @ColStart int
DECLARE @ColEnd int
DECLARE @Len int
DECLARE @LineINDX int
DECLARE @ColINDX int

/* Remove this sample data */
SET @CSV = 'The,f,l,p,c,add,add
Ms ,Katrina,Rendell,,Somersault Group,"Level 2, 93 Pacific Highway ",
Mr ,Paul,Bird,,"Three, "", and the "" Plus",3/40 Querrin Street ,
Ms.,Dorothy ,Illing,,Three Plus,3/40 Querrin Street ,
Prof,Sandra ,Harding,Vice Chancellor & President,James Cook University ,,
Emeritus Prof,Colin ,Power AM,,Eidos Institute,4 Carnegie Street ,
Prof,Neil,Dempster,Prof in Education,Faculty of Education,Mount Gravatt Campus ,Griffith University
Ms.,Elizabeth,Jameson,Director,Board Matters ,P O Box 377 ,
Dr.,Cherrell ,Hirst A.O.,Acting CEO ,QBF,P O Box 2076 ,
Mr,John B,Reid AO ,School of Strategy & Entrepreneurship,Australian School of Business ,Level 5 Bldg E12,University of NSW 
Mr,Christopher ,Thorn,Principal,Goldman Sachs JB Were ,Philanthropic Services ,GPO Box 2050
Mr,Jeremy,Madin,Headmaster,Cranbrook School,5 Victoria Road ,'

SET @RowEnd = 1
SET @RowStart = 0
SET @LineINDX = 0

WHILE @RowEnd > 0
BEGIN
  SET @RowEnd = CHARINDEX(CHAR(13), @CSV, @RowStart)
  SET @Len = @RowEnd - @RowStart
  IF @RowEnd < 1 SET @Len = 99999
  SET @Row = SUBSTRING(@CSV, @RowStart, @Len)

  SET @ColEnd = 1
  SET @ColStart = 0
  SET @ColINDX = 0

  WHILE @ColEnd > 0
  BEGIN
    SET @ColEnd = CHARINDEX(',', @Row, @ColStart)
    SET @Len = @ColEnd - @ColStart
    IF @ColEnd < 1 SET @Len = 99999
    SET @Col = SUBSTRING(@Row, @ColStart, @Len)

    IF LEFT(@Col, 1) = '"'
    BEGIN
      SET @ColEnd = CHARINDEX('"', @Row, @ColStart + 1)
      WHILE (SUBSTRING(@Row, @ColEnd + 1, 1) = '"')
        SET @ColEnd = CHARINDEX('"', @Row, @ColEnd + 2)
      SET @Len = @ColEnd - @ColStart - 2
      IF @ColEnd < 1 SET @Len = 99999
      SET @Col = REPLACE(SUBSTRING(@Row, @ColStart + 1, @Len), '""', '"')
    END

    INSERT INTO #Temp VALUES (@LineINDX, @ColINDX, @Col)
    SET @ColStart = @ColEnd + 1
    SET @ColINDX = @ColINDX + 1
  END

  SET @RowStart = @RowEnd + 2
  SET @LineINDX = @LineINDX + 1
END

DECLARE @columns varchar(max)
DECLARE @SelectColumns varchar(max)

SET @ColEnd = 1
SET @ColStart = 0
SET @ColINDX = 0

WHILE @ColEnd > 0
BEGIN
  SET @ColEnd = CHARINDEX(',', @ColumnsReturn, @ColStart)
  SET @Len = @ColEnd - @ColStart
  IF @ColEnd < 1 SET @Len = 99999
  SET @Col = SUBSTRING(@ColumnsReturn, @ColStart, @Len)

  IF (@Col = 'Line')
    SET @SelectColumns = COALESCE(@SelectColumns + ',[Line]', '[Line]')
  ELSE
  BEGIN
    SELECT @ColINDX = Col FROM #Temp WHERE Line = 0 AND Val = @Col
    IF (@Col is NULL) RETURN 0
    SET @SelectColumns = COALESCE(@SelectColumns + ',[' + cast(@ColINDX as varchar) + '] as [' + @Col + ']',
                                  '[' + cast(@ColINDX as varchar) + '] as [' + @Col + ']')
    SET @columns = COALESCE(@columns + ',[' + cast(@ColINDX as varchar) + ']',
                            '[' + cast(@ColINDX as varchar) + ']')
  END

  SET @ColStart = @ColEnd + 1
END

DELETE FROM #Temp
WHERE Line = 0

DECLARE @Query VARCHAR(max)
SET @Query = 'SELECT ' + @SelectColumns + '
FROM #temp
PIVOT
(
  MIN(Val)
  FOR [Col]
  IN (' + @columns + ')
) AS p'

EXEC(@Query)
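The quoting rules the procedure above implements (fields optionally wrapped in double quotes, embedded quotes doubled, commas allowed inside quoted fields) are essentially the classic CSV conventions, so Python's csv module makes a convenient cross-check for expected output. The sample strings below are taken from the data above:

```python
import csv
import io

# Two rows modeled on the sample data; quoted fields contain commas,
# and doubled quotes ("") escape a literal quote character.
raw = (
    'Ms ,Katrina,Rendell,,Somersault Group,"Level 2, 93 Pacific Highway "\n'
    'Mr ,Paul,Bird,,"Three, "" and "" Plus",3/40 Querrin Street'
)
rows = list(csv.reader(io.StringIO(raw)))
print(rows[0][5])  # the address field, commas intact
print(rows[1][4])  # doubled quotes collapsed to single quotes
```

Comparing the procedure's #Temp contents against a reader like this is a quick way to spot edge cases (unterminated quotes, trailing commas) before trusting the PIVOT output.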