 CSV with double quotes [SOLVED]


Crow555
Starting Member

5 Posts

Posted - 2007-12-24 : 08:20:32
Hi everyone, I've tried to find the answer to this problem on this forum and elsewhere, but with no success.

I've got a CSV file with a number of columns exported from another system to be imported by ours. However, when one of the text columns contains a comma (,) the exporting system will wrap that particular field of data with double quotes.

This of course means that Bulk Insert finds an extra column, and that BCP with a format file can't use '",' as a terminator because the double quote isn't always there.

Has anyone encountered a problem like this and if so, how was it dealt with?

Many thanks in advance,

Dave

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2007-12-24 : 21:43:30
You should use another character as the column delimiter, | for instance.
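
If the exporting system could be persuaded to write pipe-delimited files, the import side might look something like this. It is only a sketch: the staging table name and file path are made up for illustration, and it only works once the source file really is pipe-delimited.

-- Hypothetical staging table and path, not from this thread.
BULK INSERT dbo.ImportStaging
FROM 'D:\Store\Export.txt'
WITH (FIELDTERMINATOR = '|', ROWTERMINATOR = '\n')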

sakets_2000
Master Smack Fu Yak Hacker

1472 Posts

Posted - 2007-12-27 : 14:31:43
Putting the fields within "..." might also help.

Crow555
Starting Member

5 Posts

Posted - 2007-12-31 : 08:13:03
Unfortunately, the CSV file format is beyond our control as it is exported by a different system, so we are stuck with this particular format.

I'm considering amending our import stored procedure to read the file and change any commas (,) between double quotes (") to pipes (|), then, after the import, updating the affected columns to change the pipes back to commas.

However, my SQL isn't that good and any suggestions on how to achieve this (or even a better solution to this problem) would be very much appreciated.

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2007-12-31 : 08:25:23
You can achieve this using the REPLACE function. Look in BOL for syntax and usage.
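
A minimal sketch of the post-import clean-up being suggested, assuming a staging table called dbo.ImportStaging with a PersonName column (both names are invented, not from the thread):

-- Hypothetical table/column names; swaps the temporary pipes back to the original commas.
UPDATE dbo.ImportStaging
SET PersonName = REPLACE(PersonName, '|', ',')
WHERE PersonName LIKE '%|%'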

nr
SQLTeam MVY

12543 Posts

Posted - 2007-12-31 : 09:20:44
How big are the files? If they are small you can use a function to get the columns after importing. It's very slow though, so you wouldn't want to do it on large files.
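
One possible shape for such a function, purely a sketch rather than nr's actual code: it walks a single raw CSV line, honours the double quotes, and returns the requested field with the quotes stripped.

-- Sketch only: returns field @ColNum from a CSV line, treating commas inside
-- double quotes as data rather than delimiters. Doubled ("") escapes are not handled.
CREATE FUNCTION dbo.fn_GetCsvField
(
    @Line   varchar(8000),
    @ColNum int
)
RETURNS varchar(8000)
AS
BEGIN
    DECLARE @pos int, @ch char(1), @inQuotes bit, @col int, @field varchar(8000)

    SELECT @pos = 1, @inQuotes = 0, @col = 1, @field = ''

    WHILE @pos <= LEN(@Line)
    BEGIN
        SET @ch = SUBSTRING(@Line, @pos, 1)

        IF @ch = '"'
            SET @inQuotes = 1 - @inQuotes          -- toggle quoted state and drop the quote
        ELSE IF @ch = ',' AND @inQuotes = 0
        BEGIN
            IF @col = @ColNum RETURN @field        -- this delimiter ends the field we wanted
            SET @col = @col + 1
        END
        ELSE IF @col = @ColNum
            SET @field = @field + @ch              -- build up the wanted field

        SET @pos = @pos + 1
    END

    RETURN CASE WHEN @col = @ColNum THEN @field ELSE NULL END
END

For example, dbo.fn_GetCsvField('3,"Bill Brannigan, Developer",789,ghi', 2) returns Bill Brannigan, Developer. Calling it once per column per row is what makes this approach slow on big files.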



==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Crow555
Starting Member

5 Posts

Posted - 2007-12-31 : 09:44:26
Hi NR,

The files vary but are usually sitting around 400-500kb as .csv files.

I've been scratching my head and have worked out that the OPENROWSET function can give me the results I want.

For example, if these were lines in my file...

1,Sam Smith,123,abc
2,Tom Turner,456,def
3,"Bill Brannigan, Developer",789,ghi
4,Paul Palmer,910,jkl

... line 3 has the problem entry. Bulk Insert picks this up as 5 columns instead of 4 and errors out. However, an OPENROWSET query will return 4 columns and strip the double quotes. Now all I have to do is write that out to a different file with a different delimiter, but I'm stuck as to how to do that in T-SQL. Like I said, I'm not too hot at SQL. :(

nr
SQLTeam MVY

12543 Posts

Posted - 2007-12-31 : 09:58:29
Why not import using the openrowset?

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Crow555
Starting Member

5 Posts

Posted - 2007-12-31 : 10:11:57
Like I said, I'm hopeless at SQL and hoping to get better at it. :)

The thing I've found about OPENROWSET is that it takes the first row and makes it the column headers, meaning in my example above the columns would be called '1', 'Sam Smith', '123', 'abc' and there would only be 3 rows. It's a small niggle, but it would mean one record goes missing.

Given the position I'm in what would you recommend?

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2008-01-02 : 05:59:01
Can you sort the files to put the double-quote records at the top?
Can you run a 2-pass extract to deal with the regular records first and then the double-quote ones?
Can you write a VB/C++/C# program to convert the "double-quote" records to standard records and then pipe them through the ETL process?

Crow555
Starting Member

5 Posts

Posted - 2008-01-02 : 06:14:37
Hi Andrew and all,

I think I've worked out the problem but wanted to test it fully first before replying.

Whereas
SELECT * FROM OPENROWSET('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)}; DefaultDir=D:\Store;', 'select * from RolesTest.csv') Rowset_1

has no headers, this one includes column headers, albeit as F1, F2, F3, etc. aliases:

SELECT * FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0', 'Text;Database=D:\Store;HDR=NO', 'select * from RolesTest.csv') Rowset_3


Having used this, I simply replaced the Bulk Insert clause in the stored procedure and used the above code inside an INSERT statement (we create a hash table before this, so SELECT INTO isn't used).

And there we have it: all columns displayed correctly and no missing rows. Thanks to everyone who replied, especially NR, and I hope this thread will be of use to someone in the future.
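
For anyone following along later, the replacement ends up looking roughly like this; the temp table name and column list are placeholders rather than the actual stored procedure's:

-- Hypothetical temp table standing in for the 'hash table' the procedure creates beforehand.
CREATE TABLE #Roles (RoleID varchar(10), PersonName varchar(100), Code varchar(10), Grade varchar(10))

-- Replaces the old BULK INSERT: the Jet text driver copes with the quoted, comma-containing
-- fields, and HDR=NO keeps the first line as data instead of turning it into column names.
INSERT INTO #Roles (RoleID, PersonName, Code, Grade)
SELECT F1, F2, F3, F4
FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
                'Text;Database=D:\Store;HDR=NO',
                'select * from RolesTest.csv') AS Rowset_3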