Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Import/Export (DTS) and Replication (2000)
 BCPdotFMTcommaQUOTEDmess

Author  Topic 

SamC
White Water Yakist

3467 Posts

Posted - 2005-04-22 : 09:54:10
I've got a new CSV file from a customer. This one is comma delimited, and occasionally, there is a piece of data with a comma, but that's OK because that data is enclosed with quotes.

My tests with BCP show it doesn't recognize the quotes. Seems to think quotes are data. The quoted comma is taken as a delimiter.

I wrestled with the format using EXCEL and Notepad. I lost.

Argh!

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-04-22 : 10:15:56
change all [;] to ____bblahblah____.
import it to table.
replace the ____bblahblah____ to ; with a simple update.

Go with the flow & have fun! Else fight the flow
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-04-22 : 10:28:41
quote:
Originally posted by spirit1

change all [;] to ____bblahblah____.
import it to table.
replace the ____bblahblah____ to ; with a simple update.

Thanks for the reply.

I tried using Notepad (bad choice for a 55,000 line file). Seems Notepad starts at the beginning after every replace. It froze up after a minute.

I tried using EXCEL, which is risky, but it won't delete the quotations. I can use it to change comma delimitors to tabs, which helps.

Seems there ought to be a more flexible product for doing imports. Does everyone solve these import problems using Microsoft Office products to clean the data?
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-04-22 : 10:30:54
i use TextPad or UltraEdit...
works for me.

you can also use VS.


Go with the flow & have fun! Else fight the flow
Go to Top of Page

mwjdavidson
Aged Yak Warrior

735 Posts

Posted - 2005-04-22 : 11:36:40
Hi
You can do this with a format file, but it's a bit convoluted:

Compare the following two format files for a data file looking like this:

date colour count
1/1/2004 red 10
10/1/2004 blue 20


This format file would work:

1 SQLCHAR 0 255 "," 1 date ""
2 SQLCHAR 0 255 "," 2 colour ""
3 SQLCHAR 0 255 "\r\n" 3 count ""


If the fields were text qualified, this one would work:

8.0
5
1 SQLCHAR 0 1 "\"" 0 first_quote ""
2 SQLCHAR 0 255 "\",\"" 1 date ""
3 SQLCHAR 0 255 "\",\"" 2 colour ""
4 SQLCHAR 0 255 "\"\r\n" 3 count ""


I'm afraid I don't have time to give a decent explanation right now, but hopefully, it should be enough to get it working for your data.

Mark
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-04-22 : 11:53:53
Thanks Mark. I didn't know a quote could be specified, but I am still stuck.

Whatever product wrote this CSV file only quoted the fields *when* a comma was contained in the data. This means most rows are not quoted, only occasional rows.

IF bcp.fmt can handle this, I'm game.
Go to Top of Page

mwjdavidson
Aged Yak Warrior

735 Posts

Posted - 2005-04-22 : 14:54:12
Ah... that's not so handy! As it's essentially just hacking the delimiters in the format file, this won't work unless the use of text qualifiers is consistent throughout the file. Sorry!

Mark
Go to Top of Page

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2005-04-22 : 15:54:17
You could BCP the data into a table in a single column, manipulate the "special" delimiters with SQL into a consistant format, then BCP it out again to a file, and then BCP it into a table using a format file.

It is a bit painful, but I have done this with large files that had a few special rows in them.

It you want to try this, create a table and a view like this:
create table T_BCP_IN_DATA
(
Sequence int identity(1,1)
primary key clustered,
Data varchar(8000) null,
)
go
create view V_BCP_IN_DATA
as
select Data from T_BCP_IN_DATA

Then BCP the file into the View. The reason for the Sequence column is to give you a PK to identify individual rows, and the reason for the View is so you don't have to worry about the Sequence when you BCP the file in.

Once you BCP the data in, all you need is a little bit of clever SQL to clean it up. You might try changing the delimiter to pipe "|" on the rows where the "," delimiter does not occur, and then see what is left to deal with on the other rows. Using the Sequence number, you can also just edit the data directly in Enterprise Manager by doing Open Table for 100 rows, click the SQL button in the data view, and adding a where clause with the Sequence you want to edit.

I know this seems like a lot of work, but it sounds like you are stuck with nothing better.




CODO ERGO SUM
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2005-04-22 : 18:47:16
1. Download these utilities and unzip them to a folder:

http://unxutils.sourceforge.net/

2. Find the sed.exe utility and copy it to the same folder as your data file.

3. Copy the following into a file called csv2tab.sed, in the same folder as your data file:

s/\",\"/\t/g
s/\",/\t/g
s/,\"/\t/g
s/^\"\(.*\)/\1/
s/\"//


4. Run the following on the command line:

sed -r -fcsv2tab.sed datafile.txt >newdatafile.txt

That will convert your CSV file to a nice, unquoted tab-delimited file (newdatafile.txt). I'm not 100% sure that the sed script I posted will cover all of the possible quote-comma combinations, but it should be close enough.

FYI, sed is a VERY COOL Unix/Linux utility that can do search-and-replace using regular expressions. It is scriptable and extremely powerful at manipulating text. You can find more information about it here:

http://www.student.northpark.edu/pemente/sed/sedfaq.html
http://main.rtfiber.com.tw/~changyj/sed/
http://sed.sourceforge.net/#docs

Some words of advice: your head WILL explode, you WILL get frustrated and confused at the syntax, and you WILL be extremely glad you went through it all once you learn it and start to use it. The examples on those pages only hint at the things sed can do.
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-04-22 : 19:27:10
Thanks Rob,

Sam
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2005-04-23 : 14:24:43
I'm struggling with this.

My understanding of CSV (which I've never had a problem with using BCP and I'm pretty sure DTS also) is:

1) Comma between fields
2) If field contains comma surround it with "
3) (Its OK to surround all fields with " if you like)
4) If a field, already surrounded with ", contains an embedded quote then double it to be ""

Loads of applications do (3) as a cheap get-out. They probably aren't capable of doing (4).

Either way, you are hosed if the field contains CR / LF

Kristen
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-04-23 : 14:43:05
quote:
Originally posted by Kristen

I'm struggling with this.

My understanding of CSV (which I've never had a problem with using BCP and I'm pretty sure DTS also) is:

1) Comma between fields
2) If field contains comma surround it with "
3) (Its OK to surround all fields with " if you like)
4) If a field, already surrounded with ", contains an embedded quote then double it to be ""

Loads of applications do (3) as a cheap get-out. They probably aren't capable of doing (4).

Either way, you are hosed if the field contains CR / LF

Kristen


1) is true
2) may be true, I've never checked.
3) single quote ' maybe true, double quote " is interpeted as data
4) don't know

The problem I'm encountering is that my customers deliver "what they can". When it's non-conformant, BCP doesn't seem to be flexible enough to accomodate. Rob's solution is a work around that helps.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2005-04-24 : 03:13:49
"When it's non-conformant"

That's my point, really. Most stuff that considers itself to be CSV is non-conformant :-)

Kristen
Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2005-04-27 : 16:05:24
see
http://www.mindsdoor.net/SQLTsql/BCP_quoted_CSV_Format_file.html

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Go to Top of Page
   

- Advertisement -