| Author |
Topic |
|
SamC
White Water Yakist
3467 Posts |
Posted - 2005-04-22 : 09:54:10
|
| I've got a new CSV file from a customer. This one is comma delimited, and occasionally, there is a piece of data with a comma, but that's OK because that data is enclosed with quotes.My tests with BCP show it doesn't recognize the quotes. Seems to think quotes are data. The quoted comma is taken as a delimiter.I wrestled with the format using EXCEL and Notepad. I lost.Argh! |
|
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2005-04-22 : 10:15:56
|
change all [;] to ____bblahblah____.import it to table.replace the ____bblahblah____ to ; with a simple update.Go with the flow & have fun! Else fight the flow |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2005-04-22 : 10:28:41
|
quote: Originally posted by spirit1 change all [;] to ____bblahblah____.import it to table.replace the ____bblahblah____ to ; with a simple update.
Thanks for the reply.I tried using Notepad (bad choice for a 55,000 line file). Seems Notepad starts at the beginning after every replace. It froze up after a minute.I tried using EXCEL, which is risky, but it won't delete the quotations. I can use it to change comma delimitors to tabs, which helps.Seems there ought to be a more flexible product for doing imports. Does everyone solve these import problems using Microsoft Office products to clean the data? |
 |
|
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2005-04-22 : 10:30:54
|
i use TextPad or UltraEdit...works for me.you can also use VS. Go with the flow & have fun! Else fight the flow |
 |
|
|
mwjdavidson
Aged Yak Warrior
735 Posts |
Posted - 2005-04-22 : 11:36:40
|
Hi You can do this with a format file, but it's a bit convoluted:Compare the following two format files for a data file looking like this:date colour count1/1/2004 red 1010/1/2004 blue 20 This format file would work:1 SQLCHAR 0 255 "," 1 date ""2 SQLCHAR 0 255 "," 2 colour ""3 SQLCHAR 0 255 "\r\n" 3 count "" If the fields were text qualified, this one would work:8.051 SQLCHAR 0 1 "\"" 0 first_quote ""2 SQLCHAR 0 255 "\",\"" 1 date ""3 SQLCHAR 0 255 "\",\"" 2 colour ""4 SQLCHAR 0 255 "\"\r\n" 3 count "" I'm afraid I don't have time to give a decent explanation right now, but hopefully, it should be enough to get it working for your data.Mark |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2005-04-22 : 11:53:53
|
| Thanks Mark. I didn't know a quote could be specified, but I am still stuck.Whatever product wrote this CSV file only quoted the fields *when* a comma was contained in the data. This means most rows are not quoted, only occasional rows.IF bcp.fmt can handle this, I'm game. |
 |
|
|
mwjdavidson
Aged Yak Warrior
735 Posts |
Posted - 2005-04-22 : 14:54:12
|
| Ah... that's not so handy! As it's essentially just hacking the delimiters in the format file, this won't work unless the use of text qualifiers is consistent throughout the file. Sorry!Mark |
 |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2005-04-22 : 15:54:17
|
You could BCP the data into a table in a single column, manipulate the "special" delimiters with SQL into a consistant format, then BCP it out again to a file, and then BCP it into a table using a format file.It is a bit painful, but I have done this with large files that had a few special rows in them.It you want to try this, create a table and a view like this:create table T_BCP_IN_DATA(Sequence int identity(1,1) primary key clustered,Data varchar(8000) null,)gocreate view V_BCP_IN_DATAasselect Data from T_BCP_IN_DATA Then BCP the file into the View. The reason for the Sequence column is to give you a PK to identify individual rows, and the reason for the View is so you don't have to worry about the Sequence when you BCP the file in.Once you BCP the data in, all you need is a little bit of clever SQL to clean it up. You might try changing the delimiter to pipe "|" on the rows where the "," delimiter does not occur, and then see what is left to deal with on the other rows. Using the Sequence number, you can also just edit the data directly in Enterprise Manager by doing Open Table for 100 rows, click the SQL button in the data view, and adding a where clause with the Sequence you want to edit.I know this seems like a lot of work, but it sounds like you are stuck with nothing better.CODO ERGO SUM |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2005-04-22 : 18:47:16
|
| 1. Download these utilities and unzip them to a folder:http://unxutils.sourceforge.net/2. Find the sed.exe utility and copy it to the same folder as your data file.3. Copy the following into a file called csv2tab.sed, in the same folder as your data file:s/\",\"/\t/gs/\",/\t/gs/,\"/\t/gs/^\"\(.*\)/\1/s/\"//4. Run the following on the command line:sed -r -fcsv2tab.sed datafile.txt >newdatafile.txtThat will convert your CSV file to a nice, unquoted tab-delimited file (newdatafile.txt). I'm not 100% sure that the sed script I posted will cover all of the possible quote-comma combinations, but it should be close enough.FYI, sed is a VERY COOL Unix/Linux utility that can do search-and-replace using regular expressions. It is scriptable and extremely powerful at manipulating text. You can find more information about it here:http://www.student.northpark.edu/pemente/sed/sedfaq.htmlhttp://main.rtfiber.com.tw/~changyj/sed/http://sed.sourceforge.net/#docsSome words of advice: your head WILL explode, you WILL get frustrated and confused at the syntax, and you WILL be extremely glad you went through it all once you learn it and start to use it. The examples on those pages only hint at the things sed can do. |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2005-04-22 : 19:27:10
|
| Thanks Rob,Sam |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-04-23 : 14:24:43
|
| I'm struggling with this.My understanding of CSV (which I've never had a problem with using BCP and I'm pretty sure DTS also) is:1) Comma between fields2) If field contains comma surround it with "3) (Its OK to surround all fields with " if you like)4) If a field, already surrounded with ", contains an embedded quote then double it to be ""Loads of applications do (3) as a cheap get-out. They probably aren't capable of doing (4).Either way, you are hosed if the field contains CR / LFKristen |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2005-04-23 : 14:43:05
|
quote: Originally posted by Kristen I'm struggling with this.My understanding of CSV (which I've never had a problem with using BCP and I'm pretty sure DTS also) is:1) Comma between fields2) If field contains comma surround it with "3) (Its OK to surround all fields with " if you like)4) If a field, already surrounded with ", contains an embedded quote then double it to be ""Loads of applications do (3) as a cheap get-out. They probably aren't capable of doing (4).Either way, you are hosed if the field contains CR / LFKristen
1) is true2) may be true, I've never checked.3) single quote ' maybe true, double quote " is interpeted as data4) don't knowThe problem I'm encountering is that my customers deliver "what they can". When it's non-conformant, BCP doesn't seem to be flexible enough to accomodate. Rob's solution is a work around that helps. |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-04-24 : 03:13:49
|
| "When it's non-conformant"That's my point, really. Most stuff that considers itself to be CSV is non-conformant :-)Kristen |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
|
|
|