BCP and UCS-2 files. - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

SQL Server Administration (2008)

BCP and UCS-2 files.

Author

Topic

Mamsaac
Starting Member

9 Posts

Posted - 2011-05-25 : 18:32:46

So, tests I had been doing on importing where on a copy of the files I would be receiving. Funny enough, the copies I had where on ANSI encoding.

However, the real files are on UCS-2 (LE) format. When trying to work with it, I received an "Unexpected EOF" problem (which is understandable).

I'm using a format file (since I only import a subset of the table, other columns being generated by the DB itself). My question relies on how I can tell bcp that the imported data is in UCS-2 format and not ANSI.

Thanks!

robvolk
Most Valuable Yak

15732 Posts

Posted - 2011-05-25 : 19:09:07

If the advice here doesn't work for UCS-2:

http://msdn.microsoft.com/en-us/library/ms188289.aspx

You'll have to convert the data file to regular Unicode or ANSI text.

Mamsaac
Starting Member

9 Posts

Posted - 2011-05-25 : 19:21:10

Well, I got most of it working... All lines are imported with no issues EXCEPT for the first one. It seems that UCS-2 files start with 1 (or 2) bytes of extra data and bcp is not being able to say "oh, this is just the start of the file, I can ignore it", and those throws a message like the following:

Starting copy...
SQLState = 22018, NativeError = 0
Error = [Microsoft][SQL Native Client]Invalid character value for cast specification

2432110 rows copied.

Notice it only happens for the first row. If I manually change first row to any other row of the file, the new line that is at first row will fail.

Mamsaac
Starting Member

9 Posts

Posted - 2011-05-25 : 20:21:14

As hacky as it may seem, my "solution" was to preppend a dummy line to the file. That way the dummy line (which I don't care about) is not inserted and no data is lost. It works, at least... but I'm surprised I couldn't find a way to just do it with the tool.

Mamsaac
Starting Member

9 Posts

Posted - 2011-05-25 : 22:01:21

Just in case one day someone has the same issue with Unicode, this should save such person some time:

string tempFileName = prefix + Path.GetRandomFileName();
StreamReader reader = new StreamReader(fileName);
StreamWriter newFile = new StreamWriter(tempFileName, false, Encoding.Unicode);
string line = reader.ReadLine();
newFile.WriteLine(line);
reader.DiscardBufferedData();
reader.BaseStream.Seek(2, SeekOrigin.Begin);
reader.BaseStream.Position = 2;
while ((line = reader.ReadLine()) != null)
{
newFile.WriteLine(line);
}
reader.Close();
newFile.Close();

Using Unicode because bcp does NOT support UTF8, which is .NET default encoding. Seeking to position 2 because you don't want to write those 2 characters (that indicate the encoding) for a second time in the file, but only once (when copied first line).

newFile is the file I can process without losing any data.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources