Fun with bulk insert. - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Fun with bulk insert.

Author

Topic

dukey07
Starting Member

16 Posts

Posted - 2006-02-06 : 12:23:10

So I found a nasty bug today, and I thought I would share. Testing alerted me to an instance where the first record was being skipped in some data that was being loaded from a text file via the DTS Bulk Insert task. Sure enough this was indeed the case. I took it over to t-sql and QA to see if I could duplicate it and sure enough it was bad there too. Now here is the funny part. The file had a header, and I was telling bulk insert to skip the header and start on line two. Here is an example of what the file looked like

Header Data1|Data2|Data4|Data5|Data6
Data line 1 1|2|3|4|5|Six
Data line 2 2|3|4|5|6|Seven

As you can see the header file has one less argument then the data does. This is the issue. The Bulk Insert never fails, because the first row is skipped. However, for some reason it tries to still parse that first row and this somehow causes it to skip the first row of actual data. I also noticed that if the header row has too many arguments, then it seems to process just fine. Anyways, we fixed our headers to get around the problem, and I thought I would alert anyone else about this weird behavior.

nosepicker
Constraint Violating Yak Guru

366 Posts

Posted - 2006-02-06 : 18:58:59

I don't know if I would call this a bug. I would characterize it more as a known limitation of bulk insert/bcp. The advantage of bulk insert/bcp is its speed. The disadvantage is that it's not very "intelligent" in that it doesn't allow for much variation. It has a very simple algorithm: it mainly just looks for column delimiters and row delimiters. That's about it. That's why things like headers and footers will generally screw it up because they will almost always have data that doesn't fit the expected target table. To start at a particular line number, bulk insert/bcp still has to parse through column and row delimiters. Therefore, it's always a best practice to remove headers and footers before bulk loading data.

nr
SQLTeam MVY

12543 Posts

Posted - 2006-02-06 : 20:08:03

It's a feature of bcp. If a row is short it will take data from the next row to complete it. If this doesn't cause a type mismatch then it will be happy and you will just have odd data and rows missing.
It's why you should always have an expected rowcount or some other control for a bcp (or any transfer really).

There are other oddities.
bcp into a text field with no delimiter will fail if the size of the file is divisible by 4
(I check and add a couple of characters if it is).
The order of rows in the file is not preserved.

There are others but I can never remember until I come across them.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources