 bulk load stalls


ps2
Starting Member

3 Posts

Posted - 2005-10-22 : 10:43:04
Hi,

There seem to be some really knowledgeable people on this forum so I'm hoping you'll be equally helpful on this post.

I'm working with two fundamentally equivalent databases: one in development with approx 6 million rows, and one in production with approx 70 million rows. This is a data warehouse application where I need to load roughly 145K rows an hour (24x7) using bcp (slow, unfortunately, because I have two indexes that I can't lose). On the development machine the load takes roughly 30 - 40 seconds, which I can live with. In production it can take as much as 45 minutes, although it's usually closer to 30 minutes.

What is most disturbing about the production DB (apart from the fact that it is unusable at the moment) is that the bcp appears to stall about halfway through the load, locking out all other transactions until it completes. The first 85K rows load almost immediately, then it just sits there for about 30 minutes or more, and then the last rows also load quickly. It always stalls at 85K, so I'm thinking that might have something to do with it. There is plenty of room in the transaction log, and yet for some reason I'm inclined to think that the log or virtual log files have something to do with this. Both machines are running in simple recovery mode. There are some minor hardware differences: the production server has more CPU power, and its disks are probably a touch slower than development's, although probably nothing substantial.
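For reference, the hourly load is a plain bcp in, roughly like the line below (the server, database, table, and file names here are placeholders rather than the real ones):

    bcp MyDW.dbo.FactTable in hourly_load.dat -S PRODSRV -T -c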

Any ideas? Suggestions?

Thanks!

robvolk
Most Valuable Yak

15732 Posts

Posted - 2005-10-22 : 11:18:34
bcp and BULK INSERT are designed to be minimally logged operations, so your transaction log should not show much growth. Simple recovery will also help here. However, since you have indexes on the table you will incur some logging that cannot be avoided. You can always drop the indexes, bulk load the data, and then recreate them afterwards. I do this quite a bit and the improvement is quite significant. You are definitely not seeing bcp's best performance with these indexes in place.
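A rough sketch of that pattern, with made-up table, index, column, and file names:

    -- drop the nonclustered index before the load
    DROP INDEX FactTable.IX_FactTable_Other

    -- bulk load; TABLOCK lets SQL Server take advantage of minimal logging
    BULK INSERT dbo.FactTable
    FROM 'D:\loads\hourly_load.dat'
    WITH (FIELDTERMINATOR = '\t', ROWTERMINATOR = '\n', TABLOCK)

    -- recreate the index afterwards
    CREATE INDEX IX_FactTable_Other ON dbo.FactTable (SomeColumn)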

The stall in bcp can be due to file growth in either the data or log file(s), if the database is set to autogrow. Check to see if the database is set to autoshrink; if it is, turn it off. This will keep the files from being shrunk, and therefore reduce the instances where they need to grow again.
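You can check and turn off auto-shrink with something like this (substitute your database name for MyDW):

    -- returns 1 if auto-shrink is currently on
    SELECT DATABASEPROPERTYEX('MyDW', 'IsAutoShrink')

    ALTER DATABASE MyDW SET AUTO_SHRINK OFF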

Try these first, and if you're still seeing the problem, try setting a batch size and see if it helps.
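With bcp the batch size is the -b switch, which commits every N rows as a separate transaction, e.g. (placeholder names again):

    bcp MyDW.dbo.FactTable in hourly_load.dat -S PRODSRV -T -c -b 10000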

ps2
Starting Member

3 Posts

Posted - 2005-10-22 : 11:59:30
Rob,

Thanks very much for the response. I switched the production database to simple recovery because under full recovery it was doing a transaction log backup right about the time of the stall. I tried switching to bulk-logged and it immediately dumped a 13GB transaction log backup, although I have no idea why. I thought switching to simple would resolve it, but again it stalled at 85K.
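(For what it's worth, the recovery-model switches were just the standard ALTER DATABASE commands, something like the following, with MyDW standing in for the real database name:)

    ALTER DATABASE MyDW SET RECOVERY BULK_LOGGED
    ALTER DATABASE MyDW SET RECOVERY SIMPLE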

Dropping and re-creating the indexes probably isn't an option, for a few different reasons:

a) The unique / clustered index takes a while to build (it took a very long time last time; again, I'm not sure why).
b) I need the unique index to prevent duplicates during the load.

Auto-extend or auto-shrink isn't contributing to the problem: the data file is 40GB and the database is only about 17GB. The transaction log is probably oversized at 10GB, but hopefully isn't causing any problems.

I've tried setting a batch size on development but found there was little difference in the total elapsed time - is there anything you can recommend?

Any other thoughts?

Thanks again

robvolk
Most Valuable Yak

15732 Posts

Posted - 2005-10-22 : 12:07:32
The next suggestion I'd make is to try splitting up the file you're importing into smaller ones, say 50,000 rows each, and see if they go in without problems. You may even want to get down to 25,000 row files.
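If you don't want to physically split the file, bcp's -F (first row) and -L (last row) switches will load it in slices; a sketch with placeholder names:

    bcp MyDW.dbo.FactTable in hourly_load.dat -S PRODSRV -T -c -F 1 -L 50000
    bcp MyDW.dbo.FactTable in hourly_load.dat -S PRODSRV -T -c -F 50001 -L 100000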

If you only have a clustered index on the table, you can leave it in place and use the ORDER and TABLOCK hints for your bcp load. These hints are documented in Books Online. The ORDER hint only works when the data file is ordered the same way as the table though. Non-clustered indexes would need to be dropped to get the best performance.
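With bcp those hints go in the -h switch; a sketch assuming the clustered index is on a column called LoadDate (placeholder names again):

    bcp MyDW.dbo.FactTable in hourly_load.dat -S PRODSRV -T -c -h "TABLOCK, ORDER(LoadDate ASC)"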

ps2
Starting Member

3 Posts

Posted - 2005-10-24 : 07:24:21
Rob,

Thanks again. Right now I have two indexes, the clustered index and one other. I tried sorting the file but found it made no difference, perhaps the other index was to blame or it's possible that I did something wrong (bcp doesn't provide a lot of valuable feedback).

I had tried a smaller load on Friday - I forget the exact numbers (I don't have them in front of me), but it was still something like 50K rows in 10 minutes. The problem is that the table is locked the entire time, and this is supposed to be a DSS application with online users all day long (actually 8am - 6pm mostly). I can't have three 10-minute periods every hour where they can't get access to the data.

I have played around with loading the data into a second, empty table; this typically runs in a fraction of the time in development (although it does take a little longer with each hour of the day). My thought was to split the data between the two tables, querying from both / either as necessary. The changes to the application aren't too difficult, but I'm concerned that I won't be able to get the data from the old table into the new table efficiently enough to make an 8am cutoff. Mind you, if I can drop the indexes in the middle of the night it may not be such a big deal to load the 3M or so rows using fast bcp.
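(The split-table idea would amount to something like a UNION ALL view over the two tables, with made-up names here, and the application querying the view instead of the base table:)

    CREATE VIEW dbo.FactAll
    AS
    SELECT * FROM dbo.FactHistory
    UNION ALL
    SELECT * FROM dbo.FactCurrent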

I'm starting to get concerned that there are some fundamental issues with the design. Surely I'm not the first to run a 100M-row database; I actually didn't think performance would be this bad at 70M rows.

Any other thoughts? I'm willing to explore some more radical changes to the database if necessary.

Thanks again!

P.S. I tried to post this yesterday but the site was down.
   
