ACushen
Starting Member
29 Posts |
Posted - 2006-09-22 : 23:40:55
Hi all! I am in need of some help importing a .CSV file into a SQL Server 2005 Express database. I can't use SSIS/DTS because it's SQL Server Express, and the Data Import Wizard doesn't work either, for reasons stated below. The operation needs to run as a parameterized stored proc which I will call from ASP.NET.

The problem is that I need to get the only 2 columns from the text file, then "tack on" 3 more columns whose data changes and comes in as parameters to the T-SQL stored proc. This prevents me from using a straight BULK INSERT. The data in each of these columns will be the same for each record in that column - that is, every record in column 3, for example, will be the same.

Some of the files will have 4 million records and up, so speed is of the essence here. I tried using BULK INSERT to dump the data into a temporary table - which took 38 seconds and was acceptably fast - but then my next step was to dump the additional data into the other columns using UPDATE ... SET. I gave up on this after the query ran for THIRTY MINUTES! My next step was going to be to move the data from the temporary table into the target permanent table somehow, but I never got that far when I saw how long the previous step took...

It's a little odd, because I can do the same thing in MS Access in under 5 minutes by using a SQL statement like this in my ASP.NET code:

INSERT INTO [Target_Table] (field1, field2, field3, field4)
SELECT F1, F2, strSource, Now() AS DateTimeStamp
FROM [Text;HDR=NO;DATABASE=c:\test;].[test.CSV]

strSource is a string that the user enters (properly vetted for security); strPath and strFilename are strings holding the path and filename of the .CSV file. I create a unique filename from the file the user uploads. This works in under 5 minutes for several million records in MS Access, as I said above. I've had no luck getting anything similar to work in SQL Server, though.

Can anyone suggest a way to do this at a reasonable speed? Is there a way to speed up the following approach:

1) Use BULK INSERT or similar to dump the .CSV data into a temp table
2) Use UPDATE to get the additional data into the temp table
3) Use INSERT ... SELECT to get the data from the temp table into the target permanent db table

Is there a faster way to do any of the above steps? Using ALTER TABLE perhaps?

HELLLLLLPPPP.......

Thanks in advance,
-Andrew
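As a rough illustration of step 1 (the part that took 38 seconds), the staging load might look something like this - a sketch only, with invented table, column, and path names:

-- Staging table holding just the two columns that come from the .CSV file
CREATE TABLE #ImportStaging (F1 varchar(50), F2 varchar(50))

-- TABLOCK helps the load qualify for minimal logging
BULK INSERT #ImportStaging
FROM 'c:\test\test.csv'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', TABLOCK)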
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-09-23 : 01:59:16
There's no need to update the temp table before you insert into the target table - you can put those three variable values into the INSERT, much like you did with Access. Assuming your three variables coming into the stored proc are @var1, @var2, and @var3, just do this:

INSERT [Target_Table] (col1, col2, col3, col4, col5)
SELECT tempCol1, tempCol2, @var1, @var2, @var3
FROM #TempTable

You should try dropping any indexes and constraints from the target table before you start and then adding them back again afterwards.
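Putting the staging load and the INSERT ... SELECT together, the whole operation could be wrapped in a parameterized procedure roughly like the sketch below. The procedure, table, and column names are invented for illustration, not the poster's actual schema, and because BULK INSERT will not accept a variable for the file name, that part has to go through dynamic SQL:

CREATE PROC dbo.ImportCsv
    @FilePath nvarchar(260),
    @Source   varchar(100),
    @Var2     varchar(100),
    @Var3     varchar(100)
AS
BEGIN
    -- Staging table for the two columns that come from the file
    CREATE TABLE #ImportStaging (F1 varchar(50), F2 varchar(50))

    -- BULK INSERT needs a literal file name, so build the statement dynamically;
    -- the temp table created above is still visible inside the dynamic batch
    DECLARE @sql nvarchar(1000)
    SET @sql = N'BULK INSERT #ImportStaging FROM ''' + @FilePath
             + N''' WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', TABLOCK)'
    EXEC sp_executesql @stmt = @sql

    -- Tack the three per-run values on while copying into the permanent table
    INSERT Target_Table (field1, field2, field3, field4, field5)
    SELECT F1, F2, @Source, @Var2, @Var3
    FROM #ImportStaging
END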
ACushen
Starting Member
29 Posts |
Posted - 2006-09-23 : 11:15:53
Wow. I've been struggling with this so long that I didn't see what was in front of my face... Thanks!! My 21MB file now imports in 6:40 (min:sec), which is MUCH better...

I will try dropping any indexes and constraints from the target table. Does anyone have any other ideas for speeding this monster up?

I have already added DATAFILETYPE = 'native' and TABLOCK to the BULK INSERT statement, and they helped. I tried SET RECOVERY BULK_LOGGED, but that seemed to actually increase the time the query took. Perhaps I need to run that test again...

Thanks again,
-Andrew
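For anyone trying the recovery-model change mentioned above, the full statements run around the load would be something like this - a sketch assuming a database called MyDb that normally runs in the FULL recovery model:

ALTER DATABASE MyDb SET RECOVERY BULK_LOGGED
-- ... run the BULK INSERT here ...
ALTER DATABASE MyDb SET RECOVERY FULL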
nr
SQLTeam MVY
12543 Posts |
Posted - 2006-09-23 : 11:33:41
You can also choose the columns to bulk insert by using a format file - see
http://www.nigelrivett.net/SQLTsql/BCP_quoted_CSV_Format_file.html

This would make the bulk insert faster, and also the insert into the final table, because it has to read fewer pages from disk. It won't make much difference with a small file but it will be appreciable with a large one.

If you are going to play with the table structure then you could set a default on the extra columns and bulk insert directly into the final table. If you don't want to do that, then faster than the insert would probably be:

bulk insert to table
bcp out to file from query including the new values
bulk insert to final table

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
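As a concrete illustration of the format-file idea, a non-XML format file that loads only the first two fields of a comma-delimited file might look like the sketch below. The column names are invented, the 9.0 version number assumes SQL Server 2005, and a field mapped to server column 0 is simply skipped (quoted fields need the more elaborate terminators described in the linked article):

9.0
3
1   SQLCHAR   0   0   ","      1   F1         SQL_Latin1_General_CP1_CI_AS
2   SQLCHAR   0   0   ","      2   F2         SQL_Latin1_General_CP1_CI_AS
3   SQLCHAR   0   0   "\r\n"   0   Unwanted   ""

It would then be referenced from the load with something like BULK INSERT ... WITH (FORMATFILE = 'c:\test\test.fmt').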
ACushen
Starting Member
29 Posts |
Posted - 2006-09-23 : 12:05:45
NR - Thanks, I am already using a format file.

As far as setting a default on the target table columns, the problem is that the data going into those columns is coming from a user-entered string. If I set a default then bulk insert the 2 columns from the .CSV directly to the target db table, how do I then get my user-entered value into just those columns without considerable gymnastics, or slowing the whole thing down?

I would like to avoid using bcp if possible.

Thanks,
-Andrew
nr
SQLTeam MVY
12543 Posts |
Posted - 2006-09-23 : 12:10:51
Just add the defaults before you start the insert and remove them afterwards.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
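In other words, something along these lines around the load - a minimal sketch, assuming the extra column is called SourceName on Target_Table (both names invented here):

ALTER TABLE Target_Table ADD CONSTRAINT DF_SourceName DEFAULT 'some value' FOR SourceName
-- ... BULK INSERT directly into Target_Table ...
ALTER TABLE Target_Table DROP CONSTRAINT DF_SourceName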
Kristen
Test
22859 Posts |
Posted - 2006-09-23 : 12:50:21
"Does anyone have any other ideas for speeding this monster up?"

6 minutes sounds like a very long time for a 21MB import to me ...

If the import file is pre-sorted in the order of the clustered index on the table, you can use the "ORDER" bulk copy hint - which alerts SQL Server to the fact that the file is in the same order as the clustered index. (If the ORDER clause specifies an order different from the clustered index, SQL Server will just ignore the hint.)

Kristen
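With the hint in place, the load might look like this - a sketch assuming the table being loaded has a clustered index on (F1, F2) and the file is sorted the same way (names invented, as before):

BULK INSERT dbo.Target_Table
FROM 'c:\test\test.csv'
WITH (FIELDTERMINATOR = ',',
      ROWTERMINATOR = '\n',
      TABLOCK,
      ORDER (F1 ASC, F2 ASC))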
ACushen
Starting Member
29 Posts |
Posted - 2006-09-26 : 11:45:59
Thanks all, I will try a few of your ideas.

-Andrew
ACushen
Starting Member
29 Posts |
Posted - 2006-09-27 : 14:19:36
quote: Originally posted by nr

Just add the defaults before you start the insert and remove them afterwards.

UPDATE: I have changed the table so that I now only need to add one value to the target table. I am trying to set that value as a default, so I can BULK INSERT directly into the target table. But...

nr: Remember I am calling the stored proc which does the .CSV import from an ASP.NET website, so I can't manually add & remove the defaults. It is my understanding that using

ALTER TABLE tableName
ALTER COLUMN colName
SET DEFAULT defaultValue

requires a constant value for defaultValue; I need to use a user-entered value. I have tried building the above statement dynamically with EXEC and sp_executesql, but I get an error like this:

"Procedure sp_executesql, Line 1
Procedure expects parameter '@statement' of type 'ntext/nchar/nvarchar'."

Is it possible to set a user-entered value as a default??

-Andrew
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-09-27 : 14:45:12
Yes you can, here's an example:

CREATE TABLE Test (one int CONSTRAINT ColOneDefault DEFAULT 0)
GO
CREATE PROC AddsDefault (@def int)
AS
DECLARE @sql nvarchar(1000)
SET @sql = 'ALTER TABLE Test ' +
    'ADD CONSTRAINT ColOneDefault DEFAULT ' + cast(@def AS varchar(10)) + ' FOR one'
ALTER TABLE Test
DROP CONSTRAINT ColOneDefault
EXEC sp_executesql @stmt = @sql
GO
EXEC AddsDefault 85
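Since the value in this thread is a user-entered string rather than an int, the dynamic SQL would need the value quoted. A sketch of that variant is below - the table, constraint, and column names are invented, and note that QUOTENAME only accepts input up to 128 characters:

DECLARE @strSource varchar(100), @sql nvarchar(1000)
SET @strSource = 'value typed in by the user'

-- QUOTENAME with '''' wraps the value in single quotes and doubles any embedded ones
SET @sql = 'ALTER TABLE Target_Table ADD CONSTRAINT DF_SourceName DEFAULT '
         + QUOTENAME(@strSource, '''') + ' FOR SourceName'
EXEC sp_executesql @stmt = @sql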
ACushen
Starting Member
29 Posts |
Posted - 2006-09-27 : 18:06:33
snSQL: Thanks, I had finally gotten that to work using EXEC(), and forgotten to post. When I have a few minutes I will take your code and try to convert my proc to work with it.

To all - I have determined that doing a BULK INSERT directly to the target table will be too slow; I already have data in the table, and a clustered index exists, so setting the recovery mode to BULK_LOGGED has no effect. As a result, the direct BULK INSERT is painfully slow. Not to mention that my log file balloons from 128MB to over 10GB...

I have discovered that, using a combination of methods suggested by helpful folks on this and other forums, I can get my 4-million-row test file into a temporary table in approximately 5 seconds, which is "plenty fast". Unfortunately, my INSERT INTO ... SELECT method of getting the data from the temp table into the target permanent table is taking upwards of 10 minutes...

Any advice on the fastest way to get all the records from the temp table into the target table?

Thanks again...
-Andrew
ACushen
Starting Member
29 Posts |
Posted - 2006-09-27 : 23:05:47
UPDATE: "Things are really getting weird" dept....

OK. So I have played around with a few different options and done some tests. As a test file, I used one .CSV file that is about 21.5 MB, containing 3.68 million records. In between each test, I wiped the transaction log back to its starting size. I also deleted the table at the end of each run, re-creating it from the same script.

I tested the following 3 scenarios:

- Run 1 - BULK INSERT into temp table, then INSERT ... SELECT into target table; recovery set to BULK_LOGGED temporarily
- Run 2 - BULK INSERT directly into target table, setting a temporary default on one column; recovery set to BULK_LOGGED temporarily
- Run 3 - BULK INSERT directly into target table, setting a temporary default on one column; recovery set to SIMPLE temporarily

Here are the results (min:sec):

Run 1: 03:58
Run 2: 09:57
Run 3: 04:19

I have to say that I am really surprised that dumping the data into a temp table and then moving it to the target table was so much faster than the BULK_LOGGED version that dumped the data directly into the target table! Anyone have any thoughts?

Also, I am still in search of a faster way to get the data from the temp table into the target table. INSERT ... SELECT doesn't seem like it's that quick... and while the 3:58 import is fast enough, some of the files I will need to import are more than 3 times the size of the test file I used above, and fifteen minutes for a single import is too long...

Thanks again everyone,
-Andrew
Kristen
Test
22859 Posts |
Posted - 2006-09-28 : 00:06:21
Have you tried with your CSV file pre-sorted in clustered-index order? (You need to provide a hint to BULK INSERT too.)

Is it the 15 minutes for a single transaction that's the problem, or the fact that it takes 15 minutes to get the data in? I mean: could you transfer the data from the staging table to the target table in batches, so that it wasn't a single lock-out transaction?

Kristen
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-09-28 : 00:23:52
Are you dropping indexes and dropping or turning off constraints on the target before you INSERT ... SELECT? Is there a clustered index on the target table? If so, try setting the fillfactor very low on that index before the INSERT ... SELECT.
ACushen
Starting Member
29 Posts |
Posted - 2006-09-28 : 15:24:32
Kristen - Unfortunately the target table is not sorted at all - records are entered as the users enter them, pretty much randomly, and the clustered index is set on an identity column. The .CSV file is sorted by the 2 columns that are in it, but there is no column corresponding to the identity column in the target table (if that makes any sense--?)

As far as the 15 minutes - the entire import is happening via an ASP.NET web page. The user selects a .CSV file from his/her PC, and enters some text to identify it. The file is then uploaded to the server, my code re-names it with a unique name, and then the import starts via stored procedure. I am using a background thread in ASP.NET to do the upload/stored proc call, and the web page keeps re-loading via an HTML META REFRESH, checking the thread to see if it's finished. So I don't see how I can do the upload in batches yet still keep the user informed - it's a complex situation and perhaps not the best design, but the client wanted a web interface, so...

Basically, I want to get the time of the entire operation down to as short a period as possible, so the users don't have to sit there with a browser window open any longer than necessary.

snSQL: I have not yet tried your suggestions. I will tackle them next and get back to you.

Thanks,
-Andrew
Kristen
Test
22859 Posts |
Posted - 2006-09-28 : 15:37:48
OK, then "Option One" from your benchmarks is probably your best bet. Organising your temp table to have a PK / clustered index on the CSV's 2 sorted columns (and providing the appropriate BULK INSERT hint) may help get it into the temp staging table a bit quicker.

If the transfer to the main table is then too long (by which I mean it is locking other people out for too long), consider transferring it in batches - which won't work if the job needs to be atomic, of course. Add an identity column to your TEMP table, and then you can do something like:

DECLARE @intID int, @intMaxID int

SELECT @intID = 0
SELECT @intMaxID = MAX(MyIdentity) FROM TempTable

WHILE @intID < @intMaxID
BEGIN
    SELECT @intID = @intID + 10000

    INSERT INTO MainTable
    SELECT * FROM TempTable WHERE MyIdentity <= @intID

    DELETE TempTable WHERE MyIdentity <= @intID

    WAITFOR DELAY '000:00:05'
END

Kristen
ACushen
Starting Member
29 Posts |
Posted - 2006-09-28 : 16:03:58
snSQL: There is only one index on the table - the clustered index. I will try dropping constraints.

How do I programmatically change the fillfactor from a stored proc??

-Andrew
snSQL
Master Smack Fu Yak Hacker
1837 Posts |
Posted - 2006-09-28 : 16:43:29
quote: How do I programmatically change the fillfactor from a Stored Proc??

You use DBCC DBREINDEX, but you don't want to do that in the sp that runs every time - it rebuilds the entire index, which will itself take a while. You want to just run it periodically, so that the index pages have more free space in them for when the new records are added.
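For reference, a periodic rebuild with a low fillfactor might look like this - a sketch only, with an invented index name (a fillfactor of 50 leaves half of each page free for incoming rows):

-- Rebuild the clustered index on the target table with 50% free space per page
DBCC DBREINDEX ('dbo.Target_Table', 'PK_Target_Table', 50)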
ACushen
Starting Member
29 Posts |
Posted - 2006-09-28 : 16:57:12
UPDATE: I think I am now going to mark this one closed, with many thanks to all who participated.

I have now got the insert going directly into the target table to run in 1:28 (min:sec). That's fast enough for me! Plus all the time I've spent tuning the insert comes out of MY wallet - I can't charge the client.

The key was noticing that the only index on the table was the clustered index on the identity column. I played around with the indexes and found that putting a single index on BOTH of the 2 columns I'm inserting did the trick (the index was placed on the target table in the db). Interestingly, the no-temp-table approach worked faster than the temp-table approach - the opposite of everything I saw up to this point.

I will continue playing around with the indexes, but I think I can do that without any assistance.

Thanks again to all who took time from their personal lives to help me with this one. It was really driving me crazy!

-Andrew
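For completeness, the kind of index described above would be created along these lines - a sketch with invented table and column names, since the actual schema isn't shown in the thread:

-- Single composite index covering both of the columns loaded from the .CSV file
CREATE INDEX IX_Target_Table_F1_F2 ON dbo.Target_Table (field1, field2)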
ACushen
Starting Member
29 Posts |
Posted - 2006-09-28 : 17:26:01
snSQL: Ahhh, thanks. I was wondering, because the info I found in Books Online made it sound like something not to be done lightly...

BTW, is there any way to "officially" mark a thread closed in this forum, other than just saying so in a post?

Thanks,
-Andrew
Kristen
Test
22859 Posts |
Posted - 2006-09-29 : 01:26:34
"BTW, is there any way to "officially" mark a thread closed in this Forum, other than just saying so in a post?"

Some people re-edit the subject and append "[SOLVED]" - that probably helps Googlers.

Glad you got it solved - the timings now look like what I would expect for that size of import.

Kristen