rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-26 : 15:18:14
|
I am generating lots of files sequentially like this:

    exec xp_cmdshell 'bcp "select .. from ... where ..." queryout "C:\...\xxx0001.sql" -c -C RAW -T'
    exec xp_cmdshell 'bcp "select .. from ... where ..." queryout "C:\...\xxx0002.sql" -c -C RAW -T'
    ...
    exec xp_cmdshell 'bcp "select .. from ... where ..." queryout "C:\...\xxx0999.sql" -c -C RAW -T'

Now bcp is pretty fast, but I wonder what other techniques could be used to generate the files as fast and efficiently as possible. Could a custom program that reads the database and generates the files be faster? At the moment 300 files take approx 50 sec. It would be nice if I could cut it to 10 sec or less.

rockmoose |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2005-01-26 : 16:15:04
|
| >> Could a custom program that reads the database and generates the files be faster?

Almost certainly not. Using native format would be faster, but I guess you need tab-delimited. I suspect that most of the time is spent on making the connections, as it has to log in and drop the connection each time.

Maybe if you ran them in parallel rather than sequentially? As long as the queries don't cause each other to slow down. You could do this by creating a job for each command (or a batch of commands), then starting them.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy. |
 |
|
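The run-in-parallel idea above can be sketched outside of SQL Agent jobs as well. Below is a minimal, hypothetical illustration in Python: a batch of already-built command lines is fanned out over a small pool of worker threads, each of which blocks on its own child process. The commands shown are placeholders, not the poster's actual bcp invocations.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_parallel(commands, max_workers=4):
    """Run a batch of external commands concurrently; return their exit codes in order."""
    def run_one(cmd):
        # Each worker thread blocks on one child process, so up to
        # max_workers bcp.exe (or any other) processes run at once.
        return subprocess.run(cmd, capture_output=True).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))

if __name__ == "__main__":
    # Placeholder commands; in the thread's scenario each entry would be
    # a full 'bcp "select ..." queryout ...' command line.
    cmds = [["echo", f"exporting file {i}"] for i in range(8)]
    print(run_parallel(cmds))
```

SQL Agent jobs achieve the same effect server-side; the pool size here plays the role of the number of concurrent jobs.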
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2005-01-26 : 16:33:33
|
| Haven't we come a long way! 300 database files in 50 secs! It used to take that long to spin up a 5 1/4 inch drive.

The update of the BCP GUI I wrote does batch BCP'ing by spawning a thread for each BCP command. I cannot remember off the top of my head what the max thread count is... 24?

But if this is a fixed process, creating a SQL job for each is a fantastic idea... Nice, Nigel.

DavidM
"Always pre-heat the oven" |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2005-01-26 : 16:59:31
|
| You're kidding, right? bcp is the fastest way out that I know...

Brett

8-) |
 |
|
|
nr
SQLTeam MVY
12543 Posts |
Posted - 2005-01-26 : 17:00:48
|
| Doesn't have to be fixed.

I seem to recall writing something a while ago to test SPs running in parallel. It would take each command from a table, create a job for each, then run them. It would keep on running them for a specified number of times, logging the start and end times of each. You could use a similar thing to run these bcp's. Or I wonder if the stress testing app would do it?

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy. |
 |
|
|
MichaelP
Jedi Yak
2489 Posts |
Posted - 2005-01-26 : 17:10:44
|
| 300 files in 50 seconds seems pretty darn fast to me, man. The jobs idea is a good one, but some hardware might be in order here as well, esp. as your database grows.

Michael
<Yoda>Use the Search page you must. Find the answer you will.</Yoda> |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-26 : 17:48:32
|
>> Using native format would be faster but I guess you need tab delimited

I did not notice any difference between using -c / -n; it's just one column that is bcp'ed out.

>> I suspect that most of the time is spent on making the connections as it has to login and drop the connection each time

Just using QA at the moment; eventually a number of clients can log in and trigger the bcp processing, which runs in a proc.

>> some hardware might be in order here as well

Agree... ;)

>> You're kidding right? bcp is the fastest way out that I know...

I wanted faster! I was thinking of maybe bcp'ing all the data out in one chunk and having a file-splitting program split the "large" file into "smaller" files, or whatever.

FWIW here is a sample code concept from the bcp proc:

    set nocount on

    -- table of bcp commands to do
    select bcpcmd = 'bcp "select t.datacol from tab t (nolock) where t.schema = '
                  + ltrim(number) + ' order by t.ordercol" queryout "C:\...\XXX'
                  + ltrim(number) + '.sql" -c -C RAW -T'
    into #bcp
    from master.dbo.spt_values
    where type = 'p' and number / 100 = 1

    declare @cmd varchar(8000)

    -- loop through the bcp table and bcp the code
    declare csr_bcp cursor fast_forward for
        select bcpcmd from #bcp
    open csr_bcp
    fetch next from csr_bcp into @cmd
    while @@fetch_status = 0
    begin
        select @cmd as [bcpthis...]
        -- exec master..xp_cmdshell @cmd, no_output
        fetch next from csr_bcp into @cmd
    end
    close csr_bcp
    deallocate csr_bcp
    drop table #bcp

I will check more on the parallel execution / several jobs track; I think that would speed up the process. Thanks for the suggestion!

btw, minimal performance is gained by losing the cursor

rockmoose |
 |
|
|
MichaelP
Jedi Yak
2489 Posts |
Posted - 2005-01-26 : 17:51:39
|
| Parallel execution might put lots of stress on your system, so take that into account when exploring that option. Running this thing at night might not be bad, but midday might slow things down for users, even with nolock.

Michael
<Yoda>Use the Search page you must. Find the answer you will.</Yoda> |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-26 : 18:02:30
|
quote: Originally posted by MichaelP

    Parallel execution might put lots of stress on your system, so take that into account when exploring that option. Running this thing at night might not be bad, but midday might slow things down for users, even with nolock.

The purpose of the system is this: users enter metadata, and files are generated from the metadata. The time to process the metadata is very small; the time to generate the physical files is very large. So that's what I am trying to optimize.

rockmoose |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-29 : 07:25:39
|
| Parallel execution roughly doubles the throughput. Thanx.

rockmoose |
 |
|
|
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2005-01-29 : 09:15:11
|
| >> I cannot remember off the top of my head what the max thread count is... 24

Typically registry settings allow a maximum of 25 worker threads. The numbers can be interrogated using the GetAvailableThreads and GetMaxThreads methods in the System.Threading namespace. But the .NET threadpool (managed virtual threading pool) will accept an unlimited number of threads.

We just wrote a .NET console app for FTPing files using the threadpool. We look at the number of required processes (files), divide the process into 15 threads, and pool those - we are seeing some impressive throughput gains. It was taking about 8 seconds to bring down a single file, and now we are receiving ~8 files every 3 seconds. |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-30 : 16:47:46
|
| I'd suggest comparing to this VB SQL-DMO code:

    Dim srv As New SQLDMO.SQLServer, bcp As New BulkCopy
    Dim db As SQLDMO.Database, i
    srv.EnableBcp = True
    srv.Connect ".", "sa", "pwd"
    Set db = srv.Databases("myDB")
    bcp.UseExistingConnection = True
    For i = 1 To 3
        bcp.DataFilePath = "D:\t" & i & ".txt"
        db.Tables("t" & i).ExportData bcp
    Next i
    Set bcp = Nothing
    Set db = Nothing
    srv.Disconnect
    Set srv = Nothing |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-30 : 18:19:58
|
| The SQL-DMO BulkCopy object can only take a table or view as source, not a select statement. You are suggesting an alternative using DMO, but I can't see how that would be faster than bcp?

>> bcp.UseExistingConnection = True

Is the point having clients run the export process in parallel through many connections?

rockmoose |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-30 : 19:05:32
|
| > an alternative using DMO,
> but I can't see how that would be faster than bcp ?

I think they are essentially the same (internally) thing. I just thought... if Nigel's right and

> most of the time is spent on making the connections...

then why not, if we can use only one connection for bulkcopying?

    For i = 1 To 300
        Shell "c:\mssql7\binn\bcp " & _
            "myDB..t" & i & " out D:\temp\t" & i & ".txt -c -Sz -Usa -P""", vbHide
    Next i |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-30 : 19:45:42
|
| PS: the Shell function works in async mode. |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-31 : 02:59:58
|
| Interesting. Each xp_cmdshell 'bcp ....' takes approx 300 ms; I don't know how much of that time is spent on setting up the connections. Right now I have 4 jobs doing the bcp in parallel, doing about 10-12 logins & logouts/sec, 1 per file being exported.

The BulkCopy.UseExistingConnection = True could help with the connection issues, but it's a pity one has to provide a table/view for the BulkCopy to operate on (the data is in one table).

rockmoose |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-31 : 03:09:54
|
I've tested it: DMO BulkCopy vs "pure" bcp.exes. The tables involved, t1-t100, are just copies of the table Authors from the pubs db.

VB code:

    Dim srv As New SQLDMO.SQLServer, bcp As New BulkCopy
    Dim db As SQLDMO.Database, i
    srv.EnableBcp = True
    srv.Connect "server", "sa", ""
    Set db = srv.Databases("pubs")
    bcp.UseExistingConnection = True
    For i = 1 To 100
        bcp.DataFilePath = "C:\temp\t" & i & ".txt"
        db.Tables("t" & i).ExportData bcp
    Next i
    Set bcp = Nothing
    Set db = Nothing
    srv.DisConnect
    Set srv = Nothing

    For i = 1 To 100
        Shell "c:\mssql7\binn\bcp " & _
            "pubs..t" & i & " out C:\temp\tt" & i & ".txt -c -Sserver -Usa -P""", vbHide
    Next i

Info from Profiler:

    Event Class   Application Name       Connection ID   Start Time     End Time
    Connect       SQLDMO_1               233             09:26:21.060
    Disconnect    SQLDMO_1               233             09:26:21.060   09:26:27.467
    Connect       Microsoft SQL Server   234             09:26:28.187
    -- here multiple bcp.exe connects/disconnects (in quite messy order)
    Disconnect    Microsoft SQL Server   330             09:26:50.873   09:26:53.060

Any comments? |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-31 : 03:16:45
|
| oops! rocko, I missed your latest post... a bit busy at the moment. |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-31 : 13:52:02
|
| Thank you stoad, test results are coming up.... It looks like SQLDMO BulkCopy with UseExistingConnection is the winner with my test setup.

rockmoose |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-31 : 13:56:16
|
Exporting pubs..Authors table 100 times. Test results are in:

    ---------------------------------------------------------
    DMO BulkCopy UseExistingConnection = True      2 sec
    DMO BulkCopy UseExistingConnection = False     3 sec
    Async BCP with Shell command                   5 sec
    Sequential BCP with xp_cmdshell               17 sec
    ---------------------------------------------------------

Detailed test results:

DMO BulkCopy UseExistingConnection = True: 1.9375 sec

    EventClass     Application            Start                    Finish
    ---------------------------------------------------------------------------------------
    Audit Login    SQLDMO_1               2005-01-31 19:21:56.727
    Audit Logout   SQLDMO_1               2005-01-31 19:21:56.727  2005-01-31 19:21:58.680

DMO BulkCopy UseExistingConnection = False: 3.234375 sec

    EventClass     Application            Start                    Finish
    ---------------------------------------------------------------------------------------
    Audit Login    SQLDMO_1               2005-01-31 19:23:37.133
    ...
    Audit Logout   SQLDMO_101             2005-01-31 19:23:40.367  2005-01-31 19:23:40.383
    Audit Logout   SQLDMO_1               2005-01-31 19:23:37.133  2005-01-31 19:23:40.383

BCP Shell command: 5.34375 sec

    EventClass     Application            Start                    Finish
    ---------------------------------------------------------------------------------------
    Audit Logout   Microsoft SQL Server   2005-01-31 19:25:59.290  2005-01-31 19:25:59.320
    ...
    Audit Logout   Microsoft SQL Server   2005-01-31 19:26:05.337  2005-01-31 19:26:05.367

xp_cmdshell loop with bcp: 17.70 sec

    EventClass     Application            Start                    Finish
    ---------------------------------------------------------------------------------------
    Audit Logout   Microsoft SQL Server   2005-01-31 19:32:50.117  2005-01-31 19:32:50.117
    ...
    Audit Logout   Microsoft SQL Server   2005-01-31 19:33:07.630  2005-01-31 19:33:07.630

The test code (2 VBScript programs and a T-SQL loop) as follows:

    Dim srv, bcp, db, i, StartTime, var
    var = MsgBox("UseExistingConnection", 4)
    Set srv = CreateObject("SQLDMO.SQLServer")
    Set bcp = CreateObject("SQLDMO.BulkCopy")
    Set db = CreateObject("SQLDMO.Database")
    srv.EnableBcp = True
    srv.LoginSecure = True
    srv.Connect
    Set db = srv.Databases("pubs")
    If var = 6 Then 'Yes clicked
        bcp.UseExistingConnection = True
    Else
        bcp.UseExistingConnection = False
    End If
    StartTime = Timer
    For i = 1 To 100
        bcp.DataFilePath = "C:\temp\t" & i & ".txt"
        db.Tables("authors").ExportData bcp
    Next
    Set bcp = Nothing
    Set db = Nothing
    srv.DisConnect
    Set srv = Nothing
    If var = 6 Then
        MsgBox Timer - StartTime & " seconds - bulk UseExistingConnection = True"
    Else
        MsgBox Timer - StartTime & " seconds - bulk UseExistingConnection = False"
    End If

    ---------------------------------------------------------------------------------------

    Dim StartTime
    StartTime = Timer
    For i = 1 To 100
        CreateObject("WScript.Shell").Run "bcp pubs..authors out C:\temp\t" & i & ".txt -c -C RAW -T", 0
    Next
    MsgBox Timer - StartTime & " seconds - bcp shell"

    ---------------------------------------------------------------------------------------

    set nocount on
    declare @i int, @d datetime, @cmd nvarchar(256)
    select @i = 1, @d = getdate()
    while @i < 101
    begin
        set @cmd = N'bcp pubs..authors out C:\temp\t' + ltrim(@i) + N'.txt -c -C RAW -T'
        exec master..xp_cmdshell @cmd, no_output
        set @i = @i + 1
    end
    select datediff(millisecond, @d, getdate()) / 1000.0 as [seconds - xp_cmdshell loop]

rockmoose |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-31 : 15:00:31
|
| great, rocko! pity it only works with tables & views. |
 |
|
|