Forum: SQL Server Development (2000)
Topic: lots of bcp


rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-26 : 15:18:14
I am generating lots of files sequentially like this:

exec xp_cmdshell 'bcp "select .. from ... where ..." queryout "C:\...\xxx0001.sql" -c -C RAW -T'
exec xp_cmdshell 'bcp "select .. from ... where ..." queryout "C:\...\xxx0002.sql" -c -C RAW -T'
.
.
.
exec xp_cmdshell 'bcp "select .. from ... where ..." queryout "C:\...\xxx0999.sql" -c -C RAW -T'


Now, bcp is pretty fast,
but I wonder what other techniques could be used to generate the files as quickly and efficiently as possible.
Could a custom program that reads the database and generates the files be faster?

At the moment, 300 files take approx. 50 sec.
It would be nice if I could cut that to 10 sec or less.

rockmoose

nr
SQLTeam MVY

12543 Posts

Posted - 2005-01-26 : 16:15:04
>> Could a custom program that reads the database and generates the files be faster ?
Almost certainly not.

Using native format would be faster, but I guess you need tab-delimited.
I suspect that most of the time is spent making the connections, as it has to log in and drop the connection each time.
Maybe if you ran them in parallel rather than sequentially? As long as the queries don't cause each other to slow down.
You could do this by creating a job for each command (or a batch of commands) and then starting them.
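A minimal sketch of that job-per-command idea, using the msdb Agent procedures (the job name is a placeholder, the bcp command is the elided one from the first post, and @delete_level = 1 makes each job clean itself up on success):

```sql
-- create and start one Agent job per bcp command (sketch; names are placeholders)
declare @job varchar(128), @cmd varchar(4000)

select @job = 'bcp_xxx0001',
       @cmd = 'bcp "select .. from ... where ..." queryout "C:\...\xxx0001.sql" -c -C RAW -T'

exec msdb.dbo.sp_add_job @job_name = @job, @delete_level = 1   -- delete job after success
exec msdb.dbo.sp_add_jobstep @job_name = @job, @step_name = 'run bcp',
     @subsystem = 'CMDEXEC', @command = @cmd
exec msdb.dbo.sp_add_jobserver @job_name = @job, @server_name = '(local)'
exec msdb.dbo.sp_start_job @job_name = @job   -- returns immediately; the job runs async
```

sp_start_job returns as soon as the job is queued, so starting several of these back to back is what gives you the parallelism.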



==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2005-01-26 : 16:33:33
Haven't we come a long way! 300 database files in 50 secs! It used to take that long to spin up a 5 1/4 inch drive.

The update of the BCP GUI I wrote does batch BCP'ing by spawning a thread for each BCP command. I cannot remember off the top of my head what the max thread count is... 24?

But if this is a fixed process, creating a SQL job for each is a fantastic idea... Nice Nigel.



DavidM

"Always pre-heat the oven"

X002548
Not Just a Number

15586 Posts

Posted - 2005-01-26 : 16:59:31
You're kidding right?

bcp is the fastest way out that I know...



Brett

8-)

nr
SQLTeam MVY

12543 Posts

Posted - 2005-01-26 : 17:00:48
Doesn't have to be fixed.
I seem to recall writing something a while ago to test SPs running in parallel.
It would take each command from a table, create a job for each, then run them.
It would keep on running them for a specified number of times, logging the start and end times of each.

You could use a similar thing to run these bcps.
Or I wonder if the stress-testing app would do it?

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

MichaelP
Jedi Yak

2489 Posts

Posted - 2005-01-26 : 17:10:44
300 files in 50 seconds seems pretty darn fast to me, man.

The jobs idea is a good one, but some hardware might be in order here as well, especially as your database grows.

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-26 : 17:48:32
>> Using native format would be faster but I guess you need tab delimited
I did not notice any difference between using -c / -n; it's just one column that is bcp'd out.

>> I suspect that most of the time is spent on making the connections as it has to login and drop the connection each time
Just using QA at the moment, eventually a number of clients can login and trigger the bcp processing, which runs in a proc.

>> some hardware might be in order here as well
Agree... ;)

>> You're kidding right? bcp is the fastest way out that I know...
I wanted faster!
I was thinking of maybe bcp'ing all the data out in one chunk and having a file-splitting program split the "large" file into "smaller" files, or whatever.


FWIW, here is a sample code concept from the bcp proc.
set nocount on

-- table of bcp commands to run
select
    bcpcmd = 'bcp "select t.datacol from tab t (nolock) where t.schema = ' + ltrim(number)
           + ' order by t.ordercol" queryout "C:\...\XXX' + ltrim(number) + '.sql" -c -C RAW -T'
into #bcp
from master.dbo.spt_values
where type = 'p' and number / 100 = 1

declare @cmd varchar(8000)

-- loop through the bcp table and run each command
declare csr_bcp cursor fast_forward for
    select bcpcmd from #bcp

open csr_bcp
fetch next from csr_bcp into @cmd
while @@fetch_status = 0
begin
    select @cmd as [bcpthis...]
    -- exec master..xp_cmdshell @cmd, no_output
    fetch next from csr_bcp into @cmd
end

close csr_bcp
deallocate csr_bcp

drop table #bcp


I will check more on the parallel execution / several-jobs track; I think that would speed up the process.
Thanks for the suggestion!

BTW, minimal performance is gained by losing the cursor.
rockmoose

MichaelP
Jedi Yak

2489 Posts

Posted - 2005-01-26 : 17:51:39
Parallel execution might put lots of stress on your system, so take that into account when exploring that option. Running this thing at night might not be bad, but midday it might slow things down for users, even with nolock.

Michael

<Yoda>Use the Search page you must. Find the answer you will.</Yoda>

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-26 : 18:02:30
quote:
Originally posted by MichaelP

Parallel execution might put lots of stress on your system, so take that into account when exploring that option. Running this thing at night might not be bad, but midday it might slow things down for users, even with nolock.

The purpose of the system is this:
Users enter metadata, and files are generated from the metadata.
The time to process the metadata is very small; the time to generate the physical files is very large.
So that's what I am trying to optimize.

rockmoose

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-29 : 07:25:39
Parallel execution roughly doubles the throughput. Thanks.

rockmoose

ehorn
Master Smack Fu Yak Hacker

1632 Posts

Posted - 2005-01-29 : 09:15:11
>> I cannot remember off the top of my head what the max thread count is... 24

Typically the pool allows a maximum of 25 worker threads per CPU. The numbers can be interrogated using the ThreadPool.GetAvailableThreads and ThreadPool.GetMaxThreads methods in the System.Threading namespace.

But the .NET thread pool (a managed, virtual thread pool) will queue an unlimited number of work items.

We just wrote a .NET console app for FTPing files using the thread pool. We look at the number of required transfers (files) and divide the work across 15 pooled threads, and we are seeing some impressive throughput gains. It was taking about 8 seconds to bring down a single file; now we are receiving ~8 files every 3 seconds.

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-30 : 16:47:46
I'd suggest comparing with this VB SQL-DMO code:

Dim srv As New SQLDMO.SQLServer, bcp As New BulkCopy
Dim db As SQLDMO.Database, i

srv.EnableBcp = True
srv.Connect ".", "sa", "pwd"

Set db = srv.Databases("myDB")

bcp.UseExistingConnection = True

For i = 1 To 3
    bcp.DataFilePath = "D:\t" & i & ".txt"
    db.Tables("t" & i).ExportData bcp
Next i

Set bcp = Nothing
Set db = Nothing
srv.Disconnect
Set srv = Nothing

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-30 : 18:19:58
The SQL-DMO BulkCopy object can only take a table or view as its source, not a select statement.
You are suggesting an alternative using DMO, but I can't see how that would be faster than bcp?

>> bcp.UseExistingConnection = True
Is the point having clients run the export process in parallel through many connections?

rockmoose

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-30 : 19:05:32
> an alternative using DMO,
> but I can't see how that would be faster than bcp ?
I think they are essentially the same thing internally.

I just thought... if Nigel's right and
> most of the time is spent on making the connections...
then why not use only one connection for the bulk copying?


For i = 1 To 300
    Shell "c:\mssql7\binn\bcp " & _
        "myDB..t" & i & " out D:\temp\t" & i & ".txt -c -Sz -Usa -P""", vbHide
Next i

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-30 : 19:45:42
PS: the Shell function runs asynchronously.

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-31 : 02:59:58
Interesting.
Each xp_cmdshell 'bcp ...' takes approx. 300 ms; I don't know how much of that time is spent setting up the connections.
Right now I have 4 jobs doing the bcp in parallel; they're doing about 10-12 logins/logouts per second, 1 per file being exported.

BulkCopy.UseExistingConnection = True could help with the connection issue, but it's a pity one has to provide a table/view for the BulkCopy to operate on (the data is in one table).
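One possible workaround for the table/view restriction (just a sketch, with hypothetical view names, reusing the tab/datacol/schema pseudo-names from the proc earlier): generate one view per output file over the single table, and point ExportData at the views. Note that a plain view can't carry the ORDER BY, so the ordering from the original queries would be lost.

```sql
-- generate one view per "schema" number so DMO BulkCopy has an object to export
declare @i int, @sql varchar(1000)
set @i = 100
while @i < 200
begin
    set @sql = 'create view dbo.v_xxx' + ltrim(@i)
             + ' as select t.datacol from tab t where t.schema = ' + ltrim(@i)
    exec (@sql)
    set @i = @i + 1
end
```

The single DMO connection would then loop over db.Tables/Views("v_xxx...").ExportData the same way Stoad's code loops over t1-t100.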

rockmoose

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-31 : 03:09:54
I've tested it: DMO BulkCopy vs. "pure" bcp.exes.
The tables involved, t1-t100, are just copies of the authors table from the pubs db.
VB code and info from Profiler:

Dim srv As New SQLDMO.SQLServer, bcp As New BulkCopy
Dim db As SQLDMO.Database, i

srv.EnableBcp = True
srv.Connect "server", "sa", ""
Set db = srv.Databases("pubs")
bcp.UseExistingConnection = True

For i = 1 To 100
    bcp.DataFilePath = "C:\temp\t" & i & ".txt"
    db.Tables("t" & i).ExportData bcp
Next i

Set bcp = Nothing
Set db = Nothing
srv.Disconnect
Set srv = Nothing

For i = 1 To 100
    Shell "c:\mssql7\binn\bcp " & _
        "pubs..t" & i & " out C:\temp\tt" & i & ".txt -c -Sserver -Usa -P""", vbHide
Next i

Event Class Application Name Connection ID Start Time End Time

Connect SQLDMO_1 233 09:26:21.060
Disconnect SQLDMO_1 233 09:26:21.060 09:26:27.467

Connect Microsoft SQL Server 234 09:26:28.187
-- here multiple bcp.exe's connects/disconnects (in quite messy order)
Disconnect Microsoft SQL Server 330 09:26:50.873 09:26:53.060

any comments?

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-31 : 03:16:45
oops! rocko, I missed your latest post... a bit busy at the moment.

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-31 : 13:52:02
Thank you, Stoad; test results are coming up...
It looks like SQLDMO BulkCopy with UseExistingConnection is the winner with my test setup.

rockmoose

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-31 : 13:56:16
Exporting the pubs..authors table 100 times


Test results are in:
---------------------------------------------------------
DMO BulkCopy, UseExistingConnection = True     2 sec
DMO BulkCopy, UseExistingConnection = False    3 sec
Async BCP via Shell command                    5 sec
Sequential BCP via xp_cmdshell                17 sec
---------------------------------------------------------


Detailed Test results

DMO BulkCopy UseExistingConnection = True
1,9375 sec
EventClass Application Start Finish
---------------------------------------------------------------------------------------
Audit Login SQLDMO_1 2005-01-31 19:21:56.727
Audit Logout SQLDMO_1 2005-01-31 19:21:56.727 2005-01-31 19:21:58.680


DMO BulkCopy UseExistingConnection = False
3,234375 sec
EventClass Application Start Finish
---------------------------------------------------------------------------------------
Audit Login SQLDMO_1 2005-01-31 19:23:37.133
...
Audit Logout SQLDMO_101 2005-01-31 19:23:40.367 2005-01-31 19:23:40.383
Audit Logout SQLDMO_1 2005-01-31 19:23:37.133 2005-01-31 19:23:40.383


BCP Shell Command
5,34375 sec
EventClass Application Start Finish
---------------------------------------------------------------------------------------
Audit Logout Microsoft SQL Server 2005-01-31 19:25:59.290 2005-01-31 19:25:59.320
...
Audit Logout Microsoft SQL Server 2005-01-31 19:26:05.337 2005-01-31 19:26:05.367


xp_cmdshell loop with bcp
17.70 sec
EventClass Application Start Finish
---------------------------------------------------------------------------------------
Audit Logout Microsoft SQL Server 2005-01-31 19:32:50.117 2005-01-31 19:32:50.117
...
Audit Logout Microsoft SQL Server 2005-01-31 19:33:07.630 2005-01-31 19:33:07.630


The test code
(two VBScript programs and a T-SQL loop, as follows)
---------------------------------------------------------------------------------------
Dim srv, bcp, db, i, StartTime, var

var = MsgBox("UseExistingConnection", 4)

Set srv = CreateObject("SQLDMO.SQLServer")
Set bcp = CreateObject("SQLDMO.BulkCopy")
Set db = CreateObject("SQLDMO.Database")
srv.EnableBcp = True
srv.LoginSecure = True
srv.Connect
Set db = srv.Databases("pubs")

If var = 6 Then 'Yes clicked
    bcp.UseExistingConnection = True
Else
    bcp.UseExistingConnection = False
End If

StartTime = Timer

For i = 1 To 100
    bcp.DataFilePath = "C:\temp\t" & i & ".txt"
    db.Tables("authors").ExportData bcp
Next

Set bcp = Nothing
Set db = Nothing
srv.Disconnect
Set srv = Nothing

If var = 6 Then
    MsgBox Timer - StartTime & " seconds - bulk UseExistingConnection = True"
Else
    MsgBox Timer - StartTime & " seconds - bulk UseExistingConnection = False"
End If
---------------------------------------------------------------------------------------
Dim StartTime
StartTime = Timer
For i = 1 To 100
    CreateObject("WScript.Shell").Run "bcp pubs..authors out C:\temp\t" & i & ".txt -c -C RAW -T", 0
Next
MsgBox Timer - StartTime & " seconds - bcp shell"
---------------------------------------------------------------------------------------
set nocount on
declare @i int, @d datetime, @cmd nvarchar(256)
select @i = 1, @d = getdate()
while @i < 101
begin
    set @cmd = N'bcp pubs..authors out C:\temp\t' + ltrim(@i) + N'.txt -c -C RAW -T'
    exec master..xp_cmdshell @cmd, no_output
    set @i = @i + 1
end
select datediff(millisecond, @d, getdate()) / 1000.0 as [seconds - xp_cmdshell loop]
---------------------------------------------------------------------------------------


rockmoose

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-31 : 15:00:31
great, rocko! pity it only works with tables & views.