billsox
Yak Posting Veteran
74 Posts

Posted - 2003-03-21 : 09:04:43
Hi,

I have a situation where I need to extract a lot of data from a query and put it into multiple text files (say, anywhere between 20 and 2,000 text files). Granted, 2,000 text files is not the norm; 20-80 is about average, but occasionally I'll need to do a lot more. Each file gets about 8,000 exported rows. I'm trying to find the fastest possible solution.

My first thought would be to put all of the data from the query into a temporary table (this could mean a temp table containing 200,000 to 20,000,000 records and about 40-60 columns!); then place indexes on the primary key columns. From there, I could run a DTS package over and over, each time changing the source query and destination file. To make matters more complicated, the file name of each destination file must contain how many markets and weeks are in that file, which suggests I need a second table that holds market and week counts derived from the first.

Does anyone have any ideas on how to streamline this process to make it faster? I know it's not a pretty situation, but I don't make the business rules. =) Thanks in advance.

Bill
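As a very rough sketch, the market/week-count table Bill describes for building the file names might look something like this (all names here - SourceData, Market, WeekEnding, FileCounts - are hypothetical placeholders, and it assumes one output file per market):

-- Hypothetical summary of how many distinct weeks (and rows) feed each file
SELECT Market,
       Count(DISTINCT WeekEnding) AS WeekCount,
       Count(*) AS TotalRows
INTO FileCounts
FROM SourceData
GROUP BY Market

Each file name could then splice in the WeekCount (and a market count, if a file spans more than one market) for its slice of the data.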
robvolk
Most Valuable Yak
15732 Posts

Posted - 2003-03-21 : 23:33:45
quote: My first thought would be to put all of the data from the query into a temporary table (this could mean a temp table containing 200,000 to 20,000,000 records and about 40-60 columns!)
Bag that, that's about the WORST performing solution I can think of. TempDB WILL blow up on you and you'll run out of space in both the data and log files, and probably the drive itself. Adding indexes on top of that will only make it worse. That kind of volume is best left in its original place as much as possible. Don't transfer the data to another area and index it; use its existing indexes and add some new ones for the sake of this transfer, if needed.

Build a second table that performs the summary by market and weeks; that will let you build the file name accordingly. Make its key as close as possible to the one on the source table(s), OR the clustering key if that is not the primary key. You want to be able to join this summary table on the clustering key of the detail tables during export, so that for each file you export you are essentially performing a singleton "WHERE value=123" select, and the optimizer can do index seeks and grab only those pages that match the value in the join. This is akin to slicing out only the little part of the table that you want, and since those pages will be contiguous the I/O should be nice, fast, and sequential. Depending on how fast you really need this to be, if your tables are not clustered on the crucial columns it might be worthwhile to try changing the clustering to accommodate this operation.

40-60 columns seems to rule out using covering indexes; they'd be way too much overhead, so I think you'll have to use the clustering key as much as possible. Clustering on market is a good start if each file will represent only one or a few markets. You want to cluster on whatever key would form a natural separation for the files being exported. It might be 2 or more columns - market and date, for example. If you can describe in more detail how the files are to be segregated, that would help us refine the recommendation.

I'd also avoid DTS for something of this volume. It could work, sure, but it'll have a lot more overhead than bcp would. Lately I've been doing a lot of fun things with this technique:

http://www.sqlteam.com/Forums/topic.asp?TOPIC_ID=24371

In your scenario, you'd do the reverse: build the summary table and use its values to construct a file name, then export the data based on a value in that table, joined to the clustering column(s) of the table.
Assuming that each file would represent a market and year, you can build the summary with this:

SELECT MarketName,
       DatePart(year, InvoiceDate) AS Year,
       Count(DISTINCT DatePart(month, InvoiceDate)) AS Months,
       Count(DISTINCT DatePart(week, InvoiceDate)) AS Weeks
INTO #summary  -- yeah, this one should be small enough to use a temp table, your choice
FROM MyBigFatGreekInvoiceTable
GROUP BY MarketName, DatePart(year, InvoiceDate)

Once that's done:

DECLARE @bcp varchar(8000), @market varchar(50), @year int
WHILE EXISTS (SELECT * FROM #summary)
BEGIN
    SET @bcp=''
    SELECT TOP 10 @bcp = @bcp
        + 'EXEC master..xp_cmdshell ''bcp "SELECT * FROM MyBigFatGreekInvoiceTable WHERE MarketName=''''' + MarketName
        + ''''' AND InvoiceDate BETWEEN ''''' + cast(Year AS varchar) + '0101'''' AND '''''
        + cast(Year AS varchar) + '1231''''" queryout c:\export\' + MarketName + '_'
        + cast(Year AS varchar) + '_W' + cast(Weeks AS varchar)
        + '.txt -Sserver -Uusername -Ppassword -c -r\n -t\t''; '
    FROM #summary
    DELETE #summary WHERE CharIndex(MarketName, @bcp)>0 AND CharIndex(cast(Year AS varchar), @bcp)>0
    EXEC(@bcp)
END
DROP TABLE #summary

I know that looks messy, but all it does is dynamically construct the file names from the data, then run bcp with a query that extracts the data straight from the table. You'll probably have to tweak it, and you might also want to run fewer than 10 bcp operations per cycle (depending on the length of the market names, you may have to, to keep the @bcp variable from exceeding 8,000 characters). You might also want to look at some other bcp options, like error-file reporting, or change the delimiter or use a trusted connection. I also chose to specify actual date values instead of using DatePart() in the bcp query; if you cluster on date, using values instead of expressions will let the optimizer take advantage of the index.

Hope this helps! Let me know if you get stuck or if this won't work at all for whatever reason.

Edited by - robvolk on 03/21/2003 23:36:37
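For reference, a minimal sketch of the clustering change robvolk mentions, reusing his example names (MyBigFatGreekInvoiceTable, MarketName, InvoiceDate); the old index name is a made-up placeholder, and this only applies if the current clustered index is a plain index rather than the primary key constraint:

-- Re-cluster the detail table on the columns that segregate the export files,
-- so each bcp query becomes a contiguous range scan rather than scattered lookups.
DROP INDEX MyBigFatGreekInvoiceTable.IX_Invoice_Old   -- hypothetical existing clustered index
CREATE CLUSTERED INDEX IX_Invoice_MarketDate
    ON MyBigFatGreekInvoiceTable (MarketName, InvoiceDate)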
nr
SQLTeam MVY
12543 Posts

Posted - 2003-03-22 : 20:55:57
It depends on the queries used to create the text files. If they are closely related, then it may be better to create a table with the data, index it to facilitate the file-creation queries, then run them one by one.

A temp table of that size would mean defining tempdb to be large, and it would probably be better to create this table in a permanent database. Make it specifically for this process so that it doesn't affect anything else if it has problems - and so that you can truncate the log on checkpoint and allow non-logged operations. Then you will need an efficient way of creating the table - that depends on the requirements.

Creating the files - I would recommend bcp, as it is a lot simpler than DTS - and faster for a single file. If you are creating a table, it is the obvious choice. DTS is good at producing formatted output, but it is usually better to format the data and then output it.

<rant> or are you yet another one looking for an excuse to use dts </rant>

You might also consider exporting the data to another server and running the process on that - depends on the system.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
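A rough sketch of the dedicated work database nr describes, using SQL 2000-era options (the database name ExportWork is made up):

-- A permanent database just for this process, so a failure or a full log
-- doesn't affect anything else.
CREATE DATABASE ExportWork
GO
-- Allow minimally logged SELECT INTO / bulk copy and keep the log trimmed.
EXEC sp_dboption 'ExportWork', 'select into/bulkcopy', 'true'
EXEC sp_dboption 'ExportWork', 'trunc. log on chkpt.', 'true'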