Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
JRea
Starting Member
4 Posts |
Posted - 2004-06-14 : 09:19:37
Hi,

This is my first attempt at importing data into SQL Server, and I'm stuck. Hopefully someone can help.

I have some delimited text files which I need to import into a database. The data is financial transactions. The first 10 fields on each row are the header data, which I need to insert into one table (transaction type, timestamp, batch number, etc.). The rest of the data on each row is a repeating group of 7 fields, depending on how many transactions were entered in that particular batch. On some rows there is only one occurrence of the repeating data; on others there are several hundred occurrences.

So I want to be able to put the header data in one table and each occurrence of the repeating data as a separate row in another table, taking various info from the header data as the foreign key.

Simplified example. Data currently looks like this:

C|HECBIC|16/01/04|4545|145.56|12345678X|CR|10/01/04|567.98|12345678X|CR|10/01/04|332.01|54245678Z|DR|12/01/04|

And I want to end up with something like this:

Table 1 (Header)
C HECBIC 16/01/04 4545

Table 2 (Transaction Data)
HECBIC 16/01/04 4545 145.56 12345678X CR 10/01/04
HECBIC 16/01/04 4545 567.98 12345678X CR 10/01/04
HECBIC 16/01/04 4545 332.01 54245678Z DR 12/01/04

I can't dump it all into one table and then write a script, as for some rows the total length of the data is over 10000 characters and several thousand columns.

Apologies if this is a bit lengthy and hope it makes sense. Thanks in advance for any advice.
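For reference, the target schema being described might be sketched like this. All table and column names, types, and key choices here are assumptions for illustration; the real ones would come from the file specification:

```sql
-- Hypothetical target schema (names and types are assumed, not from the file spec).
-- The header key columns are repeated in the detail table as the foreign key,
-- matching the example output above.
CREATE TABLE BatchHeader (
    TranType   char(1),        -- e.g. 'C'
    BatchCode  varchar(10),    -- e.g. 'HECBIC'
    BatchDate  datetime,       -- e.g. 16/01/04
    BatchNo    int,            -- e.g. 4545
    CONSTRAINT PK_BatchHeader PRIMARY KEY (BatchCode, BatchDate, BatchNo)
)

CREATE TABLE BatchTran (
    BatchCode  varchar(10),
    BatchDate  datetime,
    BatchNo    int,
    Amount     decimal(12,2),  -- e.g. 145.56
    Account    varchar(10),    -- e.g. '12345678X'
    TranCode   char(2),        -- e.g. 'CR' or 'DR'
    TranDate   datetime,
    CONSTRAINT FK_BatchTran_BatchHeader FOREIGN KEY (BatchCode, BatchDate, BatchNo)
        REFERENCES BatchHeader (BatchCode, BatchDate, BatchNo)
)
```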
X002548
Not Just a Number
15586 Posts |
|
|
JRea
Starting Member
4 Posts |
Posted - 2004-06-15 : 06:08:49
I intend to split the rows into two separate tables: one containing the header data for each row, then splitting the repeating transaction data for each row onto multiple rows in the second table.

E.g., using my example row from above:

C|HECBIC|16/01/04|4545|145.56|12345678X|CR|10/01/04|567.98|12345678X|CR|10/01/04|332.01|54245678Z|DR|12/01/04|

The above is currently one row in my text file. The first four fields (the header data) need to go into the first table. The rest then has to be put into the second table (along with a couple of fields from the header to be the key) as separate rows. Each repeating group needs to be inserted as a separate row, as this info can be repeated hundreds of times on one row in my text file.

In other words, I need to split that row of data like this:

Column1  > Table 1.Column 1
Column2  > Table 1.Column 2
Column3  > Table 1.Column 3
Column4  > Table 1.Column 4

Column2  > Table 2.Column 1 (header data as key)
Column3  > Table 2.Column 2 (header data as key)
Column4  > Table 2.Column 3 (header data as key)
Column5  > Table 2.Column 4 (first transaction group)
Column6  > Table 2.Column 5 (first transaction group)
Column7  > Table 2.Column 6 (first transaction group)
Column8  > Table 2.Column 7 (first transaction group)

(New row on second table)
Column2  > Table 2.Column 1 (header data as key)
Column3  > Table 2.Column 2 (header data as key)
Column4  > Table 2.Column 3 (header data as key)
Column9  > Table 2.Column 4 (second transaction group)
Column10 > Table 2.Column 5 (second transaction group)
Column11 > Table 2.Column 6 (second transaction group)
Column12 > Table 2.Column 7 (second transaction group)

(New row on second table)
Column2  > Table 2.Column 1 (header data as key)
Column3  > Table 2.Column 2 (header data as key)
Column4  > Table 2.Column 3 (header data as key)
Column13 > Table 2.Column 4 (third transaction group)
Column14 > Table 2.Column 5 (third transaction group)
Column15 > Table 2.Column 6 (third transaction group)
Column16 > Table 2.Column 7 (third transaction group)

Have looked at your other thread, but doesn't that involve initially dumping into a staging table? Don't think I can do this, as some of my rows will exceed the maximum 8060-byte row length and 1024 columns per table. Hence why I think I need to be able to split the data during the load.

Cheers,
James.
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2004-06-15 : 10:52:31
[code]SET NOCOUNT ON

/*------------------------------------------------------------------
Create Test Data
------------------------------------------------------------------*/
CREATE TABLE stage (id int identity(1,1), data text)

INSERT INTO stage
SELECT 'C|HECBIC|16/01/04|4545|145.56|12345678X|CR|10/01/04|567.98|12345678X|CR|10/01/04|332.01|54245678Z|DR|12/01/04|' UNION ALL
SELECT 'C|HECBIX|16/01/04|4546|245.56|12345678X|CR|10/01/04|568.98|12345678X|CR|10/01/04|333.01|54245678Z|DR|12/01/04|'
     + REPLICATE('145.59|12345678X|CR|10/01/04|567.98|12345678X|CR|10/01/04|332.01|54245678Z|DR|12/01/04|', 600)

/*------------------------------------------------------------------
Create tally table used in parsing routine
------------------------------------------------------------------*/
DECLARE @n int; SET @n = 0
CREATE TABLE numbers (n int primary key)
WHILE @n <= 10000
BEGIN
    INSERT INTO numbers SELECT @n
    SET @n = @n + 1
END

/*------------------------------------------------------------------
Create Temp Tables
------------------------------------------------------------------*/
CREATE TABLE ##stage (n int identity(1,1), id int, val varchar(20))
CREATE TABLE ##stage2 (id int, col1 varchar(20), col2 varchar(20),
    col3 varchar(20), col4 varchar(20), gp int, val varchar(20), rowct int)

/*------------------------------------------------------------------
Build Dynamic SQL to parse the text field into n VARCHAR() data chunks
------------------------------------------------------------------*/
DECLARE @id int
DECLARE @datalen int
DECLARE @sql varchar(8000)
DECLARE @sql1 varchar(8000)
DECLARE @cnt int

--Loop through all records in stage table and parse
SET @id = 0
WHILE @id < (SELECT MAX(id) FROM stage)
BEGIN
    SET @id = @id + 1

    -- get the length
    SELECT @datalen = DATALENGTH(data) / 4000 + 1 FROM stage WHERE id = @id

    -- collect into @sql declarations of @str1, @str2, ... @strn
    SET @cnt = 1
    SET @sql = 'DECLARE @id int; SET @id = ' + CONVERT(varchar(10), @id) + CHAR(13) + 'DECLARE '
    SET @sql1 = ''
    WHILE (@cnt <= @datalen)
    BEGIN
        SELECT @sql = @sql
            + CASE @cnt WHEN 1 THEN '' ELSE ', ' + CHAR(13) END
            + ' @str' + CONVERT(varchar(10), @cnt) + ' VARCHAR(8000)'
        SET @cnt = @cnt + 1
    END

    -- collect into @sql the data chunks
    SET @cnt = 1
    WHILE (@cnt <= @datalen)
    BEGIN
        IF LEN(@sql) < 7850
            SELECT @sql = @sql + CHAR(13)
                + 'SELECT @str' + CONVERT(varchar(10), @cnt)
                + ' = SUBSTRING(data, ' + CONVERT(varchar(30), (@cnt-1)*4000+1) + ', 4000) '
                + 'FROM stage WHERE id = ''' + CONVERT(varchar(10), @id) + ''''
        ELSE
            SELECT @sql1 = @sql1 + CHAR(13)
                + 'SELECT @str' + CONVERT(varchar(10), @cnt)
                + ' = SUBSTRING(data, ' + CONVERT(varchar(30), (@cnt-1)*4000+1) + ', 4000) '
                + 'FROM stage WHERE id = ''' + CONVERT(varchar(10), @id) + ''''
        SET @cnt = @cnt + 1
    END

    --Build up the string concat variable
    DECLARE @str varchar(8000); SET @str = ''
    SET @cnt = 1
    WHILE (@cnt <= @datalen)
    BEGIN
        SELECT @str = @str + '@str' + CONVERT(varchar(10), @cnt)
        IF @cnt < @datalen
            SET @str = @str + '+'
        SET @cnt = @cnt + 1
    END

    --Add Parsing Logic
    SET @sql1 = @sql1 + CHAR(13)
        + 'INSERT INTO ##stage (id, val)' + CHAR(13)
        + 'SELECT @id,' + CHAR(13)
        + 'NULLIF(SUBSTRING(''|'' + ' + @str + ' + ''|'', n, CHARINDEX(''|'', ''|'' + ' + @str + ' + ''|'', n) - n), '''')' + CHAR(13)
        + 'FROM numbers ' + CHAR(13)
        + 'WHERE n <= LEN(''|'' + ' + @str + ' + ''|'') AND SUBSTRING(''|'' + ' + @str + ' + ''|'', n - 1, 1) = ''|''' + CHAR(13)
        + 'AND CHARINDEX(''|'', ''|'' + ' + @str + ' + ''|'', n) - n > 0 ' + CHAR(13)

    /*--------------------------------------------------------------
    Execute the Dynamic SQL
    --------------------------------------------------------------*/
    --PRINT @sql + @sql1
    EXEC(@sql + @sql1)
END

/*------------------------------------------------------------------
Group and insert the data into ##stage2
------------------------------------------------------------------*/
--Loop through all rows in staging table and parse the header
SET @id = 0
WHILE @id < (SELECT MAX(id) FROM stage)
BEGIN
    SET @id = @id + 1
    DECLARE @n_min int, @n_max int, @val varchar(20), @rowct int, @gp int
    SELECT @n_min = MIN(n) FROM ##stage WHERE id = @id
    SELECT @n_max = MAX(n) FROM ##stage WHERE id = @id
    SELECT @val = '', @rowct = 1
    SET @gp = 0

    --header record
    DECLARE @col1 varchar(20), @col2 varchar(20), @col3 varchar(20), @col4 varchar(20)
    SELECT @col1 = MAX(col1), @col2 = MAX(col2), @col3 = MAX(col3), @col4 = MAX(col4)
    FROM (
        SELECT TOP 4
            CASE WHEN n = @n_min   THEN val ELSE '' END col1,
            CASE WHEN n = @n_min+1 THEN val ELSE '' END col2,
            CASE WHEN n = @n_min+2 THEN val ELSE '' END col3,
            CASE WHEN n = @n_min+3 THEN val ELSE '' END col4
        FROM ##stage
        WHERE id = @id AND val IS NOT NULL
        ORDER BY n) d

    --Detail Records
    SET @n_min = @n_min + 4
    WHILE @n_min <= @n_max
    BEGIN
        SELECT @val = val FROM ##stage WHERE id = @id AND n = @n_min
        SET @n_min = @n_min + 1
        INSERT INTO ##stage2 (id, col1, col2, col3, col4, gp, val, rowct)
        SELECT @id, @col1, @col2, @col3, @col4, @gp, @val, @rowct
        IF @rowct = 4
            SELECT @rowct = 1, @gp = @gp + 1
        ELSE
            SET @rowct = @rowct + 1
    END
END

/*------------------------------------------------------------------
Group and Pivot the Output
------------------------------------------------------------------*/
--Header Records
SELECT id, col1, col2, col3, col4
FROM ##stage2
GROUP BY id, col1, col2, col3, col4

--Detail Records
SELECT id, col1, col2, col3,
    MAX(CASE WHEN rowct = 1 THEN val END) AS col4,
    MAX(CASE WHEN rowct = 2 THEN val END) AS col5,
    MAX(CASE WHEN rowct = 3 THEN val END) AS col6,
    MAX(CASE WHEN rowct = 4 THEN val END) AS col7
FROM ##stage2
GROUP BY id, col1, col2, col3, gp

/*------------------------------------------------------------------
Clean Up
------------------------------------------------------------------*/
DROP TABLE stage, numbers, ##stage, ##stage2[/code]
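The heart of the routine above is the tally-table split. Stripped of the dynamic SQL, and run against a single hard-coded string (the sample values from this thread), it looks like this:

```sql
-- Build a small tally table of candidate start positions.
DECLARE @n int; SET @n = 0
CREATE TABLE #numbers (n int primary key)
WHILE @n <= 100
BEGIN
    INSERT INTO #numbers SELECT @n
    SET @n = @n + 1
END

DECLARE @s varchar(100)
SET @s = '145.56|12345678X|CR|10/01/04'

-- Wrap the string in delimiters; a position n starts a token when the
-- character before it is '|', and the token runs to the next '|'.
SELECT SUBSTRING('|' + @s + '|', n,
                 CHARINDEX('|', '|' + @s + '|', n) - n) AS token
FROM #numbers
WHERE n <= LEN('|' + @s + '|')
  AND SUBSTRING('|' + @s + '|', n - 1, 1) = '|'
  AND CHARINDEX('|', '|' + @s + '|', n) - n > 0
ORDER BY n
-- returns four rows: 145.56, 12345678X, CR, 10/01/04

DROP TABLE #numbers
```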
JRea
Starting Member
4 Posts |
Posted - 2004-06-16 : 09:17:03
Thanks a lot for that - should be most useful. However, isn't this still working from the starting point of the data already being in a staging table in the database? I still can't work out how to get the rows longer than 8060 bytes into the database in the first place.
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2004-06-16 : 09:21:03
[code]CREATE TABLE stage (id int identity(1,1), data text)[/code]

BULK INSERT the file into the stage table with no FIELDTERMINATOR. Each row is imported into the TEXT column (data). Then use the parsing routine provided above. Just reseed the stage table after each import.
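A sketch of what that load step could look like. The file path, the row terminator, and the single-column view are assumptions to adjust for the real environment; the view is one way to let BULK INSERT skip the identity column, which is then assigned automatically:

```sql
-- Hypothetical load step: path and terminator are assumptions.
-- Loading through a one-column view means the data file only needs
-- to supply the text column; the identity id is generated.
CREATE VIEW stage_load AS SELECT data FROM stage
GO

-- No FIELDTERMINATOR that appears in the data, so each whole line
-- (pipes and all) lands intact in the text column for later parsing.
BULK INSERT stage_load
FROM 'C:\data\transactions.txt'
WITH (ROWTERMINATOR = '\n')
```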
X002548
Not Just a Number
15586 Posts |
Posted - 2004-06-16 : 12:18:31
Yeah...the parsing of the delimiters is gonna be a little bitch though...something like....(it needs to be fixed up...)

[code]USE Northwind
GO

CREATE TABLE myTable99 (Col1 text)
GO

INSERT INTO myTable99 (Col1)
SELECT '1|2|3|4|' + REPLICATE('x', 8000) + '|5'
GO

CREATE FUNCTION udf_Word (@str text, @word int, @Delim varchar(3))
RETURNS varchar(1000)
AS
BEGIN
    DECLARE @DelimFound int, @LastDelimPosition int, @NextDelimPosition int
    SELECT @DelimFound = 0, @LastDelimPosition = 0

    -- skip past the first @word - 1 delimiters
    WHILE (@DelimFound < @word - 1)
    BEGIN
        IF (CHARINDEX(@Delim, @str, @LastDelimPosition + 1) = 0)
            BREAK
        ELSE
        BEGIN
            SET @LastDelimPosition = CHARINDEX(@Delim, @str, @LastDelimPosition + 1)
            SET @DelimFound = @DelimFound + 1
        END
    END

    -- the word runs from just after the last delimiter to the next one
    -- (or to the end of the string for the final word)
    SET @NextDelimPosition = CHARINDEX(@Delim, @str, @LastDelimPosition + 1)
    IF @NextDelimPosition = 0
        SET @NextDelimPosition = DATALENGTH(@str) + 1

    RETURN SUBSTRING(@str, @LastDelimPosition + 1,
                     @NextDelimPosition - @LastDelimPosition - 1)
END
GO

SELECT dbo.udf_Word(Col1, 3, '|') FROM myTable99

DROP FUNCTION udf_Word
DROP TABLE myTable99[/code]

Brett

8-)
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2004-06-16 : 12:32:17
quote: Originally posted by X002548

Yeah...the parsing of the delimiters is gonna be a little bitch

Brett, it's done... see the code above.
JRea
Starting Member
4 Posts |
Posted - 2004-06-18 : 09:01:46
Thanks very much for your help guys. Much appreciated. Saved me a lot of brain-ache trying to figure it out.