Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
JRea
Starting Member
4 Posts |
Posted - 2004-06-14 : 09:19:37
Hi,

This is my first attempt at importing data into SQL Server, and I'm stuck. Hopefully someone can help.

I have some delimited text files which I need to import into a database. The data is financial transactions. The first 10 fields on each row are the header data, which I need to insert into one table (transaction type, timestamp, batch number, etc.). The rest of the data on each row is a repeating group of 7 fields, depending on how many transactions were entered in that particular batch. On some rows there is only one occurrence of the repeating data; on others there are several hundred occurrences.

So I want to be able to put the header data in one table and each occurrence of the repeating data as a separate row in another table, taking various info from the header data as the foreign key.

Simplified example. Data currently looks like this:

C|HECBIC|16/01/04|4545|145.56|12345678X|CR|10/01/04|567.98|12345678X|CR|10/01/04|332.01|54245678Z|DR|12/01/04|

And I want to end up with something like this:

Table 1 (Header)
C HECBIC 16/01/04 4545

Table 2 (Transaction Data)
HECBIC 16/01/04 4545 145.56 12345678X CR 10/01/04
HECBIC 16/01/04 4545 567.98 12345678X CR 10/01/04
HECBIC 16/01/04 4545 332.01 54245678Z DR 12/01/04

I can't dump it all into one table and then write a script, as for some rows the total length of the data is over 10000 characters and several thousand columns.

Apologies if this is a bit lengthy and hope it makes sense. Thanks in advance for any advice.
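For reference, the target schema being described might be sketched like this. All table and column names, types, and key choices here are assumptions for illustration; the real ones would come from the file specification:

```sql
-- Hypothetical target schema (names and types are assumed, not from the file spec).
-- The header key columns are repeated in the detail table as the foreign key,
-- matching the example output above.
CREATE TABLE BatchHeader (
    TranType   char(1),        -- e.g. 'C'
    BatchCode  varchar(10),    -- e.g. 'HECBIC'
    BatchDate  datetime,       -- e.g. 16/01/04
    BatchNo    int,            -- e.g. 4545
    CONSTRAINT PK_BatchHeader PRIMARY KEY (BatchCode, BatchDate, BatchNo)
)

CREATE TABLE BatchTran (
    BatchCode  varchar(10),
    BatchDate  datetime,
    BatchNo    int,
    Amount     decimal(12,2),  -- e.g. 145.56
    Account    varchar(10),    -- e.g. '12345678X'
    TranCode   char(2),        -- e.g. 'CR' or 'DR'
    TranDate   datetime,
    CONSTRAINT FK_BatchTran_BatchHeader FOREIGN KEY (BatchCode, BatchDate, BatchNo)
        REFERENCES BatchHeader (BatchCode, BatchDate, BatchNo)
)
```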
X002548
Not Just a Number
15586 Posts |
|
|
JRea
Starting Member
4 Posts |
Posted - 2004-06-15 : 06:08:49
I intend to split the rows into two separate tables: one containing the header data for each row, then splitting the repeating transaction data for each row onto multiple rows in the second table.

E.g., using my example row from above:

C|HECBIC|16/01/04|4545|145.56|12345678X|CR|10/01/04|567.98|12345678X|CR|10/01/04|332.01|54245678Z|DR|12/01/04|

The above is currently one row in my text file. The first four fields (the header data) need to go into the first table. The rest then has to be put into the second table (along with a couple of fields from the header to be the key) as separate rows. Each repeating group needs to be inserted as a separate row, as this info can be repeated hundreds of times on one row in my text file.

In other words, I need to split that row of data like this:

Column1  > Table 1.Column 1
Column2  > Table 1.Column 2
Column3  > Table 1.Column 3
Column4  > Table 1.Column 4

Column2  > Table 2.Column 1 (header data as key)
Column3  > Table 2.Column 2 (header data as key)
Column4  > Table 2.Column 3 (header data as key)
Column5  > Table 2.Column 4 (first transaction group)
Column6  > Table 2.Column 5 (first transaction group)
Column7  > Table 2.Column 6 (first transaction group)
Column8  > Table 2.Column 7 (first transaction group)

(New row on second table)
Column2  > Table 2.Column 1 (header data as key)
Column3  > Table 2.Column 2 (header data as key)
Column4  > Table 2.Column 3 (header data as key)
Column9  > Table 2.Column 4 (second transaction group)
Column10 > Table 2.Column 5 (second transaction group)
Column11 > Table 2.Column 6 (second transaction group)
Column12 > Table 2.Column 7 (second transaction group)

(New row on second table)
Column2  > Table 2.Column 1 (header data as key)
Column3  > Table 2.Column 2 (header data as key)
Column4  > Table 2.Column 3 (header data as key)
Column13 > Table 2.Column 4 (third transaction group)
Column14 > Table 2.Column 5 (third transaction group)
Column15 > Table 2.Column 6 (third transaction group)
Column16 > Table 2.Column 7 (third transaction group)

Have looked at your other thread, but doesn't that involve initially dumping into a staging table? Don't think I can do this, as some of my rows will exceed the maximum 8060-byte row length and 1024 columns per table. Hence why I think I need to be able to split the data during the load.

Cheers,
James.
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2004-06-15 : 10:52:31
[code]SET NOCOUNT ON

/*------------------------------------------------------------------
Create Test Data
------------------------------------------------------------------*/
CREATE TABLE stage (id int identity(1,1), data text)

INSERT INTO stage
SELECT 'C|HECBIC|16/01/04|4545|145.56|12345678X|CR|10/01/04|567.98|12345678X|CR|10/01/04|332.01|54245678Z|DR|12/01/04|' UNION ALL
SELECT 'C|HECBIX|16/01/04|4546|245.56|12345678X|CR|10/01/04|568.98|12345678X|CR|10/01/04|333.01|54245678Z|DR|12/01/04|'
     + REPLICATE('145.59|12345678X|CR|10/01/04|567.98|12345678X|CR|10/01/04|332.01|54245678Z|DR|12/01/04|', 600)

/*------------------------------------------------------------------
Create tally table used in parsing routine
------------------------------------------------------------------*/
DECLARE @n int; SET @n = 0
CREATE TABLE numbers (n int primary key)
WHILE @n <= 10000
BEGIN
    INSERT INTO numbers SELECT @n
    SET @n = @n + 1
END

/*------------------------------------------------------------------
Create Temp Tables
------------------------------------------------------------------*/
CREATE TABLE ##stage (n int identity(1,1), id int, val varchar(20))
CREATE TABLE ##stage2 (id int, col1 varchar(20), col2 varchar(20),
    col3 varchar(20), col4 varchar(20), gp int, val varchar(20), rowct int)

/*------------------------------------------------------------------
Build Dynamic SQL to parse the text field into n VARCHAR() data chunks
------------------------------------------------------------------*/
DECLARE @id int
DECLARE @datalen int
DECLARE @sql varchar(8000)
DECLARE @sql1 varchar(8000)
DECLARE @cnt int

--Loop through all records in stage table and parse
SET @id = 0
WHILE @id < (SELECT MAX(id) FROM stage)
BEGIN
    SET @id = @id + 1

    -- get the length
    SELECT @datalen = DATALENGTH(data) / 4000 + 1 FROM stage WHERE id = @id

    -- collect into @sql declarations of @str1, @str2, ... @strn
    SET @cnt = 1
    SET @sql = 'DECLARE @id int; SET @id = ' + CONVERT(varchar(10), @id) + CHAR(13) + 'DECLARE '
    SET @sql1 = ''
    WHILE (@cnt <= @datalen)
    BEGIN
        SELECT @sql = @sql
            + CASE @cnt WHEN 1 THEN '' ELSE ', ' + CHAR(13) END
            + ' @str' + CONVERT(varchar(10), @cnt) + ' VARCHAR(8000)'
        SET @cnt = @cnt + 1
    END

    -- collect into @sql the data chunks
    SET @cnt = 1
    WHILE (@cnt <= @datalen)
    BEGIN
        IF LEN(@sql) < 7850
            SELECT @sql = @sql + CHAR(13)
                + 'SELECT @str' + CONVERT(varchar(10), @cnt)
                + ' = SUBSTRING(data, ' + CONVERT(varchar(30), (@cnt-1)*4000+1) + ', 4000) '
                + 'FROM stage WHERE id = ''' + CONVERT(varchar(10), @id) + ''''
        ELSE
            SELECT @sql1 = @sql1 + CHAR(13)
                + 'SELECT @str' + CONVERT(varchar(10), @cnt)
                + ' = SUBSTRING(data, ' + CONVERT(varchar(30), (@cnt-1)*4000+1) + ', 4000) '
                + 'FROM stage WHERE id = ''' + CONVERT(varchar(10), @id) + ''''
        SET @cnt = @cnt + 1
    END

    --Build up the string concat variable
    DECLARE @str varchar(8000); SET @str = ''
    SET @cnt = 1
    WHILE (@cnt <= @datalen)
    BEGIN
        SELECT @str = @str + '@str' + CONVERT(varchar(10), @cnt)
        IF @cnt < @datalen
            SET @str = @str + '+'
        SET @cnt = @cnt + 1
    END

    --Add Parsing Logic
    SET @sql1 = @sql1 + CHAR(13)
        + 'INSERT INTO ##stage (id, val)' + CHAR(13)
        + 'SELECT @id,' + CHAR(13)
        + 'NULLIF(SUBSTRING(''|'' + ' + @str + ' + ''|'', n, CHARINDEX(''|'', ''|'' + ' + @str + ' + ''|'', n) - n), '''')' + CHAR(13)
        + 'FROM numbers ' + CHAR(13)
        + 'WHERE n <= LEN(''|'' + ' + @str + ' + ''|'') AND SUBSTRING(''|'' + ' + @str + ' + ''|'', n - 1, 1) = ''|''' + CHAR(13)
        + 'AND CHARINDEX(''|'', ''|'' + ' + @str + ' + ''|'', n) - n > 0 ' + CHAR(13)

    /*--------------------------------------------------------------
    Execute the Dynamic SQL
    --------------------------------------------------------------*/
    --PRINT @sql + @sql1
    EXEC(@sql + @sql1)
END

/*------------------------------------------------------------------
Group and insert the data into ##stage2
------------------------------------------------------------------*/
--Loop through all rows in staging table and parse the header
SET @id = 0
WHILE @id < (SELECT MAX(id) FROM stage)
BEGIN
    SET @id = @id + 1
    DECLARE @n_min int, @n_max int, @val varchar(20), @rowct int, @gp int
    SELECT @n_min = MIN(n) FROM ##stage WHERE id = @id
    SELECT @n_max = MAX(n) FROM ##stage WHERE id = @id
    SELECT @val = '', @rowct = 1
    SET @gp = 0

    --header record
    DECLARE @col1 varchar(20), @col2 varchar(20), @col3 varchar(20), @col4 varchar(20)
    SELECT @col1 = MAX(col1), @col2 = MAX(col2), @col3 = MAX(col3), @col4 = MAX(col4)
    FROM (
        SELECT TOP 4
            CASE WHEN n = @n_min   THEN val ELSE '' END col1,
            CASE WHEN n = @n_min+1 THEN val ELSE '' END col2,
            CASE WHEN n = @n_min+2 THEN val ELSE '' END col3,
            CASE WHEN n = @n_min+3 THEN val ELSE '' END col4
        FROM ##stage
        WHERE id = @id AND val IS NOT NULL
        ORDER BY n) d

    --Detail Records
    SET @n_min = @n_min + 4
    WHILE @n_min <= @n_max
    BEGIN
        SELECT @val = val FROM ##stage WHERE id = @id AND n = @n_min
        SET @n_min = @n_min + 1
        INSERT INTO ##stage2 (id, col1, col2, col3, col4, gp, val, rowct)
        SELECT @id, @col1, @col2, @col3, @col4, @gp, @val, @rowct
        IF @rowct = 4
            SELECT @rowct = 1, @gp = @gp + 1
        ELSE
            SET @rowct = @rowct + 1
    END
END

/*------------------------------------------------------------------
Group and Pivot the Output
------------------------------------------------------------------*/
--Header Records
SELECT id, col1, col2, col3, col4
FROM ##stage2
GROUP BY id, col1, col2, col3, col4

--Detail Records
SELECT id, col1, col2, col3,
    MAX(CASE WHEN rowct = 1 THEN val END) AS col4,
    MAX(CASE WHEN rowct = 2 THEN val END) AS col5,
    MAX(CASE WHEN rowct = 3 THEN val END) AS col6,
    MAX(CASE WHEN rowct = 4 THEN val END) AS col7
FROM ##stage2
GROUP BY id, col1, col2, col3, gp

/*------------------------------------------------------------------
Clean Up
------------------------------------------------------------------*/
DROP TABLE stage, numbers, ##stage, ##stage2[/code]
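The heart of the routine above is the tally-table split. Stripped of the dynamic SQL, and run against a single hard-coded string (the sample values from this thread), it looks like this:

```sql
-- Build a small tally table of candidate start positions.
DECLARE @n int; SET @n = 0
CREATE TABLE #numbers (n int primary key)
WHILE @n <= 100
BEGIN
    INSERT INTO #numbers SELECT @n
    SET @n = @n + 1
END

DECLARE @s varchar(100)
SET @s = '145.56|12345678X|CR|10/01/04'

-- Wrap the string in delimiters; a position n starts a token when the
-- character before it is '|', and the token runs to the next '|'.
SELECT SUBSTRING('|' + @s + '|', n,
                 CHARINDEX('|', '|' + @s + '|', n) - n) AS token
FROM #numbers
WHERE n <= LEN('|' + @s + '|')
  AND SUBSTRING('|' + @s + '|', n - 1, 1) = '|'
  AND CHARINDEX('|', '|' + @s + '|', n) - n > 0
ORDER BY n
-- returns four rows: 145.56, 12345678X, CR, 10/01/04

DROP TABLE #numbers
```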
JRea
Starting Member
4 Posts |
Posted - 2004-06-16 : 09:17:03
Thanks a lot for that - should be most useful. However, isn't this still working from the starting point of the data already being in a staging table in the database? I still can't work out how to get the rows longer than 8060 bytes into the database in the first place.
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2004-06-16 : 09:21:03
[code]CREATE TABLE stage (id int identity(1,1), data text)[/code]

BULK INSERT the file into the stage table with no FIELDTERMINATOR. Each row is imported into the TEXT column (data). Then use the parsing routine provided above. Just reseed the stage table after each import.
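A sketch of what that load step could look like. The file path, the row terminator, and the single-column view are assumptions to adjust for the real environment; the view is one way to let BULK INSERT skip the identity column, which is then assigned automatically:

```sql
-- Hypothetical load step: path and terminator are assumptions.
-- Loading through a one-column view means the data file only needs
-- to supply the text column; the identity id is generated.
CREATE VIEW stage_load AS SELECT data FROM stage
GO

-- No FIELDTERMINATOR that appears in the data, so each whole line
-- (pipes and all) lands intact in the text column for later parsing.
BULK INSERT stage_load
FROM 'C:\data\transactions.txt'
WITH (ROWTERMINATOR = '\n')
```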
X002548
Not Just a Number
15586 Posts |
Posted - 2004-06-16 : 12:18:31
Yeah...the parsing of the delimiters is gonna be a little bitch though...something like....(it needs to be fixed up...)

[code]USE Northwind
GO

CREATE TABLE myTable99 (Col1 text)
GO

INSERT INTO myTable99 (Col1)
SELECT '1|2|3|4|' + REPLICATE('x', 8000) + '|5'
GO

CREATE FUNCTION udf_Word (@str text, @word int, @Delim varchar(3))
RETURNS varchar(1000)
AS
BEGIN
    DECLARE @DelimFound int, @LastDelimPosition int, @NextDelimPosition int
    SELECT @DelimFound = 0, @LastDelimPosition = 0

    -- skip past the first @word - 1 delimiters
    WHILE (@DelimFound < @word - 1)
    BEGIN
        IF (CHARINDEX(@Delim, @str, @LastDelimPosition + 1) = 0)
            BREAK
        ELSE
        BEGIN
            SET @LastDelimPosition = CHARINDEX(@Delim, @str, @LastDelimPosition + 1)
            SET @DelimFound = @DelimFound + 1
        END
    END

    -- the word runs from just after the last delimiter to the next one
    -- (or to the end of the string for the final word)
    SET @NextDelimPosition = CHARINDEX(@Delim, @str, @LastDelimPosition + 1)
    IF @NextDelimPosition = 0
        SET @NextDelimPosition = DATALENGTH(@str) + 1

    RETURN SUBSTRING(@str, @LastDelimPosition + 1,
                     @NextDelimPosition - @LastDelimPosition - 1)
END
GO

SELECT dbo.udf_Word(Col1, 3, '|') FROM myTable99

DROP FUNCTION udf_Word
DROP TABLE myTable99[/code]

Brett

8-)
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2004-06-16 : 12:32:17
quote: Originally posted by X002548

Yeah...the parsing of the delimiters is gonna be a little bitch

Brett, it's done... see the code above.
JRea
Starting Member
4 Posts |
Posted - 2004-06-18 : 09:01:46
Thanks very much for your help guys. Much appreciated. Saved me a lot of brain-ache trying to figure it out.