Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Data Corruption Issues
 flat file correction using t-sql

Author  Topic 

aakcse
Aged Yak Warrior

570 Posts

Posted - 2010-03-22 : 15:53:13
Hi all,

I have loaded a file content in to a single table with single column,the file is having data as below


Record 1. ABC|AVC1|1234|
xyz|abc
Record 2. ABC1|AVC1|12334|
xyz
|ABC
Record 3. ABC1|AVC1|12334|xyz|ABC



I need to correct the above data in the table, the records 3. is the correct format which I want record1 and record2 to be like.

Note Record1..3. text I have added above. ( this is just an e.g. actual table has 5million rec)

Can any one of you help me in correcting this file using plsql or sql query by creating another table from this one, which has corrected data

Logic I am think is

Every records must have 4 delimiter if less then bring the next line record to the previous line e.g.

Record 1. ABC|AVC1|1234|xyz|abc
Even if there are no delimiter on the line then also bring the next line record to the previous line e.g.

Record 2. ABC1|AVC1|12334| xyz
|ABC

Like wise cleaning the entire table and placing it in a new table, to export.

aakcse
Aged Yak Warrior

570 Posts

Posted - 2010-03-22 : 15:55:00
What we can do is, count the no of delimiter in the row ( it is 11 actually) if it is 4 say then the row has correct data in it, only possiblity would be as below
11111|aaaaa|4455|45AAA
BBB

Corrected one would be like this, with or with out space is ok for me
11111|aaaaa|4455|45AAA BBB

For relation ship we can add a row number another column as a primary key auto increatemet, assume that this is added for all the rows ( as below)

1 11111|aaaaa|4455|45AAA
2 BBB
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2010-03-22 : 15:56:40
This works in Oracle as below, If some one can help me in writing a procedure or converting the below query in sql would be of help.

1 with t as (
select 1 as rn, 'ABC|AVC1|1234|' as txt from dual union all
2 select 2, 'xyz|abc' from dual union all
3 select 3, 'ABC1|AVC1|12334|' from dual union all
4 select 4, 'xyz' from dual union all
5 select 5, '|ABC' from dual union all
6 select 6, 'ABC1|AVC1|12334|xyz|ABC' from dual)
7 -- END OF TEST DATA
8 select rn, txt
9 from (
10 select rn, regexp_substr(txt, '^[^|]+\|[^|]+\|[^|]+\|[^|]+\|[^|]+') as txt
11 ,case when s < nvl(lag(s) over (order by rn),999) then 1 else 0 end as s
12 from (
13 select rn, replace(txt||'|'||lead(txt) over (order by rn)
14 ||'|'||lead(txt,2) over (order by rn)
15 ||'|'||lead(txt,3) over (order by rn)
16 ||'|'||lead(txt,4) over (order by rn),'||','|') as txt
17 ,mod(sum(length(regexp_replace(txt,'[^|]'))) over (order by rn),5) as s
18 from t
19 )
20 )
21 where s = 1
22* order by rn



RN TXT
---------- ------------------------------
1 ABC|AVC1|1234|xyz|abc
3 ABC1|AVC1|12334|xyz|ABC
6 ABC1|AVC1|12334|xyz|ABC





replacing "t" in the main query with the name of your own table, and removing all the above lines from 1 till 7 will help in oracle
but it need it to be done in sql
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2010-03-22 : 16:09:35
I have done till here, can any one help with proceeding further..

(1)
select * into upload_bkup from upload
select * from upload

(2)
alter table upload add dtt_id1 int identity(1,1)

(3)
select len(column0) - len(replace(column0,'|','')) from upload
-- with 11 delimiter is the correct one

(4)
delete upload where isnull(column0,'')=''
--drop the previous identity and created new one retaining data from the previous identity column dt_id1
alter table upload add dtt_id2 int identity(1,1)

(5) The below query gives you all the records which went to next line with out delimiter

select a.dtt_id2 dtt_id2,b.dtt_id1 dtt_id1,a.column0+' '+b.column0 column0 into #upload
from upload a,
(select b.dtt_id1,b.dtt_id2,b.column0 from upload b where b.column0 not like '%|%')b
where b.dtt_id2 = a.dtt_id2 + 1

select * from #upload

(6) consolidating all the records in to one table

select a.* into upload_now_clean_delimiter from
(select a.dtt_id2, a.column0 from upload a
where a.dtt_id2 not in( select dtt_id2 from #upload)
and a.column0 like '%|%'
union
select dtt_id2,column0 from #upload
)a

select * from upload_now_clean_delimiter

(7) now clean the records which are coming with delimiter to next line
Go to Top of Page

aakcse
Aged Yak Warrior

570 Posts

Posted - 2010-03-24 : 07:04:37
Here is the code which works, for the above let me know if any one finds any issue for this code


DECLARE @X as table(rn int identity(1,1),txt varchar(max))

INSERT INTO @X(txt)
select 'ABC|AVC1|1234|' union all
select 'xyz|abc' union all
select 'ABC1|AVC1|12334|' union all
select 'x' union all
select 'y' union all
select 'z' union all
select '|ABC' union all
select 'ABC1|AVC1|12334|xyz|ABC' union all
select 'XYZ'

DECLARE @max bigint,@c bigint
SELECT @max = MAX(rn),@c = 1 from @X
DECLARE @targ varchar(max)

WHILE @max <> 0
BEGIN
SELECT @targ = txt FROM @X WHERE rn = @max

IF (@targ not like '%|%|%|%|%')
BEGIN
UPDATE @X SET txt = txt+@targ WHERE rn = @max - 1
DELETE FROM @X WHERE rn = @max
END
SET @max = @max - 1
END

SELECT * FROM @X
Go to Top of Page
   

- Advertisement -