Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Import/Export (DTS) and Replication (2000)
 Scrub Data using DTS

Author  Topic 

blinton25
Yak Posting Veteran

90 Posts

Posted - 2003-08-22 : 09:41:49
Hello,

How are you today?

I wish to carry out the following:

1. Periodocally Import data from a csv file into a temp table.
2. Clean the data (which mainly would involve identifying values not
present in a lookup table and cleaning Postal Code info.)
3. Report any errors found so these can be fixed
4. Append the clean data to the destination table.

Since my destination table will be in SQL Server I wish to use DTS,
but was wondering if they are any good tutorials which I can use as a
guideline or alternatively if scripts exist to achieve this goal.

tkizer
Almighty SQL Goddess

38200 Posts

Posted - 2003-08-22 : 12:37:56
Have a look at:

[url]http://sqldts.com/[/url]

Tara
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-08-22 : 20:34:24
You can use BULK INSERT in a T-SQL stored procedure, no need for DTS at all, and you can use it with temp tables as well.

We can help out a little more if you post some sample data and describe the cleanup process you need to perform.
Go to Top of Page

blinton25
Yak Posting Veteran

90 Posts

Posted - 2003-08-22 : 23:30:54
Hello,

For example my raw data has flight codes:

A13
8G-ZAZ
QR1/PW-EFT

I have a list of valid flight codes in a table, so if a mistake was made in data entry (78G-ZAZ), then I want to detect this error (Flight Code present which does not match any in my look up table).

From there I am not absolutely sure how to proceed, and would welcome suggestions:

1. Should I just copy the error free data to its final destination, and store the records which had errors in an error table? Issue here is when a user corrects the errors how do they copy them to the final destination table, and should I run an error check again?

2. Maybe the records with errors could be flagged and allow the user to correct them (while still in the temp table) and then copy the data from the temp table to the final destination.

Other considerations:
a) I would want to check several fields for errors, so should the user check all fields at once and then display a list of all errors found, or should they check field by field?

b) Some errors may be common, would be nice to automatically change these to the correct value and inform the user of the change made.

c) Actually thinking about this any error detected could be automatically corrected, and then the user would be shown the correction so they can decide if to keep it or change it before the data is copied to the final destination. In cases where automatic correction wasn't possible a default value (Unable to correct record) could be used.

These are just some thoughts, if a better process exists feel free to indicate.
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2003-08-23 : 22:08:15
Do you know T-SQL? (not meant to be disparaging)

If so...

BULK INSERT like Rob suggests, or bcp...(or DTS if you must)...

into to a table of all varchar data...

then interogate what you've got...

ISDATE and ISNUMERIC will be helpful...

Got headers and trailers (oh I forgot this ain't mainframe)

If you do, search the site on how to remove them...

oh, and if not...how do you plan to AUDIT and PROVE what you recieved was correct?

Load the data to a holding table (work table) and then use code to make sure everything checks out.. then INSERT into your table...


MOO




Brett

8-)

SELECT POST=NewId()

That's correct! It's an AlphaNumeric!
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2003-08-23 : 22:15:33
quote:

c) Actually thinking about this any error detected could be automatically corrected, and then the user would be shown the correction so they can decide if to keep it or change it before the data is copied to the final destination. In cases where automatic correction wasn't possible a default value (Unable to correct record) could be used.




Exactly...didn't read this part...sorry...

All the more reason to do this in a sproc (stored prpcedure).

Need a separate table to weed out the "bad" stuff...

Your "businessm" rules would do this..the more "table driven" your rules are the better...

Read the rules table to fiigure out what is good.

that way you don't have to touch code...

again...

moo




Brett

8-)

SELECT POST=NewId()

That's correct! It's an AlphaNumeric!
Go to Top of Page
   

- Advertisement -