
 Re: Deleting Duplicates, but protecting lowest qty


AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2005-03-03 : 07:44:55
Elena writes "Our company does direct mail. Many times we receive 3 or 4 files from a client that have to be deduped, and the file with the lowest number of records has to be protected.

Sometimes we have to protect certain makes or year ranges as well.

What is your expert advice on this?

Thanks in advance,

Elena Boughey
DME

btw, great site!!!"

nr
SQLTeam MVY

12543 Posts

Posted - 2005-03-03 : 08:27:08
What do you have? Is the data in a table, or do you have to look at a folder for the files?
How do you know which client the files are from?
Where do you get the make/year range from?

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

eboughey
Starting Member

3 Posts

Posted - 2005-03-03 : 11:35:35
The files we receive come in all types of formats and nothing is ever uniform. A client might send us data from 2 or 3 different reporting systems. We record the filename as we import as well.

My main dilemma is how to protect the files as requested. I am currently adding a priority field as I import and running the duplicate processing ordered by priority. I am hoping there is a way to count how many records each filename has and automatically protect the one with the lowest count, to remove the human error issue.
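For the counting part, a GROUP BY over the import table would give the per-file totals automatically. A minimal sketch, assuming all the imported rows land in a single table (the [63805-client1] table shown later in the thread) with the filename field described above:

-- Count records per imported file; the first row is the file to protect.
SELECT filename, COUNT(*) AS numrecs
FROM [63805-client1]
GROUP BY filename
ORDER BY COUNT(*) ASC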

Thanks

nr
SQLTeam MVY

12543 Posts

Posted - 2005-03-03 : 14:25:57
I repeat my questions.
How are you detecting duplicates? How do you know how many records are in a file?

I.e., what data do you have to work on?

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

eboughey
Starting Member

3 Posts

Posted - 2005-03-04 : 15:34:21
I detect duplicates and update my file with the command below. If I have several files, I order by priority code and dealer # to protect data, so the dedupe pulls the duplicate from the highest record....

-- Flag every row that is not the first (lowest uniqueid) of its
-- lastname/address/zip group as a duplicate.
UPDATE a
SET Query_ID = 'DUPES'
FROM [63805-client1] a
WHERE a.uniqueid NOT IN (SELECT MIN(b.uniqueid)
                         FROM [63805-client1] b
                         WHERE b.LASTNAME = a.LASTNAME
                           AND b.ADDRESS1 = a.ADDRESS1
                           AND b.ZIP = a.ZIP)
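A variant of the same dedupe that does not depend on the physical order of the file, using the priority field directly, is sketched below (it assumes the priority and uniqueid columns behave as described above, with a lower priority meaning more protected):

-- Mark a row as a dupe whenever a "better" row exists in the same
-- lastname/address/zip group: lower priority first, lower uniqueid as tie-break.
UPDATE a
SET Query_ID = 'DUPES'
FROM [63805-client1] a
WHERE EXISTS (SELECT *
              FROM [63805-client1] b
              WHERE b.LASTNAME = a.LASTNAME
                AND b.ADDRESS1 = a.ADDRESS1
                AND b.ZIP = a.ZIP
                AND (b.priority < a.priority
                     OR (b.priority = a.priority AND b.uniqueid < a.uniqueid)))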

nr
SQLTeam MVY

12543 Posts

Posted - 2005-03-04 : 15:45:21
you seem to have a table [63805-client1]
which contains
Query_ID
uniqueid
LASTNAME
ADDRESS1
ZIP

You are detecting duplicates on LASTNAME, ADDRESS1, ZIP

You seem to be leaving us to guess where dealer # and priority code are held.
Still nothing about the number of records in a file, or even about the files themselves. I'm guessing that a uniqueid represents a file?

Try this. Consider that you have not seen this system and know nothing about the problem. Read through your posts and see if you could possibly understand what the problem is from what you have written, then try composing the question again.


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-03-04 : 16:05:36
I see nothing wrong with your query as such.
Could you be more specific about the advice you want?

Is it on the database design? (It looks like you have 1 table per file...)
- If you had 1 table with LASTNAME+ADDRESS+ZIP as PK, you would not have the dup problem there,
but you would have to do some work to clean the data first (a sketch below).
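A rough sketch of that single-table design (the table name and column types are made up for illustration):

-- One row per lastname/address/zip; the PK makes a second copy impossible.
CREATE TABLE ClientAddress (
    LASTNAME varchar(50)  NOT NULL,
    ADDRESS1 varchar(100) NOT NULL,
    ZIP      varchar(10)  NOT NULL,
    filename varchar(255) NOT NULL,  -- which import the row came from
    CONSTRAINT PK_ClientAddress PRIMARY KEY (LASTNAME, ADDRESS1, ZIP)
)
-- An INSERT of a duplicate LASTNAME+ADDRESS1+ZIP now fails with a PK
-- violation, so incoming files have to be cleaned before loading.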



rockmoose

eboughey
Starting Member

3 Posts

Posted - 2005-03-04 : 17:20:44
Okay, I hope I can explain it better here.

I receive 3 files from a dealer that, say, we code as dealer #15:

Mongoose honda file.xls (3500 records)
Mongoose mits file.txt (1250 records)
Service file for mits and honda.csv (12000 records)


I import the file information, adding 3 new fields (dealernum, filename, and priority). I use the dealernum & priority fields to order my file permanently by dealer and priority before I do any deduping (which is why that information is not in my update query).

My question is: is there a way to avoid the human error factor by using SQL to calculate and prioritize my files automatically, using the filename field? I want it to protect the file with the fewest records and pull the duplicates from the file with the most records for the dealer.


Elena



nr
SQLTeam MVY

12543 Posts

Posted - 2005-03-04 : 17:44:26
So you have a table with this info and a numrecs column I guess.
I would have thought you should be checking the DealerNum rather than the address.
If this is something you set, then hold the address in another table and just hold the dealernum in this one. I assume you can set the dealernum to group the addresses.
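If that per-file table doesn't exist yet, it could be built from the import table first; a sketch, assuming everything is loaded into one table such as [63805-client1] with the dealernum and filename fields from the earlier post:

-- One row per dealer/file, with its record count and a priority placeholder.
SELECT dealernum, filename,
       MIN(uniqueid) AS uniqueid,
       COUNT(*) AS numrecs,
       0 AS priority
INTO #tbl
FROM [63805-client1]
GROUP BY dealernum, filename

(tbl in the update below would then be this work table.)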
Then it's just
-- Rank each dealer's files by size: the smallest file gets priority 1.
UPDATE t
SET priority = (SELECT COUNT(*) FROM tbl t2
                WHERE t2.dealernum = t.dealernum AND t2.numrecs <= t.numrecs)
FROM tbl t

Then you can set the priority for those with the same number of recs using the uniqueid.
Then those with priority 1 are the ones that aren't dups.
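One way to do that tie-break (a sketch; it assumes uniqueid is unique per row of the file table) is to fold it into the same correlated count:

-- Rank by numrecs, then by uniqueid, so every dealer gets exactly one
-- priority-1 file: the smallest one, i.e. the one to protect.
UPDATE t
SET priority = (SELECT COUNT(*) FROM tbl t2
                WHERE t2.dealernum = t.dealernum
                  AND (t2.numrecs < t.numrecs
                       OR (t2.numrecs = t.numrecs AND t2.uniqueid <= t.uniqueid)))
FROM tbl t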

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.