AskSQLTeam (Ask SQLTeam Question, 0 Posts)
Posted - 2005-03-03 : 07:44:55
Elena writes: "Our company does direct mail. Many times we receive 3 or 4 files from a client that have to be deduped, and the file with the fewest records has to be protected. Sometimes we have to protect certain makes or year ranges as well. What is your expert advice on this?

Thanks in advance,
Elena Boughey
DME

btw, great site!!!"

nr (SQLTeam MVY, 12543 Posts)
Posted - 2005-03-03 : 08:27:08
What do you have? Is the data in a table, or do you have to look at a folder for the files? How do you know which client the files are from? Where do you get the make/year range from?

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

eboughey (Starting Member, 3 Posts)
Posted - 2005-03-03 : 11:35:35
The files we receive come in all types of formats, and nothing is ever uniform. A client might send us data from 2 or 3 different reporting systems. We record the filename as we import as well.

My main dilemma is how to protect the files as requested. I am currently adding a priority field as I import and running the duplicate processing ordered by priority. I am hoping there is a way to count how many records there are per filename and automatically protect the one with the lowest quantity, to remove the human-error issue.

Thanks

nr (SQLTeam MVY, 12543 Posts)
Posted - 2005-03-03 : 14:25:57
I repeat my questions. How are you detecting duplicates? How do you know how many recs are in a file? i.e. what data do you have to work on?

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

eboughey (Starting Member, 3 Posts)
Posted - 2005-03-04 : 15:34:21
I detect duplicates and update my file with the command below. If I have several files to protect, I order by priority code and dealer # first, so it pulls the duplicate from the highest record:

    UPDATE a
    SET Query_ID = 'DUPES'
    FROM [63805-client1] a
    WHERE a.uniqueid NOT IN
          (SELECT MIN(b.uniqueid)
           FROM [63805-client1] b
           WHERE b.LASTNAME = a.LASTNAME
             AND b.ADDRESS1 = a.ADDRESS1
             AND b.ZIP = a.ZIP)
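An aside on the pattern in that query: MIN(uniqueid) keeps whichever copy of a duplicate happened to be loaded first, so the protection depends entirely on load order. If the table also carries the priority column mentioned earlier in the thread (an assumption here, since it does not appear in the posted query), the keeper can be chosen explicitly instead. A sketch, not the poster's actual code:

```sql
-- Sketch only: assumes [63805-client1] also has a priority column,
-- where priority 1 marks the protected (smallest) file.
UPDATE a
SET Query_ID = 'DUPES'
FROM [63805-client1] a
WHERE a.uniqueid NOT IN
      (SELECT TOP 1 b.uniqueid
       FROM [63805-client1] b
       WHERE b.LASTNAME = a.LASTNAME
         AND b.ADDRESS1 = a.ADDRESS1
         AND b.ZIP      = a.ZIP
       ORDER BY b.priority, b.uniqueid)  -- keep the highest-priority copy
```

With ORDER BY priority, uniqueid the surviving row always comes from the priority-1 file, regardless of the physical order the files were loaded in.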

nr (SQLTeam MVY, 12543 Posts)
Posted - 2005-03-04 : 15:45:21
You seem to have a table [63805-client1] which contains:

    Query_ID
    uniqueid
    LASTNAME
    ADDRESS1
    ZIP

You are detecting duplicates on LASTNAME, ADDRESS1, ZIP. You seem to be leaving us to guess where dealer # and priority code are held. Still nothing about the number of records in a file, or even a file. I'm guessing that a uniqueid represents a file?

Try this: consider that you have not seen this system and know nothing about the problem. Read through your posts and see if you could possibly understand what the problem is from what you have written, then try composing the question again.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

rockmoose (SQL Natt Alfen, 3279 Posts)
Posted - 2005-03-04 : 16:05:36
I see nothing wrong with your query as such. Could you be more specific about the advice you want? Is it on the database design? (It looks like you have 1 table per file...)

If you had 1 table with LASTNAME + ADDRESS1 + ZIP as the PK, you would not have the dup problem there, but you would have to do some work to clean the data first.

rockmoose
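A minimal sketch of that design (the table name and column types are assumptions for illustration, not anything posted in the thread): with the dedup key as the primary key, a second copy of the same name/address/zip is rejected at insert time instead of being flagged afterwards.

```sql
-- Hypothetical single import table; names and types are guesses.
CREATE TABLE mailing_import (
    LASTNAME  varchar(50)   NOT NULL,
    ADDRESS1  varchar(100)  NOT NULL,
    ZIP       varchar(10)   NOT NULL,
    dealernum int           NULL,
    filename  varchar(255)  NULL,
    CONSTRAINT PK_mailing_import
        PRIMARY KEY (LASTNAME, ADDRESS1, ZIP)  -- duplicate keys cannot be inserted
)
```

As rockmoose notes, the incoming data would have to be cleaned first, since a duplicate key would otherwise make the insert fail rather than silently skip.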

eboughey (Starting Member, 3 Posts)
Posted - 2005-03-04 : 17:20:44
Okay, I hope I can explain it better here. I receive 3 files from a dealer that, say, we code #15:

    Mongoose honda file.xls (3500 records)
    Mongoose mits file.txt (1250 records)
    Service file for mits and honda.csv (12000 records)

I import the file information, including 3 new fields (dealernum, filename, and priority). I use the dealernum and priority fields to order my file permanently by dealer, priority before I do any deduping (which is why that information is not in my update query).

My question is: is there a way to avoid the human-error factor by using SQL to calculate and prioritize my files automatically using the filename field? I want it to protect the file with the least records and pull the duplicates from the file with the most records for the dealer.

Elena

nr (SQLTeam MVY, 12543 Posts)
Posted - 2005-03-04 : 17:44:26
So you have a table with this info, and a numrecs column, I guess. I would have thought you should be checking the DealerNum rather than the address. If this is something you set, then hold the address in another table and just hold the dealernum in this one. I assume you can set the dealernum to group the addresses. Then it's just:

    UPDATE t
    SET priority = (SELECT COUNT(*)
                    FROM tbl t2
                    WHERE t2.dealernum = t.dealernum
                      AND t2.numrecs <= t.numrecs)
    FROM tbl t

Then you can set the priority for those with the same number of recs using the uniqueid. Then those with priority 1 are the ones that aren't dups.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.
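Tying this back to Elena's filename question, a sketch of how the numrecs and priority values themselves could be derived from the imported rows, with no manual step. The table name `imported` and its columns are assumptions for illustration, not the actual system's schema:

```sql
-- Step 1 (assumed schema): stamp every row with its file's record count.
UPDATE f
SET numrecs = (SELECT COUNT(*)
               FROM imported i
               WHERE i.filename = f.filename)
FROM imported f

-- Step 2: rank the files within each dealer by size, smallest first,
-- so the file to protect ends up with priority 1.
-- Files with identical record counts tie and still need a tie-breaker
-- (e.g. uniqueid), as noted above.
UPDATE f
SET priority = (SELECT COUNT(DISTINCT i.filename)
                FROM imported i
                WHERE i.dealernum = f.dealernum
                  AND i.numrecs  <= f.numrecs)
FROM imported f
```

After that, the dedup update can order candidate keepers by priority, so the surviving record always comes from the smallest file for that dealer.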