Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Jimminy Cricket
Starting Member
3 Posts |
Posted - 2005-05-27 : 07:25:39
|
| Hello All,I'm writing a stored proc which will be used for deduplicating our client data. I need to output details of records which match on key fields entered by the user at the start (eg, surname, forenames and dob) but which have slight discrepancies on other fields.Whilst looping through 'sets' of potential duplicates I set flags on the following table to show which fields there are differences on.CREATE TABLE dbo.MisMatch ( [DupSet] [int], [SurnameMatch] [char] (1) DEFAULT 'Y', [ForenamesMatch] [char] (1) DEFAULT 'Y', [CltypeMatch] [char] (1) DEFAULT 'Y', [ClstatMatch] [char] (1) DEFAULT 'Y', [DobMatch] [char] (1) DEFAULT 'Y', [SalutnMatch] [char] (1) DEFAULT 'Y', [MailnameMatch] [char] (1) DEFAULT 'Y', [HouseMatch] [char] (1) DEFAULT 'Y', [NumberMatch] [char] (1) DEFAULT 'Y', [StreetMatch] [char] (1) DEFAULT 'Y', [TownMatch] [char] (1) DEFAULT 'Y', [CityMatch] [char] (1) DEFAULT 'Y', [CountyMatch] [char] (1) DEFAULT 'Y', [CountryMatch] [char] (1) DEFAULT 'Y', [PostcodeMatch] [char] (1) DEFAULT 'Y', [PafindMatch] [char] (1) DEFAULT 'Y', [BadaddressMatch] [char] (1) DEFAULT 'Y', [NinoMatch] [char] (1) DEFAULT 'Y', [PayrollnoMatch] [char] (1) DEFAULT 'Y', [ExtclrefMatch] [char] (1) DEFAULT 'Y', [CorrrefMatch] [char] (1) DEFAULT 'Y', [OccclassMatch] [char] (1) DEFAULT 'Y', [AgeadmitMatch] [char] (1) DEFAULT 'Y', [EligtaxMatch] [char] (1) DEFAULT 'Y', [TaxoffMatch] [char] (1) DEFAULT 'Y', [TaxrefMatch] [char] (1) DEFAULT 'Y', [DeathdtMatch] [char] (1) DEFAULT 'Y', [NotifdtMatch] [char] (1) DEFAULT 'Y', [NameadmitdMatch] [char] (1) DEFAULT 'Y', [DatechangeMatch] [char] (1) DEFAULT 'Y', [GoneawyindMatch] [char] (1) DEFAULT 'Y', [PaftypeMatch] [char] (1) DEFAULT 'Y', [LastupdMatch] [char] (1) DEFAULT 'Y', [ClientotherpensindMatch] [char] (1) DEFAULT 'Y', [ClientninocodeMatch] [char] (1) DEFAULT 'Y', [ClienttitleMatch] [char] (1) DEFAULT 'Y', [ClientspousedobMatch] [char] (1) DEFAULT 'Y', [ClientresidencecodeMatch] [char] (1) DEFAULT 'Y' )The following table stores the info for each client: CREATE TABLE dbo.PotentialDuplicates ( [IDNUM] [int] identity(1,1), [CLNO] [int], [SURNAME] [varchar] (40), [FORENAMES] [varchar] (40), [CLTYPE] [char] (1), [CLSTAT] [tinyint], [MARITSTAT] [tinyint], [DOB] [datetime], [SALUTN] [varchar] (40), [MAILNAME] [varchar] (40), [ADDRESS1] [varchar] (40), [ADDRESS2] [varchar] (40), [ADDRESS3] [varchar] (40), [ADDRESS4] [varchar] (40), [HOUSE] [varchar] (40), [NUMBER] [varchar] (40), [STREET] [varchar] (40), [TOWN] [varchar] (40), [CITY] [varchar] (40), [COUNTY] [varchar] (40), [COUNTRY] [varchar] (40), [POSTCODE] [char] (12), [PAFIND] [tinyint], [BADADDRESS] [tinyint], [NINO] [char] (9), [PAYROLLNO] [char] (12), [EXTCLREF] [char] (10), [CORRREF] [char] (6), [OCCCLASS] [int], [AGEADMIT] [tinyint], [SMOKER] [tinyint], [LTSTSAL] [int], [LTSTPNEARN] [money], [LTSTSALDT] [datetime], [ELIGTAX] [tinyint], [TAXOFF] [varchar] (20), [TAXREF] [char] (12), [DEATHDT] [datetime], [NOTIFDT] [datetime], [CLIENTNOTE] [text], [DAYTELNO] [varchar] (18), [EVETELNO] [varchar] (18), [CLIENTAG] [char] (8), [CAREIND] [tinyint], [NAMEADMITD] [tinyint], [DATECHANGE] [datetime], [GONEAWYIND] [tinyint], [PAFTYPE] [tinyint], [LASTUPD] [varchar] (22), [ClientOtherPensInd] [tinyint], [ClientNINoCode] [tinyint], [Address5] [varchar] (40), [Address6] [varchar] (40), [ClientTitle] [varchar] (40), [ClientExternalId] [char] (12), [ClientAddrChangeDate] [datetime], [CreatorId] [varchar] (12), [CreatorTypeCode] [tinyint], [CreatedDate] [datetime], [ClientSpouseDOB] [datetime], [ClientHomeEmail] [varchar] (60), [ClientWorkEmail] [varchar] (60), [ClientPIN] [char] (6), [ClientPINLstChngDt] [datetime], [ClientWebAccessReq] [tinyint], [ClientPINLetterSent] [tinyint], [ClientResidenceCode] [tinyint], [ClientLastMaxFundChkDt] [datetime], [ClientMaxFundChkExpiryDt] [datetime], [ClientMaxFundLtrResponseStat] [tinyint], [ClientMaxFundChkNxtRemindNo] [tinyint], [Action] [varchar] (45), [Master] [int], [Processed] [char] (1), [DupSet] [int] )I can join these two tables on Dupset. There will be one row on Mismatch for each Dupset but there could be two, three or more clients in each Dupset on PotentialDuplicates. So I have looped through and compared each client with the others in each Dupset for the fields on the Mismatch table and set the flag to 'N' if there is a difference on any of them. So that the users can then check whether the differences are geniune or not, I need to make the data easier to read, with the intention of putting it in a spreadsheet at the end. The format the table needs to be in is (Ignore the '....', that was the only way I could get it lined up properly!):Extclref PostCode NINO DOB Surname Forenames Mismatch Details175857X......................................... .............Clno...........24586.........78787......85868...................................................................DOB..........17/12/64.....17/12/64...17/12/63...................................................................Forenames..Susan.......Susan.......Ssuan188799C........................................................Clno..........44895........44896.......118239 ...................................................................Forenames..Ann.........Ann-Marie.....Ann-MarieThe first six columns on the left hand side of the table will display the criteria that was entered by the user at the start, that the clients in each set do match on. This will come from parameters passed to the Stored Proc. The Mismatch Details will vary for each 'set' and depend on the flags set in the Mismatch table. I guess we could have the mismatch details laid out the other way, such as Clno .......DOB.....Forenames 24586....17/12/64...Susan 78787....17/12/64...Susan 85868....17/12/63...SsuanCan anybody help with how to get the mismatch info out without having a column for every field that there could possibly be a mismatch on. I want to be able to determine at the time for each 'set' which fields I need to output.Hope that makes sense!Cheers,James. |
|
|
jhermiz
3564 Posts |
Posted - 2005-05-27 : 10:15:01
|
Holy table batman. Keeping the web experience alive -- [url]http://www.web-impulse.com[/url]Imperfection living for perfection -- [url]http://jhermiz.blogspot.com/[/url] |
 |
|
|
tuenty
Constraint Violating Yak Guru
278 Posts |
Posted - 2005-05-27 : 12:24:35
|
Can you post sample data of what a table will haveWhat event should take placeand what the result should be??Does the table already contains duplicated keys? If not you just need to set the PKA candle loses nothing by lighting another candle |
 |
|
|
Jimminy Cricket
Starting Member
3 Posts |
Posted - 2005-05-30 : 16:09:32
|
| In the example above, client records which have the same value in the 'Extclref' field are being grouped together and compared against each other to see if all the other fields on the client record match.There are 3 client records which have the same Extclref value of 175857X.The PotentialDuplicates table looks like this:Idnum, Clno, Surname, Forenames, Cltype, Clstat, Maritstat,DOB, .etc.15, 24586, Smith, Susan, 1, 3, 5, 17/12/64....etc.356, 78787, Smith, Susan, 1, 3, 5, 17/12/64...etc.1015, 85868, Smith, Ssuan, 1, 3, 5, 17/12/63....etc.We loop through and compare each field and if the values match then the column on the 'Mismatch' table is left as a 'Y'. If there is a difference then it is set to 'N'. So the 'Mismatch' table would look like this:Dupset,SurnameMatch,ForenamesMatch,CltypeMatch,ClstatMatch,DobMatch35, Y, N, Y, Y, N, Y, Y, Y, Y, ....., YThe 35 (the 'Dupset' field) is a unique number and is just for the set of clients that are being compared. The two 'N' values are because there has been a mismatch on the Forenames and DOB. All the other columns are set to 'Y' as there are no other differences.The first two records on PotentialDuplicates are okay but the third one has differences on the Forenames and DOB so we need to highlight this by outputting to a spreadsheet for the users to look at and make amendments as necessary. It needs to look like my example above. On the left hand side is the parameter used to group 'sets' of clients - the Extclref column. On the right is the mismatch columns plus the 'Clno' which is a unique identifier for each client to help the users look them up quickly.So I need a way of going through the Mismatch table to check which columns have an 'N' value and then output the corresponding values from the PotentialDuplicates table for each client in that 'set'. Preferably in the layout specified. The Mismatch and PotentialDuplicates tables can be joined on the 'Dupset' field. There will be just one row on the Mismatch table for each Dupset, bit there will be 2,3 or more rows on the PotentialDuplicates table for each Dupset.Does any of that make ANY sense?!!!!! |
 |
|
|
jen
Master Smack Fu Yak Hacker
4110 Posts |
Posted - 2005-05-30 : 23:12:35
|
| why are you allowing duplicates to be entered if you're going to remove them?anyways, you should set which fields are to be unique and group those fields and return count>1something likeselect field1, field2, field3from table1group by field1,field2,field3having count(*)>1then to show those records, make a select using those fieldsHTH--------------------keeping it simple... |
 |
|
|
|
|
|
|
|