Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 Transact-SQL (2000)
 Help with Stored Proc please

Author  Topic 

Jimminy Cricket
Starting Member

3 Posts

Posted - 2005-05-27 : 07:25:39
Hello All,

I'm writing a stored proc which will be used for deduplicating our client data. I need to output details of records which match on key fields entered by the user at the start (eg, surname, forenames and dob) but which have slight discrepancies on other fields.

Whilst looping through 'sets' of potential duplicates I set flags on the following table to show which fields there are differences on.

CREATE TABLE dbo.MisMatch (
[DupSet] [int],
[SurnameMatch] [char] (1) DEFAULT 'Y',
[ForenamesMatch] [char] (1) DEFAULT 'Y',
[CltypeMatch] [char] (1) DEFAULT 'Y',
[ClstatMatch] [char] (1) DEFAULT 'Y',
[DobMatch] [char] (1) DEFAULT 'Y',
[SalutnMatch] [char] (1) DEFAULT 'Y',
[MailnameMatch] [char] (1) DEFAULT 'Y',
[HouseMatch] [char] (1) DEFAULT 'Y',
[NumberMatch] [char] (1) DEFAULT 'Y',
[StreetMatch] [char] (1) DEFAULT 'Y',
[TownMatch] [char] (1) DEFAULT 'Y',
[CityMatch] [char] (1) DEFAULT 'Y',
[CountyMatch] [char] (1) DEFAULT 'Y',
[CountryMatch] [char] (1) DEFAULT 'Y',
[PostcodeMatch] [char] (1) DEFAULT 'Y',
[PafindMatch] [char] (1) DEFAULT 'Y',
[BadaddressMatch] [char] (1) DEFAULT 'Y',
[NinoMatch] [char] (1) DEFAULT 'Y',
[PayrollnoMatch] [char] (1) DEFAULT 'Y',
[ExtclrefMatch] [char] (1) DEFAULT 'Y',
[CorrrefMatch] [char] (1) DEFAULT 'Y',
[OccclassMatch] [char] (1) DEFAULT 'Y',
[AgeadmitMatch] [char] (1) DEFAULT 'Y',
[EligtaxMatch] [char] (1) DEFAULT 'Y',
[TaxoffMatch] [char] (1) DEFAULT 'Y',
[TaxrefMatch] [char] (1) DEFAULT 'Y',
[DeathdtMatch] [char] (1) DEFAULT 'Y',
[NotifdtMatch] [char] (1) DEFAULT 'Y',
[NameadmitdMatch] [char] (1) DEFAULT 'Y',
[DatechangeMatch] [char] (1) DEFAULT 'Y',
[GoneawyindMatch] [char] (1) DEFAULT 'Y',
[PaftypeMatch] [char] (1) DEFAULT 'Y',
[LastupdMatch] [char] (1) DEFAULT 'Y',
[ClientotherpensindMatch] [char] (1) DEFAULT 'Y',
[ClientninocodeMatch] [char] (1) DEFAULT 'Y',
[ClienttitleMatch] [char] (1) DEFAULT 'Y',
[ClientspousedobMatch] [char] (1) DEFAULT 'Y',
[ClientresidencecodeMatch] [char] (1) DEFAULT 'Y'
)

The following table stores the info for each client:
CREATE TABLE dbo.PotentialDuplicates (
[IDNUM] [int] identity(1,1),
[CLNO] [int],
[SURNAME] [varchar] (40),
[FORENAMES] [varchar] (40),
[CLTYPE] [char] (1),
[CLSTAT] [tinyint],
[MARITSTAT] [tinyint],
[DOB] [datetime],
[SALUTN] [varchar] (40),
[MAILNAME] [varchar] (40),
[ADDRESS1] [varchar] (40),
[ADDRESS2] [varchar] (40),
[ADDRESS3] [varchar] (40),
[ADDRESS4] [varchar] (40),
[HOUSE] [varchar] (40),
[NUMBER] [varchar] (40),
[STREET] [varchar] (40),
[TOWN] [varchar] (40),
[CITY] [varchar] (40),
[COUNTY] [varchar] (40),
[COUNTRY] [varchar] (40),
[POSTCODE] [char] (12),
[PAFIND] [tinyint],
[BADADDRESS] [tinyint],
[NINO] [char] (9),
[PAYROLLNO] [char] (12),
[EXTCLREF] [char] (10),
[CORRREF] [char] (6),
[OCCCLASS] [int],
[AGEADMIT] [tinyint],
[SMOKER] [tinyint],
[LTSTSAL] [int],
[LTSTPNEARN] [money],
[LTSTSALDT] [datetime],
[ELIGTAX] [tinyint],
[TAXOFF] [varchar] (20),
[TAXREF] [char] (12),
[DEATHDT] [datetime],
[NOTIFDT] [datetime],
[CLIENTNOTE] [text],
[DAYTELNO] [varchar] (18),
[EVETELNO] [varchar] (18),
[CLIENTAG] [char] (8),
[CAREIND] [tinyint],
[NAMEADMITD] [tinyint],
[DATECHANGE] [datetime],
[GONEAWYIND] [tinyint],
[PAFTYPE] [tinyint],
[LASTUPD] [varchar] (22),
[ClientOtherPensInd] [tinyint],
[ClientNINoCode] [tinyint],
[Address5] [varchar] (40),
[Address6] [varchar] (40),
[ClientTitle] [varchar] (40),
[ClientExternalId] [char] (12),
[ClientAddrChangeDate] [datetime],
[CreatorId] [varchar] (12),
[CreatorTypeCode] [tinyint],
[CreatedDate] [datetime],
[ClientSpouseDOB] [datetime],
[ClientHomeEmail] [varchar] (60),
[ClientWorkEmail] [varchar] (60),
[ClientPIN] [char] (6),
[ClientPINLstChngDt] [datetime],
[ClientWebAccessReq] [tinyint],
[ClientPINLetterSent] [tinyint],
[ClientResidenceCode] [tinyint],
[ClientLastMaxFundChkDt] [datetime],
[ClientMaxFundChkExpiryDt] [datetime],
[ClientMaxFundLtrResponseStat] [tinyint],
[ClientMaxFundChkNxtRemindNo] [tinyint],
[Action] [varchar] (45),
[Master] [int],
[Processed] [char] (1),
[DupSet] [int]
)

I can join these two tables on Dupset. There will be one row on Mismatch for each Dupset but there could be two, three or more clients in each Dupset on PotentialDuplicates. So I have looped through and compared each client with the others in each Dupset for the fields on the Mismatch table and set the flag to 'N' if there is a difference on any of them.

So that the users can then check whether the differences are geniune or not, I need to make the data easier to read, with the intention of putting it in a spreadsheet at the end. The format the table needs to be in is (Ignore the '....', that was the only way I could get it lined up properly!):

Extclref PostCode NINO DOB Surname Forenames Mismatch Details
175857X......................................... .............Clno...........24586.........78787......85868
...................................................................DOB..........17/12/64.....17/12/64...17/12/63
...................................................................Forenames..Susan.......Susan.......Ssuan

188799C........................................................Clno..........44895........44896.......118239
...................................................................Forenames..Ann.........Ann-Marie.....Ann-Marie

The first six columns on the left hand side of the table will display the criteria that was entered by the user at the start, that the clients in each set do match on. This will come from parameters passed to the Stored Proc. The Mismatch Details will vary for each 'set' and depend on the flags set in the Mismatch table. I guess we could have the mismatch details laid out the other way, such as
Clno .......DOB.....Forenames
24586....17/12/64...Susan
78787....17/12/64...Susan
85868....17/12/63...Ssuan

Can anybody help with how to get the mismatch info out without having a column for every field that there could possibly be a mismatch on. I want to be able to determine at the time for each 'set' which fields I need to output.

Hope that makes sense!

Cheers,
James.

jhermiz

3564 Posts

Posted - 2005-05-27 : 10:15:01
Holy table batman.



Keeping the web experience alive -- [url]http://www.web-impulse.com[/url]
Imperfection living for perfection --
[url]http://jhermiz.blogspot.com/[/url]
Go to Top of Page

tuenty
Constraint Violating Yak Guru

278 Posts

Posted - 2005-05-27 : 12:24:35
Can you post sample data of what a table will have
What event should take place
and what the result should be??

Does the table already contains duplicated keys? If not you just need to set the PK


A candle loses nothing by lighting another candle
Go to Top of Page

Jimminy Cricket
Starting Member

3 Posts

Posted - 2005-05-30 : 16:09:32
In the example above, client records which have the same value in the 'Extclref' field are being grouped together and compared against each other to see if all the other fields on the client record match.

There are 3 client records which have the same Extclref value of 175857X.

The PotentialDuplicates table looks like this:
Idnum, Clno, Surname, Forenames, Cltype, Clstat, Maritstat,DOB, .etc.
15, 24586, Smith, Susan, 1, 3, 5, 17/12/64....etc.
356, 78787, Smith, Susan, 1, 3, 5, 17/12/64...etc.
1015, 85868, Smith, Ssuan, 1, 3, 5, 17/12/63....etc.

We loop through and compare each field and if the values match then the column on the 'Mismatch' table is left as a 'Y'. If there is a difference then it is set to 'N'. So the 'Mismatch' table would look like this:
Dupset,SurnameMatch,ForenamesMatch,CltypeMatch,ClstatMatch,DobMatch
35, Y, N, Y, Y, N, Y, Y, Y, Y, ....., Y

The 35 (the 'Dupset' field) is a unique number and is just for the set of clients that are being compared. The two 'N' values are because there has been a mismatch on the Forenames and DOB. All the other columns are set to 'Y' as there are no other differences.

The first two records on PotentialDuplicates are okay but the third one has differences on the Forenames and DOB so we need to highlight this by outputting to a spreadsheet for the users to look at and make amendments as necessary. It needs to look like my example above. On the left hand side is the parameter used to group 'sets' of clients - the Extclref column. On the right is the mismatch columns plus the 'Clno' which is a unique identifier for each client to help the users look them up quickly.

So I need a way of going through the Mismatch table to check which columns have an 'N' value and then output the corresponding values from the PotentialDuplicates table for each client in that 'set'. Preferably in the layout specified. The Mismatch and PotentialDuplicates tables can be joined on the 'Dupset' field. There will be just one row on the Mismatch table for each Dupset, bit there will be 2,3 or more rows on the PotentialDuplicates table for each Dupset.

Does any of that make ANY sense?!!!!!
Go to Top of Page

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2005-05-30 : 23:12:35
why are you allowing duplicates to be entered if you're going to remove them?

anyways, you should set which fields are to be unique and group those fields and return count>1

something like

select field1, field2, field3
from table1
group by field1,field2,field3
having count(*)>1

then to show those records, make a select using those fields

HTH



--------------------
keeping it simple...
Go to Top of Page
   

- Advertisement -