Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
SamC
White Water Yakist
3467 Posts |
Posted - 2005-06-24 : 10:57:39
|
This is an application of Arnold's string distance algorithm, but it's tough... I have two tables... TableA, TableB. Both have Firstname, Middlename, Lastname columns and are *unfortunately*, the only key which can be used to match the two tables. Spelling errors.Write a query that will return a recordset of exact matches, and include records without an exact match but have a minimal distance (2 or 3) for those that rows do not match. |
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2005-06-24 : 11:10:30
|
How about just use the function Arnold provided with a cross join?select a.FirstName [AFirst] ,a.middleName [AMiddle] ,a.LastName [ALast] ,b.FirstName [BFirst] ,b.middleName [BMiddle] ,b.LastName [BLast]from tableA across join tableB bwhere dbo.edit_distance( a.FirstName+a.middleName+a.LastName, b.FirstName+b.middleName+b.LastName) < 4 Be One with the OptimizerTG |
 |
|
SamC
White Water Yakist
3467 Posts |
Posted - 2005-06-24 : 12:54:48
|
quote: Originally posted by TG How about just use the function Arnold provided with a cross join?
Won't it match a Row from TableA to more than one row in TableB?I didn't state the problem well. It's one I've been playing with this week. TableA and TableB should have unique matches, but there are misspellings. (My solution is the misspellings should be corrected., but it's interesting to toy with a distance formula to see how close you can get without correcting the misspellings.)The problem statement could be refined to something like: Match each row in TableA to a single row in TableB with the minimum distance.e.g. one match, distance of 0, then 1, then 2 until there is a match.If two rows match at a distance level (e.g. two matches at 1), it's no match. |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2005-06-24 : 15:10:45
|
then i would follow TG's approach... more or less  select a.FirstName [AFirst] ,a.middleName [AMiddle] ,a.LastName [ALast] ,b.FirstName [BFirst] ,b.middleName [BMiddle] ,b.LastName [BLast] ,dbo.edit_distance( a.FirstName+a.middleName+a.LastName, b.FirstName+b.middleName+b.LastName) [distance] Into #resultsPrep from tableA a cross join tableB b where dbo.edit_distance( a.FirstName+a.middleName+a.LastName, b.FirstName+b.middleName+b.LastName) < 4 Select Y.* From ( Select AFirst, AMiddle, ALast, distance=min(distance) From #resultsPrep s1 Group By AFirst, AMiddle, ALast Having (Select count(*) From #resultsPrep Where AFirst=s1.AFirst and AMiddle=s1.AMiddle and ALast=s1.ALast and distance=min(s1.distance))=1 ) Z Inner Join #resultsPrep Y On Z.AFirst=Y.AFirst and Z.AMiddle=Y.AMiddle and Z.ALast=Y.ALast and Z.distance=Y.distance Drop Table #resultsPrep Corey Co-worker on The Wizard of Oz "...those three midgets that came out and danced, the freaked me out when I was little. But they are ok now." |
 |
|
SamC
White Water Yakist
3467 Posts |
Posted - 2005-06-24 : 16:51:04
|
I wonder if the distance would give better results if applied individually to First, Middle, Last name.Sam |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2005-06-24 : 17:40:21
|
I don't think it matters... because of the algorithmCorey Co-worker on The Wizard of Oz "...those three midgets that came out and danced, the freaked me out when I was little. But they are ok now." |
 |
|
|
|
|
|
|