Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 SOUNDEX / DIFFERENCE - HELP

Author  Topic 

cvarick2
Starting Member

7 Posts

Posted - 2010-12-09 : 16:07:55
I need help with SOUNDEX / DIFFERENCE

I have two tables which I would like to compare the string in two particular columns.

Example:
table1
fname
-----
mike
andy
sam

table2
fname
-----
bob
fred
anthony
chuck
drew
jeff
victor
samantha

I need to create a query that will compare each of table1's fname values to each of the table2's fname values and return the 'fname' from table2 that matches with table1. The DIFFERENCE must be 4 or 3

Any help??

I am no pro at t-sql so I have something that obviously didn't work:
SELECT t1.fname, t2.fname
FROM table1 t1, table2 t2
WHERE DIFFERENCE(t1.fname, t2.fname) = 4

Doesn't seem to work properly..


Thanks in advance,
Chris Varick

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-09 : 16:15:22
[code]SELECT t2.fname
FROM table1 t1
JOIN table2 t2
On DIFFERENCE(t1.fname, t2.fname) > 2[/code]
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-09 : 16:17:42
@Russell: Possibly needs

SELECT t1.fname, t2.fname
...

??
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-09 : 16:20:22
Yep, probably
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-09 : 16:24:12
OK, probably a possibility then
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-09 : 16:25:09
lol
Go to Top of Page

cvarick2
Starting Member

7 Posts

Posted - 2010-12-09 : 16:29:19
@Russell:

Thanks for your help, however I run into the same situation as my original query. When I run the query:
SELECT t2.fname, t1.fname
FROM table1 t1
JOIN table2 t2
On DIFFERENCE(t1.fname, t2.fname) > 2

I get something like the following:
t1.fname | t2.fname
sam | samantha
sam | fred
sam | rob
sam | jeff
sam | chuck

Any suggestions? Also for this example table1 has 10 records where as table2 has 10,000 ....

Thanks!

Thanks in advance,
Chris Varick
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-09 : 16:44:41
You sure?

SELECT DIFFERENCE('sam', 'fred')

should give 2 (and thus should not be included)

perhaps post your exact code (assuming its a bit more complex than the example here) in case anyone can spot and snags with it
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-09 : 16:44:41
Using your sample data, I get two results. Are you sure that DIFFERENCE of 3 or 4 is what you want? What are the expected results?

Running this:
Create Table table1 (fname varchar(16));
go
Create Table table2 (fname varchar(16));
go
insert table1 values('mike');
insert table1 values('andy');
insert table1 values('sam');
insert table2 values('bob');
insert table2 values('fred');
insert table2 values('anthony');
insert table2 values('chuck');
insert table2 values('drew');
insert table2 values('jeff');
insert table2 values('victor');
insert table2 values('samantha');
GO

SELECT t1.fname, t2.fname, DIFFERENCE(t1.fname, t2.fname)
FROM table1 t1
JOIN table2 t2
On DIFFERENCE(t1.fname, t2.fname) > 2;
GO

Returns this:

fname fname
---------------- ---------------- -----------
mike chuck 3
andy anthony 3

(2 row(s) affected)
Go to Top of Page

cvarick2
Starting Member

7 Posts

Posted - 2010-12-09 : 16:53:26
@Russell:

Maybe I am experiencing an issue because I have duplicates in the fname column in table2?
I know I do so what is your suggestion on that? :)

Thanks!

Thanks in advance,
Chris Varick
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-09 : 17:03:11
Make sure you have t1 and t2 in the right order.

You'll see what I mean if you execute this:
select difference ('andy', 'samantha')
select difference ('samantha', 'andy')
Go to Top of Page

cvarick2
Starting Member

7 Posts

Posted - 2010-12-13 : 16:05:35
Looks like I am still getting same issue. Where table1 fname field is the same value for all the table2 fname fields.

This was just an example not an actual return query:
t1.fname | t2.fname
-------- ---------
sam | samantha
sam | fred
sam | rob
sam | jeff
sam | chuck

This example was showing what I am getting back... I tried doing the following: (putting distinct in)
But this query is taking quite some time I will see if this helps in about 30 minutes more I hope!

SELECT Distinct t1.fname, t2.fname, DIFFERENCE(t1.fname, t2.fname)
FROM table1 t1
JOIN table2 t2
On DIFFERENCE(t1.fname, t2.fname) > 2;


Thanks in advance,
Chris Varick
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-13 : 16:59:15
If you're getting DIFFERENCEs < 3 then switch around t1 and t2.
Go to Top of Page

cvarick2
Starting Member

7 Posts

Posted - 2010-12-13 : 17:49:22
@Russell here is what I have.

SELECT us.email, can.email, difference(left(CAN.EMAIL,CHARINDEX('@',CAN.EMAIL)-1), left(US.EMAIL,CHARINDEX('@',US.EMAIL)-1))
FROM table1 CAN
JOIN table2 US
On DIFFERENCE(left(CAN.EMAIL,CHARINDEX('@',CAN.EMAIL)-1), left(US.EMAIL,CHARINDEX('@',US.EMAIL)-1)) > 2


This is query result:
----------------------
us.email can.email diff
jerome.clampitt@aol.com jchinniah@hotmail.com 3
jerome.clampitt@aol.com karenh@hotmail.com 3
jerome.clampitt@aol.com jsnow@rogers.com 3
jerome.clampitt@aol.com jlin@dowco.com 3
jerome.clampitt@aol.com brianh@freemail.hu 3
jerome.clampitt@aol.com ji63jap@yahoo.com 3

not sure what is happening...

Thanks in advance,
Chris Varick
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-13 : 18:24:51
Let's start from the beginning. What are you trying to accomplish?

Show both table definitions, a few sample rows and expected results.

We'll get you straightened out.
Go to Top of Page

cvarick2
Starting Member

7 Posts

Posted - 2010-12-13 : 19:06:47
Russell

It's a school project. I am doing something creative, but the summary of it is I am comparing email address similarities. So I have two tables table1 contains let's say 100 records while table2 contains 10,000 records.

I want to return 1 record from table2 that has a difference of 3 or more for each record in table1. In this case I am dealing with email addresses, so I am only comparing the string left of the '@'. Its nothing exciting just trying something out... I guess my problem is that there is the possiblility that more than one record in table2 will match for each record in table1. Since I am not great at SQL I do not know how to make it work.

Examples (note these are all generated emails):

Table1:
field1
ddubois3138@sympatico.ca
cgonzalez@efni.com
fstpierre@accesswave.ca
asamra@nbnet.nb.ca
roberte@canada.com

Table2:
field1
jerome.clampitt@aol.com
lisah2@mindspring.com
dbessing@hotmail.com
lincolntcm3@aol.com
dmcclung3@bellsouth.net
carmenc1968@sbcglobal.net
mbmpines@msn.com
baybeeangel1@aol.com
sniraula@peoplepc.com
crunkoncrack3@yahoo.com
henrylinder76@yahoo.com
hotdebndesert@hotmail.com
debradickerson@verizon.net
unicorn_baby74467@yahoo.com
ryanstephanie@hotmail.com
alsion@aol.com
arthur.belcher@gmail.com
racingmax@gateway.net
justincalady@yahoo.com
sowhat073@yahoo.com
tywilwar@yahoo.com
joseph.palazzolo5@nc.rr.com
cv_cheer@hotmail.com
sexydevil0815@yahoo.com
chinostiff@hotmail.com
libbyblodgett@yahoo.com
leetx@live.com
legendmrlee73@yahoo.com
steviedino@hotmail.com
diana.wright@blackplanet.com
martin5646@sbcglobal.net
michael.french5@aol.com
carlyle.curtis@hotmail.com
dyigdal@aol.com
lord_dog@att.net
ndawg87@gmail.com
shommie13@yahoo.com
golpramish@yahoo.com
mrs.rich905@yahoo.com
tgree18@hotmail.com
madridedir@yahoo.com
gucci_gurl@hotmail.com
mlee2608@aol.com
dabannaschick_1@yahoo.com
snsivaraman@yahoo.com
elmerixcoy@hotmail.com
gangstagurl7735@aol.com
kim@pinnellbusch.com
dpuetz@latinmail.com
jdvance74@aol.com

any help? :)

Thanks in advance,
Chris Varick
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-13 : 22:17:37
So you only want the first match in table2 for each address in table1?
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-13 : 22:32:37
[code]-- populate test data --
Create Table CAN (email varchar(32));
Create Table USA (email varchar(32));
GO

INSERT CAN
SELECT 'ddubois3138@sympatico.ca' UNION ALL
SELECT 'cgonzalez@efni.com' UNION ALL
SELECT 'fstpierre@accesswave.ca' UNION ALL
SELECT 'asamra@nbnet.nb.ca' UNION ALL
SELECT 'roberte@canada.com';

INSERT USA
SELECT 'jerome.clampitt@aol.com' UNION ALL
SELECT 'lisah2@mindspring.com' UNION ALL
SELECT 'dbessing@hotmail.com' UNION ALL
SELECT 'lincolntcm3@aol.com' UNION ALL
SELECT 'dmcclung3@bellsouth.net' UNION ALL
SELECT 'carmenc1968@sbcglobal.net' UNION ALL
SELECT 'mbmpines@msn.com' UNION ALL
SELECT 'baybeeangel1@aol.com' UNION ALL
SELECT 'sniraula@peoplepc.com' UNION ALL
SELECT 'crunkoncrack3@yahoo.com' UNION ALL
SELECT 'henrylinder76@yahoo.com' UNION ALL
SELECT 'hotdebndesert@hotmail.com' UNION ALL
SELECT 'debradickerson@verizon.net' UNION ALL
SELECT 'unicorn_baby74467@yahoo.com' UNION ALL
SELECT 'ryanstephanie@hotmail.com' UNION ALL
SELECT 'alsion@aol.com' UNION ALL
SELECT 'arthur.belcher@gmail.com' UNION ALL
SELECT 'racingmax@gateway.net' UNION ALL
SELECT 'justincalady@yahoo.com' UNION ALL
SELECT 'sowhat073@yahoo.com' UNION ALL
SELECT 'tywilwar@yahoo.com' UNION ALL
SELECT 'joseph.palazzolo5@nc.rr.com' UNION ALL
SELECT 'cv_cheer@hotmail.com' UNION ALL
SELECT 'sexydevil0815@yahoo.com' UNION ALL
SELECT 'chinostiff@hotmail.com' UNION ALL
SELECT 'libbyblodgett@yahoo.com' UNION ALL
SELECT 'leetx@live.com' UNION ALL
SELECT 'legendmrlee73@yahoo.com' UNION ALL
SELECT 'steviedino@hotmail.com' UNION ALL
SELECT 'diana.wright@blackplanet.com' UNION ALL
SELECT 'martin5646@sbcglobal.net' UNION ALL
SELECT 'michael.french5@aol.com' UNION ALL
SELECT 'carlyle.curtis@hotmail.com' UNION ALL
SELECT 'dyigdal@aol.com' UNION ALL
SELECT 'lord_dog@att.net' UNION ALL
SELECT 'ndawg87@gmail.com' UNION ALL
SELECT 'shommie13@yahoo.com' UNION ALL
SELECT 'golpramish@yahoo.com' UNION ALL
SELECT 'mrs.rich905@yahoo.com' UNION ALL
SELECT 'tgree18@hotmail.com' UNION ALL
SELECT 'madridedir@yahoo.com' UNION ALL
SELECT 'gucci_gurl@hotmail.com' UNION ALL
SELECT 'mlee2608@aol.com' UNION ALL
SELECT 'dabannaschick_1@yahoo.com' UNION ALL
SELECT 'snsivaraman@yahoo.com' UNION ALL
SELECT 'elmerixcoy@hotmail.com' UNION ALL
SELECT 'gangstagurl7735@aol.com' UNION ALL
SELECT 'kim@pinnellbusch.com' UNION ALL
SELECT 'dpuetz@latinmail.com' UNION ALL
SELECT 'jdvance74@aol.com';
GO

-- return result --
WITH emailAddresses (CANmail, USAmail, rnk)
AS (
SELECT c.email, u.email, rank() over (
partition by c.email
order by DIFFERENCE(left(c.EMAIL,CHARINDEX('@',c.EMAIL)-1), left(u.EMAIL,CHARINDEX('@',u.EMAIL)-1)) DESC, u.email ASC
) rnk
FROM CAN c
JOIN USA u
On DIFFERENCE(left(c.EMAIL,CHARINDEX('@',c.EMAIL)-1), left(u.EMAIL,CHARINDEX('@',u.EMAIL)-1)) > 2
)
SELECT * FROM emailAddresses WHERE rnk = 1;
GO


-- Drop Table CAN;
-- Drop Table USA;
-- GO
[/code]
Go to Top of Page

cvarick2
Starting Member

7 Posts

Posted - 2010-12-14 : 14:26:32
@Russell

Wow I think this may have done the trick! :)

I am doing some further testing, but I greatly appreciate your help.

Thanks in advance,
Chris Varick
Go to Top of Page

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2010-12-14 : 14:45:12
Glad to help.

Take a close look at the ORDER BY clause. Note that it takes an email from the Canada table and takes the first match it finds in the USA table. If there are multiple results, it takes the first it finds with a difference of 4. Else it take the first it finds with a difference of 3.

So basically, it sorts by DIFFERENCE DESC then by dictionary order ASC.

There is no further logic to determine which one it should match on when several records are found.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-15 : 04:27:23
Sorry guys I've not been following the conversation closely, so this may be a red herring.

When we do "sounds like" on Email addresses we use the REVERSE of the Email address up to the "@" so that we can compare Domain Name part - in addition to soundex (say, or Levenshtein) on the first part.
Go to Top of Page
    Next Page

- Advertisement -