Author |
Topic |
cvarick2
Starting Member
7 Posts |
Posted - 2010-12-09 : 16:07:55
|
I need help with SOUNDEX / DIFFERENCEI have two tables which I would like to compare the string in two particular columns.Example:table1fname-----mikeandysamtable2fname-----bobfredanthonychuckdrewjeffvictorsamanthaI need to create a query that will compare each of table1's fname values to each of the table2's fname values and return the 'fname' from table2 that matches with table1. The DIFFERENCE must be 4 or 3Any help??I am no pro at t-sql so I have something that obviously didn't work:SELECT t1.fname, t2.fnameFROM table1 t1, table2 t2WHERE DIFFERENCE(t1.fname, t2.fname) = 4Doesn't seem to work properly..Thanks in advance,Chris Varick |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-09 : 16:15:22
|
[code]SELECT t2.fnameFROM table1 t1JOIN table2 t2On DIFFERENCE(t1.fname, t2.fname) > 2[/code] |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-12-09 : 16:17:42
|
@Russell: Possibly needsSELECT t1.fname, t2.fname... ?? |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-09 : 16:20:22
|
Yep, probably |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-12-09 : 16:24:12
|
OK, probably a possibility then |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-09 : 16:25:09
|
lol |
 |
|
cvarick2
Starting Member
7 Posts |
Posted - 2010-12-09 : 16:29:19
|
@Russell:Thanks for your help, however I run into the same situation as my original query. When I run the query:SELECT t2.fname, t1.fnameFROM table1 t1JOIN table2 t2On DIFFERENCE(t1.fname, t2.fname) > 2I get something like the following:t1.fname | t2.fnamesam | samanthasam | fredsam | robsam | jeffsam | chuckAny suggestions? Also for this example table1 has 10 records where as table2 has 10,000 ....Thanks!Thanks in advance,Chris Varick |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-12-09 : 16:44:41
|
You sure?SELECT DIFFERENCE('sam', 'fred')should give 2 (and thus should not be included)perhaps post your exact code (assuming its a bit more complex than the example here) in case anyone can spot and snags with it |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-09 : 16:44:41
|
Using your sample data, I get two results. Are you sure that DIFFERENCE of 3 or 4 is what you want? What are the expected results?Running this:Create Table table1 (fname varchar(16));goCreate Table table2 (fname varchar(16));goinsert table1 values('mike');insert table1 values('andy');insert table1 values('sam');insert table2 values('bob');insert table2 values('fred');insert table2 values('anthony');insert table2 values('chuck');insert table2 values('drew');insert table2 values('jeff');insert table2 values('victor');insert table2 values('samantha');GOSELECT t1.fname, t2.fname, DIFFERENCE(t1.fname, t2.fname)FROM table1 t1JOIN table2 t2On DIFFERENCE(t1.fname, t2.fname) > 2;GO Returns this:fname fname ---------------- ---------------- -----------mike chuck 3andy anthony 3(2 row(s) affected) |
 |
|
cvarick2
Starting Member
7 Posts |
Posted - 2010-12-09 : 16:53:26
|
@Russell:Maybe I am experiencing an issue because I have duplicates in the fname column in table2? I know I do so what is your suggestion on that? :)Thanks!Thanks in advance,Chris Varick |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-09 : 17:03:11
|
Make sure you have t1 and t2 in the right order.You'll see what I mean if you execute this:select difference ('andy', 'samantha')select difference ('samantha', 'andy') |
 |
|
cvarick2
Starting Member
7 Posts |
Posted - 2010-12-13 : 16:05:35
|
Looks like I am still getting same issue. Where table1 fname field is the same value for all the table2 fname fields.This was just an example not an actual return query:t1.fname | t2.fname-------- ---------sam | samanthasam | fredsam | robsam | jeffsam | chuckThis example was showing what I am getting back... I tried doing the following: (putting distinct in)But this query is taking quite some time I will see if this helps in about 30 minutes more I hope! SELECT Distinct t1.fname, t2.fname, DIFFERENCE(t1.fname, t2.fname)FROM table1 t1JOIN table2 t2On DIFFERENCE(t1.fname, t2.fname) > 2;Thanks in advance,Chris Varick |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-13 : 16:59:15
|
If you're getting DIFFERENCEs < 3 then switch around t1 and t2. |
 |
|
cvarick2
Starting Member
7 Posts |
Posted - 2010-12-13 : 17:49:22
|
@Russell here is what I have.SELECT us.email, can.email, difference(left(CAN.EMAIL,CHARINDEX('@',CAN.EMAIL)-1), left(US.EMAIL,CHARINDEX('@',US.EMAIL)-1))FROM table1 CANJOIN table2 USOn DIFFERENCE(left(CAN.EMAIL,CHARINDEX('@',CAN.EMAIL)-1), left(US.EMAIL,CHARINDEX('@',US.EMAIL)-1)) > 2This is query result:----------------------us.email can.email diffjerome.clampitt@aol.com jchinniah@hotmail.com 3jerome.clampitt@aol.com karenh@hotmail.com 3jerome.clampitt@aol.com jsnow@rogers.com 3jerome.clampitt@aol.com jlin@dowco.com 3jerome.clampitt@aol.com brianh@freemail.hu 3jerome.clampitt@aol.com ji63jap@yahoo.com 3not sure what is happening...Thanks in advance,Chris Varick |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-13 : 18:24:51
|
Let's start from the beginning. What are you trying to accomplish?Show both table definitions, a few sample rows and expected results.We'll get you straightened out. |
 |
|
cvarick2
Starting Member
7 Posts |
Posted - 2010-12-13 : 19:06:47
|
RussellIt's a school project. I am doing something creative, but the summary of it is I am comparing email address similarities. So I have two tables table1 contains let's say 100 records while table2 contains 10,000 records.I want to return 1 record from table2 that has a difference of 3 or more for each record in table1. In this case I am dealing with email addresses, so I am only comparing the string left of the '@'. Its nothing exciting just trying something out... I guess my problem is that there is the possiblility that more than one record in table2 will match for each record in table1. Since I am not great at SQL I do not know how to make it work.Examples (note these are all generated emails):Table1:field1ddubois3138@sympatico.cacgonzalez@efni.comfstpierre@accesswave.caasamra@nbnet.nb.caroberte@canada.comTable2:field1jerome.clampitt@aol.comlisah2@mindspring.comdbessing@hotmail.comlincolntcm3@aol.comdmcclung3@bellsouth.netcarmenc1968@sbcglobal.netmbmpines@msn.combaybeeangel1@aol.comsniraula@peoplepc.comcrunkoncrack3@yahoo.comhenrylinder76@yahoo.comhotdebndesert@hotmail.comdebradickerson@verizon.netunicorn_baby74467@yahoo.comryanstephanie@hotmail.comalsion@aol.comarthur.belcher@gmail.comracingmax@gateway.netjustincalady@yahoo.comsowhat073@yahoo.comtywilwar@yahoo.comjoseph.palazzolo5@nc.rr.comcv_cheer@hotmail.comsexydevil0815@yahoo.comchinostiff@hotmail.comlibbyblodgett@yahoo.comleetx@live.comlegendmrlee73@yahoo.comsteviedino@hotmail.comdiana.wright@blackplanet.commartin5646@sbcglobal.netmichael.french5@aol.comcarlyle.curtis@hotmail.comdyigdal@aol.comlord_dog@att.netndawg87@gmail.comshommie13@yahoo.comgolpramish@yahoo.commrs.rich905@yahoo.comtgree18@hotmail.commadridedir@yahoo.comgucci_gurl@hotmail.commlee2608@aol.comdabannaschick_1@yahoo.comsnsivaraman@yahoo.comelmerixcoy@hotmail.comgangstagurl7735@aol.comkim@pinnellbusch.comdpuetz@latinmail.comjdvance74@aol.comany help? :)Thanks in advance,Chris Varick |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-13 : 22:17:37
|
So you only want the first match in table2 for each address in table1? |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-13 : 22:32:37
|
[code]-- populate test data --Create Table CAN (email varchar(32));Create Table USA (email varchar(32));GOINSERT CANSELECT 'ddubois3138@sympatico.ca' UNION ALLSELECT 'cgonzalez@efni.com' UNION ALLSELECT 'fstpierre@accesswave.ca' UNION ALLSELECT 'asamra@nbnet.nb.ca' UNION ALLSELECT 'roberte@canada.com';INSERT USASELECT 'jerome.clampitt@aol.com' UNION ALLSELECT 'lisah2@mindspring.com' UNION ALLSELECT 'dbessing@hotmail.com' UNION ALLSELECT 'lincolntcm3@aol.com' UNION ALLSELECT 'dmcclung3@bellsouth.net' UNION ALLSELECT 'carmenc1968@sbcglobal.net' UNION ALLSELECT 'mbmpines@msn.com' UNION ALLSELECT 'baybeeangel1@aol.com' UNION ALLSELECT 'sniraula@peoplepc.com' UNION ALLSELECT 'crunkoncrack3@yahoo.com' UNION ALLSELECT 'henrylinder76@yahoo.com' UNION ALLSELECT 'hotdebndesert@hotmail.com' UNION ALLSELECT 'debradickerson@verizon.net' UNION ALLSELECT 'unicorn_baby74467@yahoo.com' UNION ALLSELECT 'ryanstephanie@hotmail.com' UNION ALLSELECT 'alsion@aol.com' UNION ALLSELECT 'arthur.belcher@gmail.com' UNION ALLSELECT 'racingmax@gateway.net' UNION ALLSELECT 'justincalady@yahoo.com' UNION ALLSELECT 'sowhat073@yahoo.com' UNION ALLSELECT 'tywilwar@yahoo.com' UNION ALLSELECT 'joseph.palazzolo5@nc.rr.com' UNION ALLSELECT 'cv_cheer@hotmail.com' UNION ALLSELECT 'sexydevil0815@yahoo.com' UNION ALLSELECT 'chinostiff@hotmail.com' UNION ALLSELECT 'libbyblodgett@yahoo.com' UNION ALLSELECT 'leetx@live.com' UNION ALLSELECT 'legendmrlee73@yahoo.com' UNION ALLSELECT 'steviedino@hotmail.com' UNION ALLSELECT 'diana.wright@blackplanet.com' UNION ALLSELECT 'martin5646@sbcglobal.net' UNION ALLSELECT 'michael.french5@aol.com' UNION ALLSELECT 'carlyle.curtis@hotmail.com' UNION ALLSELECT 'dyigdal@aol.com' UNION ALLSELECT 'lord_dog@att.net' UNION ALLSELECT 'ndawg87@gmail.com' UNION ALLSELECT 'shommie13@yahoo.com' UNION ALLSELECT 'golpramish@yahoo.com' UNION ALLSELECT 'mrs.rich905@yahoo.com' UNION ALLSELECT 'tgree18@hotmail.com' UNION ALLSELECT 'madridedir@yahoo.com' UNION ALLSELECT 'gucci_gurl@hotmail.com' UNION ALLSELECT 'mlee2608@aol.com' UNION ALLSELECT 'dabannaschick_1@yahoo.com' UNION ALLSELECT 'snsivaraman@yahoo.com' UNION ALLSELECT 'elmerixcoy@hotmail.com' UNION ALLSELECT 'gangstagurl7735@aol.com' UNION ALLSELECT 'kim@pinnellbusch.com' UNION ALLSELECT 'dpuetz@latinmail.com' UNION ALLSELECT 'jdvance74@aol.com';GO-- return result --WITH emailAddresses (CANmail, USAmail, rnk)AS ( SELECT c.email, u.email, rank() over ( partition by c.email order by DIFFERENCE(left(c.EMAIL,CHARINDEX('@',c.EMAIL)-1), left(u.EMAIL,CHARINDEX('@',u.EMAIL)-1)) DESC, u.email ASC ) rnk FROM CAN c JOIN USA u On DIFFERENCE(left(c.EMAIL,CHARINDEX('@',c.EMAIL)-1), left(u.EMAIL,CHARINDEX('@',u.EMAIL)-1)) > 2)SELECT * FROM emailAddresses WHERE rnk = 1;GO-- Drop Table CAN;-- Drop Table USA;-- GO[/code] |
 |
|
cvarick2
Starting Member
7 Posts |
Posted - 2010-12-14 : 14:26:32
|
@RussellWow I think this may have done the trick! :)I am doing some further testing, but I greatly appreciate your help.Thanks in advance,Chris Varick |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2010-12-14 : 14:45:12
|
Glad to help. Take a close look at the ORDER BY clause. Note that it takes an email from the Canada table and takes the first match it finds in the USA table. If there are multiple results, it takes the first it finds with a difference of 4. Else it take the first it finds with a difference of 3.So basically, it sorts by DIFFERENCE DESC then by dictionary order ASC.There is no further logic to determine which one it should match on when several records are found. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-12-15 : 04:27:23
|
Sorry guys I've not been following the conversation closely, so this may be a red herring.When we do "sounds like" on Email addresses we use the REVERSE of the Email address up to the "@" so that we can compare Domain Name part - in addition to soundex (say, or Levenshtein) on the first part. |
 |
|
Next Page
|