Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Challenging problem for me at least

Author  Topic 

label
Posting Yak Master

197 Posts

Posted - 2003-05-29 : 13:08:12
I have two tables.

usr_smc
customer_data

In usr_smc I have individual users registered there. These users have (sometimes) given me adderess and company information.

In customer_data I have company information, not individual user information.

I need to create and populate a table that will link the individual users (usr_smc) with the company (customer_data) they work for.

The problems:

1. The data I have in Customer data is limited to the following fields for matching purposes:
-Address (line1)
-Address_2
-Zip
-City
-State
-Company Name

2. The data that individual users fill out in the usr_smc table is unverified so a person may type in "ABC Pnuematics" or "ABC Pneumatics Inc" or variations of those....possibly mispelled, or they may not type anything at all in.

Note: I can also derive the company name from the individuals Email address, though that is usually different than the Company Name in the customer_data table.

My thoughts on how to solve this problem so far:

1. Create a cursor that grabs every individual from the usr_smc table and then use Soundex to derive matches for company names either from their email or from the field they type in the company name. If there's a match there, check part of the zip code or address to verify it's still a match and then create the relation.

2. For those who don't have company info from either their Company name or email address in the usr_smc table, possibly go with state, city, zip and address for a correlation.

I'm really not sure the best way to go about creating this relationship and I'm wanting to do the most effecient way possible as we're dealing with a fair amount of records here.

Also, I want to make as many (accurate) matches as possible as everything else will be manually matched.

Thanks in advance for any advice I recieve here.


Edit: also, which is better to use for this purpose? Soundex or Difference and why?


Edited by - label on 05/29/2003 13:29:08

X002548
Not Just a Number

15586 Posts

Posted - 2003-05-29 : 14:24:35
Well....

First, I hope your doing this attempt to correct your model. You have a non-identifying relationship, whereby a person may or may not have company info. Also what's your key to person? If it's something like person Id + ComapnyName than you can't have a primary key. You need to make it a unique clustered index (pk's can't be null). Second add a foreign key relationship to company.

But you can't do any of that until the data is sanitized. Identify all of the person rows that don't have a row in company. How many is that? Is it managable?

I don't think SOUNDEX or DIFFERENCE is going to help...although I'm not sure. BOL's exampple:

SELECT SOUNDEX('Green'),
SOUNDEX('Greene'), DIFFERENCE('Green','Greene')


Doesn't seem to help much

Here is the result set:

----- ----- -----------
G650 G650 4


Start by determining how out of whack everything is first and let us know.




Brett

8-)
Go to Top of Page

label
Posting Yak Master

197 Posts

Posted - 2003-05-29 : 15:24:29
quote:

First, I hope your doing this attempt to correct your model.


Unfortuantely I can't. Neither are my tables or databases to begin with. One if from an AS400 mainframe that houses our customer contact list (customer_data) and the other is a web-based registration database that tracks individual users (usr_smc).

I've been asked to marry the two together in order to produce some meaningful sales statistics and what impact the web site is having on sales.

quote:
You have a non-identifying relationship, whereby a person may or may not have company info. Also what's your key to person?


An arbitrary unique_id field that's auto-incrementing for the usr_smc table. A unqiue customer_id assigned to each company in the case of the customer_data table.

quote:
Second add a foreign key relationship to company.


Already tried that but they are in different databases.

quote:
But you can't do any of that until the data is sanitized. Identify all of the person rows that don't have a row in company. How many is that? Is it managable?


Only a couple thousand out of 40,000 so I think we're just going to not worry about them and go exclusively by what's in the Company Name field for each record.

quote:
I don't think SOUNDEX or DIFFERENCE is going to help...although I'm not sure. BOL's exampple:

SELECT SOUNDEX('Green'),
SOUNDEX('Greene'), DIFFERENCE('Green','Greene')


Doesn't seem to help much

Here is the result set:

----- ----- -----------
G650 G650 4





Yeah, I read that after I posted it.....neither function helped.

quote:
Start by determining how out of whack everything is first and let us know.


Pretty out of whack unfortuantely, however, I can't really make any modifications to the Web based table and the other table can't be added to the Web DB so I'm pretty much stuck at this point.

Thanks for you help though. I'll try and muddle through it.

At this point I'm just using this:

insert into smc_etech_key 
select c.customer_id, u.unique_id
from
customer_data c
join
smc_new_products.dbo.usr_smc u
on
c.customer_name like u.company_name + '%'
where u.unique_id not in
(select etech_customer_Id from smc_etech_key)
and
left(u.company_name, 3)<>'smc'
and
u.company_name<>''
and
u.company_name is not null


Which should give us most instances of accurate matches. I'd further narrow it down by using the zip code on the Join as well but in some cases, the same company (listed several times for each location) might have different zip codes and we need to associate every instance of a company to a single user.

Edited by - label on 05/29/2003 15:26:41
Go to Top of Page

label
Posting Yak Master

197 Posts

Posted - 2003-05-29 : 15:25:11
crap...double post

Edited by - label on 05/29/2003 15:26:16
Go to Top of Page
   

- Advertisement -