Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
TimSman
Posting Yak Master
127 Posts |
Posted - 2013-05-13 : 08:41:30
|
Not sure exactly where to fit this question, so here goes...Here's the situation:We are implementing a new system at work, and part of the set-up requires me to identify unique city, state/province, and country combinations. I have customer and vendor addresses, each coming from different source databases and tables (and different systems). We have addresses in the U.S. and Canada at this point.I am trying to clean up the city names (I have stuff in place to handle the state/province and country names & codes), and then generate unique codes for each (code format is PPCCCCCC, where PP is the state/province code, and CCCCCC is some abbreviation for the city name). Right now, it's the clean up part that is giving me fits; I have abbreviations and typos galore (Los Angeles, Las Angeles, Las Angels, LA, LosAngeles just to give you some idea).I have looked at numerous options, but I'm either not sure exactly how to use them (Levenshtein Distance), or I'm unable to determine a good source of data (looking at web services).I'm curious as to what approaches others have taken, or if anyone can point towards some sort of tool/API (free is always good...). Going forward, I know that we will be implementing standards related to how this data is entered, so if anyone has tips/thoughts on what standards have worked for them, that would be appreciated also. |
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2013-05-14 : 00:47:10
|
What we did was to use a Azure marketplace data service containing details on cities of US and Canada and create a knowledge base in Data Quality Services (DQS) based on that. Then we used DQS task in our ETL packages (SSIS) to do the cleansing and get correct mapping of city details. Of course we're on SQL 2012 and SSIS 2012 have DQS support.------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/https://www.facebook.com/VmBlogs |
|
|
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts |
Posted - 2013-05-14 : 10:53:29
|
We use this service for validation and normalization of US addresses:http://smartystreets.com/You can batch validate/normalize a file of addresses, or call the API to validate addresses as they are entered into your system. The result gives you a valid US postal service address along with information like latitude/longitude, county, ZIP+4, etc.They do not do Canadian addresses, but there are many other providers of this type of service.CODO ERGO SUM |
|
|
|
|
|
|
|