Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Kalaiselvan
Posting Yak Master
112 Posts |
Posted - 2013-10-23 : 07:37:48
|
Hi, I got a Table with some Customer Info. Where as the Address Columns were Splitted into Many columns. Hence some Customer Address were Loaded in various Ways. Need to Overcome this Issue and Required Unique set of Records of Customers.Table: (CustomerInfo)[NAME] [ADDRESS1] [ADDRESS2] [ADDRESS3] [AREA] [CITY]BABU V NO.41 31st street 2nd Avenue PADI CHENNAIBABU V 31st Street No.41 2nd Aven PADI CHENNAIBABU V 31st street, PADI CHARUN ADYAR CHENNAIARUN NO.50 3rd Cross St ADYAR CHENNAIARUN 3rd Cross St ADYAR CHEARUN No.75 AA Avenue CHOOLAI CHELike the Above table I have got nearly millions of Rows. With some Name Duplication and Also with Un Matched Address Column.But from above Table, Required output will be 3 Rows.TABLE: (CustomerInfo Unique)[NAME] [ADDRESS] [CITY]BABU V No.41 2nd Avenue 31st Steet, PADI CHENNAIARUN NO.50 3rd Cross Street ADYAR CHENNAIARUN NO.75 AA Avenue CHOOLAI CHENNAIOutput Table has Combined Address Column. Where as Address Near by Match is taken as Single Row. Please help me How to fix this Using Query or else using any Fuzzy Grouping or Fuzzy Logic Transformation in SSIS.Regards,Kalai |
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2013-10-23 : 08:36:19
|
you need use fuzzy lookup for this. Or use some data quality tools like Mellisa data component or DQS task in SSIS. DQS is available in SSIS from 2012 onwards. Mellisa data is third party component.------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/https://www.facebook.com/VmBlogs |
|
|
Kalaiselvan
Posting Yak Master
112 Posts |
Posted - 2013-10-23 : 14:24:27
|
Can u Explain me in detail in how to tackle this Issue. I can use Fuzzy Logic and not other as mentioned.Or Can Help with SQL Queries too...Regards,Kalai |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2013-10-24 : 01:46:08
|
You need to have a master table with standard entries for each group ie for example amongNO.41 31st street 2nd Avenue PADI CHENNAI31st Street No.41 2nd Aven PADI CHENNAI31st street, PADI CH.. store one standard value (most accurate) in tableThen in SSIS use lookup task from your source data to this table. This will capture all exact matches in lookup match output. Now take nomatch output and link it to Fuzzy Lookup task based on the Address value. Then get output onto a staging table or file. Then check this table/file content. It will have few additional columns like Similarity,Confidence which will have a value based on how similar value is to actual value. Based on values you get you need to determine what similarity and confidence values gives you expected matches (this has to be done by trial and error method doing few runs). Then add a conditional split task to fuzzy match output and create an output specifying your values for similarity and confidence. Then link this output to match output of previous lookup task via Union all and you'll get required pattern matches in your final outputSo package will look like below inside data flow task Your source tasks | | Lookup Task / / / Match NoMatch | | | | | Fuzzy Lookup | | | | | Conditional Split | / | / | / | / | / | / Union all ------------------------------------------------------------------------------------------------------SQL Server MVPhttp://visakhm.blogspot.com/https://www.facebook.com/VmBlogs |
|
|
|
|
|