Author |
Topic |
Skorch
Constraint Violating Yak Guru
300 Posts |
Posted - 2008-05-28 : 17:40:39
|
Here's my problem: I have to clean up a SQL Server 2005 database with a large number of phone number records (several hundred thousand). The records are of varchar datatype and contain phone numbers in every format imaginable. In fact, many records have written notes regarding the phone numbers after the numbers themselves. What I need to do is format all of the phone numbers to this format: ###-###-####-Basically I'm figuring I need to do the following:1. Strip all non-numeric characters from the record2. Remove the 1 from any records that have a leading 1 (in many cases the records contain stuff like 1-888-555-1234)3. Remove any digits following the first 10 digits (they don't want to keep any extensions - the formatting is more important)4. Add dashes after the first three digits, after the second three and at the end of the phone numberThis seems like a rather complex problem to me, and honestly I don't even know where to begin. I can accomplish this rather easily in javascript or C#, but writing SQL to solve this is beyond me. I'd really appreciate any help you guys can provide. Thanks alot! |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2008-05-28 : 17:42:59
|
Don't do #4, leave the data as numbers only. When displaying the data back to the client, then format however you want.I'd suggest searching these forums for how to clean up your data as this has been posted several times in the past here.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Database maintenance routines:http://weblogs.sqlteam.com/tarad/archive/2004/07/02/1705.aspx |
 |
|
Skorch
Constraint Violating Yak Guru
300 Posts |
Posted - 2008-05-28 : 18:24:15
|
quote: Originally posted by tkizer Don't do #4, leave the data as numbers only. When displaying the data back to the client, then format however you want.I'd suggest searching these forums for how to clean up your data as this has been posted several times in the past here.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Database maintenance routines:http://weblogs.sqlteam.com/tarad/archive/2004/07/02/1705.aspx
Thanks. I actually do have to format the data with the dashes - the database I'm using is MS CRM 3.0 and the data displays exactly as it is in the database for the phone number field.I've done some searching but did not come up with anything very useful. I do have a quick question though:If I cast a varchar as an integer, will it strip the non-numeric values from it or will it simply give me an error? |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2008-05-28 : 18:30:28
|
It will give you an error.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Database maintenance routines:http://weblogs.sqlteam.com/tarad/archive/2004/07/02/1705.aspx |
 |
|
Skorch
Constraint Violating Yak Guru
300 Posts |
|
Skorch
Constraint Violating Yak Guru
300 Posts |
Posted - 2008-05-28 : 19:52:35
|
After some more research I've discovered CLR functions in SQL Server 2005. Sounds like this will be the way to go for me. |
 |
|
raja_saminathan
Starting Member
12 Posts |
Posted - 2008-05-29 : 02:27:04
|
Hi Skorch,Are the Phone Numbers are in the starting Position followed by charcters or it may come in middle Also.Can you post some sample Formats to get the idea to filter only the Numbers by removing the extras.Do the 1- comes in the FirstPosition of the String?Rajesh |
 |
|
Skorch
Constraint Violating Yak Guru
300 Posts |
Posted - 2008-05-29 : 13:04:52
|
quote: Originally posted by raja_saminathan Hi Skorch,Are the Phone Numbers are in the starting Position followed by charcters or it may come in middle Also.Can you post some sample Formats to get the idea to filter only the Numbers by removing the extras.Do the 1- comes in the FirstPosition of the String?Rajesh
The phone numbers are in all different formats. Here's a sample:(602)555-12341-888-555-1234235.555.12341 800 555 4321786-555-1234 ex. 4567(602)555 4321 (cell)What I need to do is remove all of the non-numeric characters so I'm left with numbers only. Then I need to remove the 1 in the front if the number begins with a 1. After that I need to format the number into this pattern: ###-###-####- |
 |
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2008-05-29 : 14:06:58
|
One thing you can try is using a simple algorithm like this:1) strip out all non-numeric characters from each phone number (i.e., remove all ( and ) and . and - characters plus anything else in there)2) if the resulting number starts with a 1, remove the 13) using the resulting number, set the area code from positions 1-3, the exchange from positions 4-6, the number from 7-10, and anything from position 11 on as the extensionThat's one basic idea. I recommend to store the phone number pieces in separate columns (areacode, exchange, number, extension) that way you can enforce integrity and consistency and you easily combine those columns any way you want to format the values.Also -- since this is a one-time clean-up, don't worry about efficiency -- just get it done accurately. Even if you use a cursor, or client code that processes one row at a time, and so on. Whatever is easiest for you to work with.- Jeffhttp://weblogs.sqlteam.com/JeffS |
 |
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2008-05-29 : 16:35:49
|
The concept of stripping out the non-numerics is pretty easy. Some time ago I posted a function on this site that does just that: http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=93292&SearchTerms=StripPattern.You can call this function with the second parameter of '%[^0-9]%'. Or change this parameter to be cover less and specify specific variables you want to remove. |
 |
|
Skorch
Constraint Violating Yak Guru
300 Posts |
Posted - 2008-05-29 : 19:48:44
|
Thanks for your help guys. Here's where I'm at so far:I've created two functions - the first one removes all of the non-numeric characters from the telephone:ALTER FUNCTION [dbo].[RemoveChars](@Str varchar(1000))RETURNS VARCHAR(1000)BEGINdeclare @NewStr varchar(1000),@i intset @i = 1set @NewStr = '' while @i <= len(@str)begin --grab digits or (| in regex) decimal if substring(@str,@i,1) like '%[0-9]%' begin set @NewStr = @NewStr + substring(@str,@i,1) end else begin set @NewStr = @NewStr endset @i = @i + 1endRETURN Rtrim(Ltrim(@NewStr))ENDThe second one removes the leading 1 if the phone number begins with a 1 and then checks if the phone number is longer than 10 characters, in which case it trims it to 10:ALTER FUNCTION [dbo].[Remove1AndCut](@Str varchar(1000))RETURNS VARCHAR(1000)BEGINdeclare @NewStr varchar(1000)--,@i int--set @i = 1set @NewStr = '' --while @i <= len(@str)--begin --grab digits or (| in regex) decimal if substring(@str,1,1) = '1' begin set @NewStr = @NewStr + substring(@str,2,len(@str)) end if len(@str)>10 begin set @NewStr = substring(@str,1,10) end else begin set @NewStr = @NewStr end--set @i = @i + 1--endRETURN Rtrim(Ltrim(@NewStr))ENDWhat do you guys think? I've tested them and they seem to work fine so far, but please let me know if you see anything blatantly wrong with them. Thanks! |
 |
|
raja_saminathan
Starting Member
12 Posts |
Posted - 2008-05-30 : 01:59:51
|
Hi,Use this Function and Test with all PossibilitiesCREATE FUNCTION dbo.TrimExtra(@myString varchar(500), @validChars varchar(100))RETURNS varchar(500) ASBEGIN While @myString like '%[^' + @validChars + ']%' Select @myString = replace(@myString,substring(@myString,patindex('%[^' + @validChars + ']%',@myString),1),'') select @myString =case when @myString like '1-%' then replace(substring(@myString,3,len(@mystring)),'-','') else replace(@myString,'-','') end select @myString =case when @myString like '1 %' then replace(substring(@myString,3,len(@mystring)),' ','') else replace(@myString,' ','') end select @mystring= substring(@mystring,1,3)+'-'+substring(@mystring,4,3)+'-'+substring(@mystring,8,4)+'-' Return @myStringENDGo--- TestingDeclare @Test Table ( t Varchar (1000))insert into @Test select '1-3223-45645 r1-jkjfkgjfjgjgf' unionselect '1 rewnk 2323m 4k2k4k k23 423' unionselect '7889371-0030 3202' unionselect 'rers1-89088-98088' select * from @Testupdate @Testset t= (select dbo.GetCharacters1(t, '0-9- '))select * from @TestRegards,rajesh |
 |
|
raja_saminathan
Starting Member
12 Posts |
Posted - 2008-05-30 : 02:02:26
|
Kindly change the name of the function accordingly before testingRajesh |
 |
|
madhivanan
Premature Yak Congratulator
22864 Posts |
|
|