| Author |
Topic |
|
label
Posting Yak Master
197 Posts |
Posted - 2003-04-21 : 12:48:34
|
| I've been tasked with providing a search mechanism that would allow someone to enter a phone number (sometimes approximate) and return a list of possible candidates from a Database. The problem is that whoever originally wrote the registration process did not enforce any specific format for phonenumbers so they could be like '333-333-3333' or '3333333333' or even '333/333 3333'. They may even be null. I tried using Soundex and Difference but apparently those only work on strings. So how do I best search for a phone number in a database where there is no way to tell what format the number may be in, if it exists at all in the first place?Thanks. |
|
|
dsdeming
479 Posts |
Posted - 2003-04-21 : 12:57:55
|
| If there are a limited number of non-numeric characters in the data, say spaces, dashes, slashes and parentheses, you could use nested replaces, but it's a tedious process and quickly gets out of hand. I'd look at a writing a udf that strips non-numeric characters from a string.HTH |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-04-21 : 12:59:51
|
| Can you cleanup the data? I would suggest removing all formatting (MOO), you could always format it on the way outUPDATE Table1 SET Col1 = REPLACE(REPLACE(Col1,'/',''),'-','')) *If you can't, then make that part of the predicate (it will do a table scan though):SELECT Col1 FROM Table1 WHERE @Searchstring Like REPLACE(REPLACE(Col1,'/',''),'-','')) * * For as many different chars to remove.I'm having my own headaches with International Phone numbers:http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=25261Brett8-) |
 |
|
|
label
Posting Yak Master
197 Posts |
Posted - 2003-04-21 : 13:15:45
|
quote: Can you cleanup the data? I would suggest removing all formatting (MOO), you could always format it on the way outUPDATE Table1 SET Col1 = REPLACE(REPLACE(Col1,'/',''),'-','')) *If you can't, then make that part of the predicate (it will do a table scan though):SELECT Col1 FROM Table1 WHERE @Searchstring Like REPLACE(REPLACE(Col1,'/',''),'-','')) *
Yeah, that's the approach I've been working with as well. However, I wasn't sure if there was a simpler way in SQL just to do a search on only the Ints in a field or not.......I can't remove the characters permanently because I also have international phone numbers, but I can format the predicate search like you suggest which looks like what I might have to do. |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-04-21 : 14:24:19
|
| How do you store intl phones....just let them type anything they want?So if its: 12 34 1243-4554 is that what you store?Or do you have a format "Rules" table that say Ireland is this format, England is this, USA is this...?VERY curious.Using the predicate with REPLACE though will perform very poorly.Maybe you can add a "SEARCH" column identical to the phone and in 1 pass update the column with unformatted info, and then have a trigger fire on INSERT or UPDATE to keep the column up to date?Just a thought.Brett8-) |
 |
|
|
label
Posting Yak Master
197 Posts |
Posted - 2003-04-21 : 15:01:08
|
| Sheesh......double post. Sorry.Edited by - label on 04/21/2003 15:02:53 |
 |
|
|
label
Posting Yak Master
197 Posts |
Posted - 2003-04-21 : 15:01:09
|
quote: How do you store intl phones....just let them type anything they want?So if its: 12 34 1243-4554 is that what you store?
That's what was in place when I got it. Once I rewrite it (currently in the planning stages to do), w're going to go with a table that will autoformat their number according to their country.So, when someone goes to register, they'll get a pull down box that makes them select Country. Once we know that we'll be able to provide them with a dynamic form that provides the right fields/validation for that country. In fact, the goal is to make the registration page an abstracted process that can be utilized in many application rather than redevelop the registration process over and over. It'll probably be a WebControl for .NET when it's all done. quote: Using the predicate with REPLACE though will perform very poorly.
Yeah, I was pretty sure it would. Fortunately for us we only have about 60,000 users so it's not that big of a database. Perhaps in the next iteration we'll index by country hopefully speed things up.quote: Maybe you can add a "SEARCH" column identical to the phone and in 1 pass update the column with unformatted info, and then have a trigger fire on INSERT or UPDATE to keep the column up to date?
Yeah......I like that idea. Adds a column to the table but in the long run would be quicker to search through while there's no formatting being applied. Have to think about writing a process that will go through and populate that phoneSearch column. Would indexing that column give me any benefits since it's strictly an integer field? |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-04-21 : 15:13:13
|
I would think indexing it would help either char or numeric. Numeric would be smaller though. But that elimiates phone numbers like:1800CALLUS9BUT, it won't use an index if you do something like:Where Phone Like '%'+@Search+'%'Also if it were numeric, I don't think you can do a like, you'd have to convert (not sure though)..even if you removed the leading '%', I think the converstion might force a scan (again I'm not sure....boy do I say that a lot):Where Convert(varchar(20,Phone) Like Convert(varchar(20,@Search)+'%'Any any comments about:quote: How do you store intl phones....just let them type anything they want? So if its: 12 34 1243-4554 is that what you store? Or do you have a format "Rules" table that say Ireland is this format, England is this, USA is this...? VERY curious.
Brett8-) |
 |
|
|
label
Posting Yak Master
197 Posts |
Posted - 2003-04-21 : 15:21:08
|
Any any comments about:quote: How do you store intl phones....just let them type anything they want? So if its: 12 34 1243-4554 is that what you store? Or do you have a format "Rules" table that say Ireland is this format, England is this, USA is this...? VERY curious.
Right now they type whatever they want with no validation or formatting being performed. In fact, the field isn't even required. Basically, whatever they enter is what gets put into the database. As for how we are planning on doing it: I plan on having a template for each country that will provide the user with the exact layout of their country and then validate their entries at that point. So if someone entered "USA" as their country of origin, the Phone number portion of their registiration form would have 3 text fields that would only allow them to enter 3 integers into the first field, 3 integers into the next field and 4 into the last. Any characters like "." or "-" or "/" would be static text in between the columns and formatted properly (using a phone number form rules table or perhaps using regular expressions) and then inserted into the database. Beyond that, I still like your idea of having a second column used only for searching phonenumbers that only has the raw integers of the phone number.Edited by - label on 04/21/2003 15:21:42 |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-04-21 : 15:30:21
|
quote: As for how we are planning on doing it: I plan on having a template for each country that will provide the user with the exact layout of their country and then validate their entries at that point.
Well if you ever do figure that out, I'd love to see it.Brett8-) |
 |
|
|
label
Posting Yak Master
197 Posts |
Posted - 2003-04-21 : 15:39:39
|
quote: Well if you ever do figure that out, I'd love to see it.
Sure thing, we're supposed to have a conference call today to kick this project off so it'll probably be finished or close to it later this week. Edited by - label on 04/21/2003 16:44:17 |
 |
|
|
Tigger
Yak Posting Veteran
85 Posts |
Posted - 2003-04-21 : 20:12:24
|
| This function was written by Arnold Fribble to fix a similar problem that I had:CREATE FUNCTION RemoveChars (@Str varchar(8000), @DelPat varchar(8000))RETURNS varchar(8000)ASBEGIN DECLARE @Res varchar(8000), @i int SET @DelPat = '%[' + @DelPat + ']%' SET @Res = '' SET @i = PATINDEX(@DelPat, @Str) WHILE @i <> 0 BEGIN SET @Res = @Res + SUBSTRING(@Str, 1, @i - 1) SET @Str = SUBSTRING(@Str, @i + 1, 8000) SET @i = PATINDEX(@DelPat, @Str) END SET @Res = @Res + @Str RETURN @ResENDTo remove non-numeric characters from your phone numberSELECT dbo.removechars('00 64 (04)-123-4567','^0-9')This gives '0064041234567'The ^ in '^0-9' means 'Keep' so '^0-9' means 'keep values 0-9 and remove everything else'. |
 |
|
|
label
Posting Yak Master
197 Posts |
Posted - 2003-04-23 : 09:36:12
|
quote: To remove non-numeric characters from your phone numberSELECT dbo.removechars('00 64 (04)-123-4567','^0-9')This gives '0064041234567'The ^ in '^0-9' means 'Keep' so '^0-9' means 'keep values 0-9 and remove everything else'.
Very cool....Thanks for the code. |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-05-02 : 16:04:15
|
| Label,How did your meeting go...did you figure anything out?Brett8-) |
 |
|
|
|