Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Script Library
 Filtering a string

Author  Topic 

RyanRandall
Master Smack Fu Yak Hacker

1074 Posts

Posted - 2007-02-13 : 12:18:09
Function which filters a string - i.e. takes a string, and returns the same string with specified characters left or removed. Handles unicode characters if specified.

e.g. '1a2b&3,45 c,>DE.$~99X yZ' -> '1a2b345c99y'

IF OBJECT_ID('fn_FilterString') IS NOT NULL DROP FUNCTION dbo.fn_FilterString
GO
CREATE FUNCTION dbo.fn_FilterString
(@String VARCHAR(8000), @Filter VARCHAR(100), @IsUnicode BIT = 0)
RETURNS VARCHAR(8000)
AS
/*
Returns a string with specified characters left or removed. Run the examples for clarity.

Examples:

DECLARE @s VARCHAR(500)
SET @s = '1a2b&3,45 c,>DE.$~99X yZ'

SELECT dbo.fn_FilterString(@s, '_', 0) --identity
SELECT dbo.fn_FilterString(@s, '[0-9]', 0) --just numbers
SELECT dbo.fn_FilterString(@s, '[^0-9]', 0) --no numbers
SELECT dbo.fn_FilterString(@s, '[a-z]', 0) --just letters
SELECT dbo.fn_FilterString(@s, '[a-z]', 1) --just lower case letters
SELECT dbo.fn_FilterString(@s, '[A-Z]', 1) --just upper case letters
SELECT dbo.fn_FilterString(@s, '[^ ,.]', 0) --removes spaces, commas and full stops
*/

BEGIN

DECLARE @StringLength INT
SET @StringLength = LEN(@String)

DECLARE @t TABLE (i INT IDENTITY(1, 1), c CHAR(1))
INSERT @t SELECT SUBSTRING(@String, number, 1) FROM dbo.F_TABLE_NUMBER_RANGE(0, @StringLength) ORDER BY number
--> See http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=47685 for dbo.F_TABLE_NUMBER_RANGE

DECLARE @s VARCHAR(500)
SET @s = ''

SELECT @s = @s + c FROM @t
WHERE (@IsUnicode = 0 AND c LIKE @Filter)
OR (@IsUnicode = 1 AND c COLLATE Latin1_General_BIN LIKE @Filter)
ORDER BY i

RETURN @s

END
Any ideas for better ways of doing this?


Ryan Randall
www.monsoonmalabar.com London-based IT consultancy

Solutions are easy. Understanding the problem, now, that's the hard part.

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-02-13 : 12:51:07
Is 40 times faster good enough?
My suggestion is also case sensitive
CREATE TABLE	#Test
(
fuid VARCHAR(36),
nuid nvarchar(36)
)

DECLARE @i INT

SELECT @i = 0

WHILE @i < 10000
BEGIN
INSERT #Test
(
fuid
)
SELECT CAST(NEWID() AS VARCHAR(36))

SET @i = @i + 1
END

UPDATE #Test
SET fuid = LOWER(LEFT(fuid, 18)) + UPPER(RIGHT(fuid, 18))

UPDATE #Test
SET nuid = cast(fuid as nvarchar(36))

select * from #test

go
declare @dt datetime
select @dt = getdate()

select max(dbo.fnfilterstring(fuid, '[A-Cd-f]')),
max(dbo.fnfilterstring(nuid, '[A-Cd-f]'))
from #test

select 'Peso ', datediff(ms, @dt, getdate())

go
declare @dt datetime
select @dt = getdate()

select max(dbo.fn_filterstring(fuid, '[A-Cd-f]', 0)),
max(dbo.fn_filterstring(nuid, '[A-Cd-f]', 0))
from #test

select 'Ryan ', datediff(ms, @dt, getdate())
go

drop table #test

Here is my suggestion for the function
CREATE FUNCTION dbo.fnFilterString
(
@String VARCHAR(8000),
@Filter VARCHAR(100)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @Index SMALLINT

SET @Index = DATALENGTH(@String)

WHILE @Index > 0
IF SUBSTRING(@String, @Index, 1) COLLATE Latin1_General_BIN LIKE @Filter
SET @Index = @Index - 1
ELSE
SELECT @String = STUFF(@String, @Index, 1, ''),
@Index = @Index - 1

RETURN @String
END

Peter Larsson
Helsingborg, Sweden
Go to Top of Page

RyanRandall
Master Smack Fu Yak Hacker

1074 Posts

Posted - 2007-02-13 : 13:05:35
No fair, Peso!

I wrote this function years ago in SQL 2000, but I noticed the 'loop trick' didn't work directly in 2005 without involving a temporary table - so I rewrote it using one. And surprise, surprise, that's what slows it down.

Good work on the better version


Ryan Randall
www.monsoonmalabar.com London-based IT consultancy

Solutions are easy. Understanding the problem, now, that's the hard part.
Go to Top of Page

RyanRandall
Master Smack Fu Yak Hacker

1074 Posts

Posted - 2007-02-13 : 13:18:01
By the way, Peso, have you come across this or something like it yet...

DECLARE @s VARCHAR(10)
SET @s = ''
SELECT @s = @s + SUBSTRING('a1b2c3d4e5', number, 1) FROM dbo.F_TABLE_NUMBER_RANGE(0, 10) ORDER BY number
SELECT @s

--SQL 2000 = 'a1b2c3d4e5'
--SQL 2005 = '5'

Have you seen any discussions on it?


Ryan Randall
www.monsoonmalabar.com London-based IT consultancy

Solutions are easy. Understanding the problem, now, that's the hard part.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-02-13 : 13:24:07
Actually I have another improvement. If there is not valid filter match, what should the faulty character be replaced with?
An empty string or something else? There is also an option to replace all redundant replacement characters.
ALTER FUNCTION [dbo].[fnFilterString]
(
@Source VARCHAR(8000),
@Filter VARCHAR(8000),
@Replacement CHAR(1),
@SingleBetweenReplacementOnly BIT
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @Index SMALLINT

SET @Index = DATALENGTH(@Source)

WHILE @Index > 0
IF SUBSTRING(@Source, @Index, 1) COLLATE Latin1_General_BIN LIKE @Filter
SET @Index = @Index - 1
ELSE
SELECT @Source = STUFF(@Source, @Index, 1, @Replacement),
@Index = @Index - 1

IF @SingleBetweenReplacementOnly = 1
BEGIN
WHILE CHARINDEX(@Replacement + @Replacement, @Source) > 0
SET @Source = REPLACE(@Source, @Replacement + @Replacement, @Replacement)

IF LEFT(@Source, 1) = @Replacement
SET @Source = STUFF(@Source, 1, 1, '')

IF RIGHT(@Source, 1) = @Replacement
SET @Source = STUFF(@Source, DATALENGTH(@Source), 1, '')
END

RETURN @Source
END
Use this new version with this test code!
declare @s varchar(500)

select @s = 'Peso' + char(10) + 'Helsingborg' + CHAR(13) + 'Sweden' + CHAR(13) + CHAR(10) + 'SQL'

print @s

print ''

print dbo.fnFilterString(@s, '[^' + CHAR(13) + CHAR(10) + ']', '_', 0)
print dbo.fnFilterString(@s, '[e]', '*', 1)
print dbo.fnFilterString(@s, '[^n]', '', 0)

Peter Larsson
Helsingborg, Sweden
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-02-13 : 13:29:04
quote:
Originally posted by RyanRandall

Have you seen any discussions on it?
Yes, there was one here at SQLTeam some time ago. I remember RockMoose and Kristen were involved.
However, this works in SQL Server 2005

DECLARE @s VARCHAR(10)
SET @s = ''
SELECT @s = @s + SUBSTRING('a1b2c3d4e5', number, 1) FROM dbo.F_TABLE_NUMBER_RANGE(1, 10) ORDER BY number
SELECT @s
go

DECLARE @s VARCHAR(10)
SET @s = ''
SELECT @s = @s + z
from (select top 100 percent SUBSTRING('a1b2c3d4e5', number, 1) as z FROM dbo.F_TABLE_NUMBER_RANGE(1, 10) ORDER BY number) as d

SELECT @s


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

RyanRandall
Master Smack Fu Yak Hacker

1074 Posts

Posted - 2007-02-13 : 14:09:08
quote:
However, this works in SQL Server 2005
Excellent. Thank you!

Ryan Randall
www.monsoonmalabar.com London-based IT consultancy

Solutions are easy. Understanding the problem, now, that's the hard part.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-04-14 : 16:08:57
A slight variation
CREATE FUNCTION dbo.fnExtractDigitSequence
(
@ColumnDelimiter CHAR(1) = ',',
@Filter VARCHAR(8000) = '[0-9]',
@UserData VARCHAR(8000) = NULL
)
RETURNS VARCHAR(8000)
AS
BEGIN

DECLARE @Position INT,
@ValidChar TINYINT,
@LastValidChar TINYINT

SELECT @Position = LEN(@UserData),
@LastValidChar = 0

WHILE @Position > 0
SELECT @ValidChar = CASE
WHEN SUBSTRING(@UserData, @Position, 1) COLLATE Latin1_General_BIN LIKE @Filter THEN 1
ELSE 0
END,
@UserData = CASE
WHEN @ValidChar = 1 THEN @UserData
WHEN @LastValidChar = 0 THEN STUFF(@UserData, @Position, 1, '')
ELSE STUFF(@UserData, @Position, 1, @ColumnDelimiter)
END,
@LastValidChar = @ValidChar,
@Position = @Position - 1

RETURN CASE
WHEN @UserData LIKE @ColumnDelimiter + '%' THEN SUBSTRING(@UserData, 2, 7999)
ELSE NULLIF(@UserData, '')
END
END
You can call with
SELECT	dbo.fnExtractDigitSequence(default, '[0-9]', '...,,,,,123456789,,,,,,.........876564532$$££^^(.........,,,,,')
SELECT dbo.fnExtractDigitSequence(',', default, '...,,,,,123456789,,,,,,.........876564532$$££^^(.........,,,,,')
SELECT dbo.fnExtractDigitSequence(default, default, '...,,,,,123456789,,,,,,.........876564532$$££^^(.........,,,,,')




E 12°55'05.25"
N 56°04'39.16"
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-11-12 : 06:57:29
If you only want numeric characters from a string?
DECLARE @Value NVARCHAR(200)

SET @Value = 'a+(6aaaa02.......()/)(/)=(#¤NIOU#H¤CÖIÖERFÖNKER)123456789/fffffffffffff'

WHILE @Value LIKE '%[^0-9]%'
SET @Value = REPLACE(@Value, SUBSTRING(@Value, PATINDEX('%[^0-9]%', @Value), 1), '')

SELECT @Value



E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

madhivanan
Premature Yak Congratulator

22864 Posts

Posted - 2008-11-12 : 08:13:00
http://sqlblogcasts.com/blogs/madhivanan/archive/2007/12/18/extract-only-numbers-from-a-string.aspx

Madhivanan

Failing to plan is Planning to fail
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2008-11-12 : 08:43:37
A bit different.

1) I use no tally table where you do
2) I only replace a character once where you loop all characters and check them one by one


E 12°55'05.63"
N 56°04'39.26"
Go to Top of Page

amachanic
SQL Server MVP

169 Posts

Posted - 2008-11-12 : 19:02:00
And another version, designed to properly mimic the behavior of REPLACE in all cases:

http://sqlblog.com/blogs/adam_machanic/archive/2006/07/12/pattern-based-replacement-udf.aspx

---
Adam Machanic
SQL Server MVP
http://www.datamanipulation.net
Go to Top of Page
   

- Advertisement -