Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Longest Common Subsequence to Longest Common Subst

Author  Topic 

dms-distance
Starting Member

8 Posts

Posted - 2011-06-07 : 06:28:24
for calculating the length of de LCS i already found this post:
quote:
Originally posted by Stoad strehngthend

Of crs, I saw it. I meant "abcdef" = "abcdefxyzxyz", no referring to human names.
Secondly, my exact taste is: CompareText(txt1, txt2) => LCS(txt1, txt2) / MAX(Len(txt1), Len(txt2)) * 100%,
where LCS is Longest Common Subsequence of txt1 and txt2: http://en.wikipedia.org/wiki/Longest_common_subsequence
And of course it's slow (O(n*m)):
[code][font=Courier New]
CREATE FUNCTION mf_LCSLength
(
@x varchar(8000),
@y varchar(8000)
)
returns int
AS
/*
Computing the length of the LCS: http://en.wikipedia.org/wiki/Longest_common_subsequence

The below function takes as input sequences X[1..m] and Y[1..n] computes the LCS
between X[1..i] and Y[1..j] for all 1 = i = m and 1 = j = n, and stores it in
C[i,j]. C[m,n] will contain the length of the LCS of X and Y.

function LCSLength(X[1..m], Y[1..n])
C = array(0..m, 0..n)
for i := 0..m
C[i,0] = 0
for j := 0..n
C[0,j] = 0
for i := 1..m
for j := 1..n
if X[i] = Y[j]
C[i,j] := C[i-1,j-1] + 1
else:
C[i,j] := max(C[i,j-1], C[i-1,j])
return C[m,n]
*/
BEGIN
declare @i int
declare @j int
declare @p int
declare @q int
declare @n int
declare @m int
declare @t table (m int, a int, b int)
select @n = len(@x), @m = len(@y), @i = 0

while @i <= @m
begin
insert into @t select @i, 0, 0
set @i = @i + 1
end

set @i = 1
while @i <= @n
begin
set @j = 1
while @j <= @m
begin
if substring(@x, @i, 1) = substring(@y, @j, 1)
update @t set b = (select a from @t where m = @j - 1) + 1 where m = @j
else
begin
select @p = a from @t where m = @j
select @q = b from @t where m = @j - 1
if @p < @q set @p = @q
update @t set b = @p where m = @j
end
set @j = @j + 1
end
update @t set a = b
set @i = @i + 1
end
RETURN(select max(b) from @t)
END



furthermore i found this one:
http://devio.wordpress.com/2010/09/16/calculating-the-length-of-the-longest-common-subsequence-in-tsql/

but i need a function which returns the lcs itself and not its length.
example:
dbo.lcs('John','AAAJohnathan') should return John not 4

has anyone an idea how to change it?

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-06-07 : 08:52:56
I realize this isn't a modification of the posted sample... but doesn't it solve your problem?

How about this:


Declare @str1 varchar(8000),
@str2 varchar(8000)


Set @str1 = 'John' Set @str2 = 'AAAJohnathan'
--Set @str1 = 'JohnZZZZ' Set @str2 = 'ZZZZJohnathan'


;with n As (
Select n = A.number + B.n*2048
From master..spt_values A
Cross Join (Select n = number From master..spt_values Where type = 'P' and number < 4) B
Where type = 'P'

)

Select
seg1,
RankNum
From
(
Select
*,
seg1 = substring(str1,pos,n+1),
RankNum = Dense_Rank() Over(Order By n desc)
From
(
Select
str1 = @str1,
str1Len = len(@str1),
pos = n
From n
Where n < len(@str1)
and n > 0
) A
Inner Join n B
On A.str1Len - A.pos >= B.n
and 0 <= B.n
Where @str2 like '%'+substring(str1,pos,n+1)+'%'
) Z
Where RankNum = 1


Corey

I Has Returned!!
Go to Top of Page

dms-distance
Starting Member

8 Posts

Posted - 2011-06-07 : 09:30:49
thanks a lot, it seems to be right.

but how can i make a function of your code, that returns only the lcs?
what is meant by master..spt_values?
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2011-06-07 : 10:18:27
quote:

thanks a lot, it seems to be right.

but how can i make a function of your code, that returns only the lcs?
what is meant by master..spt_values?



I'm using master..spt_values to generate a numbers (or tally) table. There are lots of ways to do this, but this is a simple one. Not the fastest option out there...

This article has a good cte tally table (see Fig. 7): http://www.sqlservercentral.com/articles/Tally+Table/72993/



Drop Function LongestCommonSegment
Go

CREATE FUNCTION LongestCommonSegment
(
@x varchar(8000),
@y varchar(8000)
)
returns @results table (seq varchar(8000), seg1Len int)
AS
Begin
if (len(isnull(@x,'')) > len(isnull(@y,'')))
Begin
Declare @z varchar(8000)

Set @z = isnull(@x,'')
Set @x = isnull(@y,'')
Set @y = @z
End

;with n As (
Select n = A.number + B.n*2048
From master..spt_values A
Cross Join (Select n = number From master..spt_values Where type = 'P' and number < 4) B
Where type = 'P'

)

Insert Into @results
Select
seg,
seg1Len = n+1
From
(
Select
*,
seg = substring(str1,pos,n+1),
RankNum = Dense_Rank() Over(Order By n desc)
From
(
Select
str1 = @x,
str1Len = len(@x),
pos = n
From n
Where n < len(@x)
and n > 0
) A
Inner Join n B
On A.str1Len - A.pos >= B.n
and 0 <= B.n
Where @y like '%'+substring(str1,pos,n+1)+'%'
) Z
Where RankNum = 1

Return
End
Go


Declare @str1 varchar(8000),
@str2 varchar(8000)


Set @str1 = 'John' Set @str2 = 'AAAJohnathan'
--Set @str1 = 'JohnZZZZ' Set @str2 = 'ZZZZJohnathan'

Select * From dbo.LongestCommonSegment(@str2,@str1)



Corey

I Has Returned!!
Go to Top of Page

dms-distance
Starting Member

8 Posts

Posted - 2011-06-08 : 04:50:36
thank you very much!!!
Go to Top of Page

thusi
Starting Member

25 Posts

Posted - 2011-10-04 : 14:22:35
quote:
Originally posted by Seventhnight


...

CREATE FUNCTION LongestCommonSegment
(
@x varchar(8000),
@y varchar(8000)
)
returns @results table (seq varchar(8000), seg1Len int)
AS
Begin
...
End
Go


Declare @str1 varchar(8000),
@str2 varchar(8000)


Set @str1 = 'John' Set @str2 = 'AAAJohnathan'
--Set @str1 = 'JohnZZZZ' Set @str2 = 'ZZZZJohnathan'

Select * From dbo.LongestCommonSegment(@str2,@str1)






Thanks for this post Seventhnight. Was VERY helpful for some of the work I'm doing. However, as you've correctly said, doesn't look like it's the fastest. I'm trying to use this as a UDF to run across ~600k records, and it's taking forever on a 2.4GHz i5 machine with 8GB RAM :( Any thoughts on how you could speed it up? Many thanks!
Go to Top of Page
   

- Advertisement -