Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
dms-distance
Starting Member
8 Posts |
Posted - 2011-06-07 : 06:28:24
|
for calculating the length of de LCS i already found this post:quote: Originally posted by Stoad strehngthend Of crs, I saw it. I meant "abcdef" = "abcdefxyzxyz", no referring to human names.Secondly, my exact taste is: CompareText(txt1, txt2) => LCS(txt1, txt2) / MAX(Len(txt1), Len(txt2)) * 100%,where LCS is Longest Common Subsequence of txt1 and txt2: http://en.wikipedia.org/wiki/Longest_common_subsequenceAnd of course it's slow (O(n*m)):[code][font=Courier New]CREATE FUNCTION mf_LCSLength( @x varchar(8000), @y varchar(8000))returns intAS/*Computing the length of the LCS: http://en.wikipedia.org/wiki/Longest_common_subsequenceThe below function takes as input sequences X[1..m] and Y[1..n] computes the LCSbetween X[1..i] and Y[1..j] for all 1 = i = m and 1 = j = n, and stores it inC[i,j]. C[m,n] will contain the length of the LCS of X and Y.function LCSLength(X[1..m], Y[1..n]) C = array(0..m, 0..n) for i := 0..m C[i,0] = 0 for j := 0..n C[0,j] = 0 for i := 1..m for j := 1..n if X[i] = Y[j] C[i,j] := C[i-1,j-1] + 1 else: C[i,j] := max(C[i,j-1], C[i-1,j]) return C[m,n]*/BEGINdeclare @i intdeclare @j intdeclare @p intdeclare @q intdeclare @n intdeclare @m intdeclare @t table (m int, a int, b int)select @n = len(@x), @m = len(@y), @i = 0while @i <= @m begin insert into @t select @i, 0, 0 set @i = @i + 1 endset @i = 1while @i <= @nbegin set @j = 1 while @j <= @m begin if substring(@x, @i, 1) = substring(@y, @j, 1) update @t set b = (select a from @t where m = @j - 1) + 1 where m = @j else begin select @p = a from @t where m = @j select @q = b from @t where m = @j - 1 if @p < @q set @p = @q update @t set b = @p where m = @j end set @j = @j + 1 end update @t set a = b set @i = @i + 1endRETURN(select max(b) from @t)END
furthermore i found this one:http://devio.wordpress.com/2010/09/16/calculating-the-length-of-the-longest-common-subsequence-in-tsql/but i need a function which returns the lcs itself and not its length. example:dbo.lcs('John','AAAJohnathan') should return John not 4has anyone an idea how to change it? |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2011-06-07 : 08:52:56
|
I realize this isn't a modification of the posted sample... but doesn't it solve your problem?How about this:Declare @str1 varchar(8000), @str2 varchar(8000) Set @str1 = 'John' Set @str2 = 'AAAJohnathan'--Set @str1 = 'JohnZZZZ' Set @str2 = 'ZZZZJohnathan' ;with n As ( Select n = A.number + B.n*2048 From master..spt_values A Cross Join (Select n = number From master..spt_values Where type = 'P' and number < 4) B Where type = 'P' ) Select seg1, RankNum From ( Select *, seg1 = substring(str1,pos,n+1), RankNum = Dense_Rank() Over(Order By n desc) From ( Select str1 = @str1, str1Len = len(@str1), pos = n From n Where n < len(@str1) and n > 0 ) A Inner Join n B On A.str1Len - A.pos >= B.n and 0 <= B.n Where @str2 like '%'+substring(str1,pos,n+1)+'%' ) Z Where RankNum = 1 Corey I Has Returned!! |
 |
|
dms-distance
Starting Member
8 Posts |
Posted - 2011-06-07 : 09:30:49
|
thanks a lot, it seems to be right.but how can i make a function of your code, that returns only the lcs?what is meant by master..spt_values? |
 |
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2011-06-07 : 10:18:27
|
quote: thanks a lot, it seems to be right.but how can i make a function of your code, that returns only the lcs?what is meant by master..spt_values?
I'm using master..spt_values to generate a numbers (or tally) table. There are lots of ways to do this, but this is a simple one. Not the fastest option out there...This article has a good cte tally table (see Fig. 7): http://www.sqlservercentral.com/articles/Tally+Table/72993/ Drop Function LongestCommonSegmentGoCREATE FUNCTION LongestCommonSegment( @x varchar(8000), @y varchar(8000))returns @results table (seq varchar(8000), seg1Len int)AS Begin if (len(isnull(@x,'')) > len(isnull(@y,''))) Begin Declare @z varchar(8000) Set @z = isnull(@x,'') Set @x = isnull(@y,'') Set @y = @z End ;with n As ( Select n = A.number + B.n*2048 From master..spt_values A Cross Join (Select n = number From master..spt_values Where type = 'P' and number < 4) B Where type = 'P' ) Insert Into @results Select seg, seg1Len = n+1 From ( Select *, seg = substring(str1,pos,n+1), RankNum = Dense_Rank() Over(Order By n desc) From ( Select str1 = @x, str1Len = len(@x), pos = n From n Where n < len(@x) and n > 0 ) A Inner Join n B On A.str1Len - A.pos >= B.n and 0 <= B.n Where @y like '%'+substring(str1,pos,n+1)+'%' ) Z Where RankNum = 1 ReturnEndGoDeclare @str1 varchar(8000), @str2 varchar(8000) Set @str1 = 'John' Set @str2 = 'AAAJohnathan'--Set @str1 = 'JohnZZZZ' Set @str2 = 'ZZZZJohnathan'Select * From dbo.LongestCommonSegment(@str2,@str1) Corey I Has Returned!! |
 |
|
dms-distance
Starting Member
8 Posts |
Posted - 2011-06-08 : 04:50:36
|
thank you very much!!! |
 |
|
thusi
Starting Member
25 Posts |
Posted - 2011-10-04 : 14:22:35
|
quote: Originally posted by Seventhnight...CREATE FUNCTION LongestCommonSegment( @x varchar(8000), @y varchar(8000))returns @results table (seq varchar(8000), seg1Len int)AS Begin ...EndGoDeclare @str1 varchar(8000), @str2 varchar(8000) Set @str1 = 'John' Set @str2 = 'AAAJohnathan'--Set @str1 = 'JohnZZZZ' Set @str2 = 'ZZZZJohnathan'Select * From dbo.LongestCommonSegment(@str2,@str1)
Thanks for this post Seventhnight. Was VERY helpful for some of the work I'm doing. However, as you've correctly said, doesn't look like it's the fastest. I'm trying to use this as a UDF to run across ~600k records, and it's taking forever on a 2.4GHz i5 machine with 8GB RAM :( Any thoughts on how you could speed it up? Many thanks! |
 |
|
|
|
|
|
|