Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
lazerath
Constraint Violating Yak Guru
343 Posts |
Posted - 2014-07-03 : 11:12:26
|
I have a database of configuration entries with multiple, slightly different copies and was looking for an easy way to strip out the differences between versions to come up with a template for each one.I searched the internet for a string comparison / difference function, procedure or CLR function and couldn't find one for SQL Server. Instead, I ran across a reference to a paper describing an algorithm and decided, what the heck, let's try it. The algorithm I used is the O(ND) Difference Algorithm over at http://www.xmailserver.org/diff2.pdf.There are a couple points that I should note:1.) The larger the strings are AND the more differences that exist between the strings, the slower this will go. For most use cases, this should be OK, but don't bother comparing > 1k strings with little to no similarities.2.) It only returns common subsequences at this point -- ideally, I would like to have it return the differences as well. The algorithm itself is meant to produce Shortest Edit Script (SES), which is what you would use to visualize differences akin to a source code comparison tool.3.) Ideally, this would be implemented as a CLR function and that will mitigate much of the issues.4.) For my purposes, I decided it would work better if I didn't include single character matches in the end result. I intend to update the function to include a parameter to control this, but haven't gotten to it yet.5.) Since performance dropped off once string size increases into the thousands, I decided to limit the string sizes to 8k. Please feel free to point me to other implementations of difference algorithms, tell me this isn't the right layer to implement this, or offer constructive feedback to improve performance or features.First, before we get to the algorithm, you will need a number generator or source. I chose to use a technique I lifted from Jeff Moden's excellent Split function, so I am providing it here:[CODE]GOSET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOIF OBJECT_ID('dbo.udfNumberFast') IS NOT NULL DROP FUNCTION dbo.udfNumberFast;GOCREATE FUNCTION [dbo].[udfNumberFast]( @TOTAL INT)/*This FUNCTION RETURNS an integer TABLE containing ALL integersIN the range of @START_NUMBER through @END_NUMBER, inclusive.The maximum number of rows that this FUNCTION can RETURNIS 16777216.*/RETURNS TABLE ASRETURN(--===== "Inline" CTE Driven "Tally Table" produces values from 0 up to 10,000... -- enough to cover NVARCHAR(4000) WITH E1(N) AS ( SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 ), --10E+1 or 10 rows E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows E4(N) AS (SELECT TOP (@TOTAL) 1 FROM E2 a, E2 b, E2 c, E2 d, E2 e), --10E+4 or 10,000 rows max cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front -- for both a performance gain and prevention of accidental "overruns" SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4 )SELECT c.N AS NumberFROM cteTallyAS c)GO[/CODE] Here is the string difference/compare function that finds the common sub-sequences between two strings:[CODE]GOSET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOIF OBJECT_ID('dbo.udfCompareStrings') IS NOT NULL DROP FUNCTION dbo.udfCompareStrings;GOCREATE FUNCTION [dbo].[udfCompareStrings]( @String1 VARCHAR(8000), @String2 VARCHAR(8000))RETURNS @Output TABLE ( d SMALLINT, -- Number of non-diagonal edges to this point k SMALLINT, -- Diagonal x SMALLINT, -- Character X coordinate in grid y SMALLINT, -- Character Y coordinate in grid LastK SMALLINT, -- Breadcrumb of path NodeCount SMALLINT, -- Number of characters in the diagonal String VARCHAR(8000) ) AS-- Implementation of the O(ND) Difference algorithm at:-- http://www.xmailserver.org/diff2.pdf-- Algorithm ported by Lazerath on 7/3/2014BEGIN ------------------------------------------------------------------------------------------------------- -- INITIALIZATION ------------------------------------------------------------------------------------------------------- --SET NOCOUNT ON; DECLARE @DT DATETIME; SET @DT = GETDATE(); DECLARE @String1Len SMALLINT, @String2Len SMALLINT, @StringMINLen SMALLINT, @StringMAXLen SMALLINT, @k_m SMALLINT, @k_n SMALLINT, @IDX_d SMALLINT, @IDX_k SMALLINT, @x_k1 SMALLINT, @x_k2 SMALLINT, @NodeCount SMALLINT, @LastK SMALLINT, @x SMALLINT, @y SMALLINT; DECLARE @Number TABLE (Number SMALLINT PRIMARY KEY); DECLARE @String1Chars TABLE (x SMALLINT PRIMARY KEY, VALUE CHAR(1)); DECLARE @String2Chars TABLE (y SMALLINT PRIMARY KEY, VALUE CHAR(1)); DECLARE @EditGraph TABLE ( k SMALLINT, x SMALLINT, y SMALLINT, x2 SMALLINT, isConnectedAfter BIT PRIMARY KEY(k,x) ); DECLARE @V TABLE ( k SMALLINT, -- Starting Node x SMALLINT, -- Starting Node y SMALLINT, -- Starting Node LastK SMALLINT, NodeCount SMALLINT, PRIMARY KEY(k) ); DECLARE @VPath TABLE ( d SMALLINT, k SMALLINT, -- Starting Node x SMALLINT, -- Starting Node y SMALLINT, -- Starting Node LastK SMALLINT, NodeCount SMALLINT, PRIMARY KEY(d,k) ); ------------------------------------------------------------------------------------------------------- -- Begin Code ------------------------------------------------------------------------------------------------------- SET @String1Len = LEN(@String1); SET @String2Len = LEN(@String2); SET @StringMINLen = CASE WHEN @String1Len < @String2Len THEN @String1Len ELSE @String2Len END; SET @StringMAXLen = CASE WHEN @String1Len < @String2Len THEN @String2Len ELSE @String1Len END; INSERT @Number(Number) SELECT n.Number FROM dbo.udfNumberFast(@StringMAXLen) AS n; INSERT @String1Chars(x,VALUE) SELECT n.Number AS x,SUBSTRING(@String1,n.Number,1) AS VALUE FROM @Number AS n WHERE n.Number <= @String1Len; INSERT @String2Chars(y,VALUE) SELECT n.Number AS y,SUBSTRING(@String2,n.Number,1) AS VALUE FROM @Number AS n WHERE n.Number <= @String2Len; DECLARE @msg VARCHAR(100); SET @IDX_d = 0 WHILE @IDX_d <= @String1Len + @String2Len BEGIN SET @k_m = CASE WHEN @StringMINLen < @String2Len-@IDX_d THEN @StringMINLen ELSE @String2Len-@IDX_d END; SET @k_n = CASE WHEN @StringMINLen < @String1Len-@IDX_d THEN @StringMINLen ELSE @String1Len-@IDX_d END; SET @k_m = CASE WHEN @k_m < 0 THEN 0 ELSE @k_m END; SET @k_n = CASE WHEN @k_n < 0 THEN 0 ELSE @k_n END; ;WITH cteDiagonalGridPoint AS ( SELECT -@IDX_d AS k, n.Number AS x, n.Number+@IDX_d AS y FROM @Number AS n WHERE n.Number <= @k_m UNION SELECT @IDX_d AS k, n.Number+@IDX_d AS x, n.Number AS y FROM @Number AS n WHERE n.Number <= @k_n ) INSERT @EditGraph (k,x,y,x2) SELECT d.k,d.x,d.y,d.x FROM cteDiagonalGridPoint AS d JOIN @String1Chars AS s1 ON d.x = s1.x JOIN @String2Chars AS s2 ON d.y = s2.y WHERE s1.Value = s2.Value; UPDATE e1 SET e1.isConnectedAfter = 0 FROM @EditGraph AS e1 WHERE e1.k IN(-@IDX_d,@IDX_d) AND NOT EXISTS ( SELECT * FROM @EditGraph AS e2 WHERE e1.k = e2.k AND e1.x = e2.x-1 ); UPDATE e1 SET e1.x2 = ( SELECT MIN(e2.x) AS x2 FROM @EditGraph AS e2 WHERE e1.k = e2.k AND e1.x <= e2.x AND e2.isConnectedAfter = 0 ) FROM @EditGraph AS e1 WHERE e1.k IN(-@IDX_d,@IDX_d); -- DELETE nodes at the end of a tunnel that only span a single character DELETE g FROM @EditGraph AS g WHERE g.x = g.x2 AND g.isConnectedAfter=0; INSERT @v(k,x,y,LastK,NodeCount) SELECT -@IDX_d,0,0,0,0 UNION SELECT +@IDX_d,0,0,0,0 SET @IDX_k = -@IDX_d; WHILE (@IDX_k <= @IDX_d) BEGIN SET @x_k1=COALESCE((SELECT v.x FROM @v AS v WHERE v.k=@IDX_k-1),0); SET @x_k2=COALESCE((SELECT v.x FROM @v AS v WHERE v.k=@IDX_k+1),0); -- Am I at the beginning of the loop? If so, take the k+1 diagonal since the k-1 doesn't exist -- If I am not at the end and the k+1 diagonal is a farther path, use that, otherwise, use the k-1 IF (@IDX_k=-@IDX_d OR (@IDX_k < @IDX_d AND @x_k1 < @x_k2)) BEGIN SET @x = @x_k2; -- Vertical Edge SET @LastK = @IDX_k+1; END ELSE BEGIN SET @x = @x_k1+1; -- Horizontal Edge SET @LastK = @IDX_k-1; END SET @y = @x - @IDX_k; SET @NodeCount = 0; -- Follow the snake (if one exists)! SELECT @x = e.x2, @y = e.x2-@IDX_k, @NodeCount = e.x2-e.x+1 FROM @EditGraph AS e WHERE e.k = @IDX_k AND e.x = @x+1; -- Update the current status of my path progress on this diagonal (k) UPDATE v SET x = @x, y = @y, LastK=@LastK, NodeCount = @NodeCount FROM @v AS v WHERE k = @IDX_k; -- Save my state so I can come back later and get the full path of the SES INSERT @VPath SELECT @IDX_d,* FROM @v WHERE k=@IDX_k; -- Check if I am at the end or I have somehow run out of d values IF @x >= @String1Len AND @y >= @String2Len OR @IDX_d > @String1Len + @String2Len BEGIN GOTO END_LOOP; END SET @IDX_k = @IDX_k + 2; END SET @IDX_d = @IDX_d +1; END END_LOOP: -- Go backwards to eliminate all path entries that didn't contribute to the SES WHILE (@IDX_d > 0) BEGIN -- Delete all path entries that weren't part of the SES DELETE p FROM @VPath AS p WHERE p.d = @IDX_d AND NOT (p.k = @IDX_k); -- Determine the last K that helped me get to this D (it identifies the path of the SES) SELECT @IDX_k = p.LastK FROM @VPath AS p WHERE p.d = @IDX_d AND p.k = @IDX_k; SET @IDX_d = @IDX_d-1; END INSERT @Output SELECT v.d, v.k, v.x, v.y, v.LastK, v.NodeCount, SUBSTRING(@String1,v.x-v.NodeCount+1,v.NodeCount) AS SubSequence FROM @VPath AS v WHERE v.NodeCount > 0 ORDER BY v.d, v.k, v.x; RETURN;ENDGO[/CODE] |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2014-07-03 : 13:08:58
|
I'm probably dull today, but doesn't this give the very same result?DECLARE @String1 VARCHAR(8000) = 'Aardvaark', @String2 VARCHAR(8000) = 'aard';WITH cteString1(Pos, chr)AS ( SELECT Number, SUBSTRING(@String1, Number, 1) FROM dbo.udfNumberFast(LEN(@String1))), cteString2(Pos, chr)AS ( SELECT Number, SUBSTRING(@String2, Number, 1) FROM dbo.udfNumberFast(LEN(@String2))), cteSource(pos1, pos2)AS ( SELECT s1.pos AS po1, s2.pos AS pos2 FROM cteString1 AS s1 INNER JOIN cteString2 AS s2 ON s2.chr = s1.chr GROUP BY s1.pos, s2.pos)SELECT MIN(pos1) AS FromPos, MAX(pos1) AS ToPos, COUNT(*) AS Characters, SUBSTRING(@String1, MIN(pos1), COUNT(*)) AS LettersFROM cteSourceGROUP BY pos1 - pos2HAVING COUNT(*) >= 2ORDER BY COUNT(*) DESC Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA |
|
|
lazerath
Constraint Violating Yak Guru
343 Posts |
Posted - 2014-07-03 : 13:37:36
|
No, very different. The differences are visible with your example, however become far starker given other examples:/* -- Result from swepeso's functionFromPos ToPos Characters Letters1 4 4 Aard6 8 3 aar--*/DECLARE @String1 VARCHAR(8000) = 'Aardvaark', @String2 VARCHAR(8000) = 'aard';SELECT *FROM dbo.udfCompareStrings(@String1,@String2)/* -- Result from udfCompareStringsd k x y LastK NodeCount String0 0 4 4 1 4 Aard--*/ Try this:DECLARE @String1 VARCHAR(8000), @String2 VARCHAR(8000);SET @String1 = 'Columns are returned in an unknown orderThe fear is that an application will contain a bug when it depends on a column order that is incorrect.';SET @String2 = 'Columns are returned in an unknown orderThe fear is that an application will contain a bug when it depends on a column order that is incorrect. You can’t search code for the use of a columnThat’s an interesting one. Say I’ve got a USERS table with a Title column.';/* -- Result from swepeso's functionFromPos ToPos Characters Letters1 145 145 Columns are returned in an unknown orderThe fear is that an application will contain a bug when it depends on a column order that is incorrect.3 133 27 lumns are returned in an un13 136 20 returned in an unkno10 133 20 re returned in an un12 126 18 returned in an un1 114 18 Columns are return13 144 17 returned in an un4 91 17 umns are returned3 122 16 lumns are return8 127 16 are returned in6 142 16 ns are returned 8 138 16 are returned in2 120 16 olumns are retur... RESULTS REDUCED FOR BREVITY ...122 132 2 or9 22 2 ar12 24 2 r106 123 2 nd138 139 2 co--*/SELECT *FROM dbo.udfCompareStrings(@String1,@String2)/* -- Result from udfCompareStringsd k x y LastK NodeCount String0 0 145 145 1 145 Columns are returned in an unknown orderThe fear is that an application will contain a bug when it depends on a column order that is incorrect.--*/ The function dbo.udfCompareStrings correctly determines that both strings have the same sequence and only lists that sequence once in the final result. Try something like this to:DECLARE @String1 VARCHAR(8000), @String2 VARCHAR(8000);SET @String1 = REPLICATE('<TESTURL>http://www.hgmailh.com</TESTURL> ',3);SET @String2 = REPLICATE('<TESTURL>http://www.gmail.com</TESTURL>',3);/* -- Result from swepeso's functionFromPos ToPos Characters Letters1 107 24 <TESTURL>http://www.hgma5 86 22 TURL>http://www.hgmail11 104 22 ttp://www.hgmailh.com<5 62 21 TURL>http://www.hgmai1 44 21 <TESTURL>http://www.h43 86 21 <TESTURL>http://www.h37 118 21 TURL> <TESTURL>http:/47 104 21 TURL>http://www.hgmai1 20 20 <TESTURL>http://www.19 76 20 w.hgmailh.com</TESTU19 76 20 w.hgmailh.com</TESTU61 118 20 w.hgmailh.com</TESTU85 104 20 <TESTURL>http://www.28 125 17 .com</TESTURL> <T37 83 16 TURL> <TESTURL>h79 125 16 TURL> <TESTURL>h103 125 15 w.hgmailh.com</19 41 15 w.hgmailh.com</40 115 12 L> <TESTURL>2 53 12 TESTURL>http5 86 12 TURL>http://44 95 12 TESTURL>http37 118 12 TURL> <TESTU5 76 11 TURL>http:/... RESULTS REDUCED FOR BREVITY ...82 89 2 L>16 95 2 /w63 112 2 hg37 57 2 TU--*/SELECT *FROM dbo.udfCompareStrings(@String1,@String2)/* -- Result from udfCompareStringsd k x y LastK NodeCount String0 0 20 20 1 20 <TESTURL>http://www.1 1 26 25 0 5 gmail2 2 41 39 1 14 .com</TESTURL>3 3 62 59 2 20 <TESTURL>http://www.4 4 68 64 3 5 gmail5 5 83 78 4 14 .com</TESTURL>6 6 104 98 5 20 <TESTURL>http://www.7 7 110 103 6 5 gmail8 8 125 117 7 14 .com</TESTURL>--*/ The algorithm is able to correctly line up the two very similar strings, with repeated segments, in order to find and eliminate the minor differences properly. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2014-07-03 : 15:33:23
|
So the difference is a TOP(1) ? Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA |
|
|
lazerath
Constraint Violating Yak Guru
343 Posts |
Posted - 2014-07-03 : 16:27:16
|
No, that is not correct, TOP(1) would only give you the the longest common sub-sequence. Take a look at the last example... selecting TOP 1 with your function would not produce anything close to the correct result. In fact, that would yield "<TESTURL>http://www.hgma", which itself is not even common to both strings, thereby even failing the goal of achieving longest common sub-sequence. Even if you modified your method to only select the longest common sub-sequence between two strings, that is not what this function aims to achieve.Consider the problem I outlined. If you want to take a configuration entry for a webservice from your development environment and compare it against the configuration entry from production in order to determine a general template that you could then use to create configuration entries for QA, UAT, etc...DECLARE @String1 VARCHAR(8000),@String2 VARCHAR(8000);SET @String1 = '<connectionString>'+ '<url>https://applications.companyname.com/APP1</url>'+ '<proxy>http://proxy.companyname.com:5555</proxy>' + '<licenseKey>570038F0-739D-43EA-A105-FCA09A1466E9</licenseKey>' + '<username>15E10D713A9F4BE782954FEE10D0CD40</username>' + '<password>A67B5FEFD56E4D2A97A3B4CA28B583C8</password>' + '</connectionString>';SET @String2 = '<connectionString>' + '<url>https://VXABC123SERVER/APP1</url>' + '<proxy>http://VXABC123SERVER:5555</proxy>' + '<licenseKey>EC131372-1A3B-C964-80F7-69FE8B6991C1</licenseKey>' + '<username>08EDFB1479DA42CAA1190B686AFFDAF1</username>' + '<password>CBB30AF546B34A5499CFAB2C276B8C0A</password>' + '</connectionString>';/* -- Result from swepeso's functionFromPos ToPos Characters Letters36 304 114 ications.companyname.com/APP1</url><proxy>http://proxy.companyname.com:5555</proxy><licenseKey>570038F0-739D-43EA-39 290 38 tions.companyname.com/APP1</url><proxy1 235 35 <connectionString><url>https://appl65 285 27 </url><proxy>http://proxy.c27 249 24 ps://applications.compan12 229 21 String><url>https://a28 297 21 s://applications.comp39 302 19 tions.companyname.c39 266 18 tions.companyname.288 304 17 connectionString>92 298 17 ompanyname.com:5514 237 17 ring><url>https:/... RESULTS REDUCED FOR BREVITY ...228 287 2 na31 52 2 /a283 294 2 rd45 68 2 co7 36 2 ct--*/SELECT *FROM dbo.udfCompareStrings(@String1,@String2)/* -- Result from udfCompareStrings0 0 31 31 1 31 <connectionString><url>https://42 14 84 70 15 25 /APP1</url><proxy>http://77 21 130 109 22 25 :5555</proxy><licenseKey>149 21 189 168 22 23 </licenseKey><username>213 21 242 221 22 21 </username><password>277 21 304 283 22 30 </password></connectionString>--*/ The result from udfCompareStrings can be concatenated together to form a working configuration entry (sans variables). All that would be needed would be to add a placeholder for the variable name that could programatically replaced.One thing I can see I need to adjust is the threshold for matches. Right now, it will take any 2 or greater combination, however I can easily see the need for dynamic adjustment. |
|
|
lazerath
Constraint Violating Yak Guru
343 Posts |
Posted - 2014-07-03 : 17:04:17
|
I have reviewed the method you posted above and see where your problem lies:SELECT MIN(pos1) AS FromPos, MAX(pos1) AS ToPos, COUNT(*) AS Characters, SUBSTRING(@String1, MIN(pos1), COUNT(*)) AS LettersFROM cteSourceGROUP BY pos1 - pos2HAVING COUNT(*) >= 2ORDER BY COUNT(*) DESC You are grouping by the k value (diagonal aka pos1-pos2) and taking the first matching character and extending your string for the total number of matching characters on that diagonal. Unfortunately, this will produce incorrect results on all but the best case scenarios. The reason being is that many strings will have non-contiguous match points. Take for example this string comparison:DECLARE @String1 VARCHAR(8000),@String2 VARCHAR(8000);SET @String1 = 'aaaaaBccccccccDeeeeeeee';SET @String2 = 'aaaaaaFcccccccGeeeeeee';/* -- Result from swepeso's functionFromPos ToPos Characters Letters1 22 19 aaaaaBccccccccDeeee1 21 18 aaaaaBccccccccDeee2 23 17 aaaaBccccccccDeee1 20 15 aaaaaBccccccccD3 23 14 aaaBccccccccDe1 19 12 aaaaaBcccccc4 23 11 aaBcccccccc1 18 9 aaaaaBccc5 23 8 aBcccccc1 17 6 aaaaaB13 23 5 ccDee7 16 3 ccc14 23 3 cDe--*/SELECT *FROM dbo.udfCompareStrings(@String1,@String2)/* -- Result from udfCompareStringsd k x y LastK NodeCount String0 0 5 5 1 5 aaaaa3 -1 13 14 0 7 ccccccc6 0 22 22 1 7 eeeeeee--*/ Only a single result returned from your method is common across both strings, 'ccc', and it is not even the complete sub sequence. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2014-07-03 : 19:59:40
|
Ok, I spent 30 minutes on this version vs 10 minutes on the previousDECLARE @String1 VARCHAR(8000) = 'aaaaaBccccccccDeeeeeeee', @String2 VARCHAR(8000) = 'aaaaaaFcccccccGeeeeeee';WITH cteSource(FromPosition, ToPosition, Characters)AS ( SELECT s.Number AS FromPosition, s.Number + MAX(l.Number) - 1 AS ToPosition, MAX(l.Number) AS Characters FROM dbo.udfNumberFast(LEN(@String1)) AS s CROSS APPLY dbo.udfNumberFast(LEN(@String1) - s.Number + 1) AS l WHERE @String2 LIKE '%' + SUBSTRING(@String1, s.Number, l.Number) + '%' GROUP BY s.Number), cteDisplay(FromPosition, ToPosition, Characters, Expression, rnk)AS ( SELECT MIN(FromPosition) AS FromPosition, ToPosition, MAX(Characters) AS Characters, SUBSTRING(@String1, MIN(FromPosition), MAX(Characters)) AS Expression, DENSE_RANK() OVER (PARTITION BY SUBSTRING(@String1, MIN(FromPosition), MAX(Characters)) ORDER BY MIN(FromPosition)) AS rnk FROM cteSource GROUP BY ToPosition)SELECT FromPosition, ToPosition, Characters, ExpressionFROM cteDisplayWHERE rnk = 1ORDER BY Characters DESC, FromPosition; Microsoft SQL Server MVP, MCT, MCSE, MCSA, MCP, MCITP, MCTS, MCDBA |
|
|
lazerath
Constraint Violating Yak Guru
343 Posts |
Posted - 2014-08-13 : 18:01:49
|
@SwePeso: Thank you for your input into this. Your solution is very clever and a very neat technique. I apologize for the delay in responding, but my wife and I had a child and I have been very busy.That said, it is not perfect. Unfortunately, this method does not serve the same purpose.- As constructed, it does not work with repeating segments due to the grouping. This is a requirement of such string comparison. Compare the output with the repeated gmail example by comparing REPLICATE('<TESTURL>http://www.hgmailh.com</TESTURL> ',3) against REPLICATE('<TESTURL>http://www.gmail.com</TESTURL>',3)
- Even if this is restructured to avoid the grouping, this method has problems with "refinding" smaller sequences that have already been found. Try comparing 'ThisIsAString,ThisAbsolutelyIsHis' against 'ThisIsAString'
Again, I really appreciate the attempt.I have some optimizations and updates to the original PoC function that makes the whole thing much cleaner and faster. I will try to post tomorrow. |
|
|
|
|
|
|
|