Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Need a Set-Based method to replace this cursor

Author  Topic 

SMerrill
Posting Yak Master

206 Posts

Posted - 2004-05-26 : 19:16:58
Team:
You’ll probably all tell me to go away, but is there a way to normalize this data set into a tree-view node list using a set-based method? I have already written a cursor to do it, but the cursor takes 9 hours per day, and this is getting ridiculous. You see, I have 280,000 BehaviorPattern records, and 1,200,000 NodeList records.

Think of this data as a tree-view. Start at the top with node 1 that contains CommandID=1. (It is the only node without a parent node.)
From there, it is the parent of node 2 which contains CommandID=2, and it is also the parent of node 8, which contains CommandID=3.

The source table contains a string of patterns, and there is a unique constraint on BehaviorPattern.PatternString.

Source table called BehaviorPattern:

(PatternID, PatternString)
1, “1, 2, 18, 4”
2, “1, 2, 54, 60, 4”
3, “1, 3, 19, 4”
4, “1, 3, 65, 4”


The destination table arbitrarily assigns node numbers to each node by going through each Pattern and tacking on a new NodeList record only if one doesn’t already exist where it belongs.

Destination Table called NodeList:

(NodeID, ParentNodeID, CommandID, PatternID)
1, Null, 1, Null
2, 1, 2, Null
3, 2, 18, Null
4, 3, 4, 1
5, 2, 54, Null
6, 5, 60, Null
7, 6, 4, 2
8, 1, 3, Null
9, 8, 19, Null
10, 9, 4, 4
11, 8, 65, Null
12, 11, 4, 3

Here is my cursor procedure:


/* This procedure loads the NodeList table
* by reading from the BehaviorPattern table.
*
* For each Behavior Pattern, the PatternString is dissected
* by delimiters between each CommandID within the pattern.
*
* Each CommandID is looked up in the NodeList table, and if
* it doesn't exist below the appropriate parent node, it is created.
*/

declare @PatID integer
declare @PatType Integer
declare @CmdID INTEGER
DECLARE @ParentID INTEGER
DECLARE @TempID INTEGER
declare @ptrDelim INTEGER
DECLARE @First INTEGER
DECLARE @Last INTEGER
DECLARE @CNT INTEGER
Declare @Pat varchar(250)

set nocount on
DECLARE Pat_Cursor CURSOR FOR
SELECT BehaviorPattern.PatternID, BehaviorPattern.PatternType,
BehaviorPattern.PatternString
FROM BehaviorPattern WITH (nolock) LEFT OUTER JOIN
NodeList WITH (nolock) ON
BehaviorPattern.PatternID = NodeList.PatternID
WHERE (NodeList.PatternID IS NULL) AND
PatternString NOT LIKE '%NULL%'

OPEN Pat_Cursor
FETCH NEXT FROM Pat_Cursor INTO @PatID, @PatType, @Pat
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT @First = 1
SELECT @Last = 0
-- Dissect @Pat by delimiter ', ' and loop through each numeric value.
--PRINT 'Pattern=' + cast(@PatID as varchar(20)) + '; String="' + @Pat + '"'
SELECT @ParentID = NULL
WHILE LEN(@Pat)>0
BEGIN
SELECT @PtrDelim = CHARINDEX(', ',@Pat,1) --point to the first delimiter
IF @PtrDelim = 0
BEGIN -- Delimiter not found - must be the last one in the string
SELECT @Last=1
SELECT @CmdID = CAST(ISNULL(@Pat,'0') as INTEGER)
END
ELSE
BEGIN -- Delimiter found - parse out the CmdID
SELECT @First=0
SELECT @CmdID = CAST(ISNULL(SUBSTRING(ISNULL(@Pat,""),1,@ptrDelim - 1),'0') as INTEGER)
END
/*
For each number within a pattern string,
create a node if it doesn't exist within NodeList
below the appropriate parent node.
*/
IF @ParentID is null
BEGIN
--@TempID stores the NODE of the CommandID so it can be used as a ParentID next time thru
--@CNT stores a zero if there is not an existing node with the appropriate parent
SELECT @TempID=NODE FROM NodeList with (nolock) WHERE (ParentNode IS NULL) AND (CommandID = @CmdID)
SELECT @CNT=@@ROWCOUNT -- 0 if we didn't find the node, 1 if we did
END
ELSE
BEGIN
--@TempID stores the NODE of the CommandID so it can be used as a ParentID next time thru
--@CNT stores a zero if there is not an existing node with the appropriate parent
SELECT @TempID=NODE FROM NodeList with (nolock) WHERE (ParentNode = @ParentID) AND (CommandID = @CmdID)
SELECT @CNT=@@ROWCOUNT -- 0 if we didn't find the node, 1 if we did
END
IF @CNT = 0
BEGIN
--Create the node if it does not exist below the appropriate parent
INSERT INTO Nodelist with (rowlock) (ParentNode, CommandID, PatternID)
VALUES(@ParentID , @CmdID , @PatID)

IF @@ERROR<>0
BEGIN
PRINT 'Could not insert Node: '
+ 'ParentNode=' + cast(isnull(@ParentID,'NULL') AS varchar(20))
+ ', CommandID=' + cast(isnull(@CmdID ,'NULL') AS varchar(20))
+ ', PatternID=' + cast(isnull(@PatID ,'NULL') AS varchar(20))
END
SELECT @TempID=@@IDENTITY
END --IF @CNT=0

SELECT @ParentID = @TempID
IF @Last=0
BEGIN
-- Chop off the left side of @PAT each time through the loop:
SELECT @PAT = SUBSTRING(@Pat,@ptrDelim + 2,LEN(@Pat)-(@ptrDelim+1))
END
ELSE
BEGIN
SELECT @PAT = ""
END
END -- While LEN(@Pat)>0

FETCH NEXT FROM Pat_Cursor INTO @PatID, @PatType, @Pat
END --FETCH LOOP

CLOSE Pat_CURSOR
DEALLOCATE Pat_CURSOR

Thanks for taking a look at this wild-bear code.


ALSO may I point out that the PatternID col is Null unless it is a childless node.


~ Shaun Merrill
Seattle, WA

jbkayne
Posting Yak Master

100 Posts

Posted - 2004-05-26 : 21:41:59
I don't know if you can entirely do away with the cursor since you need to back reference as you go.

I DO however have some ideas that could really cut down the amount of time.

First off, can you change the schema for the pattern? I think something like the following will perform much better (especially when properly indexed)


create table pattern (
PatternId int identity(1,1) not null
, PatternGroupId int not null
, PatternPart int not null
, PatternSeq int not null

)

insert into pattern values (1,1,1)
insert into pattern values (1,2,2)
insert into pattern values (1,18,3)
insert into pattern values (1,4,4)

insert into pattern values (2,1,1)
insert into pattern values (2,2,2)
insert into pattern values (2,54,3)
insert into pattern values (2,60,4)
insert into pattern values (2,4,5)

insert into pattern values (3,1,1)
insert into pattern values (3,3,2)
insert into pattern values (3,19,3)
insert into pattern values (3,4,4)

insert into pattern values (4,1,1)
insert into pattern values (4,3,2)
insert into pattern values (4,65,3)
insert into pattern values (4,4,4)


My second idea would be to insert in bulk.

//---- SNIPPET FROM YOUR EXAMPLE
INSERT INTO Nodelist with (rowlock) (ParentNode, CommandID, PatternID)
VALUES(@ParentID , @CmdID , @PatID)

---//

Instead:

try inserting in the following manner.... Pseudo code below

create table nodelist (
NodeID int identity (1,1) not null
, ParentNodeID int null
, CommandID int not null
, PatternID int null
)

declare @i int
select @i = 1

while @i < (select max(patterngroupid) from pattern)
begin

insert into nodelist (parentnodeid, commandid, patternid)
select
null as parentnodeid
, patternpart as commandid
, null as patternid
from
pattern
where
patterngroupid = @i


select @i = @i + 1
end


I haven't worked out any of the CASE statements needed to generate parentnodeid and patternid, but I can help more if needed.


Sincerely,

jbkayne

BTW, I too work in Seattle. Let's grab a beer sometime after work!

Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2004-05-27 : 05:51:43
Skipping through the obvious requirement that going set-based should be far better than your cursor, one presumes that you have proper indices built on the relevent tables to support the joins/where conditions....what's the execution plan like on the CURSOR definition...can you post it?

also note...that the (NOT) LIKE condition may be the true source of the problem in that (as far as I remember) won't support/use ANY index...and will be a real performance killer....

can you define the unwanted records any better (ie in a way that avoids the use of the LIKE statement and in particular the LIKE statement with a % in the 1st character of the comparison string)?



As an FYI....AjarnMark also lives/works in Seattle....I met up with him last year when I brought 2 soccer teams from Ireland to Seattle....slowly but surely the virtual community/world of SQLTeam.com is establishing physical links.
Go to Top of Page

SMerrill
Posting Yak Master

206 Posts

Posted - 2004-05-27 : 13:52:54
I am inspired enough to change course ... thanks!

This database is called a "Behavior Pattern Recognition System", and I am normalizing a multi-line record set, and recognizing patterns in the behavior.
It is up to me to recognize if a set of commands is a unique permutation of CommandIDs. If it is, I wish to create and assign a new PatternID. If it is not, I wish to look up the PatternID.
At any rate, thanks to your help, my new inspiration is to make it like your first idea up front.

I already thought of that shape, but it was downstream of the NodeList, and I created it by iterating backwards through the NodeList once it was complete. Perhaps I could re-work things and compute the BlockList before I compute the NodeList. (That will take several beers!)

I will also try to nix the NOT LIKE, or prevent {1,3,15,NULL,4,3} from happening at all.
AjarnMark used to work at the same place I now work. We are good friends.
Now, if I could just get a SQL job in Ireland and locate my family roots, I'd jump on it...

Cheers,

~ Shaun Merrill
Seattle, WA
Go to Top of Page

kselvia
Aged Yak Warrior

526 Posts

Posted - 2004-05-28 : 00:46:46
Very intersting problem. What are the permissible ranges of CommandID's in the pattern string? (i.e Max 255 or no limit) Just mulling over some ideas...
Go to Top of Page

SMerrill
Posting Yak Master

206 Posts

Posted - 2004-05-28 : 14:45:17
No limit. SELECT MAX(PatternID) FROM BehaviorPattern is currently 280295.

~ Shaun Merrill
Seattle, WA
Go to Top of Page

kselvia
Aged Yak Warrior

526 Posts

Posted - 2004-05-28 : 14:54:00
I was asking about CommandID, (the comma delimited integers in the PatternString), not PatternId.
Go to Top of Page

SMerrill
Posting Yak Master

206 Posts

Posted - 2004-05-28 : 14:56:38
Oh, Sorry. 243095 is the current maximum, but there is no limit there either.

~ Shaun Merrill
Seattle, WA
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-05-28 : 15:57:41
Dare I say you could get rid of cursors... I have a fuzzy attempt at a process:

Declare @BP Table (PatternId int identity(1,1) not null, pStr nvarchar(1000), parentId int)
Insert Into @BP Select pStr = '1,2,18,4', parentId = null
Insert Into @BP Select pStr = '1,2,54,60,4', parentId = null
Insert Into @BP Select pStr = '1,3,19,4', parentId = null
Insert Into @BP Select pStr = '1,3,65,4', parentId = null

Declare @NL Table (nodeId int identity(1,1) not null, pNodeId int, CommandId int, PatternId int)

While exists(Select * From @bp Where pStr like '%,%')
Begin
Insert Into @NL
Select
Distinct
pNodeId = (Select nodeId From @nl Where CommandId = parentId),
CommandId = left(pStr,charIndex(',',pStr)-1),
PatternId = null
From @bp
Where pStr like '%,%'

Update @bp
Set
pStr = right(pStr,len(pStr)-charIndex(',',pStr)),
parentId = left(pStr,charIndex(',',pStr)-1)
From @bp
Where pStr like '%,%'
End
Insert Into @NL
Select
Distinct
pNodeId = (Select nodeId From @nl Where CommandId = parentId),
CommandId = convert(int,pStr),
PatternId
From @bp

Select * From @NL

Select * From @bp


I will definitely take a few tweaks to get it how you want it, but something along these lines should work.

P.S. On Cursors: I cut approx 60 hours off of a cursor process by reinventing the process. It now runs in approx. 15 minutes.
Go to Top of Page

kselvia
Aged Yak Warrior

526 Posts

Posted - 2004-05-28 : 16:18:16
That is one of the coolest solutions I have ever seen. Amazing.

Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-05-28 : 16:31:55
why thank you :)

I stumbled across the concept while build a dynamic tree structure for client reporting. Honestly the above code is almost the inverse of what I needed as I was going from the parent/child rows to the 'parentage path' as I call it (so I could display a tree from 1 recordset).

Go to Top of Page

SMerrill
Posting Yak Master

206 Posts

Posted - 2004-05-28 : 19:03:36
WOW
This is the quintescence of pattern slicing! It simply chows down my data!!
As I understand your excellent solution, it slices each column of numbers off vertically.
My problem with it is that my nodes do not always contain unique values.
For example, if you change the data to
Insert Into @BP Select pStr = '1,2,18,19,4', parentId = null
Insert Into @BP Select pStr = '1,2,54,60,4', parentId = null
Insert Into @BP Select pStr = '1,3,19,4', parentId = null
Insert Into @BP Select pStr = '1,3,65,4', parentId = null

... then you will get the following error:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
But I will work with this one until I get it working.
THIS IS SIMPLY ELEGANT.

~ Shaun Merrill
Seattle, WA
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-05-28 : 23:48:31
I kinda of expected that you would geth the 'more than 1 value' error, but I thought I would wait and see how your data worked out.

In my original experiment writing the query I did not use the look up query for the nodeId, but instead I simply retained the parentId. Since I didn't know the actual use of either table, I didn't know if a change like that would affect you dramatically. If you are simply trying identify distinct parent child relationships, then you don't need the look up.

If you need the lookup to retain the 'path' then you may run into a problem as you do not know (in my solution) which instance of the commandId is the 'true' parent. It then becomes an issue of paths, for example, your @Pb record could keep not only the unprocessed commands but also the process commands.

process unprocessed
-------- ------------
'1,2' '18,19,4'

If you carry this concept through to the nodelist you should be able to accurately lookup the correct node. The drawback is that the rowsize will be slightly larger, and you will have some loss in speed.

If you want to add a little about your ultimate goal, I may be able to give you a more specific solution.

Good Luck with it!!

Corey Aldebol
Greenville, SC
Go to Top of Page

SMerrill
Posting Yak Master

206 Posts

Posted - 2004-06-01 : 12:14:51
That confirms my independent conclusion. Thanks for the confirmation. That will allow it to wind it's way down through the "parentage" of the tree perfectly. It already chows down the data like a hungry wolf. Instead of taking 08:30:00, it now takes 00:00:46 for processing time!
Bravo!

~ Shaun Merrill
Seattle, WA
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2004-06-02 : 07:24:07
now...that's a real good return on investment for 10 minutes typing.

RE Ireland....if you're related to the Merrill Lynch's.....then i'm sure there will be plenty of relations that will pop out of the woodwork....


If you ever come this way....and I can recommend it....do drop by.
Go to Top of Page

mohdowais
Sheikh of Yak Knowledge

1456 Posts

Posted - 2004-06-02 : 08:06:57
>>As an FYI....AjarnMark also lives/works in Seattle....I met up with him last year when I brought 2 soccer teams from Ireland to Seattle....slowly but surely the virtual community/world of SQLTeam.com is establishing physical links.

Andrew, any plans on coming down to Dubai with a soccer team or two? I know a few people in the soccer business, I'd be glad to welcome you here. Heck, my boss is a damn good player himself! Its a pity its too hot for anything for the next four-five months, but after that its season time!

OS
Go to Top of Page

AndrewMurphy
Master Smack Fu Yak Hacker

2916 Posts

Posted - 2004-06-02 : 09:21:27
Now...if I had an invite...it'd be something we could explore further.

However since my 'more enthusiastic team' is a ladies team...(both teams (mens and ladies) are talented...but the men are a bit casual about putting in the effort to travel)....it might be a stumbling block to going to Dubai....or would it?....the trip away isn't the point of the exercise for us....it's about playing competitive soccer (being abroad just makes it more exciting/exotic)


However, since we only went away last year....it's going to be another 3/4 years before we can get away again....financing amateur soccer trips (done well) is an expensive business......and my lotto numbers haven't come up yet.


If anybody fancies coming this direction...I'm sure we can explore the reverse role of 'host'.
Go to Top of Page

mohdowais
Sheikh of Yak Knowledge

1456 Posts

Posted - 2004-06-02 : 09:44:59
I see your predicament. And yes, you don't see ladies teams playing it out here often...ummm... let's just put it this way: I haven't seen one so far . But hey, any SQLTeamer wants to visit Dubai, they are really welcome and I'd be glad to show them around! And if your lotto numbers do come up, you won't forgot your old friends at SQLTeam, will you?

Shaun, sorry to hijack your thread!

OS
Go to Top of Page

SMerrill
Posting Yak Master

206 Posts

Posted - 2004-06-02 : 12:05:56
LOL That's Ok, carry on as long as you like! I wonder if I'll have to learn soccer before I can get another job... The way the American economy is going, maybe we'll have to move to India to find work! But if I won the lotto, I'd still be programming. But I think it would be in bonnie Ireland.
Go to Top of Page

SMerrill
Posting Yak Master

206 Posts

Posted - 2004-06-02 : 17:53:01
Well, I'm in SQL 7, and am having a heck of a time trying to figure how it is possible to get more than one node with the same parentage. Did I do something wrong here?
Look for the red TOP 500 statement. If I change this number to something small enough, it will pass. But I seem to be getting 100% NULLs in the pNodeID column.
If I didn't have 119179 records in BehaviorPattern, I'd post the data that I'm having trouble with.


declare @c int
set nocount on
DROP TABLE _NL

CREATE TABLE _NL (nodeId INT IDENTITY(1,1) NOT NULL
, pNodeId INT NULL
, CommandId INT NULL
, PatternId INT NULL
, Parentage VARCHAR(250) NULL
)

DROP TABLE _BP
CREATE TABLE _BP (
PatternId INT NOT NULL
, Processed VARCHAR(250) NULL
, Unprocessed VARCHAR(250) NULL
)
INSERT INTO _BP (
PatternID
, Unprocessed
, Processed
)
SELECT TOP 500
PatternID
, PatternString
, NULL
FROM BehaviorPattern WITH (NOLOCK)
WHERE PatternString NOT LIKE '%NULL%'
ORDER BY PatternID

select @C = 0
-- SELECT * FROM _BP

WHILE EXISTS (SELECT * FROM _BP WHERE Unprocessed LIKE '%,%')
BEGIN

select @C = @C + 1
print 'Loop number ' + cast(@C as varchar)

INSERT INTO _NL
SELECT
DISTINCT
pNodeId = (SELECT nodeId FROM _NL NL
WHERE isnull(BP.Processed,0) = isnull(NL.Parentage,0))
, CommandId = LEFT(BP.Unprocessed, CHARINDEX(',',BP.Unprocessed)-1)
, PatternId = NULL
, Parentage = BP.Processed
FROM _BP BP
WHERE Unprocessed LIKE '%,%'

UPDATE _BP
SET
Unprocessed = right(Unprocessed,len(Unprocessed)-charIndex(',',Unprocessed))
, Processed = isnull(Processed + ',','') + left(Unprocessed,charIndex(',',Unprocessed)-1)
From _BP BP WITH (NOLOCK)
Where Unprocessed like '%,%'

End
print 'end of loop'

Insert Into _NL
Select
Distinct
pNodeId = (Select nodeId From _NL NL Where BP.Processed = NL.Parentage)
, CommandId = Unprocessed
, PatternId
, Parentage = BP.Processed
From _BP BP WITH (NOLOCK)


~ Shaun Merrill
Seattle, WA
Go to Top of Page

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-06-02 : 18:56:40
Oh!!! we're back to topic?!?!

I will take a look and see what I can come up with...

Corey
Go to Top of Page
    Next Page

- Advertisement -