Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
SMerrill
Posting Yak Master
206 Posts |
Posted - 2004-05-26 : 19:16:58
|
Team:You’ll probably all tell me to go away, but is there a way to normalize this data set into a tree-view node list using a set-based method? I have already written a cursor to do it, but the cursor takes 9 hours per day, and this is getting ridiculous. You see, I have 280,000 BehaviorPattern records, and 1,200,000 NodeList records.Think of this data as a tree-view. Start at the top with node 1 that contains CommandID=1. (It is the only node without a parent node.)From there, it is the parent of node 2 which contains CommandID=2, and it is also the parent of node 8, which contains CommandID=3.The source table contains a string of patterns, and there is a unique constraint on BehaviorPattern.PatternString.Source table called BehaviorPattern:(PatternID, PatternString)1, “1, 2, 18, 4”2, “1, 2, 54, 60, 4”3, “1, 3, 19, 4”4, “1, 3, 65, 4” The destination table arbitrarily assigns node numbers to each node by going through each Pattern and tacking on a new NodeList record only if one doesn’t already exist where it belongs.Destination Table called NodeList: (NodeID, ParentNodeID, CommandID, PatternID) 1, Null, 1, Null 2, 1, 2, Null 3, 2, 18, Null 4, 3, 4, 1 5, 2, 54, Null 6, 5, 60, Null 7, 6, 4, 2 8, 1, 3, Null 9, 8, 19, Null10, 9, 4, 411, 8, 65, Null12, 11, 4, 3 Here is my cursor procedure:/* This procedure loads the NodeList table * by reading from the BehaviorPattern table. * * For each Behavior Pattern, the PatternString is dissected * by delimiters between each CommandID within the pattern. * * Each CommandID is looked up in the NodeList table, and if * it doesn't exist below the appropriate parent node, it is created. */declare @PatID integerdeclare @PatType Integerdeclare @CmdID INTEGERDECLARE @ParentID INTEGERDECLARE @TempID INTEGERdeclare @ptrDelim INTEGERDECLARE @First INTEGERDECLARE @Last INTEGERDECLARE @CNT INTEGERDeclare @Pat varchar(250)set nocount onDECLARE Pat_Cursor CURSOR FOR SELECT BehaviorPattern.PatternID, BehaviorPattern.PatternType, BehaviorPattern.PatternStringFROM BehaviorPattern WITH (nolock) LEFT OUTER JOIN NodeList WITH (nolock) ON BehaviorPattern.PatternID = NodeList.PatternIDWHERE (NodeList.PatternID IS NULL) AND PatternString NOT LIKE '%NULL%'OPEN Pat_CursorFETCH NEXT FROM Pat_Cursor INTO @PatID, @PatType, @PatWHILE @@FETCH_STATUS = 0BEGIN SELECT @First = 1 SELECT @Last = 0 -- Dissect @Pat by delimiter ', ' and loop through each numeric value. --PRINT 'Pattern=' + cast(@PatID as varchar(20)) + '; String="' + @Pat + '"' SELECT @ParentID = NULL WHILE LEN(@Pat)>0 BEGIN SELECT @PtrDelim = CHARINDEX(', ',@Pat,1) --point to the first delimiter IF @PtrDelim = 0 BEGIN -- Delimiter not found - must be the last one in the string SELECT @Last=1 SELECT @CmdID = CAST(ISNULL(@Pat,'0') as INTEGER) END ELSE BEGIN -- Delimiter found - parse out the CmdID SELECT @First=0 SELECT @CmdID = CAST(ISNULL(SUBSTRING(ISNULL(@Pat,""),1,@ptrDelim - 1),'0') as INTEGER) END /* For each number within a pattern string, create a node if it doesn't exist within NodeList below the appropriate parent node. */ IF @ParentID is null BEGIN --@TempID stores the NODE of the CommandID so it can be used as a ParentID next time thru --@CNT stores a zero if there is not an existing node with the appropriate parent SELECT @TempID=NODE FROM NodeList with (nolock) WHERE (ParentNode IS NULL) AND (CommandID = @CmdID) SELECT @CNT=@@ROWCOUNT -- 0 if we didn't find the node, 1 if we did END ELSE BEGIN --@TempID stores the NODE of the CommandID so it can be used as a ParentID next time thru --@CNT stores a zero if there is not an existing node with the appropriate parent SELECT @TempID=NODE FROM NodeList with (nolock) WHERE (ParentNode = @ParentID) AND (CommandID = @CmdID) SELECT @CNT=@@ROWCOUNT -- 0 if we didn't find the node, 1 if we did END IF @CNT = 0 BEGIN --Create the node if it does not exist below the appropriate parent INSERT INTO Nodelist with (rowlock) (ParentNode, CommandID, PatternID) VALUES(@ParentID , @CmdID , @PatID) IF @@ERROR<>0 BEGIN PRINT 'Could not insert Node: ' + 'ParentNode=' + cast(isnull(@ParentID,'NULL') AS varchar(20)) + ', CommandID=' + cast(isnull(@CmdID ,'NULL') AS varchar(20)) + ', PatternID=' + cast(isnull(@PatID ,'NULL') AS varchar(20)) END SELECT @TempID=@@IDENTITY END --IF @CNT=0 SELECT @ParentID = @TempID IF @Last=0 BEGIN -- Chop off the left side of @PAT each time through the loop: SELECT @PAT = SUBSTRING(@Pat,@ptrDelim + 2,LEN(@Pat)-(@ptrDelim+1)) END ELSE BEGIN SELECT @PAT = "" END END -- While LEN(@Pat)>0 FETCH NEXT FROM Pat_Cursor INTO @PatID, @PatType, @PatEND --FETCH LOOPCLOSE Pat_CURSORDEALLOCATE Pat_CURSORThanks for taking a look at this wild-bear code. ALSO may I point out that the PatternID col is Null unless it is a childless node. ~ Shaun MerrillSeattle, WA |
|
|
jbkayne
Posting Yak Master
100 Posts |
Posted - 2004-05-26 : 21:41:59
|
I don't know if you can entirely do away with the cursor since you need to back reference as you go.I DO however have some ideas that could really cut down the amount of time.First off, can you change the schema for the pattern? I think something like the following will perform much better (especially when properly indexed)create table pattern ( PatternId int identity(1,1) not null, PatternGroupId int not null, PatternPart int not null, PatternSeq int not null)insert into pattern values (1,1,1)insert into pattern values (1,2,2)insert into pattern values (1,18,3)insert into pattern values (1,4,4)insert into pattern values (2,1,1)insert into pattern values (2,2,2)insert into pattern values (2,54,3)insert into pattern values (2,60,4)insert into pattern values (2,4,5)insert into pattern values (3,1,1)insert into pattern values (3,3,2)insert into pattern values (3,19,3)insert into pattern values (3,4,4)insert into pattern values (4,1,1)insert into pattern values (4,3,2)insert into pattern values (4,65,3)insert into pattern values (4,4,4)My second idea would be to insert in bulk.//---- SNIPPET FROM YOUR EXAMPLE INSERT INTO Nodelist with (rowlock) (ParentNode, CommandID, PatternID) VALUES(@ParentID , @CmdID , @PatID)---//Instead:try inserting in the following manner.... Pseudo code belowcreate table nodelist ( NodeID int identity (1,1) not null , ParentNodeID int null , CommandID int not null , PatternID int null)declare @i intselect @i = 1while @i < (select max(patterngroupid) from pattern) begin insert into nodelist (parentnodeid, commandid, patternid) select null as parentnodeid , patternpart as commandid , null as patternid from pattern where patterngroupid = @i select @i = @i + 1 endI haven't worked out any of the CASE statements needed to generate parentnodeid and patternid, but I can help more if needed.Sincerely,jbkayneBTW, I too work in Seattle. Let's grab a beer sometime after work! |
 |
|
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2004-05-27 : 05:51:43
|
| Skipping through the obvious requirement that going set-based should be far better than your cursor, one presumes that you have proper indices built on the relevent tables to support the joins/where conditions....what's the execution plan like on the CURSOR definition...can you post it?also note...that the (NOT) LIKE condition may be the true source of the problem in that (as far as I remember) won't support/use ANY index...and will be a real performance killer....can you define the unwanted records any better (ie in a way that avoids the use of the LIKE statement and in particular the LIKE statement with a % in the 1st character of the comparison string)?As an FYI....AjarnMark also lives/works in Seattle....I met up with him last year when I brought 2 soccer teams from Ireland to Seattle....slowly but surely the virtual community/world of SQLTeam.com is establishing physical links. |
 |
|
|
SMerrill
Posting Yak Master
206 Posts |
Posted - 2004-05-27 : 13:52:54
|
I am inspired enough to change course ... thanks!This database is called a "Behavior Pattern Recognition System", and I am normalizing a multi-line record set, and recognizing patterns in the behavior.It is up to me to recognize if a set of commands is a unique permutation of CommandIDs. If it is, I wish to create and assign a new PatternID. If it is not, I wish to look up the PatternID.At any rate, thanks to your help, my new inspiration is to make it like your first idea up front.I already thought of that shape, but it was downstream of the NodeList, and I created it by iterating backwards through the NodeList once it was complete. Perhaps I could re-work things and compute the BlockList before I compute the NodeList. (That will take several beers!)I will also try to nix the NOT LIKE, or prevent {1,3,15,NULL,4,3} from happening at all.AjarnMark used to work at the same place I now work. We are good friends. Now, if I could just get a SQL job in Ireland and locate my family roots, I'd jump on it... Cheers,~ Shaun MerrillSeattle, WA |
 |
|
|
kselvia
Aged Yak Warrior
526 Posts |
Posted - 2004-05-28 : 00:46:46
|
| Very intersting problem. What are the permissible ranges of CommandID's in the pattern string? (i.e Max 255 or no limit) Just mulling over some ideas... |
 |
|
|
SMerrill
Posting Yak Master
206 Posts |
Posted - 2004-05-28 : 14:45:17
|
| No limit. SELECT MAX(PatternID) FROM BehaviorPattern is currently 280295.~ Shaun MerrillSeattle, WA |
 |
|
|
kselvia
Aged Yak Warrior
526 Posts |
Posted - 2004-05-28 : 14:54:00
|
| I was asking about CommandID, (the comma delimited integers in the PatternString), not PatternId. |
 |
|
|
SMerrill
Posting Yak Master
206 Posts |
Posted - 2004-05-28 : 14:56:38
|
| Oh, Sorry. 243095 is the current maximum, but there is no limit there either.~ Shaun MerrillSeattle, WA |
 |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-05-28 : 15:57:41
|
| Dare I say you could get rid of cursors... I have a fuzzy attempt at a process:Declare @BP Table (PatternId int identity(1,1) not null, pStr nvarchar(1000), parentId int)Insert Into @BP Select pStr = '1,2,18,4', parentId = nullInsert Into @BP Select pStr = '1,2,54,60,4', parentId = nullInsert Into @BP Select pStr = '1,3,19,4', parentId = nullInsert Into @BP Select pStr = '1,3,65,4', parentId = nullDeclare @NL Table (nodeId int identity(1,1) not null, pNodeId int, CommandId int, PatternId int)While exists(Select * From @bp Where pStr like '%,%')Begin Insert Into @NL Select Distinct pNodeId = (Select nodeId From @nl Where CommandId = parentId), CommandId = left(pStr,charIndex(',',pStr)-1), PatternId = null From @bp Where pStr like '%,%' Update @bp Set pStr = right(pStr,len(pStr)-charIndex(',',pStr)), parentId = left(pStr,charIndex(',',pStr)-1) From @bp Where pStr like '%,%'End Insert Into @NL Select Distinct pNodeId = (Select nodeId From @nl Where CommandId = parentId), CommandId = convert(int,pStr), PatternId From @bpSelect * From @NLSelect * From @bpI will definitely take a few tweaks to get it how you want it, but something along these lines should work.P.S. On Cursors: I cut approx 60 hours off of a cursor process by reinventing the process. It now runs in approx. 15 minutes. |
 |
|
|
kselvia
Aged Yak Warrior
526 Posts |
Posted - 2004-05-28 : 16:18:16
|
| That is one of the coolest solutions I have ever seen. Amazing. |
 |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-05-28 : 16:31:55
|
| why thank you :)I stumbled across the concept while build a dynamic tree structure for client reporting. Honestly the above code is almost the inverse of what I needed as I was going from the parent/child rows to the 'parentage path' as I call it (so I could display a tree from 1 recordset). |
 |
|
|
SMerrill
Posting Yak Master
206 Posts |
Posted - 2004-05-28 : 19:03:36
|
WOW This is the quintescence of pattern slicing! It simply chows down my data!!As I understand your excellent solution, it slices each column of numbers off vertically.My problem with it is that my nodes do not always contain unique values.For example, if you change the data to Insert Into @BP Select pStr = '1,2,18,19,4', parentId = nullInsert Into @BP Select pStr = '1,2,54,60,4', parentId = nullInsert Into @BP Select pStr = '1,3,19,4', parentId = nullInsert Into @BP Select pStr = '1,3,65,4', parentId = null... then you will get the following error:Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.But I will work with this one until I get it working.THIS IS SIMPLY ELEGANT.~ Shaun MerrillSeattle, WA |
 |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-05-28 : 23:48:31
|
| I kinda of expected that you would geth the 'more than 1 value' error, but I thought I would wait and see how your data worked out.In my original experiment writing the query I did not use the look up query for the nodeId, but instead I simply retained the parentId. Since I didn't know the actual use of either table, I didn't know if a change like that would affect you dramatically. If you are simply trying identify distinct parent child relationships, then you don't need the look up.If you need the lookup to retain the 'path' then you may run into a problem as you do not know (in my solution) which instance of the commandId is the 'true' parent. It then becomes an issue of paths, for example, your @Pb record could keep not only the unprocessed commands but also the process commands.process unprocessed-------- ------------'1,2' '18,19,4'If you carry this concept through to the nodelist you should be able to accurately lookup the correct node. The drawback is that the rowsize will be slightly larger, and you will have some loss in speed.If you want to add a little about your ultimate goal, I may be able to give you a more specific solution.Good Luck with it!!Corey AldebolGreenville, SC |
 |
|
|
SMerrill
Posting Yak Master
206 Posts |
Posted - 2004-06-01 : 12:14:51
|
| That confirms my independent conclusion. Thanks for the confirmation. That will allow it to wind it's way down through the "parentage" of the tree perfectly. It already chows down the data like a hungry wolf. Instead of taking 08:30:00, it now takes 00:00:46 for processing time!Bravo!~ Shaun MerrillSeattle, WA |
 |
|
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2004-06-02 : 07:24:07
|
| now...that's a real good return on investment for 10 minutes typing.RE Ireland....if you're related to the Merrill Lynch's.....then i'm sure there will be plenty of relations that will pop out of the woodwork....If you ever come this way....and I can recommend it....do drop by. |
 |
|
|
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2004-06-02 : 08:06:57
|
| >>As an FYI....AjarnMark also lives/works in Seattle....I met up with him last year when I brought 2 soccer teams from Ireland to Seattle....slowly but surely the virtual community/world of SQLTeam.com is establishing physical links.Andrew, any plans on coming down to Dubai with a soccer team or two? I know a few people in the soccer business, I'd be glad to welcome you here. Heck, my boss is a damn good player himself! Its a pity its too hot for anything for the next four-five months, but after that its season time!OS |
 |
|
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2004-06-02 : 09:21:27
|
Now...if I had an invite...it'd be something we could explore further. However since my 'more enthusiastic team' is a ladies team...(both teams (mens and ladies) are talented...but the men are a bit casual about putting in the effort to travel)....it might be a stumbling block to going to Dubai....or would it?....the trip away isn't the point of the exercise for us....it's about playing competitive soccer (being abroad just makes it more exciting/exotic)However, since we only went away last year....it's going to be another 3/4 years before we can get away again....financing amateur soccer trips (done well) is an expensive business......and my lotto numbers haven't come up yet. If anybody fancies coming this direction...I'm sure we can explore the reverse role of 'host'. |
 |
|
|
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2004-06-02 : 09:44:59
|
I see your predicament. And yes, you don't see ladies teams playing it out here often...ummm... let's just put it this way: I haven't seen one so far . But hey, any SQLTeamer wants to visit Dubai, they are really welcome and I'd be glad to show them around! And if your lotto numbers do come up, you won't forgot your old friends at SQLTeam, will you? Shaun, sorry to hijack your thread!OS |
 |
|
|
SMerrill
Posting Yak Master
206 Posts |
Posted - 2004-06-02 : 12:05:56
|
LOL That's Ok, carry on as long as you like! I wonder if I'll have to learn soccer before I can get another job... The way the American economy is going, maybe we'll have to move to India to find work! But if I won the lotto, I'd still be programming. But I think it would be in bonnie Ireland. |
 |
|
|
SMerrill
Posting Yak Master
206 Posts |
Posted - 2004-06-02 : 17:53:01
|
Well, I'm in SQL 7, and am having a heck of a time trying to figure how it is possible to get more than one node with the same parentage. Did I do something wrong here?Look for the red TOP 500 statement. If I change this number to something small enough, it will pass. But I seem to be getting 100% NULLs in the pNodeID column.If I didn't have 119179 records in BehaviorPattern, I'd post the data that I'm having trouble with.declare @c intset nocount on DROP TABLE _NLCREATE TABLE _NL (nodeId INT IDENTITY(1,1) NOT NULL , pNodeId INT NULL , CommandId INT NULL , PatternId INT NULL , Parentage VARCHAR(250) NULL )DROP TABLE _BPCREATE TABLE _BP ( PatternId INT NOT NULL , Processed VARCHAR(250) NULL , Unprocessed VARCHAR(250) NULL )INSERT INTO _BP ( PatternID , Unprocessed , Processed )SELECT TOP 500 PatternID , PatternString , NULL FROM BehaviorPattern WITH (NOLOCK)WHERE PatternString NOT LIKE '%NULL%'ORDER BY PatternIDselect @C = 0-- SELECT * FROM _BPWHILE EXISTS (SELECT * FROM _BP WHERE Unprocessed LIKE '%,%')BEGIN select @C = @C + 1 print 'Loop number ' + cast(@C as varchar) INSERT INTO _NL SELECT DISTINCT pNodeId = (SELECT nodeId FROM _NL NL WHERE isnull(BP.Processed,0) = isnull(NL.Parentage,0)) , CommandId = LEFT(BP.Unprocessed, CHARINDEX(',',BP.Unprocessed)-1) , PatternId = NULL , Parentage = BP.Processed FROM _BP BP WHERE Unprocessed LIKE '%,%' UPDATE _BP SET Unprocessed = right(Unprocessed,len(Unprocessed)-charIndex(',',Unprocessed)) , Processed = isnull(Processed + ',','') + left(Unprocessed,charIndex(',',Unprocessed)-1) From _BP BP WITH (NOLOCK) Where Unprocessed like '%,%'Endprint 'end of loop'Insert Into _NL Select Distinct pNodeId = (Select nodeId From _NL NL Where BP.Processed = NL.Parentage) , CommandId = Unprocessed , PatternId , Parentage = BP.Processed From _BP BP WITH (NOLOCK)~ Shaun MerrillSeattle, WA |
 |
|
|
Seventhnight
Master Smack Fu Yak Hacker
2878 Posts |
Posted - 2004-06-02 : 18:56:40
|
Oh!!! we're back to topic?!?! I will take a look and see what I can come up with...Corey |
 |
|
|
Next Page
|
|
|
|
|