More Trees & Hierarchies in SQL

By Rob Volk on 1 May 2002 | Tags: Table Design


Hierarchies are sometimes difficult to store in SQL tables...things like trees, threaded forums, org charts and the like...and it's usually even harder to retrieve the hierarchy once you do store it. Here's a method that's easy to understand and maintain, and gives you the full hierarchy (or any piece of it) very quickly and easily.

While XML handles hierarchical data quite well, relational SQL doesn't. There are several different ways to model a hierarchical structure. The most common and familiar is known as the adjacency model, and it usually works like this:

The table would look like this:

EmployeeIDNameBossID
1001Denis Eaton-HoggNULL
1002Bobbi Flekman1001
1003Ian Faith1002
1004David St. Hubbins1003
1005Nigel Tufnel1003
1006Derek Smalls1003
And the org chart/indented list looks like this:
Denis Eaton-Hogg
  Bobbi Flekman
    Ian Faith
      David St. Hubbins
      Nigel Tufnel
      Derek Smalls

It's called the "adjacency" model because the parent (boss) data is stored in the same row as the child (employee) data, in an adjacent column. It's a pretty straightforward design that's easily understood by everyone...no deep relational theory needed. You can find a person's boss easily, and you can find their coworkers by querying the BossID column.

The trouble begins when you want to list several levels of a hierarchy. To find the boss's boss, you would need to join the Employees table to itself, like this:

SELECT BigBoss.Name BigBoss, Boss.Name Boss, Employees.Name Employee
FROM Employees 
INNER JOIN Employees AS Boss ON Employees.BossID=Boss.EmployeeID
INNER JOIN Employees BigBoss ON Boss.BossID=BigBoss.EmployeeID
And you'd get the following:

BigBossBossEmployee
Denis Eaton-HoggBobbi FlekmanIan Faith
Bobbi FlekmanIan FaithDavid St. Hubbins
Bobbi FlekmanIan FaithNigel Tufnel
Bobbi FlekmanIan FaithDerek Smalls

For each level, you'd need to join the table to itself...not an attractive option if you have 5 or more levels! It would be great if it could join itself as many times as needed. This is called a recursive join, and though some database products support it (Oracle has the CONNECT BY syntax) SQL Server is not one of them.

If you look in Books Online under "expanding hierarchies" you'll find a stored procedure that runs through an adjacency table to expand the hierarchy. While it works, it's a procedural method that requires a stack (using a temp table) and can take a while to run with large hierarchies. It also PRINTs out the indented list, so you'd need to modify it to use ANOTHER temp table if you wanted the results as a table/query.

If you've followed Joe Celko's columns or bought his books, he recommends the nested set model for representing trees in SQL (he's posted it on SQL Team a few times). It's very well detailed in the following articles, Part I, II, III, IV, and also in his book, SQL For Smarties, and I recommend checking it out. It's very efficient and makes it extremely easy to pull out trees/subtrees from the table.

However (you knew this was coming!) one of the issues I have with nested sets is the complexity required to do relatively simple tasks, like adding, deleting, or moving nodes in the tree. Even finding an employee's immediate supervisor or subordinates requires 3 self-joins AND a subquery! Joe admits this shortcoming in his book...and it's interesting that the solution ONLY appears in his book, I've never seen him post it online.

Although there's a very seductive logic to nested sets, and it's easy to do complicated tree operations with them, I find them less intuitive than the adjacency model. It's harder for me to visualize a hierarchy or org chart with them. You may be able to use them more easily than I can, but if you also find them daunting, read on.

So how to represent a hierarchy, using adjacency, and avoiding recursion wherever possible? It's pretty easy really...you build it and store it in the table! (I've posted this method in this thread a while back, and I'm elaborating on it here)

Here's the table definition for the Tree:

CREATE TABLE Tree (
Node int NOT NULL IDENTITY(100, 1),
ParentNode int, 
EmployeeID int NOT NULL,  
Depth tinyint,
Lineage varchar(255) )

I'm keeping the Tree table separate for a few good reasons I'll discuss later, but you could simply add the Depth and Lineage columns to the Employees table above, and substitute BossID for ParentNode. (I also didn't really WANT to use an identity column, but most people will anyway) The terms "node" and "lineage" might seem unfamiliar, but I wanted to generalize them a little more than "child", "parent" and "hierarchy".

Based on the Employees table, here's how the Tree will be filled:

NodeParentNodeEmployeeIDDepthLineage
100NULL1001NULLNULL
101NULL1002NULLNULL
102NULL1003NULLNULL
103NULL1004NULLNULL
104NULL1005NULLNULL
105NULL1006NULLNULL

The first thing to do is to populate the parent nodes, which is unecessary if you use a single table, but it's easy to do in any case:

UPDATE T SET T.ParentNode=P.Node
FROM Tree T 
INNER JOIN Employees E ON T.EmployeeID=E.EmployeeID
INNER JOIN Employees B ON E.BossID=B.EmployeeID
INNER JOIN Tree P ON B.EmployeeID=P.EmployeeID

And you'll get this:

NodeParentNodeEmployeeIDDepthLineage
100NULL1001NULLNULL
1011001002NULLNULL
1021011003NULLNULL
1031021004NULLNULL
1041021005NULLNULL
1051021006NULLNULL

This will only need to be done once, and afterwards you won't need to maintain the BossID column in the Employees table. The next part is to find the root node of the tree, also known as the top-level, or big boss man, etc. in an org chart. That's the node that has no parent (Null), so we will start there and set the Lineage column as the root:

UPDATE Tree SET Lineage='/', Depth=0 WHERE ParentNode Is Null

Once that's done, we can then update the rows who are immediate children of the root node:

UPDATE T SET T.depth = P.Depth + 1, 
T.Lineage = P.Lineage + Ltrim(Str(T.ParentNode,6,0)) + '/' 
FROM Tree AS T 
INNER JOIN Tree AS P ON (T.ParentNode=P.Node) 
WHERE P.Depth>=0 
AND P.Lineage Is Not Null 
AND T.Depth Is Null
In fact, we can just put a loop on this to run through all of the children/grandchildren etc. of the tree:
WHILE EXISTS (SELECT * FROM Tree WHERE Depth Is Null) 
UPDATE T SET T.depth = P.Depth + 1, 
T.Lineage = P.Lineage + Ltrim(Str(T.ParentNode,6,0)) + '/' 
FROM Tree AS T 
INNER JOIN Tree AS P ON (T.ParentNode=P.Node) 
WHERE P.Depth>=0 
AND P.Lineage Is Not Null 
AND T.Depth Is Null

Don't worry about the loop, it runs once for each level in the hierarchy...10 loops for 10 levels or generations. For a corporation, 10 layers of management is pretty deep; for a family tree, you could trace an American family back to the Revolutionary War! And under normal circumstances, you'd also only have to run this procedure once. The final result is:

NodeParentNodeEmployeeIDDepthLineage
100NULL10010/
10110010021/100/
10210110032/100/101/
10310210043/100/101/102/
10410210053/100/101/102/
10510210063/100/101/102/

You'll notice that for each node, the entire lineage back to the root is stored. This means that finding someone's boss, or their boss' boss, doesn't require any self-joins or recursion to create an indented list. In fact, it can be accomplished with a single SELECT!

SELECT Space(T.Depth*2) + E.Name AS Name
FROM Employees E 
INNER JOIN Tree T ON E.EmployeeID=T.EmployeeID
ORDER BY T.Lineage + Ltrim(Str(T.Node,6,0))
If you kept everything in one table you would not even need the JOIN! The Depth column comes in handy for performing the indent by using the Space() function. Using ORDER BY Lineage...etc. will sort the org chart properly, with each subordinate nesting underneath their parent. Sort order is maintained by Node values, and can be changed simply by updating the node value. Inserting or deleting a new node does not affect the rest of the tree, unlike the nested set model. The lineage column can be maintained automatically using triggers, so moving or promoting a node is a no-brainer.


Related Articles

Implementing Table Interfaces (19 May 2008)

Implementing Table Inheritance in SQL Server (20 February 2008)

Custom Auto-Generated Sequences with SQL Server (24 April 2007)

The Case for the Surrogate Key (9 August 2002)

Using TABLE Variables (7 June 2002)

Default Constraint Names (24 January 2001)

Temporary Tables (17 January 2001)

Denormalize for Performance (10 January 2001)

Other Recent Forum Posts

Select a single row based on conditions in multiple rows (4h)

I want Help Managing Big Data Sets in T SQL Efficiently (15h)

SQL stored procedure to load the error and correct record based on some business rules (1d)

Query is running too long (1d)

Sql Query to check status change of an item (1d)

Can I create differential backups tied to a specifc Full backup instead of the most recent? (7d)

My informix Sql query retruns Null always (8d)

Vehicle availability query (9d)

- Advertisement -