SQL for Threaded Discussion Forums

By Bill Graziano on 4 February 2001 | Tags: Application Design


One of questions we get on a regular basis involves threaded discussion forums. Everyone wants to know an easy way to do this. Everyone is also trying to do recursive SQL or self joins to make this work. That's hard. Here's an easy way to do this. (UPDATE: Fixed a problem and thought I'd repost for everyone to see the changes)

UPDATE (2/4/01): Carlos discovered a little error with the stored procedure that would allow posts to get out of order. I added the two lines in red to the stored procedure below.

When you first look at this problem your first thought IS to use some type of self join or a recursive stored procedure into a temp table. Those are hard to write and hard to test. As you should know by now, I'm lazy and I wanted an easier way. One of my fundamental principles of database design is this: If you can't get data in or out easily, you're not storing it right. Let start with a table that looks like this:

CREATE TABLE [dbo].[Posts] (
	[PostID] [int] IDENTITY (1, 1) NOT NULL ,
	[ThreadID] [int] NOT NULL ,
	[ParentID] [int] NOT NULL ,
	[PostLevel] [smallint] NOT NULL ,
	[SortOrder] [int] NOT NULL ,
	[Subject] [char] (20) NOT NULL ,
	[PostDate] [datetime] NOT NULL 
) ON [PRIMARY]
GO

PostID is my single field Primary Key for this table. I'm a huge fan of a single field primary key. It makes joins and subqueries so much easier. ThreadID is the field I'm going to use to keep all the posts in a thread together. The ThreadID is going to be the PostID of the first post in a thread (which is a post with a PostLevel = 1 ). ParentID is the PostID of the parent of this post. For the first post in a thread, ParentID will be set to the PostID.

The first post in a thread will have the PostLevel equal 1. It's child posts are PostLevel = 2 and so on. Subject is the user entered subject and PostDate is the date and time the post was put into the database. I don't have a field for the actual post in this example. I wanted to keep this simple. You can easily add a varchar or text field to hold the users post. SortOrder is the key to this whole crazy scheme. That's the field I'm going to use to order the posts. My trick is to always keep the data sorted just like I want it in the table. That means my queries need to do as little work as possible.

Let's take a look at a SELECT statment to display a threaded discussion:

SELECT Subject = convert(varchar, SPACE(2 * (PostLevel - 1) ) + Subject),
PostID,
ThreadID,
ParentID,
PostLevel,
SortOrder,
PostDate = convert(varchar(19), PostDate, 120)
FROM Posts 
ORDER BY ThreadID, SortOrder

and the output it generates:

Subject                        PostID      ThreadID    ParentID    PostLevel SortOrder   PostDate            
------------------------------ ----------- ----------- ----------- --------- ----------- ------------------- 
First Post                     1           1           1           1         1           2000-11-05 13:23:46
  First Reply                  10          1           1           2         2           2000-11-14 18:42:14
    First Sub Reply            11          1           10          3         3           2000-11-14 18:42:33
    Reply to #10               13          1           10          3         4           2000-11-14 18:47:18
      Reply to #13             14          1           13          4         5           2000-11-14 18:48:00
    Reply to #10               15          1           10          3         6           2000-11-14 18:48:12
  Second Reply                 12          1           1           2         7           2000-11-14 18:43:17
    Reply to #12               17          1           12          3         8           2000-11-14 18:49:02
  Reply to #1                  16          1           1           2         9           2000-11-14 18:48:53
Top Level                      18          18          18          1         1           2000-11-14 18:55:28
  Next                         20          18          18          2         2           2000-11-14 18:55:56
    nextdown                   22          18          20          3         3           2000-11-14 18:56:15
      farther                  23          18          22          4         4           2000-11-14 18:56:31
        farther                24          18          23          5         5           2000-11-14 18:56:35
          farther              25          18          24          6         6           2000-11-14 18:56:43
            farther            26          18          25          7         7           2000-11-14 18:56:46
              farther          27          18          26          8         8           2000-11-14 18:56:49
                farther        28          18          27          9         9           2000-11-14 18:56:53
  three                        21          18          18          2         10          2000-11-14 18:56:03
  HERE                         29          18          18          2         11          2000-11-14 18:57:31
Top Level                      19          19          19          1         1           2000-11-14 18:55:47

As you can see my SELECT statement is pretty darn simple. Add an index on ThreadID and SortOrder and you'll get great performance. This is very important in this type of application. Most of the transactions against this table are going to be queries so this needs to be optimized to maximize query response time. Looking at the simplicity of this query I think you'll agree we've done that.

In this example I've used the SPACE function to generate my indents. In your application you'll probably do that in ASP, VB or whatever you're writing in.

The key to this approach is getting the records in the database in the right order. All the work is done in the stored procedure that puts the records into the table. Which looks like this:

CREATE PROCEDURE spPost (@ReplyToID int, @Subject char(20) ) AS

DECLARE @MaxSortOrder int, @ParentLevel int, @ThreadID int
DECLARE @ParentSortOrder int, @NextSortOrder int, @NewPostID int

BEGIN TRAN

IF @ReplyToID = 0 -- New Post
  BEGIN
    INSERT Posts ( ThreadID, ParentID, PostLevel, SortOrder, Subject, PostDate )
    VALUES (0, 0, 1, 1, @Subject, getdate())

    SELECT @NewPostID = @@IDENTITY

    UPDATE Posts
    SET ThreadID = @NewPostID,
        ParentID = @NewPostID
    WHERE PostID = @NewPostID
  END

ELSE -- @ReplyToID <> 0 means reply to an existing post

  BEGIN
    -- Get Post Information for what we are replying to
    SELECT @ParentLevel = PostLevel,
           @ThreadID = ThreadID,
           @ParentSortOrder = SortOrder
    FROM Posts
    WHERE PostID = @ReplyToID

    -- Is there another post at the same level or higher
    IF EXISTS (SELECT * 
               FROM Posts 
               WHERE PostLevel <= @ParentLevel 
               AND SortOrder > @ParentSortOrder
               AND ThreadID = @ThreadID )
      BEGIN
        -- Find the next post at the same level or higher
        SELECT @NextSortOrder = Min(SortOrder)
        FROM Posts
        WHERE PostLevel <= @ParentLevel 
        AND SortOrder > @ParentSortOrder
        AND ThreadID = @ThreadID

        -- Move the existing posts down
	UPDATE Posts
        SET SortOrder = SortOrder + 1
        WHERE ThreadID = @ThreadID
        AND SortOrder >= @NextSortOrder

        --  And put this one into place
        INSERT Posts (ThreadID, ParentID, PostLevel, SortOrder, Subject, PostDate )
        VALUES (@ThreadID, @ReplyToID, @ParentLevel + 1, @NextSortOrder, @Subject, getdate() )

      END

   ELSE -- There are no posts at this level or above

     BEGIN
    	-- Find the highest sort order for this parent
    	SELECT @MaxSortOrder = MAX(SortOrder)
    	FROM Posts
    	WHERE ThreadID = @ThreadID

    	INSERT Posts (ThreadID, ParentID, PostLevel, SortOrder, Subject, PostDate )
    	VALUES (@ThreadID, @ReplyToID, @ParentLevel + 1, @MaxSortOrder + 1, @Subject, getdate() )
     END 
  END

COMMIT TRAN
GO

So let's break down this procedure. The first main section handles a new post at the top level. That's pretty easy. I insert the record and update the record using the identity value generated. I'm sure there are faster ways to do that but I'm all about simple.

The next case is replying to a post. The two options are a reply in the middle of a thread or appending to the end. Appending to the end is pretty easy. It's just like adding a new record. Putting a record in the middle is difficult. That is where the SortOrder field comes into play. We always keep this table sorted just like we want it to display. I have to "move down" the existing posts to insert a new one.

I've also thought about writing a stored procedure to generate this type of table given a parent child relationship in a table. All you really need to do is run through the original table and call this stored procedure for each record. This will put them in the proper order. Hope this helps. Let me know how this works for you or if there's anything you'd like me to change. I'd suggest you test this strongly. I haven't had as much time to test it as I'd like.

UPDATE (2/4/01): And it looks like that last sentence proved prophetic.

-graz


Related Articles

Application Locks (or Mutexes) in SQL Server 2005 (7 January 2008)

What I Wish Developers Knew About SQL Server (Presentation) (11 October 2007)

Multiple Active Result Sets (MARS) – Transactions and Debugging (17 June 2007)

Multiple Active Result Sets (MARS) (3 April 2007)

How SQL Server 2005 Enables Service-Oriented Database Architectures (8 September 2006)

Presentation: What I Wish Developers Knew About SQL Server (17 November 2005)

GeoCoding with SQL Server and Google (8 August 2005)

How to Asynchronously Execute a DTS package from ASP or ASP.NET (27 March 2005)

Other Recent Forum Posts

Sql Query to check status change of an item (12h)

Can I create differential backups tied to a specifc Full backup instead of the most recent? (6d)

My informix Sql query retruns Null always (6d)

Vehicle availability query (8d)

SSDT - Unable to reference 'master' or 'msdb' with new sdk style project (8d)

Ola Hallengren backup jobs (8d)

Compare alpha results to INT after get values from a string (11d)

Query performance Call Center data (13d)

- Advertisement -