Using NEWID to Randomly Sort Records

By Bill Graziano on 16 April 2002 | Tags: Randomness


We've got a quite a few articles on randomly selecting and ordering rows. I'm still amazed how popular that topic is. I recently stumbled across a post in the forums that led me to a better way to accomplish this. This has actually been posted a few times before. I'm just finally catching up.

There was a post in the forums that led me to Randomly Sorting Query Results in the Tips and Tricks section of Microsoft's SQL Server web site. They said that
SELECT *
FROM Northwind..Orders 
ORDER BY NEWID()

would randomly sort the records in the table. This would be much better than our current solution which uses a temp table AND a cursor. NEWID() was available in SQL Server 7 but I tested this in SQL Server 2000. I'm not sure if this will work in SQL Server 7 or not. If anyone wants to test it and post the results in the article comments I'd appreciate it.

Testing

My first goal was to test this approach and see how random it really was. I wrote a simple T-SQL script to build a temp table and select the first record 100,000 times. You can download the script if you'd like. It loops around this piece of code:

select @found = ID
from @results
order by NEWID()

My script also summarzes the results and this is what it returned:

ID          HitCount    
----------- ----------- 
1           9905
2           9970
3           10042
4           10028
5           9988
6           9944
7           10011
8           10084
9           9991
10          10037

Std Dev                                               
------------------------
52.05

This says it selected the first record 9905 times, the second record 9970 times, etc. Unfortunately I'm not a great statistician. I'm not even a poor statistician. I've run this code again and again and again and the results seem pretty random to me. The result you see above is pretty typical. If someone out there can give me a statistical model for randomness I'll certainly test this against it.

That only tested the whether the first item in the list was random. I also needed to know if it sorted them in random order. So I wrote another script to track the position of each row. I limited by table to five rows this time. The entire result set is too large to print out but the results for the 4th row are pretty typical:

ID          Position    HitCount    
----------- ----------- ----------- 
4           1           413
4           2           424
4           3           395
4           4           371
4           5           397

Again if you want more detail you can download the script and run them for yourselves. They only use temp tables and table variables so they won't create any objects in your database. This test also seemed like it generated random results.

Rewriting the Articles

Rewriting the query in our original article, Returning Rows in Random Order, gives us this:
SELECT ID
FROM FOO
ORDER BY NEWID()

which is much better than the cursor over a temp table in the original article. I also wrote Returning a Single Random Row. Using our new method you could select a single random row like this:

SELECT TOP 1 ID
FROM FOO
ORDER BY NEWID()

This is much simpler than the approach in the earlier article which built a sequential number and then picked the record at random. The NEWID() approach can generate a table scan of the target table -- or at least a search of the records the WHERE clause limits it too. For small tables or systems where performance isn't critical this might be acceptable. For larger tables or performance critical systems the approach in the other article might be better. The other article also has a way to weight different rows so they are more likely to be selected. I don't see any easy way to do that using NEWID() just yet.

You'll also find a comment in this thread that indicates this is only possible on Windows 2000. I was able to test SQL7 on NT4 and confirmed that NEWID generated a non-random (i.e. sequential) result. I tested SQL7 on Windows 2000 and it generated random results so that post seems to be accurate.

In summary, this is a better way to randomly order rows in SQL Server 2000. It has some performance concerns but it's very easy to implement. And if you're especially curious you can find out a little about the history of random number generation.


- Advertisement -