Numeric Primary Keys

By Bill Graziano on 13 June 2001 | Tags: Data Types


Tim writes "The most obvious thing to do for records that have no unique column/s is to create an int identity column. For example, I have a 'address' table that should have a unique address_id for each record. Now, normally, I would configure a column address_id as 'int identity(1,1) primary key'. But, for theoretical sakes, what happens if one day, I hit the max value for an int (2^31-1). The next time I add a record, I'll get error 8115 (Arithmetic overflow). During this time though, many records have been deleted leaving holes in possible address_id values which I can reuse. How can I effectively reuse these values? Any ideas?"

The full text of Tim's question is: "SQLServer 7 or 2000:

The most obvious thing to do for records that have no unique column/s is to create an int identity column.

For example, I have a 'address' table that should have a unique address_id for each record. Now, normally, I would configure a column address_id as 'int identity(1,1) primary key'.

But, for theoretical sakes, what happens if one day, I hit the max value for an int (2^31-1). The next time I add a record, I'll get error 8115 (Arithmetic overflow). During this time though, many records have been deleted leaving holes in possible address_id values which I can reuse.

How can I effectively reuse these values?

A reference is the id field for SQL server objects. This is not an identity field (sysobjects table) but looks like an almost randomly generated value (in certain ranges).

Any ideas?"

So you want to get theoretical? I certainly do have some thoughts. And I like the question. But be warned that this column is a little different from what I normally write.

Int

An int field holds from roughly -2.1 billion to +2.1 billion. Giving you a range of roughly 4 billion numbers. You can actually get an extra 2 billion values by starting your identity at -2.1 billion. A CREATE TABLE statement might look like this:

CREATE TABLE Addresses (
	[AddressID] [int] IDENTITY (-2100000000, 1) NOT NULL ,
	[Address1] [char] (50) ,
. . .

The numbers look a little weird but it works. You can SELECT on it and join on it and it works just fine. You now have 4 x 109 or 4 billion entries you can use. Incidently that's 4 times more Cokes than have ever been sold but fewer people than are currently on the earth. And we only need 4 bytes to store the value.

BigInt

Moving up the numeric "food chain" we come to BIGINT. This is a new data type introduced in SQL Server 2000. Most numeric functions in SQL Server have not been changed to support BIGINT. Some that have are MIN, MAX, IDENTITY, @@ROWCOUNT and @@IDENTITY. You can use BIGINT values to join to other BIGINT values.

BIGINT numbers range from -263 to 263-1 or from -9.2 quintillion to 9.2 quintillion give or take a few quadrillions. A quintillion is 1018. A BIGINT is big enough to store one record for every kernel of wheat ever produced on the earth - eighteen times. Of course, you'd need to buy a bigger server. This whopping monster of a field takes 8 bytes to store it's values.

BIGINT is not very well undestood yet so please test extensively before using this in production. Especially test with numbers over 2 billion since that's where the implicit conversion to INT will fail.

Decimal

Still not big enough? Let's move on to DECIMAL or NUMERIC. These fields can store up to 38 digits split between the left and right site of the decimal place. That works out to 9.99 x 1038 or one shy of a duodecillion. And they work for identity fields. Sweet! I'm not sure what kind of number @@IDENTITY would return once you got above the sextillions though. I'm really not sure I'd suggest this for an IDENTITY field since it can take up to 38 bytes to store this value (depending on how many digits you actually need).

How big is a duodecillion? There are 24 septillion (24 x 1024)atoms in a cup of water. And there are 6 sextillion (6 x 1021) cups of water in the world's oceans. Which means that a duodecillion could count about 80% of the atoms in the world's oceans.

GUID

Those are your choices using IDENTITY fields but certainly not your final choices. You can use a uniqueidentifier field and populate it using the NEWID() function. This will give you a globally unique identifier (GUID) "across time and space" according to our earlier article on these. They take 16 bytes to store. Not exactly an IDENTITY field but they might do the trick.

Filling in the Gaps

You can fill in the gaps by renumbering or actually finding the holes and filling them. A query to find a hole might look like this:
select top 1 t1.AddressID
from Addresses t1
left join Addresses as t2
  on t1.AddressID + 1 = t2.AddressID
where t2.AddressID is null

It will return the first AddressID that doesn't have a value one higher in the table. This probably won't be a fast query, especially when the table gets full. I certainly wouldn't want to run this in an online scenario. It will lock the table for the duration of the run. If you do run it, you can use SET IDENTITY_INSERT to allow you to insert an explicity value into an IDENTITY field.

Renumbering

You also asked about renumbering. This is a little tricky with an identity. You can't run an update against an identity column. You'll need to remove the IDENTITY attribute for the column, renumber it and make the column an identity field again. Just make sure you set the SEED value to be higher than the highest existing value. And while you're doing this the table is unavailable. And any table that uses AddressID as a foreign key will need to be updated.

Summary

All things being equal I'd suggest an INT field for SQL Server 7 and possibly a BIGINT for SQL Server 2000. Are you really going to have more than 4 billion inserts in the table over it's life? After all 4 billion is the distance in miles from the Sun to Pluto. I'd also seriously consider finding a "natural" primary key other than a meaningless numeric field.

I hope you don't mind my having a little fun with your question. I also hoped I help point you in a direction that works for you. You can also read more about Big Numbers if you'd like.


Related Articles

Using the TIME data type in SQL Server 2008 (6 March 2008)

Using the DATE data type in SQL Server 2008 (6 December 2007)

Working with Time Spans and Durations in SQL Server (15 October 2007)

DATEDIFF Function Demystified (20 March 2007)

The Cost of GUIDs as Primary Keys (8 January 2005)

Search and Replace in a TEXT column (18 January 2004)

INF: Frequently Asked Questions - SQL Server 2000 - Table Variables (7 December 2003)

How to search for date and time values (2 May 2003)

Other Recent Forum Posts

Query performance Call Center data (21h)

Looking for on Premises tool to read data from SQL logs and populate data warehouse (1d)

Possible SQL 2014 to 2016 Issue - Some Application Functions Slow/Failing (1d)

Working with multiple WHERE statements (1d)

Create a new field value that shows a difference in value from 2 fields (3d)

Hierarchy wise Sales Targets (3d)

Get the MaxDate in results (5d)

Switch Statement within a SSRS SQL Expression (6d)

- Advertisement -