Database design..artificial id columns

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

Database Design and Application Architecture

Database design..artificial id columns

Author

Topic

sth_Weird
Starting Member

38 Posts

Posted - 2010-08-03 : 10:14:21

Hello,
Sorry if there is already a thread about this question, but I did not know what to search for.
At work we are discussing the best practice of database design.
There are two different views about auto-generated ids as primary keys.
Since there are a lot of database experts here I was wondering about your optinion.
Let's take the following, easy example (columns incomplete, of course!):
-Table tPerson (Name, Age,...)
-Table tAddress (Country, ZIP,...)
-Table tPerson_Address
There is a many-to-many relationship between Persons and Addresses

(1) The tables tPerson and tAddress have an artificial, auto-generated primary key ("id"), because there is no information about a person or address that might not change (a persons name might change due to marriage, the ZIP-code of a city might change or be shared among several cities,...).
BUT, the table tPerson_Address does not have an auto-generated primary key, it only holds the two "id" columns, which is used as the new primary key for the table.
(2) Same as (1), but WITH an auto-generated primary key that is used as a primary key (but the two id columns are used as ids, too).
The main principle of this point of view is that ALL tables should have an artificial, auto-generated primary key.

I'm not sure...when I went to university, I was tought (1).
What do you think?

thx for your opinion!
sth_Weird

Kristen
Test

22859 Posts

Posted - 2010-08-03 : 10:34:29

Have a Google for Surrogate key vs. Natural key

sth_Weird
Starting Member

38 Posts

Posted - 2010-08-04 : 07:16:40

Thanx for your reply and the keywords.
Unfortunatelly, after reading though a lot of google matches, I'm still not much wiser, because all the articles and discussions were about the tables listing data, not about the tables linking data/tables with each other. It's the latter case I am searching for.
For example, in a table that assigns persons to adresses (many-to-many relationship), is it enough to include the person-id and address-id in that table and make it the new primary key of that table, OR should I create a surrogate key (auto-id) as primary key and control the uniqueness of the person-id and address-id (which actually are the "real world" columns) by constraints/index? Because that's what a collegue suggests an I just don't see any sense in that.
Or is it all just a matter of taste?

sth_Weird

jeffw8713
Aged Yak Warrior

819 Posts

Posted - 2010-08-04 : 19:01:29

FWIW - I agree with you. There is no reason to create an identity column for this table. When you use this table in any joins, you have to use the actual columns and cannot use this identity column anyways.

The only reason for adding an identity column would be to create a clustered index on that column. The clustered index would then force all inserts to the logical end. However, for this type of table - that will have an impact on your reads since all of the related columns could be spread out across multiple pages.

If you create the clustered index on the composite of both columns, set a reasonable fill factor (90-95%), then your inserts won't cause a lot of page splits and your reads would be reduced since related entries would most likely be on only one or two pages.

Additionally, you are going to have to have a unique constraint on those two columns - which is going to require an index to maintain. With an identity column added - you now have two indexes to maintain, the clustered index on the identity column and the non clustered composite index on your other two columns. The non-clustered index would then be used for your queries (most likely) and all you have with the identity column is additional overhead to maintain something that probably won't be used.

Jeff

Kristen
Test

22859 Posts

Posted - 2010-08-05 : 03:17:31

"Or is it all just a matter of taste?"

IME if you ask two Database Designers you'll get three answers ...

"There is no reason to create an identity column for this table"

How will you create tie-break numbers for Address tables? Person joins to many Address records. Can have Person-1 with Address-1, Address-2 and Person-2 with Address-1, Address-2 - but then you have to have some form of MAX(ItemNo) logic to assign the next number - which may cause problems with concurrency.

Or you could have IDENTITY on the Address and Person-1 has Address-1, Address-2 and Person-2 had Address-3, Address-4

I think the second is easier.

"set a reasonable fill factor (90-95%)"

There is a contrary argument to use 10)% fill fact and let the page splits happen. Maybe inserts will be clustered anyway, so there will be 4 or 5 inserts into the same page and a page-split will happen anyway, even at 90% ... and lots of pages will be carrying slack and will result in more index reads.

"after reading though a lot of google matches, I'm still not much wiser"

I think the key thing that a surrogate key lets you do is allow the User to change the key (easily).

We have a Category table on our e-commerce product. It has a Natural key. Our Clients set up Category codes, and then often have a re-think, or don't like how they look in the URLs. But they then cannot change them.

If we used an IDENTITY they would be able to change the Code whenever they wanted. But ... what would happen to all the URLs in Google and links from other sites if they DID change them? We'd need to have a history of all previously used Category Codes so that we could map the old ones to the new ones ...

jeffw8713
Aged Yak Warrior

819 Posts

Posted - 2010-08-05 : 16:21:04

Kristen,

I think you are mis-reading what I wrote. I am not advocating using natural key vs. surrogates. In this particular case where the table is only used as a relationship (sometimes called a junction table), there is no absolutely no value to adding an identity column as a PK and/or a clustered index.

There will never be a case where you need a tie-breaker, because this table is only used to build the many-to-many relationship between the two tables. You have two columns required in this relationship - the PK from Person and the PK from Address.

Adding an identity to *this* table does not add any value - increases storage requirements, and will increase the number of reads required to access a single person's data if a new address is added later, since that row will not be contained on the same page.

Jeff

Ndidiamaka
Starting Member

9 Posts

Posted - 2010-08-06 : 00:25:46

Am having a similar issue with my design..
My tables are question, Answer, and persons table with Test as the junction table for the other three, when i make an insert which probably will be an import from a flat file, the colunms are mapped to the first three tables and i do expect the junction table to update automatically but it dosent...as a result i can get any result from my queries...pls really need help..i use ssis to import and have tried to use bulkinsert and openrowset but the last two give hell of erros while ssis does the import but the Test table is nt updated

Ndidiamaka
Starting Member

9 Posts

Posted - 2010-08-06 : 00:31:22

Kristen
Test

22859 Posts

Posted - 2010-08-06 : 02:21:50

quote:
Originally posted by jeffw8713
I think you are mis-reading what I wrote.

I was just putting a contrary view.

I didn't propose using IDENTITY for the Primary Key, only that it could be used as the tie-break.

Using an IDENTITY column the Address PK could be PersonID + IDENTITY. There has to be something to use for the address PK. It could be just IDENTITY (which has no benefit), or it can be Person+Item (where item starts at 1 for each person), or it can be Person+IDENTITY (so no relationship between IDENTITY and Person, other than to provide a tie-break) ... or it could be something else - such as a manually assign / Natural key such as "first 4 letters or name and a two digit tie break"

jeffw8713
Aged Yak Warrior

819 Posts

Posted - 2010-08-06 : 16:02:33

Kristen, I don't understand - sorry. How would adding a tie-break column in the Person_Address table help?

Whether or not an IDENTITY is used in the Person table, or the Address table is not relevant to the OP's question. The OP was asking about putting an IDENTITY column on the relationship (junction) table Person_Address. This table only needs two columns - the PersonID column (FK to ID column in Person table) and AddressID (FK to ID column in Address table).

Create a composite primary key over the PersonID And AddressID columns - make it clustered and that's it.

Using some kind of calculated column for the PK in the Address table is not very useful. Instead of having a single address for a family, you'll have multiple entries - one for each family member. Of course, as soon as you do that - the addresses are going to be different (e.g. one will have Street, the other will have St. and a third will have just St).

Jeff

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources