Implementing Table Inheritance in SQL Server

By Jeff Smith on 20 February 2008 | Tags: Database Design , Table Design


When designing a database, we sometimes come across situations where there are multiple types of entities that we are modeling, but we'd like them to all have certain attributes or relations in common. Using "sub-type" tables is a simple way to implement table inheritance in SQL Server.

For example, a question came up recently about modeling the following entities in a "School" database:

  • Students
  • Teachers
  • Parents

Each of those entities has many of the same attributes, such as first name, last name, middle name, and birth date. Yet, we must separate them into multiple tables because we need to store and track different data for students, teachers and parents: students have grades and classes and parents; teachers have classes taught, skills, employment information, and so on.

In addition to sharing common attributes, these entities also have common relations. For example, for each of those entities we might also like to store addresses, phone numbers, correspondence history, etc. To do this in a nicely normalized database, we would model that data by creating additional tables:

  • StudentsAddresses
  • TeachersAddresses
  • ParentsAddresses
  • StudentPhoneNumbers
  • TeacherPhoneNumbers
  • ParentPhoneNumbers
  • StudentCorrespondance
  • TeacherCorrespondance
  • ParentCorrespondance
  • ...etc...

On top of the redundant, similar tables, we would need a whole mess of redundant, similar stored procedures to add/update/delete/select items from these tables. Yet, again, we need different tables for these different entities because they each have their own set of relations and attributes to track.

Is there an easier way to model this in a relational database? Absolutely -- let's take a look.

Creating A "Base Table"

We can start by recognizing that Students, Teachers, and Parents are all "People", and we can note that it makes sense to say that all People can have addresses and phone numbers and correspondence history:

  • People
  • PeopleAddresses
  • PeoplePhoneNumbers
  • PeopleCorrespondence

In the People table, we'd store all of the common attributes of Students, Teachers and Parents that we discussed earlier: name, birth date, and so on. We remove all of these common columns from the Students, Teachers and Parents tables and put them all in one place. Now, maintaining phone numbers, addresses, names, birthdays, and correspondence can all be done with one set of generic stored procedures. The redundancy of those activities has now been reduced, and any changes in phone number or address formats can all be done in place. We can refer to the People table as a "base table".

Deriving Sub-Tables from the Base Table

Of course, we still need our Students, Teachers and Parents tables -- but now the primary key of these tables also becomes a foreign key to the People table.

Because any row in the Students, Teachers or Parents tables require a related row in the People table, and it also shares the same primary key as the People table (i.e., it is a 1:1 relation), I'll refer to the Students, Teachers and Parents tables as sub-tables of the People table. Think of these sub-tables as tables that extend the basic information that the People table provides; this is similar to how inheritance works in Object-Oriented Programming (OOP). Essentially, we are performing "table inheritance" by doing this; since every student, teacher and parent is by definition also a "person" and we are guaranteed that we can work with all of those entities the same way by treating them as People if we want, or we can work with them using their specific attributes and relations.

That gives us a schema like this:

create table People(PersonID int primary key, Name varchar ... etc ...)

create table Students(PersonID int primary key references People(PersonID), ...)

create table Teachers(PersonID int primary key references People(PersonID), ...)

create table Parents(PersonID int primary key references People(PersonID), ...)

With that particular schema, notice a few things:

  1. You cannot create a Student/Teacher/Parent until you first set up the "Person". Once the person is created, you can now add a related row to the appropriate table, depending what type they are.
  2. We still have our common data all in one place, but now we have tables set up to store specific attributes and relations for these different types of People. A simple JOIN from Students to People gives us the student's name, contact information, and so on.
  3. This schema allows a person to be a student, teacher and parent -- all at once! It also allows a person to be none of these -- you simply don't add a related row in any of the sub-tables.
  4. If we want to output a "Type" column (Student, Teacher, Parent) when querying the People table, we must outer join to all the 3 "sub-tables" and determine if any matching rows exist.

That 3rd point is important, as it may not be what you want; we'll get back to that in a moment. But for our purposes, I think it makes sense: suppose a person is initially a student, comes back to work for the school as a teacher, and then eventually has kids of their own. This schema allows us to handle that.

The 4th point is important to consider as well. We could add a "PersonType" column to the table, and create a table of PersonTypes -- but then we introduce redundancies and the possibility of conflicting data. Also, as mentioned, a person can be more than one "type" at a time, so the existence of this column doesn't really make sense in this table without changing our logical model.

Using this schema is easy; we can quickly get all of the information for all of the students with a simple join:

select s.*, p.*
from students s 
inner join people p on s.personID = p.personID

(you'd replace * with actual columns that you need, of course.)

So, everything works beautifully. We have our base table, we have our sub-tables, and the inheritance allows us to work with the different entities all the same way, but to also treat each of them specifically on their own as necessary.

Modeling One-to-Either Constraints

However, what if we want to set up our model so that a Person can have only one type; i.e., either a Student, OR a teacher, OR a parent? This is a more accurate database implementation of inheritance, since in OOP you cannot create an instance of something that is more than one sub-class at a time -- you must pick one or the other (or potentially the base class itself, if that's what you need).

Modeling this can be tricky, and it can be done a couple of ways. Here’s a technique I like to use, it doesn’t require a trigger, it is easy to implement, and it uses standard referential integrity without complicated check constraints.

First, create a table of PersonTypes:

create table PersonType (PersonTypeID int primary key, PersonType varchar(10))

And let’s insert the following values, one per “sub-table”:

insert into PersonType
select 1,'Student' union all
select 2,'Teacher' union all
select 3,'Parent'

This is the table that defines the different types our system will allow. There will be one entry in this table per "sub-table" that we create. If we want to allow for a Person to be just a generic "Person" without any specify sub-type, we could add a row for "Other".

With that table in place, we will now add a PersonType column to our People table. For this example, let’s only define a generic Name column for our People table:

create table People
(
  PersonID int primary key, 
  PersonTypeID int references PersonType(PersonTypeID) not null, 
  Name varchar(10) not null
)

Pretty standard stuff so far – we have different types of People we are modeling, and each Person must be assigned a type.

And now comes the trick: We will add an extra PersonType column to each of our sub-tables, and force the value in that column to be equal to the PersonTypeID for that table. In SQL 2000, we can just add a default and a check constraint, but in SQL 2005 we can use a persisted computed column. So, our Students table will have a constant PersonType value of 1, the Teachers table a value of 2, and the Parents table a value of 3.

Then, we simply create a foreign key relation back to the People table – but instead of just on the PersonID column, we create the constraint on both the PersonID and the PersonTypeID columns. With these constraints in place, we are guaranteed that Person will only have one type, and that they can only have data in the matching sub-table for that type.

There is a catch, however: we cannot create our sub-tables just yet; if we try it, we’ll get the following error:

create table Students 
(
  PersonID int primary key, 
  PersonTypeID as 1 persisted, -- student
  EnrollmentDate datetime, 
  foreign key (PersonID, PersonTypeID) references People(PersonID, PersonTypeID)
)

Msg 1776, Level 16, State 0, Line 1
There are no primary or candidate keys in the referenced table 'People' that 
match the referencing column list in the foreign key 'FK__Students__1F98B2C1'.
Msg 1750, Level 16, State 0, Line 1
Could not create constraint. See previous errors.

This is because the unique constraint on the People table is just on the PersonID column, but we are trying to set up a foreign key constraint on the combination of PersonID/PersonTypeID. To handle this, we simply add an additional unique constraint to the People table, covering both PersonID and PersonTypeID:

drop table People
go

create table People
(
  PersonID int primary key, 
  PersonTypeID int references PersonType(PersonTypeID), 
  Name varchar(10)
  constraint People_AltPK unique (PersonID,PersonTypeID)
)

With that in place, we are good to go:

create table Students 
(
  PersonID int primary key, 
  PersonTypeID as 1 persisted, -- student
  EnrollmentDate datetime, 
  foreign key (PersonID, PersonTypeID) references People(PersonID, PersonTypeID)
)

create table Teachers 
(
  PersonID int primary key, 
  PersonTypeID as 2 persisted, -- teacher
  HireDate datetime, 
  foreign key (PersonID, PersonTypeID) references People(PersonID, PersonTypeID)
)

create table Parents
(
  PersonID int primary key, 
  PersonTypeID as 3 persisted, -- parents
  DifficultyScore int,
  foreign key (PersonID, PersonTypeID) references People(PersonID, PersonTypeID)
)

Command(s) completed successfully.

Note: If you are using SQL Server 2000, instead of computed columns for the PersonTypeID constants, you would instead use a default value and a check constraint, since it does not allow computed columns to be used in foreign key constraints:

create table Teachers -- SQL 2000 Version 
(
  PersonID int primary key, 
  PersonTypeID int not null default 2 check (PersonTypeID = 2), -- teachers type
  HireDate datetime, 
  foreign key (PersonID, PersonTypeID) references People(PersonID, PersonTypeID)
)

Now to test it. Let’s create a new Person, called "Bob", and make him a student. First, we add a row to the People table, and then a matching row to the Student table:

insert into People (PersonID, PersonTypeID, Name)
values (1,1,'Bob')

(1 row(s) affected)

insert into Students (PersonID, EnrollmentDate)
values (1,'2007-01-01')

(1 row(s) affected)

Notice that we cannot add teacher information for Bob, since he is a student:

insert into Teachers (PersonID, HireDate)
values (1,'2007-01-01')

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY constraint "FK__Teachers__31B762FC". 
The conflict occurred in database "PlayGround", table "dbo.People".
The statement has been terminated.

And, of course, we cannot change Bob’s type from a student to a teacher, since there is already a row in the Students table for him:

update People set PersonTypeID=2 where PersonID = 1

Msg 547, Level 16, State 0, Line 1
The UPDATE statement conflicted with the REFERENCE constraint "FK__Students__2645B050". 
The conflict occurred in database "PlayGround", table "dbo.Students".
The statement has been terminated.

The only way to change Bob’s type is to remove the row from the Students table first:

-- delete student attributes:
delete from Students where PersonID=1

-- change the type from a student to to a teacher
update People set PersonTypeID=2 where PersonID = 1

-- add teacher attributes:
insert into Teachers (PersonID, HireDate)
values (1,'2007-01-01')

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)

And there you have it! A "one-to-either" constraint, where each row in one table has a related row in only one of several possible tables, without the need for any triggers or any complicated T-SQL programming to maintain data integrity.

Conclusion

Once things are in place, we have a simpler data model with fewer tables and stored procedures, and we are guaranteed consistency when dealing with common attributes and relations of "People" in the database. We can easily create new types of People simply by adding a row to the PersonTypes table and then creating a new "sub-table". As soon as the new sub-table is created, that new type immediately has all of the attributes and relations of a Person – without coding up anything or making any other changes to the database!


Related Articles

Using SET NULL and SET DEFAULT with Foreign Key Constraints (12 August 2008)

Implementing Table Interfaces (19 May 2008)

Custom Auto-Generated Sequences with SQL Server (24 April 2007)

The Daily Database Build (23 August 2004)

HOW TO: Move a Database Diagram (12 December 2003)

Database Design Thoughts for Packaged Applications (26 October 2003)

The current state of database research (16 September 2003)

Using Metadata (24 March 2003)

Other Recent Forum Posts

SSRS error on sign in ERR_UNEXPECTED (6h)

SSIS Component C sharp source (1d)

Simple SQL Update Query behaviour changing based on record count (2d)

Simple SQL Update Query behaviour changing based on record count (2d)

Unable to execute stored procedure while Database is Synchronizing (2d)

SQL query for products ratings and reviews in my store (3d)

Split column in MS SQL an copy to new columns (4d)

Help needed with query (5d)

- Advertisement -