Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Database Design and Application Architecture
 Old question revisited: char vs int PK

Author  Topic 

DemBones79
Starting Member

2 Posts

Posted - 2009-10-09 : 18:33:45
First a little background:

My company (a service organization) uses Teradata for its data warehouse and SQL Server for applications. I am in the unique position to be able to create a new database from the ground up to support a new project. It is going to require a lot of data to be pulled from Teradata on a daily basis.

Our order ID tables are PKed on unit (char(7)), order # (char(8)), and creation date.
Our attempt tables add attempt date to the base PK.
To date, we've designed our SQL Server tables to be a 1:1 match to our Teradata tables.

Now that that's out of the way:

For a previous project, I experimented with using a single-column PK for order IDs by concatenating the unit to the order #. It seemed to work rather well on a table with over 250k records, but it did not account for the creation date. Well, I've improved my system to be able to include creation date and even attempt date for the attempts table.

The values are all numeric characters but once combined and cast as a numeric type, the resulting number is too large for any of the numeric data types in Teradata. Even decimal(32,0)- our version of TD supposedly supports decimal(38,0)- is resulting in corrupted/truncated data by the time I get to the 10 millions place. So for now the value is remaining a char field.

My question is, given that I can not create a working numeric field in our data warehouse that represents this data, should I bother? Or given that, at worst, our PK is comprised of two char fields and two date fields can I really do no harm by creating a single, albeit large, primary key using concatenated character fields?

dportas
Yak Posting Veteran

53 Posts

Posted - 2009-10-10 : 16:05:02
Why would you want to concatenate these columns into a numeric key? Couldn't you use a surrogate (arbitrary) numeric key instead? I'm assuming you also enforce a composite key constraint on the original columns.
Go to Top of Page

DemBones79
Starting Member

2 Posts

Posted - 2009-10-11 : 12:19:16
I'll admit to still being new to a lot of concepts in database design. Up until now I mostly have had to deal with what I've been handed. So I still have lots of questions about some topics. In order to help me understand this a bit better I was wondering if anyone could either point me toward the answers to the following (I'll be doing my own digging too, but I prefer to get as many opinions as possible):

1. Is there a difference in performance when performing a join operation (inner/left or right outer) on multiple columns vs a single one?

2. SQL Server does not have the ability to designate multiple columns for an in/not in comparison ex: where (name,phone) in (select name, phone from...). The common recommendation I have read is to concatenate the two or more columns on both sides of the "in". Is there a performance penalty to this vs using a single column? If there is a penalty, would the difference in performance become less if the single column was larger in size such as a char(32)?

3. How much of a difference in performance is there when joining on a primary key vs an indexed column or secondary key?
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2009-10-11 : 14:59:13
quote:
Originally posted by DemBones79

I'll admit to still being new to a lot of concepts in database design. Up until now I mostly have had to deal with what I've been handed. So I still have lots of questions about some topics. In order to help me understand this a bit better I was wondering if anyone could either point me toward the answers to the following (I'll be doing my own digging too, but I prefer to get as many opinions as possible):

1. Is there a difference in performance when performing a join operation (inner/left or right outer) on multiple columns vs a single one?
Depends on factors like presence of index, data types of columns ...
2. SQL Server does not have the ability to designate multiple columns for an in/not in comparison ex: where (name,phone) in (select name, phone from...). The common recommendation I have read is to concatenate the two or more columns on both sides of the "in". Is there a performance penalty to this vs using a single column? If there is a penalty, would the difference in performance become less if the single column was larger in size such as a char(32)?
you can use joins in that case rather than in
the equivalent statement for
where (name,phone) in (select name, phone from...)
is
table1 t1
join table2 t2
on t2.name=t1.name
and t2.phone=t1.phone

3. How much of a difference in performance is there when joining on a primary key vs an indexed column or secondary key?

what do you mean by secondary key?

see below
http://www.sql-server-performance.com/tips/tuning_joins_p1.aspx


Go to Top of Page

dportas
Yak Posting Veteran

53 Posts

Posted - 2009-10-12 : 15:42:07
2. Is that why you want to use a concatenated key? IN can easily be replace by alternatives using EXISTS or JOIN. The problems with a concatenated key would likely far outweigh any potential benefit of using IN.

3. No difference at all. The performance is determined by the type of index, shape and size of the query and so on.

Go to Top of Page

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2009-10-13 : 14:07:20
I just want to add that EXISTS and JOIN will likely far outperform IN in anything but the smallest record sets.
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2009-10-13 : 16:04:04
quote:
Originally posted by DemBones79

Our order ID tables are PKed on unit (char(7)), order # (char(8)), and creation date.

Surrogate vs natural keys may be hotly debated, but these days the use of "superkeys" (concatenated composite natural keys) is nearly universally disfavored.

quote:
Originally posted by DemBones79

To date, we've designed our SQL Server tables to be a 1:1 match to our Teradata tables.

Excuse me? You are basing your operational (transactional or reporting) database directly on your data warehouse design?

________________________________________________
If it is not practically useful, then it is practically useless.
________________________________________________
Go to Top of Page

dportas
Yak Posting Veteran

53 Posts

Posted - 2009-10-13 : 17:58:27
quote:
Originally posted by blindmanthese days the use of "superkeys" (concatenated composite natural keys) is nearly universally disfavored.


The term I'm familiar with for the concept is "Intelligent Key". Superkey definitely means something different (a superset of a key) and there's no reason why it can or should be out of favour!
Go to Top of Page

blindman
Master Smack Fu Yak Hacker

2365 Posts

Posted - 2009-10-15 : 09:16:24
I've never heard of it referred to as an "Intelligent Key" before, which is good, because I would have choked from laughing.
It is probably the WORST choice for a primary key, for reasons which should be obvious. Length, susceptibility to change, redundancy, cost of creation and maintenance...

________________________________________________
If it is not practically useful, then it is practically useless.
________________________________________________
Go to Top of Page
   

- Advertisement -