Author |
Topic |
DemBones79
Starting Member
2 Posts |
Posted - 2009-10-09 : 18:33:45
|
First a little background:My company (a service organization) uses Teradata for its data warehouse and SQL Server for applications. I am in the unique position to be able to create a new database from the ground up to support a new project. It is going to require a lot of data to be pulled from Teradata on a daily basis.Our order ID tables are PKed on unit (char(7)), order # (char(8)), and creation date.Our attempt tables add attempt date to the base PK.To date, we've designed our SQL Server tables to be a 1:1 match to our Teradata tables.Now that that's out of the way:For a previous project, I experimented with using a single-column PK for order IDs by concatenating the unit to the order #. It seemed to work rather well on a table with over 250k records, but it did not account for the creation date. Well, I've improved my system to be able to include creation date and even attempt date for the attempts table.The values are all numeric characters but once combined and cast as a numeric type, the resulting number is too large for any of the numeric data types in Teradata. Even decimal(32,0)- our version of TD supposedly supports decimal(38,0)- is resulting in corrupted/truncated data by the time I get to the 10 millions place. So for now the value is remaining a char field.My question is, given that I can not create a working numeric field in our data warehouse that represents this data, should I bother? Or given that, at worst, our PK is comprised of two char fields and two date fields can I really do no harm by creating a single, albeit large, primary key using concatenated character fields? |
|
dportas
Yak Posting Veteran
53 Posts |
Posted - 2009-10-10 : 16:05:02
|
Why would you want to concatenate these columns into a numeric key? Couldn't you use a surrogate (arbitrary) numeric key instead? I'm assuming you also enforce a composite key constraint on the original columns. |
|
|
DemBones79
Starting Member
2 Posts |
Posted - 2009-10-11 : 12:19:16
|
I'll admit to still being new to a lot of concepts in database design. Up until now I mostly have had to deal with what I've been handed. So I still have lots of questions about some topics. In order to help me understand this a bit better I was wondering if anyone could either point me toward the answers to the following (I'll be doing my own digging too, but I prefer to get as many opinions as possible):1. Is there a difference in performance when performing a join operation (inner/left or right outer) on multiple columns vs a single one?2. SQL Server does not have the ability to designate multiple columns for an in/not in comparison ex: where (name,phone) in (select name, phone from...). The common recommendation I have read is to concatenate the two or more columns on both sides of the "in". Is there a performance penalty to this vs using a single column? If there is a penalty, would the difference in performance become less if the single column was larger in size such as a char(32)?3. How much of a difference in performance is there when joining on a primary key vs an indexed column or secondary key? |
|
|
visakh16
Very Important crosS Applying yaK Herder
52326 Posts |
Posted - 2009-10-11 : 14:59:13
|
quote: Originally posted by DemBones79 I'll admit to still being new to a lot of concepts in database design. Up until now I mostly have had to deal with what I've been handed. So I still have lots of questions about some topics. In order to help me understand this a bit better I was wondering if anyone could either point me toward the answers to the following (I'll be doing my own digging too, but I prefer to get as many opinions as possible):1. Is there a difference in performance when performing a join operation (inner/left or right outer) on multiple columns vs a single one?Depends on factors like presence of index, data types of columns ... 2. SQL Server does not have the ability to designate multiple columns for an in/not in comparison ex: where (name,phone) in (select name, phone from...). The common recommendation I have read is to concatenate the two or more columns on both sides of the "in". Is there a performance penalty to this vs using a single column? If there is a penalty, would the difference in performance become less if the single column was larger in size such as a char(32)?you can use joins in that case rather than in the equivalent statement forwhere (name,phone) in (select name, phone from...)istable1 t1join table2 t2on t2.name=t1.nameand t2.phone=t1.phone3. How much of a difference in performance is there when joining on a primary key vs an indexed column or secondary key? what do you mean by secondary key?see belowhttp://www.sql-server-performance.com/tips/tuning_joins_p1.aspx
|
|
|
dportas
Yak Posting Veteran
53 Posts |
Posted - 2009-10-12 : 15:42:07
|
2. Is that why you want to use a concatenated key? IN can easily be replace by alternatives using EXISTS or JOIN. The problems with a concatenated key would likely far outweigh any potential benefit of using IN.3. No difference at all. The performance is determined by the type of index, shape and size of the query and so on. |
|
|
lazerath
Constraint Violating Yak Guru
343 Posts |
Posted - 2009-10-13 : 14:07:20
|
I just want to add that EXISTS and JOIN will likely far outperform IN in anything but the smallest record sets. |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2009-10-13 : 16:04:04
|
quote: Originally posted by DemBones79 Our order ID tables are PKed on unit (char(7)), order # (char(8)), and creation date.
Surrogate vs natural keys may be hotly debated, but these days the use of "superkeys" (concatenated composite natural keys) is nearly universally disfavored.quote: Originally posted by DemBones79 To date, we've designed our SQL Server tables to be a 1:1 match to our Teradata tables.
Excuse me? You are basing your operational (transactional or reporting) database directly on your data warehouse design?________________________________________________If it is not practically useful, then it is practically useless.________________________________________________ |
|
|
dportas
Yak Posting Veteran
53 Posts |
Posted - 2009-10-13 : 17:58:27
|
quote: Originally posted by blindmanthese days the use of "superkeys" (concatenated composite natural keys) is nearly universally disfavored.
The term I'm familiar with for the concept is "Intelligent Key". Superkey definitely means something different (a superset of a key) and there's no reason why it can or should be out of favour! |
|
|
blindman
Master Smack Fu Yak Hacker
2365 Posts |
Posted - 2009-10-15 : 09:16:24
|
I've never heard of it referred to as an "Intelligent Key" before, which is good, because I would have choked from laughing.It is probably the WORST choice for a primary key, for reasons which should be obvious. Length, susceptibility to change, redundancy, cost of creation and maintenance...________________________________________________If it is not practically useful, then it is practically useless.________________________________________________ |
|
|
|