Clustered Index vs Primary Key...

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

Transact-SQL (2008)

Clustered Index vs Primary Key...

Author

Topic

malachi151
Posting Yak Master

152 Posts

Posted - 2012-05-01 : 11:50:42

This is regarding the ordering of columns for compound keys. (This is in relation to a reporting scenarios, not OLTP).

Generally speaking, the common wisdom (and my own experience) says that a compound index, (in this case the PK) works best when the most selective element is on the left and the least selective element is on the right.

To use an example, let's look at a 3 column index in an example. Don't worry that column C looks like a unique key all by itself for this example.


CA| CB | CC
1   A   3245
1   A   6546
1   B   987
1   B   9878
1   C   19803
1   C   292347
2   D   1987
2   E   9346
2   F   6743

Now, it would seem to me that the best way to organize the data from a read perspective would be the way that I have it shown above, with a clustered index on A, B, C, but if you do that then the first column in the index is not very selective so it will perform poorly if you try to put column C in the search predicate, i.e. WHERE CC = 987.

But, it would seem that if you want to do something like get s SUM of some value by Column CA or by Column CA, CB, then having the clustered index on CA, CB, CC, as shown, would be beneficial.

If you put the clustered index on CC, CB, CA (as standard practice would dictate), then the values for the aggregates have to be read from "random" locations all over the disk since it will be sorted on the values in column CC.

So, given then, what if you create the clustered index on columns CA, CB, CC, but create the Primary Key on columns CC, CB, CA?

Does that resolve the issue, or does it just create a new problem is does it solve nothing?

Also, if I have a table with a natural key in it, that resolves down to the unique id, let's say that its a 4 column natural key, but I also create a synthetic key, what is the impact of creating a clustered index on the natural key, with the columns ranging from least selective to most selective, but creating the PK on the synthetic key?

An example here might be something like a claim transactions table.

Let's say that the claims will typically be reported on by policy number > claim number > transaction date. Thus organizing the data on the disk would seem to make sense to organize it by policy_number, claim_number, transaction_date, but each transaction also has a transaction_id, which is unique across the table, so that the transactions ids are sequential across transactions, but the entry of transactions has no relationship to claim numbers, etc., i.e. transaction_id 12345 may go to claim XYZ and transaction 12346 may go to claim AFG, etc.

Typically, what would be done is to summarize the claim transactions by claim number or policy number, so it would seem that those rows together physically via a clustered index would make sense.

So in that case would it make sense to put a clustered index on policy_number, claim_number, transaction_date, transaction_id, but put the PK purely on transaction_id?

--------------------------------------------------
My blog: http://practicaltsql.net/
Twitter: @RGPSoftware

LoztInSpace
Aged Yak Warrior

940 Posts

Posted - 2012-05-01 : 23:58:45

* A clustered index will not save the world. Don't sweat it if you can't find a use for one
* Not all queries benefit from a CI. You have described a whole different bunch of scenarios. A CI can only really help with one or two
* Best candidates for CIs are where you do a range scan. Your policy/claim/transaction date is a good choice if you are looking at all policies or a range. If it's constrained by date (e.g. all policies last month) then having the date first would be an obvious choice in your CI. Of course this is only true if doing this report represents a significant part of your work. Do it one way you'll win on some reports at the expense of others. You need to work that out. If you are reporting on one policy then it doesn't make as much difference if it's clustered or not.
* If you have queries that return a lot of data it will scan the entire table (CI) anyway so your indexing is moot.
* If you feel you need to have a synthetic key then you should create a unique index on the natural key as well. This index should ideally be useful to you as well as provide uniqueness, so you need to decided if you can leverage it for something like a range scan then do so and have your PK as a non-CI
* Bear in mind the CI key appears in all other indexes so you might lose a bit of space and increase IO with wide CIs

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2012-05-02 : 05:05:15

Index design can be very complex sometimes, but remember that you also have nonclustered indexes with the option of included columns. With my limited knowledge of your production environment I'd probably consider a clustered index in the order CBA and then two nonclustered indexes on columns CA and CB.

- Lumbago
My blog-> http://thefirstsql.com/2011/07/08/how-to-find-gaps-in-identity-columns-at-the-speed-of-light/

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources