Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
avansor
Starting Member
2 Posts |
Posted - 2013-12-05 : 13:36:56
|
Hello,I am creating a simple database to log the number of visits from clients. The table is as follows :tbl_statistics--------------------------------------------------p_key | client_id | client_datetime--------------------------------------------------1______4________2013-12-05 21:00:002______3________2013-12-05 21:02:113______1________2013-12-05 21:07:314______3________2013-12-05 21:12:42.....It will probably be logging about 2-3 rows in this table per second maximum so there will be some rows ... My question to you experts now is how the design of this table is done in the best way ( performance wise) in terms of "primary keys" , " clustered index" , "non clustered index"...The only type of query that will retrieve data from this table is a standard "SELECT FROM" where " client_id " is of a certain value and " DATE_TIME " is between a specific date range.example:SELECT Count ( client_id ) AS totantal FROM tbl_statisticsWHERE ( client_datetime ) Between '2013 - 01-01 00:00:00 ' And '2014 - 01-01 00:00:00 'Or is this better?SELECT Count ( p_key ) AS totantal FROM tbl_statisticsWHERE ( client_datetime ) Between '2013 - 01-01 00:00:00 ' And '2014 - 01-01 00:00:00 'Are extremely grateful for all tips and comments. |
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2013-12-05 : 15:22:26
|
Do the sequence client_datetime values pretty much coincide with identity sequence? In other words the value is NOW for each insert?If so then I would make the clustered index on client_datetime. That way inserts are very efficient with rows added naturally to the "bottom" of the table. And your queries by client_datetime range will be very efficient. However, I now see that your example date ranges are for a full year. That is a lot of data to return at 2-3 rows per second. If that is a typical query then I would probably change my answer and make the clustered index on p_key and a non clustered on (Client_id, client_datetime).Be One with the OptimizerTG |
|
|
avansor
Starting Member
2 Posts |
Posted - 2013-12-05 : 17:09:49
|
TG:You are correct. The "client_datetime" is always "NOW". In the example query i selected a range for a full year. This will not be the most common question. I guess the most common range would be 1-30 days. So would that mean you recommend a clustered index on client_datetime and nothing on the other columns (p_key, client_id) ? |
|
|
TG
Master Smack Fu Yak Hacker
6065 Posts |
Posted - 2013-12-05 : 17:30:29
|
Best practice is to have some sort of unique constraint on the table so that you can avoid duplicates and have a reliable row identifier. If there is no logical key - in other words duplicates are OK - then a non clustered primary key on p_key an obvious choice. I would only consider an index on client_id to resolve a performance problem. I could see that if do end up many rows in your date range or no date range at all.EDIT:one more thing to consider. because you only have the three columns perhaps a clustered primary key on all of them could be best. That would essentially mean the PK IS the table. (Client_Datetime, Client_id, p_key). no other indexes.The p_key would be last and just be a tie breaker if a couple rows got in there with the exact same date.If all your queries include a date range then that would probably be best performing for both the inserts and the selects.Be One with the OptimizerTG |
|
|
|
|
|