Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 New to SQL Server Programming
 Tabledesign (newbie)

Author  Topic 

avansor
Starting Member

2 Posts

Posted - 2013-12-05 : 13:36:56
Hello,

I am creating a simple database to log the number of visits from clients. The table is as follows :

tbl_statistics

--------------------------------------------------
p_key | client_id | client_datetime
--------------------------------------------------
1______4________2013-12-05 21:00:00
2______3________2013-12-05 21:02:11
3______1________2013-12-05 21:07:31
4______3________2013-12-05 21:12:42
.....



It will probably be logging about 2-3 rows in this table per second maximum so there will be some rows ... My question to you experts now is how the design of this table is done in the best way ( performance wise) in terms of "primary keys" , " clustered index" , "non clustered index"...

The only type of query that will retrieve data from this table is a standard "SELECT FROM" where " client_id " is of a certain value and " DATE_TIME " is between a specific date range.

example:

SELECT Count ( client_id ) AS totantal FROM tbl_statistics
WHERE ( client_datetime ) Between '2013 - 01-01 00:00:00 ' And '2014 - 01-01 00:00:00 '

Or is this better?

SELECT Count ( p_key ) AS totantal FROM tbl_statistics
WHERE ( client_datetime ) Between '2013 - 01-01 00:00:00 ' And '2014 - 01-01 00:00:00 '


Are extremely grateful for all tips and comments.

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2013-12-05 : 15:22:26
Do the sequence client_datetime values pretty much coincide with identity sequence? In other words the value is NOW for each insert?
If so then I would make the clustered index on client_datetime. That way inserts are very efficient with rows added naturally to the "bottom" of the table. And your queries by client_datetime range will be very efficient.

However, I now see that your example date ranges are for a full year. That is a lot of data to return at 2-3 rows per second. If that is a typical query then I would probably change my answer and make the clustered index on p_key and a non clustered on (Client_id, client_datetime).

Be One with the Optimizer
TG
Go to Top of Page

avansor
Starting Member

2 Posts

Posted - 2013-12-05 : 17:09:49

TG:

You are correct. The "client_datetime" is always "NOW". In the example query i selected a range for a full year. This will not be the most common question. I guess the most common range would be 1-30 days.

So would that mean you recommend a clustered index on client_datetime and nothing on the other columns (p_key, client_id) ?
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2013-12-05 : 17:30:29
Best practice is to have some sort of unique constraint on the table so that you can avoid duplicates and have a reliable row identifier. If there is no logical key - in other words duplicates are OK - then a non clustered primary key on p_key an obvious choice. I would only consider an index on client_id to resolve a performance problem. I could see that if do end up many rows in your date range or no date range at all.

EDIT:
one more thing to consider. because you only have the three columns perhaps a clustered primary key on all of them could be best. That would essentially mean the PK IS the table. (Client_Datetime, Client_id, p_key). no other indexes.

The p_key would be last and just be a tie breaker if a couple rows got in there with the exact same date.

If all your queries include a date range then that would probably be best performing for both the inserts and the selects.

Be One with the Optimizer
TG
Go to Top of Page
   

- Advertisement -