
 Long - Indexes and partitions on a large database


Cody
Starting Member

24 Posts

Posted - 2009-03-27 : 09:44:21
I'm only about 6 months into my book-taught DBA journey, so bear with me here.

Background:

My database currently resides in a single file, has about 70 tables, and is 20GB+ and growing. The top 10 tables contain about 10 million rows each, the next 10 tables are in the millions of rows, and so on.

There is row-level security on every table: each table has a varchar(3) column named customer_id, and a corresponding view does a SELECT * FROM table_name t WHERE t.customer_id = (the current login/user).

Primary keys are clustered and include the customer_id; more on that later.
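To make that concrete, here's a simplified sketch of the pattern (the table, columns, and the login-to-customer_id mapping are all made up for illustration, and I've put customer_id first in the key purely as an example):

-- Hypothetical table: the clustered PK includes customer_id
CREATE TABLE dbo.orders (
    customer_id varchar(3) NOT NULL,
    order_id    int        NOT NULL,
    order_date  datetime   NOT NULL,
    CONSTRAINT PK_orders PRIMARY KEY CLUSTERED (customer_id, order_id)
);
GO

-- Each customer reads through a view that filters on their login,
-- assuming here that the login name equals the customer_id
CREATE VIEW dbo.v_orders
AS
SELECT o.*
FROM dbo.orders o
WHERE o.customer_id = SUSER_SNAME();
GO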

Customers have access to the database and can only see their own data; to them it appears completely self-contained. No aggregation is done across customers in any way. There are about 22 customer_ids, and the list will grow, but not by more than ~10.

While the database itself started life about 10 years ago, it was previously hosted by each customer and didn't contain those top 10 tables. Over the past year I've combined everything onto one server, implemented the security, and created some new, very large tables.

The Situation:

When I started loading the first complete customer database, all seemed well. We didn't even have any indexes apart from a simple primary key. Table scans must (?) have been taking place, but because the tables were small they didn't take long, so it wasn't a big deal.

As I loaded more and more data, it started taking longer and longer. I expect this is because more and more existing data had to be shuffled around to fit the new customers in. I was doing bulk inserts. I have since finished loading the initial data, and any new data that flows in is a trickle (i.e. a couple of records a second during business hours).
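The loads were along these lines (file path and table name are made up, and the ORDER hint only helps if the file really is sorted to match the clustered key):

-- Bulk load pre-sorted to match the clustered key, so rows mostly
-- append at the end instead of splitting pages mid-table
BULK INSERT dbo.orders
FROM 'C:\load\orders_cust01.dat'
WITH (
    ORDER (customer_id ASC, order_id ASC),
    TABLOCK
);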

It did get me thinking about indexes, though. I was going to add a lot of indexes on the table columns that are most commonly joined, and that also raised the possibility of changing some of the primary keys to non-clustered and clustering on something else. I've also read whatever I could find on partitioning (most SQL 2005 admin books gloss over it, which really annoys me), but I'm not sure I need it, as I don't have multiple disks.
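One variation I was considering looks like this (purely illustrative names again):

-- Keep the primary key for uniqueness but make it non-clustered ...
CREATE TABLE dbo.orders2 (
    customer_id varchar(3) NOT NULL,
    order_id    int        NOT NULL,
    order_date  datetime   NOT NULL,
    CONSTRAINT PK_orders2 PRIMARY KEY NONCLUSTERED (order_id, customer_id)
);
GO

-- ... and cluster the table on something else instead
CREATE CLUSTERED INDEX IX_orders2_cust_date
    ON dbo.orders2 (customer_id, order_date);
GO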

Question:

1. In a situation like this, where you have, for example, a 10-million-row table that's only ever accessed a portion at a time depending on the customer_id column, what's the best thing to do regarding indexing or partitioning?

I couldn't work out whether having the customer_id at the beginning or the end of the primary key, or whether making the primary key clustered or non-clustered, would give the best performance.

I also couldn't work out whether it would be better to have the customer_id at the end of the primary key and create a separate non-clustered index containing just the customer_id, or the other way around.

I'd like to make sure that if a customer does something that requires a table scan, it completes in as short a time as possible without touching all the other "hidden" customers' data. I don't know whether this is already happening or not.
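I do know I can check what an individual query is doing with something like this (table name made up again); I just don't know what the right design is:

-- How many pages does a customer-scoped query actually read?
SET STATISTICS IO ON;
SELECT COUNT(*) FROM dbo.orders WHERE customer_id = 'C01';
SET STATISTICS IO OFF;
GO

-- And is it a seek or a scan?
SET SHOWPLAN_TEXT ON;
GO
SELECT COUNT(*) FROM dbo.orders WHERE customer_id = 'C01';
GO
SET SHOWPLAN_TEXT OFF;
GO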

2. As above, but for when data is added to the database. Say, for example, there's a table with transaction ids; each customer would be at a different transaction_id point:

Transaction 1 - Customer 1
Transaction 1 - Customer 2
...
Transaction 500 - Customer 1
Transaction 100,000 - Customer 2

If I cluster on (transaction_id, customer_id), which is what most people recommend for indexes since transaction_id is the most selective column in the table, then when Customer 1 is adding transactions in their lower range, does that create a small or a huge overhead in SQL Server as it shuffles Customer 2's data forward to make room?

Furthermore, if I were to swap the clustered index to (customer_id, transaction_id) ...

Customer 1 - Transactions 1-100,000
Customer 2 - Transactions 1-500 ...

... would the overhead be the same?
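I figure I can at least measure the effect after a load with SQL 2005's physical stats DMV (table name made up):

-- Fragmentation on the clustered index (index_id = 1)
SELECT index_id, avg_fragmentation_in_percent, page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.orders'), 1, NULL, 'LIMITED');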

3. I just can't seem to get a grip on partitions. From what I could tell:

a) You create a partition function on a key like customer_id and specify all the possible values ...
b) You assign that partition function to a partition scheme mapping to a series of filegroups, so each customer would basically have their own physical file ...
c) Then you recreate all the tables on the scheme and move the data across ...

Something like that. Would that be useful in my case? Is there anything out there to show how this might actually be done?
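My best guess at the T-SQL, pieced together from Books Online (all names are illustrative, and since I only have one disk I've mapped every partition to PRIMARY):

-- a) Partition function: one boundary value per customer_id
CREATE PARTITION FUNCTION pf_customer (varchar(3))
AS RANGE RIGHT FOR VALUES ('C01', 'C02', 'C03');  -- ... one per customer
GO

-- b) Partition scheme: map the partitions to filegroups
-- (ALL TO PRIMARY here; could be one filegroup per customer instead)
CREATE PARTITION SCHEME ps_customer
AS PARTITION pf_customer ALL TO ([PRIMARY]);
GO

-- c) Recreate the table on the scheme and move the data across
CREATE TABLE dbo.orders_part (
    customer_id varchar(3) NOT NULL,
    order_id    int        NOT NULL,
    CONSTRAINT PK_orders_part
        PRIMARY KEY CLUSTERED (customer_id, order_id)
) ON ps_customer (customer_id);
GO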

Thanks.