
 Need help on Splitting large table


DBTEAM
Starting Member

4 Posts

Posted - 2010-03-26 : 04:41:29
Hi All,


Splitting a Large table


We have a table with 30+ million rows and 30 fields,
as follows:
ID
name
job
company
address
city
state
zip
phone
fax
...
Email

Like this, we have 30 fields in total.

In the above, the ID column is an IDENTITY column, and Email is the clustered primary key.


Maintaining such a large table is becoming difficult for us,
so we are planning to split it.

Please suggest the best way this can be done.

No front end access for this.

vaibhavktiwari83
Aged Yak Warrior

843 Posts

Posted - 2010-03-26 : 07:25:28
Do you want to do normalization, or something else?

Vaibhav T

DBTEAM
Starting Member

4 Posts

Posted - 2010-03-26 : 07:42:17
No idea.
Actually, we need all the column information most of the time. As the record count increases,

maintaining a single table like this hurts performance.

That's because we insert around 5 lakh (500,000) rows into that table every month.

Please help me work out how to get out of this...

Is normalization the better way?
But if I go for that, I'll need to join all the tables every time, right?

vaibhavktiwari83
Aged Yak Warrior

843 Posts

Posted - 2010-03-26 : 09:57:49
If you don't go for normalization, then you can split the table into 3 or 4 parts and put one primary key column common to all the tables.

In our database we are doing the same: we have mstsite with a lot of columns, so we store part of the mstsite information in globalconfig_site, and we put the siteid primary key column from mstsite there as-is to join the two tables.

Normalization is not my suggestion here: your table is so bulky that, while normalization would organize the database in a very good way, for reports and other stuff it might be inefficient.
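A rough sketch of that kind of split (all table and column names here are made up for illustration, not taken from the original poster's schema):

```sql
-- Hypothetical vertical split: frequently used columns stay in the
-- main table, rarely used ones move to a side table that shares the
-- same primary key value.
CREATE TABLE dbo.ContactCore
(
    ID    int          NOT NULL PRIMARY KEY,  -- common key in both tables
    Name  varchar(100) NOT NULL,
    Email varchar(320) NOT NULL
);

CREATE TABLE dbo.ContactExtra
(
    ID    int          NOT NULL PRIMARY KEY
          REFERENCES dbo.ContactCore (ID),    -- 1:1 link back to the core row
    Fax   varchar(30)  NULL,
    Notes varchar(max) NULL
);

-- Queries that need everything join the two halves on the shared key:
SELECT c.Name, c.Email, e.Fax
FROM dbo.ContactCore AS c
JOIN dbo.ContactExtra AS e ON e.ID = c.ID;
```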


Vaibhav T

Kristen
Test

22859 Posts

Posted - 2010-03-26 : 12:40:18
I'm struggling to see what splitting the table achieves unless:

- There are some columns that are very rarely referenced
- There are some columns that are TEXT/VARCHAR(MAX) and they are blocking reindexing (because you don't have the Enterprise edition of SQL Server, so you cannot rebuild indexes ONLINE, and you have no maintenance window to do it offline)

30 million rows is large, but not huge. My gut feeling is that your time would be better spent optimising the queries.

Now, if the data can be partitioned, that might well help. But if this is just a table of People, or Companies, I'm not sure that partitioning is as beneficial as it is for, say, invoices, where 90% of the data held is old and rarely referenced.

Bustaz Kool
Master Smack Fu Yak Hacker

1834 Posts

Posted - 2010-03-26 : 19:27:46
I'll suggest three approaches to consider:

1) Horizontal partitioning - Identical tables that each hold a subset of the data. You divide the data according to some "useful" value. An example of this would be having a separate Sales table for each year (SalesCurrent, Sales2009, Sales2008, SalesHistorical).

2) Vertical partitioning - Have separate tables that each hold a subset of the columns in your current table. You might be able to split it along the lines of "commonly used" columns and "rarely used" columns, or you might be able to divide the columns along functional lines (e.g. columns used by everyone, columns used mostly by Sales, columns used mostly by Operations and columns used mostly by Support). You will need to include your primary key in each table.

3) Hardware partitioning - The data is actually stored in separate files but appears to be a single table. IF you can find a sensible way to partition the data that works for you (e.g. Geography), and IF your queries reference this partitioning data (e.g. WHERE Region = 'NORTH'), then the SQL engine acts only on the single, smaller data file and gains the performance boost.

All of these are discussed in further depth in BOL.
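For option 3, the built-in SQL Server 2005 form of this is table partitioning (Enterprise Edition only). A minimal sketch with made-up names and a year-based boundary, matching the Sales example from option 1:

```sql
-- Partition rows by year; each range can live on its own filegroup.
CREATE PARTITION FUNCTION pfByYear (datetime)
AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01', '2010-01-01');

-- Map every partition to PRIMARY here for simplicity; in practice you
-- would list a separate filegroup per range.
CREATE PARTITION SCHEME psByYear
AS PARTITION pfByYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.Sales
(
    SaleID   int      NOT NULL,
    SaleDate datetime NOT NULL,
    Amount   money    NOT NULL
) ON psByYear (SaleDate);

-- A query filtered on the partitioning column, such as
-- WHERE SaleDate >= '2010-01-01', only touches the matching partition.
```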

HTH

=======================================
There are no passengers on spaceship earth. We are all crew. -Marshall McLuhan, educator and philosopher (1911-1980)

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-03-26 : 19:54:37
I would remove the clustered primary key index from the email column.
An email column as the clustered key? With 30+ million records?
What do you think happens when you insert email value "aaron@gmail.com"?

Make the index over email a unique nonclustered index instead, and use INCLUDE to incorporate the other columns.
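A sketch of what that could look like (the table name, constraint name, and INCLUDE column list are assumptions, since the original poster did not give them):

```sql
-- Drop the existing clustered primary key on Email (constraint name assumed),
-- then recreate email as a unique nonclustered index that carries the
-- most commonly read columns at its leaf level, so email lookups are
-- covered without touching the base table.
ALTER TABLE dbo.Contacts DROP CONSTRAINT PK_Contacts;

CREATE UNIQUE NONCLUSTERED INDEX UX_Contacts_Email
    ON dbo.Contacts (Email)
    INCLUDE (Name, Company, City, State);  -- pick per your workload
```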



N 56°04'39.26"
E 12°55'05.63"

Kristen
Test

22859 Posts

Posted - 2010-03-29 : 03:52:01
"What do you think happens when you insert email value "aaron@gmail.com"?"

It will be inserted right after "aardvark@hotmail.com"

Peso: perhaps clarify what your recommended Clustered Index would be?

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2010-03-29 : 06:31:04
And before the other 30 million records.

The short answer is "It depends". You have to know what type of queries are run against this table.
Without knowing this, I would guess clustered index on ID column would be a not-bad choice.

If email is used as the key instead, I would probably go for a unique non-clustered index over email, and INCLUDE other columns as they are used.
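Putting those two suggestions together (all object names assumed, since the original table name was never given):

```sql
-- 1. Remove the clustered primary key from Email (constraint name assumed).
ALTER TABLE dbo.Contacts DROP CONSTRAINT PK_Contacts;

-- 2. Cluster on the ever-increasing IDENTITY column, so new rows
--    always append at the end instead of splitting pages mid-index.
CREATE UNIQUE CLUSTERED INDEX CIX_Contacts_ID ON dbo.Contacts (ID);

-- 3. Keep email lookups fast and unique with a nonclustered index,
--    adding INCLUDE columns only as the queries actually need them.
CREATE UNIQUE NONCLUSTERED INDEX UX_Contacts_Email
    ON dbo.Contacts (Email)
    INCLUDE (Name, Phone);  -- illustrative; choose per workload
```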



N 56°04'39.26"
E 12°55'05.63"
   
