DBTEAM
Starting Member
4 Posts
Posted - 2010-03-26 : 04:41:29
Hi All,

Splitting a large table

We have a table with 30+ million rows and 30 fields, as follows:

ID
name
job
company
address
city
state
zip
phone
fax
.........
Email

The ID column is an IDENTITY, and Email is the clustered primary key. Maintaining a table this large is becoming difficult for us, so we are planning to split it. Please suggest the best way to go about this. There is no front-end access to this table.
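For reference, the table as described would look roughly like this. This is only a minimal sketch: the table name (dbo.Contacts) and the column data types are assumptions, since the post does not give them.

-- Sketch of the table as described; name and data types are assumed.
CREATE TABLE dbo.Contacts
(
    ID      INT IDENTITY(1,1) NOT NULL,
    name    VARCHAR(100)  NULL,
    job     VARCHAR(100)  NULL,
    company VARCHAR(100)  NULL,
    address VARCHAR(200)  NULL,
    city    VARCHAR(100)  NULL,
    state   VARCHAR(50)   NULL,
    zip     VARCHAR(20)   NULL,
    phone   VARCHAR(30)   NULL,
    fax     VARCHAR(30)   NULL,
    -- ...roughly 20 further columns as described in the post...
    Email   VARCHAR(320)  NOT NULL,
    CONSTRAINT PK_Contacts PRIMARY KEY CLUSTERED (Email)
);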
vaibhavktiwari83
Aged Yak Warrior
843 Posts
Posted - 2010-03-26 : 07:25:28
Do you want to do normalization, or something else?

Vaibhav T
 |
DBTEAM
Starting Member
4 Posts
Posted - 2010-03-26 : 07:42:17
No idea, actually. We need all of the column information most of the time. As the record count increases, maintaining a single table like this hurts performance, because we insert around 5 lakh (500,000) rows into it every month. Please help me work out how to get out of this. Is normalization the better way? But if I go that route, I will need to join all the tables every time, right?
 |
vaibhavktiwari83
Aged Yak Warrior
843 Posts
Posted - 2010-03-26 : 09:57:49
If you don't want to go for normalization, you can split the table into 3 or 4 parts and put the same primary key column in every one of them. We do the same thing in our database: mstsite has a lot of columns, so we store part of the mstsite information in globalconfig_site and carry the siteid primary key column from mstsite over to it, which lets us join the two tables. Normalization is not my suggestion here, since your table is so bulky; normalization will organize the database very well, but for reports and other work it might be inefficient.

Vaibhav T
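A minimal sketch of that kind of split, using the shared-key idea above. The table names (dbo.Contacts_Core, dbo.Contacts_Extra) and the choice of which columns go where are assumptions for illustration; the ID column from the original table is used as the common key.

-- Commonly used columns stay in the "core" table.
CREATE TABLE dbo.Contacts_Core
(
    ID    INT          NOT NULL CONSTRAINT PK_Contacts_Core PRIMARY KEY,
    name  VARCHAR(100) NULL,
    Email VARCHAR(320) NOT NULL
);

-- Rarely used columns move to a second table that shares the same key.
CREATE TABLE dbo.Contacts_Extra
(
    ID    INT         NOT NULL CONSTRAINT PK_Contacts_Extra PRIMARY KEY,
    phone VARCHAR(30) NULL,
    fax   VARCHAR(30) NULL,
    CONSTRAINT FK_Contacts_Extra_Core FOREIGN KEY (ID) REFERENCES dbo.Contacts_Core (ID)
);

-- Rebuild the full row when needed with a one-to-one join on the shared key.
SELECT c.ID, c.name, c.Email, e.phone, e.fax
FROM dbo.Contacts_Core  AS c
JOIN dbo.Contacts_Extra AS e ON e.ID = c.ID;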
 |
Kristen
Test
22859 Posts
Posted - 2010-03-26 : 12:40:18
I'm struggling to see what splitting the table achieves unless:

- There are some columns that are very rarely referenced
- There are some columns that are TEXT/VARCHAR(MAX) and they are blocking reindexing (because you don't have the Enterprise edition of SQL Server, so you cannot rebuild indexes ONLINE, and you have no maintenance window to do it offline)

30 million rows is large, but not huge. My gut feeling is that your time would be better spent optimising the queries.

Now, if the data can be partitioned, that might well help. But if this is just a table of people or companies, I'm not sure partitioning is as beneficial as it is for, say, invoices, where 90% of the data held is old and rarely referenced.
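For reference, the online rebuild mentioned above looks roughly like this; the table name is the assumed one from the sketch earlier in the thread. Note that ONLINE = ON needs Enterprise Edition, and in SQL Server 2005/2008 it fails for indexes that contain LOB columns such as TEXT or VARCHAR(MAX), which is the blocking problem described above.

-- Online rebuild (Enterprise Edition only; fails on LOB-bearing indexes in 2005/2008):
ALTER INDEX ALL ON dbo.Contacts REBUILD WITH (ONLINE = ON);

-- Offline rebuild as the fallback, run during a maintenance window:
ALTER INDEX ALL ON dbo.Contacts REBUILD WITH (ONLINE = OFF);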
 |
Bustaz Kool
Master Smack Fu Yak Hacker
1834 Posts
Posted - 2010-03-26 : 19:27:46
I'll suggest three approaches to consider:

1) Horizontal partitioning - Identical tables that each hold a subset of the rows. You divide the data according to some "useful" value. An example of this would be having a separate Sales table for each year (SalesCurrent, Sales2009, Sales2008, SalesHistorical).

2) Vertical partitioning - Separate tables that each hold a subset of the columns in your current table. You might be able to split it along the lines of "commonly used" columns and "rarely used" columns, or you might divide the columns along functional lines (e.g. columns used by everyone, columns used mostly by Sales, columns used mostly by Operations, and columns used mostly by Support). You will need to include your primary key in each table.

3) Hardware partitioning - The data is physically stored in separate files but appears to be a single table. IF you can find a sensible way to partition the data that works for you (e.g. geography) and IF your queries reference the partitioning column (e.g. WHERE Region = 'NORTH'), then the SQL engine acts only on the single, smaller data file and gains the performance boost. A sketch of this option follows below.

All of these are discussed in further depth in BOL.

HTH

=======================================
There are no passengers on spaceship earth. We are all crew. -Marshall McLuhan, educator and philosopher (1911-1980)
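A minimal sketch of option 3 using the partitioned-table feature (SQL Server 2005+ Enterprise Edition), based on the Sales-by-year example above. The table, column, and object names are illustrative assumptions, not names from the original post.

-- Yearly boundaries; with RANGE RIGHT, rows with SaleDate >= 2010-01-01 land in the newest partition.
CREATE PARTITION FUNCTION pfSalesYear (DATETIME)
AS RANGE RIGHT FOR VALUES ('2008-01-01', '2009-01-01', '2010-01-01');

-- Map every partition to a filegroup (all to PRIMARY here for simplicity;
-- in practice you would typically spread them over several filegroups/files).
CREATE PARTITION SCHEME psSalesYear
AS PARTITION pfSalesYear ALL TO ([PRIMARY]);

-- Create the table on the partition scheme, keyed by the partitioning column.
CREATE TABLE dbo.Sales
(
    SaleID   INT      IDENTITY(1,1) NOT NULL,
    SaleDate DATETIME NOT NULL,
    Amount   MONEY    NOT NULL
) ON psSalesYear (SaleDate);

-- Queries that filter on the partitioning column can be eliminated down to one partition.
SELECT SUM(Amount)
FROM dbo.Sales
WHERE SaleDate >= '2009-01-01' AND SaleDate < '2010-01-01';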
 |
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2010-03-26 : 19:54:37
I would remove the clustered primary key index on the email column. An email column as the clustered key? With 30+ million records? What do you think happens when you insert the email value "aaron@gmail.com"?

Make the index over email a unique nonclustered index instead, and use INCLUDE to incorporate the other columns.

N 56°04'39.26"E 12°55'05.63"
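A hedged sketch of that change, reusing the assumed names from the earlier sketch (dbo.Contacts, PK_Contacts); the INCLUDE list would hold whichever columns your common queries actually read, which the post does not specify.

-- Drop the existing clustered primary key on Email (constraint name is assumed):
ALTER TABLE dbo.Contacts DROP CONSTRAINT PK_Contacts;

-- Unique nonclustered index on Email; INCLUDE carries other columns at the leaf level
-- so lookups by email can be answered from the index alone.
CREATE UNIQUE NONCLUSTERED INDEX IX_Contacts_Email
    ON dbo.Contacts (Email)
    INCLUDE (name, company, phone);

Note that dropping the clustered primary key leaves the table as a heap until a new clustered index is chosen, which is what the following posts go on to discuss.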
 |
Kristen
Test
22859 Posts
Posted - 2010-03-29 : 03:52:01
"What do you think happens when you insert email value "aaron@gmail.com"?"It will be inserted right after "aardvark@hotmail.com" Peso: perhaps clarify what your recommended Clustered Index would be? |
 |
SwePeso
Patron Saint of Lost Yaks
30421 Posts
Posted - 2010-03-29 : 06:31:04
And before the other 30 million records.

The short answer is "It depends". You have to know what type of queries are run against this table. Without knowing that, I would guess a clustered index on the ID column would be a not-bad choice. If email is used as the key instead, I would probably go for a unique nonclustered index over email and INCLUDE the other columns as they are used.

N 56°04'39.26"E 12°55'05.63"
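A minimal sketch of that "not-bad choice", assuming the old clustered primary key on Email has already been dropped (as in the earlier sketch) and reusing the assumed table name; new identity values are ever increasing, so inserts go to the end of the clustered index instead of splitting pages at random points.

-- Cluster on the identity column (constraint name is assumed):
ALTER TABLE dbo.Contacts
    ADD CONSTRAINT PK_Contacts_ID PRIMARY KEY CLUSTERED (ID);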
 |