
 clustered index on a VERY big table


jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2006-10-26 : 23:39:21
Anyone care to guess how long it took me to build a clustered index on a table with 900 million rows? This is the largest amount of data in a single table I've had to work with so far in my career.

Some details:

1. running SQL Server 2005 on a dual-proc 32-bit server, 8 GB RAM, hyperthreaded, 3 GHz clock

2. ddl for table:

CREATE TABLE [dbo].[fld](
    [id]    [bigint]   NOT NULL,
    [id2]   [tinyint]  NOT NULL,
    [extid] [bigint]   NOT NULL,
    [dd]    [bit]      NOT NULL,
    [mp]    [tinyint]  NOT NULL,
    [ss]    [tinyint]  NOT NULL,
    [cc]    [datetime] NOT NULL,
    [ff]    [tinyint]  NOT NULL,
    [mm]    [smallint] NOT NULL,
    [ds]    [smallint] NOT NULL
)


3. ddl for index (this is the only index on the table):

CREATE CLUSTERED INDEX [CIfld]
ON [dbo].[fld]
(
    extid ASC
) WITH (FILLFACTOR = 100, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF)


4. The extid column was not sorted to begin with; its ordering was completely random.

Note that I have changed the column names, etc, to protect the innocent.
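For scale, here's a rough back-of-envelope size estimate. The per-row data size follows from the DDL above; the ~10 bytes of heap row overhead is an approximation, not a measured figure:

```sql
-- All columns are fixed-width: 8+1+8+1+1+1+8+1+2+2 = 33 bytes of data per row.
-- Add roughly 10 bytes of per-row header/overhead on a heap (approximate).
SELECT CAST(900000000 AS bigint) * (33 + 10) / 1024 / 1024 / 1024 AS approx_gb;
-- on the order of 35-40 GB before the clustered index build
```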

SqlSpec - a fast, cheap, and comprehensive data dictionary generator for
SQL Server 2000/2005 and Analysis Server 2005 - http://www.elsasoft.org

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-10-27 : 03:42:57
15 minutes.


Peter Larsson
Helsingborg, Sweden

mr_mist
Grunnio

1870 Posts

Posted - 2006-10-27 : 03:47:32
Would depend on the disks also.

I guess six hours.

-------
Moo. :)

harsh_athalye
Master Smack Fu Yak Hacker

5581 Posts

Posted - 2006-10-27 : 04:09:25
My guess is somewhere between 30 and 50 minutes.

So what's the answer?

Harsh Athalye
India.
"Nothing is Impossible"

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2006-10-27 : 04:47:24
2 hours 53 minutes and 42 seconds.



Go with the flow & have fun! Else fight the flow
blog thingie: http://weblogs.sqlteam.com/mladenp

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2006-10-27 : 05:15:53
~6 hours

--------------------
keeping it simple...

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2006-10-27 : 05:58:38
True, I forgot to mention the disks. The server is on a decent SAN; not sure of the specs, though.

Anyway, it took 1hr 14min.

I was expecting it to take much longer! Somehow I feel like indexes build MUCH faster on 2005 than on 2000. I haven't done any tests to prove it, though.
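For a future rebuild, one knob worth trying is sorting in tempdb, so the sort I/O doesn't compete with the user database's files. This is only a hypothetical variant of the build above, and it only helps if tempdb sits on separate, fast storage (unknown on this SAN):

```sql
-- Hypothetical rebuild of the same index, spilling the sort to tempdb.
-- DROP_EXISTING = ON rebuilds in place instead of requiring a DROP first.
CREATE CLUSTERED INDEX [CIfld]
ON [dbo].[fld] (extid ASC)
WITH (FILLFACTOR = 100, SORT_IN_TEMPDB = ON, DROP_EXISTING = ON);
```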



mr_mist
Grunnio

1870 Posts

Posted - 2006-10-27 : 06:04:32
I reckon 1:14 is a good time.

The real question, though, is how long it takes to extract anything useful from that much data. Back where I used to work we had one table with 225 million rows, and even the simplest queries took hours.


jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2006-10-27 : 06:16:27
It's thankfully not a transactional table. Once it's built there are no inserts/updates/deletes.

It only has one query that runs against it: a select ordered by the clustered index, no where clause. So it's a fairly cheap query.

It does take a while to go through the whole table though!





spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2006-10-27 : 07:10:05
No where clause? So you have an index scan, no?




SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-10-27 : 07:13:36
So every SELECT against the table returns 900 million rows?
Then why even bother to create an index?




jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2006-10-27 : 10:38:56
I know it sounds crazy, but this table is the input for a huge statistical calculation I have to do. Can't go into details, though; NDA-type stuff.

Yes, I need all 900 million because I am doing a calculation on the whole set.

I need the index because the calc requires the result set to be ordered by extid.





SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-10-27 : 10:45:41
The query will be much, much faster if you add a covering index.
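For example, a nonclustered covering index on 2005 could look like the sketch below. The index name and column list are hypothetical; the INCLUDE list should match whatever columns the query actually selects:

```sql
-- Hypothetical covering index: keyed on the sort column,
-- with the remaining selected columns carried in the leaf via INCLUDE.
CREATE NONCLUSTERED INDEX [IX_fld_extid_cov]
ON [dbo].[fld] (extid ASC)
INCLUDE (id, id2, dd, mp, ss, cc, ff, mm, ds);
```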



spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2006-10-27 : 11:08:07
Maybe this'll be of use:
http://weblogs.sqlteam.com/mladenp/articles/9502.aspx




jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2006-10-27 : 12:14:32
I don't find the query to be the bottleneck. It returns the first rows pretty much immediately; what takes time is crunching the numbers. Luckily the calculation can be parallelized, so I use multiple threads. Even so, it keeps a 4-way, hyperthreaded box pretty busy for a while.

Mladen - good tip. I do find that if I add a meaningless where clause, the plan goes from a scan to a seek. I don't see much difference in how fast the first rows come back, though.
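A sketch of that trick (assuming extid values are non-negative, which isn't stated above): the predicate matches every row, but it gives the optimizer a seekable range on the clustered key:

```sql
-- Hypothetical "meaningless" predicate: if every extid is >= 0,
-- this returns all rows but can plan as a range seek on CIfld
-- rather than a full clustered index scan.
SELECT id, id2, extid, dd, mp, ss, cc, ff, mm, ds
FROM dbo.fld
WHERE extid >= 0
ORDER BY extid;
```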



SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2006-10-27 : 15:13:39
Maybe the number crunching could be faster with a better algorithm?



jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2006-10-27 : 16:23:23
quote:
Originally posted by Peso
Maybe the number crunching could be faster with a better algorithm?



Maybe, but it's more than fast enough as is. Thanks for the ideas though!



Kristen
Test

22859 Posts

Posted - 2006-10-28 : 03:34:08
Reminds me of an oil exploration company (processing geological data looking for oil) that had a CPU monitor set to alert the admins if CPU activity fell BELOW 90%!

jezemine
Master Smack Fu Yak Hacker

2886 Posts

Posted - 2006-10-28 : 09:52:11
Actually, in order to *generate* the 900m row dataset (a pure calc, no SQL involved), I kept NINE 4-way servers pegged at 100% CPU for 7 days! I didn't have an alert, though. Just checked Task Manager every once in a while to make sure each was still pegged.

I like to think about how hot the rack got in the server room during that marathon. Better wear oven mitts!

