Author |
Topic |
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2006-10-26 : 23:39:21
|
Anyone care to guess how long it took me to build a clustered index on a table with 900 million rows? This is the largest amount of data in a single table I have had to work with thus far in my career. Some details:

1. Running SQL 2005 on a dual-proc 32-bit server, 8 GB RAM, hyperthreaded, 3 GHz clock.

2. DDL for the table:

CREATE TABLE [dbo].[fld](
    [id]    [bigint]   NOT NULL,
    [id2]   [tinyint]  NOT NULL,
    [extid] [bigint]   NOT NULL,
    [dd]    [bit]      NOT NULL,
    [mp]    [tinyint]  NOT NULL,
    [ss]    [tinyint]  NOT NULL,
    [cc]    [datetime] NOT NULL,
    [ff]    [tinyint]  NOT NULL,
    [mm]    [smallint] NOT NULL,
    [ds]    [smallint] NOT NULL
)

3. DDL for the index (this is the only index on the table):

CREATE CLUSTERED INDEX [CIfld] ON [dbo].[fld]
(
    extid ASC
)
WITH (FILLFACTOR = 100, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF)

4. The extid column was not sorted to begin with; the ordering was completely random.

Note that I have changed the column names, etc., to protect the innocent.

SqlSpec - a fast, cheap, and comprehensive data dictionary generator for SQL Server 2000/2005 and Analysis Server 2005 - http://www.elsasoft.org |
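For scale, a rough back-of-envelope estimate of the row and table size from the DDL above (a sketch only; it ignores page headers, slot arrays, and per-page free space, and assumes SQL Server's usual fixed-length row overhead):

```python
# Approximate storage for the fld table: fixed-length column widths in bytes.
col_bytes = {
    "id": 8, "id2": 1, "extid": 8, "dd": 1,  # a lone bit column still occupies 1 byte
    "mp": 1, "ss": 1, "cc": 8, "ff": 1, "mm": 2, "ds": 2,
}
data = sum(col_bytes.values())   # 33 bytes of column data per row
overhead = 4 + 2 + 2             # row header + column count + null bitmap (approx.)
row = data + overhead            # ~41 bytes per row
rows = 900_000_000
print(f"~{row} B/row, ~{row * rows / 2**30:.0f} GiB of row data")  # ≈ 34 GiB
```

So the sort that built the clustered index had to shuffle on the order of 34 GiB of randomly-ordered row data, which makes the reported build time easier to reason about.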
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-10-27 : 03:42:57
|
15 minutes.

Peter Larsson
Helsingborg, Sweden |
 |
|
mr_mist
Grunnio
1870 Posts |
Posted - 2006-10-27 : 03:47:32
|
Would depend on the disks also. I guess six hours.

-------
Moo. :) |
 |
|
harsh_athalye
Master Smack Fu Yak Hacker
5581 Posts |
Posted - 2006-10-27 : 04:09:25
|
My guess is somewhere between 30 and 50 minutes. So what's the answer?

Harsh Athalye
India.
"Nothing is Impossible" |
 |
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2006-10-27 : 04:47:24
|
2 hours 53 minutes and 42 seconds.

Go with the flow & have fun! Else fight the flow

blog thingie: http://weblogs.sqlteam.com/mladenp |
 |
|
jen
Master Smack Fu Yak Hacker
4110 Posts |
Posted - 2006-10-27 : 05:15:53
|
~6 hours

--------------------
keeping it simple... |
 |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2006-10-27 : 05:58:38
|
True, I forgot to tell you about the disks. The server is on a decent SAN; not sure of the specs, though.

Anyway, it took 1 hr 14 min. I was expecting it to take much longer! Somehow I feel like indexes build MUCH faster on 2005 than on 2000. I haven't done any tests to prove it, though.

SqlSpec - a fast, cheap, and comprehensive data dictionary generator for SQL Server 2000/2005 and Analysis Server 2005 - http://www.elsasoft.org |
 |
|
mr_mist
Grunnio
1870 Posts |
Posted - 2006-10-27 : 06:04:32
|
I reckon 1:14 is a good time. The real question, though, is how long it takes to extract any useful queries from that much data. Back where I used to work we had one table with 225 million rows, and it was hours of work to get the simplest of queries out of it.

-------
Moo. :) |
 |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2006-10-27 : 06:16:27
|
Thankfully it's not a transactional table. Once it's built there are no inserts/updates/deletes. Only one query runs against it: a SELECT ordered by the clustered index, with no WHERE clause. So it's a fairly cheap query. It does take a while to go through the whole table, though!

SqlSpec - a fast, cheap, and comprehensive data dictionary generator for SQL Server 2000/2005 and Analysis Server 2005 - http://www.elsasoft.org |
 |
|
spirit1
Cybernetic Yak Master
11752 Posts |
Posted - 2006-10-27 : 07:10:05
|
No WHERE? So you have an index scan, no?

Go with the flow & have fun! Else fight the flow

blog thingie: http://weblogs.sqlteam.com/mladenp |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-10-27 : 07:13:36
|
So every SELECT against the table returns 900 million rows? Then why even bother to create an index?

Peter Larsson
Helsingborg, Sweden |
 |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2006-10-27 : 10:38:56
|
I know it sounds crazy - this table is the input for a huge statistical calculation I have to do. I can't go into details, though; NDA-type stuff.

Yes, I need all 900 million rows because I am doing a calculation on the whole set. I need the index because the calc requires the result set to be ordered by extid.

SqlSpec - a fast, cheap, and comprehensive data dictionary generator for SQL Server 2000/2005 and Analysis Server 2005 - http://www.elsasoft.org |
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-10-27 : 10:45:41
|
The query will be much, much faster if you add a covering index.

Peter Larsson
Helsingborg, Sweden |
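The covering-index idea is that an index carrying every column the query touches can satisfy the query by itself, with no lookup back into the base table. A small illustration, with SQLite standing in for SQL Server and hypothetical columns (the engine's query plan reports when an index alone covers the query):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE fld (extid INTEGER, mm INTEGER, ds INTEGER)")
# The index holds every column the query reads, so no row lookup is needed
# and the ORDER BY comes for free from the index order.
con.execute("CREATE INDEX ix_cover ON fld (extid, mm, ds)")
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT extid, mm, ds FROM fld ORDER BY extid"
).fetchall()
print(plan[0][-1])  # plan detail mentions "COVERING INDEX ix_cover"
```

Note that for a query reading all columns of the table with no WHERE clause, as in this thread, a covering index would essentially duplicate the clustered index, so the benefit here is debatable.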
 |
|
spirit1
Cybernetic Yak Master
11752 Posts |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2006-10-27 : 12:14:32
|
I don't find the query to be the bottleneck. The query returns its first rows pretty much immediately; what takes time is crunching the numbers. Luckily the calculation can be parallelized, so I use multiple threads. Even so, it keeps a 4-way, hyperthreaded box pretty busy for a while.

Mladen - good tip. I do find that if I add a meaningless WHERE clause, my plan goes from a scan to a seek. I don't see much difference in how fast the first rows come back, though.

SqlSpec - a fast, cheap, and comprehensive data dictionary generator for SQL Server 2000/2005 and Analysis Server 2005 - http://www.elsasoft.org |
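The multi-threaded crunching described above might look something like this sketch (purely illustrative; the real calculation is under NDA, so a chunked sum of squares stands in for it, and the chunk size and worker count are made-up numbers):

```python
from concurrent.futures import ThreadPoolExecutor

def crunch(chunk):
    # Stand-in for the real per-chunk statistical calculation.
    return sum(x * x for x in chunk)

def parallel_crunch(rows, workers=8, chunk_size=100_000):
    # Split the extid-ordered stream into chunks and fan them out to a
    # thread pool - the shape that keeps a 4-way hyperthreaded box
    # (8 logical CPUs) busy while rows keep streaming in.
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(crunch, chunks))
```

Because pool.map preserves chunk order, this shape also works when the per-chunk results must be combined in extid order rather than just summed.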
 |
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2006-10-27 : 15:13:39
|
Maybe the number crunching could be speedier with a better algorithm?

Peter Larsson
Helsingborg, Sweden |
 |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2006-10-27 : 16:23:23
|
quote: Originally posted by Peso

Maybe the number crunching could be speedier with a better algorithm?

Maybe, but it's more than fast enough as is. Thanks for the ideas, though!

SqlSpec - a fast, cheap, and comprehensive data dictionary generator for SQL Server 2000/2005 and Analysis Server 2005 - http://www.elsasoft.org |
 |
|
Kristen
Test
22859 Posts |
Posted - 2006-10-28 : 03:34:08
|
Reminds me of an oil exploration company (processing geological data looking for oil) that had a CPU monitor alert - set to alert the ADMINS if CPU activity fell BELOW 90%!! |
 |
|
jezemine
Master Smack Fu Yak Hacker
2886 Posts |
Posted - 2006-10-28 : 09:52:11
|
Actually, in order to *generate* the 900M-row dataset (a pure calc, no SQL involved), I kept NINE 4-way servers pegged at 100% CPU for 7 days! I didn't have an alert, though; I just checked Task Manager every once in a while to make sure each was still pegged.

I like to think about how hot the rack got in the server room during that marathon. Better wear oven mitts!

SqlSpec - a fast, cheap, and comprehensive data dictionary generator for SQL Server 2000/2005 and Analysis Server 2005 - http://www.elsasoft.org |
 |
|
|