matti
Starting Member
4 Posts

Posted - 2010-12-27 : 10:34:14

Hi, I created a table with 10 columns, one column (RecordID INT) as the primary key, identity, clustered index. There are about 70 million records in this table now and some performance issues are arising in queries.

This select runs fast:

    SELECT * FROM MyTable WHERE RecordID = MyID

But this is another story:

    SELECT * FROM MyTable WHERE RecordID IN (some large set of IDs, e.g. 2000 values)

The latter one takes up to 120 seconds, which is extremely slow. I'm confused, because I thought this was the simplest possible query and there was no way I could tune it or make it simpler. I'm looking up the clustered index, and as far as I know that is the most efficient lookup that can be made. Am I wrong, or is it just caused by such a large number of rows? Is there anything I can do except using partitions on the table?

Thank you for any advice!
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts

Posted - 2010-12-27 : 10:49:05

>> Is there anything I can do except using partitions on the table?

What makes you think that will help? Have you updated statistics and recompiled plans? Have a look at the query plan - I guess it's doing a scan. Do you need the SELECT *, or can you restrict the number of columns and add a covering index?

Shouldn't make a difference, but try:

    SELECT t.*
    FROM MyTable t
    JOIN (some large set of IDs, e.g. 2000 values) a
        ON t.RecordID = a.RecordID

Also try:

    SELECT (some large set of IDs, e.g. 2000 values)
    INTO #a

    CREATE UNIQUE CLUSTERED INDEX ix ON #a (RecordID)

    SELECT t.*
    FROM MyTable t
    JOIN #a a
        ON t.RecordID = a.RecordID

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
Kristen
Test
22859 Posts

Posted - 2010-12-27 : 12:27:28

"IN (some large set of IDs, e.g. 2000 values)."

Frequently slow, in my experience. Use JOIN (if there is only one matching row) or EXISTS (if there are multiple matching rows), as they are usually faster than IN. Make sure the columns used in the JOIN / EXISTS are indexed.
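A minimal sketch of the EXISTS form described above; the table names are carried over from earlier in the thread, and #ids is a hypothetical stand-in for "some large set of IDs":

```sql
-- #ids is assumed to hold the 2000 lookup values, one per row
SELECT t.*
FROM MyTable AS t
WHERE EXISTS (
    SELECT 1
    FROM #ids AS i
    WHERE i.RecordID = t.RecordID   -- both sides should be indexed
);
```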
matti
Starting Member
4 Posts

Posted - 2010-12-28 : 07:46:21

Thank you all for your comments. To make it as simple as possible, I'm selecting only the RecordID from MyTable.

At first I used a table variable as the storage table for the set of IDs:

    DECLARE @IDs TABLE ([RecordID] INT PRIMARY KEY CLUSTERED)

and joined it with MyTable. The select took the same time as before, so JOIN is not faster than IN in this case.

Then I created a #tmp table and joined it with MyTable again, and this was a crucial improvement: from 120 seconds at best in the first case, it now takes only 15 seconds. Still, that is too long. I would like to get it down to 5 seconds. Do you think that is possible? This is the code:

    CREATE TABLE #tmp ([RecordID] INT PRIMARY KEY CLUSTERED)

    INSERT INTO #tmp MyIDs

    SELECT o.RecordID
    FROM [dbo].[Detail] o
    INNER JOIN #tmp ON #tmp.RecordID = o.RecordID

Here is the execution plan: http://www.4freeimagehost.com/show.php?i=aadc2754eae3.png

As for the partitions, it is possible to divide my large table into partitions by month, and then the select is supposed to scan only one partition (there would be an additional date condition).
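To see where the 15 seconds actually go, one option is to measure the join step on its own with the statistics session settings; a sketch, reusing the #tmp and [Detail] names from the post above:

```sql
-- Report elapsed/CPU time and logical/physical reads for just this statement
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT o.RecordID
FROM [dbo].[Detail] AS o
INNER JOIN #tmp ON #tmp.RecordID = o.RecordID;

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

A high physical-reads count here, rather than a high CPU time, would point at disk rather than the plan.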
Kristen
Test
22859 Posts

Posted - 2010-12-29 : 09:49:26

I can't see that selecting 2,000 rows from a table is going to take 15 seconds ... or 5 seconds, for that matter ... provided you have indexes and their statistics are up to date. I can't tell from the picture you posted, but it looks like you are selecting on the Clustered Index on the main table.

So maybe the time is taken up with something else? Whatever you are doing in the "INSERT INTO #tmp MyIDs" step, perhaps?
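If stale statistics are a suspect, refreshing them is cheap to try; a sketch, assuming the MyTable name used earlier in the thread:

```sql
-- Refresh all statistics on the table with a full scan
UPDATE STATISTICS dbo.MyTable WITH FULLSCAN;

-- Rebuilding the indexes also refreshes their statistics as a side effect
ALTER INDEX ALL ON dbo.MyTable REBUILD;
```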
Kristen
Test
22859 Posts

Posted - 2010-12-29 : 13:41:02

Indeed, but the query plan shown is only for the SELECT, I think? It doesn't seem reasonable to me that a JOIN between an indexed temp table containing 2,000 records and the clustered INT index on the main table would take 5 seconds or more. Thus I wonder if the timing includes other steps - such as splitting a CSV or some such ...

(Or the index on the main table is badly fragmented, perhaps.)
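If the 2,000 IDs do arrive as a comma-separated string, the split step can indeed dominate. On modern SQL Server versions (2016 and later, well after this thread) a hedged sketch of loading such a list into the indexed temp table would be:

```sql
-- @csv is a hypothetical input; STRING_SPLIT requires SQL Server 2016+
DECLARE @csv NVARCHAR(MAX) = N'1,2,3';

CREATE TABLE #tmp ([RecordID] INT PRIMARY KEY CLUSTERED);

INSERT INTO #tmp ([RecordID])
SELECT CAST(value AS INT)
FROM STRING_SPLIT(@csv, N',');
```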
tkizer
Almighty SQL Goddess
38200 Posts |
|
Kristen
Test
22859 Posts

Posted - 2010-12-29 : 16:53:38

I missed the 96% of batch cost, thanks. It still doesn't feel right to me that selecting 2,000 rows on an INT clustered index key would take so long. So maybe fragmentation? e.g.

    SELECT *
    FROM
    (
        SELECT  [Database] = db.name,
                [Object ID] = ps.OBJECT_ID,
                [Name] = OBJECT_NAME(ps.OBJECT_ID),
                [Index ID] = ps.index_id,
                [Index Name] = I.[name],
                [Partition No.] = ps.partition_number,
                [Fragmentation] = ps.avg_fragmentation_in_percent,
                [Page count] = ps.page_count
        FROM    sys.dm_db_index_physical_stats (DB_ID(DB_NAME()), NULL, NULL, NULL, N'Limited') AS ps
                JOIN sys.databases AS db ON db.database_id = ps.database_id
                JOIN sys.indexes AS I ON I.object_id = ps.object_id AND I.index_id = ps.index_id
        WHERE   ps.index_id > 0
                AND OBJECT_NAME(ps.OBJECT_ID) = 'MyTableName'
    ) AS X
    ORDER BY [Name], [Index ID]
    OPTION (MAXDOP 1)
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts

Posted - 2010-12-29 : 20:43:51

% batch cost doesn't translate to time. It might do, but it's possible to have 90% of the cost and 10% of the time. Treat it with caution.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts

Posted - 2010-12-29 : 23:03:17

Take a query and run it twice: each run will show 50% of the cost. If there is enough memory for the data to be held, the first run will be doing disk reads, the second won't, and it will be much faster. All part of the art of building efficient systems: make sure you do as much processing as possible on data while it is in memory.

I've seen people spend a lot of time optimising things because of their cost when they actually don't improve the time spent executing the batch. It's why I always say not to rely on the query plan alone - also log the start and end time of statements in batch systems.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
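A minimal sketch of the statement timing suggested here; the #timings table and the step name are hypothetical illustrations, not something from this thread (a real batch system would log to a permanent table):

```sql
-- Hypothetical logging table for per-statement wall-clock times
CREATE TABLE #timings (StepName SYSNAME, StartTime DATETIME, EndTime DATETIME);

DECLARE @start DATETIME = GETDATE();

-- ... the statement being measured goes here ...

INSERT INTO #timings (StepName, StartTime, EndTime)
VALUES (N'main select', @start, GETDATE());
```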
nigelrivett
Master Smack Fu Yak Hacker
3385 Posts

Posted - 2010-12-29 : 23:25:09

That example has one starting point and the assumption that the cache will be empty. The problem is that if you are running an SP, then when it starts the data probably won't be in cache, but as it runs statements, data will be read and available for a time. If you don't take this into account, you can spend time optimising things that aren't necessary, and also miss out on a lot of efficiency to be gained by ordering the processing.

Anyway, cost and time are not related. I worked with someone who was adamant he could predict the time taken by any statement from the cost - even after I showed him timings to disprove it.

==========================================
Cursors are useful if you don't know sql.
SSIS can be used in a similar way.
Beer is not cold and it isn't fizzy.
matti
Starting Member
4 Posts

Posted - 2011-01-03 : 11:24:15

I always run these statements before I do any measurements on my select:

    DBCC FREEPROCCACHE
    DBCC DROPCLEANBUFFERS

4% of the batch is for the initial commands:

    CREATE TABLE #tmp ([RecordID] INT PRIMARY KEY CLUSTERED)

    INSERT INTO #tmp MyIDs

and 96% is for the main select (which takes 15 seconds or more). So the time is measured only for the main select.

The fragmentation of the clustered primary key index on RecordID is 4%. The page count for this index is 400,000. Index maintenance is performed every day.

I have tried these "improvements" to no avail:

1.) SELECT o.RecordID FROM [Detail] o INNER JOIN #tmp ON #tmp.RecordID = o.RecordID OPTION (MERGE JOIN)
2.) SELECT #tmp.RecordID FROM #tmp INNER JOIN [Detail] o ON #tmp.RecordID = o.RecordID OPTION (FORCE ORDER)
3.) SELECT o.RecordID FROM [Detail] o WITH (NOLOCK) INNER JOIN #tmp ON #tmp.RecordID = o.RecordID
4.) I applied a cursor on the #tmp table and, for each record from the cursor, tried to find the RecordID in the [Detail] table (I tried this only because the select executes really fast when there is the condition WHERE RecordID = SomeRecordID). This last shot was really the slowest one.

To sum up all your comments:

- 70 million rows is not that much for a properly designed table (which I think this is, because there is only one primary key clustered index)
- this table has a primary key clustered index on RecordID, and there is no apparent reason for my select to take so long when the only value selected and the only column used in the condition/join clause is RecordID
- the table's indexes are rebuilt every day, and my measurements are performed with the clustered index on RecordID fragmented at most 4%

Now I'm going to check the situation at the hardware level to see if there is any bottleneck. Since this database is in heavy use and server I/O rises up to 10 MB/s providing data to many applications, this could be an issue. I will provide you with results this week. Thank you all for your comments.
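One way to check for an I/O bottleneck from inside SQL Server is the virtual file stats DMV. A hedged sketch - the stall-per-operation derivation is a common diagnostic, not something from this thread:

```sql
-- Average I/O stall (ms) per read and per write for each database file;
-- consistently high read stalls suggest the storage, not the plan, is slow
SELECT  DB_NAME(vfs.database_id) AS [Database],
        mf.physical_name,
        vfs.num_of_reads,
        vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads,  0) AS avg_read_stall_ms,
        vfs.num_of_writes,
        vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM    sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN    sys.master_files AS mf
        ON mf.database_id = vfs.database_id
       AND mf.file_id = vfs.file_id
ORDER BY avg_read_stall_ms DESC;
```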
Kristen
Test
22859 Posts

Posted - 2011-01-03 : 11:34:27

I think the hardware is unlikely to be the limiting factor. But ... I share your conclusions ... and thus checking the hardware would at least allow it to be excluded as a possible cause (or, indeed, for it to BECOME the cause).
matti
Starting Member
4 Posts

Posted - 2011-01-20 : 11:55:19

First of all, thank you all for your advice; I learned a lot from your comments. The problem I had was a hardware issue. There is a large number of databases on the server. I always checked processor and memory usage and only occasionally took a look at disk I/O. Moreover, disk I/O was not critically high (up to 20 MB/sec at peak), but the bottleneck was the large number of random accesses. Once we found this out, we moved some databases to a new hard drive and everything works fine.