R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-06-30 : 12:22:19
Hi

I'm not sure if this is posted in the correct forum topic area; please advise if not and I'll re-post. I'm just looking for some general advice.

I have an online application that uses SQL 2005 and ASP.NET. Some database tables get a lot more use than others, and today I was asked how "full" these tables need to get before the server starts struggling to process requests. Currently the database tables are very small (fewer than 10,000 rows), so to test the question I duplicated a table into a test database and inserted 15,000,000 rows into it (making sure to duplicate the table's index as well).

Suddenly, SELECT queries that take a fraction of a second on the main database table now take over a minute on the test data. This is okay for me, but not for my clients, who have no patience and would think the application has stopped working.

So I'm looking for advice, please, on what DBAs generally do when databases become very large. I know I could upgrade the server to a more powerful spec, but I've heard buzzwords like "data warehousing" and wondered whether that is relevant here? Please feel free to share your methods with me so I can go away and research them. Thank you in advance!
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-06-30 : 15:59:15
quote: Originally posted by tkizer
15 million rows in a table is small these days. You are probably missing indexes on your table if your queries are performing slowly.

Hi, and thanks for your replies. Here is the DDL:

CREATE TABLE [dbo].[tbl_Test](
    [ID] [int] IDENTITY(1,1) NOT NULL,
    [sID] [int] NOT NULL,      -- foreign key
    [sgID] [int] NOT NULL,     -- foreign key
    [userID] [int] NOT NULL,   -- foreign key
    [EventID] [int] NOT NULL,  -- foreign key
    [newSkillLevel] [numeric](5, 2) NULL,
    [targetScore] [numeric](5, 2) NOT NULL,
    [comments] [nvarchar](max) NULL,
    [adminComments] [nvarchar](max) NULL,
    [dateAttained] [datetime] NOT NULL DEFAULT (getutcdate()),
    CONSTRAINT [PK_tbl_Test_1] PRIMARY KEY CLUSTERED ([ID] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

And here are the two indexes that exist:

ALTER TABLE [dbo].[tbl_Test] ADD CONSTRAINT [PK_tbl_Test_1] PRIMARY KEY CLUSTERED ([ID] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
          IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
          ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

CREATE NONCLUSTERED INDEX [sgID_sID_uID] ON [dbo].[tbl_Test]
    ([sgID] ASC, [sID] ASC, [userID] ASC)
    WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
          IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,
          ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

Please note that I haven't copied the primary key tables/columns from my original database, because it would have taken a long time to recreate the primary key data for each, so this is a simplified example.

The SELECT statements vary quite a lot depending on what data is needed, but the WHERE clause always has the following form:

WHERE (sID = @sID)
  AND (sgID = @sgID)
  AND (userID = @userID OR @userID IS NULL)
  AND (eventID = @eventID OR @eventID IS NULL)

Because the database is small, I haven't rebuilt the indexes. When I inserted the test rows, would I have needed to rebuild afterwards? I would have thought the index would be in a good state directly afterwards.

It gives me some reassurance that 15 million rows is relatively small! I look forward to reading more of your advice!
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-06-30 : 16:05:53
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

--
Gail Shaw
SQL Server MVP
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-07-01 : 04:05:24
quote: Originally posted by tkizer
Are there joins in your query referencing the foreign key tables? If so, are those columns indexed? SQL Server does not add indexes on foreign keys, though it does for primary keys. I make it a standard practice to add an index to all of my foreign keys.

That's good advice; I was not aware that indexes are not added on foreign keys. My code doesn't always join to the primary key columns/tables, but my index on the above table does include all the foreign key columns.

quote: Originally posted by GilaMonster
http://sqlinthewild.co.za/index.php/2009/03/19/catch-all-queries/

Hi Gail. Among the replies to your post there is this one:

quote: Instead of using this form:

WHERE (key1 = @key1 OR @key1 IS NULL)
  AND (key2 = @key2 OR @key2 IS NULL)
  AND (key3 = @key3 OR @key3 IS NULL)

you should consider using this:

WHERE (key1 = @key1 AND @key1 IS NOT NULL)
   OR (key2 = @key2 AND @key2 IS NOT NULL)
   OR (key3 = @key3 AND @key3 IS NOT NULL)

...I have found in my testing that this solution performs very well; indexes ARE used, performance is great, and dynamic code is not required.

Is this something that you agree with? I'd have to try this code on my test table, but I assume it produces the same output using fewer resources? Or is the code in your original post actually better:

WHERE (ProductID = CASE WHEN @Product IS NULL THEN ProductID ELSE @Product END) AND ...
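[Editor's note: the two WHERE forms quoted above are not logically equivalent, as the thread goes on to discover. A quick sketch outside SQL Server makes the difference concrete; this uses Python's sqlite3 purely as an illustration, with a made-up two-column table standing in for the keys:]

```python
import sqlite3

# Hypothetical table standing in for key1/key2 in the quoted comment.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (key1 INTEGER, key2 INTEGER)")
con.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (1, 99), (3, 2)])

params = {"k1": 1, "k2": 2}  # both parameters supplied

# Original catch-all form: EVERY predicate must hold (or its parameter be NULL).
original = con.execute(
    "SELECT key1, key2 FROM t"
    " WHERE (key1 = :k1 OR :k1 IS NULL)"
    "   AND (key2 = :k2 OR :k2 IS NULL)"
    " ORDER BY key1, key2", params).fetchall()

# Commenter's rewrite: a row qualifies if ANY supplied predicate holds.
rewrite = con.execute(
    "SELECT key1, key2 FROM t"
    " WHERE (key1 = :k1 AND :k1 IS NOT NULL)"
    "    OR (key2 = :k2 AND :k2 IS NOT NULL)"
    " ORDER BY key1, key2", params).fetchall()

print(original)  # [(1, 2)] -- only the row matching BOTH keys
print(rewrite)   # [(1, 2), (1, 99), (3, 2)] -- any row matching EITHER key
```

The AND of ORs demands every supplied filter match; the OR of ANDs accepts a row that matches any one of them, so the rewrite returns a superset whenever more than one parameter is supplied.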
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-07-01 : 07:05:06
Read the comments below that one as well. It's not something I've tested (no time recently), but a later commenter noted that the revised query form had different behaviour from the original.

As for the CASE in my original post, I explained right below how that behaves: table scan. A table scan on 15 million rows is not going to be pretty.

--
Gail Shaw
SQL Server MVP
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-07-01 : 09:24:28
Hi Gail

Sorry if I've missed something really obvious, but I don't follow your answer. I appreciate that CASE is not good, as I read in your article. However, the article comment I quoted above doesn't use CASE.

quote: Read the comments below that as well.

The only comments below seem to refer to dynamic SQL, which I'm not using. I'm just loading up my laptop with the test database on it, to try out the alternative approach I quoted above.

Taking my post a step further: suppose I use the correct indexes and get the SELECTs tuned up, but the table grows to, say, 100,000,000 rows and things are still too slow. Provided I haven't retired and am living in paradise, what is the next step in improving efficiency? Can the database be spread across multiple machines? Again, I'm sorry for my lack of knowledge here; this is the first time I've addressed the subject of huge database management.
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-07-01 : 09:43:56
Actually, I just ran the queries and they produce totally different results, which makes perfect sense once I actually read the code.
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-07-01 : 10:13:39
One thing is confusing me slightly. When I run a stored procedure against my table (which only searches the three foreign key (indexed) columns), the execution plan shows:

Nested Loops (Inner Join)    Cost: 22%
Index Scan  [dbo].[tbl_Test]  Cost: 78%
Key Lookup  [dbo].[tbl_Test]  Cost: 0%

Surely if the stored procedure is only searching indexed columns, shouldn't it use an Index Seek rather than a Scan? Also, why is an Inner Join mentioned when there is no join in the stored procedure?

The query took 55 seconds to search 15,000,000 rows, and I'd like to reduce that if possible without a hardware upgrade!
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-07-01 : 10:30:09
It's because of the catch-all query. SQL cannot do an index seek with a query of that form; an index scan is the best it can do.

The join is between the data retrieved from the index scan and the data (columns needed but not present in the index used) retrieved by the key lookup.

Remove/replace those (@Variable IS NULL OR Column = @Variable) predicates and you'll almost certainly get much better performance. Queries of that form DO NOT perform well.

--
Gail Shaw
SQL Server MVP
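[Editor's note: this behaviour is not specific to SQL Server. Even SQLite's planner gives up on a seek for the catch-all shape, which can be shown with EXPLAIN QUERY PLAN. A sketch in Python's sqlite3, with a cut-down stand-in for tbl_Test; SQLite's optimizer differs from SQL Server's, so treat this only as an analogy:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl_Test"
            " (ID INTEGER PRIMARY KEY, sID INT, sgID INT, userID INT)")
con.execute("CREATE INDEX sgID_sID_uID ON tbl_Test (sgID, sID, userID)")

def plan(sql, params=()):
    # EXPLAIN QUERY PLAN returns rows whose last column describes each step.
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql,
                                                   params))

# Plain equality predicates: the planner can seek ("SEARCH ... USING INDEX").
seek_plan = plan("SELECT * FROM tbl_Test"
                 " WHERE sgID = 1 AND sID = 2 AND userID = 3")

# Catch-all form: the planner falls back to a full scan ("SCAN").
catchall_plan = plan("SELECT * FROM tbl_Test"
                     " WHERE (sgID = :a OR :a IS NULL)"
                     "   AND (sID  = :b OR :b IS NULL)"
                     "   AND (userID = :c OR :c IS NULL)",
                     {"a": None, "b": None, "c": None})

print(seek_plan)      # SEARCH ... USING ... INDEX sgID_sID_uID ...
print(catchall_plan)  # SCAN ...
```

The plan is compiled before parameter values are known, so the `OR :a IS NULL` branch forces the engine to prepare for the all-rows case on every execution, which is exactly Gail's point about SQL Server's cached plans.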
Kristen
Test
22859 Posts |
Posted - 2010-07-01 : 10:46:41
One option is to do something like:

Is @Param1 used? Make a first cut on that, getting PKs into @TempTable.
If @Param1 IS NULL and @Param2 IS NOT NULL, then make a first cut on that instead.

Choose @Param1 (i.e. the one you process first for the first cut) to be the one that will give the most efficient query, balanced against how frequently it is likely to be used. No sense having something "first" that grabs 90% of the database every time!

Then join @TempTable to the others, using the catch-all query to refine to exactly what you want.

You may have to move the various first-cut queries into separate sprocs to get the query plans cached without interference from the variety of paths that could be exercised in an all-in-one sproc.

But, having typed that, we are moving towards dynamic (but parametrised) SQL in SQL 2008, now that we can have EXECUTE permission on sprocs and use certificates to allow dynamic SQL to be used without SELECT permissions being required on the individual tables.
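[Editor's note: Kristen's two-step idea can be sketched outside T-SQL. A toy version in Python's sqlite3, with the table and values invented for illustration; in SQL Server the first cut would go into a real temp table or table variable:]

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl_Test"
            " (ID INTEGER PRIMARY KEY, sID INT, sgID INT, userID INT)")
con.execute("CREATE INDEX ix_sID ON tbl_Test (sID)")
con.executemany("INSERT INTO tbl_Test (sID, sgID, userID) VALUES (?, ?, ?)",
                [(500, 600, 700), (500, 600, 999), (1, 2, 3)])

sID, sgID, userID = 500, 600, None  # @userID not supplied on this call

# First cut: a plain, seekable predicate on the parameter we know is present,
# capturing just the PKs into a temp table.
con.execute("CREATE TEMP TABLE firstcut AS"
            " SELECT ID FROM tbl_Test WHERE sID = ?", (sID,))

# Refine: the catch-all predicates now only sift the small first-cut set,
# instead of forcing a scan of the whole table.
rows = con.execute(
    "SELECT t.ID, t.sID, t.sgID, t.userID"
    " FROM tbl_Test t JOIN firstcut f ON f.ID = t.ID"
    " WHERE (:sgID IS NULL OR t.sgID = :sgID)"
    "   AND (:userID IS NULL OR t.userID = :userID)"
    " ORDER BY t.ID",
    {"sgID": sgID, "userID": userID}).fetchall()

print(rows)  # both sID=500 rows; userID was NULL so it filtered nothing
```

The catch-all cost is paid only against the handful of first-cut rows, which is why choosing a selective first-cut parameter matters so much.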
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-07-01 : 11:06:57
Thank you both for your comments. I wish I had learned this advice a long time ago.

Within the table, the [eventID] column can only ever be one of seven possible integer values. Therefore, should I use that first in the WHERE clause to eliminate the most data straight away?

Could you possibly show me a working example of your advice, based on this initial sproc that I used? I realise this query is not suitable for my large table as currently written:

CREATE PROC myTest
    @sID int = NULL,
    @sgID int = NULL,
    @userID int = NULL,
    @eventID int = NULL
AS
    SELECT *
    FROM [myTest]
    WHERE (eventID = @eventID OR @eventID IS NULL)
      AND (sID = @sID OR @sID IS NULL)
      AND (sgID = @sgID OR @sgID IS NULL)
      AND (userID = @userID OR @userID IS NULL)

Thanks again. This is really a good learning exercise for me.
Kristen
Test
22859 Posts |
Posted - 2010-07-01 : 11:37:50
The WHERE clause won't care what comes first: the query optimiser will look at the whole thing and then decide "how" to tackle it, i.e. create a query plan.

If eventID has only seven distinct values it will be a poor index candidate (not highly selective), so it is likely to cause a table scan.
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-07-01 : 13:04:44
I strongly recommend that you look at the dynamic SQL alternative on my blog. It will be fast regardless of which parameters are passed and which aren't, and it can effectively use indexes.

--
Gail Shaw
SQL Server MVP
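[Editor's note: the shape of that dynamic approach is to assemble only the predicates whose parameters were actually supplied, while keeping placeholders so the query stays parameterised. A sketch in Python's sqlite3, used purely as a stand-in for the parameterised sp_executesql pattern on Gail's blog; the table and values are invented:]

```python
import sqlite3

def search(con, sID=None, sgID=None, userID=None, eventID=None):
    """Build the WHERE clause from only the parameters actually supplied,
    so the engine compiles a simple, seekable query for each combination."""
    clauses, args = [], []
    for col, val in (("sID", sID), ("sgID", sgID),
                     ("userID", userID), ("eventID", eventID)):
        if val is not None:
            clauses.append(col + " = ?")  # placeholders keep it parameterised
            args.append(val)
    sql = "SELECT ID, sID, sgID, userID, eventID FROM tbl_Test"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return con.execute(sql + " ORDER BY ID", args).fetchall()

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tbl_Test (ID INTEGER PRIMARY KEY,"
            " sID INT, sgID INT, userID INT, eventID INT)")
con.executemany("INSERT INTO tbl_Test (sID, sgID, userID, eventID)"
                " VALUES (?, ?, ?, ?)",
                [(500, 600, 700, 1), (500, 600, 701, 2), (9, 9, 9, 1)])

print(search(con, sID=500, sgID=600))  # the two sID=500/sgID=600 rows
print(search(con, eventID=1))          # the two rows with eventID = 1
```

Because each distinct combination of supplied parameters produces a distinct SQL string, each gets its own cached plan, avoiding the one-plan-fits-all problem of the catch-all form.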
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-07-02 : 12:14:59
Having removed eventID from the index, rebuilt the index, and finally altered the sproc so that it now only searches sID, sgID and userID (the indexed columns), the execution time has fallen to 31 seconds.

The execution plan still shows INDEX SCAN at 72% when the proc is used as follows:

EXEC testProc @sID = 500, @sgID = 500, @userID = 500

Could you please advise why an index SEEK is not being used instead, when only indexed columns are being searched?
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-07-02 : 13:00:39
As I stated earlier:

quote: It's because of the catch-all query. SQL cannot index seek with a query of that form. Index scan is the best that it can do.

--
Gail Shaw
SQL Server MVP
R
Constraint Violating Yak Guru
328 Posts |
Posted - 2010-07-04 : 03:57:14
Hi Gail

I incorrectly presumed that when values were passed to the sproc for each parameter, the query optimizer would interpret this as:

WHERE (sID = 500) AND (sgID = 600) AND (userID = 700)

Thanks for clarifying, and for sharing your knowledge with me. It is much appreciated.
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2010-07-04 : 05:37:22
No. The optimiser cannot rewrite the proc. Remember, execution plans are cached and reused for later executions of the proc, so any plan generated has to be safe for reuse. That is, it has to return correct results no matter what parameters are passed. Hence it must allow for the possibility of the parameters being NULL, because on future executions that may be the case.

--
Gail Shaw
SQL Server MVP
Kristen
Test
22859 Posts |
Posted - 2010-07-04 : 05:51:26
"WHERE (sID = 500) AND (sgID = 600) AND (userID = 700)"

Hence my suggestion to check for parameters being non-null:

IF @sID IS NOT NULL
BEGIN
    SELECT ...
    FROM ...
    WHERE (sID = @sID)
      AND (@sgID IS NULL OR sgID = @sgID)
      AND (@userID IS NULL OR userID = @userID)
END
ELSE IF @sgID IS NOT NULL
BEGIN
    SELECT ...
    FROM ...
    WHERE sgID = @sgID
      AND (@userID IS NULL OR userID = @userID)
END
ELSE
    ...

...although, as I suggested, each of those queries may need to be a separate sub-sproc call so that each gets its own query plan. Put the tests for the most common / most efficient query first.