migil
Starting Member
4 Posts
Posted - 2004-09-08 : 08:27:26
This is an odd one. Almost certainly related to query plans, index use, etc.

I have a simple query (part of a large stored procedure) returning a single row from a two-table join. The tables have ~1,000,000 and ~5,000,000 rows respectively. The join is on a single column (an integer, and the indexed primary key in one table), and one further column on each table is matched against a local @variable. The two tables have several possible indexes to use.

This morning the query was giving me a massive performance hit (~8 sec instead of ~20 ms). Performance was restored if I replaced the local variables with specific values (integers all) - !! So I pulled this query out of the procedure and put it three times (query with values/@variables/values) into (a) its own proc and (b) a script. I.e.:

begin
    declare @v integer, @n integer
    select @v = 1, @n = 2
    select * from a, b where a_key = b_key and a_value = 1 and b_number = 2
    select * from a, b where a_key = b_key and a_value = @v and b_number = @n
    select * from a, b where a_key = b_key and a_value = 1 and b_number = 2
end

[typical elapsed times 13 ms, 7500 ms, 0 ms - so not a caching issue]

The results were inconsistent, odd and annoying... On two out of three servers the same performance hit was observed in the procedure using local @variables compared to values. The performance hit was NOT observed in the script. Nor was it observed in either the proc or the script on the third server...

Using Query Analyzer (which I am not as familiar with as I ought to be) it is clear that slightly different query plans are being used, with the introduction of a 'bookmark/filter' (?) in the versions that ran faster. The third server (which is an older version, 8.00.194) uses the same query plan each time.

What is going on?! And more importantly, how do I fix it to run the faster version of the query plan!

Thanks for your help.

mike
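[One way to see the plan difference described above, keeping the same a/b placeholder names, is to capture the estimated plans for the constant and @variable forms side by side. SET SHOWPLAN_TEXT is available in SQL Server 2000; this is a sketch against the placeholder tables, not the real schema:]

```sql
-- Return the estimated operator tree instead of executing the batch.
SET SHOWPLAN_TEXT ON
GO
declare @v integer, @n integer
select @v = 1, @n = 2
-- Constant form vs. @variable form: compare the two plan trees
-- (e.g. index seek + bookmark lookup vs. a scan) in the output.
select * from a, b where a_key = b_key and a_value = 1 and b_number = 2
select * from a, b where a_key = b_key and a_value = @v and b_number = @n
GO
SET SHOWPLAN_TEXT OFF
GO
```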
mr_mist
Grunnio
1870 Posts
Posted - 2004-09-08 : 08:45:58
Smallint would be faster than int if you can afford the fewer digits.

-------
Moo. :)
 |
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts
Posted - 2004-09-08 : 08:47:11
sounds an unusual problem... however... some simple stuff to begin with...

1... use a tool like Red Gate to compare server definitions... are all identical in structure/load?
2... "(which is an older version, 8.00.194)"... version of what?
3... can you supply DDL for the affected tables?... including indices?
4... "select * from a inner join b on a_key = b_key where a_value = 1 and b_number = 2" is a more readable format... and it also allows specifying of index hints.
5... you can/should clear/reset query plans before each phase of the queries/scripts... for benchmarking.
6... can you post expected/actual execution plans?
7... SQL Profiler is another tool for your problem investigation toolkit.
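[For point 5, a minimal benchmarking preamble might look like the following. Both DBCC commands are server-wide and need sysadmin rights, so run them only on a test box; 'my_proc' is a placeholder name:]

```sql
-- Flush cached plans and clean buffer pages so each timing run starts cold.
DBCC FREEPROCCACHE        -- drop all compiled plans from the procedure cache
DBCC DROPCLEANBUFFERS     -- drop clean pages from the buffer pool
GO
-- Per-procedure alternative: mark one proc for recompile on its next execution.
EXEC sp_recompile 'my_proc'
```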
 |
migil
Starting Member
4 Posts
Posted - 2004-09-08 : 09:31:35
Originally posted by AndrewMurphy:

1... are all identical in structure/load?
Similar - same database reloaded into the other two servers - all effectively single user.

2... "(which is an older version, 8.00.194)"... version of what?
Microsoft SQL Server 2000 - 8.00.194 (Intel X86) Aug 6 2000 00:57:48 etc.

3... can you supply DDL for the affected tables?... including indices?
Posted below.

4... and it also allows specifying of index hints.
I use 'table with(index(index_name))' OK in my current format. In this case using the obvious (unique key) indexes has no effect.

5... you can/should clear/reset query plans... for benchmarking.
Is this possible in a stored procedure? How do you do it?

6... can you post expected/actual execution plans?
Need to work out how to get them in a postable form...

7... SQL Profiler is another tool...
Will investigate.

DDLs for tables:

------------------------------------------------
create table sequences
(
    seq_id        ID,
    seq_type      CODE,
    seq_source    CODE,
    seq_obj_id    ID,
    seq_end       tinyint null,
    ::: (other columns)
    seq_masked    char(1) null,
    seq_revcomp   char(1) null,
    seq_tag       varchar(16) null,
    seq_sequence  text
)
go
create unique nonclustered index seq_unique on sequences ( seq_id )
create nonclustered index seq_source on sequences ( seq_obj_id )
create unique nonclustered index seq_object on sequences ( seq_obj_id, seq_end, seq_version, seq_revcomp, seq_masked, seq_tag )
go
------------------------------------------------
create table project_sequences
(
    psq_prj_id         ID,
    psq_version        integer default 1,
    psq_seq_id         ID,
    psq_parent_seq_id  ID null,
    psq_offset         integer null
    ::: (other columns)
)
create unique nonclustered index psq_unique on project_sequences ( psq_prj_id, psq_version, psq_seq_id )
create nonclustered index psq_parent on project_sequences ( psq_prj_id, psq_version, psq_parent_seq_id )
create nonclustered index psq_seq on project_sequences ( psq_seq_id )
go
------------------------------------------------
 |
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts
Posted - 2004-09-08 : 10:09:21
in your query what corresponds to what?

"select * from a, b where a_key = b_key and a_value = 1 and b_number = 2"

a = sequences
b = project_sequences
a_key = seq_id
b_key = psq_seq_id
a_value = ?
b_number = ?

is this a parent-child relationship... based on seq_id -> psq_seq_id?

if so... reversing the columns of index psq_unique to (psq_seq_id, psq_version, psq_prj_id) would give the same uniqueness as before... with less/no need for index psq_seq... at no obvious cost... (as a non-clustered index it won't re-order the data).

ideally you should be including the "where cols" in the indices as well... for top-of-the-range performance... and avoiding the convention "select *".
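[As a sketch of the reordering suggestion above, using the table and index names from the DDL posted earlier - the column order is an illustration of the idea, not a tested recommendation:]

```sql
-- Drop the old unique index and the now-redundant single-column index.
drop index project_sequences.psq_unique
drop index project_sequences.psq_seq
go
-- Reversed column order: enforces the same uniqueness as before, but
-- the leading column is now psq_seq_id, so this one index can also
-- serve lookups by psq_seq_id alone.
create unique nonclustered index psq_unique
    on project_sequences ( psq_seq_id, psq_version, psq_prj_id )
go
```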
 |
jsmith8858
Dr. Cross Join
7423 Posts
Posted - 2004-09-08 : 10:11:49
1) please show us what your user-defined datatypes are (CODE, ID)
2) you still didn't show us which columns you are using -- your example does not indicate which of the columns in these tables you are trying to filter on

- Jeff
 |
migil
Starting Member
4 Posts
Posted - 2004-09-08 : 10:33:25
Actual query is:

select seq_id, seq_masked, seq_version, psq_sense, psq_offset
from project_sequences, sequences
where psq_prj_id = @prj_id
  and psq_version = @version
  and psq_seq_id = seq_id
  and seq_obj_id = @obj_id
  and seq_end = @end
  and seq_strand = 1    -- (I didn't show this column before)

I appreciate some folk like a different format.

Remarks so far most helpful, thanks.

Now I've also found something interesting: if I remove the reference to seq_strand (un-indexed) it's really fast... I think the indexes now 'cover' the query. This has (partially) solved the problem, but not really answered the question of why using local @variables produced a (relatively) inefficient query plan in stored procedures... Time to review some indexes anyway.

mike
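[One hedged illustration of the 'covering' observation above: if seq_strand and the selected columns are added to an index that leads on the search columns, the seek can be answered without a bookmark lookup. SQL Server 2000 has no INCLUDE clause, so everything must be a key column; the exact column list below is an assumption based on the query shown, not a tested index:]

```sql
-- Hypothetical covering index for the query above: the where-clause
-- columns (seq_obj_id, seq_end, seq_strand) lead, and the selected
-- columns (seq_id, seq_masked, seq_version) follow so the whole
-- query can be satisfied from the index alone.
create nonclustered index seq_covering
    on sequences ( seq_obj_id, seq_end, seq_strand, seq_id, seq_masked, seq_version )
go
```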
 |
migil
Starting Member
4 Posts
Posted - 2004-09-08 : 10:35:08
1) please show us what your user-defined datatypes are (CODE, ID)

CODE = varchar(4)
ID = integer

2) you still didn't show us which columns you are using --

(above)
 |
kselvia
Aged Yak Warrior
526 Posts
Posted - 2004-09-08 : 10:43:12
This article on "parameter sniffing" from Bart Duncan @ Microsoft may be helpful.

Copied from: http://www.examnotes.net/archive79-2002-7-48335.html

--------------------------------------------------------------

Brian,

A couple things come to mind here. First add SET NOCOUNT ON to the beginning of the sp and see if that helps. But more than likely you're running into this:

The reason for the performance difference stems from a feature called "parameter sniffing". Consider a stored proc defined as follows:

CREATE PROC proc1 @p1 int AS
SELECT * FROM table1 WHERE c1 = @p1
GO

Keep in mind that the server has to compile a complete execution plan for the proc before the proc begins to execute. In 6.5, at compile time SQL didn't know what the value of @p1 was, so it had to make a lot of guesses when compiling a plan. Suppose all of the actual parameter values for "@p1 int" that a user ever passed into this stored proc were unique integers greater than 0, but suppose 40% of the [c1] values in [table1] were, in fact, 0. SQL would use the average density of the column to estimate the number of rows that this predicate would return; this would be an overestimate, and SQL might choose a table scan over an index seek based on the rowcount estimates. A table scan would be the best plan if the parameter value was 0, but unfortunately it happens that users will never or rarely pass @p1=0, so performance of the stored proc for more typical parameters suffers.

In SQL 7.0 or 2000, suppose you executed this proc for the first time (when the sp plan is not in cache) with the command "EXEC proc1 @p1 = 10". Parameter sniffing allows SQL to insert the known value of parameter @p1 into the query at compile time, before a plan for the query is generated. Because SQL knows that the value of @p1 is not 0, it can compile a plan that is tailored to the class of parameters that is actually passed into the proc, so for example it might select an index seek instead of a table scan based on the smaller estimated rowcount -- this is a good thing if most of the time 0 is not the value passed as @p1. Generally speaking, this feature allows more efficient stored proc execution plans, but a key requirement for everything to work as expected is that the parameter values used for compilation be "typical".

In your case, the problem is that you have default NULL values for your parameters ("@Today DATETIME = NULL, ...") that are not typical, because the parameter values are changed inside the stored proc before they are used -- as a result NULL will never actually be used to search the column. If the first execution of this stored proc doesn't pass in an explicit value for the @Today parameter, SQL believes that its value will be NULL. When SQL compiles the plan for this sp it substitutes NULL for each occurrence of @Today that is embedded within a query. Unfortunately, after execution begins the first thing the stored proc does is change @Today to a non-NULL value if it is found to be NULL, but SQL doesn't know about this at compile time. Because NULL is a very atypical parameter value, the plan that SQL generates may not be a good one for the new value of the parameter that is assigned at execution time.

So, the bottom line is that if you assign defaults to your sp parameters and later use those same parameters in a query, the defaults should be "typical" because they will be used during plan generation. If you must use defaults and business logic dictates that they be atypical (as may be the case here if app modifications are not an option), there are two possible solutions if you determine that the substitution of atypical parameter values is causing bad plans:

1. "Disable" parameter sniffing by using local DECLARE'd variables that you SET equal to the parameters inside the stored proc, and use the local variables instead of the offending parameters in the queries. This is the solution that you found yourself. SQL can't use parameter sniffing in this case so it must make some guesses, but in this case the guess based on average column density is better than the plan based on a specific but "wrong" parameter value (NULL).

2. Nest the affected queries somehow so that they run within a different context that will require a distinct execution plan. There are several possibilities here, for example:

a. Put the affected queries in a different "child" stored proc. If you execute that stored proc within this one *after* the parameter @Today has been changed to its final value, parameter sniffing will suddenly become your friend, because the value SQL uses to compile the queries inside the child stored proc is the actual value that will be used in the query.

b. Use sp_executesql to execute the affected queries. The plan won't be generated until the sp_executesql stmt actually runs, which is of course after the parameter values have been changed.

c. Use dynamic SQL ("EXEC (@sql)") to execute the affected queries. An equivalent approach would be to put the query in a child stored proc just like 2.a, but execute it within the parent proc with EXEC WITH RECOMPILE.

Option #1 seems to have worked well for you in this case, although sometimes one of the options in #2 is a preferable choice. Here are some guidelines, although when you're dealing with something as complicated as the query optimizer, experimentation is often the best approach <g>:

- If you have only one "class" (defined as values that have similar density in the table) of actual parameter value that is used within a query (even if there are other classes of data in the base table that are never or rarely searched on), 2.a or 2.b is probably the best option. This is because these options permit the actual parameter values to be used during compilation, which should result in the most efficient query plan for that class of parameter.

- If you have multiple "classes" of parameter value (for example, for the column being searched, half the table data is NULL, the other half are unique integers, and you may do searches on either class), 2.c can be effective. The downside is that a new plan for the query must be compiled on each execution, but the upside is that the plan will always be tailored to the parameter value being used for that particular execution. This is best when there is no single execution plan that provides acceptable execution time for all classes of parameters.

HTH -
Bart

------------
Bart Duncan
Microsoft SQL Server Support
Please reply to the newsgroup only - thanks.
This posting is provided "AS IS" with no warranties, and confers no rights.

--
Ken
I want to die in my sleep like my grandfather, not screaming in terror like his passengers.
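[As a sketch of workarounds #1 and #2.b from the article above, applied to the article's own proc1/table1 example - the proc names and the fix-up value are illustrative, not from the thread:]

```sql
-- Workaround #1: copy the parameter into a local variable, so the plan
-- for the query is based on average column density rather than a
-- sniffed (possibly atypical) parameter value.
CREATE PROC proc1_local @p1 int AS
    DECLARE @local_p1 int
    SET @local_p1 = @p1              -- @local_p1 cannot be sniffed at compile time
    SELECT * FROM table1 WHERE c1 = @local_p1
GO

-- Workaround #2.b: defer compilation with sp_executesql, so the plan is
-- built only after the atypical default has been fixed up.
CREATE PROC proc1_deferred @p1 int = NULL AS
    IF @p1 IS NULL SET @p1 = 10      -- replace the atypical NULL default first
    EXEC sp_executesql
        N'SELECT * FROM table1 WHERE c1 = @x',
        N'@x int',
        @x = @p1                     -- compiled with the corrected value
GO
```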
 |