Correlated Query Performance low when Indexes are used.

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Correlated Query Performance low when Indexes are used.

Author

Topic

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2002-01-04 : 09:28:53

Narendra writes "I have posted this question on a couple of sites for which i did not get a satisfactory response other than that it was expected behaviour and that the plans would be different because of use of indexes.

--Create table
CREATE TABLE [dbo].[Test] (
     	[TestID]	[INT] IDENTITY (1,1) NOT NULL ,
        [Fk] 		[INT] NOT NULL ,
	[TStamp]	[TIMESTAMP] NOT 
NULL 
)
GO

--Add a Clustered Index
ALTER TABLE [dbo].[Test] WITH NOCHECK ADD 
	CONSTRAINT [PK_Test] PRIMARY KEY  CLUSTERED 
	(
		[TestID]
	)
GO

--load the table with 10000 records
DECLARE @CounterFk INT,
	@Counter INT
SET  @CounterFk = 1
WHILE(  @CounterFk <= 1000 )
BEGIN
	SET @Counter = 1
	WHILE ( @Counter < = 10 )
	BEGIN
		INSERT INTO Test (Fk) Values ( @CounterFk)
		SET @Counter = @Counter + 1
	END 
	SET @CounterFk = @CounterFk + 1
END

GO
--view all teh records in the table
SELECT * FROM Test

GO
--Select statement which gives the occurence of Fk's value
--which uses Tstamp in where clause 
SELECT 
	TestOuter.TestID,
	TestOuter.Fk,
	(	
		SELECT 
			COUNT( TestInner.Fk) 
		FROM 
			Test TestInner
		WHERE 
			TestOuter.Fk = TestInner.Fk
			And 
			Cast(TestOuter.TStamp As Money)  
>= Cast( TestInner.TStamp  As Money )
	) AS Counter
FROM 
	Test TestOuter

--Select statement which gives the occurence of Fk's value
-- which uses TestID in the where clause
SELECT 
	TestOuter.TestID,
	TestOuter.Fk,
	(	
		SELECT 
			COUNT( TestInner.Fk) 
		FROM 
			Test TestInner
		WHERE 
			TestOuter.Fk = TestInner.Fk
			And 
			TestOuter.TestID >= 
TestInner.TestID
	) AS Counter
FROM 
	Test TestOuter

In the select statements above the execution plans differ
and also the time taken for executions(1 sec for involving tstamp column vs 15 sec involving pk of the table which has a clustered index) differ drastically when executed repeatedly.

I could not figure out much from teh execution plans. I would be glad if u could explain them and why the difference in performance."

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-01-04 : 12:01:37

Assuming SQL Server 2000.

The query plan for the last query is a very naive approach: for each row in TestOuter, seek the range of TestInner.TestID <= TestOuter.TestID, keep only the ones with the same Fk value and count them. While it may be doing an index seek, it is having to look at O(n^2) rows in that inner loop, even though the seek is only delivering 1-10 rows to be aggregated.

The previous query first builds a temporary index for TestInner on Fk and then seeks on that for each row TestOuter row. So it only looks at 10 rows in the temporary index for each TestOuter row. Then it throws away the ones with smaller timestamps and counts what's left. The Hash Match/Cache doesn't look very useful since it's hashing on (tstamp, fk) which is unique by definition.

As to why the first query doesn't take the temporary index approach, I don't know. Given that the query processor estimates doing it the naive way will take much longer it does seem pretty odd. What's totally weird is that it does take that approach if you ALTER TABLE TEST DROP CONSTRAINT PK_TEST first. I'd suggest it's a query optimizer bug.

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-01-04 : 15:01:37

My 2¢, could be (probably is!) completely wrong:

I don't know how or if this relates to Arnold's observations, but it may also be that you're using the WHERE clause to perform a join, instead of using the JOIN syntax:

SELECT TestOuter.TestID, TestOuter.Fk, COUNT( TestInner.Fk) AS Counter
FROM Test TestInner INNER JOIN Test TestOuter
ON (TestOuter.Fk=TestInner.Fk And Cast(TestOuter.TStamp As Money)>=Cast(TestInner.TStamp As Money))

I think this might work too:

SELECT TestOuter.TestID, TestOuter.Fk, COUNT(*) AS Counter
FROM Test TestInner INNER JOIN Test TestOuter
ON (TestOuter.Fk=TestInner.Fk And TestOuter.TStamp>=TestInner.TStamp)

I only translated the first query.

The Count(*) might generate a more favorable plan than Count(TestInner.FK). If the JOIN works without using CAST() then the performance should also pick up a little.

In my experience (not tested or confirmed) SQL Server's optimizer joins better when using the JOIN syntax rather than the WHERE syntax. The JOIN syntax seems to generate a more consistent plan between slight changes than the WHERE syntax does. This could be the "bug" Arnold mentions. I don't remember ever getting the same plan using both styles, but again I haven't really researched it either.

HTH

GreatInca
Posting Yak Master

102 Posts

Posted - 2002-01-04 : 15:30:45

The join syntax is almost always better than the where syntax.

ToddV
Posting Yak Master

218 Posts

Posted - 2002-01-04 : 16:03:12

quote:

SELECT TestOuter.TestID, TestOuter.Fk, COUNT( TestInner.Fk) AS Counter
FROM Test TestInner INNER JOIN Test TestOuter
ON (TestOuter.Fk=TestInner.Fk And Cast(TestOuter.TStamp As Money)>=Cast(TestInner.TStamp As Money))

You need a group by here:
GROUP BY TestOuter.TestID, TestOuter.Fk

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-01-04 : 18:04:49

Thanks Todd! I owe you a

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-01-05 : 07:20:52

Yes, knowing that SQL Server's optimization of subqueries in the SELECT part tends to be rather naive, I'd probably write the query that way myself -- assuming the temporary storage for the hash join is likely to fit. I was just concentrating on why the performance with the subquery is made worse by having an index, albeit a not very useful index for the query.

CREATE INDEX IX_Test_Fk ON Test (Fk)
Adding an index on Fk will help the subquery that uses TestID, as expected.
But now the version using TStamp varies depending on whether that CAST is left in or not. With the CAST, it produces a query plan with a higher cost than before, using a bookmark lookup from the new index. Without the CAST, it avoids IX_Test_Fk and keeps the temporary index approach!

CREATE INDEX IX_Test_Fk ON Test (Fk, TStamp)
Alternatively, creating an index that covers for both TestID and TStamp subqueries yields 3 query plans with the same approach using that index.

Perhaps "bug" was an overstatement: by the very nature of optimization there is always an element of heuristics involved.

Edited by - Arnold Fribble on 01/05/2002 07:38:56

narendrareddy
Starting Member

6 Posts

Posted - 2002-01-07 : 01:36:08

Thanks a lot for your explanations and suggestions.

Satisfied,
Reddy.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources