
 COALESCE for performance improvements?


aiken
Aged Yak Warrior

525 Posts

Posted - 2003-02-21 : 19:06:48
Ok, this is kind of crazy. I'm evaluating SQL Expert (www.leccotech.com), and so far it's kind of a mixed bag. However, it had been able to reduce the query cost on a few of my queries... in a way that I just do not understand.

Here's a before and after with a stored procedure intended to detect spam blasts. Before:
select inetfrom,count(*)
from email
where inetfrom is not null
and inetfrom<>'ourdomain.com'
and added>=dateadd(hour,-72,getdate())
group by inetfrom
having count(*)>3
order by count(*) desc


After:
select inetfrom, 
count(*)
from email
where inetfrom >= CHAR(0)
and (inetfrom < 'ourdomain.com'
OR inetfrom > 'ourdomain.com')
and added >= dateadd(hour, -72, getdate())
group by inetfrom
having count(*) > COALESCE(3, 3)
order by count(*) desc


...According to SQL Query Analyzer, the cost of the before query is 0.828530 and the cost of the after version is 0.655004 -- about a 20% improvement. OK, so I know that using <> is a bad idea, and I rewrote the query using the suggested WHERE clause. That gets me from 0.828530 to 0.657049.

Sure enough, just changing the HAVING clause from a fixed integer to COALESCE shaves off that extra bit.

Now, I know those are estimated costs, and in practice the performance gain is negligible, but isn't that weird? I mean, is it possible that > COALESCE(X, X) is actually faster than > X?
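For what it's worth, Books Online defines COALESCE as shorthand for a CASE expression, so the two HAVING clauses filter identical rows; any difference can only be in how the optimizer costs the expression, not in the result. A sketch against the email table from the post above:

```sql
-- Both queries return the same rows: COALESCE(3, 3) is defined as
-- CASE WHEN 3 IS NOT NULL THEN 3 ELSE 3 END, which always yields 3.
SELECT inetfrom, COUNT(*)
FROM email
GROUP BY inetfrom
HAVING COUNT(*) > 3

SELECT inetfrom, COUNT(*)
FROM email
GROUP BY inetfrom
HAVING COUNT(*) > COALESCE(3, 3)  -- expands to the CASE form above
```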

Cheers
-b


robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-02-21 : 19:52:12
Haven't tested this, so I can't comment on the actual plans it generates, but I have a feeling that using COALESCE like that is tricking the optimizer either into building a worktable that summarizes the results and finds the count that way, or into avoiding a worktable because it costs a little more.

I personally wouldn't use the COALESCE because a) the improvement over a regular HAVING expression is really not worth it, and b) more importantly, COALESCE is complicating a very simple expression for the sake of the most minor performance tweak. Betcha $50 the next service pack, or the next version of SQL Server, will break that tweak, and it's unlikely ANYONE will understand why that clause is written that way. The rest of the tweaks I understand, and they make sense, but I can't help but wonder why it didn't just add some index or join hints instead. Relying on that kind of esoterica for its optimizations doesn't strike me as particularly solid.
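If anyone does want to test it, the plans for the two HAVING variants can be compared without executing the queries. A sketch, assuming the email table from the original post:

```sql
-- SQL Server 2000: show the chosen plan without running the query.
SET SHOWPLAN_TEXT ON
GO

SELECT inetfrom, COUNT(*)
FROM email
WHERE inetfrom IS NOT NULL
  AND inetfrom <> 'ourdomain.com'
  AND added >= DATEADD(hour, -72, GETDATE())
GROUP BY inetfrom
HAVING COUNT(*) > 3

SELECT inetfrom, COUNT(*)
FROM email
WHERE inetfrom IS NOT NULL
  AND inetfrom <> 'ourdomain.com'
  AND added >= DATEADD(hour, -72, GETDATE())
GROUP BY inetfrom
HAVING COUNT(*) > COALESCE(3, 3)
GO

SET SHOWPLAN_TEXT OFF
GO
```

If the two plans come out identical, the cost difference is purely an artifact of the estimate.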


aiken
Aged Yak Warrior

525 Posts

Posted - 2003-02-22 : 13:30:13
I hear ya -- the product actually isn't capable of adding indexes; it will try suppressing indexes and hinting at different ones. It's definitely a brute-force "try every permutation of these tricks" product. It's helped me a bit -- it found a long-running report that it improved by changing join orders. However, it's got some severe limitations as well.

Thanks for the note. Your comments make perfect sense.

-b


Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2003-02-23 : 06:40:12
quote:
The rest of the tweaks I understand, and they make sense

Really? How does replacing varcharColumn IS NOT NULL with varcharColumn >= CHAR(0) make sense? If the intent is that the second can make use of an index where the first cannot, then it's mistaken (at least for SQL Server 2000 -- it can use IS NOT NULL in the SEEK:() predicate). The only difference is that the estimated number of 'downstream' rows is likely to differ. That could be used to try to push the plan generation in a certain direction, but the cost estimates for the resulting query will not be particularly meaningful.

For example, create some random stuff:


IF OBJECT_ID('TestChar') IS NOT NULL
DROP TABLE TestChar
GO

CREATE TABLE TestChar (
i int PRIMARY KEY,
v varchar(10) NULL
)

BEGIN TRANSACTION
SET NOCOUNT ON
DECLARE @i int
SET @i = 0

WHILE @i < 100000
BEGIN
SET @i = @i + 1

-- It's not a bug, it's a feature: every 500th row gets a NULL;
-- the rest get one random lowercase letter repeated ten times.
INSERT INTO TestChar
SELECT @i, CASE WHEN @i % 500 = 0 THEN NULL ELSE c+c+c+c+c+c+c+c+c+c END
FROM (SELECT CHAR(97+FLOOR(RAND()*26)) AS c) AS A
END

COMMIT TRANSACTION

CREATE INDEX v ON TestChar(v)

 
Compare the real and estimated execution costs:

SET STATISTICS IO ON
SET STATISTICS TIME ON

-- Between these two:
SELECT COUNT(*)
FROM TestChar
WHERE v IS NOT NULL

SELECT COUNT(*)
FROM TestChar
WHERE v >= CHAR(0)

-- Between these 3:
SELECT *
INTO #x
FROM TestChar
WHERE v IS NOT NULL
ORDER BY i

SELECT *
INTO #y
FROM TestChar
WHERE v >= CHAR(0)
ORDER BY i

SELECT *
INTO #z
FROM TestChar
WHERE v >= CHAR(0) AND LEN(v) >= 0 AND DATALENGTH(v) >= 0
ORDER BY i

GO

DROP TABLE #x
DROP TABLE #y
DROP TABLE #z



Go to Top of Page

aiken
Aged Yak Warrior

525 Posts

Posted - 2003-02-23 : 15:12:56
Well, I stand corrected. Thanks!

-b
