Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Purpose of exists..

Author  Topic 

narendrareddy
Starting Member

6 Posts

Posted - 2002-01-09 : 01:33:17
Many people suggest using Exists clause for an existence check. But the 2 queries below result in the same execution plan being used and have the same query cost. (50% relative cost to batch when both are executed in QA )
Could anyone suggest any cases where there is an adavantage of using EXISTS compared to SubQueries.

USE pubs
GO

SELECT DISTINCT pub_name
FROM publishers
WHERE EXISTS
(SELECT *
FROM titles
WHERE pub_id = publishers.pub_id
AND type = 'business')
GO

-- Or, using the IN clause:
USE pubs
GO
SELECT distinct pub_name
FROM publishers
WHERE pub_id IN
(SELECT pub_id
FROM titles
WHERE type = 'business')
GO

When the above query is written using joins it seems to have a perf gain. So are joins with distinct or groupby the best bet for existence check ?
SELECT
distinct pub_name
FROM
publishers
inner join
titles
on
titles.pub_id = publishers.pub_id
where
titles.type = 'business'


SELECT
pub_name
FROM
publishers
inner join
titles
on
titles.pub_id = publishers.pub_id
where
titles.type = 'business'
group by
pub_name

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2002-01-09 : 02:03:16
The EXISTS keyword will check for a single instance meeting the criteria but the IN clause will check for all occurrences.

Assumptions:
Parts has 100 rows and Orders has a 100 rows

eg.

SELECT P.PartID
FROM Parts P
WHERE P.PartID IN (SELECT PartID FROM Orders)


Every row int the Orders table will be read for every row in the Parts table ie. 100*100 = 10,000 records read!


SELECT P.PartID
FROM Parts P
WHERE EXISTS (SELECT PartID FROM Orders O where O.PartID = P.PartID)


Only 1 row of Orders will be read for every Product thus only 100 records read!

Don't just rely on the query plan for performance measurement..
Use..
SET STATISTICS TIME ON
SET STATISTICS IO ON
and
SET SHOWPLAN_ALL ON

As a general rule, a join will be faster than EXISTS or IN, so use a join when you can....


DavidM

Tomorrow is the same day as Today was the day before.
Go to Top of Page

ToddV
Posting Yak Master

218 Posts

Posted - 2002-01-09 : 09:22:35
quote:


As a general rule, a join will be faster than EXISTS or IN, so use a join when you can....




My 2 cents: Almost all INs can be written and perform better as joins.




Go to Top of Page

davidpardoe
Constraint Violating Yak Guru

324 Posts

Posted - 2002-01-09 : 10:20:13
I remember reading somewhere that EXISTS will break out as soon as it finds an occurrence so it doesn't need to go through the entire SELECT clause after the EXISTS. I have a feeling IN would have to 'finish' the entire SELECT clause.

============
The Dabbler!
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-01-09 : 10:48:33
David (both Davids!) are correct and echo conventional wisdom: joins are usually faster, and EXISTS is usually faster because it won't have to process an entire table if one match is made. I also agree with Todd, if you can write a query as a JOIN instead of an IN clause you will very likely get much better performance.

I think the reason you don't see much improvement is a factor of your example. The pubs database is pretty small, and pub_id is an indexed column in all the selected tables (PK in publishers). The query optimizer will notice this and probably start off using these indexes no matter what your query is.

If you had a query where there weren't convenient indexes such as these, or one that is much larger (1 million+ rows), you will almost certainly see an improvement using EXISTS.

Go to Top of Page

nr
SQLTeam MVY

12543 Posts

Posted - 2002-01-09 : 11:34:12
>> My 2 cents: Almost all INs can be written and perform better as joins.
not true.

I did an article on this as part of performance optimisation.
It was to find all the records in one table that weren't in another.
The three options covered (under v7) were exists, left outer join and in clause.

Of the three (on 50 - 500,000 recs) the in clause was fastest, outer join next then exists - but not much difference.
This changed drastically if it was part of update where the outer join was fastests, exists next (not much difference) then much slower (orders of magnitude) was the in clause.

I tend to use the outer join if there is no problem with nulls or duplicate recs and the query is simple. If it is part of a complicated query I will use an exists. I try never to use the in clause as this will always fail if it includes a null and would need major changes if you want to add another field to the check - it does have the advantage that you can just run the subquery to check the data though.





==========================================
Cursors are useful if you don't know sql.
Beer is not cold and it isn't fizzy.
Go to Top of Page

narendrareddy
Starting Member

6 Posts

Posted - 2002-01-10 : 00:23:48
Based on what you have said would it be correct to say that what i was looking at were just plans of executions they would be the same for both subquery and exists. When they are executed over a large table we can definitely see a perf adv in using exists over subquery.

When Exists is used it stops reading the table once a match is found but with a subquery it would continue..

Thanks for your help,
Reddy.

Go to Top of Page
   

- Advertisement -