Exists, In, and Inner Joins - Oh My!!

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Exists, In, and Inner Joins - Oh My!!

Author

Topic

klindeman72
Starting Member

2 Posts

Posted - 2005-01-20 : 14:44:51

Hello everyone, pleasure to meet you all. I have few questions if you have time. I have been getting into a debate of late as to which is the best method to use. EXISTS, IN, or INNER JOINS. I have read a bit, and it seems that the hottest contention is that EXISTS and INNER JOINS work best on large subqueries that might return a lot of data if you were to use IN. So the first question is that if I reduced the return set of the IN subquery would it be best to use IN as I hear that IN works fastest on small subqueries. I.E.

Query 1:
SELECT * FROM sometable WHERE EXISTS (SELECT * FROM
someothertable WHERE y = 'foo')
--This finds a match and goes back to main query
Query 2:
SELECT * FROM sometable WHERE y IN (SELECT y FROM
someothertable)
--This returns x amount of rows and compares
--Each for equality
Query 3:
SELECT * FROM sometable WHERE y IN (SELECT y FROM
someothertable WHERE y = 'foo')
--This returns x amount amount of rows and compares
--Each for equality, however the return set is small

So in that case would EXISTS still be faster, then to further cloud the issue, would an INNER JOIN make the query faster? To give you a real example, let me show you what I have done so far with my query, though I have not yet figured out the syntax to make this work with INNER JOIN, I am sure that is not but a stones throw away..

Here is my EXISTS query:

SELECT *
FROM Contact
WHERE (dbo.Contact.FirstName LIKE '%Bob%')
AND (dbo.Contact.LastName LIKE '%Golby%')
AND (dbo.Contact.Email LIKE '%bobgolby@bob.com%')
AND (dbo.Contact.Company LIKE '%Bob Inc%')
AND (dbo.Contact.HomePhone LIKE '%303.333.3333%')
AND (dbo.Contact.WorkPhone LIKE '%303.333.3333%')
AND (dbo.Contact.HomeAddress1 LIKE '%1234 Bob St%')
AND (dbo.Contact.HomeCity LIKE '%Bobville%')
AND (dbo.Contact.HomeState LIKE '%BO%')
AND (dbo.Contact.HomePostalCode LIKE '%80012%')
AND (dbo.Contact.HomeCountry LIKE '%United States%')
AND (dbo.Contact.WorkAddress1 LIKE '%1234 Bobby St%')
AND (dbo.Contact.WorkAddress2 LIKE '%Suite 112%')
AND (dbo.Contact.WorkCity LIKE '%Bobville%')
AND (dbo.Contact.WorkState LIKE '%BO%')
AND (dbo.Contact.WorkPostalCode LIKE '%80012%')
AND (dbo.Contact.WorkCountry LIKE '%United States%')
AND EXISTS (
SELECT ProjectContactAssign.ContactID
FROM ProjectContactAssign
WHERE EXISTS (
SELECT ProjectUserAssign.ProjectID
FROM ProjectUserAssign
WHERE EXISTS (
SELECT *
FROM ProjectUserAssign
WHERE ProjectUserAssign.UserID = '{4B40B11A-8087-4DF8-A4DA-789E0672D074}')))

And here is the same query using IN..

SELECT *
FROM Contact
WHERE (dbo.Contact.FirstName LIKE '%Bob%')
AND (dbo.Contact.LastName LIKE '%Golby%')
AND (dbo.Contact.Email LIKE '%bobgolby@bob.com%')
AND (dbo.Contact.Company LIKE '%Bob Inc%')
AND (dbo.Contact.HomePhone LIKE '%303.333.3333%')
AND (dbo.Contact.WorkPhone LIKE '%303.333.3333%')
AND (dbo.Contact.HomeAddress1 LIKE '%1234 Bob St%')
AND (dbo.Contact.HomeCity LIKE '%Bobville%')
AND (dbo.Contact.HomeState LIKE '%BO%')
AND (dbo.Contact.HomePostalCode LIKE '%80012%')
AND (dbo.Contact.HomeCountry LIKE '%United States%')
AND (dbo.Contact.WorkAddress1 LIKE '%1234 Bobby St%')
AND (dbo.Contact.WorkAddress2 LIKE '%Suite 112%')
AND (dbo.Contact.WorkCity LIKE '%Bobville%')
AND (dbo.Contact.WorkState LIKE '%BO%')
AND (dbo.Contact.WorkPostalCode LIKE '%80012%')
AND (dbo.Contact.WorkCountry LIKE '%United States%')
AND Contact.ContactID IN (
SELECT ProjectContactAssign.ContactID
FROM ProjectContactAssign
WHERE ProjectContactAssign.ProjectID IN (
SELECT ProjectUserAssign.ProjectID
FROM ProjectUserAssign
WHERE EXISTS (
SELECT *
FROM ProjectUserAssign
WHERE ProjectUserAssign.UserID = '{4B40B11A-8087-4DF8-A4DA-789E0672D074}')))

Is the EXISTS query faster than the IN query, should I re-write it to use an INNER JOIN because that is faster? Thanks in advance and for your patience.

Kent

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-20 : 15:07:21

as I read EXISTS faster all of them;

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-20 : 15:18:06

- In the SELECT clause don't use "SELECT *"
- Use JOINS to join all the tables together
- In the WHERE clause filter the data
- Use table aliasing
- To get a match, you probably do not have to do col = LIKE('%x%') for every column in the table !
- Use QA to show You the "Execution Plan" ( see BOL )

Very generally:
JOINS will be the most effective, then EXISTS, then IN.

Is this more readable ?

SELECT
	c.FirstName
	,c.LastName
	,c.Email
	,c.Company
	,c.HomePhone
	,c.WorkPhone
	,c.HomeAddress1
	,c.HomeCity
	,c.HomeState
	,c.HomePostalCode
	,c.HomeCountry
	,c.WorkAddress1
	,c.WorkAddress2
	,c.WorkCity
	,c.WorkState
	,c.WorkPostalCode
	,c.WorkCountry
FROM
	dbo.Contact c
	JOIN dbo.ProjectContactAssign pca
		ON c.ContactID = pca.ContactID
	JOIN dbo.ProjectUserAssign pua
		ON pca.ProjectID = pua.ProjectID
WHERE
	ProjectUserAssign.UserID = '{4B40B11A-8087-4DF8-A4DA-789E0672D074}'
	AND c.FirstName = 'Bob'
	AND c.LastName = 'Golby'
	/* Is this a search query or used to just retrieve Bob Golby ? 
	AND (c.FirstName LIKE '%Bob%')
	AND (c.LastName LIKE '%Golby%')
	AND (c.Email LIKE '%bobgolby@bob.com%')
	AND (c.Company LIKE '%Bob Inc%')
	AND (c.HomePhone LIKE '%303.333.3333%')
	AND (c.WorkPhone LIKE '%303.333.3333%')
	AND (c.HomeAddress1 LIKE '%1234 Bob St%')
	AND (c.HomeCity LIKE '%Bobville%')
	AND (c.HomeState LIKE '%BO%')
	AND (c.HomePostalCode LIKE '%80012%')
	AND (c.HomeCountry LIKE '%United States%')
	AND (c.WorkAddress1 LIKE '%1234 Bobby St%')
	AND (c.WorkAddress2 LIKE '%Suite 112%')
	AND (cc.WorkCity LIKE '%Bobville%')
	AND (c.WorkState LIKE '%BO%')
	AND (c.WorkPostalCode LIKE '%80012%')
	AND (c.WorkCountry LIKE '%United States%') */

rockmoose

X002548
Not Just a Number

15586 Posts

Posted - 2005-01-20 : 15:26:09

Nothing will be faster with a LIKE '%anything%'

That's a scan...

Brett

8-)

klindeman72
Starting Member

2 Posts

Posted - 2005-01-20 : 15:32:52

Verah nice... Thanks Rockmoose.. as to the matchs, everything there can or could be included in the query, it depends on the users input, once the whole thing is correct I can, and will truncate it only with pertinant data. Very elegant solution though I had to add a DISTINCT there at the start to get it 100%.. Thank you both!

AndyB13
Aged Yak Warrior

583 Posts

Posted - 2005-01-20 : 15:41:05

That is a well formatted query rockmoose

or should i say beatified

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-20 : 15:50:17

rockmoose;
>Very generally:
>JOINS will be the most effective, then EXISTS

are you sure? or maybe you have reliable stats data?

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-20 : 16:05:17

Yeahh, it was "beautified"

, ran it in a cool Java Applet I found on the web - Not.

As Brett says, a LIKE will force scanning the Customer table.
But SQL server might be able to minimize the rowset that needs to be scanned.

Anyway, I think I got the M:M relationships now.
If You want to get rid of the DISTINCT you could try the following FROM constructs as well.
Try the different approaches in QA and analyze the Execution Plans !

SELECT
	... -- from Contact table
FROM
	dbo.Contact c
	JOIN
	(	SELECT DISTINCT
			pca.ContactID
		FROM
			dbo.ProjectUserAssign pua
			JOIN dbo.ProjectContactAssign pca
				ON pca.ProjectID = pua.ProjectID
		WHERE
			pua.UserID = '{4B40B11A-8087-4DF8-A4DA-789E0672D074}'
	) AS pcaContacts
		ON c.ContactID = pcaContacts.ContactID
WHERE
	... -- filtering on Contact table



SELECT
	... -- from Contact table
FROM
	dbo.Contact c
WHERE
	EXISTS(	SELECT *
		FROM
			dbo.ProjectUserAssign pua
			JOIN dbo.ProjectContactAssign pca
				ON pca.ProjectID = pua.ProjectID
		WHERE
			pua.UserID = '{4B40B11A-8087-4DF8-A4DA-789E0672D074}'
			AND pca.ContactID = c.ContactID )
	... -- filtering on Contact table

rockmoose

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-20 : 16:09:37

quote:
Originally posted by Stoad

rockmoose;
>Very generally:
>JOINS will be the most effective, then EXISTS

are you sure? or maybe you have reliable stats data?

Sorry, I regret writing that now, and was just waiting to be jumped.
As far as I have experience many times the Query plan will be about the same.
Just that I feel that JOINS are clearer and IMO easier to hint and change.
No reliable stats data either way, nope.
General ramble over.

rockmoose

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-20 : 16:14:01

kind of a counter-sample (just recalled it):
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=30318

A quote from there:

quote:
Originally posted by Arnold Fribble

I think David was believing the cost estimates too much.

don't believe it too much

X002548
Not Just a Number

15586 Posts

Posted - 2005-01-20 : 16:17:52

The bigest problem is all the possible combinations...you're gonna say they can supply any piece of the search criteria..

I still like this

http://weblogs.sqlteam.com/brettk/archive/2004/05/05/1312.aspx

And if you can loose the leading % (what are the length of these fields anyway?) you're in business.

Brett

8-)

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-20 : 16:28:31

A crappy query is a crappy query wither or wether there EXISTS JOIN clauses...

I think that the Query Optimizer would handle EXISTS and JOIN operators similarily.
By determining the best way to operate SET1 with SET2.

Lot's of speculation here

, sorry about that.

rockmoose

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2005-01-20 : 16:39:23

Personally I try to avoid the IN operator.

To be honest, we shouldn't have to care. And in most of the cases, SQL Server will pick the best plan. Trying to "out think" the optimiser usually results in pain and suffering.

DavidM

"Always pre-heat the oven"

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-20 : 16:49:09

>> I still like this....

Very interesting blog Brett, thanx for the link.
Let's say we are doing some free for all search proc on a table with dynamic paramaters,(the standard case).
Which method is preferred?

- union all method
- adding left joins
- existance check for each column
- where a=@a or coalesce(a,@a) is null or @a is null -- method
- x method

what are the factors involved?
any research & stats available?

Or is the answer just - "It depends"

rockmoose

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2005-01-20 : 16:50:35

I hadn't seen that thread, Stoad! And Arnold is dead right too! Your query is faster when there is data!

I initially made the same mistake with UDFs when I first laid eyes on them. Nothing like production volumes to show that mistake.

DavidM

"Always pre-heat the oven"

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-21 : 02:17:53

never mind, David; lol;
seriously, my pink dream is to forget for good about Optimization & Indexes.
This shamanic stuff really sucks. I wish they put grains of AI into the RDBMSs.

Kristen
Test

22859 Posts

Posted - 2005-01-21 : 12:29:57

My $0.02:

If you are having to use DISTINCT because the JOINs are bringing multiple, uneeded, rows then use EXISTS instead.

If you have lots of parameters which are OPTIONAL then consider constructing parameterised SQL for the WHERE clause (i.e. leaving out the bits where the criteria is blank) and use sp_ExecuteSQL to get a cached query plan.

Kristen

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-21 : 12:53:41

I think you get quite a bang for the bucks by following Kristen's $0.02.
Good advice.

rockmoose

Kristen
Test

22859 Posts

Posted - 2005-01-21 : 13:03:56

These days $0.02 requires saving about a trillion CPU cycles!

Kristen

Stoad
Freaky Yak Linguist

1983 Posts

Posted - 2005-01-21 : 15:16:38

rockmoose; Kristen;
thank you both; really appreciate and enjoy reading/studying your posts;

now a jokish test of optimization skills for those who're interested;
The below sql solves symbolic equation send + more = money.
After replacing the same letters with the same digits we get a true math
expression (without leading zeroes).
Try to accelerate (I think abt. 1000 times) my straightforward solution.

create table #t(n int)
GO
insert #t
select 0 union all
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9
GO
-- send + more = money >> d e m n o r s y
select

s.n*1000+e.n*100+n.n*10+d.n AS 'send',

m.n*1000+o.n*100+r.n*10+e.n AS 'more',

m.n*10000+o.n*1000+n.n*100+e.n*10+y.n AS 'money'

from #t d, #t e, #t m, #t n, #t o, #t r, #t s, #t y

where

s.n>0 and m.n>0 and

s.n*1000+e.n*100+n.n*10+d.n
+
m.n*1000+o.n*100+r.n*10+e.n
=
m.n*10000+o.n*1000+n.n*100+e.n*10+y.n

drop table #t

rockmoose
SQL Natt Alfen

3279 Posts

Posted - 2005-01-21 : 17:57:24

Well m = 1 and (d.n+e.n)%10 = y.n gives like 80 times perf gain....
s.n = 9 ~ 700x

rockmoose

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources