| Author |
Topic |
|
klindeman72
Starting Member
2 Posts |
Posted - 2005-01-20 : 14:44:51
|
| Hello everyone, pleasure to meet you all. I have few questions if you have time. I have been getting into a debate of late as to which is the best method to use. EXISTS, IN, or INNER JOINS. I have read a bit, and it seems that the hottest contention is that EXISTS and INNER JOINS work best on large subqueries that might return a lot of data if you were to use IN. So the first question is that if I reduced the return set of the IN subquery would it be best to use IN as I hear that IN works fastest on small subqueries. I.E.Query 1:SELECT * FROM sometable WHERE EXISTS (SELECT * FROMsomeothertable WHERE y = 'foo')--This finds a match and goes back to main queryQuery 2:SELECT * FROM sometable WHERE y IN (SELECT y FROMsomeothertable)--This returns x amount of rows and compares--Each for equalityQuery 3:SELECT * FROM sometable WHERE y IN (SELECT y FROMsomeothertable WHERE y = 'foo')--This returns x amount amount of rows and compares--Each for equality, however the return set is smallSo in that case would EXISTS still be faster, then to further cloud the issue, would an INNER JOIN make the query faster? To give you a real example, let me show you what I have done so far with my query, though I have not yet figured out the syntax to make this work with INNER JOIN, I am sure that is not but a stones throw away..Here is my EXISTS query:SELECT * FROM ContactWHERE (dbo.Contact.FirstName LIKE '%Bob%')AND (dbo.Contact.LastName LIKE '%Golby%')AND (dbo.Contact.Email LIKE '%bobgolby@bob.com%')AND (dbo.Contact.Company LIKE '%Bob Inc%')AND (dbo.Contact.HomePhone LIKE '%303.333.3333%')AND (dbo.Contact.WorkPhone LIKE '%303.333.3333%')AND (dbo.Contact.HomeAddress1 LIKE '%1234 Bob St%') AND (dbo.Contact.HomeCity LIKE '%Bobville%')AND (dbo.Contact.HomeState LIKE '%BO%')AND (dbo.Contact.HomePostalCode LIKE '%80012%')AND (dbo.Contact.HomeCountry LIKE '%United States%')AND (dbo.Contact.WorkAddress1 LIKE '%1234 Bobby St%') AND (dbo.Contact.WorkAddress2 LIKE '%Suite 112%')AND (dbo.Contact.WorkCity LIKE '%Bobville%')AND (dbo.Contact.WorkState LIKE '%BO%')AND (dbo.Contact.WorkPostalCode LIKE '%80012%')AND (dbo.Contact.WorkCountry LIKE '%United States%')AND EXISTS ( SELECT ProjectContactAssign.ContactID FROM ProjectContactAssign WHERE EXISTS ( SELECT ProjectUserAssign.ProjectID FROM ProjectUserAssign WHERE EXISTS ( SELECT * FROM ProjectUserAssign WHERE ProjectUserAssign.UserID = '{4B40B11A-8087-4DF8-A4DA-789E0672D074}')))And here is the same query using IN..SELECT * FROM ContactWHERE (dbo.Contact.FirstName LIKE '%Bob%')AND (dbo.Contact.LastName LIKE '%Golby%')AND (dbo.Contact.Email LIKE '%bobgolby@bob.com%')AND (dbo.Contact.Company LIKE '%Bob Inc%')AND (dbo.Contact.HomePhone LIKE '%303.333.3333%')AND (dbo.Contact.WorkPhone LIKE '%303.333.3333%')AND (dbo.Contact.HomeAddress1 LIKE '%1234 Bob St%') AND (dbo.Contact.HomeCity LIKE '%Bobville%')AND (dbo.Contact.HomeState LIKE '%BO%')AND (dbo.Contact.HomePostalCode LIKE '%80012%')AND (dbo.Contact.HomeCountry LIKE '%United States%')AND (dbo.Contact.WorkAddress1 LIKE '%1234 Bobby St%') AND (dbo.Contact.WorkAddress2 LIKE '%Suite 112%')AND (dbo.Contact.WorkCity LIKE '%Bobville%')AND (dbo.Contact.WorkState LIKE '%BO%')AND (dbo.Contact.WorkPostalCode LIKE '%80012%')AND (dbo.Contact.WorkCountry LIKE '%United States%')AND Contact.ContactID IN ( SELECT ProjectContactAssign.ContactID FROM ProjectContactAssign WHERE ProjectContactAssign.ProjectID IN ( SELECT ProjectUserAssign.ProjectID FROM ProjectUserAssign WHERE EXISTS ( SELECT * FROM ProjectUserAssign WHERE ProjectUserAssign.UserID = '{4B40B11A-8087-4DF8-A4DA-789E0672D074}')))Is the EXISTS query faster than the IN query, should I re-write it to use an INNER JOIN because that is faster? Thanks in advance and for your patience.Kent |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-20 : 15:07:21
|
| as I read EXISTS faster all of them; |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-20 : 15:18:06
|
- In the SELECT clause don't use "SELECT *"- Use JOINS to join all the tables together- In the WHERE clause filter the data- Use table aliasing- To get a match, you probably do not have to do col = LIKE('%x%') for every column in the table !- Use QA to show You the "Execution Plan" ( see BOL )Very generally:JOINS will be the most effective, then EXISTS, then IN.Is this more readable ?SELECT c.FirstName ,c.LastName ,c.Email ,c.Company ,c.HomePhone ,c.WorkPhone ,c.HomeAddress1 ,c.HomeCity ,c.HomeState ,c.HomePostalCode ,c.HomeCountry ,c.WorkAddress1 ,c.WorkAddress2 ,c.WorkCity ,c.WorkState ,c.WorkPostalCode ,c.WorkCountryFROM dbo.Contact c JOIN dbo.ProjectContactAssign pca ON c.ContactID = pca.ContactID JOIN dbo.ProjectUserAssign pua ON pca.ProjectID = pua.ProjectIDWHERE ProjectUserAssign.UserID = '{4B40B11A-8087-4DF8-A4DA-789E0672D074}' AND c.FirstName = 'Bob' AND c.LastName = 'Golby' /* Is this a search query or used to just retrieve Bob Golby ? AND (c.FirstName LIKE '%Bob%') AND (c.LastName LIKE '%Golby%') AND (c.Email LIKE '%bobgolby@bob.com%') AND (c.Company LIKE '%Bob Inc%') AND (c.HomePhone LIKE '%303.333.3333%') AND (c.WorkPhone LIKE '%303.333.3333%') AND (c.HomeAddress1 LIKE '%1234 Bob St%') AND (c.HomeCity LIKE '%Bobville%') AND (c.HomeState LIKE '%BO%') AND (c.HomePostalCode LIKE '%80012%') AND (c.HomeCountry LIKE '%United States%') AND (c.WorkAddress1 LIKE '%1234 Bobby St%') AND (c.WorkAddress2 LIKE '%Suite 112%') AND (cc.WorkCity LIKE '%Bobville%') AND (c.WorkState LIKE '%BO%') AND (c.WorkPostalCode LIKE '%80012%') AND (c.WorkCountry LIKE '%United States%') */rockmoose |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2005-01-20 : 15:26:09
|
| Nothing will be faster with a LIKE '%anything%'That's a scan...Brett8-) |
 |
|
|
klindeman72
Starting Member
2 Posts |
Posted - 2005-01-20 : 15:32:52
|
| Verah nice... Thanks Rockmoose.. as to the matchs, everything there can or could be included in the query, it depends on the users input, once the whole thing is correct I can, and will truncate it only with pertinant data. Very elegant solution though I had to add a DISTINCT there at the start to get it 100%.. Thank you both! |
 |
|
|
AndyB13
Aged Yak Warrior
583 Posts |
Posted - 2005-01-20 : 15:41:05
|
That is a well formatted query rockmoose or should i say beatified |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-20 : 15:50:17
|
| rockmoose;>Very generally:>JOINS will be the most effective, then EXISTSare you sure? or maybe you have reliable stats data? |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-20 : 16:05:17
|
Yeahh, it was "beautified" , ran it in a cool Java Applet I found on the web - Not.As Brett says, a LIKE will force scanning the Customer table.But SQL server might be able to minimize the rowset that needs to be scanned.Anyway, I think I got the M:M relationships now.If You want to get rid of the DISTINCT you could try the following FROM constructs as well.Try the different approaches in QA and analyze the Execution Plans !SELECT ... -- from Contact tableFROM dbo.Contact c JOIN ( SELECT DISTINCT pca.ContactID FROM dbo.ProjectUserAssign pua JOIN dbo.ProjectContactAssign pca ON pca.ProjectID = pua.ProjectID WHERE pua.UserID = '{4B40B11A-8087-4DF8-A4DA-789E0672D074}' ) AS pcaContacts ON c.ContactID = pcaContacts.ContactIDWHERE ... -- filtering on Contact tableSELECT ... -- from Contact tableFROM dbo.Contact cWHERE EXISTS( SELECT * FROM dbo.ProjectUserAssign pua JOIN dbo.ProjectContactAssign pca ON pca.ProjectID = pua.ProjectID WHERE pua.UserID = '{4B40B11A-8087-4DF8-A4DA-789E0672D074}' AND pca.ContactID = c.ContactID ) ... -- filtering on Contact tablerockmoose |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-20 : 16:09:37
|
quote: Originally posted by Stoad rockmoose;>Very generally:>JOINS will be the most effective, then EXISTSare you sure? or maybe you have reliable stats data?
Sorry, I regret writing that now, and was just waiting to be jumped.As far as I have experience many times the Query plan will be about the same.Just that I feel that JOINS are clearer and IMO easier to hint and change.No reliable stats data either way, nope.General ramble over.rockmoose |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-20 : 16:14:01
|
kind of a counter-sample (just recalled it):http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=30318A quote from there:quote: Originally posted by Arnold Fribble I think David was believing the cost estimates too much.
don't believe it too much |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2005-01-20 : 16:17:52
|
| The bigest problem is all the possible combinations...you're gonna say they can supply any piece of the search criteria..I still like thishttp://weblogs.sqlteam.com/brettk/archive/2004/05/05/1312.aspxAnd if you can loose the leading % (what are the length of these fields anyway?) you're in business.Brett8-) |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-20 : 16:28:31
|
A crappy query is a crappy query wither or wether there EXISTS JOIN clauses...I think that the Query Optimizer would handle EXISTS and JOIN operators similarily.By determining the best way to operate SET1 with SET2.Lot's of speculation here , sorry about that.rockmoose |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2005-01-20 : 16:39:23
|
| Personally I try to avoid the IN operator.To be honest, we shouldn't have to care. And in most of the cases, SQL Server will pick the best plan. Trying to "out think" the optimiser usually results in pain and suffering.DavidM"Always pre-heat the oven" |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-20 : 16:49:09
|
| >> I still like this....Very interesting blog Brett, thanx for the link.Let's say we are doing some free for all search proc on a table with dynamic paramaters,(the standard case).Which method is preferred?- union all method- adding left joins- existance check for each column- where a=@a or coalesce(a,@a) is null or @a is null -- method- x methodwhat are the factors involved?any research & stats available?Or is the answer just - "It depends"rockmoose |
 |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2005-01-20 : 16:50:35
|
| I hadn't seen that thread, Stoad! And Arnold is dead right too! Your query is faster when there is data!I initially made the same mistake with UDFs when I first laid eyes on them. Nothing like production volumes to show that mistake.DavidM"Always pre-heat the oven" |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-21 : 02:17:53
|
| never mind, David; lol;seriously, my pink dream is to forget for good about Optimization & Indexes.This shamanic stuff really sucks. I wish they put grains of AI into the RDBMSs. |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-01-21 : 12:29:57
|
| My $0.02:If you are having to use DISTINCT because the JOINs are bringing multiple, uneeded, rows then use EXISTS instead.If you have lots of parameters which are OPTIONAL then consider constructing parameterised SQL for the WHERE clause (i.e. leaving out the bits where the criteria is blank) and use sp_ExecuteSQL to get a cached query plan.Kristen |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-21 : 12:53:41
|
| I think you get quite a bang for the bucks by following Kristen's $0.02.Good advice.rockmoose |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-01-21 : 13:03:56
|
| These days $0.02 requires saving about a trillion CPU cycles!Kristen |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2005-01-21 : 15:16:38
|
| rockmoose; Kristen;thank you both; really appreciate and enjoy reading/studying your posts;now a jokish test of optimization skills for those who're interested;The below sql solves symbolic equation send + more = money.After replacing the same letters with the same digits we get a true mathexpression (without leading zeroes).Try to accelerate (I think abt. 1000 times) my straightforward solution.create table #t(n int)GOinsert #tselect 0 union allselect 1 union allselect 2 union allselect 3 union allselect 4 union allselect 5 union allselect 6 union allselect 7 union allselect 8 union allselect 9GO -- send + more = money >> d e m n o r s yselects.n*1000+e.n*100+n.n*10+d.n AS 'send',m.n*1000+o.n*100+r.n*10+e.n AS 'more',m.n*10000+o.n*1000+n.n*100+e.n*10+y.n AS 'money'from #t d, #t e, #t m, #t n, #t o, #t r, #t s, #t ywheres.n>0 and m.n>0 ands.n*1000+e.n*100+n.n*10+d.n+m.n*1000+o.n*100+r.n*10+e.n=m.n*10000+o.n*1000+n.n*100+e.n*10+y.ndrop table #t |
 |
|
|
rockmoose
SQL Natt Alfen
3279 Posts |
Posted - 2005-01-21 : 17:57:24
|
| Well m = 1 and (d.n+e.n)%10 = y.n gives like 80 times perf gain....s.n = 9 ~ 700xrockmoose |
 |
|
|
Next Page
|