I have the following query:select a.PartyID as ID1, b.PartyID from ( select top 500000 (FirstName + LastName + LTrim(RTrim(Address1))) as MatchString, PCM1.partyid from Pn1 inner join PCM1 on Pn1.partyid = PCM1.partyID inner join CM1 on PCM1.CmID = CM1.CmID INNER join postaladdress e on CM1.cmid = e.cmid ) a Inner JOIN ( select top 500000 (FirstName + LastName +LTrim(RTrim(Address1))) as MatchString, PCM2.partyid from Person Pn2 inner join PCM2 on Pn2.partyid = PCM2.partyID inner join CM2 on PCM2.CMID = CM2.CMID INNER join postaladdress d on CM2.cmid = d.cmid ) b ON a.MatchString = b.MatchString and a.partyId < b.partyid
Each of those derived tables returns a result set of approx 286000 rows. If I remove the TOP clause and try to run the query on all the data it runs forever never returning anything ( well, never being as long as I was willing to wait, about 15minutes). If I put TOP 500000, which is there now, it runs in 14 seconds. I am guessing this has something to do with query optimization? TOP 500000 should return the same resultset as the query without TOP at all, as 500000 is far more than the actual number of rows returned without TOP. Am I misunderstaning how TOP works? Is this an execution plan issue?