Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 join query performance

Author  Topic 

spock
Starting Member

35 Posts

Posted - 2002-02-04 : 22:30:20
Hi

i have an employee table which has a clustered index on employee id and no non clustered indexes. i have a position title table which is clustered on post id and ctry code.

i used the following query to get the postition titles of each employee using merge joins
as i read they were more faster then inner/hash joins.

select * from (
(select top 100 percent * from empt_emp order by post_id) emp inner join
(select top 100 percent post_id,post_nme from pstt_post_title order by post_id) pos
on emp.post_id = pos.post_id )

removing the order by from the second sub query made it a hash join. My question is , is this method really more efficient than just inner joining without ordering by the join columns like

select * from (
(select * from empt_emp ) emp inner join
(select post_id,post_nme from pstt_post_title ) pos
on emp.post_id = pos.post_id )

i want to know how much of resources would be used for performing the order by.

i am on sql 7.0 and iis 4.0 .

thanks in advance
kaushik



byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2002-02-04 : 23:13:30
Spock,

As you have found out "ORDER BY" is very very expensive...

I would guess that over half the processing in that query will be used to order the set....

DavidM

Tomorrow is the same day as Today was the day before.
Go to Top of Page

spock
Starting Member

35 Posts

Posted - 2002-02-05 : 00:03:47
david ,
quote:

As you have found out "ORDER BY" is very very expensive...
I would guess that over half the processing in that query will be used to order the set....



does this mean i go in for the second query i have posted and not the first one ?
the query analyser reports 9% and 11% cost for the order by parts for the first query.
is this supposed to be high?
when i ran the second query with option(merge join) added, it reported the same execution plan as that was generated for the first query.
so i am now confused as to when am i supposed to use the merge joins option?

kaushik

Go to Top of Page

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2002-02-05 : 01:22:10
Spock,

More often than not SQL Server will pick the best join operator...

From BOL

"The merge join operation may be either a regular or a many-to-many operation. A many-to-many merge join uses a temporary table to store rows. If there are duplicate values from each input, one of the inputs will have to rewind to the start of the duplicates as each duplicate from the other input is processed.

Merge join itself is very fast, but it can be an expensive choice if sort operations are required. However, if the data volume is large and the desired data can be obtained presorted from existing B-tree indexes, merge join is often the fastest available join algorithm."

They are basically saying that if all the predicate columns (WHERE, ON etc..) are indexed (pre-sorted) then the Merge join should be fastest.

Have a look at this in Northwind....

Select E.* from Employees as E
inner merge join EmployeeTerritories ET on ET.EmployeeID = E.EmployeeID
Select E.* from Employees as E
inner hash join EmployeeTerritories ET on ET.EmployeeID = E.EmployeeID
Select E.* from Employees as E
inner loop join EmployeeTerritories ET on ET.EmployeeID = E.EmployeeID


Have a look at the execution plans to see where the resources are spent...

Because the EmployeeID is clustered and hence pre-sorted then the merge will be faster than hash join but the loop join is faster than the merge join... If you remove the hints, guess which one SQL Server picked....

Also, Hash and Merge join are memory intensive while Loop joins are I/O intensive...

HTH

DavidM

Tomorrow is the same day as Today was the day before.
Go to Top of Page
   

- Advertisement -