execution plan question - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

execution plan question

Author

Topic

Tim
Starting Member

392 Posts

Posted - 2002-12-11 : 01:26:28

when selecting all records and joining multiple tables my query's execution plan normally shows index or table scan for the first table and then index seeks for the others, with nested loops for each join.

when I add a group by clause I get "Bookmark Lookups" in the execution plan that add to the cost.

"is it normal doc? or am i dying?"

what are bookmark lookups and can they be got rid of?

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-12-11 : 09:00:51

Ummmmmm, don't use GROUP BY?

From Books Online:

quote:
Bookmark Lookup

The Bookmark Lookup logical and physical operator uses a bookmark (row ID or clustering key) to look up the corresponding row in the table or clustered index. The Argument column contains the bookmark label used to look up the row in the table or clustered index. The Argument column also contains the name of the table or clustered index in which the row is looked up. If the WITH PREFETCH clause appears in the Argument column, then the query processor has determined that it is optimal to use asynchronous prefetching (read-ahead) when looking up bookmarks in the table or clustered index.

You can add your own JOIN hints to the query and see if they improve performance.

If you're grouping on a joined column, try putting another table's column in the GROUP BY instead, it might use that table's indexes instead.

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2002-12-11 : 09:24:03

Chances are you are doing many more joins than you need to, as well.

Consider the following tables:

Customers: *CustomerID , Name
Orders: *CustomerID, *OrderID, Date, Amount

(*indicates key fields for each table)

Suppose you want a result that gives you customer name and their total order amount.

So you say:

SELECT Name, SUM(Amount) as TotalAmount
FROM Customers C
INNER JOIN Orders O
ON C.CustomerID = O.OrderID
GROUP BY C.Name

Pretty easy, right? Except -- there is a more efficient way to do the above query. In the above, you are joining every line in the "Orders" table to the "Customer" table -- if you have 10,000 orders and 10 Customers, you are performing 10,000 joins to the Customer table.

Consider this:

SELECT C.Name, O.TotalAmount
FROM Customers C
INNER JOIN
(SELECT CustomerID, SUM(Amount) as TotalAmount FROM Orders GROUP BY CustomerID) O
ON C.CustomerID = O.CustomerID

The above query performs easily rolls up the Orders table from 10,000 into 10 records and THEN peforms only 10 joins, because now it doesn't have to do a JOIN on every record.

Now, SQL may optimise the first query to perform the same as the second, I'm not 100% sure, but I do know that in general (not just with SQL server but with SQL in general) the second query is the most efficient way to write the above.

I only point this out because a) the first query is what 99% of people write and b) it sounds like you may have something similiar, because you mentioned you had lots of joins and then just added GROUP BY's and noticed the performance suffered.

So, my point is (finally!) :

Just adding GROUP BY's without re-working the a query that includes many joins probably will always degrade performance because you are now forcing SQL to sort and GROUP in ways in which the optimizer wasn't doing it before. Before, it concentrated on efficient joins. But now, it has to deal with GROUPING as well.

- Jeff

Tim
Starting Member

392 Posts

Posted - 2002-12-11 : 18:25:35

Jeff, I like your idea while the derived table has a very small number of records. but...

The derived rows still have to be joined to the first table, and the derived table supplies no index for this.

So at some point there must be a limit where you lose the efficiency gained in the "non join rollup" to the process of "join the rolled up records".

Do I have it right?

If so, then I have to make a call about which technique to use, based on how many rows I forsee being the result of the rollup.

simondeutsch
Aged Yak Warrior

547 Posts

Posted - 2002-12-11 : 21:24:05

Tim, I just tried both types of queries,and the derived table one actually did much better.
Customers table(100,000 rows), Orders table (450,000) rows
and the first query using ordinary joins had 61.44 of the cost, query takes 1.36 mins to complete, the derived table one had 38.56, took 17 seconds. There was no WHERE clause in either query, and there were approx. 45,000 rows returned.

Sarah Berger MCSD

Oooopsss.... I just ran DBCC DropCleanBuffers and reran query 2. Took 52 secs.

Edited by - simondeutsch on 12/11/2002 21:26:31

robvolk
Most Valuable Yak

15732 Posts

Posted - 2002-12-11 : 21:27:05

Ahhhh, but did you DBCC DROPCLEANBUFFERS between the two runs? Also do a DBCC FREEPROCCCACHE too.

Never mind...

Edited by - robvolk on 12/11/2002 21:28:02

Tim
Starting Member

392 Posts

Posted - 2002-12-11 : 22:08:42

sarah thanks for that... it helps ... and saves me buggerising around doing it myself !

Just to be sure I understand, your orders table must have 45,000 distinct customers in it and 55,000 of your customers don't have orders. right?

(I would be bothered if I were the boss of that virtual business !

)

Maybe when I finally receive my copy of the guru's guide next week it will have the solution! I will just open the TOC and look up "Tims problem #22307" and flip straight to the answer. (note the # references SQL team TOPIC_ID for convenience, as all good books should

)

I have BIG expectations of this book after all the hype generated by "the pimp"

Edited by - tim on 12/11/2002 22:16:20

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2002-12-11 : 22:11:37

whew! it sounds like i had at least SOME idea of what I was talking about!

- Jeff

Tim
Starting Member

392 Posts

Posted - 2002-12-11 : 22:18:53

always a relief eh jeff !

simondeutsch
Aged Yak Warrior

547 Posts

Posted - 2002-12-12 : 11:52:10

Hi, Tim!
After Rob's pimping, I got the book too. You will not be disappointed, even if you don't find "Tim's Problem #22307" in the index.
And never mind about that data: It's junk. Not a production db. Otherwise, marketing would be all be under mob contracts.

Sarah Berger MCSD

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2002-12-13 : 12:14:10

I'm not convinced.


DROP TABLE Orders
DROP TABLE Customers

CREATE TABLE Customers (CustomerID int PRIMARY KEY, "Name" varchar(50) NOT NULL)

CREATE TABLE Orders (CustomerID int NOT NULL REFERENCES Customers(CustomerID),
  OrderID int NOT NULL, "Date" datetime NOT NULL, Amount money NOT NULL,
  PRIMARY KEY (CustomerID, OrderID)
)

INSERT INTO Customers
SELECT n, CAST(NEWID() AS varchar(50))
FROM Numbers
WHERE n < 10

INSERT INTO Orders
SELECT N1.n, N2.n,
  DATEADD(s, 86400*CAST(N2.n AS float) + 43200*RAND(CAST(NEWID() AS binary(4))), '20000101'),
  RAND(CAST(NEWID() AS binary(4)))*1000.0
FROM Numbers AS N1, Numbers AS N2
WHERE N1.n < 10
  AND N2.n < 1000

The test queries:


-- A
SELECT Name, SUM(Amount) as TotalAmount
FROM Customers C
INNER JOIN Orders O
ON C.CustomerID = O.CustomerID
-- I presume Jeff didn't really mean C.CustomerID = O.OrderID
GROUP BY C.Name

-- B
SELECT C.Name, O.TotalAmount
FROM Customers C
INNER JOIN (SELECT CustomerID, SUM(Amount) as TotalAmount FROM Orders GROUP BY CustomerID) O
ON C.CustomerID = O.CustomerID

The first query has the same execution plan as the second, except that it then sorts on Customers."Name" and stream aggs to finishes the SUM. But that's to be expected. Unless there's a UNIQUE constraint on Customers."Name" these are different queries.

What's less good is that adding in the UNIQUE constraint makes no difference. SQL Server 2000 can't figure out that because both CustomerID and "Name" are candidate keys for Customers, there's no need to do a second aggregation.
What if we'd grouped on CustomerID?


-- C
SELECT Name, SUM(Amount) as TotalAmount
FROM Customers C
INNER JOIN Orders O
ON C.CustomerID = O.CustomerID
GROUP BY C.CustomerID, C.Name

Now this is equivalent to the second query (B) above, irrespective of the UNIQUE constraint. And it has the same execution plan.

Edited by - Arnold Fribble on 12/13/2002 12:20:50

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources