Big Table Query Performance Problem

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Big Table Query Performance Problem

Author

Topic

Kristen
Test

22859 Posts

Posted - 2004-10-13 : 02:00:17

I have a Page Log table, with a column storing the Session number.

I want to query all the rows for a given session. At its simplest this can be expressed as:

SELECT COUNT(*)
FROM dbo.MyLogTable
WHERE MySession = 79284

This is taking 87 seconds (the COUNT(*) is 3,630); there are 14,656,260 rows in the table

There is an index on the MySession column, but its not the PK (i.e. not clustered)

However, SQL appears to be using the PK instead (which uses an Identity, and does nto reference the MySession column)

Running with stats on shows:

Table 'MyLogTable'. Scan count 330, logical reads 258925, physical reads 58, read-ahead reads 243374.

SQL Server Execution Times:
CPU time = 7735 ms, elapsed time = 81263 ms.

ShowPlan_Text gives:

StmtText
|--Compute Scalar(DEFINE:([Expr1002]=Convert([globalagg1004])))
|--Stream Aggregate(DEFINE:([globalagg1004]=SUM([partialagg1003])))
|--Parallelism(Gather Streams)
|--Stream Aggregate(DEFINE:([partialagg1003]=Count(*)))
|--Clustered Index Scan(OBJECT:([MyDatabase].[dbo].[MyLogTable].[PK_MyTable]), WHERE:([MyTable].[MySession]=79284))

The constraint for PK_MyTable says:

ALTER TABLE dbo.MyLogTable
ADD CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED
(
MyID
) WITH FILLFACTOR = 90 ON PRIMARY
GO

so why the Dickens is SQL using the PK when my WHERE clause is for a completely different column?

I tried bunging an index hint in:

WITH (INDEX ( MySessionIndex ) )

but that made it REALLY slow.

I can't do much about the client's hardware, but its a 4 Xeon box with 2GB RAM (the DB size is less than that), and right now its 6:30 in the morning and I doubt anyone much is using the box.

Checking the stats history tells me that the index was rebuilt yesterday late evening (stats are rebuilt by the inbuilt automated process, not a Tara-script)

Kristen

Kristen
Test

22859 Posts

Posted - 2004-10-13 : 13:55:54

<BUMP/>

jsmith8858
Dr. Cross Join

7423 Posts

Posted - 2004-10-13 : 14:08:20

What is the datatype of MySession?

(just curious)

- Jeff

objectuser
Starting Member

14 Posts

Posted - 2004-10-13 : 15:51:02

My first thought is that count(*) is being "optimized" to count using the primary key because SQL Server knows you have one. I just ran a count using count(*) on a couple of tables in my schema and the (expected) plan doesn't use the primary key in either case ... weird.

Have you tried to select count(MySession) instead?

Please let us know if you find a resolution.

--Kevin

X002548
Not Just a Number

15586 Posts

Posted - 2004-10-13 : 15:56:21

OK...where do you get parallelism from?

I betcha it's a partitioned view...where the ddl of the "table"

Maybe I'm wrong...but are you sure that Plan is for that query

Where's the DDL

And what is MyTable.MySession?

Brett

8-)

Kristen
Test

22859 Posts

Posted - 2004-10-14 : 02:27:07

MySession is an int ( "yppl_ysvlID" in the DDL below)

Please note that I don't actually want a COUNT(*), I want to do a SELECT This,That,TheOther into a temporary table, but the COUNT(*) gives similar terrible, IMHO, performance


CREATE TABLE [kk_sm_yPPL_PageProcedureLog] 
(
	[yppl_zEdit] [int] NULL ,
	[yppl_zCrDt] [datetime] NULL ,		-- Create Date/User
	[yppl_zCrUser] [varchar] (4) NULL ,
	[yppl_zUpDt] [datetime] NULL ,		-- Modify Date/User
	[yppl_zUpUser] [varchar] (4) NULL ,
	[yppl_ID] [int] IDENTITY (10000, 1) NOT NULL ,	-- PK
	[yppl_SessionID] [varchar] (20) NULL ,	-- No longer used
	[yppl_CameFrom] [varchar] (255) NULL ,
	[yppl_Page] [smallint] NULL ,		-- No longer used
	[yppl_Special1] [varchar] (1000) NULL ,
	[yppl_Special2] [varchar] (255) NULL ,
	[yppl_Special3] [varchar] (255) NULL ,
	[yppl_Special4] [varchar] (255) NULL ,
	[yppl_Special5] [varchar] (255) NULL ,
	[yppl_AccessDt] [datetime] NULL 
		CONSTRAINT [kk_sm_yPPL_PageProcedureLog_ypplAccessDt] 
			DEFAULT (getdate()),
	[yppl_UsrID] [varchar] (20) NULL ,	-- No longer used
	[yppl_ErrNo] [int] NULL 
		CONSTRAINT [kk_sm_yPPL_PageProcedureLog_ypplErrNo] 
			DEFAULT (0),
	[yppl_ErrMsg] [varchar] (1000) NULL ,
	[yppl_ysvlID] [int] NULL ,		-- "MySession"
	[yppl_PageNo] [int] NULL ,
	[yppl_Referrer] [int] NULL ,
	[yppl_Type] [tinyint] NULL ,
	CONSTRAINT [PK_kk_sm_yPPL_PageProcedureLog] PRIMARY KEY  CLUSTERED 
	(
		[yppl_ID]
	) WITH  FILLFACTOR = 90  ON [PRIMARY] 
) ON [PRIMARY]
GO

CREATE  INDEX [KKI_ypplSessionID] 
ON [dbo].[kk_sm_yPPL_PageProcedureLog]
(
	[yppl_SessionID]
) 
WITH  FILLFACTOR = 90 ON [PRIMARY]
GO

CREATE  INDEX [KKI_ypplzCrDt] 
ON [dbo].[kk_sm_yPPL_PageProcedureLog]
(
	[yppl_zCrDt]
) 
WITH  FILLFACTOR = 90 ON [PRIMARY]

Kristen

Kristen
Test

22859 Posts

Posted - 2004-10-14 : 02:48:18

Here's a more realistic query:


DECLARE @tbl_kk_sm_SP_Session_LIST_01 TABLE
(
	yppl_zCrDt	datetime NOT NULL,
	yppl_ID		int NOT NULL,
	yppl_ysvlID	int NULL
	PRIMARY KEY 
	(
		yppl_zCrDt, 
		yppl_ID
	)
)

SET STATISTICS IO ON
SET STATISTICS TIME ON

INSERT INTO @tbl_kk_sm_SP_Session_LIST_01
SELECT	yppl_zCrDt, 
	yppl_ID, 
	yppl_ysvlID
FROM	dbo.kk_sm_yPPL_PageProcedureLog
WHERE	    yppl_ysvlID = 79284	-- Specific Session Number

SET STATISTICS IO OFF
SET STATISTICS TIME OFF

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

(3630 row(s) affected)

Table '#35E1EF6F'. Scan count 0, logical reads 7295, physical reads 0, read-ahead reads 0.
Table 'kk_sm_yPPL_PageProcedureLog'. Scan count 1, logical reads 273182, physical reads 3, read-ahead reads 262254.

SQL Server Execution Times:
CPU time = 8078 ms, elapsed time = 97301 ms.

SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 0 ms.

StmtText
----------
..|--Clustered Index Insert(OBJECT:(@tbl_kk_sm_SP_Session_LIST_01), SET:(@tbl_kk_sm_SP_Session_LIST_01.[yppl_ysvlID]=[kk_sm_yPPL_PageProcedureLog].[yppl_ysvlID], @tbl_kk_sm_SP_Session_LIST_01.[yppl_ID]=[kk_sm_yPPL_PageProcedureLog].[yppl_ID], @tbl_kk_sm_SP_Session_LIST_01.[yppl_zCrDt]=RaiseIfNull([kk_sm_yPPL_PageProcedureLog].[yppl_zCrDt])))
.......|--Top(ROWCOUNT est 0)
............|--Sort(ORDER BY:([kk_sm_yPPL_PageProcedureLog].[yppl_zCrDt] ASC, [kk_sm_yPPL_PageProcedureLog].[yppl_ID] ASC))
|--Clustered Index Scan(OBJECT:([MyDB].[dbo].[kk_sm_yPPL_PageProcedureLog].[PK_kk_sm_yPPL_PageProcedureLog]), WHERE:([kk_sm_yPPL_PageProcedureLog].[yppl_ysvlID]=79284))

I still don't understand why its using the PK (PK_kk_sm_yPPL_PageProcedureLog) to get at the Session Number (yppl_ysvlID) :-(

Kristen

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-10-14 : 04:14:40

I can't understand why the COUNT(*) query would ever use the clustered index instead of a non-clustered index on the column you're filtering for. Given the 4000:1 ratio, I'm having a hard time seeing why you'd get a table scan on the other query too, even if the alternative is a bookmark lookup.
What execution plans were you getting with the index hint?
Clutching at straws, have you tried it with OPTION (MAXDOP 1)?

Wanderer
Master Smack Fu Yak Hacker

1168 Posts

Posted - 2004-10-14 : 04:24:49

Kristen,

out of interest, regards your "actual" query, have you consider a covering index of:


create index kk_sm_yPPL_PageProcedureLog_CovIX1
on kk_sm_yPPL_PageProcedureLog
([yppl_ysvlID],[yppl_zCrDt],[yppl_ID],[yppl_ysvlID])

*##* *##* *##* *##*

Chaos, Disorder and Panic ... my work is done here!

Kristen
Test

22859 Posts

Posted - 2004-10-14 : 06:17:49

"OPTION (MAXDOP 1)"

Great idea, but doesn't seem to help:

(3630 row(s) affected)

Table '#0BE5F049'. Scan count 0, logical reads 7295, physical reads 0, read-ahead reads 0.
Table 'kk_sm_yPPL_PageProcedureLog'. Scan count 1, logical reads 274477, physical reads 91, read-ahead reads 272815.

SQL Server Execution Times:
CPU time = 7610 ms, elapsed time = 104042 ms.

Query Plan looks to be the same - Rowcount and OrderBy steps are reverse.

I wondered if the OrderBy in the Plan was mucking it up, so I took the PK off the Temporary Table (which got rid of the OrderBy in the Plan, but it still used the same PK and took forever ...

I tried it on another client's machine. Different hardware, same logging table. 21,750,000 rows from which I tried to select 2,357 into my temporary table. Same query plan, same dire performance ...

"covering index"

Thanks, but unfortuately the real query is pulling a load more columns ...

Kristen

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-10-14 : 08:24:54

So what did the query plan with the index hint look like?

Kristen
Test

22859 Posts

Posted - 2004-10-14 : 09:14:34

Would you adam-and-eve it. The Index is on a column called yppl_SessionID which is the now-obsolete-column, replaced with yppl_ysvlID.

So I've thrown away the old index, and built a new one, and it works sweet as a nut.

I think SQLTeam should start making cardboard cutouts - I just need to explain the whole thing to one of those, and the answer would be staring me in the face.

I still don't understand why the original query used the PK, rather than a Table Scan, but its reduced to an intellectual exercise now!

Kristen

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-10-14 : 09:19:56

quote:
Originally posted by Kristen
...
I think SQLTeam should start making cardboard cutouts - I just need to explain the whole thing to one of those, and the answer would be staring me in the face.
...

That's what my math teacher in high school always told me... well not the reference to SQLTeam, but you know what I mean. Maybe you could just put a mirror on your desk or something...

Corey

Kristen
Test

22859 Posts

Posted - 2004-10-14 : 09:43:50

Mirror on my desk? Do I LOOK like a cardboard cutout to you?

On second thoughts, don't answer that ...

Kristen

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2004-10-14 : 09:45:30

One of my cow-orkers used to explain things to his pet rabbit. You could try that.

quote:
I still don't understand why the original query used the PK, rather than a Table Scan

I thought you said the PK was clustered? How's it supposed to do a table scan on a table with a clustered index?

Seventhnight
Master Smack Fu Yak Hacker

2878 Posts

Posted - 2004-10-14 : 09:46:12

Ok... fine.

Get a distorting mirror. Just so it doesn't look exactly like you.

Alot of times, I figure things out in the car or in the shower... maybe you could put those in your office as well (you are the boss so you should have room

)

Corey

Kristen
Test

22859 Posts

Posted - 2004-10-14 : 10:20:37

"I thought you said the PK was clustered? How's it supposed to do a table scan on a table with a clustered index?"

Well that figures ... so is there no such thing as a TableScan on a table with a clustered index then? (I suppose I thought that a clustered index scan was looking for all rows between A and B in the index, and that the actual records would be "right there" and thus be efficient, whereas a TableScan was just "walk the table from end to end looking for data" without bothering to mess around with the index at all.

"I figure things out in the car"

I have 20 yards to walk from the house to the office in the morning. I just can't seem to wolve the days problems in that time. Perhaps I should look into having the office moved further away ...

Kristen

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2004-10-14 : 17:23:04

I'm way underqualified for this...all I wanna say is that I would probably use WITH (NOLOCK) to prevent table lock escalation. Could buy you some time...

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

Kristen
Test

22859 Posts

Posted - 2004-10-15 : 01:02:41

Clearly I'm not sufficiently underqualified then ... 'coz I've never used it!

Looking in BoL it suggests that NOLOCK will give me uncommitted transactions - it may seem weird, but we do rollback log entries (its something I'm working on ...) so I wouldn't want those showing - I can't see a hint (in BoL) that will give me all committed rows without any locks, and with no escallation. That does sound sensible.

I ought really to get more proficient at locking, we don't have any lock hints anywhere, and perhaps we should (we're very careful about coding and transaction blocks, but just don't have any lock hints). I'll start a separate thread).

Kristen

Lumbago
Norsk Yak Master

3271 Posts

Posted - 2004-10-15 : 04:15:34

WITH (NOLOCK) is essential for historical data, and data that needn't be 100% consistent or data that doesn't change very often. I use it *everywhere* on my website where data is unlikely to have changed and I see no reason why everybody else shouldn't also... it's not unlikely that a query like this will escalate to a table lock and for what? ->

SELECT MONTH(myDate), SUM(Volume)
FROM transactiopns
WHERE myDate BEWTWEEN '2002-01-01 00:00:00' AND '2003-01-01 00:00:00'
GROUP BY MONTH(myDate).

I'm convinced that 90% of the queries in any given app doesn't need locking...

--
Lumbago
"Real programmers don't document, if it was hard to write it should be hard to understand"

Kristen
Test

22859 Posts

Posted - 2004-10-16 : 01:55:10

Doesn't need locking, I agree ... but needs blocking? Yup, I'm pretty sure mine need that - I don't want some data that is rubbish and is about to be rolled back. This is now a 3-thread discussion. Discuss. No, that can't be right. Debate. Yeah, that sounds better!

Kristen

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources