
 Clustered index on IDENTITY or Date column?


Kristen
Test

22859 Posts

Posted - 2005-11-26 : 05:37:59
Pretty sure I know the answer to this one, but I would like some second opinions please.

I have a table that logs each [website] page that users visit.

It has the Date, an IDENTITY ID, any Error number (from an UPDATE to the record if/when an error is recorded by the application), and a Session ID (i.e. many pages per session) which joins to a Session Logging table.

OK ...

Currently I have a CLUSTERED PK on the IDENTITY.

We basically do one insert, and then one update (to record the "completed time" and any error number), and then never edit that row again. 7 days later the row will be deleted [3 days later on very busy sites].

We query by Create Date. That takes a while, and (for a range of, say, one day) probably never uses the Date index because that range is 1/7th [or 1/3rd] of the total rows in the table.

We also have a report that looks for pages with Error Number <> 0 (within a date range).

Question:

Should I move the clustered index to the date?

Supplemental question: should the columns with a DEFAULT on them be set to NOT NULL? The application doesn't care if they contain NULLs (in practice there aren't any), but is there some performance advantage - e.g. the optimiser knowing that the column cannot contain NULLs?

CREATE TABLE MyPageLog
(
MyCreateDate datetime NULL DEFAULT (GetDate()),
MyPageLogID int IDENTITY (10000, 1) NOT NULL ,
MyErrorNo int NULL DEFAULT (0),
MySessionID int NULL ,
... other columns ...
CONSTRAINT PK_MyPageLog PRIMARY KEY CLUSTERED
(
MyPageLogID
)
)

CREATE INDEX IX_MyPageLog_MySessionID
ON dbo.MyPageLog
(
MySessionID
)
GO

CREATE INDEX IX_MyPageLog_MyCreateDate
ON dbo.MyPageLog
(
MyCreateDate,
MyErrorNo
)
GO
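For what it's worth, the re-arrangement being considered would look something like this (just a sketch - the new index name is made up, and dropping/re-creating a clustered PK on a 20M+ row table rebuilds the table and every non-clustered index, so it would need a quiet window):

```sql
-- Hypothetical switch: cluster on the date, keep the IDENTITY unique
ALTER TABLE dbo.MyPageLog DROP CONSTRAINT PK_MyPageLog
GO
-- Non-unique clustered index on the (ever-increasing) create date
CREATE CLUSTERED INDEX IX_MyPageLog_MyCreateDate_CL
ON dbo.MyPageLog (MyCreateDate)
GO
-- PK retained, but now non-clustered
ALTER TABLE dbo.MyPageLog ADD CONSTRAINT PK_MyPageLog
PRIMARY KEY NONCLUSTERED (MyPageLogID)
GO
```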

Kristen

druer
Constraint Violating Yak Guru

314 Posts

Posted - 2005-11-28 : 10:36:23
Would adding a few more fields to make the index "cover" the query be something you can do, or are there too many additional fields needed for the report?

Have you asked for Microsoft's opinion (i.e. the Index Tuning Wizard)?

You should try your query in Query Analyzer, showing the execution plan. My guess is that it will use the index on the date field. While I think you are logically right that one DAY is 1/7th or 1/3rd of your data, the datatype also includes the time of day, so the optimizer won't necessarily consider the range to be 1/7 or 1/3 of the data. At least I would guess that is the case and that it would use your index.

If you optimize to resolve the reporting performance, is the performance hit for the initial update something you can live with? Obviously the clustered index on the ID field gives you the fastest UPDATE to record the completion time, as my guess is that is what you are using in your WHERE clause to find the record to update.

Have you considered using a fill factor of 100% for either/both indexes (IDENTITY, DATE)? Since your data always increases in a predictable manner, that would eliminate the wasted space of the default fill factor - you know you don't need to leave any holes in your pages.
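Applying the fill-factor suggestion to the date index from the original DDL might look like this (a sketch; `DROP_EXISTING` rebuilds the index in place):

```sql
-- Rebuild the existing date index packed to 100%, since new
-- rows only ever arrive at the "end" of the key range
CREATE INDEX IX_MyPageLog_MyCreateDate
ON dbo.MyPageLog (MyCreateDate, MyErrorNo)
WITH DROP_EXISTING, FILLFACTOR = 100
GO
```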

1 1/2 maybe 2 cents worth,
Dalton

Kristen
Test

22859 Posts

Posted - 2005-11-30 : 09:06:51
"Would adding a few more fields to make the index "cover" the query"

Nope, but we have added some to cover the WHERE clause

"Have you asked for Microsoft's opinion"

Nope

"the datatype also includes the minutes so the optimizer won't consider ..."

If I ask for all rows from "29-Nov-2005 00:00:00" to "29-Nov-2005 23:59:59" it's going to SCAN the table. But that's my main reason for thinking that moving the Clustered Index to the date is going to help - at the moment I get a SCAN of 100% of the table; with the change I think I will get a clustered index scan of 1/5th of the table only.

"is the performance hit for the initial update something you can live with"

Good question, which I had not asked myself.

We insert (and update once) about 8,000,000 rows a day into this table. We report on it not very often. However, reporting can currently take 30 seconds or so - which is probably putting strain on the concurrent INSERTs/UPDATEs being attempted, and might be better avoided? (I don't know the answer to that one)

"Have you considered using a fill factor of 100%"

We have that on the current Clustered PK on the IDENTITY, and on the non-clustered DATE index - but it's a good point if I re-arrange the indexes.

I think I'll run the reports with the indexes as they are, and then switched round, and see just how big the difference is. And I'll also try 1,000 INSERTs & UPDATEs to see how they look.

Test test test ...

Kristen

X002548
Not Just a Number

15586 Posts

Posted - 2005-11-30 : 10:20:13
you insert 8 million rows a day, and you don't have hotspots with the inserts because the clustered index is an identity?

I'm amazed.

Post the query that you say scans when you look for rows by dates ... and some sample data too.

I don't believe it....

I'm liking MySessionID and MyCreateDate as the Cluster



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam

Kristen
Test

22859 Posts

Posted - 2005-11-30 : 14:39:48
" you don't have hotspots with the inserts because the clustered index is an identity?"

Maybe we do, and it hasn't caused any problems that I've spotted

"Post the query that you says scans when you look for rows by dates"

Well ... you're probably going to tell me I'm being really amateur. Here's the query and SHOWPLAN_TEXT

SELECT MIN(MyCreateDate), MAX(MyCreateDate), COUNT(*)
FROM dbo.MyPageLog

2005-11-26 00:00:00.000 2005-11-30 19:24:05.227 23,595,027

DECLARE @dtStart datetime,
@dtStop datetime
SELECT @dtStart = '27-Nov-2005', -- Pick ONE DAY, 20% of data
@dtStop = '28-Nov-2005'
SELECT [T_Type] = 'H', -- Hourly
[MyCreateDate] = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
MyPageNo = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END,
[D_MySessionID] = COUNT(DISTINCT MySessionID),
[C_MyErrorNo] = SUM(CASE WHEN MyErrorNo = 0 THEN 0 ELSE 1 END),
[D_MyUserID] = COUNT(DISTINCT MyUserID),
[C_Count] = COUNT(*),
[MIN_Elapsed] = MIN(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[AVG_Elapsed] = AVG(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[MAX_Elapsed] = MAX(DateDiff(Millisecond, MyCreateDate, MyUpdateDate))
INTO #MyTempTable -- Store it somewhere ...
FROM dbo.MyPageLog
WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop
GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END

|--Table Insert(OBJECT:([#MyTempTable]), SET:([#MyTempTable].[MAX_Elapsed]=[Expr1010], [#MyTempTable].[AVG_Elapsed]=[Expr1009], [#MyTempTable].[MIN_Elapsed]=[Expr1008], [#MyTempTable].[C_Count]=[Expr1007], [#MyTempTable].[D_MyUserID]=[Expr1006], [#MyTempTable].[C_MyErrorNo]=[Expr1005], [#MyTempTable].[D_MySessionID]=[Expr1004], [#MyTempTable].[MyPageNo]=[Expr1003], [#MyTempTable].[MyLogType]=[MyPageLog].[MyLogType], [#MyTempTable].[MyCreateDate]=[Expr1002], [#MyTempTable].[T_Type]='H'))
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003]))
|--Hash Match(Inner Join, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003])=([Expr1002], [MyPageLog].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]))
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003]))
| |--Compute Scalar(DEFINE:([Expr1004]=Convert([globalagg1036])))
| |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]) DEFINE:([globalagg1036]=SUM([partialagg1035])))
| |--Sort(ORDER BY:([Expr1002] ASC, [MyPageLog].[MyLogType] ASC, [Expr1003] ASC))
| |--Hash Match(Partial Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) DEFINE:([partialagg1035]=COUNT_BIG([MyPageLog].[MySessionID])))
| |--Hash Match(Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003], [MyPageLog].[MySessionID]), RESIDUAL:((([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) AND [MyPageLog].[MySessionID]=[MyPageLog].[MySessionID]))
| |--Table Spool
| |--Compute Scalar(DEFINE:([Expr1002]=dateadd(hour, datediff(hour, 'Jan 1 1900 12:00AM', [MyPageLog].[MyCreateDate]), 'Jan 1 1900 12:00AM'), [Expr1003]=If ([MyPageLog].[MyLogType]=1) then Convert([MyPageLog].[MyOtherData]) else [MyPageLog].[MyPageNo]))
| |--Clustered Index Scan(OBJECT:([MyDBName].[dbo].[MyPageLog].[PK_MyPageLog]), WHERE:([MyPageLog].[MyCreateDate]>=[@dtStart] AND [MyPageLog].[MyCreateDate]<[@dtStop]))
|--Nested Loops(Inner Join, WHERE:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]))
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003]))
| |--Compute Scalar(DEFINE:([Expr1006]=Convert([globalagg1038])))
| |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]) DEFINE:([globalagg1038]=SUM([partialagg1037])))
| |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]) DEFINE:([partialagg1037]=COUNT_BIG([MyPageLog].[MyUserID])))
| |--Sort(ORDER BY:([Expr1002] ASC, [MyPageLog].[MyLogType] ASC, [Expr1003] ASC))
| |--Hash Match(Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003], [MyPageLog].[MyUserID]), RESIDUAL:((([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) AND [MyPageLog].[MyUserID]=[MyPageLog].[MyUserID]))
| |--Hash Match(Partial Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003], [MyPageLog].[MyUserID]), RESIDUAL:((([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) AND [MyPageLog].[MyUserID]=[MyPageLog].[MyUserID]))
| |--Table Spool
|--Table Spool
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003]))
|--Compute Scalar(DEFINE:([Expr1007]=Convert([Expr1039]), [Expr1009]=If ([Expr1040]=0) then NULL else ([Expr1041]/Convert([Expr1040]))))
|--Hash Match(Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) DEFINE:([Expr1005]=SUM(If ([MyPageLog].[MyErrorNo]=0) then 0 else 1), [Expr1039]=COUNT(*), [Expr1040]=COUNT_BIG(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate])), [Expr1041]=SUM(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate])), [Expr1008]=MIN(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate])), [Expr1010]=MAX(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate]))))
|--Table Spool

(I didn't make the Query Plan fixed-width 'coz it's kinda wide, but hopefully you can cut & paste it into something that doesn't word-wrap)

Not really sure what I can give you for data - obviously I can dump some ... but what sort of thing are you after? Would SELECT COUNT(DISTINCT MyColumn1), COUNT(*) type stuff be more informative?

Thanks for your help

Kristen

Kristen
Test

22859 Posts

Posted - 2005-11-30 : 14:45:49
If I change it to:

FROM dbo.MyPageLog
WHERE MyCreateDate >= '20051127 10:00:00'
AND MyCreateDate < '20051127 11:00:00'

so that only one hour is selected - COUNT(*)=201,044 rows - then it still uses the

|--Clustered Index Scan(OBJECT:([HAWKIN_LIVE].[dbo].[MyPageLog].[PK_MyPageLog]), WHERE:([MyPageLog].[MyCreateDate]>='Nov 27 2005 10:00AM' AND [MyPageLog].[MyCreateDate]<'Nov 27 2005 11:00AM'))

(however, if I just do a SELECT COUNT(*) on that WHERE clause it does use the Date Index)
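That difference makes sense if the count looks like the sketch below: with no other columns referenced, the query is fully "covered" by IX_MyPageLog_MyCreateDate, so no bookmark lookups into the base table are needed and a seek on the narrow index wins.

```sql
-- Covered by IX_MyPageLog_MyCreateDate (MyCreateDate, MyErrorNo),
-- so the optimizer can seek the index without touching the table
SELECT COUNT(*)
FROM dbo.MyPageLog
WHERE MyCreateDate >= '20051127 10:00:00'
  AND MyCreateDate <  '20051127 11:00:00'
```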

Kristen

X002548
Not Just a Number

15586 Posts

Posted - 2005-11-30 : 15:30:14
That's not what I'm getting - I get an index seek ... have you run stats?


USE Northwind
GO

CREATE TABLE MyPageLog99
(
MyCreateDate datetime NULL DEFAULT (GetDate()),
MyUpdateDate datetime NULL ,
MyPageLogID int IDENTITY (10000, 1) NOT NULL ,
MyErrorNo int NULL DEFAULT (0),
MySessionID int NULL ,
MyLogType char(1) ,
MyOtherData char(1) ,
MyPageNo char(1) ,
MyUserID char(10) ,
CONSTRAINT PK_MyPageLog PRIMARY KEY CLUSTERED
(
MyPageLogID
)
)
GO

CREATE INDEX IX_MyPageLog_MySessionID
ON dbo.MyPageLog99
(
MySessionID
)
GO

CREATE INDEX IX_MyPageLog_MyCreateDate
ON dbo.MyPageLog99
(
MyCreateDate,
MyErrorNo
)
GO

SET SHOWPLAN_TEXT ON
GO

DECLARE @dtStart datetime,
@dtStop datetime
SELECT @dtStart = '27-Nov-2005', -- Pick ONE DAY, 20% of data
@dtStop = '28-Nov-2005'

SELECT [T_Type] = 'H', -- Hourly
[MyCreateDate] = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
MyPageNo = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END,
[D_MySessionID] = COUNT(DISTINCT MySessionID),
[C_MyErrorNo] = SUM(CASE WHEN MyErrorNo = 0 THEN 0 ELSE 1 END),
[D_MyUserID] = COUNT(DISTINCT MyUserID),
[C_Count] = COUNT(*),
[MIN_Elapsed] = MIN(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[AVG_Elapsed] = AVG(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[MAX_Elapsed] = MAX(DateDiff(Millisecond, MyCreateDate, MyUpdateDate))
INTO #MyTempTable -- Store it somewhere ...
FROM dbo.MyPageLog99
WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop
GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END

SET SHOWPLAN_TEXT OFF
GO

DROP TABLE MyPageLog99
GO



/*
|--Table Insert(OBJECT:([#MyTempTable]), SET:([#MyTempTable].[MAX_Elapsed]=[Expr1010], [#MyTempTable].[AVG_Elapsed]=[Expr1009], [#MyTempTable].[MIN_Elapsed]=[Expr1008], [#MyTempTable].[C_Count]=[Expr1007], [#MyTempTable].[D_MyUserID]=[Expr1006], [#MyTempTable].[C_MyErrorNo]=[Expr1005], [#MyTempTable].[D_MySessionID]=[Expr1004], [#MyTempTable].[MyPageNo]=[Expr1003], [#MyTempTable].[MyLogType]=[MyPageLog99].[MyLogType], [#MyTempTable].[MyCreateDate]=[Expr1002], [#MyTempTable].[T_Type]='H'))
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003]))
|--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003])=([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType]) AND [Expr1003]=[Expr1003]))
|--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003])=([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType]) AND [Expr1003]=[Expr1003]))
| |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003]))
| | |--Compute Scalar(DEFINE:([Expr1004]=Convert([Expr1037])))
| | |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]) DEFINE:([Expr1037]=COUNT_BIG([MyPageLog99].[MySessionID])))
| | |--Sort(DISTINCT ORDER BY:([Expr1002] ASC, [MyPageLog99].[MyLogType] ASC, [Expr1003] ASC, [MyPageLog99].[MySessionID] ASC))
| | |--Table Spool
| | |--Compute Scalar(DEFINE:([Expr1002]=dateadd(hour, datediff(hour, 'Jan 1 1900 12:00AM', [MyPageLog99].[MyCreateDate]), 'Jan 1 1900 12:00AM'), [Expr1003]=If (Convert([MyPageLog99].[MyLogType])=1) then Convert([MyPageLog99].[MyOtherData]) else Convert([MyPageLog99].[MyPageNo])))
| | |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([Northwind].[dbo].[MyPageLog99]))
| |
|--Index Seek(OBJECT:([Northwind].[dbo].[MyPageLog99].[IX_MyPageLog_MyCreateDate]), SEEK:([MyPageLog99].[MyCreateDate] >= [@dtStart] AND [MyPageLog99].[MyCreateDate] < [@dtStop]) ORDERED FORWARD)
| |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003]))
| |--Compute Scalar(DEFINE:([Expr1007]=Convert([Expr1038]), [Expr1009]=If ([Expr1039]=0) then NULL else ([Expr1040]/Convert([Expr1039]))))
| |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]) DEFINE:([Expr1038]=Count(*), [Expr1005]=SUM(If ([MyPageLog99].[MyErrorNo]=0) then 0 else 1), [Expr1039]=COUNT_BIG(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate])), [Expr1040]=SUM(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate])), [Expr1008]=MIN(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate])), [Expr1010]=MAX(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate]))))
| |--Sort(ORDER BY:([Expr1002] ASC, [MyPageLog99].[MyLogType] ASC, [Expr1003] ASC))
| |--Table Spool
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003]))
|--Compute Scalar(DEFINE:([Expr1006]=Convert([Expr1041])))
|--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]) DEFINE:([Expr1041]=COUNT_BIG([MyPageLog99].[MyUserID])))
|--Sort(DISTINCT ORDER BY:([Expr1002] ASC, [MyPageLog99].[MyLogType] ASC, [Expr1003] ASC, [MyPageLog99].[MyUserID] ASC))
|--Table Spool
*/


Brett

8-)


Kristen
Test

22859 Posts

Posted - 2005-11-30 : 16:24:44
"That's not what I'm getting, I get an index seek"

That's encouraging ... darn it, I should have run a lightweight test, rather than just throwing 25 million rows at it.

"have you run stats?"

Theoretically, yes; they run daily after the purge of umpteen million rows. I'll get up early tomorrow morning when the system is "quiet", run them manually to be sure, then chuck some further tests at it, and at "MyPageLog99" :)

Thanks Brett.

Kristen

Kristen
Test

22859 Posts

Posted - 2005-12-01 : 02:19:16
I ran

UPDATE STATISTICS MyPageLog WITH FULLSCAN

and re-ran the query, and it's the same plan.

If I do a SELECT COUNT(*) it uses the Date index, but if I select the aggregates it uses a Clustered Index Scan on the PK column (identity)

COUNT(*) for the one day I am reporting on is 5,825,985 rows out of a total of 21,890,248

I reckon it's just too many rows to bother doing a BOOKMARK LOOKUP for each one.

So I'm back to moving the Clustered Index to the Date column, and sticking a UNIQUE Index on the IDENTITY, and worrying about whether the INSERT/UPDATE is going to be slower because it will have to go via the non-clustered index to do the UPDATE ...

SELECT COUNT(DISTINCT MyCreateDate)
FROM dbo.MyPageLog

gives 9,192,029 - so 2.3 entries for each distinct date/time, on average.

Actually I've had a thought. I could store the MIN/MAX IDENTITY number for each day, and use that as part of the WHERE clause so that the current Clustered Index dramatically reduces the number of rows that are processed.
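That idea might be sketched like this (an illustration only - @MinID/@MaxID would in practice be looked up from a small table populated at each day's rollover, and @dtStart/@dtStop are as in the earlier query):

```sql
-- Hypothetical: bound the IDENTITY range for the reporting day first,
-- so the existing clustered (IDENTITY) index touches only that day's rows
DECLARE @MinID int, @MaxID int
SELECT @MinID = MIN(MyPageLogID), @MaxID = MAX(MyPageLogID)
FROM dbo.MyPageLog
WHERE MyCreateDate >= @dtStart
  AND MyCreateDate < @dtStop

-- ... then in the report's WHERE clause:
--   MyPageLogID BETWEEN @MinID AND @MaxID
--   AND MyCreateDate >= @dtStart AND MyCreateDate < @dtStop
```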

Kristen

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2005-12-01 : 02:53:03
same here... it's a seek

have you tried sp_recompile?

--------------------
keeping it simple...

Kristen
Test

22859 Posts

Posted - 2005-12-01 : 03:21:39
"same here... it's a seek"

I reckon that Brett's example table just doesn't have enough rows ...

"have you tried sp_recompile?"

The code isn't in a sproc as yet ...

Kristen

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2005-12-01 : 03:43:29
instead of update, have you tried recreating the stats?




Kristen
Test

22859 Posts

Posted - 2005-12-01 : 04:06:56
"instead of update, have you tried recreating the stats?"

Nope, I don't know about that - what's the drill please? (It takes blinking ages to redo the stats on that table every night, given that a) it's big and b) we have to do it regularly 'coz 1/5th gets deleted each day)

Kristen

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2005-12-01 : 04:36:04
I think there's a warning attached... better read BOL before proceeding

from BOL:
execution plan-->execution plan caching and reuse
create statistics
drop statistics

HTH



Kristen
Test

22859 Posts

Posted - 2005-12-01 : 05:07:54
Ah, so I could explicitly CREATE STATISTICS on the Date Column ... that might be interesting.
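Which, going by BOL, would be something along these lines (a sketch; the statistics name is made up):

```sql
-- Hypothetical: explicit column statistics on the date column alone,
-- rebuilt from a full scan rather than a sample
CREATE STATISTICS st_MyPageLog_MyCreateDate
ON dbo.MyPageLog (MyCreateDate)
WITH FULLSCAN
GO
```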

Thanks for that pointer.

Kristen

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2005-12-01 : 06:26:23
Something of a side-issue, but I do notice you've got two distinct counts in there, amongst the other aggregates. From what I can see in the execution plan text (not much, they're never very easy to read), it's having to run the distinct counts separately. If MyUserID is dependent on MySessionID, you might be able to make the query do one pass over the data by doing the partial aggregation manually. Though whether or not it comes out quicker probably depends on the number of rows in the hash table, which will slow down the non-distinct aggregations.
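The manual partial aggregation might be sketched like this (assumption: one user per session, so MAX(MyUserID) inside a session is safe; names are from the query earlier in the thread, with the elapsed-time aggregates omitted for brevity and @dtStart/@dtStop declared as before):

```sql
-- Inner GROUP BY yields one row per (hour, type, page, session),
-- so the outer COUNT(*) equals COUNT(DISTINCT MySessionID) -
-- only one DISTINCT aggregate is left for the outer pass
SELECT T_Hour, MyLogType, MyPageNo,
       [D_MySessionID] = COUNT(*),
       [D_MyUserID]    = COUNT(DISTINCT MyUserID),
       [C_Count]       = SUM(C_Count)
FROM (SELECT [T_Hour]   = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
             MyLogType,
             [MyPageNo] = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END,
             MySessionID,
             [C_Count]  = COUNT(*)
      FROM dbo.MyPageLog
      WHERE MyCreateDate >= @dtStart
        AND MyCreateDate < @dtStop
      GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
               MyLogType,
               CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END,
               MySessionID) AS X
GROUP BY T_Hour, MyLogType, MyPageNo
```

Whether this beats two separate distinct-count passes would depend on the size of the intermediate hash table, as noted above.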

p.s. whatever happened to "people who bought this bought that too?"

Kristen
Test

22859 Posts

Posted - 2005-12-01 : 06:48:20
"I do notice you've got two distinct counts in there"

Always handy that, an eagle eye. Thanks, I'll look into that. MyUserID could change if they log in twice, differently, during a session, but that's pretty obtuse and I reckon I probably only want to know how many sessions resulted in a login; I'll check it through, thanks.

"whatever happened to "people who bought this bought that too?"

The final response looked right to me, and I was way off beam, but I haven't had time to check it yet. Will report back once I've got something debugged!

Kristen

X002548
Not Just a Number

15586 Posts

Posted - 2005-12-01 : 10:27:27
OK, so what happens when you do this?


USE Northwind
GO

CREATE TABLE MyPageLog99
(
MyCreateDate datetime NULL DEFAULT (GetDate()),
MyUpdateDate datetime NULL ,
MyPageLogID int IDENTITY (10000, 1) NOT NULL ,
MyErrorNo int NULL DEFAULT (0),
MySessionID int NULL ,
MyLogType char(1) ,
MyOtherData char(1) ,
MyPageNo char(1) ,
MyUserID char(10) ,
CONSTRAINT PK_MyPageLog PRIMARY KEY CLUSTERED
(
MyPageLogID
)
)
GO

CREATE INDEX IX_MyPageLog_MySessionID
ON dbo.MyPageLog99
(
MySessionID
)
GO

CREATE INDEX IX_MyPageLog_MyCreateDate
ON dbo.MyPageLog99
(
MyCreateDate,
MyErrorNo
)
GO

DECLARE @dtStart datetime,
@dtStop datetime
SELECT @dtStart = '27-Nov-2005', -- Pick ONE DAY, 20% of data
@dtStop = '28-Nov-2005'

SELECT *
INTO #MyTempTable -- Store it somewhere ...
FROM dbo.MyPageLog99
WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop

SELECT [T_Type] = 'H', -- Hourly
[MyCreateDate] = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
MyPageNo = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END,
[D_MySessionID] = COUNT(DISTINCT MySessionID),
[C_MyErrorNo] = SUM(CASE WHEN MyErrorNo = 0 THEN 0 ELSE 1 END),
[D_MyUserID] = COUNT(DISTINCT MyUserID),
[C_Count] = COUNT(*),
[MIN_Elapsed] = MIN(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[AVG_Elapsed] = AVG(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[MAX_Elapsed] = MAX(DateDiff(Millisecond, MyCreateDate, MyUpdateDate))
INTO #MyTempTable_2 -- Store it somewhere ...
FROM #MyTempTable
WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop

GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END
GO


DROP TABLE MyPageLog99, #MyTempTable, #MyTempTable_2
GO





Brett

8-)


Kristen
Test

22859 Posts

Posted - 2005-12-01 : 10:48:02

(0 row(s) affected)


(0 row(s) affected)



Do you want me to pump some of my real-world data into it?

With the empty table the first

SELECT *
INTO #MyTempTable -- Store it somewhere ...
FROM dbo.MyPageLog99
WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop

gives me

|--Table Insert(OBJECT:([#MyTempTable]), SET:([#MyTempTable].[MyUserID]=[MyPageLog99].[MyUserID], [#MyTempTable].[MyPageNo]=[MyPageLog99].[MyPageNo], [#MyTempTable].[MyOtherData]=[MyPageLog99].[MyOtherData], [#MyTempTable].[MyLogType]=[MyPageLog99].[MyLogType], [#MyTempTable].[MySessionID]=[MyPageLog99].[MySessionID], [#MyTempTable].[MyErrorNo]=[MyPageLog99].[MyErrorNo], [#MyTempTable].[MyPageLogID]=[Expr1002], [#MyTempTable].[MyUpdateDate]=[MyPageLog99].[MyUpdateDate], [#MyTempTable].[MyCreateDate]=[MyPageLog99].[MyCreateDate]))
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1002]=setidentity([MyPageLog99].[MyPageLogID], -7, 0, '#MyTempTable')))
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([Northwind].[dbo].[MyPageLog99]))
|--Index Seek(OBJECT:([Northwind].[dbo].[MyPageLog99].[IX_MyPageLog_MyCreateDate]), SEEK:([MyPageLog99].[MyCreateDate] >= [@dtStart] AND [MyPageLog99].[MyCreateDate] < [@dtStop]) ORDERED FORWARD)


but I don't know how that would weigh up against the subsequent

INTO #MyTempTable_2 -- Store it somewhere ...
FROM #MyTempTable

which is then going to do a SCAN - but the main table will have been released by then, which would be good news.

5,000,000 rows: too many for a @TempTable [as distinct from a #TempTable], do you think?

Kristen

X002548
Not Just a Number

15586 Posts

Posted - 2005-12-01 : 11:21:34
Yes it will scan, but it's your whole population, so it's supposed to ... it's the seek we are trying to get ... plus I made a boo-boo ...

This


WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop


is supposed to come out of the second query ...


Also, table variables are supposed to be stored in memory ... I don't think (I hate when that happens) that they use tempdb at all ... I guess it's written to the virtual memory file ...

Which is a good question - which is more efficient ... I gotta look it up, but for large volumes, I would say #temp, not @temp ...



Brett

8-)


Kristen
Test

22859 Posts

Posted - 2005-12-01 : 12:46:12
"I gotta look it up, but for large volumes, I would say #temp, not @temp..."

I reckon so too ... I don't want to piss folk off, so I'll do the TEST early tomorrow morning.

Thanks Brett,

Kristen