
 Clustered index on IDENTITY or Date column?


Kristen
Test

22859 Posts

Posted - 2005-11-26 : 05:37:59
Pretty sure I know the answer to this one, but I would like some second opinions please.

I have a table that logs each [website] page that users visit.

It has the Date, an IDENTITY ID, any Error number (from an UPDATE to the record if/when an error is recorded by the application), and a Session ID (i.e. many pages per session) which joins to a Session Logging table.

OK ...

Currently I have a CLUSTERED PK on the IDENTITY.

We basically do one insert, and then one update (to record the "completed time" and any error number), and then never edit that row again. 7 days later the row will be deleted [3 days later on very busy sites].

We query by Create Date. That takes a while, and (for a range of, say, one day) probably never uses the Date index because that range is 1/7th [or 1/3rd] of the total rows in the table.

We also have a report that looks for pages with Error Number <> 0 (within a date range).

Question:

Should I move the clustered index to the date?

Supplemental question: should the columns with a DEFAULT on them be set to NOT NULL? The application doesn't care if they contain NULLs (in practice there aren't any), but is there some performance advantage - e.g. the optimiser knowing that the column cannot contain NULLs?

CREATE TABLE MyPageLog
(
MyCreateDate datetime NULL DEFAULT (GetDate()),
MyPageLogID int IDENTITY (10000, 1) NOT NULL ,
MyErrorNo int NULL DEFAULT (0),
MySessionID int NULL ,
... other columns ...
CONSTRAINT PK_MyPageLog PRIMARY KEY CLUSTERED
(
MyPageLogID
)
)

CREATE INDEX IX_MyPageLog_MySessionID
ON dbo.MyPageLog
(
MySessionID
)
GO

CREATE INDEX IX_MyPageLog_MyCreateDate
ON dbo.MyPageLog
(
MyCreateDate,
MyErrorNo
)
GO
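For what it's worth, the re-arrangement being considered would look something like this (just a sketch - the new index name is made up, and dropping/re-creating a clustered PK on a 20M+ row table rebuilds the table and every non-clustered index, so it would need a quiet window):

```sql
-- Hypothetical switch: cluster on the date, keep the IDENTITY unique
ALTER TABLE dbo.MyPageLog DROP CONSTRAINT PK_MyPageLog
GO
-- Non-unique clustered index on the (ever-increasing) create date
CREATE CLUSTERED INDEX IX_MyPageLog_MyCreateDate_CL
ON dbo.MyPageLog (MyCreateDate)
GO
-- PK retained, but now non-clustered
ALTER TABLE dbo.MyPageLog ADD CONSTRAINT PK_MyPageLog
PRIMARY KEY NONCLUSTERED (MyPageLogID)
GO
```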

Kristen

druer
Constraint Violating Yak Guru

314 Posts

Posted - 2005-11-28 : 10:36:23
Would adding a few more fields to make the index "cover" the query be something you can do, or are there too many additional fields needed for the report?

Have you asked for Microsoft's opinion (i.e. the Index Tuning Wizard)?

You should try your query in Query Analyzer, showing the execution plan. My guess is that it will use the index on the date field. While I think you are logically right that one DAY is 1/7th or 1/3rd of your data, the datatype also includes the time of day, so the optimizer won't necessarily consider the range to be 1/7 or 1/3 of the data. At least I would guess that is the case and that it would use your index.

If you optimize to resolve the reporting performance, is the performance hit for the initial update something you can live with? Obviously the clustered index on the ID field gives you the fastest UPDATE to record the completion time, as my guess is that is what you are using in your WHERE clause to find the record to update.

Have you considered using a fill factor of 100% for either/both indexes (IDENTITY, DATE)? Since your data always increases in a predictable manner, that would eliminate the wasted space of the default fill factor - you know you don't need to leave any holes in your pages.
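Applying the fill-factor suggestion to the date index from the original DDL might look like this (a sketch; `DROP_EXISTING` rebuilds the index in place):

```sql
-- Rebuild the existing date index packed to 100%, since new
-- rows only ever arrive at the "end" of the key range
CREATE INDEX IX_MyPageLog_MyCreateDate
ON dbo.MyPageLog (MyCreateDate, MyErrorNo)
WITH DROP_EXISTING, FILLFACTOR = 100
GO
```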

1 1/2 maybe 2 cents worth,
Dalton

Kristen
Test

22859 Posts

Posted - 2005-11-30 : 09:06:51
"Would adding a few more fields to make the index "cover" the query"

Nope, but we have added some to cover the WHERE clause

"Have you asked for Microsoft's opinion"

Nope

"the datatype also includes the minutes so the optimizer won't consider ..."

If I ask for all rows from "29-Nov-2005 00:00:00" to "29-Nov-2005 23:59:59" it's going to SCAN the table. But that's my main reason for thinking that moving the Clustered Index to the date is going to help - at the moment I get a SCAN of 100% of the table; with the change I think I will get a clustered index scan of 1/5th of the table only.

"is the performance hit for the initial update something you can live with"

Good question, which I had not asked myself.

We insert (and update once) about 8,000,000 rows a day into this table. We report on it not very often. However, reporting can currently take 30 seconds or so - which is probably putting strain on the concurrent INSERTs/UPDATEs being attempted, and might be better avoided? (I don't know the answer to that one)

"Have you considered using a fill factor of 100%"

We have that on the current Clustered PK on the IDENTITY, and on the non-clustered DATE index - but it's a good point if I re-arrange the indexes.

I think I'll run the reports with the indexes as they are, and then switched round, and see just how big the difference is. And I'll also try 1,000 INSERTs & UPDATEs to see how they look.

Test test test ...

Kristen

X002548
Not Just a Number

15586 Posts

Posted - 2005-11-30 : 10:20:13
you insert 8 million rows a day, and you don't have hotspots with the inserts because the clustered index is an identity?

I'm amazed.

Post the query that you say scans when you look for rows by dates ... and some sample data too.

I don't believe it....

I'm liking MySessionID and MyCreateDate as the Cluster



Brett

8-)

Hint: Want your questions answered fast? Follow the direction in this link
http://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspx

Add yourself!
http://www.frappr.com/sqlteam

Kristen
Test

22859 Posts

Posted - 2005-11-30 : 14:39:48
" you don't have hotspots with the inserts because the clustered index is an identity?"

Maybe we do, and it hasn't caused any problems that I've spotted

"Post the query that you says scans when you look for rows by dates"

Well ... you're probably going to tell me I'm being really amateur. Here's the query and SHOWPLAN_TEXT

SELECT MIN(MyCreateDate), MAX(MyCreateDate), COUNT(*)
FROM dbo.MyPageLog

2005-11-26 00:00:00.000 2005-11-30 19:24:05.227 23,595,027

DECLARE @dtStart datetime,
@dtStop datetime
SELECT @dtStart = '27-Nov-2005', -- Pick ONE DAY, 20% of data
@dtStop = '28-Nov-2005'
SELECT [T_Type] = 'H', -- Hourly
[MyCreateDate] = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
MyPageNo = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END,
[D_MySessionID] = COUNT(DISTINCT MySessionID),
[C_MyErrorNo] = SUM(CASE WHEN MyErrorNo = 0 THEN 0 ELSE 1 END),
[D_MyUserID] = COUNT(DISTINCT MyUserID),
[C_Count] = COUNT(*),
[MIN_Elapsed] = MIN(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[AVG_Elapsed] = AVG(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[MAX_Elapsed] = MAX(DateDiff(Millisecond, MyCreateDate, MyUpdateDate))
INTO #MyTempTable -- Store it somewhere ...
FROM dbo.MyPageLog
WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop
GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END

|--Table Insert(OBJECT:([#MyTempTable]), SET:([#MyTempTable].[MAX_Elapsed]=[Expr1010], [#MyTempTable].[AVG_Elapsed]=[Expr1009], [#MyTempTable].[MIN_Elapsed]=[Expr1008], [#MyTempTable].[C_Count]=[Expr1007], [#MyTempTable].[D_MyUserID]=[Expr1006], [#MyTempTable].[C_MyErrorNo]=[Expr1005], [#MyTempTable].[D_MySessionID]=[Expr1004], [#MyTempTable].[MyPageNo]=[Expr1003], [#MyTempTable].[MyLogType]=[MyPageLog].[MyLogType], [#MyTempTable].[MyCreateDate]=[Expr1002], [#MyTempTable].[T_Type]='H'))
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003]))
|--Hash Match(Inner Join, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003])=([Expr1002], [MyPageLog].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]))
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003]))
| |--Compute Scalar(DEFINE:([Expr1004]=Convert([globalagg1036])))
| |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]) DEFINE:([globalagg1036]=SUM([partialagg1035])))
| |--Sort(ORDER BY:([Expr1002] ASC, [MyPageLog].[MyLogType] ASC, [Expr1003] ASC))
| |--Hash Match(Partial Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) DEFINE:([partialagg1035]=COUNT_BIG([MyPageLog].[MySessionID])))
| |--Hash Match(Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003], [MyPageLog].[MySessionID]), RESIDUAL:((([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) AND [MyPageLog].[MySessionID]=[MyPageLog].[MySessionID]))
| |--Table Spool
| |--Compute Scalar(DEFINE:([Expr1002]=dateadd(hour, datediff(hour, 'Jan 1 1900 12:00AM', [MyPageLog].[MyCreateDate]), 'Jan 1 1900 12:00AM'), [Expr1003]=If ([MyPageLog].[MyLogType]=1) then Convert([MyPageLog].[MyOtherData]) else [MyPageLog].[MyPageNo]))
| |--Clustered Index Scan(OBJECT:([MyDBName].[dbo].[MyPageLog].[PK_MyPageLog]), WHERE:([MyPageLog].[MyCreateDate]>=[@dtStart] AND [MyPageLog].[MyCreateDate]<[@dtStop]))
|--Nested Loops(Inner Join, WHERE:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]))
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003]))
| |--Compute Scalar(DEFINE:([Expr1006]=Convert([globalagg1038])))
| |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]) DEFINE:([globalagg1038]=SUM([partialagg1037])))
| |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]) DEFINE:([partialagg1037]=COUNT_BIG([MyPageLog].[MyUserID])))
| |--Sort(ORDER BY:([Expr1002] ASC, [MyPageLog].[MyLogType] ASC, [Expr1003] ASC))
| |--Hash Match(Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003], [MyPageLog].[MyUserID]), RESIDUAL:((([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) AND [MyPageLog].[MyUserID]=[MyPageLog].[MyUserID]))
| |--Hash Match(Partial Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003], [MyPageLog].[MyUserID]), RESIDUAL:((([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) AND [MyPageLog].[MyUserID]=[MyPageLog].[MyUserID]))
| |--Table Spool
|--Table Spool
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003]))
|--Compute Scalar(DEFINE:([Expr1007]=Convert([Expr1039]), [Expr1009]=If ([Expr1040]=0) then NULL else ([Expr1041]/Convert([Expr1040]))))
|--Hash Match(Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) DEFINE:([Expr1005]=SUM(If ([MyPageLog].[MyErrorNo]=0) then 0 else 1), [Expr1039]=COUNT(*), [Expr1040]=COUNT_BIG(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate])), [Expr1041]=SUM(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate])), [Expr1008]=MIN(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate])), [Expr1010]=MAX(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate]))))
|--Table Spool

(I didn't make the Query Plan fixed-width 'coz it's kinda wide, but hopefully you can cut & paste it into something that doesn't word-wrap)

Not really sure what I can give you for data - obviously I can dump some ... but what sort of thing are you after? Would SELECT COUNT(DISTINCT MyColumn1), COUNT(*) type stuff be more informative?

Thanks for your help

Kristen

Kristen
Test

22859 Posts

Posted - 2005-11-30 : 14:45:49
If I change it to:

FROM dbo.MyPageLog
WHERE MyCreateDate >= '20051127 10:00:00'
AND MyCreateDate < '20051127 11:00:00'

so that only one hour is selected - COUNT(*)=201,044 rows - then it still uses the

|--Clustered Index Scan(OBJECT:([HAWKIN_LIVE].[dbo].[MyPageLog].[PK_MyPageLog]), WHERE:([MyPageLog].[MyCreateDate]>='Nov 27 2005 10:00AM' AND [MyPageLog].[MyCreateDate]<'Nov 27 2005 11:00AM'))

(however, if I just do a SELECT COUNT(*) on that WHERE clause it does use the Date Index)
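That difference makes sense if the count looks like the sketch below: with no other columns referenced, the query is fully "covered" by IX_MyPageLog_MyCreateDate, so no bookmark lookups into the base table are needed and a seek on the narrow index wins.

```sql
-- Covered by IX_MyPageLog_MyCreateDate (MyCreateDate, MyErrorNo),
-- so the optimizer can seek the index without touching the table
SELECT COUNT(*)
FROM dbo.MyPageLog
WHERE MyCreateDate >= '20051127 10:00:00'
  AND MyCreateDate <  '20051127 11:00:00'
```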

Kristen

X002548
Not Just a Number

15586 Posts

Posted - 2005-11-30 : 15:30:14
That's not what I'm getting - I get an index seek ... have you run stats?


USE Northwind
GO

CREATE TABLE MyPageLog99
(
MyCreateDate datetime NULL DEFAULT (GetDate()),
MyUpdateDate datetime NULL ,
MyPageLogID int IDENTITY (10000, 1) NOT NULL ,
MyErrorNo int NULL DEFAULT (0),
MySessionID int NULL ,
MyLogType char(1) ,
MyOtherData char(1) ,
MyPageNo char(1) ,
MyUserID char(10) ,
CONSTRAINT PK_MyPageLog PRIMARY KEY CLUSTERED
(
MyPageLogID
)
)
GO

CREATE INDEX IX_MyPageLog_MySessionID
ON dbo.MyPageLog99
(
MySessionID
)
GO

CREATE INDEX IX_MyPageLog_MyCreateDate
ON dbo.MyPageLog99
(
MyCreateDate,
MyErrorNo
)
GO

SET SHOWPLAN_TEXT ON
GO

DECLARE @dtStart datetime,
@dtStop datetime
SELECT @dtStart = '27-Nov-2005', -- Pick ONE DAY, 20% of data
@dtStop = '28-Nov-2005'

SELECT [T_Type] = 'H', -- Hourly
[MyCreateDate] = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
MyPageNo = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END,
[D_MySessionID] = COUNT(DISTINCT MySessionID),
[C_MyErrorNo] = SUM(CASE WHEN MyErrorNo = 0 THEN 0 ELSE 1 END),
[D_MyUserID] = COUNT(DISTINCT MyUserID),
[C_Count] = COUNT(*),
[MIN_Elapsed] = MIN(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[AVG_Elapsed] = AVG(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[MAX_Elapsed] = MAX(DateDiff(Millisecond, MyCreateDate, MyUpdateDate))
INTO #MyTempTable -- Store it somewhere ...
FROM dbo.MyPageLog99
WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop
GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END

SET SHOWPLAN_TEXT OFF
GO

DROP TABLE MyPageLog99
GO



/*
|--Table Insert(OBJECT:([#MyTempTable]), SET:([#MyTempTable].[MAX_Elapsed]=[Expr1010], [#MyTempTable].[AVG_Elapsed]=[Expr1009], [#MyTempTable].[MIN_Elapsed]=[Expr1008], [#MyTempTable].[C_Count]=[Expr1007], [#MyTempTable].[D_MyUserID]=[Expr1006], [#MyTempTable].[C_MyErrorNo]=[Expr1005], [#MyTempTable].[D_MySessionID]=[Expr1004], [#MyTempTable].[MyPageNo]=[Expr1003], [#MyTempTable].[MyLogType]=[MyPageLog99].[MyLogType], [#MyTempTable].[MyCreateDate]=[Expr1002], [#MyTempTable].[T_Type]='H'))
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003]))
|--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003])=([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType]) AND [Expr1003]=[Expr1003]))
|--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003])=([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType]) AND [Expr1003]=[Expr1003]))
| |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003]))
| | |--Compute Scalar(DEFINE:([Expr1004]=Convert([Expr1037])))
| | |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]) DEFINE:([Expr1037]=COUNT_BIG([MyPageLog99].[MySessionID])))
| | |--Sort(DISTINCT ORDER BY:([Expr1002] ASC, [MyPageLog99].[MyLogType] ASC, [Expr1003] ASC, [MyPageLog99].[MySessionID] ASC))
| | |--Table Spool
| | |--Compute Scalar(DEFINE:([Expr1002]=dateadd(hour, datediff(hour, 'Jan 1 1900 12:00AM', [MyPageLog99].[MyCreateDate]), 'Jan 1 1900 12:00AM'), [Expr1003]=If (Convert([MyPageLog99].[MyLogType])=1) then Convert([MyPageLog99].[MyOtherData]) else Convert([MyPageLog99].[MyPageNo])))
| | |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([Northwind].[dbo].[MyPageLog99]))
| |
|--Index Seek(OBJECT:([Northwind].[dbo].[MyPageLog99].[IX_MyPageLog_MyCreateDate]), SEEK:([MyPageLog99].[MyCreateDate] >= [@dtStart] AND [MyPageLog99].[MyCreateDate] < [@dtStop]) ORDERED FORWARD)
| |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003]))
| |--Compute Scalar(DEFINE:([Expr1007]=Convert([Expr1038]), [Expr1009]=If ([Expr1039]=0) then NULL else ([Expr1040]/Convert([Expr1039]))))
| |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]) DEFINE:([Expr1038]=Count(*), [Expr1005]=SUM(If ([MyPageLog99].[MyErrorNo]=0) then 0 else 1), [Expr1039]=COUNT_BIG(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate])), [Expr1040]=SUM(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate])), [Expr1008]=MIN(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate])), [Expr1010]=MAX(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate]))))
| |--Sort(ORDER BY:([Expr1002] ASC, [MyPageLog99].[MyLogType] ASC, [Expr1003] ASC))
| |--Table Spool
|--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003]))
|--Compute Scalar(DEFINE:([Expr1006]=Convert([Expr1041])))
|--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]) DEFINE:([Expr1041]=COUNT_BIG([MyPageLog99].[MyUserID])))
|--Sort(DISTINCT ORDER BY:([Expr1002] ASC, [MyPageLog99].[MyLogType] ASC, [Expr1003] ASC, [MyPageLog99].[MyUserID] ASC))
|--Table Spool
*/


Brett

8-)


Kristen
Test

22859 Posts

Posted - 2005-11-30 : 16:24:44
"That's not what I'm getting, I get an index seek"

That's encouraging ... darn it, I should have run a lightweight test, rather than just throwing 25 million rows at it.

"have you run stats?"

Theoretically, yes; they run daily after the purge of umpteen million rows. I'll get up early tomorrow morning when the system is "quiet", run them manually to be sure, then chuck some further tests at it, and at "MyPageLog99" :)

Thanks Brett.

Kristen

Kristen
Test

22859 Posts

Posted - 2005-12-01 : 02:19:16
I ran

UPDATE STATISTICS MyPageLog WITH FULLSCAN

and re-ran the query, and it's the same plan.

If I do a SELECT COUNT(*) it uses the Date index, but if I select the aggregates it uses a Clustered Index Scan on the PK column (identity)

COUNT(*) for the one day I am reporting on is 5,825,985 rows out of a total of 21,890,248

I reckon it's just too many rows to bother doing a BOOKMARK LOOKUP for each one.

So I'm back to moving the Clustered Index to the Date column, and sticking a UNIQUE Index on the IDENTITY, and worrying about whether the INSERT/UPDATE is going to be slower because it will have to go via the non-clustered index to do the UPDATE ...

SELECT COUNT(DISTINCT MyCreateDate)
FROM dbo.MyPageLog

gives 9,192,029 - so 2.3 entries for each distinct date/time, on average.

Actually I've had a thought. I could store the MIN/MAX IDENTITY number for each day, and use that as part of the WHERE clause so that the current Clustered Index dramatically reduces the number of rows that are processed.
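That idea might be sketched like this (an illustration only - @MinID/@MaxID would in practice be looked up from a small table populated at each day's rollover, and @dtStart/@dtStop are as in the earlier query):

```sql
-- Hypothetical: bound the IDENTITY range for the reporting day first,
-- so the existing clustered (IDENTITY) index touches only that day's rows
DECLARE @MinID int, @MaxID int
SELECT @MinID = MIN(MyPageLogID), @MaxID = MAX(MyPageLogID)
FROM dbo.MyPageLog
WHERE MyCreateDate >= @dtStart
  AND MyCreateDate < @dtStop

-- ... then in the report's WHERE clause:
--   MyPageLogID BETWEEN @MinID AND @MaxID
--   AND MyCreateDate >= @dtStart AND MyCreateDate < @dtStop
```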

Kristen

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2005-12-01 : 02:53:03
same here... it's a seek

have you tried sp_recompile?

--------------------
keeping it simple...

Kristen
Test

22859 Posts

Posted - 2005-12-01 : 03:21:39
"same here... it's a seek"

I reckon that Brett's example table just doesn't have enough rows ...

"have you tried sp_recompile?"

The code isn't in a sproc as yet ...

Kristen

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2005-12-01 : 03:43:29
instead of update, have you tried recreating the stats?




Kristen
Test

22859 Posts

Posted - 2005-12-01 : 04:06:56
"instead of update, have you tried recreating the stats?"

Nope, I don't know about that - what's the drill please? (It takes blinking ages to redo the stats on that table every night, given that a) it's big and b) we have to do it regularly 'coz 1/5th gets deleted each day)

Kristen

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2005-12-01 : 04:36:04
I think there's a warning attached... better read BOL before proceeding

from BOL:
execution plan-->execution plan caching and reuse
create statistics
drop statistics

HTH



Kristen
Test

22859 Posts

Posted - 2005-12-01 : 05:07:54
Ah, so I could explicitly CREATE STATISTICS on the Date Column ... that might be interesting.
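Which, going by BOL, would be something along these lines (a sketch; the statistics name is made up):

```sql
-- Hypothetical: explicit column statistics on the date column alone,
-- rebuilt from a full scan rather than a sample
CREATE STATISTICS st_MyPageLog_MyCreateDate
ON dbo.MyPageLog (MyCreateDate)
WITH FULLSCAN
GO
```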

Thanks for that pointer.

Kristen

Arnold Fribble
Yak-finder General

1961 Posts

Posted - 2005-12-01 : 06:26:23
Something of a side-issue, but I do notice you've got two distinct counts in there, amongst the other aggregates. From what I can see in the execution plan text (not much, they're never very easy to read), it's having to run the distinct counts separately. If MyUserID is dependent on MySessionID, you might be able to make the query do one pass over the data by doing the partial aggregation manually. Though whether or not it comes out quicker probably depends on the number of rows in the hash table, which will slow down the non-distinct aggregations.
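The manual partial aggregation might be sketched like this (assumption: one user per session, so MAX(MyUserID) inside a session is safe; names are from the query earlier in the thread, with the elapsed-time aggregates omitted for brevity and @dtStart/@dtStop declared as before):

```sql
-- Inner GROUP BY yields one row per (hour, type, page, session),
-- so the outer COUNT(*) equals COUNT(DISTINCT MySessionID) -
-- only one DISTINCT aggregate is left for the outer pass
SELECT T_Hour, MyLogType, MyPageNo,
       [D_MySessionID] = COUNT(*),
       [D_MyUserID]    = COUNT(DISTINCT MyUserID),
       [C_Count]       = SUM(C_Count)
FROM (SELECT [T_Hour]   = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
             MyLogType,
             [MyPageNo] = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END,
             MySessionID,
             [C_Count]  = COUNT(*)
      FROM dbo.MyPageLog
      WHERE MyCreateDate >= @dtStart
        AND MyCreateDate < @dtStop
      GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
               MyLogType,
               CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END,
               MySessionID) AS X
GROUP BY T_Hour, MyLogType, MyPageNo
```

Whether this beats two separate distinct-count passes would depend on the size of the intermediate hash table, as noted above.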

p.s. whatever happened to "people who bought this bought that too?"

Kristen
Test

22859 Posts

Posted - 2005-12-01 : 06:48:20
"I do notice you've got two distinct counts in there"

Always handy that, an eagle eye. Thanks, I'll look into that. MyUserID could change if they log in twice, differently, during a session, but that's pretty obtuse and I reckon I probably only want to know how many sessions resulted in a login; I'll check it through, thanks.

"whatever happened to "people who bought this bought that too?"

The final response looked right to me, and I was way off beam, but I haven't had time to check it yet. Will report back once I've got something debugged!

Kristen

X002548
Not Just a Number

15586 Posts

Posted - 2005-12-01 : 10:27:27
OK, so what happens when you do this?


USE Northwind
GO

CREATE TABLE MyPageLog99
(
MyCreateDate datetime NULL DEFAULT (GetDate()),
MyUpdateDate datetime NULL ,
MyPageLogID int IDENTITY (10000, 1) NOT NULL ,
MyErrorNo int NULL DEFAULT (0),
MySessionID int NULL ,
MyLogType char(1) ,
MyOtherData char(1) ,
MyPageNo char(1) ,
MyUserID char(10) ,
CONSTRAINT PK_MyPageLog PRIMARY KEY CLUSTERED
(
MyPageLogID
)
)
GO

CREATE INDEX IX_MyPageLog_MySessionID
ON dbo.MyPageLog99
(
MySessionID
)
GO

CREATE INDEX IX_MyPageLog_MyCreateDate
ON dbo.MyPageLog99
(
MyCreateDate,
MyErrorNo
)
GO

DECLARE @dtStart datetime,
@dtStop datetime
SELECT @dtStart = '27-Nov-2005', -- Pick ONE DAY, 20% of data
@dtStop = '28-Nov-2005'

SELECT *
INTO #MyTempTable -- Store it somewhere ...
FROM dbo.MyPageLog99
WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop

SELECT [T_Type] = 'H', -- Hourly
[MyCreateDate] = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
MyPageNo = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END,
[D_MySessionID] = COUNT(DISTINCT MySessionID),
[C_MyErrorNo] = SUM(CASE WHEN MyErrorNo = 0 THEN 0 ELSE 1 END),
[D_MyUserID] = COUNT(DISTINCT MyUserID),
[C_Count] = COUNT(*),
[MIN_Elapsed] = MIN(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[AVG_Elapsed] = AVG(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)),
[MAX_Elapsed] = MAX(DateDiff(Millisecond, MyCreateDate, MyUpdateDate))
INTO #MyTempTable_2 -- Store it somewhere ...
FROM #MyTempTable
WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop

GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0),
MyLogType,
CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END
GO


DROP TABLE MyPageLog99, #MyTempTable, #MyTempTable_2
GO





Brett

8-)


Kristen
Test

22859 Posts

Posted - 2005-12-01 : 10:48:02

(0 row(s) affected)


(0 row(s) affected)



Do you want me to pump some of my real-world data into it?

With the empty table the first

SELECT *
INTO #MyTempTable -- Store it somewhere ...
FROM dbo.MyPageLog99
WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop

gives me

|--Table Insert(OBJECT:([#MyTempTable]), SET:([#MyTempTable].[MyUserID]=[MyPageLog99].[MyUserID], [#MyTempTable].[MyPageNo]=[MyPageLog99].[MyPageNo], [#MyTempTable].[MyOtherData]=[MyPageLog99].[MyOtherData], [#MyTempTable].[MyLogType]=[MyPageLog99].[MyLogType], [#MyTempTable].[MySessionID]=[MyPageLog99].[MySessionID], [#MyTempTable].[MyErrorNo]=[MyPageLog99].[MyErrorNo], [#MyTempTable].[MyPageLogID]=[Expr1002], [#MyTempTable].[MyUpdateDate]=[MyPageLog99].[MyUpdateDate], [#MyTempTable].[MyCreateDate]=[MyPageLog99].[MyCreateDate]))
|--Top(ROWCOUNT est 0)
|--Compute Scalar(DEFINE:([Expr1002]=setidentity([MyPageLog99].[MyPageLogID], -7, 0, '#MyTempTable')))
|--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([Northwind].[dbo].[MyPageLog99]))
|--Index Seek(OBJECT:([Northwind].[dbo].[MyPageLog99].[IX_MyPageLog_MyCreateDate]), SEEK:([MyPageLog99].[MyCreateDate] >= [@dtStart] AND [MyPageLog99].[MyCreateDate] < [@dtStop]) ORDERED FORWARD)


but I don't know how that would weigh up against the subsequent

INTO #MyTempTable_2 -- Store it somewhere ...
FROM #MyTempTable

which is then going to do a SCAN - but the main table will have been released by then, which would be good news.

5,000,000 rows: too many for a @TempTable [as distinct from a #TempTable], do you think?

Kristen

X002548
Not Just a Number

15586 Posts

Posted - 2005-12-01 : 11:21:34
Yes it will scan, but it's your whole population, so it's supposed to ... it's the seek we are trying to get ... plus I made a boo-boo ...

This


WHERE MyCreateDate >= @dtStart
AND MyCreateDate < @dtStop


is supposed to come out of the second query ...


Also, table variables are supposed to be stored in memory ... I don't think (I hate when that happens) that they use tempdb at all ... I guess it's written to the virtual memory file ...

Which is a good question - which is more efficient ... I gotta look it up, but for large volumes, I would say #temp, not @temp ...



Brett

8-)


Kristen
Test

22859 Posts

Posted - 2005-12-01 : 12:46:12
"I gotta look it up, but for large volumes, I would say #temp, not @temp..."

I reckon so too ... I don't want to piss folk off, so I'll do the TEST early tomorrow morning.

Thanks Brett,

Kristen