Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
Kristen
Test
22859 Posts |
Posted - 2005-11-26 : 05:37:59
|
Pretty sure I know the answer to this one, but would like some second opinions plsI have a table that logs each [website] page that users visit.That has the Date, IDENTITY ID, any Error number (from an UPDATE to the record if/when an error is recorded by the application), and a Session ID (i.e. many pages per session) which joins to a Session Logging Table.OK ...Currently I have a CLUSTERED PK on the IDENTITY.We basically do one insert, and then one update (to record the "completed time" and any error number, and then never edit that row again. 7 days later the row will be deleted [3 days later on very busy sites].We query by Create Date. And that takes a while, and (for a range of, say, one day) probably never uses the Date Index because it is 1/7th [or 1/3rd] of the total rows in the table.We also have a report that looks for pages with Error Number <> 0 (within a date range).Question:Should I move the clustered index to the date?Supplemental question: Should the columns with DEFAULT on them be set to NOT NULL (the application doesn't care if they contain NULLS, but in practice they aren't any - is there some performance advantage - e.g. the optimiser knowing that the column cannot contain NULLs?)CREATE TABLE MyPageLog( MyCreateDate datetime NULL DEFAULT (GetDate()), MyPageLogID int IDENTITY (10000, 1) NOT NULL , MyErrorNo int NULL DEFAULT (0), MySessionID int NULL , ... other columns ... CONSTRAINT PK_MyPageLog PRIMARY KEY CLUSTERED ( MyPageLogID ))CREATE INDEX IX_MyPageLog_MySessionID ON dbo.MyPageLog( MySessionID)GOCREATE INDEX IX_MyPageLog_MyCreateDate ON dbo.MyPageLog( MyCreateDate, MyErrorNo)GO Kristen |
|
|
druer
Constraint Violating Yak Guru
314 Posts |
Posted - 2005-11-28 : 10:36:23
|
| Would adding a few more fields to make the index "cover" the query be something you can do, or are there to many additional fields that are needed for the report?Have you asked for Microsoft's opinion (ie The Index Tuning Wizard)?You should try your query in the Query Analyzer, showing the execution plan. As my "guess" would be that it will use the index on the date field. While I think you are logically right in that the DAY is 1/7'th or 1/3'rd of your data, the datatype also includes the minutes so the optimizer won't consider the data to be 1/7 or 1/3 of the data. At least I would guess that would be the case and would use your index.If you optimize to resolve the performance with your reporting, is the performance hit for the initial update something you can live with? Obviously the clustered index on the ID field gives you the fastest UPDATE to reflect the completion time as my guess is that it what you are using for your WHERE clause to find the record to update. Have you considered using a fill factor of 100% for either/both indexes (IDENTITY, DATE) since your data is always going upwards in a predictable manner to eliminate the wasted space that would exist with a default fill factor, since you know you don't need to leave any holes in your pages?1 1/2 maybe 2 cents worth,Dalton |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-11-30 : 09:06:51
|
"Would adding a few more fields to make the index "cover" the query"Nope, but we have added some to cover the WHERE clause"Have you asked for Microsoft's opinion"Nope "the datatype also includes the minutes so the optimizer won't consider ..."If I ask for all rows from "29-Nov-2005 00:00:00" to "29-Nov-2005 23:59:59" its gonna SCAN the table But that's my main reason for thinking that moving the Clustered Index to the date is going to help - now I get a SCAN on 100% of the table, with the change I think I will get a clustered index scan on 1/5th of the table only."is the performance hit for the initial update something you can live with"Good question, which I had not asked myself.We insert (and update once) about 8,000,000 rows a day to this table. We report on it not-very-often. However, currently reporting can take 30 seconds or so - which is probably putting strain on the concurrent INSERTS/UPDATEs which are being attempted, and might be better avoided? (I don't know the answer to that one)"Have you considered using a fill factor of 100%"We have that on the current Clustered PK by IDENTITY, and the non-clustered DATE index - but its a good point if I re-arrange the indexes.I think I'll run the reports with the indexes as they are, and switched round, and see just how fast the difference is. And I'll also try 1,000 INSERTs & UPDATEs to see how they look.Test test test ... Kristen |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2005-11-30 : 10:20:13
|
| you insert 8 million rows a day, and you don't have hotspots with the inserts because the clustered index is an identity?I'm amazed.Post the query that you says scans when you look for rows by dates...and some sample data too..I don't believe it....I'm liking MySessionID and MyCreateDate as the ClusterBrett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-11-30 : 14:39:48
|
" you don't have hotspots with the inserts because the clustered index is an identity?"Maybe we do, and it hasn't caused any problems that I've spotted "Post the query that you says scans when you look for rows by dates"Well ... you're probably going to tell me I'm being really amateur. Here's the query and SHOWPLAN_TEXTSELECT MIN(MyCreateDate), MAX(MyCreateDate), COUNT(*)FROM dbo.MyPageLog 2005-11-26 00:00:00.000 2005-11-30 19:24:05.227 23,595,027DECLARE @dtStart datetime, @dtStop datetime SELECT @dtStart = '27-Nov-2005', -- Pick ONE DAY, 20% of data @dtStop = '28-Nov-2005'SELECT [T_Type] = 'H', -- Hourly [MyCreateDate] = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0), MyLogType, MyPageNo = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END, [D_MySessionID] = COUNT(DISTINCT MySessionID), [C_MyErrorNo] = SUM(CASE WHEN MyErrorNo = 0 THEN 0 ELSE 1 END), [D_MyUserID] = COUNT(DISTINCT MyUserID), [C_Count] = COUNT(*), [MIN_Elapsed] = MIN(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)), [AVG_Elapsed] = AVG(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)), [MAX_Elapsed] = MAX(DateDiff(Millisecond, MyCreateDate, MyUpdateDate))INTO #MyTempTable -- Store it somewhere ...FROM dbo.MyPageLog WHERE MyCreateDate >= @dtStart AND MyCreateDate < @dtStop GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0), MyLogType, CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END |--Table Insert(OBJECT:([#MyTempTable]), SET:([#MyTempTable].[MAX_Elapsed]=[Expr1010], [#MyTempTable].[AVG_Elapsed]=[Expr1009], [#MyTempTable].[MIN_Elapsed]=[Expr1008], [#MyTempTable].[C_Count]=[Expr1007], [#MyTempTable].[D_MyUserID]=[Expr1006], [#MyTempTable].[C_MyErrorNo]=[Expr1005], [#MyTempTable].[D_MySessionID]=[Expr1004], [#MyTempTable].[MyPageNo]=[Expr1003], [#MyTempTable].[MyLogType]=[MyPageLog].[MyLogType], [#MyTempTable].[MyCreateDate]=[Expr1002], [#MyTempTable].[T_Type]='H')) |--Top(ROWCOUNT est 0) |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003])) |--Hash Match(Inner Join, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003])=([Expr1002], [MyPageLog].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003])) |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003])) | |--Compute Scalar(DEFINE:([Expr1004]=Convert([globalagg1036]))) | |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]) DEFINE:([globalagg1036]=SUM([partialagg1035]))) | |--Sort(ORDER BY:([Expr1002] ASC, [MyPageLog].[MyLogType] ASC, [Expr1003] ASC)) | |--Hash Match(Partial Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) DEFINE:([partialagg1035]=COUNT_BIG([MyPageLog].[MySessionID]))) | |--Hash Match(Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003], [MyPageLog].[MySessionID]), RESIDUAL:((([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) AND [MyPageLog].[MySessionID]=[MyPageLog].[MySessionID])) | |--Table Spool | |--Compute Scalar(DEFINE:([Expr1002]=dateadd(hour, datediff(hour, 'Jan 1 1900 12:00AM', [MyPageLog].[MyCreateDate]), 'Jan 1 1900 12:00AM'), [Expr1003]=If ([MyPageLog].[MyLogType]=1) then Convert([MyPageLog].[MyOtherData]) else [MyPageLog].[MyPageNo])) | |--Clustered Index Scan(OBJECT:([MyDBName].[dbo].[MyPageLog].[PK_MyPageLog]), WHERE:([MyPageLog].[MyCreateDate]>=[@dtStart] AND [MyPageLog].[MyCreateDate]<[@dtStop])) |--Nested Loops(Inner Join, WHERE:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003])) |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003])) | |--Compute Scalar(DEFINE:([Expr1006]=Convert([globalagg1038]))) | |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]) DEFINE:([globalagg1038]=SUM([partialagg1037]))) | |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]) DEFINE:([partialagg1037]=COUNT_BIG([MyPageLog].[MyUserID]))) | |--Sort(ORDER BY:([Expr1002] ASC, [MyPageLog].[MyLogType] ASC, [Expr1003] ASC)) | |--Hash Match(Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003], [MyPageLog].[MyUserID]), RESIDUAL:((([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) AND [MyPageLog].[MyUserID]=[MyPageLog].[MyUserID])) | |--Hash Match(Partial Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003], [MyPageLog].[MyUserID]), RESIDUAL:((([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) AND [MyPageLog].[MyUserID]=[MyPageLog].[MyUserID])) | |--Table Spool |--Table Spool |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType], [Expr1003]=[Expr1003])) |--Compute Scalar(DEFINE:([Expr1007]=Convert([Expr1039]), [Expr1009]=If ([Expr1040]=0) then NULL else ([Expr1041]/Convert([Expr1040])))) |--Hash Match(Aggregate, HASH:([Expr1002], [MyPageLog].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog].[MyLogType]=[MyPageLog].[MyLogType]) AND [Expr1003]=[Expr1003]) DEFINE:([Expr1005]=SUM(If ([MyPageLog].[MyErrorNo]=0) then 0 else 1), [Expr1039]=COUNT(*), [Expr1040]=COUNT_BIG(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate])), [Expr1041]=SUM(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate])), [Expr1008]=MIN(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate])), [Expr1010]=MAX(datediff(millisecond, [MyPageLog].[MyCreateDate], [MyPageLog].[MyUpdateDate])))) |--Table Spool(I didn't make the Query Plan fixed width 'coz its kinda wide, but hopefully you can cut&paste it to something that doesn't word-wrap)Not really quite sure what I can give you for data - obviosuly I can dump some ... but what sort of thing are you after? Would SELECT COUNT(DISTINCT MyColumn1), COUNT(*) type stuff be more informative?Thanks for your helpKristen |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-11-30 : 14:45:49
|
If I change it to:FROM dbo.MyPageLog WHERE MyCreateDate >= '20051127 10:00:00' AND MyCreateDate < '20051127 11:00:00' so that only one hour is selected - COUNT(*)=201,044 rows - then it still uses the |--Clustered Index Scan(OBJECT:([HAWKIN_LIVE].[dbo].[MyPageLog].[PK_MyPageLog]), WHERE:([MyPageLog].[MyCreateDate]>='Nov 27 2005 10:00AM' AND [MyPageLog].[MyCreateDate]<'Nov 27 2005 11:00AM'))(however, if I just do a SELECT COUNT(*) on that WHERE clause it does use the Date Index)Kristen |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2005-11-30 : 15:30:14
|
That's not what I'm getting, I get an index seek....have you run stats?USE NorthwindGOCREATE TABLE MyPageLog99( MyCreateDate datetime NULL DEFAULT (GetDate()), MyUpdateDate datetime NULL , MyPageLogID int IDENTITY (10000, 1) NOT NULL , MyErrorNo int NULL DEFAULT (0), MySessionID int NULL , MyLogType char(1) , MyOtherData char(1) , MyPageNo char(1) , MyUserID char(10) , CONSTRAINT PK_MyPageLog PRIMARY KEY CLUSTERED ( MyPageLogID ))GOCREATE INDEX IX_MyPageLog_MySessionID ON dbo.MyPageLog99( MySessionID)GOCREATE INDEX IX_MyPageLog_MyCreateDate ON dbo.MyPageLog99( MyCreateDate, MyErrorNo)GOSET SHOWPLAN_TEXT ONGODECLARE @dtStart datetime, @dtStop datetime SELECT @dtStart = '27-Nov-2005', -- Pick ONE DAY, 20% of data @dtStop = '28-Nov-2005'SELECT [T_Type] = 'H', -- Hourly [MyCreateDate] = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0), MyLogType, MyPageNo = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END, [D_MySessionID] = COUNT(DISTINCT MySessionID), [C_MyErrorNo] = SUM(CASE WHEN MyErrorNo = 0 THEN 0 ELSE 1 END), [D_MyUserID] = COUNT(DISTINCT MyUserID), [C_Count] = COUNT(*), [MIN_Elapsed] = MIN(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)), [AVG_Elapsed] = AVG(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)), [MAX_Elapsed] = MAX(DateDiff(Millisecond, MyCreateDate, MyUpdateDate))INTO #MyTempTable -- Store it somewhere ...FROM dbo.MyPageLog99WHERE MyCreateDate >= @dtStart AND MyCreateDate < @dtStop GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0), MyLogType, CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo ENDSET SHOWPLAN TEXT OFFGODROP TABLE MyPageLog99GO /* |--Table Insert(OBJECT:([#MyTempTable]), SET:([#MyTempTable].[MAX_Elapsed]=[Expr1010], [#MyTempTable].[AVG_Elapsed]=[Expr1009], [#MyTempTable].[MIN_Elapsed]=[Expr1008], [#MyTempTable].[C_Count]=[Expr1007], [#MyTempTable].[D_MyUserID]=[Expr1006], [#MyTempTable].[C_MyErrorNo]=[Expr1005], [#MyTempTable].[D_MySessionID]=[Expr1004], [#MyTempTable].[MyPageNo]=[Expr1003], [#MyTempTable].[MyLogType]=[MyPageLog99].[MyLogType], [#MyTempTable].[MyCreateDate]=[Expr1002], [#MyTempTable].[T_Type]='H')) |--Top(ROWCOUNT est 0) |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003])) |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003])=([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType]) AND [Expr1003]=[Expr1003])) |--Merge Join(Inner Join, MANY-TO-MANY MERGE:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003])=([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]), RESIDUAL:(([Expr1002]=[Expr1002] AND [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType]) AND [Expr1003]=[Expr1003])) | |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003])) | | |--Compute Scalar(DEFINE:([Expr1004]=Convert([Expr1037]))) | | |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]) DEFINE:([Expr1037]=COUNT_BIG([MyPageLog99].[MySessionID]))) | | |--Sort(DISTINCT ORDER BY:([Expr1002] ASC, [MyPageLog99].[MyLogType] ASC, [Expr1003] ASC, [MyPageLog99].[MySessionID] ASC)) | | |--Table Spool | | |--Compute Scalar(DEFINE:([Expr1002]=dateadd(hour, datediff(hour, 'Jan 1 1900 12:00AM', [MyPageLog99].[MyCreateDate]), 'Jan 1 1900 12:00AM'), [Expr1003]=If (Convert([MyPageLog99].[MyLogType])=1) then Convert([MyPageLog99].[MyOtherData]) else Convert([MyPageLog99].[MyPageNo]))) | | |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([Northwind].[dbo].[MyPageLog99])) | | |--Index Seek(OBJECT:([Northwind].[dbo].[MyPageLog99].[IX_MyPageLog_MyCreateDate]), SEEK:([MyPageLog99].[MyCreateDate] >= [@dtStart] AND [MyPageLog99].[MyCreateDate] < [@dtStop]) ORDERED FORWARD) | |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003])) | |--Compute Scalar(DEFINE:([Expr1007]=Convert([Expr1038]), [Expr1009]=If ([Expr1039]=0) then NULL else ([Expr1040]/Convert([Expr1039])))) | |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]) DEFINE:([Expr1038]=Count(*), [Expr1005]=SUM(If ([MyPageLog99].[MyErrorNo]=0) then 0 else 1), [Expr1039]=COUNT_BIG(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate])), [Expr1040]=SUM(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate])), [Expr1008]=MIN(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate])), [Expr1010]=MAX(datediff(millisecond, [MyPageLog99].[MyCreateDate], [MyPageLog99].[MyUpdateDate])))) | |--Sort(ORDER BY:([Expr1002] ASC, [MyPageLog99].[MyLogType] ASC, [Expr1003] ASC)) | |--Table Spool |--Compute Scalar(DEFINE:([Expr1002]=[Expr1002], [MyPageLog99].[MyLogType]=[MyPageLog99].[MyLogType], [Expr1003]=[Expr1003])) |--Compute Scalar(DEFINE:([Expr1006]=Convert([Expr1041]))) |--Stream Aggregate(GROUP BY:([Expr1002], [MyPageLog99].[MyLogType], [Expr1003]) DEFINE:([Expr1041]=COUNT_BIG([MyPageLog99].[MyUserID]))) |--Sort(DISTINCT ORDER BY:([Expr1002] ASC, [MyPageLog99].[MyLogType] ASC, [Expr1003] ASC, [MyPageLog99].[MyUserID] ASC)) |--Table Spool*/Brett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-11-30 : 16:24:44
|
"That's not what I'm getting, I get an index seek"That's encouraging ... Darn it, I should have run a light-weight test, rather than just throwing 25 million rows at it "have you run stats?"Theoretically, yes; they run daily after the purge of umpteen-million-rows. I'll get up early tomorrow morning when the system is "quiet" and run them manually to be sure, then chuck some further tests at it, and at "MyPageLog99" :)Thanks Brett.Kristen |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-12-01 : 02:19:16
|
| I ranUPDATE STATISTICS MyPageLog WITH FULLSCANand reran the query and its the same plan.If I do a SELECT COUNT(*) it uses the Date index, but if I select the aggregates it uses a Clustered Index Scan on the PK column (identity)COUNT(*) for the one day I am reporting on is 5,825,985 rows out of a total of 21,890,248I reckon its just too many rows to bother doing a BOOKMARK.So I'm back to moving the Clustered Index to the Date column, and sticking a UNIQUE Index on the IDENTITY, and worrying about whether the INSERT/UPDATE is going to be slower because it will have to go via the non-clustered index to do the UPDATE ...SELECT COUNT(DISTINCT MyCreateDate)FROM dbo.MyPageLoggives 9,192,029 - so 2.3 entries for each distinct date/time, on average.Actually I've had a thought. I could store the MIN/MAX IDENTITY number for each day, and use that as part of the WHERE clause so that the current Clustered Index dramatically reduces the number of rows that are processed.Kristen |
 |
|
|
jen
Master Smack Fu Yak Hacker
4110 Posts |
Posted - 2005-12-01 : 02:53:03
|
| same here... it's a seekhave you tried sp_recompile?--------------------keeping it simple... |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-12-01 : 03:21:39
|
| "same here... it's a seek"I reckon that Brett's example table just doesn't have enough rows ..."have you tried sp_recompile?"The code isn't in an Sproc as yet ...Kristen |
 |
|
|
jen
Master Smack Fu Yak Hacker
4110 Posts |
Posted - 2005-12-01 : 03:43:29
|
| instead of update, have you tried recreating the stats?--------------------keeping it simple... |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-12-01 : 04:06:56
|
| "instead of update, have you tried recreating the stats?"Nope, don't know about that - what's the drill please? (It takes blinking ages to redo the stats on that table every night given that a) its big an b) we have to do it regularly 'coz 1/5th gets deleted each day)Kristen |
 |
|
|
jen
Master Smack Fu Yak Hacker
4110 Posts |
Posted - 2005-12-01 : 04:36:04
|
| I think there's a warning attached... better read BOL before proceedingfrom BOL:execution plan-->execution plan caching and reusecreate statisticsdrop statisticsHTH--------------------keeping it simple... |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-12-01 : 05:07:54
|
| Ah, so I could explicitly CREATE STATISTICS on the Date Column ... that might be interesting.Thanks for that pointer.Kristen |
 |
|
|
Arnold Fribble
Yak-finder General
1961 Posts |
Posted - 2005-12-01 : 06:26:23
|
| Something of a side-issue, but I do notice you've got two distinct counts in there, amongst the other aggregates. From what I can see in the execution plan text (not much, they're never very easy to read), it's having to run the distinct counts separately. If MyUserID is dependent on MySessionID, you might be able to make the query do one pass over the data by doing the partial aggregation manually. Though whether or not it comes out quicker probably depends on the number of rows in the hash table, which will slow down the non-distinct aggregations.p.s. whatever happened to "people who bought this bought that too?" |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-12-01 : 06:48:20
|
| "I do notice you've got two distinct counts in there"Always handy that, an Eagle-eye. Thanks I'll look into that. MyUserID could change if they log in twice, differently, during a session, but that's pretty obtuse and I reckon I prob. only want to know how many sessions resulted in a login; I'll check it through, thanks."whatever happened to "people who bought this bought that too?"The final response looked right to me, and I was way off beam, but I haven't had time to check it yet. Will report back once I've got something debugged!Kristen |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2005-12-01 : 10:27:27
|
OK, so what happens when you do this?USE NorthwindGOCREATE TABLE MyPageLog99( MyCreateDate datetime NULL DEFAULT (GetDate()), MyUpdateDate datetime NULL , MyPageLogID int IDENTITY (10000, 1) NOT NULL , MyErrorNo int NULL DEFAULT (0), MySessionID int NULL , MyLogType char(1) , MyOtherData char(1) , MyPageNo char(1) , MyUserID char(10) , CONSTRAINT PK_MyPageLog PRIMARY KEY CLUSTERED ( MyPageLogID ))GOCREATE INDEX IX_MyPageLog_MySessionID ON dbo.MyPageLog99( MySessionID)GOCREATE INDEX IX_MyPageLog_MyCreateDate ON dbo.MyPageLog99( MyCreateDate, MyErrorNo)GODECLARE @dtStart datetime, @dtStop datetime SELECT @dtStart = '27-Nov-2005', -- Pick ONE DAY, 20% of data @dtStop = '28-Nov-2005'SELECT *INTO #MyTempTable -- Store it somewhere ...FROM dbo.MyPageLog99WHERE MyCreateDate >= @dtStart AND MyCreateDate < @dtStop SELECT [T_Type] = 'H', -- Hourly [MyCreateDate] = DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0), MyLogType, MyPageNo = CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo END, [D_MySessionID] = COUNT(DISTINCT MySessionID), [C_MyErrorNo] = SUM(CASE WHEN MyErrorNo = 0 THEN 0 ELSE 1 END), [D_MyUserID] = COUNT(DISTINCT MyUserID), [C_Count] = COUNT(*), [MIN_Elapsed] = MIN(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)), [AVG_Elapsed] = AVG(DateDiff(Millisecond, MyCreateDate, MyUpdateDate)), [MAX_Elapsed] = MAX(DateDiff(Millisecond, MyCreateDate, MyUpdateDate))INTO #MyTempTable_2 -- Store it somewhere ...FROM #MyTempTableWHERE MyCreateDate >= @dtStart AND MyCreateDate < @dtStop GROUP BY DATEADD(Hour, DATEDIFF(Hour, 0, MyCreateDate), 0), MyLogType, CASE WHEN MyLogType = 1 THEN CONVERT(int, MyOtherData) ELSE MyPageNo ENDGODROP TABLE MyPageLog99, #MyTempTable, #MyTempTable_2GO Brett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-12-01 : 10:48:02
|
(0 row(s) affected)(0 row(s) affected) Do you want me to pump some of my real-world data into it?With the empty table the first SELECT *INTO #MyTempTable -- Store it somewhere ...FROM dbo.MyPageLog99WHERE MyCreateDate >= @dtStart AND MyCreateDate < @dtStop gives me |--Table Insert(OBJECT:([#MyTempTable]), SET:([#MyTempTable].[MyUserID]=[MyPageLog99].[MyUserID], [#MyTempTable].[MyPageNo]=[MyPageLog99].[MyPageNo], [#MyTempTable].[MyOtherData]=[MyPageLog99].[MyOtherData], [#MyTempTable].[MyLogType]=[MyPageLog99].[MyLogType], [#MyTempTable].[MySessionID]=[MyPageLog99].[MySessionID], [#MyTempTable].[MyErrorNo]=[MyPageLog99].[MyErrorNo], [#MyTempTable].[MyPageLogID]=[Expr1002], [#MyTempTable].[MyUpdateDate]=[MyPageLog99].[MyUpdateDate], [#MyTempTable].[MyCreateDate]=[MyPageLog99].[MyCreateDate])) |--Top(ROWCOUNT est 0) |--Compute Scalar(DEFINE:([Expr1002]=setidentity([MyPageLog99].[MyPageLogID], -7, 0, '#MyTempTable'))) |--Bookmark Lookup(BOOKMARK:([Bmk1000]), OBJECT:([Northwind].[dbo].[MyPageLog99])) |--Index Seek(OBJECT:([Northwind].[dbo].[MyPageLog99].[IX_MyPageLog_MyCreateDate]), SEEK:([MyPageLog99].[MyCreateDate] >= [@dtStart] AND [MyPageLog99].[MyCreateDate] < [@dtStop]) ORDERED FORWARD)but I don't know how that would way up with the subsequentINTO #MyTempTable_2 -- Store it somewhere ...FROM #MyTempTablewhich is going to then do a SCAN - but the main table will have been released by then, which would be good news.5,000,000 rows too many for a @TempTable [as distcint from a #TempTable] do you think?Kristen |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2005-12-01 : 11:21:34
|
Yes it will scan, but it's your whole population, so it's suppose to...it's the seek we are trying to get...plus I made a boo-boo...ThisWHERE MyCreateDate >= @dtStart AND MyCreateDate < @dtStop Is suppose to come out of the query...Also table variables are suppose to be stored in memory...I don't think (I hate when that happens) that they use tempdb at all...I guess it's written to the virtual memory file...Which is a good question, which is more effecient...I gotta look it up, but for large volumes, I would say #temp, not @temp...Brett8-)Hint: Want your questions answered fast? Follow the direction in this linkhttp://weblogs.sqlteam.com/brettk/archive/2005/05/25/5276.aspxAdd yourself!http://www.frappr.com/sqlteam |
 |
|
|
Kristen
Test
22859 Posts |
Posted - 2005-12-01 : 12:46:12
|
| "I gotta look it up, but for large volumes, I would say #temp, not @temp..."I reckon so too .... I don't want to piss folk off, so I'll do the TEST early tomorrow morning.Thanks Brett,Kristen |
 |
|
|
Next Page
|
|
|
|
|