SamC - White Water Yakist (3467 Posts)
Posted - 2003-01-25 : 10:33:39
This table logs pages visited in a course. Records are inserted at the end and do not have a performance problem.

CREATE TABLE dbo.PageHits (
    PageHitID INT NOT NULL IDENTITY (1000000, 1),
    ClientID  INT NOT NULL,
    UserID    INT NOT NULL,
    CourseID  INT NULL,
    ModuleID  INT NULL,
    UrlDate   DATETIME NOT NULL DEFAULT GETDATE(),
    Url       VARCHAR(300) NOT NULL
)
GO

The query to check for course completion retrieves a recordset for the UserID and CourseID:

SELECT ModuleID
FROM PageHits
WHERE CourseID = @CourseID AND UserID = @UserID AND ClientID = @ClientID

Worst case, there are 5,000,000 records in the table. If size becomes a problem I could move completed course records to another table.

Any suggestions on how the query performance could be improved using indices?

Sam
robvolk - Most Valuable Yak (15732 Posts)
Posted - 2003-01-25 : 10:42:13
A clustered index on UserID, CourseID, and ClientID would probably be most effective, if this query is more important than INSERT performance. Clustering on those columns may cause more page splits, and it will also spread the INSERTs out along more of the data file.
SamC - White Water Yakist (3467 Posts)
Posted - 2003-01-25 : 12:32:57
On a live table (not too busy on the weekend), this would create the index:

CREATE CLUSTERED INDEX PageHitIndex
ON PageHits (ClientID, UserID, CourseID)

Do the stored procedures that reference table PageHits need to change to refer to the new clustered index somehow to get a performance benefit, or is this transparent?

Sam
robvolk - Most Valuable Yak (15732 Posts)
Posted - 2003-01-25 : 12:37:16
You might need to recompile them in order for them to utilize the new index. Try this:

DECLARE @db int
SET @db = DB_ID('myDatabase')
DBCC FREEPROCCACHE
DBCC FLUSHPROCINDB(@db)

That will flush the procedure cache and force a recompile on all of the stored procedures in the database. If the sprocs still don't seem to perform better after this, try dropping and re-creating them.
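[Editor's note: a lighter-touch alternative, not suggested in the thread itself, is sp_recompile, which marks only the procedures and triggers that reference a given table for recompilation on their next execution, rather than flushing the entire procedure cache:]

```sql
-- Marks every stored procedure and trigger referencing PageHits for
-- recompilation the next time it runs; other cached plans are untouched.
EXEC sp_recompile N'dbo.PageHits'
```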
SamC - White Water Yakist (3467 Posts)
Posted - 2003-02-03 : 18:51:54
I've tried the clustered index with no noticeable improvement. The query takes 30 seconds to run; I wish I could get it down to 3 seconds. The query appears below. It calculates the average time all users spend reading a course module. There are about 600,000 rows (growing to several million) in table PageHits, which records the moment a User visits each URL (Module in course lingo).

CREATE TABLE [PageHits] (
    [PageHitID] [int] IDENTITY (1000000, 1) NOT NULL,
    [ClientID]  [int] NOT NULL,
    [UserID]    [int] NOT NULL,
    [CourseID]  [int] NULL,
    [ModuleID]  [int] NULL,
    [UrlDate]   [datetime] NOT NULL DEFAULT (getdate()),
    [Url]       [varchar] (300) NOT NULL
)

An expensive part of the query is the retrieval of the next module read by the user from the table. Take a close look at:

(SELECT MIN(urldate) FROM PageHits
 WHERE urldate > p.urldate AND userid = p.userid) AS nexturldate

Here's the clustered index I added, which I had hoped would boost the performance:

CREATE CLUSTERED INDEX [PageHitIndex] ON [dbo].[PageHits]
    ([ClientID], [UserID], [CourseID], [UrlDate])

I saw zero performance improvement after creating the clustered index. I dropped the query and added it again to force a recompile (in addition to performing Rob's suggestion in an earlier post).
The query remains at 35 seconds with 600,000 rows in the table.

Sam

Here's the query
---------------------

SELECT M.ModuleSequence, M.ModuleTitle,
       CAST(AVG(Minutes) AS decimal(7,2)) AS [Average Minutes] -- Average visitation time of ALL Users
FROM Modules M
LEFT OUTER JOIN
    (SELECT UserID, ModuleID,
            CAST(SUM(ElapsedTime)/60.0 AS decimal(7,2)) Minutes -- Total time a user spent on a URL/Module
     FROM
        (SELECT UserID, ModuleID, -- Elapsed Time for every Module (URL) hit for every User (8 minute maximum)
                CASE WHEN DATEDIFF(second, UrlDate, NextUrlDate) > 480
                     THEN 480
                     ELSE DATEDIFF(second, UrlDate, NextUrlDate)
                END AS ElapsedTime
         FROM
            (SELECT UserID, ModuleID, urldate, -- YIELDS ROW WITH NEXT URL DATE
                    (SELECT MIN(urldate) FROM PageHits
                     WHERE urldate > p.urldate AND userid = p.userid) AS nexturldate
             FROM PageHits P
             WHERE ModuleID IS NOT NULL AND CourseID = @CourseID
            ) PROW      -- The Start time and Next Page Time for this Page Hit
        ) PELAPSED      -- The elapsed time a User spent on this page load
     GROUP BY UserID, ModuleID
    ) PSUM              -- Here we have totaled all the elapsed times for every visit the user made to this page
    ON PSUM.ModuleID = M.ModuleID
WHERE M.CourseID = @CourseID
GROUP BY ModuleSequence, ModuleTitle
ORDER BY ModuleSequence
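[Editor's note: one avenue not tried above - the correlated subquery seeks on userid and then ranges on urldate, but the clustered index leads with ClientID, so every evaluation has to dig through a wide clustered key. A narrow nonclustered index keyed on exactly the columns the subquery touches could satisfy it directly. The index name is made up for illustration, and this is an untested sketch against Sam's schema:]

```sql
-- Narrow index for the "next hit" lookup: the subquery filters on UserID
-- and computes MIN(UrlDate) over a range, and both columns are in the key,
-- so the lookup can be answered from this index alone.
CREATE NONCLUSTERED INDEX IX_PageHits_UserID_UrlDate
    ON dbo.PageHits (UserID, UrlDate)
```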
SamC - White Water Yakist (3467 Posts)
Posted - 2003-02-03 : 22:22:31
I replaced the expensive part of the query:

(SELECT MIN(urldate) FROM PageHits
 WHERE urldate > p.urldate AND userid = p.userid) AS nexturldate

with UrlDate. That's wrong, but it would give me an idea of the execution time spent on this part of the query. It reduced the query time from 35 seconds to 6 seconds.

I'm beginning to think I should add another column and populate it with the proper value to get the performance boost. This would undesirably increase the INSERT overhead, though. Finding another way to improve the query without adding the new column would be great.

Sam
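[Editor's note: a rough sketch of the extra-column idea Sam floats above. The column and trigger names are hypothetical, and it assumes hits are logged one row at a time (the multi-row inserted case would need more care); it trades a small, fixed INSERT cost for removing the MIN(urldate) hunt from the read query entirely:]

```sql
-- Hypothetical precomputed "next hit" column: when a new hit arrives,
-- stamp the same user's most recent earlier hit with the new UrlDate.
ALTER TABLE dbo.PageHits ADD NextUrlDate DATETIME NULL
GO
CREATE TRIGGER trg_PageHits_Next ON dbo.PageHits
AFTER INSERT AS
UPDATE ph
SET ph.NextUrlDate = i.UrlDate
FROM dbo.PageHits ph
JOIN inserted i ON i.UserID = ph.UserID
WHERE ph.UrlDate = (SELECT MAX(UrlDate)     -- the user's previous hit
                    FROM dbo.PageHits
                    WHERE UserID = i.UserID
                      AND UrlDate < i.UrlDate)
```

The read query would then use NextUrlDate directly instead of the correlated subquery, which is the part Sam measured at roughly 29 of the 35 seconds.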