
 Would an index improve search time for this table


SamC
White Water Yakist

3467 Posts

Posted - 2003-01-25 : 10:33:39
CREATE TABLE dbo.PageHits (
PageHitID INT NOT NULL IDENTITY (1000000,1) ,
ClientID INT NOT NULL ,
UserID INT NOT NULL ,
CourseID INT NULL ,
ModuleID INT NULL ,
UrlDate DATETIME NOT NULL DEFAULT GETDATE() ,
Url varchar (300) NOT NULL
)
GO

This table logs pages visited in a course. Records are appended at the end of the table, and inserts are not a performance problem.

The query to check for course completion retrieves a recordset for the UserID and CourseID.

Select ModuleID
FROM PageHits
WHERE CourseID=@CourseID and UserID=@UserID and ClientID=@ClientID

Worst case, there are 5,000,000 records in the table. If size becomes a problem I could move completed course records to another table.

Any suggestions on how the query performance could be improved using indices?

Sam

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-01-25 : 10:42:13
A clustered index on UserID, CourseID, and ClientID would probably be most effective, if this query is more important than INSERT performance. Clustering on those columns may cause more page splits, and it will also spread the INSERTs out along more of the data file.


SamC
White Water Yakist

3467 Posts

Posted - 2003-01-25 : 12:32:57
On a live table (not too busy on the weekend)

CREATE CLUSTERED INDEX PageHitIndex
ON PageHits (ClientID, UserID, CourseID)

Would create the index....

Do the stored procedures that reference table PageHits need to change to refer to the new clustered index somehow to get a performance benefit or is this transparent?

Sam





robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-01-25 : 12:37:16
You might need to recompile them in order for them to utilize the new index. Try this:

DECLARE @db int
SET @db = DB_ID('myDatabase')
-- FREEPROCCACHE clears the plan cache for the entire server
DBCC FREEPROCCACHE
-- FLUSHPROCINDB clears cached plans for a single database
DBCC FLUSHPROCINDB(@db)


That will flush the procedure cache and force a recompile of all the stored procedures in the database. If the sprocs still don't seem to perform better after this, try dropping and re-creating them.
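A lighter-touch option, if you'd rather not clear the whole cache: sp_recompile marks every stored procedure that references a given table for recompilation on its next execution.

```sql
-- Narrower alternative (SQL Server 2000): mark only the procedures that
-- reference PageHits for recompilation, leaving the rest of the plan
-- cache untouched.
EXEC sp_recompile 'PageHits'
```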


SamC
White Water Yakist

3467 Posts

Posted - 2003-02-03 : 18:51:54
I've tried the clustered index with no noticeable improvement. The query takes 30 seconds to calculate. I wish I could get it down to 3 seconds.

The query appears below. It calculates the average time all users spend reading a course module. There are about 600,000 rows (growing to several million) in table PageHits which records the moment a User visits each URL (Module in course lingo).

CREATE TABLE [PageHits] (
[PageHitID] [int] IDENTITY (1000000, 1) NOT NULL ,
[ClientID] [int] NOT NULL ,
[UserID] [int] NOT NULL ,
[CourseID] [int] NULL ,
[ModuleID] [int] NULL ,
[UrlDate] [datetime] NOT NULL DEFAULT (getdate()),
[Url] [varchar] (300) NOT NULL
)

An expensive part of the query is the retrieval of the next module read by the user from the table. Take a close look at:
(SELECT MIN(urldate) FROM PageHits WHERE urldate > p.urldate and userid = p.userid) as nexturldate

Here's the clustered index I added which I had hoped would boost the performance:

CREATE CLUSTERED INDEX [PageHitIndex] ON [dbo].[PageHits]([ClientID], [UserID], [CourseID], [UrlDate])

I saw zero performance improvement after creating the clustered index. I dropped and re-created the stored procedure to force a recompile (in addition to performing Rob's suggestion in an earlier post). The query remains at 35 seconds with 600,000 rows in the table.

Sam

Here's the query
---------------------


SELECT M.ModuleSequence, M.ModuleTitle, CAST(avg(minutes) as decimal(7,2)) as [Average Minutes] -- Average visitation time of ALL Users
FROM Modules M
LEFT OUTER JOIN
(SELECT UserID, ModuleID, CAST(SUM(ElapsedTime)/60.0 as decimal (7,2)) Minutes -- Total time a user spent on a URL/Module
FROM (SELECT UserID, ModuleID -- Elapsed Time for every Module (URL) hit for every User (8 minute maximum)
, CASE WHEN DATEDIFF(second, UrlDate, nextUrlDate) > 480 THEN 480 ELSE DATEDIFF(second, UrlDate, nextUrlDate) END as ElapsedTime
FROM (SELECT UserID, ModuleID, urldate, -- YIELDS ROW WITH NEXT URL DATE
(SELECT MIN(urldate) FROM PageHits WHERE urldate > p.urldate and userid = p.userid) as nexturldate
FROM PageHits P
WHERE ModuleID IS NOT NULL AND CourseID=@CourseID
) PROW -- The Start time and Next Page Time for this Page Hit
) PELAPSED -- The elapsed time a User spent on this page load
GROUP BY UserID, ModuleID
) PSUM -- Here we have totaled all the elapsed times for every visit the user made to this page
ON PSUM.ModuleID = M.ModuleID
WHERE M.CourseID = @CourseID
GROUP BY ModuleSequence, ModuleTitle
ORDER BY ModuleSequence
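One idea worth trying (a sketch, untested against this data): an index keyed the way the correlated subquery filters. The subquery touches only userid and urldate, but the clustered index leads on ClientID, so it can't seek on userid alone. A nonclustered index on those two columns covers the subquery completely, so each MIN(urldate) lookup can become a short seek instead of a scan.

```sql
-- Sketch (SQL Server 2000): a nonclustered index that matches the
-- correlated subquery's predicate (userid = ..., urldate > ...).
-- The subquery reads only these two columns, so the index covers it
-- and no bookmark lookups into the base table are needed.
CREATE NONCLUSTERED INDEX IX_PageHits_User_UrlDate
ON dbo.PageHits (UserID, UrlDate)
```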



SamC
White Water Yakist

3467 Posts

Posted - 2003-02-03 : 22:22:31

I replaced the expensive part of the query

(SELECT MIN(urldate) FROM PageHits WHERE urldate > p.urldate and userid = p.userid) as nexturldate

With

URLDate

Which is wrong, but it would give me an idea of the execution time spent on this part of the query.

It reduced the query time from 35 seconds to 6 seconds.

I'm beginning to think I should add another column and populate it with the proper value to get the performance boost. This would undesirably increase the INSERT overhead though.

Finding another way to improve the query without adding the new column would be great.
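For what it's worth, the extra-column idea might look something like this (the column name NextUrlDate is just a placeholder; keeping it current would add work to the INSERT path, as noted above):

```sql
-- Hypothetical sketch: precompute each row's "next page hit" timestamp
-- so the query can read it directly instead of running the correlated
-- MIN(urldate) subquery per row.
ALTER TABLE dbo.PageHits ADD NextUrlDate DATETIME NULL
GO

-- One-off backfill: for each row, store the timestamp of the same
-- user's next page hit.
UPDATE P
SET NextUrlDate = (SELECT MIN(P2.UrlDate)
                   FROM PageHits P2
                   WHERE P2.UserID = P.UserID
                     AND P2.UrlDate > P.UrlDate)
FROM PageHits P
GO
```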

Sam
