Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
SamC
White Water Yakist
3467 Posts |
Posted - 2003-11-09 : 12:59:20
|
| Damian recently posted a solution for deleting older records from a Table.I've had a nagging concern that there may be a "less costly" way to find and delete older records without requiring an INNER JOIN . Before chasing a new solution, a check of the execution plan on the INNER JOIN solution show a surprising two table scans (sad to see since the table is FULLY indexed). This can't be right, but it may be the *bad* result of doing an INNER JOIN using a " < " condition?Here's the query as it is used today...DELETE A FROM dbo.Feedback2 A INNER JOIN dbo.Feedback2 B ON A.CourseID=B.CourseID AND A.UserID=B.UserID AND A.FormID=B.FormID AND A.CreateDate < B.CreateDateMoving on in hopes that a revised query won't use a table scan... An alternative DELETE (conceptually) would delete all records of a given catagory where the record's "CreateDate" is less than MAX(CreateDate) for that catagory. DELETE FROM dbo.Feedback2 WHERE catagory = this catagory AND CreateDate < MAX(CreateDate this catagory)Of course this won't work as is, and using a MAX would require GROUP BY somwhere. This thought process leads to a formulation for the recordset to SAVE (not DELETE)SELECT CourseID, UserID, FormID, MAX(CreateDate) as CreateDateFROM dbo.Feedback2GROUP BY CourseID, UserID, FormIDDeleting the unwanted records is thenDELETE AFROM dbo.Feedback2 AINNER JOIN (SELECT CourseID, UserID, FormID, MAX(CreateDate) as CreateDateFROM dbo.Feedback2GROUP BY CourseID, UserID, FormID) B on B.CourseID=A.CourseID AND B.UserID = A.UserID AND B.FormID = A.FormID AND B.CreateDate > A.CreateDateComparing the execution plan of both solutions shows disappointing table scans in both queries. This suggests both queries are winners, but I have a problem with the table statistics forcing the table scans. Is there a way to reset statistics on a table which might lead to a better execution plan?Sam |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-11-09 : 18:03:13
|
| How many rows in the table?Are you seeking for a better performance solution or trying to get more eyes-pleasant execution plan?How about simple:delete from t where dt<>(select max(dt) from t tt where tt.uid=t.uid) ??PS Common mistake: to take "it all" for a kind of Real Science... |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2003-11-09 : 18:18:17
|
| Right now, only 300 rows. It's going to grow large over time however. I need efficent execution, not a good-looking query. Right now Damian's INNER JOIN is as good (or better) than anything else, but it's hard to compare when table scans show up in the execution plan.Could the compiler be choosing a table scan because it's more efficent than an index seek for a few hundred rows?Sam |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-11-09 : 18:54:24
|
| I think it is so. Too few rows. |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-11-09 : 19:00:55
|
quote: it's hard to compare when table scans show up in the execution plan
Sorry, Sam, but I think that Experiment is the best Judge. |
 |
|
|
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2003-11-09 : 19:16:09
|
quote: (sad to see since the table is FULLY indexed
SamC, Can you post Feedback2 DDL and indexes? |
 |
|
|
Stoad
Freaky Yak Linguist
1983 Posts |
Posted - 2003-11-09 : 19:51:04
|
| ehorn,do you mean the DateTime column is indexed on its own? |
 |
|
|
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2003-11-09 : 19:52:36
|
| I am trying to better understand the indexes you have on the Feedback2 tabel as you had mentioned it is "fully indexed".. |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2003-11-09 : 19:55:28
|
| [code]CREATE TABLE [Feedback2] ( [UserID] [int] NOT NULL , [CourseID] [int] NOT NULL , [CreateDate] [datetime] NOT NULL , [FormID] [int] NOT NULL , [Question] [nvarchar] (1000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [ChoiceID] [int] NOT NULL , [ChoiceText] [nvarchar] (2000) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ) ON [PRIMARY]GO CREATE INDEX [IDX_Feedback1] ON [dbo].[Feedback2]([UserID]) ON [PRIMARY] CREATE INDEX [IDX_Feedback2] ON [dbo].[Feedback2]([CourseID]) ON [PRIMARY] CREATE INDEX [IDX_Feedback3] ON [dbo].[Feedback2]([FormID]) ON [PRIMARY] CREATE INDEX [IDX_Feedback4] ON [dbo].[Feedback2]([ChoiceID]) ON [PRIMARY] CREATE INDEX [IDX_Feedback5] ON [dbo].[Feedback2]([CreateDate]) ON [PRIMARY]GO[/code] |
 |
|
|
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2003-11-09 : 23:54:15
|
One option is to force an index...You can add a primary key, such as:FeedBack2ID INT (IDENTITY(0,1) PRIMARY KEY spooling costs but scales better than the table scansDELETE AFROM FeedBack2 AINNER JOIN FeedBack2 B ON A.CourseID=B.CourseID AND A.UserID=B.UserID AND A.FormID=B.FormID AND A.CreateDate < B.CreateDate quote: This thought process leads to a formulation for the recordset to SAVE (not DELETE)
Which leads to another option to use the nonclustered indexes to build the alias (d) and the clustered index (FeedBack2ID) to perform the delete. This option still produces the spooling but also scales much better than the scan and makes use of all the indexes.delete FeedBack2 where Feedback2ID not in( select Feedback2ID from FeedBack2 c join ( select distinct a.courseid, a.userid, a.formid, (select max(createdate) from FeedBack2 b where b.courseid=a.courseid and b.userid = a.userid and b.formid=a.formid) createdate from FeedBack2 a ) d on c.courseid=d.courseid and c.userid = d.userid and c.formid=d.formid and c.createdate = d.createdate) How to prevent those spools??? |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2003-11-10 : 07:05:24
|
quote: One option is to force an index...You can add a primary key, such as:FeedBack2ID INT (IDENTITY(0,1) PRIMARY KEYspooling costs but scales better than the table scansDELETE AFROM FeedBack2 AINNER JOIN FeedBack2 B ON A.CourseID=B.CourseID AND A.UserID=B.UserID AND A.FormID=B.FormID AND A.CreateDate < B.CreateDate
Why would adding a primary key (which isn't used in the query) cause the compiler not to generate table scans? |
 |
|
|
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2003-11-10 : 10:02:08
|
| Sam, I do not understand this...My assumption is that the optimizer cannot assume any uniqueness on the self-join w/o a unique constraint. But it does seem to always use the Clustered index for additional "optimization" ??? information. As You and Stoad discussed, maybe the optimizer sees the scan as the best solution, although, regardless of how large the table gets, it appears to never use the non-clustered indexes for the hash match. Perhaps the Gurus can better explain what is happening here or suggest a more appropriate course. |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2003-11-10 : 10:43:35
|
| I regret naming this thread "Delete puzzler" since the indexing (not the query) seems to be generating the poor execution plan.The DDL is posted above. If anyone suggests a better index scheme, I'll give it a try.Sam |
 |
|
|
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2003-11-10 : 11:15:47
|
| Sam,Are the existing indexes (UserID,CourseID,FormID,Createdate) serving some purpose for other application functionality (ie other types of searching)? |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2003-11-10 : 11:27:31
|
The only other operations are INSERTs, and there's a report which uses this query:SELECT CASE WHEN ChoiceID > 0 THEN Total ELSE NULL END AS Total , CAST(CASE WHEN ChoiceID > 0 AND GrandTotal > 0 THEN CAST(100.0 * Total / GrandTotal AS NUMERIC(3,0)) ELSE NULL END AS VARCHAR) + '%' AS [Percent] , Question FROM ( SELECT FormID, 0 as ChoiceID, '<b>' + MAX(Question) + '</b>' AS Question, COUNT(*) as Total , -- List of all the questions 1 as GrandTotal FROM dbo.Feedback2 WHERE CourseID = @CourseID AND ChoiceID > 0 -- Select just the "Choice" Responses AND CreateDate BETWEEN @StartDate AND @FinishDate GROUP BY FormID UNION ALL SELECT F.FormID, ChoiceID, ChoiceText, COUNT(*) AS Total , MAX(GrandTotal) AS GrandTotal -- FormID, Choice and Total for each choosable item FROM dbo.Feedback2 F LEFT OUTER JOIN (SELECT FormID, COUNT(*) AS GrandTotal FROM dbo.Feedback2 WHERE CourseID=@CourseID AND ChoiceID > 0 AND CreateDate BETWEEN @StartDate AND @FinishDate GROUP BY FormID) GT ON GT.FormID = F.FormID WHERE CourseID = @CourseID AND ChoiceID > 0 -- Select just the "Choice" Responses AND CreateDate BETWEEN @StartDate AND @FinishDate GROUP BY F.FormID, ChoiceID, ChoiceText ) X ORDER BY FormID ASC, ChoiceID ASC This query produces a summary of the ChoiceIDs (radio buttons) user's selected on certain FormIDs, totals 'em up, calculates a percentage.It's not pretty.Sam |
 |
|
|
ehorn
Master Smack Fu Yak Hacker
1632 Posts |
Posted - 2003-11-10 : 11:44:09
|
| Sorry for so many questions. Are the indexes being used for the insert?As you probably know indexes slow down insert,updates and deletes. If the indexes are not be effectively used you may consider removing them with the exception of maybe createdate and a primary key. I tested the delete and it was much faster with no spooling. Not sure how this will effect insert but it may ultimately be a design tradeoff. |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-11-10 : 13:00:03
|
Sam,I was playing aroung with this, when I realized the Delete will never delete anything...anyway, back toi the index thingee...I swear I saw a one of the plans using parallesim...but can't get it to repeat it...Takes about 15 minutes to run, and it needs to be run in stages...but it's interesting...what's eager spool?CREATE TABLE Feedback2 ( UserID int NOT NULL , CourseID int NOT NULL , CreateDate datetime NOT NULL DEFAULT Getdate() , FormID int NOT NULL , Question varchar (1000) , ChoiceID int , ChoiceText varchar (2000))GO CREATE INDEX IDX_Feedback1 ON dbo.Feedback2(UserID) CREATE INDEX IDX_Feedback2 ON dbo.Feedback2(CourseID) CREATE INDEX IDX_Feedback3 ON dbo.Feedback2(FormID) CREATE INDEX IDX_Feedback4 ON dbo.Feedback2(ChoiceID) CREATE INDEX IDX_Feedback5 ON dbo.Feedback2(CreateDate) CREATE INDEX IDX_Feedback6 ON dbo.Feedback2(UserID, CourseID, FormID, CreateDate)GOCREATE TABLE Feedback3 ( UserID int NOT NULL , CourseID int NOT NULL , CreateDate datetime NOT NULL DEFAULT Getdate() , FormID int NOT NULL , Question varchar (1000) , ChoiceID int , ChoiceText varchar (2000))GO CREATE INDEX IDX_Feedback1 ON dbo.Feedback3(UserID) CREATE INDEX IDX_Feedback2 ON dbo.Feedback3(CourseID) CREATE INDEX IDX_Feedback3 ON dbo.Feedback3(FormID) CREATE INDEX IDX_Feedback4 ON dbo.Feedback3(ChoiceID) CREATE INDEX IDX_Feedback5 ON dbo.Feedback3(CreateDate) CREATE INDEX IDX_Feedback6 ON dbo.Feedback3(UserID, CourseID, FormID, CreateDate)GO--SET SHOWPLAN_TEXT OFF--SET SHOWPLAN_ALL OFF--GOSET NOCOUNT ONDECLARE @x int, @y int, @z intSELECT @x = 1, @y = 1, @z = 1WHILE @x < 100 BEGIN WHILE @y < 5 BEGIN WHILE @z < 2 BEGIN INSERT INTO Feedback2 (UserId, CourseId, FormId) SELECT @x, @y, @z SELECT @z = @z + 1 END SELECT @z = 1, @y = @y + 1 END SELECT @y = 1, @x = @x + 1 END SELECT COUNT(*) FROM Feedback2SELECT @x = 1, @y = 1, @z = 1WHILE @x < 200 BEGIN WHILE @y < 100 BEGIN WHILE @z < 10 BEGIN INSERT INTO Feedback3 (UserId, CourseId, FormId) SELECT @x, @y, @z SELECT @z = @z + 1 END SELECT @z = 1, @y = @y + 1 END SELECT @y = 1, @x = @x + 1 END SELECT COUNT(*) FROM Feedback3GO DELETE A FROM dbo.Feedback3 AINNER JOIN ( SELECT CourseID, UserID, FormID, MAX(CreateDate) as CreateDate FROM dbo.Feedback3 GROUP BY CourseID, UserID, FormID ) B ON B.CourseID = A.CourseID AND B.UserID = A.UserID AND B.FormID = A.FormID AND B.CreateDate > A.CreateDateSELECT 'Rows Deleted: ' + CONVERT(varchar(15),@@ROWCOUNT) DELETE A FROM dbo.Feedback2 AINNER JOIN ( SELECT CourseID, UserID, FormID, MAX(CreateDate) as CreateDate FROM dbo.Feedback2 GROUP BY CourseID, UserID, FormID ) B ON B.CourseID = A.CourseID AND B.UserID = A.UserID AND B.FormID = A.FormID AND B.CreateDate > A.CreateDateSELECT 'Rows Deleted: ' + CONVERT(varchar(15),@@ROWCOUNT)GOSET NOCOUNT OFFDROP TABLE FeedBack2DROP TABLE FeedBack3GO Brett8-) |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2003-11-10 : 13:49:47
|
| Hi Brett,Searching on "Eager spool", yours is the only post that speaks of it.Sam |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-11-10 : 14:05:54
|
| Check out the plan on the secons one...Table Spool/ Eager Spool...Never seen it before...And I'm still confused on how I saw parallelism...and now can't recreate it...damn hangover...Brett8-) |
 |
|
|
SamC
White Water Yakist
3467 Posts |
Posted - 2003-11-10 : 16:03:42
|
Hi Brett,I ran your test scripts (thanks) and found that there's no duplicate data to delete - so that explains why 0 rows were deleted.I see the eager spool. I can imagine the SQL design team thinking up these names... (apethetic spool? no...)The execution plan for DELETE A FROM dbo.Feedback2 A INNER JOIN dbo.Feedback2 B ON A.CourseID=B.CourseID AND A.UserID=B.UserID AND A.FormID=B.FormID AND A.CreateDate < B.CreateDate is much shorter, but has table scans, not eager spools when I paste it into the script you wrote. Same DB and table, but it runs WAY faster than the test query in your script. (20 ms versus 2100 ms) using GETDATE to time 'em like so..declare @T1 DATETIMESET @t1 = GETDATE()DELETE A FROM dbo.Feedback3 AINNER JOIN ( SELECT CourseID, UserID, FormID, MAX(CreateDate) as CreateDate FROM dbo.Feedback3 GROUP BY CourseID, UserID, FormID ) B ON B.CourseID = A.CourseID AND B.UserID = A.UserID AND B.FormID = A.FormID AND B.CreateDate > A.CreateDatePrint DATEDIFF(MS, @t1, getdate())SET @t1 = GETDATE()DELETE A FROM dbo.Feedback2 A INNER JOIN dbo.Feedback2 B ON A.CourseID=B.CourseID AND A.UserID=B.UserID AND A.FormID=B.FormID AND A.CreateDate < B.CreateDatePrint DATEDIFF(MS, @t1, getdate()) Moral: No matter how good it looks, you may be look'en at the wrong thing altogether. |
 |
|
|
|
|
|
|
|