Choosing how many duplicates to delete

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Choosing how many duplicates to delete

Author

Topic

Bex
Aged Yak Warrior

580 Posts

Posted - 2005-03-03 : 10:09:21

Afternoon all

My scenario:
We perform a run of client files into a DB every day. If something goes wrong, the run has to be done again. I have a sproc to delete the records from the previous run, but alas, people forget to ask me to do it. I have now implemented a mechanism to prevent re-runs on the same day, but I still have duplicates to remove.

Solution:
After much grit, sweat and toil, I have finally written a solution that deletes the latest duplicate. However, there are at times more than two duplicates. So rather than using MAX(ID field), I would like to use TOP <number of duplicates>. Below is the query and the part to change is highlighted in red.

Query:

CREATE PROCEDURE [dbo].[usp_DeleteDuplicateRecords]
(
@cycleInstanceDate SMALLDATETIME
)

AS

DELETE LetterRepository
FROM LetterRepository L INNER JOIN
(SELECT MAX(RepositoryID) AS RepositoryID, T1.A000_Document, T1.MailDate, T1.CycleInstanceID, T1.LetterCodeID
FROM LetterRepository T1
INNER JOIN
(SELECT A000_Document, MailDate, R.CycleInstanceID, LetterCodeID
FROM LetterRepository R
INNER JOIN CycleInstance C on C.CycleInstanceID = R.CycleInstanceID
GROUP BY A000_Document, MailDate, R.CycleInstanceID, LetterCodeID, C.CycleInstanceDate
HAVING COUNT(A000_Document) > 1
AND C.CycleInstanceDate = @cycleInstanceDate
) AS T2
ON T1.A000_Document = T2.A000_Document
AND T1.MailDate = T2.MailDate
AND T1.CycleInstanceID = T2.CycleInstanceID
AND T1.LetterCodeID = T2.LetterCodeID
GROUP BY T1.A000_Document, T1.MailDate, T1.CycleInstanceID, T1.LetterCodeID) as T3
ON L.RepositoryID = T3.RepositoryID
AND L.A000_Document = T3.A000_Document
AND L.MailDate = T3.MailDate
AND L.CycleInstanceID = T3.CycleInstanceID
AND L.LetterCodeID = T3.LetterCodeID
GO

Change the text in red to:
TOP ? RepositoryID
(and include an ORDER BY clause)

Problem:
Because I do not know how many times the run has been done, I would like to dynamically count the number of duplicates, and delete accordingly:

To count the duplicates:
SELECT COUNT(*)FROM LetterRepository R
INNER JOIN CycleInstance C on C.CycleInstanceID = R.CycleInstanceID
GROUP BY A000_Document, MailDate, R.CycleInstanceID, LetterCodeID, CycleInstanceDate
HAVING COUNT(A000_Document) > 1
AND C.CycleInstanceDate = @cycleInstanceDate

Is there a way to assign this to a parameter, and then use this parameter in the TOP clause? Like so:

DECLARE @counter INT
SET @counter = SELECT......ETC..see above

DELETE LetterRepository
FROM LetterRepository L INNER JOIN
(SELECT SELECT TOP @counter RepositoryID ......ETC

I have tried the above and it doesn't work, I get the following error message:

Server: Msg 170, Level 15, State 1, Line 1
Line 1: Incorrect syntax near '@count'.

Any suggestions?????

Hearty head pats

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2005-03-03 : 10:39:13

How about getting the MIN (ID field) and use an outer join to delete all dupes except the minimum one?

Be One with the Optimizer
TG

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-03-03 : 10:53:48

set rowcount @counter
SELECT RepositoryID ......ETC
into #temp
from...
set rowcount 0

DELETE LetterRepository
FROM LetterRepository L INNER JOIN
#temp on etc...

Go with the flow & have fun! Else fight the flow

Bex
Aged Yak Warrior

580 Posts

Posted - 2005-03-03 : 10:59:13

Fantastic! Thats just what I need!!

Thank you both!

Hearty head pats

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources