Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 Site Related Forums
 Article Discussion
 Article: Deleting Duplicate Records

Author  Topic 

AskSQLTeam
Ask SQLTeam Question

0 Posts

Posted - 2001-03-25 : 13:44:22
Seema writes "There is a Table with no key constraints. It has duplicate records. The duplicate records have to be deleted (eg there are 3 similar records, only 2 have to be deleted). I need a single SQL query for this." This is a pretty common question so I thought I'd provide some options.

Article Link.

aiken
Aged Yak Warrior

525 Posts

Posted - 2002-02-08 : 13:14:48
And then there's the dynamic SQL approach which uses no temp tables or identity keys. Probably not great for thousands or millions of duplicates, but very nice for trimming those pesky hundred duplicates from a million row table.

Here's the query as it would be for the demo DB in the article:

select 'delete from dup_authors where au_id=
(select top 1 au_id from dup_authors
where au_lname=' + au_lname + ' and au_fname='
+ au_fname + ' and city=' + city + ' and state='
+ state + ')'
from dup_authors
group by by au_lname, au_fname, city, state
having count(*) > 1

...each time it runs it will delete one of the duplicates; you have to run it a couple of times if one row is duplicated multiple times.

Cheers
-b



Go to Top of Page

MakeYourDaddyProud

184 Posts

Posted - 2002-05-30 : 11:12:15
-- delete dups i have done this b4

-- 1) assuming table has a unique identifier preferably keyed.
-- to delete all duplicate names keeping the one with highest ID

create table #Namez
(id int primary key,
name char(10) not null)
go

insert into #Namez values (1, "Mom")
insert into #Namez values (2, "Mom")
insert into #Namez values (3, "Sister")
insert into #Namez values (4, "Sister")
insert into #Namez values (5, "Sister")
insert into #Namez values (6, "Sister")
insert into #Namez values (7, "Sister")
insert into #Namez values (8, "Sister")
insert into #Namez values (9, "Dad")
go

select * from #Namez
go

DELETE #Namez
FROM #Namez
JOIN
(select [name], max([id]) AS MaxID
from #Namez
group by [Name]) AS G
ON G.[Name] = #Namez.[Name]
WHERE #Namez.[ID] < G.[MaxID]
AND #Namez.[Name] = G.[Name]

select * from #Namez
go

The correlated subquery agreggate establishes the link id in which to reference grouped deletes on name with a lesser id

HTH

Daniel Small CEO
www.danielsmall.com


Go to Top of Page

karbon
Starting Member

9 Posts

Posted - 2003-12-09 : 10:34:34

Why do you use long way????

Easy way
------------------
Example table
------------------
Create table EG
(
ID int,
Value1 int,
Value2 int
)

declare @ID int
declare @Count integer
declare CursorDuplicates Cursor for
SELECT ID FROM EG
open CursorDuplicates
fetch next from CursorDuplicates into @ID
while @@fetch_status=0
begin
select @Count = count(ID) from EG where ID = @ID
if @Count > 1
begin
DELETE EG WHERE CURRENT OF CursorDuplicates
end
fetch next from CursorDuplicates into @ID
end
close CursorDuplicates
deallocate CursorDuplicates
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2003-12-09 : 23:57:16
You are kidding us, right?

Or do you prefer to make 3 left turns instead of one right turn?
Go to Top of Page

byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2003-12-10 : 00:07:30
[code]
WHERE CURRENT OF
[/code]

I vote this the "Worst Operator ever implemented"

DavidM

"SQL-3 is an abomination.."
Go to Top of Page

Deep Blue
Starting Member

1 Post

Posted - 2004-06-28 : 13:09:42
Hi!

The query helped me a bit. I've just rewritten it like this and works better in my case (my table having dups already has an autonumeric Primary Key)

delete dim_zonas_venta
where zona_venta_key not in
(
select max(zona_venta_key) as zona_venta_key
from dim_zonas_venta
group by cliente_key, mercado_grupo_key, cod_zona_venta, zona_venta
)
Go to Top of Page

Deep Blue
Starting Member

1 Post

Posted - 2004-06-28 : 13:13:39
Hi!

The query helped me a bit. I've just rewritten it like this and works better in my case (my table having dups already has an autonumeric Primary Key)

delete dim_zonas_venta
where zona_venta_key not in
(
select max(zona_venta_key) as zona_venta_key
from dim_zonas_venta
group by cliente_key, mercado_grupo_key, cod_zona_venta, zona_venta
)
Go to Top of Page

darrendorlando
Starting Member

1 Post

Posted - 2005-05-09 : 18:18:43
http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_


This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!


DELETE FROM [Table with Duplicates]
WHERE [Primary Key Field] IN
(
SELECT a.[Primary Key Field]
FROM [Table with Duplicates] a,
[Table with Duplicates] b

WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey
AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
)


dd
ddorlando@gmail.com
Go to Top of Page

rosieq13
Starting Member

1 Post

Posted - 2005-06-14 : 16:37:29
Your instructions on Deleting Duplicate Records are just great. I followed them and was able to accomplish the deletion. Usually I don't rate articles, but I cannot help saying Thanks to this one.
Go to Top of Page

speedadkt
Starting Member

4 Posts

Posted - 2005-11-18 : 16:18:21
This is by far the most straight forward, efficient method of the bunch. Works for any number of dupes too which is great.

quote:
Originally posted by darrendorlando

http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_


This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!


DELETE FROM [Table with Duplicates]
WHERE [Primary Key Field] IN
(
SELECT a.[Primary Key Field]
FROM [Table with Duplicates] a,
[Table with Duplicates] b

WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey
AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
)


dd
ddorlando@gmail.com


Go to Top of Page

Merkwurdigliebe
Starting Member

1 Post

Posted - 2006-07-11 : 17:47:17
And this is the only example I have found in several hours of looking that works with non-trivial tables and data. How often do you need to clean up a two-column table with an identity column and no PK, anyway?

Thanks for the clear, logical, and concise code!

I was able to use this to clean up a mess in a table with a three-way compound key comprised of GUID's (uniquidentifiers) that needed a dozen casts in other examples to even think about running. (and still bombed)


quote:
Originally posted by darrendorlando

http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_


This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!


DELETE FROM [Table with Duplicates]
WHERE [Primary Key Field] IN
(
SELECT a.[Primary Key Field]
FROM [Table with Duplicates] a,
[Table with Duplicates] b

WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey
AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
)


dd
ddorlando@gmail.com


Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2007-02-05 : 04:05:27
Related information:

http://support.microsoft.com/kb/139444 (as posted in Positioned Updates/Deleted without PK)

Kristen
Go to Top of Page

Drummos
Starting Member

4 Posts

Posted - 2007-02-11 : 14:39:14
The article on deleting duplicates worked well for me. But, now every time I try to create the PK a new duplicate record is created. I'm not sure what's going on maybe it's a technical issue in SQL 2005?

Ally
Go to Top of Page

graz
Chief SQLTeam Crack Dealer

4149 Posts

Posted - 2007-02-11 : 23:48:34
How are you creating the primary key?

===============================================
Creating tomorrow's legacy systems today.
One crisis at a time.
Go to Top of Page

SwePeso
Patron Saint of Lost Yaks

30421 Posts

Posted - 2007-04-16 : 04:49:17
How do all these methods compare to this I discovered today?
DELETE		t1
FROM (
SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2 DESC) AS RecID
FROM Table1
) AS t1
WHERE RecID > 1


Peter Larsson
Helsingborg, Sweden
Go to Top of Page

peter3286
Starting Member

1 Post

Posted - 2007-04-16 : 07:20:17
get the list of duplicate delete

SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
Go to Top of Page

jcarnes
Starting Member

5 Posts

Posted - 2007-05-27 : 14:17:23
I'm trying to use this command from above to delete dupes:
____________________

DELETE FROM [Table with Duplicates]
WHERE [Primary Key Field] IN
(
SELECT a.[Primary Key Field]
FROM [Table with Duplicates] a,
[Table with Duplicates] b

WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey
AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname
AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname
AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey
)
____________________

Unfortunately, I'm so new to SQL that I don't even know what to put in the various fields. My relevant fields are: id, title, introtext, and sectionid.

I'd like this query to check for dupes in "title" and "introtext" and if it finds any AND the "sectionid" is identical, then I'd like to delete the oldest entry and leave the newest. If the "sectionid" won't work with this, it's not a big deal. If the date check won't work, it isn't a big deal either. Trashing any dupes is better than having them and they're usually within a day of each other anyway.

If anyone can give me some guidance on this I'd really appreciate it. Thanks!
Go to Top of Page

rakesh koyadi
Starting Member

3 Posts

Posted - 2007-05-30 : 15:51:03
Hi,
First set the row count to 1 and then delete the duplicate record from the table and then again set the rowcount to 0. see the below example :

create table tab12 (a int)

select * from tab12

insert into tab12 values (1)
insert into tab12 values (1)

--set the rowcount depending on the number of duplicates you want to delete

set rowcount 1
delete from tab12 where a=1
set rowcount 0

select * from tab12

Rakesh


rakesh
Go to Top of Page

jcarnes
Starting Member

5 Posts

Posted - 2007-05-30 : 16:17:29
I'm really red-faced here. Turns out I have MySQL and therefore this has wasted your time. I'm sorry. I am so new with SQL that I didn't know there were even different flavors. Thanks for the response!
Go to Top of Page

aruntom
Starting Member

1 Post

Posted - 2007-07-11 : 03:22:18
If we use DISTINCT
It will not work if our table has text, ntext, or image data type
we wil get an error as:
The text, ntext, or image data type cannot be selected as DISTINCT.

so add another identity column..
then search for replicating data and delete with that new ID..
It will work definitely..

regard
Arun Thomas

Go to Top of Page
    Next Page

- Advertisement -