Author |
Topic |
AskSQLTeam
Ask SQLTeam Question
0 Posts |
Posted - 2001-03-25 : 13:44:22
|
Seema writes "There is a Table with no key constraints. It has duplicate records. The duplicate records have to be deleted (eg there are 3 similar records, only 2 have to be deleted). I need a single SQL query for this." This is a pretty common question so I thought I'd provide some options. Article Link. |
|
aiken
Aged Yak Warrior
525 Posts |
Posted - 2002-02-08 : 13:14:48
|
And then there's the dynamic SQL approach which uses no temp tables or identity keys. Probably not great for thousands or millions of duplicates, but very nice for trimming those pesky hundred duplicates from a million row table.Here's the query as it would be for the demo DB in the article:select 'delete from dup_authors where au_id= (select top 1 au_id from dup_authors where au_lname=' + au_lname + ' and au_fname=' + au_fname + ' and city=' + city + ' and state=' + state + ')' from dup_authors group by by au_lname, au_fname, city, state having count(*) > 1...each time it runs it will delete one of the duplicates; you have to run it a couple of times if one row is duplicated multiple times.Cheers-b |
|
|
MakeYourDaddyProud
184 Posts |
Posted - 2002-05-30 : 11:12:15
|
-- delete dups i have done this b4-- 1) assuming table has a unique identifier preferably keyed.-- to delete all duplicate names keeping the one with highest IDcreate table #Namez (id int primary key, name char(10) not null)goinsert into #Namez values (1, "Mom")insert into #Namez values (2, "Mom")insert into #Namez values (3, "Sister")insert into #Namez values (4, "Sister")insert into #Namez values (5, "Sister")insert into #Namez values (6, "Sister")insert into #Namez values (7, "Sister")insert into #Namez values (8, "Sister")insert into #Namez values (9, "Dad")goselect * from #NamezgoDELETE #NamezFROM #NamezJOIN (select [name], max([id]) AS MaxID from #Namez group by [Name]) AS G ON G.[Name] = #Namez.[Name]WHERE #Namez.[ID] < G.[MaxID]AND #Namez.[Name] = G.[Name]select * from #NamezgoThe correlated subquery agreggate establishes the link id in which to reference grouped deletes on name with a lesser idHTHDaniel Small CEOwww.danielsmall.com |
|
|
karbon
Starting Member
9 Posts |
Posted - 2003-12-09 : 10:34:34
|
Why do you use long way????Easy way------------------Example table------------------Create table EG(ID int,Value1 int,Value2 int)declare @ID intdeclare @Count integerdeclare CursorDuplicates Cursor forSELECT ID FROM EG open CursorDuplicates fetch next from CursorDuplicates into @ID while @@fetch_status=0begin select @Count = count(ID) from EG where ID = @ID if @Count > 1 begin DELETE EG WHERE CURRENT OF CursorDuplicates endfetch next from CursorDuplicates into @IDendclose CursorDuplicates deallocate CursorDuplicates |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-12-09 : 23:57:16
|
You are kidding us, right?Or do you prefer to make 3 left turns instead of one right turn? |
|
|
byrmol
Shed Building SQL Farmer
1591 Posts |
Posted - 2003-12-10 : 00:07:30
|
[code]WHERE CURRENT OF[/code]I vote this the "Worst Operator ever implemented"DavidM"SQL-3 is an abomination.." |
|
|
Deep Blue
Starting Member
1 Post |
Posted - 2004-06-28 : 13:09:42
|
Hi! The query helped me a bit. I've just rewritten it like this and works better in my case (my table having dups already has an autonumeric Primary Key)delete dim_zonas_ventawhere zona_venta_key not in (select max(zona_venta_key) as zona_venta_keyfrom dim_zonas_ventagroup by cliente_key, mercado_grupo_key, cod_zona_venta, zona_venta) |
|
|
Deep Blue
Starting Member
1 Post |
Posted - 2004-06-28 : 13:13:39
|
Hi! The query helped me a bit. I've just rewritten it like this and works better in my case (my table having dups already has an autonumeric Primary Key)delete dim_zonas_ventawhere zona_venta_key not in (select max(zona_venta_key) as zona_venta_keyfrom dim_zonas_ventagroup by cliente_key, mercado_grupo_key, cod_zona_venta, zona_venta) |
|
|
darrendorlando
Starting Member
1 Post |
Posted - 2005-05-09 : 18:18:43
|
http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!DELETE FROM [Table with Duplicates]WHERE [Primary Key Field] IN ( SELECT a.[Primary Key Field] FROM [Table with Duplicates] a, [Table with Duplicates] b WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey ) ddddorlando@gmail.com |
|
|
rosieq13
Starting Member
1 Post |
Posted - 2005-06-14 : 16:37:29
|
Your instructions on Deleting Duplicate Records are just great. I followed them and was able to accomplish the deletion. Usually I don't rate articles, but I cannot help saying Thanks to this one. |
|
|
speedadkt
Starting Member
4 Posts |
Posted - 2005-11-18 : 16:18:21
|
This is by far the most straight forward, efficient method of the bunch. Works for any number of dupes too which is great.quote: Originally posted by darrendorlando http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!DELETE FROM [Table with Duplicates]WHERE [Primary Key Field] IN ( SELECT a.[Primary Key Field] FROM [Table with Duplicates] a, [Table with Duplicates] b WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey ) ddddorlando@gmail.com
|
|
|
Merkwurdigliebe
Starting Member
1 Post |
Posted - 2006-07-11 : 17:47:17
|
And this is the only example I have found in several hours of looking that works with non-trivial tables and data. How often do you need to clean up a two-column table with an identity column and no PK, anyway?Thanks for the clear, logical, and concise code! I was able to use this to clean up a mess in a table with a three-way compound key comprised of GUID's (uniquidentifiers) that needed a dozen casts in other examples to even think about running. (and still bombed)quote: Originally posted by darrendorlando http://com-hpdevelopersolutions-2s.wwwa.com/members/remoteLoginCheck.cfm?id=MTMzMDAsYmxhY2tmaW4sL21lbWJlcnMvZXF1aXBtZW50L2luZGV4LmNmbQ_This works for me and is way more efficient! Give'r a try, but remember to back up first and test the results to make sure you what you are expecting happens!DELETE FROM [Table with Duplicates]WHERE [Primary Key Field] IN ( SELECT a.[Primary Key Field] FROM [Table with Duplicates] a, [Table with Duplicates] b WHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. Userkey AND a.[Value to check]= b.[Value to Check] -- i.e. Lastname AND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. Firstname AND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey ) ddddorlando@gmail.com
|
|
|
Kristen
Test
22859 Posts |
|
Drummos
Starting Member
4 Posts |
Posted - 2007-02-11 : 14:39:14
|
The article on deleting duplicates worked well for me. But, now every time I try to create the PK a new duplicate record is created. I'm not sure what's going on maybe it's a technical issue in SQL 2005?Ally |
|
|
graz
Chief SQLTeam Crack Dealer
4149 Posts |
Posted - 2007-02-11 : 23:48:34
|
How are you creating the primary key?===============================================Creating tomorrow's legacy systems today.One crisis at a time. |
|
|
SwePeso
Patron Saint of Lost Yaks
30421 Posts |
Posted - 2007-04-16 : 04:49:17
|
How do all these methods compare to this I discovered today?DELETE t1FROM ( SELECT ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2 DESC) AS RecID FROM Table1 ) AS t1WHERE RecID > 1 Peter LarssonHelsingborg, Sweden |
|
|
peter3286
Starting Member
1 Post |
Posted - 2007-04-16 : 07:20:17
|
get the list of duplicate deleteSELECT col1, col2, count(*)FROM t1GROUP BY col1, col2HAVING count(*) > 1 |
|
|
jcarnes
Starting Member
5 Posts |
Posted - 2007-05-27 : 14:17:23
|
I'm trying to use this command from above to delete dupes:____________________DELETE FROM [Table with Duplicates]WHERE [Primary Key Field] IN(SELECT a.[Primary Key Field]FROM [Table with Duplicates] a,[Table with Duplicates] bWHERE a.[Primary Key Field]!= b.[Primary Key Field] -- i.e. UserkeyAND a.[Value to check]= b.[Value to Check] -- i.e. LastnameAND a.[Second Value to Check] = b.[Second Value to Check] -- i.e. FirstnameAND a.[Primary Key Field] < b.[Primary Key Field] -- i.e. Userkey) ____________________Unfortunately, I'm so new to SQL that I don't even know what to put in the various fields. My relevant fields are: id, title, introtext, and sectionid.I'd like this query to check for dupes in "title" and "introtext" and if it finds any AND the "sectionid" is identical, then I'd like to delete the oldest entry and leave the newest. If the "sectionid" won't work with this, it's not a big deal. If the date check won't work, it isn't a big deal either. Trashing any dupes is better than having them and they're usually within a day of each other anyway.If anyone can give me some guidance on this I'd really appreciate it. Thanks! |
|
|
rakesh koyadi
Starting Member
3 Posts |
Posted - 2007-05-30 : 15:51:03
|
Hi, First set the row count to 1 and then delete the duplicate record from the table and then again set the rowcount to 0. see the below example :create table tab12 (a int)select * from tab12insert into tab12 values (1)insert into tab12 values (1)--set the rowcount depending on the number of duplicates you want to deleteset rowcount 1delete from tab12 where a=1set rowcount 0select * from tab12Rakeshrakesh |
|
|
jcarnes
Starting Member
5 Posts |
Posted - 2007-05-30 : 16:17:29
|
I'm really red-faced here. Turns out I have MySQL and therefore this has wasted your time. I'm sorry. I am so new with SQL that I didn't know there were even different flavors. Thanks for the response! |
|
|
aruntom
Starting Member
1 Post |
Posted - 2007-07-11 : 03:22:18
|
If we use DISTINCTIt will not work if our table has text, ntext, or image data typewe wil get an error as:The text, ntext, or image data type cannot be selected as DISTINCT.so add another identity column..then search for replicating data and delete with that new ID..It will work definitely..regardArun Thomas |
|
|
Next Page
|