Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
label
Posting Yak Master
197 Posts |
Posted - 2003-01-23 : 10:22:41
|
| I'm trying to delete all Duplicate rows in a table that has a Unique ID autocount field. I've read the articles here on the sites but they dealt with deleting duplicates from Tables that did not have Unique ID's. Currently, I've written a query that runs very slowly:CREATE PROCEDURE dbo.ap_clean_duplicates ASBegin Create table #duprecords(mycount int, mydup varchar(1000))insert into #duprecordsselect count(modelname+configname+convert(varchar(20), requestdatetime)+convert(varchar(5), result)+[format]+version+username+interfacename), modelname+configname+convert(varchar(20), requestdatetime)+convert(varchar(5), result)+[format]+version+username+interfacename as myuidfrom smc_new_products.dbo.etechmodelrequestsgroup by modelname+configname+convert(varchar(20), requestdatetime)+convert(varchar(5), result)+[format]+version+username+interfacenameCreate table #dupes(req_id int)Declare Cursid Cursor for select distinct(mydup) from #duprecords where mycount>1Declare @cid varchar(1000)Open Cursid Fetch Next from Cursid into @cidwhile (@@fetch_status<>-1) Begin insert into #dupes select top 1 requestID from smc_new_products.dbo.etechmodelrequests where modelname+configname+convert(varchar(20), requestdatetime)+convert(varchar(5), result)+[format]+version+username+interfacename=@cidFetch Next from Cursid into @cidEnd delete from smc_new_products.dbo.etechmodelrequestswhere requestID not in (select req_id from #dupes)drop table #duprecordsdrop table #dupesdeallocate CursidEndGOThe table I'm trying to delete duplicates from has an AutoCount Primary Key Unique Id called "RequestId" and then about 12-15 other columns. I appreciate any help anyone can offer and thank you in advance. |
|
|
Robwhittaker
Yak Posting Veteran
85 Posts |
Posted - 2003-01-23 : 10:41:44
|
| This might not be the best way to do this, but you need a select that gives you a list of requestid to be deleted.select a.requestid from table a, table bwhere a.col = b.coland a.col2 = b.col2 ... and a.requestid <> (select min(b.requestid) from table b)then just delete this lot delete from table where requestid = (select a.requestid from table a, table b where a.col = b.col and a.col2 = b.col2 . . . and a.requestid <> (select min(b.requestid) from table b) ) You do need to worry about null fields, because if null=null it doesn't match so turn these in to "" or something.Bewarned, just re-reading my SQL it's not totally correct when selecting the min requestid, needs to be for the same duplicates but that's why you run the inital select to make sure it gives ypu the records you want to delete, and don't for get to us a trans so you can always roll backEdited by - robwhittaker on 01/23/2003 10:49:29 |
 |
|
|
jsmith8858
Dr. Cross Join
7423 Posts |
Posted - 2003-01-23 : 19:02:36
|
| Try using GROUP BY to find duplicates. Much faster, easier to write, and handles Nulls just find.DELETE FROM TableWHERE ID Not In(SELECT Min(ID) as ID FROM Table GROUP BY Field1, Field2, ....)Be sure to group by all fields. The above will keep the min of each unique ID for all duplicates. Be sure to include all fields that need to unique in the GROUP BY clause (and exclude the ID, of course).Run just the SELECT part to see what you are keeping; everything else will be deleted.- Jeff |
 |
|
|
label
Posting Yak Master
197 Posts |
Posted - 2003-01-24 : 08:53:10
|
quote: Try using GROUP BY to find duplicates. Much faster, easier to write, and handles Nulls just find.DELETE FROM TableWHERE ID Not In(SELECT Min(ID) as ID FROM Table GROUP BY Field1, Field2, ....)Be sure to group by all fields. The above will keep the min of each unique ID for all duplicates. Be sure to include all fields that need to unique in the GROUP BY clause (and exclude the ID, of course).Run just the SELECT part to see what you are keeping; everything else will be deleted.- Jeff
I tried that and it pulled all the records......here's what I tried: SELECT Min(RequestID) as [ID] FROM EtechmodelrequestsGROUP BY modelname, configname, intimestamp, requestdatetime, result, serviceusername, [format], version, username, interfacenameThe table I'm woring with is this: CREATE TABLE [dbo].[EtechModelRequests] ( [RequestID] [int] IDENTITY (1, 1) NOT NULL , [modelname] [varchar] (256) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [configname] [varchar] (256) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [intimestamp] [varchar] (256) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [RequestDateTime] [datetime] NULL , [result] [int] NOT NULL , [serviceusername] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL , [format] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [version] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [username] [varchar] (50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL , [interfacename] [varchar] (256) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ) ON [PRIMARY]GO |
 |
|
|
label
Posting Yak Master
197 Posts |
Posted - 2003-01-24 : 09:04:28
|
| Ok, I was wrong, I'm not getting the exact same number but it's certainly not picking up all the duplicates. I've run 3 different attempts at getting just a raw count and am coming up with 3 different numbers. Below are the different queries I've tried: Thanks again for any insight anyone can provide. select count(*) from etechmodelrequestsResult: 457663SELECT Min(RequestID) as [ID] FROM EtechmodelrequestsGROUP BY modelname, configname, intimestamp, requestdatetime, result, serviceusername, [format], version, username, interfacenameResult: 451447select count(distinct (modelname+configname+intimestamp+convert(varchar(20), requestdatetime)+convert(varchar(20), result)+serviceusername+[format]+version+username+interfacename)) from etechmodelrequestsResult: 449318select count(modelname+configname+convert(varchar(20), requestdatetime)+convert(varchar(5), result)+[format]+version+username+interfacename), modelname+configname+convert(varchar(20), requestdatetime)+convert(varchar(5), result)+[format]+version+username+interfacename as myuidfrom smc_new_products.dbo.etechmodelrequestsgroup by modelname+configname+convert(varchar(20), requestdatetime)+convert(varchar(5), result)+[format]+version+username+interfacenameResult: 397886 |
 |
|
|
label
Posting Yak Master
197 Posts |
Posted - 2003-01-24 : 14:53:07
|
| Surely someone's got an idea of how to do this or why I'm getting 3 different results here? |
 |
|
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2003-01-24 : 15:10:07
|
| You're getting 3 different results because you're running 3 different queries. I don't know what's so surprising about that. Remember that GROUP BY will create groups for each null column, but concatenating a null in a string makes the entire expression null, no matter how many columns may have been null. That's why grouping all of the columns individually has the highest row count. It also explains why the two different count expression have different results, a null expression is not counted.Why doesn't Jeff's solution work for you? |
 |
|
|
label
Posting Yak Master
197 Posts |
Posted - 2003-01-24 : 15:26:42
|
quote: Remember that GROUP BY will create groups for each null column, but concatenating a null in a string makes the entire expression null, no matter how many columns may have been null.
So your saying the last query I used is inaccurate because any null in any column caused the entire expression to return null? Fair enough....so then the problem with both of the last two queries was that I was concatonating nulls into the string which threw off my totals. (If this is an inaccurate interpretation of what you said please correct me as I'm not terribly experienced in this area) quote: Why doesn't Jeff's solution work for you?
Well, it may work indeed. I was assuming that I had 50,000 duplicates in the database (based on the last query in the post above) and that only having 7,000 wasn't reflecting all of the actual duplicates. However, after reading your explanation the exact opposite seems to be true and the 7,000 is right and the 50,000 wrong. Thanks for your insight and help.Edited by - label on 01/24/2003 15:27:10Edited by - label on 01/24/2003 15:27:32 |
 |
|
|
|
|
|
|
|