Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Duplicate Records

Author  Topic 

doggi3
Yak Posting Veteran

62 Posts

Posted - 2004-10-07 : 03:45:30
I have a table with duplicate Email records.

I have to delete the duplicated records.
For example:
John john@hotmail.com
Mary john@hotmail.com
Nick john@hotmail.com

----------------------

John john@hotmail.com <-- I only require one left. Anyone will do.

I have the following code, but it too is wrong:
Delete From TableName
Where Count(Email_ADD)>1

I have to delete until there is one record of the email left.

Can anyone help?? Thanks!


- HELP -

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-10-07 : 03:59:36
easiest way to do this is to put corrected data into a temp table
delete the original one a nd put the data back into the original from temp table.

Go with the flow & have fun! Else fight the flow
Go to Top of Page

hgorijal
Constraint Violating Yak Guru

277 Posts

Posted - 2004-10-07 : 04:01:29
how did you determine john should be left; or do you not care even if mary is left.

one way (assuming you have a id column in the table)

delete A
from table A, (select min(id) id, email from table group by email) as B
where A.email = B.email and A.id > b.id)


or

delete A
from table A, (select min(name) name, email from table group by email) as B
where A.email = B.email and A.name > b.name)


Hemanth Gorijala
BI Architect / DBA...
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-10-07 : 04:33:16
that's true.
but in my experience deleteing duplicates in
this way can very easily result in unwanted deletes.
one wrong condition in where and bye bye data... and u might even not know why...
that's why i prefer temp table because you can compare data before deletion.

yeah i was also wondering how is he going to determine which records to delete...

Go with the flow & have fun! Else fight the flow
Go to Top of Page

doggi3
Yak Posting Veteran

62 Posts

Posted - 2004-10-07 : 04:35:29
I was reading this article: http://www.sqlteam.com/item.asp?ItemID=3331

This is really a great help on teaching me how to delete duplicate records. However, I am somewhere lost at the last part. Can someone kindly explain?

=====================================
-- Add a new column
-- In real life I'd put an index on it
Alter table dup_authors add NewPK int NULL
go

-- populate the new Primary Key
declare @intCounter int
set @intCounter = 0
update dup_authors
SET @intCounter = NewPK = @intCounter + 1

-- ID the records to delete and get one primary key value also
-- We'll delete all but this primary key
select au_lname, au_fname, city, state, RecCount=count(*), PktoKeep = max(NewPK)
into #dupes
from dup_authors
group by au_lname, au_fname, city, state
having count(*) > 1
order by count(*) desc, au_lname, au_fname

-- delete dupes except one Primary key for each dup record
delete dup_authors
from dup_authors a join #dupes d
on d.au_lname = a.au_lname
and d.au_fname = a.au_fname
and d.city = a.city
and d.state = a.state
where a.NewPK not in (select PKtoKeep from #dupes)

-- remove the NewPK column
ALTER TABLE dup_authors DROP COLUMN NewPK
go

drop table #dupes

=====================================

My original table does not have a PK. I only want the email address to be unique.

I am so lost. I do not know which fields to change. Help...

Thx hgorijal. I don't care if John or Mary is left. Anyone will do. I only want the email address to be unique.




- HELP -
Go to Top of Page

jen
Master Smack Fu Yak Hacker

4110 Posts

Posted - 2004-10-07 : 04:49:24
how about if you get the email address which are duplicates, then decide which user should own the email address? from there you can set the other email address to a new value. don't delete, instead update the eamil address field to null, that way you can use the username as key if it's unique.
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-10-07 : 04:56:55
then you simply do what hgorijal already suggested:
select min(name) as name, email
into #tempTable
from MyTable
group by email

truncate MyTable
insert into MyTable
select name, email
from #temp

drop #temp

be carefull with truncate. it doesn't log the transaction.

Go with the flow & have fun! Else fight the flow
Go to Top of Page

doggi3
Yak Posting Veteran

62 Posts

Posted - 2004-10-07 : 05:00:25
What does "min(name)" means?

- HELP -
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-10-07 : 05:07:29
it gets the first name orders alphabetcally.
name:
ana
ala
aba
aca

select min(Name)

returns aba



Go with the flow & have fun! Else fight the flow
Go to Top of Page

doggi3
Yak Posting Veteran

62 Posts

Posted - 2004-10-07 : 05:10:15
select min(PERS_LAST_NM) as PERS_LAST_NM, EMAIL_ADD_NM
into #tempTable
from TableName
group by EMAIL_ADD_NM

truncate TableName
insert into TableName
select PERS_LAST_NM, EMAIL_ADD_NM
from #temp

drop #temp

==================================

I run the code above. It says: Incorrect Syntax near 'TableName'

- HELP -
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-10-07 : 05:21:45
back up your data before u do this:

select min(PERS_LAST_NM) as PERS_LAST_NM, EMAIL_ADD_NM
into #tempTable
from TableName
group by EMAIL_ADD_NM

TRUNCATE TABLE TableName

insert into TableName
select PERS_LAST_NM, EMAIL_ADD_NM
from #tempTable

drop #tempTable


Go with the flow & have fun! Else fight the flow
Go to Top of Page

doggi3
Yak Posting Veteran

62 Posts

Posted - 2004-10-07 : 05:26:57
I copied and paste the query, but it says incorrect syntax near '#tempTable'.

Do I have to create a table call 'tempTable'?

- HELP -
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-10-07 : 05:29:59
change
drop #tempTable
to
drop table #tempTable

my bad..

Go with the flow & have fun! Else fight the flow
Go to Top of Page

doggi3
Yak Posting Veteran

62 Posts

Posted - 2004-10-07 : 05:45:01


I copy and paste the whole query. I went back to: Select (*) from TableName

My table is empty. What happened?

I tried to break up the query and do bit by bit.

After:
SELECT MIN(PERS_LAST_NM) AS PERS_LAST_NM, EMAIL_ADD_NM
INTO #tempTable
FROM TableName
GROUP BY EMAIL_ADD_NM

It says 5908 records were affected. I have a total of 7496 records.

After:
insert into TableName
select PERS_LAST_NM, EMAIL_ADD_NM
from #tempTable

It says: Column name or number of supplied values does not match table defination.

I dun understand.



- HELP -
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-10-07 : 06:07:10
how many columns do you have in TableName?

Go with the flow & have fun! Else fight the flow
Go to Top of Page

doggi3
Yak Posting Veteran

62 Posts

Posted - 2004-10-07 : 06:21:25
104

- HELP -
Go to Top of Page

doggi3
Yak Posting Veteran

62 Posts

Posted - 2004-10-07 : 06:27:22
I did:
SELECT <Column Names>, MIN(PERS_LAST_NM) AS PERS_LAST_NM, EMAIL_ADD_NM, <Column Names>
INTO #tempTable
FROM TableName
GROUP BY <Column Names>, EMAIL_ADD_NM, <Column Names>

TRUNCATE TABLE TableName

insert into TableName
select <Column Names>, PERS_LAST_NM, EMAIL_ADD_NM, <Column Names>
from #tempTable

Here, it gives me the error: Column name or number of supplied values does not match table defination.




- HELP -
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2004-10-07 : 07:33:10
104??? ufff...
ok then u need to do this
select t1.*
into #tempTable
from TableName t1
inner join (select SELECT MIN(PERS_LAST_NM) AS PERS_LAST_NM, EMAIL_ADD_NM from TableName group by EMAIL_ADD_NM) t2 on t1.PERS_LAST_NM = t2.PERS_LAST_NM and t1.EMAIL_ADD_NM = t2.EMAIL_ADD_NM

... do other stuff

Go with the flow & have fun! Else fight the flow
Go to Top of Page

hgorijal
Constraint Violating Yak Guru

277 Posts

Posted - 2004-10-07 : 07:45:49
quote:
Originally posted by doggi3

I did:
SELECT <Column Names>, MIN(PERS_LAST_NM) AS PERS_LAST_NM, EMAIL_ADD_NM, <Column Names>
INTO #tempTable
FROM TableName
GROUP BY <Column Names>, EMAIL_ADD_NM, <Column Names>

TRUNCATE TABLE TableName

insert into TableName
select <Column Names>, PERS_LAST_NM, EMAIL_ADD_NM, <Column Names>
from #tempTable

Here, it gives me the error: Column name or number of supplied values does not match table defination.




- HELP -





Everything can be the same.. Just change the Insert to

insert into TableName <Column Names>, PERS_LAST_NM, EMAIL_ADD_NM, <Column Names>
select <Column Names>, PERS_LAST_NM, EMAIL_ADD_NM, <Column Names>

Make sure the column names are in the same order in botht he cases.

If you are still getting the same error, see if the table has an IDENTITY column.

Hemanth Gorijala
BI Architect / DBA...
Go to Top of Page

doggi3
Yak Posting Veteran

62 Posts

Posted - 2004-10-07 : 22:07:07
Thanks all so much~ I got it~ ^o^

- HELP -
Go to Top of Page
   

- Advertisement -