Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Duplicates again

Author  Topic 

SamC
White Water Yakist

3467 Posts

Posted - 2005-03-09 : 12:11:40
I've inherited a DB and a table with duplicates. Gotta eliminate the dups.

Problem is, there's nothing unique in the row, so the usual technique of deleting the one with the smallest date/pkid or whatever isn't there.

How do I delete the dups when

select a, b, c
from mytable
group by a, b, c
having count(*) > 1
There are no other columns...

Sam

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2005-03-09 : 13:05:31
How about:

Select a,b,c INTO myNewTable from myTable group by a,b,c

drop myTable
exec sp_rename 'myNewTable', 'myTable'

Be One with the Optimizer
TG
Go to Top of Page

PW
Yak Posting Veteran

95 Posts

Posted - 2005-03-09 : 13:13:09
Curious, why a GROUP BY and not DISTINCT ?

Query plan is the same, but DISTINCT saves you typing column names, since you can't do GROUP BY *

SELECT DISTINCT * INTO myNewTable FROM myTable
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2005-03-09 : 13:25:49
Well I'm not a big fan for select *

But I think TG meant to say SELECT DISTINCT

That would do nicely

And then how about a unique constraint on all the columns?



Brett

8-)
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-03-09 : 13:33:30
Thanks guys.

Another thing. Isn't there a Select into something or other that'll create the destination table? I've used it once or twice before. Seems like it would be perfect for this problem.
Go to Top of Page

spirit1
Cybernetic Yak Master

11752 Posts

Posted - 2005-03-09 : 13:35:54
you mean:

select ....
into #temp
from ...

Go with the flow & have fun! Else fight the flow
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2005-03-09 : 13:37:23
Sam...it's right here..TG just left off the DISTINCT

quote:
Originally posted by TG

How about:

Select DISTINCTa,b,c INTO myNewTable from myTable group by a,b,c

drop myTable
exec sp_rename 'myNewTable', 'myTable'

Be One with the Optimizer
TG



Brett

8-)
Go to Top of Page

PW
Yak Posting Veteran

95 Posts

Posted - 2005-03-09 : 13:42:33
Yes, but why a GROUP BY ?
If there are no aggregates, it's redundant. Just use DISTINCT ...
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2005-03-09 : 13:43:22
I deliberately chose to use "group by" rather than "distinct". Either works, you don't need both. (Personal preference)

Be One with the Optimizer
TG
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-03-09 : 13:48:08
Brett, thanks. I didn't read TG's code, just got the idea, ran to the end zone, realized I didn't have the ball...

PW - OK! OK! I won't group by !
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2005-03-09 : 13:52:05
Yeah, and I'm Eric Lindro's skating with my head down...

I didn't see the GROUP BY in TG's code

And TG, DISTINCT and GROUP BY are not the same.

GROUP BY will have more overhead

Gotta look that up though...could be wrong (What? not again)



Brett

8-)
Go to Top of Page

PW
Yak Posting Veteran

95 Posts

Posted - 2005-03-09 : 13:56:41
>>GROUP BY will have more overhead

To thrash the already dead horse some more, in my testing, the query plan was exactly the same when there were no aggregates. So the issue was not performance, just verbosity of the code and requirement to replicate the entire column list twice, which was trivial in the case of (a,b,c).
Go to Top of Page

TG
Master Smack Fu Yak Hacker

6065 Posts

Posted - 2005-03-09 : 13:57:22
Actually PW, like you said the exec plan is the same and I have found that using "group by" helps me minimize coding errors on more complex statements. If I explicitly state the grouping then I'm more likely to find problems with join logic (or brain logic) because I'll get sql syntax errors rather than just less rows. The habbit just carried over for me this statement, I didn't even think about it.

Be One with the Optimizer
TG
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-03-09 : 14:08:50
Surprised me:
quote:
Server: Msg 8163, Level 16, State 3, Line 1
The text, ntext, or image data type cannot be selected as DISTINCT.
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-03-09 : 14:10:02
Changing to GROUP BY:
quote:
Server: Msg 306, Level 16, State 2, Line 1
The text, ntext, and image data types cannot be compared or sorted, except when using IS NULL or LIKE operator.
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2005-03-09 : 14:12:46
Well which type is it?

And what's it's MAX(DATALENGTH(

Can you post the DDL?



Brett

8-)
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-03-09 : 14:13:11
The column is type text. I wonder why they didn't use varchar?
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-03-09 : 14:13:54
[code]CREATE TABLE [sur_response_answer] (
[response_id] [int] NOT NULL ,
[item_id] [int] NOT NULL ,
[subitem_id] [int] NULL ,
[answer_id] [int] NULL ,
[answer_text] [text] COLLATE SQL_Latin1_General_CP1_CI_AS NULL ,
[other_text] [text] COLLATE SQL_Latin1_General_CP1_CI_AS NULL
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO[/code]
Go to Top of Page

SamC
White Water Yakist

3467 Posts

Posted - 2005-03-09 : 14:14:50
I think I'll CAST it as varchar then see what happens...
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2005-03-09 : 14:27:17
I have to get her book on internals

http://www.windowsitpro.com/SQLServer/Article/ArticleID/23519/23519.html



Brett

8-)
Go to Top of Page

X002548
Not Just a Number

15586 Posts

Posted - 2005-03-09 : 14:31:05
What does


SELECT MAX(DATALENGTH(answer_text))
, MAX(DATALENGTH(other_text))
FROM sur_response_answer



Give you?



Brett

8-)
Go to Top of Page
    Next Page

- Advertisement -