Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 Purging Duplicate Rows and Then Some

Author  Topic 

BobRoberts
Posting Yak Master

109 Posts

Posted - 2012-07-26 : 14:47:29
I've looked around and tried some things, but have fallen flat. It seems hard enough just to purge duplicate rows, but what I want to do is kicking it up in difficulty. In my example, I have "customer cards" that were converted from another platform. The key field is CustID, but for some reason that conversion assigned each card its individual customer ID, even though it belongs to the same customer. The critical column here is "Text3", that is the old customer ID that was brought over as a sort of comment field. It's really the only way to identify cards with different CustID's as actually belonging to the same customer.

What I want to do is, in the Customers table, delete any duplicate rows for a particular customer. I then want to change the customer ID's in the CustomerCards table to equal the remaining Customer table row's customer ID. So that you're ended up having a true one-to-many relationship.

Also, need to skip any rows in the Customers table that have a Text3 value of 0. Also, if a CustomerCards row has a Status of 1,"active", I want to use the customer ID associated with that row (as far as which one to keep and which to assign to the common CustomerCards rows).

The assumption is, if there are both active (1) and inactive CustomerCards rows, there will only be one listed as active - if more than one is active, then all will be active for that customer.
Here are the 2 tables as they are now:


Customers table:

CustID|Text3|Comment
------------------------------
111111|AAAAAA|Test Comment 1a
222222|AAAAAA|Test comment 2a
333333|AAAAAA|Test comment 3a
444444|BBBBBB|Test coment 4b
555555|CCCCCC|Test comment 5c
666666|CCCCCC|Test comment 6c
777777|0-----|Test comment 70
888888|0-----|Test comment 70
999999|0-----|Test comment 70

CustomerCards table:

CustID|CardID|Status
--------------------
111111|1747473|0----
222222|8472373|1----
333333|6353626|0----
444444|5386262|1----
555555|2536363|1----
666666|6263838|1----
777777|2123445|1----
888888|7849393|1----
999999|4569785|1----

Here is what I'd like them to look like when the query is done:

Desired Customers table results:

CustID|Text3-|Comment
-------------------------------
222222|AAAAAA|Test Comment 1a
444444|BBBBBB|Test coment 4b
555555|CCCCCC|Test comment 5c
777777|0-----|Test comment 70
888888|0-----|Test comment 70
999999|0-----|Test comment 70

Desired CustomerCards table results:

CustID|CardID-|Status
----------------------
222222|1747473|0
222222|8472373|1
222222|6353626|0
444444|5386262|1
555555|2536363|1
555555|6263838|1
777777|2123445|1
888888|7849393|1
999999|4569785|1


Also, I don't know if it's relevant, but in the real example the CustID and CardID columns are of type Unique Identifier, but in this example they are just character. Here is some script to create the test data:

USE [AdventureWorks]
GO
/****** Object: Table [dbo].[Customers] Script Date: 07/26/2012 11:17:06 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Customers](
[CustID] [nvarchar](10) NOT NULL,
[Text3] [nvarchar](20) NOT NULL,
[Comment] [nvarchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'111111', N'AAAAAA', N'Test Comment 1a')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'222222', N'AAAAAA', N'Test comment 2a')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'333333', N'AAAAAA', N'Test comment 3a')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'444444', N'BBBBBB', N'Test coment 4b')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'555555', N'CCCCCC', N'Test comment 5c')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'666666', N'CCCCCC', N'Test comment 6c')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'777777', N'0', N'Test comment 70')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'888888', N'0', N'Test comment 70')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'999999', N'0', N'Test comment 70')
/****** Object: Table [dbo].[CustomerCards] Script Date: 07/26/2012 11:17:06 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[CustomerCards](
[CustID] [nvarchar](10) NOT NULL,
[CardID] [nvarchar](10) NOT NULL,
[Status] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'111111', N'1747473', 0)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'222222', N'8472373', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'333333', N'6353626', 0)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'444444', N'5386262', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'555555', N'2536363', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'666666', N'6263838', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'777777', N'2123445', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'888888', N'7849393', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'999999', N'4569785', 1)

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-26 : 16:08:19
whats the rule to determine which instance of duplicated customer record to be deleted from customers table?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-26 : 16:35:42
anyways here's an illustration for you assuming random row being retained (as you've not specified rule for deletion from Customers)


--DROP TABLE [dbo].[Customers]
--DROP TABLE [dbo].[CustomerCards]


/****** Object: Table [dbo].[Customers] Script Date: 07/26/2012 11:17:06 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Customers](
[CustID] [nvarchar](10) NOT NULL,
[Text3] [nvarchar](20) NOT NULL,
[Comment] [nvarchar](50) NULL
) ON [PRIMARY]
GO
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'111111', N'AAAAAA', N'Test Comment 1a')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'222222', N'AAAAAA', N'Test comment 2a')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'333333', N'AAAAAA', N'Test comment 3a')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'444444', N'BBBBBB', N'Test coment 4b')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'555555', N'CCCCCC', N'Test comment 5c')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'666666', N'CCCCCC', N'Test comment 6c')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'777777', N'0', N'Test comment 70')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'888888', N'0', N'Test comment 70')
INSERT [dbo].[Customers] ([CustID], [Text3], [Comment]) VALUES (N'999999', N'0', N'Test comment 70')
/****** Object: Table [dbo].[CustomerCards] Script Date: 07/26/2012 11:17:06 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[CustomerCards](
[CustID] [nvarchar](10) NOT NULL,
[CardID] [nvarchar](10) NOT NULL,
[Status] [int] NOT NULL
) ON [PRIMARY]
GO
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'111111', N'1747473', 0)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'222222', N'8472373', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'333333', N'6353626', 0)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'444444', N'5386262', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'555555', N'2536363', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'666666', N'6263838', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'777777', N'2123445', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'888888', N'7849393', 1)
INSERT [dbo].[CustomerCards] ([CustID], [CardID], [Status]) VALUES (N'999999', N'4569785', 1)


select * from Customers

select * from CustomerCards


DECLARE @DELETED_CUST table
(
CustID nvarchar(10),
CustDesc nvarchar(20)
)

DELETE t
OUTPUT DELETED.CustID,DELETED.Text3
INTO @DELETED_CUST
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY [Text3] ORDER BY NEWID()) AS Rn,[text3],CustID
FROM Customers
WHERE text3 <> '0')t
WHERE Rn>1


UPDATE cc
SET cc.CustID=c.CustID
FROM CustomerCards cc
INNER JOIN @DELETED_CUST d
ON d.CustID = cc.CustID
INNER JOIN Customers c
ON c.Text3 = d.CustDesc


select * from Customers

select * from CustomerCards

output
--------------------------------------------------------

before deletion
-------------------------------------

CustID Text3 Comment
111111 AAAAAA Test Comment 1a
222222 AAAAAA Test comment 2a
333333 AAAAAA Test comment 3a
444444 BBBBBB Test coment 4b
555555 CCCCCC Test comment 5c
666666 CCCCCC Test comment 6c
777777 0 Test comment 70
888888 0 Test comment 70
999999 0 Test comment 70

CustID CardID Status
111111 1747473 0
222222 8472373 1
333333 6353626 0
444444 5386262 1
555555 2536363 1
666666 6263838 1
777777 2123445 1
888888 7849393 1
999999 4569785 1


after deletion
--------------------------


CustID Text3 Comment
333333 AAAAAA Test comment 3a
444444 BBBBBB Test coment 4b
666666 CCCCCC Test comment 6c
777777 0 Test comment 70
888888 0 Test comment 70
999999 0 Test comment 70


CustID CardID Status
333333 1747473 0
333333 8472373 1
333333 6353626 0
444444 5386262 1
666666 2536363 1
666666 6263838 1
777777 2123445 1
888888 7849393 1
999999 4569785 1





------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

BobRoberts
Posting Yak Master

109 Posts

Posted - 2012-07-26 : 16:48:29
Keep only one instance and delete all others - it doesn't matter which is kept and which is deleted.

In fact, in thinking about it, forget about everything after the EXCEPT below. It is not needed, and doesn't conform to the realities of the data.

- EXCEPT:
if the associated rows in CustomerCards
have a Status of all 0's with only one 1, then keep the Customers row that has the same CustID as the CustomerCards row with the Status of 1 (this signifies that only 1 card is an active card). In the example, this is illustrated by the Customers table first 3 rows sharing the Text3 of AAAAAA. These CustID's are 111111, 222222, and 333333. The Status for these three CustID's in the CustomerCards table are 0, 1, 0, respectively. So the Customers row with CustID of 222222 is kept - those with 111111 and 333333 are deleted. In the CustomerCards table, all three CustID's are changed to 222222. But locating the status of 1 among 0's is perhaps less important, and I would not consider it critical to the query.




quote:
Originally posted by visakh16

whats the rule to determine which instance of duplicated customer record to be deleted from customers table?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/



Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-26 : 16:59:31
if that condition is not critical you can use the posted suggestion

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page

BobRoberts
Posting Yak Master

109 Posts

Posted - 2012-07-27 : 10:00:29
visakh16, that worked perfectly. Now I'll modify it and try it on the real tables, which contain a little more complexity and so may reveal hidden surprises. I searched all around and, as frequently happens, found bloated, superficial approaches. Many were designed more to show off the vast SQL skill of the author rather than to directly and efficiently accomplish the task. As happens so often, this site is the only one I found that revealed an ELEGANT solution. I don't think I'll ever approach your intuitive SQL ability, but I will say that with every problem I learn a little bit more. In fact, there have been several recent instances where I've come up with solutions to complex problems on my own that six months ago I probably would have posted to this forum. Thank you.




quote:
Originally posted by visakh16

if that condition is not critical you can use the posted suggestion

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/



Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

52326 Posts

Posted - 2012-07-27 : 10:42:09
quote:
Originally posted by BobRoberts

visakh16, that worked perfectly. Now I'll modify it and try it on the real tables, which contain a little more complexity and so may reveal hidden surprises. I searched all around and, as frequently happens, found bloated, superficial approaches. Many were designed more to show off the vast SQL skill of the author rather than to directly and efficiently accomplish the task. As happens so often, this site is the only one I found that revealed an ELEGANT solution. I don't think I'll ever approach your intuitive SQL ability, but I will say that with every problem I learn a little bit more. In fact, there have been several recent instances where I've come up with solutions to complex problems on my own that six months ago I probably would have posted to this forum. Thank you.




quote:
Originally posted by visakh16

if that condition is not critical you can use the posted suggestion

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/






you're welcome Bob

Glad that we could be of encouragement and could make the difference in you. Keep learning and dont hesitate to ask if in doubt. We all have come through the same way

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/

Go to Top of Page
   

- Advertisement -