SQL Server Forums
 De duping methods

Aged Yak Warrior

United Kingdom
550 Posts

Posted - 01/13/2013 :  16:01:15
Hello there.

Does anyone know of any good de-duping methods between two tables?

Or is there no common or best-practice way of doing it, beyond simply de-duping on a unique column that exists in both tables and using subqueries, i.e. (WHERE ... NOT IN) or (EXISTS)?

I don't have an example yet, just wondering if there are any good ideas.

Any help would be appreciated.

Thank you.

Flowing Fount of Yak Knowledge

2875 Posts

Posted - 01/13/2013 :  17:23:39
It really depends on what you are trying to accomplish. EXISTS and NOT EXISTS are good options. So are INTERSECT and EXCEPT, as well as MERGE with its WHEN MATCHED and WHEN NOT MATCHED [BY TARGET | BY SOURCE] clauses. The better question is, why do you need to de-dup between two tables?
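
To make the set-based options above concrete, here is a minimal sketch. It assumes SQLite (through Python's sqlite3 module) as a stand-in for SQL Server so that it can be run anywhere; the tables table_a and table_b and their columns are hypothetical. The NOT EXISTS, EXCEPT and INTERSECT statements themselves are standard SQL. MERGE is not shown because SQLite does not support it.

```python
import sqlite3

# SQLite stands in for SQL Server here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO table_a VALUES (1, 'ann'), (2, 'bob'), (3, 'cat');
    INSERT INTO table_b VALUES (2, 'bob'), (3, 'cath'), (4, 'dan');
""")

# NOT EXISTS: rows in table_a whose key has no match in table_b.
only_in_a = conn.execute("""
    SELECT a.id, a.name
    FROM table_a AS a
    WHERE NOT EXISTS (SELECT 1 FROM table_b AS b WHERE b.id = a.id)
    ORDER BY a.id
""").fetchall()

# EXCEPT: compares every selected column, so id 3 also shows up
# because its name differs between the two tables.
except_rows = conn.execute("""
    SELECT id, name FROM table_a
    EXCEPT
    SELECT id, name FROM table_b
    ORDER BY id
""").fetchall()

# INTERSECT: rows that are identical in both tables (the true duplicates).
in_both = conn.execute("""
    SELECT id, name FROM table_a
    INTERSECT
    SELECT id, name FROM table_b
    ORDER BY id
""").fetchall()

print(only_in_a)    # [(1, 'ann')]
print(except_rows)  # [(1, 'ann'), (3, 'cat')]
print(in_both)      # [(2, 'bob')]
```

The choice mostly comes down to what counts as a duplicate: a matching key (NOT EXISTS on the key column) or a fully matching row (EXCEPT / INTERSECT). One known caveat with NOT IN: if the subquery returns a NULL, no rows qualify, which is a common reason to prefer NOT EXISTS.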


Everyday I learn something that somebody else already knew

Jeff Moden
Aged Yak Warrior

652 Posts

Posted - 01/13/2013 :  21:59:31
JimF touched on many of the methods above. The reason someone would want to do this is typically ETL. I consider it foolhardy to try to import data directly into a final table. I think it's much safer to load the data into a staging table, validate it, identify what is new and what must be updated, and only then start adding to or modifying the target table. It usually turns out to be faster as well, because I generally don't have to do joined inserts or updates on a table that is in use, and there is no blocking to worry about on the staging table.
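
The staging workflow described above could look roughly like the following sketch. It is an illustration under stated assumptions, not the poster's actual code: SQLite (via Python's sqlite3) stands in for SQL Server, the customer and customer_staging tables are hypothetical, and the validation rules (non-NULL id and email, one row per id) are invented for the example.

```python
import sqlite3

# SQLite stands in for SQL Server; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, email TEXT NOT NULL);
    CREATE TABLE customer_staging (id INTEGER, email TEXT);
    INSERT INTO customer VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
    -- Raw import: one unchanged row, one changed row, one new row,
    -- one invalid row (missing email), and one duplicate inside the batch.
    INSERT INTO customer_staging VALUES
        (1, 'ann@example.com'),
        (2, 'bob@new.example.com'),
        (3, 'cat@example.com'),
        (4, NULL),
        (3, 'cat@example.com');
""")

# 1. Validate in staging: drop rows that fail basic checks (assumed rules).
conn.execute("DELETE FROM customer_staging WHERE id IS NULL OR email IS NULL")

# 2. De-dupe inside the batch: keep one row per id.
#    rowid is SQLite-specific; on SQL Server ROW_NUMBER() would play this role.
conn.execute("""
    DELETE FROM customer_staging
    WHERE rowid NOT IN (SELECT MIN(rowid) FROM customer_staging GROUP BY id)
""")
conn.commit()  # staging cleanup is committed on its own

# 3. Only now touch the target table, in one short transaction.
with conn:
    # Existing keys whose data changed: update.
    conn.execute("""
        UPDATE customer
        SET email = (SELECT s.email FROM customer_staging AS s
                     WHERE s.id = customer.id)
        WHERE EXISTS (SELECT 1 FROM customer_staging AS s
                      WHERE s.id = customer.id AND s.email <> customer.email)
    """)
    # Keys not present in the target yet: insert.
    conn.execute("""
        INSERT INTO customer (id, email)
        SELECT s.id, s.email
        FROM customer_staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM customer AS c WHERE c.id = s.id)
    """)

rows = conn.execute("SELECT id, email FROM customer ORDER BY id").fetchall()
print(rows)
```

All the messy work (rejecting bad rows, collapsing duplicates) happens on the staging table, so the target table is only involved in the final UPDATE and INSERT.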

--Jeff Moden
RBAR is pronounced "ree-bar" and is a "Modenism" for "Row By Agonizing Row".

First step towards the paradigm shift of writing Set Based code:
"Stop thinking about what you want to do to a row... think, instead, of what you want to do to a column."

When writing schedules, keep the following in mind:
"If you want it real bad, that's the way you'll likely get it."

Very Important crosS Applying yaK Herder

52326 Posts

Posted - 01/13/2013 :  22:30:34
We dump the incoming data into a staging table and then do all the validations, checks, transformations, etc., as Jeff suggested. The transfer from source to staging is a straight pull. For inserts and updates, we compare datetime fields between source and destination to decide which rows to insert and which to update. To compare, we can use several methods.
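
One way the datetime comparison could work is sketched below. This is an assumption about the general approach, not the poster's implementation: SQLite (via Python's sqlite3) stands in for SQL Server, the staging and dest tables and the modified_at column are hypothetical, and timestamps are stored as ISO-8601 text so that string comparison matches chronological order (on SQL Server these would be datetime columns).

```python
import sqlite3

# SQLite stands in for SQL Server; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dest (id INTEGER PRIMARY KEY, amount INTEGER, modified_at TEXT);
    CREATE TABLE staging (id INTEGER PRIMARY KEY, amount INTEGER, modified_at TEXT);
    INSERT INTO dest VALUES
        (1, 10, '2013-01-10 09:00:00'),
        (2, 20, '2013-01-12 09:00:00');
    INSERT INTO staging VALUES
        (1, 15, '2013-01-13 08:00:00'),  -- newer than dest: update
        (2, 99, '2013-01-11 08:00:00'),  -- older than dest: ignore
        (3, 30, '2013-01-13 08:30:00');  -- not in dest: insert
""")

with conn:
    # Update only rows whose staging timestamp is later than the destination's.
    conn.execute("""
        UPDATE dest
        SET amount = (SELECT s.amount FROM staging AS s WHERE s.id = dest.id),
            modified_at = (SELECT s.modified_at FROM staging AS s
                           WHERE s.id = dest.id)
        WHERE EXISTS (SELECT 1 FROM staging AS s
                      WHERE s.id = dest.id
                        AND s.modified_at > dest.modified_at)
    """)
    # Insert rows whose key does not exist in the destination yet.
    conn.execute("""
        INSERT INTO dest (id, amount, modified_at)
        SELECT s.id, s.amount, s.modified_at
        FROM staging AS s
        WHERE NOT EXISTS (SELECT 1 FROM dest AS d WHERE d.id = s.id)
    """)

result = conn.execute(
    "SELECT id, amount, modified_at FROM dest ORDER BY id"
).fetchall()
print(result)
```

Row 2 is left alone because the destination already holds a later version, which is the point of comparing timestamps rather than overwriting on every key match.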

SQL Server MVP

SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC