Please start any new threads on our new site at http://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

Our new SQL Server Forums are live! Come on over! We've restricted the ability to create new threads on these forums.

SQL Server Forums
Profile | Active Topics | Members | Search | Forum FAQ
 
Register Now and get your question answered!
Username:
Password:
Save Password
Forgot your Password?

 All Forums
 SQL Server 2008 Forums
 Transact-SQL (2008)
 SQL bulk insert from mySQL filter out duplicates
 Reply to Topic
 Printer Friendly
Author Previous Topic Topic Next Topic  

2revup
Posting Yak Master

112 Posts

Posted - 06/11/2013 :  19:38:30  Show Profile  Reply with Quote
Guys I am doing a bulk insert from mysql, but I wondered if I could do a bulk insert and put a where around it. I want to do this so I don't insert duplicates.

Here is an example of the bulk insert

-- Create table first (Shows the structure in MSSQL)
CREATE TABLE [dbo].[UserRatings_all](
[CaseID] [int] NULL,
[Correspondence_ID] [bigint] NULL,
[UserID] [varchar](max) NULL,
[Rating] [int] NULL,
[Creation_date] [datetime] NULL
)

-- Ok lets insert data
insert into [UserRatings_all]
select * from openquery([support_cases],'select cr.CASE_ID, cr.CORRESPONDENCE_ID, CORRESPONDENCE_AGENT_ID, RATING, UPDATE_DATE FROM case_correspondence_rating cr
Left join d_case_details cd on cr.CASE_Id=cd.CASE_ID
WHERE UPDATE_DATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
limit 6000000;')

As you can see I am doing it from a date range, but there is room for overlap here, and there appears to be some, I need to ensure that the duplicates are killed, and I am only inserting fresh records.

I could do this on 'CORRESPONDENCE_ID' as this is a unique number. any ways that I could do this without bulk inserting into a temp table and then doing a insert from that temp table?

visakh16
Very Important crosS Applying yaK Herder

India
52326 Posts

Posted - 06/12/2013 :  00:51:00  Show Profile  Reply with Quote
what according to you are duplicates? can you illustrate with an example?

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

2revup
Posting Yak Master

112 Posts

Posted - 06/12/2013 :  01:35:57  Show Profile  Reply with Quote
sure

Lets say I run the insert as above I would get the below from the date range

CaseID Correspondence_ID UserID Rating Creation_date
12911199333 5570802313 User1 4 6/10/2013
12911163573 5570245363 User2 5 6/10/2013
12911163573 5570515953 User3 5 6/10/2013
12898391011 25016198145 User4 2 6/10/2013
12910760153 5564211803 User1 5 6/10/2013
12898959621 25014914845 User9 2 6/10/2013
12899340751 25015936585 User1 5 6/10/2013
12911177063 5570523243 User6 5 6/10/2013

But my data I might have: (NOTE I have added the 'match' text to the right of a field that will match to the top, essentially a duplicate, I need these matches filtered out before inserting)

12911199333 5570802313 User1 4 6/10/2013 Match
12911229593 5571174303 User6 5 6/10/2013
12911228413 5571214293 asais 5 6/10/2013
12911163573 5570245363 User2 5 6/10/2013 Match
12911163573 5570515953 User3 5 6/10/2013 Match
12999894701 25016321555 User3 5 6/10/2013
12911213413 5571028223 User9 3 6/10/2013
12911213413 5571058303 User8 3 6/10/2013



Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52326 Posts

Posted - 06/12/2013 :  01:45:33  Show Profile  Reply with Quote

insert into [UserRatings_all]
select * from openquery([support_cases],'select cr.CASE_ID, cr.CORRESPONDENCE_ID, CORRESPONDENCE_AGENT_ID, RATING, UPDATE_DATE FROM case_correspondence_rating cr
Left join d_case_details cd on cr.CASE_Id=cd.CASE_ID
WHERE UPDATE_DATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
limit 6000000;')t
where not exists (select 1
                  from [UserRatings_all]
                  where CaseID = t.CaseID
                  and Correspondence_ID = t. Correspondence_ID
                 )


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

2revup
Posting Yak Master

112 Posts

Posted - 06/12/2013 :  01:51:20  Show Profile  Reply with Quote
When this is run I am getting 'Invalid column name 'CaseID''
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52326 Posts

Posted - 06/12/2013 :  01:55:41  Show Profile  Reply with Quote

insert into [UserRatings_all]
select * from openquery([support_cases],'select cr.CASE_ID, cr.CORRESPONDENCE_ID, CORRESPONDENCE_AGENT_ID, RATING, UPDATE_DATE FROM case_correspondence_rating cr
Left join d_case_details cd on cr.CASE_Id=cd.CASE_ID
WHERE UPDATE_DATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
limit 6000000;')t
where not exists (
                  select 1
                  from [UserRatings_all]
                  where CaseID = t.Case_ID
                  and Correspondence_ID = t. Correspondence_ID
                  )


------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs

Edited by - visakh16 on 06/12/2013 01:56:56
Go to Top of Page

2revup
Posting Yak Master

112 Posts

Posted - 06/12/2013 :  01:58:21  Show Profile  Reply with Quote
The table I am pulling the data from
= cr.CASE_ID, cr.CORRESPONDENCE_ID

the table I am dumping the data to
= Case_ID, CORRESPONDENCE_ID

So it looks like based on what you have its correct.
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52326 Posts

Posted - 06/12/2013 :  02:00:46  Show Profile  Reply with Quote
quote:
Originally posted by 2revup

The table I am pulling the data from
= cr.CASE_ID, cr.CORRESPONDENCE_ID

the table I am dumping the data to
= Case_ID, CORRESPONDENCE_ID

So it looks like based on what you have its correct.


then it should be


insert into [UserRatings_all]
select * from openquery([support_cases],'select cr.CASE_ID, cr.CORRESPONDENCE_ID, CORRESPONDENCE_AGENT_ID, RATING, UPDATE_DATE FROM case_correspondence_rating cr
Left join d_case_details cd on cr.CASE_Id=cd.CASE_ID
WHERE UPDATE_DATE >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
limit 6000000;')t
where not exists (
                  select 1
                  from [UserRatings_all]
                  where Case_ID = t.Case_ID
                  and Correspondence_ID = t. Correspondence_ID
                  )



------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page

2revup
Posting Yak Master

112 Posts

Posted - 06/12/2013 :  02:01:30  Show Profile  Reply with Quote
Dang your right I also missed that completely, and it worked like a charm... I didnt think I could do this with MySQL thank you, thank you , thank you!!!!!
Go to Top of Page

visakh16
Very Important crosS Applying yaK Herder

India
52326 Posts

Posted - 06/12/2013 :  02:03:19  Show Profile  Reply with Quote
welcome...
so far as its ANSI based it should work!

------------------------------------------------------------------------------------------------------
SQL Server MVP
http://visakhm.blogspot.com/
https://www.facebook.com/VmBlogs
Go to Top of Page
  Previous Topic Topic Next Topic  
 Reply to Topic
 Printer Friendly
Jump To:
SQL Server Forums © 2000-2009 SQLTeam Publishing, LLC Go To Top Of Page
This page was generated in 0.15 seconds. Powered By: Snitz Forums 2000