Best way to get list of ID's from bulk insert

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2000 Forums

SQL Server Development (2000)

Best way to get list of ID's from bulk insert

Author

Topic

urobertson
Starting Member

9 Posts

Posted - 2003-12-17 : 14:09:56

Hi,
I need to add a whole lot of info at one time to an existing database. I need to add about 40,000 products at a time to the main product table and other product data to other tables (colours, sizes, etc) then cross-references, etc, etc.

But there is a ton of other products already contained in the DB. I would like to make sure that after I insert the new products into product table I only insert info relating to those records into the other tables. For instance, I don't want to add colour information for existing products so a simple


INSERT INTO product_colour
SELECT product_id 
FROM products 
INNER JOIN colour 
ON product.barcode = colour.barcode

wouldn't work. I would instead like to use a


INSERT INTO product_colour
SELECT product_id, colour_id 
FROM products 
INNER JOIN colour 
ON product.barcode = colour.barcode
WHERE product_id > @firstNewID AND product_id < @lastNewID

Would code like the following work?


DECLARE 
 @lastNewID int,
 @rowCount int,
 @firstNewID int

SELECT 
 @lastNewID= 0,
 @rowCount = 0,
 @firstNewID = 0

INSERT INTO product
(v.vendor_id, pi.price, s.season_id, ...)
FROM vendor v INNER JOIN productIncoming pi
ON v.vendor_number = pi.vendor_number INNER JOIN ...

SELECT 
 @lastNewID = @@IDENTITY, 
 @rowCount = @@ROWCOUNT,
 @firstNewID = (@lastNewID - @rowCount + 1)

Please let me know if this is the best way of doing this? I'm trying to avoid single-row processing since I have so much data to input it's already slow enough (and I know it's 99% evil - 100%?).

BTW this is just quick code 'cause I just want to find out about the @@IDENTITY, @@ROWCOUNT stuff so if the code seems wonky please ignore. (Unless you can't read my question now. Let me know?).

I looked on the site (and another one) for help but couldn't find anything. Feel free to send me to an FAQ/other thread - I know there's gotta be one out there somewhere! :)

Thanks,
Ursula

X002548
Not Just a Number

15586 Posts

Posted - 2003-12-17 : 14:13:14

Nahhhh

Just say...

WHERE NOT EXISTS (SELECT * FROM PRODUCT_COLOUR WHERE key = key)

Brett

8-)

urobertson
Starting Member

9 Posts

Posted - 2003-12-17 : 15:21:19

Thanks Brett. Much cleaner but I do have a question.

I thought NOT EXISTS and NOT IN's were slow?

We're looking at about 5 inserts like this one involving anywhere from 40,000 to 180,000 records (ex. one product has many colour/size combinations).

There's some pretty heavy joins to grab this info too, although I'm trying to use temp tables where possible to hold common data so I don't have to do the same huge joins over and over.

Then again, the question is: would my way be any faster? Any comments?

I ran them both and they each took 11 seconds. Although, I haven't added in the extra time to save the @@IDENTITY, do the math, pass the variables between sp's, etc, etc, etc,. It does seem your method is the way to go :)

Thanks,
Ursula

X002548
Not Just a Number

15586 Posts

Posted - 2003-12-17 : 16:35:24

The anwer to everything...

It depends...

You can also do (which is the equivalent)

INSERT INTO...
SELECT yada yada yada
FROM Table1 a
INNER JOIN Table2 b
ON a.key = b.key
LEFT JOIN Table3 c
ON a.Key = c.Key
WHERE c.Key IS NULL

Basically WHERE A doesn't exist in C

Brett

8-)

X002548
Not Just a Number

15586 Posts

Posted - 2003-12-17 : 16:37:56

quote:
Originally posted by urobertson

Then again, the question is: would my way be any faster? Any comments?

I ran them both and they each took 11 seconds. Although, I haven't added in the extra time to save the @@IDENTITY, do the math, pass the variables between sp's, etc, etc, etc,. It does seem your method is the way to go :)

Thanks,
Ursula

Let me ask you...what would happen if 2 process where happening at the same time (inserting into that table)? I know you would say it can't...

But in this business, if it can, it will..

Brett

8-)

urobertson
Starting Member

9 Posts

Posted - 2003-12-17 : 17:05:49

Actually, I was considering the implications of >1 process, >1 user etc, etc. However, I WAS thinking that the retrieval of @@IDENTITY right after the insert would occur while the table/row was still locked but this is probably not the case, is it? You could have process-1 do an INSERT, process-2 do an INSERT and when process-1 calls the @@IDENTITY you would have process-2's @@IDENTITY. Correct?

Just out of curiousity would using SCOPE_IDENTITY solve this?

BTW - Thanks for all your help today :)

Thanks,
Ursula

urobertson
Starting Member

9 Posts

Posted - 2003-12-17 : 18:08:17

quote:
The anwer to everything...

It depends...

Ok, how about this: I could have millions of records in each of these tables. The inserts could be anywhere from 40,000 to 180,000 records. Some of the unique keys I am checking are varchar(20). There could be a combined unique key to check for i.e. a product has a vendor_number(varchar(20)) and vendor_id (int). vendor_numbers are distinct for each vendor and vendor_number/vendor_id combinations make a product unique.

Which solution would be theorectically faster? The

WHERE A doesn't exist in C JOIN

or the

NOT EXISTS

? Is

NOT IN

any better than

NOT EXISTS

? Would it be faster to use

NOT EXISTS (SELECT * FROM table...)

NOT EXISTS (SELECT ID FROM table...)

Is there just somewhere I can go where this is already documented? Aaaaaaahhhhhh, so many questions!!!!

Thanks,
Ursula

mohdowais
Sheikh of Yak Knowledge

1456 Posts

Posted - 2003-12-18 : 01:54:02

>>You could have process-1 do an INSERT, process-2 do an INSERT and when process-1 calls the @@IDENTITY you would have process-2's @@IDENTITY. Correct?

Nope. @@IDENTITY works at a connection level, so the chances of another process messing up your @@IDENTITY variable are nil. BUT, the chances of a trigger on the table messing up the variable are high indeed, and for this reason you should always use SCOPE_IDENTITY(). Also look at IDENT_CURRENT...more details in the Books online.

WRT: EXISTS VS IN VS OUTER JOIN, I'd say EXISTS and IN perform fairly well on smaller resultsets. You'll probably see a perfomance benefit using an OUTER JOIN for larger ones. Of course it depends on a lot of things particularly the size of the rows being compared, datatypes, appropriate INDEXING, etc so YMMV. The best way to find out is to test it yourself!

Avoid using SELECT * wherever you can

Owais

We make a living out of what we get, but we make a life out of what we give.

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources