Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
| Author |
Topic |
|
urobertson
Starting Member
9 Posts |
Posted - 2003-12-17 : 14:09:56
|
Hi,I need to add a whole lot of info at one time to an existing database. I need to add about 40,000 products at a time to the main product table and other product data to other tables (colours, sizes, etc) then cross-references, etc, etc.But there is a ton of other products already contained in the DB. I would like to make sure that after I insert the new products into product table I only insert info relating to those records into the other tables. For instance, I don't want to add colour information for existing products so a simple INSERT INTO product_colourSELECT product_id FROM products INNER JOIN colour ON product.barcode = colour.barcode wouldn't work. I would instead like to use a INSERT INTO product_colourSELECT product_id, colour_id FROM products INNER JOIN colour ON product.barcode = colour.barcodeWHERE product_id > @firstNewID AND product_id < @lastNewID Would code like the following work?DECLARE @lastNewID int, @rowCount int, @firstNewID intSELECT @lastNewID= 0, @rowCount = 0, @firstNewID = 0INSERT INTO product(v.vendor_id, pi.price, s.season_id, ...)FROM vendor v INNER JOIN productIncoming piON v.vendor_number = pi.vendor_number INNER JOIN ...SELECT @lastNewID = @@IDENTITY, @rowCount = @@ROWCOUNT, @firstNewID = (@lastNewID - @rowCount + 1) Please let me know if this is the best way of doing this? I'm trying to avoid single-row processing since I have so much data to input it's already slow enough (and I know it's 99% evil - 100%?). BTW this is just quick code 'cause I just want to find out about the @@IDENTITY, @@ROWCOUNT stuff so if the code seems wonky please ignore. (Unless you can't read my question now. Let me know?).I looked on the site (and another one) for help but couldn't find anything. Feel free to send me to an FAQ/other thread - I know there's gotta be one out there somewhere! :)Thanks,Ursula |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-12-17 : 14:13:14
|
| NahhhhJust say... WHERE NOT EXISTS (SELECT * FROM PRODUCT_COLOUR WHERE key = key)Brett8-) |
 |
|
|
urobertson
Starting Member
9 Posts |
Posted - 2003-12-17 : 15:21:19
|
| Thanks Brett. Much cleaner but I do have a question.I thought NOT EXISTS and NOT IN's were slow?We're looking at about 5 inserts like this one involving anywhere from 40,000 to 180,000 records (ex. one product has many colour/size combinations).There's some pretty heavy joins to grab this info too, although I'm trying to use temp tables where possible to hold common data so I don't have to do the same huge joins over and over.Then again, the question is: would my way be any faster? Any comments?I ran them both and they each took 11 seconds. Although, I haven't added in the extra time to save the @@IDENTITY, do the math, pass the variables between sp's, etc, etc, etc,. It does seem your method is the way to go :)Thanks,Ursula |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-12-17 : 16:35:24
|
| The anwer to everything...It depends...You can also do (which is the equivalent)INSERT INTO...SELECT yada yada yadaFROM Table1 aINNER JOIN Table2 bON a.key = b.keyLEFT JOIN Table3 cON a.Key = c.KeyWHERE c.Key IS NULLBasically WHERE A doesn't exist in CBrett8-) |
 |
|
|
X002548
Not Just a Number
15586 Posts |
Posted - 2003-12-17 : 16:37:56
|
quote: Originally posted by urobertson Then again, the question is: would my way be any faster? Any comments?I ran them both and they each took 11 seconds. Although, I haven't added in the extra time to save the @@IDENTITY, do the math, pass the variables between sp's, etc, etc, etc,. It does seem your method is the way to go :)Thanks,Ursula
Let me ask you...what would happen if 2 process where happening at the same time (inserting into that table)? I know you would say it can't...But in this business, if it can, it will..Brett8-) |
 |
|
|
urobertson
Starting Member
9 Posts |
Posted - 2003-12-17 : 17:05:49
|
| Actually, I was considering the implications of >1 process, >1 user etc, etc. However, I WAS thinking that the retrieval of @@IDENTITY right after the insert would occur while the table/row was still locked but this is probably not the case, is it? You could have process-1 do an INSERT, process-2 do an INSERT and when process-1 calls the @@IDENTITY you would have process-2's @@IDENTITY. Correct? Just out of curiousity would using SCOPE_IDENTITY solve this?BTW - Thanks for all your help today :)Thanks,Ursula |
 |
|
|
urobertson
Starting Member
9 Posts |
Posted - 2003-12-17 : 18:08:17
|
quote: The anwer to everything...It depends...
Ok, how about this: I could have millions of records in each of these tables. The inserts could be anywhere from 40,000 to 180,000 records. Some of the unique keys I am checking are varchar(20). There could be a combined unique key to check for i.e. a product has a vendor_number(varchar(20)) and vendor_id (int). vendor_numbers are distinct for each vendor and vendor_number/vendor_id combinations make a product unique.Which solution would be theorectically faster? The WHERE A doesn't exist in C JOIN or the NOT EXISTS ? Is NOT IN any better than NOT EXISTS ? Would it be faster to use NOT EXISTS (SELECT * FROM table...) or NOT EXISTS (SELECT ID FROM table...) Is there just somewhere I can go where this is already documented? Aaaaaaahhhhhh, so many questions!!!!Thanks,Ursula |
 |
|
|
mohdowais
Sheikh of Yak Knowledge
1456 Posts |
Posted - 2003-12-18 : 01:54:02
|
| >>You could have process-1 do an INSERT, process-2 do an INSERT and when process-1 calls the @@IDENTITY you would have process-2's @@IDENTITY. Correct?Nope. @@IDENTITY works at a connection level, so the chances of another process messing up your @@IDENTITY variable are nil. BUT, the chances of a trigger on the table messing up the variable are high indeed, and for this reason you should always use SCOPE_IDENTITY(). Also look at IDENT_CURRENT...more details in the Books online.WRT: EXISTS VS IN VS OUTER JOIN, I'd say EXISTS and IN perform fairly well on smaller resultsets. You'll probably see a perfomance benefit using an OUTER JOIN for larger ones. Of course it depends on a lot of things particularly the size of the rows being compared, datatypes, appropriate INDEXING, etc so YMMV. The best way to find out is to test it yourself!Avoid using SELECT * wherever you canOwaisWe make a living out of what we get, but we make a life out of what we give. |
 |
|
|
|
|
|
|
|