Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
pwvailla
Starting Member
31 Posts |
Posted - 2010-12-02 : 17:04:00
|
New to SQL Server 2005, I have to extract valid PO's from an Oracle Purchasing system which I have only read rights to. I need to pull 50,000 rows of valid po's for the project control system. I can't use the IMPORT data routine as a full table retrieval is too slow and it crashes due to network or data governance issues, not to mention it would take forever.I don't know enough about the constructs I should use to build a results table which needs to go against a SQL Server 2005 reference table I have which contains the valid po, po_lineno values I need. What would the "looping" DO WHILE (or whatever) look like? assuming:the key fields needed to extract are po, po_lineno, po_descthe reference table is dbo.VALID_POsthe Oracle Purchasing table is PO.PO_LINES_ALLthe destination table is dbo.PO_LINES_ALLthanks (and then some) |
|
yosiasz
Master Smack Fu Yak Hacker
1635 Posts |
Posted - 2010-12-02 : 17:17:23
|
is this continuous or a one time thing?If you don't have the passion to help people, you have no passion |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-12-02 : 17:28:43
|
I would do:SELECT *INTO #TEMPFROM OPENQUERY(MyLinkedOracleServer, 'SELECT Col1, Col2, ... FROM OracleDatbase.OracleTable WHERE ...') and then JOIN #TEMP to the appropriate local tables and make my update.Make sure that you Oracle Query is carefully optimised (if that is the slow part) |
 |
|
pwvailla
Starting Member
31 Posts |
Posted - 2010-12-02 : 23:27:11
|
Not sure this answers the question, but...Where does the #TEMP table get created? I have no ability to create on the Oracle environment. I am needing to reference a lookup table on SQL Server 2005 database on another server environment (my WHERE clause will require me to SELECT from Oracle ERP where a value EXISTS in a SQL environment). That is possible?quote: Originally posted by Kristen I would do:SELECT *INTO #TEMPFROM OPENQUERY(MyLinkedOracleServer, 'SELECT Col1, Col2, ... FROM OracleDatbase.OracleTable WHERE ...') and then JOIN #TEMP to the appropriate local tables and make my update.Make sure that you Oracle Query is carefully optimised (if that is the slow part)
|
 |
|
Sachin.Nand
2937 Posts |
Posted - 2010-12-03 : 00:14:17
|
Are you able to query the oracle DB using linked server?Something like thisselect * from openquery(MyLinkedOracleServer,'SELECT Col1,Col2, ... FROM OracleDatbase.OracleTable) PBUH |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-12-03 : 03:05:49
|
"Where does the #TEMP table get created?"Locally on SQL Server (in its TEMPDB databases, but you could use a permanent table if you prefer - clear it down before each batch, for example)"(my WHERE clause will require me to SELECT from Oracle ERP where a value EXISTS in a SQL environment). That is possible?"Possible. But tricky.IME things that do that run like a dog The Oracle databases we connect to have a ChangeDate on every row / table. We use that to get anything "newer" than last time we queried the database. Then we put them in our Staging tables. Then we process them from there.Key benefit is that we have a light-weight query for oracle (only things which are newer than last time) and we move relatively little data between the databases. It still takes 15 minutes to run each batch, and we do that twice an hour! |
 |
|
pwvailla
Starting Member
31 Posts |
Posted - 2010-12-03 : 16:22:36
|
I am making progress, now that IT has fixed the privileges that I was denied. I want a query like this:SELECT *INTO #TEMPFROM OPENQUERY(gpspgjp1, SELECT po_header_id, segment1, vendor_id, vendor_site_id FROM PO.PO_HEADERS_ALL WHERE PO.PO_HEADERS_ALL.segment1 EXISTS (SELECT DBO.VALID_PO_PROJECTS.segment1 FROM DBO.VALID_PO_PROJECTS))How can I get a subset of the source table by using a lookup table on the destination server? Source is an Oracle Purchasing table containing 2 million rows and I only require about 50,000 rows to be retrieved to the destination server running SQL Server 2005.So far I am getting an invalid relational operator... which lead me to believe I can't do this type of query. Is there another way to tackle it?How would you set up a loop construct to go row by row through my reference table (actually a view) and retrieve data a row at a time? That is the only way I see this working at the moment...quote: Originally posted by pwvailla Not sure this answers the question, but...Where does the #TEMP table get created? I have no ability to create on the Oracle environment. I am needing to reference a lookup table on SQL Server 2005 database on another server environment (my WHERE clause will require me to SELECT from Oracle ERP where a value EXISTS in a SQL environment). That is possible?quote: Originally posted by Kristen I would do:SELECT *INTO #TEMPFROM OPENQUERY(MyLinkedOracleServer, 'SELECT Col1, Col2, ... FROM OracleDatbase.OracleTable WHERE ...') and then JOIN #TEMP to the appropriate local tables and make my update.Make sure that you Oracle Query is carefully optimised (if that is the slow part)
|
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-12-03 : 19:16:03
|
"How can I get a subset of the source table by using a lookup table on the destination server?"You can do something likeSELECT po_header_id, segment1, vendor_id, vendor_site_idFROM gpspgjp1.PO.PO_HEADERS_ALL AS POHAJOIN dbo.VALID_PO_PROJECTS ON dbo.VALID_PO_PROJECTS.segment1 = POHA.segment1 but I think it will run very slow.You could also use dynamic to construct a list of the values for segment1 in VALID_PO_PROJECTS and pass it as a "list" using an IN clause- but if the IN clause is large I think that will be unworkable too.Or you could pull the "changed data" from oracle, and "apply" the changes to a local staging table, and then JOIN in the normal way. In effect you will have a complete local copy of the oracle table - which may be huge and thus unworkable. Compromises will be required I'm afraid. |
 |
|
yosiasz
Master Smack Fu Yak Hacker
1635 Posts |
Posted - 2010-12-03 : 19:29:40
|
dump n rows to n csv, xml files. then SSIS into your destination tableif this is a continuous thing initial pull will be big and slow subsequent pulls will be nice and fastIf you don't have the passion to help people, you have no passion |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2010-12-03 : 19:49:39
|
quote: You could also use dynamic to construct a list of the values for segment1 in VALID_PO_PROJECTS and pass it as a "list" using an IN clause
I second this, I had to do the same thing with an Oracle data warehouse a number of years ago. It's aggravating but the performance is about as good as you'll get.quote: but if the IN clause is large I think that will be unworkable too
I had to break up my IN list into 1000-2000 element chunks and run multiple statements to get them all, but it worked and was quite fast. It's difficult to do as dynamic SQL using OPENQUERY, my recommendation is to generate all the SQL to a script file and run the script via the sqlcmd utility, or run it manually. OPENQUERY will probably give better performance than using 4-part references. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-12-04 : 02:13:18
|
That's good to hear Rob. I've never had to do it with anything truly large, but I'll keep it in mind for when I do have to |
 |
|
pwvailla
Starting Member
31 Posts |
Posted - 2010-12-05 : 17:55:48
|
Rob,I am going against a 2 million row Oracle Purchasing ERP table... VERY slow going. How would your example of 1000-2000 record chunks look?I have been wondering if I could learn to use a stored procedure or something that would run a query against the 2M table for each of the 50,000 valid POs that I need data for. 1-2000 rows sounds better but I am not sure how to do this since I am new to SQL Server 2005.quote: Originally posted by robvolk
quote: You could also use dynamic to construct a list of the values for segment1 in VALID_PO_PROJECTS and pass it as a "list" using an IN clause
I second this, I had to do the same thing with an Oracle data warehouse a number of years ago. It's aggravating but the performance is about as good as you'll get.quote: but if the IN clause is large I think that will be unworkable too
I had to break up my IN list into 1000-2000 element chunks and run multiple statements to get them all, but it worked and was quite fast. It's difficult to do as dynamic SQL using OPENQUERY, my recommendation is to generate all the SQL to a script file and run the script via the sqlcmd utility, or run it manually. OPENQUERY will probably give better performance than using 4-part references. |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2010-12-05 : 21:59:53
|
Essentially you build a series of SQL statements like so:INSERT INTO #TEMPSELECT * FROM OPENQUERY(gpspgjp1, 'SELECT po_header_id, segment1, vendor_id, vendor_site_idFROM PO.PO_HEADERS_ALL WHERE PO.PO_HEADERS_ALL.segment1 IN(1001,1002,1003,...,2000)') You'll replace the portion in red with a list of segment IDs from DBO.VALID_PO_PROJECTS. I'd recommend starting with 1000 IDs at most and seeing what the performance is like. If the Oracle table is indexed on that column then pulling only 1000 at a time should be quick, even if there are millions of rows. You can then try 2000, 5000 or even more, but you'll reach some practical limit short of your total of 50,000.I'd recommend building these statements by hand and running them before worrying about automating or generating it dynamically. You can do SELECT TOP 1000 segment1 + ',' FROM DBO.VALID_PO_PROJECTS and pasting the results into the red portion of the query above. If you find that 1000 take to long, reduce it to 500, or 200 or whatever provides reasonable performance. |
 |
|
pwvailla
Starting Member
31 Posts |
Posted - 2010-12-06 : 09:59:55
|
Rob,Things are starting to sink into my reptile brain. OPENQUERY executes as if it is on the Oracle box, therefore Oracle SQL is used?I am getting the "ORA-01722: invalid number message. This is a VARCHAR(20) field on Oracle. Is there an equivalent statement for the VARCHAR statement you would use? nI have tried to_number, enclosing in quotes....quote: Originally posted by robvolk Essentially you build a series of SQL statements like so:INSERT INTO #TEMPSELECT * FROM OPENQUERY(gpspgjp1, 'SELECT po_header_id, segment1, vendor_id, vendor_site_idFROM PO.PO_HEADERS_ALL WHERE PO.PO_HEADERS_ALL.segment1 IN(1001,1002,1003,...,2000)') You'll replace the portion in red with a list of segment IDs from DBO.VALID_PO_PROJECTS. I'd recommend starting with 1000 IDs at most and seeing what the performance is like. If the Oracle table is indexed on that column then pulling only 1000 at a time should be quick, even if there are millions of rows. You can then try 2000, 5000 or even more, but you'll reach some practical limit short of your total of 50,000.I'd recommend building these statements by hand and running them before worrying about automating or generating it dynamically. You can do SELECT TOP 1000 segment1 + ',' FROM DBO.VALID_PO_PROJECTS and pasting the results into the red portion of the query above. If you find that 1000 take to long, reduce it to 500, or 200 or whatever provides reasonable performance.
|
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2010-12-06 : 10:31:14
|
Yes, OPENQUERY runs it on the Oracle box and returns the results. There is minimal overhead compared to using 4-part names with a JOIN. The SQL does need to conform to Oracle syntax, but what you're using here is mostly identical to SQL Server, and you're generating most of it anyway.Try this to get the correct quotes on the varchar data:SELECT TOP 1000 '''''' + segment1 + ''''',' FROM DBO.VALID_PO_PROJECTS Paste those results into the query as I described earlier. |
 |
|
pwvailla
Starting Member
31 Posts |
Posted - 2010-12-06 : 16:27:52
|
Rob,Still not able to get this working but I was able to get this to run:USE GPO_PCSELECT * FROM OPENQUERY (gpspgjp1, 'SELECT po_header_id, po_line_id, line_num, unit_price, quantity, unit_meas_lookup_code, item_description, attribute3, attribute4, attribute6, attribute5, closed_flag, closed_date, cancel_flag, cancel_date FROM po.po_lines_all')718934 rows retrieved in 7:40 seconds. I can live with that as the system I will replace took hours to get the subset of data I really need.How can I INSERT this data into a table on SQL Server 2005? If I can do that, then I am (no you are) a rock star.Where would the INSERT INTO dbo.PO_LINES_ALL (SQL Server tablename) go?Or is there a way to write the query results to a file on my SQL Server disk and then parse it for the insert (if that were faster).Thanks - Paulquote: Originally posted by robvolk Yes, OPENQUERY runs it on the Oracle box and returns the results. There is minimal overhead compared to using 4-part names with a JOIN. The SQL does need to conform to Oracle syntax, but what you're using here is mostly identical to SQL Server, and you're generating most of it anyway.Try this to get the correct quotes on the varchar data:SELECT TOP 1000 '''''' + segment1 + ''''',' FROM DBO.VALID_PO_PROJECTS Paste those results into the query as I described earlier.
|
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2010-12-06 : 16:43:36
|
This should work:INSERT INTO dbo.PO_LINES_ALLSELECT * FROM OPENQUERY (gpspgjp1,'SELECTpo_header_id,po_line_id,line_num,unit_price,quantity,unit_meas_lookup_code,item_description,attribute3,attribute4,attribute6,attribute5,closed_flag,closed_date,cancel_flag,cancel_dateFROM po.po_lines_all') If the columns differ use an explicit column list in the INSERT clause. |
 |
|
Kristen
Test
22859 Posts |
Posted - 2010-12-07 : 03:45:49
|
"If the columns differ use an explicit column list in the INSERT clause."I suggest you do that even if they do match - in case someone adds a new column to dbo.PO_LINES_ALL in the future. |
 |
|
|
|
|
|
|