Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 SQL Server 2005 Forums
 Transact-SQL (2005)
 Querying 50,000 rows, one row at a time

Author  Topic 

pwvailla
Starting Member

31 Posts

Posted - 2010-12-02 : 17:04:00
New to SQL Server 2005, I have to extract valid PO's from an Oracle Purchasing system which I have only read rights to. I need to pull 50,000 rows of valid po's for the project control system. I can't use the IMPORT data routine as a full table retrieval is too slow and it crashes due to network or data governance issues, not to mention it would take forever.

I don't know enough about the constructs I should use to build a results table which needs to go against a SQL Server 2005 reference table I have which contains the valid po, po_lineno values I need.

What would the "looping" DO WHILE (or whatever) look like? assuming:

the key fields needed to extract are po, po_lineno, po_desc
the reference table is dbo.VALID_POs
the Oracle Purchasing table is PO.PO_LINES_ALL
the destination table is dbo.PO_LINES_ALL

thanks (and then some)

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2010-12-02 : 17:17:23
is this continuous or a one time thing?

If you don't have the passion to help people, you have no passion
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-02 : 17:28:43
I would do:

SELECT *
INTO #TEMP
FROM OPENQUERY(MyLinkedOracleServer, 'SELECT Col1, Col2, ... FROM OracleDatbase.OracleTable WHERE ...')

and then JOIN #TEMP to the appropriate local tables and make my update.

Make sure that you Oracle Query is carefully optimised (if that is the slow part)
Go to Top of Page

pwvailla
Starting Member

31 Posts

Posted - 2010-12-02 : 23:27:11
Not sure this answers the question, but...

Where does the #TEMP table get created? I have no ability to create on the Oracle environment. I am needing to reference a lookup table on SQL Server 2005 database on another server environment (my WHERE clause will require me to SELECT from Oracle ERP where a value EXISTS in a SQL environment). That is possible?


quote:
Originally posted by Kristen

I would do:

SELECT *
INTO #TEMP
FROM OPENQUERY(MyLinkedOracleServer, 'SELECT Col1, Col2, ... FROM OracleDatbase.OracleTable WHERE ...')

and then JOIN #TEMP to the appropriate local tables and make my update.

Make sure that you Oracle Query is carefully optimised (if that is the slow part)

Go to Top of Page

Sachin.Nand

2937 Posts

Posted - 2010-12-03 : 00:14:17
Are you able to query the oracle DB using linked server?
Something like this



select * from openquery(MyLinkedOracleServer,'SELECT Col1,Col2, ... FROM OracleDatbase.OracleTable)



PBUH

Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-03 : 03:05:49
"Where does the #TEMP table get created?"

Locally on SQL Server (in its TEMPDB databases, but you could use a permanent table if you prefer - clear it down before each batch, for example)

"(my WHERE clause will require me to SELECT from Oracle ERP where a value EXISTS in a SQL environment). That is possible?"

Possible. But tricky.

IME things that do that run like a dog

The Oracle databases we connect to have a ChangeDate on every row / table. We use that to get anything "newer" than last time we queried the database. Then we put them in our Staging tables. Then we process them from there.

Key benefit is that we have a light-weight query for oracle (only things which are newer than last time) and we move relatively little data between the databases. It still takes 15 minutes to run each batch, and we do that twice an hour!
Go to Top of Page

pwvailla
Starting Member

31 Posts

Posted - 2010-12-03 : 16:22:36
I am making progress, now that IT has fixed the privileges that I was denied. I want a query like this:

SELECT *
INTO #TEMP
FROM OPENQUERY(gpspgjp1,
SELECT po_header_id, segment1, vendor_id, vendor_site_id
FROM PO.PO_HEADERS_ALL
WHERE PO.PO_HEADERS_ALL.segment1 EXISTS
(SELECT DBO.VALID_PO_PROJECTS.segment1
FROM DBO.VALID_PO_PROJECTS))

How can I get a subset of the source table by using a lookup table on the destination server? Source is an Oracle Purchasing table containing 2 million rows and I only require about 50,000 rows to be retrieved to the destination server running SQL Server 2005.

So far I am getting an invalid relational operator... which lead me to believe I can't do this type of query. Is there another way to tackle it?

How would you set up a loop construct to go row by row through my reference table (actually a view) and retrieve data a row at a time? That is the only way I see this working at the moment...


quote:
Originally posted by pwvailla

Not sure this answers the question, but...

Where does the #TEMP table get created? I have no ability to create on the Oracle environment. I am needing to reference a lookup table on SQL Server 2005 database on another server environment (my WHERE clause will require me to SELECT from Oracle ERP where a value EXISTS in a SQL environment). That is possible?


quote:
Originally posted by Kristen

I would do:

SELECT *
INTO #TEMP
FROM OPENQUERY(MyLinkedOracleServer, 'SELECT Col1, Col2, ... FROM OracleDatbase.OracleTable WHERE ...')

and then JOIN #TEMP to the appropriate local tables and make my update.

Make sure that you Oracle Query is carefully optimised (if that is the slow part)



Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-03 : 19:16:03
"How can I get a subset of the source table by using a lookup table on the destination server?"

You can do something like

SELECT po_header_id, segment1, vendor_id, vendor_site_id
FROM gpspgjp1.PO.PO_HEADERS_ALL AS POHA
JOIN dbo.VALID_PO_PROJECTS
ON dbo.VALID_PO_PROJECTS.segment1 = POHA.segment1

but I think it will run very slow.

You could also use dynamic to construct a list of the values for segment1 in VALID_PO_PROJECTS and pass it as a "list" using an IN clause- but if the IN clause is large I think that will be unworkable too.

Or you could pull the "changed data" from oracle, and "apply" the changes to a local staging table, and then JOIN in the normal way. In effect you will have a complete local copy of the oracle table - which may be huge and thus unworkable. Compromises will be required I'm afraid.
Go to Top of Page

yosiasz
Master Smack Fu Yak Hacker

1635 Posts

Posted - 2010-12-03 : 19:29:40
dump n rows to n csv, xml files. then SSIS into your destination table
if this is a continuous thing initial pull will be big and slow subsequent pulls will be nice and fast


If you don't have the passion to help people, you have no passion
Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2010-12-03 : 19:49:39
quote:
You could also use dynamic to construct a list of the values for segment1 in VALID_PO_PROJECTS and pass it as a "list" using an IN clause
I second this, I had to do the same thing with an Oracle data warehouse a number of years ago. It's aggravating but the performance is about as good as you'll get.
quote:
but if the IN clause is large I think that will be unworkable too
I had to break up my IN list into 1000-2000 element chunks and run multiple statements to get them all, but it worked and was quite fast. It's difficult to do as dynamic SQL using OPENQUERY, my recommendation is to generate all the SQL to a script file and run the script via the sqlcmd utility, or run it manually. OPENQUERY will probably give better performance than using 4-part references.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-04 : 02:13:18
That's good to hear Rob. I've never had to do it with anything truly large, but I'll keep it in mind for when I do have to
Go to Top of Page

pwvailla
Starting Member

31 Posts

Posted - 2010-12-05 : 17:55:48
Rob,

I am going against a 2 million row Oracle Purchasing ERP table... VERY slow going. How would your example of 1000-2000 record chunks look?

I have been wondering if I could learn to use a stored procedure or something that would run a query against the 2M table for each of the 50,000 valid POs that I need data for. 1-2000 rows sounds better but I am not sure how to do this since I am new to SQL Server 2005.

quote:
Originally posted by robvolk

quote:
You could also use dynamic to construct a list of the values for segment1 in VALID_PO_PROJECTS and pass it as a "list" using an IN clause
I second this, I had to do the same thing with an Oracle data warehouse a number of years ago. It's aggravating but the performance is about as good as you'll get.
quote:
but if the IN clause is large I think that will be unworkable too
I had to break up my IN list into 1000-2000 element chunks and run multiple statements to get them all, but it worked and was quite fast. It's difficult to do as dynamic SQL using OPENQUERY, my recommendation is to generate all the SQL to a script file and run the script via the sqlcmd utility, or run it manually. OPENQUERY will probably give better performance than using 4-part references.

Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2010-12-05 : 21:59:53
Essentially you build a series of SQL statements like so:
INSERT INTO #TEMP
SELECT * FROM OPENQUERY(gpspgjp1, 'SELECT po_header_id, segment1, vendor_id, vendor_site_id
FROM PO.PO_HEADERS_ALL WHERE PO.PO_HEADERS_ALL.segment1 IN(1001,1002,1003,...,2000)')
You'll replace the portion in red with a list of segment IDs from DBO.VALID_PO_PROJECTS. I'd recommend starting with 1000 IDs at most and seeing what the performance is like. If the Oracle table is indexed on that column then pulling only 1000 at a time should be quick, even if there are millions of rows. You can then try 2000, 5000 or even more, but you'll reach some practical limit short of your total of 50,000.

I'd recommend building these statements by hand and running them before worrying about automating or generating it dynamically. You can do SELECT TOP 1000 segment1 + ',' FROM DBO.VALID_PO_PROJECTS and pasting the results into the red portion of the query above. If you find that 1000 take to long, reduce it to 500, or 200 or whatever provides reasonable performance.

Go to Top of Page

pwvailla
Starting Member

31 Posts

Posted - 2010-12-06 : 09:59:55
Rob,

Things are starting to sink into my reptile brain. OPENQUERY executes as if it is on the Oracle box, therefore Oracle SQL is used?

I am getting the "ORA-01722: invalid number message. This is a VARCHAR(20) field on Oracle. Is there an equivalent statement for the VARCHAR statement you would use? nI have tried to_number, enclosing in quotes....


quote:
Originally posted by robvolk

Essentially you build a series of SQL statements like so:
INSERT INTO #TEMP
SELECT * FROM OPENQUERY(gpspgjp1, 'SELECT po_header_id, segment1, vendor_id, vendor_site_id
FROM PO.PO_HEADERS_ALL WHERE PO.PO_HEADERS_ALL.segment1 IN(1001,1002,1003,...,2000)')
You'll replace the portion in red with a list of segment IDs from DBO.VALID_PO_PROJECTS. I'd recommend starting with 1000 IDs at most and seeing what the performance is like. If the Oracle table is indexed on that column then pulling only 1000 at a time should be quick, even if there are millions of rows. You can then try 2000, 5000 or even more, but you'll reach some practical limit short of your total of 50,000.

I'd recommend building these statements by hand and running them before worrying about automating or generating it dynamically. You can do SELECT TOP 1000 segment1 + ',' FROM DBO.VALID_PO_PROJECTS and pasting the results into the red portion of the query above. If you find that 1000 take to long, reduce it to 500, or 200 or whatever provides reasonable performance.



Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2010-12-06 : 10:31:14
Yes, OPENQUERY runs it on the Oracle box and returns the results. There is minimal overhead compared to using 4-part names with a JOIN. The SQL does need to conform to Oracle syntax, but what you're using here is mostly identical to SQL Server, and you're generating most of it anyway.

Try this to get the correct quotes on the varchar data:
SELECT TOP 1000 '''''' + segment1 + ''''',' FROM DBO.VALID_PO_PROJECTS
Paste those results into the query as I described earlier.
Go to Top of Page

pwvailla
Starting Member

31 Posts

Posted - 2010-12-06 : 16:27:52
Rob,

Still not able to get this working but I was able to get this to run:

USE GPO_PC
SELECT * FROM OPENQUERY
(gpspgjp1,
'SELECT
po_header_id,
po_line_id,
line_num,
unit_price,
quantity,
unit_meas_lookup_code,
item_description,
attribute3,
attribute4,
attribute6,
attribute5,
closed_flag,
closed_date,
cancel_flag,
cancel_date
FROM po.po_lines_all')

718934 rows retrieved in 7:40 seconds. I can live with that as the system I will replace took hours to get the subset of data I really need.

How can I INSERT this data into a table on SQL Server 2005? If I can do that, then I am (no you are) a rock star.

Where would the INSERT INTO dbo.PO_LINES_ALL (SQL Server tablename) go?

Or is there a way to write the query results to a file on my SQL Server disk and then parse it for the insert (if that were faster).

Thanks - Paul


quote:
Originally posted by robvolk

Yes, OPENQUERY runs it on the Oracle box and returns the results. There is minimal overhead compared to using 4-part names with a JOIN. The SQL does need to conform to Oracle syntax, but what you're using here is mostly identical to SQL Server, and you're generating most of it anyway.

Try this to get the correct quotes on the varchar data:
SELECT TOP 1000 '''''' + segment1 + ''''',' FROM DBO.VALID_PO_PROJECTS
Paste those results into the query as I described earlier.


Go to Top of Page

robvolk
Most Valuable Yak

15732 Posts

Posted - 2010-12-06 : 16:43:36
This should work:
INSERT INTO dbo.PO_LINES_ALL
SELECT * FROM OPENQUERY
(gpspgjp1,
'SELECT
po_header_id,
po_line_id,
line_num,
unit_price,
quantity,
unit_meas_lookup_code,
item_description,
attribute3,
attribute4,
attribute6,
attribute5,
closed_flag,
closed_date,
cancel_flag,
cancel_date
FROM po.po_lines_all')
If the columns differ use an explicit column list in the INSERT clause.
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2010-12-07 : 03:45:49
"If the columns differ use an explicit column list in the INSERT clause."

I suggest you do that even if they do match - in case someone adds a new column to dbo.PO_LINES_ALL in the future.
Go to Top of Page
   

- Advertisement -