
 Best way to process a streaming interface into SQL


smccreadie
Aged Yak Warrior

505 Posts

Posted - 2004-01-18 : 21:51:19
I need to build an interface from an external transactional system into SQL Server. The interface is essentially keeping the two systems in sync. The data coming from the external database (Informix) is streaming (the order of the actions is important), yet I receive it in batches (every 10 minutes). What I need to do is order the rows by their timestamp and iterate one row at a time, applying the action indicated in each row to the SQL copy of the data (each row tells me what action took place: table name, column changed, old value, new value).

For example, I may get several rows for a given order number showing a modify action, which I then need to translate into an UPDATE statement in SQL. Sometimes a column can be modified several times within the 10 minutes, and thus the order is important.
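To make that concrete, the incoming change log looks roughly like this (illustrative names and types, not the actual Informix feed):

CREATE TABLE ChangeLog (
    rec_id      int IDENTITY(1,1) NOT NULL,  -- arrival order
    change_ts   datetime     NOT NULL,       -- timestamp from the source system
    action_type char(1)      NOT NULL,       -- 'I' = insert, 'U' = update, 'D' = delete
    table_name  sysname      NOT NULL,       -- target table in the SQL copy
    key_value   varchar(50)  NOT NULL,       -- e.g. the order number
    column_name sysname      NULL,           -- NULL for inserts and deletes
    old_value   varchar(255) NULL,
    new_value   varchar(255) NULL
)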

I pride myself on having no production cursor-based solutions out there, but this one is screaming cursor. Anyone got a set-based approach to this one, or another idea?

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2004-01-20 : 10:26:01
You rang?

quote:
For example, I may get several rows for a given order number showing a modify action, which I then need to translate into an UPDATE statement in SQL. Sometimes a column can be modified several times within the 10 minutes, and thus the order is important.

Sort of. Multiple updates to the same row can be aggregated; you're only concerned with the latest change to a given column.

I would approach the problem the following way:
(1) At the end of the batch, what rows exist? (process the consequences of the various INSERTs and DELETEs in time order to get the latest state of a given order). Disregard the deleted rows in future processing.

(2) At the end of the batch, what are the final column values for each row? Use the last change for each column and then run an update (I'd imagine you'd create a temp table w/ order # and new values, and run an update joining your prod table to this temp table).
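A minimal sketch of step (2), reusing the illustrative ChangeLog table from the first post plus a hypothetical prod table Orders(order_no, some_col):

-- Keep only the last update per (row, column); assumes change_ts is unique
-- per key/column - otherwise use something like an identity as a tiebreaker
SELECT  c.key_value, c.column_name, c.new_value
INTO    #final
FROM    ChangeLog c
JOIN    (SELECT key_value, column_name, MAX(change_ts) AS last_ts
         FROM   ChangeLog
         WHERE  action_type = 'U'
         GROUP  BY key_value, column_name) x
    ON  x.key_value   = c.key_value
    AND x.column_name = c.column_name
    AND x.last_ts     = c.change_ts
WHERE   c.action_type = 'U'

-- One set-based UPDATE per target column, prod table joined to the temp table
UPDATE  o
SET     o.some_col = f.new_value
FROM    Orders o
JOIN    #final f ON f.key_value = o.order_no
WHERE   f.column_name = 'some_col'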

Jonathan
Gaming will never be the same

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-20 : 10:56:24
Keep the receipt of data separate from the processing, i.e. don't rely on the batch size being received - define your own processing batch size. Then you can vary it at will and won't get into trouble if you don't process received records for a few days. Setting the batch size down to one means that you will process record by record.

If you can use an identity on the table then that's probably easiest but any single increasing indexed column will do.

Take a batch of recs - use a last rec_id or delete the recs and just process from the top.
Putting into a temp table is easiest.
Get rid of all those recs for rows that have a delete later.
Get rid of all those updates for cols that have an update for the same row later.
Apply the ones that only have a single entry per rec - hopefully that will get rid of a lot.
You may find that after that processing one by one is OK.
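Roughly like this, with the illustrative ChangeLog from the first post (rec_id being the identity; all names hypothetical):

DECLARE @batch_size int, @last_rec_id int
SELECT  @batch_size = 1000, @last_rec_id = 0  -- in practice, read @last_rec_id from a control table

-- Take one batch into a temp table, driven by the identity
SET ROWCOUNT @batch_size
SELECT *
INTO   #batch
FROM   ChangeLog
WHERE  rec_id > @last_rec_id
ORDER  BY rec_id
SET ROWCOUNT 0

-- Get rid of recs for rows that have a delete later in the batch
DELETE b
FROM   #batch b
WHERE  EXISTS (SELECT * FROM #batch d
               WHERE  d.key_value = b.key_value
               AND    d.action_type = 'D'
               AND    d.rec_id > b.rec_id)

-- Get rid of updates that have a later update to the same row and column
DELETE b
FROM   #batch b
WHERE  b.action_type = 'U'
AND    EXISTS (SELECT * FROM #batch u
               WHERE  u.key_value   = b.key_value
               AND    u.column_name = b.column_name
               AND    u.action_type = 'U'
               AND    u.rec_id > b.rec_id)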


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2004-01-20 : 12:06:59
Well, one of the basic tenets of set-based processing is not caring about the size of the sets. The batch size is of course irrelevant.

Jonathan
Gaming will never be the same

smccreadie
Aged Yak Warrior

505 Posts

Posted - 2004-01-20 : 12:07:02
OK, thanks for the tips. I'll give it a shot.

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-20 : 12:12:53
>> Well, one of the basic tenets of set-based processing is not caring about the size of the sets
But for an implementation you do need to care - the amount of data you work on should be tailored to the amount of memory available.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2004-01-20 : 12:19:36
If the batch size increases, the processing time increases - no matter. He hasn't specified any performance requirements, so I'll assume he has adequate hardware and that he'll write the stored procedure efficiently.

Jonathan
Gaming will never be the same

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-20 : 12:35:15
No matter what the hardware, if you run out of memory and have to page, performance will take a sudden dip.
For something like this, where you may have to stop the processing for a few days then catch up, it is necessary to have a method of regulating the amount of data processed in a single batch, or you could easily end up with corrupt databases.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-20 : 13:35:36
Isn't anyone going to argue about that?

(This was a lot more apparent in <= v6.5, but I still believe that SQL Server is much more sensitive to memory than lesser databases - ones starting with O, for instance - and is faster if written with the amount of memory in mind, but much slower if not.)

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2004-01-20 : 14:16:42
There's no direct correlation between larger data sets and higher memory consumption. Some massive data sets can be operated on quickly and much smaller ones take longer; it depends on the query operators selected in the execution plan.

I would offer as well that many folks have differing opinions on what constitutes "enough" memory. 2Gb? 4Gb? 16Gb?

Jonathan
Gaming will never be the same

SamC
White Water Yakist

3467 Posts

Posted - 2004-01-20 : 18:29:29
Ding! End of Round 1.

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-21 : 05:55:36
>> I would offer as well that many folks have differing opinions on what constitutes "enough" memory. 2Gb? 4Gb? 16Gb?

What counts as "enough" memory depends on the implementation.
Consider a table scan. If the table can be held in memory, then (the second time) the scan will be much faster - thousands of times faster - than if it has to be read in every time.
The same goes for using indexes - if the index entries are already in memory, the operation will be faster than if they have to be read from disk.
That answers your first point: no matter what size the data you are working on, the process will be faster if the data needed is already in memory.

The art is to tailor the system so that the data being worked on is already in memory - i.e. complete all processes on the data before purging it and replacing it with the next lot.
Hence setting the batch size - get a batch into memory and run all the processes on it, making sure that those processes do not purge your batch. The more memory you have, the bigger the batch size you can probably deal with.
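As a sketch, the regulating loop might look like this - ProcessBatch is a made-up procedure standing in for the pruning and applying described earlier, and @batch_size is what you tune to the memory available:

DECLARE @batch_size int, @last_rec_id int, @max_rec_id int
SELECT  @batch_size = 1000, @last_rec_id = 0

SELECT  @max_rec_id = MAX(rec_id) FROM ChangeLog

WHILE   @last_rec_id < @max_rec_id
BEGIN
    -- pull the next @batch_size recs, run ALL processes on them while they
    -- are in memory, and return the last rec_id handled to advance the loop
    EXEC ProcessBatch @after_rec_id = @last_rec_id,
                      @batch_size   = @batch_size,
                      @last_done    = @last_rec_id OUTPUT
END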

Writing queries assuming that the amount of data is irrelevant is a recipe for disaster - and it's why a lot of developments have problems: they are developed on small amounts of data, then take too long in system test with real-life volumes.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

setbasedisthetruepath
Used SQL Salesman

992 Posts

Posted - 2004-01-21 : 09:13:35
Ok ... time to put this one to bed. The market's opening soon and I'm in the mood to make money.

Clearly you've been burned by this in the past. Some process you wrote, or inherited, wasn't efficient enough to deal with the system load and you jumped through hoops to make it work. I get it. Let's compromise: the system refresh time (in smc's example, 10 minutes) is effectively the batch size. I don't like the idea of explicitly designing for it in the import routine b/c it's not elegant. If he can't process 10 minutes' worth of data at once, shorten the window.

Jonathan
Gaming will never be the same

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-21 : 13:05:05
That's fine as long as he always processes the data when it arrives - but what happens if he has to stop the processing for a couple of days, or gets a large initial import?

I haven't been burnt because I always design with the data volumes in mind.
One of my early systems was rating millions of telecom call recs a day, within a few minutes of each call. I had to run it on my desktop for a few months because the server wasn't reliable - I just reduced the batch size to allow for the lack of memory.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

smccreadie
Aged Yak Warrior

505 Posts

Posted - 2004-01-21 : 13:46:24
Didn't mean to start a controversy, but your points are well taken. Memory shouldn't be an issue, as the batch size will range from 10 to 100 rows (I process every 10 minutes). In the event of downtime or for the initial load, I do have a batch process to import the data quickly. It's only the changes after the batch load that I have to process.

nr
SQLTeam MVY

12543 Posts

Posted - 2004-01-22 : 06:21:00
It's OK - we're just hijacking the thread for a theoretical discussion.

For that amount of data it's not worth spending much time on optimisation - you might even ask if it's easier to process row by row, unless you think it might grow in the future.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.