 ETL tracking of data


Vivaldi
Constraint Violating Yak Guru

298 Posts

Posted - 2006-08-22 : 10:38:54
Hello,

I am starting to do a great deal of retrieving data and then publishing it to multiple systems in a multitude of ways (some get data via ODBC, others via flat files).

So I have tables A, B, and C that hold data other systems want (all three tables have a row_id as a unique internal key).

I have "subscribers" 1 (mainframe) and 2 (AS/400) - not in the SQL replication sense of the word - that want to retrieve data at different times and in different amounts.

So I developed a control table that holds which system, which table, and when they got the data. I then have a child table that tracks which rows were taken for a given control record.
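It looks roughly like this (a minimal sketch - the real table and column names differ):

-- Sketch only: names are illustrative.
CREATE TABLE ExtractControl (
    control_id   int IDENTITY(1,1) PRIMARY KEY,
    system_name  varchar(50)  NOT NULL,            -- e.g. 'Mainframe', 'AS400'
    table_name   varchar(128) NOT NULL,            -- e.g. 'A', 'B', 'C'
    extracted_at datetime     NOT NULL DEFAULT GETDATE()
)

CREATE TABLE ExtractControlDetail (
    control_id int NOT NULL REFERENCES ExtractControl(control_id),
    row_id     int NOT NULL,   -- the internal key of the row that was taken
    PRIMARY KEY (control_id, row_id)
)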

The problem is that you end up with a lot of rows in your child table, because every row is tracked individually for each system, and I am adding more tables and more systems all the time. So adding a column for each system to each table is not really a good option.

Has anyone ever dealt with trying to track who has what data?

Just looking for some additional ideas.

nr
SQLTeam MVY

12543 Posts

Posted - 2006-08-22 : 10:54:34
Do you care which rows they have retrieved? They are retrieving the data, so it should be up to them to keep track of what they have accessed. It would be different if you were pushing the data - then you would be controlling it and so should keep track.

I would just log which tables they have retrieved from, and maybe how many rows, so that you can check that what they are doing is reasonable.

By "they" I mean the remote systems - it doesn't matter if the same person is controlling them.
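A log along these lines would probably be enough (a sketch; the names are made up):

-- One row per retrieval: who, what, how much, when.
CREATE TABLE RetrievalLog (
    log_id       int IDENTITY(1,1) PRIMARY KEY,
    system_name  varchar(50)  NOT NULL,
    table_name   varchar(128) NOT NULL,
    rows_sent    int          NOT NULL,
    retrieved_at datetime     NOT NULL DEFAULT GETDATE()
)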


==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Vivaldi
Constraint Violating Yak Guru

298 Posts

Posted - 2006-08-22 : 10:58:27
quote:
Originally posted by nr

Do you care which rows they have retrieved?



Yeah, we do. The problem is that some of it we literally push via a flat file and FTP, and if the receiving system has trouble, or deletes the file, or..., they want to be able to recreate a batch. And this exists for nearly every scenario. We have started to fiddle with the notion of an internal datetime stamp - just set up a batch and retrieve the range - but I haven't worked through the scenarios yet.
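The rough idea is something like this (just a sketch, and it assumes a modified_at datetime column maintained on each source table):

-- Sketch only: assumes table A carries a modified_at column.
DECLARE @from datetime, @to datetime

-- Pick up where the last batch for this subscriber/table left off.
SELECT @from = ISNULL(MAX(extracted_at), '19000101')
FROM ExtractControl
WHERE system_name = 'Mainframe' AND table_name = 'A'
SET @to = GETDATE()

SELECT row_id, modified_at          -- plus whatever columns they need
FROM A
WHERE modified_at > @from AND modified_at <= @to

-- Recreating a lost batch is just rerunning with the same range.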



________________________________________________
"Wow, you have a master's degree in psychology? I will order a vodka martini, thanks"

nr
SQLTeam MVY

12543 Posts

Posted - 2006-08-22 : 11:31:16
For that, sending the file is your commitment. It should be up to them to deal with the loading. Have a look at my site for loading text files. Load the file and move it to an archive folder - then if you need to load it again it is in the archive folder; just move it back to the receive folder.

On your side do the same: extract to a text file, ftp everything in that folder, and move it to an archive folder. The archive folder becomes a record of what you have sent, and anything can be resent by moving it back to the send folder (send via a job that sends everything in the send folder).

I also keep an image of the data sent in a table but that depends on how much data is sent.

So the send side breaks down into procs/jobs like this:

Job 1
- Get a batch id and control the process.
- Get the ids of the data to send (and perhaps the data itself) and store them in a table with the batch id.
- Format the data to send and store the row images in a table with the batch id.
- Extract the data in the output image table to a text file in the send folder for that destination.
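In outline, Job 1 could look like this (a rough sketch, not a definitive implementation - it assumes a SendBatch header table and a SendImage row-image table, and every name and path is invented):

-- Get a batch id and record what is going out under it.
DECLARE @batch_id int
INSERT INTO SendBatch (system_name, table_name, created_at)
VALUES ('Mainframe', 'A', GETDATE())
SET @batch_id = SCOPE_IDENTITY()

-- Store the formatted row images with the batch id.
INSERT INTO SendImage (batch_id, row_id, line_text)
SELECT @batch_id, a.row_id,
       CONVERT(varchar(12), a.row_id) + ',' + a.some_col   -- formatting illustrative
FROM A a
WHERE NOT EXISTS (SELECT * FROM SendImage s WHERE s.row_id = a.row_id)

-- Extract the batch to a text file in the send folder (bcp via xp_cmdshell).
DECLARE @cmd varchar(500)
SET @cmd = 'bcp "SELECT line_text FROM mydb..SendImage WHERE batch_id = '
         + CONVERT(varchar(12), @batch_id)
         + '" queryout D:\etl\Mainframe\send\batch_'
         + CONVERT(varchar(12), @batch_id) + '.txt -c -T'
EXEC master..xp_cmdshell @cmd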

Job 2
- Send all files in the send folder to the ftp destination and move the files to an archive folder.
- This is also where files should be zipped/encrypted if needed - have separate send and archive folders for each process, and have each process work on all files in its folder rather than being controlled per file.
- If possible, it should also check the ftp destination after each send to ensure that the file arrived.
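Job 2 then only has to shell out, e.g. (sketch - the paths and the ftp script file are invented):

-- Assumes D:\etl\Mainframe\ftp.txt is a standard "ftp -s:" command script
-- (open/user/put) prepared for this destination.
EXEC master..xp_cmdshell 'ftp -s:D:\etl\Mainframe\ftp.txt'
EXEC master..xp_cmdshell 'move D:\etl\Mainframe\send\*.txt D:\etl\Mainframe\archive\'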

I've implemented this process for a number of companies and it works quite well, and it keeps the processes nicely independent of each other. If the jobs and processes are set up as above you can see how it's doing and what has failed just by looking at the folders. It also eases development, because you can test each process independently.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Vivaldi
Constraint Violating Yak Guru

298 Posts

Posted - 2006-08-22 : 12:20:47
Ok

We do the same thing with files we receive and have a pretty solid process that does that, and that alone.

Hadn't thought about doing that for files I was sending. That would cut back on our data tracking a bit.

Thanks for the thoughts. Good to get a different perspective.

quote:
Originally posted by nr

For that, sending the file is your commitment. It should be up to them to deal with the loading. [...]



________________________________________________
"Wow, you have a master's degree in psychology? I will order a vodka martini, thanks"

nr
SQLTeam MVY

12543 Posts

Posted - 2006-08-22 : 12:48:08
I treat sending and receiving as similar processes and do something very similar for the receiving.
The question is whether you have a root folder per remote system or per process.
I usually go for the system, but the folders are all stored in a table so it is easily changeable.
I also control the process via metadata, but it sounds like you are doing something similar.
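For instance (a sketch - just the shape of it, not an actual schema):

-- One row per remote system: where its folders live and how to reach it.
CREATE TABLE ProcessConfig (
    system_name    varchar(50)  NOT NULL PRIMARY KEY,
    root_folder    varchar(260) NOT NULL,   -- e.g. 'D:\etl\Mainframe'
    send_folder    varchar(260) NOT NULL,
    archive_folder varchar(260) NOT NULL,
    ftp_script     varchar(260) NULL        -- "ftp -s:" script for this system
)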

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.

Vivaldi
Constraint Violating Yak Guru

298 Posts

Posted - 2006-08-22 : 14:29:45
In order for this to work in my world, a file server will be required, as far too many systems and delivery methods are in use now - hence why the original design stored everything in the database. I will have to build a system that can essentially move the file given a variety of delivery methods (network copies/FTP/old X.25-type stuff).

I am putting some thoughts to paper on how to achieve this, so that the units of work will be small and simple, and flexible enough that it doesn't take tons of time to add a new system and file subscription. The "event" or workflow will have to be data driven so that management/support of the process won't be rocket science.
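Roughly, I picture driving it from a subscription table like this (names invented for the sake of the example):

-- One row per (system, table) subscription, including how to deliver it.
CREATE TABLE FileSubscription (
    subscription_id int IDENTITY(1,1) PRIMARY KEY,
    system_name     varchar(50)  NOT NULL,
    table_name      varchar(128) NOT NULL,
    delivery_method varchar(20)  NOT NULL,   -- 'COPY', 'FTP', 'X25', ...
    destination     varchar(260) NOT NULL    -- UNC path, ftp host, etc.
)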

Have you ever seen any discussions on such a topic? We are certainly not the first to have to deal with multiple subscribers on different systems with different delivery protocols.

quote:
Originally posted by nr

I treat sending and receiving as similar processes and do something very similar for the receiving. [...]



________________________________________________
"Wow, you have a master's degree in psychology? I will order a vodka martini, thanks"

nr
SQLTeam MVY

12543 Posts

Posted - 2006-08-23 : 07:26:58
I just code what is needed for a company.
I've never seen this published anywhere, but I've taught quite a few people in the UK how to implement it.

I keep meaning to write an article on it but never get round to it.

It's something that is particularly useful for companies that are uncontrolled or have inexperienced staff, as it keeps the systems separate and stops them trying to control one system from another. I have never managed to get anyone to realise that that is what's happening - they seem to think that their processes suddenly become easier to manage (and define) by coincidence.

==========================================
Cursors are useful if you don't know sql.
DTS can be used in a similar way.
Beer is not cold and it isn't fizzy.