
 All Forums
 SQL Server 2000 Forums
 SQL Server Development (2000)
 Keeping a change history of a record


rbuckton
Starting Member

4 Posts

Posted - 2004-06-25 : 09:54:32
(First, a description of what I'm doing; questions follow.)

I am working on a medical application and one of my requirements is the ability to track changes to a record. This is not as simple as having a log table somewhere that I write records to via a trigger on changes to the main table; rather, I must be able to query the history to get an exact snapshot of the record at any point in time for review.

The idea I came up with uses a "chaining" type of system, where both the current and historical records are stored in the same table. Queries are constructed to return specific versions of a record as well as just the latest version of a record. A sample schema is below:

Item
----
itemId : uniqueidentifier not null
currentItemId : uniqueidentifier not null primary key
previousItemId : uniqueidentifier not null
version : int not null default(1)
archived : bit not null default(0)
lastModifiedOn : datetime not null
lastModifiedBy : nvarchar(255) not null
[actual record data...]

The idea is that when a record is first created, the itemId, currentItemId and previousItemId are all set to the same value. When a record changes, a new row is inserted into the table instead of updating in place, so its history is maintained: the new row gets a new currentItemId, the itemId stays the same, and the previousItemId is set to the currentItemId of the previous version. With any change the record gets a new version number. Finally, since data is never permitted to be deleted, the archived bit is set to 1 instead, which signals the end-of-life of the record.
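A minimal T-SQL sketch may make the chaining scheme concrete. The schema matches the sample above; the [data] column and @someItemId are hypothetical stand-ins for the actual record data and the record being edited:

```sql
-- Sketch only: the chaining scheme described above, SQL Server 2000 syntax.
-- [data] stands in for the actual record columns.
CREATE TABLE Item (
    itemId         uniqueidentifier NOT NULL,              -- stable logical id
    currentItemId  uniqueidentifier NOT NULL PRIMARY KEY,  -- id of this version
    previousItemId uniqueidentifier NOT NULL,              -- id of prior version
    version        int NOT NULL DEFAULT(1),
    archived       bit NOT NULL DEFAULT(0),
    lastModifiedOn datetime NOT NULL,
    lastModifiedBy nvarchar(255) NOT NULL,
    data           nvarchar(100) NULL
)

-- "Updating" a record really means inserting a new version that chains
-- back to the latest one; the old row is left untouched.
DECLARE @someItemId uniqueidentifier, @newId uniqueidentifier
SET @newId = NEWID()

INSERT Item (itemId, currentItemId, previousItemId, version,
             archived, lastModifiedOn, lastModifiedBy, data)
SELECT itemId, @newId, currentItemId, version + 1,
       0, GETDATE(), SUSER_SNAME(), N'new value'
FROM Item A
WHERE A.itemId = @someItemId
  AND A.version = (SELECT MAX(version) FROM Item B WHERE B.itemId = A.itemId)
```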

I constructed some queries to give me the data I need, such as getting only the latest records using a correlated sub-query comparing version against MAX(version). However, I am not sure about the speed of such a solution. I cannot say I am very familiar with SQL performance monitoring or load testing, so I cannot easily say how fast these queries would be, especially as the database grows (scaling by factors of 10: 100 items, 1,000 items, 10,000 items, etc.).

My question is: has anyone implemented a similar system in the past, and what caveats should I look for? Is this idea a sound one, and will it perform well?

The database platform is SQL Server 2000 (currently MSDE for a single-user app, but it will likely migrate to the full Server edition to handle multiple users and heavier load).

The actual solution is much more complex as well (related tables with independent chaining history that interconnect while still maintaining an accurate snapshot).

BTW, a sample query for getting all latest-version records:

SELECT *
FROM Item A
WHERE version = (SELECT MAX(version) FROM Item B WHERE B.itemId = A.itemId)
  AND archived = 0

I also might (maybe?) be able to use an INSTEAD OF trigger on update/delete to do the inserts, but I also need to figure out how that will affect performance and usability.
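For what it's worth, a rough sketch of what such an INSTEAD OF UPDATE trigger could look like against the sample schema; this is one possible shape, not a tested implementation, and the data column is a hypothetical payload:

```sql
-- Sketch: turn in-place updates into new chained versions.
-- Assumes the Item schema from the post; [data] is a hypothetical column.
CREATE TRIGGER trg_Item_InsteadOfUpdate ON Item
INSTEAD OF UPDATE
AS
BEGIN
    -- Insert a new version row instead of modifying the existing one;
    -- deleted holds the pre-update image, inserted the requested values.
    INSERT Item (itemId, currentItemId, previousItemId, version,
                 archived, lastModifiedOn, lastModifiedBy, data)
    SELECT i.itemId, NEWID(), d.currentItemId, d.version + 1,
           i.archived, GETDATE(), SUSER_SNAME(), i.data
    FROM inserted i
    JOIN deleted d ON d.currentItemId = i.currentItemId
END
```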

Any suggestions or comments would be appreciated.

Ron

rbuckton
Starting Member

4 Posts

Posted - 2004-06-28 : 12:13:54
(bump) No comments at all? I am still looking for some suggestions on whether or not this is a good idea. I'll be trying some performance testing soon on my own.

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2004-08-19 : 13:41:08
This is a fantastic question and I am extremely surprised nobody took the time to answer this. I have some experience dealing with this type of problem and my solution differs greatly. I do not have time to post it in depth now, but I will attempt to respond later.

Rob & Page47... what solutions have you implemented that deal with these issues?

DustinMichaels
Constraint Violating Yak Guru

464 Posts

Posted - 2004-08-19 : 13:49:08
Perhaps this will work for you.

Make 2 tables. One called MedicalRecord and one called MedicalRecordHistory.

There is a 1 to many relationship between medical record and medical record history.

The MedicalRecord table would contain an MedicalRecordID and a CurrentHistoryID (which points to the current medical record history row to speed up queries).

Every time you make a change to a medical application you will insert a new row into the MedicalRecordHistory table and update the MedicalRecord's CurentHistoryID with latest Identity value.

If you do something like this, then the only thing you need to do when getting the latest medical record is do an Inner Join between the MedicalRecord and MedicalRecordHistory table using the MedicalRecord's CurrentHistoryID property.
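As a sketch, that lookup would be something like the following. The table and pointer columns are as Dustin names them; MedicalRecordHistoryID and @recordId are assumptions:

```sql
-- Fetch the current version of one record via the stored pointer,
-- avoiding any MAX(version) correlated subquery.
SELECT h.*
FROM MedicalRecord m
INNER JOIN MedicalRecordHistory h
        ON h.MedicalRecordHistoryID = m.CurrentHistoryID
WHERE m.MedicalRecordID = @recordId
```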

Hope this helps.

Dustin Michaels

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2004-08-19 : 13:55:24
BTW, the original post is rather old and the author may no longer care about the answer. However, I was not able to find an in-depth discussion on this topic in the forum, and I think this is definitely an issue that many, many aspiring DBAs have to deal with (or should). This is also an issue that I've had to personally wrestle with, so I am particularly interested in this topic.

If there are some exceptional articles or discussions on this topic, please post the links. Alternately, if you have experience in dealing with this, please share.

Regards

Kristen
Test

22859 Posts

Posted - 2004-08-19 : 14:03:38
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=38604 any good?

I would use an AUDIT table with the same columns as the main table, plus columns for Date, User and Action (insert/update/delete)

I would not put them in the same table. (Should someone with permissions on the main table be able to fiddle with the audit records? Will someone with SELECT permissions be sure to get the current copy of any record, and not get confused by all the intervening edits? On the other hand, if they want an audit report they can use both MAIN and AUDIT tables and will be expecting multiple rows in AUDIT for each MAIN row.)

Kristen

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2004-08-19 : 17:29:45
Kristen & Dustin, thank you for taking the time to provide input.

Kristen, does a delete action absolutely need to have every column of the last history row repeated? Isn't that violating a normal form? Also, say you have to insert/update/delete 500,000+ rows every day: what audit solution provides the best performance? How would you manage this in a production environment to prevent table locks from crippling your real-time processes? What if the tables hold extensive amounts of data and real-time processes need to access the history data as it was at a single point in time? Across separate history tables that can be updated at different intervals?

All of these factors together cannot be solved by the article or the solutions presented.

How would you solve them?

There are many reasons that SQL professionals offer their advice on this forum, but I think one of them is to ultimately learn new things and improve their skills. How many people out there think that their solution for auditing/history is as good as it can be? If you think your solution is adequate, why not improve upon it?

I am attempting to spark a deep thread here that the "masters" of SQL server can be involved in on an intellectual level. I think such a thread can truly help a lot of people. Perhaps an article can be written for this site on the results of this thread.

I understand that perfection is not achievable because, ultimately, it depends on your requirements. However, I have not seen a truly in-depth discussion on this issue, and I seriously think it would be a great benefit to all involved.

I will definitely post my solution when I have a chance.

Kristen
Test

22859 Posts

Posted - 2004-08-20 : 02:34:29
We have a LOG table for debugging. It inserts a row whenever ANY [near-as-dammit!] SProc executes, and UPDATEs that row when the SProc exits. Both of those actions are performed through SProcs themselves, and in turn the INSERT SProc does some name-to-code lookups. That's a fearsome number of actions (0.5M inserts per day) on our busy installations, and a potential bottleneck for table locks, but SQL doesn't seem to break into a sweat on "decent" current hardware.

"does a delete Action absolutely need to have every column of the last history row repeated? Isn't that violating a normal form?"

How does it violate normal form? Anyway, I would not worry about that; I just want to store the data the users need, efficiently.

As it happens we only store DELETEs in the History table. The current row is in the main table.

I think a one-table-stores-all-column-changes approach is just unworkable in practice. The table is huge, the trigger script is huge, and it's an ongoing developer's nightmare as columns are added; plus, reporting is very difficult.

So I'm in the one-audit-table-per-main-table camp. Our applications have AUDIT tables, of this type, on about 50% of their tables. (We see no point auditing tables with a high turnover of data that is likely to be useless in any subsequent post mortem, such as user session data. If we don't get to that within a week or so, before it is purged, it's unlikely to be any use to us; in extremis we could lift it off the backups.)

This approach also allows us to list the previous changes to a record at the bottom of a "record card" style maintenance screen, which in turn allows users to cut & paste from previous edits into the current change: a poor man's undo!

I would be tempted to put the AUDIT tables in a separate database, but then synchronising BACKUPs becomes fraught. So I think it better to have scheduled tasks moving rows out of the main database into an audit database, a little after the fact, to keep DB size down and performance up.

Kristen

JimL
SQL Slinging Yak Ranger

1537 Posts

Posted - 2004-08-20 : 07:53:04
I would agree with Dustin on this. The two-table "current record pointer" would seem to offer the most efficient method of dealing with your scenario. I would go one step further by adding an archive table to move records to when the patient is deceased; this would be indicated by a secondary column in the primary table, and would keep the table at peak efficiency.

Jim
Users <> Logic

ehorn
Master Smack Fu Yak Hacker

1632 Posts

Posted - 2004-08-20 : 08:04:45
Nigel has written a very good audit trigger...

http://www.nigelrivett.net/AuditTrailTrigger.html

Kristen
Test

22859 Posts

Posted - 2004-08-20 : 14:15:27
I'm afraid I don't like that route much. It uses dynamic SQL to work out what has changed (including querying the system tables to see what columns are in play). That has performance and permission implications (fine for systems where that is not an issue, of course).

Then it stores all column changes in a single table. This makes that table huge, and difficult to report on.

I have used that approach before (except that I wrote a SQL script that created the syntax for the trigger for all appropriate columns, so I didn't need to use dynamic SQL). But it still suffered from a huge data table and reporting difficulties.

I have grown more fond of the one-audit-table-per-main-table approach, just storing deleted records.

Our audit tables have two additional leading columns, an "Audit Type" and an "Audit Date", and then the triggers just need something along the lines of:

INSERT dbo.MyTable_Archive
SELECT CASE WHEN I.MyPK IS NULL THEN 'D' ELSE 'U' END,  -- 'D' = deleted, 'U' = updated
       GETDATE(),
       D.*
FROM deleted D
     LEFT OUTER JOIN inserted I
          ON I.MyPK = D.MyPK

Kristen

SqlFriend
Starting Member

26 Posts

Posted - 2004-08-20 : 14:26:15
For databases where the data is not growing extraordinarily fast and the records are not insanely huge, I just keep the entire old record and mark it as deleted. (I also record the time it was deleted.)

Then you have a complete history of every change; it appears in order when sorted by date. You always have the latest one, and it is rapidly accessible.

I don't know if other people do this, but I am so attached to the idea that I often use it (don't delete anything, unless the tables are growing at astronomical rates). For any internal project, this seems ideal. Then there are no questions of who to blame; it is always recorded in the database.
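A sketch of this mark-as-deleted pattern, under assumed names (no schema is given in the post, so every identifier here is illustrative, and RecordID is deliberately not the primary key since it repeats across versions):

```sql
-- "Update" = flag the old version as deleted, then insert the new version.
BEGIN TRANSACTION

UPDATE Record
SET IsDeleted = 1, DeletedOn = GETDATE()
WHERE RecordID = @id AND IsDeleted = 0

INSERT Record (RecordID, IsDeleted, DeletedOn, data)
VALUES (@id, 0, NULL, @newData)

COMMIT

-- The latest copy is always the one not flagged:
SELECT * FROM Record WHERE RecordID = @id AND IsDeleted = 0
-- The full history, in order:
SELECT * FROM Record WHERE RecordID = @id ORDER BY DeletedOn
```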


byrmol
Shed Building SQL Farmer

1591 Posts

Posted - 2004-08-20 : 18:57:08
lazerath,

You will be glad to know that the "masters" of the relational model have tried to answer this: "Temporal Data and the Relational Model" by Date, Darwen and Lorentzos. Basically (there is nothing basic about it; they develop a theory of time in a database), they create an interval data type, define a new normal form (6NF), and give a generic design for implementation. The design would really need to be built into the DBMS itself. Have a look at it.

In one shop I worked in, the decision was made to use an "audit" table for each "current" table. No FKs to the audit table, to allow deletion from the "current" table. Each audit table was partitioned into two tables: one containing data 30 days old or less ("current audit") and the other everything else ("historic audit"). A partitioned view joins all three tables for full querying. The "historic audit" tables were placed on read-only filegroups, and obviously a maintenance task runs once a month to keep the system intact... Worked well enough...
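That layout might be sketched as below. All names are hypothetical, the member tables are assumed to share a column list, and a real SQL Server partitioned view would also want CHECK constraints on the partitioning (date) column:

```sql
-- One audit table per current table, split by age, unioned back together
-- so queries can see the full history through a single view.
CREATE VIEW PatientComplete AS
    SELECT * FROM Patient               -- "current" rows
    UNION ALL
    SELECT * FROM PatientAuditCurrent   -- audit rows, last 30 days
    UNION ALL
    SELECT * FROM PatientAuditHistoric  -- older audit rows (read-only filegroup)
```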

DavidM

"Always pre-heat the oven"

derrickleggett
Pointy Haired Yak DBA

4184 Posts

Posted - 2004-08-21 : 15:19:41
I like a variation of SqlFriend's solution. I often design to keep the data in the table with a delete flag, then have a batch process at night move it into a history table, or a warehouse if one exists. This way indexes don't get holes shot in them during the day. You can use views to look at the combined results.
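A sketch of that nightly sweep, with invented table and column names:

```sql
-- Nightly batch: move rows flagged during the day into the history table,
-- so the live table's indexes stay compact.
BEGIN TRANSACTION

INSERT PatientHistory
SELECT * FROM Patient WHERE IsDeleted = 1

DELETE Patient WHERE IsDeleted = 1

COMMIT

-- Combined view for reporting across live + history:
CREATE VIEW PatientAll AS
    SELECT * FROM Patient
    UNION ALL
    SELECT * FROM PatientHistory
```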

MeanOldDBA
derrickleggett@hotmail.com

When life gives you a lemon, fire the DBA.

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2004-08-24 : 12:25:17
First, I'd like to thank everyone who has weighed in on the subject. To preface my own solution here: I was hoping I'd have enough time to put together a full sample with DDL, code and test data, but that will have to wait. I'll explain the way we do it; hopefully someone will have feedback that can improve it, and I hope this can be useful to someone else.

Let's just take the medical example as noted above and say we are building out a design to store patient information. You could have the following tables:

Patient
PatientPhysicalAddress
PatientMailingAddress

Table Patient has a Uniqueidentifier as the primary key (PatientID). Let's say that a Patient must always have a Mailing Address, but may have an alternate Physical Address where packages may be shipped.

So, if we want to audit these tables, the standard approach is to add (at least) one audit table per "current" table:

PatientAudit
PatientPhysicalAddressAudit
PatientMailingAddressAudit

Each audit table is composed of the same data elements as its current table, and additionally contains a DateStamp and an ActionCode. The primary key is a composite of the PatientID and DateStamp.

Now, here's where some of your solutions were hazy: how to manage the data in the tables. For us, we manage the audit and "current" tables by only ever inserting into the audit tables. There are two triggers on each audit table. The first is an INSTEAD OF INSERT that compares the current row to the inserted row and only inserts "changed" rows. It does this by comparing against datatype "defaults", which after much testing was the fastest method (the drawback is that some datatypes have a "default" that could potentially be a valid value). The second trigger is an AFTER INSERT that inserts, updates and deletes information in the "current" table.

Now, why did we choose to do it this way? Well, we wanted an audit solution that would scale to millions of records easily, so the first design decision was to keep the "current" table much like a hard view. This way, the trigger that compares the inserted row to the "current" row doesn't access a row in the table it is inserting into (thereby avoiding locks). In addition, when I need access to a current row, I don't need to join my audit table to my "current" key (which is the method recommended by Dustin and approved by JimL; that innocent inner join on potentially millions of rows kills performance on my test systems).

We chose to implement information entry strictly as inserts because that allows us to perform all operations (insert, update and delete) in as large a batch as we want without duplicating logic, thereby simplifying logic everywhere. In addition, I've found that it's very fast. I encourage you to set up test cases and pit this against your method.
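A rough sketch of the first of those two triggers, the INSTEAD OF INSERT filter. All column names here are invented, and the ISNULL sentinels stand in for the "datatype default" comparison described above, including its caveat that a default can collide with a real value:

```sql
-- Only let audit rows through when something actually changed versus the
-- current row; compares against the live table, not the audit table itself.
CREATE TRIGGER trg_PatientAudit_Insert ON PatientAudit
INSTEAD OF INSERT
AS
BEGIN
    INSERT PatientAudit (PatientID, DateStamp, ActionCode, Name, InsuranceFK)
    SELECT i.PatientID, i.DateStamp, i.ActionCode, i.Name, i.InsuranceFK
    FROM inserted i
    LEFT JOIN Patient p ON p.PatientID = i.PatientID
    WHERE p.PatientID IS NULL   -- brand-new patient: always audit
       OR ISNULL(p.Name, N'') <> ISNULL(i.Name, N'')
       OR ISNULL(p.InsuranceFK, 0) <> ISNULL(i.InsuranceFK, 0)
END
```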

So, that's where I was about three months ago. I have since been working on improving this to solve a few business problems.

I will illustrate:

So, let's say that table Patient has a foreign key column, InsuranceFK. Let's also say that a Patient can have zero-to-many medical claims that are paid based on the type of insurance the Patient has at the time of the claim.

Now, the front-end applications display claim information, but they need to know things about the Patient in two different states: the way the Patient was at the time the claim was processed, and the way the Patient is now.

Let's say that storing the key to the current audit row isn't an option at the time the claim is entered into the system. So, you are stuck trying to associate the claim with the correct audit record in the past. You can write a query to join a claim based on its date with the date of the audit row, but that is a very expensive query. Furthermore, a Patient's address information may have changed, and they may have added or removed a physical address at a different time than their core information changed. So, you would have to run that expensive query for each related table associated with a Patient to get the complete picture of the Patient. As a standard method to solve this, you could run those expensive queries when you load your claims information and just keep the keys.

Enter the "chronicle" tables. Their job is to keep a datetime range during which a given set of information related to an entity has not changed. So, the following tables would be added to the schema:

PatientChronicle
PatientPhysicalAddressAuditChronicle
PatientMailingAddressAuditChronicle

Notice there are two types of Chronicle tables. PatientChronicle is the "master" table and at a minimum contains the following fields:

PatientChroniclePK int identity(1,1)
PatientFK Uniqueidentifier
Datestamp datetime
BegStamp datetime
EndStamp datetime

The PatientFK and Datestamp represent the foreign key to the PatientAudit record associated with this record. As long as there is a "current" row for this patient, a row will always exist with an EndStamp of '12/31/9999', or alternately NULL (if you wish to incur the small penalty of ISNULL or COALESCE on all your queries against the chronicle). If the Patient has been deleted (which would pretty much only happen for data-entry errors), the last row for that patient has an EndStamp of the delete action.

Each intersection table (...AuditChronicle) will contain the following fields:

PatientChronicleFK int
DateStamp datetime

where the DateStamp is the missing key element to the audit table when combined with the PatientFK in the "master" chronicle.

For the chronicle tables to be useful, they have to represent a snapshot of the Patient information across all related tables. Therefore, any time data changes for a Patient, a new record is inserted into the chronicle.

These chronicle tables can be maintained by triggers. On the "master" audit table (PatientAudit), you need to add two statements to your AFTER trigger: one that inserts a new chronicle record for all inserted rows, and one that updates the EndStamp of the "current" chronicle record when the action is a delete.

The "master" chronicle table (PatientChronicle) needs two triggers as well: an INSTEAD OF INSERT trigger to close the last chronicle record by setting its EndStamp to the new BegStamp (minus 3 ms), and an AFTER INSERT trigger to insert all the AuditChronicle intersection records as they were for the previous "master" chronicle record (the one whose EndStamp = BegStamp - 3 ms).

Finally, you need to add four statements to each non-master audit table (physical and mailing address). These statements fix up the associated AuditChronicle intersections, and could be placed at the end of your AFTER INSERT: an insert into the "master" chronicle table if the DateStamp on your audit record is new, plus insert, update and delete statements to fix the current related intersection record.

Now, in order for this to work flawlessly, we keep the same DateStamp for related audit tables when they correspond to the same "batch" of updates. That way you don't end up with multiple chronicle records for a single logical update.
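Put together, a point-in-time lookup through the chronicle becomes a simple range probe rather than an expensive date-matching join. The table and column names come from the post; the exact join keys are my guess at the design:

```sql
-- Snapshot of a patient's core audit row as of @asOf, via the chronicle.
SELECT pa.*
FROM PatientChronicle pc
JOIN PatientAudit pa
        ON pa.PatientID = pc.PatientFK
       AND pa.DateStamp = pc.Datestamp
WHERE pc.PatientFK = @patientId
  AND pc.BegStamp <= @asOf
  AND pc.EndStamp >  @asOf
```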

To make this all usable, I've been putting together a .NET client program to generate all those additional structures and the code to maintain the audits and chronicles. But if you index your tables right, you end up with super fast access to all your audit information for a single point in time.

I hope this makes sense. Let me know if you have questions or feedback to my approach.

X002548
Not Just a Number

15586 Posts

Posted - 2004-08-24 : 12:41:18
Seems overly complex...

I'd go with David's solution...(mostly because that's what I do)

Never thought about the 2 timeperiod audit tables though....I just use 1.

But I like it!

And as for the datetime fields... just let them default, and make sure you move the row as-is....





Brett

8-)

JimL
SQL Slinging Yak Ranger

1537 Posts

Posted - 2004-08-24 : 13:02:51
I wish you luck with your model, but I thought the point was to be able to view either the current record or all changes.

The one-to-many table layout will do this the fastest.

Plus, you only fire a trigger when you want to move said records into inactive status.

Jim
Users <> Logic

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2004-08-24 : 13:52:38
The current record is always maintained (Patient), and the changes are all in the audit table (PatientAudit). That is precisely "the one-to-many table layout" you speak of. If you are referring to the Patient table storing only the key to the current audit record, then I assure you it is by far slower when querying. With my method, you trade a little bit of loading performance and a little bit of storage for query execution speed, which I'll take most any day.

The chronicle tables are not needed if the only reason to keep history information is as an offline audit trail. However, I am attempting to fill the gap that method creates by allowing use of that information in a real-time environment.

Yes, it's complex. However, a code generator makes the process super simple, and I guarantee the benefits far outweigh the disadvantages.

X002548
Not Just a Number

15586 Posts

Posted - 2004-08-24 : 14:07:30
OK, well good luck

How much data are we talking about anyway?



Brett

8-)

lazerath
Constraint Violating Yak Guru

343 Posts

Posted - 2004-08-24 : 14:28:32
Well, my situation is very similar to that Patient/Claim relationship and all of the associated entities. So, I have about three-quarters of a million "Patients" right now and about 10 times as many claims. Both are going to grow significantly in the next few years.

doxology
Starting Member

1 Post

Posted - 2004-09-28 : 20:42:42
You people seem to be the gurus we need. I'm developing a SharePoint Server clinical trial management system for the Department of Veterans Affairs and need to find the best solution for transaction-level audit recording of changes to InfoPath-based forms. I've looked at several 3rd-party tools, but we would rather have something built in and dedicated to reporting SPS SQL database changes. Any takers? Please write me at david.rose@md.va.gov if you would like to help. Dave
