
 In need of maintenance plan


WindChaser
Posting Yak Master

225 Posts

Posted - 2006-07-22 : 17:19:19

Hi folks.

Our products are VB6-based interacting with MSDE2000, i.e. none of our clients have EM. We have auto-backups performed twice a day by default, and we encourage people to keep it to at least 2 per day.

Yesterday, one of our clients reported a problem. Upon investigation, I did a DBCC CheckDB WITH ALL_ERRORMSGS which returned the following:

Object ID 1461580245, forward row page (1:159), slot 50 points to page (1:234), slot 43. Did not encounter forward row. Possible allocation error.

So after reading a lot of posts and blogs from Paul Randal, I proceeded cautiously to copy the DB then perform a DBCC CheckDB REPAIR_REBUILD which had no effect, then a DBCC CheckDB REPAIR_ALLOW_DATA_LOSS which also had no effect. Then, I determined which table was at fault via DBCC CheckTable and I exported its data to a blank table where I discovered the missing data row and corrected for it manually. End of story.
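For the record, the sequence described above might look something like this in T-SQL (database and table names here are placeholders, not the real ones; note the REPAIR options require single-user mode):

```sql
-- Read-only integrity check: reports problems, fixes nothing
DBCC CHECKDB ('ClinicDB') WITH ALL_ERRORMSGS;

-- Repairs need exclusive access to the database
ALTER DATABASE ClinicDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

DBCC CHECKDB ('ClinicDB', REPAIR_REBUILD);         -- safe repairs only
DBCC CHECKDB ('ClinicDB', REPAIR_ALLOW_DATA_LOSS); -- last resort: may discard data

ALTER DATABASE ClinicDB SET MULTI_USER;

-- Narrow the damage down to a single table
DBCC CHECKTABLE ('dbo.SomeTable') WITH ALL_ERRORMSGS;

-- Salvage the readable rows into a fresh table for manual repair
SELECT * INTO dbo.SomeTable_Salvage FROM dbo.SomeTable;
```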

Not a funny situation. The worst part is that this defect may have been there for a very long time, meaning that restoring the latest backup would not have helped the situation. I now realize that relying on backups alone is a huge no-no.

So, having been scarred into reality, I would like to install an automated maintenance plan. I'm used to doing it on my personal station using EM but, as stated above, I can't do that for the clientele. I was thinking about simply shrinking the databases and then doing a DBCC CheckDB WITH ALL_ERRORMSGS on all of them before performing a backup. Would this be a complete enough procedure or should I be doing something else? My understanding is that this will verify index structure and data integrity, and not attempt to repair anything which is uncovered.

Any words of wisdom would be greatly appreciated.

Thanks!

Kristen
Test

22859 Posts

Posted - 2006-07-23 : 03:14:52
Do NOT do a SHRINK

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=55210&SearchTerms=Database%20Shrinking,Shrink,Shrinking

Quite a lot to suggest I'm afraid:

Let's start with: how critical is the data, and how much is it worth?

You'll need to do a cost-benefit equation - it takes a lot of time and effort to set up a bullet-proof maintenance routine, although it can be pretty much automated once you've got everything covered. If you want "five 9s" - 99.999% up-time - it's going to cost you a fortune!

I would recommend that you use Transaction backups, in addition to "Full" backups (you need to set your Recovery Model to FULL, rather than SIMPLE, for this).
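For reference, switching to FULL recovery and taking log backups is only a line or two each (database name and paths here are examples only):

```sql
ALTER DATABASE ClinicDB SET RECOVERY FULL;

-- You need at least one full backup before log backups are possible
BACKUP DATABASE ClinicDB TO DISK = 'D:\Backups\ClinicDB_Full.bak';

-- Then, on a schedule (e.g. every 15 minutes):
BACKUP LOG ClinicDB TO DISK = 'D:\Backups\ClinicDB_Log.trn';
```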

One reason is that there may be an error in a FULL backup (a RAID disk went down, for example) but not in the Transaction backups, and thus you could recover to a really old, known-good Full backup, apply all Transaction backups since, and be a hero! (Also you can restore TBackups to a point-in-time, which is prudent in case of some unforeseen disaster - like
DELETE FROM MyImportantTable WHERE 1=1 !!)

OK, so let's assume you are making full backups once a day and TBackups every, say, 15 minutes (you need to do them often enough to catch a failure; every 15 minutes just makes 4 small files an hour rather than one larger file, so there's no reason not to do them frequently).

So now you need to consider what disasters befall you:

User data accident (DELETE *) - restore to latest time when the DB was "known good". Likewise for a software bug that wrecks the data unexpectedly.

DBCC says the database is corrupted - same procedure, but be prepared that the backups might be duff too, and you may have to restore to an earlier point-in-time (frequent DBCC CHECKDB checks are a good idea, therefore).
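A point-in-time restore from a full backup plus log backups might look like this (file names and the STOPAT time are hypothetical):

```sql
-- Restore the last known-good full backup, leaving the DB able to accept logs
RESTORE DATABASE ClinicDB FROM DISK = 'D:\Backups\ClinicDB_Full.bak'
    WITH NORECOVERY;

-- Apply log backups in sequence; stop just before the disaster
RESTORE LOG ClinicDB FROM DISK = 'D:\Backups\ClinicDB_Log1.trn'
    WITH NORECOVERY;
RESTORE LOG ClinicDB FROM DISK = 'D:\Backups\ClinicDB_Log2.trn'
    WITH STOPAT = '2006-07-22 17:10:00', RECOVERY;
```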

Machine is fried ... buy a new machine and restore from tape. How often are your tape backups? How much data will you lose?

You might decide that you need to improve your backup routine to IMMEDIATELY copy the backup file to another machine on the LAN and not wait for the tape backup (but you still need the tape backup for backup-history and off-site storage etc.)

Aeroplane lands on the building / building burns to the ground ... you should be rotating tapes to off-site storage (taking one home each night may be sufficient, using a secure service may be a requirement, etc) ... how quickly can you get spare machinery up and running?, and can you actually still buy a tape drive for the really old tapes you are using? You need a business continuity plan which considers these things.

Now there is a chemical/radioactive leak and there is a 50 mile exclusion cordon - can you still get to your off-site tape backups and disaster-recovery site?

At this point your disaster recovery plan has to think about available personnel etc. (one company I did some consultancy for had to plan that no-one in the department was alive any more ... hard to raise enthusiasm for making that plan!)

So you've got your cost-benefit equation, and that should be telling you that you need something between:

a) the business owner takes a tape home each night and has a tape drive on his home computer that he can restore it to, and in the event of a disaster his plan is to buy new computers at Acme Computer Store and run them on his dining table.

b) a fail-over or warm-standby machine at a remote site is required with a maximum of 5-minutes switch-over!

OK, all the plans are in place. Now you have to prove that they work. You have to rehearse the plan, and make sure all the bits work. You probably don't actually have to buy 10 PCs and put them on the boss's dining room table, but you DO have to take the ones from the office and put them on his table - is his broadband/firewall sufficient? are there enough power points? what about the heat and noise they generate? is his single voice phone enough? ... you need to know the issues BEFORE the disaster has happened.

And you need to do a Fire Drill at least 4 times a year - otherwise something in your systems will change and it won't be covered in your plan. You may very well be able to "make do" when that happens, BUT time will be tight, everything will be in disarray, and NOT having a copy of that new application update that was installed last week, for example, will waste precious time.

Coming back to your SQL backup, off the top of my head you should consider:

Daily full backup

Transaction backup every, say, 15 minutes

(Copy to Tape daily / other machines on the LAN as required, etc.)

Daily DEFRAG (or possibly index rebuild) of database tables which need it (ONLY!!)

Daily rebuild of statistics (ideally only on tables that need it, but I haven't found a way to determine that yet)

Daily DBCC CHECKDB to check the health of the database

Ideally restore every backup to a different machine (and DBCC CHECKDB it) to make sure the backups CAN be recovered. As a fallback, restore & test one FULL backup a week.
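As a rough sketch only, a nightly job covering those points on SQL 2000/MSDE might run something like this (database, table, index names and paths are placeholders; DBCC INDEXDEFRAG is the online defrag on that version):

```sql
-- 1. Full backup
BACKUP DATABASE ClinicDB TO DISK = 'D:\Backups\ClinicDB_Full.bak';

-- 2. Quick sanity check that the backup file is readable and complete
RESTORE VERIFYONLY FROM DISK = 'D:\Backups\ClinicDB_Full.bak';

-- 3. Integrity check
DBCC CHECKDB ('ClinicDB') WITH ALL_ERRORMSGS, NO_INFOMSGS;

-- 4. Defragment an index that needs it (repeat per table/index, ONLY as needed)
DBCC INDEXDEFRAG ('ClinicDB', 'dbo.MyBigTable', 'IX_MyBigTable');

-- 5. Refresh statistics across the database
EXEC sp_updatestats;
```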

Check, daily, that all these things have happened as intended, and without error. This is very tricky ... an email telling you what was backed up requires you to realise when the email did NOT arrive - and it's going to fail once in a blue moon, and then only when you have a hangover and are on holiday!

You could have a number of machines talk to each other about their health, and then any of those machines could "complain" if they didn't hear from a machine - which has most likely failed. Do NOT build a system that ONLY sends an email when it fails - 'coz if its SMTP service has failed you won't hear anything.

And finally! do NOT get yourself added to this thread!!!

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=67782

Kristen

WindChaser
Posting Yak Master

225 Posts

Posted - 2006-07-23 : 12:28:56
Kristen,

Your detailed response will certainly become a reference for the forum.

However, most of it would not be easy for us to implement because, unlike a DBA who has access to the DBs, we sell our products and do not have access to them afterwards unless a client needs us to access them. We have hundreds of clients, all medical clinics, with DBs ranging from 2M to 15M in size. They usually average 20-100 transactions per day, so although the DBs are mission-critical, they see low traffic. None of the users (who are not very computer-literate) have direct access to the DBs, so the DBs are insulated from any accidental command which might destroy data. A backup twice a day is already overkill for most of these clinics. In most cases, once the auto-backup is completed, it's compressed, encrypted and auto-transferred to a distant FTP site. Actually, I honestly thought that's what most DBAs did, but then again our backups are quite small in size, so we certainly would not want to contend with the awkwardness of tapes. So although the content of your reply specifically targets large enterprises, I find that it's great for small businesses as well, who can use a subset of it depending on their specific needs.

However, there is something that I don't understand. You wrote "One reason is that there may be an error in a FULL backup (RAID disk went down, for example), but not in the Transaction backups". If any type of backup is preceded by a DBCC CheckDB and the check comes out clean, wouldn't it be guaranteed that at least the DATA is OK in the backup, whatever the type?

Thanks!


Kristen
Test

22859 Posts

Posted - 2006-07-23 : 12:56:06
"So, although the content of your reply specifically targets large enterprises ..."

I reckon you are already doing what I was proposing, ergo:

Backup twice a day, with the backups being ZIPped and FTP'd to a remote location. That's the same as a local copy to the LAN plus an off-site tape backup - but a more right-sized approach for your circumstances.

Your backup strategy sounds fine for 20-100 transactions a day to me [with the caveat that the users would need to REMEMBER what the day's transactions were if the system went Bang!; otherwise you do need TBackups too].

Probably worth keeping, say, a week's worth of backup files on the user's hard disk to be available to recover from (i.e. as an end-user routine).

You ought to have a plan (it sounds like you already have) for a user saying "My computer caught fire!" - presumably they can, with your help, reinstall the software on a new machine, FTP down the latest backup, restore and resume from there.

"If any type of backup is preceded by a DBCC CheckDB and the check comes out clean, wouldn't it be guaranteed that at least the DATA is OK in the backup, whatever the type?"

No, I don't think you can make that assumption. The only assumption that DBAs I know make is that "I have recovered this backup on a separate server and then DBCC CHECKDB'd it, and it was OK"

I can't tell you what the odds are of a DB being "clean" and the backup being "bust" - I should imagine negligibly small, to be honest - but a hardware fault occurring between the DBCC CHECKDB and the BACKUP could cause the writing of the backup to be duff.

I can't imagine a backup from yesterday also being similarly broken - apart from some obscure not-yet-found bug in MS's backup routine - even more vanishingly small odds I would guess. If the controller failed just before yesterday's backup then I would expect all sorts of errors and so on during today's processing, so the fact there was a problem would most likely have come to light, and today's DBCC is probably going to raise an error. So long as the operator doesn't ignore the message, and you have some backup history to revert to, it sounds fine to me.

"so the DBs are insulated from any accidental command which might destroy any data"

Don't forget the possibility of a software bug doing the same thing.

Kristen

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2006-07-23 : 13:10:20
Here's the thing that most people don't realize about CHECKDB: when it completes, it's telling you that what it read over the last hour (or however long the CHECKDB took to run) was good. That's where the guarantee ends. The instant the command completes, a corruption could happen. And the backup file is only as good as the hardware it's stored on - anything could happen to it.

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-07-23 : 14:14:14
It is possible for the database to be OK but for the backups to be written to a bad disk and be unusable. You may want to add a process to ensure the backups are good.

I would definitely do transaction log backups. They don't really cost extra, and there usually comes a time that you need them.

I would also look at some kind of local replication of the backups and transaction log backups to another machine, where it is possible. ROBOCOPY is a great tool for mirroring a directory from one machine to another.

One last thing to consider. Most software companies do not get involved in database administration, and leave it to local administrators. There is an important reason for this: they are not responsible for any damages resulting from a failure to do backups.

Since your company is involved in the backup process and is setting up the process for your customers, you have a responsibility to make sure that you meet some reasonable, defensible standard of care to ensure that your clients do not suffer a loss because of a failure in the backup process you have set up.

I would look very carefully at the potential areas for failure in this process, the potential for financial loss in case of a failure, and the recommendations that Kristen and others make to insure against it. Even if you have provisions in your contracts to limit your liability, that may not protect you if you are not meeting reasonable standards of care.





CODO ERGO SUM

WindChaser
Posting Yak Master

225 Posts

Posted - 2006-07-23 : 16:10:20
Kristen,

You're correct in that all backups accumulate on the local HD and usually get cleaned out only once a year (they're so small that it won't affect any reasonably sized HD). And if a problem occurs mid-day, then whatever happened since the last VALID backup is gone and has to be re-entered manually, but that's not much. And, again, you're correct in that we restore the backups and re-install the app in the event of a catastrophe. So perhaps doing a DBCC CheckDB followed by a Simple backup at least twice a day, then defragging the indices once a week, will be sufficient.

Of course, the "Don't forget the possibility of a software bug doing the same thing" is always a concern.... but perish the thought! We have long beta test periods to avoid such nightmarish circumstances.

Paul, thanks for pointing that out. The DBs are so small in our case that a DBCC CheckDB of all the DBs runs in less than 8 seconds on my old computer, so there’s pretty limited risk there, then the auto-backup kicks in and gets done in about 10-15 seconds.

Michael, you're right that the transaction log backups take next to nothing in terms of time and resources, but honestly, given how little time it takes to do a DB backup, it's probably best to just increase their frequency. And explaining TBackups to users would be a roadblock. One point of clarification: by default, the software configuration settings are set to 2 auto-backups per day, at noon and 5PM, but the users can specify between 0 and 48 backups per day in their options, in addition to having the possibility of generating an on-the-spot manual backup anytime they wish. The users control their options as well as their backup-handling protocol. And even if a backup should end up corrupted, a user would have to prove that the fault lies with the application - not the hardware, nor the system, nor the transfer medium, nor SQL Server - which is simply impossible to prove. One client once removed all auto-backup settings and hadn't done a manual backup in 6 months until he needed it. I'm sure that he must have investigated his legal options, only to realize that we couldn't be blamed.

Thanks for your contributions. With better quality information, I feel a heck of a lot better w.r.t. my chances of better controlling the future, and in life that's no small thing.

Kristen
Test

22859 Posts

Posted - 2006-07-23 : 16:54:37
"given how little time it takes to do a DB backup, it’s probably best to just increase their frequency"

I reckon you are in pretty good shape, and in your scenario you may not need TBackups at all, but I would define their usefulness a little differently from how you've said it.

So just for pedantic clarity! TBackups would:

1) Take VERY little space given your DB size and transaction rate. However, even with a large hard disk 48 FULL backups a day would cost some significant space over a period of a few weeks or months ... !

2) TBackups give the ability to recover to point-in-time. That has a number of advantages:

a) Obviously you can restore to just-before a Human Error

b) You can progressively recover (albeit that it is tedious!) to discover precisely what transactions occurred - fraud / unfathomable software error / stupid-user error! etc.

c) That progressive recovery might be crucial in restoring up-to-the-point of a hardware failure

d) We have had DB corruptions that were also represented in the FULL backup but not in the TBackups (i.e. a page fault in the DB was replicated in the Full backup, but the transactions were not damaged) and that gave us an extra ability to recover [from an earlier clean FULL backup + all Tbackups] without actually losing ANY data. So a possible additional string to the bow

Downside is that they do increase the complexity of the overall beast somewhat, so it may not suit you.

I assume you have steps in place to ensure that your Recovery Model is set to SIMPLE if you are not using TBackups - otherwise the LOG file will grow huge ... And if it's an MSDE database, you might want to look into whether AutoClose is on (which it is by default on DBs created on an MSDE system). There are pros and cons to having the database closed when the user just switches off their machine etc., versus the delay in re-opening the database each time the user starts/uses your application.
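For example, those two settings can be checked and set like so (the database name is a placeholder):

```sql
-- Keep the log from growing unbounded when log backups aren't taken
ALTER DATABASE ClinicDB SET RECOVERY SIMPLE;

-- Avoid the repeated open/close delay each time the app connects
ALTER DATABASE ClinicDB SET AUTO_CLOSE OFF;

-- Verify the current settings
SELECT DATABASEPROPERTYEX('ClinicDB', 'Recovery')    AS RecoveryModel,
       DATABASEPROPERTYEX('ClinicDB', 'IsAutoClose') AS AutoClose;
```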

Kristen

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-07-23 : 17:20:22

I think you are missing my point about liability. Of course there are many possible reasons for data loss. However, the client could claim that they relied on procedures set up by you (the presumed expert) to safeguard them against loss, that the procedures you set up for them to follow did not meet a reasonable standard of care, and that they suffered a loss because of this.

For example, they might find experts who advise that transaction log backups should be scheduled every 15 minutes, but that your procedure did not do this. Or that off-site backups should be retained for a considerable time to ensure that undetected data losses can be recovered from old backups. Or that backups should be verified to ensure that they can actually be used for recovery. Or…



CODO ERGO SUM

WindChaser
Posting Yak Master

225 Posts

Posted - 2006-07-23 : 22:36:03

Kristen, thanks for the clarifications and extra advice.

Michael, your point is well taken and it has always been a personal concern of mine. This is a topic close to my heart, given that I have a degree in Law, but I don't know if getting into specifics is appropriate for this forum. Still, as long as we don't let this thread go too far afield, it might be useful for the forum to understand the developers' standpoint.

The fact is that we allow the clients to determine what they want in terms of backup strategy. We are not our clients' DBAs. We cannot be presumed to be, nor are we contracted to be, nor are we compensated for being such, nor can we be sued for not having been adequate DBAs. The presumption of innocence remains the claimant's hurdle, not the defendant's (albeit this is true in my country; it is not an international norm, of course). Most applications on the market offer no backup facility at all. The fact is that our application could be sold without backup facilities and clients could just ghost their disks every 15 minutes if they wanted to. MSWord, MSExcel, income tax software, etc., do not come with backup tools or backup strategies. If there's a hardware failure which corrupts a critical MSWord document, a user can't sue Microsoft for not having provided proper backup counseling. If an application uses SQL Server to store its data instead of an MSWord document file, the reasoning remains the same: you can't hold an application which uses SQL Server as its "data repository" to a higher standard. So although you bring up a good point in that a claimant could bring in an expert witness to state that better could be done, just because an application uses SQL Server to store its data doesn't mean that it has to use all its wares to try to attain 99.9999999999% reliability at whatever cost. Further, just because an application provides the means to back up data doesn't mean that the developer has to hold the user's hand and advise him how best to do it. We do not claim that the data will forever be 100% safe, nor do we advertise it, nor do we guarantee it.

Actually, just to go one step further: even if an application is wrongly coded and grossly miscalculates something of huge importance, like say taxable income, if the Licensing Agreement or Usage Convention is properly written, the developer cannot be held liable at all. I understand where you're coming from in that SQL Server provides all of us with a rich set of backup procedures and that DBAs try to make the most of them as they face very difficult circumstances and, well, job or contract expectations. We, as software developers, are not held to the same standard and are in no way bound to make full use of this rich set. In that sense, DBAs have it tougher than developers. Nor can we be pressured into upgrading our products from MSDE2000 to SQL Server 2005 Express just because it has more and improved backup features. In our case, I guess it's kind'a political: although we can't be held liable, we still want the clients to be 100% satisfied, within reason, otherwise market forces will quite simply push us aside.

KenW
Constraint Violating Yak Guru

391 Posts

Posted - 2006-07-24 : 12:03:15
WindChaser,

There's one problem with your explanation above. MS Word doesn't offer backup; your application does. I'd be concerned: since the application offers backup, the end-user is well within their rights to expect (even depend on) that backup process being dependable. I'm not sure I'd agree that you'd have no liability if it isn't. And remember: the burden of proof is much lighter in civil proceedings than in criminal.

Just my $0.02.

Ken

Kristen
Test

22859 Posts

Posted - 2006-07-24 : 12:54:28
You lot in the USA are litigious nuts! Just my GBP 0.02 worth ...

... WindChaser is backing the database up. If the database gets corrupted by a hardware fault and the backup also contains that corruption, then the customer can go jump. Similarly if the backup file got corrupted by some separate hardware fault. It's consequential on their hardware having failed. Tough.

Having a process that alerts the customer to a detectable corruption in their database is a sensible feature. When it reports an error the customer may lose all the data since the last time the report showed their data was clean. The customer will quite likely, IME, ignore the error anyway if it seems like the data is OK and they are busy at the time ...

... but having said that, a "Phone Home" when DBCC CHECKDB detects an error is not a bad thing (in case the customer ignores it).

And something that proved that the backup file, once created, was received by the remote FTP site would be good too - particularly if the backup was ZIPped and the remote location also did a ZipTest to check the CRC on the zip contents.
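On the SQL side, a couple of sanity checks on the backup file itself might look like this (paths and the logical file names are assumptions, not from the thread; VERIFYONLY checks the file is readable and complete, not that the data inside is corruption-free):

```sql
-- Cheap check before ZIPping and shipping the file
RESTORE VERIFYONLY FROM DISK = 'D:\Backups\ClinicDB_Full.bak';

-- Stronger check: actually restore it under another name and DBCC it
RESTORE DATABASE ClinicDB_Verify
    FROM DISK = 'D:\Backups\ClinicDB_Full.bak'
    WITH MOVE 'ClinicDB_Data' TO 'D:\Verify\ClinicDB_Verify.mdf',
         MOVE 'ClinicDB_Log'  TO 'D:\Verify\ClinicDB_Verify.ldf';
DBCC CHECKDB ('ClinicDB_Verify') WITH ALL_ERRORMSGS;
DROP DATABASE ClinicDB_Verify;
```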

If the customer wants more than that, they can take their own responsibility for their data, employ a DBA and pay for five 9s!!

Kristen

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-07-24 : 13:36:15
quote:
Originally posted by Kristen
...You lot in the USA are litigious nuts!...

Oh yes. Lots of sharks swimming in the water, eager to take a big bite, so it pays to look where you're swimming. You can go just as broke from a lawsuit you win, too.


CODO ERGO SUM

WindChaser
Posting Yak Master

225 Posts

Posted - 2006-07-24 : 15:41:49

Michael is so right: you do need to shield yourself, because the environment is what it is, although more so in the USA.

Ken, I agree but with a caveat: the burden is proof, not presumption. And reasonable presumptions can only be used to fill gaps in an ensemble in civil proceedings. Proving that the backup is at fault is next to impossible, as Kristen points out. Also, given that you can only really claim damages, it's all about diligently limiting the damages for the client. So if several sanity checks and backups are made every day, and only a few transactions are missing in the event of a DB problem, it's just not worth a lawsuit.

Kristen, I also thought of the homing beacon. Imagine the client's face when he gets a call from us 2 minutes after he gets the warning message on his screen. They'll appreciate the proactiveness.

Thanks all!


Kristen
Test

22859 Posts

Posted - 2006-07-24 : 16:54:31
"homing beacon"

Stolen for future use!

"They'll appreciate the proactiveness"

Can't you just send them a Writ for not looking after your application properly?

Kristen

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-07-24 : 17:17:05
quote:
Originally posted by WindChaser
...I also thought of the homing beacon. Imagine the client's face when he gets a call from us 2 minutes after he gets the warning message on his screen. They'll appreciate the proactiveness...

Of course, once you do that, you really are their DBA. "We saw the message, but you didn't call, so we thought everything was OK"...

What is the FTP site you mentioned? Is that something your company provides, or is it a third party service?



CODO ERGO SUM

WindChaser
Posting Yak Master

225 Posts

Posted - 2006-07-24 : 21:45:48


quote:
Of course, once you do that, you really are their DBA. "We saw the message, but you didn't call, so we thought everything was OK"...


You know what, you're right: only the good die young, so you should never go out of your way to do the right thing (Seinfeld rules). Mind you, DBA is specific to database management. We could just as easily say that our product called the mothership for help for whatever reason, which may or may not be related to the DBs.


quote:
What is the FTP site you mentioned? Is that something your company provides, or is it a third party service?



Huge question. No, we do not provide them with an FTP service, because we'd become responsible for ensuring that the site is always up, and we'd have to provide janitorial services to clean out the space so it doesn't overflow. Not worth it. Clients can obtain in excess of 10GB of web space for less than $8/month these days (2006). All they have to do is configure the settings in the options of the application. Folks love the idea of not having to remember to leave the premises with their backup, and that they can get the service on the cheap.

Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)

7020 Posts

Posted - 2006-07-24 : 21:58:36
quote:
Originally posted by WindChaser
...You know what, you're right, only the good die young so you should never go out of your way to do the right thing (Seinfeld rules). Mind you, DBA is specific to database mgmt. We could just as easily say that our product called the mothership for help for whatever reason which may or may not be related to the DBs...


How about if you call them and say you noticed their data is badly corrupted due to a multi-jiggawatt overload in the flux capacitor, and offer to work your magic to rescue their data (for a substantial fee, of course).




CODO ERGO SUM

WindChaser
Posting Yak Master

225 Posts

Posted - 2006-07-25 : 08:24:11

Michael, please remain on topic. Flux capacitors are for time travel...

Kristen
Test

22859 Posts

Posted - 2006-07-25 : 08:34:26
"Flux capacitors are for time travel..."

There you go then ...

... "We are calling you from the week-after-next in response to your panic call about the corruption to your database that you haven't yet noticed, and will continue not to notice for another 14 days ..."

That would be cool wouldn't it ... "Your disk drive just failed, we have travelled back to yesterday and preemptively fitted a new one. Here is an invoice for $10,000"

That should sort out the blame-culture too. Mind you, how would you prove that the old drive WOULD have failed? ...

... "I've just arrived from the future to fit a replacement for your drive which failed [sic!] tomorrow"

... the customers would just pay up, wouldn't they!

And you only need one spare drive to start your company - so long as all your clients have the same drive type

<fx:Toddles off to get Venture Capital funding!>

Kristen