Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

 All Forums
 General SQL Server Forums
 Data Corruption Issues
 How to recover from a corruption

Author  Topic 

Kristen
Test

22859 Posts

Posted - 2005-08-05 : 11:55:02
Paul,

Having read your useful insight into the sorts of issues around the corruptions that folk have posted in this forum I'm interested in preparing for the day when one of our servers gets corrupted :-(

Should I expect to be able to:

Restore FULL backup from Before the corruption first happened
Restore all Tranaction log backups since

and would get me back to ground zero?

I'm thinking:

if I do a DBCC CHECKDB once a week, say, immediately after a FULL backup - then if that gives me some error messages I've got last weeks Full backup and a weeks worth of Trans Log Backups to restore - which is OK for a rare disaster.

I apprecaite that the thing that caused the corruption might be a memory or disk fault, which might also have broken the creation of the backup files too - in which case I'm fairly stuffed!

Does a RESTORE do a checksum of somesort that would detect a corruption caused by, say, a memory fault or a disk controller that wrote the block faultily [twiddled some bits, but did not report error]?

In which case continuous restores of backup files to A.N.Other server would discover the moment at which a backup was "known bad"

Am I close?

Thanks

Kristen

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2005-08-05 : 12:32:18
Yes, you're close. The trick with a disaster recovery plan is
1) to practice regularly
2) check the integrity of your backups (i.e. restore them and make sure that they're ok after taking them)
3) don't rely on backups if you need to limit downtime

Step 2 unfortunately requires extra disk space, but I've just patented an algorithm which allows running CHECKDB on a backup without restoring it entirely - should be available in Yukon+1.

No, RESTORE doesn't run any checks on the integrity of the pages in the backup. In SQL Server 2005, there's an option for BACKUP to check the checksums and torn-page bits (which don't get checked by the buffer pool as backup doesn't read the pages through the buffer pool).

We're going to write a series of whitepapers for SQL Server 2005 on:

  • Using BACKUP/RESTORE as part of data/disaster recovery

  • DBCC (best practices, common scenarios, etc)

  • Interpreting CHECKDB results (this will be very advanced)



that should go some way towards clearing the confusion around disaster recovery planning.

Thanks

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2005-08-05 : 12:56:08
Most helpful, thanks Paul.

Doesn't look like my original plan gets me there then :-(

So DBCC CHECKDB more frequently than once a week would be safer then, in case there is a corruption and the backups have got messed-up too?

Would restoring the Backups to a SpareServer and then immediately doing a DBCC CHECKDB there be enough to detect corruption on my ProductionServer?

And if not, and assuming that there are corruptions on ProductionServer already, when I subsequently run a weekly DBCC CHECKDB on the Production server, and then discover that its corrupted, can I be sure my restores on SpareServer are reliable to backup & restore back on to ProductionServer?

(Actually presumable the existing backup files will do, they were OK when restored onto SpareServer)

I made some assumptions:

1) Backup files may not include a disk block that is broken in ProductServer, so restore OK on SpareServer but ProductionServer is corrupted already, we just don't know it yet - particularly if my Full Backup was made when the server was healthy (e.g. several days OK), and I've been making Trans Log backups since then (as it happens we do a DIFF backup daily too)

2) Full Backup and Trans Log Backup once restored on to SpareServer and DBCC CHECKDB without error means that I'm in good shape

Do those assumptions hold water?

Thanks

Kristen
Go to Top of Page

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2005-08-05 : 13:18:09
quote:
Originally posted by Kristen

Most helpful, thanks Paul.

Doesn't look like my original plan gets me there then :-(


At least you care and have a plan - which is better than many it seems.
quote:

So DBCC CHECKDB more frequently than once a week would be safer then, in case there is a corruption and the backups have got messed-up too?


This is a stick question. How often is enough? We say that you don't need to run CHECKDB as we don't have corruption bugs (unlike v6.5) but you still have the total unknown of all the hardware and software (OS + drivers + controller firmware + drive firmware) between SQL Server's buffer pool and the oxide (and then you still have to contend with the oxide itself). There's also memory problems to contend with.

So, what I like to recommend is a weekly full CHECKDB with as many runs of CHECKDB WITH PHYSICAL_ONLY as you're comfortable with in-between. You also need to make sure you've got the latest driver and firmware updates, run memory and IO diagnostics regularly and you should have run sqliostress at least once to make sure your IO system can handle the stress your workload places upon it.
quote:

Would restoring the Backups to a SpareServer and then immediately doing a DBCC CHECKDB there be enough to detect corruption on my ProductionServer?


It'll tell you whether corruption existed at the time the backup was made. Obviously there's no way to know the instant corruption occurs or a page goes bad on disk, but this is as close as you're going to get.
quote:

And if not, and assuming that there are corruptions on ProductionServer already, when I subsequently run a weekly DBCC CHECKDB on the Production server, and then discover that its corrupted, can I be sure my restores on SpareServer are reliable to backup & restore back on to ProductionServer?

(Actually presumable the existing backup files will do, they were OK when restored onto SpareServer)

I made some assumptions:

1) Backup files may not include a disk block that is broken in ProductServer, so restore OK on SpareServer but ProductionServer is corrupted already, we just don't know it yet - particularly if my Full Backup was made when the server was healthy (e.g. several days OK), and I've been making Trans Log backups since then (as it happens we do a DIFF backup daily too)

2) Full Backup and Trans Log Backup once restored on to SpareServer and DBCC CHECKDB without error means that I'm in good shape

Do those assumptions hold water?

Thanks

Kristen



Yes, up to a point. Depending on the nature of the corruption, it may have been propagated into the log backups and so you may not be able to restore past the point of corruption. Worst case scenario. The vast majority of cases I've seen go according to plan but I have seen several cases where the backups are toast too because of propagated errors.

You also need to factor in how much work your business can afford to lose if things go badly wrong - if its a week, then weekly backups are fine. If its a day, then you need to do daily full backups - and so on.

Thanks


Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
Go to Top of Page

Kristen
Test

22859 Posts

Posted - 2005-08-05 : 14:43:18
Fascinating, thanks.

So myt plan is:

Full backup once a week [or more often], and trans log backup every, say, hour.

I will restore all the backups to SpareServer and immedaitely run DBCC CHECKDB, and thus I will be sure that the backups are good to that point, and can be restored - even if there has already been an earlier corruption on the ProductionServer which has not (Yet! ) corrupted the Full/Trans.Log backup files.

In that secnario my worst-case loss is one hour - the moment when the SpareServer tells me the last Trans.Log backup it restored was toast. We shut up shop immediately at that point, restore everything to an hour before the duff-backup and run from there

(Let's assume the SpareServer has been promoted to MainServer at this point because the old ProductionServer is now marked as Suspect )

Kristen
Go to Top of Page

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2005-08-05 : 16:16:03
Sounds good.

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
Go to Top of Page

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2005-08-05 : 17:10:48
I'd like to see SQL Server support tlog mirroring like Oracle does. Any chance a feature like that might show up in a future release?



-ec
Go to Top of Page

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2005-08-05 : 17:24:27
quote:
Originally posted by eyechart

I'd like to see SQL Server support tlog mirroring like Oracle does. Any chance a feature like that might show up in a future release?

-ec



Do you mean having a copy of the transaction log or shipping live log to another server , which is continuously recovering and available as a hot standby?

The former we don't have. The latter is called Database Mirroring and is one of our new features in SQL Server 2005.

Thanks

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
Go to Top of Page

eyechart
Master Smack Fu Yak Hacker

3575 Posts

Posted - 2005-08-05 : 17:57:23
quote:
Originally posted by paulrandal
Do you mean having a copy of the transaction log or shipping live log to another server , which is continuously recovering and available as a hot standby?

The former we don't have. The latter is called Database Mirroring and is one of our new features in SQL Server 2005.



The hot standby is cool, we will most definitely use the feature for some of our systems. But I am more interested in having the tlog written to multiple destinations simultaneously (like oracle does). This would probably not be easy to implement, as Oracle's approach with their tlogs is completely different than SQL Server's.

Anyway, I've been told that there was a feature like this with SQL 6, but that was before my time as a SQL Server DBA.



-ec
Go to Top of Page

ryanston
Microsoft SQL Server Product Team

89 Posts

Posted - 2005-08-07 : 10:39:01
Unfortunately, the data mirroring feature in SQL Server 2005 doesn't support multiple mirrors (i.e., multiple log destinations simultaneously), but I have heard of disaster recovery implementations that use transactional replication. It will support multiple destinations (subscribers), but is different in that the "log shipping" is logical rather than physical (like in the case of data mirroring). Perhaps it is something you can look into.

Thanks,

----------------------
Ryan Stonecipher
Developer, Microsoft SQL Server Storage Engine, DBCC
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
Go to Top of Page
   

- Advertisement -