Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
cmbergin
Starting Member
14 Posts |
Posted - 2011-09-26 : 12:40:35
|
My weekly system database maintenance plan failed on the "check database integrity" step. Running CHECKDB this morning, I found corruption in MSDB.I don't care about the data in sysmail_allitems; it's nice to have, but not a big deal. I need to have the shortest maintenance window possible, so if repair allow data loss is faster than a restore from last week's MSDB backup, I'd like to do that instead. However, I'm concerned about the errors found not associated with any particular object. I don't know what might happen there.In a related question, will my user databases remain online while restoring MSDB? I make use of database mail and some CLR procedures; if only those things will fail while MSDB is being restored, I can get around those failures by letting the users know what features will be disabled so they can work around it.Anyway, sorry for the rambling. Here's the CHECKDB output:Msg 8909, Level 16, State 1, Line 1Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:5126312) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).Msg 8909, Level 16, State 1, Line 1Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:5126313) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).Msg 8909, Level 16, State 1, Line 1Table error: Object ID 0, index ID -1, partition ID 0, alloc unit ID 0 (type Unknown), page ID (1:5126314) contains an incorrect page ID in its page header. The PageId in the page header = (0:0).CHECKDB found 0 allocation errors and 3 consistency errors not associated with any single object.Msg 8929, Level 16, State 1, Line 1Object ID 1454628225, index ID 1, partition ID 72057594044874752, alloc unit ID 72057594049331200 (type In-row data): Errors found in off-row data with ID 155254784 owned by data record identified by RID = (1:5125138:1)Msg 8928, Level 16, State 1, Line 1Object ID 1454628225, index ID 1, partition ID 72057594044874752, alloc unit ID 72057594049462272 (type LOB data): Page (1:5126312) could not be processed. See other errors for details.Msg 8965, Level 16, State 1, Line 1Table error: Object ID 1454628225, index ID 1, partition ID 72057594044874752, alloc unit ID 72057594049462272 (type LOB data). The off-row data node at page (1:5126312), slot 0, text ID 155254784 is referenced by page (1:5125138), slot 1, but was not seen in the scan.Msg 8928, Level 16, State 1, Line 1Object ID 1454628225, index ID 1, partition ID 72057594044874752, alloc unit ID 72057594049462272 (type LOB data): Page (1:5126313) could not be processed. See other errors for details.Msg 8965, Level 16, State 1, Line 1Table error: Object ID 1454628225, index ID 1, partition ID 72057594044874752, alloc unit ID 72057594049462272 (type LOB data). The off-row data node at page (1:5126313), slot 0, text ID 155254784 is referenced by page (1:5125138), slot 1, but was not seen in the scan.Msg 8928, Level 16, State 1, Line 1Object ID 1454628225, index ID 1, partition ID 72057594044874752, alloc unit ID 72057594049462272 (type LOB data): Page (1:5126314) could not be processed. See other errors for details.Msg 8965, Level 16, State 1, Line 1Table error: Object ID 1454628225, index ID 1, partition ID 72057594044874752, alloc unit ID 72057594049462272 (type LOB data). The off-row data node at page (1:5126314), slot 0, text ID 155254784 is referenced by page (1:5125138), slot 1, but was not seen in the scan.CHECKDB found 0 allocation errors and 7 consistency errors in table 'sysmail_mailitems' (object ID 1454628225).CHECKDB found 0 allocation errors and 10 consistency errors in database 'msdb'.repair_allow_data_loss is the minimum repair level for the errors found by DBCC CHECKDB (msdb). |
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2011-09-26 : 13:10:25
|
The damage is entirely in sysmail_mailitems. If you look, the page numbers for the three 'errors not associated with any single object' are the same page numbers for the errors in sysmail...Do you know that last week's backup was clean?As for repairing, maybe... http://sqlskills.com/BLOGS/PAUL/post/Is-running-repair-on-msdb-safe.aspxIf you're absolutely certain you don't need the mail items that will be removed then it's probably OK.SQL agent will be down while MSDB repairs or restores. Probably service broker, definitely mail. CLR procedures should be fine (unless you have ones stored in MSDB). Backups may fail because they can't write history.--Gail ShawSQL Server MVP |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-09-26 : 13:28:51
|
"Backups may fail because they can't write history."Just curious: If they are triggered by Sshduled Tasks will they not run anyway (whilst MSDB inaccessible)? |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2011-09-26 : 13:30:04
|
SQL Agent can't run without MSDB. Manually (or windows scheduled or 3rd party tool scheduled) backups may fail as well.--Gail ShawSQL Server MVP |
|
|
cmbergin
Starting Member
14 Posts |
Posted - 2011-09-26 : 14:34:47
|
Thank you. I do know that last week's backup is clean because the maintenance plan is set up to check integrity first, then only take a backup if the integrity check is successful; the "verify backup integrity" option is turned on as well. I guess I could always restore the backup as a user database and run CHECKDB on that to be sure.Thank you for the link. Since the damage is entirely to the mail items table, I feel safe running the repair option. We only use the mail records there to prove that certain messages were sent, and we've only needed to do that twice in the last three years. It's an acceptable data loss for us.I read somewhere about a couple of techniques to see exactly what's going to be deleted by using DBCC PAGE or something like that. Perhaps I'll find one of those articles and see if I can extract the data the repair will delete just to be safe.Thank you for your help!PS - is there a way to estimate how long a repair will run based on the size of the data files or something like that? I know how long my backups typically take if those times are in any way related. I need to request an appropriately-sized maintenance window from my IT director. |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2011-09-26 : 14:59:18
|
Nope. It takes as long as it takes, usually longer than a normal checkDB. Not related to backup duration.Have fun trying to get the raw data back... those are LOB columns on the damaged pages, and that table has multiple LOB columns included in it.--Gail ShawSQL Server MVP |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2011-09-26 : 15:00:58
|
p.s. Yes, you should restore and run checkDB to ensure that a backup is usable. The verify option just checks that the backup header is intact (unless you are backing up with checksum)--Gail ShawSQL Server MVP |
|
|
cmbergin
Starting Member
14 Posts |
Posted - 2011-09-26 : 15:05:09
|
Alright. Thank you again. I'll definitely test the backup, then I'll probably just restore over msdb since I know roughly how long that takes. |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-09-26 : 17:35:57
|
You could restore a fresh MSDB backup to A.N.Other machine (or A.N.Other database on main machine) and do a repair there - that might give you some confidence on the repair time.Why doesn't MSDB run in Full Recovery model? If it did you'd probably get the whole lot back from TLog backups. Oh well ... Is the cause of this likely to be a hardware issue? if so that needs to be found & fixed before it strikes again. |
|
|
Kristen
Test
22859 Posts |
Posted - 2011-09-26 : 17:40:22
|
quote: Originally posted by GilaMonster SQL Agent can't run without MSDB. Manually (or windows scheduled or 3rd party tool scheduled) backups may fail as well.
As I thought, thanks. So only got to worry about manually launched things, (we don't use 3rd party backups, but I'll try to remember that in case I'm working on a site where they do)I might build into our adhoc "Make A Backup Sproc" to check that MSDB is available - it would be a bit neater than potentially generating a backup file that then cannot be logged in MSDB. |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2011-09-27 : 05:32:07
|
I did a quick test:quote: Processed 3760 pages for database 'Masters', file 'Masters' on file 1.Processed 1 pages for database 'Masters', file 'Masters_log' on file 1.Msg 924, Level 14, State 1, Line 1Database 'msdb' is already open and can only have one user at a time.Msg 3009, Level 16, State 1, Line 1Could not insert a backup or restore history/detail record in the msdb database. This may indicate a problem with the msdb database. The backup/restore operation was still successful.BACKUP DATABASE successfully processed 3761 pages in 1.418 seconds (20.721 MB/sec).
--Gail ShawSQL Server MVP |
|
|
cmbergin
Starting Member
14 Posts |
Posted - 2011-09-27 : 15:58:25
|
Thank you for your suggestion, Kristen. That makes a lot of sense.With corruption, I normally assume something is wrong with the hardware and immediately send over a work order to the hardware group. It's only happened twice before, and both times it was something going wrong on the RAID controller. This time, they said the RAID cabinet has no errors or warnings, and there's nothing in the Windows event log, either. I'm not too surprised, as this was after a failover and subsequent failback to our passive server, and we use a third-party tool for that. I'd feel better using SQL Server's built-in HA options, but we only have Standard Edition. |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2011-09-27 : 16:09:22
|
Corruption is (in the vast majority of cases) a hardware error. If that 3rd party tool is replicating the data files, it could have made mistakes. What tool is it?Something has scrambled the page IDs of 3 contiguous pages in MSDB. If they contain recent data, could be that their changes were not replicated correctly--Gail ShawSQL Server MVP |
|
|
cmbergin
Starting Member
14 Posts |
Posted - 2011-09-27 : 16:47:38
|
We use DoubleTake, which I've found to be a little touchy. It works fine when it's a controlled failover for the purpose of moving to a new server with minimal downtime. I don't trust it to handle failovers during disasters, as in my experience, we're usually just left with a suspect database on the passive target and have to restore from backup anyway. My best guess is that the product is designed to work with two servers in close physical proximity, but we've got it replicating between two different data centers. I, unfortunately, don't get to make the call regarding backup and HA solutions. My SQL Server backups are "unofficial" since, in theory, BackupExec is taking care of the database backups, but I don't see any reason to trust BackupExec, and plenty of reasons to trust native backups. |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2011-09-27 : 17:18:53
|
Ok... Nice environment. Considered looking for somewhere a little saner?--Gail ShawSQL Server MVP |
|
|
|
|
|
|
|