Event Log reports Database Corruption

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

SQL Server 2008 Forums

SQL Server Administration (2008)

Event Log reports Database Corruption

Author

Topic

Kristen
Test

22859 Posts

Posted - 2011-09-27 : 06:11:05

Application Event Log recorded an Error event:

Attempt to fetch logical page (1:1634317) in database 8 failed. It belongs to allocation unit 19422104110694400 not to 72057594049134592.

I presume "database 8" is the same as


SELECT name
FROM master.sys.databases
WHERE database_id = 8

and if so then


DBCC CHECKDB WITH NO_INFOMSGS, EXTENDED_LOGICAL_CHECKS, DATA_PURITY

shows no errors.

Can't see any earlier errors in the event log. Bit reluctant to just ignore it though.

We've been having the devils own time with the poxy Oracle drivers on this box (pulling data from a linked Oracle server). I can't seem to find any files under the \oracle root folder, or any file on a local drive named ora*.*, that has changed since the error was recorded in the error logs.

When I try to run the scheduled task to query the Oracle server manually I'm frequently (but not all the time) getting

Server: Msg 7399, Level 16, State 1, Line 1
The OLE DB provider "OraOLEDB.Oracle" for linked server "MY_ORACLE_NAME" reported an error. The provider reported an unexpected catastrophic failure.
Server: Msg 7323, Level 16, State 1, Line 1
An error occurred while submitting the query text to OLE DB provider "OraOLEDB.Oracle" for linked server "MY_ORACLE_NAME".

That may be totally unrelated though. Its always been an issue (although I have no idea how often, if it was running 25% of the time I don't suppose anyone would notice, the process is fail-safe and resumes from the last successful attempt)

Kristen
Test

22859 Posts

Posted - 2011-09-27 : 06:23:02

I've run

DBCC CHECKDB WITH NO_INFOMSGS, EXTENDED_LOGICAL_CHECKS, DATA_PURITY

on all DBs (except TEMPDB) and no errors reported

There are Exceptions logged for each scheduled task run during the night (but not for the ones I have run manually where I got an error)

previously there were exceptions reported (coinciding with some of, but not all, the scheduled task runs) when the Oracle drive was playing up (I think that just resulted in a reboot, rather than installing newer version of Oracle drivers)

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-09-27 : 06:56:12

Let me guess, there's been an index rebuild/reorganise since that error?

Nothing to do with oracle drivers, that's database corruption. Check hardware.

--
Gail Shaw
SQL Server MVP

Kristen
Test

22859 Posts

Posted - 2011-09-27 : 07:16:30

I'll try to refrain from advising "Just rebuild your indexes, you'll be fine" next time I see a data corruption message here!

Thanks for that, off to investigate.

Hardware fault may be causing the Oracle drivers to be wobbly too, perhaps.

Kristen
Test

22859 Posts

Posted - 2011-09-27 : 08:25:48

No sign of index rebuild

There is no scheduled task.

MODIFY date on the index has not changed
Latest Stats date has 3 statistics more recent that the error logged in Event Viewer, but there is no "wholesale" set of stats dates that would suggest a scheduled task had run.

Could the duff page-fetch have been statistics (which have since been updated)?

The IT guys are looking into the health of the server meantime.

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-09-27 : 08:58:25

No. Not statistics (they aren't data). Something happened to deallocate that page between the time that the message appeared and the time you did checkDB.

Index rebuilds are NOT a fix for corruption. Since they deallocate pages, they can make a minor corruption disappear. Emphasis 'minor'.

--
Gail Shaw
SQL Server MVP

Kristen
Test

22859 Posts

Posted - 2011-09-27 : 10:55:42

"Index rebuilds are NOT a fix for corruption"

Hehehe ... yeah, I was kidding

The tables in this database get quite a lot of updates - the database is used for staging. We don't bother with UPSERT we just do:


DELETE T
FROM Target AS T
    JOIN #Source AS S
        ON S.ID = T.ID

then


INSERT INTO Target(Col1, ...)
SELECT Col1, ...
FROM #Source

Could that have deallocated the broken page?

Or should it have given me a Torn Page error? (I'll just go check that Torn Page IS turned on for this database)

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-09-27 : 12:00:09

It may well have. If a delete emptied a page then the page could have been deallocated.

No, the error you got is not a torn page error. Why do you have torn page on anyway? Checksum is the default and recommended setting unless you're still on SQL 2000 (which, from the error message you're not)

--
Gail Shaw
SQL Server MVP

Kristen
Test

22859 Posts

Posted - 2011-09-27 : 12:13:35

Sorry, mental blockage, I was thinking 2000-speak. We do indeed have CHECKSUM turned on for that database; its SQL 2008 (R1).

Would SQL not have raised error when deallocating that page (when it found that the checksum was wrong)? Or is the error in Event Viewer the result of a CHECKSUM error raised by the page deallocation? (I was expecting my app. to be given a fatal error - although its possible it was and it hasn't manifested in any useful way through the APP)

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-09-27 : 13:10:58

But that's not an incorrect checksum error...
It was a very fatal error though. It's severity 21 which means that the connection would have been terminated as a result.

I can't go back in time and watch your app to see what happened. 'Disappearing corruptions' are a result of the page that's in error being deallocated or some other operation that modifies the affected structures.

--
Gail Shaw
SQL Server MVP

Kristen
Test

22859 Posts

Posted - 2011-09-27 : 13:22:30

"But that's not an incorrect checksum error..."

Sorry, I should have realised that.

So if I understand it correctly SQL has written a page with a pointer to an incorrect page. There was physical checksum error on that page, so a checksum error was not raised. Fair enough. But nonetheless that page was written pointing to an incorrect location ?

So the bits were NOT twiddled between SQL and the Disk - e.g. by a faulty disk controller - (otherwise a checksum error would have arisen?)

So under what sort of hardware fault does this sort of scenario occur?

I'm wondering whether the exception errors that the Oracle debacle triggered may have mucked things up enough to induce that?

Apparently SQL is running on a virtual machine, the physical machine is home to all sorts of stuff. The IT people think that if there was a hardware error it would be likely to manifest itself all over the various virtual machine instances.

(Its a machine owned by a client which we use for a staging database to gather data from Oracle and pass it on to our SQL box, so I don't have much knowledge of the client's infrastructure; but I am keen to get them to sort out a hardware fault if I can pin it down enough for them to do so)

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-09-27 : 14:28:00

quote:
Originally posted by Kristen

So if I understand it correctly SQL has written a page with a pointer to an incorrect page. There was physical checksum error on that page, so a checksum error was not raised. Fair enough. But nonetheless that page was written pointing to an incorrect location ?

No, it's an allocation unit mess up. Page owned by incorrect table/index, not an incorrect page number. Not sure if it's page header wrong or allocation structure wrong or both.

--
Gail Shaw
SQL Server MVP

Kristen
Test

22859 Posts

Posted - 2011-09-27 : 14:38:57

Could it be something other than hardware fault do you think? (leaving aside remote possibility that I am the first person to stumble over a new bug in SQL Server)

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2011-09-27 : 14:43:33

I chatted with Paul. He said "Transient in-memory corruption from intermittently faulty memory chip"

That won't cause a page checksum error, cause the checksum is calculated at the point the page goes to disk.

--
Gail Shaw
SQL Server MVP

Kristen
Test

22859 Posts

Posted - 2011-09-27 : 19:52:42

Thanks very much Gail. I'll let you know what the IT folk find. The behaviour of the Oracle Driver is a bit strange, I've posted the details here:
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=165989

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources