Author |
Topic |
getanothername
Starting Member
4 Posts |
Posted - 2008-07-28 : 16:47:18
|
Here's the situation, as it has now happened twice, and I've got a CTO yelling for a "valid" explanation:

Running Windows Server 2003 and SQL Server 2000 Standard, attached to a SAN, running RAID 5.

A hard drive fails, RAID kicks in and rebuilds using the hot spare. The issue is that several (10 out of 200+) databases have ended up with index or torn page corruption.

Can anyone explain how, with the use of RAID 5, any kind of corruption is being introduced? There is even corruption in a database that had zero activity at the time.

Thanks in advance for your help. |
|
Haywood
Posting Yak Master
221 Posts |
Posted - 2008-07-28 : 18:07:07
|
You should really get your SAN vendor involved. It appears that they/it are failing in their design/implementation of your hardware... Have the CTO yell at them about why their hardware/software is not _really_ working as it should.

Your friendly High-Tech Janitor: http://grayburn.wordpress.com |
|
|
getanothername
Starting Member
4 Posts |
Posted - 2008-07-29 : 07:30:26
|
Sorry, I misspoke regarding the SAN. It's actually a direct-attached storage device, I believe from Dell. |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2008-07-29 : 11:43:00
|
Check the server's error log, and check the RAID controller's error logs (if they exist). The only thing I can guess is a faulty RAID controller, or maybe more than one of the drives in the array is faulty.

This isn't a SQL problem. Torn pages are issues in the IO system, not within the database engine.

--
Gail Shaw
SQL Server MVP |
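To illustrate the "more than one faulty drive" possibility, here is a minimal sketch of a RAID 5 style XOR rebuild. It assumes a toy stripe of three data blocks plus one parity block; the block contents and function names are made up for illustration and do not reflect any particular controller. It shows that a rebuild is only as good as the surviving blocks it reads: if one of them is silently damaged, the block written to the hot spare comes out wrong too, regardless of database activity.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the one missing block of a stripe from all surviving blocks."""
    return xor_blocks(surviving_blocks)


# A toy stripe: three data blocks and their parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Case 1: the drive holding data[1] fails and every other drive is healthy.
rebuilt = rebuild_missing([data[0], data[2], parity])
clean_rebuild_ok = rebuilt == data[1]

# Case 2: same failure, but the drive holding data[2] returns a silently
# damaged block. The rebuild has no way to notice and writes bad data.
damaged = b"CCXC"
rebuilt_bad = rebuild_missing([data[0], damaged, parity])
corrupt_rebuild_ok = rebuilt_bad == data[1]

print(clean_rebuild_ok, corrupt_rebuild_ok)
```

In this toy model the second rebuild silently produces a block that differs from the original, which is one way an idle database could end up damaged after a rebuild.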
|
|
surya_rakanta
Starting Member
19 Posts |
Posted - 2008-07-30 : 06:18:43
|
This isn't a SQL problem. Torn pages are issues in the IO system, not within the database engine.
---
But at which offset does the torn page occur?

Eka Siswanto
------------
SQL Server 2000 Adept :) |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2008-07-30 : 09:44:56
|
quote: Originally posted by surya_rakanta
This isn't a SQL problem. Torn pages are issues in the IO system, not within the database engine.
---
But at which offset does the torn page occur?

Why are you interested?

--
Gail Shaw
SQL Server MVP |
|
|
surya_rakanta
Starting Member
19 Posts |
Posted - 2008-07-30 : 20:13:04
|
---But at which offset does the torn page occur?
Why are you interested?
---
Well, by using the information about the last operation that was done, or was about to be done, on that page (by viewing the m_lsn value and checking it against the log file), we can at least get a glimpse of the cause of the problem.

Or what does the error log say about this?

--
Eka S.
http://ekasiswanto.wordpress.com |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2008-07-31 : 04:46:33
|
The last operation done on the page is unlikely to have anything to do with how it became torn. The best you could do by looking at the last modification LSN (assuming it's intact) is get the latest date at which the page was known to be intact. That depends on having all the log entries available to compare the LSN with.

The definition of a torn page is that it was written to disk and the write did not succeed completely, or the page was modified or damaged on disk some time after it was written.

--
Gail Shaw
SQL Server MVP |
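The "write did not succeed completely" case can be sketched with a toy model. It assumes an 8 KB page stored as sixteen 512-byte sectors, each tagged with a version stamp for the write that produced it. The stamp scheme and the names below are hypothetical simplifications, not SQL Server's on-disk format.

```python
SECTOR_SIZE = 512
SECTORS_PER_PAGE = 16  # 16 * 512 bytes = 8 KB


def write_page(old_sectors, payload, version, sectors_completed):
    """Simulate a page write that stops after `sectors_completed` sectors.

    Each sector is modelled as a (version_stamp, data) tuple. Sectors the
    write never reached keep their old stamp and old data.
    """
    new_sectors = [(version, payload) for _ in range(SECTORS_PER_PAGE)]
    return new_sectors[:sectors_completed] + old_sectors[sectors_completed:]


def is_torn(sectors):
    """A page is flagged as torn when its sectors disagree on the stamp."""
    return len({stamp for stamp, _ in sectors}) > 1


old_page = [(0, b"old")] * SECTORS_PER_PAGE

# A write that reaches all 16 sectors leaves a consistent page.
complete = write_page(old_page, b"new", version=1, sectors_completed=16)

# A write interrupted after 5 sectors leaves a mix of new and old sectors.
partial = write_page(old_page, b"new", version=1, sectors_completed=5)

print(is_torn(complete), is_torn(partial))
```

Note that a check like this only reports that the sectors disagree. It says nothing about when or why the write was cut short, which matches the point that the damage originates in the IO path.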
|
|
surya_rakanta
Starting Member
19 Posts |
Posted - 2008-07-31 : 05:18:34
|
That depends on having all the log entries available to compare the LSN with.
---------
In this case, assuming the log file is not also corrupted, and assuming write-ahead logging to the log file was dutifully done, which is a 50-50 chance, then I should look at the latest LSN value for that particular page ID (the one that has the torn page error in it).

Rgds
Eka S.

God didn't do this, Anna, WE did.
-- Robert Neville in I am Legend. |
|
|
GilaMonster
Master Smack Fu Yak Hacker
4507 Posts |
Posted - 2008-07-31 : 15:26:48
|
OK. And that tells me what? So I find that my torn page has a last-modified LSN that, after a great deal of reading through log backups, I identify as coming from a transaction done three weeks ago. All that tells me is that the page was intact three weeks ago. It may have been torn during that write, or it may have been damaged some time afterwards.

Why do you say that write-ahead logging is 50-50? The log record is always written before the data page may be hardened to disk.

--
Gail Shaw
SQL Server MVP |
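The write-ahead ordering described above can be sketched as follows. This is a minimal toy model, not SQL Server's buffer manager: the class, its fields, and the integer LSNs are all assumptions made for illustration. The one rule it demonstrates is that a page may not be written to disk until the log has been flushed at least up to that page's LSN.

```python
class WriteAheadStore:
    """Toy store that enforces log-before-page ordering."""

    def __init__(self):
        self.log_buffer = []      # log records still only in memory
        self.durable_log = []     # log records already on disk
        self.cached_pages = {}    # page_id -> (page_lsn, value), in memory
        self.durable_pages = {}   # page_id -> (page_lsn, value), on disk
        self.next_lsn = 1

    def modify(self, page_id, value):
        """Log the change first, then apply it to the cached page."""
        lsn = self.next_lsn
        self.next_lsn += 1
        self.log_buffer.append((lsn, page_id, value))
        self.cached_pages[page_id] = (lsn, value)
        return lsn

    def flush_log(self, up_to_lsn):
        """Move buffered log records with LSN <= up_to_lsn to disk, in order."""
        while self.log_buffer and self.log_buffer[0][0] <= up_to_lsn:
            self.durable_log.append(self.log_buffer.pop(0))

    def flush_page(self, page_id):
        """Write a page to disk, flushing the log up to its LSN first."""
        page_lsn, value = self.cached_pages[page_id]
        self.flush_log(page_lsn)  # the write-ahead rule
        self.durable_pages[page_id] = (page_lsn, value)

    def durable_log_lsn(self):
        """Highest LSN that is on disk in the log (0 if none)."""
        return self.durable_log[-1][0] if self.durable_log else 0


store = WriteAheadStore()
store.modify("page-7", "a")
store.modify("page-9", "b")
store.modify("page-7", "c")
store.flush_page("page-9")

# Every page on disk has an LSN no greater than the durable end of the log.
print(store.durable_pages, store.durable_log_lsn())
```

In this model a crash can lose the cached change to page-7, but it can never leave a page on disk whose log record is missing, as long as the storage layer honours the writes it acknowledges.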
|
|
surya_rakanta
Starting Member
19 Posts |
Posted - 2008-07-31 : 20:31:39
|
after a great deal of reading through log backups
-----------
In this context, I'm not referring to log backups, just to reading the current log that I have seconds after the database crashed. I'm also not referring to the last m_lsn in that page; that value may be the latest, but it may also be about to be overwritten.

What I'm trying to convey is the latest LSN value that exists in the CURRENT log file for that page ID. Compare it with the m_lsn in the damaged page to decide which of these LSNs is the most up to date.

Well, I have found some cases where there is no guarantee that the records always reside in the log file (more on this on my website). Write-ahead logging is excellent, but this assumes that SQL Server is not aborted/kicked by the OS in the event of a hardware failure. Hence, there is a 50-50 chance that the log file is also in a corrupted state. I do not know whether there is any method to check the consistency of the log file, though :)

Rgds
Eka S.

God didn't do this, Anna, WE did.
-- Robert Neville in I am Legend. |
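The comparison proposed here could be sketched as below. It assumes LSNs are written in a colon-separated three-part form, with the page header value in decimal and the log values in hexadecimal; that formatting, the helper name, and all sample values are assumptions for illustration only.

```python
def parse_lsn(text, base=10):
    """Parse a three-part LSN such as "(26:243:1)" into a comparable tuple."""
    parts = text.strip("()").split(":")
    if len(parts) != 3:
        raise ValueError(f"expected three colon-separated parts, got {text!r}")
    return tuple(int(part, base) for part in parts)


# Hypothetical m_lsn read from the damaged page's header (decimal form).
page_lsn = parse_lsn("(26:243:1)")

# Hypothetical LSNs of current-log records that touch the same page ID
# (hexadecimal form).
log_lsns = [
    parse_lsn(text, base=16)
    for text in ("0000001a:000000f0:0002",
                 "0000001a:000000f3:0001",
                 "0000001a:00000101:0004")
]

# Tuples compare part by part, so max() yields the most recent record.
latest_logged = max(log_lsns)
page_is_behind_log = page_lsn < latest_logged
page_lsn_found_in_log = page_lsn in log_lsns

print(latest_logged, page_is_behind_log, page_lsn_found_in_log)
```

Such a comparison only orders the page against the log. It does not by itself show when or how the page was damaged.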
|
|
|