Author |
Topic |
brads9999
Starting Member
6 Posts |
Posted - 2007-03-28 : 12:33:22
|
I'm in the process of trying to cutover SLQ 2000 database activity to a newer used server running Windows 2003. All the latest Win & SQL Service Packs through late Feb 2007 have been installed. The server has a Promise Fast Track S150 TX2plus SCSI raid with 2 80 gb drives. I have been using at as database development system for 3 months without any issues.This morning I restored the latest full backup and 2 intervening transaction log backups. I am the only user on the system today. I ran DBDCC CHECKDB WITH TABLERESULTS 5 times over a 45 minute period and received 5 different results!Run 1 - No ErrorRun 2 - Following Errors:2511 16 2 Table error: Object ID 754101727, Index ID 0. Keys out of order on page (1:631938), slots 40 and 41. repair_rebuild8990 10 1 CHECKDB found 0 allocation errors and 1 consistency errors in table 'DatabaseEventLog' (object ID 754101727).8928 16 1 Object ID 1218103380, index ID 255: Page (1:373888) could not be processed. See other errors for details. repair_allow_data_loss8990 10 1 CHECKDB found 0 allocation errors and 1 consistency errors in table 'WeatherTransaction' (object ID 1218103380).Run 3 - Following Errors:2511 16 2 Table error: Object ID 754101727, Index ID 0. Keys out of order on page (1:631938), slots 40 and 41. repair_rebuild8990 10 1 CHECKDB found 0 allocation errors and 1 consistency errors in table 'DatabaseEventLog' (object ID 754101727).8928 16 1 Object ID 1218103380, index ID 255: Page (1:373888) could not be processed. See other errors for details. repair_allow_data_loss8990 10 1 CHECKDB found 0 allocation errors and 1 consistency errors in table 'WeatherTransaction' (object ID 1218103380).Run 4 - Following Errors:2511 16 2 Table error: Object ID 754101727, Index ID 0. Keys out of order on page (1:631938), slots 40 and 41. repair_rebuild8990 10 1 CHECKDB found 0 allocation errors and 1 consistency errors in table 'DatabaseEventLog' (object ID 754101727).8928 16 1 Object ID 1218103380, index ID 255: Page (1:373888) could not be processed. See other errors for details. repair_allow_data_loss8941 16 102 Table error: Object ID 1218103380, index ID 255, page (1:373888). Test (sorted [i].offset >= PAGEHEADSIZE) failed. Slot 8, offset 0x2 is invalid. repair_allow_data_loss8942 16 101 Table error: Object ID 1218103380, index ID 255, page (1:373888). Test (sorted[i].offset >= max) failed. Slot 0, offset 0x8 overlaps with the prior row. repair_allow_data_loss8990 10 1 CHECKDB found 0 allocation errors and 3 consistency errors in table 'WeatherTransaction' (object ID 1218103380).Run 5 - No Error!I ran several of the obligatory CHKDSK and had no errors:c:\>chkdskThe type of the file system is NTFS.WARNING! F parameter not specified.Running CHKDSK in read-only mode.CHKDSK is verifying files (stage 1 of 3)...File verification completed.CHKDSK is verifying indexes (stage 2 of 3)...Index verification completed.CHKDSK is verifying security descriptors (stage 3 of 3)...Security descriptor verification completed. 78116030 KB total disk space. 45291176 KB in 33231 files. 10496 KB in 3135 indexes. 0 KB in bad sectors. 121086 KB in use by the system. 65536 KB occupied by the log file. 32693272 KB available on disk. 4096 bytes in each allocation unit. 19529007 total allocation units on disk. 8173318 allocation units available on disk.Are these DBDCC CHECKDB WTIH TABLERESULTS differences normal or is this a bug or hardware issue? Any advice you can offer would be appreciated.Thanks! Brad |
|
Kristen
Test
22859 Posts |
Posted - 2007-04-02 : 03:41:50
|
I believe that WITH TABLERESULTS is not documented ... are you inserting the results into the database you are testing? If so maybe try inserting into a different database instead? </GraspingAtStraws>Kristen |
|
|
paulrandal
Yak with Vast SQL Skills
899 Posts |
Posted - 2007-04-03 : 11:14:51
|
Kristen - running tableresults will have no effect on the actual corruptions returned - just how they're reported.Hi Brad - you didn't get 5 different sets of results. Run 1 doesn't count - as it had no corruptions, which is the normal case. Runs 2 and 3 have the exact same results. Run 4 has the same results as 2 & 3 but with two additional corruptions on the same page - they were probably not reported during runs 2 & 3 as there were active updates to the text page (index ID 255 stores the text for an object) while CHECKDB was running.Run 5 is the interesting one. Is the system live and undergoing I/U/D activity? If so, my guess is that the corruptions were removed by concurrent activity. Or when you say you're the only user, are you saying the database was *completely* quiescent with no active transaction log?ThanksPaul RandalPrincipal Lead Program Manager, Microsoft SQL Server Core Storage Engine(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)http://blogs.msdn.com/sqlserverstorageengine/default.aspx |
|
|
brads9999
Starting Member
6 Posts |
Posted - 2007-04-03 : 12:08:21
|
Paul - This server has had NO I/U/D activity on the database after the backup and transaction logs were restored. I have only run the DBCC CHECKDB queries against it to verify everything went ok after the restores. When I got the first error, I aborted the cutover to making it the live server - fortunately!By the way, I have run this torture test script more a couple of times before capturing the results and posting then in this log. The results vary on each run.I'm convinced it's a hardware issue. I have an identical server that I have since performed the same restore steps on and ran the DBDCC CHECKDB torture test multiple times with perfect results - no error. So we are bringing that server to our hosting facility today.The thing that really, really concerns me is how else do you detect that you have hardware error in this situation? No complaints in the system logs; no complaints by CHKDSK. If I only ran the DBCC CHECKDB test once with 0 errors, and slapped this puppy into production, I believe I would be in serious trouble right now.Any insight you can give would be appreciated. If there are other diagnostics I can run, I will be happy to. Thanks! Brad |
|
|
paulrandal
Yak with Vast SQL Skills
899 Posts |
Posted - 2007-04-03 : 13:07:10
|
On SQL 2000 the options are limited - hardware diagnostics and using IOStress to test the IO subsystem. Are there any disk errors in the NT event logs?On SQL 2005 you can use page checksums to detect when the IO subsystem is changing page contents underneath SQL Server and also SQLIOSim to do more comprehensive testing of the whole IO subsystem's capabilities. There are more details in the two SQL IO whitepapers:2000 IO: http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx2005 IO: http://www.microsoft.com/technet/prodtechnol/sql/2005/iobasics.mspxThanksPaul RandalPrincipal Lead Program Manager, Microsoft SQL Server Core Storage Engine(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)http://blogs.msdn.com/sqlserverstorageengine/default.aspx |
|
|
Kristen
Test
22859 Posts |
Posted - 2007-04-03 : 13:46:41
|
"The thing that really, really concerns me is how else do you detect that you have hardware error in this situation?"I'm fond of running a memory test on servers destined for 24/7 operation for at least 48 hours. We use:www.memtest86.comDoesn't help with disk controllers though!Kristen |
|
|
brads9999
Starting Member
6 Posts |
Posted - 2007-04-05 : 12:25:04
|
To All - I ran the anaylze component of the disk defragmenter and found that the disk drive was severely fragmented - lots of large red areas all over.I ran DBDCC CHECKDB several times and it still exhibited the inconsistent errors. Keep in mind that there have NO I/U/D's done on this server since I orginally restored tha database on 3/28/07.I ran the defragementer a couple of time to defrag and condense the data.Now for the best part - I re-ran my DBCC CHECKDB 25 times in a row, saving the results to a table I created. NO ERRORS! NONE!I just can't believe that disk fragmentation can cause database consistency errors, unless the RAID 1 controller is doing some magic under the covers and returning the wrong result, or not returning the result quick enough, so you end up with errors like: Table error: Object ID 754101727, Index ID 0. Keys out of order on page (1:631938), slots 40 and 41.a Object ID 1218103380, index ID 255: Page (1:373888) could not be processed. See other errors for details. Table error: Object ID 1218103380, index ID 255, page (1:373888). Test (sorted [i].offset >= PAGEHEADSIZE) failed. Slot 8, offset 0x2 is invalid.I still don't trust this server. We haven't defragged our current server in the past 3 years and we have never had this type of odd consistency error.Any additonal thoughts you have would be appreciated. Thanks Brad |
|
|
paulrandal
Yak with Vast SQL Skills
899 Posts |
Posted - 2007-04-05 : 17:50:24
|
Given that the problems went away when you changed the disk file fragmentation, I can only think of a h/w problem returning a bad page. There's nothing inside CHECKDB that could possibly be affected by physical file layout. I wouldn't trust that IO subsystem any more.Paul RandalPrincipal Lead Program Manager, Microsoft SQL Server Core Storage Engine(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)http://blogs.msdn.com/sqlserverstorageengine/default.aspx |
|
|
brads9999
Starting Member
6 Posts |
Posted - 2007-04-05 : 23:39:23
|
Paul - Thank you for your reply. I don't trust this server either. I dropped by our colocation site tonight and started the recommended memory test for yuks. I have a duplicate server we are stress testing at the office now. If all goes well, I plan to install that on Tuesday and swap out the problem one.I will post another reply, if there are any new developments. Thank you for your time and advice. It is appreciated. Brad |
|
|
brads9999
Starting Member
6 Posts |
Posted - 2007-05-03 : 15:41:30
|
Paul - I just wanted to close the loop on this issue. I prepared another server of the same make and manufacture as the one with the data corruption. With the lessons learned from my previous attempts, I did a lot more stress testing and dry runs without incident.I cut over the new server this morning and so far, so good. Thank you for your help and epertise. Brad |
|
|
paulrandal
Yak with Vast SQL Skills
899 Posts |
Posted - 2007-05-03 : 16:07:34
|
No problem. Let me know if anything else happens. Cheers.Paul RandalPrincipal Lead Program Manager, Microsoft SQL Server Core Storage Engine(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)http://blogs.msdn.com/sqlserverstorageengine/default.aspx |
|
|
|