
Error 823 - Non DBA needs help again


k420
Starting Member

32 Posts

Posted - 2008-07-17 : 07:10:16
Hi all,

I'm a Web/VB/SQL developer for a small company where we run large SQL
Server 2000 databases but don't have the luxury of being able to afford a
proper DBA, so I get stuck with any database problems.

I recently had a problem with chain linkage mismatch errors which was dealt with in this post
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=105658

There's been no problems since until last night. I received an email from our event log watcher on the database server which stated the following:

Source: MSSQLSERVER

Date/Time: 16-Jul-2008 18:03:25

EntryType: Error

Category: Server

Message:
Error: 823, Severity: 24, State: 2
I/O error (bad page ID) detected during read at offset 0x00000271b0a000 in file 'e:\MSSQL\Data\dbname.mdf'.
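The offset in this message can be turned into a page number, because SQL Server data files are made up of 8 KB (8192-byte) pages. A minimal Python sketch; `offset_to_page` is a hypothetical helper written for illustration, not part of any SQL Server tool:

```python
# SQL Server data files are divided into 8 KB pages, so page N starts at byte N * 8192.
PAGE_SIZE = 8192


def offset_to_page(offset: int) -> int:
    """Convert a byte offset reported in an 823 message into a page number within that file."""
    if offset % PAGE_SIZE != 0:
        raise ValueError("offset is not on a page boundary")
    return offset // PAGE_SIZE


# Offset copied from the 823 message above.
print(offset_to_page(0x00000271b0a000))  # 1281413
```

Assuming dbname.mdf is the primary data file (file ID 1), that would be page (1:1281413), which could be examined with the undocumented DBCC PAGE command if the error comes back.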


So I've run DBCC CHECKDB (dbname) WITH ALL_ERRORMSGS, NO_INFOMSGS against it and it didn't report any problems.

Based on what I've read and the previous problem that I referred to, we've decided to fail over the cluster to the standby server and replace the server that seems to be dodgy.

Firstly, we haven't ordered the new server yet, so before we do: is this a sound decision?

Secondly, when this happened last time I could tell that our backup was OK because running DBCC CHECKDB (dbname) WITH ALL_ERRORMSGS, NO_INFOMSGS against it returned no errors. Given that the live database isn't reporting errors this time, I don't expect the backup to report any either, so how do I know if it's OK? I've only received this one event and the system seems to be working fine. Is there a chance that the error was caused by the hardware alone, that there's nothing wrong with the .mdf, and that I don't need to do a restore?

Thanks in advance for your time.

Keith

GilaMonster
Master Smack Fu Yak Hacker

4507 Posts

Posted - 2008-07-17 : 14:40:41
823 and 824 are IO errors. Look at your drives, not at the server. Especially, in your case, check the mirroring of the drives, how it works and if it reported any errors.
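An event log watcher can apply this rule directly: pull the error number out of the "Error: N, Severity: S, State: T" header and treat 823/824 as storage problems rather than database problems. A minimal Python sketch; the names `IO_ERRORS`, `HEADER` and `classify` are hypothetical, and the header format is assumed from the event log entry quoted in the first post:

```python
import re

# 823 and 824 point below SQL Server (drives, controller, mirroring).
IO_ERRORS = {823, 824}

# Matches headers such as "Error: 823, Severity: 24, State: 2".
HEADER = re.compile(r"Error:\s*(\d+),\s*Severity:\s*(\d+),\s*State:\s*(\d+)")


def classify(message: str) -> str:
    """Return 'io' for 823/824, 'other' for any other SQL error, 'none' if no header is found."""
    match = HEADER.search(message)
    if match is None:
        return "none"
    error_number = int(match.group(1))
    return "io" if error_number in IO_ERRORS else "other"


print(classify("Error: 823, Severity: 24, State: 2"))  # io
```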

Check the event logs to see if there's anything from the IO subsystem around the same time; it will probably be in the system event log.
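"Around the same time" amounts to a window check: keep only the system-log entries whose timestamps fall within some distance of the SQL Server error. A minimal Python sketch, assuming the entries have already been exported as (timestamp, source, text) tuples; `events_near` and the sample entries are hypothetical, for illustration only:

```python
from datetime import datetime, timedelta


def events_near(error_time, events, window=timedelta(hours=1)):
    """Return the (timestamp, source, text) entries logged within +/- window of error_time."""
    return [e for e in events if abs(e[0] - error_time) <= window]


# Time of the 823 entry in the first post.
sql_error_time = datetime(2008, 7, 16, 18, 3, 25)

# Hypothetical system-log entries, for illustration only.
system_log = [
    (datetime(2008, 7, 16, 3, 0, 12), "cluster", "write I/O error"),
    (datetime(2008, 7, 16, 17, 55, 0), "disk", "example disk warning"),
]

print(len(events_near(sql_error_time, system_log)))                       # 1
print(len(events_near(sql_error_time, system_log, timedelta(hours=16))))  # 2
```

Widening the window is worth trying when nothing turns up nearby, since related IO errors may have been logged hours earlier.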

If CheckDB isn't seeing corruption, you're probably OK. I would be very wary, though. Run CheckDB on the other databases as well. Paranoia is sometimes good.
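To check every database with the same options used in the first post, one option is to generate one DBCC CHECKDB command per database and paste the output into Query Analyzer. A minimal Python sketch; `checkdb_statements` and the database list are hypothetical, and each name is passed as a T-SQL string literal:

```python
def checkdb_statements(database_names):
    """Build one DBCC CHECKDB statement per database, with ALL_ERRORMSGS and NO_INFOMSGS."""
    statements = []
    for name in database_names:
        # Pass the name as a T-SQL string literal, doubling any embedded single quote.
        literal = "'" + name.replace("'", "''") + "'"
        statements.append(f"DBCC CHECKDB ({literal}) WITH ALL_ERRORMSGS, NO_INFOMSGS")
    return statements


# Hypothetical database list; on SQL Server 2000 the real names are in master..sysdatabases.
for stmt in checkdb_statements(["master", "msdb", "dbname"]):
    print(stmt)
```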


--
Gail Shaw
SQL Server MVP

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2008-07-17 : 14:45:18
Agree with Gail.

My guess would be a transient IO problem - *maybe* a memory problem. Unfortunately you're on 2000, so the diagnostic capabilities aren't as good as in 2005. Try running memory diagnostics too.

PS Bravo for having an error log watcher set up!

Paul Randal
SQL Server MVP, Managing Director, SQLskills.com

k420
Starting Member

32 Posts

Posted - 2008-07-18 : 03:56:46
Thanks for the help, guys. There wasn't anything in the system log around the same time as the SQL Server error, but at around 3am on the same day there were about 50 entries about write I/O errors, logged by the clustering software. We're definitely going ahead with replacing that server, because it's too risky to keep it in service while we diagnose and resolve the problem, especially when it's in a data centre 300 miles away.

I've run CheckDB against all the databases (including the one that had the problem) and everything has come back clean, so we're just going to leave them running on the failover server.

The event log watcher is just a .NET service which took me about an hour to write. I've often found myself over the years writing programs in the areas I specialise in to help me with the areas that I don't ;-)

Cheers

Keith