chain link mismatch - SQL Server Forums

Please start any new threads on our new site at https://forums.sqlteam.com. We've got lots of great SQL Server experts to answer whatever question you can come up with.

All Forums

General SQL Server Forums

Data Corruption Issues

chain link mismatch

Author

Topic

amanzimdwini
Starting Member

7 Posts

Posted - 2005-09-19 : 21:31:51

I have been throwing these lovely errors on my db (approx 16Gb) all of a sudden - they might all be related:

1) Operations Manager Critical Error Alert
Severity: Critical Error
Status: New
Source: MSSQLSERVER
Name: Attempt to fetch logical page belongs to different object .
Description: Error: 605, Severity: 21, State: 1
Attempt to fetch logical page (4:1878339) in database 'NGProd' belongs to object 'claim_requests', not to object 'Assessment_'.

2) Event Description:
Error: 644, Severity: 21, State: 5
Could not find the index entry for RID '36303030318eefd3713d52f4a85df3b0b8a889904e4e608102703230303530393139'
in index page (3:271543), index ID 9, database 'NGProd'.

3) Event Description:
Error: 8908, Severity: 22, State: 6
Table error: Database ID 10, object ID 86095893, index ID 0. Chain linkage mismatch. (4:1880687)->next = (4:441596), but (4:441596)->prev = (4:441595)

dbcc checkdb gives me 0 errors & running the built-in integrity reports does not seem to find anything.

I have been told that my db has "cracks" in it, and that I need to export ALL of it and then re-import. Seems a bit too much work...

Anyone have any ideas on what I should first attempt to do? I don't even know how to fix these errors... seems that I only throw then when users are accessing specific records.

Any help would be most welcome!

Thanks,
k

ryanston
Microsoft SQL Server Product Team

89 Posts

Posted - 2005-09-19 : 22:28:03

quote:
Originally posted by amanzimdwini

I have been throwing these lovely errors on my db (approx 16Gb) all of a sudden - they might all be related:

1) Operations Manager Critical Error Alert
Severity: Critical Error
Status: New
Source: MSSQLSERVER
Name: Attempt to fetch logical page belongs to different object .
Description: Error: 605, Severity: 21, State: 1
Attempt to fetch logical page (4:1878339) in database 'NGProd' belongs to object 'claim_requests', not to object 'Assessment_'.

2) Event Description:
Error: 644, Severity: 21, State: 5
Could not find the index entry for RID '36303030318eefd3713d52f4a85df3b0b8a889904e4e608102703230303530393139'
in index page (3:271543), index ID 9, database 'NGProd'.

3) Event Description:
Error: 8908, Severity: 22, State: 6
Table error: Database ID 10, object ID 86095893, index ID 0. Chain linkage mismatch. (4:1880687)->next = (4:441596), but (4:441596)->prev = (4:441595)

dbcc checkdb gives me 0 errors & running the built-in integrity reports does not seem to find anything.

I have been told that my db has "cracks" in it, and that I need to export ALL of it and then re-import. Seems a bit too much work...

Anyone have any ideas on what I should first attempt to do? I don't even know how to fix these errors... seems that I only throw then when users are accessing specific records.

Any help would be most welcome!

Thanks,
k

The first thing you should do is check for hardware errors in the NT event log. After that, run your manufacturer's hardware diagnostics to ensure that you've got no latent hardware problems. Verify that you're up-to-date on the latest and greatest firmware revisions from your hardware manufacturer, too.

If possible, trigger a CHECKDB immediately - or as soon as you can - after the errors are reported by the user. You might be able to catch the problem sooner.

The errors you're seeing are indicative of corruption, but the fact that CHECKDB doesn't report it indicates that it's potentially in-memory only. This coudl be caused by problems with stale reads from a caching disk controller or possibly missed writes. I don't see anythign that might indicate a software bug, especially since the failures started "all of a sudden" -- I assume no change in workload or maintenance activity?

Do you have any 3rd party processes running in-process in SQL? I'm thinking specifically of OLE DB providers, COM components, extended stored procedures, etc. They can cause memory corruption like this...

Thanks, and let me know if I can help further.

----------------------
Ryan Stonecipher
Developer, Microsoft SQL Server Storage Engine, DBCC
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)

amanzimdwini
Starting Member

7 Posts

Posted - 2005-09-20 : 13:01:11

Ryan -
thanks for your info. I am currently the unwilling participant in a "pass-the-buck" game, with my hardware support stating that we have the "latest and greatest", and my software support stating that a db with dbcc checkdb returning 0 errors is "clean". I don't think they can both be right, but am not in a position to tell who might be wrong...

The hardware people claim that if index tables are corrupted I have a software problem; the software people are looking for bad RAID controllers (which I once had).

TWO questions for you (or anyone else out there)
a) is there a "more thorough" way to check for link mismatches / bad index tables etc in my db then dbcc checkdb (or are there option to that that I should know about but don't?) Like I said before, I am afraid that I have a db that has "old" problems in it that I am only now finding...
b) Any suggestions as to whom to call at MSFT SQL to have THEM have a look at my db?

Thanks for all your suggestions,
karl

ps - I am physically in Kirkland, so am problably quite close to you. My servers live in Seattle...
k

amanzimdwini
Starting Member

7 Posts

Posted - 2005-09-20 : 13:04:13

Ryan -
I forgot to tell you:
the box that we are running on is a SQL server AND a file server. However, the only SQL running on it is the one db I am worried about; there are no
"3rd party processes running in-process in SQL / OLE DB providers, COM components, extended stored procedures, etc."

karl

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2005-09-20 : 15:33:19

CHECKDB is the best there is - if that says your DB is clean, then its clean and you've got either transient h/w issues or a memory-only corruption.

Which build are you on?

Do you have any kind of support contract with MS? (From the table names and location I have an inkling who you may work for) If so, you should call into PSS and have them help you with this.

Thanks

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)

amanzimdwini
Starting Member

7 Posts

Posted - 2005-09-21 : 00:55:44

Paul (and Ryan)
I just ran dbcc checkdb with no_infomsg (or similar) - hardware told me that I need to pass that 3 times without errors (which I just did). They are now doing something else...

Hardware talked to MSFT (we have a contract) and "tweaked" my system during the day. No errors thrown since then!

Build: 2000.080.0760.00

I am still wondering if there is a "deeper" way to check my db then "just" checkdb. Is it true that the only "real" way to look for deep problems is to export and re-import (and then look for errors??)

Thanks for the help (I doubt you *really* know who I work for: we are quite small)

karl

ryanston
Microsoft SQL Server Product Team

89 Posts

Posted - 2005-09-21 : 13:06:18

Good to know that the hardware changes have at least suppressed the problem temporarily -- hopefully it's fixed for good. As to your quesiton about deeper checks, the answer is no. CHECKDB is as deep as it gets. I don't think that export/import is ever the correct way to check for problems. Sometimes it's the only way to get your database back up and running after a corruption, but I'd never recommend it as a diagnostic "tool".

Remember that corruptions can manifest themselves only in-memory and never make it to disk. These are the real nasty ones to track down because they can be so intermittent. For example, at point A, a page is in memory and some random process comes along and writes to it when it shouldn't. After this, at point B, the buffer pool kicks the page out of the cache to make room for something else. If you run CHECKDB between points A and B, you might find corruption; if you run CHECKDB after point B, though, you'll get a clean image of the page, and we won't report corruption. This is because the guy who wrote to the page in A wasn't supposed to, and the buffer pool didn't ever flush the page to disk (it wasn't "dirty"), and when we read the page in after point B, it was clean. Yuck.

Other times, a controller that has caching enabled might be at fault. If we read a page from disk, and the controller has a cache, it's that controller's responsibility to return to us the most recent page that it has seen. If it gives us an old page though (a "stale" page), we can report corruption because it's older than the rest of the pages we've read. Later on, that caching controller might flush itself and the next time we read the page, we actually get it from disk, where it's right. Yuck again.

Typically, it takes the likes of the rockstars in our support team to help you track these problems down, and they have some runtime diagnostics they can help you turn on to get to the bottom of it. That's really beyond the scope of a forum like this one though.

More information than you wanted, probably :) Glad we could help.

----------------------
Ryan Stonecipher
Developer, Microsoft SQL Server Storage Engine, DBCC
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)

amanzimdwini
Starting Member

7 Posts

Posted - 2005-09-22 : 02:26:27

Ryan -
thanks for all the info. I have three more questions, and then I'll let you go:
1) Can I run checkdb on a db that is in use? I usually run it "after hours" when I can be the only owner, 'cause I seem to have found error during active use that vanish once the db is flushed and written to disk. Is that a correct assumption on my part? Or should I be worried if I find ANY error?
2) What would happen if I run ANY repair option on a db that does not have errors? Or that has errors "only" in memory but not yet on disk?
3) Last one: I am still wondering about the "deep" problems that I have been told checkdb would not find, like chain link mismatches... (does it find them?) What are the limits of checkdb, and how would I find that in principle? How about making a copy of my PROD and then scripting an export and import routine (at least on a TEST db?) Too much work?

Thanks for everything,
karl

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2005-09-22 : 15:25:05

quote:
Originally posted by amanzimdwini
[) Last one: I am still wondering about the "deep" problems that I have been told checkdb would not find, like chain link mismatches... (does it find them?) What are the limits of checkdb, and how would I find that in principle?

I'll let Ryan answer the others but I'll bite on the last question.

Who told you? They're wrong - tell them I said so and I wrote CHECKDB for SQL Server 2000. There are some pieces of data it doesn't check (like validity of statistics blobs, for instance) but it will find all structural problems with the database.

There are some limitations on what it can repair (e.g. PFS page header corruption, some system table data page corruption, a few pathalogical cases involving multiply-cross-linked IAM chains) but its only supposed to be a last resort if you don't have a backup.

Thanks

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)

ryanston
Microsoft SQL Server Product Team

89 Posts

Posted - 2005-09-23 : 12:56:56

quote:
Originally posted by amanzimdwini

Ryan -
thanks for all the info. I have three more questions, and then I'll let you go:
1) Can I run checkdb on a db that is in use? I usually run it "after hours" when I can be the only owner, 'cause I seem to have found error during active use that vanish once the db is flushed and written to disk. Is that a correct assumption on my part? Or should I be worried if I find ANY error?

Yes, you can run on a database that's in use. Sometimes I have seen the log record "replay" logic that we use trigger spurious errors. You should always investigate ANY errors reported by CHECKDB.

quote:

2) What would happen if I run ANY repair option on a db that does not have errors? Or that has errors "only" in memory but not yet on disk?

If the database has no errors, then repair won't do anything. As for errors that are only in-memory, is that when you put the database into single-user mode (which is required to run repair), all buffers will be flushed to disk, and read back in; repair won't see any of the transient errors.

That said, don't rely on repair! Have a good backup strategy, and use your database & transaction log backups to recover from failure. You WILL lose data when CHECKDB requires REPAIR_ALLOW_DATA_LOSS. I'm fairly confident that this is a bad thing for most customers

quote:

3) Last one: I am still wondering about the "deep" problems that I have been told checkdb would not find, like chain link mismatches... (does it find them?) What are the limits of checkdb, and how would I find that in principle? How about making a copy of my PROD and then scripting an export and import routine (at least on a TEST db?) Too much work?

Thanks for everything,
karl

----------------------
Ryan Stonecipher
Developer, Microsoft SQL Server Storage Engine, DBCC
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)

amanzimdwini
Starting Member

7 Posts

Posted - 2005-09-30 : 17:29:24

Paul / Ryan
Here I am a week later with the following "resolution" according to my external IT support:
a) we had TWO identical tables in the db AND
b) one of them had NON ASCII characters in it.

After deleting the records with the non-ascii characters, things have been beautiful...

Is there any chance that that "resolution" is currect? Would checkdb find non-ascii characters?

Thanks,
karl

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2005-10-01 : 10:53:27

Sounds like total rubbish to me. Why would having two identical tables cause physical corruptions within one of the tables? And why would have non-ascii chars cause b-tree page linkage problems?

How were the records deleted?

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)

abarringer
Starting Member

13 Posts

Posted - 2005-10-28 : 17:29:44

Hey guys,

I started having the same problem on one of my production machines at 3:03pm CDT today, immediately after doing unusually heavy write activities replication started failing with Error 644.

Interestingly DBCC CHECKDB reported 0 errors.

I did a DBCC DBREINDEX() on all the tables one at a time and retested until I found one that fixed the issue.

However, I'm greatly concerned it didn't fix the root issue. The hardware is about 3 months old, and has had absolutely no problems since putting it in production.

No other unusual things in any logs anywhere. Not sure what else to try?

Running Version:
Microsoft SQL Server 2000 - 8.00.2039 (Intel X86)
May 3 2005 23:18:38
Copyright (c) 1988-2003 Microsoft Corporation
Standard Edition on Windows NT 5.2 (Build 3790: Service Pack 1)

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2005-10-28 : 17:35:39

Please start a new thread.

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)

WiredUK
Starting Member

1 Post

Posted - 2005-10-31 : 08:38:48

quote:
Originally posted by amanzimdwini
Hardware talked to MSFT (we have a contract) and "tweaked" my system during the day. No errors thrown since then!

Karl,

Any ideas what the "tweaks" were? We are experiencing similar issues.

8908 chain linkage errors, over 4 days, same DB, same table

CHECKDB reports no issues

No entries in event log or HP system logs

It seems to have quieted down for now, but I am worried that it could return...
David

amanzimdwini
Starting Member

7 Posts

Posted - 2005-10-31 : 12:11:55

[/quote]
Karl,

Any ideas what the "tweaks" were? We are experiencing similar issues.

8908 chain linkage errors, over 4 days, same DB, same table

CHECKDB reports no issues

No entries in event log or HP system logs

It seems to have quieted down for now, but I am worried that it could return...
David
[/quote]
I'll have my external IT have a look at this post & let you know what they did. Don't know how long it will take for them to get around to it, though... sorry.

karl

aa_ronnie
Starting Member

10 Posts

Posted - 2005-12-02 : 10:43:57

Dear Karl,

We are having exactly the same problem as number 1) error after the electric power was suddently off, we have about 80000 records in the last object. when I am doing query on this object with "where" cause, this error will appear, but this error won't appear if without "where" cause, at the same time when I scrolled down to the bottom records, the same error appears again.
Here is my contact mail: aa_ronnie@126.com

Please tell me how to solve this error, it is urgent! Thanks a lot!

RONNIE

paulrandal
Yak with Vast SQL Skills

899 Posts

Posted - 2005-12-02 : 12:28:05

quote:
Originally posted by aa_ronnie

Dear Karl,

We are having exactly the same problem as number 1) error after the electric power was suddently off, we have about 80000 records in the last object. when I am doing query on this object with "where" cause, this error will appear, but this error won't appear if without "where" cause, at the same time when I scrolled down to the bottom records, the same error appears again.
Here is my contact mail: aa_ronnie@126.com

Please tell me how to solve this error, it is urgent! Thanks a lot!

RONNIE

Please start a new thread, and include the results from 'DBCC CHECKDB (dbname) WITH ALL_ERRORMSGS, NO_INFOMSGS'. Do you have a disaster recovery strategy? (I'm guessing no otherwise you wouldn't be posting here). Do you have any kind of backup?

Regards

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)

aa_ronnie
Starting Member

10 Posts

Posted - 2005-12-04 : 11:06:30

Dear Paul,

Yes, you are right, they were not doing a good job on disaster recovery strategy with closest backup back to Nov 21st. We create an alternated temporary table to hold the data and then create a new table in the TEST instance, it seems this problem gone away. We will do it on PROD instance on Monday.

Please give me more good recommends.

Best Regards

RONNIE

aa_ronnie
Starting Member

10 Posts

Posted - 2005-12-04 : 11:24:58

BTW, I am pretty new on SQL Server, would you please tell how to run DBCC CHECKDB? (I couldn't find this tool in the Server). After knowing it, I will try to get the "ALL_ERRORMSGS" & "NO_INFORMSGS" for your reference.

RONNIE

aa_ronnie
Starting Member

10 Posts

Posted - 2005-12-04 : 11:35:36

Or I just need to run this script like: "DBCC CHECKDB (dbname) WITH ALL_ERRORMSGS, NO_INFORMSGS"? Thanks for your help again. :-)

RONNIE

Next Page

Subscribe to SQLTeam.com

SQLTeam.com Articles via RSS

SQLTeam.com Weblog via RSS

- Advertisement -

Resources