Author |
Topic |
Wang
Starting Member
48 Posts |
Posted - 2005-10-24 : 06:38:52
|
Hi, hope this is the right forum to post in.

I found the following in the logs this morning (SQL2k SP3, Win2k):

quote: ex_raise2 - No handler found for exception major 36 minor 24 severity 20 - Server terminating

Followed by:

quote:
Problem creating stack dump file due to internal exception
2005-10-23 16:52:23.71 spid2304 SQL Server Assertion: File: <p:\sql\ums\inc\umslist.h>, line=317 Failed Assertion = 'el->m_next == 0'.

Then a SQL Server restart. Not a nice thing to see first thing in the morning, I'm sure you'll all agree.

However, I think this all started a second or so earlier:

quote:
Using 'dbghelp.dll' version '4.0.5'
*Dump thread - spid = 65, PSS = 0x3643d228, EC = 0x3643d550
quote: SQL Server Assertion: File: <recbase.cpp>, line=1374 Failed Assertion = 'm_nVars > 0'.
quote: Error: 3624, Severity: 20, State: 1.
We see several of these earlier in the day, quite similar, caused by varying items of SQL.

Finally, in the same second as the ex_raise2, on a different spid again to the others:

quote:
Using 'dbghelp.dll' version '4.0.5'
*Stack Dump being sent to m:\sql_data\log\SQLDump1702.txt
quote: SqlDumpExceptionHandler: Process 161 generated fatal exception c0000005 EXCEPTION_ACCESS_VIOLATION. SQL Server is terminating this process..
quote: Error: 0, Severity: 19, State: 0
Half a second later on spid0 (!):

quote:
Open of fault log m:\sql_data\log\exception.log failed.
* 00A09029 Module(sqlservr+00609029) (CSubRuleRemoveSubqInFOJN::`vftable'+00000019)
If anyone has any ideas, please let me know - I've seen various posts pointing at corruption, but dbcc checkdb seems to be clear. |
|
Wang
Starting Member
48 Posts |
Posted - 2005-10-24 : 06:45:58
|
Sorry, to be clear, the last messages in the log are:

quote:
2005-10-23 16:52:23.57 spid0 Open of fault log m:\sql_data\log\exception.log failed.
* 00A09029 Module(sqlservr+00609029) (CSubRuleRemoveSubqInFOJN::`vftable'+00000019)
2005-10-23 16:52:23.71 spid2304 Problem creating stack dump file due to internal exception
2005-10-23 16:52:23.71 spid2304 SQL Server Assertion: File: <p:\sql\ums\inc\umslist.h>, line=317 Failed Assertion = 'el->m_next == 0'.
2005-10-23 16:52:23.71 spid2304 ex_raise2 - No handler found for exception major 36 minor 24 severity 20 - Server terminating
|
|
|
paulrandal
Yak with Vast SQL Skills
899 Posts |
Posted - 2005-10-24 : 06:59:25
|
The UMS assert you received is an artifact of incomplete exception handling around errors received during IOs (in this case the 3624 error is complaining about the recbase corruption assert you saw). This has been fixed in SP4.

This is all symptomatic of corruption. Can you run the following on all your databases and post any output you receive?

DBCC CHECKDB (dbname) WITH ALL_ERRORMSGS, NO_INFOMSGS

Thanks

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.) |
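For anyone following along who has many databases, one way to avoid typing the command once per database is to generate the statements with a small script. The following is only a minimal Python sketch under stated assumptions: the function names are made up for illustration, "Production" is a placeholder database name that does not come from this thread, and bracket-quoting the name is an assumption about how you want identifiers delimited. It only builds the T-SQL text; it does not connect to the server.

```python
def checkdb_statement(dbname: str) -> str:
    """Build the CHECKDB statement from the post for one database name."""
    # Assumption: delimit the name with brackets, doubling any closing
    # bracket, so names containing spaces or odd characters still parse.
    quoted = "[" + dbname.replace("]", "]]") + "]"
    return f"DBCC CHECKDB ({quoted}) WITH ALL_ERRORMSGS, NO_INFOMSGS"


def checkdb_script(dbnames) -> str:
    """Join one statement per database into a script, one per line."""
    return "\n".join(checkdb_statement(name) for name in dbnames)


if __name__ == "__main__":
    # Placeholder list; substitute the databases on your own server.
    print(checkdb_script(["master", "msdb", "Production"]))
```

The resulting text can be pasted into Query Analyzer and run one statement at a time, which keeps the output for each database separate.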
|
|
Wang
Starting Member
48 Posts |
Posted - 2005-10-24 : 07:24:58
|
Ok, thanks very much for the advice, I will do.

Am I right in the belief that this will be a pretty heavy process for an online production database, i.e. something best scheduled for a period of low activity? Or might it be best to bite the bullet and run it now? Production db is about 140 GB. |
|
|
paulrandal
Yak with Vast SQL Skills
899 Posts |
Posted - 2005-10-24 : 07:43:44
|
You need to bite the bullet and run it now before the corruption gets any worse. Depending on your CPUs and IO capabilities, and what corruption may be present, I'm guessing it'll take maybe 6 hours for your DB size. On a heavily loaded TPCC simulator in-house we've seen transaction throughput drop 20% during online CHECKDB on SQL2k, but YMMV.

You should also look through the SQL Server errorlog and Windows event logs for evidence of h/w problems. Are all your h/w drivers/firmware up-to-date?

Be prepared to have to use your backups to restore this DB on different h/w if the h/w is going bad. Do you have a sound disaster recovery strategy?

Thanks

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.) |
|
|
Wang
Starting Member
48 Posts |
Posted - 2005-10-24 : 07:53:14
|
Thanks again for the input. Fortunately I have been reviewing the DR over the last week, I think we are pretty sound. |
|
|
Wang
Starting Member
48 Posts |
Posted - 2005-10-26 : 05:17:10
|
I now have the results from that: all the DBCCs completed cleanly, no messages. Which is nice to come back to after my midweek weekend :) |
|
|
paulrandal
Yak with Vast SQL Skills
899 Posts |
Posted - 2005-10-26 : 12:20:58
|
That's good... and bad. You did have corruption at the point those messages were raised, so something's going wrong somewhere. My advice is to:

1) keep running regular CHECKDBs
2) check all the firmware versions are up-to-date
3) check all drivers are up-to-date
4) look through the Windows event logs and SQL errorlogs for signs of IO problems
5) if #4 proves fruitless, run IO diagnostics just to be sure

Could have been a UFO, but it doesn't hurt to do all these checks just to make sure.

Thanks

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.) |
|
|
Wang
Starting Member
48 Posts |
Posted - 2005-10-26 : 13:06:43
|
We are getting daily (or more) asserts (been going on longer than I've been around). Various reading has suggested it has to do with either text columns, read uncommitted/nolock data movement, or some bizarre ASP (e.g. reuse of recordsets).

The IO concern worries me though, so I shall get the systems guys to look into it more thoroughly from their end.

Cheers very much for the help.

Richard |
|
|
paulrandal
Yak with Vast SQL Skills
899 Posts |
Posted - 2005-10-26 : 13:45:48
|
Can you post some examples of the asserts?

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.) |
|
|
Wang
Starting Member
48 Posts |
Posted - 2005-10-27 : 07:31:29
|
quote:
2005-10-25 17:33:25.90 spid218 Using 'dbghelp.dll' version '4.0.5'
*Dump thread - spid = 218, PSS = 0x3c4a7228, EC = 0x3c4a7550
*Stack Dump being sent to m:\sql_data\log\SQLDump1705.txt
2005-10-25 17:33:52.67 spid218 Stack Signature for the dump is 0xEB0A771A
2005-10-25 17:33:52.68 spid218 SQL Server Assertion: File: <recbase.cpp>, line=1374 Failed Assertion = 'm_nVars > 0'.
2005-10-25 17:33:52.89 spid218 Error: 3624, Severity: 20, State: 1.

quote:
2005-10-24 16:56:39.06 spid113 Using 'dbghelp.dll' version '4.0.5'
*Dump thread - spid = 113, PSS = 0x316f5228, EC = 0x316f5550
*Stack Dump being sent to m:\sql_data\log\SQLDump1703.txt
2005-10-24 16:56:43.77 spid113 Stack Signature for the dump is 0xFBE12B8C
2005-10-24 16:56:43.77 spid113 SQL Server Assertion: File: <p:\sql\ntdbms\storeng\drs\include\record.inl>, line=1447 Failed Assertion = 'm_SizeRec > 0 && m_SizeRec <= MAXDATAROW'.

Those are the 2 typically seen.

The first one is a long ASP inline query that is being engineered out to a proc; the second is an (imho) massively more complex proc than it needs to be. There are others that assert sometimes, but these are the 2 core ones, and the other asserts are almost always the same: nvars or maxdatarow.

The 2 calls that generate these are some of the most commonly called code, I guess up to around 150 times a minute each. |
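With asserts turning up daily, it may help to count which ones recur rather than eyeballing the errorlog. Here is a minimal Python sketch, assuming the errorlog lines look exactly like the excerpts quoted in this thread (timestamp, spid, then the "SQL Server Assertion" text); the regular expression and the function name are illustrative, not an official log format description. The sample lines are copied from the posts above.

```python
import re
from collections import Counter

# Pattern inferred from the log excerpts in this thread (an assumption,
# not a documented format): timestamp, spid, source file, line, assertion.
ASSERT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2})\s+"
    r"(?P<spid>spid\d+)\s+"
    r"SQL Server Assertion: File: <(?P<file>[^>]+)>, "
    r"line=(?P<line>\d+) Failed Assertion = '(?P<expr>.+)'\.\s*$"
)


def tally_assertions(lines):
    """Count failed assertions keyed by (source file, line, expression)."""
    counts = Counter()
    for raw in lines:
        match = ASSERT_RE.match(raw.strip())
        if match:
            key = (match.group("file"), int(match.group("line")), match.group("expr"))
            counts[key] += 1
    return counts


# Sample lines taken from the excerpts posted in this thread.
SAMPLE = [
    "2005-10-25 17:33:52.68 spid218 SQL Server Assertion: File: <recbase.cpp>, line=1374 Failed Assertion = 'm_nVars > 0'.",
    "2005-10-25 17:33:52.89 spid218 Error: 3624, Severity: 20, State: 1.",
    r"2005-10-24 16:56:43.77 spid113 SQL Server Assertion: File: <p:\sql\ntdbms\storeng\drs\include\record.inl>, line=1447 Failed Assertion = 'm_SizeRec > 0 && m_SizeRec <= MAXDATAROW'.",
    r"2005-10-23 16:52:23.71 spid2304 SQL Server Assertion: File: <p:\sql\ums\inc\umslist.h>, line=317 Failed Assertion = 'el->m_next == 0'.",
]

if __name__ == "__main__":
    for (src, line_no, expr), n in tally_assertions(SAMPLE).most_common():
        print(f"{n:4d}  {src}:{line_no}  {expr}")
```

To use it on a real log, pass the lines of the errorlog file instead of SAMPLE. Lines that do not match the pattern, such as the Error 3624 line, are simply skipped.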
|
|
paulrandal
Yak with Vast SQL Skills
899 Posts |
Posted - 2005-10-27 : 13:48:59
|
OK - these two asserts are saying the records are corrupt. I think you should open a case with CSS to have them help you with this, as we're not going to be able to debug this over a forum.

Thanks

Paul Randal
Dev Lead, Microsoft SQL Server Storage Engine
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.) |
|
|
Wang
Starting Member
48 Posts |
Posted - 2005-10-28 : 05:37:06
|
Ok, cheers for the help.

Rich |
|
|
|