SQL instance failing due to tempdb space issues

sql-lover
Yak Posting Veteran

99 Posts

Posted - 2013-05-23 : 10:30:09

First, I should clarify that my instance is cluster-aware and running SQL Server 2012.

The issue started, or at least I first noticed it, after a few planned failovers for a scheduled cluster patch addressing an iSCSI/SAN issue. Every time I try to move the whole SQL failover instance, all resources come online on the other node except the SQL Server one. It stays online for a few seconds and then goes down again. Here is the SQL error log:


2013-05-22 17:42:40.72 spid11s     Error: 1205, Severity: 13, State: 35.
2013-05-22 17:42:40.72 spid11s     Transaction (Process ID 11) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
2013-05-22 17:42:40.72 spid11s     Could not create tempdb. You may not have enough disk space available. Free additional disk space by deleting other files on the tempdb drive and then restart SQL Server. Check for additional errors in the event log that may indicate why the tempdb files could not be initialized.
2013-05-22 17:42:40.72 spid11s     SQL Trace was stopped due to server shutdown. Trace ID = '1'. This is an informational message only; no user action is required.
2013-05-22 17:42:41.11 Logon       Error: 18456, Severity: 14, State: 38.
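
When reading an excerpt like this, it can help to list the numbered errors in the order they were logged, since the tempdb message is preceded by a deadlock error (1205) on the same spid. The following is only a rough sketch in Python: the function names are made up for illustration, the sample text is an abbreviated copy of the lines above, and the pattern assumes the timestamp/source/message layout shown here.

```python
import re

# Layout assumed from the excerpt above: timestamp, source, then the message.
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2}) (\S+)\s+(.*)$")
ERROR_RE = re.compile(r"^Error: (\d+), Severity: (\d+), State: (\d+)\.$")


def parse_errorlog(text):
    """Return one dict per recognisable log line (hypothetical helper)."""
    entries = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank or unrecognised lines
        timestamp, source, message = match.groups()
        entry = {"timestamp": timestamp, "source": source,
                 "message": message, "error": None}
        err = ERROR_RE.match(message)
        if err:
            # (error number, severity, state)
            entry["error"] = tuple(int(g) for g in err.groups())
        entries.append(entry)
    return entries


def error_numbers(entries):
    """Error numbers in the order they were logged."""
    return [e["error"][0] for e in entries if e["error"]]


# Abbreviated copy of the excerpt above (the tempdb message is shortened).
SAMPLE = """\
2013-05-22 17:42:40.72 spid11s     Error: 1205, Severity: 13, State: 35.
2013-05-22 17:42:40.72 spid11s     Could not create tempdb. You may not have enough disk space available.
2013-05-22 17:42:41.11 Logon       Error: 18456, Severity: 14, State: 38.
"""

if __name__ == "__main__":
    print(error_numbers(parse_errorlog(SAMPLE)))
```

On the sample this lists 1205 first and 18456 second, which at least suggests looking at the deadlock during startup rather than only at disk space.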

The tempdb_data LUN is 250 GB and is currently using 20 GB of that. The tempdb_logs LUN is 200 GB and is using only 12 GB. I also kept an eye on those numbers for about a week and they stayed the same. The upper limit is set to unlimited, so if the files have to grow after a restart, they can. So a lack of tempdb space makes no sense to me.

I checked instant file initialization, and the SQL service account has been added to the local policy for it.

SQL Server 2012 is running on top of Windows Server 2008 R2.

Any hints?

ahmeds08
Aged Yak Warrior

737 Posts

Posted - 2013-05-23 : 10:42:27

If tempdb is growing, there might be some big transaction running in the background. Is there any data loading, such as heavy inserts, updates, or deletes, running?
Are any index rebuilds running?
If so, you need to break them into small batches.

mohammad.javeed.ahmed@gmail.com

sql-lover
Yak Posting Veteran

99 Posts

Posted - 2013-05-23 : 10:47:35

No data loading during the maintenance window, which is usually when I do the planned failover.

But we do have 500+ databases there. It is virtually impossible for me to stop any outstanding activity, if there is any. I do stop new client connectivity, though, via the web server and the SQL connection string.

Also, the SQL failover should work no matter what. That is the purpose of the cluster setup. Of course, some transactions may be lost (rolled back or rolled forward), but the overall cluster functionality should work. As a matter of fact, this is not my first cluster setup, but it is the first time I have seen this tempdb issue.

By the way, thanks for the reply.

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2013-05-23 : 11:10:27

What do you see in the Windows application event logs?

And what do you mean by "try to move the failover instance"? Do you mean failover?

Is the tempdb path correct on the passive node?

sql-lover
Yak Posting Veteran

99 Posts

Posted - 2013-05-23 : 11:44:04

Hi there Russell,

Yes, I mean failing over: moving from the active node to the passive node.

I am not sure I understand your question about the tempdb path. This is a cluster-aware instance, so the path is the same. tempdb has its own LUN, and the tempdb disk moves to the other node fine. Of course, the internal SQL path stays the same.

SQL Server eventually comes up (without me changing anything), but I usually need to "kill" or reboot the passive node and/or keep forcing the SQL resource online. This is unacceptable, and in the case of a real crash it will stay down.

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2013-05-23 : 12:33:21

What are you seeing in the event logs?

sql-lover
Yak Posting Veteran

99 Posts

Posted - 2013-05-23 : 16:10:28

This is what I can see in the system log, one second after the tempdb error:


The SQL Server service terminated with service-specific error The specified resource name cannot be found in the image file..

russell
Pyro-ma-ni-yak

5072 Posts

Posted - 2013-05-23 : 17:09:14

I would expect there would be more interesting info in the event logs than that.

Tried moving tempdb?

jackv
Master Smack Fu Yak Hacker

2179 Posts

Posted - 2013-05-24 : 01:54:41

Analyze the tempdb activity and develop some tactics for alleviating pressure. This post has some ideas:
http://www.sqlserver-dba.com/2011/04/tempdb-performance-and-strategy-checklist.html

Jack Vamvas
--------------------
http://www.sqlserver-dba.com

sql-lover
Yak Posting Veteran

99 Posts

Posted - 2013-05-24 : 07:51:02

Well, I thought about that, but the only place would be the C: drive, which is local to the ProLiant server. tempdb is on a SAN RAID 10 LUN.

I have already looked at tempdb activity. It is normal or low and does not exceed the upper limits.

This is a really weird problem that I have never seen before. There are few or no errors in the system log, which does not help either.

jeffw8713
Aged Yak Warrior

819 Posts

Posted - 2013-06-07 : 13:19:07

How are the LUNs created? Are they iSCSI attached?

I have seen issues where the LUNs are not actually available before SQL Server tries to start, and this causes a failure very similar to what you are seeing. I would validate that SQL Server is dependent on all of the LUNs it uses, and that those LUNs are not reported as available until they have actually been mounted on the server.
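
To make the dependency part of that check concrete, here is a rough sketch in Python that models a cluster group as a dependency map and reports any disk the SQL Server resource uses but does not depend on. All resource names below are hypothetical, the helper functions are invented for illustration, and the map would have to be filled in by hand from what Failover Cluster Manager shows. It does not cover the second half of the check, namely whether a LUN is really mounted when its disk resource reports online.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical cluster group: each resource maps to the resources it depends on.
DEPENDENCIES = {
    "SQL Server": {"Network Name", "Disk: data",
                   "Disk: tempdb_data", "Disk: tempdb_logs"},
    "Network Name": {"IP Address"},
    "IP Address": set(),
    "Disk: data": set(),
    "Disk: tempdb_data": set(),
    "Disk: tempdb_logs": set(),
}


def missing_disk_dependencies(dependencies, resource, disks_in_use):
    """Disks the resource uses but does not depend on, directly or indirectly."""
    seen, stack = set(), [resource]
    while stack:
        for dep in dependencies.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return sorted(set(disks_in_use) - seen)


def online_order(dependencies):
    """One valid bring-online order: every resource follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    disks = ["Disk: tempdb_data", "Disk: tempdb_logs"]
    print(online_order(DEPENDENCIES))
    print(missing_disk_dependencies(DEPENDENCIES, "SQL Server", disks))
```

An empty list from `missing_disk_dependencies` only says the dependencies are declared; it says nothing about timing on the iSCSI side.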

sql-lover
Yak Posting Veteran

99 Posts

Posted - 2013-06-21 : 16:08:47

Well, I do not think that is the case here, because the dependencies are fine. The disk resources come online before SQL Server, and SQL Server depends on the disk resources (they go first).

I actually checked the disk resources manually, one by one, and they move back and forth without issues.
