Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2009-12-02 : 04:59:12
|
Hi all,Sorry in advance for the lengthy post:I've got a weird situation that I can't figure out. Here's the situation.There are 3 separate stacks comprising of1 application server (running linux with our java application)1 SQL SERVER 2005 Database ServerThe hardware of the stacks is the same (HP Proliant servers)The operating systems and the sql server setup are the same. (same tempddb settings, RAID setup, same everything)On each of the database servers we have multiple databases (maybe 20 per stack) (1 for each of our clients). Each database has an associated SQL SERVER AGENT job set up that runs at 1AM every day which does various things to implement our business logic. (future dated changes to employee data etc). The databases have the same schemas but different data.At 1AM no-one is going to be using our systems barring special cases - the load on severs at that time is non-existent.Here's the problem.On one of the stacks (stack 3) we get sporadic deadlock issues with the sql server agent jobs. There seems to be no rhyme or reason for it - but the jobs all deadlock in step 1 which handles future dated data changes (for address / name / salary / whatever). We never get deadlocks on the other stacks even though everything is the same.Deadlock frequency seems to be random. We'll have days, or weeks, when there are no deadlocks and suddenly (like last night) 15 clients deadlocks during this step.What I've done so far:1) Enclosed the step in a TRY / CATCH block that attempts to email me the details of the deadlock using a VIEW. However, I don't get any indication of the object locked and the sql text I return always is just the sql of the TRY / CATCH block.2) partially rewritten step 1 to make is SLIGHTLY less horrible - it basically makes a cursor and derives dynamic sql updates from an approval event table. I'm made sure that the cursor is local.What I'm going to do:===================================================================Run profiler on the db server from 1 to 2. I didn't want to do this because of the randomness of the problem.Do you guys have any advice?Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
|
abeman
Starting Member
3 Posts |
Posted - 2009-12-02 : 06:12:57
|
i got one idea as we had the same sort of issue with our software.Have you looked at SIDS being re-used due to muti- threading in SQL server?Is there any chance an application is reusing the same SID , causing the deadlock? |
 |
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2009-12-02 : 07:02:39
|
try to staggger the servers to work at 1am, 2am & 3am.... to totally eliminate any possibility of cross-server interference.can you also stagger the database updates to eliminate cross-database links.when all else fails, grasp at straws....or post on the internet. |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2009-12-02 : 08:24:23
|
quote: Originally posted by abeman i got one idea as we had the same sort of issue with our software.Have you looked at SIDS being re-used due to muti- threading in SQL server?Is there any chance an application is reusing the same SID , causing the deadlock?
Cheers for the suggestion.I guess I'll see that in profiler tonight (even if there are no deadlocks).Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2009-12-02 : 08:26:46
|
quote: Originally posted by AndrewMurphy try to staggger the servers to work at 1am, 2am & 3am.... to totally eliminate any possibility of cross-server interference.can you also stagger the database updates to eliminate cross-database links.when all else fails, grasp at straws....or post on the internet. 
Yup -- at that stage!I was going to stagger the jobs if all else failed but 1am - 2am is our downtime window. The jobs can take some time if there happens to be a lot of data to change.That wouldn't explain why the problems all happen on only one box though.But yeah -- try everything!Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2009-12-02 : 08:40:15
|
Do these jobs operate on data only on their own server or do they all perform operations on the same set of data? If the former, staggering the job times won't help.Are the reads and writes occurring against the same tables or are you reading from one and writing to another?Is the cursor read only?Is the select statement in the cursor utilizing an index?Sure nothing else is happening at the same time on server that may be contributing to the deadlocks? |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2009-12-02 : 11:25:52
|
quote: Originally posted by russell Do these jobs operate on data only on their own server or do they all perform operations on the same set of data? If the former, staggering the job times won't help.Are the reads and writes occurring against the same tables or are you reading from one and writing to another?Is the cursor read only?Is the select statement in the cursor utilizing an index?Sure nothing else is happening at the same time on server that may be contributing to the deadlocks?
Cheers Russel,The jobs all operate on their own data. Each database has an [approvalEventRow] table that keeps track of changes local to that database. The changes are things like name changes, salary changes, grade changes, etc. Once the effective date has been reached then the job generates UPDATE statements to change the relevant data in the real tables.The UPDATES are blind. The real tables aren't read from just updated by the statements. In general only a few tables will ever be touched. the [Employee] table, the [EmployeeCompensation] table and the [employeeDependant] table.The loop logic is a nested CURSOR (urgh I know) The CURSORS are-- Outside loopDECLARE eventCursor CURSOR LOCAL READ_ONLY FOR SELECT [id] , [employeeId] , [effectiveDate]FROM ApprovalEventWHERE [status] IN ('P', 'A') AND [effectiveDate] < @tomorrow AND [companyId] = @companyId AND ( [employeeID] = @employeeID OR @employeeID IS NULL )-- Inside loopDECLARE eventRowCursor CURSOR LOCAL READ_ONLY FOR SELECT [tableName] , [idValue] , [columnName] , [newValue] , [oldValue] FROM ApprovalEventRow WHERE [eventId] = @eventId ORDER BY [tableName] , [idValue] There are a mess of indices on the tables. Over the years developers have tended to throw an index on one or more columns whenever they remembered that indices existed.However, the deadlock can happen even when the outer cursor returns no rows (there's no event to be processed).I've modified these cursors (making them LOCAL) in case that was the problem. Before they were just declared with no additional flags - it hasn't seemed to make much difference  quote: Sure nothing else is happening at the same time on server that may be contributing to the deadlocks?
99% sure. I'm going to set up a profiler task to run between 1 and 2 am and see what's what.Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
russell
Pyro-ma-ni-yak
5072 Posts |
Posted - 2009-12-02 : 13:04:25
|
where do the updates occur in all this? also should check the exec plan on those queries.profiler may reveal the problem tonight |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2009-12-02 : 13:56:42
|
I haven't read the whole thread, but you may want to consider using READ_COMMITTED_SNAPSHOT isolation level to avoid deadlocks between writes and reads. If your deadlocks are between writes and writes, that setting will not help. But it is the recommended isolation level for OLTP system, so it is worth it to change it anyway. The default isolation level is READ_COMMITTED, which causes performance issues with reads blocking writes.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog"Let's begin with the premise that everything you've done up until this point is wrong." |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2009-12-03 : 04:11:02
|
Thanks for the suggestion Tara - that's something I can set on the individual batch isn't it?SET TRANSACTION ISOLATION LEVEL READ_COMMITTED_SNAPSHOT......<stuff> The only problem is that testing such changes is really hard when the problem is so intermittent. I'll change the isolation level and see though.Russell -- sadly no deadlocks happened last night . Even though nothing is different from yesterday (that I can see)...I'll keep on tracing.Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
AndrewMurphy
Master Smack Fu Yak Hacker
2916 Posts |
Posted - 2009-12-03 : 05:06:56
|
1. whats different is the portion of the database that is updated each night. bacause of the future dated transactions I suspect different employees are beign updated each night. any chance some employees are having simialr updtes on problem nights.2. can you load a "problem" night into a test environment and re-run there? you should get the same results if the environment & machine power is the same |
 |
|
spirit1
Cybernetic Yak Master
11752 Posts |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2009-12-03 : 07:54:29
|
quote: Originally posted by spirit1 here's a method that will tell you about deadlocks and give yo uthe whole report without you having to run profiler and strain the system. it can even send you an email.http://weblogs.sqlteam.com/mladenp/archive/2008/07/18/Immediate-deadlock-notifications-without-changing-existing-code.aspxnote that you have to enable service broker on the monitored database___________________________________________________________________________Causing trouble since 1980Blog: http://weblogs.sqlteam.com/mladenpSpeed up SSMS development: www.ssmstoolspack.com <- version 1.7 out!
Hi Spirit1Yeah -- I did look at that and wasn't allowed to enable service broker . I did try using a TRY CATCH method and do get mails but unfortunately they aren't very helpful as they just contain the sql text of the stored proc that makes up the email Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2009-12-03 : 07:59:28
|
quote: Originally posted by AndrewMurphy 1. whats different is the portion of the database that is updated each night. bacause of the future dated transactions I suspect different employees are beign updated each night. any chance some employees are having simialr updtes on problem nights.2. can you load a "problem" night into a test environment and re-run there? you should get the same results if the environment & machine power is the same
Hi Andrew.Sadly that doesn't work.I've tested getting a database backup from just before (backup jobs run at 2200 -- nothing happened between then and 1AM)I then restored the backup onto the same database server (with a different name) and added the overnight job. Then if I run the overnight job on that database it works fine. no deadlocks.A deadlock can occur even if there are no changes to be made . It doesn't seem to matter if there are valid changes or not. That's why I thought there must have been some other process running at 1am but I can't find anything.Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2009-12-03 : 08:06:58
|
Hi Spirit1. Can you advice if there is a glaring error in my TRY CATCH block and associated mailing sp's?Here's the sql jobDECLARE @Body VARCHAR(4000) , @From VARCHAR(50) , @To VARCHAR(50) , @Subject VARCHAR(100) , @companyId INT , @tomorrow DATETIME , @error INT , @message XML , @ErrorNumber INT , @ErrorMessage NVARCHAR(2048) , @ErrorSeverity INT , @errorState INT , @MailBody NVARCHAR(MAX)SET @companyId = 3SET @tomorrow = CONVERT(VARCHAR(50), DATEADD(DAY, 1, GETDATE()), 101)EXEC logStartFlexJobAudit @companyId, 'adminEffectEvents', 1BEGIN TRANBEGIN TRYEXEC @error = adminEffectEvents @companyId, @tomorrowCOMMITIF @error <> 0 BEGIN SELECT @Body = 'The overnight process has failed during the adminEffectEvents step, with return code '+CONVERT(VARCHAR(50),@error) SELECT @From = '<REMOVED>' SELECT @To = '<REMOVED>' SELECT @Subject = 'SQLMail: Overnight Job Failure - [' + @@SERVERNAME + '] ' + DB_NAME() EXEC master..sp_send_cdosysmail @From, @To, @Subject, @Body ENDEND TRYBEGIN CATCH SELECT @error = ERROR_NUMBER() , @errorMessage = ERROR_MESSAGE() , @ErrorSeverity = ERROR_SEVERITY() , @ErrorState = ERROR_STATE() IF @error = 1205 BEGIN SET @subject = 'OVERNIGHT DEADLOCK ON ' + @@SERVERNAME + '.' + DB_NAME() EXEC master.dbo.lockMailer @subject , '<REMOVED>' END RAISERROR(@errorMessage, @errorSeverity, @errorState) ROLLBACK RETURNEND CATCHEXEC logCompleteFlexJobAudit @companyId, 'adminEffectEvents', @error, 1 Here's the lockMailerALTER PROCEDURE [dbo].[lockMailer] @subject NVARCHAR(255) , @recipient NVARCHAR(255)AS BEGIN DECLARE @spid INT , @dbname NVARCHAR(255) , @lockedObjNAme NVARCHAR(512) , @sqlText NVARCHAR(MAX) , @loginName NVARCHAR(512) , @hostName NVARCHAR(512) , @isUserTran BIT , @transName NVARCHAR(512) , @msgSubject VARCHAR(2000) , @msgBody VARCHAR(8000) , @uniqueBatch UNIQUEIDENTIFIER , @from VARCHAR(255) SET @uniqueBatch = NEWID() SET @from = '<REMOVED>' DECLARE lockCursor CURSOR LOCAL FOR SELECT ISNULL([SPID], -1) , ISNULL([DatabaseName], 'NULL') , ISNULL([LockedObjectName], 'NULL') , ISNULL([SqlStatementText], 'NULL') , ISNULL([LoginName], 'NULL') , ISNULL([HostName], 'NULL') , ISNULL([IsUserTransaction], 1) , ISNULL([TransactionName], 'NULL') FROM master.dbo.vLocks ORDER BY [spid] OPEN lockCursor FETCH NEXT FROM lockCursor INTO @spid , @dbname , @lockedObjNAme , @sqlText , @loginName , @hostName , @isUserTran , @transName WHILE ( @@FETCH_STATUS = 0 ) BEGIN SET @msgSubject = @subject + ' (BATCH: ' + CAST(@uniqueBatch AS VARCHAR(64)) + ') (SPID: ' + CAST(@spid AS NVARCHAR(255)) + ')' SET @msgBody = '<body><p><table><tr><td>SPID: ' + CAST(@spid AS VARCHAR(255)) + '</td></tr><tr><td>DBNAME: ' + @dbName + '</td></tr><tr><td>LOCKED: ' + @lockedObjName + '</td></tr><tr><td>HOST: ' + @hostName + '</td></tr><tr><td>USER TRAN: ' + CASE @isUserTran WHEN 1 THEN 'YES' ELSE 'NO' END + '</td></tr><tr><td>TRAN NAME: ' + @transName + '</td></tr><tr><td>SQL: ' + CAST(@sqlText AS VARCHAR(1000)) + '</td></tr></table></p></body>' -- Send the message EXEC master.dbo.sp_send_cdosysmail @from , @recipient , @msgSubject , @msgBody FETCH NEXT FROM lockCursor INTO @spid , @dbname , @lockedObjNAme , @sqlText , @loginName , @hostName , @isUserTran , @transName END CLOSE lockCursor DEALLOCATE lockCursorEND And finally the viewUSE [master]GO/****** Object: View [dbo].[vLocks] Script Date: 12/03/2009 13:07:56 ******/SET ANSI_NULLS ONGOSET QUOTED_IDENTIFIER ONGOALTER VIEW [dbo].[vLocks] AS SELECT L.[request_session_id] AS SPID , DB_NAME(L.[resource_database_id]) AS DatabaseName , O.[Name] AS LockedObjectName , P.[object_id] AS LockedObjectId , L.[resource_type] AS LockedResource , L.[request_mode] AS LockType , ST.[text] AS SqlStatementText , ES.[login_name] AS LoginName , ES.[host_name] AS HostName , TST.[is_user_transaction] AS IsUserTransaction , AT.[name] AS TransactionNameFROM sys.dm_tran_locks L LEFT JOIN sys.partitions P ON P.[hobt_id] = L.[resource_associated_entity_id] LEFT JOIN sys.objects O ON O.[object_id] = P.[object_id] LEFT JOIN sys.dm_exec_sessions ES ON ES.[session_id] = L.[request_session_id] LEFT JOIN sys.dm_tran_session_transactions TST ON ES.[session_id] = TST.[session_id] LEFT JOIN sys.dm_tran_active_transactions AT ON TST.[transaction_id] = AT.[transaction_id] LEFT JOIN sys.dm_exec_requests ER ON AT.[transaction_id] = ER.[transaction_id] CROSS APPLY sys.dm_exec_sql_text(ER.[sql_handle]) AS STWHERE resource_database_id = db_id() With this I do get multiple mails but the sqltext is always the text of dbo.lockmailer which isn't helpful!Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2009-12-03 : 12:23:51
|
quote: Originally posted by Transact Charlie Thanks for the suggestion Tara - that's something I can set on the individual batch isn't it?SET TRANSACTION ISOLATION LEVEL READ_COMMITTED_SNAPSHOT......<stuff> The only problem is that testing such changes is really hard when the problem is so intermittent. I'll change the isolation level and see though.
I recommend doing this at the database level and not at the query level. It does require downtime though since the database needs exclusive access during the change. You can make the change via ALTER DATABASE.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog"Let's begin with the premise that everything you've done up until this point is wrong." |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2009-12-04 : 04:27:11
|
quote: Originally posted by tkizer
quote: Originally posted by Transact Charlie Thanks for the suggestion Tara - that's something I can set on the individual batch isn't it?SET TRANSACTION ISOLATION LEVEL READ_COMMITTED_SNAPSHOT......<stuff> The only problem is that testing such changes is really hard when the problem is so intermittent. I'll change the isolation level and see though.
I recommend doing this at the database level and not at the query level. It does require downtime though since the database needs exclusive access during the change. You can make the change via ALTER DATABASE.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog"Let's begin with the premise that everything you've done up until this point is wrong."
Cheers for the advice.Can you advise if setting the isolation level for the batch would allow me to test this rather than jumping on and changing the default isolation level?Changing the isolation level settings on the database proper is something that I'd have to argue for. I'm not a DBA (we don't really have one) - just the guy that has to deal with stuff like this. I'd have to argue the case with out infrastructure team and they will (quite rightly) be a little concerned about this for the downtime alone.Looking at the docs for 2005 it says that "READ_COMMITED_SNAPSHOT database option is set OFF, which is the default setting, read committed isolation behaves as it did in earlier versions of SQL Server. Both implementations meet the ANSI definition of read committed isolation."I take it that means that if a certain block of code has no problems at the current isolation level (READ_COMMITED) then it will * definitely* still have no problems at (READ_COMMITED_SNAPSHOT)?I'm not doubting your advice at all. I'd just like to know that it will help before arguing for such a change.Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2009-12-04 : 10:58:32
|
I would assume you could test it at the query level, but I've never tried that. Just to repeat what I earlier said, switching the isolation level will only help if your deadlocks are between reads and writes. If they are between writes and writes, then it will not help. Have you identified the deadlock victim and owner via SQL Profiler or the trace flag?We haven't had any issues with READ_COMMITTED_SNAPSHOT. We are running it on very mission critical systems with a heavy load (3000 queries per second). The recommendation to switch from the default to READ_COMMITTED_SNAPSHOT is actually being made by Microsoft. Each time I have opened a case with them regarding deadlocks and performance issues, their first question is always if we are running READ_COMMITTED_SNAPSHOT. After a while, we got the hint and start switching to it. The downtime needed is only a few seconds.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog"Let's begin with the premise that everything you've done up until this point is wrong." |
 |
|
Transact Charlie
Master Smack Fu Yak Hacker
3451 Posts |
Posted - 2009-12-04 : 12:18:24
|
quote: I would assume you could test it at the query level, but I've never tried that. Just to repeat what I earlier said, switching the isolation level will only help if your deadlocks are between reads and writes. If they are between writes and writes, then it will not help. Have you identified the deadlock victim and owner via SQL Profiler or the trace flag?
Sadly (or not!) we haven't had any deadlocks since I started profiling. I've got no clue what's causing the deadlock or even which object is the point of contention :(What's the trace flag you mention?I've set the isolation level for the batch but obviously is is hard to prove a negative.I'll get us switched into READ_COMMITTED_SNAPSHOT on our client testing environment soon and then hopefully make the switch to live after a week or so.BTW -- is there any reason why the documentation for SQL server 2005 has bothREAD_COMMITED_SNAPSHOTandREAD_COMMITTED_SNAPSHOTas options?is COMMITED an American spelling?Charlie===============================================================Msg 3903, Level 16, State 1, Line 1736The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION |
 |
|
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2009-12-04 : 12:23:53
|
The trace flag is 1222. You'll get deadlock output in the Error Log if you enable it that way you don't need to constantly run a trace. DBCC TRACEON(1222, -1)As far as I know, the command to run to switch the isolation level is:ALTER DATABASE dbName SET READ_COMMITTED_SNAPSHOT ONTo see what databases have it, you check the is_read_committed_snapshot_on column in sys.databases.Tara KizerMicrosoft MVP for Windows Server System - SQL Serverhttp://weblogs.sqlteam.com/tarad/Subscribe to my blog"Let's begin with the premise that everything you've done up until this point is wrong." |
 |
|
|
|
|
|
|