Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
sgoodwi3
Starting Member
6 Posts |
Posted - 2007-02-13 : 09:37:07
|
We are hitting a crippling 701 "insufficient System Memory" error intermittently in out production environment. I haven’t gotten anywhere with PSS in two weeks. The error has occurred 4 times over the past two weeks, crippling our SQL server and application each time. When the error occurs it lasts for 5 to 20 minutes, causing the app to time out, refusing new connections, and a massive slow-down of anything that is running. SQL has recovered on its own two of these times. It recovered following a Kill of hundreds of threads reporting “SEMAPHORE WAIT”. The most recent occurrence nailed all 16 processors at 100%. We were forced to issue shutdown with nowait. I have been monitoring Perfmon very closely; there are no symptoms that precede the error. Each occurrence captures a different query. Any of the queries, when run from Management Studio, complete in under a second. DBCC MEMORY STATUS reports all memory as being in an unstressed state. The first time the error occurred there were 10 GB still available on the server.Has anyone else experienced this problem or anything similar? We don’t use linked servers or table valued functions (there are known memory bugs related to each of these items)The following server and configuration has been running in production for 6 weeks with no issues:.SQL 2005 EE SP1 Post SP1 Hotfix kb918222.Win 2003 SP1 (dedicated box).Quad Dual Core 3GHz .32 GB memory.AWE enabled.No memory related flags in boot.ini."Lock Pages in memory" set for SQL Startup account.1 Instance (default).1,994 OLTP databases avg less than 100MB each.1,200 active user threads on average (from connection pool of avg 4,000 concurent users)Any comments would be appraciated |
|
Kristen
Test
22859 Posts |
Posted - 2007-02-13 : 10:51:02
|
"Quad Dual Core 3GHz" and "nailed all 16 processors at 100%"Longshot: I make that 4 x 2 processors - are you running hyper-threading too? Dunno abut 2005 but I've often read suggestions that hyper-threading is turned off for SQL Server.(You might want to post a LINK to this thread in http://www.sqlteam.com/forums/forum.asp?FORUM_ID=22 - there are some MS folk in that forum who may be able to help)Kristen |
 |
|
pareshmotiwala
Constraint Violating Yak Guru
323 Posts |
Posted - 2007-02-13 : 11:58:17
|
One of the things we did is to use some of these offending queries on the performance tuning tool provided by 2005. It has really saved our bacon big time. Just out of curiosity, are these the same queries or a group of queries from the same app.We ran into a similar situation some time ago and we cut the Degree of parallelism down to one and we never had CPU issues again.You have not mentioned what is the disk subsystem. I would suspect if you are using EE you must be on SAN but, if it is a small disk subsystem like MSA 500 or so, it would queue up your transactions and not let go of your memory and cpu. Also on perfmon, what is the DPC indicate? |
 |
|
sgoodwi3
Starting Member
6 Posts |
Posted - 2007-02-13 : 12:14:50
|
Pareshmotiwala, thanks for the comments. The queries captured by the trace flag are different every time. The one thing they have in common is that they are all very insignificant resource wise, and execute in a few milliseconds. As for the disk subsystem, it is DAS with 8 15K drives in RAID 5. There is no disk queing during the issue (only time we see any queuing is during db backup, and even then it's not significant). DPC rate avg 30, peak 90. |
 |
|
pareshmotiwala
Constraint Violating Yak Guru
323 Posts |
Posted - 2007-02-14 : 12:54:01
|
quote: Originally posted by sgoodwi3 Pareshmotiwala, thanks for the comments. The queries captured by the trace flag are different every time. The one thing they have in common is that they are all very insignificant resource wise, and execute in a few milliseconds. As for the disk subsystem, it is DAS with 8 15K drives in RAID 5. There is no disk queing during the issue (only time we see any queuing is during db backup, and even then it's not significant). DPC rate avg 30, peak 90.
Try reducing the degree of parallelism to 0.or this could be a case where you need a middleware for taking the brunt of the CPU/Memory intensive processing. I had a similar situation with a HR vendor, whose clients saw about 5-10 minutes wait to get simple returns as in your case.Somebody had told them that using all CPUs for all queries will be great. I had them bring it down from 8 to 0 and voila...it worked..may be worth a shot.Here's the sample scriptEXEC sys.sp_configure N'show advanced options', N'1' RECONFIGURE WITH OVERRIDEGOEXEC sys.sp_configure N'max degree of parallelism', N'0'GORECONFIGURE WITH OVERRIDEGOEXEC sys.sp_configure N'show advanced options', N'0' RECONFIGURE WITH OVERRIDEGOand yes...all the best. |
 |
|
wangji
Starting Member
2 Posts |
Posted - 2007-02-15 : 09:09:25
|
max degree of parallelism should be set to 1 in this case, which means using 1 cpu at a time for a connection. Set to 0 equals to using all cpus.Can yo run select * from master..sysprocesses to see what resources the system is waiting on? |
 |
|
sgoodwi3
Starting Member
6 Posts |
Posted - 2007-02-15 : 12:08:45
|
Thanks for the advice. This box is dedicated to OLTP so we do have Parallelism set to 1. |
 |
|
sgoodwi3
Starting Member
6 Posts |
Posted - 2007-02-23 : 17:15:37
|
Thanks for everyones comments. It turns out we were running out of memory in the visible region (MemToLeave). In short the amount of space allocated to this region of memory on Win32 is limited (~1.2GB/instance). Having a very large number of active databases (1500 to 2000) consumed the majority of this memory, leaving to little to support larger multi-page allocations that we periodically are encounter (large plan optimization, backup, etc). The quick fix is to distribute the databases over multiple instance and or servers. The ideal solution is to migrate to 64 bit where this are of memory is limited only by the physical memory limitations of the server. |
 |
|
|
|
|
|
|