Please start any new threads on our new
site at https://forums.sqlteam.com. We've got lots of great SQL Server
experts to answer whatever question you can come up with.
Author |
Topic |
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2009-09-12 : 20:08:23
|
Hello. I have been trying to understand the source of some severe performance degradation on one of our servers. The symptoms we are seeing are excessive application timeouts and queries way taking longer to run than normal - even when the execution plan is optimal.I have been googling and binging for the last two weeks and have been monitoring key performance counters and I'm not 100% sure what to make of what I'm finding. I'm hoping one or a few of the gurus here can lend a hand.To start, our server configuration is as follows:- Windows Windows 2003 Server R2 Standard Edition SP2 (32 bit)
- SQL Server 2005 SP2
- Data storage is on an iSCSI SAN with a 1 GB NIC (HP AIO600)
- 4 GB memory
The SAN is shared with other database servers but the drives are carved out and dedicated to each server (versus all the drives configured as one large virtual). Four of the logical disks are exposed to this one server (1-RAID5, 3-RAID1).These are the main performance counters I have been observing:- Average Disk Queue Length (one for each drive, not _Total)
- Buffer Cache Hit Ratio
- Page Life Expectancy
- Checkpoint pages/sec
- Lazy writes/sec
Combined with these I am monitoring some of the DMVs and finding a lot of PAGEIOLATCH_SH (and _EX).I am also monitoring CPU counters, page file, etc. All in all, the only two things that stand out are the PAGEIOLATCH wait types and the Page Life Expectancy counter. Everything else appears to be optimal - buffer cache 96%+, checkpoint pages/sec & lazy writes/sec extremely low, CPUs are not being over utilized, etc.As far as the PAGEIOLATCH wait types, I think these indicate an I/O issue. But we've even put databases on their own spindels and the same performance still exists. The Page Life Expectancy peaks at about 600 when the SQL Server is started but when we turn on our applications it drops down <75 almost immediately. And of course, when this happens, the Average Disk Queue Length spikes mid range double digits for the duration of the query. Which drive depends on which query. The thing is it is consistent across databases and spindels, etc. So something tells me it is either a SQL or hardware issue, not an application issue.Ok, so sorry for the wall of text but my question is, how do you determine what this problem is? At this point I am not sure if it is a memory issue and we should add more memory to SQL Server or if there is a hardware issue with the iSCSI device.So, the million dollar question - any sugguestions?Thanks in advance! |
|
sodeep
Master Smack Fu Yak Hacker
7174 Posts |
Posted - 2009-09-13 : 22:48:58
|
You should increase your memory as Page life expectancy is really less. Also Enable lock pages in memory and AWE for 32-bit SQL Server to consume memory. Remember 32-bit SQL Server can't allocate more than 2GB for sorting and hashing operations even you allocate more memory. |
 |
|
tfountain
Constraint Violating Yak Guru
491 Posts |
Posted - 2009-09-14 : 16:58:23
|
Interesting twist. I had our network guys recreate one of the RAID drives so that all the partitions where aligned on the disks. We saw an immediate improvement of about 45%. While the memory thing might be valid, I really do not think it was the root cause. We are in process of reconfiguring the entire SAN disk so all the disks are aligned correctly. After tomorrow I will be able to gauge what degree this really buys us. |
 |
|
sodeep
Master Smack Fu Yak Hacker
7174 Posts |
Posted - 2009-09-14 : 22:07:17
|
It doesn't matter how much you optimize your hard disk ,you have to have enough memory so that pages doesn't flush from buffer pool to hard disk and eventually it has to read from disk.Get the counters of1) Memory grants pending and Outstanding2) Total Server Memory3) Available Mbytes4) Cache counts5) Page lookups/sec. |
 |
|
|
|
|
|
|