Author |
Topic |
simondeutsch
Aged Yak Warrior
547 Posts |
Posted - 2011-04-11 : 09:31:03
|
I originally posted this question at [url]http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=155509[/url], but it's still unresolved and extremely frustrating. Any help that steers me in the right direction will be really appreciated!

Basically, the performance is very erratic even though the usual indicators like high memory use or long-running queries are absent. A query of the top waits returns this:

wait_type                          waiting_tasks_count  wait_time_ms  max_wait_time_ms  signal_wait_time_ms
---------------------------------  -------------------  ------------  ----------------  -------------------
SQLTRACE_INCREMENTAL_FLUSH_SLEEP                128150     513769585              4040                  252
OLEDB                                        189868779     509700990            563766                    0
TRACEWRITE                                      437071     509587033              2043               596914
ASYNC_NETWORK_IO                               2184542       1861763             71174                36551
CXPACKET                                         58213       1430626              2304                12872
PREEMPTIVE_OS_WAITFORSINGLEOBJECT              2109753       1227058               894                    0
PAGEIOLATCH_SH                                   41866        284903               545                  613
LATCH_EX                                        193257        232155               125                14663
WRITELOG                                         42163         18942                27                10559
PAGEIOLATCH_EX                                    2892         12628               151                    8

The server just seems... nonresponsive. Not overworked. Even just connecting to it takes longer than it should. |
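One way to read the wait stats above is to divide wait_time_ms by waiting_tasks_count to get the average time per wait. The sketch below does that in Python with the numbers copied from the post. Skipping SQLTRACE_INCREMENTAL_FLUSH_SLEEP and TRACEWRITE is an assumption here: they are commonly treated as background trace waits, but nothing in the thread says so. The names WAITS, BACKGROUND and average_waits are made up for illustration.

```python
# Wait stats copied from the post: (wait_type, waiting_tasks_count, wait_time_ms).
WAITS = [
    ("SQLTRACE_INCREMENTAL_FLUSH_SLEEP", 128150, 513769585),
    ("OLEDB", 189868779, 509700990),
    ("TRACEWRITE", 437071, 509587033),
    ("ASYNC_NETWORK_IO", 2184542, 1861763),
    ("CXPACKET", 58213, 1430626),
    ("PREEMPTIVE_OS_WAITFORSINGLEOBJECT", 2109753, 1227058),
    ("PAGEIOLATCH_SH", 41866, 284903),
    ("LATCH_EX", 193257, 232155),
    ("WRITELOG", 42163, 18942),
    ("PAGEIOLATCH_EX", 2892, 12628),
]

# Assumption: these two are background trace waits and are skipped.
BACKGROUND = {"SQLTRACE_INCREMENTAL_FLUSH_SLEEP", "TRACEWRITE"}


def average_waits(rows, ignore=BACKGROUND):
    """Return (wait_type, avg_ms_per_wait), ordered by total wait time, highest first."""
    kept = [r for r in rows if r[0] not in ignore and r[1] > 0]
    kept.sort(key=lambda r: r[2], reverse=True)
    return [(name, total_ms / count) for name, count, total_ms in kept]


if __name__ == "__main__":
    for name, avg in average_waits(WAITS):
        print(f"{name:<36}{avg:>10.2f} ms per wait")
```

With these figures OLEDB averages roughly 2.7 ms per wait across about 190 million waits, which looks more like a very large number of short waits than a handful of long ones.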
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-04-11 : 09:44:58
|
You don't appear to have any I/O problems. I'd look at your network connectivity; the OLEDB and ASYNC_NETWORK_IO waits are pretty high.

Are you running any traces to files over the network? If so, turn those off or redirect them to a local disk.

If you make any configuration changes, run DBCC SQLPERF("sys.dm_os_wait_stats", CLEAR) immediately after. You won't be able to accurately measure the effects with the accumulated wait history you're showing here. |
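If clearing the counters is not an option, an alternative (not suggested in the thread, just a common workaround) is to take two snapshots of sys.dm_os_wait_stats and subtract them. A minimal Python sketch, assuming each snapshot has already been loaded into a dict of wait_type to cumulative wait_time_ms. The function name and the sample numbers are hypothetical.

```python
def wait_delta(before, after):
    """Per-wait-type growth in wait_time_ms between two snapshots.

    Each snapshot maps wait_type -> cumulative wait_time_ms.
    Wait types that did not grow (or shrank, e.g. because the counters
    were cleared between snapshots) are left out of the result.
    """
    delta = {}
    for wait_type, after_ms in after.items():
        grown = after_ms - before.get(wait_type, 0)
        if grown > 0:
            delta[wait_type] = grown
    # Largest growth first.
    return dict(sorted(delta.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Hypothetical numbers, only to show the shape of the input.
    before = {"OLEDB": 100, "ASYNC_NETWORK_IO": 50, "WRITELOG": 10}
    after = {"OLEDB": 400, "ASYNC_NETWORK_IO": 500, "WRITELOG": 10, "CXPACKET": 5}
    for wait_type, ms in wait_delta(before, after).items():
        print(wait_type, ms)
```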
 |
|
simondeutsch
Aged Yak Warrior
547 Posts |
Posted - 2011-04-11 : 09:55:40
|
That makes sense, given that in Activity Monitor nothing is high except Network I/O under Resource Waits, and when running Profiler traces the only thing that stands out is the AUDIT LOGON and LOGOFF events. There are no traces over the network.

One thing: when looking at Task Manager, there's a service running dns.exe, which is consuming over 500000K of memory and is possibly messing up the network, even though Task Manager shows network utilization to be low. |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-04-11 : 10:04:31
|
Yeah, sounds like someone included the DNS role/feature when they installed it. Probably should remove it.

You should also run "netstat -an" on the SQL Server and see how many connections you have and what their status is. If you have a lot of wait statuses (CLOSE_WAIT or FIN_WAIT) then your application is not closing its SQL connections properly and is starving the box of available sockets. Anything more than 500 connections should be investigated, especially if most of them are waits. |
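Counting states by eye gets tedious on a busy box, so here is a rough Python sketch that tallies TCP states for one port from "netstat -an" output. It assumes the Windows column layout (Proto, Local Address, Foreign Address, State) and the default SQL Server port 1433. The function name and the sample addresses are invented for illustration.

```python
from collections import Counter

# Made-up sample in the Windows "netstat -an" layout.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:1433           0.0.0.0:0              LISTENING
  TCP    10.0.0.5:1433          10.0.0.21:50311        ESTABLISHED
  TCP    10.0.0.5:1433          10.0.0.22:50412        ESTABLISHED
  TCP    10.0.0.5:1433          10.0.0.23:50977        CLOSE_WAIT
  TCP    10.0.0.5:445           10.0.0.21:50300        ESTABLISHED
  UDP    0.0.0.0:53             *:*
"""


def count_states(netstat_output, port=1433):
    """Count TCP connection states for lines whose local or remote port matches."""
    suffix = f":{port}"
    states = Counter()
    for line in netstat_output.splitlines():
        parts = line.split()
        # Skip the header, UDP rows (no state column) and blank lines.
        if len(parts) < 4 or parts[0].upper() != "TCP":
            continue
        local, remote, state = parts[1], parts[2], parts[3]
        if local.endswith(suffix) or remote.endswith(suffix):
            states[state] += 1
    return states


if __name__ == "__main__":
    for state, count in count_states(SAMPLE).most_common():
        print(f"{state:<14}{count}")
```

To run it against a live machine, the text could come from subprocess.run(["netstat", "-an"], capture_output=True, text=True).stdout instead of SAMPLE.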
 |
|
simondeutsch
Aged Yak Warrior
547 Posts |
Posted - 2011-05-11 : 10:12:55
|
There are no connections other than those ESTABLISHED when I do netstat. There are only about 20 ESTABLISHED connections to SQL, matching the user count.

The network admin had a fit when I asked about DNS. It will have to be a last resort after all other possibilities have been eliminated.

So a question about network waits and connection pooling and stuff. If UserA is running a query that isn't consuming its data fast enough (let's say due to a loop in app code), would that affect the speed with which UserB receives a batch result? Would UserB be put into a queue to wait until UserA consumes all data before UserB's data gets sent? |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-05-11 : 10:45:34
|
Have you cleared the stats and re-checked them since? It's been a month; has anything changed?

Are you doing a lot of linked server stuff? Or ADO connections with cursors? That may explain the high OLEDB waits. And are you running any traces besides the default trace? If it's just the default trace, I'd look at the disk it's being saved to; it might be getting pounded. |
 |
|
simondeutsch
Aged Yak Warrior
547 Posts |
Posted - 2011-05-11 : 11:11:28
|
Nothing has changed. The problem is still there, although the stats have been cleared several times. The stats have changed, of course, but the Network I/O is still very high.

There are no linked servers. There are ADO connections with cursors. About 25 concurrent users. But most of the cursors are read-only and forward-only, or client-side and read-only, and there are few lengthy client-app loops or cursors with many rows. Most of it gets consumed pretty fast. The clients seem to freeze randomly, at different parts of the app, so it doesn't seem like a code issue in the client app.

There are no other traces right now, but I've run other DMV queries and all the disk stats are ridiculously low. This server is very underutilized for its hardware. Far from getting pounded, it's barely getting touched. |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-05-11 : 11:23:36
|
quote: The clients seem to freeze randomly, at different parts of the app, so it doesn't seem like a code issue in the client app.
That seems a little contradictory, since the SQL Server doesn't seem to be the problem. Are there any other operations on that SQL Server that don't use that app, and are they running slow too?

My next suggestion is to have your devs examine their code wherever they open or close a connection to the server. They should also examine the cursor settings and possibly test different variations. If the problem still exists then open a support case with Microsoft. They recently closed a case for us that fixed a problem with .Net network connections (framework bug, not released yet) and maybe you're experiencing the same problem. |
 |
|
simondeutsch
Aged Yak Warrior
547 Posts |
Posted - 2011-05-11 : 11:40:46
|
There are no other operations on this server, other than it being a domain controller. Nothing else runs on this SQL Server besides my app.

I'm the dev :-) Connections are opened explicitly when clients log on to SQL Server and are closed explicitly when clients log off. This matches the results from netstat showing only sensible ESTABLISHED connections. I've tested different cursor types - same issue. The cursors are the fastest ones possible.

If a cursor is slow, wouldn't it be consistently slow? Ditto for app code? What's happening is that the clients can work for, e.g., 3 minutes and all is hunky-dory, and then someone will hang for a bit at any random point of the client app. The same scenario keeps repeating, for different users and different parts of the application.

The same application is in use in different environments and this problem isn't occurring anywhere else. It's not likely to be bad coding, unless it's bad coding interacting with data access gone wrong. It only started happening when this particular client site migrated to a newer, more powerful server with SQL 2008. On the old rattletrap of a box running SQL 7 this problem did NOT exist. |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-05-11 : 12:01:27
|
quote: On the old rattletrap of a box running SQL 7 this problem did NOT exist.
Was that box also a DC? Same OS too? Same .Net library? Any of these changes could be contributing, if not the single cause. You can't compare how an older version of SQL works vs. the newest one (as I'm finding out myself...stupid 3rd party app takes 3x longer to log out of the hot new server than the old...so much for progress).

It doesn't have to be a coding error; if this is indeed the same problem we had, it's a bug in the framework. I have a feeling it's not though, because you're not seeing waits on your netstats. Did you run that on the client machine(s) too?

Have you done any network monitoring to see if it's getting saturated? Or Wireshark? I can see a DC getting hammered every 3 minutes if it's also serving DNS. You may want to look at some .Net counters too, although I couldn't tell you which ones besides networking/TCP. |
 |
|
simondeutsch
Aged Yak Warrior
547 Posts |
Posted - 2011-05-16 : 21:29:37
|
By bug in the framework, do you mean the network libraries? The clients are connecting using ADO 2.8.

The client machines are displaying plenty of TIME_WAIT entries. There's one ESTABLISHED entry, and every time there's a freeze, there's a SYN_SENT entry and then an increase in TIME_WAIT entries. I.e., the first time it froze, there were two TIME_WAIT entries on serveraddress:1433, then the second time it froze there were four, etc. Not necessarily a doubling of TIME_WAITs, but an increase. And even when the app resumes, the TIME_WAITs don't go away. |
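Since SYN_SENT is the state a client sits in while it waits for the server to answer a new connection attempt, one thing that could be measured from a client machine is how long a bare TCP connect to port 1433 takes. This is only a sketch under that assumption, not a diagnosis. The function name is hypothetical, and "serveraddress" in the usage comment stands in for the real host as in the post.

```python
import socket
import time


def time_connects(host, port=1433, attempts=5, timeout=5.0):
    """Open and close a TCP connection several times.

    Returns one entry per attempt: the connect duration in seconds,
    or None if the attempt failed or timed out.
    """
    results = []
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            # create_connection resolves the host name (if one is given)
            # and completes the TCP handshake before returning.
            with socket.create_connection((host, port), timeout=timeout):
                results.append(time.perf_counter() - start)
        except OSError:
            results.append(None)
    return results


# Usage (not executed here):
#   for seconds in time_connects("serveraddress"):
#       print("failed" if seconds is None else f"{seconds * 1000:.1f} ms")
```

Passing an IP address instead of a host name leaves name resolution out of the measurement, so comparing the two runs might help separate DNS delays from TCP delays. Note that each probe closes its own connection and will therefore add a TIME_WAIT entry of its own on the client.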
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-05-16 : 22:30:35
|
Interestingly, I went to a session at SQLRally last week that discussed wait stats, and the presenter mentioned ASYNC_NETWORK_IO as an indication of either network saturation and/or slow client processing. Is it possible the clients are not using the full network bandwidth? (10 Mbps vs. 100?) Are you able to test it with only 1 or 2 client connections and see if the waits still occur (including the TIME_WAITs)? |
 |
|
simondeutsch
Aged Yak Warrior
547 Posts |
Posted - 2011-05-18 : 23:45:49
|
Tested it with only six connections on the server, of which I presume all but one or two are inactive, and was able to get the same result with wait stats. It hangs every 15 requests or so, and netstat shows constant increases in the TIME_WAIT connections on this client machine. It does not make sense for it to be slow client processing, which would be consistent.

I can't really see how it'd be possible for client computers not to utilize 100 Mbps bandwidth. The hardware is in place. This is happening on a LAN. |
 |
|
robvolk
Most Valuable Yak
15732 Posts |
Posted - 2011-05-19 : 07:30:11
|
We had a bad network cable knock a Gbit card down to 100 Mbit (found out JUST before rolling new server to production...PHEW!). I'm not saying all of these things are causes, I'm just suggesting based on my experience. If you haven't contacted Microsoft support yet, I think you should. |
 |
|
|