rlaubert
Yak Posting Veteran
96 Posts |
Posted - 2008-07-11 : 08:51:46
I have been in this new position for about 3 months now. The company is an Internet Application Provider that has grown beyond its initial hardware capabilities and has been experiencing problems with an application that was never properly designed. During the startup phase and the first years of growth, the hardware made up for the application's performance issues. Now we are experiencing the growing pains. Lots of work needs to be done on the stored procedures, the database design, and the application itself.

With the background out of the way: there is a practice here that my gut says is really wrong. On a daily basis we have periods when the load exceeds capacity and users begin to get timeouts from the web app. In the past the practice has been to fail over the cluster when this happens. Naturally, everything calms down for a while after the failover. I have expressed my feelings on this, but because "we" see an immediate improvement in request processing (i.e., no timeouts), everyone seems to feel that this is the appropriate solution until we get the application reworked.

My question: am I wrong to assume that failing over should be reserved for hardware issues and not used to temporarily "improve performance"? The improvement is really due to the fact that we dropped all the user connections to SQL, and all the requests for data that were being processed have been rolled back and have to be resubmitted.

I really want to understand the best practices for failing over and whether this practice is acceptable. Thanks for your responses.

Raymond Laubert
MCSE, MCDBA, MCITP: Administration, MCT
dinakar
Master Smack Fu Yak Hacker
2507 Posts |
Posted - 2008-07-11 : 12:51:39
It depends on the definition of "failure". It could be an actual calamity or a failure due to performance. As you said, you have bad code creating poor query plans, and every time you "fail over" to the other node you are fine for a while, until the same poor query plans get created again over time and under load. Hopefully you are working on the root cause and fixing the code while you keep failing over every now and then.

Dinakar Nethi
************************
Life is short. Enjoy it.
************************
http://weblogs.sqlteam.com/dinakar/
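For what it's worth, instead of flushing the plan cache with a failover, you can look at the worst cached plans directly. A rough sketch, assuming SQL Server 2005 or later (the TOP 20 cutoff is an arbitrary example):

-- Inspect the plan cache for the most expensive cached statements
-- instead of flushing it with a failover.
SELECT TOP 20
    qs.execution_count,
    qs.total_worker_time / qs.execution_count AS avg_cpu_microseconds,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;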
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2008-07-11 : 14:11:52
This practice is not acceptable in mission-critical environments, as what is being done is considered downtime. If we had all of that downtime here, we'd be paying the customer a large amount of money due to our SLA with them. Instead of restarting the service (or failing over), why isn't anyone looking to see which spids need to be killed to improve performance (getting rid of their locks and/or blocking)? You shouldn't have to kill every connection (as a restart does) in order to improve performance; killing a small subset should do it. Are the users aware that their transactions are being rolled back? Do they realize they need to resubmit the same queries that they were prevented from completing?

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
Subscribe to my blog
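A rough sketch of how to find those spids, assuming SQL Server 2005 or later (the spid in the commented-out KILL is a made-up example):

-- List blocked requests and the session blocking each one. The head
-- blocker is a blocking_session_id that does not itself show up as blocked.
SELECT
    r.session_id,
    r.blocking_session_id,
    r.wait_type,
    r.wait_time,
    t.text AS running_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;

-- Once the head blocker is identified, kill just that spid:
-- KILL 64;  -- hypothetical spid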
mfemenel
Professor Frink
1421 Posts |
Posted - 2008-07-11 : 14:56:03
Failing over a cluster is not a performance improvement, as stated earlier. You're just buying a few minutes of sanity with smoke and mirrors for your end users. I recently dealt with an app with issues similar to what you're describing. I would strongly recommend you run Profiler for a few days, evaluate the queries and SPs that are hurting performance, fix them, and check the indexing. With luck this will be enough of a stopgap that you can then do the real work of straightening out the tables and the design. If you're doing this as a visible project (visible to management, I mean), then make it clear that this is just a band-aid, and even if you gain breathing room it shouldn't be considered a final solution. Good luck.

Mike
"oh, that monkey is going to pay"
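One rough way to check the indexing, assuming SQL Server 2005 or later (the impact expression is a common heuristic, not an official formula):

-- Surface the missing indexes SQL Server has recorded since the last
-- restart. Treat the output as suggestions to review, not a to-do list.
SELECT TOP 20
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks * migs.avg_user_impact AS impact_estimate
FROM sys.dm_db_missing_index_details AS mid
JOIN sys.dm_db_missing_index_groups AS mig
    ON mig.index_handle = mid.index_handle
JOIN sys.dm_db_missing_index_group_stats AS migs
    ON migs.group_handle = mig.index_group_handle
ORDER BY impact_estimate DESC;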
rlaubert
Yak Posting Veteran
96 Posts |
Posted - 2008-07-11 : 15:15:12
It is so nice to hear that my thoughts on failover are correct. Sometimes you have to wonder. Anyway, we run Profiler all day and store the results for troubleshooting and performance tuning, so we know which SPs have to be worked on first. I am very happy to say that we have already reduced the timeouts and increased performance just by making changes to the two largest SPs that create issues. CPU utilization dropped from 50-60% per CPU to the mid 20s today alone, AND no timeouts today. I am going to keep this thread so that the next time I am asked to fail over I can point out that this is not recommended and that we should continue to correct the problem, not treat the symptoms.

Thanks.

Raymond Laubert
MCSE, MCDBA, MCITP: Administration, MCT
mfemenel
Professor Frink
1421 Posts |
Posted - 2008-07-11 : 15:20:02
Nice to know you're not the only person who thinks the world is round, isn't it?

Mike
"oh, that monkey is going to pay"
sodeep
Master Smack Fu Yak Hacker
7174 Posts |
Posted - 2008-07-11 : 15:36:57
In addition to all these comments, you can take a look at table partitioning (Enterprise Edition only) and the standard reports that come with SQL Server 2005.
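A minimal sketch of what table partitioning looks like in SQL Server 2005 Enterprise Edition (the boundary dates, filegroup mapping, and Orders table are made-up examples):

-- 1. Partition function: split rows into ranges by date.
CREATE PARTITION FUNCTION pf_OrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('2007-01-01', '2008-01-01');

-- 2. Partition scheme: map each range to a filegroup (all to PRIMARY here).
CREATE PARTITION SCHEME ps_OrderDate
AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);

-- 3. Create the table on the scheme, partitioned by the date column.
CREATE TABLE dbo.Orders
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL
)
ON ps_OrderDate (OrderDate);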
tkizer
Almighty SQL Goddess
38200 Posts |
Posted - 2008-07-11 : 16:26:40
quote: Originally posted by rlaubert
We run Profiler all day and store the results for troubleshooting and performance tuning. We know which SPs have to be worked on first.
Raymond, how are you collecting the data? In a table or a file? Where are you running the GUI from? Have you considered doing a less intrusive server-side trace (SQL Trace)? The reason I ask is that SQL Profiler can greatly decrease your performance. There are ways to reduce its impact, though, namely saving to a file and running the GUI from a client machine. But the best way to collect the data is via a server-side trace.

Tara Kizer
Microsoft MVP for Windows Server System - SQL Server
http://weblogs.sqlteam.com/tarad/
Subscribe to my blog
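A minimal sketch of a server-side trace on SQL Server 2005 (the file path, rollover size, and the single event traced are example choices; .trc is appended to the file name automatically):

DECLARE @traceid int, @maxsize bigint;
SET @maxsize = 250; -- max MB per rollover file (example value)

-- Create the trace with file rollover (option 2).
EXEC sp_trace_create @traceid OUTPUT, 2, N'D:\Traces\daily_trace', @maxsize;

-- Capture SQL:BatchCompleted (event 12) with TextData (1), SPID (12),
-- Duration (13), Reads (16), and CPU (18).
DECLARE @on bit;
SET @on = 1;
EXEC sp_trace_setevent @traceid, 12, 1,  @on;
EXEC sp_trace_setevent @traceid, 12, 12, @on;
EXEC sp_trace_setevent @traceid, 12, 13, @on;
EXEC sp_trace_setevent @traceid, 12, 16, @on;
EXEC sp_trace_setevent @traceid, 12, 18, @on;

-- Start the trace; keep the id to stop and close it later.
EXEC sp_trace_setstatus @traceid, 1;
SELECT @traceid AS traceid;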
rlaubert
Yak Posting Veteran
96 Posts |
Posted - 2008-07-14 : 16:40:58
Profiler is running to a file every day, all day. This is used by customer service to troubleshoot script errors. The GUI is run from local PCs looking at the saved Profiler traces.

Raymond Laubert
MCSE, MCDBA, MCITP: Administration, MCT