paulrandal
Yak with Vast SQL Skills
899 Posts
Posted - 2006-03-14 : 18:03:40

It's been pretty quiet on here recently - no-one experiencing any corruption problems we haven't already given guidance on? Cool...

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
Michael Valentine Jones
Yak DBA Kernel (pronounced Colonel)
7020 Posts
Posted - 2006-03-14 : 19:21:05

Maybe enough people read those threads and thought, "Hey, maybe regular backups would be a good idea. I always wanted to try that Database Maintenance Wizard anyway."

CODO ERGO SUM
byrmol
Shed Building SQL Farmer
1591 Posts
Posted - 2006-03-14 : 19:30:34

Paul,

Have you ever seen an instance where a heavily fragmented table (with potentially corrupted indexes) fails to physically "save" rows? I have just had a couple of "mystery" transactions vanish... The surprising thing was that the "audit" table (managed by triggers) actually did record the transaction, but the base table did not. The problem could (nay... probably) have existed between keyboard and chair... but it might give you something to think about...

DavidM

Intelligent Design is NOT science.
A front-end is something that tries to violate a back-end.
paulrandal
Yak with Vast SQL Skills
899 Posts
Posted - 2006-03-14 : 20:32:39

Is the clustered index corrupted?

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
rockmoose
SQL Natt Alfen
3279 Posts
Posted - 2006-03-15 : 11:36:25

This is not related to corrupted data, but I will give it a shot anyway.

We want to run DBCC INDEXDEFRAG on our 24x7 system, which has been up for a few years with minimal maintenance. Unfortunately this always leads to blocking of the user transactions that are constantly hammering the system. Even UPDATE STATISTICS has disrupted the system when run manually. Auto update stats is on, so we let SQL Server decide on that point. The tables are 1-10 M rows, a couple > 30 M.

I have pretty much given up on any scheduled maintenance, as it disrupts the system too much. I am waiting for the "Cumulative Hotfix for SQL Server 2000 Service Pack 4 - Build 2162 for x86 and x64 Based Systems" to be put back on the MS site. I believe it might keep DBCC INDEXDEFRAG from locking up the system!?

So my questions are basically:
1. How best to handle maintenance on a busy 24x7 system?
2. Is it worth doing? I mean, the system has been running for several years, and SQL Server does not seem to suffer too much from the lack of maintenance!?
3. Can bad maintenance (or excessive maintenance) increase the risk of data corruption issues?

rockmoose
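(A minimal sketch of the commands under discussion, in SQL Server 2000 syntax; the database name 'MyDB', the table name 'BigTable', and the index ID are placeholders, not names from this thread:)

-- Online defragmentation of one index; compacts and reorders leaf pages
-- in place, holding only short-duration page locks
DBCC INDEXDEFRAG ('MyDB', 'BigTable', 1)  -- index ID 1 = the clustered index

-- Statistics are maintained separately from the defrag itself
UPDATE STATISTICS BigTable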
paulrandal
Yak with Vast SQL Skills
899 Posts
Posted - 2006-03-15 : 13:00:54

quote: Originally posted by rockmoose
This is not related to corrupted data, but I will give it a shot anyway.

Not a problem - I wrote DBCC INDEXDEFRAG, so that's in my area of expertise.

quote: We want to run DBCC INDEXDEFRAG on our 24x7 system, which has been up for a few years with minimal maintenance.

Why? Are you seeing decreasing query performance that you can correlate to increasing fragmentation? That's when you should be addressing fragmentation. See the whitepaper below for details:
http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/ss2kidbp.mspx
Be careful of some of the gotchas - INDEXDEFRAG doesn't rebuild statistics, and it will always fully log what it's doing, no matter what recovery mode the database is in.

quote: Unfortunately this always leads to blocking of the user transactions that are constantly hammering the system. [...] I am waiting for the "Cumulative Hotfix for SQL Server 2000 Service Pack 4 - Build 2162 for x86 and x64 Based Systems" to be put back on the MS site. I believe it might keep DBCC INDEXDEFRAG from locking up the system!?

Yup - unfortunately SP4 had a bug in the lock manager which means the short-duration page locks that DBCC INDEXDEFRAG takes are not released when it drops them, leading to eventual escalation to an exclusive table lock...

quote: So my questions are basically: 1. How best to handle maintenance on a busy 24x7 system?

In a nutshell, work out what you really have to do, and the optimal way of doing it. Tasks to consider include defrag/rebuild, backups, consistency checks... Others with more experience of setting up maintenance plans can weigh in here. Note that I don't mention shrink as a maintenance operation you should consider.

quote: 2. Is it worth doing? I mean, the system has been running for several years, and SQL Server does not seem to suffer too much from the lack of maintenance!?

See above - do you really need to do defrags? Do your disaster recovery SLAs dictate a certain backup or CHECKDB schedule? It all depends...

quote: 3. Can bad maintenance (or excessive maintenance) increase the risk of data corruption issues?

Absolutely not, in terms of SQL Server causing the corruption. If you have flaky hardware, then the more activity using it, the more likely it is to introduce a problem.

Thanks

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
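(To see whether fragmentation is actually present before taking any action, the measuring command the whitepaper is built around looks roughly like this; 'BigTable' is a placeholder:)

-- Report fragmentation for all indexes on one table; the key columns are
-- Scan Density (ideally near 100%) and Logical Scan Fragmentation (near 0%)
DBCC SHOWCONTIG ('BigTable') WITH ALL_INDEXES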
byrmol
Shed Building SQL Farmer
1591 Posts
Posted - 2006-03-15 : 15:34:13

quote: Originally posted by paulrandal
Is the clustered index corrupted?

No...

DavidM

Intelligent Design is NOT science.
A front-end is something that tries to violate a back-end.
paulrandal
Yak with Vast SQL Skills
899 Posts
Posted - 2006-03-15 : 16:41:41

What does CHECKDB say? You said some of the indexes may be corrupted...

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
byrmol
Shed Building SQL Farmer
1591 Posts
Posted - 2006-03-15 : 17:30:37

>> What does CHECKDB say? You said some of the indexes may be corrupted...

CHECKDB is all good... As I mentioned... PEBKAC... "Problem exists between keyboard and chair." The scan density on the table concerned was at 11%. It's on a client site where the technical ability is just above a flatworm's... I ran DBCC DBREINDEX remotely and apparently "that fixed it"...

Take a deep breath Dave, "Serenity NOW!"

Once again Paul, thanks for your input on SQLTeam...

DavidM

Intelligent Design is NOT science.
A front-end is something that tries to violate a back-end.
paulrandal
Yak with Vast SQL Skills
899 Posts
Posted - 2006-03-15 : 17:40:43

OK - I'm still confused about the flow of this part of the thread, but if everything's fixed then cool.

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
rockmoose
SQL Natt Alfen
3279 Posts
Posted - 2006-03-15 : 19:52:55

Thanks for the answer.

quote: Why? Are you seeing decreasing query performance that you can correlate to increasing fragmentation? That's when you should be addressing fragmentation. See the whitepaper below for details:

Good whitepaper (I actually read it a long time ago). We have roughly constant query performance, and any decrease is more correlated to data growth, peak usage times and reporting (aarrgh). BUT, I have never seen the system run in a defragmented state, so I can't say whether it would be more performant. I was thinking of "cleaning up" the fragmentation - tidying up, so to speak - and I thought it would make things overall better. If nothing else, I could regain some disk space as a side-effect of any defragging maneuvers.

quote: Yup - unfortunately SP4 had a bug in the lock manager which means the short-duration page locks that DBCC INDEXDEFRAG takes are not released when it drops them, leading to eventual escalation to an exclusive table lock...

Any news on when it will be available? I do have an "old" copy lying on a QA server where it is installed, but after the retraction from the MS site I decided to wait for the official release.

quote: In a nutshell, work out what you really have to do, and the optimal way of doing it. Tasks to consider include defrag/rebuild, backups, consistency checks... Others with more experience of setting up maintenance plans can weigh in here. Note that I don't mention shrink as a maintenance operation you should consider.

I am asking the massive collective experience on this board. Anyway, there are no scheduled maintenance plans; what we do is monitor execution times and locking, examine execution plans, and do manual tweaking where we detect problem areas. It has gone from really bad -> bad -> ok. Some attempts to schedule update statistics / defragging automatically failed because it was too disruptive.

What's the advice on CHECKDB - can it cause similar problems? I have never really bothered to run CHECKDB on a regular schedule; have I just been lucky over the last few years? The few times I did run it on various systems, it always reported all ok! Maybe it's better to run CHECKDB preemptively, before any corruption issues, instead of after!?

quote: See above - do you really need to do defrags? Do your disaster recovery SLAs dictate a certain backup or CHECKDB schedule? It all depends...

SQL Server imo does a really good job of keeping itself healthy, and my general thought is to not tax it with unnecessary reindexing, defragging and such. It's just a matter of deciding how much maintenance is necessary - very little, if you ask me. But in this case, since there is heavy fragmentation on all the main large tables, I wanted to do a one-off clean-up to see what happens, but CANNOT atm.

quote: Absolutely not, in terms of SQL Server causing the corruption. If you have flaky hardware, then the more activity using it, the more likely it is to introduce a problem.

I like your take on this... "It's not SQL Server, it's the hardware." Of course, the more you plague the system, the more likely it's going to get back at you. Thank you for a great product!

____________________________________________________________
(running on 2000sp4 without the post-SP4 cumulative HOTFIX btw)

rockmoose
paulrandal
Yak with Vast SQL Skills
899 Posts
Posted - 2006-03-15 : 20:51:28

Unknown on the SP4 QFE issue.

Before you take the hit of doing the defrag operation, work out whether any of the queries will actually benefit from removing fragmentation - if not, then there's not much point.

How were you triggering defrags when you tried to do it automatically, and which method were you using (defrag/rebuild)?

CHECKDB - you've been lucky! Ask yourself these questions:
1) What's the maximum amount of data/completed work that the business can afford to lose from corruption/disaster?
2) What's the maximum downtime the business can afford as a result of a corruption/disaster?
3) When did I last test that my backups actually contained valid databases?
4) When did I last test my disaster recovery plan?

The answers will dictate what kind of HA and DR strategies you need, including backup types/frequency, CHECKDB types/frequency, and disk/server redundancy. For instance (roughly), if you can't afford to lose more than a day's work and can't be down for more than a few hours, you'd better be running daily CHECKDBs and have your data partitioned so it can be restored within 2 hours.

Yes, I'm preaching here.

Thanks

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
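(One common way to answer question 3 is to restore the latest backup on a scratch server and consistency-check the copy; a sketch, with made-up database names and file paths:)

-- Restore last night's full backup to a throwaway database
RESTORE DATABASE SalesCheck
FROM DISK = 'D:\Backups\Sales_full.bak'
WITH MOVE 'Sales_Data' TO 'D:\Check\SalesCheck.mdf',
     MOVE 'Sales_Log'  TO 'D:\Check\SalesCheck_log.ldf'

-- Run the consistency check against the copy, not against production
DBCC CHECKDB ('SalesCheck') WITH NO_INFOMSGS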
rockmoose
SQL Natt Alfen
3279 Posts
Posted - 2006-03-16 : 03:14:45

Good morning!

quote: Before you take the hit of doing the defrag operation, work out whether any of the queries will actually benefit from removing fragmentation - if not, then there's not much point.

Isn't that a little hard to do beforehand? I mean, how does one decide whether the fragmentation of an index per se is a major cause of performance degradation?

quote: How were you triggering defrags when you tried to do it automatically, and which method were you using (defrag/rebuild)?

We used STATS_DATE and DBCC SHOWCONTIG to pipe results to some tables, and had a process go through these and defrag the indexes with high fragmentation. DBCC INDEXDEFRAG + UPDATE STATISTICS is what we used.

quote: CHECKDB - you've been lucky! Ask yourself these questions: 1) What's the maximum amount of data/completed work that the business can afford to lose from corruption/disaster? 2) What's the maximum downtime the business can afford as a result of a corruption/disaster? 3) When did I last test that my backups actually contained valid databases? 4) When did I last test my disaster recovery plan? The answers will dictate what kind of HA and DR strategies you need, including backup types/frequency, CHECKDB types/frequency, and disk/server redundancy.

Most of that is in place, and we do check database backups weekly. It's basically just CHECKDB that we don't run on a regular basis, but recovery can be done to a point in time quickly if within the last week. Point-in-time recovery older than that will take more time, as those backups will be on tape.

But I will get to it and run some CHECKDB here. How deep is it advisable to run CHECKDB in order to get good results? I am thinking about the switches NOINDEX and PHYSICAL_ONLY. NO_INFOMSGS is a must. What level of REPAIR is fair to choose, if we start running this on a regular basis (say weekly)? Does it have a TABLE_RESULT option or something like that?

From BOL - a lot of warning signs here:

quote: DBCC CHECKDB is a CPU- and disk-intensive operation. Each data page that requires checking must first be read from disk into memory. In addition, DBCC CHECKDB uses tempdb to do sorting. If actively performing transactions while DBCC CHECKDB is running, the transaction log continues to grow because the DBCC command blocks log truncation until it has finished reading the log. It is recommended that DBCC CHECKDB be run during hours when the load is light on the server. If DBCC CHECKDB is run during heavy peak usage time, expect a performance hit on the transaction throughput as well as DBCC CHECKDB completion time.

The bad news is that we really don't have a maintenance window on this system. Running some tests right now...

edit: update

DBCC CHECKDB('...') WITH NO_INFOMSGS
-------------------------------------
20 minutes later:
The command(s) completed successfully.

I have never seen any different result than this; maybe I'm just lucky.

rockmoose
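(A sketch of the switch combinations being weighed, in SQL Server 2000 syntax; 'MyDB' is a placeholder name:)

-- Full check: allocation, structural and logical consistency,
-- suppressing the per-table informational messages
DBCC CHECKDB ('MyDB') WITH NO_INFOMSGS

-- Lighter-weight check: physical structure of pages and record headers
-- only, skipping the deeper logical checks
DBCC CHECKDB ('MyDB') WITH PHYSICAL_ONLY, NO_INFOMSGS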
paulrandal
Yak with Vast SQL Skills
899 Posts
Posted - 2006-03-16 : 12:35:25

quote: Isn't that a little hard to do beforehand? I mean, how does one decide whether the fragmentation of an index per se is a major cause of performance degradation?

No - it's relatively simple. All you need to do is see which queries are using which indexes for range scans rather than singleton lookups. Those are the indexes you should concentrate on.

quote: We used STATS_DATE and DBCC SHOWCONTIG to pipe results to some tables, and had a process go through these and defrag the indexes with high fragmentation. DBCC INDEXDEFRAG + UPDATE STATISTICS is what we used.

Sorry - I meant: which results from DBCC SHOWCONTIG are you basing the decision on?

quote: Most of that is in place, and we do check database backups weekly.

How do you check the backup is OK without running CHECKDB on the restored database?

quote: But I will get to it and run some CHECKDB here. How deep is it advisable to run CHECKDB in order to get good results? I am thinking about the switches NOINDEX and PHYSICAL_ONLY. NO_INFOMSGS is a must.

Depends on what SLAs you want to satisfy. A very rough guide would be to run a regular CHECKDB using WITH PHYSICAL_ONLY and a less regular full CHECKDB. PHYSICAL_ONLY implies NOINDEX - don't bother with NOINDEX. How regular is regular? Depends on how long it takes to run, what impact it has on the system workload, whether you have a maintenance window...

quote: What level of REPAIR is fair to choose, if we start running this on a regular basis (say weekly)?

<gnashes teeth furiously!> Don't ever run repair automatically. Yes, I know there's an option to do it when setting up a CHECKDB maintenance task, but I had it removed in SQL Server 2005 - it was evil. Two reasons for this:
1) You'll need to put the database into single-user mode (minor reason).
2) You need to look at the errors to see what repair's going to do, work out why the errors happened, etc. before deciding whether to restore or repair.

quote: Does it have a TABLE_RESULT option or something like that?

Yes, but it's undocumented and changes between releases. Use at your own risk.

quote: From BOL - a lot of warning signs here: [DBCC CHECKDB is a CPU- and disk-intensive operation...]

Yup - you need to work out what impact it's going to have. This is the same with any DB vendor - consistency checks are IO- and CPU-intensive. If the impact is too high to run it on the production system, consider running it on restored backups, or doing a set of staggered CHECKTABLEs instead of a full CHECKDB - you're going to need to experiment to see what'll work for you.

Thanks

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
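(A sketch of the staggered-CHECKTABLE approach Paul mentions - spreading the cost of a full CHECKDB across several quiet periods; all object names here are hypothetical:)

-- Night 1: the largest tables, one at a time
DBCC CHECKTABLE ('dbo.Orders') WITH NO_INFOMSGS
DBCC CHECKTABLE ('dbo.OrderRows') WITH NO_INFOMSGS

-- Night 2: the remaining tables, plus the allocation and system checks
DBCC CHECKTABLE ('dbo.Customers') WITH NO_INFOMSGS
DBCC CHECKALLOC ('MyDB') WITH NO_INFOMSGS
DBCC CHECKCATALOG ('MyDB')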
rockmoose
SQL Natt Alfen
3279 Posts
Posted - 2006-03-16 : 16:23:30

quote: No - it's relatively simple. All you need to do is see which queries are using which indexes for range scans rather than singleton lookups. Those are the indexes you should concentrate on.

Ok, I'll keep an open eye and see if a situation arises soon where I can test this (defragging changing the execution plan).

quote: Sorry - I meant: which results from DBCC SHOWCONTIG are you basing the decision on?

It was a combination of low Scan Density (< 70-80%) and high Logical Scan Fragmentation (> 20-30%). Here is a sample of one sorry table:

DBCC SHOWCONTIG scanning 'sample table' table...
Table: 'sample table' (1040722760); index ID: 1, database ID: 5
TABLE level scan performed.
- Pages Scanned................................: 21578
- Extent Switches..............................: 6906
- Scan Density [Best Count:Actual Count].......: 39.06% [2698:6907]
- Logical Scan Fragmentation ..................: 36.36%

DBCC SHOWCONTIG scanning 'sample table' table...
Table: 'sample table' (1040722760); index ID: 3, database ID: 5
LEAF level scan performed.
- Pages Scanned................................: 1702
- Extent Switches..............................: 1518
- Scan Density [Best Count:Actual Count].......: 14.02% [213:1519]
- Logical Scan Fragmentation ..................: 55.99%

DBCC SHOWCONTIG scanning 'sample table' table...
Table: 'sample table' (1040722760); index ID: 47, database ID: 5
LEAF level scan performed.
- Pages Scanned................................: 3119
- Extent Switches..............................: 2782
- Scan Density [Best Count:Actual Count].......: 14.01% [390:2783]
- Logical Scan Fragmentation ..................: 56.33%

DBCC SHOWCONTIG scanning 'sample table' table...
Table: 'sample table' (1040722760); index ID: 48, database ID: 5
LEAF level scan performed.
- Pages Scanned................................: 2460
- Extent Switches..............................: 2204
- Scan Density [Best Count:Actual Count].......: 13.97% [308:2205]
- Logical Scan Fragmentation ..................: 54.07%

DBCC SHOWCONTIG scanning 'sample table' table...
Table: 'sample table' (1040722760); index ID: 50, database ID: 5
LEAF level scan performed.
- Pages Scanned................................: 602
- Extent Switches..............................: 205
- Scan Density [Best Count:Actual Count].......: 36.89% [76:206]
- Logical Scan Fragmentation ..................: 21.43%

quote: How do you check the backup is OK without running CHECKDB on the restored database?

That is probably a very good question... Well, just test and trial, basically. Atm we restore to the QA & DEV servers, where a lot of "testing" is going on, and we have never had any <detected> corrupted data problems so far. But will get going on CHECKDB there.

quote: <gnashes teeth furiously!> Don't ever run repair automatically.

Ok, I hereby promise to never run repairs automatically.

quote: Yes, but it's undocumented and changes between releases. Use at your own risk.

I never used an undocumented feature in my whole life... kidding.

Good speaking to you - thanks for the information, advice and pointers.

rockmoose
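(The process rockmoose describes can be sketched roughly as below, following the pattern in the whitepaper Paul linked. The temp-table columns match the documented DBCC SHOWCONTIG ... WITH TABLERESULTS output on SQL Server 2000; the thresholds are the ones quoted above, and everything else is illustrative:)

-- Capture fragmentation figures for every index in the current database
CREATE TABLE #fraglist (
    ObjectName CHAR(255), ObjectId INT, IndexName CHAR(255), IndexId INT,
    Lvl INT, CountPages INT, CountRows INT, MinRecSize INT, MaxRecSize INT,
    AvgRecSize INT, ForRecCount INT, Extents INT, ExtentSwitches INT,
    AvgFreeBytes INT, AvgPageDensity INT, ScanDensity DECIMAL,
    BestCount INT, ActualCount INT, LogicalFrag DECIMAL, ExtentFrag DECIMAL)

INSERT INTO #fraglist
EXEC ('DBCC SHOWCONTIG WITH TABLERESULTS, ALL_INDEXES, NO_INFOMSGS')

-- List the candidates past the thresholds; a driving process would then feed
-- each row to DBCC INDEXDEFRAG (0, ObjectName, IndexName), followed by
-- UPDATE STATISTICS on the table
SELECT ObjectName, IndexName, ScanDensity, LogicalFrag
FROM #fraglist
WHERE ScanDensity < 70.0 OR LogicalFrag > 20.0

DROP TABLE #fraglist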
paulrandal
Yak with Vast SQL Skills
899 Posts
Posted - 2006-03-16 : 17:14:25

You're welcome - let me know if you have any further questions (here or by email).

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
rockmoose
SQL Natt Alfen
3279 Posts
Posted - 2006-03-18 : 12:53:45

Hi again!

I ran DBCC CHECKDB and everything looked right. At the same time I followed the activity in perfmon.

The setup of the server is such that data is on a SAN (RAID 5), log files are on local disk, and tempdb has 3 equally large data files spread over 3 disks. I watched the current disk queue length on the SAN, which normally runs at 0-6. During the CHECKDB, which took 10-20 minutes, the disk queue length averaged 40 - a significant increase. Processor usage increased just marginally, from about 10% to about 18%. This was all performed during a period of low activity on the system. On disk write bytes/sec I did not notice anything exceptional, but disk read bytes/sec averaged approx. 45 MB/sec during the CHECKDB, which was more than a factor-10 increase over normal.

My question is the following: is this rise in disk queue length to be expected when running DBCC CHECKDB, or does it sound like there might be a disk bottleneck?

rockmoose
Kristen
Test
22859 Posts
Posted - 2006-03-19 : 04:57:32

"recovery can be done to a point in time quickly if within the last week. Point-in-time recovery older than that will take more time, as those backups will be on tape"

But there is then the work from the recovery point onwards, which the users need to repeat...

Perhaps you could run DEFRAG for, say, 10 minutes and abort it. After a few goes at that (several nights' quiet periods??) you would have the index "fixed", and maybe defrags after that would run through within 5 minutes. Maybe none of the users will complain about a 5-minute slowdown each night?

"what we do is monitor execution times"

Do you do sp_recompile? We have a number of sprocs whose performance deteriorates markedly over time, and we recompile them regularly.

Other than that, I think you need to be restoring to another server and doing CHECKDB there. On that server / restored DB copy you could do some specific performance tests, THEN do a defrag and update stats on the whole lot, and then rerun your performance tests. If there is a massive difference, you should be planning to find a way to do it on the live server!

If you have SP4 installed, then the whole-table lock escalation on DEFRAG will be a bit of a turn-off!

Kristen
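(A sketch of the recompile idea; the object names are made up. sp_recompile marks an object so that a fresh plan is compiled the next time it runs:)

-- Recompile a single stored procedure on its next execution
EXEC sp_recompile 'dbo.usp_GetOrders'

-- Or pass a table name to flag every proc and trigger that references it
EXEC sp_recompile 'dbo.Orders'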
paulrandal
Yak with Vast SQL Skills
899 Posts
Posted - 2006-03-19 : 11:59:24

quote: My question is the following: is this rise in disk queue length to be expected when running DBCC CHECKDB, or does it sound like there might be a disk bottleneck?

Looks like a bottleneck. A full CHECKDB is generally CPU-bound; a WITH PHYSICAL_ONLY is generally IO-bound. I would expect some queue-length increase, but it does seem like your IO system can't handle the required IOs. Now, if your regular workload doesn't tax the IO system beyond its limits, it's up to you whether to upgrade it or not - again, it all comes down to the disaster recovery SLAs. In your case I think you're fine.

Can you try it WITH PHYSICAL_ONLY and monitor the queue length? I'm interested in the results.

Thanks

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)
rockmoose
SQL Natt Alfen
3279 Posts
Posted - 2006-03-19 : 21:40:47

quote: Can you try it WITH PHYSICAL_ONLY and monitor the queue length? I'm interested in the results.

Would you care to have me email you a trace file with and without PHYSICAL_ONLY, or just have me post a summary here?

Thanks for the suggestions, Kristen.

quote: Do you do sp_recompile? We have a number of sprocs whose performance deteriorates markedly over time, and we recompile them regularly.

Not really; we did do sp_recompile on a couple of hotspot tables. And currently there is a job monitoring blocking on these tables, which will kick off a recompile if the number of blocks reaches a certain level. The number of blocks over time is monitored by another job that runs every minute (a minute-by-minute snapshot of blocks & deadlocks). Anyway, we set the MAXDOP 1 hint where we had problems, since the wait type was often CXPACKET when we ran into trouble. Since then, ok.

_________________________________________________
Having appropriate indexes is like night and day

rockmoose
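(A sketch of the hint mentioned; the query and table are hypothetical. MAXDOP 1 forces a serial plan for that one statement, which removes the CXPACKET waits that parallel plans can generate:)

SELECT CustomerId, SUM(Amount) AS Total
FROM dbo.Orders
GROUP BY CustomerId
OPTION (MAXDOP 1)  -- serial plan for this query only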
paulrandal
Yak with Vast SQL Skills
899 Posts
Posted - 2006-03-20 : 22:06:35

A summary here would be good - I'm pretty swamped this week.

Paul Randal
Lead Program Manager, Microsoft SQL Server Storage Engine + SQL Express
(Legalese: This posting is provided "AS IS" with no warranties, and confers no rights.)