
 Maintaining statistics


tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2008-01-29 : 17:11:36
Scenario:
For the most part we let SQL Server (2005) maintain our statistics for us. However, we do have several large processes written in stored procedures. There is one main controller procedure that can call any number of other procedures, and this process can take anywhere from 5 minutes to an hour+ to run (based on the size of the client). Back in the day of SQL Server 2000 we found that the performance of this procedure would diminish over time (while it was running). We implemented a queued concept of issuing UPDATE STATISTICS commands: a SQL Server job ran every 10 minutes looking for new records in a table, and records were inserted at key points in these stored procedures (after large deletes, updates, and inserts).
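
That queued mechanism looked roughly like this (a sketch reconstructed from memory; the table name and logic here are illustrative, not our exact code):

-- hypothetical queue table the business procs write to
CREATE TABLE dbo.StatsUpdateQueue
(
    QueueId   INT IDENTITY(1,1) PRIMARY KEY,
    TableName SYSNAME  NOT NULL,
    QueuedAt  DATETIME NOT NULL DEFAULT GETDATE(),
    Processed BIT      NOT NULL DEFAULT 0
)

-- inside the stored procedures, after a large delete/update/insert:
INSERT INTO dbo.StatsUpdateQueue (TableName) VALUES (N'dbo.BigTable')

-- the job that runs every 10 minutes drains the queue:
DECLARE @t SYSNAME, @sql NVARCHAR(300)
SELECT TOP 1 @t = TableName FROM dbo.StatsUpdateQueue WHERE Processed = 0
WHILE @t IS NOT NULL
BEGIN
    SET @sql = N'UPDATE STATISTICS ' + @t + N';'
    EXEC sp_executesql @sql
    UPDATE dbo.StatsUpdateQueue SET Processed = 1 WHERE TableName = @t
    SET @t = NULL
    SELECT TOP 1 @t = TableName FROM dbo.StatsUpdateQueue WHERE Processed = 0
END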

Goal:
Now, with all that background and with 2005, I'd like to revisit this concept and remove this implementation if possible, or at least decouple statistics maintenance from the business jobs. In 2005, are there better ways to monitor and maintain statistics in a more administrative (but automated) way?

rmiao
Master Smack Fu Yak Hacker

7266 Posts

Posted - 2008-01-29 : 22:52:09
Not really; you still need to update statistics on a schedule.

tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2008-01-31 : 10:57:27
I've read up on the new option for 2005, AUTO_UPDATE_STATISTICS_ASYNC, but I'm not sure if turning it on will help my problem. It sounds as if this option kicks in when a query plan is determined (at the start of a procedure execution). But since my problem is that the process itself knocks the stats out of whack, that won't help me much.

I have no problem writing a job that monitors statistics and updates them on its own. But how can I determine if statistics are outdated?
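
One heuristic I'm looking at is the rowmodctr column in the legacy sysindexes view (just a sketch - the counter is only approximate in 2005):

-- rows modified since statistics were last updated (approximate in 2005)
SELECT OBJECT_NAME(i.id) AS table_name,
       i.name            AS index_name,
       i.rowmodctr       AS rows_modified
FROM sys.sysindexes i
WHERE i.rowmodctr > 0
  AND OBJECTPROPERTY(i.id, 'IsUserTable') = 1
ORDER BY i.rowmodctr DESC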

tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2008-01-31 : 11:25:43
Actually, I think I just found my new friend - sp_updatestats. I didn't realize it was enhanced in 2005. I just ran some preliminary tests, and this should do what I'm interested in.
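
The basic call is as simple as this (it also accepts an optional @resample = 'resample' argument to reuse each statistic's previous sampling rate):

EXEC sp_updatestats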

TRACEYSQL
Aged Yak Warrior

594 Posts

Posted - 2008-01-31 : 12:37:54
I was under the impression that turning on AUTO_UPDATE_STATISTICS_ASYNC would take care of updating statistics; if the server is busy at the time, it will update later. I have turned it on for all our databases.
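
For reference, the option is set per database like this (the database name is a placeholder; it only has an effect when AUTO_UPDATE_STATISTICS is also ON):

ALTER DATABASE [YourDb] SET AUTO_UPDATE_STATISTICS_ASYNC ON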

Then I am running this statement to determine whether it works accordingly:

SELECT
    t.name AS Table_Name,
    i.name AS Index_Name,
    i.type_desc AS Index_Type,
    STATS_DATE(i.object_id, i.index_id) AS Date_Updated
    --,sp.rows  -- uncomment to include row counts
FROM sys.indexes i
JOIN sys.tables t
    ON t.object_id = i.object_id
JOIN sys.partitions sp
    ON i.object_id = sp.object_id
    AND i.index_id = sp.index_id
WHERE i.type > 0    -- remove this filter to include heaps
    AND sp.rows > 0
ORDER BY
    t.name ASC,
    i.type_desc ASC,
    i.name ASC


tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2008-01-31 : 12:55:57
Tracey, thanks for the info. But when does the async process kick off? Does SQL Server run its own background process on some interval? Or is it when the query plan is generated for a stored procedure? If the latter is the case, this won't assist me. Our problem is that we have a large business process that can take some time to run. The process itself knocks the statistics out of whack due to the nature of the bulk inserts/updates/deletes it performs on tables, causing it to run slower than it could. We are talking anywhere from thousands to millions of records at a time (based on the size of our client).

Right now we have strategically placed calls in the stored proc to trigger a job (technically async in nature) that updates the statistics after each of these large bulk operations. With this in place, the procedure might run slower for a short bit, but once the statistics are updated the performance returns to acceptable standards.

Does this explain my situation better?

TRACEYSQL
Aged Yak Warrior

594 Posts

Posted - 2008-01-31 : 13:25:22
The book I have says that with this on, queries do not have to wait for the statistics to be updated before compiling.
So after compiling, does it update statistics before the query is executed?

So the question is when.
Good question.

I guess creating a new table with a few indexes on it - then running a query and looking at the execution plan - would show whether it uses the correct indexes?


TRACEYSQL
Aged Yak Warrior

594 Posts

Posted - 2008-01-31 : 13:32:24
Here's some reading from my book.

Updated Statistics
Statistics can be updated either manually or automatically. Manual updates happen when someone runs sp_updatestats or the UPDATE STATISTICS command. Automatic updates happen when SQL Server determines that existing statistics are out of date, or stale, and these updates only happen when the database has the option AUTO_UPDATE_STATISTICS set to ON. This could happen if another batch had tried to use one of the same tables or indexes used in the current plan, detected the statistics were stale, and initiated an UPDATE STATISTICS operation.

Stale Statistics
SQL Server will detect out-of-date statistics when it is first compiling a batch that has no plan in cache. It will also detect stale statistics for existing plans. You can see that SQL Server checks for stale statistics after first checking to see if there already are updated statistics available. If there are stale statistics, the statistics will be updated, and then a recompile will begin on the batch. If AUTO_UPDATE_STATISTICS_ASYNC is ON for the database, SQL Server will not wait for the update of statistics to complete, it will just recompile based on the stale statistics.

Statistics are considered to be stale if a sufficient number of modifications have occurred on the column supporting the statistics. Each table has a recompilation threshold, or RT, that determines how many changes can take place before any statistics on that table are marked as stale. The RT values for all of the tables referenced in a batch are stored with the query plans of that batch.

The RT values depend on the type of table, that is, whether it is permanent or temporary, and on the current number of rows in the table at the time a plan is compiled. The exact algorithms for determining the RT values are subject to change with each service pack, so I will show you the algorithm for the RTM release of SQL Server 2005. The formulas used in the various service packs will be similar to this, but are not guaranteed to be exactly the same. N indicates the cardinality of the table.

For both permanent and temporary tables, if N is less than or equal to 500, the RT value is 500. This means that for a relatively small table, you must make at least 500 changes to trigger recompilation. For larger tables, at least 500 changes must be made, plus 20 percent of the number of rows.
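
For example, under that RTM formula a permanent table with 10,000 rows would need 500 + (20% x 10,000) = 2,500 modifications before its statistics are marked stale.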

For temporary tables, the algorithm is the same, with one exception. If the table is very small or empty (N is less than 6 prior to any data modification operations), all we need are 6 changes to trigger a recompile. This means that a procedure that creates a temporary table, which is empty when created, and then inserts 6 or more rows into that table, will have to be recompiled as soon as the temp table is accessed.

You can get around this frequent recompilation of batches that create temporary tables by using the KEEP PLAN query hint. Use of this hint changes the recompilation thresholds for temporary tables and makes them identical to those for permanent tables. So if changes to temporary tables are causing many recompilations, and you suspect that the recompilations are affecting overall system performance, you can use this hint and see if there is a performance improvement. The hint can be specified as shown in this query:

SELECT <column list>
FROM dbo.PermTable A INNER JOIN #TempTable B ON A.col1 = B.col2
WHERE <filter conditions>
OPTION (KEEP PLAN)
For table variables, there is no RT value. This means that you will not get recompilations caused by changes in the number of rows in a table variable.

Now, in plain English: I believe it will update before it runs? What do you think?

TRACEYSQL
Aged Yak Warrior

594 Posts

Posted - 2008-01-31 : 13:38:08
Pic from book http://www.stretchunlimited.com/plan.jpg
Takes a minute to load...
The black box is titled 'Checking an existing plan to see if recompilation is necessary'.

tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2008-01-31 : 13:46:08
I understand all that, but now consider a long-running procedure. At the beginning the stats are regenerated, but with many large bulk updates/deletes/inserts the performance of the procedure itself diminishes, since the stats will not be regenerated while the procedure is running (assuming a linear process). So we need something outside of the process to monitor this for us. And actually, I came up with this and am testing it now:
-- sp_MSforeachdb is an undocumented system proc that runs a command once
-- per database, substituting ? with each database name
DECLARE @command1 NVARCHAR(2000)

SET @command1 =
    'PRINT REPLICATE(''-'', 100);' +
    'PRINT ''Starting statistic updates on database ? at '' + CONVERT(NVARCHAR(23), GETDATE(), 121);' +
    'PRINT REPLICATE(''-'', 100);' +
    -- run sp_updatestats in the context of each database
    'USE ?; EXEC sp_updatestats @resample = ''resample'';' +
    'PRINT REPLICATE(''-'', 100);' +
    'PRINT ''Completed statistic updates on database ? at '' + CONVERT(NVARCHAR(23), GETDATE(), 121);' +
    'PRINT REPLICATE(''-'', 100);'

EXEC sp_MSforeachdb @command1 = @command1

This isn't production quality yet but it's a good proof of concept.

TRACEYSQL
Aged Yak Warrior

594 Posts

Posted - 2008-01-31 : 14:53:17
Do you know if the stats are actually being updated first?

Perhaps capture the output of the STATS_DATE query from my earlier post into a table between runs, to tell you whether they are being updated.
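
A minimal sketch of that idea (the snapshot table name and columns are just placeholders):

-- one-time setup: a table to hold stats-date snapshots
CREATE TABLE dbo.StatsSnapshot
(
    CapturedAt   DATETIME NOT NULL DEFAULT GETDATE(),
    Table_Name   SYSNAME  NOT NULL,
    Index_Name   SYSNAME  NULL,
    Date_Updated DATETIME NULL
)

-- run before and after the bulk process, then compare Date_Updated
INSERT INTO dbo.StatsSnapshot (Table_Name, Index_Name, Date_Updated)
SELECT t.name, i.name, STATS_DATE(i.object_id, i.index_id)
FROM sys.indexes i
JOIN sys.tables t ON t.object_id = i.object_id
WHERE i.type > 0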

I'd like to know, as this is new to me too.

TRACEYSQL
Aged Yak Warrior

594 Posts

Posted - 2008-01-31 : 15:02:45
I just created a simple table with no data in it.

Added 100 rows to the table.

The stats are not updated until you run something like SELECT x FROM table WHERE x = b, which is where the query optimizer comes into play.

So when queries are run against the table, the stats will be updated with the new values.

Perhaps it's best to do a reindex after the updates; that will then update the stats for you.
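
For example (the table name is a placeholder; rebuilding an index refreshes its statistics with the equivalent of a full scan):

ALTER INDEX ALL ON dbo.BigTable REBUILD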

tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2008-01-31 : 15:51:23
I think the option of having a job call sp_updatestats is the best option here, especially considering the changes to sp_updatestats in 2005, where it will only update stats that are outdated instead of all of them.

The one thing you are neglecting to test is this concept under a large procedure that runs for an extended period, references the same tables repeatedly, and performs many intensive bulk operations. This causes the performance of the procedure to diminish. Even if code in the same procedure references one of these tables with outdated statistics, they won't be updated, because the query plan is generated when the procedure is first executed. Something else in the background will need to deal with this - hence a SQL job that runs at some predefined interval and does it for you.

TRACEYSQL
Aged Yak Warrior

594 Posts

Posted - 2008-01-31 : 16:53:15
Yes, then it's best to run sp_updatestats. Do you need to pass it parameters to say only update the ones that are out of date, or is it smart enough to do that on its own?

Turning the async option on in your databases will help too.


pootle_flump

1064 Posts

Posted - 2008-02-01 : 06:51:43
I'm not sure if you guys got to the bottom of the async auto update, but in case not:

If 20% of the data has changed since the last stats update and auto update (async or not) is on, the object is flagged for recompilation. The next time a query accesses the object, the statistics update runs. If the async option is on, the query that triggered the update runs as before, using the old stats. If not, it waits until the new stats are created, creates a new plan, and then executes. Obviously, 20% is a high threshold, and performance can degrade well before that point.

HTH

tfountain
Constraint Violating Yak Guru

491 Posts

Posted - 2008-02-01 : 10:57:06
Tracey - sp_updatestats now only updates statistics that are out of date, as of the 2005 version. It's actually pretty nice - I have a job deployed to our DEV, QA and UA servers to gauge its impact. It also reports every object and identifies whether an update was required or not.

pootle - I agree that the 20% threshold is high, and I have witnessed severe degradation before that point. Also, the new ASYNC option still doesn't benefit me in my situation - the currently executing procedure is "self-degrading". This job that runs every so often is perfect for that purpose. On our slow servers, when there is nothing to do except cycle through and check for what to update, it takes about 2 minutes to run (across all our user databases and user tables). And our production servers make these servers look like 486-class machines, so I think I'll be good even if I have this job running every 20 minutes during peak periods.
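
For reference, deploying that job is just the standard msdb calls; a rough sketch (the job name and 20-minute schedule are illustrative):

-- create the job
EXEC msdb.dbo.sp_add_job @job_name = N'Update Stale Statistics'

-- a single T-SQL step that runs sp_updatestats in every database
EXEC msdb.dbo.sp_add_jobstep
    @job_name = N'Update Stale Statistics',
    @step_name = N'Run sp_updatestats',
    @subsystem = N'TSQL',
    @command = N'EXEC sp_MSforeachdb ''USE ?; EXEC sp_updatestats @resample = ''''resample'''';''',
    @database_name = N'master'

-- every 20 minutes, all day
EXEC msdb.dbo.sp_add_jobschedule
    @job_name = N'Update Stale Statistics',
    @name = N'Every 20 minutes',
    @freq_type = 4,            -- daily
    @freq_interval = 1,
    @freq_subday_type = 4,     -- units of minutes
    @freq_subday_interval = 20

-- attach the job to the local server
EXEC msdb.dbo.sp_add_jobserver
    @job_name = N'Update Stale Statistics',
    @server_name = N'(local)'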