Simon Liew

Introducing SQL Server Incremental Statistics for Partitioned Tables

February 3, 2017 by

If you are maintaining a very large database, you might be well aware of the pain to perform update statistics on a very large table.

This article introduces incremental statistics which is available from SQL Server 2014 highly simplifies statistics management on very large partitioned tables.

SQL Server Incremental Statistics

Accurate statistics are essential to allow query optimizer to generate a good enough query plan. In a very large partitioned table, updating table statistics requires to sample rows across all partitions and the statistics reflects the data distribution of the table as whole. Update statistics takes a lot of I/O and CPU resources not to mention the duration can be very lengthy.

Imagine the data distribution remain the same for all previous partitions, and you only need SQL Server to know the changed data distribution for a newly created\loaded partition. This sounds like a common scenario and now you can manage this scenario efficiently using Incremental Statistics which is built-in on SQL Server 2014 and onwards.

Prior to SQL Server 2014, the similar workaround to maintain partition specific statistics is to create filtered statistics for each partition manually and update the specific partition statistics.

This article will utilize WideWorldImporters database on SQL Server 2016 Developer Edition Service Pack 1 to understand the utilization of Incremental Statistics.

Brief on partitioned tables

WideWorldImporters is a great sample database as it comes with 2 partitioned tables – Purchasing.SupplierTransactions and Sales.CustomerTransactions.

For simplicity, we will just focus on table Purchasing.SupplierTransactions in this article.

Incremental statistics will only work on statistics which the index definition uses the same partition scheme as the partitioning column on the table to be able to set STATISTICS_INCREMENTAL = ON.


SQL Server Partition scheme

If you try to update partition level statistics on an index statistics which has not been set to use incremental statistics, it will prompt an error.

Msg 9111, Level 16, State 1, Line 34
UPDATE STATISTICS ON PARTITIONS syntax is not supported for non-incremental statistics.

Note that argument RESAMPLE is required (argument FULLSCAN or SAMPLE number PERCENT is not supported) when updating partition level statistics. RESAMPLE reads the leaf-level statistics using the same sample rates and merge this result back into the main statistics histogram.

A different partition sampling rate cannot be merged together and the syntax constraints made sure this does not occur as well.

Enabling Incremental Statistics

There is a database level setting to enable incremental statistics. When the option INCREMENTAL is turn on at the database level, newly auto created column statistics will use incremental statistics on partitioned tables by default.


Existing index or column statistics will not be affected by this database option. You will have to manually set the existing statistics to be an incremental statistics on the partitioned table. The command is quite straight-forward as below.


Once incremental statistics is enabled for an index statistics, the is_incremental value will be set to 1 on DMV sys.stats.


Clustered index

Now that incremental statistics is enabled on CX_Sales_CustomerTransactions, we can update the index statistics at the partition level.

From SQL Server 2014 SP2 and SQL Server 2016 SP1, you can leverage a documented DMF sys.dm_db_incremental_stats_properties to view properties of the incremental statistics


Incremental statistics

There are 5 partitions and each partition indicates it has its own statistics with a maximum of 200 steps for each partition that contains data. We have only updated the statistics for partition 3 and this is reflected by newer date and time stamp in the last_updated column.

Incremental statistics are not used by CE

It is great to know each partition can contain up to 200 steps to form a histogram. However, SQL Server do not use this partition level statistics in Cardinality Estimate (CE). The main statistics which get updates from partition level statistics is the statistics that SQL Server will use. CE refers to an estimated prediction of the number of rows in query result and primarily derived from histograms that are created when indexes or statistics are created.

To prove this statement, we will use DBCC SHOW_STATISTICS to get the statistics histogram of the main statistics and the incremental statistics and test the CE with a simple query.

Main statistics

At the time of this article, the only way to get detailed content of statistics histogram is to use DBCC SHOW_STATISTICS. Index statistics CX_Sales_CustomerTransactions has 200 steps and the screen shot is cut short to show the beginning and the end of the statistics histogram.


Output of DBCC Show_STATISTICS

Executing a simple query filtering on a TransactionDate = 2016-05-18 which has an equal EQ_ROWS in the statistics histogram returns with an accurate 101 rows in the Actual Number of Rows and also matches the Estimated Number of Rows in the query plan.


Execution plan

Partition Level incremental Statistics

We will use an undocumented trace flag 2309 to view the incremental statistics histogram. This trace flag allows an additional node_id parameter to be specified as an input into DBCC SHOW_STATISTICS command.

The node_id for a particular partition can be obtained using an undocumented DMF [sys].[dm_db_stats_properties_internal].


Partition Level incremental Statistics

We will pick partition 4 which has 152 steps to display the incremental statistics histogram as an example.


Partition Level incremental Statistics

Re-executing the SELECT query filtering on TransactionDate = 2016-05-27 indicates the Estimated Number of Rows is 87.6 whereas the actual number of rows read is 134 (134 is accurately reflected in the incremental statistics EQ_ROWS but is not used by SQL Server CE).

If you refer to the main statistics histogram, 87.6 is the AVG_RANGE_ROWS value for TransactionDate = 2016-05-31. So, SQL Server uses the main statistics histogram to get the CE and not the incremental statistics histogram.


Actual Execution Plan

Incremental Statistics in Action

We will insert 10 rows each into partition 1 and partition 5. The INSERT will not kick off automatic update statistics since the number of rows inserted are very small relative to the total number of rows in the table.


The index statistics CX_Sales_CustomerTransactions is not updated and hence the query plan below will not reflect the additional 10 rows inserted for TransactionDate = 2017-01-20.


Execution plan and incremental statistics

We now update the statistics for only partition 5 and check the main statistics


Output of incremental statistics

The main statistics now has reflected statistics on partition 5 only, and the statistics histogram between partition 1 and partition 4 remains the same.

Re-executing the same query on TransactionDate = 2017-01-20 would now reflect a more accurate estimation of rows returned.


accurate estimation for rows returned

Since index statistics CX_Sales_CustomerTransactions is updated using FULLSCAN, updating partition level statistics with RESAMPLE will also use FULLSCAN.

Manually updating partition 1 and partition 5 statistics took 39 ms.

SQL Server Execution Times:
      CPU time = 31 ms, elapsed time = 39 ms.

The conventional way without incremental statistics to update statistics using FULLSCAN on index statistics CX_Sales_CustomerTransactions took 82 ms. On this very small scale of testing, this update statistics is twice slower than just updating incremental statistics of 2 partitions.

It is easy to imagine the benefit if the rows in the table is of magnitude in scale.

SQL Server Execution Times:
      CPU time = 78 ms, elapsed time = 82 ms.

Summary

Incremental Statistics are only relevant for partitioned tables, and this feature is a clever way to allow more efficient statistics management for very large partitioned tables.

Whilst the partition level statistics are not used by SQL Server CE, allowing finer grain control to only update subset of the main statistics based on partitions which has changed data only helps tremendously with the performance of statistics maintenance.

Simon Liew
Partitioning, Query analysis

About Simon Liew

Simon Liew an independent SQL Server Consultant with deep understanding of Microsoft SQL Server technology with focus on delivering business solutions. He loves exploring data and passionate about sharing his knowledge. Simon has over 15+ years of database design, implementation, administration and development in SQL Server. He is a Microsoft Certified Master for SQL Server 2008 and holds a Master’s Degree in Distributed Computing. Achieving Microsoft masters-level certifications validate the deepest level of product expertise, as well as the ability to design and build the most innovative solutions for complex on-premises, off-premises, and hybrid enterprise environments using Microsoft technologies. View all posts by Simon Liew

168 Views