Rajendra Gupta
SQL Server Statistics properties

Persist sampling rate for automatic SQL Server statistics updates

August 27, 2021 by

This article explores Persist sampling rate feature for automatic SQL Server statistics update.

Introduction

DBA should always focus on the optimal database performance as it is directly related to the application’s performance. Therefore, the database administrators proactively configure, monitor, and maintain database configurations. Indexes help DBA to optimize queries based on the user data requirements.

SQL Server Statistics are helpful for cost optimization of various query execution plans. It is a collection of the distinct value that SQL Server collects using the sampling of table data. You can face high resource utilization (CPU, Memory, IO), inefficient query plan, inadequate operators such as Index scan, deadlocks, blocking. SQL Server uses statistics to choose the required index, data traverse mechanism and builds an optimal, cost-effective query execution plan.

SQL Server has multiple options for effective maintenance of statistics. This article explores statistics options with persisting sampling rates for Azure Database.

If you are new to SQL Server statistics, I would recommend you to go through the following articles.

  • Note: In this article, I use SQL Server 2017 CU23 (Version 14.0.3381.3) and Azure SQL Database for demonstration

    SQL Server Version

Automatic index maintenance SQL Server statistics

Each SQL Server database has the following properties for automatic statistics maintenance. It is applicable for all SQL Server versions, including Azure SQL Databases.

Connect to a SQL instance, select any database, and view its properties. The following figure shows SQL Server Statistics options available in Azure SQL Databases.

  • AUTO_CREATE_STATISTICS
  • AUTO_UPDATE_STATISTICS
  • AUTO_UPDATE_STATISTICS_ASYNC

SQL Server Statistics properties

Let’s understand the Auto-create and Auto-update statistics in detail with practical demonstration.

AUTO_CREATE_STATISTICS

The AUTO_CREATE_STATISTICS creates statistics on the individual (single) columns in the query predicates. These statistics are created on the columns that do not have a histogram. SQL Server uses a prefix _WA for each auto-created statistic for the query predicate column.

You can run the following query to retrieve a list of auto-created SQL Server Statistics.

AUTO_CREATE_STATISTICS

AUTO_UPDATE_STATISTICS

The SQL Server Statistics get out of date due to frequent DML operations. The query optimizer updates the statistics automatically based on the predefined threshold. This threshold is based on a table cardinality (number of data rows).

The following table highlights the recompilation threshold starting from SQL Server 2016.

Table Type

Table Cardinality(n)

Number of modifications ( Recompilation threshold)

Temporary

n<6

6

Temporary

6<=n<=500

500

Permanent

n<=500

500

Temporary or Permanent

n>500

MIN ( 500 + (0.20 * n), SQRT(1,000 * n) )

Example:

  • Table record count: 5 million
  • Recompilation threshold =
    • Value1: Min(500 + (0.20 * 5,00,000))= 100,500
    • Value2: SQRT( 1000 * 5,00,000) = 70710
    • Min( Value1, Value2) = 70710

As per the calculations, SQL Server updates statistics every 70,710 modifications. Automatic SQL Server Statistics update might not work efficiently for your workload. Therefore, usually, DBA add SQL Server statistics update as part of their regular database maintenance activities using database maintenance plans or T-SQL scripts. To update the statistics manually, you can run the UPDATE STATISTICS command. You can refer to Microsoft documentation for more details on it.

For the older versions SQL 2014 or lower, the calculation remains the same except for cardinality >500.

Formula: 500 + (0.20 * n)

As per this formula, SQL Server updates statistics for every 100,500 modifications.

Demonstration of SQL Server statistics updates

Let’s perform a quick demo for the statistics update. Execute the following script to create a sample table and inserts records into it.

As shown below, it inserted 421201 rows in the [demotable].

AUTO_UPDATE_STATISTICS

Now, run the following script to check the statistics details:

  • Total number of rows
  • last Statistics update timestamp
  • sampling percent
  • Number of modification since last stats

The following figure shows the NULL in most of the columns. The reason behind this is that no stats update was performed. We did not update stats manually or modified the SQL Server rows to update stats based on the threshold.

NULL values

Let’s run a select statement and view the statistics counter values.

SQL Server automatically updated the stats for the index. It uses 49% sample percent and 20378 sampling rows.

view the statistics counter values

Update Statistics with Full Scan

The query optimizer often does not create an optimized plan due to the low sampling percent, especially for a huge table. In these cases, you can update statistics manually using the UPDATE STATISTICS WITH FULL SCAN query. I have often seen query performance issues resolves if you have updated indexes statistics with the full scan.

With the full scan, the number of sampling rows equals the number of rows in the table. It is an optimized way for the query optimizer to use the latest statistics for a better decision. You can see value as 100 in the column – [SamplePercent].

Update Statistics with Full Scan

The modification counter value is zero because we did not perform any value update for the [demotable]. Now, run the following query to perform an update to 10,000 rows.

The following figure shows the value of modification_counter value to 10,000. It equals to the number of rows updated using the UPDATE statement.

Check Modification Counter value

We perform a manual statistics update with the default settings. SQL Server uses the 49% sampling percent for the statistics.

Sampling rate

Explore Update Statistics full Scan and Persist_Sample_Pecent

Like the previous case, if we update the records above the threshold calculation, SQL Server performs auto statistics update. Suppose we want SQL Server to update statistics with full scan once it reaches the threshold modification counter value. For this purpose, we need to focus on the last column, Persisted sample percentage.

We can force any statistics using the persisted sample percentage. The following query updates statistics [IX_DemoTable_Name] with full scan and enables persist_sample_percent counter.

After running the above query with PERSIST_SAMPLE_PERCENT, the last column below the figure’s value shows a 100% sampling rate.

After running the above keyword, you will notice that the very last column of our very first query will now show you sample percent, which we have instructed to preserve (persist) for our statistics.

Explore Update Statistics Full Scan

It ensures that SQL Server always uses the full scan for specified statistics and overrides the default behavior. It is applicable both for manual and automatics statistics updates.

Note: You can avoid query performance issues due to statistics with low sampling. The updating statistics with a full scan might take longer for a huge table. Therefore, you should test it on the non-prod environment and enable it only for required indexes.

  • Note: The persist sampling feature is available from SQL Server 2017 CU1 and SQL Server 2016 SP1 CU4. The database must have a compatibility level of 130 or higher. You can refer to the Microsoft knowledge base article – KB4039284 for more details on it

To turn off the persist sampling for stats, you can run the update statistics to disable it.

  • Note: This persisted sampling feature is not available at the database level. You can use it with the UPDATE STATISTICS command as described in this article

Conclusion

This article explored the enhancement feature – Persisted sampling for updating statistics with a fixed sampling and overriding the SQL Server default sampling mechanisms. You can use this feature for a table with frequent and high modification counter values. It can help resolve performance issues that you face with low sampling percent.

Rajendra Gupta
Monitoring, Performance

About Rajendra Gupta

As an MCSA certified and Microsoft Certified Trainer in Gurgaon, India, with 13 years of experience, Rajendra works for a variety of large companies focusing on performance optimization, monitoring, high availability, and disaster recovery strategies and implementation. He is the author of hundreds of authoritative articles on SQL Server, Azure, MySQL, Linux, Power BI, Performance tuning, AWS/Amazon RDS, Git, and related technologies that have been viewed by over 10m readers to date. He is the creator of one of the biggest free online collections of articles on a single topic, with his 50-part series on SQL Server Always On Availability Groups. Based on his contribution to the SQL Server community, he has been recognized with various awards including the prestigious “Best author of the year" continuously in 2020 and 2021 at SQLShack. Raj is always interested in new challenges so if you need consulting help on any subject covered in his writings, he can be reached at rajendra.gupta16@gmail.com View all posts by Rajendra Gupta

260 Views