Esat Erkec
Unique indexes affect on a query plan

SQL Server statistics from a different perspective

June 10, 2022 by

The goal of this article is to throw light on the less-known points about SQL Server statistics.

Introduction

SQL Server statistics are one of the crucial database objects because they are one of the extremely important inputs of the query optimizer. SQL Server statistics stores the distribution of the column values and this statistical data is used by the optimizer to estimate how many rows will return from a query. Based on this estimation optimizer decides query plan operators and performs other required calculations. In short, statistics accuracy plays a critical role in generating an effective query plan.

SQL Server offers three main database-level options to maintain statistics automatically.

Auto Create Statistics

This option allows the generation of new statistics for single columns by the optimizer that does not already have columns values distribution (histogram) data. The reason for this operation is, the optimizer wants to improve the accuracy of the estimated number of rows for the query plan.

Auto Update Statistics

This option allows the optimizer to trigger the update statistics operation when the data modifications reach a threshold, the optimizer decides to update the statistics operation. The query optimizer triggers to update statistics operation when the threshold value meets the following conditions:

  • The table row number has gone from 0 rows to more than 0 rows
  • The table had 500 rows or less when the statistics were last sampled and has since had more than 500 modifications
  • The table had more than 500 rows when the statistics were last updated and the number of row modifications is more than MIN (500 + (0.20 * n), SQRT(1,000 * n) ) formula result after the statistics were last sampled

Auto Update Statistics Asynchronously

When we enable this feature, the optimizer does not trigger to update the statistics during the execution of the query and it uses stale statics. However, the statistics will be updated subsequently by SQL Server.

Unique Indexes and SQL Server statistics

We can use the unique indexes to ensure the singularity for column data. The optimizer does not leverage SQL Server statistical data when data is searched on the unique indexed column through the equals (“=”) operator. Now let’s discover this scenario with a very simple example. At first, we will create a sample table and define a unique index for the col1 column.

In this step, we will insert only a single row into the table and execute a select query in order to create the statistics.

After executing the select statement the statistics will be updated. This situation can be seen using the dm_db_stats_properties view. This view returns three important details about the SQL Server statistics;

  • last_updated specifies when the statistic is last updated.
  • modification_counter shows the total number of the modifications made since the last statistical update. This number is not affected by the rolled-back changes.
  • rows_samples show the total number of rows sampled for statistics calculations.

Details of the  dm_db_stats_properties view

Now, we will insert some data into the sample table.

Explanation of the modification_counter

After this operation, we can expect that the statistics will be updated after any read request because the modification counter exceeds the threshold value. However, the following select query does not trigger the statistic update process.

Tip: 1=(SELECT 1) expression helps to get rid of the trivial query plans because they prevent updating statistics.

Unique indexes affect on a query plan

SQL Server Statistic details As we mentioned earlier, an equality condition search in a unique indexed column can return at most one row. The optimizer does not need to use statistics for this type of predicates because the estimation result is constant.

Estimated execution plan and SQL Server statistics

Execution plans are used to analyze and diagnose the query performance because the execution plans describe the query compilation and execution steps with all details. SQL Server offers two types of the execution plan:

  • Estimated Execution Plan is generated without executing the query.
  • Actual Execution Plan is generated after executing the query.

For the specific situations, we can need to analyze an estimated query plan but it may trigger the creation of the statistics behind the scenes during the generation of the estimated plans. Now, we will clarify this scenario with an example. At first, we will create an extended event to monitor which activities occur when an estimated execution plan is created. This event session will involve the following events:

  • object_created occurs when a new object is created using CREATE statement
  • query_post_compilation_showplan occurs when a SQL statement is compiled
  • auto_stats occur when the statistics are being loaded for use by the optimizer.

With the help of the following query, we will create this extended event session. We need to underline one point about this event session, we apply a filter to the session-id field so that we can exclude capturing other irrelevant activities.

After creating the event session, we create a new sample table and populate it some data.

Now we will start the event session and then we launch the Watch Live Data screen to view the captured events data in real-time. As the last step, we click the Display Executed Execution Plan for the following query.

Displaying estimated execution plan

The extended event has captured some events related to this operation. Now let’s interpret what these events indicate.

The extended event captured data

The first event indicates the initial compiled query plan and this query plan does not use any statistics data. When we click the Query Plan, we can see this query plan.

Capturing an estimated execution plan

The object_created events show the created statistics details.

Capturing a created statistic with an event session

The auto_stats events indicate the statistics are loaded by the optimizer to generate an efficient query plan.

Monitoring statistics details with an event session

In the final process, the optimizer generates a new query plan using the newly created SQL Server statistics. In this plan, the estimated number of rows is more accurate than the initial query plan because of the freshly created statistics.

Display estimated execution plan

SQL Server statistics and ROLLBACK

The ROLLBACK statement is used to revert changes made in a transaction so that all changed data is reverted to what it was before the transaction state. However, change_counter is not affected by these rollbacks and this rollback can be caused to redundant query recompiles. Let’s demonstrate this case with a very simple example.

Step 2: We create a stored procedure and execute it. After executing the stored procedure SQL Server statistics will be updated.

Monitor a statistics details

Step 3: In this step, we will insert some rows into the table, and then we undo these changes through the ROLLBACK command. However, this situation increments the modification_counter value of the statistics.

SQL Server statistics

Step 4: We will create an extended event session that helps to monitor the activities when we execute the sample stored procedure again.

Step 5: Start the extended event session

Step 6: Open the Watch Live Data screen of the event session.

Step 7: In this last step, we execute our sample procedure with the same parameters, and we discover what is going on behind the scene during the execution of the procedure.

Intereprenting extended event captured data

We can separate captured events into 3 sections and then interpret them. In the first part, the optimizer finds a cached plan of the procedure and decides to use it but then noticed the statistics are out-of-date. At this point, the optimizer begins to recompile the stored procedure because of the statistics changing. The second section specifies which statistics are updated. In the last step, the procedure is started to execute and is completed successfully.

Summary

SQL Server statistics are crucial objects for the optimizer to generate more effective query plans. In this article, we deep dove into some less-known points of statistics and explored which activities are occurred behind the scene.

Esat Erkec
Latest posts by Esat Erkec (see all)
168 Views