Esat Erkec
A basic query plan of a query

Unrevealed tips of SQL Server Statistics

June 13, 2022 by

In this article, we will explore some internal working principles of SQL Server statistics.

What is Cardinality Estimator (CE)?

The Cardinality Estimator (CE) is the core subunit of the SQL Server query optimizer and its responsibility is to predict how many rows will be returned from a query. The accuracy of this prediction directly affects the generated query plan efficiency. Hence, this effect upon the query plan makes CE decisive and critical on the query performance.

SQL Server statistics and Cardinality Estimator

SQL Server statistics stores the columns data distributions and they use histograms to organize the distribution of the data. The cardinality estimator uses these histograms to obtain the required analytical data and then runs a mathematical algorithm to estimate how many rows could be returned from a query. In order to figure out how the cardinality estimator performs these predictions more clearly, we will make some examples. However, firstly we require a sample table thus we will create a sample table through the following query. This table is a heap copy of the SalesOrderHeader table of the Adventureworks database.

What is the meaning of predicate?

A predicate defines a conditional expression that is applied to rows of the table to filter the result set of the query.

Single Predicate Example:

In this first example, we want to only return particular rows that the Freight column equals a specified value. Before executing the query, we will enable the actual execution plan.

A basic query plan of a query

As seen in the execution plan of the query, the Estimated Number of Rows for all Execution has been estimated at 1551 rows by the CE, and this estimation is calculated using a particular statistic. The used statistics details can be seen under the OptimizerStatsUsage attribute of the execution plan.

SQL Server query plan OptimizaerStatsUsage attribute

Through the DBCC SHOW_STATISTICS command, we can obtain detailed information about the statistics density and its histogram.

Histogram of a SQL Server statistics

The histogram data EQ_ROWS column shows the number of rows that have the same value as the predicate. The 178th row of the histogram matches with the predicate row and the estimated rows calculation is equal to 1551.

Double Predicate Example:

In this second example, we will apply two predicates in the query. In this case, the cardinality estimator uses a complicated formulation to calculate the estimated number of rows. At first, we will execute the following query and check out the estimated number of rows.

AND CurrencyRateId IS NULL

What is the meaning of Estimated Number of Rows for all Execution in SQL Server As we can see, the optimizer calculated the estimated number of rows is 1154,49, and this estimation is made using the following formula:

Estimated Number of Rows = (P0*P1^(1/2)*P2^(1/4)*P3^(1/8))*Card

P() expression indicates the selectivity of the predicates and Card expression shows the table total number of the rows. The selectivity is calculated by dividing the number of table rows by the estimated number of rows. As a first step, we will calculate the selectivity using the histograms.

Usage details of the DBCC SHOW_STATISTICS command

As we can see, the estimated rows are 1551, and the total number of the table is 315645. We apply the following formula to calculate the first predicate selectivity.

P() = Estimated Rows / Total Row Number of Table

P0 = 1551 / 31565

P0 = 0,049137

Cardinality Estimator working principle

We use the same formula to calculate selectivity for the second predicate.

P1= 17489 / 31565

P1 = 0,55406

Combined selectivity: P0*P1 =0,036575

The estimated number of rows = 31565* 1154,493

The estimated number of rows = 1154,493

As we can see, we achieved the estimated number of rows using SQL Server statistics. However, we need to take into account one point in this formula, before starting the calculation we need to sort the selectivity of predicates into ascending order and then apply the formula.

P0<P1<P2<P3

Multiple Predicates Example:

In this last example, we will add three predicates in our sample query and we will use the query_optimizer_estimate_cardinality event. This event can be used to diagnose cardinality estimate issues. Through the following query, we will create an event session and start it.

As the second step, we will execute the following query.

The event session captured our query and returned detailed information about the internals of the CE. When we click the CalculatorList field we can find out the selectivity of each predicate.

query_optimizer_estimate_cardinality event in SQL Server extended event

Calculator list

At first, we will sort the predicate selectivity and then apply the formula:

P0=0,049

P1=0,197

P2=0,554

The estimated number of rows = 0,049 * SQRT(0,197) * SQRT(SQRT(0,554)) * 31565

The estimated number of rows = 592,26

As can be seen, the result we found is almost the same as the estimated number of rows returned from the query.

SQL Server statistics and Legacy Cardinality Estimator

In software, development Legacy code terms describe the inherited codes from an older version of the software.

Microsoft has made radical changings the cardinality estimator algorithm in SQL Server 2014 version but the legacy cardinality estimator is still enabled in SQL Server.

We can use the FORCE_LEGACY_CARDINALITY_ESTIMATION hint in the queries to enable this legacy cardinality estimatimator. When we want to enable legacy cardinality at the database level we can apply the following changes in our database :

As an alternative method, we can enable legacy cardinality in the Database Scoped Configurations.

LEGACY CARDINALITY ESTIMATION setting in SSMS

However, if we apply this change to any database the cached query plan will be removed, and if we don’t have serious reason to use legacy estimator using FORCE_LEGACY_CARDINALITY_ESTIMATION hint will be more reasonable at the query level. Using the legacy cardinality estimator will change the estimated number of rows because of the legacy estimated row calculation algorithm. We will execute the following query using the FORCE_LEGACY_CARDINALITY_ESTIMATION query hint.

How to use FORCE_LEGACY_CARDINALITY_ESTIMATION hint in SQL Server

As we can see, the estimated number of rows is changed when we use the legacy cardinality estimator. In order to calculate the estimated rows when a query uses a legacy cardinality estimator the formula is:

Estimated Number of Rows = P0*P1*P2*…*Pn*Card

Estimated Number of Rows = 0,049137 * 0,55406 * 31565

Estimated Number of Rows = 859,3523

The result we found exact matches with the estimated number of rows returned from the query.

Updating SQL Server statistics in Parallel

Starting from SQL Server 2016 SP2 we can use the MAXDOP hint when we create or update statistics manually. However, how does SQL Server behave when automatically creating or updating statistics operations? The best option to find out this question is monitoring the activities that are performed by the SQL Server behind the scene when it creates SQL Server statistics automatically. In order to demonstrate this scenario, we need to drop all automatically created statistics in the sample table.

We will create an event session to capture which activities are performed by the SQL Server.

In this last step, we execute our sample query. During the execution of this query SQL Server will create three individual statistics.

After execution of the query let’s interpret the extended event data.

Extended event output data

The object_created event shows us three different SQL Server statistics have been created and these statistics are created for every predicate in the query. The auto_stats event indicates these statistics are used by the optimizer after they have been created.

The StatMan is an internal aggregation function and used for statistics operations. We can see the usage of this function in the captured event data.

Details of the StatMan

We can see the MAXDOP hint at the end of the query where the Statman function is used, and this shows us that the SQL Server statistics generation process is done in parallel. Shortly, we can say that auto-create and update SQL Server statistics operations can be performed in parallel.

Summary

SQL Server statistics are extremely critical objects for the query optimizer to generate more effective query plans. And also, the cardinality estimator is the core functionality of the optimizer so understanding the cardinality estimator and statistics synergy can be very helpful to interpret query plans more clearly.

References:

SQL Server CE: Multiple single-column statistics
Statistics – Single Threaded & Parallel Operations

Esat Erkec
Latest posts by Esat Erkec (see all)
168 Views