SQL TOP statement performance tips

November 9, 2021 by Esat Erkec

In this article, we will discuss the performance details of the SQL TOP statement, and we will also work on a performance case study.

Prerequisites

In this article, we will use the AdventureWorks2019 sample database. We will also run the Create Enlarged AdventureWorks Tables script to obtain enlarged versions of some of its tables.

What is the SQL TOP statement

The SQL TOP statement limits the number of rows that are fetched or updated from one or more tables. The limit can be given as a fixed number of rows or as a percentage of the result set. For example, the following query returns 10 rows of the Production.Product table. Because the query has no ORDER BY clause, which 10 rows are returned is not guaranteed.
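The original query is not reproduced in the text, so the following is only a minimal sketch. It assumes the table in question is Production.Product in AdventureWorks2019.

```sql
-- Returns 10 rows from Production.Product.
-- Without ORDER BY, SQL Server may return any 10 rows of the table.
SELECT TOP (10) *
FROM Production.Product;

-- Adding ORDER BY makes the choice of rows deterministic:
-- here, the 10 rows with the lowest ProductID values.
SELECT TOP (10) *
FROM Production.Product
ORDER BY ProductID;
```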

Basic usage of the SQL Server TOP statement

We can also use the PERCENT keyword with the SQL TOP statement to return a percentage of the total result set. The following query returns 12% of the rows of the Production.Product table.
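A sketch of what the PERCENT form might look like, again assuming the Production.Product table (the original query is not shown in the text):

```sql
-- Returns 12 percent of the rows in Production.Product.
-- Fractional row counts are rounded up to the next whole row.
SELECT TOP (12) PERCENT *
FROM Production.Product;
```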

Basic usage of the SQL Server TOP statement PERCENTAGE syntax

The query returned 61 rows because the table contains 504 rows and 12% of that number is 60.48 (504 * 12 / 100), which SQL Server rounds up to the next whole number, 61.
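The arithmetic can be checked directly in T-SQL. This is an illustrative sketch, assuming the table is Production.Product. The column aliases are made up for the example.

```sql
-- Compare the raw 12% figure with the rounded-up row count.
SELECT COUNT(*)                         AS TotalRows,
       COUNT(*) * 12 / 100.0            AS TwelvePercent,
       CEILING(COUNT(*) * 12 / 100.0)   AS RowsReturnedByTop
FROM Production.Product;
```

If the table holds 504 rows, TwelvePercent evaluates to 60.48 and RowsReturnedByTop to 61.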

Impact of the SQL TOP statement on the query plan

In daily work, developers often use the SQL TOP statement to limit the number of rows that their queries return. However, the SQL TOP statement may also influence the execution plan that the query optimizer generates. When a query contains the TOP clause, the optimizer may choose a different plan than it would for the same query without it.

The following query joins the SalesOrderHeaderEnlarged and SalesOrderDetailEnlarged tables. When we execute it, the optimizer decides to use the Adaptive Join operator. This operator defers the choice between a nested loop join and a hash join until the query is executing. The join type is chosen based on a row-count threshold, and the Actual Join Type property of the operator shows which join type was used in the execution of the query.
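The query itself is not reproduced in the text. A hypothetical stand-in that joins the two enlarged tables could look like the sketch below. The column list is an assumption, and whether an Adaptive Join appears depends on the SQL Server version, compatibility level, indexes, and statistics.

```sql
-- Hypothetical join of the two enlarged tables (columns chosen for illustration).
SELECT soh.SalesOrderID,
       soh.OrderDate,
       sod.ProductID,
       sod.OrderQty
FROM Sales.SalesOrderHeaderEnlarged AS soh
INNER JOIN Sales.SalesOrderDetailEnlarged AS sod
    ON soh.SalesOrderID = sod.SalesOrderID;
```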

Execution plan of a query

As seen above, the hash match join was used for this query. The optimizer prefers this join type when large, unsorted inputs have to be joined. In a hash match join, SQL Server builds a hash table in memory from one input and then probes it with the rows of the other input to find the matches. Now, we will add the TOP 10 expression to our sample query and re-analyze its query plan.
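As a sketch, adding TOP to a hypothetical join of the two enlarged tables might look as follows. The column list is an assumption, since the original query is not shown.

```sql
-- Same hypothetical join, now limited to 10 rows.
-- The TOP clause gives the optimizer a row goal of 10.
SELECT TOP (10)
       soh.SalesOrderID,
       soh.OrderDate,
       sod.ProductID,
       sod.OrderQty
FROM Sales.SalesOrderHeaderEnlarged AS soh
INNER JOIN Sales.SalesOrderDetailEnlarged AS sod
    ON soh.SalesOrderID = sod.SalesOrderID;
```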

TOP statements can influence the query plans

As we can see in the execution plan of the query, the optimizer now uses the nested loop join instead of the hash join. The nested loop join is based on a very simple loop algorithm: for each row from the outer table, the inner table is searched for rows that satisfy the join criteria. This join type performs well when the number of rows is small. The plan changes because the TOP clause tells the optimizer that the query will return only a small number of rows, so the optimizer looks for a plan that fetches those few rows as quickly as possible. In this case, the nested loop join is the cheapest and fastest way to fetch the rows, and it also requires fewer resources. One point needs emphasis here: the optimizer relies on a feature called the row goal to make this plan change in response to the TOP clause.

SQL TOP statement and Row Goal

The SQL Server query optimizer is a cost-based optimizer: it generates various execution plan alternatives for a query and then chooses the one with the lowest estimated cost. Some query keywords and hints limit the number of rows that a query returns, and in that case the optimizer looks for a plan that is better suited to returning only that smaller number of rows. As we stated, the optimizer applies the row goal to the whole execution plan or to part of it in order to fetch the rows more quickly. Now we will execute the following query and analyze its execution plan in detail to understand this feature more clearly.

In this query, the estimated number of rows is 10 because the SQL TOP statement tells the optimizer exactly how many rows are requested.

ROW GOAL in the execution plan

However, this clue alone is not enough to conclude that the optimizer has applied the row goal to the query plan. The EstimateRowsWithoutRowGoal attribute is the indicator: its presence on a plan operator shows that the row goal was applied to that operator, and its value is the number of rows that would have been estimated if the row goal had not been used.
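One way to look for this attribute is to request the estimated plan as XML and search it for EstimateRowsWithoutRowGoal. The sketch below uses a hypothetical TOP query as a stand-in for the one in the article.

```sql
-- SET SHOWPLAN_XML must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_XML ON;
GO

-- Hypothetical query; it is compiled but not executed while SHOWPLAN_XML is ON.
SELECT TOP (10)
       soh.SalesOrderID,
       sod.ProductID
FROM Sales.SalesOrderHeaderEnlarged AS soh
INNER JOIN Sales.SalesOrderDetailEnlarged AS sod
    ON soh.SalesOrderID = sod.SalesOrderID;
GO

SET SHOWPLAN_XML OFF;
GO
```

In the returned plan XML, operators affected by the row goal carry an EstimateRowsWithoutRowGoal attribute next to EstimateRows on their RelOp element.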

EstimateRowsWithoutRowGoal attribute in the execution plan

We can use the DISABLE_OPTIMIZER_ROWGOAL query hint to prevent the optimizer from applying the row goal to a query. When we execute the sample query with this hint, the estimated number of rows changes.
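The hint is passed through the USE HINT option. A minimal sketch, using a hypothetical TOP query in place of the article's sample query:

```sql
-- Compile the query without row goal adjustments.
SELECT TOP (10)
       soh.SalesOrderID,
       sod.ProductID
FROM Sales.SalesOrderHeaderEnlarged AS soh
INNER JOIN Sales.SalesOrderDetailEnlarged AS sod
    ON soh.SalesOrderID = sod.SalesOrderID
OPTION (USE HINT ('DISABLE_OPTIMIZER_ROWGOAL'));
```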

How to disable row goal in an execution plan

Case Study: Improve SQL TOP statement performance

In this case study, we will try to improve the performance of an example query that performs poorly. First of all, we will enable the I/O and time statistics and then execute the query to analyze its execution statistics.
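The statistics output can be switched on per session as sketched below. The SELECT is a hypothetical placeholder, not the article's case-study query, and it will not necessarily reproduce the plan discussed here.

```sql
-- Report logical/physical reads per table and CPU/elapsed time per statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Hypothetical placeholder query.
SELECT TOP (1000)
       sod.SalesOrderID,
       sod.ProductID,
       soh.OrderDate
FROM Sales.SalesOrderDetailEnlarged AS sod
INNER JOIN Sales.SalesOrderHeader AS soh
    ON sod.SalesOrderID = soh.SalesOrderID;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The figures appear on the Messages tab in SQL Server Management Studio.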

What is worktable in query statistics

In the output of the query statistics, Worktable has performed 147,594 logical reads even though the query does not reference any table named Worktable. The reason for this confusing entry is that worktables are temporary tables that SQL Server creates in the tempdb database during the execution of the query. Their main purpose is to store intermediate result sets temporarily.

Now let's turn to the query plan to find out what caused this enormous I/O. At this point, we will look at the actual execution plan of the query. In the execution plan, we see a Table Spool (Lazy Spool) operator, and we can notice that it pumps an enormous amount of data into the nested loop operator. The Table Spool (Lazy Spool) operator creates a temporary table in tempdb and fills it lazily: a row is stored in the temporary table only when the parent operator requests it.

Table Spool (Lazy Spool) operator details

There is also a dramatic difference between the estimated and actual number of rows. Normally we could update the statistics to resolve this type of issue, but here that will not improve query performance. Additionally, the table spool is located on the inner side of the join, which means that the sum of its actual rebinds and rewinds equals the actual number of rows returned by the operator on the outer side of the join (the clustered index scan).

Row goal details in an execution plan

The query optimizer has applied the row goal to the clustered index scan (SalesOrderDetailEnlarged) and to the nested loop operator, but not to the clustered index scan (SalesOrderHeader) operator. For this particular query, avoiding the table spool operator reduces the I/O of the query and therefore improves its performance. In general, we may observe the table spool operator together with the nested loop join, but we can force the optimizer to replace this join type with an alternative one. The OPTION clause lets us add hints that override the plan the optimizer would otherwise choose. To get rid of the table spool operator, we can force the optimizer to use a hash join instead of the nested loop join. To do this, we will add the OPTION (HASH JOIN) clause at the end of the query.
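A sketch of the hint in place, using a hypothetical query as a stand-in for the case-study query, which is not reproduced in the text:

```sql
-- OPTION (HASH JOIN) forces every join in the query to be a hash join.
-- The join must contain at least one equality predicate for this to work.
SELECT TOP (1000)
       sod.SalesOrderID,
       sod.ProductID,
       soh.OrderDate
FROM Sales.SalesOrderDetailEnlarged AS sod
INNER JOIN Sales.SalesOrderHeader AS soh
    ON sod.SalesOrderID = soh.SalesOrderID
OPTION (HASH JOIN);
```

Because the hint applies to all joins in the statement, it is worth re-checking the plan and the I/O statistics after adding it.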

What is hash match join in an execution plan

I/O statistics of a query after the query optimization

As we can see, after forcing the optimizer to use the hash join, both the logical reads and the execution time are reduced. As an alternative, we can use trace flag 8690 or the NO_PERFORMANCE_SPOOL query hint to disable the spool on the inner side of the nested loop join.
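Both alternatives can be applied at the query level through the OPTION clause. The sketch below shows each one on a hypothetical stand-in query. The article's actual query is not shown in the text.

```sql
-- Variant 1: enable trace flag 8690 for this query only.
SELECT TOP (1000)
       sod.SalesOrderID,
       sod.ProductID,
       soh.OrderDate
FROM Sales.SalesOrderDetailEnlarged AS sod
INNER JOIN Sales.SalesOrderHeader AS soh
    ON sod.SalesOrderID = soh.SalesOrderID
OPTION (QUERYTRACEON 8690);

-- Variant 2: the documented query hint with the same purpose.
SELECT TOP (1000)
       sod.SalesOrderID,
       sod.ProductID,
       soh.OrderDate
FROM Sales.SalesOrderDetailEnlarged AS sod
INNER JOIN Sales.SalesOrderHeader AS soh
    ON sod.SalesOrderID = soh.SalesOrderID
OPTION (NO_PERFORMANCE_SPOOL);
```

Unlike OPTION (HASH JOIN), these options leave the choice of join type to the optimizer and only remove the performance spool.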

Usage details of the 8690 trace flag

Usage details of the NO_PERFORMANCE_SPOOL

The Table Spool operator is meant to improve query performance, but sometimes it can itself become a performance bottleneck.

Conclusion

In this article, we have explored SQL TOP statement performance details. The query optimizer can change the execution plan when a query includes the TOP clause because of the row goal feature.
