Esat Erkec
Stream Aggregate Operator and ORDER BY statement

SQL Server ORDER BY performance tips

January 5, 2021 by

In this article, we will explore how the ORDER BY statement affects the query performance and we will also learn some performance tips related to sorting operations in SQL Server.

The duration and resource (CPU and memory) consumption of the sort operation will change depending on the number of rows to be sorted. In this context, we can easily say that sorting operation is very costly for the SQL Server when it works for a huge number of rows. Using the indexes can improve the performance of the sorting operation because the indexes create an ordered structure of the table rows so that the storage engine can fetch the table rows in a pre-ordered manner using the index structure. For this reason, we can use indexes to eliminate the costly sort operations in the queries. However, using indexes can decrease the performance of the insert, update and delete statements and they also increase disk space usage of the database files. When we take into account all of these points, we need to think twice to use indexes in order to improve the sort operation performance in the queries.

SORT operator

The task of the sort operator is to put in order the received input data. The following query will retrieve data from the SalesOrderDetail table and the ORDER BY statement provides that the result set will return in ascending order according to the ModifiedDate column.

When we check the query plan after the execution of the query, we will see that the storage engine reads the data by way of the clustered index scan operator. As a second step, the sort operator takes 121.317 rows as input and sorts them in ascending order.

A detailed explanation of an execution plan that includes an ORDER BY statement

Now, let’s take a new example query. This query will use an enlarging copy of the Adventureworks database and it will sort approximately 453.341 rows.

Sort spill problem demonstration

In the compilation phase of the query plan, the query optimizer calculates the memory requirement of the query based on the estimated number of rows and row sizes, and this memory requirement is called a memory grant or query work buffer. For some reasons (outdated statistics, wrong cardinality estimation, parameter sniffing, badly defined data types) the memory grant might be calculated as wrong.

In the execution plan, we see a warning sign on the sort operator which means that the memory grant is not enough for the sort operation and the query required more memory to perform the sort operation. In this circumstance, during the execution of the query, the sort operator uses the tempdb database to meet the memory deficit. When we analyze the sort operator properties, we can find out more detailed information about the tempdb spill issue.

Memory grant details of the sort spill issue

The sort operator uses 45 MB memory but this memory amount is not enough to perform sorting all rows, for this reason, sort operator used the tempdb database. During the execution of a query, the memory grant can not be changed dynamically and insufficient memory grant estimations cause this type of issue.

  • Tip: Row Mode Memory Grant Feedback feature in SQL Server 2019 helps to overcome this problem without any code changing. The main idea of this feature is to adjust the memory grant requirement of a query to use the last execution memory grant information. With the help of this feature, we can assume that several executions of the query the memory grant must be adjusted. The below image illustrates memory grant changings after the four execution of the query. In the last execution of the query, the optimizer has decided to find the proper memory grant amount for this query

Row Mode Memory Grant Feedback demonstration

At the same time, the following options can consider overcoming the tempdb spill issue:

  • Creating an index that tunes the ORDER BY statement performance
  • Using the MIN_GRANT_PERCENT query option
  • Update the outdated statistics

Sort Operations and Clustered Index

A clustered index sorts the data rows based on the specified key columns so that the clustered index creates a sorted data structure of the table. Only one clustered index can be defined for a table because the data rows can be sorted in just one way. When we used the clustered key column after the ORDER BY clause, the query optimizer may not need to use the sort operator because of the clustered index structure.

For example, the following query retrieves some columns of the SalesOrderHeaderEnlarged table and it also sorts the result set ascending order according to the SalesOrderID column.

Clustered index and ORDER BY statement

In the execution plan, we see an Ordered attribute of the clustered index scan operator as true. With the help of this attribute, we can understand that the storage engine has returned the data rows in a sorted manner.

On the other hand, when we don’t use the ORDER BY statement explicitly in the queries, the clustered index does not guarantee to return data rows in a pre-sorted fashion. Such as ;

Allocation Order Scan: SQL Server storage engine can access the data using either the b-tree structure or Index Allocation Map (IAM) through the clustered indexes. Allocation order scan returns the data rows in an unpredictable order. The NOLOCK hint can cause to drive this scan type when it also meets the following conditions:

  • The index size is greater than 64 pages
  • The ordered attribute of the index scan operator is false

This query uses the b-tree structure to retrieve data rows.

A SELECT statement execution plan

Now we will add the NOLOCK hint to the query.

Allocation Order Scan demonstration

In the execution plan of this query, we will see that the ordered attribute is false.

 Allocation order scan execution plan

Parallel Query Plans: The parallel query plan enables to separate the big data processing task into small ones. In this way, a query is processed by more than one thread so that the query completion time will be reduced. However, the sorting of data rows combined in the last stage of the parallel plans is not predictable if we don’t use the ORDER BY statement.

Sort operation and other execution plan operators interactions

We can also see the sort operator in the execution plan when we don’t use the ORDER BY statement in the query. The following operators required a sorted input data, for this reason, the sort operation is performed by the database engine.

  • Stream Aggregate
  • Merge Join
  • Distinct
  • Windows Function

The Stream Aggregate Operator performs the aggregation operations (SUM, COUNT, AGV, MIN, MAX, etc.) on their input. The stream aggregate operator input data must be sorted. For this reason, in most cases, we may see an extra sort operator in the execution plans before the stream aggregate operator. Let’s execute the following query and then analyze the execution plan of it.

Stream Aggregate Operator and ORDER BY statement

As we can see, the sort operation is performed by the SQL Server before the stream aggregate operator.

The Merge Join Operator is one of the join operators that converts the two received input data into a single combined data. This operator requires both input data must be sorted in the same way. We will execute the below query and analyze the execution plan.

Merge Join Operator and sort operation in SQL Server

In this execution plan, we could not see any sort operator because the clustered index scan offered the data rows in a sorted manner. The query optimizer does not choose the merge join operator because for some queries using this join type may cause to reduce the query plan cost.

For the following query, the query optimizer will decide to use the hash join operator.

Estimated subtree cost attribute of the execution plan

When we force this query to use the merge join, it increases the query cost.

Using the MERGE JOIN option

The Distinct Sort Operator performs eliminating the duplicate rows and sort operations at the same time.

Distinct sort operator and ORDER BY statement

Conclusion

In this article, we discussed how the ORDER BY statement affects the query performance. The sorting operations are very costly for the SQL Server for this reason we have to consider the performance issues when we use the ORDER BY statement. On the other hand, some operators use sort operations to perform their tasks.

Esat Erkec
37,779 Views