Dmitry Piliugin

SQL Server 2017: Scalar Subquery Simplification

April 26, 2018 by

Nowadays a lot of developers use Object-Relational Mapping (ORM) frameworks. ORM is a programming technique that maps data from an object-oriented to a relational format, i.e. it allows a developer to abstract from a relational database (SQL Server, for example), use object-oriented language (C#, for example) and let an ORM to do all the “talks” to a database engine by generating query texts automatically. ORMs are not perfect, especially if they are used in a wrong way. Sometimes they generate inefficient queries, e.g. a query with redundant expressions. SQL Server has a mechanism to struggle with that inefficiency called a query simplification.

Query simplification is a pre-optimization phase that is run during the query compilation, but before the actual optimization search is started. During that phase the optimizer applies simplification rules against a query tree. The simplification rule represents an algorithm that transforms some portion of a query tree or the whole tree into a simpler form. In this post, we will talk about the new optimizer rule in SQL Server 2017 – CollapseIdenticalScalarSubquery.

Collapsing Subqueries

I have tried different variants of scalar subqueries to make this rule work, but fortunately, the optimizer team was kind enough to guide me – a query pattern for this rule is:

I asked myself, who may write a query like this, and I think the answer might be an ORM.

Let’s look at the example. We run three queries, the first query is under compatibility level of 2016 (130), the second one under 2017 (140), the third also under 2017, but with the rule CollapseIdenticalScalarSubquery turned off.

The plans are accordingly:

You may see that in the first plan, there are two clustered index scans of the table SalesOrderDetail, however the subquery is exactly the same “exists (select * from Sales.SalesOrderDetail f where f.SalesOrderID = d.SalesOrderID)” but referenced twice.

In the second case, compiled under next compatibility level, the double reference of the subquery is collapsed and we see only one reference to the SalesOrderDetails table and more efficient plan, despite the query still has two subqueries with SalesOrderDetails.

In the third case, also compiled under 2017 level, we see the second branch with the SalesOrderDetail again, but that is because we turned off the rule CollapseIdenticalScalarSubquery with an undocumented hint queryruleoff (which I originally described in my blog post).

If you look into the DMV sys.dm_exec_query_transformation stats, you will see the succeed counter has increased after the rule was used during the query compilation:

The result is:


Note: not all subqueries of this shape will work, for some reason this query does not benefit from this rule:

The plan is:

The plan uses FactResellerSales table twice and do not benefit from the new simplification rule. Probably, this functionality will be evolved in the future, but nowadays it works in this way.

Conclusion

Unfortunately, this rule also does not solve a quite popular problem, that was described by Erland Sommarskog on the Connect site: Unnecessarily bad performance for coalesce(subquery), however, I hope that, as far as Microsoft invests some time to this kind of problems, the problem from the Connect site might be addressed in future.

That’s all for today, thank you for reading and stay tuned!

Table of contents

SQL Server 2017: Columnstore Indexes and Trivial Plan
SQL Server 2017: Columnstore in-place updates
SQL Server 2017: Scalar Subquery Simplification
SQL Server 2017: Interleaved Execution for mTVF
SQL Server 2017: Sort, Spill, Memory and Adaptive Memory Grant Feedback
SQL Server 2017: Statistics to Compile a Query Plan
SQL Server 2017: How to Get a Parallel Plan
SQL Server 2017: Adaptive Join Internals
Dmitry Piliugin
Execution plans, Query analysis, SQL Server 2017

About Dmitry Piliugin

Dmitry is a SQL Server enthusiast from Russia, Moscow. He started his journey to the world of SQL Server more than ten years ago. Most of the time he was involved as a developer of corporate information systems based on the SQL Server data platform. Currently he works as a database developer lead, responsible for the development of production databases in a media research company. He is also an occasional speaker at various community events and tech conferences. His favorite topic to present is about the Query Processor and anything related to it. Dmitry is a Microsoft MVP for Data Platform since 2014. View all posts by Dmitry Piliugin

168 Views