Milena Petrovic

Poor SQL query design – a SQL query performance killer – the basics

May 23, 2014 by
Poor query design is one of the top SQL Server performance killers. Even with good database design, no frequent recompilations, and no other SQL performance killers, poor query design can severely degrade performance.

Depending on the performance problem cause, fixing poor SQL query design can be quick or time consuming. It’s recommended to follow some general recommendations and best practices while writing queries, so you don’t have to rewrite and optimize them later.

Factors that affect query performance

Query performance also depends on data volume and transaction concurrency. Executing the same query on a table with millions of records requires more time that performing the same operation on the same table with only thousands of records.

A lot of concurrent transactions can degrade SQL Server performance. The shorter the queue of transactions that wait to be processed, the better performance.

Executing a SQL query to retrieve records from multiple joined tables with small sets of data in a sandbox is quick, but running the same query in production with millions of rows in each joined table and multiple users accessing the same tables and data can add significant pressure. That’s why sometimes developers are not fully aware of query performance.

To be able to see how a SQL query performs in real production environment, it’s necessary to provide the same conditions as in the production environment. Otherwise, the potential underperforming can be masked.

We’ll use STATISTICS TIME to show the number of milliseconds required to parse, compile, and execute a SELECT statement.

We’ll execute a simple SELECT statement:

When executed on a million row table, it takes approximately 13 seconds:

SQL Server Execution Times:
   CPU time = 1031 ms, elapsed time = 13263 ms.

When executed on a two million row table, the time practically is doubled:

(2000000 row(s) affected)

SQL Server Execution Times:
   CPU time = 2140 ms, elapsed time = 26961 ms.

STATISTICT TIME shows two execution times. The first one is the CPU time, it presents the number of milliseconds the processor has been processing the data. The second time named ‘elapsed time’ is the time needed to show the results in the grid. When working with a large set of records, showing the records in the grid lasts much longer than retrieving them from the table.

Don’t retrieve more data than necessary

When retrieving data from SQL Server tables, don’t retrieve more than you need. Obtaining excessive data is resource expensive and time consuming.

We’ll use the Person.Address table in the AdventureWorks database:

We’ll add random records to the table, so that it contains 1 million records:

If you need for example IDs of the addresses modified before 2015, retrieve only that information. Don’t retrieve their addresses, cities, postal codes, or dates they were modified.

We’ll start with the worst case – selecting all columns and all rows:

The statistics shows:

(1000000 row(s) affected)

SQL Server Execution Times:
   CPU time = 2219 ms, elapsed time = 20917 ms.

Now, we’ll retrieve only two columns – the address ID and modified date where the year condition should be applied:

The processor time is almost four times smaller:

(1000000 row(s) affected)

SQL Server Execution Times:
   CPU time = 578 ms, elapsed time = 10219 ms.

Finally, we’ll retrieve only the records needed – IDs of the addresses modified before 2015:

(158956 row(s) affected)

SQL Server Execution Times:
   CPU time = 515 ms, elapsed time = 3286 ms.

The number of the returned rows is approximately 6 times smaller, the number of the columns retrieved also smaller than in the previous example, yet the processor time is only 10% shorter. That’s due to the YEAR built-in function used on a non-index column.

If possible, avoid using functions (both built-in such as YEAR, LEFT, etc. and user-defined) in the WHERE clause:

(158956 row(s) affected)

SQL Server Execution Times:
   CPU time = 298 ms, elapsed time = 2930 ms.

As shown, the narrower the results set is, the less time is needed to retrieve it. More resources are available for other operations and there will be fewer chances for bottlenecks.

However, pay attention to the column you use in the function used in the predicate (as the parameter in the WHERE condition). In the example above, we used a non-index column. If the clustered index column is used, the time needed will be longer:

(158956 row(s) affected)

SQL Server Execution Times:
   CPU time = 845 ms, elapsed time = 3788 ms.

As shown, for the same number of records, the processor requires almost 300% more time

WHERE condition, 158,956 records retrieved
SELECT
*
SELECT
<column list>
ModifiedDate
< ‘2015/01/01’
YEAR (ModifiedDate)
< ‘2015’
LEFT (AddressID,2)
< 12000
Processor
time in ms
2,219 578 298 515 845

In this article, we’ve presented the factors that affect SQL query performance and gave some general guidelines for testing. We showed how much processor time is used for retrieving different amount of records. In the part 2 of this article, we’ll show how using indexed views instead of adding indexes to tables affects SQL performance.

Milena Petrovic
168 Views