Poor SQL query design - a SQL query performance killer

Poor query design is one of the top SQL Server performance killers. Even with good database design, no frequent recompilations, and no other SQL performance killers, poor query design can severely degrade performance.

Depending on the performance problem cause, fixing poor SQL query design can be quick or time consuming. It’s recommended to follow some general recommendations and best practices while writing queries, so you don’t have to rewrite and optimize them later.

Factors that affect query performance

Query performance also depends on data volume and transaction concurrency. Executing the same query on a table with millions of records requires more time that performing the same operation on the same table with only thousands of records.

A lot of concurrent transactions can degrade SQL Server performance. The shorter the queue of transactions that wait to be processed, the better performance.

Executing a SQL query to retrieve records from multiple joined tables with small sets of data in a sandbox is quick, but running the same query in production with millions of rows in each joined table and multiple users accessing the same tables and data can add significant pressure. That’s why sometimes developers are not fully aware of query performance.

To be able to see how a SQL query performs in real production environment, it’s necessary to provide the same conditions as in the production environment. Otherwise, the potential underperforming can be masked.

We’ll use STATISTICS TIME to show the number of milliseconds required to parse, compile, and execute a SELECT statement.

We’ll execute a simple SELECT statement:

SET STATISTICS TIME ON

SELECT [AddressID] [int], [AddressLine1], [AddressLine2], [City]

FROM [Person].[Address]

SET STATISTICS TIME OFF

When executed on a million row table, it takes approximately 13 seconds:

SQL Server Execution Times:
CPU time = 1031 ms, elapsed time = 13263 ms.

When executed on a two million row table, the time practically is doubled:

(2000000 row(s) affected)

SQL Server Execution Times:
CPU time = 2140 ms, elapsed time = 26961 ms.

STATISTICT TIME shows two execution times. The first one is the CPU time, it presents the number of milliseconds the processor has been processing the data. The second time named ‘elapsed time’ is the time needed to show the results in the grid. When working with a large set of records, showing the records in the grid lasts much longer than retrieving them from the table.

Don’t retrieve more data than necessary

When retrieving data from SQL Server tables, don’t retrieve more than you need. Obtaining excessive data is resource expensive and time consuming.

We’ll use the Person.Address table in the AdventureWorks database:

CREATE TABLE [Person].[Address](

[AddressID] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,

[AddressLine1] [nvarchar](60) NOT NULL,

[AddressLine2] [nvarchar](60) NULL,

[City] [nvarchar](30) NOT NULL,

[StateProvinceID] [int] NOT NULL,

[PostalCode] [nvarchar](15) NOT NULL,

[SpatialLocation] [geography] NULL,

[rowguid] [uniqueidentifier] ROWGUIDCOL NOT NULL,

[ModifiedDate] [datetime] NOT NULL,

CONSTRAINT [PK_Address_AddressID] PRIMARY KEY CLUSTERED

(

[AddressID] ASC

) ON [PRIMARY])

We’ll add random records to the table, so that it contains 1 million records:

DECLARE @i int

SET @i = 0

WHILE @i < 1000000

BEGIN

INSERT INTO Person.Address( AddressLine1,

AddressLine2,

City,

StateProvinceID,

PostalCode,

rowguid,

ModifiedDate )

VALUES(CONVERT(nvarchar(60), NEWID()) ,

CONVERT(nvarchar(60), NEWID()) ,

rand () *5 ,

rand (),

rand () *7,

NEWID(),

DATEADD (day, (ABS(CHECKSUM(NEWID())) % 1625), GETDATE())

SET @i = @i + 1

END

If you need for example IDs of the addresses modified before 2015, retrieve only that information. Don’t retrieve their addresses, cities, postal codes, or dates they were modified.

We’ll start with the worst case – selecting all columns and all rows:

SET STATISTICS TIME ON

SELECT *

FROM Person.Address

SET STATISTICS TIME OFF

The statistics shows:

(1000000 row(s) affected)

SQL Server Execution Times:
CPU time = 2219 ms, elapsed time = 20917 ms.

Now, we’ll retrieve only two columns – the address ID and modified date where the year condition should be applied:

SET STATISTICS TIME ON

SELECT AddressID, ModifiedDate

FROM Person.Address

SET STATISTICS TIME OFF

The processor time is almost four times smaller:

(1000000 row(s) affected)

SQL Server Execution Times:
CPU time = 578 ms, elapsed time = 10219 ms.

Finally, we’ll retrieve only the records needed – IDs of the addresses modified before 2015:

SET STATISTICS TIME ON

SELECT AddressID

FROM Person.Address

WHERE YEAR (ModifiedDate) < 2015

SET STATISTICS TIME OFF

(158956 row(s) affected)

SQL Server Execution Times:
CPU time = 515 ms, elapsed time = 3286 ms.

The number of the returned rows is approximately 6 times smaller, the number of the columns retrieved also smaller than in the previous example, yet the processor time is only 10% shorter. That’s due to the YEAR built-in function used on a non-index column.

If possible, avoid using functions (both built-in such as YEAR, LEFT, etc. and user-defined) in the WHERE clause:

SET STATISTICS TIME ON

SELECT AddressID

FROM person.Address1

WHERE ModifiedDate < '2015/01/01'

SET STATISTICS TIME OFF

(158956 row(s) affected)

SQL Server Execution Times:
CPU time = 298 ms, elapsed time = 2930 ms.

As shown, the narrower the results set is, the less time is needed to retrieve it. More resources are available for other operations and there will be fewer chances for bottlenecks.

However, pay attention to the column you use in the function used in the predicate (as the parameter in the WHERE condition). In the example above, we used a non-index column. If the clustered index column is used, the time needed will be longer:

SET STATISTICS TIME ON

SELECT AddressID

FROM Person.Address1

WHERE LEFT (AddressID,6) < 158956

SET STATISTICS TIME OFF

(158956 row(s) affected)

SQL Server Execution Times:
CPU time = 845 ms, elapsed time = 3788 ms.

As shown, for the same number of records, the processor requires almost 300% more time

			WHERE condition, 158,956 records retrieved
	SELECT *	SELECT <column list>	ModifiedDate < ‘2015/01/01’	YEAR (ModifiedDate) < ‘2015’	LEFT (AddressID,2) < 12000
Processor time in ms	2,219	578	298	515	845

In this article, we’ve presented the factors that affect SQL query performance and gave some general guidelines for testing. We showed how much processor time is used for retrieving different amount of records. In the part 2 of this article, we’ll show how using indexed views instead of adding indexes to tables affects SQL performance.

Author
Recent Posts

Milena Petrovic

Milena is a SQL Server professional with more than 20 years of experience in IT. She has started with computer programming in high school and continued at University.

She has been working with SQL Server since 2005 and has experience with SQL 2000 through SQL 2014.

Her favorite SQL Server topics are SQL Server disaster recovery, auditing, and performance monitoring.

View all posts by Milena "Millie" Petrovic

Latest posts by Milena Petrovic (see all)

Using custom reports to improve performance reporting in SQL Server 2014 – running and modifying the reports - September 12, 2014
Using custom reports to improve performance reporting in SQL Server 2014 – the basics - September 8, 2014
Performance Dashboard Reports in SQL Server 2014 - July 29, 2014

SQLShack

Poor SQL query design – a SQL query performance killer – the basics

Factors that affect query performance

Don’t retrieve more data than necessary

Factors that affect query performance

Don’t retrieve more data than necessary

Related posts: