Marcin Szeliga

Index Strategies – Part 1 – Choosing the right table structure

October 6, 2014 by Marcin Szeliga

Introduction

Among the many different things that can affect SQL Server performance, some are more significant than others. In addition, some changes are relatively easy to implement, while others are quite painful:

  1. Schema design has the single biggest impact on performance, but it’s also almost impossible to change afterwards
  2. Proper indexes can also switch a database into “turbo mode”, and they can be relatively easily added to production systems
  3. Query optimization should be the third task in performance tuning — the impact can be significant, but more often than not it requires changes on the client side
  4. Concurrency control, like switching from locking to versioning, can help in some specific cases
  5. Hardware tuning should be the last option.

Throughout this series we are going to evaluate an end-to-end indexing strategy that helps you improve overall SQL Server performance without affecting other aspects of production systems, such as data consistency, client application behavior, or maintenance routines.

Four different table structures

Before SQL Server 2014 we had only two options: a table could be either a heap or a b-tree. Now data can also be stored in in-memory tables or in the form of columnstore indexes. Because these structures are completely different, the very first step of an index strategy is to choose the correct table structure. To help you make an informed decision, I’m going to compare the available options.

Test environment

Let’s start with a sample database:
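The original script is not reproduced here; as a minimal sketch, a sample database could be created like this (the database name and the deliberately simple settings are assumptions):

```sql
-- Hypothetical sample database for the tests in this series.
CREATE DATABASE IndexStrategies;
GO
USE IndexStrategies;
GO
```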

In addition to this, we will need a tool that allows us to simulate and measure user activity (all we need is the ability to execute queries simultaneously and monitor execution times as well as the number of pages read and written). You will find plenty of such tools on the Internet – I chose SQLQueryStress, a free tool made by Adam Machanic.

Heaps

A table without a clustered index is a heap — just a set of unordered pages. This is the simplest data structure available but, as you will see, definitely not the best one:
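The table definition itself is not shown in this copy of the article; a plausible sketch, given that rows are later inserted with DEFAULT VALUES, is a table where every column has a default (the column names and types are assumptions):

```sql
-- Hypothetical heap: no clustered index, all columns have defaults
-- so that INSERT ... DEFAULT VALUES works.
CREATE TABLE dbo.Heap (
    Id INT IDENTITY NOT NULL,
    InsertedAt DATETIME NOT NULL DEFAULT (GETDATE()),
    Payload VARCHAR(200) NOT NULL DEFAULT (REPLICATE('x', 100))
);
```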

Let’s check how long it takes to load 100,000 rows into this table using 50 concurrent sessions — for this purpose I’m going to execute a simple INSERT INTO dbo.Heap DEFAULT VALUES; statement with the number of iterations set to 2,000 and the number of threads set to 50.

On my test machine (SQL Server 2014, Intel Core i7 with an SSD drive), it took 18 seconds on average. As a result we got a table of about 3,300 pages:
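The page count can be checked with a query along these lines (sys.dm_db_partition_stats is the standard DMV for this; the exact query the article used is not shown):

```sql
-- Page and row counts for the heap.
SELECT in_row_data_page_count, row_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.Heap');
```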

Of course, because there is no ordering whatsoever, the only way to get a row from this table is through a full table scan:
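For example, even a single-row lookup like the one below (the Id column is an assumption) produces a Table Scan operator in the execution plan, and SET STATISTICS IO ON will report reads against all of the heap’s pages:

```sql
SET STATISTICS IO ON;

-- Without an index, SQL Server must scan every page to find this row.
SELECT *
FROM dbo.Heap
WHERE Id = 42;
```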

Not only is SELECT performance bad — UPDATEs against heaps are also really slow. To see this we need two queries: the first one returns some numbers and will be used for parameter substitution, the second one will be executed in 5 concurrent sessions:
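The queries are not reproduced in this copy; a sketch of the pair, following the SQLQueryStress parameter-substitution pattern (the Id and Payload columns are assumptions), might look like this:

```sql
-- Query 1: feeds SQLQueryStress with parameter values.
SELECT Id FROM dbo.Heap;

-- Query 2: run in 5 concurrent sessions; growing the row
-- beyond the free space on its page forces forwarded records.
UPDATE dbo.Heap
SET Payload = REPLICATE('y', 200)
WHERE Id = @Id;
```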

The results are shown in the picture below (4 noticed exceptions were due to deadlocks):

SQLQueryStress dialog - results

Altogether 7,700 rows were modified, which means we achieved about 800 modifications per second.

Not only are UPDATEs slow, they can also have a side effect that hurts query performance and lasts until the heap is rebuilt. I’m talking about forwarded records: if there is not enough space on a page to store the new version of a record, part of it is saved on a different page, and only a forwarding pointer to that page is stored with the remaining part of the record. In this case more than 3,000 forwarded records were created:
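Forwarded-record counts are exposed by sys.dm_db_index_physical_stats in DETAILED mode; the query below is one way to see them (the article’s exact query is not shown here):

```sql
-- forwarded_record_count is only populated in DETAILED mode.
SELECT forwarded_record_count, page_count, avg_page_space_used_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('dbo.Heap'), 0, NULL, 'DETAILED');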

The remaining operation, DELETE, also has an unwanted side effect — the space taken by deleted records will not be reclaimed automatically:
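As an illustration (the predicate is an assumption), deleting a large share of the rows and then re-checking the page count shows that the heap keeps roughly the same number of allocated pages:

```sql
-- Delete half the rows...
DELETE FROM dbo.Heap
WHERE Id % 2 = 0;

-- ...then re-check: the page count stays (roughly) unchanged.
SELECT in_row_data_page_count, row_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.Heap');
```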

At this point the only way to reclaim this space and get rid of the forwarded records is to rebuild the table:
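Since SQL Server 2008 this can be done directly on a heap:

```sql
-- Rebuilding the heap compacts the pages and removes forwarded records.
ALTER TABLE dbo.Heap REBUILD;
```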

Recommendations

Because of suboptimal SELECT performance and the side effects of UPDATE and DELETE statements, heaps are sometimes used for logging and staging tables — in the first case, rows are almost always only inserted into the table; in the second, we try to maximize data load speed by inserting rows into a heap and afterwards moving them to a destination, indexed table. But even for these scenarios heaps are not necessarily the best choice (more on this topic in upcoming articles). So, the first step of our indexing strategy is to find heaps. Quite often these tables will be unused, so they will have no or minimal impact on performance. This is why you should concentrate on active heaps — one of the standard ways of finding them in the current database is to execute this query:
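The query itself is missing from this copy of the article; the standard approach is to look for tables with index_id 0 in sys.indexes and join to the usage-stats DMV to see which of them are actually active. A sketch, assuming that pattern:

```sql
-- Active heaps in the current database: type = 0 means heap.
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS [schema_name],
       OBJECT_NAME(i.object_id)        AS [table_name],
       ISNULL(us.user_seeks + us.user_scans + us.user_lookups, 0) AS reads,
       ISNULL(us.user_updates, 0)      AS writes
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS us
       ON us.object_id = i.object_id
      AND us.index_id  = i.index_id
      AND us.database_id = DB_ID()
WHERE i.type = 0
  AND OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
ORDER BY reads + writes DESC;
```

Note that sys.dm_db_index_usage_stats is cleared on instance restart, so a quiet-looking heap may simply not have been touched since the last restart.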

With this list in hand, you should start asking serious questions about why these tables are heaps. And if no good reasons are given, convert them into different structures.

What is coming next?

In the next article two new table structures (in-memory tables and columnstore indexes) will be described and compared with heaps.


About Marcin Szeliga

Since 2006 he has invariably been awarded the Microsoft Most Valuable Professional title in the SQL category. He is a consultant, lecturer, authorized Microsoft trainer with 15 years’ experience, and a database systems architect. He prepared Microsoft partners for the upgrade to the SQL Server 2008 and 2012 versions within the Train the Trainers program. He is a speaker at numerous conferences, including Microsoft Technology Summit, SQL Saturday, Microsoft Security Summit, and Heroes Happen {Here}, as well as at user group meetings, and the author of many books and articles devoted to SQL Server.
