This article targets the beginners and gives an introduction of the clustered index in SQL Server.
An index plays a crucial role in SQL Server query performance. Consider a book library with thousands of books.
You want to search for a specific book that contains the keyword “adventure” in the title. The books in the library are unorganized.
- You need to fetch each book from the shelf, read it and put it back if it does not satisfy your requirements
- In a big library, it might take you a few days to find a single book
Suppose this library is a SQL table so in SQL Server terms, scanning all books in a library is known as table (index) scan.
Suppose you find two books for the “adventure” keyword and now you are interested in searching for another keyword in that book. Each book contains 2,000 pages.
Is it possible for you to search for a particular keyword in a book without reading it?
In a book, we use indexes for locating the page where you can find keywords. In SQL Server terms, we call it an index seek operation.
We can visualize the table scan and index seek operation as per the following image:
We have mainly two types of indexes in SQL Server:
- Clustered index(CI)
- Non-clustered index
Overview of a Clustered index in SQL Server
SQL Server creates the index in a B-tree structure. SQL Server organizes data in this structure to quickly search for the required data instead of a table scan.
As shown in the following image, we have three levels in a B-tree structure index:
It is a top node and consists of a pointer for the intermediate index pages or leaf node (data pages).
It consists of index key values along with pointers to the next intermediate level pages or the leaf data pages. It depends upon the size of the data for the number of intermediate levels.
It consists of actual data pages. Consider this as a point where actual data is stored in a clustered index in SQL Server.
Let’s imagine this b-tree for our library example. Suppose we have 10,000 books in the library and created CI on the bookid column of the library table:
As shown in the above image, we defined the index on the bookid column of the library table. Now, if we want to retrieve a specific book say bookid 7579, SQL Server reads the following pages:
- Root page: Root page points SQL Server for the intermediate node 5001 to 10,000
- Intermediate Node: SQL Server reads the intermediate index page and gets a pointer for the actual data page in the leaf node
- Leaf node: SQL Server reads the data page in the leaf node as per our requirement
As highlighted earlier, SQL Server reads the index page and returns the information. This process is known as Index seek. In certain situations, SQL Server reads all leaf-level data pages. It is known as an index scan. Undoubtedly, the index seek is faster than the index scan.
Create a Clustered index in SQL Server using SSMS
Let’s create a test table and create an index using SQL Server Management Studio GUI method:
CREATE TABLE dbo.bookstore
(book_id INT NOT NULL,
Insert into dbo.bookstore values(1,'Learn ABC of SQL Server')
Insert into dbo.bookstore values(2,'Advanced troubleshooting step SQL Server')
We do not have any index on the table. A table without a CI is known as the heap table. You can read more about heap tables, and its performance issue in the article Forwarded Records Performance issue in SQL Server.
Expand Databases and locate the table in which we want to define an index. Expand table and right-click on Indexes | New Index | Clustered Index:
It opens the following New Index configuration window. It highlights an error message that at least one column should be there in an index:
In the General page, it pre-populates the following pieces of information:
- Table name
- Index name: By default, it gives index name in the format of ClusteredIndex-YYYYMMDD-HHMMSS. We should avoid using this name for the index. We should use a familiar name and define a naming convention for all indexes in our databases
- Index Type: Clustered index
Specify an index name and click on Add to define the index key:
It opens following window to select CI key column on a specified table:
Once we select a column, it shows the status as Ready:
Click OK, and you can see the selected column in the Index key columns area:
Under the Sort Order column, by default, it selects value Ascending. If we want to store the index key value in descending order, change the sort order as shown below:
Now click on Options, and you get a few configuration options for creating CI’s:
Few useful configuration options are as follows:
- Auto recompute statistics: By default, SQL Server recomputes the statistics based on pre-defined thresholds. We should not change its value unless required
- Ignore duplicate values: Default value is false. We specify index behaviour for duplicate rows
- IGNORE_DUP_KEY OFF: it gives an error message for duplicate row insertion
- IGNORE_DUP_KEY ON: It ignores any duplicate data insertion and gives a warning message
- Allow online DML processing: Using this option, we can allow concurrent users to access the table during the index operation. It is useful for a busy OLTP environment for performing online index maintenance
- Maximum degree of parallelism: We can limit the processors for parallel plan execution using this. We should make changes to this with proper testing
- Sort in tempdb: We can specify this option for asking SQL Server to sort results during index creation in the tempdb. By default, SQL Server uses the source database in which the table exists
- Fill factor: We can define a fill factor for the index. If we have specified this fil factor during index creation, it overrides the default value of the fill factor set at the instance level
We have set the following options in my demo for the CI:
In the Storage page, we can specify the filegroup for the index. By default, it uses the filegroup in which the table exists:
Once you have set all required configurations, we can click OK, and it creates the CI for you. It is always advisable to configure the inputs and generate a script for the index. For a large table, sometimes the GUI method does not work correctly.
Click on Script | New Query Editor Window as shown in the following image:
It gives the following script in a new query window:
CREATE UNIQUE CLUSTERED INDEX [IX_bookstore_clustered] ON [dbo].[bookstore]
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = ON, IGNORE_DUP_KEY = OFF,
DROP_EXISTING = OFF, ONLINE = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON,
OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
Execute the script, and you can see the index in Object Explorer:
Note: If we define a primary key on a column, SQL Server automatically creates a CI on the primary key column
Check Clustered index levels in SQL Server
We can use DMV sys.dm_db_index_physical_stats for checking index fragmentation level, index levels for the CI. Insert a few records in the table and execute the following query:
FROM sys.dm_db_index_physical_stats(DB_ID('SQLShack'), OBJECT_ID('bookstore'), NULL, NULL, 'DETAILED');
In the following screenshot, we have two levels of an index:
View Execution plan for query with clustered index in SQL Server
Let’s execute the following query and view the actual execution plan:
SQL Server uses index seek for retrieving the records from bookstore table:
Let’s create a backup table using the following query:
This backup table has a similar structure and data as of the main table. The only difference is that the backup table does not have any index while our main table contains CI on the books_ID column.
Let’s execute the following queries in a separate window and compare the execution plan:
WHERE book_ID = 10000;
WHERE book_ID = 10000;
In the above screenshot, note the following differences. We can quickly figure out the performance comparison between a heap and a table with a clustered index:
Table with a clustered index in SQL
Estimated number of rows to be read
Estimated operator cost
Estimated IO Cost
Estimated CPU Cost
Number of rows read
In this article, we explored the clustered index in SQL Server along with GUI and t-SQL method to create it. We also learned the performance comparison between the heap and CI key table. You should define a CI as per the requirement for better query performance.
While working as a Senior consultant DBA for big customers and having certified with MCSA SQL 2012, he likes to share knowledge on various blogs.
He can be reached at firstname.lastname@example.org
View all posts by Rajendra Gupta
Latest posts by Rajendra Gupta (see all)
- Auditing in AWS RDS SQL Server - January 20, 2020
- Import JSON data into SQL Server - January 17, 2020
- Access PostgreSQL databases using the Azure Data Studio extension - January 16, 2020