Prashanth Jayaram

Smart database backups in SQL Server 2017

May 14, 2018

So far in this series, we’ve discussed several phases of backup, starting with planning, creation, strategy, and implementation. In this article, we are going to see how database administrators can define a strategy to improve backup performance and efficiently manage backups in SQL Server 2017. The following are the topics of discussion:

  1. Discuss checkpoints
  2. Discuss the enhancements made in the Dynamic Management View (DMV) sys.dm_db_file_space_usage for smart differential backups
  3. Discuss the Dynamic Management Function (DMF) sys.dm_db_log_stats, introduced for smart transaction log backups
  4. Understand the functioning of smart differential backup and its internals
  5. Understand the Smart transaction log backup process and its internals
  6. T-SQL scripts
  7. And more…


Background

A checkpoint is a background process that writes dirty pages to disk. An automatic checkpoint performs a full scan of the pages in the buffer pool, identifies the dirty pages that are yet to be written to disk, and finally writes those pages to disk. On SQL instances that do not have many dirty pages in the buffer pool, this is a trivial operation. However, on SQL instances that host OLTP databases and use large amounts of memory, the sequential scan of all buffer pages can impact the performance of the system.

With SQL Server 2012, indirect checkpoints were introduced. In this mode, the dirty page manager maintains a dirty page list and keeps track of all the dirty page modifiers of the database. By default, it runs every 60 seconds and tracks the dirty pages that need to be flushed.
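Indirect checkpoints are enabled per database by setting a target recovery time. A minimal sketch, assuming an illustrative database name:

```sql
-- Enable an indirect checkpoint with a 60-second recovery target
-- (SQLShackDemoSBKP is an illustrative database name)
ALTER DATABASE SQLShackDemoSBKP
SET TARGET_RECOVERY_TIME = 60 SECONDS;

-- Verify the setting; 0 means the database uses automatic checkpoints
SELECT name, target_recovery_time_in_seconds
FROM sys.databases
WHERE name = 'SQLShackDemoSBKP';
```

Note that databases created on SQL Server 2016 and later default to a target recovery time of 60 seconds, i.e., indirect checkpoints.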

For example, if the indirect checkpoint target is set to 120 seconds, then individual pages are flushed roughly 120 seconds after they were made dirty. This changes the I/O pattern: instead of a significant spike at each target interval, we see small writes as the pages change. Because only the dirty page list needs to be traversed, the overhead of tracing and flushing dirty pages is reduced, which results in a significant improvement in backup performance. To review further, go through the reference section below to understand the checkpoint and backup performance improvement process in detail.

On a database operating in automatic checkpoint mode, a scan of every buffer unit in the buffer pool is necessary to find possible dirty pages. With an indirect checkpoint, on the other hand, only those pages of the buffer pool that have been dirtied have to be dealt with.

For example, let’s consider a system with 2 TB of memory, which contains ~250 million buffer units, and suppose 100 of its pages are dirty. Assume that it requires 5 CPU cycles to trace and identify the status of each buffer unit. To traverse 250 million BUFs would then require about 1.25 billion CPU cycles, and to what end? Simply to find 100 dirty pages in automatic checkpoint mode! With indirect checkpoint mode, however, those 100 dirty pages are hardly any work at all: the buffer pool scanning time is eliminated entirely.

Smart Differential Backup

Before the advent of SQL Server 2017, identifying which extents had been altered since the last full backup meant inspecting the Differential Changed Map (DCM) with the DBCC PAGE command. The DCM is a bitmap referenced by SQL Server to trace the pages modified since the last full backup. It is stored in DCM pages, one per GAM interval. A differential backup process identifies those extents with a CHANGED status in the DCM. There are usually three additional extents included in a differential backup.

It’s demo time

SQLShackAuthorDS is a sample table created with two columns, id and authorname.
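The setup script is not reproduced here; a minimal equivalent, with assumed column types and sample values, might look like this:

```sql
-- Illustrative re-creation of the demo table; column types are assumptions
CREATE TABLE SQLShackAuthorDS
(
    id         INT,
    authorname VARCHAR(100)
);
GO

INSERT INTO SQLShackAuthorDS (id, authorname)
VALUES (1, 'Author One'),
       (2, 'Author Two');
```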


DBCC PAGE arguments:

  • Database name or database ID
  • File ID: 1
  • Page number: 6, to retrieve the DCM page
  • Output option: 3, to retrieve the page header and the details of every row for data interpretation
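Putting those arguments together, the DCM page can be dumped as follows; trace flag 3604 redirects the DBCC output to the client session:

```sql
DBCC TRACEON (3604);                       -- route DBCC output to the client
DBCC PAGE ('SQLShackDemoSBKP', 1, 6, 3);   -- database, file ID 1, page 6 (DCM), output option 3
```

Note that DBCC PAGE is an undocumented command, so its output format may change between versions.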

In the DIFF_MAP section of the output, we can see the status of the DCM pages. The output lists the extents modified since the last full backup. There are currently five modified extents listed: 1:0, 1:8, 1:16, 1:64, and 1:152.


Now, let’s go ahead and update the SQLShackAuthorDS table with some dummy values for the ids 40, 50, 60, and 70, and look for the modified extents.
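The update statement itself is not shown; assuming the SQLShackAuthorDS table described above, a representative statement could be:

```sql
-- Dirty a few pages by updating rows with the given ids (values are dummies)
UPDATE SQLShackAuthorDS
SET authorname = 'Updated Author'
WHERE id IN (40, 50, 60, 70);
```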

We can see that the modified extents listed are 1:0, 1:8, 1:16, 1:24, 1:40, 1:64, 1:128, 1:52, and 1:304. A total of 9 extents have been modified since the last full backup.

Let’s now initiate a differential backup
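A differential backup is taken with the WITH DIFFERENTIAL option; the target path below is illustrative:

```sql
BACKUP DATABASE SQLShackDemoSBKP
TO DISK = 'C:\Backup\SQLShackDemoSBKP_Diff.bak'   -- illustrative path
WITH DIFFERENTIAL, STATS = 10;
```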


Let’s perform the insert operation and identify the modified extents
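Again, the exact insert is not shown; any statement that allocates new pages will do, for example:

```sql
-- Insert a few more rows to dirty additional extents (values are dummies)
INSERT INTO SQLShackAuthorDS (id, authorname)
VALUES (101, 'New Author One'),
       (102, 'New Author Two');
```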

In the output, the listed modified extents are 1:0, 1:8, 1:16, 1:24, 1:40, 1:64, 1:128, 1:36, 1:52, 1:160, 1:68, 1:192, and 1:304. A total of 13 extents have been modified since the last full backup.

Again, let’s initiate a differential backup. As we can see, a total of 105 pages were processed during the backup.


It looks like SQL Server adds three more extents beyond those listed in the DCM: 98 pages were processed during the first differential backup, and 4 extents were modified by the update-and-insert operations. In total, 98+4+3=105 pages, as you can see in the above output. The DCM is reset by initiating a full backup.

Prior to SQL Server 2017, this was the hard way to estimate and identify the modified extents.

Let’s walk through the details of the SQL Server 2017 DMV and DMF that each expose an extra column to measure the modified extents in a simpler way.

The DMV sys.dm_db_file_space_usage has a new column called modified_extent_page_count. This column tells us how many pages have changed since the last full backup. It can be used to track differential changes in the database and to decide whether a differential backup would be beneficial.

Now, run the following SQL to check how many pages were modified
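The query is not reproduced in the original; it would be along these lines:

```sql
USE SQLShackDemoSBKP;   -- illustrative database name
GO
SELECT DB_NAME() AS database_name,
       file_id,
       modified_extent_page_count,
       allocated_extent_page_count
FROM sys.dm_db_file_space_usage;
```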


Let’s run the update-and-insert SQL statement and then measure the modified pages

Let’s now check the modified pages by querying sys.dm_db_file_space_usage
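A query along the following lines also derives the change percentage discussed next:

```sql
SELECT DB_NAME() AS database_name,
       modified_extent_page_count,
       allocated_extent_page_count,
       CAST(modified_extent_page_count * 100.0
            / allocated_extent_page_count AS DECIMAL(5, 2)) AS changed_percentage
FROM sys.dm_db_file_space_usage;
```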

We can see that there has been a 28% change in the database. This information is very helpful and is an important factor in deciding whether to initiate a full or a differential database backup.

Let’s prepare the SQL

The script can be saved as a SQL file and executed over multiple instances using the sqlcmd command. It performs the following steps:

  1. Variable declaration: the @fullthreshold and @diffthreshold variables require initial values, which are compared against the change percentage to initiate a full or differential backup
  2. Verify the SQL Server instance version
  3. Define a cursor to hold the list of active databases on a given instance
  4. Loop through every database
  5. Calculate the pages that have changed since the last full backup and assign the value to a variable
  6. Prepare dynamic SQL to compare the thresholds and initiate a full or differential backup
  7. Reset the cursor variables
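A sketch implementing the steps above is shown below; the threshold values, backup path, and naming convention are assumptions and should be adapted to your environment:

```sql
-- 1. Variable declaration; threshold values are assumptions
DECLARE @fullthreshold INT = 80;   -- % changed at or above which a full backup runs
DECLARE @diffthreshold INT = 40;   -- % changed at or above which a differential runs
DECLARE @dbname SYSNAME, @pctchanged DECIMAL(5, 2), @sql NVARCHAR(MAX);

-- 2. modified_extent_page_count requires SQL Server 2017 (version 14) or later
IF CAST(SERVERPROPERTY('ProductMajorVersion') AS INT) < 14
BEGIN
    RAISERROR ('This script requires SQL Server 2017 or later.', 16, 1);
    RETURN;
END;

-- 3. Cursor over the online user databases
DECLARE dbcur CURSOR FAST_FORWARD FOR
    SELECT name FROM sys.databases
    WHERE state_desc = 'ONLINE' AND database_id > 4;

-- 4. Loop through every database
OPEN dbcur;
FETCH NEXT FROM dbcur INTO @dbname;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- 5. Percentage of pages changed since the last full backup
    SET @sql = N'USE ' + QUOTENAME(@dbname) + N';
        SELECT @pct = SUM(modified_extent_page_count) * 100.0
                      / SUM(allocated_extent_page_count)
        FROM sys.dm_db_file_space_usage;';
    EXEC sp_executesql @sql, N'@pct DECIMAL(5,2) OUTPUT', @pct = @pctchanged OUTPUT;

    -- 6. Compare against the thresholds and build the backup command
    IF @pctchanged >= @fullthreshold
        SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@dbname)
                 + N' TO DISK = ''C:\Backup\' + @dbname + N'_Full.bak'';';
    ELSE IF @pctchanged >= @diffthreshold
        SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@dbname)
                 + N' TO DISK = ''C:\Backup\' + @dbname
                 + N'_Diff.bak'' WITH DIFFERENTIAL;';
    ELSE
        SET @sql = NULL;   -- below both thresholds: no backup needed

    IF @sql IS NOT NULL EXEC sp_executesql @sql;

    -- 7. Reset the variables and fetch the next database
    SELECT @pctchanged = NULL, @sql = NULL;
    FETCH NEXT FROM dbcur INTO @dbname;
END;
CLOSE dbcur;
DEALLOCATE dbcur;
```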

In the following output, we can see that 48% of the database TestSQLShackDemo has changed and 70% of SQLShackDemoSBKP has changed. This results in differential backups being initiated for both.

Transaction log backups in SQL Server 2017

In most production environments, setting the right auto-growth value is a challenging and even daunting task. Understanding the VLFs and managing T-log backups is another area where database administrators spend a lot of time analyzing database growth and transaction rates of the database in order to configure the database to perform at an optimal level.

SQL Server 2017 introduces a new dynamic management function, sys.dm_db_log_stats. The function has an output column called log_since_last_log_backup_mb, whose value denotes the amount of log data generated since the last transaction log backup. We can develop a procedure based on the value of this column and use it to initiate transaction log backups efficiently. This helps maintain a consistent log backup size and prevents abrupt database auto-growth due to transactional outbursts. It also helps database administrators manage space issues effectively, thereby speeding up the database recovery process.

I recommend reading the article, Transaction Log Physical Architecture to get more insight into the internals of the transaction log file.
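The log generation figures referenced below come from a query along these lines; sys.dm_db_log_stats accepts a database ID:

```sql
SELECT DB_NAME(d.database_id) AS database_name,
       s.log_since_last_log_backup_mb,
       s.log_backup_time
FROM sys.databases AS d
CROSS APPLY sys.dm_db_log_stats(d.database_id) AS s
WHERE d.recovery_model_desc = 'FULL';
```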


Let’s initiate a transaction log backup for a specific database that has generated 10-20 MB of log. In the above output, we can see that SQLShackDemoSBKP has generated 12 MB of log data.
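The backup statement is not reproduced in the original; a sketch with assumed thresholds and path:

```sql
-- Back up the log of a FULL-recovery database that has generated
-- between 10 and 20 MB of log since its last log backup
DECLARE @dbname SYSNAME, @sql NVARCHAR(MAX);

SELECT TOP (1) @dbname = DB_NAME(d.database_id)
FROM sys.databases AS d
CROSS APPLY sys.dm_db_log_stats(d.database_id) AS s
WHERE d.recovery_model_desc = 'FULL'
  AND s.log_since_last_log_backup_mb BETWEEN 10 AND 20;

IF @dbname IS NOT NULL
BEGIN
    SET @sql = N'BACKUP LOG ' + QUOTENAME(@dbname)
             + N' TO DISK = ''C:\Backup\' + @dbname + N'_Log.trn'';';   -- illustrative path
    EXEC sp_executesql @sql;
END;
```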

After executing the above SQL, we can see that a T-log backup has been initiated on SQLShackDemoSBKP. We can schedule the script to initiate backups according to the defined threshold conditions.

Wrapping up

In this article, we walked through the concept of the checkpoint, the impact of implementing indirect checkpoints, and why indirect checkpoints are recommended on systems with larger memory footprints.

We also demonstrated the interesting and useful DMV and DMF enhancements for efficiently managing differential and transaction log backups.

The T-SQL scripts shown can be easily customized for a SQL Server 2017 environment.

All in all, SQL Server 2017 provides many features; my favorite among them is smart backup handling, which is a really intuitive capability.

Table of contents

Database Backup and Restore process in SQL Server – series intro
An overview of the process of SQL Server backup-and-restore
Understanding the SQL Server Data Management Life Cycle
Understanding SQL Server database recovery models
Understanding SQL Server Backup Types
Backup and Restore (or Recovery) strategies for SQL Server database
Discussing Backup and Restore Automation using SQLCMD and SQL Server agent
Understanding Database snapshots vs Database backups in SQL Server
SqlPackage.exe – Automate SQL Server Database Restoration using bacpac with PowerShell or Batch techniques
Smart database backup in SQL Server 2017
How to perform a Page Level Restore in SQL Server
Backup Linux SQL databases Using PowerShell and Windows Task Scheduler
SQL Server Database backup and restore operations using the Cloud
Tail-Log Backup and Restore in SQL Server
SQL Server Database Backup and Restore reports
Database Filegroup(s) and Piecemeal restores in SQL Server
In-Memory Optimized database backup and restore in SQL Server
Understanding Backup and Restore operations in SQL Server Docker Containers
Backup and Restore operations with SQL Server 2017 on Docker containers using SQL Operations Studio
Interview questions on SQL Server database backups, restores and recovery – Part I
Interview questions on SQL Server database backups, restores and recovery – Part II
Interview questions on SQL Server database backups, restores and recovery – Part III
Interview questions on SQL Server database backups, restores and recovery – Part IV

References




Prashanth Jayaram

I’m a database technologist with 11+ years of rich, hands-on experience in database technologies. I am a Microsoft Certified Professional, backed by a Master’s degree in Computer Applications.

My specialty lies in designing and implementing high-availability solutions and cross-platform database migrations. The technologies I currently work on are SQL Server, PowerShell, Oracle, and MongoDB.
