Prashanth Jayaram

Smart database backups in SQL Server 2017

May 14, 2018

So far in this series, we’ve discussed several phases of backup, starting with planning, creation, strategy, and implementation. In this article, we are going to see how database administrators can define a strategy to improve backup performance and efficiently manage backups in SQL Server 2017. The following are the topics of discussion:

  1. Discuss checkpoints
  2. Discuss the enhancements made in the Dynamic Management View (DMV) sys.dm_db_file_space_usage for smart differential backups
  3. Discuss the Dynamic Management Function (DMF) sys.dm_db_log_stats, introduced for smart transaction log backups
  4. Understand the functioning of smart differential backup and its internals
  5. Understand the Smart transaction log backup process and its internals
  6. T-SQL scripts
  7. And more…


Background

A checkpoint is a background process that writes dirty pages to disk. An automatic checkpoint performs a full scan of the pages in the buffer pool, identifies the dirty pages that are yet to be written to disk, and finally writes those pages to disk. On SQL instances that do not have many dirty pages in the buffer pool, this is a trivial operation. However, on SQL instances that host OLTP databases and use large amounts of memory, the sequential scan of all buffer pages can impact the performance of the system.

With SQL Server 2012, indirect checkpoints were introduced. In this mode, the dirty page manager maintains a dirty page list and keeps track of all the dirty page modifiers of the database. By default, it runs every 60 seconds and tracks the dirty pages that need to be flushed.
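Indirect checkpoints are enabled per database by setting a target recovery time. A minimal sketch, assuming an illustrative database name:

```sql
-- Enable an indirect checkpoint with a 60-second recovery target
-- (SQLShackDemoSBKP is an illustrative database name)
ALTER DATABASE SQLShackDemoSBKP
SET TARGET_RECOVERY_TIME = 60 SECONDS;

-- Verify the setting; 0 means the database uses automatic checkpoints
SELECT name, target_recovery_time_in_seconds
FROM sys.databases
WHERE name = 'SQLShackDemoSBKP';
```

Note that databases created on SQL Server 2016 and later default to a target recovery time of 60 seconds, i.e., indirect checkpoints.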

For example, if the indirect checkpoint target is set to 120 seconds, then individual pages are flushed roughly 120 seconds after they were made dirty. This changes the I/O pattern: instead of a significant spike at each target interval, we see small writes as the pages change. Because only the dirty page list needs to be traversed, the overhead of tracing and flushing dirty pages is reduced, which results in a significant improvement in backup performance. To review further, go through the reference section below to understand the checkpoint and backup performance improvement process in detail.

On a database operating in automatic checkpoint mode, a scan of every buffer unit in the buffer pool is necessary to find possible dirty pages. With an indirect checkpoint, on the other hand, only those pages of the buffer pool that have been dirtied have to be dealt with.

For example, let’s consider a system with 2 TB of memory, which contains ~250 million buffer units, and suppose 100 of its pages are dirty. Assume that it requires 5 CPU cycles to trace and identify the status of each buffer unit. To traverse 250 million BUFs would then require about 1.25 billion CPU cycles, and to what end? Simply to find 100 dirty pages in automatic checkpoint mode! With indirect checkpoint mode, however, those 100 dirty pages are hardly any work at all: the buffer pool scanning time is eliminated entirely.

Smart Differential Backup

Before the advent of SQL Server 2017, identifying which extents had been altered since the last full backup meant inspecting the Differential Changed Map (DCM) with the DBCC PAGE command. The DCM is a bitmap referenced by SQL Server to trace the pages modified since the last full backup. It is stored in DCM pages, one per GAM interval. A differential backup process identifies those extents with a CHANGED status in the DCM. There are usually three additional extents included in a differential backup.

It’s demo time

SQLShackAuthorDS is a sample table created with two columns, id and authorname.
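The setup script is not reproduced here; a minimal equivalent, with assumed column types and sample values, might look like this:

```sql
-- Illustrative re-creation of the demo table; column types are assumptions
CREATE TABLE SQLShackAuthorDS
(
    id         INT,
    authorname VARCHAR(100)
);
GO

INSERT INTO SQLShackAuthorDS (id, authorname)
VALUES (1, 'Author One'),
       (2, 'Author Two');
```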


DBCC PAGE arguments:

  • Database name or database ID
  • File ID: 1
  • Page number: 6, to retrieve the DCM page
  • Output option: 3, to retrieve the page header and the details of every row for data interpretation
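Putting those arguments together, the DCM page can be dumped as follows; trace flag 3604 redirects the DBCC output to the client session:

```sql
DBCC TRACEON (3604);                       -- route DBCC output to the client
DBCC PAGE ('SQLShackDemoSBKP', 1, 6, 3);   -- database, file ID 1, page 6 (DCM), output option 3
```

Note that DBCC PAGE is an undocumented command, so its output format may change between versions.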

In the DIFF_MAP section of the output, we can see the status of the DCM pages. The output lists the extents modified since the last full backup. There are currently five modified extents listed: 1:0, 1:8, 1:16, 1:64, and 1:152.


Now, let’s go ahead and update the SQLShackAuthorDS table with some dummy values for the ids 40, 50, 60, and 70, and look for the modified extents.
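The update statement itself is not shown; assuming the SQLShackAuthorDS table described above, a representative statement could be:

```sql
-- Dirty a few pages by updating rows with the given ids (values are dummies)
UPDATE SQLShackAuthorDS
SET authorname = 'Updated Author'
WHERE id IN (40, 50, 60, 70);
```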

We can see that the modified extents listed are 1:0, 1:8, 1:16, 1:24, 1:40, 1:64, 1:128, 1:52, and 1:304. A total of 9 extents have been modified since the last full backup.

Let’s now initiate a differential backup
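A differential backup is taken with the WITH DIFFERENTIAL option; the target path below is illustrative:

```sql
BACKUP DATABASE SQLShackDemoSBKP
TO DISK = 'C:\Backup\SQLShackDemoSBKP_Diff.bak'   -- illustrative path
WITH DIFFERENTIAL, STATS = 10;
```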


Let’s perform the insert operation and identify the modified extents
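Again, the exact insert is not shown; any statement that allocates new pages will do, for example:

```sql
-- Insert a few more rows to dirty additional extents (values are dummies)
INSERT INTO SQLShackAuthorDS (id, authorname)
VALUES (101, 'New Author One'),
       (102, 'New Author Two');
```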

In the output, the listed modified extents are 1:0, 1:8, 1:16, 1:24, 1:40, 1:64, 1:128, 1:36, 1:52, 1:160, 1:68, 1:192, and 1:304. A total of 13 extents have been modified since the last full backup.

Again, let’s initiate a differential backup. As we can see, a total of 105 pages were processed during the backup.


It looks like SQL Server adds three more extents beyond those listed in the DCM: 98 pages were processed during the first differential backup, and 4 extents were modified by the update-and-insert operations. In total, 98+4+3=105 pages, as you can see in the above output. The DCM is reset by initiating a full backup.

Prior to SQL Server 2017, this was the hard way to estimate and identify the modified extents.

Let’s walk through the details of the SQL Server 2017 DMV and DMF that each expose an extra column to measure the modified extents in a simpler way.

The DMV sys.dm_db_file_space_usage has a new column called modified_extent_page_count. This column tells us how many pages have changed since the last full backup. It can be used to track differential changes in the database and to decide whether a differential backup would be beneficial.

Now, run the following SQL to check how many pages were modified
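The query is not reproduced in the original; it would be along these lines:

```sql
USE SQLShackDemoSBKP;   -- illustrative database name
GO
SELECT DB_NAME() AS database_name,
       file_id,
       modified_extent_page_count,
       allocated_extent_page_count
FROM sys.dm_db_file_space_usage;
```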


Let’s run the update-and-insert SQL statement and then measure the modified pages

Let’s now check the modified pages by querying sys.dm_db_file_space_usage
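A query along the following lines also derives the change percentage discussed next:

```sql
SELECT DB_NAME() AS database_name,
       modified_extent_page_count,
       allocated_extent_page_count,
       CAST(modified_extent_page_count * 100.0
            / allocated_extent_page_count AS DECIMAL(5, 2)) AS changed_percentage
FROM sys.dm_db_file_space_usage;
```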

We can see that there has been a 28% change in the database. This information is very helpful and is an important factor in deciding whether to initiate a full or a differential database backup.

Let’s prepare the SQL

The script can be saved as a SQL file and executed over multiple instances using the sqlcmd command. It performs the following steps:

  1. Variable declaration: the @fullthreshold and @diffthreshold variables require initial values, which are compared against the change percentage to initiate a full or differential backup
  2. Verify the SQL Server instance version
  3. Define a cursor to hold the list of active databases on a given instance
  4. Loop through every database
  5. Calculate the pages that have changed since the last full backup and assign the value to a variable
  6. Prepare dynamic SQL to compare the thresholds and initiate a full or differential backup
  7. Reset the cursor variables
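A sketch implementing the steps above is shown below; the threshold values, backup path, and naming convention are assumptions and should be adapted to your environment:

```sql
-- 1. Variable declaration; threshold values are assumptions
DECLARE @fullthreshold INT = 80;   -- % changed at or above which a full backup runs
DECLARE @diffthreshold INT = 40;   -- % changed at or above which a differential runs
DECLARE @dbname SYSNAME, @pctchanged DECIMAL(5, 2), @sql NVARCHAR(MAX);

-- 2. modified_extent_page_count requires SQL Server 2017 (version 14) or later
IF CAST(SERVERPROPERTY('ProductMajorVersion') AS INT) < 14
BEGIN
    RAISERROR ('This script requires SQL Server 2017 or later.', 16, 1);
    RETURN;
END;

-- 3. Cursor over the online user databases
DECLARE dbcur CURSOR FAST_FORWARD FOR
    SELECT name FROM sys.databases
    WHERE state_desc = 'ONLINE' AND database_id > 4;

-- 4. Loop through every database
OPEN dbcur;
FETCH NEXT FROM dbcur INTO @dbname;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- 5. Percentage of pages changed since the last full backup
    SET @sql = N'USE ' + QUOTENAME(@dbname) + N';
        SELECT @pct = SUM(modified_extent_page_count) * 100.0
                      / SUM(allocated_extent_page_count)
        FROM sys.dm_db_file_space_usage;';
    EXEC sp_executesql @sql, N'@pct DECIMAL(5,2) OUTPUT', @pct = @pctchanged OUTPUT;

    -- 6. Compare against the thresholds and build the backup command
    IF @pctchanged >= @fullthreshold
        SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@dbname)
                 + N' TO DISK = ''C:\Backup\' + @dbname + N'_Full.bak'';';
    ELSE IF @pctchanged >= @diffthreshold
        SET @sql = N'BACKUP DATABASE ' + QUOTENAME(@dbname)
                 + N' TO DISK = ''C:\Backup\' + @dbname
                 + N'_Diff.bak'' WITH DIFFERENTIAL;';
    ELSE
        SET @sql = NULL;   -- below both thresholds: no backup needed

    IF @sql IS NOT NULL EXEC sp_executesql @sql;

    -- 7. Reset the variables and fetch the next database
    SELECT @pctchanged = NULL, @sql = NULL;
    FETCH NEXT FROM dbcur INTO @dbname;
END;
CLOSE dbcur;
DEALLOCATE dbcur;
```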

In the following output, we can see that 48% of the database TestSQLShackDemo has changed and 70% of SQLShackDemoSBKP has changed. This results in differential backups being initiated for both.

Transaction log backups in SQL Server 2017

In most production environments, setting the right auto-growth value is a challenging and even daunting task. Understanding the VLFs and managing T-log backups is another area where database administrators spend a lot of time analyzing database growth and transaction rates of the database in order to configure the database to perform at an optimal level.

SQL Server 2017 introduces a new dynamic management function, sys.dm_db_log_stats. The function has an output column called log_since_last_log_backup_mb, whose value denotes the amount of log data generated since the last transaction log backup. We can develop a procedure based on the value of this column and use it to initiate transaction log backups efficiently. This helps maintain a consistent log backup size and prevents abrupt database auto-growth due to transactional outbursts. It also helps database administrators manage space issues effectively, thereby speeding up the database recovery process.

I recommend reading the article, Transaction Log Physical Architecture to get more insight into the internals of the transaction log file.
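The log generation figures referenced below come from a query along these lines; sys.dm_db_log_stats accepts a database ID:

```sql
SELECT DB_NAME(d.database_id) AS database_name,
       s.log_since_last_log_backup_mb,
       s.log_backup_time
FROM sys.databases AS d
CROSS APPLY sys.dm_db_log_stats(d.database_id) AS s
WHERE d.recovery_model_desc = 'FULL';
```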


Let’s initiate a transaction log backup for a specific database that has generated 10-20 MB of log. In the above output, we can see that SQLShackDemoSBKP has generated 12 MB of log data.
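The backup statement is not reproduced in the original; a sketch with assumed thresholds and path:

```sql
-- Back up the log of a FULL-recovery database that has generated
-- between 10 and 20 MB of log since its last log backup
DECLARE @dbname SYSNAME, @sql NVARCHAR(MAX);

SELECT TOP (1) @dbname = DB_NAME(d.database_id)
FROM sys.databases AS d
CROSS APPLY sys.dm_db_log_stats(d.database_id) AS s
WHERE d.recovery_model_desc = 'FULL'
  AND s.log_since_last_log_backup_mb BETWEEN 10 AND 20;

IF @dbname IS NOT NULL
BEGIN
    SET @sql = N'BACKUP LOG ' + QUOTENAME(@dbname)
             + N' TO DISK = ''C:\Backup\' + @dbname + N'_Log.trn'';';   -- illustrative path
    EXEC sp_executesql @sql;
END;
```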

After executing the above SQL, we can see that a T-log backup has been initiated on SQLShackDemoSBKP. We can schedule the script to initiate backups according to the defined threshold conditions.

Wrapping up

In this article, we walked through the concept of the checkpoint, the impact of implementing indirect checkpoints, and why indirect checkpoints are recommended on systems with larger memory footprints.

We also demonstrated the interesting and useful DMV and DMF enhancements for efficiently managing differential and transaction log backups.

The T-SQL scripts shown can be easily customized for a SQL Server 2017 environment.

All in all, SQL Server 2017 provides many features; my favorite among them is smart backup handling, which is a really intuitive capability.

Table of contents

Database Backup and Restore process in SQL Server – series intro
An overview of the process of SQL Server backup-and-restore
Understanding the SQL Server Data Management Life Cycle
Understanding SQL Server database recovery models
Understanding SQL Server Backup Types
Backup and Restore (or Recovery) strategies for SQL Server database
Discussing Backup and Restore Automation using SQLCMD and SQL Server agent
Understanding Database snapshots vs Database backups in SQL Server
SqlPackage.exe – Automate SQL Server Database Restoration using bacpac with PowerShell or Batch techniques
Smart database backup in SQL Server 2017
How to perform a Page Level Restore in SQL Server
Backup Linux SQL databases Using PowerShell and Windows Task Scheduler
SQL Server Database backup and restore operations using the Cloud
Tail-Log Backup and Restore in SQL Server
SQL Server Database Backup and Restore reports
Database Filegroup(s) and Piecemeal restores in SQL Server
In-Memory Optimized database backup and restore in SQL Server
Understanding Backup and Restore operations in SQL Server Docker Containers
Backup and Restore operations with SQL Server 2017 on Docker containers using SQL Operations Studio
Interview questions on SQL Server database backups, restores and recovery – Part I
Interview questions on SQL Server database backups, restores and recovery – Part II
Interview questions on SQL Server database backups, restores and recovery – Part III
Interview questions on SQL Server database backups, restores and recovery – Part IV

References




Prashanth Jayaram

I’m a database technologist with 11+ years of rich, hands-on experience in database technologies. I am a Microsoft Certified Professional, backed by a Master’s degree in Computer Applications.

My specialty lies in designing and implementing high-availability solutions and cross-platform database migrations. The technologies I currently work on are SQL Server, PowerShell, Oracle, and MongoDB.
