Rajendra Gupta
disk failure and failover

Database-level health detection in SQL Server Always On Availability Groups

July 30, 2020 by

This is the 8th article in the series for SQL Server Always On Availability Groups.

Introduction

SQL Server Always On Availability Groups create a group of databases that fails over together in case the primary replica is unavailable. These availability group databases failover manually or automatically depending upon your AG configuration. By default, SQL Server fails over all databases from the primary replica to the secondary replica if the primary replica is down or SQL Services stopped in the primary instance.

You can create multiple SQL Server Always On Availability Group in SQL Server configuration. Usually, we group the application dependent databases in an availability group. These databases must be failed over together for smooth working on the application.

Suppose you have two databases in an AG group. Your primary instance is online, but one of the AG databases becomes unhealthy on the primary replica. Your application reports issue because it could not access a database in the AG group. SQL Server does not perform a failover of the AG group to the secondary replica. It requires manual intervention to fix the issues at the primary replica or perform a manual failover to the secondary replica. Sometimes we want that SQL Server should check the database health as well in an AG group and perform automatic failover if any of the databases is not healthy.

Do we have the option to enable the database-level health detection in SQL Server Always ON? Let’s explore it in this article.

Prerequisites

You should configure SQL Server Always On Availability Groups in synchronized mode. You can refer to this article, Add a new node into existing SQL Server Always On Availability Groups for further insights.

In the earlier articles, we have created an availability group with the following configuration.

  • Current primary replica: SQLNode1
  • Secondary replica in Synchronous mode: SQLNode2
  • Secondary replica in Asynchronous mode: SQLNode3
  • Two availability group databases [MyNewDB] and [SQLShackDemo]

SQL Server Always on Availability Group setup

An Overview of the database-level health detection for SQL Server Always On Availability Groups

SQL Server 2016 introduced Enhanced database failover to monitor the database health and trigger the failover for that availability group. This feature is known as Database Level Health Detection in an availability group.

We can enable this feature at the availability group; therefore, it monitors all databases corresponding to that availability group. It cannot be enabled for an individual database in an AG group. Your database should be in synchronized data commit with automatic failover mode so that SQL Server can failover it in case of any issues.

SQL Server performs the following checks for the database-level health detection. If the database status is online and SQL Server could not write in the transaction log of an AG database, it initiates AG failover. It also checks for the following error ids as part of the database health check.

Suppose you have the following AG architecture with one secondary replica. Your AG database stores data files in D drive and log file in the L drive.

disk configuration

Now, on your primary replica, log drive L fails. SQL Server is not able to write in the AG database transaction log. It fails a successful database health detection. SQL Server availability group fails over from primary to the secondary replica.

disk failure and failover

In another scenario, your data drive fails on the primary replica, but your log drive is intact. In this case, no AG failover occurs because SQL Server can write in the transaction log.

data disk failure and failover

Database-level health detection does not monitor the idle database or disk uptime. It does not directly monitor the database files. In the above image, we saw that disk failure causes the availability group automatic failover. Suppose your AG database is idle and does not have any active transactions. In the idle state, SQL Server does not perform any physical activity to the disk. Now, suppose, you have other databases as well in the disk and disk gets full due to those database transactions. Due to disk space issues, SQL Server might not be able to perform any transactions for the databases lying in that disk. Later. You perform a transaction and SQL Server not able to write database checkpoints and write the data into the disk. At that time, SQL Server finds unhealthy database status and perform a database failover as a result of the database health change.

Similarly, if your database can satisfy the data page from the memory, SQL Server does not require to perform a physical read from the disk. It gets the data from the buffer pool memory. It might also not cause your database to failover for SQL Server always-on availability group.

Enhanced database-level failover and SQL Server event ids

SQL Server monitors the following event ids for the enhanced database-level failover.

Event ID

Description

605

Database or page corruption

823

Database checkpoint failure

829

Disk corruption issues

832

Hardware or memory-related issues

1101,1105

Disk space issues in the filegroup. In this case, SQL Server not able to allocate a new page in the filegroup.

5102

Missing filegroup ID requests

5180

Wrong file ID requests

5515

SQL Filestream container issues

5534

Filestream log corruption

5535

Filestream data container corruption

9004

Transaction log corruption

SQL Server logs get error id 41653 once availability group failovers due to database-level health detection.

Configure enhanced database failover in SQL Server Always On Availability Groups

You can configure the enhanced database failover using both the GUI and T-SQL. It is an optional feature. If you have an existing availability group, you can use the dynamic management view sys.availability_groups to check the enhanced database failover is enabled or not.

Execute the below database management view (DMV) in the primary replica instance to check database-level health detection status for your existing availability group. The DMV sys.availability_groups has a column [db_failover] to check the configuration. If this DMV returns value 1 for the [db_failover] column, it means database-level health detection is enabled else zero value shows the feature is not enabled for all databases in the availability group.

As shown below, currently, we are not using this feature.

Verify enhanced database failover

We can enable this feature while configuring a new availability group or in an existing availability group.

Enable enhanced database failover using GUI in an existing availability group

To enable the enhanced database failover, connect to the primary replica using SSMS. In the Always On Availability Group, right-click on the availability group and open its properties.

In the AG property, you get an option as Database-level health detection.

Enable Database-level health detection

Put a tick for the Database-level health detection and generate the T-SQL script.

Put a tick on Database-level health detection

It uses the Alter Availability Group script with the DB_FAILOVER argument.

Create a new SQL Server Always On Availability Group with enhanced database failover

To enable the database-level health detection while creating a new availability group, enable it in the Specify option page, as shown below.

Option to enable in the new availability group wizard

Extended event to track the database health detection

SQL Server 2016 onwards we have a new extended event availability_replica_database_fault_reporting to track the replica restart in case of the unhealthy database. You can add this event to the extended event session on the primary replica.

Extended event to track the database

Disable enhanced database failover for SQL Server Always On Availability Groups

You can disable the enhanced database failover without restarting SQL Services. To disable the feature, either remove the tick from the check from the database-level health detection or use the following query.

Availability group properties

Conclusion

In this article, we explored the useful enhanced database-level failover feature in an availability group. You should enable it for the production databases. You should also group the database appropriately in an AG group so that it gets failover in case any database reports fails the database-level health detection. You can create multiple AG groups and group databases as per their availability and application requirements.

Table of contents

A comprehensive guide to SQL Server Always On Availability Groups on Windows Server 2016
Configure Domain Controller and Active Directory for SQL Server Always On Availability Groups
Configure failover clusters, storage controllers and quorum configurations for SQL Server Always On Availability Groups
Install SQL Server 2019 on Windows Server 2016 with SQL Server Always On Availability Groups
Add a new node into existing SQL Server Always On Availability Groups
Configure Managed Service Accounts for SQL Server Always On Availability Groups
Add or remove a node from SQL Server Always On Availability Groups using T-SQL scripts
Database-level health detection in SQL Server Always On Availability Groups
Automatic Page Repair in SQL Server Always On Availability Groups
Deploy a domain-independent Windows Failover Cluster for SQL Server Always On Availability Groups
Configure a SQL Server Always On Availability Group on the domain-independent Failover Cluster
An overview of distributed SQL Server Always On Availability Groups
Deploy a distributed SQL Server Always On Availability Group
Monitor and failover a Distributed SQL Server Always On Availability Group
Transparent Data Encryption for SQL Server Always On Availability Groups
Configure SQL Server replication for a database in SQL Server Always On Availability Groups
Configuring SQL Server replication for distribution databases in SQL Server Always On Availability Groups
Explore Cross-database MSDTC for distributed transactions in SQL Server Always On Availability Groups
Deploy MSDTC for distributed transactions in SQL Server Always On Availability Groups
Restore an existing availability group database participating in SQL Server Always On Availability Groups
Exploring AG dashboards for monitoring SQL Server Always On Availability Groups
Backup compression in TDE enabled databases in SQL Server Always On Availability Groups
Impact of dropping a login in the active directory tied to SQL Server Always On Availability Groups
Refresh SQL Server Always On Availability Group databases using DBATools PowerShell
Configure SQL Server Reporting Services databases in SQL Server Always On Availability Groups
Suspend and Resume Data Movement in SQL Server Always On Availability Groups
Explore failover types in SQL Server Always On Availability Groups
Explore SQL Server PolyBase External Tables in SQL Server Always On Availability Groups
SQL Server Always On Availability Groups for SQL Server Linux instances
Column-level SQL Server encryption with SQL Server Always On Availability Groups
Make the most of secondary replicas in SQL Server Always On Availability Groups
Apply SQL Server patches or cumulative updates in SQL Server Always On Availability Groups
Monitor SQL Server Always On Availability groups using extended events

Rajendra Gupta
1,366 Views