Rajendra Gupta
Monitor the distributed availability

Monitor and failover a Distributed SQL Server Always On Availability Group

August 26, 2020 by

In this 14th article of SQL Server Always On Availability Group series, we monitor and failover a distributed AG.

Introduction

We can implement a distributed availability group between multiple independent failover cluster nodes. In the previous articles, we understood the concept and also walked through T-SQL scripts to implement it. In this article, we will explore the T-SQL scripts to monitor the distributed availability group and failover process.

Prerequisites

You should follow previous articles, An Overview of distributed SQL Server Always On Availability Groups, and Deploy a distributed SQL Server Always On Availability Group in the series of SQL Server Always On Availability groups and configure a distributed availability group before you go further in this article.

Verification of the distributed SQL Server Always On Availability Group

View the distributed availability group in SSMS

We have configured the distributed availability group between two different Windows failover clusters. Now, connect to the primary replica of the primary cluster and expand the availability group.

You can see a keyword Distributed for the distributed availability group.

distributed SQL Server Always On Availability Group

If you connect to the secondary replica of the primary cluster, it does not show the distributed availability group. You do not see it here because the distributed AG configured on the listener URL that always connects with the primary replica. Your database [MyNewDB] is in the synchronized state as well.

View secondary replica

Similarly, you can connect to the forwarder replica and view the distributed availability group.

forwarder replica

View the AG dashboard for the SQL Server Always On Availability Group

Right-click on the distributed availability group, and you see a few options such as Show Dashboard, Failover…, Delete… and Properties are greyed out. You cannot launch these options from the SSMS GUI wizard.

View the AG dashboard for the SQL Server Always On Availability Group

If you try to launch the dashboard from the parent folder Always On Availability group for distributed availability group, you get the below error.

Dashboard error for SQL Server Always On Availability Groups

View the failover cluster resources for the distributed SQL Server Always On Availability Group

In a traditional availability group, you see the resources such as listeners in the failover cluster manager.

View the failover cluster resources

For a distributed SQL Server Always On Availability Group, you do not see any resources in this failover cluster manager. A distributed AG stores all metadata inside SQL Server. You cannot manage any distributed AG resources from the cluster manager.

Monitor distributed availability groups using dynamic management views

We can use dynamic management views(DMV’s) to monitor a distributed availability group, its performance, issues. We can use the following dynamic management views ( DMV) for AG monitoring.

The DMV to check replica status, role and failover mode

The below query returns the availability of replicas and their state information. It filters the records for a distributed availability group using the is_distributed column of the sys.availabilty_groups DMV.

In the output, we can see underlying AG as well that we used to set up a distributed availability group. The current sync status is healthy.

DMV to check replica status, role and failover mode

The query to check the underlying SQL Server Always On Availability Groupmetadata

In the below output, we can see underlying availability group configurations details such as endpoint_url, connected state, synchronization health, failover mode.

Query to check the underlying AG metadata

The query to check distributed availability group performance

In a traditional always on, we can use the SSMS dashboard or DMV to check AG performance. In the distributed availability group, we can check current sync status, log rate, redo queue size, redo rate from the DMV sys.dm_hadr_database_replicas_states.

Query to check distributed availability group performance

The query to filter the OS performance counters value for a distributed availability group

In SQL Server, we use DMV sys.dm_os_performance_counters to retrieve OS performance counter and their values directly from the SQL Server. You can use a similar DMV to monitor the health of a distributed availability group as well.

In this query, specify the distributed AG name, and you get the performance counters’ values.

Query to filter the OS performance counter

You can create a SQL job to capture these values frequently and monitor AG health conditions.

The query to monitor the automatic seeding in a distributed availability group

In a distributed availability group, it uses automatic seeding for the following:

  • It uses direct seeding for database synchronization from the primary replica (primary cluster – Global primary) to the secondary replica (primary cluster). In our case, it uses automatic seeding from [SQLNode1\INST1] to [SQLNode2\INST2] for [AG1] availability group
  • It uses another direct seeding from the global primary (primary cluster) to the forwarder( primary replica of the secondary cluster). In our case, it uses automatic seeding from the [AG1] to [AG2] availability group from the [MyNewDB] database

We can see the automatic seeding status, its start and end time, source to destination replica in the below image. It uses an additional DMV sys.dm_hadr_automatic_seeding to monitor sync status. It shows NULL for the failure_state and error_code because our automatic seeding is successful.

Query to monitor the automatic seeding

Failover from the primary to secondary cluster AG in a distributed SQL Server Always On Availability Group

In a distributed availability group, we can do a manual failover. It does not support automatic failovers because the replicas are in different clusters. Let’s perform a manual failover for distributed AG.

  • We have a distributed availability group in asynchronous data commit configuration. In this configuration, we can have data loss if we perform an AG failover. We can set the distributed availability group in the synchronized mode. You need to run the query in both primary replicas of separate clusters

Change the data commit mode

  • Monitor the distributed availability group. It should be in SYNCHRONIZED mode, and last_hardened_lsn should be the same for the primary replica and forwarder. It might take some time for AG to be in the synchronized state depending upon the active transactions. You should not proceed until AG’s are fully synchronized

    Monitor the distributed availability

  • Run the below query to change the role of a primary replica of the primary Availability Group to secondary in a distributed availability group. You must run this command on the primary replica of the primary cluster( global primary)

    It modifies the distributed availability group in an offline state. It terminates all connections for the AG in the global replica.

  • Perform a manual failover with FORCE_FAILOVER_ALLOW_DATA_LOSS parameter. Run this command on both the current primary replica of the secondary availability group

  • You can run the monitoring query and see that the availability group [ag2] is now having the primary role

Failover a distributed AG

  • You can also monitor the forwarder SQL Server error logs. It gives details information about transitioning the role from the forwarder to the global primary

    View SQL Server error log

It might take a few minutes for databases to be in synchronized state again. You can monitor the health of the distributed availability group using the above-specified examples.

You can follow the same steps and perform a failback as well.

Impact of the distributed availability group by the failover within a cluster

In the previous section, we performed a distributed availability group failover from the global primary replica to forwarder replica. As you know, that the distributed AG works on the SQL Listener, and a SQL listener always points to the primary replica.

Suppose you are applying Windows patches to the AG replicas in the primary site. For this requirement, you follow below high-level steps.

  • Apply Windows patch on the secondary replica
  • Perform a failover from the Current primary replica to secondary replica
  • Apply Windows patch on the current secondary replica
  • Failback

In my demo, the current primary replica is on SQLNode1 for the primary cluster.

Let’s perform a failover from the SQLNode1 to SQLNode2. You can note here that both the nodes part of a Windows failover cluster. It affects the [MyNewDB] database that is part of the distributed availability group.

AG failover wizard

It performs a failover for the [AG] availability group.

Failover status

Rerun the query to check the current replica owner. Oh! Where are my distributed availability group?

View AG health

Wait! Don’t worry. We have switched the distributed availability group, so the new primary replica is the SQLNode2.

Refresh the availability group in the SQLNode2, and you see the [AG1] availability group is primary on it.

Refresh the availability group

It also causes a distributed availability group to reestablish the connections from the global primary. During the failover, the transactions could not move from the global primary to the forwarder. You can see similar events in the SQL Server logs.

View error log entries

Conclusion

In this article, we performed failover in a distributed SQL Server Always On Availability Group and monitored its performance using dynamic management views.

Attachment

Table of contents

A comprehensive guide to SQL Server Always On Availability Groups on Windows Server 2016
Configure Domain Controller and Active Directory for SQL Server Always On Availability Groups
Configure failover clusters, storage controllers and quorum configurations for SQL Server Always On Availability Groups
Install SQL Server 2019 on Windows Server 2016 with SQL Server Always On Availability Groups
Add a new node into existing SQL Server Always On Availability Groups
Configure Managed Service Accounts for SQL Server Always On Availability Groups
Add or remove a node from SQL Server Always On Availability Groups using T-SQL scripts
Database-level health detection in SQL Server Always On Availability Groups
Automatic Page Repair in SQL Server Always On Availability Groups
Deploy a domain-independent Windows Failover Cluster for SQL Server Always On Availability Groups
Deploy a domain-independent Windows Failover Cluster for SQL Server Always On Availability Groups
An overview of distributed SQL Server Always On Availability Groups
Deploy a distributed SQL Server Always On Availability Group
Monitor and failover a Distributed SQL Server Always On Availability Group
Transparent Data Encryption for SQL Server Always On Availability Groups
Configure SQL Server replication for a database in SQL Server Always On Availability Groups
Configuring SQL Server replication for distribution databases in SQL Server Always On Availability Groups
Explore Cross-database MSDTC for distributed transactions in SQL Server Always On Availability Groups
Deploy MSDTC for distributed transactions in SQL Server Always On Availability Groups
Restore an existing availability group database participating in SQL Server Always On Availability Groups
Exploring AG dashboards for monitoring SQL Server Always On Availability Groups
Backup compression in TDE enabled databases in SQL Server Always On Availability Groups
Impact of dropping a login in the active directory tied to SQL Server Always On Availability Groups
Refresh SQL Server Always On Availability Group databases using DBATools PowerShell
Configure SQL Server Reporting Services databases in SQL Server Always On Availability Groups
Suspend and Resume Data Movement in SQL Server Always On Availability Groups

Rajendra Gupta
489 Views