Rajendra Gupta

Monitor SQL Server Always On Availability groups using extended events

October 20, 2020 by

In this 33rd article of SQL Server Always On Availability Group series, we will use extended events to monitor the availability group.

Introduction

Database professionals’ primary role is to do proactive monitoring for ensuring system availability. DBA must investigate the problem root cause analysis in case any critical event or downtime occurs. You can use various things for investigation purposes such as SQL Server logs, Windows & Cluster logs, and dynamic management views output, profiler and extended events.

SQL Server Always On Availability Group provide robust high availability and disaster recovery solution. It is equally essential and beneficial to monitor the AG group ownership, their synchronization states, failover, critical parameters. Let’s go ahead and see how extended events are used in monitoring the availability groups.

Environment details

You can use earlier articles in this series (see TOC at the bottom) and configure two-node SQL Server Always On Availability group replicas as shown below. The availability group databases are in the synchronized commit mode.

  • Primary Replica: SQLNode1\INST1
  • Secondary Replica: SQLNode2\INST1
  • Database: [MyNewDB]

SQL Server Always On Availability group environment

SQL Server Always On Monitoring using the extended events

SQL Server Extended events are a lightweight monitoring tool that captures useful SQL instance and database parameters. We can collect the required database and use it for monitoring and troubleshooting purpose. For example, in an article, Monitoring SQL Server deadlocks using the system_health extended event, we use it to get deadlock XML and graph using the default system_health session. We can use SSMS GUI or t-SQL to create, configure and retrieve data from extended event files.

Once we configure the availability group, the create Availability Group Wizard automatically creates the AlwaysOn_health extended event session on all replicas participating in the AG configuration. You can view it in SSMS navigating to Management -> Extended Events -> AlwaysOn_health.

If you configure the availability group using t-SQL or new availability group, it does not start the alwayson_health event session. Therefore, you should verify that the extended event session is running and collecting the required data.

AlwaysOn_health extended event

To understand the AlwaysOn_health event session, it is a good idea to script it out. Right-click on the event session-> Script Session as -> Create To-> New Query editor window.

Script Session

As shown above, it has several extended events along with the error numbers to capture. We use the sys.dm_xe_objects to retrieve a brief description of these extended events.

extended events

alwayson_ddl_executed:

This event occurs when we execute a data definition language (DDL) statement such as CREATE, ALTER, DROP on an availability group database. It does not capture detailed information about DDL. For example, you cannot retrieve an execution plan. Its main purpose is to indicate you with the DDL followed by planned or unexpected failover.

availability_group_lease_expired:

If the Windows failover cluster and availability group have a connectivity issue, it might cause the lease expired messages in the event logs. You might face automatic failover for the synchronous replica’s in case it occurs on the primary replica.

availability_replica_automatic_failover_validation:

For an automatic failover, SQL Server validates that the primary target replica is synchronized or not. It provides an AG failover point for failovers. DBA’s can use the information to investigate the cause of an automatic failover using this extended event.

availability_replica_manager_state_change:

This event occurs if the availability replica manager states changes. It is also a useful event session to investigate the reason for a different state.

availability_replica_state_change:

We can use this event to monitor the state of the availability group replica. It is also helpful to understand the internal of an AG failover.

error_reported:

It tracks various error numbers to find out the connectivity errors for availability group endpoints. You can query the sys.messages table to check the description for these different error id’s.

In the query output, you get the error id and its corresponding description.

error description

To summaries these error ids you can refer to the following table for errors:

Category

Error id

description

Consistency issues\ Corruption

823

824

829

Logical consistency error, disk corruption, database consistency issues reported by DBCC CHECKDB.

Changing roles

1480

It gives you a message once the AG replica changes its role, such as secondary to primary.

Endpoint

9691

9692

9693

If SQL Server could not listen for an IP address or port, it logs the message in the extended event. For example, in the case of the issue of SQL listener, we get the corresponding event.

Connection handshake

28034

28036

28047

28048

28080

You get these errors in case of connection handshake failure. For example, if your AG owner or service account does not have to connect permissions for the endpoint, it raises a connection handshake message.

Connection timeouts

35201

35202

35204

35206

35207

In case your availability group replicas get any connection timeouts, it logs the information in the extended event session.

Windows failover informational messages

41048

41049

41050

41051

41052

41053

41054

41055

These are the informational messages in different states of replicas during failover. It is useful to understand the failover process in SQL Server Always On Availability Groups.

AG failover issues

41142

41144

Suppose your SQL Server Always On Availability group replica tries for automatic failover, but the new primary replica is not ready for it, it logs the corresponding event in the extended event.

hadr_db_partner_set_sync_state

This event is to track the HADR partner sync state changes.

lock_redo_blocked

This event is useful to track the redo thread blocks that might cause latency in the data synchronizations.

Data Storage for Alwayson_health extended event

Open the Alwayson_health extended event properties and navigate to Data Storage. By default, it stores the collected data in an event file. This event file gets created in the instance log directory.

  • Maximum file size: 5 MB
  • Maximum number of files: 4
  • Enable file rollover: It rollover the files once it extended event collected data fills the 4 files with 5 MB each. It again starts overwriting the data from the first event file

Enable file rollover

Browse to your Log directory for SQL instance, and here you see the event files.

event files

Monitor SQL Server Always On Availability Groups using the Extended event

We can extract data from the AlwaysOn_health extended event using GUI and T-SQL script.

You can expand the AlwaysOn_health in SSMS, right-click on the package0.event_data file and select View targeted data. If you want to view the live data, click on the Watch live data on the root Alwayson_health folder in SSMS.

It opens the event viewer to show the collected information. You can filter, aggregate, and choose columns in this window for your required data.

Monitor AG using the Extended event

Sometimes, it is not convenient to get information using the graphical method. You can use t-SQL as well to get data from the extended event files and choose the required data.

To demonstrate the data collection for SQL Server Always On Availability Group, I suspend the data movement from the secondary replica. You can refer to series articles for understanding the suspend and resume data movement.

Expand Availability Group Database -> Suspend Data Movement.

Suspend Data Movement

Once the data movement suspends, wait for some time and again resume the data movement using the Resume Data movement wizard.

Resume Data Movement

Use the following query to filter the extended event collected data for error 35264(suspend),35265 (resume).

In the below query, we use sys.fn_xe_file_target_read_file to read the event file and filter the data.

In the output, you can see the suspend and resume data movement for the availability group database. In the suspended message, it specifies that the database [MyNewDB] is suspended due to user activity.

Filter the extended events

Similarly, you can filter the extended event session for error for troubleshooting SQL Server Always On Availability Group.

Apart from the default availability group, you can configure additional event sessions as per your requirement. To check the extended events for HADR, you can filter events from the sys.dm_xe_objects as shown below.

It gives you a list of 178 events. Here, you can see a snippet of all corresponding extended events.

additional event sessions

Conclusion

In this article, we learned the extended events for monitor SQL Server Always On Availability group related events and errors. You can add additional monitoring events to collect the relevant data as per your requirement.

Table of contents

A comprehensive guide to SQL Server Always On Availability Groups on Windows Server 2016
Configure Domain Controller and Active Directory for SQL Server Always On Availability Groups
Configure failover clusters, storage controllers and quorum configurations for SQL Server Always On Availability Groups
Install SQL Server 2019 on Windows Server 2016 with SQL Server Always On Availability Groups
Add a new node into existing SQL Server Always On Availability Groups
Configure Managed Service Accounts for SQL Server Always On Availability Groups
Add or remove a node from SQL Server Always On Availability Groups using T-SQL scripts
Database-level health detection in SQL Server Always On Availability Groups
Automatic Page Repair in SQL Server Always On Availability Groups
Deploy a domain-independent Windows Failover Cluster for SQL Server Always On Availability Groups
Configure a SQL Server Always On Availability Group on the domain-independent Failover Cluster
An overview of distributed SQL Server Always On Availability Groups
Deploy a distributed SQL Server Always On Availability Group
Monitor and failover a Distributed SQL Server Always On Availability Group
Transparent Data Encryption for SQL Server Always On Availability Groups
Configure SQL Server replication for a database in SQL Server Always On Availability Groups
Configuring SQL Server replication for distribution databases in SQL Server Always On Availability Groups
Explore Cross-database MSDTC for distributed transactions in SQL Server Always On Availability Groups
Deploy MSDTC for distributed transactions in SQL Server Always On Availability Groups
Restore an existing availability group database participating in SQL Server Always On Availability Groups
Exploring AG dashboards for monitoring SQL Server Always On Availability Groups
Backup compression in TDE enabled databases in SQL Server Always On Availability Groups
Impact of dropping a login in the active directory tied to SQL Server Always On Availability Groups
Refresh SQL Server Always On Availability Group databases using DBATools PowerShell
Configure SQL Server Reporting Services databases in SQL Server Always On Availability Groups
Suspend and Resume Data Movement in SQL Server Always On Availability Groups
Explore failover types in SQL Server Always On Availability Groups
Explore SQL Server PolyBase External Tables in SQL Server Always On Availability Groups
SQL Server Always On Availability Groups for SQL Server Linux instances
Column-level SQL Server encryption with SQL Server Always On Availability Groups
Make the most of secondary replicas in SQL Server Always On Availability Groups
Apply SQL Server patches or cumulative updates in SQL Server Always On Availability Groups
Monitor SQL Server Always On Availability groups using extended events
The Hub and Spoke model of policy-based management for SQL Server Always On Availability Groups
Custom policies for AG dashboards of SQL Server Always On Availability Groups
Explore dynamic management views for monitoring SQL Server Always On Availability Groups
Dynamic management views for monitoring availability replicas and databases for SQL Server Always On Availability

Rajendra Gupta
479 Views