In this 27th article of the SQL Server Always On Availability Group series, we will discuss various failover modes for AG databases.
In the earlier articles, we saw that the SQL Server Always On Availability Group provides high availability and disaster recovery solution in SQL Server. As we have explored various configurations for AG, it is essential to know the failover options present for your availability groups. It helps to take the decision in case of any infrastructure issues and utilize AG to bring your database online.
Failover options in the SQL Server Always On Availability Group
Failover is a process to switch the roles of the availability replicas in an AG configuration. Once a failover occurs, the failover target (current secondary) takes the primary role and brings the database online to accept connections. Similarly, in a failback, the old primary (current secondary) again takes charge of the availability group as a primary replica.
We can have one primary and multiple secondary replicas depending upon different SQL Server versions. These multiple replicas can be in the Synchronous, or Asynchronous data commit mode.
- Synchronous: Primary receives the acknowledgement from the secondary replica, and then it commits transaction on the primary
- Asynchronous: Primary commits the transactions without waiting for any confirmation from the secondary replica
The failover modes in SQL Server Always On Availability Groups depend upon these Synchronous or Asynchronous commit. In a high-level, SQL Server supports the following failovers in AG.
- Automatic failover
- Planned Manual failover
- Forced failover (manual)
Let’s understand these failover types as per the data synchronization methods.
Failover modes in a Synchronous Commit Mode for SQL Server Always On Availability Groups
We can do automatic failover and planned manual failover for SQL Server Always On Availability Groups in the synchronous commit replica.
Automatic failover in SQL Server Always On Availability Groups
This type of failover automatically occurs in the case of the primary replica goes down.
- For an automatic failover, both primary and secondary replica (minimum one secondary) must be configured with the Synchronous commit and Automatic failover
- Both replicas should be synchronized
To verify your AG’s support automatic failover, right-click on the availability group and open its properties. It shows the highlighted configuration.
- Availability Mode: Synchronous commit
- Failover Mode: Automatic
Let’s see automatic failover in action. To initiate an automatic failover, I rebooted my primary replica intentionally. Before shutting down the primary replica, verify that your AG databases in synchronized. It also does not show any data loss on the dashboard.
- Note: Do not perform these steps in a production environment for testing purposes
- The current primary replica is SQLNode2\INST1
- Secondary replica: SQLNode1\INST1
Reboot the SQLNode2 and launch the dashboard from the new primary replica SQLNode1\INST1. We can see a new primary replica SQLNode1\INST1. The AG dashboard is in a critical state because the old primary is not up yet, and status is Not Synchronizing.
Once the server comes up (SQLNode2\INST1), and it captures the transaction logs, the status changes to Synchronized.
To investigate the background work done by SQL Server, view the error logs on the new primary replica. I highlight the useful messages for your reference.
It terminates the availability group connection with the primary database on SQLNode2\INST1
The secondary replica database changes roles from the Secondary to Resolving
It logs an entry in the error logs for the hardened LSN and Commits LSN
The secondary replica (SQLNode1\INST1) prepares to transition the role to the primary
The AG group status changes from the Resolving_Normal to Primary_Pending
It starts listening to the SQL listener
The local availability replica status changes from the Primary_Pending to Primary_Normal
Now, the availability group database role changes from Resolving to Primary
Our databases were in Synchronized mode. It performs the database recovery and notifies if any transactions rolled back is required
The old primary replica comes online. It detects that another replica is now the owner of the availability group. It establishes a connection with the new primary replica SQLNode2\INST1. It catches all pending transaction records from the current primary replica and comes in the Synchronized state. We can failback resources now once the status changes to Synchronized
Planned Manual failover in SQL Server Always On Availability Groups
The database administrator initiates a planned failover, and it causes the synchronized secondary replica to take over the role as the primary replica. For a manual failover, your AG databases should be synchronized to confirm that your secondary database copy is similar to the current primary database.
Suppose you have multiple availability groups in your SQL instance. Due to some specific requirement, you want to failover a specific availability group databases. If you stop SQL Services on the primary replica, it fails all synchronized availability group. In this case, the planned manual failover works. You can failover a specific availability group however point to note that it fails over all databases in that availability group. We cannot failover a specific database from the availability group.
You can use both automatic and manual failover in the synchronized data mode. In the case of multiple replicas, we might set a secondary replica for manual failover, as shown below.
We can initiate a manual failover from both SSMS GUI wizard and T-SQL. Right-click on the availability group to failover from the primary instance and go to failover.
Go through the failover wizard and perform an AG failover.
Alternatively, you can run the Alter Availability Group command to initiate a failover from the primary to the secondary replica.
ALTER AVAILABILITY GROUP [SQLShackDemoAG] FAILOVER;
Failover modes in an Asynchronous Commit Mode
We cannot have automatic or planned manual failover in an asynchronous commit mode. In a forced failover, we can failover to a secondary replica, especially in a disaster recovery scenario.
In an asynchronous commit, the secondary replica might fall behind with the primary database. Therefore, if failover, it causes data loss depending upon the sync lag. You should do force AG failover only if you need to restore database availability and you can afford to lose some data.
For a forced failover, your availability database should be in the Asynchronous commit mode. It only supports a manual failover. If your AG database is in synchronous mode, but these databases are not synchronized, it only can have a forced failover.
You can initiate a forced failover from the secondary replica. Before failover, you can measure the latency between primary and secondary replica using the Measuring Availability Group synchronization lag.
Connect to the secondary replica of SQL Server Always On Availability Group in SSMS and start a failover process. In the following screenshot, it shows a possible data loss warning if we perform a failover.
Click on the warning, and you get the message that databases are not in a Synchronized state. It could result in the data loss for any transactions that did not reach to the secondary instance before failover.
Confirm to perform a failover with the potential data loss.
Validate the failover configurations. In the failover actions, you get a message that this replica is not a failover ready.
It performs manual force failover. In the below AG dashboard, the primary replica is SQLNode2\INST1. AG dashboard is not healthy.
In the force failover, SQL Server starts a new recovery fork. Both the old primary and secondary database has different restore forks. Therefore, if we do a force failover, SQL Server suspends data movement to all secondary replicas. It is for you to verify the data loss and take a decision to suspend or resume the data movement.
In the availability database, resume data movement in the primary and secondary replicas, as shown below.
It synchronizes the AG with the new primary replica after the force failover.
Note: The secondary database can track a maximum of two recovery forks. In the case of multiple forced failovers, you might see data synchronization issues even if you resume it. In this case, remove the secondary database and rejoin after restoring the latest backups.
In this article, we explored a different kind of failover in SQL Server Always On Availability Groups. You should analyze your requirement and configure the AG to support automatic or manual failover.