Ivan Stankovic

What is a SQL Server failover clustering

March 14, 2014 by

What is a failover cluster?

SQL Server failover clusters are made of group of servers that run cluster enabled applications in a special way to minimize downtime. A failover is a process that happens if one node crashes, or becomes unavailable and the other one takes over and restarts the application automatically without human intervention

What does SQL Server failover clustering provide?

A SQL Server failover cluster is also known as a High-availability cluster, as it provides redundancy for critical systems. The main concept behind failover clustering it to eliminate a single point of failure by including multiple network connections and shared data storage connected via SAN (Storage area network) or NAS (Network attached storage)

Each node in a cluster environment is monitored all the time via a private network connection called the heartbeat. A system must be able to overcome the situation called “split-brain” which occurs if all heartbeat links go down simultaneously. Then, all other nodes can conclude that one node is down and will try to restart the application on themselves. A failover cluster uses a quorum-based approach to monitor overall cluster health and maximize node-level fault tolerance

Example

A cluster named CLUSTER-01 contains two servers – nodes, named CLUSTER-01-SRV-01, CLUSTER-01-SRV-02. There is one SQL Server instance called SQL-INST-01. Also, there is a shared storage connected to the all three servers

When the server CLUSTER-01-SRV-01 crashes, the failover cluster service in CLUSTER-01 is aware of the situation through the heartbeat and automatically starts the SQL Server instance SQL-INST-01 on the CLUSTER-01-SRV-02 server

SQL Server failover clustering - cluster configuration with two servers-nodes

In a SQL Server failover cluster, data needs to be on a shared storage. The cluster can move the SQL Server instance if one node is having a problem because all the data is shared. This solution can guarantee higher up-time and redundancy. Because there is only one storage space, regular SQL Server maintenance requirements are still needed. Also, if the shared storage isn’t redundant, after a storage failure, the SQL Server database will be unavailable. For the busy SQL Server environments, where the downtime is measured in seconds, the “falling over” time needs to be considered because the change between nodes isn’t instant

SQL Server failover cluster configurations

There are four main node configurations available in SQL Server failover clustering: Active/Active (Multi-Instance Failover Cluster), Active/Passive, N+1, and N+M

An active/active failover cluster or multi-instance failover cluster, shares resources between virtual servers. Each node can host two or more virtual servers at the same time. Traffic can be passed onto the second active node or can be load balanced across the remaining nodes if there is more than one node left active

Active/passive failover clusters have standby nodes that are activated only when the primary node is down. The primary node owns all the resources. In case of a failure, the standby node takes over all the resources and recovers the database from the database files and transaction logs

An N+1 failover cluster is based on active/passive nodes where two or more nodes share the same failover node. In the situation where all N nodes fail, the standby node must be capable to take over all load

An N+M failover cluster has two or more active nodes and two or more standby nodes. It is cheaper for implementation than the N+1 configuration, because the load can be distributed to more than one standby node

Ivan Stankovic
168 Views