This article will explain various business continuity options available for Azure SQL PaaS services (SQL databases and SQL managed instances). Today we will learn about several options so that you can choose a suitable business continuity solution for your SQL databases provisioned in the Azure cloud.
Business continuity is a capability through which we ensure our systems can keep on functioning or have a minor impact in case of any disaster or unplanned disruptions. We always plan a business continuity solution for our critical systems as we had done for our critical SQL Server databases systems hosted in on-premises datacenters in past. We used to implement Availability groups, robust backup & recovery mechanisms, log shipping, or other third-party solutions to ensure data protection and recovery time must be under our defined RTO and RPO values.
There are two very crucial terms (RTO and RPO) we must understand fully while deciding the business continuity plans. RTO is known as the recovery time objective. This is the maximum acceptable time you need to recover your application, system, or databases after disruption takes place. Another term is RPO which is the recovery point objective which means the maximum acceptable point till when you can lose your data. For example, if your RTO is 4 hours and RPO is 5 minutes it means you must recover your system in 4 hours whereas you can lose your data for 5 minutes since disruption takes place.
The values for RTO and RPO depend on the criticality of the business and systems, we generally decide these values after consulting with business people or clients, you can ask questions like how long you can bear the downtime or data loss in case of any disasters scenario. Once you will have these values, you can design or architect your systems that can meet up the RTO and RPO requirements.
As we all know, Azure cloud offers 3 deployment options under Azure SQL as given below. You can read more about all 3 deployment models of Azure SQL in the attached article Getting started with Azure SQL.
- SQL Server on Azure VM (IaaS)
- SQL Managed Instances (PaaS)
- SQL Database (PaaS)
For the IaaS deployment option, which is SQL Server on Azure VM, we can use all options like availability groups, log shipping, backup restores, or third-party solutions like storage mirroring as we do for SQL Server instances hosted in the on-prem datacenter. We are not going to discuss business continuity options for this deployment model in this article.
Let’s discuss various business continuity options for Azure SQL PaaS services SQL database and SQL managed instances. Both PaaS services (Azure SQL database and SQL managed instances) already come with in-built high availability capability that works based on availability group technology to overcome and recover itself if there is any issue like hardware or software failures in the cloud environment. But there are still some possibilities of different types of disruptions that can cause both PaaS services to not recover so we need to plan another way to protect and recover our data on time, here, business continuality comes in to picture to protect your databases from accidental data deletions, security attacks, data loss, or from any natural or manmade disaster which caused datacenter shutdown.
There are 4 business continuity options available for Azure SQL databases and SQL managed instances apart from its inbuilt high availability capability. Each option has its own RPO and RTO which is different from the other one.
- Automated backups
- Active geo-replication
- Failover groups
- Temporal tables
Backups are one of the legacies and traditional methods to protect our data. Both Azure SQL PaaS services also have backup and recovery methodologies to protect data and recover them at a point in time with the help of backup files. Backups for SQL databases and SQL-managed instances are designed to recover data points in time from defined short-term retention periods and protect the data for long-term retention for up to 10 years. Automatic backups for SQL databases run one full backup weekly, differential backup every 12 to 24 hours interval, and transaction log backup every 5 to 10 minutes. If you have RTO as 12 hours or more and RPO as 1 hour or more then you can choose automatic backup as a business continuity solution for your databases. If your RTO or RPO is less than the above-defined values, then you should also consider other options additionally which I am going to discuss in the next sections.
We can also configure backup storage redundancy to protect the backup files. There are 3 backup storage redundancy options available to protect these backup files created automatic backups.
- Geo-redundant backup storage
- Local redundant backup storage
- Zone redundant backup storage
Geo-redundant backup storage stores multiple copies of your backup files to the respective paired region, Local redundant backup storage keeps all backup copies locally on the same data center whereas Zone redundant backup storage configuration keeps backup files on the different data centers of the same region. I would recommend you go over my attached article, Understanding Backups for Azure SQL Database to understand backups.
With the help of automated backups, we can perform point-in-time recovery, geo restores, or recovery from a backup copy saved under a long-term retention period. You must always consider configuring automatic backups despite having other business continuity solutions like active geo-replication or failover groups.
This is an Azure SQL database feature to continuously replicate data to a single or more secondary SQL databases of a primary SQL database. This is nowhere related to SQL Server replication so you should not be confused between both terms. Active geo-replication uses availability group technologies to continuously replicate data asynchronously to its secondary databases. We can configure up to four secondary databases for an Azure SQL database. If you need more than four secondary databases, then you can configure secondary of secondary databases which is known as chaining. It is designed as a business continuity option for SQL databases to secure and protect your data in another location. We can failover the primary database to its respective secondary databases in case of any unplanned event or disaster recovery scenario. All other secondary databases will be automatically connected to the new primary database post successful failover. It does not support automatic failover so manual intervention will be required to failover it to its respective secondary database. Active geo-replication supports RTO as 30 seconds and RPO as 5 seconds.
Have a look at the below design:
Image Source: MSDN documentation
Secondary databases are also known as geo secondaries or geo replicas which can also be used to serve read-only transactions like availability group secondary replicas. We can offload read-only queries from the primary database and route them to respective geo secondaries to load balance the database workload. Another use of active geo-replication is in the case of database migration to other servers.
This feature can be configured for a single SQL database or a pool of databases known as an elastic pool, but it does not support SQL-managed instances. There is another solution to achieve business continuity for SQL-managed instances which is known as failover groups which I will describe in the next section.
There are some limitations with active geo-replication like it cannot be configured for SQL-managed instances and it does not support automatic failover etc. Microsoft has developed another business continuity feature called failover group which can fulfill these limitations that is there in geo-replication. We can configure it as a business continuity solution for Azure SQL databases as well as SQL-managed instances, and it also supports automatic failover in case of any disruption. Failover groups support 1 hour as RTO and 5 seconds as RPO. If your RTO is less than 1 hour, then you must consider active geo-replication as a business continuity solution.
Below is the design diagram of this feature about how it works. Although its high-level architecture is based on geo-replication only, you can notice it is slightly different than active geo-replication. You need listeners to read-write or read-only queries in the failover group, unlike geo-replication.
Image Source: MSDN documentation
We can configure one or multiple SQL databases as part of a failover group. Even we can configure multiple failover groups between primary and secondary servers hosted in different regions. You should note that you can only create one secondary server in a different region for failover groups.
If one or more databases will become inaccessible, then the failover group will initiate the failover process to its respective secondary server in case you have opted for automatic failover which is also a default configuration. The automatic failover can be disabled post successful failover to the secondary region to ensure there is no failback attempt. Also, the read-only listener will be disabled to ensure optimum performance, you can later enable it so that both types of transactions can start flowing to the new primary databases.
The temporal tables enable you to track and analyze the changes happening to the data in Azure SQL databases. This feature can be configured at the table level, and it can also protect your data from being accidentally deleted as well. You can recover your desired data manually with the help of T-SQL statements. Once you will make any table temporal by configuring it then a history table is maintained to track all its changes. You can use that history table to recover your data in case of deletions or updates.
Today I have explored various business continuity solutions available for Azure SQL databases and Azure SQL managed instances in this article. We have discussed automatic backups, active geo-replication, failover groups, and temporal tables in the above sections. As part of a business continuity solution for SQL databases, we should choose a combination of solutions like automated backups with active geo-replication or failover groups to ensure our data is fully protected and recoverable in case of any major outage or disaster. Stay tuned for my next articles in which I will explain each one of the above solutions in detail with steps on how to configure them.