Useful Considerations for SQL Server Disaster Recovery

This article discusses useful considerations for SQL Server Disaster Recovery.

An overview of RPO, RTO and SLA

In the digital era, data availability and security is a critical task for every organization. It is a database administrator’s duty to create a fail-safe mechanism to ensure your database is available as per the defined RPO, RTO and SLA.

Recovery Point Objective (RPO): It refers to the maximum amount of data you can afford to lose. It is aligned with your database backups and recovery strategies. For example, if you take database backup every hour, you can lose a maximum of one hour of data. If your RPO is 10 minutes, you are not in a good position from a data recovery point of view

In the below image, we see the RTO and RPO corresponding to SQL Server Disaster Recovery.

Recovery Time Objective (RTO)

Once a disaster occurs, DBA’s primary responsibility is to recover your database asap for business continuity. Therefore, the RTO is a measure of how long your organization can afford the downtime of databases before things come back to normal. In the above image, we see the recovery time objective after a disaster occurs. Usually, we should define our system and DR process in such a way to recover asap in less than the RTO.

Service Level Agreements (SLA)

The third useful parameter is the Service Level Agreement between the customer and the vendors. It covers the service quality, availability and responsibilities. It is defined based on the RPO and RTO of organization requirements for SQL Server Disaster Recovery.

There are several parameters you should consider meeting the recovery objectives (RPO, RTO) and SLA’s regarding the databases for SQL Server Disaster Recovery.

Define Database backups policy

You must define the backup policy for both critical and non-critical database systems. In SQL Server, we combine the full, differential, transaction log backups for restoring databases in case of any issues.

Combine different backup mechanism to ensure you minimize the downtime and effort in the restoration. For example, you use a combination of full, differential and transaction log backups for large databases while for smaller databases, you can take full backups and regular log backups. You can also leverage filegroup backups, piecemeal restores, the smart transaction log and differential backup as well to take backup depending upon data changes
Store the backups on a different media such as SAN, Tape, Cloud URL. You should not store the backups on the drives where your data and log file exists. In that particular drives goes down, you lose both data and backups
You must configure the backups even if you have high availability infrastructure such as VM snapshots, Storage level replication, Windows failover clusters or SQL Server Always On Availability Groups

Regular test the database backups

Sometimes, database professional feels good that they have 100% compliance on database backup. You also report 100% backup compliance to your regular reports to management. But, someday system crashes, and your backup would not work.

To avoid this humiliating scenario, database professionals must conduct regular database restoration drives. In these drives, you can choose a random restore point ( RPO) and restore your backups on a test environment and validate that you meet the RPO for SQL Server Disaster Recovery. It ensures your backup policy is aligned with your organization requirement as well as it boosts your confidence in database backups for any unforeseen situation.

Importance of System databases

Recently, I saw one scenario in which a SQL instance had a few critical databases for one of my customers. They use a third-party tool for their database backups, and it was configured to take a backup of all databases. All looks good at this point.

One day, due to a storage issue, the instance crashed. After all efforts, they could not recover the instance, so they built a new instance and tried to restore all databases. All user databases were restored but they realized that the backup tool was configured for backup of only user databases. As you did not take the system database backups, you can lose logins, credentials, proxies, jobs. It is a difficult situation, right.

Therefore, to avoid any such issues, always ensure to take database backups for the system databases. The system databases size is small, and you can take regular full backups quickly.

Database Consistency Check

You should add a database consistency check for your weekly or monthly maintenance schedule. It ensures the databases do not report any logical or physical consistency issues. If you perform regular restoration drills( as suggested above), run the consistency check on the restored database as well. It ensures your existing database backup does not hold consistency issues.

DR exercises for SQL Server Disaster Recovery

It is like a dry run to validate your preparation for the database recovery. Although it might not replicate the existing DR scenario, it is similar to a mock test for your strategies, preparations for handling any unforeseen issues. If you have configured the high availability scenarios such as SQL Server Always On Availability Groups, Failover clusters, then you should make sure you can failover for the secondary site. It would be best if you also asked your application teams to participate in the DR drills. They can also validate that the application works fine once you restore or recover the database.

Know your database infrastructure

A database professional should document the supporting infrastructure. Suppose after a disaster occurs, you prepared another SQL instance and restored the database backups. Your databases are online, but it might need additional configurations such as a linked server, Alias, port ( static\dynamic), MAX memory, MAXDOP, Parallelism, encryption, SQL Server edition, version, replication. For example, if you use replication to another production server, you should have details of the replication articles, subscribers, agent security, object filters. If you have documented the infrastructure, it becomes convenient for you to refer the documentation and prepare the things. You can schedule an agent job to script out the login security and permissions regularly at a shared location.

Using this documentation, you can take the right approach for database availability. For example, in the case of a TDE enabled database, you need a different approach for database backup and restores using the SQL Server certificates.

Use High availability and SQL Server Disaster Recovery solutions

You can use the high availability and disaster recovery solutions for your critical databases. These solutions could be SQL Server Always On Availability Groups, log shipping. Here, you maintain multiple copies of the existing databases and allows you to configure the standby copies of databases as well. It helps you to quickly redirect applications to the stand by or secondary database in case the primary database has issues. In the case of the AG group, you should ensure having a secondary synchronous copy of your critical databases. AG allows you to create multiple secondary replicas in your primary and secondary sites. Make sure you monitor the latency between the primary and secondary database copies. Simultaneously, you can create a DR database copy using the asynchronous data commit.

Prepare Checklists

A checklist is also essential for your SQL Server Disaster Recovery solutions. These checklists can have the following information.

You should have the Contact owner list for databases and applications. You should not look for the contact persons once a disaster strikes
Be ready with your organization escalation matrix, application, vendor support checklist
Critical contacts for Vendor support such as Microsoft premier support number to raise a high priority incident
Always note down the learning from your DR drills and recovery scenarios
Application dependency checklists

Conclusion

In this article, we explained several considerations for your SQL Server Disaster Recovery for SQL Server databases. The DR requirement might be different for different infrastructure. You should evaluate your requirements, recovery objectives and plan the solution that works best for you.

Author
Recent Posts

Rajendra Gupta

Hi! I am Rajendra Gupta, Database Specialist and Architect, helping organizations implement Microsoft SQL Server, Azure, Couchbase, AWS solutions fast and efficiently, fix related issues, and Performance Tuning with over 14 years of experience.

I am the author of the book "DP-300 Administering Relational Database on Microsoft Azure". I published more than 650 technical articles on MSSQLTips, SQLShack, Quest, CodingSight, and SeveralNines.

I am the creator of one of the biggest free online collections of articles on a single topic, with his 50-part series on SQL Server Always On Availability Groups.

Based on my contribution to the SQL Server community, I have been recognized as the prestigious Best Author of the Year continuously in 2019, 2020, and 2021 (2nd Rank) at SQLShack and the MSSQLTIPS champions award in 2020.

Personal Blog: https://www.dbblogger.com
I am always interested in new challenges so if you need consulting help, reach me at rajendra.gupta16@gmail.com

View all posts by Rajendra Gupta

Latest posts by Rajendra Gupta (see all)

How to install PostgreSQL on Ubuntu - July 13, 2023
How to use the CROSSTAB function in PostgreSQL - February 17, 2023
Learn the PostgreSQL COALESCE command - January 19, 2023