Do you know your Recovery Point and Recovery Time (RPO and RTO) Objectives?

Introduction

I have been working on a client project over the last few weeks, well it’s more like a few months now; as I was asked to help with a SQL Server performance issue. Once the performance problem was sorted, I ended up taking on many of the other database administration and architecture tasks that come with a green field project; including the database recovery strategy for the new system.

I really enjoy these type of  client engagements, as it always starts off with small issue’s; but it quickly unravels into many tasks that need to be done properly. It doesn’t matter what size the organization is either big or small, there is always something that can be done better or; something that hasn’t been given due consideration that could cause an issue down the line.

Once the performance issue had been rectified I was asked to input and look at the high availability and disaster recovery solution, of the SQL Servers that were running their CRM system. My clients informed me that they wanted to potentially use SQL Server Availability Groups.

I like availability groups, but I’m also conscious of using the right tool for the job. So starting any high availability disaster recovery plan design, I asked what the recovery point objective (RPO) and recovery time objective (RTO) would be for the final solution.

Why are these questions important? Sometimes one of these get overlooked and it can have a detrimental effect, which is best illustrated by a fun story.

A Fun Story

Often with RPO and RTO one of these factors can be overlooked, neglected or not considered fully. Lets look at a third hand example I heard once at a conference. This is the scenario:

  • An organization needed to restore their database.
  • They had a full backup.
  • They had been taking transaction log backups every 15 minutes since the last full backup.
  • There was a major issue and their mission critical database would need restoring.
  • The transaction log was in good shape, so a tail log backup could be taken.
  • No data would be lost.

Sounds good thus far right? Let me add a little more information:

  • The last full backup was taken two years previously!
  • It was on a tape drive and the tape would need to be found in the archive
  • 70000+ plus transaction log backups would need to be restored.

Now as the story goes the full backup was found in the tape archive. All 70000 plus logs were found and restored. If one of the 70000 logs had gone missing, the restore with no data loss would not be possible. It turns out all the backup files were available and could be restored… And they did restore them.

The process of doing that took in excess of 2 weeks. The business was crippled for two weeks! Could your business stop functioning for two weeks?

Recovery Time Objectives (RTO)

RTO = How long can the service/application be down/unavailable? When does everything need to be up and running again?

In the above example RTO was not considered in the backup strategy.

Recovery Point Objectives (RPO)

RPO = How much data loss is acceptable. 10 minutes, 30 minutes, 1 hour, 1 day?

You get the idea. The ‘system’ must be back up and running within 4 hours with at the very most 10 minutes’ data loss.

Considering RPO and RTO is key in delivering a solution that meets the business needs.

The right tools for the Job

Back to my scenario. My client told me that in the event of a disaster, they would want the system back in service in 4 hours. There we have our RTO. 4 hours.

How much data loss was acceptable? 15 minutes was the answer.

Throw into the mix the fact they want a separate reporting environment and Availability Groups seems like a reasonable solution. So that’s what we decided to go with.

The following is high-level design of what we had setup

Like most things in SQL Server, there are potentially other solutions that could have been used to achieve this multi-site setup; perhaps using log-shipping. With some additional work needed to successfully failover the databases and reconfigure the application, to point to the correct database and database server.

SQL Server Availability Groups

Like most things in SQL Server, there are potentially other solutions that could have been used to achieve this multi-site setup; perhaps using log-shipping. With some additional work needed to successfully failover the databases and reconfigure the application, to point to the correct database and database server.

The key piece in all of this is implementing a solution that meets the needs of the business and at the heart of this, is a correct backup and recovery strategy and more importantly; knowing how to restore.

If your reading this ask yourself whether you have the correct backup and recovery strategy in place? Do you have a solution in place that meets the businesses backup, recovery and high availability needs?

Type to search blog.learningtree.com

Do you mean "" ?

Sorry, no results were found for your query.

Please check your spelling and try your search again.