Calculating Recovery Time Objective & the Maximum Tolerable Period of Disruption

How do you calculate the Recovery Time Objective and the Maximum Tolerable Period of Disruption?

  • Discussion with Top Management
  • Definitions
  • Regulations and Service Level Agreements
  • Business Continuity Management Programme
  • Alternative Views

Top Management

It is essential that Top Management are included at the very start of the Business Continuity Planning process – in fact, they should initiate it.

The discussion needs to start before the Recovery Time Objective and Maximum Tolerable Period of Disruption are defined. It also needs to be agreed with the client who is allowed to make a declaration and when a declaration is made.

Often the Recovery Time Objective and Maximum Tolerable Period of Disruption that the client wants are very different from what is currently available under the contract. The difference between what the client wants, what regulations (if any) require and what current capability can deliver can be huge.

This starts the negotiations. Often when the client understands the regulations and current capability, funds can be allocated to come up with ‘reasonable’ Recovery Time Objective and Maximum Tolerable Period of Disruption.

It is essential that the Recovery Time Objective and Maximum Tolerable Period of Disruption are achievable. If there is a gap between requirements and reality, resources need to be allocated towards increasing capability and reducing recovery times.
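The gap between requirements and reality can be put into numbers before the funding conversation. The following is a minimal sketch only: the function name, its parameters and the hour values are hypothetical and are not taken from any standard or contract.

```python
def recovery_gap_hours(current_capability_h, client_want_h, regulatory_max_h=None):
    """Hours by which today's recovery time exceeds the tightest requirement.

    All inputs are illustrative assumptions, expressed in hours.
    Returns 0 when current capability already meets every requirement.
    """
    limits = [client_want_h]
    if regulatory_max_h is not None:
        limits.append(regulatory_max_h)
    # The tightest (smallest) limit is the one capability has to meet.
    return max(0, current_capability_h - min(limits))


# Hypothetical figures: recovery currently takes 48 hours, the client
# wants 4 hours and a regulation allows at most 24 hours.
gap = recovery_gap_hours(current_capability_h=48, client_want_h=4, regulatory_max_h=24)
print(f"Gap to close: {gap} hours")
```

A non-zero result is the starting point for the negotiation: either the requirement is relaxed or resources are allocated to close the gap.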

If a client or management demands ZERO downtime but will not fund the requirement, there is an issue.

It is important, where possible, to refer to related decisions made by Top Management. These will usually be captured in minutes of management or board meetings. If there have not been any relevant decisions, then it is vital to seek management direction.

Before deciding on a Recovery Time Objective, management must check recovery capacity and the cost of achieving that objective.

What is a Recovery Time Objective?

The period of time following an incident within which

  • Product or Service must be resumed, or
  • Resources must be recovered

For products, services and activities, the Recovery Time Objective must be less than the time it would take for the adverse impacts that would arise as a result of not providing a product/ service or performing an activity to become unacceptable.

(ISO 22301:2012 definition)

The Recovery Time Objective is the time from an incident occurring to the time you have resumed your Product or Service. This needs to include:

  • The time between the incident happening, someone noticing and escalating it, and an assessment being made on whether to invoke the Business Continuity Plan
  • The time to rally the troops and stop the service levels crashing any further than they have (that means fix it onsite or get your alternate providers going)
  • The time you are operating at reduced or no service, attempting to stabilize, and/or limping along at your backup location
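The three components above can be added together and compared with the agreed objective. This is a rough sketch under stated assumptions: the phase names and the durations are made up for illustration, and real figures should come from exercises and past incidents.

```python
from datetime import timedelta

# Hypothetical phase durations, one entry per component listed above.
phases = {
    "detect_escalate_and_invoke": timedelta(hours=1),
    "mobilise_and_stabilise": timedelta(hours=3),
    "reduced_service_until_resumed": timedelta(hours=4),
}


def time_to_resume(phase_durations):
    """Total elapsed time from the incident to resumption of the service."""
    return sum(phase_durations.values(), timedelta())


rto = timedelta(hours=10)  # assumed Recovery Time Objective
total = time_to_resume(phases)
within_rto = total <= rto

print(f"Estimated time to resume: {total}; within RTO: {within_rto}")
```

If the total exceeds the objective, the breakdown shows which phase to shorten first.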

Once you have resumed your Product or Service in a temporary fashion, you will then need to:

  • Recover to original service level, including any time to swap back to your original site from the alternate, or move to yet another location
  • Allocate additional resource to the service to reduce any backlogs that occurred during the outage, recovery or since

What is the Maximum Tolerable Period of Disruption?

The time it would take for adverse impacts, which might arise as a result of not providing a product/ service or performing an activity, to become unacceptable.

This is also known as the Maximum Acceptable Outage.

(ISO 22301:2012 definition)

Maximum Tolerable Period of Disruption is the time until you file for liquidation.

Throughout the response there needs to be constant monitoring of:

  • who has been called,
  • who is available,
  • who is where they are needed,
  • who has got their sleeves rolled up and working on their bit of the plan,
  • who’s tired out and taking a rest after doing their bit

Your response is complete once Service Levels have returned to an acceptable level and you have settled into your “new normal”.

External Stakeholders

The calculation needs to also consider those key activities that involve external or outsourced business partners. This is especially important to avoid any assumptions.

What are the different kinds of Service Level Agreements?
  • Internal
  • Contractual
  • Legislative
  • Priority suppliers and their supply chains
  • Interested parties – clients, Top Management, staff
  • Your provider of recovery services: their Service Level Agreement to you, the balance between you and their other customers, and your reliance on their response capability and contractual obligations.

Failing a contractual Service Level Agreement to a client will impact your reputation, and may incur financial penalties and lead to future loss of business and profits.

Good practice is to agree, with both parties' acceptance, flexible service levels that apply during a disruptive incident, because this is not business as usual. Failing a Service Level Agreement severely has consequences, especially if you are integral to the client’s recovery.

Why include Service Level Agreements?

Service Level Agreements are essential to Business as Usual, but during a recovery you’re not in Business as Usual mode, which makes Recovery Time Objectives the more important metric.

Although this is true, Service Level Agreements should still be taken into consideration when defining recovery strategies. Management need data in order to arrive at a minimum and maximum recovery time.

It’s worth considering Service Level Agreements when you’re starting the process and gathering information for discussion to define the Recovery Time Objective and Maximum Tolerable Period of Disruption.

Reviewing Service Level Agreements allows Top Management to make a fully informed decision. If they can understand all the different variables relying on this Product or Service, then they are better able to decide on an appropriate Recovery Time Objective and Maximum Tolerable Period of Disruption.

Example – Service Level Agreements

If the decided timeframe is 4-24 hours (this is already set by the Management) and the Service Level Agreement time of that process is 20 hours – what would be the Recovery Time Objective and Maximum Tolerable Period of Disruption?

  • It depends on the Service Level Agreement and the penalties for not meeting it.
  • If your Service Level Agreement was very strictly 20 hours, you may set that as your Maximum Tolerable Period of Disruption and set your Recovery Time as less to build in some fat.

In that case, should the Recovery Time Objective be 10 hours and the Maximum Tolerable Period of Disruption 20 hours? That would mean asking those responsible for recovery to take a maximum of 10 hours to resume the Product or Service.

It’s a good starting point but the important questions to ask now are:

  • Is 10 hours a realistic Recovery Time Objective? 
  • Is 20 hours really the Maximum Tolerable Period of Disruption?
  • Would the client understand it being pushed out in exceptional circumstances? You want to find the real maximum. 

If these times work for the client and management then you should record them. Then run an exercise and see what plays out.
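The checks in this example can be written down so they are repeatable each time the numbers change. The sketch below is illustrative only: `check_targets` and its parameters are hypothetical names, it assumes the Service Level Agreement is strict (so the Maximum Tolerable Period of Disruption should not exceed it), and the tested recovery time is an invented figure standing in for an exercise result.

```python
def check_targets(rto_h, mtpd_h, sla_h, window_h, tested_recovery_h=None):
    """Return a list of findings for proposed targets (all values in hours).

    An empty list means the targets pass these basic checks.
    """
    findings = []
    low, high = window_h
    if not rto_h < mtpd_h:
        findings.append("RTO must be shorter than MTPD")
    # Assumption: the SLA is strict, so the MTPD should not go beyond it.
    if mtpd_h > sla_h:
        findings.append("MTPD exceeds the SLA time")
    if not (low <= rto_h <= high and low <= mtpd_h <= high):
        findings.append("targets fall outside the management timeframe")
    # Optional reality check against what an exercise actually achieved.
    if tested_recovery_h is not None and tested_recovery_h > rto_h:
        findings.append("tested recovery time is longer than the RTO")
    return findings


# Figures from the example: 4-24 hour timeframe, 20 hour SLA,
# proposed RTO of 10 hours and MTPD of 20 hours.
on_paper = check_targets(rto_h=10, mtpd_h=20, sla_h=20, window_h=(4, 24))

# Hypothetical exercise result: recovery actually took 14 hours.
after_exercise = check_targets(10, 20, 20, (4, 24), tested_recovery_h=14)

print(on_paper)
print(after_exercise)
```

The targets can pass on paper and still fail once an exercise result is included, which is why the exercise matters.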

Business Continuity Management Programme

The company gets the programme it is prepared to commit to and resource.

The resultant Business Continuity / Resilience programme should support the direction the Senior Leadership Team want the company to take, but there are influencing factors, as shown in the recent Bank of England Discussion Paper DP01/18.

There is an increased focus on the customer and the minimization of the impact and duration any crisis may have on them, as well as minimizing the shock to the greater community/sector your company may exist within.

This is a profound change in perspective from the previous introspective stance of “maintaining a company’s profit-making capability”.

How does this impact your programme design?

It forces your programme to see its customers as its primary stakeholders, resulting in a customer-centric (rather than internal) focus.

This means that the Recovery Time Objective/ Maximum Tolerable Period of Disruption parameters reflect the maximum disruption your customers can tolerate and still be able to continue to meet their commitments. In a banking example, that means being able to access and reconcile their bank accounts. Some would say, “but we do this anyway as a result of what we do”, but is your programme really centred on answering the question “how does this affect our customers?”

Dangerous Assumptions

One of the most important concepts in Business Continuity Management is not to assume anything. Everything must be comprehensively recorded. False assumptions can result in sudden additional costs arising during a disaster, or a show stopper which completely derails your plan.

Alternative Views

If you feel that setting a Recovery Time Objective is an arbitrary exercise, then an alternative is to focus on:

  • Recovery Capability
  • Recovery Capacity
  • Functionality
  • Cost

About the Author

Laura Toplis is the Business Continuity Coordinator for the New Zealand Ambulance Service and has recently developed a new product called BCP Builder – which is an Online Business Continuity Plan Template. This allows businesses to easily prepare an ISO 22301:2012 compliant Business Continuity Plan. 

BCP Builder can help Small Businesses quickly design and build their own unique Business Continuity Plan to be better prepared when disaster strikes, respond rapidly and recover confidently.