As the business landscape changes, so does the nature of potential disruptions. More businesses than ever before are joining forces to succeed, both globally and domestically, creating new and complex interdependencies. Increased reliance on digital tools places data and operations at the mercy of external parties. And unpredictable environmental or societal happenings in far-flung parts of the world could have negative consequences for businesses around the globe.
Threats like these are the reason business continuity (BC) exists: to mitigate the risks posed by disruptive incidents, plan for them, and prepare to weather any possible storm. But can you be prepared for the sheer number of business continuity risks in the world?
Because it’s such a challenge, smart BC managers take a strategic approach to managing business continuity risks and threats, asking: Which threats would have the most substantial impact on our mission and operations? Which are most likely to happen? Granted, these factors change continuously and require constant review—another challenge in and of itself. Whatever plans you do put into place, however, should be carefully, thoughtfully, and thoroughly crafted.
In this article, we’ll cover three areas that will help you implement a more strategic way of managing business continuity risks. First, we’ll review five threats to almost every organization that are often overlooked (and more likely to happen than their more commonly mentioned counterparts). Then, we’ll discuss the concepts of inherent vs. residual risk and how they apply to the creation of business recovery plans. Finally, we’ll lay out the formula for residual risk calculation as a way of evaluating how well the plans you’ve created will actually perform.
The business continuity threats and risks for organizations we see here at MHA aren’t the ones you might think of first, like natural disasters, terrorist incidents, or blackouts. Sure, those things happen, but many companies have planned for disruptions like these and can point to a well-defined strategy already in place. (If that’s the case for you, then you’re doing well—we advocate that you always consider Mother Nature in your risk assessment plans and factor in the risks associated with your geographic location.)
But the more frequent occurrences may be the ones you’re not planning for, simply because they don’t appear to be as threatening as they actually are. Below are the categories of business continuity risks that should have a place in your business continuity risk assessment matrix, along with recovery plans to address them. These risks may present a greater threat for some organizations than others, depending on the characteristics of your business and your risk tolerance.
Data breaches are happening so frequently nowadays that we no longer talk about if your organization gets hit, but when. Every organization, particularly small businesses with little to no data protection, will inevitably experience some type of data breach, so it’s important to consider the risks associated with it. Not only will cyber attackers try to hack into your environment directly, they may also employ ransomware attacks and phishing attempts to access your data. All types of breaches are occurring at an alarming rate; in fact, half of small businesses that are attacked go out of business within six months as a result.
This is both the most common threat category and the one with the most potential to impact an organization’s financial situation—as well as its brand. Before it happens, consider the valuable data you have, what could be compromised, and the possible repercussions should a breach ever occur. Also take into consideration any regulatory requirements that might make it necessary to bolster your protection.
This category encapsulates the mistakes people make, particularly those related to technology. It might be anything from a simple programming mistake to a misstep brought on by the complexity of massive technological systems, but it’s not hard to imagine that such a mistake can take a company down. (See the latest computer glitches with United and Delta airlines, as well as Amazon’s recent AWS outage.) Human error poses a bigger risk for organizations that are highly dependent on technology and tech workers.
Many types of common disruptive events—like floods, snowstorms, etc.—are disruptive to individuals as well as businesses. You can bet that in the event of such an emergency, most people will take care of themselves and their families before going to work. So while it’s good to have plans to move workers to another location in the event a building is inaccessible, there may simply be no one available to go.
Technology is vulnerable to single points of failure (SPOF)—a situation where the failure of one component of a network environment would take down the rest of the system. (For instance, consider an application that relies on a single database server; if the server goes down, so does the application.)
To protect against SPOF, ensure that critical technology components are redundant: for example, that you have multiple databases or secondary servers available that can be activated in an appropriate timeframe. Most hardware devices have redundancy built in for exactly that reason, but small organizations beware: If you use consumer-grade computers as servers (those not built with redundancy in mind), a failure could have a major impact on business operations. Also, many organizations today are scattered geographically and rely heavily on their networks to do business, making a single point of failure within the data network a real business continuity risk.
The concept of single points of failure also applies to human availability. This is especially relevant for a very lean workforce. When a regional call center that employs 150 people has 10 who can’t come to work, that’s not a big deal. But if that call center has two people and one can’t come, that’s a much bigger deal. Similarly, your level of risk is dependent on the functions people perform, some of which are more critical than others. If you can’t live without someone for a week because they have particular knowledge no one else has (have you ever placed calls to them during a vacation?), that’s a problem you need to address. Even a business that’s highly dependent on technology (like an automated factory, for instance) still needs at least a few humans to work.
Some businesses perform functions that are associated with inherently high risk, whether it’s from a standpoint of malpractice; individual health, life, and safety; or potentially dangerous operations. Hospitals, healthcare organizations, and chemical manufacturing plants are all examples of risky businesses.
Consider the primary function of your organization in terms of how it might impact the organization and its resiliency. For those businesses with risk-based functions, the stability of the organization at the leadership level is a critical consideration. And again, single point of failure comes into play: If you can’t function, is that a single point of failure for you? If the answer is yes, you need to plan for those possibilities.
Now that you have a better idea of the most frequently occurring business continuity risks, it’s time to create your business recovery plans. Many companies put extensive time and effort into crafting business recovery plans, and rightfully so—they are the key to the continuation of business processes in the event of a disruption.
The problem is, some business continuity managers don’t really want to know if the plans they’ve put into place will work.
If you do—and you want to significantly increase the likelihood that your recovery plans will succeed—you’ll use the concepts of inherent and residual risk to assess their effectiveness and make the required adjustments to improve.
NASA takes risk management seriously. In its own words, “Effective risk management is critical to mission success.” NASA’s ideas and practices related to risk management got us to the moon and beyond, which is why we advocate for applying similarly high standards to the practice of business continuity management. Your organization may not be preparing to search for signs of past microbial life on Mars, but your company’s mission is critical in its own way—especially for the people you employ and the customers you serve.
One of those ideas involves inherent vs. residual risk. NASA was one of the first to apply these concepts as a way of evaluating risk; other organizations and industries have employed them as well. We think they can be equally effective in evaluating risk for businesses. Let’s examine the differences between the two concepts and how they can be used in business continuity.
Inherent risk is the risk of the entity you’re trying to measure, without mitigating controls.
In the case of business continuity, we’re talking about the risks associated with a particular recovery plan for a particular business unit—for instance, the accounts payable department, the call center, or the SAP system. Inherent risk is what it is. It’s formed by the realities that exist before you’ve made any attempt to address them, and will influence the development of your recovery plan.
The inherent risk associated with a recovery plan is made up of two factors related to the business unit the plan covers: the business impact of a disruption to that unit and the threat landscape the unit faces.
Inherent risk is used in calculating residual risk.
The residual risk is the amount of risk that remains after all efforts have been made to identify and eliminate risk (i.e., your mitigating controls). If you really want to know if the business recovery plans you’ve put into place will work or not, you should be using the concept of residual risk as part of your business continuity management strategy.
Closely interwoven with inherent risk, residual risk can serve as justification for the time and resources required to support your recovery needs. By definition, it is the risk that remains after all efforts have been made to identify and eliminate risk.
The efforts you’ve made to identify and eliminate risk must include:
In the end, your calculations for residual risk will tell you definitively if the business continuity program you’ve spent time, money, and resources on can be executed effectively—or where your organization may be exceeding the recovery needs of the business, allowing you to make adjustments and conserve resources.
Inherent and residual risk go hand in hand. Despite their value, however, very few organizations do the legwork required to evaluate the inherent and residual risk in their business and/or information technology recovery plans. While the process may uncover areas in need of improvement, it also helps organizations to optimize valuable resources and effectively minimize risk. We’ll take a closer look at how to calculate residual risk in the next section.
A residual risk calculation will tell you definitively if you are doing enough to support your business recovery plan. Despite the fact that many businesses are devoting time and resources to creating business recovery strategies, few are concerned with measuring the effectiveness of their efforts—only three people in our most recent seminar of 50 were measuring risk at all. In fact, no recovery strategy is complete until you’ve taken this important step. Wondering how to calculate residual risk? Take a look below to see how we do it.
A. First, determine the recovery time objective (RTO) for the business unit. Though there may be two, three, four, or more processes associated with a particular unit, the residual risk formula considers only the RTO of the most critical process. So if Process A needs to be recovered in 24 hours and Process B in 48 hours, evaluate the business recovery plan for the unit using only the RTO for Process A.
The RTOs of each business unit and their business processes should have been uncovered as part of the BIA process.
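As a small illustration of step A, the sketch below picks the RTO that governs a unit’s plan. It assumes that "most critical" means the shortest RTO, which matches the Process A and Process B example above; the function name `governing_rto_hours` is a placeholder of our own, not part of any tool.

```python
def governing_rto_hours(process_rtos: dict[str, float]) -> float:
    """Return the RTO (in hours) used to evaluate a business unit's plan.

    Assumption: the most critical process is the one with the shortest RTO.
    """
    if not process_rtos:
        raise ValueError("a business unit needs at least one process with an RTO")
    # Only the tightest recovery deadline is carried into the formula.
    return min(process_rtos.values())


# Values taken from the example in the text: A = 24 hours, B = 48 hours.
unit_processes = {"Process A": 24, "Process B": 48}
print(governing_rto_hours(unit_processes))  # 24
```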
B. Next, determine the business impact score. Each RTO category has a level of potential business impact associated with it. A business unit with a very short recovery timeframe is highly critical, so a disruption to it would have a greater impact on the business than a disruption to a unit with a much longer recovery timeframe. Each RTO would have a corresponding impact score associated with it, such as:
1 = Insignificant Impact
2 = Minimal Impact
3 = Moderate Impact
4 = Critical Impact
5 = Catastrophic Impact
Putting It Into Practice
If, for example, the RTO of a call center is identified as 12 hours or less, this typically indicates a highly critical process. Based on the criticality assessment, the call center plan would get a business impact score of 4 or 5.
C. Identify the threat landscape and assign a threat probability level. Evaluate the natural, human-made, and technological threats facing the business unit. Is it in a high-risk area geographically? Are its processes especially vulnerable to attack? Assign a threat level score to the unit, with 5 being high, 3 being moderate, and 1 being low.
D. Calculate the inherent risk factor. Multiply the business impact score and the threat landscape score; then divide by 5. The resulting number is the plan’s inherent risk level.
What does the score mean?
Scores will range anywhere from 0.2 to 5.0. A score between 4 and 5 means that the plan has high inherent risk. A score between 3 and 3.9 has moderate inherent risk. Anything less than that has low inherent risk.
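The inherent risk arithmetic from step D and the score bands can be sketched as follows. This is a minimal illustration, not the R2 application’s implementation: the function names are our own, and we assume both inputs are whole numbers on the 1 to 5 scales described above.

```python
def inherent_risk(business_impact: int, threat_level: int) -> float:
    """Inherent risk = (business impact score x threat level score) / 5."""
    for name, value in (("business_impact", business_impact),
                        ("threat_level", threat_level)):
        if value not in range(1, 6):
            raise ValueError(f"{name} must be a whole number from 1 to 5")
    return business_impact * threat_level / 5


def risk_band(score: float) -> str:
    """Map an inherent risk score to the bands described in the text."""
    if score >= 4:
        return "high"
    if score >= 3:
        return "moderate"
    return "low"


# Hypothetical unit: catastrophic impact (5) in a moderate threat landscape (3).
score = inherent_risk(business_impact=5, threat_level=3)
print(score, risk_band(score))  # 3.0 moderate
```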
A. First, educate management. Management will be unfamiliar with the concept of residual risk calculation and its significance. It’s up to you to explain to the management team how it works and why it’s important.
B. Next, advise management on an acceptable level of risk tolerance. Based on the level of inherent risk, assign a percentage to indicate how much risk your management team should be willing to accept, for example:
The lower the percentage, the tighter your controls should be. The more effort you put into it, the better your chance of recovery will be.
C. Finally, calculate management’s level of risk tolerance. Multiply the risk tolerance percentage by the inherent risk factor. The resulting score is your risk tolerance.
Putting It Into Practice
Suppose the inherent risk factor is 5 and we identified our level of risk tolerance as low (10%). Multiply the risk factor by the risk tolerance (10% x 5); that’s 0.5. So, our maximum risk tolerance is 0.5. To get the risk factor-tolerance score, subtract 0.5 from 5; that’s 4.5. This means our mitigating controls must be in a state where their level of capability adds up to 4.5 or better to be within tolerance.
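The same arithmetic, written out as a short sketch. The function name `risk_factor_tolerance` is our own label for the two numbers the example produces; the inputs (inherent risk 5, tolerance 10%) are the ones used in the example above.

```python
def risk_factor_tolerance(inherent_risk: float, tolerance_pct: float) -> tuple[float, float]:
    """Return (maximum risk tolerance, risk factor-tolerance score).

    tolerance_pct is a fraction, e.g. 0.10 for 10%.
    """
    if not 0 <= tolerance_pct <= 1:
        raise ValueError("tolerance_pct must be between 0 and 1")
    max_tolerance = inherent_risk * tolerance_pct   # risk management will accept
    target = inherent_risk - max_tolerance          # what the controls must add up to
    return max_tolerance, target


max_tolerance, target = risk_factor_tolerance(inherent_risk=5, tolerance_pct=0.10)
print(f"{max_tolerance:.1f} {target:.1f}")  # 0.5 4.5
```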
A. First, assign weights to your mitigating controls based on their importance. The controls that we think protect a recovery plan are:
Controls should be weighted based on how important each one is to the success of the plan. In our view, the two most important controls (and the ones that should be most heavily weighted) are the recovery strategy (the plan you actually have in place to recover a particular business unit) and recovery exercises (the practice you’ve had testing the plan and its ability to help you recover).
B. Next, evaluate each of your mitigating controls against the standards. Is your recovery plan in line with the recommendations outlined in the standards? Depending on how well each control stands up to the recommended qualifications, give it either a 1 (poor), 3 (average), or 5 (best practice).
C. Finally, determine the weighted score of your mitigating controls. For each control, multiply the score by the weight. Then, add up those results to come up with one overall score for your mitigating controls (your mitigating control state).
Putting It Into Practice
If the BIA is scored a 5 (best practice) and is weighted 10%, multiply 10% by 5; that’s a weighted score of 0.5 for this mitigating control. Do the same for each of the controls. Add the scores for each to determine your overall mitigating control state.
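A weighted control score along these lines might be computed as in the sketch below. Only the BIA weight (10%) and score (5) come from the example; the other weights and scores, and the "Other controls" entry, are hypothetical placeholders chosen so the weights total 100%.

```python
def mitigating_control_state(weights_pct: dict[str, int], scores: dict[str, int]) -> float:
    """Sum of (score x weight) across controls, with weights given in percent."""
    if sum(weights_pct.values()) != 100:
        raise ValueError("control weights must add up to 100%")
    for control, score in scores.items():
        if score not in (1, 3, 5):  # 1 = poor, 3 = average, 5 = best practice
            raise ValueError(f"{control}: score must be 1, 3, or 5")
    return sum(scores[c] * w for c, w in weights_pct.items()) / 100


# Hypothetical weights and scores, except the BIA entry from the text.
weights = {"BIA": 10, "Recovery strategy": 30, "Recovery exercises": 30, "Other controls": 30}
scores = {"BIA": 5, "Recovery strategy": 3, "Recovery exercises": 3, "Other controls": 5}
print(mitigating_control_state(weights, scores))  # 3.8
```

Keeping the weights as whole percentages makes the "adds up to 100%" check exact and avoids floating-point rounding in the total.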
To complete the residual risk formula, compare the mitigating control state to the risk factor-tolerance number. How close are the two? If the mitigating control state is equal to or greater than the risk factor-tolerance number, you are within tolerance range. The business recovery plan you’ve created is right on the mark.
If the mitigating control state is lower than the risk factor-tolerance number, the plan is insufficient. Depending on how far off the mark you are, you’ll have to take further action to improve the strength of your business recovery plan.
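The final comparison can be sketched as below. The risk factor-tolerance number of 4.5 is the one from the earlier example; the mitigating control state of 3.8 is a hypothetical value, and the function name `assess_plan` is our own.

```python
def assess_plan(control_state: float, risk_factor_tolerance: float) -> str:
    """Compare the mitigating control state to the risk factor-tolerance number."""
    if control_state >= risk_factor_tolerance:
        return "within tolerance"
    shortfall = risk_factor_tolerance - control_state
    return f"insufficient (short by {shortfall:.1f})"


print(assess_plan(control_state=3.8, risk_factor_tolerance=4.5))  # insufficient (short by 0.7)
print(assess_plan(control_state=4.6, risk_factor_tolerance=4.5))  # within tolerance
```

Reporting the shortfall, rather than a bare pass or fail, gives a rough sense of how much the controls need to improve.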
With so many factors to consider in the creation of your business recovery plans, navigating it all can be a challenge. The BCMMetrics™ suite of online self-assessment tools can help.
For further guidance on residual risk calculations, look to our Residual Risk (R2) application. It is designed to provide BCM practitioners and risk managers with a simple, quantitative method to evaluate risk. With it, you can easily assess the risk factor of each business unit or system/application recovery plan, weight the importance of mitigating controls and evaluate them, establish risk tolerance levels, and perform a residual risk calculation for each plan.
BCMMetrics™ also comes with the Compliance Confidence (C2) tool, which evaluates your BC program against multiple major industry standards and gives you a “FICO-like” score for your business continuity planning. You can also compare your own company’s score against the scores of other users, giving you an idea of where your program stands in relation to the rest of your industry.
Not all business continuity threats make the news (nor do well-crafted business recovery plans). But the impacts they have on your business are real. Schedule a free demo of our tools today, and take the first step toward better protecting your business.