An entire book could easily be written on business continuity testing. In fact, we're developing an ebook on this topic right now. We plan to publish it next year, but in the meantime, I wanted to share three things about BC testing that everyone in our field should know but few people do.
As you are no doubt aware, in our field, testing refers to the exercises we conduct to see whether our recovery plans are fully functional.
As business continuity professionals, we write plans in three areas: business continuity (BC), information technology/disaster recovery (IT/DR), and crisis management (CM). The goal of these plans is to ensure that these areas are resilient and can be restored quickly or handled efficiently in the wake of disruptions or emergencies. But how do we know the plans will actually work?
The answer is, we test them. Or, in too many cases, we don’t test them. We just cross our fingers and hope they work.
The second option is not recommended!
The fact is, every responsible organization should have recovery plans addressing the three areas mentioned above. And every organization that has recovery plans needs to test them—to make sure they work, identify gaps, and train employees in their use.
MHA Consulting’s Richard Long recently wrote an excellent series of posts about testing, with a focus on devising and facilitating mock disaster exercises.
Richard and I are currently working on an ebook about business continuity testing, which we plan to publish next year. In the meantime, here are three points about testing that occurred to me during client engagements over the past few weeks:
1. Business continuity testing is a process, not a one-time event.
2. There are four distinct types of testing.
3. The level of testing should match the criticality of the process or application.
I discuss each in greater detail below.
Some people assume BC testing is a one-and-done proposition. Nothing could be further from the truth. The everyday activity most similar to BC testing is training for a marathon.
No one spends three months lying around on the couch eating potato chips and then gets up one morning and runs 26.2 miles. You have to train over a period of months, gradually increasing the distance you run. By making yourself run progressively longer distances, you steadily build up your strength.
I’ve run two marathons, in San Diego and New York. If you are also a distance runner, you’ll know this instinctively: No one succeeds at running 26.2 miles at a good pace simply by making a heroic effort on race day. You succeed by gradually building up your fitness ahead of time—and then you make a heroic effort on race day. And (if you’re like me) the last six miles are still torture. However, you keep going and cross the finish line, ideally at or ahead of your desired time. It feels great to finish. And of course it wouldn’t have been possible if you hadn’t systematically built up a foundation of endurance over the preceding weeks and months.
It’s the same with business continuity testing. It’s not realistic to expect to be great out of the gate. The way to get good at it is to start slow and gradually build up what you can do. Incrementally develop your organization’s knowledge and capacity. Be patient. Think like an athlete. Adopt the perspective of a long-distance runner.
There’s a lot of confusion out there regarding what testing is and the types of testing. Let’s clear it up right now.
There are four types of business continuity testing. They are:
1. Tabletop exercises
2. Walkthroughs
3. Semi-functional tests
4. Fully functional tests
A tabletop exercise is when a couple of people sit in a room and go through the plan as a discussion. It's more talking than doing. It's a good chance to see whether your plan is missing anything, such as someone's phone number or a high-level task. Tabletop testing is good as far as it goes, but it only goes so far.
A walkthrough test is basically a tabletop test plus a scenario. You're not just reviewing the plan; you're pretending that a specific disaster has happened (such as a flood or a data breach). You then talk through the steps for dealing with this scenario. Walkthroughs are more stressful than tabletop exercises, and the situations can seem very real. However, like tabletop exercises, walkthroughs are more about talking than doing. You might pretend that you escort everyone to an alternate work site, but in reality no one goes anywhere and you never leave your chair.
In a semi-functional test, the pressure is really ratcheted up. Now you are not only working through a specific disaster scenario, you actually implement your plan, at least to a degree. You might really relocate your group to an alternate work site and have some people work from home. This level of testing is rigorous enough that it typically unearths a lot of problems. For example, people working from home might discover that they are unable to log on to the company computer network. People at a remote location might find that they are missing critical supplies. All problems encountered should be documented and fixed. The amount of time spent working in the alternate locations is typically shorter than it would be in a real-life disaster.
A fully functional test is as close to the real thing as you can get without an actual disaster. In marathon terms, you are running all 26.2 miles. The object is to implement the recovery plan fully to make sure you can really do it when the time comes. People work from home or at the alternate location not just for an hour but for a day or two. Fully functional exercises can be either announced or unannounced. They are the most stressful and time-consuming but also the most revealing type of exercise.
The last of my three points is about the importance of matching the level of testing to the criticality of the process or application.
I’m always surprised when I find organizations spending a lot of time and money running intensive tests of unimportant processes.
The Boy Scout motto is “Be Prepared,” not “Be Overprepared.” And yet, many organizations do overprepare.
I recommend dividing your processes into three categories based on when they need to be recovered in order to prevent a significant impact to the organization:
Once you determine the criticality of your various processes and applications, you can test them at the appropriate level as shown below:
The exact decisions about levels and timing will vary from organization to organization. The important thing is to be logical about where you devote your resources. Match the intensity of the testing to the criticality of the process.
This approach will lead you to invest your testing time and resources in the processes that are the most critical to the well-being of the organization.
Let’s conclude this post on BC testing with a test. It’s multiple choice, and the test question is: Is business continuity testing: a) something every organization should do, b) helpful for identifying gaps in your recovery plans, c) similar to training for a marathon, or d) all of the above? If you picked d, you passed the test.