DR Testing: You’ll Need Some Time!

In an actual disaster, the objective is to fail over as quickly as possible, because this minimizes damage to the business. In a testing situation, the objective is different: to ensure that everyone involved both understands the DR plan and can carry it out in a time of crisis. That takes time and practice.

Failover-failback testing is meant to validate that the DR plan will work when needed, but it is also meant to identify potential complications or gaps in the plan and to remedy them. There are different levels of testing, and each level is time-consuming in its own way.

Each level of testing requires that a test be executed, verified, documented and assessed before proceeding to the next one. Testing individual components in isolation can take a long time, because it requires configuring components at a recovery site and then documenting and tweaking those configurations as required during individual technical tests. Conducting a full failover test requires diverting the attention of many key IT resources to the secondary site, and it’s not unusual for this test to consume a full day or a full weekend. (For example, that may mean shutting down production systems on a Friday evening and bringing them back online on Sunday, before prime-time business hours resume on Monday morning.)
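
As a rough illustration of that gating rule, here is a minimal Python sketch in which a test level only counts as finished once all four stages (executed, verified, documented, assessed) are recorded. The class name, function names and level labels are hypothetical and chosen for illustration; they are not part of any particular DR tool.

```python
from dataclasses import dataclass, field

# The four stages named in the text, in the order they are listed.
STAGES = ("executed", "verified", "documented", "assessed")


@dataclass
class DrLevel:
    """One level of DR testing (hypothetical structure for illustration)."""
    name: str
    completed: set = field(default_factory=set)

    def is_done(self) -> bool:
        # A level is finished only when every stage has been recorded.
        return all(stage in self.completed for stage in STAGES)


def missing_stages(level):
    """Return the stages still outstanding for a level, in order."""
    return [stage for stage in STAGES if stage not in level.completed]


def next_level(levels):
    """Return the first level with an unfinished stage, or None if all are done."""
    for level in levels:
        if not level.is_done():
            return level
    return None


# Example data (assumed): the component test is complete, the full
# failover test has been run and verified but not yet written up.
levels = [
    DrLevel("component test", {"executed", "verified", "documented", "assessed"}),
    DrLevel("full failover test", {"executed", "verified"}),
]

current = next_level(levels)
print(current.name, missing_stages(current))
```

In this example the sketch reports that the full failover test still needs to be documented and assessed, so no further level should start yet.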

The priority is to ensure that everyone knows what he or she is doing when disaster strikes. If it’s the first time conducting a failover test, or if inexperienced people are involved, it will take longer and there will most likely be unforeseen complications. Time is required to resolve misunderstandings, fix problems and adjust unclear procedures. Everything also needs to be thoroughly recorded in order to assess how well the failover-failback test worked. Finally, if production systems have been taken offline, they will have to be reactivated, a process that can also be tricky. The DR testing process can take anywhere from half a day to a few days, depending on how elaborate the failover testing and the reactivation of production are. The downtime required to test is minor compared to the confidence gained from knowing that the DR strategy is valid and that the plan will succeed in a real disaster.

Steve Tower