Mitigating Risks of the IT Disaster Recovery Test

The IT Disaster Recovery test, as part of Business Continuity testing, is becoming an annual event for most IT departments. It is mandated by many regulators, strongly urged by internal audit and, of course, a very healthy thing to do.

But performing the IT DRP test without proper risk management can put your organization at significant risk.


To put things into perspective, let’s analyze the steps, risks and countermeasures of an IT Disaster Recovery test:

Each DRP test step is listed with its activity, its risks and the countermeasure for each numbered risk.

DRP Test Step 1: Failure of primary systems
  Activity: To simulate a disaster situation, the primary systems need to be made to fail on some level.
  Risks:
    1. Databases not closed properly or damaged due to a forced shutdown or forced power failure
    2. Hardware components failing due to a forced shutdown or power failure
    3. Split-brain cluster due to an uncontrolled sequence of server and storage failures
  Countermeasures:
    1. Full backup prior to the initiation of the DRP test
    2. Backup components and vendor presence at the ready during the entire test
    3. Not performing a direct forced shutdown, but forcing a network-level isolation at the routers

DRP Test Step 2: Activation of Disaster Recovery systems
  Activity: Severing any relation between the DR and the primary systems, and running the DR systems as the temporary primary.
  Risks:
    1. Actual failure of the primary system during the test
    2. Failure of the primary system after the DR system has been found to be non-functional
  Countermeasures:
    1. Full awareness of the test by every interested party (business custodians, directors of divisions and top management), so that the real Business Continuity Plan can be initiated
    2. Full backup at the DRP site prior to the initiation of the DRP test, and full vendor support

DRP Test Step 3: Reconfiguring the user environment
  Activity: Intervening in the end-user environment in a way that makes the users work on the DR system.
  Risks:
    1. Error in reconfiguration which may cause the end-user to input test data into the primary systems
    2. Error in reconfiguration which may cause the primary system to stop functioning
  Countermeasures:
    1, 2. Scripted and documented steps of reconfiguration. All steps should be performed by two persons, one observing the other's actions.

DRP Test Step 4: Reverting to the primary systems
  Activity: Resuming the primary systems on some level and reestablishing the relation between the DR and the primary systems.
  Risks:
    1. Error in reconfiguration which may cause the primary system to stop functioning
    2. Copying of test data that was input into the DR test system back into the primary location
    3. Failure of the primary systems during resumption
  Countermeasures:
    1. Scripted and documented steps of reconfiguration. All steps should be performed by two persons, one observing the other's actions.
    2. Fully controlled and documented process of resumption, which guarantees that only the primary system is the data master
    3. Full backup prior to the initiation of the DRP test, with backup components and vendor presence at the ready during the entire test
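
The "scripted and documented steps performed by two persons" countermeasure can be partly enforced by tooling. The Python below is only a minimal sketch of that idea, not a finished tool: the names Step, Runbook, operator and observer are hypothetical and chosen for illustration. It refuses to record a step unless the observer is a different person from the operator, and it keeps a timestamped log as documentation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass
class Step:
    """One scripted reconfiguration step (hypothetical structure)."""
    number: int
    description: str


@dataclass
class Runbook:
    """Runs steps strictly in order and logs who performed and who observed each."""
    steps: List[Step]
    log: List[str] = field(default_factory=list)
    position: int = 0  # index of the next step to perform

    def execute(self, operator: str, observer: str) -> Step:
        # Two-person rule: the observer must not be the operator.
        if operator == observer:
            raise ValueError("operator and observer must be different people")
        if self.position >= len(self.steps):
            raise IndexError("all steps have already been performed")
        step = self.steps[self.position]
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        self.log.append(
            f"{stamp} step {step.number}: {step.description} "
            f"(performed by {operator}, observed by {observer})"
        )
        self.position += 1
        return step

    @property
    def complete(self) -> bool:
        return self.position == len(self.steps)


# Example with made-up steps and names.
runbook = Runbook([
    Step(1, "Point the client DNS alias to the DR system"),
    Step(2, "Verify that a test client reaches the DR system"),
])
runbook.execute(operator="alice", observer="bob")
runbook.execute(operator="alice", observer="bob")
print("\n".join(runbook.log))
```

The sketch only records that a step was confirmed. The actual reconfiguration commands would still come from your own documented procedure.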

With all these risks, is it more prudent to never perform an IT DRP test? – Absolutely NOT, and here is why:

  • Performing the IT DRP test confirms that the recovery actually works, and if something breaks during the test, you are much better prepared for the next time.
  • Not performing the test will just make you think everything is great, until the incident occurs. And the incident is just as certain as death and taxes.

So, perform the IT DRP test regularly, but with a full set of countermeasures for the risks that can arise during the test. Of course you will miss some risks, but planning for 10 and missing 1 is much better than not planning at all!
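
One simple way to keep the risks and countermeasures honest is a small risk register that is checked before the test starts. The Python below is a rough sketch under assumptions: the register layout and the function name uncovered_risks are made up for illustration, and the entries are examples taken from the steps above.

```python
from typing import Dict, List


def uncovered_risks(register: Dict[str, List[str]]) -> List[str]:
    """Return every risk that has no countermeasure assigned."""
    return [risk for risk, measures in register.items() if not measures]


# Example register: risk -> list of planned countermeasures.
register = {
    "Database damaged by forced shutdown": ["Full backup before the test"],
    "Split-brain cluster": ["Network-level isolation at the routers"],
    "Test data copied back to primary": [],  # deliberately left uncovered
}

missing = uncovered_risks(register)
if missing:
    print("Do not start the test yet. Risks without countermeasures:")
    for risk in missing:
        print("  -", risk)
else:
    print("Every listed risk has at least one countermeasure.")
```

A check like this only covers the risks you have written down. It does not find the ones you missed.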

Talkback and comments are most welcome
