Before a disaster becomes a disaster, it usually begins as a small, often ignorable incident. Perhaps a few emails go missing in transit. Maybe the video feed in a communication program like Skype stalls. Or maybe a company-specific application keeps closing without warning. While such events could simply be minor glitches on the server side, they could also signal the birth of a crisis.
That is why detecting and diagnosing operational irregularities is such an important step in preventing and coping with disaster. Strong monitoring, rooted in the systems operations, technical support, and applications groups, is the front line of defense against major technical problems and allows organizations to deal with disasters swiftly.
A successful formula for identifying threats includes four informal steps, which run from discovering potential dangers, to assessing the threat at hand, to putting a disaster recovery plan into action. The four steps are:
- Capture & Detection – This is mostly done by frontline people, such as system users, the “public” and technicians responsible for monitoring services. They are the first to encounter issues and spot operational abnormalities in systems and apps. In many cases, there is a delay between the actual occurrence of a service issue and its detection and reporting.
- Diagnosis & Investigation – This stage is typically carried out by the IT staff who look into the issues. They attempt to discover when the issue actually occurred, why it occurred, and whether it is a minor incident or the first indication of a major disaster.
- Prognosis & Prediction – Using the information gathered during the investigation phase, a group of supervisors, managers, and IT specialists estimates the time to repair. For each escalated event, they look at potential workarounds and alternate methods of restoring services. Their focus is predicting the expected duration of the outage. Whether it will last five minutes, five days, or five weeks determines the next step.
- Decision-making & Declaration/Dismissal – The report of the incident and its potential consequences is the basis for a decision to declare a disaster. Designated individuals (usually IT management and/or company executives) are responsible for launching the DR plan. Since this usually involves lead time to activate, significant deployment of resources and, obviously, expense, this step is not taken lightly. There are usually “failback” and restoration costs associated with this decision as well.
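The four steps above can be sketched as a small data model, which may help when turning the process into a checklist or a ticketing rule. This is only an illustration: the `Incident` fields, the `recommend` function, and the four-hour `MAX_TOLERABLE_OUTAGE` threshold are hypothetical names and values chosen for the example, not part of any DR standard. The actual declaration remains a decision for the designated individuals.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Recommendation(Enum):
    DISMISS = "dismiss"
    DECLARE = "declare disaster"


@dataclass
class Incident:
    description: str
    occurred_at: datetime         # established during diagnosis
    detected_at: datetime         # when the issue was first captured or reported
    predicted_outage: timedelta   # prognosis: expected time to repair from detection
    workaround_available: bool = False

    @property
    def detection_gap(self) -> timedelta:
        """Delay between the actual occurrence and its detection."""
        return self.detected_at - self.occurred_at


# Assumed threshold for illustration only; each organization sets its own.
MAX_TOLERABLE_OUTAGE = timedelta(hours=4)


def recommend(incident: Incident,
              max_tolerable: timedelta = MAX_TOLERABLE_OUTAGE) -> Recommendation:
    """Suggest whether to declare a disaster; people make the final call."""
    # Assumption: a usable workaround keeps the event at incident level.
    if incident.workaround_available:
        return Recommendation.DISMISS
    # Count the time already lost before detection plus the predicted repair time.
    expected_downtime = incident.detection_gap + incident.predicted_outage
    if expected_downtime > max_tolerable:
        return Recommendation.DECLARE
    return Recommendation.DISMISS


start = datetime(2024, 1, 1, 9, 0)
minor = Incident("a few emails delayed", start,
                 start + timedelta(minutes=10), timedelta(minutes=5))
major = Incident("application server down", start,
                 start + timedelta(hours=1), timedelta(days=5))
print(recommend(minor).value)
print(recommend(major).value)
```

Including the detection gap in the expected downtime reflects the point made under Capture & Detection: the outage clock starts when the issue occurs, not when someone notices it.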
Following these steps helps ensure that any disaster is addressed systematically, as soon as it begins to pose a threat to an organization. The process also depends on links between the various departments and individuals with roles in the disaster recovery plan, and it enables the communication that is ultimately vital to rectifying a catastrophic situation.