Incident & Problem Management

The ability to carry out a Disaster Recovery Plan successfully depends on how well an organization manages individual problems as they arise. Incident management is the front-end “canary in the coal mine”: it detects whether a minor threat has the potential to turn into a full-blown disaster.

Incident or problem management is closely tied to Disaster Recovery planning because it may identify a pending disaster or spot one that is already occurring. Accurate detection of a risk or threat is the first part of the equation; the second is a good incident and problem management process that can properly track, expand, and escalate the actions and reactions around the threat.

Progression of Escalation: How an incident reaches a DR Team

Consider the following practical example: an employee reports to a Service Desk colleague that she does not have access to her email. While seemingly insignificant, this problem may be a sign of a much larger and more threatening one, such as a system failure. When properly managed, the incident may be escalated through various stages of the organization until it is either no longer categorized as a potential threat or declared a disaster. Upon further investigation, IT staff could discover that this is not an isolated incident. Because the incident affects multiple employees, it escalates to someone at the second level of diagnosis, who determines the root cause and whether this is a true disaster. If deemed “major”, it then escalates to a manager, who determines the severity of the impact and potentially declares a disaster. At this stage of escalation, the designated Disaster Recovery Team takes over and activates the DR plan.

This example shows how DR planning works in tandem with incident or problem management: the process has built-in escalations and accommodates the increasing scale and scope of an incident. Correct management of minor incidents can help prevent or mitigate a large-scale disaster. A problem is handed off to the DR team through the process of escalation. As the severity of the problem increases, management of the incident passes up through rising levels of responsibility and then over to the DR team or coordinator. Working quickly and smoothly through the escalation process helps determine whether the situation requires activation of the DR plan.
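The escalation path in the email example can be sketched in a few lines of Python. This is only an illustration of the hand-offs described above, not a prescribed implementation: the names `Stage`, `Incident`, and `escalation_stage`, and the rule that more than one affected user means "not isolated", are assumptions made for the sketch.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    """Hypothetical stages, mirroring the email example."""
    SERVICE_DESK = 1   # first report, e.g. one user without email
    SECOND_LEVEL = 2   # root-cause diagnosis
    MANAGER = 3        # assesses severity, may declare a disaster
    DR_TEAM = 4        # DR plan is activated


@dataclass
class Incident:
    summary: str
    affected_users: int = 1
    is_major: bool = False           # set by second-level diagnosis
    disaster_declared: bool = False  # set by the manager


def escalation_stage(incident: Incident) -> Stage:
    """Return the stage that currently owns the incident."""
    # Assumption: a single affected user is treated as an isolated incident.
    if incident.affected_users <= 1:
        return Stage.SERVICE_DESK
    if not incident.is_major:
        return Stage.SECOND_LEVEL
    if not incident.disaster_declared:
        return Stage.MANAGER
    return Stage.DR_TEAM


outage = Incident("No access to email", affected_users=40, is_major=True)
print(escalation_stage(outage).name)  # MANAGER
```

Each check corresponds to one decision in the example: whether the incident is isolated, whether it is major, and whether a disaster has been declared. A real process would add timing, ownership, and audit details that are left out here.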

Rapid diagnosis of incidents and problems is the key to mitigating the effect of emerging disasters. Wherever possible, integrate disaster recovery with IT operational support processes. At a minimum, embed disaster recovery identification in Service Desk procedures. Accommodate disaster recovery by using consistent terminology and shared definitions of incident severity, priorities, SLAs, RPOs, RTOs, tiers, business impact, and escalation timeframes. A well-integrated Incident and Problem Management workflow will not necessarily prevent disasters, but it may reduce their effect and help an organization recover faster.
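One way to keep terminology consistent is to hold the shared definitions in a single place that both the Service Desk and the DR team consult. The sketch below assumes that approach; every name and every value in it (severity labels, escalation timeframes, tier names, RTO and RPO figures) is a hypothetical placeholder, not a recommendation.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Severity:
    label: str
    business_impact: str
    escalate_within: timedelta  # escalation timeframe


@dataclass(frozen=True)
class ServiceTier:
    rto: timedelta  # recovery time objective
    rpo: timedelta  # recovery point objective


# Placeholder values for illustration only.
SEVERITIES = {
    1: Severity("critical", "core service down for many users", timedelta(minutes=15)),
    2: Severity("major", "service degraded for a department", timedelta(hours=1)),
    3: Severity("minor", "single user affected", timedelta(hours=8)),
}

TIERS = {
    "tier1": ServiceTier(rto=timedelta(hours=4), rpo=timedelta(hours=1)),
    "tier2": ServiceTier(rto=timedelta(hours=24), rpo=timedelta(hours=12)),
}


def escalation_overdue(severity: int, age: timedelta) -> bool:
    """True if an unresolved incident has exceeded its escalation timeframe."""
    return age > SEVERITIES[severity].escalate_within


print(escalation_overdue(1, timedelta(minutes=20)))  # True
print(escalation_overdue(3, timedelta(minutes=20)))  # False
```

Because both teams read the same table, a "severity 1" ticket at the Service Desk means the same thing to the DR coordinator, and escalation timeframes are checked against a single definition.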

Steve Tower

With many years of professional IT experience and training as a Certified Management Consultant, a Project Management Professional, a Professional Engineer, and a Member of the Business Continuity Institute, Steve Tower has the skills and abilities required to assist with even the most complex disaster recovery planning initiatives. Below, Steve discusses the necessary tools involved in setting up a disaster recovery plan and program.