Being Aware of Incidents as Indicators

A single employee cannot access her email. This small incident seems insignificant, but it could be the tip of the iceberg, foreshadowing an impending disaster. Mitigating the effect of a large-scale catastrophe requires that a business recognize the indicators before the full impact is felt. The types of indicators a business needs to watch for change with the type and magnitude of the incident or problem, but there are some general things that all businesses should consider.

Be on the lookout for the incidents that are indicators. When it comes to the markers of a potential disaster, there are two simple rules to follow: watch for the signs carefully and monitor systems regularly. Those responsible for overseeing incident management need to know where to look and how to diagnose what they find, so they can determine whether an incident is isolated or an indicator of a much larger disaster that may already be occurring.

For example, one dangerous indicator is a loss of network connectivity. It is important to establish whether this event is isolated or part of a more widespread issue. It could be the result of a small local issue, such as a single computer losing its connection to the network, or the whole network could be out, which could result in a loss of company productivity or worse. Having communication, diagnostic and escalation tools in place provides an overview of the situation, so it can be assessed as either a local, isolated incident or a larger, more catastrophic issue.

Other examples of incidents to watch carefully:

Loss of Internet access
Inability to send messages via email or mobile device
Inability to receive telephone calls
Anything having to do with “broken” communications between applications (especially given the large amount of processing that now happens in the cloud)
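
One way to separate an isolated incident from a wider one is to count how many distinct users report the same symptom within a monitoring window. The sketch below illustrates that idea only; the `Report` type, the symptom names and the threshold of three users are hypothetical and would need tuning for a real environment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """A single incident report (hypothetical structure for illustration)."""
    user: str
    symptom: str  # e.g. "email_send_failure"


# Assumed threshold: this many distinct users with the same symptom
# in one monitoring window suggests the issue is not isolated.
WIDESPREAD_THRESHOLD = 3


def classify_scope(reports, threshold=WIDESPREAD_THRESHOLD):
    """Label each symptom 'isolated' or 'widespread' by distinct user count."""
    users_by_symptom = {}
    for report in reports:
        # A set ignores repeat reports from the same user.
        users_by_symptom.setdefault(report.symptom, set()).add(report.user)
    return {
        symptom: "widespread" if len(users) >= threshold else "isolated"
        for symptom, users in users_by_symptom.items()
    }
```

A symptom labeled "widespread" here is only a prompt for closer diagnosis, not a confirmed disaster.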

Who is responsible?

Incidents are managed and diagnosed as they occur at every level of a company’s support structure, with each tier playing a specific and important role. Proper management of these situations also requires that a business have the capabilities in place to correctly gauge the scale of the problem.

1st Level: When an incident initially occurs, such as an employee being unable to send an email, a certain amount of “symptomatic” diagnosis can be done. This type of diagnosis gauges the scale of the issue: whether it is narrow and benign or something more severe. This level of incident management mostly concerns following scripts and checking out the “usual suspects.”

2nd Level: When issues appear to be more widespread, second-level support, which generally involves problem management resources and a higher skill level, is used to diagnose the particulars of the situation. Are pieces of equipment, databases, networks, applications and other critical components in the IT infrastructure working correctly?

3rd Level: The next group to get involved is specialized external resources, such as vendors and suppliers, who can assess the situation on their end to determine the magnitude of the problem. For example, if an organization uses a third-party payment processing system, this system may need to be assessed from both ends to determine the root cause of the problem.
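
The three levels can be read as a simple routing rule. The sketch below is one possible interpretation, not a prescribed procedure: the function name, its three yes/no inputs and the order of the checks are all assumptions made for illustration.

```python
from enum import IntEnum


class Tier(IntEnum):
    FIRST = 1   # scripted, symptomatic diagnosis
    SECOND = 2  # problem management, infrastructure checks
    THIRD = 3   # external vendors and suppliers


def route_incident(widespread: bool,
                   internal_checks_pass: bool,
                   involves_external_service: bool) -> Tier:
    """Pick a support tier for an incident (hypothetical routing rule)."""
    if not widespread:
        # Narrow issues stay with first-level scripts and "usual suspects".
        return Tier.FIRST
    if internal_checks_pass and involves_external_service:
        # Internal infrastructure looks healthy, so the vendor side
        # needs to be assessed as well.
        return Tier.THIRD
    # Widespread issue that still needs internal diagnosis.
    return Tier.SECOND
```

For instance, a widespread payment failure with healthy internal systems and a third-party processor in the path would route to `Tier.THIRD` under these assumptions.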


Steve Tower