IT Incident Management

 

By David Frank, Kepner-Tregoe

Those who know Kepner-Tregoe (KT) problem-solving processes often associate them with manufacturing, operations or problem management in IT. Surprisingly, many people overlook the value KT processes provide for incident management. The service desk and various support organizations throughout your IT function are responsible for resolving hundreds, if not thousands, of incidents every day. At this volume, a rigorous set of processes like KT problem-solving may not seem appropriate, but that impression is mistaken. KT problem-solving processes are an ideal fit for your IT incident-management teams: they provide an easy-to-understand set of activities, workflows and tools for determining the nature of an incident, making decisions and implementing an effective solution that quickly restores service.

How this would work

A service-desk agent receives a call from a user reporting a service disruption. The agent must ask the user the right questions to clarify the issue and understand its impact. If the issue is a common known error, the agent may troubleshoot directly with the customer and provide a standard workaround. Once the issue goes beyond that level of complexity, the fun really begins.

From there, the agent must determine who needs to be involved, record the incident in a support ticket and route it to the correct user or group. The agent also assesses any dependencies that might be contributing to the situation and evaluates the impact on the business (severity).
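The recording, severity assessment and routing steps described above can be sketched in a few lines of Python. This is a minimal illustration only: the field names, the severity thresholds, the routing table and the group names are all hypothetical assumptions, not part of any KT worksheet or ticketing product.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class Incident:
    """Hypothetical ticket record; field names are illustrative only."""
    summary: str
    affected_service: str
    users_affected: int
    business_critical: bool = False
    dependencies: list[str] = field(default_factory=list)


def assess_severity(incident: Incident) -> Severity:
    """Estimate business impact. The thresholds are assumptions."""
    if incident.business_critical or incident.users_affected >= 100:
        return Severity.HIGH
    if incident.users_affected > 1:
        return Severity.MEDIUM
    return Severity.LOW


# Hypothetical mapping from affected service to support group.
ROUTING = {"email": "messaging-team", "vpn": "network-team"}


def route(incident: Incident) -> str:
    """Pick the group that receives the ticket, with a default fallback."""
    return ROUTING.get(incident.affected_service, "service-desk-tier-2")
```

For example, an incident recorded as `Incident("VPN drops", "vpn", users_affected=250)` would be assessed as `Severity.HIGH` and routed to the hypothetical `network-team` group.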

The subsequent handover of information in a clear, consistent and concise manner is critical to speed and quality. The receiver must take over the troubleshooting process. KT can reduce the time required to understand the incident by avoiding the re-clarification that would normally occur at this stage, empowering the receiver to get up to speed and quarterback the restoration process. Once the reason(s) for the outage have been identified, the agent must leverage knowledge resources, such as notes from previous incidents, the CMDB and consultations with Subject Matter Experts (SMEs), to identify alternatives for resolving the issue with a short-term fix or workaround. Based on these alternatives, their expected efficacy, the cost and time to implement them, and their risk, the agent either acts or recommends a course of action to the requestor.
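One way to picture the comparison of alternatives is a simple weighted score over the three factors named above: expected efficacy, time to implement and risk. The sketch below is an assumption-laden illustration, not KT's decision-analysis method; the weights, the 120-minute ceiling and the example figures are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Alternative:
    name: str
    efficacy: float              # 0..1, estimated chance it restores service
    minutes_to_implement: float
    risk: float                  # 0..1, estimated chance of making things worse


def score(alt: Alternative,
          max_minutes: float = 120.0,
          weights: tuple[float, float, float] = (0.5, 0.3, 0.2)) -> float:
    """Weighted score; higher is better. Weights are illustrative assumptions."""
    w_efficacy, w_speed, w_safety = weights
    # Convert implementation time into a 0..1 "speed" value (faster is higher).
    speed = max(0.0, 1.0 - alt.minutes_to_implement / max_minutes)
    return w_efficacy * alt.efficacy + w_speed * speed + w_safety * (1.0 - alt.risk)


def rank(alternatives: list[Alternative]) -> list[Alternative]:
    """Return the alternatives ordered from best to worst score."""
    return sorted(alternatives, key=score, reverse=True)


options = [
    Alternative("roll back change", efficacy=0.95, minutes_to_implement=90, risk=0.6),
    Alternative("restart service", efficacy=0.70, minutes_to_implement=10, risk=0.2),
    Alternative("fail over to standby", efficacy=0.90, minutes_to_implement=45, risk=0.4),
]
best = rank(options)[0]
```

With these made-up figures the quick, low-risk restart ranks first even though the rollback is more likely to work, which mirrors the incident-management priority of restoring service quickly. Changing the weights changes the ranking, so the numbers should reflect the risk profile of the incident at hand.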

Consistent processes improve performance

This scenario is only slightly different from the problem-management approach that operations staff and IT problem managers use to address long-term risks. The primary differences are:

  1. The incidents happen on a compressed timescale, multiple times each day.
  2. The goal is service restoration (fixing the symptom) rather than finding the root cause.
  3. The primary skill used by the agent is decision making rather than analytical investigation.

The goal of leveraging KT problem-solving processes for incident management isn’t to add overhead and unnecessary rigor to the incident-management process. Instead, it is to provide a simple, consistent approach that enables support agents and incident managers to process user issues more quickly, identify the underlying issues the first time, and select the correct fix or workaround based on the information at hand and the risk profile of the issue, so that service is restored quickly.

Consistent incident-management processes can help improve service desk and IT support team efficiency by more than 20%, while reducing variance amongst agents by over 40%. This translates into either increased capacity for resolving more incidents or a potential cost saving. It is not uncommon to see this approach reduce not only the time it takes to solve issues, but also the number of people involved. Bridge-call participation is often reduced from dozens of participants to a few targeted individuals.
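To see how an efficiency gain turns into capacity or savings, the arithmetic can be sketched as follows. This assumes that "20% more efficient" means each incident takes 20% less handling time, which is one possible reading of the figure; the daily volume and handling time in the example are hypothetical.

```python
def capacity_gain(time_reduction: float) -> float:
    """Extra incidents the same staff can handle, as a fraction.

    If each incident takes (1 - r) of the original time, the same hours
    cover 1 / (1 - r) times as many incidents.
    """
    return 1.0 / (1.0 - time_reduction) - 1.0


def hours_saved_per_day(incidents_per_day: int,
                        minutes_per_incident: float,
                        time_reduction: float) -> float:
    """Staff hours freed each day if the incident volume stays the same."""
    return incidents_per_day * minutes_per_incident * time_reduction / 60.0


# Hypothetical service desk: 1,000 incidents a day at 15 minutes each.
extra_capacity = capacity_gain(0.20)                 # about 0.25, i.e. 25% more incidents
freed_hours = hours_saved_per_day(1000, 15, 0.20)    # about 50 staff hours per day
```

Under this reading, a 20% reduction in handling time is worth either roughly 25% more incidents handled by the same team or about 50 staff hours a day at the example volume.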

Compounded across a large support organization, the benefits add up quickly. The real value of improved incident-management performance is faster resolution of user issues. Each incident represents a disruption to your normal business processing. The incident may affect a single user (on the phone with the service desk) or it may be impairing the performance of entire business functions. The quicker you resolve the incident, the sooner your employees can return to their normal activities.

As an added benefit, the consistent and structured approach to data capture and documentation is a catalyst for downstream efficiencies in problem management, remediation and change management.

Companies large and small use problem-solving processes from Kepner-Tregoe every day to help their staff members address all kinds of issues – from small tactical service requests to large-scale major incidents. Companies that use KT processes understand that their primary purpose is to help employees do their jobs as effectively and efficiently as possible. When the process just works, your staff can focus on the business problems at hand.
