By Christoph Goldenstern

IT stability is a strategic need, and clients increasingly tell us how critical it is. According to Forrester, 57% of organizations suffer performance and availability problems with business-critical applications on a weekly basis.

The #1 desired capability: Rapid Root Cause Analysis (RCA)

Problem Management (PM) and IT support operate in a high-pressure environment, and the pressure is only increasing. Research shows that the volume of incidents and problems is rising dramatically, driven by the complexity of technology, new computing models, the cloud and the rapid speed of innovation. Add to this a multivendor environment, which makes problem solving harder because accountability is less clear. Technical knowledge always lags behind new technology, so product experience alone cannot solve problems.

In addition to new incidents, recurring incidents add pressure. Incidents recur when PM doesn’t get to the root cause and when case content lacks the consistency or structure needed for knowledge reuse. As a cost center, IT is under constant pressure to do more with less. And SLAs are increasingly demanding: customers want service that is better, faster and cheaper.

Within this challenging landscape, progress is attainable only by improving Root Cause Analysis (RCA) through a rigorous approach that focuses on the most critical data (because speed is critical) and aligns all resources involved in the process. Relying on individual technical expertise and trial and error in today’s complex environment is inefficient and costly at best and disastrous in some cases. What is needed instead is clear thinking under pressure.

The value of this kind of thinking is explored in Daniel Kahneman’s book, Thinking, Fast and Slow. Kahneman describes two modes of thinking: fast (System 1) and slow (System 2). System 1 relies on intuition: unconscious, automatic thinking based on past knowledge and experience. System 2 is a more deliberate, critical approach that takes time to pause, reflect and check assumptions and facts. We can handle about 80% of everyday issues using System 1 thinking, drawing mainly on the experience we have built up over the years. But as complexity increases, along with the risk and reward of putting a solution in place, we should pause and engage in System 2 thinking: a more deliberate, systematic approach to solving problems and making decisions, such as RCA.

Thinking under pressure tends to be inconsistent if it is not structured, practiced, managed and supported. So what does a System 2 RCA/PM approach look like?

Here are some recommended high-level steps:

1. Situation Appraisal is about understanding whether we are actually dealing with a problem and whether it is worth pursuing. What is going on? What is the impact or value to the business? Do we need to know the cause? At this point, we plan next steps, such as selecting an approach to resolution and identifying resources.

2. Problem Analysis begins with understanding the symptoms and only then moves to potential causes (not the other way around!). The “symptoms” are described and explored in a Problem Statement that notes what IS and IS NOT going on, supported by data that describes the what, where, when and extent of the problem. From here we can identify and evaluate possible causes and then confirm the true cause.

3. Think Beyond the Fix begins once the cause is found. This is where we switch from reactive to proactive PM. Here we consider other areas that could benefit from our investigation. We ask: Where else could this be happening? What other damage could this cause create? What caused the cause?

4. Execute and Document is the step that transitions us into change management and ensures the creation of reusable knowledge so we don’t solve the same problem again and again: decide on the best fix, generate change requests, implement and verify the fix, and create a meaningful knowledge base article.
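
To make the Problem Analysis step more concrete, here is a minimal Python sketch of how a team might record IS / IS NOT facts and screen candidate causes against them. It is an illustration under stated assumptions, not a prescribed tool or template: the names `ProblemStatement`, `missing_dimensions` and `surviving_causes` are hypothetical, the incident data is invented, and the rule that a cause must account for all four dimensions is a deliberately simple stand-in for an analyst's judgment.

```python
from dataclasses import dataclass, field

# The four dimensions named in the Problem Analysis step.
DIMENSIONS = ("what", "where", "when", "extent")


@dataclass
class ProblemStatement:
    """Hypothetical record of a problem's IS / IS NOT facts."""
    summary: str
    is_facts: dict = field(default_factory=dict)      # what IS observed
    is_not_facts: dict = field(default_factory=dict)  # what could be, but IS NOT

    def missing_dimensions(self):
        """Return dimensions that still lack an IS or an IS NOT entry."""
        return [d for d in DIMENSIONS
                if d not in self.is_facts or d not in self.is_not_facts]


def surviving_causes(candidates):
    """Keep only the causes that account for every dimension.

    `candidates` maps a cause name to the set of dimensions whose
    IS / IS NOT pair that cause would explain (an analyst's judgment).
    """
    return [name for name, explained in candidates.items()
            if set(DIMENSIONS) <= set(explained)]


# Invented example data, for illustration only.
statement = ProblemStatement(
    summary="Order portal login times out",
    is_facts={
        "what": "login requests time out",
        "where": "EU data center",
        "when": "since Monday's patch",
        "extent": "about one in three logins",
    },
    is_not_facts={
        "what": "checkout requests",
        "where": "US data center",
        "when": "before Monday",
        "extent": "every login",
    },
)

candidates = {
    "expired certificate": {"what"},
    "patch changed EU login configuration": {"what", "where", "when", "extent"},
}

print(statement.missing_dimensions())  # dimensions still needing data
print(surviving_causes(candidates))    # causes worth confirming
```

A cause that explains the symptom but not where or when it occurs is set aside early, which is the point of collecting IS NOT data before debating causes.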

Albert Einstein is often credited with saying that if he had an hour to solve a problem, he would spend 55 minutes thinking about the problem and 5 minutes thinking about the solution. This kind of structured thinking may feel like it takes more time than we have. The opposite is true. Across many engagements, we have found that the quality of the process actually speeds up resolution. In one particular case, the use of Situation Appraisal alone reduced resolution time by 52%.

Most IT organizations report that they rely too much on technical expertise to address the complexity and volume of problems they face, and that resolution is delayed or never reached because they lack a common RCA approach based on critical thinking.

We also believe that effective PM ultimately requires dedicated personnel who are given the authority to drive consistent RCAs. Proper training in and implementation of this kind of approach can deliver increased IT stability and dramatic time and cost savings, not to mention happier customers!

The KT problem solving approach is used worldwide for root cause analysis and to improve IT stability.
