Root Cause Analysis

Many of the tough problems our clients ask us to facilitate are actually solution-caused problems. That is, there is a problem, they find the cause, they put a corrective action in place to make the problem go away and it does... only to be replaced by a new problem. The unintended consequences of change are common in every industry.

Unintended consequences are common
in every industry and are costly!

For example:

A petrochemical plant had a recurring overheating problem in a compressor. This had happened before, and the response had been to replace the dry gas seal, a $75,000 process that made the problem go away, only for it to return within months. This time, when the problem returned, it required an expensive shutdown as a safety measure. Determined to resolve the issue for good, troubleshooters used an analytic troubleshooting process and determined that bearing reliability is a function not only of bearing quality but also of bearing orientation within the compressor. The bearing orientation, not the entire dry gas seal, was at the root of the problem, and correcting it required a simpler $7,500 fix.

It’s not unusual for software updates to resolve existing issues and expand capabilities, only to create new problems. When Apple introduced iOS 13 in 2019, a particularly buggy update, it released updates 13.1 and 13.1.1 within eight days to address new problems including everything from battery drain to syncing with third-party keyboards.

When ‘black specks’ appeared in an ingredient at a pharmaceutical manufacturer, analysis identified them as small pieces of shredded gasket material. These findings were sent to the supplier of the ingredient, who quickly responded that they had corrected the problem by inserting a 704 stainless steel mesh filter to separate out the black specks. The black specks disappeared, and everyone was happy. But a month later, the manufacturer began noticing shiny specks in the same ingredient from the same supplier. When analyzed, these shiny specks turned out to be . . . 704 stainless steel.

Understanding how solution-caused problems arise
can help to prevent or contain them

Solution-caused problems are expensive, disruptive, and more common than you might suspect. When discovered and analyzed, they lead to embarrassment, some finger-pointing, and a lot of head-wagging. Understanding how they arise can help to prevent or contain them.

Opportunity-Caused Problems occur when a variable is changed without thinking about how the change will affect the final product. In the rush to resolve an issue and get back online, opportunities that get you back to work may take precedence over finding the root cause. But when you take advantage of an opportunity, you are taking an action, and actions can have unintended consequences, either downstream or as a recurring problem that becomes costly over time. At the petrochemical plant, troubleshooters took the time to use root cause analysis, gained insight into how bearing orientation affected production, and put an end to a recurring problem with a relatively inexpensive, more permanent fix.

Problem-Relocation Problems arise when a change that fixes one problem causes a new problem downstream. Again, the failure to recognize potential problems with the corrective action can lead to a new problem in a different location. In software updates, the goal is to minimize these unintended problems up front and to resolve, during beta testing, as many as possible of those that do occur.

Containment-Caused Problems arise from failing to find the root cause and adopting an interim action instead of a permanent corrective one. In the gasket problem, the unasked question was, “Why was the gasket material getting into the blend, and how could this be prevented?” Because no one found the cause of the cause, and the fix did not act against the degrading gaskets, the “solution” merely filtered out the problem until the steel filter itself degraded.

Communication Problems compound solution-caused problems. Communication can fail within supplier relationships and within a single organization, with unintended consequences in either case. It’s critical to acknowledge that any time you introduce a change into a process, you are potentially introducing variation. Intentions don’t matter. Changes can cause problems, and change needs to be analyzed and managed.

Three elements can minimize the occurrence of solution-caused problems
and, if a problem does occur, reduce its impact without creating more

1. An Analytical Approach. Asking “What could go wrong?” is a start, but just asking the question, and even listing a few potential problems, is unlikely to minimize the chance of something going wrong. Instead, you need to describe the potential problems in detail, specifically enough to hypothesize some likely causes for each one.

Causes are crucial because any preventive action you take must be directed at the causes, not just at the effects. Not all attempts at prevention will succeed, but an effective preventive action needs to significantly reduce the probability of the potential problem’s occurrence. Should the potential problem actually occur despite any attempts to prevent it, contingent actions planned in advance and aimed at the effect can reduce its impact.
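The structure described above (a specific potential problem, its likely causes, a preventive action aimed at each cause, and a contingent action aimed at the effect) can be written down as a simple record. The Python sketch below is illustrative only: the class names, field names, and the `gaps` check are assumptions made for this example, not part of any Kepner-Tregoe tool, and the sample entry is a hypothetical reading of the filter case described earlier.

```python
from dataclasses import dataclass, field


@dataclass
class LikelyCause:
    description: str
    preventive_action: str = ""  # aimed at the cause; empty means none planned yet


@dataclass
class PotentialProblem:
    description: str
    likely_causes: list[LikelyCause] = field(default_factory=list)
    contingent_action: str = ""  # aimed at the effect, used if prevention fails

    def gaps(self) -> list[str]:
        """List what is still missing from the analysis of this problem."""
        missing = []
        if not self.likely_causes:
            missing.append("no likely causes hypothesized")
        for cause in self.likely_causes:
            if not cause.preventive_action:
                missing.append(f"no preventive action for cause: {cause.description}")
        if not self.contingent_action:
            missing.append("no contingent action for the effect")
        return missing


# Hypothetical entry: a cause has been named, but nothing has been planned yet.
problem = PotentialProblem(
    description="Filter mesh degrades and sheds metal into the ingredient",
    likely_causes=[LikelyCause("Mesh wears under process conditions")],
)
for gap in problem.gaps():
    print(gap)
```

A record like this does not do the analysis for anyone. It only makes it visible when a potential problem has been listed without causes, or a cause has been listed without an action directed at it.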

2. A Change Management System That Builds In an Analytical Approach. In the heat of the moment, such as when trying to get a costly line back up and running, people may skip some steps to speed up a resolution. One of the first steps skipped is asking, “What might go wrong?” Therefore, building risk analysis of potential problems into SOPs is critical. It may sound cynical, but most people tend to avoid painstaking analysis if they can.

Many corrective and preventive action systems, especially in regulated industries such as pharmaceuticals, the ISO world of heavy manufacturing, or other precision industries, require a change management system. At a minimum, the system requires all changes to be logged in a central registry, described, and dated. More stringent systems require a full experimental or manufacturing validation of the new component or new process before proceeding. Such a system is an optimal place to include potential problem analysis.
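As a rough illustration of how potential problem analysis might be built into such a system, the Python sketch below models a central registry in which every change is described and dated, and a change may proceed only after its potential problems have been reviewed. The names `ChangeRecord`, `ChangeRegistry`, and `may_proceed`, the `require_validation` switch, and the sample entry and date are all assumptions for this example; they do not come from any specific regulation or product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChangeRecord:
    description: str
    logged_on: date
    potential_problems_reviewed: bool = False  # has "What might go wrong?" been analyzed?
    validated: bool = False  # experimental or manufacturing validation completed


class ChangeRegistry:
    """Central log of changes; stricter registries also demand validation."""

    def __init__(self, require_validation: bool = False):
        self.require_validation = require_validation
        self.records: list[ChangeRecord] = []

    def log(self, description: str, logged_on: date) -> ChangeRecord:
        record = ChangeRecord(description, logged_on)
        self.records.append(record)
        return record

    def may_proceed(self, record: ChangeRecord) -> bool:
        # The analysis step cannot be skipped, even under time pressure.
        if not record.potential_problems_reviewed:
            return False
        if self.require_validation and not record.validated:
            return False
        return True


# Hypothetical usage with an invented entry and date.
registry = ChangeRegistry(require_validation=True)
change = registry.log("Insert mesh filter upstream of blending", date(2024, 1, 15))
print(registry.may_proceed(change))  # False: nothing has been reviewed yet
```

Making the review a precondition in the system, rather than a reminder in a procedure, is one way to keep the step from being dropped when a line is down.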

3. A Proactive Culture. Finally, the organization needs a culture that accepts the fact that unanticipated problems will occur and believes that it is better to consider them in advance than to react after they occur.

Resistance to effective risk management is often due to how companies reward employees. Problems and potential problems have a built-in structural asymmetry. It’s easy to see if someone has solved a problem but close to impossible to ascertain whether someone prevented a problem from occurring. Non-occurrence is easily explained by assuming that there never was a potential problem or that some other unplanned event prevented it from occurring. It’s tough to prove that preventive action minimized the probability of a problem occurring or that contingent actions minimized the effects.

We have all met people who take great pride in their problem-solving skills, and we sometimes wonder if they would have fewer problems if they just thought ahead. Yet there are ways to reward the heroes who anticipate and prevent future problems. It requires clear thinking, consistency and a little creativity. But if addressing potential problems is not seen as a valued activity, it tends to be avoided. Organizations can quickly earn back multiples of the time and money invested in embracing the skills needed to attack problems, installing systems to track them and acknowledging the mindset that values preventing them.

About Kepner-Tregoe

Software and templates don’t solve problems. People solve problems!

What kind of people? People who are curious, ask great questions, make decisions based on facts, and are empowered to lead. They remain focused under pressure and act confidently to do what needs to be done. You’ll find these problem-solving leaders both at our clients and here at Kepner-Tregoe. For over 60 years, Kepner-Tregoe has empowered thousands of companies to solve millions of problems. If we can save millions for a manufacturer, restore IT service for a stock exchange, and help Apollo 13 get back from space, we can help your business achieve success.