By Christian Green, Kepner-Tregoe
When people think about root cause analysis (RCA), what often comes to mind first is investigating incidents: figuring out why they happened so you can prevent them from happening again. While diagnosing the cause of incidents is one use of root cause analysis, it is also a tool that can be used in much more proactive ways. Continuous improvement is not just about fixing things after they have broken; it's about assessing the things you do every day to make your operations better.
Every asset or system that you have is designed and configured to operate in a specific way and to serve an intended purpose. Your company depends on these assets being reliable and stable, with predictable performance, for business processes, end-user activities and manufacturing operations to run smoothly. Whether you're talking about your IT systems or the machines that underlie your manufacturing processes, improving the operational stability and reliability of your assets is key to improving business performance.
In the IT world, this is referred to as proactive problem management: using monitoring information combined with trend analysis to predict the types of events that are likely to occur and their projected impact, so you can make informed decisions about preemptively addressing the underlying issues and reducing the likelihood of occurrence. In manufacturing operations, methodologies like Six Sigma use trend analysis, operational variation and control limits to identify when a process is performing abnormally so corrective measures can be taken.
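As a rough illustration of the control-limit idea, the sketch below derives limits from a baseline of normal readings and flags recent readings that fall outside them. It assumes simple limits of the mean plus or minus three sample standard deviations; the function names and the response-time figures are hypothetical, and a real program would choose a control chart type suited to its data.

```python
from statistics import mean, stdev


def control_limits(baseline, sigmas=3.0):
    """Return (lower, center, upper) limits from a baseline of normal readings.

    Assumption: limits are the baseline mean +/- `sigmas` sample standard
    deviations. Other control chart types compute limits differently.
    """
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(readings, limits):
    """Return (index, value) pairs for readings outside the control limits."""
    lower, _, upper = limits
    return [(i, x) for i, x in enumerate(readings) if x < lower or x > upper]


# Hypothetical response times in milliseconds, for illustration only.
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
limits = control_limits(baseline)

recent = [101, 99, 120, 100]
flagged = out_of_control(recent, limits)
print(limits)   # limits derived from the baseline
print(flagged)  # readings that warrant a closer look
```

A flagged reading is not a diagnosis. It only marks a point where the process behaved abnormally and where root cause analysis may be worth applying.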
Focus on areas that can have the biggest impact
Capturing a complete picture of your assets' performance through telemetry and monitoring is only the first step. Root cause analysis is where this raw data is organized, contextualized, analyzed and interpreted to distill meaningful, actionable insights that can be used to improve operational performance. There are two performance dimensions that your proactive root cause analysis efforts should focus on:
Operational Stability – Are your assets consistently available and performing at a level that lets them serve their intended business purpose? Do you encounter frequent outages or changes in performance or throughput, or do the assets require frequent reboots and configuration tuning? Once set up, a stable operational asset should be able to run continuously for a period of time, without manual intervention, at a consistent level of performance.
Reliability – Can users and operations processes depend on the asset to perform its intended functions without causing operational interruptions? While stability deals with the system being up and running, reliability focuses on the asset serving its intended purpose. That requires the asset's features and dependent services to be operating successfully as well.
Root cause analysis can be used to interpret the monitoring and telemetry data that your systems are generating to identify areas of concern. Those areas of concern can then be diagnosed using root cause analysis techniques to understand where the underlying issues are coming from. RCA can look at how the overall environment is configured (technical dependencies); the system and business-related events taking place; and changes introduced that may impact the asset (things like configuration changes and upgrades). The result of the analysis is a clear understanding of what is causing the issue, so leaders and operations staff can make informed decisions about which situations require action to avoid future incidents.
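One small piece of this, lining up recent changes against the time an area of concern first appeared, can be sketched as follows. This is a minimal illustration and not a prescribed RCA method: the function name, the 24-hour lookback window and the change log entries are all hypothetical.

```python
from datetime import datetime, timedelta


def changes_before(anomaly_time, changes, lookback=timedelta(hours=24)):
    """Return changes made within `lookback` before the anomaly, newest first.

    `changes` is a list of (timestamp, description) tuples. The lookback
    window is an assumption; pick one that fits how quickly changes tend
    to show their effects in your environment.
    """
    window_start = anomaly_time - lookback
    recent = [c for c in changes if window_start <= c[0] <= anomaly_time]
    return sorted(recent, key=lambda c: c[0], reverse=True)


# Hypothetical change log entries, for illustration only.
changes = [
    (datetime(2024, 1, 8, 9, 0), "firmware upgrade"),
    (datetime(2024, 1, 10, 14, 0), "configuration change"),
    (datetime(2024, 1, 10, 22, 0), "scheduled batch job"),
]
anomaly = datetime(2024, 1, 11, 3, 0)

candidates = changes_before(anomaly, changes)
for when, what in candidates:
    print(when.isoformat(), what)
```

Closeness in time does not establish cause. The output is only a shortlist of changes to test against the facts of the issue.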
Using proactive problem management and root cause analysis to identify and diagnose operational stability and reliability issues with your manufacturing and IT assets is an important part of your continuous improvement efforts. Kepner-Tregoe has been the industry leader in problem solving training, tools and consulting for over 60 years – helping companies respond to incidents that have already occurred and proactively identify and address situations that can be avoided. To learn more, visit www.kepner-tregoe.com