By Christoph Goldenstern, VP of Innovation and Service Excellence, Kepner-Tregoe
In a recent post, we introduced the subject of mitigating risk as a discipline in IT Change Management. We proposed applying a triad of dimensions to assess risk: the probability of a problem arising, the likely impact of that problem, and the ease of recovery if the problem occurs.
In addition to these dimensions, risk management can view proposed changes according to three different levels of scope:
- The risk associated with a Standard Change, as DevOps would call it — a routine, anticipated change, which ideally is automated because the overall risk is very low.
- At the other end of the continuum, a significant, unanticipated risk that poses a hazard to the reputation of the business, or even its viability: for example, a data center goes down, or a critical customer-facing application fails or is discovered to have a critical bug. For these risks, organizations develop multiple redundancies, simulations, tabletop exercises, and other sophisticated risk management and IT continuity procedures.
- Intermediate risks: those associated with day-to-day operational changes that won’t bring the house down but cannot be automated away, because they are one-offs, bug fixes, or complex enough to cause serious trouble, outages, extended downtime, and user or customer frustration. These are the changes that can potentially cause new incidents.
Risks in this intermediate category frequently are ignored or not actively managed, because they are not seen as mission-critical and because there is no obvious way to eliminate them through automation. IT organizations frequently fail to mitigate these mid-level risks — even organizations that are relatively effective in managing the more critical risks.
These intermediate risks represent a large proportion of what a change manager or Change Advisory Board (CAB) deals with every day. In fact, change managers generally regard standard changes, which can be automated, and mission-critical risks, which could threaten the business, as the exceptional cases. The intermediate-level changes (that is, everything else the CAB deals with) are accepted as the reason the organization adopted change management in the first place. The risks hide in plain sight. It isn’t that the CAB is complacent about them; it’s that the team is so busy planning and implementing the change that the need to actually assess and manage its risks is often neglected.
Closing this gap is a significant opportunity to avoid future incidents.
The improvement involves assessing these intermediate-level changes against the three dimensions and making the actual risks visible: each risk gets a clear problem statement and a list of its underlying causes. These become the basis for identifying and choosing mitigation actions, at both the cause level and the effect level. Finally, the analysis is converted into a clearly owned plan of action.
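To make the steps concrete, here is a minimal sketch of what such an assessment could look like in Python. Everything in it is an illustrative assumption rather than a KT method: the 1-to-5 rating scale, the idea of multiplying the three ratings into one score, the threshold of 27, the field names, and the sample risks are all hypothetical. It also reads "cause level" as a preventive action and "effect level" as a contingent action, which is one possible interpretation.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """One risk attached to a proposed change (hypothetical structure)."""
    problem_statement: str      # what could go wrong, stated clearly
    likely_cause: str           # the underlying cause we would act on
    probability: int            # 1 (unlikely) .. 5 (almost certain), assumed scale
    impact: int                 # 1 (minor) .. 5 (severe), assumed scale
    recovery_difficulty: int    # 1 (easy to recover) .. 5 (very hard), assumed scale
    preventive_action: str = ""   # mitigation at the cause level
    contingent_action: str = ""   # mitigation at the effect level
    owner: str = ""               # who owns the actions

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1..5 scale.
        for name in ("probability", "impact", "recovery_difficulty"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")

    def score(self) -> int:
        # Assumed scoring rule: the product of the three ratings.
        return self.probability * self.impact * self.recovery_difficulty


def action_plan(risks: list[Risk], threshold: int = 27) -> list[Risk]:
    """Return the risks at or above the threshold, highest score first."""
    flagged = [r for r in risks if r.score() >= threshold]
    return sorted(flagged, key=lambda r: r.score(), reverse=True)


# Hypothetical risks for a single intermediate-level change.
risks = [
    Risk(
        problem_statement="Nightly billing job fails after the schema change",
        likely_cause="Job still reads the renamed column",
        probability=3,
        impact=4,
        recovery_difficulty=3,
        preventive_action="Update and test the job before the change window",
        contingent_action="Keep a rollback script ready and rerun the job",
        owner="DBA on call",
    ),
    Risk(
        problem_statement="Login page logo renders at the wrong size",
        likely_cause="Stale cached stylesheet",
        probability=2,
        impact=1,
        recovery_difficulty=1,
    ),
]

plan = action_plan(risks)
for risk in plan:
    print(f"{risk.score():>3}  {risk.problem_statement}  (owner: {risk.owner})")
```

The point of the sketch is only that each flagged risk carries a problem statement, a cause, actions at both levels, and a named owner. A real CAB would choose its own scale and threshold.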
We can further enhance this practice as part of our continuous service improvement efforts by applying what we call “thinking beyond the fix” to problem management. That means extending our root cause analyses (RCAs) to look for other areas where the fix could be applied, and also assessing the risk the fix itself poses, since a fix is nothing other than another change.
One of our financial services clients recently estimated that this discipline had enabled them to prevent the occurrence of over 10,000 problems.
The bottom line: risk mitigation is not an art; it’s first and foremost an operational discipline.
KT offers a one-day workshop on Risk Mitigation for IT organizations, which provides a useful, fundamental discipline to institute basic risk mitigation in your business.