The nightmare begins. A scheduled IT upgrade over a weekend suddenly threatened to become a total IT system shutdown by Monday morning. Any IT outage that threatens to stop business operations is the kind of potential nightmare that keeps IT managers awake at night, but now it was actually happening.
The IT team at an international architecture and engineering firm had been planning the scheduled upgrade throughout the year. The plan included an entire core network upgrade (“rip and replace”), server maintenance and a software upgrade that involved moving to a new vendor. The firm’s IT infrastructure resides and is managed at headquarters and supports employees at offices in the US, Europe and the Caribbean.
The director of IT was remotely monitoring the project, which was being executed by two IT employees from his firm and the vendor. But when the work did not proceed as expected and problems mounted, the IT director headed in to join the team, concerned that systems would not be restored by start of business on Monday.
The director knew a total IT system shutdown could cost the company between $500,000 and $750,000 a day.
IT is at the core of the company’s workflow. The multiple locations rely on computer applications used by the architecture, engineering and construction teams. Work on the upgrade had begun on Friday with expected completion by Saturday afternoon. When the IT director arrived on Saturday, the team had encountered multiple problems and the servers were unable to connect to storage. Although the network was mostly stable, the primary server stack was not connecting to the data array, an absolutely vital resource for company personnel come Monday morning.
It was immediately apparent that the new vendor was not approaching the problems systematically and had led the team in the wrong direction. The IT director and his senior architect had been through Kepner-Tregoe Problem Solving and Decision Making (PSDM) training and had some experience using it. They decided to use the KT approach to regain control and find the best way forward. They set up a war room and, using KT PSDM worksheets, brought in IT team members at remote locations so they could provide insights. Relying on his KT PSDM training and experience, the IT director knew that instead of reacting to the problems, his team and the vendor had to identify each problem methodically so that each one could be clearly understood.
During the initial Situation Appraisal, it became evident that the new vendor was overly focused on a problem that seemed to exist solely at headquarters, with no issues at other locations. The KT methodology allowed the team to identify this separate problem as an outlier.
Indeed, they eventually learned that the area’s fiber network had experienced a major outage that weekend, one that was totally unrelated to their existing IT problems. The team noted the issue but moved on to resolving the looming threat of a total IT shutdown. Focusing on the outlier issue of the local outage had been adding unnecessary complexity and had distracted the team from its larger, more important task of resolving the multiple system problems that had arisen. By Monday, the team had worked through and resolved most of the problems: the network core and servers were stabilized, and the system was back online.
The majority of the firm’s IT staff are now certified in a KT PSDM program taught by the IT director, who has become a certified KT Program Leader. Most of those trained are support staff who answer standard help-desk calls and work in the firm’s US locations. As their capabilities have grown, they have been able to serve the growing company without the need for additional IT support staff, generating savings both from problems solved and from employee costs kept under control.
The IT director is enthusiastic about KT’s “train-the-trainer” approach, which is allowing him to train the entire IT staff as well as non-IT employees, and to serve as a facilitator for major issues. KT processes are now integrated into the firm’s ticketing system, helping problem solvers create a problem statement and improving escalations. When the integrated system was launched, problem escalations were dramatically reduced, thanks to better communication and to the systematically organized information and data that drive resolution.
"KT PSDM gave me a foundation to be a successful problem solver… I am a huge advocate for KT because I’ve seen it work so many times and in many situations."
–Director of Information Technology
The value KT PSDM demonstrated during the troubled system upgrade will improve the way IT projects are done in the future. Not only did the KT methodology help identify and solve the problems during the upgrade, but it is also now used in planning future IT projects. Having avoided a total outage and the significant costs of such a disruption, the IT team is confident in its ability to handle multiple, complex issues and to avoid future outages.