By Shane Chagpar, Kepner-Tregoe

Has your organization considered Problem Solving skills training for your IT staff?  Maybe you should.  Here are 5 signs that your IT organization should invest in this critical set of skills…

1. Major Incidents feel more like a group of toddlers playing soccer than a group of highly paid professionals executing as a team.

When the most critical systems that your employees, suppliers and customers depend on are down or not working properly, you have a Major Incident on your hands.  This is the time when "all hands are on deck" in the IT department and the top priority is restoring service as quickly as possible.  What do these Major Incidents feel like in your organization?  Are they structured and well-orchestrated, with each team member knowing the role they need to play, actions that need to be taken and information to be collected?  Or do you have images of children playing soccer with everyone running around screaming and chasing the ball around the field but not really accomplishing anything?  Problem Solving training can help your IT organization be more effective and get better results from the experts on your team.

2. The most common resolution to system failures is "we rebooted the system."

How often do the resolution notes for system incidents read as a single-line statement: "rebooted the system and issue went away"?  What caused the incident to happen?  Will it happen again?  If your IT staff is narrowly focused on restoring service to meet their availability SLAs, and not on understanding why the issue occurred or what was going on within the system and in the environment, you are at high risk of the issue recurring.  Problem solving skills and a robust problem management process can help ensure that the necessary data is captured to understand the underlying cause of the failure, so preventive actions can be taken to keep the situation from happening again.

3. The same failures seem to happen over and over again and you can't seem to get an answer to why.

How large is your "known issues" list becoming?  Most incident management processes stop focusing on an issue once a short-term fix and/or workaround is identified and documented.  Known issues can be a real problem for organizations if left unchecked, leading users to perceive IT systems as unreliable and unstable.  Problem solving skills and processes can help your IT organization turn your "known issues" list into a "non-issue" list.

4. The most inexperienced and lowest paid person on the IT team is tasked with resolving incidents.

Production incidents represent flaws in a system that were not anticipated, avoided, or identified through development and testing processes.  They are often some of the most complex and difficult situations to diagnose and resolve, yet the responsibility for addressing them often falls to the most junior person in the organization.  Effective problem solving is not an individual skill; it’s a team effort and must be embedded into the culture of your IT organization.  Problem solving training can help your IT organization understand who needs to be involved in the Incident and Problem Management processes, from your newest hire to the most senior subject matter expert.

5. Problem managers aren't engaged until after the service has been restored.

Most of the symptom and environmental data needed to diagnose a system problem and identify the underlying cause(s) is lost or destroyed when the service is restored.  If the people responsible for problem management are not engaged until after the incident is resolved, the likelihood of truly understanding the problem and avoiding it in the future is reduced.  Problem Solving skills and processes can help ensure that the right people are engaged at the right time and, if they aren't available, that others on the team know what data to capture so the problem can be analyzed later.

The Kepner-Tregoe problem solving approach is used worldwide for root cause analysis and to improve IT stability.
