By Christoph Goldenstern, Kepner-Tregoe
Cybercriminals are everywhere. Unfortunately, once they get into your network, they have plenty of time to do their dirty work: on average, it takes almost two months (50 days) for IT teams to identify that a breach has occurred.
And the costs quickly add up. According to the Ponemon Institute’s 2019 Cost of a Data Breach Study, the average total cost of a data breach is $3.92 million.
To mitigate these risks, you must have a cyber incident response plan. This allows you to minimize the damages and costs when—not if—you’re attacked.
But how does cybersecurity—and fast response to incidents—fit into your established (and presumably well-oiled) IT incident management machinery?
The IT Infrastructure Library (ITIL®) has become the standard for how many organizations manage their IT infrastructures. But although ITIL (or a variation of it) does a good job of helping companies organize and manage their IT services, it’s short on security. Relying on ITIL could leave you high and dry when the cybercriminals come for you.
In this blog, we talk about how security and IT incident management currently reside in separate silos in many organizations. We examine why this is a problem and suggest ways to fix it, so that any incident that affects IT services, whether it stems from cybersecurity, IT infrastructure, or other issues, is addressed and resolved swiftly and effectively.
Incident management and cybersecurity: separate, but equally important
Your IT incident management (IM) team is a key part of IT service management (ITSM). The team’s charter is to swiftly get services back to normal after interruptions. Their goal is to minimize how the issue affects users—and the business. Operationally, this means reducing the length and severity of disruption from unexpected hardware, software, and network slowdowns or outages.
The IM team goes into action when someone or something—a user or IT staff member, or perhaps an automated alert system—identifies that an event has occurred. Perhaps the network has slowed down, or an application fails to respond. The IM staff first contains the incident to prevent it from affecting other services. Then, they typically find a temporary workaround, deploy the fix, recover the system, and place that system back into production. IT staff then performs root-cause analysis (if required) to determine the reason for the problem, logs the incident for future reference, and, if necessary, involves the appropriate people to begin working on a permanent fix.
Many companies have also built up solid cybersecurity teams. These incident response (IR) teams have traditionally followed a path to resolution that is comparable to, but separate from, that of the IM team. The difference: their charter is dedicated to responding to security incidents. The kinds of events that trigger the cybersecurity team to respond include:
- Successful or unsuccessful hacker attack
- An alarm raised by an intrusion detection system
- Unauthorized access to sensitive information
- Unauthorized alteration of information
- Unauthorized access to classified or otherwise sensitive data
- Compromise of system/server integrity
- Denial of service (DoS) or distributed DoS
Cybersecurity teams and IM teams have traditionally worked independently. But the world is changing. Technology is advancing rapidly, and organizations are increasingly dependent on it. Systems must be always on, or the business suffers. The complexity of today’s IT environments and the savviness of cybercriminals, who unfortunately always seem one step ahead of security professionals, mean that businesses like yours must forge a new relationship between the two teams. That means analyzing the people, processes, and technologies of both operations, and determining how they can operate together more seamlessly.
The challenges of IM-IR integration
Today, enterprises are facing questions like: When does an IT IM issue become a security issue, and vice versa? Who (or what automated system) makes that call? Then what happens? Unfortunately, ITIL does not directly address these and other issues. Although it describes what to do at a high level, it is not very helpful operationally.
“The old ‘IT’ way of responding to security incidents with the CIO standing over your shoulder asking, ‘Is it fixed yet?’ is long gone,” wrote Robert Herjavec, president of the Herjavec Group, a consultancy based in Los Angeles.
This new complexity brings many challenges. Chief among them: IM and IR teams today are inundated with alerts and data from a growing portfolio of siloed point solutions for managing both infrastructure and security disruptions. But the data on these incidents and vulnerabilities often lack business context, making it difficult to know which ones pose the greatest threat to the organization and who is best placed to handle the disruption. Manual processes and cross-team handoffs slow the response even further.
A new approach
We’re already seeing a drastic change in how enterprises today are speaking about cybersecurity IR. The language being used originated with emergency response professionals like first responders and military teams: terms like discovery, containment, eradication, and recovery are frequently heard in conjunction with cybersecurity IR.
Leading organizations have also adopted the organizational structures of emergency professionals by calling the hierarchy of team members an “incident command” squad complete with an incident commander, the leader who calls the shots during what are frequently very serious cybersecurity incidents.
In fact, if you haven’t heard these terms when attempting to align IM and IR, you probably don’t have the right team on hand to deal with a cybersecurity emergency. Here’s the usual lifecycle for the cybersecurity IR team:
- Preparation: Do a risk assessment of the organization and establish security policies
- Discovery: Identify when a potential breach has occurred
- Containment: Prevent the intruders (or infection) from spreading
- Eradication: Get rid of the problem
- Recovery: Return systems and operations to normal
- Lessons Learned: Do a thorough “post mortem” to be prepared for next time
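For teams that track incidents in software, the lifecycle above can be modeled as a strict sequence of phases. The Python sketch below is only an illustration, not a prescribed tool: `IRPhase`, `SecurityIncident`, and `advance` are hypothetical names, and it assumes that Preparation happens before any incident is opened and that no phase is ever skipped.

```python
from enum import Enum


class IRPhase(Enum):
    """Cybersecurity IR lifecycle phases, declared in lifecycle order."""
    PREPARATION = "Preparation"
    DISCOVERY = "Discovery"
    CONTAINMENT = "Containment"
    ERADICATION = "Eradication"
    RECOVERY = "Recovery"
    LESSONS_LEARNED = "Lessons Learned"


# Enum members iterate in definition order, which is the lifecycle order.
LIFECYCLE = list(IRPhase)


class SecurityIncident:
    """Hypothetical tracker that moves one incident through the lifecycle."""

    def __init__(self, incident_id: str) -> None:
        self.incident_id = incident_id
        # Assumption: Preparation is done ahead of time, so a new incident starts at Discovery.
        self.phase = IRPhase.DISCOVERY
        self.history = [self.phase]

    @property
    def closed(self) -> bool:
        return self.phase is IRPhase.LESSONS_LEARNED

    def advance(self) -> IRPhase:
        """Move to the next phase; phases cannot be skipped or repeated."""
        if self.closed:
            raise ValueError(f"{self.incident_id} has already reached Lessons Learned")
        index = LIFECYCLE.index(self.phase)
        self.phase = LIFECYCLE[index + 1]
        self.history.append(self.phase)
        return self.phase


if __name__ == "__main__":
    incident = SecurityIncident("SEC-001")
    while not incident.closed:
        incident.advance()
    print([phase.value for phase in incident.history])
    # ['Discovery', 'Containment', 'Eradication', 'Recovery', 'Lessons Learned']
```

Forcing one-step transitions is a design choice: it makes it visible, in the incident record, if a team tries to jump from Discovery straight to Recovery without containing and eradicating the threat.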
To implement this lifecycle, you must first establish a process for identifying which events are security related, as opposed to events that belong to the IM team. Sometimes it is obvious. For example, your intrusion detection or SIEM system might detect a DDoS attack and send an alert straight to the security IR team. But sometimes an event takes some investigation before you realize that it is, in fact, a security issue.
A first step is to train your help desk and other IT IM personnel to recognize security incidents, defined as any IT event that harms, or attempts to harm, the availability, privacy, confidentiality, or integrity of an IT service.
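The triage rule described above can be sketched as a small routing function. This is a hypothetical illustration, not a KT or ITIL method: the category labels, the `suspected_security` flag that a help desk agent would set after questioning the reporter, and the choice to open every security incident as a major (P1) incident are all assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical category labels mirroring the security triggers listed earlier.
SECURITY_CATEGORIES = {
    "hacker_attack",
    "ids_alarm",
    "unauthorized_access",
    "unauthorized_alteration",
    "integrity_compromise",
    "denial_of_service",
}


@dataclass
class Event:
    event_id: str
    category: str                     # label assigned by a person or an alerting tool
    suspected_security: bool = False  # set by the help desk after questioning the reporter


@dataclass
class Assignment:
    team: str
    priority: Optional[str]  # None means the IM team applies its usual prioritization


def route(event: Event) -> Assignment:
    """Send likely security incidents to IR as major incidents; everything else stays with IM."""
    if event.category in SECURITY_CATEGORIES or event.suspected_security:
        # Assumption: security incidents are opened as major (P1) incidents by default.
        return Assignment(team="IR", priority="P1")
    return Assignment(team="IM", priority=None)


if __name__ == "__main__":
    print(route(Event("EVT-1", "ids_alarm")))
    # Assignment(team='IR', priority='P1')
    print(route(Event("EVT-2", "app_slowdown")))
    # Assignment(team='IM', priority=None)
    print(route(Event("EVT-3", "app_slowdown", suspected_security=True)))
    # Assignment(team='IR', priority='P1')
```

The `suspected_security` flag captures the harder case from the text: an event that looks like an ordinary slowdown until investigation suggests otherwise, at which point it moves to the IR team.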
As with all incidents, using specific questioning techniques right at the beginning of an incident, and being able to break down the information, is critical to uncovering security incidents early and driving specific actions. In particular, the step that we call “Separate & Clarify” is essential to describing the symptoms and impact at a granular enough level to see “what’s going on underneath”. This increases the likelihood that the IM team identifies a likely security incident and involves the IR team as early as possible in the process, minimizing damage.
This only works if organizations integrate IM with cybersecurity incident response, at least partially. Enterprise Management Associates’ report “Next-Generation IT Service Management: Changing the Future of IT” found that one of the top two strategic priorities for IT IM professionals, second only to improving the user experience, is “integrated cross-silo support for security”. These and other strategic priorities cited by survey respondents offer strong, continuing incentives for enterprise IT IM teams to change and grow to encompass cybersecurity.
The RESILIA guidelines released by AXELOS in 2019 are another place to begin your journey. RESILIA is a portfolio of training, learning, and certification underpinned by the RESILIA Cyber Resilience Best Practices Guide. It aligns cybersecurity guidelines with ITIL, allowing security to be integrated into your existing IM processes.
Of course, what applies to major (P1) incidents also applies to managing security incidents, which are, almost by definition, usually major. Once detected, they should be managed differently from regular incidents. This includes dedicated resources and incident managers who combine technical expertise with the ability to facilitate major incident teams and drive them to resolution through excellent troubleshooting, decision-making, and risk management.
What to do next
Security events can be stressful, since they fall outside the comfort zone of many of your IM personnel. This is especially true during complex incidents where professionals from other departments, such as HR, legal, compliance, or executive leadership, need to be involved. These professionals are used to working in a more structured way and have probably had very little exposure to cybersecurity IR processes. This makes the IM team’s work even more critical. Indeed, if IM practices and guidelines are left out of cybersecurity IR, the recovery process could be dramatically less efficient and effective, resulting in more financial and reputational damage. That’s why your IM team is more important than ever in these days of continuous, chronic cybersecurity risk.
Kepner-Tregoe has empowered thousands of companies to solve millions of problems. KT provides a data-driven, consistent, scalable approach to clients in Operations, Manufacturing, IT Service Management, Technical Support, and Learning & Development. We empower you to solve problems. KT provides a unique combination of skill development and consulting services, designed specifically to reveal the root cause of problems and permanently address organizational challenges. Our approach to problem solving will deliver measurable results to any company looking to improve quality and effectiveness while reducing overall costs.