2 Managing Cyber Incidents

The incident response framework detailed in the previous chapter provided the specific structure of a Computer Security Incident Response Team (CSIRT) and explained how the CSIRT will engage with other business units. The chapter further expanded on the necessary planning and preparation an organization should undertake to address cyber incidents. Unfortunately, planning and preparation cannot address all the variables and uncertainties inherent to cyber incidents.

This chapter will focus on executing the plans and frameworks detailed in Chapter 1 to properly manage a cyber incident. A solid foundation in, and an understanding of, cyber incident management allows organizations to put their plans into action more efficiently, communicate with key stakeholders in a timely manner, and, most importantly, lessen the potential damage or downtime of a cyber incident.

This chapter will address how to manage a cyber incident, examining the following topics:

Engaging the incident response team
Security orchestration, automation, and response
Incorporating crisis communications
Incorporating containment strategies
Getting back to normal – eradication, recovery, and post-incident activity

Engaging a CSIRT, much like a fire department, requires a set path of escalation. In the following sections, there are three CSIRT models that describe some options when looking at a proper escalation path.

Engaging the incident response team

A CSIRT functions in much the same way as a fire department. A fire department has specifically trained professionals who are tasked with responding to emergency situations with specialized equipment to contain and eradicate a fire. In order to engage a fire department, a citizen must contact emergency services and provide key information, such as the nature of the emergency, the location, and whether any lives are in danger. From here, this information is passed on to the fire department, which dispatches resources to the emergency.

The process of engaging a CSIRT is very similar to engaging a fire department. Internal or external personnel need to escalate indications of a cybersecurity incident to the appropriate personnel. From here, resources are dispatched to the appropriate location(s), where those on the ground will take the lead in containing the incident and eliminating or limiting potential downtime or loss of data. To make this process as efficient as possible, the following components of the engagement process are critical:

CSIRT models provide a framework that places the CSIRT and the associated escalation procedures within the organizational structure
A war room describes the location from which the CSIRT manages the incident
Communications addresses the ability of the CSIRT to communicate properly
The rotation of staff manages the need to rest personnel during a prolonged incident

CSIRT engagement models

How an organization engages its CSIRT capability is largely dependent on how it is structured. Organizations configure their CSIRT to best fit their structure and resources. The following three basic structures can serve as a guide for placing the CSIRT within the most suitable part of the organization to facilitate speedy escalation, as well as capturing as many details of the incident as possible, in order for a proper investigation to take place.

Security operations center escalation

In this organizational model, the Security Operations Center (SOC) is responsible for handling the initial incident detection or investigation. In general, the SOC is responsible for the management of the security tools that monitor the network infrastructure. It has direct access to event management, intrusion prevention and detection, and antivirus systems. From here, it can view events, receive and review alerts, and process other security-related data.

SOC escalation is a common model among organizations that have a dedicated SOC, either through in-house personnel, a third-party Managed Security Service Provider (MSSP), or a Managed Detection and Response (MDR) provider. In this model, there are clearly defined steps, from the initial notification to the engagement of the CSIRT, as follows:

An alert or detection is received by the SOC or Tier 1 analyst.
The SOC or Tier 1 analyst conducts the initial analysis and then determines whether the alert or detection meets the criteria for an incident.
If warranted, the analyst will then escalate the incident to the SOC manager.
After a review by the SOC manager, the incident is escalated to an on-call incident response analyst(s).
The CSIRT analyst(s) will review the alert or detection and determine whether the incident warrants engaging the entire CSIRT capability based on its severity.
Depending on the severity of the incident, the CSIRT analysts will either address it or escalate it to the CSIRT manager to engage the entire CSIRT capability.
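
The escalation steps above can be sketched as a short routine. This is only an illustrative model, not a description of any particular ticketing or SOC product: the `Stage` names follow the roles in the list, while the numeric severity scale and the `full_csirt_threshold` parameter are assumptions introduced for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    SOC_ANALYST = "SOC or Tier 1 analyst"
    SOC_MANAGER = "SOC manager"
    CSIRT_ANALYST = "On-call CSIRT analyst"
    CSIRT_MANAGER = "CSIRT manager"


@dataclass
class Alert:
    summary: str
    severity: int      # hypothetical scale: 1 (low) to 5 (critical)
    is_incident: bool  # outcome of the Tier 1 initial analysis


def escalate(alert: Alert, full_csirt_threshold: int = 4) -> list:
    """Return the ordered list of stages that handle the alert."""
    handled_by = [Stage.SOC_ANALYST]
    if alert.is_incident:
        # The SOC manager reviews before the on-call CSIRT analyst is engaged.
        handled_by += [Stage.SOC_MANAGER, Stage.CSIRT_ANALYST]
        if alert.severity >= full_csirt_threshold:
            # Severe incidents go to the CSIRT manager to engage the full team.
            handled_by.append(Stage.CSIRT_MANAGER)
    return handled_by


if __name__ == "__main__":
    path = escalate(Alert("Ransomware note found on file server", 5, True))
    print(" -> ".join(stage.value for stage in path))
```

Each entry in the returned list is a hand-off between people, so the length of the list is a rough indicator of how much time passes before the full CSIRT is engaged.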

The following diagram shows the flow of incident escalation from detection to escalation to the CSIRT manager:

Figure 2.1 – The SOC engagement model

In this model, there are several issues of concern that need to be addressed by the CSIRT and SOC personnel, as follows:

First, engaging the CSIRT in this manner creates a situation where there are several individuals handling an incident before the CSIRT is fully engaged. This increases the time between the detection and the CSIRT response, which could increase the potential impact of an incident.
Second, if the incident escalation is not properly documented, the CSIRT manager would have to engage the SOC manager for clarification or additional information, thereby increasing the time taken to properly address an incident.
Third, SOC personnel require training to determine which observed events constitute an incident and which may be false positives. Without it, the CSIRT may suffer from burnout and grow weary of chasing false incidents escalated by the SOC.
Finally, communication between the SOC and the CSIRT needs to be clear and concise. Any gap in their ability to share information in real time will cause additional confusion.

Another variation of this model, common within organizations without a dedicated SOC, is where an initial security incident is received by either a helpdesk or a network operations center. This adds further complexity in terms of engaging the CSIRT in a timely manner, as these personnel are often not trained to address incidents of this nature.

Tip

The best practice in a case such as this is to have several of the personnel on these teams trained in cybersecurity analysis so that they can handle initial triage and proper escalation.

A SOC integration model

To limit some of the drawbacks of the SOC escalation model, some organizations embed the SOC within the overall CSIRT. Placing the SOC in such a structure may prove to be a more efficient fit since the SOC has responsibility for the initial alerting and triaging function, which is directly related to the CSIRT.

In this model, the SOC analyst serves as the first tier. As previously discussed, they have the first view of security events or security control alerts. After processing and triaging the alert, they have the ability to immediately escalate the incident to the Tier 2 analyst, without having to engage a manager who would then escalate it to the CSIRT manager. This process is highlighted in the following diagram:

Figure 2.2 – A SOC integrated model

This model has some distinct advantages over the previous one. First, the CSIRT has a greater degree of visibility into what the SOC is seeing and doing. Furthermore, having the SOC embedded within the CSIRT allows the CSIRT manager and their team to craft more efficient policies and procedures related to incidents. A second distinct advantage of this approach is that the incident escalation is completed much faster and likely with greater precision. With the SOC analyst directly escalating to the next tier of CSIRT personnel, the entire process is much faster, and a more detailed analysis is performed as a result.

This approach works well in organizations with a dedicated SOC that is in-house and not outsourced. For organizations making use of a network operations center or a helpdesk, and without a dedicated SOC, this approach is not realistic, as those functions are often managed outside of the CSIRT or even the network security teams. One other issue is that, depending on the size of the SOC and the CSIRT, additional CSIRT managers may be required in order to address the day-to-day workload of both teams.

A fusion center

As threat intelligence becomes an increasing part of daily security operations, one organizational structure that addresses this trend is the CSIRT fusion center. In this case, the CSIRT analysts, SOC analysts, and threat intelligence analysts team up within a single team structure. This merges the elements of a combined SOC and CSIRT structure with dedicated threat intelligence analysts. In such a scenario, the threat intelligence analysts would be responsible for augmenting incident investigations with external and internal resources related to the incident. They could also be leveraged for detailed analysis in other areas related to the incident. The following diagram shows the workflow from the fusion center director to the various personnel responsible for incident management:

Figure 2.3 – A fusion center model

As organizations continue to develop threat intelligence resources within their security operations, this model allows the CSIRT to make use of that capability without having to create new processes. In Chapter 17 we will discuss threat intelligence in greater depth and explain how this capability may enhance incident investigations.

Alongside additional personnel, the fusion center model makes use of additional technologies. The SOC models will often use a Security Information and Event Management (SIEM) system to provide network visibility and detect intrusions via log and alerting sources. The fusion center will often also make use of Security Orchestration, Automation, and Response (SOAR), which is discussed later in this chapter. Both tools provide network visibility and the ability to quickly pivot to key systems during an incident.

The CSIRT fusion center is not widely deployed, largely because threat intelligence integration is a relatively new methodology and is resource intensive. Very few organizations have the resources in either technology or personnel to make this type of structure effective. Pulling in full-time threat intelligence analysts, as well as various paid and open source feeds (and the technology to support them), is often cost-prohibitive. As a result, few organizations can leverage a full-time threat intelligence analyst as part of their CSIRT capability.

Investigating incidents

Once the CSIRT is engaged, one of its primary tasks is to investigate the incident. The lion’s share of this volume addresses the various methods that can be leveraged when investigating an incident. The primary goal of the CSIRT is to utilize methods that follow a systems analysis to address the following key facets of an incident:

Identifying the scope: In some incidents, the actual scope may not be clearly defined at the initial detection stage. For example, a law enforcement agency may contact an organization to indicate that a C2 server has been taken down and that, during an analysis of that system, the external IP address of the organization was identified. From this data point, the scope is first defined as the entire network. From here, the CSIRT would analyze data from the firewall or web proxy to identify the internal systems that were found to be communicating with the C2 server. From this data, they would narrow down the initial scope of the incident to those systems that had been impacted.

When attempting to identify the scope of the incident, there is a drive to find patient zero, or the first system that was compromised. In some incidents, this may be easy to discover. A phishing email containing a PDF document that, when opened, executes malware can be easily identified by the user or security control. Other attacks may not be so obvious. While finding patient zero does provide a good deal of data for root-cause analysis, it is more important to identify the scope of the incident first, rather than looking for a single system.

Identifying the impact: Another key consideration is determining the impact of the incident. Those who have been exposed to the fundamental concepts of information security will be familiar with the CIA triad. The CIA triad represents the elements of security within an information system: confidentiality, integrity, and availability. Any breach or violation of security will have an impact on one or more of these elements. For example, a ransomware incident that impacts 15 production servers impacts the availability of the data on those systems. Availability impacts, whether they occur as a direct result of adversary actions or from the time it takes to respond and remediate, are important factors in determining the incident’s impact. Other incidents, such as the theft of intellectual property, impact the confidentiality of data. Finally, incidents involving unauthorized manipulation of source code or other data impact the integrity of that data. The following diagram highlights the CIA triad:

Figure 2.4 – The CIA triad

Understanding the potential impact of an incident is important for making decisions concerning the resources that are allocated for a response. A Distributed Denial-of-Service (DDoS) attack against a non-critical service on the web will not necessitate the same type of response as when discovering credit card-harvesting malware within a retail payment infrastructure. The impact also has a direct bearing on compliance with laws and other regulations. Understanding the potential impact of an incident on compliance is critical in ensuring that a proper response is conducted.

Identifying the root cause: The central question that IT professionals and managers will ask during, and especially after, an incident is: how did this happen? Organizations spend a great deal of money and resources on protecting their infrastructure. If an incident has occurred that causes an impact, there will be a need to understand how it happened. The goal of an incident investigation is to determine what sequence of events, vulnerabilities, or other conditions were present that led to the incident and its impact. Often, the root cause of an incident is not a simple vulnerability but a sequence of events and conditions that allowed an adversary to penetrate security systems and conduct their attack. Through an investigation, these events and conditions can be identified so that they are corrected, or otherwise controlled.
Incident attribution: One area of debate within incident investigation is incident attribution. With attribution, the CSIRT or investigative body attempts to determine which organization was behind the attack. Incidents may be attributed to nation-state actors, criminal groups, or other cyber adversaries.

While there is some importance to attribution from a threat intelligence perspective (Chapter 17 will address attribution as far as it relates to incident response), resources are better off spent on investigating or containing an incident. Attempting to ascertain the group or groups responsible for an attack is time-consuming, with few positive returns. If the organization’s leadership is adamant about determining attribution, the best approach is to comprehensively document the incident and pass off the data to a third party that specifically addresses attribution. Such organizations often combine data from several incident investigations to build a dossier on certain groups. If the data supplied matches the activities of one of these groups, they may be able to provide some context in terms of attribution.
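
The scoping step described earlier in this section, where firewall or web proxy data is used to find the internal systems that communicated with a C2 server, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the CSV column names, the sample log entries, and the indicator address (taken from a documentation-only IP range) are all hypothetical, and real proxy or firewall exports will differ.

```python
import csv
import io

# Hypothetical indicator supplied by law enforcement (documentation-range IP).
C2_ADDRESS = "203.0.113.50"

# Hypothetical proxy log export; real field names depend on the product.
SAMPLE_PROXY_LOG = """timestamp,src_ip,dst_ip,dst_port
2023-01-10T08:01:12,10.0.4.17,203.0.113.50,443
2023-01-10T08:01:15,10.0.4.22,198.51.100.7,443
2023-01-10T09:14:03,10.0.7.9,203.0.113.50,443
2023-01-10T09:20:44,10.0.4.17,203.0.113.50,443
"""


def hosts_contacting(log_text: str, indicator: str) -> dict:
    """Map each internal IP that reached the indicator to its first-seen time."""
    first_seen = {}
    for row in csv.DictReader(io.StringIO(log_text)):
        if row["dst_ip"] != indicator:
            continue
        src, seen = row["src_ip"], row["timestamp"]
        # Timestamps in one fixed ISO 8601 format compare correctly as strings.
        if src not in first_seen or seen < first_seen[src]:
            first_seen[src] = seen
    return first_seen


if __name__ == "__main__":
    for host, seen in sorted(hosts_contacting(SAMPLE_PROXY_LOG, C2_ADDRESS).items()):
        print(f"{host} first contacted {C2_ADDRESS} at {seen}")
```

The resulting host list narrows the scope from the entire network to the systems that need closer examination. The host with the earliest timestamp is only a candidate for patient zero, since logs may not reach back to the initial compromise.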

The CSIRT war room

Another consideration when engaging a CSIRT is the need to have a single location from which the CSIRT can operate. There are several terms in use for the physical location that a CSIRT can operate from, such as a SOC or a crisis suite, but a simpler term is a war room. A war room can be set up as necessary; or, in some instances, a dedicated war room is set aside. In the former case, an existing meeting room is repurposed as the war room for the duration of the incident. This is often the preferred option for those organizations that do not have a high enough number of incidents to necessitate a dedicated war room. For those organizations that experience a higher number of incidents or more complex incidents, there may be a need to create a dedicated war room.

The war room should have the following capabilities to facilitate a more orderly response to incidents:

Workspaces: Each member of the CSIRT core team should have a dedicated workspace in which to perform analysis and other incident-related tasks. Workspaces should include network connectivity, power, and monitors, along with specialized digital forensics tools.
Team displays: One of the frustrations that CSIRT members may encounter during an incident is the inability to share the output of the analysis. An overhead projector or a large screen can facilitate better sharing of data across the entire team.
Note sharing: Along the lines of sharing data through team displays, there may also be a need to share information among teams that are geographically dispersed. This may also be facilitated by using collaboration tools such as OneNote, SharePoint, or a wiki created for the incident.
Whiteboards: There is a good deal of information flowing in and out of a war room. Data related to assignments and running lists of compromised systems are best left on a whiteboard so that they are clear to everyone.
Limited access: The CSIRT should limit access to a war room to only those personnel who have a legitimate need to enter. Limiting access to this area prevents sensitive information from falling into the wrong hands.

Communications

One area of consideration that is often overlooked by organizations is how to communicate within a larger organization during an incident. With email, instant messaging, and voice calls, it may seem as though organizations already have the necessary tools to appropriately communicate internally. These communication platforms may need to be set aside in the event of an incident impacting user credentials, email systems, or other cloud-based collaboration platforms. For example, a commonly observed attack is the compromise of Office 365 cloud-based email. If attackers have gained access to the email system, they may have also compromised associated instant messaging applications, such as Skype. Given this, relying on these applications during an incident may, in fact, provide the attackers with insight into the actions of the CSIRT.

If it is suspected that these applications have been compromised, it is critical to have a secondary—and even tertiary—communications option. Commercially acquired cell phones are often a safe alternative. Furthermore, CSIRT members may leverage free or low-cost collaboration tools for a limited time. These can be leveraged until such time that the usual communication platforms are deemed safe for use.

Rotating staff

Prolonged incident investigations can begin to take their toll on CSIRT personnel, both physically and mentally. While it may seem prudent at the time to engage a team until an incident has been addressed, this can have a detrimental impact on the team’s ability to function. Studies have shown the negative cognitive effects of prolonged work with little rest. As a result, it is imperative that the Incident Commander (IC) places responders on shifts after approximately 12-24 hours have passed.

For example, approximately 24 hours after an incident investigation has been started, it will become necessary to start rotating personnel so that they have a rest period of 8 hours. This also includes the IC. During a prolonged incident, an alternative IC should be named, to ensure continuity and that each of the ICs gets the appropriate amount of rest.

Another strategy is to engage support elements during a period of inactivity in an incident. These periods of inactivity generally occur when an incident has been contained and potential Command-and-Control (C2) traffic has been addressed. Support personnel can be leveraged to monitor the network for any changes, giving the CSIRT time to rest.
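
As a rough illustration of the rotation described above, the following sketch builds a shift schedule that begins once the initial surge period ends. The two-team split, the 8-hour shift length, and the function and parameter names are assumptions made for the example; with two teams alternating 8-hour shifts, each team gets an 8-hour rest period between shifts.

```python
from datetime import datetime, timedelta
from itertools import cycle


def rotation_schedule(incident_start, teams, surge_hours=24, shift_hours=8, shifts=4):
    """Return (shift_start, shift_end, team) tuples after the initial surge."""
    shift_start = incident_start + timedelta(hours=surge_hours)
    schedule = []
    # cycle() alternates through the teams for as many shifts as requested.
    for _, team in zip(range(shifts), cycle(teams)):
        shift_end = shift_start + timedelta(hours=shift_hours)
        schedule.append((shift_start, shift_end, team))
        shift_start = shift_end
    return schedule


if __name__ == "__main__":
    # Each hypothetical team includes its own IC so that the ICs also rest.
    teams = ["Team A (primary IC)", "Team B (alternate IC)"]
    for start, end, team in rotation_schedule(datetime(2023, 1, 10, 8, 0), teams):
        print(f"{start:%a %H:%M} to {end:%a %H:%M}: {team}")
```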

SOAR

A CSIRT requires that a large and diverse group of people are brought together to properly address an incident. Whatever model an organization chooses to incorporate the functions of the CSIRT, there is still a good deal of coordination and information that needs to be analyzed and reported.

Note

SOAR technologies are most often found in organizations with a more mature security posture. This is usually in organizations that have a dedicated SOC or fusion center. Other key customers that utilize this technology are MSSPs or MDR providers. This is due to the cost of not only purchasing a commercial SOAR product but also its continual maintenance. Most organizations will not have the need for such a platform if they are addressing a small number of incidents per year. This material is included for familiarization purposes.

The technology research firm Gartner defines a SOAR as:

Solutions that combine incident response, orchestration and automation, and threat intelligence (TI) management capabilities in a single platform. SOAR tools are also used to document and implement processes (aka playbooks, workflows and processes); support security incident management; and apply machine-based assistance to human security analysts and operators.

SOAR platforms are the amalgamation of three separate tools. The first of these is an Incident Response Platform (IRP). This tool is used to manage incident response workflows, case management, and the CSIRT’s knowledge base. The second tool is Security Orchestration and Automation (SOA). This tool is used to manage incident playbooks, workflows, and processes. The SOA also automates low-level tasks, such as isolating an endpoint in response to malware detection and then notifying a SOC or CSIRT analyst. The final tool that makes up a SOAR is a Threat Intelligence Platform (TIP). A TIP is used to aggregate Indicators of Compromise (IOCs) from internal or external sources. The TIP can then be used to enrich an alert from an Intrusion Detection System (IDS) to provide further context to the detection.

For example, a malware detection tool is tied into an organization’s event management system. Something is detected and the information associated with the detection is forwarded to the SIEM. From here, the SIEM feeds data into the SOAR. Based on the parameters, the SOAR’s playbooks immediately isolate the system from the network. The file hash is compared against the threat intelligence feeds and indicates that the file belongs to the BazarLoader family. Finally, a notice is sent to the CSIRT’s Slack channel, and they are able to respond to the infection and clean the system before reconnecting it to the network.
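
The example above can be expressed as a small playbook sketch. This is not the API of any SOAR product: `isolate_host` and `notify_csirt` are hypothetical stand-ins for calls to an endpoint or network control and a chat webhook, and the threat intelligence feed is a placeholder dictionary with a dummy hash value.

```python
from dataclasses import dataclass, field

# Placeholder feed mapping file hashes to malware families (dummy hash value).
THREAT_INTEL = {"a" * 64: "BazarLoader"}


@dataclass
class Detection:
    hostname: str
    file_hash: str


@dataclass
class PlaybookResult:
    isolated: bool = False
    family: str = "unknown"
    notifications: list = field(default_factory=list)


def isolate_host(hostname: str) -> bool:
    """Hypothetical stand-in for an EDR or network isolation call."""
    return True


def notify_csirt(message: str, outbox: list) -> None:
    """Hypothetical stand-in for posting to the CSIRT's chat channel."""
    outbox.append(message)


def malware_playbook(detection: Detection, intel: dict = THREAT_INTEL) -> PlaybookResult:
    result = PlaybookResult()
    # Step 1: contain first by isolating the system from the network.
    result.isolated = isolate_host(detection.hostname)
    # Step 2: enrich by comparing the file hash against threat intelligence.
    result.family = intel.get(detection.file_hash, "unknown")
    # Step 3: notify the CSIRT so that analysts can respond and clean the system.
    notify_csirt(
        f"{detection.hostname} isolated; file hash matches family: {result.family}",
        result.notifications,
    )
    return result


if __name__ == "__main__":
    outcome = malware_playbook(Detection("ws-042", "a" * 64))
    print(outcome.notifications[0])
```

Reconnecting the system is deliberately left out of the automated steps, since the text leaves that decision to the CSIRT after the system has been cleaned.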

There is a wide range of commercial and open source SOAR solutions. Even with the wide range of options available to organizations, most of these SOAR solutions, open source included, have the following capabilities:

Alert prioritization: CSIRT and SOC teams often have to address alerts from a variety of sources with a variety of severities. SOAR platforms are capable of tying a variety of alert sources, such as Endpoint Detection and Response (EDR) tools, IDSs and Intrusion Prevention Systems (IPSs), and Vulnerability Management Systems (VMSs), into a single source. From here, priorities for these alerts can be assigned to the SOC and CSIRT to ensure the most critical alerts are addressed first.
Automation: SOAR platforms have the capability to execute low-level tasks that are often executed by CSIRT or SOC personnel. In the previous example, the SOAR platform was configured to isolate the endpoint upon detection of the malware and then notify the CSIRT. Other actions include blocking file hashes and cutting off network connections.
Collaboration: The ability of the SOAR to aggregate alerts along with other incident data provides SOC and CSIRT personnel with the perfect platform to collaborate. Incidents can be investigated and documented in a central location, with actions being directed and communicated properly. Furthermore, everyone involved in an incident has visibility into other team members’ actions so that potential conflict can be avoided.
Threat intelligence enrichment: There is a range of external and internal threat intelligence sources that provide additional context around IOCs. A SOAR can be used to enrich a detection and provide context to the indicators. For example, an IP address detected by an IPS can be pulled into the SOAR, where threat intelligence indicates that the address is associated with the post-exploitation tool Cobalt Strike. This enrichment gives the CSIRT additional information and alerts them to a potentially severe intrusion.
Reporting: SOAR platforms are also an excellent way to manage all the data across multiple incidents and provide extensive reporting on incidents and performance metrics. They will often have the ability to tailor the reporting to various audiences, including analysts, managers, and directors.
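
To make the alert prioritization capability more concrete, the following sketch merges alerts from several sources into a single queue and returns the most critical first. The source weights, the severity scale, and the scoring formula are assumptions chosen for illustration; commercial and open source platforms each use their own scoring logic.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count

# Hypothetical weights reflecting how much each alert source is trusted.
SOURCE_WEIGHT = {"EDR": 3, "IPS": 2, "IDS": 2, "VMS": 1}


@dataclass(order=True)
class QueuedAlert:
    sort_key: tuple
    source: str = field(compare=False)
    description: str = field(compare=False)


class AlertQueue:
    def __init__(self):
        self._heap = []
        self._arrival = count()  # keeps arrival order among equal scores

    def push(self, source: str, severity: int, description: str) -> None:
        score = severity * SOURCE_WEIGHT.get(source, 1)
        # heapq is a min-heap, so the score is negated to pop the highest first.
        key = (-score, next(self._arrival))
        heapq.heappush(self._heap, QueuedAlert(key, source, description))

    def pop(self) -> QueuedAlert:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


if __name__ == "__main__":
    queue = AlertQueue()
    queue.push("VMS", 5, "Unpatched server found in scan")
    queue.push("EDR", 4, "Malware executed on workstation")
    queue.push("IDS", 3, "Port scan from internal host")
    while queue:
        alert = queue.pop()
        print(f"[{alert.source}] {alert.description}")
```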

It is important to remember that the SOAR is not a replacement for a professional security analyst. Rather, SOC and CSIRT personnel should view the toolset as an augmentation that allows them to conduct investigations and respond at scale. It is nearly impossible to have complete visibility into even a modest-sized enterprise network. SOAR platforms perform much of the low-level activity so that CSIRT and SOC personnel can focus on the more high-level incident investigation and response activities.

Incorporating crisis communications

The notion that serious security incidents can be kept secret has long passed. High-profile security incidents, such as those that impacted Target and TJX, have been made very public. Adding to this lack of secrecy are the new breach notification laws that impact organizations across the globe. The General Data Protection Regulation (GDPR) Article 33 has a 72-hour breach notification requirement.

Other regulatory or compliance frameworks, such as the Health Insurance Portability and Accountability Act (HIPAA) Breach Notification Rule, 45 CFR §§ 164.400-414, stipulate that notifications are to be made in the event of a data breach. Compounding these legal and regulatory pressures is the need to communicate with internal business units and external stakeholders. While it may seem that crafting and deploying a communications plan during an incident is a waste of resources, it has become a necessity in today’s legal and regulatory environment. When examining crisis communications, the three following focus areas need to be addressed:

Internal communications
External communications
Public notification

Each of these represents a specific audience and each requires different content and tone of messaging.
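
Notification deadlines such as the 72-hour GDPR requirement mentioned above are simple to track once a start time has been agreed upon. The sketch below assumes the clock starts when the organization becomes aware of the breach, which is how Article 33 frames it; the function names are hypothetical, and legal counsel, not the CSIRT, should confirm the actual start time and obligations.

```python
from datetime import datetime, timedelta, timezone

GDPR_NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest notification time, counted from when the breach became known."""
    return aware_at + GDPR_NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


if __name__ == "__main__":
    # Timezone-aware timestamps avoid ambiguity across dispersed teams.
    aware = datetime(2023, 1, 10, 9, 0, tzinfo=timezone.utc)
    print("Deadline:", notification_deadline(aware).isoformat())
    print("Hours remaining:", hours_remaining(aware, aware + timedelta(hours=24)))
```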

Internal communications

Internal communications are the kinds of communications that are limited to the business or organization’s internal personnel and reporting structure. Several business units need to be part of communications. The legal department will need to be kept abreast of the incident, as they will often have to determine reporting requirements and any additional regulatory requirements. Marketing and communications can be leveraged for crafting communications to external parties. This can best be facilitated by including them as early as possible in the process so that they have a full understanding of the incident and its impact. If the incident impacts any internal employees, Human Resources should also be included as part of internal communications.

One of the critical groups that are going to want to be informed as the incident unfolds is the C-suite and, more specifically, the CEO. A CSIRT will often fly well below the line of sight of senior leadership until there is a critical incident. At that point, the CEO will become very interested in the workings of the CSIRT and how they are addressing the incident.

With all of these parties needing to be kept in the loop, it is critical to ensure orderly communications and limit misinformation. To limit confusion, the IC or CSIRT lead should serve as a single point of contact. This way, for example, the legal department does not contact a CSIRT analyst to receive information about an investigation that is, at that time, speculative. Relying on this type of information can lead to serious legal consequences. To keep everyone informed, the CSIRT lead or IC should conduct periodic updates throughout each day of the incident. The cadence of these communications is dependent on the incident type and severity, but having a cadence of every 4 hours, with a conference call during the working hours of 6 a.m. to 10 p.m., will ensure that everyone is kept up to date.
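
The cadence suggested above, an update every 4 hours between 6 a.m. and 10 p.m., works out to five conference calls per day. A short sketch can generate the call times for a given day; the function name and defaults are assumptions for illustration and should be adjusted to the incident type and severity.

```python
from datetime import date, datetime, time, timedelta


def update_times(day: date, first=time(6, 0), last=time(22, 0), every_hours=4) -> list:
    """Return the scheduled update times for one day of the incident."""
    current = datetime.combine(day, first)
    end = datetime.combine(day, last)
    slots = []
    while current <= end:
        slots.append(current)
        current += timedelta(hours=every_hours)
    return slots


if __name__ == "__main__":
    for slot in update_times(date(2023, 1, 10)):
        print(slot.strftime("%H:%M"))
```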

In addition to a regular conference call, the CSIRT lead or the IC should prepare a daily status report to be sent to senior leadership. This daily status report does not have to be as comprehensive and detailed as a digital forensics report but should capture significant actions taken, any incident-related data that has been obtained, and any potential factors that may limit the ability of the CSIRT to function. At a minimum, a daily status meeting, in conjunction with this report, should be conducted with senior leadership and any other personnel who are required to be in attendance over the course of the incident.

External communications

Incidents may have a downstream impact on other external entities outside of the organization that is suffering the incident. Some of these external entities may include suppliers, customers, transaction processing facilities, or service providers. If any of these organizations have a direct link—such as a Virtual Private Network (VPN)—to the impacted organization, external partners need to be informed sooner rather than later. This is to limit any possibility that an attacker has leveraged this connection to compromise other organizations.

Note

A significant area of concern when addressing incident management and external communications for Managed Service Providers (MSPs) is the trend of attackers targeting MSPs first, with the intent of using them as a jumping-off point into other organizations through established VPNs.

One clear example of this is the Target breach, where attackers compromised a Heating, Ventilation, and Air Conditioning (HVAC) vendor as the initial point of entry. Attackers now apply this tried-and-true method to MSPs in ransomware attacks, with the intent of compromising more than one organization per attack.

At a minimum, an organization should inform external parties that they are dealing with an incident and, as a precaution, the connection will be blocked until the incident has been addressed. This can then be followed up with additional information. Much like internal communications, setting a regular cadence may go a long way in smoothing out any damage to a working relationship as a result of an incident. In some cases, well-trusted external parties may be made part of regular daily status updates.

Public notification

As discussed previously, there are several legal and compliance requirements that need to be taken into consideration when discussing the notification of customers or the general public about an incident. Organizations may have to walk a fine line in terms of complying with the requirements of regulations such as HIPAA, without disclosing operational details of an incident still under investigation. Compounding this pressure are the possible implications on stock value or the potential for lost business. With all these pressures, it is critical to craft a message that is within the legal or compliance requirements but that also limits the damage to the organization’s reputation, revenue, or stock value.

Despite being directly related to the incident at hand, the CSIRT should not be responsible for crafting a public notification statement. Rather, the CSIRT should be available to provide insight into the incident investigation and answer any questions. The two business units best suited to crafting a message are the legal and marketing departments. The marketing department would be tasked with crafting a message to limit the potential backlash from customers. The legal department would be tasked with crafting a message that meets legal or regulatory requirements. The CSIRT should advise as far as possible, but these two business units should serve as the point of contact for any media or public questions.

Incorporating containment strategies

Containment strategies are the actions taken during an incident to limit damage to specific systems or areas of the network. It is critical for organizations to have prepared these in the event of an incident. The rise of ransomware that combines elements of viruses and worms, and that can quickly spread through an organization, highlights the need to rapidly contain an outbreak before it impacts too many systems. What compounds the challenge of containment is that many enterprise IT systems utilize a flat topology, whereby the bulk of systems can communicate with each other. In this type of environment, ransomware and other worms can quickly propagate via legitimate protocols, such as Remote Desktop Services (RDS) or Server Message Block (SMB). SMB propagation featured prominently in the WannaCry ransomware campaign, which leveraged the EternalBlue exploit against a vulnerability in the Windows implementation of SMB. For more information, visit https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-0144.

In order to address containment, an organization should have a clear idea of the network topology. This type of network awareness can be achieved through outputs of network discovery tools, up-to-date network diagrams, system inventories, and vulnerability scans. This data should be shared with the CSIRT so that an overall view of the network can be achieved. From here, the CSIRT should coordinate containment plans with network operations personnel so that an overall containment strategy can be crafted, and the potential damage of an incident limited. Having network operations personnel as part of the technical support personnel goes a long way in ensuring this process is streamlined and containment is achieved as quickly as possible.
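
As a rough illustration of how a system inventory can support containment planning, the following Python sketch groups a list of infected addresses by the subnet they belong to, so that network operations personnel can see which segments need attention. The subnet names, address ranges, and the group_by_subnet function are hypothetical and are not taken from any particular tool; a real inventory would come from the organization's own network documentation.

```python
import ipaddress
from collections import defaultdict

# Hypothetical subnet inventory; names and ranges are illustrative only.
SUBNETS = {
    "finance": ipaddress.ip_network("10.10.1.0/24"),
    "engineering": ipaddress.ip_network("10.10.2.0/24"),
    "servers": ipaddress.ip_network("10.20.0.0/16"),
}


def group_by_subnet(infected_ips, subnets=SUBNETS):
    """Map each subnet name to the infected addresses that fall inside it."""
    grouped = defaultdict(list)
    for raw in infected_ips:
        addr = ipaddress.ip_address(raw)
        for name, net in subnets.items():
            if addr in net:
                grouped[name].append(raw)
                break
        else:
            # The address is not in any documented subnet, which is itself
            # a gap in network awareness worth flagging.
            grouped["unknown"].append(raw)
    return dict(grouped)


# Example addresses are made up for the sketch.
infected = ["10.10.1.15", "10.10.1.77", "10.20.4.9", "192.168.5.5"]
affected = group_by_subnet(infected)
for subnet, hosts in affected.items():
    print(f"{subnet}: {len(hosts)} infected host(s)")
```

Addresses that land in the unknown group point to systems missing from the inventory, which is exactly the kind of gap that up-to-date network diagrams are meant to close.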

One other aspect of how infrastructure is managed that has a direct impact on incident management is that of change management. Mature IT infrastructures usually have a well-documented and governed change management process in place. During an incident, however, the CSIRT and support personnel cannot wait for change management authorization and a proper change window to implement changes. When exercising containment strategies, IT and organizational leadership should fully understand that changes are going to be made based on the incident. This does not absolve the CSIRT and IT personnel from exercising due care and ensuring that changes are well documented.
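
One lightweight way to keep emergency changes documented is to record each one together with the step needed to undo it. The following Python sketch is a minimal example under that assumption; the ContainmentChange and ChangeLog classes, the device names, and the action descriptions are all hypothetical and would be replaced by whatever ticketing or documentation system the organization already uses.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ContainmentChange:
    """One emergency change made during a declared incident."""
    device: str
    action: str
    rollback: str
    made_by: str
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class ChangeLog:
    """Keeps changes in the order they were made."""

    def __init__(self):
        self._changes = []

    def record(self, device, action, rollback, made_by):
        change = ContainmentChange(device, action, rollback, made_by)
        self._changes.append(change)
        return change

    def rollback_plan(self):
        """Return rollback steps newest-first, so changes are undone in reverse."""
        return [(c.device, c.rollback) for c in reversed(self._changes)]


# Hypothetical devices and actions, for illustration only.
log = ChangeLog()
log.record("core-switch-01", "isolate VLAN 30 from VLAN 10",
           "restore routing between VLAN 30 and VLAN 10", "netops-analyst")
log.record("perimeter-fw", "block outbound traffic to 203.0.113.10",
           "remove block for 203.0.113.10", "netops-analyst")

plan = log.rollback_plan()
for device, step in plan:
    print(f"{device}: {step}")
```

Recording the rollback step at the moment the change is made means the record can later be handed to the normal change control process for review.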

In terms of containing a malware outbreak such as a ransomware attack, there are several strategies that can be employed. Ideally, organizations should have some ability to isolate segments of the network from each other, but in the event that this is not possible, CSIRT and IT personnel can take one or more of the following measures:

Physical containment: In this case, the physical connection to the network is removed from the system. This can be as simple as unplugging the network cable, disabling wireless access, or disabling the connection through the operating system. While this sounds simple, there are several factors that can make this strategy challenging for even the smallest organization. The first is the ability to physically locate the systems impacted. This may be a simpler task inside a data center where the impacted systems are in the same rack, but attempting to physically locate 20 to 30 desktops in a fairly large corporate environment takes a great deal of effort. In the time that it would take to remove 20 systems from the network, the malware could have easily spread to other systems. Further compounding the difficulty of physical containment is the challenge of addressing geographically diverse systems. Having a data center or other operating site an hour's drive away would necessitate having an individual on that site to perform the physical containment. As you can imagine, physically containing a malware outbreak or another incident can be very difficult if the scope of the incident is beyond the capability of the CSIRT. Physical containment should be reserved for those incidents where the scope is limited and the CSIRT personnel can immediately remove the systems from the network.
Network containment: A network containment strategy relies heavily on the expertise of network engineers or architects. It is for this reason that they are often included as part of the technical support personnel within the CSIRT and should be involved in any containment strategy planning. With this containment strategy, the network administrator(s) will be tasked with modifying switch configurations to limit the traffic from infected systems on a subnet to other portions of the network. This containment strategy may require modification of configurations on individual switches or using the management console. One aspect of this approach that needs to be addressed is how the organization handles change control. In many organizations, it is common practice to review any switch configuration changes as part of the normal change control process. There needs to be an exception written into that process to facilitate the rapid deployment of switch configuration changes during a declared incident. Network administrators should also ensure that any changes that are made are properly documented so that they can be reversed or otherwise modified during the recovery phase of an incident.
Perimeter containment: The perimeter firewall is an asset well suited for containment. In some circumstances, the perimeter firewall can be utilized in conjunction with network containment in a Russian nesting-doll approach, where the CSIRT contains network traffic at the perimeter and works its way to the specific subnets containing the impacted systems. For example, malware will often download additional code or other packages via tools such as PowerShell. In the event that the CSIRT has identified the external IP address that is utilized by the malware to download additional packages, it can be blocked at the firewall, thereby preventing additional damage. From here, the CSIRT can then work backward from the perimeter to the impacted systems. The organization can then leave the rule in place until such time that it is deemed no longer necessary. As with network containment, it is important to address any change control issues that may arise from making changes to the firewall ruleset.
Virtual containment: With the advent of cloud computing and virtualization, many organizations have at least partially moved systems such as servers from physical systems to virtualized systems. Virtualization provides a great deal of flexibility to organizations during normal operations but is also advantageous in the event that an incident may need to be contained. First, hypervisor software such as VMware’s ESXi platform can be utilized to remove the network connection from multiple systems at once. Organizations may also make use of virtual switching in much the same way as physical switches in terms of containment. Finally, virtualization software allows for the pausing of systems during an incident. This is the preferred method, as suspending or pausing a virtual machine during an incident preserves a good deal of evidence that can be examined later.
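
To make the perimeter containment example more concrete, the following Python sketch shows one possible way to vet a list of suspected malware download addresses before handing them to the firewall team. It is only a sketch: the build_blocklist function, the internal range, and the sample indicators are hypothetical, and the output is a vendor-neutral list rather than rules for any specific firewall product.

```python
import ipaddress

# Hypothetical internal range; adjust to the organization's own addressing.
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def build_blocklist(indicators, internal=INTERNAL_NETWORKS):
    """Split raw indicators into addresses to block and entries to reject.

    Invalid entries and internal addresses are rejected with a reason so
    that a perimeter rule never blocks the organization's own systems.
    Duplicates are silently dropped.
    """
    blocklist, rejected, seen = [], [], set()
    for raw in indicators:
        try:
            addr = ipaddress.ip_address(raw.strip())
        except ValueError:
            rejected.append((raw, "not a valid IP address"))
            continue
        if any(addr in net for net in internal):
            rejected.append((raw, "internal address"))
            continue
        if addr in seen:
            continue
        seen.add(addr)
        blocklist.append(str(addr))
    return blocklist, rejected


# Sample indicators use documentation address ranges, not real infrastructure.
indicators = ["203.0.113.10", "203.0.113.10", "10.1.2.3",
              "not-an-ip", " 198.51.100.7 "]
blocklist, rejected = build_blocklist(indicators)
print("Block at perimeter:", blocklist)
print("Needs review:", rejected)
```

Keeping the rejected entries, rather than discarding them, gives the CSIRT a record of which indicators were not acted on and why.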

Once an incident is properly contained, the CSIRT and other personnel have some time to organize and begin the process of investigating the incident. They are also well situated to begin the process of removing the intruder and their tools from the network.

Getting back to normal – eradication, recovery, and post-incident activity

Once an incident has been properly and comprehensively investigated, it is time to move into the eradication and recovery phase. There may be a good deal of haste in getting to this stage, as there is a strong desire to return to normal operations. While there may be business drivers at play here, rushing eradication and recovery may reintroduce an unidentified compromised system that has been overlooked. In other scenarios, it could be possible to miss the patching of previously compromised systems, leaving them open to the same exploits that previously compromised them or, worse, placing a still-infected system back on the network. For this reason, we will thoroughly address both eradication and recovery strategies.

Eradication strategies

The unfortunate reality with modern malware is that there is no surefire way to ensure that all malicious code has been removed. In the past, organizations could simply scan the system with an antivirus program to find and remove the offending malicious code. Now, with malware techniques such as process injection or DLL hijacking, there is a chance that the system remains infected even after the original code is removed. There is also the possibility that additional code has been downloaded and installed, and that it will go undetected. As a result, most eradication strategies rely on taking infected machines and reimaging them with a known good image or reverting to a known good backup.

A strategy that is often employed in the cases of malware and ransomware is to make use of three separate Virtual LAN (VLAN) segments and reimage the infected machines. First, all the infected machines are placed onto their own separate VLAN. From here, the CSIRT or system administrator will move one of the infected systems onto a secondary staging VLAN. The system is then reimaged with a known good image, or a known good backup is utilized. From here, once the system has been reimaged or has the backup installed, it is then moved to a production VLAN, where additional monitoring is conducted to ensure that there is no remaining infection or compromise. The following diagram shows a simple network structure that facilitates this recovery strategy:

Figure 2.5 – A system’s eradication and recovery architecture

While this method may be time-consuming, it is an excellent way to ensure that all systems that have been impacted have been addressed.
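
The three-VLAN workflow described above can be thought of as a small state machine in which a system may only move forward once the previous step is complete. The following Python sketch models that idea; the Stage and Host names and the host names are hypothetical, and the code only tracks state rather than reconfiguring any network equipment.

```python
from enum import Enum


class Stage(Enum):
    QUARANTINE = "infected-systems VLAN"
    STAGING = "staging VLAN"
    PRODUCTION = "production VLAN with additional monitoring"


class Host:
    """Tracks one infected system through eradication and recovery."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.QUARANTINE  # every infected system starts here
        self.reimaged = False

    def move_to_staging(self):
        if self.stage is not Stage.QUARANTINE:
            raise ValueError(f"{self.name} is not in quarantine")
        self.stage = Stage.STAGING

    def reimage(self):
        # Stands in for reimaging or restoring a known good backup.
        if self.stage is not Stage.STAGING:
            raise ValueError(f"{self.name} must be on the staging VLAN")
        self.reimaged = True

    def release_to_production(self):
        if self.stage is not Stage.STAGING or not self.reimaged:
            raise ValueError(f"{self.name} has not been reimaged in staging")
        self.stage = Stage.PRODUCTION


# Hypothetical workstation name.
host = Host("ws-014")
host.move_to_staging()
host.reimage()
host.release_to_production()
print(host.name, "->", host.stage.value)
```

The checks in each method reflect the main point of the strategy: a system that has not been reimaged or restored never reaches the production VLAN.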

In the case of virtual systems, if the containment strategy previously discussed has been employed, the infected virtual systems will have no network connectivity. From here, the most straightforward eradication strategy is to revert systems to the last-known good snapshot. Once the system has been restarted, it should be connected to the VLAN with enhanced monitoring. It is important that in the case of reverting to snapshots, the CSIRT has a great deal of confidence in the timeline. If the CSIRT is unsure about the timeline of an attack, there is a possibility that the snapshot may be compromised as well. This is especially true in organizations that conduct snapshots regularly.
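
The dependence on the attack timeline can be shown with a short Python sketch that picks the most recent snapshot taken before the earliest known compromise. The last_known_good function, the optional safety margin, and the dates are all hypothetical; the sketch assumes the CSIRT has already established the earliest compromise time from its investigation.

```python
from datetime import datetime, timedelta


def last_known_good(snapshots, earliest_compromise,
                    safety_margin=timedelta(0)):
    """Return the newest snapshot strictly older than the compromise.

    safety_margin pushes the cutoff earlier when the CSIRT is less
    certain about the timeline. Returns None if no snapshot qualifies.
    """
    cutoff = earliest_compromise - safety_margin
    candidates = [taken for taken in snapshots if taken < cutoff]
    return max(candidates) if candidates else None


# Made-up nightly snapshots and compromise time.
snapshots = [
    datetime(2024, 3, 1, 2, 0),
    datetime(2024, 3, 2, 2, 0),
    datetime(2024, 3, 3, 2, 0),
]
earliest_compromise = datetime(2024, 3, 2, 14, 30)

chosen = last_known_good(snapshots, earliest_compromise)
cautious = last_known_good(snapshots, earliest_compromise,
                           safety_margin=timedelta(hours=24))
print("Revert to:", chosen)
print("Revert to (with 24-hour margin):", cautious)
```

If the function returns None, every available snapshot postdates the suspected compromise, and reimaging from a known good image is the safer option.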

Recovery strategies

In terms of recovery, there are several tasks that the CSIRT will need to manage to bring operations back to normal. The first of these is to ensure that all systems—not just those that have been through the eradication phase but all systems—are properly patched with the most up-to-date patches. This is critical in instances where the attacker has taken advantage of a zero-day exploit or a relatively new vulnerability. In cases where a patch is not forthcoming from the manufacturer, the CSIRT should recommend additional security controls to mitigate any residual risk.

A second piece of the recovery phase is for the CSIRT to work with IT and information security personnel in crafting additional detection and prevention alerts. During the examination of the evidence when determining the root cause, or in the containment phase, the CSIRT may have provided data for detection and prevention controls. The CSIRT should work with other personnel to augment those with additional detective and preventive rules. These additional rules may be specific to the incident or may pertain to the specific vulnerabilities identified.

Third, any changes that were made to the infrastructure should be reviewed. These changes can be initially reviewed by the CSIRT and IT personnel to determine whether they are still required or can be removed. If changes are required in the long term, they should be evaluated by the organization’s change control, and approved according to the change control process.

Fourth, before the incident can be closed out, it is good practice to conduct a full vulnerability scan of all systems. This is critical to ensure that any systems that have been compromised have been addressed. Additionally, this step will also address any other systems that may not have been impacted by the security incident, ensuring that they are nonetheless patched for any security vulnerabilities.

Finally, at the end of an incident, it is important to conduct an After-Action Review (AAR). This review goes over the entire incident from start to finish. All actions taken by the CSIRT personnel are reviewed. In addition, the plans and playbooks that were utilized are also reviewed in light of the incident actions. Any deficiencies, such as a lack of specific tools, training, or processes, should be brought up so that they may be corrected. The output of this AAR should be documented as part of the overall incident documentation.

Post-incident activity

There are often many lessons that can be gleaned from an incident investigation and the activity associated with the incident response. These hard-won lessons should be captured as soon as the organization has returned to normal operations. The best way to address this is through an AAR. All individuals who were involved in the incident should be brought together, either virtually or, if possible, in the same location. Usually, the IC serves as the lead in this effort, with another individual functioning as the scribe to capture all the pertinent details.

Overall, an AAR of the incident should examine not only what went right during the incident but also what went wrong and therefore needs improvement. The following is a list of sample questions that can be asked as part of the AAR:

How was the incident detected and was this detection made in a timely manner?
What was the initial severity indicated?
Were the escalation procedures sufficient to capture the needed information?
What containment strategies were implemented? How effective were they?
Was there sufficient evidence/time to determine the root cause of the incident?
Were communications between CSIRT elements clear, concise, and timely?

This list should serve as a starting point for the overall AAR. Depending on the severity and length of the incident, the AAR may take anywhere from minutes to a few hours to address all salient points and identify gaps in the organization’s capability. The goal here is to capture this information and integrate it into the improvement of CSIRT policies and procedures.

Summary

Planning for an incident is critical. Equally critical is the proper management of an incident. This involves several elements that each CSIRT must address during the life of an incident.

Proper logistics provide the necessary elements for the CSIRT to function. Having strategies to communicate incident information to leadership, third parties, and customers keeps these stakeholders informed, lessens speculation, and ensures that compliance requirements are met. Incident investigation allows the CSIRT to properly identify the attack and the scope and limit damage via a proper containment strategy. Finally, these elements are all part of eradicating an adversary’s ability to access a network and helping an organization to return to normal. As we stated at the beginning of this book, everyone has a plan until they get hit in the face. The real value of a CSIRT to an organization is not in the plans and playbooks, but in how well they perform when an incident occurs.

The next chapter will expand on the incident investigation portion of incident management by providing the digital forensics framework to which CSIRT personnel adhere.

Questions

Which of the following containment strategies is the most difficult to perform?
1. Physical
2. Network
3. Perimeter
4. Virtual
A cyber security breach can have an impact on which of the following?
1. Confidentiality
2. Integrity
3. Availability
4. All of the above
Attribution is critical and has to be completed for a successful incident investigation.
1. True
2. False
