The incident response framework detailed in the previous chapter provided the specific structure of a Computer Security Incident Response Team (CSIRT) and explained how the CSIRT will engage with other business units. The chapter further expanded on the necessary planning and preparation an organization should undertake to address cyber incidents. Unfortunately, planning and preparation cannot address all the variables and uncertainties inherent to cyber incidents.
This chapter will focus on executing the plans and frameworks detailed in Chapter 1 to properly manage a cyber incident. A solid foundation in, and an understanding of, cyber incident management allows organizations to put their plans into action more efficiently, communicate with key stakeholders in a timely manner, and, most importantly, lessen the potential damage or downtime of a cyber incident.
This chapter will address how to manage a cyber incident, examining the following topics:
Engaging a CSIRT, much like a fire department, requires a set path of escalation. The following sections describe three CSIRT models that offer options for a proper escalation path.
A CSIRT functions in much the same way as a fire department. A fire department has specifically trained professionals who are tasked with responding to emergency situations with specialized equipment to contain and eradicate a fire. In order to engage a fire department, a citizen must contact emergency services and provide key information, such as the nature of the emergency, the location, and whether any lives are in danger. From here, this information is passed on to the fire department, which dispatches resources to the emergency.
The process of engaging a CSIRT is very similar to engaging a fire department. Internal or external personnel need to escalate indications of a cybersecurity incident to the appropriate personnel. From here, resources are dispatched to the appropriate location(s), where those on the ground will take the lead in containing the incident and eliminating or limiting potential downtime or loss of data. To make this process as efficient as possible, the following components of the engagement process are critical:
How an organization engages its CSIRT capability is largely dependent on how it is structured. Organizations configure their CSIRT to best fit their structure and resources. The following three basic structures can serve as a guide for placing the CSIRT within the most suitable part of the organization to facilitate speedy escalation, as well as capturing as many details of the incident as possible, in order for a proper investigation to take place.
In this organizational model, the Security Operations Center (SOC) is responsible for handling the initial incident detection or investigation. In general, the SOC is responsible for the management of the security tools that monitor the network infrastructure. It has direct access to event management, intrusion prevention and detection, and antivirus systems. From here, it can view events, receive and review alerts, and process other security-related data.
SOC escalation is a common model among organizations that have a dedicated SOC, either through in-house personnel, a third-party Managed Security Service Provider (MSSP), or a Managed Detection and Response (MDR) provider. In this model, there are clearly defined steps, from the initial notification to the engagement of the CSIRT, as follows:
The following diagram shows the flow of incident escalation from detection to escalation to the CSIRT manager:
Figure 2.1 – The SOC engagement model
In this model, there are several issues of concern that need to be addressed by the CSIRT and SOC personnel, as follows:
Another variation of this model, common within organizations without a dedicated SOC, is where an initial security incident is received by either a helpdesk or a network operations center. This adds further complexity in terms of engaging the CSIRT in a timely manner, as these personnel are often not trained to address incidents of this nature.
Tip
The best practice in a case such as this is to have several of the personnel on these teams trained in cybersecurity analysis, to address initial triage and a proper escalation.
To limit some of the drawbacks of the SOC escalation model, some organizations embed the SOC within the overall CSIRT. Placing the SOC in such a structure may prove to be a more efficient fit since the SOC has responsibility for the initial alerting and triaging function, which is directly related to the CSIRT.
In this model, the SOC analyst serves as the first tier. As previously discussed, they have the first view of security events or security control alerts. After processing and triaging the alert, they have the ability to immediately escalate the incident to the Tier 2 analyst, without having to engage a manager who would then escalate it to the CSIRT manager. This process is highlighted in the following diagram:
Figure 2.2 – A SOC integrated model
This model has some distinct advantages over the previous one. First, the CSIRT has a greater degree of visibility into what the SOC is seeing and doing. Furthermore, having the SOC embedded within the CSIRT allows the CSIRT manager and their team to craft more efficient policies and procedures related to incidents. A second distinct advantage of this approach is that the incident escalation is completed much faster and likely with greater precision. With the SOC analyst directly escalating to the next tier of CSIRT personnel, the entire process is much faster, and a more detailed analysis is performed as a result.
This approach works well in organizations with a dedicated SOC that is in-house and not outsourced. For organizations making use of a network operations center or a helpdesk, and without a dedicated SOC, this approach is not realistic, as those functions are often managed outside of the CSIRT or even the network security teams. One other issue is that, depending on the size of the SOC and CSIRT, additional CSIRT managers may be required in order to address the day-to-day workload of both the SOC and the CSIRT.
As threat intelligence becomes an increasing part of daily security operations, one organizational structure that addresses this trend is the CSIRT fusion center. In this case, the CSIRT analysts, SOC analysts, and threat intelligence analysts team up within a single team structure. This merges the elements of a SOC- and CSIRT-combined structure with dedicated threat intelligence analysts. In such a scenario, the threat intelligence analysts would be responsible for augmenting incident investigations with external and internal resources related to the incident. They could also be leveraged for detailed analysis in other areas related to the incident. The following diagram shows the workflow from the fusion center director to the various personnel responsible for incident management:
Figure 2.3 – A fusion center model
As organizations continue to develop threat intelligence resources within their security operations, this model allows the CSIRT to make use of that capability without having to create new processes. In Chapter 17, we will discuss threat intelligence in greater depth and explain how this capability may enhance incident investigations.
Alongside additional personnel, the fusion center model makes use of additional technologies. The SOC models will often use a Security Information and Event Management (SIEM) system to provide network visibility and detect intrusions via log and alerting sources. The fusion center will often also make use of Security Orchestration, Automation, and Response (SOAR), which is discussed later in this chapter. Both tools provide network visibility and the ability to quickly pivot to key systems during an incident.
The CSIRT fusion center is not widely deployed, largely because threat intelligence integration is a relatively new methodology and is resource intensive. Very few organizations have the resources in either technology or personnel to make this type of structure effective. Pulling in full-time threat intelligence analysts, as well as various paid and open source feeds (and the technology to support them), is often cost-prohibitive. As a result, few organizations can leverage a full-time threat intelligence analyst as part of their CSIRT capability.
Once the CSIRT is engaged, one of its primary tasks is to investigate the incident. The lion’s share of this volume addresses the various methods that can be leveraged when investigating an incident. The primary goal of the CSIRT is to utilize methods that follow a systems analysis to address the following key facets of an incident:
When attempting to identify the scope of the incident, there is a drive to find patient zero, or the first system that was compromised. In some incidents, this may be easy to discover. A phishing email containing a PDF document that, when opened, executes malware can be easily identified by the user or security control. Other attacks may not be so obvious. While finding patient zero does provide a good deal of data for root-cause analysis, it is more important to identify the scope of the incident first, rather than looking for a single system.
Figure 2.4 – The CIA triad
Understanding the potential impact of an incident is important for making decisions concerning the resources that are allocated for a response. A Distributed Denial-of-Service (DDoS) attack against a non-critical service on the web will not necessitate the same type of response as when discovering credit card-harvesting malware within a retail payment infrastructure. The impact also has a direct bearing on compliance with laws and other regulations. Understanding the potential impact of an incident on compliance is critical in ensuring that a proper response is conducted.
While there is some importance to attribution from a threat intelligence perspective (Chapter 17 will address attribution as far as it relates to incident response), resources are better off spent on investigating or containing an incident. Attempting to ascertain the group or groups responsible for an attack is time-consuming, with few positive returns. If the organization’s leadership is adamant about determining attribution, the best approach is to comprehensively document the incident and pass off the data to a third party that specifically addresses attribution. Such organizations often combine data from several incident investigations to build a dossier on certain groups. If the data supplied matches the activities of one of these groups, they may be able to provide some context in terms of attribution.
Another consideration when engaging a CSIRT is the need to have a single location from which the CSIRT can operate. There are several terms in use for the physical location that a CSIRT can operate from, such as a SOC or a crisis suite, but a simpler term is a war room. A war room can be set up as necessary; or, in some instances, a dedicated war room is set aside. In the former case, an existing meeting room is purposed as the war room for the duration of the entire incident. This is often the preferred option for those organizations that do not have a high enough number of incidents to necessitate a dedicated war room. For those organizations that experience a higher number of incidents or more complex incidents, there may be a need to create a dedicated war room.
The war room should have the following capabilities to facilitate a more orderly response to incidents:
One area of consideration that is often overlooked by organizations is how to communicate within a larger organization during an incident. With email, instant messaging, and voice calls, it may seem as though organizations already have the necessary tools to appropriately communicate internally. These communication platforms may need to be set aside in the event of an incident impacting user credentials, email systems, or other cloud-based collaboration platforms. For example, a commonly observed attack is the compromise of Office 365 cloud-based email. If attackers have gained access to the email system, they may have also compromised associated instant messaging applications, such as Skype. Given this, relying on these applications during an incident may, in fact, give the attackers insight into the actions of the CSIRT.
If it is suspected that these applications have been compromised, it is critical to have a secondary—and even tertiary—communications option. Commercially acquired cell phones are often a safe alternative. Furthermore, CSIRT members may leverage free or low-cost collaboration tools for a limited time. These can be leveraged until such time that the usual communication platforms are deemed safe for use.
Prolonged incident investigations can begin to take their toll on CSIRT personnel, both physically and mentally. While it may seem prudent at the time to engage a team until an incident has been addressed, this can have a detrimental impact on the team’s ability to function. Studies have shown the negative cognitive effects of prolonged work with little rest. As a result, it is imperative that the Incident Commander (IC) places responders on shifts after approximately 12-24 hours have passed.
For example, approximately 24 hours after an incident investigation has been started, it will become necessary to start rotating personnel so that they have a rest period of 8 hours. This also includes the IC. During a prolonged incident, an alternative IC should be named, to ensure continuity and that each of the ICs gets the appropriate amount of rest.
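The rotation rule above can be tracked with a very small helper. The following is a minimal sketch, assuming the 24-hour threshold from the example; the roster, responder names, and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def due_for_rest(on_duty_since: dict, now: datetime, max_hours: int = 24) -> list:
    """Return the responders who have worked at least max_hours without rest."""
    limit = timedelta(hours=max_hours)
    return sorted(name for name, start in on_duty_since.items()
                  if now - start >= limit)


# Hypothetical roster: responder -> time they came on duty.
roster = {
    "analyst_a": datetime(2024, 1, 15, 8, 0),
    "analyst_b": datetime(2024, 1, 15, 20, 0),
    "ic": datetime(2024, 1, 15, 8, 30),
}
now = datetime(2024, 1, 16, 9, 0)

# analyst_a (25 hours) and the IC (24.5 hours) are due for an 8-hour rest.
print(due_for_rest(roster, now))
```

Including the IC in the roster reflects the point that the IC is subject to the same rotation as the rest of the team.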
Another strategy is to engage support elements during a period of inactivity in an incident. These periods of inactivity generally occur when an incident has been contained and potential Command-and-Control (C2) traffic has been addressed. Support personnel can be leveraged to monitor the network for any changes, giving the CSIRT time to rest.
A CSIRT requires that a large and diverse group of people are brought together to properly address an incident. Whatever model an organization chooses to incorporate the functions of the CSIRT, there is still a good deal of coordination and information that needs to be analyzed and reported.
Note
SOAR technologies are most often found in organizations with a more mature security posture, usually those that have a dedicated SOC or fusion center. Other key customers that utilize this technology are MSSPs and MDR providers. This is due to the cost of not only purchasing a commercial SOAR product but also its continual maintenance. Most organizations will not have the need for such a platform if they are addressing a small number of incidents per year. This material is included for familiarization purposes.
The technology research firm Gartner defines a SOAR as:
SOAR platforms are the amalgamation of three separate tools. The first of these is an Incident Response Platform (IRP). This tool is used to manage incident response workflows, case management, and the CSIRT’s knowledge base. The second tool is the Security Orchestration and Automation (SOA). This tool is used to manage incident playbooks, workflows, and processes. The SOA also automates low-level tasks, such as isolating an endpoint in response to malware detection and then notifying a SOC or CSIRT analyst. The final tool that makes up a SOAR is a Threat Intelligence Platform (TIP). A TIP is used to aggregate Indicators of Compromise (IOC) from internal or external sources. The TIP can then be used to enrich an alert from an Intrusion Detection System (IDS) to provide further context to the detection.
For example, a malware detection tool is tied into an organization’s event management system. Something is detected and the information associated with the detection is forwarded to the SIEM. From here, the SIEM feeds data into the SOAR. Based on the parameters, the SOAR’s playbooks immediately isolate the system from the network. The file hash is compared against the threat intelligence feeds and indicates that the file belongs to the BazarLoader family. Finally, a notice is sent to the CSIRT’s Slack channel, and they are able to respond to the infection and clean the system before reconnecting it to the network.
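The flow in this example can be sketched in a few lines of Python. This is an illustration only and does not reflect the API of any SOAR product: the `Detection` record, the `THREAT_INTEL` lookup table, the placeholder hash, and the `isolate_host` helper are all hypothetical stand-ins for the SIEM feed, the threat intelligence platform, and the endpoint isolation control.

```python
from dataclasses import dataclass, field

# Hypothetical threat intelligence feed: file hash -> malware family.
# The hash below is a placeholder, not a real indicator.
THREAT_INTEL = {"aaaabbbbccccdddd": "BazarLoader"}


@dataclass
class Detection:
    """Alert forwarded from the SIEM to the SOAR (illustrative fields)."""
    host: str
    file_hash: str


@dataclass
class PlaybookResult:
    isolated: bool = False
    family: str = "unknown"
    messages: list = field(default_factory=list)


def isolate_host(host: str) -> bool:
    # Stand-in for an endpoint or network control that quarantines a host.
    return True


def run_malware_playbook(detection: Detection) -> PlaybookResult:
    """Isolate the host, enrich the alert, then notify the CSIRT."""
    result = PlaybookResult()
    # Step 1: contain first, so the host cannot spread the infection.
    result.isolated = isolate_host(detection.host)
    # Step 2: enrich the alert by comparing the hash against threat intelligence.
    result.family = THREAT_INTEL.get(detection.file_hash, "unknown")
    # Step 3: queue a notice for analysts (a real playbook might post to chat).
    result.messages.append(
        f"{detection.host} isolated; file matches {result.family}"
    )
    return result


result = run_malware_playbook(Detection("ws-042", "aaaabbbbccccdddd"))
print(result.messages[0])
```

The ordering is the design point: containment runs before enrichment and notification, so the low-level response does not wait on an analyst.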
There is a wide range of commercial and open source SOAR solutions. Even with the wide range of options available to organizations, most of these SOAR solutions, open source included, have the following capabilities:
It is important to remember that the SOAR is not a replacement for a professional security analyst. Rather, SOC and CSIRT personnel should view the toolset as an augmentation that allows them to conduct investigations and respond at scale. It is nearly impossible to have complete visibility into even a modest-sized enterprise network. SOAR platforms perform much of the low-level activity so that CSIRT and SOC personnel can focus on the more high-level incident investigation and response activities.
The notion that serious security incidents can be kept secret has long passed. High-profile security incidents, such as those that impacted Target and TJX, have been made very public. Adding to this lack of secrecy are the new breach notification laws that impact organizations across the globe. The General Data Protection Regulation (GDPR) Article 33 has a 72-hour breach notification requirement.
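Because the 72-hour window is short, it can help to compute the deadline as soon as a breach is confirmed. The following is a minimal sketch, assuming the window starts when the organization becomes aware of the breach; the function names and the timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# GDPR Article 33 notification window.
GDPR_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Return the latest time by which the notification should be made."""
    return aware_at + GDPR_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Return the hours left before the deadline (negative once it has passed)."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


# Hypothetical time at which the organization became aware of the breach.
aware = datetime(2024, 1, 15, 9, 30, tzinfo=timezone.utc)
print(notification_deadline(aware).isoformat())
print(hours_remaining(aware, aware + timedelta(hours=12)))
```

Using timezone-aware timestamps avoids ambiguity when responders and regulators sit in different time zones.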
Other regulatory or compliance frameworks, such as the Health Insurance Portability and Accountability Act (HIPAA) Breach Notification Rule, 45 CFR §§ 164.400-414, stipulate that notifications are to be made in the event of a data breach. Compounding these legal and regulatory pressures is the need to communicate with internal business units and external stakeholders. While it may seem that crafting and deploying a communications plan during an incident is a waste of resources, it has become a necessity in today’s legal and regulatory environment. When examining crisis communications, the following three focus areas need to be addressed:
Each of these represents a specific audience and each requires different content and tone of messaging.
Internal communications are the kinds of communications that are limited to the business or organization’s internal personnel and reporting structure. Several business units need to be part of communications. The legal department will need to be kept abreast of the incident, as they will often have to determine reporting requirements and any additional regulatory requirements. Marketing and communications can be leveraged for crafting communications to external parties. This can best be facilitated by including them as early as possible in the process so that they have a full understanding of the incident and its impact. If the incident impacts any internal employees, Human Resources should also be included as part of internal communications.
One of the critical groups that are going to want to be informed as the incident unfolds is the C-suite and, more specifically, the CEO. A CSIRT will often fly well below the line of sight of senior leadership until there is a critical incident. At that point, the CEO will become very interested in the workings of the CSIRT and how they are addressing the incident.
With all of these parties needing to be kept in the loop, it is critical to ensure orderly communications and limit misinformation. To limit confusion, the IC or CSIRT lead should serve as a single point of contact. This way, for example, the legal department does not contact a CSIRT analyst to receive information about an investigation that is, at that time, speculative. Relying on this type of information can lead to serious legal consequences. To keep everyone informed, the CSIRT lead or IC should conduct periodic updates throughout each day of the incident. The cadence of these communications is dependent on the incident type and severity, but having a cadence of every 4 hours, with a conference call during the working hours of 6 a.m. to 10 p.m., will ensure that everyone is kept up to date.
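The cadence described above translates into a fixed set of call times per day. The following is a minimal sketch, assuming the 4-hour interval and the 6 a.m. to 10 p.m. window from the text; the function name and the example date are hypothetical.

```python
from datetime import datetime, time, timedelta


def update_times(day: datetime, start: time = time(6, 0),
                 end: time = time(22, 0), every_hours: int = 4) -> list:
    """Return the status-call times for one day, inclusive of both ends."""
    current = datetime.combine(day.date(), start)
    last = datetime.combine(day.date(), end)
    calls = []
    while current <= last:
        calls.append(current)
        current += timedelta(hours=every_hours)
    return calls


# Arbitrary example date; yields calls at 06:00, 10:00, 14:00, 18:00, and 22:00.
schedule = update_times(datetime(2024, 1, 15))
print([t.strftime("%H:%M") for t in schedule])
```

For a more severe incident, the interval can be shortened by passing a smaller `every_hours` value.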
In addition to a regular conference call, the CSIRT lead or the IC should prepare a daily status report to be sent to senior leadership. This daily status report does not have to be as comprehensive and detailed as a digital forensics report but should capture significant actions taken, any incident-related data that has been obtained, and any potential factors that may limit the ability of the CSIRT to function. At a minimum, a daily status meeting, in conjunction with this report, should be conducted with senior leadership and any other personnel who are required to attend over the course of the incident.
Incidents may have a downstream impact on other external entities outside of the organization that is suffering the incident. Some of these external entities may include suppliers, customers, transaction processing facilities, or service providers. If any of these organizations have a direct link—such as a Virtual Private Network (VPN)—to the impacted organization, external partners need to be informed sooner rather than later. This is to limit any possibility that an attacker has leveraged this connection to compromise other organizations.
Note
A significant area of concern when addressing incident management and external communications for Managed Service Providers (MSPs) is the trend of attackers targeting MSPs first, with the intent of using them as a jumping-off point into other organizations through established VPNs.
One perfect example of this is the Target breach, where attackers compromised a Heating, Ventilation, and Air Conditioning (HVAC) vendor as the initial point of entry. Attackers are using this tried-and-true method of attacking MSPs using ransomware, now with the intent of compromising more than one organization per attack.
At a minimum, an organization should inform external parties that they are dealing with an incident and, as a precaution, the connection will be blocked until the incident has been addressed. This can then be followed up with additional information. Much like internal communications, setting a regular cadence may go a long way in smoothing out any damage to a working relationship as a result of an incident. In some cases, well-trusted external parties may be made part of regular daily status updates.
As discussed previously, there are several legal and compliance requirements that need to be taken into consideration when discussing the notification of customers or the general public about an incident. Organizations may have to walk a fine line in terms of complying with the requirements of regulations such as HIPAA, without disclosing operational details of an incident still under investigation. Compounding this pressure are the possible implications on stock value or the potential for lost business. With all these pressures, it is critical to craft a message that is within the legal or compliance requirements but that also limits the damage to the organization’s reputation, revenue, or stock value.
Despite being directly related to the incident at hand, the CSIRT should not be responsible for crafting a public notification statement. Rather, the CSIRT should be available to provide insight into the incident investigation and answer any questions. The two best business units that should be involved in crafting a message are the legal and marketing departments. The marketing department would be tasked with crafting a message to limit the potential backlash from customers. The legal department would be tasked with crafting a message that meets legal or regulatory requirements. The CSIRT should advise as far as possible but these two business units should serve as the point of contact for any media or public questions.
Containment strategies are the actions taken during an incident to limit damage to specific systems or areas of the network. It is critical for organizations to have prepared these in the event of an incident. The rise of ransomware that combines elements of viruses and worms, and that can quickly spread through an organization, highlights the need to rapidly contain an outbreak before it impacts too many systems. What compounds the challenge of containment is that many enterprise IT systems utilize a flat topology, whereby the bulk of systems can communicate with each other. In this type of environment, ransomware and other worms can quickly propagate via legitimate protocols, such as Remote Desktop Services (RDS) or Server Message Block (SMB). SMB propagation featured prominently in the WannaCry ransomware campaign, which leveraged the EternalBlue vulnerability in the Windows implementation of SMB. For more information, visit https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2017-0144.
In order to address containment, an organization should have a clear idea of the network topology. This type of network awareness can be achieved through outputs of network discovery tools, up-to-date network diagrams, system inventories, and vulnerability scans. This data should be shared with the CSIRT so that an overall view of the network can be achieved. From here, the CSIRT should coordinate containment plans with network operations personnel so that an overall containment strategy can be crafted, and the potential damage of an incident limited. Having network operations personnel as part of the technical support personnel goes a long way in ensuring this process is streamlined and containment is achieved as quickly as possible.
One other aspect of how infrastructure is managed that has a direct impact on incident management is that of change management. Mature IT infrastructures usually have a well-documented and governed change management process in place. During an incident, however, the CSIRT and support personnel cannot wait for change management authorization and a proper change window to implement changes. When exercising containment strategies, IT and organizational leadership should fully understand that changes are going to be made based on the incident. This does not absolve the CSIRT and IT personnel from exercising due care and ensuring that changes are well documented.
In terms of containing a malware outbreak such as a ransomware attack, there are several strategies that can be employed. Ideally, organizations should have some ability to isolate segments of the network from each other, but in the event that this is not possible, CSIRT and IT personnel can take one or more of the following measures:
Once an incident is properly contained, the CSIRT and other personnel have some time to organize and begin the process of investigating the incident. They are also well situated to begin the process of removing the intruder and their tools from the network.
Once an incident has been properly and comprehensively investigated, it is time to move into the eradication and recovery phase. There may be a good deal of haste in getting to this stage, as there is a strong desire to return to normal operations. While there may be business drivers at play here, rushing eradication and recovery may reintroduce an unidentified compromised system that has been overlooked. In other scenarios, it could be possible to miss the patching of previously compromised systems, leaving them open to the same exploits that previously compromised them or, worse, placing a still-infected system back on the network. For this reason, we will thoroughly address both eradication and recovery strategies.
The unfortunate reality with modern malware is that there is no surefire way to ensure that all malicious code has been removed. In the past, organizations could simply scan the system with an antivirus program to trace the offending malicious code. Now, with malware techniques such as process injection or DLL hijacking, even if the original code is removed, there is still a chance that the system remains infected. There is also the possibility that additional code has been downloaded and installed and will go undetected. As a result, most eradication strategies rely on taking infected machines and reimaging them with a known good image or reverting to a known good backup.
A strategy that is often employed in the cases of malware and ransomware is to make use of three separate Virtual LAN (VLAN) segments and reimage the infected machines. First, all the infected machines are placed onto their own separate VLAN. From here, the CSIRT or system administrator will move one of the infected systems onto a secondary staging VLAN. The system is then reimaged with a known good image, or a known good backup is utilized. From here, once the system has been reimaged or has the backup installed, it is then moved to a production VLAN, where additional monitoring is conducted to ensure that there is no remaining infection or compromise. The following diagram shows a simple network structure that facilitates this recovery strategy:
Figure 2.5 – A system’s eradication and recovery architecture
While this method may be time-consuming, it is an excellent way to ensure that all systems that have been impacted have been addressed.
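The three-VLAN process can be modeled as a small state machine that refuses to skip a step. The following is a minimal sketch under that assumption; the `Vlan` and `Host` names and the hostname are hypothetical, and the methods only track state rather than reconfigure real network equipment.

```python
from enum import Enum


class Vlan(Enum):
    INFECTED = "infected"
    STAGING = "staging"
    PRODUCTION = "production"


class Host:
    def __init__(self, name: str):
        self.name = name
        self.vlan = Vlan.INFECTED  # every infected machine starts isolated
        self.reimaged = False

    def move_to_staging(self) -> None:
        if self.vlan is not Vlan.INFECTED:
            raise ValueError(f"{self.name} is not on the infected VLAN")
        self.vlan = Vlan.STAGING

    def reimage(self) -> None:
        # Reimaging (or restoring a known good backup) happens only in staging.
        if self.vlan is not Vlan.STAGING:
            raise ValueError(f"{self.name} must be staged before reimaging")
        self.reimaged = True

    def move_to_production(self) -> None:
        # Refuse to return a machine that was never rebuilt.
        if self.vlan is not Vlan.STAGING or not self.reimaged:
            raise ValueError(f"{self.name} is not ready for production")
        self.vlan = Vlan.PRODUCTION


host = Host("ws-042")
host.move_to_staging()
host.reimage()
host.move_to_production()
print(host.name, host.vlan.value)
```

Enforcing the order in code mirrors the purpose of the separate VLANs: a system cannot reach production without first passing through staging and being rebuilt.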
In the case of virtual systems, if the containment strategy previously discussed has been employed, the infected virtual systems will have no network connectivity. From here, the most straightforward eradication strategy is to revert systems to the last-known good snapshot. Once the system has been restarted, it should be connected to the VLAN with enhanced monitoring. It is important that in the case of reverting to snapshots, the CSIRT has a great deal of confidence in the timeline. If the CSIRT is unsure about the timeline of an attack, there is a possibility that the snapshot may be compromised as well. This is especially true in organizations that conduct snapshots regularly.
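The dependence on the attack timeline can be made concrete with a short selection routine. The following is a minimal sketch, assuming the CSIRT has established the earliest known time of compromise; the function name and the timestamps are hypothetical.

```python
from datetime import datetime
from typing import List, Optional


def last_known_good(snapshots: List[datetime],
                    earliest_compromise: datetime) -> Optional[datetime]:
    """Return the newest snapshot taken strictly before the compromise."""
    candidates = [taken for taken in snapshots if taken < earliest_compromise]
    return max(candidates) if candidates else None


# Hypothetical nightly snapshots and an estimated time of initial compromise.
snapshots = [
    datetime(2024, 1, 12, 2, 0),
    datetime(2024, 1, 13, 2, 0),
    datetime(2024, 1, 14, 2, 0),
]
compromise = datetime(2024, 1, 13, 17, 45)

# The snapshot from January 14 postdates the compromise and is excluded.
print(last_known_good(snapshots, compromise))
```

If the estimated time of compromise later moves earlier, the selection changes with it, which is why confidence in the timeline matters before reverting.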
In terms of recovery, there are several tasks that the CSIRT will need to manage to bring operations back to normal. The first of these is to ensure that all systems—not just those that have been through the eradication phase but all systems—are properly patched with the most up-to-date patches. This is critical in instances where the attacker has taken advantage of a zero-day exploit or a relatively new vulnerability. In cases where a patch is not forthcoming from the manufacturer, the CSIRT should recommend additional security controls to mitigate any residual risk.
A second piece of the recovery phase is for the CSIRT to work with IT and information security personnel in crafting additional detection and prevention alerts. During the examination of the evidence when determining the root cause, or in the containment phase, the CSIRT may have provided data for detection and prevention controls. The CSIRT should work with other personnel to augment those with additional detective and preventive rules. These additional rules may be specific to the incident or may pertain to the specific vulnerabilities identified.
Third, any changes that were made to the infrastructure should be reviewed. These changes can be initially reviewed by the CSIRT and IT personnel to determine whether they are still required or can be removed. If changes are required in the long term, they should be evaluated by the organization’s change control, and approved according to the change control process.
Fourth, before the incident can be closed out, it is good practice to conduct a full vulnerability scan of all systems. This is critical to ensure that any systems that have been compromised have been addressed. Additionally, this step will also address any other systems that may not have been impacted by the security incident, ensuring that they are nonetheless patched for any security vulnerabilities.
Finally, at the end of an incident, it is important to conduct an After-Action Review (AAR). This review goes over the entire incident from start to finish. All actions taken by the CSIRT personnel are reviewed. In addition, the plans and playbooks that were utilized are also reviewed in light of the incident actions. Any deficiencies, such as a lack of specific tools, training, or processes, should be brought up so that they may be corrected. The output of this AAR should be documented as part of the overall incident documentation.
There are often many lessons that can be gleaned from an incident investigation and the activity associated with the incident response. These hard-won lessons should be captured as soon as the organization has returned to normal operations. The best way to address this is through an AAR. All individuals that were involved in the incident should be brought together, either virtually or, if possible, in the same location. Usually, the IC serves as the lead in this effort, with another individual functioning as the scribe to capture all the pertinent details.
Overall, an AAR of the incident should examine not only what went right during the incident but also what went wrong and therefore needs improvement. The following is a list of sample questions that can be asked as part of the AAR:
This list should serve as a starting point for the overall AAR. Depending on the severity and length of the incident, the AAR may take anywhere from minutes to a few hours to address all salient points and identify gaps in the organization’s capability. The goal here is to capture this information and integrate it into the improvement of CSIRT policies and procedures.
Planning for an incident is critical. Equally critical is the proper management of an incident. This involves several elements that each CSIRT must address during the life of an incident.
Proper logistics provide the necessary elements for the CSIRT to function. Having strategies to communicate incident information to leadership, third parties, and customers keeps these stakeholders informed, lessens speculation, and ensures that compliance requirements are met. Incident investigation allows the CSIRT to properly identify the attack and the scope and limit damage via a proper containment strategy. Finally, these elements are all part of eradicating an adversary’s ability to access a network and helping an organization to return to normal. As we stated at the beginning of this book, everyone has a plan until they get hit in the face. The real value of a CSIRT to an organization is not in the plans and playbooks, but in how well they perform when an incident occurs.
The next chapter will expand on the incident investigation portion of incident management by providing the digital forensics framework to which CSIRT personnel adhere.