Chapter 11, “Building an Incident Response Program,” provided an overview of the steps required to build and implement a cybersecurity incident response program according to the process advocated by the National Institute of Standards and Technology (NIST). In their Computer Security Incident Handling Guide, NIST outlines the four-phase incident response process shown in Figure 14.1.
The remainder of Chapter 11 provided an overview of the Preparation phase of incident response. Chapter 12, “Analyzing Indicators of Compromise,” and Chapter 13, “Performing Forensic Analysis and Techniques,” covered the details behind the Detection and Analysis phase, including sources of cybersecurity information and forensic analysis. This chapter concludes the coverage of CySA+ Domain 4.0: Incident Response with a detailed look at the final two phases of incident response: Containment, Eradication, and Recovery, and Postincident Activity.
The Containment, Eradication, and Recovery phase of incident response moves the organization from the primarily passive incident response activities that take place during the Detection and Analysis phase to more active undertakings. Once the organization understands that a cybersecurity incident is underway, it takes actions designed to minimize the damage caused by the incident and restore normal operations as quickly as possible.
Containment is the first activity that takes place during this phase, and it should begin as quickly as possible after analysts determine that an incident is underway. Containment activities are designed to isolate the incident and prevent it from spreading further. If that phrase sounds somewhat vague, that's because containment means very different things in the context of different types of security incidents. For example, if the organization is experiencing active exfiltration of data from a credit card processing system, incident responders might contain the damage by disconnecting that system from the network, preventing the attackers from continuing to exfiltrate information. On the other hand, if the organization is experiencing a denial-of-service attack against its website, disconnecting the network connection would simply help the attacker achieve its objective. In that case, containment might include placing filters on an upstream Internet connection that block all inbound traffic from networks involved in the attack or block web requests that bear a certain signature.
Containment activities typically aren't perfect and often cause some collateral damage that disrupts normal business activity. Consider the two examples described in the previous paragraph. Disconnecting a credit card processing system from the network may bring transactions to a halt, causing potentially significant losses of business. Similarly, blocking large swaths of inbound web traffic may render the site inaccessible to some legitimate users. Incident responders undertaking containment strategies must understand the potential side effects of their actions while weighing them against the greater benefit to the organization.
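The denial-of-service example and its collateral cost can be sketched in a few lines of Python. This is a minimal illustration, not a production filter: the blocked networks and client addresses are documentation-reserved example ranges, and the function names are hypothetical.

```python
import ipaddress

# Hypothetical networks observed in the attack (documentation-reserved ranges).
BLOCKED_NETWORKS = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/25"),
]


def is_blocked(source_ip: str) -> bool:
    """Return True if the source address falls inside any blocked network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


def collateral_damage(legitimate_sources: list[str]) -> list[str]:
    """List known-legitimate clients that the filter would also cut off."""
    return [ip for ip in legitimate_sources if is_blocked(ip)]
```

Running collateral_damage against a list of known customer addresses before applying a filter gives responders a rough sense of how many legitimate users the containment action would affect.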
Cybersecurity analysts often use network segmentation as a proactive strategy to prevent the spread of future security incidents. For example, the network shown in Figure 14.2 is designed to segment different types of users from each other and from critical systems. An attacker who is able to gain access to the guest network would not be able to interact with systems belonging to employees or in the datacenter without traversing the network firewall.
You learned how network segmentation is used as a proactive control in a defense-in-depth approach to information security in Chapter 7, “Infrastructure Security and Controls.”
In addition to being used as a proactive control, network segmentation may play a crucial role in incident response. During the early stages of an incident, responders may realize that a portion of systems are compromised but wish to continue to observe the activity on those systems while they determine other appropriate responses. However, they certainly want to protect other systems on the network from those potentially compromised systems.
Figure 14.3 shows an example of how an organization might apply network segmentation during an incident response effort. Cybersecurity analysts suspect that several systems in the datacenter were compromised and built a separate virtual LAN (VLAN) to contain those systems. That VLAN, called the quarantine network, is segmented from the rest of the datacenter network and controlled by very strict firewall rules. Putting the systems on this network segment provides some degree of isolation, preventing them from damaging systems on other segments but allowing continued live analysis efforts.
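The "very strict firewall rules" protecting a quarantine network might look something like the following Python sketch. It assumes a simple first-match, default-deny rule list. The addressing (a quarantine VLAN at 10.99.0.0/24 and an analyst workstation at 10.10.5.20) is invented for illustration, and the model is stateless, unlike most real firewalls.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    action: str  # "allow" or "deny"
    src: str     # source network in CIDR notation
    dst: str     # destination network in CIDR notation


# Hypothetical rule set for the quarantine VLAN.
QUARANTINE_RULES = [
    Rule("allow", "10.10.5.20/32", "10.99.0.0/24"),  # analyst workstation may reach quarantined hosts
    Rule("deny", "10.99.0.0/24", "10.0.0.0/8"),      # quarantined hosts may not reach internal networks
    Rule("deny", "10.0.0.0/8", "10.99.0.0/24"),      # other internal hosts may not reach the quarantine
]


def evaluate(src_ip: str, dst_ip: str, rules=QUARANTINE_RULES) -> str:
    """Return the action of the first matching rule; unmatched traffic is denied."""
    src = ipaddress.ip_address(src_ip)
    dst = ipaddress.ip_address(dst_ip)
    for rule in rules:
        if src in ipaddress.ip_network(rule.src) and dst in ipaddress.ip_network(rule.dst):
            return rule.action
    return "deny"
```

Placing the analyst rule first lets live analysis continue while every other path between the quarantine VLAN and the internal network is closed.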
Although segmentation does limit the access that attackers have to the remainder of the network, it sometimes doesn't go far enough to meet containment objectives. Cybersecurity analysts may instead decide that it is necessary to use stronger isolation practices to cut off an attack. Two primary isolation techniques may be used during a cybersecurity incident response effort: isolating affected systems and isolating the attacker.
Isolating affected systems is, quite simply, taking segmentation to the next level. Affected systems are completely disconnected from the remainder of the network, although they may still be able to communicate with each other and the attacker over the Internet. Figure 14.4 shows an example of taking the quarantine VLAN from the segmentation strategy and converting it to an isolation approach.
Notice that the only difference between Figures 14.3 and 14.4 is where the quarantine network is connected. In the segmentation approach, the network is connected to the firewall and may have some limited access to other networked systems. In the isolation approach, the quarantine network connects directly to the Internet and has no access to other systems. In reality, this approach may be implemented by simply altering firewall rules rather than bypassing the firewall entirely. The objective is to continue to allow the attacker to access the isolated systems but restrict their ability to access other systems and cause further damage.
Isolating the attacker is an interesting variation on the isolation strategy and depends on the use of sandbox systems that are set up purely to monitor attacker activity and that do not contain any information or resources of value to the attacker. Placing attackers in a sandboxed environment allows continued observation in a fairly safe, contained environment. Some organizations use honeypot systems for this purpose. For more information on honeypots, see Chapter 1, “Today's Cybersecurity Analyst.”
Removal of compromised systems from the network is the strongest containment technique in the cybersecurity analyst's incident response toolkit. As shown in Figure 14.5, removal differs from segmentation and isolation in that the affected systems are completely disconnected from other networks, although they may still be allowed to communicate with other compromised systems within the quarantine VLAN. In some cases, suspect systems may be physically disconnected from the network so that they are prevented from communicating even with each other. The exact details of removal will depend on the circumstances of the incident and the professional judgment of incident responders.
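One way to keep the three containment strategies straight is to compare what a compromised system can still reach under each. The Python sketch below encodes the descriptions in this section. The destination labels are hypothetical, and Internet access under segmentation is an assumption, since in practice it depends on the firewall rules applied.

```python
from enum import Enum


class Strategy(Enum):
    SEGMENTATION = "segmentation"
    ISOLATION = "isolation"
    REMOVAL = "removal"


# What a quarantined system may still reach under each strategy.
REACHABLE = {
    # Limited internal access through the firewall; Internet access is assumed here.
    Strategy.SEGMENTATION: {"quarantine_peers", "internet", "internal_limited"},
    # No internal access, but the attacker can still reach the systems.
    Strategy.ISOLATION: {"quarantine_peers", "internet"},
    # Disconnected from all other networks.
    Strategy.REMOVAL: {"quarantine_peers"},
}


def can_reach(strategy: Strategy, destination: str, physically_disconnected: bool = False) -> bool:
    """Return True if a quarantined system can still reach the destination."""
    if strategy is Strategy.REMOVAL and physically_disconnected:
        # Each system is unplugged, so even peer communication is cut off.
        return False
    return destination in REACHABLE[strategy]
```

Each step from segmentation to isolation to removal shrinks the set of reachable destinations, which is why removal is the strongest of the three techniques.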
The primary objective during the containment phase of incident response is to limit the damage to the organization and its resources. Although that objective may take precedence over other goals, responders may still be interested in gathering evidence during the containment process. This evidence can be crucial in the continuing analysis of the incident for internal purposes, or it can be used during legal proceedings against the attacker.
Chapter 13 provided a thorough review of the forensic strategies that might be used during an incident investigation. Chapter 1 also included information on reverse engineering practices that may be helpful during an incident investigation.
If incident handlers suspect that evidence gathered during an investigation may be used in court, they should take special care to preserve and document evidence during the course of their investigation. NIST recommends that investigators maintain a detailed evidence log that includes the following:
Failure to maintain accurate logs will bring the evidence chain-of-custody into question and may cause the evidence to be inadmissible in court.
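An evidence log of this kind can be kept in any format, but a structured record makes gaps easier to spot. The Python sketch below is one possible layout, not a prescribed one. Its fields follow the evidence log items recommended in NIST SP 800-61 (identifying information, handler details, date and time with time zone, and storage location), while the class names and the chronological-order check are assumptions added for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class CustodyEvent:
    handler_name: str
    handler_title: str
    handler_phone: str
    timestamp: datetime    # must carry a time zone
    storage_location: str
    action: str            # e.g., "collected", "transferred", "stored"

    def __post_init__(self):
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must include a time zone")


@dataclass
class EvidenceItem:
    description: str
    serial_number: str
    hostname: str
    mac_address: str
    ip_address: str
    events: list = field(default_factory=list)

    def record(self, event: CustodyEvent) -> None:
        """Append a custody event, refusing entries that are out of order."""
        if self.events and event.timestamp < self.events[-1].timestamp:
            raise ValueError("custody events must be recorded in chronological order")
        self.events.append(event)
```

Rejecting timestamps without a time zone and entries logged out of order catches two common record-keeping mistakes at the moment they happen, rather than when the log is challenged later.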
Identifying the perpetrators of a cybersecurity incident is a complex task that often leads investigators down a winding path of redirected hosts that crosses international borders. Although you might find IP address records stored in your logs, it is incredibly unlikely that they correspond to the actual IP address of the attacker. Any attacker other than the most rank of amateurs will relay their communications through a series of compromised systems, making it very difficult to trace their actual origin.
Before heading down this path of investigating an attack's origin, it's very important to ask yourself why you are pursuing it. Is there really business value in uncovering who attacked you, or would your time be better spent on containment, eradication, and recovery activities? The NIST Computer Security Incident Handling Guide addresses this issue head on, giving the opinion that “Identifying an attacking host can be a time-consuming and futile process that can prevent a team from achieving its primary goal—minimizing the business impact.”
Law enforcement officials may approach this situation with objectives that differ from those of the attacked organization's cybersecurity analysts. After all, one of the core responsibilities of law enforcement organizations is to identify criminals, arrest them, and bring them to trial. That responsibility may conflict with the core cybersecurity objectives of containment, eradication, and recovery. Cybersecurity and business leaders should take this conflict into consideration when deciding whether to involve law enforcement agencies in an incident investigation and the degree of cooperation they will provide to an investigation that is already underway.
Once the cybersecurity team successfully contains an incident, it is time to move on to the eradication phase of the response. The primary purpose of eradication is to remove any of the artifacts of the incident that may remain on the organization's network. This could include the removal of any malicious code from the network, the sanitization of compromised media, and the securing of compromised user accounts.
The recovery phase of incident response focuses on restoring normal capabilities and services. It includes reconstituting resources and correcting security control deficiencies that may have led to the attack. This could include rebuilding and patching systems, reconfiguring firewalls, updating malware signatures, and similar activities. The goal of recovery is not just to rebuild the organization's network but also to do so in a manner that reduces the likelihood of a successful future attack.
During the eradication and recovery effort, cybersecurity analysts should develop a clear understanding of the incident's root cause. This is critical to implementing a secure recovery that corrects control deficiencies that led to the original attack. After all, if you don't understand how an attacker breached your security controls in the first place, it will be hard to correct those controls so that the attack doesn't recur! Understanding the root cause of an attack is a completely different activity from identifying the attacker. Root cause assessment is a critical component of incident recovery whereas, as mentioned earlier, identifying the attacker can be a costly distraction.
Root cause analysis also helps an organization identify other systems it operates that might share the same vulnerability. For example, if an attacker compromises a Cisco router and root cause analysis reveals an error in that device's configuration, administrators may correct the error on other routers they control to prevent a similar attack from compromising those devices.
During an incident, attackers may compromise one or more systems through the use of malware, web application attacks, or other exploits. Once an attacker gains control of a system, security professionals should consider it completely compromised and untrustworthy. It is not safe to simply correct the security issue and move on because the attacker may still have an undetected foothold on the compromised system. Instead, the system should be rebuilt, either from scratch or by using an image or backup of the system from a known secure state.
Rebuilding and/or restoring systems should always be done with the incident root cause analysis in mind. If the system was compromised because it contained a security vulnerability, as opposed to through the use of a compromised user account, backups and images of that system likely have that same vulnerability. Even rebuilding the system from scratch may reintroduce the earlier vulnerability, rendering the system susceptible to the same attack. During the recovery phase, administrators should ensure that rebuilt or restored systems are remediated to address known security issues.
During the incident recovery effort, cybersecurity analysts will patch operating systems and applications involved in the attack. This is also a good time to review the security patch status of all systems in the enterprise, addressing other security issues that may lurk behind the scenes.
Cybersecurity analysts should first focus their efforts on systems that were directly involved in the compromise and then work their way outward, addressing systems that were indirectly related to the compromise before touching systems that were not involved at all. Figure 14.6 shows the phased approach that cybersecurity analysts should take to patching systems and applications during the recovery phase.
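The phased approach amounts to sorting systems by how closely they were involved in the incident. A minimal Python sketch, using hypothetical host names and an invented Involvement ranking, might look like this:

```python
from enum import IntEnum


class Involvement(IntEnum):
    DIRECT = 1    # directly involved in the compromise
    INDIRECT = 2  # indirectly related to the compromise
    NONE = 3      # not involved in the incident


def patch_order(systems: dict[str, Involvement]) -> list[str]:
    """Return system names ordered from most to least involved, then by name."""
    return sorted(systems, key=lambda name: (systems[name], name))
```

For example, patch_order({"hr-db": Involvement.NONE, "web01": Involvement.DIRECT, "app02": Involvement.INDIRECT}) places web01 first, app02 second, and hr-db last.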
During the recovery effort, cybersecurity analysts may need to dispose of or repurpose media from systems that were compromised during the incident. In those cases, special care should be taken to ensure that sensitive information that was stored on that media is not compromised. Responders don't want the recovery effort from one incident to lead to a second incident!
Generally speaking, there are three options available for the secure disposition of media containing sensitive information: clear, purge, and destroy. NIST defines these three activities in NIST SP 800-88: Guidelines for Media Sanitization:
These three levels of data disposal are listed in increasing order of effectiveness as well as difficulty and cost. Physically incinerating a hard drive, for example, removes any possibility that data will be recovered but requires the use of an incinerator and renders the drive unusable for future purposes.
Figure 14.7 shows a flowchart designed to help security decision makers choose appropriate techniques for destroying information and can be used to guide incident recovery efforts. Notice that the flowchart includes a Validation phase after efforts to clear, purge, or destroy data. Validation ensures that the media sanitization was successful and that remnant data does not exist on the sanitized media.
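A decision flow of this kind translates naturally into a short function. The Python sketch below is an illustration only: the branch outcomes are assumptions based on the sanitization decision flow published in NIST SP 800-88 Rev. 1, keyed on the security categorization of the data, whether the media will be reused, and whether it will leave the organization's control. Check them against Figure 14.7 before relying on them.

```python
def choose_sanitization(category: str, reuse_media: bool, leaving_org_control: bool) -> str:
    """Pick clear, purge, or destroy for media holding data of the given category."""
    category = category.lower()
    if category not in {"low", "moderate", "high"}:
        raise ValueError("category must be low, moderate, or high")

    if category == "low":
        # Low-impact data: the only question is whether the media leaves our control.
        return "purge" if leaving_org_control else "clear"

    if not reuse_media:
        # Moderate- or high-impact media that will not be reused is destroyed.
        return "destroy"

    if category == "moderate":
        return "purge" if leaving_org_control else "clear"

    # High-impact media that will be reused.
    return "destroy" if leaving_org_control else "purge"
```

Whatever technique the function returns, the sanitization result still needs to be validated and documented afterward.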
Before concluding the recovery effort, incident responders should take time to verify that the recovery measures put in place were successful. The exact nature of this verification will depend on the technical circumstances of the incident and the organization's infrastructure. Four activities that should always be included in these validation efforts follow:
These actions form the core of an incident recovery validation effort and should be complemented with other activities that validate the specific controls put in place during the Containment, Eradication, and Recovery phase of incident response.
After the immediate, urgent actions of containment, eradication, and recovery are complete, it is very tempting for the CSIRT to take a deep breath and consider their work done. While the team should take a well-deserved break, the incident response process is not complete until the team completes postincident activities that include managing change control processes, conducting a lessons learned session, and creating a formal written incident report.
During the containment, eradication, and recovery process, responders may have bypassed the organization's normal change control and configuration management processes in an effort to respond to the incident in an expedient manner. These processes provide important management controls and documentation of the organization's technical infrastructure. Once the urgency of response efforts passes, the responders should turn back to these processes and use them to document any emergency changes made during the incident response effort.
At the conclusion of every cybersecurity incident, everyone involved in the response should participate in a formal lessons learned session that is designed to uncover critical information about the response. This session also highlights potential deficiencies in the incident response plan and procedures. For more information on conducting the post-incident lessons learned session, see the “Lessons Learned Review” section in Chapter 11.
During the lessons learned session, the organization may uncover potential changes to the incident response plan. In those cases, the leader should propose those changes and move them through the organization's formal change process to improve future incident response efforts.
During an incident investigation, the team may encounter new indicators of compromise (IOCs) based on the tools, techniques, and tactics used by attackers. As part of the lessons learned review, the team should clearly identify any new IOC and make recommendations for updating the organization's security monitoring program to include those IOCs. This will reduce the likelihood of a similar incident escaping attention in the future.
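Feeding new IOCs back into monitoring can be as simple as extending a watchlist and checking log data against it. The Python sketch below is a deliberately naive illustration: it uses plain substring matching, which can produce false positives, and the indicator values, log lines, and function names are all hypothetical.

```python
def merge_iocs(watchlist: set[str], new_iocs: set[str]) -> set[str]:
    """Return the watchlist extended with normalized new indicators."""
    return watchlist | {ioc.strip().lower() for ioc in new_iocs if ioc.strip()}


def matching_events(log_lines: list[str], watchlist: set[str]) -> list[str]:
    """Return log lines that mention any indicator (naive substring match)."""
    return [
        line for line in log_lines
        if any(ioc in line.lower() for ioc in watchlist)
    ]
```

A real monitoring program would match on parsed fields such as destination address or file hash rather than raw text, but the workflow is the same: every indicator identified during the lessons learned review becomes something the organization actively watches for.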
Every incident that activates the CSIRT should conclude with a formal written report that documents the incident for posterity. This serves several important purposes. First, it creates an institutional memory of the incident that is useful when developing new security controls and training new security team members. Second, it may serve as an important record of the incident if there is legal action that results from the incident. Finally, the act of creating the written report can help identify previously undetected deficiencies in the incident response process that may feed back through the lessons learned process.
Important elements that the CSIRT should cover in a postincident report include the following:
Incident summary reports should be classified in accordance with the organization's classification policy and stored in an appropriately secured manner. The organization should also have a defined retention period for incident reports and destroy old reports when they exceed that period.
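Enforcing a retention period is a simple date comparison. The following Python sketch assumes a hypothetical mapping of report identifiers to the dates the incidents were closed. The retention length is whatever the organization's policy defines.

```python
from datetime import date, timedelta


def reports_to_destroy(reports: dict[str, date], retention_days: int, today: date) -> list[str]:
    """Return IDs of reports whose closing date is older than the retention period."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(rid for rid, closed in reports.items() if closed < cutoff)
```

Passing today as a parameter rather than reading the system clock keeps the function easy to test and makes the review date explicit in any audit record.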
At the conclusion of an incident, the team should make a formal determination about the disposition of evidence collected during the incident. If the evidence is no longer required, then it should be destroyed in accordance with the organization's data disposal procedures. If the evidence will be preserved for future use, it should be placed in a secure evidence repository with the chain of custody maintained.
The decision to retain evidence depends on several factors, including whether the incident is likely to result in criminal or civil action and the impact of the incident on the organization. This topic should be directly addressed in an organization's incident response procedures.
After identifying a security incident in progress, CSIRT members should move immediately into the containment, eradication, and recovery phase of incident response. The first priority of this phase is to contain the damage caused by a security incident to lower the impact on the organization. Once an incident is contained, responders should take actions to eradicate the effects of the incident and recover normal operations. Once the immediate response efforts are complete, the CSIRT should move into the postincident phase, conduct a lessons learned session, and create a written report summarizing the incident response process.
Explain the purpose of containment activities. After identifying a potential incident in progress, responders should take immediate action to contain the damage. They should select appropriate containment strategies based on the nature of the incident and impact on the organization. Potential containment activities include network segmentation, isolation, and removal of affected systems.
Know the importance of collecting evidence during a response. Much of the evidence of a cybersecurity incident is volatile in nature and may not be available later if not collected during the response. CSIRT members must determine the priority that evidence collection will take during the containment, eradication, and recovery phase and then ensure that they properly handle any collected evidence that can later be used in legal proceedings.
Explain how identifying attackers can be a waste of valuable resources. Most efforts to identify the perpetrators of security incidents are futile, consuming significant resources before winding up at a dead end. The primary focus of incident responders should be on protecting the business interests of the organization. Law enforcement officials have different priorities, and responders should be aware of potentially conflicting objectives.
Explain the purpose of eradication and recovery. After containing the damage, responders should move on to eradication and recovery activities that seek to remove all traces of an incident from the organization's network and restore normal operations as quickly as possible. This should include validation efforts that verify security controls are properly implemented before closing the incident.
Define the purpose of postincident activities. At the conclusion of a cybersecurity incident response effort, CSIRT members should conduct a formal lessons learned session that reviews the entire incident response process and recommends changes to the organization's incident response plan, as needed. Any such changes should be made through the organization's change control process. The team should also complete a formal incident summary report that serves to document the incident for posterity. Other considerations during this process include evidence retention, indicator of compromise (IOC) generation, and ongoing monitoring.
Label each one of the following figures with the type of incident containment activity pictured.
________________________________________
________________________________________
________________________________________
For each of the following incident response activities, assign it to one of the following CompTIA categories:
Remember that the categories assigned by CompTIA differ from those used by NIST and other incident handling standards.
Fill in the flowchart with the appropriate dispositions for information being destroyed following a security incident.
Each box should be completed using one of the following three words: