By Fatskills Exam Guides Team — the exam nerds behind 28,500+ quizzes and 2.1M practice questions across 500+ global exams.
Objective 4.2 Given a scenario, apply the appropriate incident response procedure
Incident Response Procedures
An incident is a negative event that occurs within the context of the organization. An incident could be the result of a malicious attack, employee complacency, accident, or even an environmental cause. Incident planning and response should be considered a subset of business continuity and disaster recovery planning, albeit on a more defined, narrow scale. Incidents have an impact on the organization, whether it be financial, reputational, technological, or even cultural. An incident can result in the loss of data, loss of equipment, and sometimes even loss of life. Incident response is often viewed as a reactive process, but it is quite proactive if performed properly. Conducting an incident response successfully is a result of careful planning, which we’ll discuss in this module. As with many other formal processes, incident response has a defined lifecycle and process.
An example of this process is shown below, taken from the National Institute of Standards and Technology (NIST) Special Publication 800-61.
FIGURE: The NIST Incident Response Lifecycle (adapted from NIST SP 800-61)
Although this view of incident response is accurate, there are many other lifecycle models as well. All of them essentially focus in some manner on preparation, detection, containment, recovery, investigation, analysis, and post-incident activities.
Exam tip: You are not expected to know the particulars of the NIST incident response lifecycle; however, you should be aware of the general phases of incident response.
Preparation
Preparation may be the most important phase of the incident response lifecycle. During this phase, several activities must be carefully planned and executed. Without adequate preparation, the organization cannot expect to have a successful incident response effort.
During this phase, the organization must take the following actions:
- Develop the incident response plan.
- Appoint incident response team members.
- Develop detailed incident response procedures for a variety of scenarios.
- Provision the team with the proper supplies and equipment.
- Ensure the team is adequately trained on incident response procedures.
- Exercise the incident response plan.
- Improve the incident response plan from lessons learned through testing and exercise.
Training
Ideally, the incident response team should be filled with people in the organization who are already trained, experienced, and well-versed in incident response procedures. Most of the time, however, there will be a core team of personnel who have had some level of training or experience with only some aspects of incident response, but usually only two or three who might be seasoned incident response professionals. In addition to incident response training, the team members must be trained in their areas of expertise, such as intrusion detection systems, firewalls, server administration, and so on. The overall incident response plan should include a section on training that details the different areas and tasks that team members should be trained on. This should be in addition to the technical training that they get to support their primary job function.
Examples of topics that team members should be trained on include the following:
- Communication skills
- Public speaking
- Report writing
- Business continuity and disaster recovery procedures
- Forensics
- Relevant legal and governance considerations
- Emergency procedures
Testing
How do you know that your incident response plan actually works? Will you wait until an incident occurs to test it? That’s generally not a good idea. An incident response plan should be tested at least annually, or at whatever minimum frequency governance requirements prescribe. In addition, a best practice is to test the plan whenever it changes significantly or when team members are rotated on or off the team. There are several ways you could test the plan, and you might even test it as part of a larger business continuity or disaster recovery response exercise.
The most commonly used types of tests include the following:
- Documentation review: Individuals review copies of incident response plans and procedures.
- Tabletop exercise: Individuals perform a “paper” exercise with a written scenario without actually performing incident response tasks.
- Walkthrough: The team “walks through” the steps of an incident response without performing all the tasks or using all the necessary equipment.
- Full test: The team conducts a full incident response test, using all equipment and performing all relevant tasks.
Note: If you are used to seeing additional types of tests, such as parallel or full interruption tests, keep in mind that these are unique to disaster recovery tests. For incident response, the types of tests you may encounter differ only in their level of detail or depth; they do not determine whether redundant equipment and processes can be activated, as a disaster recovery exercise does.
Documenting Procedures
No aspect of cybersecurity is without the requirement to document processes. This is true of procedures for handling sensitive information and for changing the ruleset on a firewall, and it is definitely true of business continuity and disaster recovery programs. Incident response is no exception. For a successful response effort to occur, the organization must document all its processes and procedures. Without this documentation, any corporate memory of previous incidents or exercises is lost; there is simply no continuity without documentation.
Documentation is a formal process. Certainly, there will be drafts that are circulated for approval, but the final documents must be accurate, concise, and readable by the intended audience (the incident response team), and they must be vetted by members of the team with expertise in the relevant areas. They must also be approved by management.
An auditor may encounter many very good incident response plans that are all these things, except they were not approved by management. Without formal management approval of the plan, there is no real plan because there would not be a requirement to follow those procedures with any kind of consistency.
To document these processes and procedures, the organization should do the following:
- Use the relevant experts from the team to write the appropriate areas of the incident response plan.
- Vet the drafts of the plan among the different team members for editing and consensus.
- Ensure the plans are clear and concise.
- Ensure procedures are well organized and easy to follow.
- Get formal management support and approval for the plan.
Once the plan has been written, it must be updated periodically; at least annually is strongly recommended. An update should also occur any time there is a major change to the plan, a major turnover of personnel on the incident response team, or a major change in the organization’s technical or operating environment, as well as after an incident response exercise or test has been conducted. Ideally, updates should be made on a regularly scheduled basis, even if none of these things occur.
The incident response plan and related documentation should be considered company proprietary or sensitive information, but all members of the incident response team and any other relevant stakeholders must be able to access and use the plan at any time.
Detection and Analysis
Although preparation is the first step of the incident response lifecycle, the response effort itself begins when the incident is detected. Detection can happen in many ways. An end user could notice some strange occurrences on their computer, such as sluggishness or applications doing “funny things.” A system administrator could see some curious patterns in log files, or a cybersecurity technician could be alerted through an intrusion detection or prevention system (IDS/IPS) or a security information and event management (SIEM) system. No matter how the incident is detected, as soon as detection happens, several things must occur very quickly. First, the frontline responder, typically a cybersecurity analyst, must determine whether or not this is a true incident. If they have insufficient information to make this determination, they should escalate the issue to management or even the incident response team. The incident response team, if provided with enough details, can determine whether this is an incident that requires a response or something that a technician can simply resolve at their level.
One of the key issues with detection is response time. If there is an active attack in progress, the time between detection and response is critical. The longer the incident is allowed to go unchallenged, the more damage can be done to systems and data.
The following are examples of some of the considerations for the first responder who detects or is alerted to an incident, as well as for the incident response team lead, in determining whether or not to categorize the incident as requiring a response:
- Loss or corruption of a small dataset versus a large dataset
- Systems suddenly going offline or suffering degradation of performance
- Sudden, unexplainable increase in either inbound or outbound network traffic
- Widespread user complaints of system sluggishness, data loss, equipment failure, or departure from the normal baseline
- Unexplained events monitored by the SIEM, firewall, IDS, or other detection devices
- Otherwise known or predetermined incident response triggers
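One of the considerations above, a sudden departure of network traffic from the normal baseline, can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `is_traffic_anomalous`, the sample values, and the three-standard-deviation threshold are hypothetical and not taken from any particular SIEM or IDS product.

```python
from statistics import mean, stdev


def is_traffic_anomalous(baseline_mbps, current_mbps, threshold_sigmas=3.0):
    """Return True when the current reading departs sharply from the baseline.

    baseline_mbps: historical throughput samples taken during normal operation.
    threshold_sigmas: assumed cutoff; a real plan would define its own trigger.
    """
    if len(baseline_mbps) < 2:
        raise ValueError("need at least two baseline samples")
    average = mean(baseline_mbps)
    spread = stdev(baseline_mbps)
    if spread == 0:
        # A perfectly flat baseline: any change at all is a departure.
        return current_mbps != average
    return abs(current_mbps - average) > threshold_sigmas * spread


# Hypothetical outbound traffic samples (Mbps) collected on normal days.
baseline = [100, 110, 95, 105, 90]

print(is_traffic_anomalous(baseline, 104))  # within normal variation
print(is_traffic_anomalous(baseline, 400))  # far outside the baseline
```

A flag from a check like this is only a prompt for a human to decide whether a true incident is under way; it does not replace the responder's judgment.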
Once the incident has been detected and the event elevated to senior management or the incident response team, containment actions almost immediately follow.
First, however, there is an initial analysis, where the team must determine the who, what, where, when, why, and how of the incident:
- What has happened (what is the nature of the incident)?
- To which part of the infrastructure has it happened (where did it happen)?
- What is the timeframe (start time, duration so far, and so on)?
- How did it happen (network attack, malware, massive file deletion, and so on)?
- Why did it happen (motivation of the attacker, failure of security controls, and so on)?
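The initial analysis questions above can be captured in a simple record so the team can see at a glance which answers are still missing. The following is a minimal sketch; the class name `InitialAnalysis`, its field names, and the sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, fields
from datetime import datetime
from typing import Optional


@dataclass
class InitialAnalysis:
    """Hypothetical record of the initial analysis questions; None means unknown."""
    what: Optional[str] = None             # nature of the incident
    where: Optional[str] = None            # affected part of the infrastructure
    started_at: Optional[datetime] = None  # best-known start time
    how: Optional[str] = None              # attack method or failure mode
    why: Optional[str] = None              # motivation or failed control

    def open_questions(self):
        """List the questions that still have no answer."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Early in a response, only some answers are known.
record = InitialAnalysis(what="ransom note on file server", where="finance segment")
print(record.open_questions())  # ['started_at', 'how', 'why']
```

Keeping unanswered questions explicit makes it easier to revisit them once the incident has been contained and a full investigation is possible.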
Most of these questions will not have immediate answers, or will have only incomplete ones, until the incident has been contained and a thorough investigation and analysis have occurred. However, the team should try to gather as much information as possible going into the response so they can better contain the incident, preserve evidence, and understand what has occurred. In the next few sections, we’ll discuss essential issues with incident detection and analysis.
Characteristics Contributing to Severity Level Classification
Once the team has determined that there is an incident that warrants a response, there must be an effort to determine the seriousness of the incident. As part of incident response planning, the team should have developed a severity scale that expresses the level of severity of the incident and the corresponding response. This should already have been approved by management in the plan, along with the measured responses for each level.
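A severity scale with a measured response for each level can be sketched as a small lookup. Everything in this example is hypothetical: the four levels, the response text, the three inputs, and the ten-host threshold are placeholders that an organization would replace with the criteria approved in its own incident response plan.

```python
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical measured responses; a real plan defines and approves its own.
MEASURED_RESPONSE = {
    Severity.LOW: "resolve at technician level and log the event",
    Severity.MEDIUM: "notify the incident response team lead",
    Severity.HIGH: "activate the incident response team",
    Severity.CRITICAL: "activate the team and notify senior management",
}


def classify(sensitive_data_at_risk, critical_system_degraded, hosts_affected):
    """Map a few example criteria to a severity level (assumed rules)."""
    if sensitive_data_at_risk and critical_system_degraded:
        return Severity.CRITICAL
    if sensitive_data_at_risk or critical_system_degraded:
        return Severity.HIGH
    if hosts_affected > 10:  # assumed threshold for a widespread incident
        return Severity.MEDIUM
    return Severity.LOW


level = classify(sensitive_data_at_risk=True,
                 critical_system_degraded=False,
                 hosts_affected=2)
print(level.name, "->", MEASURED_RESPONSE[level])
```

The value of writing the scale down, in code or in the plan itself, is that the same facts always lead to the same level of response.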
Depending on the organization, its infrastructure, and how the team is postured for the response, the criteria used to classify severity could include the following:
- The scope of impact to the organization and its assets
- The volume of data loss
- The type of data loss (level of sensitivity)
- The level of degradation or destruction of systems
- The nature of the threat or attack vector
- Any specially defined criteria in the incident response plan
Downtime
System or business process downtime would be a consideration in determining the severity or scope of the incident. However, this may be unknown until the incident has been contained. The incident response plan should define levels of downtime associated with incident severity. The organization should have a measured response for each of those levels, and this part of the plan should derive, to an extent, from the business continuity plan. You could develop a maximum tolerable downtime (MTD) measurement not only for each system but also for the overall organization and its business processes. This measurement should align with the RTO in the business continuity and disaster recovery plans as well.
Recovery Time
Beyond the maximum tolerable downtime, recovery time measurements should also be included in the plans and would contribute to how severe the incident is considered. Again, interfacing with the business continuity and disaster recovery plans for these items just makes sense. Levels set for recovery time objective (RTO) and recovery point objective (RPO) should be considered when determining the severity of an incident.
Exam tip: Remember that the recovery time objective (RTO) is the maximum amount of time the organization can tolerate a system or process being down before it must be restored; it should fall within the maximum tolerable downtime (MTD), beyond which losses to the business become catastrophic. The recovery point objective (RPO) is the maximum amount of data (measured by time) that the organization can stand to lose. These two key measurements can be per system or even per business process.
Organizational management sets the RTO and RPO.
Integrity
Remember that integrity is one of the three goals of security, along with confidentiality and availability. In this case, data integrity refers to how much data (by volume or by defined sensitivity level) is determined to be exposed to unauthorized modification. In addition to a data breach, where data is exfiltrated or disclosed to unauthorized entities, another concern is the modification or destruction of data, which commonly occurs during an attack. Data sensitivity is probably the more important of the two measures, since a large volume of affected data doesn’t necessarily mean that the data is critical to the organization; it can simply be an indicator of a serious attack. In any case, any indication that sensitive data is at risk should be sufficient reason for a credible response.
Economic Impact
It should go without saying that the economic impact of an incident should be a crucial factor in determining whether a response is warranted, and how large that response should be. Unfortunately, in the initial moments of the incident, the economic impact likely can’t be measured. However, if systems that support critical business processes are seriously compromised, or critical data is lost, it’s highly likely that the economic impact will be significant to the organization. During the initial stages, based on the information the response team has at hand, senior managers must start to measure economic impact in terms of downtime of critical systems, loss of the ability to conduct business, and loss of critical information assets. Hopefully, the organization has already conducted a business impact analysis (BIA) during its business continuity and disaster recovery planning, so this information should be readily available. The organization can use criticality information from a BIA to estimate what the financial impact will be, which in turn may drive the level of response.
Remember that it is not simply a matter of estimating the replacement cost of servers; it’s the overall financial impact to the organization if the servers supporting critical business processes that generate revenue are lost, especially over long periods of time.
Caution: If your organization has not already performed a BIA, that should be one of the top priorities it has as part of its overall risk management program. Without a BIA, the organization will have no way of knowing what its critical processes are, what its critical assets are, and how to prioritize their protection and recovery in the event of a disaster or other incident.
System Process Criticality
Following the discussion on economic impact, system and business process criticality is important in determining the level of response. In that same BIA, the organization should already have determined its critical processes that support its business, the systems and data that support those critical processes, and what the impact would be if either were partially or completely lost for an extended period. Once that has happened, the organization should be able to plan on which critical processes and systems to prioritize for response and recovery should an incident occur.
Note: You should already have guessed by now that incident response, business continuity, disaster recovery, and risk management are all interrelated parts of the same process, although they approach it from different directions. The goal of all these things is to preserve the critical assets that support processes that keep the business operational.
Reverse Engineering
Reverse engineering is an important part of the analytical process when it comes to gauging the intensity and severity of an incident, and later determining its root causes.
We’ve already discussed reverse engineering in a previous module, but remember that reverse engineering essentially means disassembling or decompiling unknown executables down to their basic assembly or machine language level to determine their function, operating characteristics, and potential malicious behaviors. On a higher level, reverse engineering can also mean starting from a known condition, such as the current state of the incident, and working backward to determine how the incident occurred, what attack methods may have been used, and how it affected the infrastructure. In either case, remember that reverse engineering is a critical part of the incident analysis.
Data Correlation
Data correlation essentially takes disparate types of data from a wide variety of different sources and consolidates them for analysis, looking for commonalities and connection points in the different data that can help develop and verify hypotheses as to how the attack occurred. You can correlate data using a variety of methods, but one critical way would be to use the data from a SIEM system, assuming you have one set up that ingests data from a variety of sources. These sources would include log data from systems, data from physical security systems, threat feeds, and so on. Some data doesn’t necessarily lend itself to automated correlation, so you may have to gather that type of data (qualitative or subjective data) manually and pair it with electronic data to discover how an incident may have happened, as well as its timeline. An example of this type of qualitative data might be interviews with your users or system administrators to determine what they observed on the system in question. Yet another type of data might be documentary evidence from previous incidents, both within your organization and others, that discusses attack vectors and how they materialized, what vulnerabilities they took advantage of, and so on.
Correlating these types of qualitative information with the more factual, quantitative information gathered from your electronic sources and security devices may help you develop answers to the who, what, where, when, why, and how questions that you are asking throughout the response.
Containment
Containment is the most immediate and highest priority for incident response. During this phase, the incident response team is focused on halting or minimizing any damage to systems or data and stopping an attack. Depending on the nature of the incident, there are several ways to contain it. If the incident involves a network attack, someone may have to decide to shut off the external connection to the Internet; if it is malware, then the decision may have to be made to remove or isolate certain hosts or services. Other types of incidents will require other strategies. The key point here is to stop an incident and minimize its impact. Segmentation and isolation are two key containment strategies that apply to most types of incidents, and they are the two we will discuss here.
Segmentation
Segmentation means that the infrastructure is broken up into different parts, such as a network into subnetworks or segments. This should be part of a good network design even before the incident. From a performance perspective, segmentation can reduce unnecessary network traffic and congestion by limiting the size of broadcast and collision domains. From a security perspective, segmentation can prevent hosts and different segments from communicating directly with each other. Segmentation can be achieved through either physical or logical means; networks can be physically separated through different cables or devices, and logically separated using virtual LANs, for example, or even through encryption methods. During an incident, networks could be further segmented to contain outbreaks of malware or malicious traffic.
Further segmentation can help confine the effects of the traffic or malware to specific parts of the network. Part of the incident response plan should include procedures for rapidly reconfiguring network devices to logically or physically separate critical segments, and these procedures could be executed as soon as the team determines that segmentation beyond what is normally found on the network would assist in limiting the effects of the attack.
Isolation
Isolation, like segmentation, means separating parts of the network, or even individual hosts, from the rest of the infrastructure. However, isolation goes one step further in that it ensures that a particular network segment or host cannot communicate at all with any other part of the network or any other host. Isolation should be performed to quarantine a host infected with malware, so the malware won’t spread, or even to segregate a specific network segment where one of the hosts is broadcasting malicious traffic. You may want to isolate either the host or the segment, versus simply shutting them down, to further gather data and analyze the attack without exposing other hosts or network segments to the attack. Removal goes one step further still by not only physically and logically isolating a segment or host but actually shutting it down to prevent further damage to any data it contains.
Eradication and Recovery
Once an incident is contained and there is significantly less risk of any malware or malicious traffic spreading across the network, the team must focus on eradicating the source of the attack and recovering systems and data. This can be a multistep process, and analysis is still going on during this phase of the incident response. The team must locate the source of the attack, such as malware, and completely get rid of it so that it does not continue to spread. Recovery means restoring the infrastructure to its original pre-incident state as much as possible.
In the next few sections, we will discuss different activities involved in eradication and recovery, as well as any potential issues you might encounter.
Note: Even as you contain the incident, eradicate the problem, and recover from it, you should always take care to collect and protect any evidence necessary for post-incident analysis and investigation. Even during time-critical incident response, make the time to collect the evidence before it is inadvertently destroyed during the response.
Vulnerability Mitigation
If the incident has been caused by a system vulnerability of any kind, the eradication strategy is to eliminate that vulnerability. This could include getting rid of malware, securing the configuration of the system, patching systems, and so on. Once you determine the cause of the incident, it’s a good idea to run vulnerability scans that focus on that specific vulnerability to determine which systems may be susceptible to attacks that target it. If time is of the essence, as it usually is during an incident, you may not have time to run a complete scan across all network segments and perform widespread mitigations. You’ll likely have to focus on the few specific vulnerabilities that led to the attack and apply the specific patches or configuration changes that will mitigate them. If the organization has not been keeping up with vulnerability mitigation, a full remediation is going to take a long time, since the organization would have to update configurations and patches for many different vulnerabilities before it even gets to the one or two that caused the attack. In other words, prioritize your vulnerability mitigation for the most serious vulnerabilities that may have led to the incident. If your organization has already been keeping up with vulnerability management, this should make life much easier.
Sanitization
One key piece of eradicating the source of an attack is sanitization.
Sanitization requires that the incident response team scrub and eliminate any traces of malware from media before a device is reloaded. Sanitization is necessary to keep the incident from reoccurring or malware from spreading to other systems. Because a contaminated system is likely to be reimaged with a clean, pristine operating system and backup, it only makes sense that you wouldn’t want to reuse media that could still be contaminated with malware. There are several ways to sanitize media for reuse, including the following:
- Reformatting/repartitioning media: This is not effective for completely making files unrecoverable, but it can be somewhat effective for eliminating certain malware.
- Overwriting media (also called “wiping”): This entails writing patterns of ones and zeros all over the media, making it exceedingly difficult to impossible to recover existing data; this is the preferred method of sanitizing media.
Note that these two methods are used if you want to reuse the media, but formatting is not the most effective means of making data unrecoverable. If you do not intend to reuse the media, we will discuss methods in a later section that you can use when you intend to dispose of it.
Reconstruction/Reimaging
Once media on a device has been searched for evidence and cleaned of the malware infection (sanitized), it may be necessary to reconstruct the system using known-good recent backups, assuming they exist. Hopefully, the organization has a good backup strategy in which critical data is backed up on a periodic and frequent basis, with the frequency based on the system’s criticality. Without this, there is little hope of restoring a system to its pre-incident configuration, and there will be lost data and functionality.
If good, current backups exist, media that’s going to be reused should be reimaged from a known-good master image (often called a “gold” master image, which is the official, up-to-date, secure baseline of the system) and then have its data restored from backup. The organization’s RPO metrics should drive how often the data is backed up, because backup frequency determines how much data, in terms of time, is potentially lost in the event of a disaster.
Secure Disposal
For media that you do not intend to reuse, it is a best practice to destroy the media or at least render it unusable before you get rid of it. This would prevent any unauthorized person from reconstructing any data remaining on it.
The two most popular methods for destroying media bound for disposal are degaussing and physical destruction:
- Degaussing: With degaussing, you subject the media to a strong magnetic field to destroy any data on it, which typically also renders a magnetic drive itself unusable.
- Physical destruction: This method of media destruction can be performed by physically breaking the media using hammers, fire, destruction machinery, and so on.
Both of these methods render the media completely unreadable and therefore unable to be reused, which may be necessary for a serious attack in which you want to make sure no data is left on the media or if the media becomes physically damaged during an incident.
Patching
As mentioned earlier when discussing vulnerability mitigation, patching is critical in containing an incident and eradicating the issues that caused it. If an attacker took advantage of a vulnerability for which a patch was available but not applied, getting that patch or update installed is crucial. Because you may not know whether the attacker still has a foothold in the network, if you don’t patch the vulnerability, the incident can start all over again. In the interest of time, focus on patching the specific vulnerabilities (if known) that allowed the attacker to compromise the network in the first place. But don’t stop there. Once those are patched, continue to patch any other serious vulnerabilities that could be taken advantage of. Just because the attacker used one or two specific vulnerabilities during the attack, that doesn’t mean they won’t use additional ones to maintain their foothold once they are in the system. Prioritize patching along with tightening insecure configurations.
In the interest of time, you may not be able to test patches before they are applied, so it becomes a balancing act between delaying the response so you can test a patch that may break functionality and patching quickly to stop an incident from becoming more widespread and damaging more systems and data. Make sure you record in detail all of your actions involved with patching, because they will likely need to be reviewed post-incident by the configuration control board so that a new baseline can be established or the existing one updated.
Restoration of Permissions
One of the key factors in an attack may be that an attacker gained permissions they should not have had or was able to elevate permissions from a nonprivileged account. During your containment and eradication efforts, it’s important to review rights, permissions, and privileges on systems and objects to determine if they have been changed. This is where having a document that details who should have which permissions is helpful. Often, organizations set permissions over time but don’t necessarily record the reasons why different individuals have been given less-restrictive or elevated permissions. With a baseline document that details who should have which permissions and why, it will be easy to identify inconsistent permissions. Once these have been identified, you should make every effort to restore permissions to their original pre-incident baseline configuration.
While restoring permissions to contain an incident is your top priority, you will likely also run across instances where permissions were granted over time to people who did not need them. This is called permission creep, and it can occur as people change job positions or need permissions temporarily for specific instances and those permissions are never taken back. Make sure you record any instances of this that you find, but don’t make it your top priority to fix them during the response itself.
Just know that you will eventually have to go back and correct those permission issues to help prevent a future attack.
Reconstitution of Resources and Restoration of Capabilities and Services
Part of the recovery portion of incident response is reconstituting any resources needed to put the business back in operation. This means bringing servers back online, sharing out any network shares that were restricted during the incident response, restarting critical services, and so on. As the incident is contained, you might consider restoring critical services and capabilities first, and gradually phasing in other capabilities and services according to prioritization levels the organization (hopefully) set during its business impact analysis (BIA). A phased approach is a good idea when you can’t bring all services back online at once and you want to make sure you eliminate the possibility of reinfection by malware, or when you must first sanitize and restore a compromised network segment or host to its original state.
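The phased approach described above amounts to grouping services by a priority level and restoring one group at a time. Here is a minimal sketch; the service names and priority numbers are made-up examples standing in for whatever the organization’s BIA actually prescribes, and `restoration_phases` is a hypothetical helper.

```python
from itertools import groupby


def restoration_phases(services):
    """Group (name, priority) pairs into phases, lowest priority number first.

    Priority 1 is assumed to mean "restore first"; real values come from the BIA.
    """
    # sorted() is stable, so services that share a priority keep their input order.
    ordered = sorted(services, key=lambda item: item[1])
    return {
        priority: [name for name, _ in group]
        for priority, group in groupby(ordered, key=lambda item: item[1])
    }


# Hypothetical priorities for illustration only.
services = [
    ("e-mail", 2),
    ("authentication", 1),
    ("network printing", 3),
    ("network connectivity", 1),
]

for phase, names in restoration_phases(services).items():
    print(f"Phase {phase}: {', '.join(names)}")
```

Between phases, the team can confirm that the restored services are clean and stable before bringing the next group online.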
While there are specific restoration priorities you should prescribe within your organization, potentially these capabilities and services could include the following:
- Network communications (host and device connectivity)
- Security services (authentication, network encryption, malware detection)
- Internal communications services (VoIP, instant messaging, e-mail)
- Network file shares
- Network printer services
- Application services
You could also break these down into their component services, as determined by your BIA and your business continuity and disaster recovery plans.
Verification of Logging/Communication to Security Monitoring
Once containment, eradication, and recovery processes are complete, you will need to make sure everything is functioning the way it should. There should be a period where you retain a heightened awareness after incident recovery and use this time to verify that the incident has been effectively dealt with.
Activities that can help you do this include the following:
- Ensure audit logs are logging the appropriate activity on critical devices and resources.
- Make sure security services (such as SIEMs, IDS/IPS, firewalls, proxies, and so on) are correctly logging and monitoring activity across the infrastructure.
- Preserve any logs or artifacts generated before and during the incident as evidence.
Post-Incident Activities
Once the team has contained the incident, eradicated any malware or any other causes, and recovered systems back to the original pre-incident state, the work is still not done. There are many things to do after the incident to help analyze, investigate, document, and learn from it. The goal is to find the cause of the incident, remedy any factors that helped create it, and prevent the incident from occurring again. There’s a temptation to disband or release the team back to their normal jobs, but this should be resisted until all post-incident activities have been concluded, as their expertise is still needed.
Examples of these activities include preparation and analysis of evidence, writing reports, cleaning up after any changes to systems, and ensuring that any new data resulting from the response effort is used to prevent further incidents. We’ll discuss many of these activities in the upcoming sections.
Evidence Retention
An entire objective of this domain is devoted to incident forensics, but it’s a good idea to discuss this topic here in the current context as well. Collecting and preserving evidence is not only a function of post-incident activities but should also be performed from the very beginning of an incident when evidence is fresh, before it can be overwritten by well-intentioned remediation activities. The incident response team should be trained to identify evidence early on and make every effort to preserve it. Unless the evidence is needed immediately to determine the cause and resolution of the incident, it should be set aside for later analysis.
Key evidence artifacts during an incident could include the following:
- Log files
- Malware binaries
- Disk or file system images
- RAM contents
- Physical media
- Infected files
- Network traffic captures
The incident response team must be trained in incident forensics procedures and techniques; otherwise, evidence could be contaminated, lost, or destroyed during a well-intentioned response.
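One common way to help show later that a collected artifact has not changed is to record a cryptographic hash of it at collection time. The following Python sketch uses only the standard library; the record fields, the analyst and host names, and the sample log content are assumptions made for illustration, not a prescribed evidence format.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large images do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def evidence_record(path, collected_by, source_host):
    """Build a simple (hypothetical) collection record for one artifact."""
    return {
        "file": Path(path).name,
        "sha256": sha256_of_file(path),
        "collected_by": collected_by,
        "source_host": source_host,
        "collected_at_utc": datetime.now(timezone.utc).isoformat(),
    }


# Demonstration on a throwaway file standing in for a collected log.
with tempfile.TemporaryDirectory() as workdir:
    sample = Path(workdir) / "auth.log"
    sample.write_bytes(b"Failed password for root from 203.0.113.7\n")
    record = evidence_record(sample, collected_by="analyst1", source_host="web01")
    # Re-hashing later and comparing detects any change to the artifact.
    verified = sha256_of_file(sample) == record["sha256"]
    print(json.dumps(record, indent=2))
    print("hash verified:", verified)
```

A hash by itself does not establish chain of custody; it complements the documentation of who handled the artifact and when.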
To ensure the best chance of identifying, collecting, preserving, and analyzing evidence, the team should focus on the following:
- Identifying potential evidence artifacts early in the response
- Using approved forensics techniques and tools
- Ensuring evidence is captured and preserved before media sanitization
- Documenting all aspects of evidence collection, including device names, IP addresses, timestamps, hardware, and so on
- Maintaining the chain of custody throughout the evidence collection process
- Analyzing evidence using only forensically sound copies, rather than original media
- Ensuring evidence is authenticated as a true and accurate representation of the artifacts discovered on devices and storage media
Lessons Learned Report
As part of the incident response report, discussed later, the incident response team should develop a “lessons learned” document that details what major lessons were learned from the response and how this information may be used to improve the response for future incidents, as well as prevent similar incidents.
The lessons learned report should include the following:
- Information regarding the effectiveness of incident detection, notification, and team assembly
- Any identified shortfalls of personnel, equipment, training, or supplies
- Information on any identified vulnerabilities or control failures that helped to cause or worsen the incident
- Possible solutions for any incident response effort failures
- Suggestions for improving the incident response plan and effort
Additionally, the lessons learned document should not attempt to point fingers at anyone who may be to blame for any failures or shortfalls in the response effort. Rather, it should suggest who may be in a better position to address some of these deficiencies.
Change Control Process
For an organization that has a formal change control process, rapidly responding to changes in configuration for systems affected by the response may not be difficult. As part of the organization’s change and configuration management plan, the incident response should have already been included in those documents and addressed as emergency change procedures. If this is the case, then those emergency change procedures should have been followed, which may have included quick testing of any critical patches or configuration changes, rapid and efficient review and approval by a senior manager, decisive implementation, and documentation. After the incident is over, the changes should be better documented and vetted by senior managers for consideration as temporary changes or for permanent adoption into the system baseline. Communication is important here; members of the team in charge of rapidly changing configurations to contain an incident should be in constant communication with senior managers who both understand and can approve these changes to mitigate the incident. If some changes must be considered provisional, then the full change control board should meet post-incident to consider them as permanent or to weigh the possibility of having to further change the baseline for a more permanent solution. In any event, changes should not be made off the cuff and without consideration of the long-term effects on the organization’s baseline.
Incident Response Plan Update
During any type of incident response exercise or during an actual incident, issues will arise that possibly weren’t considered when the incident response plan was originally written, or the environment may have changed to the point that the plan has to be updated. Even without incident response exercises or actual incidents, the plan should be updated periodically to ensure it is current and keeps up with the changing technologies and environments the organization works in. Any changes that are identified during an exercise or an actual incident should be seriously considered to improve both the plan and the response effort for future incidents.
This may include items such as the following:
- Changing the team composition
- Reevaluating team member roles and responsibilities
- Determining any additional training needed by team members
- Increasing or decreasing supplies or equipment
- Changing the type or frequency of exercises or tests
- Changing incident response procedures
The plan should be reviewed and updated post-incident by the members of the incident response team, senior management, and other stakeholders. Ultimately, senior management must approve the new plan, and team members must be trained on its changes.
Incident Summary Report
After the incident, there must be an overall report submitted to management that describes the incident response effort. Writing the report should be a joint responsibility between the incident response team lead, the senior manager in charge of the response, incident response team members, and any other stakeholders whose input is deemed critical to the report.
All available information about the incident should be collected, organized, and summarized in the report. The report should:
- Be concise, clear, and properly formatted for the organization
- Include an executive summary
- Fully describe the details of the incident in chronological and topical order
- Explain the results of the investigation and root causes of the incident
- Describe any recommendations or mitigations that could prevent or lessen the severity of a future incident
- Offer lessons learned regarding the incident or its response
- Reach a definitive conclusion about the incident
When writing the incident summary report, team members should keep in mind the technical level of their audience; the report is primarily intended for managers, so it should be written in understandable language. Any technical details required for the report should be included as attachments or appendices. In addition to the incident response team review, the report should be reviewed by other stakeholders before it is presented formally to management. The stakeholders could include the legal department, human resources, and IT. Any inputs from any external agencies, such as law enforcement, should also be considered for inclusion.
Indicators of Compromise (IOCs) Generation
As part of the post-incident response, the response team should generate and recommend a new list of indicators of compromise to the IT and cybersecurity teams for the organization. These IOCs will assist the respective teams in detecting future incidents by providing them with additional information that could help them identify any early warning signs of the incident. These IOCs can include specific log file entries, file system changes, network traffic anomalies, malware signatures, and so on. This is an important part of preventing future incidents or at least mitigating their severity.
Monitoring
Monitoring is also an important part of detecting and mitigating future incidents. Given the lessons learned during an incident or an exercise, the response team should be able to provide advice and IOCs to the cybersecurity and IT teams that will help them fine-tune their monitoring process and watch for traffic and other events that might indicate similar incidents. This should also serve to help the organization monitor events that they may not have been previously monitoring. Changes to the firewall ruleset, intrusion detection signatures, and SIEM event analysis, correlation, and rules may be necessary based on what was discovered during the incident.
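To make the idea of feeding IOCs into monitoring concrete, here is a minimal Python sketch that checks log lines against a short indicator list. Everything in it is hypothetical: the indicators use documentation-reserved addresses and names, the log format is invented, and `scan_lines` is an illustrative helper rather than a feature of any SIEM or IDS product.

```python
# Hypothetical indicators produced by a post-incident review.
IOCS = {
    "203.0.113.7": "known attacker IP",
    "bad.example.test": "command-and-control domain",
    "invoice_update.exe": "dropper file name",
}


def scan_lines(lines, iocs):
    """Return (line_number, indicator, description) for each indicator found."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        for indicator, description in iocs.items():
            # Plain substring match, case-insensitive.
            if indicator.lower() in lowered:
                hits.append((number, indicator, description))
    return hits


log_lines = [
    "2024-01-05T10:00:01Z ACCEPT src=198.51.100.20 dst=10.0.0.5",
    "2024-01-05T10:00:07Z ACCEPT src=203.0.113.7 dst=10.0.0.5",
    "2024-01-05T10:01:13Z DNS query bad.example.test from 10.0.0.5",
]

hits = scan_lines(log_lines, IOCS)
for number, indicator, description in hits:
    print(f"line {number}: {indicator} ({description})")
```

Substring matching is deliberately crude; for example, it would also flag `203.0.113.70`. Production detection rules normally parse fields and match them exactly, which is the kind of tuning the monitoring teams would apply.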
REVIEW
Objective 4.2: Given a scenario, apply the appropriate incident response procedure
In this module, we discussed a wide range of incident response procedures that are necessary to detect, respond to, contain, eradicate, and recover from an incident. The importance of preparation can’t be overstated since the amount of preparation an organization undertakes directly contributes to its effectiveness in responding to an incident. Preparation includes training qualified personnel for the different roles and responsibilities on the response team, as well as testing a well-written incident response plan and updating that plan when necessary. In addition to the incident response plan, all procedures and processes should be documented as much as possible. Early detection of an incident is critical to an effective response, and several quick decisions must be made early in the incident regarding how severe the incident could be and what its impact could be to the organization. Some of the factors in determining the severity or scope of an incident could include potential system downtime, how long the organization can afford to be offline from an incident (also known as the maximum tolerable downtime, or MTD), as well as data and system integrity. The economic impact to the organization due to loss or degradation of systems and data supporting critical business processes cannot be overstated. Loss of systems or data means that business processes will not function, which in turn means that the organization will be impacted by the loss of revenue, profit, and market standing. This directly relates to how critical systems and data are to business processes, and these are usually ascertained through a business impact analysis (BIA). Incident analysis isn’t just confined to post-incident tasks; the incident response team must continually analyze data during the event to determine its severity, scope, cause, and how the team must respond to it.
Activities associated with incident analysis include reverse engineering of malware or higher-level events as well as data correlation from disparate sources, both quantitative (for example, electronic files such as logs) and qualitative (for example, interviews with critical systems personnel, documentation review, and systems observations). Containment is the most critical part of incident response; unless the incident is contained, malware could spread throughout the entire infrastructure, data could be compromised, modified, or destroyed, and systems could be rendered unusable. Two important parts of containment are segmenting networks and hosts to prevent the spread of malware or malicious traffic and isolating or removing hosts from the network so that they will not compromise other parts of the network. Eradicating malware or any other cause of the incident is the first goal once the incident is contained, followed by recovering systems and data and restoring services. To eradicate the issues that caused the incident, the response team must mitigate any vulnerabilities believed to have caused the incident, sanitize any affected media, and rebuild data through the reinstallation and reimaging of media. Any media that is not intended for reuse or can’t be reused should be disposed of securely, taking care to destroy any remnants of data through wiping or encryption. Another important part of eradication is patching systems that may have been compromised to mitigate the vulnerabilities that may have caused the incident. Investigating and restoring any permission issues to a secure state are also critical steps to prevent a malicious actor from using elevated permissions to continue the attack. Finally, after the source of the incident has been eliminated, the team must ensure that resources and services are restored. These can include network connectivity services, file shares, printer shares, and applications.
Additionally, security services must be verified as fully operational, to include appropriate logging and auditing capabilities. The activities the organization performs after an incident are equally important. These include collecting and preserving evidence and communicating any lessons learned about the incident or the response to management. Any changes that have been made to the baseline of affected systems should be documented and either approved as permanent changes to the baseline or considered temporary, with additional mitigations put in place. The incident response plan will almost certainly change after an exercise or an incident and must be updated to reflect lessons learned or additional information gained during the incident. The team members should also recommend any new monitoring strategies or indicators of compromise to the cybersecurity and IT teams. Finally, the incident report should be written, vetted, and submitted to management. It should be clear and concise, and detail all the different aspects of the incident and its response.