# Operations
<a name="operations"></a>

Operations is the core of incident response: it is where the actions of responding to and remediating security incidents occur. Operations includes the following five phases: *detection*, *analysis*, *containment*, *eradication*, and *recovery*. Table 3 describes these phases and their goals.

*Table 3 – Operations phases*


|  Phase  |  Goal  | 
| --- | --- | 
|  Detection  |  Identify a potential security event.  | 
|  Analysis  |  Determine whether the security event is an incident and assess the scope of the incident.  | 
|  Containment  |  Minimize and limit the scope of the security event.  | 
|  Eradication  |  Remove unauthorized resources or artifacts related to the security event. Implement mitigations for the issues that caused the security incident.  | 
|  Recovery  |  Restore systems to a known safe state and monitor these systems to verify that the threat does not return.  | 

Use these phases as guidance to respond to security incidents effectively and robustly. The actual actions you take will vary depending on the incident. For example, an incident involving ransomware will have a different set of response steps than an incident involving a public Amazon S3 bucket. Additionally, these phases do not necessarily happen sequentially. After containment and eradication, you might need to return to analysis to understand whether your actions were effective.

# Detection
<a name="detection"></a>

 An alert is the main component of the detection phase. It generates a notification to initiate the incident response process based on AWS account activity of interest. 

Accurate alerting is challenging. It's not always possible to determine with complete certainty whether an incident has occurred, is in progress, or will happen in the future. Here are a few reasons:
+  Detection mechanisms are based on baseline deviation, known patterns, and notification from internal or external entities. 
+  Because technology and people (respectively, *the means* and *the actors* of security incidents) are unpredictable, baselines change over time. New malicious patterns emerge through novel or modified threat actor *tactics*, *techniques*, and *procedures* (TTPs). 
+  Changes to people, technology, and processes are not immediately incorporated into the incident response process. Some are discovered during the progress of an investigation. 

# Alert sources
<a name="alert-sources"></a>

 You should consider using the following sources to define alerts: 
+ **Findings** – AWS services such as [Amazon GuardDuty](https://aws.amazon.com/guardduty/), [AWS Security Hub CSPM](https://aws.amazon.com/security-hub/), [Amazon Macie](https://aws.amazon.com/macie/), [Amazon Inspector](https://aws.amazon.com/inspector/), [AWS Config](https://aws.amazon.com/config/), [IAM Access Analyzer](https://docs.aws.amazon.com/IAM/latest/UserGuide/what-is-access-analyzer.html), and [Network Access Analyzer](https://docs.aws.amazon.com/vpc/latest/network-access-analyzer/what-is-vaa.html) generate findings that can be used to craft alerts.
+ **Logs** – AWS service, infrastructure, and application logs stored in Amazon S3 buckets and CloudWatch log groups can be parsed and correlated to generate alerts. 
+ **Billing activity** – A sudden change in billing activity can indicate a security event. To monitor for this, follow the documentation on [Creating a billing alarm to monitor your estimated AWS charges](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/monitor_estimated_charges_with_cloudwatch.html). 
+ **Cyber threat intelligence** – If you subscribe to a third-party cyber threat intelligence feed, you can correlate that information with other logging and monitoring tools to identify potential indicators of events. 
+ **Partner tools** – Partners in the AWS Partner Network (APN) offer products that can help you meet your security objectives. For incident response, partner products with endpoint detection and response (EDR) or security information and event management (SIEM) capabilities can help support your incident response objectives. For more information, see [Security Partner Solutions](https://aws.amazon.com/security/partner-solutions/) and [Security Solutions in the AWS Marketplace](https://aws.amazon.com/marketplace/solutions/security). 
+ **AWS trust and safety** – AWS Support might contact customers if AWS identifies abusive or malicious activity.
+ **One-time contact** – Your customers, developers, or other staff might be the ones who notice something unusual. For that reason, it's important to have a well-known, well-publicized method of contacting your security team. Popular choices include ticketing systems, contact email addresses, and web forms. If your organization works with the general public, you might also need a public-facing security contact mechanism. 
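The billing activity source can be sketched as code. The following Python builds the keyword arguments for the CloudWatch `PutMetricAlarm` call on the `EstimatedCharges` metric in the `AWS/Billing` namespace. This is a minimal sketch that assumes two things: billing alerts are enabled for the account, and the alarm is created in US East (N. Virginia), where billing metric data is stored. The function name, alarm name, threshold, and SNS topic ARN are hypothetical placeholders. Because the function only returns a dictionary, you can review the parameters before sending them.

```python
from typing import Any, Dict, List


def billing_alarm_params(threshold_usd: float, sns_topic_arn: str,
                         alarm_name: str = "estimated-charges-spike") -> Dict[str, Any]:
    """Build PutMetricAlarm parameters for the AWS/Billing EstimatedCharges metric.

    Hypothetical helper: the default alarm name and the 6-hour period are
    illustrative choices, not required values.
    """
    if threshold_usd <= 0:
        raise ValueError("threshold_usd must be positive")
    dimensions: List[Dict[str, str]] = [{"Name": "Currency", "Value": "USD"}]
    return {
        "AlarmName": alarm_name,
        "AlarmDescription": "Estimated charges exceeded the expected baseline",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": dimensions,
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours, in seconds
        "EvaluationPeriods": 1,
        "Threshold": threshold_usd,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # for example, the security team's SNS topic
    }
```

With boto3 installed and credentials configured, you could pass the result to `boto3.client("cloudwatch", region_name="us-east-1").put_metric_alarm(**params)`.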

 For more information about cloud capabilities that you can use during your investigations, refer to [Appendix A: Cloud capability definitions](appendix-a-cloud-capability-definitions.md) in this document. 

# Detection as part of security control engineering
<a name="detection-as-security-control-engineering"></a>

Detection mechanisms are an integral part of security control development. As *directive* and *preventative* controls are defined, related *detective* and *responsive* controls should be built. Consider an organization that establishes a directive control stating that the root user of an AWS account should only be used for specific, well-defined activities. It pairs this with a preventative control implemented by using an AWS Organizations service control policy (SCP). If root user activity beyond the expected baseline occurs, a detective control implemented with an Amazon EventBridge rule and an Amazon SNS topic alerts the security operations center (SOC). The responsive control consists of the SOC selecting the appropriate playbook, performing analysis, and working until the incident is resolved. 
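The detective control in this example could use an EventBridge rule whose event pattern matches CloudTrail events where `userIdentity.type` is `Root`. The sketch below shows such a pattern. To make the matching logic explicit, it also includes `matches`, a hypothetical, deliberately simplified local matcher for testing the pattern. `matches` is not an AWS API. It handles only lists of exact values, while real EventBridge patterns support additional operators. The SNS target and the rule name are out of scope here.

```python
import json
from typing import Any, Dict

# Event pattern for an EventBridge rule that fires on root user activity.
ROOT_ACTIVITY_PATTERN: Dict[str, Any] = {
    "detail-type": ["AWS API Call via CloudTrail", "AWS Console Sign In via CloudTrail"],
    "detail": {"userIdentity": {"type": ["Root"]}},
}


def event_pattern_json() -> str:
    """Serialize the pattern for the EventPattern parameter of EventBridge PutRule."""
    return json.dumps(ROOT_ACTIVITY_PATTERN)


def matches(pattern: Dict[str, Any], event: Dict[str, Any]) -> bool:
    """Hypothetical simplified matcher: each leaf is a list of allowed exact values.

    Keys in the event that the pattern does not mention are ignored, as in
    EventBridge; prefix, numeric, exists, and other operators are not supported.
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```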

Security controls are best defined by threat modeling the workloads running in AWS. The criticality of detective controls is set based on the business impact analysis (BIA) for the particular workload. Alerts generated by detective controls are not handled in the order they arrive. Instead, they are handled according to their initial criticality, which can be adjusted during analysis. The initial criticality is an aid for prioritization; the context in which the alert occurred determines its true criticality. For example, an organization uses Amazon GuardDuty as a component of the detective control for EC2 instances that are part of a workload. GuardDuty generates the finding `Impact:EC2/SuspiciousDomainRequest.Reputation`, informing you that the listed Amazon EC2 instance within your workload is querying a domain name that is suspected of being malicious. This finding has low severity by default. As the analysis phase progresses, however, the team determines that an unauthorized actor has deployed several hundred EC2 instances of type `p4d.24xlarge`, significantly increasing the organization's operating cost. At this point, the incident response team decides to raise the criticality of this alert to *high*, increasing the sense of urgency and expediting further actions. Note that the severity of the GuardDuty finding itself cannot be changed; instead, the organization adjusts the criticality of its own alert based on the finding. 
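The following is a minimal sketch of keeping the finding severity and the alert criticality separate. The organization's alert stores the finding's severity unchanged, derives an initial criticality from it, and records any analyst override together with a reason. The sketch assumes GuardDuty's documented numeric bands (below 4.0 is low, 4.0 to 6.9 is medium) and folds everything from 7.0 upward into high. The `Alert` class, its fields, and the sample severity value of 2.0 are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def initial_criticality(guardduty_severity: float) -> str:
    """Map a GuardDuty numeric severity to an initial alert criticality (assumed bands)."""
    if guardduty_severity < 4.0:
        return "low"
    if guardduty_severity < 7.0:
        return "medium"
    return "high"  # this sketch folds all higher bands into "high"


@dataclass
class Alert:
    """Hypothetical organization-side alert wrapping a GuardDuty finding."""
    finding_type: str
    finding_severity: float  # copied from the finding and never modified
    criticality: str = ""
    history: List[Tuple[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.criticality:
            self.criticality = initial_criticality(self.finding_severity)

    def adjust(self, new_criticality: str, reason: str) -> None:
        """Record the previous criticality and the reason, then apply the change."""
        self.history.append((self.criticality, reason))
        self.criticality = new_criticality
```

For the scenario above, an analyst might call `alert.adjust("high", "hundreds of unauthorized p4d.24xlarge instances")`. The call leaves `finding_severity` untouched.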

# Detective control implementations
<a name="detective-control-implementations"></a>

It is important to understand how a detective control is implemented, because the implementation helps determine how its alert should be used for a particular event. There are two main types of technical detective controls: 
+  **Behavioral detection** relies on mathematical models commonly referred to as machine learning (ML) or artificial intelligence (AI). The detection is made by inference; therefore, an alert might not reflect an actual event. 
+  **Rule-based detection** is deterministic. Customers can set the exact parameters of the activity to alert on, and an alert is generated whenever those parameters are matched. 

Modern detective systems, such as intrusion detection systems (IDS), generally include both mechanisms. The following are examples of behavioral and rule-based detections in GuardDuty. 
+  When the finding `Exfiltration:IAMUser/AnomalousBehavior` is generated, it informs you that “an anomalous API request was observed in your account.” The documentation further explains that “The ML model evaluates all API requests in your account and identifies anomalous events that are associated with techniques used by adversaries,” which indicates that this finding is behavioral in nature. 
+  For the finding `Impact:S3/MaliciousIPCaller`, GuardDuty analyzes Amazon S3 API calls recorded in CloudTrail. It compares the `sourceIPAddress` log element with a table of public IP addresses that includes threat intelligence feeds, and when it finds a direct match to an entry, it generates the finding. 

 We recommend implementing a mix of both behavioral and rule-based alerting because it is not always possible to implement rule-based alerting for every activity within your threat model. 
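The following sketch illustrates the deterministic nature of rule-based detection. It mirrors the idea behind `Impact:S3/MaliciousIPCaller` but is not GuardDuty's actual implementation. It checks the `sourceIPAddress` field of Amazon S3 CloudTrail records against a threat list of IP addresses and CIDR ranges. The function names, threat list entries, and sample records are hypothetical. Records whose `sourceIPAddress` is not an IP literal, such as an AWS service name, are skipped.

```python
import ipaddress
from typing import Any, Dict, Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def load_threat_list(entries: Iterable[str]) -> List[Network]:
    """Parse IPs or CIDR ranges; a bare IP becomes a /32 (IPv4) or /128 (IPv6) network."""
    return [ipaddress.ip_network(e.strip(), strict=False) for e in entries if e.strip()]


def malicious_s3_callers(records: Iterable[Dict[str, Any]],
                         threat_list: List[Network]) -> List[Dict[str, Any]]:
    """Return S3 CloudTrail records whose sourceIPAddress falls within the threat list."""
    hits = []
    for record in records:
        if record.get("eventSource") != "s3.amazonaws.com":
            continue
        try:
            ip = ipaddress.ip_address(record.get("sourceIPAddress", ""))
        except ValueError:
            continue  # not an IP literal, for example "AWS Internal"
        # Membership tests across IPv4/IPv6 simply return False.
        if any(ip in net for net in threat_list):
            hits.append(record)
    return hits
```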

# People-based detection
<a name="people-based-detection"></a>

Up to this point, we have discussed technology-based detection. The other important source of detection is people inside or outside the customer's organization. *Insiders* include employees and contractors. *Outsiders* include entities such as security researchers, law enforcement, news media, and social media. 

 Though technology-based detection can be systematically configured, people-based detection comes in a variety of forms such as emails, tickets, mail, news posts, telephone calls, and in-person interactions. Technology-based detection notifications can be expected to be delivered in near real-time, but there are no timeline expectations for people-based detection. It is imperative that the security culture incorporates, facilitates, and empowers people-based detection mechanisms for a defense-in-depth approach to security. 

# Summary
<a name="detection-summary"></a>

For detection, it's important to have a mix of rule-based and behavior-driven alerting. Additionally, you should have mechanisms in place for people both inside and outside your organization to submit a ticket about a security issue. Humans can be one of the most valuable sources of security events, so it's important to have processes in place for people to escalate concerns. Use threat models of your environment to get started with building detections; they help you build alerts for the threats that are most relevant to your environment. Lastly, use frameworks such as MITRE ATT&CK to understand threat actor tactics, techniques, and procedures (TTPs). The MITRE ATT&CK framework can serve as a common language across your various detection mechanisms. 

# Analysis
<a name="analysis"></a>

 Logs, query capabilities, and threat intelligence are a few of the supporting components required by the analysis phase. Many of the same logs used for detection are also used for analysis and will require onboarding and configuration of querying tools. 

# Validate, scope, and assess impact of alert
<a name="validate-scope-assess-alert-impact"></a>

During the analysis phase, comprehensive log analysis is performed with three goals: validating alerts, defining scope, and assessing the impact of the possible compromise. 
+  *Validation* of the alert is the entry point of the analysis phase. Incident responders look for log entries from various sources and engage directly with owners of the affected workload. 
+  *Scoping* is the next step. All resources involved are inventoried, and alert criticality is adjusted after stakeholders agree that the alert is unlikely to be a false positive. 
+  Finally, *impact analysis* details the actual business disruption. 

Once the affected workload components are identified, scoping results can be correlated with the workload's recovery point objective (RPO) and recovery time objective (RTO) to adjust alert criticality. The adjusted criticality then drives resource allocation and all subsequent activity. Not all incidents directly disrupt the operation of a workload supporting a business process. Incidents such as sensitive data disclosure, intellectual property theft, or resource hijacking (such as cryptocurrency mining) might not immediately stop or degrade a business process, but can result in consequences later.

# Enrich security logs and findings
<a name="enrich-security-logs-and-findings"></a>

## Enrichment with threat intelligence and organizational context
<a name="enrichment-with-threat-intelligence"></a>

During the course of analysis, observables of interest need to be enriched to provide more context for the alert. As stated in the Preparation section, integrating and using cyber threat intelligence can help you understand more about a security finding. Threat intelligence services are used to assign reputation and attribute ownership to public IP addresses, domain names, and file hashes. These tools are available as both paid and no-cost services. 

Customers who use Amazon Athena as a log querying tool can use AWS Glue jobs to load threat intelligence information as tables. These threat intelligence tables can be used in SQL queries to correlate log elements such as IP addresses and domain names, providing an enriched view of the data to be analyzed. 
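The correlation itself is an ordinary SQL join. As a local, self-contained sketch, the following code uses Python's built-in `sqlite3` module in place of Athena. Hypothetical `cloudtrail` and `threat_intel` tables stand in for a log table and a threat intelligence table loaded by AWS Glue. The table names, columns, and sample rows are illustrative only; Athena table definitions and its SQL dialect will differ. A `LEFT JOIN` keeps every log row and adds reputation data where a match exists.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE cloudtrail (event_time TEXT, event_name TEXT, source_ip TEXT, user_arn TEXT);
CREATE TABLE threat_intel (indicator TEXT, indicator_type TEXT, reputation TEXT, source TEXT);
"""

# Enrich each API call with any threat intelligence entry for its source IP.
ENRICH_QUERY = """
SELECT c.event_time, c.event_name, c.source_ip, c.user_arn, t.reputation, t.source
FROM cloudtrail AS c
LEFT JOIN threat_intel AS t
  ON t.indicator_type = 'ip' AND t.indicator = c.source_ip
ORDER BY c.event_time
"""


def enrich(conn: sqlite3.Connection) -> List[Tuple]:
    """Run the enrichment query and return all rows."""
    return conn.execute(ENRICH_QUERY).fetchall()


def demo_connection() -> sqlite3.Connection:
    """Build an in-memory database populated with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO cloudtrail VALUES (?, ?, ?, ?)", [
        ("2022-02-20T15:49:12Z", "GetObject", "198.51.100.23",
         "arn:aws:iam::111122223333:user/example"),
        ("2022-02-20T15:51:03Z", "ListBuckets", "192.0.2.10",
         "arn:aws:iam::111122223333:user/example"),
    ])
    conn.executemany("INSERT INTO threat_intel VALUES (?, ?, ?, ?)", [
        ("198.51.100.23", "ip", "malicious", "example-feed"),
        ("bad.example.com", "domain", "suspicious", "example-feed"),
    ])
    return conn
```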

AWS does not provide threat intelligence directly to customers, but services such as Amazon GuardDuty use threat intelligence for enrichment and finding generation. You can also upload custom threat lists to GuardDuty based on your own threat intelligence. 

## Enrichment with automation
<a name="enrichment-with-automation"></a>

 Automation is an integral part of AWS Cloud governance. It can be used throughout the various phases of the incident response lifecycle. 

 For the detection phase, rule-based automation matches patterns of interest from the threat model in logs and takes appropriate action, such as sending notifications. The analysis phase can leverage the detection mechanism and forward the alert body to an engine capable of querying logs and enriching observables for contextualization of the event. 

The alert body, in its most basic form, consists of a *resource* and an *identity*. For example, you could implement automation that queries CloudTrail for AWS API activity performed by the alert body's identity or resource around the time of the alert. The results provide additional insights, including the `eventSource`, `eventName`, `sourceIPAddress`, and `userAgent` of the identified API activity. By performing these queries automatically, responders can save time during triage and get additional context for better-informed decisions. 
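Here is a minimal sketch of that automation, assuming a boto3 CloudTrail client is passed in. It calls `lookup_events`, which wraps the CloudTrail `LookupEvents` API. That API accepts a single lookup attribute per call and covers management events from the last 90 days. The sketch looks up calls made by the alert's identity within a window around the alert time. It then extracts the fields mentioned above from each event's `CloudTrailEvent` JSON string. The function name, the 30-minute window, and the use of `Username` as the lookup key are assumptions; a resource-based lookup would use `ResourceName` instead.

```python
import json
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List

FIELDS = ("eventTime", "eventSource", "eventName", "sourceIPAddress", "userAgent")


def activity_around_alert(cloudtrail: Any, username: str, alert_time: datetime,
                          window: timedelta = timedelta(minutes=30)) -> List[Dict[str, Any]]:
    """Return selected fields of API calls made by `username` near `alert_time`.

    `cloudtrail` is assumed to be a boto3 CloudTrail client, or any object
    with a compatible lookup_events method.
    """
    kwargs: Dict[str, Any] = {
        "LookupAttributes": [{"AttributeKey": "Username", "AttributeValue": username}],
        "StartTime": alert_time - window,
        "EndTime": alert_time + window,
    }
    results: List[Dict[str, Any]] = []
    while True:
        page = cloudtrail.lookup_events(**kwargs)
        for event in page.get("Events", []):
            # CloudTrailEvent holds the full CloudTrail record as a JSON string.
            record = json.loads(event["CloudTrailEvent"])
            results.append({name: record.get(name) for name in FIELDS})
        token = page.get("NextToken")
        if not token:
            return results
        kwargs["NextToken"] = token  # fetch the next page of results
```

With boto3 available, a call might look like `activity_around_alert(boto3.client("cloudtrail"), "example-user", datetime(2022, 2, 20, 15, 50, tzinfo=timezone.utc))`.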

 Refer to the [How to enrich AWS Security Hub findings with account metadata](https://aws.amazon.com/blogs/security/how-to-enrich-aws-security-hub-findings-with-account-metadata/) blog post for an example on how to use automation to enrich security findings and simplify analysis. 

# Collect and analyze forensic evidence
<a name="collect-analyze-forensic-evidence"></a>

Forensics, as mentioned in the [Preparation](preparation.md) section of this document, is the process of collecting and analyzing artifacts during incident response. On AWS, it applies to two kinds of resources. The first is infrastructure domain resources, such as network traffic packet captures and operating system memory dumps. The second is service domain resources, such as AWS CloudTrail logs. 

 The forensics process has the following fundamental characteristics: 
+  **Consistent** – It follows the exact steps documented, without deviations. 
+  **Repeatable** – It produces the exact same results when repeated against the same artifact. 
+  **Customary** – It’s publicly documented and widely adopted. 

It is important to maintain chain of custody for artifacts collected during incident response. Automating the collection and automatically generating documentation of it can help, as can storing the artifacts in read-only repositories. To maintain integrity, perform analysis only on exact replicas of the collected artifacts. 
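One way to support chain of custody and integrity checks is to hash each artifact at collection time. The sketch below computes a SHA-256 digest of an artifact when it is collected and appends a custody record. Before analysis, you can compare a working copy's digest with the recorded one to confirm that the copy is an exact replica. The function names, record fields, and JSON Lines log format are assumptions, not a prescribed standard.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large disk or memory images need not fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_custody(artifact: Path, collector: str, log_path: Path) -> Dict[str, str]:
    """Append a custody entry (what, hash, who, when) to a JSON Lines log."""
    entry = {
        "artifact": str(artifact),
        "sha256": sha256_file(artifact),
        "collected_by": collector,
        "collected_at": datetime.now(timezone.utc).isoformat(),
    }
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry) + "\n")
    return entry


def is_exact_replica(copy: Path, expected_sha256: str) -> bool:
    """Check that a working copy matches the digest recorded at collection time."""
    return sha256_file(copy) == expected_sha256
```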

# Collect relevant artifacts
<a name="collect-relevant-artifacts"></a>

With these characteristics in mind, and based on the relevant alerts and your assessment of impact and scope, collect the data that is relevant to further investigation and analysis. Various types and sources of data might be relevant to an investigation. These include service/control plane logs (CloudTrail, Amazon S3 data events, VPC Flow Logs), data (Amazon S3 metadata and objects), and resources (databases, Amazon EC2 instances). 

 Service/control plane logs can be collected for local analysis or, ideally, directly queried using native AWS services (where applicable). Data (including metadata) can be directly queried to obtain relevant information or to acquire the source objects; for example, use the AWS CLI to acquire Amazon S3 bucket and object metadata and directly acquire source objects. Resources need to be collected in a manner consistent with the resource type and intended method of analysis. For example, databases can be collected by creating a copy/snapshot of the system running the database, creating a copy/snapshot of the entire database itself, or querying and extracting certain data and logs from the database relevant to the investigation. 

For Amazon EC2 instances, there is a specific set of data that should be collected, and a specific order in which to collect it, to acquire and preserve the most data for analysis and investigation. 

To acquire and preserve the most data from an Amazon EC2 instance, respond in the following order: 

1.  **Acquire instance metadata** – Acquire instance metadata relevant to the investigation and data queries (instance ID, type, IP address, VPC/subnet ID, Region, Amazon Machine Image (AMI) ID, security groups attached, launch time). 

1.  **Enable instance protections and tags** – Enable instance protections such as termination protection. Set the shutdown behavior to stop (if it is set to terminate), and disable the Delete on Termination attribute for the attached EBS volumes. Apply appropriate tags for both visual denotation and use in possible response automations. For example, when a tag with a name of `Status` and a value of `Quarantine` is applied, an automation could perform forensic acquisition of data and isolate the instance. 

1. **Acquire disk (EBS snapshots)** – Acquire an EBS snapshot of the attached EBS volumes. Each snapshot contains the information that you need to restore your data (from the moment when the snapshot was taken) to a new EBS volume. See the step to perform live response/artifact collection if you’re using instance store volumes. 

1. **Acquire memory** – Because EBS snapshots only capture data that has been written to your Amazon EBS volume, which might exclude data that is stored or cached in memory by your applications or OS, it is imperative to acquire a system memory image using an appropriate third-party open-source or commercial tool in order to acquire available data from the system. 

1. **(Optional) Perform live response/artifact collection** – Perform targeted data collection (disk/memory/logs) through live response on the system only if disk or memory is unable to be acquired otherwise, or there is a valid business or operational reason. Doing this will modify valuable system data and artifacts. 

1. **Decommission the instance** – Detach the instance from auto scaling groups, deregister the instance from load balancers, and adjust or apply a pre-built instance profile with minimized or no permissions. 

1. **Isolate or contain the instance** – Verify that the instance is effectively isolated from other systems and resources within the environment by ending current connections to and from the instance and preventing future ones. Refer to the [Containment](containment.md) section of this document for more details. 

1. **Responder’s choice** – Based on the situation and goals, select one of the following: 
   +  Decommission and shut down the system (recommended). 

      Shut the system down once the available evidence has been acquired. This provides the most effective mitigation against possible future impact to the environment from the instance. 
   +  Continue running the instance within an isolated environment instrumented for monitoring. 

      Though it is not recommended as a standard approach, if a situation merits continued observation of the instance (such as when additional data or indicators are needed to perform comprehensive investigation and analysis of the instance), you might consider shutting down the instance, creating an AMI of the instance, and re-launching the instance in your dedicated forensics account within a sandbox environment that is pre-instrumented to be completely isolated and configured with instrumentation to facilitate continuous monitoring of the instance (for example, VPC Flow Logs or VPC Traffic Mirroring). 

**Note**  
 It is essential to capture memory before live response activities or system isolation or shutdown in order to capture available volatile (and valuable) data. 
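Steps 1 through 3 of the ordered response above lend themselves to automation. The following sketch assumes a boto3 EC2 client is passed in. It does the following:
+ Records instance metadata.
+ Enables termination protection and sets the shutdown behavior to stop.
+ Disables Delete on Termination for attached EBS volumes.
+ Applies the `Status`/`Quarantine` tag.
+ Snapshots each attached EBS volume.

The function name and snapshot description are hypothetical. Memory acquisition (step 4) requires separate tooling and is not shown.

```python
from typing import Any, Dict, List


def preserve_instance(ec2: Any, instance_id: str) -> Dict[str, Any]:
    """Steps 1-3: acquire metadata, enable protections and tags, snapshot EBS volumes.

    `ec2` is assumed to be a boto3 EC2 client (or a compatible test double).
    """
    reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
    inst = reservations[0]["Instances"][0]

    # 1. Acquire instance metadata relevant to the investigation.
    metadata = {
        "InstanceId": inst["InstanceId"],
        "InstanceType": inst.get("InstanceType"),
        "PrivateIpAddress": inst.get("PrivateIpAddress"),
        "PublicIpAddress": inst.get("PublicIpAddress"),
        "VpcId": inst.get("VpcId"),
        "SubnetId": inst.get("SubnetId"),
        "AvailabilityZone": inst.get("Placement", {}).get("AvailabilityZone"),
        "ImageId": inst.get("ImageId"),
        "SecurityGroups": [g["GroupId"] for g in inst.get("SecurityGroups", [])],
        "LaunchTime": str(inst.get("LaunchTime")),
    }

    # 2. Enable protections; each ModifyInstanceAttribute call changes one attribute.
    ec2.modify_instance_attribute(InstanceId=instance_id, DisableApiTermination={"Value": True})
    ec2.modify_instance_attribute(InstanceId=instance_id,
                                  InstanceInitiatedShutdownBehavior={"Value": "stop"})
    volumes = [m for m in inst.get("BlockDeviceMappings", []) if "Ebs" in m]
    for mapping in volumes:
        ec2.modify_instance_attribute(
            InstanceId=instance_id,
            BlockDeviceMappings=[{"DeviceName": mapping["DeviceName"],
                                  "Ebs": {"DeleteOnTermination": False}}])
    ec2.create_tags(Resources=[instance_id], Tags=[{"Key": "Status", "Value": "Quarantine"}])

    # 3. Snapshot every attached EBS volume.
    snapshot_ids: List[str] = []
    for mapping in volumes:
        snap = ec2.create_snapshot(VolumeId=mapping["Ebs"]["VolumeId"],
                                   Description=f"IR evidence from {instance_id}")
        snapshot_ids.append(snap["SnapshotId"])

    return {"metadata": metadata, "snapshots": snapshot_ids}
```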

# Develop narratives
<a name="develop-narratives"></a>

During analysis and investigation, document the actions taken, analysis performed, and information identified. This record is used in subsequent phases and ultimately in a final report. These narratives should be succinct and precise, include the relevant information needed to understand the incident, and maintain an accurate timeline. They are also helpful when you engage people outside the core incident response team. Here is an example: 

*The marketing and sales department received a ransom note on March 15th, 2022, demanding payment in cryptocurrency to avoid public posting of potentially sensitive data. The SOC determined that the Amazon RDS database belonging to marketing and sales was publicly accessible on February 20th, 2022. The SOC queried RDS access logs and determined that IP address 198.51.100.23 was used on February 20th, 2022, with the credentials `mm03434` belonging to Major Mary, one of the web developers. The SOC queried VPC Flow Logs and determined that approximately 256 MB of data egressed to the same IP address on the same date (time stamp 2022-02-20T15:50:00Z). The SOC determined through open-source threat intelligence that the credentials are currently available in plain text in the public repository `https[:]//example[.]com/majormary/rds-utils`.* 

# Containment
<a name="containment"></a>

 One definition of containment, as it relates to incident response, is the process or implementation of a strategy during the handling of a security event that acts to minimize the scope of the security event and contain the effects of unauthorized usage within the environment. 

 A containment strategy depends on a myriad of factors and can be different from one organization to another in terms of application of containment tactics, timing, and purpose. The [NIST SP 800-61 Computer Security Incident Handling Guide](https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final) outlines several criteria for determining the appropriate containment strategy, which include: 
+  Potential damage to and theft of resources 
+  Need for evidence preservation 
+  Service availability (network connectivity, services provided to external parties) 
+  Time and resources needed to implement the strategy 
+  Effectiveness of the strategy (partial or full containment) 
+  Duration of the solution (emergency workaround to be removed in four hours, temporary workaround to be removed in two weeks, permanent solution) 

For services on AWS, however, the fundamental containment steps can be distilled into three categories: 
+ **Source containment** – Use filtering and routing to prevent access from a certain source. 
+ **Technique and access containment** – Remove or restrict access to prevent unauthorized use of the affected resources. 
+ **Destination containment** – Use filtering and routing to prevent access to a target resource. 

# Source containment
<a name="source-containment"></a>

 Source containment is the use and application of filtering or routing within an environment to prevent access to resources from a specific source IP address or network range. Examples of source containment using AWS services are highlighted here: 
+ **Security groups** – Creating and applying isolation security groups to Amazon EC2 instances or removing rules from an existing security group can help to contain unauthorized traffic to an Amazon EC2 instance or AWS resource. It is important to note that existing tracked connections won’t be shut down as a result of changing security groups – only future traffic will be effectively blocked by the new security group (refer to [this Incident Response Playbook](https://github.com/aws-samples/aws-customer-playbook-framework/blob/main/docs/Ransom_Response_EC2_Linux.md) and [Security group connection tracking](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/security-group-connection-tracking.html) for additional information on tracked and untracked connections). 
+ **Policies** – Amazon S3 bucket policies can be configured to block or allow traffic from an IP address, a network range, or a VPC endpoint. This makes it possible to block suspicious addresses from accessing the Amazon S3 bucket. Additional information on bucket policies can be found at [Adding a bucket policy using the Amazon S3 console](https://docs.aws.amazon.com/AmazonS3/latest/userguide/add-bucket-policy.html).
+ **AWS WAF** – Web access control lists (web ACLs) can be configured in AWS WAF to provide fine-grained control over the web requests that resources respond to. You can add an IP address or network range to an IP set configured in AWS WAF, and then create a rule that matches the IP set with a Block action. AWS WAF then blocks web requests to the resource when the originating IP address matches an address or network range in the IP set. 
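For the bucket policy approach, here is a minimal sketch of a deny statement keyed on the `aws:SourceIp` condition key, plus a hypothetical helper that merges it into an existing policy. The bucket name, CIDR ranges, and `Sid` are hypothetical. Two caveats apply:
+ `aws:SourceIp` is not present on requests that arrive through a VPC endpoint; those requests carry `aws:VpcSourceIp` instead.
+ `PutBucketPolicy` replaces the entire policy. The new statement should therefore be merged into the existing one rather than written on its own.

```python
import json
from typing import Any, Dict, List


def deny_source_ips_statement(bucket: str, cidrs: List[str],
                              sid: str = "IRDenySuspiciousSourceIps") -> Dict[str, Any]:
    """Build a statement denying all S3 actions on the bucket from the given ranges."""
    return {
        "Sid": sid,
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
        "Condition": {"IpAddress": {"aws:SourceIp": cidrs}},
    }


def add_statement(existing_policy: str, statement: Dict[str, Any]) -> str:
    """Merge the statement into an existing policy JSON, replacing any same-Sid statement."""
    if existing_policy:
        policy = json.loads(existing_policy)
    else:
        policy = {"Version": "2012-10-17", "Statement": []}
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a policy may contain a single statement object
        statements = [statements]
    statements = [s for s in statements if s.get("Sid") != statement["Sid"]]
    statements.append(statement)
    policy["Statement"] = statements
    return json.dumps(policy)
```

With boto3, the current policy string comes from `get_bucket_policy(Bucket=...)["Policy"]`. That call raises an error if the bucket has no policy, in which case you would pass an empty string. You would then write the merged result back with `put_bucket_policy(Bucket=..., Policy=...)`.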

 An example of source containment can be seen in the following diagram with an incident response analyst modifying a security group of an Amazon EC2 instance in order to restrict new connections to only certain IP addresses. As stated in the security groups bullet, existing tracked connections won’t be shut down as a result of changing security groups. 

![\[Diagram showing a source containment example\]](http://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/images/source-containment-example.png)


# Technique and access containment
<a name="technique-access-containment"></a>

Prevent unauthorized use of a resource by limiting the functions and IAM principals with access to the resource. This includes restricting the permissions of IAM principals that have access to the resource and revoking temporary security credentials. Examples of technique and access containment using AWS services are highlighted here: 
+ **Restrict permissions** – Permissions assigned to an IAM principal should follow the [Principle of Least Privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege). However, during an active security event, you might need to restrict access to a targeted resource from a specific IAM principal even further. In this case, it is possible to contain access to a resource by removing the permissions from the IAM principal to be contained. This is done with the IAM service and can be applied using AWS Management Console, the AWS CLI, or an AWS SDK. 
+ **Revoke keys** – IAM principals use IAM access keys to access or manage resources. Access keys are long-term static credentials used to sign programmatic requests to the AWS CLI or AWS API, and their access key IDs begin with the prefix *AKIA* (for additional information, refer to the *Understanding unique ID prefixes* section in [IAM identifiers](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html)). To contain access for an IAM principal whose access key has been compromised, deactivate or delete the access key. Note the following: 
  +  An access key can be reactivated after it has been deactivated. 
  +  An access key is not recoverable once it has been deleted. 
  +  An IAM principal can have up to two access keys at any given time. 
  +  Users or applications using the access key will lose access once the key is either deactivated or deleted. 
+ **Revoke temporary security credentials** – An organization can use temporary security credentials to control access to AWS resources. Their access key IDs begin with the prefix *ASIA* (for additional information, see the *Understanding unique ID prefixes* section in [IAM identifiers](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html)). Temporary credentials are typically used by IAM roles and do not have to be rotated or explicitly revoked because they have a limited lifetime. If a security event involving a temporary security credential occurs before the credential expires, you might need to change the effective permissions of the existing temporary security credentials. You can do this [using the IAM service within the AWS Management Console](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_revoke-sessions.html). Temporary security credentials can also be issued to IAM users (as opposed to IAM roles). However, as of the time of this writing, there is no option to revoke an IAM user's temporary security credentials within the AWS Management Console. Consider a security event where an unauthorized user compromises an IAM user's access key and uses it to create temporary security credentials. You can revoke those temporary security credentials using two methods: 
  +  Attach an inline policy to the IAM user that prevents access based on the security token issue time (refer to the *Denying access to temporary security credentials issued before a specific time* section in [Disabling permissions for temporary security credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_control-access_disable-perms.html) for more detail). 
  +  Delete and recreate the IAM user with the compromised access keys. 
+ **AWS WAF** - Certain techniques employed by unauthorized users include common malicious traffic patterns, such as requests that contain SQL injection and cross-site scripting (XSS). AWS WAF can be configured to match and deny traffic employing these techniques using the AWS WAF built-in rule statements. 
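Here is a minimal sketch of the access key and temporary credential actions for a compromised IAM user, assuming a boto3 IAM client is passed in. First, it deactivates the compromised key rather than deleting it, so the action stays reversible. Second, it attaches an inline policy that denies all actions for credentials whose `aws:TokenIssueTime` is earlier than a cutoff, following the pattern in the IAM documentation on disabling permissions for temporary security credentials. The policy name and helper names are hypothetical. The cutoff should be timezone-aware, and the deny also affects any legitimate sessions issued before the cutoff.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def deny_tokens_issued_before(cutoff: datetime) -> Dict[str, Any]:
    """Inline policy denying requests made with temporary credentials issued before `cutoff`."""
    issued = cutoff.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {"DateLessThan": {"aws:TokenIssueTime": issued}},
        }],
    }


def contain_iam_user(iam: Any, user_name: str, access_key_id: str,
                     cutoff: Optional[datetime] = None) -> Dict[str, Any]:
    """Deactivate a compromised access key and revoke temporary credentials derived from it.

    `iam` is assumed to be a boto3 IAM client (or a compatible test double).
    """
    cutoff = cutoff or datetime.now(timezone.utc)
    # Deactivate instead of delete so the key can be reactivated if needed.
    iam.update_access_key(UserName=user_name, AccessKeyId=access_key_id, Status="Inactive")
    policy = deny_tokens_issued_before(cutoff)
    iam.put_user_policy(UserName=user_name, PolicyName="IRDenyOldTemporaryCredentials",
                        PolicyDocument=json.dumps(policy))
    return policy
```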

 An example of technique and access containment can be seen in the following diagram, with an incident responder rotating access keys or removing an IAM policy to prevent an IAM user from accessing an Amazon S3 bucket. 

![\[Diagram showing a technique and access containment example\]](http://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/images/technique-and-access-containment.png)


# Destination containment
<a name="destination-containment"></a>

Destination containment is the application of filtering or routing within an environment to prevent access to a targeted host or resource. In some cases, targeted resources are part of resiliency mechanisms that replicate legitimate resources for availability. For isolation and containment, detach the targeted resources from these mechanisms. Examples of destination containment using AWS services include: 
+ **Network ACLs** – You can add deny rules to the network ACLs (NACLs) associated with subnets that contain targeted AWS resources. These deny rules can prevent access to a particular AWS resource; however, an NACL applies to every resource in the subnet, not only to the resources that are being accessed without authorization. NACL rules are evaluated in order, starting with the lowest-numbered rule, and the first matching rule is applied, so the deny rule must have a lower rule number than any existing rule that would allow the unauthorized traffic. Alternatively, you can create a new NACL with deny rules for both inbound and outbound traffic (a newly created custom NACL denies all inbound and outbound traffic until you add rules) and associate it with the subnet that contains the targeted resource, which blocks all access to that subnet. 
+ **Shutdown** – Shutting down a resource completely can be effective at containing the effects of unauthorized use. However, shutting down a resource also prevents legitimate access for business needs and prevents volatile forensic data, such as memory contents, from being collected, so it should be a deliberate decision that is weighed against your organization’s security policies. 
+ **Isolation VPCs** – Isolation VPCs can contain resources while still allowing legitimate traffic, such as traffic from anti-virus (AV) or EDR solutions that require access to the internet or to an external management console. You can configure an isolation VPC before a security event to permit only valid IP addresses and ports. During an active security event, targeted resources can then be moved into the isolation VPC, which contains them while still allowing them to send and receive legitimate traffic during subsequent phases of incident response. An important limitation is that existing EC2 instances cannot be moved to another VPC or Availability Zone; instead, they must be shut down and relaunched in the isolation VPC. For the steps to do this, see [How do I move my Amazon EC2 instance to another subnet, Availability Zone, or VPC?](https://aws.amazon.com/premiumsupport/knowledge-center/move-ec2-instance/) 
+ **Auto Scaling groups and load balancers** – AWS resources attached to Auto Scaling groups and load balancers should be detached and deregistered as part of destination containment procedures, so that they stop receiving production traffic and are no longer managed (for example, replaced or terminated) by the Auto Scaling group. You can detach and deregister AWS resources using the AWS Management Console, AWS CLI, or AWS SDKs. 
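
As a minimal sketch of the NACL and Auto Scaling techniques above, the following Python builds request parameters for the EC2 `CreateNetworkAclEntry`, Auto Scaling `DetachInstances`, and Elastic Load Balancing v2 `DeregisterTargets` APIs; with boto3 they could be passed as `**params` to `create_network_acl_entry`, `detach_instances`, and `deregister_targets`. The helper names, the example IDs, the fallback rule number of 100, and the strategy of numbering the deny rule one below the lowest existing rule are assumptions for illustration. When a target group is attached to the Auto Scaling group itself, detaching the instance already deregisters it, so the explicit deregistration mainly matters for targets that were registered directly.

```python
import json
from typing import Dict, List

DEFAULT_RULE_NUMBER = 32767  # the catch-all "*" rule that every NACL ends with


def nacl_deny_entry(nacl_id: str, existing: List[dict], cidr: str, egress: bool) -> dict:
    """Parameters for EC2 CreateNetworkAclEntry that deny all traffic exchanged
    with `cidr`, numbered below every existing rule in the same direction."""
    numbers = [e["RuleNumber"] for e in existing
               if e["Egress"] == egress and e["RuleNumber"] != DEFAULT_RULE_NUMBER]
    # Rules are evaluated lowest number first, so go one below the current lowest.
    rule_number = min(numbers) - 1 if numbers else 100
    if rule_number < 1:
        raise ValueError("no free rule number below the existing rules")
    return {
        "NetworkAclId": nacl_id,
        "RuleNumber": rule_number,
        "Protocol": "-1",  # -1 means all protocols
        "RuleAction": "deny",
        "Egress": egress,
        "CidrBlock": cidr,
    }


def detach_and_deregister(instance_id: str, asg_name: str, target_group_arn: str) -> Dict[str, dict]:
    """Parameters for Auto Scaling DetachInstances and ELBv2 DeregisterTargets."""
    return {
        "DetachInstances": {
            "InstanceIds": [instance_id],
            "AutoScalingGroupName": asg_name,
            # False: the group launches a replacement so capacity is preserved.
            "ShouldDecrementDesiredCapacity": False,
        },
        "DeregisterTargets": {
            "TargetGroupArn": target_group_arn,
            "Targets": [{"Id": instance_id}],
        },
    }


if __name__ == "__main__":
    # Hypothetical entries, shaped like NetworkAcls[].Entries from DescribeNetworkAcls.
    entries = [
        {"RuleNumber": 100, "Egress": False},
        {"RuleNumber": 32767, "Egress": False},
    ]
    print(json.dumps(nacl_deny_entry("acl-0example", entries, "203.0.113.10/32", False), indent=2))
    print(json.dumps(detach_and_deregister(
        "i-0example",
        "web-asg",
        "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/web/0123456789abcdef",
    ), indent=2))
```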

 An example of destination containment is demonstrated in the following diagram with an incident response analyst adding an NACL to a subnet in order to block a network connection request from an unauthorized host. 

![\[Diagram showing an example of destination containment\]](http://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/images/destination-containment.png)


# Summary
<a name="containment-summary"></a>

 Containment is one step of the incident response process and can be manual or automated. The overall containment strategy should align with an organization’s security policies and business needs, and should mitigate negative effects as efficiently as possible before eradication and recovery begin. 

# Eradication
<a name="eradication"></a>

 Eradication, in relation to security incident response, is the removal of suspicious or unauthorized resources in an effort to return the account to a known safe state. The eradication strategy depends on multiple factors, including the business requirements of your organization. 

 The [NIST SP 800-61 Computer Security Incident Handling Guide](https://csrc.nist.gov/publications/detail/sp/800-61/rev-2/final) provides several steps for eradication: 

1.  Identify and mitigate all vulnerabilities that were exploited. 

1.  Remove malware, inappropriate materials, and other components. 

1.  If more affected hosts are discovered (for example, new malware infections), repeat the detection and analysis steps to identify all other affected hosts, then contain and eradicate the incident for them. 

 For AWS resources, you can refine these steps using the events detected and analyzed through available logs and automated tooling, such as Amazon CloudWatch Logs and Amazon GuardDuty. Those events should be the basis for determining which remediations to perform to restore the environment to a known safe state. 

 The first step of eradication is determining which resources have been affected within the AWS account. This is accomplished through analysis of your available log data sources, resources, and automated tooling. 
+  Identify unauthorized actions taken by the IAM identities in your account. 
+  Identify unauthorized access or changes to your account. 
+  Identify the creation of unauthorized resources or IAM users. 
+  Identify systems or resources with unauthorized changes. 
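
As a minimal sketch of the first and third checks above, the following Python filters CloudTrail records (already loaded from log files as parsed JSON) for API calls made with suspect access key IDs, which should include both the compromised long-term key (*AKIA* prefix) and any temporary keys (*ASIA* prefix) issued from it. The record fields used (`eventTime`, `eventSource`, `eventName`, `userIdentity.accessKeyId`) are standard CloudTrail record fields; treating event names with the listed prefixes as resource-changing is a simplifying heuristic, not an exhaustive rule, and the function name is hypothetical.

```python
from typing import Collection, Iterable, List, Tuple

# Heuristic (an assumption, not an exhaustive rule): API names with these
# prefixes usually create or modify resources or IAM identities.
CHANGE_PREFIXES = ("Create", "Run", "Put", "Attach", "Update", "Modify")


def actions_by_keys(records: Iterable[dict], key_ids: Collection[str]) -> List[Tuple[str, str, str, bool]]:
    """Return (eventTime, eventSource, eventName, is_change) for each CloudTrail
    record made with one of `key_ids`, sorted chronologically."""
    hits = []
    for record in records:
        if record.get("userIdentity", {}).get("accessKeyId") not in key_ids:
            continue
        name = record.get("eventName", "")
        hits.append((
            record.get("eventTime", ""),
            record.get("eventSource", ""),
            name,
            name.startswith(CHANGE_PREFIXES),
        ))
    # CloudTrail eventTime values are ISO 8601 UTC strings, so they sort as text.
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical records, shaped like entries in a CloudTrail log file's "Records" list.
    records = [
        {"eventTime": "2024-05-01T12:05:00Z", "eventSource": "ec2.amazonaws.com",
         "eventName": "RunInstances", "userIdentity": {"accessKeyId": "AKIAEXAMPLE"}},
        {"eventTime": "2024-05-01T12:00:00Z", "eventSource": "iam.amazonaws.com",
         "eventName": "ListUsers", "userIdentity": {"accessKeyId": "AKIAEXAMPLE"}},
    ]
    for time, source, name, is_change in actions_by_keys(records, {"AKIAEXAMPLE"}):
        print(time, source, name, "CHANGE" if is_change else "read")
```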

 Once the list of resources is identified, assess each one to determine the business impact of deleting or restoring it. For example, if a web server hosts your business application and deleting it would cause downtime, consider recovering the resource from verified safe backups or relaunching the system from a clean AMI before you delete the affected server. 

 After you complete your business impact analysis, use the events from your log analysis to perform the appropriate remediations in the affected accounts, such as: 
+  Rotate or delete access keys – this removes the actor’s ability to continue performing activities within the account. 
+  Rotate IAM user credentials that might have been compromised. 
+  Delete unrecognized or unauthorized resources. 
**Important**  
 If you must keep resources for your investigation, consider backing up those resources. For example, if you must retain an Amazon EC2 instance for regulatory, compliance, or legal reasons, then [create an Amazon EBS snapshot](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-snapshot.html) before removing the instance. 
+  For malware infections, you might need to reach out to an AWS Partner or other vendor. AWS does not offer native tools for malware analysis or removal. If you use GuardDuty Malware Protection to scan Amazon EBS volumes, remediation recommendations might be available for the resulting findings. 
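
The snapshot-before-removal guidance above can be sketched as follows. This minimal Python example assumes the instance description has already been retrieved with the EC2 `DescribeInstances` API and builds one `CreateSnapshot` request per attached EBS volume; with boto3, each dictionary could be passed as `**params` to `create_snapshot`. The helper name, the description format, and the tag keys `IncidentId` and `SourceInstance` are hypothetical; use whatever tagging convention your incident response runbooks define.

```python
import json
from typing import List


def evidence_snapshot_requests(instance: dict, incident_id: str) -> List[dict]:
    """Build EC2 CreateSnapshot parameters for every EBS volume attached to
    `instance`, one element of Reservations[].Instances[] from DescribeInstances."""
    requests = []
    for mapping in instance.get("BlockDeviceMappings", []):
        ebs = mapping.get("Ebs")
        if not ebs:  # skip any mapping without EBS volume details
            continue
        requests.append({
            "VolumeId": ebs["VolumeId"],
            "Description": f"{incident_id} evidence: {instance['InstanceId']} {mapping['DeviceName']}",
            "TagSpecifications": [{
                "ResourceType": "snapshot",
                # Hypothetical tag keys for chain-of-custody tracking.
                "Tags": [
                    {"Key": "IncidentId", "Value": incident_id},
                    {"Key": "SourceInstance", "Value": instance["InstanceId"]},
                ],
            }],
        })
    return requests


if __name__ == "__main__":
    instance = {
        "InstanceId": "i-0example",
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/xvda", "Ebs": {"VolumeId": "vol-0root"}},
            {"DeviceName": "/dev/sdf", "Ebs": {"VolumeId": "vol-0data"}},
        ],
    }
    print(json.dumps(evidence_snapshot_requests(instance, "IR-0001"), indent=2))
```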

 Once you have eradicated the affected resources you identified, AWS recommends that you perform a security review of your account. You can do this using AWS Config rules, open-source solutions such as Prowler and ScoutSuite, or tools from other vendors. You should also consider performing vulnerability scans against your internet-facing resources to assess residual risk. 

 Eradication is one step of the incident response process and can be manual or automated, depending on the incident and affected resources. The overall strategy should align with an organization’s security policies and business needs, and should mitigate negative effects as inappropriate resources or configurations are removed. 

# Recovery
<a name="recovery"></a>

 Recovery is the process of restoring systems to a known safe state, validating that backups are safe or unaffected by the incident prior to restoration, testing to verify that the systems are working properly post-restoration, and addressing vulnerabilities associated with the security event. 

 The order of recovery depends on your organization’s requirements. As part of the process of recovery, you should perform a business impact analysis to determine, at minimum: 
+  Business or dependency priorities 
+  The restoration plan 
+  Authentication and authorization 

 The NIST SP 800-61 Computer Security Incident Handling Guide provides several steps to recover systems, including: 
+  Restoring systems from clean backups. 
  +  Evaluate backups before restoring them to systems, to make sure that the infection is not present and to prevent the security event from recurring. 

     Backups should be evaluated on a regular basis as part of disaster recovery testing to verify that the backup mechanism is working properly and the data integrity meets recovery point objectives. 
  +  If possible, use backups from before the first event timestamp identified as part of root cause analysis. 
+  Rebuilding systems from scratch, including redeploying from a trusted source using automation, sometimes in a new AWS account. 
+  Replacing compromised files with clean versions. 

   Exercise great caution when doing this: you must be absolutely certain that the file you are recovering is known safe and unaffected by the incident. 
+  Installing patches. 
+  Changing passwords. 
  +  This includes passwords for IAM principals that might have been abused. 
  +  If possible, we recommend using IAM roles and federation for IAM principals as part of a least-privilege strategy. 
+  Tightening network perimeter security (firewall rulesets, boundary router access control lists). 
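
The guidance to use backups from before the first event timestamp can be illustrated with a minimal Python sketch that picks the most recent backup created strictly before that timestamp. The backup records are assumed to be simple (identifier, creation time) pairs, for example collected from snapshot or AWS Backup recovery point listings, and the function name is hypothetical. The first event identified during root cause analysis might not be the true start of the compromise, so the selected backup still needs to be evaluated before it is restored.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional, Tuple


def latest_backup_before(backups: Iterable[Tuple[str, datetime]],
                         first_event: datetime) -> Optional[str]:
    """Return the ID of the newest backup created strictly before `first_event`,
    or None if no backup predates it. Datetimes must all be timezone-aware."""
    earlier = [(created, backup_id) for backup_id, created in backups if created < first_event]
    if not earlier:
        return None
    # max() compares creation time first, so this is the most recent candidate.
    return max(earlier)[1]


if __name__ == "__main__":
    def at(hour: int) -> datetime:
        return datetime(2024, 5, 1, hour, tzinfo=timezone.utc)

    backups = [("snap-a", at(1)), ("snap-b", at(5)), ("snap-c", at(9))]
    print(latest_backup_before(backups, first_event=at(6)))  # prints snap-b
```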

 Once the resources have been recovered, it is important to capture lessons learned to update incident response policies, procedures, and guides. 

 In summary, it is imperative to implement a recovery process that facilitates a return to known safe operations. Recovery can take a long time and requires close coordination with containment strategies to balance business impact against the risk of reinfection. Recovery procedures should include steps for restoring resources, services, and IAM principals, and for performing a security review of the account to assess residual risk. 

# Conclusion
<a name="operations-conclusion"></a>

 Each operations phase has unique goals, techniques, methodologies, and strategies. Table 4 summarizes these phases and some of the techniques and methodologies covered in this section. 

*Table 4 – Operations phases: Goals, techniques, and methodologies*


|  Phase  |  Goal  |  Techniques and methodologies  | 
| --- | --- | --- | 
|  Detection  |  Identify a potential security event.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/operations-conclusion.html)  | 
|  Analysis  |  Determine if the security event is an incident and assess the scope of the incident.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/operations-conclusion.html)  | 
|  Containment  |  Minimize and limit the impact of the security event.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/operations-conclusion.html)  | 
|  Eradication  |  Remove unauthorized resources or artifacts related to the security event.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/operations-conclusion.html)  | 
|  Recovery  |  Restore systems to a known good state and monitor these systems to ensure the threat does not return.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/whitepapers/latest/aws-security-incident-response-guide/operations-conclusion.html)  |