

# Preparation

 Preparing for an incident is critical for timely and effective incident response. Preparation is done across three domains: 
+  **People** – Preparing your people for a security incident involves identifying the relevant stakeholders for incident response and training them on incident response and cloud technologies. 
+  **Process** – Preparing your processes for a security incident involves documenting architectures, developing thorough incident response plans, and creating playbooks for consistent response to security events. 
+  **Technology** – Preparing your technology for a security incident involves setting up access, aggregating and monitoring necessary logs, implementing effective alerting mechanisms, and developing response and investigative capabilities. 

 Each of these domains is equally important for effective incident response. No incident response program is complete or effective without all three. To be prepared for an incident, you need to prepare people, processes, and technologies and integrate them tightly. 

# People

 To respond to a security event, you need to identify the stakeholders who will support the response. It is also critical that these stakeholders are trained on AWS technologies and your AWS environment. 

# Define roles and responsibilities

 Handling security events requires cross-organizational discipline and an inclination for action. Within your organizational structure, there should be many people who are responsible, accountable, consulted, or kept informed during an incident, such as representatives from human resources (HR), the executive team, and legal. Consider these roles and responsibilities, and whether any third parties must be involved. Note that in many geographies, there are local laws that govern what should and should not be done. Although it might seem bureaucratic to build a responsible, accountable, consulted, and informed (RACI) chart for your security response plans, doing so enables quick and direct communication and clearly outlines the leadership across different stages of the event. 
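
 A RACI chart can also be kept in a machine-readable form so that responders can look up who to contact in each phase. The following is a minimal sketch only: the phase names, stakeholder names, and the `stakeholders_with` and `phases_without_single_owner` helpers are hypothetical, and the "exactly one accountable party per phase" check reflects a common RACI convention rather than a requirement stated in this guide.

```python
from enum import Enum


class Raci(Enum):
    RESPONSIBLE = "R"
    ACCOUNTABLE = "A"
    CONSULTED = "C"
    INFORMED = "I"


# Hypothetical chart: incident phase -> {stakeholder: RACI assignment}.
RACI_CHART = {
    "detect": {
        "security_operations": Raci.RESPONSIBLE,
        "incident_manager": Raci.ACCOUNTABLE,
        "application_owner": Raci.CONSULTED,
        "executive_team": Raci.INFORMED,
    },
    "contain": {
        "security_operations": Raci.RESPONSIBLE,
        "incident_manager": Raci.ACCOUNTABLE,
        "application_owner": Raci.CONSULTED,
        "legal": Raci.CONSULTED,
        "executive_team": Raci.INFORMED,
    },
}


def stakeholders_with(chart, phase, assignment):
    """Return the stakeholders holding an assignment in a phase, sorted by name."""
    return sorted(name for name, role in chart[phase].items() if role is assignment)


def phases_without_single_owner(chart):
    """Return phases that do not have exactly one accountable stakeholder."""
    return [
        phase
        for phase, roles in chart.items()
        if sum(1 for role in roles.values() if role is Raci.ACCOUNTABLE) != 1
    ]
```

 For example, `stakeholders_with(RACI_CHART, "contain", Raci.CONSULTED)` lists the parties to consult before a containment action is taken.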

 During an incident, including the owners and developers of impacted applications and resources is key because they are subject matter experts (SMEs) who can provide information and context to aid in measuring impact. Make sure to practice and build relationships with the developers and application owners before you rely on their expertise for incident response. Application owners or SMEs, such as your cloud administrators or engineers, might need to act in situations where the environment is unfamiliar or complex, or where the responders don’t have access. 

 Lastly, trusted external partners might be involved in the investigation or response because they can provide additional expertise and valuable scrutiny. When you don’t have these skills on your own team, you might want to hire an external party for assistance. 

# Train incident response staff

 Training your incident response staff on the technologies your organization uses is crucial for them to adequately respond to a security event. Responses might be prolonged if your staff members don’t understand the underlying technologies. In addition to traditional incident response concepts, it’s also important that they understand AWS services and your AWS environment. There are a number of traditional mechanisms to train your incident response staff, such as online training and classroom training. You should also consider running gamedays or simulations as a mechanism for training. For details on how to run simulations, see the [Run regular simulations](run-regular-simulations.md) section of this document. 

# Understand AWS Cloud technologies

 To reduce dependencies and decrease response time, ensure that your security teams and responders are educated about cloud services and have opportunities for hands-on practice with the specific cloud environment that your organization uses. For incident responders to be effective, it’s important to understand AWS foundations, IAM, AWS Organizations, AWS logging and monitoring services, and AWS security services.

 AWS provides online security workshops (refer to [AWS Security Workshops](https://workshops.aws/categories/Security)) where you can get hands-on experiences with AWS security and monitoring services. AWS also provides a number of training options and learning paths through digital training, classroom training, AWS training partners, and certifications. To learn more, refer to [AWS Training and Certification](https://aws.amazon.com/training/). 

 AWS provides both free and subscription-based training supporting multiple personas and areas of focus. Visit [AWS Skill Builder](https://skillbuilder.aws/) to learn more. 

# Understand your AWS environment

 In addition to understanding AWS services, their use cases, and how they integrate with each other, it’s equally important to understand how your organization’s AWS environment is actually architected and what operational processes are in place. Often, internal knowledge such as this is not documented and is understood by only a few domain experts, which can create dependencies, hinder innovation, and slow response time. 

 To avoid these dependencies and quicken response times, internal knowledge of your AWS environment should be documented, accessible, and understood by your security analysts. Understanding your complete cloud footprint will require collaboration between relevant security stakeholders and cloud administrators. Part of preparing your processes for incident response includes documenting and centralizing architecture diagrams, which is covered in [Document and centralize architecture diagrams](document-and-centralize-architecture-diagrams.md) later in this whitepaper. However, from a people perspective, it’s important that your analysts can access and understand the diagrams and operational processes related to your AWS environment. 

# Understand AWS response teams and support

## Support

 [Support](https://aws.amazon.com/premiumsupport/) offers a range of plans that provide access to tools and expertise that support the success and operational health of your AWS solutions. If you need technical support and more resources to help plan, deploy, and optimize your AWS environment, you can select a support plan that best aligns with your AWS use case. 

 Consider the [Support Center](https://console.aws.amazon.com/support) in the AWS Management Console (sign-in required) as the central point of contact to get support for issues that affect your AWS resources. Access to Support is controlled by IAM. For more information about getting access to AWS Support features, refer to [Getting started with Support](https://docs.aws.amazon.com/awssupport/latest/user/getting-started.html#accessing-support). 

 Additionally, if you need to report abuse, contact the [AWS Trust and Safety team](https://aws.amazon.com/forms/report-abuse). 

## Security Incident Response engineers

 Security Incident Response engineers are a specialized, always-available, global AWS team that provides support to customers during active security events on the customer side of the [AWS Shared Responsibility Model](https://aws.amazon.com/compliance/shared-responsibility-model/). 

 When Security Incident Response engineers support you, you will receive assistance with triage and recovery for an active security event on AWS. They will assist in root cause analysis through the use of AWS service logs and provide you with recommendations for recovery. They will also provide security recommendations and best practices to help you avoid security events in the future. 

 AWS customers can engage Security Incident Response engineers through an [AWS support case](https://docs.aws.amazon.com/awssupport/latest/user/case-management.html). 
+  **All Customers**: 

  1. Account and billing

  1. Service: Account

  1. Category: Security

  1. Severity: General question
+  **Customers with Developer Support plans**: 

  1. Account and billing

  1. Service: Account

  1. Category: Security

  1. Severity: Important question
+  **Customers with Business Support plans**: 

  1. Account and billing

  1. Service: Account

  1. Category: Security

  1. Severity: Urgent business impacting question
+  **Customers with Enterprise Support plans**: 

  1. Account and billing

  1. Service: Account

  1. Category: Security

  1. Severity: Critical business risk question
+  **Customers with AWS Security Incident Response subscriptions**: Open the Security Incident Response console at https://console.aws.amazon.com/security-ir/ 
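
 The case fields above can be captured as a small lookup table in a runbook so that responders don't have to recall them under pressure. The following is a sketch only: the dictionary keys (including `case_type`) and the `case_fields_for` helper are hypothetical names, and the values are copied from the list above. Verify them against your own Support console.

```python
# Fields shared by every plan, as listed above.
COMMON_CASE_FIELDS = {
    "case_type": "Account and billing",
    "service": "Account",
    "category": "Security",
}

# Severity to select for each Support plan, as listed above.
SEVERITY_BY_PLAN = {
    "all": "General question",
    "developer": "Important question",
    "business": "Urgent business impacting question",
    "enterprise": "Critical business risk question",
}


def case_fields_for(plan):
    """Return the case fields for a plan, falling back to the 'all customers' severity."""
    severity = SEVERITY_BY_PLAN.get(plan.strip().lower(), SEVERITY_BY_PLAN["all"])
    return {**COMMON_CASE_FIELDS, "severity": severity}
```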

## DDoS response support

 AWS offers [AWS Shield](https://aws.amazon.com/shield/), a managed distributed denial of service (DDoS) protection service that safeguards web applications running on AWS. AWS Shield provides always-on detection and automatic inline mitigations that can minimize application downtime and latency, so there is no need to engage Support to benefit from DDoS protection. There are two tiers of AWS Shield: Shield Standard and Shield Advanced. To learn about the differences between these two tiers, refer to the [Shield features documentation](https://aws.amazon.com/shield/features/). 

## AWS Managed Services (AMS)

 [AWS Managed Services](https://aws.amazon.com/managed-services/) (AMS) provides ongoing management of your AWS infrastructure so you can focus on your applications. By implementing best practices to maintain your infrastructure, AMS helps reduce your operational overhead and risk. AMS automates common activities such as change requests, monitoring, patch management, security, and backup services, and provides full-lifecycle services to provision, run, and support your infrastructure. 

 AMS takes responsibility for deploying a suite of security detective controls and provides an everyday first line of response to alerts. When an alert is initiated, AMS follows a standard set of automated and manual playbooks to verify a consistent response. These playbooks are shared with AMS customers during onboarding so that they can develop and coordinate a response with AMS. 

# Process

 Developing thorough and clearly defined incident response processes is key to a successful and scalable incident response program. When a security event occurs, clear steps and workflows will help you to respond in a timely manner. You might already have existing incident response processes. Regardless of your current state, it’s important to update, iterate, and test your incident response processes regularly. 

# Develop and test an incident response plan

 The first document to develop for incident response is the *incident response plan*. The incident response plan is designed to be the foundation for your incident response program and strategy. An incident response plan is a high-level document that typically includes these sections: 
+ **An incident response team overview** – Outlines the goals and functions of the incident response team 
+ **Roles and responsibilities** – Lists the incident response stakeholders and details their roles when an incident occurs 
+ **A communication plan** – Details contact information and how you will communicate during an incident 

   It’s a best practice to have out-of-band communication as a backup for incident communication. An example of an application that provides a secure out-of-band communications channel is [AWS Wickr](https://aws.amazon.com/wickr/).
+ **Phases of incident response and actions to take** – Enumerates the phases of incident response (for example, detect, analyze, contain, eradicate, and recover), including high-level actions to take within those phases
+ **Incident severity and prioritization definitions** – Details how to classify the severity of an incident, how to prioritize the incident, and then how the severity definitions affect escalation procedures

 While these sections are common throughout companies of different sizes and industries, each organization’s incident response plan is unique. You will need to build an incident response plan that works best for your organization. 
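
 Severity definitions are easier to apply consistently when they are written down as explicit rules. The following is a minimal sketch of how severity definitions could drive escalation. The severity levels, the three classification criteria, and the escalation targets are all hypothetical placeholders, not definitions from this guide. Substitute the ones in your own incident response plan.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class IncidentFacts:
    production_affected: bool
    sensitive_data_exposed: bool
    actively_spreading: bool


def classify(facts):
    """Map observed facts to a severity using hypothetical example rules."""
    if facts.sensitive_data_exposed and facts.actively_spreading:
        return Severity.CRITICAL
    if facts.sensitive_data_exposed or facts.actively_spreading:
        return Severity.HIGH
    if facts.production_affected:
        return Severity.MEDIUM
    return Severity.LOW


# Hypothetical escalation procedure: who is notified at each severity.
ESCALATION = {
    Severity.LOW: ["security_operations"],
    Severity.MEDIUM: ["security_operations", "application_owner"],
    Severity.HIGH: ["security_operations", "application_owner", "incident_manager"],
    Severity.CRITICAL: [
        "security_operations",
        "application_owner",
        "incident_manager",
        "executive_team",
        "legal",
    ],
}


def escalation_targets(facts):
    """Return the stakeholders to notify for the classified severity."""
    return ESCALATION[classify(facts)]
```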

# Document and centralize architecture diagrams

 To quickly and accurately respond to a security event, you need to understand how your systems and networks are architected. Understanding these internal patterns is important not only for incident response, but also for verifying that applications are architected consistently and according to best practices. You should also verify that this documentation is kept up to date as new architecture patterns are adopted. You should develop documentation and internal repositories that detail items such as: 
+ **AWS account structure** - You need to know: 
  +  How many AWS accounts do you have? 
  +  How are those AWS accounts organized? 
  +  Who are the business owners of the AWS accounts? 
  +  Do you use Service Control Policies (SCPs)? If so, what organizational guardrails are implemented by using SCPs? 
  +  Do you limit the Regions and services that can be used? 
  +  What differences are there between business units and environments (dev/test/prod)? 
+ **AWS service patterns** 
  +  What AWS services do you use? 
  +  What are the most widely used AWS services? 
+ **Architecture patterns** 
  +  What cloud architectures do you use? 
+ **AWS authentication patterns** 
  +  How do your developers typically authenticate to AWS? 
  +  Do you use IAM roles or users (or both)? Is your authentication to AWS connected to an identity provider (IdP)? 
  +  How do you map an IAM role or user to an employee or system? 
  +  How does access get revoked when someone is no longer authorized? 
+ **AWS authorization patterns** 
  +  What IAM policies do your developers use? 
  +  Do you use resource-based policies? 
+ **Logging and monitoring** 
  +  What logging sources do you use and where are they stored? 
  +  Do you aggregate AWS CloudTrail logs? If so, where are they stored? 
  +  How do you query CloudTrail logs? 
  +  Do you have Amazon GuardDuty enabled? 
  +  How do you access GuardDuty findings (for example, console, ticketing system, SIEM)? 
  +  Are findings or events aggregated in a SIEM? 
  +  Are tickets automatically created? 
  +  What tooling is in place to analyze logs for an investigation? 
+ **Network topology** 
  +  How are devices, endpoints, and connections on your network physically or logically arranged? 
  +  How does your network connect with AWS? 
  +  How is network traffic filtered between environments? 
+ **External infrastructure** 
  +  How are externally-facing applications deployed? 
  +  What AWS resources are publicly accessible? 
  +  What AWS accounts contain infrastructure that is externally facing? 
  +  What DDoS or external filtering is there? 

 Documenting internal technical diagrams and processes eases the incident response analyst’s job, helping them quickly obtain the institutional knowledge to respond to a security event. Thorough documentation of internal technical processes not only simplifies security investigations, but also allows for rationalization and evaluation of the processes. 

# Develop incident response playbooks

 A key part of preparing your incident response processes is developing playbooks. Incident response playbooks provide prescriptive guidance and a series of steps to follow when a security event occurs. Having clear structure and steps simplifies the response and reduces the likelihood of human error. 

# What to create playbooks for

 Playbooks should be created for incident scenarios such as: 
+  **Expected incidents** – Playbooks should be created for incidents you anticipate. This includes threats like denial of service (DoS), ransomware, and credential compromise. 
+  **Known security findings or alerts** – Playbooks should be created for your known security findings and alerts, such as GuardDuty findings. You might receive a GuardDuty finding and think, “Now what?” To prevent mishandling of a GuardDuty finding or ignoring the finding, create a playbook for each potential GuardDuty finding. Some remediation details and guidance can be found in the [GuardDuty documentation](https://docs.aws.amazon.com/guardduty/latest/ug/guardduty_remediate.html). It’s worth noting that GuardDuty is not enabled by default and does incur a cost. More details on GuardDuty can be found in Appendix A: Cloud capability definitions - [Visibility and alerting](visibility-and-alerting.md). 
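
 One way to make sure a finding always leads to a playbook is to keep an explicit mapping from finding type to playbook, with a fallback for unmapped types. The sketch below assumes the GuardDuty finding type format `ThreatPurpose:ResourceTypeAffected/ThreatFamilyName...`. The playbook paths, the dictionaries, and the `playbook_for` helper are hypothetical, and only a few finding types are shown.

```python
# Exact matches: finding type -> playbook (paths are hypothetical).
PLAYBOOKS_BY_FINDING_TYPE = {
    "UnauthorizedAccess:EC2/SSHBruteForce": "playbooks/ec2-ssh-brute-force.md",
    "CryptoCurrency:EC2/BitcoinTool.B!DNS": "playbooks/ec2-cryptomining.md",
}

# Fallbacks keyed on the affected resource type in the finding type.
PLAYBOOKS_BY_RESOURCE_TYPE = {
    "EC2": "playbooks/ec2-generic.md",
    "IAMUser": "playbooks/iam-credential-compromise.md",
    "S3": "playbooks/s3-generic.md",
}

# Last resort so that no finding is left without guidance.
DEFAULT_PLAYBOOK = "playbooks/unmapped-finding-triage.md"


def playbook_for(finding_type):
    """Pick the most specific playbook available for a finding type."""
    if finding_type in PLAYBOOKS_BY_FINDING_TYPE:
        return PLAYBOOKS_BY_FINDING_TYPE[finding_type]
    # Extract the resource type between ':' and '/'.
    _, _, remainder = finding_type.partition(":")
    resource_type = remainder.split("/", 1)[0]
    return PLAYBOOKS_BY_RESOURCE_TYPE.get(resource_type, DEFAULT_PLAYBOOK)
```

 Reviewing which finding types fall through to the default playbook is a simple way to spot playbooks that still need to be written.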

# What to include in playbooks

 Playbooks should contain technical steps for a security analyst to complete in order to adequately investigate and respond to a potential security incident. 

 A playbook should include the following items: 
+  **Playbook overview** – What risk or incident scenario does this playbook address? What is the goal of the playbook?
+  **Prerequisites** – What logs and detection mechanisms are required for this incident scenario? What is the expected notification? 
+  **Stakeholder information** – Who is involved and what is their contact information? What are each of the stakeholders’ responsibilities? 
+  **Response steps** – Across phases of incident response, what tactical steps should be taken? What queries should an analyst run? What code should be run to achieve the desired outcome? 
  +  **Detect** – How will the incident be detected? 
  +  **Analyze** – How will the scope of impact be determined? 
  +  **Contain** – How will the incident be isolated to limit scope? 
  +  **Eradicate** – How will the threat be removed from the environment? 
  +  **Recover** – How will the affected system or resource be brought back into production? 
+  **Expected outcomes** – After queries and code are run, what is the expected result of the playbook? 

 To verify consistent information in each playbook, it can be helpful to create a playbook template to use across your other security playbooks. Some of the previously listed items, such as stakeholder information, can be shared across multiple playbooks. If that is the case, you can create centralized documentation for that information and reference it in the playbook, then enumerate the explicit differences in the playbook. This will prevent you from having to update the same information in all of your individual playbooks. Through creating a template and identifying common or shared information in playbooks, you can simplify and speed up playbook development. Lastly, your playbooks will likely evolve over time; once you have confirmed that the steps are consistent, this forms the requirements for automation. 
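
 A playbook template can be expressed as a simple data structure with a completeness check. The sketch below mirrors the items listed earlier in this section. The `Playbook` class, its field names, and the example file path are hypothetical. Stakeholder information is held as a reference to centralized documentation rather than copied into each playbook.

```python
from dataclasses import dataclass, field

# Response phases, in the order listed in this section.
PHASES = ("detect", "analyze", "contain", "eradicate", "recover")


@dataclass
class Playbook:
    overview: str
    prerequisites: list
    stakeholders_ref: str  # pointer to centralized stakeholder documentation
    response_steps: dict = field(default_factory=dict)  # phase -> list of steps
    expected_outcomes: list = field(default_factory=list)

    def missing_sections(self):
        """Return the names of template sections that are still empty."""
        missing = []
        if not self.overview.strip():
            missing.append("overview")
        if not self.prerequisites:
            missing.append("prerequisites")
        if not self.stakeholders_ref.strip():
            missing.append("stakeholders_ref")
        missing.extend(
            f"response_steps.{phase}"
            for phase in PHASES
            if not self.response_steps.get(phase)
        )
        if not self.expected_outcomes:
            missing.append("expected_outcomes")
        return missing


# Example: a draft playbook with only the detection steps written so far.
draft = Playbook(
    overview="Respond to suspected IAM credential compromise.",
    prerequisites=["CloudTrail logs are aggregated and queryable"],
    stakeholders_ref="docs/ir/stakeholders.md",
    response_steps={"detect": ["Review the triggering alert"]},
)
```

 Calling `draft.missing_sections()` reports the phases and sections that still need content before the playbook is ready for review.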

# Sample playbooks

 A number of sample playbooks can be found in Appendix B in [Playbook resources](appendix-b-incident-response-resources.md#playbook-resources). The examples here can be used to guide you on what playbooks to create and what to include in your playbooks. However, it’s important you craft playbooks that incorporate the risks most relevant to your business. You need to verify that the steps and workflows within your playbooks include your technologies and processes. 

# Run regular simulations

 Organizations grow and evolve over time, as does the threat landscape. Because of this, it’s important to continually review your incident response capabilities. Simulations are one method that can be used to perform this assessment. Simulations use real-world security event scenarios designed to mimic a threat actor’s tactics, techniques, and procedures (TTPs) and allow an organization to exercise and evaluate their incident response capabilities by responding to these mock cyber events as they might occur in reality. 

 Simulations have a variety of benefits, including: 
+  Validating cyber readiness and developing the confidence of your incident responders. 
+  Testing the accuracy and efficiency of tools and workflows. 
+  Refining communication and escalation methods aligned with your incident response plan. 
+  Providing an opportunity to respond to less common vectors. 

# Types of simulations

 There are three main types of simulations: 
+  **Tabletop exercises** – The tabletop approach to simulations is strictly a discussion-based session involving the various incident response stakeholders to practice roles and responsibilities and use established communication tools and playbooks. Exercise facilitation can typically be accomplished in a full day in a virtual venue, a physical venue, or a combination. Because of its discussion-based nature, the tabletop exercise focuses on processes, people, and collaboration. Technology is an integral part of the discussion; however, the actual use of incident response tools or scripts is generally not a part of the tabletop exercise. 
+  **Purple Team exercises** – Purple Team exercises increase the level of collaboration between the incident responders (*Blue Team*) and simulated threat actors (*Red Team*). The Blue Team is generally comprised of members of the Security Operations Center (SOC), but can also include other stakeholders that would be involved during an actual cyber event. The Red Team is generally comprised of a penetration testing team or key stakeholders that are trained in offensive security. The Red Team works collaboratively with the exercise facilitators when designing a scenario so that the scenario is accurate and feasible. During Purple Team exercises, the primary focus is on the detection mechanisms, the tools, and the standard operating procedures (SOPs) supporting the incident response efforts. 
+  **Red Team exercises** – During a Red Team exercise, the offense (*Red Team*) conducts a simulation to achieve a certain objective or set of objectives from a pre-determined scope. The defenders (*Blue Team*) will not necessarily know the scope and duration of the exercise, which provides a more realistic assessment of how they would respond to an actual incident. Because Red Team exercises can be invasive tests, you should be cautious and implement controls to verify that the exercise does not cause actual harm to your environment. 

**Note**  
AWS requires customers to review the policy for penetration testing available on the [Penetration Testing website](https://aws.amazon.com/security/penetration-testing/) before they conduct Purple Team or Red Team exercises. 

 Table 1 summarizes a few key differences in these types of simulations. It’s important to note that the definitions are generally considered loose definitions and can be customized to fit the needs of your organization. 

*Table 1 – Types of simulations*


|   |  Tabletop exercise  |  Purple Team exercise  |  Red Team exercise  | 
| --- | --- | --- | --- | 
|  Summary  |  Paper-driven exercises that focus on one specific security incident scenario. These can be either high-level or technical, and are driven by a series of paper injects.  |  A more realistic offering compared to tabletop exercises. During Purple Team exercises, facilitators work collaboratively with the participants to increase exercise engagement and offer training where necessary.  |  Generally a more advanced simulation offering. There is usually a high level of covertness, where the participants might not know all of the details of the exercise.  | 
|  Resources required  |  Limited technical resources required  |  Various stakeholders required and high level of technical resources needed  |  Various stakeholders required and high level of technical resources needed  | 
|  Complexity  |  Low  |  Medium  |  High  | 

 Consider facilitating cyber simulations at a regular interval. Each exercise type can provide unique benefits to the participants and the organization as a whole, so you might choose to start with less complex simulation types (such as tabletop exercises) and progress to more complex simulation types (such as Red Team exercises). You should select a simulation type based on your security maturity, resources, and your desired outcomes. Some customers might choose not to perform Red Team exercises due to complexity and cost. 

# Exercise lifecycle

 Regardless of the type of simulation you choose, simulations generally follow these steps: 

1.  **Define core exercise elements** – Define the simulation scenario and the objectives of the simulation. Both of these should have leadership acceptance. 

1.  **Identify key stakeholders** – At a minimum, an exercise needs exercise facilitators and participants. Depending on the scenario, additional stakeholders such as legal, communications, or executive leadership might be involved. 

1.  **Build and test the scenario** – The scenario might need to be redefined as it is being built if specific elements aren’t feasible. A finalized scenario is expected as the output of this stage. 

1.  **Facilitate the simulation** – The type of simulation determines the facilitation used (paper-based scenario compared to highly technical, simulated scenario). The facilitators should align their facilitation tactics to the exercise objectives and engage all exercise participants wherever possible to provide the most benefit. 

1.  **Develop the after action report (AAR)** – Identify areas that went well, those that can use improvement, and potential gaps. The AAR should measure the effectiveness of the simulation as well as the team’s response to the simulated event so that progress can be tracked over time with future simulations. 

# Technology

 If you develop and implement the appropriate technologies before a security incident, your incident response staff will be able to investigate, understand the scope, and take action in a timely manner. 

# Develop AWS account structure

 [AWS Organizations](https://aws.amazon.com/organizations/) helps centrally manage and govern an AWS environment as you grow and scale AWS resources. An AWS organization consolidates your AWS accounts so that you can administer them as a single unit. You can use organizational units (OUs) to group accounts together to administer as a single unit. 

 For incident response, it’s helpful to have an AWS account structure that supports the functions of incident response, which includes a *security OU* and a *forensics OU*. Within the security OU, you should have accounts for: 
+  **Log archival** – Aggregate logs in a log archival AWS account. 
+  **Security tooling** – Centralize security services in a security tool AWS account. This account operates as the delegated administrator for security services. 

 Within the forensics OU, you have the option to implement a single forensics account or accounts for each Region that you operate in, depending on which works best for your business and operational model. For an example of a per-Region account approach, if you only operate in US East (N. Virginia) (us-east-1) and US West (Oregon) (us-west-2), then you would have two accounts in the forensics OU: one for us-east-1 and one for us-west-2. Because it takes time to provision new accounts, it is imperative to create and instrument the forensics accounts well ahead of an incident so that responders can be prepared to effectively use them for response. 

 The following diagram displays a sample account structure including a forensics OU with per-Region forensics accounts: 

![\[Diagram of a per-region account structure for incident response\]](http://docs.aws.amazon.com/security-ir/latest/userguide/images/incident-response-account-structure.png)
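
 The account layout described above can be written down as data so that it is easy to review and to compare against what is actually provisioned. The sketch below is illustrative only: the OU names, the account names, and the `forensics_account_names` and `build_ou_layout` helpers are hypothetical naming choices, not AWS requirements.

```python
def forensics_account_names(regions, per_region=True, prefix="forensics"):
    """Return one forensics account name per Region, or a single shared name."""
    if not per_region:
        return [prefix]
    return [f"{prefix}-{region}" for region in sorted(set(regions))]


def build_ou_layout(regions, per_region=True):
    """Return a mapping of OU name -> planned account names."""
    return {
        "Security": ["log-archive", "security-tooling"],
        "Forensics": forensics_account_names(regions, per_region),
    }


# Example: an organization operating only in us-east-1 and us-west-2.
layout = build_ou_layout(["us-west-2", "us-east-1"])
```

 With the two Regions from the example above, the forensics OU contains two planned accounts, one for each Region.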


# Develop and implement a tagging strategy

 Obtaining contextual information on the business use case and relevant internal stakeholders surrounding an AWS resource can be difficult. One way to capture this context is with tags, which assign metadata to your AWS resources and consist of a user-defined key and value. You can create tags to categorize resources by purpose, owner, environment, type of data processed, and other criteria of your choice. 

 Having a consistent tagging strategy can speed up response times by allowing you to quickly identify and discern contextual information about an AWS resource. Tags can also serve as a mechanism to initiate response automations. For further information on what to tag, refer to the [documentation on tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). You’ll want to first define the tags you want to implement across your organization. After that, you’ll implement and enforce this tagging strategy. Details on implementation and enforcement can be found in the AWS blog [Implement AWS resource tagging strategy using AWS Tag Policies and Service Control Policies (SCPs)](https://aws.amazon.com/blogs/mt/implement-aws-resource-tagging-strategy-using-aws-tag-policies-and-service-control-policies-scps/). 
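
 Once the required tags are defined, checking a resource against them is straightforward. The following sketch validates a resource's tags against a hypothetical tagging standard. The tag keys, the allowed values, and the `tag_violations` helper are illustrative assumptions. In practice, enforcement is typically handled with tag policies and SCPs, as described in the blog post referenced above.

```python
# Hypothetical tagging standard: required key -> allowed values (None = any non-empty value).
REQUIRED_TAGS = {
    "owner": None,
    "environment": {"dev", "test", "prod"},
    "data-classification": {"public", "internal", "confidential"},
}


def tag_violations(tags):
    """Return a list of problems found in a resource's tags (empty if compliant)."""
    problems = []
    for key, allowed_values in REQUIRED_TAGS.items():
        value = tags.get(key)
        if not value:
            problems.append(f"missing tag: {key}")
        elif allowed_values is not None and value not in allowed_values:
            problems.append(f"invalid value for {key}: {value}")
    return problems
```

 For example, `tag_violations({"owner": "payments-team", "environment": "staging"})` reports an invalid `environment` value and a missing `data-classification` tag.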

# Update AWS account contact information

 For each of your AWS accounts, it’s important to have accurate and up-to-date contact information so that the correct stakeholders receive important notifications from AWS on topics like security, billing, and operations. For each AWS account, you have a primary contact and alternate contacts for security, billing, and operations. Differences between these contacts can be found in the [AWS Account Management Reference Guide](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-update-contact.html#manage-acct-update-contact-alternate). 

 For details on managing alternate contacts, refer to the [AWS documentation on adding, changing, or removing alternate contacts](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/manage-account-payment.html#manage-account-payment-alternate-contacts). It’s a best practice to use an email distribution list if your team manages billing, operations, and security-related issues. An email distribution list removes dependencies on one person, which can cause blockages if they are out of the office or leave the company. You should also verify that the email and account contact information, including the phone number, are well protected to defend against root account password resets and multi-factor authentication (MFA) resets. 

 For customers using AWS Organizations, organization administrators can centrally manage alternate contacts for member accounts using the management account or a delegated administrator account without requiring credentials for each AWS account. You will also need to verify that newly created accounts have accurate contact information. Refer to the [Automatically update alternate contacts for newly created AWS accounts blog post](https://aws.amazon.com/blogs/mt/automatically-update-alternate-contacts-for-newly-created-aws-accounts/). 
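
 A periodic audit can confirm that every account has all three alternate contacts set. The sketch below works on contact data that has already been collected, for example through the Account Management API. The input shape (account ID mapped to a dictionary of contact type and email address), the example addresses, and the `missing_alternate_contacts` helper are hypothetical.

```python
# The three alternate contact types described above.
ALTERNATE_CONTACT_TYPES = ("BILLING", "OPERATIONS", "SECURITY")


def missing_alternate_contacts(accounts):
    """Return {account_id: [missing contact types]} for accounts with gaps."""
    gaps = {}
    for account_id, contacts in accounts.items():
        missing = [
            contact_type
            for contact_type in ALTERNATE_CONTACT_TYPES
            if not contacts.get(contact_type)
        ]
        if missing:
            gaps[account_id] = missing
    return gaps


# Example input using placeholder account IDs and distribution-list addresses.
collected = {
    "111122223333": {
        "BILLING": "aws-billing@example.com",
        "OPERATIONS": "aws-operations@example.com",
        "SECURITY": "aws-security@example.com",
    },
    "444455556666": {"BILLING": "aws-billing@example.com"},
}
```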

# Prepare access to AWS accounts

 During an incident, your incident response teams must have access to the environments and resources involved in the incident. Ensure that your teams have appropriate access to perform their duties before an event occurs. To do that, you should know what level of access your team members require (for example, what kinds of actions they are likely to take) and should provision least privilege access in advance. 

 To implement and provision this access, you should identify and discuss the AWS account strategy and cloud identity strategy with your organization's cloud architects to understand what authentication and authorization methods are configured. Due to the privileged nature of these credentials, you should consider using approval flows or retrieving credentials from a vault or safe as part of your implementation. After implementation, you should document and test the team members’ access well before an event occurs to make sure they can respond without delays. 

 Lastly, users that are created specifically to respond to a security incident are often privileged in order to provide sufficient access. Therefore, these credentials should be restricted and monitored, and they should not be used for daily activities. 
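 One possible way to restrict responder access is a dedicated IAM role that can be assumed only from a security account and only with MFA. The sketch below builds such a trust policy. The function name and the idea of a separate security account are assumptions for illustration; your account and identity strategy might call for a different pattern.

```python
import json
from typing import Any, Dict


def build_responder_trust_policy(security_account_id: str) -> Dict[str, Any]:
    """Return a trust policy for a hypothetical incident responder role.

    Principals in the (assumed) security account can assume the role, but
    only when they authenticated with MFA.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{security_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Deny by omission: without MFA the condition does not match.
                "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}},
            }
        ],
    }


if __name__ == "__main__":
    # 111122223333 is a placeholder account ID.
    print(json.dumps(build_responder_trust_policy("111122223333"), indent=2))
```

 The permissions policy attached to the role would still need to be scoped to the actions that responders are expected to take, and role assumption should be monitored, for example through CloudTrail.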

# Understand the threat landscape

## Develop threat models

 By developing threat models, organizations can identify threats and mitigations before an unauthorized user can. There are a number of strategies and approaches to threat modeling; refer to the [How to approach threat modeling](https://aws.amazon.com/blogs/security/how-to-approach-threat-modeling/) blog post. For incident response, a threat model can help identify the attack vectors a threat actor might have used during an incident. Understanding what you’re defending against is crucial for responding in a timely manner. You can also use an AWS Partner for threat modeling. To search for an AWS Partner, use the [AWS Partner Network](https://partners.amazonaws.com/). 

## Integrate and use cyber threat intelligence

 Cyber threat intelligence is the data and analysis of a threat actor’s intent, opportunity, and capability. Obtaining and using threat intelligence is helpful to detect an incident early and to better understand threat actor behavior. Cyber threat intelligence includes static indicators like IP addresses or file hashes of malware. It also includes high-level information, like behavioral patterns and intent. You can collect threat intelligence from a number of cyber security vendors and from open-source repositories. 

 To integrate and maximize threat intelligence for your AWS environment, you can use some out-of-the-box capabilities and integrate your own threat intelligence lists. Amazon GuardDuty uses AWS internal and third-party threat intelligence sources. Other AWS services, such as Amazon Route 53 Resolver DNS Firewall and AWS WAF rules, also take inputs from AWS's advanced threat intelligence group. Some GuardDuty findings are mapped to the [MITRE ATT&CK Framework](https://attack.mitre.org/), which provides information on real-world observations of adversary tactics and techniques. 
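 To show how static indicators can be applied, the following sketch matches observed IP addresses against your own indicator list. The plain-text list format (one IP address or CIDR range per line, with `#` comments) and both function names are assumptions for illustration; they do not represent how GuardDuty evaluates threat lists internally.

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_indicators(lines: Iterable[str]) -> List[Network]:
    """Parse a plain-text indicator list into network objects."""
    networks = []
    for line in lines:
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue  # skip blank lines and comments
        # A single address is treated as a /32 (or /128) network.
        networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def match_indicators(addresses: Iterable[str], indicators: List[Network]) -> List[str]:
    """Return the observed addresses that fall inside any indicator range."""
    hits = []
    for address in addresses:
        ip = ipaddress.ip_address(address)
        if any(ip in network for network in indicators):
            hits.append(address)
    return hits
```

 For example, the observed addresses could come from the source address field of VPC Flow Logs records, and any matches could be raised for triage.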

# Select and set up logs for analysis and alerting

 During a security investigation, you need to be able to review relevant logs to record and understand the full scope and timeline of the incident. Logs are also required for alert generation, indicating that certain actions of interest have happened. It is critical to select and enable log sources, store the logs, set up querying and retrieval mechanisms, and set up alerting. Each of these actions is reviewed in this section. For more details, see the [Logging strategies for security incident response](https://aws.amazon.com/blogs/security/logging-strategies-for-security-incident-response/) AWS blog post.

# Select and enable log sources

 Ahead of a security investigation, you need to capture relevant logs so that you can retroactively reconstruct activity in an AWS account. Select and enable log sources relevant to your AWS account workloads. 

 AWS CloudTrail is a logging service that tracks API calls made against an AWS account, capturing AWS service activity. It is enabled by default with 90-day retention of management events, which can be [retrieved through CloudTrail’s Event History](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/view-cloudtrail-events.html) facility using the AWS Management Console, the AWS CLI, or an AWS SDK. For longer retention and visibility of data events, you need to [create a CloudTrail trail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-create-and-update-a-trail.html) and associate it with an Amazon S3 bucket and, optionally, with a CloudWatch log group. Alternatively, you can create a [CloudTrail Lake](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake.html) event data store, which retains CloudTrail logs for up to seven years and provides a SQL-based querying facility. 
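 As a rough sketch of the trail setup described above, the following function assembles the parameters for a multi-Region trail that delivers to an S3 bucket and, optionally, to a CloudWatch log group. The function name and the choice to turn on log file validation are assumptions; the keys correspond to parameters of the CloudTrail `CreateTrail` API.

```python
from typing import Any, Dict, Optional


def build_trail_request(
    trail_name: str,
    bucket_name: str,
    log_group_arn: Optional[str] = None,
    logs_role_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build CreateTrail parameters for a multi-Region trail."""
    request: Dict[str, Any] = {
        "Name": trail_name,
        "S3BucketName": bucket_name,
        "IsMultiRegionTrail": True,
        # Digest files help detect tampering with delivered log files.
        "EnableLogFileValidation": True,
    }
    if log_group_arn:
        if not logs_role_arn:
            # CloudTrail needs a role to write to the log group.
            raise ValueError("logs_role_arn is required with log_group_arn")
        request["CloudWatchLogsLogGroupArn"] = log_group_arn
        request["CloudWatchLogsRoleArn"] = logs_role_arn
    return request
```

 The result could be passed to `create_trail` on a boto3 `cloudtrail` client. A trail created through the API does not record events until `start_logging` is called, and data events require additional event selectors, which this sketch does not cover.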

 AWS recommends that customers using a VPC enable network traffic and DNS logs using, respectively, [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) and [Amazon Route 53 resolver query logs](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/resolver-query-logs.html), streaming them to either an Amazon S3 bucket or a CloudWatch log group. You can create a VPC flow log for a VPC, a subnet, or a network interface. To reduce cost, you can be selective about how and where you enable VPC Flow Logs. 
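 The following sketch shows one way to be selective when enabling VPC Flow Logs: it infers the resource type from the resource ID and lets you restrict the captured traffic type. The function name and the ID-prefix mapping are assumptions for illustration; the keys correspond to parameters of the Amazon EC2 `CreateFlowLogs` API.

```python
from typing import Any, Dict

# Map resource ID prefixes to the ResourceType values used by CreateFlowLogs.
RESOURCE_TYPES = {"vpc-": "VPC", "subnet-": "Subnet", "eni-": "NetworkInterface"}
TRAFFIC_TYPES = ("ACCEPT", "REJECT", "ALL")


def build_flow_log_request(
    resource_id: str, bucket_arn: str, traffic_type: str = "ALL"
) -> Dict[str, Any]:
    """Build CreateFlowLogs parameters that deliver flow logs to Amazon S3."""
    resource_type = next(
        (rtype for prefix, rtype in RESOURCE_TYPES.items() if resource_id.startswith(prefix)),
        None,
    )
    if resource_type is None:
        raise ValueError(f"unsupported resource ID: {resource_id}")
    if traffic_type not in TRAFFIC_TYPES:
        raise ValueError(f"unsupported traffic type: {traffic_type}")
    return {
        "ResourceIds": [resource_id],
        "ResourceType": resource_type,
        "TrafficType": traffic_type,
        "LogDestinationType": "s3",
        "LogDestination": bucket_arn,
    }
```

 For example, capturing only `REJECT` traffic on a single subnet produces fewer records than capturing `ALL` traffic on the whole VPC, at the expense of visibility. The result could be passed to `create_flow_logs` on a boto3 `ec2` client.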

 AWS CloudTrail Logs, VPC Flow Logs, and Route 53 resolver query logs are the *basic logging trifecta* to support security investigations in AWS. 

 AWS services can generate logs not captured by the basic logging trifecta, such as Elastic Load Balancing logs, AWS WAF logs, AWS Config recorder logs, Amazon GuardDuty findings, Amazon Elastic Kubernetes Service (Amazon EKS) audit logs, and Amazon EC2 instance operating system and application logs. Refer to [Appendix A: Cloud capability definitions](appendix-a-cloud-capability-definitions.md) for the full list of logging and monitoring options. 

# Select log storage

 The choice of log storage is generally related to which querying tool you use, retention capabilities, familiarity, and cost. When you enable AWS service logs, you must provide a storage destination, usually an Amazon S3 bucket or a CloudWatch log group. 

 An Amazon S3 bucket provides cost-effective durable storage with an optional lifecycle policy. Logs stored in Amazon S3 buckets can be natively queried using services such as Amazon Athena. A CloudWatch log group provides durable storage and a built-in query facility through CloudWatch Logs Insights. 

# Identify appropriate log retention

 When you use an S3 bucket or CloudWatch log group to store logs, you must establish adequate lifecycles for each log source to optimize storage and retrieval costs. Customers generally have between 3 and 12 months of logs readily available for querying, with retention of up to seven years. The choice of availability and retention should align with your security requirements and a composite of statutory, regulatory, and business mandates. 
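 As an example of such a lifecycle, the sketch below builds an S3 lifecycle rule that keeps logs in their original storage class for a readily available period, transitions them to an archive storage class, and expires them at the end of the retention period. The function name, the `GLACIER` storage class, and the sample periods are assumptions; choose values that match your own mandates.

```python
from typing import Any, Dict


def build_log_lifecycle_rule(prefix: str, hot_days: int, retention_days: int) -> Dict[str, Any]:
    """Build one S3 lifecycle rule for a log prefix.

    hot_days: how long logs stay readily available for querying.
    retention_days: total retention before the logs are deleted.
    """
    if not 0 < hot_days < retention_days:
        raise ValueError("hot_days must be positive and less than retention_days")
    return {
        "ID": "retain-" + prefix.strip("/").replace("/", "-"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Move older logs to a lower-cost archive storage class.
        "Transitions": [{"Days": hot_days, "StorageClass": "GLACIER"}],
        # Delete logs once the retention period has passed.
        "Expiration": {"Days": retention_days},
    }
```

 A rule such as `build_log_lifecycle_rule("AWSLogs/cloudtrail/", 365, 2555)` keeps 12 months readily available and roughly seven years in total. The rule could be supplied as `LifecycleConfiguration={"Rules": [rule]}` to `put_bucket_lifecycle_configuration` on a boto3 `s3` client. Keep in mind that archived objects typically need to be restored before they can be queried.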

# Select and implement querying mechanisms for logs

 In AWS, the main services you can use to query logs are [CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html) for data stored in CloudWatch log groups, and [Amazon Athena](https://aws.amazon.com/athena/) and [Amazon OpenSearch Service](https://aws.amazon.com/opensearch-service/) for data stored in Amazon S3. You can also use third-party querying tools, such as a security information and event management (SIEM) system. 
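 To give a sense of what such a query looks like, the following sketch builds a CloudWatch Logs Insights query that lists recent CloudTrail activity for one principal. It assumes that CloudTrail events are delivered to a CloudWatch log group; the function name and the selected fields are illustrative.

```python
def build_principal_activity_query(principal_arn: str, limit: int = 50) -> str:
    """Build a Logs Insights query for the latest API calls by one principal."""
    if '"' in principal_arn:
        # Reject values that would break out of the quoted string.
        raise ValueError("principal_arn must not contain double quotes")
    return (
        "fields @timestamp, eventSource, eventName, sourceIPAddress\n"
        f'| filter userIdentity.arn = "{principal_arn}"\n'
        "| sort @timestamp desc\n"
        f"| limit {limit}"
    )
```

 The query string could be pasted into the Logs Insights console or passed as `queryString` to `start_query` on a boto3 `logs` client, together with the log group name and a time range.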

 The process for selecting a log querying tool should consider the people, process, and technology aspects of your security operations. Select a tool that fulfills operational, business, and security requirements, and is both accessible and maintainable in the long term. Keep in mind that log querying tools work optimally when the number of logs to be scanned is kept within the tool’s limits. It is not uncommon for customers to have multiple querying tools because of cost or technical constraints. For example, customers might use a third-party SIEM to perform queries for the last 90 days of data, and use Athena to perform queries beyond 90 days because of the log ingestion cost of a SIEM. No matter the implementation, verify that your approach minimizes the number of tools required to maximize operational efficiency, especially during a security event investigation. 
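 The 90-day example above can be expressed as a small routing rule that tells an analyst, or an automation, which tool holds the data for a given time range. The tool labels, the function name, and the fixed 90-day window are hypothetical and only mirror the example in the text.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed window of data that is ingested into the SIEM.
SIEM_WINDOW = timedelta(days=90)


def select_query_tool(query_start: datetime, now: Optional[datetime] = None) -> str:
    """Return "siem" if the query start is within the SIEM window, else "athena"."""
    now = now or datetime.now(timezone.utc)
    return "siem" if now - query_start <= SIEM_WINDOW else "athena"
```

 Documenting a rule like this in your playbooks can help responders avoid searching the wrong tool during an investigation.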

# Use logs for alerting

 AWS natively provides alerting through security services, such as Amazon GuardDuty, [AWS Security Hub CSPM](https://aws.amazon.com/security-hub/), and AWS Config. You can also use custom alert generation engines for security alerts not covered by these services or for specific alerts relevant to your environment. Building these alerts and detections is covered in the section called [Detection](detection.md) in this document. 
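 As one example of routing native findings to responders, the sketch below builds an Amazon EventBridge event pattern that matches GuardDuty findings at or above a severity threshold. The function name and the default threshold of 7 are assumptions; adjust the threshold to your own triage process.

```python
import json
from typing import Any, Dict


def build_guardduty_event_pattern(min_severity: float = 7.0) -> Dict[str, Any]:
    """Build an EventBridge pattern for GuardDuty findings by severity."""
    return {
        "source": ["aws.guardduty"],
        "detail-type": ["GuardDuty Finding"],
        # Numeric matching on the finding severity.
        "detail": {"severity": [{"numeric": [">=", min_severity]}]},
    }


if __name__ == "__main__":
    print(json.dumps(build_guardduty_event_pattern(), indent=2))
```

 The serialized pattern could be supplied as `EventPattern` to `put_rule` on a boto3 `events` client, with a target such as an Amazon SNS topic that notifies your incident response team.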

# Develop forensics capabilities

 Ahead of a security incident, consider developing forensics capabilities to support security event investigations. The [Guide to Integrating Forensic Techniques into Incident Response](https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-86.pdf) by NIST provides such guidance. 

# Forensics on AWS

 Concepts from traditional on-premises forensics apply to AWS. The [Forensic investigation environment strategies in the AWS Cloud](https://aws.amazon.com/blogs/security/forensic-investigation-environment-strategies-in-the-aws-cloud/) blog post provides you with key information to start migrating your forensic expertise to AWS. 

 Once you have your environment and AWS account structure set up for forensics, you’ll want to define the technologies required to apply forensically sound methodologies across the four phases: 
+ **Collection** – Collect relevant AWS logs, such as AWS CloudTrail, AWS Config, VPC Flow Logs, and host-level logs. Collect snapshots, backups, and memory dumps of impacted AWS resources. 
+ **Examination** – Examine the data collected by extracting and assessing the relevant information. 
+ **Analysis** – Analyze the data collected in order to understand the incident and draw conclusions from it. 
+ **Reporting** – Present the information resulting from the analysis phase. 
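 To support the collection phase, it can help to record a cryptographic hash of each artifact at the time it is collected, so that later phases can verify that the evidence has not changed. The following sketch builds such a manifest entry for a local file, such as an exported log or memory dump. The function names and the manifest fields are hypothetical.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict, Union


def sha256_of_file(path: Union[str, Path], chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so that large artifacts do not exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest_entry(
    path: Union[str, Path], case_id: str, collected_by: str
) -> Dict[str, Any]:
    """Describe one collected artifact for an evidence manifest."""
    file_path = Path(path)
    return {
        "case_id": case_id,
        "file": file_path.name,
        "size_bytes": file_path.stat().st_size,
        "sha256": sha256_of_file(file_path),
        "collected_by": collected_by,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
```

 Recomputing the hash during examination and comparing it with the manifest value gives a simple integrity check for each artifact.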

# Capture backups and snapshots

 Setting up backups of key systems and databases is critical for recovering from a security incident and for forensics purposes. With backups in place, you can restore your systems to their previous safe state. On AWS, you can take snapshots of various resources. Snapshots provide you with point-in-time backups of those resources. There are many AWS services that can support you in backup and recovery. Refer to the [Backup and Recovery Prescriptive Guidance](https://docs.aws.amazon.com/prescriptive-guidance/latest/backup-recovery/services.html) for details on these services and approaches for backup and recovery. For more details, see the [Use backups to recover from security incidents](https://aws.amazon.com/blogs/security/use-backups-to-recover-from-security-incidents/) blog post.

 Especially when it comes to situations such as ransomware, it’s critical for your backups to be well protected. Refer to the [Top 10 security best practices for securing backups in AWS](https://aws.amazon.com/blogs/security/top-10-security-best-practices-for-securing-backups-in-aws/) blog post for guidance on securing your backups. In addition to securing your backups, you should regularly test your backup and restore processes to verify that the technology and processes you have in place work as expected. 
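 For forensic snapshots in particular, consistent descriptions and tags make it easier to trace a snapshot back to its source and case. The sketch below builds the parameters for an Amazon EBS snapshot request. The function name, the tag keys, and the case identifier format are assumptions; the top-level keys correspond to parameters of the Amazon EC2 `CreateSnapshot` API.

```python
from typing import Any, Dict


def build_snapshot_request(volume_id: str, instance_id: str, case_id: str) -> Dict[str, Any]:
    """Build CreateSnapshot parameters with case-tracking tags."""
    return {
        "VolumeId": volume_id,
        "Description": f"IR case {case_id}: {instance_id} {volume_id}",
        "TagSpecifications": [
            {
                "ResourceType": "snapshot",
                "Tags": [
                    # Hypothetical tag keys; align them with your tagging strategy.
                    {"Key": "CaseId", "Value": case_id},
                    {"Key": "SourceInstance", "Value": instance_id},
                ],
            }
        ],
    }
```

 The result could be passed to `create_snapshot` on a boto3 `ec2` client. In practice, you would also restrict who can delete or share these snapshots, in line with the guidance on protecting backups.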

# Automation of forensics on AWS

 During a security event, your incident response team must be able to collect and analyze evidence quickly while maintaining accuracy for the time period surrounding the event. It’s both challenging and time consuming for the incident response team to manually collect the relevant evidence in a cloud environment, especially across a large number of instances and accounts. Additionally, manual collection can be prone to human error. For these reasons, customers should develop and implement automation for forensics. 

 AWS offers a number of automation resources for forensics, which are consolidated in the Appendix under [Forensic resources](appendix-b-incident-response-resources.md#forensic-resources). These resources are examples of forensics patterns that we have developed and customers have implemented. While they might be a useful reference architecture to start with, consider modifying them or creating new forensics automation patterns based on your environment, requirements, tools, and forensics processes. 

# Summary of preparation items

 Thorough preparation for responding to security events is critical for timely and effective incident response. Incident response preparation involves people, processes, and technology. All three of these domains are equally important to preparation. You should prepare and evolve your incident response program across all three domains. 

 Table 2 summarizes the preparation items detailed in this section. 

*Table 2 – Incident response preparation items*


|  Domain  |  Preparation item  |  Action items  | 
| --- | --- | --- | 
|  People  |  Define roles and responsibilities.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  People  |  Train incident response staff on AWS.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  People  |  Understand AWS support options.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Process  |  Develop an incident response plan.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Process  |  Document and centralize architecture diagrams.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Process  |  Develop incident response playbooks.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Process  |  Run regular simulations.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Technology  |  Develop an AWS account structure.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Technology  |  Develop and implement a tagging strategy that helps responders to identify ownership and context for findings.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Technology  |  Update AWS account contact information.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Technology  |  Prepare access to AWS accounts.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Technology  |  Understand the threat landscape.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Technology  |  Select and set up logs.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 
|  Technology  |  Develop forensics capabilities.  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/security-ir/latest/userguide/preparation-summary.html)  | 

 An iterative approach is recommended for incident response preparation. These preparation items cannot all be completed overnight; you should create a plan to start small and continuously improve your incident response capabilities over time. 