

# Selection
<a name="a-selection"></a>

**Topics**
+ [PERF 1  How do you select the best performing architecture?](perf-01.md)
+ [PERF 2  How do you select your compute solution?](perf-02.md)
+ [PERF 3  How do you select your storage solution?](perf-03.md)
+ [PERF 4  How do you select your database solution?](perf-04.md)
+ [PERF 5  How do you configure your networking solution?](perf-05.md)

# PERF 1  How do you select the best performing architecture?
<a name="perf-01"></a>

 Often, multiple approaches are required for optimal performance across a workload. Well-architected systems use multiple solutions and features to improve performance. 

**Topics**
+ [PERF01-BP01 Understand the available services and resources](perf_performing_architecture_evaluate_resources.md)
+ [PERF01-BP02 Define a process for architectural choices](perf_performing_architecture_process.md)
+ [PERF01-BP03 Factor cost requirements into decisions](perf_performing_architecture_cost.md)
+ [PERF01-BP04 Use policies or reference architectures](perf_performing_architecture_use_policies.md)
+ [PERF01-BP05 Use guidance from your cloud provider or an appropriate partner](perf_performing_architecture_external_guidance.md)
+ [PERF01-BP06 Benchmark existing workloads](perf_performing_architecture_benchmark.md)
+ [PERF01-BP07 Load test your workload](perf_performing_architecture_load_test.md)

# PERF01-BP01 Understand the available services and resources
<a name="perf_performing_architecture_evaluate_resources"></a>

 Learn about and understand the wide range of services and resources available in the cloud. Identify the relevant services and configuration options for your workload, and understand how to achieve optimal performance. 

 If you are evaluating an existing workload, you must generate an inventory of the various services and resources it consumes. Your inventory helps you evaluate which components can be replaced with managed services and newer technologies. 

 **Common anti-patterns:** 
+  You use the cloud as a colocated data center. 
+  You use shared storage for all things that need persistent storage. 
+  You do not use automatic scaling. 
+  You select instance types that most closely match your existing standards, sized up where needed, rather than matching them to your workload's requirements. 
+  You deploy and manage technologies that are available as managed services. 

 **Benefits of establishing this best practice:** By considering services you may be unfamiliar with, you may be able to greatly reduce the cost of infrastructure and the effort required to maintain your services. You may be able to accelerate your time to market by deploying new services and features. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="perf01-bp01-implementation-guidance"></a>

 Inventory your workload software and architecture for related services: Gather an inventory of your workload and decide which category of products to learn more about. Identify workload components that can be replaced with managed services to increase performance and reduce operational complexity. 
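The inventory-to-managed-service mapping described above can be sketched as a simple lookup. The component and service pairings below are illustrative assumptions, not a definitive catalog:

```python
# Hypothetical inventory of self-managed workload components mapped to
# candidate managed-service replacements. All names are illustrative.
SELF_MANAGED_TO_MANAGED = {
    "self-hosted PostgreSQL on EC2": "Amazon RDS / Amazon Aurora",
    "self-hosted Redis on EC2": "Amazon ElastiCache",
    "cron jobs on an admin instance": "Amazon EventBridge Scheduler",
}

def find_managed_candidates(inventory):
    """Return components that have a known managed-service equivalent."""
    return {
        component: SELF_MANAGED_TO_MANAGED[component]
        for component in inventory
        if component in SELF_MANAGED_TO_MANAGED
    }

workload = [
    "self-hosted PostgreSQL on EC2",
    "custom in-house billing service",
    "self-hosted Redis on EC2",
]
for component, replacement in sorted(find_managed_candidates(workload).items()):
    print(f"{component} -> consider {replacement}")
```

In practice the inventory itself would come from tooling such as resource tagging or a configuration database; the point is to make the replacement decision explicit and reviewable.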

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 

# PERF01-BP02 Define a process for architectural choices
<a name="perf_performing_architecture_process"></a>

 Use internal experience and knowledge of the cloud, or external resources such as published use cases, relevant documentation, or whitepapers, to define a process to choose resources and services. You should define a process that encourages experimentation and benchmarking with the services that could be used in your workload. 

 When you write critical user stories for your architecture, you should include performance requirements, such as specifying how quickly each critical story should run. For these critical stories, you should implement additional scripted user journeys to ensure that you have visibility into how these stories perform against your requirements. 

 **Common anti-patterns:** 
+  You assume your current architecture is static and will not be updated over time. 
+  You introduce architecture changes over time without justification. 

 **Benefits of establishing this best practice:** By having a defined process for making architectural changes, you enable using the gathered data to influence your workload design over time. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Select an architectural approach: Identify the kind of architecture that meets your performance requirements. Identify constraints, such as the media for delivery (desktop, web, mobile, IoT), legacy requirements, and integrations. Identify opportunities for reuse, including refactoring. Consult other teams, architecture diagrams, and resources such as AWS Solution Architects, AWS Reference Architectures, and AWS Partners to help you choose an architecture. 

 Define performance requirements: Use the customer experience to identify the most important metrics. For each metric, identify the target, measurement approach, and priority. Define the customer experience. Document the performance experience required by customers, including how customers will judge the performance of the workload. Prioritize experience concerns for critical user stories. Include performance requirements and implement scripted user journeys to ensure that you know how the stories perform against your requirements. 
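One way to make these requirements concrete is to record each metric's target, measurement approach, and priority as data, then check measured values against them. The metric names and targets below are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class PerformanceRequirement:
    metric: str        # what is measured
    target: float      # acceptable value, in seconds
    measurement: str   # how it is measured
    priority: int      # 1 = highest

# Illustrative requirements for two critical user stories.
REQUIREMENTS = [
    PerformanceRequirement("checkout p95 latency", 0.8, "scripted user journey", 1),
    PerformanceRequirement("search p95 latency", 0.5, "real-user monitoring", 2),
]

def unmet_requirements(measured):
    """Return requirements whose measured value exceeds the target."""
    return [r for r in REQUIREMENTS
            if measured.get(r.metric, float("inf")) > r.target]

measured = {"checkout p95 latency": 1.2, "search p95 latency": 0.4}
for r in sorted(unmet_requirements(measured), key=lambda r: r.priority):
    print(f"priority {r.priority}: {r.metric} missed target of {r.target}s")
```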

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 

# PERF01-BP03 Factor cost requirements into decisions
<a name="perf_performing_architecture_cost"></a>

 Workloads often have cost requirements for operation. Use internal cost controls to select resource types and sizes based on predicted resource need. 

 Determine which workload components could be replaced with fully managed services, such as managed databases, in-memory caches, and ETL services. Reducing your operational workload allows you to focus resources on business outcomes. 

 For cost requirement best practices, refer to the *Cost-Effective Resources* section of the [Cost Optimization Pillar whitepaper](https://docs.aws.amazon.com/wellarchitected/latest/cost-optimization-pillar/welcome.html). 

 **Common anti-patterns:** 
+  You only use one family of instances. 
+  You do not evaluate licensed solutions against open-source solutions. 
+  You only use block storage. 
+  You deploy software on EC2 instances with Amazon EBS or ephemeral volumes when it is available as a managed service. 

 **Benefits of establishing this best practice:** Considering cost when making your selections will allow you to enable other investments. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Optimize workload components to reduce cost: Right size workload components and enable elasticity to reduce cost and maximize component efficiency. Determine which workload components can be replaced with managed services when appropriate, such as managed databases, in-memory caches, and reverse proxies. 
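Right-sizing can be framed as choosing the cheapest option that covers observed peak usage plus headroom. The candidate sizes and prices below are illustrative placeholders, not real AWS instance types or pricing:

```python
# Hypothetical catalog: (name, vCPU, memory GiB, hourly cost in USD).
CANDIDATES = [
    ("small",   2,  4, 0.05),
    ("medium",  4,  8, 0.10),
    ("large",   8, 16, 0.20),
    ("xlarge", 16, 32, 0.40),
]

def right_size(peak_vcpu, peak_mem_gib, headroom=1.2):
    """Pick the cheapest size that covers observed peak usage plus headroom.

    Returns None if no candidate fits, signaling that a different
    architecture (e.g. scaling out) should be considered.
    """
    need_cpu = peak_vcpu * headroom
    need_mem = peak_mem_gib * headroom
    fitting = [c for c in CANDIDATES if c[1] >= need_cpu and c[2] >= need_mem]
    return min(fitting, key=lambda c: c[3]) if fitting else None

print(right_size(3, 6))  # peak of 3 vCPU / 6 GiB plus 20% headroom
```

Real right-sizing should be driven by measured utilization over time (for example, from AWS Compute Optimizer) rather than a single peak figure.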

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1) ](https://www.youtube.com/watch?v=zt6jYJLK8sg&ref=wellarchitected) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 

# PERF01-BP04 Use policies or reference architectures
<a name="perf_performing_architecture_use_policies"></a>

 Maximize performance and efficiency by evaluating internal policies and existing reference architectures and using your analysis to select services and configurations for your workload. 

 **Common anti-patterns:** 
+  You allow unrestricted technology selection, which can increase your company's management overhead. 

 **Benefits of establishing this best practice:** Establishing a policy for architecture, technology, and vendor choices will allow decisions to be made quickly. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Deploy your workload using existing policies or reference architectures: Integrate the services into your cloud deployment, then use your performance tests to ensure that you can continue to meet your performance requirements. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 

# PERF01-BP05 Use guidance from your cloud provider or an appropriate partner
<a name="perf_performing_architecture_external_guidance"></a>

 Use cloud company resources, such as solutions architects, professional services, or an appropriate partner to guide your decisions. These resources can help review and improve your architecture for optimal performance. 

 Reach out to AWS for assistance when you need additional guidance or product information. AWS Solutions Architects and [AWS Professional Services](https://aws.amazon.com/professional-services/) provide guidance for solution implementation. [AWS Partners](https://aws.amazon.com/partners/) provide AWS expertise to help you unlock agility and innovation for your business. 

 **Common anti-patterns:** 
+  You use AWS as a common data center provider. 
+  You use AWS services in ways they were not designed for. 

 **Benefits of establishing this best practice:** Consulting with your provider or a partner will give you confidence in your decisions. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Reach out to AWS resources for assistance: AWS Solutions Architects and Professional Services provide guidance for solution implementation. APN Partners provide AWS expertise to help you unlock agility and innovation for your business. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 

# PERF01-BP06 Benchmark existing workloads
<a name="perf_performing_architecture_benchmark"></a>

 Benchmark the performance of an existing workload to understand how it performs on the cloud. Use the data collected from benchmarks to drive architectural decisions. 

 Use benchmarking with synthetic tests and real-user monitoring to generate data about how your workload’s components perform. Benchmarking is generally quicker to set up than load testing and is used to evaluate the technology for a particular component. Benchmarking is often used at the start of a new project, when you lack a full solution to load test. 

 You can either build your own custom benchmark tests, or you can use an industry standard test, such as [TPC-DS](http://www.tpc.org/tpcds/) to benchmark your data warehousing workloads. Industry benchmarks are helpful when comparing environments. Custom benchmarks are useful for targeting specific types of operations that you expect to make in your architecture. 

 When benchmarking, it is important to pre-warm your test environment to ensure valid results. Run the same benchmark multiple times to ensure that you’ve captured any variance over time. 

 Because benchmarks are generally faster to run than load tests, they can be used earlier in the deployment pipeline and provide faster feedback on performance deviations. When you evaluate a significant change in a component or service, a benchmark can be a quick way to see if you can justify the effort to make the change. Using benchmarking in conjunction with load testing is important because load testing informs you about how your workload will perform in production. 
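A minimal benchmark harness that applies the pre-warm and repeated-run advice above might look like this (timings will vary by machine, so the workload here is only a stand-in):

```python
import statistics
import time

def benchmark(fn, warmup_runs=3, measured_runs=10):
    """Run fn repeatedly after a warm-up phase and report timing spread."""
    for _ in range(warmup_runs):      # pre-warm: discard these results
        fn()
    samples = []
    for _ in range(measured_runs):    # repeat to capture variance over time
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return {
        "runs": measured_runs,
        "mean_s": statistics.mean(samples),
        "stdev_s": statistics.stdev(samples),
    }

# Example: benchmark a CPU-bound stand-in operation.
result = benchmark(lambda: sum(i * i for i in range(50_000)))
print(f"mean {result['mean_s'] * 1000:.2f} ms over {result['runs']} runs")
```

Reporting the standard deviation alongside the mean makes run-to-run variance visible, which is the point of running the same benchmark multiple times.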

 **Common anti-patterns:** 
+  You rely on common benchmarks that are not indicative of your workload characteristics. 
+  You rely on customer feedback and perceptions as your only benchmark. 

 **Benefits of establishing this best practice:** Benchmarking your current implementation allows you to measure the improvement in performance. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Monitor performance during development: Implement processes that provide visibility into performance as your workload evolves. 

 Integrate into your delivery pipeline: Automatically run load tests in your delivery pipeline. Compare the test results against pre-defined key performance indicators (KPIs) and thresholds to ensure that you continue to meet performance requirements. 
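A pipeline gate that compares load-test results against pre-defined KPIs can be a small function. The KPI names and thresholds here are hypothetical:

```python
# KPI thresholds a load-test run must satisfy before a deploy proceeds.
KPI_THRESHOLDS = {
    "p95_latency_ms": 800,   # must be at or below
    "error_rate_pct": 1.0,   # must be at or below
}

def gate_deployment(results):
    """Return (passed, failures) for a load-test results dict.

    A missing KPI counts as a failure, so incomplete test runs
    cannot slip through the gate.
    """
    failures = [
        f"{kpi}={results.get(kpi)} exceeds threshold {limit}"
        for kpi, limit in KPI_THRESHOLDS.items()
        if results.get(kpi, float("inf")) > limit
    ]
    return (not failures, failures)

ok, failures = gate_deployment({"p95_latency_ms": 950, "error_rate_pct": 0.2})
print("PASS" if ok else "FAIL: " + "; ".join(failures))
```

In a real pipeline, a failed gate would fail the build step, blocking promotion until performance is restored.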

 Test user journeys: Use synthetic or sanitized versions of production data (remove sensitive or identifying information) for load testing. Exercise your entire architecture by using replayed or pre-programmed user journeys through your application at scale. 

 Real-user monitoring: Use CloudWatch RUM to help you collect and view client-side data about your application performance. Use this data to help establish your real-user performance benchmarks. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Architecture Center](https://aws.amazon.com/architecture/) 
+  [AWS Partner Network](https://aws.amazon.com/partners/) 
+  [AWS Solutions Library](https://aws.amazon.com/solutions/) 
+  [AWS Knowledge Center](https://aws.amazon.com/premiumsupport/knowledge-center/) 
+  [Amazon CloudWatch RUM](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-RUM.html) 
+  [Amazon CloudWatch Synthetics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [This is my Architecture](https://aws.amazon.com/architecture/this-is-my-architecture/) 
+  [Optimize applications through Amazon CloudWatch RUM](https://www.youtube.com/watch?v=NMaeujY9A9Y) 
+  [Demo of Amazon CloudWatch Synthetics](https://www.youtube.com/watch?v=hF3NM9j-u7I) 

 **Related examples:** 
+  [AWS Samples](https://github.com/aws-samples) 
+  [AWS SDK Examples](https://github.com/awsdocs/aws-doc-sdk-examples) 
+  [Distributed Load Tests](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) 
+  [Measure page load time with Amazon CloudWatch Synthetics](https://github.com/aws-samples/amazon-cloudwatch-synthetics-page-performance) 
+  [Amazon CloudWatch RUM Web Client](https://github.com/aws-observability/aws-rum-web) 

# PERF01-BP07 Load test your workload
<a name="perf_performing_architecture_load_test"></a>

 Deploy your latest workload architecture on the cloud using different resource types and sizes. Monitor the deployment to capture performance metrics that identify bottlenecks or excess capacity. Use this performance information to design or improve your architecture and resource selection. 

 Load testing uses your *actual* workload so that you can see how your solution performs in a production environment. Load tests must be run using synthetic or sanitized versions of production data (remove sensitive or identifying information). Use replayed or pre-programmed user journeys through your workload at scale that exercise your entire architecture. Automatically carry out load tests as part of your delivery pipeline, and compare the results against pre-defined KPIs and thresholds. This ensures that you continue to achieve required performance. 

 **Common anti-patterns:** 
+  You load test individual parts of your workload but not your entire workload. 
+  You load test on infrastructure that is not the same as your production environment. 
+  You only conduct load testing to your expected load and not beyond, to help foresee where you may have future problems. 
+  You perform load testing without informing AWS Support, and your test is blocked because it resembles a denial-of-service event. 

 **Benefits of establishing this best practice:** Measuring your performance under a load test will show you where you will be impacted as load increases. This can provide you with the capability of anticipating needed changes before they impact your workload. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>

 Validate your approach with load testing: Load test a proof-of-concept to find out if you meet your performance requirements. You can use AWS services to run production-scale environments to test your architecture. Because you only pay for the test environment when it is needed, you can carry out full-scale testing at a fraction of the cost of using an on-premises environment. 

 Monitor metrics: Amazon CloudWatch can collect metrics across the resources in your architecture. You can also collect and publish custom metrics to surface business or derived metrics. Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached. 

 Test at scale: Load testing uses your actual workload so you can see how your solution performs in a production environment. You can use AWS services to run production-scale environments to test your architecture. Because you only pay for the test environment when it is needed, you can run full-scale testing at a lower cost than using an on-premises environment. Take advantage of the AWS Cloud to test your workload to discover where it fails to scale, or if it scales in a non-linear way. For example, use Spot Instances to generate loads at low cost and discover bottlenecks before they are experienced in production. 
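The idea of testing at increasing scale to spot non-linear behavior can be illustrated locally with a concurrent load generator. `handle_request` is a CPU-bound stand-in for a real request, so absolute numbers are not meaningful:

```python
import concurrent.futures
import time

def handle_request():
    """Stand-in for one request against the system under test."""
    return sum(i * i for i in range(20_000))

def throughput_at(concurrency, requests=40):
    """Measure requests/second completed at a given concurrency level."""
    start = time.perf_counter()
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as pool:
        list(pool.map(lambda _: handle_request(), range(requests)))
    return requests / (time.perf_counter() - start)

# Increase concurrency and watch whether throughput scales linearly;
# a plateau or drop indicates a bottleneck worth investigating.
for level in (1, 4, 16):
    print(f"concurrency {level}: {throughput_at(level):.1f} req/s")
```

Against a real workload you would drive this load from separate, inexpensive capacity (such as Spot Instances, as noted above) and plot throughput and latency per concurrency level.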

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html) 
+  [Building AWS CloudFormation Templates using CloudFormer](https://aws.amazon.com/blogs/devops/building-aws-cloudformation-templates-using-cloudformer/) 
+  [Amazon CloudWatch RUM](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-RUM.html) 
+  [Amazon CloudWatch Synthetics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch_Synthetics_Canaries.html) 
+  [Distributed Load Testing on AWS](https://docs.aws.amazon.com/solutions/latest/distributed-load-testing-on-aws/welcome.html) 

 **Related videos:** 
+  [Introducing The Amazon Builders’ Library (DOP328)](https://www.youtube.com/watch?v=sKRdemSirDM) 
+  [Optimize applications through Amazon CloudWatch RUM](https://www.youtube.com/watch?v=NMaeujY9A9Y) 
+  [Demo of Amazon CloudWatch Synthetics](https://www.youtube.com/watch?v=hF3NM9j-u7I) 

 **Related examples:** 
+  [Distributed Load Testing on AWS](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) 

# PERF 2  How do you select your compute solution?
<a name="perf-02"></a>

The optimal compute solution for a workload varies based on application design, usage patterns, and configuration settings. Architectures can use different compute solutions for various components and enable different features to improve performance. Selecting the wrong compute solution for an architecture can lead to lower performance efficiency.

**Topics**
+ [PERF02-BP01 Evaluate the available compute options](perf_select_compute_evaluate_options.md)
+ [PERF02-BP02 Understand the available compute configuration options](perf_select_compute_config_options.md)
+ [PERF02-BP03 Collect compute-related metrics](perf_select_compute_collect_metrics.md)
+ [PERF02-BP04 Determine the required configuration by right-sizing](perf_select_compute_right_sizing.md)
+ [PERF02-BP05 Use the available elasticity of resources](perf_select_compute_elasticity.md)
+ [PERF02-BP06 Re-evaluate compute needs based on metrics](perf_select_compute_use_metrics.md)

# PERF02-BP01 Evaluate the available compute options
<a name="perf_select_compute_evaluate_options"></a>

 Understand how your workload can benefit from the use of different compute options, such as instances, containers and functions. 

 **Desired outcome:** By understanding all of the compute options available, you will be aware of the opportunities to increase performance, reduce unnecessary infrastructure costs, and lower the operational effort required to maintain your workload. You can also accelerate your time to market when you deploy new services and features. 

 **Common anti-patterns:** 
+  In a post-migration workload, using the same compute solution that was being used on premises. 
+  Lacking awareness of the cloud compute solutions and how those solutions might improve your compute performance. 
+  Oversizing an existing compute solution to meet scaling or performance requirements, when an alternative compute solution would align to your workload characteristics more precisely. 

 **Benefits of establishing this best practice:** By identifying the compute requirements and evaluating the available compute solutions, business stakeholders and engineering teams will understand the benefits and limitations of using the selected compute solution. The selected compute solution should fit the workload performance criteria. Key criteria include processing needs, traffic patterns, data access patterns, scaling needs, and latency requirements. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Understand the virtualization, containerization, and management solutions that can benefit your workload and meet your performance requirements. A workload can contain multiple types of compute solutions, each with differing characteristics. Based on your workload scale and compute requirements, a compute solution can be selected and configured to meet your needs. The cloud architect should learn the advantages and disadvantages of instances, containers, and functions. The following steps will help you select a compute solution that matches your workload characteristics and performance requirements. 


|  **Type**  |  **Server**  |  **Containers**  |  **Function**  | 
| --- | --- | --- | --- | 
|  AWS service  |  Amazon Elastic Compute Cloud (Amazon EC2)  |  Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS)  |  AWS Lambda  | 
|  Key Characteristics  |  Has dedicated option for hardware license requirements, Placement Options, and a large selection of different instance families based on compute metrics  |  Easy deployment, consistent environments, runs on top of EC2 instances, Scalable  |  Short runtime (15 minutes or less), maximum memory and CPU are not as high as other services, Managed hardware layer, Scales to millions of concurrent requests  | 
|  Common use-cases  |  Lift and shift migrations, monolithic applications, hybrid environments, enterprise applications  |  Microservices, hybrid environments  |  Microservices, event-driven applications  | 
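
As a rough illustration, the table can be collapsed into a selection heuristic. This is a simplification under assumed inputs, not a substitute for evaluating each service against your requirements:

```python
def choose_compute(runtime_minutes, containerized, orchestrator=None):
    """Map basic workload characteristics to a candidate compute solution,
    loosely following the comparison table above. Simplified heuristic."""
    if runtime_minutes <= 15 and not containerized:
        return "AWS Lambda"      # short-running, event-driven code
    if containerized:
        # Choose by the orchestrator your engineering team prefers.
        return "Amazon EKS" if orchestrator == "kubernetes" else "Amazon ECS"
    return "Amazon EC2"          # long-running or hardware-sensitive workloads

print(choose_compute(5, containerized=False))                       # AWS Lambda
print(choose_compute(60, containerized=True, orchestrator="kubernetes"))  # Amazon EKS
```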

 

 **Implementation steps:** 

1.  Select the location of where the compute solution must reside by evaluating [PERF05-BP06 Choose your workload’s location based on network requirements](perf_select_network_location.md). This location will limit the types of compute solution available to you. 

1.  Identify the type of compute solution that works with your location and application requirements. 

   1.  [Amazon EC2](https://aws.amazon.com/ec2/) virtual server instances come in a wide variety of families and sizes. They offer many capabilities, including solid state drives (SSDs) and graphics processing units (GPUs). EC2 instances offer the greatest flexibility of instance choice. When you launch an EC2 instance, the instance type that you specify determines the hardware of your instance. Each instance type offers different compute, memory, and storage capabilities. Instance types are grouped into instance families based on these capabilities. Typical use cases include running enterprise applications, high performance computing (HPC), training and deploying machine learning applications, and running cloud native applications. 

   1.  [Amazon ECS](https://aws.amazon.com/ecs/) is a fully managed container orchestration service that allows you to automatically run and manage containers on a cluster of EC2 instances or serverless instances using AWS Fargate. You can use Amazon ECS with other services such as Amazon Route 53, AWS Secrets Manager, AWS Identity and Access Management (IAM), and Amazon CloudWatch. Amazon ECS is recommended if your application is containerized and your engineering team prefers Docker containers. 

   1.  [Amazon EKS](https://aws.amazon.com/eks/) is a fully managed Kubernetes service. You can choose to run your EKS clusters using AWS Fargate, removing the need to provision and manage servers. Managing Amazon EKS is simplified by integrations with AWS services such as Amazon CloudWatch, Auto Scaling groups, AWS Identity and Access Management (IAM), and Amazon Virtual Private Cloud (VPC). When using containers, you must use compute metrics to select the optimal type for your workload, similar to how you use compute metrics to select your EC2 or AWS Fargate instance types. Amazon EKS is recommended if your application is containerized and your engineering team prefers Kubernetes as its orchestrator. 

   1.  You can use [AWS Lambda](https://aws.amazon.com/lambda/) to run code that fits within the supported runtime, memory, and CPU options. Simply upload your code, and AWS Lambda manages everything required to run and scale it. You can set up your code to be triggered automatically by other AWS services or call it directly. Lambda is recommended for short-running microservice architectures developed for the cloud.  

1.  After you have experimented with your new compute solution, plan your migration and validate your performance metrics. This is a continual process, see [PERF02-BP04 Determine the required configuration by right-sizing](perf_select_compute_right_sizing.md). 

 **Level of effort for the implementation plan:** If a workload is moving from one compute solution to another, there could be a *moderate* level of effort involved in refactoring the application.   

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Compute with AWS ](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [EC2 Instance Types ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html?ref=wellarchitected) 
+  [Processor State Control for Your EC2 Instance ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html?ref=wellarchitected) 
+  [EKS Containers: EKS Worker Nodes ](https://docs.aws.amazon.com/eks/latest/userguide/worker.html?ref=wellarchitected) 
+  [Amazon ECS Containers: Amazon ECS Container Instances ](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html?ref=wellarchitected) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html?ref=wellarchitected#function-configuration) 
+  [Prescriptive Guidance for Containers](https://aws.amazon.com/prescriptive-guidance/?apg-all-cards.sort-by=item.additionalFields.sortText&apg-all-cards.sort-order=desc&awsf.apg-new-filter=*all&awsf.apg-content-type-filter=*all&awsf.apg-code-filter=*all&awsf.apg-category-filter=categories%23containers&awsf.apg-rtype-filter=*all&awsf.apg-isv-filter=*all&awsf.apg-product-filter=*all&awsf.apg-env-filter=*all) 
+  [Prescriptive Guidance for Serverless](https://aws.amazon.com/prescriptive-guidance/?apg-all-cards.sort-by=item.additionalFields.sortText&apg-all-cards.sort-order=desc&awsf.apg-new-filter=*all&awsf.apg-content-type-filter=*all&awsf.apg-code-filter=*all&awsf.apg-category-filter=categories%23serverless&awsf.apg-rtype-filter=*all&awsf.apg-isv-filter=*all&awsf.apg-product-filter=*all&awsf.apg-env-filter=*all) 

 **Related videos:** 
+  [How to choose compute option for startups](https://aws.amazon.com/startups/start-building/how-to-choose-compute-option/) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Amazon EC2 foundations (CMP211-R2) ](https://www.youtube.com/watch?v=kMMybKqC2Y0&ref=wellarchitected) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system ](https://www.youtube.com/watch?v=rUY-00yFlE4&ref=wellarchitected) 
+  [Deliver high-performance ML inference with AWS Inferentia (CMP324-R1) ](https://www.youtube.com/watch?v=17r1EapAxpk&ref=wellarchitected) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1) ](https://www.youtube.com/watch?v=_dvh4P2FVbw&ref=wellarchitected) 

 **Related examples:** 
+  [Migrating the web application to containers](https://application-migration-with-aws.workshop.aws/en/container-migration.html) 
+  [Run a Serverless Hello World](https://aws.amazon.com/getting-started/hands-on/run-serverless-code/) 

# PERF02-BP02 Understand the available compute configuration options
<a name="perf_select_compute_config_options"></a>

 Each compute solution has options and configurations available to you to support your workload characteristics. Learn how various options complement your workload, and which configuration options are best for your application. Examples of these options include instance family, sizes, features (GPU, I/O), bursting, time-outs, function sizes, container instances, and concurrency. 

 **Desired outcome:** The workload characteristics including CPU, memory, network throughput, GPU, IOPS, traffic patterns, and data access patterns are documented and used to configure the compute solution to match the workload characteristics. Each of these metrics plus custom metrics specific to your workload are recorded, monitored, and then used to optimize the compute configuration to best meet the requirements. 

 **Common anti-patterns:** 
+  You use the same compute solution that was used on premises. 
+  You do not review compute options or instance families to match your workload characteristics. 
+  You oversize the compute solution to guarantee bursting capability. 
+  You use multiple compute management platforms for the same workload. 

 **Benefits of establishing this best practice:** Be familiar with the AWS compute offerings so that you can determine the correct solution for each of your workloads. After you have selected the compute offerings for your workload, you can quickly experiment with them to determine how well they meet your workload needs. A compute solution that is optimized for your workload characteristics will increase performance, lower costs, and improve reliability.

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 If your workload has been using the same compute option for more than four weeks and you anticipate that the characteristics will remain the same in the future, you can use [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) to provide a recommendation based on your compute characteristics. If AWS Compute Optimizer is not an option due to a lack of metrics, [an unsupported instance type](https://docs.aws.amazon.com/compute-optimizer/latest/ug/requirements.html#requirements-ec2-instances), or a foreseeable change in your characteristics, you must predict your metrics based on load testing and experimentation.  
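When Compute Optimizer is an option, its recommendations are available through the `GetEC2InstanceRecommendations` API. The sketch below works over a hand-written, abbreviated stand-in for that response (the sample dict and the `downsizing_candidates` helper are illustrative; consult the Compute Optimizer API reference for the full schema):

```python
# Abbreviated, illustrative stand-in for a Compute Optimizer
# get_ec2_instance_recommendations response.
sample_response = {
    "instanceRecommendations": [
        {
            "currentInstanceType": "m5.2xlarge",
            "finding": "OVER_PROVISIONED",
            "recommendationOptions": [
                {"instanceType": "m5.xlarge", "performanceRisk": 1.0},
                {"instanceType": "m5.large", "performanceRisk": 2.0},
            ],
        },
        {
            "currentInstanceType": "c5.large",
            "finding": "OPTIMIZED",
            "recommendationOptions": [],
        },
    ]
}

def downsizing_candidates(response):
    """Return (current, recommended) pairs for over-provisioned instances,
    picking the lowest-performance-risk option for each."""
    pairs = []
    for rec in response["instanceRecommendations"]:
        if rec["finding"] == "OVER_PROVISIONED" and rec["recommendationOptions"]:
            best = min(rec["recommendationOptions"],
                       key=lambda opt: opt["performanceRisk"])
            pairs.append((rec["currentInstanceType"], best["instanceType"]))
    return pairs

print(downsizing_candidates(sample_response))  # [('m5.2xlarge', 'm5.xlarge')]
```

With credentials configured, the real response would come from `boto3.client("compute-optimizer").get_ec2_instance_recommendations(...)`; any automated resize decision should still be validated with load tests.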

 **Implementation steps:** 

1.  Are you running on EC2 instances or containers with the EC2 launch type? 

   1.  Can your workload use GPUs to increase performance? 

      1.  [Accelerated Computing](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#Accelerated_Computing) instances are GPU-based instances that provide the highest performance for machine learning training, inference and high performance computing. 

   1.  Does your workload run machine learning inference applications? 

      1.  [AWS Inferentia (Inf1)](https://aws.amazon.com/ec2/instance-types/inf1/) — Inf1 instances are built to support machine learning inference applications. Using Inf1 instances, customers can run large-scale machine learning inference applications, such as image recognition, speech recognition, natural language processing, personalization, and fraud detection. You can build a model in one of the popular machine learning frameworks, such as TensorFlow, PyTorch, or MXNet, and use GPU instances to train your model. After your machine learning model is trained to meet your requirements, you can deploy your model on Inf1 instances by using [AWS Neuron](https://aws.amazon.com/machine-learning/neuron/), a specialized software development kit (SDK) consisting of a compiler, runtime, and profiling tools that optimize the machine learning inference performance of Inferentia chips. 

   1.  Does your workload integrate with the low-level hardware to improve performance?  

      1.  [Field Programmable Gate Arrays (FPGA)](https://aws.amazon.com/ec2/instance-types/f1/) — Using FPGAs, you can optimize your workloads with custom hardware-accelerated execution for your most demanding workloads. You can define your algorithms using supported general-purpose programming languages such as C or Go, or hardware-oriented languages such as Verilog or VHDL. 

   1.  Do you have at least four weeks of metrics, and can you predict that your traffic pattern and metrics will remain about the same in the future? 

      1.  Use [Compute Optimizer](https://aws.amazon.com/compute-optimizer/) to get a machine learning recommendation on which compute configuration best matches your compute characteristics. 

   1.  Is your workload performance constrained by the CPU metrics?  

      1.  [Compute-optimized](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#Compute_Optimized) instances are ideal for workloads that require high-performance processors.  

   1.  Is your workload performance constrained by the memory metrics?  

      1.  [Memory-optimized](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#Memory_Optimized) instances deliver large amounts of memory to support memory-intensive workloads. 

   1.  Is your workload performance constrained by IOPS? 

      1.  [Storage-optimized](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#Storage_Optimized) instances are designed for workloads that require high, sequential read and write access (IOPS) to local storage. 

   1.  Do your workload characteristics represent a balanced need across all metrics? 

      1.  Does your workload CPU need to burst to handle spikes in traffic? 

         1.  [Burstable Performance](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#Instance_Features) instances are similar to compute-optimized instances, except that they offer the ability to burst past a fixed CPU baseline. 

      1.  [General Purpose](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#General_Purpose) instances provide a balance of all characteristics to support a variety of workloads. 

   1.  Is your compute instance running on Linux and constrained by network throughput on the network interface card? 

      1.  Review [Performance Question 5, Best Practice 2: Evaluate available networking features](https://docs.aws.amazon.com/wellarchitected/latest/performance-efficiency-pillar/network-architecture-selection.html) to find the right instance type and family to meet your performance needs. 

   1.  Does your workload need consistent and predictable instances in a specific Availability Zone that you can commit to for a year?  

      1.  [Reserved Instances](https://aws.amazon.com/ec2/pricing/reserved-instances/) provide a capacity reservation in a specific Availability Zone. Reserved Instances are ideal when you require compute power in a specific Availability Zone.  

   1.  Does your workload have licenses that require dedicated hardware? 

      1.  [Dedicated Hosts](https://aws.amazon.com/ec2/dedicated-hosts/) support existing software licenses and help you meet compliance requirements. 

   1.  Does your compute solution burst and require synchronous processing? 

      1.  [On-Demand Instances](https://aws.amazon.com/ec2/pricing/on-demand/) let you pay for compute capacity by the hour or second with no long-term commitment. These instances are a good fit for bursting above baseline performance needs. 

   1.  Is your compute solution stateless, fault-tolerant, and asynchronous?  

      1.  [Spot Instances](https://aws.amazon.com/ec2/spot/) let you take advantage of unused instance capacity for your stateless, fault-tolerant workloads.  

1.  Are you running containers on [Fargate](https://aws.amazon.com/fargate/)? 

   1.  Is your task performance constrained by the memory or CPU? 

      1.  Use the [task size](https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/capacity-tasksize.html) settings to adjust your task's memory or CPU. 

   1.  Is your performance being affected by your traffic pattern bursts? 

      1.  Use the [Auto Scaling](https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/capacity-autoscaling.html) configuration to match your traffic patterns. 

1.  Is your compute solution on [Lambda](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-features.html)? 

   1.  Do you have at least four weeks of metrics, and can you predict that your traffic pattern and metrics will remain about the same in the future? 

      1.  Use [Compute Optimizer](https://aws.amazon.com/compute-optimizer/) to get a machine learning recommendation on which compute configuration best matches your compute characteristics. 

   1.  Do you lack the metrics needed to use AWS Compute Optimizer? 

      1.  If you do not have metrics available to use Compute Optimizer, use [AWS Lambda Power Tuning](https://docs.aws.amazon.com/lambda/latest/operatorguide/profile-functions.html) to help select the best configuration. 

   1.  Is your function performance constrained by the memory or CPU? 

      1.  Configure your [Lambda memory](https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html#configuration-memory-console) to meet your performance needs. 

   1.  Is your function timing out on execution? 

      1.  Change the [timeout settings](https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html). 

   1.  Is your function performance constrained by bursts of activity and concurrency?  

      1.  Configure the [concurrency settings](https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html) to meet your performance requirements. 

   1.  Does your function run asynchronously and fail on retries? 

      1.  Configure the maximum age of the event and the maximum retry limit in the [asynchronous configuration](https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html) settings. 
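The EC2 branch of the checklist above is essentially a first-match decision tree, which can be sketched as a small function. The profile fields and family labels below are illustrative assumptions, not an official AWS mapping:

```python
def suggest_instance_family(profile):
    """Map a coarse workload profile to an EC2 instance family category.

    `profile` is a dict of booleans mirroring the checklist questions;
    the first matching question wins, as in the checklist itself.
    All keys and return labels are illustrative.
    """
    if profile.get("uses_gpu"):
        return "accelerated-computing"   # GPU-based instances
    if profile.get("ml_inference"):
        return "inferentia"              # e.g., Inf1
    if profile.get("custom_hardware"):
        return "fpga"                    # e.g., F1
    if profile.get("cpu_bound"):
        return "compute-optimized"
    if profile.get("memory_bound"):
        return "memory-optimized"
    if profile.get("iops_bound"):
        return "storage-optimized"
    if profile.get("bursty_cpu"):
        return "burstable"
    return "general-purpose"             # balanced default

print(suggest_instance_family({"memory_bound": True}))  # memory-optimized
print(suggest_instance_family({}))                      # general-purpose
```

A real selection would also weigh pricing model (Reserved, On-Demand, Spot) and network needs, which this sketch deliberately omits.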

 **Level of effort for the implementation plan:** To establish this best practice, you must be aware of your current compute characteristics and metrics. Gathering those metrics, establishing a baseline, and then using those metrics to identify the ideal compute option is a *low* to *moderate* level of effort. This is best validated by load tests and experimentation. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Compute with AWS ](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 
+  [EC2 Instance Types ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html?ref=wellarchitected) 
+  [Processor State Control for Your EC2 Instance ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html?ref=wellarchitected) 
+  [EKS Containers: EKS Worker Nodes ](https://docs.aws.amazon.com/eks/latest/userguide/worker.html?ref=wellarchitected) 
+  [Amazon ECS Containers: Amazon ECS Container Instances ](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html?ref=wellarchitected) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html?ref=wellarchitected#function-configuration) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2) ](https://www.youtube.com/watch?v=kMMybKqC2Y0&ref=wellarchitected) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system ](https://www.youtube.com/watch?v=rUY-00yFlE4&ref=wellarchitected) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1) ](https://www.youtube.com/watch?v=zt6jYJLK8sg&ref=wellarchitected) 

 **Related examples:** 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 

# PERF02-BP03 Collect compute-related metrics
<a name="perf_select_compute_collect_metrics"></a>

To understand how your compute resources are performing, you must record and track the utilization of various systems. This data can be used to make more accurate determinations about resource requirements.  

 Workloads can generate large volumes of data such as metrics, logs, and events. Determine whether your existing storage, monitoring, and observability services can manage the data generated. Identify which metrics reflect resource utilization and can be collected, aggregated, and correlated on a single platform. Those metrics should represent all your workload resources, applications, and services, so you can easily gain system-wide visibility and quickly identify performance improvement opportunities and issues.

 **Desired outcome:** All metrics related to the compute-related resources are identified, collected, aggregated, and correlated on a single platform with retention implemented to support cost and operational goals. 

 **Common anti-patterns:** 
+  You only use manual log file searching for metrics.  
+  You only publish metrics to internal tools. 
+  You only use the default metrics recorded by your selected monitoring software. 
+  You only review metrics when there is an issue. 

 

 **Benefits of establishing this best practice:** To monitor the performance of your workloads, you must record multiple performance metrics over a period of time. These metrics allow you to detect anomalies in performance. They will also help gauge performance against business metrics to ensure that you are meeting your workload needs. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Identify, collect, aggregate, and correlate compute-related metrics. Using a service such as Amazon CloudWatch can make the implementation quicker and easier to maintain. In addition to the default metrics recorded, identify and track additional system-level metrics within your workload. Record data such as CPU utilization, memory, disk I/O, and network inbound and outbound metrics to gain insight into utilization levels or bottlenecks. This data is crucial to understanding how the workload is performing and how the compute solution is utilized. Use these metrics as part of a data-driven approach to actively tune and optimize your workload's resources.  
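As one concrete example, custom system-level metrics can be published to CloudWatch through the `PutMetricData` API. The sketch below only builds the request payload, so it runs without AWS credentials; the boto3 call is shown as a comment, and the namespace and dimension values are illustrative:

```python
import datetime

def build_metric_data(instance_id, cpu_pct, mem_pct):
    """Build a CloudWatch MetricData payload for two custom metrics.

    Metric names mirror common system metrics; the instance ID and
    values passed in here are illustrative.
    """
    now = datetime.datetime.now(datetime.timezone.utc)
    dimensions = [{"Name": "InstanceId", "Value": instance_id}]
    return [
        {"MetricName": "CPUUtilization", "Dimensions": dimensions,
         "Timestamp": now, "Value": cpu_pct, "Unit": "Percent"},
        {"MetricName": "MemoryUtilization", "Dimensions": dimensions,
         "Timestamp": now, "Value": mem_pct, "Unit": "Percent"},
    ]

payload = build_metric_data("i-0123456789abcdef0", 42.5, 71.0)
# With credentials configured, the payload would be published with:
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="MyWorkload/Compute", MetricData=payload)
print([m["MetricName"] for m in payload])  # ['CPUUtilization', 'MemoryUtilization']
```

In practice, the CloudWatch agent or AWS Distro for OpenTelemetry collector (listed in the steps below) handles this publishing for you; the sketch only shows the shape of the data being collected.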

 **Implementation steps:** 

1.  Which compute solution metrics are important to track? 

   1.  [EC2 default metrics](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html) 

   1.  [Amazon ECS default metrics](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html) 

   1.  [EKS default metrics](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/kubernetes-eks-metrics.html) 

   1.  [Lambda default metrics](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-access-metrics.html) 

   1.  [EC2 memory and disk metrics](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html) 

1.  Do you currently have an approved logging and monitoring solution? 

   1.  [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) 

   1.  [AWS Distro for OpenTelemetry](https://aws.amazon.com/otel/) 

   1.  [Amazon Managed Service for Prometheus](https://docs.aws.amazon.com/grafana/latest/userguide/prometheus-data-source.html) 

1.  Have you identified and configured your data retention policies to match your security and operational goals? 

   1.  [Default data retention for CloudWatch metrics](https://aws.amazon.com/cloudwatch/faqs/#AWS_resource_.26_custom_metrics_monitoring) 

   1.  [Default data retention for CloudWatch Logs](https://aws.amazon.com/cloudwatch/faqs/#Log_management) 

1.  How do you deploy your metric and log aggregation agents? 

   1.  [AWS Systems Manager automation](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-automation.html?ref=wellarchitected) 

   1.  [OpenTelemetry Collector](https://aws-otel.github.io/docs/getting-started/collector) 

 **Level of effort for the implementation plan:** There is a *medium* level of effort to identify, track, collect, aggregate, and correlate metrics from all compute resources. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon CloudWatch documentation](https://docs.aws.amazon.com/cloudwatch/index.html?ref=wellarchitected) 
+  [Collect metrics and logs from Amazon EC2 instances and on-premises servers with the CloudWatch Agent](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html?ref=wellarchitected) 
+  [Accessing Amazon CloudWatch Logs for AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-logs.html?ref=wellarchitected) 
+  [Using CloudWatch Logs with container instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_cloudwatch_logs.html?ref=wellarchitected) 
+  [Publish custom metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html?ref=wellarchitected) 
+  [AWS Answers: Centralized Logging](https://aws.amazon.com/answers/logging/centralized-logging/?ref=wellarchitected) 
+  [AWS Services That Publish CloudWatch Metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CW_Support_For_AWS.html?ref=wellarchitected) 
+  [Monitoring Amazon EKS on AWS Fargate](https://aws.amazon.com/blogs/containers/monitoring-amazon-eks-on-aws-fargate-using-prometheus-and-grafana/) 

 

 **Related videos:** 
+  [Application Performance Management on AWS](https://www.youtube.com/watch?v=5T4stR-HFas&ref=wellarchitected) 
+  [Build a Monitoring Plan](https://www.youtube.com/watch?v=OMmiGETJpfU&ref=wellarchitected) 

 

 **Related examples:** 
+  [Level 100: Monitoring with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_with_cloudwatch_dashboards/) 
+  [Level 100: Monitoring Windows EC2 instance with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_windows_ec2_cloudwatch/) 
+  [Level 100: Monitoring an Amazon Linux EC2 instance with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_linux_ec2_cloudwatch/) 

# PERF02-BP04 Determine the required configuration by right-sizing
<a name="perf_select_compute_right_sizing"></a>

 Analyze the various performance characteristics of your workload and how these characteristics relate to memory, network, and CPU usage. Use this data to choose resources that best match your workload's profile. For example, a memory-intensive workload, such as a database, could be served best by the r-family of instances. However, a bursting workload can benefit more from an elastic container system. 

 **Common anti-patterns:** 
+  You choose the largest instance available for all workloads. 
+  You standardize all instance types to one type for ease of management. 

 **Benefits of establishing this best practice:** Being familiar with the AWS compute offerings allows you to determine the correct solution for your various workloads. After you have selected the various compute offerings for your workload, you have the agility to quickly experiment with those compute offerings to determine which ones meet the needs of your workload. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Modify your workload configuration by right-sizing: To optimize both performance and overall efficiency, determine which resources your workload needs. Choose memory-optimized instances for systems that require more memory than CPU, or compute-optimized instances for components that do data processing that is not memory-intensive. Right-sizing enables your workload to perform as well as possible while using only the required resources. 
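A simple way to reason about the memory-versus-CPU choice above is the observed memory-to-vCPU demand ratio. The thresholds and category labels in this sketch are illustrative assumptions, not official sizing guidance:

```python
def pick_family(peak_mem_gib, peak_vcpus):
    """Suggest an instance family category from observed peak demand.

    Rough GiB-per-vCPU ratios (illustrative): compute-optimized ~2:1,
    general-purpose ~4:1, memory-optimized ~8:1 and above.
    """
    ratio = peak_mem_gib / peak_vcpus
    if ratio <= 2:
        return "compute-optimized"
    if ratio <= 4:
        return "general-purpose"
    return "memory-optimized"

print(pick_family(peak_mem_gib=64, peak_vcpus=8))  # 8:1 -> memory-optimized
print(pick_family(peak_mem_gib=8, peak_vcpus=4))   # 2:1 -> compute-optimized
```

Peak demand should come from real measurements over a representative window (see PERF02-BP03), not from the current instance's capacity, which may itself be oversized.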

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/)  
+  [Cloud Compute with AWS](https://aws.amazon.com/products/compute/) 
+  [EC2 Instance Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [ECS Containers: Amazon ECS Container Instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [EKS Containers: EKS Worker Nodes](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration) 
+  [Processor State Control for Your EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2)](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1)](https://www.youtube.com/watch?v=_dvh4P2FVbw) 
+  [Deliver high performance ML inference with AWS Inferentia (CMP324-R1)](https://www.youtube.com/watch?v=17r1EapAxpk) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system](https://www.youtube.com/watch?v=rUY-00yFlE4) 
+  [How to choose compute option for startups](https://aws.amazon.com/startups/start-building/how-to-choose-compute-option/) 

 **Related examples:** 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 

# PERF02-BP05 Use the available elasticity of resources
<a name="perf_select_compute_elasticity"></a>

 The cloud provides the flexibility to expand or reduce your resources dynamically through a variety of mechanisms to meet changes in demand. Combined with compute-related metrics, a workload can automatically respond to changes and use the optimal set of resources to achieve its goal. 

 Optimally matching supply to demand delivers the lowest cost for a workload, but you also must plan for sufficient supply to allow for provisioning time and individual resource failures. Demand can be fixed or variable, requiring metrics and automation to ensure that management does not become a burdensome and disproportionately large cost. 

 With AWS, you can use a number of different approaches to match supply with demand. The Cost Optimization Pillar whitepaper describes how to use the following approaches: 
+  Demand-based approach 
+  Buffer-based approach 
+  Time-based approach 

 You must ensure that workload deployments can handle both scale-up and scale-down events. Create test scenarios for scale-down events to ensure that the workload behaves as expected. 

 **Common anti-patterns:** 
+  You react to alarms by manually increasing capacity. 
+  You leave increased capacity after a scaling event instead of scaling back down. 

 **Benefits of establishing this best practice:** Configuring and testing workload elasticity helps save money, maintain performance benchmarks, and improve reliability as traffic changes. Most non-production instances should be stopped when they are not being used. Although it's possible to manually shut down unused instances, this is impractical at larger scales. You can also take advantage of volume-based elasticity, which allows you to optimize performance and cost by automatically increasing the number of compute instances during demand spikes and decreasing capacity when demand decreases. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Take advantage of elasticity: Elasticity matches the supply of resources you have against the demand for those resources. Instances, containers, and functions provide mechanisms for elasticity, either in combination with automatic scaling or as a feature of the service. Use elasticity in your architecture to ensure that you have sufficient capacity to meet performance requirements at all scales of use. Validate the metrics used to scale elastic resources up or down against the type of workload being deployed. If you are deploying a video transcoding application, 100% CPU utilization is expected and should not be your primary metric; instead, you can scale on the depth of the queue of transcoding jobs waiting to be processed. Ensure that workload deployments can handle both scale-up and scale-down events. Scaling down workload components safely is as critical as scaling up resources when demand dictates. Create test scenarios for scale-down events to ensure that the workload behaves as expected. 
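For the transcoding example above, scaling on queue depth rather than CPU might look like the sketch below. The jobs-per-worker target and bounds are illustrative assumptions:

```python
def desired_workers(queue_depth, jobs_per_worker=10, min_workers=1, max_workers=50):
    """Target-tracking-style sizing: aim for ~jobs_per_worker queued jobs
    per worker, clamped to configured bounds.

    Sizing from the target alone makes scale-in symmetric with scale-out,
    so scale-down behavior is exercised rather than capacity being left
    inflated after a spike.
    """
    target = -(-queue_depth // jobs_per_worker)  # ceiling division
    return min(max(target, min_workers), max_workers)

print(desired_workers(queue_depth=95))   # 10 workers for 95 queued jobs
print(desired_workers(queue_depth=0))    # scales back down to the minimum, 1
```

In an AWS deployment, the queue depth would typically come from a metric such as the number of visible messages on an SQS queue, fed into an Auto Scaling policy; this sketch only shows the sizing arithmetic.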

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Compute with AWS](https://aws.amazon.com/products/compute/) 
+  [EC2 Instance Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [ECS Containers: Amazon ECS Container Instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [EKS Containers: EKS Worker Nodes](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration) 
+  [Processor State Control for Your EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2)](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1)](https://www.youtube.com/watch?v=_dvh4P2FVbw) 
+  [Deliver high performance ML inference with AWS Inferentia (CMP324-R1)](https://www.youtube.com/watch?v=17r1EapAxpk) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system](https://www.youtube.com/watch?v=rUY-00yFlE4) 

 **Related examples:** 
+  [Amazon EC2 Auto Scaling Group Examples](https://github.com/aws-samples/amazon-ec2-auto-scaling-group-examples) 
+  [Amazon EFS Tutorials](https://github.com/aws-samples/amazon-efs-tutorial) 

# PERF02-BP06 Re-evaluate compute needs based on metrics
<a name="perf_select_compute_use_metrics"></a>

 Use system-level metrics to identify the behavior and requirements of your workload over time. Evaluate your workload's needs by comparing the available resources with these requirements and make changes to your compute environment to best match your workload's profile. For example, over time a system might be observed to be more memory-intensive than initially thought, so moving to a different instance family or size could improve both performance and efficiency. 

 **Common anti-patterns:** 
+  You only monitor system-level metrics to gain insight into your workload. 
+  You architect your compute needs for peak workload requirements. 
+  You oversize the compute solution to meet scaling or performance requirements when moving to a new compute solution would better match your workload characteristics. 

 **Benefits of establishing this best practice:** To optimize performance and resource utilization, you need a unified operational view, real-time granular data, and a historical reference. You can create automatic dashboards to visualize this data and perform metric math to derive operational and utilization insights. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>

 Use a data-driven approach to optimize resources: To achieve maximum performance and efficiency, use the data gathered over time from your workload to tune and optimize your resources. Look at the trends in your workload's usage of current resources and determine where you can make changes to better match your workload's needs. When resources are over-committed, system performance degrades, whereas underutilization results in a less efficient use of resources and higher cost. 
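
A data-driven review of this kind can be sketched with CloudWatch statistics. The utilization thresholds below are illustrative assumptions, and the CloudWatch call assumes configured AWS credentials and a real instance ID:

```python
# Sketch: review trailing CPU utilization to spot right-sizing candidates.
# Thresholds are illustrative; the CloudWatch call needs AWS credentials.
import datetime

def average_cpu(instance_id: str, days: int = 14) -> float:
    """Average CPUUtilization over the trailing period, via CloudWatch."""
    import boto3  # deferred so the pure helper below works without AWS
    cw = boto3.client("cloudwatch")
    end = datetime.datetime.utcnow()
    resp = cw.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - datetime.timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Average"],
    )
    points = [p["Average"] for p in resp["Datapoints"]]
    return sum(points) / len(points) if points else 0.0

def rightsizing_hint(avg_cpu: float) -> str:
    """Over-committed resources degrade performance; underutilization wastes money."""
    if avg_cpu > 80:
        return "scale up or out"
    if avg_cpu < 20:
        return "downsize candidate"
    return "well matched"
```

AWS Compute Optimizer performs a richer version of this analysis automatically across instance families.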

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Compute with AWS ](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 
+  [EC2 Instance Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [ECS Containers: Amazon ECS Container Instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [EKS Containers: EKS Worker Nodes](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration) 
+  [Processor State Control for Your EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2)](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1)](https://www.youtube.com/watch?v=_dvh4P2FVbw) 
+  [Deliver high performance ML inference with AWS Inferentia (CMP324-R1)](https://www.youtube.com/watch?v=17r1EapAxpk) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system](https://www.youtube.com/watch?v=rUY-00yFlE4) 

 **Related examples:** 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 

# PERF 3  How do you select your storage solution?
<a name="peff-03"></a>

 The optimal storage solution for a system varies based on the kind of access method (block, file, or object), patterns of access (random or sequential), required throughput, frequency of access (online, offline, archival), frequency of update (WORM, dynamic), and availability and durability constraints. Well-architected systems use multiple storage solutions and enable different features to improve performance and use resources efficiently. 

**Topics**
+ [PERF03-BP01 Understand storage characteristics and requirements](perf_right_storage_solution_understand_char.md)
+ [PERF03-BP02 Evaluate available configuration options](perf_right_storage_solution_evaluated_options.md)
+ [PERF03-BP03 Make decisions based on access patterns and metrics](perf_right_storage_solution_optimize_patterns.md)

# PERF03-BP01 Understand storage characteristics and requirements
<a name="perf_right_storage_solution_understand_char"></a>

 Identify and document the workload storage needs and define the storage characteristics of each location. Examples of storage characteristics include: shareable access, file size, growth rate, throughput, IOPS, latency, access patterns, and persistence of data. Use these characteristics to evaluate if block, file, object, or instance storage services are the most efficient solution for your storage needs. 

 **Desired outcome:** Identify and document the storage requirements for each storage location and evaluate the available storage solutions. Based on the key storage characteristics, your team will understand how the selected storage services benefit your workload performance. Key criteria include data access patterns, growth rate, scaling needs, and latency requirements. 

 **Common anti-patterns:** 
+  You only use one storage type, such as Amazon Elastic Block Store (Amazon EBS), for all workloads. 
+  You assume that all workloads have similar storage access performance requirements. 

 **Benefits of establishing this best practice:** Selecting a storage solution based on the identified and required characteristics helps improve your workload's performance, decrease costs, and lower the operational effort of maintaining your workload. Your workload's performance will benefit from the solution, configuration, and location of the storage service. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Identify your workload’s most important storage performance metrics and implement improvements as part of a data-driven approach, using benchmarking or load testing. Use this data to identify where your storage solution is constrained, and examine configuration options to improve the solution. Determine the expected growth rate for your workload and choose a storage solution that will meet those rates. Research the AWS storage offerings to determine the correct storage solution for your various workload needs. Provisioning storage solutions in AWS increases the opportunity for you to test storage offerings and determine if they are appropriate for your workload needs. 


| AWS service | Key characteristics | Common use cases | 
| --- | --- | --- | 
| Amazon S3 |  99.999999999% durability, unlimited growth, accessible from anywhere, several cost models based on access and resiliency  |  Cloud-native application data, data archiving, and backups, analytics, data lakes, static website hosting, IoT data   | 
| Amazon Glacier |  Seconds to hours latency, unlimited growth, lowest cost, long-term storage  |  Data archiving, media archives, long-term backup retention.  | 
| Amazon EBS | Storage size requires management and monitoring, low latency, persistent storage, 99.8% to 99.9% durability, most volume types are accessible only from one EC2 instance. |  COTS applications, I/O intensive applications, relational and NoSQL databases, backup and recovery  | 
| EC2 Instance Store |  Pre-determined storage size, lowest latency, not persisted, accessible only from one EC2 instance  |  COTS applications, I/O intensive applications, in-memory data store  | 
| Amazon EFS |  99.999999999% durability, unlimited growth, accessible by multiple compute services  |  Modernized applications sharing files across multiple compute services, file storage for scaling content management systems  | 
| Amazon FSx |  Supports four file systems (NetApp ONTAP, OpenZFS, Windows File Server, and Lustre), available storage differs per file system, accessible by multiple compute services  |  Cloud native workloads, private cloud bursting, migrated workloads that require a specific file system, VMC, ERP systems, on-premises file storage and backups   | 
| Snow family |  Portable devices, 256-bit encryption, NFS endpoint, on-board computing, TBs of storage  |  Migrating data to the cloud, storage, and computing in extreme on-premises conditions, disaster recovery, remote data collection  | 
| AWS Storage Gateway |  Provides low-latency on-premises access to cloud-backed storage, fully managed on-premises cache   |  On-premises data to cloud migrations, populate cloud data lakes from on-premises sources, modernized file sharing.  | 

 **Implementation steps:** 

1. Use benchmarking or load tests to collect the key characteristics of your storage needs. Key characteristics include: 

   1. Shareable (what components access this storage) 

   1. Growth rate 

   1. Throughput 

   1. Latency 

   1. I/O size 

   1. Durability 

   1. Access patterns (reads vs. writes, frequency, spiky or consistent) 

1. Identify the type of storage solution that supports your storage characteristics. 

   1. [Amazon S3](https://aws.amazon.com/s3/) is an object storage service with unlimited scalability, high availability, and multiple options for accessibility. Transferring and accessing objects in and out of Amazon S3 can use features such as [Transfer Acceleration](https://aws.amazon.com/s3/transfer-acceleration/) or [Access Points](https://aws.amazon.com/s3/features/access-points/) to support your location, security needs, and access patterns. Use the [Amazon S3 performance guidelines](https://docs.aws.amazon.com/AmazonS3/latest/userguide/optimizing-performance-guidelines.html) to help you optimize your Amazon S3 configuration to meet your workload performance needs. 

   1. [Amazon Glacier](https://aws.amazon.com/s3/storage-classes/glacier/) is a storage class of Amazon S3 built for data archiving. You can choose from three archiving solutions ranging from millisecond access to 5-12 hour access with different cost and security options. Amazon Glacier can help you meet performance requirements by implementing a data lifecycle that supports your business requirements and data characteristics. 

   1. [Amazon Elastic Block Store (Amazon EBS)](https://aws.amazon.com/ebs/) is a high-performance block storage service designed for Amazon Elastic Compute Cloud (Amazon EC2). You can choose from [SSD- or HDD-based](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html) solutions with different characteristics that prioritize [IOPS](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/provisioned-iops.html) or [throughput](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/hdd-vols.html). EBS volumes are well suited for high-performance workloads, primary storage for file systems, databases, or applications that can only access attached storage. 

   1. [Amazon EC2 Instance Store](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/InstanceStorage.html) is similar to Amazon EBS in that it attaches to an Amazon EC2 instance; however, the instance store is temporary storage that should ideally be used as a buffer, cache, or for other temporary content. You cannot detach an instance store, and all data is lost if the instance shuts down. Instance stores can be used for high I/O performance and low-latency use cases where data doesn’t need to persist. 

   1. [Amazon Elastic File System (Amazon EFS)](https://aws.amazon.com/efs/) is a mountable file system that can be accessed by multiple types of compute solutions. Amazon EFS automatically grows and shrinks storage and is performance-optimized to deliver consistent low latencies. EFS has [two performance configuration modes](https://docs.aws.amazon.com/efs/latest/ug/performance.html): General Purpose and Max I/O. General Purpose has sub-millisecond read latency and single-digit millisecond write latency, while Max I/O can support thousands of compute instances requiring a shared file system. Amazon EFS supports [two throughput modes](https://docs.aws.amazon.com/efs/latest/ug/managing-throughput.html): Bursting and Provisioned. A workload with a spiky access pattern will benefit from Bursting Throughput mode, while a consistently high workload performs better with Provisioned Throughput mode. 

   1. [Amazon FSx](https://aws.amazon.com/fsx/) is built on the latest AWS compute solutions to support four commonly used file systems: NetApp ONTAP, OpenZFS, Windows File Server, and Lustre. Amazon FSx [latency, throughput, and IOPS](https://aws.amazon.com/fsx/when-to-choose-fsx/) vary per file system and should be considered when selecting the right file system for your workload needs. 

   1. [AWS Snow Family](https://aws.amazon.com/snow/) is a family of storage and compute devices that supports online and offline data migration to the cloud, as well as data storage and computing on premises. AWS Snow devices support collecting large amounts of on-premises data, processing that data, and moving it to the cloud. There are several [documented performance best practices](https://docs.aws.amazon.com/snowball/latest/developer-guide/performance.html) regarding the number of files, file sizes, and compression. 

   1. [AWS Storage Gateway](https://aws.amazon.com/storagegateway/) provides on-premises applications access to cloud-based storage. AWS Storage Gateway supports multiple cloud storage services, including Amazon S3, Amazon Glacier, Amazon FSx, and Amazon EBS, and a number of protocols such as iSCSI, SMB, and NFS. It provides low-latency performance by caching frequently accessed data on premises and sending only changed, compressed data to AWS. 

1. After you have experimented with your new storage solution and identified the optimal configuration, plan your migration and validate your performance metrics. This is a continual process, and should be reevaluated when key characteristics change or available services or options change. 
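
The documented characteristics from step 1 can drive a first-pass shortlist. The mapping below is a simplified illustration of the table above, not an official decision rule, and the function name and return strings are assumptions:

```python
# Illustrative first-pass shortlist from documented storage characteristics.
# The rules simplify the service table above; they are not official criteria.
def candidate_storage(shareable: bool, persistent: bool, access: str) -> str:
    """access is one of: 'block', 'file', 'object', 'archive'."""
    if access == "archive":
        return "Amazon Glacier"
    if access == "object":
        return "Amazon S3"
    if access == "file":
        # Shared file access points at managed file services.
        return "Amazon EFS or Amazon FSx" if shareable else "a file system on an Amazon EBS volume"
    if access == "block":
        # Non-persistent block storage maps to instance-local disks.
        return "Amazon EBS" if persistent else "EC2 Instance Store"
    raise ValueError(f"unknown access method: {access}")
```

A shortlist like this is a starting point for the benchmarking and load testing described above, not a substitute for it.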

 **Level of effort for the implementation plan:** If a workload is moving from one storage solution to another, there could be a *moderate* level of effort involved in refactoring the application. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS Volume Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) 
+  [Amazon EC2 Storage](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html) 
+  [Amazon EFS: Amazon EFS Performance](https://docs.aws.amazon.com/efs/latest/ug/performance.html) 
+  [Amazon FSx for Lustre Performance](https://docs.aws.amazon.com/fsx/latest/LustreGuide/performance.html) 
+  [Amazon FSx for Windows File Server Performance](https://docs.aws.amazon.com/fsx/latest/WindowsGuide/performance.html) 
+ [Amazon FSx for NetApp ONTAP performance](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/performance.html)
+ [Amazon FSx for OpenZFS performance](https://docs.aws.amazon.com/fsx/latest/OpenZFSGuide/performance.html)
+  [Amazon Glacier: Amazon Glacier Documentation](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) 
+  [Amazon S3: Request Rate and Performance Considerations](https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html) 
+  [Cloud Storage with AWS](https://aws.amazon.com/products/storage/) 
+ [AWS Snow Family](https://aws.amazon.com/snow/#Feature_comparison)
+  [EBS I/O Characteristics](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-io-characteristics.html) 

 **Related videos:** 
+  [Deep dive on Amazon EBS (STG303-R1)](https://www.youtube.com/watch?v=wsMWANWNoqQ) 
+  [Optimize your storage performance with Amazon S3 (STG343)](https://www.youtube.com/watch?v=54AhwfME6wI) 

 **Related examples:** 
+  [Amazon EFS CSI Driver](https://github.com/kubernetes-sigs/aws-efs-csi-driver) 
+  [Amazon EBS CSI Driver](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) 
+  [Amazon EFS Utilities](https://github.com/aws/efs-utils) 
+  [Amazon EBS Autoscale](https://github.com/awslabs/amazon-ebs-autoscale) 
+  [Amazon S3 Examples](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/s3-examples.html) 
+ [Amazon FSx for Lustre Container Storage Interface (CSI) Driver](https://github.com/kubernetes-sigs/aws-fsx-csi-driver)

# PERF03-BP02 Evaluate available configuration options
<a name="perf_right_storage_solution_evaluated_options"></a>

 Evaluate the various characteristics and configuration options and how they relate to storage. Understand where and how to use provisioned IOPS, SSDs, magnetic storage, object storage, archival storage, or ephemeral storage to optimize storage space and performance for your workload. 

 [Amazon EBS](https://aws.amazon.com/ebs) provides a range of options that allow you to optimize storage performance and cost for your workload. These options are divided into two major categories: SSD-backed storage for transactional workloads, such as databases and boot volumes (performance depends primarily on IOPS), and HDD-backed storage for throughput-intensive workloads, such as MapReduce and log processing (performance depends primarily on MB/s). 

 SSD-backed volumes include the highest performance provisioned IOPS SSD for latency-sensitive transactional workloads and general-purpose SSD that balance price and performance for a wide variety of transactional data. 
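For example, a gp3 general-purpose SSD volume lets you provision IOPS and throughput independently of volume size. A hedged sketch using boto3 (assumes configured credentials; the size and values are illustrative, and the bounds reflect the published gp3 limits of 3,000-16,000 IOPS and 125-1,000 MiB/s):

```python
# Sketch: provision a gp3 volume with explicit IOPS and throughput.
# The helper validates against published gp3 bounds before calling EC2.
def gp3_params(size_gib: int, iops: int = 3000, throughput: int = 125) -> dict:
    """Build create_volume parameters, enforcing gp3 limits."""
    if not 3000 <= iops <= 16000:
        raise ValueError("gp3 IOPS must be between 3,000 and 16,000")
    if not 125 <= throughput <= 1000:
        raise ValueError("gp3 throughput must be 125-1,000 MiB/s")
    return {"Size": size_gib, "VolumeType": "gp3",
            "Iops": iops, "Throughput": throughput}

def create_gp3_volume(availability_zone: str, **kwargs):
    import boto3  # deferred: gp3_params is testable without AWS credentials
    return boto3.client("ec2").create_volume(
        AvailabilityZone=availability_zone, **gp3_params(**kwargs))
```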

 [Amazon S3 transfer acceleration](https://aws.amazon.com/s3/transfer-acceleration/) enables fast transfer of files over long distances between your client and your S3 bucket. Transfer acceleration leverages Amazon CloudFront globally distributed edge locations to route data over an optimized network path. For a workload in an S3 bucket that has intensive GET requests, use Amazon S3 with CloudFront. When uploading large files, use multi-part uploads with multiple parts uploading at the same time to help maximize network throughput. 
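
The multipart upload advice can be sketched with boto3's managed transfer layer. The part sizes, concurrency, bucket, and key below are placeholders, not recommended values:

```python
# Sketch: concurrent multipart upload via boto3's managed transfer layer.
# Sizes and names are placeholders; assumes configured AWS credentials.
def upload_large_file(path: str, bucket: str, key: str,
                      chunk_mib: int = 16, concurrency: int = 8) -> None:
    """Upload with parts sent in parallel to help maximize throughput."""
    import boto3
    from boto3.s3.transfer import TransferConfig
    config = TransferConfig(
        multipart_threshold=64 * 1024 * 1024,        # multipart above 64 MiB
        multipart_chunksize=chunk_mib * 1024 * 1024,  # per-part size
        max_concurrency=concurrency,                  # parallel part uploads
    )
    boto3.client("s3").upload_file(path, bucket, key, Config=config)

def part_count(size_bytes: int, chunk_mib: int = 16) -> int:
    """How many parts a multipart upload of this size would use."""
    chunk = chunk_mib * 1024 * 1024
    return max(1, -(-size_bytes // chunk))  # ceiling division
```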

 [Amazon Elastic File System (Amazon EFS)](https://aws.amazon.com/efs/) provides a simple, scalable, fully managed elastic NFS file system for use with AWS Cloud services and on-premises resources. To support a wide variety of cloud storage workloads, Amazon EFS offers two performance modes: general purpose performance mode, and max I/O performance mode. There are also two throughput modes to choose from for your file system: Bursting Throughput, and Provisioned Throughput. To determine which settings to use for your workload, see the [Amazon EFS User Guide](https://docs.aws.amazon.com/efs/latest/ug/performance.html). 

 [Amazon FSx](https://aws.amazon.com/fsx/) provides four file systems to choose from: [Amazon FSx for Windows File Server](https://aws.amazon.com/fsx/windows/) for enterprise workloads, [Amazon FSx for Lustre](https://aws.amazon.com/fsx/lustre/) for high-performance workloads, [Amazon FSx for NetApp ONTAP](https://docs.aws.amazon.com/fsx/latest/ONTAPGuide/index.html) for NetApp's popular ONTAP file system, and [Amazon FSx for OpenZFS](https://docs.aws.amazon.com/fsx/latest/OpenZFSGuide/what-is-fsx.html) for Linux-based file servers. FSx is SSD-backed and is designed to deliver fast, predictable, scalable, and consistent performance. Amazon FSx file systems deliver sustained high read and write speeds and consistent low-latency data access. You can choose the throughput level you need to match your workload's needs. 

 **Common anti-patterns:** 
+  You only use one storage type, such as Amazon EBS, for all workloads. 
+  You use Provisioned IOPS for all workloads without real-world testing against all storage tiers. 
+  You assume that all workloads have similar storage access performance requirements. 

 **Benefits of establishing this best practice:** Evaluating all storage service options can reduce the cost of infrastructure and the effort required to maintain your workloads. It can potentially accelerate your time to market for deploying new services and features. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Determine storage characteristics: When you evaluate a storage solution, determine which storage characteristics you require, such as ability to share, file size, cache size, latency, throughput, and persistence of data. Then match your requirements to the AWS service that best fits your needs. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Storage with AWS](https://aws.amazon.com/products/storage/?ref=wellarchitected) 
+  [Amazon EBS Volume Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) 
+  [Amazon EC2 Storage](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html) 
+  [Amazon EFS: Amazon EFS Performance](https://docs.aws.amazon.com/efs/latest/ug/performance.html) 
+  [Amazon FSx for Lustre Performance](https://docs.aws.amazon.com/fsx/latest/LustreGuide/performance.html) 
+  [Amazon FSx for Windows File Server Performance](https://docs.aws.amazon.com/fsx/latest/WindowsGuide/performance.html) 
+  [Amazon Glacier: Amazon Glacier Documentation](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) 
+  [Amazon S3: Request Rate and Performance Considerations](https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html) 
+  [EBS I/O Characteristics](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-io-characteristics.html) 

 **Related videos:** 
+  [Deep dive on Amazon EBS (STG303-R1)](https://www.youtube.com/watch?v=wsMWANWNoqQ) 
+  [Optimize your storage performance with Amazon S3 (STG343)](https://www.youtube.com/watch?v=54AhwfME6wI) 

 **Related examples:** 
+  [Amazon EFS CSI Driver](https://github.com/kubernetes-sigs/aws-efs-csi-driver) 
+  [Amazon EBS CSI Driver](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) 
+  [Amazon EFS Utilities](https://github.com/aws/efs-utils) 
+  [Amazon EBS Autoscale](https://github.com/awslabs/amazon-ebs-autoscale) 
+  [Amazon S3 Examples](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/s3-examples.html) 

# PERF03-BP03 Make decisions based on access patterns and metrics
<a name="perf_right_storage_solution_optimize_patterns"></a>

 Choose storage systems based on your workload's access patterns and configure them by determining how the workload accesses data. Increase storage efficiency by choosing object storage over block storage. Configure the storage options you choose to match your data access patterns. 

 How you access data impacts how the storage solution performs. Select the storage solution that aligns best to your access patterns, or consider changing your access patterns to align with the storage solution to maximize performance. 

 Creating a RAID 0 array allows you to achieve a higher level of performance for a file system than what you can provision on a single volume. Consider using RAID 0 when I/O performance is more important than fault tolerance. For example, you could use it with a heavily used database where data replication is already set up separately. 

 Select appropriate storage metrics for your workload across all of the storage options consumed for the workload. When using file systems that rely on burst credits, create alarms to let you know when you are approaching those credit limits. Create storage dashboards to show the overall workload storage health. 

 For storage systems that are a fixed size, such as Amazon EBS or Amazon FSx, monitor the amount of storage used versus the overall storage size, and create automation where possible to increase the storage size when a threshold is reached. 
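
A burst-credit alarm of this kind can be sketched with CloudWatch's PutMetricAlarm. The file system ID, SNS topic, threshold, and evaluation windows below are illustrative assumptions:

```python
# Sketch: alarm when an EFS file system's burst credits run low.
# IDs, ARNs, and the threshold are placeholders, not recommended values.
def burst_credit_alarm_params(fs_id: str, sns_topic_arn: str,
                              threshold: float = 1.0e12) -> dict:
    """Build PutMetricAlarm parameters for a low-burst-credit alarm."""
    return {
        "AlarmName": f"{fs_id}-burst-credits-low",
        "Namespace": "AWS/EFS",
        "MetricName": "BurstCreditBalance",
        "Dimensions": [{"Name": "FileSystemId", "Value": fs_id}],
        "Statistic": "Average",
        "Period": 300,              # evaluate 5-minute averages
        "EvaluationPeriods": 3,     # alarm after ~15 minutes below threshold
        "Threshold": threshold,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }

def create_alarm(fs_id: str, sns_topic_arn: str) -> None:
    import boto3  # deferred: the params builder is testable without AWS
    boto3.client("cloudwatch").put_metric_alarm(
        **burst_credit_alarm_params(fs_id, sns_topic_arn))
```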

 **Common anti-patterns:** 
+  You assume that storage performance is adequate if customers are not complaining. 
+  You only use one tier of storage, assuming all workloads fit within that tier. 

 **Benefits of establishing this best practice:** You need a unified operational view, real-time granular data, and historical reference to optimize performance and resource utilization. You can create automatic dashboards and data with one-second granularity to perform metric math on your data and derive operational and utilization insights for your storage needs. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>

 Optimize your storage usage and access patterns: Choose storage systems based on your workload's access patterns and the characteristics of the available storage options. Determine the best place to store data that will enable you to meet your requirements while reducing overhead. Use performance optimizations and access patterns when configuring and interacting with data based on the characteristics of your storage (for example, striping volumes or partitioning data). 

 Select appropriate metrics for storage options: Ensure that you select the appropriate storage metrics for the workload. Each storage option offers various metrics to track how your workload performs over time. Ensure that you are measuring against any storage burst metrics (for example, monitoring burst credits for Amazon EFS). For storage systems that are a fixed size, such as Amazon Elastic Block Store or Amazon FSx, ensure that you are monitoring the amount of storage used versus the overall storage size. Create automation when possible to increase the storage size when reaching a threshold. 
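
The storage-size automation described above can be sketched as follows. The mount point, threshold, and growth factor are illustrative assumptions, and growing the file system afterwards (for example with growpart and resize2fs) is left out of scope:

```python
# Sketch: grow an EBS volume when used space crosses a threshold.
# Volume ID, threshold, and growth factor are placeholders.
import shutil

def needs_growth(mount_point: str, threshold: float = 0.8) -> bool:
    """True when used space exceeds the threshold fraction of capacity."""
    usage = shutil.disk_usage(mount_point)
    return usage.used / usage.total > threshold

def grow_volume(volume_id: str, current_gib: int, factor: float = 1.5) -> int:
    """Request a larger volume size via EC2 modify_volume; returns new size."""
    import boto3  # deferred: needs_growth is testable without AWS
    new_size = int(current_gib * factor)
    boto3.client("ec2").modify_volume(VolumeId=volume_id, Size=new_size)
    return new_size
```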

 Monitor metrics: Amazon CloudWatch can collect metrics across the resources in your architecture. You can also collect and publish custom metrics to surface business or derived metrics. Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS Volume Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html) 
+  [Amazon EC2 Storage](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Storage.html) 
+  [Amazon EFS: Amazon EFS Performance](https://docs.aws.amazon.com/efs/latest/ug/performance.html) 
+  [Amazon FSx for Lustre Performance](https://docs.aws.amazon.com/fsx/latest/LustreGuide/performance.html) 
+  [Amazon FSx for Windows File Server Performance](https://docs.aws.amazon.com/fsx/latest/WindowsGuide/performance.html) 
+  [Amazon Glacier: Amazon Glacier Documentation](https://docs.aws.amazon.com/amazonglacier/latest/dev/introduction.html) 
+  [Amazon S3: Request Rate and Performance Considerations](https://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-perf-considerations.html) 
+  [Cloud Storage with AWS](https://aws.amazon.com/products/storage/) 
+  [EBS I/O Characteristics](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-io-characteristics.html) 
+  [Monitoring and understanding Amazon EBS performance using Amazon CloudWatch](https://aws.amazon.com/blogs/storage/valuable-tips-for-monitoring-and-understanding-amazon-ebs-performance-using-amazon-cloudwatch/) 

 **Related videos:** 
+  [Deep dive on Amazon EBS (STG303-R1)](https://www.youtube.com/watch?v=wsMWANWNoqQ) 
+  [Optimize your storage performance with Amazon S3 (STG343)](https://www.youtube.com/watch?v=54AhwfME6wI) 

 **Related examples:** 
+  [Amazon EFS CSI Driver](https://github.com/kubernetes-sigs/aws-efs-csi-driver) 
+  [Amazon EBS CSI Driver](https://github.com/kubernetes-sigs/aws-ebs-csi-driver) 
+  [Amazon EFS Utilities](https://github.com/aws/efs-utils) 
+  [Amazon EBS Autoscale](https://github.com/awslabs/amazon-ebs-autoscale) 
+  [Amazon S3 Examples](https://docs.aws.amazon.com/sdk-for-javascript/v2/developer-guide/s3-examples.html) 

# PERF 4  How do you select your database solution?
<a name="perf-04"></a>

 The optimal database solution for a system varies based on requirements for availability, consistency, partition tolerance, latency, durability, scalability, and query capability. Many systems use different database solutions for various subsystems and enable different features to improve performance. Selecting the wrong database solution and features for a system can lead to lower performance efficiency. 

**Topics**
+ [PERF04-BP01 Understand data characteristics](perf_right_database_solution_understand_char.md)
+ [PERF04-BP02 Evaluate the available options](perf_right_database_solution_evaluate_options.md)
+ [PERF04-BP03 Collect and record database performance metrics](perf_right_database_solution_collect_metrics.md)
+ [PERF04-BP04 Choose data storage based on access patterns](perf_right_database_solution_access_patterns.md)
+ [PERF04-BP05 Optimize data storage based on access patterns and metrics](perf_right_database_solution_optimize_metrics.md)

# PERF04-BP01 Understand data characteristics
<a name="perf_right_database_solution_understand_char"></a>

 Choose your data management solutions to optimally match the characteristics, access patterns, and requirements of your workload datasets. When selecting and implementing a data management solution, you must ensure that the querying, scaling, and storage characteristics support the workload data requirements. Learn how various database options match your data models, and which configuration options are best for your use-case.  

 AWS provides numerous database engines, including relational, key-value, document, in-memory, graph, time series, and ledger databases. Each data management solution has options and configurations available to support your use cases and data models. Your workload might be able to use several different database solutions, based on the data characteristics. By selecting the best database solution for a specific problem, you can break away from restrictive, one-size-fits-all monolithic databases and focus on managing data to meet your customers' needs. 

 **Desired outcome:** The workload data characteristics are documented with enough detail to facilitate selection and configuration of supporting database solutions, and provide insight into potential alternatives. 

 **Common anti-patterns:** 
+  Not considering ways to segment large datasets into smaller collections of data that have similar characteristics, resulting in missing opportunities to use more purpose-built databases that better match data and growth characteristics. 
+  Not identifying the data access patterns up front, which leads to costly and complex rework later. 
+  Limiting growth by using data storage strategies that don’t scale as quickly as needed. 
+  Choosing one database type and vendor for all workloads. 
+  Sticking to one database solution because there is internal experience and knowledge of one particular type of database solution. 
+  Keeping a database solution because it worked well in an on-premises environment. 

 **Benefits of establishing this best practice:** Be familiar with all of the AWS database solutions so that you can determine the correct database solution for your various workloads. After you select the appropriate database solution for your workload, you can quickly experiment on each of those database offerings to determine if they continue to meet your workload needs. 

 **Level of risk exposed if this best practice is not established:** High 
+  Potential cost savings may not be identified. 
+  Data may not be secured to the level required. 
+  Data access and storage performance may not be optimal. 

## Implementation guidance
<a name="implementation-guidance"></a>

 Define the data characteristics and access patterns of your workload. Review all available database solutions to identify which solution supports your data requirements. Within a given workload, multiple databases may be selected. Evaluate each service or group of services and assess them individually. If potential alternative data management solutions are identified for part or all of the data, experiment with alternative implementations that might unlock cost, security, performance, and reliability benefits. Update existing documentation, should a new data management approach be adopted. 
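The data characteristics being defined here can be captured as a small structured record that feeds the later evaluation steps. A minimal sketch in Python; the field names and values are illustrative, not an AWS schema:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetProfile:
    """Illustrative record of one dataset's characteristics (hypothetical fields)."""
    name: str
    structure: str            # e.g. "relational", "key-value", "document", "unstructured"
    needs_acid: bool          # are ACID transactions required?
    consistency: str          # "strong" or "eventual"
    read_write_ratio: float   # reads per write; high values suggest caching or read replicas
    growth: str               # "bounded" or "unbounded" storage growth
    notes: list = field(default_factory=list)

# Document each dataset in the workload separately, then evaluate candidates per profile.
cart = DatasetProfile(
    name="shopping-cart", structure="key-value", needs_acid=False,
    consistency="eventual", read_write_ratio=5.0, growth="unbounded",
)
print(cart.structure, cart.needs_acid)  # -> key-value False
```

Keeping one profile per dataset, rather than one per workload, is what makes it possible to select different purpose-built databases for different subsystems.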


|  **Type**  |  **AWS Services**  |  **Key Characteristics**  |  **Common use-cases**  | 
| --- | --- | --- | --- | 
|  Relational  |  Amazon RDS, Amazon Aurora  |  Referential integrity, ACID transactions, schema on write  |  ERP, CRM, Commercial off-the-shelf software  | 
|  Key Value  |  Amazon DynamoDB  |  High throughput, low latency, near-infinite scalability  |  Shopping carts (ecommerce), product catalogs, chat applications  | 
|  Document  |  Amazon DocumentDB  |  Store JSON documents and query on any attribute  |  Content Management (CMS), customer profiles, mobile applications  | 
|  In Memory  |  Amazon ElastiCache, Amazon MemoryDB  |  Microsecond latency  |  Caching, game leaderboards  | 
|  Graph  |  Amazon Neptune  |  Highly relational data where the relationships between data have meaning  |  Social networks, personalization engines, fraud detection  | 
|  Time Series  |  Amazon Timestream  |  Data where the primary dimension is time  |  DevOps, IoT, Monitoring  | 
|  Wide column  |  Amazon Keyspaces  |  Cassandra workloads  |  Industrial equipment maintenance, route optimization  | 
|  Ledger  |  Amazon QLDB  |  Immutable and cryptographically verifiable ledger of changes  |  Systems of record, healthcare, supply chains, financial institutions  | 

 **Implementation steps** 

1.  How is the data structured? (for example, unstructured, key-value, semi-structured, relational) 

   1.  If the data is unstructured, consider an object store such as [Amazon S3](https://aws.amazon.com/products/storage/data-lake-storage/) or a NoSQL database such as [Amazon DocumentDB](https://aws.amazon.com/documentdb/). 

   1.  For key-value data, consider [DynamoDB](https://aws.amazon.com/dynamodb/), [ElastiCache for Redis](https://aws.amazon.com/elasticache/redis/), or [MemoryDB](https://aws.amazon.com/memorydb/). 

   1.  If the data has a relational structure, what level of referential integrity is required? 

      1.  For foreign key constraints, relational databases such as [Amazon RDS](https://aws.amazon.com/rds/) and [Aurora](https://aws.amazon.com/rds/aurora/) can provide this level of integrity. 

      1.  Typically, within a NoSQL data-model, you would de-normalize your data into a single document or collection of documents to be retrieved in a single request rather than joining across documents or tables.  

1.  Is ACID (atomicity, consistency, isolation, durability) compliance required? 

   1.  If the ACID properties associated with relational databases are required, consider a relational database such as [Amazon RDS](https://aws.amazon.com/rds/) or [Aurora](https://aws.amazon.com/rds/aurora/). 

1.  What consistency model is required? 

   1.  If your application can tolerate eventual consistency, consider a NoSQL implementation. Review the other characteristics to help choose which [NoSQL database](https://aws.amazon.com/nosql/) is most appropriate. 

   1.  If strong consistency is required, you can use strongly consistent reads with [DynamoDB](https://aws.amazon.com/dynamodb/) or a relational database such as [Amazon RDS](https://aws.amazon.com/rds/). 

1.  What query and result formats must be supported? (for example, SQL, CSV, Parquet, Avro, or JSON) 

1.  What data types, field sizes and overall quantities are present? (for example, text, numeric, spatial, time-series, calculated, binary or blob, document) 

1.  How will the storage requirements change over time? How does this impact scalability? 

   1.  Serverless databases such as [DynamoDB](https://aws.amazon.com/dynamodb/) and [Amazon Quantum Ledger Database](https://aws.amazon.com/qldb/) will scale dynamically up to near-unlimited storage. 

   1.  Relational databases have upper bounds on provisioned storage, and often must be horizontally partitioned via mechanisms such as sharding once they reach these limits. 

1.  What is the proportion of read queries in relation to write queries? Would caching be likely to improve performance? 

   1.  Read-heavy workloads can benefit from a caching layer, such as [ElastiCache](https://aws.amazon.com/elasticache/) or, if the database is DynamoDB, [DAX](https://aws.amazon.com/dynamodb/dax/). 

   1.  Reads can also be offloaded to read replicas with relational databases such as [Amazon RDS](https://aws.amazon.com/rds/). 

1.  Does storage and modification (OLTP - Online Transaction Processing) or retrieval and reporting (OLAP - Online Analytical Processing) have a higher priority? 

   1.  For high-throughput transactional processing, consider a NoSQL database such as DynamoDB or Amazon DocumentDB. 

   1.  For analytical queries, consider a columnar database such as [Amazon Redshift](https://aws.amazon.com/redshift/) or exporting the data to Amazon S3 and performing analytics using [Athena](https://aws.amazon.com/athena/) or [QuickSight.](https://aws.amazon.com/quicksight/) 

1.  How sensitive is this data and what level of protection and encryption does it require? 

   1.  All Amazon RDS and Aurora engines support data encryption at rest using AWS KMS. Microsoft SQL Server and Oracle also support native Transparent Data Encryption (TDE) when using Amazon RDS. 

   1.  For DynamoDB, you can use fine-grained access control with [IAM](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/access-control-overview.html) to control who has access to what data at the key level. 

1.  What level of durability does the data require? 

   1.  Aurora automatically replicates your data across three Availability Zones within a Region, meaning your data is highly durable with less chance of data loss. 

   1.  DynamoDB is automatically replicated across multiple Availability Zones, providing high availability and data durability. 

   1.  Amazon S3 provides 11 9s of durability. Many database services such as Amazon RDS and DynamoDB support exporting data to Amazon S3 for long-term retention and archival. 

1.  Do [Recovery Time Objective (RTO) or Recovery Point Objectives (RPO)](https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/plan-for-disaster-recovery-dr.html) requirements influence the solution? 

   1.  Amazon RDS, Aurora, DynamoDB, Amazon DocumentDB, and Neptune all support point in time recovery and on-demand backup and restore.  

   1.  For high availability requirements, DynamoDB tables can be replicated globally using the [Global Tables](https://aws.amazon.com/dynamodb/global-tables/) feature and Aurora clusters can be replicated across multiple Regions using the Global database feature. Additionally, S3 buckets can be replicated across AWS Regions using cross-region replication.  

1.  Is there a desire to move away from commercial database engines / licensing costs? 

   1.  Consider open-source engines such as PostgreSQL and MySQL on Amazon RDS or Aurora. 

   1.  Leverage [AWS DMS](https://aws.amazon.com/dms/) and [AWS SCT](https://aws.amazon.com/dms/schema-conversion-tool/) to perform migrations from commercial database engines to open-source engines. 

1.  What is the operational expectation for the database? Is moving to managed services a primary concern? 

   1.  Leveraging Amazon RDS instead of Amazon EC2, and DynamoDB or Amazon DocumentDB instead of self-hosting a NoSQL database can reduce operational overhead. 

1.  How is the database currently accessed? Is it only application access, or are there Business Intelligence (BI) users and other connected off-the-shelf applications? 

   1.  If you have dependencies on external tooling, then you may have to maintain compatibility with the databases they support. Amazon RDS is fully compatible with the database engine versions that it supports, including Microsoft SQL Server, Oracle, MySQL, and PostgreSQL. 

1.  The following is a list of potential data management services, and where these can best be used: 

   1.  Relational databases store data with predefined schemas and relationships between them. These databases are designed to support ACID (atomicity, consistency, isolation, durability) transactions, and maintain referential integrity and strong data consistency. Many traditional applications, enterprise resource planning (ERP), customer relationship management (CRM), and ecommerce use relational databases to store their data. You can run many of these database engines on Amazon EC2, or choose from one of the AWS-managed [database services](https://aws.amazon.com/products/databases/): [Amazon Aurora](https://aws.amazon.com/rds/aurora), [Amazon RDS](https://aws.amazon.com/rds), and [Amazon Redshift](https://aws.amazon.com/redshift). 

   1.  Key-value databases are optimized for common access patterns, typically to store and retrieve large volumes of data. These databases deliver quick response times, even in extreme volumes of concurrent requests. High-traffic web apps, ecommerce systems, and gaming applications are typical use-cases for key-value databases. In AWS, you can utilize [Amazon DynamoDB](https://aws.amazon.com/dynamodb/), a fully managed, multi-Region, multi-master, durable database with built-in security, backup and restore, and in-memory caching for internet-scale applications. 

   1.  In-memory databases are used for applications that require real-time access to data with the lowest latency and highest throughput. By storing data directly in memory, these databases deliver microsecond latency to applications for which millisecond latency is not enough. You may use in-memory databases for application caching, session management, gaming leaderboards, and geospatial applications. [Amazon ElastiCache](https://aws.amazon.com/elasticache/) is a fully managed in-memory data store, compatible with [Redis](https://aws.amazon.com/elasticache/redis/) or [Memcached](https://aws.amazon.com/elasticache/memcached). If the application also has higher durability requirements, [Amazon MemoryDB for Redis](https://aws.amazon.com/memorydb/) is a durable, in-memory database service that combines durability with ultra-fast performance. 

   1.  A document database is designed to store semistructured data as JSON-like documents. These databases help developers build and update applications such as content management, catalogs, and user profiles quickly. [Amazon DocumentDB](https://aws.amazon.com/documentdb/) is a fast, scalable, highly available, and fully managed document database service that supports MongoDB workloads. 

   1.  A wide column store is a type of NoSQL database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table. You typically see a wide column store in high-scale industrial apps for equipment maintenance, fleet management, and route optimization. [Amazon Keyspaces (for Apache Cassandra)](https://aws.amazon.com/mcs/) is a scalable, highly available, and managed wide column database service compatible with Apache Cassandra. 

   1.  Graph databases are for applications that must navigate and query millions of relationships between highly connected graph datasets with millisecond latency at large scale. Many companies use graph databases for fraud detection, social networking, and recommendation engines. [Amazon Neptune](https://aws.amazon.com/neptune/) is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. 

   1.  Time-series databases efficiently collect, synthesize, and derive insights from data that changes over time. IoT applications, DevOps, and industrial telemetry can utilize time-series databases. [Amazon Timestream](https://aws.amazon.com/timestream/) is a fast, scalable, fully managed time series database service for IoT and operational applications that makes it easy to store and analyze trillions of events per day. 

   1.  Ledger databases provide a centralized and trusted authority to maintain a scalable, immutable, and cryptographically verifiable record of transactions for every application. We see ledger databases used for systems of record, supply chain, registrations, and even banking transactions. [Amazon Quantum Ledger Database (Amazon QLDB)](https://aws.amazon.com/qldb/) is a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority. Amazon QLDB tracks every application data change and maintains a complete and verifiable history of changes over time. 
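The questionnaire above can be turned into a first-pass filter that maps answers to candidate database categories. A sketch in Python; the mapping mirrors the steps above and is illustrative and deliberately incomplete, not a substitute for evaluating each option:

```python
def candidate_databases(structure, needs_acid=False, consistency="eventual",
                        primary_dimension=None):
    """Return first-pass AWS database candidates from a few questionnaire answers.

    Only a subset of the questions above is encoded; cost, durability, RTO/RPO,
    and operational requirements still need to be weighed separately.
    """
    if structure == "unstructured":
        return ["Amazon S3", "Amazon DocumentDB"]          # object store or NoSQL
    if primary_dimension == "time":
        return ["Amazon Timestream"]                       # time-series data
    if structure == "graph":
        return ["Amazon Neptune"]                          # highly relational data
    if structure == "relational" or needs_acid:
        return ["Amazon RDS", "Amazon Aurora"]             # referential integrity, ACID
    if structure == "key-value":
        if consistency == "strong":
            return ["Amazon DynamoDB (strongly consistent reads)"]
        return ["Amazon DynamoDB", "Amazon ElastiCache", "Amazon MemoryDB"]
    if structure == "document":
        return ["Amazon DocumentDB"]
    return ["review remaining characteristics"]

print(candidate_databases("key-value", consistency="strong"))
# -> ['Amazon DynamoDB (strongly consistent reads)']
```

Running each dataset's answers through a filter like this produces a shortlist to experiment with, rather than a final decision.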

 **Level of effort for the implementation plan:** If a workload is moving from one database solution to another, there could be a *high* level of effort involved in refactoring the data and application. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Databases with AWS ](https://aws.amazon.com/products/databases/?ref=wellarchitected) 
+  [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/?ref=wellarchitected) 
+  [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/?ref=wellarchitected) 
+  [Amazon Aurora best practices ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html?ref=wellarchitected) 
+  [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html?ref=wellarchitected) 
+  [Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/?ref=wellarchitected) 
+  [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/?ref=wellarchitected) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html?ref=wellarchitected) 
+  [Choose between EC2 and Amazon RDS](https://docs.aws.amazon.com/prescriptive-guidance/latest/migration-sql-server/comparison.html) 
+  [Best Practices for Implementing Amazon ElastiCache](https://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/BestPractices.html) 

 **Related videos:** 
+ [AWS purpose-built databases (DAT209-L) ](https://www.youtube.com/watch?v=q81TVuV5u28) 
+ [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54) 
+ [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM) 

 **Related examples:** 
+  [Optimize Data Pattern using Amazon Redshift Data Sharing](https://wellarchitectedlabs.com/sustainability/300_labs/300_optimize_data_pattern_using_redshift_data_sharing/) 
+  [Database Migrations](https://github.com/aws-samples/aws-database-migration-samples) 
+  [MS SQL Server - AWS Database Migration Service (DMS) Replication Demo](https://github.com/aws-samples/aws-dms-sql-server) 
+  [Database Modernization Hands On Workshop](https://github.com/aws-samples/amazon-rds-purpose-built-workshop) 
+  [Amazon Neptune Samples](https://github.com/aws-samples/amazon-neptune-samples) 

# PERF04-BP02 Evaluate the available options
<a name="perf_right_database_solution_evaluate_options"></a>

 Understand the available database options and how they can optimize your performance before you select your data management solution. Use load testing to identify database metrics that matter for your workload. While you explore the database options, take into consideration various aspects such as parameter groups, storage options, memory, compute, read replicas, eventual consistency, connection pooling, and caching options. Experiment with these configuration options to improve the metrics. 

 **Desired outcome:** A workload may use one or more database solutions, selected based on data types. The database functionality and benefits optimally match the data characteristics, access patterns, and workload requirements. To optimize your database performance and cost, you must evaluate the data access patterns to determine the appropriate database options. Evaluate the acceptable query times to ensure that the selected database options can meet the requirements. 

 **Common anti-patterns:** 
+  Not identifying the data access patterns. 
+  Not being aware of the configuration options of your chosen data management solution. 
+  Relying solely on increasing the instance size without looking at other available configuration options. 
+  Not testing the scaling characteristics of the chosen solution. 

 

 **Benefits of establishing this best practice:** By exploring and experimenting with the database options, you may be able to reduce the cost of infrastructure, improve performance and scalability, and lower the effort required to maintain your workloads. 

 **Level of risk exposed if this best practice is not established:** High 
+  Having to optimize for a *one size fits all* database means making unnecessary compromises. 
+  Higher costs as a result of not configuring the database solution to match the traffic patterns. 
+  Operational issues may emerge from scaling issues. 
+  Data may not be secured to the level required. 

## Implementation guidance
<a name="implementation-guidance"></a>

 Understand your workload data characteristics so that you can configure your database options. Run load tests to identify your key performance metrics and bottlenecks. Use these characteristics and metrics to evaluate the database options and experiment with different configurations. 
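For the provisioned-capacity entries in the table that follows, required DynamoDB capacity can be estimated from the published capacity-unit sizing rules (1 RCU = one strongly consistent read per second of an item up to 4 KB, half that for eventually consistent reads; 1 WCU = one write per second of an item up to 1 KB). A sketch of that arithmetic, with item size and request rates as inputs:

```python
import math

def dynamodb_capacity_units(item_kb, reads_per_sec, writes_per_sec,
                            strongly_consistent=False):
    """Estimate provisioned RCU/WCU from DynamoDB's published sizing rules.

    Reads are billed in 4 KB increments per item; eventually consistent
    reads cost half a unit. Writes are billed in 1 KB increments per item.
    """
    read_units_per_item = math.ceil(item_kb / 4)
    if not strongly_consistent:
        read_units_per_item /= 2          # eventually consistent reads cost half
    rcu = math.ceil(reads_per_sec * read_units_per_item)
    wcu = math.ceil(writes_per_sec * math.ceil(item_kb / 1))
    return rcu, wcu

# 6 KB items, 100 eventually consistent reads/sec, 10 writes/sec
print(dynamodb_capacity_units(6, 100, 10))  # -> (100, 60)
```

Estimates like this give a starting point for provisioned mode; load testing against real traffic remains the way to validate them.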


|  AWS Services  |  Amazon RDS, Amazon Aurora  |  Amazon DynamoDB  |  Amazon DocumentDB  |  Amazon ElastiCache  |  Amazon Neptune  |  Amazon Timestream  |  Amazon Keyspaces  |  Amazon QLDB  | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | 
|  Scaling Compute  |  Increase instance size, Aurora Serverless instances autoscale in response to changes in load  |  Automatic read/write scaling with on-demand capacity mode or automatic scaling of provisioned read/write capacity in provisioned capacity mode  |  Increase instance size  |  Increase instance size, add nodes to cluster  |  Increase instance size  |  Automatically scales to adjust capacity  |  Automatic read/write scaling with on-demand capacity mode or automatic scaling of provisioned read/write capacity in provisioned capacity mode  |  Automatically scales to adjust capacity  | 
|  Scaling-out reads  |  All engines support read replicas. Aurora supports automatic scaling of read replica instances  |  Increase provisioned read capacity units  |  Read replicas  |  Read replicas  |  Read replicas. Supports automatic scaling of read replica instances  |  Automatically scales  |  Increase provisioned read capacity units  |  Automatically scales up to documented concurrency limits  | 
|  Scaling-out writes  |  Increasing instance size, batching writes in the application or adding a queue in front of the database. Horizontal scaling via application-level sharding across multiple instances  |  Increase provisioned write capacity units. Ensuring optimal partition key to prevent partition level write throttling  |  Increasing primary instance size  |  Using Redis in cluster mode to distribute writes across shards  |  Increasing instance size  |  Write requests may be throttled while scaling. If you encounter throttling exceptions, continue to send data at the same (or higher) throughput to automatically scale. Batch writes to reduce concurrent write requests  |  Increase provisioned write capacity units. Ensuring optimal partition key to prevent partition level write throttling  |  Automatically scales up to documented concurrency limits  | 
|  Engine configuration  |  Parameter groups  |  Not applicable  |  Parameter groups  |  Parameter groups  |  Parameter groups  |  Not applicable  |  Not applicable  |  Not applicable  | 
|  Caching  |  In-memory caching, configurable via parameter groups. Pair with a dedicated cache such as ElastiCache for Redis to offload requests for commonly accessed items  |  DynamoDB Accelerator (DAX), a fully managed cache, is available  |  In-memory caching. Optionally, pair with a dedicated cache such as ElastiCache for Redis to offload requests for commonly accessed items  |  Primary function is caching  |  Use the query results cache to cache the result of a read-only query  |  Timestream has two storage tiers; one of these is a high-performance in-memory tier  |  Deploy a separate dedicated cache such as ElastiCache for Redis to offload requests for commonly accessed items  |  Not applicable  | 
|  High availability / disaster recovery  |  Recommended configuration for production workloads is to run a standby instance in a second Availability Zone to provide resiliency within a Region. For resiliency across Regions, Aurora Global Database can be used  |  Highly available within a Region. Tables can be replicated across Regions using DynamoDB global tables  |  Create multiple instances across Availability Zones for availability. Snapshots can be shared across Regions and clusters can be replicated using DMS to provide cross-Region replication / disaster recovery  |  Recommended configuration for production clusters is to create at least one node in a secondary Availability Zone. ElastiCache Global Datastore can be used to replicate clusters across Regions  |  Read replicas in other Availability Zones serve as failover targets. Snapshots can be shared across Regions and Neptune streams can be used to replicate data between two clusters in two different Regions  |  Highly available within a Region. Cross-Region replication requires custom application development using the Timestream SDK  |  Highly available within a Region. Cross-Region replication requires custom application logic or third-party tools  |  Highly available within a Region. To replicate across Regions, export the contents of the Amazon QLDB journal to an S3 bucket and configure the bucket for cross-Region replication  | 

 

 **Implementation steps** 

1.  What configuration options are available for the selected databases? 

   1.  Parameter groups for Amazon RDS and Aurora allow you to adjust common database engine settings, such as the memory allocated for the cache or the time zone of the database. 

   1.  For provisioned database services such as Amazon RDS, Aurora, Neptune, and Amazon DocumentDB, and for databases deployed on Amazon EC2, you can change the instance type and provisioned storage, and add read replicas. 

   1.  DynamoDB allows you to specify two capacity modes: on-demand and provisioned. To account for differing workloads, you can change between these modes and increase the allocated capacity in provisioned mode at any time. 

1.  Is the workload read or write heavy?  

   1.  What solutions are available for offloading reads (read replicas, caching, etc.)?  

      1.  For DynamoDB tables, you can offload reads using DAX for caching. 

      1.  For relational databases, you can create an ElastiCache for Redis cluster and configure your application to read from the cache first, falling back to the database if the requested item is not present. 

      1.  Relational databases such as Amazon RDS and Aurora, and provisioned NoSQL databases such as Neptune and Amazon DocumentDB all support adding read replicas to offload the read portions of the workload. 

      1.  Serverless databases such as DynamoDB will scale automatically. Ensure that you have enough read capacity units (RCU) provisioned to handle the workload. 

   1.  What solutions are available for scaling writes (partition key sharding, introducing a queue, etc.)? 

      1.  For relational databases, you can increase the size of the instance to accommodate an increased workload, or increase the provisioned IOPS to allow for increased throughput to the underlying storage. 
         +  You can also introduce a queue in front of your database rather than writing directly to the database. This pattern allows you to decouple the ingestion from the database and control the flow-rate so the database does not get overwhelmed.  
         +  Batching your write requests rather than creating many short-lived transactions can help improve throughput in high-write volume relational databases. 

      1.  Serverless databases like DynamoDB can scale the write throughput automatically or by adjusting the provisioned write capacity units (WCU) depending on the capacity mode.  
         +  You can still run into issues with *hot* partitions though, when you reach the throughput limits for a given partition key. This can be mitigated by choosing a more evenly distributed partition key or by write-sharding the partition key.  

1.  What are the current or expected peak transactions per second (TPS)? Test using this volume of traffic and this volume +X% to understand the scaling characteristics. 

   1.  Native tools such as pgbench for PostgreSQL can be used to stress-test the database and understand the bottlenecks and scaling characteristics. 

   1.  Production-like traffic should be captured so that it can be replayed to simulate real-world conditions in addition to synthetic workloads. 

1.  If using serverless or elastically scalable compute, test the impact of scaling this on the database. If appropriate, introduce connection management or pooling to lower impact on the database.  

   1.  RDS Proxy can be used with Amazon RDS and Aurora to manage connections to the database.  

   1.  Serverless databases such as DynamoDB do not have connections associated with them, but consider the provisioned capacity and automatic scaling policies to deal with spikes in load. 

1.  Is the load predictable? Are there spikes in load and periods of inactivity? 

   1.  If there are periods of inactivity, consider scaling down the provisioned capacity or instance size during these times. Aurora Serverless v2 automatically scales up and down based on load. 

   1.  For non-production instances, consider pausing or stopping these during non-work hours. 

1.  Do you need to segment and break apart your data models based on access patterns and data characteristics? 

   1.  Consider using AWS DMS or AWS SCT to move your data to other services. 
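The read-offloading step above (read from the cache first, fall back to the database, then populate the cache) is the cache-aside pattern. A sketch with in-memory dictionaries standing in for ElastiCache and the primary database; the keys and counters are illustrative:

```python
# Stand-ins for ElastiCache (cache) and the primary database (db).
cache, db = {}, {"user:1": {"name": "Ana"}}
cache_hits = cache_misses = 0  # counters, kept only to show the access pattern

def get_item(key):
    """Cache-aside read: try the cache first, fall back to the database,
    then populate the cache so subsequent reads are served from memory."""
    global cache_hits, cache_misses
    if key in cache:
        cache_hits += 1
        return cache[key]
    cache_misses += 1
    value = db.get(key)          # authoritative read from the database
    if value is not None:
        cache[key] = value       # with ElastiCache this write would carry a TTL
    return value

get_item("user:1")   # miss: read from db, populate cache
get_item("user:1")   # hit: served from cache
print(cache_hits, cache_misses)  # -> 1 1
```

In a real deployment the dictionaries would be replaced by an ElastiCache client and a database driver, and cached entries would be given a TTL or explicit invalidation so stale data is bounded.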

## Level of effort for the implementation plan: 
<a name="level-of-effort-for-the-implementation-plan-to-establish-this-best-practice-you-must-be-aware-of-your-current-data-characteristics-and-metrics.-gathering-those-metrics-establishing-a-baseline-and-then-using-those-metrics-to-identify-the-ideal-database-configuration-options-is-a-low-to-moderate-level-of-effort.-this-is-best-validated-by-load-tests-and-experimentation."></a>

To establish this best practice, you must be aware of your current data characteristics and metrics. Gathering those metrics, establishing a baseline and then using those metrics to identify the ideal database configuration options is a *low* to *moderate* level of effort. This is best validated by load tests and experimentation. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Databases with AWS ](https://aws.amazon.com/products/databases/?ref=wellarchitected) 
+  [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/?ref=wellarchitected) 
+  [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/?ref=wellarchitected) 
+  [Amazon Aurora best practices ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html?ref=wellarchitected) 
+  [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html?ref=wellarchitected) 
+  [Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/?ref=wellarchitected) 
+  [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/?ref=wellarchitected) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html?ref=wellarchitected) 

 

 **Related videos:** 
+  [AWS purpose-built databases (DAT209-L) ](https://www.youtube.com/watch?v=q81TVuV5u28)
+ [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54) 
+  [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM)

 **Related examples:** 
+  [Amazon DynamoDB Examples](https://github.com/aws-samples/aws-dynamodb-examples) 
+  [AWS Database migration samples](https://github.com/aws-samples/aws-database-migration-samples) 
+  [Database Modernization Workshop](https://github.com/aws-samples/amazon-rds-purpose-built-workshop) 
+  [Working with parameters on your Amazon RDS for PostgreSQL DB](https://github.com/awsdocs/amazon-rds-user-guide/blob/main/doc_source/Appendix.PostgreSQL.CommonDBATasks.Parameters.md) 

# PERF04-BP03 Collect and record database performance metrics
<a name="perf_right_database_solution_collect_metrics"></a>

 To understand how your data management systems are performing, it is important to track relevant metrics. These metrics help you optimize your data management resources, ensure that your workload requirements are met, and maintain a clear overview of how the workload performs. Use tools, libraries, and systems that record performance measurements related to database performance. 

 

 There are metrics related to the system on which the database is hosted (for example, CPU, storage, memory, and IOPS), and metrics for accessing the data itself (for example, transactions per second, query rates, response times, and errors). These metrics should be readily accessible to any support or operational staff, with sufficient historical record to identify trends, anomalies, and bottlenecks. 

 

 **Desired outcome:** To monitor the performance of your database workloads, you must record multiple performance metrics over a period of time. This allows you to detect anomalies as well as measure performance against business metrics to ensure you are meeting your workload needs. 

 **Common anti-patterns:** 
+  You only use manual log file searching for metrics. 
+  You only publish metrics to internal tools used by your team and don’t have a comprehensive picture of your workload. 
+  You only use the default metrics recorded by your selected monitoring software. 
+  You only review metrics when there is an issue. 
+  You only monitor system level metrics, not capturing data access or usage metrics. 

 **Benefits of establishing this best practice:** Establishing a performance baseline helps you understand the normal behavior and requirements of workloads. Abnormal patterns can be identified and debugged faster, improving the performance and reliability of the database. Database capacity can be configured to ensure optimal cost without compromising performance. 

 **Level of risk exposed if this best practice is not established:** High 
+  Inability to differentiate abnormal from normal performance levels creates difficulties in issue identification and decision making. 
+  Potential cost savings may not be identified. 
+  Growth patterns will not be identified which might result in reliability or performance degradation. 

## Implementation guidance
<a name="implementation-guidance"></a>

 Identify, collect, aggregate, and correlate database-related metrics. Metrics should cover both the underlying system supporting the database and the database itself. The underlying system metrics might include CPU utilization, memory, available disk storage, disk I/O, and network inbound and outbound metrics, while the database metrics might include transactions per second, top queries, average query rates, response times, index usage, table locks, query timeouts, and the number of open connections. This data is crucial to understanding how the workload is performing and how the database solution is used. Use these metrics as part of a data-driven approach to tune and optimize your workload's resources.  
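To make the data-driven approach concrete, the sketch below (plain Python; the latency values and the two-standard-deviation threshold are illustrative assumptions, not a prescribed method) derives a baseline from recorded query latencies and flags datapoints that deviate sharply from it:

```python
from statistics import mean, stdev

def find_anomalies(samples, threshold=2.0):
    """Flag datapoints more than `threshold` standard deviations from the baseline mean."""
    baseline_mean = mean(samples)
    baseline_stdev = stdev(samples)
    return [
        (i, value)
        for i, value in enumerate(samples)
        if abs(value - baseline_mean) > threshold * baseline_stdev
    ]

# Hypothetical p99 query latencies (ms) collected over time
latencies = [12.1, 11.8, 12.4, 11.9, 12.0, 12.2, 95.0, 12.3]
anomalies = find_anomalies(latencies)  # the 95.0 ms spike stands out
```

In practice the same idea is applied by your monitoring tooling (for example, CloudWatch anomaly detection) against the recorded metric history rather than hand-rolled code.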

 **Implementation steps:** 

1.  Which database metrics are important to track? 

   1.  [Monitoring metrics for Amazon RDS](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Monitoring.html) 

   1.  [Monitoring with Performance Insights](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_PerfInsights.html) 

   1.  [Enhanced monitoring](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_Monitoring.OS.overview.html) 

   1.  [DynamoDB metrics](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/metrics-dimensions.html) 

   1.  [Monitoring DynamoDB DAX](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DAX.Monitoring.html) 

   1.  [Monitoring MemoryDB](https://docs.aws.amazon.com/memorydb/latest/devguide/monitoring-cloudwatch.html) 

   1.  [Monitoring Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/mgmt/metrics.html) 

   1.  [Timeseries metrics and dimensions](https://docs.aws.amazon.com/timestream/latest/developerguide/metrics-dimensions.html) 

   1.  [Cluster level metrics for Aurora](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.Monitoring.Metrics.html) 

   1.  [Monitoring Amazon Keyspaces](https://docs.aws.amazon.com/keyspaces/latest/devguide/monitoring.html) 

   1.  [Monitoring Amazon Neptune](https://docs.aws.amazon.com/neptune/latest/userguide/monitoring.html) 

1.  Would the database monitoring benefit from a machine learning solution that detects operational anomalies and performance issues? 

   1.  [Amazon DevOps Guru for Amazon RDS](https://docs.aws.amazon.com/devops-guru/latest/userguide/working-with-rds.overview.how-it-works.html) provides visibility into performance issues and makes recommendations for corrective actions. 

1.  Do you need application level details about SQL usage? 

   1.  [AWS X-Ray](https://docs.aws.amazon.com/xray/latest/devguide/xray-api-segmentdocuments.html#api-segmentdocuments-sql) can be instrumented into the application to gain insights and encapsulate all the data points for a single query. 

1.  Do you currently have an approved logging and monitoring solution? 

   1.  [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) can collect metrics across the resources in your architecture. You can also collect and publish custom metrics to surface business or derived metrics. Use CloudWatch or third-party solutions to set alarms that indicate when thresholds are breached. 

1.  Have you identified and configured your data retention policies to match your security and operational goals? 

   1.  [Default data retention for CloudWatch metrics](https://aws.amazon.com/cloudwatch/faqs/#AWS_resource_.26_custom_metrics_monitoring) 

   1.  [Default data retention for CloudWatch Logs](https://aws.amazon.com/cloudwatch/faqs/#Log_management) 
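The CloudWatch step above mentions publishing custom metrics. As a minimal sketch, the following shapes a single datapoint for CloudWatch's `PutMetricData` API; the namespace, metric name, and instance identifier are illustrative, and the actual publish call (shown in a comment) would use boto3:

```python
from datetime import datetime, timezone

def build_metric_datum(name, value, unit, dimensions):
    """Shape one datapoint in the structure expected by PutMetricData."""
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in dimensions.items()],
        "Timestamp": datetime.now(timezone.utc),
        "Value": value,
        "Unit": unit,
    }

# Hypothetical derived metric: open connections on a database instance
datum = build_metric_datum(
    "OpenConnections", 42, "Count", {"DBInstanceIdentifier": "orders-db"}
)
# With boto3, this would be published as:
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="Custom/Database", MetricData=[datum])
```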

 **Level of effort for the implementation plan:** There is a *medium* level of effort to identify, track, collect, aggregate, and correlate metrics from all database resources. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+ [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/) 
+ [ Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/)
+ [ Amazon Aurora best practices ](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html)
+  [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/)
+ [Amazon DynamoDB best practices ](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html) 
+ [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/) 
+ [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html) 
+ [Cloud Databases with AWS](https://aws.amazon.com/products/databases/) 
+  [Amazon RDS Performance Insights](https://aws.amazon.com/rds/performance-insights/) 

 **Related videos:** 
+ [AWS purpose-built databases (DAT209-L) ](https://www.youtube.com/watch?v=q81TVuV5u28) 
+  [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54)
+  [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM)

 **Related examples:** 
+  [Level 100: Monitoring with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_with_cloudwatch_dashboards/) 
+  [AWS Dataset Ingestion Metrics Collection Framework](https://github.com/awslabs/aws-dataset-ingestion-metrics-collection-framework) 
+  [Amazon RDS Monitoring Workshop](https://www.workshops.aws/?tag=Enhanced%20Monitoring) 

# PERF04-BP04 Choose data storage based on access patterns
<a name="perf_right_database_solution_access_patterns"></a>

 Use the access patterns of the workload to decide which services and technologies to use. In addition to non-functional requirements such as performance and scale, access patterns heavily influence the choice of database and storage solutions. The first dimension is the need for transactions, ACID compliance, and consistent reads. Not every database supports these, and most NoSQL databases provide an eventual consistency model. The second dimension is the distribution of writes and reads over time and space. Globally distributed applications must consider traffic patterns, latency, and access requirements to identify the optimal storage solution. The third is query flexibility: random access patterns and one-time queries. Highly specialized query functionality for text and natural language processing, time series, and graphs must also be taken into account. 

 **Desired outcome:** The data storage has been selected based on identified and documented data access patterns. This might include the most common read, write and delete queries, the need for ad-hoc calculations and aggregations, complexity of the data, the data interdependency, and the required consistency needs. 

 **Common anti-patterns:** 
+  You only select one database vendor to simplify operations management. 
+  You assume that data access patterns will stay consistent over time. 
+  You implement complex transactions, rollback, and consistency logic in the application. 
+  The database is configured to support a potential high traffic burst, which results in the database resources remaining idle most of the time. 
+  You use a shared database for both transactional and analytical purposes. 

 **Benefits of establishing this best practice:** Selecting and optimizing your data storage based on access patterns will help decrease development complexity and optimize your performance opportunities. Understanding when to use read replicas, global tables, data partitioning, and caching will help you decrease operational overhead and scale based on your workload needs. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Identify and evaluate your data access pattern to select the correct storage configuration. Each database solution has options to configure and optimize your storage solution. Use the collected metrics and logs and experiment with options to find the optimal configuration. Use the following table to review storage options per database service. 


|  AWS Services  |  Amazon RDS, Amazon Aurora  |  Amazon DynamoDB  |  Amazon DocumentDB  |  Amazon ElastiCache  |  Amazon Neptune  |  Amazon Timestream  |  Amazon Keyspaces  |  Amazon QLDB  | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | 
|  Scaling Storage  |  Storage automatic scaling option available to automatically scale provisioned storage. IOPS can also be scaled independently of provisioned storage when using Provisioned IOPS storage types  |  Automatically scales. Tables are unconstrained in terms of size.  |  Storage automatic scaling option available to automatically scale provisioned storage  |  Storage is in-memory, tied to instance type or count  |  Storage automatic scaling option available to automatically scale provisioned storage  |  Configure retention period for in-memory and magnetic tiers in days  |  Scales table storage up and down automatically  |  Automatically scales. Tables are unconstrained in terms of size.  | 

 

 **Implementation steps:** 

1.  Identify and document the anticipated growth of the data and traffic. 

   1.  Amazon RDS and Aurora support storage automatic scaling up to documented limits. Beyond this, consider transitioning older data to Amazon S3 for archival, aggregating historical data for analytics or scaling horizontally via sharding. 

   1.  DynamoDB and Amazon S3 will scale to near limitless storage volume automatically. 

   1.  Amazon RDS instances and databases running on EC2 can be manually resized and EC2 instances can have new EBS volumes added at a later date for additional storage.  

   1.  Instance types can be changed based on changes in activity. For example, you can start with a smaller instance while you are testing, then scale the instance as you begin to receive production traffic to the service. Aurora Serverless V2 automatically scales in response to changes in load.  

1.  Document requirements around normal and peak performance (transactions per second (TPS) and queries per second (QPS)) and consistency (ACID and eventual consistency). 

1.  Document solution deployment aspects and the database access requirements (global, Multi-AZ, read replication, multiple write nodes). 
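Step 1 above notes that Amazon RDS supports storage automatic scaling. A hedged sketch of how the corresponding `ModifyDBInstance` parameters might be assembled (the instance identifier and sizes are illustrative; with boto3 the dict would be passed to `modify_db_instance`):

```python
def storage_autoscaling_params(db_instance_id, current_gib, ceiling_gib):
    """Parameters for RDS ModifyDBInstance to enable storage autoscaling.

    RDS grows storage automatically up to MaxAllocatedStorage once free
    space runs low; the ceiling must exceed the current allocation for
    the feature to take effect."""
    if ceiling_gib <= current_gib:
        raise ValueError("autoscaling ceiling must exceed current allocation")
    return {
        "DBInstanceIdentifier": db_instance_id,
        "MaxAllocatedStorage": ceiling_gib,
        "ApplyImmediately": True,
    }

params = storage_autoscaling_params("orders-db", current_gib=100, ceiling_gib=500)
# With boto3: boto3.client("rds").modify_db_instance(**params)
```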

 **Level of effort for the implementation plan:** If you do not have logs or metrics for your data management solution, you must collect them before identifying and documenting your data access patterns. Once your data access patterns are understood, selecting and configuring your data storage is a *low* level of effort. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+ [AWS Database Caching ](https://aws.amazon.com/caching/database-caching/)
+ [Amazon Athena top 10 performance tips ](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) 
+ [Amazon Aurora best practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html) 
+ [Amazon DynamoDB Accelerator ](https://aws.amazon.com/dynamodb/dax/) 
+ [Amazon DynamoDB best practices ](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html) 
+ [Amazon Redshift Spectrum best practices ](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/) 
+ [Amazon Redshift performance ](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html) 
+  [Cloud Databases with AWS](https://aws.amazon.com/products/databases/)
+  [Amazon RDS Storage Types](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_Storage.html) 

 **Related videos:** 
+ [AWS purpose-built databases (DAT209-L)](https://www.youtube.com/watch?v=q81TVuV5u28) 
+  [Amazon Aurora storage demystified: How it all works (DAT309-R) ](https://www.youtube.com/watch?v=uaQEGLKtw54)
+ [ Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1) ](https://www.youtube.com/watch?v=6yqfmXiZTlM)

 **Related examples:** 
+  [Experiment and test with Distributed Load Testing on AWS](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) 

# PERF04-BP05 Optimize data storage based on access patterns and metrics
<a name="perf_right_database_solution_optimize_metrics"></a>

 Use performance characteristics and access patterns that optimize how data is stored or queried to achieve the best possible performance. Measure how optimizations such as indexing, key distribution, data warehouse design, or caching strategies impact system performance or overall efficiency. 

 **Common anti-patterns:** 
+  You only use manual log file searching for metrics. 
+  You only publish metrics to internal tools. 

 **Benefits of establishing this best practice:** Monitoring database performance metrics related to both reads and writes ensures you are meeting the metrics required for the workload. You can use this data to add new read and write optimizations to the data storage layer. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>

 Optimize data storage based on metrics and patterns: Use reported metrics to identify any underperforming areas in your workload and optimize your database components. Each database system has different performance related characteristics to evaluate, such as how data is indexed, cached, or distributed among multiple systems. Measure the impact of your optimizations. 
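Measuring the impact of an optimization can be as simple as timing the same query before and after the change. The following self-contained sketch uses SQLite purely for illustration (table and column names are made up) to compare a full table scan with an index lookup:

```python
import sqlite3
import time

# In-memory table with enough rows to make a table scan measurable
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO orders (customer) VALUES (?)",
    [(f"cust-{i % 1000}",) for i in range(100_000)],
)

def timed(query, *args):
    """Run a query and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(query, args).fetchall()
    return rows, time.perf_counter() - start

# Before: full table scan
rows, scan_time = timed("SELECT * FROM orders WHERE customer = ?", "cust-42")

# After: the same query served by an index lookup
conn.execute("CREATE INDEX idx_customer ON orders (customer)")
rows_idx, index_time = timed("SELECT * FROM orders WHERE customer = ?", "cust-42")
```

The same before/after discipline applies to key distribution, caching, or data warehouse design changes: record the metric, apply one change, and measure again.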

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Database Caching](https://aws.amazon.com/caching/database-caching/) 
+  [Amazon Athena top 10 performance tips](https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/) 
+  [Amazon Aurora best practices](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.BestPractices.html) 
+  [Amazon DynamoDB Accelerator](https://aws.amazon.com/dynamodb/dax/) 
+  [Amazon DynamoDB best practices](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/BestPractices.html) 
+  [Amazon Redshift Spectrum best practices](https://aws.amazon.com/blogs/big-data/10-best-practices-for-amazon-redshift-spectrum/) 
+  [Amazon Redshift performance](https://docs.aws.amazon.com/redshift/latest/dg/c_challenges_achieving_high_performance_queries.html) 
+  [Cloud Databases with AWS](https://aws.amazon.com/products/databases/) 
+  [Analyzing performance anomalies with DevOps Guru for RDS](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/devops-guru-for-rds.html) 
+  [Read/Write Capacity Mode for DynamoDB](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html) 

 **Related videos:** 
+  [AWS purpose-built databases (DAT209-L)](https://www.youtube.com/watch?v=q81TVuV5u28) 
+  [Amazon Aurora storage demystified: How it all works (DAT309-R)](https://www.youtube.com/watch?v=uaQEGLKtw54) 
+  [Amazon DynamoDB deep dive: Advanced design patterns (DAT403-R1)](https://www.youtube.com/watch?v=6yqfmXiZTlM) 

 **Related examples:** 
+  [Hands-on Labs for Amazon DynamoDB](https://amazon-dynamodb-labs.workshop.aws/hands-on-labs.html) 

# PERF 5  How do you configure your networking solution?
<a name="perf-05"></a>

 The optimal network solution for a workload varies based on latency, throughput requirements, jitter, and bandwidth. Physical constraints, such as user or on-premises resources, determine location options. These constraints can be offset with edge locations or resource placement. 

**Topics**
+ [PERF05-BP01 Understand how networking impacts performance](perf_select_network_understand_impact.md)
+ [PERF05-BP02 Evaluate available networking features](perf_select_network_evaluate_features.md)
+ [PERF05-BP03 Choose appropriately sized dedicated connectivity or VPN for hybrid workloads](perf_select_network_hybrid.md)
+ [PERF05-BP04 Leverage load-balancing and encryption offloading](perf_select_network_encryption_offload.md)
+ [PERF05-BP05 Choose network protocols to improve performance](perf_select_network_protocols.md)
+ [PERF05-BP06 Choose your workload’s location based on network requirements](perf_select_network_location.md)
+ [PERF05-BP07 Optimize network configuration based on metrics](perf_select_network_optimize.md)

# PERF05-BP01 Understand how networking impacts performance
<a name="perf_select_network_understand_impact"></a>

 Analyze and understand how network-related decisions impact workload performance. The network is responsible for the connectivity between application components, cloud services, edge networks, and on-premises data, and can therefore significantly impact workload performance. In addition to workload performance, user experience is also impacted by network latency, bandwidth, protocols, location, network congestion, jitter, throughput, and routing rules. 

 **Desired outcome:** Have a documented list of networking requirements from the workload including latency, packet size, routing rules, protocols, and supporting traffic patterns. Review the available networking solutions and identify which service meets your workload networking characteristics. Cloud-based networks can be quickly rebuilt, so evolving your network architecture over time is necessary to improve performance efficiency. 

 **Common anti-patterns:** 
+  All traffic flows through your existing data centers. 
+  You overbuild Direct Connect sessions without understanding the actual usage requirements. 
+  You don’t consider workload characteristics and encryption overhead when defining your networking solutions. 
+  You use on-premises concepts and strategies for networking solutions in the cloud. 

 **Benefits of establishing this best practice:** Understanding how networking impacts workload performance will help you identify potential bottlenecks, improve user experience, increase reliability, and lower operational maintenance as the workload changes. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Identify important network performance metrics of your workload and capture its networking characteristics. Define and document requirements as part of a data-driven approach, using benchmarking or load testing. Use this data to identify where your network solution is constrained, and examine configuration options that could improve the workload. Understand the cloud-native networking features and options available and how they can impact your workload performance based on the requirements. Each networking feature has advantages and disadvantages and can be configured to meet your workload characteristics and scale based on your needs. 

 **Implementation steps:** 

1.  Define and document networking performance requirements: 

   1.  Include metrics such as network latency, bandwidth, protocols, locations, traffic patterns (spikes and frequency), throughput, encryption, inspection, and routing rules 

1.  Capture your foundational networking characteristics: 

   1.  [VPC Flow Logs ](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 

   1.  [AWS Transit Gateway metrics](https://docs.aws.amazon.com/vpc/latest/tgw/transit-gateway-cloudwatch-metrics.html) 

   1.  [AWS PrivateLink metrics](https://docs.aws.amazon.com/vpc/latest/privatelink/privatelink-cloudwatch-metrics.html) 

1.  Capture your application networking characteristics: 

   1.  [Elastic Network Adaptor](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html) 

   1.  [AWS App Mesh metrics](https://docs.aws.amazon.com/app-mesh/latest/userguide/envoy-metrics.html) 

   1.  [Amazon API Gateway metrics](https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-metrics-and-dimensions.html) 

1.  Capture your edge networking characteristics: 

   1.  [Amazon CloudFront metrics](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/viewing-cloudfront-metrics.html) 

   1.  [Amazon Route 53 metrics](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/monitoring-cloudwatch.html) 

   1.  [AWS Global Accelerator metrics](https://docs.aws.amazon.com/global-accelerator/latest/dg/cloudwatch-monitoring.html) 

1.  Capture your hybrid networking characteristics: 

   1.  [Direct Connect metrics](https://docs.aws.amazon.com/directconnect/latest/UserGuide/monitoring-cloudwatch.html) 

   1.  [AWS Site-to-Site VPN metrics](https://docs.aws.amazon.com/vpn/latest/s2svpn/monitoring-cloudwatch-vpn.html) 

   1.  [AWS Client VPN metrics](https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/monitoring-cloudwatch.html) 

   1.  [AWS Cloud WAN metrics](https://docs.aws.amazon.com/vpc/latest/cloudwan/cloudwan-cloudwatch-metrics.html) 

1.  Capture your security networking characteristics: 

   1.  [AWS Shield, WAF, and Network Firewall metrics](https://docs.aws.amazon.com/waf/latest/developerguide/monitoring-cloudwatch.html) 

1.  Capture end-to-end performance metrics with tracing tools: 

   1.  [AWS X-Ray](https://aws.amazon.com/xray/) 

   1.  [Amazon CloudWatch RUM](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-RUM.html) 

1.  Benchmark and test network performance: 

   1.  [Benchmark](https://aws.amazon.com/premiumsupport/knowledge-center/network-throughput-benchmark-linux-ec2/) network throughput: several factors can affect Amazon EC2 network performance when instances are in the same VPC. Measure the network bandwidth between EC2 Linux instances in the same VPC. 

   1.  Perform [load tests](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) to experiment with networking solutions and options 
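As an example of working with the foundational metrics above, the sketch below parses one record in the default VPC Flow Logs format (version, account ID, interface ID, addresses, ports, protocol, packets, bytes, window, action, and log status); the account and interface identifiers are made up:

```python
# Field order of the default VPC Flow Logs record format
FLOW_LOG_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def parse_flow_log(line):
    """Parse one space-separated flow log record into a dict."""
    record = dict(zip(FLOW_LOG_FIELDS, line.split()))
    for field in ("srcport", "dstport", "packets", "bytes"):
        record[field] = int(record[field])
    return record

# Sample record: traffic to a database port, accepted
line = ("2 123456789012 eni-0a1b2c3d 10.0.1.5 10.0.2.9 "
        "49152 5432 6 42 68474 1620000000 1620000060 ACCEPT OK")
record = parse_flow_log(line)
```

Aggregating such records by destination port or interface is one way to characterize actual traffic patterns before choosing a networking solution.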

 **Level of effort for the implementation plan:** There is a *medium* level of effort to document workload networking requirements, options, and available solutions. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+ [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) 
+ [EC2 Enhanced Networking on Linux ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) 
+ [EC2 Enhanced Networking on Windows ](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/enhanced-networking.html) 
+ [EC2 Placement Groups ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) 
+ [Enabling Enhanced Networking with the Elastic Network Adapter (ENA) on Linux Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) 
+ [Network Load Balancer ](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+ [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+  [Transit Gateway ](https://docs.aws.amazon.com/vpc/latest/tgw)
+ [Transitioning to latency-based routing in Amazon Route 53 ](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+ [VPC Endpoints ](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+ [VPC Flow Logs ](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 

 **Related videos:** 
+ [Connectivity to AWS and hybrid AWS network architectures (NET317-R1) ](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+ [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1) ](https://www.youtube.com/watch?v=DWiwuYtIgu0) 
+  [Improve Global Network Performance for Applications](https://youtu.be/vNIALfLTW9M) 
+  [EC2 Instances and Performance Optimization Best Practices](https://youtu.be/W0PKclqP3U0) 
+  [Optimizing Network Performance for Amazon EC2 Instances](https://youtu.be/DWiwuYtIgu0) 
+  [Networking best practices and tips with the Well-Architected Framework](https://youtu.be/wOMNpG49BeM) 
+  [AWS networking best practices in large-scale migrations](https://youtu.be/qCQvwLBjcbs) 

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 

# PERF05-BP02 Evaluate available networking features
<a name="perf_select_network_evaluate_features"></a>

Evaluate networking features in the cloud that may increase performance. Measure the impact of these features through testing, metrics, and analysis. For example, take advantage of network-level features that are available to reduce latency, packet loss, or jitter. 

Many services are created to improve performance and others commonly offer features to optimize network performance. Services such as AWS Global Accelerator and Amazon CloudFront exist to improve performance while most other services have product features to optimize network traffic. Review service features, such as EC2 instance network capability, enhanced networking instance types, Amazon EBS-optimized instances, Amazon S3 transfer acceleration, and CloudFront, to improve your workload performance. 

**Desired outcome:** You have documented the inventory of components within your workload and have identified which networking configurations per component will help you meet your performance requirements. After evaluating the networking features, you have experimented and measured the performance metrics to identify how to use the features available to you. 

**Common anti-patterns:** 
+ You put all your workloads into an AWS Region closest to your headquarters instead of an AWS Region close to your end users. 
+ You fail to benchmark your workload performance and to continually evaluate it against that benchmark.
+ You do not review service configurations for performance improving options. 

**Benefits of establishing this best practice:** Evaluating all service features and options can increase your workload performance, reduce the cost of infrastructure, decrease the effort required to maintain your workload, and increase your overall security posture. You can use the global AWS backbone to ensure that you provide the optimal networking experience for your customers. 

**Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

Review which network-related configuration options are available to you, and how they could impact your workload. Understanding how these options interact with your architecture and the impact that they will have on both measured performance and the performance perceived by users is critical for performance optimization. 
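When you benchmark against measured performance, summarizing samples as percentiles rather than averages keeps tail latency visible. A minimal sketch with illustrative numbers (the rounding-based rank here is an approximation of the nearest-rank method):

```python
def percentile(samples, pct):
    """Approximate nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

# Hypothetical round-trip latencies (ms) from a load test
samples = [8, 9, 9, 10, 10, 10, 11, 12, 14, 95]
p50 = percentile(samples, 50)  # typical user experience
p99 = percentile(samples, 99)  # tail latency the average would hide
```

A mean of these samples (about 18.8 ms) would suggest acceptable performance while masking the 95 ms tail that some users actually experience.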

**Implementation steps:** 

1. Create a list of workload components. 

   1. Build, manage, and monitor your organization's network using [AWS Cloud WAN](https://aws.amazon.com/cloud-wan/). 

   1. Get visibility into your network using [Network Manager](https://docs.aws.amazon.com/vpc/latest/tgwnm/what-is-network-manager.html). Use an existing configuration management database (CMDB) tool or a tool such as [AWS Config](https://aws.amazon.com/config/) to create an inventory of your workload and how it’s configured. 

1. If this is an existing workload, identify and document the benchmark for your performance metrics, focusing on the bottlenecks and areas to improve. Performance-related networking metrics will differ per workload based on business requirements and workload characteristics. As a start, these metrics might be important to review for your workload: bandwidth, latency, packet loss, jitter, and retransmits. 

1. If this is a new workload, perform [load tests](https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/) to identify performance bottlenecks. 

1. For the performance bottlenecks you identify, review the configuration options for your solutions to identify performance improvement opportunities. 

1. If you don’t know your network path or routes, use [Network Access Analyzer](https://docs.aws.amazon.com/vpc/latest/network-access-analyzer/what-is-vaa.html) to identify them. 

1. Review your network protocols to further reduce your latency.
   + [PERF05-BP05 Choose network protocols to improve performance](perf_select_network_protocols.md) 

1. If you are using an AWS Site-to-Site VPN across multiple locations to connect to an AWS Region, then review [accelerated Site-to-Site VPN connections](https://docs.aws.amazon.com/vpn/latest/s2svpn/accelerated-vpn.html) for opportunities to improve networking performance.

1. When your workload traffic is spread across multiple accounts, evaluate your network topology and services to reduce latency. 
   + Evaluate the operational and performance tradeoffs between [VPC peering](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html) and [AWS Transit Gateway](https://aws.amazon.com/transit-gateway/) when connecting multiple accounts. AWS Transit Gateway allows AWS Site-to-Site VPN throughput to scale beyond a single [IPsec tunnel's maximum](https://aws.amazon.com/blogs/networking-and-content-delivery/scaling-vpn-throughput-using-aws-transit-gateway/) by using equal-cost multi-path routing. Traffic between an Amazon VPC and AWS Transit Gateway remains on the private AWS network and is not exposed to the internet. AWS Transit Gateway simplifies how you interconnect all of your VPCs, which can span thousands of AWS accounts and extend into on-premises networks. Share your AWS Transit Gateway between multiple accounts using [AWS Resource Access Manager](https://aws.amazon.com/ram/). To get visibility into your global network traffic, use [Network Manager](https://aws.amazon.com/transit-gateway/network-manager/) for a central view of your network metrics. 

1. Review your user locations and minimize the distance between your users and the workload.

   1. [AWS Global Accelerator](https://aws.amazon.com/global-accelerator/) is a networking service that improves the performance of your users’ traffic by up to 60% using the Amazon Web Services global network infrastructure. When the internet is congested, AWS Global Accelerator optimizes the path to your application to keep packet loss, jitter, and latency consistently low. It also provides static IP addresses that simplify moving endpoints between Availability Zones or AWS Regions without needing to update your DNS configuration or change client-facing applications. 

   1. [Amazon CloudFront](https://aws.amazon.com/cloudfront/) can improve the performance of your workload content delivery and latency globally. CloudFront has over 410 globally dispersed points of presence that can cache your content and lower the latency to the end user. 

   1. Amazon Route 53 offers [latency-based routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-latency.html), [geolocation routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-geo.html), [geoproximity routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-geoproximity.html), and [IP-based routing](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/routing-policy-ipbased.html) options to help you improve your workload’s performance for a global audience. Identify which routing option would optimize your workload performance by reviewing your workload traffic and user location. 

1. Evaluate additional Amazon S3 features to improve data transfer performance. 

   1.  [Amazon S3 Transfer Acceleration](https://aws.amazon.com/s3/transfer-acceleration/) is a feature that lets external users benefit from the networking optimizations of CloudFront to upload data to Amazon S3. This improves the ability to transfer large amounts of data from remote locations that don’t have dedicated connectivity to the AWS Cloud. 

   1.  [Amazon S3 Multi-Region Access Points](https://docs.aws.amazon.com/AmazonS3/latest/userguide/MultiRegionAccessPoints.html) provide a single global endpoint for data replicated across buckets in multiple Regions. When you use a Multi-Region Access Point, read and write requests are routed to the bucket with the lowest latency. 

1. Review your compute resource network bandwidth.

   1. Elastic network interfaces (ENIs) used by EC2 instances, containers, and Lambda functions are limited on a per-flow basis. Review your placement groups to optimize your [EC2 networking throughput](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html). To avoid a per-flow bottleneck, design your application to use multiple flows. To monitor and get visibility into your compute-related networking metrics, use [CloudWatch metrics](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ec2-instance-network-bandwidth.html) and [monitor network performance for ENA](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html). `ethtool` support is included in the ENA driver and exposes additional network-related metrics that can be published as a [custom metric](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html) to CloudWatch. 

   1. Newer EC2 instances can leverage enhanced networking. Network-optimized instances built on the [AWS Nitro System](https://aws.amazon.com/ec2/nitro/), such as `M5n` and `M5dn`, take advantage of the fourth generation of custom Nitro cards to deliver up to 100 Gbps of network throughput to a single instance. These instances offer four times the network bandwidth and packet-processing rate of the base `M5` instances and are ideal for network-intensive applications. 

   1. The [Elastic Network Adapter](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) (ENA) provides further optimization by delivering better throughput for your instances within a [cluster placement group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html#placement-groups-cluster%23placement-groups-limitations-cluster). 

   1. [Elastic Fabric Adapter](https://aws.amazon.com/hpc/efa/) (EFA) is a network interface for Amazon EC2 instances that enables you to run workloads requiring high levels of internode communications at scale on AWS. With EFA, High Performance Computing (HPC) applications using the Message Passing Interface (MPI) and Machine Learning (ML) applications using NVIDIA Collective Communications Library (NCCL) can scale to thousands of CPUs or GPUs. 

   1. [Amazon EBS-optimized](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html) instances use an optimized configuration stack and provide additional, dedicated capacity to increase the Amazon EBS I/O. This optimization provides the best performance for your EBS volumes by minimizing contention between Amazon EBS I/O and other traffic from your instance. 
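As an illustration of the `ethtool` monitoring step above, the following sketch parses `ethtool -S`-style output and shapes the ENA allowance-exceeded counters into a CloudWatch `PutMetricData`-style payload. The counter names come from the ENA driver's statistics; treat the exact set (and the sample output) as an assumption to verify against your driver version.

```python
import re

# ENA allowance-exceeded counters reported by `ethtool -S eth0` (names as
# exposed by the ENA driver; a nonzero, growing value indicates the instance
# is hitting a network limit). Verify against your driver version.
ALLOWANCE_METRICS = {
    "bw_in_allowance_exceeded",
    "bw_out_allowance_exceeded",
    "pps_allowance_exceeded",
    "conntrack_allowance_exceeded",
    "linklocal_allowance_exceeded",
}

def parse_ethtool_stats(output: str) -> dict:
    """Parse `ethtool -S` output ("  name: value" lines) into a dict."""
    stats = {}
    for line in output.splitlines():
        m = re.match(r"\s*([\w\[\]\.]+):\s*(\d+)\s*$", line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats

def cloudwatch_metric_data(stats: dict) -> list:
    """Build a PutMetricData-style MetricData list for the allowance counters."""
    return [
        {"MetricName": name, "Value": float(stats[name]), "Unit": "Count"}
        for name in sorted(ALLOWANCE_METRICS & stats.keys())
    ]

# Fabricated sample output for illustration.
sample = """NIC statistics:
     tx_timeout: 0
     bw_in_allowance_exceeded: 42
     bw_out_allowance_exceeded: 0
     pps_allowance_exceeded: 7
"""
data = cloudwatch_metric_data(parse_ethtool_stats(sample))
```

To publish, you could pass the list to `boto3.client("cloudwatch").put_metric_data(Namespace="Custom/ENA", MetricData=data)` from a scheduled job on the instance.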

**Level of effort for the implementation plan:**

To establish this best practice, you must be aware of your current workload component options that impact network performance. Gathering the components, evaluating network improvement options, experimenting, implementing, and documenting those improvements is a *low* to *moderate* level of effort. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS - Optimized Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html) 
+  [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) 
+  [Amazon EC2 instance network bandwidth](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-network-bandwidth.html) 
+  [EC2 Enhanced Networking on Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) 
+  [EC2 Enhanced Networking on Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/enhanced-networking.html) 
+  [EC2 Placement Groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) 
+  [Enabling Enhanced Networking with the Elastic Network Adapter (ENA) on Linux Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) 
+  [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+  [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+  [AWS Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw) 
+  [Transitioning to Latency-Based Routing in Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+  [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+  [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 
+  [Building a cloud CMDB](https://aws.amazon.com/blogs/mt/building-a-cloud-cmdb-on-aws-for-consistent-resource-configuration-in-hybrid-environments/) 
+  [Scaling VPN throughput using AWS Transit Gateway](https://aws.amazon.com/blogs/networking-and-content-delivery/scaling-vpn-throughput-using-aws-transit-gateway/) 

 **Related videos:** 
+  [Connectivity to AWS and hybrid AWS network architectures (NET317-R1)](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+  [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1)](https://www.youtube.com/watch?v=DWiwuYtIgu0) 
+  [AWS Global Accelerator](https://www.youtube.com/watch?v=lAOhr-5Urfk) 

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 

# PERF05-BP03 Choose appropriately sized dedicated connectivity or VPN for hybrid workloads
<a name="perf_select_network_hybrid"></a>

 When a common network is required to connect on-premises and cloud resources in AWS, ensure that you have adequate bandwidth to meet your performance requirements. Estimate the bandwidth and latency requirements for your hybrid workload. These numbers will drive the sizing requirements for AWS Direct Connect or your VPN endpoints. 

 **Desired outcome:** When deploying a workload that requires hybrid network connectivity, you select from the available connectivity options, such as managed and unmanaged VPNs or AWS Direct Connect, choosing the appropriate connection type for each workload while meeting your bandwidth and encryption requirements between your location and the cloud. 

 **Common anti-patterns:** 
+  You only evaluate VPN solutions for your network encryption requirements. 
+  You don’t evaluate backup or parallel connectivity options. 
+  You use default configurations for routers, tunnels, and BGP sessions. 
+  You fail to understand or identify all workload requirements (encryption, protocol, bandwidth, and traffic needs). 

 **Benefits of establishing this best practice:** Selecting and configuring appropriately sized hybrid network solutions will increase the reliability of your workload and maximize performance opportunities. By identifying workload requirements, planning ahead, and evaluating hybrid solutions, you will minimize expensive physical network changes and operational overhead while accelerating your time to market. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Develop a hybrid networking architecture based on your bandwidth requirements: Estimate the bandwidth and latency requirements of your hybrid applications. A single VPN or Direct Connect connection might not provide enough bandwidth, in which case you must architect a hybrid setup that load balances traffic across multiple connections. AWS Direct Connect offers more predictable and consistent performance because of its private network connectivity, making it well suited to production workloads that require consistent latency and minimal jitter. 

 AWS Direct Connect provides dedicated connectivity to the AWS environment, from 50 Mbps up to 10 Gbps. This gives you managed and controlled latency and provisioned bandwidth so your workload can connect easily and in a performant way to other environments. Using one of the AWS Direct Connect partners, you can have end-to-end connectivity from multiple environments, thus providing an extended network with consistent performance. 

 The AWS Site-to-Site VPN is a managed VPN service for VPCs. When a VPN connection is created, AWS provides tunnels to two different VPN endpoints. With AWS Transit Gateway, you can simplify the connectivity between multiple VPCs and also connect to any VPC attached to AWS Transit Gateway with a single VPN connection. AWS Transit Gateway also enables you to scale beyond the 1.25Gbps IPsec VPN throughput limit by enabling equal cost multi-path (ECMP) routing support over multiple VPN tunnels. 
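The 1.25 Gbps per-tunnel limit and ECMP scaling described above can be turned into a quick sizing check. This is a minimal sketch: the function name and the default 80% utilization headroom target are our assumptions, not AWS guidance.

```python
import math

# Approximate per-tunnel AWS Site-to-Site VPN (IPsec) throughput limit.
IPSEC_TUNNEL_GBPS = 1.25

def tunnels_for_throughput(required_gbps: float, utilization_target: float = 0.8) -> int:
    """Number of ECMP-balanced VPN tunnels needed to carry `required_gbps`,
    keeping each tunnel at or below `utilization_target` of its 1.25 Gbps limit."""
    if required_gbps <= 0:
        return 0
    return math.ceil(required_gbps / (IPSEC_TUNNEL_GBPS * utilization_target))

# Example: a 4 Gbps hybrid workload with 80% headroom needs 4 tunnels.
needed = tunnels_for_throughput(4.0)
```

If the tunnel count (and the resulting operational overhead) is high, that is a signal to evaluate AWS Direct Connect instead.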

 **Level of effort for the implementation plan:** There is a *high* level of effort to evaluate workload needs for hybrid networks and to implement hybrid networking solutions. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+ [Network Load Balancer ](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+ [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+ [Transit Gateway ](https://docs.aws.amazon.com/vpc/latest/tgw) 
+ [Transitioning to latency-based Routing in Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+ [VPC Endpoints ](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+ [VPC Flow Logs ](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 
+  [Site-to-Site VPN](https://docs.aws.amazon.com/vpn/latest/s2svpn/VPC_VPN.html) 
+  [Building a Scalable and Secure Multi-VPC AWS Network Infrastructure](https://docs.aws.amazon.com/whitepapers/latest/building-scalable-secure-multi-vpc-network-infrastructure/welcome.html) 
+  [Direct Connect](https://docs.aws.amazon.com/directconnect/latest/UserGuide/Welcome.html) 
+  [Client VPN](https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/what-is.html) 

 **Related videos:** 
+ [Connectivity to AWS and hybrid AWS network architectures (NET317-R1) ](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+ [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1) ](https://www.youtube.com/watch?v=DWiwuYtIgu0) 
+  [AWS Global Accelerator](https://www.youtube.com/watch?v=lAOhr-5Urfk) 
+  [Direct Connect](https://www.youtube.com/watch?v=DXFooR95BYc&t=6s) 
+  [Transit Gateway Connect](https://www.youtube.com/watch?v=_MPY_LHSKtM&t=491s) 
+  [VPN Solutions](https://www.youtube.com/watch?v=qmKkbuS9gRs) 
+  [Security with VPN Solutions](https://www.youtube.com/watch?v=FrhVV9nG4UM) 

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 

# PERF05-BP04 Leverage load-balancing and encryption offloading
<a name="perf_select_network_encryption_offload"></a>

 Distribute traffic across multiple resources or services to allow your workload to take advantage of the elasticity that the cloud provides. You can also use load balancing for offloading encryption termination to improve performance and to manage and route traffic effectively. 

 When implementing a scale-out architecture in which you use multiple instances to serve content, you can use load balancers inside your Amazon VPC. Elastic Load Balancing (ELB) provides multiple load balancer types for your applications. Application Load Balancer is best suited for load balancing HTTP and HTTPS traffic and provides advanced request routing targeted at the delivery of modern application architectures, including microservices and containers. 

 Network Load Balancer is best suited for load balancing of TCP traffic where extreme performance is required. It is capable of handling millions of requests per second while maintaining ultra-low latencies, and it is optimized to handle sudden and volatile traffic patterns. 

 [Elastic Load Balancing](https://aws.amazon.com/elasticloadbalancing/) provides integrated certificate management and SSL/TLS decryption, allowing you the flexibility to centrally manage the SSL settings of the load balancer and offload CPU intensive work from your workload. 

 **Common anti-patterns:** 
+  You route all internet traffic through existing load balancers. 
+  You use generic TCP load balancing and make each compute node handle SSL encryption. 

 **Benefits of establishing this best practice:** A load balancer handles the varying load of your application traffic in a single Availability Zone, or across multiple Availability Zones. Load balancers feature the high availability, automatic scaling, and robust security necessary to make your applications fault tolerant. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Use the appropriate load balancer for your workload: Select the appropriate load balancer for your workload. If you must load balance HTTP requests, we recommend Application Load Balancer. For network and transport protocols (layer 4 – TCP, UDP) load balancing, and for extreme performance and low latency applications, we recommend Network Load Balancer. Application Load Balancers support HTTPS and Network Load Balancers support TLS encryption offloading. 

 Enable offload of HTTPS or TLS encryption: Elastic Load Balancing includes integrated certificate management, user-authentication, and SSL/TLS decryption. It provides the flexibility to centrally manage TLS settings and offload CPU intensive workloads from your applications. Encrypt all HTTPS traffic as part of your load balancer deployment. 
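As a sketch of what TLS offload looks like in practice, the following builds the parameters for an ELBv2 `CreateListener` call that terminates HTTPS at an Application Load Balancer and forwards plain HTTP to the targets. The ARNs are placeholders, and the security policy name is one published ELB policy; verify the policies available in your Region before use.

```python
# Placeholder ARNs for illustration only.
LB_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/app/my-alb/abc123"
CERT_ARN = "arn:aws:acm:us-east-1:123456789012:certificate/11111111-2222-3333-4444-555555555555"
TG_ARN = "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/my-targets/def456"

def https_offload_listener(lb_arn: str, cert_arn: str, target_group_arn: str) -> dict:
    """Keyword arguments for an ELBv2 CreateListener call that terminates
    TLS at the load balancer and forwards decrypted traffic to the targets."""
    return {
        "LoadBalancerArn": lb_arn,
        "Protocol": "HTTPS",  # TLS terminates here, not on each compute node
        "Port": 443,
        "SslPolicy": "ELBSecurityPolicy-TLS13-1-2-2021-06",
        "Certificates": [{"CertificateArn": cert_arn}],
        "DefaultActions": [{"Type": "forward", "TargetGroupArn": target_group_arn}],
    }

params = https_offload_listener(LB_ARN, CERT_ARN, TG_ARN)
# To apply: boto3.client("elbv2").create_listener(**params)
```

Centralizing the certificate in the listener this way means rotation happens in one place (ACM plus the listener) instead of on every instance.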

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS - Optimized Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html) 
+  [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) 
+  [EC2 Enhanced Networking on Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) 
+  [EC2 Enhanced Networking on Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/enhanced-networking.html) 
+  [EC2 Placement Groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) 
+  [Enabling Enhanced Networking with the Elastic Network Adapter (ENA) on Linux Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) 
+  [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+  [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+  [Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw) 
+  [Transitioning to Latency-Based Routing in Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+  [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+  [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 

 **Related videos:** 
+  [Connectivity to AWS and hybrid AWS network architectures (NET317-R1)](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+  [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1)](https://www.youtube.com/watch?v=DWiwuYtIgu0) 

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 

# PERF05-BP05 Choose network protocols to improve performance
<a name="perf_select_network_protocols"></a>

 Make decisions about protocols for communication between systems and networks based on the impact to the workload’s performance. 

 There is a relationship between latency and bandwidth that determines throughput. If your file transfer uses TCP, higher latencies reduce overall throughput. You can address this with TCP tuning and optimized transfer protocols; some approaches use UDP. 

 **Common anti-patterns:** 
+  You use TCP for all workloads regardless of performance requirements. 

 **Benefits of establishing this best practice:** Selecting the proper protocol for communication between workload components ensures that you are getting the best performance for that workload. Connection-less UDP allows for high speed, but it doesn't offer retransmission or high reliability. TCP is a full featured protocol, but it requires greater overhead for processing the packets. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Optimize network traffic: Select the appropriate protocol to optimize the performance of your workload. There is a relationship between latency and bandwidth that determines throughput. If your file transfer uses TCP, higher latencies reduce overall throughput. You can address latency with TCP tuning and optimized transfer protocols, some of which use UDP. 
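The latency/throughput relationship above follows from the bandwidth-delay product: a single TCP flow cannot exceed its effective window size divided by the round-trip time, regardless of link bandwidth. A small illustration (function name is ours):

```python
def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on single-flow TCP throughput: window / round-trip time.
    Real flows also lose throughput to loss recovery and slow start."""
    bits = window_bytes * 8
    rtt_s = rtt_ms / 1000.0
    return bits / rtt_s / 1_000_000

# A default-ish 64 KiB window over an 80 ms cross-region path caps a single
# flow at roughly 6.5 Mbps, no matter how fast the link is. The same window
# over a 10 ms path allows about 52 Mbps, which is why reducing latency
# (or raising the window, or using UDP-based protocols) raises throughput.
slow_path = max_tcp_throughput_mbps(64 * 1024, 80)
fast_path = max_tcp_throughput_mbps(64 * 1024, 10)
```

This is why high-latency transfers benefit from TCP window tuning, parallel flows, or UDP-based transfer protocols.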

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS - Optimized Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html) 
+  [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) 
+  [EC2 Enhanced Networking on Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) 
+  [EC2 Enhanced Networking on Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/enhanced-networking.html) 
+  [EC2 Placement Groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) 
+  [Enabling Enhanced Networking with the Elastic Network Adapter (ENA) on Linux Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) 
+  [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+  [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+  [Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw) 
+  [Transitioning to Latency-Based Routing in Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+  [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+  [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 

 **Related videos:** 
+  [Connectivity to AWS and hybrid AWS network architectures (NET317-R1)](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+  [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1)](https://www.youtube.com/watch?v=DWiwuYtIgu0) 

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 

# PERF05-BP06 Choose your workload’s location based on network requirements
<a name="perf_select_network_location"></a>

 Use the cloud location options available to reduce network latency or improve throughput. Use AWS Regions, Availability Zones, placement groups, and edge locations such as AWS Outposts, AWS Local Zones, and AWS Wavelength, to reduce network latency or improve throughput. 

 The AWS Cloud infrastructure is built around Regions and Availability Zones. A Region is a physical location in the world with multiple Availability Zones. 

 Availability Zones consist of one or more discrete data centers, each with redundant power, networking, and connectivity, housed in separate facilities. These Availability Zones offer you the ability to operate production applications and databases that are more highly available, fault tolerant, and scalable than would be possible from a single data center. 

 Choose the appropriate Region or Regions for your deployment based on the following key elements: 
+  **Where your users are located**: Choosing a Region close to your workload’s users ensures lower latency when they use the workload. 
+  **Where your data is located**: For data-heavy applications, the major bottleneck in latency is data transfer. Application code should execute as close to the data as possible. 
+  **Other constraints**: Consider constraints such as security and compliance. 
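A sketch of the Region-selection reasoning above: given measured latencies from each user location to candidate Regions and the share of users at each location, pick the Region with the lowest user-weighted latency. All names and numbers here are illustrative, and real selection must also apply the compliance constraints noted above.

```python
def weighted_latency(per_location_ms: dict, user_share: dict) -> float:
    """User-weighted mean latency for one candidate Region."""
    return sum(per_location_ms[loc] * user_share[loc] for loc in user_share)

def best_region(candidates: dict, user_share: dict) -> str:
    """Candidate Region with the lowest user-weighted latency."""
    return min(candidates, key=lambda r: weighted_latency(candidates[r], user_share))

# Fraction of users at each location (illustrative).
user_share = {"Frankfurt": 0.6, "Singapore": 0.4}

# Measured median latency (ms) from each user location to each candidate Region.
candidates = {
    "eu-central-1":   {"Frankfurt": 5.0,   "Singapore": 160.0},
    "ap-southeast-1": {"Frankfurt": 160.0, "Singapore": 5.0},
}

chosen = best_region(candidates, user_share)
```

When no single Region serves all locations well, the same numbers argue for a multi-Region deployment with Route 53 latency-based routing.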

 Amazon EC2 provides placement groups for networking. A placement group is a logical grouping of instances to decrease latency or increase reliability. Using placement groups with supported instance types and an Elastic Network Adapter (ENA) enables workloads to participate in a low-latency, 25 Gbps network. Placement groups are recommended for workloads that benefit from low network latency, high network throughput, or both. Using placement groups has the benefit of lowering jitter in network communications. 

 Latency-sensitive services are delivered at the edge using a global network of edge locations. These edge locations commonly provide services such as content delivery network (CDN) and domain name system (DNS). By having these services at the edge, workloads can respond with low latency to requests for content or DNS resolution. These services also provide geographic services such as geo targeting of content (providing different content based on the end users’ location), or latency-based routing to direct end users to the nearest Region (minimum latency). 

 [Amazon CloudFront](https://aws.amazon.com/cloudfront/) is a global CDN that can accelerate both static content, such as images, scripts, and videos, and dynamic content, such as APIs or web applications. It relies on a global network of edge locations that cache content and provide high-performance network connectivity to your users. CloudFront also accelerates other operations, such as content uploading and dynamic applications, making it a performance-enhancing addition for any application serving traffic over the internet. [Lambda@Edge](https://aws.amazon.com/lambda/edge/) is a feature of Amazon CloudFront that lets you run code closer to the users of your workload, which improves performance and reduces latency. 

 Amazon Route 53 is a highly available and scalable cloud DNS web service. It’s designed to give developers and businesses an extremely reliable and cost-effective way to route end users to internet applications by translating names, like www.example.com, into numeric IP addresses, like 192.168.2.1, that computers use to connect to each other. Route 53 is fully compliant with IPv6. 

 [AWS Outposts](https://aws.amazon.com/outposts/) is designed for workloads that must remain on-premises because of latency requirements, while running seamlessly alongside the rest of your workloads in AWS. AWS Outposts are fully managed and configurable compute and storage racks built with AWS-designed hardware that allow you to run compute and storage on-premises while seamlessly connecting to the broad array of AWS services in the cloud. 

 [AWS Local Zones](https://aws.amazon.com/about-aws/global-infrastructure/localzones/) are designed to run workloads that require single-digit millisecond latency, such as video rendering and graphics-intensive virtual desktop applications. Local Zones give you the benefits of having compute and storage resources closer to your end users. 

 [AWS Wavelength](https://aws.amazon.com/wavelength/) is designed to deliver ultra-low latency applications to 5G devices by extending AWS infrastructure, services, APIs, and tools to 5G networks. Wavelength embeds storage and compute inside telecommunications providers' 5G networks to support 5G workloads that require single-digit millisecond latency, such as IoT devices, game streaming, autonomous vehicles, and live media production. 

 Use edge services to reduce latency and to enable content caching. Ensure that you have configured cache control correctly for both DNS and HTTP/HTTPS to gain the most benefit from these approaches. 

 **Common anti-patterns:** 
+  You consolidate all workload resources into one geographic location. 
+  You choose the Region closest to your own location rather than to the workload's end users. 

 **Benefits of establishing this best practice:** You must ensure that your network is available wherever you want to reach customers. Using the AWS private global network ensures that your customers get the lowest latency experience by deploying workloads into the locations nearest them. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Reduce latency by selecting the correct locations: Identify where your users and data are located. Take advantage of AWS Regions, Availability Zones, placement groups, and edge locations to reduce latency. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS - Optimized Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html) 
+  [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) 
+  [EC2 Enhanced Networking on Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) 
+  [EC2 Enhanced Networking on Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/enhanced-networking.html) 
+  [EC2 Placement Groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) 
+  [Enabling Enhanced Networking with the Elastic Network Adapter (ENA) on Linux Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) 
+  [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+  [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+  [Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw) 
+  [Transitioning to Latency-Based Routing in Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+  [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+  [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 

 **Related videos:** 
+  [Connectivity to AWS and hybrid AWS network architectures (NET317-R1)](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+  [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1)](https://www.youtube.com/watch?v=DWiwuYtIgu0) 

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 

# PERF05-BP07 Optimize network configuration based on metrics
<a name="perf_select_network_optimize"></a>

 Use collected and analyzed data to make informed decisions about optimizing your network configuration. Measure the impact of those changes and use the impact measurements to make future decisions. 

 Enable VPC Flow Logs for all VPC networks that are used by your workload. VPC Flow Logs is a feature that lets you capture information about the IP traffic going to and from network interfaces in your VPC. VPC Flow Logs help you with a number of tasks, such as troubleshooting why specific traffic is not reaching an instance, which in turn helps you diagnose overly restrictive security group rules. You can use flow logs as a security tool to monitor the traffic that is reaching your instance, to profile your network traffic, and to look for abnormal traffic behaviors. 
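Each record in the default flow log format is a space-separated line of fourteen fields. The sketch below parses one such record into named fields; the sample record is illustrative, not real traffic.

```python
# Sketch: parse a VPC Flow Logs record in the default (version 2) format.
# The sample record below is illustrative, not real traffic.

FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

def parse_flow_log(line: str) -> dict:
    """Split a default-format flow log line into named, typed fields."""
    record = dict(zip(FIELDS, line.split()))
    for key in ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end"):
        record[key] = int(record[key])
    return record

sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
rec = parse_flow_log(sample)
print(rec["action"], rec["dstport"])  # e.g. audit accepted SSH traffic
```

In practice you would publish flow logs to CloudWatch Logs or Amazon S3 and query them there, rather than parsing lines by hand.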

 Use networking metrics to make changes to networking configuration as the workload evolves. Cloud-based networks can be quickly rebuilt, so evolving your network architecture over time is necessary to maintain performance efficiency. 

 **Common anti-patterns:** 
+  You assume that all performance-related issues are application-related. 
+  You only test your network performance from a location close to where you have deployed the workload. 

 **Benefits of establishing this best practice:** To ensure that you are meeting the metrics required for the workload, you must monitor network performance metrics. You can capture information about the IP traffic going to and from network interfaces in your VPC and use this data to add new optimizations or deploy your workload to new geographic Regions. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>

 Enable VPC Flow Logs: VPC Flow Logs enable you to capture information about the IP traffic going to and from network interfaces in your VPC. VPC Flow Logs help you with a number of tasks, such as troubleshooting why specific traffic is not reaching an instance, which can help you diagnose overly restrictive security group rules. You can use flow logs as a security tool to monitor the traffic that is reaching your instance, to profile your network traffic, and to look for abnormal traffic behaviors. 
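To make "look for abnormal traffic behaviors" concrete, the sketch below totals bytes per source address from already-parsed flow log records and flags sources above a simple threshold. The records and the threshold are illustrative assumptions; real anomaly detection would use baselines derived from your own traffic.

```python
from collections import Counter

# Sketch: flag sources sending unusually large byte volumes.
# Records are (srcaddr, bytes) pairs already extracted from flow logs;
# the data and the threshold below are illustrative assumptions.

def top_talkers(records, byte_threshold):
    """Return source addresses whose total bytes exceed the threshold."""
    totals = Counter()
    for srcaddr, nbytes in records:
        totals[srcaddr] += nbytes
    return {src: total for src, total in totals.items() if total > byte_threshold}

records = [
    ("10.0.0.5", 1_200), ("10.0.0.5", 900),
    ("10.0.0.9", 150_000),   # abnormally chatty source
    ("172.31.0.7", 300),
]
print(top_talkers(records, byte_threshold=100_000))
```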

 Enable appropriate metrics for network options: Ensure that you select the appropriate network metrics for your workload. You can enable metrics for VPC NAT gateway, transit gateways, and VPN tunnels. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon EBS - Optimized Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html) 
+  [Application Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/application/introduction.html) 
+  [EC2 Enhanced Networking on Linux](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking.html) 
+  [EC2 Enhanced Networking on Windows](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/enhanced-networking.html) 
+  [EC2 Placement Groups](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/placement-groups.html) 
+  [Enabling Enhanced Networking with the Elastic Network Adapter (ENA) on Linux Instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html) 
+  [Network Load Balancer](https://docs.aws.amazon.com/elasticloadbalancing/latest/network/introduction.html) 
+  [Networking Products with AWS](https://aws.amazon.com/products/networking/) 
+  [Transit Gateway](https://docs.aws.amazon.com/vpc/latest/tgw) 
+  [Transitioning to Latency-Based Routing in Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/TutorialTransitionToLBR.html) 
+  [VPC Endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html) 
+  [VPC Flow Logs](https://docs.aws.amazon.com/vpc/latest/userguide/flow-logs.html) 
+  [Monitoring your global and core networks with Amazon CloudWatch metrics](https://docs.aws.amazon.com/vpc/latest/tgwnm/monitoring-cloudwatch-metrics.html) 
+  [Continuously monitor network traffic and resources](https://docs.aws.amazon.com/whitepapers/latest/security-best-practices-for-manufacturing-ot/continuously-monitor-network-traffic-and-resources.html) 

 **Related videos:** 
+  [Connectivity to AWS and hybrid AWS network architectures (NET317-R1)](https://www.youtube.com/watch?v=eqW6CPb58gs) 
+  [Optimizing Network Performance for Amazon EC2 Instances (CMP308-R1)](https://www.youtube.com/watch?v=DWiwuYtIgu0) 
+  [Monitoring and troubleshooting network traffic](https://www.youtube.com/watch?v=Ed09ReWRQXc) 
+  [Simplify Traffic Monitoring and Visibility with Amazon VPC Traffic Mirroring](https://www.youtube.com/watch?v=zPovlZxuZ-c) 

 **Related examples:** 
+  [AWS Transit Gateway and Scalable Security Solutions](https://github.com/aws-samples/aws-transit-gateway-and-scalable-security-solutions) 
+  [AWS Networking Workshops](https://networking.workshop.aws/) 
+  [AWS Network Monitoring](https://github.com/aws-samples/monitor-vpc-network-patterns) 