

# PERF 2: How do you select your compute solution?
<a name="perf-02"></a>

The optimal compute solution for a workload varies based on application design, usage patterns, and configuration settings. Architectures can use different compute solutions for various components and enable different features to improve performance. Selecting the wrong compute solution for an architecture can lead to lower performance efficiency.

**Topics**
+ [PERF02-BP01 Evaluate the available compute options](perf_select_compute_evaluate_options.md)
+ [PERF02-BP02 Understand the available compute configuration options](perf_select_compute_config_options.md)
+ [PERF02-BP03 Collect compute-related metrics](perf_select_compute_collect_metrics.md)
+ [PERF02-BP04 Determine the required configuration by right-sizing](perf_select_compute_right_sizing.md)
+ [PERF02-BP05 Use the available elasticity of resources](perf_select_compute_elasticity.md)
+ [PERF02-BP06 Re-evaluate compute needs based on metrics](perf_select_compute_use_metrics.md)

# PERF02-BP01 Evaluate the available compute options
<a name="perf_select_compute_evaluate_options"></a>

 Understand how your workload can benefit from the use of different compute options, such as instances, containers, and functions. 

 **Desired outcome:** By understanding all of the compute options available, you will be aware of the opportunities to increase performance, reduce unnecessary infrastructure costs, and lower the operational effort required to maintain your workload. You can also accelerate your time to market when you deploy new services and features. 

 **Common anti-patterns:** 
+  In a post-migration workload, using the same compute solution that was being used on premises. 
+  Lacking awareness of the cloud compute solutions and how those solutions might improve your compute performance. 
+  Oversizing an existing compute solution to meet scaling or performance requirements, when an alternative compute solution would align to your workload characteristics more precisely. 

 **Benefits of establishing this best practice:** By identifying the compute requirements and evaluating the available compute solutions, business stakeholders and engineering teams will understand the benefits and limitations of using the selected compute solution. The selected compute solution should fit the workload performance criteria. Key criteria include processing needs, traffic patterns, data access patterns, scaling needs, and latency requirements. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Understand the virtualization, containerization, and management solutions that can benefit your workload and meet your performance requirements. A workload can contain multiple types of compute solutions, each with different characteristics. Based on your workload scale and compute requirements, select and configure a compute solution to meet your needs. The cloud architect should learn the advantages and disadvantages of instances, containers, and functions. The following steps will help you select a compute solution that matches your workload characteristics and performance requirements. 


|  **Type**  |  **Server**  |  **Containers**  |  **Functions**  | 
| --- | --- | --- | --- | 
|  AWS service  |  Amazon Elastic Compute Cloud (Amazon EC2)  |  Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS)  |  AWS Lambda  | 
|  Key characteristics  |  Dedicated option for hardware license requirements, placement options, and a large selection of instance families covering different compute profiles  |  Easy deployment, consistent environments, runs on top of EC2 instances, scalable  |  Short runtime (15 minutes or less), lower maximum memory and CPU than other services, managed hardware layer, scales to millions of concurrent requests  | 
|  Common use cases  |  Lift-and-shift migrations, monolithic applications, hybrid environments, enterprise applications  |  Microservices, hybrid environments  |  Microservices, event-driven applications  | 
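As a rough illustration of how the table's criteria combine, here is a minimal decision sketch in Python. The attributes and thresholds are simplified assumptions for demonstration, not an official selection algorithm.

```python
def suggest_compute_type(runtime_minutes: float, containerized: bool,
                         event_driven: bool, needs_dedicated_hardware: bool) -> str:
    """Map simplified workload traits to a compute category from the table.

    Thresholds mirror the table (for example, the 15-minute function
    runtime limit); a real selection weighs many more criteria.
    """
    if needs_dedicated_hardware:
        return "server"      # license-bound or placement-sensitive -> EC2
    if event_driven and runtime_minutes <= 15:
        return "function"    # short-lived, event-driven -> AWS Lambda
    if containerized:
        return "container"   # Amazon ECS or Amazon EKS
    return "server"          # default to EC2 instances
```

For example, a two-minute event-driven job maps to `"function"`, while a long-running containerized service maps to `"container"`.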

 

 **Implementation steps:** 

1.  Select the location where the compute solution must reside by evaluating [PERF05-BP06 Choose your workload’s location based on network requirements](perf_select_network_location.md). This location will limit the types of compute solutions available to you. 

1.  Identify the type of compute solution that works with the location and application requirements. 

   1.  [Amazon EC2](https://aws.amazon.com/ec2/) virtual server instances come in a wide variety of families and sizes and offer many capabilities, including solid state drives (SSDs) and graphics processing units (GPUs). EC2 instances offer the greatest flexibility of instance choice. When you launch an EC2 instance, the instance type that you specify determines the hardware of your instance. Each instance type offers different compute, memory, and storage capabilities, and instance types are grouped into instance families based on these capabilities. Typical use cases include running enterprise applications, high performance computing (HPC), training and deploying machine learning applications, and running cloud-native applications. 

   1.  [Amazon ECS](https://aws.amazon.com/ecs/) is a fully managed container orchestration service that allows you to automatically run and manage containers on a cluster of EC2 instances or on serverless infrastructure using AWS Fargate. You can use Amazon ECS with other services such as Amazon Route 53, AWS Secrets Manager, AWS Identity and Access Management (IAM), and Amazon CloudWatch. Amazon ECS is recommended if your application is containerized and your engineering team does not need Kubernetes-specific features. 

   1.  [Amazon EKS](https://aws.amazon.com/eks/) is a fully managed Kubernetes service. You can choose to run your EKS clusters using AWS Fargate, removing the need to provision and manage servers. Managing Amazon EKS is simplified by integrations with AWS services such as Amazon CloudWatch, Auto Scaling groups, AWS Identity and Access Management (IAM), and Amazon Virtual Private Cloud (Amazon VPC). When using containers, you must use compute metrics to select the optimal type for your workload, similar to how you use compute metrics to select your EC2 or AWS Fargate instance types. Amazon EKS is recommended if your application is containerized and your engineering team prefers Kubernetes. 

   1.  You can use [AWS Lambda](https://aws.amazon.com/lambda/) to run code within the allowed runtime, memory, and CPU options. Simply upload your code, and AWS Lambda manages everything required to run and scale it. You can set up your code to trigger automatically from other AWS services or call it directly. Lambda is recommended for short-running microservice architectures developed for the cloud. 

1.  After you have experimented with your new compute solution, plan your migration and validate your performance metrics. This is a continual process; see [PERF02-BP04 Determine the required configuration by right-sizing](perf_select_compute_right_sizing.md). 
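Instance types are easier to reason about once you can read their names: the prefix before the dot encodes family, generation, and optional attribute letters, and the suffix is the size (for example, `r5.2xlarge` is a generation-5 memory-optimized instance). A small parser sketch, which handles the common pattern but not every special case:

```python
import re

def parse_instance_type(instance_type: str) -> dict:
    """Split an EC2 instance type such as 'm6g.2xlarge' into its parts:
    family letter(s), generation digit(s), attribute letters (e.g. 'g'
    for AWS Graviton processors), and size."""
    prefix, _, size = instance_type.partition(".")
    match = re.fullmatch(r"([a-z]+)(\d+)([a-z]*)", prefix)
    if not match or not size:
        raise ValueError(f"unrecognized instance type: {instance_type}")
    family, generation, attributes = match.groups()
    return {"family": family, "generation": int(generation),
            "attributes": attributes, "size": size}
```

Reading names this way helps when comparing candidates within a family (same letter, different sizes) or across generations.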

 **Level of effort for the implementation plan:** If a workload is moving from one compute solution to another, there could be a *moderate* level of effort involved in refactoring the application.   

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Compute with AWS ](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [EC2 Instance Types ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html?ref=wellarchitected) 
+  [Processor State Control for Your EC2 Instance ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html?ref=wellarchitected) 
+  [EKS Containers: EKS Worker Nodes ](https://docs.aws.amazon.com/eks/latest/userguide/worker.html?ref=wellarchitected) 
+  [Amazon ECS Containers: Amazon ECS Container Instances ](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html?ref=wellarchitected) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html?ref=wellarchitected#function-configuration) 
+  [Prescriptive Guidance for Containers](https://aws.amazon.com/prescriptive-guidance/?apg-all-cards.sort-by=item.additionalFields.sortText&apg-all-cards.sort-order=desc&awsf.apg-new-filter=*all&awsf.apg-content-type-filter=*all&awsf.apg-code-filter=*all&awsf.apg-category-filter=categories%23containers&awsf.apg-rtype-filter=*all&awsf.apg-isv-filter=*all&awsf.apg-product-filter=*all&awsf.apg-env-filter=*all) 
+  [Prescriptive Guidance for Serverless](https://aws.amazon.com/prescriptive-guidance/?apg-all-cards.sort-by=item.additionalFields.sortText&apg-all-cards.sort-order=desc&awsf.apg-new-filter=*all&awsf.apg-content-type-filter=*all&awsf.apg-code-filter=*all&awsf.apg-category-filter=categories%23serverless&awsf.apg-rtype-filter=*all&awsf.apg-isv-filter=*all&awsf.apg-product-filter=*all&awsf.apg-env-filter=*all) 

 **Related videos:** 
+  [How to choose compute option for startups](https://aws.amazon.com/startups/start-building/how-to-choose-compute-option/) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Amazon EC2 foundations (CMP211-R2) ](https://www.youtube.com/watch?v=kMMybKqC2Y0&ref=wellarchitected) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system ](https://www.youtube.com/watch?v=rUY-00yFlE4&ref=wellarchitected) 
+  [Deliver high-performance ML inference with AWS Inferentia (CMP324-R1) ](https://www.youtube.com/watch?v=17r1EapAxpk&ref=wellarchitected) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1) ](https://www.youtube.com/watch?v=_dvh4P2FVbw&ref=wellarchitected) 

 **Related examples:** 
+  [Migrating the web application to containers](https://application-migration-with-aws.workshop.aws/en/container-migration.html) 
+  [Run a Serverless Hello World](https://aws.amazon.com/getting-started/hands-on/run-serverless-code/) 

# PERF02-BP02 Understand the available compute configuration options
<a name="perf_select_compute_config_options"></a>

 Each compute solution has options and configurations available to you to support your workload characteristics. Learn how various options complement your workload, and which configuration options are best for your application. Examples of these options include instance family, sizes, features (GPU, I/O), bursting, time-outs, function sizes, container instances, and concurrency. 

 **Desired outcome:** The workload characteristics including CPU, memory, network throughput, GPU, IOPS, traffic patterns, and data access patterns are documented and used to configure the compute solution to match the workload characteristics. Each of these metrics plus custom metrics specific to your workload are recorded, monitored, and then used to optimize the compute configuration to best meet the requirements. 

 **Common anti-patterns:** 
+  Using the same compute solution that was being used on premises. 
+  Not reviewing the compute options or instance family to match workload characteristics. 
+  Oversizing the compute to ensure bursting capability. 
+  Using multiple compute management platforms for the same workload. 

 **Benefits of establishing this best practice:** Be familiar with the AWS compute offerings so that you can determine the correct solution for each of your workloads. After you have selected the compute offerings for your workload, you can quickly experiment with them to determine how well they meet your needs. A compute solution that is optimized for your workload characteristics will increase performance, lower costs, and improve reliability.

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 If your workload has been using the same compute option for more than four weeks and you anticipate that its characteristics will remain the same in the future, you can use [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) to get a recommendation based on your compute characteristics. If AWS Compute Optimizer is not an option due to a lack of metrics, [an unsupported instance type](https://docs.aws.amazon.com/compute-optimizer/latest/ug/requirements.html#requirements-ec2-instances), or a foreseeable change in your characteristics, then you must predict your metrics based on load testing and experimentation.  
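The precondition above can be stated as a small check. This is an illustrative sketch of the decision only, not an API call; the four-week threshold comes from the guidance above.

```python
from datetime import timedelta

def recommendation_source(metric_history: timedelta,
                          characteristics_stable: bool,
                          instance_type_supported: bool) -> str:
    """Return which sizing approach the guidance above points to."""
    if (metric_history >= timedelta(weeks=4)
            and characteristics_stable
            and instance_type_supported):
        return "compute-optimizer"
    return "load-testing"
```

If any condition fails, fall back to load testing and experimentation rather than trusting a recommendation built on thin data.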

 **Implementation steps:** 

1.  Are you running on EC2 instances or containers with the EC2 Launch Type? 

   1.  Can your workload use GPUs to increase performance? 

      1.  [Accelerated Computing](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#Accelerated_Computing) instances are GPU-based instances that provide the highest performance for machine learning training, inference and high performance computing. 

   1.  Does your workload run machine learning inference applications? 

      1.  [AWS Inferentia (Inf1)](https://aws.amazon.com/ec2/instance-types/inf1/) — Inf1 instances are built to support machine learning inference applications. Using Inf1 instances, customers can run large-scale machine learning inference applications, such as image recognition, speech recognition, natural language processing, personalization, and fraud detection. You can build a model in one of the popular machine learning frameworks, such as TensorFlow, PyTorch, or MXNet, and use GPU instances to train your model. After your machine learning model is trained to meet your requirements, you can deploy your model on Inf1 instances by using [AWS Neuron](https://aws.amazon.com/machine-learning/neuron/), a specialized software development kit (SDK) consisting of a compiler, runtime, and profiling tools that optimize the machine learning inference performance of Inferentia chips. 

   1.  Does your workload integrate with the low-level hardware to improve performance?  

      1.  [Field Programmable Gate Arrays (FPGA)](https://aws.amazon.com/ec2/instance-types/f1/) — Using FPGAs, you can optimize your workloads by having custom hardware-accelerated execution for your most demanding workloads. You can define your algorithms by leveraging supported general programming languages such as C or Go, or hardware-oriented languages such as Verilog or VHDL. 

   1.  Do you have at least four weeks of metrics and can predict that your traffic pattern and metrics will remain about the same in the future? 

      1.  Use [Compute Optimizer](https://aws.amazon.com/compute-optimizer/) to get a machine learning recommendation on which compute configuration best matches your compute characteristics. 

   1.  Is your workload performance constrained by the CPU metrics?  

      1.  [Compute-optimized](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#Compute_Optimized) instances are ideal for the workloads that require high performing processors.  

   1.  Is your workload performance constrained by the memory metrics?  

      1.  [Memory-optimized](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#Memory_Optimized) instances deliver large amounts of memory to support memory intensive workloads. 

   1.  Is your workload performance constrained by IOPS? 

      1.  [Storage-optimized](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#Storage_Optimized) instances are designed for workloads that require high, sequential read and write access (IOPS) to local storage. 

   1.  Do your workload characteristics represent a balanced need across all metrics? 

      1.  Does your workload CPU need to burst to handle spikes in traffic? 

         1.  [Burstable performance](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#Instance_Features) instances provide a baseline level of CPU performance with the ability to burst above that baseline when traffic spikes require it. 

      1.  [General Purpose](https://aws.amazon.com/ec2/instance-types/?trk=36c6da98-7b20-48fa-8225-4784bced9843&sc_channel=ps&sc_campaign=acquisition&sc_medium=ACQ-P|PS-GO|Brand|Desktop|SU|Compute|EC2|US|EN|Text&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types&ef_id=CjwKCAjwiuuRBhBvEiwAFXKaNNRXM5FrnFg5H8RGQ4bQKuUuK1rYWmU2iH-5H3VZPqEheB-pEm-GNBoCdD0QAvD_BwE:G:s&s_kwcid=AL!4422!3!536392622533!e!!g!!ec2%20instance%20types#General_Purpose) instances provide a balance of all characteristics to support a variety of workloads. 

   1.  Is your compute instance running on Linux and constrained by network throughput on the network interface card? 

      1.  Review [Performance Question 5, Best Practice 2: Evaluate available networking features](https://docs.aws.amazon.com/wellarchitected/latest/performance-efficiency-pillar/network-architecture-selection.html) to find the right instance type and family to meet your performance needs. 

   1.  Does your workload need consistent and predictable instances in a specific Availability Zone that you can commit to for a year?  

      1.  [Reserved Instances](https://aws.amazon.com/ec2/pricing/reserved-instances/) scoped to a specific Availability Zone provide a capacity reservation in that zone. Reserved Instances are ideal when you require compute capacity in a specific Availability Zone.  

   1.  Does your workload have licenses that require dedicated hardware? 

      1.  [Dedicated Hosts](https://aws.amazon.com/ec2/dedicated-hosts/) support existing software licenses and help you meet compliance requirements. 

   1.  Does your compute solution burst and require synchronous processing? 

      1.  [On-Demand Instances](https://aws.amazon.com/ec2/pricing/on-demand/) let you use the compute capacity by the hour or second with no long-term commitment. These instances are good for bursting above performance baseline needs. 

   1.  Is your compute solution stateless, fault-tolerant, and asynchronous?  

      1.  [Spot Instances](https://aws.amazon.com/ec2/spot/) let you take advantage of unused instance capacity for your stateless, fault-tolerant workloads.  

1.  Are you running containers on [Fargate](https://aws.amazon.com/fargate/)? 

   1.  Is your task performance constrained by the memory or CPU? 

      1.  Use the [Task Size](https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/capacity-tasksize.html) to adjust your memory or CPU. 

   1.  Is your performance being affected by your traffic pattern bursts? 

      1.  Use the [Auto Scaling](https://docs.aws.amazon.com/AmazonECS/latest/bestpracticesguide/capacity-autoscaling.html) configuration to match your traffic patterns. 

1.  Is your compute solution on [Lambda](https://docs.aws.amazon.com/lambda/latest/dg/gettingstarted-features.html)? 

   1.  Do you have at least four weeks of metrics and can predict that your traffic pattern and metrics will remain about the same in the future? 

      1.  Use [Compute Optimizer](https://aws.amazon.com/compute-optimizer/) to get a machine learning recommendation on which compute configuration best matches your compute characteristics. 

   1.  Do you not have enough metrics to use AWS Compute Optimizer? 

      1.  If you do not have metrics available to use Compute Optimizer, use [AWS Lambda Power Tuning](https://docs.aws.amazon.com/lambda/latest/operatorguide/profile-functions.html) to help select the best configuration. 

   1.  Is your function performance constrained by the memory or CPU? 

      1.  Configure your [Lambda memory](https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html#configuration-memory-console) to meet your performance needs metrics. 

   1.  Is your function timing out on execution? 

      1.  Change the [timeout settings](https://docs.aws.amazon.com/lambda/latest/dg/configuration-function-common.html). 

   1.  Is your function performance constrained by bursts of activity and concurrency?  

      1.  Configure the [concurrency settings](https://docs.aws.amazon.com/lambda/latest/dg/configuration-concurrency.html) to meet your performance requirements. 

   1.  Does your function run asynchronously and fail on retries? 

      1.  Configure the maximum age of the event and the maximum retry limit in the [asynchronous configuration](https://docs.aws.amazon.com/lambda/latest/dg/invocation-async.html) settings. 
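Several of the Lambda questions above come down to adjusting a handful of function settings. Below is a hedged sketch that validates the documented limits (128–10,240 MB of memory, timeouts up to 900 seconds) before applying them; the function name `my-function` in the commented boto3 call is a placeholder.

```python
from typing import Optional

def lambda_tuning_params(memory_mb: int, timeout_s: int,
                         reserved_concurrency: Optional[int] = None) -> dict:
    """Validate common Lambda tuning knobs against documented limits."""
    if not 128 <= memory_mb <= 10_240:
        raise ValueError("Lambda memory must be between 128 and 10240 MB")
    if not 1 <= timeout_s <= 900:
        raise ValueError("Lambda timeout must be between 1 and 900 seconds")
    params = {"MemorySize": memory_mb, "Timeout": timeout_s}
    if reserved_concurrency is not None:
        # In boto3, concurrency is applied separately via put_function_concurrency
        params["ReservedConcurrentExecutions"] = reserved_concurrency
    return params

# Applying the settings with boto3 (requires AWS credentials; not run here):
# import boto3
# cfg = lambda_tuning_params(memory_mb=512, timeout_s=30)
# boto3.client("lambda").update_function_configuration(
#     FunctionName="my-function",   # placeholder name
#     MemorySize=cfg["MemorySize"],
#     Timeout=cfg["Timeout"],
# )
```

Note that increasing Lambda memory also increases the CPU allocated to the function, which is why memory is the primary tuning knob for CPU-constrained functions.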

## Level of effort for the implementation plan
<a name="level-of-effort-for-the-implementation-plan-to-establish-this-best-practice-you-must-be-aware-of-your-current-compute-characteristics-and-metrics.-gathering-those-metrics-establishing-a-baseline-and-then-using-those-metrics-to-identify-the-ideal-compute-option-is-a-low-to-moderate-level-of-effort.-this-is-best-validated-by-load-tests-and-experimentation."></a>

To establish this best practice, you must be aware of your current compute characteristics and metrics. Gathering those metrics, establishing a baseline and then using those metrics to identify the ideal compute option is a *low* to *moderate* level of effort. This is best validated by load tests and experimentation. 
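For the Fargate task-size step above, CPU and memory cannot be chosen independently; each task CPU value permits only certain memory values. The sketch below uses the long-standing combinations; newer, larger sizes exist, so treat this table as illustrative rather than exhaustive.

```python
# Fargate task CPU units -> allowed memory values (MiB); illustrative subset
FARGATE_TASK_SIZES = {
    256:  [512, 1024, 2048],
    512:  list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}

def is_valid_fargate_task_size(cpu_units: int, memory_mib: int) -> bool:
    """Check a CPU/memory pair against the combinations above."""
    return memory_mib in FARGATE_TASK_SIZES.get(cpu_units, [])
```

Validating the pair before registering a task definition avoids a rejected deployment when adjusting task size to relieve CPU or memory constraints.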

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Compute with AWS ](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 
+  [EC2 Instance Types ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html?ref=wellarchitected) 
+  [Processor State Control for Your EC2 Instance ](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html?ref=wellarchitected) 
+  [EKS Containers: EKS Worker Nodes ](https://docs.aws.amazon.com/eks/latest/userguide/worker.html?ref=wellarchitected) 
+  [Amazon ECS Containers: Amazon ECS Container Instances ](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html?ref=wellarchitected) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html?ref=wellarchitected#function-configuration) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2) ](https://www.youtube.com/watch?v=kMMybKqC2Y0&ref=wellarchitected) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system ](https://www.youtube.com/watch?v=rUY-00yFlE4&ref=wellarchitected) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1) ](https://www.youtube.com/watch?v=zt6jYJLK8sg&ref=wellarchitected) 

 **Related examples:** 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 

# PERF02-BP03 Collect compute-related metrics
<a name="perf_select_compute_collect_metrics"></a>

To understand how your compute resources are performing, you must record and track the utilization of various systems. This data can be used to make more accurate determinations about resource requirements.  

 Workloads can generate large volumes of data such as metrics, logs, and events. Determine whether your existing storage, monitoring, and observability services can manage the data generated. Identify which metrics reflect resource utilization and can be collected, aggregated, and correlated on a single platform. Those metrics should represent all your workload resources, applications, and services, so you can easily gain system-wide visibility and quickly identify performance improvement opportunities and issues.

 **Desired outcome:** All metrics related to the compute-related resources are identified, collected, aggregated, and correlated on a single platform with retention implemented to support cost and operational goals. 

 **Common anti-patterns:** 
+  You only use manual log file searching for metrics.  
+  You only publish metrics to internal tools. 
+  You only use the default metrics recorded by your selected monitoring software. 
+  You only review metrics when there is an issue. 

 

 **Benefits of establishing this best practice:** To monitor the performance of your workloads, you must record multiple performance metrics over a period of time. These metrics allow you to detect anomalies in performance. They will also help gauge performance against business metrics to ensure that you are meeting your workload needs. 

 **Level of risk exposed if this best practice is not established:** High 

## Implementation guidance
<a name="implementation-guidance"></a>

 Identify, collect, aggregate, and correlate compute-related metrics. Using a service such as Amazon CloudWatch can make the implementation quicker and easier to maintain. In addition to the default metrics recorded, identify and track additional system-level metrics within your workload. Record data such as CPU utilization, memory, disk I/O, and network inbound and outbound metrics to gain insight into utilization levels and bottlenecks. This data is crucial to understanding how the workload is performing and how the compute solution is utilized. Use these metrics as part of a data-driven approach to actively tune and optimize your workload's resources.  
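When aggregating raw utilization samples yourself (CloudWatch can also compute percentile statistics natively), percentiles are more robust than averages for sizing decisions, because a brief spike shows up in the high percentiles without distorting the mean. A minimal nearest-rank percentile sketch with made-up sample data:

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a list of utilization samples."""
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))   # 1-based nearest rank
    return ordered[rank - 1]

hourly_cpu = [12, 15, 11, 45, 80, 14, 13, 16, 12, 18]  # example CPU % samples
peak = percentile(hourly_cpu, 95)   # the brief spike dominates the p95
```

Comparing p50 against p95 or p99 on the same data highlights whether the workload is steady (size to the median) or spiky (consider burstable instances or elasticity).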

 **Implementation steps:** 

1.  Which compute solution metrics are important to track? 

   1.  [EC2 default metrics](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html) 

   1.  [Amazon ECS default metrics](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html) 

   1.  [EKS default metrics](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/kubernetes-eks-metrics.html) 

   1.  [Lambda default metrics](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-access-metrics.html) 

   1.  [EC2 memory and disk metrics](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html) 

1.  Do I currently have an approved logging and monitoring solution? 

   1.  [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/) 

   1.  [AWS Distro for OpenTelemetry](https://aws.amazon.com/otel/) 

   1.  [Amazon Managed Service for Prometheus](https://docs.aws.amazon.com/grafana/latest/userguide/prometheus-data-source.html) 

1.  Have I identified and configured my data retention policies to match my security and operational goals? 

   1.  [Default data retention for CloudWatch metrics](https://aws.amazon.com/cloudwatch/faqs/#AWS_resource_.26_custom_metrics_monitoring) 

   1.  [Default data retention for CloudWatch Logs](https://aws.amazon.com/cloudwatch/faqs/#Log_management) 

1.  How do you deploy your metric and log aggregation agents? 

   1.  [AWS Systems Manager automation](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-automation.html?ref=wellarchitected) 

   1.  [OpenTelemetry Collector](https://aws-otel.github.io/docs/getting-started/collector) 
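Default EC2 metrics cover CPU, network, and disk activity at the hypervisor level, but memory and disk-space utilization (linked above) require the CloudWatch agent on the instance. A minimal agent configuration fragment along these lines is illustrative only; consult the agent documentation for the full schema:

```json
{
  "metrics": {
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      },
      "disk": {
        "measurement": ["used_percent"],
        "resources": ["*"]
      }
    }
  }
}
```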

 **Level of effort for the implementation plan:** There is a *medium* level of effort to identify, track, collect, aggregate, and correlate metrics from all compute resources. 

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Amazon CloudWatch documentation](https://docs.aws.amazon.com/cloudwatch/index.html?ref=wellarchitected) 
+  [Collect metrics and logs from Amazon EC2 instances and on-premises servers with the CloudWatch Agent](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html?ref=wellarchitected) 
+  [Accessing Amazon CloudWatch Logs for AWS Lambda](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-logs.html?ref=wellarchitected) 
+  [Using CloudWatch Logs with container instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_cloudwatch_logs.html?ref=wellarchitected) 
+  [Publish custom metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html?ref=wellarchitected) 
+  [AWS Answers: Centralized Logging](https://aws.amazon.com/answers/logging/centralized-logging/?ref=wellarchitected) 
+  [AWS Services That Publish CloudWatch Metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CW_Support_For_AWS.html?ref=wellarchitected) 
+  [Monitoring Amazon EKS on AWS Fargate](https://aws.amazon.com/blogs/containers/monitoring-amazon-eks-on-aws-fargate-using-prometheus-and-grafana/) 

 

 **Related videos:** 
+  [Application Performance Management on AWS](https://www.youtube.com/watch?v=5T4stR-HFas&ref=wellarchitected) 
+  [Build a Monitoring Plan](https://www.youtube.com/watch?v=OMmiGETJpfU&ref=wellarchitected) 

 

 **Related examples:** 
+  [Level 100: Monitoring with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_with_cloudwatch_dashboards/) 
+  [Level 100: Monitoring Windows EC2 instance with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_windows_ec2_cloudwatch/) 
+  [Level 100: Monitoring an Amazon Linux EC2 instance with CloudWatch Dashboards](https://wellarchitectedlabs.com/performance-efficiency/100_labs/100_monitoring_linux_ec2_cloudwatch/) 

# PERF02-BP04 Determine the required configuration by right-sizing
<a name="perf_select_compute_right_sizing"></a>

 Analyze the various performance characteristics of your workload and how these characteristics relate to memory, network, and CPU usage. Use this data to choose resources that best match your workload's profile. For example, a memory-intensive workload, such as a database, could be best served by memory-optimized (R family) instances. However, a bursting workload can benefit more from an elastic container system. 

 **Common anti-patterns:** 
+  You choose the largest instance available for all workloads. 
+  You standardize all instance types to one type for ease of management. 

 **Benefits of establishing this best practice:** Being familiar with the AWS compute offerings allows you to determine the correct solution for your various workloads. After you have selected the various compute offerings for your workload, you have the agility to quickly experiment with those compute offerings to determine which ones meet the needs of your workload. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Modify your workload configuration by right sizing: To optimize both performance and overall efficiency, determine which resources your workload needs. Choose memory-optimized instances for systems that require more memory than CPU, or compute-optimized instances for components that perform data processing that is not memory-intensive. Right sizing allows your workload to perform as well as possible while using only the required resources. 
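A minimal sketch of this right-sizing decision, assuming illustrative utilization thresholds (the 20%/40%/80% cut-offs are hypothetical, not an AWS recommendation; a real analysis should also weigh peaks, network I/O, and tooling such as AWS Compute Optimizer):

```python
def rightsizing_hint(avg_cpu_pct, avg_mem_pct):
    """Suggest an instance-family direction from average utilization.
    Thresholds are illustrative only."""
    if avg_mem_pct > 80 and avg_cpu_pct < 40:
        return "memory-optimized"      # e.g. R family
    if avg_cpu_pct > 80 and avg_mem_pct < 40:
        return "compute-optimized"     # e.g. C family
    if avg_cpu_pct < 20 and avg_mem_pct < 20:
        return "downsize"              # likely over-provisioned
    return "general-purpose"

# A database-like profile: low CPU, persistently high memory.
print(rightsizing_hint(avg_cpu_pct=15, avg_mem_pct=85))  # memory-optimized
```

The point of the sketch is the shape of the decision, not the exact numbers: right sizing is a comparison of observed utilization against the resource mix each instance family provides.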

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/)  
+  [Cloud Compute with AWS](https://aws.amazon.com/products/compute/) 
+  [EC2 Instance Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [ECS Containers: Amazon ECS Container Instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [EKS Containers: EKS Worker Nodes](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration) 
+  [Processor State Control for Your EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2)](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1)](https://www.youtube.com/watch?v=_dvh4P2FVbw) 
+  [Deliver high performance ML inference with AWS Inferentia (CMP324-R1)](https://www.youtube.com/watch?v=17r1EapAxpk) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system](https://www.youtube.com/watch?v=rUY-00yFlE4) 
+  [How to choose compute option for startups](https://aws.amazon.com/startups/start-building/how-to-choose-compute-option/) 

 **Related examples:** 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 

# PERF02-BP05 Use the available elasticity of resources
<a name="perf_select_compute_elasticity"></a>

 The cloud provides the flexibility to expand or reduce your resources dynamically through a variety of mechanisms to meet changes in demand. Combined with compute-related metrics, a workload can automatically respond to changes and use the optimal set of resources to achieve its goal. 

 Optimally matching supply to demand delivers the lowest cost for a workload, but you also must plan for sufficient supply to allow for provisioning time and individual resource failures. Demand can be fixed or variable, requiring metrics and automation to ensure that management does not become a burdensome and disproportionately large cost. 

 With AWS, you can use a number of different approaches to match supply with demand. The Cost Optimization Pillar whitepaper describes how to use the following approaches to cost: 
+  Demand-based approach 
+  Buffer-based approach 
+  Time-based approach 

 You must ensure that workload deployments can handle both scale-up and scale-down events. Create test scenarios for scale-down events to ensure that the workload behaves as expected. 

 **Common anti-patterns:** 
+  You react to alarms by manually increasing capacity. 
+  You leave increased capacity after a scaling event instead of scaling back down. 

 **Benefits of establishing this best practice:** Configuring and testing workload elasticity helps save money, maintain performance benchmarks, and improve reliability as traffic changes. Most non-production instances should be stopped when they are not being used. Although it's possible to manually shut down unused instances, this is impractical at larger scales. You can also take advantage of volume-based elasticity, which optimizes performance and cost by automatically increasing the number of compute instances during demand spikes and decreasing capacity when demand subsides. 

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

 Take advantage of elasticity: Elasticity matches the supply of resources against the demand for them. Instances, containers, and functions provide mechanisms for elasticity, either in combination with automatic scaling or as a feature of the service. Use elasticity in your architecture to ensure that you have sufficient capacity to meet performance requirements at all scales of use. Validate the metrics used to scale elastic resources up or down against the type of workload being deployed. If you are deploying a video transcoding application, 100% CPU utilization is expected and should not be your primary metric; instead, you can scale on the depth of the queue of transcoding jobs waiting to be processed. Ensure that workload deployments can handle both scale-up and scale-down events. Scaling down workload components safely is as critical as scaling up resources when demand dictates. Create test scenarios for scale-down events to verify that the workload behaves as expected. 
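For queue-driven workloads like the transcoding example above, a common pattern is backlog-per-instance scaling: size the fleet so each instance handles a target share of the queued jobs. The sketch below computes a desired instance count from queue depth; the 10-jobs-per-instance target and the size bounds are hypothetical values, not a recommendation.

```python
import math

def desired_capacity(queue_depth, jobs_per_instance=10, min_size=1, max_size=20):
    """Compute how many instances are needed so each handles roughly
    `jobs_per_instance` queued jobs, clamped to the group's size limits."""
    needed = math.ceil(queue_depth / jobs_per_instance)
    return max(min_size, min(max_size, needed))

print(desired_capacity(queue_depth=95))  # 10 instances for a backlog of 95
print(desired_capacity(queue_depth=0))   # scales down to the minimum of 1
```

In practice the queue depth would come from a metric such as SQS `ApproximateNumberOfMessagesVisible`, and the computed value would feed an Auto Scaling target tracking or step scaling policy rather than being applied directly.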

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Compute with AWS](https://aws.amazon.com/products/compute/) 
+  [EC2 Instance Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [ECS Containers: Amazon ECS Container Instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [EKS Containers: EKS Worker Nodes](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration) 
+  [Processor State Control for Your EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2)](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1)](https://www.youtube.com/watch?v=_dvh4P2FVbw) 
+  [Deliver high performance ML inference with AWS Inferentia (CMP324-R1)](https://www.youtube.com/watch?v=17r1EapAxpk) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system](https://www.youtube.com/watch?v=rUY-00yFlE4) 

 **Related examples:** 
+  [Amazon EC2 Auto Scaling Group Examples](https://github.com/aws-samples/amazon-ec2-auto-scaling-group-examples) 
+  [Amazon EFS Tutorials](https://github.com/aws-samples/amazon-efs-tutorial) 

# PERF02-BP06 Re-evaluate compute needs based on metrics
<a name="perf_select_compute_use_metrics"></a>

 Use system-level metrics to identify the behavior and requirements of your workload over time. Evaluate your workload's needs by comparing the available resources with these requirements and make changes to your compute environment to best match your workload's profile. For example, over time a system might be observed to be more memory-intensive than initially thought, so moving to a different instance family or size could improve both performance and efficiency. 

 **Common anti-patterns:** 
+  You only monitor system-level metrics to gain insight into your workload. 
+  You architect your compute needs for peak workload requirements. 
+  You oversize the compute solution to meet scaling or performance requirements when moving to a new compute solution would match your workload characteristics more precisely. 

 **Benefits of establishing this best practice:** To optimize performance and resource utilization, you need a unified operational view, real-time granular data, and a historical reference. You can create automatic dashboards to visualize this data and perform metric math to derive operational and utilization insights. 

 **Level of risk exposed if this best practice is not established:** Low 

## Implementation guidance
<a name="implementation-guidance"></a>

 Use a data-driven approach to optimize resources: To achieve maximum performance and efficiency, use the data gathered over time from your workload to tune and optimize your resources. Look at the trends in your workload's usage of current resources and determine where you can make changes to better match your workload's needs. When resources are over-committed, system performance degrades, whereas underutilization results in a less efficient use of resources and higher cost. 
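As one way to put this into practice, the sketch below flags a sustained memory-heavy utilization trend from historical samples, echoing the memory-intensive example above. The 80%-of-samples persistence rule and the 75%/50% thresholds are illustrative assumptions, not an AWS heuristic.

```python
def memory_bound_trend(mem_samples, cpu_samples, threshold=75.0):
    """Return True when memory utilization is persistently high while CPU
    stays moderate, suggesting a move to a memory-optimized family."""
    high_mem = sum(1 for m in mem_samples if m >= threshold)
    avg_cpu = sum(cpu_samples) / len(cpu_samples)
    # "Persistent" here means high in at least 80% of samples.
    return high_mem / len(mem_samples) >= 0.8 and avg_cpu < 50.0

mem = [78, 81, 85, 90, 79, 83, 88]   # daily average memory utilization (%)
cpu = [30, 35, 28, 40, 33, 31, 29]   # daily average CPU utilization (%)
print(memory_bound_trend(mem, cpu))  # True: consider a memory-optimized family
```

The same pattern applies in reverse for CPU-bound trends; the underlying data could come from the CloudWatch agent's memory metrics, which are not collected for EC2 by default.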

## Resources
<a name="resources"></a>

 **Related documents:** 
+  [Cloud Compute with AWS](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 
+  [EC2 Instance Types](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [ECS Containers: Amazon ECS Container Instances](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [EKS Containers: EKS Worker Nodes](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+  [Functions: Lambda Function Configuration](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration) 
+  [Processor State Control for Your EC2 Instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html) 

 **Related videos:** 
+  [Amazon EC2 foundations (CMP211-R2)](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [Better, faster, cheaper compute: Cost-optimizing Amazon EC2 (CMP202-R1)](https://www.youtube.com/watch?v=_dvh4P2FVbw) 
+  [Deliver high performance ML inference with AWS Inferentia (CMP324-R1)](https://www.youtube.com/watch?v=17r1EapAxpk) 
+  [Optimize performance and cost for your AWS compute (CMP323-R1)](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Powering next-gen Amazon EC2: Deep dive into the Nitro system](https://www.youtube.com/watch?v=rUY-00yFlE4) 

 **Related examples:** 
+  [Rightsizing with Compute Optimizer and Memory utilization enabled](https://www.wellarchitectedlabs.com/cost/200_labs/200_aws_resource_optimization/5_ec2_computer_opt/) 
+  [AWS Compute Optimizer Demo code](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 