# 计算和硬件
<a name="compute-and-hardware"></a>

 适合特定工作负载的最佳计算方案会因应用程序设计、使用模式和配置设置而有所不同。架构可能会使用不同的计算方案来支持各种组件，并允许使用不同的功能来提高性能。为架构选择错误的计算方案可能会降低性能效率。

 该重点领域分享了有关如何识别和优化计算选项以提高云端性能效率的指导和最佳实践。

**Topics**
+ [PERF02-BP01 为工作负载选择最佳计算方案](perf_compute_hardware_select_best_compute_options.md)
+ [PERF02-BP02 了解可用的计算配置和功能](perf_compute_hardware_understand_compute_configuration_features.md)
+ [PERF02-BP03 收集与计算相关的指标](perf_compute_hardware_collect_compute_related_metrics.md)
+ [PERF02-BP04 配置计算资源并合理调整资源规模](perf_compute_hardware_configure_and_right_size_compute_resources.md)
+ [PERF02-BP05 动态扩展计算资源](perf_compute_hardware_scale_compute_resources_dynamically.md)
+ [PERF02-BP06 使用基于硬件的优化型计算加速器](perf_compute_hardware_compute_accelerators.md)

# PERF02-BP01 为工作负载选择最佳计算方案
<a name="perf_compute_hardware_select_best_compute_options"></a>

 通过为工作负载选择最合适的计算方案，可以提高性能，减少不必要的基础设施成本以及维护工作负载所需的运营工作。

 **常见反模式：**
+  使用本地所用的计算方案。
+  对云计算方案、功能和解决方案以及这些解决方案可以如何提高计算性能缺乏认识。
+  为了满足扩展或性能要求，过度预置现有计算方案，而使用替代计算方案可以更准确地满足工作负载特征需求。

 **建立此最佳实践的好处：**通过确定计算要求并对可用方案进行评估，可以让工作负载更高效地利用资源。

 **在未建立这种最佳实践的情况下暴露的风险等级：**高 

## 实施指导
<a name="implementation-guidance"></a>

 为了提高性能效率而优化云工作负载时，请务必根据应用场景和性能要求选择最合适的计算方案。AWS 提供了多种计算方案，可满足云中不同工作负载的需求。例如，您可以使用 [Amazon EC2](https://docs.aws.amazon.com/ec2/) 启动和管理虚拟服务器，使用 [AWS Lambda](https://docs.aws.amazon.com/lambda/?icmpid=docs_homepage_featuredsvcs) 运行代码而不必预置或管理服务器，使用 [Amazon ECS](https://aws.amazon.com/ecs/) 或 [Amazon EKS](https://aws.amazon.com/eks/) 运行和管理容器，或者使用 [AWS Batch](https://aws.amazon.com/batch/) 并行处理大量数据。应根据自己的规模和计算需求，选择和配置最适合自己情况的计算解决方案。也可以考虑在单个工作负载中使用多种类型的计算解决方案，因为每种解决方案都有自己的优缺点。

 以下步骤将指导您根据自身工作负载的特征和性能要求，选择合适的计算方案。

## 实施步骤
<a name="implementation-steps"></a>
+  了解工作负载计算要求。需要考虑的关键要求包括：处理需求、流量模式、数据访问模式、扩展需求和延迟要求。
+  了解适用于工作负载的不同 [AWS 计算服务](https://docs.aws.amazon.com/whitepapers/latest/aws-overview/compute-services.html)。有关更多信息，请参阅 [PERF01-BP01 了解并掌握可用的云服务和功能](perf_architecture_understand_cloud_services_and_features.md)。以下介绍了一些关键的 AWS 计算方案、这些方案的特征和常见应用场景：    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/wellarchitected/latest/performance-efficiency-pillar/perf_compute_hardware_select_best_compute_options.html)
+  评估与每种计算方案相关的成本（如每小时费用或数据传输）和管理开销（如修补和扩展）。
+  在非生产环境中进行试验和基准测试，确定哪种计算方案最能满足工作负载要求。
+  试验并确定新的计算解决方案后，规划迁移并验证性能指标。
+  使用 [Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/WhatIsCloudWatch.html) 等 AWS 监控工具和 [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 等优化服务，根据实际使用模式持续优化计算资源。

 

## 资源
<a name="resources"></a>

 **相关文档：**
+  [使用 AWS 进行云计算](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [Amazon EC2 实例类型](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html?ref=wellarchitected) 
+  [Amazon EKS 容器：Amazon EKS Worker 节点](https://docs.aws.amazon.com/eks/latest/userguide/worker.html?ref=wellarchitected) 
+  [Amazon ECS 容器：Amazon ECS 容器实例](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html?ref=wellarchitected) 
+  [函数：Lambda 函数配置](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html?ref=wellarchitected#function-configuration) 
+ [Prescriptive Guidance for Containers](https://aws.amazon.com/prescriptive-guidance/?apg-all-cards.sort-by=item.additionalFields.sortText&apg-all-cards.sort-order=desc&awsf.apg-new-filter=*all&awsf.apg-content-type-filter=*all&awsf.apg-code-filter=*all&awsf.apg-category-filter=categories%23containers&awsf.apg-rtype-filter=*all&awsf.apg-isv-filter=*all&awsf.apg-product-filter=*all&awsf.apg-env-filter=*all) 
+  [Prescriptive Guidance for Serverless](https://aws.amazon.com/prescriptive-guidance/?apg-all-cards.sort-by=item.additionalFields.sortText&apg-all-cards.sort-order=desc&awsf.apg-new-filter=*all&awsf.apg-content-type-filter=*all&awsf.apg-code-filter=*all&awsf.apg-category-filter=categories%23serverless&awsf.apg-rtype-filter=*all&awsf.apg-isv-filter=*all&awsf.apg-product-filter=*all&awsf.apg-env-filter=*all) 

 **相关视频：**
+  [AWS re:Invent 2023 - AWS Graviton: The best price performance for your AWS workloads](https://www.youtube.com/watch?v=T_hMIjKtSr4&ab_channel=AWSEvents) 
+  [AWS re:Invent 2023 - New Amazon Elastic Compute Cloud generative AI capabilities in AMS](https://www.youtube.com/watch?v=sSpJ8tWCEiA) 
+  [AWS re:Invent 2023 - What’s new with Amazon Elastic Compute Cloud](https://www.youtube.com/watch?v=mjHw_wgJJ5g) 
+  [AWS re:Invent 2023 - Smart savings: Amazon Elastic Compute Cloud cost-optimization strategies](https://www.youtube.com/watch?v=_AHPbxzIGV0) 
+  [AWS re:Invent 2021 - Powering next-gen Amazon Elastic Compute Cloud: Deep dive on the Nitro System](https://www.youtube.com/watch?v=2uc1vaEsPXU) 
+  [AWS re:Invent 2019 - Optimize performance and cost for your AWS compute](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [AWS re:Invent 2019 - Amazon Elastic Compute Cloud foundations](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [AWS re:Invent 2022 - Deploy ML models for inference at high performance and low cost](https://www.youtube.com/watch?v=4FqHt5bmS2o) 
+  [AWS re:Invent 2019 - Optimize performance and cost for your AWS compute ](https://www.youtube.com/watch?v=zt6jYJLK8sg) 
+  [Amazon EC2 foundations](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [部署 ML 模型，以便进行高性能和低成本的推理](https://www.youtube.com/watch?v=4FqHt5bmS2o) 

 **相关示例：**
+  [Migrating the Web application to containers](https://application-migration-with-aws.workshop.aws/en/container-migration.html) 
+  [运行无服务器程序“Hello World”](https://aws.amazon.com/getting-started/hands-on/run-serverless-code/) 
+  [Amazon EKS 研讨会](https://www.eksworkshop.com/) 
+  [Amazon EC2 讲习会](https://ec2spotworkshops.com/) 
+  [Efficient and Resilient Workloads with Amazon Elastic Compute Cloud Auto Scaling](https://catalog.us-east-1.prod.workshops.aws/workshops/20c57d32-162e-4ad5-86a6-dff1f8de4b3c/en-US) 
+  [Migrating to AWS Graviton with Container Services](https://catalog.us-east-1.prod.workshops.aws/workshops/dcab7555-32fc-42d2-97e5-2b7a35cd008f/en-US/) 

# PERF02-BP02 了解可用的计算配置和功能
<a name="perf_compute_hardware_understand_compute_configuration_features"></a>

 了解计算服务的可用配置选项和功能，帮助预置适量的资源并提高性能效率。

 **常见反模式：**
+  没有依据工作负载特征评估计算方案或可用的实例系列。
+  过度预置计算资源来满足高峰需求。

**建立此最佳实践的好处：**熟悉 AWS 计算功能和配置，以便使用经过优化的计算解决方案，满足工作负载特征和需求。

 **在未建立这种最佳实践的情况下暴露的风险等级：**中 

## 实施指导
<a name="implementation-guidance"></a>

 每种计算解决方案都有独特的配置和功能，可支持不同的工作负载特征和需求。了解这些方案如何完善工作负载，并确定哪些配置选项最适合您的应用程序。这些选项的示例包括实例系列、规模、功能（GPU、I/O）、突增、超时、函数大小、容器实例和并发。如果工作负载已经使用同一计算方案超过四周，并且预计这些特征在未来将保持不变，则可以使用 [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 从 CPU 和内存角度查明当前计算方案是否适合工作负载。

## 实施步骤
<a name="implementation-steps"></a>
+  了解工作负载要求（如 CPU 需求、内存和延迟）。
+  查看 AWS 文档和最佳实践，了解有助于提高计算性能的推荐配置选项。以下是一些需要考虑的关键配置选项：    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/wellarchitected/latest/performance-efficiency-pillar/perf_compute_hardware_understand_compute_configuration_features.html)

## 资源
<a name="resources"></a>

 **相关文档：**
+  [使用 AWS 进行云计算](https://aws.amazon.com/products/compute/?ref=wellarchitected) 
+  [Amazon EC2 实例类型](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html?ref=wellarchitected) 
+  [Amazon EC2 实例的处理器状态控制](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html?ref=wellarchitected) 
+  [Amazon EKS 容器：Amazon EKS Worker 节点](https://docs.aws.amazon.com/eks/latest/userguide/worker.html?ref=wellarchitected) 
+  [Amazon ECS 容器：Amazon ECS 容器实例](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html?ref=wellarchitected) 
+  [函数：Lambda 函数配置](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html?ref=wellarchitected#function-configuration) 

 **相关视频：**
+  [AWS re:Invent 2023 – AWS Graviton: The best price performance for your AWS workloads](https://www.youtube.com/watch?v=T_hMIjKtSr4) 
+  [AWS re:Invent 2023 – New Amazon EC2 generative AI capabilities in AWS 管理控制台](https://www.youtube.com/watch?v=sSpJ8tWCEiA) 
+  [AWS re:Invent 2023 – What's new with Amazon EC2](https://www.youtube.com/watch?v=mjHw_wgJJ5g) 
+  [AWS re:Invent 2023 – Smart savings: Amazon EC2 cost-optimization strategies](https://www.youtube.com/watch?v=_AHPbxzIGV0) 
+  [AWS re:Invent 2021 – Powering next-gen Amazon EC2: Deep dive on the Nitro System](https://www.youtube.com/watch?v=2uc1vaEsPXU) 
+  [AWS re:Invent 2019 – Amazon EC2 foundations](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [AWS re:Invent 2022 – Optimizing Amazon EKS for performance and cost on AWS](https://www.youtube.com/watch?v=5B4-s_ivn1o) 

 **相关示例：**
+  [Compute Optimizer 演示代码](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 
+  [Amazon EC2 竞价型实例讲习会](https://ec2spotworkshops.com/) 
+  [Efficient and Resilient Workloads with Amazon EC2 AWS Auto Scaling](https://catalog.us-east-1.prod.workshops.aws/workshops/20c57d32-162e-4ad5-86a6-dff1f8de4b3c/en-US) 
+  [Graviton 开发人员讲习会](https://catalog.us-east-1.prod.workshops.aws/workshops/dcab7555-32fc-42d2-97e5-2b7a35cd008f/en-US/) 
+  [AWS for Microsoft workloads immersion day](https://catalog.us-east-1.prod.workshops.aws/workshops/d6c7ecdc-c75f-4ad1-910f-fdd994cc4aed/en-US) 
+  [AWS for Linux workloads immersion day](https://catalog.us-east-1.prod.workshops.aws/workshops/a8e9c6a6-0ba9-48a7-a90d-378a440ab8ba/en-US) 
+  [AWS Compute Optimizer 演示代码](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 
+  [Amazon EKS 讲习会](https://www.eksworkshop.com/) 

  

# PERF02-BP03 收集与计算相关的指标
<a name="perf_compute_hardware_collect_compute_related_metrics"></a>

 记录和跟踪与计算相关的指标，以便更好地了解计算资源的表现情况，并提高计算资源的性能和利用率。

 **常见反模式：**
+  只手动搜索日志文件来查找指标。  
+  只使用由监控软件记录的默认指标。
+  只在出现问题时审查指标。

 **建立此最佳实践的好处：**收集与性能相关的指标有助于您根据业务要求调整应用程序性能，从而确保满足工作负载需求。收集指标还有利于您持续提高工作负载中的资源性能和利用率。

 **在未建立这种最佳实践的情况下暴露的风险等级：**高 

## 实施指导
<a name="implementation-guidance"></a>

 云工作负载会生成大量数据，例如指标、日志和事件。在 AWS 云 中，收集指标是提高安全性、成本效率、性能和可持续性的关键步骤。AWS 使用监控服务（如 [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/)）提供各种与性能相关的指标，从而为您提供宝贵的洞察。CPU 利用率、内存利用率、磁盘 I/O 以及网络入站和出站等指标有助于您深入了解利用率水平或性能瓶颈。将这些指标用作数据驱动方法的一部分，以便主动调整和优化工作负载的资源。  理想情况下，您应该在单一平台上收集与计算资源相关的所有指标，并实施留存策略以支持成本目标和运营目标。

## 实施步骤
<a name="implementation-steps"></a>
+  确定哪些与性能相关的指标与您的工作负载相关。您应该收集有关资源利用率和云工作负载运行方式的指标（例如响应时间和吞吐量）。
  +  [Amazon EC2 默认指标](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html) 
  +  [Amazon ECS 默认指标](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/cloudwatch-metrics.html) 
  +  [Amazon EKS 默认指标](https://docs.aws.amazon.com/prescriptive-guidance/latest/implementing-logging-monitoring-cloudwatch/kubernetes-eks-metrics.html) 
  +  [Lambda 默认指标](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-access-metrics.html) 
  +  [Amazon EC2 内存和磁盘指标](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html) 
+  为工作负载选择并设置合适的日志记录和监控解决方案。
  +  [AWS native Observability](https://catalog.workshops.aws/observability/en-US/aws-native) 
  +  [适用于 OpenTelemetry 的 AWS Distro](https://aws.amazon.com/otel/) 
  +  [Amazon Managed Service for Prometheus](https://docs.aws.amazon.com/grafana/latest/userguide/prometheus-data-source.html) 
+  根据工作负载要求为指标确定所需的筛选和聚合。
  +  [Quantify custom application metrics with Amazon CloudWatch Logs and metric filters](https://aws.amazon.com/blogs/mt/quantify-custom-application-metrics-with-amazon-cloudwatch-logs-and-metric-filters/) 
  +  [Collect custom metrics with Amazon CloudWatch strategic tagging](https://aws.amazon.com/blogs/infrastructure-and-automation/collect-custom-metrics-with-amazon-cloudwatch-strategic-tagging/) 
+  为指标配置数据留存策略，从而符合安全目标和运营目标。
  +  [CloudWatch 指标的默认数据留存](https://aws.amazon.com/cloudwatch/faqs/#AWS_resource_.26_custom_metrics_monitoring) 
  +  [CloudWatch Logs 的默认数据留存](https://aws.amazon.com/cloudwatch/faqs/#Log_management) 
+  如有需要，可为指标创建警报和通知，协助您主动应对与性能相关的问题。
  +  [Create alarms for custom metrics using Amazon CloudWatch anomaly detection](https://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/create-alarms-for-custom-metrics-using-amazon-cloudwatch-anomaly-detection.html) 
  +  [Create metrics and alarms for specific web pages with Amazon CloudWatch RUM](https://aws.amazon.com/blogs/mt/create-metrics-and-alarms-for-specific-web-pages-amazon-cloudwatch-rum/) 
+  使用自动化技术来部署指标和日志聚合代理。
  +  [AWS Systems Manager 自动化](https://docs.aws.amazon.com/systems-manager/latest/userguide/systems-manager-automation.html?ref=wellarchitected) 
  +  [OpenTelemetry Collector](https://aws-otel.github.io/docs/getting-started/collector) 

## 资源
<a name="resources"></a>

 **相关文档：**
+  [监控和可观测性](https://aws.amazon.com/cloudops/monitoring-and-observability/) 
+  [Best practices: implementing observability with AWS](https://aws.amazon.com/blogs/mt/best-practices-implementing-observability-with-aws/) 
+  [Amazon CloudWatch 文档](https://docs.aws.amazon.com/cloudwatch/index.html?ref=wellarchitected) 
+  [使用 CloudWatch 代理从 Amazon EC2 实例和本地部署服务器中收集指标和日志](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Install-CloudWatch-Agent.html?ref=wellarchitected) 
+  [访问 AWS Lambda 的 Amazon CloudWatch Logs](https://docs.aws.amazon.com/lambda/latest/dg/monitoring-functions-logs.html?ref=wellarchitected) 
+  [将 CloudWatch Logs 与容器实例结合使用](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/using_cloudwatch_logs.html?ref=wellarchitected) 
+  [发布自定义指标](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/publishingMetrics.html?ref=wellarchitected) 
+  [AWS Answers：集中式日志记录](https://aws.amazon.com/answers/logging/centralized-logging/?ref=wellarchitected) 
+  [发布 CloudWatch 指标的 AWS 服务](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CW_Support_For_AWS.html?ref=wellarchitected) 
+  [Monitoring Amazon EKS on AWS Fargate](https://aws.amazon.com/blogs/containers/monitoring-amazon-eks-on-aws-fargate-using-prometheus-and-grafana/) 

 **相关视频：**
+  [AWS re:Invent 2023 – [LAUNCH] Application monitoring for modern workloads](https://www.youtube.com/watch?v=T2TovTLje8w) 
+  [AWS re:Invent 2023 – Implementing application observability](https://www.youtube.com/watch?v=IcTcwUSwIs4) 
+  [AWS re:Invent 2023 – Building an effective observability strategy](https://www.youtube.com/watch?v=7PQv9eYCJW8) 
+  [AWS re:Invent 2023 – Seamless observability with AWS Distro for OpenTelemetry](https://www.youtube.com/watch?v=S4GfA2R0N_A) 
+  [Application Performance Management on AWS](https://www.youtube.com/watch?v=5T4stR-HFas&ref=wellarchitected) 

 **相关示例：**
+  [AWS for Linux Workloads Immersion Day- Amazon CloudWatch](https://catalog.us-east-1.prod.workshops.aws/workshops/a8e9c6a6-0ba9-48a7-a90d-378a440ab8ba/en-US/300-cloudwatch) 
+  [Monitoring Amazon ECS clusters and containers](https://ecsworkshop.com/monitoring/) 
+  [Monitoring with Amazon CloudWatch dashboards](https://catalog.workshops.aws/well-architected-performance-efficiency/en-US/3-monitoring/monitoring-with-cloudwatch-dashboards) 
+  [Amazon EKS 讲习会](https://www.eksworkshop.com/) 

# PERF02-BP04 配置计算资源并合理调整资源规模
<a name="perf_compute_hardware_configure_and_right_size_compute_resources"></a>

 配置计算资源并合理调整资源规模，使其满足您工作负载的性能要求，避免资源利用不足或过度利用。

 **常见反模式：**
+  忽略工作负载性能要求，导致计算资源预置过度或预置不足。
+  只选择适用于所有工作负载的最大或最小实例。
+  为了便于管理，只使用一个实例系列。
+  忽略来自 AWS Cost Explorer 或 Compute Optimizer 的关于合理调整规模的建议。
+  没有重新评估新实例类型是否适合工作负载。
+  只为组织认证少量实例配置。

 **建立此最佳实践的好处：**合理调整计算资源的规模后，可避免资源预置过度和预置不足，从而确保资源在云端以最佳方式运行。适当调整计算资源的规模通常可以提高性能和改进客户体验，同时还可以降低成本。

 **在未建立这种最佳实践的情况下暴露的风险等级：**中 

## 实施指导
<a name="implementation-guidance"></a>

 合理调整规模使组织能够以经济高效的方式运营云基础设施，同时满足业务需求。云资源预置过度可能会导致额外成本，而预置不足可能导致性能和客户体验不佳。AWS 提供 [AWS Compute Optimizer](https://aws.amazon.com/compute-optimizer/) 和 [AWS Trusted Advisor](https://aws.amazon.com/premiumsupport/technology/trusted-advisor/) 之类的工具，这些工具使用历史数据为计算资源提供合理调整规模的建议。

### 实施步骤
<a name="implementation-steps"></a>
+  选择最能满足您需求的实例类型：
  +  [如何为我的工作负载选择适当的 Amazon EC2 实例类型？](https://aws.amazon.com/premiumsupport/knowledge-center/ec2-instance-choose-type-for-workload/) 
  +  [Amazon EC2 Fleet 的基于属性的实例类型选择](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-fleet-attribute-based-instance-type-selection.html) 
  +  [使用基于属性的实例类型选择创建自动扩缩组](https://docs.aws.amazon.com/autoscaling/ec2/userguide/create-asg-instance-type-requirements.html) 
  +  [Optimizing your Kubernetes compute costs with Karpenter consolidation](https://aws.amazon.com/blogs/containers/optimizing-your-kubernetes-compute-costs-with-karpenter-consolidation/) 
+  分析您的工作负载的各种性能特性，以及这些特性与内存、网络和 CPU 使用率之间的关系。根据这些数据选择最符合您的工作负载情况和性能目标的资源。
+  使用 AWS 监控工具（如 Amazon CloudWatch）监控资源使用情况。
+  为计算资源选择合适的配置。
  +  对于临时工作负载，请评估[实例 Amazon CloudWatch 指标](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/viewing_metrics_with_cloudwatch.html)（例如 `CPUUtilization`），确定实例是利用不足还是利用过度。
  +  对于稳定工作负载，请定期检查 AWS 合理调整规模工具（如 AWS Compute Optimizer 和 AWS Trusted Advisor），从而挖掘优化计算资源和合理调整计算资源规模的机会。
+  在实际环境中实施之前，先在非生产环境中测试配置更改。
+  持续重新评估新的计算产品/服务，并与工作负载的需求进行比较。

## 资源
<a name="resources"></a>

 **相关文档：**
+  [使用 AWS 进行云计算](https://aws.amazon.com/products/compute/) 
+  [Amazon EC2 实例类型](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [Amazon ECS 容器：Amazon ECS 容器实例](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [Amazon EKS 容器：Amazon EKS Worker 节点](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+  [函数：Lambda 函数配置](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration) 
+  [Amazon EC2 实例的处理器状态控制](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html) 

 **相关视频：**
+  [Amazon EC2 foundations](https://www.youtube.com/watch?v=kMMybKqC2Y0) 
+  [AWS re:Invent 2023 – AWS Graviton: The best price performance for your AWS workloads](https://www.youtube.com/watch?v=T_hMIjKtSr4) 
+  [AWS re:Invent 2023 – New Amazon EC2 generative AI capabilities in AWS 管理控制台](https://www.youtube.com/watch?v=sSpJ8tWCEiA) 
+  [AWS re:Invent 2023 – What's new with Amazon EC2](https://www.youtube.com/watch?v=mjHw_wgJJ5g) 
+  [AWS re:Invent 2023 – Smart savings: Amazon EC2 cost-optimization strategies](https://www.youtube.com/watch?v=_AHPbxzIGV0) 
+  [AWS re:Invent 2021 – Powering next-gen Amazon EC2: Deep dive on the Nitro System](https://www.youtube.com/watch?v=2uc1vaEsPXU) 
+  [AWS re:Invent 2019 – Amazon EC2 foundations](https://www.youtube.com/watch?v=kMMybKqC2Y0) 

 **相关示例：**
+  [AWS Compute Optimizer 演示代码](https://github.com/awslabs/ec2-spot-labs/tree/master/aws-compute-optimizer) 
+  [Amazon EKS 讲习会](https://www.eksworkshop.com/) 
+  [合理调整规模建议](https://catalog.workshops.aws/well-architected-cost-optimization/en-US/3-cost-effective-resources/40-rightsizing-recommendations-100) 

# PERF02-BP05 动态扩展计算资源
<a name="perf_compute_hardware_scale_compute_resources_dynamically"></a>

 利用云的弹性根据需求动态增减计算资源，避免为工作负载预置的容量过多或者不足。

 **常见反模式：**
+  通过手动增加容量来对警报做出反应。
+  使用本地所用的规模调整指南（通常是静态基础设施）。
+  在扩展事件之后保留增加的容量，而不是缩减容量。

 **建立此最佳实践的好处：**配置和测试计算资源的弹性将有助于您节省资金、维护性能基准，以及在流量变化时提高可靠性。

 **在未建立这种最佳实践的情况下暴露的风险等级：**高 

## 实施指导
<a name="implementation-guidance"></a>

 AWS 让您能够通过各种扩展机制灵活地动态扩展或缩减资源，以便满足不断变化的需求。动态扩展结合计算相关的指标，可使工作负载自动响应变化，并利用一系列最优的计算资源来实现目标。

 您可以使用大量不同方法来实现资源的供需匹配。
+  **目标跟踪方法**：监控您的扩缩指标，并根据需要自动增加或减少容量。
+  **预测性扩缩**：根据每日和每周的趋势进行扩缩。
+  **基于计划的方法**：根据可预测的负载变化设置自己的扩缩计划。
+  **服务扩缩**：选择可根据设计自动扩缩的服务（如无服务器）。

 您必须确保工作负载部署可以处理扩展事件和缩减事件。

### 实施步骤
<a name="implementation-steps"></a>
+  计算实例、容器和函数都能够与自动扩缩服务相结合或作为此服务的一项功能来提供可实现弹性的机制。以下是自动扩缩机制的一些示例：    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/wellarchitected/latest/performance-efficiency-pillar/perf_compute_hardware_scale_compute_resources_dynamically.html)
+  扩缩通常与计算服务（如 Amazon EC2 实例或 AWS Lambda 函数）相关。此外，务必考虑配置非计算服务（如 [AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/auto-scaling.html)）来满足需求。
+  验证扩缩指标是否与正在部署的工作负载的特性相匹配。如果您正在部署一个视频转码应用程序，预计 CPU 利用率为 100%，但不应将此作为主要指标，而应使用转码作业队列的深度。如果需要，可以对扩缩策略使用[自定义指标](https://aws.amazon.com/blogs/mt/create-amazon-ec2-auto-scaling-policy-memory-utilization-metric-linux/)。要选择正确的指标，请考虑以下关于 Amazon EC2 的指导：
  +  指标应该是有效的利用率指标，并描述实例的繁忙程度。
  +  指标值必须随着自动扩缩组中的实例数按比例增加或减少。
+  确保对自动扩缩组使用[动态扩缩](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scale-based-on-demand.html)而不是[手动扩缩](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-manual-scaling.html)。我们还建议在动态扩缩中使用[目标跟踪扩缩策略](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html)。
+  确认工作负载部署可以同时处理扩展事件和缩减事件。例如，可以使用[活动历史记录](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-verify-scaling-activity.html)来验证自动扩缩组的扩缩活动。
+  评估工作负载，得出可预测的模式，从而在预期需求会发生预测性的计划内变化时主动扩缩。预测性扩缩可以避免过度预置容量。有关更多详细信息，请参阅 [Predictive Scaling with Amazon EC2 Auto Scaling](https://aws.amazon.com/blogs/compute/introducing-native-support-for-predictive-scaling-with-amazon-ec2-auto-scaling/)。

## 资源
<a name="resources"></a>

 **相关文档：**
+  [使用 AWS 进行云计算](https://aws.amazon.com/products/compute/) 
+  [Amazon EC2 实例类型](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instance-types.html) 
+  [Amazon ECS 容器：Amazon ECS 容器实例](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ECS_instances.html) 
+  [Amazon EKS 容器：Amazon EKS Worker 节点](https://docs.aws.amazon.com/eks/latest/userguide/worker.html) 
+  [函数：Lambda 函数配置](https://docs.aws.amazon.com/lambda/latest/dg/best-practices.html#function-configuration) 
+  [Amazon EC2 实例的处理器状态控制](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/processor_state_control.html) 
+  [Deep Dive on Amazon ECS Cluster Auto Scaling](https://aws.amazon.com/blogs/containers/deep-dive-on-amazon-ecs-cluster-auto-scaling/) 
+  [Introducing Karpenter – An Open-Source High-Performance Kubernetes Cluster Autoscaler](https://aws.amazon.com/blogs/aws/introducing-karpenter-an-open-source-high-performance-kubernetes-cluster-autoscaler/) 

 **相关视频：**
+  [AWS re:Invent 2023 – AWS Graviton: The best price performance for your AWS workloads](https://www.youtube.com/watch?v=T_hMIjKtSr4) 
+  [AWS re:Invent 2023 – New Amazon EC2 generative AI capabilities in AWS Management Console](https://www.youtube.com/watch?v=sSpJ8tWCEiA) 
+  [AWS re:Invent 2023 – What's new with Amazon EC2](https://www.youtube.com/watch?v=mjHw_wgJJ5g) 
+  [AWS re:Invent 2023 – Smart savings: Amazon EC2 cost-optimization strategies](https://www.youtube.com/watch?v=_AHPbxzIGV0) 
+  [AWS re:Invent 2021 – Powering next-gen Amazon EC2: Deep dive on the Nitro System](https://www.youtube.com/watch?v=2uc1vaEsPXU) 
+  [AWS re:Invent 2019 – Amazon EC2 foundations](https://www.youtube.com/watch?v=kMMybKqC2Y0) 

 **相关示例：**
+  [Amazon EC2 Auto Scaling Group Examples](https://github.com/aws-samples/amazon-ec2-auto-scaling-group-examples) 
+  [Amazon EKS 研讨会](https://www.eksworkshop.com/) 
+  [Scale your Amazon EKS workloads by running on IPv6](https://catalog.us-east-1.prod.workshops.aws/workshops/3b06259f-8e17-4f2f-811a-75c9b06a2807/en-US) 

# PERF02-BP06 使用基于硬件的优化型计算加速器
<a name="perf_compute_hardware_compute_accelerators"></a>

 与基于 CPU 的替代方案相比，使用硬件加速器可以更高效地执行某些功能。

 **常见反模式：**
+  在工作负载中，没有对照性能更高和成本更低的专用实例，对通用实例进行基准测试。
+  使用基于硬件的计算加速器执行任务，而使用基于 CPU 的替代方案能更高效地完成这些任务。
+  不监控 GPU 使用情况。

**建立此最佳实践的好处：**通过使用基于硬件的加速器 [如图形处理单元（GPU）和现场可编程门阵列（FPGA）]，可以更高效地执行某些处理功能。

 **在未建立这种最佳实践的情况下暴露的风险等级：**中 

## 实施指导
<a name="implementation-guidance"></a>

 加速型计算实例提供对基于硬件的计算加速器（如 GPU 和 FPGA）的访问。这些硬件加速器能够比基于 CPU 的替代方案更有效地执行某些功能，例如图形处理或数据模式匹配。许多加速工作负载（如渲染、转码和机器学习）在资源使用方面变化很大。仅在需要时运行此硬件，并在不需要时自动将其停用，从而提高整体性能效率。

### 实施步骤
<a name="implementation-steps"></a>
+  确定可以满足要求的[加速型计算实例](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html)。
+  对于机器学习工作负载，请利用针对工作负载的专用硬件，例如 [AWS Trainium](https://aws.amazon.com/machine-learning/trainium/)、[AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/) 和 [Amazon EC2 DL1](https://aws.amazon.com/ec2/instance-types/dl1/)。AWSInf2 实例等 Inferentia 实例[相比同类 Amazon EC2 实例，性能功耗比提升了 50%](https://aws.amazon.com/machine-learning/inferentia/)。
+  收集加速型计算实例的使用情况指标。例如，按照[使用 Amazon CloudWatch 收集 NVIDIA GPU 指标](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-NVIDIA-GPU.html)所述，使用 CloudWatch 代理收集 GPU 的 `utilization_gpu` 和 `utilization_memory` 等指标。
+  优化硬件加速器的代码、网络运营和设置，确保底层硬件得到充分利用。
  +  [优化 GPU 设置](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/optimize_gpu.html) 
  +  [GPU Monitoring and Optimization in the Deep Learning AMI](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-gpu.html) 
  +  [Optimizing I/O for GPU performance tuning of deep learning training in Amazon SageMaker AI](https://aws.amazon.com/blogs/machine-learning/optimizing-i-o-for-gpu-performance-tuning-of-deep-learning-training-in-amazon-sagemaker/) 
+  使用最新的高性能库和 GPU 驱动程序。
+  使用自动化功能在不使用 GPU 实例时将其释放。

## 资源
<a name="resources"></a>

 **相关文档：**
+  [在 Amazon ECS 上使用 GPU](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-gpu.html) 
+  [GPU 实例](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html#gpu-instances) 
+  [使用 AWS Trainium 的实例](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html#aws-trainium-instances) 
+  [使用 AWS Inferentia 的实例](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html#aws-inferentia-instances) 
+  [Let’s Architect\$1 Architecting with custom chips and accelerators](https://aws.amazon.com/blogs/architecture/lets-architect-custom-chips-and-accelerators/) 
+  [加速计算](https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing) 
+  [Amazon EC2 VT1 Instances](https://aws.amazon.com/ec2/instance-types/vt1/) 
+  [如何为我的工作负载选择适当的 Amazon EC2 实例类型？](https://aws.amazon.com/premiumsupport/knowledge-center/ec2-instance-choose-type-for-workload/) 
+  [Choose the best AI accelerator and model compilation for computer vision inference with Amazon SageMaker AI](https://aws.amazon.com/blogs/machine-learning/choose-the-best-ai-accelerator-and-model-compilation-for-computer-vision-inference-with-amazon-sagemaker/) 

 **相关视频：**
+  [AWS re:Invent 2021 - How to select Amazon Elastic Compute Cloud GPU instances for deep learning](https://www.youtube.com/watch?v=4bVrIbgGWEA&ab_channel=AWSEvents) 
+  [AWS re:Invent 2022 - [NEW LAUNCH\$1] Introducing AWS Inferentia2-based Amazon EC2 Inf2 instances](https://www.youtube.com/watch?v=jpqiG02Y2H4&ab_channel=AWSEvents) 
+  [AWS re:Invent 2022 - Accelerate deep learning and innovate faster with AWS Trainium](https://www.youtube.com/watch?v=YRqvfNwqUIA&ab_channel=AWSEvents) 
+  [AWS re:Invent 2022 - Deep learning on AWS with NVIDIA: From training to deployment](https://www.youtube.com/watch?v=l8AFfaCkp0E&ab_channel=AWSEvents) 

 **相关示例：**
+  [Amazon SageMaker AI and NVIDIA GPU Cloud (NGC)](https://github.com/aws-samples/amazon-sagemaker-nvidia-ngc-examples) 
+  [Use SageMaker AI with Trainium and Inferentia for optimized deep learning training and inferencing workloads](https://github.com/aws-samples/sagemaker-trainium-inferentia) 
+  [Optimizing NLP models with Amazon Elastic Compute Cloud Inf1 instances in Amazon SageMaker AI](https://github.com/aws-samples/aws-inferentia-huggingface-workshop)