

# PERF02-BP06 Use optimized hardware-based compute accelerators
<a name="perf_compute_hardware_compute_accelerators"></a>

 Use hardware accelerators to perform certain functions more efficiently than CPU-based alternatives.

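 **Common anti-patterns:**
+  In your workload, you haven't benchmarked a general-purpose instance against a purpose-built instance that can deliver higher performance and lower cost.
+  You use hardware-based compute accelerators for tasks that CPU-based alternatives can complete more efficiently.
+  You do not monitor GPU usage.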
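
One way to avoid the first two anti-patterns is to compare candidates by cost per unit of work rather than by raw speed. The sketch below assumes you have already measured throughput for the same job on each instance. The `BenchmarkResult` type, the instance labels, the throughput figures, and the hourly prices are all hypothetical placeholders, not real benchmark results or AWS prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkResult:
    """One measured run of the same job on one instance type."""
    instance_type: str       # label only; the values used below are hypothetical
    items_per_second: float  # measured throughput for the job
    price_per_hour: float    # hourly price in USD (placeholder value)

    def cost_per_million_items(self) -> float:
        """USD needed to process one million items at the measured rate."""
        if self.items_per_second <= 0:
            raise ValueError("throughput must be positive")
        seconds = 1_000_000 / self.items_per_second
        return self.price_per_hour * seconds / 3600


def cheapest(results):
    """Return the result with the lowest cost per million items."""
    return min(results, key=lambda r: r.cost_per_million_items())


# Hypothetical measurements; replace them with your own benchmark data.
candidates = [
    BenchmarkResult("general-purpose", items_per_second=200.0, price_per_hour=0.40),
    BenchmarkResult("accelerated", items_per_second=2000.0, price_per_hour=1.20),
]

for result in candidates:
    print(f"{result.instance_type}: "
          f"${result.cost_per_million_items():.3f} per million items")
print("best price-performance:", cheapest(candidates).instance_type)
```

The same comparison also catches the opposite mistake: if the accelerator's speedup does not outweigh its higher hourly price, `cheapest` returns the CPU-based candidate.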

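**Benefits of establishing this best practice:** By using hardware-based accelerators, such as graphics processing units (GPUs) and field programmable gate arrays (FPGAs), you can perform certain processing functions more efficiently.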

 **Level of risk exposed if this best practice is not established:** Medium 

## Implementation guidance
<a name="implementation-guidance"></a>

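 Accelerated computing instances provide access to hardware-based compute accelerators such as GPUs and FPGAs. These hardware accelerators perform certain functions, such as graphics processing or data pattern matching, more efficiently than CPU-based alternatives. Many accelerated workloads, such as rendering, transcoding, and machine learning, are highly variable in terms of resource usage. Run this hardware only for the time it is needed, and decommission it with automation when it is not required, to improve overall performance efficiency.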
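
The "run only when needed" guidance can be sketched as a small idle check. This is a minimal illustration, not a prescribed implementation: the function names, the 5% threshold, the six-sample window, and the instance ID are assumptions chosen for the example. The stop action is passed in as a callable so the decision logic can be tested without AWS access; in practice it could wrap a call such as boto3's `ec2.stop_instances(InstanceIds=[...])`.

```python
from statistics import mean
from typing import Callable, Sequence


def is_idle(gpu_utilization: Sequence[float],
            threshold_percent: float = 5.0,
            min_samples: int = 6) -> bool:
    """Return True when the most recent samples average below the threshold."""
    if len(gpu_utilization) < min_samples:
        return False  # not enough evidence yet, so keep the instance running
    recent = gpu_utilization[-min_samples:]
    return mean(recent) < threshold_percent


def release_if_idle(instance_id: str,
                    gpu_utilization: Sequence[float],
                    stop_instance: Callable[[str], None]) -> bool:
    """Call stop_instance(instance_id) if the GPU looks idle; report whether it ran."""
    if is_idle(gpu_utilization):
        stop_instance(instance_id)
        return True
    return False


# Demo with a stand-in stop function that only records the request.
stopped = []
busy_samples = [92.0, 88.5, 95.0, 90.0, 87.0, 91.0]
idle_samples = [90.0, 40.0, 2.0, 0.0, 1.0, 0.0, 0.5, 0.0]

release_if_idle("i-0123456789abcdef0", busy_samples, stopped.append)
release_if_idle("i-0123456789abcdef0", idle_samples, stopped.append)
print("stop requests:", stopped)
```

Requiring a minimum number of samples keeps a single quiet reading between batches from stopping an instance that is still in use.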

### Implementation steps
<a name="implementation-steps"></a>
+  Identify which [accelerated computing instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html) can address your requirements.
+  For machine learning workloads, take advantage of purpose-built hardware that is specific to your workload, such as [AWS Trainium](https://aws.amazon.com/machine-learning/trainium/), [AWS Inferentia](https://aws.amazon.com/machine-learning/inferentia/), and [Amazon EC2 DL1](https://aws.amazon.com/ec2/instance-types/dl1/). AWS Inferentia instances such as Inf2 instances [offer up to 50% better performance per watt than comparable Amazon EC2 instances](https://aws.amazon.com/machine-learning/inferentia/).
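+  Collect usage metrics for your accelerated computing instances. For example, you can use the CloudWatch agent to collect metrics such as `utilization_gpu` and `utilization_memory` for your GPUs, as described in [Collect NVIDIA GPU metrics with Amazon CloudWatch](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-NVIDIA-GPU.html).
+  Optimize the code, network operation, and settings of hardware accelerators to make sure that the underlying hardware is fully utilized.
  +  [Optimize GPU settings](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/optimize_gpu.html) 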
  +  [GPU Monitoring and Optimization in the Deep Learning AMI](https://docs.aws.amazon.com/dlami/latest/devguide/tutorial-gpu.html) 
  +  [Optimizing I/O for GPU performance tuning of deep learning training in Amazon SageMaker AI](https://aws.amazon.com/blogs/machine-learning/optimizing-i-o-for-gpu-performance-tuning-of-deep-learning-training-in-amazon-sagemaker/) 
+  Use the latest high-performance libraries and GPU drivers.
+  Use automation to release GPU instances when they are not in use.
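
As an illustration of the metrics-collection step, the sketch below builds a CloudWatch agent configuration fragment that collects the two GPU metrics named above. It assumes the `nvidia_gpu` section layout described in the linked CloudWatch guide; the helper name `gpu_metrics_config` and the 60-second interval are choices made for this example. Check the field names against the current agent documentation before you deploy the configuration.

```python
import json


def gpu_metrics_config(interval_seconds: int = 60) -> dict:
    """Build a CloudWatch agent config fragment for NVIDIA GPU metrics.

    Assumes the agent's `nvidia_gpu` section under `metrics_collected`;
    the interval is an example value, not a recommendation.
    """
    return {
        "metrics": {
            "metrics_collected": {
                "nvidia_gpu": {
                    "measurement": ["utilization_gpu", "utilization_memory"],
                    "metrics_collection_interval": interval_seconds,
                }
            }
        }
    }


# Print the fragment as JSON so it can be merged into the agent's config file.
print(json.dumps(gpu_metrics_config(), indent=2))
```

Generating the fragment from code keeps the metric list in one place when the same configuration is rolled out to many GPU instances.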

## Resources
<a name="resources"></a>

 **Related documents:**
+  [Working with GPUs on Amazon ECS](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-gpu.html) 
+  [GPU instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html#gpu-instances) 
+  [Instances with AWS Trainium](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html#aws-trainium-instances) 
+  [Instances with AWS Inferentia](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/accelerated-computing-instances.html#aws-inferentia-instances) 
+  [Let’s Architect! Architecting with custom chips and accelerators](https://aws.amazon.com/blogs/architecture/lets-architect-custom-chips-and-accelerators/) 
+  [Accelerated Computing](https://aws.amazon.com/ec2/instance-types/#Accelerated_Computing) 
+  [Amazon EC2 VT1 Instances](https://aws.amazon.com/ec2/instance-types/vt1/) 
+  [How do I choose the appropriate Amazon EC2 instance type for my workload?](https://aws.amazon.com/premiumsupport/knowledge-center/ec2-instance-choose-type-for-workload/) 
+  [Choose the best AI accelerator and model compilation for computer vision inference with Amazon SageMaker AI](https://aws.amazon.com/blogs/machine-learning/choose-the-best-ai-accelerator-and-model-compilation-for-computer-vision-inference-with-amazon-sagemaker/) 

 **Related videos:**
+  [AWS re:Invent 2021 - How to select Amazon Elastic Compute Cloud GPU instances for deep learning](https://www.youtube.com/watch?v=4bVrIbgGWEA&ab_channel=AWSEvents) 
+  [AWS re:Invent 2022 - [NEW LAUNCH!] Introducing AWS Inferentia2-based Amazon EC2 Inf2 instances](https://www.youtube.com/watch?v=jpqiG02Y2H4&ab_channel=AWSEvents) 
+  [AWS re:Invent 2022 - Accelerate deep learning and innovate faster with AWS Trainium](https://www.youtube.com/watch?v=YRqvfNwqUIA&ab_channel=AWSEvents) 
+  [AWS re:Invent 2022 - Deep learning on AWS with NVIDIA: From training to deployment](https://www.youtube.com/watch?v=l8AFfaCkp0E&ab_channel=AWSEvents) 

 **Related examples:**
+  [Amazon SageMaker AI and NVIDIA GPU Cloud (NGC)](https://github.com/aws-samples/amazon-sagemaker-nvidia-ngc-examples) 
+  [Use SageMaker AI with Trainium and Inferentia for optimized deep learning training and inferencing workloads](https://github.com/aws-samples/sagemaker-trainium-inferentia) 
+  [Optimizing NLP models with Amazon Elastic Compute Cloud Inf1 instances in Amazon SageMaker AI](https://github.com/aws-samples/aws-inferentia-huggingface-workshop) 