# Deploy models with TorchServe
<a name="deploy-models-frameworks-torchserve"></a>

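TorchServe is the recommended model server for PyTorch and comes preinstalled in the AWS PyTorch Deep Learning Container (DLC). It gives you a consistent, easy-to-use way to deploy multiple PyTorch models with high performance across AWS instance types, including CPU, GPU, Neuron, and Graviton, regardless of model size or distribution.

TorchServe supports a range of advanced features, including dynamic batching, microbatching, model A/B testing, streaming, Torch XLA, TensorRT, ONNX, and IPEX. It also integrates PiPPy, PyTorch's large model solution, so that it can handle large models efficiently. In addition, TorchServe supports popular open-source libraries such as DeepSpeed, Accelerate, and Fast Transformers. With TorchServe, AWS users can deploy and serve PyTorch models with optimized performance across a variety of hardware configurations and model types. For more information, see the [PyTorch documentation](https://pytorch.org/serve/) and [TorchServe on GitHub](https://github.com/pytorch/serve).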

The following table lists the AWS PyTorch DLCs that TorchServe supports.


| Instance type | SageMaker AI PyTorch DLC link | 
| --- | --- | 
| CPU and GPU | [SageMaker AI PyTorch containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#sagemaker-framework-containers-sm-support-only) | 
| Neuron | [PyTorch Neuron containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#neuron-containers) | 
| Graviton | [SageMaker AI PyTorch Graviton containers](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#sagemaker-framework-graviton-containers-sm-support-only) | 

The following sections describe how to set up and test PyTorch DLCs on Amazon SageMaker AI.

## Getting started
<a name="deploy-models-frameworks-torchserve-prereqs"></a>

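To get started, make sure that you have the following prerequisites:

1. Ensure that you have access to an AWS account. Set up your environment so that the AWS CLI can access your account through either an IAM user or an IAM role. We recommend using an IAM role. For testing in your personal account, you can attach the following managed permissions policies to the IAM role: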
   + [AmazonEC2ContainerRegistryFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryFullAccess)
   + [AmazonEC2FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEC2FullAccess)
   + [AWSServiceRoleForAmazonEKSNodegroup](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AWSServiceRoleForAmazonEKSNodegroup)
   + [AmazonSageMakerFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonSageMakerFullAccess)
   + [AmazonS3FullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonS3FullAccess)

1. Configure your dependencies locally, as shown in the following example:

   ```
   from datetime import datetime
   import os
   import json
   import logging
   import time

   # External Dependencies:
   import boto3
   from botocore.exceptions import ClientError
   import sagemaker

   sess = boto3.Session()
   sm = sess.client("sagemaker")
   region = sess.region_name
   account = boto3.client("sts").get_caller_identity().get("Account")

   smsess = sagemaker.Session(boto_session=sess)
   role = sagemaker.get_execution_role()

   # Configuration:
   bucket_name = smsess.default_bucket()
   prefix = "torchserve"
   output_path = f"s3://{bucket_name}/{prefix}/models"
   print(f"account={account}, region={region}, role={role}")
   ```

1. Retrieve the PyTorch DLC image, as shown in the following example.

   SageMaker AI PyTorch DLC images are available in all AWS Regions. For more information, see the [list of DLC container images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md#sagemaker-framework-containers-sm-support-only).

   ```
   baseimage = sagemaker.image_uris.retrieve(
       framework="pytorch",
       region=region,  # Region resolved from the boto3 session in the previous step
       py_version="py310",
       image_scope="inference",
       version="2.0.1",
       instance_type="ml.g4dn.16xlarge",
   )
   ```

1. Create a local workspace.

   ```
   mkdir -p workspace/
   ```
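The value returned by `sagemaker.image_uris.retrieve` is an Amazon ECR image URI. As a rough illustration, the following sketch splits such a URI into its parts, which can help when you pass the base image, Region, and account to a build script. The example URI, its account ID and tag, and the `parse_ecr_image_uri` helper are assumptions made for illustration and are not values from this guide.

```python
import re
from typing import NamedTuple


class EcrImage(NamedTuple):
    account: str
    region: str
    repository: str
    tag: str


# Pattern for a tagged ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
_ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com(?:\.cn)?"
    r"/(?P<repository>[^:@]+):(?P<tag>.+)$"
)


def parse_ecr_image_uri(uri: str) -> EcrImage:
    """Split a tagged ECR image URI into account, region, repository, and tag."""
    match = _ECR_URI.match(uri)
    if match is None:
        raise ValueError(f"not a tagged ECR image URI: {uri}")
    return EcrImage(**match.groupdict())


# Hypothetical URI; the account ID and tag are placeholders.
example_uri = "123456789012.dkr.ecr.us-west-2.amazonaws.com/pytorch-inference:2.0.1-gpu-py310"
image = parse_ecr_image_uri(example_uri)
print(image.repository, image.tag)
```

In a notebook you would call `parse_ecr_image_uri(baseimage)` on the URI retrieved in the earlier step instead of the placeholder.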

## Adding a package
<a name="deploy-models-frameworks-torchserve-package"></a>

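The following sections describe how to add and preinstall packages in your PyTorch DLC image.

**BYOC use cases**

The following steps outline how to add a package to your PyTorch DLC image. For more information about customizing your container, see [Building AWS Deep Learning Containers Custom Images](https://github.com/aws/deep-learning-containers/blob/master/custom_images.md).

1. Suppose you want to add a package to the PyTorch DLC Docker image. Create a Dockerfile under the `docker` directory, as shown in the following example: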

   ```
   mkdir -p workspace/docker
   cat workspace/docker/Dockerfile

   ARG BASE_IMAGE

   FROM $BASE_IMAGE

   # Install any additional libraries
   RUN pip install transformers==4.28.1
   ```

1. Build and publish the customized Docker image by using the following [build_and_push.sh](https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/torchserve/mme-gpu/workspace/docker/build_and_push.sh) script.

   ```
   # Download script build_and_push.sh to workspace/docker
   ls workspace/docker
   build_and_push.sh  Dockerfile

   # Build and publish your docker image
   reponame = "torchserve"
   versiontag = "demo-0.1"

   # Notebook-style invocation: each {name} stands for the Python variable of that name
   ./build_and_push.sh {reponame} {versiontag} {baseimage} {region} {account}
   ```

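**SageMaker AI preinstall use cases**

The following example shows how to preinstall a package in your PyTorch DLC container. You must create a `requirements.txt` file locally under the directory `workspace/code`.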

```
mkdir -p workspace/code
cat workspace/code/requirements.txt

transformers==4.28.1
```
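If you prefer to generate the file from a notebook rather than by hand, a minimal sketch along the following lines could write the pinned requirements. The `write_requirements` helper is hypothetical, and the example writes to a temporary directory so that it does not touch your real workspace.

```python
import tempfile
from pathlib import Path


def write_requirements(code_dir, pins):
    """Write a requirements.txt with exact version pins into code_dir."""
    code_dir = Path(code_dir)
    code_dir.mkdir(parents=True, exist_ok=True)
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    requirements = code_dir / "requirements.txt"
    requirements.write_text("\n".join(lines) + "\n")
    return requirements


# Demonstrate against a temporary directory instead of the real workspace/code.
with tempfile.TemporaryDirectory() as tmp:
    path = write_requirements(Path(tmp) / "workspace" / "code", {"transformers": "4.28.1"})
    content = path.read_text()

print(content)
```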

## Create TorchServe model artifacts
<a name="deploy-models-frameworks-torchserve-artifacts"></a>

In the following example, we use the pre-trained [MNIST model](https://github.com/pytorch/serve/tree/master/examples/image_classifier/mnist). We create a directory `workspace/mnist-dev`, implement [mnist_handler.py](https://github.com/pytorch/serve/blob/master/examples/image_classifier/mnist/mnist_handler.py) by following the [TorchServe custom service instructions](https://github.com/pytorch/serve/blob/master/docs/custom_service.md#custom-service), and [configure the model parameters](https://github.com/pytorch/serve/tree/master/model-archiver#config-file) (such as batch size and workers) in [model-config.yaml](https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/torchserve/mme-gpu/workspace/lama/model-config.yaml). Then, we use the TorchServe tool `torch-model-archiver` to build the model artifacts and upload them to Amazon S3.

1. Configure the model parameters in `model-config.yaml`.

   ```
   ls -al workspace/mnist-dev

   mnist.py
   mnist_handler.py
   mnist_cnn.pt
   model-config.yaml

   # config the model
   cat workspace/mnist-dev/model-config.yaml
   minWorkers: 1
   maxWorkers: 1
   batchSize: 4
   maxBatchDelay: 200
   responseTimeout: 300
   ```

1. Build the model artifacts by using [torch-model-archiver](https://github.com/pytorch/serve/tree/master/model-archiver#torch-model-archiver-for-torchserve).

   ```
   torch-model-archiver --model-name mnist --version 1.0 --model-file workspace/mnist-dev/mnist.py --serialized-file workspace/mnist-dev/mnist_cnn.pt --handler workspace/mnist-dev/mnist_handler.py --config-file workspace/mnist-dev/model-config.yaml --archive-format tgz
   ```

   If you want to preinstall a package, you must include the `code` directory in the `tar.gz` file.

   ```
   cd workspace
   torch-model-archiver --model-name mnist --version 1.0 --model-file mnist-dev/mnist.py --serialized-file mnist-dev/mnist_cnn.pt --handler mnist-dev/mnist_handler.py --config-file mnist-dev/model-config.yaml --archive-format no-archive

   cd mnist
   mv ../code .
   tar cvzf mnist.tar.gz .
   ```

1. Upload `mnist.tar.gz` to Amazon S3.

   ```
   # upload mnist.tar.gz to S3
   output_path = f"s3://{bucket_name}/{prefix}/models"
   # Notebook-style invocation: {output_path} stands for the Python variable above
   aws s3 cp mnist.tar.gz {output_path}/mnist.tar.gz
   ```
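The packaging step for the preinstall case can also be sketched in Python with the standard `tarfile` module. This is only an illustration of the resulting archive layout (model files at the root and a `code` directory beside them). The `package_model_dir` helper and the placeholder files are assumptions, not the real output of `torch-model-archiver`.

```python
import tarfile
import tempfile
from pathlib import Path


def package_model_dir(model_dir, code_dir, archive_path):
    """Bundle the archiver output plus a code/ directory into one .tar.gz."""
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # Store the model files at the archive root.
        for item in sorted(Path(model_dir).iterdir()):
            tar.add(str(item), arcname=item.name)
        # requirements.txt lives under code/ inside the archive.
        tar.add(str(code_dir), arcname="code")
    # Reopen the archive and report what it contains.
    with tarfile.open(str(archive_path), "r:gz") as tar:
        return sorted(tar.getnames())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    model_dir = root / "mnist"
    code_dir = root / "code"
    model_dir.mkdir()
    code_dir.mkdir()
    # Placeholder files standing in for the real archiver output.
    (model_dir / "mnist_handler.py").write_text("# handler placeholder\n")
    (model_dir / "model-config.yaml").write_text("minWorkers: 1\nmaxWorkers: 1\n")
    (code_dir / "requirements.txt").write_text("transformers==4.28.1\n")
    names = package_model_dir(model_dir, code_dir, root / "mnist.tar.gz")

print(names)
```

Listing the archive contents before uploading is a cheap way to confirm that `code/requirements.txt` is present.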

## Use single-model endpoints to deploy with TorchServe
<a name="deploy-models-frameworks-torchserve-single-model"></a>

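The following example shows how to create a [single-model real-time inference endpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints-deployment.html), deploy the model to the endpoint, and test the endpoint by using the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/).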

```
import time

import numpy as np
from sagemaker.deserializers import JSONDeserializer
from sagemaker.model import Model
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer

# Create the single-model endpoint and deploy it on SageMaker AI
model = Model(
    model_data=f"{output_path}/mnist.tar.gz",
    image_uri=baseimage,
    role=role,
    predictor_cls=Predictor,
    name="mnist",
    sagemaker_session=smsess,
)

endpoint_name = "torchserve-endpoint-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
predictor = model.deploy(
    instance_type="ml.g4dn.xlarge",
    initial_instance_count=1,
    endpoint_name=endpoint_name,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# Test the endpoint with random MNIST-shaped input
dummy_data = {"inputs": np.random.rand(16, 1, 28, 28).tolist()}

res = predictor.predict(dummy_data)
```

## Use multi-model endpoints to deploy with TorchServe
<a name="deploy-models-frameworks-torchserve-multi-model"></a>

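[Multi-model endpoints](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html) are a scalable and cost-effective way to host large numbers of models behind one endpoint. They improve endpoint utilization by sharing the same fleet of resources and the same serving container across all of your models. They also reduce deployment overhead, because SageMaker AI dynamically loads and unloads models and scales resources based on traffic patterns. Multi-model endpoints are particularly useful for deep learning and generative AI models that require accelerated compute.

By using TorchServe on SageMaker AI multi-model endpoints, you can speed up development with a serving stack that you are familiar with, while taking advantage of the resource sharing and simplified model management that multi-model endpoints provide.

The following example shows how to create a multi-model endpoint, deploy the model to the endpoint, and test the endpoint by using the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/). You can find additional details in this [notebook example](https://github.com/aws/amazon-sagemaker-examples/blob/main/inference/torchserve/mme-gpu/torchserve_multi_model_endpoint.ipynb).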

```
import shutil
import time

import numpy as np
from sagemaker.deserializers import JSONDeserializer
from sagemaker.model import Model
from sagemaker.multidatamodel import MultiDataModel
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer

# Define the container model that backs the multi-model endpoint
model = Model(
    model_data=f"{output_path}/mnist.tar.gz",
    image_uri=baseimage,
    role=role,
    sagemaker_session=smsess,
)

endpoint_name = "torchserve-endpoint-" + time.strftime("%Y-%m-%d-%H-%M-%S", time.gmtime())
mme = MultiDataModel(
    name=endpoint_name,
    model_data_prefix=output_path,
    model=model,
    sagemaker_session=smsess,
)

mme.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# List the models currently available under the S3 prefix
list(mme.list_models())

# Create mnist v2 model artifacts by copying the local archive
shutil.copy("mnist.tar.gz", "mnistv2.tar.gz")

# Add mnistv2 to the endpoint's S3 prefix
mme.add_model(model_data_source="mnistv2.tar.gz")

# List models again to confirm that mnistv2 was added
list(mme.list_models())

predictor = Predictor(
    endpoint_name=mme.endpoint_name,
    sagemaker_session=smsess,
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

# Test the endpoint with random MNIST-shaped input
dummy_data = {"inputs": np.random.rand(16, 1, 28, 28).tolist()}

res = predictor.predict(data=dummy_data, target_model="mnist.tar.gz")
```
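Outside the SageMaker Python SDK, the same per-model routing can be expressed with the `TargetModel` parameter of the `invoke_endpoint` operation in the low-level `sagemaker-runtime` client. The following is a minimal sketch under stated assumptions: `invoke_target_model` and `FakeRuntimeClient` are hypothetical names, the endpoint name is a placeholder, and the fake client returns a made-up response so that the example runs without AWS credentials.

```python
import io
import json


def invoke_target_model(runtime_client, endpoint_name, target_model, payload):
    """Send a JSON payload to one model hosted on a multi-model endpoint."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        TargetModel=target_model,  # artifact name relative to the model data prefix
        ContentType="application/json",
        Accept="application/json",
        Body=json.dumps(payload),
    )
    return json.loads(response["Body"].read())


class FakeRuntimeClient:
    """Hypothetical stand-in for boto3.client("sagemaker-runtime")."""

    def __init__(self):
        self.calls = []

    def invoke_endpoint(self, **kwargs):
        # Record the request and return one dummy prediction per input row.
        self.calls.append(kwargs)
        row_count = len(json.loads(kwargs["Body"])["inputs"])
        return {"Body": io.BytesIO(json.dumps([0] * row_count).encode("utf-8"))}


fake = FakeRuntimeClient()
result = invoke_target_model(
    fake, "torchserve-endpoint-demo", "mnistv2.tar.gz", {"inputs": [[0.0], [1.0]]}
)
print(result)
```

With real credentials you would pass `boto3.client("sagemaker-runtime")` in place of the fake client. The actual response shape depends on your model handler.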

## Metrics
<a name="deploy-models-frameworks-torchserve-metrics"></a>

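TorchServe supports both system-level and model-level metrics. You can enable metrics in either log format mode or Prometheus mode through the environment variable `TS_METRICS_MODE`. You can use the TorchServe central metrics config file `metrics.yaml` to specify the types of metrics to track, such as request counts, latency, memory usage, and GPU utilization. This file lets you monitor the performance and health of your deployed models and the behavior of the TorchServe server in real time. For more information, see the [TorchServe metrics documentation](https://github.com/pytorch/serve/blob/master/docs/metrics.md#torchserve-metrics).

You can access TorchServe metrics logs, which are similar to the StatsD format, through an Amazon CloudWatch log filter. The following is an example of a TorchServe metrics log: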

```
CPUUtilization.Percent:0.0|#Level:Host|#hostname:my_machine_name,timestamp:1682098185
DiskAvailable.Gigabytes:318.0416717529297|#Level:Host|#hostname:my_machine_name,timestamp:1682098185
```
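If you export these log lines for offline analysis, a small parser can turn them into structured records. The following sketch assumes only the line shape shown in the preceding sample (`Name.Unit:value|#dimensions|#hostname:...,timestamp:...`). The `parse_metric_line` helper is hypothetical, and other TorchServe metric lines might carry extra fields that it ignores.

```python
def parse_metric_line(line):
    """Parse one StatsD-like TorchServe metric log line into a dict."""
    head, *segments = line.strip().split("|#")
    name_unit, value = head.rsplit(":", 1)
    name, _, unit = name_unit.partition(".")

    # Collect every key:value token from the remaining segments.
    fields = {}
    for segment in segments:
        for token in segment.split(","):
            key, separator, field_value = token.partition(":")
            if separator:  # skip tokens that are not key:value pairs
                fields[key] = field_value

    timestamp = fields.pop("timestamp", None)
    hostname = fields.pop("hostname", None)
    return {
        "name": name,
        "unit": unit,
        "value": float(value),
        "dimensions": fields,  # whatever remains, for example Level
        "hostname": hostname,
        "timestamp": int(timestamp) if timestamp is not None else None,
    }


sample = (
    "DiskAvailable.Gigabytes:318.0416717529297|#Level:Host|"
    "#hostname:my_machine_name,timestamp:1682098185"
)
metric = parse_metric_line(sample)
print(metric["name"], metric["value"], metric["dimensions"])
```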