# Cloud Instances
<a name="neo-cloud-instances"></a>

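Amazon SageMaker Neo provides compilation support for popular machine learning frameworks such as TensorFlow, PyTorch, and MXNet. You can deploy your compiled model to cloud instances and AWS Inferentia instances. For a full list of supported frameworks and instance types, see [Supported Instance Types and Frameworks](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-supported-cloud.html).

You can compile your model in one of three ways: through the AWS CLI, the SageMaker AI console, or the SageMaker AI SDK for Python. For more information, see [Use Neo to Compile a Model](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation.html). After compilation, your model artifacts are stored in the Amazon S3 bucket URI that you specified for the compilation job. You can deploy the compiled model to cloud instances and AWS Inferentia instances using the SageMaker AI SDK for Python, the AWS SDK for Python (Boto3), the AWS CLI, or the AWS console.

If you deploy your model using the AWS CLI, the console, or Boto3, you must select a Docker image Amazon ECR URI for your primary container. For a list of Amazon ECR URIs, see [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html).

**Topics**
+ [Supported Instance Types and Frameworks](neo-supported-cloud.md)
+ [Deploy a Model](neo-deployment-hosting-services.md)
+ [Inference Requests With a Deployed Service](neo-requests.md)
+ [Inference Container Images](neo-deployment-hosting-services-container-images.md)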

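# Supported Instance Types and Frameworks
<a name="neo-supported-cloud"></a>

Amazon SageMaker Neo supports popular deep learning frameworks for both compilation and deployment. You can deploy your model to cloud instances and AWS Inferentia instance types.

The following describes the frameworks that SageMaker Neo supports and the target cloud instances that you can compile for and deploy to. For information on how to deploy your compiled model to a cloud or Inferentia instance, see [Deploy a Model with Cloud Instances](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services.html).

## Cloud Instances
<a name="neo-supported-cloud-instances"></a>

SageMaker Neo supports the following deep learning frameworks for CPU and GPU cloud instances: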


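| Framework | Framework Version | Model Version | Models | Model Formats (packaged in \*.tar.gz) | Toolkits | 
| --- | --- | --- | --- | --- | --- | 
| MXNet | 1.8.0 | Supports 1.8.0 or earlier | Image Classification, Object Detection, Semantic Segmentation, Pose Estimation, Activity Recognition | One symbol file (.json) and one parameter file (.params) | GluonCV v0.8.0 | 
| ONNX | 1.7.0 | Supports 1.7.0 or earlier | Image Classification, SVM | One model file (.onnx) |  | 
| Keras | 2.2.4 | Supports 2.2.4 or earlier | Image Classification | One model definition file (.h5) |  | 
| PyTorch | 1.4, 1.5, 1.6, 1.7, 1.8, 1.12, 1.13, or 2.0 | Supports 1.4, 1.5, 1.6, 1.7, 1.8, 1.12, 1.13, and 2.0 | Image Classification. Versions 1.13 and 2.0 also support Object Detection, Vision Transformer, and HuggingFace | One model definition file (.pt or .pth) with input dtype of float32 |  | 
| TensorFlow | 1.15.3 or 2.9 | Supports 1.15.3 or 2.9 | Image Classification | For saved models, one .pb or one .pbtxt file and a variables directory that contains variables. For frozen models, only one .pb or .pbtxt file |  | 
| XGBoost | 1.3.3 | Supports 1.3.3 or earlier | Decision Trees | One XGBoost model file (.model) where the number of nodes in a tree is less than 2^31 |  | 

**Note**  
“Model Version” is the version of the framework used to train and export the model.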

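## Instance Types
<a name="neo-supported-cloud-instances-types"></a>

You can deploy your SageMaker AI compiled model to one of the cloud instances listed below:


| Instance | Compute Type | 
| --- | --- | 
| `ml_c4` | Standard | 
| `ml_c5` | Standard | 
| `ml_m4` | Standard | 
| `ml_m5` | Standard | 
| `ml_p2` | Accelerated computing | 
| `ml_p3` | Accelerated computing | 
| `ml_g4dn` | Accelerated computing | 

For information on the available vCPU, memory, and price per hour for each instance type, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).

**Note**  
When compiling for `ml_*` instances using the PyTorch framework, use the **Compiler options** field in **Output Configuration** to provide the correct data type (`dtype`) of the model's input.  
The default is set to `"float32"`.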
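As a rough illustration of the note above, the following sketch builds the `OutputConfig` portion of a compilation job request, with the input `dtype` passed through `CompilerOptions`. The helper name `build_output_config` and the bucket path are hypothetical. `S3OutputLocation`, `TargetDevice`, and `CompilerOptions` are field names of the `CreateCompilationJob` API, where `CompilerOptions` is a JSON-formatted string.

```python
import json


def build_output_config(s3_output_location, target_device, dtype="float32"):
    """Hypothetical helper: assemble an OutputConfig dict for a PyTorch model."""
    if not target_device.startswith("ml_"):
        raise ValueError("expected a cloud target such as 'ml_c5' or 'ml_p3'")
    return {
        "S3OutputLocation": s3_output_location,
        "TargetDevice": target_device,
        # CompilerOptions is passed as a JSON string, not as a nested dict
        "CompilerOptions": json.dumps({"dtype": dtype}),
    }


# The bucket name below is a placeholder, not a real location
output_config = build_output_config("s3://amzn-s3-demo-bucket/neo-output/", "ml_c5")
print(output_config["CompilerOptions"])
```

A dict built this way could be passed as the `OutputConfig` argument of a compilation job request. Whether other compiler options are needed depends on your model.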

## AWS Inferentia
<a name="neo-supported-inferentia"></a>

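SageMaker Neo supports the following deep learning frameworks for Inf1:


| Framework | Framework Version | Model Version | Models | Model Formats (packaged in \*.tar.gz) | Toolkits | 
| --- | --- | --- | --- | --- | --- | 
| MXNet | 1.5 or 1.8 | Supports 1.8, 1.5, or earlier | Image Classification, Object Detection, Semantic Segmentation, Pose Estimation, Activity Recognition | One symbol file (.json) and one parameter file (.params) | GluonCV v0.8.0 | 
| PyTorch | 1.7, 1.8, or 1.9 | Supports 1.9 or earlier | Image Classification | One model definition file (.pt or .pth) with input dtype of float32 |  | 
| TensorFlow | 1.15 or 2.5 | Supports 2.5, 1.15, or earlier | Image Classification | For saved models, one .pb or one .pbtxt file and a variables directory that contains variables. For frozen models, only one .pb or .pbtxt file |  | 

**Note**  
“Model Version” is the version of the framework used to train and export the model.

You can deploy your SageMaker Neo-compiled model to AWS Inferentia-based Amazon EC2 Inf1 instances. AWS Inferentia is Amazon's first custom silicon chip designed to accelerate deep learning. Currently, you can use the `ml_inf1` instance to deploy your compiled models.

### AWS Inferentia2 and AWS Trainium
<a name="neo-supported-inferentia-trainium"></a>

Currently, you can deploy SageMaker Neo-compiled models to AWS Inferentia2-based Amazon EC2 Inf2 instances (in the US East (Ohio) Region) and to AWS Trainium-based Amazon EC2 Trn1 instances (in the US East (N. Virginia) Region). For more information about supported models on these instances, see the [Model Architecture Fit Guidelines](https://awsdocs-neuron.readthedocs-hosted.com/en/latest/general/arch/model-architecture-fit.html) in the AWS Neuron documentation, and the examples in the [Neuron Github repository](https://github.com/aws-neuron/aws-neuron-sagemaker-samples).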

# Deploy a Model
<a name="neo-deployment-hosting-services"></a>

To deploy an Amazon SageMaker Neo-compiled model to an HTTPS endpoint, you must configure and create the endpoint for the model using Amazon SageMaker AI hosting services. Currently, developers can use Amazon SageMaker APIs to deploy models on ml.c5, ml.c4, ml.m5, ml.m4, ml.p3, ml.p2, and ml.inf1 instances.

For [Inferentia](https://aws.amazon.com/machine-learning/inferentia/) and [Trainium](https://aws.amazon.com/machine-learning/trainium/) instances, models need to be compiled specifically for those instances. Models compiled for other instance types are not guaranteed to work with Inferentia or Trainium instances.

When you deploy a compiled model, the target instance must be the same as the instance that you compiled the model for. Deployment creates a SageMaker AI endpoint that you can use to perform inferences. You can deploy a Neo-compiled model using any of the following: the [Amazon SageMaker AI SDK for Python](https://sagemaker.readthedocs.io/en/stable/), the [SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html), the [AWS Command Line Interface](https://docs.aws.amazon.com/cli/latest/reference/), and the [SageMaker AI console](https://console.aws.amazon.com/sagemaker).

**Note**  
To deploy a model using the AWS CLI, the console, or Boto3, see [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html) to select the inference image URI for your primary container.

**Topics**
+ [Prerequisites](neo-deployment-hosting-services-prerequisites.md)
+ [Deploy a Compiled Model Using SageMaker SDK](neo-deployment-hosting-services-sdk.md)
+ [Deploy a Compiled Model Using Boto3](neo-deployment-hosting-services-boto3.md)
+ [Deploy a Compiled Model Using the AWS CLI](neo-deployment-hosting-services-cli.md)
+ [Deploy a Compiled Model Using the Console](neo-deployment-hosting-services-console.md)

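# Prerequisites
<a name="neo-deployment-hosting-services-prerequisites"></a>

**Note**  
Follow the instructions in this section if you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the SageMaker AI console.

To create a SageMaker Neo-compiled model, you need the following:

1. A Docker image Amazon ECR URI. You can select one that meets your needs from [this list](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html).

1. An entry point script file:

   1. **For PyTorch and MXNet models:**

      *If you trained your model using SageMaker AI*, the training script must implement the functions described below. The training script serves as the entry point script during inference. In the example detailed in [MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_neo_compilation_jobs/mxnet_mnist/mxnet_mnist_neo.html), the training script (`mnist.py`) implements the required functions.

      *If you did not train your model using SageMaker AI*, you must provide an entry point script (`inference.py`) file that can be used at the time of inference. Depending on the framework (MXNet or PyTorch), the inference script location must conform to the SageMaker Python SDK [Model Directory Structure for MXNet](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/using_mxnet.html#model-directory-structure) or [Model Directory Structure for PyTorch](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#model-directory-structure).

      When using Neo inference-optimized container images with **PyTorch** and **MXNet** on CPU and GPU instance types, the inference script must implement the following functions:
      + `model_fn`: Loads the model. (Optional)
      + `input_fn`: Converts the incoming request payload into a numpy array.
      + `predict_fn`: Performs the prediction.
      + `output_fn`: Converts the prediction output into the response payload.
      + Alternatively, you can define `transform_fn` to combine `input_fn`, `predict_fn`, and `output_fn`.

      The following are examples of an `inference.py` script within a directory named `code` (`code/inference.py`) for **PyTorch and MXNet (Gluon and Module)**. The examples first load the model and then serve it on image data on a GPU: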

------
#### [ MXNet Module ]

      ```
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      from collections import namedtuple
      
      Batch = namedtuple('Batch', ['data'])
      
      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()
      
      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          sym, arg_params, aux_params = mx.model.load_checkpoint('compiled', 0)
          mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
          exe = mod.bind(for_training=False,
                         data_shapes=[('data', (1,3,224,224))],
                         label_shapes=mod._label_shapes)
          mod.set_params(arg_params, aux_params, allow_missing=True)
          
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          mod.forward(Batch([data]))
          return mod
      
      
      def transform_fn(mod, image, input_content_type, output_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                        mean=mx.nd.array([0.485, 0.456, 0.406]),
                                        std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
      
          # prediction/inference
          mod.forward(Batch([processed_input]))
      
          # post-processing
          prob = mod.get_outputs()[0].asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      ```

------
#### [ MXNet Gluon ]

      ```
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      
      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()
      
      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          block = mx.gluon.nn.SymbolBlock.imports('compiled-symbol.json',['data'],'compiled-0000.params', ctx=ctx)
          
          # Hybridize the model & pass required options for Neo: static_alloc=True & static_shape=True
          block.hybridize(static_alloc=True, static_shape=True)
          
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          warm_up = block(data)
          return block
      
      
      def input_fn(image, input_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                        mean=mx.nd.array([0.485, 0.456, 0.406]),
                                        std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
          return processed_input
      
      
      def predict_fn(processed_input_data, block):
          # prediction/inference
          prediction = block(processed_input_data)
          return prediction
      
      def output_fn(prediction, output_content_type):
          # post-processing
          prob = prediction.asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      ```

------
#### [ PyTorch 1.4 and Older ]

      ```
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle
      
      
      def model_fn(model_dir):
          """Load the model and return it.
          Providing this function is optional.
          There is a default model_fn available which will load the model
          compiled using SageMaker Neo. You can override it here.
      
          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
      
          # The compiled model is saved as "compiled.pt"
          model_path = os.path.join(model_dir, 'compiled.pt')
          with torch.neo.config(model_dir=model_dir, neo_runtime=True):
              model = torch.jit.load(model_path)
              device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
              model = model.to(device)
      
          # We recommend that you run warm-up inference during model load
          sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
          with open(sample_input_path, 'rb') as input_file:
              model_input = pickle.load(input_file)
          if torch.is_tensor(model_input):
              model_input = model_input.to(device)
              model(model_input)
          elif isinstance(model_input, tuple):
              model_input = tuple(inp.to(device) for inp in model_input if torch.is_tensor(inp))
              model(*model_input)
          else:
              print("Only supports a torch tensor or a tuple of torch tensors")

          # Return the model regardless of which warm-up branch ran
          return model
      
      
      def transform_fn(model, request_body, request_content_type,
                       response_content_type):
          """Run prediction and return the output.
          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[
                      0.485, 0.456, 0.406], std=[
                      0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)
          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)
      
          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      ```

------
#### [ PyTorch 1.5 and Newer ]

      ```
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle
      
      
      def model_fn(model_dir):
          """Load the model and return it.
          Providing this function is optional.
          There is a default_model_fn available, which will load the model
          compiled using SageMaker Neo. You can override the default here.
          The model_fn only needs to be defined if your model needs extra
          steps to load, and can otherwise be left undefined.
      
          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
      
          # The compiled model is saved as "model.pt"
          model_path = os.path.join(model_dir, 'model.pt')
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = torch.jit.load(model_path, map_location=device)
          model = model.to(device)
      
          return model
      
      
      def transform_fn(model, request_body, request_content_type,
                          response_content_type):
          """Run prediction and return the output.
          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
                                      transforms.Resize(256),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize(
                                          mean=[
                                              0.485, 0.456, 0.406], std=[
                                              0.229, 0.224, 0.225]),
                                          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)
          
          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)
          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      ```

------

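   1.  **For inf1 instances or onnx, xgboost, keras container images** 

      For all other Neo inference-optimized container images, or Inferentia instance types, the entry point script must implement the following functions for Neo Deep Learning Runtime:
      + `neo_preprocess`: Converts the incoming request payload into a numpy array.
      + `neo_postprocess`: Converts the prediction output from Neo Deep Learning Runtime into the response body.
**Note**  
Neither of the preceding two functions uses any functionality of MXNet, PyTorch, or TensorFlow.

      For examples of how to use these functions, see [Neo Model Compilation Sample Notebooks](https://docs.aws.amazon.com//sagemaker/latest/dg/neo.html#neo-sample-notebooks).

   1. **For TensorFlow models**

      If your model requires custom pre- and post-processing logic before data is sent to the model, you must specify an entry point script `inference.py` file that can be used at the time of inference. The script should implement either a pair of `input_handler` and `output_handler` functions or a single handler function.
**Note**  
If a handler function is implemented, `input_handler` and `output_handler` are ignored.

      The following is a code example of an `inference.py` script that you can package with the compiled model to perform custom pre- and post-processing on an image classification model. The SageMaker AI client sends the image file as an `application/x-image` content type to the `input_handler` function, where it is converted to JSON. The converted image file is then sent to the [Tensorflow Model Server (TFX)](https://www.tensorflow.org/tfx/serving/api_rest) using the REST API.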

      ```
      import json
      import numpy as np
      import io
      from PIL import Image
      
      def input_handler(data, context):
          """ Pre-process request input before it is sent to TensorFlow Serving REST API
          
          Args:
          data (obj): the request data, in format of dict or string
          context (Context): an object containing request and configuration details
          
          Returns:
          (dict): a JSON-serializable dict that contains request body and headers
          """
          f = data.read()
          f = io.BytesIO(f)
          image = Image.open(f).convert('RGB')
          batch_size = 1
          image = np.asarray(image.resize((512, 512)))
          image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
          body = json.dumps({"signature_name": "serving_default", "instances": image.tolist()})
          return body
      
      def output_handler(data, context):
          """Post-process TensorFlow Serving output before it is returned to the client.
          
          Args:
          data (obj): the TensorFlow serving response
          context (Context): an object containing request and configuration details
          
          Returns:
          (bytes, string): data to return to client, response content type
          """
          if data.status_code != 200:
              raise ValueError(data.content.decode('utf-8'))
      
          response_content_type = context.accept_header
          prediction = data.content
          return prediction, response_content_type
      ```

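      If there is no custom pre- or post-processing, the SageMaker AI client converts the file image to JSON in a similar way before sending it to the SageMaker AI endpoint.

      For more information, see [Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#providing-python-scripts-for-pre-pos-processing).

1. The Amazon S3 bucket URI that contains the compiled model artifacts.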
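To make the relationship between the PyTorch and MXNet handler functions listed above more concrete, here is a minimal, framework-free sketch of how a single `transform_fn` can chain the work of `input_fn`, `predict_fn`, and `output_fn`. It is only an illustration under stated assumptions: plain Python lists and JSON stand in for numpy arrays and image payloads, and `toy_model` is a hypothetical stand-in for whatever `model_fn` would return. In a real script you would typically define either the three separate functions or `transform_fn`, not both.

```python
import json


def input_fn(request_body, request_content_type):
    """Convert the incoming request payload into model input."""
    if request_content_type != "application/json":
        raise ValueError(f"unsupported content type: {request_content_type}")
    return json.loads(request_body)


def predict_fn(input_data, model):
    """Run the prediction; 'model' is whatever model_fn returned."""
    return model(input_data)


def output_fn(prediction, response_content_type):
    """Convert the prediction output into the response payload."""
    return json.dumps(prediction), response_content_type


def transform_fn(model, request_body, request_content_type, response_content_type):
    """Single handler that chains the three steps above."""
    data = input_fn(request_body, request_content_type)
    prediction = predict_fn(data, model)
    return output_fn(prediction, response_content_type)


def toy_model(values):
    """Stand-in model for illustration only: doubles every input value."""
    return [2 * v for v in values]


body, content_type = transform_fn(
    toy_model, "[1, 2, 3]", "application/json", "application/json"
)
print(body, content_type)
```

The argument order mirrors the example scripts in this section: the model comes first in `transform_fn` and second in `predict_fn`.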

# Deploy a Compiled Model Using SageMaker SDK
<a name="neo-deployment-hosting-services-sdk"></a>

If the model was compiled using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must meet the requirements in the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Follow one of the following use cases to deploy a model compiled with SageMaker Neo, based on how you compiled your model.

**Topics**
+ [If you compiled your model using the SageMaker SDK](#neo-deployment-hosting-services-sdk-deploy-sm-sdk)
+ [If you compiled your model using MXNet or PyTorch](#neo-deployment-hosting-services-sdk-deploy-sm-boto3)
+ [If you compiled your model using Boto3, SageMaker console, or the CLI for TensorFlow](#neo-deployment-hosting-services-sdk-deploy-sm-boto3-tensorflow)

## If you compiled your model using the SageMaker SDK
<a name="neo-deployment-hosting-services-sdk-deploy-sm-sdk"></a>

The [sagemaker.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html?highlight=sagemaker.Model) object handle for the compiled model supplies the [deploy()](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html?highlight=sagemaker.Model#sagemaker.model.Model.deploy) function, which lets you create an endpoint to serve inference requests. The function lets you set the number and type of instances that are used for the endpoint. You must choose an instance type for which you compiled your model. For example, for the job compiled in the [Compile a Model (Amazon SageMaker SDK)](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation-sagemaker-sdk.html) section, this is `ml_c5`.

```
predictor = compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.c5.4xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```
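Because the endpoint instance type has to belong to the family the model was compiled for, a small pre-deployment check can catch mismatches early. The sketch below is only an illustration: `target_device_for` and `check_deploy_target` are hypothetical helpers, not part of the SageMaker SDK, and they assume the naming pattern seen in this guide, where the compilation target `ml_c5` corresponds to instance types such as `ml.c5.4xlarge`.

```python
def target_device_for(instance_type):
    """Hypothetical helper: map 'ml.c5.4xlarge' to the Neo target name 'ml_c5'."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml":
        raise ValueError(f"unexpected instance type: {instance_type!r}")
    return f"ml_{parts[1]}"


def check_deploy_target(compiled_for, instance_type):
    """Raise if the endpoint instance family differs from the compilation target."""
    actual = target_device_for(instance_type)
    if actual != compiled_for:
        raise ValueError(
            f"model compiled for {compiled_for}, but deploying to {actual}"
        )


# Matches the example above: compiled for ml_c5, deployed to ml.c5.4xlarge
check_deploy_target("ml_c5", "ml.c5.4xlarge")
```

Calling `check_deploy_target` before `deploy()` is one way to fail fast locally instead of discovering the mismatch after the endpoint has been created.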

## If you compiled your model using MXNet or PyTorch
<a name="neo-deployment-hosting-services-sdk-deploy-sm-boto3"></a>

Create the SageMaker AI model and deploy it using the deploy() API under the framework-specific Model APIs. For MXNet, it is [MXNetModel](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/sagemaker.mxnet.html?highlight=MXNetModel#mxnet-model), and for PyTorch, it is [ PyTorchModel](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/sagemaker.pytorch.html?highlight=PyTorchModel#sagemaker.pytorch.model.PyTorchModel). When you create and deploy a SageMaker AI model, you must set the `MMS_DEFAULT_RESPONSE_TIMEOUT` environment variable to `500`, specify the `entry_point` parameter as the inference script (`inference.py`), and specify the `source_dir` parameter as the directory location (`code`) of the inference script. To prepare the inference script (`inference.py`), follow the Prerequisites step.

The following examples show how to use these functions to deploy a compiled model using the SageMaker AI SDK for Python:

------
#### [ MXNet ]

```
from sagemaker.mxnet import MXNetModel

# Create SageMaker model and deploy an endpoint
sm_mxnet_compiled_model = MXNetModel(
    model_data='insert S3 path of compiled MXNet model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.8.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for MXNet',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below to your preferred instance_type
predictor = sm_mxnet_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------
#### [ PyTorch 1.4 and Older ]

```
from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.4.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below to your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------
#### [ PyTorch 1.5 and Newer ]

```
from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.5',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
)

# Replace the example instance_type below to your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------

**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role.

## If you compiled your model using Boto3, SageMaker console, or the CLI for TensorFlow
<a name="neo-deployment-hosting-services-sdk-deploy-sm-boto3-tensorflow"></a>

Construct a `TensorFlowModel` object, then call deploy:

```
from sagemaker.tensorflow import TensorFlowModel

role = 'AmazonSageMaker-ExecutionRole'
model_path = 'S3 path for model file'
# Amazon ECR URI of the Neo inference container image
framework_image = 'inference container image URI'
tf_model = TensorFlowModel(model_data=model_path,
                           framework_version='1.15.3',
                           role=role,
                           image_uri=framework_image)
instance_type = 'ml.c5.xlarge'
predictor = tf_model.deploy(instance_type=instance_type,
                            initial_instance_count=1)
```

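For more information, see [Deploying directly from model artifacts](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#deploying-directly-from-model-artifacts).

You can select a Docker image Amazon ECR URI that meets your needs from [this list](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html).

For more information on how to construct a `TensorFlowModel` object, see the [SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#tensorflow-serving-model).

**Note**  
If you deploy your model on a GPU, the latency of the first inference request might be high. This is because an optimized compute kernel is created on the first inference request. We recommend that you create a warm-up file of inference requests and store it alongside your model file before sending it to TFX. This is known as “warming up” the model.

The following code snippet demonstrates how to produce the warm-up file for the image classification example in the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section: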

```
import tensorflow as tf
from tensorflow_serving.apis import classification_pb2
from tensorflow_serving.apis import inference_pb2
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2
from tensorflow_serving.apis import regression_pb2
import numpy as np

# tf.io.TFRecordWriter is available in both TensorFlow 1.15 and 2.x
with tf.io.TFRecordWriter("tf_serving_warmup_requests") as writer:
    img = np.random.uniform(0, 1, size=[224, 224, 3]).astype(np.float32)
    img = np.expand_dims(img, axis=0)
    test_data = np.repeat(img, 1, axis=0)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'compiled_models'
    request.model_spec.signature_name = 'serving_default'
    request.inputs['Placeholder:0'].CopyFrom(tf.compat.v1.make_tensor_proto(test_data, shape=test_data.shape, dtype=tf.float32))
    log = prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=request))
    writer.write(log.SerializeToString())
```

For more information on how to “warm up” your model, see the [TensorFlow TFX page](https://www.tensorflow.org/tfx/serving/saved_model_warmup).

# Deploy a Compiled Model Using Boto3
<a name="neo-deployment-hosting-services-boto3"></a>

If the model was compiled using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must meet the requirements in the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Follow the steps below to create and deploy a SageMaker Neo-compiled model using the [Amazon Web Services SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html).

**Topics**
+ [Deploy the Model](#neo-deployment-hosting-services-boto3-steps)

## Deploy the Model
<a name="neo-deployment-hosting-services-boto3-steps"></a>

After you have satisfied the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the `create_model`, `create_endpoint_config`, and `create_endpoint` APIs.

The following example shows how to use these APIs to deploy a model compiled with Neo:

```
import boto3
client = boto3.client('sagemaker')

# create sagemaker model
create_model_api_response = client.create_model(
    ModelName='my-sagemaker-model',
    PrimaryContainer={
        'Image': 'insert the ECR Image URI',
        'ModelDataUrl': 's3://path/to/model/artifact/model.tar.gz',
        'Environment': {}
    },
    ExecutionRoleArn='ARN for AmazonSageMaker-ExecutionRole'
)

print("create_model API response", create_model_api_response)

# create sagemaker endpoint config
create_endpoint_config_api_response = client.create_endpoint_config(
    EndpointConfigName='sagemaker-neomxnet-endpoint-configuration',
    ProductionVariants=[
        {
            'VariantName': 'provide your variant name',
            # must match the ModelName passed to create_model above
            'ModelName': 'my-sagemaker-model',
            'InitialInstanceCount': 1,
            'InstanceType': 'provide your instance type here'
        },
    ]
)

print("create_endpoint_config API response", create_endpoint_config_api_response)

# create sagemaker endpoint
create_endpoint_api_response = client.create_endpoint(
    EndpointName='provide your endpoint name',
    # must match the EndpointConfigName passed to create_endpoint_config above
    EndpointConfigName='sagemaker-neomxnet-endpoint-configuration',
)

print("create_endpoint API response", create_endpoint_api_response)
```

**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role.

For full syntax of the `create_model`, `create_endpoint_config`, and `create_endpoint` APIs, see [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model), [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config), and [https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint), respectively.

If you did not train your model using SageMaker AI, specify the following environment variables:
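Endpoint creation is asynchronous, so `create_endpoint` returns before the endpoint can serve requests. A minimal sketch of waiting for it is shown below, assuming a Boto3 SageMaker client is passed in. `wait_until_in_service` is a hypothetical helper name; the `endpoint_in_service` waiter and the `describe_endpoint` call are part of the Boto3 SageMaker client.

```python
def wait_until_in_service(client, endpoint_name):
    """Block until the endpoint is InService, then return its reported status."""
    # The waiter polls describe_endpoint and raises if creation fails
    waiter = client.get_waiter("endpoint_in_service")
    waiter.wait(EndpointName=endpoint_name)
    response = client.describe_endpoint(EndpointName=endpoint_name)
    return response["EndpointStatus"]


# Typical use (requires AWS credentials and an endpoint that is being created):
#   import boto3
#   status = wait_until_in_service(boto3.client("sagemaker"), "my-endpoint")
```

Passing the client as an argument keeps the helper easy to exercise with a stub object, without touching a real AWS account.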

------
#### [ MXNet and PyTorch ]

```
"Environment": {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": "insert your region",
    "MMS_DEFAULT_RESPONSE_TIMEOUT": "500"
}
```

------
#### [ TensorFlow ]

```
"Environment": {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": "insert your region"
}
```

------

If you trained your model using SageMaker AI, specify the environment variable `SAGEMAKER_SUBMIT_DIRECTORY` as the full Amazon S3 bucket URI that contains the training script.
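The environment-variable rules above can be gathered into one small function. The following is a sketch only: `build_environment` and its parameters are hypothetical names, and the returned dict is meant to be used as the `Environment` value of `PrimaryContainer` in `create_model`.

```python
def build_environment(framework, region, trained_with_sagemaker=False,
                      script_s3_uri=None):
    """Hypothetical helper returning the Environment map for PrimaryContainer."""
    if trained_with_sagemaker:
        # Only the S3 location of the training script archive is needed
        if not script_s3_uri:
            raise ValueError("script_s3_uri is required for SageMaker-trained models")
        return {"SAGEMAKER_SUBMIT_DIRECTORY": script_s3_uri}

    env = {
        "SAGEMAKER_PROGRAM": "inference.py",
        "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
        "SAGEMAKER_REGION": region,
    }
    # MXNet and PyTorch containers additionally get the response timeout
    if framework.lower() in ("mxnet", "pytorch"):
        env["MMS_DEFAULT_RESPONSE_TIMEOUT"] = "500"
    return env


print(sorted(build_environment("pytorch", "us-east-1")))
```

All values are strings because container environment variables are passed as a string-to-string map.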

# Deploy a Compiled Model Using the AWS CLI
<a name="neo-deployment-hosting-services-cli"></a>

If the model was compiled using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must meet the requirements in the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Follow the steps below to create and deploy a SageMaker Neo-compiled model using the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/).

**Topics**
+ [Deploy the Model](#neo-deploy-cli)

## Deploy the Model
<a name="neo-deploy-cli"></a>

After you have satisfied the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the `create-model`, `create-endpoint-config`, and `create-endpoint` AWS CLI commands. The following steps explain how to use these commands to deploy a model compiled with Neo:


### Create the Model
<a name="neo-deployment-hosting-services-cli-create-model"></a>

From [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html), select the inference image URI, and then use the `create-model` API to create a SageMaker AI model. You can do this in two steps:

1. Create a `create_model.json` file. Within the file, specify the name of the model, the image URI, the path to the `model.tar.gz` file in your Amazon S3 bucket, and your SageMaker AI execution role:

   ```
   {
       "ModelName": "insert model name",
       "PrimaryContainer": {
           "Image": "insert the ECR Image URI",
           "ModelDataUrl": "insert S3 archive URL",
           "Environment": {"See details below"}
       },
       "ExecutionRoleArn": "ARN for AmazonSageMaker-ExecutionRole"
   }
   ```

   If you trained your model using SageMaker AI, specify the following environment variable:

   ```
   "Environment": {
       "SAGEMAKER_SUBMIT_DIRECTORY" : "[Full S3 path for *.tar.gz file containing the training script]"
   }
   ```

   If you did not train your model using SageMaker AI, specify the following environment variables:

------
#### [ MXNet and PyTorch ]

   ```
   "Environment": {
       "SAGEMAKER_PROGRAM": "inference.py",
       "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
       "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
       "SAGEMAKER_REGION": "insert your region",
       "MMS_DEFAULT_RESPONSE_TIMEOUT": "500"
   }
   ```

------
#### [ TensorFlow ]

   ```
   "Environment": {
       "SAGEMAKER_PROGRAM": "inference.py",
       "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
       "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
       "SAGEMAKER_REGION": "insert your region"
   }
   ```

------
**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role.

1. Run the following command:

   ```
   aws sagemaker create-model --cli-input-json file://create_model.json
   ```

   For the full syntax of the `create-model` API, see [https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-model.html](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-model.html).
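
The contents of `create_model.json` can also be assembled programmatically. The following is a minimal sketch, assuming a hypothetical helper named `build_create_model_request`; it only mirrors the environment variables listed in the tabs above and does not call any AWS API. All names and values passed in are placeholders.

```python
import json

# Environment variables listed above for models NOT trained with SageMaker AI.
_COMMON_ENV = {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
}


def build_create_model_request(model_name, image_uri, model_data_url, role_arn,
                               framework, region, training_script_uri=None):
    """Return a dict shaped like create_model.json (hypothetical helper)."""
    if training_script_uri is not None:
        # Model trained with SageMaker AI: point at the training script archive in S3.
        env = {"SAGEMAKER_SUBMIT_DIRECTORY": training_script_uri}
    else:
        env = dict(_COMMON_ENV, SAGEMAKER_REGION=region)
        if framework.lower() in ("mxnet", "pytorch"):
            # Only the MXNet and PyTorch tab lists this timeout variable.
            env["MMS_DEFAULT_RESPONSE_TIMEOUT"] = "500"
    return {
        "ModelName": model_name,
        "PrimaryContainer": {
            "Image": image_uri,
            "ModelDataUrl": model_data_url,
            "Environment": env,
        },
        "ExecutionRoleArn": role_arn,
    }


request = build_create_model_request(
    model_name="my-neo-model",  # placeholder values throughout
    image_uri="<ECR image URI>",
    model_data_url="s3://<bucket>/<prefix>/model.tar.gz",
    role_arn="<execution role ARN>",
    framework="pytorch",
    region="us-west-2",
)
print(json.dumps(request, indent=4))
```

Saving the printed JSON as `create_model.json` gives a file that can be passed to `aws sagemaker create-model --cli-input-json`.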

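### Create an Endpoint Configuration
<a name="neo-deployment-hosting-services-cli-create-endpoint-config"></a>

After creating a SageMaker AI model, create the endpoint configuration using the `create-endpoint-config` API. To do this, create a JSON file with your endpoint configuration specifications. For example, you can use the following code template and save it as `create_config.json`: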

```
{
    "EndpointConfigName": "<provide your endpoint config name>",
    "ProductionVariants": [
        {
            "VariantName": "<provide your variant name>",
            "ModelName": "my-sagemaker-model",
            "InitialInstanceCount": 1,
            "InstanceType": "<provide your instance type here>",
            "InitialVariantWeight": 1.0
        }
    ]
}
```

Now run the following AWS CLI command to create your endpoint configuration:

```
aws sagemaker create-endpoint-config --cli-input-json file://create_config.json
```

For the full syntax of the `create-endpoint-config` API, see [https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint-config.html](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint-config.html).
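
As an illustration, the `create_config.json` template can be generated from Python. This is a sketch only: `build_endpoint_config` is a hypothetical helper, the names are placeholders, and `ml.c5.xlarge` is an assumed example instance type rather than a recommendation.

```python
import json
import os
import tempfile


def build_endpoint_config(config_name, variant_name, model_name, instance_type,
                          instance_count=1, variant_weight=1.0):
    """Return a dict with the same fields as the create_config.json template."""
    if instance_count < 1:
        raise ValueError("InitialInstanceCount must be at least 1")
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": variant_name,
                "ModelName": model_name,
                "InitialInstanceCount": instance_count,
                "InstanceType": instance_type,
                "InitialVariantWeight": variant_weight,
            }
        ],
    }


config = build_endpoint_config(
    config_name="my-endpoint-config",  # placeholder names
    variant_name="AllTraffic",
    model_name="my-sagemaker-model",   # must match the ModelName used in create-model
    instance_type="ml.c5.xlarge",      # assumed example; use the type you compiled for
)

# Write the file to a temporary directory so the sketch has no side effects.
path = os.path.join(tempfile.mkdtemp(), "create_config.json")
with open(path, "w") as f:
    json.dump(config, f, indent=4)
```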

### Create an Endpoint
<a name="neo-deployment-hosting-services-cli-create-endpoint"></a>

After you have created your endpoint configuration, create an endpoint using the `create-endpoint` API:

```
aws sagemaker create-endpoint --endpoint-name '<provide your endpoint name>' --endpoint-config-name '<insert your endpoint config name>'
```

For the full syntax of the `create-endpoint` API, see [https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint.html](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint.html).
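
Endpoint creation is asynchronous, so inference requests only succeed once the endpoint status is `InService`. The following sketch polls for that status. It assumes `client` behaves like `boto3.client("sagemaker")`; the function name and the polling defaults are illustrative choices, not values from this guide.

```python
import time


def wait_until_in_service(client, endpoint_name, poll_seconds=30, max_attempts=60):
    """Poll describe_endpoint until the endpoint reports InService.

    `client` is expected to behave like boto3.client("sagemaker").
    """
    for _ in range(max_attempts):
        status = client.describe_endpoint(EndpointName=endpoint_name)["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} not InService after {max_attempts} checks")
```

Boto3 also ships a built-in waiter for this purpose (`client.get_waiter("endpoint_in_service")`), which may be preferable to a hand-written loop.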

# Deploy a Compiled Model Using the Console
<a name="neo-deployment-hosting-services-console"></a>

If the model was compiled using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must first satisfy the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites). Then follow the steps below to create and deploy a SageMaker AI Neo-compiled model using the SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

**Topics**
+ [Deploy the Model](#deploy-the-model-console-steps)

## Deploy the Model
<a name="deploy-the-model-console-steps"></a>

After you have satisfied the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the following steps to deploy a model compiled with Neo:

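1. Choose **Models**, and then choose **Create model** from the **Inference** group. On the **Create model** page, complete the **Model name**, **IAM role**, and, if needed, **VPC** (optional) fields.  
![\[Create Neo model for inference\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/create-pipeline-model.png)

1. To add information about the container used to deploy your model, choose **Add container**, and then choose **Next**. Complete the **Container input options**, **Location of inference code image**, and **Location of model artifacts** fields, and optionally the **Container host name** and **Environmental variables** fields.  
![\[Create Neo model for inference\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/neo-deploy-console-container-definition.png)

1. To deploy Neo-compiled models, choose the following:
   + **Container input options**: Choose **Provide model artifacts and inference image**.
   + **Location of inference code image**: Choose the inference image URI from [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html), depending on the AWS Region and the kind of application.
   + **Location of model artifact**: Enter the Amazon S3 bucket URI of the compiled model artifact generated by the Neo compilation API.
   + **Environment variables**:
     + Leave this field blank for **SageMaker XGBoost**.
     + If you trained your model using SageMaker AI, set the environment variable `SAGEMAKER_SUBMIT_DIRECTORY` to the Amazon S3 bucket URI that contains the training script.
     + If you did not train your model using SageMaker AI, specify the following environment variables:    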
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/neo-deployment-hosting-services-console.html)

1. Confirm that the information for the containers is accurate, and then choose **Create model**. On the **Create model landing page**, choose **Create endpoint**.  
![\[Create Model landing page\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/neo-deploy-console-create-model-land-page.png)

1. On the **Create and configure endpoint** page, specify the **Endpoint name**. For **Attach endpoint configuration**, choose **Create a new endpoint configuration**.  
![\[Neo console create and configure endpoint UI.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/neo-deploy-console-config-endpoint.png)

1. On the **New endpoint configuration** page, specify the **Endpoint configuration name**.  
![\[Neo console new endpoint configuration UI.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/neo-deploy-console-new-endpoint-config.png)

1. Choose **Edit** next to the name of the model, and then specify the correct **Instance type** on the **Edit Production Variant** page. The **Instance type** value must match the one specified in your compilation job.  
![\[Neo console new endpoint configuration UI.\]](http://docs.aws.amazon.com/zh_tw/sagemaker/latest/dg/images/neo-deploy-console-edit-production-variant.png)

1. Choose **Save**.

1. On the **New endpoint configuration** page, choose **Create endpoint configuration**, and then choose **Create endpoint**.
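
The requirement that the instance type match the compilation job can be checked before deployment. The sketch below assumes that compilation targets follow the `ml_*` naming pattern (for example `ml_c5`) and that an instance type such as `ml.c5.xlarge` maps to its target by joining the first two dot-separated parts. Both helper names are hypothetical, and the mapping is an assumption to verify against your own compilation job.

```python
def target_device_for(instance_type):
    """Map an instance type such as 'ml.c5.xlarge' to a target such as 'ml_c5'."""
    parts = instance_type.split(".")
    if len(parts) != 3 or parts[0] != "ml":
        raise ValueError(f"Unexpected instance type format: {instance_type!r}")
    return f"{parts[0]}_{parts[1]}"


def matches_compilation_target(instance_type, target_device):
    """Return True when the instance type belongs to the compiled target family."""
    return target_device_for(instance_type) == target_device


# Example: a model compiled for ml_c5 should be hosted on an ml.c5.* instance.
ok = matches_compilation_target("ml.c5.xlarge", "ml_c5")
mismatch = matches_compilation_target("ml.m5.large", "ml_c5")
```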

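# Inference Requests With a Deployed Service
<a name="neo-requests"></a>

If you have followed the instructions in [Deploy a Model](neo-deployment-hosting-services.md), you should have a SageMaker AI endpoint set up and running. Regardless of how you deployed your Neo-compiled model, there are three ways you can submit inference requests:

**Topics**
+ [Request Inferences from a Deployed Service (Amazon SageMaker SDK)](neo-requests-sdk.md)
+ [Request Inferences from a Deployed Service (Boto3)](neo-requests-boto3.md)
+ [Request Inferences from a Deployed Service (AWS CLI)](neo-requests-cli.md)

# Request Inferences from a Deployed Service (Amazon SageMaker SDK)
<a name="neo-requests-sdk"></a>

Use the following code examples to request inferences from your deployed service, based on the framework you used to train your model. The code examples for the different frameworks are similar. The main difference is that TensorFlow requires `application/json` as the content type.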

 
## PyTorch and MXNet
<a name="neo-requests-sdk-py-mxnet"></a>

If you are using **PyTorch v1.4 or later** or **MXNet 1.7.0 or later** and you have an Amazon SageMaker AI endpoint `InService`, you can make inference requests using the `predictor` package of the SageMaker AI SDK for Python.

**Note**  
The API varies based on the SageMaker AI SDK for Python version:  
For version 1.x, use the [`RealTimePredictor`](https://sagemaker.readthedocs.io/en/v1.72.0/api/inference/predictors.html#sagemaker.predictor.RealTimePredictor) and [`predict`](https://sagemaker.readthedocs.io/en/v1.72.0/api/inference/predictors.html#sagemaker.predictor.RealTimePredictor.predict) API.  
For version 2.x, use the [`Predictor`](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor) and [`predict`](https://sagemaker.readthedocs.io/en/stable/api/inference/predictors.html#sagemaker.predictor.Predictor.predict) API.

The following code examples show how to use these APIs to send an image for inference:

------
#### [ SageMaker Python SDK v1.x ]

```
from sagemaker.predictor import RealTimePredictor

endpoint = 'insert name of your endpoint here'

# Read image into memory
with open("image.jpg", 'rb') as f:
    payload = f.read()

predictor = RealTimePredictor(endpoint=endpoint, content_type='application/x-image')
inference_response = predictor.predict(data=payload)
print(inference_response)
```

------
#### [ SageMaker Python SDK v2.x ]

```
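from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer

endpoint = 'insert name of your endpoint here'

# Read image into memory
with open("image.jpg", 'rb') as f:
    payload = f.read()

# Send the raw image bytes as application/x-image, matching the v1.x example.
# Without an explicit serializer, Predictor sends application/octet-stream.
predictor = Predictor(endpoint, serializer=IdentitySerializer(content_type='application/x-image'))
inference_response = predictor.predict(data=payload)
print(inference_response)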
```

------

## TensorFlow
<a name="neo-requests-sdk-py-tf"></a>

The following code example shows how to use the SageMaker Python SDK API to send an image for inference:

```
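from sagemaker.predictor import Predictor
from sagemaker.serializers import IdentitySerializer
from PIL import Image
import numpy as np
import json

endpoint = 'insert the name of your endpoint here'
input_file = 'path/to/image'

# Read the image, resize it, and scale pixel values to roughly [-1, 1)
image = Image.open(input_file)
batch_size = 1
image = np.asarray(image.resize((224, 224)))
image = image / 128 - 1
# Add a leading batch dimension
image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
body = json.dumps({"instances": image.tolist()})

# The body is already a JSON string, so pass it through unchanged as application/json
predictor = Predictor(endpoint, serializer=IdentitySerializer(content_type='application/json'))
inference_response = predictor.predict(data=body)
print(inference_response)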
```

# Request Inferences from a Deployed Service (Boto3)
<a name="neo-requests-boto3"></a>

After you have a SageMaker endpoint `InService`, you can submit inference requests using the AWS SDK for Python (Boto3) SageMaker Runtime client and the [`invoke_endpoint()`](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker-runtime.html#SageMakerRuntime.Client.invoke_endpoint) API. The following code examples show how to send an image for inference:

------
#### [ PyTorch and MXNet ]

```
import boto3
import json

endpoint = 'insert name of your endpoint here'
image = 'path/to/image.jpg'  # path to the image file to send

runtime = boto3.Session().client('sagemaker-runtime')

# Read image into memory
with open(image, 'rb') as f:
    payload = f.read()
# Send image via InvokeEndpoint API
response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='application/x-image', Body=payload)

# Unpack response
result = json.loads(response['Body'].read().decode())
```

------
#### [ TensorFlow ]

For TensorFlow, submit an input with `application/json` as the content type.

```
from PIL import Image
import numpy as np
import json
import boto3

client = boto3.client('sagemaker-runtime')
input_file = 'path/to/image'

# Read the image, resize it, and scale pixel values to roughly [-1, 1)
image = Image.open(input_file)
batch_size = 1
image = np.asarray(image.resize((224, 224)))
image = image / 128 - 1
# Add a leading batch dimension and serialize the batch as JSON
image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
body = json.dumps({"instances": image.tolist()})

ioc_predictor_endpoint_name = 'insert name of your endpoint here'
content_type = 'application/json'
ioc_response = client.invoke_endpoint(
    EndpointName=ioc_predictor_endpoint_name,
    Body=body,
    ContentType=content_type
)
```

------
#### [ XGBoost ]

For an XGBoost application, submit CSV text instead:

```
import boto3
import json
 
endpoint = 'insert your endpoint name here'
 
runtime = boto3.Session().client('sagemaker-runtime')
 
csv_text = '1,-1.0,1.0,1.5,2.6'
# Send CSV text via InvokeEndpoint API
response = runtime.invoke_endpoint(EndpointName=endpoint, ContentType='text/csv', Body=csv_text)
# Unpack response
result = json.loads(response['Body'].read().decode())
```

------

Note that bring your own model (BYOM) allows for a custom content type. For more information, see [`InvokeEndpoint`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html).
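
The examples above all unpack the response the same way, by reading the `Body` stream and decoding it. A small sketch of that step follows. The helper name `parse_invoke_response` and the `predictions` payload are illustrative assumptions, and the fallback to plain text is a design choice for containers that return a custom content type.

```python
import io
import json


def parse_invoke_response(response):
    """Decode the streaming body returned by invoke_endpoint into Python objects."""
    raw = response["Body"].read()
    text = raw.decode("utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Custom containers may return non-JSON text; hand it back unchanged.
        return text


# Stand-in for a real response: boto3 returns a file-like object under "Body".
fake_response = {"Body": io.BytesIO(b'{"predictions": [[0.1, 0.9]]}')}
result = parse_invoke_response(fake_response)
```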

# Request Inferences from a Deployed Service (AWS CLI)
<a name="neo-requests-cli"></a>

After you have an Amazon SageMaker AI endpoint `InService`, you can make inference requests with the AWS Command Line Interface (AWS CLI) using [`sagemaker-runtime invoke-endpoint`](https://docs.aws.amazon.com/cli/latest/reference/sagemaker-runtime/invoke-endpoint.html). The following example shows how to send an image for inference:

```
aws sagemaker-runtime invoke-endpoint --endpoint-name 'insert name of your endpoint here' --body fileb://image.jpg --content-type=application/x-image output_file.txt
```

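If the inference succeeds, the command creates `output_file.txt`, which contains information about your inference request.

For TensorFlow, submit an input with `application/json` as the content type.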

```
aws sagemaker-runtime invoke-endpoint --endpoint-name 'insert name of your endpoint here' --body fileb://input.json --content-type=application/json output_file.txt
```
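
The command above expects an `input.json` file. Its exact schema is not given here, so the sketch below assumes it uses the same `{"instances": ...}` body as the TensorFlow SDK and Boto3 examples. The helper name is hypothetical, and the tiny batch stands in for a real preprocessed image.

```python
import json
import os
import tempfile


def write_instances_file(instances, path):
    """Write {"instances": ...} to `path`, mirroring the TensorFlow request body."""
    with open(path, "w") as f:
        json.dump({"instances": instances}, f)
    return path


# A tiny 1 x 2 x 2 x 3 batch standing in for a real 1 x 224 x 224 x 3 image tensor.
pixel = [0.0, -0.5, 0.5]
batch = [[[pixel, pixel], [pixel, pixel]]]
input_path = write_instances_file(batch, os.path.join(tempfile.mkdtemp(), "input.json"))
```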

# Inference Container Images
<a name="neo-deployment-hosting-services-container-images"></a>

SageMaker Neo now provides inference image URI information for `ml_*` targets. For more information, see [DescribeCompilationJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeCompilationJob.html#sagemaker-DescribeCompilationJob-response-InferenceImage).
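
As a sketch of how that lookup might be done from Python, the following reads the `InferenceImage` field of the `DescribeCompilationJob` response. It assumes `client` behaves like `boto3.client("sagemaker")`; the helper name is hypothetical.

```python
def get_inference_image(client, compilation_job_name):
    """Return the InferenceImage URI reported for a compilation job, or None if absent.

    `client` is expected to behave like boto3.client("sagemaker").
    """
    response = client.describe_compilation_job(CompilationJobName=compilation_job_name)
    return response.get("InferenceImage")
```

Returning `None` instead of raising keeps the caller free to fall back to the URI templates on this page when the field is not present.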

Based on your use case, replace the highlighted portion in the inference image URI template provided below with appropriate values.

## Amazon SageMaker AI XGBoost
<a name="inference-container-collapse-xgboost"></a>

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/xgboost-neo:latest
```

Replace *aws\_account\_id* with the value from the table at the end of this page that corresponds to the *aws\_region* you use.

## Keras
<a name="inference-container-collapse-keras"></a>

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-keras:fx_version-instance_type-py3
```

Replace *aws\_account\_id* with the value from the table at the end of this page that corresponds to the *aws\_region* you use.

Replace *fx\_version* with `2.2.4`.

Replace *instance\_type* with either `cpu` or `gpu`.

## MXNet
<a name="inference-container-collapse-mxnet"></a>

------
#### [ CPU or GPU instance types ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-inference-mxnet:fx_version-instance_type-py3
```

Replace *aws\_account\_id* with the value from the table at the end of this page that corresponds to the *aws\_region* you use.

Replace *fx\_version* with `1.8.0`.

Replace *instance\_type* with either `cpu` or `gpu`.

------
#### [ Inferentia1 ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-mxnet:fx_version-instance_type-py3
```

Replace *aws\_region* with either `us-east-1` or `us-west-2`.

Replace *aws\_account\_id* with the value from the table at the end of this page that corresponds to the *aws\_region* you use.

Replace *fx\_version* with `1.5.1`.

Replace *instance\_type* with `inf`.

------

## ONNX
<a name="inference-container-collapse-onnx"></a>

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-onnx:fx_version-instance_type-py3
```

Replace *aws\_account\_id* with the value from the table at the end of this page that corresponds to the *aws\_region* you use.

Replace *fx\_version* with `1.5.0`.

Replace *instance\_type* with either `cpu` or `gpu`.

## PyTorch
<a name="inference-container-collapse-pytorch"></a>

------
#### [ CPU or GPU instance types ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-inference-pytorch:fx_version-instance_type-py3
```

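Replace *aws\_account\_id* with the value from the table at the end of this page that corresponds to the *aws\_region* you use.

Replace *fx\_version* with `1.4`, `1.5`, `1.6`, `1.7`, `1.8`, `1.12`, `1.13`, or `2.0`.

Replace *instance\_type* with either `cpu` or `gpu`.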

------
#### [ Inferentia1 ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-pytorch:fx_version-instance_type-py3
```

Replace *aws\_region* with either `us-east-1` or `us-west-2`.

Replace *aws\_account\_id* with the value from the table at the end of this page that corresponds to the *aws\_region* you use.

Replace *fx\_version* with `1.5.1`.

Replace *instance\_type* with `inf`.

------
#### [ Inferentia2 and Trainium1 ]

```
763104351884.dkr.ecr.aws_region.amazonaws.com/pytorch-inference-neuronx:1.13.1-neuronx-py38-sdk2.10.0-ubuntu20.04
```

Replace *aws\_region* with `us-east-2` for Inferentia2 and with `us-east-1` for Trainium1.

------

## TensorFlow
<a name="inference-container-collapse-tf"></a>

------
#### [ CPU or GPU instance types ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-inference-tensorflow:fx_version-instance_type-py3
```

Replace *aws\_account\_id* with the value from the table at the end of this page that corresponds to the *aws\_region* you use.

Replace *fx\_version* with `1.15.3` or `2.9`.

Replace *instance\_type* with either `cpu` or `gpu`.

------
#### [ Inferentia1 ]

```
aws_account_id.dkr.ecr.aws_region.amazonaws.com/sagemaker-neo-tensorflow:fx_version-instance_type-py3
```

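Replace *aws\_account\_id* with the value from the table at the end of this page that corresponds to the *aws\_region* you use. Note that for instance type `inf`, only `us-east-1` and `us-west-2` are supported.

Replace *fx\_version* with `1.15.0`.

Replace *instance\_type* with `inf`.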

------
#### [ Inferentia2 and Trainium1 ]

```
763104351884.dkr.ecr.aws_region.amazonaws.com/tensorflow-inference-neuronx:2.10.1-neuronx-py38-sdk2.10.0-ubuntu20.04
```

Replace *aws\_region* with `us-east-2` for Inferentia2 and with `us-east-1` for Trainium1.

------

The following table maps *aws\_account\_id* to *aws\_region*. Use this table to find the correct inference image URI for your application.


| aws\_account\_id | aws\_region | 
| --- | --- | 
| 785573368785 | us-east-1 | 
| 007439368137 | us-east-2 | 
| 710691900526 | us-west-1 | 
| 301217895009 | us-west-2 | 
| 802834080501 | eu-west-1 | 
| 205493899709 | eu-west-2 | 
| 254080097072 | eu-west-3 | 
| 601324751636 | eu-north-1 | 
| 966458181534 | eu-south-1 | 
| 746233611703 | eu-central-1 | 
| 110948597952 | ap-east-1 | 
| 763008648453 | ap-south-1 | 
| 941853720454 | ap-northeast-1 | 
| 151534178276 | ap-northeast-2 | 
| 925152966179 | ap-northeast-3 | 
| 324986816169 | ap-southeast-1 | 
| 355873309152 | ap-southeast-2 | 
| 474822919863 | cn-northwest-1 | 
| 472730292857 | cn-north-1 | 
| 756306329178 | sa-east-1 | 
| 464438896020 | ca-central-1 | 
| 836785723513 | me-south-1 | 
| 774647643957 | af-south-1 | 
| 275950707576 | il-central-1 |
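
Filling in the `fx_version-instance_type-py3` templates can be sketched as follows. The helper name `build_neo_image_uri` is hypothetical, and the dictionary copies only a few rows of the table above. The XGBoost (`xgboost-neo:latest`) and `neuronx` images use different tag formats and are out of scope for this sketch.

```python
# Subset of the account table above; extend it with the other Regions as needed.
NEO_ACCOUNT_IDS = {
    "us-east-1": "785573368785",
    "us-east-2": "007439368137",
    "us-west-2": "301217895009",
    "eu-west-1": "802834080501",
}


def build_neo_image_uri(repository, region, fx_version, instance_type):
    """Fill in the aws_account_id.dkr.ecr.aws_region... template from this page."""
    try:
        account_id = NEO_ACCOUNT_IDS[region]
    except KeyError:
        raise ValueError(f"No account ID recorded for Region {region!r}") from None
    return (f"{account_id}.dkr.ecr.{region}.amazonaws.com/"
            f"{repository}:{fx_version}-{instance_type}-py3")


uri = build_neo_image_uri("sagemaker-inference-pytorch", "us-west-2", "1.8", "cpu")
print(uri)
```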