


# Deploy a compiled model using the SageMaker SDK
<a name="neo-deployment-hosting-services-sdk"></a>

If you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must satisfy the requirements in the [Prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Follow one of the following use cases to deploy a model compiled with SageMaker Neo, based on how you compiled your model.

**Topics**
+ [If you compiled your model using the SageMaker SDK](#neo-deployment-hosting-services-sdk-deploy-sm-sdk)
+ [If you compiled your model using MXNet or PyTorch](#neo-deployment-hosting-services-sdk-deploy-sm-boto3)
+ [If you compiled your model using Boto3, the SageMaker console, or the CLI for TensorFlow](#neo-deployment-hosting-services-sdk-deploy-sm-boto3-tensorflow)

## If you compiled your model using the SageMaker SDK
<a name="neo-deployment-hosting-services-sdk-deploy-sm-sdk"></a>

The [sagemaker.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html?highlight=sagemaker.Model) object handle for your compiled model supplies the [deploy()](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html?highlight=sagemaker.Model#sagemaker.model.Model.deploy) function, which enables you to create an endpoint to serve inference requests. The function lets you set the number and type of instances that are used for the endpoint. You must choose an instance for which you have compiled your model. For example, in the job compiled in the [Compile a Model (Amazon SageMaker SDK)](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation-sagemaker-sdk.html) section, this is `ml_c5`.

```
predictor = compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.c5.4xlarge')

# Print the name of the newly created endpoint
print(predictor.endpoint_name)
```
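
Once the endpoint is in service, you can send it a request. The snippet below is a minimal sketch that invokes the endpoint through the low-level SageMaker runtime client; the payload, its content type, and the `test_image.jpg` file name are placeholder assumptions that you would replace with input matching your compiled model.

```
import boto3

# Invoke the endpoint created by deploy() with the low-level runtime client
runtime = boto3.client('sagemaker-runtime')

# Hypothetical input file; replace with data in your model's expected format
with open('test_image.jpg', 'rb') as f:
    payload = f.read()

response = runtime.invoke_endpoint(
    EndpointName=predictor.endpoint_name,
    ContentType='application/x-image',  # assumption: adjust to your input format
    Body=payload,
)
print(response['Body'].read())
```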

## If you compiled your model using MXNet or PyTorch
<a name="neo-deployment-hosting-services-sdk-deploy-sm-boto3"></a>

Create a SageMaker AI model and deploy it using the deploy() API under the framework-specific Model APIs. For MXNet, it is [MXNetModel](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/sagemaker.mxnet.html?highlight=MXNetModel#mxnet-model), and for PyTorch, it is [PyTorchModel](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/sagemaker.pytorch.html?highlight=PyTorchModel#sagemaker.pytorch.model.PyTorchModel). When you create and deploy a SageMaker AI model, you must set the `MMS_DEFAULT_RESPONSE_TIMEOUT` environment variable to `500`, specify the `entry_point` parameter as the inference script (`inference.py`), and specify the `source_dir` parameter as the directory location of the inference script (`code`). To prepare the inference script (`inference.py`), follow the steps in the prerequisites; a sketch of the script's handler interface appears below.
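
As a rough guide, the serving containers look for handler functions with well-known names in `inference.py`. The sketch below assumes the `model_fn`/`transform_fn` hook pair, a TorchScript-style artifact named `model.pth`, and JSON I/O; these are illustrative assumptions, and your actual script from the prerequisites will differ.

```
# inference.py -- a minimal sketch of the handler hooks the serving container
# calls; the artifact name and JSON I/O below are illustrative assumptions
import json
import os

import torch


def model_fn(model_dir):
    """Load the model from the directory where SageMaker AI unpacks model_data."""
    model = torch.jit.load(os.path.join(model_dir, 'model.pth'))  # hypothetical artifact name
    model.eval()
    return model


def transform_fn(model, request_body, request_content_type, response_content_type):
    """Deserialize the request, run inference, and serialize the response."""
    data = torch.tensor(json.loads(request_body))  # assumption: JSON-encoded tensor
    with torch.no_grad():
        output = model(data)
    return json.dumps(output.tolist()), response_content_type
```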

The following examples show how to use these functions to deploy a compiled model with the SageMaker AI SDK for Python:

------
#### [ MXNet ]

```
from sagemaker.mxnet import MXNetModel

# Create SageMaker model and deploy an endpoint
sm_mxnet_compiled_model = MXNetModel(
    model_data='insert S3 path of compiled MXNet model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.8.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for MXNet',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_mxnet_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of the newly created endpoint
print(predictor.endpoint_name)
```

------
#### [ PyTorch 1.4 and Older ]

```
from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.4.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of the newly created endpoint
print(predictor.endpoint_name)
```

------
#### [ PyTorch 1.5 and Newer ]

```
from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.5',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
)

# Replace the example instance_type below with your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of the newly created endpoint
print(predictor.endpoint_name)
```

------

**Note**  
You must attach the `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies to the `AmazonSageMaker-ExecutionRole` IAM role.
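
If the role is missing either policy, one way to attach them is through the IAM API. The following is a minimal sketch using boto3 and assumes your credentials are allowed to modify the role.

```
import boto3

# Attach the two managed policies required by the note above to the execution role
iam = boto3.client('iam')
for policy_arn in (
    'arn:aws:iam::aws:policy/AmazonSageMakerFullAccess',
    'arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess',
):
    iam.attach_role_policy(RoleName='AmazonSageMaker-ExecutionRole', PolicyArn=policy_arn)
```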

## If you compiled your model using Boto3, the SageMaker console, or the CLI for TensorFlow
<a name="neo-deployment-hosting-services-sdk-deploy-sm-boto3-tensorflow"></a>

Construct a `TensorFlowModel` object, then call deploy:

```
from sagemaker.tensorflow import TensorFlowModel

role = 'AmazonSageMaker-ExecutionRole'
model_path = 'S3 path for model file'
framework_image = 'inference container ECR image URI'

tf_model = TensorFlowModel(model_data=model_path,
                           framework_version='1.15.3',
                           role=role,
                           image_uri=framework_image)

instance_type = 'ml.c5.xlarge'
predictor = tf_model.deploy(instance_type=instance_type,
                            initial_instance_count=1)
```
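
To sanity-check the endpoint, you can call the predictor's `predict()` method, which serializes the input into the TensorFlow Serving REST format. The random image below is a stand-in; the `(1, 224, 224, 3)` shape is an assumption that you should match to what your compiled model expects.

```
import numpy as np

# Hypothetical input: one 224x224 RGB image with values in [0, 1)
data = np.random.uniform(0, 1, size=(1, 224, 224, 3)).tolist()
result = predictor.predict(data)
print(result)
```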

For more information, see [Deploying directly from model artifacts](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#deploying-directly-from-model-artifacts).

You can select a Docker image Amazon ECR URI that meets your needs from [this list](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html).

For more information on how to construct a `TensorFlowModel` object, see the [SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#tensorflow-serving-model).

**Note**  
Your first inference request might have high latency if you deploy your model on a GPU. This is because an optimized compute kernel is created on the first inference request. We recommend that you make a warm-up file of inference requests and store it alongside your model file before you send it off to TFX. This is known as "warming up" the model.

The following code snippet demonstrates how to produce the warm-up file for the image classification example in the [Prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section:

```
import numpy as np
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2

with tf.python_io.TFRecordWriter("tf_serving_warmup_requests") as writer:
    # Build one representative input: a random 224x224 RGB image
    img = np.random.uniform(0, 1, size=[224, 224, 3]).astype(np.float32)
    img = np.expand_dims(img, axis=0)
    test_data = np.repeat(img, 1, axis=0)

    # Wrap the image in a PredictRequest addressed to the compiled model
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'compiled_models'
    request.model_spec.signature_name = 'serving_default'
    request.inputs['Placeholder:0'].CopyFrom(
        tf.compat.v1.make_tensor_proto(test_data, shape=test_data.shape, dtype=tf.float32))

    # Serialize the request as a PredictionLog record in the warm-up file
    log = prediction_log_pb2.PredictionLog(
        predict_log=prediction_log_pb2.PredictLog(request=request))
    writer.write(log.SerializeToString())
```
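
Per the TFX SavedModel Warmup documentation, TensorFlow Serving looks for the warm-up file at `assets.extra/tf_serving_warmup_requests` inside the model's version directory. A minimal sketch of placing it there follows; the `export/1` path is a hypothetical SavedModel version directory.

```
import os
import shutil

model_dir = 'export/1'  # hypothetical SavedModel version directory
extra_dir = os.path.join(model_dir, 'assets.extra')
os.makedirs(extra_dir, exist_ok=True)
shutil.move('tf_serving_warmup_requests',
            os.path.join(extra_dir, 'tf_serving_warmup_requests'))
```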

For more information on how to "warm up" your model, see the [TensorFlow TFX page](https://www.tensorflow.org/tfx/serving/saved_model_warmup).