


# Deploy Models
<a name="neo-deployment-hosting-services"></a>

To deploy an Amazon SageMaker Neo-compiled model to an HTTPS endpoint, you must configure and create the endpoint for the model using Amazon SageMaker AI hosting services. Currently, developers can use Amazon SageMaker APIs to deploy models onto ml.c5, ml.c4, ml.m5, ml.m4, ml.p3, ml.p2, and ml.inf1 instances.

For [Inferentia](https://aws.amazon.com/machine-learning/inferentia/) and [Trainium](https://aws.amazon.com/machine-learning/trainium/) instances, models must be compiled specifically for those instances. Models compiled for other instance types are not guaranteed to work with Inferentia or Trainium instances.

When you deploy a compiled model, you must use the same instance as the target you used for compilation. This creates a SageMaker AI endpoint that you can use to perform inferences. You can deploy a Neo-compiled model using any of the following: the [Amazon SageMaker SDK for Python](https://sagemaker.readthedocs.io/en/stable/), the [SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html), the [AWS Command Line Interface](https://docs.aws.amazon.com/cli/latest/reference/), and the [SageMaker AI console](https://console.aws.amazon.com/sagemaker).

**Note**  
To deploy your model using the AWS CLI, the console, or Boto3, see [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html) to select the inference image URI for your primary container.

**Topics**
+ [Prerequisites](neo-deployment-hosting-services-prerequisites.md)
+ [Deploy a Compiled Model Using the SageMaker SDK](neo-deployment-hosting-services-sdk.md)
+ [Deploy a Compiled Model Using Boto3](neo-deployment-hosting-services-boto3.md)
+ [Deploy a Compiled Model Using the AWS CLI](neo-deployment-hosting-services-cli.md)
+ [Deploy a Compiled Model Using the Console](neo-deployment-hosting-services-console.md)

# Prerequisites
<a name="neo-deployment-hosting-services-prerequisites"></a>

**Note**  
If you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the SageMaker AI console, follow the instructions in this section.

To create a SageMaker Neo-compiled model, you need the following:

1. A Docker image Amazon ECR URI. You can select one that meets your needs from [this list](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html).

1. An entry point script file:

   1. **For PyTorch and MXNet models:**

      *If you trained your model using SageMaker AI*, the training script must implement the functions described below. The training script serves as the entry point script during inference. In the example detailed in [MNIST Training, Compilation and Deployment with MXNet Module and SageMaker Neo](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_neo_compilation_jobs/mxnet_mnist/mxnet_mnist_neo.html), the training script (`mnist.py`) implements the required functions.

      *If you did not train your model using SageMaker AI*, you need to provide an entry point script (`inference.py`) file that can be used at inference time. Based on the framework, MXNet or PyTorch, the inference script location must conform to the SageMaker Python SDK [Model Directory Structure for MXNet](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/using_mxnet.html#model-directory-structure) or the [Model Directory Structure for PyTorch](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#model-directory-structure).

      When using Neo Inference Optimized Container images for **MXNet** and **PyTorch** on CPU and GPU instance types, the inference script must implement the following functions:
      + `model_fn`: Loads the model. (Optional)
      + `input_fn`: Converts the incoming request payload into a NumPy array.
      + `predict_fn`: Performs the prediction.
      + `output_fn`: Converts the prediction output into the response payload.
      + Alternatively, you can define `transform_fn` to combine `input_fn`, `predict_fn`, and `output_fn`.

      The following are examples of an `inference.py` script inside a directory named `code` (`code/inference.py`) for **PyTorch and MXNet (Gluon and Module)**. The examples first load the model and then serve it image data on a GPU:

------
#### [ MXNet Module ]

      ```
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      from collections import namedtuple
      
      Batch = namedtuple('Batch', ['data'])
      
      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()
      
      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          sym, arg_params, aux_params = mx.model.load_checkpoint('compiled', 0)
          mod = mx.mod.Module(symbol=sym, context=ctx, label_names=None)
          exe = mod.bind(for_training=False,
                         data_shapes=[('data', (1,3,224,224))],
                         label_shapes=mod._label_shapes)
          mod.set_params(arg_params, aux_params, allow_missing=True)
          
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          mod.forward(Batch([data]))
          return mod
      
      
      def transform_fn(mod, image, input_content_type, output_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                        mean=mx.nd.array([0.485, 0.456, 0.406]),
                                        std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
      
          # prediction/inference
          mod.forward(Batch([processed_input]))
      
          # post-processing
          prob = mod.get_outputs()[0].asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      ```

------
#### [ MXNet Gluon ]

      ```
      import numpy as np
      import json
      import mxnet as mx
      import neomx  # noqa: F401
      
      # Change the context to mx.cpu() if deploying to a CPU endpoint
      ctx = mx.gpu()
      
      def model_fn(model_dir):
          # The compiled model artifacts are saved with the prefix 'compiled'
          block = mx.gluon.nn.SymbolBlock.imports('compiled-symbol.json',['data'],'compiled-0000.params', ctx=ctx)
          
          # Hybridize the model & pass required options for Neo: static_alloc=True & static_shape=True
          block.hybridize(static_alloc=True, static_shape=True)
          
          # Run warm-up inference on empty data during model load (required for GPU)
          data = mx.nd.empty((1,3,224,224), ctx=ctx)
          warm_up = block(data)
          return block
      
      
      def input_fn(image, input_content_type):
          # pre-processing
          decoded = mx.image.imdecode(image)
          resized = mx.image.resize_short(decoded, 224)
          cropped, crop_info = mx.image.center_crop(resized, (224, 224))
          normalized = mx.image.color_normalize(cropped.astype(np.float32) / 255,
                                        mean=mx.nd.array([0.485, 0.456, 0.406]),
                                        std=mx.nd.array([0.229, 0.224, 0.225]))
          transposed = normalized.transpose((2, 0, 1))
          batchified = transposed.expand_dims(axis=0)
          casted = batchified.astype(dtype='float32')
          processed_input = casted.as_in_context(ctx)
          return processed_input
      
      
      def predict_fn(processed_input_data, block):
          # prediction/inference
          prediction = block(processed_input_data)
          return prediction
      
      def output_fn(prediction, output_content_type):
          # post-processing
          prob = prediction.asnumpy().tolist()
          prob_json = json.dumps(prob)
          return prob_json, output_content_type
      ```

------
#### [ PyTorch 1.4 and Older ]

      ```
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle
      
      
      def model_fn(model_dir):
          """Load the model and return it.
          Providing this function is optional.
          There is a default model_fn available which will load the model
          compiled using SageMaker Neo. You can override it here.
      
          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
      
          # The compiled model is saved as "compiled.pt"
          model_path = os.path.join(model_dir, 'compiled.pt')
          with torch.neo.config(model_dir=model_dir, neo_runtime=True):
              model = torch.jit.load(model_path)
              device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
              model = model.to(device)
      
          # We recommend that you run warm-up inference during model load
          sample_input_path = os.path.join(model_dir, 'sample_input.pkl')
          with open(sample_input_path, 'rb') as input_file:
              model_input = pickle.load(input_file)
          if torch.is_tensor(model_input):
              model_input = model_input.to(device)
              model(model_input)
          elif isinstance(model_input, tuple):
              model_input = (inp.to(device) for inp in model_input if torch.is_tensor(inp))
              model(*model_input)
          else:
              print("Only supports a torch tensor or a tuple of torch tensors")
      
          return model
      
      
      def transform_fn(model, request_body, request_content_type,
                       response_content_type):
          """Run prediction and return the output.
          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
              transforms.Resize(256),
              transforms.CenterCrop(224),
              transforms.ToTensor(),
              transforms.Normalize(
                  mean=[
                      0.485, 0.456, 0.406], std=[
                      0.229, 0.224, 0.225]),
          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)
          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)
      
          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      ```

------
#### [ PyTorch 1.5 and Newer ]

      ```
      import os
      import torch
      import torch.nn.parallel
      import torch.optim
      import torch.utils.data
      import torch.utils.data.distributed
      import torchvision.transforms as transforms
      from PIL import Image
      import io
      import json
      import pickle
      
      
      def model_fn(model_dir):
          """Load the model and return it.
          Providing this function is optional.
          There is a default_model_fn available, which will load the model
          compiled using SageMaker Neo. You can override the default here.
          The model_fn only needs to be defined if your model needs extra
          steps to load, and can otherwise be left undefined.
      
          Keyword arguments:
          model_dir -- the directory path where the model artifacts are present
          """
      
          # The compiled model is saved as "model.pt"
          model_path = os.path.join(model_dir, 'model.pt')
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          model = torch.jit.load(model_path, map_location=device)
          model = model.to(device)
      
          return model
      
      
      def transform_fn(model, request_body, request_content_type,
                          response_content_type):
          """Run prediction and return the output.
          The function
          1. Pre-processes the input request
          2. Runs prediction
          3. Post-processes the prediction output.
          """
          # preprocess
          decoded = Image.open(io.BytesIO(request_body))
          preprocess = transforms.Compose([
                                      transforms.Resize(256),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor(),
                                      transforms.Normalize(
                                          mean=[
                                              0.485, 0.456, 0.406], std=[
                                              0.229, 0.224, 0.225]),
                                          ])
          normalized = preprocess(decoded)
          batchified = normalized.unsqueeze(0)
          
          # predict
          device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
          batchified = batchified.to(device)
          output = model.forward(batchified)
          return json.dumps(output.cpu().numpy().tolist()), response_content_type
      ```

------

   1. **For inf1 instances or onnx, xgboost, and keras container images**

      For all other Neo Inference Optimized Container images, or Inferentia instance types, the entry point script must implement the following functions for the Neo Deep Learning Runtime:
      + `neo_preprocess`: Converts the incoming request payload into a NumPy array.
      + `neo_postprocess`: Converts the prediction output from the Neo Deep Learning Runtime into the response body.
**Note**  
Neither of the preceding two functions uses any functionality from MXNet, PyTorch, or TensorFlow.

      For examples of how to use these functions, see the [Neo Model Compilation Sample Notebooks](https://docs.aws.amazon.com//sagemaker/latest/dg/neo.html#neo-sample-notebooks).
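As an illustration only (the payload handling and content types below are assumptions for the sketch, not the exact contract of any specific Neo container), a minimal entry point for the Neo Deep Learning Runtime might look like this:

```python
import io
import json

import numpy as np


def neo_preprocess(payload, content_type):
    # Convert the incoming request payload into a NumPy array.
    # The supported content types here are illustrative assumptions.
    if content_type == 'application/json':
        return np.asarray(json.loads(payload), dtype=np.float32)
    if content_type == 'application/x-npy':
        return np.load(io.BytesIO(payload), allow_pickle=False)
    raise ValueError('Unsupported content type: {}'.format(content_type))


def neo_postprocess(result):
    # Convert the prediction output into a JSON response body.
    scores = np.asarray(result).tolist()
    return json.dumps(scores), 'application/json'
```

Consistent with the note above, neither function imports MXNet, PyTorch, or TensorFlow.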

   1. **For TensorFlow models**

      If your model requires custom pre- and post-processing logic before data is sent to the model, you must specify an entry point script `inference.py` file that can be used at inference time. The script should implement either a pair of `input_handler` and `output_handler` functions or a single handler function.
**Note**  
Note that if the handler function is implemented, `input_handler` and `output_handler` are ignored.

      The following is a code example of an `inference.py` script that you can put together with the compiled model to perform custom pre- and post-processing on an image classification model. The SageMaker AI client sends the image file as an `application/x-image` content type to the `input_handler` function, where it is converted to JSON. The converted image file is then sent to the [TensorFlow Model Server (TFX)](https://www.tensorflow.org/tfx/serving/api_rest) using the REST API.

      ```
      import json
      import numpy as np
      import json
      import io
      from PIL import Image
      
      def input_handler(data, context):
          """ Pre-process request input before it is sent to TensorFlow Serving REST API
          
          Args:
          data (obj): the request data, in format of dict or string
          context (Context): an object containing request and configuration details
          
          Returns:
          (dict): a JSON-serializable dict that contains request body and headers
          """
          f = data.read()
          f = io.BytesIO(f)
          image = Image.open(f).convert('RGB')
          batch_size = 1
          image = np.asarray(image.resize((512, 512)))
          image = np.concatenate([image[np.newaxis, :, :]] * batch_size)
          body = json.dumps({"signature_name": "serving_default", "instances": image.tolist()})
          return body
      
      def output_handler(data, context):
          """Post-process TensorFlow Serving output before it is returned to the client.
          
          Args:
          data (obj): the TensorFlow serving response
          context (Context): an object containing request and configuration details
          
          Returns:
          (bytes, string): data to return to client, response content type
          """
          if data.status_code != 200:
              raise ValueError(data.content.decode('utf-8'))
      
          response_content_type = context.accept_header
          prediction = data.content
          return prediction, response_content_type
      ```

      If there is no custom pre- or post-processing, the SageMaker AI client converts the file image to JSON in a similar way before sending it to the SageMaker AI endpoint.

      For more information, see [Deploying to TensorFlow Serving Endpoints in the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#providing-python-scripts-for-pre-pos-processing).

1. The Amazon S3 bucket URI that contains the compiled model artifacts.
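The layout of that archive matters: the compiled artifacts (and, where required, the `code/inference.py` entry point) must sit inside a `model.tar.gz` that you upload to Amazon S3. The following sketch packages local files into such an archive; the artifact file names are illustrative and depend on your framework:

```python
import os
import tarfile
import tempfile


def package_model(artifact_paths, output_path):
    # Bundle local artifact files into a model.tar.gz,
    # placing each file at the root of the archive.
    with tarfile.open(output_path, 'w:gz') as tar:
        for path in artifact_paths:
            tar.add(path, arcname=os.path.basename(path))
    return output_path


# Demonstration with placeholder files in a temporary directory
workdir = tempfile.mkdtemp()
artifacts = []
for name in ('compiled-symbol.json', 'compiled-0000.params'):  # illustrative names
    path = os.path.join(workdir, name)
    with open(path, 'w') as f:
        f.write('placeholder')
    artifacts.append(path)

archive = package_model(artifacts, os.path.join(workdir, 'model.tar.gz'))
with tarfile.open(archive) as tar:
    member_names = sorted(member.name for member in tar.getmembers())
```

You would then upload the archive to your bucket, for example with `aws s3 cp model.tar.gz s3://your-bucket/model.tar.gz`.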

# Deploy a Compiled Model Using the SageMaker SDK
<a name="neo-deployment-hosting-services-sdk"></a>

If you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must satisfy the [Prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Follow one of the following use cases to deploy a model compiled with SageMaker Neo, based on how you compiled your model.

**Topics**
+ [If you compiled your model using the SageMaker SDK](#neo-deployment-hosting-services-sdk-deploy-sm-sdk)
+ [If you compiled your model using MXNet or PyTorch](#neo-deployment-hosting-services-sdk-deploy-sm-boto3)
+ [If you compiled your TensorFlow model using Boto3, the SageMaker console, or the CLI](#neo-deployment-hosting-services-sdk-deploy-sm-boto3-tensorflow)

## If you compiled your model using the SageMaker SDK
<a name="neo-deployment-hosting-services-sdk-deploy-sm-sdk"></a>

The [sagemaker.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html?highlight=sagemaker.Model) object handle for the compiled model provides the [deploy()](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html?highlight=sagemaker.Model#sagemaker.model.Model.deploy) function, which lets you create an endpoint to serve inference requests. The function lets you set the number and type of instances used for the endpoint. You must choose an instance for which you compiled your model. For example, in the job compiled in the [Compile a Model (Amazon SageMaker SDK)](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-job-compilation-sagemaker-sdk.html) section, this is `ml_c5`.

```
predictor = compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.c5.4xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

## If you compiled your model using MXNet or PyTorch
<a name="neo-deployment-hosting-services-sdk-deploy-sm-boto3"></a>

Create the SageMaker AI model and deploy it using the deploy() API under the framework-specific Model APIs. For MXNet, it is [MXNetModel](https://sagemaker.readthedocs.io/en/stable/frameworks/mxnet/sagemaker.mxnet.html?highlight=MXNetModel#mxnet-model), and for PyTorch, it is [PyTorchModel](https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/sagemaker.pytorch.html?highlight=PyTorchModel#sagemaker.pytorch.model.PyTorchModel). When you create and deploy the SageMaker AI model, you must set the `MMS_DEFAULT_RESPONSE_TIMEOUT` environment variable to `500`, specify the `entry_point` parameter as the inference script (`inference.py`), and specify the `source_dir` parameter as the directory location of the inference script (`code`). To prepare the inference script (`inference.py`), follow the steps in the Prerequisites.

The following examples show how to use these functions to deploy a compiled model with the SageMaker AI SDK for Python:

------
#### [ MXNet ]

```
from sagemaker.mxnet import MXNetModel

# Create SageMaker model and deploy an endpoint
sm_mxnet_compiled_model = MXNetModel(
    model_data='insert S3 path of compiled MXNet model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.8.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for MXNet',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below to your preferred instance_type
predictor = sm_mxnet_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------
#### [ PyTorch 1.4 and Older ]

```
from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.4.0',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
    env={'MMS_DEFAULT_RESPONSE_TIMEOUT': '500'},
)

# Replace the example instance_type below to your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------
#### [ PyTorch 1.5 and Newer ]

```
from sagemaker.pytorch import PyTorchModel

# Create SageMaker model and deploy an endpoint
sm_pytorch_compiled_model = PyTorchModel(
    model_data='insert S3 path of compiled PyTorch model archive',
    role='AmazonSageMaker-ExecutionRole',
    entry_point='inference.py',
    source_dir='code',
    framework_version='1.5',
    py_version='py3',
    image_uri='insert appropriate ECR Image URI for PyTorch',
)

# Replace the example instance_type below to your preferred instance_type
predictor = sm_pytorch_compiled_model.deploy(initial_instance_count = 1, instance_type = 'ml.p3.2xlarge')

# Print the name of newly created endpoint
print(predictor.endpoint_name)
```

------

**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role.

## If you compiled your TensorFlow model using Boto3, the SageMaker console, or the CLI
<a name="neo-deployment-hosting-services-sdk-deploy-sm-boto3-tensorflow"></a>

Construct a `TensorFlowModel` object, then call deploy:

```
role='AmazonSageMaker-ExecutionRole'
model_path='S3 path for model file'
framework_image='inference container arn'
tf_model = TensorFlowModel(model_data=model_path,
                framework_version='1.15.3',
                role=role, 
                image_uri=framework_image)
instance_type='ml.c5.xlarge'
predictor = tf_model.deploy(instance_type=instance_type,
                    initial_instance_count=1)
```

For more information, see [Deploying directly from model artifacts](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/deploying_tensorflow_serving.html#deploying-directly-from-model-artifacts).

You can select a Docker image Amazon ECR URI that meets your needs from [this list](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html).

For more information on how to construct a `TensorFlowModel` object, see the [SageMaker SDK](https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/sagemaker.tensorflow.html#tensorflow-serving-model).

**Note**  
Your first inference request might have high latency if you deploy your model on a GPU. This is because an optimized compute kernel is built on the first inference request. We recommend that you make a warm-up file of inference requests and store it alongside your model file before sending it off to a TFX. This is known as "warming up" the model.

The following code snippet demonstrates how to produce the warm-up file for the image classification example in the [Prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section:

```
import tensorflow as tf
from tensorflow_serving.apis import classification_pb2
from tensorflow_serving.apis import inference_pb2
from tensorflow_serving.apis import model_pb2
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_log_pb2
from tensorflow_serving.apis import regression_pb2
import numpy as np

with tf.python_io.TFRecordWriter("tf_serving_warmup_requests") as writer:       
    img = np.random.uniform(0, 1, size=[224, 224, 3]).astype(np.float32)
    img = np.expand_dims(img, axis=0)
    test_data = np.repeat(img, 1, axis=0)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'compiled_models'
    request.model_spec.signature_name = 'serving_default'
    request.inputs['Placeholder:0'].CopyFrom(tf.compat.v1.make_tensor_proto(test_data, shape=test_data.shape, dtype=tf.float32))
    log = prediction_log_pb2.PredictionLog(
    predict_log=prediction_log_pb2.PredictLog(request=request))
    writer.write(log.SerializeToString())
```

For more information on how to "warm up" your model, see the [TensorFlow TFX page](https://www.tensorflow.org/tfx/serving/saved_model_warmup).

# Deploy a Compiled Model Using Boto3
<a name="neo-deployment-hosting-services-boto3"></a>

If you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must satisfy the [Prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Follow the steps below to create and deploy a SageMaker Neo-compiled model using the [AWS SDK for Python (Boto3)](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html).

**Topics**
+ [Deploy the Model](#neo-deployment-hosting-services-boto3-steps)

## Deploy the Model
<a name="neo-deployment-hosting-services-boto3-steps"></a>

After you have satisfied the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the `create_model`, `create_endpoint_config`, and `create_endpoint` APIs.

The following example shows how to use these APIs to deploy a model compiled with Neo:

```
import boto3
client = boto3.client('sagemaker')

# create sagemaker model
create_model_api_response = client.create_model(
                                    ModelName='my-sagemaker-model',
                                    PrimaryContainer={
                                        'Image': <insert the ECR Image URI>,
                                        'ModelDataUrl': 's3://path/to/model/artifact/model.tar.gz',
                                        'Environment': {}
                                    },
                                    ExecutionRoleArn='ARN for AmazonSageMaker-ExecutionRole'
                            )

print ("create_model API response", create_model_api_response)

# create sagemaker endpoint config
create_endpoint_config_api_response = client.create_endpoint_config(
                                            EndpointConfigName='sagemaker-neomxnet-endpoint-configuration',
                                            ProductionVariants=[
                                                {
                                                    'VariantName': <provide your variant name>,
                                                    'ModelName': 'my-sagemaker-model',
                                                    'InitialInstanceCount': 1,
                                                    'InstanceType': <provide your instance type here>
                                                },
                                            ]
                                       )

print ("create_endpoint_config API response", create_endpoint_config_api_response)

# create sagemaker endpoint
create_endpoint_api_response = client.create_endpoint(
                                    EndpointName='provide your endpoint name',
                                    EndpointConfigName=<insert your endpoint config name>,
                                )

print ("create_endpoint API response", create_endpoint_api_response)
```

**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role.

For the full syntax of the `create_model`, `create_endpoint_config`, and `create_endpoint` APIs, see [create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model), [create_endpoint_config](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config), and [create_endpoint](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint), respectively.

If you did not train your model using SageMaker AI, specify the following environment variables:

------
#### [ MXNet and PyTorch ]

```
"Environment": {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": "insert your region",
    "MMS_DEFAULT_RESPONSE_TIMEOUT": "500"
}
```

------
#### [ TensorFlow ]

```
"Environment": {
    "SAGEMAKER_PROGRAM": "inference.py",
    "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": "insert your region"
}
```

------

If you trained your model using SageMaker AI, specify the `SAGEMAKER_SUBMIT_DIRECTORY` environment variable as the full Amazon S3 bucket URI that contains the training script.

# Deploy a Compiled Model Using the AWS CLI
<a name="neo-deployment-hosting-services-cli"></a>

If you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must satisfy the [Prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Follow the steps below to create and deploy a SageMaker Neo-compiled model using the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/).

**Topics**
+ [Deploy the Model](#neo-deploy-cli)

## Deploy the Model
<a name="neo-deploy-cli"></a>

After you have satisfied the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the `create-model`, `create-endpoint-config`, and `create-endpoint` AWS CLI commands. The following steps explain how to deploy a model compiled with Neo using these commands:



### Create a Model
<a name="neo-deployment-hosting-services-cli-create-model"></a>

Select an inference image URI from the [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html), and then use the `create-model` API to create a SageMaker AI model. To do this, complete two steps:

1. Create a `create_model.json` file. Within the file, specify the name of the model, the image URI, the path to the `model.tar.gz` file in your Amazon S3 bucket, and your SageMaker AI execution role:

   ```
   {
       "ModelName": "insert model name",
       "PrimaryContainer": {
           "Image": "insert the ECR Image URI",
           "ModelDataUrl": "insert S3 archive URL",
           "Environment": {"See details below"}
       },
       "ExecutionRoleArn": "ARN for AmazonSageMaker-ExecutionRole"
   }
   ```

   If you trained your model using SageMaker AI, specify the following environment variable:

   ```
   "Environment": {
       "SAGEMAKER_SUBMIT_DIRECTORY" : "[Full S3 path for *.tar.gz file containing the training script]"
   }
   ```

   If you did not train your model using SageMaker AI, specify the following environment variables:

------
#### [ MXNet and PyTorch ]

   ```
   "Environment": {
       "SAGEMAKER_PROGRAM": "inference.py",
       "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
       "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
       "SAGEMAKER_REGION": "insert your region",
       "MMS_DEFAULT_RESPONSE_TIMEOUT": "500"
   }
   ```

------
#### [ TensorFlow ]

   ```
   "Environment": {
       "SAGEMAKER_PROGRAM": "inference.py",
       "SAGEMAKER_SUBMIT_DIRECTORY": "/opt/ml/model/code",
       "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
       "SAGEMAKER_REGION": "insert your region"
   }
   ```

------
**Note**  
The `AmazonSageMakerFullAccess` and `AmazonS3ReadOnlyAccess` policies must be attached to the `AmazonSageMaker-ExecutionRole` IAM role.

1. Run the following command:

   ```
   aws sagemaker create-model --cli-input-json file://create_model.json
   ```

   For the full syntax of the `create-model` API, see [https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-model.html](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-model.html).

### Create an Endpoint Configuration
<a name="neo-deployment-hosting-services-cli-create-endpoint-config"></a>

After you create the SageMaker AI model, create the endpoint configuration using the `create-endpoint-config` API. To do this, create a JSON file with your endpoint configuration specifications. For example, you can use the following code template and save it as `create_config.json`:

```
{
    "EndpointConfigName": "<provide your endpoint config name>",
    "ProductionVariants": [
        {
            "VariantName": "<provide your variant name>",
            "ModelName": "my-sagemaker-model",
            "InitialInstanceCount": 1,
            "InstanceType": "<provide your instance type here>",
            "InitialVariantWeight": 1.0
        }
    ]
}
```
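If you prefer to generate the configuration file programmatically, the template above can be written out from a small script; the names and the instance type below are placeholders:

```python
import json

endpoint_config = {
    "EndpointConfigName": "my-endpoint-config",   # placeholder name
    "ProductionVariants": [
        {
            "VariantName": "variant-1",           # placeholder name
            "ModelName": "my-sagemaker-model",
            "InitialInstanceCount": 1,
            "InstanceType": "ml.c5.4xlarge",      # must match your compilation target
            "InitialVariantWeight": 1.0,
        }
    ],
}

# Write the configuration to the file used by the CLI command below
with open("create_config.json", "w") as f:
    json.dump(endpoint_config, f, indent=4)
```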

Now run the following AWS CLI command to create your endpoint configuration:

```
aws sagemaker create-endpoint-config --cli-input-json file://create_config.json
```

For the full syntax of the `create-endpoint-config` API, see [https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint-config.html](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint-config.html).

### Create an Endpoint
<a name="neo-deployment-hosting-services-cli-create-endpoint"></a>

After you have created your endpoint configuration, create an endpoint using the `create-endpoint` API:

```
aws sagemaker create-endpoint --endpoint-name '<provide your endpoint name>' --endpoint-config-name '<insert your endpoint config name>'
```

For the full syntax of the `create-endpoint` API, see [https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint.html](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-endpoint.html).

# Deploy a Compiled Model Using the Console
<a name="neo-deployment-hosting-services-console"></a>

If you compiled your model using the AWS SDK for Python (Boto3), the AWS CLI, or the Amazon SageMaker AI console, you must satisfy the [Prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites) section. Follow the steps below to create and deploy a SageMaker Neo-compiled model using the [SageMaker AI console](https://console.aws.amazon.com/sagemaker/).

**Topics**
+ [Deploy the Model](#deploy-the-model-console-steps)

## Deploy the Model
<a name="deploy-the-model-console-steps"></a>

After you have satisfied the [prerequisites](https://docs.aws.amazon.com//sagemaker/latest/dg/neo-deployment-hosting-services-prerequisites), use the following steps to deploy a model compiled with Neo:

1. Choose **Models**, and then choose **Create models** from the **Inference** group. On the **Create model** page, complete the **Model name**, **IAM role**, and, if needed, **VPC** fields (optional).  
![\[Create a Neo model for inference\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/images/create-pipeline-model.png)

1. To add information about the container used to deploy your model, choose **Add container**, then choose **Next**. Complete the **Container input options**, **Location of inference code image**, and **Location of model artifacts**, and optionally, the **Container host name** and **Environmental variables** fields.  
![\[Create a Neo model for inference\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/images/neo-deploy-console-container-definition.png)

1. To deploy a Neo-compiled model, choose the following:
   + **Container input options**: Choose **Provide model artifacts and inference image**.
   + **Location of inference code image**: Choose the inference image URI from [Neo Inference Container Images](https://docs.aws.amazon.com/sagemaker/latest/dg/neo-deployment-hosting-services-container-images.html), depending on the AWS Region and kind of application.
   + **Location of model artifacts**: Enter the Amazon S3 bucket URI of the compiled model artifacts generated by the Neo compilation API.
   + **Environmental variables**:
     + Leave this field blank for **SageMaker XGBoost**.
     + If you trained your model using SageMaker AI, specify the environment variable `SAGEMAKER_SUBMIT_DIRECTORY` as the Amazon S3 bucket URI that contains the training script.
     + If you did not train your model using SageMaker AI, specify the following environment variables:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/neo-deployment-hosting-services-console.html)

1. Confirm that the information for the container is accurate, and then choose **Create model**. On the **Create model landing page**, choose **Create endpoint**.  
![\[Create model landing page\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/images/neo-deploy-console-create-model-land-page.png)

1. In the **Create and configure endpoint** diagram, specify the **Endpoint name**. For **Attach endpoint configuration**, choose **Create a new endpoint configuration**.  
![\[Create and configure endpoint UI from the Neo console.\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/images/neo-deploy-console-config-endpoint.png)

1. On the **New endpoint configuration** page, specify the **Endpoint configuration name**.  
![\[New endpoint configuration UI from the Neo console.\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/images/neo-deploy-console-new-endpoint-config.png)

1. Choose **Edit** next to the name of the model, and then specify the correct **Instance type** on the **Edit Production Variant** page. The **Instance type** value must match the one specified in your compilation job.  
![\[New endpoint configuration UI from the Neo console.\]](http://docs.aws.amazon.com/zh_cn/sagemaker/latest/dg/images/neo-deploy-console-edit-production-variant.png)

1. Choose **Save**.

1. On the **New endpoint configuration** page, choose **Create endpoint configuration**, and then choose **Create endpoint**.