# Deploy preprocessing logic into an ML model in a single endpoint using an inference pipeline in Amazon SageMaker

*Mohan Gowda Purushothama, Gabriel Rodriguez Garcia, and Mateusz Zaremba, Amazon Web Services*

## Summary

This pattern explains how to deploy multiple pipeline model objects in a single endpoint by using an [inference pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipelines.html) in Amazon SageMaker. The pipeline model objects represent different machine learning (ML) workflow stages, such as preprocessing, model inference, and postprocessing. To illustrate the deployment of serially connected pipeline model objects, this pattern shows you how to deploy a preprocessing [Scikit-learn](https://docs.aws.amazon.com/sagemaker/latest/dg/sklearn.html) container and a regression model based on the [linear learner algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/linear-learner.html) built into SageMaker. The deployment is hosted behind a single endpoint in SageMaker.

**Note**
The deployment in this pattern uses the ml.m4.2xlarge instance type. We recommend using an instance type that matches your data size and the complexity of your workflow. For more information, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/). This pattern uses [prebuilt Docker images for Scikit-learn](https://docs.aws.amazon.com/sagemaker/latest/dg/pre-built-docker-containers-scikit-learn-spark.html), but you can use your own Docker containers and integrate them into your workflow.

## Prerequisites and limitations

**Prerequisites**

+ An active AWS account
+ [Python 3.9](https://www.python.org/downloads/release/python-390/)
+ [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/) and [Boto3 library](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)
+ An AWS Identity and Access Management (IAM) [role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) with basic SageMaker [permissions](https://docs.aws.amazon.com/sagemaker/latest/dg/api-permissions-reference.html) and Amazon Simple Storage Service (Amazon S3) [permissions](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-policy-language-overview.html)

**Product versions**

+ [Amazon SageMaker Python SDK 2.49.2](https://sagemaker.readthedocs.io/en/v2.49.2/)

## Architecture

**Target technology stack**

+ Amazon Elastic Container Registry (Amazon ECR)
+ Amazon SageMaker
+ Amazon SageMaker Studio
+ Amazon Simple Storage Service (Amazon S3)
+ [Real-time inference](https://docs.aws.amazon.com/sagemaker/latest/dg/realtime-endpoints.html) endpoint for Amazon SageMaker

**Target architecture**

The following diagram shows the architecture for deploying an Amazon SageMaker pipeline model object.

![Architecture for deploying a SageMaker pipeline model object](http://docs.aws.amazon.com/zh_cn/prescriptive-guidance/latest/patterns/images/pattern-img/1105d51b-752f-46d7-962c-acef1fb3399f/images/12f06715-b1c2-4de0-b277-99ce87308152.png)

The diagram shows the following workflow:

1. A SageMaker notebook deploys a pipeline model.
1. An S3 bucket stores the model artifacts.
1. Amazon ECR gets the source container images from the S3 bucket.

## Tools

**AWS tools**

+ [Amazon Elastic Container Registry (Amazon ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html) is a managed container image registry service that is secure, scalable, and reliable.
+ [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) is a managed ML service that helps you build and train ML models and then deploy them into a production-ready hosted environment.
+ [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio.html) is a web-based integrated development environment (IDE) for ML that lets you build, train, debug, deploy, and monitor your ML models.
+ [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) is a cloud-based object storage service that helps you store, protect, and retrieve any amount of data.

**Code**

The code for this pattern is available in the GitHub [Inference Pipeline with Scikit-learn and Linear Learner](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.ipynb) repository.

## Epics

### Prepare the dataset

| Task | Description | Skills required |
| --- | --- | --- |
| Prepare the dataset for your regression task. | [Open a notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/notebooks-create-open.html#notebooks-open) in Amazon SageMaker Studio.
To import all necessary libraries and initialize your working environment, use the following example code in the notebook:

import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

# Get a SageMaker-compatible role used by this Notebook Instance.
role = get_execution_role()

# S3 prefix
bucket = sagemaker_session.default_bucket()
prefix = "Scikit-LinearLearner-pipeline-abalone-example"

To download a sample dataset, add the following code to your notebook:

! mkdir abalone_data
! aws s3 cp s3://sagemaker-sample-files/datasets/tabular/uci_abalone/abalone.csv ./abalone_data

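The example in this pattern uses the [Abalone Data Set](https://archive.ics.uci.edu/ml/datasets/abalone) from the UCI Machine Learning Repository. | Data scientist |
| Upload the dataset to an S3 bucket. | In the notebook where you prepared the dataset, add the following code to upload the sample data to an S3 bucket: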

WORK_DIRECTORY = "abalone_data"

train_input = sagemaker_session.upload_data(
    path="{}/{}".format(WORK_DIRECTORY, "abalone.csv"),
    bucket=bucket,
    key_prefix="{}/{}".format(prefix, "train"),
)

| Data scientist |

### Create the data preprocessor using SKLearn

| Task | Description | Skills required |
| --- | --- | --- |
| Prepare the preprocessor.py script. | [See the AWS documentation website for more details](http://docs.aws.amazon.com/zh_cn/prescriptive-guidance/latest/patterns/deploy-preprocessing-logic-into-an-ml-model-in-a-single-endpoint-using-an-inference-pipeline-in-amazon-sagemaker.html) | Data scientist |
| Create the SKLearn preprocessor object. | To create an SKLearn preprocessor object (called an SKLearn Estimator) that you can incorporate into your final inference pipeline, run the following code in your SageMaker notebook:

from sagemaker.sklearn.estimator import SKLearn

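# Scikit-learn container version used for the preprocessor.
FRAMEWORK_VERSION = "0.23-1"
# Preprocessing script from the previous task; adjust the file name to match your script.
script_path = "sklearn_abalone_featurizer.py"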

sklearn_preprocessor = SKLearn(
    entry_point=script_path,
    role=role,
    framework_version=FRAMEWORK_VERSION,
    instance_type="ml.c4.xlarge",
    sagemaker_session=sagemaker_session,
)
sklearn_preprocessor.fit({"train": train_input})

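| Data scientist |
| Test the preprocessor's inference. | To confirm that your preprocessor is defined correctly, launch a [batch transform job](https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform.html) by entering the following code in your SageMaker notebook: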

# Define a SKLearn Transformer from the trained SKLearn Estimator
transformer = sklearn_preprocessor.transformer(
    instance_count=1, instance_type="ml.m5.xlarge", assemble_with="Line", accept="text/csv"
)


# Preprocess training input
transformer.transform(train_input, content_type="text/csv")
print("Waiting for transform job: " + transformer.latest_transform_job.job_name)
transformer.wait()
preprocessed_train = transformer.output_path

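| |

### Create the machine learning model

| Task | Description | Skills required |
| --- | --- | --- |
| Create a model object. | To create a model object based on the linear learner algorithm, enter the following code in your SageMaker notebook: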

import boto3
from sagemaker.image_uris import retrieve

ll_image = retrieve("linear-learner", boto3.Session().region_name)
s3_ll_output_key_prefix = "ll_training_output"
s3_ll_output_location = "s3://{}/{}/{}/{}".format(
    bucket, prefix, s3_ll_output_key_prefix, "ll_model"
)

ll_estimator = sagemaker.estimator.Estimator(
    ll_image,
    role,
    instance_count=1,
    instance_type="ml.m4.2xlarge",
    volume_size=20,
    max_run=3600,
    input_mode="File",
    output_path=s3_ll_output_location,
    sagemaker_session=sagemaker_session,
)

ll_estimator.set_hyperparameters(feature_dim=10, predictor_type="regressor", mini_batch_size=32)

ll_train_data = sagemaker.inputs.TrainingInput(
    preprocessed_train,
    distribution="FullyReplicated",
    content_type="text/csv",
    s3_data_type="S3Prefix",
)

data_channels = {"train": ll_train_data}
ll_estimator.fit(inputs=data_channels, logs=True)

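The preceding code retrieves the relevant Amazon ECR Docker image for the model from the public Amazon ECR registry, creates an estimator object, and then uses that object to train the regression model. | Data scientist |

### Deploy the final pipeline

| Task | Description | Skills required |
| --- | --- | --- |
| Deploy the pipeline model. | To create a pipeline model object that chains the preprocessor and the linear learner model, and to deploy that object to a single endpoint, enter the following code in your SageMaker notebook: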

from sagemaker.model import Model
from sagemaker.pipeline import PipelineModel
import boto3
from time import gmtime, strftime

timestamp_prefix = strftime("%Y-%m-%d-%H-%M-%S", gmtime())

# Wrap each trained estimator as a deployable model, in pipeline order.
scikit_learn_inference_model = sklearn_preprocessor.create_model()
linear_learner_model = ll_estimator.create_model()

model_name = "inference-pipeline-" + timestamp_prefix
endpoint_name = "inference-pipeline-ep-" + timestamp_prefix
sm_model = PipelineModel(
    name=model_name, role=role, models=[scikit_learn_inference_model, linear_learner_model]
)

sm_model.deploy(initial_instance_count=1, instance_type="ml.c4.xlarge", endpoint_name=endpoint_name)

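You can adjust the instance type used in the model object to meet your needs. | Data scientist |
| Test the inference. | To confirm that the endpoint is working correctly, run the following sample inference code in your SageMaker notebook: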

from sagemaker.predictor import Predictor
from sagemaker.serializers import CSVSerializer

payload = "M, 0.44, 0.365, 0.125, 0.516, 0.2155, 0.114, 0.155"
actual_rings = 10
predictor = Predictor(
    endpoint_name=endpoint_name, sagemaker_session=sagemaker_session, serializer=CSVSerializer()
)

print(predictor.predict(payload))

| Data scientist |

## Related resources

+ [Preprocess input data before making predictions using Amazon SageMaker inference pipelines and Scikit-learn](https://aws.amazon.com/blogs/machine-learning/preprocess-input-data-before-making-predictions-using-amazon-sagemaker-inference-pipelines-and-scikit-learn/) (AWS Machine Learning Blog)
+ [End to end Machine Learning with Amazon SageMaker](https://github.com/aws-samples/amazon-sagemaker-build-train-deploy) (GitHub)
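
As a follow-up to the inference test in the final epic: the `Predictor` there has no deserializer, so `predict` returns the raw response body, and the `actual_rings` value is never compared against it. The sketch below shows one way to extract the predicted value and compare it with the known label. It assumes the endpoint returns the linear learner's JSON response shape, `{"predictions": [{"score": ...}]}`. The helper name `extract_scores` and the sample response body are hypothetical and stand in for a live endpoint call.

```python
import json


def extract_scores(response_body):
    """Return the predicted scores from a linear learner JSON response body."""
    # The endpoint returns bytes when the Predictor has no deserializer.
    if isinstance(response_body, (bytes, bytearray)):
        response_body = response_body.decode("utf-8")
    parsed = json.loads(response_body)
    # Assumed response shape: {"predictions": [{"score": <float>}, ...]}
    return [float(item["score"]) for item in parsed["predictions"]]


# Hypothetical response body, used here in place of predictor.predict(payload).
sample_response = b'{"predictions": [{"score": 9.5}]}'
actual_rings = 10

predicted_rings = extract_scores(sample_response)[0]
absolute_error = abs(predicted_rings - actual_rings)
print(f"predicted={predicted_rings:.2f} actual={actual_rings} abs_error={absolute_error:.2f}")
```

In a notebook, you would pass the return value of `predictor.predict(payload)` to `extract_scores` instead of the sample bytes. If your endpoint is configured to return a different content type, adjust the parsing accordingly.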