

# Create a custom Docker container image for SageMaker and use it for model training in AWS Step Functions
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions"></a>

*Julia Bluszcz, Aubrey Oosthuizen, Mohan Gowda Purushothama, Neha Sharma, and Mateusz Zaremba, Amazon Web Services*

## Summary
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-summary"></a>

This pattern shows how to create a Docker container image for [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) and use it for model training in [AWS Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html). By packaging custom algorithms in a container, you can run almost any code in the SageMaker environment, regardless of programming language, framework, or dependencies.

In the example [SageMaker notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html) provided, the custom Docker container image is stored in [Amazon Elastic Container Registry (Amazon ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html). Step Functions then uses the container that’s stored in Amazon ECR to run a Python processing script for SageMaker. Finally, the container exports the trained model to [Amazon Simple Storage Service (Amazon S3)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html).

## Prerequisites and limitations
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-prereqs"></a>

**Prerequisites**
+ An active AWS account
+ An [AWS Identity and Access Management (IAM) role for SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) with Amazon S3 permissions
+ An [IAM role for Step Functions](https://sagemaker-examples.readthedocs.io/en/latest/step-functions-data-science-sdk/step_functions_mlworkflow_processing/step_functions_mlworkflow_scikit_learn_data_processing_and_model_evaluation.html#Create-an-Execution-Role-for-Step-Functions)
+ Familiarity with Python
+ Familiarity with the Amazon SageMaker Python SDK
+ Familiarity with the AWS Command Line Interface (AWS CLI)
+ Familiarity with AWS SDK for Python (Boto3)
+ Familiarity with Amazon ECR
+ Familiarity with Docker

**Product versions**
+ AWS Step Functions Data Science SDK version 2.3.0
+ Amazon SageMaker Python SDK version 2.78.0

## Architecture
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-architecture"></a>

The following diagram shows an example workflow for creating a Docker container image for SageMaker and then using it for model training in Step Functions:

![\[Workflow to create Docker container image for SageMaker to use as a Step Functions training model.\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/images/pattern-img/7857d57f-3077-4b06-8971-fb5846387693/images/37755e38-0bc4-4dd0-90c7-135d95b00053.png)


The diagram shows the following workflow:

1. A data scientist or DevOps engineer uses an Amazon SageMaker notebook to create a custom Docker container image.

1. A data scientist or DevOps engineer stores the Docker container image in an Amazon ECR private repository that’s in a private registry.

1. A data scientist or DevOps engineer uses the Docker container to run a Python SageMaker processing job in a Step Functions workflow.
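
What links these three steps is the image URI, the address of the container image in the private registry. The following is a minimal sketch of how that URI is composed; the helper `ecr_image_uri` and the account ID, Region, and repository name are illustrative placeholders (in the example notebook, the account ID and Region come from `boto3`):

```python
# Hypothetical helper that composes the fully qualified URI of an image
# in a private Amazon ECR registry. The account ID, Region, repository
# name, and the helper itself are placeholders for illustration; in the
# example notebook these values come from boto3/STS.
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Return the URI used to push the image and to reference it later."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"

# The same URI is used when pushing the image (step 2) and as the
# image_uri of the Step Functions processing step (step 3).
print(ecr_image_uri("111122223333", "us-east-1", "byoc"))
```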

**Automation and scale**

The example SageMaker notebook in this pattern uses an `ml.m5.xlarge` notebook instance type. You can change the instance type to fit your use case. For more information about SageMaker notebook instance types, see [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).

## Tools
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-tools"></a>
+ [Amazon Elastic Container Registry (Amazon ECR)](https://docs.aws.amazon.com/AmazonECR/latest/userguide/what-is-ecr.html) is a managed container image registry service that’s secure, scalable, and reliable.
+ [Amazon SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html) is a managed machine learning (ML) service that helps you build and train ML models and then deploy them into a production-ready hosted environment.
+ [Amazon SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) is an open source library for training and deploying machine-learning models on SageMaker.
+ [AWS Step Functions](https://docs.aws.amazon.com/step-functions/latest/dg/welcome.html) is a serverless orchestration service that helps you combine AWS Lambda functions and other AWS services to build business-critical applications.
+ [AWS Step Functions Data Science Python SDK](https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/index.html) is an open source library that helps you create Step Functions workflows that process and publish machine learning models.

## Epics
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-epics"></a>

### Create a custom Docker container image and store it in Amazon ECR
<a name="create-a-custom-docker-container-image-and-store-it-in-amazon-ecr"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Set up Amazon ECR. | If you haven’t already, set up Amazon ECR by following the instructions in [Setting up with Amazon ECR](https://docs.aws.amazon.com/AmazonECR/latest/userguide/get-set-up-for-amazon-ecr.html) in the *Amazon ECR User Guide*. Each AWS account is provided with a default private Amazon ECR registry. | DevOps engineer | 
| Create an Amazon ECR private repository. | Follow the instructions in [Creating a private repository](https://docs.aws.amazon.com/AmazonECR/latest/userguide/repository-create.html) in the *Amazon ECR User Guide*. The repository that you create is where you’ll store your custom Docker container images. | DevOps engineer | 
| Create a Dockerfile that includes the specifications needed to run your SageMaker processing job.  | Create a Dockerfile that includes the specifications needed to run your SageMaker processing job. For instructions, see [Adapting your own training container](https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html) in the *Amazon SageMaker Developer Guide*. For more information about Dockerfiles, see the [Dockerfile reference](https://docs.docker.com/engine/reference/builder/) in the Docker documentation. **Example Jupyter notebook code cells to create a Dockerfile** *Cell 1*<pre># Make docker folder<br />!mkdir -p docker</pre>*Cell 2*<pre>%%writefile docker/Dockerfile<br /><br />FROM python:3.7-slim-buster<br /><br />RUN pip3 install pandas==0.25.3 scikit-learn==0.21.3<br />ENV PYTHONUNBUFFERED=TRUE<br /><br />ENTRYPOINT ["python3"]</pre> | DevOps engineer | 
| Build your Docker container image and push it to Amazon ECR. | [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/prescriptive-guidance/latest/patterns/create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions.html) For more information, see [Building and registering the container](https://sagemaker-examples.readthedocs.io/en/latest/advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.html#Building-and-registering-the-container) in *Building your own algorithm container* on GitHub. **Example Jupyter notebook code cells to build and register a Docker image** Before running the following cells, make sure that you’ve created a Dockerfile and stored it in the directory called `docker`. Also, make sure that you’ve created an Amazon ECR repository, and that you replace the `ecr_repository` value in the first cell with your repository’s name. *Cell 1*<pre>import boto3<br />tag = ':latest'<br />account_id = boto3.client('sts').get_caller_identity().get('Account')<br />region = boto3.Session().region_name<br />ecr_repository = 'byoc'<br /><br />image_uri = '{}.dkr.ecr.{}.amazonaws.com/{}'.format(account_id, region, ecr_repository + tag)</pre>*Cell 2*<pre># Build docker image<br />!docker build -t $image_uri docker</pre>*Cell 3*<pre># Authenticate to ECR<br />!aws ecr get-login-password --region {region} | docker login --username AWS --password-stdin {account_id}.dkr.ecr.{region}.amazonaws.com</pre>*Cell 4*<pre># Push docker image<br />!docker push $image_uri</pre>You must [authenticate your Docker client to your private registry](https://docs.aws.amazon.com/AmazonECR/latest/userguide/registry_auth.html) so that you can use the `docker push` and `docker pull` commands. These commands push and pull images to and from the repositories in your registry. | DevOps engineer | 
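
Amazon ECR rejects a push when the registry host or repository name in the tag doesn’t match an existing repository, so it can help to sanity-check the image URI before running `docker push`. The `split_image_uri` helper below is a hypothetical sketch, not part of the example notebook:

```python
# Hypothetical sanity check: split an ECR image URI into its parts so a
# typo in the registry host or repository name surfaces before docker push.
def split_image_uri(image_uri):
    """Split '<account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>' into parts."""
    registry, _, remainder = image_uri.partition("/")
    repository, _, tag = remainder.partition(":")
    return registry, repository, tag or "latest"

registry, repository, tag = split_image_uri(
    "111122223333.dkr.ecr.us-east-1.amazonaws.com/byoc:latest"
)
# The repository part ('byoc' here) must match the name of the
# Amazon ECR private repository that you created earlier.
print(registry, repository, tag)
```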

### Create a Step Functions workflow that uses your custom Docker container image
<a name="create-a-step-functions-workflow-that-uses-your-custom-docker-container-image"></a>


| Task | Description | Skills required | 
| --- | --- | --- | 
| Create a Python script that includes your custom processing and model training logic. | Write custom processing logic to run in your data processing script. Then, save it as a Python script named `training.py`. For more information, see [Bring your own model with SageMaker Script Mode](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-script-mode/sagemaker-script-mode.html) on GitHub. **Example Python script that includes custom processing and model training logic**<pre>%%writefile training.py<br />import pandas as pd<br />import os<br />from sklearn import datasets, svm<br />from joblib import dump, load<br /><br /><br />if __name__ == '__main__':<br />    digits = datasets.load_digits()<br />    # Create classifier object<br />    clf = svm.SVC(gamma=0.001, C=100.)<br />    <br />    # Fit the model<br />    clf.fit(digits.data[:-1], digits.target[:-1])<br />    <br />    # Save the model output in binary format<br />    output_path = os.path.join('/opt/ml/processing/model', "model.joblib")<br />    dump(clf, output_path)</pre> | Data scientist | 
| Create a Step Functions workflow that includes your SageMaker Processing job as one of the steps.  | Install and import the [AWS Step Functions Data Science SDK](https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/readmelink.html) and upload the **training.py** file to Amazon S3. Then, use the [Amazon SageMaker Python SDK](https://github.com/aws/sagemaker-python-sdk) to define a processing step in Step Functions. Make sure that you’ve [created an IAM execution role for Step Functions](https://sagemaker-examples.readthedocs.io/en/latest/step-functions-data-science-sdk/step_functions_mlworkflow_processing/step_functions_mlworkflow_scikit_learn_data_processing_and_model_evaluation.html#Create-an-Execution-Role-for-Step-Functions) in your AWS account. **Example environment setup and custom training script to upload to Amazon S3**<pre>!pip install stepfunctions<br /><br />import boto3<br />import stepfunctions<br />import sagemaker<br />import datetime<br /><br />from stepfunctions import steps<br />from stepfunctions.inputs import ExecutionInput<br />from stepfunctions.steps import (<br />    Chain<br />)<br />from stepfunctions.workflow import Workflow<br />from sagemaker.processing import ScriptProcessor, ProcessingInput, ProcessingOutput<br /><br />sagemaker_session = sagemaker.Session()<br />bucket = sagemaker_session.default_bucket()<br />role = sagemaker.get_execution_role()<br />account_id = boto3.client('sts').get_caller_identity().get('Account')<br />prefix = 'byoc-training-model'<br /><br /># See prerequisites section to create this role<br />workflow_execution_role = f"arn:aws:iam::{account_id}:role/AmazonSageMaker-StepFunctionsWorkflowExecutionRole"<br /><br />execution_input = ExecutionInput(<br />    schema={<br />        "PreprocessingJobName": str})<br /><br /><br />input_code = sagemaker_session.upload_data(<br />    "training.py",<br />    bucket=bucket,<br />    key_prefix="preprocessing.py",<br />)</pre> **Example SageMaker processing step definition that uses a custom Amazon ECR image and Python script** Make sure that you use the `execution_input` parameter to specify the job name. The parameter’s value must be unique each time the job runs. Also, the **training.py** file’s code is passed as an `input` parameter to the `ProcessingStep`, which means that it will be copied inside the container. The destination for the `ProcessingInput` code is the same as the second argument inside the `container_entrypoint`.<pre>script_processor = ScriptProcessor(command=['python3'],<br />                image_uri=image_uri,<br />                role=role,<br />                instance_count=1,<br />                instance_type='ml.m5.xlarge')<br /><br /><br />processing_step = steps.ProcessingStep(<br />    "training-step",<br />    processor=script_processor,<br />    job_name=execution_input["PreprocessingJobName"],<br />    inputs=[<br />        ProcessingInput(<br />            source=input_code,<br />            destination="/opt/ml/processing/input/code",<br />            input_name="code",<br />        ),<br />    ],<br />    outputs=[<br />        ProcessingOutput(<br />            source='/opt/ml/processing/model',<br />            destination="s3://{}/{}".format(bucket, prefix),<br />            output_name='byoc-example')<br />    ],<br />    container_entrypoint=["python3", "/opt/ml/processing/input/code/training.py"],<br />)</pre> **Example Step Functions workflow that runs a SageMaker processing job** This example workflow includes the SageMaker processing job step only, not a complete Step Functions workflow. For a full example workflow, see [Example notebooks in SageMaker](https://aws-step-functions-data-science-sdk.readthedocs.io/en/stable/readmelink.html#example-notebooks-in-sagemaker) in the AWS Step Functions Data Science SDK documentation.<pre>workflow_graph = Chain([processing_step])<br /><br />workflow = Workflow(<br />    name="ProcessingWorkflow",<br />    definition=workflow_graph,<br />    role=workflow_execution_role<br />)<br /><br />workflow.create()<br /># Execute workflow<br />execution = workflow.execute(<br />    inputs={<br />        "PreprocessingJobName": str(datetime.datetime.now().strftime("%Y%m%d%H%M-%SS")),  # Each preprocessing job (SageMaker processing job) requires a unique name<br />    }<br />)<br />execution_output = execution.get_output(wait=True)</pre> | Data scientist | 
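
Each workflow execution must pass a unique `PreprocessingJobName`, because SageMaker rejects a processing job whose name is already in use. The following sketch mirrors the timestamp format used in the workflow code in the table above; the `unique_job_name` helper is illustrative, not part of the example notebook:

```python
import datetime

# Sketch: generate a unique PreprocessingJobName per workflow execution,
# mirroring the strftime format from the workflow code above. Note that
# in "%Y%m%d%H%M-%SS", the trailing "S" after %S is a literal character.
def unique_job_name(now=None):
    now = now or datetime.datetime.now()
    return now.strftime("%Y%m%d%H%M-%SS")

print(unique_job_name(datetime.datetime(2024, 1, 2, 3, 4, 5)))  # 202401020304-05S
```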

## Related resources
<a name="create-a-custom-docker-container-image-for-sagemaker-and-use-it-for-model-training-in-aws-step-functions-resources"></a>
+ [Process data](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html) (*Amazon SageMaker Developer Guide*)
+ [Adapting your own training container](https://docs.aws.amazon.com/sagemaker/latest/dg/adapt-training-container.html) (*Amazon SageMaker Developer Guide*)