

# Adapt your training job to access images in a private Docker registry
<a name="docker-containers-adapt-your-own-private-registry"></a>

You can use a private [Docker registry](https://docs.docker.com/registry/) instead of Amazon Elastic Container Registry (Amazon ECR) to host your images for SageMaker AI training. The following instructions show you how to create a Docker registry, configure your virtual private cloud (VPC) and training job, store images, and give SageMaker AI access to the training image in the private Docker registry. These instructions also show you how to use a Docker registry that requires authentication for a SageMaker training job.

## Create and store your images in a private Docker registry
<a name="docker-containers-adapt-your-own-private-registry-prerequisites"></a>

Create a private Docker registry to store your images. Your registry must:
+ use the [Docker Registry HTTP API](https://docs.docker.com/registry/spec/api/) protocol
+ be accessible from the same VPC specified in the [VpcConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html#API_CreateTrainingJob_RequestSyntax) parameter in the `CreateTrainingJob` API. Input `VpcConfig` when you create your training job.
+ be secured with a [TLS certificate](https://aws.amazon.com/what-is/ssl-certificate/) from a known public certificate authority.

For more information about creating a Docker registry, see [Deploy a registry server](https://docs.docker.com/registry/deploying/).
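
As a quick sanity check of these requirements, the following minimal sketch probes the base endpoint of a registry. It assumes a placeholder hostname such as `myteam.myorg.com`, and the helper names are hypothetical. It relies on the Docker Registry HTTP API V2 convention that `GET /v2/` returns `200` when the API is available, `401` when authentication is required, and `404` when V2 is not implemented. Because the request uses the default trust store, a TLS verification failure suggests that the certificate was not issued by a certificate authority that the client trusts.

```python
import ssl
import urllib.error
import urllib.request


def registry_base_url(host):
    """Return the Docker Registry HTTP API V2 base URL for a registry host."""
    return f"https://{host}/v2/"


def interpret_status(status_code):
    """Map the status code of GET /v2/ to a short diagnosis."""
    if status_code == 200:
        return "ok"
    if status_code == 401:
        return "authentication required"
    if status_code == 404:
        return "registry API V2 not supported"
    return f"unexpected status {status_code}"


def probe_registry(host, timeout=5):
    """Probe the registry over HTTPS using the default trusted CA bundle."""
    # The default context verifies the certificate chain and the hostname.
    context = ssl.create_default_context()
    try:
        with urllib.request.urlopen(
            registry_base_url(host), timeout=timeout, context=context
        ) as resp:
            return interpret_status(resp.status)
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses are raised as HTTPError; the code is still informative.
        return interpret_status(err.code)
    except urllib.error.URLError as err:
        # Covers DNS failures, refused connections, and TLS verification errors.
        return f"unreachable: {err.reason}"
```

To run the check, call `probe_registry("myteam.myorg.com")` from a host inside the VPC that your training job uses, because the registry might not be reachable from elsewhere.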

## Configure your VPC and SageMaker training job
<a name="docker-containers-adapt-your-own-private-registry-configure"></a>

SageMaker AI uses a network connection within your VPC to access images in your Docker registry. To use the images in your Docker registry for training, the registry must be accessible from an Amazon VPC in your account. For more information, see [Use a Docker registry that requires authentication for training](docker-containers-adapt-your-own-private-registry-authentication.md).

You must also configure your training job to connect to the same VPC to which your Docker registry has access. For more information, see [Configure a Training Job for Amazon VPC Access](https://docs.aws.amazon.com/sagemaker/latest/dg/train-vpc.html#train-vpc-configure).
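
To confirm which VPC your training subnets belong to before you create the job, you can compare the `VpcId` values that Amazon EC2 reports for them. The following is a minimal sketch that uses the `describe_subnets` operation of the Boto3 EC2 client. The helper names `vpc_ids_for_subnets` and `all_in_one_vpc` are hypothetical, and the client is passed in as an argument so that the functions can be exercised without AWS access.

```python
def vpc_ids_for_subnets(ec2_client, subnet_ids):
    """Return the set of VPC IDs that contain the given subnets."""
    response = ec2_client.describe_subnets(SubnetIds=list(subnet_ids))
    return {subnet["VpcId"] for subnet in response["Subnets"]}


def all_in_one_vpc(ec2_client, subnet_ids):
    """Return True when every subnet belongs to the same VPC."""
    return len(vpc_ids_for_subnets(ec2_client, subnet_ids)) == 1
```

For example, `all_in_one_vpc(boto3.client("ec2"), ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"])` returns `False` if the two placeholder subnets are in different VPCs. You can compare the returned VPC IDs with the VPC that hosts your registry.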

## Create a training job using an image from your private Docker registry
<a name="docker-containers-adapt-your-own-private-registry-create"></a>

To use an image from your private Docker registry for training, use the following steps to configure your image, and then configure and create a training job. The code examples that follow use the AWS SDK for Python (Boto3) client.

1. Create a training image configuration object and input `Vpc` for the `TrainingRepositoryAccessMode` field as follows.

   ```
   training_image_config = {
       'TrainingRepositoryAccessMode': 'Vpc'
   }
   ```
**Note**  
If your private Docker registry requires authentication, you must add a `TrainingRepositoryAuthConfig` object to the training image configuration object. You must also specify the Amazon Resource Name (ARN) of an AWS Lambda function that provides access credentials to SageMaker AI using the `TrainingRepositoryCredentialsProviderArn` field of the `TrainingRepositoryAuthConfig` object. For more information, see the example code structure below.  

   ```
   training_image_config = {
      'TrainingRepositoryAccessMode': 'Vpc',
      'TrainingRepositoryAuthConfig': {
           'TrainingRepositoryCredentialsProviderArn': 'arn:aws:lambda:Region:Acct:function:FunctionName'
      }
   }
   ```

   For information about how to create the Lambda function to provide authentication, see [Use a Docker registry that requires authentication for training](docker-containers-adapt-your-own-private-registry-authentication.md).

1. Use a Boto3 client to create a training job and pass the correct configuration to the [create\_training\_job](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html) API. The following instructions show you how to configure the components and create a training job.

   1. Create the `AlgorithmSpecification` object that you want to pass to `create_training_job`. Use the training image configuration object that you created in the previous step, as shown in the following code example.

      ```
      algorithm_specification = {
         'TrainingImage': 'myteam.myorg.com/docker-local/my-training-image:<IMAGE-TAG>',
         'TrainingImageConfig': training_image_config,
         'TrainingInputMode': 'File'
      }
      ```
**Note**  
To use a fixed version of an image rather than one that can be updated, refer to the image by its [digest](https://docs.docker.com/engine/reference/commandline/pull/#pull-an-image-by-digest-immutable-identifier) instead of by name or tag.

   1. Specify the name of the training job and role that you want to pass to `create_training_job`, as shown in the following code example. 

      ```
      training_job_name = 'private-registry-job'
      execution_role_arn = 'arn:aws:iam::123456789012:role/SageMakerExecutionRole'
      ```

   1. Specify a security group and subnet for the VPC configuration for your training job. Your private Docker registry must allow inbound traffic from the security groups that you specify, as shown in the following code example.

      ```
      vpc_config = {
          'SecurityGroupIds': ['sg-0123456789abcdef0'],
          'Subnets': ['subnet-0123456789abcdef0','subnet-0123456789abcdef1']
      }
      ```
**Note**  
If your subnet is not in the same VPC as your private Docker registry, you must set up a networking connection between the two VPCs. For more information, see [Connect VPCs using VPC peering](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-peering.html).

   1. Specify the resource configuration, including machine learning compute instances and storage volumes to use for training, as shown in the following code example. 

      ```
      resource_config = {
          'InstanceType': 'ml.m4.xlarge',
          'InstanceCount': 1,
          'VolumeSizeInGB': 10,
      }
      ```

   1. Specify the input and output data configuration, where the training dataset is stored, and where you want to store model artifacts, as shown in the following code example.

      ```
      input_data_config = [
          {
              "ChannelName": "training",
              "DataSource":
              {
                  "S3DataSource":
                  {
                      "S3DataDistributionType": "FullyReplicated",
                      "S3DataType": "S3Prefix",
                      "S3Uri": "s3://your-training-data-bucket/training-data-folder"
                  }
              }
          }
      ]
      
      output_data_config = {
          'S3OutputPath': 's3://your-output-data-bucket/model-folder'
      }
      ```

   1. Specify the maximum number of seconds that a model training job can run, as shown in the following code example.

      ```
      stopping_condition = {
          'MaxRuntimeInSeconds': 1800
      }
      ```

   1. Finally, create the training job using the parameters that you specified in the previous steps, as shown in the following code example.

      ```
      import boto3
      sm = boto3.client('sagemaker')
      try:
          resp = sm.create_training_job(
              TrainingJobName=training_job_name,
              AlgorithmSpecification=algorithm_specification,
              RoleArn=execution_role_arn,
              InputDataConfig=input_data_config,
              OutputDataConfig=output_data_config,
              ResourceConfig=resource_config,
              VpcConfig=vpc_config,
              StoppingCondition=stopping_condition
          )
      except Exception as e:
          print(f'error calling CreateTrainingJob operation: {e}')
      else:
          print(resp)
      ```
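
After `create_training_job` returns, the job runs asynchronously. The following minimal sketch polls the `describe_training_job` operation until the job reaches a terminal status. The helper name `wait_for_training_job` and the polling interval are illustrative assumptions, not part of the procedure above. If the job fails, for example because the image can't be pulled from your registry, the `FailureReason` field in the response is a reasonable first place to look.

```python
import time

# Statuses after which the job will not change state again.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(sm_client, job_name, poll_seconds=30, sleep=time.sleep):
    """Poll DescribeTrainingJob until the job completes, fails, or is stopped.

    Returns a (status, failure_reason) tuple; failure_reason is None unless
    the response includes a FailureReason field.
    """
    while True:
        description = sm_client.describe_training_job(TrainingJobName=job_name)
        status = description["TrainingJobStatus"]
        if status in TERMINAL_STATUSES:
            return status, description.get("FailureReason")
        sleep(poll_seconds)
```

With the client and job name from the previous steps, you could call `wait_for_training_job(sm, training_job_name)`. The `sleep` parameter exists only so that the function can be tested without waiting.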

# Use a SageMaker AI estimator to run a training job
<a name="docker-containers-adapt-your-own-private-registry-estimator"></a>

You can also use an [estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) from the SageMaker Python SDK to handle the configuration and running of your SageMaker training job. The following code examples show how to configure and run an estimator using images from a private Docker registry.

1. Import the required libraries and dependencies, as shown in the following code example.

   ```
   import boto3
   import sagemaker
   from sagemaker.estimator import Estimator
   
   session = sagemaker.Session()
   
   role = sagemaker.get_execution_role()
   ```

1. Provide a Uniform Resource Identifier (URI) to your training image, security groups and subnets for the VPC configuration for your training job, as shown in the following code example.

   ```
   image_uri = "myteam.myorg.com/docker-local/my-training-image:<IMAGE-TAG>"
   
   security_groups = ["sg-0123456789abcdef0"]
   subnets = ["subnet-0123456789abcdef0", "subnet-0123456789abcdef1"]
   ```

   For more information about `security_group_ids` and `subnets`, see the appropriate parameter description in the [Estimators](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html) section of the SageMaker Python SDK.
**Note**  
SageMaker AI uses a network connection within your VPC to access images in your Docker registry. To use the images in your Docker registry for training, the registry must be accessible from an Amazon VPC in your account.

1. (Optional) If your Docker registry requires authentication, specify the Amazon Resource Name (ARN) of an AWS Lambda function that provides access credentials to SageMaker AI. The following code example shows how to specify the ARN.

   ```
   training_repository_credentials_provider_arn = "arn:aws:lambda:us-west-2:123456789012:function:test"
   ```

   For more information about using images in a Docker registry requiring authentication, see **Use a Docker registry that requires authentication for training** below.

1. Use the code examples from the previous steps to configure an estimator, as shown in the following code example.

   ```
   # The training repository access mode must be 'Vpc' for private Docker registry jobs
   training_repository_access_mode = "Vpc"
   
   # Specify the instance type and instance count that you want to use
   instance_type="ml.m5.xlarge"
   instance_count=1
   
   # Specify the maximum number of seconds that a model training job can run
   max_run_time = 1800
   
   # Specify the output path for the model artifacts
   output_path = "s3://your-output-bucket/your-output-path"
   
   estimator = Estimator(
       image_uri=image_uri,
       role=role,
       subnets=subnets,
       security_group_ids=security_groups,
       training_repository_access_mode=training_repository_access_mode,
       training_repository_credentials_provider_arn=training_repository_credentials_provider_arn,  # remove this line if auth is not needed
       instance_type=instance_type,
       instance_count=instance_count,
       output_path=output_path,
       max_run=max_run_time
   )
   ```

1. Start your training job by calling `estimator.fit` with your job name and input path as parameters, as shown in the following code example.

   ```
   input_path = "s3://your-input-bucket/your-input-path"
   job_name = "your-job-name"
   
   estimator.fit(
       inputs=input_path,
       job_name=job_name
   )
   ```
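
Because the credentials provider ARN applies only to registries that require authentication, one option is to assemble the estimator arguments in a dictionary and add the ARN conditionally, rather than editing the constructor call. The following is a minimal sketch. The helper name `private_registry_estimator_args` is hypothetical, and the dictionary keys mirror the `Estimator` parameters used in the preceding steps.

```python
def private_registry_estimator_args(image_uri, role, subnets, security_group_ids,
                                    credentials_provider_arn=None, **extra):
    """Build keyword arguments for Estimator(...) for a private registry image."""
    args = {
        "image_uri": image_uri,
        "role": role,
        "subnets": subnets,
        "security_group_ids": security_group_ids,
        # 'Vpc' is required for images hosted in a private Docker registry.
        "training_repository_access_mode": "Vpc",
    }
    if credentials_provider_arn:
        # Only registries that require authentication need the Lambda ARN.
        args["training_repository_credentials_provider_arn"] = credentials_provider_arn
    # Pass through any other settings, such as instance_type or output_path.
    args.update(extra)
    return args
```

You could then construct the estimator with `Estimator(**private_registry_estimator_args(image_uri, role, subnets, security_groups, instance_type=instance_type, instance_count=instance_count, output_path=output_path, max_run=max_run_time))`.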

# Use a Docker registry that requires authentication for training
<a name="docker-containers-adapt-your-own-private-registry-authentication"></a>

If your Docker registry requires authentication, you must create an AWS Lambda function that provides access credentials to SageMaker AI. Then, create a training job and provide the ARN of this Lambda function to the [create\_training\_job](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_training_job) API. Lastly, you can optionally create an interface VPC endpoint so that your VPC can communicate with your Lambda function without sending traffic over the internet. The following guide shows how to create a Lambda function, assign it the correct role, and create an interface VPC endpoint.

## Create the Lambda function
<a name="docker-containers-adapt-your-own-private-registry-authentication-create-lambda"></a>

Create an AWS Lambda function that passes access credentials to SageMaker AI and returns a response. The following code example creates the Lambda function handler.

```
def handler(event, context):
   response = {
      "Credentials": {"Username": "username", "Password": "password"}
   }
   return response
```

The type of authentication used to set up your private Docker registry determines the contents of the response returned by your Lambda function as follows.
+ If your private Docker registry uses basic authentication, the Lambda function will return the username and password needed in order to authenticate to the registry.
+ If your private Docker registry uses [bearer token authentication](https://docs.docker.com/registry/spec/auth/token/), the username and password are sent to your authorization server, which then returns a bearer token. This token is then used to authenticate to your private Docker registry.
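
The handler shown earlier hardcodes the credentials for brevity. As a minimal sketch of one alternative, the following handler reads them from AWS Secrets Manager at invocation time. It assumes a hypothetical secret named `my-registry-credentials` whose value is a JSON object with `username` and `password` keys, and it assumes that the Lambda execution role is allowed to call `secretsmanager:GetSecretValue` on that secret. The response has the same shape as the handler in this section.

```python
import json
import os

# Hypothetical secret name; override it with an environment variable on the function.
SECRET_ID = os.environ.get("REGISTRY_SECRET_ID", "my-registry-credentials")


def build_response(secret_string):
    """Convert the secret's JSON string into the credentials response."""
    secret = json.loads(secret_string)
    return {
        "Credentials": {
            "Username": secret["username"],
            "Password": secret["password"],
        }
    }


def handler(event, context, secrets_client=None):
    """Lambda entry point. secrets_client is injectable for local testing."""
    if secrets_client is None:
        import boto3  # imported lazily so the module loads without AWS access
        secrets_client = boto3.client("secretsmanager")
    secret = secrets_client.get_secret_value(SecretId=SECRET_ID)
    return build_response(secret["SecretString"])
```

Keeping the credentials out of the function code means that you can rotate them without redeploying the function.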

**Note**  
If you have more than one Lambda function for your registries in the same account, and your training jobs use the same execution role, then training jobs for one registry have access to the Lambda functions for the other registries.

## Grant the correct role permission to your Lambda function
<a name="docker-containers-adapt-your-own-private-registry-authentication-lambda-role"></a>

The [IAM role](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) that you use in the `create_training_job` API must have permission to call an AWS Lambda function. The following code example shows how to extend the permissions policy of an IAM role to call `myLambdaFunction`.

```
{
    "Effect": "Allow",
    "Action": [
        "lambda:InvokeFunction"
    ],
    "Resource": [
        "arn:aws:lambda:*:*:function:*myLambdaFunction*"
    ]
}
```

For information about editing a role permissions policy, see [Modifying a role permissions policy (console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-modify_permissions-policy) in the *AWS Identity and Access Management User Guide*.
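
As a minimal sketch, the statement above can be wrapped in a complete policy document and attached to the role as an inline policy with the `put_role_policy` operation of the Boto3 IAM client. The helper names and the default policy name are hypothetical, and the role name and function ARN that you pass in are your own values. This version scopes `Resource` to a single function ARN rather than a wildcard pattern, which is a design choice rather than a requirement.

```python
import json


def invoke_policy_document(function_arn):
    """Return an IAM policy document that allows invoking one Lambda function."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["lambda:InvokeFunction"],
                "Resource": [function_arn],
            }
        ],
    }


def attach_invoke_policy(iam_client, role_name, function_arn,
                         policy_name="InvokeRegistryCredentialsLambda"):
    """Attach the policy to the role as an inline policy."""
    iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        # The IAM API expects the policy document as a JSON string.
        PolicyDocument=json.dumps(invoke_policy_document(function_arn)),
    )
```

For example, `attach_invoke_policy(boto3.client("iam"), "SageMakerExecutionRole", "arn:aws:lambda:us-west-2:123456789012:function:myLambdaFunction")` adds the permission to the placeholder execution role.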

**Note**  
An IAM role with an attached **AmazonSageMakerFullAccess** managed policy has permission to call any Lambda function with "SageMaker" in its name.

## Create an interface VPC endpoint for Lambda
<a name="docker-containers-adapt-your-own-private-registry-authentication-lambda-endpoint"></a>

If you create an interface endpoint, your Amazon VPC can communicate with your Lambda function without sending traffic over the internet. For more information, see [Configuring interface VPC endpoints for Lambda](https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc-endpoints.html) in the *AWS Lambda Developer Guide*.

After your interface endpoint is created, SageMaker training will call your Lambda function by sending a request through your VPC to `lambda.region.amazonaws.com`. If you select **Enable DNS Name** when you create your interface endpoint, [Amazon Route 53](https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Welcome.html) routes the call to the Lambda interface endpoint. If you use a different DNS provider, you must map `lambda.region.amazonaws.com` to your Lambda interface endpoint.
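
To check the DNS mapping from a host inside your VPC, you can resolve the regional Lambda hostname and confirm that it returns private addresses, which suggests that traffic is routed to the interface endpoint. The following is a minimal sketch that uses only the Python standard library. The helper names are hypothetical, and the hostname pattern assumes a standard commercial AWS Region.

```python
import ipaddress
import socket


def lambda_hostname(region):
    """Return the regional Lambda hostname, for example lambda.us-west-2.amazonaws.com."""
    return f"lambda.{region}.amazonaws.com"


def all_private(addresses):
    """Return True when the collection is non-empty and every address is private."""
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


def resolves_to_private_addresses(region):
    """Resolve the Lambda hostname and report whether all results are private."""
    infos = socket.getaddrinfo(lambda_hostname(region), 443, proto=socket.IPPROTO_TCP)
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    addresses = {info[4][0] for info in infos}
    return all_private(addresses)
```

If `resolves_to_private_addresses("us-west-2")` returns `False` when run inside the VPC, the hostname is likely still resolving to the public Lambda endpoint.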