

# Implement MLOps


Amazon SageMaker AI provides features that help you implement machine learning models in production environments with continuous integration and deployment. The following topics describe how to set up MLOps infrastructure when using SageMaker AI.

**Topics**
+ [Why Should You Use MLOps?](sagemaker-projects-why.md)
+ [SageMaker Experiments](experiments-mlops.md)
+ [SageMaker AI Workflows](workflows.md)
+ [Amazon SageMaker ML Lineage Tracking](lineage-tracking.md)
+ [Model Registration Deployment with Model Registry](model-registry.md)
+ [Model Deployment in SageMaker AI](model-deploy-mlops.md)
+ [SageMaker Model Monitor](model-monitor-mlops.md)
+ [MLOps Automation With SageMaker Projects](sagemaker-projects.md)
+ [Amazon SageMaker AI MLOps troubleshooting](mlopsfaq.md)

# Why Should You Use MLOps?

As you move from running individual artificial intelligence and machine learning (AI/ML) projects to using AI/ML to transform your business at scale, the discipline of ML Operations (MLOps) can help. MLOps accounts for the unique aspects of AI/ML projects in project management, CI/CD, and quality assurance, helping you improve delivery time, reduce defects, and make data science more productive. MLOps refers to a methodology that is built on applying DevOps practices to machine learning workloads. For a discussion of DevOps principles, see the white paper [Introduction to DevOps on AWS](https://docs.aws.amazon.com/whitepapers/latest/introduction-devops-aws/welcome.html?did=wp_card). To learn more about implementation using AWS services, see [Practicing CI/CD on AWS](https://d1.awsstatic.com/whitepapers/DevOps/practicing-continuous-integration-continuous-delivery-on-AWS.pdf) and [Infrastructure as Code](https://d1.awsstatic.com/whitepapers/DevOps/infrastructure-as-code.pdf).

Like DevOps, MLOps relies on a collaborative and streamlined approach to the machine learning development lifecycle where the intersection of people, process, and technology optimizes the end-to-end activities required to develop, build, and operate machine learning workloads.

MLOps focuses on the intersection of data science and data engineering in combination with existing DevOps practices to streamline model delivery across the machine learning development lifecycle. MLOps is the discipline of integrating ML workloads into release management, CI/CD, and operations. MLOps requires the integration of software development, operations, data engineering, and data science.

## Challenges with MLOps


Although MLOps can provide valuable tools to help you scale your business, you might face certain issues as you integrate MLOps into your machine learning workloads.

**Project management**
+ ML projects involve data scientists, a relatively new role, and one not often integrated into cross-functional teams. These new team members often speak a very different technical language than product owners and software engineers, compounding the usual problem of translating business requirements into technical requirements. 

**Communication and collaboration**
+ Building visibility on ML projects and enabling collaboration across different stakeholders such as data engineers, data scientists, ML engineers, and DevOps is becoming increasingly important to ensure successful outcomes.



**Everything is code**
+ ML projects differ from conventional software projects in their use of production data in development activities, longer experimentation lifecycles, dependencies on data pipelines, retraining deployment pipelines, and the unique metrics used to evaluate model performance.
+ Models often have a lifecycle independent of the applications and systems integrating with those models. 
+ The entire end-to-end system is reproducible through versioned code and artifacts. DevOps projects use Infrastructure-as-Code (IaC) and Configuration-as-Code (CaC) to build environments, and Pipelines-as-Code (PaC) to ensure consistent CI/CD patterns. The pipelines have to integrate with Big Data and ML training workflows. That often means that the pipeline is a combination of a traditional CI/CD tool and another workflow engine. There are important policy concerns for many ML projects, so the pipeline may also need to enforce those policies. Biased input data produces biased results, an increasing concern for business stakeholders.

**CI/CD**
+ In MLOps, the source data is a first-class input, along with source code. That’s why MLOps calls for versioning the source data and initiating pipeline runs when the source or inference data changes. 
+ Pipelines must also version the ML models, along with inputs and other outputs, in order to provide for traceability. 
+ Automated testing must include proper validation of the ML model during build phases and when the model is in production.
+ Build phases may include model training and retraining, a time-consuming and resource-intensive process. Pipelines must be granular enough to only perform a full training cycle when the source data or ML code changes, not when related components change.
+ Because machine learning code is typically a small part of an overall solution, a deployment pipeline may also incorporate the additional steps required to package a model for consumption as an API by other applications and systems.
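The granularity point above can be sketched in plain Python: track a content hash of the source data and ML code, and trigger a full training cycle only when either changes. This is a minimal illustration of the idea; the helper names are hypothetical, not part of any SageMaker API.

```python
import hashlib
from pathlib import Path

def content_hash(paths):
    """Hash file contents so the pipeline can detect real changes."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(Path(path).read_bytes())
    return digest.hexdigest()

def needs_full_training(data_paths, code_paths, last_run_state):
    """Return (changed, current_state): trigger a full training cycle
    only when the source data or ML code changed since the last run."""
    current_state = {
        "data": content_hash(data_paths),
        "code": content_hash(code_paths),
    }
    return current_state != last_run_state, current_state
```

A CI/CD pipeline would persist `current_state` after each successful run and pass it back in as `last_run_state`, skipping the expensive training stage when nothing relevant changed.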

**Monitoring and logging**
+ The feature engineering and model training phases need to capture model training metrics as well as model experiments. Tuning an ML model requires manipulating the form of the input data as well as algorithm hyperparameters, and systematically capturing those experiments. Experiment tracking helps data scientists work more effectively and gives a reproducible snapshot of their work.
+ Deployed ML models require monitoring of the data passed to the model for inference, along with the standard endpoint stability and performance metrics. The monitoring system must also capture the quality of model output, as evaluated by an appropriate ML metric. 
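As a minimal sketch of the last point, a monitoring job might compare live inference inputs for a feature against a training-time baseline. The statistics and threshold below are illustrative, not the SageMaker Model Monitor implementation.

```python
from statistics import mean, stdev

def drift_score(baseline, live):
    """Shift of the live mean, normalized by the baseline spread."""
    spread = stdev(baseline) or 1.0  # guard against zero spread
    return abs(mean(live) - mean(baseline)) / spread

def check_feature(baseline, live, threshold=3.0):
    """Flag a feature whose live inputs drift beyond the threshold."""
    score = drift_score(baseline, live)
    return {"score": score, "drifted": score > threshold}
```

A real monitoring system would run a check like this on a schedule for every model input, alongside endpoint stability and performance metrics.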

## Benefits of MLOps


Adopting MLOps practices gives you faster time-to-market for ML projects by delivering the following benefits.
+ **Productivity**: Providing self-service environments with access to curated data sets lets data engineers and data scientists move faster and waste less time with missing or invalid data.
+ **Repeatability**: Automating all the steps in the machine learning development lifecycle (MLDC) helps you ensure a repeatable process, including how the model is trained, evaluated, versioned, and deployed. 
+ **Reliability**: Incorporating CI/CD practices allows for the ability to not only deploy quickly but with increased quality and consistency. 
+ **Auditability**: Versioning all inputs and outputs, from data science experiments to source data to trained models, means that you can demonstrate exactly how the model was built and where it was deployed.
+ **Data and model quality**: MLOps lets you enforce policies that guard against model bias and track changes to data statistical properties and model quality over time. 

# SageMaker Experiments

ML model building requires many iterations of training as you tune the algorithm, model architecture, and parameters to achieve high prediction accuracy. You can track the inputs and outputs across these training iterations to improve repeatability of trials and collaboration within your team using Amazon SageMaker Experiments. You can also track parameters, metrics, datasets, and other artifacts related to your model training jobs. SageMaker Experiments offers a single interface where you can visualize your in-progress training jobs, share experiments within your team, and deploy models directly from an experiment.

To learn about SageMaker Experiments, see [Amazon SageMaker Experiments in Studio Classic](experiments.md).

# SageMaker AI Workflows

As you scale your machine learning (ML) operations, you can use Amazon SageMaker AI fully managed workflow services to implement continuous integration and deployment (CI/CD) practices for your ML lifecycle. With the Pipelines SDK, you choose and integrate pipeline steps into a unified solution that automates the model-building process from data preparation to model deployment. For Kubernetes-based architectures, you can install SageMaker AI Operators on your Kubernetes cluster to create SageMaker AI jobs natively using the Kubernetes API and command-line Kubernetes tools such as `kubectl`. With SageMaker AI components for Kubeflow pipelines, you can create and monitor native SageMaker AI jobs from your Kubeflow Pipelines. The job parameters, status, and outputs from SageMaker AI are accessible from the Kubeflow Pipelines UI. Lastly, if you want to schedule batch jobs, you can use either the AWS Batch job queue integration or the Jupyter notebook-based workflows service to initiate standalone or regular runs on a schedule you define.

In summary, SageMaker AI offers the following workflow technologies:
+ [Pipelines](pipelines.md): Tool for building and managing ML pipelines.
+ [Kubernetes Orchestration](kubernetes-workflows.md): SageMaker AI custom operators for your Kubernetes cluster and components for Kubeflow Pipelines.
+ [SageMaker Notebook Jobs](notebook-auto-run.md): On-demand or scheduled non-interactive batch runs of your Jupyter notebooks.

You can also leverage other services that integrate with SageMaker AI to build your workflow. Options include the following services:
+ [Airflow Workflows](https://sagemaker.readthedocs.io/en/stable/workflows/airflow/index.html): SageMaker APIs to export configurations for creating and managing Airflow workflows.
+ [AWS Step Functions](https://sagemaker.readthedocs.io/en/stable/workflows/step_functions/index.html): Multi-step ML workflows in Python that orchestrate SageMaker AI infrastructure without having to provision your resources separately.
+ [AWS Batch](https://docs.aws.amazon.com/batch/latest/userguide/getting-started-sagemaker.html): Submit SageMaker AI training jobs to an AWS Batch job queue, where you can prioritize and schedule jobs to run in a compute environment.

For more information on managing SageMaker training and inference, see [Amazon SageMaker Python SDK Workflows](https://sagemaker.readthedocs.io/en/stable/workflows/index.html).

**Topics**
+ [Pipelines](pipelines.md)
+ [Kubernetes Orchestration](kubernetes-workflows.md)
+ [SageMaker Notebook Jobs](notebook-auto-run.md)
+ [Schedule your ML workflows](workflow-scheduling.md)
+ [AWS Batch support for SageMaker AI training jobs](training-job-queues.md)

# Pipelines

Amazon SageMaker Pipelines is a purpose-built workflow orchestration service to automate machine learning (ML) development.

Pipelines provide the following advantages over other AWS workflow offerings:

**Auto-scaling serverless infrastructure** You don't need to manage the underlying orchestration infrastructure to run Pipelines, which allows you to focus on core ML tasks. SageMaker AI automatically provisions, scales, and shuts down the pipeline orchestration compute resources as your ML workload demands.

**Intuitive user experience** Pipelines can be created and managed through your interface of choice: visual editor, SDK, APIs, or JSON. You can drag-and-drop the various ML steps to author your pipelines in the Amazon SageMaker Studio visual interface. The following screenshot shows the Studio visual editor for pipelines.

![\[Screenshot of the visual drag-and-drop interface for Pipelines in Studio.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pipelines/pipelines-studio-overview.png)


If you prefer managing your ML workflows programmatically, the SageMaker Python SDK offers advanced orchestration features. For more information, see [Amazon SageMaker Pipelines](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html) in the SageMaker Python SDK documentation.

**AWS integrations** Pipelines provide seamless integration with all SageMaker AI features and other AWS services to automate data processing, model training, fine-tuning, evaluation, deployment, and monitoring jobs. You can incorporate the SageMaker AI features in your Pipelines and navigate across them using deep links to create, monitor, and debug your ML workflows at scale.

**Reduced costs** With Pipelines, you only pay for the SageMaker Studio environment and the underlying jobs that are orchestrated by Pipelines (for example, SageMaker Training, SageMaker Processing, SageMaker AI Inference, and Amazon S3 data storage).

**Auditability and lineage tracking** With Pipelines, you can track the history of pipeline updates and executions using built-in versioning. Amazon SageMaker ML Lineage Tracking helps you analyze the data sources and data consumers in the end-to-end ML development lifecycle.

**Topics**
+ [Pipelines overview](pipelines-overview.md)
+ [Pipelines actions](pipelines-build.md)

# Pipelines overview

An Amazon SageMaker AI pipeline is a series of interconnected steps in a directed acyclic graph (DAG) that are defined using the drag-and-drop UI or [Pipelines SDK](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html). You can also build your pipeline using the [pipeline definition JSON schema](https://aws-sagemaker-mlops.github.io/sagemaker-model-building-pipeline-definition-JSON-schema/). This DAG JSON definition gives information on the requirements and relationships between each step of your pipeline. The structure of a pipeline's DAG is determined by the data dependencies between steps. These data dependencies are created when the properties of a step's output are passed as the input to another step. The following image is an example of a pipeline DAG:

![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pipeline-full.png)


**The example DAG includes the following steps:**

1. `AbaloneProcess`, an instance of the [Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) step, runs a preprocessing script on the data used for training. For example, the script could fill in missing values, normalize numerical data, or split data into the train, validation, and test datasets.

1. `AbaloneTrain`, an instance of the [Training](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-training) step, configures hyperparameters and trains a model from the preprocessed input data.

1. `AbaloneEval`, another instance of the [Processing](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-processing) step, evaluates the model for accuracy. This step shows an example of a data dependency—this step uses the test dataset output of the `AbaloneProcess` step.

1. `AbaloneMSECond` is an instance of a [Condition](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-condition) step which, in this example, checks to make sure the mean-square-error result of model evaluation is below a certain limit. If the model does not meet the criteria, the pipeline run stops.

1. If the model meets the criteria, the pipeline run proceeds with the following steps:

   1. `AbaloneRegisterModel`, where SageMaker AI calls a [RegisterModel](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-register-model) step to register the model as a versioned model package group into the Amazon SageMaker Model Registry.

   1. `AbaloneCreateModel`, where SageMaker AI calls a [CreateModel](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-create-model) step to create the model in preparation for batch transform. In `AbaloneTransform`, SageMaker AI calls a [Transform](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-transform) step to generate model predictions on a dataset you specify.

The following topics describe fundamental Pipelines concepts. For a tutorial describing the implementation of these concepts, see [Pipelines actions](pipelines-build.md).

**Topics**
+ [Pipeline Structure and Execution](build-and-manage-pipeline.md)
+ [IAM Access Management](build-and-manage-access.md)
+ [Set up cross-account support for Pipelines](build-and-manage-xaccount.md)
+ [Pipeline parameters](build-and-manage-parameters.md)
+ [Pipelines steps](build-and-manage-steps.md)
+ [Lift-and-shift Python code with the @step decorator](pipelines-step-decorator.md)
+ [Pass Data Between Steps](build-and-manage-propertyfile.md)
+ [Caching pipeline steps](pipelines-caching.md)
+ [Retry Policy for Pipeline Steps](pipelines-retry-policy.md)
+ [Selective execution of pipeline steps](pipelines-selective-ex.md)
+ [Baseline calculation, drift detection and lifecycle with ClarifyCheck and QualityCheck steps in Amazon SageMaker Pipelines](pipelines-quality-clarify-baseline-lifecycle.md)
+ [Schedule Pipeline Runs](pipeline-eventbridge.md)
+ [Amazon SageMaker Experiments Integration](pipelines-experiments.md)
+ [Run pipelines using local mode](pipelines-local-mode.md)
+ [Troubleshooting Amazon SageMaker Pipelines](pipelines-troubleshooting.md)

# Pipeline Structure and Execution

**Topics**
+ [Pipeline Structure](#build-and-manage-pipeline-structure)
+ [Pipeline Execution using Parallelism Configuration](#build-and-manage-pipeline-execution)

## Pipeline Structure

An Amazon SageMaker Pipelines instance is composed of a `name`, `parameters`, and `steps`. Pipeline names must be unique within an `(account, region)` pair. All parameters used in step definitions must be defined in the pipeline. The order in which pipeline steps run is determined automatically by their data dependencies on one another. The Pipelines service resolves the relationships between steps in the data dependency DAG to create a series of steps that the execution completes. The following is an example of a pipeline structure.

**Warning**  
When building a pipeline through the visual editor or SageMaker AI Python SDK, do not include sensitive information in pipeline parameters or any step definition field (such as environment variables). These fields are visible whenever they are returned by a `DescribePipeline` request.

```
from sagemaker.workflow.pipeline import Pipeline

pipeline_name = "AbalonePipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_type,
        processing_instance_count,
        training_instance_type,
        model_approval_status,
        input_data,
        batch_data,
    ],
    steps=[step_process, step_train, step_eval, step_cond],
)
```
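The way the service resolves step order from data dependencies can be sketched as a topological sort. The data structures below are illustrative, not the actual Pipelines service implementation.

```python
from collections import deque

def execution_order(steps):
    """steps: mapping of step name -> set of upstream step names it depends on
    (a dependency exists when one step's output is another step's input).
    Returns one valid execution order for the DAG."""
    remaining = {name: set(deps) for name, deps in steps.items()}
    ready = deque(sorted(name for name, deps in remaining.items() if not deps))
    order = []
    while ready:
        name = ready.popleft()
        order.append(name)
        # A step becomes ready once all of its upstream steps have run
        for other, deps in remaining.items():
            if name in deps:
                deps.discard(name)
                if not deps and other not in order and other not in ready:
                    ready.append(other)
    if len(order) != len(remaining):
        raise ValueError("cycle detected: a pipeline DAG must be acyclic")
    return order
```

For example, the Abalone DAG shown earlier would resolve to `AbaloneProcess` first, then `AbaloneTrain`, then `AbaloneEval`, and so on.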

## Pipeline Execution using Parallelism Configuration

By default, a pipeline performs all steps that are available to run in parallel. You can control this behavior by using the `ParallelismConfiguration` property when creating or updating a pipeline, as well as when starting or retrying a pipeline execution. 

Parallelism configurations are applied per execution. For example, if two executions are started, they can each run a maximum of 50 steps concurrently, for a total of 100 concurrently running steps. Also, a `ParallelismConfiguration` specified when starting, retrying, or updating an execution takes precedence over the parallelism configuration defined in the pipeline.

**Example Creating a pipeline execution with `ParallelismConfiguration`**  

```
pipeline = Pipeline(
    name="myPipeline",
    steps=[step_process, step_train]
)

pipeline.create(role, parallelism_config={"MaxParallelExecutionSteps": 50})
```

# IAM Access Management

The following sections describe the AWS Identity and Access Management (IAM) requirements for Amazon SageMaker Pipelines. For an example of how you can implement these permissions, see [Prerequisites](define-pipeline.md#define-pipeline-prereq).

**Topics**
+ [Pipeline Role Permissions](#build-and-manage-role-permissions)
+ [Pipeline Step Permissions](#build-and-manage-step-permissions)
+ [CORS configuration with Amazon S3 buckets](#build-and-manage-cors-s3)
+ [Customize access management for Pipelines jobs](#build-and-manage-step-permissions-prefix)
+ [Customize access to pipeline versions](#build-and-manage-step-permissions-version)
+ [Service Control Policies with Pipelines](#build-and-manage-scp)

## Pipeline Role Permissions


Your pipeline requires an IAM pipeline execution role that is passed to Pipelines when you create a pipeline. The role for the SageMaker AI instance you're using to create the pipeline must have a policy with the `iam:PassRole` permission that specifies the pipeline execution role. This is because the instance needs permission to pass your pipeline execution role to the Pipelines service for use in creating and running pipelines. For more information on IAM roles, see [IAM Roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html).

Your pipeline execution role requires the following permissions:
+ You can use a unique or customized role for any of the SageMaker AI job steps in your pipeline (rather than the pipeline execution role, which is used by default). Make sure that your pipeline execution role has a policy attached with the `iam:PassRole` permission that specifies each of these roles.
+  `Create` and `Describe` permissions for each of the job types in the pipeline. 
+  Amazon S3 permissions to use the `JsonGet` function. You control access to your Amazon S3 resources using resource-based policies and identity-based policies. A resource-based policy is applied to your Amazon S3 bucket and grants Pipelines access to the bucket. An identity-based policy gives your pipeline the ability to make Amazon S3 calls from your account. For more information on resource-based policies and identity-based policies, see [Identity-based policies and resource-based policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_identity-vs-resource.html). 

  ```
  {
      "Action": [
          "s3:GetObject"
      ],
      "Resource": "arn:aws:s3:::<your-bucket-name>/*",
      "Effect": "Allow"
  }
  ```
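If your pipeline steps use customized roles as described above, the pipeline execution role needs permission to pass them. The following is a sketch of such a statement; the account ID and role name are placeholders you would substitute.

```
{
    "Action": "iam:PassRole",
    "Effect": "Allow",
    "Resource": "arn:aws:iam::<account-id>:role/<your-step-role-name>",
    "Condition": {
        "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
    }
}
```

The `iam:PassedToService` condition narrows the permission so the role can only be passed to SageMaker AI, not to other services.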

## Pipeline Step Permissions


Pipelines include steps that run SageMaker AI jobs. In order for the pipeline steps to run these jobs, they require an IAM role in your account that provides access to the needed resources. This role is passed to the SageMaker AI service principal by your pipeline. For more information on IAM roles, see [IAM Roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html). 

By default, each step takes on the pipeline execution role. You can optionally pass a different role to any of the steps in your pipeline. This ensures that the code in each step does not have the ability to impact resources used in other steps unless there is a direct relationship between the two steps specified in the pipeline definition. You pass these roles when defining the processor or estimator for your step. For examples of how to include these roles in these definitions, see the [SageMaker AI Python SDK documentation](https://sagemaker.readthedocs.io/en/stable/overview.html#using-estimators). 

## CORS configuration with Amazon S3 buckets


To ensure your images are imported into Pipelines from an Amazon S3 bucket in a predictable manner, you must add a CORS configuration to the Amazon S3 buckets from which the images are imported. This section provides instructions on how to set the required CORS configuration on your Amazon S3 bucket. The XML `CORSConfiguration` required for Pipelines differs from the one in [CORS Requirement for Input Image Data](sms-cors-update.md); otherwise, you can use the information there to learn more about the CORS requirement for Amazon S3 buckets.

Use the following CORS configuration code for the Amazon S3 buckets that host your images. For instructions on configuring CORS, see [Configuring cross-origin resource sharing (CORS)](https://docs.aws.amazon.com/AmazonS3/latest/user-guide/add-cors-configuration.html) in the Amazon Simple Storage Service User Guide. If you use the Amazon S3 console to add the policy to your bucket, you must use the JSON format.

**JSON**

```
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "PUT"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "Access-Control-Allow-Origin"
        ]
    }
]
```

**XML**

```
<CORSConfiguration>
 <CORSRule>
   <AllowedHeader>*</AllowedHeader>
   <AllowedOrigin>*</AllowedOrigin>
   <AllowedMethod>PUT</AllowedMethod>
   <ExposeHeader>Access-Control-Allow-Origin</ExposeHeader>
 </CORSRule>
</CORSConfiguration>
```

The following GIF demonstrates the instructions found in the Amazon S3 documentation to add a CORS header policy using the Amazon S3 console.

![\[Gif on how to add a CORS header policy using the Amazon S3 console.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/sms/gifs/cors-config.gif)


## Customize access management for Pipelines jobs


You can further customize your IAM policies so selected members in your organization can run any or all pipeline steps. For example, you can give certain users permission to create training jobs, another group of users permission to create processing jobs, and all of your users permission to run the remaining steps. To use this feature, you select a custom string that prefixes your job name. Your admin prepends the permitted ARNs with the prefix, while your data scientist includes this prefix in pipeline instantiations. Because the IAM policy for permitted users contains a job ARN with the specified prefix, subsequent jobs of your pipeline steps have the necessary permissions to proceed. Job prefixing is off by default; to use it, you must toggle on this option in your `Pipeline` class. 

For jobs with prefixing turned off, the job name is formatted as shown and is a concatenation of fields described in the following table:

`pipelines-<executionId>-<stepNamePrefix>-<entityToken>-<failureCount>`


| Field | Definition | 
| --- | --- | 
|  pipelines   |  A static string always prepended. This string identifies the pipeline orchestration service as the job's source.  | 
|  executionId  |  A randomized buffer for the running instance of the pipeline.  | 
|  stepNamePrefix  |  The user-specified step name (given in the `name` argument of the pipeline step), limited to the first 20 characters.  | 
|  entityToken  |  A randomized token to ensure idempotency of the step entity.  | 
|  failureCount  |  The current number of retries attempted to complete the job.  | 

In this case, no custom prefix is prepended to the job name, and the corresponding IAM policy must match this string.

For users who turn on job prefixing, the underlying job name takes the following form, with the custom prefix specified as `MyBaseJobName`:

`<MyBaseJobName>-<executionId>-<entityToken>-<failureCount>`

The custom prefix replaces the static `pipelines` string to help you narrow the selection of users who can run the SageMaker AI job as a part of a pipeline.
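For illustration, the two name formats described above can be reconstructed with a pair of hypothetical helpers. These mirror the format tables in this section; they are not SageMaker APIs.

```python
def default_job_name(execution_id, step_name, entity_token, failure_count):
    """Reconstruct the default (non-prefixed) job name format."""
    step_prefix = step_name[:20]  # step name is limited to its first 20 characters
    return f"pipelines-{execution_id}-{step_prefix}-{entity_token}-{failure_count}"

def prefixed_job_name(base_job_name, execution_id, entity_token, failure_count):
    """Reconstruct the job name format when job prefixing is turned on."""
    return f"{base_job_name}-{execution_id}-{entity_token}-{failure_count}"
```

Comparing the two outputs makes the IAM implication clear: the policy resource must match `pipelines-*` in the default case, or `MyBaseJobName-*` when prefixing is on.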

**Prefix length restrictions**

The job names have internal length constraints specific to individual pipeline steps. This constraint also limits the length of the allowed prefix. The prefix length requirements are as follows:


| Pipeline step | Prefix length | 
| --- | --- | 
|   `[TrainingStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#trainingstep)`, `[ModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#step-collections)`, `[TransformStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#transformstep)`, `[ProcessingStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#processingstep)`, `[ClarifyCheckStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#clarifycheckstep)`, `[QualityCheckStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#qualitycheckstep)`, `[RegisterModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#step-collections)`   |  38  | 
|  `[TuningStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#tuningstep)`, `[AutoML](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#automlstep)`  |  6  | 
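A pre-flight check against the limits in the preceding table might look like the following sketch; `PREFIX_LIMITS` and `validate_prefix` are illustrative names, not SDK APIs.

```python
# Prefix length limits per step type, from the table above
PREFIX_LIMITS = {
    "TrainingStep": 38, "ModelStep": 38, "TransformStep": 38,
    "ProcessingStep": 38, "ClarifyCheckStep": 38, "QualityCheckStep": 38,
    "RegisterModelStep": 38,
    "TuningStep": 6, "AutoMLStep": 6,
}

def validate_prefix(step_type, prefix):
    """Raise if the custom job prefix exceeds the limit for the step type."""
    limit = PREFIX_LIMITS[step_type]
    if len(prefix) > limit:
        raise ValueError(
            f"prefix {prefix!r} exceeds the {limit}-character limit for {step_type}"
        )
    return True
```

Validating prefixes before pipeline creation catches length errors early, rather than at job creation time inside a running pipeline.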

### Apply job prefixes to an IAM policy


Your admin creates IAM policies allowing users of specific prefixes to create jobs. The following example policy permits data scientists to create training jobs if they use the `MyBaseJobName` prefix. 

```
{
    "Action": "sagemaker:CreateTrainingJob",
    "Effect": "Allow",
    "Resource": [
        "arn:aws:sagemaker:region:account-id:*/MyBaseJobName-*"
    ]
}
```

### Apply job prefixes to pipeline instantiations


You specify your prefix with the `base_job_name` argument of the job instance class.

**Note**  
You pass your job prefix with the `base_job_name` argument to the job instance before creating a pipeline step. This job instance contains the necessary information for the job to run as a step in a pipeline. This argument varies depending upon the job instance used. The following list shows which argument to use for each pipeline step type:  
+ `base_job_name` for the `[Estimator](https://sagemaker.readthedocs.io/en/stable/api/training/estimators.html)` (`[TrainingStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#trainingstep)`), `[Processor](https://sagemaker.readthedocs.io/en/stable/api/training/processing.html)` (`[ProcessingStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#processingstep)`), and `[AutoML](https://sagemaker.readthedocs.io/en/stable/api/training/automl.html)` (`[AutoMLStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#automlstep)`) classes
+ `tuning_base_job_name` for the `[Tuner](https://sagemaker.readthedocs.io/en/stable/api/training/tuner.html)` class (`[TuningStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#tuningstep)`)
+ `transform_base_job_name` for the `[Transformer](https://sagemaker.readthedocs.io/en/stable/api/inference/transformer.html)` class (`[TransformStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#transformstep)`)
+ `base_job_name` of `[CheckJobConfig](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#checkjobconfig)` for the `[QualityCheckStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#qualitycheckstep)` (Quality Check) and `[ClarifyCheckStep](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#clarifycheckstep)` (Clarify Check) classes
+ For the `[Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html)` class, the argument used depends on whether you call `create` or `register` on your model before passing the result to `[ModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#step-collections)`:  
  + If you call `create`, the custom prefix comes from the `name` argument when you construct your model (for example, `Model(name=)`)
  + If you call `register`, the custom prefix comes from the `model_package_name` argument of your call to `register` (for example, `my_model.register(model_package_name=)`)

The following example shows how to specify a prefix for a new training job instance.

```
# Create a job instance
xgb_train = Estimator(
    image_uri=image_uri,
    instance_type="ml.m5.xlarge",
    instance_count=1,
    output_path=model_path,
    role=role,
    subnets=["subnet-0ab12c34567de89f0"],
    base_job_name="MyBaseJobName",
    security_group_ids=["sg-1a2bbcc3bd4444e55"],
    tags=[ ... ],
    encrypt_inter_container_traffic=True,
)

# Attach your job instance to a pipeline step
step_train = TrainingStep(
    name="TestTrainingJob",
    estimator=xgb_train, 
    inputs={
        "train": TrainingInput(...), 
        "validation": TrainingInput(...) 
    }
)
```

Job prefixing is off by default. To opt into this feature, use the `use_custom_job_prefix` option of `PipelineDefinitionConfig` as shown in the following snippet:

```
from sagemaker.workflow.pipeline_definition_config import PipelineDefinitionConfig
        
# Create a definition configuration and toggle on custom prefixing
definition_config = PipelineDefinitionConfig(use_custom_job_prefix=True)

# Create a pipeline with a custom prefix
pipeline = Pipeline(
    name="MyJobPrefixedPipeline",
    parameters=[...],
    steps=[...],
    pipeline_definition_config=definition_config
)
```

Create and run your pipeline. The following example creates and runs a pipeline, and also demonstrates how you can turn off job prefixing and rerun your pipeline.

```
pipeline.create(role_arn=sagemaker.get_execution_role())

# Optionally, call definition() to confirm your prefixed job names are in the built JSON
pipeline.definition()
pipeline.start()
      
# To run a pipeline without custom-prefixes, toggle off use_custom_job_prefix, update the pipeline 
# via upsert() or update(), and start a new run
definition_config = PipelineDefinitionConfig(use_custom_job_prefix=False)
pipeline.pipeline_definition_config = definition_config
pipeline.update()
execution = pipeline.start()
```

Similarly, you can toggle the feature on for existing pipelines and start a new run which uses job prefixes.

```
definition_config = PipelineDefinitionConfig(use_custom_job_prefix=True)
pipeline.pipeline_definition_config = definition_config
pipeline.update()
execution = pipeline.start()
```

Finally, you can view your custom-prefixed job by calling `list_steps` on the pipeline execution.

```
steps = execution.list_steps()

# The custom-prefixed training job name appears in the job's ARN
prefixed_training_job_name = steps['PipelineExecutionSteps'][0]['Metadata']['TrainingJob']['Arn']
```
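
The `Arn` value is the full training job ARN; the custom-prefixed job name is its final path component. A minimal sketch of extracting it (the ARN shown is hypothetical):

```python
# Hypothetical training job ARN returned in the step metadata
arn = "arn:aws:sagemaker:us-west-2:111122223333:training-job/MyBaseJobName-abc123"

# The job name, including the custom prefix, is the final path component
prefixed_job_name = arn.split("/")[-1]
print(prefixed_job_name)  # MyBaseJobName-abc123
```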

## Customize access to pipeline versions


You can grant customized access to specific versions of Amazon SageMaker Pipelines by using the `sagemaker:PipelineVersionId` condition key. For example, the following policy grants permission to start pipeline executions or update the pipeline only for pipeline versions with an ID of 6 or greater.

------
#### [ JSON ]

```
{
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "AllowStartPipelineExecution",
        "Effect": "Allow",
        "Action": [
            "sagemaker:StartPipelineExecution",
            "sagemaker:UpdatePipelineVersion"
        ],
        "Resource": "*",
        "Condition": {
            "NumericGreaterThanEquals": {
                "sagemaker:PipelineVersionId": 6
            }
        }
    }
}
```

------

For more information about supported condition keys, see [Condition keys for Amazon SageMaker AI](https://docs.aws.amazon.com//service-authorization/latest/reference/list_amazonsagemaker.html#amazonsagemaker-policy-keys).

## Service Control Policies with Pipelines


Service control policies (SCPs) are a type of organization policy that you can use to manage permissions in your organization. SCPs offer central control over the maximum available permissions for all accounts in your organization. By using Pipelines within your organization, you can ensure that data scientists manage your pipeline executions without having to interact with the AWS console. 

If you're using a VPC with your SCP that restricts access to Amazon S3, you need to take steps to allow your pipeline to access other Amazon S3 resources. 

To allow Pipelines to access Amazon S3 outside of your VPC with the `JsonGet` function, update your organization's SCP to ensure that the role using Pipelines can access Amazon S3. To do this, tag the pipeline execution role and add an exception to the SCP that matches the tag with a condition key. 

**To allow Pipelines to access Amazon S3 outside of your VPC**

1. Create a unique tag for your pipeline execution role following the steps in [Tagging IAM users and roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_tags.html). 

1. Grant an exception in your SCP using the `aws:PrincipalTag` IAM condition key for the tag you created. For more information, see [Creating, updating, and deleting service control policies](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps_create.html). 
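
As a hedged sketch, the resulting SCP statement might resemble the following, shown as a Python dict (the VPC ID and the tag key `SageMakerPipelineRole` are hypothetical placeholders; substitute values from your own organization):

```python
# Hypothetical SCP statement: deny Amazon S3 access from outside the listed
# VPC, EXCEPT for principals that carry the pipeline execution role tag.
# Both StringNotEquals conditions must match for the Deny to apply, so
# tagged roles are exempt from the restriction.
scp_statement = {
    "Effect": "Deny",
    "Action": "s3:*",
    "Resource": "*",
    "Condition": {
        "StringNotEquals": {
            "aws:SourceVpc": ["vpc-0ab12c34567de89f0"],
            "aws:PrincipalTag/SageMakerPipelineRole": "true",
        }
    },
}
```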

# Set up cross-account support for Pipelines

Cross-account support for Amazon SageMaker Pipelines enables you to collaborate on machine learning pipelines with other teams or organizations that operate in different AWS accounts. By setting up cross-account pipeline sharing, you can grant other accounts controlled access to your pipelines so that they can view pipeline details, trigger executions, and monitor runs. The following topic covers how to set up cross-account pipeline sharing, the different permission policies available for shared resources, and how to access and interact with shared pipeline entities through direct API calls to SageMaker AI.

## Set up cross-account pipeline sharing

SageMaker AI uses [AWS Resource Access Manager](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) (AWS RAM) to help you securely share your pipeline entities across accounts. 

### Create a resource share


1. Select **Create a resource share** through the [AWS RAM console](https://console.aws.amazon.com/ram/home).

1. When specifying resource share details, choose the Pipelines resource type and select one or more pipelines that you want to share. When you share a pipeline with any other account, all of its executions are also shared implicitly.

1. Associate permissions with your resource share. Choose either the default read-only permission policy or the extended pipeline execution permission policy. For more detailed information, see [Permission policies for Pipelines resources](#build-and-manage-xaccount-permissions). 
**Note**  
If you select the extended pipeline execution policy, note that any start, stop, and retry commands called by shared accounts use resources in the AWS account that shared the pipeline.

1. Use AWS account IDs to specify the accounts to which you want to grant access to your shared resources.

1. Review your resource share configuration and select **Create resource share**. It may take a few minutes for the resource share and principal associations to complete.

For more information, see [Sharing your AWS resources](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html) in the *AWS Resource Access Manager User Guide*.

### Get responses to your resource share invitation


Once the resource share and principal associations are set, the specified AWS accounts receive an invitation to join the resource share. The AWS accounts must accept the invite to gain access to any shared resources.

For more information on accepting a resource share invite through AWS RAM, see [Using shared AWS resources](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-shared.html) in the *AWS Resource Access Manager User Guide*.

## Permission policies for Pipelines resources


When creating your resource share, choose one of two supported permission policies to associate with the SageMaker AI pipeline resource type. Both policies grant access to any selected pipeline and all of its executions. 

### Default read-only permissions


The `AWSRAMDefaultPermissionSageMakerPipeline` policy allows the following read-only actions:

```
"sagemaker:DescribePipeline"
"sagemaker:DescribePipelineDefinitionForExecution"
"sagemaker:DescribePipelineExecution"
"sagemaker:ListPipelineExecutions"
"sagemaker:ListPipelineExecutionSteps"
"sagemaker:ListPipelineParametersForExecution"
"sagemaker:Search"
```

### Extended pipeline execution permissions


The `AWSRAMPermissionSageMakerPipelineAllowExecution` policy includes all of the read-only permissions from the default policy and also allows shared accounts to start, stop, and retry pipeline executions.

**Note**  
Be mindful of AWS resource usage when using the extended pipeline execution permission policy. With this policy, shared accounts are allowed to start, stop, and retry pipeline executions. Any resources used for shared pipeline executions are consumed by the owner account. 

The extended pipeline execution permission policy allows the following actions:

```
"sagemaker:DescribePipeline"
"sagemaker:DescribePipelineDefinitionForExecution"
"sagemaker:DescribePipelineExecution"
"sagemaker:ListPipelineExecutions"
"sagemaker:ListPipelineExecutionSteps"
"sagemaker:ListPipelineParametersForExecution"
"sagemaker:StartPipelineExecution"
"sagemaker:StopPipelineExecution"
"sagemaker:RetryPipelineExecution"
"sagemaker:Search"
```

## Access shared pipeline entities through direct API calls


Once cross-account pipeline sharing is set up, you can call the following SageMaker API actions using a pipeline ARN:

**Note**  
You can only call API commands if they are included in the permissions associated with your resource share. If you select the `AWSRAMPermissionSageMakerPipelineAllowExecution` policy, then the start, stop, and retry commands use resources in the AWS account that shared the pipeline.
+ [DescribePipeline](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribePipeline.html)
+ [DescribePipelineDefinitionForExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribePipelineDefinitionForExecution.html)
+ [DescribePipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribePipelineExecution.html)
+ [ListPipelineExecutions](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListPipelineExecutions.html)
+ [ListPipelineExecutionSteps](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListPipelineExecutionSteps.html)
+ [ListPipelineParametersForExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListPipelineParametersForExecution.html)
+ [StartPipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_StartPipelineExecution.html)
+ [StopPipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_StopPipelineExecution.html)
+ [RetryPipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_RetryPipelineExecution.html)
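
When calling these actions from a shared account, pass the pipeline's full ARN wherever the API expects a pipeline name. The following sketch illustrates the pattern (the account ID and pipeline name are hypothetical, and the boto3 calls are left as comments because they require an accepted resource share):

```python
# Hypothetical ARN of a pipeline shared by the owner account
pipeline_arn = "arn:aws:sagemaker:us-west-2:111122223333:pipeline/shared-pipeline"

# With the resource share accepted, a shared account passes the ARN in
# place of the pipeline name, for example:
#   sm = boto3.client("sagemaker")
#   sm.describe_pipeline(PipelineName=pipeline_arn)
#   sm.start_pipeline_execution(PipelineName=pipeline_arn)  # extended policy only

# The owner account ID and pipeline name can be read back from the ARN
account_id = pipeline_arn.split(":")[4]
pipeline_name = pipeline_arn.split("/")[-1]
```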

# Pipeline parameters


You can introduce variables into your pipeline definition using parameters. You can reference parameters that you define throughout your pipeline definition. Parameters have a default value, which you can override by specifying parameter values when starting a pipeline execution. The default value must be an instance matching the parameter type. All parameters used in step definitions must be defined in your pipeline definition. This topic describes the parameters that you can define and how to implement them.

Amazon SageMaker Pipelines supports the following parameter types: 
+  `ParameterString` – Representing a string parameter. 
+  `ParameterInteger` – Representing an integer parameter. 
+  `ParameterFloat` – Representing a float parameter.
+  `ParameterBoolean` – Representing a Boolean parameter.

Parameters take the following format:

```
<parameter> = <parameter_type>(
    name="<parameter_name>",
    default_value=<default_value>
)
```

The following example shows a sample parameter implementation.

```
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
    ParameterFloat,
    ParameterBoolean
)

processing_instance_count = ParameterInteger(
    name="ProcessingInstanceCount",
    default_value=1
)
```

You pass the parameter when creating your pipeline as shown in the following example.

```
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[
        processing_instance_count
    ],
    steps=[step_process]
)
```

You can also pass a parameter value that differs from the default value to a pipeline execution, as shown in the following example.

```
execution = pipeline.start(
    parameters=dict(
        ProcessingInstanceCount="2",
        ModelApprovalStatus="Approved"
    )
)
```

You can manipulate parameters with SageMaker Python SDK functions like `[ sagemaker.workflow.functions.Join](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.functions.Join)`. For more information on parameters, see [ SageMaker Pipelines Parameters](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#parameters).

For known limitations of Pipelines Parameters, see *[Limitations - Parameterization](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#parameterization)* in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).
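
For intuition, `Join` defers string construction until the pipeline runs. As a sketch (the parameter name `DataBucket` is hypothetical), a `Join` over a parameter renders into the pipeline definition as a `Std:Join` expression rather than a Python string:

```python
# Conceptual sketch of the expression Join produces in the pipeline
# definition JSON; SageMaker resolves it when the execution runs.
join_expression = {
    "Std:Join": {
        "On": "/",
        "Values": ["s3:/", {"Get": "Parameters.DataBucket"}, "input", "data.csv"],
    }
}

# At execution time, the resolved value would be equivalent to:
resolved = "/".join(["s3:/", "amzn-s3-demo-bucket", "input", "data.csv"])
print(resolved)  # s3://amzn-s3-demo-bucket/input/data.csv
```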

# Pipelines steps

Pipelines are composed of steps. These steps define the actions that the pipeline takes and the relationships between steps using properties. The following page describes the types of steps, their properties, and the relationships between them.

**Topics**
+ [Add a step](build-and-manage-steps-types.md)
+ [Add integration](build-and-manage-steps-integration.md)
+ [Step properties](#build-and-manage-properties)
+ [Step parallelism](#build-and-manage-parallelism)
+ [Data dependency between steps](#build-and-manage-data-dependency)
+ [Custom dependency between steps](#build-and-manage-custom-dependency)
+ [Custom images in a step](#build-and-manage-images)

# Add a step


The following sections describe the requirements of each step type, provide an example implementation of the step, and show how to add the step to a pipeline. These are not working implementations because they don't provide the resources and inputs needed. For a tutorial that implements these steps, see [Pipelines actions](pipelines-build.md).

**Note**  
You can also create a step from your local machine learning code by converting it to a Pipelines step with the `@step` decorator. For more information, see [@step decorator](#step-type-custom).

Amazon SageMaker Pipelines supports the following step types:
+ [Execute code](#step-type-executecode)
+ [Processing](#step-type-processing)
+ [Training](#step-type-training)
+ [Tuning](#step-type-tuning)
+ [AutoML](#step-type-automl)
+ [Model](#step-type-model)
+ [Create model](#step-type-create-model)
+ [Register model](#step-type-register-model)
+ [Deploy model (endpoint)](#step-type-deploy-model-endpoint)
+ [Transform](#step-type-transform)
+ [Condition](#step-type-condition)
+ [Callback](#step-type-callback)
+ [Lambda](#step-type-lambda)
+ [`ClarifyCheck`](#step-type-clarify-check)
+ [`QualityCheck`](#step-type-quality-check)
+ [EMR](#step-type-emr)
+ [Notebook Job](#step-type-notebook-job)
+ [Fail](#step-type-fail)

## @step decorator


If you want to orchestrate a custom ML job that leverages advanced SageMaker AI features or other AWS services in the drag-and-drop Pipelines UI, use the [Execute code step](#step-type-executecode).

You can create a step from local machine learning code using the `@step` decorator. After you test your code, you can convert the function to a SageMaker AI pipeline step by annotating it with the `@step` decorator. Pipelines creates and runs a pipeline when you pass the output of the `@step`-decorated function as a step to your pipeline. You can also create a multi-step DAG pipeline that includes one or more `@step`-decorated functions as well as traditional SageMaker AI pipeline steps. For more details about how to create a step with the `@step` decorator, see [Lift-and-shift Python code with the @step decorator](pipelines-step-decorator.md).

## Execute code step

In the Pipelines drag-and-drop UI, you can use an **Execute code** step to run your own code as a pipeline step. You can upload a Python function, script, or notebook to be executed as part of your pipeline. You should use this step if you want to orchestrate a custom ML job that leverages advanced SageMaker AI features or other AWS services.

The **Execute code** step uploads files to your default Amazon S3 bucket for Amazon SageMaker AI. This bucket might not have the required Cross-Origin Resource Sharing (CORS) permissions set. To learn more about configuring CORS permissions, see [CORS Requirement for Input Image Data](sms-cors-update.md).

The **Execute code** step uses an Amazon SageMaker training job to run your code. Ensure that your IAM role has the `sagemaker:DescribeTrainingJob` and `sagemaker:CreateTrainingJob` API permissions. To learn more about all the required permissions for Amazon SageMaker AI and how to set them up, see [Amazon SageMaker AI API Permissions: Actions, Permissions, and Resources Reference](api-permissions-reference.md).

To add an execute code step to a pipeline using the Pipeline Designer, do the following:

1. Open the Amazon SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Execute code** and drag it to the canvas.

1. In the canvas, choose the **Execute code** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs.

1. You can upload a single file to execute or upload a compressed folder containing multiple artifacts.

1. For single file uploads, you can provide optional parameters for notebooks, Python functions, or scripts.

1. When providing Python functions, a handler must be provided in the format `file.py:<function_name>`.

1. For compressed folder uploads, relative paths to your code must be provided, and you can optionally provide paths to a `requirements.txt` file or initialization script inside the compressed folder.

1. If the canvas includes any step that immediately precedes the **Execute code** step you added, click and drag the cursor from the step to the **Execute code** step to create an edge.

1. If the canvas includes any step that immediately succeeds the **Execute code** step you added, click and drag the cursor from the **Execute code** step to the step to create an edge. Outputs from **Execute code** steps can be referenced for Python functions.
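
The handler format described in the steps above can be sketched as follows (the file name `handler.py` and function name `run` are hypothetical; in the step settings you would reference this handler as `handler.py:run`):

```python
# Hypothetical contents of handler.py for an Execute code step.
def run(input_path: str = "/opt/ml/processing/input") -> str:
    # Custom processing logic goes here; the return value can be
    # referenced by downstream steps.
    return input_path
```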

## Processing step

Use a processing step to create a processing job for data processing. For more information on processing jobs, see [Process Data and Evaluate Models](https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html).

------
#### [ Pipeline Designer ]

To add a processing step to a pipeline using the Pipeline Designer, do the following:

1. Open the Amazon SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. In the left sidebar, choose **Process data** and drag it to the canvas.

1. In the canvas, choose the **Process data** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs. For information about the fields in these tabs, see [ sagemaker.workflow.steps.ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep).

1. If the canvas includes any step that immediately precedes the **Process data** step you added, click and drag the cursor from the step to the **Process data** step to create an edge.

1. If the canvas includes any step that immediately succeeds the **Process data** step you added, click and drag the cursor from the **Process data** step to the step to create an edge.

------
#### [ SageMaker Python SDK ]

A processing step requires a processor, a Python script that defines the processing code, outputs for processing, and job arguments. The following example shows how to create a `ProcessingStep` definition. 

```
from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.workflow.pipeline_context import PipelineSession

pipeline_session = PipelineSession()

sklearn_processor = SKLearnProcessor(framework_version='1.0-1',
                                     role=<role>,
                                     instance_type='ml.m5.xlarge',
                                     instance_count=1,
                                     sagemaker_session=pipeline_session)
```

```
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

inputs = [
    ProcessingInput(source=<input_data>, destination="/opt/ml/processing/input"),
]

outputs = [
    ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
    ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
    ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
]

step_process = ProcessingStep(
    name="AbaloneProcess",
    step_args=sklearn_processor.run(
        inputs=inputs,
        outputs=outputs,
        code="abalone/preprocessing.py"
    )
)
```

**Pass runtime parameters**

The following example shows how to pass runtime parameters from a PySpark processor to a `ProcessingStep`.

```
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.spark.processing import PySparkProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

pipeline_session = PipelineSession()

pyspark_processor = PySparkProcessor(
    framework_version='2.4',
    role=<role>,
    instance_type='ml.m5.xlarge',
    instance_count=1,
    sagemaker_session=pipeline_session,
)

step_args = pyspark_processor.run(
    inputs=[ProcessingInput(source=<input_data>, destination="/opt/ml/processing/input"),],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
    ],
    code="preprocess.py",
    arguments=None,
)


step_process = ProcessingStep(
    name="AbaloneProcess",
    step_args=step_args,
)
```

For more information on processing step requirements, see the [sagemaker.workflow.steps.ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep) documentation. For an in-depth example, see the [Orchestrate Jobs to Train and Evaluate Models with Amazon SageMaker Pipelines](https://github.com/aws/amazon-sagemaker-examples/blob/62de6a1fca74c7e70089d77e36f1356033adbe5f/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.ipynb) example notebook. The *Define a Processing Step for Feature Engineering* section includes more information.

------

## Training step

You use a training step to create a training job to train a model. For more information on training jobs, see [Train a Model with Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html).

A training step requires an estimator, as well as training and validation data inputs.

------
#### [ Pipeline Designer ]

To add a training step to a pipeline using the Pipeline Designer, do the following:

1. Open the Amazon SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Train model** and drag it to the canvas.

1. In the canvas, choose the **Train model** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs. For information about the fields in these tabs, see [ sagemaker.workflow.steps.TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep).

1. If the canvas includes any step that immediately precedes the **Train model** step you added, click and drag the cursor from the step to the **Train model** step to create an edge.

1. If the canvas includes any step that immediately succeeds the **Train model** step you added, click and drag the cursor from the **Train model** step to the step to create an edge.

------
#### [ SageMaker Python SDK ]

The following example shows how to create a `TrainingStep` definition. For more information about training step requirements, see the [sagemaker.workflow.steps.TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep) documentation.

```
from sagemaker.workflow.pipeline_context import PipelineSession

from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TrainingStep

from sagemaker.xgboost.estimator import XGBoost

pipeline_session = PipelineSession()

xgb_estimator = XGBoost(..., sagemaker_session=pipeline_session)

step_args = xgb_estimator.fit(
    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            content_type="text/csv"
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            content_type="text/csv"
        )
    }
)

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=step_args,
)
```

------

## Tuning step

You use a tuning step to create a hyperparameter tuning job, also known as hyperparameter optimization (HPO). A hyperparameter tuning job runs multiple training jobs, with each job producing a model version. For more information on hyperparameter tuning, see [Automatic model tuning with SageMaker AI](automatic-model-tuning.md).

The tuning job is associated with the SageMaker AI experiment for the pipeline, with the training jobs created as trials. For more information, see [Experiments Integration](pipelines-experiments.md).

A tuning step requires a [HyperparameterTuner](https://sagemaker.readthedocs.io/en/stable/api/training/tuner.html) and training inputs. You can retrain previous tuning jobs by specifying the `warm_start_config` parameter of the `HyperparameterTuner`. For more information on hyperparameter tuning and warm start, see [Run a Warm Start Hyperparameter Tuning Job](automatic-model-tuning-warm-start.md).

You use the [`get_top_model_s3_uri`](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TuningStep.get_top_model_s3_uri) method of the [sagemaker.workflow.steps.TuningStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TuningStep) class to get the model artifact from one of the top-performing model versions. For a notebook that shows how to use a tuning step in a SageMaker AI pipeline, see [sagemaker-pipelines-tuning-step.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/tuning-step/sagemaker-pipelines-tuning-step.ipynb).

**Important**  
Tuning steps were introduced in Amazon SageMaker Python SDK v2.48.0 and Amazon SageMaker Studio Classic v3.8.0. You must update Studio Classic before you use a tuning step or the pipeline DAG doesn't display. To update Studio Classic, see [Shut Down and Update Amazon SageMaker Studio Classic](studio-tasks-update-studio.md).

The following example shows how to create a `TuningStep` definition.

```
from sagemaker.workflow.pipeline_context import PipelineSession

from sagemaker.tuner import HyperparameterTuner
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.steps import TuningStep

tuner = HyperparameterTuner(..., sagemaker_session=PipelineSession())
    
step_tuning = TuningStep(
    name="HPTuning",
    step_args=tuner.fit(inputs=TrainingInput(s3_data="s3://amzn-s3-demo-bucket/my-data"))
)
```

**Get the best model version**

The following example shows how to get the best model version from the tuning job using the `get_top_model_s3_uri` method. At most, the top 50 performing versions are available, ranked according to [HyperParameterTuningJobObjective](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HyperParameterTuningJobObjective.html). The `top_k` argument is an index into the versions, where `top_k=0` is the best-performing version and `top_k=49` is the worst-performing.

```
best_model = Model(
    image_uri=image_uri,
    model_data=step_tuning.get_top_model_s3_uri(
        top_k=0,
        s3_bucket=sagemaker_session.default_bucket()
    ),
    ...
)
```

For more information on tuning step requirements, see the [sagemaker.workflow.steps.TuningStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TuningStep) documentation.

## Fine-tuning step

Fine-tuning trains a pretrained foundation model from Amazon SageMaker JumpStart on a new dataset. This process, also known as transfer learning, can produce accurate models with smaller datasets and less training time. When you fine-tune a model, you can use the default dataset or choose your own data. To learn more about fine-tuning a foundation model from JumpStart, see [Fine-Tune a Model](jumpstart-fine-tune.md).

The fine-tuning step uses an Amazon SageMaker training job to customize your model. Ensure that your IAM role has the `sagemaker:DescribeTrainingJob` and `sagemaker:CreateTrainingJob` API permissions to execute the fine-tuning job in your pipeline. To learn more about the required permissions for Amazon SageMaker AI and how to set them up, see [Amazon SageMaker AI API Permissions: Actions, Permissions, and Resources Reference](api-permissions-reference.md).

To add a **Fine-tune model** step to your pipeline using the drag-and-drop editor, follow these steps:

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Fine-tune model** and drag it to the canvas.

1. In the canvas, choose the **Fine-tune model** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs.

1. If the canvas includes any step that immediately precedes the **Fine-tune model** step you added, click and drag the cursor from the step to the **Fine-tune model** step to create an edge.

1. If the canvas includes any step that immediately succeeds the **Fine-tune model** step you added, click and drag the cursor from the **Fine-tune model** step to the step to create an edge.

## AutoML step
AutoML

Use the [AutoML](https://sagemaker.readthedocs.io/en/stable/api/training/automl.html) API to create an AutoML job to automatically train a model. For more information on AutoML jobs, see [Automate model development with Amazon SageMaker Autopilot](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-automate-model-development.html). 

**Note**  
Currently, the AutoML step supports only [ensembling training mode](https://docs.aws.amazon.com/sagemaker/latest/dg/autopilot-model-support-validation.html).

The following example shows how to create a definition using `AutoMLStep`.

```
from sagemaker.automl.automl import AutoML, AutoMLInput
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.automl_step import AutoMLStep

pipeline_session = PipelineSession()

auto_ml = AutoML(...,
    role="<role>",
    target_attribute_name="my_target_attribute_name",
    mode="ENSEMBLING",
    sagemaker_session=pipeline_session) 

input_training = AutoMLInput(
    inputs="s3://amzn-s3-demo-bucket/my-training-data",
    target_attribute_name="my_target_attribute_name",
    channel_type="training",
)
input_validation = AutoMLInput(
    inputs="s3://amzn-s3-demo-bucket/my-validation-data",
    target_attribute_name="my_target_attribute_name",
    channel_type="validation",
)

step_args = auto_ml.fit(
    inputs=[input_training, input_validation]
)

step_automl = AutoMLStep(
    name="AutoMLStep",
    step_args=step_args,
)
```

**Get the best model version**

The AutoML step automatically trains several model candidates. Get the model with the best objective metric from the AutoML job using the `get_best_auto_ml_model` method as follows. You must also pass an IAM `role` with access to the model artifacts.

```
best_model = step_automl.get_best_auto_ml_model(role="<role>")
```

For more information, see the [AutoML](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.automl_step.AutoMLStep) step in the SageMaker Python SDK.

## Model step
`Model`

Use a `ModelStep` to create or register a SageMaker AI model. For more information on `ModelStep` requirements, see the [sagemaker.workflow.model_step.ModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.model_step.ModelStep) documentation.

### Create a model


You can use a `ModelStep` to create a SageMaker AI model. A `ModelStep` requires model artifacts and information about the SageMaker AI instance type that you need to use to create the model. For more information about SageMaker AI models, see [Train a Model with Amazon SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/how-it-works-training.html).

The following example shows how to create a `ModelStep` definition.

```
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import TrainingStep
from sagemaker.model import Model
from sagemaker.workflow.model_step import ModelStep

step_train = TrainingStep(...)
model = Model(
    image_uri=pytorch_estimator.training_image_uri(),
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=PipelineSession(),
    role=role,
)

step_model_create = ModelStep(
   name="MyModelCreationStep",
   step_args=model.create(instance_type="ml.m5.xlarge"),
)
```

### Register a model


You can use a `ModelStep` to register a `sagemaker.model.Model` or a `sagemaker.pipeline.PipelineModel` with the Amazon SageMaker Model Registry. A `PipelineModel` represents an inference pipeline, which is a model composed of a linear sequence of containers that process inference requests. For more information about how to register a model, see [Model Registration Deployment with Model Registry](model-registry.md).

The following example shows how to create a `ModelStep` that registers a `PipelineModel`.

```
import time

from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.sklearn import SKLearnModel
from sagemaker.xgboost import XGBoostModel

pipeline_session = PipelineSession()

code_location = 's3://{0}/{1}/code'.format(bucket_name, prefix)

sklearn_model = SKLearnModel(
   model_data=processing_step.properties.ProcessingOutputConfig.Outputs['model'].S3Output.S3Uri,
   entry_point='inference.py',
   source_dir='sklearn_source_dir/',
   code_location=code_location,
   framework_version='1.0-1',
   role=role,
   sagemaker_session=pipeline_session,
   py_version='py3'
)

xgboost_model = XGBoostModel(
   model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
   entry_point='inference.py',
   source_dir='xgboost_source_dir/',
   code_location=code_location,
   framework_version='0.90-2',
   py_version='py3',
   sagemaker_session=pipeline_session,
   role=role
)

from sagemaker.workflow.model_step import ModelStep
from sagemaker import PipelineModel

pipeline_model = PipelineModel(
   models=[sklearn_model, xgboost_model],
   role=role,
   sagemaker_session=pipeline_session,
)

register_model_step_args = pipeline_model.register(
   content_types=["application/json"],
   response_types=["application/json"],
   inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
   transform_instances=["ml.m5.xlarge"],
   model_package_group_name='sipgroup',
)

step_model_registration = ModelStep(
   name="AbaloneRegisterModel",
   step_args=register_model_step_args,
)
```

## Create model step
`Create model`

You use a Create model step to create a SageMaker AI model. For more information on SageMaker AI models, see [Train a Model with Amazon SageMaker](how-it-works-training.md).

A create model step requires model artifacts and information about the SageMaker AI instance type that you need to use to create the model. The following examples show how to create a Create model step definition. For more information about Create model step requirements, see the [sagemaker.workflow.steps.CreateModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.CreateModelStep) documentation.

------
#### [ Pipeline Designer ]

To add a create model step to your pipeline, do the following:

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Create model** and drag it to the canvas.

1. In the canvas, choose the **Create model** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.steps.CreateModelStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.CreateModelStep).

1. If the canvas includes any step that immediately precedes the **Create model** step you added, click and drag the cursor from the step to the **Create model** step to create an edge.

1. If the canvas includes any step that immediately succeeds the **Create model** step you added, click and drag the cursor from the **Create model** step to the step to create an edge.

------
#### [ SageMaker Python SDK ]

**Important**  
We recommend using [Model step](#step-type-model) to create models as of v2.90.0 of the SageMaker AI Python SDK. `CreateModelStep` will continue to work in previous versions of the SageMaker Python SDK, but is no longer actively supported.

```
from sagemaker.inputs import CreateModelInput
from sagemaker.workflow.steps import CreateModelStep

# best_model is a sagemaker.model.Model obtained earlier in the pipeline
inputs = CreateModelInput(instance_type="ml.m5.xlarge")

step_create_model = CreateModelStep(
    name="AbaloneCreateModel",
    model=best_model,
    inputs=inputs
)
```

------

## Register model step
`Register model`

The Register model step registers a model into the SageMaker Model Registry.

------
#### [ Pipeline Designer ]

To register a model from a pipeline using the Pipeline Designer, do the following:

1. Open the Amazon SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Register model** and drag it to the canvas.

1. In the canvas, choose the **Register model** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.step_collections.RegisterModel](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel).

1. If the canvas includes any step that immediately precedes the **Register model** step you added, click and drag the cursor from the step to the **Register model** step to create an edge.

1. If the canvas includes any step that immediately succeeds the **Register model** step you added, click and drag the cursor from the **Register model** step to the step to create an edge.

------
#### [ SageMaker Python SDK ]

**Important**  
We recommend using [Model step](#step-type-model) to register models as of v2.90.0 of the SageMaker AI Python SDK. `RegisterModel` will continue to work in previous versions of the SageMaker Python SDK, but is no longer actively supported.

You use a `RegisterModel` step to register a [sagemaker.model.Model](https://sagemaker.readthedocs.io/en/stable/api/inference/model.html) or a [sagemaker.pipeline.PipelineModel](https://sagemaker.readthedocs.io/en/stable/api/inference/pipeline.html#pipelinemodel) with the Amazon SageMaker Model Registry. A `PipelineModel` represents an inference pipeline, which is a model composed of a linear sequence of containers that process inference requests.

For more information about how to register a model, see [Model Registration Deployment with Model Registry](model-registry.md). For more information on `RegisterModel` step requirements, see the [sagemaker.workflow.step_collections.RegisterModel](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel) documentation.

The following example shows how to create a `RegisterModel` step that registers a `PipelineModel`.

```
import time
from sagemaker.sklearn import SKLearnModel
from sagemaker.xgboost import XGBoostModel

code_location = 's3://{0}/{1}/code'.format(bucket_name, prefix)

sklearn_model = SKLearnModel(
    model_data=processing_step.properties.ProcessingOutputConfig.Outputs['model'].S3Output.S3Uri,
    entry_point='inference.py',
    source_dir='sklearn_source_dir/',
    code_location=code_location,
    framework_version='1.0-1',
    role=role,
    sagemaker_session=sagemaker_session,
    py_version='py3'
)

xgboost_model = XGBoostModel(
    model_data=training_step.properties.ModelArtifacts.S3ModelArtifacts,
    entry_point='inference.py',
    source_dir='xgboost_source_dir/',
    code_location=code_location,
    framework_version='0.90-2',
    py_version='py3',
    sagemaker_session=sagemaker_session,
    role=role
)

from sagemaker.workflow.step_collections import RegisterModel
from sagemaker import PipelineModel

pipeline_model = PipelineModel(
    models=[sklearn_model, xgboost_model],
    role=role,
    sagemaker_session=sagemaker_session
)

step_register = RegisterModel(
    name="AbaloneRegisterModel",
    model=pipeline_model,
    content_types=["application/json"],
    response_types=["application/json"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name='sipgroup',
)
```

If `model` isn't provided, the register model step requires an estimator as shown in the following example.

```
from sagemaker.workflow.step_collections import RegisterModel

step_register = RegisterModel(
    name="AbaloneRegisterModel",
    estimator=xgb_train,
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    content_types=["text/csv"],
    response_types=["text/csv"],
    inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
    transform_instances=["ml.m5.xlarge"],
    model_package_group_name=model_package_group_name,
    approval_status=model_approval_status,
    model_metrics=model_metrics
)
```

------

## Deploy model (endpoint) step
`Deploy model (endpoint)`

In the Pipeline Designer, use the Deploy model (endpoint) step to deploy your model to an endpoint. You can create a new endpoint or use an existing endpoint. Real-time inference is ideal for inference workloads where you have real-time, interactive, low latency requirements. You can deploy your model to SageMaker AI Hosting services and get a real-time endpoint that can be used for inference. These endpoints are fully managed and support auto-scaling. To learn more about real-time inference in SageMaker AI, see [Real-time inference](realtime-endpoints.md).

Before adding a deploy model step to your pipeline, make sure that your IAM role has the following permissions:
+ `sagemaker:CreateModel`
+ `sagemaker:CreateEndpointConfig`
+ `sagemaker:CreateEndpoint`
+ `sagemaker:UpdateEndpoint`
+ `sagemaker:DescribeModel`
+ `sagemaker:DescribeEndpointConfig`
+ `sagemaker:DescribeEndpoint`

To learn more about all the required permissions for SageMaker AI and how to set them up, see [Amazon SageMaker AI API Permissions: Actions, Permissions, and Resources Reference](api-permissions-reference.md).
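The preceding permissions can be granted with an identity-based policy attached to your role. The following is a minimal sketch of such a policy statement; the account ID and the wildcard `Resource` are placeholders that you would scope down to your own resources.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sagemaker:CreateModel",
        "sagemaker:CreateEndpointConfig",
        "sagemaker:CreateEndpoint",
        "sagemaker:UpdateEndpoint",
        "sagemaker:DescribeModel",
        "sagemaker:DescribeEndpointConfig",
        "sagemaker:DescribeEndpoint"
      ],
      "Resource": "arn:aws:sagemaker:*:111122223333:*"
    }
  ]
}
```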

To add a model deployment step to your Pipeline in the drag-and-drop editor, complete the following steps:

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Deploy model (endpoint)** and drag it to the canvas.

1. In the canvas, choose the **Deploy model (endpoint)** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs.

1. If the canvas includes any step that immediately precedes the **Deploy model (endpoint)** step you added, click and drag the cursor from the step to the **Deploy model (endpoint)** step to create an edge.

1. If the canvas includes any step that immediately succeeds the **Deploy model (endpoint)** step you added, click and drag the cursor from the **Deploy model (endpoint)** step to the step to create an edge.

## Transform step
Transform

You use a transform step for batch transformation to run inference on an entire dataset. For more information about batch transformation, see [Batch transforms with inference pipelines](inference-pipeline-batch.md).

A transform step requires a transformer and the data on which to run batch transformation. The following example shows how to create a Transform step definition. For more information on Transform step requirements, see the [sagemaker.workflow.steps.TransformStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TransformStep) documentation.

------
#### [ Pipeline Designer ]

To add a batch transform step to your pipeline using the drag-and-drop visual editor, do the following:

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Deploy model (batch transform)** and drag it to the canvas.

1. In the canvas, choose the **Deploy model (batch transform)** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.steps.TransformStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TransformStep).

1. If the canvas includes any step that immediately precedes the **Deploy model (batch transform)** step you added, click and drag the cursor from the step to the **Deploy model (batch transform)** step to create an edge.

1. If the canvas includes any step that immediately succeeds the **Deploy model (batch transform)** step you added, click and drag the cursor from the **Deploy model (batch transform)** step to the step to create an edge.

------
#### [ SageMaker Python SDK ]

```
from sagemaker.workflow.pipeline_context import PipelineSession

from sagemaker.transformer import Transformer
from sagemaker.inputs import TransformInput
from sagemaker.workflow.steps import TransformStep

transformer = Transformer(..., sagemaker_session=PipelineSession())

step_transform = TransformStep(
    name="AbaloneTransform",
    step_args=transformer.transform(data="s3://amzn-s3-demo-bucket/my-data"),
)
```

------

## Condition step
Condition

You use a condition step to evaluate the condition of step properties to assess which action should be taken next in the pipeline.

A condition step requires:
+ A list of conditions.
+ A list of steps to run if the condition evaluates to `true`.
+ A list of steps to run if the condition evaluates to `false`.

------
#### [ Pipeline Designer ]

To add a condition step to a pipeline using the Pipeline Designer, do the following:

1. Open the Amazon SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Condition** and drag it to the canvas.

1. In the canvas, choose the **Condition** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.condition_step.ConditionStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.condition_step.ConditionStep).

1. If the canvas includes any step that immediately precedes the **Condition** step you added, click and drag the cursor from the step to the **Condition** step to create an edge.

1. If the canvas includes any step that immediately succeeds the **Condition** step you added, click and drag the cursor from the **Condition** step to the step to create an edge.

------
#### [ SageMaker Python SDK ]

 The following example shows how to create a `ConditionStep` definition. 

**Limitations**
+ Pipelines doesn't support the use of nested condition steps. You can't pass a condition step as the input for another condition step.
+ A condition step can't use identical steps in both branches. If you need the same step functionality in both branches, duplicate the step and give it a different name.

```
from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.functions import JsonGet

cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=evaluation_report,
        json_path="regression_metrics.mse.value"
    ),
    right=6.0
)

step_cond = ConditionStep(
    name="AbaloneMSECond",
    conditions=[cond_lte],
    if_steps=[step_register, step_create_model, step_transform],
    else_steps=[]
)
```

For more information on `ConditionStep` requirements, see the [sagemaker.workflow.condition_step.ConditionStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#conditionstep) API reference. For more information on supported conditions, see *[Amazon SageMaker Pipelines - Conditions](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#conditions)* in the SageMaker AI Python SDK documentation.
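At runtime, `JsonGet` reads the property file that the evaluation step produced and extracts the value at the given JSON path before the comparison is applied. The following standalone sketch (plain Python, not the Pipelines API) illustrates the equivalent lookup against a hypothetical evaluation report:

```
import json

# A hypothetical evaluation report, as an evaluation step might write it
report_json = '{"regression_metrics": {"mse": {"value": 4.6}}}'

def json_get(document, json_path):
    """Walk a dotted JSON path such as 'regression_metrics.mse.value'."""
    value = json.loads(document)
    for key in json_path.split("."):
        value = value[key]
    return value

mse = json_get(report_json, "regression_metrics.mse.value")
print(mse <= 6.0)  # the comparison the ConditionStep above evaluates
```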

------

## Callback step
`Callback`

Use a `Callback` step to add additional processes and AWS services into your workflow that aren't directly provided by Amazon SageMaker Pipelines. When a `Callback` step runs, the following procedure occurs:
+ Pipelines sends a message to a customer-specified Amazon Simple Queue Service (Amazon SQS) queue. The message contains a Pipelines–generated token and a customer-supplied list of input parameters. After sending the message, Pipelines waits for a response from the customer.
+ The customer retrieves the message from the Amazon SQS queue and starts their custom process.
+ When the process finishes, the customer calls one of the following APIs and submits the Pipelines–generated token:
  +  [SendPipelineExecutionStepSuccess](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_SendPipelineExecutionStepSuccess.html), along with a list of output parameters
  +  [SendPipelineExecutionStepFailure](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_SendPipelineExecutionStepFailure.html), along with a failure reason
+ The API call causes Pipelines to either continue the pipeline process or fail the process.

For more information on `Callback` step requirements, see the [sagemaker.workflow.callback_step.CallbackStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.callback_step.CallbackStep) documentation. For a complete solution, see [Extend SageMaker Pipelines to include custom steps using callback steps](https://aws.amazon.com/blogs/machine-learning/extend-amazon-sagemaker-pipelines-to-include-custom-steps-using-callback-steps/).

**Important**  
`Callback` steps were introduced in Amazon SageMaker Python SDK v2.45.0 and Amazon SageMaker Studio Classic v3.6.2. You must update Studio Classic before you use a `Callback` step or the pipeline DAG doesn't display. To update Studio Classic, see [Shut Down and Update Amazon SageMaker Studio Classic](studio-tasks-update-studio.md).

The following sample shows an implementation of the preceding procedure.

```
from sagemaker.workflow.callback_step import CallbackStep

step_callback = CallbackStep(
    name="MyCallbackStep",
    sqs_queue_url="https://sqs.us-east-2.amazonaws.com/012345678901/MyCallbackQueue",
    inputs={...},
    outputs=[...]
)

callback_handler_code = '''
    import boto3
    import json

    def handler(event, context):
        sagemaker_client = boto3.client("sagemaker")

        for record in event["Records"]:
            payload = json.loads(record["body"])
            token = payload["token"]

            # Custom processing

            # Call SageMaker AI to complete the step
            sagemaker_client.send_pipeline_execution_step_success(
                CallbackToken=token,
                OutputParameters=[...]  # a list of {"Name": ..., "Value": ...} pairs
            )
'''
```

**Note**  
Output parameters for `CallbackStep` should not be nested. For example, if you use a nested dictionary as your output parameter, then the dictionary is treated as a single string (ex. `{"output1": "{\"nested_output1\":\"my-output\"}"}`). If you provide a nested value, then when you try to refer to a particular output parameter, SageMaker AI throws a non-retryable client error.
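If you need to pass structured data through a `CallbackStep` output anyway, one workaround is to serialize it to a JSON string on the producer side and parse it on the consumer side. The following standalone sketch illustrates this; the parameter name `output1` is hypothetical:

```
import json

nested = {"nested_output1": "my-output"}

# Producer side: flatten the structure into a single string output parameter
output_parameters = [{"Name": "output1", "Value": json.dumps(nested)}]

# Consumer side: parse the string back into a dictionary
parsed = json.loads(output_parameters[0]["Value"])
print(parsed["nested_output1"])
```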

**Stopping behavior**

A pipeline process doesn't stop while a `Callback` step is running.

When you call [StopPipelineExecution](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_StopPipelineExecution.html) on a pipeline process with a running `Callback` step, Pipelines sends an Amazon SQS message to the SQS queue. The body of the SQS message contains a **Status** field, which is set to `Stopping`. The following shows an example SQS message body.

```
{
  "token": "26vcYbeWsZ",
  "pipelineExecutionArn": "arn:aws:sagemaker:us-east-2:012345678901:pipeline/callback-pipeline/execution/7pinimwddh3a",
  "arguments": {
    "number": 5,
    "stringArg": "some-arg",
    "inputData": "s3://sagemaker-us-west-2-012345678901/abalone/abalone-dataset.csv"
  },
  "status": "Stopping"
}
```

You should add logic to your Amazon SQS message consumer to take any needed action (for example, resource cleanup) upon receipt of the message. Then add a call to `SendPipelineExecutionStepSuccess` or `SendPipelineExecutionStepFailure`.

Only when Pipelines receives one of these calls does it stop the pipeline process.
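Your queue consumer can branch on that `status` field before completing the step. The following standalone sketch parses a received message body and decides which action to take; the helper name and the returned action labels are hypothetical, standing in for your cleanup logic and the eventual `SendPipelineExecutionStepSuccess` or `SendPipelineExecutionStepFailure` call:

```
import json

def plan_callback_action(message_body):
    """Inspect an SQS message body from Pipelines and decide what to do."""
    payload = json.loads(message_body)
    token = payload["token"]
    if payload.get("status") == "Stopping":
        # The pipeline execution is being stopped: clean up resources,
        # then call SendPipelineExecutionStepSuccess or ...Failure
        return {"token": token, "action": "cleanup-then-complete"}
    # Normal case: run the custom process, then complete the step
    return {"token": token, "action": "process-then-complete"}

body = json.dumps({"token": "26vcYbeWsZ", "arguments": {}, "status": "Stopping"})
print(plan_callback_action(body)["action"])
```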

## Lambda step
Lambda

You use a Lambda step to run an AWS Lambda function. You can run an existing Lambda function, or SageMaker AI can create and run a new Lambda function. If you choose to use an existing Lambda function, it must be in the same AWS Region as the SageMaker AI pipeline. For a notebook that shows how to use a Lambda step in a SageMaker AI pipeline, see [sagemaker-pipelines-lambda-step.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/lambda-step/sagemaker-pipelines-lambda-step.ipynb).

**Important**  
Lambda steps were introduced in Amazon SageMaker Python SDK v2.51.0 and Amazon SageMaker Studio Classic v3.9.1. You must update Studio Classic before you use a Lambda step or the pipeline DAG doesn't display. To update Studio Classic, see [Shut Down and Update Amazon SageMaker Studio Classic](studio-tasks-update-studio.md).

SageMaker AI provides the [sagemaker.lambda_helper.Lambda](https://sagemaker.readthedocs.io/en/stable/api/utility/lambda_helper.html) class to create, update, invoke, and delete Lambda functions. `Lambda` has the following signature.

```
Lambda(
    function_arn,       # Only required argument to invoke an existing Lambda function

    # The following arguments are required to create a Lambda function:
    function_name,
    execution_role_arn,
    zipped_code_dir,    # Specify either zipped_code_dir and s3_bucket, OR script
    s3_bucket,          # S3 bucket where zipped_code_dir is uploaded
    script,             # Path of Lambda function script
    handler,            # Lambda handler specified as "lambda_script.lambda_handler"
    timeout,            # Maximum time the Lambda function can run before the lambda step fails
    ...
)
```

The [sagemaker.workflow.lambda_step.LambdaStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.lambda_step.LambdaStep) class has a `lambda_func` argument of type `Lambda`. To invoke an existing Lambda function, the only requirement is to supply the Amazon Resource Name (ARN) of the function to `function_arn`. If you don't supply a value for `function_arn`, you must specify `handler` and one of the following:
+ `zipped_code_dir` – The path of the zipped Lambda function

  `s3_bucket` – Amazon S3 bucket where `zipped_code_dir` is to be uploaded
+ `script` – The path of the Lambda function script file

The following example shows how to create a `Lambda` step definition that invokes an existing Lambda function.

```
from sagemaker.workflow.lambda_step import LambdaStep
from sagemaker.lambda_helper import Lambda

step_lambda = LambdaStep(
    name="ProcessingLambda",
    lambda_func=Lambda(
        function_arn="arn:aws:lambda:us-west-2:012345678910:function:split-dataset-lambda"
    ),
    inputs={
        "s3_bucket": s3_bucket,
        "data_file": data_file
    },
    outputs=[
        "train_file", "test_file"
    ]
)
```

The following example shows how to create a `Lambda` step definition that creates and invokes a Lambda function using a Lambda function script.

```
from sagemaker.workflow.lambda_step import LambdaStep
from sagemaker.lambda_helper import Lambda

step_lambda = LambdaStep(
    name="ProcessingLambda",
    lambda_func=Lambda(
      function_name="split-dataset-lambda",
      execution_role_arn=execution_role_arn,
      script="lambda_script.py",
      handler="lambda_script.lambda_handler",
      ...
    ),
    inputs={
        "s3_bucket": s3_bucket,
        "data_file": data_file
    },
    outputs=[
        "train_file", "test_file"
    ]
)
```

**Inputs and outputs**

If your `Lambda` function has inputs or outputs, these must also be defined in your `Lambda` step.

**Note**  
Input and output parameters should not be nested. For example, if you use a nested dictionary as your output parameter, then the dictionary is treated as a single string (ex. `{"output1": "{\"nested_output1\":\"my-output\"}"}`). If you provide a nested value and try to refer to it later, a non-retryable client error is thrown.

When defining the `Lambda` step, `inputs` must be a dictionary of key-value pairs. Each value of the `inputs` dictionary must be a primitive type (string, integer, or float). Nested objects are not supported. If left undefined, the `inputs` value defaults to `None`.

The `outputs` value must be a list of keys. These keys refer to a dictionary defined in the output of the `Lambda` function. Like `inputs`, these keys must be primitive types, and nested objects are not supported.
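For example, a Lambda function backing the steps above might return a flat dictionary whose keys include those in the step's `outputs` list. This is an illustrative sketch only; the bucket name and object keys are placeholders:

```
def lambda_handler(event, context):
    # Read the flat input parameters passed by the Lambda step
    s3_bucket = event["s3_bucket"]
    data_file = event["data_file"]

    # ... split the dataset here ...

    # Return a flat dictionary; keys must match the step's `outputs` list
    return {
        "statusCode": 200,
        "train_file": f"s3://{s3_bucket}/train/{data_file}",
        "test_file": f"s3://{s3_bucket}/test/{data_file}",
    }

result = lambda_handler({"s3_bucket": "amzn-s3-demo-bucket", "data_file": "abalone.csv"}, None)
print(result["train_file"])
```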

**Timeout and stopping behavior**

The `Lambda` class has a `timeout` argument that specifies the maximum time that the Lambda function can run. The default value is 120 seconds with a maximum value of 10 minutes. If the Lambda function is running when the timeout is met, the Lambda step fails; however, the Lambda function continues to run.

A pipeline process can't be stopped while a Lambda step is running because the Lambda function invoked by the Lambda step can't be stopped. If you stop the process while the Lambda function is running, the pipeline waits for the function to finish or for the timeout to be hit, whichever occurs first, and then stops. If the Lambda function finishes, the pipeline process status is `Stopped`. If the timeout is hit, the pipeline process status is `Failed`.

## ClarifyCheck step
`ClarifyCheck`

You can use the `ClarifyCheck` step to conduct baseline drift checks against previous baselines for bias analysis and model explainability. You can then generate and [register your baselines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html#pipelines-quality-clarify-baseline-calculations) with the `model.register()` method and pass the output of that method to [Model step](#step-type-model) using [`step_args`](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#model-step). These baselines for drift check can be used by Amazon SageMaker Model Monitor for your model endpoints. As a result, you don't need to do a [baseline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-create-baseline.html) suggestion separately.

The `ClarifyCheck` step can also pull baselines for drift check from the model registry. The `ClarifyCheck` step uses the SageMaker Clarify prebuilt container. This container provides a range of model monitoring capabilities, including constraint suggestion and constraint validation against a given baseline. For more information, see [Prebuilt SageMaker Clarify Containers](clarify-processing-job-configure-container.md).

### Configuring the ClarifyCheck step
Configuring `ClarifyCheck`

You can configure the `ClarifyCheck` step to conduct only one of the following check types each time it’s used in a pipeline.
+ Data bias check
+ Model bias check
+ Model explainability check

To do this, set the `clarify_check_config` parameter with one of the following check type values:
+ `DataBiasCheckConfig`
+ `ModelBiasCheckConfig`
+ `ModelExplainabilityCheckConfig`

The `ClarifyCheck` step launches a processing job that runs the SageMaker AI Clarify prebuilt container and requires dedicated [configurations for the check and the processing job](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-configure-processing-jobs.html). `ClarifyCheckConfig` and `CheckJobConfig` are helper functions for these configurations. These helper functions are aligned with how the SageMaker Clarify processing job computes model bias, data bias, or model explainability checks. For more information, see [Run SageMaker Clarify Processing Jobs for Bias Analysis and Explainability](clarify-processing-job-run.md).

### Controlling step behaviors for drift check

The `ClarifyCheck` step requires the following two boolean flags to control its behavior:
+ `skip_check`: Indicates whether the drift check against the previous baseline is skipped. If set to `False`, the previous baseline of the configured check type must be available.
+ `register_new_baseline`: Indicates whether the newly calculated baseline is exposed through the step property `BaselineUsedForDriftCheckConstraints`. If set to `False`, the previous baseline of the configured check type must be available and is exposed through that property instead.

For more information, see [Baseline calculation, drift detection and lifecycle with ClarifyCheck and QualityCheck steps in Amazon SageMaker Pipelines](pipelines-quality-clarify-baseline-lifecycle.md).
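The interaction of the two flags can be sketched in plain Python (an illustrative model, not SDK code; `resolve_drift_check` is a hypothetical helper):

```python
def resolve_drift_check(skip_check, register_new_baseline,
                        previous_baseline, new_baseline):
    """Illustrate which baseline the step checks against and which one
    it exposes via BaselineUsedForDriftCheckConstraints."""
    if not skip_check:
        if previous_baseline is None:
            raise ValueError("skip_check=False requires a previous baseline")
        # The drift check runs against the previous baseline.
        checked_against = previous_baseline
    else:
        checked_against = None  # no drift check performed

    # register_new_baseline controls which baseline the step property exposes.
    exposed = new_baseline if register_new_baseline else previous_baseline
    if exposed is None:
        raise ValueError("register_new_baseline=False requires a previous baseline")
    return checked_against, exposed
```

For example, a regular run with both flags set to `False` checks against the previous baseline and keeps exposing it, while a first run with both flags set to `True` skips the check and exposes the newly calculated baseline.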

### Working with baselines

You can optionally specify the `model_package_group_name` to locate the existing baseline. Then, the `ClarifyCheck` step pulls the `DriftCheckBaselines` from the latest approved model package in the model package group. 

Or, you can provide a previous baseline through the `supplied_baseline_constraints` parameter. If you specify both the `model_package_group_name` and the `supplied_baseline_constraints`, the `ClarifyCheck` step uses the baseline specified by the `supplied_baseline_constraints` parameter.

For more information on `ClarifyCheck` step requirements, see [sagemaker.workflow.clarify\_check\_step.ClarifyCheckStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.clarify_check_step.ClarifyCheckStep) in the *Amazon SageMaker Python SDK*. For an Amazon SageMaker Studio Classic notebook that shows how to use the `ClarifyCheck` step in Pipelines, see [sagemaker-pipeline-model-monitor-clarify-steps.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/model-monitor-clarify-pipelines/sagemaker-pipeline-model-monitor-clarify-steps.ipynb).

**Example Create a `ClarifyCheck` step for data bias check**  

```
from sagemaker.clarify import BiasConfig, DataConfig
from sagemaker.workflow.check_job_config import CheckJobConfig
from sagemaker.workflow.clarify_check_step import DataBiasCheckConfig, ClarifyCheckStep
from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.functions import Join

check_job_config = CheckJobConfig(
    role=role,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    volume_size_in_gb=120,
    sagemaker_session=sagemaker_session,
)

data_bias_data_config = DataConfig(
    s3_data_input_path=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
    s3_output_path=Join(on='/', values=['s3:/', your_bucket, base_job_prefix, ExecutionVariables.PIPELINE_EXECUTION_ID, 'databiascheckstep']),
    label=0,
    dataset_type="text/csv",
    s3_analysis_config_output_path=data_bias_analysis_cfg_output_path,
)

data_bias_config = BiasConfig(
    label_values_or_threshold=[15.0], facet_name=[8], facet_values_or_threshold=[[0.5]]  
)

data_bias_check_config = DataBiasCheckConfig(
    data_config=data_bias_data_config,
    data_bias_config=data_bias_config,
)

data_bias_check_step = ClarifyCheckStep(
    name="DataBiasCheckStep",
    clarify_check_config=data_bias_check_config,
    check_job_config=check_job_config,
    skip_check=False,
    register_new_baseline=False,
    supplied_baseline_constraints="s3://sagemaker-us-west-2-111122223333/baseline/analysis.json",
    model_package_group_name="MyModelPackageGroup"
)
```

## QualityCheck step

Use the `QualityCheck` step to conduct [baseline suggestions](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-create-baseline.html) and drift checks against a previous baseline for data quality or model quality in a pipeline. You can then generate and [register your baselines](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html#pipelines-quality-clarify-baseline-calculations) with the `model.register()` method and pass the output of that method to [Model step](#step-type-model) using [step\_args](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#model-step).

Model Monitor can use these baselines for drift check for your model endpoints so that you don’t need to do a baseline suggestion separately. The `QualityCheck` step can also pull baselines for drift check from the model registry. The `QualityCheck` step leverages the Amazon SageMaker AI Model Monitor prebuilt container. This container has a range of model monitoring capabilities including constraint suggestion, statistics generation, and constraint validation against a baseline. For more information, see [Amazon SageMaker Model Monitor prebuilt container](model-monitor-pre-built-container.md).

### Configuring the QualityCheck step

You can configure the `QualityCheck` step to run only one of the following check types each time it’s used in a pipeline.
+ Data quality check
+ Model quality check

You do this by setting the `quality_check_config` parameter with one of the following check type values:
+ `DataQualityCheckConfig`
+ `ModelQualityCheckConfig`

The `QualityCheck` step launches a processing job that runs the Model Monitor prebuilt container and requires dedicated configurations for the check and the processing job. The `QualityCheckConfig` and `CheckJobConfig` are helper functions for these configurations. These helper functions are aligned with how Model Monitor creates a baseline for the model quality or data quality monitoring. For more information on the Model Monitor baseline suggestions, see [Create a Baseline](model-monitor-create-baseline.md) and [Create a model quality baseline](model-monitor-model-quality-baseline.md).

### Controlling step behaviors for drift check

The `QualityCheck` step requires the following two Boolean flags to control its behavior:
+ `skip_check`: Indicates whether the drift check against the previous baseline is skipped. If set to `False`, the previous baseline of the configured check type must be available.
+ `register_new_baseline`: Indicates whether the newly calculated baseline is exposed through the step properties `BaselineUsedForDriftCheckConstraints` and `BaselineUsedForDriftCheckStatistics`. If set to `False`, the previous baseline of the configured check type must be available and is exposed through those properties instead.

For more information, see [Baseline calculation, drift detection and lifecycle with ClarifyCheck and QualityCheck steps in Amazon SageMaker Pipelines](pipelines-quality-clarify-baseline-lifecycle.md).

### Working with baselines

You can specify a previous baseline directly through the `supplied_baseline_statistics` and `supplied_baseline_constraints` parameters. You can also specify the `model_package_group_name`, and the `QualityCheck` step pulls the `DriftCheckBaselines` from the latest approved model package in the model package group. 

If you specify all of the following parameters, the `QualityCheck` step uses the baseline specified by `supplied_baseline_constraints` and `supplied_baseline_statistics` for the check type of the `QualityCheck` step.
+ `model_package_group_name`
+ `supplied_baseline_constraints`
+ `supplied_baseline_statistics`
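The precedence between a supplied baseline and one pulled from the registry can be sketched as follows (a simplified illustration; `select_baseline` and the `registry` dictionary are hypothetical stand-ins for the model registry lookup):

```python
def select_baseline(supplied_baseline_constraints=None,
                    model_package_group_name=None,
                    registry=None):
    """Sketch of baseline-source precedence: a directly supplied
    baseline always wins over one pulled from the model registry."""
    if supplied_baseline_constraints is not None:
        return supplied_baseline_constraints
    if model_package_group_name is not None:
        # Pull the DriftCheckBaselines registered with the latest
        # approved model package in the group (simulated here).
        registry = registry or {}
        return registry.get(model_package_group_name)
    return None
```

For example, when both sources are configured, the supplied S3 URI is returned; when only the model package group is given, the registry entry is used.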

For more information on `QualityCheck` step requirements, see [sagemaker.workflow.quality\_check\_step.QualityCheckStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.quality_check_step.QualityCheckStep) in the *Amazon SageMaker Python SDK*. For an Amazon SageMaker Studio Classic notebook that shows how to use the `QualityCheck` step in Pipelines, see [sagemaker-pipeline-model-monitor-clarify-steps.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/model-monitor-clarify-pipelines/sagemaker-pipeline-model-monitor-clarify-steps.ipynb). 

**Example Create a `QualityCheck` step for data quality check**  

```
from sagemaker.workflow.check_job_config import CheckJobConfig
from sagemaker.model_monitor.dataset_format import DatasetFormat
from sagemaker.workflow.execution_variables import ExecutionVariables
from sagemaker.workflow.functions import Join
from sagemaker.workflow.quality_check_step import DataQualityCheckConfig, QualityCheckStep

check_job_config = CheckJobConfig(
    role=role,
    instance_count=1,
    instance_type="ml.c5.xlarge",
    volume_size_in_gb=120,
    sagemaker_session=sagemaker_session,
)

data_quality_check_config = DataQualityCheckConfig(
    baseline_dataset=step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
    dataset_format=DatasetFormat.csv(header=False, output_columns_position="START"),
    output_s3_uri=Join(on='/', values=['s3:/', your_bucket, base_job_prefix, ExecutionVariables.PIPELINE_EXECUTION_ID, 'dataqualitycheckstep'])
)

data_quality_check_step = QualityCheckStep(
    name="DataQualityCheckStep",
    skip_check=False,
    register_new_baseline=False,
    quality_check_config=data_quality_check_config,
    check_job_config=check_job_config,
    supplied_baseline_statistics="s3://sagemaker-us-west-2-555555555555/baseline/statistics.json",
    supplied_baseline_constraints="s3://sagemaker-us-west-2-555555555555/baseline/constraints.json",
    model_package_group_name="MyModelPackageGroup"
)
```

## EMR step

Use the Amazon SageMaker Pipelines [EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-overview.html) step to:
+ Process [Amazon EMR steps](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-work-with-steps.html) on a running Amazon EMR cluster.
+ Have the pipeline create and manage an Amazon EMR cluster for you.

For more information about Amazon EMR, see [Getting started with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-gs.html).

The EMR step requires that `EMRStepConfig` include the location of the JAR file used by the Amazon EMR cluster and any arguments to be passed. You also provide the Amazon EMR cluster ID if you want to run the step on a running EMR cluster. You can also pass the cluster configuration to run the EMR step on a cluster that it creates, manages, and terminates for you. The following sections include examples and links to sample notebooks demonstrating both methods.

**Note**  
EMR steps require that the role passed to your pipeline has additional permissions. Attach the [AWS managed policy: `AmazonSageMakerPipelinesIntegrations`](https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam-awsmanpol-pipelines.html#security-iam-awsmanpol-AmazonSageMakerPipelinesIntegrations) to your pipeline role, or ensure that the role includes the permissions in that policy.
If you process an EMR step on a running cluster, you can only use a cluster that is in one of the following states:   
`STARTING`
`BOOTSTRAPPING`
`RUNNING`
`WAITING`
If you process EMR steps on a running cluster, you can have at most 256 EMR steps in a `PENDING` state on an EMR cluster. EMR steps submitted beyond this limit result in pipeline execution failure. You may consider using [Retry Policy for Pipeline Steps](pipelines-retry-policy.md).
You can specify either cluster ID or cluster configuration, but not both.
The EMR step relies on Amazon EventBridge to monitor changes in the EMR step or cluster state. If you process your Amazon EMR job on a running cluster, the EMR step uses the `SageMakerPipelineExecutionEMRStepStatusUpdateRule` rule to monitor EMR step state. If you process your job on a cluster that the EMR step creates, the step uses the `SageMakerPipelineExecutionEMRClusterStatusRule` rule to monitor changes in cluster state. If you see either of these EventBridge rules in your AWS account, do not delete them or else your EMR step may not complete.
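The eligibility rules in the note above can be summarized in a small helper (illustrative only; `can_submit_emr_step` is not part of the SDK):

```python
# Cluster states that accept new EMR steps, per the note above.
ELIGIBLE_STATES = {"STARTING", "BOOTSTRAPPING", "RUNNING", "WAITING"}
MAX_PENDING_EMR_STEPS = 256

def can_submit_emr_step(cluster_state, pending_steps):
    """Check the preconditions for submitting an EMR step to a
    running cluster (a sketch, not actual service logic)."""
    if cluster_state not in ELIGIBLE_STATES:
        return False, f"cluster state {cluster_state} is not eligible"
    if pending_steps >= MAX_PENDING_EMR_STEPS:
        return False, "too many PENDING EMR steps; consider a retry policy"
    return True, "ok"
```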

**Add an Amazon EMR step to your pipeline**

To add an EMR step to your pipeline, do the following:
+ Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).
+ In the left navigation pane, select **Pipelines**.
+ Choose **Create**.
+ Choose **Blank**.
+ In the left sidebar, choose **Process data** and drag it to the canvas.
+ In the canvas, choose the **Process data** step you added.
+ In the right sidebar, under mode, choose **EMR (managed)**.
+ In the right sidebar, complete the forms in the **Setting and Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.emr\_step.EMRStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.emr_step.EMRStep).

**Launch a new job on a running Amazon EMR cluster**

To launch a new job on a running Amazon EMR cluster, pass the cluster ID as a string to the `cluster_id` argument of `EMRStep`. The following example demonstrates this procedure.

```
from sagemaker.workflow.emr_step import EMRStep, EMRStepConfig

emr_config = EMRStepConfig(
    jar="jar-location", # required, path to jar file used
    args=["--verbose", "--force"], # optional list of arguments to pass to the jar
    main_class="com.my.Main1", # optional main class, this can be omitted if jar above has a manifest 
    properties=[ # optional list of Java properties that are set when the step runs
    {
        "key": "mapred.tasktracker.map.tasks.maximum",
        "value": "2"
    },
    {
        "key": "mapreduce.map.sort.spill.percent",
        "value": "0.90"
    },
    {
        "key": "mapreduce.tasktracker.reduce.tasks.maximum",
        "value": "5"
    }
  ]
)

step_emr = EMRStep(
    name="EMRSampleStep", # required
    cluster_id="j-1ABCDEFG2HIJK", # include cluster_id to use a running cluster
    step_config=emr_config, # required
    display_name="My EMR Step",
    description="Pipeline step to execute EMR job"
)
```

For a sample notebook that guides you through a complete example, see [ Pipelines EMR Step With Running EMR Cluster](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/emr-step/sagemaker-pipelines-emr-step-with-running-emr-cluster.ipynb).

**Launch a new job on a new Amazon EMR cluster**

To launch a new job on a new cluster that `EMRStep` creates for you, provide your cluster configuration as a dictionary. The dictionary must have the same structure as a [RunJobFlow](https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html) request. However, do not include the following fields in your cluster configuration:
+ [`Name`]
+ [`Steps`]
+ [`AutoTerminationPolicy`]
+ [`Instances`][`KeepJobFlowAliveWhenNoSteps`]
+ [`Instances`][`TerminationProtected`]

All other `RunJobFlow` arguments are available for use in your cluster configuration. For details about the request syntax, see [RunJobFlow](https://docs.aws.amazon.com/emr/latest/APIReference/API_RunJobFlow.html).

The following example passes a cluster configuration to an EMR step definition. This prompts the step to launch a new job on a new EMR cluster. The EMR cluster configuration in this example includes specifications for primary and core EMR cluster nodes. For more information about Amazon EMR node types, see [ Understand node types: primary, core, and task nodes](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html).

```
from sagemaker.workflow.emr_step import EMRStep, EMRStepConfig

emr_step_config = EMRStepConfig(
    jar="jar-location", # required, path to jar file used
    args=["--verbose", "--force"], # optional list of arguments to pass to the jar
    main_class="com.my.Main1", # optional main class, this can be omitted if jar above has a manifest 
    properties=[ # optional list of Java properties that are set when the step runs
    {
        "key": "mapred.tasktracker.map.tasks.maximum",
        "value": "2"
    },
    {
        "key": "mapreduce.map.sort.spill.percent",
        "value": "0.90"
    },
    {
        "key": "mapreduce.tasktracker.reduce.tasks.maximum",
        "value": "5"
    }
  ]
)

# include your cluster configuration as a dictionary
emr_cluster_config = {
    "Applications": [
        {
            "Name": "Spark", 
        }
    ],
    "Instances":{
        "InstanceGroups":[
            {
                "InstanceRole": "MASTER",
                "InstanceCount": 1,
                "InstanceType": "m5.2xlarge"
            },
            {
                "InstanceRole": "CORE",
                "InstanceCount": 2,
                "InstanceType": "m5.2xlarge"
            }
        ]
    },
    "BootstrapActions":[],
    "ReleaseLabel": "emr-6.6.0",
    "JobFlowRole": "job-flow-role",
    "ServiceRole": "service-role"
}

emr_step = EMRStep(
    name="emr-step",
    cluster_id=None,
    display_name="emr_step",
    description="MyEMRStepDescription",
    step_config=emr_step_config,
    cluster_config=emr_cluster_config
)
```

For a sample notebook that guides you through a complete example, see [ Pipelines EMR Step With Cluster Lifecycle Management](https://github.com/aws/amazon-sagemaker-examples/blob/main/sagemaker-pipelines/tabular/emr-step/sagemaker-pipelines-emr-step-with-cluster-lifecycle-management.ipynb).

## EMR serverless step

To add an EMR serverless step to your pipeline, do the following:
+ Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).
+ In the left navigation pane, select **Pipelines**.
+ Choose **Create**.
+ Choose **Blank**.
+ In the left sidebar, choose **Process data** and drag it to the canvas.
+ In the canvas, choose the **Process data** step you added.
+ In the right sidebar, under mode, choose **EMR (serverless)**.
+ In the right sidebar, complete the forms in the **Setting and Details** tabs.

## Notebook job step

Use a `NotebookJobStep` to run your SageMaker Notebook Job non-interactively as a pipeline step. If you build your pipeline in the Pipelines drag-and-drop UI, use the [Execute code step](#step-type-executecode) to run your notebook. For more information about SageMaker Notebook Jobs, see [SageMaker Notebook Jobs](notebook-auto-run.md).

A `NotebookJobStep` requires at minimum an input notebook, image URI and kernel name. For more information about Notebook Job step requirements and other parameters you can set to customize your step, see [sagemaker.workflow.steps.NotebookJobStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.notebook_job_step.NotebookJobStep).

The following example uses minimum arguments to define a `NotebookJobStep`.

```
from sagemaker.workflow.notebook_job_step import NotebookJobStep


notebook_job_step = NotebookJobStep(
    input_notebook=input_notebook,
    image_uri=image_uri,
    kernel_name=kernel_name
)
```

Your `NotebookJobStep` pipeline step is treated as a SageMaker notebook job. As a result, you can track the execution status in the Studio Classic UI notebook job dashboard by including specific tags with the `tags` argument. For more details about tags to include, see [View your notebook jobs in the Studio UI dashboard](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash).

Also, if you schedule your notebook job using the SageMaker Python SDK, you can only specify certain images to run your notebook job. For more information, see [Image constraints for SageMaker AI Python SDK notebook jobs](notebook-auto-run-constraints.md#notebook-auto-run-constraints-image-sdk).

## Fail step

Use a Fail step to stop an Amazon SageMaker Pipelines execution when a desired condition or state is not achieved. The Fail step also allows you to enter a custom error message, indicating the cause of the pipeline's execution failure.

**Note**  
When a Fail step and other pipeline steps execute at the same time, the pipeline does not terminate until all concurrent steps are completed.

### Limitations for using Fail step
+ You cannot add a Fail step to the `DependsOn` list of other steps. For more information, see [Custom dependency between steps](build-and-manage-steps.md#build-and-manage-custom-dependency).
+ Other steps cannot reference the Fail step. It is *always* the last step in a pipeline's execution.
+ You cannot retry a pipeline execution ending with a Fail step.

You can create the Fail step error message in the form of a static text string. Alternatively, you can also use [Pipeline Parameters](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-parameters.html), a [Join](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html?highlight=Join#sagemaker.workflow.functions.Join) operation, or other [step properties](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#build-and-manage-properties) to create a more informative error message if you use the SDK.

------
#### [ Pipeline Designer ]

To add a Fail step to your pipeline, do the following:

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Fail** and drag it to the canvas.

1. In the canvas, choose the **Fail** step you added.

1. In the right sidebar, complete the forms in the **Setting** and **Details** tabs. For information about the fields in these tabs, see [sagemaker.workflow.fail\_step.FailStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.fail_step.FailStep).

1. If the canvas includes any step that immediately precedes the **Fail** step you added, click and drag the cursor from the step to the **Fail** step to create an edge.

1. If the canvas includes any step that immediately succeeds the **Fail** step you added, click and drag the cursor from the **Fail** step to the step to create an edge.

------
#### [ SageMaker Python SDK ]

**Example**  
The following example code snippet uses a `FailStep` with an `ErrorMessage` configured with Pipeline Parameters and a `Join` operation.  

```
from sagemaker.workflow.fail_step import FailStep
from sagemaker.workflow.functions import Join
from sagemaker.workflow.parameters import ParameterInteger

mse_threshold_param = ParameterInteger(name="MseThreshold", default_value=5)
step_fail = FailStep(
    name="AbaloneMSEFail",
    error_message=Join(
        on=" ", values=["Execution failed due to MSE >", mse_threshold_param]
    ),
)
```

------

# Add integration


MLflow integration allows you to use MLflow with pipelines to select a tracking server or serverless application, choose an experiment, and log metrics.

## Key concepts


**Default app creation** - A default MLflow application will be created when you enter the pipeline visual editor.

**Integrations panel** - A new integrations panel includes MLflow, which you can select and configure.

**Update app and experiment** - The option to override the selected application and experiment during pipeline execution.

## How it works

+ Go to **Pipeline Visual Editor**
+ Choose **Integration** on the toolbar
+ Choose **MLflow**
+ Configure the MLflow app and experiment

## Example screenshots


Integrations side panel

![\[Integrations side panel.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/screenshot-pipeline-1.png)


MLflow configuration

![\[MLflow configuration.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/screenshot-pipeline-2.png)


How to override experiment during pipeline execution

![\[Override the experiment during pipeline execution.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/screenshot-pipeline-3.png)


## Step properties


Use the `properties` attribute to add data dependencies between steps in the pipeline. Pipelines use these data dependencies to construct the DAG from the pipeline definition. These properties can be referenced as placeholder values and are resolved at runtime. 

The `properties` attribute of a Pipelines step matches the object returned by a `Describe` call for the corresponding SageMaker AI job type. For each job type, the `Describe` call returns the following response object:
+ `ProcessingStep` – [DescribeProcessingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeProcessingJob.html)
+ `TrainingStep` – [DescribeTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html)
+ `TransformStep` – [DescribeTransformJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTransformJob.html)

To check which properties are referrable for each step type during data dependency creation, see *[Data Dependency - Property Reference](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#data-dependency-property-reference)* in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

## Step parallelism


When a step does not depend on any other step, it runs immediately upon pipeline execution. However, executing too many pipeline steps in parallel can quickly exhaust available resources. Control the number of concurrent steps for a pipeline execution with `ParallelismConfiguration`.

The following example uses `ParallelismConfiguration` to set the concurrent step limit to five.

```
from sagemaker.workflow.parallelism_config import ParallelismConfiguration

pipeline.create(
    parallelism_config=ParallelismConfiguration(5),
)
```
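To see how a concurrency cap shapes execution order, the following toy scheduler batches ready steps into waves of at most `max_parallel` (a simplified model of the behavior, not SDK code; `schedule_waves` is hypothetical):

```python
def schedule_waves(steps, deps, max_parallel):
    """Simulate how a parallelism cap batches ready steps into
    sequential waves (a toy model of ParallelismConfiguration)."""
    done, waves = set(), []
    while len(done) < len(steps):
        # A step is ready once all of its dependencies have finished.
        ready = [s for s in steps
                 if s not in done and set(deps.get(s, [])) <= done]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        wave = ready[:max_parallel]  # cap the number of concurrent steps
        waves.append(wave)
        done.update(wave)
    return waves
```

With six independent steps and a cap of five, the first wave runs five steps and the sixth waits for the next wave.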

## Data dependency between steps


You define the structure of your DAG by specifying the data relationships between steps. To create data dependencies between steps, pass the properties of one step as the input to another step in the pipeline. The step receiving the input isn't started until after the step providing the input finishes running.

A data dependency uses JsonPath notation in the following format. This format traverses the JSON property file. This means you can append as many *<property>* instances as needed to reach the desired nested property in the file. For more information on JsonPath notation, see the [JsonPath repo](https://github.com/json-path/JsonPath).

```
<step_name>.properties.<property>.<property>
```
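Conceptually, resolving such a placeholder walks the `Describe` response like a path lookup (an illustrative sketch; `resolve_property` is hypothetical and handles only the simple paths shown in this guide):

```python
def resolve_property(describe_response, path):
    """Walk a Describe-style response dict with a dotted path,
    mimicking how a data-dependency placeholder resolves at runtime."""
    node = describe_response
    for part in path.split("."):
        if "[" in part:  # handle Outputs["train_data"]-style access
            name, _, key = part.partition("[")
            node = node[name][key.strip('[]"')]
        else:
            node = node[part]
    return node

# A minimal stand-in for a DescribeProcessingJob response.
response = {
    "ProcessingOutputConfig": {
        "Outputs": {"train_data": {"S3Output": {"S3Uri": "s3://bucket/train"}}}
    }
}
uri = resolve_property(
    response, 'ProcessingOutputConfig.Outputs["train_data"].S3Output.S3Uri')
```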

The following shows how to specify an Amazon S3 bucket using the `ProcessingOutputConfig` property of a processing step.

```
step_process.properties.ProcessingOutputConfig.Outputs["train_data"].S3Output.S3Uri
```

To create the data dependency, pass the bucket to a training step as follows.

```
from sagemaker.workflow.pipeline_context import PipelineSession

sklearn_train = SKLearn(..., sagemaker_session=PipelineSession())

step_train = TrainingStep(
    name="CensusTrain",
    step_args=sklearn_train.fit(inputs=TrainingInput(
        s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
            "train_data"].S3Output.S3Uri
    ))
)
```

To check which properties are referrable for each step type during data dependency creation, see *[Data Dependency - Property Reference](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#data-dependency-property-reference)* in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

## Custom dependency between steps


When you specify a data dependency, Pipelines provides the data connection between the steps. Alternatively, one step can access the data from a previous step without directly using Pipelines. In this case, you can create a custom dependency that tells Pipelines not to start a step until after another step has finished running. You create a custom dependency by specifying a step's `DependsOn` attribute.

As an example, the following defines a step `C` that starts only after both step `A` and step `B` finish running.

```
{
  'Steps': [
    {'Name':'A', ...},
    {'Name':'B', ...},
    {'Name':'C', 'DependsOn': ['A', 'B']}
  ]
}
```

Pipelines throws a validation exception if the dependency would create a cyclic dependency.
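The validation can be pictured as a depth-first search over the `DependsOn` graph (a simplified sketch of the idea, not the actual service code; `validate_no_cycles` is hypothetical):

```python
def validate_no_cycles(steps):
    """Raise if the DependsOn relationships form a cycle, similar to
    the validation Pipelines performs on a pipeline definition."""
    deps = {s["Name"]: s.get("DependsOn", []) for s in steps}
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, in progress, finished
    color = {name: WHITE for name in deps}

    def visit(name):
        color[name] = GRAY
        for dep in deps[name]:
            if color[dep] == GRAY:  # back edge: a cycle
                raise ValueError(f"cyclic dependency involving {dep}")
            if color[dep] == WHITE:
                visit(dep)
        color[name] = BLACK

    for name in deps:
        if color[name] == WHITE:
            visit(name)
```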

The following example creates a training step that starts after a processing step finishes running.

```
processing_step = ProcessingStep(...)
training_step = TrainingStep(...)

training_step.add_depends_on([processing_step])
```

The following example creates a training step that doesn't start until two different processing steps finish running.

```
processing_step_1 = ProcessingStep(...)
processing_step_2 = ProcessingStep(...)

training_step = TrainingStep(...)

training_step.add_depends_on([processing_step_1, processing_step_2])
```

The following provides an alternate way to create the custom dependency.

```
training_step.add_depends_on([processing_step_1])
training_step.add_depends_on([processing_step_2])
```

The following example creates a training step that receives input from one processing step and waits for a different processing step to finish running.

```
processing_step_1 = ProcessingStep(...)
processing_step_2 = ProcessingStep(...)

training_step = TrainingStep(
    ...,
    inputs=TrainingInput(
        s3_data=processing_step_1.properties.ProcessingOutputConfig.Outputs[
            "train_data"
        ].S3Output.S3Uri
    )
)

training_step.add_depends_on([processing_step_2])
```

The following example shows how to retrieve a string list of the custom dependencies of a step.

```
custom_dependencies = training_step.depends_on
```

## Custom images in a step


 You can use any of the available SageMaker AI [Deep Learning Container images](https://github.com/aws/deep-learning-containers/blob/master/available_images.md) when you create a step in your pipeline. 

You can also use your own container with pipeline steps. Because you can’t create an image from within Studio Classic, you must create your image using another method before using it with Pipelines.

To use your own container when creating the steps for your pipeline, include the image URI in the estimator definition. For more information on using your own container with SageMaker AI, see [Using Docker Containers with SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/docker-containers.html).

# Lift-and-shift Python code with the @step decorator


The `@step` decorator is a feature that converts your local machine learning (ML) code into one or more pipeline steps. You can write your ML function as you would for any ML project. Once tested locally or as a training job using the `@remote` decorator, you can convert the function to a SageMaker AI pipeline step by adding a `@step` decorator. You can then pass the output of the `@step`-decorated function call as a step to Pipelines to create and run a pipeline. You can chain a series of functions with the `@step` decorator to create a multi-step directed acyclic graph (DAG) pipeline as well.

The setup to use the `@step` decorator is the same as the setup to use the `@remote` decorator. You can refer to the remote function documentation for details about how to [set up the environment](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator.html#train-remote-decorator-env) and [use a configuration file](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator-config.html) to set defaults. For more information about the `@step` decorator, see [sagemaker.workflow.function\_step.step](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.function_step.step).

To view sample notebooks that demonstrate the use of the `@step` decorator, see [@step decorator sample notebooks](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-pipelines/step-decorator).

The following sections explain how you can annotate your local ML code with a `@step` decorator to create a step, create and run a pipeline using the step, and customize the experience for your use case.

**Topics**
+ [

# Create a pipeline with `@step`-decorated functions
](pipelines-step-decorator-create-pipeline.md)
+ [

# Run a pipeline
](pipelines-step-decorator-run-pipeline.md)
+ [

# Configure your pipeline
](pipelines-step-decorator-cfg-pipeline.md)
+ [

# Best Practices
](pipelines-step-decorator-best.md)
+ [

# Limitations
](pipelines-step-decorator-limit.md)

# Create a pipeline with `@step`-decorated functions


You can create a pipeline by converting Python functions into pipeline steps using the `@step` decorator, creating dependencies between those functions to create a pipeline graph (or directed acyclic graph (DAG)), and passing the leaf nodes of that graph as a list of steps to the pipeline. The following sections explain this procedure in detail with examples.

**Topics**
+ [

## Convert a function to a step
](#pipelines-step-decorator-run-pipeline-convert)
+ [

## Create dependencies between the steps
](#pipelines-step-decorator-run-pipeline-link)
+ [

## Use `ConditionStep` with `@step`-decorated steps
](#pipelines-step-decorator-condition)
+ [

## Define a pipeline using the `DelayedReturn` output of steps
](#pipelines-step-define-delayed)
+ [

## Create a pipeline
](#pipelines-step-decorator-pipeline-create)

## Convert a function to a step


To create a step using the `@step` decorator, annotate the function with `@step`. The following example shows a `@step`-decorated function that preprocesses the data.

```
import pandas

from sagemaker.workflow.function_step import step

@step
def preprocess(raw_data):
    df = pandas.read_csv(raw_data)
    ...
    return processed_dataframe

step_process_result = preprocess(raw_data)
```

When you invoke a `@step`-decorated function, SageMaker AI returns a `DelayedReturn` instance instead of running the function. A `DelayedReturn` instance is a proxy for the actual return of that function. The `DelayedReturn` instance can be passed to another function as an argument or directly to a pipeline instance as a step. For information about the `DelayedReturn` class, see [sagemaker.workflow.function\_step.DelayedReturn](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.function_step.DelayedReturn).

## Create dependencies between the steps


When you create a dependency between two steps, you create a connection between the steps in your pipeline graph. The following sections introduce multiple ways you can create a dependency between your pipeline steps.

### Data dependencies through input arguments


Passing in the `DelayedReturn` output of one function as an input to another function automatically creates a data dependency in the pipeline DAG. In the following example, passing in the `DelayedReturn` output of the `preprocess` function to the `train` function creates a dependency between `preprocess` and `train`.

```
import pandas

from sagemaker.workflow.function_step import step

@step
def preprocess(raw_data):
    df = pandas.read_csv(raw_data)
    ...
    return processed_dataframe

@step
def train(training_data):
    ...
    return trained_model

step_process_result = preprocess(raw_data)
step_train_result = train(step_process_result)
```

The previous example defines a training function that is decorated with `@step`. When this function is invoked, it receives the `DelayedReturn` output of the preprocessing pipeline step as input. Invoking the training function returns another `DelayedReturn` instance. This instance holds the information about all the previous steps defined in that function (that is, the `preprocess` step in this example) which form the pipeline DAG.

In the previous example, the `preprocess` function returns a single value. For more complex return types like lists or tuples, refer to [Limitations](pipelines-step-decorator-limit.md).

### Define custom dependencies


In the previous example, the `train` function received the `DelayedReturn` output of `preprocess` and created a dependency. If you want to define the dependency explicitly without passing the previous step output, use the `add_depends_on` function with the step. You can use the `get_step()` function to retrieve the underlying step from its `DelayedReturn` instance, and then call `add_depends_on()` with the dependency as input. To view the `get_step()` function definition, see [sagemaker.workflow.step\_outputs.get\_step](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_outputs.get_step). The following example shows you how to create a dependency between `preprocess` and `train` using `get_step()` and `add_depends_on()`.

```
from sagemaker.workflow.function_step import step
from sagemaker.workflow.step_outputs import get_step

@step
def preprocess(raw_data):
    df = pandas.read_csv(raw_data)
    ...
    processed_data = ...
    return s3.upload(processed_data)

@step
def train():
    training_data = s3.download(....)
    ...
    return trained_model

step_process_result = preprocess(raw_data)
step_train_result = train()

get_step(step_train_result).add_depends_on([step_process_result])
```

### Pass data to and from a `@step`-decorated function to a traditional pipeline step


You can create a pipeline that includes a `@step`-decorated step and a traditional pipeline step and passes data between them. For example, you can use `ProcessingStep` to process the data and pass its result to the `@step`-decorated training function. In the following example, a `@step`-decorated training step references the output of a processing step.

```
# Define processing step

from sagemaker.sklearn.processing import SKLearnProcessor
from sagemaker.processing import ProcessingInput, ProcessingOutput
from sagemaker.workflow.steps import ProcessingStep

sklearn_processor = SKLearnProcessor(
    framework_version='1.2-1',
    role='arn:aws:iam::123456789012:role/SagemakerExecutionRole',
    instance_type='ml.m5.large',
    instance_count=1,
)

inputs = [
    ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),
]
outputs = [
    ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
    ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
    ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
]

process_step = ProcessingStep(
    name="MyProcessStep",
    step_args=sklearn_processor.run(inputs=inputs, outputs=outputs,code='preprocessing.py'),
)
```

```
# Define a @step-decorated train step which references the 
# output of a processing step

@step
def train(train_data_path, test_data_path):
    ...
    return trained_model
    
step_train_result = train(
   process_step.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri,
   process_step.properties.ProcessingOutputConfig.Outputs["test"].S3Output.S3Uri,
)
```

## Use `ConditionStep` with `@step`-decorated steps


Pipelines supports a `ConditionStep` class, which evaluates the results of preceding steps to decide what action to take in the pipeline. You can use `ConditionStep` with a `@step`-decorated step as well. To use the output of a `@step`-decorated step with `ConditionStep`, pass the output of that step as an argument to `ConditionStep`. In the following example, the condition step receives the output of the `@step`-decorated model evaluation step.

```
# Define steps

@step(name="evaluate")
def evaluate_model():
    # code to evaluate the model
    return {
        "rmse":rmse_value
    }
    
@step(name="register")
def register_model():
    # code to register the model
    ...
```

```
# Define ConditionStep

from sagemaker.workflow.condition_step import ConditionStep
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.fail_step import FailStep

conditionally_register = ConditionStep(
    name="conditional_register",
    conditions=[
        ConditionGreaterThanOrEqualTo(
            # Output of the evaluate step must be json serializable
            left=evaluate_model()["rmse"],
            right=5,
        )
    ],
    if_steps=[FailStep(name="Fail", error_message="Model performance is not good enough")],
    else_steps=[register_model()],
)
```

## Define a pipeline using the `DelayedReturn` output of steps


You define a pipeline the same way whether or not you use a `@step` decorator. When you pass a `DelayedReturn` instance to your pipeline, you don't need to pass a full list of steps to build the pipeline. The SDK automatically infers the previous steps based on the dependencies you define. All of the previous steps of the `Step` objects or `DelayedReturn` objects that you pass to the pipeline are included in the pipeline graph. In the following example, the pipeline receives the `DelayedReturn` object for the `train` function. SageMaker AI adds the `preprocess` step, as a previous step of `train`, to the pipeline graph.

```
from sagemaker.workflow.pipeline import Pipeline

pipeline = Pipeline(
    name="<pipeline-name>",
    steps=[step_train_result],
    sagemaker_session=<sagemaker-session>,
)
```

If there are no data or custom dependencies between the steps and you run multiple steps in parallel, the pipeline graph has more than one leaf node. Pass all of these leaf nodes in a list to the `steps` argument in your pipeline definition, as shown in the following example:

```
@step
def process1():
    ...
    return data
    
@step
def process2():
   ...
   return data
   
step_process1_result = process1()
step_process2_result = process2()

pipeline = Pipeline(
    name="<pipeline-name>",
    steps=[step_process1_result, step_process2_result],
    sagemaker_session=<sagemaker-session>,
)
```

When the pipeline runs, both steps run in parallel.

You only pass the leaf nodes of the graph to the pipeline because each leaf node contains information about all of the previous steps defined through data or custom dependencies. When it compiles the pipeline, SageMaker AI infers all of the steps that form the pipeline graph and adds each of them as a separate step to the pipeline.
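As an illustrative sketch of this behavior (the function bodies and data path are placeholders), a three-step chain can be defined by passing only the final `DelayedReturn` object to the pipeline; SageMaker AI infers `preprocess` and `train` from the data dependencies:

```
from sagemaker.workflow.function_step import step
from sagemaker.workflow.pipeline import Pipeline

@step
def preprocess(raw_data):
    ...
    return processed_data

@step
def train(training_data):
    ...
    return trained_model

@step
def evaluate(model):
    ...
    return metrics

# Chaining the DelayedReturn outputs builds the DAG:
# preprocess -> train -> evaluate
eval_result = evaluate(train(preprocess("s3://amzn-s3-demo-bucket/raw")))

# Only the leaf node is passed; the other two steps are inferred
pipeline = Pipeline(name="three-step-pipeline", steps=[eval_result])
```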

## Create a pipeline


Create a pipeline by calling `pipeline.create()`, as shown in the following snippet. For details about `create()`, see [sagemaker.workflow.pipeline.Pipeline.create](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.pipeline.Pipeline.create).

```
role = "pipeline-role"
pipeline.create(role)
```

When you call `pipeline.create()`, SageMaker AI compiles all of the steps defined as part of the pipeline instance. SageMaker AI uploads the serialized function, arguments, and all the other step-related artifacts to Amazon S3.

Data resides in the S3 bucket according to the following structure:

```
s3_root_uri/
    pipeline_name/
        sm_rf_user_ws/
            workspace.zip  # archive of the current working directory (workdir)
        step_name/
            timestamp/
                arguments/                # serialized function arguments
                function/                 # serialized function
                pre_train_dependencies/   # any dependencies and pre_execution scripts provided for the step       
        execution_id/
            step_name/
                results     # returned output from the serialized function including the model
```

`s3_root_uri` is defined in the SageMaker AI config file and applies to the entire pipeline. If undefined, the default SageMaker AI bucket is used.

**Note**  
Every time SageMaker AI compiles a pipeline, SageMaker AI saves the steps' serialized functions, arguments, and dependencies in a folder timestamped with the current time. This occurs every time you run `pipeline.create()`, `pipeline.update()`, `pipeline.upsert()`, or `pipeline.definition()`.

# Run a pipeline


The following page describes how to run a pipeline with Amazon SageMaker Pipelines, either with SageMaker AI resources or locally.

Start a new pipeline run with the `pipeline.start()` function as you would for a traditional SageMaker AI pipeline run. For information about the `start()` function, see [sagemaker.workflow.pipeline.Pipeline.start](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.pipeline.Pipeline.start).

**Note**  
A step defined using the `@step` decorator runs as a training job. Therefore, be aware of the following:  
+ Instance limits and training job limits in your account. Update your limits accordingly to avoid throttling or resource limit issues.
+ The monetary costs associated with every run of a training step in the pipeline. For more details, refer to [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/).
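As a minimal sketch (assuming `pipeline` is a pipeline object that has already been created), a run can be started and monitored as follows; `wait()` and `describe()` are available on the execution object that `start()` returns:

```
# Start a run and block until it completes
execution = pipeline.start()
execution.wait()

# Inspect the overall status of the run
print(execution.describe()["PipelineExecutionStatus"])
```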

## Retrieve results from a pipeline run locally


To view the result of any step of a pipeline run, use [execution.result()](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.pipeline._PipelineExecution.result), as shown in the following snippet:

```
execution = pipeline.start()
execution.result(step_name="train")
```

**Note**  
Pipelines does not support `execution.result()` in local mode.

You can only retrieve results for one step at a time. If the step name was generated by SageMaker AI, you can retrieve the step name by calling `list_steps` as follows:

```
execution.list_steps()
```
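For example, the step names of a run can be listed and then used to fetch a result. This sketch assumes that `execution` is a completed pipeline execution and that each entry returned by `list_steps()` includes a `StepName` key:

```
# Print the name of every step in the run
for entry in execution.list_steps():
    print(entry["StepName"])

# Fetch the output of one step by name
result = execution.result(step_name="train")
```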

## Run a pipeline locally


You can run a pipeline with `@step`-decorated steps locally as you would for traditional pipeline steps. For details about local mode pipeline runs, see [Run pipelines using local mode](pipelines-local-mode.md). To use local mode, provide a `LocalPipelineSession` instead of a `SageMakerSession` to your pipeline definition, as shown in the following example:

```
from sagemaker.workflow.function_step import step
from sagemaker.workflow.pipeline import Pipeline
from sagemaker.workflow.pipeline_context import LocalPipelineSession

@step
def train():
    training_data = s3.download(....)
    ...
    return trained_model
    
step_train_result = train()

local_pipeline_session = LocalPipelineSession()

local_pipeline = Pipeline(
    name="<pipeline-name>",
    steps=[step_train_result],
    sagemaker_session=local_pipeline_session # needed for local mode
)

local_pipeline.create(role_arn="role_arn")

# pipeline runs locally
execution = local_pipeline.start()
```

# Configure your pipeline


We recommend that you use the SageMaker AI config file to set the defaults for your pipeline. For information about the SageMaker AI configuration file, see [Configuring and using defaults with the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk). Any configuration added to the config file applies to all steps in the pipeline. If you want to override options for any of the steps, provide new values in the `@step` decorator arguments. The following sections describe how to set up the config file.

The `@step` decorator's configuration in the config file is identical to the `@remote` decorator's configuration. To set up the pipeline role ARN and pipeline tags in the config file, use the `Pipeline` section shown in the following snippet:

```
SchemaVersion: '1.0'
SageMaker:
  Pipeline:
    RoleArn: 'arn:aws:iam::555555555555:role/IMRole'
    Tags:
    - Key: 'tag_key'
      Value: 'tag_value'
```

You can override most of the defaults that you set in the configuration file by passing new values to the `@step` decorator. For example, you can override the instance type set in the config file for your preprocessing step, as shown in the following example:

```
@step(instance_type="ml.m5.large")
def preprocess(raw_data):
    df = pandas.read_csv(raw_data)
    ...
    return processed_dataframe
```

A few arguments are not part of the `@step` decorator parameters list—these can be configured for the entire pipeline only through the SageMaker AI configuration file. They are listed as follows:
+ `sagemaker_session` (`sagemaker.session.Session`): The underlying SageMaker AI session to which SageMaker AI delegates service calls. If unspecified, a session is created using a default configuration as follows:

  ```
  SageMaker:
    PythonSDK:
      Modules:
        Session:
          DefaultS3Bucket: 'default_s3_bucket'
          DefaultS3ObjectKeyPrefix: 'key_prefix'
  ```
+ `custom_file_filter` (`CustomFileFilter`): A `CustomFileFilter` object that specifies the local directories and files to include in the pipeline step. If unspecified, this value defaults to `None`. For `custom_file_filter` to take effect, you must set `IncludeLocalWorkDir` to `true`. The following example shows a configuration that ignores all notebook files, as well as files and directories named `data`.

  ```
  SchemaVersion: '1.0'
  SageMaker:
    PythonSDK:
      Modules:
        RemoteFunction:
          IncludeLocalWorkDir: true
          CustomFileFilter: 
            IgnoreNamePatterns: # files or directories to ignore
            - "*.ipynb" # all notebook files
            - "data" # folder or file named "data"
  ```

  For more details about how to use `IncludeLocalWorkDir` with `CustomFileFilter`, see [Using modular code with the @remote decorator](train-remote-decorator-modular.md).
+ `s3_root_uri` (`str`): The root Amazon S3 folder to which SageMaker AI uploads the code archives and data. If unspecified, the default SageMaker AI bucket is used.
+ `s3_kms_key` (`str`): The key used to encrypt the input and output data. You can only configure this argument in the SageMaker AI config file, and the argument applies to all steps defined in the pipeline. If unspecified, the value defaults to `None`. See the following snippet for an example S3 KMS key configuration:

  ```
  SchemaVersion: '1.0'
  SageMaker:
    PythonSDK:
      Modules:
        RemoteFunction:
          S3KmsKeyId: 's3kmskeyid'
          S3RootUri: 's3://amzn-s3-demo-bucket/my-project'
  ```

# Best Practices


The following sections suggest best practices to follow when you use the `@step` decorator for your pipeline steps.

## Use warm pools


For faster pipeline step runs, use the warm pooling functionality provided for training jobs. You can turn on the warm pool functionality by providing the `keep_alive_period_in_seconds` argument to the `@step` decorator as demonstrated in the following snippet:

```
@step(
    keep_alive_period_in_seconds=900
)
def train():
    ...
```

For more information about warm pools, see [SageMaker AI Managed Warm Pools](train-warm-pools.md). 

## Structure your directory


We recommend that you organize your code into modules when you use the `@step` decorator. Put the `pipeline.py` module, in which you invoke the step functions and define the pipeline, at the root of the workspace. The recommended structure is shown as follows:

```
.
├── config.yaml       # the configuration file that defines the infra settings
├── requirements.txt  # dependencies
├── pipeline.py       # invoke @step-decorated functions and define the pipeline here
├── steps/
│   ├── processing.py
│   └── train.py
├── data/
└── test/
```

# Limitations


The following sections outline the limitations that you should be aware of when you use the `@step` decorator for your pipeline steps.

## Function argument limitations


When you pass an input argument to the `@step`-decorated function, the following limitations apply:
+ You can pass the `DelayedReturn`, `Properties` (of steps of other types), `Parameter`, and `ExecutionVariable` objects to `@step`-decorated functions as arguments. But `@step`-decorated functions do not support `JsonGet` and `Join` objects as arguments.
+ You cannot directly access a pipeline variable from a `@step` function. The following example produces an error:

  ```
  param = ParameterInteger(name="<parameter-name>", default_value=10)
  
  @step
  def func():
      print(param)
  
  func() # this raises a SerializationError
  ```
+ You cannot nest a pipeline variable in another object and pass it to a `@step` function. The following example produces an error:

  ```
  param = ParameterInteger(name="<parameter-name>", default_value=10)
  
  @step
  def func(arg):
      print(arg)
  
  func(arg=(param,)) # this raises a SerializationError because param is nested in a tuple
  ```
+ Since inputs and outputs of a function are serialized, there are restrictions on the type of data that can be passed as input or output from a function. See the *Data serialization and deserialization* section of [Invoke a remote function](train-remote-decorator-invocation.md) for more details. The same restrictions apply to `@step`-decorated functions.
+ Any object that has a boto client cannot be serialized, hence you cannot pass such objects as input to or output from a `@step`-decorated function. For example, SageMaker Python SDK client classes such as `Estimator`, `Predictor`, and `Processor` can't be serialized.

## Function imports


You should import the libraries required by the step inside rather than outside the function. If you import them at global scope, you risk an import collision while serializing the function. For example, `sklearn.pipeline.Pipeline` could be overridden by `sagemaker.workflow.pipeline.Pipeline`.
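As a sketch of this pattern (the function body is illustrative), importing inside the `@step`-decorated function keeps names such as `Pipeline` from colliding during serialization:

```
from sagemaker.workflow.function_step import step

@step
def train(training_data_path):
    # Import here, not at module scope, so that a name such as
    # sklearn.pipeline.Pipeline cannot collide with
    # sagemaker.workflow.pipeline.Pipeline when this function is serialized
    import pandas
    from sklearn.linear_model import LogisticRegression

    df = pandas.read_csv(training_data_path)
    ...
```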

## Referencing child members of function return value


If you reference child members of a `@step`-decorated function's return value, the following limitations apply:
+ You can reference the child members with `[]` if the `DelayedReturn` object represents a tuple, list or dict, as shown in the following example:

  ```
  delayed_return[0]
  delayed_return["a_key"]
  delayed_return[1]["a_key"]
  ```
+ You cannot unpack a tuple or list output because the exact length of the underlying tuple or list can't be known when you invoke the function. The following example produces an error:

  ```
  a, b, c = func() # this raises ValueError
  ```
+ You cannot iterate over a `DelayedReturn` object. The following example raises an error:

  ```
  for item in func(): # this raises a NotImplementedError
      ...
  ```
+ You cannot reference arbitrary child members with '`.`'. The following example produces an error:

  ```
  delayed_return.a_child # raises AttributeError
  ```

## Existing pipeline features that are not supported


You cannot use the `@step` decorator with the following pipeline features:
+ [Pipeline step caching](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-caching.html)
+ [Property files](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-propertyfile.html#build-and-manage-propertyfile-property)

# Pass Data Between Steps


When building pipelines with Amazon SageMaker Pipelines, you might need to pass data from one step to the next. For example, you might want to use the model artifacts generated by a training step as input to a model evaluation or deployment step. You can use this functionality to create interdependent pipeline steps and build your ML workflows.

When you need to retrieve information from the output of a pipeline step, you can use `JsonGet`. `JsonGet` helps you extract information from Amazon S3 or property files. The following sections explain methods you can use to extract step outputs with `JsonGet`.

## Pass data between steps with Amazon S3


You can use `JsonGet` in a `ConditionStep` to fetch the JSON output directly from Amazon S3. The Amazon S3 URI can be a `Std:Join` function containing primitive strings, pipeline run variables, or pipeline parameters. The following example shows how you can use `JsonGet` in a `ConditionStep`:

```
# Example json file in s3 bucket generated by a processing_step
{
   "Output": [5, 10]
}

cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name="<step-name>",
        s3_uri="<s3-path-to-json>",
        json_path="Output[1]"
    ),
    right=6.0
)
```

If you are using `JsonGet` with an Amazon S3 path in the condition step, you must explicitly add a dependency between the condition step and the step generating the JSON output. In the following example, the condition step is created with a dependency on the processing step:

```
cond_step = ConditionStep(
        name="<step-name>",
        conditions=[cond_lte],
        if_steps=[fail_step],
        else_steps=[register_model_step],
        depends_on=[processing_step],
)
```

## Pass data between steps with property files


Use property files to store information from the output of a processing step. This is particularly useful when you analyze the results of a processing step to decide how to run a conditional step. The `JsonGet` function processes a property file and enables you to use JsonPath notation to query the property JSON file. For more information about JsonPath notation, see the [JsonPath repo](https://github.com/json-path/JsonPath).

To store a property file for later use, you must first create a `PropertyFile` instance with the following format. The `path` parameter is the name of the JSON file to which the property file is saved. Any `output_name` must match the `output_name` of the `ProcessingOutput` that you define in your processing step. This enables the property file to capture the `ProcessingOutput` in the step.

```
from sagemaker.workflow.properties import PropertyFile

<property_file_instance> = PropertyFile(
    name="<property_file_name>",
    output_name="<processingoutput_output_name>",
    path="<path_to_json_file>"
)
```

When you create your `ProcessingStep` instance, add the `property_files` parameter to list all of the property files that the Amazon SageMaker Pipelines service must index. This saves the property file for later use.

```
property_files=[<property_file_instance>]
```
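Putting the pieces together, the following sketch shows a processing step that registers a property file; the step name, file names, and output name (`evaluation`) are illustrative assumptions, and `sklearn_processor` and `inputs` are assumed to be defined as in earlier examples:

```
from sagemaker.processing import ProcessingOutput
from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.steps import ProcessingStep

# Captures evaluation.json written to the "evaluation" output of the step
evaluation_report = PropertyFile(
    name="EvaluationReport",
    output_name="evaluation",
    path="evaluation.json",
)

step_eval = ProcessingStep(
    name="EvaluateModel",
    step_args=sklearn_processor.run(
        inputs=inputs,
        outputs=[
            ProcessingOutput(
                output_name="evaluation",
                source="/opt/ml/processing/evaluation",
            )
        ],
        code="evaluate.py",
    ),
    property_files=[evaluation_report],
)
```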

To use your property file in a condition step, pass the `property_file` to `JsonGet` in the condition, and use the `json_path` parameter to query the JSON file for the desired property, as shown in the following example:

```
cond_lte = ConditionLessThanOrEqualTo(
    left=JsonGet(
        step_name=step_eval.name,
        property_file=<property_file_instance>,
        json_path="mse"
    ),
    right=6.0
)
```

For more in-depth examples, see *[Property File](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#property-file)* in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

# Caching pipeline steps


In Amazon SageMaker Pipelines, you can use step caching to save time and resources when rerunning pipelines. Step caching reuses the output of a previous successful run of a step (instead of recomputing it) when the step has the same configuration and inputs. This helps you achieve consistent results across pipeline reruns with identical parameters. The following topic shows you how to configure and turn on step caching for your pipelines.

When you use step signature caching, Pipelines tries to find a previous run of your current pipeline step with the same values for certain attributes. If found, Pipelines propagates the outputs from the previous run rather than recomputing the step. The attributes checked are specific to the step type, and are listed in [Default cache key attributes by pipeline step type](pipelines-default-keys.md).

You must opt in to step caching; it is off by default. When you turn on step caching, you must also define a timeout. The timeout defines how old a previous run can be and still remain a candidate for reuse.

Step caching only considers successful runs; it never reuses failed runs. When multiple successful runs exist within the timeout period, Pipelines uses the result from the most recent successful run. If no successful runs match within the timeout period, Pipelines reruns the step. If the executor finds a previous run that meets the criteria but is still in progress, both steps continue running and update the cache if they succeed.

Step caching is scoped to individual pipelines, so you can’t reuse a step from another pipeline even if there is a step signature match.

Step caching is available for the following step types: 
+ [Processing](build-and-manage-steps-types.md#step-type-processing)
+ [Training](build-and-manage-steps-types.md#step-type-training)
+ [Tuning](build-and-manage-steps-types.md#step-type-tuning)
+ [AutoML](build-and-manage-steps-types.md#step-type-automl)
+ [Transform](build-and-manage-steps-types.md#step-type-transform)
+ [`ClarifyCheck`](build-and-manage-steps-types.md#step-type-clarify-check)
+ [`QualityCheck`](build-and-manage-steps-types.md#step-type-quality-check)
+ [EMR](build-and-manage-steps-types.md#step-type-emr)

**Topics**
+ [

# Turn on step caching
](pipelines-caching-enabling.md)
+ [

# Turn off step caching
](pipelines-caching-disabling.md)
+ [

# Default cache key attributes by pipeline step type
](pipelines-default-keys.md)
+ [

# Cached data access control
](pipelines-access-control.md)

# Turn on step caching


To turn on step caching, you must add a `CacheConfig` property to the step definition. `CacheConfig` properties use the following format in the pipeline definition file:

```
{
    "CacheConfig": {
        "Enabled": false,
        "ExpireAfter": "<time>"
    }
}
```

The `Enabled` field indicates whether caching is turned on for the particular step. You can set the field to `true`, which tells SageMaker AI to try to find a previous run of the step with the same attributes. Or, you can set the field to `false`, which tells SageMaker AI to run the step every time the pipeline runs. `ExpireAfter` is a string in [ISO 8601 duration](https://en.wikipedia.org/wiki/ISO_8601#Durations) format that defines the timeout period. The `ExpireAfter` duration can be a year, month, week, day, hour, or minute value. Each value consists of a number followed by a letter indicating the unit of duration. For example:
+ "30d" = 30 days
+ "5y" = 5 years
+ "T16m" = 16 minutes
+ "30dT5h" = 30 days and 5 hours.

The following discussion describes the procedure to turn on caching for new or pre-existing pipelines using the Amazon SageMaker Python SDK.

**Turn on caching for new pipelines**

For new pipelines, initialize a `CacheConfig` instance with `enable_caching=True` and provide it as an input to your pipeline step. The following example turns on caching with a 1-hour timeout period for a training step: 

```
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import CacheConfig
      
cache_config = CacheConfig(enable_caching=True, expire_after="PT1H")
estimator = Estimator(..., sagemaker_session=PipelineSession())

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=estimator.fit(inputs=inputs),
    cache_config=cache_config
)
```

**Turn on caching for pre-existing pipelines**

To turn on caching for pre-existing, already-defined pipelines, turn on the `enable_caching` property for the step, set `expire_after` to a timeout value, and then update the pipeline with `pipeline.upsert()` or `pipeline.update()`. The following code example turns on caching with a 1-hour timeout period for a training step in a pre-existing pipeline:

```
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import CacheConfig
from sagemaker.workflow.pipeline import Pipeline

cache_config = CacheConfig(enable_caching=True, expire_after="PT1H")
estimator = Estimator(..., sagemaker_session=PipelineSession())

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=estimator.fit(inputs=inputs),
    cache_config=cache_config
)

# define pipeline
pipeline = Pipeline(
    steps=[step_train]
)

# additional step for existing pipelines
pipeline.update()
# or, call upsert() to update the pipeline
# pipeline.upsert()
```

Alternatively, update the cache configuration after you have already defined the pipeline, so that the entire change runs as one continuous block of code. The following code sample demonstrates this method:

```
# turn on caching with timeout period of one hour
pipeline.steps[0].cache_config.enable_caching = True 
pipeline.steps[0].cache_config.expire_after = "PT1H" 

# additional step for existing pipelines
pipeline.update()
# or, call upsert() to update the pipeline
# pipeline.upsert()
```

For more detailed code examples and a discussion about how Python SDK parameters affect caching, see [ Caching Configuration](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#caching-configuration) in the Amazon SageMaker Python SDK documentation.

# Turn off step caching


A pipeline step does not rerun if you change any attributes that are not listed in [Default cache key attributes by pipeline step type](pipelines-default-keys.md) for its step type. However, you may decide that you want the pipeline step to rerun anyway. In this case, you need to turn off step caching.

To turn off step caching, set the `Enabled` attribute of the `CacheConfig` property in the step definition to `false`, as shown in the following code snippet:

```
{
    "CacheConfig": {
        "Enabled": false,
        "ExpireAfter": "<time>"
    }
}
```

Note that the `ExpireAfter` attribute is ignored when `Enabled` is `false`.

To turn off caching for a pipeline step using the Amazon SageMaker Python SDK, define the pipeline that contains your pipeline step, turn off the `enable_caching` property, and update the pipeline. When run, the following code example turns off caching for a training step:

```
from sagemaker.workflow.pipeline_context import PipelineSession
from sagemaker.workflow.steps import CacheConfig, TrainingStep
from sagemaker.estimator import Estimator
from sagemaker.workflow.pipeline import Pipeline

cache_config = CacheConfig(enable_caching=False, expire_after="PT1H")
estimator = Estimator(..., sagemaker_session=PipelineSession())

step_train = TrainingStep(
    name="TrainAbaloneModel",
    step_args=estimator.fit(inputs=inputs),
    cache_config=cache_config
)

# define pipeline
pipeline = Pipeline(
    steps=[step_train]
)

# update the pipeline
pipeline.update()
# or, call upsert() to update the pipeline
# pipeline.upsert()
```

Alternatively, turn off the `enable_caching` property after you have already defined the pipeline, allowing one continuous code run. The following code sample demonstrates this solution:

```
# turn off caching for the training step
pipeline.steps[0].cache_config.enable_caching = False

# update the pipeline
pipeline.update()
# or, call upsert() to update the pipeline
# pipeline.upsert()
```

For more detailed code examples and a discussion about how Python SDK parameters affect caching, see [Caching Configuration](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#caching-configuration) in the Amazon SageMaker Python SDK documentation.

# Default cache key attributes by pipeline step type


When deciding whether to reuse a previous pipeline step or rerun the step, Pipelines checks to see if certain attributes have changed. If the set of attributes is different from all previous runs within the timeout period, the step runs again. These attributes include input artifacts, app or algorithm specification, and environment variables. The following list shows each pipeline step type and the attributes that, if changed, initiate a rerun of the step. For more information about which Python SDK parameters are used to create the following attributes, see [ Caching Configuration](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#caching-configuration) in the Amazon SageMaker Python SDK documentation.
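
Conceptually, you can think of the cache lookup as hashing the tracked attributes: if any tracked attribute changes, the key changes and the step reruns. The following is an illustrative sketch only, not SageMaker's actual implementation:

```
import hashlib
import json

def cache_key(step_type, tracked_attributes):
    """Conceptual illustration only (not SageMaker's actual implementation):
    derive a stable key from the step type and its tracked request
    attributes. Any change in a tracked attribute yields a different key,
    which corresponds to a cache miss and a rerun of the step."""
    payload = json.dumps(
        {"type": step_type, "attrs": tracked_attributes}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Under this model, changing a hyperparameter in a training step's `HyperParameters` attribute produces a new key, while changing an untracked attribute leaves the key, and the cached result, unchanged.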

## [Processing step](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateProcessingJob.html)

+ AppSpecification
+ Environment
+ ProcessingInputs. This attribute contains information about the preprocessing script.

  

## [Training step](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html)

+ AlgorithmSpecification
+ CheckpointConfig
+ DebugHookConfig
+ DebugRuleConfigurations
+ Environment
+ HyperParameters
+ InputDataConfig. This attribute contains information about the training script.

  

## [Tuning step](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateHyperParameterTuningJob.html)

+ HyperParameterTuningJobConfig
+ TrainingJobDefinition. This attribute is composed of multiple child attributes, not all of which cause the step to rerun. The child attributes that could incur a rerun (if changed) are:
  + AlgorithmSpecification
  + HyperParameterRanges
  + InputDataConfig
  + StaticHyperParameters
  + TuningObjective
+ TrainingJobDefinitions

  

## [AutoML step](https://docs.aws.amazon.com//sagemaker/latest/APIReference/API_AutoMLJobConfig.html)

+ AutoMLJobConfig. This attribute is composed of multiple child attributes, not all of which cause the step to rerun. The child attributes that could incur a rerun (if changed) are:
  + CompletionCriteria
  + CandidateGenerationConfig
  + DataSplitConfig
  + Mode
+ AutoMLJobObjective
+ InputDataConfig
+ ProblemType

  

## [Transform step](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html)

+ DataProcessing
+ Environment
+ ModelName
+ TransformInput

  

## [ClarifyCheck step](build-and-manage-steps-types.md#step-type-clarify-check)

+ ClarifyCheckConfig
+ CheckJobConfig
+ SkipCheck
+ RegisterNewBaseline
+ ModelPackageGroupName
+ SuppliedBaselineConstraints

  

## [QualityCheck step](build-and-manage-steps-types.md#step-type-quality-check)

+ QualityCheckConfig
+ CheckJobConfig
+ SkipCheck
+ RegisterNewBaseline
+ ModelPackageGroupName
+ SuppliedBaselineConstraints
+ SuppliedBaselineStatistics

  

## [EMR Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-emr)

+ ClusterId
+ StepConfig

  

# Cached data access control


When a SageMaker AI pipeline runs, it caches the parameters and metadata associated with the SageMaker AI jobs launched by the pipeline and saves them for reuse in subsequent runs. This metadata is accessible through a variety of sources in addition to cached pipeline steps, and includes the following types:
+ `Describe*Job` requests
+ CloudWatch Logs
+ CloudWatch Events
+ CloudWatch Metrics
+ SageMaker AI Search

Note that access to each data source in the list is controlled by its own set of IAM permissions. Removing a particular role’s access to one data source does not affect the level of access to the others. For example, an account admin might remove IAM permissions for `Describe*Job` requests from a caller’s role. While the caller can no longer make `Describe*Job` requests, they can still retrieve the metadata from a pipeline run with cached steps as long as they have permission to run the pipeline. If an account admin wants to remove access to the metadata from a particular SageMaker AI job completely, they need to remove permissions for each of the relevant services that provide access to the data. 
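
As an illustration, an IAM deny statement like the following hypothetical policy removes a role's ability to make `Describe*Job` requests for these job types, while leaving the other metadata sources in the list unaffected:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "sagemaker:DescribeTrainingJob",
                "sagemaker:DescribeProcessingJob",
                "sagemaker:DescribeTransformJob"
            ],
            "Resource": "*"
        }
    ]
}
```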

# Retry Policy for Pipeline Steps
Retry Policy

Retry policies help you automatically retry your Pipelines steps after an error occurs. Any pipeline step can encounter exceptions, and exceptions happen for various reasons. In some cases, a retry can resolve these issues. With a retry policy for pipeline steps, you can choose whether to retry a particular pipeline step or not.

The retry policy only supports the following pipeline steps:
+ [Processing step](build-and-manage-steps-types.md#step-type-processing) 
+ [Training step](build-and-manage-steps-types.md#step-type-training) 
+ [Tuning step](build-and-manage-steps-types.md#step-type-tuning) 
+ [AutoML step](build-and-manage-steps-types.md#step-type-automl) 
+ [Create model step](build-and-manage-steps-types.md#step-type-create-model) 
+ [Register model step](build-and-manage-steps-types.md#step-type-register-model) 
+ [Transform step](build-and-manage-steps-types.md#step-type-transform) 
+ [Notebook job step](build-and-manage-steps-types.md#step-type-notebook-job) 

**Note**  
Jobs running inside both the tuning and AutoML steps conduct retries internally and will not retry the `SageMaker.JOB_INTERNAL_ERROR` exception type, even if a retry policy is configured. You can program your own [ Retry Strategy](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_RetryStrategy.html) using the SageMaker API.

## Supported exception types for the retry policy


The retry policy for pipeline steps supports the following exception types:
+ `Step.SERVICE_FAULT`: These exceptions occur when an internal server error or transient error happens when calling downstream services. Pipelines retries on this type of error automatically. With a retry policy, you can override the default retry operation for this exception type.
+ `Step.THROTTLING`: Throttling exceptions can occur while calling the downstream services. Pipelines retries on this type of error automatically. With a retry policy, you can override the default retry operation for this exception type.
+ `SageMaker.JOB_INTERNAL_ERROR`: These exceptions occur when the SageMaker AI job returns `InternalServerError`. In this case, starting a new job may fix a transient issue.
+ `SageMaker.CAPACITY_ERROR`: The SageMaker AI job may encounter Amazon EC2 `InsufficientCapacityErrors`, which leads to the SageMaker AI job’s failure. You can retry by starting a new SageMaker AI job to avoid the issue. 
+ `SageMaker.RESOURCE_LIMIT`: You can exceed the resource limit quota when running a SageMaker AI job. You can wait for a short period and then retry the SageMaker AI job to see if resources have been released.

## The JSON schema for the retry policy


The retry policy for Pipelines has the following JSON schema:

```
"RetryPolicy": {
   "ExceptionType": [String]
   "IntervalSeconds": Integer
   "BackoffRate": Double
   "MaxAttempts": Integer
   "ExpireAfterMin": Integer
}
```
+ `ExceptionType`: This field requires the following exception types in a string array format.
  + `Step.SERVICE_FAULT`
  + `Step.THROTTLING`
  + `SageMaker.JOB_INTERNAL_ERROR`
  + `SageMaker.CAPACITY_ERROR`
  + `SageMaker.RESOURCE_LIMIT`
+ `IntervalSeconds` (optional): The number of seconds before the first retry attempt (1 by default). `IntervalSeconds` has a maximum value of 43200 seconds (12 hours).
+ `BackoffRate` (optional): The multiplier by which the retry interval increases during each attempt (2.0 by default).
+ `MaxAttempts` (optional): A positive integer that represents the maximum number of retry attempts (5 by default). If the error recurs more times than `MaxAttempts` specifies, retries cease and normal error handling resumes. A value of 0 specifies that errors are never retried. `MaxAttempts` has a maximum value of 20.
+ `ExpireAfterMin` (optional): A positive integer that represents the maximum time span for retries. If the error recurs more than `ExpireAfterMin` minutes after the step starts running, retries cease and normal error handling resumes. A value of 0 specifies that errors are never retried. `ExpireAfterMin` has a maximum value of 14,400 minutes (10 days).
**Note**  
Specify either `MaxAttempts` or `ExpireAfterMin`, but not both. If neither is specified, the `MaxAttempts` default applies. If both appear in one policy, the retry policy generates a validation error.
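
To make the backoff behavior concrete, the following sketch (an illustration, not part of the SDK) lists the wait time before each retry attempt, using the schema's default values:

```
def retry_intervals(interval_seconds=1, backoff_rate=2.0, max_attempts=5):
    """Illustrative sketch: the wait before retry attempt n grows by
    backoff_rate each time, starting from interval_seconds. The defaults
    mirror the schema defaults described above."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]
```

With the defaults, the successive waits are 1, 2, 4, 8, and 16 seconds before retries stop after `MaxAttempts` is exhausted.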

# Configuring a retry policy
Retry policy example

While SageMaker Pipelines provide a robust and automated way to orchestrate machine learning workflows, you might encounter failures when you run them. To handle such scenarios gracefully and improve the reliability of your pipelines, you can configure retry policies that define how and when to automatically retry specific steps after encountering an exception. The retry policy allows you to specify the types of exceptions to retry, the maximum number of retry attempts, the interval between retries, and the backoff rate for increasing the retry intervals. The following section provides examples of how to configure a retry policy for a training step in your pipeline, both in JSON and using the SageMaker Python SDK.

The following is an example of a training step with a retry policy.

```
{
    "Steps": [
        {
            "Name": "MyTrainingStep",
            "Type": "Training",
            "RetryPolicies": [
                {
                    "ExceptionType": [
                        "SageMaker.JOB_INTERNAL_ERROR",
                        "SageMaker.CAPACITY_ERROR"
                    ],
                    "IntervalSeconds": 1,
                    "BackoffRate": 2,
                    "MaxAttempts": 5
                }
            ]
        }
    ]
}
```



The following is an example of how to build a `TrainingStep` with a retry policy using the SageMaker Python SDK.

```
from sagemaker.workflow.retry import (
    StepRetryPolicy, 
    StepExceptionTypeEnum,
    SageMakerJobExceptionTypeEnum,
    SageMakerJobStepRetryPolicy
)

step_train = TrainingStep(
    name="MyTrainingStep",
    xxx,
    retry_policies=[
        # override the default step-level retry behavior
        StepRetryPolicy(
            exception_types=[
                StepExceptionTypeEnum.SERVICE_FAULT, 
                StepExceptionTypeEnum.THROTTLING
            ],
            expire_after_mins=5,
            interval_seconds=10,
            backoff_rate=2.0 
        ),
        # retry when the resource limit quota gets exceeded
        SageMakerJobStepRetryPolicy(
            exception_types=[SageMakerJobExceptionTypeEnum.RESOURCE_LIMIT],
            expire_after_mins=120,
            interval_seconds=60,
            backoff_rate=2.0
        ),
        # retry when the job fails due to a transient error or an EC2 insufficient-capacity error
        SageMakerJobStepRetryPolicy(
            failure_reason_types=[
                SageMakerJobExceptionTypeEnum.INTERNAL_ERROR,
                SageMakerJobExceptionTypeEnum.CAPACITY_ERROR,
            ],
            max_attempts=10,
            interval_seconds=30,
            backoff_rate=2.0
        )
    ]
)
```

For more information on configuring retry behavior for certain step types, see *[Amazon SageMaker Pipelines - Retry Policy](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#retry-policy)* in the Amazon SageMaker Python SDK documentation.

# Selective execution of pipeline steps
Selective Execution

As you use Pipelines to create workflows and orchestrate your ML training steps, you might need to undertake multiple experimentation phases. Instead of running the full pipeline each time, you might only want to repeat certain steps. With Pipelines, you can execute pipeline steps selectively. This helps optimize your ML training. Selective execution is useful in the following scenarios: 
+ You want to restart a specific step with updated instance type, hyperparameters, or other variables while keeping the parameters from upstream steps.
+ Your pipeline fails an intermediate step. Previous steps in the execution, such as data preparation or feature extraction, are expensive to rerun. You might need to introduce a fix and rerun certain steps manually to complete the pipeline. 

Using selective execution, you can choose to run any subset of steps as long as they are connected in the directed acyclic graph (DAG) of your pipeline. The following DAG shows an example pipeline workflow:

![\[A directed acyclic graph (DAG) of an example pipeline.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pipeline-full.png)


You can select steps `AbaloneTrain` and `AbaloneEval` in a selective execution, but you cannot select just `AbaloneTrain` and `AbaloneMSECond` steps because these steps are not connected in the DAG. For non-selected steps in the workflow, the selective execution reuses the outputs from a reference pipeline execution rather than rerunning the steps. Also, non-selected steps that are downstream from the selected steps do not run in a selective execution. 
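
The connectivity requirement can be sketched with a toy check (not part of the SageMaker SDK): given the DAG's edges, a selection qualifies for selective execution only if the selected steps form a connected subgraph.

```
from collections import defaultdict

def is_connected_selection(edges, selected):
    """Toy illustration (not part of the SageMaker SDK): return True if the
    selected steps form a connected subgraph of the pipeline DAG, treating
    edges between two selected steps as undirected for reachability."""
    selected = set(selected)
    adjacency = defaultdict(set)
    for upstream, downstream in edges:
        if upstream in selected and downstream in selected:
            adjacency[upstream].add(downstream)
            adjacency[downstream].add(upstream)
    # breadth-unimportant traversal from an arbitrary selected step
    start = next(iter(selected))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node] - seen:
            seen.add(neighbor)
            stack.append(neighbor)
    return seen == selected
```

For the example DAG above, selecting `AbaloneTrain` and `AbaloneEval` passes this check, while selecting `AbaloneTrain` and `AbaloneMSECond` fails it because the intermediate `AbaloneEval` step is not selected.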

If you choose to run a subset of intermediate steps in your pipeline, your steps may depend on previous steps. SageMaker AI needs a reference pipeline execution from which to source these dependencies. For example, if you choose to run the steps `AbaloneTrain` and `AbaloneEval`, you need the outputs from the `AbaloneProcess` step. You can either provide a reference execution ARN or direct SageMaker AI to use the latest pipeline execution, which is the default behavior. If you have a reference execution, you can also build the runtime parameters from your reference run and supply them to your selective execution run with overrides. For details, see [Reuse runtime parameter values from a reference execution](#pipelines-selective-ex-reuse).

To configure a selective execution pipeline run, you provide a `SelectiveExecutionConfig`. If you include an ARN for a reference pipeline execution (with the `source_pipeline_execution_arn` argument), SageMaker AI uses the previous step dependencies from the pipeline execution you provided. If you do not include an ARN and a latest pipeline execution exists, SageMaker AI uses it as a reference by default. If you do not include an ARN and do not want SageMaker AI to use your latest pipeline execution, set `reference_latest_execution` to `False`. The pipeline execution that SageMaker AI ultimately uses as a reference, whether the latest or user-specified, must be in the `Success` or `Failed` state.

The following table summarizes how SageMaker AI chooses a reference execution.


| The `source_pipeline_execution_arn` argument value | The `reference_latest_execution` argument value | The reference execution used | 
| --- | --- | --- | 
| A pipeline ARN | `True` or unspecified | The specified pipeline ARN | 
| A pipeline ARN | `False` | The specified pipeline ARN | 
| null or unspecified | `True` or unspecified | The latest pipeline execution | 
| null or unspecified | `False` | None—in this case, select steps without upstream dependencies | 

For more information about selective execution configuration requirements, see the [sagemaker.workflow.selective\_execution\_config.SelectiveExecutionConfig](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#selective-execution-config) documentation.
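
The selection logic summarized in the preceding table can be sketched as follows (an illustration only, not SDK code):

```
def reference_execution(source_arn=None, reference_latest=None):
    """Toy summary of how SageMaker AI chooses a reference execution.
    Returns the reference used, or None when no reference applies (in
    which case the selected steps must have no upstream dependencies)."""
    if source_arn is not None:
        # a specified ARN always wins, regardless of reference_latest
        return source_arn
    if reference_latest is False:
        return None
    # unspecified or True: fall back to the latest pipeline execution
    return "latest pipeline execution"
```

For example, passing an ARN returns that ARN even when `reference_latest` is `False`, while passing neither argument falls back to the latest pipeline execution.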

The following discussion includes examples for the cases in which you want to specify a pipeline reference execution, use the latest pipeline execution as a reference, or run selective execution without a reference pipeline execution.

## Selective execution with a user-specified pipeline reference


The following example demonstrates a selective execution of the steps `AbaloneTrain` and `AbaloneEval` using a reference pipeline execution.

```
from sagemaker.workflow.selective_execution_config import SelectiveExecutionConfig

selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/123ab12cd3ef", 
    selected_steps=["AbaloneTrain", "AbaloneEval"]
)

selective_execution = pipeline.start(
    execution_display_name=f"Sample-Selective-Execution-1",
    parameters={"MaxDepth":6, "NumRound":60},
    selective_execution_config=selective_execution_config,
)
```

## Selective execution with the latest pipeline execution as a reference


The following example demonstrates a selective execution of the steps `AbaloneTrain` and `AbaloneEval` using the latest pipeline execution as a reference. Since SageMaker AI uses the latest pipeline execution by default, you can optionally set the `reference_latest_execution` argument to `True`.

```
# Prepare a new selective execution without providing source_pipeline_execution_arn.
selective_execution_config = SelectiveExecutionConfig(
    selected_steps=["AbaloneTrain", "AbaloneEval"],
    # optional
    reference_latest_execution=True
)

# Start pipeline execution without source_pipeline_execution_arn
pipeline.start(
    execution_display_name=f"Sample-Selective-Execution-1",
    parameters={"MaxDepth":6, "NumRound":60},
    selective_execution_config=selective_execution_config,
)
```

## Selective execution without a reference pipeline


The following example demonstrates a selective execution of the steps `AbaloneProcess` and `AbaloneTrain` without providing a reference ARN, and with the option to use the latest pipeline run as a reference turned off. SageMaker AI permits this configuration because this subset of steps doesn't depend on previous steps.

```
# Prepare a new selective execution without providing source_pipeline_execution_arn.
selective_execution_config = SelectiveExecutionConfig(
    selected_steps=["AbaloneProcess", "AbaloneTrain"],
    reference_latest_execution=False
)

# Start pipeline execution without source_pipeline_execution_arn
pipeline.start(
    execution_display_name=f"Sample-Selective-Execution-1",
    parameters={"MaxDepth":6, "NumRound":60},
    selective_execution_config=selective_execution_config,
)
```

## Reuse runtime parameter values from a reference execution


You can build the parameters from your reference pipeline execution using `build_parameters_from_execution`, and supply the result to your selective execution pipeline. You can use the original parameters from the reference execution, or apply any overrides using the `parameter_value_overrides` argument.

The following example shows you how to build parameters from a reference execution and apply an override for the `MseThreshold` parameter.

```
# Prepare a new selective execution.
selective_execution_config = SelectiveExecutionConfig(
    source_pipeline_execution_arn="arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/123ab12cd3ef",
    selected_steps=["AbaloneTrain", "AbaloneEval", "AbaloneMSECond"],
)
# Define a new parameters list to test.
new_parameters_mse={
    "MseThreshold": 5,
}

# Build parameters from reference execution and override with new parameters to test.
new_parameters = pipeline.build_parameters_from_execution(
    pipeline_execution_arn="arn:aws:sagemaker:us-west-2:123123123123:pipeline/abalone/execution/123ab12cd3ef",
    parameter_value_overrides=new_parameters_mse
)

# Start pipeline execution with new parameters.
execution = pipeline.start(
    selective_execution_config=selective_execution_config,
    parameters=new_parameters
)
```

# Baseline calculation, drift detection and lifecycle with ClarifyCheck and QualityCheck steps in Amazon SageMaker Pipelines
ClarifyCheck QualityCheck Baselines

The following topic discusses how baselines and model versions evolve in Amazon SageMaker Pipelines when using the [`ClarifyCheck`](build-and-manage-steps-types.md#step-type-clarify-check) and [`QualityCheck`](build-and-manage-steps-types.md#step-type-quality-check) steps.

For the `ClarifyCheck` step, a baseline is a single file that resides in the step properties with the suffix `constraints`. For the `QualityCheck` step, a baseline is a combination of two files that reside in the step properties: one with the suffix `statistics` and the other with the suffix `constraints`. The following topics refer to these properties by a prefix that describes how each is used, and discuss how they shape baseline behavior and lifecycle in these two pipeline steps. For example, the `ClarifyCheck` step always calculates and assigns new baselines in the `CalculatedBaselineConstraints` property, and the `QualityCheck` step does the same in the `CalculatedBaselineConstraints` and `CalculatedBaselineStatistics` properties.

## Baseline calculation and registration for ClarifyCheck and QualityCheck steps
Baseline calculation and registration

Both the `ClarifyCheck` and `QualityCheck` steps always calculate new baselines based on step inputs through the underlying processing job run. These newly calculated baselines are accessed through the properties with the prefix `CalculatedBaseline`. You can record these properties as the `ModelMetrics` of your model package in the [Model step](build-and-manage-steps-types.md#step-type-model). This model package can be registered with five different baselines, one for each check type: data bias, model bias, and model explainability from running the `ClarifyCheck` step, and data quality and model quality from running the `QualityCheck` step. The `register_new_baseline` parameter dictates the value set in the properties with the prefix `BaselineUsedForDriftCheck` after a step runs.

The following table of potential use cases shows different behaviors resulting from the step parameters you can set for the `ClarifyCheck` and `QualityCheck` steps:


| Possible use case that you may consider for selecting this configuration  | `skip_check` / `register_new_baseline` | Does step do a drift check? | Value of step property `CalculatedBaseline` | Value of step property `BaselineUsedForDriftCheck` | 
| --- | --- | --- | --- | --- | 
| You are doing regular retraining with checks enabled to get a new model version, but you *want to carry over the previous baselines* as the `DriftCheckBaselines` in the model registry for your new model version. | False/ False | Drift check runs against existing baselines | New baselines calculated by running the step | Baseline from the latest approved model in Model Registry or the baseline supplied as step parameter | 
| You are doing regular retraining with checks enabled to get a new model version, but you *want to refresh the `DriftCheckBaselines` in the model registry with the newly calculated baselines* for your new model version. | False/ True | Drift check runs against existing baselines | New baselines calculated by running the step | Newly calculated baseline by running the step (value of property CalculatedBaseline) | 
| You are initiating the pipeline to retrain a new model version because there is a violation detected by Amazon SageMaker Model Monitor on an endpoint for a particular type of check, and you want to *skip this type of check against the previous baseline, but carry over the previous baseline as `DriftCheckBaselines` in the model registry* for your new model version. | True/ False | No drift check | New baselines calculated by running the step | Baseline from the latest approved model in the model registry or the baseline supplied as step parameter | 
| This happens in the following cases: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-quality-clarify-baseline-lifecycle.html)  | True/ True | No drift check | New baselines calculated by running the step | Newly calculated baseline by running the step (value of property CalculatedBaseline) | 

**Note**  
If you use scientific notation in your constraint, you need to convert to float. For a preprocessing script example of how to do this, see [Create a Model Quality Baseline](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality-baseline.html).
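
The behavior summarized in the preceding table can be condensed into a small sketch (an illustration only, not SDK code):

```
def baseline_behavior(skip_check, register_new_baseline):
    """Toy summary of the skip_check / register_new_baseline table above:
    returns whether a drift check runs, and which baseline ends up in the
    BaselineUsedForDriftCheck step properties."""
    drift_check = not skip_check
    baseline_used = (
        "newly calculated baseline (CalculatedBaseline)"
        if register_new_baseline
        else "baseline from latest approved model version or supplied parameter"
    )
    return drift_check, baseline_used
```

For instance, `skip_check=False` with `register_new_baseline=True` runs the drift check and refreshes `BaselineUsedForDriftCheck` with the newly calculated baseline.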

When you register a model with the [Model step](build-and-manage-steps-types.md#step-type-model), you can register the `BaselineUsedForDriftCheck` property as `DriftCheckBaselines`. These baseline files can then be used by Model Monitor for model and data quality checks. In addition, these baselines can also be used in the `ClarifyCheck` and `QualityCheck` steps to compare newly trained models against the models already registered in the model registry in future pipeline runs.

## Drift Detection against Previous Baselines in Pipelines
Drift Detection

In the case of the `QualityCheck` step, when you initiate the pipeline for regular retraining to get a new model version, you may not want to run the training step if the data quality or data bias checks report violations, as described in [Schema for Violations (constraint\_violations.json file)](model-monitor-interpreting-violations.md), against the baselines of your previously approved model version. You also may not want to register the newly trained model version if the model quality, model bias, or model explainability violates the registered baseline of your previously approved model version when running the `ClarifyCheck` step. In these cases, you can enable the checks you want by setting the `skip_check` property of the corresponding check step to `False`, which causes the `ClarifyCheck` or `QualityCheck` step to fail if a violation is detected against previous baselines. The pipeline run then stops, so a model that drifted from the baseline isn't registered. The `ClarifyCheck` and `QualityCheck` steps can retrieve the `DriftCheckBaselines` of the latest approved model version of a given model package group to compare against. You can also supply previous baselines directly through `supplied_baseline_constraints` (plus `supplied_baseline_statistics` for a `QualityCheck` step); supplied baselines are always prioritized over any baselines pulled from the model package group. 

## Baseline and model version lifecycle and evolution with Pipelines
Baseline Lifecycle and Evolution

By setting `register_new_baseline` of your `ClarifyCheck` and `QualityCheck` step to `False`, your previous baseline is accessible through the step property prefix `BaselineUsedForDriftCheck`. You can then register these baselines as the `DriftCheckBaselines` in the new model version when you register a model with [Model step](build-and-manage-steps-types.md#step-type-model). Once you approve this new model version in the model registry, the `DriftCheckBaseline` in this model version becomes available for the `ClarifyCheck` and `QualityCheck` steps in the next pipeline process. If you want to refresh the baseline of a certain check type for future model versions, you can set `register_new_baseline` to `True` so that the properties with prefix `BaselineUsedForDriftCheck` become the newly calculated baseline. In these ways, you can preserve your preferred baselines for a model trained in the future, or refresh the baselines for drift checks when needed, managing your baseline evolution and lifecycle throughout your model training iterations. 

The following diagram illustrates a model-version-centric view of the baseline evolution and lifecycle.

![\[A model-version-centric view of the baseline evolution and lifecycle.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pipelines/Baseline-Lifecycle.png)


# Schedule Pipeline Runs
Schedule Pipeline Runs

You can schedule your Amazon SageMaker Pipelines executions using [Amazon EventBridge](https://docs.aws.amazon.com/eventbridge/latest/userguide/what-is-amazon-eventbridge.html). Amazon SageMaker Pipelines is supported as an EventBridge target, so you can initiate the execution of your model building pipeline based on any event in your event bus. With EventBridge, you can automate your pipeline executions and respond automatically to events such as training job or endpoint status changes. Events include a new file being uploaded to your Amazon S3 bucket, a change in status of your Amazon SageMaker AI endpoint due to drift, and messages published to *Amazon Simple Notification Service* (Amazon SNS) topics.

The following Pipelines actions can be automatically initiated:  
+  `StartPipelineExecution` 

For more information on scheduling SageMaker AI jobs, see [Automating SageMaker AI with Amazon EventBridge](https://docs.aws.amazon.com/sagemaker/latest/dg/automating-sagemaker-with-eventbridge.html). 
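For example, a rule that starts a pipeline whenever a training job completes could use an event pattern like the following sketch. The `detail-type` and `detail` fields shown here are illustrative; adjust them to the event you want to match:

```json
{
  "source": ["aws.sagemaker"],
  "detail-type": ["SageMaker Training Job State Change"],
  "detail": {
    "TrainingJobStatus": ["Completed"]
  }
}
```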

**Topics**
+ [

## Schedule a Pipeline with Amazon EventBridge
](#pipeline-eventbridge-schedule)
+ [

## Schedule a pipeline with the SageMaker Python SDK
](#build-and-manage-scheduling)

## Schedule a Pipeline with Amazon EventBridge


To start a pipeline execution with Amazon EventBridge, you must create an EventBridge [rule](https://docs.aws.amazon.com/eventbridge/latest/APIReference/API_Rule.html). When you create a rule, you specify a target action to take when EventBridge receives an event that matches the rule. When an event matches, EventBridge sends the event to the specified target and initiates the action defined in the rule. 

 The following tutorials show how to schedule a pipeline execution with EventBridge using the EventBridge console or the AWS CLI.  

### Prerequisites

+ A role that EventBridge can assume with the `SageMaker::StartPipelineExecution` permission. This role can be created automatically if you create a rule from the EventBridge console; otherwise, you need to create this role yourself. For information on creating a SageMaker AI role, see [SageMaker Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html).
+ An Amazon SageMaker AI Pipeline to schedule. To create an Amazon SageMaker AI Pipeline, see [Define a Pipeline](https://docs.aws.amazon.com/sagemaker/latest/dg/define-pipeline.html).

### Create an EventBridge rule using the EventBridge console


 The following procedure shows how to create an EventBridge rule using the EventBridge console.  

1. Navigate to the [EventBridge console](https://console.aws.amazon.com/events). 

1. Select **Rules** in the left navigation pane. 

1.  Select **Create rule**. 

1. Enter a name and description for your rule.

1.  Select how you want to initiate this rule. You have the following choices for your rule: 
   + **Event pattern**: Your rule is initiated when an event matching the pattern occurs. You can choose a predefined pattern that matches a certain type of event, or you can create a custom pattern. If you select a predefined pattern, you can edit the pattern to customize it. For more information on Event patterns, see [Event Patterns in CloudWatch Events](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatchEventsandEventPatterns.html). 
   + **Schedule**: Your rule is initiated regularly on a specified schedule. You can use a fixed-rate schedule that initiates regularly for a specified number of minutes, hours, or days. You can also use a [cron expression](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/ScheduledEvents.html#CronExpressions) to create a more fine-grained schedule, such as “the first Monday of each month at 8am.” Schedule is not supported on a custom or partner event bus. 

1. Select your desired Event bus. 

1. Select the targets to invoke when an event matches your event pattern or when the schedule is initiated. You can add up to five targets per rule. Select **SageMaker Pipeline** in the target dropdown list. 

1. Select the pipeline you want to initiate from the pipeline dropdown list. 

1. Add parameters to pass to your pipeline execution using a name and value pair. Parameter values can be static or dynamic. For more information on Amazon SageMaker AI Pipeline parameters, see [AWS::Events::Rule SagemakerPipelineParameters](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-pipeline.html#aws-resource-sagemaker-pipeline-properties).
   + Static values are passed to the pipeline execution every time the pipeline is initiated. For example, if `{"Name": "Instance_type", "Value": "ml.m5.4xlarge"}` is specified in the parameter list, then it is passed as a parameter in `StartPipelineExecutionRequest` every time EventBridge initiates the pipeline. 
   + Dynamic values are specified using a JSON path. EventBridge parses the value from an event payload, then passes it to the pipeline execution. For example: *`$.detail.param.value`* 

1. Select the role to use for this rule. You can either use an existing role or create a new one. 

1. (Optional) Add tags. 

1. Select **Create** to finalize your rule. 

 Your rule is now in effect and ready to initiate your pipeline executions. 

### Create an EventBridge rule using the [AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/events/index.html)


 The following procedure shows how to create an EventBridge rule using the AWS CLI. 

1. Create a rule to be initiated. When creating an EventBridge rule using the AWS CLI, you have two options for how your rule is initiated, event pattern and schedule.
   +  **Event pattern**: Your rule is initiated when an event matching the pattern occurs. You can choose a predefined pattern that matches a certain type of event, or you can create a custom pattern. If you select a predefined pattern, you can edit the pattern to customize it.  You can create a rule with event pattern using the following command: 

     ```
     aws events put-rule --name <RULE_NAME> --event-pattern <YOUR_EVENT_PATTERN> --description <RULE_DESCRIPTION> --role-arn <ROLE_TO_EXECUTE_PIPELINE> --tags <TAGS>
     ```
   +  **Schedule**: Your rule is initiated regularly on a specified schedule. You can use a fixed-rate schedule that initiates regularly for a specified number of minutes, hours, or days. You can also use a cron expression to create a more fine-grained schedule, such as “the first Monday of each month at 8am.” Schedule is not supported on a custom or partner event bus. You can create a rule with a schedule using the following command: 

     ```
     aws events put-rule --name <RULE_NAME> --schedule-expression <YOUR_CRON_EXPRESSION> --description <RULE_DESCRIPTION> --role-arn <ROLE_TO_EXECUTE_PIPELINE> --tags <TAGS>
     ```

1. Add one or more targets to invoke when an event matches your event pattern or when the schedule is initiated. You can add up to five targets per rule. For each target, you must specify:  
   +  ARN: The resource ARN of your pipeline. 
   +  Role ARN: The ARN of the role EventBridge should assume to execute the pipeline. 
   +  Parameters:  Amazon SageMaker AI pipeline parameters to pass. 

1. Run the following command to pass an Amazon SageMaker AI pipeline as a target to your rule using [put-targets](https://docs.aws.amazon.com/cli/latest/reference/events/put-targets.html): 

   ```
   aws events put-targets --rule <RULE_NAME> --event-bus-name <EVENT_BUS_NAME> --targets "[{\"Id\": <ID>, \"Arn\": <RESOURCE_ARN>, \"RoleArn\": <ROLE_ARN>, \"SageMakerPipelineParameters\": {\"PipelineParameterList\": [{\"Name\": <NAME>, \"Value\": <VALUE>}]}}]"
   ```
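Escaping the target JSON inline is error-prone. As an alternative, you can place the target definition in a file and pass it with the `file://` prefix. The following sketch assumes a file named `targets.json`; the ARNs, IDs, and parameter names shown are placeholders, and the field names follow the EventBridge `PutTargets` API:

```json
[
  {
    "Id": "my-pipeline-target",
    "Arn": "arn:aws:sagemaker:us-east-1:111122223333:pipeline/my-pipeline",
    "RoleArn": "arn:aws:iam::111122223333:role/my-eventbridge-role",
    "SageMakerPipelineParameters": {
      "PipelineParameterList": [
        {"Name": "InstanceCount", "Value": "1"}
      ]
    }
  }
]
```

You would then run `aws events put-targets --rule <RULE_NAME> --targets file://targets.json`.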

## Schedule a pipeline with the SageMaker Python SDK


The following sections show you how to set up permissions to access EventBridge resources and create your pipeline schedule using the SageMaker Python SDK. 

### Required permissions


You need to have necessary permissions to use the pipeline scheduler. Complete the following steps to set up your permissions:

1. Attach the following minimum privilege policy to the IAM role used to create the pipeline triggers, or use the AWS managed policy `AmazonEventBridgeSchedulerFullAccess`.

------
#### [ JSON ]

****  

   ```
   {
       "Version": "2012-10-17",
       "Statement":
       [
           {
               "Action":
               [
                   "scheduler:ListSchedules",
                   "scheduler:GetSchedule",
                   "scheduler:CreateSchedule",
                   "scheduler:UpdateSchedule",
                   "scheduler:DeleteSchedule"
               ],
               "Effect": "Allow",
               "Resource":
               [
                   "*"
               ]
           },
           {
               "Effect": "Allow",
               "Action": "iam:PassRole",
               "Resource": "arn:aws:iam::*:role/*", 
               "Condition": {
                   "StringLike": {
                       "iam:PassedToService": "scheduler.amazonaws.com"
                   }
               }
           }
       ]
   }
   ```

------

1. Establish a trust relationship with EventBridge Scheduler by adding the service principal `scheduler.amazonaws.com` to this role’s trust policy. If you launch the notebook in SageMaker Studio, make sure you attach the following trust policy to the execution role.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "scheduler.amazonaws.com",
                    "sagemaker.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

------

### Create a pipeline schedule


Using the `PipelineSchedule` constructor, you can schedule a pipeline to run once or at a predetermined interval. A pipeline schedule must be of the type `at`, `rate`, or `cron`. This set of scheduling types is an extension of the [EventBridge scheduling options](https://docs.aws.amazon.com/scheduler/latest/UserGuide/schedule-types.html). For more information about how to use the `PipelineSchedule` class, see [sagemaker.workflow.triggers.PipelineSchedule](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#pipeline-schedule). The following example demonstrates how to create each scheduling type with `PipelineSchedule`.

```
from sagemaker.workflow.triggers import PipelineSchedule

# schedules a pipeline run for 12/13/2023 at time 10:15:20 UTC
my_datetime_schedule = PipelineSchedule(
    name="<schedule-name>", 
    at=datetime(2023, 12, 13, 10, 15, 20)
)

# schedules a pipeline run every 5 minutes
my_rate_schedule = PipelineSchedule(
    name="<schedule-name>", 
    rate=(5, "minutes")
)

# schedules a pipeline run at 10:15am UTC on the last Friday of each month during the years 2022 to 2023
my_cron_schedule = PipelineSchedule(
    name="<schedule-name>", 
    cron="15 10 ? * 6L 2022-2023"
)
```

**Note**  
If you create a one-time schedule and need to access the current time, use `datetime.utcnow()` instead of `datetime.now()`. The latter does not carry the time zone context and can result in an incorrect time being passed to EventBridge.
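For example, the following sketch (variable names are illustrative) computes a one-time run time a few minutes in the future using `datetime.utcnow()`:

```python
from datetime import datetime, timedelta

# Compute a one-time schedule time five minutes from now, in UTC.
# utcnow() returns a naive datetime in UTC, which is what the SDK
# forwards to EventBridge; now() would silently use local time instead.
run_at = datetime.utcnow() + timedelta(minutes=5)
```

You would then pass `run_at` as the `at` argument of `PipelineSchedule`.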

### Attach the trigger to your pipeline


To attach your `PipelineSchedule` to your pipeline, invoke the `put_triggers` call on your created pipeline object with a list of triggers. If you get a response ARN, you successfully created the schedule in your account and EventBridge begins to invoke the target pipeline at the time or rate specified. You must specify a role with correct permissions to attach triggers to a parent pipeline. If you don't provide one, Pipelines fetches the default role used to create the pipeline from the [configuration file](https://docs.aws.amazon.com/sagemaker/latest/dg/train-remote-decorator-config.html).

The following example demonstrates how to attach a schedule to a pipeline.

```
scheduled_pipeline = Pipeline(
    name="<pipeline-name>",
    steps=[...],
    sagemaker_session=<sagemaker-session>,
)
custom_schedule = PipelineSchedule(
    name="<schedule-name>", 
    at=datetime(year=2023, month=12, day=25, hour=10, minute=30, second=30)
)
scheduled_pipeline.put_triggers(triggers=[custom_schedule], role_arn=<role>)
```

### Describe current triggers


To retrieve information about your created pipeline triggers, you can invoke the `describe_trigger()` API with the trigger name. This command returns details about the created schedule expression such as its start time, enabled state, and other useful information. The following snippet shows a sample invocation:

```
scheduled_pipeline.describe_trigger(name="<schedule-name>")
```

### Clean up trigger resources


Before you delete your pipeline, delete its triggers to avoid leaving orphaned resources in your account. You can delete your triggers by passing a list of trigger names to the `delete_triggers` API. The following snippet demonstrates how to delete triggers.

```
pipeline.delete_triggers(trigger_names=["<schedule-name>"])
```

**Note**  
Be aware of the following limitations when you delete your triggers:  
The option to delete triggers by specifying trigger names is only available in the SageMaker Python SDK. Deleting the pipeline in the CLI or with a `DeletePipeline` API call does not delete your triggers. As a result, the triggers become orphaned and SageMaker AI attempts to start a run for a pipeline that no longer exists.
Also, if you are using another notebook session or have already deleted the pipeline target, clean up orphaned schedules through the scheduler [CLI](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/scheduler/delete-schedule.html) (for example, `aws scheduler delete-schedule --name <schedule-name>`) or the EventBridge console.

# Amazon SageMaker Experiments Integration
Experiments Integration

Amazon SageMaker Pipelines is closely integrated with Amazon SageMaker Experiments. By default, when Pipelines creates and executes a pipeline, the following SageMaker Experiments entities are created if they don't exist:
+ An experiment for the pipeline
+ A run group for every execution of the pipeline
+ A run that's added to the run group for each SageMaker AI job created in a pipeline execution step

You can compare metrics such as model training accuracy across multiple pipeline executions just as you can compare such metrics across multiple run groups of a SageMaker AI model training experiment.

The following sample shows the relevant parameters of the [Pipeline](https://github.com/aws/sagemaker-python-sdk/blob/v2.41.0/src/sagemaker/workflow/pipeline.py) class in the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable).

```
Pipeline(
    name="MyPipeline",
    parameters=[...],
    pipeline_experiment_config=PipelineExperimentConfig(
      ExecutionVariables.PIPELINE_NAME,
      ExecutionVariables.PIPELINE_EXECUTION_ID
    ),
    steps=[...]
)
```

If you don't want an experiment and run group created for the pipeline, set `pipeline_experiment_config` to `None`.

**Note**  
Experiments integration was introduced in the Amazon SageMaker Python SDK v2.41.0.

The following naming rules apply based on what you specify for the `ExperimentName` and `TrialName` parameters of `pipeline_experiment_config`:
+ If you don't specify `ExperimentName`, the pipeline `name` is used for the experiment name.

  If you do specify `ExperimentName`, it's used for the experiment name. If an experiment with that name exists, the pipeline-created run groups are added to the existing experiment. If an experiment with that name doesn't exist, a new experiment is created.
+ If you don't specify `TrialName`, the pipeline execution ID is used for the run group name.

  If you do specify `TrialName`, it's used for the run group name. If a run group with that name exists, the pipeline-created runs are added to the existing run group. If a run group with that name doesn't exist, a new run group is created.
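The defaulting rules above can be summarized with a small helper. This is an illustrative sketch, not SDK code; the function name and signature are hypothetical:

```python
def resolve_experiment_config(pipeline_name, execution_id,
                              experiment_name=None, trial_name=None):
    """Return the (experiment name, run group name) that Pipelines uses,
    following the naming rules described above."""
    # Fall back to the pipeline name and execution ID when no custom
    # names are specified in pipeline_experiment_config.
    return (experiment_name or pipeline_name,
            trial_name or execution_id)
```

For example, `resolve_experiment_config("MyPipeline", "abc123")` returns `("MyPipeline", "abc123")`, while passing `experiment_name="Custom"` overrides only the experiment name.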

**Note**  
The experiment entities aren't deleted when the pipeline that created the entities is deleted. You can use the SageMaker Experiments API to delete the entities.

For information about how to view the SageMaker AI Experiment entities associated with a pipeline, see [Access experiment data from a pipeline](pipelines-studio-experiments.md). For more information on SageMaker Experiments, see [Amazon SageMaker Experiments in Studio Classic](experiments.md).

The following sections show examples of the previous rules and how they are represented in the pipeline definition file. For more information on pipeline definition files, see [Pipelines overview](pipelines-overview.md).

**Topics**
+ [

# Default Behavior
](pipelines-experiments-default.md)
+ [

# Disable Experiments Integration
](pipelines-experiments-none.md)
+ [

# Specify a Custom Experiment Name
](pipelines-experiments-custom-experiment.md)
+ [

# Specify a Custom Run Group Name
](pipelines-experiments-custom-trial.md)

# Default Behavior


**Create a pipeline**

The default behavior when creating a SageMaker AI Pipeline is to automatically integrate it with SageMaker Experiments. If you don't specify any custom configuration, SageMaker AI creates an experiment with the same name as the pipeline, a run group for each execution of the pipeline using the pipeline execution ID as the name, and individual runs within each run group for every SageMaker AI job launched as part of the pipeline steps. You can seamlessly track and compare metrics across different pipeline executions, similar to how you would analyze a model training experiment. The following section demonstrates this default behavior when defining a pipeline without explicitly configuring the experiment integration.

The `pipeline_experiment_config` is omitted. `ExperimentName` defaults to the pipeline `name`. `TrialName` defaults to the execution ID.

```
pipeline_name = f"MyPipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[...],
    steps=[step_train]
)
```

**Pipeline definition file**

```
{
  "Version": "2020-12-01",
  "Parameters": [
    {
      "Name": "InputDataSource"
    },
    {
      "Name": "InstanceCount",
      "Type": "Integer",
      "DefaultValue": 1
    }
  ],
  "PipelineExperimentConfig": {
    "ExperimentName": {"Get": "Execution.PipelineName"},
    "TrialName": {"Get": "Execution.PipelineExecutionId"}
  },
  "Steps": [...]
}
```

# Disable Experiments Integration


**Create a pipeline**

You can disable your pipeline's integration with SageMaker Experiments by setting the `pipeline_experiment_config` parameter to `None` when you define your pipeline. This way, SageMaker AI will not automatically create an experiment, run groups, or individual runs for tracking metrics and artifacts associated with your pipeline executions. The following example sets the pipeline config parameter to `None`.

```
pipeline_name = f"MyPipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[...],
    pipeline_experiment_config=None,
    steps=[step_train]
)
```

**Pipeline definition file**

This is the same as the preceding default example, without the `PipelineExperimentConfig`.

# Specify a Custom Experiment Name


While the default behavior is to use the pipeline name as the experiment name in SageMaker Experiments, you can override this and specify a custom experiment name instead. This can be useful if you want to group multiple pipeline executions under the same experiment for easier analysis and comparison. The run group name will still default to the pipeline execution ID unless you explicitly set a custom name for that as well. The following section demonstrates how to create a pipeline with a custom experiment name while leaving the run group name as the default execution ID.

**Create a pipeline**

```
pipeline_name = f"MyPipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[...],
    pipeline_experiment_config=PipelineExperimentConfig(
      "CustomExperimentName",
      ExecutionVariables.PIPELINE_EXECUTION_ID
    ),
    steps=[step_train]
)
```

**Pipeline definition file**

```
{
  ...,
  "PipelineExperimentConfig": {
    "ExperimentName": "CustomExperimentName",
    "TrialName": {"Get": "Execution.PipelineExecutionId"}
  },
  "Steps": [...]
}
```

# Specify a Custom Run Group Name


In addition to setting a custom experiment name, you can also specify a custom name for the run groups created by SageMaker Experiments during pipeline executions. This name is appended with the pipeline execution ID to ensure uniqueness. You can specify a custom run group name to identify and analyze related pipeline runs within the same experiment. The following section shows how to define a pipeline with a custom run group name while using the default pipeline name for the experiment name.

**Create a pipeline**

```
pipeline_name = f"MyPipeline"
pipeline = Pipeline(
    name=pipeline_name,
    parameters=[...],
    pipeline_experiment_config=PipelineExperimentConfig(
      ExecutionVariables.PIPELINE_NAME,
      Join(on="-", values=["CustomTrialName", ExecutionVariables.PIPELINE_EXECUTION_ID])
    ),
    steps=[step_train]
)
```

**Pipeline definition file**

```
{
  ...,
  "PipelineExperimentConfig": {
    "ExperimentName": {"Get": "Execution.PipelineName"},
    "TrialName": {
      "On": "-",
      "Values": [
         "CustomTrialName",
         {"Get": "Execution.PipelineExecutionId"}
       ]
    }
  },
  "Steps": [...]
}
```

# Run pipelines using local mode


SageMaker Pipelines local mode is an easy way to test your training, processing, and inference scripts, as well as the runtime compatibility of [pipeline parameters](https://sagemaker.readthedocs.io/en/stable/amazon_sagemaker_model_building_pipeline.html#pipeline-parameters), before you run your pipeline on the managed SageMaker AI service. By using local mode, you can test your SageMaker AI pipeline locally using a smaller dataset. This allows quick and easy debugging of errors in user scripts and in the pipeline definition itself, without incurring the costs of the managed service. The following topic shows you how to define and run pipelines locally.

Pipelines local mode leverages [SageMaker AI jobs local mode](https://sagemaker.readthedocs.io/en/stable/overview.html#local-mode) under the hood. This is a feature in the SageMaker Python SDK that allows you to run SageMaker AI built-in or custom images locally using Docker containers. Pipelines local mode is built on top of SageMaker AI jobs local mode. Therefore, you can expect to see the same results as if you were running those jobs separately. For example, local mode still uses Amazon S3 to upload model artifacts and processing outputs. If you want data generated by local jobs to reside on local disk, you can use the setup mentioned in [Local Mode](https://sagemaker.readthedocs.io/en/stable/overview.html#local-mode).

Pipeline local mode currently supports the following step types:
+ [Training step](build-and-manage-steps-types.md#step-type-training)
+ [Processing step](build-and-manage-steps-types.md#step-type-processing)
+ [Transform step](build-and-manage-steps-types.md#step-type-transform)
+ [Model Step](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-model-create) (with Create Model arguments only)
+ [Condition step](build-and-manage-steps-types.md#step-type-condition)
+ [Fail step](build-and-manage-steps-types.md#step-type-fail)

As opposed to the managed Pipelines service, which allows multiple steps to run in parallel using [Parallelism Configuration](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#parallelism-configuration), the local pipeline executor runs the steps sequentially. Therefore, the overall execution performance of a local pipeline may be worse than that of a pipeline that runs on the cloud, depending mostly on the size of the dataset, the algorithm, and the power of your local computer. Also note that pipeline runs in local mode are not recorded in [SageMaker Experiments](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-experiments.html).

**Note**  
Pipelines local mode is not compatible with SageMaker AI algorithms such as XGBoost. If you want to use these algorithms, you must use them in [script mode](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-script-mode/sagemaker-script-mode.html).

To run a pipeline locally, the `sagemaker_session` fields associated with the pipeline steps and with the pipeline itself must be of type `LocalPipelineSession`. The following example shows how you can define a SageMaker AI pipeline that runs locally.

```
import sagemaker
from sagemaker.inputs import TrainingInput
from sagemaker.workflow.pipeline_context import LocalPipelineSession
from sagemaker.pytorch import PyTorch
from sagemaker.workflow.steps import TrainingStep
from sagemaker.workflow.pipeline import Pipeline

local_pipeline_session = LocalPipelineSession()

pytorch_estimator = PyTorch(
    sagemaker_session=local_pipeline_session,
    role=sagemaker.get_execution_role(),
    instance_type="ml.c5.xlarge",
    instance_count=1,
    framework_version="1.8.0",
    py_version="py36",
    entry_point="./entry_point.py",
)

step = TrainingStep(
    name="MyTrainingStep",
    step_args=pytorch_estimator.fit(
        inputs=TrainingInput(s3_data="s3://amzn-s3-demo-bucket/my-data/train"),
    )
)

pipeline = Pipeline(
    name="MyPipeline",
    steps=[step],
    sagemaker_session=local_pipeline_session
)

pipeline.create(
    role_arn=sagemaker.get_execution_role(), 
    description="local pipeline example"
)

# The pipeline runs locally
execution = pipeline.start()

steps = execution.list_steps()

training_job_name = steps['PipelineExecutionSteps'][0]['Metadata']['TrainingJob']['Arn']

step_outputs = local_pipeline_session.sagemaker_client.describe_training_job(TrainingJobName=training_job_name)
```

Once you are ready to execute the pipeline on the managed SageMaker Pipelines service, you can do so by replacing `LocalPipelineSession` in the previous code snippet with `PipelineSession` (as shown in the following code sample) and rerunning the code.

```
from sagemaker.workflow.pipeline_context import PipelineSession

pipeline_session = PipelineSession()
```

# Troubleshooting Amazon SageMaker Pipelines
Troubleshooting Pipelines

When using Amazon SageMaker Pipelines, you might run into issues for various reasons. This topic provides information about common errors and how to resolve them. 

 **Pipeline Definition Issues** 

Your pipeline definition might not be formatted correctly. This can cause your execution to fail or your jobs to behave unexpectedly. These errors can be caught when the pipeline is created or when an execution occurs. If your definition doesn’t validate, Pipelines returns an error message identifying the character where the JSON file is malformed. To fix this problem, review the steps created using the SageMaker AI Python SDK for accuracy. 

You can only include a given step once in a pipeline definition. Because of this, the same step cannot appear both inside a condition step *and* at the top level of the same pipeline. 

 **Examining Pipeline Logs** 

You can view the status of your steps using the following command: 

```
execution.list_steps()
```

Each step includes the following information:
+ The ARN of the entity launched by the pipeline, such as a SageMaker AI job ARN, model ARN, or model package ARN. 
+ If the step failed, a failure reason with a brief explanation of the failure.
+ If the step is a condition step, whether the condition evaluated to true or false.  
+ If the execution reuses a previous job execution, the `CacheHit` field lists the source execution.  
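For example, a small helper can pull out just the failed steps from this output. This is a hypothetical sketch, assuming the step dictionaries returned by `execution.list_steps()` carry `StepName`, `StepStatus`, and `FailureReason` keys:

```python
def summarize_failed_steps(steps):
    """Collect (step name, failure reason) pairs for failed steps."""
    return [
        (s.get("StepName"), s.get("FailureReason", "<no reason reported>"))
        for s in steps
        if s.get("StepStatus") == "Failed"
    ]
```

Calling `summarize_failed_steps(execution.list_steps())` then gives you a compact view of what to investigate in the logs.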

You can also view the error messages and logs in the Amazon SageMaker Studio interface. For information about how to see the logs in Studio, see [View the details of a pipeline run](pipelines-studio-view-execution.md).

 **Missing Permissions** 

Correct permissions are required for the role that creates the pipeline execution, and the steps that create each of the jobs in your pipeline execution. Without these permissions, you may not be able to submit your pipeline execution or run your SageMaker AI jobs as expected. To ensure that your permissions are properly set up, see [IAM Access Management](build-and-manage-access.md). 

 **Job Execution Errors ** 

You may run into issues when executing your steps because of issues in the scripts that define the functionality of your SageMaker AI jobs. Each job has a set of CloudWatch logs. To view these logs from Studio, see [View the details of a pipeline run](pipelines-studio-view-execution.md). For information about using CloudWatch logs with SageMaker AI, see [CloudWatch Logs for Amazon SageMaker AI](logging-cloudwatch.md). 

 **Property File Errors** 

You may have issues when incorrectly implementing property files with your pipeline. To ensure that your implementation of property files works as expected, see [Pass Data Between Steps](build-and-manage-propertyfile.md). 

 **Issues copying the script to the container in the Dockerfile** 

You can either copy the script to the container or pass it via the `entry_point` argument (of your estimator entity) or `code` argument (of your processor entity), as demonstrated in the following code sample.

```
step_process = ProcessingStep(
    name="PreprocessAbaloneData",
    processor=sklearn_processor,
    inputs=[
        ProcessingInput(
            input_name="dataset",
            source=...,
            destination="/opt/ml/processing/code",
        )
    ],
    outputs=[
        ProcessingOutput(output_name="train", source="/opt/ml/processing/train", destination=processed_data_path),
        ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation", destination=processed_data_path),
        ProcessingOutput(output_name="test", source="/opt/ml/processing/test", destination=processed_data_path),
    ],
    code=os.path.join(BASE_DIR, "process.py"),  # The script is passed through the code argument
    cache_config=cache_config,
    job_arguments=["--input", "arg1"],
)

sklearn_estimator = SKLearn(
    entry_point=os.path.join(BASE_DIR, "train.py"),  # The script is passed through entry_point
    framework_version="0.23-1",
    instance_type=training_instance_type,
    role=role,
    output_path=model_path,
    sagemaker_session=sagemaker_session,
    instance_count=1,
    base_job_name=f"{base_job_prefix}/pilot-train",
    metric_definitions=[
        {"Name": "train:accuracy", "Regex": "accuracy_train=(.*?);"},
        {"Name": "validation:accuracy", "Regex": "accuracy_validation=(.*?);"},
    ],
)
```

# Pipelines actions
Pipelines actions

You can use either the Amazon SageMaker Pipelines Python SDK or the drag-and-drop visual designer in Amazon SageMaker Studio to author, view, edit, execute, and monitor your ML workflows.

The following screenshot shows the visual designer that you can use to create and manage your Amazon SageMaker Pipelines.

![\[Screenshot of the visual drag-and-drop interface for Pipelines in Studio.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pipelines/pipelines-studio-overview.png)


After your pipeline is deployed, you can view the directed acyclic graph (DAG) for your pipeline and manage your executions using Amazon SageMaker Studio. Using SageMaker Studio, you can get information about your current and historical pipelines, compare executions, see the DAG for your executions, get metadata information, and more. To learn about how to view pipelines from Studio, see [View the details of a pipeline](pipelines-studio-list.md). 

**Topics**
+ [

# Define a pipeline
](define-pipeline.md)
+ [

# Edit a pipeline
](edit-pipeline-before-execution.md)
+ [

# Run a pipeline
](run-pipeline.md)
+ [

# Stop a pipeline
](pipelines-studio-stop.md)
+ [

# View the details of a pipeline
](pipelines-studio-list.md)
+ [

# View the details of a pipeline run
](pipelines-studio-view-execution.md)
+ [

# Download a pipeline definition file
](pipelines-studio-download.md)
+ [

# Access experiment data from a pipeline
](pipelines-studio-experiments.md)
+ [

# Track the lineage of a pipeline
](pipelines-lineage-tracking.md)

# Define a pipeline

To orchestrate your workflows with Amazon SageMaker Pipelines, you must generate a directed acyclic graph (DAG) in the form of a JSON pipeline definition. The DAG specifies the different steps involved in your ML process, such as data preprocessing, model training, model evaluation, and model deployment, as well as the dependencies and flow of data between these steps. The following topic shows you how to generate a pipeline definition.

You can generate your JSON pipeline definition using either the SageMaker Python SDK or the visual drag-and-drop Pipeline Designer feature in Amazon SageMaker Studio. The following image is a representation of the pipeline DAG that you create in this tutorial:

![\[Screenshot of the visual drag-and-drop interface for Pipelines in Studio.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pipelines/pipelines-studio-overview.png)


The pipeline that you define in the following sections solves a regression problem to determine the age of an abalone based on its physical measurements. For a runnable Jupyter notebook that includes the content in this tutorial, see [Orchestrating Jobs with Amazon SageMaker Model Building Pipelines](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.html).

**Note**  
You can reference the model location as a property of the training step, as shown in the end-to-end example [CustomerChurn pipeline](https://github.com/aws-samples/customer-churn-sagemaker-pipelines-sample/blob/main/pipelines/customerchurn/pipeline.py) on GitHub.


## Define a pipeline (Pipeline Designer)


The following walkthrough guides you through the steps to create a barebones pipeline using the drag-and-drop Pipeline Designer. If you need to pause or end your editing session in the visual designer at any time, choose the **Export** option to download the current pipeline definition to your local environment. Later, when you want to resume editing, import the same JSON definition file into the visual designer.

### Create a Processing step


To create a data processing job step, do the following:

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, select **Pipelines**.

1. Choose **Create**.

1. Choose **Blank**.

1. In the left sidebar, choose **Process data** and drag it to the canvas.

1. In the canvas, choose the **Process data** step you added.

1. To add an input dataset, choose **Add** under **Data (input)** in the right sidebar and select a dataset.

1. To add a location to save output datasets, choose **Add** under **Data (output)** in the right sidebar and navigate to the destination.

1. Complete the remaining fields in the right sidebar. For information about the fields in these tabs, see [ sagemaker.workflow.steps.ProcessingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.ProcessingStep).

### Create a Training step


To set up a model training step, do the following:

1. In the left sidebar, choose **Train model** and drag it to the canvas.

1. In the canvas, choose the **Train model** step you added.

1. To add an input dataset, choose **Add** under **Data (input)** in the right sidebar and select a dataset.

1. To choose a location to save your model artifacts, enter an Amazon S3 URI in the **Location (S3 URI)** field, or choose **Browse S3** to navigate to the destination location.

1. Complete the remaining fields in the right sidebar. For information about the fields in these tabs, see [ sagemaker.workflow.steps.TrainingStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.steps.TrainingStep).

1. Click and drag the cursor from the **Process data** step you added in the previous section to the **Train model** step to create an edge connecting the two steps.

### Create a model package with a Register model step


To create a model package with a model registration step, do the following:

1. In the left sidebar, choose **Register model** and drag it to the canvas.

1. In the canvas, choose the **Register model** step you added.

1. To select a model to register, choose **Add** under **Model (input)**.

1. Choose **Create a model group** to add your model to a new model group.

1. Complete the remaining fields in the right sidebar. For information about the fields in these tabs, see [ sagemaker.workflow.step\_collections.RegisterModel](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.step_collections.RegisterModel).

1. Click and drag the cursor from the **Train model** step you added in the previous section to the **Register model** step to create an edge connecting the two steps.

### Deploy the model to an endpoint with a Deploy model (endpoint) step


To deploy your model using a model deployment step, do the following:

1. In the left sidebar, choose **Deploy model (endpoint)** and drag it to the canvas.

1. In the canvas, choose the **Deploy model (endpoint)** step you added.

1. To choose a model to deploy, choose **Add** under **Model (input)**.

1. Choose the **Create endpoint** radio button to create a new endpoint.

1. Enter a **Name** and **Description** for your endpoint.

1. Click and drag the cursor from the **Register model** step you added in the previous section to the **Deploy model (endpoint)** step to create an edge connecting the two steps.

1. Complete the remaining fields in the right sidebar.

### Define the Pipeline parameters


You can configure a set of pipeline parameters whose values can be updated for every execution. To define the pipeline parameters and set their default values, choose the gear icon at the bottom of the visual designer.

### Save Pipeline


After you have entered all the required information to create your pipeline, choose **Save** at the bottom of the visual designer. This validates your pipeline and notifies you of any potential runtime errors. The **Save** operation won't succeed until you address all errors flagged by the automated validation checks. If you want to resume editing at a later point, you can save your in-progress pipeline as a JSON definition in your local environment by choosing the **Export** button at the bottom of the visual designer. Later, to resume updating your pipeline, upload that JSON definition file by choosing the **Import** button.

## Define a pipeline (SageMaker Python SDK)


### Prerequisites


 To run the following tutorial, complete the following: 
+ Set up your notebook instance as outlined in [Create a notebook instance](https://docs.aws.amazon.com/sagemaker/latest/dg/howitworks-create-ws.html). This gives your role permissions to read and write to Amazon S3, and create training, batch transform, and processing jobs in SageMaker AI. 
+ Grant your notebook permissions to get and pass its own role as shown in [Modifying a role permissions policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-modify_permissions-policy). Add the following JSON snippet to attach this policy to your role. Replace `arn:aws:iam::111122223333:role/role-name` in the snippet with the ARN of the role used to create your notebook instance. 

------
#### [ JSON ]

****  

  ```
  {
      "Version":"2012-10-17",		 	 	 
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "iam:GetRole",
                  "iam:PassRole"
              ],
              "Resource": "arn:aws:iam::111122223333:role/role-name"
          }
      ]
  }
  ```

------
+  Trust the SageMaker AI service principal by following the steps in [Modifying a role trust policy](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-cli.html#roles-managingrole_edit-trust-policy-cli). Add the following statement fragment to the trust relationship of your role: 

  ```
  {
        "Sid": "",
        "Effect": "Allow",
        "Principal": {
          "Service": "sagemaker.amazonaws.com"
        },
        "Action": "sts:AssumeRole"
      }
  ```
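
  Assembled as a complete trust policy document, the role's trust relationship would look like the following. This is a minimal sketch; if your role already trusts additional principals, keep those statements in place.

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Sid": "",
              "Effect": "Allow",
              "Principal": {
                  "Service": "sagemaker.amazonaws.com"
              },
              "Action": "sts:AssumeRole"
          }
      ]
  }
  ```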

#### Set up your environment


Create a new SageMaker AI session using the following code block. This returns the role ARN for the session. This role ARN should be the execution role ARN that you set up as a prerequisite. 

```
import boto3
import sagemaker
import sagemaker.session
from sagemaker.workflow.pipeline_context import PipelineSession

region = boto3.Session().region_name
sagemaker_session = sagemaker.session.Session()
role = sagemaker.get_execution_role()
default_bucket = sagemaker_session.default_bucket()

pipeline_session = PipelineSession()

model_package_group_name = "AbaloneModelPackageGroupName"
```

### Create a pipeline


**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

Run the following steps from your SageMaker AI notebook instance to create a pipeline that includes steps for:
+ preprocessing
+ training
+ evaluation
+ conditional evaluation
+ model registration

**Note**  
You can use [ExecutionVariables](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#execution-variables) and the [ Join](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#execution-variables) function to specify your output location. `ExecutionVariables` is resolved at runtime. For instance, `ExecutionVariables.PIPELINE_EXECUTION_ID` is resolved to the ID of the current execution, which can be used as a unique identifier across different runs.

#### Step 1: Download the dataset


This notebook uses the UCI Machine Learning Abalone Dataset. The dataset contains the following features: 
+ `length` – The longest shell measurement of the abalone.
+ `diameter` – The diameter of the abalone perpendicular to its length.
+ `height` – The height of the abalone with meat in the shell.
+ `whole_weight` – The weight of the whole abalone.
+ `shucked_weight` – The weight of the meat removed from the abalone.
+ `viscera_weight` – The weight of the abalone viscera after bleeding.
+ `shell_weight` – The weight of the abalone shell after meat removal and drying.
+ `sex` – The sex of the abalone. One of 'M', 'F', or 'I', where 'I' is an infant abalone.
+ `rings` – The number of rings in the abalone shell.

The number of rings in the abalone shell is a good approximation for its age using the formula `age = rings + 1.5`. However, obtaining this number is time-consuming: you must cut the shell through the cone, stain the section, and count the rings through a microscope. The other physical measurements are easier to obtain. This notebook uses the dataset to build a predictive model of the `rings` variable from the other physical measurements.
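
The age formula is simple enough to state directly in code; a one-line helper for illustration (hypothetical, not part of the tutorial scripts):

```python
def abalone_age(rings):
    """Approximate abalone age in years from the shell ring count: age = rings + 1.5."""
    return rings + 1.5

# A 10-ring abalone is approximately 11.5 years old.
print(abalone_age(10))  # → 11.5
```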

**To download the dataset**

1. Download the dataset into your account's default Amazon S3 bucket.

   ```
   !mkdir -p data
   local_path = "data/abalone-dataset.csv"
   
   s3 = boto3.resource("s3")
   s3.Bucket(f"sagemaker-servicecatalog-seedcode-{region}").download_file(
       "dataset/abalone-dataset.csv",
       local_path
   )
   
   base_uri = f"s3://{default_bucket}/abalone"
   input_data_uri = sagemaker.s3.S3Uploader.upload(
       local_path=local_path, 
       desired_s3_uri=base_uri,
   )
   print(input_data_uri)
   ```

1. Download a second dataset for batch transformation after your model is created.

   ```
   local_path = "data/abalone-dataset-batch.csv"
   
   s3 = boto3.resource("s3")
   s3.Bucket(f"sagemaker-servicecatalog-seedcode-{region}").download_file(
       "dataset/abalone-dataset-batch",
       local_path
   )
   
   base_uri = f"s3://{default_bucket}/abalone"
   batch_data_uri = sagemaker.s3.S3Uploader.upload(
       local_path=local_path, 
       desired_s3_uri=base_uri,
   )
   print(batch_data_uri)
   ```

#### Step 2: Define pipeline parameters


 This code block defines the following parameters for your pipeline: 
+  `processing_instance_count` – The instance count of the processing job. 
+  `input_data` – The Amazon S3 location of the input data. 
+  `batch_data` – The Amazon S3 location of the input data for batch transformation. 
+  `model_approval_status` – The approval status to register the trained model with for CI/CD. For more information, see [MLOps Automation With SageMaker Projects](sagemaker-projects.md).

```
from sagemaker.workflow.parameters import (
    ParameterInteger,
    ParameterString,
)

processing_instance_count = ParameterInteger(
    name="ProcessingInstanceCount",
    default_value=1
)
model_approval_status = ParameterString(
    name="ModelApprovalStatus",
    default_value="PendingManualApproval"
)
input_data = ParameterString(
    name="InputData",
    default_value=input_data_uri,
)
batch_data = ParameterString(
    name="BatchData",
    default_value=batch_data_uri,
)
```
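
Each parameter's default value is used unless you override it when starting an execution (for example, with `pipeline.start(parameters={...})` in the SDK). Conceptually, resolution is a merge of overrides onto defaults in which undefined names are rejected; a plain-Python sketch of that behavior (illustrative only, not the SDK implementation):

```python
defaults = {
    "ProcessingInstanceCount": 1,
    "ModelApprovalStatus": "PendingManualApproval",
}

def resolve_parameters(defaults, overrides):
    # Overrides win over defaults; names that were never defined are an error.
    unknown = set(overrides) - set(defaults)
    if unknown:
        raise ValueError(f"Undefined pipeline parameters: {sorted(unknown)}")
    return {**defaults, **overrides}

# An execution that approves the model automatically but keeps the default instance count.
print(resolve_parameters(defaults, {"ModelApprovalStatus": "Approved"}))
```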

#### Step 3: Define a processing step for feature engineering


This section shows how to create a processing step to prepare the data from the dataset for training.

**To create a processing step**

1.  Create a directory for the processing script.

   ```
   !mkdir -p abalone
   ```

1. Create a file in the `/abalone` directory named `preprocessing.py` with the following content. This preprocessing script is passed in to the processing step for running on the input data. The training step then uses the preprocessed training features and labels to train a model. The evaluation step uses the trained model and preprocessed test features and labels to evaluate the model. The script uses `scikit-learn` to do the following:
   +  Fill in missing `sex` categorical data and encode it so it's suitable for training. 
   +  Scale and normalize all numerical fields except for `rings` and `sex`. 
   +  Split the data into training, test, and validation datasets. 

   ```
   %%writefile abalone/preprocessing.py
   import argparse
   import os
   import requests
   import tempfile
   import numpy as np
   import pandas as pd
   
   
   from sklearn.compose import ColumnTransformer
   from sklearn.impute import SimpleImputer
   from sklearn.pipeline import Pipeline
   from sklearn.preprocessing import StandardScaler, OneHotEncoder
   
   
   # Because this is a headerless CSV file, specify the column names here.
   feature_columns_names = [
       "sex",
       "length",
       "diameter",
       "height",
       "whole_weight",
       "shucked_weight",
       "viscera_weight",
       "shell_weight",
   ]
   label_column = "rings"
   
   feature_columns_dtype = {
       "sex": str,
       "length": np.float64,
       "diameter": np.float64,
       "height": np.float64,
       "whole_weight": np.float64,
       "shucked_weight": np.float64,
       "viscera_weight": np.float64,
       "shell_weight": np.float64
   }
   label_column_dtype = {"rings": np.float64}
   
   
   def merge_two_dicts(x, y):
       z = x.copy()
       z.update(y)
       return z
   
   
   if __name__ == "__main__":
       base_dir = "/opt/ml/processing"
   
       df = pd.read_csv(
           f"{base_dir}/input/abalone-dataset.csv",
           header=None, 
           names=feature_columns_names + [label_column],
           dtype=merge_two_dicts(feature_columns_dtype, label_column_dtype)
       )
       numeric_features = list(feature_columns_names)
       numeric_features.remove("sex")
       numeric_transformer = Pipeline(
           steps=[
               ("imputer", SimpleImputer(strategy="median")),
               ("scaler", StandardScaler())
           ]
       )
   
       categorical_features = ["sex"]
       categorical_transformer = Pipeline(
           steps=[
               ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
               ("onehot", OneHotEncoder(handle_unknown="ignore"))
           ]
       )
   
       preprocess = ColumnTransformer(
           transformers=[
               ("num", numeric_transformer, numeric_features),
               ("cat", categorical_transformer, categorical_features)
           ]
       )
       
       y = df.pop("rings")
       X_pre = preprocess.fit_transform(df)
       y_pre = y.to_numpy().reshape(len(y), 1)
       
       X = np.concatenate((y_pre, X_pre), axis=1)
       
       np.random.shuffle(X)
       train, validation, test = np.split(X, [int(.7*len(X)), int(.85*len(X))])
   
       
       pd.DataFrame(train).to_csv(f"{base_dir}/train/train.csv", header=False, index=False)
       pd.DataFrame(validation).to_csv(f"{base_dir}/validation/validation.csv", header=False, index=False)
       pd.DataFrame(test).to_csv(f"{base_dir}/test/test.csv", header=False, index=False)
   ```

1.  Create an instance of an `SKLearnProcessor` to pass in to the processing step. 

   ```
   from sagemaker.sklearn.processing import SKLearnProcessor
   
   
   framework_version = "0.23-1"
   
   sklearn_processor = SKLearnProcessor(
       framework_version=framework_version,
       instance_type="ml.m5.xlarge",
       instance_count=processing_instance_count,
       base_job_name="sklearn-abalone-process",
       sagemaker_session=pipeline_session,
       role=role,
   )
   ```

1. Create a processing step. This step takes in the `SKLearnProcessor`, the input and output channels, and the `preprocessing.py` script that you created. This is very similar to a processor instance's `run` method in the SageMaker AI Python SDK. The `input_data` parameter passed into `ProcessingStep` is the input data of the step itself. This input data is used by the processor instance when it runs. 

    Note the `"train"`, `"validation"`, and `"test"` named channels specified in the output configuration for the processing job. Step `Properties` such as these can be referenced in subsequent steps and resolve to their runtime values when the pipeline runs. 

   ```
   from sagemaker.processing import ProcessingInput, ProcessingOutput
   from sagemaker.workflow.steps import ProcessingStep
      
   
   processor_args = sklearn_processor.run(
       inputs=[
         ProcessingInput(source=input_data, destination="/opt/ml/processing/input"),  
       ],
       outputs=[
           ProcessingOutput(output_name="train", source="/opt/ml/processing/train"),
           ProcessingOutput(output_name="validation", source="/opt/ml/processing/validation"),
           ProcessingOutput(output_name="test", source="/opt/ml/processing/test")
       ],
       code="abalone/preprocessing.py",
   ) 
   
   step_process = ProcessingStep(
       name="AbaloneProcess",
       step_args=processor_args
   )
   ```
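
The `np.split` call in `preprocessing.py` above divides the shuffled rows at the 70% and 85% marks, which yields a 70/15/15 train/validation/test split. The same index arithmetic in plain Python, for illustration:

```python
def split_indices(n):
    # Boundary indices matching np.split(X, [int(.7 * n), int(.85 * n)]).
    return int(.7 * n), int(.85 * n)

rows = list(range(100))  # stand-in for 100 preprocessed rows
a, b = split_indices(len(rows))
train, validation, test = rows[:a], rows[a:b], rows[b:]
print(len(train), len(validation), len(test))  # → 70 15 15
```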

#### Step 4: Define a training step


This section shows how to use the SageMaker AI [XGBoost Algorithm](https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html) to train a model on the training data output from the processing steps. 

**To define a training step**

1.  Specify the model path where you want to save the models from training. 

   ```
   model_path = f"s3://{default_bucket}/AbaloneTrain"
   ```

1. Configure an estimator for the XGBoost algorithm and the input dataset. The training instance type is passed into the estimator. A typical training script:
   + loads data from the input channels
   + configures training with hyperparameters
   + trains a model
   + saves a model to `model_dir` so that it can be hosted later

   SageMaker AI uploads the model to Amazon S3 in the form of a `model.tar.gz` at the end of the training job.

   ```
   from sagemaker.estimator import Estimator
   
   
   image_uri = sagemaker.image_uris.retrieve(
       framework="xgboost",
       region=region,
       version="1.0-1",
       py_version="py3",
       instance_type="ml.m5.xlarge"
   )
   xgb_train = Estimator(
       image_uri=image_uri,
       instance_type="ml.m5.xlarge",
       instance_count=1,
       output_path=model_path,
       sagemaker_session=pipeline_session,
       role=role,
   )
   xgb_train.set_hyperparameters(
       objective="reg:linear",
       num_round=50,
       max_depth=5,
       eta=0.2,
       gamma=4,
       min_child_weight=6,
       subsample=0.7,
       silent=0
   )
   ```

1. Create a `TrainingStep` using the estimator instance and properties of the `ProcessingStep`. Pass in the `S3Uri` of the `"train"` and `"validation"` output channel to the `TrainingStep`.  

   ```
   from sagemaker.inputs import TrainingInput
   from sagemaker.workflow.steps import TrainingStep
   
   
   train_args = xgb_train.fit(
       inputs={
           "train": TrainingInput(
               s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                   "train"
               ].S3Output.S3Uri,
               content_type="text/csv"
           ),
           "validation": TrainingInput(
               s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                   "validation"
               ].S3Output.S3Uri,
               content_type="text/csv"
           )
       },
   )
   
   step_train = TrainingStep(
       name="AbaloneTrain",
       step_args = train_args
   )
   ```

#### Step 5: Define a processing step for model evaluation


This section shows how to create a processing step to evaluate the accuracy of the model. The result of this model evaluation is used in the condition step to determine which run path to take.

**To define a processing step for model evaluation**

1. Create a file in the `/abalone` directory named `evaluation.py`. This script is used in a processing step to perform model evaluation. It takes a trained model and the test dataset as input, then produces a JSON file containing regression evaluation metrics.

   ```
   %%writefile abalone/evaluation.py
   import json
   import pathlib
   import pickle
   import tarfile
   import joblib
   import numpy as np
   import pandas as pd
   import xgboost
   
   
   from sklearn.metrics import mean_squared_error
   
   
   if __name__ == "__main__":
       model_path = f"/opt/ml/processing/model/model.tar.gz"
       with tarfile.open(model_path) as tar:
           tar.extractall(path=".")
       
       model = pickle.load(open("xgboost-model", "rb"))
   
       test_path = "/opt/ml/processing/test/test.csv"
       df = pd.read_csv(test_path, header=None)
       
       y_test = df.iloc[:, 0].to_numpy()
       df.drop(df.columns[0], axis=1, inplace=True)
       
       X_test = xgboost.DMatrix(df.values)
       
       predictions = model.predict(X_test)
   
       mse = mean_squared_error(y_test, predictions)
       std = np.std(y_test - predictions)
       report_dict = {
           "regression_metrics": {
               "mse": {
                   "value": mse,
                   "standard_deviation": std
               },
           },
       }
   
       output_dir = "/opt/ml/processing/evaluation"
       pathlib.Path(output_dir).mkdir(parents=True, exist_ok=True)
       
       evaluation_path = f"{output_dir}/evaluation.json"
       with open(evaluation_path, "w") as f:
           f.write(json.dumps(report_dict))
   ```

1.  Create an instance of a `ScriptProcessor` that is used to create a `ProcessingStep`. 

   ```
   from sagemaker.processing import ScriptProcessor
   
   
   script_eval = ScriptProcessor(
       image_uri=image_uri,
       command=["python3"],
       instance_type="ml.m5.xlarge",
       instance_count=1,
       base_job_name="script-abalone-eval",
       sagemaker_session=pipeline_session,
       role=role,
   )
   ```

1.  Create a `ProcessingStep` using the processor instance, the input and output channels, and the  `evaluation.py` script. Pass in:
   + the `S3ModelArtifacts` property from the `step_train` training step
   + the `S3Uri` of the `"test"` output channel of the `step_process` processing step

   This is very similar to a processor instance's `run` method in the SageMaker AI Python SDK.  

   ```
   from sagemaker.workflow.properties import PropertyFile
   
   
   evaluation_report = PropertyFile(
       name="EvaluationReport",
       output_name="evaluation",
       path="evaluation.json"
   )
   
   eval_args = script_eval.run(
       inputs=[
           ProcessingInput(
               source=step_train.properties.ModelArtifacts.S3ModelArtifacts,
               destination="/opt/ml/processing/model"
           ),
           ProcessingInput(
               source=step_process.properties.ProcessingOutputConfig.Outputs[
                   "test"
               ].S3Output.S3Uri,
               destination="/opt/ml/processing/test"
           )
       ],
       outputs=[
           ProcessingOutput(output_name="evaluation", source="/opt/ml/processing/evaluation"),
       ],
       code="abalone/evaluation.py",
   )
   
   step_eval = ProcessingStep(
       name="AbaloneEval",
       step_args=eval_args,
       property_files=[evaluation_report],
   )
   ```

#### Step 6: Define a CreateModelStep for batch transformation


**Important**  
We recommend using [Model step](build-and-manage-steps-types.md#step-type-model) to create models as of v2.90.0 of the SageMaker Python SDK. `CreateModelStep` will continue to work in previous versions of the SageMaker Python SDK, but is no longer actively supported.

This section shows how to create a SageMaker AI model from the output of the training step. This model is used for batch transformation on a new dataset. This step is passed into the condition step and only runs if the condition step evaluates to `true`.

**To define a CreateModelStep for batch transformation**

1.  Create a SageMaker AI model. Pass in the `S3ModelArtifacts` property from the `step_train` training step.

   ```
   from sagemaker.model import Model
   
   
   model = Model(
       image_uri=image_uri,
       model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
       sagemaker_session=pipeline_session,
       role=role,
   )
   ```

1. Define the model input for your SageMaker AI model.

   ```
   from sagemaker.inputs import CreateModelInput
   
   
   inputs = CreateModelInput(
       instance_type="ml.m5.large",
       accelerator_type="ml.eia1.medium",
   )
   ```

1. Create your `CreateModelStep` using the `CreateModelInput` and SageMaker AI model instance you defined.

   ```
   from sagemaker.workflow.steps import CreateModelStep
   
   
   step_create_model = CreateModelStep(
       name="AbaloneCreateModel",
       model=model,
       inputs=inputs,
   )
   ```

#### Step 7: Define a TransformStep to perform batch transformation


This section shows how to create a `TransformStep` to perform batch transformation on a dataset after the model is trained. This step is passed into the condition step and only runs if the condition step evaluates to `true`.

**To define a TransformStep to perform batch transformation**

1. Create a transformer instance with the appropriate compute instance type, instance count, and desired output Amazon S3 bucket URI. Pass in the `ModelName` property from the `step_create_model` `CreateModel` step. 

   ```
   from sagemaker.transformer import Transformer
   
   
   transformer = Transformer(
       model_name=step_create_model.properties.ModelName,
       instance_type="ml.m5.xlarge",
       instance_count=1,
       output_path=f"s3://{default_bucket}/AbaloneTransform"
   )
   ```

1. Create a `TransformStep` using the transformer instance you defined and the `batch_data` pipeline parameter.

   ```
   from sagemaker.inputs import TransformInput
   from sagemaker.workflow.steps import TransformStep
   
   
   step_transform = TransformStep(
       name="AbaloneTransform",
       transformer=transformer,
       inputs=TransformInput(data=batch_data)
   )
   ```

#### Step 8: Define a RegisterModel step to create a model package


**Important**  
We recommend using [Model step](build-and-manage-steps-types.md#step-type-model) to register models as of v2.90.0 of the SageMaker Python SDK. `RegisterModel` will continue to work in previous versions of the SageMaker Python SDK, but is no longer actively supported.

This section shows how to create an instance of `RegisterModel`. Running `RegisterModel` in a pipeline produces a model package: a reusable abstraction of model artifacts that packages everything needed for inference. It consists of an inference specification that defines the inference image to use, along with an optional model weights location. A model package group is a collection of model packages. You can use a `ModelPackageGroup` with Pipelines to add a new model package version to the group for every pipeline run. For more information about the Model Registry, see [Model Registration Deployment with Model Registry](model-registry.md).

This step is passed into the condition step and only runs if the condition step evaluates to `true`.

**To define a RegisterModel step to create a model package**
+  Construct a `RegisterModel` step using the estimator instance you used for the training step . Pass in the `S3ModelArtifacts` property from the `step_train` training step and specify a `ModelPackageGroup`. Pipelines creates this `ModelPackageGroup` for you.

  ```
  from sagemaker.model_metrics import MetricsSource, ModelMetrics 
  from sagemaker.workflow.step_collections import RegisterModel
  
  
  model_metrics = ModelMetrics(
      model_statistics=MetricsSource(
          s3_uri="{}/evaluation.json".format(
              step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
          ),
          content_type="application/json"
      )
  )
  step_register = RegisterModel(
      name="AbaloneRegisterModel",
      estimator=xgb_train,
      model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
      content_types=["text/csv"],
      response_types=["text/csv"],
      inference_instances=["ml.t2.medium", "ml.m5.xlarge"],
      transform_instances=["ml.m5.xlarge"],
      model_package_group_name=model_package_group_name,
      approval_status=model_approval_status,
      model_metrics=model_metrics
  )
  ```

#### Step 9: Define a condition step to verify model accuracy


A `ConditionStep` allows Pipelines to support conditional running in your pipeline DAG based on the condition of step properties. In this case, you only want to register a model package if the accuracy of that model exceeds the required value. The accuracy of the model is determined by the model evaluation step. If the accuracy exceeds the required value, the pipeline also creates a SageMaker AI Model and runs batch transformation on a dataset. This section shows how to define the Condition step.

**To define a condition step to verify model accuracy**

1.  Define a `ConditionLessThanOrEqualTo` condition using the accuracy value found in the output of the model evaluation processing step, `step_eval`. Get this output using the property file you indexed in the processing step and the respective JSONPath of the mean squared error value, `"mse"`.

   ```
   from sagemaker.workflow.conditions import ConditionLessThanOrEqualTo
   from sagemaker.workflow.condition_step import ConditionStep
   from sagemaker.workflow.functions import JsonGet
   
   
   cond_lte = ConditionLessThanOrEqualTo(
       left=JsonGet(
           step_name=step_eval.name,
           property_file=evaluation_report,
           json_path="regression_metrics.mse.value"
       ),
       right=6.0
   )
   ```

1.  Construct a `ConditionStep`. Pass in the `ConditionLessThanOrEqualTo` condition, then set the model package registration and batch transformation steps as the next steps to run if the condition passes. 

   ```
   step_cond = ConditionStep(
       name="AbaloneMSECond",
       conditions=[cond_lte],
       if_steps=[step_register, step_create_model, step_transform],
       else_steps=[], 
   )
   ```

#### Step 10: Create a pipeline


Now that you’ve created all of the steps, combine them into a pipeline.

**To create a pipeline**

1.  Define the following for your pipeline: `name`, `parameters`, and `steps`. Names must be unique within an `(account, region)` pair.
**Note**  
A step can only appear once in either the pipeline's step list or the if/else step lists of the condition step. It cannot appear in both. 

   ```
   from sagemaker.workflow.pipeline import Pipeline
   
   
   pipeline_name = "AbalonePipeline"
   pipeline = Pipeline(
       name=pipeline_name,
       parameters=[
           processing_instance_count,
           model_approval_status,
           input_data,
           batch_data,
       ],
       steps=[step_process, step_train, step_eval, step_cond],
   )
   ```

1.  (Optional) Examine the JSON pipeline definition to ensure that it's well-formed.

   ```
   import json
   
   json.loads(pipeline.definition())
   ```

 This pipeline definition is ready to submit to SageMaker AI. In the next tutorial, you submit this pipeline to SageMaker AI and start a run. 

## Define a pipeline (JSON)


You can also use [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_pipeline) or [CloudFormation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-sagemaker-pipeline.html) to create a pipeline. Creating a pipeline requires a pipeline definition, which is a JSON object that defines each step of the pipeline. The SageMaker Python SDK offers a convenient way to construct the pipeline definition, which you can use with any of the APIs previously mentioned to create the pipeline itself. Without the SDK, you have to write the raw JSON definition by hand, without the error checks that the SageMaker Python SDK provides. To see the schema for the pipeline JSON definition, see [SageMaker AI Pipeline Definition JSON Schema](https://aws-sagemaker-mlops.github.io/sagemaker-model-building-pipeline-definition-JSON-schema/). The following code sample shows an example of a SageMaker AI pipeline definition JSON object:

```
{'Version': '2020-12-01',
 'Metadata': {},
 'Parameters': [{'Name': 'ProcessingInstanceType',
   'Type': 'String',
   'DefaultValue': 'ml.m5.xlarge'},
  {'Name': 'ProcessingInstanceCount', 'Type': 'Integer', 'DefaultValue': 1},
  {'Name': 'TrainingInstanceType',
   'Type': 'String',
   'DefaultValue': 'ml.m5.xlarge'},
  {'Name': 'ModelApprovalStatus',
   'Type': 'String',
   'DefaultValue': 'PendingManualApproval'},
  {'Name': 'ProcessedData',
   'Type': 'String',
   'DefaultValue': 'S3_URL'},
  {'Name': 'InputDataUrl',
   'Type': 'String',
   'DefaultValue': 'S3_URL'}],
 'PipelineExperimentConfig': {'ExperimentName': {'Get': 'Execution.PipelineName'},
  'TrialName': {'Get': 'Execution.PipelineExecutionId'}},
 'Steps': [{'Name': 'ReadTrainDataFromFS',
   'Type': 'Processing',
   'Arguments': {'ProcessingResources': {'ClusterConfig': {'InstanceType': 'ml.m5.4xlarge',
      'InstanceCount': 2,
      'VolumeSizeInGB': 30}},
    'AppSpecification': {'ImageUri': 'IMAGE_URI',
     'ContainerArguments': [....]},
    'RoleArn': 'ROLE',
      'ProcessingInputs': [...],
    'ProcessingOutputConfig': {'Outputs': [.....]},
    'StoppingCondition': {'MaxRuntimeInSeconds': 86400}},
   'CacheConfig': {'Enabled': True, 'ExpireAfter': '30d'}},
   ...
   ...
   ...
   ]
}
```
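
The following sketch shows one way to prepare a `create_pipeline` request from a definition object like the one above. The `build_create_pipeline_request` helper and the role ARN are illustrative, not part of the SageMaker API; the actual boto3 call is shown in a comment because it requires AWS credentials.

```python
import json

def build_create_pipeline_request(name, definition, role_arn):
    # Hypothetical helper: assemble keyword arguments for the CreatePipeline API.
    # The API expects the pipeline definition as a JSON string, not a Python dict.
    if isinstance(definition, dict):
        definition = json.dumps(definition)
    return {
        "PipelineName": name,
        "PipelineDefinition": definition,
        "RoleArn": role_arn,
    }

# A minimal definition in the shape shown above (placeholders, not a runnable pipeline).
definition = {"Version": "2020-12-01", "Metadata": {}, "Parameters": [], "Steps": []}
request = build_create_pipeline_request(
    "AbalonePipeline",
    definition,
    "arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder role ARN
)
# To actually create the pipeline (requires AWS credentials):
# boto3.client("sagemaker").create_pipeline(**request)
print(request["PipelineName"])
```

Passing the definition as a serialized string mirrors what the SageMaker Python SDK does internally when you call `pipeline.upsert()`.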

 **Next step:** [Run a pipeline](run-pipeline.md) 

# Edit a pipeline


To make changes to a pipeline before running it, do the following:

1. Open SageMaker Studio by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane of Studio, select **Pipelines**.

1. Select a pipeline name to view details about the pipeline.

1. Choose the **Executions** tab.

1. Select the name of a pipeline execution.

1. Choose **Edit** to open the Pipeline Designer.

1. Update the edges between steps or the step configuration as required and choose **Save**. 

   Saving a pipeline after editing automatically generates a new version number.

1. Choose **Run**.

# Run a pipeline


After defining the steps of your pipeline as a directed acyclic graph (DAG), you can run your pipeline, which executes the steps defined in your DAG. The following walkthroughs show you how to run an Amazon SageMaker AI pipeline using either the drag-and-drop visual editor in Amazon SageMaker Studio or the Amazon SageMaker Python SDK.

## Run a pipeline (Pipeline designer)


To start a new execution of your pipeline, do the following:

------
#### [ Studio ]

1. Open SageMaker Studio by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Choose a pipeline name to open the pipeline details view.

1. Choose **Visual Editor** on the top right.

1. To start an execution from the latest version, choose **Executions**.

1. To start an execution from a specific version, follow these steps:
   + Choose the version icon in the bottom toolbar to open the version panel.
   + Choose the pipeline version you want to execute.
   + Hover over the version item to reveal the three-dot menu, then choose **Execute**.
   + (Optional) To view a previous version of the pipeline, choose **Preview** from the three-dot menu in the version panel. You can also edit the version by choosing **Edit** in the notification bar.

**Note**  
If your pipeline fails, the status banner will show a **Failed** status. After troubleshooting the failed step, choose **Retry** on the status banner to resume running the pipeline from that step.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Pipelines** from the menu.

1. To narrow the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name.

1. From the **Executions** or **Graph** tab in the execution list, choose **Create execution**.

1. Enter or update the following required information:
   + **Name** – Must be unique to your account in the AWS Region.
   + **ProcessingInstanceCount** – The number of instances to use for processing.
   + **ModelApprovalStatus** – The approval status to assign to the registered model package, such as `PendingManualApproval` or `Approved`.
   + **InputDataUrl** – The Amazon S3 URI of the input data.

1. Choose **Start**.

Once your pipeline is running, you can view the details of the execution by choosing **View details** on the status banner.

To stop the run, choose **Stop** on the status banner. To resume the execution from where it was stopped, choose **Resume** on the status banner.

**Note**  
If your pipeline fails, the status banner will show a **Failed** status. After troubleshooting the failed step, choose **Retry** on the status banner to resume running the pipeline from that step.

------

## Run a pipeline (SageMaker Python SDK)


After you’ve created a pipeline definition using the SageMaker AI Python SDK, you can submit it to SageMaker AI to start your execution. The following tutorial shows how to submit a pipeline, start an execution, examine the results of that execution, and delete your pipeline. 

**Topics**
+ [

### Prerequisites
](#run-pipeline-prereq)
+ [

### Step 1: Start the Pipeline
](#run-pipeline-submit)
+ [

### Step 2: Examine a Pipeline Execution
](#run-pipeline-examine)
+ [

### Step 3: Override Default Parameters for a Pipeline Execution
](#run-pipeline-parametrized)
+ [

### Step 4: Stop and Delete a Pipeline Execution
](#run-pipeline-delete)

### Prerequisites


This tutorial requires the following: 
+  A SageMaker notebook instance.  
+  A Pipelines pipeline definition. This tutorial assumes you're using the pipeline definition created by completing the [Define a pipeline](define-pipeline.md) tutorial. 

### Step 1: Start the Pipeline


First, you need to start the pipeline. 

**To start the pipeline**

1. Examine the JSON pipeline definition to ensure that it's well-formed.

   ```
   import json
   
   json.loads(pipeline.definition())
   ```

1. Submit the pipeline definition to the Pipelines service to create a pipeline if it doesn't exist, or update the pipeline if it does. The role passed in is used by Pipelines to create all of the jobs defined in the steps. 

   ```
   pipeline.upsert(role_arn=role)
   ```

1. Start a pipeline execution.

   ```
   execution = pipeline.start()
   ```

### Step 2: Examine a Pipeline Execution


Next, you need to examine the pipeline execution. 

**To examine a pipeline execution**

1.  Describe the pipeline execution status to ensure that it has been created and started successfully.

   ```
   execution.describe()
   ```

1. Wait for the execution to finish. 

   ```
   execution.wait()
   ```

1. List the execution steps and their status.

   ```
   execution.list_steps()
   ```

   Your output should look like the following:

   ```
   [{'StepName': 'AbaloneTransform',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 27, 870000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 45, 50, 492000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'TransformJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:transform-job/pipelines-cfvy1tjuxdq8-abalonetransform-ptyjoef3jy'}}},
    {'StepName': 'AbaloneRegisterModel',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 26, 929000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 28, 15000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'RegisterModel': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:model-package/abalonemodelpackagegroupname/1'}}},
    {'StepName': 'AbaloneCreateModel',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 26, 895000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 27, 708000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'Model': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:model/pipelines-cfvy1tjuxdq8-abalonecreatemodel-jl94rai0ra'}}},
    {'StepName': 'AbaloneMSECond',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 41, 25, 558000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 26, 329000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'Condition': {'Outcome': 'True'}}},
    {'StepName': 'AbaloneEval',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 37, 34, 767000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 41, 18, 80000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:processing-job/pipelines-cfvy1tjuxdq8-abaloneeval-zfraozhmny'}}},
    {'StepName': 'AbaloneTrain',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 34, 55, 867000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 37, 34, 34000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'TrainingJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:training-job/pipelines-cfvy1tjuxdq8-abalonetrain-tavd6f3wdf'}}},
    {'StepName': 'AbaloneProcess',
     'StartTime': datetime.datetime(2020, 11, 21, 2, 30, 27, 160000, tzinfo=tzlocal()),
     'EndTime': datetime.datetime(2020, 11, 21, 2, 34, 48, 390000, tzinfo=tzlocal()),
     'StepStatus': 'Succeeded',
     'CacheHitResult': {'SourcePipelineExecutionArn': ''},
     'Metadata': {'ProcessingJob': {'Arn': 'arn:aws:sagemaker:us-east-2:111122223333:processing-job/pipelines-cfvy1tjuxdq8-abaloneprocess-mgqyfdujcj'}}}]
   ```

1. After your pipeline execution is complete, download the resulting `evaluation.json` file from Amazon S3 to examine the report.

   ```
   evaluation_json = sagemaker.s3.S3Downloader.read_file("{}/evaluation.json".format(
       step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
   ))
   json.loads(evaluation_json)
   ```

### Step 3: Override Default Parameters for a Pipeline Execution


You can run additional executions of the pipeline by specifying different pipeline parameters to override the defaults.

**To override default parameters**

1. Create the pipeline execution. This starts another pipeline execution with the model approval status override set to "Approved". This means that the model package version generated by the `RegisterModel` step is automatically ready for deployment through CI/CD pipelines, such as with SageMaker Projects. For more information, see [MLOps Automation With SageMaker Projects](sagemaker-projects.md).

   ```
   execution = pipeline.start(
       parameters=dict(
           ModelApprovalStatus="Approved",
       )
   )
   ```

1. Wait for the execution to finish. 

   ```
   execution.wait()
   ```

1. List the execution steps and their status.

   ```
   execution.list_steps()
   ```

1. After your pipeline execution is complete, download the resulting `evaluation.json` file from Amazon S3 to examine the report.

   ```
   evaluation_json = sagemaker.s3.S3Downloader.read_file("{}/evaluation.json".format(
       step_eval.arguments["ProcessingOutputConfig"]["Outputs"][0]["S3Output"]["S3Uri"]
   ))
   json.loads(evaluation_json)
   ```

### Step 4: Stop and Delete a Pipeline Execution


When you're finished with your pipeline, you can stop any ongoing executions and delete the pipeline.

**To stop and delete a pipeline execution**

1. Stop the pipeline execution.

   ```
   execution.stop()
   ```

1. Delete the pipeline.

   ```
   pipeline.delete()
   ```
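
If you work outside of the SDK's `Pipeline` object, for example from a standalone cleanup script, you can derive the pipeline name from an execution ARN. The helper below is a sketch that assumes the execution ARN format `arn:aws:sagemaker:<region>:<account>:pipeline/<name>/execution/<id>`; the boto3 calls are shown as comments because they require AWS credentials.

```python
def pipeline_name_from_execution_arn(arn: str) -> str:
    # Assumed execution ARN format:
    # arn:aws:sagemaker:<region>:<account>:pipeline/<name>/execution/<id>
    resource = arn.split(":", 5)[5]      # "pipeline/<name>/execution/<id>"
    parts = resource.split("/")
    if len(parts) < 4 or parts[0] != "pipeline" or parts[2] != "execution":
        raise ValueError(f"Not a pipeline execution ARN: {arn}")
    return parts[1]

arn = "arn:aws:sagemaker:us-east-2:111122223333:pipeline/AbalonePipeline/execution/p3zvq1example"
name = pipeline_name_from_execution_arn(arn)
print(name)
# With credentials configured, the equivalent boto3 calls would be:
# sm = boto3.client("sagemaker")
# sm.stop_pipeline_execution(PipelineExecutionArn=arn)
# sm.delete_pipeline(PipelineName=name)
```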

# Stop a pipeline


You can stop a pipeline run in the Amazon SageMaker Studio console. To do so, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. In the left navigation pane, select **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name.

1. Choose the **Executions** tab.

1. Select the execution to stop.

1. Choose **Stop**. To resume the execution from where it was stopped, choose **Resume**.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Pipelines** from the menu.

1. To narrow the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. To stop a pipeline run, choose **View details** on the status banner of the pipeline, and then choose **Stop**. To resume the execution from where it was stopped, choose **Resume**.

------

# View the details of a pipeline


You can view the details of a SageMaker AI pipeline to understand its parameters, the dependencies of its steps, or monitor its progress and status. This can help you troubleshoot or optimize your workflow. You can access the details of a given pipeline using the Amazon SageMaker Studio console and explore its execution history, definition, parameters, and metadata.

Alternatively, if your pipeline is associated with a SageMaker AI Project, you can access the pipeline details from the project's details page. For more information, see [View Project Resources](sagemaker-projects-resources.md).

To view the details of a SageMaker AI pipeline, complete the following steps based on whether you use Studio or Studio Classic.

**Note**  
Model repacking happens when the pipeline needs to include a custom script in the compressed model file (`model.tar.gz`) that is uploaded to Amazon S3 and used to deploy a model to a SageMaker AI endpoint. When a SageMaker AI pipeline trains a model and registers it to the model registry, it introduces a repack step *if* the trained model output from the training job needs to include a custom inference script. The repack step uncompresses the model, adds the new script, and recompresses the model. Running the pipeline adds the repack step as a training job.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, select **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name to view details about the pipeline.

1. Choose one of the following tabs to view pipeline details:
   + **Executions** – Details about the executions.
   + **Graph** – The pipeline graph, including all steps.
   + **Parameters** – The run parameters and metrics related to the pipeline.
   + **Information** – The metadata associated with the pipeline, such as tags, the pipeline Amazon Resource Name (ARN), and role ARN. You can also edit the pipeline description from this page.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Pipelines** from the menu.

1. To narrow the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name to view details about the pipeline. The pipeline details tab opens and displays a list of pipeline executions. You can start an execution or choose one of the other tabs for more information about the pipeline. Use the **Property Inspector** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/gears.png)) to choose which columns to display.

1. From the pipeline details page, choose one of the following tabs to view details about the pipeline:
   + **Executions** – Details about the executions. You can create an execution from this tab or the **Graph** tab.
   + **Graph** – The DAG for the pipeline.
   + **Parameters** – Includes the model approval status.
   + **Settings** – The metadata associated with the pipeline. You can download the pipeline definition file and edit the pipeline name and description from this tab.

------

# View the details of a pipeline run


You can review the details of a particular SageMaker AI pipeline run. This can help you:
+ Identify and resolve problems that may have occurred during the run, such as failed steps or unexpected errors.
+ Compare the results of different pipeline executions to understand how changes in input data or parameters impact the overall workflow.
+ Identify bottlenecks and opportunities for optimization.

To view the details of a pipeline run, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, select **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name to view details about the pipeline.

1. Choose the **Executions** tab.

1. Select the name of a pipeline execution to view. The pipeline graph for that execution appears.

1. Choose any of the pipeline steps in the graph to see step settings in the right sidebar.

1. Choose one of the following tabs to view more pipeline details:
   + **Definition** — The pipeline graph, including all steps.
   + **Parameters** – Includes the model approval status.
   + **Details** – The metadata associated with the pipeline, such as tags, the pipeline Amazon Resource Name (ARN), and role ARN. You can also edit the pipeline description from this page.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Pipelines** from the menu.

1. To narrow the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name. The pipeline's **Executions** page opens.

1. In the **Executions** page, select an execution name to view details about the execution. The execution details tab opens and displays a graph of the steps in the pipeline.

1. To search for a step by name, type characters that match a step name in the search field. Use the resizing icons on the lower-right side of the graph to zoom in and out of the graph, fit the graph to screen, and expand the graph to full screen. To focus on a specific part of the graph, you can select a blank area of the graph and drag the graph to center on that area.   
![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/yosemite/execution-graph-w-input.png)

1. Choose one of the pipeline steps in the graph to see details about the step. In the preceding screenshot, a training step is chosen and displays the following tabs:
   + **Input** – The training inputs. If an input source is from Amazon Simple Storage Service (Amazon S3), choose the link to view the file in the Amazon S3 console.
   + **Output** – The training outputs, such as metrics, charts, files, and evaluation outcome. The graphs are produced using the [Tracker](https://sagemaker-experiments.readthedocs.io/en/latest/tracker.html#smexperiments.tracker.Tracker.log_precision_recall) APIs.
   + **Logs** – The Amazon CloudWatch logs produced by the step.
   + **Info** – The parameters and metadata associated with the step.  
![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/yosemite/execution-graph-info.png)

------

# Download a pipeline definition file


You can download the definition file for your SageMaker AI pipeline directly from the Amazon SageMaker Studio UI. You can use this pipeline definition file for:
+ Backup and restoration: Use the downloaded file to create a backup of your pipeline configuration, which you can restore in case of infrastructure failures or accidental changes.
+ Version control: Store the pipeline definition file in a source control system to track changes to the pipeline and revert to previous versions if needed.
+ Programmatic interactions: Use the pipeline definition file as input to the SageMaker SDK or AWS CLI.
+ Integration with automation processes: Integrate the pipeline definition into your CI/CD workflows or other automation processes.
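
As a sketch of the programmatic use case, the following code writes a minimal stand-in definition file and then loads it to inspect the step names. In practice, you would point the path at the JSON file you downloaded from Studio; the definition content and file path here are placeholders.

```python
import json
import tempfile
from pathlib import Path

# Write a minimal stand-in definition file; replace this with the
# pipeline definition JSON downloaded from Studio.
definition = {
    "Version": "2020-12-01",
    "Parameters": [{"Name": "InputDataUrl", "Type": "String", "DefaultValue": "s3://bucket/data"}],
    "Steps": [{"Name": "AbaloneProcess", "Type": "Processing", "Arguments": {}}],
}
path = Path(tempfile.gettempdir()) / "pipeline-definition.json"
path.write_text(json.dumps(definition))

# Load the definition file and inspect it before reusing it programmatically.
loaded = json.loads(path.read_text())
step_names = [step["Name"] for step in loaded["Steps"]]
print(step_names)
```

Once loaded, the same JSON string can be passed to the boto3 `create_pipeline` API or stored in source control.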

To download the definition file of a pipeline, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, select **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name. The **Executions** page opens and displays a list of pipeline executions.

1. Stay on the **Executions** page or choose the **Graph**, **Information**, or **Parameters** page to the left of the pipeline executions table. You can download the pipeline definition from any of these pages.

1. At the top right of the page, choose the vertical ellipsis and choose **Download pipeline definition (JSON)**.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Pipelines** from the menu.

1. To narrow the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. Select a pipeline name.

1. Choose the **Settings** tab.

1. Choose **Download pipeline definition file**.

------

# Access experiment data from a pipeline

**Note**  
SageMaker Experiments is a feature provided in Studio Classic only.

When you create a pipeline and specify [`pipeline_experiment_config`](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.pipeline.Pipeline.pipeline_experiment_config), Pipelines creates the following SageMaker Experiments entities by default if they don't exist:
+ An experiment for the pipeline
+ A run group for every execution of the pipeline
+ A run for each SageMaker AI job created in a pipeline step
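
In the SageMaker Python SDK, this behavior is configured through the `PipelineExperimentConfig` class, typically with `ExecutionVariables` supplying the names. The following plain-Python sketch (no SDK required) mirrors the default configuration as it appears in a pipeline definition; the helper name is illustrative.

```python
def default_pipeline_experiment_config():
    # Sketch of the PipelineExperimentConfig block that Pipelines emits by
    # default: the experiment is named after the pipeline, and the run group
    # after the execution ID. Both resolve at execution time via 'Get' expressions.
    return {
        "ExperimentName": {"Get": "Execution.PipelineName"},
        "TrialName": {"Get": "Execution.PipelineExecutionId"},
    }

config = default_pipeline_experiment_config()
print(config["ExperimentName"])
```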

For information about how experiments are integrated with pipelines, see [Amazon SageMaker Experiments Integration](pipelines-experiments.md). For more information about SageMaker Experiments, see [Amazon SageMaker Experiments in Studio Classic](experiments.md).

You can get to the list of runs associated with a pipeline from either the pipeline executions list or the experiments list.

**To view the runs list from the pipeline executions list**

1. To view the pipeline executions list, follow the first five steps in the *Studio Classic* tab of [View the details of a pipeline](pipelines-studio-list.md).

1. On the top right of the screen, choose the **Filter** icon (![\[Funnel or filter icon representing data filtering or narrowing down options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/jumpstart/jumpstart-filter-icon.png)).

1. Choose **Experiment**. If experiment integration wasn't deactivated when the pipeline was created, the experiment name is displayed in the executions list. 
**Note**  
Experiments integration was introduced in v2.41.0 of the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable). Pipelines created with an earlier version of the SDK aren't integrated with experiments by default.

1. Select the experiment of your choice to view run groups and runs related to that experiment.

**To view the runs list from the experiments list**

1. In the left sidebar of Studio Classic, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Experiments** from the menu.

1. Use the search bar or the **Filter** icon (![\[Funnel or filter icon representing data filtering or narrowing down options.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/jumpstart/jumpstart-filter-icon.png)) to filter the list to experiments created by a pipeline.

1. Open an experiment name and view a list of runs created by the pipeline.

# Track the lineage of a pipeline

In this tutorial, you use Amazon SageMaker Studio to track the lineage of an Amazon SageMaker AI pipeline.

The pipeline was created by the [Orchestrating Jobs with Amazon SageMaker Model Building Pipelines](https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-pipelines/tabular/abalone_build_train_deploy/sagemaker-pipelines-preprocess-train-evaluate-batch-transform.html) notebook in the [Amazon SageMaker example GitHub repository](https://github.com/awslabs/amazon-sagemaker-examples). For detailed information on how the pipeline was created, see [Define a pipeline](define-pipeline.md).

Lineage tracking in Studio is centered around a directed acyclic graph (DAG). The DAG represents the steps in a pipeline. From the DAG you can track the lineage from any step to any other step. The following diagram displays the steps in the pipeline. These steps appear as a DAG in Studio.

![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/yosemite/pipeline-tutorial-steps.png)


To track the lineage of a pipeline in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

**To track the lineage of a pipeline**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, select **Pipelines**.

1. (Optional) To filter the list of pipelines by name, enter a full or partial pipeline name in the search field.

1. In the **Name** column, select a pipeline name to view details about the pipeline.

1. Choose the **Executions** tab.

1. In the **Name** column of the **Executions** table, select the name of a pipeline execution to view.

1. At the top right of the **Executions** page, choose the vertical ellipsis and choose **Download pipeline definition (JSON)**. You can view the file to see how the pipeline graph was defined. 

1. Choose **Edit** to open the Pipeline Designer.

1. Use the resizing and zoom controls at the top right corner of the canvas to zoom in and out of the graph, fit the graph to screen, or expand the graph to full screen.

1. To view your training, validation, and test datasets, complete the following steps:

   1. Choose the Processing step in your pipeline graph.

   1. In the right sidebar, choose the **Overview** tab.

   1. In the **Files** section, find the Amazon S3 paths to the training, validation, and test datasets.

1. To view your model artifacts, complete the following steps:

   1. Choose the Training step in your pipeline graph.

   1. In the right sidebar, choose the **Overview** tab.

   1. In the **Files** section, find the Amazon S3 paths to the model artifact.

1. To find the model package ARN, complete the following steps:

   1. Choose the Register model step.

   1. In the right sidebar, choose the **Overview** tab.

   1. In the **Files** section, find the ARN of the model package.

------
#### [ Studio Classic ]

**To track the lineage of a pipeline**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left sidebar of Studio Classic, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. In the menu, select **Pipelines**.

1. Use the **Search** box to filter the pipelines list.

1. Choose the `AbalonePipeline` pipeline to view the execution list and other details about the pipeline.

1. Choose the **Property Inspector** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/gears.png)) in the right sidebar to open the **TABLE PROPERTIES** pane, where you can choose which properties to view.

1. Choose the **Settings** tab and then choose **Download pipeline definition file**. You can view the file to see how the pipeline graph was defined.

1. On the **Execution** tab, select the first row in the execution list to view its execution graph and other details about the execution. Note that the graph matches the diagram displayed at the beginning of the tutorial.

   Use the resizing icons on the lower-right side of the graph to zoom in and out of the graph, fit the graph to screen, or expand the graph to full screen. To focus on a specific part of the graph, you can select a blank area of the graph and drag the graph to center on that area. The inset on the lower-right side of the graph displays your location in the graph.  
![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/yosemite/pipeline-tutorial-execution-graph.png)

1. On the **Graph** tab, choose the `AbaloneProcess` step to view details about the step.

1. Find the Amazon S3 paths to the training, validation, and test datasets in the **Output** tab, under **Files**.
**Note**  
To get the full paths, right-click the path and then choose **Copy cell contents**.

   ```
   s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/train
   s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/validation
   s3://sagemaker-eu-west-1-acct-id/sklearn-abalone-process-2020-12-05-17-28-28-509/output/test
   ```

1. Choose the `AbaloneTrain` step.

1. Find the Amazon S3 path to the model artifact in the **Output** tab, under **Files**:

   ```
   s3://sagemaker-eu-west-1-acct-id/AbaloneTrain/pipelines-6locnsqz4bfu-AbaloneTrain-NtfEpI0Ahu/output/model.tar.gz
   ```

1. Choose the `AbaloneRegisterModel` step.

1. Find the ARN of the model package in the **Output** tab, under **Files**:

   ```
   arn:aws:sagemaker:eu-west-1:acct-id:model-package/abalonemodelpackagegroupname/2
   ```

------

# Kubernetes Orchestration


You can orchestrate your SageMaker training and inference jobs with SageMaker AI Operators for Kubernetes and SageMaker AI Components for Kubeflow Pipelines. SageMaker AI Operators for Kubernetes make it easier for developers and data scientists using Kubernetes to train, tune, and deploy machine learning (ML) models in SageMaker AI. SageMaker AI Components for Kubeflow Pipelines allow you to move your data processing and training jobs from the Kubernetes cluster to SageMaker AI’s machine learning-optimized managed service.

**Topics**
+ [

# SageMaker AI Operators for Kubernetes
](kubernetes-sagemaker-operators.md)
+ [

# SageMaker AI Components for Kubeflow Pipelines
](kubernetes-sagemaker-components-for-kubeflow-pipelines.md)

# SageMaker AI Operators for Kubernetes


SageMaker AI Operators for Kubernetes make it easier for developers and data scientists using Kubernetes to train, tune, and deploy machine learning (ML) models in SageMaker AI. You can install these SageMaker AI Operators on your Kubernetes cluster in Amazon Elastic Kubernetes Service (Amazon EKS) to create SageMaker AI jobs natively using the Kubernetes API and command-line Kubernetes tools such as `kubectl`. This guide shows how to set up and use the operators to run model training, hyperparameter tuning, or inference (real-time and batch) on SageMaker AI from a Kubernetes cluster. The procedures and guidelines in this chapter assume that you are familiar with Kubernetes and its basic commands.

**Important**  
We are stopping the development and technical support of the original version of [ SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master).  
If you are currently using version `v1.2.2` or below of [ SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master), we recommend migrating your resources to the [ACK service controller for Amazon SageMaker](https://github.com/aws-controllers-k8s/sagemaker-controller). The ACK service controller is a new generation of SageMaker Operators for Kubernetes based on [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/).  
For information on the migration steps, see [Migrate resources to the latest Operators](kubernetes-sagemaker-operators-migrate.md).  
For answers to frequently asked questions on the end of support of the original version of SageMaker Operators for Kubernetes, see [Announcing the End of Support of the Original Version of SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-eos-announcement.md).

**Note**  
There is no additional charge to use these operators. You do incur charges for any SageMaker AI resources that you use through these operators.

## What is an operator?


A Kubernetes operator is an application controller that manages applications on behalf of a Kubernetes user. The controllers of the Kubernetes control plane run control loops that watch the cluster's central state store (etcd) and reconcile the state of the applications they control. Examples of such controllers include the [cloud-controller-manager](https://kubernetes.io/docs/concepts/architecture/cloud-controller/) and the [kube-controller-manager](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/). Operators typically provide a higher-level abstraction than the raw Kubernetes API, making it easier for users to deploy and manage applications. To add new capabilities to Kubernetes, developers can extend the Kubernetes API by creating a **custom resource** that contains their application-specific or domain-specific logic and components. Operators let users natively invoke these custom resources and automate the associated workflows.

### How does AWS Controllers for Kubernetes (ACK) work?


The SageMaker AI Operators for Kubernetes allow you to manage jobs in SageMaker AI from your Kubernetes cluster. The latest version of SageMaker AI Operators for Kubernetes is based on AWS Controllers for Kubernetes (ACK). ACK includes a common controller runtime, a code generator, and a set of AWS service-specific controllers, one of which is the SageMaker AI controller.

The following diagram illustrates how ACK works.

![\[ACK based SageMaker AI Operator for Kubernetes explained.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/k8s-orchestration/sagemaker-operators-for-kubernetes-ack-controller.png)


In this diagram, a Kubernetes user wants to run model training on SageMaker AI from within the Kubernetes cluster using the Kubernetes API. The user issues a `kubectl apply` call, passing in a file, called a manifest, that describes a Kubernetes custom resource for the SageMaker training job. `kubectl apply` passes the manifest to the Kubernetes API server running in the Kubernetes controller node (Step *1* in the workflow diagram).

The Kubernetes API server receives the manifest with the SageMaker training job specification and determines whether the user has permission to create a custom resource of kind `TrainingJob` in the `sagemaker.services.k8s.aws` API group, and whether the custom resource is properly formatted (Step *2*). If the user is authorized and the custom resource is valid, the Kubernetes API server writes the custom resource to its etcd data store (Step *3*) and then responds to the user that the custom resource has been created (Step *4*).

The SageMaker AI controller, which runs on a Kubernetes worker node within the context of a normal Kubernetes Pod, is notified (Step *5*) that a new `TrainingJob` custom resource has been created. The SageMaker AI controller then communicates with the SageMaker API, calling the SageMaker AI `CreateTrainingJob` API to create the training job in AWS (Step *6*). After communicating with the SageMaker API, the SageMaker AI controller calls the Kubernetes API server to update (Step *7*) the custom resource's status with the information it received from SageMaker AI. The SageMaker AI controller therefore provides the same information to developers that they would have received using the AWS SDK.
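For illustration, such a custom resource is an ordinary YAML manifest submitted with `kubectl apply`. The following sketch shows the general shape of an ACK `TrainingJob` manifest; the field names mirror the SageMaker `CreateTrainingJob` API, and the job name, role ARN, image URI, and S3 paths are placeholder values you would replace with your own (consult the ACK SageMaker controller reference for the authoritative schema):

```yaml
apiVersion: sagemaker.services.k8s.aws/v1alpha1
kind: TrainingJob
metadata:
  name: my-training-job                # placeholder name
spec:
  trainingJobName: my-training-job
  roleARN: arn:aws:iam::111122223333:role/my-sagemaker-execution-role   # example ARN
  algorithmSpecification:
    trainingImage: <training image URI>
    trainingInputMode: File
  inputDataConfig:
    - channelName: train
      dataSource:
        s3DataSource:
          s3DataType: S3Prefix
          s3URI: s3://amzn-s3-demo-bucket/train
          s3DataDistributionType: FullyReplicated
  outputDataConfig:
    s3OutputPath: s3://amzn-s3-demo-bucket/output
  resourceConfig:
    instanceCount: 1
    instanceType: ml.m5.large
    volumeSizeInGB: 5
  stoppingCondition:
    maxRuntimeInSeconds: 3600
```

Applying this manifest with `kubectl apply -f training-job.yaml` triggers the workflow described above; `kubectl describe trainingjob my-training-job` then shows the status the controller writes back.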

### Permissions overview


The operators access SageMaker AI resources on your behalf. The IAM role that the operator assumes to interact with AWS resources differs from the credentials you use to access the Kubernetes cluster. The role also differs from the role that AWS assumes when running your machine learning jobs. 

The following image explains the various authentication layers.

![\[SageMaker AI Operator for Kubernetes various authentication layers.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/k8s-orchestration/sagemaker-operators-for-kubernetes-authentication.png)


# Latest SageMaker AI Operators for Kubernetes


This section is based on the latest version of SageMaker AI Operators for Kubernetes using AWS Controllers for Kubernetes (ACK).

**Important**  
If you are currently using version `v1.2.2` or below of [ SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master), we recommend migrating your resources to the [ACK service controller for Amazon SageMaker](https://github.com/aws-controllers-k8s/sagemaker-controller). The ACK service controller is a new generation of SageMaker Operators for Kubernetes based on [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/).  
For information on the migration steps, see [Migrate resources to the latest Operators](kubernetes-sagemaker-operators-migrate.md).  
For answers to frequently asked questions on the end of support of the original version of SageMaker Operators for Kubernetes, see [Announcing the End of Support of the Original Version of SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-eos-announcement.md).

The latest version of [SageMaker AI Operators for Kubernetes](https://github.com/aws-controllers-k8s/sagemaker-controller) is based on [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/), a framework for building Kubernetes custom controllers where each controller communicates with an AWS service API. These controllers allow Kubernetes users to provision AWS resources like databases or message queues using the Kubernetes API.

Use the following steps to install and use ACK to train, tune, and deploy machine learning models with Amazon SageMaker AI.

**Topics**
+ [

## Install SageMaker AI Operators for Kubernetes
](#kubernetes-sagemaker-operators-ack-install)
+ [

## Use SageMaker AI Operators for Kubernetes
](#kubernetes-sagemaker-operators-ack-use)
+ [

## Reference
](#kubernetes-sagemaker-operators-ack-reference)

## Install SageMaker AI Operators for Kubernetes


To set up the latest available version of SageMaker AI Operators for Kubernetes, see the *Setup* section in [ Machine Learning with the ACK SageMaker AI Controller](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/#setup).

## Use SageMaker AI Operators for Kubernetes


For a tutorial on how to train a machine learning model with the ACK service controller for Amazon SageMaker AI using Amazon EKS, see [Machine Learning with the ACK SageMaker AI Controller](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/).

For an autoscaling example, see [Scale SageMaker AI Workloads with Application Auto Scaling](https://aws-controllers-k8s.github.io/community/docs/tutorials/autoscaling-example/).

## Reference


See also the [ACK service controller for Amazon SageMaker AI GitHub repository](https://github.com/aws-controllers-k8s/sagemaker-controller) or read [AWS Controllers for Kubernetes Documentation](https://aws-controllers-k8s.github.io/community/docs/community/overview/). 

# Old SageMaker AI Operators for Kubernetes


This section is based on the original version of [SageMaker AI Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s).

**Important**  
We are stopping the development and technical support of the original version of [ SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master).  
If you are currently using version `v1.2.2` or below of [ SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master), we recommend migrating your resources to the [ACK service controller for Amazon SageMaker](https://github.com/aws-controllers-k8s/sagemaker-controller). The ACK service controller is a new generation of SageMaker Operators for Kubernetes based on [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/).  
For information on the migration steps, see [Migrate resources to the latest Operators](kubernetes-sagemaker-operators-migrate.md).  
For answers to frequently asked questions on the end of support of the original version of SageMaker Operators for Kubernetes, see [Announcing the End of Support of the Original Version of SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-eos-announcement.md).

**Topics**
+ [

## Install SageMaker AI Operators for Kubernetes
](#kubernetes-sagemaker-operators-eos-install)
+ [

# Use Amazon SageMaker AI Jobs
](kubernetes-sagemaker-jobs.md)
+ [

# Migrate resources to the latest Operators
](kubernetes-sagemaker-operators-migrate.md)
+ [

# Announcing the End of Support of the Original Version of SageMaker AI Operators for Kubernetes
](kubernetes-sagemaker-operators-eos-announcement.md)

## Install SageMaker AI Operators for Kubernetes


Use the following steps to install and use SageMaker AI Operators for Kubernetes to train, tune, and deploy machine learning models with Amazon SageMaker AI.

**Topics**
+ [

### IAM role-based setup and operator deployment
](#iam-role-based-setup-and-operator-deployment)
+ [

### Clean up resources
](#cleanup-operator-resources)
+ [

### Delete operators
](#delete-operators)
+ [

### Troubleshooting
](#troubleshooting)
+ [

### Images and SMlogs in each Region
](#images-and-smlogs-in-each-region)

### IAM role-based setup and operator deployment


The following sections describe the steps to set up and deploy the original version of the operator.

**Warning**  
**Reminder:** The following steps do not install the latest version of SageMaker AI Operators for Kubernetes. To install the new ACK-based SageMaker AI Operators for Kubernetes, see [Latest SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-ack.md).

#### Prerequisites


This guide assumes that you have completed the following prerequisites: 
+ Install the following tools on the client machine used to access your Kubernetes cluster: 
  + [kubectl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) version 1.13 or later. Use a `kubectl` version that is within one minor version of your Amazon EKS cluster control plane. For example, a 1.13 `kubectl` client works with Kubernetes 1.13 and 1.14 clusters. OpenID Connect (OIDC) is not supported in versions earlier than 1.13. 
  + [eksctl](https://github.com/weaveworks/eksctl) version 0.7.0 or later 
  + [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/install-cliv1.html) version 1.16.232 or later 
  + (Optional) [Helm](https://helm.sh/docs/intro/install/) version 3.0 or later 
  + [aws-iam-authenticator](https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html) 
+ Have IAM permissions to create roles and attach policies to roles.
+ Create a Kubernetes cluster on which to run the operators. The cluster must run Kubernetes version 1.13 or 1.14. For automated cluster creation using `eksctl`, see [Getting Started with eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html). Provisioning a cluster takes 20–30 minutes. 

#### Cluster-scoped deployment


Before you can deploy your operator using an IAM role, associate an OpenID Connect (OIDC) identity provider (IdP) with your cluster so that it can authenticate with IAM.

##### Create an OIDC provider for your cluster


The following instructions show how to create and associate an OIDC provider with your Amazon EKS cluster.

1. Set the local `CLUSTER_NAME` and `AWS_REGION` environment variables as follows:

   ```
   # Set the Region and cluster
   export CLUSTER_NAME="<your cluster name>"
   export AWS_REGION="<your region>"
   ```

1. Use the following command to associate the OIDC provider with your cluster. For more information, see [Enabling IAM Roles for Service Accounts on your Cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html). 

   ```
   eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} \
         --region ${AWS_REGION} --approve
   ```

   Your output should look like the following: 

   ```
   [_]  eksctl version 0.10.1
   [_]  using region us-east-1
   [_]  IAM OpenID Connect provider is associated with cluster "my-cluster" in "us-east-1"
   ```

Now that the cluster has an OIDC identity provider, you can create a role and give a Kubernetes ServiceAccount permission to assume the role.

##### Get the OIDC ID


To set up the ServiceAccount, obtain the OIDC issuer URL using the following command:

```
aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} \
      --query cluster.identity.oidc.issuer --output text
```

The command returns a URL like the following: 

```
https://oidc.eks.${AWS_REGION}.amazonaws.com/id/D48675832CA65BD10A532F597OIDCID
```

In this URL, the value `D48675832CA65BD10A532F597OIDCID` is the OIDC ID. The OIDC ID for your cluster is different. You need this OIDC ID value to create a role. 
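If you are scripting this setup, you can also extract the ID from the issuer URL with plain shell parameter expansion instead of copying it by hand. A sketch using the example URL above (`OIDC_URL` and `OIDC_ID` are illustrative variable names):

```shell
# The issuer URL returned by `aws eks describe-cluster`, shown here with the example value.
OIDC_URL="https://oidc.eks.us-east-1.amazonaws.com/id/D48675832CA65BD10A532F597OIDCID"

# Everything after the last "/" is the OIDC ID.
OIDC_ID=${OIDC_URL##*/}

echo "${OIDC_ID}"   # D48675832CA65BD10A532F597OIDCID
```

In practice you would populate `OIDC_URL` from the `describe-cluster` command and reuse `OIDC_ID` when building the trust policy in the next section.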

If your output is `None`, your AWS CLI client version is too old. To work around this, run the following command: 

```
aws eks describe-cluster --region ${AWS_REGION} --query cluster --name ${CLUSTER_NAME} --output text | grep OIDC
```

The OIDC URL is returned as follows: 

```
OIDC https://oidc.eks.us-east-1.amazonaws.com/id/D48675832CA65BD10A532F597OIDCID
```

##### Create an IAM role


1. Create a file named `trust.json` and insert the following trust relationship code block into it. Use the first block for standard AWS Regions and the second, which uses the `arn:aws-cn` partition, for AWS China Regions. Be sure to replace all `<OIDC ID>` and `<EKS Cluster region>` placeholders, and the example account ID `111122223333`, with values corresponding to your cluster. 

------
#### [ AWS Regions ]

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Effect": "Allow",
           "Principal": {
             "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>"
           },
           "Action": "sts:AssumeRoleWithWebIdentity",
           "Condition": {
             "StringEquals": {
               "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:aud": "sts.amazonaws.com",
               "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:sub": "system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default"
             }
           }
         }
       ]
   }
   ```

------
#### [ AWS China Regions ]

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Effect": "Allow",
           "Principal": {
             "Federated": "arn:aws-cn:iam::111122223333:oidc-provider/oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>"
           },
           "Action": "sts:AssumeRoleWithWebIdentity",
           "Condition": {
             "StringEquals": {
               "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:aud": "sts.amazonaws.com",
               "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:sub": "system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default"
             }
           }
         }
       ]
   }
   ```

------
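If you prefer to template the file rather than edit placeholders by hand, a shell here-document can substitute the values. A sketch for the standard `aws` partition; the variable names are illustrative, and the account ID, Region, and OIDC ID shown are example values:

```shell
# Illustrative values; substitute your own account ID, Region, and OIDC ID.
ACCOUNT_ID="111122223333"
AWS_REGION="us-east-1"
OIDC_ID="D48675832CA65BD10A532F597OIDCID"
OIDC_HOST="oidc.eks.${AWS_REGION}.amazonaws.com/id/${OIDC_ID}"

# Write the trust policy with the placeholders filled in.
cat > trust.json <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
      {
        "Effect": "Allow",
        "Principal": {
          "Federated": "arn:aws:iam::${ACCOUNT_ID}:oidc-provider/${OIDC_HOST}"
        },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
          "StringEquals": {
            "${OIDC_HOST}:aud": "sts.amazonaws.com",
            "${OIDC_HOST}:sub": "system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default"
          }
        }
      }
    ]
}
EOF
```

The resulting `trust.json` is what you pass to `aws iam create-role` in the next step; for AWS China Regions, change the `Federated` partition prefix to `arn:aws-cn`.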

1. Run the following command to create a role with the trust relationship defined in `trust.json`. This role allows the Amazon EKS cluster to get and refresh credentials from IAM. 

   ```
   aws iam create-role --region ${AWS_REGION} --role-name <role name> --assume-role-policy-document file://trust.json --output=text
   ```

   Your output should look like the following: 

   ```
   ROLE    arn:aws:iam::123456789012:role/my-role 2019-11-22T21:46:10Z    /       ABCDEFSFODNN7EXAMPLE   my-role
   ASSUMEROLEPOLICYDOCUMENT        2012-10-17
   STATEMENT       sts:AssumeRoleWithWebIdentity   Allow
   STRINGEQUALS    sts.amazonaws.com       system:serviceaccount:sagemaker-k8s-operator-system:sagemaker-k8s-operator-default
   PRINCIPAL       arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/
   ```

Take note of the role ARN; you pass this value to your operator. 

##### Attach the AmazonSageMakerFullAccess policy to the role


To give the role access to SageMaker AI, attach the [AmazonSageMakerFullAccess](https://console.aws.amazon.com/iam/home?#/policies/arn:aws:iam::aws:policy/AmazonSageMakerFullAccess) policy. If you want to limit permissions to the operator, you can create your own custom policy and attach it. 

 To attach `AmazonSageMakerFullAccess`, run the following command: 

```
aws iam attach-role-policy --role-name <role name>  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
```

The Kubernetes ServiceAccount `sagemaker-k8s-operator-default` should have `AmazonSageMakerFullAccess` permissions. Confirm this when you install the operator. 

##### Deploy the operator


When deploying your operator, you can use either a YAML file or Helm charts. 

##### Deploy the operator using YAML


This is the simplest way to deploy your operators. The process is as follows: 

1. Download the installer script using the following command: 

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/release/rolebased/installer.yaml
   ```

1. Edit the `installer.yaml` file and replace the value of the `eks.amazonaws.com/role-arn` annotation with the Amazon Resource Name (ARN) of the OIDC-based role you created. 

1. Use the following command to deploy the operator to the cluster: 

   ```
   kubectl apply -f installer.yaml
   ```
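The value edited in step 2 is the IAM roles for service accounts (IRSA) annotation on the operator's Kubernetes ServiceAccount. After substitution, the relevant fragment of `installer.yaml` looks similar to the following sketch (the role ARN shown is an example value):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: sagemaker-k8s-operator-default
  namespace: sagemaker-k8s-operator-system
  annotations:
    # ARN of the OIDC-based role created earlier (example value)
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/my-role
```

This annotation is what lets the operator pod exchange its projected ServiceAccount token for temporary credentials of the IAM role.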

##### Deploy the operator using Helm Charts


Use the provided Helm Chart to install the operator. 

1. Clone the Helm installer directory using the following command: 

   ```
   git clone https://github.com/aws/amazon-sagemaker-operator-for-k8s.git
   ```

1. Navigate to the `amazon-sagemaker-operator-for-k8s/hack/charts/installer` folder. Edit the `rolebased/values.yaml` file, which includes high-level parameters for the chart, replacing the role ARN with the ARN of the OIDC-based role you created. 

1. Install the Helm Chart using the following command: 

   ```
   kubectl create namespace sagemaker-k8s-operator-system
   helm install --namespace sagemaker-k8s-operator-system sagemaker-operator rolebased/
   ```

   If you decide to install the operator into a namespace other than the one specified, you need to adjust the namespace defined in the IAM role `trust.json` file to match. 

1. After a moment, the chart is installed. Verify that the installation succeeded by running the following command: 

   ```
   helm ls
   ```

   Your output should look like the following: 

   ```
   NAME                    NAMESPACE                       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
   sagemaker-operator      sagemaker-k8s-operator-system   1               2019-11-20 23:14:59.6777082 +0000 UTC   deployed        sagemaker-k8s-operator-0.1.0
   ```

##### Verify the operator deployment


1. Verify that the SageMaker AI custom resource definitions (CRDs) for each operator are deployed to your cluster by running the following command: 

   ```
   kubectl get crd | grep sagemaker
   ```

   Your output should look like the following: 

   ```
   batchtransformjobs.sagemaker.aws.amazon.com         2019-11-20T17:12:34Z
   endpointconfigs.sagemaker.aws.amazon.com            2019-11-20T17:12:34Z
   hostingdeployments.sagemaker.aws.amazon.com         2019-11-20T17:12:34Z
   hyperparametertuningjobs.sagemaker.aws.amazon.com   2019-11-20T17:12:34Z
   models.sagemaker.aws.amazon.com                     2019-11-20T17:12:34Z
   trainingjobs.sagemaker.aws.amazon.com               2019-11-20T17:12:34Z
   ```

1. Ensure that the operator pod is running successfully. Use the following command to list all pods: 

   ```
   kubectl -n sagemaker-k8s-operator-system get pods
   ```

   You should see a pod named `sagemaker-k8s-operator-controller-manager-*****` in the namespace `sagemaker-k8s-operator-system` as follows: 

   ```
   NAME                                                         READY   STATUS    RESTARTS   AGE
   sagemaker-k8s-operator-controller-manager-12345678-r8abc     2/2     Running   0          23s
   ```

#### Namespace-scoped deployment


You have the option to install your operator within the scope of an individual Kubernetes namespace. In this mode, the controller only monitors and reconciles resources with SageMaker AI if the resources are created within that namespace. This allows for finer-grained control over which controller is managing which resources. This is useful for deploying to multiple AWS accounts or controlling which users have access to particular jobs. 

This guide outlines how to install an operator into a particular, predefined namespace. To deploy a controller into a second namespace, follow the guide from beginning to end, substituting that namespace in each step. 

##### Create an OIDC provider for your Amazon EKS cluster


The following instructions show how to create and associate an OIDC provider with your Amazon EKS cluster. 

1. Set the local `CLUSTER_NAME` and `AWS_REGION` environment variables as follows: 

   ```
   # Set the Region and cluster
   export CLUSTER_NAME="<your cluster name>"
   export AWS_REGION="<your region>"
   ```

1. Use the following command to associate the OIDC provider with your cluster. For more information, see [Enabling IAM Roles for Service Accounts on your Cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html). 

   ```
   eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} \
         --region ${AWS_REGION} --approve
   ```

   Your output should look like the following: 

   ```
   [_]  eksctl version 0.10.1
   [_]  using region us-east-1
   [_]  IAM OpenID Connect provider is associated with cluster "my-cluster" in "us-east-1"
   ```

Now that the cluster has an OIDC identity provider, create a role and give a Kubernetes ServiceAccount permission to assume the role. 

##### Get your OIDC ID


To set up the ServiceAccount, first obtain the OpenID Connect issuer URL using the following command: 

```
aws eks describe-cluster --name ${CLUSTER_NAME} --region ${AWS_REGION} \
      --query cluster.identity.oidc.issuer --output text
```

The command returns a URL like the following: 

```
https://oidc.eks.${AWS_REGION}.amazonaws.com/id/D48675832CA65BD10A532F597OIDCID
```

In this URL, the value `D48675832CA65BD10A532F597OIDCID` is the OIDC ID. The OIDC ID for your cluster is different. You need this OIDC ID value to create a role. 

If your output is `None`, your AWS CLI client version is too old. To work around this, run the following command: 

```
aws eks describe-cluster --region ${AWS_REGION} --query cluster --name ${CLUSTER_NAME} --output text | grep OIDC
```

The OIDC URL is returned as follows: 

```
OIDC https://oidc.eks.us-east-1.amazonaws.com/id/D48675832CA65BD10A532F597OIDCID
```

##### Create your IAM role


1. Create a file named `trust.json` and insert the following trust relationship code block into it. Use the first block for standard AWS Regions and the second, which uses the `arn:aws-cn` partition, for AWS China Regions. Be sure to replace all `<OIDC ID>`, `<EKS Cluster region>`, and `<Namespace>` placeholders, and the example account ID `111122223333`, with values corresponding to your cluster. For the purposes of this guide, `my-namespace` is used for the `<Namespace>` value. 

------
#### [ AWS Regions ]

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Effect": "Allow",
           "Principal": {
             "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>"
           },
           "Action": "sts:AssumeRoleWithWebIdentity",
           "Condition": {
             "StringEquals": {
               "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:aud": "sts.amazonaws.com",
               "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:sub": "system:serviceaccount:<Namespace>:sagemaker-k8s-operator-default"
             }
           }
         }
       ]
   }
   ```

------
#### [ AWS China Regions ]

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Effect": "Allow",
           "Principal": {
             "Federated": "arn:aws-cn:iam::111122223333:oidc-provider/oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>"
           },
           "Action": "sts:AssumeRoleWithWebIdentity",
           "Condition": {
             "StringEquals": {
               "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:aud": "sts.amazonaws.com",
               "oidc.eks.<EKS Cluster region>.amazonaws.com/id/<OIDC ID>:sub": "system:serviceaccount:<Namespace>:sagemaker-k8s-operator-default"
             }
           }
         }
       ]
   }
   ```

------

1. Run the following command to create a role with the trust relationship defined in `trust.json`. This role allows the Amazon EKS cluster to get and refresh credentials from IAM. 

   ```
   aws iam create-role --region ${AWS_REGION} --role-name <role name> --assume-role-policy-document file://trust.json --output=text
   ```

   Your output should look like the following: 

   ```
   ROLE    arn:aws:iam::123456789012:role/my-role 2019-11-22T21:46:10Z    /       ABCDEFSFODNN7EXAMPLE   my-role
  ASSUMEROLEPOLICYDOCUMENT        2012-10-17
     STATEMENT       sts:AssumeRoleWithWebIdentity   Allow
     STRINGEQUALS    sts.amazonaws.com       system:serviceaccount:my-namespace:sagemaker-k8s-operator-default
     PRINCIPAL       arn:aws:iam::123456789012:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/
   ```

Take note of the role ARN from the output; you pass this value to your operator. 

##### Attach the AmazonSageMakerFullAccess policy to your role


To give the role access to SageMaker AI, attach the [AmazonSageMakerFullAccess](https://console.aws.amazon.com/iam/home?#/policies/arn:aws:iam::aws:policy/AmazonSageMakerFullAccess) policy. If you want to limit the operator's permissions, you can create and attach your own custom policy instead. 

 To attach `AmazonSageMakerFullAccess`, run the following command: 

```
aws iam attach-role-policy --role-name <role name>  --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
```

The Kubernetes ServiceAccount `sagemaker-k8s-operator-default` should have `AmazonSageMakerFullAccess` permissions. Confirm this when you install the operator. 

##### Deploy the operator to your namespace


When deploying your operator, you can use either a YAML file or Helm charts. 

##### Deploy the operator to your namespace using YAML


There are two parts to deploying an operator within the scope of a namespace. The first is the set of CRDs that are installed at a cluster level. These resource definitions only need to be installed once per Kubernetes cluster. The second part is the operator permissions and deployment itself. 

 If you have not already installed the CRDs into the cluster, apply the CRD installer YAML using the following command: 

```
kubectl apply -f https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/release/rolebased/namespaced/crd.yaml
```

To install the operator onto the cluster: 

1. Download the operator installer YAML using the following command: 

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/release/rolebased/namespaced/operator.yaml
   ```

1. Update the installer YAML to place the resources into your specified namespace using the following command: 

   ```
   sed -i -e 's/PLACEHOLDER-NAMESPACE/<YOUR NAMESPACE>/g' operator.yaml
   ```

1. Edit the `operator.yaml` file to set the `eks.amazonaws.com/role-arn` annotation. Replace the placeholder ARN with the Amazon Resource Name (ARN) of the OIDC-based role you created. 

1. Use the following command to deploy the operator to the cluster: 

   ```
   kubectl apply -f operator.yaml
   ```

##### Deploy the operator to your namespace using Helm Charts


Deploying an operator within the scope of a namespace has two parts. The first is the set of CRDs that are installed at a cluster level. These resource definitions only need to be installed once per Kubernetes cluster. The second is the operator permissions and deployment itself. When using Helm Charts, you first have to create the namespace using `kubectl`. 

1. Clone the Helm installer directory using the following command: 

   ```
   git clone https://github.com/aws/amazon-sagemaker-operator-for-k8s.git
   ```

1. Navigate to the `amazon-sagemaker-operator-for-k8s/hack/charts/installer/namespaced` folder. Edit the `rolebased/values.yaml` file, which includes high-level parameters for the chart. Replace the role ARN here with the Amazon Resource Name (ARN) for the OIDC-based role you've created. 

1. Install the Helm Chart using the following command: 

   ```
   helm install crds crd_chart/
   ```

1. Create the required namespace and install the operator using the following command: 

   ```
   kubectl create namespace <namespace>
   helm install -n <namespace> op operator_chart/
   ```

1. After a moment, the chart is installed with the name `sagemaker-operator`. Verify that the installation succeeded by running the following command: 

   ```
   helm ls
   ```

   Your output should look like the following: 

   ```
   NAME                    NAMESPACE                       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
   sagemaker-operator      my-namespace                    1               2019-11-20 23:14:59.6777082 +0000 UTC   deployed        sagemaker-k8s-operator-0.1.0
   ```

##### Verify the operator deployment to your namespace


1. You should be able to see the SageMaker AI Custom Resource Definitions (CRDs) for each operator deployed to your cluster by running the following command: 

   ```
   kubectl get crd | grep sagemaker
   ```

   Your output should look like the following: 

   ```
   batchtransformjobs.sagemaker.aws.amazon.com         2019-11-20T17:12:34Z
   endpointconfigs.sagemaker.aws.amazon.com            2019-11-20T17:12:34Z
   hostingdeployments.sagemaker.aws.amazon.com         2019-11-20T17:12:34Z
   hyperparametertuningjobs.sagemaker.aws.amazon.com   2019-11-20T17:12:34Z
   models.sagemaker.aws.amazon.com                     2019-11-20T17:12:34Z
   trainingjobs.sagemaker.aws.amazon.com               2019-11-20T17:12:34Z
   ```

1. Ensure that the operator pod is running successfully. Use the following command to list all pods: 

   ```
   kubectl -n my-namespace get pods
   ```

   You should see a pod named `sagemaker-k8s-operator-controller-manager-*****` in the namespace `my-namespace` as follows: 

   ```
   NAME                                                         READY   STATUS    RESTARTS   AGE
   sagemaker-k8s-operator-controller-manager-12345678-r8abc     2/2     Running   0          23s
   ```

#### Install the SageMaker AI logs `kubectl` plugin


 As part of the SageMaker AI Operators for Kubernetes, you can use the `smlogs` [plugin](https://kubernetes.io/docs/tasks/extend-kubectl/kubectl-plugins/) for `kubectl`. This lets you stream SageMaker AI CloudWatch logs with `kubectl`. For `kubectl` to discover the plugin, the plugin binary must be in a directory on your [PATH](http://www.linfo.org/path_env_var.html). The following commands place the binary in the `sagemaker-k8s-bin` directory in your home directory and add that directory to your `PATH`. 

```
export os="linux"
  
wget https://amazon-sagemaker-operator-for-k8s-us-east-1.s3.amazonaws.com/kubectl-smlogs-plugin/v1/${os}.amd64.tar.gz
tar xvzf ${os}.amd64.tar.gz
  
# Move binaries to a directory in your homedir.
mkdir ~/sagemaker-k8s-bin
cp ./kubectl-smlogs.${os}.amd64/kubectl-smlogs ~/sagemaker-k8s-bin/.
  
# This line adds the binaries to your PATH in your .bashrc.
  
echo 'export PATH=$PATH:~/sagemaker-k8s-bin' >> ~/.bashrc
  
# Source your .bashrc to update environment variables:
source ~/.bashrc
```
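The script above hardcodes `os="linux"`. If a build for your platform is published under the same URL prefix, you can derive the value from `uname` instead. This is a sketch; the Region table at the end of this section lists only Linux builds, so treat the `darwin` branch as hypothetical:

```shell
# Sketch: pick the ${os} value for the smlogs download from the running
# platform instead of hardcoding it. Assumes (hypothetically) that both
# linux and darwin tarballs are published.
case "$(uname -s)" in
  Linux)  os="linux" ;;
  Darwin) os="darwin" ;;
  *)      echo "No smlogs build for $(uname -s)" >&2; exit 1 ;;
esac
echo "${os}"
```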

Use the following command to verify that the `kubectl` plugin is installed correctly: 

```
kubectl smlogs
```

If the `kubectl` plugin is installed correctly, your output should look like the following: 

```
View SageMaker AI logs via Kubernetes
  
Usage:
  smlogs [command]
  
Aliases:
  smlogs, SMLogs, Smlogs
  
Available Commands:
  BatchTransformJob       View BatchTransformJob logs via Kubernetes
  TrainingJob             View TrainingJob logs via Kubernetes
  help                    Help about any command
  
Flags:
   -h, --help   help for smlogs
  
Use "smlogs [command] --help" for more information about a command.
```

### Clean up resources


To uninstall the operator from your cluster, you must first make sure to delete all SageMaker AI resources from the cluster. Failure to do so causes the operator delete operation to hang. Run the following commands to stop all jobs: 

```
# Delete all SageMaker AI jobs from Kubernetes
kubectl delete --all --all-namespaces hyperparametertuningjob.sagemaker.aws.amazon.com
kubectl delete --all --all-namespaces trainingjobs.sagemaker.aws.amazon.com
kubectl delete --all --all-namespaces batchtransformjob.sagemaker.aws.amazon.com
kubectl delete --all --all-namespaces hostingdeployment.sagemaker.aws.amazon.com
```

You should see output similar to the following: 

```
$ kubectl delete --all --all-namespaces trainingjobs.sagemaker.aws.amazon.com
trainingjobs.sagemaker.aws.amazon.com "xgboost-mnist-from-for-s3" deleted
  
$ kubectl delete --all --all-namespaces hyperparametertuningjob.sagemaker.aws.amazon.com
hyperparametertuningjob.sagemaker.aws.amazon.com "xgboost-mnist-hpo" deleted
  
$ kubectl delete --all --all-namespaces batchtransformjob.sagemaker.aws.amazon.com
batchtransformjob.sagemaker.aws.amazon.com "xgboost-mnist" deleted
  
$ kubectl delete --all --all-namespaces hostingdeployment.sagemaker.aws.amazon.com
hostingdeployment.sagemaker.aws.amazon.com "host-xgboost" deleted
```

After you delete all SageMaker AI jobs, see [Delete operators](#delete-operators) to delete the operator from your cluster.

### Delete operators


#### Delete cluster-based operators


##### Operators installed using YAML


To uninstall the operator from your cluster, first make sure that all SageMaker AI resources have been deleted from the cluster. Failure to do so causes the operator delete operation to hang.

**Note**  
For the steps to delete all SageMaker AI resources, see [Clean up resources](#cleanup-operator-resources).

After you delete all SageMaker AI jobs, use `kubectl` to delete the operator from the cluster:

```
# Delete the operator and its resources
kubectl delete -f installer.yaml
```

You should see output similar to the following: 

```
$ kubectl delete -f raw-yaml/installer.yaml
namespace "sagemaker-k8s-operator-system" deleted
customresourcedefinition.apiextensions.k8s.io "batchtransformjobs.sagemaker.aws.amazon.com" deleted
customresourcedefinition.apiextensions.k8s.io "endpointconfigs.sagemaker.aws.amazon.com" deleted
customresourcedefinition.apiextensions.k8s.io "hostingdeployments.sagemaker.aws.amazon.com" deleted
customresourcedefinition.apiextensions.k8s.io "hyperparametertuningjobs.sagemaker.aws.amazon.com" deleted
customresourcedefinition.apiextensions.k8s.io "models.sagemaker.aws.amazon.com" deleted
customresourcedefinition.apiextensions.k8s.io "trainingjobs.sagemaker.aws.amazon.com" deleted
role.rbac.authorization.k8s.io "sagemaker-k8s-operator-leader-election-role" deleted
clusterrole.rbac.authorization.k8s.io "sagemaker-k8s-operator-manager-role" deleted
clusterrole.rbac.authorization.k8s.io "sagemaker-k8s-operator-proxy-role" deleted
rolebinding.rbac.authorization.k8s.io "sagemaker-k8s-operator-leader-election-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "sagemaker-k8s-operator-manager-rolebinding" deleted
clusterrolebinding.rbac.authorization.k8s.io "sagemaker-k8s-operator-proxy-rolebinding" deleted
service "sagemaker-k8s-operator-controller-manager-metrics-service" deleted
deployment.apps "sagemaker-k8s-operator-controller-manager" deleted
secrets "sagemaker-k8s-operator-abcde" deleted
```

##### Operators installed using Helm Charts


To delete the operator CRDs, first delete all the running jobs. Then delete the Helm Chart that was used to deploy the operators using the following commands: 

```
# get the helm charts
helm ls
  
# delete the charts
helm delete <chart_name>
```

#### Delete namespace-based operators


##### Operators installed with YAML


To uninstall the operator from your cluster, first make sure that all SageMaker AI resources have been deleted from the cluster. Failure to do so causes the operator delete operation to hang.

**Note**  
For the steps to delete all SageMaker AI resources, see [Clean up resources](#cleanup-operator-resources).

After you delete all SageMaker AI jobs, use `kubectl` to first delete the operator from the namespace and then the CRDs from the cluster. Run the following commands to delete the operator from the cluster: 

```
# Delete the operator using the same yaml file that was used to install the operator
kubectl delete -f operator.yaml
  
# Now delete the CRDs using the CRD installer yaml
kubectl delete -f https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/release/rolebased/namespaced/crd.yaml
  
# Now you can delete the namespace if you want
kubectl delete namespace <namespace>
```

##### Operators installed with Helm Charts


To delete the operator CRDs, first delete all the running jobs. Then delete the Helm Chart that was used to deploy the operators using the following commands: 

```
# Delete the operator
helm delete <chart_name>
  
# delete the crds
helm delete crds
  
# optionally delete the namespace
kubectl delete namespace <namespace>
```

### Troubleshooting


#### Debugging a failed job


Use these steps to debug a failed job.
+ Check the job status by running the following command. If the job was created in SageMaker AI, the output includes the `STATUS` and the `SageMaker Job Name`: 

  ```
  kubectl get <crd type> <job name>
  ```
+ You can use `smlogs` to find the cause of the issue using the following command: 

  ```
  kubectl smlogs <crd type> <job name>
  ```
+  You can also use the `describe` command to get more details about the job using the following command. The output has an `additional` field that has more information about the status of the job. 

  ```
  kubectl describe <crd type> <job name>
  ```
+ If the job was not created in SageMaker AI, then use the logs of the operator's pod to find the cause of the issue as follows: 

  ```
  $ kubectl get pods -A | grep sagemaker
  # Output:
  sagemaker-k8s-operator-system   sagemaker-k8s-operator-controller-manager-5cd7df4d74-wh22z   2/2     Running   0          3h33m
    
  $ kubectl logs -p <pod name> -c manager -n sagemaker-k8s-operator-system
  ```

#### Deleting an operator CRD


If deleting a job is not working, check if the operator is running. If the operator is not running, then you have to delete the finalizer using the following steps: 

1. In a new terminal, open the job in an editor using `kubectl edit` as follows: 

   ```
   kubectl edit <crd type> <job name>
   ```

1. Edit the job to delete the finalizer by removing the following two lines from the file. Save the file, and the job is deleted. 

   ```
   finalizers:
     - sagemaker-operator-finalizer
   ```
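Alternatively, you can clear the finalizer without an interactive editor by applying a JSON merge patch that sets `metadata.finalizers` to null. A sketch, wrapped in a helper function; `<crd type>` and `<job name>` are the same placeholders as above, and note that bypassing the finalizer this way skips the operator's cleanup logic:

```shell
# Sketch: clear a stuck job's finalizers with a JSON merge patch instead
# of `kubectl edit`. Requires cluster access when invoked.
clear_finalizers() {
  # $1 = CRD type (for example, trainingjob), $2 = job name
  kubectl patch "$1" "$2" --type merge -p '{"metadata":{"finalizers":null}}'
}

# Example invocation (placeholders):
#   clear_finalizers trainingjob <job name>
```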

### Images and SMlogs in each Region


The following table lists the available operator images and SMLogs in each Region. 


|  Region  |  Controller Image  |  Linux SMLogs  | 
| --- | --- | --- | 
|  us-east-1  |  957583890962.dkr.ecr.us-east-1.amazonaws.com/amazon-sagemaker-operator-for-k8s:v1  |  [https://s3.us-east-1.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-east-1/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz](https://s3.us-east-1.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-east-1/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz)  | 
|  us-east-2  |  922499468684.dkr.ecr.us-east-2.amazonaws.com/amazon-sagemaker-operator-for-k8s:v1  |  [https://s3.us-east-2.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-east-2/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz](https://s3.us-east-2.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-east-2/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz)  | 
|  us-west-2  |  640106867763.dkr.ecr.us-west-2.amazonaws.com/amazon-sagemaker-operator-for-k8s:v1  |  [https://s3.us-west-2.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-west-2/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz](https://s3.us-west-2.amazonaws.com/amazon-sagemaker-operator-for-k8s-us-west-2/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz)  | 
|  eu-west-1  |  613661167059.dkr.ecr.eu-west-1.amazonaws.com/amazon-sagemaker-operator-for-k8s:v1  |  [https://s3.eu-west-1.amazonaws.com/amazon-sagemaker-operator-for-k8s-eu-west-1/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz](https://s3.eu-west-1.amazonaws.com/amazon-sagemaker-operator-for-k8s-eu-west-1/kubectl-smlogs-plugin/v1/linux.amd64.tar.gz)  | 

# Use Amazon SageMaker AI Jobs
Use SageMaker AI Jobs

This section is based on the original version of [SageMaker AI Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s).

**Important**  
We are stopping the development and technical support of the original version of [ SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master).  
If you are currently using version `v1.2.2` or below of [ SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master), we recommend migrating your resources to the [ACK service controller for Amazon SageMaker](https://github.com/aws-controllers-k8s/sagemaker-controller). The ACK service controller is a new generation of SageMaker Operators for Kubernetes based on [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/).  
For information on the migration steps, see [Migrate resources to the latest Operators](kubernetes-sagemaker-operators-migrate.md).  
For answers to frequently asked questions on the end of support of the original version of SageMaker Operators for Kubernetes, see [Announcing the End of Support of the Original Version of SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-eos-announcement.md).

To run an Amazon SageMaker AI job using the Operators for Kubernetes, you can either apply a YAML file or use the supplied Helm Charts. 

All sample operator jobs in the following tutorials use sample data taken from a public MNIST dataset. To run these samples, download the dataset into your Amazon S3 bucket. You can find the dataset in [Download the MNIST Dataset](https://docs.aws.amazon.com/sagemaker/latest/dg/ex1-preprocess-data-pull-data.html). 

**Topics**
+ [

## The TrainingJob operator
](#trainingjob-operator)
+ [

## The HyperParameterTuningJob operator
](#hyperparametertuningjobs-operator)
+ [

## The BatchTransformJob operator
](#batchtransformjobs-operator)
+ [

## The HostingDeployment operator
](#hosting-deployment-operator)
+ [

## The ProcessingJob operator
](#kubernetes-processing-job-operator)
+ [

## HostingAutoscalingPolicy (HAP) Operator
](#kubernetes-hap-operator)

## The TrainingJob operator


The TrainingJob operator reconciles your specified training job spec by launching the job for you in SageMaker AI. You can learn more about SageMaker AI training jobs in the SageMaker AI [CreateTrainingJob API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTrainingJob.html). 

**Topics**
+ [

### Create a TrainingJob using a YAML file
](#create-a-trainingjob-using-a-simple-yaml-file)
+ [

### Create a TrainingJob Using a Helm Chart
](#create-a-trainingjob-using-a-helm-chart)
+ [

### List TrainingJobs
](#list-training-jobs)
+ [

### Describe a TrainingJob
](#describe-a-training-job)
+ [

### View logs from TrainingJobs
](#view-logs-from-training-jobs)
+ [

### Delete TrainingJobs
](#delete-training-jobs)

### Create a TrainingJob using a YAML file


1. Download the sample YAML file for training using the following command: 

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/xgboost-mnist-trainingjob.yaml
   ```

1. Edit the `xgboost-mnist-trainingjob.yaml` file to replace the `roleArn` parameter with your `<sagemaker-execution-role>`, and `outputPath` with an Amazon S3 bucket to which the SageMaker AI execution role has write access. The `roleArn` must have permissions so that SageMaker AI can access Amazon S3, Amazon CloudWatch, and other services on your behalf. For more information on creating a SageMaker AI execution role, see [SageMaker AI Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html#sagemaker-roles-createtrainingjob-perms). Apply the YAML file using the following command: 

   ```
   kubectl apply -f xgboost-mnist-trainingjob.yaml
   ```

### Create a TrainingJob Using a Helm Chart


You can use Helm Charts to run TrainingJobs. 

1. Clone the GitHub repository to get the source using the following command: 

   ```
   git clone https://github.com/aws/amazon-sagemaker-operator-for-k8s.git
   ```

1. Navigate to the `amazon-sagemaker-operator-for-k8s/hack/charts/training-jobs/` folder and edit the `values.yaml` file to replace values like `rolearn` and `outputpath` with values that correspond to your account. The role ARN must have permissions so that SageMaker AI can access Amazon S3, Amazon CloudWatch, and other services on your behalf. For more information on creating a SageMaker AI execution role, see [SageMaker AI Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html#sagemaker-roles-createtrainingjob-perms). 

#### Create the TrainingJob


With the roles and Amazon S3 buckets replaced with appropriate values in `values.yaml`, you can create a training job using the following command: 

```
helm install . --generate-name
```

Your output should look like the following: 

```
NAME: chart-12345678
LAST DEPLOYED: Wed Nov 20 23:35:49 2019
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thanks for installing the sagemaker-k8s-trainingjob.
```

#### Verify your training Helm Chart


To verify that the Helm Chart was created successfully, run: 

```
helm ls
```

Your output should look like the following: 

```
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
chart-12345678        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-trainingjob-0.1.0
rolebased-12345678    default         1               2019-11-20 23:14:59.6777082 +0000 UTC   deployed        sagemaker-k8s-operator-0.1.0
```

`helm install` creates a `TrainingJob` Kubernetes resource. The operator launches the actual training job in SageMaker AI and updates the `TrainingJob` Kubernetes resource to reflect the status of the job in SageMaker AI. You incur charges for SageMaker AI resources used during the duration of your job. You do not incur any charges once your job completes or stops. 

**Note**: SageMaker AI does not allow you to update a running training job. You cannot edit any parameter and re-apply the config file. Either change the metadata name or delete the existing job and create a new one. Similar to existing training job operators like TFJob in Kubeflow, `update` is not supported. 
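Because `update` is not supported, a parameter change means recreating the job. A sketch using the sample job and YAML file from this section, wrapped in a helper so the intent is clear:

```shell
# Sketch: a running training job can't be updated, so change a parameter
# by deleting the Kubernetes resource and re-applying the edited spec.
# Uses the sample job and YAML file from this section; requires cluster
# access when invoked.
recreate_training_job() {
  kubectl delete trainingjob xgboost-mnist-from-for-s3
  kubectl apply -f xgboost-mnist-trainingjob.yaml
}

# Call recreate_training_job after editing xgboost-mnist-trainingjob.yaml.
```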

### List TrainingJobs


Use the following command to list all jobs created using the Kubernetes operator: 

```
kubectl get TrainingJob
```

The output listing all jobs should look like the following: 

```
kubectl get trainingjobs
NAME                        STATUS       SECONDARY-STATUS   CREATION-TIME          SAGEMAKER-JOB-NAME
xgboost-mnist-from-for-s3   InProgress   Starting           2019-11-20T23:42:35Z   xgboost-mnist-from-for-s3-examplef11eab94e0ed4671d5a8f
```

A training job continues to be listed after the job has completed or failed. You can remove a `TrainingJob` job from the list by following the [Delete TrainingJobs](#delete-training-jobs) steps. Jobs that have completed or stopped do not incur any charges for SageMaker AI resources. 
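Because finished jobs stay in the listing, it can be useful to filter on the `STATUS` column with standard tools. A sketch that reads `kubectl get trainingjobs --no-headers` output and prints the names of completed jobs; the sample input lines are illustrative:

```shell
# Sketch: print the names of TrainingJobs whose STATUS column (field 2 of
# `kubectl get trainingjobs --no-headers` output) is Completed.
completed_jobs() {
  awk '$2 == "Completed" { print $1 }'
}

# Illustrative input; in practice pipe in:
#   kubectl get trainingjobs --no-headers | completed_jobs
printf '%s\n' \
  'job-a   Completed    Done       2019-11-20T23:42:35Z   sm-job-a' \
  'job-b   InProgress   Starting   2019-11-20T23:50:01Z   sm-job-b' |
  completed_jobs
```

With the sample input, this prints `job-a` only.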

#### TrainingJob status values


The `STATUS` field can be one of the following values: 
+ `Completed` 
+ `InProgress` 
+ `Failed` 
+ `Stopped` 
+ `Stopping` 

These statuses come directly from the SageMaker AI official [API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DescribeTrainingJob.html#SageMaker-DescribeTrainingJob-response-TrainingJobStatus). 

In addition to the official SageMaker AI status, it is possible for `STATUS` to be `SynchronizingK8sJobWithSageMaker`. This means that the operator has not yet processed the job. 
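For scripting, it helps to distinguish terminal statuses from in-flight ones. A small sketch based on the status values listed above:

```shell
# Sketch: classify a TrainingJob STATUS value as terminal or not, using
# the status list above. InProgress, Stopping, and the operator-specific
# SynchronizingK8sJobWithSageMaker are treated as non-terminal.
is_terminal_status() {
  case "$1" in
    Completed|Failed|Stopped) return 0 ;;
    *) return 1 ;;
  esac
}

is_terminal_status "Completed"  && echo "terminal"
is_terminal_status "InProgress" || echo "not terminal"
```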

#### Secondary status values


The secondary statuses come directly from the SageMaker AI official [API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DescribeTrainingJob.html#SageMaker-DescribeTrainingJob-response-SecondaryStatus). They contain more granular information about the status of the job. 

### Describe a TrainingJob


You can get more details about the training job by using the `describe` `kubectl` command. This is typically used for debugging a problem or checking the parameters of a training job. To get information about your training job, use the following command: 

```
kubectl describe trainingjob xgboost-mnist-from-for-s3
```

The output for your training job should look like the following: 

```
Name:         xgboost-mnist-from-for-s3
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  sagemaker.aws.amazon.com/v1
Kind:         TrainingJob
Metadata:
  Creation Timestamp:  2019-11-20T23:42:35Z
  Finalizers:
    sagemaker-operator-finalizer
  Generation:        2
  Resource Version:  23119
  Self Link:         /apis/sagemaker.aws.amazon.com/v1/namespaces/default/trainingjobs/xgboost-mnist-from-for-s3
  UID:               6d7uiui-0bef-11ea-b94e-0ed467example
Spec:
  Algorithm Specification:
    Training Image:       8256416981234.dkr.ecr.us-east-2.amazonaws.com/xgboost:1
    Training Input Mode:  File
  Hyper Parameters:
    Name:   eta
    Value:  0.2
    Name:   gamma
    Value:  4
    Name:   max_depth
    Value:  5
    Name:   min_child_weight
    Value:  6
    Name:   num_class
    Value:  10
    Name:   num_round
    Value:  10
    Name:   objective
    Value:  multi:softmax
    Name:   silent
    Value:  0
  Input Data Config:
    Channel Name:      train
    Compression Type:  None
    Content Type:      text/csv
    Data Source:
      S 3 Data Source:
        S 3 Data Distribution Type:  FullyReplicated
        S 3 Data Type:               S3Prefix
        S 3 Uri:                     https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/train/
    Channel Name:                    validation
    Compression Type:                None
    Content Type:                    text/csv
    Data Source:
      S 3 Data Source:
        S 3 Data Distribution Type:  FullyReplicated
        S 3 Data Type:               S3Prefix
        S 3 Uri:                     https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/validation/
  Output Data Config:
    S 3 Output Path:  s3://amzn-s3-demo-bucket/sagemaker/xgboost-mnist/xgboost/
  Region:             us-east-2
  Resource Config:
    Instance Count:     1
    Instance Type:      ml.m4.xlarge
    Volume Size In GB:  5
  Role Arn:             arn:aws:iam::12345678910:role/service-role/AmazonSageMaker-ExecutionRole
  Stopping Condition:
    Max Runtime In Seconds:  86400
  Training Job Name:         xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0example
Status:
  Cloud Watch Log URL:           https://us-east-2.console.aws.amazon.com/cloudwatch/home?region=us-east-2#logStream:group=/aws/sagemaker/TrainingJobs;prefix=<example>;streamFilter=typeLogStreamPrefix
  Last Check Time:               2019-11-20T23:44:29Z
  Sage Maker Training Job Name:  xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94eexample
  Secondary Status:              Downloading
  Training Job Status:           InProgress
Events:                          <none>
```

### View logs from TrainingJobs


Use the following command to see the logs from the `xgboost-mnist-from-for-s3` training job: 

```
kubectl smlogs trainingjob xgboost-mnist-from-for-s3
```

Your output should look similar to the following. The logs from instances are ordered chronologically. 

```
"xgboost-mnist-from-for-s3" has SageMaker TrainingJobName "xgboost-mnist-from-for-s3-123456789" in region "us-east-2", status "InProgress" and secondary status "Starting"
xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0ed46example/algo-1-1574293123 2019-11-20 23:45:24.7 +0000 UTC Arguments: train
xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0ed46example/algo-1-1574293123 2019-11-20 23:45:24.7 +0000 UTC [2019-11-20:23:45:22:INFO] Running standalone xgboost training.
xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0ed46example/algo-1-1574293123 2019-11-20 23:45:24.7 +0000 UTC [2019-11-20:23:45:22:INFO] File size need to be processed in the node: 1122.95mb. Available memory size in the node: 8586.0mb
xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0ed46example/algo-1-1574293123 2019-11-20 23:45:24.7 +0000 UTC [2019-11-20:23:45:22:INFO] Determined delimiter of CSV input is ','
xgboost-mnist-from-for-s3-6d7fa0af0bef11eab94e0ed46example/algo-1-1574293123 2019-11-20 23:45:24.7 +0000 UTC [23:45:22] S3DistributionType set as FullyReplicated
```

### Delete TrainingJobs


Use the following command to stop a training job on Amazon SageMaker AI: 

```
kubectl delete trainingjob xgboost-mnist-from-for-s3
```

This command removes the SageMaker training job from Kubernetes. This command returns the following output: 

```
trainingjob.sagemaker.aws.amazon.com "xgboost-mnist-from-for-s3" deleted
```

If the job is still in progress on SageMaker AI, the job stops. You do not incur any charges for SageMaker AI resources after your job stops or completes. 

**Note**: SageMaker AI does not delete training jobs. Stopped jobs continue to show on the SageMaker AI console. The `delete` command takes about 2 minutes to clean up the resources from SageMaker AI. 

## The HyperParameterTuningJob operator


The HyperParameterTuningJob operator reconciles your specified hyperparameter tuning job spec by launching the job for you in SageMaker AI. You can learn more about SageMaker AI hyperparameter tuning jobs in the SageMaker AI [CreateHyperParameterTuningJob API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateHyperParameterTuningJob.html). 

**Topics**
+ [

### Create a HyperparameterTuningJob using a YAML file
](#create-a-hyperparametertuningjob-using-a-simple-yaml-file)
+ [

### Create a HyperparameterTuningJob using a Helm Chart
](#create-a-hyperparametertuningjob-using-a-helm-chart)
+ [

### List HyperparameterTuningJobs
](#list-hyperparameter-tuning-jobs)
+ [

### Describe a HyperparameterTuningJob
](#describe-a-hyperparameter-tuning-job)
+ [

### View logs from HyperparameterTuningJobs
](#view-logs-from-hyperparametertuning-jobs)
+ [

### Delete a HyperparameterTuningJob
](#delete-hyperparametertuning-jobs)

### Create a HyperparameterTuningJob using a YAML file


1. Download the sample YAML file for the hyperparameter tuning job using the following command: 

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/xgboost-mnist-hpo.yaml
   ```

1. Edit the `xgboost-mnist-hpo.yaml` file to replace the `roleArn` parameter with your `sagemaker-execution-role`. For the hyperparameter tuning job to succeed, you must also change the `s3InputPath` and `s3OutputPath` to values that correspond to your account. Apply the updated YAML file using the following command: 

   ```
   kubectl apply -f xgboost-mnist-hpo.yaml
   ```
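
The account-specific values you edit in step 2 sit under the tuning job's `spec`. The following fragment is a sketch only, with field names inferred from the `kubectl describe` output shown later in this section; verify the exact structure against the downloaded file:

```
# Illustrative fragment of xgboost-mnist-hpo.yaml -- only the
# account-specific fields are shown; keep the rest of the sample as-is.
spec:
  region: us-east-2
  trainingJobDefinition:
    roleArn: arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole
    inputDataConfig:
      - channelName: train
        dataSource:
          s3DataSource:
            s3DataType: S3Prefix
            s3Uri: https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/train/
    outputDataConfig:
      s3OutputPath: https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/xgboost
```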

### Create a HyperparameterTuningJob using a Helm Chart


You can use Helm Charts to run hyperparameter tuning jobs. 

1. Clone the GitHub repository to get the source using the following command: 

   ```
   git clone https://github.com/aws/amazon-sagemaker-operator-for-k8s.git
   ```

1. Navigate to the `amazon-sagemaker-operator-for-k8s/hack/charts/hyperparameter-tuning-jobs/` folder. 

1. Edit the `values.yaml` file to replace the `roleArn` parameter with your `sagemaker-execution-role`. For the hyperparameter tuning job to succeed, you must also change the `s3InputPath` and `s3OutputPath` to values that correspond to your account. 

#### Create the HyperparameterTuningJob


With the roles and Amazon S3 paths replaced with appropriate values in `values.yaml`, you can create a hyperparameter tuning job using the following command: 

```
helm install . --generate-name
```

Your output should look similar to the following: 

```
NAME: chart-1574292948
LAST DEPLOYED: Wed Nov 20 23:35:49 2019
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
Thanks for installing the sagemaker-k8s-hyperparametertuningjob.
```

#### Verify chart installation


To verify that the Helm Chart was created successfully, run the following command: 

```
helm ls
```

Your output should look like the following: 

```
NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                                         APP VERSION
chart-1474292948        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-hyperparametertuningjob-0.1.0
chart-1574292948        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-trainingjob-0.1.0
rolebased-1574291698    default         1               2019-11-20 23:14:59.6777082 +0000 UTC   deployed        sagemaker-k8s-operator-0.1.0
```

`helm install` creates a `HyperParameterTuningJob` Kubernetes resource. The operator launches the actual hyperparameter optimization job in SageMaker AI and updates the `HyperParameterTuningJob` Kubernetes resource to reflect the status of the job in SageMaker AI. You incur charges for SageMaker AI resources used during the duration of your job. You do not incur any charges once your job completes or stops. 

**Note**: SageMaker AI does not allow you to update a running hyperparameter tuning job. You cannot edit any parameter and re-apply the config file. You must either change the metadata name or delete the existing job and create a new one. Similar to existing training job operators like `TFJob` in Kubeflow, `update` is not supported. 

### List HyperparameterTuningJobs


Use the following command to list all jobs created using the Kubernetes operator: 

```
kubectl get hyperparametertuningjob
```

Your output should look like the following: 

```
NAME         STATUS      CREATION-TIME          COMPLETED   INPROGRESS   ERRORS   STOPPED   BEST-TRAINING-JOB                               SAGEMAKER-JOB-NAME
xgboost-mnist-hpo   Completed   2019-10-17T01:15:52Z   10          0            0        0         xgboostha92f5e3cf07b11e9bf6c06d6-009-4c7a123   xgboostha92f5e3cf07b11e9bf6c123
```

A hyperparameter tuning job continues to be listed after the job has completed or failed. You can remove a `hyperparametertuningjob` from the list by following the steps in [Delete a HyperparameterTuningJob](#delete-hyperparametertuning-jobs). Jobs that have completed or stopped do not incur any charges for SageMaker AI resources.

#### Hyperparameter tuning job status values


The `STATUS` field can be one of the following values: 
+ `Completed` 
+ `InProgress` 
+ `Failed` 
+ `Stopped` 
+ `Stopping` 

These statuses come directly from the SageMaker AI official [API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DescribeHyperParameterTuningJob.html#SageMaker-DescribeHyperParameterTuningJob-response-HyperParameterTuningJobStatus). 

In addition to the official SageMaker AI status, it is possible for `STATUS` to be `SynchronizingK8sJobWithSageMaker`. This means that the operator has not yet processed the job. 

#### Status counters


The output has several counters, like `COMPLETED` and `INPROGRESS`. These represent how many training jobs have completed and are in progress, respectively. For more information about how these are determined, see [TrainingJobStatusCounters](https://docs.aws.amazon.com/sagemaker/latest/dg/API_TrainingJobStatusCounters.html) in the SageMaker API documentation. 

#### Best TrainingJob


This column contains the name of the `TrainingJob` that best optimized the selected metric. 

To see a summary of the tuned hyperparameters, run: 

```
kubectl describe hyperparametertuningjob xgboost-mnist-hpo
```

To see detailed information about the `TrainingJob`, run: 

```
kubectl describe trainingjobs <job name>
```

#### Spawned TrainingJobs


You can also track all 10 training jobs that the `HyperparameterTuningJob` launched in Kubernetes by running the following command: 

```
kubectl get trainingjobs
```

### Describe a HyperparameterTuningJob


You can obtain debugging details using the `kubectl describe` command.

```
kubectl describe hyperparametertuningjob xgboost-mnist-hpo
```

In addition to information about the tuning job, the SageMaker AI Operator for Kubernetes also exposes the [best training job](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-monitor.html#automatic-model-tuning-best-training-job) found by the hyperparameter tuning job in the `describe` output as follows: 

```
Name:         xgboost-mnist-hpo
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sagemaker.aws.amazon.com/v1","kind":"HyperparameterTuningJob","metadata":{"annotations":{},"name":"xgboost-mnist-hpo","namespace":...
API Version:  sagemaker.aws.amazon.com/v1
Kind:         HyperparameterTuningJob
Metadata:
  Creation Timestamp:  2019-10-17T01:15:52Z
  Finalizers:
    sagemaker-operator-finalizer
  Generation:        2
  Resource Version:  8167
  Self Link:         /apis/sagemaker.aws.amazon.com/v1/namespaces/default/hyperparametertuningjobs/xgboost-mnist-hpo
  UID:               a92f5e3c-f07b-11e9-bf6c-06d6f303uidu
Spec:
  Hyper Parameter Tuning Job Config:
    Hyper Parameter Tuning Job Objective:
      Metric Name:  validation:error
      Type:         Minimize
    Parameter Ranges:
      Integer Parameter Ranges:
        Max Value:     20
        Min Value:     10
        Name:          num_round
        Scaling Type:  Linear
    Resource Limits:
      Max Number Of Training Jobs:     10
      Max Parallel Training Jobs:      10
    Strategy:                          Bayesian
    Training Job Early Stopping Type:  Off
  Hyper Parameter Tuning Job Name:     xgboostha92f5e3cf07b11e9bf6c06d6
  Region:                              us-east-2
  Training Job Definition:
    Algorithm Specification:
      Training Image:       12345678910.dkr.ecr.us-east-2.amazonaws.com/xgboost:1
      Training Input Mode:  File
    Input Data Config:
      Channel Name:  train
      Content Type:  text/csv
      Data Source:
        s3DataSource:
          s3DataDistributionType:  FullyReplicated
          s3DataType:              S3Prefix
          s3Uri:                   https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/train/
      Channel Name:                validation
      Content Type:                text/csv
      Data Source:
        s3DataSource:
          s3DataDistributionType:  FullyReplicated
          s3DataType:              S3Prefix
          s3Uri:                   https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/validation/
    Output Data Config:
      s3OutputPath:  https://s3-us-east-2.amazonaws.com/amzn-s3-demo-bucket/sagemaker/xgboost-mnist/xgboost
    Resource Config:
      Instance Count:     1
      Instance Type:      ml.m4.xlarge
      Volume Size In GB:  5
    Role Arn:             arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole
    Static Hyper Parameters:
      Name:   base_score
      Value:  0.5
      Name:   booster
      Value:  gbtree
      Name:   csv_weights
      Value:  0
      Name:   dsplit
      Value:  row
      Name:   grow_policy
      Value:  depthwise
      Name:   lambda_bias
      Value:  0.0
      Name:   max_bin
      Value:  256
      Name:   max_leaves
      Value:  0
      Name:   normalize_type
      Value:  tree
      Name:   objective
      Value:  reg:linear
      Name:   one_drop
      Value:  0
      Name:   prob_buffer_row
      Value:  1.0
      Name:   process_type
      Value:  default
      Name:   rate_drop
      Value:  0.0
      Name:   refresh_leaf
      Value:  1
      Name:   sample_type
      Value:  uniform
      Name:   scale_pos_weight
      Value:  1.0
      Name:   silent
      Value:  0
      Name:   sketch_eps
      Value:  0.03
      Name:   skip_drop
      Value:  0.0
      Name:   tree_method
      Value:  auto
      Name:   tweedie_variance_power
      Value:  1.5
    Stopping Condition:
      Max Runtime In Seconds:  86400
Status:
  Best Training Job:
    Creation Time:  2019-10-17T01:16:14Z
    Final Hyper Parameter Tuning Job Objective Metric:
      Metric Name:        validation:error
      Value:
    Objective Status:     Succeeded
    Training End Time:    2019-10-17T01:20:24Z
    Training Job Arn:     arn:aws:sagemaker:us-east-2:123456789012:training-job/xgboostha92f5e3cf07b11e9bf6c06d6-009-4sample
    Training Job Name:    xgboostha92f5e3cf07b11e9bf6c06d6-009-4c7a3059
    Training Job Status:  Completed
    Training Start Time:  2019-10-17T01:18:35Z
    Tuned Hyper Parameters:
      Name:                                    num_round
      Value:                                   18
  Hyper Parameter Tuning Job Status:           Completed
  Last Check Time:                             2019-10-17T01:21:01Z
  Sage Maker Hyper Parameter Tuning Job Name:  xgboostha92f5e3cf07b11e9bf6c06d6
  Training Job Status Counters:
    Completed:            10
    In Progress:          0
    Non Retryable Error:  0
    Retryable Error:      0
    Stopped:              0
    Total Error:          0
Events:                   <none>
```

### View logs from HyperparameterTuningJobs


Hyperparameter tuning jobs do not have logs, but all training jobs launched by them do have logs. These logs can be accessed as if they were a normal training job. For more information, see [View logs from TrainingJobs](#view-logs-from-training-jobs).

### Delete a HyperparameterTuningJob


Use the following command to stop a hyperparameter tuning job in SageMaker AI. 

```
kubectl delete hyperparametertuningjob xgboost-mnist-hpo
```

This command removes the hyperparameter tuning job and associated training jobs from your Kubernetes cluster and stops them in SageMaker AI. Jobs that have stopped or completed do not incur any charges for SageMaker AI resources. SageMaker AI does not delete hyperparameter tuning jobs. Stopped jobs continue to show on the SageMaker AI console. 

Your output should look like the following: 

```
hyperparametertuningjob.sagemaker.aws.amazon.com "xgboost-mnist-hpo" deleted
```

**Note**: The delete command takes about 2 minutes to clean up the resources from SageMaker AI. 

## The BatchTransformJob operator


Batch transform job operators reconcile your specified batch transform job spec to SageMaker AI by launching it in SageMaker AI. You can learn more about SageMaker AI batch transform jobs in the SageMaker AI [CreateTransformJob API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateTransformJob.html). 

**Topics**
+ [

### Create a BatchTransformJob using a YAML File
](#create-a-batchtransformjob-using-a-simple-yaml-file)
+ [

### Create a BatchTransformJob using a Helm Chart
](#create-a-batchtransformjob-using-a-helm-chart)
+ [

### List BatchTransformJobs
](#list-batch-transform-jobs)
+ [

### Describe a BatchTransformJob
](#describe-a-batch-transform-job)
+ [

### View logs from BatchTransformJobs
](#view-logs-from-batch-transform-jobs)
+ [

### Delete a BatchTransformJob
](#delete-a-batch-transform-job)

### Create a BatchTransformJob using a YAML File


1. Download the sample YAML file for the batch transform job using the following command: 

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/xgboost-mnist-batchtransform.yaml
   ```

1. Edit the `xgboost-mnist-batchtransform.yaml` file to replace `inputdataconfig` with your input data and `s3OutputPath` with an Amazon S3 bucket that the SageMaker AI execution role has write access to. 

1. Apply the YAML file using the following command: 

   ```
   kubectl apply -f xgboost-mnist-batchtransform.yaml
   ```
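
The parameters named in step 2 map to fields under the job's `spec`. The fragment below is a sketch, with field names inferred from the `kubectl describe` output shown later in this section; verify against the downloaded file:

```
# Illustrative fragment of xgboost-mnist-batchtransform.yaml -- only
# the account-specific fields are shown.
spec:
  region: us-east-1
  modelName: TrainingJob-20190814SMJOb-IKEB
  transformInput:
    contentType: text/csv
    dataSource:
      s3DataSource:
        s3DataType: S3Prefix
        s3Uri: s3://amzn-s3-demo-bucket/mnist_kmeans_example/input
  transformOutput:
    s3OutputPath: s3://amzn-s3-demo-bucket/mnist_kmeans_example/output
```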

### Create a BatchTransformJob using a Helm Chart


You can use Helm Charts to run batch transform jobs. 

#### Get the Helm installer directory


Clone the GitHub repository to get the source using the following command: 

```
git clone https://github.com/aws/amazon-sagemaker-operator-for-k8s.git
```

#### Configure the Helm Chart


Navigate to the `amazon-sagemaker-operator-for-k8s/hack/charts/batch-transform-jobs/` folder. 

Edit the `values.yaml` file to replace `inputdataconfig` with your input data and `outputPath` with an S3 bucket to which the SageMaker AI execution role has write access. 

#### Create a BatchTransformJob


1. Use the following command to create a batch transform job: 

   ```
   helm install . --generate-name
   ```

   Your output should look like the following: 

   ```
   NAME: chart-1574292948
   LAST DEPLOYED: Wed Nov 20 23:35:49 2019
   NAMESPACE: default
   STATUS: deployed
   REVISION: 1
   TEST SUITE: None
   NOTES:
   Thanks for installing the sagemaker-k8s-batch-transform-job.
   ```

1. To verify that the Helm Chart was created successfully, run the following command: 

   ```
   helm ls
   NAME                    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART                           APP VERSION
   chart-1474292948        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-batchtransformjob-0.1.0
   chart-1474292948        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-hyperparametertuningjob-0.1.0
   chart-1574292948        default         1               2019-11-20 23:35:49.9136092 +0000 UTC   deployed        sagemaker-k8s-trainingjob-0.1.0
   rolebased-1574291698    default         1               2019-11-20 23:14:59.6777082 +0000 UTC   deployed        sagemaker-k8s-operator-0.1.0
   ```

   This command creates a `BatchTransformJob` Kubernetes resource. The operator launches the actual transform job in SageMaker AI and updates the `BatchTransformJob` Kubernetes resource to reflect the status of the job in SageMaker AI. You incur charges for SageMaker AI resources used during the duration of your job. You do not incur any charges once your job completes or stops. 

**Note**: SageMaker AI does not allow you to update a running batch transform job. You cannot edit any parameter and re-apply the config file. You must either change the metadata name or delete the existing job and create a new one. Similar to existing training job operators like `TFJob` in Kubeflow, `update` is not supported. 

### List BatchTransformJobs


Use the following command to list all jobs created using the Kubernetes operator: 

```
kubectl get batchtransformjob
```

Your output should look like the following: 

```
NAME                                STATUS      CREATION-TIME          SAGEMAKER-JOB-NAME
xgboost-mnist-batch-transform       Completed   2019-11-18T03:44:00Z   xgboost-mnist-a88fb19809b511eaac440aa8axgboost
```

A batch transform job continues to be listed after the job has completed or failed. You can remove a `batchtransformjob` from the list by following the [Delete a BatchTransformJob](#delete-a-batch-transform-job) steps. Jobs that have completed or stopped do not incur any charges for SageMaker AI resources. 

#### Batch transform status values


The `STATUS` field can be one of the following values: 
+ `Completed` 
+ `InProgress` 
+ `Failed` 
+ `Stopped` 
+ `Stopping` 

These statuses come directly from the SageMaker AI official [API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DescribeTransformJob.html#SageMaker-DescribeTransformJob-response-TransformJobStatus). 

In addition to the official SageMaker AI status, it is possible for `STATUS` to be `SynchronizingK8sJobWithSageMaker`. This means that the operator has not yet processed the job.

### Describe a BatchTransformJob


You can obtain debugging details using the `kubectl describe` command.

```
kubectl describe batchtransformjob xgboost-mnist-batch-transform
```

Your output should look like the following: 

```
Name:         xgboost-mnist-batch-transform
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sagemaker.aws.amazon.com/v1","kind":"BatchTransformJob","metadata":{"annotations":{},"name":"xgboost-mnist","namespace"...
API Version:  sagemaker.aws.amazon.com/v1
Kind:         BatchTransformJob
Metadata:
  Creation Timestamp:  2019-11-18T03:44:00Z
  Finalizers:
    sagemaker-operator-finalizer
  Generation:        2
  Resource Version:  21990924
  Self Link:         /apis/sagemaker.aws.amazon.com/v1/namespaces/default/batchtransformjobs/xgboost-mnist
  UID:               a88fb198-09b5-11ea-ac44-0aa8a9UIDNUM
Spec:
  Model Name:  TrainingJob-20190814SMJOb-IKEB
  Region:      us-east-1
  Transform Input:
    Content Type:  text/csv
    Data Source:
      S 3 Data Source:
        S 3 Data Type:  S3Prefix
        S 3 Uri:        s3://amzn-s3-demo-bucket/mnist_kmeans_example/input
  Transform Job Name:   xgboost-mnist-a88fb19809b511eaac440aa8a9SMJOB
  Transform Output:
    S 3 Output Path:  s3://amzn-s3-demo-bucket/mnist_kmeans_example/output
  Transform Resources:
    Instance Count:  1
    Instance Type:   ml.m4.xlarge
Status:
  Last Check Time:                2019-11-19T22:50:40Z
  Sage Maker Transform Job Name:  xgboost-mnist-a88fb19809b511eaac440aaSMJOB
  Transform Job Status:           Completed
Events:                           <none>
```

### View logs from BatchTransformJobs


Use the following command to see the logs from the `xgboost-mnist` batch transform job: 

```
kubectl smlogs batchtransformjob xgboost-mnist-batch-transform
```

### Delete a BatchTransformJob


Use the following command to stop a batch transform job in SageMaker AI. 

```
kubectl delete batchtransformjob xgboost-mnist-batch-transform
```

Your output should look like the following: 

```
batchtransformjob.sagemaker.aws.amazon.com "xgboost-mnist" deleted
```

This command removes the batch transform job from your Kubernetes cluster and stops it in SageMaker AI. Jobs that have stopped or completed do not incur any charges for SageMaker AI resources. The delete operation takes about 2 minutes to clean up the resources from SageMaker AI. 

**Note**: SageMaker AI does not delete batch transform jobs. Stopped jobs continue to show on the SageMaker AI console. 

## The HostingDeployment operator


HostingDeployment operators support creating and deleting an endpoint, as well as updating an existing endpoint, for real-time inference. The hosting deployment operator reconciles your specified hosting deployment job spec to SageMaker AI by creating models, endpoint configs, and endpoints in SageMaker AI. You can learn more about SageMaker AI inference in the SageMaker AI [CreateEndpoint API documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateEndpoint.html). 

**Topics**
+ [

### Configure a HostingDeployment resource
](#configure-a-hostingdeployment-resource)
+ [

### Create a HostingDeployment
](#create-a-hostingdeployment)
+ [

### List HostingDeployments
](#list-hostingdeployments)
+ [

### Describe a HostingDeployment
](#describe-a-hostingdeployment)
+ [

### Invoking the endpoint
](#invoking-the-endpoint)
+ [

### Update HostingDeployment
](#update-hostingdeployment)
+ [

### Delete the HostingDeployment
](#delete-the-hostingdeployment)

### Configure a HostingDeployment resource


Download the sample YAML file for the hosting deployment job using the following command: 

```
wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/xgboost-mnist-hostingdeployment.yaml
```

The `xgboost-mnist-hostingdeployment.yaml` file has the following components that can be edited as required: 
+ *ProductionVariants*. A production variant is a set of instances serving a single model. SageMaker AI load-balances between all production variants according to set weights. 
+ *Models*. A model specifies the containers and the execution role ARN necessary to serve it. It requires at least a single container. 
+ *Containers*. A container specifies the dataset and serving image. If you are using your own custom algorithm instead of an algorithm provided by SageMaker AI, the inference code must meet SageMaker AI requirements. For more information, see [Using Your Own Algorithms with SageMaker AI](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms.html). 
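
These three components reference each other by name: a production variant selects a model through `modelName`, and a model selects its serving containers through their `containerHostname` values. As a sketch, with field names taken from the full spec shown later in [Update HostingDeployment](#update-hostingdeployment):

```
spec:
    region: us-east-2
    productionVariants:          # a set of instances serving one model
        - variantName: all-traffic
          modelName: xgboost-model
          initialInstanceCount: 1
          instanceType: ml.c5.large
    models:                      # containers plus execution role
        - name: xgboost-model
          executionRoleArn: arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole
          primaryContainer: xgboost
          containers:
            - xgboost
    containers:                  # serving image and model artifact
        - containerHostname: xgboost
          modelDataUrl: s3://amzn-s3-demo-bucket/inference/xgboost-mnist/model.tar.gz
          image: 123456789012.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest
```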

### Create a HostingDeployment


To create a HostingDeployment, use `kubectl` to apply the file `hosting.yaml` with the following command: 

```
kubectl apply -f hosting.yaml
```

SageMaker AI creates an endpoint with the specified configuration. You incur charges for SageMaker AI resources used during the lifetime of your endpoint. You do not incur any charges once your endpoint is deleted. 

The creation process takes approximately 10 minutes. 

### List HostingDeployments


To verify that the HostingDeployment was created, use the following command: 

```
kubectl get hostingdeployments
```

Your output should look like the following: 

```
NAME           STATUS     SAGEMAKER-ENDPOINT-NAME
host-xgboost   Creating   host-xgboost-def0e83e0d5f11eaaa450aSMLOGS
```

#### HostingDeployment status values


The status field can be one of several values: 
+ `SynchronizingK8sJobWithSageMaker`: The operator is preparing to create the endpoint. 
+ `ReconcilingEndpoint`: The operator is creating, updating, or deleting endpoint resources. If the HostingDeployment remains in this state, use `kubectl describe` to see the reason in the `Additional` field. 
+ `OutOfService`: The endpoint is not available to take incoming requests. 
+ `Creating`: [CreateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateEndpoint.html) is running. 
+ `Updating`: [UpdateEndpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/API_UpdateEndpoint.html) or [UpdateEndpointWeightsAndCapacities](https://docs.aws.amazon.com/sagemaker/latest/dg/API_UpdateEndpointWeightsAndCapacities.html) is running. 
+ `SystemUpdating`: The endpoint is undergoing maintenance and cannot be updated, deleted, or re-scaled until the maintenance has completed. This maintenance operation does not change any customer-specified values such as VPC config, AWS KMS encryption, model, instance type, or instance count. 
+ `RollingBack`: The endpoint failed to scale up or down or to change its variant weight and is rolling back to its previous configuration. Once the rollback completes, the endpoint returns to an `InService` status. This transitional status only applies to an endpoint that has autoscaling turned on and is undergoing variant weight or capacity changes, whether as part of autoscaling or an explicit [UpdateEndpointWeightsAndCapacities](https://docs.aws.amazon.com/sagemaker/latest/dg/API_UpdateEndpointWeightsAndCapacities.html) call. 
+ `InService`: The endpoint is available to process incoming requests. 
+ `Deleting`: [DeleteEndpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DeleteEndpoint.html) is running. 
+ `Failed`: The endpoint could not be created, updated, or re-scaled. Use [DescribeEndpoint:FailureReason](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DescribeEndpoint.html#SageMaker-DescribeEndpoint-response-FailureReason) for information about the failure. [DeleteEndpoint](https://docs.aws.amazon.com/sagemaker/latest/dg/API_DeleteEndpoint.html) is the only operation that can be performed on a failed endpoint. 

### Describe a HostingDeployment


You can obtain debugging details using the `kubectl describe` command.

```
kubectl describe hostingdeployment
```

Your output should look like the following: 

```
Name:         host-xgboost
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sagemaker.aws.amazon.com/v1","kind":"HostingDeployment","metadata":{"annotations":{},"name":"host-xgboost","namespace":"def..."
API Version:  sagemaker.aws.amazon.com/v1
Kind:         HostingDeployment
Metadata:
  Creation Timestamp:  2019-11-22T19:40:00Z
  Finalizers:
    sagemaker-operator-finalizer
  Generation:        1
  Resource Version:  4258134
  Self Link:         /apis/sagemaker.aws.amazon.com/v1/namespaces/default/hostingdeployments/host-xgboost
  UID:               def0e83e-0d5f-11ea-aa45-0a3507uiduid
Spec:
  Containers:
    Container Hostname:  xgboost
    Image:               123456789012.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest
    Model Data URL:      s3://amzn-s3-demo-bucket/inference/xgboost-mnist/model.tar.gz
  Models:
    Containers:
      xgboost
    Execution Role Arn:  arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole
    Name:                xgboost-model
    Primary Container:   xgboost
  Production Variants:
    Initial Instance Count:  1
    Instance Type:           ml.c5.large
    Model Name:              xgboost-model
    Variant Name:            all-traffic
  Region:                    us-east-2
Status:
  Creation Time:         2019-11-22T19:40:04Z
  Endpoint Arn:          arn:aws:sagemaker:us-east-2:123456789012:endpoint/host-xgboost-def0e83e0d5f11eaaaexample
  Endpoint Config Name:  host-xgboost-1-def0e83e0d5f11e-e08f6c510d5f11eaaa450aexample
  Endpoint Name:         host-xgboost-def0e83e0d5f11eaaa450a350733ba06
  Endpoint Status:       Creating
  Endpoint URL:          https://runtime.sagemaker.us-east-2.amazonaws.com/endpoints/host-xgboost-def0e83e0d5f11eaaaexample/invocations
  Last Check Time:       2019-11-22T19:43:57Z
  Last Modified Time:    2019-11-22T19:40:04Z
  Model Names:
    Name:   xgboost-model
    Value:  xgboost-model-1-def0e83e0d5f11-df5cc9fd0d5f11eaaa450aexample
Events:     <none>
```

The status field provides more information using the following fields: 
+ `Additional`: Additional information about the status of the hosting deployment. This field is optional and only gets populated in case of error. 
+ `Creation Time`: When the endpoint was created in SageMaker AI. 
+ `Endpoint ARN`: The SageMaker AI endpoint ARN. 
+ `Endpoint Config Name`: The SageMaker AI name of the endpoint configuration. 
+ `Endpoint Name`: The SageMaker AI name of the endpoint. 
+ `Endpoint Status`: The status of the endpoint. 
+ `Endpoint URL`: The HTTPS URL that can be used to access the endpoint. For more information, see [Deploy a Model on SageMaker AI Hosting Services](https://docs.aws.amazon.com/sagemaker/latest/dg/deploy-model.html). 
+ `FailureReason`: If a create, update, or delete command fails, the cause is shown here. 
+ `Last Check Time`: The last time the operator checked the status of the endpoint. 
+ `Last Modified Time`: The last time the endpoint was modified. 
+ `Model Names`: A key-value pair of HostingDeployment model names to SageMaker AI model names. 

### Invoking the endpoint


Once the endpoint status is `InService`, you can invoke the endpoint in two ways: using the AWS CLI, which handles authentication and request signing for you, or using an HTTP client like cURL. If you use your own client, you need to perform AWS Signature Version 4 signing and authentication yourself. 

To invoke the endpoint using the AWS CLI, run the following command. Make sure to replace the Region and endpoint name with your endpoint's Region and SageMaker AI endpoint name. This information can be obtained from the output of `kubectl describe`. 

```
# Invoke the endpoint with mock input data.
aws sagemaker-runtime invoke-endpoint \
  --region us-east-2 \
  --endpoint-name <endpoint name> \
  --body $(seq 784 | xargs echo | sed 's/ /,/g') \
  >(cat) \
  --content-type text/csv > /dev/null
```

For example, if your Region is `us-east-2` and your endpoint name is `host-xgboost-f56b6b280d7511ea824b1299example`, then the following command would invoke the endpoint: 

```
aws sagemaker-runtime invoke-endpoint \
  --region us-east-2 \
  --endpoint-name host-xgboost-f56b6b280d7511ea824b1299example \
  --body $(seq 784 | xargs echo | sed 's/ /,/g') \
  >(cat) \
  --content-type text/csv > /dev/null
4.95847082138
```

Here, `4.95847082138` is the prediction from the model for the mock data. 
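
The `--body` argument in these commands is an ordinary shell pipeline: it produces 784 comma-separated integers, one mock value per pixel of a 28x28 MNIST image. You can inspect the payload on its own:

```
# Build the mock request body: the integers 1 through 784, comma-separated.
body=$(seq 784 | xargs echo | sed 's/ /,/g')

echo "$body" | cut -d, -f1-5          # first five values: 1,2,3,4,5
echo "$body" | tr ',' '\n' | wc -l    # 784 values in total
```

Any comma-separated row of 784 numbers works as a `text/csv` request for this model; the sequence 1 through 784 is only a placeholder.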

### Update HostingDeployment


1. Once a HostingDeployment has a status of `InService`, it can be updated. It might take about 10 minutes for the HostingDeployment to come into service. To verify that the status is `InService`, use the following command: 

   ```
   kubectl get hostingdeployments
   ```

1. The HostingDeployment can also be updated before the status is `InService`. The operator waits until the SageMaker AI endpoint is `InService` before applying the update. 

   To apply an update, modify the `hosting.yaml` file. For example, change the `initialInstanceCount` field from 1 to 2 as follows: 

   ```
   apiVersion: sagemaker.aws.amazon.com/v1
   kind: HostingDeployment
   metadata:
     name: host-xgboost
   spec:
       region: us-east-2
       productionVariants:
           - variantName: all-traffic
             modelName: xgboost-model
             initialInstanceCount: 2
             instanceType: ml.c5.large
       models:
           - name: xgboost-model
             executionRoleArn: arn:aws:iam::123456789012:role/service-role/AmazonSageMaker-ExecutionRole
             primaryContainer: xgboost
             containers:
               - xgboost
       containers:
           - containerHostname: xgboost
             modelDataUrl: s3://amzn-s3-demo-bucket/inference/xgboost-mnist/model.tar.gz
             image: 123456789012.dkr.ecr.us-east-2.amazonaws.com/xgboost:latest
   ```

1. Save the file, then use `kubectl` to apply your update as follows. You should see the status change from `InService` to `ReconcilingEndpoint`, then `Updating`. 

   ```
   $ kubectl apply -f hosting.yaml
   hostingdeployment.sagemaker.aws.amazon.com/host-xgboost configured
   
   $ kubectl get hostingdeployments
   NAME           STATUS                SAGEMAKER-ENDPOINT-NAME
   host-xgboost   ReconcilingEndpoint   host-xgboost-def0e83e0d5f11eaaa450a350abcdef
   
   $ kubectl get hostingdeployments
   NAME           STATUS     SAGEMAKER-ENDPOINT-NAME
   host-xgboost   Updating   host-xgboost-def0e83e0d5f11eaaa450a3507abcdef
   ```

SageMaker AI deploys a new set of instances with your models, switches traffic to use the new instances, and drains the old instances. As soon as this process begins, the status becomes `Updating`. After the update is complete, your endpoint becomes `InService`. This process takes approximately 10 minutes. 
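Rather than rerunning `kubectl get hostingdeployments` by hand while the endpoint updates, you can script the wait. The helper below is a generic polling sketch, not part of the operator; the `jsonpath` expression in the usage comment assumes the CRD exposes the status as `status.endpointStatus`, so verify the field name against your installed CRD first.

```shell
# Poll a status-reporting command until it prints the target status.
# Args: target status, max attempts, seconds between attempts, then the command.
wait_for_status() {
  target="$1"; attempts="$2"; interval="$3"; shift 3
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$("$@")
    if [ "$status" = "$target" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  return 1
}

# Example (field name is an assumption -- check with
# `kubectl explain hostingdeployment.status`):
# wait_for_status InService 40 15 \
#   kubectl get hostingdeployment host-xgboost \
#   -o jsonpath='{.status.endpointStatus}'
```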

### Delete the HostingDeployment


1. Use `kubectl` to delete a HostingDeployment with the following command: 

   ```
   kubectl delete hostingdeployments host-xgboost
   ```

   Your output should look like the following: 

   ```
   hostingdeployment.sagemaker.aws.amazon.com "host-xgboost" deleted
   ```

1. To verify that the hosting deployment has been deleted, use the following command: 

   ```
   kubectl get hostingdeployments
   No resources found.
   ```

Endpoints that have been deleted do not incur any charges for SageMaker AI resources. 

## The ProcessingJob operator


The ProcessingJob operator launches Amazon SageMaker Processing jobs. For more information on SageMaker Processing jobs, see [CreateProcessingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateProcessingJob.html). 

**Topics**
+ [Create a ProcessingJob using a YAML file](#kubernetes-processing-job-yaml)
+ [List ProcessingJobs](#kubernetes-processing-job-list)
+ [Describe a ProcessingJob](#kubernetes-processing-job-description)
+ [Delete a ProcessingJob](#kubernetes-processing-job-delete)

### Create a ProcessingJob using a YAML file


Follow these steps to create an Amazon SageMaker processing job by using a YAML file:

1. Download the `kmeans_preprocessing.py` pre-processing script.

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/kmeans_preprocessing.py
   ```

1. In one of your Amazon Simple Storage Service (Amazon S3) buckets, create a `mnist_kmeans_example/processing_code` folder and upload the script to the folder.

1. Download the `kmeans-mnist-processingjob.yaml` file.

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/kmeans-mnist-processingjob.yaml
   ```

1. Edit the YAML file to specify your `sagemaker-execution-role` and replace all instances of `amzn-s3-demo-bucket` with your S3 bucket.

   ```
   ...
   metadata:
     name: kmeans-mnist-processing
   ...
     roleArn: arn:aws:iam::<acct-id>:role/service-role/<sagemaker-execution-role>
     ...
     processingOutputConfig:
       outputs:
         ...
             s3Output:
               s3Uri: s3://<amzn-s3-demo-bucket>/mnist_kmeans_example/output/
     ...
     processingInputs:
       ...
           s3Input:
             s3Uri: s3://<amzn-s3-demo-bucket>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
   ```

   The `sagemaker-execution-role` must have permissions so that SageMaker AI can access your S3 bucket, Amazon CloudWatch, and other services on your behalf. For more information on creating an execution role, see [SageMaker AI Roles](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html#sagemaker-roles-createtrainingjob-perms).

1. Apply the YAML file using one of the following commands.

   For cluster-scoped installation:

   ```
   kubectl apply -f kmeans-mnist-processingjob.yaml
   ```

   For namespace-scoped installation:

   ```
   kubectl apply -f kmeans-mnist-processingjob.yaml -n <NAMESPACE>
   ```
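The placeholder substitutions in step 4 can also be scripted instead of edited by hand. The sketch below runs the substitutions against a minimal stand-in file; run the same `sed` against your downloaded `kmeans-mnist-processingjob.yaml`, and note that the account ID, role name, and bucket values here are placeholders you must replace with your own.

```shell
# Minimal stand-in for the parts of the spec that carry placeholders.
cat > /tmp/processingjob-snippet.yaml <<'EOF'
roleArn: arn:aws:iam::<acct-id>:role/service-role/<sagemaker-execution-role>
s3Uri: s3://<amzn-s3-demo-bucket>/mnist_kmeans_example/output/
EOF

ACCOUNT_ID=123456789012                     # placeholder account ID
ROLE_NAME=AmazonSageMaker-ExecutionRole     # placeholder role name
BUCKET=amzn-s3-demo-bucket                  # placeholder bucket name

# Replace every placeholder in place (keeps a .bak copy of the original).
sed -i.bak \
  -e "s/<acct-id>/${ACCOUNT_ID}/g" \
  -e "s/<sagemaker-execution-role>/${ROLE_NAME}/g" \
  -e "s/<amzn-s3-demo-bucket>/${BUCKET}/g" \
  /tmp/processingjob-snippet.yaml

cat /tmp/processingjob-snippet.yaml
```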

### List ProcessingJobs


Use one of the following commands to list all of the jobs created using the ProcessingJob operator. The SageMaker AI job name (`SAGEMAKER-JOB-NAME`) is derived from the `name` in the `metadata` section of the YAML file.

For cluster-scoped installation:

```
kubectl get ProcessingJob kmeans-mnist-processing
```

For namespace-scoped installation:

```
kubectl get ProcessingJob -n <NAMESPACE> kmeans-mnist-processing
```

Your output should look similar to the following:

```
NAME                    STATUS     CREATION-TIME        SAGEMAKER-JOB-NAME
kmeans-mnist-processing InProgress 2020-09-22T21:13:25Z kmeans-mnist-processing-7410ed52fd1811eab19a165ae9f9e385
```

The output lists all jobs regardless of their status. To remove a job from the list, see [Delete a Processing Job](https://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-processing-job-operator.html#kubernetes-processing-job-delete).

**ProcessingJob Status**
+ `SynchronizingK8sJobWithSageMaker` – The job is first submitted to the cluster. The operator has received the request and is preparing to create the processing job.
+ `Reconciling` – The operator is in a transitional state, for example initializing or recovering from transient errors. If the processing job remains in this state, use the `kubectl describe` command to see the reason in the `Additional` field.
+ `InProgress | Completed | Failed | Stopping | Stopped` – Status of the SageMaker Processing job. For more information, see [DescribeProcessingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeProcessingJob.html#sagemaker-DescribeProcessingJob-response-ProcessingJobStatus).
+ `Error` – The operator cannot recover by reconciling.

Jobs that have completed, stopped, or failed do not incur further charges for SageMaker AI resources.
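If you script around these statuses (for example, a CI step that launches a processing job and polls until it settles), it helps to separate terminal states from transitional ones. A minimal sketch based on the status list above:

```shell
# Terminal statuses need no further polling; everything else is still moving.
processing_job_is_terminal() {
  case "$1" in
    Completed|Failed|Stopped|Error) return 0 ;;
    *) return 1 ;;  # SynchronizingK8sJobWithSageMaker, Reconciling,
                    # InProgress, Stopping, or anything unexpected
  esac
}

processing_job_is_terminal Completed && echo "done polling"
```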

### Describe a ProcessingJob


Use one of the following commands to get more details about a processing job. These commands are typically used for debugging a problem or checking the parameters of a processing job.

For cluster-scoped installation:

```
kubectl describe processingjob kmeans-mnist-processing
```

For namespace-scoped installation:

```
kubectl describe processingjob kmeans-mnist-processing -n <NAMESPACE>
```

The output for your processing job should look similar to the following.

```
$ kubectl describe ProcessingJob kmeans-mnist-processing
Name:         kmeans-mnist-processing
Namespace:    default
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"sagemaker.aws.amazon.com/v1","kind":"ProcessingJob","metadata":{"annotations":{},"name":"kmeans-mnist-processing",...
API Version:  sagemaker.aws.amazon.com/v1
Kind:         ProcessingJob
Metadata:
  Creation Timestamp:  2020-09-22T21:13:25Z
  Finalizers:
    sagemaker-operator-finalizer
  Generation:        2
  Resource Version:  21746658
  Self Link:         /apis/sagemaker.aws.amazon.com/v1/namespaces/default/processingjobs/kmeans-mnist-processing
  UID:               7410ed52-fd18-11ea-b19a-165ae9f9e385
Spec:
  App Specification:
    Container Entrypoint:
      python
      /opt/ml/processing/code/kmeans_preprocessing.py
    Image Uri:  763104351884.dkr.ecr.us-west-2.amazonaws.com/pytorch-training:1.5.0-cpu-py36-ubuntu16.04
  Environment:
    Name:   MYVAR
    Value:  my_value
    Name:   MYVAR2
    Value:  my_value2
  Network Config:
  Processing Inputs:
    Input Name:  mnist_tar
    s3Input:
      Local Path:   /opt/ml/processing/input
      s3DataType:   S3Prefix
      s3InputMode:  File
      s3Uri:        s3://<s3bucket>-us-west-2/algorithms/kmeans/mnist/mnist.pkl.gz
    Input Name:     source_code
    s3Input:
      Local Path:   /opt/ml/processing/code
      s3DataType:   S3Prefix
      s3InputMode:  File
      s3Uri:        s3://<s3bucket>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
  Processing Output Config:
    Outputs:
      Output Name:  train_data
      s3Output:
        Local Path:    /opt/ml/processing/output_train/
        s3UploadMode:  EndOfJob
        s3Uri:         s3://<s3bucket>/mnist_kmeans_example/output/
      Output Name:     test_data
      s3Output:
        Local Path:    /opt/ml/processing/output_test/
        s3UploadMode:  EndOfJob
        s3Uri:         s3://<s3bucket>/mnist_kmeans_example/output/
      Output Name:     valid_data
      s3Output:
        Local Path:    /opt/ml/processing/output_valid/
        s3UploadMode:  EndOfJob
        s3Uri:         s3://<s3bucket>/mnist_kmeans_example/output/
  Processing Resources:
    Cluster Config:
      Instance Count:     1
      Instance Type:      ml.m5.xlarge
      Volume Size In GB:  20
  Region:                 us-west-2
  Role Arn:               arn:aws:iam::<acct-id>:role/m-sagemaker-role
  Stopping Condition:
    Max Runtime In Seconds:  1800
  Tags:
    Key:    tagKey
    Value:  tagValue
Status:
  Cloud Watch Log URL:             https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logStream:group=/aws/sagemaker/ProcessingJobs;prefix=kmeans-mnist-processing-7410ed52fd1811eab19a165ae9f9e385;streamFilter=typeLogStreamPrefix
  Last Check Time:                 2020-09-22T21:14:29Z
  Processing Job Status:           InProgress
  Sage Maker Processing Job Name:  kmeans-mnist-processing-7410ed52fd1811eab19a165ae9f9e385
Events:                            <none>
```

### Delete a ProcessingJob


When you delete a processing job, the SageMaker Processing job is removed from Kubernetes, but the job isn't deleted from SageMaker AI. If the job status in SageMaker AI is `InProgress`, the job is stopped. Processing jobs that are stopped do not incur any charges for SageMaker AI resources. Use one of the following commands to delete a processing job. 

For cluster-scoped installation:

```
kubectl delete processingjob kmeans-mnist-processing
```

For namespace-scoped installation:

```
kubectl delete processingjob kmeans-mnist-processing -n <NAMESPACE>
```

The output for your processing job should look similar to the following.

```
processingjob.sagemaker.aws.amazon.com "kmeans-mnist-processing" deleted
```



**Note**  
SageMaker AI does not delete the processing job. Stopped jobs continue to show in the SageMaker AI console. The `delete` command takes a few minutes to clean up the resources from SageMaker AI.

## HostingAutoscalingPolicy (HAP) Operator


The HostingAutoscalingPolicy (HAP) operator takes a list of resource IDs as input and applies the same policy to each of them. Each resource ID is a combination of an endpoint name and a variant name. The HAP operator performs two steps: it registers the resource IDs, and then it applies the scaling policy to each of them. `Delete` undoes both actions. You can apply a HAP to an existing SageMaker AI endpoint, or you can create a new SageMaker AI endpoint using the [HostingDeployment operator](https://docs.aws.amazon.com/sagemaker/latest/dg/hosting-deployment-operator.html#create-a-hostingdeployment). You can read more about SageMaker AI autoscaling in the [Application Auto Scaling policy documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/endpoint-auto-scaling.html).

**Note**  
In your `kubectl` commands, you can use the short form, `hap`, in place of `hostingautoscalingpolicy`.

**Topics**
+ [Create a HostingAutoscalingPolicy using a YAML file](#kubernetes-hap-job-yaml)
+ [List HostingAutoscalingPolicies](#kubernetes-hap-list)
+ [Describe a HostingAutoscalingPolicy](#kubernetes-hap-describe)
+ [Update a HostingAutoscalingPolicy](#kubernetes-hap-update)
+ [Delete a HostingAutoscalingPolicy](#kubernetes-hap-delete)
+ [Update or delete an endpoint with a HostingAutoscalingPolicy](#kubernetes-hap-update-delete-endpoint)

### Create a HostingAutoscalingPolicy using a YAML file


Use a YAML file to create a HostingAutoscalingPolicy (HAP) that applies a predefined or custom metric to one or multiple SageMaker AI endpoints.

Amazon SageMaker AI requires specific values in order to apply autoscaling to your variant. If these values are not specified in the YAML spec, the HAP operator applies the following default values.

```
# Do not change
Namespace                    = "sagemaker"
# Do not change
ScalableDimension            = "sagemaker:variant:DesiredInstanceCount"
# Only one supported
PolicyType                   = "TargetTrackingScaling"
# This is the default policy name but can be changed to apply a custom policy
DefaultAutoscalingPolicyName = "SageMakerEndpointInvocationScalingPolicy"
```

Use the following samples to create a HAP that applies a predefined or custom metric to one or multiple endpoints.

#### Sample 1: Apply a predefined metric to a single endpoint variant

1. Download the sample YAML file for a predefined metric using the following command:

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/hap-predefined-metric.yaml
   ```

1. Edit the YAML file to specify your `endpointName`, `variantName`, and `Region`.

1. Use one of the following commands to apply a predefined metric to a single resource ID (endpoint name and variant name combination).

   For cluster-scoped installation:

   ```
   kubectl apply -f hap-predefined-metric.yaml
   ```

   For namespace-scoped installation:

   ```
   kubectl apply -f hap-predefined-metric.yaml -n <NAMESPACE>
   ```

#### Sample 2: Apply a custom metric to a single endpoint variant

1. Download the sample YAML file for a custom metric using the following command:

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/hap-custom-metric.yaml
   ```

1. Edit the YAML file to specify your `endpointName`, `variantName`, and `Region`.

1. Use one of the following commands to apply a custom metric to a single resource ID (endpoint name and variant name combination) in place of the recommended `SageMakerVariantInvocationsPerInstance`.
**Note**  
Amazon SageMaker AI does not check the validity of your YAML spec.

   For cluster-scoped installation:

   ```
   kubectl apply -f hap-custom-metric.yaml
   ```

   For namespace-scoped installation:

   ```
   kubectl apply -f hap-custom-metric.yaml -n <NAMESPACE>
   ```

#### Sample 3: Apply a scaling policy to multiple endpoints and variants

You can use the HAP operator to apply the same scaling policy to multiple resource IDs. A separate `scaling_policy` request is created for each resource ID (endpoint name and variant name combination).

1. Download the sample YAML file for a predefined metric using the following command:

   ```
   wget https://raw.githubusercontent.com/aws/amazon-sagemaker-operator-for-k8s/master/samples/hap-predefined-metric.yaml
   ```

1. Edit the YAML file to specify your `Region` and multiple `endpointName` and `variantName` values.

1. Use one of the following commands to apply a predefined metric to multiple resource IDs (endpoint name and variant name combinations).

   For cluster-scoped installation:

   ```
   kubectl apply -f hap-predefined-metric.yaml
   ```

   For namespace-scoped installation:

   ```
   kubectl apply -f hap-predefined-metric.yaml -n <NAMESPACE>
   ```

#### Considerations for HostingAutoscalingPolicies for multiple endpoints and variants

The following considerations apply when you use multiple resource IDs:
+ If you apply a single policy across multiple resource IDs, one PolicyARN is created per resource ID; for example, five endpoints have five PolicyARNs. When you run the `describe` command on the policy, the response shows up as one job with a single job status.
+ If you apply a custom metric to multiple resource IDs, the same dimension or value is used for all of the resource ID (variant) values. For example, if you apply a custom metric to variants 1-5 and the metric's endpoint variant dimension is mapped to variant 1, then all of the endpoints scale up or down together when variant 1 exceeds the metric threshold.
+ The HAP operator supports updating the list of resource IDs. If you modify, add, or delete resource IDs in the spec, the autoscaling policy is removed from the previous list of variants and applied to the newly specified resource ID combinations. Use the [describe](https://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-hap-operator.html#kubernetes-hap-describe) command to list the resource IDs to which the policy is currently applied.
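Behind the scenes, each resource ID that the HAP operator registers is an Application Auto Scaling identifier of the form `endpoint/<endpoint-name>/variant/<variant-name>`. When scripting against these IDs, a tiny helper makes the mapping explicit (the endpoint and variant names below come from the earlier HostingDeployment example):

```shell
# Build the Application Auto Scaling resource ID for one endpoint variant.
make_resource_id() {
  printf 'endpoint/%s/variant/%s\n' "$1" "$2"
}

make_resource_id host-xgboost all-traffic
# endpoint/host-xgboost/variant/all-traffic
```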

### List HostingAutoscalingPolicies


Use one of the following commands to list all HostingAutoscalingPolicies (HAPs) created using the HAP operator.

For cluster-scoped installation:

```
kubectl get hap
```

For namespace-scoped installation:

```
kubectl get hap -n <NAMESPACE>
```

Your output should look similar to the following:

```
NAME             STATUS   CREATION-TIME
hap-predefined   Created  2021-07-13T21:32:21Z
```

Use the following command to check the status of your HostingAutoscalingPolicy (HAP).

```
kubectl get hap <job-name>
```

One of the following values is returned:
+ `Reconciling` – Certain types of errors show the status as `Reconciling` instead of `Error`. Some examples are server-side errors and endpoints in the `Creating` or `Updating` state. Check the `Additional` field in status or operator logs for more details.
+ `Created`
+ `Error`

**To view the autoscaling endpoint to which you applied the policy**

1. Open the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. In the left side panel, expand **Inference**.

1. Choose **Endpoints**.

1. Select the name of the endpoint of interest.

1. Scroll to the **Endpoint runtime settings** section.

### Describe a HostingAutoscalingPolicy


Use the following command to get more details about a HostingAutoscalingPolicy (HAP). These commands are typically used for debugging a problem or checking the resource IDs (endpoint name and variant name combinations) of a HAP.

```
kubectl describe hap <job-name>
```

### Update a HostingAutoscalingPolicy


The HostingAutoscalingPolicy (HAP) operator supports updates. You can edit your YAML spec to change the values and then reapply the policy. The HAP operator deletes the existing policy and applies the new policy.

### Delete a HostingAutoscalingPolicy


Use one of the following commands to delete a HostingAutoscalingPolicy (HAP) policy.

For cluster-scoped installation:

```
kubectl delete hap hap-predefined
```

For namespace-scoped installation:

```
kubectl delete hap hap-predefined -n <NAMESPACE>
```

This command deletes the scaling policy and deregisters the scaling target from Kubernetes. This command returns the following output:

```
hostingautoscalingpolicies.sagemaker.aws.amazon.com "hap-predefined" deleted
```

### Update or delete an endpoint with a HostingAutoscalingPolicy

To update an endpoint that has a HostingAutoscalingPolicy (HAP), use the `kubectl` `delete` command to remove the HAP, update the endpoint, and then reapply the HAP.

To delete an endpoint that has a HAP, use the `kubectl` `delete` command to remove the HAP before you delete the endpoint.

# Migrate resources to the latest Operators

We are ending development and technical support of the original version of [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master).

If you are currently using version `v1.2.2` or earlier of [SageMaker Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master), we recommend migrating your resources to the [ACK service controller for Amazon SageMaker](https://github.com/aws-controllers-k8s/sagemaker-controller). The ACK service controller is a new generation of SageMaker Operators for Kubernetes based on [AWS Controllers for Kubernetes (ACK)](https://aws-controllers-k8s.github.io/community/).

For answers to frequently asked questions on the end of support of the original version of SageMaker Operators for Kubernetes, see [Announcing the End of Support of the Original Version of SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-eos-announcement.md).

Use the following steps to migrate your resources and use ACK to train, tune, and deploy machine learning models with Amazon SageMaker AI.

**Note**  
The latest SageMaker AI Operators for Kubernetes are not backward compatible with the original version.

**Topics**
+ [Prerequisites](#migrate-resources-to-new-operators-prerequisites)
+ [Adopt resources](#migrate-resources-to-new-operators-steps)
+ [Clean up old resources](#migrate-resources-to-new-operators-cleanup)
+ [Use the new SageMaker AI Operators for Kubernetes](#migrate-resources-to-new-operators-tutorials)

## Prerequisites


To successfully migrate resources to the latest SageMaker AI Operators for Kubernetes, you must do the following:

1. Install the latest SageMaker AI Operators for Kubernetes. See [Setup](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/#setup) in *Machine Learning with the ACK SageMaker AI Controller* for step-by-step instructions.

1. If you are using [HostingAutoscalingPolicy resources](#migrate-resources-to-new-operators-hap), install the new Application Auto Scaling Operators. See [Setup](https://aws-controllers-k8s.github.io/community/docs/tutorials/autoscaling-example/#setup) in *Scale SageMaker AI Workloads with Application Auto Scaling* for step-by-step instructions. This step is optional if you are not using HostingAutoscalingPolicy resources.

If permissions are configured correctly, then the ACK SageMaker AI service controller can determine the specification and status of the AWS resource and reconcile the resource as if the ACK controller originally created it.

## Adopt resources


The new SageMaker AI Operators for Kubernetes provide the ability to adopt resources that were not originally created by the ACK service controller. For more information, see [Adopt Existing AWS Resources](https://aws-controllers-k8s.github.io/community/docs/user-docs/adopted-resource/) in the ACK documentation.

The following steps show how the new SageMaker AI Operators for Kubernetes can adopt an existing SageMaker AI endpoint. Save the following sample to a file named `adopt-endpoint-sample.yaml`. 

```
apiVersion: services.k8s.aws/v1alpha1
kind: AdoptedResource
metadata:
  name: adopt-endpoint-sample
spec:  
  aws:
    # resource to adopt, not created by ACK
    nameOrID: xgboost-endpoint
  kubernetes:
    group: sagemaker.services.k8s.aws
    kind: Endpoint
    metadata:
      # target K8s CR name
      name: xgboost-endpoint
```

Submit the custom resource (CR) using `kubectl apply`:

```
kubectl apply -f adopt-endpoint-sample.yaml
```

Use `kubectl describe` to check the status conditions of your adopted resource.

```
kubectl describe adoptedresource adopt-endpoint-sample
```

Verify that the `ACK.Adopted` condition is `True`. The output should look similar to the following example:

```
---
kind: AdoptedResource
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"services.k8s.aws/v1alpha1","kind":"AdoptedResource","metadata":{"annotations":{},"name":"xgboost-endpoint","namespace":"default"},"spec":{"aws":{"nameOrID":"xgboost-endpoint"},"kubernetes":{"group":"sagemaker.services.k8s.aws","kind":"Endpoint","metadata":{"name":"xgboost-endpoint"}}}}'
  creationTimestamp: '2021-04-27T02:49:14Z'
  finalizers:
  - finalizers.services.k8s.aws/AdoptedResource
  generation: 1
  name: adopt-endpoint-sample
  namespace: default
  resourceVersion: '12669876'
  selfLink: "/apis/services.k8s.aws/v1alpha1/namespaces/default/adoptedresources/adopt-endpoint-sample"
  uid: 35f8fa92-29dd-4040-9d0d-0b07bbd7ca0b
spec:
  aws:
    nameOrID: xgboost-endpoint
  kubernetes:
    group: sagemaker.services.k8s.aws
    kind: Endpoint
    metadata:
      name: xgboost-endpoint
status:
  conditions:
  - status: 'True'
    type: ACK.Adopted
```

Check that your resource exists in your cluster:

```
kubectl describe endpoints.sagemaker xgboost-endpoint
```
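If you script the adoption check, gate on the `ACK.Adopted` condition rather than reading the output by eye. The helper below reads a condition status from stdin so it can be fed from `kubectl get ... -o jsonpath`; the jsonpath filter in the comment is an assumption based on the CR layout shown above.

```shell
# Succeed only when the condition status piped in is "True", e.g.:
#   kubectl get adoptedresource adopt-endpoint-sample \
#     -o jsonpath='{.status.conditions[?(@.type=="ACK.Adopted")].status}' \
#     | require_adopted
require_adopted() {
  read -r status
  [ "$status" = "True" ]
}

echo True | require_adopted && echo "resource adopted"
```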

### HostingAutoscalingPolicy resources


The `HostingAutoscalingPolicy` (HAP) resource consists of multiple Application Auto Scaling resources: `ScalableTarget` and `ScalingPolicy`. Before adopting a HAP resource with ACK, first install the [Application Auto Scaling controller](https://github.com/aws-controllers-k8s/applicationautoscaling-controller). To adopt HAP resources, you need to adopt both the `ScalableTarget` and `ScalingPolicy` resources. You can find the resource identifiers for these resources in the status of the `HostingAutoscalingPolicy` resource (`status.ResourceIDList`).

### HostingDeployment resources


The `HostingDeployment` resource consists of multiple SageMaker AI resources: `Endpoint`, `EndpointConfig`, and each `Model`. If you adopt a SageMaker AI endpoint in ACK, you need to adopt the `Endpoint`, `EndpointConfig`, and each `Model` separately. You can find the `Endpoint`, `EndpointConfig`, and `Model` names in the status of the `HostingDeployment` resource (`status.endpointName`, `status.endpointConfigName`, and `status.modelNames`).

For a list of all supported SageMaker AI resources, refer to the [ACK API Reference](https://aws-controllers-k8s.github.io/community/reference/).

## Clean up old resources


After the new SageMaker AI Operators for Kubernetes adopt your resources, you can uninstall old operators and clean up old resources.

### Step 1: Uninstall the old operator


To uninstall the old operator, see [Delete operators](kubernetes-sagemaker-operators-end-of-support.md#delete-operators).

**Warning**  
Uninstall the old operator before deleting any old resources.

### Step 2: Remove finalizers and delete old resources


**Warning**  
Before deleting old resources, be sure that you have uninstalled the old operator.

After uninstalling the old operator, you must explicitly remove the finalizers to delete old operator resources. The following sample script shows how to delete all training jobs managed by the old operator in a given namespace. You can use a similar pattern to delete additional resources once they are adopted by the new operator.

**Note**  
You must use full resource names to get resources. For example, use `kubectl get trainingjobs.sagemaker.aws.amazon.com` instead of `kubectl get trainingjob`.

```
namespace=sagemaker_namespace
training_jobs=$(kubectl get trainingjobs.sagemaker.aws.amazon.com -n $namespace -ojson | jq -r '.items | .[] | .metadata.name')
 
for job in $training_jobs
do
    echo "Deleting $job resource in $namespace namespace"
    kubectl patch trainingjobs.sagemaker.aws.amazon.com $job -n $namespace -p '{"metadata":{"finalizers":null}}' --type=merge
    kubectl delete trainingjobs.sagemaker.aws.amazon.com $job -n $namespace
done
```
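The same finalizer-removal pattern applies to the other old-operator resource kinds (for example, `hostingdeployments.sagemaker.aws.amazon.com`). A hypothetical dry-run helper that prints the cleanup commands for review instead of executing them:

```shell
# Print (do not run) the patch/delete commands for the named resources.
# Args: full CRD name, namespace, then one or more resource names.
print_cleanup_commands() {
  crd="$1"; namespace="$2"; shift 2
  for name in "$@"; do
    printf '%s\n' "kubectl patch $crd $name -n $namespace -p '{\"metadata\":{\"finalizers\":null}}' --type=merge"
    printf '%s\n' "kubectl delete $crd $name -n $namespace"
  done
}

print_cleanup_commands trainingjobs.sagemaker.aws.amazon.com default job-a job-b
```

Pipe the output to a file, review it, and run it once you are sure the old operator is uninstalled.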

## Use the new SageMaker AI Operators for Kubernetes


For in-depth guides on using the new SageMaker AI Operators for Kubernetes, see [Use SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-ack.md#kubernetes-sagemaker-operators-ack-use).

# Announcing the End of Support of the Original Version of SageMaker AI Operators for Kubernetes

This page announces the end of support for the original version of [SageMaker AI Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s) and provides answers to frequently asked questions as well as migration information about the [ACK service controller for Amazon SageMaker AI](https://github.com/aws-controllers-k8s/sagemaker-controller), a new generation of fully supported SageMaker AI Operators for Kubernetes. For general information about the new SageMaker AI Operators for Kubernetes, see [Latest SageMaker AI Operators for Kubernetes](kubernetes-sagemaker-operators-ack.md). 

## End of Support Frequently Asked Questions

**Topics**
+ [Why are we ending support for the original version of SageMaker AI Operators for Kubernetes?](#kubernetes-sagemaker-operators-eos-faq-why)
+ [Where can I find more information about the new SageMaker AI Operators for Kubernetes and ACK?](#kubernetes-sagemaker-operators-eos-faq-more)
+ [What does end of support (EOS) mean?](#kubernetes-sagemaker-operators-eos-faq-definition)
+ [How can I migrate my workload to the new SageMaker AI Operators for Kubernetes for training and inference?](#kubernetes-sagemaker-operators-eos-faq-how)
+ [Which version of ACK should I migrate to?](#kubernetes-sagemaker-operators-eos-faq-version)
+ [Are the initial SageMaker AI Operators for Kubernetes and the new Operators (ACK service controller for Amazon SageMaker AI) functionally equivalent?](#kubernetes-sagemaker-operators-eos-faq-parity)

### Why are we ending support for the original version of SageMaker AI Operators for Kubernetes?


Users can now take advantage of the [ACK service controller for Amazon SageMaker AI](https://github.com/aws-controllers-k8s/sagemaker-controller). The ACK service controller is a new generation of SageMaker AI Operators for Kubernetes based on [AWS Controllers for Kubernetes](https://aws-controllers-k8s.github.io/community/) (ACK), a community-driven project optimized for production, standardizing the way to expose AWS services via a Kubernetes operator. We are therefore announcing the end of support (EOS) for the original version (not ACK-based) of [SageMaker AI Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s). The support ends on **Feb 15, 2023** along with [Amazon Elastic Kubernetes Service Kubernetes 1.21](https://docs.aws.amazon.com/eks/latest/userguide/kubernetes-versions.html#kubernetes-release-calendar). 

For more information on ACK, see [ACK history and tenets](https://aws-controllers-k8s.github.io/community/docs/community/background/).

### Where can I find more information about the new SageMaker AI Operators for Kubernetes and ACK?

+ For more information about the new SageMaker AI Operators for Kubernetes, see the [ACK service controller for Amazon SageMaker AI](https://github.com/aws-controllers-k8s/sagemaker-controller) GitHub repository or read [AWS Controllers for Kubernetes Documentation](https://aws-controllers-k8s.github.io/community/docs/community/overview/).
+ For a tutorial on how to train a machine learning model with the ACK service controller for Amazon SageMaker AI using Amazon EKS, see this [SageMaker AI example](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/).

  For an autoscaling example, see [Scale SageMaker AI Workloads with Application Auto Scaling](https://aws-controllers-k8s.github.io/community/docs/tutorials/autoscaling-example/).
+ For information on AWS Controller for Kubernetes (ACK), see the [AWS Controllers for Kubernetes](https://aws-controllers-k8s.github.io/community/) (ACK) documentation.
+ For a list of supported SageMaker AI resources, see [ACK API Reference](https://aws-controllers-k8s.github.io/community/reference/).

### What does end of support (EOS) mean?


While users can continue to use their current operators, we are no longer developing new features for the operators, nor will we release any patches or security updates for any issues found. `v1.2.2` is the last release of [SageMaker AI Operators for Kubernetes](https://github.com/aws/amazon-sagemaker-operator-for-k8s/tree/master). Users should migrate their workloads to use the [ACK service controller for Amazon SageMaker AI](https://github.com/aws-controllers-k8s/sagemaker-controller).

### How can I migrate my workload to the new SageMaker AI Operators for Kubernetes for training and inference?


For information about migrating resources from the old to the new SageMaker AI Operators for Kubernetes, follow [Migrate resources to the latest Operators](kubernetes-sagemaker-operators-migrate.md).

### Which version of ACK should I migrate to?


Users should migrate to the most recent released version of the [ACK service controller for Amazon SageMaker AI](https://github.com/aws-controllers-k8s/sagemaker-controller/tags).

### Are the initial SageMaker AI Operators for Kubernetes and the new Operators (ACK service controller for Amazon SageMaker AI) functionally equivalent?


Yes, they are at feature parity.

A few notable differences between the two versions:
+ The Custom Resource Definitions (CRDs) used by the ACK-based SageMaker AI Operators for Kubernetes follow the AWS API definitions, making them incompatible with the custom resource specifications of the original SageMaker AI Operators for Kubernetes. Refer to the [CRDs](https://github.com/aws-controllers-k8s/sagemaker-controller/tree/main/helm/crds) in the new controller, or use the migration guide to adopt the resources and use the new controller. 
+ The `Hosting Autoscaling` policy is no longer part of the new SageMaker AI Operators for Kubernetes and has been migrated to the [Application Auto Scaling](https://github.com/aws-controllers-k8s/applicationautoscaling-controller) ACK controller. To learn how to use the Application Auto Scaling controller to configure autoscaling on SageMaker AI endpoints, follow this [autoscaling example](https://aws-controllers-k8s.github.io/community/docs/tutorials/autoscaling-example/). 
+ The `HostingDeployment` resource was used to create Models, Endpoint Configurations, and Endpoints in one CRD. The new SageMaker AI Operators for Kubernetes have a separate CRD for each of these resources. 

# SageMaker AI Components for Kubeflow Pipelines


With SageMaker AI components for Kubeflow Pipelines, you can create and monitor native SageMaker AI training, tuning, endpoint deployment, and batch transform jobs from your Kubeflow Pipelines. By running Kubeflow Pipeline jobs on SageMaker AI, you move data processing and training jobs from the Kubernetes cluster to SageMaker AI's machine learning-optimized managed service. This document assumes prior knowledge of Kubernetes and Kubeflow. 

**Topics**
+ [

## What are Kubeflow Pipelines?
](#what-is-kubeflow-pipelines)
+ [

## What are Kubeflow Pipeline components?
](#kubeflow-pipeline-components)
+ [

## Why use SageMaker AI Components for Kubeflow Pipelines?
](#why-use-sagemaker-components)
+ [

## SageMaker AI Components for Kubeflow Pipelines versions
](#sagemaker-components-versions)
+ [

## List of SageMaker AI Components for Kubeflow Pipelines
](#sagemaker-components-list)
+ [

## IAM permissions
](#iam-permissions)
+ [

## Converting pipelines to use SageMaker AI
](#converting-pipelines-to-use-amazon-sagemaker)
+ [

# Install Kubeflow Pipelines
](kubernetes-sagemaker-components-install.md)
+ [

# Use SageMaker AI components
](kubernetes-sagemaker-components-tutorials.md)

## What are Kubeflow Pipelines?


Kubeflow Pipelines (KFP) is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. The Kubeflow Pipelines platform consists of the following:
+ A user interface (UI) for managing and tracking experiments, jobs, and runs. 
+ An engine (Argo) for scheduling multi-step ML workflows.
+ An SDK for defining and manipulating pipelines and components.
+ Notebooks for interacting with the system using the SDK.

A pipeline is a description of an ML workflow expressed as a [directed acyclic graph](https://www.kubeflow.org/docs/pipelines/concepts/graph/). Every step in the workflow is expressed as a Kubeflow Pipeline [component](https://www.kubeflow.org/docs/pipelines/overview/concepts/component/), which is an AWS SDK for Python (Boto3) module.
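The DAG structure means each step runs only after the steps whose outputs it consumes have completed. As a minimal illustration of this idea (hypothetical step names, Python standard library only), a pipeline's execution order can be resolved with a topological sort:

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: each step maps to the set of steps whose outputs it consumes.
pipeline = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

# static_order() yields steps so that every step comes after its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)  # ['preprocess', 'train', 'evaluate', 'deploy']
```

A real Kubeflow pipeline resolves this ordering for you; the sketch only shows the dependency semantics of the graph.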

For more information on Kubeflow Pipelines, see the [Kubeflow Pipelines documentation](https://www.kubeflow.org/docs/pipelines/). 

## What are Kubeflow Pipeline components?


A Kubeflow Pipeline component is a set of code used to execute one step of a Kubeflow pipeline. Components are represented by a Python module built into a Docker image. When the pipeline runs, the component's container is instantiated on one of the worker nodes on the Kubernetes cluster running Kubeflow, and your logic is executed. Pipeline components can read outputs from the previous components and create outputs that the next component in the pipeline can consume. These components make it fast and easy to write pipelines for experimentation and production environments without having to interact with the underlying Kubernetes infrastructure.

You can use SageMaker AI Components in your Kubeflow pipeline. Rather than encapsulating your logic in a custom container, you simply load the components and describe your pipeline using the Kubeflow Pipelines SDK. When the pipeline runs, your instructions are translated into a SageMaker AI job or deployment. The workload then runs on the fully managed infrastructure of SageMaker AI. 

## Why use SageMaker AI Components for Kubeflow Pipelines?
Why use SageMaker AI Components?

SageMaker AI Components for Kubeflow Pipelines offer an alternative way to launch your compute-intensive jobs on SageMaker AI. The components integrate SageMaker AI with the portability and orchestration of Kubeflow Pipelines. Using the SageMaker AI Components for Kubeflow Pipelines, you can create and monitor your SageMaker AI resources as part of a Kubeflow Pipelines workflow. Each of the jobs in your pipelines runs on SageMaker AI instead of the local Kubernetes cluster, allowing you to take advantage of key SageMaker AI features such as data labeling, large-scale hyperparameter tuning and distributed training jobs, or one-click secure and scalable model deployment. The job parameters, status, logs, and outputs from SageMaker AI are still accessible from the Kubeflow Pipelines UI. 

The SageMaker AI components integrate key SageMaker AI features into your ML workflows from preparing data, to building, training, and deploying ML models. You can create a Kubeflow Pipeline built entirely using these components, or integrate individual components into your workflow as needed. The components are available in one or two versions. Each version of a component leverages a different backend. For more information on those versions, see [SageMaker AI Components for Kubeflow Pipelines versions](#sagemaker-components-versions).

There is no additional charge for using SageMaker AI Components for Kubeflow Pipelines. You incur charges for any SageMaker AI resources you use through these components.

## SageMaker AI Components for Kubeflow Pipelines versions
SageMaker AI Components versions

SageMaker AI Components for Kubeflow Pipelines come in two versions. Each version leverages a different backend to create and manage resources on SageMaker AI.
+ Version 1 (v1.x and below) of the SageMaker AI Components for Kubeflow Pipelines uses **[Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html)** (AWS SDK for Python) as the backend. 
+ Version 2 (v2.0.0-alpha2 and above) of the SageMaker AI Components for Kubeflow Pipelines uses the [SageMaker AI Operator for Kubernetes (ACK)](https://github.com/aws-controllers-k8s/sagemaker-controller) as the backend. 

  AWS introduced [ACK](https://aws-controllers-k8s.github.io/community/) to facilitate a Kubernetes-native way of managing AWS Cloud resources. ACK includes a set of AWS service-specific controllers, one of which is the SageMaker AI controller. The SageMaker AI controller makes it easier for machine learning developers and data scientists using Kubernetes as their control plane to train, tune, and deploy machine learning (ML) models in SageMaker AI. For more information, see [SageMaker AI Operators for Kubernetes](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/). 

Both versions of the SageMaker AI Components for Kubeflow Pipelines are supported. However, version 2 provides some additional advantages. In particular, it offers: 

1. A consistent experience to manage your SageMaker AI resources from any application, whether you are using Kubeflow Pipelines, the Kubernetes CLI (`kubectl`), or other Kubeflow applications such as Notebooks. 

1. The flexibility to manage and monitor your SageMaker AI resources outside of the Kubeflow pipeline workflow. 

1. Zero setup time to use the SageMaker AI components if you deployed the full [Kubeflow on AWS](https://awslabs.github.io/kubeflow-manifests/docs/about/) release, since the SageMaker AI Operator is part of its deployment. 

## List of SageMaker AI Components for Kubeflow Pipelines
List of SageMaker AI Components

The following is a list of all SageMaker AI Components for Kubeflow Pipelines and their available versions. Alternatively, you can find all [SageMaker AI Components for Kubeflow Pipelines in GitHub](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker#versioning).

**Note**  
We encourage you to use version 2 of a SageMaker AI component wherever it is available.

### Ground Truth components

+ **Ground Truth**

  The Ground Truth component enables you to submit SageMaker AI Ground Truth labeling jobs directly from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)
+ **Workteam**

  The Workteam component enables you to create SageMaker AI private workteam jobs directly from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)

### Data processing components

+ **Processing**

  The Processing component enables you to submit processing jobs to SageMaker AI directly from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)

### Training components

+ **Training**

  The Training component allows you to submit SageMaker Training jobs directly from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)
+ **Hyperparameter Optimization**

  The Hyperparameter Optimization component enables you to submit hyperparameter tuning jobs to SageMaker AI directly from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)

### Inference components

+ **Hosting Deploy**

  The Hosting components allow you to deploy a model using SageMaker AI hosting services from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)
+ **Batch Transform**

  The Batch Transform component allows you to run inference jobs for an entire dataset in SageMaker AI from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)
+ **Model Monitor**

  The Model Monitor components allow you to monitor the quality of SageMaker AI machine learning models in production from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)

## IAM permissions


Deploying Kubeflow Pipelines with SageMaker AI components requires the following three layers of authentication: 
+ An IAM role granting your gateway node (which can be your local machine or a remote instance) access to the Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

  The user accessing the gateway node assumes this role to:
  + Create an Amazon EKS cluster and install KFP
  + Create IAM roles
  + Create Amazon S3 buckets for your sample input data

  The role requires the following permissions:
  + CloudWatchLogsFullAccess 
  + [AWSCloudFormationFullAccess](https://console.aws.amazon.com/iam/home?region=us-east-1#/policies/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAWSCloudFormationFullAccess) 
  + IAMFullAccess
  + AmazonS3FullAccess
  + AmazonEC2FullAccess
  + AmazonEKSAdminPolicy (Create this policy using the schema from [Amazon EKS Identity-Based Policy Examples](https://docs.aws.amazon.com/eks/latest/userguide/security_iam_id-based-policy-examples.html)) 
+ A Kubernetes IAM execution role assumed by Kubernetes pipeline pods (**kfp-example-pod-role**) or the SageMaker AI Operator for Kubernetes controller pod to access SageMaker AI. This role is used to create and monitor SageMaker AI jobs from Kubernetes.

  The role requires the following permission:
  + AmazonSageMakerFullAccess 

  You can limit permissions to the KFP and controller pods by creating and attaching your own custom policy.
+ A SageMaker AI IAM execution role assumed by SageMaker AI jobs to access AWS resources such as Amazon S3 or Amazon ECR (**kfp-example-sagemaker-execution-role**).

  SageMaker AI jobs use this role to:
  + Access SageMaker AI resources
  + Read input data from Amazon S3
  + Store your output model in Amazon S3

  The role requires the following permissions:
  + AmazonSageMakerFullAccess 
  + AmazonS3FullAccess 

## Converting pipelines to use SageMaker AI


You can convert an existing pipeline to use SageMaker AI by porting your generic Python [processing containers](https://docs.aws.amazon.com/sagemaker/latest/dg/amazon-sagemaker-containers.html) and [training containers](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html). If you are using SageMaker AI for inference, you also need to attach IAM permissions to your cluster and convert an artifact to a model.

# Install Kubeflow Pipelines


[Kubeflow Pipelines (KFP)](https://www.kubeflow.org/docs/components/pipelines/v2/introduction/) is the pipeline orchestration component of Kubeflow.

You can deploy Kubeflow Pipelines (KFP) on an existing Amazon Elastic Kubernetes Service (Amazon EKS) cluster or create a new one. Use a gateway node to interact with your cluster. The gateway node can be your local machine or an Amazon EC2 instance.

The following section guides you through the steps to set up and configure these resources.

**Topics**
+ [

## Choose an installation option
](#choose-install-option)
+ [

## Configure your pipeline permissions to access SageMaker AI
](#configure-permissions-for-pipeline)
+ [

## Access the KFP UI (Kubeflow Dashboard)
](#access-the-kfp-ui)

## Choose an installation option


Kubeflow Pipelines is available as a core component of the full distribution of Kubeflow on AWS or as a standalone installation.

Select the option that applies to your use case:

1. [Full Kubeflow on AWS Deployment](#full-kubeflow-deployment)

   To use other Kubeflow components in addition to Kubeflow Pipelines, choose the full [AWS distribution of Kubeflow](https://awslabs.github.io/kubeflow-manifests) deployment. 

1. [Standalone Kubeflow Pipelines Deployment](#kubeflow-pipelines-standalone)

   To use Kubeflow Pipelines without the other components of Kubeflow, install the Kubeflow Pipelines standalone deployment. 

### Full Kubeflow on AWS Deployment


To install the full release of Kubeflow on AWS, choose the vanilla deployment option from [Kubeflow on AWS deployment guide](https://awslabs.github.io/kubeflow-manifests/docs/deployment/) or any other deployment option supporting integrations with various AWS services (Amazon S3, Amazon RDS, Amazon Cognito).

### Standalone Kubeflow Pipelines Deployment


This section assumes that your user has permissions to create roles and define policies for the role.

#### Set up a gateway node


You can use your local machine or an Amazon EC2 instance as your gateway node. A gateway node is used to create an Amazon EKS cluster and access the Kubeflow Pipelines UI. 

Complete the following steps to set up your node. 

1. 

**Create a gateway node.**

   You can use an existing Amazon EC2 instance or create a new instance with the latest Ubuntu 18.04 DLAMI version using the steps in [Launching and Configuring a DLAMI](https://docs.aws.amazon.com/dlami/latest/devguide/launch-config.html).

1. 

**Create an IAM role to grant your gateway node access to AWS resources.**

   Create an IAM role with permissions to the following resources: CloudWatch, CloudFormation, IAM, Amazon EC2, Amazon S3, Amazon EKS.

   Attach the following policies to the IAM role:
   + CloudWatchLogsFullAccess 
   + [AWSCloudFormationFullAccess](https://console.aws.amazon.com/iam/home?region=us-east-1#/policies/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAWSCloudFormationFullAccess)
   + IAMFullAccess 
   + AmazonS3FullAccess 
   + AmazonEC2FullAccess 
   + AmazonEKSAdminPolicy (Create this policy using the schema from [Amazon EKS Identity-Based Policy Examples](https://docs.aws.amazon.com/eks/latest/userguide/security_iam_id-based-policy-examples.html)) 

   For information on adding IAM permissions to an IAM role, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).

1. 

**Install the following tools and clients**

   Install and configure the following tools and resources on your gateway node to access the Amazon EKS cluster and KFP User Interface (UI). 
   + [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html): The command line tool for working with AWS services. For AWS CLI configuration information, see [Configuring the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html). 
   + [aws-iam-authenticator](https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html) version 0.1.31 and above: A tool to use AWS IAM credentials to authenticate to a Kubernetes cluster.
   + [eksctl](https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html) version 0.15 or above: The command line tool for working with Amazon EKS clusters.
   + [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl): The command line tool for working with Kubernetes clusters. The version needs to match your Kubernetes version within one minor version.
   + [AWS SDK for Python (Boto3)](https://aws.amazon.com/sdk-for-python/).

     ```
     pip install boto3
     ```

#### Set up an Amazon EKS cluster


1. If you do not have an existing Amazon EKS cluster, complete the following steps from the command line of your gateway node. Otherwise, skip this step.

   1. Run the following command to create an Amazon EKS cluster with version 1.17 or above. Replace `<clustername>` with any name for your cluster. 

      ```
      eksctl create cluster --name <clustername> --region us-east-1 --auto-kubeconfig --timeout=50m --managed --nodes=1
      ```

   1. When the cluster creation is complete, ensure that you have access to your cluster by listing the cluster's nodes. 

      ```
      kubectl get nodes
      ```

1. Ensure that the current `kubectl` context points to your cluster with the following command. The current context is marked with an asterisk (`*`) in the output.

   ```
   kubectl config get-contexts
   
   CURRENT NAME     CLUSTER
   *   <username>@<clustername>.us-east-1.eksctl.io   <clustername>.us-east-1.eksctl.io
   ```

1. If the desired cluster is not configured as your current default, update the default with the following command. 

   ```
   aws eks update-kubeconfig --name <clustername> --region us-east-1
   ```

#### Install Kubeflow Pipelines


Run the following steps from the terminal of your gateway node to install Kubeflow Pipelines on your cluster.

1. Install all [cert-manager components](https://cert-manager.io/docs/installation/kubectl/).

   ```
   kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.yaml
   ```

1. Install Kubeflow Pipelines.

   ```
   export PIPELINE_VERSION=2.0.0-alpha.5
   kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/cert-manager/cluster-scoped-resources?ref=$PIPELINE_VERSION"
   kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
   kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/cert-manager/dev?ref=$PIPELINE_VERSION"
   ```

1. Ensure that the Kubeflow Pipelines service and other related resources are running.

   ```
   kubectl -n kubeflow get all | grep pipeline
   ```

   Your output should look like the following.

   ```
   pod/ml-pipeline-6b88c67994-kdtjv                      1/1     Running            0          2d
   pod/ml-pipeline-persistenceagent-64d74dfdbf-66stk     1/1     Running            0          2d
   pod/ml-pipeline-scheduledworkflow-65bdf46db7-5x9qj    1/1     Running            0          2d
   pod/ml-pipeline-ui-66cc4cffb6-cmsdb                   1/1     Running            0          2d
   pod/ml-pipeline-viewer-crd-6db65ccc4-wqlzj            1/1     Running            0          2d
   pod/ml-pipeline-visualizationserver-9c47576f4-bqmx4   1/1     Running            0          2d
   service/ml-pipeline                       ClusterIP   10.100.170.170   <none>        8888/TCP,8887/TCP   2d
   service/ml-pipeline-ui                    ClusterIP   10.100.38.71     <none>        80/TCP              2d
   service/ml-pipeline-visualizationserver   ClusterIP   10.100.61.47     <none>        8888/TCP            2d
   deployment.apps/ml-pipeline                       1/1     1            1           2d
   deployment.apps/ml-pipeline-persistenceagent      1/1     1            1           2d
   deployment.apps/ml-pipeline-scheduledworkflow     1/1     1            1           2d
   deployment.apps/ml-pipeline-ui                    1/1     1            1           2d
   deployment.apps/ml-pipeline-viewer-crd            1/1     1            1           2d
   deployment.apps/ml-pipeline-visualizationserver   1/1     1            1           2d
   replicaset.apps/ml-pipeline-6b88c67994                      1         1         1       2d
   replicaset.apps/ml-pipeline-persistenceagent-64d74dfdbf     1         1         1       2d
   replicaset.apps/ml-pipeline-scheduledworkflow-65bdf46db7    1         1         1       2d
   replicaset.apps/ml-pipeline-ui-66cc4cffb6                   1         1         1       2d
   replicaset.apps/ml-pipeline-viewer-crd-6db65ccc4            1         1         1       2d
   replicaset.apps/ml-pipeline-visualizationserver-9c47576f4   1         1         1       2d
   ```

## Configure your pipeline permissions to access SageMaker AI


In this section, you create an IAM execution role granting Kubeflow Pipeline pods access to SageMaker AI services. 

### Configuration for SageMaker AI components version 2


To run SageMaker AI Components version 2 for Kubeflow Pipelines, you need to install [SageMaker AI Operator for Kubernetes](https://github.com/aws-controllers-k8s/sagemaker-controller) and configure Role-Based Access Control (RBAC) allowing the Kubeflow Pipelines pods to create SageMaker AI custom resources in your Kubernetes cluster.

**Important**  
Follow this section if you are using the Kubeflow Pipelines standalone deployment. If you are using the AWS distribution of Kubeflow version 1.6.0-aws-b1.0.0 or above, SageMaker AI components version 2 are already set up.

1. Install SageMaker AI Operator for Kubernetes to use SageMaker AI components version 2.

   Follow the *Setup* section of [Machine Learning with ACK SageMaker AI Controller tutorial](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/#setup).

1. Configure RBAC permissions for the execution role (service account) used by Kubeflow Pipelines pods. In the Kubeflow Pipelines standalone deployment, pipeline runs are executed in the `kubeflow` namespace using the `pipeline-runner` service account.

   1. Create a [RoleBinding](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#rolebinding-example) that gives the service account permission to manage SageMaker AI custom resources.

      ```
      cat > manage_sagemaker_cr.yaml <<EOF
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: manage-sagemaker-cr
        namespace: kubeflow
      subjects:
      - kind: ServiceAccount
        name: pipeline-runner
        namespace: kubeflow
      roleRef:
        kind: ClusterRole
        name: ack-sagemaker-controller
        apiGroup: rbac.authorization.k8s.io
      EOF
      ```

      ```
      kubectl apply -f manage_sagemaker_cr.yaml
      ```

   1. Ensure that the rolebinding was created by running:

      ```
      kubectl get rolebinding manage-sagemaker-cr -n kubeflow -o yaml
      ```
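The RoleBinding above can also be generated programmatically. A sketch (Python standard library only) that builds the same manifest as a dictionary and emits JSON, which `kubectl apply -f` accepts just like YAML:

```python
import json

# Mirror of the RoleBinding manifest in this section; names match the YAML above.
role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "manage-sagemaker-cr", "namespace": "kubeflow"},
    "subjects": [
        {"kind": "ServiceAccount", "name": "pipeline-runner", "namespace": "kubeflow"}
    ],
    "roleRef": {
        "kind": "ClusterRole",
        "name": "ack-sagemaker-controller",
        "apiGroup": "rbac.authorization.k8s.io",
    },
}

manifest = json.dumps(role_binding, indent=2)
print(manifest)  # pipe to: kubectl apply -f -
```

Generating manifests this way can help when the namespace or service account name varies across environments.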

### Configuration for SageMaker AI components version 1


To run SageMaker AI Components version 1 for Kubeflow Pipelines, the Kubeflow Pipeline pods need access to SageMaker AI.

**Important**  
Follow this section whether you are using the full Kubeflow on AWS deployment or the Kubeflow Pipelines standalone deployment.

To create an IAM execution role granting Kubeflow Pipeline pods access to SageMaker AI, follow these steps:

1. Export your cluster name (e.g., *my-cluster-name*) and cluster region (e.g., *us-east-1*).

   ```
   export CLUSTER_NAME=my-cluster-name
   export CLUSTER_REGION=us-east-1
   ```

1. Export the namespace and service account name according to your installation.
   + For the full Kubeflow on AWS installation, export your profile `namespace` (e.g., *kubeflow-user-example-com*) and *default-editor* as the service account.

     ```
     export NAMESPACE=kubeflow-user-example-com
     export KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT=default-editor
     ```
   + For the standalone Pipelines deployment, export *kubeflow* as the `namespace` and *pipeline-runner* as the service account.

     ```
     export NAMESPACE=kubeflow
     export KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT=pipeline-runner
     ```

1. Create an [ IAM OIDC provider for the Amazon EKS cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html) with the following command.

   ```
   eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} \
               --region ${CLUSTER_REGION} --approve
   ```

1. Create an IAM execution role for the KFP pods to access AWS services (SageMaker AI, CloudWatch).

   ```
   eksctl create iamserviceaccount \
   --name ${KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT} \
   --namespace ${NAMESPACE} --cluster ${CLUSTER_NAME} \
   --region ${CLUSTER_REGION} \
   --attach-policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess \
   --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchLogsFullAccess \
   --override-existing-serviceaccounts \
   --approve
   ```
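Behind the scenes, `eksctl create iamserviceaccount` creates an IAM role whose trust policy federates your cluster's OIDC provider to the Kubernetes service account, which is what lets the KFP pods assume the role. A simplified sketch of that trust policy document (the account ID and OIDC provider ID are placeholders):

```python
import json

# Placeholders; eksctl fills these in from your account and cluster OIDC provider.
oidc_arn = "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
oidc_url = "oidc.eks.us-east-1.amazonaws.com/id/EXAMPLE"
namespace = "kubeflow"
service_account = "pipeline-runner"

# IRSA trust policy: only the named service account may assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Federated": oidc_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{oidc_url}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                }
            },
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

The `sub` condition is what scopes the role to a single namespace and service account rather than the whole cluster.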

Once your pipeline permissions are configured to access SageMaker AI Components version 1, follow the SageMaker AI components for Kubeflow Pipelines guide in the [Kubeflow on AWS documentation](https://awslabs.github.io/kubeflow-manifests/docs/amazon-sagemaker-integration/sagemaker-components-for-kubeflow-pipelines/).

## Access the KFP UI (Kubeflow Dashboard)


The Kubeflow Pipelines UI is used for managing and tracking experiments, jobs, and runs on your cluster. For instructions on how to access the Kubeflow Pipelines UI from your gateway node, follow the steps that apply to your deployment option in this section.

### Full Kubeflow on AWS Deployment


Follow the instructions on the [Kubeflow on AWS website](https://awslabs.github.io/kubeflow-manifests/docs/deployment/connect-kubeflow-dashboard/) to connect to the Kubeflow dashboard and navigate to the pipelines tab.

### Standalone Kubeflow Pipelines Deployment


Use port forwarding to access the Kubeflow Pipelines UI from your gateway node by following these steps.

#### Set up port forwarding to the KFP UI service


Run the following commands from the command line of your gateway node.

1. Verify that the KFP UI service is running using the following command.

   ```
   kubectl -n kubeflow get service ml-pipeline-ui
   
   NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
   ml-pipeline-ui   ClusterIP   10.100.38.71   <none>        80/TCP    2d22h
   ```

1. Run the following command to set up port forwarding to the KFP UI service. This forwards the KFP UI to port 8080 on your gateway node and allows you to access the KFP UI from your browser. 

   ```
   kubectl port-forward -n kubeflow service/ml-pipeline-ui 8080:80
   ```

   The port forward from your remote machine drops if there is no activity. Run this command again if your dashboard is unable to get logs or updates. If the commands return an error, ensure that there is no process already running on the port you are trying to use. 

#### Access the KFP UI service


Your method of accessing the KFP UI depends on your gateway node type.
+ Local machine as the gateway node:

  1. Access the dashboard in your browser as follows: 

     ```
     http://localhost:8080
     ```

  1. Choose **Pipelines** to access the pipelines UI. 
+ Amazon EC2 instance as the gateway node:

  1. You need to set up an SSH tunnel on your Amazon EC2 instance to access the Kubeflow dashboard from your local machine's browser. 

     From a new terminal session in your local machine, run the following. Replace `<public-DNS-of-gateway-node>` with the IP address of your instance found on the Amazon EC2 console. You can also use the public DNS. Replace `<path_to_key>` with the path to the pem key used to access the gateway node. 

     ```
     public_DNS_address=<public-DNS-of-gateway-node>
     key=<path_to_key>
     
      # On Ubuntu:
      ssh -i ${key} -L 9000:localhost:8080 ubuntu@${public_DNS_address}
      
      # On Amazon Linux:
      ssh -i ${key} -L 9000:localhost:8080 ec2-user@${public_DNS_address}
     ```

  1. Access the dashboard in your browser. 

     ```
     http://localhost:9000
     ```

  1. Choose **Pipelines** to access the KFP UI. 

#### (Optional) Grant SageMaker AI notebook instances access to Amazon EKS and run KFP pipelines from your notebook


A SageMaker notebook instance is a fully managed Amazon EC2 compute instance that runs the Jupyter Notebook App. You can use a notebook instance to create and manage Jupyter notebooks, then define, compile, deploy, and run your KFP pipelines using the AWS SDK for Python (Boto3) or the KFP CLI.

1. Follow the steps in [Create a SageMaker Notebook Instance](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html) to create your notebook instance, then attach the `AmazonS3FullAccess` policy to its IAM execution role.

1. From the command line of your gateway node, run the following command to retrieve the IAM role ARN of the notebook instance you created. Replace `<instance-name>` with the name of your instance.

   ```
   aws sagemaker describe-notebook-instance --notebook-instance-name <instance-name> --region <region> --output text --query 'RoleArn'
   ```

   This command outputs the IAM role ARN in the `arn:aws:iam::<account-id>:role/<role-name>` format. Take note of this ARN.

1. Run the following commands to attach three managed policies (`AmazonSageMakerFullAccess`, `AmazonEKSWorkerNodePolicy`, `AmazonS3FullAccess`) to this IAM role. Replace `<role-name>` with the role name from your ARN.

   ```
   aws iam attach-role-policy --role-name <role-name> --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
   aws iam attach-role-policy --role-name <role-name> --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
   aws iam attach-role-policy --role-name <role-name> --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
   ```

1. Amazon EKS clusters use IAM roles to control access to the cluster. The rules are implemented in a config map named `aws-auth`. `eksctl` provides commands to read and edit the `aws-auth` config map. Only the users that have access to the cluster can edit this config map.

   `system:masters` is one of the default user groups with super user permissions to the cluster. Add your user to this group or create a group with more restrictive permissions.

1. Bind the role to your cluster by running the following command. Replace `<IAM-Role-arn>` with the ARN of the IAM role. `<your_username>` can be any unique username.

   ```
   eksctl create iamidentitymapping \
   --cluster <cluster-name> \
   --arn <IAM-Role-arn> \
   --group system:masters \
   --username <your-username> \
   --region <region>
   ```

1. Open a Jupyter notebook on your SageMaker AI instance and run the following command to ensure that it has access to the cluster.

   ```
   aws eks --region <region> update-kubeconfig --name <cluster-name>
   kubectl -n kubeflow get all | grep pipeline
   ```

# Use SageMaker AI components


In this tutorial, you run a pipeline that uses SageMaker AI Components for Kubeflow Pipelines to train a classification model on the MNIST dataset with the K-Means algorithm in SageMaker AI. The workflow uses Kubeflow Pipelines as the orchestrator and SageMaker AI to run each step of the workflow. The example was taken from an existing [ SageMaker AI example](https://github.com/aws/amazon-sagemaker-examples/blob/8279abfcc78bad091608a4a7135e50a0bd0ec8bb/sagemaker-python-sdk/1P_kmeans_highlevel/kmeans_mnist.ipynb) and modified to work with SageMaker AI Components for Kubeflow Pipelines.

You can define your pipeline in Python using AWS SDK for Python (Boto3) then use the KFP dashboard, KFP CLI, or Boto3 to compile, deploy, and run your workflows. The full code for the MNIST classification pipeline example is available in the [Kubeflow Github repository](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/mnist-kmeans-sagemaker#mnist-classification-with-kmeans). To use it, clone the Python files to your gateway node.

You can find additional [ SageMaker AI Kubeflow Pipelines examples](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples) on GitHub. For information on the components used, see the [KubeFlow Pipelines GitHub repository](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker).

To run the classification pipeline example, create a SageMaker AI IAM execution role granting your training job the permission to access AWS resources, then continue with the steps that correspond to your deployment option.

## Create a SageMaker AI execution role


The `kfp-example-sagemaker-execution-role` IAM role is a runtime role assumed by SageMaker AI jobs to access AWS resources. In the following command, you create an IAM execution role named `kfp-example-sagemaker-execution-role`, attach two managed policies (AmazonSageMakerFullAccess, AmazonS3FullAccess), and create a trust relationship with SageMaker AI to grant SageMaker AI jobs access to those AWS resources.

You provide this role as an input parameter when running the pipeline.

Run the following command to create the role. Note the ARN that is returned in your output.

```
SAGEMAKER_EXECUTION_ROLE_NAME=kfp-example-sagemaker-execution-role

TRUST="{ \"Version\": \"2012-10-17\", \"Statement\": [ { \"Effect\": \"Allow\", \"Principal\": { \"Service\": \"sagemaker.amazonaws.com\" }, \"Action\": \"sts:AssumeRole\" } ] }"
aws iam create-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --assume-role-policy-document "$TRUST"
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

aws iam get-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --output text --query 'Role.Arn'
```
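If the escaped `TRUST` string proves error-prone, the same trust document can be generated with a short script and passed to the CLI as `--assume-role-policy-document file://trust.json`. A sketch:

```
import json

# Same trust relationship as the TRUST string above, built without shell escaping.
trust = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

with open("trust.json", "w") as f:
    json.dump(trust, f, indent=4)
```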

## Full Kubeflow on AWS Deployment


Follow the instructions of the [SageMaker Training Pipeline tutorial for MNIST Classification with K-Means](https://awslabs.github.io/kubeflow-manifests/docs/amazon-sagemaker-integration/sagemaker-components-for-kubeflow-pipelines/).

## Standalone Kubeflow Pipelines Deployment


### Prepare datasets


To run the pipelines, you need to upload the data extraction pre-processing script to an Amazon S3 bucket. This bucket and all resources for this example must be located in the `us-east-1` region. For information on creating a bucket, see [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).

From the `mnist-kmeans-sagemaker` folder of the Kubeflow repository you cloned on your gateway node, run the following command to upload the `kmeans_preprocessing.py` file to your Amazon S3 bucket. Change `<bucket-name>` to the name of your Amazon S3 bucket.

```
aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
```

### Compile and deploy your pipeline


After defining the pipeline, you must compile it to an intermediate representation before you submit it to the Kubeflow Pipelines service on your cluster. The intermediate representation is a workflow specification in the form of a YAML file compressed into a tar.gz file. You need the KFP SDK to compile your pipeline.
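A compiled package is just a gzipped tar holding the workflow YAML, so you can inspect one with the Python standard library alone. This sketch builds a stand-in package (normally `dsl-compile` produces it) and lists its contents:

```
import io
import tarfile

def package_members(path):
    # List the files inside a compiled KFP package (a gzipped tar).
    with tarfile.open(path, "r:gz") as tar:
        return tar.getnames()

# Build a stand-in package for demonstration purposes.
yaml_body = b"apiVersion: argoproj.io/v1alpha1\nkind: Workflow\n"
with tarfile.open("demo-pipeline.tar.gz", "w:gz") as tar:
    info = tarfile.TarInfo("pipeline.yaml")
    info.size = len(yaml_body)
    tar.addfile(info, io.BytesIO(yaml_body))

print(package_members("demo-pipeline.tar.gz"))
# ['pipeline.yaml']
```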

#### Install KFP SDK


Run the following from the command line of your gateway node:

1. Install the KFP SDK following the instructions in the [Kubeflow pipelines documentation](https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/).

1. Verify that the KFP SDK is installed with the following command:

   ```
   pip show kfp
   ```

1. Verify that `dsl-compile` has been installed correctly as follows:

   ```
   which dsl-compile
   ```

#### Compile your pipeline


You have three options to interact with Kubeflow Pipelines: KFP UI, KFP CLI, or the KFP SDK. The following sections illustrate the workflow using the KFP UI and CLI.

Complete the following steps from your gateway node.

1. Modify your Python file with your Amazon S3 bucket name and IAM role ARN.

1. Use the `dsl-compile` command from the command line to compile your pipeline as follows. Replace `<path-to-python-file>` with the path to your pipeline and `<path-to-output>` with the location where you want your tar.gz file to be.

   ```
   dsl-compile --py <path-to-python-file> --output <path-to-output>
   ```

#### Upload and run the pipeline using the KFP CLI


Complete the following steps from the command line of your gateway node. KFP organizes runs of your pipeline as experiments. You can optionally specify an experiment name; if you do not specify one, the run is listed under the **Default** experiment.

1. Upload your pipeline as follows:

   ```
   kfp pipeline upload --pipeline-name <pipeline-name> <path-to-output-tar.gz>
   ```

   Your output should look like the following. Take note of the pipeline `ID`.

   ```
   Pipeline 29c3ff21-49f5-4dfe-94f6-618c0e2420fe has been submitted
   
   Pipeline Details
   ------------------
   ID           29c3ff21-49f5-4dfe-94f6-618c0e2420fe
   Name         sm-pipeline
   Description
   Uploaded at  2020-04-30T20:22:39+00:00
   ...
   ...
   ```

1. Create a run using the following command. The KFP CLI run command currently does not support specifying input parameters while creating the run, so update your parameters in the AWS SDK for Python (Boto3) pipeline file before compiling. Replace `<experiment-name>` and `<job-name>` with any names. Replace `<pipeline-id>` with the ID of your submitted pipeline. Replace `<your-role-arn>` with the ARN of the `kfp-example-sagemaker-execution-role` role you created earlier. Replace `<your-bucket-name>` with the name of the Amazon S3 bucket you created.

   ```
   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --pipeline-id <pipeline-id> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
   ```

   You can also directly submit a run using the compiled pipeline package created as the output of the `dsl-compile` command.

   ```
   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --package-file <path-to-output> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
   ```

   Your output should look like the following:

   ```
   Creating experiment aws.
   Run 95084a2c-f18d-4b77-a9da-eba00bf01e63 is submitted
   +--------------------------------------+--------+----------+---------------------------+
   | run id                               | name   | status   | created at                |
   +======================================+========+==========+===========================+
   | 95084a2c-f18d-4b77-a9da-eba00bf01e63 | sm-job |          | 2020-04-30T20:36:41+00:00 |
   +--------------------------------------+--------+----------+---------------------------+
   ```

1. Navigate to the UI to check the progress of the job.
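If you script the upload step, the pipeline ID is the UUID in the `kfp pipeline upload` output and can be pulled out of the captured text. A small sketch reusing the sample output above:

```
import re

UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)

def pipeline_id(upload_output):
    # Return the first UUID found in the captured CLI output.
    match = UUID_RE.search(upload_output)
    if match is None:
        raise ValueError("no pipeline ID found in output")
    return match.group(0)

sample = "Pipeline 29c3ff21-49f5-4dfe-94f6-618c0e2420fe has been submitted"
print(pipeline_id(sample))
# 29c3ff21-49f5-4dfe-94f6-618c0e2420fe
```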

#### Upload and run the pipeline using the KFP UI


1. On the left panel, choose the **Pipelines** tab. 

1. In the upper-right corner, choose **+ Upload pipeline**.

1. Enter the pipeline name and description. 

1. Choose **Upload a file** and enter the path to the tar.gz file you created using the CLI or with AWS SDK for Python (Boto3).

1. On the left panel, choose the **Pipelines** tab.

1. Find the pipeline you created.

1. Choose **+ Create run**.

1. Enter your input parameters.

1. Choose **Run**.

### Run predictions


Once your classification pipeline is deployed, you can run classification predictions against the endpoint that was created by the Deploy component. Use the KFP UI to check the output artifacts for `sagemaker-deploy-model-endpoint_name`. Download the .tgz file to extract the endpoint name or check the SageMaker AI console in the region you used.
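After downloading the artifact, the endpoint name can be read out programmatically. A standard-library sketch that assumes the .tgz holds a single small text file (the sample endpoint name below is hypothetical):

```
import io
import tarfile

def first_member_text(tgz_path):
    # Read the text of the first regular file in a .tgz artifact.
    with tarfile.open(tgz_path, "r:gz") as tar:
        member = next(m for m in tar.getmembers() if m.isfile())
        return tar.extractfile(member).read().decode().strip()

# Build a stand-in artifact for demonstration purposes.
body = b"Endpoint-20200430202236-ABCD\n"
with tarfile.open("demo-artifact.tgz", "w:gz") as tar:
    info = tarfile.TarInfo("endpoint_name")
    info.size = len(body)
    tar.addfile(info, io.BytesIO(body))

print(first_member_text("demo-artifact.tgz"))
# Endpoint-20200430202236-ABCD
```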

#### Configure permissions to run predictions


If you want to run predictions from your gateway node, skip this section.

**To use any other machine to run predictions, assign the `sagemaker:InvokeEndpoint` permission to the IAM role used by the client machine.**

1. On your gateway node, run the following to create an IAM policy file:

   ```
   cat <<EoF > ./sagemaker-invoke.json
   {
       "Version": "2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "sagemaker:InvokeEndpoint"
               ],
               "Resource": "*"
           }
       ]
   }
   EoF
   ```

1. Attach the policy to the IAM role of the client node.

   Run the following command. Replace `<your-instance-IAM-role>` with the name of the IAM role. Replace `<path-to-sagemaker-invoke-json>` with the path to the policy file you created.

   ```
   aws iam put-role-policy --role-name <your-instance-IAM-role> --policy-name sagemaker-invoke-for-worker --policy-document file://<path-to-sagemaker-invoke-json>
   ```

#### Run predictions


1. Create an AWS SDK for Python (Boto3) file on your client machine named `mnist-predictions.py` with the following content. Replace the `ENDPOINT_NAME` variable with your endpoint name. The script loads the MNIST dataset, creates a CSV from one of those digits, then sends the CSV to the endpoint for prediction and prints the results.

   ```
   import boto3
   import gzip
   import io
   import json
   import numpy
   import pickle
   
   ENDPOINT_NAME='<endpoint-name>'
   region = boto3.Session().region_name
   
   # S3 bucket where the original mnist data is downloaded and stored
   downloaded_data_bucket = f"jumpstart-cache-prod-{region}"
   downloaded_data_prefix = "1p-notebooks-datasets/mnist"
   
   # Download the dataset
   s3 = boto3.client("s3")
   s3.download_file(downloaded_data_bucket, f"{downloaded_data_prefix}/mnist.pkl.gz", "mnist.pkl.gz")
   
   # Load the dataset
   with gzip.open('mnist.pkl.gz', 'rb') as f:
       train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
   
   # Simple function to create a csv from our numpy array
   def np2csv(arr):
       csv = io.BytesIO()
       numpy.savetxt(csv, arr, delimiter=',', fmt='%g')
       return csv.getvalue().decode().rstrip()
   
   runtime = boto3.Session(region_name=region).client('sagemaker-runtime')
   
   payload = np2csv(train_set[0][30:31])
   
   response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                      ContentType='text/csv',
                                      Body=payload)
   result = json.loads(response['Body'].read().decode())
   print(result)
   ```

1. Run the AWS SDK for Python (Boto3) file as follows:

   ```
   python mnist-predictions.py
   ```

### View results and logs


When the pipeline is running, you can choose any component to check execution details, such as its inputs and outputs, including the names of any resources that were created.

If the KFP request is successfully processed and a SageMaker AI job is created, the component logs in the KFP UI provide a link to the job created in SageMaker AI. The CloudWatch logs are also provided if the job is successfully created.

If you run too many pipeline jobs on the same cluster, you may see an error message that indicates that you do not have enough pods available. To fix this, log in to your gateway node and delete the pods created by the pipelines you are not using:

```
kubectl get pods -n kubeflow
kubectl delete pods -n kubeflow <name-of-pipeline-pod>
```

### Cleanup


When you're finished with your pipeline, you need to clean up your resources.

1. From the KFP dashboard, if your pipeline runs do not exit properly, terminate them by choosing **Terminate**.

1. If the **Terminate** option doesn't work, log in to your gateway node and manually terminate all the pods created by your pipeline run as follows: 

   ```
   kubectl get pods -n kubeflow
   kubectl delete pods -n kubeflow <name-of-pipeline-pod>
   ```

1. Using your AWS account, log in to the SageMaker AI service. Manually stop all training, batch transform, and HPO jobs. Delete models, data buckets, and endpoints to avoid incurring any additional costs. Terminating the pipeline runs does not stop the jobs in SageMaker AI.

# SageMaker Notebook Jobs
Notebook Jobs

You can use Amazon SageMaker AI to interactively build, train, and deploy machine learning models from your Jupyter notebook in any JupyterLab environment. However, there are various scenarios in which you might want to run your notebook as a noninteractive, scheduled job. For example, you might want to create regular audit reports that analyze all training jobs run over a certain time frame and analyze the business value of deploying those models into production. Or you might want to scale up a feature engineering job after testing the data transformation logic on a small subset of data. Other common use cases include:
+ Scheduling jobs for model drift monitoring
+ Exploring the parameter space for better models

In these scenarios, you can use SageMaker Notebook Jobs to create a noninteractive job (which SageMaker AI runs as an underlying training job) to either run on demand or on a schedule. SageMaker Notebook Jobs provides an intuitive user interface so you can schedule your jobs right from JupyterLab by choosing the Notebook Jobs widget (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in your notebook. You can also schedule your jobs using the SageMaker AI Python SDK, which offers the flexibility of scheduling multiple notebook jobs in a pipeline workflow. You can run multiple notebooks in parallel, and parameterize cells in your notebooks to customize the input parameters.

This feature uses Amazon EventBridge, SageMaker Training, and SageMaker Pipelines, and is available for use in your Jupyter notebook in any of the following environments:
+ Studio, Studio Lab, Studio Classic, or Notebook Instances
+ Local setup, such as your local machine, where you run JupyterLab

**Prerequisites**

To schedule a notebook job, make sure you meet the following criteria:
+ Ensure your Jupyter notebook and any initialization or startup scripts are self-contained with respect to code and software packages. Otherwise, your noninteractive job may incur errors.
+ Review [Constraints and considerations](notebook-auto-run-constraints.md) to make sure you properly configured your Jupyter notebook, network settings, and container settings.
+ Ensure your notebook can access needed external resources, such as Amazon EMR clusters.
+ If you are setting up Notebook Jobs in a local Jupyter notebook, complete the installation. For instructions, see [Installation guide](scheduled-notebook-installation.md). 
+ If you connect to an Amazon EMR cluster in your notebook and want to parameterize your Amazon EMR connection command, you must apply a workaround using environment variables to pass parameters. For details, see [Connect to an Amazon EMR cluster from your notebook](scheduled-notebook-connect-emr.md).
+ If you connect to an Amazon EMR cluster using Kerberos, LDAP, or HTTP Basic Auth authentication, you must use the AWS Secrets Manager to pass your security credentials to your Amazon EMR connection command. For details, see [Connect to an Amazon EMR cluster from your notebook](scheduled-notebook-connect-emr.md).
+ (optional) If you want the UI to preload a script to run upon notebook startup, your admin must install it with a Lifecycle Configuration (LCC). For information about how to use an LCC script, see [Customize a Notebook Instance Using a Lifecycle Configuration Script](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-lifecycle-config.html).

# Installation guide


The following provides information about what you need to install to use Notebook Jobs in your JupyterLab environment.

**For Amazon SageMaker Studio and Amazon SageMaker Studio Lab**

If your notebook is in Amazon SageMaker Studio or Amazon SageMaker Studio Lab, you don’t need to perform additional installation—SageMaker Notebook Jobs is built into the platform. To set up required permissions for Studio, see [Set up policies and permissions for Studio](scheduled-notebook-policies-studio.md).

**For local Jupyter notebooks**

If you want to use SageMaker Notebook Jobs for your local JupyterLab environment, you need to perform additional installation.

To install SageMaker Notebook Jobs, complete the following steps:

1. Install Python 3. For details, see [Installing Python 3 and Python Packages](https://www.codecademy.com/article/install-python3).

1. Install JupyterLab version 4 or higher. For details, see the [JupyterLab installation documentation](https://jupyterlab.readthedocs.io/en/stable/getting_started/installation.html).

1. Install the AWS CLI. For details, see [Installing or updating the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

1. Set up two sets of permissions. The IAM user needs permissions to submit jobs to SageMaker AI, and once submitted, the notebook job itself assumes an IAM role that needs permissions to access resources, depending on the job tasks.

   1. If you haven’t yet created an IAM user, see [Creating an IAM user in your AWS account](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html).

   1. If you haven’t yet created your notebook job role, see [Creating a role to delegate permissions to an IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user.html).

   1. Attach the necessary permissions and trust policy to attach to your user and role. For step-by-step instructions and permission details, see [Install policies and permissions for local Jupyter environments](scheduled-notebook-policies-other.md).

1. Generate AWS credentials for your newly created IAM user and save them in the credentials file (`~/.aws/credentials`) of your JupyterLab environment. You can do this with the CLI command `aws configure`. For instructions, see the section *Set and view configuration settings using commands* in [Configuration and credential file settings](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).

1. (optional) By default, the scheduler extension uses a pre-built SageMaker AI Docker image. Any non-default kernel used in the notebook must be installed in the container. If you want to run your notebook in a custom container or Docker image, you need to create an Amazon Elastic Container Registry (Amazon ECR) image. For information about how to push a Docker image to Amazon ECR, see [Pushing a Docker Image](https://docs.aws.amazon.com/AmazonECR/latest/userguide/docker-push-ecr-image.html).

1. Add the JupyterLab extension for SageMaker Notebook Jobs. You can add it to your JupyterLab environment with the command `pip install amazon_sagemaker_jupyter_scheduler`. You may need to restart your Jupyter server with the command `sudo systemctl restart jupyter-server`.

1. Start JupyterLab with the command: `jupyter lab`.

1. Verify that the Notebook Jobs widget (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) appears in your Jupyter notebook taskbar.
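For reference, `aws configure` writes your access keys to the credentials file in standard INI layout; the values below are AWS's documented placeholder keys, not real credentials:

```
[default]
aws_access_key_id = AKIAIOSFODNN7EXAMPLE
aws_secret_access_key = wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
```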

# Set up policies and permissions for Studio


You need to set up the proper policies and permissions before you schedule your first notebook run. The following provides instructions on setting up these permissions:
+ Job execution role trust relationships
+ Additional IAM permissions attached to the job execution role
+ (optional) The AWS KMS permission policy to use a custom KMS key

**Important**  
If your AWS account belongs to an organization with service control policies (SCP) in place, your effective permissions are the logical intersection between what is allowed by the SCPs and what is allowed by your IAM role and user policies. For example, if your organization’s SCP specifies that you can only access resources in `us-east-1` and `us-west-1`, and your policies only allow you to access resources in `us-west-1` and `us-west-2`, then ultimately you can only access resources in `us-west-1`. If you want to exercise all the permissions allowed in your role and user policies, your organization’s SCPs should grant the same set of permissions as your own IAM user and role policies. For details about how to determine your allowed requests, see [Determining whether a request is allowed or denied within an account](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic.html#policy-eval-denyallow).

**Trust relationships**

To modify the trust relationships, complete the following steps:

1. Open the [IAM console](https://console.aws.amazon.com/iam/).

1. Select **Roles** in the left panel.

1. Find the job execution role for your notebook job and choose the role name. 

1. Choose the **Trust relationships** tab.

1. Choose **Edit trust policy**.

1. Copy and paste the following policy:

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "sagemaker.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           },
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": "events.amazonaws.com"
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

------

1. Choose **Update Policy**.

## Additional IAM permissions


You might need to include additional IAM permissions in the following situations:
+ Your Studio execution and notebook job roles differ
+ You need to access Amazon S3 resources through an S3 VPC endpoint
+ You want to use a custom KMS key to encrypt your input and output Amazon S3 buckets

The following discussion provides the policies you need for each case.

### Permissions needed if your Studio execution and notebook job roles differ


The following JSON snippet is an example policy that you should add to the Studio execution and notebook job roles if you don’t use the Studio execution role as the notebook job role. Review and modify this policy if you need to further restrict privileges.

------
#### [ JSON ]

****  

```
{
   "Version":"2012-10-17",		 	 	 
   "Statement":[
      {
         "Effect":"Allow",
         "Action":"iam:PassRole",
         "Resource":"arn:aws:iam::*:role/*",
         "Condition":{
            "StringLike":{
               "iam:PassedToService":[
                  "sagemaker.amazonaws.com",
                  "events.amazonaws.com"
               ]
            }
         }
      },
      {
         "Effect":"Allow",
         "Action":[
            "events:TagResource",
            "events:DeleteRule",
            "events:PutTargets",
            "events:DescribeRule",
            "events:PutRule",
            "events:RemoveTargets",
            "events:DisableRule",
            "events:EnableRule"
         ],
         "Resource":"*",
         "Condition":{
            "StringEquals":{
               "aws:ResourceTag/sagemaker:is-scheduling-notebook-job":"true"
            }
         }
      },
      {
         "Effect":"Allow",
         "Action":[
            "s3:CreateBucket",
            "s3:PutBucketVersioning",
            "s3:PutEncryptionConfiguration"
         ],
         "Resource":"arn:aws:s3:::sagemaker-automated-execution-*"
      },
      {
            "Sid": "S3DriverAccess",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:GetObject",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::sagemakerheadlessexecution-*"
            ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "sagemaker:ListTags"
         ],
         "Resource":[
            "arn:aws:sagemaker:*:*:user-profile/*",
            "arn:aws:sagemaker:*:*:space/*",
            "arn:aws:sagemaker:*:*:training-job/*",
            "arn:aws:sagemaker:*:*:pipeline/*"
         ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "sagemaker:AddTags"
         ],
         "Resource":[
            "arn:aws:sagemaker:*:*:training-job/*",
            "arn:aws:sagemaker:*:*:pipeline/*"
         ]
      },
      {
         "Effect":"Allow",
         "Action":[
            "ec2:DescribeDhcpOptions",
            "ec2:DescribeNetworkInterfaces",
            "ec2:DescribeRouteTables",
            "ec2:DescribeSecurityGroups",
            "ec2:DescribeSubnets",
            "ec2:DescribeVpcEndpoints",
            "ec2:DescribeVpcs",
            "ecr:BatchCheckLayerAvailability",
            "ecr:BatchGetImage",
            "ecr:GetDownloadUrlForLayer",
            "ecr:GetAuthorizationToken",
            "s3:ListBucket",
            "s3:GetBucketLocation",
            "s3:GetEncryptionConfiguration",
            "s3:PutObject",
            "s3:DeleteObject",
            "s3:GetObject",
            "sagemaker:DescribeApp",
            "sagemaker:DescribeDomain",
            "sagemaker:DescribeUserProfile",
            "sagemaker:DescribeSpace",
            "sagemaker:DescribeStudioLifecycleConfig",
            "sagemaker:DescribeImageVersion",
            "sagemaker:DescribeAppImageConfig",
            "sagemaker:CreateTrainingJob",
            "sagemaker:DescribeTrainingJob",
            "sagemaker:StopTrainingJob",
            "sagemaker:Search",
            "sagemaker:CreatePipeline",
            "sagemaker:DescribePipeline",
            "sagemaker:DeletePipeline",
            "sagemaker:StartPipelineExecution"
         ],
         "Resource":"*"
      }
   ]
}
```

------

### Permissions needed to access Amazon S3 resources through an S3 VPC endpoint


If you run SageMaker Studio in private VPC mode and access Amazon S3 through an S3 VPC endpoint, you can add permissions to the VPC endpoint policy to control which S3 resources are accessible through the endpoint. Add the following permissions to your VPC endpoint policy. You can modify the policy to further restrict permissions, for example by providing a narrower specification for the `Principal` field.

```
{
    "Sid": "S3DriverAccess",
    "Effect": "Allow",
    "Principal": "*",
    "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket"
    ],
    "Resource": "arn:aws:s3:::sagemakerheadlessexecution-*"
}
```

For details about how to set up an S3 VPC endpoint policy, see [Edit the VPC endpoint policy](https://docs.aws.amazon.com/vpc/latest/privatelink/vpc-endpoints-s3.html#edit-vpc-endpoint-policy-s3).

### Permissions needed to use a custom KMS key (optional)


By default, the input and output Amazon S3 buckets are encrypted using server-side encryption, but you can specify a custom KMS key to encrypt the data in the output Amazon S3 bucket and the storage volume attached to the notebook job.

If you want to use a custom KMS key, attach the following policy and supply your own KMS key ARN.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
      {
         "Effect":"Allow",
         "Action":[
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
            "kms:CreateGrant"
         ],
         "Resource":"arn:aws:kms:us-east-1:111122223333:key/key-id"
      }
   ]
}
```

------

# Install policies and permissions for local Jupyter environments


You need to set up permissions and policies to schedule notebook jobs in a local Jupyter environment. The IAM user needs permissions to submit jobs to SageMaker AI, and the IAM role that the notebook job itself assumes needs permissions to access resources, depending on the job tasks.

You will need to install two sets of permissions. The following diagram shows the permission structure for you to schedule notebook jobs in a local Jupyter environment. The IAM user needs to set up IAM permissions in order to submit jobs to SageMaker AI. Once the user submits the notebook job, the job itself assumes an IAM role that has permissions to access resources depending on the job tasks.

![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/notebook-jobs-permissions.png)


The following sections help you install necessary policies and permissions for both the IAM user and the job execution role.

## IAM user permissions


**Permissions to submit jobs to SageMaker AI**

To add permissions to submit jobs, complete the following steps:

1. Open the [IAM console](https://console.aws.amazon.com/iam/).

1. Select **Users** in the left panel.

1. Find the IAM user for your notebook job and choose the user name.

1. Choose **Add Permissions**, and choose **Create inline policy** from the dropdown menu.

1. Choose the **JSON** tab.

1. Copy and paste the following policy:

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "EventBridgeSchedule",
               "Effect": "Allow",
               "Action": [
                   "events:TagResource",
                   "events:DeleteRule",
                   "events:PutTargets",
                   "events:DescribeRule",
                   "events:EnableRule",
                   "events:PutRule",
                   "events:RemoveTargets",
                   "events:DisableRule"
               ],
               "Resource": "*",
               "Condition": {
                   "StringEquals": {
                       "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
                   }
               }
           },
           {
               "Sid": "IAMPassrole",
               "Effect": "Allow",
               "Action": "iam:PassRole",
               "Resource": "arn:aws:iam::*:role/*",
               "Condition": {
                   "StringLike": {
                       "iam:PassedToService": [
                           "sagemaker.amazonaws.com",
                           "events.amazonaws.com"
                       ]
                   }
               }
           },
           {
               "Sid": "IAMListRoles",
               "Effect": "Allow",
               "Action": "iam:ListRoles",
               "Resource": "*"
           },
           {
               "Sid": "S3ArtifactsAccess",
               "Effect": "Allow",
               "Action": [
                   "s3:PutEncryptionConfiguration",
                   "s3:CreateBucket",
                   "s3:PutBucketVersioning",
                   "s3:ListBucket",
                   "s3:PutObject",
                   "s3:GetObject",
                   "s3:GetEncryptionConfiguration",
                   "s3:DeleteObject",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::sagemaker-automated-execution-*"
               ]
           },
           {
               "Sid": "S3DriverAccess",
               "Effect": "Allow",
               "Action": [
                   "s3:ListBucket",
                   "s3:GetObject",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::sagemakerheadlessexecution-*"
               ]
           },
           {
               "Sid": "SagemakerJobs",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:DescribeTrainingJob",
                   "sagemaker:StopTrainingJob",
                   "sagemaker:DescribePipeline",
                   "sagemaker:CreateTrainingJob",
                   "sagemaker:DeletePipeline",
                   "sagemaker:CreatePipeline"
               ],
               "Resource": "*",
               "Condition": {
                   "StringEquals": {
                       "aws:ResourceTag/sagemaker:is-scheduling-notebook-job": "true"
                   }
               }
           },
           {
               "Sid": "AllowSearch",
               "Effect": "Allow",
               "Action": "sagemaker:Search",
               "Resource": "*"
           },
           {
               "Sid": "SagemakerTags",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:ListTags",
                   "sagemaker:AddTags"
               ],
               "Resource": [
                   "arn:aws:sagemaker:*:*:pipeline/*",
                   "arn:aws:sagemaker:*:*:space/*",
                   "arn:aws:sagemaker:*:*:training-job/*",
                   "arn:aws:sagemaker:*:*:user-profile/*"
               ]
           },
           {
               "Sid": "ECRImage",
               "Effect": "Allow",
               "Action": [
                   "ecr:GetAuthorizationToken",
                   "ecr:BatchGetImage"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------

**AWS KMS permission policy (optional)**

By default, the input and output Amazon S3 buckets are encrypted using server-side encryption, but you can specify a custom KMS key to encrypt your data in the output Amazon S3 bucket and the storage volume attached to the notebook job.

If you want to use a custom KMS key, repeat the previous instructions, attaching the following policy, and supply your own KMS key ARN.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
      {
         "Effect":"Allow",
         "Action":[
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
            "kms:CreateGrant"
         ],
         "Resource":"arn:aws:kms:us-east-1:111122223333:key/key-id"
      }
   ]
}
```

------

## Job execution role permissions


**Trust relationships**

To modify the job execution role trust relationships, complete the following steps:

1. Open the [IAM console](https://console.aws.amazon.com/iam/).

1. Select **Roles** in the left panel.

1. Find the job execution role for your notebook job and choose the role name.

1. Choose the **Trust relationships** tab.

1. Choose **Edit trust policy**.

1. Copy and paste the following policy:

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "Service": [
                       "sagemaker.amazonaws.com",
                       "events.amazonaws.com"
                   ]
               },
               "Action": "sts:AssumeRole"
           }
       ]
   }
   ```

------

**Additional permissions**

Once submitted, the notebook job needs permissions to access resources. The following instructions show you how to add a minimal set of permissions. If needed, add more permissions based on your notebook job needs. To add permissions to your job execution role, complete the following steps:

1. Open the [IAM console](https://console.aws.amazon.com/iam/).

1. Select **Roles** in the left panel.

1. Find the job execution role for your notebook job and choose the role name.

1. Choose **Add Permissions**, and choose **Create inline policy** from the dropdown menu.

1. Choose the **JSON** tab.

1. Copy and paste the following policy:

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "PassroleForJobCreation",
               "Effect": "Allow",
               "Action": "iam:PassRole",
               "Resource": "arn:aws:iam::*:role/*",
               "Condition": {
                   "StringLike": {
                       "iam:PassedToService": "sagemaker.amazonaws.com"
                   }
               }
           },
           {
               "Sid": "S3ForStoringArtifacts",
               "Effect": "Allow",
               "Action": [
                   "s3:PutObject",
                   "s3:GetObject",
                   "s3:ListBucket",
                   "s3:GetBucketLocation"
               ],
               "Resource": "arn:aws:s3:::sagemaker-automated-execution-*"
           },
           {
               "Sid": "S3DriverAccess",
               "Effect": "Allow",
               "Action": [
                   "s3:ListBucket",
                   "s3:GetObject",
                   "s3:GetBucketLocation"
               ],
               "Resource": [
                   "arn:aws:s3:::sagemakerheadlessexecution-*"
               ]
           },
           {
               "Sid": "SagemakerJobs",
               "Effect": "Allow",
               "Action": [
                   "sagemaker:StartPipelineExecution",
                   "sagemaker:CreateTrainingJob"
               ],
               "Resource": "*"
           },
           {
               "Sid": "ECRImage",
               "Effect": "Allow",
               "Action": [
                   "ecr:GetDownloadUrlForLayer",
                   "ecr:BatchGetImage",
                   "ecr:GetAuthorizationToken",
                   "ecr:BatchCheckLayerAvailability"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------

1. Add permissions to other resources your notebook job accesses.

1. Choose **Review policy**.

1. Enter a name for your policy.

1. Choose **Create policy**.

# Where you can create a notebook job


You have multiple options for creating a notebook job in SageMaker AI.

You can create a job in your JupyterLab notebook in the Studio UI, or you can programmatically create a job with the SageMaker Python SDK:
+ If you create your notebook job in the Studio UI, you supply details about the image and kernel, security configurations, and any custom variables or scripts, and your job is scheduled. For details about how to schedule your job using SageMaker Notebook Jobs, see [Create a notebook job in Studio](create-notebook-auto-run-studio.md).
+ To create a notebook job with the SageMaker Python SDK, you create a pipeline with a Notebook Job step and initiate an on-demand run or optionally use the pipeline scheduling feature to schedule future runs. The SageMaker SDK gives you the flexibility to customize your pipeline—you can expand your pipeline to a workflow with multiple notebook job steps. Since you create both a SageMaker Notebook Job step and a pipeline, you can track your pipeline execution status in the SageMaker Notebook Jobs job dashboard and also view your pipeline graph in Studio. For details about how to schedule your job with the SageMaker Python SDK and links to example notebooks, see [Create notebook job with SageMaker AI Python SDK example](create-notebook-auto-run-sdk.md).

# Create notebook job with SageMaker AI Python SDK example


To run a standalone notebook using the SageMaker Python SDK, you need to create a Notebook Job step, attach it to a pipeline, and use the utilities provided by Pipelines to run your job on demand or optionally schedule one or more future jobs. The following sections describe the basic steps to create an on-demand or scheduled notebook job and track the run. In addition, refer to the following discussion if you need to pass parameters to your notebook job or connect to Amazon EMR in your notebook—additional preparation of your Jupyter notebook is required in these cases. You can also apply defaults for a subset of the arguments of `NotebookJobStep` so you don’t have to specify them every time you create a Notebook Job step.

To view sample notebooks that demonstrate how to schedule notebook jobs with the SageMaker AI Python SDK, see [notebook job sample notebooks](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-pipelines/notebook-job-step).

**Topics**
+ [

## Steps to create a notebook job
](#create-notebook-auto-run-overall)
+ [

## View your notebook jobs in the Studio UI dashboard
](#create-notebook-auto-run-dash)
+ [

## View your pipeline graph in Studio
](#create-notebook-auto-run-graph)
+ [

## Passing parameters to your notebook
](#create-notebook-auto-run-passparam)
+ [

## Connecting to an Amazon EMR cluster in your input notebook
](#create-notebook-auto-run-emr)
+ [

## Set up default options
](#create-notebook-auto-run-intdefaults)

## Steps to create a notebook job


You can either create a notebook job that runs immediately or on a schedule. The following instructions describe both methods.

**To schedule a notebook job, complete the following basic steps:**

1. Create a `NotebookJobStep` instance. For details about `NotebookJobStep` parameters, see [sagemaker.workflow.steps.NotebookJobStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.notebook_job_step.NotebookJobStep). At minimum, provide the arguments shown in the following code snippet:
**Important**  
If you schedule your notebook job using the SageMaker Python SDK, you can only specify certain images to run your notebook job. For more information, see [Image constraints for SageMaker AI Python SDK notebook jobs](notebook-auto-run-constraints.md#notebook-auto-run-constraints-image-sdk).

   ```
   notebook_job_step = NotebookJobStep(
       input_notebook=input-notebook,
       image_uri=image-uri,
       kernel_name=kernel-name
   )
   ```

1. Create a pipeline with your `NotebookJobStep` as a single step, as shown in the following snippet:

   ```
   pipeline = Pipeline(
       name=pipeline-name,
       steps=[notebook_job_step],
       sagemaker_session=sagemaker-session,
   )
   ```

1. Run the pipeline on demand or optionally schedule future pipeline runs. To initiate an immediate run, use the following command:

   ```
   execution = pipeline.start(
       parameters={...}
   )
   ```

   Optionally, you can schedule a single future pipeline run or multiple runs at a predetermined interval. You specify your schedule in `PipelineSchedule` and then pass the schedule object to your pipeline with `put_triggers`. For more information about pipeline scheduling, see [Schedule a pipeline with the SageMaker Python SDK](pipeline-eventbridge.md#build-and-manage-scheduling).

   The following example schedules your pipeline to run once on December 25, 2023 at 10:31:32 UTC.

   ```
   my_schedule = PipelineSchedule(
       name="my-schedule",
       at=datetime(year=2023, month=12, day=25, hour=10, minute=31, second=32)
   )
   pipeline.put_triggers(triggers=[my_schedule])
   ```

   The following example schedules your pipeline to run at 10:15am UTC on the last Friday of each month during the years 2022 to 2023. For details about cron-based scheduling, see [Cron-based schedules](https://docs.aws.amazon.com/scheduler/latest/UserGuide/schedule-types.html#cron-based).

   ```
   my_schedule = PipelineSchedule(
       name="my-schedule",
       cron="15 10 ? * 6L 2022-2023"
   )
   pipeline.put_triggers(triggers=[my_schedule])
   ```

1. (Optional) View your notebook jobs in the SageMaker Notebook Jobs dashboard. The values you supply for the `tags` argument of your Notebook Job step control how the Studio UI captures and displays the job. For more information, see [View your notebook jobs in the Studio UI dashboard](#create-notebook-auto-run-dash).

## View your notebook jobs in the Studio UI dashboard


The notebook jobs you create as pipeline steps appear in the Studio Notebook Jobs dashboard if you specify certain tags.

**Note**  
Only notebook jobs created in Studio or local JupyterLab environments create job definitions. Therefore, if you create your notebook job with the SageMaker Python SDK, you don’t see job definitions in the Notebook Jobs dashboard. You can, however, view your notebook jobs as described in [View notebook jobs](view-notebook-jobs.md). 

You can control which team members can view your notebook jobs with the following tags:
+ To display the notebook to all user profiles or [spaces](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-user-guide.html) in a domain, add the domain tag with your domain name. An example is shown as follows:
  + key: `sagemaker:domain-name`, value: `d-abcdefghij5k`
+ To display the notebook job to a certain user profile in a domain, add both the user profile and the domain tags. An example of a user profile tag is shown as follows:
  + key: `sagemaker:user-profile-name`, value: `studio-user`
+ To display the notebook job to a [space](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-user-guide.html), add both the space and the domain tags. An example of a space tag is shown as follows:
  + key: `sagemaker:shared-space-name`, value: `my-space-name`
+ If you do not attach any domain, user profile, or space tags, then the Studio UI does not show the notebook job created by the pipeline step. In this case, you can view the underlying training job in the training job console, or you can view the status in the [list of pipeline executions](https://docs.aws.amazon.com/sagemaker/latest/dg/pipelines-studio-view-execution.html).

Once you set up the necessary tags to view your jobs in the dashboard, see [View notebook jobs](view-notebook-jobs.md) for instructions about how to view your jobs and download outputs.
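
As a sketch of the tags described above, the following builds a tag list in the `Key`/`Value` dictionary format the SageMaker Python SDK uses; the domain and user profile values are hypothetical examples, and the `NotebookJobStep` usage in the comment is an assumption to adapt to your own step definition.

```python
# Tags that make a pipeline-created notebook job visible in the dashboard
# for a specific user profile in a domain. Values here are placeholders.
dashboard_tags = [
    {"Key": "sagemaker:domain-name", "Value": "d-abcdefghij5k"},
    {"Key": "sagemaker:user-profile-name", "Value": "studio-user"},
]

# This list would be passed to the `tags` argument of the step, for example:
# notebook_job_step = NotebookJobStep(..., tags=dashboard_tags)

# Every tag entry uses the Key/Value dictionary shape the SDK expects.
assert all(set(tag) == {"Key", "Value"} for tag in dashboard_tags)
```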

## View your pipeline graph in Studio


Since your notebook job step is part of a pipeline, you can view the pipeline graph (DAG) in Studio. In the pipeline graph, you can view the status of the pipeline run and track lineage. For details, see [View the details of a pipeline run](pipelines-studio-view-execution.md).

## Passing parameters to your notebook


If you want to pass parameters to your notebook job (using the `parameters` argument of `NotebookJobStep`), you need to prepare your input notebook to receive the parameters. 

The Papermill-based notebook job executor searches for a Jupyter cell tagged with the `parameters` tag and applies the new parameters or parameter overrides immediately after this cell. For details, see [Parameterize your notebook](notebook-auto-run-troubleshoot-override.md). 
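
For illustration, the cell tagged with `parameters` in your input notebook might define default values like the following (the variable names and values are hypothetical); Papermill injects a new cell with your overrides immediately after it, so these assignments act as fallbacks when no parameters are passed.

```python
# Contents of the Jupyter cell tagged "parameters" (hypothetical defaults).
# Values supplied through the `parameters` argument of the notebook job step
# are injected by Papermill in a new cell right after this one, overriding
# these defaults.
company = "Default Co"
report_date = "2023-01-01"
```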

Once you have performed this step, pass your parameters to your `NotebookJobStep`, as shown in the following example:

```
notebook_job_parameters = {
    "company": "Amazon"
}

notebook_job_step = NotebookJobStep(
    image_uri=image-uri,
    kernel_name=kernel-name,
    role=role-name,
    input_notebook=input-notebook,
    parameters=notebook_job_parameters,
    ...
)
```

## Connecting to an Amazon EMR cluster in your input notebook


If you connect to an Amazon EMR cluster from your Jupyter notebook in Studio, you might need to further modify your Jupyter notebook. See [Connect to an Amazon EMR cluster from your notebook](scheduled-notebook-connect-emr.md) if you need to perform any of the following tasks in your notebook:
+ **Pass parameters into your Amazon EMR connection command.** Studio uses Papermill to run notebooks. In SparkMagic kernels, parameters you pass to your Amazon EMR connection command may not work as expected due to how Papermill passes information to SparkMagic.
+ **Pass user credentials to Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR clusters.** You have to pass user credentials through AWS Secrets Manager.

## Set up default options


The SageMaker SDK gives you the option to set defaults for a subset of parameters so you don’t have to specify these parameters every time you create a `NotebookJobStep` instance. These parameters are `role`, `s3_root_uri`, `s3_kms_key`, `volume_kms_key`, `subnets`, and `security_group_ids`. Use the SageMaker AI config file to set the defaults for the step. For information about the SageMaker AI configuration file, see [Configuring and using defaults with the SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/overview.html#configuring-and-using-defaults-with-the-sagemaker-python-sdk).

To set up the notebook job defaults, apply your new defaults to the notebook job section of the config file as shown in the following snippet:

```
SageMaker:
  PythonSDK:
    Modules:
      NotebookJob:
        RoleArn: 'arn:aws:iam::555555555555:role/IMRole'
        S3RootUri: 's3://amzn-s3-demo-bucket/my-project'
        S3KmsKeyId: 's3kmskeyid'
        VolumeKmsKeyId: 'volumekmskeyid1'
        VpcConfig:
          SecurityGroupIds:
            - 'sg123'
          Subnets:
            - 'subnet-1234'
```
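
If your config file lives somewhere other than the SDK's default lookup locations, you can point the SDK at it with the `SAGEMAKER_USER_CONFIG_OVERRIDE` environment variable before creating any SDK objects. The path below is a hypothetical example:

```python
import os

# Point the SageMaker Python SDK at a custom config file. The SDK reads this
# environment variable when it loads its default configuration, so set it
# before constructing any SDK objects such as NotebookJobStep.
os.environ["SAGEMAKER_USER_CONFIG_OVERRIDE"] = "/home/user/my-project/config.yaml"
```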

# Create a notebook job in Studio


**Note**  
The notebook scheduler is built on the Amazon EventBridge, SageMaker Training, and Pipelines services. If your notebook jobs fail, you might see errors related to these services.

SageMaker Notebook Jobs gives you the tools to create and manage your noninteractive notebook jobs using the Notebook Jobs widget. You can create jobs, view the jobs you created, and pause, stop, or resume existing jobs. You can also modify notebook schedules.

When you create your scheduled notebook job with the widget, the scheduler tries to infer a selection of default options and automatically populates the form to help you get started quickly. If you are using Studio, at minimum you can submit an on-demand job without setting any options. You can also submit a (scheduled) notebook job definition supplying just the time-specific schedule information. However, you can customize other fields if your scheduled job requires specialized settings. If you are running a local Jupyter notebook, the scheduler extension provides a feature for you to specify your own defaults (for a subset of options) so you don't have to manually insert the same values every time.

When you create a notebook job, you can include additional files such as datasets, images, and local scripts. To do so, choose **Run job with input folder**. The notebook job then has access to all files under the input file's folder. While the notebook job is running, the file structure of the directory remains unchanged.

To schedule a notebook job, complete the following steps.

1. Open the **Create Job** form.

   In local JupyterLab environments, choose the **Create a notebook job** icon (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in the taskbar. If you don't see the icon, follow the instructions in [Installation guide](scheduled-notebook-installation.md) to install it.

   In Studio, open the form in one of two ways:
   + Using the **File Browser**

     1. In the **File Browser** in the left panel, right-click on the notebook you want to run as a scheduled job.

     1. Choose **Create Notebook Job**.
   + Within the Studio notebook
     + Inside the Studio notebook you want to run as a scheduled job, choose the **Create a notebook job** icon (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in the Studio toolbar.

1. Complete the popup form. The form displays the following fields:
   + **Job name**: A descriptive name you specify for your job.
   + **Input file**: The name of the notebook which you are scheduling to run in noninteractive mode.
   + **Compute type**: The type of Amazon EC2 instance in which you want to run your notebook.
   + **Parameters**: Custom parameters you can optionally specify as inputs to your notebook. To use this feature, you might optionally want to tag a specific cell in your Jupyter notebook with the **parameters** tag to control where your parameters are applied. For more details, see [Parameterize your notebook](notebook-auto-run-troubleshoot-override.md).
   + (Optional) **Run job with input folder**: If selected, the scheduled job has access to all the files found in the same folder as the **Input file**.
   + **Additional Options**: You can specify additional customizations for your job. For example, you can specify an image or kernel, input and output folders, job retry and timeout options, encryption details, and custom initialization scripts. For the complete listing of customizations you can apply, see [Available options](create-notebook-auto-execution-advanced.md).

1. Schedule your job. You can run your notebook on demand or on a fixed schedule.
   + To run the notebook on demand, complete the following steps:
     + Select **Run Now**.
     + Choose **Create**.
     + The **Notebook Jobs** tab appears. Choose **Reload** to load your job into the dashboard.
   + To run the notebook on a fixed schedule, complete the following steps:
     + Choose **Run on a schedule**.
     + Choose the **Interval** dropdown list and select an interval. The intervals range from every minute to monthly. You can also select **Custom schedule**.
     + Based on the interval you choose, additional fields appear to help you further specify your desired run day and time. For example, if you select **Day** for a daily run, an additional field appears for you to specify the desired time. Note that any time you specify is in UTC format. Note also that if you choose a small interval, such as one minute, your jobs overlap if the previous job is not complete when the next job starts.

       If you select a custom schedule, you use cron syntax in the expression box to specify your exact run date and time. The cron syntax is a space-separated list of digits, each of which represents a unit of time from seconds to years. For help with cron syntax, you can choose **Get help with cron syntax** under the expression box.
     + Choose **Create**.
     + The **Notebook Job Definitions** tab appears. Choose **Reload** to load your job definition into the dashboard.

# Set up default options for local notebooks


**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the Studio Classic application. For information about using the updated Studio experience, see [Amazon SageMaker Studio](studio-updated.md).  
Studio Classic is still maintained for existing workloads but is no longer available for onboarding. You can only stop or delete existing Studio Classic applications and cannot create new ones. We recommend that you [migrate your workload to the new Studio experience](studio-updated-migrate.md).

You can set up default options when you create a notebook job. This can save you time if you plan to create multiple notebook jobs with different options than the provided defaults. The following provides information on how to set up the default options for local notebooks.

If you have to manually type (or paste in) custom values in the **Create Job** form, you can store new default values and the scheduler extension inserts your new values every time you create a new job definition. This feature is available for the following options:
+ **Role ARN**
+ **S3 Input Folder**
+ **S3 Output Folder**
+ **Output encryption KMS key** (if you turn on **Configure Job Encryption**)
+ **Job instance volume encryption KMS key** (if you turn on **Configure Job Encryption**)

This feature saves you time if you insert different values than the provided defaults and continue to use those values for future job runs. Your chosen user settings are stored on the machine that runs your JupyterLab server and are retrieved with the help of the native API. If you provide new default values for one or more but not all five options, the previous defaults are kept for the options you don’t customize.

The following instructions show you how to preview the existing default values, set new default values, and reset your default values for your notebook jobs.

**To preview existing default values for your notebook jobs, complete the following steps:**

1. Open the Amazon SageMaker Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. In the **File Browser** in the left panel, right-click on the notebook you want to run as a scheduled job.

1. Choose **Create Notebook Job**.

1. Choose **Additional options** to expand the tab of notebook job settings. You can view the default settings here. 

**To set new default values for your future notebook jobs, complete the following steps:**

1. Open the Amazon SageMaker Studio Classic console by following the instructions in [Launch Amazon SageMaker Studio Classic](studio-launch.md).

1. From the top menu in Studio Classic, choose **Settings**, then choose **Advanced Settings Editor**.

1. Choose **Amazon SageMaker Scheduler** from the list below **Settings**. This may already be open by default.

1. You can update the default settings directly in this UI page or by using the JSON editor.
   + In the UI you can insert new values for **Role ARN**, **S3 Input Folder**, **S3 Output Folder**, **Output encryption KMS key**, or **Job instance volume encryption KMS key**. If you change these values, you will see the new defaults for these fields while you create your next notebook job under **Additional options**.
   + (Optional) To update the user defaults using the **JSON Settings Editor**, complete the following steps:

     1. In the top right corner, choose **JSON Settings Editor**.

     1. In the **Settings** left sidebar, choose **Amazon SageMaker AI Scheduler**. This may already be open by default.

        You can see your current default values in the **User Preferences** panel.

        You can see the system default values in the **System Defaults** panel.

     1. To update your default values, copy and paste the JSON snippet from the **System Defaults** panel to the **User Preferences** panel, and update the fields.

     1. If you updated the default values, choose the **Save User Settings** icon (![\[Icon of a cloud with an arrow pointing upward, representing cloud upload functionality.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/Notebook_save.png)) in the top right corner. Closing the editor does not save the changes.

**If you previously changed the user-defined default values and now want to reset them, complete the following steps:**

1. From the top menu in Studio Classic, choose **Settings**, then choose **Advanced Settings Editor**.

1. Choose **Amazon SageMaker Scheduler** from the list below **Settings**. This may already be open by default.

1. You can restore the defaults by directly using this UI page or using the JSON editor.
   + In the UI you can choose **Restore to Defaults** in the top right corner. Your defaults are restored to empty strings. You only see this option if you previously changed your default values.
   + (Optional) To reset the default settings using the **JSON Settings Editor**, complete the following steps:

     1. In the top right corner, choose **JSON Settings Editor**.

     1. In the **Settings** left sidebar, choose **Amazon SageMaker AI Scheduler**. This may already be open by default.

        You can see your current default values in the **User Preferences** panel.

        You can see the system default values in the **System Defaults** panel.

      1. To restore the system default settings, copy the content from the **System Defaults** panel to the **User Preferences** panel.

     1. Choose the **Save User Settings** icon (![\[Icon of a cloud with an arrow pointing upward, representing cloud upload functionality.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/Notebook_save.png)) in the top right corner. Closing the editor does not save the changes.

# Notebook job workflows


Since a notebook job runs your custom code, you can create a pipeline that includes one or more notebook job steps. ML workflows often contain multiple steps, such as a processing step to preprocess data, a training step to build your model, and a model evaluation step, among others. One possible use of notebook jobs is to handle preprocessing—you might have a notebook that performs data transformation or ingestion, an EMR step that performs data cleaning, and another notebook job that performs featurization of your inputs before initiating a training step. A notebook job may require information from previous steps in the pipeline or from user-specified customization as parameters in the input notebook. For examples that show how to pass environment variables and parameters to your notebook and retrieve information from prior steps, see [Pass information to and from your notebook step](create-notebook-auto-run-dag-seq.md).

In another use case, one of your notebook jobs might call another notebook to perform some tasks during your notebook run—in this scenario you need to specify these sourced notebooks as dependencies with your notebook job step. For information about how to call another notebook, see [Invoke another notebook in your notebook job](create-notebook-auto-run-dag-call.md).

To view sample notebooks that demonstrate how to schedule notebook jobs with the SageMaker AI Python SDK, see [notebook job sample notebooks](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-pipelines/notebook-job-step).

# Pass information to and from your notebook step


The following sections describe ways to pass information to your notebook as environment variables and parameters.

## Pass environment variables


Pass environment variables as a dictionary to the `environment_variables` argument of your `NotebookJobStep`, as shown in the following example:

```
environment_variables = {"RATE": "0.0001", "BATCH_SIZE": "1000"}

notebook_job_step = NotebookJobStep(
    ...
    environment_variables=environment_variables,
    ...
)
```

You can read these environment variables inside your notebook with `os.getenv()`, as shown in the following example:

```
# inside your notebook
import os
print(f"RATE={os.getenv('RATE')}")
```
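Keep in mind that environment variables always reach the running notebook as strings, so numeric values need to be cast back. The following sketch illustrates this; the `os.environ` assignments only simulate the values a notebook job would inject from its environment-variables dictionary:

```python
import os

# Simulate the values a notebook job would inject. In a real job, these
# are set for you before your notebook code runs.
os.environ["RATE"] = "0.0001"
os.environ["BATCH_SIZE"] = "1000"

# Environment variables arrive as strings, so cast them as needed,
# supplying a fallback default for local runs.
rate = float(os.getenv("RATE", "0.001"))
batch_size = int(os.getenv("BATCH_SIZE", "500"))
print(rate, batch_size)
```

Supplying a default in `os.getenv()` also lets the same notebook run interactively outside of a scheduled job.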

## Pass parameters


When you pass parameters to your `NotebookJobStep` instance, you can optionally tag a cell in your Jupyter notebook to indicate where to apply new parameters or parameter overrides. For instructions about how to tag a cell in your Jupyter notebook, see [Parameterize your notebook](notebook-auto-run-troubleshoot-override.md).

You pass parameters through the Notebook Job step's `parameters` parameter, as shown in the following snippet:

```
notebook_job_parameters = {
    "company": "Amazon",
}

notebook_job_step = NotebookJobStep(
    ...
    parameters=notebook_job_parameters,
    ...
)
```

Inside your input notebook, your parameters are applied after the cell tagged with `parameters` or at the beginning of the notebook if you don’t have a tagged cell.

```
# this cell is in your input notebook and is tagged with 'parameters'
# your parameters and parameter overrides are applied after this cell
company='default'
```

```
# in this cell, your parameters are applied
# prints "company is Amazon"
print(f'company is {company}')
```

## Retrieve information from a previous step


The following discussion explains how you can extract data from a previous step to pass to your Notebook Job step.

**Use `properties` attribute**

You can use the following properties with the previous step's `properties` attribute:
+ `ComputingJobName`—The training job name
+ `ComputingJobStatus`—The training job status
+ `NotebookJobInputLocation`—The input Amazon S3 location
+ `NotebookJobOutputLocationPrefix`—The path to your training job outputs. Specifically, the outputs are located at `{NotebookJobOutputLocationPrefix}/{training-job-name}/output/output.tar.gz`
+ `InputNotebookName`—The input notebook file name
+ `OutputNotebookName`—The output notebook file name (which may not exist in the training job output folder if the job fails)

The following code snippet shows how to extract parameters from the properties attribute.

```
notebook_job_step2 = NotebookJobStep(
    ...
    parameters={
        "step1_JobName": notebook_job_step1.properties.ComputingJobName,
        "step1_JobStatus": notebook_job_step1.properties.ComputingJobStatus,
        "step1_NotebookJobInput": notebook_job_step1.properties.NotebookJobInputLocation,
        "step1_NotebookJobOutput": notebook_job_step1.properties.NotebookJobOutputLocationPrefix,
    }
)
```

**Use JsonGet**

If you want to pass parameters other than the ones previously mentioned and the JSON outputs of your previous step reside in Amazon S3, use `JsonGet`. `JsonGet` is a general mechanism that can directly extract data from JSON files in Amazon S3.

To extract values from JSON files in Amazon S3 with `JsonGet`, complete the following steps:

1. Upload your JSON file to Amazon S3. If your data is already uploaded to Amazon S3, skip this step. The following example demonstrates uploading a JSON file to Amazon S3.

   ```
   import json
   from sagemaker.s3 import S3Uploader
   
   output = {
       "key1": "value1", 
       "key2": [0,5,10]
   }
               
   json_output = json.dumps(output)
   
   with open("notebook_job_params.json", "w") as file:
       file.write(json_output)
   
   S3Uploader.upload(
       local_path="notebook_job_params.json",
       desired_s3_uri="s3://path/to/bucket"
   )
   ```

1. Provide your S3 URI and the JSON path to the value you want to extract. In the following example, `JsonGet` returns an object representing index 2 of the value associated with key `key2` (`10`).

   ```
   NotebookJobStep(
       ...
       parameters={
           # the key job_key1 returns an object representing the value 10
           "job_key1": JsonGet(
               s3_uri=Join(on="/", values=["s3:/", ..]),
               json_path="key2[2]" # value to reference in that json file
           ), 
           "job_key2": "Amazon" 
       }
   )
   ```
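To see what the `JsonGet` lookup above resolves to, the following local sketch applies the same JSON path to the document uploaded in step 1. This is plain Python with no AWS calls; at pipeline runtime, `JsonGet` performs the equivalent lookup against the file in Amazon S3:

```python
import json

# The same document uploaded to Amazon S3 in step 1.
doc = json.loads('{"key1": "value1", "key2": [0, 5, 10]}')

# The JSON path "key2[2]" means: the element at index 2 of the value
# associated with the key "key2".
value = doc["key2"][2]
print(value)
```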

# Invoke another notebook in your notebook job


You can set up a pipeline in which one notebook job calls another notebook. The following example sets up a pipeline with a Notebook Job step whose notebook calls two other notebooks. The input notebook contains the following lines:

```
%run 'subfolder/notebook_to_call_in_subfolder.ipynb'
%run 'notebook_to_call.ipynb'
```

Pass these notebooks into your `NotebookJobStep` instances with `additional_dependencies`, as shown in the following snippet. Note that the paths you provide for the notebooks in `additional_dependencies` are relative to the root location. For information about how SageMaker AI uploads your dependent files and folders to Amazon S3 so you can correctly provide paths to your dependencies, see the description for `additional_dependencies` in [NotebookJobStep](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.notebook_job_step.NotebookJobStep).

```
input_notebook = "inputs/input_notebook.ipynb"
simple_notebook_path = "inputs/notebook_to_call.ipynb"
folder_with_sub_notebook = "inputs/subfolder"

notebook_job_step = NotebookJobStep(
    image_uri=image-uri,
    kernel_name=kernel-name,
    role=role-name,
    input_notebook=input_notebook,
    additional_dependencies=[simple_notebook_path, folder_with_sub_notebook],
    tags=tags,
)
```

# Available options

The following table displays all available options you can use to customize your notebook job, whether you run it in Studio, in a local Jupyter environment, or with the SageMaker Python SDK. The table includes the type of custom option, a description, additional guidelines for using the option, the field name for the option in Studio (if available), and the parameter name for the notebook job step in the SageMaker Python SDK (if available).

For some options, you can also preset custom default values so you don’t have to specify them every time you set up a notebook job. For Studio, these options are **Role**, **Input folder**, **Output folder**, and **KMS Key ID**, and are specified in the following table. If you preset custom defaults for these options, these fields are prepopulated in the **Create Job** form when you create your notebook job. For details about how to create custom defaults in Studio and local Jupyter environments, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md).

The SageMaker SDK also gives you the option to set intelligent defaults so that you don’t have to specify these parameters when you create a `NotebookJobStep`. These parameters are `role`, `s3_root_uri`, `s3_kms_key`, `volume_kms_key`, `subnets`, `security_group_ids`, and are specified in the following table. For information about how to set intelligent defaults, see [Set up default options](create-notebook-auto-run-sdk.md#create-notebook-auto-run-intdefaults).


| Custom option | Description | Studio-specific guideline | Local Jupyter environment guideline | SageMaker Python SDK guideline | 
| --- | --- | --- | --- | --- | 
| Job name | Your job name as it should appear in the Notebook Jobs dashboard. | Field Job name. | Same as Studio. | Parameter notebook_job_name. Defaults to None. | 
| Image | The container image used to run the notebook noninteractively on the chosen compute type. | Field Image. This field defaults to your notebook’s current image. Change this field from the default to a custom value if needed. If Studio cannot infer this value, the form displays a validation error requiring you to specify it. This image can be a custom, [bring-your-own image](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-byoi.html) or an available Amazon SageMaker image. For a list of available SageMaker images supported by the notebook scheduler, see [Amazon SageMaker Images Available for Use With Studio Classic Notebooks](notebooks-available-images.md). | Field Image. This field requires an ECR URI of a Docker image that can run the provided notebook on the selected compute type. By default, the scheduler extension uses a pre-built SageMaker AI Docker image (base Python 2.0). This is the official Python 3.8 image from DockerHub with boto3, AWS CLI, and the Python 3 kernel. You can also provide any ECR URI that meets the notebook custom image specification. For details, see [Custom SageMaker Image Specifications for Amazon SageMaker Studio Classic](studio-byoi-specs.md). This image should have all the kernels and libraries needed for the notebook run. | Required. Parameter image_uri. URI location of a Docker image on ECR. You can use specific SageMaker Distribution Images, a custom image based on those images, or your own image pre-installed with notebook job dependencies that meets additional requirements. For details, see [Image constraints for SageMaker AI Python SDK notebook jobs](notebook-auto-run-constraints.md#notebook-auto-run-constraints-image-sdk). | 
| Instance type | The EC2 instance type to use to run the notebook job. The notebook job uses a SageMaker Training Job as a computing layer, so the specified instance type should be a SageMaker Training supported instance type. | Field Compute type. Defaults to ml.m5.large. | Same as Studio. | Parameter instance_type. Defaults to ml.m5.large. | 
| Kernel | The Jupyter kernel used to run the notebook job. | Field Kernel. This field defaults to your notebook’s current kernel. Change this field from the default to a custom value if needed. If Studio cannot infer this value, the form displays a validation error requiring you to specify it. | Field Kernel. This kernel should be present in the image and follow the Jupyter kernel specs. This field defaults to the Python3 kernel found in the base Python 2.0 SageMaker image. Change this field to a custom value if needed. | Required. Parameter kernel_name. This kernel should be present in the image and follow the Jupyter kernel specs. To see the kernel identifiers for your image, see (LINK). | 
| SageMaker AI session | The underlying SageMaker AI session to which SageMaker AI service calls are delegated. | N/A | N/A | Parameter sagemaker_session. If unspecified, one is created using a default configuration chain. | 
| Role ARN | The role’s Amazon Resource Name (ARN) used with the notebook job. | Field Role ARN. This field defaults to the Studio execution role. Change this field to a custom value if needed.  If Studio cannot infer this value, the **Role ARN** field is blank. In this case, insert the ARN you want to use.  | Field Role ARN. This field defaults to any role prefixed with SagemakerJupyterScheduler. If you have multiple roles with the prefix, the extension chooses one. Change this field to a custom value if needed. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | Parameter role. Defaults to the SageMaker AI default IAM role if the SDK is running in SageMaker Notebooks or SageMaker Studio Notebooks. Otherwise, it throws a ValueError. Allows intelligent defaults. | 
| Input notebook | The name of the notebook which you are scheduling to run. | Required. Field Input file. | Same as Studio. | Required. Parameter input_notebook. | 
| Input folder | The folder containing your inputs. The job inputs, including the input notebook and any optional start-up or initialization scripts, are put in this folder. | Field Input folder. If you don’t provide a folder, the scheduler creates a default Amazon S3 bucket for your inputs. | Same as Studio. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | N/A. The input folder is placed inside the location specified by parameter s3_root_uri. | 
| Output folder | The folder containing your outputs. The job outputs, including the output notebook and logs, are put in this folder. | Field Output folder. If you don’t specify a folder, the scheduler creates a default Amazon S3 bucket for your outputs. | Same as Studio. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | N/A. The output folder is placed inside the location specified by parameter s3_root_uri. | 
| Parameters | A dictionary of variables and values to pass to your notebook job. | Field Parameters. You need to [parameterize your notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html) to accept parameters. | Same as Studio. | Parameter parameters. You need to [parameterize your notebook](https://docs.aws.amazon.com/sagemaker/latest/dg/notebook-auto-run-troubleshoot-override.html) to accept parameters. | 
| Additional (file or folder) dependencies | The list of file or folder dependencies that the notebook job uploads to an S3 staging folder. | Not supported. | Not supported. | Parameter additional_dependencies. The notebook job uploads these dependencies to an S3 staging folder so they can be consumed during execution. | 
| S3 root URI | The folder containing your inputs. The job inputs, including the input notebook and any optional start-up or initialization scripts, are put in this folder. This S3 bucket must be in the same AWS account that you're using to run your notebook job. | N/A. Use Input Folder and Output folder. | Same as Studio. | Parameter s3_root_uri. Defaults to a default S3 bucket. Allows intelligent defaults. | 
| Environment variables | Any existing environment variables that you want to override, or new environment variables that you want to introduce and use in your notebook. | Field Environment variables. | Same as Studio. | Parameter environment_variables. Defaults to None. | 
| Tags | A list of tags attached to the job. | N/A | N/A | Parameter tags. Defaults to None. Your tags control how the Studio UI captures and displays the job created by the pipeline. For details, see [View your notebook jobs in the Studio UI dashboard](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash). | 
| Start-up script | A script preloaded in the notebook startup menu that you can choose to run before you run the notebook. | Field Start-up script. Select a Lifecycle Configuration (LCC) script that runs on the image at start-up. A start-up script runs in a shell outside of the Studio environment. Therefore, this script cannot depend on the Studio local storage, environment variables, or app metadata (in `/opt/ml/metadata`). Also, if you use a start-up script and an initialization script, the start-up script runs first.   | Not supported. | Not supported. | 
| Initialization script | A path to a local script you can run when your notebook starts up. | Field Initialization script. Enter the EFS file path where a local script or a Lifecycle Configuration (LCC) script is located. If you use a start-up script and an initialization script, the start-up script runs first. An initialization script is sourced from the same shell as the notebook job, unlike a start-up script described previously. | Field Initialization script. Enter the local file path where a local script or a Lifecycle Configuration (LCC) script is located. | Parameter initialization_script. Defaults to None. | 
| Max retry attempts | The number of times Studio tries to rerun a failed job run. | Field Max retry attempts. Defaults to 1. | Same as Studio. | Parameter max_retry_attempts. Defaults to 1. | 
| Max run time (in seconds) | The maximum length of time, in seconds, that a notebook job can run before it is stopped. If you configure both Max run time and Max retry attempts, the run time applies to each retry. If a job does not complete in this time, its status is set to Failed. | Field Max run time (in seconds). Defaults to 172800 seconds (2 days). | Same as Studio. | Parameter max_runtime_in_seconds. Defaults to 172800 seconds (2 days). | 
| Retry policies | A list of retry policies, which govern actions to take in case of failure. | Not supported. | Not supported. | Parameter retry_policies. Defaults to None. | 
| Add Step or StepCollection dependencies | A list of Step or StepCollection names or instances on which the job depends. | Not supported. | Not supported. | Parameter depends_on. Defaults to None. Use this to define explicit dependencies between steps in your pipeline graph. | 
| Volume size | The size in GB of the storage volume for storing input and output data during training. | Not supported. | Not supported. | Parameter volume_size. Defaults to 30 GB. | 
| Encrypt traffic between containers | A flag that specifies whether traffic between training containers is encrypted for the training job. | N/A. Enabled by default. | N/A. Enabled by default. | Parameter encrypt_inter_container_traffic. Defaults to True. | 
| Configure job encryption | An indicator that you want to encrypt your notebook job outputs, job instance volume, or both. | Field Configure job encryption. Check this box to choose encryption. If left unchecked, the job outputs are encrypted with the account's default KMS key and the job instance volume is not encrypted. | Same as Studio. | Not supported. | 
| Output encryption KMS key | A KMS key to use if you want to customize the encryption key used for your notebook job outputs. This field is only applicable if you checked Configure job encryption. | Field Output encryption KMS key. If you do not specify this field, your notebook job outputs are encrypted with SSE-KMS using the default Amazon S3 KMS key. Also, if you create the Amazon S3 bucket yourself and use encryption, your encryption method is preserved. | Same as Studio. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | Parameter s3_kms_key. Defaults to None. Allows intelligent defaults. | 
| Job instance volume encryption KMS key | A KMS key to use if you want to encrypt your job instance volume. This field is only applicable if you checked Configure job encryption. | Field Job instance volume encryption KMS key. | Field Job instance volume encryption KMS key. For this field, you can set your own user default that pre-populates whenever you create a new job definition. For details, see [Set up default options for local notebooks](create-notebook-auto-execution-advanced-default.md). | Parameter volume_kms_key. Defaults to None. Allows intelligent defaults. | 
| Use a Virtual Private Cloud to run this job (for VPC users) | An indicator that you want to run this job in a Virtual Private Cloud (VPC). For better security, it is recommended that you use a private VPC. | Field Use a Virtual Private Cloud to run this job. Check this box if you want to use a VPC. At minimum, create the following VPC endpoints to enable your notebook job to privately connect to those AWS resources: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/create-notebook-auto-execution-advanced.html)If you choose to use a VPC, you need to specify at least one private subnet and at least one security group in the following options. If you don’t use any private subnets, you need to consider other configuration options. For details, see Public VPC subnets not supported in [Constraints and considerations](notebook-auto-run-constraints.md). | Same as Studio. | N/A | 
| Subnet(s) (for VPC users) | Your subnets. This field must contain at least one and at most five, and all the subnets you provide should be private. For details, see Public VPC subnets not supported in [Constraints and considerations](notebook-auto-run-constraints.md). | Field Subnet(s). This field defaults to the subnets associated with the Studio domain, but you can change this field if needed. | Field Subnet(s). The scheduler cannot detect your subnets, so you need to enter any subnets you configured for your VPC. | Parameter subnets. Defaults to None. Allows intelligent defaults. | 
| Security group(s) (for VPC users) | Your security groups. This field must contain at least one and at most 15. For details, see Public VPC subnets not supported in [Constraints and considerations](notebook-auto-run-constraints.md). | Field Security groups. This field defaults to the security groups associated with the domain VPC, but you can change this field if needed. | Field Security groups. The scheduler cannot detect your security groups, so you need to enter any security groups you configured for your VPC. | Parameter security_group_ids. Defaults to None. Allows intelligent defaults. | 
| Name | The name of the notebook job step. | N/A | N/A | Parameter name. If unspecified, it is derived from the notebook file name. | 
| Display name | Your job name as it should appear in your list of pipeline executions. | N/A | N/A | Parameter display_name. Defaults to None. | 
| Description | A description of your job. | N/A | N/A | Parameter description. | 

# Parameterize your notebook

To pass new parameters or parameter overrides to your scheduled notebook job, you can optionally tag a cell in your Jupyter notebook to control where the new parameter values are applied. The notebook job executor uses the methodology enforced by Papermill: it searches for a Jupyter cell tagged with the `parameters` tag and applies the new parameters or parameter overrides immediately after that cell. If no cell is tagged with `parameters`, the parameters are applied at the beginning of the notebook. If more than one cell is tagged with `parameters`, the parameters are applied after the first cell tagged with `parameters`.

To tag a cell in your notebook with the `parameters` tag, complete the following steps:

1. Select the cell to parameterize.

1. Choose the **Property Inspector** icon (![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/gears.png)) in the right sidebar.

1. Type **parameters** in the **Add Tag** box.

1. Choose the **+** sign.

1. The `parameters` tag appears under **Cell Tags** with a check mark, which means the tag is applied to the cell.
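If you want to verify the tag programmatically, the `parameters` tag lives in the cell metadata of the raw `.ipynb` JSON. The following sketch uses a minimal, hypothetical notebook fragment (rather than a real file) to show where the tag is stored:

```python
import json

# A minimal .ipynb fragment (hypothetical) showing where the
# "parameters" tag is stored in the raw notebook JSON.
raw = """
{
  "cells": [
    {"cell_type": "code",
     "metadata": {"tags": ["parameters"]},
     "source": ["company = 'default'"]},
    {"cell_type": "code",
     "metadata": {},
     "source": ["print(company)"]}
  ]
}
"""
notebook = json.loads(raw)

# Find every cell carrying the "parameters" tag.
tagged = [cell for cell in notebook["cells"]
          if "parameters" in cell.get("metadata", {}).get("tags", [])]
print(len(tagged))
```

For a real notebook file, you would read the JSON with `json.load(open("notebook.ipynb"))` and apply the same filter.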

# Connect to an Amazon EMR cluster from your notebook

If you connect to an Amazon EMR cluster from your Jupyter notebook in Studio, you might need to perform additional setup. In particular, the following discussion addresses two issues:
+ **Passing parameters into your Amazon EMR connection command**. In SparkMagic kernels, parameters you pass to your Amazon EMR connection command may not work as expected due to differences in how Papermill passes parameters and how SparkMagic receives parameters. The workaround to address this limitation is to pass parameters as environment variables. For more details about the issue and workaround, see [Pass parameters to your EMR connection command](#scheduled-notebook-connect-emr-pass-param).
+ **Passing user credentials to Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR clusters**. In interactive mode, Studio asks for credentials in a popup form where you can enter your sign-in credentials. In your noninteractive scheduled notebook, you have to pass them through the AWS Secrets Manager. For more details about how to use the AWS Secrets Manager in your scheduled notebook jobs, see [Pass user credentials to your Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR cluster](#scheduled-notebook-connect-emr-credentials).

## Pass parameters to your EMR connection command

If you are using images with the SparkMagic PySpark and Spark kernels and want to parameterize your EMR connection command, provide your parameters in the **Environment variables** field instead of the **Parameters** field in the **Create Job** form (in the **Additional Options** dropdown menu). Make sure your EMR connection command in the Jupyter notebook reads these parameters as environment variables. For example, suppose you pass `cluster_id` as an environment variable when you create your job. Your EMR connection command should look like the following:

```
%%local
import os
```

```
%sm_analytics emr connect --cluster-id {os.getenv('cluster_id')} --auth-type None
```

This workaround is needed because of how SparkMagic and Papermill interact. The SparkMagic kernel expects the `%%local` magic command to accompany any local variables you define, but Papermill does not pass the `%%local` magic command with your overrides. To work around this Papermill limitation, supply your parameters as environment variables in the **Environment variables** field.

## Pass user credentials to your Kerberos, LDAP, or HTTP Basic Auth-authenticated Amazon EMR cluster

To establish a secure connection to an Amazon EMR cluster that uses Kerberos, LDAP, or HTTP Basic Auth authentication, use AWS Secrets Manager to pass user credentials to your connection command. For information about how to create a Secrets Manager secret, see [Create an AWS Secrets Manager secret](https://docs.aws.amazon.com/secretsmanager/latest/userguide/create_secret.html). Your secret must contain your username and password. You pass the secret with the `--secret` argument, as shown in the following example:

```
%sm_analytics emr connect --cluster-id j_abcde12345 
    --auth-type Kerberos 
    --secret aws_secret_id_123
```

Your administrator can set up a flexible access policy using attribute-based access control (ABAC), which grants access based on tags. You can create a single secret for all users in the account or a separate secret for each user. The following code samples demonstrate these scenarios:

**Create a single secret for all users in the account**


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-20190101T012345"
            },
            "Action": "secretsmanager:GetSecretValue",
            "Resource": [
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes123-1a2b3c",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes456-4d5e6f",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes789-7g8h9i"
            ]
        }
    ]
}
```


**Create a different secret for each user**

You can create a different secret for each user and control access with the `aws:PrincipalTag` condition key, as shown in the following example:


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:role/service-role/AmazonSageMaker-ExecutionRole-20190101T012345"
            },
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/user-identity": "${aws:PrincipalTag/user-identity}"
                }
            },
            "Action": "secretsmanager:GetSecretValue",
            "Resource": [
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes123-1a2b3c",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes456-4d5e6f",
                "arn:aws:secretsmanager:us-west-2:111122223333:secret:aes789-7g8h9i"
            ]
        }
    ]
}
```


# Notebook jobs details in Amazon SageMaker Studio

SageMaker Notebook Jobs dashboards help organize the job definitions that you schedule, and also keep track of the actual jobs that run from your job definitions. There are two important concepts to understand when scheduling notebook jobs: *job definitions* and *job runs*. Job definitions are schedules you set to run specific notebooks. For example, you can create a job definition that runs notebook XYZ.ipynb every Wednesday. This job definition launches the actual job runs which occur this coming Wednesday, next Wednesday, the Wednesday after that, and so on. 

**Note**  
The SageMaker Python SDK notebook job step does not create job definitions. However, you can view your jobs in the Notebook Jobs dashboard. Both jobs and job definitions are available if you schedule your job in a JupyterLab environment.

The interface provides two main tabs that help you track your existing job definitions and job runs:
+ **Notebook Jobs** tab: This tab displays a list of all your job runs from your on-demand jobs and job definitions. From this tab, you can directly access the details for a single job run. For example, you can view a single job run that occurred two Wednesdays ago.
+ **Notebook Job Definitions** tab: This tab displays a list of all your job definitions. From this tab, you can directly access the details for a single job definition. For example, you can view the schedule you created to run XYZ.ipynb every Wednesday.

For details about the **Notebook Jobs** tab, see [View notebook jobs](view-notebook-jobs.md).

For details about the **Notebook Job Definitions** tab, see [View notebook job definitions](view-def-detail-notebook-auto-run.md).

# View notebook jobs
View notebook jobs

**Note**  
You can automatically view your notebook jobs if you scheduled your notebook job from the Studio UI. If you used the SageMaker Python SDK to schedule your notebook job, you need to supply additional tags when you create the notebook job step. For details, see [View your notebook jobs in the Studio UI dashboard](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash).

The following topic gives information about the **Notebook Jobs** tab and how to view the details of a single notebook job. The **Notebook Jobs** tab, which you access by choosing the **Create a notebook job** icon (![\[Blue icon of a calendar with a checkmark, representing a scheduled task or event.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/notebook-schedule.png)) in the Studio toolbar, shows a history of your on-demand jobs and all the jobs that run from the job definitions you created. This tab opens after you create an on-demand job, or you can open it at any time to see a history of past and current jobs. If you select the **Job name** for any job, you can view its details on the **Job Detail** page. For more information about the **Job Detail** page, see the following section, [View a single job](#view-jobs-detail-notebook-auto-run).

The **Notebook Jobs** tab includes the following information for each job:
+ **Output files**: Displays the availability of output files. This column can contain one of the following:
  + A download icon (![\[Cloud icon with downward arrow, representing download or cloud storage functionality.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/File_download.png)): The output notebook and log are available for download; choose this button to download them. Note that a failed job can still generate output files if the failure occurred after the files were created. In this case, it is helpful to view the output notebook to identify the failure point.
  + Links to the **Notebook** and **Output log**: The notebook and output log have been downloaded. Choose the links to view their contents.
  + (blank): The job was stopped by the user, or a failure occurred in the job run before it could generate output files. For example, network failures could prevent the job from starting.

  The output notebook is the result of running all cells in the notebook, and also incorporates any new or overriding parameters or environment variables you included. The output log captures the details of the job run to help you troubleshoot failed jobs.
+ **Created at**: The time the on-demand job or scheduled job was created.
+ **Status**: The current status of the job, which is one of the following values:
  + **In progress**: The job is running.
  + **Failed**: The job failed due to configuration or notebook logic errors.
  + **Stopped**: The user stopped the job.
  + **Completed**: The job completed successfully.
+ **Actions**: This column provides shortcuts to help you stop or remove any job directly in the interface.

## View a single job
View a single job

From the **Notebook Jobs** tab, you can select a job name to view the **Job Detail** page for a specific job. The **Job Detail** page includes all the details you provided in the **Create Job** form. Use this page to confirm the settings you specified when you created the job definition. 

In addition, you can access shortcuts to help you perform the following actions in the page itself:
+ **Delete Job**: Remove the job from the **Notebook Jobs** tab.
+ **Stop Job**: Stop your running job.

# View notebook job definitions
View notebook job definitions

**Note**  
If you scheduled your notebook job with the SageMaker Python SDK, skip this section. Only notebook jobs created in Studio or local JupyterLab environments create job definitions. Therefore, if you created your notebook job with the SageMaker Python SDK, you won’t see job definitions in the Notebook Jobs dashboard. You can, however, view your notebook jobs as described in [View notebook jobs](view-notebook-jobs.md). 

When you create a job definition, you create a schedule for a job. The **Notebook Job Definitions** tab lists these schedules, as well as information about specific notebook job definitions. For example, you might create a job definition that runs a specific notebook every minute. Once this job definition is active, you see a new job every minute in the **Notebook Jobs** tab. The following page gives information about the **Notebook Job Definitions** tab, as well as how to view a notebook job definition.

The **Notebook Job Definitions** tab displays a dashboard with all your job definitions and includes the input notebook, the creation time, the schedule, and the status for each job definition. The value in the **Status** column is one of the following values:
+ **Paused**: You paused the job definition. Studio does not initiate any jobs until you resume the definition.
+ **Active**: The schedule is on and Studio can run the notebook according to the schedule you specified.

In addition, the **Actions** column provides shortcuts to help you perform the following tasks directly in the interface:
+ Pause: Pauses the job definition. Studio won’t create any jobs until you resume the definition.
+ Delete: Removes the job definition from the **Notebook Job Definitions** tab.
+ Resume: Continues a paused job definition so that it can start jobs.

If you created a job definition but it doesn’t initiate jobs, see [Job definition doesn’t create jobs](notebook-auto-run-troubleshoot.md#notebook-auto-run-troubleshoot-no-jobs) in the [Troubleshooting guide](notebook-auto-run-troubleshoot.md).

## View a single job definition
View a single job definition

If you select a job definition name in the **Notebook Job Definitions** tab, you see the **Job Definition** page where you can view specific details for a job definition. Use this page to confirm the settings you specified when you created the job definition. If you don’t see any jobs created from your job definition, see [Job definition doesn’t create jobs](notebook-auto-run-troubleshoot.md#notebook-auto-run-troubleshoot-no-jobs) in the [Troubleshooting guide](notebook-auto-run-troubleshoot.md).

This page also contains a section listing the jobs that run from this job definition. Viewing your jobs on the **Job Definition** page can be a more efficient way to organize them than the **Notebook Jobs** tab, which combines the jobs from all your job definitions.

In addition, this page provides shortcuts for the following actions:
+ **Pause/Resume**: Pause your job definition, or resume a paused definition. Note that if a job is currently running for this definition, Studio does not stop it.
+ **Run**: Run a single on-demand job from this job definition. This option also lets you specify different input parameters to your notebook before starting the job.
+ **Edit Job Definition**: Change the schedule of your job definition. You can select a different time interval, or you can opt for a custom schedule using cron syntax.
+ **Delete Job Definition**: Remove the job definition from the **Notebook Job Definitions** tab. Note that if a job is currently running for this definition, Studio does not stop it.

# Troubleshooting guide
Troubleshooting guide

Refer to this troubleshooting guide to help you debug failures you might experience when your scheduled notebook job runs.

## Job definition doesn’t create jobs
Job definition doesn’t create jobs

If your job definition does not initiate any jobs, the notebook or training job may not be displayed in the **Jobs** section on the left navigation bar in Amazon SageMaker Studio. If this is the case, you can find error messages in the **Pipelines** section on the left navigation bar in Studio. Each notebook or training job definition belongs to an execution pipeline. The following are common causes for failing to initiate notebook jobs.

**Missing permissions**
+ The role assigned to the job definition does not have a trust relationship with Amazon EventBridge. That is, EventBridge cannot assume the role.
+ The role assigned to the job definition does not have permission to call `sagemaker:StartPipelineExecution`.
+ The role assigned to the job definition does not have permission to call `sagemaker:CreateTrainingJob`.
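To address the missing permissions above, the role assigned to the job definition needs both a trust policy and a permissions policy. The following JSON documents are minimal sketches, not complete production policies: the trust policy also includes the SageMaker AI service principal, which training jobs commonly require, and you should scope the `Resource` element down to specific ARNs in production.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": [
                    "events.amazonaws.com",
                    "sagemaker.amazonaws.com"
                ]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
```

A permissions policy granting the pipeline and training actions might look like the following:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sagemaker:StartPipelineExecution",
                "sagemaker:CreateTrainingJob"
            ],
            "Resource": "*"
        }
    ]
}
```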

**EventBridge quota exceeded**

If you see a `Put*` error such as the following example, you exceeded an EventBridge quota. To resolve this, you can clean up unused EventBridge rules, or ask AWS Support to increase your quota.

```
An error occurred (LimitExceededException) when calling the PutRule operation: 
The requested resource exceeds the maximum number allowed
```

For more information about EventBridge quotas, see [Amazon EventBridge quotas](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-quota.html).

**Pipeline quota limit exceeded**

If you see an error such as the following example, you exceeded the number of pipelines that you can run. To resolve this, you can clean up unused pipelines in your account, or ask AWS Support to increase your quota.

```
ResourceLimitExceeded: The account-level service limit 
'Maximum number of pipelines allowed per account' is XXX Pipelines, 
with current utilization of XXX Pipelines and a request delta of 1 Pipelines.
```

For more information about pipeline quotas, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html).

**Training job limit exceeded**

If you see an error such as the following example, you exceeded the number of training jobs that you can run. To resolve this, reduce the number of training jobs in your account, or ask AWS Support to increase your quota.

```
ResourceLimitExceeded: The account-level service limit 
'ml.m5.2xlarge for training job usage' is 0 Instances, with current 
utilization of 0 Instances and a request delta of 1 Instances. 
Please contact AWS support to request an increase for this limit.
```

For more information about training job quotas, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html).

## Auto visualizations disabled in SparkMagic notebooks


If your notebook uses the SparkMagic PySpark kernel and you run the notebook as a Notebook Job, you may see that your auto visualizations are disabled in the output. Turning on auto visualization causes the kernel to hang, so the notebook job executor currently disables auto visualizations as a workaround.

# Constraints and considerations
Constraints and considerations

Review the following constraints to ensure your notebook jobs complete successfully. Studio uses Papermill to run notebooks, so you might need to update your Jupyter notebooks to align with Papermill's requirements. There are also restrictions on the content of LCC scripts and important details to understand regarding VPC configuration.

## JupyterLab version


JupyterLab version 4.0 is supported.

## Installation of packages that require kernel restart


Papermill does not support calling `pip install` to install packages that require a kernel restart. In this situation, use `pip install` in an initialization script. For a package installation that does not require a kernel restart, you can still include `pip install` in the notebook. 
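As a hedged sketch, such an initialization script might look like the following. The environment name and package name are placeholders, and the exact activation command depends on your image:

```
#!/bin/bash
# Runs before the notebook job starts, so packages that require a
# kernel restart are already installed when the kernel launches.
set -eux

# Activate the kernel's environment (the name varies by image).
conda activate base

# Install the package; the kernel started afterward picks it up.
pip install --upgrade my-package
```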

## Kernel and language names registered with Jupyter


Papermill registers a translator for specific kernels and languages. If you bring your own image (BYOI), use a standard kernel name as shown in the following snippet:

```
papermill_translators.register("python", PythonTranslator)
papermill_translators.register("R", RTranslator)
papermill_translators.register("scala", ScalaTranslator)
papermill_translators.register("julia", JuliaTranslator)
papermill_translators.register("matlab", MatlabTranslator)
papermill_translators.register(".net-csharp", CSharpTranslator)
papermill_translators.register(".net-fsharp", FSharpTranslator)
papermill_translators.register(".net-powershell", PowershellTranslator)
papermill_translators.register("pysparkkernel", PythonTranslator)
papermill_translators.register("sparkkernel", ScalaTranslator)
papermill_translators.register("sparkrkernel", RTranslator)
papermill_translators.register("bash", BashTranslator)
```

## Parameters and environment variable limits


When you create your notebook job, it receives the parameters and environment variables you specify. You can pass up to 100 parameters. Each parameter name can be up to 256 characters long, and the associated value can be up to 2,500 characters long. If you pass environment variables, you can pass up to 28 variables; each variable name and its associated value can be up to 512 characters long. If you need more than 28 environment variables, define the additional variables in an initialization script, which has no limit on the number of environment variables you can use.
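It can help to check parameters and environment variables client-side against these documented limits before creating a job. The following helper is an illustrative sketch, not part of any SageMaker AI SDK:

```python
# Client-side check against the documented notebook job limits:
# up to 100 parameters (names up to 256 chars, values up to 2500 chars)
# and up to 28 environment variables (names and values up to 512 chars).
def validate_notebook_job_inputs(parameters, environment):
    if len(parameters) > 100:
        raise ValueError("At most 100 parameters are allowed")
    for name, value in parameters.items():
        if len(name) > 256:
            raise ValueError(f"Parameter name too long: {name!r}")
        if len(str(value)) > 2500:
            raise ValueError(f"Parameter value too long for {name!r}")
    if len(environment) > 28:
        raise ValueError("At most 28 environment variables are allowed")
    for name, value in environment.items():
        if len(name) > 512 or len(str(value)) > 512:
            raise ValueError(f"Environment variable too long: {name!r}")
    return True
```

If the environment variable check fails, you can move the extra variables into an initialization script as described above.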

## Viewing jobs and job definitions


If you schedule your notebook job in the Studio UI in the JupyterLab notebook, you can [view your notebook jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/view-notebook-jobs.html) and your [notebook job definitions](https://docs.aws.amazon.com/sagemaker/latest/dg/view-def-detail-notebook-auto-run.html) in the Studio UI. If you scheduled your notebook job with the SageMaker Python SDK, you can view your jobs only, because the SageMaker Python SDK notebook job step does not create job definitions. To view your jobs, you also need to supply additional tags to your notebook job step instance. For details, see [View your notebook jobs in the Studio UI dashboard](create-notebook-auto-run-sdk.md#create-notebook-auto-run-dash).

## Image


You need to manage image constraints depending on whether you run notebook jobs in Studio or the SageMaker Python SDK notebook job step in a pipeline.

### Image constraints for SageMaker AI Notebook Jobs (Studio)


**Image and kernel support.** The driver that launches your notebook job assumes the following:
+ A base Python runtime environment is installed in the Studio or bring-your-own (BYO) images and is the default in the shell.
+ The base Python runtime environment includes the Jupyter client with kernelspecs properly configured.
+ The base Python runtime environment includes the `pip` function so the notebook job can install system dependencies.
+ For images with multiple environments, your initialization script should switch to the proper kernel-specific environment before installing notebook-specific packages. After configuring the kernel-specific environment, switch back to the default Python runtime environment if the two differ.

The driver that launches your notebook job is a bash script, and Bash v4 must be available at `/bin/bash`. 

**Root privileges on bring-your-own-images (BYOI).** You must have root privileges on your own Studio images, either as the root user or through `sudo` access. If you are not a root user but accessing root privileges through `sudo`, use **1000/100** as the `UID/GID`.

### Image constraints for SageMaker AI Python SDK notebook jobs


The notebook job step supports the following images:
+ SageMaker Distribution Images listed in [Amazon SageMaker Images Available for Use With Studio Classic Notebooks](notebooks-available-images.md).
+ A custom image based on the SageMaker Distribution images in the previous list. Use a [SageMaker Distribution image](https://github.com/aws/sagemaker-distribution) as a base.
+ A custom image (BYOI) pre-installed with notebook job dependencies (for example, [sagemaker-headless-execution-driver](https://pypi.org/project/sagemaker-headless-execution-driver/)). Your image must meet the following requirements:
  + The image is pre-installed with notebook job dependencies.
  + A base Python runtime environment is installed and is default in the shell environment.
  + The base Python runtime environment includes the Jupyter client with kernelspecs properly configured.
  + You have root privileges, either as the root user or through `sudo` access. If you are not a root user but accessing root privileges through `sudo`, use **1000/100** as the `UID/GID`.

## VPC subnets used during job creation


If you use a VPC, Studio uses your private subnets to create your job. Specify one to five private subnets and 1 to 15 security groups.

If you use a VPC with private subnets, you must choose one of the following options to ensure the notebook job can connect to dependent services or resources:
+ If the job needs access to an AWS service that supports interface VPC endpoints, create an endpoint to connect to the service. For a list of services that support interface endpoints, see [AWS services that integrate with AWS PrivateLink](https://docs.aws.amazon.com/vpc/latest/privatelink/aws-services-privatelink-support.html). For information about creating an interface VPC endpoint, see [Access an AWS service using an interface VPC endpoint](https://docs.aws.amazon.com/vpc/latest/privatelink/create-interface-endpoint.html). At a minimum, you must provide an Amazon S3 gateway VPC endpoint.
+ If a notebook job needs access to an AWS service that doesn't support interface VPC endpoints or to a resource outside of AWS, create a NAT gateway and configure your security groups to allow outbound connections. For information about setting up a NAT gateway for your VPC, see *VPC with public and private Subnets (NAT)* in the [Amazon Virtual Private Cloud User Guide](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html).
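For example, you can create the required Amazon S3 gateway endpoint with the AWS CLI. The VPC and route table IDs below are placeholders, and this is a sketch rather than a complete network setup:

```
aws ec2 create-vpc-endpoint \
    --vpc-id vpc-0abc123 \
    --vpc-endpoint-type Gateway \
    --service-name com.amazonaws.us-west-2.s3 \
    --route-table-ids rtb-0def456
```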

## Service limits


Because the notebook job scheduler is built on the Pipelines, SageMaker Training, and Amazon EventBridge services, your notebook jobs are subject to their service-specific quotas. If you exceed these quotas, you may see error messages related to these services. For example, there are limits on how many pipelines you can run at one time and how many rules you can set up for a single event bus. For more information about SageMaker AI quotas, see [Amazon SageMaker AI Endpoints and Quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html). For more information about EventBridge quotas, see [Amazon EventBridge Quotas](https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-quota.html).

# Pricing for SageMaker Notebook Jobs


When you schedule notebook jobs, your Jupyter notebooks run on SageMaker training instances. After you select an **Image** and **Kernel** in your **Create Job** form, the form provides a list of available compute types. You are charged for the compute type you choose, based on the combined duration of use for all notebook jobs that run from the job definition. If you don’t specify a compute type, SageMaker AI assigns you a default Amazon EC2 instance type of `ml.m5.large`. For a breakdown of SageMaker AI pricing by compute type, see [Amazon SageMaker AI Pricing](https://aws.amazon.com/sagemaker/pricing).

# Schedule your ML workflows


With Amazon SageMaker AI you can manage your entire ML workflow as you create datasets, perform data transforms, build models from data, and deploy your models to endpoints for inference. If you perform any subset of your workflow steps periodically, you can also run those steps on a schedule. For example, you might want to schedule a job in SageMaker Canvas to run a transform on new data every hour. In another scenario, you might want to schedule a weekly job to monitor model drift of your deployed model. You can specify a recurring schedule with any time interval: for example, every minute, hour, day, week, or month, or a custom schedule such as the third Friday of every month at 3 PM.
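Custom schedules like "the 3rd Friday of every month at 3 PM" are written as EventBridge cron expressions, which use six fields: minutes, hours, day-of-month, month, day-of-week, and year. The following helper is an illustrative sketch (not part of any SageMaker AI SDK) that sanity-checks the field count of such an expression:

```python
import re

# Sanity-check an EventBridge-style cron expression: cron(<6 fields>).
# Fields: minutes, hours, day-of-month, month, day-of-week, year.
def eventbridge_cron_fields(expression):
    match = re.fullmatch(r"cron\(([^)]*)\)", expression.strip())
    if not match:
        raise ValueError("Expected the form cron(...)")
    fields = match.group(1).split()
    if len(fields) != 6:
        raise ValueError(f"Expected 6 fields, got {len(fields)}")
    return fields

# Third Friday of every month at 3 PM UTC.
print(eventbridge_cron_fields("cron(0 15 ? * FRI#3 *)"))
# → ['0', '15', '?', '*', 'FRI#3', '*']
```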

**The following scenarios summarize the options available to you depending on your use case.**
+ Use case 1: **Build and schedule your ML workflow in a no-code environment**. For beginners or those new to SageMaker AI, you can use Amazon SageMaker Canvas to both build your ML workflow and create scheduled runs using the Canvas UI-based scheduler.
+ Use case 2: **Build your workflow in a single Jupyter notebook and use a no-code scheduler**. Experienced ML practitioners can use code to build their ML workflow in a Jupyter notebook and use the no-code scheduling option available with the Notebook Jobs widget. If your ML workflow consists of multiple Jupyter notebooks, you can use the scheduling feature in the Pipelines Python SDK described in use case 3.
+ Use case 3: **Build and schedule your ML workflow using Pipelines**. Advanced users can use the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable), the Amazon SageMaker Pipelines visual editor, or the Amazon EventBridge scheduling options available with Pipelines. You can build an ML workflow composed of steps that include operations with various SageMaker AI features and AWS services, such as Amazon EMR.


| Descriptor | Use case 1 | Use case 2 | Use case 3 | 
| --- | --- | --- | --- | 
| SageMaker AI feature | Amazon SageMaker Canvas data processing and ML workflow scheduling | Notebook Jobs schedule widget (UI) | Pipelines Python SDK scheduling options | 
| Description | With Amazon SageMaker Canvas, you can schedule automatic runs of data processing steps and, in a separate procedure, automatic dataset updates. You can also indirectly schedule your entire ML workflow by setting up a configuration that runs a batch prediction whenever a specific dataset is updated. For both automated data processing and dataset updates, SageMaker Canvas provides a basic form where you select a start time and date and a time interval between runs (or a cron expression if you schedule a data processing step). For more information about how to schedule data processing steps, see [Create a schedule to automatically process new data](canvas-data-export-schedule-job.md). For more information about how to schedule dataset and batch prediction updates, see [How to manage automations](canvas-manage-automations.md). | If you built your data processing and pipeline workflow in a single Jupyter notebook, you can use the Notebook Jobs widget to run your notebook on demand or on a schedule. The Notebook Jobs widget displays a basic form where you specify the compute type, run schedule, and optional custom settings. You define your run schedule by selecting a time-based interval or by inserting a cron expression. The widget is automatically installed in Studio, or you can perform additional installation to use this feature in your local JupyterLab environment. For more information about Notebook Jobs, see [SageMaker Notebook Jobs](notebook-auto-run.md). | You can use the scheduling features in the SageMaker SDK if you implemented your ML workflow with Pipelines. Your pipeline can include steps such as fine-tuning, data processing, and deployment. Pipelines supports two ways to schedule your pipeline: you can create an Amazon EventBridge rule, or you can define a schedule using the SageMaker SDK [PipelineSchedule](https://sagemaker.readthedocs.io/en/stable/workflows/pipelines/sagemaker.workflow.pipelines.html#sagemaker.workflow.triggers.PipelineSchedule) constructor or the Amazon SageMaker Pipelines visual editor. For more information about the scheduling options available in Pipelines, see [Schedule Pipeline Runs](pipeline-eventbridge.md). | 
| Optimized for | Provides a scheduling option for a SageMaker Canvas ML workflow | Provides a UI-based scheduling option for Jupyter notebook-based ML workflows | Provides a SageMaker SDK or EventBridge scheduling option for ML workflows | 
| Considerations | You can schedule your workflow with the Canvas no-code framework, but dataset updates and batch transform updates can handle up to 5 GB of data. | You can schedule one notebook using the UI-based scheduling form, but not multiple notebooks, in the same job. To schedule multiple notebooks, use the Pipelines SDK code-based solution described in use case 3. | You can use the more advanced, SDK-based scheduling capabilities provided by Pipelines, but you need to reference the API documentation to specify the correct options rather than selecting from a UI-based menu of options. | 
| Recommended environment | Amazon SageMaker Canvas | Studio, local JupyterLab environment | Studio, local JupyterLab environment, any code editor | 

## Additional resources


**SageMaker AI offers the following additional options for scheduling your workflows.**
+ [What is Amazon EventBridge Scheduler?](https://docs.aws.amazon.com/scheduler/latest/UserGuide/what-is-scheduler.html). The scheduling options discussed in this section include pre-built options available in SageMaker Canvas, Studio, and the SageMaker AI Python SDK. All options extend the features of Amazon EventBridge, and you can also create your own custom scheduling solution with EventBridge.
+ [Scheduled and event based executions for Feature Processor pipelines](feature-store-feature-processor-schedule-pipeline.md). With Amazon SageMaker Feature Store Feature Processing, you can configure your Feature Processing pipelines to run on a schedule or as a result of another AWS service event.

# AWS Batch support for SageMaker AI training jobs
AWS Batch support for training jobs

An [AWS Batch job queue](https://docs.aws.amazon.com/batch/latest/userguide/job_queues.html) stores and prioritizes submitted jobs before they run on compute resources. You can submit SageMaker AI training jobs to a job queue in order to take advantage of the serverless job scheduling and prioritization tools provided by AWS Batch.

## How it works


The following steps describe the workflow of how to use an AWS Batch job queue with SageMaker AI training jobs. For more detailed tutorials and example notebooks, see the [Get started](#training-job-queues-get-started) section.
+ Set up AWS Batch and any necessary permissions. For more information, see [Setting up AWS Batch](https://docs.aws.amazon.com/batch/latest/userguide/get-set-up-for-aws-batch.html) in the *AWS Batch User Guide*.
+ Create the following AWS Batch resources in the console or using the AWS CLI:
  + [Service environment](https://docs.aws.amazon.com/batch/latest/userguide/service-environments.html) – Contains configuration parameters for integrating with SageMaker AI.
  + [SageMaker AI training job queue](https://docs.aws.amazon.com/batch/latest/userguide/create-sagemaker-job-queue.html) – Integrates with SageMaker AI to submit training jobs.
+ Configure the details of your SageMaker AI training job request, such as your training container image. To submit a training job to an AWS Batch queue, you can use the AWS CLI, the AWS SDK for Python (Boto3), or the SageMaker AI Python SDK.
+ Submit your training jobs to the job queue. You can use the following options to submit jobs:
  + Use the AWS Batch [SubmitServiceJob](https://docs.aws.amazon.com/batch/latest/APIReference/API_SubmitServiceJob.html) API.
  + Use the [`aws_batch` module](https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/aws_batch) from the SageMaker AI Python SDK. After creating a `TrainingQueue` object and a model training object (such as an `Estimator` or `ModelTrainer`), you can submit training jobs to the `TrainingQueue` using the `queue.submit()` method.
+ After submitting jobs, view your job queue and job status with the AWS Batch console, the AWS Batch [DescribeServiceJob](https://docs.aws.amazon.com/batch/latest/APIReference/API_DescribeServiceJob.html) API, or the SageMaker AI [DescribeTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrainingJob.html) API.
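As an illustrative sketch of the `SubmitServiceJob` path, you can build the request locally before submitting it with Boto3. The parameter shapes below follow the AWS Batch API reference; the queue name, role, image, and bucket are placeholders:

```python
import json

# Build a SubmitServiceJob request for a SageMaker AI training job.
# serviceRequestPayload is a JSON string shaped like a SageMaker
# CreateTrainingJob request (placeholder values throughout).
def build_submit_service_job_request(job_name, queue_name, training_request):
    return {
        "jobName": job_name,
        "jobQueue": queue_name,
        "serviceJobType": "SAGEMAKER_TRAINING",
        "serviceRequestPayload": json.dumps(training_request),
    }

request = build_submit_service_job_request(
    "demo-training-job",
    "my-sagemaker-training-queue",  # placeholder queue name
    {
        "TrainingJobName": "demo-training-job",
        "RoleArn": "arn:aws:iam::111122223333:role/ExampleTrainingRole",
        "AlgorithmSpecification": {
            "TrainingImage": "111122223333.dkr.ecr.us-west-2.amazonaws.com/example:latest",
            "TrainingInputMode": "File",
        },
        "ResourceConfig": {
            "InstanceType": "ml.m5.large",
            "InstanceCount": 1,
            "VolumeSizeInGB": 30,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
        "OutputDataConfig": {"S3OutputPath": "s3://amzn-s3-demo-bucket/output/"},
    },
)
# To submit: boto3.client("batch").submit_service_job(**request)
```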

## Cost and availability


For detailed pricing information about training jobs, see [Amazon SageMaker AI pricing](https://aws.amazon.com/sagemaker-ai/pricing/). With AWS Batch, you only pay for any AWS resources used, such as Amazon EC2 instances. For more information, see [AWS Batch pricing](https://aws.amazon.com/batch/pricing/).

You can use AWS Batch for SageMaker AI training jobs in any AWS Region where training jobs are available. For more information, see [Amazon SageMaker AI endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/sagemaker.html).

To ensure you have the required capacity when you need it, you can use SageMaker AI Flexible Training Plans (FTP). These plans allow you to reserve capacity for your training jobs. When combined with the queuing capabilities of AWS Batch, you can maximize utilization during your plan's duration. For more information, see [Reserve training plans for your training jobs or HyperPod clusters](https://docs.aws.amazon.com/sagemaker/latest/dg/reserve-capacity-with-training-plans.html).

## Get started


For a tutorial on how to set up an AWS Batch job queue and submit SageMaker AI training jobs, see [Getting started with AWS Batch on SageMaker AI](https://docs.aws.amazon.com/batch/latest/userguide/getting-started-sagemaker.html) in the *AWS Batch User Guide*.

For Jupyter notebooks that show how to use the `aws_batch` module in the SageMaker AI Python SDK, see the [AWS Batch for SageMaker AI Training jobs notebook examples in the amazon-sagemaker-examples GitHub repository](https://github.com/aws/amazon-sagemaker-examples/tree/default/build_and_train_models/sm-training-queues).

# Amazon SageMaker ML Lineage Tracking
ML Lineage Tracking

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the Studio Classic application. For information about using the updated Studio experience, see [Amazon SageMaker Studio](studio-updated.md).  
Studio Classic is still maintained for existing workloads but is no longer available for onboarding. You can only stop or delete existing Studio Classic applications and cannot create new ones. We recommend that you [migrate your workload to the new Studio experience](studio-updated-migrate.md).

Amazon SageMaker ML Lineage Tracking creates and stores information about the steps of a machine learning (ML) workflow from data preparation to model deployment. With the tracking information, you can reproduce the workflow steps, track model and dataset lineage, and establish model governance and audit standards.

SageMaker AI’s Lineage Tracking feature works in the backend to track all the metadata associated with your model training and deployment workflows. This includes your training jobs, datasets used, pipelines, endpoints, and the actual models. You can query the lineage service at any point to find the exact artifacts used to train a model. Using those artifacts, you can recreate the same ML workflow to reproduce the model as long as you have access to the exact dataset that was used. A trial component tracks the training job. This trial component has all the parameters used as part of the training job. If you don’t need to rerun the entire workflow, you can reproduce the training job to derive the same model.

With SageMaker AI Lineage Tracking, data scientists and model builders can do the following:
+ Keep a running history of model discovery experiments.
+ Establish model governance by tracking model lineage artifacts for auditing and compliance verification.

The following diagram shows an example lineage graph that Amazon SageMaker AI automatically creates in an end-to-end model training and deployment ML workflow.

![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/pipelines/PipelineLineageWorkflow.png)


**Topics**
+ [

# Lineage Tracking Entities
](lineage-tracking-entities.md)
+ [

# Amazon SageMaker AI–Created Tracking Entities
](lineage-tracking-auto-creation.md)
+ [

# Manually Create Tracking Entities
](lineage-tracking-manual-creation.md)
+ [

# Querying Lineage Entities
](querying-lineage-entities.md)
+ [

# Tracking Cross-Account Lineage
](xaccount-lineage-tracking.md)

# Lineage Tracking Entities
Tracking Entities

Tracking entities maintain a representation of all the elements of your end-to-end machine learning workflow. You can use this representation to establish model governance, reproduce your workflow, and maintain a record of your work history.

Amazon SageMaker AI automatically creates tracking entities for trial components and their associated trials and experiments when you create SageMaker AI jobs such as processing jobs, training jobs, and batch transform jobs. In addition to automatic tracking, you can also [Manually Create Tracking Entities](lineage-tracking-manual-creation.md) to model custom steps in your workflow. For more information, see [Amazon SageMaker Experiments in Studio Classic](experiments.md).

SageMaker AI also automatically creates tracking entities for the other steps in a workflow so you can track the workflow from end to end. For more information, see [Amazon SageMaker AI–Created Tracking Entities](lineage-tracking-auto-creation.md).

You can create additional entities to supplement those created by SageMaker AI. For more information, see [Manually Create Tracking Entities](lineage-tracking-manual-creation.md).

Whenever possible, SageMaker AI reuses an existing entity rather than creating a new one. For example, there can be only one artifact with a given `SourceUri`.

**Key concepts for querying lineage**
+ **Lineage** – Metadata that tracks the relationships between various entities in your ML workflows.
+ **QueryLineage** – The action to inspect your lineage and discover relationships between entities.
+ **Lineage entities** – The metadata elements of which your lineage is composed.
+ **Cross-account lineage** – Your ML workflow may span more than one account. With cross-account lineage, you can configure multiple accounts to automatically create lineage associations between shared entity resources. QueryLineage can then return entities from these shared accounts as well.

The following tracking entities are defined:

**Experiment entities**
+ [Trial component](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrialComponent.html) – A stage of a machine learning trial. Includes processing jobs, training jobs, and batch transform jobs.
+ [Trial](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrial.html) – A combination of trial components that generally produces a model.
+ [Experiment](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateExperiment.html) – A grouping of trials generally focused on solving a specific use case.

**Lineage entities**
+ [Trial Component](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrialComponent.html) – Represents processing, training, and transform jobs in the lineage. Also part of experiment management.
+ [Context](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateContext.html) – Provides a logical grouping of other tracking or experiment entities. Conceptually, experiments and trials are contexts. Some examples are an endpoint and a model package.
+ [Action](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateAction.html) – Represents an action or activity. Generally, an action involves at least one input artifact or output artifact. Some examples are a workflow step and a model deployment.
+ [Artifact](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateArtifact.html) – Represents a URI addressable object or data. An artifact is generally either an input or an output to a trial component or action. Some examples include a dataset (S3 bucket URI), or an image (Amazon ECR registry path).
+ [Association](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddAssociation.html) – Links other tracking or experiment entities, such as an association between the location of training data and a training job.

  An association has an optional `AssociationType` property. The following values are available along with the suggested use for each type. SageMaker AI places no restrictions on their use:
  + `ContributedTo` – The source contributed to the destination or had a part in enabling the destination. For example, the training data contributed to the training job.
  + `AssociatedWith` – The source is connected to the destination. For example, an approval workflow is associated with a model deployment.
  + `DerivedFrom` – The destination is a modification of the source. For example, a digest output of a channel input for a processing job is derived from the original inputs.
  + `Produced` – The source generated the destination. For example, a training job produced a model artifact.
  + `SameAs` – The same lineage entity is used in different accounts.
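
These suggested types also apply when you create associations yourself through the `AddAssociation` API. The following is a minimal sketch, assuming a boto3 SageMaker AI client (`boto3.client("sagemaker")`) is passed in; the helper name and the validation step are illustrative, not part of the API:

```
SUGGESTED_ASSOCIATION_TYPES = {
    "ContributedTo", "AssociatedWith", "DerivedFrom", "Produced", "SameAs",
}

def link_entities(client, source_arn, destination_arn, association_type="ContributedTo"):
    """Create a typed association between two lineage entities."""
    # Guard against typos; SageMaker AI itself places no restrictions on the type.
    if association_type not in SUGGESTED_ASSOCIATION_TYPES:
        raise ValueError("Unexpected AssociationType: " + association_type)
    return client.add_association(
        SourceArn=source_arn,
        DestinationArn=destination_arn,
        AssociationType=association_type,
    )
```

For example, calling `link_entities(sagemaker_client, training_data_arn, trial_component_arn)` would record that the training data contributed to the training job.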

**Common properties**
+ **Type property**

  The action, artifact, and context entities have a *type* property, `ActionType`, `ArtifactType`, and `ContextType`, respectively. This property is a custom string which can associate meaningful information with the entity and be used as a filter in the List APIs.
+ **Source property**

  The action, artifact, and context entities have a `Source` property. This property provides the underlying URI that the entity represents. Some examples are:
  + An `UpdateEndpoint` action where the source is the `EndpointArn`.
  + An image artifact for a processing job where the source is the `ImageUri`.
  + An `Endpoint` context where the source is the `EndpointArn`.
+ **Metadata property**

  The action and artifact entities have an optional `Metadata` property which can provide the following information:
  + `ProjectId` – For example, the ID of the SageMaker AI MLOps project to which a model belongs.
  + `GeneratedBy` – For example, the SageMaker AI pipeline execution that registered a model package version.
  + `Repository` – For example, the repository that contains an algorithm.
  + `CommitId` – For example, the commit ID of an algorithm version.
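
Taken together, these properties correspond to fields of the `CreateArtifact` request. The following sketch assembles such a request; the helper name and defaults are illustrative, and the resulting dict would be passed to a boto3 SageMaker AI client's `create_artifact` call:

```
def build_artifact_request(name, source_uri, artifact_type,
                           commit_id=None, repo=None, project_id=None):
    """Assemble a CreateArtifact request with type, source, and optional metadata."""
    request = {
        "ArtifactName": name,
        "ArtifactType": artifact_type,        # custom string; usable as a List-API filter
        "Source": {"SourceUri": source_uri},  # the underlying URI the entity represents
    }
    # Optional metadata properties; only include keys that were provided.
    metadata = {k: v for k, v in
                [("CommitId", commit_id), ("Repo", repo), ("ProjectId", project_id)] if v}
    if metadata:
        request["MetadataProperties"] = metadata
    return request

# Usage (client = boto3.client("sagemaker")):
# client.create_artifact(**build_artifact_request(
#     "source-code-location", "s3://amzn-s3-demo-bucket/code", "code-location",
#     commit_id="abc123"))
```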

# Amazon SageMaker AI–Created Tracking Entities
SageMaker AI-Created Entities

Amazon SageMaker AI automatically creates tracking entities for SageMaker AI jobs, models, model packages, and endpoints if the data is available. There is no limit to the number of lineage entities created by SageMaker AI.

For information on how you can manually create tracking entities, see [Manually Create Tracking Entities](lineage-tracking-manual-creation.md).

**Topics**
+ [

## Tracking Entities for SageMaker AI Jobs
](#lineage-tracking-auto-creation-jobs)
+ [

## Tracking Entities for Model Packages
](#lineage-tracking-auto-creation-model-package)
+ [

## Tracking Entities for Endpoints
](#lineage-tracking-auto-creation-endpoint)

## Tracking Entities for SageMaker AI Jobs


SageMaker AI creates a trial component for each SageMaker AI job and associates the trial component with the job. SageMaker AI also creates artifacts to track the job metadata and associations between each artifact and the job.

Artifacts are created for the following job properties and associated with the Amazon Resource Name (ARN) of the SageMaker AI job. The artifact `SourceUri` is listed in parentheses.

**Training Job**
+ The image that contains the training algorithm (`TrainingImage`).
+ The data source of each input channel (`S3Uri`).
+ The location for the model (`S3OutputPath`).
+ The location for the managed spot checkpoint data (`S3Uri`).

**Processing Job**
+ The container to be run by the processing job (`ImageUri`).
+ The data location for each processing input and processing output (`S3Uri`).

**Transform Job**
+ The input data source to be transformed (`S3Uri`).
+ The results of the transform (`S3OutputPath`).

**Note**  
Amazon Simple Storage Service (Amazon S3) artifacts are tracked based on the Amazon S3 URI values provided to the Create API, for example [CreateTrainingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTrainingJob.html), and not on the Amazon S3 key, hash, or ETag values of each file.
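
Because these artifacts are keyed by the URI values above, you can look up what SageMaker AI created for a given URI with the `ListArtifacts` API. The following sketch, assuming a boto3 SageMaker AI client is passed in, follows pagination (the helper name is illustrative):

```
def artifacts_for_source(client, source_uri):
    """Return all artifact summaries whose SourceUri matches, following pagination."""
    summaries, token = [], None
    while True:
        kwargs = {"SourceUri": source_uri}
        if token:
            kwargs["NextToken"] = token
        page = client.list_artifacts(**kwargs)
        summaries.extend(page.get("ArtifactSummaries", []))
        token = page.get("NextToken")
        if not token:
            return summaries
```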

## Tracking Entities for Model Packages


The following entities are created:

**Model Packages**
+ A context for each model package group.
+ An artifact for each model package.
+ An association between each model package artifact and the context for each model package group to which the package belongs.
+ An action for the creation of a model package version.
+ An association between the model package artifact and the creation action.
+ An association between the model package artifact and each model package group context to which the package belongs.
+ Inference containers
  + An artifact for the image used in each container defined in the model package.
  + An artifact for the model used in each container.
  + An association between each artifact and the model package artifact.
+ Algorithms
  + An artifact for each algorithm defined in the model package.
  + An artifact for the model created by each algorithm.
  + An association between each artifact and the model package artifact.
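
Given these entities, you can walk from a model package group to its package artifacts through the group's context. The following sketch assumes a boto3 SageMaker AI client is passed in; the helper name is illustrative, and for brevity it takes the first matching context and only the first page of associations:

```
def package_artifacts_in_group(client, model_package_group_arn):
    """Find the group's context, then list the artifacts associated with it."""
    contexts = client.list_contexts(SourceUri=model_package_group_arn)["ContextSummaries"]
    if not contexts:
        return []
    associations = client.list_associations(
        DestinationArn=contexts[0]["ContextArn"])["AssociationSummaries"]
    return [a["SourceArn"] for a in associations]
```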

## Tracking Entities for Endpoints
Tracking Endpoints

The following entities are created by Amazon SageMaker AI:

**Endpoints**
+ A context for each endpoint
+ An action for the model deployment that created each endpoint
+ An artifact for each model deployed to the endpoint
+ An artifact for the image used in the model
+ An artifact for the model package for the model
+ An artifact for each image deployed to the endpoint
+ An association between each artifact and the model deployment action
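
Starting from the endpoint context, you can walk these associations back to the deployment action and its artifacts. The following sketch assumes a boto3 SageMaker AI client is passed in; the helper name and any `SourceType` filter value you supply are illustrative:

```
def sources_for_context(client, context_arn, source_type=None):
    """List source ARNs associated with a context, optionally filtered by SourceType."""
    summaries = client.list_associations(
        DestinationArn=context_arn)["AssociationSummaries"]
    return [s["SourceArn"] for s in summaries
            if source_type is None or s.get("SourceType") == source_type]
```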

# Manually Create Tracking Entities
Manually Create Entities

You can manually create tracking entities for any property to establish model governance, reproduce your workflow, and maintain a record of your work history. For information on the tracking entities that Amazon SageMaker AI automatically creates, see [Amazon SageMaker AI–Created Tracking Entities](lineage-tracking-auto-creation.md). The following tutorial demonstrates the steps needed to manually create and associate artifacts between a SageMaker AI training job and an endpoint, and then track the workflow.

You can add tags to all entities except associations. Tags are arbitrary key-value pairs that provide custom information. You can filter or sort a list or search query by tags. For more information, see [Tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html) in the *AWS General Reference*.
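
As a minimal sketch, tags can be attached to an entity's ARN with the `AddTags` API. The helper below assumes a boto3 SageMaker AI client is passed in; the helper name is illustrative, and note that associations cannot be tagged:

```
def tag_entity(client, resource_arn, tags):
    """Attach key-value tags to a lineage entity such as an artifact or context."""
    return client.add_tags(
        ResourceArn=resource_arn,
        Tags=[{"Key": k, "Value": v} for k, v in tags.items()],
    )
```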

For a sample notebook that demonstrates how to create lineage entities, see the [Amazon SageMaker AI Lineage](https://github.com/aws/amazon-sagemaker-examples/tree/master/sagemaker-lineage) notebook in the [Amazon SageMaker example GitHub repository](https://github.com/awslabs/amazon-sagemaker-examples).

**Topics**
+ [

## Manually Create Entities
](#lineage-tracking-manual-create)
+ [

## Manually Track a Workflow
](#lineage-tracking-manual-track)
+ [

## Limits
](#lineage-tracking-manual-track-limits)

## Manually Create Entities


The following procedure shows you how to create and associate artifacts between a SageMaker AI training job and endpoint. You perform the following steps:

**Import tracking entities and associations**

1. Import the lineage tracking entities.

   ```
   import sys
   !{sys.executable} -m pip install -q sagemaker
   
   import logging
   
   from sagemaker import get_execution_role
   from sagemaker.session import Session
   from sagemaker.lineage import context, artifact, association, action
   
   import boto3
   region = boto3.Session().region_name  # or set your AWS Region explicitly
   boto_session = boto3.Session(region_name=region)
   sagemaker_client = boto_session.client("sagemaker")
   ```

1. Create the input and output artifacts.

   ```
   code_location_arn = artifact.Artifact.create(
       artifact_name='source-code-location',
       source_uri='s3://...',
       artifact_type='code-location'
   ).artifact_arn
   
   # Similar constructs for train_data_location_arn and test_data_location_arn
   
   model_location_arn = artifact.Artifact.create(
       artifact_name='model-location',
       source_uri='s3://...',
       artifact_type='model-location'
   ).artifact_arn
   ```

1. Train the model and get the `trial_component_arn` that represents the training job.

1. Associate the input artifacts and output artifacts with the training job (trial component).

   ```
   input_artifacts = [code_location_arn, train_data_location_arn, test_data_location_arn]
   for artifact_arn in input_artifacts:
       try:
           association.Association.create(
               source_arn=artifact_arn,
               destination_arn=trial_component_arn,
               association_type='ContributedTo'
           )
       except Exception:
           logging.info('Association between %s and %s already exists', artifact_arn, trial_component_arn)
   
   output_artifacts = [model_location_arn]
   for artifact_arn in output_artifacts:
       try:
           association.Association.create(
               source_arn=trial_component_arn,
               destination_arn=artifact_arn,
               association_type='Produced'
           )
       except Exception:
           logging.info('Association between %s and %s already exists', trial_component_arn, artifact_arn)
   ```

1. Create the inference endpoint.

   ```
   predictor = mnist_estimator.deploy(initial_instance_count=1,
                                      instance_type='ml.m4.xlarge')
   ```

1. Create the endpoint context.

   ```
   from sagemaker.lineage import context
   
   endpoint = sagemaker_client.describe_endpoint(EndpointName=predictor.endpoint_name)
   endpoint_arn = endpoint['EndpointArn']
   
   endpoint_context_arn = context.Context.create(
       context_name=predictor.endpoint_name,
       context_type='Endpoint',
       source_uri=endpoint_arn
   ).context_arn
   ```

1. Associate the training job (trial component) and endpoint context.

   ```
   association.Association.create(
       source_arn=trial_component_arn,
       destination_arn=endpoint_context_arn
   )
   ```

## Manually Track a Workflow


You can manually track the workflow created in the previous section.

Given the endpoint Amazon Resource Name (ARN) from the previous example, the following procedure shows you how to track the workflow back to the datasets used to train the model that was deployed to the endpoint. You perform the following steps:

**To track a workflow from endpoint to training data source**

1. Import the tracking entities.

   ```
   import sys
   !{sys.executable} -m pip install -q sagemaker
   
   from sagemaker import get_execution_role
   from sagemaker.session import Session
   from sagemaker.lineage import context, artifact, association, action
   
   import boto3
   region = boto3.Session().region_name  # or set your AWS Region explicitly
   boto_session = boto3.Session(region_name=region)
   sagemaker_client = boto_session.client("sagemaker")
   ```

1. Get the endpoint context from the endpoint ARN.

   ```
   endpoint_context_arn = sagemaker_client.list_contexts(
       SourceUri=endpoint_arn)['ContextSummaries'][0]['ContextArn']
   ```

1. Get the trial component from the association between the trial component and the endpoint context.

   ```
   trial_component_arn = sagemaker_client.list_associations(
       DestinationArn=endpoint_context_arn)['AssociationSummaries'][0]['SourceArn']
   ```

1. Get the training data location artifact from its association with the trial component.

   ```
   train_data_location_artifact_arn = sagemaker_client.list_associations(
       DestinationArn=trial_component_arn, SourceType='Model')['AssociationSummaries'][0]['SourceArn']
   ```

1. Get the training data location from the training data location artifact.

   ```
   train_data_location = sagemaker_client.describe_artifact(
       ArtifactArn=train_data_location_artifact_arn)['Source']['SourceUri']
   print(train_data_location)
   ```

   Response:

   ```
   s3://sagemaker-sample-data-us-east-2/mxnet/mnist/train
   ```

## Limits


You can create an association between any entities, experiment and lineage, except the following:
+ You cannot create an association between two experiment entities. Experiment entities consist of experiments, trials, and trial components.
+ You cannot create an association with another association.

An error occurs if you try to create an entity that already exists.

**Maximum number of manually created lineage entities**
+ Actions: 3000
+ Artifacts: 6000
+ Associations: 6000
+ Contexts: 500

There is no limit to the number of lineage entities automatically created by Amazon SageMaker AI.

# Querying Lineage Entities
Querying Lineage Entities

Amazon SageMaker AI automatically generates graphs of lineage entities as you use them. You can query this data to answer a variety of questions. The following provides instructions on how to query this data using the SageMaker AI SDK for Python.

For information on how to view a registered model lineage in Amazon SageMaker Studio, see [View model lineage details in Studio](model-registry-lineage-view-studio.md).

You can query your lineage entities to:
+ Retrieve all data sets that went into the creation of a model.
+ Retrieve all jobs that went into the creation of an endpoint.
+ Retrieve all models that use a data set.
+ Retrieve all endpoints that use a model.
+ Retrieve which endpoints are derived from a certain data set.
+ Retrieve the pipeline execution that created a training job.
+ Retrieve the relationships between entities for investigation, governance, and reproducibility.
+ Retrieve all downstream trials that use the artifact.
+ Retrieve all upstream trials that use the artifact.
+ Retrieve a list of artifacts that use the provided S3 URI.
+ Retrieve upstream artifacts that use the dataset artifact.
+ Retrieve downstream artifacts that use the dataset artifact.
+ Retrieve datasets that use the image artifact.
+ Retrieve actions that use the context.
+ Retrieve processing jobs that use the endpoint.
+ Retrieve transform jobs that use the endpoint.
+ Retrieve trial components that use the endpoint.
+ Retrieve the ARN for the pipeline execution associated with the model package group.
+ Retrieve all artifacts that use the action.
+ Retrieve all upstream datasets that use the model package approval action.
+ Retrieve model package from model package approval action.
+ Retrieve downstream endpoint contexts that use the endpoint.
+ Retrieve the ARN for the pipeline execution associated with the trial component.
+ Retrieve datasets that use the trial component.
+ Retrieve models that use the trial component.
+ Explore your lineage for visualization.

**Limitations**
+ Lineage querying is not available in the following Regions:
  + Africa (Cape Town) – af-south-1
  + Asia Pacific (Jakarta) – ap-southeast-3
  + Asia Pacific (Osaka) – ap-northeast-3
  + Europe (Milan) – eu-south-1
  + Europe (Spain) – eu-south-2
  + Israel (Tel Aviv) – il-central-1
+ The maximum depth of relationships to discover is currently limited to 10.
+ Filtering is limited to the following properties: last modified date, created date, type, and lineage entity type. 

**Topics**
+ [

## Getting Started with Querying Lineage Entities
](#querying-lineage-entities-getting-started)

## Getting Started with Querying Lineage Entities
Getting Started

The easiest way to get started is with one of the following:
+ The [Amazon SageMaker AI SDK for Python](https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/lineage/artifact.py#L397), which defines many common use cases.
+ The [sagemaker-lineage-multihop-queries.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-lineage/sagemaker-lineage-multihop-queries.ipynb) notebook, which demonstrates how to use the SageMaker AI Lineage APIs to query relationships across the lineage graph.

The following examples show how to use the `LineageQuery` and `LineageFilter` APIs to construct queries to answer questions about the Lineage Graph and extract entity relationships for a few use cases.

**Example Using the `LineageQuery` API to find entity associations**  

```
from sagemaker.lineage.context import Context, EndpointContext
from sagemaker.lineage.action import Action
from sagemaker.lineage.association import Association
from sagemaker.lineage.artifact import Artifact, ModelArtifact, DatasetArtifact

from sagemaker.lineage.query import (
    LineageQuery,
    LineageFilter,
    LineageSourceEnum,
    LineageEntityEnum,
    LineageQueryDirectionEnum,
)
# Find the endpoint context and model artifact that should be used for the lineage queries.

contexts = Context.list(source_uri=endpoint_arn)
context_name = list(contexts)[0].context_name
endpoint_context = EndpointContext.load(context_name=context_name)
```

**Example Find all the datasets associated with an endpoint**  

```
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `DATASET`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.DATASET]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all datasets.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the datasets
dataset_artifacts = []
for vertex in query_result.vertices:
    dataset_artifacts.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(dataset_artifacts)
```

**Example Find the models associated with an endpoint**  

```
# Define the LineageFilter to look for entities of type `ARTIFACT` and the source of type `MODEL`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT], sources=[LineageSourceEnum.MODEL]
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all models.

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the lineage objects corresponding to the model
model_artifacts = []
for vertex in query_result.vertices:
    model_artifacts.append(vertex.to_lineage_object().source.source_uri)

# The results of the `LineageQuery` API call return the ARN of the model deployed to the endpoint along with
# the S3 URI to the model.tar.gz file associated with the model
pp.pprint(model_artifacts)
```

**Example Find the trial components associated with the endpoint**  

```
# Define the LineageFilter to look for entities of type `TRIAL_COMPONENT` and the source of type `TRAINING_JOB`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.TRIAL_COMPONENT],
    sources=[LineageSourceEnum.TRAINING_JOB],
)

# Providing this `LineageFilter` to the `LineageQuery` constructs a query that traverses through the given context `endpoint_context`
# and finds all associated training jobs (trial components).

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[endpoint_context.context_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

# Parse through the query results to get the ARNs of the training jobs associated with this Endpoint
trial_components = []
for vertex in query_result.vertices:
    trial_components.append(vertex.arn)

pp.pprint(trial_components)
```

**Example Changing the focal point of lineage**  
The `LineageQuery` can be modified to have different `start_arns`, which changes the focal point of lineage. In addition, the `LineageFilter` can take multiple sources and entities to expand the scope of the query.  
In the following example, we use the model as the lineage focal point and find the endpoints and datasets associated with it.  

```
# Get the ModelArtifact

model_artifact_summary = list(Artifact.list(source_uri=model_package_arn))[0]
model_artifact = ModelArtifact.load(artifact_arn=model_artifact_summary.artifact_arn)
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that descend from the model, i.e. the endpoint
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # Find all the entities that ascend from the model, i.e. the datasets
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
```

**Example Using `LineageQueryDirectionEnum.BOTH` to find ascendent and descendent relationships**  
When the direction is set to `BOTH`, the query traverses the graph to find both ascendant and descendant relationships. This traversal takes place not only from the starting node, but from each node that is visited. For example, if a training job is run twice and both models generated by the training job are deployed to endpoints, the result of a query with the direction set to `BOTH` shows both endpoints. This is because the same image is used for training and deploying the model. Because the image is common to the model (the `start_arn`) and to both endpoints, both endpoints appear in the query result.  

```
query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[LineageSourceEnum.ENDPOINT, LineageSourceEnum.DATASET],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],  # Model is the starting artifact
    query_filter=query_filter,
    # This specifies that the query should look for associations both ascending and descending for the start
    direction=LineageQueryDirectionEnum.BOTH,
    include_edges=False,
)

associations = []
for vertex in query_result.vertices:
    associations.append(vertex.to_lineage_object().source.source_uri)

pp.pprint(associations)
```

**Example Directions in `LineageQuery` - `ASCENDANTS` vs. `DESCENDANTS`**  
To understand the direction in the Lineage Graph, take the following entity relationship chain: Dataset -> Training Job -> Model -> Endpoint.  
The endpoint is a descendant of the model, and the model is a descendant of the dataset. Similarly, the model is an ascendant of the endpoint. The `direction` parameter can be used to specify whether the query should return entities that are descendants or ascendants of the entity in `start_arns`. If the `start_arns` contains a model and the direction is `DESCENDANTS`, the query returns the endpoint. If the direction is `ASCENDANTS`, the query returns the dataset.  

```
# In this example, we'll look at the impact of specifying the direction as ASCENDANT or DESCENDANT in a `LineageQuery`.

query_filter = LineageFilter(
    entities=[LineageEntityEnum.ARTIFACT],
    sources=[
        LineageSourceEnum.ENDPOINT,
        LineageSourceEnum.MODEL,
        LineageSourceEnum.DATASET,
        LineageSourceEnum.TRAINING_JOB,
    ],
)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.ASCENDANTS,
    include_edges=False,
)

ascendant_artifacts = []

# The lineage entity returned for the Training Job is a TrialComponent which can't be converted to a
# lineage object using the method `to_lineage_object()` so we extract the TrialComponent ARN.
for vertex in query_result.vertices:
    try:
        ascendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except:
        ascendant_artifacts.append(vertex.arn)

print("Ascendant artifacts : ")
pp.pprint(ascendant_artifacts)

query_result = LineageQuery(sagemaker_session).query(
    start_arns=[model_artifact.artifact_arn],
    query_filter=query_filter,
    direction=LineageQueryDirectionEnum.DESCENDANTS,
    include_edges=False,
)

descendant_artifacts = []
for vertex in query_result.vertices:
    try:
        descendant_artifacts.append(vertex.to_lineage_object().source.source_uri)
    except:
        # Handling TrialComponents.
        descendant_artifacts.append(vertex.arn)

print("Descendant artifacts : ")
pp.pprint(descendant_artifacts)
```

**Example SDK helper functions to make lineage queries easier**  
The classes `EndpointContext`, `ModelArtifact`, and `DatasetArtifact` have helper functions that are wrappers over the `LineageQuery` API and make certain lineage queries easier. The following example shows how to use these helper functions.  

```
# Find all the datasets associated with this endpoint

datasets = []
dataset_artifacts = endpoint_context.dataset_artifacts()
for dataset in dataset_artifacts:
    datasets.append(dataset.source.source_uri)
print("Datasets : ", datasets)

# Find the training jobs associated with the endpoint
training_job_artifacts = endpoint_context.training_job_arns()
training_jobs = []
for training_job in training_job_artifacts:
    training_jobs.append(training_job)
print("Training Jobs : ", training_jobs)

# Get the ARN for the pipeline execution associated with this endpoint (if any)
pipeline_executions = endpoint_context.pipeline_execution_arn()
if pipeline_executions:
    for pipeline in pipeline_executions:
        print(pipeline)

# Here we use the `ModelArtifact` class to find all the datasets and endpoints associated with the model

dataset_artifacts = model_artifact.dataset_artifacts()
endpoint_contexts = model_artifact.endpoint_contexts()

datasets = [dataset.source.source_uri for dataset in dataset_artifacts]
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Datasets associated with this model : ")
pp.pprint(datasets)

print("Endpoints associated with this model : ")
pp.pprint(endpoints)

# Here we use the `DatasetArtifact` class to find all the endpoints hosting models that were trained with a particular dataset
# Find the artifact associated with the dataset

dataset_artifact_arn = list(Artifact.list(source_uri=training_data))[0].artifact_arn
dataset_artifact = DatasetArtifact.load(artifact_arn=dataset_artifact_arn)

# Find the endpoints that used this training dataset
endpoint_contexts = dataset_artifact.endpoint_contexts()
endpoints = [endpoint.source.source_uri for endpoint in endpoint_contexts]

print("Endpoints associated with the training dataset {}".format(training_data))
pp.pprint(endpoints)
```

**Example Getting a Lineage graph visualization**  
A helper class, `Visualizer`, is provided in the sample file [visualizer.py](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-lineage/visualizer.py) to help plot the lineage graph. When the query response is rendered, a graph with the lineage relationships from the `StartArns` is displayed. From the `StartArns`, the visualization shows the relationships with the other lineage entities returned in the `query_lineage` API action.  

```
# Graph APIs
# Here we use the boto3 `query_lineage` API to generate the query response to plot.

from visualizer import Visualizer

query_response = sm_client.query_lineage(
    StartArns=[endpoint_context.context_arn], Direction="Ascendants", IncludeEdges=True
)

viz = Visualizer()
viz.render(query_response, "Endpoint")
        
query_response = sm_client.query_lineage(
    StartArns=[model_artifact.artifact_arn], Direction="Ascendants", IncludeEdges=True
)
viz.render(query_response, "Model")
```

# Tracking Cross-Account Lineage


Amazon SageMaker AI supports tracking lineage entities from a different AWS account. Other AWS accounts can share their lineage entities with you and you can access these lineage entities through direct API calls or SageMaker AI lineage queries.

SageMaker AI uses [AWS Resource Access Manager](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) to help you securely share your lineage resources. You can share your resources through the [AWS RAM console](https://console.aws.amazon.com/ram/home).



## Set Up Cross-Account Lineage Tracking

You can group and share your [Lineage Tracking Entities](lineage-tracking-entities.md) through a lineage group in Amazon SageMaker AI. SageMaker AI supports only one default lineage group per account. SageMaker AI creates the default lineage group whenever a lineage entity is created in your account. Every lineage entity owned by your account is assigned to this default lineage group. To share lineage entities with another account, you share this default lineage group with that account.

**Note**  
You can share all lineage tracking entities in a lineage group or none.

Create a resource share for your lineage entities using the AWS Resource Access Manager console. For more information, see [Sharing your AWS resources](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html) in the *AWS Resource Access Manager User Guide*.

**Note**  
After the resource share is created, it can take a few minutes for the resource and principal associations to complete. Once the association is set, the shared account receives an invitation to join the resource share. The shared account must accept the invite to gain access to shared resources. For more information on accepting a resource share invite in AWS RAM, see [Using shared AWS resources ](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-shared.html) in the *AWS Resource Access Manager User Guide*.
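The resource share can also be created programmatically. The following is a minimal sketch of the request parameters for the AWS RAM `create_resource_share` API action; the lineage group ARN, share name, and consumer account ID are placeholder values:

```python
# Placeholder values: the default lineage group ARN and the account to share with
lineage_group_arn = (
    "arn:aws:sagemaker:us-west-2:111111111111:"
    "lineage-group/sagemaker-default-lineage-group"
)
consumer_account = "111122223333"

# Parameters for the AWS RAM create_resource_share API action
share_request = {
    "name": "lineage-group-share",
    "resourceArns": [lineage_group_arn],  # the default lineage group
    "principals": [consumer_account],     # the account to share with
}

# With a configured Boto3 client, the share would be created with:
# ram = boto3.client("ram")
# response = ram.create_resource_share(**share_request)
```

The consumer account must still accept the resulting invitation before it can access the shared lineage entities.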

### Your cross-account lineage tracking resource policy

Amazon SageMaker AI supports only one type of resource policy. The SageMaker AI resource policy must allow all of the following operations:

```
"sagemaker:DescribeAction"
"sagemaker:DescribeArtifact"
"sagemaker:DescribeContext"
"sagemaker:DescribeTrialComponent"
"sagemaker:AddAssociation"
"sagemaker:DeleteAssociation"
"sagemaker:QueryLineage"
```

**Example The following is a SageMaker AI resource policy created using AWS Resource Access Manager to create a resource share for an account's lineage group.**  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "FullLineageAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "111122223333"
      },
      "Action": [
        "sagemaker:DescribeAction",
        "sagemaker:DescribeArtifact",
        "sagemaker:DescribeContext",
        "sagemaker:DescribeTrialComponent",
        "sagemaker:AddAssociation",
        "sagemaker:DeleteAssociation",
        "sagemaker:QueryLineage"
      ],
      "Resource": "arn:aws:sagemaker:us-west-2:111111111111:lineage-group/sagemaker-default-lineage-group"
    }
  ]
}
```

## Tracking Cross-Account Lineage Entities

With cross-account lineage tracking you can associate lineage entities in different accounts using the same `AddAssociation` API action. When you associate two lineage entities, SageMaker AI validates if you have permissions to perform the `AddAssociation` API action on both lineage entities. SageMaker AI then establishes the association. If you don’t have the permissions, SageMaker AI *does not* create the association. Once the cross-account association is established, you can access either lineage entity from the other through the `QueryLineage` API action. For more information, see [Querying Lineage Entities](querying-lineage-entities.md).
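As a sketch of the request shape, a cross-account association built for the `AddAssociation` API action might look like the following. Both ARNs are placeholders; the call succeeds only if you have `AddAssociation` permissions on both lineage entities:

```python
# Placeholder ARNs: an artifact in your account and one shared by another account
local_artifact_arn = "arn:aws:sagemaker:us-west-2:222222222222:artifact/local-example"
shared_artifact_arn = "arn:aws:sagemaker:us-west-2:111111111111:artifact/shared-example"

# Parameters for the AddAssociation API action
association_request = {
    "SourceArn": local_artifact_arn,
    "DestinationArn": shared_artifact_arn,
    "AssociationType": "ContributedTo",  # one of the supported association types
}

# With a configured SageMaker AI Boto3 client:
# sm_client.add_association(**association_request)
```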

In addition to SageMaker AI automatically creating lineage entities, if you have cross-account access, SageMaker AI connects artifacts that reference the same object or data. If the data from one account is used in lineage tracking by different accounts, SageMaker AI creates an artifact in each account to track that data. With cross-account lineage, whenever SageMaker AI creates new artifacts, SageMaker AI checks if there are other artifacts created for the same data that are also shared with you. SageMaker AI then establishes associations between the newly created artifact and each of the artifacts shared with you, with the `AssociationType` set to `SameAs`. You can then use the [`QueryLineage`](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_QueryLineage.html) API action to traverse the lineage entities in your own account to lineage entities shared with you but owned by a different AWS account. For more information, see [Querying Lineage Entities](querying-lineage-entities.md).

**Topics**
+ [

### Accessing lineage resources from a different account
](#tracking-lineage-xaccount-accessing-resources)
+ [

### Authorization for querying cross-account lineage entities
](#tracking-lineage-xaccount-authorization)

### Accessing lineage resources from a different account

Once the cross-account access for sharing lineage has been set up, you can call the following SageMaker API actions directly with the ARN to describe the shared lineage entities from another account:
+ [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAction.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeAction.html)
+ [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeArtifact.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeArtifact.html)
+ [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeContext.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeContext.html)
+ [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrialComponent.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeTrialComponent.html)

You can also manage [Associations](https://docs.aws.amazon.com/sagemaker/latest/dg/lineage-tracking-entities.html) for lineage entities owned by different accounts that are shared with you, using the following SageMaker API actions:
+ [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddAssociation.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddAssociation.html)
+ [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DeleteAssociation.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DeleteAssociation.html)

For a notebook that demonstrates how to use SageMaker AI Lineage APIs to query lineage across accounts, see [sagemaker-lineage-cross-account-with-ram.ipynb](https://github.com/aws/amazon-sagemaker-examples/blob/master/sagemaker-lineage/sagemaker-lineage-cross-account-with-ram.ipynb).

### Authorization for querying cross-account lineage entities

Amazon SageMaker AI must validate that you have permissions to perform the `QueryLineage` API action on the `StartArns`. This is enforced through the resource policy attached to the `LineageGroup`. The result from this action includes all the lineage entities to which you have access, whether they are owned by your account or shared by another account. For more information, see [Querying Lineage Entities](querying-lineage-entities.md).
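For example, a `QueryLineage` request that starts from a lineage entity shared by another account can be built as follows; the context ARN is a placeholder:

```python
# Placeholder ARN of a lineage context shared by another account
start_arn = (
    "arn:aws:sagemaker:us-west-2:111111111111:"
    "context/shared-endpoint-context"
)

# Parameters for the QueryLineage API action
query_request = {
    "StartArns": [start_arn],
    "Direction": "Ascendants",
    "IncludeEdges": True,
}

# With a configured SageMaker AI Boto3 client:
# response = sm_client.query_lineage(**query_request)
# The response contains only the vertices and edges you are authorized to access.
```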

# Model Registration Deployment with Model Registry

With the Amazon SageMaker Model Registry you can do the following:
+ Catalog models for production.
+ Manage model versions.
+ Associate metadata, such as training metrics, with a model.
+ View information from Amazon SageMaker Model Cards in your registered models. 
+ View model lineage for traceability and reproducibility.
+ Define a staging construct that models can progress through for your model lifecycle.
+ Manage the approval status of a model.
+ Deploy models to production.
+ Automate model deployment with CI/CD.
+ Share models with other users.

Catalog models by creating SageMaker Model Registry Model (Package) Groups that contain different versions of a model. You can create a Model Group that tracks all of the models that you train to solve a particular problem. You can then register each model you train and the Model Registry adds it to the Model Group as a new model version. Lastly, you can create categories of Model Groups by further organizing them into SageMaker Model Registry Collections. A typical workflow might look like the following:
+ Create a Model Group.
+ Create an ML pipeline that trains a model. For information about SageMaker pipelines, see [Pipelines actions](pipelines-build.md).
+ For each run of the ML pipeline, create a model version that you register in the Model Group you created in the first step.
+ Add your Model Group into one or more Model Registry Collections.

For details about how to create and work with models, model versions, and Model Groups, see [Model Registry Models, Model Versions, and Model Groups](model-registry-models.md). Optionally, if you want to further group your Model Groups into Collections, see [Model Registry Collections](modelcollections.md).

# Model Registry Models, Model Versions, and Model Groups

The SageMaker Model Registry is structured as several Model (Package) Groups with model packages in each group. These Model Groups can optionally be added to one or more Collections. Each model package in a Model Group corresponds to a trained model. The version of each model package is a numerical value that starts at 1 and is incremented with each new model package added to a Model Group. For example, if 5 model packages are added to a Model Group, the model package versions will be 1, 2, 3, 4, and 5. 

A model package is the actual model that is registered into the Model Registry as a versioned entity. There are two types of model packages in SageMaker AI. One type is used in the AWS Marketplace, and the other is used in the Model Registry. Model packages used in the AWS Marketplace are not versionable entities and are not associated with Model Groups in the Model Registry. The Model Registry receives every new model that you retrain, gives it a version, and assigns it to a Model Group inside the Model Registry. The following image shows an example of a Model Group with 25 consecutively versioned models. For more information about model packages used in the AWS Marketplace, see [Algorithms and packages in the AWS Marketplace](sagemaker-marketplace.md).

The model packages used in the Model Registry are versioned and **must** be associated with a Model Group. The ARN of this model package type has the structure: `arn:aws:sagemaker:region:account:model-package/model-package-group-name/version`
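For illustration, the Model Group name and version can be recovered from a versioned model package ARN by splitting its resource portion; the account ID and group name below are placeholders:

```python
# Placeholder versioned model package ARN
model_package_arn = (
    "arn:aws:sagemaker:us-west-2:111122223333:"
    "model-package/scikit-iris-detector/3"
)

# The resource portion has the form model-package/<group-name>/<version>
_, group_name, version = model_package_arn.split(":")[-1].split("/")
print(group_name, version)  # scikit-iris-detector 3
```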

The following topics show you how to create and work with models, model versions, and Model Groups in the Model Registry.

**Topics**
+ [

# Create a Model Group
](model-registry-model-group.md)
+ [

# Delete a Model Group
](model-registry-delete-model-group.md)
+ [

# Register a Model Version
](model-registry-version.md)
+ [

# View Model Groups and Versions
](model-registry-view.md)
+ [

# Update the Details of a Model Version
](model-registry-details.md)
+ [

# Compare Model Versions
](model-registry-version-compare.md)
+ [

# View and Manage Model Group and Model Version Tags
](model-registry-tags.md)
+ [

# Delete a Model Version
](model-registry-delete-model-version.md)
+ [

# Staging Construct for your Model Lifecycle
](model-registry-staging-construct.md)
+ [

# Update the Approval Status of a Model
](model-registry-approve.md)
+ [

# Deploy a Model from the Registry with Python
](model-registry-deploy.md)
+ [

# Deploy a Model in Studio
](model-registry-deploy-studio.md)
+ [

# Cross-account discoverability
](model-registry-ram.md)
+ [

# View the Deployment History of a Model
](model-registry-deploy-history.md)
+ [

# View model lineage details in Studio
](model-registry-lineage-view-studio.md)

# Create a Model Group


A Model Group contains different versions of a model. You can create a Model Group that tracks all of the models that you train to solve a particular problem. Create a Model Group by using either the AWS SDK for Python (Boto3) or the Amazon SageMaker Studio console.

## Create a Model Group (Boto3)


**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

To create a Model Group by using Boto3, call the `create_model_package_group` API operation and specify a name and description as parameters. The following example shows how to create a Model Group. The response from the `create_model_package_group` call is the Amazon Resource Name (ARN) of the new Model Group.

First, import the required packages and set up the SageMaker AI Boto3 client.

```
import time
import boto3
from sagemaker import get_execution_role

region = boto3.Session().region_name

role = get_execution_role()

sm_client = boto3.client('sagemaker', region_name=region)
```

Now create the Model Group.

```
model_package_group_name = "scikit-iris-detector-" + str(round(time.time()))
model_package_group_input_dict = {
 "ModelPackageGroupName" : model_package_group_name,
 "ModelPackageGroupDescription" : "Sample model package group"
}

create_model_package_group_response = sm_client.create_model_package_group(**model_package_group_input_dict)
print('ModelPackageGroup Arn : {}'.format(create_model_package_group_response['ModelPackageGroupArn']))
```

## Create a Model Group (Studio or Studio Classic)

To create a Model Group in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models**.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. Choose **Register**, then choose **Model group**.

1. In the **Register model group** dialog box, enter the following information:
   + The name of the new Model Group in the **Model group name** field.
   + (Optional) A description for the Model Group in the **Description** field.
   + (Optional) Any key-value pairs you want to associate with the Model Group in the **Tags** field. For information about using tags, see [Tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html) in the *AWS General Reference*.

1. Choose **Register model group**.

1. (Optional) In the **Models** page, choose the **Registered models** tab, then choose **Model Groups**. Confirm your newly-created Model Group appears in the list of Model Groups.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. Choose **Actions**, then choose **Create model group**.

1. In the **Create model group** dialog box, enter the following information:
   + Enter the name of the new Model Group in the **Model group name** field.
   + (Optional) Enter a description for the Model Group in the **Description** field.
   + (Optional) Enter any key-value pairs you want to associate with the Model Group in the **Tags** field. For information about using tags, see [Tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html) in the *AWS General Reference*.
   + (Optional) Choose a project with which to associate the Model Group in the **Project** field. For information about projects, see [MLOps Automation With SageMaker Projects](sagemaker-projects.md).

1. Choose **Create model group**.

------

# Delete a Model Group

This procedure demonstrates how to delete a Model Group in the Amazon SageMaker Studio console. When you delete a Model Group, you lose access to the model versions in the Model Group.
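Deleting a Model Group can also be scripted. The following is a minimal Boto3 sketch; the group name is a placeholder, and the group must already be empty:

```python
# Placeholder name of the (empty) Model Group to delete
model_package_group_name = "scikit-iris-detector-example"

# Parameters for the delete_model_package_group API operation
delete_request = {"ModelPackageGroupName": model_package_group_name}

# With a configured SageMaker AI Boto3 client:
# sm_client.delete_model_package_group(**delete_request)
```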

## Delete a Model Group (Studio or Studio Classic)


**Important**  
You can only delete an empty model group. Before you delete your model group, remove its model versions, if any.

To delete a Model Group in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models**.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the model groups list, select the check box next to the name of the Model Group you want to delete.

1. Choose the vertical ellipsis above the top right corner of the model groups list, and choose **Delete**.

1. In the **Delete model group** dialog box, choose **Yes, delete the model group**.

1. Choose **Delete**.

1. Confirm that your deleted model groups no longer appear in your list of model groups.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**. A list of your Model Groups appears.

1. From the model groups list, select the name of the Model Group you want to delete.

1. In the top right corner, choose **Remove**.

1. In the confirmation dialog box, enter `REMOVE`.

1. Choose **Remove**.

------

# Register a Model Version

You can register an Amazon SageMaker AI model by creating a model version that specifies the model group to which it belongs. A model version must include model artifacts (the trained weights of a model) and, optionally, the inference code for the model.

An *inference pipeline* is a SageMaker AI model composed of a linear sequence of two to fifteen containers that process inference requests. You register an inference pipeline by specifying the containers and the associated environment variables. For more information on inference pipelines, see [Inference pipelines in Amazon SageMaker AI](inference-pipelines.md).

You can register a model with an inference pipeline by specifying the containers and the associated environment variables. To create a model version, with or without an inference pipeline, use the AWS SDK for Python (Boto3), the Amazon SageMaker Studio console, or a step in a SageMaker AI model building pipeline, as described in the following topics.

**Topics**
+ [

## Register a Model Version (SageMaker AI Pipelines)
](#model-registry-pipeline)
+ [

## Register a Model Version (Boto3)
](#model-registry-version-api)
+ [

## Register a Model Version (Studio or Studio Classic)
](#model-registry-studio)
+ [

## Register a Model Version from a Different Account
](#model-registry-version-xaccount)

## Register a Model Version (SageMaker AI Pipelines)


To register a model version by using a SageMaker AI model building pipeline, create a `RegisterModel` step in your pipeline. For information about creating a `RegisterModel` step as part of a pipeline, see [Step 8: Define a RegisterModel step to create a model package](define-pipeline.md#define-pipeline-register).

## Register a Model Version (Boto3)


To register a model version by using Boto3, call the `create_model_package` API operation.

First, you set up the parameter dictionary to pass to the `create_model_package` API operation.

```
# Specify the model source
model_url = "s3://your-bucket-name/model.tar.gz"

modelpackage_inference_specification =  {
    "InferenceSpecification": {
      "Containers": [
         {
            "Image": image_uri,
	    "ModelDataUrl": model_url
         }
      ],
      "SupportedContentTypes": [ "text/csv" ],
      "SupportedResponseMIMETypes": [ "text/csv" ],
   }
 }

# Alternatively, you can specify the model source like this:
# modelpackage_inference_specification["InferenceSpecification"]["Containers"][0]["ModelDataUrl"]=model_url

create_model_package_input_dict = {
    "ModelPackageGroupName" : model_package_group_name,
    "ModelPackageDescription" : "Model to detect 3 different types of irises (Setosa, Versicolour, and Virginica)",
    "ModelApprovalStatus" : "PendingManualApproval"
}
create_model_package_input_dict.update(modelpackage_inference_specification)
```

Then you call the `create_model_package` API operation, passing in the parameter dictionary that you just set up.

```
create_model_package_response = sm_client.create_model_package(**create_model_package_input_dict)
model_package_arn = create_model_package_response["ModelPackageArn"]
print('ModelPackage Version ARN : {}'.format(model_package_arn))
```
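To confirm the registration, you can list the versions in the Model Group with the `list_model_packages` API operation. The following is a sketch of the request; the group name is a placeholder:

```python
# Parameters for the list_model_packages API operation
list_request = {
    "ModelPackageGroupName": "scikit-iris-detector-example",  # placeholder name
    "SortBy": "CreationTime",
    "SortOrder": "Descending",
}

# With a configured SageMaker AI Boto3 client:
# response = sm_client.list_model_packages(**list_request)
# for mp in response["ModelPackageSummaryList"]:
#     print(mp["ModelPackageVersion"], mp["ModelApprovalStatus"])
```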

## Register a Model Version (Studio or Studio Classic)


To register a model version in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** from the menu.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups** and **My models**, if not selected already.

1. Choose **Register**. This will open the **Register model** page.

1. Follow the instructions provided in the **Register model** page. 

1. Once you have reviewed your choices, choose **Register**. Once completed, you will be taken to the model version **Overview** page.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. Open the **Register Version** form. You can do this in one of two ways:
   + Choose **Actions**, and then choose **Create model version**.
   + Select the name of the model group for which you want to create a model version, then choose **Create model version**.

1. In the **Register model version** form, enter the following information:
   + In the **Model package group name** dropdown, select the model group name.
   + (Optional) Enter a description for your model version.
   + In the **Model Approval Status** dropdown, select the version approval status.
   + (Optional) In the **Custom metadata** field, add custom tags as key-value pairs.

1. Choose **Next**.

1. In the **Inference Specification** form, enter the following information:
   + Enter your inference image location.
   + Enter your model data artifacts location.
   + (Optional) Enter information about images to use for transform and real-time inference jobs, and supported input and output MIME types.

1. Choose **Next**.

1. (Optional) Provide details to aid endpoint recommendations.

1. Choose **Next**.

1. (Optional) Choose model metrics you want to include.

1. Choose **Next**.

1. Ensure the displayed settings are correct, and choose **Register model version**. If you subsequently see a modal window with an error message, choose **View** (next to the message) to view the source of the error.

1. Confirm your new model version appears in the parent model group page.

------

## Register a Model Version from a Different Account


To register model versions with a Model Group created by a different AWS account, you must add a cross-account AWS Identity and Access Management resource policy to enable that account. For example, one AWS account in your organization is responsible for training models, and a different account is responsible for managing, deploying, and updating models. In this case, you create IAM resource policies and apply them to the specific account resources to which you want to grant access. For more information about cross-account resource policies in AWS, see [Cross-account policy evaluation logic](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic-cross-account.html) in the *AWS Identity and Access Management User Guide*.

To enable cross-account discoverability, which allows other accounts to view model package groups from the resource owner account, see [Cross-account discoverability](model-registry-ram.md).

**Note**  
You must also use a KMS key to encrypt the training output, specified in the [output data config](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OutputDataConfig.html), for cross-account model deployment.

To enable cross-account model registry in SageMaker AI, you have to provide a cross-account resource policy for the Model Group that contains the model versions. The following is an example that creates cross-account policies for the Model Group and applies these policies to that specific resource.

The following configuration must be set in the source account that registers models cross-account in a Model Group. In this example, the source account is the model training account, which trains a model and then registers it cross-account into the Model Registry of the Model Registry account.

The example assumes that you previously defined the following variables:
+ `sm_client` – A SageMaker AI Boto3 client.
+ `model_package_group_name` – The Model Group to which you want to grant access.
+ `model_package_group_arn` – The Model Group ARN to which you want to grant cross-account access.
+ `bucket` – The Amazon S3 bucket where the model training artifacts are stored.

To be able to deploy a model created in a different account, the user must have a role that has access to SageMaker AI actions, such as a role with the `AmazonSageMakerFullAccess` managed policy. For information about SageMaker AI managed policies, see [AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md).

### Required IAM resource policies


The following diagram captures the policies required to allow cross-account model registration. As shown, these policies need to be active during model training to properly register the model into the Model Registry account.

![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/model_registry_cross_account.png)


Amazon ECR, Amazon S3, and AWS KMS policies are demonstrated in the following code samples. 

**Sample Amazon ECR policy**

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": [
                "ecr:BatchGetImage",
                "ecr:Describe*"
            ]
        }
    ]
}
```

------

**Sample Amazon S3 policy**

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": [
                "s3:GetObject",
                "s3:GetBucketAcl",
                "s3:GetObjectAcl"
            ],
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*"
        }
    ]
}
```

------

**Sample AWS KMS policy**

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "AddPerm",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::111122223333:root"
            },
            "Action": [
                "kms:Decrypt",
                "kms:GenerateDataKey*"
            ],
            "Resource": "*"
        }
    ]
}
```

------

### Apply resource policies to accounts


The following policy configuration applies the policies discussed in the previous section and must be put in the model training account.

```
import json

# The Model Registry account id of the Model Group 
model_registry_account = "111111111111"

# The model training account id where training happens
model_training_account = "222222222222"

# 1. Create a policy for access to the ECR repository 
# in the model training account for the Model Registry account Model Group
ecr_repository_policy = {"Version": "2012-10-17",
    "Statement": [{"Sid": "AddPerm",
        "Effect": "Allow",
        "Principal": {
          "AWS": f"arn:aws:iam::{model_registry_account}:root"
        },
        "Action": [
          "ecr:BatchGetImage",
          "ecr:Describe*"
        ]
    }]
}

# Convert the ECR policy from JSON dict to string
ecr_repository_policy = json.dumps(ecr_repository_policy)

# Set the new ECR policy
ecr = boto3.client('ecr')
response = ecr.set_repository_policy(
    registryId = model_training_account,
    repositoryName = "decision-trees-sample",
    policyText = ecr_repository_policy
)

# 2. Create a policy in the model training account for access to the S3 bucket 
# where the model is present in the Model Registry account Model Group
bucket_policy = {"Version": "2012-10-17",		 	 	 
    "Statement": [{"Sid": "AddPerm",
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{model_registry_account}:root"
        },
        "Action": [
          "s3:GetObject",
          "s3:GetBucketAcl",
          "s3:GetObjectAcl"
        ],
        "Resource": [
          "arn:aws:s3:::{bucket}/*",
	  "Resource: arn:aws:s3:::{bucket}"
        ]
    }]
}

# Convert the S3 policy from JSON dict to string
bucket_policy = json.dumps(bucket_policy)

# Set the new bucket policy
s3 = boto3.client("s3")
response = s3.put_bucket_policy(
    Bucket = bucket,
    Policy = bucket_policy)

# 3. Create the KMS grant for the key used during training for encryption
# in the model training account to the Model Registry account Model Group
client = boto3.client("kms")

response = client.create_grant(
    GranteePrincipal=f"arn:aws:iam::{model_registry_account}:root",
    KeyId=kms_key_id,
    Operations=[
        "Decrypt",
        "GenerateDataKey",
    ],
)
```

The following configuration needs to be put in the Model Registry account where the Model Group exists.

```
import json

# The Model Registry account id of the Model Group
model_registry_account = "111111111111"

# The model training account id where training happens
model_training_account = "222222222222"

# This example assumes that region, model_package_group_name, and
# sm_client (a boto3 SageMaker client) are already defined.

# 1. Create policy to allow the model training account to access the ModelPackageGroup
model_package_group_policy = {"Version": "2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "AddPermModelPackageVersion",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{model_training_account}:root"},
            "Action": ["sagemaker:CreateModelPackage"],
            "Resource": f"arn:aws:sagemaker:{region}:{model_registry_account}:model-package/{model_package_group_name}/*"
        }
    ]
}

# Convert the policy from JSON dict to string
model_package_group_policy = json.dumps(model_package_group_policy)

# Set the new policy
response = sm_client.put_model_package_group_policy(
    ModelPackageGroupName = model_package_group_name,
    ResourcePolicy = model_package_group_policy)
```

Finally, use the `create_model_package` action from the model training account to register the model package in the Model Group that resides in the Model Registry account.

```
# Specify the model source. This example assumes that bucket is defined,
# as in the previous code blocks.
model_url = f"s3://{bucket}/model.tar.gz"

# Set up the parameter dictionary to pass to the create_model_package API operation
modelpackage_inference_specification =  {
    "InferenceSpecification": {
        "Containers": [
            {
                "Image": f"{model_training_account}.dkr.ecr.us-east-2.amazonaws.com/decision-trees-sample:latest",
                "ModelDataUrl": model_url
            }
        ],
        "SupportedContentTypes": [ "text/csv" ],
        "SupportedResponseMIMETypes": [ "text/csv" ],
    }
}

# Alternatively, you can specify the model source like this:
# modelpackage_inference_specification["InferenceSpecification"]["Containers"][0]["ModelDataUrl"]=model_url

create_model_package_input_dict = {
    "ModelPackageGroupName" : model_package_group_arn,
    "ModelPackageDescription" : "Model to detect 3 different types of irises (Setosa, Versicolour, and Virginica)",
    "ModelApprovalStatus" : "PendingManualApproval"
}
create_model_package_input_dict.update(modelpackage_inference_specification)

# Create the model package in the Model Registry account
create_model_package_response = sm_client.create_model_package(**create_model_package_input_dict)
model_package_arn = create_model_package_response["ModelPackageArn"]
print('ModelPackage Version ARN : {}'.format(model_package_arn))
```

# View Model Groups and Versions

Model Groups and versions help you organize your models. You can view a list of the model versions in a Model Group by using either the AWS SDK for Python (Boto3) or the Amazon SageMaker Studio console.

## View a List of Model Versions in a Group


You can view all of the model versions that are associated with a Model Group. If a Model Group represents all models that you train to address a specific ML problem, you can view all of those related models.

### View a List of Model Versions in a Group (Boto3)


To view model versions associated with a Model Group by using Boto3, call the `list_model_packages` API operation, and pass the name of the Model Group as the value of the `ModelPackageGroupName` parameter. The following code lists the model versions associated with the Model Group you created in [Create a Model Group (Boto3)](model-registry-model-group.md#model-registry-package-group-api).

```
sm_client.list_model_packages(ModelPackageGroupName=model_package_group_name)
```
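A single `list_model_packages` call returns at most one page of results. If a Model Group accumulates many versions, you can use the client's built-in paginator to collect them all. The following sketch assumes `sm_client` is a boto3 SageMaker client; `list_all_model_versions` is a hypothetical helper written for this guide, not a SageMaker API.

```python
def list_all_model_versions(sm_client, group_name):
    """Collect every model version in a Model Group, newest first.

    Uses the list_model_packages paginator to follow NextToken across
    pages, then sorts the summaries by ModelPackageVersion.
    """
    summaries = []
    paginator = sm_client.get_paginator("list_model_packages")
    for page in paginator.paginate(ModelPackageGroupName=group_name):
        summaries.extend(page["ModelPackageSummaryList"])
    return sorted(summaries,
                  key=lambda s: s["ModelPackageVersion"],
                  reverse=True)
```

With this helper, `list_all_model_versions(sm_client, model_package_group_name)[0]` would be the most recently registered version.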

### View a List of Model Versions in a Group (Studio or Studio Classic)


To view a list of the model versions in a Model Group in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** from the menu.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the model groups list, choose the angle bracket to the left of the model group you want to view.

1. A list of the model versions in the model group appears.

1. (Optional) Choose **View all**, if shown, to view additional model versions.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the Model Group you want to view.

1. A new tab appears with a list of the model versions in the Model Group.

------

# Update the Details of a Model Version

You can view and update details of a specific model version by using either the AWS SDK for Python (Boto3) or the Amazon SageMaker Studio console.

**Important**  
Amazon SageMaker AI integrates Model Cards into Model Registry. A model package registered in the Model Registry includes a simplified Model Card as a component of the model package. For more information, see [Model package model card schema (Studio)](#model-card-schema).

## View and Update the Details of a Model Version (Boto3)


To view the details of a model version by using Boto3, complete the following steps.

1. Call the `list_model_packages` API operation to view the model versions in a Model Group.

   ```
   sm_client.list_model_packages(ModelPackageGroupName="ModelGroup1")
   ```

   The response is a list of model package summaries. You can get the Amazon Resource Name (ARN) of the model versions from this list.

   ```
   {'ModelPackageSummaryList': [{'ModelPackageGroupName': 'ModelGroup1',
      'ModelPackageVersion': 1,
      'ModelPackageArn': 'arn:aws:sagemaker:us-east-2:123456789012:model-package/ModelGroup1/1',
      'ModelPackageDescription': 'TestMe',
      'CreationTime': datetime.datetime(2020, 10, 29, 1, 27, 46, 46000, tzinfo=tzlocal()),
      'ModelPackageStatus': 'Completed',
      'ModelApprovalStatus': 'Approved'}],
    'ResponseMetadata': {'RequestId': '12345678-abcd-1234-abcd-aabbccddeeff',
     'HTTPStatusCode': 200,
     'HTTPHeaders': {'x-amzn-requestid': '12345678-abcd-1234-abcd-aabbccddeeff',
      'content-type': 'application/x-amz-json-1.1',
      'content-length': '349',
      'date': 'Mon, 23 Nov 2020 04:56:50 GMT'},
     'RetryAttempts': 0}}
   ```

1. Call `describe_model_package` to see the details of the model version. You pass in the ARN of a model version that you got in the output of the call to `list_model_packages`.

   ```
   sm_client.describe_model_package(ModelPackageName="arn:aws:sagemaker:us-east-2:123456789012:model-package/ModelGroup1/1")
   ```

   The output of this call is a JSON object with the model version details.

   ```
   {'ModelPackageGroupName': 'ModelGroup1',
    'ModelPackageVersion': 1,
    'ModelPackageArn': 'arn:aws:sagemaker:us-east-2:123456789012:model-package/ModelGroup1/1',
    'ModelPackageDescription': 'Test Model',
    'CreationTime': datetime.datetime(2020, 10, 29, 1, 27, 46, 46000, tzinfo=tzlocal()),
    'InferenceSpecification': {'Containers': [{'Image': '257758044811.dkr.ecr.us-east-2.amazonaws.com/sagemaker-xgboost:1.0-1-cpu-py3',
       'ImageDigest': 'sha256:99fa602cff19aee33297a5926f8497ca7bcd2a391b7d600300204eef803bca66',
       'ModelDataUrl': 's3://sagemaker-us-east-2-123456789012/ModelGroup1/pipelines-0gdonccek7o9-AbaloneTrain-stmiylhtIR/output/model.tar.gz'}],
     'SupportedTransformInstanceTypes': ['ml.m5.xlarge'],
     'SupportedRealtimeInferenceInstanceTypes': ['ml.t2.medium', 'ml.m5.xlarge'],
     'SupportedContentTypes': ['text/csv'],
     'SupportedResponseMIMETypes': ['text/csv']},
    'ModelPackageStatus': 'Completed',
    'ModelPackageStatusDetails': {'ValidationStatuses': [],
     'ImageScanStatuses': []},
    'CertifyForMarketplace': False,
    'ModelApprovalStatus': 'PendingManualApproval',
    'LastModifiedTime': datetime.datetime(2020, 10, 29, 1, 28, 0, 438000, tzinfo=tzlocal()),
    'ResponseMetadata': {'RequestId': '12345678-abcd-1234-abcd-aabbccddeeff',
     'HTTPStatusCode': 200,
     'HTTPHeaders': {'x-amzn-requestid': '12345678-abcd-1234-abcd-aabbccddeeff',
      'content-type': 'application/x-amz-json-1.1',
      'content-length': '1038',
      'date': 'Mon, 23 Nov 2020 04:59:38 GMT'},
     'RetryAttempts': 0}}
   ```
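To change a detail of the version itself, call `update_model_package` with the model package ARN and only the fields you want to change. The helper below simply assembles the keyword arguments; `build_update_request` is a name invented for this illustration, and the commented-out call assumes `sm_client` is a boto3 SageMaker client.

```python
def build_update_request(model_package_arn, approval_status=None,
                         approval_description=None):
    """Assemble keyword arguments for sm_client.update_model_package,
    including only the fields that are being changed."""
    params = {"ModelPackageArn": model_package_arn}
    if approval_status is not None:
        # One of "Approved", "Rejected", or "PendingManualApproval"
        params["ModelApprovalStatus"] = approval_status
    if approval_description is not None:
        params["ApprovalDescription"] = approval_description
    return params

# Example call (assumes sm_client = boto3.client("sagemaker")):
# sm_client.update_model_package(**build_update_request(
#     "arn:aws:sagemaker:us-east-2:123456789012:model-package/ModelGroup1/1",
#     approval_status="Approved",
#     approval_description="Passed offline evaluation"))
```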

### Model package model card schema (Studio)

All details related to the model version are encapsulated in the model package’s model card. The model card of a model package is a special usage of the Amazon SageMaker Model Card with a simplified schema. The full schema appears in the following section.

#### Model package model card schema


```
{
  "title": "SageMakerModelCardSchema",
  "description": "Schema of a model package’s model card.",
  "version": "0.1.0",
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "model_overview": {
      "description": "Overview about the model.",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "model_creator": {
          "description": "Creator of model.",
          "type": "string",
          "maxLength": 1024
        },
        "model_artifact": {
          "description": "Location of the model artifact.",
          "type": "array",
          "maxContains": 15,
          "items": {
            "type": "string",
            "maxLength": 1024
          }
        }
      }
    },
    "intended_uses": {
      "description": "Intended usage of model.",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "purpose_of_model": {
          "description": "Reason the model was developed.",
          "type": "string",
          "maxLength": 2048
        },
        "intended_uses": {
          "description": "Intended use cases.",
          "type": "string",
          "maxLength": 2048
        },
        "factors_affecting_model_efficiency": {
          "type": "string",
          "maxLength": 2048
        },
        "risk_rating": {
          "description": "Risk rating for model card.",
          "$ref": "#/definitions/risk_rating"
        },
        "explanations_for_risk_rating": {
          "type": "string",
          "maxLength": 2048
        }
      }
    },
    "business_details": {
      "description": "Business details of model.",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "business_problem": {
          "description": "Business problem solved by the model.",
          "type": "string",
          "maxLength": 2048
        },
        "business_stakeholders": {
          "description": "Business stakeholders.",
          "type": "string",
          "maxLength": 2048
        },
        "line_of_business": {
          "type": "string",
          "maxLength": 2048
        }
      }
    },
    "training_details": {
      "description": "Overview about the training.",
      "type": "object",
      "additionalProperties": false,
      "properties": {
        "objective_function": {
          "description": "The objective function for which the model is optimized.",
          "function": {
            "$ref": "#/definitions/objective_function"
          },
          "notes": {
            "type": "string",
            "maxLength": 1024
          }
        },
        "training_observations": {
          "type": "string",
          "maxLength": 1024
        },
        "training_job_details": {
          "type": "object",
          "additionalProperties": false,
          "properties": {
            "training_arn": {
              "description": "SageMaker Training job ARN.",
              "type": "string",
              "maxLength": 1024
            },
            "training_datasets": {
              "description": "Location of the model datasets.",
              "type": "array",
              "maxContains": 15,
              "items": {
                "type": "string",
                "maxLength": 1024
              }
            },
            "training_environment": {
              "type": "object",
              "additionalProperties": false,
              "properties": {
                "container_image": {
                  "description": "SageMaker training image URI.",
                  "type": "array",
                  "maxContains": 15,
                  "items": {
                    "type": "string",
                    "maxLength": 1024
                  }
                }
              }
            },
            "training_metrics": {
              "type": "array",
              "items": {
                "maxItems": 50,
                "$ref": "#/definitions/training_metric"
              }
            },
            "user_provided_training_metrics": {
              "type": "array",
              "items": {
                "maxItems": 50,
                "$ref": "#/definitions/training_metric"
              }
            },
            "hyper_parameters": {
              "type": "array",
              "items": {
                "maxItems": 100,
                "$ref": "#/definitions/training_hyper_parameter"
              }
            },
            "user_provided_hyper_parameters": {
              "type": "array",
              "items": {
                "maxItems": 100,
                "$ref": "#/definitions/training_hyper_parameter"
              }
            }
          }
        }
      }
    },
    "evaluation_details": {
      "type": "array",
      "default": [],
      "items": {
        "type": "object",
        "required": [
          "name"
        ],
        "additionalProperties": false,
        "properties": {
          "name": {
            "type": "string",
            "pattern": ".{1,63}"
          },
          "evaluation_observation": {
            "type": "string",
            "maxLength": 2096
          },
          "evaluation_job_arn": {
            "type": "string",
            "maxLength": 256
          },
          "datasets": {
            "type": "array",
            "items": {
              "type": "string",
              "maxLength": 1024
            },
            "maxItems": 10
          },
          "metadata": {
            "description": "Additional attributes associated with the evaluation results.",
            "type": "object",
            "additionalProperties": {
              "type": "string",
              "maxLength": 1024
            }
          },
          "metric_groups": {
            "type": "array",
            "default": [],
            "items": {
              "type": "object",
              "required": [
                "name",
                "metric_data"
              ],
              "properties": {
                "name": {
                  "type": "string",
                  "pattern": ".{1,63}"
                },
                "metric_data": {
                  "type": "array",
                  "items": {
                    "anyOf": [
                      {
                        "$ref": "#/definitions/simple_metric"
                      },
                      {
                        "$ref": "#/definitions/linear_graph_metric"
                      },
                      {
                        "$ref": "#/definitions/bar_chart_metric"
                      },
                      {
                        "$ref": "#/definitions/matrix_metric"
                      }
                    ]
                  }
                }
              }
            }
          }
        }
      }
    },
    "additional_information": {
      "additionalProperties": false,
      "type": "object",
      "properties": {
        "ethical_considerations": {
          "description": "Ethical considerations for model users.",
          "type": "string",
          "maxLength": 2048
        },
        "caveats_and_recommendations": {
          "description": "Caveats and recommendations for model users.",
          "type": "string",
          "maxLength": 2048
        },
        "custom_details": {
          "type": "object",
          "additionalProperties": {
            "$ref": "#/definitions/custom_property"
          }
        }
      }
    }
  },
  "definitions": {
    "source_algorithms": {
      "type": "array",
      "minContains": 1,
      "maxContains": 1,
      "items": {
        "type": "object",
        "additionalProperties": false,
        "required": [
          "algorithm_name"
        ],
        "properties": {
          "algorithm_name": {
            "description": "The name of the algorithm used to create the model package. The algorithm must be either an algorithm resource in your SageMaker AI account or an algorithm in AWS Marketplace that you are subscribed to.",
            "type": "string",
            "maxLength": 170
          },
          "model_data_url": {
            "description": "Amazon S3 path where the model artifacts, which result from model training, are stored.",
            "type": "string",
            "maxLength": 1024
          }
        }
      }
    },
    "inference_specification": {
      "type": "object",
      "additionalProperties": false,
      "required": [
        "containers"
      ],
      "properties": {
        "containers": {
          "description": "Contains inference related information used to create model package.",
          "type": "array",
          "minContains": 1,
          "maxContains": 15,
          "items": {
            "type": "object",
            "additionalProperties": false,
            "required": [
              "image"
            ],
            "properties": {
              "model_data_url": {
                "description": "Amazon S3 path where the model artifacts, which result from model training, are stored.",
                "type": "string",
                "maxLength": 1024
              },
              "image": {
                "description": "Inference environment path. The Amazon Elastic Container Registry (Amazon ECR) path where inference code is stored.",
                "type": "string",
                "maxLength": 255
              },
              "nearest_model_name": {
                "description": "The name of a pre-trained machine learning benchmarked by an Amazon SageMaker Inference Recommender model that matches your model.",
                "type": "string"
              }
            }
          }
        }
      }
    },
    "risk_rating": {
      "description": "Risk rating of model.",
      "type": "string",
      "enum": [
        "High",
        "Medium",
        "Low",
        "Unknown"
      ]
    },
    "custom_property": {
      "description": "Additional property.",
      "type": "string",
      "maxLength": 1024
    },
    "objective_function": {
      "description": "Objective function for which the training job is optimized.",
      "additionalProperties": false,
      "properties": {
        "function": {
          "type": "string",
          "enum": [
            "Maximize",
            "Minimize"
          ]
        },
        "facet": {
          "type": "string",
          "maxLength": 63
        },
        "condition": {
          "type": "string",
          "maxLength": 63
        }
      }
    },
    "training_metric": {
      "description": "Training metric data.",
      "type": "object",
      "required": [
        "name",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "value": {
          "type": "number"
        }
      }
    },
    "training_hyper_parameter": {
      "description": "Training hyperparameter.",
      "type": "object",
      "required": [
        "name",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "value": {
          "type": "string",
          "pattern": ".{1,255}"
        }
      }
    },
    "linear_graph_metric": {
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "linear_graph"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "array",
              "items": {
                "type": "array",
                "items": {
                  "type": "number"
                },
                "minItems": 2,
                "maxItems": 2
              },
              "minItems": 1
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        }
      }
    },
    "bar_chart_metric": {
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "bar_chart"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "array",
              "items": {
                "type": "number"
              },
              "minItems": 1
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_array"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        }
      }
    },
    "matrix_metric": {
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "matrix"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "array",
              "items": {
                "type": "array",
                "items": {
                  "type": "number"
                },
                "minItems": 1,
                "maxItems": 20
              },
              "minItems": 1,
              "maxItems": 20
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_array"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_array"
        }
      }
    },
    "simple_metric": {
      "description": "Metric data.",
      "type": "object",
      "required": [
        "name",
        "type",
        "value"
      ],
      "additionalProperties": false,
      "properties": {
        "name": {
          "type": "string",
          "pattern": ".{1,255}"
        },
        "notes": {
          "type": "string",
          "maxLength": 1024
        },
        "type": {
          "type": "string",
          "enum": [
            "number",
            "string",
            "boolean"
          ]
        },
        "value": {
          "anyOf": [
            {
              "type": "number"
            },
            {
              "type": "string",
              "maxLength": 63
            },
            {
              "type": "boolean"
            }
          ]
        },
        "x_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        },
        "y_axis_name": {
          "$ref": "#/definitions/axis_name_string"
        }
      }
    },
    "axis_name_array": {
      "type": "array",
      "items": {
        "type": "string",
        "maxLength": 63
      }
    },
    "axis_name_string": {
      "type": "string",
      "maxLength": 63
    }
  }
}
```
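Because the model card travels with the model package as a JSON string, it can be useful to sanity-check a card before registering it. The following sketch hand-checks two of the constraints above (the `risk_rating` enum and the 1024-character cap on `model_creator`) rather than running a full JSON Schema validator; the field values are illustrative only.

```python
import json

# A minimal card consistent with the schema above (illustrative values).
card = {
    "model_overview": {"model_creator": "data-science-team"},
    "intended_uses": {
        "purpose_of_model": "Classify iris species from sepal and petal measurements.",
        "risk_rating": "Low",
    },
}

def check_card(card):
    """Spot-check two schema constraints: the risk_rating enum and
    the maxLength of model_overview.model_creator."""
    errors = []
    creator = card.get("model_overview", {}).get("model_creator", "")
    if len(creator) > 1024:
        errors.append("model_creator exceeds maxLength 1024")
    rating = card.get("intended_uses", {}).get("risk_rating")
    if rating is not None and rating not in {"High", "Medium", "Low", "Unknown"}:
        errors.append("invalid risk_rating: " + rating)
    return errors

# A card that passes these checks can be serialized for the model package.
model_card_content = json.dumps(card)
```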

## View and Update the Details of a Model Version (Studio or Studio Classic)


To view and update the details of a model version, complete the following steps based on whether you use Studio or Studio Classic. In Studio Classic, you can update the approval status for a model version. For details, see [Update the Approval Status of a Model](model-registry-approve.md). In Studio, on the other hand, SageMaker AI creates a model card for a model package, and the model version UI provides options to update details in the model card.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** from the menu.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. Select the name of the model group containing the model version to view.

1. In the list of model versions, select the model version to view.

1. Choose one of the following tabs.
   + **Training**: To view or edit details related to your training job, including performance metrics, artifacts, IAM role and encryption, and containers. For more information, see [Add a training job (Studio)](model-registry-details-studio-training.md).
   + **Evaluate**: To view or edit details related to your evaluation jobs, such as performance metrics, evaluation datasets, and security. For more information, see [Add an evaluation job (Studio)](model-registry-details-studio-evaluate.md).
   + **Audit**: To view or edit high-level details related to the model’s business purpose, usage, risk, and technical details such as algorithm and performance limitations. For more information, see [Update audit (governance) information (Studio)](model-registry-details-studio-audit.md).
   + **Deploy**: To view or edit the location of your inference image container and instances which compose the endpoint. For more information, see [Update deployment information (Studio)](model-registry-details-studio-deploy.md).

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the Model Group you want to view.

1. A new tab appears with a list of the model versions in the Model Group.

1. In the list of model versions, select the name of the model version for which you want to view details.

1. On the model version tab that opens, choose one of the following to see details about the model version:
   + **Activity**: Shows events for the model version, such as approval status updates.
   + **Model quality**: Reports metrics related to your Model Monitor model quality checks, which compare model predictions to Ground Truth. For more information about Model Monitor model quality checks, see [Model quality](model-monitor-model-quality.md). 
   + **Explainability**: Reports metrics related to your Model Monitor feature attribution checks, which compare the relative rankings of your features in training data versus live data. For more information about Model Monitor explainability checks, see [Feature attribution drift for models in production](clarify-model-monitor-feature-attribution-drift.md).
   + **Bias**: Reports metrics related to your Model Monitor bias drift checks, which compare the distribution of live data to training data. For more information about Model Monitor bias drift checks, see [Bias drift for models in production](clarify-model-monitor-bias-drift.md).
   + **Inference recommender**: Provides initial instance recommendations for optimal performance based on your model and sample payloads.
   + **Load test**: Runs load tests across your choice of instance types when you provide your specific production requirements, such as latency and throughput constraints.
   + **Inference specification**: Displays instance types for your real-time inference and transform jobs, and information about your Amazon ECR containers.
   + **Information**: Shows information such as the project with which the model version is associated, the pipeline that generated the model, the Model Group, and the model's location in Amazon S3.

------

# Add a training job (Studio)


**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the updated Studio experience. For information about using the Studio Classic application, see [Amazon SageMaker Studio Classic](studio.md).

You can add one training job, created externally or with SageMaker AI, to your model. If you add a SageMaker training job, SageMaker AI prepopulates the fields for all of the subpages in the **Train** tab. If you add an externally created training job, you need to add details related to your training job manually. 

**To add a training job to your model package, complete the following steps.**

1. Choose the **Train** tab.

1. Choose **Add**. If you do not see this option, a training job may already be attached; a model package can have only one. To replace it, first remove the existing training job by following the steps in Remove a training job (Studio).

1. You can add a training job you created in SageMaker AI or a training job you created externally.

   1. To add a training job you created in SageMaker AI, complete the following steps.

      1. Choose **SageMaker AI**.

      1. Select the radio button next to the training job you want to add.

      1. Choose **Add**.

   1. To add a training job you created externally, complete the following steps.

      1. Choose **Custom**.

      1. In the **Name** field, insert the name of your custom training job.

      1. Choose **Add**.

# Remove a training job (Studio)


You can remove a training job, created externally or with SageMaker AI, from your model by completing the following steps.

**To remove a training job from your model package, complete the following steps.**

1. Choose **Train**.

1. Choose the **Gear** ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/Settings_squid.png)) icon under the **Train** tab.

1. Choose **Remove** next to your training job.

1. Choose **Yes, I want to remove <name of your training job>**.

1. Choose **Done**.

# Update training job details (Studio)


Complete the following steps to update the details of a training job, created externally or with SageMaker AI, associated with your model.

**To update (and view) details related to the training job:**

1. On the **Train** tab, view the status of the training job. The status is `Complete` if you added a training job to your model package and `Undefined` if not.

1. To view details related to your training job such as performance, hyperparameters, and identifying details, choose the **Train** tab.

1. To update and view details related to model performance, complete the following steps.

   1. Choose **Performance** in the left sidebar of the **Train** tab.

   1. View **Metrics** related to your training job. The **Performance** page lists metrics by name, value, and any notes you added related to the metric.

   1. (Optional) To add notes to existing metrics, complete the following steps.

      1. Choose the vertical ellipsis in the top right corner of the model version page, and choose **Edit**.

      1. Add notes to any of the listed metrics.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

   1. View **Custom Metrics** related to your training job. Custom metrics are formatted similarly to metrics.

   1. (Optional) To add custom metrics, complete the following steps.

      1. Choose **Add**.

      1. Insert a name, value, and any optional notes for your new metric.

   1. (Optional) To remove custom metrics, choose the **Trash** icon next to the metric you want to remove.

   1. In the **Observations** text box, view any notes you added related to the performance of your training job.

   1. (Optional) To add or update observations, complete the following steps.

      1. Choose the vertical ellipsis in the top right corner of the model version page, and choose **Edit**.

      1. Add or update your notes in the **Observations** text box.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to model artifacts, complete the following steps.

   1. Choose **Artifacts** in the left sidebar of the **Train** tab.

   1. In the **Location (S3 URI)** field, view the Amazon S3 location of your training datasets.

   1. In the **Models** field, view the name and Amazon S3 locations of model artifacts from other models that you included in the training job.

   1. To update any of the fields in the **Artifacts** page, complete the following steps.

      1. Choose the vertical ellipsis in the top right of the model version page, and choose **Edit**.

      1. Enter new values in any of the fields.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to hyperparameters, complete the following steps.

   1. Choose **Hyperparameters** in the left sidebar of the **Train** tab.

   1. View the hyperparameters that SageMaker AI provided and the custom hyperparameters that you defined. Each hyperparameter is listed with its name and value.

   1. View the custom hyperparameters you added.

   1. (Optional) To add an additional custom hyperparameter, complete the following steps.

      1. Above the top right corner of the **Custom Hyperparameters** table, choose **Add**. A pair of new blank fields appears.

      1. Enter the name and value of the new custom hyperparameter. These values are automatically saved.

   1. (Optional) To remove a custom hyperparameter, choose the **Trash** icon to the right of the hyperparameter.

1. To update and view details related to the training job environment, complete the following steps.

   1. Choose **Environment** in the left sidebar of the **Train** tab.

   1. View the Amazon ECR URI locations for any training job containers added by SageMaker AI (for a SageMaker training job) or by you (for a custom training job).

   1. (Optional) To add an additional training job container, choose **Add**, and then enter the URI of the new training container.

1. To update and view the training job name and the Amazon Resource Name (ARN) of the training job, complete the following steps.

   1. Choose **Details** in the left sidebar of the **Train** tab.

   1. View the training job name and ARN of the training job.
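The details that Studio surfaces in the **Train** tab can also be retrieved programmatically. The following sketch uses the AWS SDK for Python (boto3); the helper works on a `DescribeTrainingJob`-shaped response, and the job name in the commented-out call is a hypothetical placeholder.

```python
# Sketch: reading training job details with boto3. The helper below
# extracts final metrics from a DescribeTrainingJob-shaped response;
# the job name in the commented call is an illustrative placeholder.

def final_metrics(describe_response):
    """Map metric names to their final values from DescribeTrainingJob."""
    return {
        m["MetricName"]: m["Value"]
        for m in describe_response.get("FinalMetricDataList", [])
    }

# Live call (requires AWS credentials):
# import boto3
# response = boto3.client("sagemaker").describe_training_job(
#     TrainingJobName="my-training-job"  # hypothetical job name
# )
# print(final_metrics(response))

# Local demonstration with a response-shaped dict:
sample_response = {
    "FinalMetricDataList": [{"MetricName": "validation:mse", "Value": 0.12}]
}
print(final_metrics(sample_response))  # {'validation:mse': 0.12}
```

The same response also carries the hyperparameters and container image shown on the **Hyperparameters** and **Environment** pages, under the `HyperParameters` and `AlgorithmSpecification` fields.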

# Add an evaluation job (Studio)


**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the updated Studio experience. For information about using the Studio Classic application, see [Amazon SageMaker Studio Classic](studio.md).

After you register your model, you can test your model with one or more datasets to assess its performance. You can add one or more evaluation jobs from Amazon S3 or define your own evaluation job by manually entering all details. If you add a job from Amazon S3, SageMaker AI prepopulates the fields for all of the subpages in the **Evaluate** tab. If you define your own evaluation job, you need to add details related to your evaluation job manually.

**To add your first evaluation job to your model package, complete the following steps.**

1. Choose the **Evaluate** tab.

1. Choose **Add**.

1. You can add an evaluation job from Amazon S3 or a custom evaluation job.

   1. To add an evaluation job with collaterals from Amazon S3, complete the following steps.

      1. Choose **S3**.

      1. Enter a name for the evaluation job.

      1. Enter the Amazon S3 location to the output collaterals of your evaluation job.

      1. Choose **Add**.

   1. To add a custom evaluation job, complete the following steps.

      1. Choose **Custom**.

      1. Enter a name for the evaluation job.

      1. Choose **Add**.

**To add an additional evaluation job to your model package, complete the following steps.**

1. Choose the **Evaluate** tab.

1. Choose the **Gear** ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/Settings_squid.png)) icon under the **Evaluate** tab.

1. In the dialog box, choose **Add**.

1. You can add an evaluation job from Amazon S3 or a custom evaluation job.

   1. To add an evaluation job with collaterals from Amazon S3, complete the following steps.

      1. Choose **S3**.

      1. Enter a name for the evaluation job.

      1. Enter the Amazon S3 location to the output collaterals of your evaluation job.

      1. Choose **Add**.

   1. To add a custom evaluation job, complete the following steps.

      1. Choose **Custom**.

      1. Enter a name for the evaluation job.

      1. Choose **Add**.

# Remove an evaluation job (Studio)


You can remove an evaluation job, created externally or with SageMaker AI, from your model by completing the following steps.

**To remove an evaluation job from your model package, complete the following steps.**

1. Choose the **Evaluate** tab.

1. Choose the **Gear** ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/icons/Settings_squid.png)) icon under the **Evaluate** tab.

1. (Optional) To find your evaluation job from the list, enter a search term in the search box to narrow the list of choices.

1. Choose the radio button next to your evaluation job.

1. Choose **Remove**.

1. Choose **Yes, I want to remove <name of your evaluation job>**.

1. Choose **Done**.

# Update an evaluation job (Studio)


Complete the following steps to update the details of an evaluation job, created externally or with SageMaker AI, associated with your model.

**To update (and view) details related to the evaluation job:**

1. On the **Evaluate** tab, view the status of the evaluation job. The status is `Complete` if you added an evaluation job to your model package and `Undefined` if not.

1. To view details related to your evaluation job, such as performance and artifacts location, choose the **Evaluate** tab.

1. To update and view details related to model performance during evaluation, complete the following steps.

   1. Choose **Performance** in the **Evaluate** tab sidebar.

   1. View metrics related to your evaluation job in the **Metrics** list. The **Metrics** list displays the individual metrics by name, value, and any notes you added related to the metric.

   1. In the **Observations** text box, view any notes you added related to the performance of your evaluation job.

   1. To update any of the **Notes** fields for any metric or the **Observations** field, complete the following steps.

      1. Choose the vertical ellipsis in the top right of the model version page, and choose **Edit**.

      1. Enter notes for any metric or in the **Observations** text box.

      1. At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to your evaluation job datasets, complete the following steps.

   1. Choose **Artifacts** in the left sidebar of the **Evaluate** page.

   1. View datasets used in your evaluation job.

   1. (Optional) To add a dataset, choose **Add** and enter an Amazon S3 URI to the dataset.

   1. (Optional) To remove a dataset, choose the **Trash** icon next to the dataset you want to remove.

1. To view the job name and evaluation job ARN, choose **Details**.

# Update audit (governance) information (Studio)


**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the updated Studio experience. For information about using the Studio Classic application, see [Amazon SageMaker Studio Classic](studio.md).

Document important model details to help your organization establish a robust framework of model governance. You and your team members can reference these details so they use the model for the appropriate use cases, know the business domain and owners of the model, and understand model risks. You can also save details about how the model is expected to perform and reasons for performance limitations.

**To view or update details related to the model governance, complete the following steps.**

1. On the **Audit** tab, view the approval status of the model card. The status can be one of the following:
   + **Draft**: The model card is still a draft.
   + **Pending approval**: The model card is waiting to be approved.
   + **Approved**: The model card is approved.

1. To update the approval status of the model card, choose the dropdown menu next to the approval status, and then choose the new status.

1. To update and view details related to your model package risk, complete the following steps.

   1. Choose **Risk** in the left sidebar of the **Audit** tab.

   1. View the current risk rating and explanation for the risk rating.

   1. To update the rating or explanation, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the **Audit** page, and choose **Edit**.

      1. (Optional) Choose an updated risk rating.

      1. (Optional) Update the risk rating explanation.

      1.  At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to the usage of your model package, complete the following steps.

   1. Choose **Usage** in the left sidebar of the **Audit** tab.

   1. View text you added in the following fields:
      + **Problem type**: The category of machine learning algorithm used to build your model.
      + **Algorithm type**: The specific algorithm used to create your model.
      + **Intended uses**: The current application of the model in your business problem.
      + **Factors affecting model efficacy**: Notes about your model’s performance limitations.
      + **Recommended use**: The types of applications you can create with the model, the scenarios in which you can expect a reasonable performance, or the type of data to use with the model.
      + **Ethical considerations**: A description of how your model might discriminate based on factors such as age or gender.

   1. To update any of the previously listed fields, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. (Optional) Use the dropdown menus for **Problem type** and **Algorithm type** to select new values, if needed.

      1. (Optional) Update the text descriptions in the remaining fields.

      1.  At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to the stakeholders of your model package, complete the following steps.

   1. Choose **Stakeholders** in the left sidebar of the **Audit** tab.

   1. View the current model owner and creator, if any.

   1. To update the model owner or creator, complete the following steps:

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. Update the model owner or model creator fields.

      1.  At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view details related to the business problem that your model package addresses, complete the following steps.

   1. Choose **Business** in the left sidebar of the **Audit** tab.

   1. View the current descriptions, if any, for the business problem that the model addresses, the business problem stakeholders, and the line of business.

   1. To update any of the fields in the **Business** tab, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. Update the descriptions in any of the fields.

      1.  At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

1. To update and view existing documentation (represented as key-value pairs) for your model, complete the following steps.

   1. Choose **Documentation** in the left sidebar of the **Audit** page.

   1. View existing key-value pairs.

   1. To add any key-value pairs, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. Choose **Add**.

      1. Enter a new key and associated value.

      1.  At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

   1. To remove any key-value pairs, complete the following steps.

      1. Choose the vertical ellipsis at the top right corner of the model version page, and choose **Edit**.

      1. Choose the **Trash** icon next to the key-value pair to remove.

      1.  At the top of the model version page, choose **Save** in the **Editing Model Version...** banner.

# Update deployment information (Studio)


**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the updated Studio experience. For information about using the Studio Classic application, see [Amazon SageMaker Studio Classic](studio.md).

After you evaluate your model performance and determine that it is ready to use for production workloads, you can change the approval status of the model to initiate CI/CD deployment. For more about approval status definitions, see [Update the Approval Status of a Model](model-registry-approve.md).

**To view or update details related to the model package deployment, complete the following steps.**

1. On the **Deploy** tab, view the model package approval status. Possible values can be the following:
   + **Pending Approval**: The model is registered but not yet approved or rejected for deployment.
   + **Approved**: The model is approved for CI/CD deployment. If there is an EventBridge rule in place that initiates model deployment upon a model approval event, as is the case for a model built from a SageMaker AI project template, SageMaker AI also deploys the model.
   + **Rejected**: The model is rejected for deployment.


1. To update the model package approval status, choose the dropdown next to the approval status and choose the updated approval status.

1. In the **Containers** list, view the inference image containers.

1. In the **Instances** list, view the instances that make up your deployment endpoint.
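For pipelines that manage approvals outside the console, the approval status shown above corresponds to the `ModelApprovalStatus` field of the `UpdateModelPackage` API. The following boto3 sketch builds such a request; the ARN is a placeholder, and the live call is commented out because it requires AWS credentials.

```python
# Sketch: changing a model version's approval status through the
# UpdateModelPackage API. The ARN below is a placeholder. Setting the
# status to "Approved" can trigger EventBridge-driven CI/CD deployment.

VALID_STATUSES = {"Approved", "Rejected", "PendingManualApproval"}

def build_approval_update(model_package_arn, status, description=""):
    """Build an UpdateModelPackage request that changes approval status."""
    if status not in VALID_STATUSES:
        raise ValueError(f"status must be one of {sorted(VALID_STATUSES)}")
    request = {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": status,
    }
    if description:
        request["ApprovalDescription"] = description
    return request

request = build_approval_update(
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/my-group/1",
    "Approved",
    "Passed evaluation thresholds",
)
print(request["ModelApprovalStatus"])  # Approved

# Live call (requires AWS credentials):
# import boto3
# boto3.client("sagemaker").update_model_package(**request)
```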

# Compare Model Versions


As you generate model versions, you might want to compare model versions by viewing relevant model quality metrics side-by-side. For example, you might want to track accuracy by comparing mean squared error (MSE) values, or you might decide to remove models that perform poorly on selected measures. The following procedure shows you how to set up model version comparison in Model Registry using the Amazon SageMaker Studio Classic console.

## Compare Model Versions (Amazon SageMaker Studio Classic)


**Note**  
You can only compare model versions in the Amazon SageMaker Studio Classic console.

To compare model versions within a model group, complete the following steps:

1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the Model Group you want to view. A new tab opens with a list of the model versions in the Model Group.

1. In the list of model versions, check the boxes next to the model versions you want to compare.

1. Choose the **Actions** dropdown menu, then choose **Compare**. A listing of model quality metrics appears for your selected models.
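The side-by-side comparison above can also be approximated in code. The sketch below picks the best of several model versions by a chosen metric; the version records are illustrative, and in practice you would build them from `ListModelPackages` and `DescribeModelPackage` responses.

```python
# Sketch: comparing model versions by a quality metric such as MSE.
# The version records below are illustrative sample data; with boto3 you
# would build them from list_model_packages / describe_model_package.

def best_by_metric(versions, metric, lower_is_better=True):
    """Return the version record with the best value for a metric."""
    pick = min if lower_is_better else max
    return pick(versions, key=lambda v: v["metrics"][metric])

versions = [
    {"version": 1, "metrics": {"mse": 0.31, "accuracy": 0.88}},
    {"version": 2, "metrics": {"mse": 0.24, "accuracy": 0.91}},
    {"version": 3, "metrics": {"mse": 0.27, "accuracy": 0.90}},
]

# Lower MSE is better; higher accuracy is better.
print(best_by_metric(versions, "mse")["version"])  # 2
print(best_by_metric(versions, "accuracy", lower_is_better=False)["version"])  # 2
```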

# View and Manage Model Group and Model Version Tags

Model Registry helps you view and manage tags related to your model groups. You can use tags to categorize model groups by purpose, owner, environment, or other criteria. The following instructions show you how to view, add, delete, and edit your tags in the Amazon SageMaker Studio console.

**Note**  
Model packages (versioned model packages) in the SageMaker Model Registry do not support tags. Instead, you can add key-value pairs using `CustomerMetadataProperties`. Model package groups in the Model Registry do support tagging.
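Because versioned model packages take metadata rather than tags, key-value pairs go through `CustomerMetadataProperties` on the `UpdateModelPackage` API. The boto3 sketch below builds such a request; the ARN and metadata keys are placeholders, and the live call is commented out.

```python
# Sketch: attaching key-value metadata to a model package version via
# CustomerMetadataProperties (versioned packages do not support tags).
# The ARN and metadata keys below are illustrative placeholders.

def build_metadata_update(model_package_arn, metadata):
    """Build an UpdateModelPackage request that sets customer metadata."""
    return {
        "ModelPackageArn": model_package_arn,
        # The API expects string keys and values.
        "CustomerMetadataProperties": {
            str(k): str(v) for k, v in metadata.items()
        },
    }

request = build_metadata_update(
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/my-group/1",
    {"team": "fraud-detection", "dataset-version": "2024-06"},
)
print(request["CustomerMetadataProperties"])

# Live call (requires AWS credentials):
# import boto3
# boto3.client("sagemaker").update_model_package(**request)
```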

## View and manage model group tags


------
#### [ Studio ]

**To view a model group tag, complete the following steps:**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the Model Groups list, select the name of the Model Group you want to view.

1. In the model group page, choose the **Tags** tab. View the tags associated with your model group.

**To add a model group tag, complete the following steps:**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the Model Groups list, select the name of the Model Group you want to edit.

1. In the model group page, choose the **Tags** tab.

1. Choose **Add/Edit tags**.

1. Above **Add new tag**, enter your new key in the blank **Key** field.

1. (Optional) Enter your new value in the blank **Value** field.

1. Choose **Confirm changes**.

1. Confirm your new tag appears in the **Tags** section of the **Information** page.

**To delete a model group tag, complete the following steps:**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the Model Groups list, select the name of the Model Group you want to edit.

1. In the model group page, choose the **Tags** tab.

1. Choose **Add/Edit tags**.

1. Choose the **Trash** icon next to the key-value pair you want to remove.

1. Choose **Confirm changes**.

**To edit a model group tag, complete the following steps:**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the Model Groups list, select the name of the Model Group you want to edit.

1. In the model group page, choose the **Tags** tab.

1. Choose **Add/Edit tags**.

1. Enter a new value in the **Value** field of the key-value pair you want to edit.

1. Choose **Confirm changes**.

------
#### [ Studio Classic ]

**To view a model group tag, complete the following steps:**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the Model Groups list, select the name of the Model Group you want to edit.

1. Choose **Information**.

1. View your tags in the **Tags** section of the **Information** page.

**To add a model group tag, complete the following steps:**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the Model Groups list, select the name of the Model Group you want to edit.

1. Choose **Information**.

1. If you don't have any tags, choose **Add tags**.

1. If you have pre-existing tags, choose **Manage tags** in the **Tags** section. A list of the model group's tags appears as key-value pairs.

1. Above **Add new tag**, enter your new key in the blank **Key** field.

1. (Optional) Enter your new value in the blank **Value** field.

1. Choose **Confirm changes**.

1. Confirm your new tag appears in the **Tags** section of the **Information** page.

**To delete a model group tag, complete the following steps:**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the Model Groups list, select the name of the Model Group you want to edit.

1. Choose **Information**.

1. In the **Tags** section, choose **Manage tags**. A list of the model group's tags appears as key-value pairs.

1. Choose the **Trash** icon to the right of the tag you want to remove.

1. Choose **Confirm changes**.

1. Confirm your removed tag does not appear in the **Tags** section of the **Information** page.

**To edit a model group tag, complete the following steps:**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the Model Groups list, select the name of the Model Group you want to edit.

1. Choose **Information**.

1. In the **Tags** section, choose **Manage tags**. A list of the model group's tags appears as key-value pairs.

1. Edit any key or value.

1. Choose **Confirm changes**.

1. Confirm your tag contains your edits in the **Tags** section of the **Information** page.

**To assign or tag model groups to a project, complete the following steps:**

1. Get the tags with keys `sagemaker:project-name` and `sagemaker:project-id` for the SageMaker AI project using the [ListTags](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ListTags.html) API.

1. To apply the tags to your model package group, choose one of the following methods:
   + If you create a new model package group and want to add tags, pass your tags from Step 1 to the [CreateModelPackageGroup](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelPackageGroup.html) API.
   + If you want to add tags to an existing model package group, use the [AddTags](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_AddTags.html) API.
   + If you create your model package group through Pipelines, use the `pipeline.create()` or `pipeline.upsert()` methods, or pass your tags to the [RegisterModel](https://docs.aws.amazon.com/sagemaker/latest/dg/build-and-manage-steps.html#step-type-register-model) step.
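The steps above can be sketched with boto3. The helper below filters a `ListTags` response down to the project tags; the ARNs in the commented-out live calls are placeholders, and `add_tags` corresponds to the AddTags API for an existing model package group.

```python
# Sketch: copying SageMaker AI project tags to a model package group.
# ListTags returns tags as Key/Value dicts; the ARNs in the commented
# calls are placeholders.

PROJECT_TAG_KEYS = {"sagemaker:project-name", "sagemaker:project-id"}

def project_tags(tags):
    """Filter a ListTags response body down to the project tags."""
    return [t for t in tags if t["Key"] in PROJECT_TAG_KEYS]

# Live calls (require AWS credentials):
# import boto3
# sm = boto3.client("sagemaker")
# tags = sm.list_tags(ResourceArn="<project ARN>")["Tags"]
# sm.add_tags(ResourceArn="<model package group ARN>",
#             Tags=project_tags(tags))

# Local demonstration with response-shaped data:
sample_tags = [
    {"Key": "sagemaker:project-name", "Value": "my-project"},
    {"Key": "sagemaker:project-id", "Value": "p-abcd1234"},
    {"Key": "team", "Value": "ml-platform"},
]
print(project_tags(sample_tags))
```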

------

# Delete a Model Version

This procedure demonstrates how to delete a model version in the Amazon SageMaker Studio console.

## Delete a Model Version (Studio or Studio Classic)


To delete a model version in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the model groups list, choose the angle bracket to the left of the model group that you want to view.

1. A list of the model versions in the model group appears. If you don't see the model version that you want to delete, choose **View all**.

1. Select the check boxes next to the model versions that you want to delete.

1. Choose the vertical ellipsis above the top right corner of the table, and choose **Delete** (or **Delete model version** if you are in the model group details page).

1. In the **Delete model version** dialog box, choose **Yes, delete the model version**.

1. Choose **Delete**.

1. Confirm that your deleted model versions no longer appear in the model group.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**. A list of your Model Groups appears.

1. From the model groups list, select the name of the Model Group of the model version that you want to delete.

1. From the list of model versions, select the name of the model version that you want to delete.

1. Choose the **Actions** dropdown menu, and choose **Remove**.

1. In the confirmation dialog box, enter `REMOVE`.

1. Choose **Remove**.

1. Confirm the model version you removed does not appear in the list of the model group’s model versions.

------
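Outside the console, a model version can be deleted with the `DeleteModelPackage` API, which identifies the version by its ARN. The sketch below builds that ARN; all identifiers are placeholders, and the destructive call itself is commented out.

```python
# Sketch: deleting a model version (a versioned model package) via the
# DeleteModelPackage API. All identifiers below are placeholders, and
# deletion is permanent.

def model_version_arn(region, account_id, group_name, version):
    """Build the ARN of a specific version in a model package group."""
    return (
        f"arn:aws:sagemaker:{region}:{account_id}:"
        f"model-package/{group_name}/{version}"
    )

arn = model_version_arn("us-east-1", "111122223333", "my-model-group", 3)
print(arn)

# Live call (requires AWS credentials):
# import boto3
# boto3.client("sagemaker").delete_model_package(ModelPackageName=arn)
```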

# Staging Construct for your Model Lifecycle

With the Model Registry staging construct, you can define a series of stages that models progress through in your model workflows and lifecycle. This simplifies tracking and managing models as they move through development, testing, and production. The following topics describe staging constructs and how to use them in your model governance.

The staging construct allows you to define a series of stages and statuses that models progress through. At each stage, specific personas with the relevant permissions can update the stage status. As a model advances through the stages, its metadata is carried forward, providing a comprehensive view of the model's lifecycle. This metadata can be accessed and reviewed by authorized personas at each stage, enabling informed decision making. The staging construct provides the following benefits.
+ Model Life Cycle Permissions - Set permissions for designated personas to update a model stage status and enforce approval gates at critical transition points. Administrators can assign permissions by using IAM policies and condition keys with the API. For example, you can restrict your data scientists from updating the model lifecycle stage from "Development" to "Production". For examples, see [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md).
+ Model Life Cycle Events via Amazon EventBridge - You can consume the lifecycle stage events using EventBridge. This sets you up to receive event notifications when models change approval or staging state, enabling integration with third-party governance tools. See [Get event notifications for ModelLifeCycle](model-registry-staging-construct-event-bridge.md) for an example.
+ Search based on Model Life Cycle Fields - You can search and filter by stage and stage status using the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_Search.html) API.
+ Audit Trails for Model Life Cycle Events - You can view the history of model approval and staging events for the model lifecycle transitions.

The following topics walk you through how to set up a stage construct as an administrator and how to update a stage status as a user.
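As a sketch of the search capability described above, the following Python snippet builds the request body for the `Search` API that filters model packages by lifecycle stage. The commented boto3 call and the `"QA"` stage name are illustrative assumptions, not part of any required setup.

```python
def build_stage_search(stage_name):
    # Search request that filters model packages by ModelLifeCycle.Stage.
    return {
        "Resource": "ModelPackage",
        "SearchExpression": {
            "Filters": [
                {
                    "Name": "ModelLifeCycle.Stage",
                    "Operator": "Equals",
                    "Value": stage_name,
                }
            ]
        },
    }

# Hypothetical call; requires boto3 and AWS credentials:
# sm_client = boto3.client("sagemaker")
# response = sm_client.search(**build_stage_search("QA"))
```

The same filter shape works from the AWS CLI, as shown in the CLI examples later in this chapter.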

**Topics**
+ [

# Set up Staging Construct Examples
](model-registry-staging-construct-set-up.md)
+ [

# Update a model package stage and status in Studio
](model-registry-staging-construct-update-studio.md)
+ [

# Update a model package stage and status example (boto3)
](model-registry-staging-construct-update-boto3.md)
+ [

# Invoke ModelLifeCycle using the AWS CLI examples
](model-registry-staging-construct-cli.md)
+ [

# Get event notifications for ModelLifeCycle
](model-registry-staging-construct-event-bridge.md)

# Set up Staging Construct Examples


To set up stage constructs for your Amazon SageMaker Model Registry, the administrator will need to grant the relevant permissions to the intended roles. The following provides examples on how to set up stage constructs for various roles.

**Note**  
Users within an Amazon SageMaker AI domain can view all stages defined within the domain, but can only use the ones they have permissions for.

Stages are defined by the `ModelLifeCycle` parameter and have the following structure. The administrator sets up the permissions for which `stage` and `stageStatus` can be accessed by which roles. The users assuming a role can use the relevant `stage` and `stageStatus` and include their own `stageDescription`.

```
ModelLifeCycle {
    stage: String # Required (e.g., Development/QA/Production)
    stageStatus: String # Required (e.g., PendingApproval/Approved/Rejected)  
    stageDescription: String # Optional
}
```

The following table contains Model Registry pre-defined stage construct templates. You can define your own stage constructs based on your use cases. The relevant permissions will need to be set up before users can use them.


| Stage | Stage status | 
| --- | --- | 
|  Proposal  |  PendingApproval  | 
|  Development  |  InProgress  | 
|  QA  |  OnHold  | 
|  PreProduction  |  Approved  | 
|  Production  |  Rejected  | 
|  Archived  |  Retired  | 

The `ModelLifeCycle` parameter can be invoked by the following APIs:
+ [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelPackage.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelPackage.html)
+ [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateModelPackage.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateModelPackage.html)
+ [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeModelPackage.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_DescribeModelPackage.html)
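For illustration, the following is a minimal sketch of how a `ModelLifeCycle` block might be attached to a `CreateModelPackage` request. The field casing follows the CLI examples later in this chapter; the group name and source URI in the commented call are placeholders.

```python
def create_model_package_request(group_name, source_uri, stage, stage_status, description=""):
    # Request body for CreateModelPackage carrying a ModelLifeCycle block.
    return {
        "ModelPackageGroupName": group_name,
        "SourceUri": source_uri,
        "ModelLifeCycle": {
            "Stage": stage,
            "StageStatus": stage_status,
            "StageDescription": description,
        },
    }

# Hypothetical call; requires boto3 and AWS credentials:
# sm_client.create_model_package(
#     **create_model_package_request("my-group", "s3://bucket/model", "Development", "InProgress")
# )
```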

------
#### [ Policy for a data scientist role ]

The following is an example IAM policy using model lifecycle condition keys. You can modify it based on your own requirements. In this example, the role’s permissions are limited to the following model lifecycle transitions:
+ Create or update a model package with the stage `"Development"` and status `"Approved"`.
+ Update a model package with the quality assurance stage `"QA"` and status `"PendingApproval"`.

```
{
    "Effect": "Allow",
    "Action": [
        "sagemaker:UpdateModelPackage",
        "sagemaker:CreateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "Development",
            "sagemaker:ModelLifeCycle:stageStatus": "Approved"
        }
    }
},
{
    "Effect": "Allow",
    "Action": [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "QA",
            "sagemaker:ModelLifeCycle:stageStatus": "PendingApproval"
        }
    }
}
```

------
#### [ Policy for a quality assurance specialist ]

The following is an example IAM policy using model lifecycle condition keys. You can modify it based on your own requirements. In this example, the role’s permissions are limited to the following model lifecycle transitions:
+ Update a model package with:
  + The stage `"QA"` and status `"Approved"` or `"Rejected"`.
  + The stage `"Production"` and status `"PendingApproval"`.

```
{
    "Effect": "Allow",
    "Action": [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "QA",
            "sagemaker:ModelLifeCycle:stageStatus": "Approved"
        }
    }
}, {
    "Effect": "Allow",
    "Action": [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "QA",
            "sagemaker:ModelLifeCycle:stageStatus": "Rejected"
        }
    }
}, {
    "Effect": "Allow",
    "Action": [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "Production",
            "sagemaker:ModelLifeCycle:stageStatus": "PendingApproval"
        }
    }
}
```

------
#### [ Policy for lead engineer role ]

The following is an example IAM policy using model lifecycle condition keys. You can modify it based on your own requirements. In this example, the role’s permissions are limited to the following model lifecycle transitions:
+ Update a model package with:
  + The stage `"Production"` and status `"Approved"` or `"Rejected"`.
  + The stage `"Development"` and status `"PendingApproval"`.

```
{
    "Effect": "Allow",
    "Action": [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "Production",
            "sagemaker:ModelLifeCycle:stageStatus": "Approved"
        }
    }
},
{
    "Effect": "Allow",
    "Action": [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "Production",
            "sagemaker:ModelLifeCycle:stageStatus": "Rejected"
        }
    }
},
{
    "Effect": "Allow",
    "Action": [
        "sagemaker:UpdateModelPackage"
    ],
    "Resource": [
        "*"
    ],
    "Condition": {
        "StringEquals": {
            "sagemaker:ModelLifeCycle:stage": "Development",
            "sagemaker:ModelLifeCycle:stageStatus": "PendingApproval"
        }
    }
}
```

------

To get Amazon EventBridge notifications on any model status update, see the example in [Get event notifications for ModelLifeCycle](model-registry-staging-construct-event-bridge.md). For an example EventBridge payload you may receive, see [SageMaker model package state change](automating-sagemaker-with-eventbridge.md#eventbridge-model-package).

# Update a model package stage and status in Studio


To use a model package stage construct, you will need to assume an execution role with the relevant permissions. The following page provides information on how to update the stage status using Amazon SageMaker Studio.

All stage constructs defined in the domain are viewable by all users. To update a stage, you need to have the administrator set up the relevant permissions for you to access it. For information on how, see [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md). 

The following procedure will take you to the Studio UI where you can update your model package stage.

1. Sign in to Amazon SageMaker Studio. For more information, see [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. In the left navigation pane, choose **Models**.

1. Find your model.
   + You can use the tabs to find your models. For example, choose the **Registered models** or **Deployable models** tabs.
   + You can use the **My models** and **Shared with me** options to find models you created or ones that are shared with you.

1. Select the checkbox next to the model you wish to update.

1. Choose the **More options** icon. 

1. Choose **Update model lifecycle**. This will take you to the **Update model lifecycle** section.

1. Complete the tasks to update the stage. 

   If you cannot update the stage, you will receive an error. Your administrator will need to set up the permissions for you to do so. For information on how to set up the permissions, see [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md).

# Update a model package stage and status example (boto3)


To update a model package stage and status, you will need to assume an execution role with the relevant permissions. The following provides an example of how you can update the stage status with the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateModelPackage.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateModelPackage.html) API using the AWS SDK for Python (Boto3).

In this example, the `ModelLifeCycle` stage `"Development"` and stage status `"Approved"` condition keys for the [https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateModelPackage.html](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateModelPackage.html) API action have been granted to your execution role. You can include a description in `stage-description`. See [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md) for more information. 

```
from sagemaker import get_execution_role, session 
import boto3 

region = boto3.Session().region_name
role = get_execution_role() 
sm_client = boto3.client('sagemaker', region_name=region)

model_package_update_input_dict = {
    "ModelPackageArn" : model_package_arn,  # ARN of the model package to update
    "ModelLifeCycle" : { 
        "stage" : "Development",
        "stageStatus" : "Approved",
        "stageDescription" : "stage-description"
    }
} 
model_package_update_response = sm_client.update_model_package(**model_package_update_input_dict)
```
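To confirm the update took effect, you could read the lifecycle fields back with `DescribeModelPackage`. The small helper below is an illustrative assumption, not part of the API, and the field casing follows the CLI examples in this chapter.

```python
def current_stage(describe_response):
    # Extract (stage, status) from a DescribeModelPackage response.
    life_cycle = describe_response.get("ModelLifeCycle", {})
    return life_cycle.get("Stage"), life_cycle.get("StageStatus")

# Hypothetical verification call, reusing sm_client from the snippet above:
# response = sm_client.describe_model_package(ModelPackageName=model_package_arn)
# stage, status = current_stage(response)
```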

# Invoke ModelLifeCycle using the AWS CLI examples


You can use the AWS CLI to manage your AWS resources. The following page provides examples of using `ModelLifeCycle` with AWS CLI commands such as [search](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sagemaker/search.html) and [list-actions](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/sagemaker/list-actions.html). For information and examples on setting up your stage construct, see [Set up Staging Construct Examples](model-registry-staging-construct-set-up.md).

The examples on this page use the following variables.
+ `region` is the region that your model package exists in.
+ `stage-name` is the name of your defined stage.
+ `stage-status` is the name of your defined stage status.

The following are example AWS CLI commands using ModelLifeCycle.

Search for your model packages with a *stage-name* you have already defined.

```
aws sagemaker search --region 'region' --resource ModelPackage --search-expression '{"Filters": [{"Name": "ModelLifeCycle.Stage","Value": "stage-name"}]}'
```

List the actions associated with `ModelLifeCycle`.

```
aws sagemaker list-actions --region 'region' --action-type ModelLifeCycle
```

Create a model package with ModelLifeCycle.

```
aws sagemaker create-model-package --model-package-group-name 'model-package-group-name' --source-uri 'source-uri' --region 'region' --model-life-cycle '{"Stage":"stage-name", "StageStatus":"stage-status", "StageDescription":"Your Staging Comment"}' 
```

Update a model package with ModelLifeCycle.

```
aws sagemaker update-model-package --model-package 'model-package-arn' --region 'region' --model-life-cycle '{"Stage":"stage-name", "StageStatus":"stage-status"}' 
```

Fetch audit records for ModelLifeCycle updates via [Amazon SageMaker ML Lineage Tracking](lineage-tracking.md) APIs.

```
aws sagemaker list-actions --region 'region' --action-type ModelLifeCycle
```

```
aws sagemaker describe-action --region 'region' --action-name 'action-arn or action-name'
```

# Get event notifications for ModelLifeCycle


You can get ModelLifeCycle update notifications and events with EventBridge in your account. The following is an example of an EventBridge rule to configure in your account to receive ModelLifeCycle event notifications.

```
{
  "source": ["aws.sagemaker"],
  "detail-type": ["SageMaker Model Package State Change"]
}
```

For an example EventBridge payload you may receive, see [SageMaker model package state change](automating-sagemaker-with-eventbridge.md#eventbridge-model-package).
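If you prefer to create the rule programmatically, the following is a sketch using the EventBridge API. The rule name, target ID, and SNS topic in the commented calls are illustrative assumptions.

```python
import json

# Same event pattern as the rule shown above.
event_pattern = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
}

# Hypothetical calls; require boto3, credentials, and an existing SNS topic:
# events = boto3.client("events")
# events.put_rule(Name="model-lifecycle-updates", EventPattern=json.dumps(event_pattern))
# events.put_targets(
#     Rule="model-lifecycle-updates",
#     Targets=[{"Id": "notify", "Arn": sns_topic_arn}],
# )
```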

# Update the Approval Status of a Model
Update Model Approval Status

After you create a model version, you typically want to evaluate its performance before you deploy it to a production endpoint. If it performs to your requirements, you can update the approval status of the model version to `Approved`. Setting the status to `Approved` can initiate CI/CD deployment for the model. If the model version does not perform to your requirements, you can update the approval status to `Rejected`.

You can manually update the approval status of a model version after you register it, or you can create a condition step to evaluate the model when you create a SageMaker AI pipeline. For information about creating a condition step in a SageMaker AI pipeline, see [Pipelines steps](build-and-manage-steps.md).

When you use one of the SageMaker AI provided project templates and the approval status of a model version changes, one of the following actions occurs. Only valid transitions are shown.
+ `PendingManualApproval` to `Approved` – initiates CI/CD deployment for the approved model version
+ `PendingManualApproval` to `Rejected` – No action
+ `Rejected` to `Approved` – initiates CI/CD deployment for the approved model version
+ `Approved` to `Rejected` – initiates CI/CD to deploy the latest model version with an `Approved` status

You can update the approval status of a model version by using the AWS SDK for Python (Boto3) or by using the Amazon SageMaker Studio console. You can also update the approval status of a model version as part of a condition step in a SageMaker AI pipeline. For information about using a model approval step in a SageMaker AI pipeline, see [Pipelines overview](pipelines-overview.md).

## Update the Approval Status of a Model (Boto3)


When you created the model version in [Register a Model Version](model-registry-version.md), you set the `ModelApprovalStatus` to `PendingManualApproval`. You update the approval status for the model by calling `update_model_package`. Note that you can automate this process by writing code that, for example, sets the approval status of a model depending on the result of an evaluation of some measure of the model's performance. You can also create a step in a pipeline that automatically deploys a new model version when it is approved. The following code snippet shows how to manually change the approval status to `Approved`.

```
model_package_update_input_dict = {
    "ModelPackageArn" : model_package_arn,
    "ModelApprovalStatus" : "Approved"
}
model_package_update_response = sm_client.update_model_package(**model_package_update_input_dict)
```
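As one way to automate the approval decision described above, the following sketch chooses a status from an evaluation metric. The metric name and threshold are assumptions for illustration, not a prescribed evaluation method.

```python
def approval_update(model_package_arn, accuracy, threshold=0.90):
    # Approve the version only if the evaluated accuracy clears the threshold.
    status = "Approved" if accuracy >= threshold else "Rejected"
    return {
        "ModelPackageArn": model_package_arn,
        "ModelApprovalStatus": status,
    }

# Hypothetical call, reusing the sm_client from the snippet above:
# sm_client.update_model_package(**approval_update(model_package_arn, accuracy=0.93))
```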

## Update the Approval Status of a Model (Studio or Studio Classic)


To manually change the approval status in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the model groups list, choose the angle bracket to the left of the model group that you want to view.

1. A list of the model versions in the model group appears. If you don't see the model version that you want to update, choose **View all** to display the complete list of model versions in the model group details page.

1. Select the name of the model version that you want to update.

1. The **Deploy** tab displays the current approval status. Choose the dropdown menu next to the current approval status and select the updated approval status.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the Model Group that you want to view. A new tab opens with a list of the model versions in the Model Group.

1. In the list of model versions, select the name of the model version that you want to update.

1. Under the **Actions** dropdown menu, you can choose one of two possible menu options to update the model version status.
   + Using the **Update Status** option

     1. Under the **Actions** dropdown menu, choose the **Update Status** dropdown menu, and choose the new model version status.

     1. (Optional) In the **Comment** field, add additional details.

     1. Choose **Save and Update**.
   + Using the **Edit** option

     1. Under the **Actions** dropdown menu, choose **Edit**.

     1. (Optional) In the **Comment** field, add additional details.

     1. Choose **Save changes**.

1. Confirm the model version status is updated to the correct value in the model version page.

------

For `us-east-1`, `us-west-2`, `ap-northeast-1`, and `eu-west-1` regions, you can use the following instructions to access the lineage details for logged and registered model versions:

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. Choose **Models** from the left navigation pane.

1. Choose the **Logged models** tab, if not selected already, then select **Registered models**.

1. Select a model and choose **View Latest Version**.

1. Choose the **Governance** tab.

1. The **Deploy** section under **Governance overview** displays the current approval status. Select the updated approval status from the dropdown menu.

# Deploy a Model from the Registry with Python
Deploy Model with Python

After you register a model version and approve it for deployment, deploy it to a SageMaker AI endpoint for real-time inference. You can deploy your model by using the SageMaker AI SDK or the AWS SDK for Python (Boto3).

When you create a machine learning operations (MLOps) project and choose an MLOps project template that includes model deployment, approved model versions in the Model Registry are automatically deployed to production. For information about using SageMaker AI MLOps projects, see [MLOps Automation With SageMaker Projects](sagemaker-projects.md).

You can also enable an AWS account to deploy model versions that were created in a different account by adding a cross-account resource policy. For example, one team in your organization might be responsible for training models, and a different team is responsible for deploying and updating models.

**Topics**
+ [

## Deploy a Model from the Registry (SageMaker SDK)
](#model-registry-deploy-smsdk)
+ [

## Deploy a Model from the Registry (Boto3)
](#model-registry-deploy-api)
+ [

## Deploy a Model Version from a Different Account
](#model-registry-deploy-xaccount)

## Deploy a Model from the Registry (SageMaker SDK)


To deploy a model version using the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable), use the following code snippet:

```
from sagemaker import ModelPackage
from time import gmtime, strftime

model_package_arn = 'arn:aws:sagemaker:us-east-2:123456789012:model-package/modeltest/1'
model = ModelPackage(role=role, 
                     model_package_arn=model_package_arn, 
                     sagemaker_session=sagemaker_session)
model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge')
```

## Deploy a Model from the Registry (Boto3)


To deploy a model version using the AWS SDK for Python (Boto3), complete the following steps:

1. The following code snippet assumes you already created the SageMaker AI Boto3 client `sm_client` and a model version whose ARN is stored in the variable `model_version_arn`.

   Create a model object from the model version by calling the [create_model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model) API operation. Pass the Amazon Resource Name (ARN) of the model version as part of the `Containers` for the model object:

   ```
   from time import gmtime, strftime
   
   model_name = 'DEMO-modelregistry-model-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
   print("Model name : {}".format(model_name))
   container_list = [{'ModelPackageName': model_version_arn}]
   
   create_model_response = sm_client.create_model(
       ModelName = model_name,
       ExecutionRoleArn = role,
       Containers = container_list
   )
   print("Model arn : {}".format(create_model_response["ModelArn"]))
   ```

1. Create an endpoint configuration by calling `create_endpoint_config`. The endpoint configuration specifies the number and type of Amazon EC2 instances to use for the endpoint.

   ```
   endpoint_config_name = 'DEMO-modelregistry-EndpointConfig-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
   print(endpoint_config_name)
   create_endpoint_config_response = sm_client.create_endpoint_config(
       EndpointConfigName = endpoint_config_name,
       ProductionVariants=[{
           'InstanceType':'ml.m4.xlarge',
           'InitialVariantWeight':1,
           'InitialInstanceCount':1,
           'ModelName':model_name,
           'VariantName':'AllTraffic'}])
   ```

1. Create the endpoint by calling `create_endpoint`.

   ```
   endpoint_name = 'DEMO-modelregistry-endpoint-' + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
   print("EndpointName={}".format(endpoint_name))
   
   create_endpoint_response = sm_client.create_endpoint(
       EndpointName=endpoint_name,
       EndpointConfigName=endpoint_config_name)
   print(create_endpoint_response['EndpointArn'])
   ```
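Endpoint creation is asynchronous. As a follow-up sketch, you could wait for the endpoint to come in service and then invoke it; the content type and payload below are illustrative assumptions that depend on your model's inference container.

```python
# Request kwargs for invoking the endpoint once it is InService.
invoke_kwargs = {
    "EndpointName": "DEMO-modelregistry-endpoint",  # substitute your endpoint_name
    "ContentType": "text/csv",
    "Body": b"1.0,2.0,3.0",
}

# Hypothetical calls; require boto3 and AWS credentials:
# sm_client.get_waiter("endpoint_in_service").wait(EndpointName=invoke_kwargs["EndpointName"])
# runtime = boto3.client("sagemaker-runtime")
# prediction = runtime.invoke_endpoint(**invoke_kwargs)["Body"].read()
```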

## Deploy a Model Version from a Different Account


You can permit an AWS account to deploy model versions that were created in a different account by adding a cross-account resource policy. For example, one team in your organization might be responsible for training models, and a different team is responsible for deploying and updating models. When you create these resource policies, you apply the policy to the specific resource to which you want to grant access. For more information about cross-account resource policies in AWS, see [Cross-account policy evaluation logic](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic-cross-account.html) in the *AWS Identity and Access Management User Guide*.

**Note**  
You must use a KMS key to encrypt the [output data config](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_OutputDataConfig.html) action during training for cross-account model deployment.

To enable cross-account model deployment in SageMaker AI, you have to provide a cross-account resource policy for the Model Group that contains the model versions you want to deploy, the Amazon ECR repository where the inference image for the Model Group resides, and the Amazon S3 bucket where the model versions are stored.

To be able to deploy a model that was created in a different account, you must have a role that has access to SageMaker AI actions, such as a role with the `AmazonSageMakerFullAccess` managed policy. For information about SageMaker AI managed policies, see [AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md).

The following example creates cross-account policies for all three of these resources, and applies the policies to the resources. The example also assumes that you previously defined the following variables:
+ `bucket` – The Amazon S3 bucket where the model versions are stored.
+ `kms_key_id` – The KMS key used to encrypt the training output.
+ `sm_client` – A SageMaker AI Boto3 client.
+ `model_package_group_name` – The Model Group to which you want to grant cross-account access.
+ `model_package_group_arn` – The Model Group ARN to which you want to grant cross-account access.
+ `account` – The AWS account ID of the resource-owning account.
+ `region` – The AWS Region that the resources are in.

```
import json
import boto3

# The cross-account id to grant access to
cross_account_id = "123456789012"

# Create the policy for access to the ECR repository
ecr_repository_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'AddPerm',
        'Effect': 'Allow',
        'Principal': {
            'AWS': f'arn:aws:iam::{cross_account_id}:root'
        },
        'Action': ['ecr:*']
    }]
}

# Convert the ECR policy from JSON dict to string
ecr_repository_policy = json.dumps(ecr_repository_policy)

# Set the new ECR policy
ecr = boto3.client('ecr')
response = ecr.set_repository_policy(
    registryId = account,
    repositoryName = 'decision-trees-sample',
    policyText = ecr_repository_policy
)

# Create a policy for accessing the S3 bucket
bucket_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'AddPerm',
        'Effect': 'Allow',
        'Principal': {
            'AWS': f'arn:aws:iam::{cross_account_id}:root'
        },
        'Action': 's3:*',
        'Resource': f'arn:aws:s3:::{bucket}/*'
    }]
}

# Convert the policy from JSON dict to string
bucket_policy = json.dumps(bucket_policy)

# Set the new policy
s3 = boto3.client('s3')
response = s3.put_bucket_policy(
    Bucket = bucket,
    Policy = bucket_policy)

# Create a KMS grant so the cross-account principal can decrypt the
# encrypted model artifacts
client = boto3.client('kms')

response = client.create_grant(
    GranteePrincipal=cross_account_id,
    KeyId=kms_key_id,
    Operations=[
        'Decrypt',
        'GenerateDataKey',
    ],
)

# Create a policy for access to the Model Group.
model_package_group_policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Sid': 'AddPermModelPackageGroup',
        'Effect': 'Allow',
        'Principal': {
            'AWS': f'arn:aws:iam::{cross_account_id}:root'
        },
        'Action': ['sagemaker:DescribeModelPackageGroup'],
        'Resource': f'arn:aws:sagemaker:{region}:{account}:model-package-group/{model_package_group_name}'
    },{
        'Sid': 'AddPermModelPackageVersion',
        'Effect': 'Allow',
        'Principal': {
            'AWS': f'arn:aws:iam::{cross_account_id}:root'
        },
        'Action': ["sagemaker:DescribeModelPackage",
                   "sagemaker:ListModelPackages",
                   "sagemaker:UpdateModelPackage",
                   "sagemaker:CreateModel"],
        'Resource': f'arn:aws:sagemaker:{region}:{account}:model-package/{model_package_group_name}/*'
    }]
}

# Convert the policy from JSON dict to string
model_package_group_policy = json.dumps(model_package_group_policy)

# Set the policy to the Model Group
response = sm_client.put_model_package_group_policy(
    ModelPackageGroupName = model_package_group_name,
    ResourcePolicy = model_package_group_policy)

print("Success! You are all set to proceed for cross-account deployment.")
```
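On the consumer side, once the policies above are in place, the consumer account can create a model directly from the shared model package ARN and deploy it as usual. The following is a minimal sketch; the ARNs and names are placeholders.

```python
# Shared model package ARN from the resource-owning account (placeholder).
shared_model_package_arn = (
    "arn:aws:sagemaker:us-east-2:111122223333:model-package/modeltest/1"
)

create_model_kwargs = {
    "ModelName": "cross-account-model",
    "ExecutionRoleArn": "arn:aws:iam::444455556666:role/consumer-role",  # assumed role name
    "Containers": [{"ModelPackageName": shared_model_package_arn}],
}

# Hypothetical call in the consumer account; requires boto3 and credentials:
# sm_client = boto3.client("sagemaker")
# sm_client.create_model(**create_model_kwargs)
```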

# Deploy a Model in Studio
Deploy Model in Studio

After you register a model version and approve it for deployment, deploy it to an Amazon SageMaker AI endpoint for real-time inference. You can [Deploy a Model from the Registry with Python](model-registry-deploy.md) or deploy your model in Amazon SageMaker Studio. The following provides instructions on how to deploy your model in Studio.

This feature is not available in Amazon SageMaker Studio Classic.
+ If Studio is your default experience, the UI is similar to the images found in [Amazon SageMaker Studio UI overview](studio-updated-ui.md).
+ If Studio Classic is your default experience, the UI is similar to the images found in [Amazon SageMaker Studio Classic UI Overview](studio-ui.md).

Before you can deploy a model package, the following requirements must be met for the model package:
+ The model package must have a valid inference specification. See [InferenceSpecification](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateModelPackage.html#sagemaker-CreateModelPackage-request-InferenceSpecification) for more information.
+ The model must have an approved status. See [Update the Approval Status of a Model](model-registry-approve.md) for more information.

The following provides instructions on how to deploy a model in Studio.

**To deploy a model in Studio**

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Models** from the left navigation pane.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. (Optional) If you have models that are shared with you, you can choose between **My models** or **Shared with me**.

1. Select the checkboxes for the registered models. If the above requirements are met, the **Deploy** button becomes available to choose.

1. Choose **Deploy** to open the **Deploy model to endpoint** page.

1. Configure the deployment resources in the **Endpoint settings**. 

1. Once you have verified the settings, choose **Deploy**. The model will then be deployed to the endpoint with the **In service** status.

For `us-east-1`, `us-west-2`, `ap-northeast-1`, and `eu-west-1` regions, you can use the following instructions to deploy models:

**To deploy a model in Studio**

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Models** from the left navigation pane.

1. Choose the **My models** tab.

1. Choose the **Logged models** tab, if not selected already.

1. Select a model and choose **View Latest Version**.

1. Choose **Deploy** and select between SageMaker AI or Amazon Bedrock.

1. Once you have verified the settings, choose **Deploy**. The model will then be deployed to the endpoint with the **In service** status.

# Cross-account discoverability


By exploring and accessing model package groups registered in other accounts, data scientists and data engineers can promote data consistency, streamline collaboration, and reduce duplication of effort. With Amazon SageMaker Model Registry, you can share model package groups across accounts. There are two categories of permissions associated with the sharing of resources:
+ **Discoverability**: *Discoverability* is the ability of the resource consumer account to see the model package groups shared by one or more resource owner accounts. Discoverability is only possible if the resource owner attaches the necessary resource policies to the shared model package groups. The resource consumer can view all shared model package groups in the AWS RAM UI and AWS CLI.
+ **Accessibility**: *Accessibility* is the ability of the resource consumer account to use the shared model package groups. For example, the resource consumer can register or deploy a model package from a different account if they have the necessary permissions.

**Topics**
+ [

# Share model group in Studio
](model-registry-ram-studio-share.md)
+ [

# View shared model groups in Studio
](model-registry-ram-studio-view.md)
+ [

# Accessibility
](model-registry-ram-accessibility.md)
+ [

# Set up discoverability
](model-registry-ram-discover.md)
+ [

# View shared model package groups
](model-registry-ram-view-shared.md)
+ [

# Dissociate principals from a resource share and remove a resource share
](model-registry-ram-dissociate.md)
+ [

# Promote the permission and resource share
](model-registry-ram-promote.md)

# Share model group in Studio


You can share your model groups with other AWS principals (AWS accounts or AWS Organizations) using the Studio UI. This streamlined sharing process enables cross-team collaboration, promotes best practices, and facilitates model reuse across your teams. The following provides instructions on how to share model groups in Studio.

This feature is not available in Amazon SageMaker Studio Classic.
+ If Studio is your default experience, the UI is similar to the images found in [Amazon SageMaker Studio UI overview](studio-updated-ui.md).
+ If Studio Classic is your default experience, the UI is similar to the images found in [Amazon SageMaker Studio Classic UI Overview](studio-ui.md).

To share model groups, first make sure that the following permissions are added to the execution role from which you are sharing the resources.

1. [Get your execution role](sagemaker-roles.md#sagemaker-roles-get-execution-role).

1. [Update role permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_update-role-permissions.html) with the following:

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "ram:ListPermissions",
                   "ram:GetPermission",
                   "ram:GetResourceShareAssociations",
                   "ram:ListResourceSharePermissions",
                   "ram:DeleteResourceShare",
                   "ram:GetResourceShareInvitations",
                   "ram:AcceptResourceShareInvitation"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------
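If you manage role policies programmatically, building the statement in code avoids hand-editing JSON. The following is a minimal sketch using only the Python standard library; it reproduces the statement above, and the resulting document could be passed to the IAM `put-role-policy` API (the surrounding role setup is assumed):

```python
import json

# AWS RAM actions the execution role needs in order to share
# model groups, as listed in the policy above.
ram_actions = [
    "ram:ListPermissions",
    "ram:GetPermission",
    "ram:GetResourceShareAssociations",
    "ram:ListResourceSharePermissions",
    "ram:DeleteResourceShare",
    "ram:GetResourceShareInvitations",
    "ram:AcceptResourceShareInvitation",
]

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ram_actions, "Resource": "*"}
    ],
}

# Serialize to the compact single-line form that IAM APIs accept.
policy_json = json.dumps(policy)
print(policy_json)
```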

The following provides instructions on how to share a model group with other AWS principals.

**To share a model group with other AWS principals**

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Models** from the left navigation pane.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. Select a registered model.

1. At the top right corner, choose **Share**. This will open the **Share model group** section.

   If you see an error message at the bottom of the screen, you need to add the appropriate permissions to your execution role. See the preceding permissions for more information.

1. Under **Resource shares**, choose a resource share to update or create a new resource share. 

1. Under **Managed permission**, choose a managed permission to control the level of access for your model. 

   The available options include permissions that have already been created for you and any customer managed permissions that you created in AWS RAM. See [Creating and using customer managed permissions](https://docs.aws.amazon.com/ram/latest/userguide/create-customer-managed-permissions.html) in the *AWS Resource Access Manager User Guide*.

1. Under **AWS principals**, enter the AWS Organizations ARN or AWS account IDs that you want to share with, and then choose **Add**. You can add multiple AWS principals this way.

1. When the minimum requirements are satisfied, the **Share** button becomes accessible. Once you have verified your settings, choose **Share**.

   A successful share will result in a green banner message at the bottom of the screen.

# View shared model groups in Studio


You can view model groups that are shared with you directly or with an account that belongs to the same organization in AWS Organizations. If a model group is shared with an account in the same organization, the share is automatically approved and the model group is available for you to view in Studio. Otherwise, you must accept the pending invitation before you can view the shared model group in Studio. The following provides instructions on how to view shared model groups and accept model group share invitations in Studio.

This feature is not available in Amazon SageMaker Studio Classic.
+ If Studio is your default experience, the UI is similar to the images found in [Amazon SageMaker Studio UI overview](studio-updated-ui.md).
+ If Studio Classic is your default experience, the UI is similar to the images found in [Amazon SageMaker Studio Classic UI Overview](studio-ui.md).

The following provides instructions on how to view and accept model groups shared with you.

**View and accept model groups shared with you**

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Models** from the left navigation pane.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. Choose **Shared with me** to view the model groups that are shared with you.

1. To accept pending model group invitations:

   1. Choose **View pending approvals** to open the **Pending invitations** list.

   1. If you would like to accept the invitation, choose **Accept**.

# Accessibility


If the resource consumer has access permissions to use a shared model package group, they can register or deploy a version of the model package group. For details about how the resource consumer can register a shared model package group, see [Register a Model Version from a Different Account](model-registry-version.md#model-registry-version-xaccount). For details about how the resource consumer can deploy a shared model package group, see [Deploy a Model Version from a Different Account](model-registry-deploy.md#model-registry-deploy-xaccount).

# Set up discoverability


The resource owner can set up model package group discoverability by creating resource shares and attaching resource policies to the entities. For detailed steps about how to create a general resource share in AWS RAM, see [Create a resource share](https://docs.aws.amazon.com/ram/latest/userguide/getting-started-sharing.html#getting-started-sharing-create) in the [AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) documentation.

Complete the following instructions to set up model package group discoverability using the AWS RAM console or Model Registry Resource Policy APIs.

------
#### [ AWS CLI ]

1. Create a resource share in the model owner account.

   1. The model owner attaches a resource policy to the model package group using the SageMaker AI Resource Policy API [put-model-package-group-policy](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/put-model-package-group-policy.html), as demonstrated in the following command.

      ```
      aws sagemaker put-model-package-group-policy \
        --model-package-group-name <model-package-group-name> \
        --resource-policy "{\"Version\":\"2012-10-17\",\"Statement\":[{\"Sid\":\"ExampleResourcePolicy\",\"Effect\":\"Allow\",\"Principal\":<principal>,\"Action\":[\"sagemaker:DescribeModelPackage\",\"sagemaker:ListModelPackages\",\"sagemaker:DescribeModelPackageGroup\"],\"Resource\":[\"<model-package-group-arn>\",\"arn:aws:sagemaker:<region>:<owner-account-id>:model-package/<model-package-group-name>/*\"]}]}"
      ```
**Note**  
Different combinations of actions can be attached to the resource policy. For custom policies, the permission created should be promoted by the model package group owner, and only entities with promoted permissions attached are discoverable. Unpromotable resource shares cannot be made discoverable or managed through AWS RAM.

   1. To check that AWS RAM created the resource share ARN, use the following command:

      ```
      aws ram get-resource-share-associations --association-type resource --resource-arn <model-package-group-arn>
      ```

      The response contains the *resource-share-arn* for the entity.

   1. To check if the attached policy permission is a managed or custom policy, use the following command:

      ```
      aws ram list-resource-share-permissions --resource-share-arn <resource-share-arn>
      ```

      The `featureSet` field can take values `CREATED_FROM_POLICY` or `STANDARD`, which are defined as follows:
      + `STANDARD`: The permission already exists.
      + `CREATED_FROM_POLICY`: The permission needs to be promoted in order for the entity to be discoverable. For more information, see [Promote the permission and resource share](model-registry-ram-promote.md).

1. Accept the resource share invitation in the model consumer account.

   1. The model package group consumer accepts the resource share invitation. To see all resource share invitations, run the following command:

      ```
      aws ram get-resource-share-invitations
      ```

      Identify the requests that have status `PENDING` and include the account ID of the owner account.

   1. Accept the resource share invitation from the model owner using the following command:

      ```
      aws ram accept-resource-share-invitation --resource-share-invitation-arn <resource-share-invitation-arn>
      ```

------
#### [ AWS RAM console ]

1. Log into the [AWS RAM console](https://console.aws.amazon.com/ram/home).

1. Complete the following steps to create a resource share from the model package group owner account.

   1. Complete the following steps to specify resource share details.

      1. In the **Name** field, add a unique name for your resource.

      1. In the **Resources** card, choose the dropdown menu and select **SageMaker AI Model Package Groups**.

      1. Select the check box of the ARN of the model package group resource share.

      1. In the **Select resources** card, select the check box of your model package group resource share.

      1. In the **Tags** card, add key-value pairs for tags to add to your resource share.

      1. Choose **Next**.

   1. Complete the following steps to associate managed permissions to the resource share.

      1. If you use a managed permission, choose a managed permission in the **Managed permissions** dropdown menu.

      1. If you use a custom permission, choose **Customer Managed Permission**. In this case, the model package group is not immediately discoverable. You have to promote the permission and the resource policy after you create the resource share. For information about how to promote permissions and resource shares, see [Promote the permission and resource share](model-registry-ram-promote.md). For more information about how to attach custom permissions, see [Creating and using customer managed permissions in AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/create-customer-managed-permissions.html).

      1. Choose **Next**.

   1. Complete the following steps to grant access to principals.

      1. Choose **Allow sharing with anyone** to allow sharing with accounts outside of your organization, or choose **Allow sharing only within your organization**.

      1. In the **Select principal type** dropdown menu, add the principal types and ID for the principals you want to add.

      1. Add and select the chosen principals for the share.

      1. Choose **Next**.

   1. Review the displayed share configuration and then choose **Create resource share**.

1. Accept the resource share invitation in the consumer account. Once the model owner creates the resource share and principal associations, the specified resource consumer accounts receive an invitation to join the resource share. The resource consumer accounts can view and accept the invitations in the [Shared with me: Resource shares](https://console.aws.amazon.com/ram/home#SharedResourceShares:) page in the AWS RAM console. For more information about accepting and viewing resources in AWS RAM, see [Access AWS resources shared with you](https://docs.aws.amazon.com/ram/latest/userguide/working-with-shared.html).

------
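Hand-escaping the `--resource-policy` JSON inside a shell command is error prone. As a sketch, the same policy document shown in the AWS CLI steps above can be built in Python and serialized to a single line; every account ID, group name, and principal here is a placeholder, not a real value:

```python
import json

# Placeholders -- substitute your own values.
region = "us-west-2"
owner_account_id = "111122223333"
group_name = "my-model-package-group"
group_arn = f"arn:aws:sagemaker:{region}:{owner_account_id}:model-package-group/{group_name}"
principal = {"AWS": "arn:aws:iam::444455556666:root"}  # consumer account

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ExampleResourcePolicy",
            "Effect": "Allow",
            "Principal": principal,
            "Action": [
                "sagemaker:DescribeModelPackage",
                "sagemaker:ListModelPackages",
                "sagemaker:DescribeModelPackageGroup",
            ],
            # The group itself plus every model package version in it.
            "Resource": [
                group_arn,
                f"arn:aws:sagemaker:{region}:{owner_account_id}:model-package/{group_name}/*",
            ],
        }
    ],
}

# A single-line JSON string suitable for the --resource-policy parameter.
print(json.dumps(policy))
```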

# View shared model package groups


After the resource owner completes the previous steps to create a resource share and the consumer accepts the invitation for the share, the consumer can view the shared model package groups using the AWS CLI or in the AWS RAM console.

## AWS CLI


To view the shared model package groups, run the following command in the model consumer account:

```
aws sagemaker list-model-package-groups --cross-account-filter-option CrossAccount
```
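To post-process the response in code, you can read its `ModelPackageGroupSummaryList` field. The following short sketch works on an illustrative, hard-coded response; the group name and account ID are placeholders:

```python
# Example response shape for list-model-package-groups; the entries
# here are illustrative placeholders, not real accounts or groups.
response = {
    "ModelPackageGroupSummaryList": [
        {
            "ModelPackageGroupName": "shared-nlp-models",
            "ModelPackageGroupArn": "arn:aws:sagemaker:us-west-2:111122223333:model-package-group/shared-nlp-models",
        },
    ]
}

# Map each shared group name to its owner account ID; the account ID
# is the fifth colon-separated field of the ARN.
shared = {
    item["ModelPackageGroupName"]: item["ModelPackageGroupArn"].split(":")[4]
    for item in response["ModelPackageGroupSummaryList"]
}
print(shared)  # -> {'shared-nlp-models': '111122223333'}
```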

## AWS RAM console


In the AWS RAM console, the resource owner and consumer can view shared model package groups. The resource owner can view the model package groups shared with the consumer by following the steps in [Viewing resource shares you created in AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/working-with-sharing-view-rs.html). The resource consumer can view the model package groups shared by the owner by following the steps in [Viewing resource shares shared with you](https://docs.aws.amazon.com/ram/latest/userguide/working-with-shared-view-rs.html).

# Dissociate principals from a resource share and remove a resource share


The resource owner can dissociate principals from the resource share for a set of permissions or delete the entire resource share using the AWS CLI or the AWS RAM console. For details about how to dissociate principals from a resource share, see [Update a Resource Share](https://docs.aws.amazon.com/ram/latest/userguide/working-with-sharing-update.html) in the [AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) documentation. For details about how to delete a resource share, see [Deleting a resource share](https://docs.aws.amazon.com/ram/latest/userguide/working-with-sharing-delete.html) in the [AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) documentation.

## AWS CLI


To dissociate principals from a resource share, use the command [disassociate-resource-share](https://docs.aws.amazon.com/cli/latest/reference/ram/disassociate-resource-share.html) as follows:

```
aws ram disassociate-resource-share --resource-share-arn <resource-share-arn> --principals <principal>
```

To delete a resource share, use the command [delete-resource-share](https://docs.aws.amazon.com/cli/latest/reference/ram/delete-resource-share.html) as follows:

```
aws ram delete-resource-share --resource-share-arn <resource-share-arn>
```

## AWS RAM console


For more details about how to dissociate principals from a resource share, see [Update a Resource Share](https://docs.aws.amazon.com/ram/latest/userguide/working-with-sharing-update.html) in the [AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) documentation. For more details about how to delete a resource share, see [Deleting a resource share](https://docs.aws.amazon.com/ram/latest/userguide/working-with-sharing-delete.html) in the [AWS RAM](https://docs.aws.amazon.com/ram/latest/userguide/what-is.html) documentation.

# Promote the permission and resource share


If you use customized (customer managed) permissions, you need to promote the permission and the associated resource share in order for the model package group to be discoverable. Complete the following steps to promote the permission and resource share.

1. To promote your customer managed permission so that it can be managed through AWS RAM, use the following command:

   ```
   aws ram promote-permission-created-from-policy --permission-arn <permission-arn>
   ```

1. Promote the resource share using the following command:

   ```
   aws ram promote-resource-share-created-from-policy --resource-share-arn <resource-share-arn>
   ```

If you see the `OperationNotPermittedException` error while performing the previous steps, the entity is not discoverable but is still accessible. For example, if the resource owner attaches a resource policy with an assume-role principal such as `"Principal": {"AWS": "arn:aws:iam::3333333333:role/Role-1"}`, or if the resource policy allows `"Action": "*"`, the associated model package group is neither promotable nor discoverable.
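Whether promotion is needed can be read from the `featureSet` field that `aws ram list-resource-share-permissions` returns, as described in [Set up discoverability](model-registry-ram-discover.md). A minimal sketch of that check in Python, applied to an illustrative parsed response (the permission ARN is a placeholder):

```python
def needs_promotion(list_permissions_response):
    """Return True if any attached permission was created from a policy
    and must be promoted before the share is discoverable."""
    return any(
        perm.get("featureSet") == "CREATED_FROM_POLICY"
        for perm in list_permissions_response.get("permissions", [])
    )

# Illustrative response fragment (placeholder ARN).
response = {
    "permissions": [
        {
            "arn": "arn:aws:ram::111122223333:permission/example",
            "featureSet": "CREATED_FROM_POLICY",
        }
    ]
}
print(needs_promotion(response))  # -> True
```

A `featureSet` of `STANDARD` means the permission already exists in AWS RAM and no promotion is required.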

# View the Deployment History of a Model

To view the deployments for a model version in the Amazon SageMaker Studio console, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

**View the deployment history for a model version**

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models** to display a list of your model groups.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. From the model groups list, choose the angle bracket to the left of the model group that you want to view.

1. A list of the model versions in the model group appears. If you don't see the model version that you want to view, choose **View all**.

1. Select the name of the model version that you want to view.

1. Choose the **Activity** tab. Deployments for the model version appear as events in the activity list with an **Event type** of **ModelDeployment**.

------
#### [ Studio Classic ]

**View the deployment history for a model version**

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. From the model groups list, select the name of the Model Group that you want to view.

1. A new tab appears with a list of the model versions in the Model Group.

1. In the list of model versions, select the name of the model version for which you want to view details.

1. On the model version tab that opens, choose **Activity**. Deployments for the model version appear as events in the activity list with an **Event type** of **ModelDeployment**.

------

# View model lineage details in Studio


You can view the lineage details of a registered model in Amazon SageMaker Studio. The following provides instructions on how to access the lineage view in Studio. See [Amazon SageMaker ML Lineage Tracking](lineage-tracking.md) for more information about lineage tracking in Amazon SageMaker Studio.

This feature is not available in Amazon SageMaker Studio Classic.
+ If Studio is your default experience, the UI is similar to the images found in [Amazon SageMaker Studio UI overview](studio-updated-ui.md).
+ If Studio Classic is your default experience, the UI is similar to the images found in [Amazon SageMaker Studio Classic UI Overview](studio-ui.md).

The lineage view is an interactive visualization of the resources associated with your registered models. These resources include datasets, training jobs, approvals, models, and endpoints. In the lineage you can also view the associated resource details, including the source URI, creation timestamp, and other metadata.

The following capabilities are available in `us-east-1`, `us-west-2`, `ap-northeast-1`, and `eu-west-1` regions: 

You can track the lineage of both logged and registered models. In these Regions, the lineage for model resources includes datasets, evaluators, training jobs, approvals, models, inference components, and endpoints. In the lineage view, you can also see the associated resource details, including the source URI, creation timestamp, and other metadata.

The following provides instructions on how to access the lineage details for a registered model version.

**To access the lineage details for a registered model version**

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Models** from the left navigation pane.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. (Optional) If you have models that are shared with you, you can choose **My models** or **Shared with me**.

1. Select a registered model.

1. Choose the **Versions** tab, if not selected already.

1. Choose a specific model version from the **Versions** list.

1. Choose the **Lineage** tab. 

In the **Lineage** tab you can navigate through the resources associated with the model version. You can also choose a resource to view the resource details. 

Note that the Lineage view is for visualization purposes only. Rearranging or moving the components in this view does not affect the actual registered model resources.

For `us-east-1`, `us-west-2`, `ap-northeast-1`, and `eu-west-1` regions, you can use the following instructions to access the lineage details for logged and registered model versions:

1. Open the Studio console by following the instructions in [Launch Amazon SageMaker Studio](studio-updated-launch.md).

1. Choose **Models** from the left navigation pane.

1. Choose the **My models** tab.

1. (Optional) If you have models that are shared with you, you can choose **Created by me** or **Shared with me**.

1. Select a model and choose **View Latest Version**.

1. Choose the **Lineage** tab.

# Model Registry Collections

You can use Collections to group registered models that are related to each other and organize them in hierarchies to improve model discoverability at scale. For example, you could categorize your models based on the domain of the problem they solve, as Collections titled *NLP-models*, *CV-models*, or *Speech-recognition-models*. To organize your registered models in a tree structure, you can nest Collections within each other. Any operations you perform on a Collection, such as create, read, update, or delete, will not alter your registered models. You can use the Amazon SageMaker Studio UI or the Python SDK to manage your Collections.

The **Collections** tab in the Model Registry displays a list of all the Collections in your account. The following sections describe how you can use options in the **Collections** tab to do the following:
+ Create Collections
+ Add Model Groups to a Collection
+ Move Model Groups between Collections
+ Remove Model Groups or Collections from other Collections

Any operation you perform on your Collections does not affect the integrity of the individual Model Groups they contain. The underlying Model Group artifacts in Amazon S3 and Amazon ECR are not modified.

While Collections provide greater flexibility in organizing your models, the internal representation imposes some constraints on the size of your hierarchy. For a summary of these constraints, see [Constraints](modelcollections-limitations.md).
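One such constraint, noted in [Create a Collection](modelcollections-create.md), is that the absolute path of a nested Collection must be 256 characters or less. The following hypothetical helper sketches a pre-check for candidate hierarchies before you create them; the separator is an assumption for illustration, not the registry's internal representation:

```python
MAX_PATH_LENGTH = 256  # documented limit on a Collection's absolute path


def path_fits(collection_names, separator="/"):
    """Return True if the absolute path formed by nesting the given
    Collection names stays within the 256-character limit.
    The separator used here is an illustrative assumption."""
    return len(separator.join(collection_names)) <= MAX_PATH_LENGTH


print(path_fits(["CV-models", "object-detection", "yolo-variants"]))  # -> True
```

Keeping Collection names short leaves room for deeper nesting within the limit.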

The following topics show you how to create and work with Collections in the Model Registry.

**Topics**
+ [

# Set up prerequisite permissions
](modelcollections-permissions.md)
+ [

# Create a Collection
](modelcollections-create.md)
+ [

# Add Model Groups to a Collection
](modelcollections-add-models.md)
+ [

# Remove Model Groups or Collections from a Collection
](modelcollections-remove-models.md)
+ [

# Move a Model Group Between Collections
](modelcollections-move-models.md)
+ [

# View a Model Group's Parent Collection
](modelcollections-view-parent.md)
+ [

# Constraints
](modelcollections-limitations.md)

# Set up prerequisite permissions


Create a custom policy which includes the following required Resource Groups actions:
+ `resource-groups:CreateGroup`
+ `resource-groups:DeleteGroup`
+ `resource-groups:GetGroupQuery`
+ `resource-groups:ListGroupResources`
+ `resource-groups:Tag`
+ `tag:GetResources`

For instructions on how to add an inline policy, see [Adding IAM identity permissions (console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html#add-policies-console). When prompted for the policy format, choose JSON and add the following policy:

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "resource-groups:ListGroupResources"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "resource-groups:GetGroupQuery"
            ],
            "Resource": "arn:aws:resource-groups:*:*:group/*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "resource-groups:CreateGroup",
                "resource-groups:Tag"
            ],
            "Resource": "arn:aws:resource-groups:*:*:group/*",
            "Condition": {
                "ForAnyValue:StringEquals": {
                    "aws:TagKeys": "sagemaker:collection"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "resource-groups:DeleteGroup",
            "Resource": "arn:aws:resource-groups:*:*:group/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/sagemaker:collection": "true"
                }
            }
        },
        {
            "Effect": "Allow",
            "Action": "tag:GetResources",
            "Resource": "*"
        }
    ]
}
```

------

# Create a Collection


**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.

You can create a Collection in the Amazon SageMaker Studio console. To create a Collection, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models**.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Collections**.

1. (Optional) To create a Collection inside another Collection, navigate to the hierarchy where you want to add your Collection. Otherwise, your Collection is created at the root level.

1. In the **Actions** dropdown menu in the top right, choose **Create new collection**.

1. Enter a name for your Collection in the **Name** field of the dialog box.
**Note**  
If you plan to create multiple hierarchies in this Collection, keep your Collection names short. The absolute path, which is a string representing the location of your Collections from the root level, must be 256 characters or less. For additional details, see [Collection and Model Group tagging](modelcollections-limitations.md#modelcollections-tagging).

1. (Optional) To add Model Groups to your Collection, complete the following steps:

   1. Choose **Select model groups**.

   1. Select the Model Groups that you want to add. You can select up to 10.

1. Choose **Create**.

1. Check to make sure your Collection was created in the current hierarchy. If you do not immediately see your new Collection, choose **Refresh**.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. Choose the **Collections** tab.

1. (Optional) To create a Collection inside another Collection, navigate to the hierarchy where you want to add your Collection. Otherwise, your Collection is created at the root level.

1. In the **Actions** dropdown menu in the top right, choose **Create new collection**.

1. Enter a name for your Collection in the **Name** field of the dialog box.
**Note**  
If you plan to create multiple hierarchies in this Collection, keep your Collection names short. The absolute path, which is a string representing the location of your Collections from the root level, must be 256 characters or less. For additional details, see [Collection and Model Group tagging](modelcollections-limitations.md#modelcollections-tagging).

1. (Optional) To add Model Groups to your Collection, complete the following steps:

   1. Choose **Select model groups**.

   1. Select the Model Groups that you want to add. You can select up to 10.

1. Choose **Create**.

1. Check to make sure your Collection was created in the current hierarchy. If you do not immediately see your new Collection, choose **Refresh**.

------

# Add Model Groups to a Collection


You can add model groups to a Collection in the Amazon SageMaker Studio console. To add Model Groups to a Collection, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models**.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Models**, if not selected already.

1. Select the check box next to the model groups that you want to add. You can select up to 10 Model Groups. If you select more than 10, the UI option to add your Model Groups to a Collection is inactive.

1. Choose the vertical ellipsis next to **Create**, and choose **Add to collection**.

1. Select the radio button for the collection to which you want to add your selected Model Groups.

1. Choose **Add to collection**.

1. Check to make sure your Model Groups were added to the collection. In the **Collections** column of the Model Groups you selected, you should see the name of the collection to which you added the Model Groups.

------
#### [ Studio Classic ]

You can add Model Groups to a Collection from either the **Model Groups** or **Collections** tab.

To add one or more Model Groups to a Collection from the **Collections** tab, complete the following steps:

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. Choose the **Collections** tab.

1. Select the Collection to which you want to add Model Groups. If the desired Collection is not at root level, navigate to the hierarchy where you want to add your Model Groups.

1. In the **Actions** dropdown menu in the top right, choose **Add model groups**.

1. Select the Model Groups that you want to add. You can select up to 10 Model Groups. If you select more than 10, the UI option to add your Model Groups to a Collection is inactive.

1. Choose **Add to collection**.

1. Check to make sure your Model Groups were added to the Collection in the current hierarchy. If you do not immediately see your new Model Groups, choose **Refresh**.

To add one or more Model Groups to a Collection from the **Model Groups** tab, complete the following steps:

1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. Choose the **Model Groups** tab.

1. Select the Model Groups that you want to add. You can select up to 10. If you select more than 10, the UI option to add your Model Groups to a Collection is inactive.

1. In the **Actions** dropdown menu in the top right, choose **Add to collection**.

1. In the pop-up dialog, choose the root path location `Collections`. This link to the root location appears above the table.

1. Navigate to the hierarchy that contains your destination Collection, or to the location where you want to create a new Collection for your Model Groups.

1. (Optional) To add your Model Groups to an existing Collection, complete the following steps:

   1. Select the destination Collection.

   1. Choose **Add to collection**.

1. (Optional) To add your Model Groups to a new Collection, complete the following steps:

   1. Choose **New collection**.

   1. Enter a name for your new Collection.

   1. Choose **Create**.

------

# Remove Model Groups or Collections from a Collection


When you remove Model Groups or Collections from a Collection, you are removing them from a particular grouping and not from the Model Registry. You can remove Model Groups from a Collection in the Amazon SageMaker Studio console.

To remove one or more Model Groups or Collections from a Collection, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models**.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Collections**.

1. Navigate to the Collection which contains the Model Groups or Collections you want to remove.

1. Select the Model Groups or Collections that you want to remove. You can select up to 10. If you select more than 10 Model Groups or Collections, the UI option to remove them is inactive.
**Important**  
You cannot simultaneously select Model Groups and Collections for removal. To remove both Model Groups and Collections, first remove Model Groups, and then remove Collections.
**Important**  
You cannot remove non-empty Collections. To remove a non-empty Collection, first remove its contents.

1. In the **Actions** dropdown menu in the top right, choose **Remove X items from collection** (where X is the number of items that you selected).

1. Confirm that you want to remove the selected Model Groups.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. Choose the **Collections** tab.

1. Navigate to the Collection which contains the Model Groups or Collections you want to remove.

1. Select the Model Groups or Collections that you want to remove. You can select up to 10. If you select more than 10 Model Groups or Collections, the UI option to remove them is inactive.
**Important**  
You cannot simultaneously select Model Groups and Collections for removal. To remove both Model Groups and Collections, first remove Model Groups, and then remove Collections.
**Important**  
You cannot remove non-empty Collections. To remove a non-empty Collection, first remove its contents.

1. In the **Actions** dropdown menu in the top right, choose **Remove X items from collection** (where X is the number of items that you selected).

1. Confirm that you want to remove the selected Model Groups.

------

# Move a Model Group Between Collections


You can move one or more Model Groups from one Collection to another in the Amazon SageMaker Studio console.

To move Model Groups, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models**.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Collections**.

1. Navigate to the Collection which contains the Model Groups that you want to move.

1. Select the Model Groups that you want to move. You can select up to 10. If you select more than 10, the UI option to move your Model Groups is inactive.

1. In the **Actions** dropdown menu in the top right, choose **Move to**.

1. In the dialog box, choose the root path location `Collections`. This link to the root location appears above the table.

1. Navigate to the hierarchy which contains your destination Collection.

1. Select your destination Collection in the table.

1. Choose **Move here**.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. Choose the **Collections** tab.

1. Navigate to the Collection which contains the Model Groups that you want to move.

1. Select the Model Groups that you want to move. You can select up to 10. If you select more than 10, the UI option to move your Model Groups is inactive.

1. In the **Actions** dropdown menu in the top right, choose **Move to**.

1. In the dialog box, choose the root path location `Collections`. This link to the root location appears above the table.

1. Navigate to the hierarchy which contains your destination Collection.

1. Select your destination Collection in the table.

1. Choose **Move here**.

------

# View a Model Group's Parent Collection


You can view the Collections which contain a particular Model Group in the Amazon SageMaker Studio console.

To view the Collections which contain a particular Model Group, complete the following steps based on whether you use Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Models**.

1. Choose the **Registered models** tab, if not selected already.

1. Immediately below the **Registered models** tab label, choose **Model Groups**, if not selected already.

1. View the **Collection** column for your Model Group, which displays the name of the Collection which contains this Model Group. If multiple Collections contain this Model Group, choose the **Collection** column entry to display a pop-up listing the Collections which contain this Model Group.

------
#### [ Studio Classic ]

1. Sign in to Amazon SageMaker Studio Classic. For more information, see [Launch Amazon SageMaker Studio Classic](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-launch.html).

1. In the left navigation pane, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Choose **Models**, and then **Model registry**.

1. Choose the **Model Groups** tab.

1. Find your Model Group in the table.

1. View the **Collection** column for your Model Group, which displays the name of the Collection which contains this Model Group. If multiple Collections contain this Model Group, choose the **Collection** column entry to display a pop-up listing the Collections which contain this Model Group.

------

# Constraints


While using Collections, you might encounter constraints related to VPC support, Collection operations, or tagging. Review the following list of caveats so you can avoid issues related to these limitations when you work with your Collections.

**VPC constraints**
+ Collections are not supported in VPC mode.

**Collection operation constraints**
+ You can add a maximum of 10 Model Groups to a Collection at a time.
+ You can remove a maximum of 10 Model Groups from a Collection at a time.
+ You can move a maximum of 10 Model Groups from one Collection to another at a time.
+ You cannot delete a Collection unless it is empty.
+ A Model Group can belong to multiple Collections, but a Collection can have only one parent Collection.

**Tag-related constraints**
+ A Model Group can belong to a maximum of 48 Collections. For more details, see the following section [Collection and Model Group tagging](#modelcollections-tagging).
+ A Collection’s absolute path can be a maximum of 256 characters long. Since Collection names are user-specified, you can control the path length. For more details, see the following section [Collection and Model Group tagging](#modelcollections-tagging).

## Collection and Model Group tagging


The SageMaker Model Registry uses tag rules and tags to internally represent your Collection groupings and hierarchy. You can access these tag elements in the AWS Resource Access Manager, the SageMaker SDK, and the AWS CLI, but it is important that you do not alter or delete them.

**Important**  
Do not delete or alter any tag rules or tags that belong to your Collections or Model Groups. Doing so prevents you from performing Collection operations on them.

A tag rule is a key-value pair that SageMaker AI uses to identify a Collection’s location in the hierarchy: the key identifies the parent Collection, and the value is the path of the Collection within the hierarchy. SageMaker AI limits tag values to 256 characters or less, so if you have multiple nested hierarchies, keep your Collection names short.

**Important**  
Keep your Collection names short. The absolute path to any Collection must be 256 characters long or less.

Model Groups, on the other hand, do not have tag rules but use tags. A Model Group’s tags include the tag rules for all the Collections that contain the Model Group. For example, if four Collections contain *model-group-1*, then *model-group-1* has four tags. SageMaker AI allows a single AWS resource to have a maximum of 50 tags. Because two are pre-allocated for general purposes, a Model Group can have a maximum of 48 tags available for Collections. Therefore, a Model Group can belong to a maximum of 48 Collections.
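
The tag arithmetic above can be sketched as a couple of hypothetical helper functions. The constants come from this page (50 tags per resource, 2 reserved, 256-character path limit); the function names and the `/`-joined path convention are illustrative assumptions, not part of the SageMaker AI API.

```python
# Hypothetical helpers illustrating the tag-based limits described above.
# Constants come from this page: 50 tags per resource, 2 reserved for
# general purposes, and a 256-character limit on a Collection's absolute path.

MAX_TAGS_PER_RESOURCE = 50
RESERVED_TAGS = 2
MAX_PATH_LENGTH = 256


def remaining_collection_slots(current_collection_count: int) -> int:
    """How many more Collections a Model Group could join."""
    max_collections = MAX_TAGS_PER_RESOURCE - RESERVED_TAGS  # 48
    return max(0, max_collections - current_collection_count)


def is_valid_collection_path(*names: str) -> bool:
    """Check that the absolute path built from nested Collection names
    stays within the 256-character limit (joined here with '/')."""
    return len("/".join(names)) <= MAX_PATH_LENGTH


print(remaining_collection_slots(4))  # a Model Group already in 4 Collections
print(is_valid_collection_path("prod", "fraud-models", "v2"))
```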

# Model Deployment in SageMaker AI
Model Deployment

Once you train and approve a model for production, use SageMaker AI to deploy your model to an endpoint for real-time inference. SageMaker AI provides multiple inference options so that you can pick the option that best suits your workload. You also configure your endpoint by choosing the instance type and number of instances you need for optimal performance. For details about model deployment, see [Deploy models for inference](deploy-model.md).
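
To make the endpoint configuration concrete, the following sketch builds the request you might pass to the SageMaker `CreateEndpointConfig` and `CreateEndpoint` APIs (for example, via the boto3 `sagemaker` client). The config, model, and endpoint names are placeholders; the instance type and count are the knobs described above.

```python
import json

# Illustrative builder for a CreateEndpointConfig request. Names such as
# "my-config" and "my-model" are placeholders for your own resources.


def build_endpoint_config(config_name: str, model_name: str,
                          instance_type: str = "ml.m5.large",
                          instance_count: int = 1) -> dict:
    return {
        "EndpointConfigName": config_name,
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "ModelName": model_name,
                "InitialInstanceCount": instance_count,
                "InstanceType": instance_type,
                "InitialVariantWeight": 1.0,
            }
        ],
    }


config = build_endpoint_config("my-config", "my-model")
print(json.dumps(config, indent=2))
# With AWS credentials configured, you would then call:
#   boto3.client("sagemaker").create_endpoint_config(**config)
#   boto3.client("sagemaker").create_endpoint(
#       EndpointName="my-endpoint", EndpointConfigName="my-config")
```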



After you deploy your models to production, you might want to explore ways to further optimize model performance while maintaining availability of your current models. For example, you can set up a shadow test to try out a different model or model serving infrastructure before committing to the change. SageMaker AI deploys the new model, container, or instance in shadow mode and routes to it a copy of the inference requests in real time within the same endpoint. You can log the responses of the shadow variant for comparison. For details about shadow testing, see [Shadow tests](shadow-tests.md). If you decide to go ahead and change your model, deployment guardrails help you control the switch from the current model to a new one. You can select such methods as blue/green or canary testing of the traffic shifting process to maintain granular control during the update. For information about deployment guardrails, see [Deployment guardrails for updating models in production](deployment-guardrails.md).
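
As a hedged sketch of the canary option above, the following builds the `DeploymentConfig` structure used with the SageMaker `UpdateEndpoint` API for a blue/green canary rollout. The specific values (10% canary capacity, 5-minute waits) and the alarm name are illustrative assumptions, not recommendations.

```python
# Illustrative DeploymentConfig for a blue/green canary update.
# The alarm name is a placeholder for a CloudWatch alarm in your account.


def build_canary_deployment_config(canary_percent: int = 10,
                                   wait_seconds: int = 300) -> dict:
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT",
                               "Value": canary_percent},
                "WaitIntervalInSeconds": wait_seconds,
            },
            "TerminationWaitInSeconds": wait_seconds,
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": "endpoint-high-error-rate"}]  # placeholder
        },
    }


deployment_config = build_canary_deployment_config()
# With AWS credentials configured, you would then call:
#   boto3.client("sagemaker").update_endpoint(
#       EndpointName="my-endpoint",
#       EndpointConfigName="my-new-config",
#       DeploymentConfig=deployment_config)
print(deployment_config["BlueGreenUpdatePolicy"]
      ["TrafficRoutingConfiguration"]["Type"])
```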

# SageMaker Model Monitor
Model Monitor

Once a model is in production, you can monitor its performance in real time with Amazon SageMaker Model Monitor. Model Monitor helps you maintain model quality by detecting violations of user-defined thresholds for data quality, model quality, bias drift and feature attribution drift. In addition, you can configure alerts so you can troubleshoot violations as they arise and promptly initiate retraining. Model Monitor is integrated with SageMaker Clarify to improve visibility into potential bias. 

To learn about SageMaker Model Monitor, see [Data and model quality monitoring with Amazon SageMaker Model Monitor](model-monitor.md).

# MLOps Automation With SageMaker Projects
Projects

Create end-to-end ML solutions with CI/CD by using SageMaker Projects. 

Use SageMaker Projects to create an MLOps solution to orchestrate and manage:
+ Building custom images for processing, training, and inference
+ Data preparation and feature engineering
+ Training models
+ Evaluating models
+ Deploying models
+ Monitoring and updating models

**Topics**
+ [

# What is a SageMaker AI Project?
](sagemaker-projects-whatis.md)
+ [

# Granting SageMaker Studio Permissions Required to Use Projects
](sagemaker-projects-studio-updates.md)
+ [

# Create an MLOps Project using Amazon SageMaker Studio or Studio Classic
](sagemaker-projects-create.md)
+ [

# MLOps Project Templates
](sagemaker-projects-templates.md)
+ [

# View Project Resources
](sagemaker-projects-resources.md)
+ [

# Update an MLOps Project in Amazon SageMaker Studio or Studio Classic
](sagemaker-projects-update.md)
+ [

# Delete an MLOps Project using Amazon SageMaker Studio or Studio Classic
](sagemaker-projects-delete.md)
+ [

# Walk Through a SageMaker AI MLOps Project Using Third-party Git Repos
](sagemaker-projects-walkthrough-3rdgit.md)

# What is a SageMaker AI Project?
SageMaker Projects

SageMaker Projects help organizations set up and standardize developer environments for data scientists and CI/CD systems for MLOps engineers. Projects also help organizations set up dependency management, code repository management, build reproducibility, and artifact sharing.

You can provision SageMaker Projects using custom templates that are stored in Amazon S3 buckets, or by using templates from the AWS Service Catalog or SageMaker AI. For information about the AWS Service Catalog, see [What Is AWS Service Catalog](https://docs.aws.amazon.com/servicecatalog/latest/dg/what-is-service-catalog.html). With SageMaker Projects, MLOps engineers and organization admins can define their own templates or use SageMaker AI-provided templates. The SageMaker AI-provided templates bootstrap the ML workflow with source version control, automated ML pipelines, and a set of code to quickly start iterating over ML use cases.

## When Should You Use a SageMaker AI Project?


**Important**  
Effective September 9, 2024, project templates that use the AWS CodeCommit repository are no longer supported. For new projects, select from the available project templates that use third-party Git repositories.

While notebooks are helpful for model building and experimentation, a team of data scientists and ML engineers sharing code needs a more scalable way to maintain code consistency and strict version control.

Every organization has its own set of standards and practices that provide security and governance for its AWS environment. SageMaker AI provides a set of first-party templates for organizations that want to quickly get started with ML workflows and CI/CD. The templates include projects that use AWS-native services for CI/CD, such as AWS CodeBuild, AWS CodePipeline, and AWS CodeCommit. The templates also offer the option to create projects that use third-party tools, such as Jenkins and GitHub. For a list of the project templates that SageMaker AI provides, see [Use SageMaker AI-Provided Project Templates](sagemaker-projects-templates-sm.md).

Organizations often need tight control over the MLOps resources that they provision and manage. Such responsibility assumes certain tasks, including configuring IAM roles and policies, enforcing resource tags, enforcing encryption, and decoupling resources across multiple accounts. SageMaker Projects can support all these tasks through custom template offerings where organizations use CloudFormation templates to define the resources needed for an ML workflow. Data Scientists can choose a template to bootstrap and pre-configure their ML workflow.

To get started, we recommend that you create and store custom templates in an Amazon S3 bucket. Doing so lets you create a bucket in any supported Region for your organization. Amazon S3 supports versioning, so you can maintain multiple versions of your templates and roll back if necessary. For information about how to create a project from a template stored in an Amazon S3 bucket, see [Using a template from an Amazon S3 bucket](sagemaker-projects-templates-custom.md#sagemaker-projects-templates-s3).
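
As a sketch of the versioned-template setup above, the following builds the request parameters you might pass to the boto3 `s3` client to enable bucket versioning and upload a template. The bucket name, key, and template body are placeholders.

```python
# Illustrative request parameters for keeping versioned project templates
# in Amazon S3. Bucket and key names are placeholders.

versioning_request = {
    "Bucket": "my-org-sagemaker-templates",  # placeholder
    "VersioningConfiguration": {"Status": "Enabled"},
}
upload_request = {
    "Bucket": "my-org-sagemaker-templates",  # placeholder
    "Key": "templates/ml-workflow/template.yaml",  # placeholder
    "Body": b"# ...CloudFormation template body...",
}
# With AWS credentials configured, you would then call:
#   boto3.client("s3").put_bucket_versioning(**versioning_request)
#   boto3.client("s3").put_object(**upload_request)
print(versioning_request["VersioningConfiguration"]["Status"])
```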

Alternatively, you can also create custom templates as Service Catalog products and you can provision them in the Studio or Studio Classic UI under **Organization Templates**. The Service Catalog is a service that helps organizations create and manage catalogs of products that are approved for use on AWS. For more information about creating custom templates, see [Build Custom SageMaker AI Project Templates – Best Practices](https://aws.amazon.com/blogs/machine-learning/build-custom-sagemaker-project-templates-best-practices/).

While you can use either option, we recommend that you use S3 buckets over the Service Catalog, so you can create a bucket in supported Regions where SageMaker AI is available without needing to manage the complexities of the Service Catalog.

SageMaker Projects can help you manage your Git repositories so that you can collaborate more efficiently across teams, ensure code consistency, and support CI/CD. SageMaker Projects can help you with the following tasks:
+ Organize all entities of the ML lifecycle under one project.
+ Establish a single-click approach to set up standard ML infrastructure for model training and deployment that incorporates best practices.
+ Create and share templates for ML infrastructure to serve multiple use cases.
+ Leverage SageMaker AI-provided pre-built templates to quickly start focusing on model building, or create custom templates with organization-specific resources and guidelines.
+ Integrate with tools of your choice by extending the project templates. For an example, see [Create a SageMaker AI Project to integrate with GitLab and GitLab Pipelines](https://aws.amazon.com/blogs/machine-learning/build-mlops-workflows-with-amazon-sagemaker-projects-gitlab-and-gitlab-pipelines/).

## What is in a SageMaker AI Project?


Customers have the flexibility to set up their projects with the resources that best serve their use case. The example below showcases the MLOps setup for an ML workflow, including model training and deployment.

![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/projects/projects-ml-workflow.png)


A typical project with a SageMaker AI-provided template might include the following:
+ One or more repositories with sample code to build and deploy ML solutions. These are working examples that you can modify for your needs. You own this code and can take advantage of the version-controlled repositories for your tasks.
+ A SageMaker AI pipeline that defines steps for data preparation, training, model evaluation, and model deployment, as shown in the following diagram.  
![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/projects/pipeline-in-project-simple.png)
+ A CodePipeline or Jenkins pipeline that runs your SageMaker AI pipeline every time you check in a new version of the code. For information about CodePipeline, see [What is AWS CodePipeline.](https://docs.aws.amazon.com/codepipeline/latest/userguide/welcome.html) For information about Jenkins, see [Jenkins User Documentation](https://www.jenkins.io/doc/).
+ A model group that contains model versions. Every time you approve the resulting model version from a SageMaker AI pipeline run, you can deploy it to a SageMaker AI endpoint.
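
The approval step in the last item above can also be performed programmatically. This hedged sketch builds a request for the SageMaker `UpdateModelPackage` API; the model package ARN is a placeholder for a version in your Model Registry.

```python
# Illustrative request to approve a registered model version so it can be
# deployed. The ARN below is a placeholder.

request = {
    "ModelPackageArn": ("arn:aws:sagemaker:us-west-2:111122223333:"
                        "model-package/my-model-group/1"),  # placeholder
    "ModelApprovalStatus": "Approved",
}
# With AWS credentials configured, you would then call:
#   boto3.client("sagemaker").update_model_package(**request)
print(request["ModelApprovalStatus"])
```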

Each SageMaker AI project has a unique name and ID that are applied as tags to all of the SageMaker AI and AWS resources created in the project. With the name and ID, you can view all of the entities associated with your project. These include:
+ Pipelines
+ Registered models
+ Deployed models (endpoints)
+ Datasets
+ Service Catalog products
+ CodePipeline and Jenkins pipelines
+ CodeCommit and third-party Git repositories

## Do I Need to Create a Project to Use SageMaker AI Pipelines?


No. SageMaker pipelines are standalone entities just like training jobs, processing jobs, and other SageMaker AI jobs. You can create, update, and run pipelines directly within a notebook by using the SageMaker Python SDK without using a SageMaker AI project.
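
To illustrate standalone use, this sketch builds a request for the SageMaker `CreatePipeline` API. The definition shown is a deliberately abbreviated assumption of the pipeline definition JSON; in practice you would usually build it with the SageMaker Python SDK (`sagemaker.workflow.pipeline.Pipeline`) rather than by hand. The role ARN and names are placeholders.

```python
import json

# Minimal, illustrative pipeline definition (abbreviated; real pipelines
# include processing/training steps in "Steps").

definition = {
    "Version": "2020-12-01",
    "Parameters": [
        {"Name": "InputDataUrl", "Type": "String",
         "DefaultValue": "s3://my-bucket/data"}  # placeholder
    ],
    "Steps": [],  # training/processing steps would go here
}

request = {
    "PipelineName": "standalone-demo-pipeline",  # placeholder
    "PipelineDefinition": json.dumps(definition),
    "RoleArn": "arn:aws:iam::111122223333:role/PipelineRole",  # placeholder
}
# With AWS credentials configured, you would then call:
#   boto3.client("sagemaker").create_pipeline(**request)
#   boto3.client("sagemaker").start_pipeline_execution(
#       PipelineName="standalone-demo-pipeline")
print(request["PipelineName"])
```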

Projects provide an additional layer to help you organize your code and adopt operational best practices that you need for a production-quality system.

# Granting SageMaker Studio Permissions Required to Use Projects


The Amazon SageMaker Studio (or Studio Classic) administrator and Studio (or Studio Classic) users that you add to your domain can view project templates provided by SageMaker AI and create projects with those templates. By default, the administrator can view the SageMaker AI templates in the Service Catalog console. The administrator can see what another user creates if the user has permission to use SageMaker Projects. The administrator can also view the CloudFormation template that the SageMaker AI project templates define in the Service Catalog console. For information about using the Service Catalog console, see [What Is Service Catalog](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/introduction.html) in the *Service Catalog User Guide*.

By default, Studio (and Studio Classic) users of the domain who are configured to use the domain's execution role have permission to create projects using SageMaker AI project templates.

**Important**  
Do not manually create your roles. Always create roles through **Studio Settings** using the steps described in the following procedure.

For users who use any role other than the domain's execution role to view and use SageMaker AI-provided project templates, you need to grant **Projects** permissions to the individual user profiles by turning on **Enable Amazon SageMaker AI project templates and Amazon SageMaker JumpStart** for Studio users when you add them to your domain. For more information about this step, see [Add user profiles](domain-user-profile-add.md). 

Because SageMaker Projects is backed by Service Catalog, you must add each role that requires access to SageMaker Projects to the **Amazon SageMaker AI Solutions and ML Ops products** portfolio in Service Catalog. You can do this on the **Groups, roles, and users** tab. If each user profile in Studio Classic has a different role, add each of those roles to Service Catalog. You can also do this while creating a user profile in Studio Classic.

## Grant new domain roles access to projects


When you change your domain's execution role or add user profiles with different roles, you must grant these new roles access to the Service Catalog portfolio to use SageMaker Projects. Follow these steps to ensure all roles have the necessary permissions:

**To grant new domain roles access to projects**

1. Open the [Service Catalog console](https://console.aws.amazon.com/servicecatalog/).

1. In the left navigation menu, choose **Portfolios**.

1. Select the **Imported** section.

1. Select **Amazon SageMaker Solutions and ML Ops products**.

1. Choose the **Access** tab.

1. Choose **Grant access**.

1. In the **Grant access** dialog, select **Roles**.

1. Grant access for all roles that are used by the domain's user profiles, including:
   + The domain's execution role
   + Any custom execution roles assigned to individual user profiles

1. Choose **Grant access** to confirm.

**Important**  
You must complete this process whenever you change your domain's execution role or add user profiles with new execution roles. Without this access, users will not be able to create or use SageMaker Projects.
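
The console steps above can also be scripted. This sketch builds the request for Service Catalog's `AssociatePrincipalWithPortfolio` API (available through the boto3 `servicecatalog` client); the portfolio ID and role ARN are placeholders you would look up in your account.

```python
# Illustrative request granting a role access to the portfolio that backs
# SageMaker Projects. Portfolio ID and role ARN are placeholders.


def build_grant_request(portfolio_id: str, role_arn: str) -> dict:
    return {
        "PortfolioId": portfolio_id,
        "PrincipalARN": role_arn,
        "PrincipalType": "IAM",
    }


request = build_grant_request(
    "port-abc123example",  # placeholder
    "arn:aws:iam::111122223333:role/MyStudioExecutionRole",  # placeholder
)
# With AWS credentials configured, you would then call:
#   boto3.client("servicecatalog").associate_principal_with_portfolio(**request)
print(request["PrincipalType"])
```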

The following procedures show how to grant **Projects** permissions after you onboard to Studio or Studio Classic. For more information about onboarding to Studio or Studio Classic, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

**To confirm that your SageMaker AI Domain has active project template permissions:**

1. Open the [SageMaker AI console](https://console.aws.amazon.com/sagemaker/).

1. On the left navigation pane, choose **Admin configurations**.

1. Under **Admin configurations**, choose **Domains**. 

1. Select your domain.

1. Choose the **Domain Settings** tab.

1. Under **SageMaker Projects and JumpStart**, make sure the following options are turned on:
   + **Enable Amazon SageMaker AI project templates and Amazon SageMaker JumpStart for this account**
   + **Enable Amazon SageMaker AI project templates and Amazon SageMaker JumpStart for Studio users**

**To view a list of your roles:**

1. Open the [SageMaker AI console](https://console.aws.amazon.com/sagemaker/).

1. On the left navigation pane, choose **Admin configurations**.

1. Under **Admin configurations**, choose **Domains**. 

1. Select your domain.

1. Choose the **Domain Settings** tab.

1. A list of your roles appears in the `Apps` card under the **Studio** tab.
**Important**  
As of July 25, we require additional roles to use project templates. Here is the complete list of roles you should see under `Projects`:  
`AmazonSageMakerServiceCatalogProductsLaunchRole`, `AmazonSageMakerServiceCatalogProductsUseRole`, `AmazonSageMakerServiceCatalogProductsApiGatewayRole`, `AmazonSageMakerServiceCatalogProductsCloudformationRole`, `AmazonSageMakerServiceCatalogProductsCodeBuildRole`, `AmazonSageMakerServiceCatalogProductsCodePipelineRole`, `AmazonSageMakerServiceCatalogProductsEventsRole`, `AmazonSageMakerServiceCatalogProductsFirehoseRole`, `AmazonSageMakerServiceCatalogProductsGlueRole`, `AmazonSageMakerServiceCatalogProductsLambdaRole`, `AmazonSageMakerServiceCatalogProductsExecutionRole`  
For descriptions of these roles, see [AWS Managed Policies for SageMaker Projects and JumpStart](security-iam-awsmanpol-sc.md).

# Create an MLOps Project using Amazon SageMaker Studio or Studio Classic
Create an MLOps Project

**Important**  
Custom IAM policies that allow Amazon SageMaker Studio or Amazon SageMaker Studio Classic to create Amazon SageMaker resources must also grant permissions to add tags to those resources. The permission to add tags to resources is required because Studio and Studio Classic automatically tag any resources they create. If an IAM policy allows Studio and Studio Classic to create resources but does not allow tagging, "AccessDenied" errors can occur when trying to create resources. For more information, see [Provide permissions for tagging SageMaker AI resources](security_iam_id-based-policy-examples.md#grant-tagging-permissions).  
[AWS managed policies for Amazon SageMaker AI](security-iam-awsmanpol.md) that give permissions to create SageMaker resources already include permissions to add tags while creating those resources.
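
A minimal, illustrative policy statement granting the tagging permission described above might look like the following. This is a config fragment, not a complete policy; in practice you would scope `Resource` to your own ARNs rather than use `*`.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSageMakerTagging",
      "Effect": "Allow",
      "Action": "sagemaker:AddTags",
      "Resource": "*"
    }
  ]
}
```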

The following procedures demonstrate how to create an MLOps project using Amazon SageMaker Studio or Studio Classic.

**Prerequisites**
+ IAM or IAM Identity Center credentials to sign in to Studio or Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).
+ Permission to use SageMaker AI-provided project templates. For more information, see [Granting SageMaker Studio Permissions Required to Use Projects](sagemaker-projects-studio-updates.md).
+ Basic familiarity with the Studio Classic user interface. For more information, see [Amazon SageMaker Studio Classic UI Overview](studio-ui.md).

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Deployments**, and then choose **Projects**.

1. In the upper-right corner above the projects list, choose **Create project**.

1. In the **Templates** page, choose a template to use for your project. For more information about project templates, see [MLOps Project Templates](sagemaker-projects-templates.md).

1. Choose **Next**.

1. In the **Project details** page, enter the following information:
   + **Name**: A name for your project.
   + **Description**: An optional description for your project.
   + The values for the Service Catalog provisioning parameters related to your chosen template.

1. Choose **Create project** and wait for the project to appear in the **Projects** list.

1. (Optional) In the Studio sidebar, choose **Pipelines** to view the pipeline created from your project. For more information about Pipelines, see [Pipelines](pipelines.md).

------
#### [ Studio Classic ]

1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Black square icon representing a placeholder or empty image.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Choose **Create project**.

   The **Create project** tab opens displaying a list of available templates.

1. If not selected already, choose **SageMaker AI templates**. For more information about project templates, see [MLOps Project Templates](sagemaker-projects-templates.md).

1. Choose the template **Model building, training, and deployment**.

1. Choose **Select project template**.

   The **Create project** tab changes to display **Project details**.

1. Enter the following information:
   + For **Project details**, enter a name and description for your project.
   + Optionally, add tags, which are key-value pairs that you can use to track your projects.

1. Choose **Create project** and wait for the project to appear in the **Projects** list.

------
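You can also script project creation with the `CreateProject` API instead of using the Studio UI. The following sketch assumes boto3 is available; the product ID, provisioning artifact ID, and parameter key are placeholders that you would look up in the Service Catalog console for your chosen template.

```python
def create_mlops_project(sagemaker_client, project_name, product_id,
                         artifact_id, parameters):
    """Create a project from a Service Catalog-backed project template.

    `parameters` maps provisioning parameter keys to values; the keys
    depend on the template you chose.
    """
    return sagemaker_client.create_project(
        ProjectName=project_name,
        ServiceCatalogProvisioningDetails={
            "ProductId": product_id,
            "ProvisioningArtifactId": artifact_id,
            "ProvisioningParameters": [
                {"Key": key, "Value": value}
                for key, value in parameters.items()
            ],
        },
    )

# Usage (requires AWS credentials and access to the template):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   response = create_mlops_project(
#       client, "my-mlops-project", "prod-EXAMPLE123456",
#       "pa-EXAMPLE123456", {"SourceModelPackageGroupName": "my-group"})
#   print(response["ProjectArn"])
```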

# MLOps Project Templates
Templates

An Amazon SageMaker AI project template automates the setup and implementation of MLOps for your projects. A SageMaker AI project template is a Service Catalog product that SageMaker AI makes available to Amazon SageMaker Studio (or Studio Classic) users. These Service Catalog products are visible in your Service Catalog console after you enable permissions when you onboard or update Amazon SageMaker Studio (or Studio Classic). For information about enabling permissions to use SageMaker AI project templates, see [Granting SageMaker Studio Permissions Required to Use Projects](sagemaker-projects-studio-updates.md). Use SageMaker AI project templates to create a project that is an end-to-end MLOps solution.

You can use a SageMaker Projects template to implement image-building CI/CD. With this template, you can automate the CI/CD of images that are built and pushed to Amazon ECR. Changes in the container files in your project’s source control repositories initiate the ML pipeline and deploy the latest version of your container. For more information, see the blog [Create Amazon SageMaker Projects with image building CI/CD pipelines](https://aws.amazon.com/blogs/machine-learning/create-amazon-sagemaker-projects-with-image-building-ci-cd-pipelines/).

If you are an administrator, you can create custom project templates from scratch or modify one of the project templates provided by SageMaker AI. Studio (or Studio Classic) users in your organization can use these custom project templates to create their projects.

**Topics**
+ [

# Use SageMaker AI-Provided Project Templates
](sagemaker-projects-templates-sm.md)
+ [

# Create Custom Project Templates
](sagemaker-projects-templates-custom.md)

# Use SageMaker AI-Provided Project Templates
Use Provided Templates

**Important**  
As of October 28, 2024, the AWS CodeCommit templates have been removed. For new projects, select from the available project templates that use third-party Git repositories.

Amazon SageMaker AI provides project templates that create the infrastructure you need to create an MLOps solution for continuous integration and continuous deployment (CI/CD) of ML models. Use these templates to process data, extract features, train and test models, register the models in the SageMaker Model Registry, and deploy the models for inference. You can customize the seed code and the configuration files to suit your requirements.

**Note**  
Additional roles are required to use project templates. For a complete list of required roles and instructions on how to create them, see [Granting SageMaker Studio Permissions Required to Use Projects](sagemaker-projects-studio-updates.md). If you do not have the new roles, you will get the error message **CodePipeline is not authorized to perform AssumeRole on role arn:aws:iam::xxx:role/service-role/AmazonSageMakerServiceCatalogProductsCodePipelineRole** when you try to create a new project and cannot proceed.

SageMaker AI project templates offer you the following choice of code repositories, workflow automation tools, and pipeline stages:
+ **Code repository**: Third-party Git repositories such as GitHub and Bitbucket
+ **CI/CD workflow automation**: AWS CodePipeline or Jenkins
+ **Pipeline stages**: Model building and training, model deployment, or both

The following discussion provides an overview of each template you can choose when you create your SageMaker AI project. You can also view the available templates in Studio (or Studio Classic) by following [Create the Project](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough.html#sagemaker-proejcts-walkthrough-create) of the [Project walkthrough](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough.html).

For step-by-step instructions on how to create a real project, you can follow one of the project walkthroughs:
+ If you want to use the template [MLOps templates for model building, training, and deployment with third-party Git using CodePipeline](#sagemaker-projects-templates-git-code-pipeline), see [Walk Through a SageMaker AI MLOps Project Using Third-party Git Repos](sagemaker-projects-walkthrough-3rdgit.md).
+ If you want to use the template [MLOps templates for model building, training, and deployment with third-party Git repositories using Jenkins](#sagemaker-projects-templates-git-jenkins), see [Create Amazon SageMaker Projects using third-party source control and Jenkins](https://aws.amazon.com/blogs/machine-learning/create-amazon-sagemaker-projects-using-third-party-source-control-and-jenkins/).

**Topics**

## MLOps templates for model building, training, and deployment with third-party Git using CodePipeline

+ **Code repository**: Third-party Git.
**Note**  
Establish the AWS CodeStar connection from your AWS account to your GitHub user or organization. Add a tag with the key `sagemaker` and value `true` to this AWS CodeStar connection.
+ **CI/CD workflow automation**: AWS CodePipeline
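You can also create and tag the connection programmatically. A sketch, assuming boto3; the connection name is a placeholder, and after the call you still need to complete the handshake to your GitHub user or organization in the console:

```python
# Tag that SageMaker AI looks for on the connection.
SAGEMAKER_TAG = {"Key": "sagemaker", "Value": "true"}

def create_tagged_connection(codestar_client, connection_name):
    """Create a GitHub connection tagged so SageMaker projects can use it."""
    return codestar_client.create_connection(
        ProviderType="GitHub",
        ConnectionName=connection_name,
        Tags=[SAGEMAKER_TAG],
    )

# Usage (requires AWS credentials):
#   import boto3
#   client = boto3.client("codestar-connections", region_name="us-west-2")
#   arn = create_tagged_connection(client, "my-github-connection")["ConnectionArn"]
#   # The connection starts in PENDING status until you complete the
#   # handshake in the AWS console.
```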

### Model building and training


This template provides the following resources:
+ Association with one customer-specified Git repository. The repository contains sample code that creates an Amazon SageMaker AI pipeline in Python and shows how to create and update the SageMaker AI pipeline. The repository also includes a sample Python notebook that you can open and run in Studio (or Studio Classic).
+ An AWS CodePipeline pipeline that has source and build steps. The source step points to the third-party Git repository. The build step gets the code from that repository, creates and updates the SageMaker AI pipeline, starts a pipeline execution, and waits for the pipeline execution to complete.
+ An AWS CodeBuild project to populate the Git repositories with the seed code information. This requires an AWS CodeStar connection from your AWS account to your account on the Git repository host.
+ An Amazon S3 bucket to store artifacts, including CodePipeline and CodeBuild artifacts, and any artifacts generated from the SageMaker AI pipeline runs.

### Model deployment


This template provides the following resources:
+ Association with one customer-specified Git repository. The repository contains sample code that deploys models to endpoints in staging and production environments.
+ An AWS CodePipeline pipeline that has source, build, deploy-to-staging, and deploy-to-production steps. The source step points to the third-party Git repository and the build step gets the code from that repository and generates CloudFormation stacks to deploy. The deploy-to-staging and deploy-to-production steps deploy the CloudFormation stacks to their respective environments. There is a manual approval step between the deploy-to-staging and deploy-to-production steps, so that an MLOps engineer must approve the model before it is deployed to production.
+ An AWS CodeBuild project to populate the Git repositories with the seed code information. This requires an AWS CodeStar connection from your AWS account to your account on the Git repository host.
+ An Amazon S3 bucket to store artifacts, including CodePipeline and CodeBuild artifacts, and any artifacts generated from the SageMaker AI pipeline runs.

### Model building, training, and deployment


This template provides the following resources:
+ Associations with one or more customer-specified Git repositories.
+ An AWS CodePipeline pipeline that has source, build, deploy-to-staging, and deploy-to-production steps. The source step points to the third-party Git repository and the build step gets the code from that repository and generates CloudFormation stacks to deploy. The deploy-to-staging and deploy-to-production steps deploy the CloudFormation stacks to their respective environments. There is a manual approval step between the deploy-to-staging and deploy-to-production steps, so that an MLOps engineer must approve the model before it is deployed to production.
+ An AWS CodeBuild project to populate the Git repositories with the seed code information. This requires an AWS CodeStar connection from your AWS account to your account on the Git repository host.
+ An Amazon S3 bucket to store artifacts, including CodePipeline and CodeBuild artifacts, and any artifacts generated from the SageMaker AI pipeline runs.

As previously mentioned, see [Project Walkthrough Using Third-party Git Repos](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-projects-walkthrough-3rdgit.html) for a demonstration that uses this template to create a real project.

## MLOps template for model building, training, deployment, and Amazon SageMaker Model Monitor using CodePipeline

+ **Code repository**: Third-party Git.
**Note**  
Establish the AWS CodeStar connection from your AWS account to your GitHub user or organization. Add a tag with the key `sagemaker` and value `true` to this AWS CodeStar connection.
+ **CI/CD workflow automation**: AWS CodePipeline

The following templates include an additional Amazon SageMaker Model Monitor template that provides the following types of monitoring:
+ [Data Quality](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-data-quality.html) – Monitor drift in data quality.
+ [Model Quality](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-model-quality.html) – Monitor drift in model quality metrics, such as accuracy.
+ [Bias Drift for Models in Production](https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-model-monitor-bias-drift.html) – Monitor bias in a model's predictions.

### Model building, training, deployment, and Amazon SageMaker Model Monitor


This template is an extension of the MLOps template for model building, training, and deployment with Git repositories using CodePipeline. It includes the model building, training, and deployment components of that template, plus an Amazon SageMaker Model Monitor component that provides the types of monitoring listed previously.

### Monitor a deployed model
Monitor a deployed model

You can use this template for an MLOps solution to deploy one or more of the Amazon SageMaker AI data quality, model quality, model bias, and model explainability monitors to monitor a deployed model on a SageMaker AI inference endpoint. This template provides the following resources: 
+ Associations with one or more customer-specified Git repositories. The repository contains sample Python code that gets the [baselines](https://docs.aws.amazon.com/sagemaker/latest/dg/model-monitor-create-baseline.html) used by the monitors from the Amazon SageMaker Model Registry, and updates the template’s parameters for the staging and production environments. It also contains a CloudFormation template to create the Amazon SageMaker Model Monitors.
+ An AWS CodePipeline pipeline that has source, build, and deploy steps. The source step points to the Git repository. The build step gets the code from that repository, gets the baseline from the Model Registry, and updates template parameters for the staging and production environments. The deploy steps deploy the configured monitors into the staging and production environments. The manual approval step within the `DeployStaging` stage requires you to verify that the production SageMaker AI endpoint is `InService` before approving and moving to the `DeployProd` stage.
+ An AWS CodeBuild project to populate the Git repositories with the seed code information. This requires an AWS CodeStar connection from your AWS account to your account on the Git repository host.
+ The template uses the same Amazon S3 bucket created by the MLOps template for model building, training, and deployment to store the monitors' outputs.
+ Two Amazon EventBridge event rules that initiate the Amazon SageMaker Model Monitor AWS CodePipeline pipeline every time the staging SageMaker AI endpoint is updated.
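Before approving the move to the `DeployProd` stage, you can confirm the endpoint status with the `DescribeEndpoint` API rather than checking the console. A small sketch; the endpoint name in the usage note is a placeholder:

```python
import time

def wait_for_endpoint(sagemaker_client, endpoint_name, poll_seconds=30):
    """Poll DescribeEndpoint until the endpoint is InService or Failed."""
    while True:
        status = sagemaker_client.describe_endpoint(
            EndpointName=endpoint_name
        )["EndpointStatus"]
        if status in ("InService", "Failed"):
            return status
        time.sleep(poll_seconds)

# Usage (requires AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   print(wait_for_endpoint(client, "my-project-prod-endpoint"))
```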

## MLOps templates for model building, training, and deployment with third-party Git repositories using Jenkins

+ **Code repository**: Third-party Git.
**Note**  
Establish the AWS CodeStar connection from your AWS account to your GitHub user or organization. Add a tag with the key `sagemaker` and value `true` to this AWS CodeStar connection.
+ **CI/CD workflow automation**: Jenkins

### Model building, training, and deployment


This template provides the following resources:
+ Associations with one or more customer-specified Git repositories.
+ Seed code to generate Jenkins pipelines that have source, build, deploy-to-staging, and deploy-to-production steps. The source step points to the customer-specified Git repository. The build step gets the code from that repository and generates two CloudFormation stacks. The deploy steps deploy the CloudFormation stacks to their respective environments. There is an approval step between the staging step and the production step.
+ An AWS CodeBuild project to populate the Git repositories with the seed code information. This requires an AWS CodeStar connection from your AWS account to your account on the Git repository host.
+ An Amazon S3 bucket to store artifacts of the SageMaker AI project and SageMaker AI pipeline.

The template creates the association between your project and the source control repositories, but you need to perform additional manual steps to establish communication between your AWS account and Jenkins. For the detailed steps, see [Create Amazon SageMaker Projects using third-party source control and Jenkins](https://aws.amazon.com/blogs/machine-learning/create-amazon-sagemaker-projects-using-third-party-source-control-and-jenkins/).

The instructions help you build the architecture shown in the following diagram, with GitHub as the source control repository in this example. As shown, you are attaching your Git repository to the project to check in and manage code versions. Jenkins initiates the model build pipeline when it detects changes to the model build code in the Git repository. You are also connecting the project to Jenkins to orchestrate your model deployment steps, which start when you approve the model registered in the model registry, or when Jenkins detects changes to the model deployment code.



![\[\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/projects/projects-templates-gitjenkins.png)


In summary, the steps guide you through the following tasks:

1. Establish the connection between your AWS and GitHub accounts.

1. Create the Jenkins account and import needed plugins.

1. Create the Jenkins IAM user and permissions policy.

1. Set the AWS credentials for the Jenkins IAM user on your Jenkins server.

1. Create an API token for communication with your Jenkins server.

1. Use a CloudFormation template to set up an EventBridge rule to monitor the model registry for newly-approved models.

1. Create the SageMaker AI project, which seeds your GitHub repositories with model build and deploy code.

1. Create your Jenkins model build pipeline with the model build seed code.

1. Create your Jenkins model deploy pipeline with the model deploy seed code.
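The EventBridge rule from step 6 watches the model registry for model packages whose approval status changes to `Approved`. A sketch of the event pattern such a rule matches, expressed with boto3's `put_rule`; the rule name is a placeholder:

```python
import json

# Matches SageMaker Model Registry events where a model package's
# approval status becomes Approved.
APPROVED_MODEL_PATTERN = {
    "source": ["aws.sagemaker"],
    "detail-type": ["SageMaker Model Package State Change"],
    "detail": {"ModelApprovalStatus": ["Approved"]},
}

def create_approval_rule(events_client, rule_name):
    """Create an EventBridge rule that fires when a model is approved."""
    return events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(APPROVED_MODEL_PATTERN),
        State="ENABLED",
    )

# Usage (requires AWS credentials):
#   import boto3
#   create_approval_rule(boto3.client("events", region_name="us-west-2"),
#                        "sagemaker-model-approved")
```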

## MLOps template for image building, model building, and model deployment


This template is an extension of the [MLOps templates for model building, training, and deployment with third-party Git using CodePipeline](#sagemaker-projects-templates-git-code-pipeline). It includes both the model building, training, and deployment components of that template and the following options:
+ Include processing image–building pipeline
+ Include training image–building pipeline
+ Include inference image–building pipeline

For each component that you select during project creation, the template creates the following resources:
+ An Amazon ECR repository
+ [A SageMaker Image](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateImage.html)
+ A CodeCommit repository containing a Dockerfile that you can customize
+ A CodePipeline pipeline that is initiated by changes to the CodeCommit repository
+ A CodeBuild project that builds a Docker image and registers it in the Amazon ECR repository
+ An EventBridge rule that initiates the CodePipeline on a schedule

When the CodePipeline is initiated, it builds a new Docker image and registers it in the Amazon ECR repository. When a new image is registered in the Amazon ECR repository, a new `ImageVersion` is added to the SageMaker image. This initiates the model building pipeline, which in turn initiates the deployment pipeline.

The newly created image is used in the model building, training, and deployment portions of the workflow where applicable.
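The registration that starts this chain corresponds to the `CreateImageVersion` API, which the template's CodeBuild project normally runs for you after pushing to Amazon ECR. A sketch you could use to perform the step manually; the image URI and image name are placeholders:

```python
def register_image_version(sagemaker_client, ecr_image_uri, image_name):
    """Register a pushed ECR image as a new version of a SageMaker image."""
    return sagemaker_client.create_image_version(
        BaseImage=ecr_image_uri,
        ImageName=image_name,
    )

# Usage (requires AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   register_image_version(
#       client,
#       "123456789012.dkr.ecr.us-west-2.amazonaws.com/my-repo:latest",
#       "my-processing-image")
```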

## Update SageMaker Projects to Use Third-Party Git Repositories
Third-Party Git Repositories

The managed policy attached to the `AmazonSageMakerServiceCatalogProductsUseRole` role was updated on July 27, 2021 for use with the third-party Git templates. Users who onboard to Amazon SageMaker Studio (or Studio Classic) after this date and enable project templates use the new policy. Users who onboarded prior to this date must update the policy to use these templates. Use one of the following options to update the policy:
+ Delete role and toggle Studio (or Studio Classic) settings

  1. In the IAM console, delete `AmazonSageMakerServiceCatalogProductsUseRole`.

  1. In the Studio (or Studio Classic) control panel, choose **Edit Settings**.

  1. Toggle both settings and then choose **Submit**.
+ In the IAM console, add the following permissions to `AmazonSageMakerServiceCatalogProductsUseRole`:

  ```
  {
      "Effect": "Allow",
      "Action": [
          "codestar-connections:UseConnection"
      ],
      "Resource": "arn:aws:codestar-connections:*:*:connection/*",
      "Condition": {
          "StringEqualsIgnoreCase": {
              "aws:ResourceTag/sagemaker": "true"
          }
      }
  },
  {
      "Effect": "Allow",
      "Action": [
          "s3:PutObjectAcl"
      ],
      "Resource": [
          "arn:aws:s3:::sagemaker-*"
      ]
  }
  ```
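If you prefer to script the second option, the two statements above can be wrapped in a complete policy document and attached as an inline policy with `put_role_policy`. A sketch; the inline policy name is an arbitrary choice:

```python
import json

# The two statements from the snippet above, wrapped into a complete
# policy document.
POLICY_DOCUMENT = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["codestar-connections:UseConnection"],
            "Resource": "arn:aws:codestar-connections:*:*:connection/*",
            "Condition": {
                "StringEqualsIgnoreCase": {"aws:ResourceTag/sagemaker": "true"}
            },
        },
        {
            "Effect": "Allow",
            "Action": ["s3:PutObjectAcl"],
            "Resource": ["arn:aws:s3:::sagemaker-*"],
        },
    ],
}

def attach_git_template_policy(iam_client):
    """Attach the permissions as an inline policy on the products use role."""
    return iam_client.put_role_policy(
        RoleName="AmazonSageMakerServiceCatalogProductsUseRole",
        PolicyName="ThirdPartyGitTemplateAccess",  # arbitrary policy name
        PolicyDocument=json.dumps(POLICY_DOCUMENT),
    )

# Usage (requires AWS credentials with IAM permissions):
#   import boto3
#   attach_git_template_policy(boto3.client("iam"))
```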

# Create Custom Project Templates
Custom Templates

**Important**  
As of October 28, 2024, the AWS CodeCommit templates have been removed. For new projects, select from the available project templates that use third-party Git repositories. For more information, see [MLOps Project Templates](sagemaker-projects-templates.md).

If the SageMaker AI-provided templates do not meet your needs (for example, you want to have more complex orchestration in the CodePipeline with multiple stages or custom approval steps), create your own templates.

We recommend starting with the SageMaker AI-provided templates to understand how to organize your code and resources, and then building on top of them. To do this, after you enable administrator access to the SageMaker AI templates, sign in to the Service Catalog console at [https://console.aws.amazon.com/servicecatalog/](https://console.aws.amazon.com/servicecatalog/), choose **Portfolios**, and then choose **Imported**. For information about Service Catalog, see [Overview of Service Catalog](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/what-is_concepts.html) in the *Service Catalog User Guide*.

Create your own project templates to customize your MLOps project. SageMaker AI project templates are Service Catalog products that provision the resources for your MLOps project.

To create a custom project template, complete the following steps.

1. Create a portfolio. For information, see [Step 3: Create a Service Catalog Portfolio](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/getstarted-portfolio.html).

1. Create a product. A product is a CloudFormation template. You can create multiple versions of the product. For information, see [Step 4: Create a Service Catalog Product](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/getstarted-product.html).

   For the product to work with SageMaker Projects, add the following parameters to your product template.

   ```
   SageMakerProjectName:
     Type: String
     Description: Name of the project

   SageMakerProjectId:
     Type: String
     Description: Service generated Id of the project.
   ```
**Important**  
We recommend that you wrap the CodeCommit repository in a SageMaker AI code repository so that the project's repositories are visible in VPC mode. The sample template and the required addition are shown in the following code samples.  
Original (sample) template:  

   ```
   ModelBuildCodeCommitRepository:
       Type: AWS::CodeCommit::Repository
       Properties:
         # Max allowed length: 100 chars
         RepositoryName: !Sub sagemaker-${SageMakerProjectName}-${SageMakerProjectId}-modelbuild # max: 10+33+15+10=68
         RepositoryDescription: !Sub SageMaker Model building workflow infrastructure as code for the Project ${SageMakerProjectName}
         Code:
           S3:
             Bucket: SEEDCODE_BUCKETNAME
             Key: toolchain/model-building-workflow-v1.0.zip
           BranchName: main
   ```
Additional content to add in VPC mode:  

   ```
   SageMakerRepository:
       Type: AWS::SageMaker::CodeRepository
       Properties:
           GitConfig:
               RepositoryUrl: !GetAtt ModelBuildCodeCommitRepository.CloneUrlHttp
               Branch: main
   ```

1. Add a launch constraint. A launch constraint designates an IAM role that Service Catalog assumes when a user launches a product. For information, see [Step 6: Add a Launch Constraint to Assign an IAM Role](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/getstarted-launchconstraint.html).

1. Provision the product on [https://console.aws.amazon.com/servicecatalog/](https://console.aws.amazon.com/servicecatalog/) to test the template. If you are satisfied with your template, continue to the next step to make the template available in Studio (or Studio Classic).

1. Grant access to the Service Catalog portfolio that you created in step 1 to your Studio (or Studio Classic) execution role. Use either the domain execution role or a user role that has Studio (or Studio Classic) access. For information about adding a role to the portfolio, see [Step 7: Grant End Users Access to the Portfolio](https://docs.aws.amazon.com/servicecatalog/latest/adminguide/getstarted-deploy.html).

1. To make your project template available in your **Organization templates** list in Studio (or Studio Classic), create a tag with the following key and value to the Service Catalog product you created in step 2.
   + **key**: `sagemaker:studio-visibility`
   + **value**: `true`
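You can also add this tag with a script, using the Service Catalog `UpdateProduct` API's `AddTags` parameter. A sketch; the product ID is a placeholder for the product you created in step 2:

```python
# Tag that makes a Service Catalog product appear under Organization
# templates in Studio (or Studio Classic).
STUDIO_VISIBILITY_TAG = {"Key": "sagemaker:studio-visibility", "Value": "true"}

def make_template_visible(servicecatalog_client, product_id):
    """Tag a Service Catalog product so Studio users can see it."""
    return servicecatalog_client.update_product(
        Id=product_id,
        AddTags=[STUDIO_VISIBILITY_TAG],
    )

# Usage (requires AWS credentials):
#   import boto3
#   make_template_visible(
#       boto3.client("servicecatalog", region_name="us-west-2"),
#       "prod-EXAMPLE123456")
```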

After you complete these steps, Studio (or Studio Classic) users in your organization can create a project with the template you created by following the steps in [Create an MLOps Project using Amazon SageMaker Studio or Studio Classic](sagemaker-projects-create.md) and choosing **Organization templates** when they choose a template.

## Using a template from an Amazon S3 bucket
Templates in Amazon S3

You can also create SageMaker projects using templates stored in Amazon S3.

**Note**  
While you can use the templates in the AWS Service Catalog, we recommend that you store templates in an S3 bucket and create projects using those templates.

### Admin setup


Before you can create projects using templates in an S3 bucket, perform the following steps.

1. [Create an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html), and upload your templates to the bucket.

1. [Set up a CORS policy on your S3 bucket to configure access permissions](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enabling-cors-examples.html).

1. Add the following key-value tag to the template so it becomes visible to SageMaker AI.

   ```
   sagemaker:studio-visibility : true
   ```

1. [Create a domain](https://docs.aws.amazon.com/sagemaker/latest/dg/onboard-quick-start.html).

1. After SageMaker AI finishes creating your domain, add the following key-value tag to the domain:

   ```
   sagemaker:projectS3TemplatesLocation : s3://<amzn-s3-demo-bucket>
   ```

Then use the AWS console, Python, or the [CreateProject](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateProject.html) and [UpdateProject](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_UpdateProject.html) API operations to create or update a SageMaker project from templates inside the S3 bucket.
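Step 5 of the admin setup can also be scripted with the SageMaker `AddTags` API. A sketch; the domain ARN and bucket name in the usage note are placeholders:

```python
def tag_domain_with_template_bucket(sagemaker_client, domain_arn, bucket_name):
    """Point a SageMaker AI domain at the S3 location holding templates."""
    return sagemaker_client.add_tags(
        ResourceArn=domain_arn,
        Tags=[{
            "Key": "sagemaker:projectS3TemplatesLocation",
            "Value": f"s3://{bucket_name}",
        }],
    )

# Usage (requires AWS credentials):
#   import boto3
#   client = boto3.client("sagemaker", region_name="us-west-2")
#   tag_domain_with_template_bucket(
#       client,
#       "arn:aws:sagemaker:us-west-2:123456789012:domain/d-example",
#       "amzn-s3-demo-bucket")
```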

------
#### [ Studio ]

**Create a project**

1. Open the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Deployments**, **Projects**, **Create project**.

1. Choose **Organization templates** and then **S3 Templates** to see the templates that are available to you. If you don't see a template that you're expecting, notify your administrator.

1. Choose the template that you want to use, and then choose **Next**.

1. Enter a name for your project, an optional description, and the other required fields. When you're done, choose **Create**.

**Update a project**

1. Open the Amazon SageMaker AI console at [https://console.aws.amazon.com/sagemaker/](https://console.aws.amazon.com/sagemaker/).

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. Choose the project that you want to update. Choose **Actions**, then choose **Update Project**.

1. When updating a project, you can update the template parameters or the template URL. When you're done, choose **Next**.

1. Review the project updates in the summary table, and choose **Update**.

------
#### [ Python Boto3 ]

After you create the S3 bucket and upload your templates, you can use the following example to create a SageMaker project.

```
import boto3

sagemaker_client = boto3.client('sagemaker', region_name='us-west-2')

response = sagemaker_client.create_project(
    ProjectName='my-custom-project',
    ProjectDescription='SageMaker project with custom CFN template stored in S3',
    TemplateProviders=[{
        'CfnTemplateProvider': {
            'TemplateName': 'CustomProjectTemplate',
            'TemplateURL': 'https://<bucket_name>.s3.us-west-2.amazonaws.com/custom-project-template.yml',
            'Parameters': [
                {'Key': 'ParameterKey', 'Value': 'ParameterValue'}
            ]
        }
    }]
)
print(f"Project ARN: {response['ProjectArn']}")
```

To update a SageMaker project, see the following example.

```
import boto3

sagemaker_client = boto3.client('sagemaker', region_name='us-west-2')

response = sagemaker_client.update_project(
    ProjectName='my-custom-project',
    ProjectDescription='SageMaker project with custom CFN template stored in S3',
    TemplateProvidersToUpdate=[{
        'CfnTemplateProvider': {
            'TemplateName': 'CustomProjectTemplate',
            'TemplateURL': 'https://<bucket_name>.s3.us-west-2.amazonaws.com/custom-project-template.yml',
            'Parameters': [
                {'Key': 'ParameterKey', 'Value': 'ParameterValue'}
            ]
        }
    }]
)
print(f"Project ARN: {response['ProjectArn']}")
```

------

# View Project Resources
View Resources

After you create a project, view the resources associated with the project in Amazon SageMaker Studio or Studio Classic.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Deployments**, and then choose **Projects**.

1. Select the name of the project for which you want to view details. A page with the project details appears.

On the project details page, you can open tabs corresponding to the following entities associated with the project.
+ Repositories: Code repositories (repos) associated with this project. If you use a SageMaker AI-provided template when you create your project, it creates an AWS CodeCommit repo or a third-party Git repo. For more information about CodeCommit, see [What is AWS CodeCommit](https://docs.aws.amazon.com/codecommit/latest/userguide/welcome.html).
+ Pipelines: SageMaker AI ML pipelines that define steps to prepare data, train, and deploy models. For information about SageMaker AI ML pipelines, see [Pipelines actions](pipelines-build.md).
+ Experiments: One or more Amazon SageMaker Autopilot experiments associated with the project. For information about Autopilot, see [SageMaker Autopilot](autopilot-automate-model-development.md).
+ Model groups: Groups of model versions that were created by pipeline executions in the project. For information about model groups, see [Create a Model Group](model-registry-model-group.md).
+ Endpoints: SageMaker AI endpoints that host deployed models for real-time inference. When a model version is approved, it is deployed to an endpoint.
+ Tags: All the tags associated with the project. For more information about tags, see [Tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html) in the *AWS General Reference*.
+ Metadata: Metadata associated with the project. This includes the template and version used, and the template launch path.

------
#### [ Studio Classic ]

1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Select the name of the project for which you want to view details.

   A tab with the project details appears.

On the project details tab, you can view the following entities associated with the project.
+ Repositories: Code repositories (repos) associated with this project. If you use a SageMaker AI-provided template when you create your project, it creates an AWS CodeCommit repo or a third-party Git repo. For more information about CodeCommit, see [What is AWS CodeCommit](https://docs.aws.amazon.com/codecommit/latest/userguide/welcome.html).
+ Pipelines: SageMaker AI ML pipelines that define steps to prepare data, train, and deploy models. For information about SageMaker AI ML pipelines, see [Pipelines actions](pipelines-build.md).
+ Experiments: One or more Amazon SageMaker Autopilot experiments associated with the project. For information about Autopilot, see [SageMaker Autopilot](autopilot-automate-model-development.md).
+ Model groups: Groups of model versions that were created by pipeline executions in the project. For information about model groups, see [Create a Model Group](model-registry-model-group.md).
+ Endpoints: SageMaker AI endpoints that host deployed models for real-time inference. When a model version is approved, it is deployed to an endpoint.
+ Settings: Settings for the project. This includes the name and description of the project, information about the project template and `SourceModelPackageGroupName`, and metadata about the project.

------
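
If you prefer to check a project from a script, a similar summary is available through the SageMaker `DescribeProject` API. The following is a minimal boto3 sketch; the project name is a placeholder for your own, and the helper accepts an injected client so it is easy to test or preconfigure:

```python
def get_project_details(project_name, sm_client=None):
    """Return the ID and status of an MLOps project via DescribeProject."""
    if sm_client is None:
        import boto3  # imported lazily; requires AWS credentials configured
        sm_client = boto3.client("sagemaker")
    resp = sm_client.describe_project(ProjectName=project_name)
    return {
        "ProjectId": resp["ProjectId"],
        "ProjectStatus": resp["ProjectStatus"],
    }
```

The full `DescribeProject` response also includes the Service Catalog provisioning details shown on the **Metadata** or **Settings** tab.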

# Update an MLOps Project in Amazon SageMaker Studio or Studio Classic
Update an MLOps Project

This procedure demonstrates how to update an MLOps project in Amazon SageMaker Studio or Studio Classic. Updating a project lets you modify your end-to-end ML solution. You can update the **Description**, the template version, and the template parameters.

**Prerequisites**
+ An IAM account or IAM Identity Center to sign in to Studio or Studio Classic. For information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).
+ Basic familiarity with the Studio or Studio Classic user interface. For information about the Studio UI, see [Amazon SageMaker Studio](studio-updated.md). For information about Studio Classic, see [Amazon SageMaker Studio Classic UI Overview](studio-ui.md).
+ Add the following custom inline policies to the specified roles:

  User-created role having `AmazonSageMakerFullAccess`

------
#### [ JSON ]

  ```
  {
      "Version":"2012-10-17",		 	 	 
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "servicecatalog:CreateProvisionedProductPlan",
                  "servicecatalog:DescribeProvisionedProductPlan",
                  "servicecatalog:DeleteProvisionedProductPlan"
              ],
              "Resource": "*"
          }
      ]
  }
  ```

------

  `AmazonSageMakerServiceCatalogProductsLaunchRole`

------
#### [ JSON ]

  ```
  {
      "Version":"2012-10-17",		 	 	 
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "cloudformation:CreateChangeSet",
                  "cloudformation:DeleteChangeSet",
                  "cloudformation:DescribeChangeSet"
              ],
              "Resource": "arn:aws:cloudformation:*:*:stack/SC-*"
          },
          {
              "Effect": "Allow",
              "Action": [
                  "codecommit:PutRepositoryTriggers"
              ],
              "Resource": "arn:aws:codecommit:*:*:sagemaker-*"
          }
      ]
  }
  ```

------
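
If you manage IAM in code, you can attach these inline policies with the IAM `PutRolePolicy` API instead of the console. The following is a hedged boto3 sketch; the role and policy names are placeholders for your own:

```python
import json

def attach_inline_policy(role_name, policy_name, policy_doc, iam_client=None):
    """Attach an inline policy document to an IAM role via PutRolePolicy."""
    if iam_client is None:
        import boto3  # imported lazily; requires AWS credentials configured
        iam_client = boto3.client("iam")
    return iam_client.put_role_policy(
        RoleName=role_name,
        PolicyName=policy_name,
        PolicyDocument=json.dumps(policy_doc),
    )

# The provisioned-product-plan statement from the first JSON tab above.
plan_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "servicecatalog:CreateProvisionedProductPlan",
                "servicecatalog:DescribeProvisionedProductPlan",
                "servicecatalog:DeleteProvisionedProductPlan",
            ],
            "Resource": "*",
        }
    ],
}
```

For example, `attach_inline_policy("MyUserRole", "ProvisionedProductPlan", plan_policy)` attaches the first policy; repeat with the change-set policy for `AmazonSageMakerServiceCatalogProductsLaunchRole`.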

To update your project in Studio or Studio Classic, complete the following steps.

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Deployments**, and then choose **Projects**.

1. Choose the radio button next to the project you want to update.

1. Choose the vertical ellipsis in the upper-right corner of the projects list, and then choose **Update**.

1. Choose **Next**.

1. Review the project updates in the summary table, and choose **Update**. It may take a few minutes for the project to update.

------
#### [ Studio Classic ]

**To update a project in Studio Classic**

1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**. A list of your projects appears.

1. Select the name of the project you want to update in the projects list.

1. Choose **Update** from the **Actions** menu in the upper-right corner of the project tab.

1. In the **Update project** dialog box, you can edit the **Description** and listed template parameters.

1. Choose **View difference**.

   A dialog box displays your original and updated project settings. Any change in your project settings can modify or delete resources in the current project. The dialog box displays these changes as well.

1. You may need to wait a few minutes for the **Update** button to become active. Choose **Update**.

1. The project update may take a few minutes to complete. Select **Settings** in the project tab and ensure the parameters have been updated correctly.

------
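
Projects can also be updated programmatically with the SageMaker `UpdateProject` API. The following is a minimal sketch that changes only the description; template-version and parameter changes go in the API's `ServiceCatalogProvisioningUpdateDetails` argument, which is omitted here. The project name and description are placeholders:

```python
def update_project_description(project_name, description, sm_client=None):
    """Update an MLOps project's description via the UpdateProject API."""
    if sm_client is None:
        import boto3  # imported lazily; requires AWS credentials configured
        sm_client = boto3.client("sagemaker")
    return sm_client.update_project(
        ProjectName=project_name,
        ProjectDescription=description,
    )
```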

# Delete an MLOps Project using Amazon SageMaker Studio or Studio Classic
Delete an MLOps Project

This procedure demonstrates how to delete an MLOps project using Amazon SageMaker Studio or Studio Classic.

**Prerequisites**

**Note**  
You can only delete projects in Studio or Studio Classic that you have created. This condition is part of the Service Catalog permission `servicecatalog:TerminateProvisionedProduct` in the `AmazonSageMakerFullAccess` policy. If needed, you can update this policy to remove this condition.
+ An IAM account or IAM Identity Center to sign in to Studio or Studio Classic. For information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).
+ Basic familiarity with the Studio or Studio Classic user interface. For information about the Studio UI, see [Amazon SageMaker Studio](studio-updated.md). For information about Studio Classic, see [Amazon SageMaker Studio Classic UI Overview](studio-ui.md).

------
#### [ Studio ]

1. Open the SageMaker Studio console by following the instructions in [Launch Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-launch.html).

1. In the left navigation pane, choose **Deployments**, and then choose **Projects**.

1. Choose the radio button next to the project you want to delete.

1. Choose the vertical ellipsis in the upper-right corner of the projects list, and then choose **Delete**.

1. Review the information in the **Delete project** dialog box, and choose **Yes, delete the project** if you still want to delete the project.

1. Choose **Delete**.

1. Your projects list appears. Confirm that your project no longer appears in the list.

------
#### [ Studio Classic ]

1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Select the target project from the dropdown list. If you don’t see your project, type the project name and apply the filter to find your project.

1. Once you've found your project, select the project name to view details.

1. Choose **Delete** from the **Actions** menu.

1. Confirm your choice by choosing **Delete** from the **Delete Project** window.

------
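
The same deletion can be scripted with the SageMaker `DeleteProject` API, which attempts to terminate the Service Catalog provisioned product behind the project, subject to the same permission condition described in the note above. A minimal sketch with a placeholder project name:

```python
def delete_mlops_project(project_name, sm_client=None):
    """Delete an MLOps project via the DeleteProject API."""
    if sm_client is None:
        import boto3  # imported lazily; requires AWS credentials configured
        sm_client = boto3.client("sagemaker")
    return sm_client.delete_project(ProjectName=project_name)
```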

# Walk Through a SageMaker AI MLOps Project Using Third-party Git Repos
Walk Through a Project Using Third-party Git Repos

**Important**  
As of November 30, 2023, the previous Amazon SageMaker Studio experience is now named Amazon SageMaker Studio Classic. The following section is specific to using the Studio Classic application. For information about using the updated Studio experience, see [Amazon SageMaker Studio](studio-updated.md).  
Studio Classic is still maintained for existing workloads but is no longer available for onboarding. You can only stop or delete existing Studio Classic applications and cannot create new ones. We recommend that you [migrate your workload to the new Studio experience](studio-updated-migrate.md).

This walkthrough uses the template [MLOps templates for model building, training, and deployment with third-party Git using CodePipeline](sagemaker-projects-templates-sm.md#sagemaker-projects-templates-git-code-pipeline) to demonstrate how to use MLOps projects to create a CI/CD system to build, train, and deploy models.

**Prerequisites**

To complete this walkthrough, you need:
+ An IAM or IAM Identity Center account to sign in to Studio Classic. For information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).
+ Permission to use SageMaker AI-provided project templates. For information, see [Granting SageMaker Studio Permissions Required to Use Projects](sagemaker-projects-studio-updates.md).
+ Basic familiarity with the Studio Classic user interface. For information, see [Amazon SageMaker Studio Classic UI Overview](studio-ui.md).
+ Two empty GitHub repositories. You input these repositories into the project template, which will seed these repos with model build and deploy code.

**Topics**
+ [Step 1: Set up the GitHub connection](#sagemaker-proejcts-walkthrough-connect-3rdgit)
+ [Step 2: Create the Project](#sagemaker-proejcts-walkthrough-create-3rdgit)
+ [Step 3: Make a Change in the Code](#sagemaker-projects-walkthrough-change-3rdgit)
+ [Step 4: Approve the Model](#sagemaker-proejcts-walkthrough-approve-3rdgit)
+ [(Optional) Step 5: Deploy the Model Version to Production](#sagemaker-projects-walkthrough-prod-3rdgit)
+ [Step 6: Clean Up Resources](#sagemaker-projectcts-walkthrough-cleanup-3rdgit)

## Step 1: Set up the GitHub connection


In this step, you connect to your GitHub repositories using an [AWS CodeConnections connection](https://docs.aws.amazon.com/dtconsole/latest/userguide/welcome-connections.html). The SageMaker AI project uses this connection to access your source code repositories.

**To set up the GitHub connection:**

1. Log in to the CodePipeline console at [https://console.aws.amazon.com/codepipeline/](https://console.aws.amazon.com/codepipeline/).

1. Under **Settings** in the navigation pane, choose **Connections**.

1. Choose **Create connection**.

1. For **Select a provider**, select **GitHub**.

1. For **Connection name**, enter a name.

1. Choose **Connect to GitHub**.

1. If the AWS Connector GitHub app isn't already installed, choose **Install new app**.

   This displays a list of all the GitHub personal accounts and organizations to which you have access.

1. Choose the account where you want to establish connectivity for use with SageMaker Projects and GitHub repositories.

1. Choose **Configure**.

1. You can optionally select your specific repositories or choose **All repositories**.

1. Choose **Save**. When the app is installed, you’re redirected to the **Connect to GitHub** page and the installation ID is automatically populated.

1. Choose **Connect**.

1. Add a tag with the key `sagemaker` and value `true` to this CodeConnections connection.

1. Copy the connection ARN to save for later. You use the ARN as a parameter in the project creation step.
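
Steps 1 through 14 can be partially scripted. The following is a hedged boto3 sketch using the `codestar-connections` client name (the service is now branded AWS CodeConnections, but this older client name remains available); the connection name is a placeholder. Note that a connection created through the API starts in a pending state, so you still complete the GitHub handshake in the console before the project can use it:

```python
def create_github_connection(connection_name, client=None):
    """Create a GitHub connection tagged sagemaker=true and return its ARN."""
    if client is None:
        import boto3  # imported lazily; requires AWS credentials configured
        client = boto3.client("codestar-connections")
    resp = client.create_connection(
        ProviderType="GitHub",
        ConnectionName=connection_name,
        # The sagemaker=true tag is required, as described in the steps above.
        Tags=[{"Key": "sagemaker", "Value": "true"}],
    )
    return resp["ConnectionArn"]
```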

## Step 2: Create the Project


In this step, you create a SageMaker AI MLOps project by using a SageMaker AI-provided project template to build, train, and deploy models.

**To create the SageMaker AI MLOps project**

1. Sign in to Studio Classic. For more information, see [Amazon SageMaker AI domain overview](gs-studio-onboard.md).

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Choose **Create project**.

   The **Create project** tab appears.

1. For **SageMaker AI project templates**, choose **Model building, training, and deployment with third-party Git repositories using CodePipeline**.

1. Choose **Next**.

1. Under **ModelBuild CodeRepository Info**, provide the following parameters:
   + For **Branch**, enter the branch to use from your Git repository for pipeline activities.
   + For **Full Repository Name**, enter the Git repository name in the format of *username/repository name* or *organization/repository name*.
   + For **Code Connection ARN**, enter the ARN of the CodeConnections connection you created in Step 1.

1. Under **ModelDeploy CodeRepository Info**, provide the following parameters:
   + For **Branch**, enter the branch to use from your Git repository for pipeline activities.
   + For **Full Repository Name**, enter the Git repository name in the format of *username/repository name* or *organization/repository name*.
   + For **Code Connection ARN**, enter the ARN of the CodeConnections connection you created in Step 1.

1. Choose **Create Project**.

The project appears in the **Projects** list with a **Status** of **Created**.
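
Project creation can also be driven from code with the SageMaker `CreateProject` API. The following is a sketch under stated assumptions: the Service Catalog `ProductId` for the template and the provisioning parameter keys (branch, repository name, connection ARN) are placeholders you would look up for your account, not documented values:

```python
def create_third_party_git_project(project_name, product_id,
                                   provisioning_params, sm_client=None):
    """Create an MLOps project from a Service Catalog product.

    provisioning_params maps parameter keys to the values you would
    enter in the console form (branch, repo name, connection ARN).
    """
    if sm_client is None:
        import boto3  # imported lazily; requires AWS credentials configured
        sm_client = boto3.client("sagemaker")
    return sm_client.create_project(
        ProjectName=project_name,
        ServiceCatalogProvisioningDetails={
            "ProductId": product_id,
            "ProvisioningParameters": [
                {"Key": k, "Value": v} for k, v in provisioning_params.items()
            ],
        },
    )
```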

## Step 3: Make a Change in the Code


Now make a change to the pipeline code that builds the model and commit the change to initiate a new pipeline run. The pipeline run registers a new model version.

**To make a code change**

1. In your model build GitHub repo, navigate to the `pipelines/abalone` folder. Double-click `pipeline.py` to open the code file.

1. In the `pipeline.py` file, find the line that sets the training instance type.

   ```
   training_instance_type = ParameterString(
       name="TrainingInstanceType", default_value="ml.m5.xlarge"
   )
   ```

   Change `ml.m5.xlarge` to `ml.m5.large`, then commit the change.

After you commit your code change, the MLOps system initiates a run of the pipeline that creates a new model version. In the next step, you approve the new model version to deploy it to production.
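
The value you edited above is only a *default*: SageMaker Pipelines lets callers override declared parameters when starting an execution, so editing the default is one of two ways to change the instance type. The plain-Python sketch below models that resolution logic (it does not call the SageMaker SDK); rejecting undeclared names mirrors the service's behavior of refusing unknown parameters:

```python
def resolve_parameters(defaults, overrides=None):
    """Merge execution-time overrides over declared pipeline defaults."""
    overrides = overrides or {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        # The service rejects parameters the pipeline never declared.
        raise ValueError(f"undeclared parameters: {sorted(unknown)}")
    return {**defaults, **overrides}

defaults = {"TrainingInstanceType": "ml.m5.xlarge"}
print(resolve_parameters(defaults, {"TrainingInstanceType": "ml.m5.large"}))
# → {'TrainingInstanceType': 'ml.m5.large'}
```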

## Step 4: Approve the Model


Now you approve the new model version that was created in the previous step to initiate a deployment of the model version to a SageMaker AI endpoint.

**To approve the model version**

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Find the name of the project you created in the first step and double-click it to open the project tab.

1. In the project tab, choose **Model groups**, then double-click the name of the model group that appears.

   The model group tab appears.

1. In the model group tab, double-click **Version 1**. The **Version 1** tab opens. Choose **Update status**.

1. In the **Update model version status** dialog box, from the **Status** dropdown list, select **Approve**, and then choose **Update status**.

   Approving the model version causes the MLOps system to deploy the model to staging. To view the endpoint, choose the **Endpoints** tab on the project tab.
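
Because approval is the event the deploy pipeline reacts to, you can also approve a version from code with the SageMaker `UpdateModelPackage` API. A minimal sketch; the model package ARN is a placeholder from your own Model Registry:

```python
def approve_model_version(model_package_arn, sm_client=None):
    """Set a model package's approval status to Approved."""
    if sm_client is None:
        import boto3  # imported lazily; requires AWS credentials configured
        sm_client = boto3.client("sagemaker")
    return sm_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
    )
```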

## (Optional) Step 5: Deploy the Model Version to Production


Now you can deploy the model version to the production environment.

**Note**  
To complete this step, you need to be an administrator in your Studio Classic domain. If you are not an administrator, skip this step.

**To deploy the model version to the production environment**

1. Log in to the CodePipeline console at [https://console.aws.amazon.com/codepipeline/](https://console.aws.amazon.com/codepipeline/).

1. Choose **Pipelines**, then choose the pipeline with the name **sagemaker-*projectname*-*projectid*-modeldeploy**, where *projectname* is the name of your project, and *projectid* is the ID of your project.

1. In the **DeployStaging** stage, choose **Review**.

1. In the **Review** dialog box, choose **Approve**.

   Approving the **DeployStaging** stage causes the MLOps system to deploy the model to production. To view the endpoint, choose the **Endpoints** tab on the project tab in Studio Classic.

## Step 6: Clean Up Resources


To stop incurring charges, clean up the resources that were created in this walkthrough.

**Note**  
To delete the CloudFormation stack and the Amazon S3 bucket, you need to be an administrator in Studio Classic. If you are not an administrator, ask your administrator to complete those steps.

1. In the Studio Classic sidebar, choose the **Home** icon ( ![\[Home icon\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/studio/icons/house.png)).

1. Select **Deployments** from the menu, and then select **Projects**.

1. Select the target project from the dropdown list. If you don’t see your project, type the project name and apply the filter to find your project.

1. Select your project to view its details in the main panel.

1. Choose **Delete** from the **Actions** menu.

1. Confirm your choice by choosing **Delete** from the **Delete Project** window.

   This deletes the Service Catalog provisioned product that the project created. This includes the CodeCommit, CodePipeline, and CodeBuild resources created for the project.

1. Delete the CloudFormation stacks that the project created. There are two stacks, one for staging and one for production. The names of the stacks are **sagemaker-*projectname*-*project-id*-deploy-staging** and **sagemaker-*projectname*-*project-id*-deploy-prod**, where *projectname* is the name of your project, and *project-id* is the ID of your project.

   For information about how to delete a CloudFormation stack, see [Deleting a stack on the CloudFormation console](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-console-delete-stack.html) in the *CloudFormation User Guide*.

1. Delete the Amazon S3 bucket that the project created. The name of the bucket is **sagemaker-project-*project-id***, where *project-id* is the ID of your project.
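
The stack and bucket cleanup in the last two steps can be sketched with boto3, assuming the naming patterns described above (adjust them if your project's resources differ). Deleting objects and buckets is irreversible, so treat this as a template to review rather than run blindly:

```python
def cleanup_project_resources(project_name, project_id,
                              cfn_client=None, s3_resource=None):
    """Delete the two deploy stacks, then empty and delete the project bucket."""
    if cfn_client is None or s3_resource is None:
        import boto3  # imported lazily; requires AWS credentials configured
        cfn_client = cfn_client or boto3.client("cloudformation")
        s3_resource = s3_resource or boto3.resource("s3")
    for stage in ("staging", "prod"):
        cfn_client.delete_stack(
            StackName=f"sagemaker-{project_name}-{project_id}-deploy-{stage}"
        )
    bucket = s3_resource.Bucket(f"sagemaker-project-{project_id}")
    bucket.objects.all().delete()  # the bucket must be empty before deletion
    bucket.delete()
```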

# Amazon SageMaker AI MLOps troubleshooting
MLOps troubleshooting

Use the following to troubleshoot issues with MLOps in SageMaker AI. This topic provides information about common errors and how to resolve them. 

## If I try to delete a SageMaker AI project created from a SageMaker AI template and receive an error due to non-empty Amazon S3 buckets or Amazon ECR repositories, how can I delete the project?


If you try to delete your SageMaker AI project and get one of the following error messages:

```
The bucket you tried to delete is not empty
```

```
The repository with name 'repository-name' in registry 
        with id 'id' cannot be deleted because it still contains images
```

then the project created Amazon S3 buckets or Amazon ECR repositories that are not empty. You must manually empty and delete them before you can delete the SageMaker AI project, because CloudFormation does not delete non-empty Amazon S3 buckets or Amazon ECR repositories for you.
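
The manual cleanup can be scripted with boto3. The sketch below empties a bucket and force-deletes an ECR repository (the `force=True` flag deletes the repository even if it still contains images); bucket and repository names are placeholders. If the bucket has versioning enabled, you would also need to delete object versions, which is not shown here:

```python
def force_cleanup(bucket_name, repository_name,
                  s3_resource=None, ecr_client=None):
    """Empty an S3 bucket and force-delete an ECR repository."""
    if s3_resource is None or ecr_client is None:
        import boto3  # imported lazily; requires AWS credentials configured
        s3_resource = s3_resource or boto3.resource("s3")
        ecr_client = ecr_client or boto3.client("ecr")
    # Delete every object so the bucket no longer blocks stack deletion.
    s3_resource.Bucket(bucket_name).objects.all().delete()
    # force=True removes the repository along with any remaining images.
    ecr_client.delete_repository(repositoryName=repository_name, force=True)
```

After both resources are gone, retry the project deletion.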