

# SageMaker AI Components for Kubeflow Pipelines


With SageMaker AI components for Kubeflow Pipelines, you can create and monitor native SageMaker AI training, tuning, endpoint deployment, and batch transform jobs from your Kubeflow Pipelines. By running Kubeflow Pipeline jobs on SageMaker AI, you move data processing and training jobs from the Kubernetes cluster to SageMaker AI's machine learning-optimized managed service. This document assumes prior knowledge of Kubernetes and Kubeflow. 

**Topics**
+ [What are Kubeflow Pipelines?](#what-is-kubeflow-pipelines)
+ [What are Kubeflow Pipeline components?](#kubeflow-pipeline-components)
+ [Why use SageMaker AI Components for Kubeflow Pipelines?](#why-use-sagemaker-components)
+ [SageMaker AI Components for Kubeflow Pipelines versions](#sagemaker-components-versions)
+ [List of SageMaker AI Components for Kubeflow Pipelines](#sagemaker-components-list)
+ [IAM permissions](#iam-permissions)
+ [Converting pipelines to use SageMaker AI](#converting-pipelines-to-use-amazon-sagemaker)
+ [Install Kubeflow Pipelines](kubernetes-sagemaker-components-install.md)
+ [Use SageMaker AI components](kubernetes-sagemaker-components-tutorials.md)

## What are Kubeflow Pipelines?


Kubeflow Pipelines (KFP) is a platform for building and deploying portable, scalable machine learning (ML) workflows based on Docker containers. The Kubeflow Pipelines platform consists of the following:
+ A user interface (UI) for managing and tracking experiments, jobs, and runs. 
+ An engine (Argo) for scheduling multi-step ML workflows.
+ An SDK for defining and manipulating pipelines and components.
+ Notebooks for interacting with the system using the SDK.

A pipeline is a description of an ML workflow expressed as a [directed acyclic graph](https://www.kubeflow.org/docs/pipelines/concepts/graph/). Every step in the workflow is expressed as a Kubeflow Pipeline [component](https://www.kubeflow.org/docs/pipelines/overview/concepts/component/), which is a Python module built into a Docker image.

For more information on Kubeflow Pipelines, see the [Kubeflow Pipelines documentation](https://www.kubeflow.org/docs/pipelines/). 

## What are Kubeflow Pipeline components?


A Kubeflow Pipeline component is a set of code used to execute one step of a Kubeflow pipeline. Components are represented by a Python module built into a Docker image. When the pipeline runs, the component's container is instantiated on one of the worker nodes on the Kubernetes cluster running Kubeflow, and your logic is executed. Pipeline components can read outputs from the previous components and create outputs that the next component in the pipeline can consume. These components make it fast and easy to write pipelines for experimentation and production environments without having to interact with the underlying Kubernetes infrastructure.

You can use SageMaker AI Components in your Kubeflow pipeline. Rather than encapsulating your logic in a custom container, you simply load the components and describe your pipeline using the Kubeflow Pipelines SDK. When the pipeline runs, your instructions are translated into a SageMaker AI job or deployment. The workload then runs on the fully managed infrastructure of SageMaker AI. 

## Why use SageMaker AI Components for Kubeflow Pipelines?

SageMaker AI Components for Kubeflow Pipelines offer an alternative to launching your compute-intensive jobs directly on your Kubernetes cluster. The components integrate SageMaker AI with the portability and orchestration of Kubeflow Pipelines. Using the SageMaker AI Components for Kubeflow Pipelines, you can create and monitor your SageMaker AI resources as part of a Kubeflow Pipelines workflow. Each of the jobs in your pipeline runs on SageMaker AI instead of the local Kubernetes cluster, allowing you to take advantage of key SageMaker AI features such as data labeling, large-scale hyperparameter tuning and distributed training jobs, or one-click secure and scalable model deployment. The job parameters, status, logs, and outputs from SageMaker AI are still accessible from the Kubeflow Pipelines UI. 

The SageMaker AI components integrate key SageMaker AI features into your ML workflows from preparing data, to building, training, and deploying ML models. You can create a Kubeflow Pipeline built entirely using these components, or integrate individual components into your workflow as needed. The components are available in one or two versions. Each version of a component leverages a different backend. For more information on those versions, see [SageMaker AI Components for Kubeflow Pipelines versions](#sagemaker-components-versions).

There is no additional charge for using SageMaker AI Components for Kubeflow Pipelines. You incur charges for any SageMaker AI resources you use through these components.

## SageMaker AI Components for Kubeflow Pipelines versions

SageMaker AI Components for Kubeflow Pipelines come in two versions. Each version leverages a different backend to create and manage resources on SageMaker AI.
+ Version 1 (v1.x and below) of the SageMaker AI Components for Kubeflow Pipelines uses **[Boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html)** (AWS SDK for Python) as the backend. 
+ Version 2 (v2.0.0-alpha2 and above) of the SageMaker AI Components for Kubeflow Pipelines uses the [SageMaker AI Operator for Kubernetes (ACK)](https://github.com/aws-controllers-k8s/sagemaker-controller). 

  AWS introduced [ACK](https://aws-controllers-k8s.github.io/community/) to facilitate a Kubernetes-native way of managing AWS Cloud resources. ACK includes a set of AWS service-specific controllers, one of which is the SageMaker AI controller. The SageMaker AI controller makes it easier for machine learning developers and data scientists using Kubernetes as their control plane to train, tune, and deploy machine learning (ML) models in SageMaker AI. For more information, see [SageMaker AI Operators for Kubernetes](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/). 

Both versions of the SageMaker AI Components for Kubeflow Pipelines are supported. However, version 2 provides some additional advantages. In particular, it offers: 

1. A consistent experience for managing your SageMaker AI resources from any application, whether you are using Kubeflow Pipelines, the Kubernetes CLI (`kubectl`), or other Kubeflow applications such as Notebooks. 

1. The flexibility to manage and monitor your SageMaker AI resources outside of the Kubeflow pipeline workflow. 

1. Zero setup time to use the SageMaker AI components if you deployed the full [Kubeflow on AWS](https://awslabs.github.io/kubeflow-manifests/docs/about/) release, since the SageMaker AI Operator is part of its deployment. 

## List of SageMaker AI Components for Kubeflow Pipelines

The following is a list of all SageMaker AI Components for Kubeflow Pipelines and their available versions. Alternatively, you can find all [SageMaker AI Components for Kubeflow Pipelines in GitHub](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker#versioning).

**Note**  
We encourage you to use version 2 of a SageMaker AI component wherever it is available.

### Ground Truth components

+ **Ground Truth**

  The Ground Truth component enables you to submit SageMaker AI Ground Truth labeling jobs directly from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)
+ **Workteam**

  The Workteam component enables you to create SageMaker AI private work teams directly from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)

### Data processing components

+ **Processing**

  The Processing component enables you to submit processing jobs to SageMaker AI directly from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)

### Training components

+ **Training**

  The Training component allows you to submit SageMaker Training jobs directly from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)
+ **Hyperparameter Optimization**

  The Hyperparameter Optimization component enables you to submit hyperparameter tuning jobs to SageMaker AI directly from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)

### Inference components

+ **Hosting Deploy**

  The Hosting components allow you to deploy a model using SageMaker AI hosting services from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)
+ **Batch Transform**

  The Batch Transform component allows you to run inference jobs for an entire dataset in SageMaker AI from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)
+ **Model Monitor**

  The Model Monitor components allow you to monitor the quality of SageMaker AI machine learning models in production from a Kubeflow Pipelines workflow.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/sagemaker/latest/dg/kubernetes-sagemaker-components-for-kubeflow-pipelines.html)

## IAM permissions


Deploying Kubeflow Pipelines with SageMaker AI components requires the following three layers of authentication: 
+ An IAM role granting your gateway node (which can be your local machine or a remote instance) access to the Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

  The user accessing the gateway node assumes this role to:
  + Create an Amazon EKS cluster and install KFP
  + Create IAM roles
  + Create Amazon S3 buckets for your sample input data

  The role requires the following permissions:
  + CloudWatchLogsFullAccess 
  + [AWSCloudFormationFullAccess](https://console.aws.amazon.com/iam/home?region=us-east-1#/policies/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAWSCloudFormationFullAccess) 
  + IAMFullAccess
  + AmazonS3FullAccess
  + AmazonEC2FullAccess
  + AmazonEKSAdminPolicy (Create this policy using the schema from [Amazon EKS Identity-Based Policy Examples](https://docs.aws.amazon.com/eks/latest/userguide/security_iam_id-based-policy-examples.html)) 
+ A Kubernetes IAM execution role assumed by Kubernetes pipeline pods (**kfp-example-pod-role**) or the SageMaker AI Operator for Kubernetes controller pod to access SageMaker AI. This role is used to create and monitor SageMaker AI jobs from Kubernetes.

  The role requires the following permission:
  + AmazonSageMakerFullAccess 

  You can limit permissions to the KFP and controller pods by creating and attaching your own custom policy.
+ A SageMaker AI IAM execution role assumed by SageMaker AI jobs to access AWS resources such as Amazon S3 or Amazon ECR (**kfp-example-sagemaker-execution-role**).

  SageMaker AI jobs use this role to:
  + Access SageMaker AI resources
  + Read input data from Amazon S3
  + Store your output model in Amazon S3

  The role requires the following permissions:
  + AmazonSageMakerFullAccess 
  + AmazonS3FullAccess 
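
Apart from the custom AmazonEKSAdminPolicy, the policies listed in this section are AWS managed policies, which are attached by ARN. The following stdlib-only sketch shows how those ARNs are formed; the helper name is illustrative, not part of any AWS tooling:

```python
# AWS managed policies all share one ARN format, with the reserved
# "aws" value in the account field.
def managed_policy_arn(policy_name: str) -> str:
    return "arn:aws:iam::aws:policy/" + policy_name

# Managed policies required by the gateway-node role (from the list above):
for name in ["CloudWatchLogsFullAccess", "AWSCloudFormationFullAccess",
             "IAMFullAccess", "AmazonS3FullAccess", "AmazonEC2FullAccess"]:
    print(managed_policy_arn(name))
```

These are the same ARN strings you would pass to `aws iam attach-role-policy --policy-arn`.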

## Converting pipelines to use SageMaker AI


You can convert an existing pipeline to use SageMaker AI by porting your generic Python [processing containers](https://docs.aws.amazon.com/sagemaker/latest/dg/amazon-sagemaker-containers.html) and [training containers](https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html). If you are using SageMaker AI for inference, you also need to attach IAM permissions to your cluster and convert an artifact to a model.

# Install Kubeflow Pipelines


[Kubeflow Pipelines (KFP)](https://www.kubeflow.org/docs/components/pipelines/v2/introduction/) is the pipeline orchestration component of Kubeflow.

You can deploy Kubeflow Pipelines (KFP) on an existing Amazon Elastic Kubernetes Service (Amazon EKS) or create a new Amazon EKS cluster. Use a gateway node to interact with your cluster. The gateway node can be your local machine or an Amazon EC2 instance.

The following section guides you through the steps to set up and configure these resources.

**Topics**
+ [

## Choose an installation option
](#choose-install-option)
+ [

## Configure your pipeline permissions to access SageMaker AI
](#configure-permissions-for-pipeline)
+ [

## Access the KFP UI (Kubeflow Dashboard)
](#access-the-kfp-ui)

## Choose an installation option


Kubeflow Pipelines is available as a core component of the full distribution of Kubeflow on AWS or as a standalone installation.

Select the option that applies to your use case:

1. [Full Kubeflow on AWS Deployment](#full-kubeflow-deployment)

   To use other Kubeflow components in addition to Kubeflow Pipelines, choose the full [AWS distribution of Kubeflow](https://awslabs.github.io/kubeflow-manifests) deployment. 

1. [Standalone Kubeflow Pipelines Deployment](#kubeflow-pipelines-standalone)

   To use Kubeflow Pipelines without the other components of Kubeflow, install the Kubeflow Pipelines standalone deployment. 

### Full Kubeflow on AWS Deployment


To install the full release of Kubeflow on AWS, choose the vanilla deployment option from [Kubeflow on AWS deployment guide](https://awslabs.github.io/kubeflow-manifests/docs/deployment/) or any other deployment option supporting integrations with various AWS services (Amazon S3, Amazon RDS, Amazon Cognito).

### Standalone Kubeflow Pipelines Deployment


This section assumes that your user has permissions to create roles and define policies for the role.

#### Set up a gateway node


You can use your local machine or an Amazon EC2 instance as your gateway node. A gateway node is used to create an Amazon EKS cluster and access the Kubeflow Pipelines UI. 

Complete the following steps to set up your node. 

1. 

**Create a gateway node.**

   You can use an existing Amazon EC2 instance or create a new instance with the latest Ubuntu 18.04 DLAMI version using the steps in [Launching and Configuring a DLAMI](https://docs.aws.amazon.com/dlami/latest/devguide/launch-config.html).

1. 

**Create an IAM role to grant your gateway node access to AWS resources.**

   Create an IAM role with permissions to the following resources: CloudWatch, CloudFormation, IAM, Amazon EC2, Amazon S3, Amazon EKS.

   Attach the following policies to the IAM role:
   + CloudWatchLogsFullAccess 
   + [AWSCloudFormationFullAccess](https://console.aws.amazon.com/iam/home?region=us-east-1#/policies/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAWSCloudFormationFullAccess)
   + IAMFullAccess 
   + AmazonS3FullAccess 
   + AmazonEC2FullAccess 
   + AmazonEKSAdminPolicy (Create this policy using the schema from [Amazon EKS Identity-Based Policy Examples](https://docs.aws.amazon.com/eks/latest/userguide/security_iam_id-based-policy-examples.html)) 

   For information on adding IAM permissions to an IAM role, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).

1. 

**Install the following tools and clients**

   Install and configure the following tools and resources on your gateway node to access the Amazon EKS cluster and KFP User Interface (UI). 
   + [AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-install.html): The command line tool for working with AWS services. For AWS CLI configuration information, see [Configuring the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html). 
   + [aws-iam-authenticator](https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html) version 0.1.31 and above: A tool to use AWS IAM credentials to authenticate to a Kubernetes cluster.
   + [eksctl](https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html) version above 0.15: The command line tool for working with Amazon EKS clusters.
   + [kubectl](https://kubernetes.io/docs/tasks/tools/install-kubectl/#install-kubectl): The command line tool for working with Kubernetes clusters. The version needs to match your Kubernetes version within one minor version.
   + [AWS SDK for Python (Boto3)](https://aws.amazon.com/sdk-for-python/): Install it with the following command.

     ```
     pip install boto3
     ```

#### Set up an Amazon EKS cluster


1. If you do not have an existing Amazon EKS cluster, complete the following steps from the command line of your gateway node. Otherwise, skip this step.

   1. Run the following command to create an Amazon EKS cluster with version 1.17 or above. Replace `<clustername>` with any name for your cluster. 

      ```
      eksctl create cluster --name <clustername> --region us-east-1 --auto-kubeconfig --timeout=50m --managed --nodes=1
      ```

   1. When the cluster creation is complete, ensure that you have access to your cluster by listing the cluster's nodes. 

      ```
      kubectl get nodes
      ```

1. Ensure that the current `kubectl` context points to your cluster with the following command. The current context is marked with an asterisk (`*`) in the output.

   ```
   kubectl config get-contexts
   
   CURRENT NAME     CLUSTER
   *   <username>@<clustername>.us-east-1.eksctl.io   <clustername>.us-east-1.eksctl.io
   ```

1. If the desired cluster is not configured as your current default, update the default with the following command. 

   ```
   aws eks update-kubeconfig --name <clustername> --region us-east-1
   ```
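
If you script these checks, the current context can also be picked out programmatically. The following hypothetical helper (not part of any AWS tooling) parses the output of `kubectl config get-contexts`, where the current context is the row marked with an asterisk:

```python
# Parse `kubectl config get-contexts` output and return the NAME column
# of the row marked with '*'.
def current_context(get_contexts_output: str) -> str:
    for line in get_contexts_output.splitlines():
        if line.startswith("*"):
            # Columns are whitespace-separated: CURRENT NAME CLUSTER ...
            return line.split()[1]
    raise ValueError("no current context found")

# Sample output in the shape shown above (cluster name is illustrative).
sample = """CURRENT NAME     CLUSTER
*   admin@my-cluster.us-east-1.eksctl.io   my-cluster.us-east-1.eksctl.io"""

print(current_context(sample))  # admin@my-cluster.us-east-1.eksctl.io
```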

#### Install Kubeflow Pipelines


Run the following steps from the terminal of your gateway node to install Kubeflow Pipelines on your cluster.

1. Install all [cert-manager components](https://cert-manager.io/docs/installation/kubectl/).

   ```
   kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.9.1/cert-manager.yaml
   ```

1. Install Kubeflow Pipelines.

   ```
   export PIPELINE_VERSION=2.0.0-alpha.5
   kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/cert-manager/cluster-scoped-resources?ref=$PIPELINE_VERSION"
   kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
   kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/cert-manager/dev?ref=$PIPELINE_VERSION"
   ```

1. Ensure that the Kubeflow Pipelines service and other related resources are running.

   ```
   kubectl -n kubeflow get all | grep pipeline
   ```

   Your output should look like the following.

   ```
   pod/ml-pipeline-6b88c67994-kdtjv                      1/1     Running            0          2d
   pod/ml-pipeline-persistenceagent-64d74dfdbf-66stk     1/1     Running            0          2d
   pod/ml-pipeline-scheduledworkflow-65bdf46db7-5x9qj    1/1     Running            0          2d
   pod/ml-pipeline-ui-66cc4cffb6-cmsdb                   1/1     Running            0          2d
   pod/ml-pipeline-viewer-crd-6db65ccc4-wqlzj            1/1     Running            0          2d
   pod/ml-pipeline-visualizationserver-9c47576f4-bqmx4   1/1     Running            0          2d
   service/ml-pipeline                       ClusterIP   10.100.170.170   <none>        8888/TCP,8887/TCP   2d
   service/ml-pipeline-ui                    ClusterIP   10.100.38.71     <none>        80/TCP              2d
   service/ml-pipeline-visualizationserver   ClusterIP   10.100.61.47     <none>        8888/TCP            2d
   deployment.apps/ml-pipeline                       1/1     1            1           2d
   deployment.apps/ml-pipeline-persistenceagent      1/1     1            1           2d
   deployment.apps/ml-pipeline-scheduledworkflow     1/1     1            1           2d
   deployment.apps/ml-pipeline-ui                    1/1     1            1           2d
   deployment.apps/ml-pipeline-viewer-crd            1/1     1            1           2d
   deployment.apps/ml-pipeline-visualizationserver   1/1     1            1           2d
   replicaset.apps/ml-pipeline-6b88c67994                      1         1         1       2d
   replicaset.apps/ml-pipeline-persistenceagent-64d74dfdbf     1         1         1       2d
   replicaset.apps/ml-pipeline-scheduledworkflow-65bdf46db7    1         1         1       2d
   replicaset.apps/ml-pipeline-ui-66cc4cffb6                   1         1         1       2d
   replicaset.apps/ml-pipeline-viewer-crd-6db65ccc4            1         1         1       2d
   replicaset.apps/ml-pipeline-visualizationserver-9c47576f4   1         1         1       2d
   ```
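
If you want to verify this output in a script rather than by eye, a small hypothetical helper can scan the pod rows and flag anything not in the `Running` state:

```python
# Scan `kubectl -n kubeflow get all | grep pipeline` output and report
# any pod whose STATUS column is not "Running".
def pods_not_running(kubectl_output: str) -> list:
    bad = []
    for line in kubectl_output.splitlines():
        if line.strip().startswith("pod/"):
            fields = line.split()
            # Pod rows have columns: NAME READY STATUS RESTARTS AGE
            if fields[2] != "Running":
                bad.append(fields[0])
    return bad

# Sample with one healthy and one (illustrative) failing pod.
sample = """pod/ml-pipeline-6b88c67994-kdtjv     1/1   Running            0   2d
pod/ml-pipeline-ui-66cc4cffb6-cmsdb  1/1   CrashLoopBackOff   5   2d"""

print(pods_not_running(sample))  # ['pod/ml-pipeline-ui-66cc4cffb6-cmsdb']
```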

## Configure your pipeline permissions to access SageMaker AI


In this section, you create an IAM execution role granting Kubeflow Pipeline pods access to SageMaker AI services. 

### Configuration for SageMaker AI components version 2


To run SageMaker AI Components version 2 for Kubeflow Pipelines, you need to install [SageMaker AI Operator for Kubernetes](https://github.com/aws-controllers-k8s/sagemaker-controller) and configure Role-Based Access Control (RBAC) allowing the Kubeflow Pipelines pods to create SageMaker AI custom resources in your Kubernetes cluster.

**Important**  
Follow this section if you are using the Kubeflow Pipelines standalone deployment. If you are using the AWS distribution of Kubeflow version 1.6.0-aws-b1.0.0 or above, SageMaker AI components version 2 are already set up.

1. Install SageMaker AI Operator for Kubernetes to use SageMaker AI components version 2.

   Follow the *Setup* section of [Machine Learning with ACK SageMaker AI Controller tutorial](https://aws-controllers-k8s.github.io/community/docs/tutorials/sagemaker-example/#setup).

1. Configure RBAC permissions for the execution role (service account) used by Kubeflow Pipelines pods. In Kubeflow Pipelines standalone deployment, pipeline runs are executed in the namespace `kubeflow` using the `pipeline-runner` service account.

   1. Create a [RoleBinding](https://kubernetes.io/docs/reference/access-authn-authz/rbac/#rolebinding-example) that gives the service account permission to manage SageMaker AI custom resources.

      ```
      cat > manage_sagemaker_cr.yaml <<EOF
      apiVersion: rbac.authorization.k8s.io/v1
      kind: RoleBinding
      metadata:
        name: manage-sagemaker-cr
        namespace: kubeflow
      subjects:
      - kind: ServiceAccount
        name: pipeline-runner
        namespace: kubeflow
      roleRef:
        kind: ClusterRole
        name: ack-sagemaker-controller
        apiGroup: rbac.authorization.k8s.io
      EOF
      ```

      ```
      kubectl apply -f manage_sagemaker_cr.yaml
      ```

   1. Ensure that the rolebinding was created by running:

      ```
      kubectl get rolebinding manage-sagemaker-cr -n kubeflow -o yaml
      ```
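
If you generate your cluster configuration from code, the same RoleBinding can be built as a plain dictionary, since `kubectl apply -f` accepts JSON as well as YAML. A stdlib-only sketch:

```python
import json

# The RoleBinding from the step above, expressed as a plain dict.
role_binding = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "RoleBinding",
    "metadata": {"name": "manage-sagemaker-cr", "namespace": "kubeflow"},
    "subjects": [
        {"kind": "ServiceAccount", "name": "pipeline-runner", "namespace": "kubeflow"}
    ],
    "roleRef": {
        "kind": "ClusterRole",
        "name": "ack-sagemaker-controller",
        "apiGroup": "rbac.authorization.k8s.io",
    },
}

# Emit JSON suitable for `kubectl apply -f manage_sagemaker_cr.json`.
print(json.dumps(role_binding, indent=2))
```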

### Configuration for SageMaker AI components version 1


To run SageMaker AI Components version 1 for Kubeflow Pipelines, the Kubeflow Pipeline pods need access to SageMaker AI.

**Important**  
Follow this section whether you are using the full Kubeflow on AWS deployment or the Kubeflow Pipelines standalone deployment.

To create an IAM execution role granting Kubeflow pipeline pods access to SageMaker AI, follow these steps:

1. Export your cluster name (e.g., *my-cluster-name*) and cluster region (e.g., *us-east-1*).

   ```
   export CLUSTER_NAME=my-cluster-name
   export CLUSTER_REGION=us-east-1
   ```

1. Export the namespace and service account name according to your installation.
   + For the full Kubeflow on AWS installation, export your profile `namespace` (e.g., *kubeflow-user-example-com*) and *default-editor* as the service account.

     ```
     export NAMESPACE=kubeflow-user-example-com
     export KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT=default-editor
     ```
   + For the standalone Pipelines deployment, export *kubeflow* as the `namespace` and *pipeline-runner* as the service account.

     ```
     export NAMESPACE=kubeflow
     export KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT=pipeline-runner
     ```

1. Create an [ IAM OIDC provider for the Amazon EKS cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html) with the following command.

   ```
   eksctl utils associate-iam-oidc-provider --cluster ${CLUSTER_NAME} \
               --region ${CLUSTER_REGION} --approve
   ```

1. Create an IAM execution role for the KFP pods to access AWS services (SageMaker AI, CloudWatch).

   ```
   eksctl create iamserviceaccount \
   --name ${KUBEFLOW_PIPELINE_POD_SERVICE_ACCOUNT} \
   --namespace ${NAMESPACE} --cluster ${CLUSTER_NAME} \
   --region ${CLUSTER_REGION} \
   --attach-policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess \
   --attach-policy-arn arn:aws:iam::aws:policy/CloudWatchLogsFullAccess \
   --override-existing-serviceaccounts \
   --approve
   ```

Once your pipeline permissions are configured to access SageMaker AI Components version 1, follow the SageMaker AI components for Kubeflow pipelines guide in the [Kubeflow on AWS documentation](https://awslabs.github.io/kubeflow-manifests/docs/amazon-sagemaker-integration/sagemaker-components-for-kubeflow-pipelines/).
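
Under the hood, `eksctl create iamserviceaccount` creates an IAM role whose trust policy lets the Kubernetes service account assume it through the cluster's OIDC provider (IAM Roles for Service Accounts). The sketch below builds such a trust policy document; the account ID and OIDC provider ID are placeholder assumptions:

```python
import json

# Placeholder values -- substitute your account ID and your cluster's
# OIDC provider ID (visible via `aws eks describe-cluster`).
account_id = "111122223333"
oidc_id = "EXAMPLED539D4633E53DE1B71EXAMPLE"
region = "us-east-1"
namespace = "kubeflow"
service_account = "pipeline-runner"

oidc_provider = f"oidc.eks.{region}.amazonaws.com/id/{oidc_id}"

# Trust policy restricting role assumption to this service account.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}"
                }
            },
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```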

## Access the KFP UI (Kubeflow Dashboard)


The Kubeflow Pipelines UI is used for managing and tracking experiments, jobs, and runs on your cluster. For instructions on how to access the Kubeflow Pipelines UI from your gateway node, follow the steps that apply to your deployment option in this section.

### Full Kubeflow on AWS Deployment


Follow the instructions on the [Kubeflow on AWS website](https://awslabs.github.io/kubeflow-manifests/docs/deployment/connect-kubeflow-dashboard/) to connect to the Kubeflow dashboard and navigate to the pipelines tab.

### Standalone Kubeflow Pipelines Deployment


Use port forwarding to access the Kubeflow Pipelines UI from your gateway node by following these steps.

#### Set up port forwarding to the KFP UI service


Run the following commands from the command line of your gateway node.

1. Verify that the KFP UI service is running using the following command.

   ```
   kubectl -n kubeflow get service ml-pipeline-ui
   
   NAME             TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
   ml-pipeline-ui   ClusterIP   10.100.38.71   <none>        80/TCP    2d22h
   ```

1. Run the following command to set up port forwarding to the KFP UI service. This forwards the KFP UI to port 8080 on your gateway node and allows you to access the KFP UI from your browser. 

   ```
   kubectl port-forward -n kubeflow service/ml-pipeline-ui 8080:80
   ```

   The port forward from your remote machine drops if there is no activity. Run this command again if your dashboard is unable to get logs or updates. If the commands return an error, ensure that there is no process already running on the port you are trying to use. 

#### Access the KFP UI service


Your method of accessing the KFP UI depends on your gateway node type.
+ Local machine as the gateway node:

  1. Access the dashboard in your browser as follows: 

     ```
     http://localhost:8080
     ```

  1. Choose **Pipelines** to access the pipelines UI. 
+ Amazon EC2 instance as the gateway node:

  1. You need to set up an SSH tunnel on your Amazon EC2 instance to access the Kubeflow dashboard from your local machine's browser. 

     From a new terminal session in your local machine, run the following. Replace `<public-DNS-of-gateway-node>` with the IP address of your instance found on the Amazon EC2 console. You can also use the public DNS. Replace `<path_to_key>` with the path to the pem key used to access the gateway node. 

     ```
     public_DNS_address=<public-DNS-of-gateway-node>
     key=<path_to_key>
     
     # On Ubuntu:
     ssh -i ${key} -L 9000:localhost:8080 ubuntu@${public_DNS_address}
     
     # On Amazon Linux:
     ssh -i ${key} -L 9000:localhost:8080 ec2-user@${public_DNS_address}
     ```

  1. Access the dashboard in your browser. 

     ```
     http://localhost:9000
     ```

  1. Choose **Pipelines** to access the KFP UI. 

#### (Optional) Grant SageMaker AI notebook instances access to Amazon EKS and run KFP pipelines from your notebook


A SageMaker AI notebook instance is a fully managed Amazon EC2 compute instance that runs the Jupyter Notebook App. You can use a notebook instance to create and manage Jupyter notebooks, then define, compile, deploy, and run your KFP pipelines using the KFP SDK or the KFP CLI. 

1. Follow the steps in [Create a SageMaker Notebook Instance](https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html) to create your notebook instance, then attach the `AmazonS3FullAccess` policy to its IAM execution role.

1. From the command line of your gateway node, run the following command to retrieve the IAM role ARN of the notebook instance you created. Replace `<instance-name>` with the name of your instance.

   ```
   aws sagemaker describe-notebook-instance --notebook-instance-name <instance-name> --region <region> --output text --query 'RoleArn'
   ```

   This command outputs the IAM role ARN in the `arn:aws:iam::<account-id>:role/<role-name>` format. Take note of this ARN.

1. Run the following commands to attach the AmazonSageMakerFullAccess, AmazonEKSWorkerNodePolicy, and AmazonS3FullAccess policies to this IAM role. Replace `<role-name>` with the role name from your ARN. 

   ```
   aws iam attach-role-policy --role-name <role-name> --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
   aws iam attach-role-policy --role-name <role-name> --policy-arn arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy
   aws iam attach-role-policy --role-name <role-name> --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
   ```

1. Amazon EKS clusters use IAM roles to control access to the cluster. The rules are implemented in a config map named `aws-auth`. `eksctl` provides commands to read and edit the `aws-auth` config map. Only the users that have access to the cluster can edit this config map.

   `system:masters` is one of the default user groups with super user permissions to the cluster. Add your user to this group or create a group with more restrictive permissions.

1. Bind the role to your cluster by running the following command. Replace `<IAM-Role-arn>` with the ARN of the IAM role. `<your_username>` can be any unique username.

   ```
   eksctl create iamidentitymapping \
   --cluster <cluster-name> \
   --arn <IAM-Role-arn> \
   --group system:masters \
   --username <your-username> \
   --region <region>
   ```

1. Open a Jupyter notebook on your SageMaker AI instance and run the following command to ensure that it has access to the cluster.

   ```
   aws eks --region <region> update-kubeconfig --name <cluster-name>
   kubectl -n kubeflow get all | grep pipeline
   ```

# Use SageMaker AI components


In this tutorial, you run a pipeline that uses SageMaker AI Components for Kubeflow Pipelines to train a classification model with K-Means on the MNIST dataset in SageMaker AI. The workflow uses Kubeflow Pipelines as the orchestrator and SageMaker AI to run each step of the workflow. The example was taken from an existing [SageMaker AI example](https://github.com/aws/amazon-sagemaker-examples/blob/8279abfcc78bad091608a4a7135e50a0bd0ec8bb/sagemaker-python-sdk/1P_kmeans_highlevel/kmeans_mnist.ipynb) and modified to work with SageMaker AI Components for Kubeflow Pipelines.

You can define your pipeline in Python using AWS SDK for Python (Boto3) then use the KFP dashboard, KFP CLI, or Boto3 to compile, deploy, and run your workflows. The full code for the MNIST classification pipeline example is available in the [Kubeflow Github repository](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples/mnist-kmeans-sagemaker#mnist-classification-with-kmeans). To use it, clone the Python files to your gateway node.

You can find additional [SageMaker AI Kubeflow Pipelines examples](https://github.com/kubeflow/pipelines/tree/master/samples/contrib/aws-samples) on GitHub. For information on the components used, see the [Kubeflow Pipelines GitHub repository](https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker).

To run the classification pipeline example, create a SageMaker AI IAM execution role granting your training job the permission to access AWS resources, then continue with the steps that correspond to your deployment option.

## Create a SageMaker AI execution role


The `kfp-example-sagemaker-execution-role` IAM role is a runtime role that SageMaker AI jobs assume to access AWS resources. The following commands create an IAM execution role named `kfp-example-sagemaker-execution-role`, attach the AmazonSageMakerFullAccess and AmazonS3FullAccess managed policies to it, and create a trust relationship that allows SageMaker AI to assume the role.

You provide this role as an input parameter when running the pipeline.

Run the following commands to create the role. Note the ARN that is returned in the output.

```
SAGEMAKER_EXECUTION_ROLE_NAME=kfp-example-sagemaker-execution-role

TRUST="{ \"Version\": \"2012-10-17\", \"Statement\": [ { \"Effect\": \"Allow\", \"Principal\": { \"Service\": \"sagemaker.amazonaws.com\" }, \"Action\": \"sts:AssumeRole\" } ] }"
aws iam create-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --assume-role-policy-document "$TRUST"
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonSageMakerFullAccess
aws iam attach-role-policy --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess

aws iam get-role --role-name ${SAGEMAKER_EXECUTION_ROLE_NAME} --output text --query 'Role.Arn'
```
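
The escaped shell string above can be hard to read. The following sketch builds the same trust policy as a plain Python dict and prints it as JSON, which you could pipe into `aws iam create-role` instead of the inline string (the variable names are illustrative):

```
import json

# Trust policy allowing SageMaker AI to assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "sagemaker.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

trust_json = json.dumps(trust_policy, indent=2)
print(trust_json)
```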

## Full Kubeflow on AWS Deployment


Follow the instructions of the [SageMaker Training Pipeline tutorial for MNIST Classification with K-Means](https://awslabs.github.io/kubeflow-manifests/docs/amazon-sagemaker-integration/sagemaker-components-for-kubeflow-pipelines/).

## Standalone Kubeflow Pipelines Deployment


### Prepare datasets


To run the pipelines, you need to upload the data extraction and pre-processing script to an Amazon S3 bucket. This bucket and all resources for this example must be located in the `us-east-1` AWS Region. For information on creating a bucket, see [Creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/gsg/CreatingABucket.html).

From the `mnist-kmeans-sagemaker` folder of the Kubeflow repository you cloned on your gateway node, run the following command to upload the `kmeans_preprocessing.py` file to your Amazon S3 bucket. Change `<bucket-name>` to the name of your Amazon S3 bucket.

```
aws s3 cp mnist-kmeans-sagemaker/kmeans_preprocessing.py s3://<bucket-name>/mnist_kmeans_example/processing_code/kmeans_preprocessing.py
```

### Compile and deploy your pipeline


After defining the pipeline, you must compile it to an intermediate representation before you submit it to the Kubeflow Pipelines service on your cluster. The intermediate representation is a workflow specification in the form of a YAML file compressed into a tar.gz file. You need the KFP SDK to compile your pipeline.

#### Install KFP SDK


Run the following from the command line of your gateway node:

1. Install the KFP SDK following the instructions in the [Kubeflow pipelines documentation](https://www.kubeflow.org/docs/pipelines/sdk/install-sdk/).

1. Verify that the KFP SDK is installed with the following command:

   ```
   pip show kfp
   ```

1. Verify that `dsl-compile` has been installed correctly as follows:

   ```
   which dsl-compile
   ```

#### Compile your pipeline


You have three options to interact with Kubeflow Pipelines: KFP UI, KFP CLI, or the KFP SDK. The following sections illustrate the workflow using the KFP UI and CLI.

Complete the following steps from your gateway node.

1. Modify your Python file with your Amazon S3 bucket name and IAM role ARN.

1. Use the `dsl-compile` command from the command line to compile your pipeline as follows. Replace `<path-to-python-file>` with the path to your pipeline and `<path-to-output>` with the location where you want your tar.gz file to be.

   ```
   dsl-compile --py <path-to-python-file> --output <path-to-output>
   ```

#### Upload and run the pipeline using the KFP CLI


Complete the following steps from the command line of your gateway node. KFP organizes runs of your pipeline as experiments. You can optionally specify an experiment name; if you do not, the run is listed under the **Default** experiment.

1. Upload your pipeline as follows:

   ```
   kfp pipeline upload --pipeline-name <pipeline-name> <path-to-output-tar.gz>
   ```

   Your output should look like the following. Take note of the pipeline `ID`.

   ```
   Pipeline 29c3ff21-49f5-4dfe-94f6-618c0e2420fe has been submitted
   
   Pipeline Details
   ------------------
   ID           29c3ff21-49f5-4dfe-94f6-618c0e2420fe
   Name         sm-pipeline
   Description
   Uploaded at  2020-04-30T20:22:39+00:00
   ...
   ...
   ```

1. Create a run using the following command. The KFP CLI run command currently does not support specifying input parameters while creating the run, so you must update the parameters in your AWS SDK for Python (Boto3) pipeline file before compiling. Replace `<experiment-name>` and `<job-name>` with any names. Replace `<pipeline-id>` with the ID of your submitted pipeline. Replace `<your-role-arn>` with the ARN of the `kfp-example-sagemaker-execution-role` you created. Replace `<your-bucket-name>` with the name of the Amazon S3 bucket you created.

   ```
   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --pipeline-id <pipeline-id> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
   ```

   You can also directly submit a run using the compiled pipeline package created as the output of the `dsl-compile` command.

   ```
   kfp run submit --experiment-name <experiment-name> --run-name <job-name> --package-file <path-to-output> role_arn="<your-role-arn>" bucket_name="<your-bucket-name>"
   ```

   Your output should look like the following:

   ```
   Creating experiment aws.
   Run 95084a2c-f18d-4b77-a9da-eba00bf01e63 is submitted
   +--------------------------------------+--------+----------+---------------------------+
   | run id                               | name   | status   | created at                |
   +======================================+========+==========+===========================+
   | 95084a2c-f18d-4b77-a9da-eba00bf01e63 | sm-job |          | 2020-04-30T20:36:41+00:00 |
   +--------------------------------------+--------+----------+---------------------------+
   ```

1. Navigate to the UI to check the progress of the job.

#### Upload and run the pipeline using the KFP UI


1. On the left panel, choose the **Pipelines** tab. 

1. In the upper-right corner, choose **+ Upload pipeline**. 

1. Enter the pipeline name and description. 

1. Choose **Upload a file** and enter the path to the tar.gz file you created using the CLI or with AWS SDK for Python (Boto3).

1. On the left panel, choose the **Pipelines** tab.

1. Find the pipeline you created.

1. Choose **+ Create run**.

1. Enter your input parameters.

1. Choose **Run**.

### Run predictions


Once your classification pipeline is deployed, you can run classification predictions against the endpoint that was created by the Deploy component. Use the KFP UI to check the output artifacts for `sagemaker-deploy-model-endpoint_name`. Download the .tgz file to extract the endpoint name or check the SageMaker AI console in the region you used.
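
The `sagemaker-deploy-model-endpoint_name` artifact is a .tgz archive whose member file contains the endpoint name. If you prefer to script the extraction instead of unpacking the archive by hand, here is a hedged sketch: it assumes the archive contains a single text file, and the demo builds such an archive in memory so it is self-contained (the endpoint name shown is made up).

```
import io
import tarfile

def first_member_text(tgz_bytes):
    # Return the text content of the first file inside a .tgz archive
    with tarfile.open(fileobj=io.BytesIO(tgz_bytes), mode="r:gz") as tar:
        member = tar.getmembers()[0]
        return tar.extractfile(member).read().decode().strip()

# Self-contained demo: build a .tgz resembling the downloaded artifact
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"Endpoint-20200430202239-ABCD"
    info = tarfile.TarInfo(name="endpoint_name.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

print(first_member_text(buf.getvalue()))  # Endpoint-20200430202239-ABCD
```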

#### Configure permissions to run predictions


If you want to run predictions from your gateway node, skip this section.

**To run predictions from any other machine, assign the `sagemaker:InvokeEndpoint` permission to the IAM role used by that machine.**

1. On your gateway node, run the following to create an IAM policy file:

   ```
   cat <<EoF > ./sagemaker-invoke.json
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "sagemaker:InvokeEndpoint"
               ],
               "Resource": "*"
           }
       ]
   }
   EoF
   ```

1. Attach the policy to the IAM role of the client node.

   Run the following command. Replace `<your-instance-IAM-role>` with the name of the IAM role. Replace `<path-to-sagemaker-invoke-json>` with the path to the policy file you created.

   ```
   aws iam put-role-policy --role-name <your-instance-IAM-role> --policy-name sagemaker-invoke-for-worker --policy-document file://<path-to-sagemaker-invoke-json>
   ```

#### Run predictions


1. From your client machine, create an AWS SDK for Python (Boto3) file named `mnist-predictions.py` with the following content. Replace `<endpoint-name>` with the name of your endpoint. The script loads the MNIST dataset, creates a CSV from a sample digit, sends the CSV to the endpoint for prediction, and prints the result.

   ```
   import boto3
   import gzip
   import io
   import json
   import numpy
   import pickle
   
   ENDPOINT_NAME='<endpoint-name>'
   region = boto3.Session().region_name
   
   # S3 bucket where the original mnist data is downloaded and stored
   downloaded_data_bucket = f"jumpstart-cache-prod-{region}"
   downloaded_data_prefix = "1p-notebooks-datasets/mnist"
   
   # Download the dataset
   s3 = boto3.client("s3")
   s3.download_file(downloaded_data_bucket, f"{downloaded_data_prefix}/mnist.pkl.gz", "mnist.pkl.gz")
   
   # Load the dataset
   with gzip.open('mnist.pkl.gz', 'rb') as f:
       train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
   
   # Simple function to create a csv from our numpy array
   def np2csv(arr):
       csv = io.BytesIO()
       numpy.savetxt(csv, arr, delimiter=',', fmt='%g')
       return csv.getvalue().decode().rstrip()
   
   runtime = boto3.Session(region_name=region).client('sagemaker-runtime')
   
   payload = np2csv(train_set[0][30:31])
   
   response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                      ContentType='text/csv',
                                      Body=payload)
   result = json.loads(response['Body'].read().decode())
   print(result)
   ```

1. Run the AWS SDK for Python (Boto3) file as follows:

   ```
   python mnist-predictions.py
   ```
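
The payload format matters here: the endpoint's `text/csv` handler expects one flattened image per line. The `np2csv` helper in the script produces exactly that; a standalone check of its behavior (assuming NumPy is installed):

```
import io
import numpy

def np2csv(arr):
    # Serialize a 2D array as CSV text with no trailing newline
    csv = io.BytesIO()
    numpy.savetxt(csv, arr, delimiter=',', fmt='%g')
    return csv.getvalue().decode().rstrip()

print(np2csv(numpy.array([[1, 2], [3, 4]])))
# 1,2
# 3,4
```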

### View results and logs


When the pipeline is running, you can choose any component to check execution details, such as its inputs, outputs, and the names of the resources it created.

If the KFP request is successfully processed and a SageMaker AI job is created, the component logs in the KFP UI provide a link to the job in SageMaker AI. The CloudWatch logs for the job are also provided if it is created successfully. 

If you run too many pipeline jobs on the same cluster, you may see an error message that indicates that you do not have enough pods available. To fix this, log in to your gateway node and delete the pods created by the pipelines you are not using:

```
kubectl get pods -n kubeflow
kubectl delete pods -n kubeflow <name-of-pipeline-pod>
```

### Cleanup


When you're finished with your pipeline, you need to clean up your resources.

1. From the KFP dashboard, choose **Terminate** to stop any pipeline runs that do not exit properly.

1. If the **Terminate** option doesn't work, log in to your gateway node and manually terminate all the pods created by your pipeline run as follows: 

   ```
   kubectl get pods -n kubeflow
   kubectl delete pods -n kubeflow <name-of-pipeline-pod>
   ```

1. Sign in to the SageMaker AI console with your AWS account. Manually stop all training, batch transform, and HPO jobs. Delete models, data buckets, and endpoints to avoid incurring additional costs. Terminating the pipeline runs does not stop the jobs in SageMaker AI.