

# Getting started with Amazon EMR on EKS in Amazon SageMaker Unified Studio
<a name="getting-started-with-emr-on-eks"></a>

 Before you begin, you must have a compatible Amazon EKS cluster. If you do not have one, see [Get started with Amazon EKS](https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html) for information about the cost, installation, and management of an Amazon EKS cluster.

 Amazon EMR on EKS and Amazon SageMaker Unified Studio require additional Amazon EKS cluster configuration that grants the minimum access controls and connectivity they need. Review your Amazon EKS cluster configuration and make sure that it meets each of the following requirements:

1.  [ Install and configure the Load Balancer Controller for your Amazon EKS cluster ](https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html) 

1.  [ Enable Amazon EKS cluster access for Amazon EMR on EKS and Amazon SageMaker Unified Studio ](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/enable-eks-cluster-access-for-emr-on-eks-and-sagemaker-unified-studio.html) 

 Amazon EKS clusters in a different account or Amazon VPC network than your Amazon SageMaker Unified Studio domain require further configuration. If either case applies to your cluster, also complete the corresponding requirement:

1.  [ Enable cross-account access for Amazon EMR on EKS using Amazon SageMaker Unified Studio associated domains ](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/enable-cross-account-access-using-associated-domains.html) 

1.  [ Enable cross-network access for Amazon SageMaker Unified Studio using VPC peering connections ](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/enable-cross-network-access-using-vpc-peering.html) 

## Configure project profiles in Amazon SageMaker Unified Studio for Amazon EMR on EKS
<a name="configure-project-profile-for-emr-on-eks"></a>

 For data workers to use Amazon EMR on EKS in Amazon SageMaker Unified Studio, administrators must configure project profiles with Amazon EMR on EKS environment blueprint configurations. 

**Note**  
 Administrators can configure multiple environment blueprint configurations using different Amazon EKS clusters in the same project profile. Data workers can view environment blueprint configurations and select a specific Amazon EKS cluster when creating Amazon EMR on EKS resources in an Amazon SageMaker Unified Studio project.

1.  Navigate to the [Amazon SageMaker Unified Studio management console](https://console.aws.amazon.com/datazone). 

1.  From the navigation bar, select **Domains**. For cross-account Amazon EKS clusters, select **Associated domains**. 

1.  Select the name of the domain you want to configure Amazon EMR on EKS for. 

1.  In the domain management view, navigate to **Project profiles**. 

1.  Search for and select your target project profile. 

1.  In the project profile management view, navigate to the **Blueprint deployment settings** view and select **Blueprint deployment settings**. 

1.  In the **Blueprint** section, select **EmrOnEks** from the dropdown. 

1.  In the **Account and region** section, specify the AWS account and AWS Region that your Amazon EKS cluster runs in.

1.  In the **Blueprint parameters** section, specify the Amazon EKS cluster ARN as the `eksClusterArn` user parameter value. 

1.  At the bottom of the page, select **Add blueprint deployment settings** to create your Amazon EMR on EKS environment blueprint configuration. 
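
The **Account and region** and **Blueprint parameters** steps both depend on the cluster ARN, because the account and Region you select must match the ones encoded in `eksClusterArn`. The following is a minimal Python sketch of that check, assuming the usual ARN layout `arn:<partition>:eks:<region>:<account-id>:cluster/<name>`. The helper name `parse_eks_cluster_arn`, the `EksClusterArn` type, and the example ARN are illustrative and not part of any AWS SDK.

```python
import re
from typing import NamedTuple


class EksClusterArn(NamedTuple):
    """Fields encoded in an Amazon EKS cluster ARN."""
    partition: str
    region: str
    account_id: str
    cluster_name: str


# Assumed layout: arn:<partition>:eks:<region>:<12-digit account>:cluster/<name>
_EKS_CLUSTER_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):eks:(?P<region>[a-z0-9-]+):"
    r"(?P<account_id>\d{12}):cluster/"
    r"(?P<cluster_name>[A-Za-z0-9][A-Za-z0-9_-]*)$"
)


def parse_eks_cluster_arn(arn: str) -> EksClusterArn:
    """Split an EKS cluster ARN into its parts, or raise ValueError."""
    match = _EKS_CLUSTER_ARN.match(arn.strip())
    if match is None:
        raise ValueError(f"not an Amazon EKS cluster ARN: {arn!r}")
    return EksClusterArn(**match.groupdict())


if __name__ == "__main__":
    # Hypothetical ARN used only for illustration.
    parsed = parse_eks_cluster_arn(
        "arn:aws:eks:us-west-2:111122223333:cluster/demo-cluster"
    )
    print("Account:", parsed.account_id)
    print("Region:", parsed.region)
```

Comparing `account_id` and `region` with the values entered in the **Account and region** section can catch a mismatched ARN before the blueprint deployment settings are saved.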

# Enable Amazon EKS cluster access for Amazon EMR on EKS and Amazon SageMaker Unified Studio
<a name="enable-eks-cluster-access-for-emr-on-eks-and-sagemaker-unified-studio"></a>

 Amazon EMR on EKS and Amazon SageMaker Unified Studio require access to the Kubernetes service running on the Amazon EKS cluster. 

## Amazon EKS cluster access for Amazon EMR on EKS
<a name="eks-cluster-access-for-emr-on-eks"></a>

1.  Create a Kubernetes cluster role for Amazon EMR on EKS. 

   ```
   kubectl apply -f - <<EOF
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRole
   metadata:
     name: emr-containers
   rules:
     - apiGroups: [""]
       resources: ["namespaces"]
       verbs: ["get"]
     - apiGroups: [""]
       resources: ["statefulsets", "event", "serviceaccounts", "services", "configmaps", "events", "pods", "pods/log", "pods/exec", "pods/portforward", "pods/secrets"]
       verbs: ["update", "get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
     - apiGroups: [""]
       resources: ["secrets"]
       verbs: ["list", "get", "create", "patch", "delete", "watch"]
     - apiGroups: ["apps"]
       resources: ["statefulsets", "deployments", "configmaps", "events", "persistentvolumeclaims", "pods", "pods/exec", "pods/log", "pods/portforward", "pods/secrets", "serviceaccounts", "services"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "update", "label", "deletecollection"]
     - apiGroups: ["batch", "extensions"]
       resources: ["jobs", "configmaps", "events", "persistentvolumeclaims", "pods", "pods/exec", "pods/log", "pods/portforward", "pods/secrets", "serviceaccounts", "services", "statefulsets"]
       verbs: ["get", "describe", "create", "delete", "watch", "list", "patch", "update", "edit", "deletecollection", "label"]
     - apiGroups: ["extensions", "networking.k8s.io"]
       resources: ["ingresses"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["rbac.authorization.k8s.io"]
       resources: ["clusterroles","clusterrolebindings","roles", "rolebindings"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
     - apiGroups: [""]
       resources: ["persistentvolumeclaims"]
       verbs: ["update", "get", "list", "watch", "describe", "create", "edit", "delete",  "deletecollection", "annotate", "patch", "label"]
   EOF
   ```

1.  Create a Kubernetes cluster role binding for Amazon EMR on EKS. 

   ```
   kubectl apply -f - <<EOF
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: emr-containers
   subjects:
   - kind: User
     name: emr-containers
     apiGroup: rbac.authorization.k8s.io
   - kind: User
     name: EmrContainersUser
     apiGroup: rbac.authorization.k8s.io
   roleRef:
     kind: ClusterRole
     name: emr-containers
     apiGroup: rbac.authorization.k8s.io
   EOF
   ```

1.  Create an Amazon EKS IAM identity mapping that binds the Kubernetes user `emr-containers` to the service-linked IAM role for Amazon EMR on EKS.

   ```
   eksctl create iamidentitymapping \
       --cluster {eks-cluster-name} \
       --arn "arn:aws:iam::{aws-account-id}:role/AWSServiceRoleForAmazonEMRContainers" \
       --username emr-containers
   ```

**Note**  
 `AWSServiceRoleForAmazonEMRContainers` is a service-linked role managed by Amazon EMR on EKS. For more information, see [ Using service-linked roles for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/using-service-linked-roles.html). 
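
When the identity mapping is scripted across several clusters, the placeholders in the `eksctl` command can be filled in programmatically. The sketch below only assembles the argument list shown in the step above. The function names are hypothetical, and the 12-digit account ID check is an added safeguard rather than something `eksctl` requires.

```python
import re
import shlex
from typing import List


def emr_service_linked_role_arn(account_id: str) -> str:
    """Return the role ARN used in the identity mapping step."""
    if not re.fullmatch(r"\d{12}", account_id):
        raise ValueError("AWS account IDs are 12 digits")
    return f"arn:aws:iam::{account_id}:role/AWSServiceRoleForAmazonEMRContainers"


def identity_mapping_command(cluster_name: str, account_id: str) -> List[str]:
    """Build the eksctl argument list; nothing is executed here."""
    return [
        "eksctl", "create", "iamidentitymapping",
        "--cluster", cluster_name,
        "--arn", emr_service_linked_role_arn(account_id),
        "--username", "emr-containers",
    ]


if __name__ == "__main__":
    # Hypothetical cluster name and account ID, for illustration only.
    command = identity_mapping_command("demo-cluster", "111122223333")
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` on a machine where `eksctl` is installed and configured.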

## Amazon EKS cluster access for Amazon SageMaker Unified Studio
<a name="eks-cluster-access-for-sagemaker-unified-studio"></a>

1.  Create a Kubernetes cluster role for Amazon SageMaker Unified Studio. 

   ```
   kubectl apply -f - <<EOF
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRole
   metadata:
     name: sagemaker-provisioning
   rules:
     - apiGroups: [""]
       resources: ["namespaces"]
       verbs: ["create", "delete"]
   EOF
   ```

1.  Create a Kubernetes cluster role binding for Amazon SageMaker Unified Studio. 

   ```
   kubectl apply -f - <<EOF
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: sagemaker-provisioning
   subjects:
   - kind: Group
     name: sagemaker-provisioning
     apiGroup: rbac.authorization.k8s.io
   roleRef:
     kind: ClusterRole
     name: sagemaker-provisioning
     apiGroup: rbac.authorization.k8s.io
   EOF
   ```

1.  Create an Amazon EKS access entry that binds the Kubernetes group `sagemaker-provisioning` to the IAM role designated as the provisioning role for your target domain.

   ```
   aws eks create-access-entry \
       --cluster-name {eks-cluster-name} \
       --region {aws-region-code} \
       --principal-arn {iam-provisioning-role-arn} \
       --kubernetes-groups sagemaker-provisioning \
       --type STANDARD
   ```
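
The access entry can also be created through the AWS SDK instead of the CLI. The following sketch uses the boto3 `create_access_entry` operation of the Amazon EKS client with the same values as the command above. The helper names and the example role ARN are hypothetical, and boto3 is assumed to be installed with credentials that can manage access entries on the cluster.

```python
from typing import Any, Dict


def access_entry_request(cluster_name: str, provisioning_role_arn: str) -> Dict[str, Any]:
    """Build the request that mirrors the CLI flags in the step above."""
    return {
        "clusterName": cluster_name,
        "principalArn": provisioning_role_arn,
        "kubernetesGroups": ["sagemaker-provisioning"],
        "type": "STANDARD",
    }


def create_access_entry(region: str, request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request to Amazon EKS (requires boto3 and AWS credentials)."""
    import boto3  # third-party SDK, imported lazily; assumed to be installed

    eks = boto3.client("eks", region_name=region)
    return eks.create_access_entry(**request)


if __name__ == "__main__":
    # Hypothetical values; printing only, no AWS call is made here.
    print(access_entry_request(
        "demo-cluster",
        "arn:aws:iam::111122223333:role/ExampleProvisioningRole",
    ))
```

Access entries are available only when the cluster's authentication mode includes the EKS API, so it may be worth confirming that setting before calling `create_access_entry`.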

# Enable cross-account access for Amazon EMR on EKS using Amazon SageMaker Unified Studio associated domains
<a name="enable-cross-account-access-using-associated-domains"></a>

 Amazon EMR on EKS virtual clusters require an Amazon EKS cluster residing in the same account. As an administrator, you can use Amazon SageMaker Unified Studio associated domains to bring Amazon EKS clusters from any account and use them with any Amazon SageMaker Unified Studio domain.

 Enabling cross-account access for Amazon EMR on EKS using Amazon SageMaker Unified Studio associated domains requires high-privilege access to both the Amazon EKS cluster account and the Amazon SageMaker Unified Studio domain account.

## Step 1: Submit associated domain request from the Amazon SageMaker Unified Studio domain account
<a name="submit-associated-domain-request-from-the-domain-account"></a>

1.  Navigate to the [Amazon SageMaker Unified Studio management console](https://console.aws.amazon.com/datazone). 

1.  From the navigation bar, select **Domains**. 

1.  Select the name of the domain you want to configure Amazon EMR on EKS for. 

1.  In the domain management view, navigate to **Account associations**. 

1.  Select the **Request association** button. 

1.  In the request domain association view, under accounts, provide the Amazon EKS cluster account. 

1.  Select the **Request association** button to submit.

## Step 2: Accept and configure associated domain in the Amazon EKS cluster account
<a name="accept-and-configure-associated-domain-in-the-cluster-account"></a>

1.  Navigate to the [Amazon SageMaker Unified Studio management console](https://console.aws.amazon.com/datazone). 

1.  Select **Associated domains**. 

1.  Under **Requests**, select the name of the domain you requested domain association for. 

1.  In the domain association request view, select **Accept association**. 

1.  After the domain association succeeds, select the domain name to navigate to the domain management view.

1.  In the domain management view, select **Blueprints**. 

1.  In the Tooling section, select **Enable** and configure the associated Tooling environment. 

1.  In the Blueprints section, select **EmrOnEks**, then enable and configure the associated EmrOnEks environment.

**Note**  
 The IAM role designated as the provisioning role must have access to the Amazon EKS cluster. For instructions, see [ Enable Amazon EKS cluster access for Amazon EMR on EKS and Amazon SageMaker Unified Studio ](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/enable-eks-cluster-access-for-emr-on-eks-and-sagemaker-unified-studio.html).

# Enable cross-network access for Amazon SageMaker Unified Studio using VPC peering connections
<a name="enable-cross-network-access-using-vpc-peering"></a>

**Note**  
 If your Amazon SageMaker Unified Studio domain and your Amazon EKS cluster are configured with the same Amazon VPC, you can skip the steps in this section. 

 Amazon SageMaker Unified Studio requires network connectivity between your Amazon SageMaker Unified Studio domain and your Amazon EKS cluster in order to maintain interactive sessions. See [What is VPC peering?](https://docs.aws.amazon.com/vpc/latest/peering/what-is-vpc-peering.html) and [Update your route tables for a VPC peering connection](https://docs.aws.amazon.com/vpc/latest/peering/vpc-peering-routing.html) for more information regarding cross-network connectivity with Amazon VPC. 
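
VPC peering generally requires that the two VPCs have no matching or overlapping CIDR blocks, so it can help to compare the address ranges before requesting a peering connection. The following sketch uses only the Python standard library. The function name `overlapping_cidrs` and the example CIDR blocks are hypothetical.

```python
import ipaddress
from typing import Iterable, List, Tuple


def overlapping_cidrs(
    domain_cidrs: Iterable[str], cluster_cidrs: Iterable[str]
) -> List[Tuple[str, str]]:
    """Return every (domain CIDR, cluster CIDR) pair whose ranges overlap."""
    cluster_list = list(cluster_cidrs)
    conflicts = []
    for domain_cidr in domain_cidrs:
        domain_net = ipaddress.ip_network(domain_cidr)
        for cluster_cidr in cluster_list:
            cluster_net = ipaddress.ip_network(cluster_cidr)
            # Only compare blocks of the same IP version (IPv4 vs IPv6).
            if domain_net.version != cluster_net.version:
                continue
            if domain_net.overlaps(cluster_net):
                conflicts.append((domain_cidr, cluster_cidr))
    return conflicts


if __name__ == "__main__":
    # Hypothetical CIDR blocks for the domain VPC and the cluster VPC.
    conflicts = overlapping_cidrs(["10.0.0.0/16"], ["10.0.128.0/20", "172.31.0.0/16"])
    if conflicts:
        print("Overlapping CIDR blocks:", conflicts)
    else:
        print("No overlapping CIDR blocks found")
```

An empty result indicates that the listed blocks do not conflict. Route tables and security groups still need to allow traffic between the two VPCs.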

# Configuring monitoring with Spark History Server for Amazon EMR on EKS
<a name="configuring-monitoring-with-spark-history-server-for-emr-on-eks"></a>

 Amazon EMR on EKS requires additional IAM permissions to enable monitoring with Spark History Server. You must attach the following inline IAM role policy to the IAM role created as the project user role. 

**Note**  
 The project user role for an Amazon SageMaker Unified Studio project is named `datazone_usr_role_{project_id}`. 

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SparkHistoryServer",
            "Effect": "Allow",
            "Action": [
                "sagemaker:CreatePresignedDomainUrl"
            ],
            "Resource": "arn:aws:sagemaker:*:*:user-profile/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/AmazonDataZoneProject": "${aws:PrincipalTag/AmazonDataZoneProject}"
                }
            }
        }
    ]
}
```
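
Attaching the inline policy can be automated with the AWS SDK. The sketch below derives the project user role name from the naming pattern in the note above and calls the IAM `put_role_policy` operation through boto3. The function names, the inline policy name `SparkHistoryServer`, and the example project ID are assumptions made for illustration, and boto3 is assumed to be installed with credentials that can modify the role.

```python
import json
from typing import Any, Dict

# Policy document copied from the JSON above.
SPARK_HISTORY_SERVER_POLICY: Dict[str, Any] = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "SparkHistoryServer",
            "Effect": "Allow",
            "Action": ["sagemaker:CreatePresignedDomainUrl"],
            "Resource": "arn:aws:sagemaker:*:*:user-profile/*",
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/AmazonDataZoneProject":
                        "${aws:PrincipalTag/AmazonDataZoneProject}"
                }
            },
        }
    ],
}


def project_user_role_name(project_id: str) -> str:
    """Return the project user role name for a project ID."""
    return f"datazone_usr_role_{project_id}"


def put_inline_policy(project_id: str, policy_name: str, policy: Dict[str, Any]) -> None:
    """Attach the policy inline to the project user role (calls AWS)."""
    import boto3  # third-party SDK, imported lazily; assumed to be installed

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName=project_user_role_name(project_id),
        PolicyName=policy_name,  # hypothetical name, choose your own
        PolicyDocument=json.dumps(policy),
    )


if __name__ == "__main__":
    # Hypothetical project ID; prints the target role without calling AWS.
    print(project_user_role_name("abc123example"))
```

The same helper could attach the other inline policies in this guide by passing a different policy document and policy name.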

# Configuring fine-grained access controls for Amazon EMR on EKS
<a name="configuring-fine-grained-access-controls-for-emr-on-eks"></a>

 Amazon EMR on EKS requires additional IAM permissions to enable fine-grained access controls. You must attach the following inline IAM role policy to the IAM role created as the project user role. 

**Note**  
 The project user role for an Amazon SageMaker Unified Studio project is named `datazone_usr_role_{project_id}`. 

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "FineGrainedAccessControls",
            "Effect": "Allow",
            "Action": [
                "emr-containers:CreateCertificate"
            ],
            "Resource": "*"
        }
    ]
}
```

# Configuring trusted identity propagation for Amazon EMR on EKS
<a name="configuring-trusted-identity-propagation-for-emr-on-eks"></a>

 Amazon EMR on EKS requires additional IAM permissions to enable trusted identity propagation. You must attach the following inline IAM role policy to the IAM role created as the project user role. 

**Note**  
 The project user role for an Amazon SageMaker Unified Studio project is named `datazone_usr_role_{project_id}`. 

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "TrustedIdentityPropagation",
            "Effect": "Allow",
            "Action": [
                "sso-oauth:CreateTokenWithIAM",
                "sso-oauth:IntrospectTokenWithIAM",
                "sso-oauth:RevokeTokenWithIAM"
            ],
            "Resource": "*"
        }
    ]
}
```

# Configuring user background sessions for Amazon EMR on EKS
<a name="configuring-user-background-sessions-for-emr-on-eks"></a>

**Warning**  
 When user background sessions are enabled for Amazon EMR on EKS, Amazon SageMaker Unified Studio does not terminate interactive sessions. An interactive session is terminated only after all of its queries have completed and the compute session has timed out.

 Amazon EMR on EKS requires additional IAM permissions to enable user background sessions. You must attach the following inline IAM role policy to the IAM role created as the project user role. 

**Note**  
 The project user role for an Amazon SageMaker Unified Studio project is named `datazone_usr_role_{project_id}`. 

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "UserBackgroundSessions",
            "Effect": "Allow",
            "Action": [
                "sso:GetApplicationSessionConfiguration"
            ],
            "Resource": "*"
        }
    ]
}
```