

# Running Spark jobs with Amazon EMR on EKS

A *job run* is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS. This topic provides an overview of managing job runs using the AWS CLI, viewing job runs using the Amazon EMR console, and troubleshooting common job run errors.

Note that you can't run IPv6 Spark jobs on Amazon EMR on EKS.

**Note**  
Before you submit a job run with Amazon EMR on EKS, you must complete the steps in [Setting up Amazon EMR on EKS](setting-up.md).

**Topics**
+ [Running Spark jobs with `StartJobRun`](job-runs.md)
+ [Running Spark jobs with the Spark operator](spark-operator.md)
+ [Running Spark jobs with spark-submit](spark-submit.md)
+ [Using Apache Livy with Amazon EMR on EKS](job-runs-apache-livy.md)
+ [Managing Amazon EMR on EKS job runs](emr-eks-jobs-manage.md)
+ [Using job templates](job-templates.md)
+ [Using pod templates](pod-templates.md)
+ [Using job retry policies](jobruns-using-retry-policies.md)
+ [Using Spark event log rotation](emr-eks-log-rotation.md)
+ [Using Spark container log rotation](emr-eks-log-rotation-container.md)
+ [Using vertical autoscaling with Amazon EMR Spark jobs](jobruns-vas.md)

# Running Spark jobs with `StartJobRun`

This section includes detailed setup steps to get your environment ready to run Spark jobs and then provides step-by-step instructions for submitting a job run with specified parameters.

**Topics**
+ [Setting up Amazon EMR on EKS](setting-up.md)
+ [Submit a job run with `StartJobRun`](emr-eks-jobs-submit.md)
+ [Using job submitter classification](emr-eks-job-submitter.md)
+ [Using Amazon EMR container defaults classification](emr-eks-job-submitter-container-defaults.md)

# Setting up Amazon EMR on EKS

Complete the following tasks to get set up for Amazon EMR on EKS. If you've already signed up for Amazon Web Services (AWS) and have been using Amazon EKS, you are almost ready to use Amazon EMR on EKS. Skip any of the tasks that you've already completed.

**Note**  
You can also follow the [Amazon EMR on EKS Workshop](https://emr-on-eks.workshop.aws/amazon-emr-eks-workshop.html) to set up all the necessary resources to run Spark jobs on Amazon EMR on EKS. The workshop also provides automation by using CloudFormation templates to create the resources necessary for you to get started. For other templates and best practices, see our [EMR Containers Best Practices Guide](https://aws.github.io/aws-emr-containers-best-practices/) on GitHub.

1. [Install or update to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)

1. [Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html)

1. [Get started with Amazon EKS – eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html)

1. [Enable cluster access for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-cluster-access.html)

1. [Enable IAM Roles for the EKS cluster](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-enable-IAM-roles.html)

1. [Grant users access to Amazon EMR on EKS](setting-up-iam.md)

1. [Register the Amazon EKS cluster with Amazon EMR](setting-up-registration.md)

# Enable cluster access for Amazon EMR on EKS


The following sections show two ways to enable cluster access. The first uses Amazon EKS cluster access management (CAM); the second walks through the manual steps to enable cluster access.

## Enable cluster access using EKS Access Entry (recommended)

**Note**  
The `aws-auth` ConfigMap is deprecated. The recommended method to manage access to Kubernetes APIs is [Access Entries](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html).

Amazon EMR is integrated with [Amazon EKS cluster access management (CAM)](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html), so you can automate configuration of the necessary AuthN and AuthZ policies to run Amazon EMR Spark jobs in namespaces of Amazon EKS clusters. When you create a virtual cluster from an Amazon EKS cluster namespace, Amazon EMR automatically configures all of the necessary permissions, so you don't need to add any extra steps into your current workflows.

**Note**  
The Amazon EMR integration with Amazon EKS CAM is supported only for new Amazon EMR on EKS virtual clusters. You can't migrate existing virtual clusters to use this integration.

### Prerequisites

+ Make sure that you are running version 2.15.3 or higher of the AWS CLI.
+ Your Amazon EKS cluster must be on version 1.23 or higher.

### Setup


To set up the integration between Amazon EMR and the AccessEntry API operations from Amazon EKS, make sure that you have completed the following items:
+ Make sure that `authenticationMode` of your Amazon EKS cluster is set to `API_AND_CONFIG_MAP`.

  ```
  aws eks describe-cluster --name <eks-cluster-name>
  ```

  If it isn't already, set `authenticationMode` to `API_AND_CONFIG_MAP`.

  ```
  aws eks update-cluster-config \
      --name <eks-cluster-name> \
      --access-config authenticationMode=API_AND_CONFIG_MAP
  ```

  For more information about authentication modes, see [Cluster authentication modes](https://docs.aws.amazon.com/eks/latest/userguide/access-entries.html#authentication-modes).
+ Make sure that the [IAM role](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-iam.html) that you're using to run the `CreateVirtualCluster` and `DeleteVirtualCluster` API operations also has the following permissions:

  ```
  {
    "Effect": "Allow",
    "Action": [
      "eks:CreateAccessEntry"
    ],
    "Resource": "arn:<AWS_PARTITION>:eks:<AWS_REGION>:<AWS_ACCOUNT_ID>:cluster/<EKS_CLUSTER_NAME>"
  }, 
  {
    "Effect": "Allow",
    "Action": [
      "eks:DescribeAccessEntry",
      "eks:DeleteAccessEntry",
      "eks:ListAssociatedAccessPolicies",
      "eks:AssociateAccessPolicy",
      "eks:DisassociateAccessPolicy"
    ],
    "Resource": "arn:<AWS_PARTITION>:eks:<AWS_REGION>:<AWS_ACCOUNT_ID>:access-entry/<EKS_CLUSTER_NAME>/role/<AWS_ACCOUNT_ID>/AWSServiceRoleForAmazonEMRContainers/*"
  }
  ```
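The `authenticationMode` check from the first setup item can be scripted. The following is a minimal sketch, assuming the JSON shape returned by `aws eks describe-cluster` (the field path `cluster.accessConfig.authenticationMode` and the `CONFIG_MAP` default for older clusters are assumptions here):

```python
import json

# Sketch: decide whether the cluster still needs update-cluster-config.
def needs_auth_mode_update(describe_cluster_json: str) -> bool:
    cluster = json.loads(describe_cluster_json)["cluster"]
    # accessConfig may be absent on clusters created before access entries.
    mode = cluster.get("accessConfig", {}).get("authenticationMode", "CONFIG_MAP")
    return mode != "API_AND_CONFIG_MAP"

# Hypothetical, trimmed-down describe-cluster response:
sample = '{"cluster": {"name": "my-cluster", "accessConfig": {"authenticationMode": "CONFIG_MAP"}}}'
print(needs_auth_mode_update(sample))  # True -> run update-cluster-config
```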

### Concepts and terminology


The following terms and concepts are related to Amazon EKS CAM.
+ Virtual cluster (VC) – a logical representation of the namespace created in Amazon EKS. It's a 1:1 link to an Amazon EKS cluster namespace. You can use it to run Amazon EMR workloads on an Amazon EKS cluster within the specified namespace.
+ Namespace – a mechanism to isolate groups of resources within a single EKS cluster.
+ Access policy – permissions that grant access and actions to an IAM role within an EKS cluster.
+ Access entry – an entry created with a role ARN. You can link the access entry to an access policy to assign specific permissions in the Amazon EKS cluster.
+ EKS access entry integrated virtual cluster – a virtual cluster created using [access entry API operations](https://docs.aws.amazon.com/eks/latest/APIReference/API_Operations_Amazon_Elastic_Kubernetes_Service.html) from Amazon EKS.

## Enable cluster access using `aws-auth`

You must allow Amazon EMR on EKS access to a specific namespace in your cluster by taking the following actions: creating a Kubernetes role, binding the role to a Kubernetes user, and mapping the Kubernetes user to the [service-linked role](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/using-service-linked-roles.html). These actions are automated in `eksctl` when you use the IAM identity mapping command with `emr-containers` as the service name:

```
eksctl create iamidentitymapping \
    --cluster my_eks_cluster \
    --namespace kubernetes_namespace \
    --service-name "emr-containers"
```

Replace *my\_eks\_cluster* with the name of your Amazon EKS cluster and replace *kubernetes\_namespace* with the Kubernetes namespace created to run Amazon EMR workloads.

**Important**  
You must use the latest version of eksctl, installed in the earlier step [Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html), to use this functionality.

### Manual steps to enable cluster access for Amazon EMR on EKS


You can also use the following manual steps to enable cluster access for Amazon EMR on EKS.

1. **Create a Kubernetes role in a specific namespace**

------
#### [ Amazon EKS 1.22 - 1.29 ]

   With Amazon EKS 1.22 - 1.29, run the following command to create a Kubernetes role in a specific namespace. This role grants the necessary RBAC permissions to Amazon EMR on EKS.

   ```
   namespace=my-namespace
   cat - <<EOF | kubectl apply -f - --namespace "${namespace}"
   apiVersion: rbac.authorization.k8s.io/v1
   kind: Role
   metadata:
     name: emr-containers
     namespace: ${namespace}
   rules:
     - apiGroups: [""]
       resources: ["namespaces"]
       verbs: ["get"]
     - apiGroups: [""]
       resources: ["serviceaccounts", "services", "configmaps", "events", "pods", "pods/log"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
     - apiGroups: [""]
       resources: ["secrets"]
       verbs: ["create", "patch", "delete", "watch"]
     - apiGroups: ["apps"]
       resources: ["statefulsets", "deployments"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["batch"]
       resources: ["jobs"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["extensions", "networking.k8s.io"]
       resources: ["ingresses"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["rbac.authorization.k8s.io"]
       resources: ["roles", "rolebindings"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
     - apiGroups: [""]
       resources: ["persistentvolumeclaims"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete",  "deletecollection", "annotate", "patch", "label"]
   EOF
   ```

------
#### [ Amazon EKS 1.21 and below ]

   With Amazon EKS 1.21 and below, run the following command to create a Kubernetes role in a specific namespace. This role grants the necessary RBAC permissions to Amazon EMR on EKS.

   ```
   namespace=my-namespace
   cat - <<EOF | kubectl apply -f - --namespace "${namespace}"
   apiVersion: rbac.authorization.k8s.io/v1
   kind: Role
   metadata:
     name: emr-containers
     namespace: ${namespace}
   rules:
     - apiGroups: [""]
       resources: ["namespaces"]
       verbs: ["get"]
     - apiGroups: [""]
       resources: ["serviceaccounts", "services", "configmaps", "events", "pods", "pods/log"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
     - apiGroups: [""]
       resources: ["secrets"]
       verbs: ["create", "patch", "delete", "watch"]
     - apiGroups: ["apps"]
       resources: ["statefulsets", "deployments"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["batch"]
       resources: ["jobs"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["extensions"]
       resources: ["ingresses"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "annotate", "patch", "label"]
     - apiGroups: ["rbac.authorization.k8s.io"]
       resources: ["roles", "rolebindings"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
     - apiGroups: [""]
       resources: ["persistentvolumeclaims"]
       verbs: ["get", "list", "watch", "describe", "create", "edit", "delete", "deletecollection", "annotate", "patch", "label"]
   EOF
   ```

------

1. **Create a Kubernetes role binding scoped to the namespace**

   Run the following command to create a Kubernetes role binding in the given namespace. This role binding grants the permissions defined in the role created in the previous step to a user named `emr-containers`. This user identifies [service-linked roles for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/using-service-linked-roles.html) and thus allows Amazon EMR on EKS to perform actions as defined by the role you created.

   ```
   namespace=my-namespace
   
   cat - <<EOF | kubectl apply -f - --namespace "${namespace}"
   apiVersion: rbac.authorization.k8s.io/v1
   kind: RoleBinding
   metadata:
     name: emr-containers
     namespace: ${namespace}
   subjects:
   - kind: User
     name: emr-containers
     apiGroup: rbac.authorization.k8s.io
   roleRef:
     kind: Role
     name: emr-containers
     apiGroup: rbac.authorization.k8s.io
   EOF
   ```

1. **Update Kubernetes `aws-auth` conﬁguration map**

   You can use one of the following options to map the Amazon EMR on EKS service-linked role with the `emr-containers` user that was bound with the Kubernetes role in the previous step.

   **Option 1: Using `eksctl`**

   Run the following `eksctl` command to map the Amazon EMR on EKS service-linked role with the `emr-containers` user.

   ```
   eksctl create iamidentitymapping \
       --cluster my-cluster-name \
       --arn "arn:aws:iam::my-account-id:role/AWSServiceRoleForAmazonEMRContainers" \
       --username emr-containers
   ```

   **Option 2: Without using eksctl**

   1. Run the following command to open the `aws-auth` configuration map in a text editor.

      ```
      kubectl edit -n kube-system configmap/aws-auth
      ```
      **Note**  
      If you receive an error stating `Error from server (NotFound): configmaps "aws-auth" not found`, see the steps in [Add user roles](https://docs.aws.amazon.com/eks/latest/userguide/add-user-role.html) in the Amazon EKS User Guide to apply the stock ConfigMap.

   1. Add Amazon EMR on EKS service-linked role details to the `mapRoles` section of the `ConfigMap`, under `data`. Add this section if it does not already exist in the file. The updated `mapRoles` section under data looks like the following example.

      ```
      apiVersion: v1
      data:
        mapRoles: |
          - rolearn: arn:aws:iam::<your-account-id>:role/AWSServiceRoleForAmazonEMRContainers
            username: emr-containers
          - ... <other previously existing role entries, if there's any>.
      ```

   1. Save the file and exit your text editor.

## Enable cluster access for Amazon SageMaker Unified Studio

Amazon EMR on EKS and Amazon SageMaker Unified Studio require access to an Amazon EKS cluster. Follow the steps at [Enable EKS cluster access for EMR on EKS and SageMaker Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/enable-eks-cluster-access-for-emr-on-eks-and-sagemaker-unified-studio.html) to provide access.

# Enable IAM Roles for the EKS cluster


The following topics detail options for enabling IAM roles.

**Topics**
+ [Option 1: Enable EKS Pod Identity on the EKS Cluster](setting-up-enable-IAM.md)
+ [Option 2: Enable IAM Roles for Service Accounts (IRSA) on the EKS cluster](setting-up-enable-IAM-service-accounts.md)

# Option 1: Enable EKS Pod Identity on the EKS Cluster


Amazon EKS Pod Identity associations provide the ability to manage credentials for your applications, similar to the way that Amazon EC2 instance profiles provide credentials to Amazon EC2 instances. Amazon EKS Pod Identity provides credentials to your workloads with an additional EKS Auth API and an agent pod that runs on each node.

Amazon EMR on EKS supports EKS Pod Identity starting with the emr-7.3.0 release, for the `StartJobRun` submission model.

For more information on EKS pod identities, refer to [Understand how EKS Pod Identity works](https://docs.aws.amazon.com/eks/latest/userguide/pod-id-how-it-works.html).

## Why EKS Pod Identities?


As part of EMR setup, the job execution role needs to establish trust boundaries between an IAM role and service accounts in a specific namespace (of EMR virtual clusters). With IRSA, this was achieved by updating the trust policy of the EMR job execution role. However, because of the 4,096-character hard limit on IAM trust policy length, a single job execution IAM role could be shared across at most twelve EKS clusters.

With EMR's support for Pod Identity, the trust boundary between IAM roles and service accounts is now managed by Amazon EKS through the Pod Identity association APIs.

**Note**  
The security boundary for EKS Pod Identity is still at the service account level, not at the pod level.

## Pod Identity Considerations


For information on the Pod Identity Limitations, see [EKS Pod Identity considerations](https://docs.aws.amazon.com/eks/latest/userguide/pod-identities.html#pod-id-considerations).

## Prepare EKS Pod Identity in EKS Cluster


### Check if the required permission exists in NodeInstanceRole


The node role `NodeInstanceRole` needs permission for the agent to call the `AssumeRoleForPodIdentity` action in the EKS Auth API. You can use the [AmazonEKSWorkerNodePolicy](https://docs.aws.amazon.com/aws-managed-policy/latest/reference/security-iam-awsmanpol.html#security-iam-awsmanpol-amazoneksworkernodepolicy) managed policy, which is defined in the Amazon EKS User Guide, or use a custom policy.

If your EKS cluster was created with an eksctl version higher than **0.181.0**, the AmazonEKSWorkerNodePolicy, including the required `AssumeRoleForPodIdentity` permission, is attached to the node role automatically. If the permission is not present, manually add the following permission, which allows assuming a role for Pod Identity. The EKS Pod Identity Agent needs it to retrieve credentials for pods.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks-auth:AssumeRoleForPodIdentity"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowEKSAUTHAssumeroleforpodidentity"
    }
  ]
}
```

------

### Create EKS pod identity agent add-on


Use the following command to create the EKS Pod Identity Agent add-on with the latest version, and then verify that the agent pods are running:

```
aws eks create-addon --cluster-name cluster-name --addon-name eks-pod-identity-agent

kubectl get pods -n kube-system | grep 'eks-pod-identity-agent'
```

Use the following steps to create EKS Pod Identity Agent add-on from the Amazon EKS console:

1. Open the Amazon EKS console: [Amazon EKS console](https://console.aws.amazon.com/eks/home#/clusters).

1. In the left navigation pane, select **Clusters**, and then select the name of the cluster that you want to configure the EKS Pod Identity Agent add-on for.

1. Choose the **Add-ons** tab.

1. Choose **Get more add-ons**.

1. Select the check box in the top right of the EKS Pod Identity Agent tile, and then choose **Next**.

1. On the **Configure selected add-ons settings** page, select any version in the **Version** drop-down list.

1. (Optional) Expand **Optional configuration settings** to enter additional configuration. For example, you can provide an alternative container image location and `ImagePullSecrets`. The JSON Schema with accepted keys is shown in **Add-on configuration schema**.

   Enter the configuration keys and values in **Configuration values**.

1. Choose **Next**.

1. Confirm that the agent pods are running on your cluster via the CLI.

   `kubectl get pods -n kube-system | grep 'eks-pod-identity-agent'`

Example output:

```
NAME                              READY   STATUS    RESTARTS      AGE
eks-pod-identity-agent-gmqp7      1/1     Running   1 (24h ago)   24h
eks-pod-identity-agent-prnsh      1/1     Running   1 (24h ago)   24h
```
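As a quick automation of the verification step, the `kubectl` output above can be parsed to confirm that every agent pod is Ready and Running. This sketch assumes the default `kubectl get pods` column layout (NAME READY STATUS RESTARTS AGE):

```python
# Sketch: check, from `kubectl get pods -n kube-system` text output, that
# every eks-pod-identity-agent pod is Ready (n/n) and Running.
def agent_pods_healthy(kubectl_output: str) -> bool:
    rows = [line.split() for line in kubectl_output.strip().splitlines()[1:]]
    agents = [r for r in rows if r and r[0].startswith("eks-pod-identity-agent")]
    if not agents:
        return False
    # r[1] is the READY column ("1/1"); r[2] is the STATUS column.
    return all(r[2] == "Running" and r[1].split("/")[0] == r[1].split("/")[1]
               for r in agents)

sample = """NAME                              READY   STATUS    RESTARTS      AGE
eks-pod-identity-agent-gmqp7      1/1     Running   1 (24h ago)   24h
eks-pod-identity-agent-prnsh      1/1     Running   1 (24h ago)   24h"""
print(agent_pods_healthy(sample))  # True
```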

This sets up a new DaemonSet in the `kube-system` namespace. The Amazon EKS Pod Identity Agent, running on each EKS node, uses the [AssumeRoleForPodIdentity](https://docs.aws.amazon.com/eks/latest/APIReference/API_auth_AssumeRoleForPodIdentity.html) action to retrieve temporary credentials from the EKS Auth API. These credentials are then made available for the AWS SDKs that you run inside your containers.

For more information, see the prerequisites in [Set up the Amazon EKS Pod Identity Agent](https://docs.aws.amazon.com/eks/latest/userguide/pod-id-agent-setup.html).

## Create a Job Execution Role


### Create or update job execution role that allows EKS Pod Identity


To run workloads with Amazon EMR on EKS, you need to create an IAM role. We refer to this role as the job execution role in this documentation. For more information about how to create the IAM role, see [Creating IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) in the IAM User Guide.

Additionally, you must create an IAM policy that specifies the necessary permissions for the job execution role and then attach this policy to the role to enable EKS Pod Identity.

For example, you have the following job execution role. For more information, see [Create a job execution role](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/creating-job-execution-role.html).

```
arn:aws:iam::111122223333:role/PodIdentityJobExecutionRole
```

**Important**  
Amazon EMR on EKS automatically creates Kubernetes Service Accounts, based on your job execution role name. Ensure the role name is not too long, as your job may fail if the combination of `cluster_name`, `pod_name`, and `service_account_name` exceeds the length limit.

**Job Execution Role Configuration** – Ensure that the job execution role is created with the following trust policy statement for EKS Pod Identity. To update an existing job execution role, add the following EKS service principal as an additional statement in the trust policy. This trust statement can coexist with existing IRSA trust policies.

```
cat >trust-relationship.json <<EOF
{
    "Version": "2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "AllowEksAuthToAssumeRoleForPodIdentity",
            "Effect": "Allow",
            "Principal": {
                "Service": "pods.eks.amazonaws.com"
            },
            "Action": [
                "sts:AssumeRole",
                "sts:TagSession"
            ]
        }
    ]
}
EOF
```

**User Permission**: Users require the `iam:PassRole` permission to call the `StartJobRun` API or submit jobs. This permission lets users pass the job execution role to EMR on EKS. Job administrators should have this permission by default.

The following is the permission needed for a user:

```
{
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": "arn:aws:iam::111122223333:role/PodIdentityJobExecutionRole",
    "Condition": {
        "StringEquals": {
            "iam:PassedToService": "pods.eks.amazonaws.com"
        }
    }
}
```

To further restrict user access to specific EKS clusters, add an `iam:AssociatedResourceARN` condition to the IAM policy. It limits the role to being passed only for authorized EKS clusters, strengthening your resource-level security controls.

```
"Condition": {
    "ArnLike": {
        "iam:AssociatedResourceARN": [
            "arn:aws:eks:us-west-2:111122223333:cluster/*"
        ]
    }
}
```

## Set up EKS pod identity associations


### Prerequisite


Make sure that the IAM identity creating the Pod Identity association, such as an EKS admin user, has the `eks:CreatePodIdentityAssociation` and `iam:PassRole` permissions.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "eks:CreatePodIdentityAssociation"
      ],
      "Resource": [
        "arn:aws:eks:*:*:cluster/*"
      ],
      "Sid": "AllowEKSCreatepodidentityassociation"
    },
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/*"
      ],
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "pods.eks.amazonaws.com"
        }
      },
      "Sid": "AllowIAMPassrole"
    }
  ]
}
```

------

### Create Associations for the role and EMR service account


------
#### [ Create EMR role associations through the AWS CLI ]

When you submit a job to a Kubernetes namespace, an administrator must create associations between the job execution role and the identity of the EMR managed service account. Note that the EMR managed service account is automatically created at job submission, scoped to the namespace where the job is submitted.

With the AWS CLI (version 2.24.0 or higher), run the following command to create role associations with Pod Identity:

```
aws emr-containers create-role-associations \
        --cluster-name mycluster \
        --namespace mynamespace \
        --role-name JobExecutionRoleIRSAv2
```

Note:
+ Each cluster has a limit of 1,000 associations. Each job execution role-to-namespace mapping requires three associations, for the job submitter, driver, and executor pods.
+ You can only associate roles that are in the same AWS account as the cluster. You can delegate access from another account to the role in this account that you configure for EKS Pod Identities to use. For a tutorial about delegating access and `AssumeRole`, see [IAM tutorial: Delegate access across AWS accounts using IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_cross-account-with-roles.html).
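Since each role-to-namespace mapping consumes three associations, the 1,000-association limit works out to roughly this many mappings per cluster:

```python
# Each job execution role-to-namespace mapping consumes three associations
# (job submitter, driver, and executor pods), so the 1,000-association
# per-cluster limit allows at most this many mappings:
ASSOCIATIONS_PER_MAPPING = 3
ASSOCIATION_LIMIT = 1000
max_mappings = ASSOCIATION_LIMIT // ASSOCIATIONS_PER_MAPPING
print(max_mappings)  # 333
```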

------
#### [ Create EMR role associations through Amazon EKS ]

Amazon EMR creates service accounts with a specific naming pattern when a job is submitted. To make manual associations or integrate this workflow with the AWS SDK, follow these steps:

Construct the service account name:

```
emr-containers-sa-spark-%(SPARK_ROLE)s-%(AWS_ACCOUNT_ID)s-%(BASE36_ENCODED_ROLE_NAME)s
```

The following example creates role associations for a sample job execution role, `JobExecutionRoleIRSAv2`.

**Example Role Associations:**

```
RoleName: JobExecutionRoleIRSAv2
Base36EncodingOfRoleName: 2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe
```

**Sample CLI command:**

```
# setup for the client service account (used by job runner pod)
# emr-containers-sa-spark-client-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe
aws eks create-pod-identity-association --cluster-name mycluster --role-arn arn:aws:iam::111122223333:role/JobExecutionRoleIRSAv2 --namespace mynamespace --service-account emr-containers-sa-spark-client-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe

# driver service account
# emr-containers-sa-spark-driver-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe        
aws eks create-pod-identity-association --cluster-name mycluster --role-arn arn:aws:iam::111122223333:role/JobExecutionRoleIRSAv2 --namespace mynamespace --service-account emr-containers-sa-spark-driver-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe

# executor service account
# emr-containers-sa-spark-executor-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe
aws eks create-pod-identity-association --cluster-name mycluster --role-arn arn:aws:iam::111122223333:role/JobExecutionRoleIRSAv2 --namespace mynamespace --service-account emr-containers-sa-spark-executor-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe
```
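When scripting these calls, the three service account names can be generated from the naming pattern above. This sketch takes the base36-encoded role name as an input (Amazon EMR derives it from the job execution role name at job submission):

```python
# Sketch: build the three EMR-managed service account names (client/job
# submitter, driver, executor) from the documented naming pattern.
def emr_service_accounts(account_id: str, base36_role_name: str) -> dict:
    return {
        spark_role: f"emr-containers-sa-spark-{spark_role}-{account_id}-{base36_role_name}"
        for spark_role in ("client", "driver", "executor")
    }

names = emr_service_accounts("111122223333", "2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe")
print(names["driver"])
# emr-containers-sa-spark-driver-111122223333-2eum5fah1jc1kwyjc19ikdhdkdegh1n26vbe
```

Each generated name can then be passed as the `--service-account` value of an `aws eks create-pod-identity-association` call, as shown above.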

------

Once you have completed all the steps required for EKS Pod Identity, you can skip the following IRSA setup steps:
+ [Enable IAM Roles for Service Accounts (IRSA) on the EKS cluster](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-enable-IAM.html)
+ [Create a job execution role](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/creating-job-execution-role.html)
+ [Update the trust policy of the job execution role](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-trust-policy.html)

You can skip directly to the following step: [Grant users access to Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-iam.html).

## Delete Role Associations


Whenever you delete a virtual cluster or a job execution role and no longer want to give EMR access to its service accounts, delete the role's associations. EKS allows associations with nonexistent resources (namespaces and service accounts), so Amazon EMR on EKS recommends deleting the associations whenever the namespace is deleted or the role is no longer in use, to free up space for other associations.

**Note**  
Lingering associations can impact your ability to scale, because EKS limits the number of associations you can create (limit: 1,000 associations per cluster). You can list the Pod Identity associations in a given namespace to check whether any lingering associations need to be cleaned up:

```
aws eks list-pod-identity-associations --cluster-name mycluster --namespace mynamespace
```
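To find only the EMR-managed entries in that listing, you can filter on the service account prefix. The response shape below (an `associations` list whose entries carry a `serviceAccount` field) is an assumption based on the EKS `ListPodIdentityAssociations` API:

```python
import json

# Sketch: pick out EMR-managed associations from the JSON output of
# `aws eks list-pod-identity-associations`.
def emr_associations(list_output_json: str) -> list:
    data = json.loads(list_output_json)
    return [a for a in data.get("associations", [])
            if a.get("serviceAccount", "").startswith("emr-containers-sa-spark-")]

# Hypothetical, trimmed-down response:
sample = json.dumps({"associations": [
    {"serviceAccount": "emr-containers-sa-spark-driver-111122223333-abc",
     "namespace": "mynamespace"},
    {"serviceAccount": "my-app-sa", "namespace": "mynamespace"},
]})
print(len(emr_associations(sample)))  # 1
```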

With the AWS CLI (version 2.24.0 or higher), run the following emr-containers command to delete EMR’s role associations:

```
aws emr-containers delete-role-associations \
        --cluster-name mycluster \
        --namespace mynamespace \
        --role-name JobExecutionRoleIRSAv2
```

## Automatically Migrate Existing IRSA to Pod Identity


You can use eksctl to migrate existing IAM Roles for Service Accounts (IRSA) to Pod Identity associations:

```
eksctl utils migrate-to-pod-identity \
    --cluster mycluster \
    --remove-oidc-provider-trust-relationship \
    --approve
```

Running the command without the `--approve` flag will only output a plan reflecting the migration steps, and no actual migration will occur.

## Troubleshooting


### My job failed with a `NoClassDefFoundError` or `ClassNotFoundException` for the credentials provider, or failed to get the credentials provider.


EKS Pod Identity uses the container credentials provider to retrieve the necessary credentials. If you have specified a custom credentials provider, ensure that it is working correctly. Alternatively, make sure that you are using an AWS SDK version that supports EKS Pod Identity. For more information, refer to [Get started with Amazon EKS](https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html).

### Job failed with the "Failed to Retrieve Credentials Due to [x] Size Limit" error shown in the eks-pod-identity-agent log.


EMR on EKS creates Kubernetes service accounts based on the job execution role name. If the role name is too long, EKS Auth fails to retrieve credentials because the combination of `cluster_name`, `pod_name`, and `service_account_name` exceeds the length limit. Identify which component is taking up the most space and adjust its size accordingly.

### Job failed with "Failed to Retrieve Credentials xxx" error shown in the eks-pod-identity-agent log.


One possible cause of this issue is that the EKS cluster is configured in private subnets without AWS PrivateLink correctly configured for the cluster. Check whether your cluster is in a private network, and configure AWS PrivateLink to address the issue. For detailed instructions, refer to [Get started with Amazon EKS](https://docs.aws.amazon.com/eks/latest/userguide/getting-started.html).

# Option 2: Enable IAM Roles for Service Accounts (IRSA) on the EKS cluster


The IAM roles for service accounts feature is available on Amazon EKS versions 1.14 and later, and for EKS clusters that were updated to version 1.13 or later on or after September 3rd, 2019. To use this feature, you can update existing EKS clusters to version 1.14 or later. For more information, see [Updating an Amazon EKS cluster Kubernetes version](https://docs.aws.amazon.com/eks/latest/userguide/update-cluster.html).

If your cluster supports IAM roles for service accounts, it has an [OpenID Connect](https://openid.net/connect/) issuer URL associated with it. You can view this URL in the Amazon EKS console, or you can use the following AWS CLI command to retrieve it.

**Important**  
You must use the latest version of the AWS CLI to receive the proper output from this command.

```
aws eks describe-cluster --name cluster_name --query "cluster.identity.oidc.issuer" --output text
```

The expected output is as follows.

```
https://oidc.eks.<region-code>.amazonaws.com/id/EXAMPLED539D4633E53DE1B716D3041E
```

To use IAM roles for service accounts in your cluster, you must create an OIDC identity provider using either [eksctl](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html#create-oidc-eksctl) or the [AWS Management Console](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html#create-oidc-console).

## To create an IAM OIDC identity provider for your cluster with `eksctl`


Check your `eksctl` version with the following command. This procedure assumes that you have installed `eksctl` and that your `eksctl` version is 0.32.0 or later.

```
eksctl version
```

For more information about installing or upgrading eksctl, see [Installing or upgrading eksctl](https://docs.aws.amazon.com/eks/latest/userguide/eksctl.html#installing-eksctl).

Create your OIDC identity provider for your cluster with the following command. Replace *cluster_name* with your own value.

```
eksctl utils associate-iam-oidc-provider --cluster cluster_name --approve
```

## To create an IAM OIDC identity provider for your cluster with the AWS Management Console


Retrieve the OIDC issuer URL from the Amazon EKS console description of your cluster, or use the following AWS CLI command.

```
aws eks describe-cluster --name <cluster_name> --query "cluster.identity.oidc.issuer" --output text
```

Then use the following steps to create the IAM OIDC identity provider for your cluster in the IAM console.

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation panel, choose **Identity Providers**, and then choose **Create Provider**.

   1. For **Provider Type**, choose **Choose a provider type**, and then choose **OpenID Connect**.

   1. For **Provider URL**, paste the OIDC issuer URL for your cluster.

   1. For **Audience**, type `sts.amazonaws.com`, and then choose **Next Step**.

1. Verify that the provider information is correct, and then choose **Create** to create your identity provider.

# Create a job execution role


To run workloads on Amazon EMR on EKS, you need to create an IAM role. We refer to this role as the *job execution role* in this documentation. For more information about how to create IAM roles, see [Creating IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create.html) in the *IAM User Guide*.

You must also create an IAM policy that specifies the permissions for the job execution role and then attach the IAM policy to the job execution role. 

The following policy for the job execution role allows access to Amazon S3 and CloudWatch. These permissions are necessary to monitor jobs and access logs. To follow the same process using the AWS CLI:

Create the IAM role for job execution. This is the role that EMR jobs assume when they run on EKS.

```
cat <<EoF > ~/environment/emr-trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "elasticmapreduce.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
EoF

aws iam create-role --role-name EMRContainers-JobExecutionRole --assume-role-policy-document file://~/environment/emr-trust-policy.json
```

Next, attach the required IAM policies to the role so that it can write logs to Amazon S3 and CloudWatch.

```
cat <<EoF > ~/environment/EMRContainers-JobExecutionRole.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket",
        "arn:aws:s3:::amzn-s3-demo-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogStream",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "arn:aws:logs:*:*:*"
      ]
    }
  ]
}
EoF

aws iam put-role-policy --role-name EMRContainers-JobExecutionRole --policy-name EMR-Containers-Job-Execution --policy-document file://~/environment/EMRContainers-JobExecutionRole.json
```

**Note**  
Scope access appropriately to your own resources; don't grant the job execution role access to all S3 objects.

------
#### [ JSON ]


```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket"
      ],
      "Sid": "AllowS3Putobject"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:PutLogEvents",
        "logs:CreateLogStream",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "arn:aws:logs:*:*:*"
      ],
      "Sid": "AllowLOGSPutlogevents"
    }
  ]
}
```

------

For more information, see [Using job execution roles](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/iam-execution-role.html), [Configure a job run to use S3 logs](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-jobs-CLI.html#emr-eks-jobs-s3), and [Configure a job run to use CloudWatch Logs](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-jobs-CLI.html#emr-eks-jobs-cloudwatch).

# Update the trust policy of the job execution role


When you use IAM Roles for Service Accounts (IRSA) to run jobs on a Kubernetes namespace, an administrator must create a trust relationship between the job execution role and the identity of the EMR managed service account. The trust relationship can be created by updating the trust policy of the job execution role. Note that the EMR managed service account is automatically created at job submission, scoped to the namespace where the job is submitted.

Run the following command to update the trust policy.

```
aws emr-containers update-role-trust-policy \
    --cluster-name cluster \
    --namespace namespace \
    --role-name iam_role_name_for_job_execution
```

For more information, see [Using job execution roles with Amazon EMR on EKS](iam-execution-role.md).

**Important**  
The operator running the above command must have these permissions: `eks:DescribeCluster`, `iam:GetRole`, `iam:UpdateAssumeRolePolicy`.

# Grant users access to Amazon EMR on EKS


For any actions that you perform on Amazon EMR on EKS, you need a corresponding IAM permission for that action. You must create an IAM policy that allows you to perform the Amazon EMR on EKS actions and attach the policy to the IAM user or role that you use. 

This topic provides steps for creating a new policy and attaching it to a user. It also covers the basic permissions that you need to set up your Amazon EMR on EKS environment. We recommend that you refine the permissions to specific resources whenever possible based on your business needs.

## Creating a new IAM policy and attaching it to a user in the IAM console


**Create a new IAM policy**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the left navigation pane of the IAM console, choose **Policies**.

1. On the **Policies** page, choose **Create Policy**.

1. In the **Create Policy** window, navigate to the **Edit JSON** tab. Create a policy document with one or more JSON statements as shown in the examples following this procedure. Next, choose **Review policy**.

1. On the **Review Policy** screen, enter your **Policy Name**, for example `AmazonEMROnEKSPolicy`. Enter an optional description, and then choose **Create policy**. 

**Attach the policy to a user or role**

1. Sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Policies**.

1. In the list of policies, select the check box next to the policy created in the previous section. You can use the **Filter** menu and the search box to filter the list of policies. 

1. Choose **Policy actions**, and then choose **Attach**.

1. Choose the user or role to attach the policy to. You can use the **Filter** menu and the search box to filter the list of principal entities. After choosing the user or role to attach the policy to, choose **Attach policy**.

## Permissions for managing virtual clusters


To manage virtual clusters in your AWS account, create an IAM policy with the following permissions. These permissions allow you to create, list, describe, and delete virtual clusters in your AWS account.

------
#### [ JSON ]


```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateServiceLinkedRole"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringLike": {
          "iam:AWSServiceName": "emr-containers.amazonaws.com"
        }
      },
      "Sid": "AllowIAMCreateservicelinkedrole"
    },
    {
      "Effect": "Allow",
      "Action": [
        "emr-containers:CreateVirtualCluster",
        "emr-containers:ListVirtualClusters",
        "emr-containers:DescribeVirtualCluster",
        "emr-containers:DeleteVirtualCluster"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowEMRCONTAINERSCreatevirtualcluster"
    }
  ]
}
```

------

Amazon EMR is integrated with Amazon EKS cluster access management (CAM), so you can automate configuration of the necessary AuthN and AuthZ policies to run Amazon EMR Spark jobs in namespaces of Amazon EKS clusters. To do so, you must have the following permissions:

```
{
  "Effect": "Allow",
  "Action": [
    "eks:CreateAccessEntry"
  ],
  "Resource": "arn:<AWS_PARTITION>:eks:<AWS_REGION>:<AWS_ACCOUNT_ID>:cluster/<EKS_CLUSTER_NAME>"
}, 
{
  "Effect": "Allow",
  "Action": [
    "eks:DescribeAccessEntry",
    "eks:DeleteAccessEntry",
    "eks:ListAssociatedAccessPolicies",
    "eks:AssociateAccessPolicy",
    "eks:DisassociateAccessPolicy"
  ],
  "Resource": "arn:<AWS_PARTITION>:eks:<AWS_REGION>:<AWS_ACCOUNT_ID>:access-entry/<EKS_CLUSTER_NAME>/role/<AWS_ACCOUNT_ID>/AWSServiceRoleForAmazonEMRContainers/*"
}
```

For more information, see [ Automate enabling cluster access for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/setting-up-cluster-access.html#setting-up-cluster-access-cam-integration).

When the `CreateVirtualCluster` operation is invoked for the first time from an AWS account, you also need the `CreateServiceLinkedRole` permissions to create the service-linked role for Amazon EMR on EKS. For more information, see [Using service-linked roles for Amazon EMR on EKS](using-service-linked-roles.md). 

## Permissions for submitting jobs


To submit jobs on the virtual clusters in your AWS account, create an IAM policy with the following permissions. These permissions allow you to start, list, describe, and cancel job runs for all virtual clusters in your account. You should also consider adding permissions to list or describe virtual clusters, which allow you to check the state of a virtual cluster before submitting jobs.

------
#### [ JSON ]


```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "emr-containers:StartJobRun",
        "emr-containers:ListJobRuns",
        "emr-containers:DescribeJobRun",
        "emr-containers:CancelJobRun"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowEMRCONTAINERSStartjobrun"
    }
  ]
}
```

------

## Permissions for debugging and monitoring


To get access to logs pushed to Amazon S3 and CloudWatch, or to view application event logs in the Amazon EMR console, create an IAM policy with the following permissions. We recommend that you refine the permissions to specific resources whenever possible based on your business needs.

**Important**  
If you haven't created an Amazon S3 bucket, you need to add `s3:CreateBucket` permission to the policy statement. If you haven't created a log group, you need to add `logs:CreateLogGroup` to the policy statement.
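For example, a statement granting those additional create permissions might look like the following sketch. The bucket name and the wildcard log resource are placeholders; narrow them to your own bucket and log groups.

```
{
  "Effect": "Allow",
  "Action": [
    "s3:CreateBucket",
    "logs:CreateLogGroup"
  ],
  "Resource": [
    "arn:aws:s3:::amzn-s3-demo-bucket",
    "arn:aws:logs:*:*:*"
  ]
}
```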

------
#### [ JSON ]


```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "emr-containers:DescribeJobRun",
        "elasticmapreduce:CreatePersistentAppUI",
        "elasticmapreduce:DescribePersistentAppUI",
        "elasticmapreduce:GetPersistentAppUIPresignedURL"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowEMRCONTAINERSDescribejobrun"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowS3Getobject"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:Get*",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowLOGSGet"
    }
  ]
}
```

------

For more information about how to configure a job run to push logs to Amazon S3 and CloudWatch, see [Configure a job run to use S3 logs](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-jobs-CLI.html#emr-eks-jobs-s3) and [Configure a job run to use CloudWatch Logs](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/emr-eks-jobs-CLI.html#emr-eks-jobs-cloudwatch).

# Register the Amazon EKS cluster with Amazon EMR


Registering your cluster is the final required step to set up Amazon EMR on EKS to run workloads.

Use the following command to create a virtual cluster with a name of your choice for the Amazon EKS cluster and namespace that you set up in previous steps.

**Note**  
Each virtual cluster must have a unique name across all EKS clusters. If two virtual clusters have the same name, the deployment process fails, even if the two virtual clusters belong to different EKS clusters.

```
aws emr-containers create-virtual-cluster \
--name virtual_cluster_name \
--container-provider '{
    "id": "cluster_name",
    "type": "EKS",
    "info": {
        "eksInfo": {
            "namespace": "namespace_name"
        }
    }
}'
```

Alternatively, you can create a JSON file that includes the required parameters for the virtual cluster and then run the `create-virtual-cluster` command with the path to the JSON file. For more information, see [Managing virtual clusters](virtual-cluster.md).

**Note**  
To validate the successful creation of a virtual cluster, view the status of virtual clusters using the `list-virtual-clusters` operation or by going to the **Virtual Clusters** page in the Amazon EMR console. 
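If you validate from a script rather than the console, you can check the `state` field in the `list-virtual-clusters` response. The following sketch parses a sample response; the IDs and names are hypothetical.

```
import json

# Sample response from `aws emr-containers list-virtual-clusters`;
# the IDs and names are hypothetical.
response = json.loads("""
{
  "virtualClusters": [
    {"id": "abc123", "name": "my-virtual-cluster", "state": "RUNNING"},
    {"id": "def456", "name": "old-virtual-cluster", "state": "TERMINATED"}
  ]
}
""")

# A virtual cluster is ready to accept jobs once it is RUNNING.
running = {vc["name"]: vc["id"] for vc in response["virtualClusters"]
           if vc["state"] == "RUNNING"}
print(running)
```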

# Submit a job run with `StartJobRun`
Submit a job run with `StartJobRun`

**To submit a job run with a JSON file with specified parameters**

1. Create a `start-job-run-request.json` file and specify the required parameters for your job run, as the following example JSON file demonstrates. For more information about the parameters, see [Options for configuring a job run](emr-eks-jobs-CLI.md#emr-eks-jobs-parameters).

   ```
   {
     "name": "myjob", 
     "virtualClusterId": "123456",  
     "executionRoleArn": "iam_role_name_for_job_execution", 
     "releaseLabel": "emr-6.2.0-latest", 
     "jobDriver": {
       "sparkSubmitJobDriver": {
         "entryPoint": "entryPoint_location",
         "entryPointArguments": ["argument1", "argument2", ...],  
          "sparkSubmitParameters": "--class <main_class> --conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1"
       }
     }, 
     "configurationOverrides": {
       "applicationConfiguration": [
         {
           "classification": "spark-defaults", 
           "properties": {
             "spark.driver.memory":"2G"
            }
         }
       ], 
       "monitoringConfiguration": {
         "persistentAppUI": "ENABLED", 
         "cloudWatchMonitoringConfiguration": {
           "logGroupName": "my_log_group", 
           "logStreamNamePrefix": "log_stream_prefix"
         }, 
         "s3MonitoringConfiguration": {
           "logUri": "s3://my_s3_log_location"
         }
       }
     }
   }
   ```

1. Use the `start-job-run` command with a path to the `start-job-run-request.json` file stored locally.

   ```
   aws emr-containers start-job-run \
   --cli-input-json file://./start-job-run-request.json
   ```

**To start a job run using the `start-job-run` command**

1. Supply all the specified parameters in the `StartJobRun` command, as the following example demonstrates.

   ```
   aws emr-containers start-job-run \
   --virtual-cluster-id 123456 \
   --name myjob \
   --execution-role-arn execution-role-arn \
   --release-label emr-6.2.0-latest \
   --job-driver '{"sparkSubmitJobDriver": {"entryPoint": "entryPoint_location", "entryPointArguments": ["argument1", "argument2", ...], "sparkSubmitParameters": "--class <main_class> --conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1"}}' \
   --configuration-overrides '{"applicationConfiguration": [{"classification": "spark-defaults", "properties": {"spark.driver.memory": "2G"}}], "monitoringConfiguration": {"cloudWatchMonitoringConfiguration": {"logGroupName": "log_group_name", "logStreamNamePrefix": "log_stream_prefix"}, "persistentAppUI":"ENABLED",  "s3MonitoringConfiguration": {"logUri": "s3://my_s3_log_location" }}}'
   ```

1. For Spark SQL, supply all the specified parameters in the `StartJobRun` command, as the following example demonstrates.

   ```
   aws emr-containers start-job-run \
   --virtual-cluster-id 123456 \
   --name myjob \
   --execution-role-arn execution-role-arn \
   --release-label emr-6.7.0-latest \
   --job-driver '{"sparkSqlJobDriver": {"entryPoint": "entryPoint_location", "sparkSqlParameters": "--conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1"}}' \
   --configuration-overrides '{"applicationConfiguration": [{"classification": "spark-defaults", "properties": {"spark.driver.memory": "2G"}}], "monitoringConfiguration": {"cloudWatchMonitoringConfiguration": {"logGroupName": "log_group_name", "logStreamNamePrefix": "log_stream_prefix"}, "persistentAppUI":"ENABLED",  "s3MonitoringConfiguration": {"logUri": "s3://my_s3_log_location" }}}'
   ```

# Using job submitter classification
Using job submitter classification

## Overview


The Amazon EMR on EKS `StartJobRun` request creates a *job submitter* pod (also known as the *job-runner* pod) to spawn the Spark driver. You can use the `emr-job-submitter` classification to configure node selectors, add tolerations, customize logging, and make other modifications to the job submitter pod.

The following settings are available under the `emr-job-submitter` classification:

** `jobsubmitter.node.selector.[selectorKey]` **  
Adds to the node selector of the job submitter pod, with key *selectorKey* and the value as the configuration value. For example, you can set `jobsubmitter.node.selector.identifier` to `myIdentifier`, and the job submitter pod will have a node selector with key `identifier` and value `myIdentifier`. You can use this to specify which nodes the job submitter pod can be placed on. To add multiple node selector keys, set multiple configurations with this prefix.

** `jobsubmitter.label.[labelKey]` **  
Adds to the labels of the job submitter pod, with key *labelKey* and the value as the configuration value. To add multiple labels, set multiple configurations with this prefix.

** `jobsubmitter.annotation.[annotationKey]` **  
Adds to the annotations of the job submitter pod, with key *annotationKey* and the value as the configuration value. To add multiple annotations, set multiple configurations with this prefix.

** `jobsubmitter.node.toleration.[tolerationKey]` **  
Adds [tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/) to the job submitter pod. By default, no tolerations are added to the pod. The toleration's key will be *tolerationKey* and the toleration's value will be the configuration value. If the configuration value is set to a non-empty string, the operator will be `Equal`. If the configuration value is set to `""`, then the operator will be `Exists`.

** `jobsubmitter.node.toleration.[tolerationKey].[effect]` **  
Adds a toleration effect to the prefixed *tolerationKey*. This field is required when adding tolerations. The allowed values for the effect field are `NoExecute`, `NoSchedule`, and `PreferNoSchedule`.

** `jobsubmitter.node.toleration.[tolerationKey].[tolerationSeconds]` **  
Adds tolerationSeconds to the prefixed *tolerationKey*. Optional field. Only applicable when the effect is `NoExecute`.

** `jobsubmitter.scheduler.name` **  
Sets a custom schedulerName for the job submitter pod.

** `jobsubmitter.logging` **  
Enables or disables logging on the job submitter pod. When this is set to `DISABLED`, the logging container is removed from the job submitter pod, which disables any logging for this pod specified in the `monitoringConfiguration`, such as `s3MonitoringConfiguration` or `cloudWatchMonitoringConfiguration`. When this setting is not set or is set to any other value, logging on the job submitter pod is enabled.

** `jobsubmitter.logging.image` **  
Sets a custom image to be used for the logging container on the job submitter pod.

** `jobsubmitter.logging.request.cores` **  
Sets a custom value for the number of CPUs, in CPU units, for the logging container on the job submitter pod. By default, this is set to **100m**.

** `jobsubmitter.logging.request.memory` **  
Sets a custom value for the amount of memory, in bytes, for the logging container on the job submitter pod. By default, this is set to **200Mi**. A mebibyte is a unit of measure that's similar to a megabyte.

** `jobsubmitter.container.image` **  
Sets a custom image for the job submitter pod's `job-runner` container.

** `jobsubmitter.container.image.pullPolicy` **  
Sets the [imagePullPolicy](https://kubernetes.io/docs/concepts/containers/images/#image-pull-policy) for the job submitter pod's containers.

We recommend placing job submitter pods on On-Demand Instances. Placing job submitter pods on Spot Instances might result in a job failure if the instance where the job submitter pod runs is subject to a Spot Instance interruption. You can also [place the job submitter pod in a single Availability Zone or use any Kubernetes labels that are applied to the nodes](#emr-eks-job-submitter-ex-ec2).

## Job submitter classification examples
Examples

**Topics**
+ [

### `StartJobRun` request with On-Demand node placement for the job submitter pod
](#emr-eks-job-submitter-ex-od)
+ [

### `StartJobRun` request with single-AZ node placement and Amazon EC2 instance type placement for the job submitter pod
](#emr-eks-job-submitter-ex-ec2)
+ [

### `StartJobRun` request with labels, annotations, and a custom scheduler for the job submitter pod
](#emr-eks-job-submitter-label-annotation-scheduler)
+ [

### `StartJobRun` request with a toleration applied to the job submitter pod with key `dedicated`, value `graviton_machines`, effect `NoExecute`, and a `tolerationSeconds` of 60 seconds
](#emr-eks-job-submitter-tolerations)
+ [

### `StartJobRun` request with logging disabled for the job submitter pod
](#emr-eks-job-submitter-logging-disabled)
+ [

### `StartJobRun` request with custom logging container image, CPU, and memory for the job submitter pod
](#emr-eks-job-submitter-custom)
+ [

### `StartJobRun` request with a custom job submitter container image and pull policy
](#emr-eks-job-submitter-custom-container)

### `StartJobRun` request with On-Demand node placement for the job submitter pod
On-Demand node placement

```
cat >spark-python-in-s3-nodeselector-job-submitter.json << EOF
{
  "name": "spark-python-in-s3-nodeselector", 
  "virtualClusterId": "virtual-cluster-id", 
  "executionRoleArn": "execution-role-arn", 
  "releaseLabel": "emr-6.11.0-latest", 
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://S3-prefix/trip-count.py", 
      "sparkSubmitParameters": "--conf spark.driver.cores=5  --conf spark.executor.memory=20G --conf spark.driver.memory=15G --conf spark.executor.cores=6"
    }
  }, 
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults", 
        "properties": {
          "spark.dynamicAllocation.enabled":"false"
        }
      },
      {
        "classification": "emr-job-submitter",
        "properties": {
          "jobsubmitter.node.selector.eks.amazonaws.com/capacityType": "ON_DEMAND"
        }
      }
    ], 
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs", 
        "logStreamNamePrefix": "demo"
      }, 
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
EOF
aws emr-containers start-job-run --cli-input-json file://spark-python-in-s3-nodeselector-job-submitter.json
```

### `StartJobRun` request with single-AZ node placement and Amazon EC2 instance type placement for the job submitter pod
Single AZ and EC2 instance type placement

```
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.node.selector.topology.kubernetes.io/zone": "Availability Zone",
        "jobsubmitter.node.selector.node.kubernetes.io/instance-type":"m5.4xlarge"
      }
    }
  ]
}
```

### `StartJobRun` request with labels, annotations, and a custom scheduler for the job submitter pod
Labels, annotations, and scheduler

```
"configurationOverrides": { 
  "applicationConfiguration": [ 
    {
      "classification": "emr-job-submitter", 
      "properties": {
        "jobsubmitter.label.label1": "value1",
        "jobsubmitter.label.label2": "value2",
        "jobsubmitter.annotation.ann1": "value1",
        "jobsubmitter.annotation.ann2": "value2",
        "jobsubmitter.scheduler.name": "custom-scheduler"
      }
    }
  ]
}
```

### `StartJobRun` request with a toleration applied to the job submitter pod with key `dedicated`, value `graviton_machines`, effect `NoExecute`, and a `tolerationSeconds` of 60 seconds
Tolerations

```
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.node.toleration.dedicated":"graviton_machines",
        "jobsubmitter.node.toleration.dedicated.effect":"NoExecute",
        "jobsubmitter.node.toleration.dedicated.tolerationSeconds":"60"
      }
    }
  ]
}
```

### `StartJobRun` request with logging disabled for the job submitter pod
Logging disabled

```
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.logging": "DISABLED"
      }
    }
  ], 
  "monitoringConfiguration": {
    "cloudWatchMonitoringConfiguration": {
      "logGroupName": "/emr-containers/jobs", 
      "logStreamNamePrefix": "demo"
    }, 
    "s3MonitoringConfiguration": {
      "logUri": "s3://joblogs"
    }
  }
}
```

### `StartJobRun` request with custom logging container image, CPU, and memory for the job submitter pod
Custom logging container image, CPU, and memory

```
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.logging.image": "YOUR_ECR_IMAGE_URL",
        "jobsubmitter.logging.request.memory": "200Mi",
        "jobsubmitter.logging.request.cores": "0.5"
      }
    }
  ], 
  "monitoringConfiguration": {
    "cloudWatchMonitoringConfiguration": {
      "logGroupName": "/emr-containers/jobs", 
      "logStreamNamePrefix": "demo"
    }, 
    "s3MonitoringConfiguration": {
      "logUri": "s3://joblogs"
    }
  }
}
```

### `StartJobRun` request with a custom job submitter container image and pull policy
Custom container image

```
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-job-submitter",
      "properties": {
        "jobsubmitter.container.image": "123456789012.dkr.ecr.us-west-2.amazonaws.com/emr6.11_custom_repo",
        "jobsubmitter.container.image.pullPolicy": "kubernetes pull policy"
      }
    }
  ]
}
```

# Using Amazon EMR container defaults classification
Using Amazon EMR container defaults classification

## Overview
Overview

The following settings are available under the `emr-containers-defaults` classification:

** `job-start-timeout` **  
By default, a job times out if it cannot start and remains in the `SUBMITTED` state for 15 minutes. This configuration sets the number of seconds to wait before the job times out.

** `executor.logging` **  
Enables or disables logging on the executor pods. When this is set to `DISABLED`, the logging container is removed from the executor pods, which disables any logging for these pods specified in the `monitoringConfiguration`, such as `s3MonitoringConfiguration` or `cloudWatchMonitoringConfiguration`. When this setting is not set or is set to any other value, logging on the executor pods is enabled.

** `logging.image` **  
Sets a custom image to be used for the logging container on the driver and executor pods.

** `logging.request.cores` **  
Sets a custom value for the number of CPUs, in CPU units, for the logging container on the driver and executor pods. By default, this is not set.

** `logging.request.memory` **  
Sets a custom value for the amount of memory for the logging container on the driver and executor pods. By default, this is set to **512Mi** (mebibytes).
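For orientation, these settings live together in a single classification object inside `applicationConfiguration`, as the examples below show. The following Python sketch (property values are illustrative only) assembles such an entry; it also reflects that the examples in this guide pass every property value as a string:

```python
import json

def containers_defaults(props):
    """Build an emr-containers-defaults classification entry.

    The examples in this guide pass every property value as a
    string, so values are coerced with str(); the specific values
    used here are illustrative, not recommendations.
    """
    return {
        "classification": "emr-containers-defaults",
        "properties": {key: str(value) for key, value in props.items()},
    }

entry = containers_defaults({"job-start-timeout": 1800, "executor.logging": "DISABLED"})
print(json.dumps(entry, indent=2))
```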

## Container defaults classification examples
Examples

**Topics**
+ [

### `StartJobRun` request with custom job timeout
](#emr-eks-job-submitter-container-custom-timeout)
+ [

### `StartJobRun` request with logging disabled for executor pods
](#emr-eks-executor-logging-disabled)
+ [

### `StartJobRun` request with custom logging container image, CPU, and memory for the driver and executor pods
](#emr-eks-job-submitter-container-custom-image-cpu)

### `StartJobRun` request with custom job timeout
Custom job timeout

```
{
  "name": "spark-python", 
  "virtualClusterId": "virtual-cluster-id", 
  "executionRoleArn": "execution-role-arn", 
  "releaseLabel": "emr-6.11.0-latest", 
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://S3-prefix/trip-count.py"
    }
  }, 
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "emr-containers-defaults", 
        "properties": {
          "job-start-timeout": "1800"
        }
      }
    ], 
    "monitoringConfiguration": {
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "/emr-containers/jobs", 
        "logStreamNamePrefix": "demo"
      }, 
      "s3MonitoringConfiguration": {
        "logUri": "s3://joblogs"
      }
    }
  }
}
```

### `StartJobRun` request with logging disabled for executor pods
Executor logging disabled

```
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-containers-defaults", 
      "properties": {
        "executor.logging": "DISABLED"
      }
    }
  ], 
  "monitoringConfiguration": {
    "cloudWatchMonitoringConfiguration": {
      "logGroupName": "/emr-containers/jobs", 
      "logStreamNamePrefix": "demo"
    }, 
    "s3MonitoringConfiguration": {
      "logUri": "s3://joblogs"
    }
  }
}
```

### `StartJobRun` request with custom logging container image, CPU, and memory for the driver and executor pods
Custom logging container image, CPU, and memory

```
"configurationOverrides": {
  "applicationConfiguration": [
    {
      "classification": "emr-containers-defaults", 
      "properties": {
        "logging.image": "YOUR_ECR_IMAGE_URL",
        "logging.request.memory": "200Mi",
        "logging.request.cores": "0.5"
      }
    }
  ], 
  "monitoringConfiguration": {
    "cloudWatchMonitoringConfiguration": {
      "logGroupName": "/emr-containers/jobs", 
      "logStreamNamePrefix": "demo"
    }, 
    "s3MonitoringConfiguration": {
      "logUri": "s3://joblogs"
    }
  }
}
```

# Running Spark jobs with the Spark operator
Spark operator

Amazon EMR releases 6.10.0 and higher support the Kubernetes operator for Apache Spark, or *the Spark operator*, as a job submission model for Amazon EMR on EKS. With the Spark operator, you can deploy and manage Spark applications with the Amazon EMR release runtime on your own Amazon EKS clusters. Once you deploy the Spark operator in your Amazon EKS cluster, you can directly submit Spark applications with the operator. The operator manages the lifecycle of Spark applications.

**Note**  
Amazon EMR calculates pricing on Amazon EKS based on vCPU and memory consumption. This calculation applies to driver and executor pods. It begins when you download your Amazon EMR application image, ends when the Amazon EKS pod terminates, and is rounded to the nearest second.

**Topics**
+ [

# Setting up the Spark operator for Amazon EMR on EKS
](spark-operator-setup.md)
+ [

# Getting started with the Spark operator for Amazon EMR on EKS
](spark-operator-gs.md)
+ [

# Use vertical autoscaling with the Spark operator for Amazon EMR on EKS
](spark-operator-vas.md)
+ [

# Uninstalling the Spark operator for Amazon EMR on EKS
](spark-operator-uninstall.md)
+ [

# Using monitoring configuration to monitor the Spark Kubernetes operator and Spark jobs
](spark-operator-monitoring-configuration.md)
+ [

# Security and the Spark operator with Amazon EMR on EKS
](spark-operator-security.md)

# Setting up the Spark operator for Amazon EMR on EKS
Setting up

Complete the following tasks to get set up before you install the Spark operator on Amazon EKS. If you've already signed up for Amazon Web Services (AWS) and have used Amazon EKS, you are almost ready to use Amazon EMR on EKS. If you've already completed any of the following prerequisites, you can skip them and move on to the next one.
+ **[Install or update to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) ** – If you've already installed the AWS CLI, confirm that you have the latest version.
+ **[Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) ** – eksctl is a command line tool that you use to communicate with Amazon EKS.
+ **[Install Helm](https://docs.aws.amazon.com/eks/latest/userguide/helm.html)** – The Helm package manager for Kubernetes helps you install and manage applications on your Kubernetes cluster. 
+ **[Get started with Amazon EKS – eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) ** – Follow the steps to create a new Kubernetes cluster with nodes in Amazon EKS.
+ **[Select an Amazon EMR base image URI](docker-custom-images-tag.md) (release 6.10.0 or higher)** – the Spark operator is supported with Amazon EMR releases 6.10.0 and higher.

# Getting started with the Spark operator for Amazon EMR on EKS
Getting started

This topic helps you start to use the Spark operator on Amazon EKS by deploying a Spark application and a scheduled Spark application.

## Install the Spark operator
Install the Spark operator

Use the following steps to install the Kubernetes operator for Apache Spark.

1. If you haven't already, complete the steps in [Setting up the Spark operator for Amazon EMR on EKS](spark-operator-setup.md).

1. Authenticate your Helm client to the Amazon ECR registry. In the following command, replace the *region-id* values with your preferred AWS Region, and the corresponding *ECR-registry-account* value for the Region from the [Amazon ECR registry accounts by Region](docker-custom-images-tag.md#docker-custom-images-ECR) page.

   ```
   aws ecr get-login-password \
   --region region-id | helm registry login \
   --username AWS \
   --password-stdin ECR-registry-account.dkr.ecr.region-id.amazonaws.com
   ```

1. Install the Spark operator with the following command.

   For the Helm chart `--version` parameter, use your Amazon EMR release label with the `emr-` prefix and date suffix removed. For example, with the `emr-6.12.0-java17-latest` release, specify `6.12.0-java17`. The example in the following command uses the `emr-7.12.0-latest` release, so it specifies `7.12.0` for the Helm chart `--version`.

   ```
   helm install spark-operator-demo \
     oci://895885662937.dkr.ecr.region-id.amazonaws.com/spark-operator \
     --set emrContainers.awsRegion=region-id \
     --version 7.12.0 \
     --namespace spark-operator \
     --create-namespace
   ```

   By default, the command creates service account `emr-containers-sa-spark-operator` for the Spark operator. To use a different service account, provide the argument `serviceAccounts.sparkoperator.name`. For example:

   ```
   --set serviceAccounts.sparkoperator.name=my-service-account-for-spark-operator
   ```

   If you want to [use vertical autoscaling with the Spark operator](spark-operator-vas.md), add the following line to the installation command to allow webhooks for the operator:

   ```
   --set webhook.enable=true
   ```

1. Verify that you installed the Helm chart with the `helm list` command:

   ```
   helm list --namespace spark-operator -o yaml
   ```

   The `helm list` command should return your newly deployed Helm chart release information:

   ```
   app_version: v1beta2-1.3.8-3.1.1
   chart: spark-operator-7.12.0
   name: spark-operator-demo
   namespace: spark-operator
   revision: "1"
   status: deployed
   updated: 2023-03-14 18:20:02.721638196 +0000 UTC
   ```

1. Complete installation with any additional options that you require. For more information, see the [spark-operator-chart README](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/charts/spark-operator-chart/README.md) on GitHub.
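The Helm chart `--version` derivation described in step 3 — strip the `emr-` prefix and the trailing `-latest` marker from the release label — can be sketched as a small, hypothetical helper:

```python
def helm_chart_version(release_label: str) -> str:
    """Derive the Helm chart --version from an Amazon EMR release label
    by removing the leading 'emr-' and the trailing '-latest' marker.
    Only the '-latest' form appears in this guide; other date suffixes
    would need the same kind of trimming."""
    version = release_label
    if version.startswith("emr-"):
        version = version[len("emr-"):]
    if version.endswith("-latest"):
        version = version[:-len("-latest")]
    return version

print(helm_chart_version("emr-6.12.0-java17-latest"))  # 6.12.0-java17
print(helm_chart_version("emr-7.12.0-latest"))         # 7.12.0
```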

## Run a Spark application
Run a Spark application

The Spark operator is supported with Amazon EMR 6.10.0 or higher. When you install the Spark operator, it creates the service account `emr-containers-sa-spark` to run Spark applications by default. Use the following steps to run a Spark application with the Spark operator on Amazon EMR on EKS 6.10.0 or higher.

1. Before you can run a Spark application with the Spark operator, complete the steps in [Setting up the Spark operator for Amazon EMR on EKS](spark-operator-setup.md) and [Install the Spark operator](#spark-operator-install). 

1. Create a `SparkApplication` definition file `spark-pi.yaml` with the following example contents: 

   ```
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: spark-pi
     namespace: spark-operator
   spec:
     type: Scala
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.SparkPi
     mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar"
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     volumes:
       - name: "test-volume"
         hostPath:
           path: "/tmp"
           type: Directory
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: emr-containers-sa-spark
       volumeMounts:
         - name: "test-volume"
           mountPath: "/tmp"
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
       volumeMounts:
         - name: "test-volume"
           mountPath: "/tmp"
   ```

1. Now, submit the Spark application with the following command. This will also create a `SparkApplication` object named `spark-pi`:

   ```
   kubectl apply -f spark-pi.yaml
   ```

1. Check events for the `SparkApplication` object with the following command: 

   ```
   kubectl describe sparkapplication spark-pi --namespace spark-operator
   ```

For more information on submitting applications to Spark through the Spark operator, see [Using a `SparkApplication`](https://www.kubeflow.org/docs/components/spark-operator/user-guide/using-sparkapplication/) in the `spark-on-k8s-operator` documentation on GitHub.

## Use Amazon S3 for storage


To use Amazon S3 as your file storage option, add the following configurations to your YAML file.

```
hadoopConf:
# EMRFS filesystem
  fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
  fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
  fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
  fs.s3.buffer.dir: /mnt/s3
  fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
  mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
  mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
sparkConf:
 # Required for EMR Runtime
 spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
 spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
 spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
 spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
```

If you use Amazon EMR releases 7.2.0 and higher, these configurations are included by default. In that case, you can set the application file path to `s3://<bucket_name>/<file_path>` instead of `local://<file_path>` in the Spark application YAML file.

Then submit the Spark application as normal.
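The release check above (Amazon EMR 7.2.0 and higher include these configurations by default) can be sketched as a hypothetical helper; parsing the `emr-X.Y.Z` label format this way is an assumption made for illustration:

```python
def supports_s3_entrypoint(release_label: str) -> bool:
    """Return True when the release includes the EMRFS configurations
    by default (Amazon EMR 7.2.0 and higher), so the application file
    can use an s3:// path instead of local://."""
    # Pull "7.2.0" out of a label such as "emr-7.2.0-latest".
    core = release_label.removeprefix("emr-").split("-")[0]
    major, minor, *_ = (int(part) for part in core.split("."))
    return (major, minor) >= (7, 2)

print(supports_s3_entrypoint("emr-7.2.0-latest"))   # True
print(supports_s3_entrypoint("emr-6.10.0-latest"))  # False
```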

# Use vertical autoscaling with the Spark operator for Amazon EMR on EKS
Vertical autoscaling

Starting with Amazon EMR 7.0, you can use Amazon EMR on EKS vertical autoscaling to simplify resource management. It automatically tunes memory and CPU resources to adapt to the needs of the workload that you provide for Amazon EMR Spark applications. For more information, see [Using vertical autoscaling with Amazon EMR Spark jobs](jobruns-vas.md).

This section describes how to configure the Spark operator to use vertical autoscaling.

## Prerequisites
Prerequisites

Before you set up vertical autoscaling with the Spark operator, be sure to complete the following setup tasks:
+ Complete the steps in [Setting up the Spark operator for Amazon EMR on EKS](spark-operator-setup.md).
+ (Optional) If you previously installed an older version of the Spark operator, delete the `SparkApplication` and `ScheduledSparkApplication` CRDs.

  ```
  kubectl delete crd sparkapplications.sparkoperator.k8s.io
  kubectl delete crd scheduledsparkapplications.sparkoperator.k8s.io
  ```
+ Complete the steps in [Install the Spark operator](spark-operator-gs.md#spark-operator-install). In step 3, add the following line to the installation command to allow webhooks for the operator:

  ```
  --set webhook.enable=true
  ```
+ Complete the steps in [Setting up vertical autoscaling for Amazon EMR on EKS](jobruns-vas-setup.md).
+ Give access to the files in your Amazon S3 location:

  1. Annotate your driver and operator service accounts with the `JobExecutionRole` that has S3 permissions.

     ```
     kubectl annotate serviceaccount -n spark-operator emr-containers-sa-spark eks.amazonaws.com/role-arn=JobExecutionRole
     kubectl annotate serviceaccount -n spark-operator emr-containers-sa-spark-operator eks.amazonaws.com/role-arn=JobExecutionRole
     ```

  1. Update the trust policy of your job execution role in that namespace.

     ```
     aws emr-containers update-role-trust-policy \
     --cluster-name cluster \
     --namespace ${Namespace} \
     --role-name iam_role_name_for_job_execution
     ```

  1. Edit the IAM role trust policy of your job execution role and update the `serviceaccount` from `emr-containers-sa-spark-*-*-xxxx` to `emr-containers-sa-*`.

     ```
     {
         "Effect": "Allow",
         "Principal": {
             "Federated": "OIDC-provider"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
             "StringLike": {
                 "OIDC": "system:serviceaccount:${Namespace}:emr-containers-sa-*"
             }
         }
     }
     ```

  1. If you're using Amazon S3 as your file storage, add the following defaults to your yaml file.

     ```
     hadoopConf:
     # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
      # Required for EMR Runtime
      spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
      spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
      spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
      spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     ```
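The `StringLike` condition in the trust policy above relies on `*` matching any character sequence, so a single statement covers both annotated service accounts. Python's `fnmatch` approximates that wildcard behavior for simple patterns; the namespace and account names below are placeholders:

```python
from fnmatch import fnmatchcase

# IAM StringLike treats * as "match any sequence of characters";
# fnmatchcase gives a close approximation for these simple patterns.
pattern = "system:serviceaccount:spark-operator:emr-containers-sa-*"

for sub in (
    "system:serviceaccount:spark-operator:emr-containers-sa-spark",
    "system:serviceaccount:spark-operator:emr-containers-sa-spark-operator",
):
    print(sub, fnmatchcase(sub, pattern))
```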

## Run a job with vertical autoscaling on the Spark operator
Run a job

Before you can run a Spark application with the Spark operator, you must complete the steps in [Prerequisites](#spark-operator-vas-prereqs). 

To turn on vertical autoscaling with the Spark operator, add the following `dynamicSizing` configuration to your `SparkApplication` spec:

```
dynamicSizing:
  mode: Off
  signature: "my-signature"
```

This configuration turns on vertical autoscaling. The required `signature` parameter lets you choose a signature for your job.

For more information on the configurations and parameter values, see [Configuring vertical autoscaling for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/jobruns-vas-configure.html). By default, your job submits in the monitoring-only **Off** mode of vertical autoscaling. This monitoring state lets you compute and view resource recommendations without performing autoscaling. For more information, see [Vertical autoscaling modes](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/jobruns-vas-configure.html#jobruns-vas-parameters-opt-mode).

The following is a sample `SparkApplication` definition file named `spark-pi.yaml` with the required configurations to use vertical autoscaling.

```
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-operator
spec:
  type: Scala
  mode: cluster
  image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-7.12.0:latest"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar"
  sparkVersion: "3.4.1"
  dynamicSizing:
    mode: Off
    signature: "my-signature"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.4.1
    serviceAccount: emr-containers-sa-spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.4.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
```

Now, submit the Spark application with the following command. This will also create a `SparkApplication` object named `spark-pi`:

```
kubectl apply -f spark-pi.yaml
```

For more information on submitting applications to Spark through the Spark operator, see [Using a `SparkApplication`](https://www.kubeflow.org/docs/components/spark-operator/user-guide/using-sparkapplication/) in the `spark-on-k8s-operator` documentation on GitHub.

## Verifying the vertical autoscaling functionality
Verify functionality

To verify that vertical autoscaling works correctly for the submitted job, use kubectl to get the `verticalpodautoscaler` custom resource and view your scaling recommendations.

```
kubectl get verticalpodautoscalers --all-namespaces \
-l=emr-containers.amazonaws.com/dynamic.sizing.signature=my-signature
```

The output from this query should resemble the following:

```
NAMESPACE        NAME                                                          MODE   CPU   MEM         PROVIDED   AGE
spark-operator   ds-p73j6mkosvc4xeb3gr7x4xol2bfcw5evqimzqojrlysvj3giozuq-vpa   Off          580026651   True       15m
```

If your output doesn't look similar or contains an error code, see [Troubleshooting Amazon EMR on EKS vertical autoscaling](troubleshooting-vas.md) for steps to help resolve the issue.

To remove the pods and applications, run the following command:

```
kubectl delete sparkapplication spark-pi
```

# Uninstalling the Spark operator for Amazon EMR on EKS
Uninstall

Use the following steps to uninstall the Spark operator.

1. Delete the Spark operator using the correct namespace. For this example, the namespace is `spark-operator-demo`.

   ```
   helm uninstall spark-operator-demo -n spark-operator
   ```

1. Delete the Spark operator service account:

   ```
   kubectl delete sa emr-containers-sa-spark-operator -n spark-operator
   ```

1. Delete the Spark operator `CustomResourceDefinitions` (CRDs):

   ```
   kubectl delete crd sparkapplications.sparkoperator.k8s.io
   kubectl delete crd scheduledsparkapplications.sparkoperator.k8s.io
   ```

# Using monitoring configuration to monitor the Spark Kubernetes operator and Spark jobs
Using monitoring configuration to monitor Spark

Monitoring configuration lets you set up log archiving of your Spark application and operator logs to Amazon S3, to Amazon CloudWatch, or to both. It adds a log agent sidecar container to your Spark operator pod, driver pods, and executor pods, and forwards these components' logs to your configured sinks.

## Prerequisites
Prerequisites to monitor Spark

Before you configure monitoring, be sure to complete the following setup tasks:

1. (Optional) If you previously installed an older version of the Spark operator, delete the `SparkApplication` and `ScheduledSparkApplication` CRDs.

   ```
   kubectl delete crd scheduledsparkapplications.sparkoperator.k8s.io
   kubectl delete crd sparkapplications.sparkoperator.k8s.io
   ```

1. Create an operator/job execution role in IAM if you don’t have one already.

1. Run the following command to update the trust policy of the operator/job execution role you just created:

   ```
   aws emr-containers update-role-trust-policy \
   --cluster-name cluster \
   --namespace namespace \
   --role-name iam_role_name_for_operator/job_execution_role
   ```

1. Edit the IAM role trust policy of your operator/job execution role to the following:

   ```
   {
       "Effect": "Allow",
       "Principal": {
           "Federated": "${OIDC-provider}"
       },
       "Action": "sts:AssumeRoleWithWebIdentity",
       "Condition": {
           "StringLike": {
               "OIDC_PROVIDER:sub": "system:serviceaccount:${Namespace}:emr-containers-sa-*"
           }
       }
   }
   ```

1. Create a *monitoringConfiguration* policy in IAM with the following permissions:

------
#### [ JSON ]

****  

   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "logs:DescribeLogStreams",
           "logs:CreateLogStream",
           "logs:CreateLogGroup",
           "logs:PutLogEvents"
         ],
         "Resource": [
           "arn:aws:logs:*:*:log-group:log_group_name",
           "arn:aws:logs:*:*:log-group:log_group_name:*"
         ],
         "Sid": "AllowLOGSDescribelogstreams"
       },
       {
         "Effect": "Allow",
         "Action": [
           "logs:DescribeLogGroups"
         ],
         "Resource": [
           "*"
         ],
         "Sid": "AllowLOGSDescribeloggroups"
       },
       {
         "Effect": "Allow",
         "Action": [
           "s3:PutObject",
           "s3:GetObject",
           "s3:ListBucket"
         ],
         "Resource": [
           "arn:aws:s3:::bucket_name",
           "arn:aws:s3:::bucket_name/*"
         ],
         "Sid": "AllowS3Putobject"
       }
     ]
   }
   ```

------

1. Attach the above policy to your operator/job execution role.

# Spark Operator Logs
Spark Operator Logs

You can define the monitoring configuration when you run `helm install`, as in the following example:

```
helm install spark-operator spark-operator \
--namespace namespace \
--set emrContainers.awsRegion=aws_region \
--set emrContainers.monitoringConfiguration.image=log_agent_image_url \
--set emrContainers.monitoringConfiguration.s3MonitoringConfiguration.logUri=S3_bucket_uri \
--set emrContainers.monitoringConfiguration.cloudWatchMonitoringConfiguration.logGroupName=log_group_name \
--set emrContainers.monitoringConfiguration.cloudWatchMonitoringConfiguration.logStreamNamePrefix=log_stream_prefix \
--set emrContainers.monitoringConfiguration.sideCarResources.limits.cpuLimit=500m \
--set emrContainers.monitoringConfiguration.sideCarResources.limits.memoryLimit=512Mi \
--set emrContainers.monitoringConfiguration.containerLogRotationConfiguration.rotationSize=2GB \
--set emrContainers.monitoringConfiguration.containerLogRotationConfiguration.maxFilesToKeep=10 \
--set webhook.enable=true \
--set emrContainers.operatorExecutionRoleArn=operator_execution_role_arn
```
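Each nested key under the chart's values maps to one dotted `--set` path in the command above. The following hypothetical helper illustrates that flattening; it isn't part of any AWS or Helm tooling, and the bucket name is a placeholder:

```python
def to_set_flags(values, prefix=""):
    """Flatten a nested dict of Helm values into --set key=value flags,
    joining nested keys with dots the way the helm install command does."""
    flags = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flags.extend(to_set_flags(value, path))
        else:
            flags.append(f"--set {path}={value}")
    return flags

config = {
    "emrContainers": {
        "awsRegion": "us-west-2",
        "monitoringConfiguration": {
            "s3MonitoringConfiguration": {"logUri": "s3://amzn-s3-demo-bucket/logs"},
        },
    }
}
for flag in to_set_flags(config):
    print(flag)
```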

**Monitoring configuration**

The following are the available configuration options under **monitoringConfiguration**.
+ **Image** (optional) – The log agent image URL. If you don't provide one, an image is selected based on the `emrReleaseLabel`.
+ **s3MonitoringConfiguration** – Set this option to archive logs to Amazon S3.
  + **logUri** (required) – The Amazon S3 bucket path where you want to store your logs.
  + The following are sample formats for the Amazon S3 object keys after the logs are uploaded. The first example shows the layout with log rotation disabled.

    ```
    s3://${logUri}/${POD NAME}/operator/stdout.gz
    s3://${logUri}/${POD NAME}/operator/stderr.gz
    ```

    Log rotation is enabled by default. You see both rotated files, which have a date stamp and an incrementing index, and a current file, whose name is the same as in the previous sample.

    ```
    s3://${logUri}/${POD NAME}/operator/stdout_YYYYMMDD_index.gz
    s3://${logUri}/${POD NAME}/operator/stderr_YYYYMMDD_index.gz
    ```
+ **cloudWatchMonitoringConfiguration** – The configuration key to set up forwarding to Amazon CloudWatch.
  + **logGroupName** (required) – The name of the Amazon CloudWatch log group that you want to send logs to. The group is created automatically if it doesn't exist.
  + **logStreamNamePrefix** (optional) – The name of the log stream that you want to send logs into. The default value is an empty string. The format in Amazon CloudWatch is as follows:

    ```
    ${logStreamNamePrefix}/${POD NAME}/STDOUT or STDERR
    ```
+ **sideCarResources** (optional) – The configuration key to set resource limits on the launched Fluentd sidecar container.
  + **memoryLimit** (optional) – The memory limit. Adjust according to your needs. The default is 512Mi.
  + **cpuLimit** (optional) – The CPU limit. Adjust according to your needs. The default is 500m.
+ **containerLogRotationConfiguration** (optional) – Controls the container log rotation behavior. It is enabled by default.
  + **rotationSize** (required) – Specifies file size for the log rotation. The range of possible values is from 2KB to 2GB. The numeric unit portion of the rotationSize parameter is passed as an integer. Since decimal values aren't supported, you can specify a rotation size of 1.5GB, for example, with the value 1500MB. The default is 2GB.
  + **maxFilesToKeep** (required) – Specifies the maximum number of files to retain in the container after rotation has taken place. The minimum value is 1, and the maximum value is 50. The default is 10.

After you configure `monitoringConfiguration`, you can check the Spark operator pod logs in an Amazon S3 bucket, in Amazon CloudWatch, or in both. For an Amazon S3 bucket, allow up to 2 minutes for the first log file to be flushed.

To find the logs in Amazon CloudWatch, you can navigate to the following: **CloudWatch** > **Log groups** > ***Log group name*** > *Pod name***/operator/stderr**

Or you can navigate to: **CloudWatch** > **Log groups** > ***Log group name*** > *Pod name***/operator/stdout**
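The Amazon S3 key layouts shown above can be illustrated with a small, hypothetical helper that builds the expected object keys for the current and rotated operator log files; the bucket and pod names are placeholders:

```python
def operator_log_keys(log_uri, pod_name, date=None, index=None):
    """Build the S3 keys for operator stdout/stderr logs. Pass date
    (YYYYMMDD) and index to get the rotated-file form
    (stdout_YYYYMMDD_index.gz); otherwise the current-file form."""
    base = f"{log_uri}/{pod_name}/operator"
    suffix = f"_{date}_{index}" if date is not None else ""
    return [f"{base}/stdout{suffix}.gz", f"{base}/stderr{suffix}.gz"]

print(operator_log_keys("s3://amzn-s3-demo-bucket/logs", "my-operator-pod"))
print(operator_log_keys("s3://amzn-s3-demo-bucket/logs", "my-operator-pod",
                        date="20230314", index=0))
```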

# Spark Application Logs
Spark Application Logs

You can define this configuration in the following way.

```
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: namespace
spec:
  type: Scala
  mode: cluster
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar"
  sparkVersion: "3.3.1"
  emrReleaseLabel: emr_release_label
  executionRoleArn: job_execution_role_arn
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.3.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.3.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  monitoringConfiguration:
    image: "log_agent_image"
    s3MonitoringConfiguration:
      logUri: "S3_bucket_uri"
    cloudWatchMonitoringConfiguration:
      logGroupName: "log_group_name"
      logStreamNamePrefix: "log_stream_prefix"
    sideCarResources:
      limits:
        cpuLimit: "500m"
        memoryLimit: "250Mi"
    containerLogRotationConfiguration:
      rotationSize: "2GB"
      maxFilesToKeep: "10"
```

The following are the available configuration options under **monitoringConfiguration**.
+ **Image** (optional) – The log agent image URL. If you don't provide one, an image is selected based on the `emrReleaseLabel`.
+ **s3MonitoringConfiguration** – Set this option to archive logs to Amazon S3.
  + **logUri** (required) – The Amazon S3 bucket path where you want to store your logs. The first example shows the layout with log rotation disabled:

    ```
    s3://${logUri}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stdout.gz
    s3://${logUri}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stderr.gz
    ```

    Log rotation is enabled by default. You see both rotated files (with a date stamp and an incrementing index) and a current file (without them).

    ```
    s3://${logUri}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stdout_YYYYMMDD_index.gz
    s3://${logUri}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stderr_YYYYMMDD_index.gz
    ```
+ **cloudWatchMonitoringConfiguration** – The configuration key to set up forwarding to Amazon CloudWatch.
  + **logGroupName** (required) – The name of the CloudWatch log group where you want to send logs. The log group is created automatically if it doesn't exist.
  + **logStreamNamePrefix** (optional) – The prefix of the log stream name where you want to send logs. The default value is an empty string. The format in CloudWatch is as follows:

    ```
    ${logStreamNamePrefix}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stdout
    ${logStreamNamePrefix}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stderr
    ```
+ **sideCarResources** (optional) – The configuration key to set resource limits on the launched Fluentd sidecar container.
  + **memoryLimit** (optional) – The memory limit. Adjust according to your needs. The default is 250Mi.
  + **cpuLimit** (optional) – The CPU limit. Adjust according to your needs. The default is 500m.
+ **containerLogRotationConfiguration** (optional) – Controls the container log rotation behavior. It is enabled by default.
  + **rotationSize** (required) – Specifies the file size at which logs rotate. The range of possible values is from 2KB to 2GB. The numeric portion of the rotationSize value must be an integer; decimal values aren't supported, so to specify a rotation size of 1.5 GB, for example, use the value 1500MB. The default is 2GB.
  + **maxFilesToKeep** (required) – Specifies the maximum number of files to retain in the container after rotation has taken place. The minimum value is 1. The maximum value is 50. The default is 10.
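Putting the path pieces together: the following sketch builds the Amazon S3 object keys to expect for a pod's `stdout` logs. The bucket, application name, application UID, pod name, and rotation date/index are all placeholder values.

```
# Placeholder values; Amazon EMR on EKS fills these in from your job.
logUri="amzn-s3-demo-bucket/eks-logs"
APP_NAME="word-count"; APP_UID="a1b2c3d4"; POD_NAME="word-count-driver"

# Current (un-rotated) log file:
echo "s3://${logUri}/${APP_NAME}-${APP_UID}/${POD_NAME}/stdout.gz"

# A rotated log file, stamped with a date and an incrementing index:
DATE="20240101"; INDEX="1"
echo "s3://${logUri}/${APP_NAME}-${APP_UID}/${POD_NAME}/stdout_${DATE}_${INDEX}.gz"
```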

After configuring `monitoringConfiguration`, you can view your Spark application's driver and executor logs in an Amazon S3 bucket, in CloudWatch, or in both. For an Amazon S3 bucket, allow about 2 minutes for the first log file to be flushed. For example, in Amazon S3, the bucket path appears like the following:

**Amazon S3** > **Buckets** > ***Bucket name*** > *Spark application name - UUID* > *Pod Name* > **stderr.gz**

Or:

**Amazon S3** > **Buckets** > ***Bucket name*** > *Spark application name - UUID* > *Pod Name* > **stdout.gz**

In CloudWatch, the path appears like the following:

**CloudWatch** > **Log groups** > ***Log group name*** > *Spark application name - UUID*/ *Pod name***/stderr**

Or:

**CloudWatch** > **Log groups** > ***Log group name*** > *Spark application name - UUID*/ *Pod name***/stdout**

# Security and the Spark operator with Amazon EMR on EKS
Security

There are a couple of ways to set up cluster-access permissions when you use the Spark operator. The first is role-based access control (RBAC), which restricts access based on a person's role within an organization and has become a primary way to manage access. The second is to assume an AWS Identity and Access Management (IAM) role, which grants resource access through specific assigned permissions.

**Topics**
+ [

# Setting up cluster access permissions with role-based access control (RBAC)
](spark-operator-security-rbac.md)
+ [

# Setting up cluster access permissions with IAM roles for service accounts (IRSA)
](spark-operator-security-irsa.md)

# Setting up cluster access permissions with role-based access control (RBAC)
Role-based access control (RBAC)

When you deploy the Spark operator, Amazon EMR on EKS creates two roles and service accounts: one for the Spark operator and one for Spark applications.

**Topics**
+ [

## Operator service account and role
](#spark-operator-sa-oper)
+ [

## Spark service account and role
](#spark-operator-sa-spark)

## Operator service account and role
Operator service account and role

Amazon EMR on EKS creates the **operator service account and role** to manage `SparkApplication` objects for Spark jobs, as well as other resources such as services.

The default name for this service account is `emr-containers-sa-spark-operator`.

The following rules apply to this service role: 

```
 rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - services
  - configmaps
  - secrets
  verbs:
  - create
  - get
  - delete
  - update
- apiGroups:
  - extensions
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - create
  - get
  - delete
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - resourcequotas
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
  - get
  - update
  - delete
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - create
  - get
  - update
  - delete
- apiGroups:
  - sparkoperator.k8s.io
  resources:
  - sparkapplications
  - sparkapplications/status
  - scheduledsparkapplications
  - scheduledsparkapplications/status
  verbs:
  - "*"
  {{- if .Values.batchScheduler.enable }}
  # required for the `volcano` batch scheduler
- apiGroups:
  - scheduling.incubator.k8s.io
  - scheduling.sigs.dev
  - scheduling.volcano.sh
  resources:
  - podgroups
  verbs:
  - "*"
  {{- end }}
  {{ if .Values.webhook.enable }}
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - delete
  {{- end }}
```

## Spark service account and role
Spark service account and role

A Spark driver pod needs a Kubernetes service account in the same namespace as the pod. This service account needs permissions to create, get, list, patch, and delete executor pods, and to create a Kubernetes headless service for the driver. Without such a service account, the driver fails and exits unless the default service account in the pod's namespace already has the required permissions.

The default name for this service account is `emr-containers-sa-spark`.

The following rules apply to this service role: 

```
 rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - persistentvolumeclaims
  verbs:
  - "*"
```

# Setting up cluster access permissions with IAM roles for service accounts (IRSA)
IAM roles for service accounts (IRSA)

This section uses an example to demonstrate how to configure a Kubernetes service account to assume an AWS Identity and Access Management role. Pods that use the service account can then access any AWS service that the role has permissions to access.

The following example runs a Spark application to count the words from a file in Amazon S3. To do this, you can set up IAM roles for service accounts (IRSA) to authenticate and authorize Kubernetes service accounts.

**Note**  
This example uses the "spark-operator" namespace for the Spark operator and for the namespace where you submit the Spark application.

## Prerequisites


Before you try the example on this page, complete the following prerequisites:
+ [Get set up for the Spark operator]().
+ [Install the Spark operator](spark-operator-gs.md#spark-operator-install).
+ [Create an Amazon S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html).
+ Save your favorite poem in a text file named `poem.txt`, and upload the file to your S3 bucket. The Spark application that you create on this page will read the contents of the text file. For more information on uploading files to S3, see [Upload an object to your bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/uploading-an-object-bucket.html) in the *Amazon Simple Storage Service User Guide*.
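If you don't have a poem handy, the following sketch creates a small placeholder `poem.txt`; the verse itself is arbitrary. Upload it with `aws s3 cp poem.txt s3://your-bucket/`, where `your-bucket` is the bucket you created.

```
# Create a small placeholder input file for the word-count example.
cat >poem.txt <<'EOF'
Software, like poetry,
is never finished, only released.
EOF

# Preview what the Spark job will read and count:
wc -w poem.txt
```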

## Configure a Kubernetes service account to assume an IAM role
Configure the Kubernetes service account

Use the following steps to configure a Kubernetes service account to assume an IAM role that pods can use to access AWS services that the role has permissions to access.

1. After completing the [Prerequisites](#spark-operator-security-irsa-prereqs), use the AWS Command Line Interface to create an `example-policy.json` file that allows read-only access to the file that you uploaded to Amazon S3:

   ```
   cat >example-policy.json <<EOF
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "s3:GetObject",
                   "s3:ListBucket"
               ],
               "Resource": [
                   "arn:aws:s3:::my-pod-bucket",
                   "arn:aws:s3:::my-pod-bucket/*"
               ]
           }
       ]
   }
   EOF
   ```

1. Then, create an IAM policy `example-policy`:

   ```
   aws iam create-policy --policy-name example-policy --policy-document file://example-policy.json
   ```

1. Next, create an IAM role `example-role` and associate it with a Kubernetes service account for the Spark driver:

   ```
   eksctl create iamserviceaccount --name driver-account-sa --namespace spark-operator \
   --cluster my-cluster --role-name "example-role" \
   --attach-policy-arn arn:aws:iam::111122223333:policy/example-policy --approve
   ```

1. Create a yaml file with the cluster role bindings that are required for the Spark driver service account:

   ```
   cat >spark-rbac.yaml <<EOF
   apiVersion: v1
   kind: ServiceAccount
   metadata:
     name: driver-account-sa
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: spark-role
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: ClusterRole
     name: edit
   subjects:
     - kind: ServiceAccount
       name: driver-account-sa
       namespace: spark-operator
   EOF
   ```

1. Apply the cluster role binding configurations:

   ```
   kubectl apply -f spark-rbac.yaml
   ```

The kubectl command should confirm successful creation of the account:

```
serviceaccount/driver-account-sa created
clusterrolebinding.rbac.authorization.k8s.io/spark-role configured
```
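The `eksctl create iamserviceaccount` command from step 3 also annotates the service account with the IAM role's ARN; this annotation is what allows pods that use the account to assume the role. The annotated `ServiceAccount` looks roughly like the following sketch, where the account ID and role name are the placeholders from the earlier steps:

```
apiVersion: v1
kind: ServiceAccount
metadata:
  name: driver-account-sa
  namespace: spark-operator
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/example-role
```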

## Running an application from the Spark operator
Run the application

After you [configure the Kubernetes service account](), you can run a Spark application that counts the number of words in the text file that you uploaded as part of the [Prerequisites](#spark-operator-security-irsa-prereqs).

1. Create a new file named `word-count.yaml` with a `SparkApplication` definition for your word-count application, based on an Amazon EMR 6.x release:

   ```
   cat >word-count.yaml <<EOF
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: word-count
     namespace: spark-operator
   spec:
     type: Java
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.JavaWordCount
     mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
     arguments:
       - s3://my-pod-bucket/poem.txt
     hadoopConf:
      # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
       # Required for EMR Runtime
       spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
       spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: my-spark-driver-sa
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
   EOF
   ```

   If you're using the Spark operator with an Amazon EMR 7.x release, adjust some of the configuration values:

   ```
   cat >word-count.yaml <<EOF
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: word-count
     namespace: spark-operator
   spec:
     type: Java
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-7.7.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.JavaWordCount
     mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
     arguments:
       - s3://my-pod-bucket/poem.txt
     hadoopConf:
      # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
       # Required for EMR Runtime
       spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/aws-java-sdk-v2/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
       spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/aws-java-sdk-v2/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: my-spark-driver-sa
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
   EOF
   ```

1. Submit the Spark application.

   ```
   kubectl apply -f word-count.yaml
   ```

   The kubectl command should return confirmation that you successfully created a `SparkApplication` object called `word-count`.

   ```
   sparkapplication.sparkoperator.k8s.io/word-count configured
   ```

1. To check events for the `SparkApplication` object, run the following command:

   ```
   kubectl describe sparkapplication word-count -n spark-operator
   ```

   The kubectl command should return the description of the `SparkApplication` with the events:

   ```
   Events:
     Type     Reason                               Age                    From            Message
     ----     ------                               ----                   ----            -------
     Normal   SparkApplicationSpecUpdateProcessed  3m2s (x2 over 17h)     spark-operator  Successfully processed spec update for SparkApplication word-count
     Warning  SparkApplicationPendingRerun         3m2s (x2 over 17h)     spark-operator  SparkApplication word-count is pending rerun
     Normal   SparkApplicationSubmitted            2m58s (x2 over 17h)    spark-operator  SparkApplication word-count was submitted successfully
     Normal   SparkDriverRunning                   2m56s (x2 over 17h)    spark-operator  Driver word-count-driver is running
     Normal   SparkExecutorPending                 2m50s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is pending
     Normal   SparkExecutorRunning                 2m48s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is running
     Normal   SparkDriverCompleted                 2m31s (x2 over 17h)    spark-operator  Driver word-count-driver completed
     Normal   SparkApplicationCompleted            2m31s (x2 over 17h)    spark-operator  SparkApplication word-count completed
     Normal   SparkExecutorCompleted               2m31s (x2 over 2m31s)  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] completed
   ```

The application now counts the words in your S3 file. To see the word counts, check the logs for your driver pod:

```
kubectl logs pod/word-count-driver -n spark-operator
```

The kubectl command should return the contents of the log file with the results of your word-count application.

```
INFO DAGScheduler: Job 0 finished: collect at JavaWordCount.java:53, took 5.146519 s
                Software: 1
```
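As a local sanity check of what `JavaWordCount` computes, the following sketch splits placeholder text on spaces and counts occurrences of each word, roughly mirroring the job's logic:

```
# Split on spaces, then count occurrences of each word.
printf 'Software is fun\nSoftware ships\n' \
  | tr ' ' '\n' | sort | uniq -c | sort -rn
```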

For more information on how to submit applications to Spark through the Spark operator, see [Using a SparkApplication](https://www.kubeflow.org/docs/components/spark-operator/user-guide/using-sparkapplication/) in the *Kubernetes Operator for Apache Spark (spark-on-k8s-operator)* documentation on GitHub.

# Running Spark jobs with spark-submit
spark-submit

Amazon EMR releases 6.10.0 and higher support `spark-submit` as a command-line tool that you can use to submit and run Spark applications on an Amazon EMR on EKS cluster.

**Note**  
Amazon EMR calculates pricing on Amazon EKS based on vCPU and memory consumption. This calculation applies to driver and executor pods. This calculation starts from when you download your Amazon EMR application image until the Amazon EKS pod terminates and is rounded to the nearest second.

**Topics**
+ [

# Setting up spark-submit for Amazon EMR on EKS
](spark-submit-setup.md)
+ [

# Getting started with spark-submit for Amazon EMR on EKS
](spark-submit-gs.md)
+ [

# Verify Spark driver service account security requirements for spark-submit
](spark-submit-security.md)

# Setting up spark-submit for Amazon EMR on EKS
Setting up

Complete the following tasks to get set up before you can run an application with spark-submit on Amazon EMR on EKS. If you've already signed up for Amazon Web Services (AWS) and have used Amazon EKS, you are almost ready to use Amazon EMR on EKS. If you've already completed any of the prerequisites, you can skip those and move on to the next one.
+ **[Install or update to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)** – If you've already installed the AWS CLI, confirm that you have the latest version.
+ **[Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html)** – kubectl and eksctl are command line tools that you use to communicate with your Amazon EKS cluster.
+ **[Get started with Amazon EKS – eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html)** – Follow the steps to create a new Kubernetes cluster with nodes in Amazon EKS.
+ **[Select an Amazon EMR base image URI](docker-custom-images-tag.md) (release 6.10.0 or higher)** – The `spark-submit` command is supported with Amazon EMR releases 6.10.0 and higher.
+ Confirm that the driver service account has appropriate permissions to create and watch executor pods. For more information, see [Verify Spark driver service account security requirements for spark-submit](spark-submit-security.md).
+ Set up your local [AWS credentials profile](https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html).
+ From the Amazon EKS console, choose your EKS cluster, then find the EKS cluster endpoint, located under **Overview**, **Details**, then **API server endpoint**.

# Getting started with spark-submit for Amazon EMR on EKS
Getting started

Amazon EMR 6.10.0 and higher supports spark-submit for running Spark applications on an Amazon EKS cluster. The section that follows shows you how to use the command to submit a Spark application.

## Run a Spark application
Run a Spark application

To run the Spark application, follow these steps:

1. Before you can run a Spark application with the `spark-submit` command, complete the steps in [Setting up spark-submit for Amazon EMR on EKS](spark-submit-setup.md). 

1. Run a container with an Amazon EMR on EKS base image. See [How to select a base image URI](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/docker-custom-images-tag.html) for more information.

   ```
   kubectl run -it containerName --image=EMRonEKSImage -n namespace --command -- /bin/bash
   ```

1. Set the values for the following environment variables:

   ```
   export SPARK_HOME=spark-home
   export MASTER_URL=k8s://EKS-cluster-endpoint
   ```

1. Now, submit the Spark application with the following command:

   ```
   $SPARK_HOME/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master $MASTER_URL \
    --conf spark.kubernetes.container.image=895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --deploy-mode cluster \
    --conf spark.kubernetes.namespace=spark-operator \
    local:///usr/lib/spark/examples/jars/spark-examples.jar 20
   ```
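The master URL used above is the literal `k8s://` prefix followed by your cluster's HTTPS API server endpoint. A sketch with a placeholder endpoint:

```
# Placeholder endpoint; copy yours from the Amazon EKS console
# (Overview > Details > API server endpoint).
EKS_ENDPOINT="https://EXAMPLE0123456789.gr7.us-west-2.eks.amazonaws.com"
export MASTER_URL="k8s://${EKS_ENDPOINT}"
echo "$MASTER_URL"
```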

For more information about submitting applications to Spark, see [Submitting applications](https://spark.apache.org/docs/latest/submitting-applications.html) in the Apache Spark documentation.

**Important**  
`spark-submit` only supports cluster mode as the submission mechanism.

# Verify Spark driver service account security requirements for spark-submit
Security

The Spark driver pod uses a Kubernetes service account to access the Kubernetes API server to create and watch executor pods. The driver service account must have permissions to list, create, edit, patch, and delete pods in your cluster. Verify that you have each of these permissions by running the following commands:

```
kubectl auth can-i list pods
kubectl auth can-i create pods
kubectl auth can-i edit pods
kubectl auth can-i delete pods
kubectl auth can-i patch pods
```

The following rules apply to this service role: 

```
 rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - persistentvolumeclaims
  verbs:
  - "*"
```

# Setting up IAM roles for service accounts (IRSA) for spark-submit
IAM roles for service accounts (IRSA) for spark-submit

The following sections explain how to set up IAM roles for service accounts (IRSA) to authenticate and authorize Kubernetes service accounts so you can run Spark applications stored in Amazon S3.

## Prerequisites


Before trying any of the examples in this documentation, make sure that you have completed the following prerequisites:
+ [Finished setting up spark-submit](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/spark-submit-setup.html)
+ [Created an S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html) and [uploaded](https://docs.aws.amazon.com/AmazonS3/latest/userguide/uploading-an-object-bucket.html) the Spark application JAR

## Configuring a Kubernetes service account to assume an IAM role
Configuring a Kubernetes service account

The following steps cover how to configure a Kubernetes service account to assume an AWS Identity and Access Management (IAM) role. After you configure the pods to use the service account, they can then access any AWS service that the role has permissions to access.

1. Create a policy file to allow read-only access to the Amazon S3 object you [uploaded](https://docs.aws.amazon.com/AmazonS3/latest/userguide/uploading-an-object-bucket.html):

   ```
   cat >my-policy.json <<EOF
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "s3:GetObject",
                   "s3:ListBucket"
               ],
               "Resource": [
                   "arn:aws:s3:::<my-spark-jar-bucket>",
                   "arn:aws:s3:::<my-spark-jar-bucket>/*"
               ]
           }
       ]
   }
   EOF
   ```

1. Create the IAM policy.

   ```
   aws iam create-policy --policy-name my-policy --policy-document file://my-policy.json
   ```

1. Create an IAM role and associate it with a Kubernetes service account for the Spark driver:

   ```
   eksctl create iamserviceaccount --name my-spark-driver-sa --namespace spark-operator \
   --cluster my-cluster --role-name "my-role" \
   --attach-policy-arn arn:aws:iam::111122223333:policy/my-policy --approve
   ```

1. Create a YAML file with the required [permissions](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/spark-submit-security.html) for the Spark driver service account:

   ```
   cat >spark-rbac.yaml <<EOF
   apiVersion: rbac.authorization.k8s.io/v1
   kind: Role
   metadata:
     namespace: default
     name: emr-containers-role-spark
   rules:
   - apiGroups:
     - ""
     resources:
     - pods
     verbs:
     - "*"
   - apiGroups:
     - ""
     resources:
     - services
     verbs:
     - "*"
   - apiGroups:
     - ""
     resources:
     - configmaps
     verbs:
     - "*"
   - apiGroups:
     - ""
     resources:
     - persistentvolumeclaims
     verbs:
     - "*"
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: RoleBinding
   metadata:
     name: spark-role-binding
     namespace: default
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: Role
     name: emr-containers-role-spark
   subjects:
   - kind: ServiceAccount
     name: emr-containers-sa-spark
     namespace: default
   EOF
   ```

1. Apply the role binding configurations.

   ```
   kubectl apply -f spark-rbac.yaml
   ```

1. The `kubectl` command should return confirmation of the created resources.

   ```
   role.rbac.authorization.k8s.io/emr-containers-role-spark created
   rolebinding.rbac.authorization.k8s.io/spark-role-binding created
   ```

## Running the Spark application
Running the Spark application

Amazon EMR 6.10.0 and higher supports spark-submit for running Spark applications on an Amazon EKS cluster. To run the Spark application, follow these steps:

1. Make sure that you have completed the steps in [Setting up spark-submit for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/spark-submit-setup.html).

1. Set the values for the following environment variables:

   ```
   export SPARK_HOME=spark-home
   export MASTER_URL=k8s://EKS-cluster-endpoint
   ```

1. Now, submit the Spark application with the following command:

   ```
   $SPARK_HOME/bin/spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master $MASTER_URL \
    --conf spark.kubernetes.container.image=895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.15.0:latest \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=emr-containers-sa-spark \
    --deploy-mode cluster \
    --conf spark.kubernetes.namespace=default \
    --conf "spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*" \
    --conf "spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native" \
    --conf "spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*" \
    --conf "spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native" \
    --conf spark.hadoop.fs.s3.customAWSCredentialsProvider=com.amazonaws.auth.WebIdentityTokenCredentialsProvider \
    --conf spark.hadoop.fs.s3.impl=com.amazon.ws.emr.hadoop.fs.EmrFileSystem \
    --conf spark.hadoop.fs.AbstractFileSystem.s3.impl=org.apache.hadoop.fs.s3.EMRFSDelegate \
    --conf spark.hadoop.fs.s3.buffer.dir=/mnt/s3 \
    --conf spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds="2000" \
    --conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem="2" \
    --conf spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem="true" \
    s3://my-pod-bucket/spark-examples.jar 20
   ```

1. After the Spark driver finishes the job, you should see a log line at the end of the submission output indicating that the Spark job finished.

   ```
   23/11/24 17:02:14 INFO LoggingPodStatusWatcherImpl: Application org.apache.spark.examples.SparkPi with submission ID default:org-apache-spark-examples-sparkpi-4980808c03ff3115-driver finished
   23/11/24 17:02:14 INFO ShutdownHookManager: Shutdown hook called
   ```

## Cleanup


When you're done running your applications, you can perform cleanup with the following command.

```
kubectl delete -f spark-rbac.yaml
```

# Using Apache Livy with Amazon EMR on EKS
Apache Livy

With Amazon EMR releases 7.1.0 and higher, you can use Apache Livy to submit jobs on Amazon EMR on EKS. Using Apache Livy, you can set up your own Apache Livy REST endpoint and use it to deploy and manage Spark applications on your Amazon EKS clusters. After you install Livy in your Amazon EKS cluster, you can use the Livy endpoint to submit Spark applications to your Livy server. The server manages the lifecycle of the Spark applications.

**Note**  
Amazon EMR calculates pricing on Amazon EKS based on vCPU and memory consumption. This calculation applies to both driver and executor pods. It starts when you begin downloading your Amazon EMR application image, continues until the Amazon EKS pod terminates, and is rounded to the nearest second.

**Topics**
+ [

# Setting up Apache Livy for Amazon EMR on EKS
](job-runs-apache-livy-setup.md)
+ [

# Getting started with Apache Livy on Amazon EMR on EKS
](job-runs-apache-livy-install.md)
+ [

# Running a Spark application with Apache Livy for Amazon EMR on EKS
](job-runs-apache-livy-run-spark.md)
+ [

# Uninstalling Apache Livy with Amazon EMR on EKS
](job-runs-apache-livy-uninstall.md)
+ [

# Security for Apache Livy with Amazon EMR on EKS
](job-runs-apache-livy-security.md)
+ [

# Installation properties for Apache Livy on Amazon EMR on EKS releases
](job-runs-apache-livy-installation-properties.md)
+ [

# Troubleshoot common environment-variable format errors
](job-runs-apache-livy-troubleshooting.md)

# Setting up Apache Livy for Amazon EMR on EKS
Setting up

Before you can install Apache Livy on your Amazon EKS cluster, you must install and configure a set of prerequisite tools. These include the AWS CLI, which is a foundational command-line tool for working with AWS resources, command-line tools for working with Amazon EKS, and a controller that's used in this use case to make your cluster application available to the internet and to route network traffic.
+ **[Install or update to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) ** – If you've already installed the AWS CLI, confirm that you have the latest version.
+ **[Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html) ** – kubectl is a command line tool for working with Kubernetes clusters, and eksctl is a command line tool for creating and managing clusters on Amazon EKS.
+ **[Install Helm](https://docs.aws.amazon.com/eks/latest/userguide/helm.html)** – The Helm package manager for Kubernetes helps you install and manage applications on your Kubernetes cluster. 
+ **[Get started with Amazon EKS – eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) ** – Follow the steps to create a new Kubernetes cluster with nodes in Amazon EKS.
+ **[Select an Amazon EMR release label](docker-custom-images-tag.md)** – Apache Livy is supported with Amazon EMR releases 7.1.0 and higher.
+ **[Install the ALB controller](https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html)** – The ALB controller manages AWS Elastic Load Balancing resources for Kubernetes clusters. The controller creates an AWS Network Load Balancer (NLB) when you create a Kubernetes Ingress while setting up Apache Livy.

# Getting started with Apache Livy on Amazon EMR on EKS
Getting started

Complete the following steps to install Apache Livy. They include configuring the package manager, creating a namespace for running Spark workloads, installing Livy, setting up load balancing, and verifying the installation. You must complete these steps in order before you can run a Spark batch job.

1. If you haven't already, set up [Apache Livy for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-setup.html).

1. Authenticate your Helm client to the Amazon ECR registry. You can find the corresponding `ECR-registry-account` value for your AWS Region from [Amazon ECR registry accounts by Region](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/docker-custom-images-tag.html#docker-custom-images-ECR).

   ```
   aws ecr get-login-password --region <AWS_REGION> | helm registry login \
   --username AWS \
   --password-stdin <ECR-registry-account>.dkr.ecr.<region-id>.amazonaws.com
   ```

1. Setting up Livy creates a service account for the Livy server and another account for the Spark application. To set up IRSA for the service accounts, see [Setting up access permissions with IAM roles for service accounts (IRSA)](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-irsa.html). 

1. Create a namespace to run your Spark workloads.

   ```
   kubectl create ns <spark-ns>
   ```

1. Use the following command to install Livy.

   This Livy endpoint is available only inside the VPC of the EKS cluster. To enable access from outside the VPC, add `--set loadbalancer.internal=false` to your Helm installation command.
**Note**  
By default, SSL is not enabled on this Livy endpoint and the endpoint is visible only inside the VPC of the EKS cluster. If you set `loadbalancer.internal=false` and `ssl.enabled=false`, you expose an insecure endpoint outside of your VPC. To set up a secure Livy endpoint, see [Configuring a secure Apache Livy endpoint with TLS/SSL](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-secure-endpoint.html). 

   ```
   helm install livy-demo \
     oci://895885662937.dkr.ecr.region-id.amazonaws.com/livy \
     --version 7.12.0 \
     --namespace livy-ns \
     --set image=ECR-registry-account.dkr.ecr.region-id.amazonaws.com/livy/emr-7.12.0:latest \
     --set sparkNamespace=<spark-ns> \
     --create-namespace
   ```

   You should see the following output.

   ```
   NAME: livy-demo
   LAST DEPLOYED: Mon Mar 18 09:23:23 2024
   NAMESPACE: livy-ns
   STATUS: deployed
   REVISION: 1
   TEST SUITE: None
   NOTES:
   The Livy server has been installed.
   Check installation status:
   1. Check Livy Server pod is running
     kubectl --namespace livy-ns get pods -l "app.kubernetes.io/instance=livy-demo"
   2. Verify created NLB is in Active state and it's target groups are healthy (if loadbalancer.enabled is true)
   
   Access LIVY APIs:
       # Ensure your NLB is active and healthy
       # Get the Livy endpoint using command:
       LIVY_ENDPOINT=$(kubectl get svc -n livy-ns -l app.kubernetes.io/instance=livy-demo,emr-containers.amazonaws.com/type=loadbalancer -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}' |  awk '{printf "%s:8998\n", $0}')
       # Access Livy APIs using http://$LIVY_ENDPOINT or https://$LIVY_ENDPOINT (if SSL is enabled)
       # Note: While uninstalling Livy, makes sure the ingress and NLB are deleted after running the helm command to avoid dangling resources
   ```

   The default service account names for the Livy server and the Spark session are `emr-containers-sa-livy` and `emr-containers-sa-spark-livy`. To use custom names, use the `serviceAccounts.name` and `sparkServiceAccount.name` parameters.

   ```
   --set serviceAccounts.name=my-service-account-for-livy
   --set sparkServiceAccount.name=my-service-account-for-spark
   ```

1. Verify that you installed the Helm chart.

   ```
   helm list -n livy-ns -o yaml
   ```

   The `helm list` command should return information about your new Helm chart.

   ```
   app_version: 0.7.1-incubating
   chart: livy-emr-7.12.0
   name: livy-demo
   namespace: livy-ns
   revision: "1"
   status: deployed
   updated: 2024-02-08 22:39:53.539243 -0800 PST
   ```

1. Verify that the Network Load Balancer is active.

   ```
   LIVY_NAMESPACE=<livy-ns>
   LIVY_APP_NAME=<livy-app-name>
   AWS_REGION=<AWS_REGION>
   
   # Get the NLB Endpoint URL
   NLB_ENDPOINT=$(kubectl --namespace $LIVY_NAMESPACE get svc -l "app.kubernetes.io/instance=$LIVY_APP_NAME,emr-containers.amazonaws.com/type=loadbalancer" -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}') 
   
   # Get all the load balancers in the account's region
   ELB_LIST=$(aws elbv2 describe-load-balancers --region $AWS_REGION)
   
   # Get the status of the NLB that matches the endpoint from the Kubernetes service
   NLB_STATUS=$(echo $ELB_LIST | grep -A 8 "\"DNSName\": \"$NLB_ENDPOINT\"" | awk '/Code/{print $2}/}/' | tr -d '"},\n')
   echo $NLB_STATUS
   ```

1. Now verify that the target group in the Network Load Balancer is healthy.

   ```
   LIVY_NAMESPACE=<livy-ns>
   LIVY_APP_NAME=<livy-app-name>
   AWS_REGION=<AWS_REGION>
   
   # Get the NLB endpoint
   NLB_ENDPOINT=$(kubectl --namespace $LIVY_NAMESPACE get svc -l "app.kubernetes.io/instance=$LIVY_APP_NAME,emr-containers.amazonaws.com/type=loadbalancer" -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}') 
   
   # Get all the load balancers in the account's region
   ELB_LIST=$(aws elbv2 describe-load-balancers --region $AWS_REGION)
   
   # Get the NLB ARN from the NLB endpoint
   NLB_ARN=$(echo $ELB_LIST | grep -B 1 "\"DNSName\": \"$NLB_ENDPOINT\"" | awk '/"LoadBalancerArn":/,/"/'| awk '/:/{print $2}' | tr -d \",)
   
   # Get the target group from the NLB. Livy setup only deploys 1 target group
   TARGET_GROUP_ARN=$(aws elbv2 describe-target-groups --load-balancer-arn $NLB_ARN --region $AWS_REGION | awk '/"TargetGroupArn":/,/"/'| awk '/:/{print $2}' | tr -d \",)
   
   # Get health of target group
   aws elbv2 describe-target-health --target-group-arn $TARGET_GROUP_ARN
   ```

   The following is sample output that shows the status of the target group:

   ```
   {
       "TargetHealthDescriptions": [
           {
               "Target": {
                   "Id": "<target IP>",
                   "Port": 8998,
                   "AvailabilityZone": "us-west-2d"
               },
               "HealthCheckPort": "8998",
               "TargetHealth": {
                   "State": "healthy"
               }
           }
       ]
   }
   ```

   Once the status of your NLB becomes `active` and your target group is `healthy`, you can continue. It might take a few minutes.

1. Retrieve the Livy endpoint from the Helm installation. Whether or not your Livy endpoint is secure depends on whether you enabled SSL.

   ```
   LIVY_NAMESPACE=<livy-ns>
   LIVY_APP_NAME=<livy-app-name>
   LIVY_ENDPOINT=$(kubectl get svc -n $LIVY_NAMESPACE -l app.kubernetes.io/instance=$LIVY_APP_NAME,emr-containers.amazonaws.com/type=loadbalancer -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}' | awk '{printf "%s:8998\n", $0}')
   echo "$LIVY_ENDPOINT"
   ```

1. Retrieve the Spark service account from the Helm installation.

   ```
   SPARK_NAMESPACE=<spark-ns>
   LIVY_APP_NAME=<livy-app-name>
   SPARK_SERVICE_ACCOUNT=$(kubectl --namespace $SPARK_NAMESPACE get sa -l "app.kubernetes.io/instance=$LIVY_APP_NAME" -o jsonpath='{.items[0].metadata.name}')
   echo "$SPARK_SERVICE_ACCOUNT"
   ```

   You should see something similar to the following output:

   ```
   emr-containers-sa-spark-livy
   ```

1. If you set `loadbalancer.internal=false` to enable access from outside of your VPC, create an Amazon EC2 instance and make sure the Network Load Balancer allows network traffic coming from that EC2 instance. You must do so for the instance to have access to your Livy endpoint. For more information about securely exposing your endpoint outside of your VPC, see [Setting up a secure Apache Livy endpoint with TLS/SSL](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-secure-endpoint.html).

1. Installing Livy creates the service account `emr-containers-sa-spark` to run Spark applications. If your Spark application uses any AWS resources such as Amazon S3, or calls AWS API or CLI operations, you must link an IAM role with the necessary permissions to your Spark service account. For more information, see [Setting up access permissions with IAM roles for service accounts (IRSA)](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-irsa.html).
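The NLB status and target-group health checks from the verification steps above reduce to a single readiness condition. The following is a minimal sketch of that condition; the function and its inputs are illustrative, not part of the AWS CLI or any SDK.

```python
import json

def livy_nlb_ready(nlb_state, target_states):
    """Return True when the NLB is active and every registered target is healthy."""
    return (
        nlb_state == "active"
        and bool(target_states)
        and all(s == "healthy" for s in target_states)
    )

# Example: parse output shaped like the describe-target-health sample above.
describe_target_health = json.loads("""
{
    "TargetHealthDescriptions": [
        {"Target": {"Id": "10.0.0.1", "Port": 8998}, "TargetHealth": {"State": "healthy"}}
    ]
}
""")
states = [d["TargetHealth"]["State"] for d in describe_target_health["TargetHealthDescriptions"]]
print(livy_nlb_ready("active", states))  # → True
```

You could poll this condition in a loop, sleeping between attempts, instead of rerunning the two CLI checks by hand.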

Apache Livy supports additional configurations that you can set when you install Livy. For more information, see [Installation properties for Apache Livy on Amazon EMR on EKS releases](job-runs-apache-livy-installation-properties.md).

# Running a Spark application with Apache Livy for Amazon EMR on EKS
Running a Spark application

Before you can run a Spark application with Apache Livy, make sure that you have completed the steps in [Setting up Apache Livy for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-setup.html) and [Getting started with Apache Livy for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-install.html).

You can use Apache Livy to run two types of applications:
+ Batch sessions – a type of Livy workload to submit Spark batch jobs.
+ Interactive sessions – a type of Livy workload that provides a programmatic and visual interface to run Spark queries.

**Note**  
Driver and executor pods from different sessions can communicate with each other. Namespaces don't guarantee any security between pods. Kubernetes doesn't allow selective permissions on a subset of pods inside a given namespace.

## Running batch sessions


To submit a batch job, use the following command.

```
curl -s -k -H 'Content-Type: application/json' -X POST \
      -d '{
            "name": "my-session",
            "file": "entryPoint_location (S3 or local)",
            "args": ["argument1", "argument2", ...],
            "conf": {
                "spark.kubernetes.namespace": "<spark-namespace>",
                "spark.kubernetes.container.image": "public.ecr.aws/emr-on-eks/spark/emr-7.12.0:latest",
                "spark.kubernetes.authenticate.driver.serviceAccountName": "<spark-service-account>"
            }
          }' <livy-endpoint>/batches
```

To monitor your batch job, use the following command.

```
curl -s -k -H 'Content-Type: application/json' -X GET <livy-endpoint>/batches/my-session
```
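The request body in the `curl` command above is plain JSON, so you can also build and submit it programmatically. The following is a minimal sketch; the bucket, file, and argument values are hypothetical placeholders, and the helper is not part of any AWS or Livy client library.

```python
import json

def livy_batch_payload(name, file, args=None, conf=None):
    """Build the JSON body for a POST to <livy-endpoint>/batches."""
    payload = {"name": name, "file": file}
    if args:
        payload["args"] = list(args)
    if conf:
        payload["conf"] = dict(conf)
    return payload

# Hypothetical values mirroring the curl example above.
payload = livy_batch_payload(
    "my-session",
    "s3://amzn-s3-demo-bucket/spark-examples.jar",
    args=["20"],
    conf={
        "spark.kubernetes.namespace": "spark-ns",
        "spark.kubernetes.container.image": "public.ecr.aws/emr-on-eks/spark/emr-7.12.0:latest",
        "spark.kubernetes.authenticate.driver.serviceAccountName": "emr-containers-sa-spark-livy",
    },
)
print(json.dumps(payload, indent=2))
```

You would POST this body to `<livy-endpoint>/batches` with a `Content-Type: application/json` header, exactly as the `curl` command does.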

## Running interactive sessions


To run interactive sessions with Apache Livy, follow these steps.

1. Make sure you have access to either a self-hosted or a managed Jupyter notebook, such as a SageMaker AI Jupyter notebook. Your Jupyter notebook must have [sparkmagic](https://github.com/jupyter-incubator/sparkmagic/blob/master/README.md) installed.

1. Create an Amazon S3 bucket for the `spark.kubernetes.file.upload.path` Spark configuration. Make sure that the Spark service account has read and write access to the bucket. For more details on how to configure your Spark service account, see [Setting up access permissions with IAM roles for service accounts (IRSA)](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-irsa.html).

1. Load sparkmagic in the Jupyter notebook with the command `%load_ext sparkmagic.magics`.

1. Run the command `%manage_spark` to set up your Livy endpoint with the Jupyter notebook. Choose the **Add Endpoints** tab, choose the configured auth type, add the Livy endpoint to the notebook, and then choose **Add endpoint**.

1. Run `%manage_spark` again to create the Spark context, and then go to the **Create session** tab. Choose the Livy endpoint, specify a unique session name, choose a language, and then add the following properties.

   ```
   {
     "conf": {
       "spark.kubernetes.namespace": "livy-namespace",
       "spark.kubernetes.container.image": "public.ecr.aws/emr-on-eks/spark/emr-7.12.0:latest",
       "spark.kubernetes.authenticate.driver.serviceAccountName": "<spark-service-account>", 
       "spark.kubernetes.file.upload.path": "<URI_TO_S3_LOCATION>"
     }
   }
   ```

1. Submit the application and wait for it to create the Spark context.

1. To monitor the status of the interactive session, run the following command.

   ```
   curl -s -k -H 'Content-Type: application/json' -X GET <livy-endpoint>/sessions/my-interactive-session
   ```
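When you poll a session this way, the response's `state` field tells you whether the session is ready for statements, still starting, or finished. The following sketch classifies the standard Livy session states; the helper itself is illustrative and not part of any AWS or Livy client library.

```python
# Standard Livy session states: a session is usable once it reaches "idle",
# and is done (successfully or not) once it reaches a terminal state.
READY_STATES = {"idle"}
TERMINAL_STATES = {"error", "dead", "killed", "success"}

def classify_state(state):
    """Map a Livy session state to "ready", "finished", or "pending"."""
    if state in READY_STATES:
        return "ready"
    if state in TERMINAL_STATES:
        return "finished"
    return "pending"

print(classify_state("starting"))  # → pending
```

A poll loop would call the `GET .../sessions/<name>` endpoint above, pass the returned `state` to this helper, and keep waiting while the result is `"pending"`.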

## Monitoring Spark applications


To monitor the progress of your Spark applications with the Livy UI, use the link `http://<livy-endpoint>/ui`.

# Uninstalling Apache Livy with Amazon EMR on EKS
Uninstalling

Follow these steps to uninstall Apache Livy.

1. Delete the Livy installation by using your application name and namespace. In this example, the application name is `livy-demo` and the namespace is `livy-ns`.

   ```
   helm uninstall livy-demo -n livy-ns
   ```

1. When you uninstall, Amazon EMR on EKS deletes the Kubernetes service for Livy, the AWS load balancers, and the target groups that were created during installation. Deleting these resources can take a few minutes. Make sure that they are deleted before you install Livy in the namespace again.

1. Delete the Spark namespace.

   ```
   kubectl delete namespace spark-ns
   ```

# Security for Apache Livy with Amazon EMR on EKS
Security

See the following topics to learn more about configuring security for Apache Livy with Amazon EMR on EKS. These options include transport-layer security (TLS), role-based access control (RBAC), which grants access based on a user's role within an organization, and IAM roles for service accounts, which grant access to AWS resources based on the permissions attached to the role.

**Topics**
+ [

# Setting up a secure Apache Livy endpoint with TLS/SSL
](job-runs-apache-livy-secure-endpoint.md)
+ [

# Setting up the Apache Livy and Spark application permissions with role-based access control (RBAC)
](job-runs-apache-livy-rbac.md)
+ [

# Setting up access permissions with IAM roles for service accounts (IRSA)
](job-runs-apache-livy-irsa.md)

# Setting up a secure Apache Livy endpoint with TLS/SSL


See the following sections to learn more about setting up Apache Livy for Amazon EMR on EKS with end-to-end TLS and SSL encryption.

## Setting up TLS and SSL encryption


To set up SSL encryption on your Apache Livy endpoint, follow these steps.
+ [Install the Secrets Store CSI Driver and AWS Secrets and Configuration Provider (ASCP)](https://docs.aws.amazon.com/secretsmanager/latest/userguide/integrating_csi_driver.html) – the Secrets Store CSI Driver and ASCP securely store Livy's JKS certificates and passwords that the Livy server pod needs to enable SSL. You can also install just the Secrets Store CSI Driver and use any other supported secrets provider.
+ [Create an ACM certificate](https://docs.aws.amazon.com/acm/latest/userguide/gs-acm-request-public.html) – this certificate is required to secure the connection between the client and the ALB endpoint.
+ Set up a JKS certificate, key password, and keystore password for AWS Secrets Manager – required to secure the connection between the ALB endpoint and the Livy server.
+ Add permissions to the Livy service account to retrieve secrets from AWS Secrets Manager – the Livy server needs these permissions to retrieve secrets from ASCP and add the Livy configurations to secure the Livy server. To add IAM permissions to a service account, see Setting up access permissions with IAM roles for service accounts (IRSA).

### Setting up a JKS certificate with a key and a keystore password for AWS Secrets Manager


Follow these steps to set up a JKS certificate with a key and a keystore password.

1. Generate a keystore file for the Livy server.

   ```
   keytool -genkey -alias <host> -keyalg RSA -keysize 2048 -dname CN=<host>,OU=hw,O=hw,L=<your_location>,ST=<state>,C=<country> -keypass <keyPassword> -keystore <keystore_file> -storepass <storePassword> -validity 3650
   ```

1. Create a certificate.

   ```
   keytool -export -alias <host> -keystore mykeystore.jks -rfc -file mycertificate.cert -storepass <storePassword>
   ```

1. Create a truststore file.

   ```
   keytool -import -noprompt -alias <host> -file <cert_file> -keystore <truststore_file> -storepass <truststorePassword>
   ```

1. Save the JKS certificate in AWS Secrets Manager. Replace `livy-jks-secret` with your secret and `fileb://mykeystore.jks` with the path to your keystore JKS certificate.

   ```
   aws secretsmanager create-secret \ 
   --name livy-jks-secret \
   --description "My Livy keystore JKS secret" \
   --secret-binary fileb://mykeystore.jks
   ```

1. Save the keystore and key passwords in Secrets Manager as a separate secret. Make sure to use your own parameters.

   ```
   aws secretsmanager create-secret \
   --name livy-passwords \
   --description "My Livy key and keystore password secret" \
   --secret-string "{\"keyPassword\":\"<test-key-password>\",\"keyStorePassword\":\"<test-key-store-password>\"}"
   ```

1. Create a Livy server namespace with the following command.

   ```
   kubectl create ns <livy-ns>
   ```

1. Create the `SecretProviderClass` object for the Livy server that contains the JKS certificate and the passwords.

   ```
   cat >livy-secret-provider-class.yaml << EOF
   apiVersion: secrets-store.csi.x-k8s.io/v1
   kind: SecretProviderClass
   metadata:
     name: aws-secrets
   spec:
     provider: aws
     parameters:
       objects: |
           - objectName: "livy-jks-secret"
             objectType: "secretsmanager"
           - objectName: "livy-passwords"
             objectType: "secretsmanager"
                        
   EOF
   kubectl apply -f livy-secret-provider-class.yaml -n <livy-ns>
   ```

## Getting started with SSL-enabled Apache Livy


To install an SSL-enabled Livy server, you must set up the Livy service account so that it has access to the keystore and key password secrets in AWS Secrets Manager.

1. Create the Livy server namespace.

   ```
   kubectl create namespace <livy-ns>
   ```

1. Set up the Livy service account to have access to the secrets in Secrets Manager. For more information about setting up IRSA, see [Setting up IRSA while installing Apache Livy](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-irsa.html#job-runs-apache-livy-irsa).

1. Authenticate your Helm client to the Amazon ECR registry.

   ```
   aws ecr get-login-password --region <region-id> | helm registry login \
   --username AWS \
   --password-stdin <ECR-registry-account>.dkr.ecr.<region-id>.amazonaws.com
   ```

1. Install Livy. For the Helm chart `--version` parameter, use your Amazon EMR release label, such as `7.12.0`. You must also replace the Amazon ECR registry account ID and Region ID with your own IDs. You can find the corresponding `ECR-registry-account` value for your AWS Region from [Amazon ECR registry accounts by Region](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/docker-custom-images-tag.html#docker-custom-images-ECR).

   ```
   helm install <livy-app-name> \
     oci://895885662937.dkr.ecr.region-id.amazonaws.com/livy \
     --version 7.12.0 \
     --namespace <livy-ns> \
     --set image=ECR-registry-account.dkr.ecr.region-id.amazonaws.com/livy/emr-7.12.0:latest \
     --set sparkNamespace=<spark-ns> \
     --set ssl.enabled=true \
     --set ssl.CertificateArn=<livy-acm-certificate-arn> \
     --set ssl.secretProviderClassName=aws-secrets \
     --set ssl.keyStoreObjectName=livy-jks-secret \
     --set ssl.keyPasswordsObjectName=livy-passwords \
     --create-namespace
   ```

1. Continue from step 5 of [Installing Apache Livy on Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-setup.html#job-runs-apache-livy-install).

# Setting up the Apache Livy and Spark application permissions with role-based access control (RBAC)
Setting up the Spark application with RBAC

To deploy Livy, Amazon EMR on EKS creates a server service account and role and a Spark service account and role. These roles must have the necessary RBAC permissions to finish setup and run Spark applications.

**RBAC permissions for the server service account and role**

Amazon EMR on EKS creates the Livy server service account and role to manage Livy sessions for Spark jobs and routing traffic to and from the ingress and other resources.

The default name for this service account is `emr-containers-sa-livy`. It must have the following permissions.

```
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  verbs:
  - "get"
- apiGroups:
  - ""
  resources:
  - "serviceaccounts"
  - "services"
  - "configmaps"
  - "events"
  - "pods"
  - "pods/log"
  verbs:
  - "get"
  - "list"
  - "watch"
  - "describe"
  - "create"
  - "edit"
  - "delete"
  - "deletecollection"
  - "annotate"
  - "patch"
  - "label"
- apiGroups:
  - ""
  resources:
  - "secrets"
  verbs:
  - "create"
  - "patch"
  - "delete"
  - "watch"
- apiGroups:
  - ""
  resources:
  - "persistentvolumeclaims"
  verbs:
  - "get"
  - "list"
  - "watch"
  - "describe"
  - "create"
  - "edit"
  - "delete"
  - "annotate"
  - "patch"
  - "label"
```

**RBAC permissions for the Spark service account and role**

A Spark driver pod needs a Kubernetes service account in the same namespace as the pod. This service account needs permissions to manage executor pods and any resources required by the driver pod. Unless the default service account in the namespace has the required permissions, the driver fails and exits. The following RBAC permissions are required.

```
rules:
- apiGroups:
  - ""
  - "batch"
  - "extensions"
  - "apps"
  resources:
  - "configmaps"
  - "serviceaccounts"
  - "events"
  - "pods"
  - "pods/exec"
  - "pods/log"
  - "pods/portforward"
  - "secrets"
  - "services"
  - "persistentvolumeclaims"
  - "statefulsets"
  verbs:
  - "create"
  - "delete"
  - "get"
  - "list"
  - "patch"
  - "update"
  - "watch"
  - "describe"
  - "edit"
  - "deletecollection"
  - "label"
```

# Setting up access permissions with IAM roles for service accounts (IRSA)
IAM roles for service accounts (IRSA)

By default, the Livy server and the Spark application's driver and executors don't have access to AWS resources. The server service account and the Spark service account control access to AWS resources for the Livy server and the Spark application's pods. To grant access, you must map each service account to an IAM role that has the necessary AWS permissions.

You can set up IRSA mapping before you install Apache Livy, during the installation, or after you finish the installation.

## Setting up IRSA while installing Apache Livy (for server service account)


**Note**  
This mapping is supported only for the server service account.

1. Make sure that you have finished [setting up Apache Livy for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-setup.html) and are in the middle of [installing Apache Livy with Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-install.html). 

1. Create a Kubernetes namespace for the Livy server. In this example, the name of the namespace is `livy-ns`.

1. Create an IAM policy that includes the permissions to the AWS services that your pods need to access. The following example creates an IAM policy that allows reading the Spark entry point objects from an Amazon S3 bucket.

   ```
   cat >my-policy.json <<EOF
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": "s3:GetObject",
               "Resource": "arn:aws:s3:::my-spark-entrypoint-bucket/*"
           }
       ]
   }
   EOF
   
   aws iam create-policy --policy-name my-policy --policy-document file://my-policy.json
   ```

1. Use the following command to set your AWS account ID to a variable.

   ```
   account_id=$(aws sts get-caller-identity --query "Account" --output text)
   ```

1. Set the OpenID Connect (OIDC) identity provider of your cluster to an environment variable.

   ```
   oidc_provider=$(aws eks describe-cluster --name my-cluster --region $AWS_REGION --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
   ```

1. Set variables for the namespace and name of the service account. Be sure to use your own values.

   ```
   export namespace=default
   export service_account=my-service-account
   ```

1. Create a trust policy file with the following command. To grant all service accounts within the namespace access to the role, copy the following command, replace `StringEquals` with `StringLike`, and replace `$service_account` with `*`.

   ```
   cat >trust-relationship.json <<EOF
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::$account_id:oidc-provider/$oidc_provider"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "$oidc_provider:aud": "sts.amazonaws.com",
             "$oidc_provider:sub": "system:serviceaccount:$namespace:$service_account"
           }
         }
       }
     ]
   }
   EOF
   ```

1. Create the role.

   ```
   aws iam create-role --role-name my-role --assume-role-policy-document file://trust-relationship.json --description "my-role-description"
   ```

1. Use the following Helm install command to set the `serviceAccount.executionRoleArn` to map IRSA. The following is an example of the Helm install command. You can find the corresponding `ECR-registry-account` value for your AWS Region from [Amazon ECR registry accounts by Region](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/docker-custom-images-tag.html#docker-custom-images-ECR).

   ```
   helm install livy-demo \
     oci://895885662937.dkr.ecr.us-west-2.amazonaws.com/livy \
     --version 7.12.0 \
     --namespace livy-ns \
     --set image=ECR-registry-account.dkr.ecr.region-id.amazonaws.com/livy/emr-7.12.0:latest \
     --set sparkNamespace=spark-ns \
     --set serviceAccount.executionRoleArn=arn:aws:iam::123456789012:role/my-role
   ```
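The trust policy written to `trust-relationship.json` in the steps above is plain JSON, so it can also be built programmatically. The following is a minimal sketch; the account ID, OIDC provider, and service account values are placeholders, and the helper is not part of any AWS SDK.

```python
import json

def irsa_trust_policy(account_id, oidc_provider, namespace, service_account):
    """Build the IRSA trust policy that lets one service account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
                },
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The sub condition pins the role to one namespace and
                    # service account; StringLike with "*" would widen it.
                    "StringEquals": {
                        f"{oidc_provider}:aud": "sts.amazonaws.com",
                        f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                    }
                },
            }
        ],
    }

policy = irsa_trust_policy(
    "111122223333",
    "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE",
    "livy-ns",
    "emr-containers-sa-livy",
)
print(json.dumps(policy, indent=2))
```

You could write this JSON to a file and pass it to `aws iam create-role --assume-role-policy-document`, as in the steps above.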

## Mapping IRSA to a Spark service account


Before you map IRSA to a Spark service account, make sure that you have completed the following items:
+ Make sure that you have finished [setting up Apache Livy for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-setup.html) and are in the middle of [installing Apache Livy with Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-install.html). 
+ You must have an existing IAM OpenID Connect (OIDC) provider for your cluster. To check whether you already have one, or to create one, see [Create an IAM OIDC provider for your cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html).
+ Make sure that you have version 0.171.0 or later of the `eksctl` CLI installed, or use AWS CloudShell. To install or update `eksctl`, see [Installation](https://eksctl.io/installation/) in the `eksctl` documentation.

Follow these steps to map IRSA to your Spark service account:

1. Use the following command to get the Spark service account.

   ```
   SPARK_NAMESPACE=<spark-ns>
   LIVY_APP_NAME=<livy-app-name>
   kubectl --namespace $SPARK_NAMESPACE describe sa -l "app.kubernetes.io/instance=$LIVY_APP_NAME" | awk '/^Name:/ {print $2}'
   ```

1. Set your variables for the namespace and name of the service account.

   ```
   export namespace=default
   export service_account=my-service-account
   ```

1. Use the following command to create a trust policy file for the IAM role. To give all service accounts within the namespace permission to use the role, replace `StringEquals` with `StringLike` and replace `$service_account` with `*`.

   ```
   cat >trust-relationship.json <<EOF
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::$account_id:oidc-provider/$oidc_provider"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "$oidc_provider:aud": "sts.amazonaws.com",
             "$oidc_provider:sub": "system:serviceaccount:$namespace:$service_account"
           }
         }
       }
     ]
   }
   EOF
   ```

1. Create the role.

   ```
   aws iam create-role --role-name my-role --assume-role-policy-document file://trust-relationship.json --description "my-role-description"
   ```

1. Map the Livy server or Spark service account with the following `eksctl` command. Make sure to use your own values.

   ```
   eksctl create iamserviceaccount --name spark-sa \
     --namespace spark-namespace --cluster livy-eks-cluster \
     --attach-role-arn arn:aws:iam::123456789012:role/my-role \
     --approve --override-existing-serviceaccounts
   ```
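
The trust policy in the preceding steps references `$account_id` and `$oidc_provider` without setting them. As a sketch, and assuming a configured AWS CLI, you can derive both values with the following commands before you create the trust policy file. The cluster name and Region are placeholders:

```
# AWS account ID of the caller.
account_id=$(aws sts get-caller-identity --query "Account" --output text)

# The cluster's OIDC issuer URL, with the https:// scheme stripped.
oidc_provider=$(aws eks describe-cluster --name livy-eks-cluster \
  --region us-west-2 --query "cluster.identity.oidc.issuer" \
  --output text | sed -e "s/^https:\/\///")
```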

# Installation properties for Apache Livy on Amazon EMR on EKS releases
Installation properties

Apache Livy installation allows you to select a version of the Livy Helm chart. The Helm chart offers a variety of properties to customize your installation and setup experience. These properties are supported for Amazon EMR on EKS releases 7.1.0 and higher.

**Topics**
+ [

## Amazon EMR 7.1.0 installation properties
](#job-runs-apache-livy-installation-properties-710)

## Amazon EMR 7.1.0 installation properties
7.1.0 properties

The following table describes all of the supported Livy properties. When installing Apache Livy, you can choose the Livy Helm chart version. To set a property during the installation, use the command `--set <property>=<value>`.


| Property | Description | Default | 
| --- | --- | --- | 
| image | The Amazon EMR release URI of the Livy server. This is a required configuration. | "" | 
| sparkNamespace | Namespace to run Livy Spark sessions. For example, specify "livy". This is a required configuration. | "" | 
| nameOverride | Provide a name to use instead of livy. The name is set as a label for all Livy resources. | "livy" | 
| fullnameOverride | Provide a name to use instead of the full names of resources. | "" | 
| ssl.enabled | Enables end-to-end SSL from Livy endpoint to Livy server. | FALSE | 
| ssl.certificateArn | If SSL is enabled, this is the ACM certificate ARN for the NLB created by the service. | "" | 
| ssl.secretProviderClassName | If SSL is enabled, this is the secret provider class name to secure NLB for the Livy server connection with SSL. | "" | 
| ssl.keyStoreObjectName | If SSL is enabled, the object name for the keystore certificate in the secret provider class. | "" | 
| ssl.keyPasswordsObjectName | If SSL is enabled, the object name for the secret that has the keystore and key password. | "" | 
| rbac.create | If true, creates RBAC resources. | FALSE | 
| serviceAccount.create | If true, creates a Livy service account. | TRUE | 
| serviceAccount.name | The name of the service account to use for Livy. If you don't set this property and create a service account, Amazon EMR on EKS automatically generates a name using the fullname override property. | "emr-containers-sa-livy" | 
| serviceAccount.executionRoleArn | The execution role ARN of the Livy service account. | "" | 
| sparkServiceAccount.create | If true, creates the Spark service account in .Release.Namespace. | TRUE | 
| sparkServiceAccount.name | The name of the service account to use for Spark. If you don't set this property and create a Spark service account, Amazon EMR on EKS automatically generates a name with the fullnameOverride property with -spark-livy suffix. | "emr-containers-sa-spark-livy" | 
| service.name | Name of the Livy service | "emr-containers-livy" | 
| service.annotations | Livy service annotations. | {} | 
| loadbalancer.enabled | Whether to create a load balancer for the Livy service used to expose the Livy endpoint outside of the Amazon EKS cluster. | FALSE | 
| loadbalancer.internal | Whether to configure the Livy endpoint as internal to the VPC or external. Setting this property to `FALSE` exposes the endpoint to sources outside of the VPC. We recommend securing your endpoint with TLS/SSL. For more information, see [Setting up TLS and SSL encryption](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-security.html#job-runs-apache-livy-security-tls). | FALSE | 
| imagePullSecrets | The list of imagePullSecret names to use to pull Livy image from private repositories. | [] | 
| resources | The resource requests and limits for Livy containers. | {} | 
| nodeSelector | The nodes on which to schedule Livy pods. | {} | 
| tolerations | A list containing the Livy pods tolerations to define. | [] | 
| affinity | The Livy pods affinity rules. | {} | 
| persistence.enabled | If true, enables persistence for session directories. | FALSE | 
| persistence.subPath | The PVC subpath to mount for session directories. | "" | 
| persistence.existingClaim | The PVC to use instead of creating a new one. | {} | 
| persistence.storageClass | The storage class to use. To define this parameter, use the format storageClassName: <storageClass>. Setting this parameter to "-" disables dynamic provisioning. If you set this parameter to null or don't specify anything, Amazon EMR on EKS doesn't set a storageClassName and uses the default provisioner. | "" | 
| persistence.accessMode | The PVC access mode. | ReadWriteOnce | 
| persistence.size | The PVC size. | 20Gi | 
| persistence.annotations | Additional annotations for the PVC. | {} | 
| env.* | Additional environment variables to set on the Livy container. For more information, see [Inputting your own Livy and Spark configurations while installing Livy](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-troubleshooting.html). | {} | 
| envFrom.* | Additional environment variables to set on Livy from a Kubernetes config map or secret. | [] | 
| livyConf.* | Additional livy.conf entries to set from a mounted Kubernetes config map or secret. | {} | 
| sparkDefaultsConf.* | Additional spark-defaults.conf entries to set from a mounted Kubernetes config map or secret. | {} | 

# Troubleshoot common environment-variable format errors


When you input Livy and Spark configurations, there are environment-variable formats that aren't supported and can cause errors. The procedure takes you through a series of steps to help ensure that you use correct formats.

**Inputting your own Livy and Spark configurations while installing Livy**

You can configure any Apache Livy or Apache Spark environment variable with the `env.*` Helm property. Follow the steps below to convert the example configuration `example.config.with-dash.withUppercase` to a supported environment variable format.

1. Replace each uppercase letter with `1` followed by the lowercase form of the letter. For example, `example.config.with-dash.withUppercase` becomes `example.config.with-dash.with1uppercase`.

1. Replace dashes (-) with 0. For example, `example.config.with-dash.with1uppercase` becomes `example.config.with0dash.with1uppercase`.

1. Replace dots (.) with underscores (_). For example, `example.config.with0dash.with1uppercase` becomes `example_config_with0dash_with1uppercase`.

1. Replace all lowercase letters with uppercase letters.

1. Add the prefix `LIVY_` to the variable name.

1. Use the variable when you install Livy through the Helm chart with the format `--set env.YOUR_VARIABLE_NAME.value=yourvalue`.

For example, to set the Livy and Spark configurations `livy.server.recovery.state-store = filesystem` and `spark.kubernetes.executor.podNamePrefix = my-prefix`, use these Helm properties:

```
--set env.LIVY_LIVY_SERVER_RECOVERY_STATE0STORE.value=filesystem
--set env.LIVY_SPARK_KUBERNETES_EXECUTOR_POD1NAME1PREFIX.value=my-prefix
```
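
The conversion steps can also be expressed as a small shell helper. This function is illustrative and not part of the Livy chart; it assumes GNU `sed` for the `\L` lowercase escape:

```
# Convert a Livy or Spark configuration key to the LIVY_ environment-variable format.
to_livy_env() {
  printf 'LIVY_%s\n' "$(printf '%s' "$1" \
    | sed -E 's/([A-Z])/1\L\1/g; s/-/0/g; s/\./_/g' \
    | tr '[:lower:]' '[:upper:]')"
}

to_livy_env livy.server.recovery.state-store
# LIVY_LIVY_SERVER_RECOVERY_STATE0STORE
```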

# Managing Amazon EMR on EKS job runs
Managing job runs

The following sections cover topics that help you manage your Amazon EMR on EKS job runs. These include configuring job run parameters when you use the AWS CLI, configuring how your log data is stored, running Spark SQL scripts to run queries, understanding job run states, and knowing how to monitor jobs. You can work through these topics, generally in order, if you want to set up and complete a job run to process data.

**Topics**
+ [

# Managing job runs with the AWS CLI
](emr-eks-jobs-CLI.md)
+ [

# Running Spark SQL scripts through the StartJobRun API
](emr-eks-jobs-spark-sql-parameters.md)
+ [

# Job run states
](emr-eks-jobs-states.md)
+ [

# Viewing jobs in the Amazon EMR console
](emr-eks-jobs-console.md)
+ [

# Common errors when running jobs
](emr-eks-jobs-error.md)

# Managing job runs with the AWS CLI
Manage with CLI

This topic covers how to manage job runs with the AWS Command Line Interface (AWS CLI). It goes into detail regarding properties, like security parameters, the driver, and various override settings. It also includes subtopics that cover various ways to configure logging.

**Topics**
+ [

## Options for configuring a job run
](#emr-eks-jobs-parameters)
+ [

# Configure a job run to use Amazon S3 logs
](emr-eks-jobs-s3.md)
+ [

# Configure a job run to use Amazon CloudWatch Logs
](emr-eks-jobs-cloudwatch.md)
+ [

## List job runs
](#emr-eks-jobs-list)
+ [

## Describe a job run
](#emr-eks-jobs-describe)
+ [

## Cancel a job run
](#emr-eks-jobs-cancel)

## Options for configuring a job run
Configure a job run

Use the following options to configure job run parameters:
+ `--execution-role-arn`: You must provide an IAM role that is used for running jobs. For more information, see [Using job execution roles with Amazon EMR on EKS](iam-execution-role.md). 
+ `--release-label`: You can deploy Amazon EMR on EKS with Amazon EMR versions 5.32.0 and 6.2.0 and later. Amazon EMR on EKS is not supported in previous Amazon EMR release versions. For more information, see [Amazon EMR on EKS releases](emr-eks-releases.md). 
+ `--job-driver`: Job driver is used to provide input on the main job. This is a union type field where you can only pass one of the values for the job type that you want to run. Supported job types include:
  + Spark submit jobs - Used to run a command through Spark submit. You can use this job type to run Scala, PySpark, SparkR, SparkSQL, and any other supported jobs through Spark submit. This job type has the following parameters:
    + Entrypoint - This is the HCFS (Hadoop compatible file system) reference to the main jar/py file you want to run.
    + EntryPointArguments - This is an array of arguments you want to pass to your main jar/py file. You should handle reading these parameters using your entrypoint code. Each argument in the array should be separated by a comma. EntryPointArguments cannot contain brackets or parentheses, such as (), {}, or [].
    + SparkSubmitParameters - These are the additional Spark parameters you want to send to the job. Use this parameter to override default Spark properties such as driver memory or number of executors, with arguments like --conf or --class. For additional information, see [Launching Applications with spark-submit](https://spark.apache.org/docs/latest/submitting-applications.html#launching-applications-with-spark-submit).
  + Spark SQL jobs - Used to run a SQL query file through Spark SQL. You can use this job type to run SparkSQL jobs. This job type has the following parameters:
    + Entrypoint - This is the HCFS (Hadoop compatible file system) reference to the SQL query file you want to run.

      For a list of additional Spark parameters you can use for a Spark SQL job, see [Running Spark SQL scripts through the StartJobRun API](emr-eks-jobs-spark-sql-parameters.md).
+ `--configuration-overrides`: You can override the default configurations for applications by supplying a configuration object. You can use a shorthand syntax to provide the configuration or you can reference the configuration object in a JSON file. Configuration objects consist of a classification, properties, and optional nested configurations. Properties consist of the settings you want to override in that file. You can specify multiple classifications for multiple applications in a single JSON object. The configuration classifications that are available vary by Amazon EMR release version. For a list of configuration classifications that are available for each release version of Amazon EMR, see [Amazon EMR on EKS releases](emr-eks-releases.md).

  If you pass the same configuration in an application override and in Spark submit parameters, the Spark submit parameters take precedence. The complete configuration priority list follows, in order of highest priority to lowest priority.
  + Configuration supplied when creating `SparkSession`.
  + Configuration supplied as part of `sparkSubmitParameters` using `--conf`.
  + Configuration provided as part of application overrides.
  + Optimized configurations chosen by Amazon EMR for the release.
  + Default open source configurations for the application.

  To monitor job runs using Amazon CloudWatch or Amazon S3, you must provide the corresponding configuration details. For more information, see [Configure a job run to use Amazon S3 logs](emr-eks-jobs-s3.md) and [Configure a job run to use Amazon CloudWatch Logs](emr-eks-jobs-cloudwatch.md). If the S3 bucket or CloudWatch log group does not exist, then Amazon EMR creates it before uploading logs to the bucket.
+ For an additional list of Kubernetes configuration options, see [Spark Properties on Kubernetes](https://spark.apache.org/docs/latest/running-on-kubernetes.html#configuration). 

  The following Spark configurations are not supported.
  + `spark.kubernetes.authenticate.driver.serviceAccountName`
  + `spark.kubernetes.authenticate.executor.serviceAccountName`
  + `spark.kubernetes.namespace`
  + `spark.kubernetes.driver.pod.name`
  + `spark.kubernetes.container.image.pullPolicy`
  + `spark.kubernetes.container.image`
**Note**  
You can use `spark.kubernetes.container.image` for customized Docker images. For more information, see [Customizing Docker images for Amazon EMR on EKS](docker-custom-images.md).
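
Putting these options together, a minimal request file for a Spark submit job might look like the following sketch. The job name, virtual cluster ID, role ARN, release label, and script path are placeholders:

```
{
  "name": "my-spark-job",
  "virtualClusterId": "123456",
  "executionRoleArn": "arn:aws:iam::123456789012:role/my-role",
  "releaseLabel": "emr-6.2.0-latest",
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "s3://amzn-s3-demo-bucket/scripts/pi.py",
      "entryPointArguments": ["60"],
      "sparkSubmitParameters": "--conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.driver.memory=2G"
    }
  }
}
```

You can then submit the job with `aws emr-containers start-job-run --cli-input-json file://./request.json`.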

# Configure a job run to use Amazon S3 logs
Use S3 logs

To be able to monitor the job progress and to troubleshoot failures, you must configure your jobs to send log information to Amazon S3, Amazon CloudWatch Logs, or both. This topic helps you get started publishing application logs to Amazon S3 on your jobs that are launched with Amazon EMR on EKS.

**S3 logs IAM policy**

Before your jobs can send log data to Amazon S3, the following permissions must be included in the permissions policy for the job execution role. Replace *amzn-s3-demo-bucket* with the name of your logging bucket.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket",
        "arn:aws:s3:::amzn-s3-demo-bucket/*"
      ],
      "Sid": "AllowS3Putobject"
    }
  ]
}
```

------

**Note**  
Amazon EMR on EKS can also create an Amazon S3 bucket. If an Amazon S3 bucket is not available, include the `"s3:CreateBucket"` permission in the IAM policy.

After you've given your execution role the proper permissions to send logs to Amazon S3, your log data are sent to the following Amazon S3 locations when `s3MonitoringConfiguration` is passed in the `monitoringConfiguration` section of a `start-job-run` request, as shown in [Managing job runs with the AWS CLI](emr-eks-jobs-CLI.md).
+ Submitter Logs - /*logUri*/*virtual-cluster-id*/jobs/*job-id*/containers/*pod-name*/(stderr.gz/stdout.gz)
+ Driver Logs - /*logUri*/*virtual-cluster-id*/jobs/*job-id*/containers/*spark-application-id*/spark-*job-id*-driver/(stderr.gz/stdout.gz)
+ Executor Logs - /*logUri*/*virtual-cluster-id*/jobs/*job-id*/containers/*spark-application-id*/*executor-pod-name*/(stderr.gz/stdout.gz)
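
For example, the `monitoringConfiguration` section of a `start-job-run` request that publishes logs to the locations above might look like the following sketch (the bucket name is a placeholder):

```
"configurationOverrides": {
  "monitoringConfiguration": {
    "s3MonitoringConfiguration": {
      "logUri": "s3://amzn-s3-demo-bucket/logs/"
    }
  }
}
```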

# Configure a job run to use Amazon CloudWatch Logs
Use CloudWatch Logs

To monitor job progress and to troubleshoot failures, you must configure your jobs to send log information to Amazon S3, Amazon CloudWatch Logs, or both. This topic helps you get started using CloudWatch Logs on your jobs that are launched with Amazon EMR on EKS. For more information about CloudWatch Logs, see [Monitoring Log Files](https://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/WhatIsCloudWatchLogs.html) in the Amazon CloudWatch User Guide.

**CloudWatch Logs IAM policy**

For your jobs to send log data to CloudWatch Logs, the following permissions must be included in the permissions policy for the job execution role. Replace *my_log_group_name* and *my_log_stream_prefix* with the names of your CloudWatch log group and log stream prefix, respectively. Amazon EMR on EKS creates the log group and log stream if they do not exist, as long as the execution role ARN has the appropriate permissions.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "logs:CreateLogStream",
        "logs:DescribeLogGroups",
        "logs:DescribeLogStreams"
      ],
      "Resource": [
        "arn:aws:logs:*:*:*"
      ],
      "Sid": "AllowLOGSCreatelogstream"
    },
    {
      "Effect": "Allow",
      "Action": [
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:*:*:log-group:my_log_group_name:log-stream:my_log_stream_prefix/*"
      ],
      "Sid": "AllowLOGSPutlogevents"
    }
  ]
}
```

------

**Note**  
Amazon EMR on EKS can also create a log group. If a log group does not exist, the IAM policy should include the `"logs:CreateLogGroup"` permission.

After you've given your execution role the proper permissions, your application sends its log data to CloudWatch Logs when `cloudWatchMonitoringConfiguration` is passed in the `monitoringConfiguration` section of a `start-job-run` request, as shown in [Managing job runs with the AWS CLI](emr-eks-jobs-CLI.md).

In the `StartJobRun` API, *log_group_name* is the log group name for CloudWatch, and *log_stream_prefix* is the log stream name prefix for CloudWatch. You can view and search these logs in the AWS Management Console.
+ Submitter logs - *logGroup*/*logStreamPrefix*/*virtual-cluster-id*/jobs/*job-id*/containers/*pod-name*/(stderr/stdout)
+ Driver logs - *logGroup*/*logStreamPrefix*/*virtual-cluster-id*/jobs/*job-id*/containers/*spark-application-id*/spark-*job-id*-driver/(stderr/stdout)
+ Executor logs - *logGroup*/*logStreamPrefix*/*virtual-cluster-id*/jobs/*job-id*/containers/*spark-application-id*/*executor-pod-name*/(stderr/stdout)
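
For example, the `monitoringConfiguration` section of a `start-job-run` request that publishes logs to CloudWatch Logs might look like the following sketch (the log group name and stream prefix are placeholders):

```
"configurationOverrides": {
  "monitoringConfiguration": {
    "cloudWatchMonitoringConfiguration": {
      "logGroupName": "my_log_group_name",
      "logStreamNamePrefix": "my_log_stream_prefix"
    }
  }
}
```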

## List job runs


You can run `list-job-runs` to show the states of job runs, as the following example demonstrates.

```
aws emr-containers list-job-runs --virtual-cluster-id <cluster-id>
```

## Describe a job run


You can run `describe-job-run` to get more details about the job, such as job state, state details, and job name, as the following example demonstrates. 

```
aws emr-containers describe-job-run --virtual-cluster-id cluster-id --id job-run-id
```

## Cancel a job run


You can run `cancel-job-run` to cancel running jobs, as the following example demonstrates.

```
aws emr-containers cancel-job-run --virtual-cluster-id cluster-id --id job-run-id
```

# Running Spark SQL scripts through the StartJobRun API
Run Spark SQL scripts

Amazon EMR on EKS releases 6.7.0 and higher include a Spark SQL job driver so that you can run Spark SQL scripts through the `StartJobRun` API. You can supply SQL entry-point files to directly run Spark SQL queries on Amazon EMR on EKS with the `StartJobRun` API, without any modifications to existing Spark SQL scripts. The following table lists the Spark parameters that you can send to a Spark SQL job through the `StartJobRun` API. Use these parameters to override default Spark properties.


| Option | Description | 
| --- | --- | 
|  --name NAME  | Application Name | 
| --jars JARS | Comma-separated list of jars to include on the driver and executor classpaths. | 
| --packages | Comma-separated list of Maven coordinates of jars to include on the driver and executor classpaths. | 
| --exclude-packages | Comma-separated list of groupId:artifactId pairs to exclude while resolving the dependencies provided in --packages, to avoid dependency conflicts. | 
| --repositories | Comma-separated list of additional remote repositories to search for the Maven coordinates given with --packages. | 
| --files FILES | Comma-separated list of files to be placed in the working directory of each executor. | 
| --conf PROP=VALUE | Spark configuration property. | 
| --properties-file FILE | Path to a file from which to load extra properties. | 
| --driver-memory MEM | Memory for driver. Default 1024MB. | 
| --driver-java-options | Extra Java options to pass to the driver. | 
| --driver-library-path | Extra library path entries to pass to the driver. | 
| --driver-class-path | Extra classpath entries to pass to the driver. | 
| --executor-memory MEM | Memory per executor. Default 1GB. | 
| --driver-cores NUM | Number of cores used by the driver. | 
| --total-executor-cores NUM | Total cores for all executors. | 
| --executor-cores NUM | Number of cores used by each executor. | 
| --num-executors NUM | Number of executors to launch. | 
| -hivevar <key=value> | Variable substitution to apply to Hive commands, for example, -hivevar A=B | 
| -hiveconf <property=value> | Value to use for the given property. | 

For a Spark SQL job, create a start-job-run-request.json file and specify the required parameters for your job run, as in the following example:

```
{
  "name": "myjob", 
  "virtualClusterId": "123456",  
  "executionRoleArn": "iam_role_name_for_job_execution", 
  "releaseLabel": "emr-6.7.0-latest", 
  "jobDriver": {
    "sparkSqlJobDriver": {
      "entryPoint": "entryPoint_location",
       "sparkSqlParameters": "--conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1"
    }
  }, 
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults", 
        "properties": {
          "spark.driver.memory":"2G"
         }
      }
    ], 
    "monitoringConfiguration": {
      "persistentAppUI": "ENABLED", 
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "my_log_group", 
        "logStreamNamePrefix": "log_stream_prefix"
      }, 
      "s3MonitoringConfiguration": {
        "logUri": "s3://my_s3_log_location"
      }
    }
  }
}
```

# Job run states


When you submit a job run to an Amazon EMR on EKS job queue, the job run enters the `PENDING` state. It then passes through the following states until it succeeds (exits with code `0`) or fails (exits with a non-zero code). 

Job runs can have the following states:
+ `PENDING` ‐ The initial job state when the job run is submitted to Amazon EMR on EKS. The job is waiting to be submitted to the virtual cluster, and Amazon EMR on EKS is working on submitting this job.
+ `SUBMITTED` ‐ A job run that has been successfully submitted to the virtual cluster. The cluster scheduler then tries to run this job on the cluster.
+ `RUNNING` ‐ A job run that is running in the virtual cluster. In Spark applications, this means that the Spark driver process is in the `running` state.
+ `FAILED` ‐ A job run that failed to be submitted to the virtual cluster or that completed unsuccessfully. Look at `StateDetails` and `FailureReason` to find additional information about this job failure.
+ `COMPLETED` ‐ A job run that has completed successfully.
+ `CANCEL_PENDING` ‐ A job run has been requested for cancellation. Amazon EMR on EKS is trying to cancel the job on the virtual cluster.
+ `CANCELLED` ‐ A job run that was cancelled successfully.
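
When you poll job state with `describe-job-run`, it helps to distinguish terminal states from in-progress states. The following shell helper is a sketch; the commented polling loop is illustrative and assumes a configured AWS CLI:

```
# Return 0 if a job run state is terminal, 1 if in progress, 2 if unknown.
is_terminal_state() {
  case "$1" in
    COMPLETED|FAILED|CANCELLED) return 0 ;;
    PENDING|SUBMITTED|RUNNING|CANCEL_PENDING) return 1 ;;
    *) return 2 ;;
  esac
}

# Illustrative polling loop:
# until state=$(aws emr-containers describe-job-run \
#       --virtual-cluster-id "$CLUSTER_ID" --id "$JOB_ID" \
#       --query 'jobRun.state' --output text) && is_terminal_state "$state"; do
#   sleep 30
# done
```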

# Viewing jobs in the Amazon EMR console
View jobs in the console

Job run data is available to view, so you can monitor each job as it passes through the states. To view jobs in the Amazon EMR console, perform the following steps.

1. In the left navigation pane of the Amazon EMR console, under **Amazon EMR on EKS**, choose **Virtual clusters**.

1. From the list of virtual clusters, select the virtual cluster for which you want to view jobs.

1. On the **Job runs** table, select **View logs** to view the details of a job run.

**Note**  
Support for the one-click experience is enabled by default. It can be turned off by setting `persistentAppUI` to `DISABLED` in `monitoringConfiguration` during job submission. For more information, see [View Persistent Application User Interfaces](https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html).

# Common errors when running jobs
Common job run errors

The following errors may occur when you call the `StartJobRun` API. The table lists each error and provides mitigation steps so that you can address issues quickly.


| Error Message | Error Condition | Recommended Next Step | 
| --- | --- | --- | 
|  error: argument --*argument* is required  | Required parameters are missing. | Add the missing arguments to the API request. | 
| An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: ARN is not authorized to perform: emr-containers:StartJobRun | Execution role is missing. | See [Using job execution roles with Amazon EMR on EKS](iam-execution-role.md).  | 
|  An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: *ARN* is not authorized to perform: emr-containers:StartJobRun  |  Caller doesn't have permission to the execution role [valid / not valid format] via condition keys.  | See [Using job execution roles with Amazon EMR on EKS](iam-execution-role.md).  | 
|  An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: *ARN* is not authorized to perform: emr-containers:StartJobRun  |  Job submitter and Execution role ARN are from different accounts.  | Ensure that job submitter and execution role ARN are from the same AWS account. | 
|  1 validation error detected: Value *Role* at 'executionRoleArn' failed to satisfy the ARN regular expression pattern: ^arn:(aws[a-zA-Z0-9-]*):iam::(\d{12})?:(role((\u002F)|(\u002F[\u0021-\u007F]+\u002F))[\w+=,.@-]+)  |  Caller has permissions for the execution role via condition keys, but the role does not satisfy the constraints of ARN format.  | Provide the execution role following the ARN format. See [Using job execution roles with Amazon EMR on EKS](iam-execution-role.md).  | 
|  An error occurred (ResourceNotFoundException) when calling the StartJobRun operation: Virtual cluster *Virtual Cluster ID* doesn't exist.  |  Virtual cluster ID is not found.  | Provide a virtual cluster ID registered with Amazon EMR on EKS. | 
|  An error occurred (ValidationException) when calling the StartJobRun operation: Virtual cluster state *state* is not valid to create resource JobRun.  |  Virtual cluster is not ready to execute job.  | See [Virtual cluster states](virtual-cluster.md#virtual-cluster-states).  | 
|  An error occurred (ResourceNotFoundException) when calling the StartJobRun operation: Release *RELEASE* doesn't exist.  |  The release specified in job submission is incorrect.  | See [Amazon EMR on EKS releases](emr-eks-releases.md).  | 
|  An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: *ARN* is not authorized to perform: emr-containers:StartJobRun on resource: *ARN* with an explicit deny. An error occurred (AccessDeniedException) when calling the StartJobRun operation: User: *ARN* is not authorized to perform: emr-containers:StartJobRun on resource: *ARN*  | User is not authorized to call StartJobRun. | See [Using job execution roles with Amazon EMR on EKS](iam-execution-role.md).  | 
|  An error occurred (ValidationException) when calling the StartJobRun operation: configurationOverrides.monitoringConfiguration.s3MonitoringConfiguration.logUri failed to satisfy constraint : %s  |  S3 path URI syntax is not valid.  | logUri should be in the format of s3://...  | 

The following errors may occur when you call the `DescribeJobRun` API before the job runs.


| Error Message | Error Condition | Recommended Next Step | 
| --- | --- | --- | 
|  stateDetails: JobRun submission failed. Classification *classification* not supported. failureReason: VALIDATION_ERROR state: FAILED  | Parameters in StartJobRun are not valid. | See [Amazon EMR on EKS releases](emr-eks-releases.md).  | 
|  stateDetails: Cluster *EKS Cluster ID* does not exist. failureReason: CLUSTER_UNAVAILABLE state: FAILED  | The EKS cluster is not available. | Check if the EKS cluster exists and has the right permissions. For more information, see [Setting up Amazon EMR on EKS](setting-up.md). | 
|  stateDetails: Cluster *EKS Cluster ID* does not have sufficient permissions. failureReason: CLUSTER_UNAVAILABLE state: FAILED  |  Amazon EMR does not have permissions to access the EKS cluster.  | Verify that permissions are set up for Amazon EMR on the registered namespace. For more information, see [Setting up Amazon EMR on EKS](setting-up.md). | 
|  stateDetails: Cluster *EKS Cluster ID* is currently not reachable. failureReason: CLUSTER_UNAVAILABLE state: FAILED  |  The EKS cluster is not reachable.  | Check if the EKS cluster exists and has the right permissions. For more information, see [Setting up Amazon EMR on EKS](setting-up.md). | 
|  stateDetails: JobRun submission failed due to an internal error. failureReason: INTERNAL_ERROR state: FAILED  |  An internal error has occurred with the EKS cluster.  | N/A | 
|  stateDetails: Cluster *EKS Cluster ID* does not have sufficient resources. failureReason: USER_ERROR state: FAILED  |  There are insufficient resources in the EKS cluster to run the job.  | Add more capacity to the EKS node group or set up the EKS Cluster Autoscaler. For more information, see [Cluster Autoscaler](https://docs.aws.amazon.com/eks/latest/userguide/cluster-autoscaler.html). | 

The following errors may occur when you call the `DescribeJobRun` API after the job runs.


| Error Message | Error Condition | Recommended Next Step | 
| --- | --- | --- | 
|  stateDetails: Trouble monitoring your JobRun. Cluster *EKS Cluster ID* does not exist. failureReason: CLUSTER_UNAVAILABLE state: FAILED  | The EKS cluster does not exist. | Check if the EKS cluster exists and has the right permissions. For more information, see [Setting up Amazon EMR on EKS](setting-up.md). | 
|  stateDetails: Trouble monitoring your JobRun. Cluster *EKS Cluster ID* does not have sufficient permissions. failureReason: CLUSTER_UNAVAILABLE state: FAILED  | Amazon EMR does not have permissions to access the EKS cluster. | Verify that permissions are set up for Amazon EMR on the registered namespace. For more information, see [Setting up Amazon EMR on EKS](setting-up.md). | 
|  stateDetails: Trouble monitoring your JobRun. Cluster *EKS Cluster ID* is currently not reachable. failureReason: CLUSTER_UNAVAILABLE state: FAILED  |  The EKS cluster is not reachable.  | Check if the EKS cluster exists and has the right permissions. For more information, see [Setting up Amazon EMR on EKS](setting-up.md). | 
|  stateDetails: Trouble monitoring your JobRun due to an internal error. failureReason: INTERNAL_ERROR state: FAILED  |  An internal error has occurred and is preventing JobRun monitoring.  | N/A | 

The following error may occur when a job cannot start and the job waits in the SUBMITTED state for 15 minutes. This can be caused by a lack of cluster resources.


| Error Message | Error Condition | Recommended Next Step | 
| --- | --- | --- | 
|  cluster timeout  | The job has been in the SUBMITTED state for 15 minutes or more. | You can override the default setting of 15 minutes for this parameter with the configuration override shown below.  | 

Use the following configuration to change the cluster timeout setting to 30 minutes. Notice that you provide the new `job-start-timeout` value in seconds:

```
{
  "configurationOverrides": {
    "applicationConfiguration": [{
      "classification": "emr-containers-defaults",
      "properties": {
        "job-start-timeout": "1800"
      }
    }]
  }
}
```
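
As a sketch, the same override can be passed inline on a `start-job-run` call. The virtual cluster ID, job name, role ARN, and entry point below are placeholders:

```
aws emr-containers start-job-run \
--virtual-cluster-id 123456 \
--name myjob \
--execution-role-arn iam_role_arn_for_job_execution \
--release-label emr-6.7.0-latest \
--job-driver '{"sparkSubmitJobDriver": {"entryPoint": "entryPoint_location"}}' \
--configuration-overrides '{"applicationConfiguration": [{"classification": "emr-containers-defaults", "properties": {"job-start-timeout": "1800"}}]}'
```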

# Using job templates


A job template stores values that can be shared across `StartJobRun` API invocations when starting a job run. It supports two use cases:
+ To avoid repeating recurring values across `StartJobRun` API requests.
+ To enforce that certain values are provided through `StartJobRun` API requests.

Job templates enable you to define a reusable template for job runs to apply additional customization, for example:
+ Configuring executor and driver compute capacity
+ Setting security and governance properties such as IAM roles
+ Customizing a Docker image to use across multiple applications and data pipelines

The following topics provide detailed information on using templates, including how to use them to start a job run and how to change template parameters.

**Topics**
+ [

# Creating and using a job template to start a job run
](create-job-template.md)
+ [

# Defining job template parameters
](use-job-template-parameters.md)
+ [

# Controlling access to job templates
](iam-job-template.md)

# Creating and using a job template to start a job run


This section describes creating a job template and using the template to start a job run with the AWS Command Line Interface (AWS CLI).

**To create a job template**

1. Create a `create-job-template-request.json` file and specify the required parameters for your job template, as shown in the following example JSON file. For information about all available parameters, see the [CreateJobTemplate](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/Welcome.html) API.

   Most values that are required for the `StartJobRun` API are also required for `jobTemplateData`. If you want to use placeholders for any parameters and provide values when you invoke `StartJobRun` with a job template, see the following section on job template parameters.

   ```
   {
      "name": "mytemplate",
      "jobTemplateData": {
           "executionRoleArn": "iam_role_arn_for_job_execution", 
           "releaseLabel": "emr-6.7.0-latest",
           "jobDriver": {
               "sparkSubmitJobDriver": { 
                   "entryPoint": "entryPoint_location",
                   "entryPointArguments": [ "argument1","argument2",...],
                   "sparkSubmitParameters": "--class <main_class> --conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1"
               }
           },
           "configurationOverrides": {
               "applicationConfiguration": [
                   {
                       "classification": "spark-defaults", 
                       "properties": {
                            "spark.driver.memory":"2G"
                       }
                   }
               ], 
               "monitoringConfiguration": {
                   "persistentAppUI": "ENABLED", 
                   "cloudWatchMonitoringConfiguration": {
                       "logGroupName": "my_log_group", 
                       "logStreamNamePrefix": "log_stream_prefix"
                   }, 
                   "s3MonitoringConfiguration": {
                       "logUri": "s3://my_s3_log_location/"
                   }
               }
           }
        }
   }
   ```

1. Use the `create-job-template` command with a path to the `create-job-template-request.json` file stored locally.

   ```
   aws emr-containers create-job-template \ 
   --cli-input-json file://./create-job-template-request.json
   ```
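
After the template is created, you can confirm that it exists with the `list-job-templates` and `describe-job-template` commands. The template ID below is a placeholder:

```
aws emr-containers list-job-templates
aws emr-containers describe-job-template --id 1234abcd
```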

**To start a job run using a job template**

Supply the virtual cluster id, job template id, and job name in the `StartJobRun` command, as shown in the following example.

```
aws emr-containers start-job-run \
--virtual-cluster-id 123456 \
--name myjob \
--job-template-id 1234abcd
```

# Defining job template parameters


Job template parameters allow you to specify variables in the job template. Values for these parameter variables will need to be specified when starting a job run using that job template. Job template parameters are specified in `${parameterName}` format. You can choose to specify any value in a `jobTemplateData` field as a job template parameter. For each of the job template parameter variables, specify its data type (`STRING` or `NUMBER`) and optionally a default value. The example below shows how you can specify job template parameters for entry point location, main class, and S3 log location values.

**To specify entry point location, main class, and Amazon S3 log location as job template parameters**

1. Create a `create-job-template-request.json` file and specify the required parameters for your job template, as shown in the following example JSON file. For more information about the parameters, see the [CreateJobTemplate](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/Welcome.html) API.

   ```
   {
      "name": "mytemplate",
      "jobTemplateData": {
           "executionRoleArn": "iam_role_arn_for_job_execution", 
           "releaseLabel": "emr-6.7.0-latest",
           "jobDriver": {
               "sparkSubmitJobDriver": { 
                   "entryPoint": "${EntryPointLocation}",
                   "entryPointArguments": [ "argument1","argument2",...],
                   "sparkSubmitParameters": "--class ${MainClass} --conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1"
               }
           },
           "configurationOverrides": {
               "applicationConfiguration": [
                   {
                       "classification": "spark-defaults", 
                       "properties": {
                            "spark.driver.memory":"2G"
                       }
                   }
               ], 
               "monitoringConfiguration": {
                   "persistentAppUI": "ENABLED", 
                   "cloudWatchMonitoringConfiguration": {
                       "logGroupName": "my_log_group", 
                       "logStreamNamePrefix": "log_stream_prefix"
                   }, 
                   "s3MonitoringConfiguration": {
                       "logUri": "${LogS3BucketUri}"
                   }
               }
           },
           "parameterConfiguration": {
               "EntryPointLocation": {
                   "type": "STRING"
               },
               "MainClass": {
                   "type": "STRING",
                   "defaultValue":"Main"
               },
               "LogS3BucketUri": {
                   "type": "STRING",
                   "defaultValue":"s3://my_s3_log_location/"
               }
           }
       }
   }
   ```

1. Use the `create-job-template` command with a path to the `create-job-template-request.json` file stored locally or in Amazon S3.

   ```
   aws emr-containers create-job-template \ 
   --cli-input-json file://./create-job-template-request.json
   ```

**To start a job run using job template with job template parameters**

To start a job run with a job template containing job template parameters, specify the job template id as well as values for job template parameters in the `StartJobRun` API request as shown below.

```
aws emr-containers start-job-run \
--virtual-cluster-id 123456 \
--name myjob \
--job-template-id 1234abcd \
--job-template-parameters '{"EntryPointLocation": "entry_point_location","MainClass": "ExampleMainClass","LogS3BucketUri": "s3://example_s3_bucket/"}'
```

# Controlling access to job templates


You can use a `StartJobRun` policy to enforce that a user or role can run jobs only with job templates that you specify, and cannot invoke `StartJobRun` without them. To achieve this, first grant the user or role read permission for the specified job templates, as shown in the following policy.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "emr-containers:DescribeJobRun"
      ],
      "Resource": [
        "arn:aws:emr-containers:*:*:jobtemplate/job_template_1_id",
        "arn:aws:emr-containers:*:*:jobtemplate/job_template_2_id"
      ],
      "Sid": "AllowEMRCONTAINERSDescribejobtemplate"
    }
  ]
}
```

------

To enforce that a user or role can invoke the `StartJobRun` operation only with the specified job templates, assign the following `StartJobRun` policy permission to that user or role.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "emr-containers:StartJobRun"
      ],
      "Resource": [
        "arn:aws:emr-containers:*:*:/virtualclusters/virtual_cluster_id"
      ],
      "Condition": {
        "ArnLike": {
          "emr-containers:JobTemplateArn": [
            "arn:aws:emr-containers:*:*:jobtemplate/job_template_1_id",
            "arn:aws:emr-containers:*:*:jobtemplate/job_template_2_id"
          ]
        }
      },
      "Sid": "AllowEMRCONTAINERSStartjobrun"
    }
  ]
}
```

------

If the job template specifies a job template parameter inside the execution role ARN field, the user can provide a value for this parameter and thus invoke `StartJobRun` with an arbitrary execution role. To restrict the execution roles that the user can provide, see **Controlling access to the execution role** in [Using job execution roles with Amazon EMR on EKS](iam-execution-role.md). 

If the `StartJobRun` policy for a given user or role specifies no such condition, that user or role can invoke the `StartJobRun` action on the specified virtual cluster with any job template they have read access to, and with any execution role.

# Using pod templates


Beginning with Amazon EMR versions 5.33.0 and 6.3.0, Amazon EMR on EKS supports Spark's pod template feature. A pod is a group of one or more containers, with shared storage and network resources, and a specification for how to run the containers. Pod templates are specifications that determine how to run each pod. You can use pod template files to define configurations for driver and executor pods that Spark configurations do not support. For more information about Spark's pod template feature, see [Pod Template](https://spark.apache.org/docs/latest/running-on-kubernetes.html#pod-template). 

**Note**  
The pod template feature only works with driver and executor pods. You cannot configure job submitter pods using the pod template.

## Common scenarios


By using pod templates with Amazon EMR on EKS, you can define how to run Spark jobs on shared EKS clusters, save costs, and improve resource utilization and performance.
+ To reduce costs, you can schedule Spark driver tasks to run on Amazon EC2 On-Demand Instances while scheduling Spark executor tasks to run on Amazon EC2 Spot Instances. 
+ To increase resource utilization, you can support multiple teams running their workloads on the same EKS cluster. Each team will get a designated Amazon EC2 node group to run their workloads on. You can use pod templates to apply a corresponding toleration to their workload. 
+ To improve monitoring, you can run a separate logging container to forward logs to your existing monitoring application. 
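
For example, for the cost-reduction scenario above, an executor pod template might pin executors to Spot capacity with a node selector. This is a sketch: the `eks.amazonaws.com/capacityType` label is applied by EKS managed node groups, and your cluster's node labels may differ.

```
apiVersion: v1
kind: Pod
spec:
  nodeSelector:
    eks.amazonaws.com/capacityType: SPOT
```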

For example, the following pod template file demonstrates a common usage scenario. 

```
apiVersion: v1
kind: Pod
spec:
  volumes:
    - name: source-data-volume
      emptyDir: {}
    - name: metrics-files-volume
      emptyDir: {}
  nodeSelector:
    eks.amazonaws.com/nodegroup: emr-containers-nodegroup
  containers:
  - name: spark-kubernetes-driver # This will be interpreted as driver Spark main container
    env:
      - name: RANDOM
        value: "random"
    volumeMounts:
      - name: source-data-volume
        mountPath: /var/data
      - name: metrics-files-volume
        mountPath: /var/metrics/data
  - name: custom-side-car-container # Sidecar container
    image: <side_car_container_image>
    env:
      - name: RANDOM_SIDECAR
        value: random
    volumeMounts:
      - name: metrics-files-volume
        mountPath: /var/metrics/data
    command:
      - /bin/sh
      - '-c'
      -  <command-to-upload-metrics-files>
  initContainers:
  - name: spark-init-container-driver # Init container
    image: <spark-pre-step-image>
    volumeMounts:
      - name: source-data-volume # Use EMR predefined volumes
        mountPath: /var/data
    command:
      - /bin/sh
      - '-c'
      -  <command-to-download-dependency-jars>
```

The pod template completes the following tasks:
+ Adds a new [init container](https://kubernetes.io/docs/concepts/workloads/pods/init-containers/) that runs before the Spark main container starts. The init container shares the [EmptyDir volume](https://kubernetes.io/docs/concepts/storage/volumes/#emptydir) called `source-data-volume` with the Spark main container. Your init container can run initialization steps, such as downloading dependencies or generating input data. The Spark main container then consumes the data.
+ Adds a [sidecar container](https://kubernetes.io/docs/concepts/workloads/pods/#how-pods-manage-multiple-containers) that runs alongside the Spark main container. The two containers share another `EmptyDir` volume called `metrics-files-volume`. Your Spark job can generate metrics, such as Prometheus metrics, write them to a file, and have the sidecar container upload the files to your own BI system for later analysis.
+ Adds a new environment variable to the Spark main container. Your job can consume the environment variable.
+ Defines a [node selector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/), so that the pod is scheduled only on the `emr-containers-nodegroup` node group. This helps to isolate compute resources across jobs and teams.

## Enabling pod templates with Amazon EMR on EKS


To enable the pod template feature with Amazon EMR on EKS, configure the Spark properties `spark.kubernetes.driver.podTemplateFile` and `spark.kubernetes.executor.podTemplateFile` to point to the pod template files in Amazon S3. Spark then downloads the pod template file and uses it to construct driver and executor pods.
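
For example, you might stage the template files with the AWS CLI before submitting the job. The bucket and file names below are placeholders:

```
aws s3 cp driver-pod-template.yaml s3://amzn-s3-demo-bucket/templates/driver-pod-template.yaml
aws s3 cp executor-pod-template.yaml s3://amzn-s3-demo-bucket/templates/executor-pod-template.yaml
```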

**Note**  
Spark uses the job execution role to load the pod template, so the job execution role must have permissions to access Amazon S3 to load the pod templates. For more information, see [Create a job execution role](creating-job-execution-role.md).
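
As a sketch, the execution role might carry a read-only statement such as the following. The bucket name and prefix are placeholders:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/templates/*"
    }
  ]
}
```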

You can use the `SparkSubmitParameters` to specify the Amazon S3 path to the pod template, as the following job run JSON file demonstrates.

```
{
  "name": "myjob", 
  "virtualClusterId": "123456",  
  "executionRoleArn": "iam_role_name_for_job_execution", 
  "releaseLabel": "release_label", 
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "entryPoint_location",
      "entryPointArguments": ["argument1", "argument2", ...], 
       "sparkSubmitParameters": "--class <main_class> \
         --conf spark.kubernetes.driver.podTemplateFile=s3://path_to_driver_pod_template \
         --conf spark.kubernetes.executor.podTemplateFile=s3://path_to_executor_pod_template \
         --conf spark.executor.instances=2 \
         --conf spark.executor.memory=2G \
         --conf spark.executor.cores=2 \
         --conf spark.driver.cores=1"
    }
  }
}
```

Alternatively, you can use the `configurationOverrides` to specify the Amazon S3 path to the pod template, as the following job run JSON file demonstrates.

```
{
  "name": "myjob", 
  "virtualClusterId": "123456",  
  "executionRoleArn": "iam_role_name_for_job_execution", 
  "releaseLabel": "release_label", 
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "entryPoint_location",
      "entryPointArguments": ["argument1", "argument2", ...],  
       "sparkSubmitParameters": "--class <main_class> \
         --conf spark.executor.instances=2 \
         --conf spark.executor.memory=2G \
         --conf spark.executor.cores=2 \
         --conf spark.driver.cores=1"
    }
  }, 
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults", 
        "properties": {
          "spark.driver.memory":"2G",
          "spark.kubernetes.driver.podTemplateFile":"s3://path_to_driver_pod_template",
          "spark.kubernetes.executor.podTemplateFile":"s3://path_to_executor_pod_template"
         }
      }
    ]
  }
}
```

**Note**  
You need to follow the security guidelines when using the pod template feature with Amazon EMR on EKS, such as isolating untrusted application code. For more information, see [Amazon EMR on EKS security best practices](security-best-practices.md).
You cannot change the Spark main container names by using `spark.kubernetes.driver.podTemplateContainerName` and `spark.kubernetes.executor.podTemplateContainerName`, because these names are hardcoded as `spark-kubernetes-driver` and `spark-kubernetes-executors`. If you want to customize the Spark main container, you must specify the container in a pod template with these hardcoded names.

## Pod template fields


Consider the following field restrictions when configuring a pod template with Amazon EMR on EKS.
+ Amazon EMR on EKS allows only the following fields in a pod template to enable proper job scheduling.

  These are the allowed pod level fields:
  + `apiVersion`
  + `kind`
  + `metadata`
  + `spec.activeDeadlineSeconds`
  + `spec.affinity`
  + `spec.containers`
  + `spec.enableServiceLinks`
  + `spec.ephemeralContainers`
  + `spec.hostAliases`
  + `spec.hostname`
  + `spec.imagePullSecrets`
  + `spec.initContainers`
  + `spec.nodeName`
  + `spec.nodeSelector`
  + `spec.overhead`
  + `spec.preemptionPolicy`
  + `spec.priority`
  + `spec.priorityClassName`
  + `spec.readinessGates`
  + `spec.runtimeClassName`
  + `spec.schedulerName`
  + `spec.subdomain`
  + `spec.terminationGracePeriodSeconds`
  + `spec.tolerations`
  + `spec.topologySpreadConstraints`
  + `spec.volumes`

  These are the allowed Spark main container level fields:
  + `env`
  + `envFrom`
  + `name`
  + `lifecycle`
  + `livenessProbe`
  + `readinessProbe`
  + `resources`
  + `startupProbe`
  + `stdin`
  + `stdinOnce`
  + `terminationMessagePath`
  + `terminationMessagePolicy`
  + `tty`
  + `volumeDevices`
  + `volumeMounts`
  + `workingDir`

  When you use any disallowed fields in the pod template, Spark throws an exception and the job fails. The following example shows an error message in the Spark controller log due to disallowed fields. 

  ```
  Executor pod template validation failed.
  Field container.command in Spark main container not allowed but specified.
  ```
+  Amazon EMR on EKS predefines the following parameters in a pod template. The fields that you specify in a pod template must not overlap with these fields. 

  These are the predefined volume names:
  + `emr-container-communicate`
  + `config-volume`
  + `emr-container-application-log-dir`
  + `emr-container-event-log-dir`
  + `temp-data-dir`
  + `mnt-dir`
  + `home-dir`
  + `emr-container-s3`

  These are the predefined volume mounts that only apply to the Spark main container:
  + Name: `emr-container-communicate`; MountPath: `/var/log/fluentd`
  + Name: `emr-container-application-log-dir`; MountPath: `/var/log/spark/user`
  + Name: `emr-container-event-log-dir`; MountPath: `/var/log/spark/apps`
  + Name: `mnt-dir`; MountPath: `/mnt`
  + Name: `temp-data-dir`; MountPath: `/tmp`
  + Name: `home-dir`; MountPath: `/home/hadoop`

  These are the predefined environment variables that only apply to the Spark main container:
  + `SPARK_CONTAINER_ID`
  + `K8S_SPARK_LOG_URL_STDERR`
  + `K8S_SPARK_LOG_URL_STDOUT`
  + `SIDECAR_SIGNAL_FILE`
**Note**  
You can still use these predefined volumes and mount them into your additional sidecar containers. For example, you can use `emr-container-application-log-dir` and mount it to your own sidecar container defined in the pod template.

  If the fields you specify conflict with any of the predefined fields in the pod template, Spark throws an exception and the job fails. The following example shows an error message in the Spark application log due to conflicts with the predefined fields. 

  ```
  Defined volume mount path on main container must not overlap with reserved mount paths: [<reserved-paths>]
  ```
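
As an illustration, a hypothetical pre-flight check in shell can catch volume-name collisions with the predefined volumes before you submit a job. The helper function below is our own, not part of Amazon EMR on EKS:

```shell
#!/bin/sh
# Predefined Amazon EMR on EKS volume names, as listed above.
RESERVED_VOLUMES="emr-container-communicate config-volume \
emr-container-application-log-dir emr-container-event-log-dir \
temp-data-dir mnt-dir home-dir emr-container-s3"

# Hypothetical helper: prints "ok" if the name is safe to use in a pod
# template, "conflict" if it collides with a predefined volume name.
check_volume_name() {
  for reserved in $RESERVED_VOLUMES; do
    if [ "$1" = "$reserved" ]; then
      echo "conflict"
      return 1
    fi
  done
  echo "ok"
}

check_volume_name "source-data-volume"            # prints: ok
check_volume_name "mnt-dir" || echo "rename it"   # prints: conflict, then: rename it
```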

## Sidecar container considerations


Amazon EMR controls the lifecycle of the pods provisioned by Amazon EMR on EKS. Sidecar containers should follow the same lifecycle as the Spark main container. If you inject additional sidecar containers into your pods, we recommend that you integrate them with the pod lifecycle management that Amazon EMR defines, so that a sidecar container can stop itself when the Spark main container exits.

To reduce costs, we recommend that you implement a process that prevents driver pods with sidecar containers from continuing to run after your job completes. The Spark driver deletes executor pods when the executor is done. However, when a driver program completes, the additional sidecar containers continue to run. The pod is billed until Amazon EMR on EKS cleans up the driver pod, usually less than one minute after the driver Spark main container completes. To reduce costs, you can integrate your additional sidecar containers with the lifecycle management mechanism that Amazon EMR on EKS defines for both driver and executor pods, as described in the following section.

The Spark main container in driver and executor pods writes a heartbeat to the file `/var/log/fluentd/main-container-terminated` every two seconds. By adding the Amazon EMR predefined `emr-container-communicate` volume mount to your sidecar container, you can define a sub-process of your sidecar container that periodically tracks the last-modified time of this file. The sub-process then stops itself when it discovers that the Spark main container has stopped the heartbeat for a longer duration. 

The following example demonstrates a sub-process that tracks the heartbeat file and stops itself. Replace *your\_volume\_mount* with the path where you mount the predefined volume. The script is bundled inside the image that the sidecar container uses. In a pod template file, you can specify a sidecar container whose command runs both `sub_process_script.sh` and the container's *main\_command*.

```
MOUNT_PATH="your_volume_mount"
FILE_TO_WATCH="$MOUNT_PATH/main-container-terminated"
INITIAL_HEARTBEAT_TIMEOUT_THRESHOLD=60
HEARTBEAT_TIMEOUT_THRESHOLD=15
SLEEP_DURATION=10

function terminate_main_process() {
  # Stop main process
}

# Waiting for the first heartbeat sent by Spark main container
echo "Waiting for file $FILE_TO_WATCH to appear..."
start_wait=$(date +%s)
while ! [[ -f "$FILE_TO_WATCH" ]]; do
    elapsed_wait=$(expr $(date +%s) - $start_wait)
    if [ "$elapsed_wait" -gt "$INITIAL_HEARTBEAT_TIMEOUT_THRESHOLD" ]; then
        echo "File $FILE_TO_WATCH not found after $INITIAL_HEARTBEAT_TIMEOUT_THRESHOLD seconds; aborting"
        terminate_main_process
        exit 1
    fi
    sleep $SLEEP_DURATION;
done;
echo "Found file $FILE_TO_WATCH; watching for heartbeats..."

while [[ -f "$FILE_TO_WATCH" ]]; do
    LAST_HEARTBEAT=$(stat -c %Y $FILE_TO_WATCH)
    ELAPSED_TIME_SINCE_AFTER_HEARTBEAT=$(expr $(date +%s) - $LAST_HEARTBEAT)
    if [ "$ELAPSED_TIME_SINCE_AFTER_HEARTBEAT" -gt "$HEARTBEAT_TIMEOUT_THRESHOLD" ]; then
        echo "Last heartbeat to file $FILE_TO_WATCH was more than $HEARTBEAT_TIMEOUT_THRESHOLD seconds ago at $LAST_HEARTBEAT; terminating"
        terminate_main_process
        exit 0
    fi
    sleep $SLEEP_DURATION;
done;
echo "Outside of loop, main-container-terminated file no longer exists"
    
# The file will be deleted once the fluentd container is terminated

echo "The file $FILE_TO_WATCH doesn't exist any more;"
terminate_main_process
exit 0
```

# Using job retry policies
Using retry policies

In Amazon EMR on EKS versions 6.9.0 and later, you can set a retry policy for your job runs. Retry policies cause a job driver pod to be restarted automatically if it fails or is deleted. This makes long-running Spark streaming jobs more resilient to failures.

## Setting a retry policy for a job
Set a retry policy

To configure a retry policy, you provide a `RetryPolicyConfiguration` field using the [StartJobRun](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_StartJobRun.html) API. An example `retryPolicyConfiguration` is shown here:

```
aws emr-containers start-job-run \
--virtual-cluster-id cluster_id \
--name sample-job-name \
--execution-role-arn execution-role-arn \
--release-label emr-6.9.0-latest \
--job-driver '{
  "sparkSubmitJobDriver": {
    "entryPoint": "local:///usr/lib/spark/examples/src/main/python/pi.py",
    "entryPointArguments": [ "2" ],
    "sparkSubmitParameters": "--conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1"
  }
}' \
--retry-policy-configuration '{
    "maxAttempts": 5
  }' \
--configuration-overrides '{
  "monitoringConfiguration": {
    "cloudWatchMonitoringConfiguration": {
      "logGroupName": "my_log_group_name",
      "logStreamNamePrefix": "my_log_stream_prefix"
    },
    "s3MonitoringConfiguration": {
       "logUri": "s3://amzn-s3-demo-logging-bucket"
    }
  }
}'
```

**Note**  
`retryPolicyConfiguration` is only available in AWS CLI version 1.27.68 and later. To update the AWS CLI to the latest version, see [Installing or updating the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html).

Configure the `maxAttempts` field with the maximum number of times you want the job driver pod to be restarted if it fails or is deleted. The interval between two job driver retry attempts follows an exponential backoff (10 seconds, 20 seconds, 40 seconds, ...) that is capped at 6 minutes, as described in the [Kubernetes documentation](https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy).

**Note**  
Every additional job driver execution will be billed as another job run, and will be subject to [Amazon EMR on EKS pricing](https://aws.amazon.com/emr/pricing/#Amazon_EMR_on_Amazon_EKS).
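
The exponential backoff described above can be sketched as a small helper. The function is illustrative, not part of any AWS tooling, and assumes a 10-second base delay capped at 360 seconds:

```shell
#!/bin/sh
# Illustrative: the delay before retry attempt n (1-based) doubles from
# a 10-second base and is capped at 6 minutes (360 seconds).
backoff_seconds() {
  n=$1
  delay=10
  i=1
  while [ "$i" -lt "$n" ]; do
    delay=$(( delay * 2 ))
    i=$(( i + 1 ))
  done
  if [ "$delay" -gt 360 ]; then
    delay=360
  fi
  echo "$delay"
}

backoff_seconds 1   # prints: 10
backoff_seconds 3   # prints: 40
backoff_seconds 7   # prints: 360
```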

### Retry policy configuration values

+ **Default retry policy for a job:** `StartJobRun` includes a retry policy with `maxAttempts` set to 1 by default. You can configure the retry policy as desired.
**Note**  
If `maxAttempts` in the `retryPolicyConfiguration` is set to 1, no retries are made to bring up the driver pod on failure.
+ **Disabling the retry policy for a job:** To disable the retry policy, set the `maxAttempts` value in `retryPolicyConfiguration` to 1.

  ```
  "retryPolicyConfiguration": {
      "maxAttempts": 1
  }
  ```
+ **Set maxAttempts for a job within the valid range:** The `StartJobRun` call fails if the `maxAttempts` value is outside the valid range. The valid `maxAttempts` range is from 1 to 2,147,483,647 (32-bit integer), the range supported for the Kubernetes `backOffLimit` configuration setting. For more information, see [Pod backoff failure policy](https://kubernetes.io/docs/concepts/workloads/controllers/job/#pod-backoff-failure-policy) in the Kubernetes documentation. If the `maxAttempts` value is invalid, the following error message is returned:

  ```
  {
   "message": "Retry policy configuration's parameter value of maxAttempts is invalid"
  }
  ```
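
To avoid that error, you could validate the value client-side before calling `StartJobRun`. This helper is a hypothetical sketch, not part of the AWS CLI:

```shell
#!/bin/sh
# Hypothetical pre-flight check: maxAttempts must be an integer
# between 1 and 2147483647, per the documented valid range.
validate_max_attempts() {
  case "$1" in
    ''|*[!0-9]*) echo "invalid"; return 1 ;;
  esac
  if [ "$1" -ge 1 ] && [ "$1" -le 2147483647 ]; then
    echo "valid"
  else
    echo "invalid"
    return 1
  fi
}

validate_max_attempts 5 && echo "safe to submit"   # prints: valid, then: safe to submit
```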

## Retrieving a retry policy status for a job
Retrieve the policy status

You can view the status of the retry attempts for a job with the [ListJobRuns](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_ListJobRuns.html) and [DescribeJobRun](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_DescribeJobRun.html) APIs. After you submit a job with a retry policy configured, the `ListJobRuns` and `DescribeJobRun` responses contain the status of the retry policy in the `retryPolicyExecution` field. In addition, the `DescribeJobRun` response contains the `retryPolicyConfiguration` that was provided in the `StartJobRun` request for the job.

**Sample responses**

------
#### [ ListJobRuns response ]

```
{
  "jobRuns": [
    ...
    ...
    "retryPolicyExecution" : {
      "currentAttemptCount": 2
    }
    ...
    ...
  ]
}
```

------
#### [ DescribeJobRun response ]

```
{
  ...
  ...
  "retryPolicyConfiguration": {
    "maxAttempts": 5
   },
   "retryPolicyExecution" : {
    "currentAttemptCount": 2
  },
  ...
  ...
}
```

------

These fields are not visible when the retry policy is disabled for the job, as described in [Retry policy configuration values](#retry-config). 

## Monitoring a job with a retry policy
Monitor the job

When you enable a retry policy, a CloudWatch event is generated for every job driver that is created. To subscribe to these events, set up a CloudWatch event rule using the following command:

```
aws events put-rule \
--name cwe-test \
--event-pattern '{"detail-type": ["EMR Job Run New Driver Attempt"]}'
```

The event returns information about the `newDriverPodName`, `newDriverCreatedAt` timestamp, `previousDriverFailureMessage`, and `currentAttemptCount` of the job drivers. These events are not created if the retry policy is disabled.

For more information on how to monitor your job with CloudWatch events, see [Monitor jobs with Amazon CloudWatch Events](monitoring.md#monitoring-cloudwatch-events).

## Finding logs for drivers and executors
Find the driver logs

Driver pod names follow the format `spark-<job id>-driver-<random-suffix>`. The same `random-suffix` is appended to the names of the executor pods that the driver spawns. You can use this `random-suffix` to find the logs for a driver and its associated executors. The `random-suffix` is present only if the [retry policy is enabled](#retry-config) for the job; otherwise, it is absent.
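
For example, a small helper can pull the suffix off a driver pod name so you can filter executor pods and logs on it. The pod name below is made up:

```shell
#!/bin/sh
# Illustrative: extract the trailing random suffix from a driver pod name
# of the form spark-<job id>-driver-<random-suffix>.
driver_suffix() {
  echo "${1##*-}"
}

SUFFIX=$(driver_suffix "spark-00000003abcdefg-driver-x9y8z7")
echo "$SUFFIX"   # prints: x9y8z7
# You could then filter pods on the suffix, for example:
# kubectl get pods -n my-namespace | grep "$SUFFIX"
```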

For more information on how to configure jobs with monitoring configuration for logging, see [Run a Spark application](getting-started.md#getting-started-run-spark-app).

# Using Spark event log rotation


With Amazon EMR 6.3.0 and later, you can turn on the Spark event log rotation feature for Amazon EMR on EKS. Instead of generating a single event log file, this feature rotates the file based on your configured time interval and removes the oldest event log files.

Rotating Spark event logs can help you avoid potential issues with a large Spark event log file generated for long-running or streaming jobs. For example, suppose that you start a long-running Spark job with event logging enabled through the `persistentAppUI` parameter. The Spark driver generates an event log file. If the job runs for hours or days and there is limited disk space on the Kubernetes node, the event log file can consume all available disk space. Turning on the Spark event log rotation feature solves the problem by splitting the log file into multiple files and removing the oldest files.

**Note**  
This feature only works with Amazon EMR on EKS. Amazon EMR running on Amazon EC2 doesn't support Spark event log rotation.

To turn on the Spark event log rotation feature, configure the following Spark parameters:
+ `spark.eventLog.rotation.enabled` – Turns on log rotation. It is disabled by default in the Spark configuration file. Set it to `true` to turn on this feature. 
+ `spark.eventLog.rotation.interval` – Specifies the time interval for log rotation. The minimum value is 60 seconds. The default value is 300 seconds. 
+ `spark.eventLog.rotation.minFileSize` – Specifies the minimum file size to rotate the log file. The minimum and default value is 1 MB. 
+ `spark.eventLog.rotation.maxFilesToRetain` – Specifies how many rotated log files to keep during cleanup. The valid range is 1 to 10. The default value is 2. 

You can specify these parameters in the `sparkSubmitParameters` section of the [`StartJobRun`](emr-eks-jobs-submit.md) API, as the following example shows.

```
"sparkSubmitParameters": "--class org.apache.spark.examples.SparkPi --conf spark.eventLog.rotation.enabled=true --conf spark.eventLog.rotation.interval=300 --conf spark.eventLog.rotation.minFileSize=1m --conf spark.eventLog.rotation.maxFilesToRetain=2"
```

# Using Spark container log rotation


With Amazon EMR 6.11.0 and later, you can turn on the Spark container log rotation feature for Amazon EMR on EKS. Instead of generating a single `stdout` or `stderr` log file, this feature rotates the file based on your configured rotation size and removes the oldest log files from the container.

Rotating Spark container logs can help you avoid potential issues with large Spark log files generated for long-running or streaming jobs. For example, you might start a long-running Spark job whose Spark driver generates a container log file. If the job runs for hours or days and there is limited disk space on the Kubernetes node, the container log file can consume all available disk space. When you turn on Spark container log rotation, the feature splits the log file into multiple files and removes the oldest ones.

To turn on the Spark container log rotation feature, configure the following Spark parameters:

**`containerLogRotationConfiguration`**  
Include this parameter in `monitoringConfiguration` to turn on log rotation. It is disabled by default. You must use `containerLogRotationConfiguration` in addition to `s3MonitoringConfiguration`.

**`rotationSize`**  
The `rotationSize` parameter specifies the file size for the log rotation. The range of possible values is from `2KB` to `2GB`. The numeric portion of the `rotationSize` value must be an integer. Because decimal values aren't supported, specify a rotation size of 1.5 GB, for example, with the value `1500MB`.

**`maxFilesToKeep`**  
The `maxFilesToKeep` parameter specifies the maximum number of files to retain in the container after rotation has taken place. The minimum value is 1, and the maximum value is 50.

You can specify these parameters in the `monitoringConfiguration` section of the `StartJobRun` API, as the following example shows. In this example, with `rotationSize = "10 MB"` and `maxFilesToKeep = 3`, Amazon EMR on EKS rotates your logs at 10 MB, generates a new log file, and then purges the oldest log file once the number of log files reaches 3.

```
{
  "name": "my-long-running-job", 
  "virtualClusterId": "123456",  
  "executionRoleArn": "iam_role_name_for_job_execution", 
  "releaseLabel": "emr-6.11.0-latest", 
  "jobDriver": {
    "sparkSubmitJobDriver": {
      "entryPoint": "entryPoint_location",
      "entryPointArguments": ["argument1", "argument2", ...],  
       "sparkSubmitParameters": "--class main_class --conf spark.executor.instances=2 --conf spark.executor.memory=2G --conf spark.executor.cores=2 --conf spark.driver.cores=1"
    }
  }, 
  "configurationOverrides": {
    "applicationConfiguration": [
      {
        "classification": "spark-defaults", 
        "properties": {
          "spark.driver.memory":"2G"
         }
      }
    ], 
    "monitoringConfiguration": {
      "persistentAppUI": "ENABLED", 
      "cloudWatchMonitoringConfiguration": {
        "logGroupName": "my_log_group", 
        "logStreamNamePrefix": "log_stream_prefix"
      }, 
      "s3MonitoringConfiguration": {
        "logUri": "s3://my_s3_log_location"
      },
      "containerLogRotationConfiguration": {
        "rotationSize":"10MB",
        "maxFilesToKeep":"3"
      }
    }
  }
}
```

To start a job run with Spark container log rotation, include a path to the JSON file that you configured with these parameters in the [`StartJobRun`](emr-eks-jobs-submit.md) command.

```
aws emr-containers start-job-run \
--cli-input-json file://path-to-json-request-file
```

# Using vertical autoscaling with Amazon EMR Spark jobs
Using vertical autoscaling

Amazon EMR on EKS vertical autoscaling automatically tunes memory and CPU resources to adapt to the needs of the workload that you provide for Amazon EMR Spark applications. This simplifies resource management.

To track the real-time and historic resource utilization of your Amazon EMR Spark applications, vertical autoscaling uses the Kubernetes [Vertical Pod Autoscaler (VPA)](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler). The vertical autoscaling capability uses the data that VPA collects to automatically tune the memory and CPU resources assigned to your Spark applications. This simplified process enhances reliability and optimizes cost.

**Topics**
+ [Setting up](jobruns-vas-setup.md)
+ [Getting started](jobruns-vas-gs.md)
+ [Configuration](jobruns-vas-configure.md)
+ [Monitoring the recommendations](jobruns-vas-monitor.md)
+ [Uninstalling](jobruns-vas-uninstall-operator.md)

# Setting up vertical autoscaling for Amazon EMR on EKS
Setting up

This topic helps you get your Amazon EKS cluster ready to submit Amazon EMR Spark jobs with vertical autoscaling. The setup process requires you to confirm or complete the tasks in the following sections:

**Topics**
+ [

## Prerequisites
](#jobruns-vas-prereqs)
+ [

## Install the Operator Lifecycle Manager (OLM) on your Amazon EKS cluster
](#jobruns-vas-install-olm)
+ [

## Install the Amazon EMR on EKS vertical autoscaling operator
](#jobruns-vas-install-operator)

## Prerequisites


Complete the following tasks before you install the vertical autoscaling Kubernetes operator on your cluster. If you've already completed any of the prerequisites, you can skip those and move on to the next one.
+ **[Install or update to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)** – If you've already installed the AWS CLI, confirm that you have the latest version.
+ **[Install kubectl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html)** – kubectl is a command line tool that you use to communicate with the Kubernetes API server. You need kubectl to install and monitor vertical autoscaling-related artifacts on your Amazon EKS cluster.
+ **[Install the Operator SDK](https://sdk.operatorframework.io/docs/installation/)** – Amazon EMR on EKS uses the Operator SDK to manage the lifecycle of the vertical autoscaling operator that you install on your cluster.
+ **[Install Docker](https://docs.docker.com/get-docker/)** – You need access to the Docker CLI to authenticate and fetch the vertical autoscaling-related Docker images to install on your Amazon EKS cluster.
+ **[Install the Kubernetes Metrics Server](https://docs.aws.amazon.com/eks/latest/userguide/metrics-server.html)** – You must first install the Metrics Server so that the Vertical Pod Autoscaler can fetch metrics from the Kubernetes API server.
+ **[Get started with Amazon EKS – eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) (version 1.24 or higher)** – Vertical autoscaling is supported with Amazon EKS versions 1.24 and higher. After you create the cluster, [register it for use with Amazon EMR](setting-up-registration.md).
+ **[Select an Amazon EMR base image URI](docker-custom-images-tag.md) (release 6.10.0 or higher)** – Vertical autoscaling is supported with Amazon EMR releases 6.10.0 and higher.

## Install the Operator Lifecycle Manager (OLM) on your Amazon EKS cluster
Install the OLM

Use the Operator SDK CLI to install the Operator Lifecycle Manager (OLM) on the Amazon EKS cluster where you want to set up vertical autoscaling, as shown in the following example. After you set it up, you can use OLM to install and manage the lifecycle of the [Amazon EMR vertical autoscaling operator](#jobruns-vas-install-operator).

```
operator-sdk olm install
```

To validate installation, run the `olm status` command:

```
operator-sdk olm status
```

Verify that the command returns a successful result, similar to the following example output:

```
INFO[0007] Successfully got OLM status for version X.XX
```

If your installation doesn't succeed, see [Troubleshooting Amazon EMR on EKS vertical autoscaling](troubleshooting-vas.md).

## Install the Amazon EMR on EKS vertical autoscaling operator
Install the vertical autoscaling operator

Use the following steps to install the vertical autoscaling operator on your Amazon EKS cluster:

1. Set up the following environment variables that you will use to complete the installation:
   + **`$REGION`** points to the AWS Region for your cluster. For example, `us-west-2`.
   + **`$ACCOUNT_ID`** points to the Amazon ECR account ID for your Region. For more information, see [Amazon ECR registry accounts by Region](docker-custom-images-tag.md#docker-custom-images-ECR).
   + **`$RELEASE`** points to the Amazon EMR release that you want to use for your cluster. With vertical autoscaling, you must use Amazon EMR release 6.10.0 or higher.
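   For example, you might export these variables as follows. The values below are placeholders; in particular, the ECR account ID differs by Region, so look yours up in the registry accounts table linked above.

   ```shell
   # Placeholder values -- substitute your own Region, ECR account ID, and release.
   export REGION=us-west-2
   export ACCOUNT_ID=111122223333
   export RELEASE=emr-6.10.0
   ```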

1. Next, get authentication tokens to the [Amazon ECR registry](docker-custom-images-tag.md#docker-custom-images-ECR) for the operator.

   ```
   aws ecr get-login-password \
    --region $REGION | docker login \
    --username AWS \
    --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
   ```

1. Install the Amazon EMR on EKS vertical autoscaling operator with the following command:

   ```
   ECR_URL=$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com && \
   REPO_DEST=dynamic-sizing-k8s-operator-olm-bundle && \
   BUNDLE_IMG=emr-$RELEASE-dynamic-sizing-k8s-operator && \
   operator-sdk run bundle \
   $ECR_URL/$REPO_DEST/$BUNDLE_IMG:latest
   ```

   This creates a release of the vertical autoscaling operator in the default namespace of your Amazon EKS cluster. To install the operator in a different namespace, use the following command instead:

   ```
   operator-sdk run bundle \
   $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/dynamic-sizing-k8s-operator-olm-bundle/emr-$RELEASE-dynamic-sizing-k8s-operator:latest \
   -n operator-namespace
   ```
**Note**  
If the namespace that you specify doesn't exist, OLM won't install the operator. For more information, see [Kubernetes namespace not found](troubleshooting-vas.md).

1. Verify that you successfully installed the operator with the kubectl Kubernetes command-line tool.

   ```
   kubectl get csv -n operator-namespace
   ```

   The `kubectl` command should return your newly deployed vertical autoscaling operator with a **Phase** status of **Succeeded**. If you have trouble with installation or setup, see [Troubleshooting Amazon EMR on EKS vertical autoscaling](troubleshooting-vas.md).

# Getting started with vertical autoscaling for Amazon EMR on EKS
Getting started

Use vertical autoscaling for Amazon EMR on EKS when you want automatic tuning of memory and CPU resources to adapt to your Amazon EMR Spark application workload. For more information, see [Using vertical autoscaling with Amazon EMR Spark jobs](jobruns-vas.md).

## Submitting a Spark job with vertical autoscaling
Submit a Spark job

When you submit a job through the [StartJobRun](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_StartJobRun.html) API, add the following two configurations to the driver for your Spark job to turn on vertical autoscaling:

```
"spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing":"true",
"spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.signature":"YOUR_JOB_SIGNATURE"
```

In the preceding configuration, the first line turns on the vertical autoscaling capability. The second line is a required configuration that sets a signature for your job.

For more information on these configurations and acceptable parameter values, see [Configuring vertical autoscaling for Amazon EMR on EKS](jobruns-vas-configure.md). By default, your job submits in the monitoring-only **Off** mode of vertical autoscaling. This monitoring state lets you compute and view resource recommendations without performing autoscaling. For more information, see [Vertical autoscaling modes](jobruns-vas-configure.md#jobruns-vas-parameters-opt-mode).

The following example shows how to complete a sample `start-job-run` command with vertical autoscaling:

```
aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name $JOB_NAME \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.10.0-latest \
--job-driver '{
  "sparkSubmitJobDriver": {
     "entryPoint": "local:///usr/lib/spark/examples/src/main/python/pi.py"
   }
 }' \
--configuration-overrides '{
    "applicationConfiguration": [{
        "classification": "spark-defaults",
        "properties": {
          "spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing": "true",
          "spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.signature": "test-signature"
        }
    }]
  }'
```

## Verifying the vertical autoscaling functionality
Verify functionality

To verify that vertical autoscaling works correctly for the submitted job, use kubectl to get the `verticalpodautoscaler` custom resource and view your scaling recommendations. For example, the following command queries for recommendations on the example job from the [Submitting a Spark job with vertical autoscaling](#jobruns-vas-spark-submit) section:

```
kubectl get verticalpodautoscalers --all-namespaces \
-l=emr-containers.amazonaws.com/dynamic.sizing.signature=test-signature
```

The output from this query should resemble the following:

```
NAME                                                          MODE   CPU         MEM PROVIDED   AGE
ds-jceyefkxnhrvdzw6djum3naf2abm6o63a6dvjkkedqtkhlrf25eq-vpa   Off    3304504865  True           87m
```

If your output doesn't look similar or contains an error code, see [Troubleshooting Amazon EMR on EKS vertical autoscaling](troubleshooting-vas.md) for steps to help resolve the issue.

# Configuring vertical autoscaling for Amazon EMR on EKS
Configuration

You can configure vertical autoscaling when you submit Amazon EMR Spark jobs through the [StartJobRun](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_StartJobRun.html) API. Set the autoscaling-related configuration parameters on the Spark driver pod as shown in the example in [Submitting a Spark job with vertical autoscaling](jobruns-vas-gs.md#jobruns-vas-spark-submit).

The Amazon EMR on EKS vertical autoscaling operator watches for driver pods that have vertical autoscaling turned on, and then sets up integration with the Kubernetes Vertical Pod Autoscaler (VPA) based on the settings on the driver pod. This facilitates resource tracking and autoscaling of the Spark executor pods.

The following sections describe the parameters that you can use when you configure vertical autoscaling for your Amazon EKS cluster.

**Note**  
Configure the feature toggle parameter as a label, and configure the remaining parameters as annotations on the Spark driver pod. The autoscaling parameters belong to the `emr-containers.amazonaws.com/` domain and have the `dynamic.sizing` prefix.

## Required parameters
Required parameters

You must include the following two parameters on the Spark job driver when you submit your job:


| Key | Description | Accepted values | Default value | Type | Spark parameter ¹ | 
| --- | --- | --- | --- | --- | --- | 
|  `dynamic.sizing`  |  Feature toggle  |  `true`, `false`  |  not set  |  label  |  `spark.kubernetes.driver.label.emr-containers.amazonaws.com/dynamic.sizing`  | 
|  `dynamic.sizing.signature`  |  Job signature  |  *string*  |  not set  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.signature`  | 

¹ Use this parameter as a `SparkSubmitParameter` or `ConfigurationOverride` in the `StartJobRun` API.
+ **`dynamic.sizing`** – You can turn vertical autoscaling on and off with the `dynamic.sizing` label. To turn on vertical autoscaling, set `dynamic.sizing` to `true` on the Spark driver pod. If you omit this label or set it to any value other than `true`, vertical autoscaling is off.
+ **`dynamic.sizing.signature`** – Set the job signature with the `dynamic.sizing.signature` annotation on the driver pod. Vertical autoscaling aggregates your resource usage data across different runs of Amazon EMR Spark jobs to derive resource recommendations. You provide the unique identifier to tie the jobs together.

  
**Note**  
If your job recurs at a fixed interval such as daily or weekly, then your job signature should remain the same for each new instance of the job. This ensures that vertical autoscaling can compute and aggregate recommendations across different runs of the job.
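For example, a job that runs every day could always pass the same signature value, such as a stable job name. The value below is hypothetical:

```
"spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.signature": "daily-sales-report"
```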


## Optional parameters
Optional parameters

Vertical autoscaling also supports the following optional parameters. Set them as annotations on the driver pod.


| Key | Description | Accepted values | Default value | Type | Spark parameter ¹ | 
| --- | --- | --- | --- | --- | --- | 
|  `dynamic.sizing.mode`  |  Vertical autoscaling mode  |  `Off`, `Initial`, `Auto`  |  `Off`  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.mode`  | 
|  `dynamic.sizing.scale.memory`  |  Turns memory scaling on or off  |  `true`, `false`  |  `true`  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.memory`  | 
|  `dynamic.sizing.scale.cpu`  |  Turns CPU scaling on or off  |  `true`, `false`  |  `false`  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.cpu`  | 
|  `dynamic.sizing.scale.memory.min`  |  Minimum limit for memory scaling  |  string, [K8s resource quantity](https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#Quantity), for example `1G`  |  not set  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.memory.min`  | 
|  `dynamic.sizing.scale.memory.max`  |  Maximum limit for memory scaling  |  string, [K8s resource quantity](https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#Quantity), for example `4G`  |  not set  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.memory.max`  | 
|  `dynamic.sizing.scale.cpu.min`  |  Minimum limit for CPU scaling  |  string, [K8s resource quantity](https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#Quantity), for example `1`  |  not set  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.cpu.min`  | 
|  `dynamic.sizing.scale.cpu.max`  |  Maximum limit for CPU scaling  |  string, [K8s resource quantity](https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#Quantity), for example `2`  |  not set  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.cpu.max`  | 

### Vertical autoscaling modes
Modes

The `mode` parameter maps to the different autoscaling modes that the VPA supports. Use the `dynamic.sizing.mode` annotation on the driver pod to set the mode. The following values are supported for this parameter:
+ **Off** – A dry-run mode where you can monitor recommendations, but autoscaling is not performed. This is the default mode for vertical autoscaling. In this mode, the associated vertical pod autoscaler resource computes recommendations, and you can monitor the recommendations through tools like kubectl, Prometheus, and Grafana.
+ **Initial** – In this mode, VPA autoscales resources when the job starts if recommendations are available based on historic runs of the job, such as in the case of a recurring job.
+ **Auto** – In this mode, the VPA evicts Spark executor pods and autoscales them with the recommended resource settings when the Spark driver pod restarts them. Because the VPA sometimes evicts running Spark executor pods, this mode can add latency when interrupted executors are retried.

### Resource scaling


When you set up vertical autoscaling, you can choose whether to scale CPU and memory resources. Set the `dynamic.sizing.scale.cpu` and `dynamic.sizing.scale.memory` annotations to `true` or `false`. By default, CPU scaling is set to `false`, and memory scaling is set to `true`.

### Resource minimums and maximums (Bounds)
Bounds

Optionally, you can also set boundaries on the CPU and memory resources. Choose minimum and maximum values for these resources with the `dynamic.sizing.scale.[memory/cpu].[min/max]` annotations when you enable autoscaling. By default, the resources have no limitations. Set the annotations as string values that represent a Kubernetes resource quantity. For example, set `dynamic.sizing.scale.memory.max` to `4G` to represent 4 GB.
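For instance, to bound memory scaling between 2 GB and 8 GB, you might set annotations like the following in your job's `spark-defaults` properties. This is a sketch based on the parameter table above; the values are illustrative.

```
"spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.memory.min": "2G",
"spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.memory.max": "8G"
```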

# Monitoring vertical autoscaling for Amazon EMR on EKS
Monitoring the recommendations

You can use the **kubectl** Kubernetes command line tool to list the active, vertical autoscaling-related recommendations on your cluster. You can also view your tracked job signatures, and purge any unneeded resources that are associated with the signatures.



## List the vertical autoscaling recommendations for your cluster
List recommendations

Use kubectl to get the `verticalpodautoscaler` resource, and view the current status and recommendations. The following example query returns all active resources on your Amazon EKS cluster.

```
kubectl get verticalpodautoscalers \
-o custom-columns="NAME:.metadata.name,"\
"SIGNATURE:.metadata.labels.emr-containers\.amazonaws\.com/dynamic\.sizing\.signature,"\
"MODE:.spec.updatePolicy.updateMode,"\
"MEM:.status.recommendation.containerRecommendations[0].target.memory" \
--all-namespaces
```

The output from this query resembles the following:

```
NAME                  SIGNATURE                MODE      MEM
ds-example-id-1-vpa   job-signature-1          Off       none
ds-example-id-2-vpa   job-signature-2          Initial   12936384283
```

## Query and delete the vertical autoscaling recommendations for your cluster
Query and delete recommendations

When you delete an Amazon EMR vertical autoscaling job run resource, the associated VPA object that tracks and stores recommendations is automatically deleted.

The following example uses kubectl to purge recommendations for a job that is identified by a signature:

```
kubectl delete jobrun -n emr -l=emr-containers\.amazonaws\.com/dynamic\.sizing\.signature=integ-test
jobrun.dynamicsizing.emr.services.k8s.aws "ds-job-signature" deleted
```

If you don't know the specific job signature, or if you want to purge all of the resources on the cluster, you can use `--all` or `--all-namespaces` in your command instead of a specific job signature, as shown in the following example:

```
kubectl delete jobruns --all --all-namespaces
jobrun.dynamicsizing.emr.services.k8s.aws "ds-example-id" deleted
```

# Uninstall the Amazon EMR on EKS vertical autoscaling operator
Uninstalling

If you want to remove the vertical autoscaling operator from your Amazon EKS cluster, use the `cleanup` command with the Operator SDK CLI as shown in the following example. This also deletes upstream dependencies that were installed with the operator, such as the Vertical Pod Autoscaler.

```
operator-sdk cleanup emr-dynamic-sizing
```

If there are any running jobs on the cluster when you delete the operator, those jobs continue to run without vertical autoscaling. If you submit jobs on the cluster after you delete the operator, Amazon EMR on EKS will ignore any vertical autoscaling-related parameters that you may have defined during [configuration](jobruns-vas-configure.md).