

# Using Apache Livy with Amazon EMR on EKS
<a name="job-runs-apache-livy"></a>

With Amazon EMR releases 7.1.0 and higher, you can use Apache Livy to submit jobs on Amazon EMR on EKS. Using Apache Livy, you can set up your own Apache Livy REST endpoint and use it to deploy and manage Spark applications on your Amazon EKS clusters. After you install Livy in your Amazon EKS cluster, you can use the Livy endpoint to submit Spark applications to your Livy server. The server manages the lifecycle of the Spark applications.

**Note**  
Amazon EMR calculates pricing on Amazon EKS based on the vCPU and memory consumption of the driver and executor pods. The calculation starts when you download your Amazon EMR application image, ends when the Amazon EKS pod terminates, and is rounded to the nearest second.

**Topics**
+ [Setting up Apache Livy for Amazon EMR on EKS](job-runs-apache-livy-setup.md)
+ [Getting started with Apache Livy on Amazon EMR on EKS](job-runs-apache-livy-install.md)
+ [Running a Spark application with Apache Livy for Amazon EMR on EKS](job-runs-apache-livy-run-spark.md)
+ [Uninstalling Apache Livy with Amazon EMR on EKS](job-runs-apache-livy-uninstall.md)
+ [Security for Apache Livy with Amazon EMR on EKS](job-runs-apache-livy-security.md)
+ [Installation properties for Apache Livy on Amazon EMR on EKS releases](job-runs-apache-livy-installation-properties.md)
+ [Troubleshoot common environment-variable format errors](job-runs-apache-livy-troubleshooting.md)

# Setting up Apache Livy for Amazon EMR on EKS
<a name="job-runs-apache-livy-setup"></a>

Before you can install Apache Livy on your Amazon EKS cluster, you must install and configure a set of prerequisite tools. These include the AWS CLI, which is a foundational command-line tool for working with AWS resources, command-line tools for working with Amazon EKS, and a controller that's used in this use case to make your cluster application available to the internet and to route network traffic.
+ **[Install or update to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)** – If you've already installed the AWS CLI, confirm that you have the latest version.
+ **[Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html)** – kubectl and eksctl are command line tools that you use to communicate with Amazon EKS.
+ **[Install Helm](https://docs.aws.amazon.com/eks/latest/userguide/helm.html)** – The Helm package manager for Kubernetes helps you install and manage applications on your Kubernetes cluster.
+ **[Get started with Amazon EKS – eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html)** – Follow the steps to create a new Kubernetes cluster with nodes in Amazon EKS.
+ **[Select an Amazon EMR release label](docker-custom-images-tag.md)** – Apache Livy is supported with Amazon EMR releases 7.1.0 and higher.
+ **[Install the ALB controller](https://docs.aws.amazon.com/eks/latest/userguide/aws-load-balancer-controller.html)** – The ALB controller manages AWS Elastic Load Balancing for Kubernetes clusters. It creates an AWS Network Load Balancer (NLB) when you create a Kubernetes Ingress while setting up Apache Livy.
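
Before you continue, you can confirm that each prerequisite tool is on your `PATH` with a quick check like the following sketch (`aws`, `kubectl`, `eksctl`, and `helm` are the tools from the list above):

```shell
# Report which of the prerequisite command line tools are available on PATH.
for tool in aws kubectl eksctl helm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: installed ($(command -v "$tool"))"
  else
    echo "$tool: MISSING" >&2
  fi
done
```

Any line that reports `MISSING` points to a prerequisite you still need to install.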

# Getting started with Apache Livy on Amazon EMR on EKS
<a name="job-runs-apache-livy-install"></a>

Complete the following steps to install Apache Livy. They include configuring the package manager, creating a namespace for running Spark workloads, installing Livy, setting up load balancing, and verification steps. You have to complete these steps in order to run a batch job with Spark.

1. If you haven't already, set up [Apache Livy for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-setup.html).

1. Authenticate your Helm client to the Amazon ECR registry. You can find the corresponding `ECR-registry-account` value for your AWS Region from [Amazon ECR registry accounts by Region](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/docker-custom-images-tag.html#docker-custom-images-ECR).

   ```
   aws ecr get-login-password --region <AWS_REGION> | helm registry login \
   --username AWS \
   --password-stdin <ECR-registry-account>.dkr.ecr.<region-id>.amazonaws.com
   ```

1. Setting up Livy creates a service account for the Livy server and another account for the Spark application. To set up IRSA for the service accounts, see [Setting up access permissions with IAM roles for service accounts (IRSA)](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-irsa.html). 

1. Create a namespace to run your Spark workloads.

   ```
   kubectl create ns <spark-ns>
   ```

1. Use the following command to install Livy.

   By default, the Livy endpoint is available only within the VPC of the EKS cluster. To enable access from outside the VPC, set `--set loadbalancer.internal=false` in your Helm installation command.
**Note**  
By default, SSL is not enabled on this Livy endpoint, and the endpoint is only visible inside the VPC of the EKS cluster. If you set `loadbalancer.internal=false` and `ssl.enabled=false`, you expose an insecure endpoint outside of your VPC. To set up a secure Livy endpoint, see [Setting up a secure Apache Livy endpoint with TLS/SSL](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-secure-endpoint.html).

   ```
   helm install livy-demo \
     oci://895885662937.dkr.ecr.<region-id>.amazonaws.com/livy \
     --version 7.12.0 \
     --namespace livy-ns \
     --set image=<ECR-registry-account>.dkr.ecr.<region-id>.amazonaws.com/livy/emr-7.12.0:latest \
     --set sparkNamespace=<spark-ns> \
     --create-namespace
   ```

   You should see the following output.

   ```
   NAME: livy-demo
   LAST DEPLOYED: Mon Mar 18 09:23:23 2024
   NAMESPACE: livy-ns
   STATUS: deployed
   REVISION: 1
   TEST SUITE: None
   NOTES:
   The Livy server has been installed.
   Check installation status:
   1. Check Livy Server pod is running
     kubectl --namespace livy-ns get pods -l "app.kubernetes.io/instance=livy-demo"
   2. Verify created NLB is in Active state and it's target groups are healthy (if loadbalancer.enabled is true)
   
   Access LIVY APIs:
       # Ensure your NLB is active and healthy
       # Get the Livy endpoint using command:
       LIVY_ENDPOINT=$(kubectl get svc -n livy-ns -l app.kubernetes.io/instance=livy-demo,emr-containers.amazonaws.com/type=loadbalancer -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}' |  awk '{printf "%s:8998\n", $0}')
       # Access Livy APIs using http://$LIVY_ENDPOINT or https://$LIVY_ENDPOINT (if SSL is enabled)
       # Note: While uninstalling Livy, makes sure the ingress and NLB are deleted after running the helm command to avoid dangling resources
   ```

   The default service account names for the Livy server and the Spark session are `emr-containers-sa-livy` and `emr-containers-sa-spark-livy`. To use custom names, use the `serviceAccounts.name` and `sparkServiceAccount.name` parameters.

   ```
   --set serviceAccounts.name=my-service-account-for-livy
   --set sparkServiceAccount.name=my-service-account-for-spark
   ```

1. Verify that you installed the Helm chart.

   ```
   helm list -n livy-ns -o yaml
   ```

   The `helm list` command should return information about your new Helm chart.

   ```
   app_version: 0.7.1-incubating
   chart: livy-emr-7.12.0
   name: livy-demo
   namespace: livy-ns
   revision: "1"
   status: deployed
   updated: 2024-02-08 22:39:53.539243 -0800 PST
   ```

1. Verify that the Network Load Balancer is active.

   ```
   LIVY_NAMESPACE=<livy-ns>
   LIVY_APP_NAME=<livy-app-name>
   AWS_REGION=<AWS_REGION>
   
   # Get the NLB Endpoint URL
   NLB_ENDPOINT=$(kubectl --namespace $LIVY_NAMESPACE get svc -l "app.kubernetes.io/instance=$LIVY_APP_NAME,emr-containers.amazonaws.com/type=loadbalancer" -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')
   
   # Get all the load balancers in the account's region
   ELB_LIST=$(aws elbv2 describe-load-balancers --region $AWS_REGION)
   
   # Get the status of the NLB that matches the endpoint from the Kubernetes service
   NLB_STATUS=$(echo "$ELB_LIST" | grep -A 8 "\"DNSName\": \"$NLB_ENDPOINT\"" | awk -F'"' '/"Code"/{print $4}')
   echo $NLB_STATUS
   ```
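
   The NLB can take a few minutes to become active. Rather than re-running the check by hand, you can poll with a small loop along these lines (a sketch that assumes `NLB_ENDPOINT` and `AWS_REGION` are still set from the commands above):

   ```shell
   # Poll the NLB state every 10 seconds, for up to 5 minutes, until it is active.
   get_nlb_state() {
     aws elbv2 describe-load-balancers --region "$AWS_REGION" \
       --query "LoadBalancers[?DNSName=='$NLB_ENDPOINT'].State.Code" --output text
   }

   state=""
   for attempt in $(seq 1 30); do
     state=$(get_nlb_state)
     echo "Attempt $attempt: NLB state is ${state:-unknown}"
     [ "$state" = "active" ] && break
     sleep 10
   done
   ```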

1. Now verify that the target group in the Network Load Balancer is healthy.

   ```
   LIVY_NAMESPACE=<livy-ns>
   LIVY_APP_NAME=<livy-app-name>
   AWS_REGION=<AWS_REGION>
   
   # Get the NLB endpoint
   NLB_ENDPOINT=$(kubectl --namespace $LIVY_NAMESPACE get svc -l "app.kubernetes.io/instance=$LIVY_APP_NAME,emr-containers.amazonaws.com/type=loadbalancer" -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}')
   
   # Get all the load balancers in the account's region
   ELB_LIST=$(aws elbv2 describe-load-balancers --region $AWS_REGION)
   
   # Get the NLB ARN from the NLB endpoint
   NLB_ARN=$(echo $ELB_LIST | grep -B 1 "\"DNSName\": \"$NLB_ENDPOINT\"" | awk '/"LoadBalancerArn":/,/"/'| awk '/:/{print $2}' | tr -d \",)
   
   # Get the target group from the NLB. Livy setup only deploys 1 target group
   TARGET_GROUP_ARN=$(aws elbv2 describe-target-groups --load-balancer-arn $NLB_ARN --region $AWS_REGION | awk '/"TargetGroupArn":/,/"/'| awk '/:/{print $2}' | tr -d \",)
   
   # Get health of target group
   aws elbv2 describe-target-health --target-group-arn $TARGET_GROUP_ARN
   ```

   The following is sample output that shows the status of the target group:

   ```
   {
       "TargetHealthDescriptions": [
           {
               "Target": {
                   "Id": "<target IP>",
                   "Port": 8998,
                   "AvailabilityZone": "us-west-2d"
               },
               "HealthCheckPort": "8998",
               "TargetHealth": {
                   "State": "healthy"
               }
           }
       ]
   }
   ```

   Once the status of your NLB becomes `active` and your target group is `healthy`, you can continue. It might take a few minutes.

1. Retrieve the Livy endpoint from the Helm installation. Whether or not your Livy endpoint is secure depends on whether you enabled SSL.

   ```
   LIVY_NAMESPACE=<livy-ns>
   LIVY_APP_NAME=<livy-app-name>
   LIVY_ENDPOINT=$(kubectl get svc -n $LIVY_NAMESPACE -l "app.kubernetes.io/instance=$LIVY_APP_NAME,emr-containers.amazonaws.com/type=loadbalancer" -o jsonpath='{.items[0].status.loadBalancer.ingress[0].hostname}' | awk '{printf "%s:8998\n", $0}')
   echo "$LIVY_ENDPOINT"
   ```
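
   As a quick smoke test, you can confirm that the endpoint answers Livy's REST API. On a healthy installation without SSL, `GET /sessions` should return HTTP 200 (this sketch assumes `LIVY_ENDPOINT` is set as above):

   ```shell
   # Print only the HTTP status code from the Livy /sessions endpoint.
   code=$(curl -s -k -o /dev/null -w '%{http_code}' "http://$LIVY_ENDPOINT/sessions")
   echo "Livy responded with HTTP $code"
   ```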

1. Retrieve the Spark service account from the Helm installation

   ```
   SPARK_NAMESPACE=<spark-ns>
   LIVY_APP_NAME=<livy-app-name>
   SPARK_SERVICE_ACCOUNT=$(kubectl --namespace $SPARK_NAMESPACE get sa -l "app.kubernetes.io/instance=$LIVY_APP_NAME" -o jsonpath='{.items[0].metadata.name}')
   echo "$SPARK_SERVICE_ACCOUNT"
   ```

   You should see something similar to the following output:

   ```
   emr-containers-sa-spark-livy
   ```

1. If you set `loadbalancer.internal=false` to enable access from outside of your VPC, create an Amazon EC2 instance and make sure the Network Load Balancer allows network traffic from that instance. You must do so for the instance to have access to your Livy endpoint. For more information about securely exposing your endpoint outside of your VPC, see [Setting up a secure Apache Livy endpoint with TLS/SSL](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-secure-endpoint.html).

1. Installing Livy creates the service account `emr-containers-sa-spark-livy` to run Spark applications. If your Spark application uses any AWS resources, such as Amazon S3, or calls AWS API or CLI operations, you must link an IAM role with the necessary permissions to your Spark service account. For more information, see [Setting up access permissions with IAM roles for service accounts (IRSA)](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-irsa.html).

Apache Livy supports additional configurations that you can use while installing Livy. For more information, see [Installation properties for Apache Livy on Amazon EMR on EKS releases](job-runs-apache-livy-installation-properties.md).

# Running a Spark application with Apache Livy for Amazon EMR on EKS
<a name="job-runs-apache-livy-run-spark"></a>

Before you can run a Spark application with Apache Livy, make sure that you have completed the steps in [Setting up Apache Livy for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-setup.html) and [Getting started with Apache Livy on Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-install.html).

You can use Apache Livy to run two types of applications:
+ Batch sessions – a type of Livy workload to submit Spark batch jobs.
+ Interactive sessions – a type of Livy workload that provides a programmatic and visual interface to run Spark queries.

**Note**  
Driver and executor pods from different sessions can communicate with each other. Namespaces don't guarantee any security between pods. Kubernetes doesn't allow selective permissions on a subset of pods inside a given namespace.

## Running batch sessions
<a name="job-runs-apache-livy-run-spark-batch"></a>

To submit a batch job, use the following command.

```
curl -s -k -H 'Content-Type: application/json' -X POST \
      -d '{
            "name": "my-session",
            "file": "entryPoint_location (S3 or local)",
            "args": ["argument1", "argument2", ...],
            "conf": {
                "spark.kubernetes.namespace": "<spark-namespace>",
                "spark.kubernetes.container.image": "public.ecr.aws/emr-on-eks/spark/emr-7.12.0:latest",
                "spark.kubernetes.authenticate.driver.serviceAccountName": "<spark-service-account>"
            }
          }' <livy-endpoint>/batches
```
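
For example, a concrete submission might look like the following sketch, which captures the batch id from Livy's JSON response so that you can poll the job later. The S3 entry point path is a hypothetical value, and the sketch assumes `python3` is available to parse the response:

```shell
# Submit a batch and capture the id from Livy's JSON response
# (for example {"id":0,"name":"my-session","state":"starting"}).
BATCH_ID=$(curl -s -k -H 'Content-Type: application/json' -X POST \
      -d '{
            "name": "my-session",
            "file": "s3://amzn-s3-demo-bucket/scripts/entrypoint.py",
            "conf": {
                "spark.kubernetes.namespace": "<spark-namespace>",
                "spark.kubernetes.container.image": "public.ecr.aws/emr-on-eks/spark/emr-7.12.0:latest",
                "spark.kubernetes.authenticate.driver.serviceAccountName": "<spark-service-account>"
            }
          }' "<livy-endpoint>/batches" \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["id"])')
echo "Submitted batch $BATCH_ID"
```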

To monitor your batch job, use the following command.

```
curl -s -k -H 'Content-Type: application/json' -X GET <livy-endpoint>/batches/my-session
```
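
The `state` field in that response moves through values such as `starting` and `running` before reaching a terminal state such as `success` or `dead`. A small polling loop, sketched below assuming `BATCH_ID` holds the id of a submitted batch and `python3` is available, waits for a terminal state:

```shell
# Poll the batch every 15 seconds until it reaches a terminal state.
while true; do
  state=$(curl -s -k "<livy-endpoint>/batches/$BATCH_ID" \
    | python3 -c 'import sys, json; print(json.load(sys.stdin)["state"])')
  echo "batch state: $state"
  case "$state" in
    success|dead|killed) break ;;
  esac
  sleep 15
done
```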

## Running interactive sessions
<a name="job-runs-apache-livy-run-spark-interactive"></a>

To run interactive sessions with Apache Livy, see the following steps.

1. Make sure you have access to either a self-hosted or a managed Jupyter notebook, such as a SageMaker AI Jupyter notebook. Your Jupyter notebook must have [sparkmagic](https://github.com/jupyter-incubator/sparkmagic/blob/master/README.md) installed.

1. Create an Amazon S3 bucket for the Spark configuration `spark.kubernetes.file.upload.path`, and make sure the Spark service account has read and write access to the bucket. For more details on how to configure your Spark service account, see [Setting up access permissions with IAM roles for service accounts (IRSA)](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-irsa.html).

1. Load sparkmagic in the Jupyter notebook with the command `%load_ext sparkmagic.magics`.

1. Run the command `%manage_spark` to set up your Livy endpoint with the Jupyter notebook. Choose the **Add Endpoints** tab, choose the configured auth type, add the Livy endpoint to the notebook, and then choose **Add endpoint**.

1. Run `%manage_spark` again to create the Spark context, and then go to **Create Session**. Choose the Livy endpoint, specify a unique session name, choose a language, and then add the following properties.

   ```
   {
     "conf": {
       "spark.kubernetes.namespace": "<spark-namespace>",
       "spark.kubernetes.container.image": "public.ecr.aws/emr-on-eks/spark/emr-7.12.0:latest",
       "spark.kubernetes.authenticate.driver.serviceAccountName": "<spark-service-account>",
       "spark.kubernetes.file.upload.path": "<URI_TO_S3_LOCATION>"
     }
   }
   ```

1. Submit the application and wait for it to create the Spark context.

1. To monitor the status of the interactive session, run the following command.

   ```
   curl -s -k -H 'Content-Type: application/json' -X GET <livy-endpoint>/sessions/my-interactive-session
   ```

## Monitoring Spark applications
<a name="job-runs-apache-livy-run-ui"></a>

To monitor the progress of your Spark applications with the Livy UI, use the link `http://<livy-endpoint>/ui`.

# Uninstalling Apache Livy with Amazon EMR on EKS
<a name="job-runs-apache-livy-uninstall"></a>

Follow these steps to uninstall Apache Livy.

1. Delete the Livy installation by using your application name and namespace. In this example, the application name is `livy-demo` and the namespace is `livy-ns`.

   ```
   helm uninstall livy-demo -n livy-ns
   ```

1. When uninstalling, Amazon EMR on EKS deletes the Kubernetes service in Livy, the AWS load balancers, and the target groups that you created during installation. Deleting resources can take a few minutes. Make sure that the resources are deleted before installing Livy on the namespace again.

1. Delete the Spark namespace.

   ```
   kubectl delete namespace spark-ns
   ```
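
   You can confirm that the load balancer and Kubernetes resources are gone before you reinstall. In this sketch, an empty result from each command means cleanup is complete (`livy-demo` and `livy-ns` are the example names from above):

   ```shell
   # Any remaining chart-managed services or ingresses show up here.
   kubectl get svc,ingress -n livy-ns -l "app.kubernetes.io/instance=livy-demo" 2>/dev/null

   # List load balancer DNS names; the one that served Livy should no longer appear.
   aws elbv2 describe-load-balancers --region <AWS_REGION> \
     --query "LoadBalancers[].DNSName" --output text
   ```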

# Security for Apache Livy with Amazon EMR on EKS
<a name="job-runs-apache-livy-security"></a>

See the following topics to learn more about configuring security for Apache Livy with Amazon EMR on EKS. These options include transport-layer security (TLS/SSL), role-based access control (RBAC), which grants access based on a person's role within an organization, and IAM roles for service accounts, which grant access to AWS resources based on the permissions you attach.

**Topics**
+ [Setting up a secure Apache Livy endpoint with TLS/SSL](job-runs-apache-livy-secure-endpoint.md)
+ [Setting up the Apache Livy and Spark application permissions with role-based access control (RBAC)](job-runs-apache-livy-rbac.md)
+ [Setting up access permissions with IAM roles for service accounts (IRSA)](job-runs-apache-livy-irsa.md)

# Setting up a secure Apache Livy endpoint with TLS/SSL
<a name="job-runs-apache-livy-secure-endpoint"></a>

See the following sections to learn more about setting up Apache Livy for Amazon EMR on EKS with end-to-end TLS and SSL encryption.

## Setting up TLS and SSL encryption
<a name="job-runs-apache-livy-security-tls"></a>

To set up SSL encryption on your Apache Livy endpoint, follow these steps.
+ [Install the Secrets Store CSI Driver and AWS Secrets and Configuration Provider (ASCP)](https://docs.aws.amazon.com/secretsmanager/latest/userguide/integrating_csi_driver.html) – The Secrets Store CSI Driver and ASCP securely store the JKS certificates and passwords that the Livy server pod needs to enable SSL. You can also install just the Secrets Store CSI Driver and use any other supported secrets provider.
+ [Create an ACM certificate](https://docs.aws.amazon.com/acm/latest/userguide/gs-acm-request-public.html) – This certificate is required to secure the connection between the client and the ALB endpoint.
+ Set up a JKS certificate, key password, and keystore password in AWS Secrets Manager – These are required to secure the connection between the ALB endpoint and the Livy server.
+ Add permissions to the Livy service account to retrieve secrets from AWS Secrets Manager – The Livy server needs these permissions to retrieve secrets through ASCP and to add the Livy configurations that secure the Livy server. To add IAM permissions to a service account, see [Setting up access permissions with IAM roles for service accounts (IRSA)](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-irsa.html).

### Setting up a JKS certificate with a key and a keystore password for AWS Secrets Manager
<a name="job-runs-apache-livy-jks-certificate"></a>

Follow these steps to set up a JKS certificate with a key and a keystore password.

1. Generate a keystore file for the Livy server.

   ```
   keytool -genkey -alias <host> -keyalg RSA -keysize 2048 -dname "CN=<host>,OU=hw,O=hw,L=<your_location>,ST=<state>,C=<country>" -keypass <keyPassword> -keystore <keystore_file> -storepass <storePassword> -validity 3650
   ```

1. Create a certificate.

   ```
   keytool -export -alias <host> -keystore mykeystore.jks -rfc -file mycertificate.cert -storepass <storePassword>
   ```

1. Create a truststore file.

   ```
   keytool -import -noprompt -alias <host> -file <cert_file> -keystore <truststore_file> -storepass <truststorePassword>
   ```

1. Save the JKS certificate in AWS Secrets Manager. Replace `livy-jks-secret` with your secret and `fileb://mykeystore.jks` with the path to your keystore JKS certificate.

   ```
   aws secretsmanager create-secret \
   --name livy-jks-secret \
   --description "My Livy keystore JKS secret" \
   --secret-binary fileb://mykeystore.jks
   ```

1. Save the keystore and key passwords in Secrets Manager as a second secret. Make sure to use your own parameters. The secret name `livy-passwords` matches the object name that the `SecretProviderClass` references in a later step.

   ```
   aws secretsmanager create-secret \
   --name livy-passwords \
   --description "My Livy key and keystore password secret" \
   --secret-string "{\"keyPassword\":\"<test-key-password>\",\"keyStorePassword\":\"<test-key-store-password>\"}"
   ```
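
   Before you reference them from the `SecretProviderClass`, you can confirm that both secrets exist. The names below are the object names the provider class uses (`livy-jks-secret` and `livy-passwords`):

   ```shell
   # Print the ARN of each secret; an error here means the secret is missing.
   aws secretsmanager describe-secret --secret-id livy-jks-secret --query ARN --output text
   aws secretsmanager describe-secret --secret-id livy-passwords --query ARN --output text
   ```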

1. Create a Livy server namespace with the following command.

   ```
   kubectl create ns <livy-ns>
   ```

1. Create the `SecretProviderClass` object for the Livy server that contains the JKS certificate and the passwords.

   ```
   cat >livy-secret-provider-class.yaml << EOF
   apiVersion: secrets-store.csi.x-k8s.io/v1
   kind: SecretProviderClass
   metadata:
     name: aws-secrets
   spec:
     provider: aws
     parameters:
       objects: |
           - objectName: "livy-jks-secret"
             objectType: "secretsmanager"
           - objectName: "livy-passwords"
             objectType: "secretsmanager"
                        
   EOF
   kubectl apply -f livy-secret-provider-class.yaml -n <livy-ns>
   ```

## Getting started with SSL-enabled Apache Livy
<a name="job-runs-apache-livy-ssl-enabled-getting-started"></a>

After enabling SSL on your Livy server, you must set up the `serviceAccount` to have access to the `keyStore` and `keyPasswords` secrets on AWS Secrets Manager.

1. Create the Livy server namespace.

   ```
   kubectl create namespace <livy-ns>
   ```

1. Set up the Livy service account to have access to the secrets in Secrets Manager. For more information about setting up IRSA, see [Setting up IRSA while installing Apache Livy](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-irsa.html#job-runs-apache-livy-irsa). Then authenticate your Helm client to the Amazon ECR registry.

   ```
   aws ecr get-login-password --region <region-id> | helm registry login \
   --username AWS \
   --password-stdin <ECR-registry-account>.dkr.ecr.<region-id>.amazonaws.com
   ```

1. Install Livy. For the Helm chart `--version` parameter, use your Amazon EMR release label, such as `7.12.0`. You must also replace the Amazon ECR registry account ID and Region ID with your own IDs. You can find the corresponding `ECR-registry-account` value for your AWS Region from [Amazon ECR registry accounts by Region](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/docker-custom-images-tag.html#docker-custom-images-ECR).

   ```
   helm install <livy-app-name> \
     oci://895885662937.dkr.ecr.<region-id>.amazonaws.com/livy \
     --version 7.12.0 \
     --namespace <livy-ns> \
     --set image=<ECR-registry-account>.dkr.ecr.<region-id>.amazonaws.com/livy/emr-7.12.0:latest \
     --set sparkNamespace=<spark-ns> \
     --set ssl.enabled=true \
     --set ssl.CertificateArn=<livy-acm-certificate-arn> \
     --set ssl.secretProviderClassName=aws-secrets \
     --set ssl.keyStoreObjectName=livy-jks-secret \
     --set ssl.keyPasswordsObjectName=livy-passwords \
     --create-namespace
   ```

1. Continue from step 5 of [Getting started with Apache Livy on Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-setup.html#job-runs-apache-livy-install).

# Setting up the Apache Livy and Spark application permissions with role-based access control (RBAC)
<a name="job-runs-apache-livy-rbac"></a>

To deploy Livy, Amazon EMR on EKS creates a server service account and role and a Spark service account and role. These roles must have the necessary RBAC permissions to finish setup and run Spark applications.

**RBAC permissions for the server service account and role**

Amazon EMR on EKS creates the Livy server service account and role to manage Livy sessions for Spark jobs and routing traffic to and from the ingress and other resources.

The default name for this service account is `emr-containers-sa-livy`. It must have the following permissions.

```
rules:
- apiGroups:
  - ""
  resources:
  - "namespaces"
  verbs:
  - "get"
- apiGroups:
  - ""
  resources:
  - "serviceaccounts"
  - "services"
  - "configmaps"
  - "events"
  - "pods"
  - "pods/log"
  verbs:
  - "get"
  - "list"
  - "watch"
  - "describe"
  - "create"
  - "edit"
  - "delete"
  - "deletecollection"
  - "annotate"
  - "patch"
  - "label"
- apiGroups:
  - ""
  resources:
  - "secrets"
  verbs:
  - "create"
  - "patch"
  - "delete"
  - "watch"
- apiGroups:
  - ""
  resources:
  - "persistentvolumeclaims"
  verbs:
  - "get"
  - "list"
  - "watch"
  - "describe"
  - "create"
  - "edit"
  - "delete"
  - "annotate"
  - "patch"
  - "label"
```
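
The Helm chart normally creates this role and its binding for you. If you need to manage the permissions yourself, a Role and RoleBinding for the default server service account could be sketched as follows (the `livy-server-role` and `livy-server-rolebinding` names are illustrative, and the Role rules are abbreviated — use the full rule set above):

```shell
# Apply an abbreviated Role plus a RoleBinding for the Livy server service account.
cat <<'EOF' | kubectl apply -n <livy-ns> -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: livy-server-role
rules:
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["get"]
- apiGroups: [""]
  resources: ["serviceaccounts", "services", "configmaps", "events", "pods", "pods/log"]
  verbs: ["get", "list", "watch", "create", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: livy-server-rolebinding
subjects:
- kind: ServiceAccount
  name: emr-containers-sa-livy
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: livy-server-role
EOF
```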

**RBAC permissions for the Spark service account and role**

A Spark driver pod needs a Kubernetes service account in the same namespace as the pod. This service account needs permissions to manage executor pods and any resources required by the driver pod. Unless the default service account in the namespace has the required permissions, the driver fails and exits. The following RBAC permissions are required.

```
rules:
- apiGroups:
  - ""
  - "batch"
  - "extensions"
  - "apps"
  resources:
  - "configmaps"
  - "serviceaccounts"
  - "events"
  - "pods"
  - "pods/exec"
  - "pods/log"
  - "pods/portforward"
  - "secrets"
  - "services"
  - "persistentvolumeclaims"
  - "statefulsets"
  verbs:
  - "create"
  - "delete"
  - "get"
  - "list"
  - "patch"
  - "update"
  - "watch"
  - "describe"
  - "edit"
  - "deletecollection"
  - "label"
```

# Setting up access permissions with IAM roles for service accounts (IRSA)
<a name="job-runs-apache-livy-irsa"></a>

By default, the Livy server and the Spark application's driver and executors don't have access to AWS resources. The server service account and the Spark service account control access to AWS resources for the Livy server and the Spark application's pods. To grant access, map each service account to an IAM role that has the necessary AWS permissions.

You can set up IRSA mapping before you install Apache Livy, during the installation, or after you finish the installation.

## Setting up IRSA while installing Apache Livy (for server service account)
<a name="job-runs-apache-livy-irsa-install"></a>

**Note**  
This mapping is supported only for the server service account.

1. Make sure that you have finished [setting up Apache Livy for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-setup.html) and are in the middle of [installing Apache Livy with Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-install.html). 

1. Create a Kubernetes namespace for the Livy server. In this example, the name of the namespace is `livy-ns`.

1. Create an IAM policy that includes the permissions for the AWS services that you want your pods to access. The following example creates an IAM policy that grants read access to the Amazon S3 objects for the Spark entry point.

   ```
   cat >my-policy.json <<EOF
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": "s3:GetObject",
               "Resource": "arn:aws:s3:::my-spark-entrypoint-bucket/*"
           }
       ]
   }
   EOF
   
   aws iam create-policy --policy-name my-policy --policy-document file://my-policy.json
   ```

1. Use the following command to set your AWS account ID to a variable.

   ```
   account_id=$(aws sts get-caller-identity --query "Account" --output text)
   ```

1. Set the OpenID Connect (OIDC) identity provider of your cluster to an environment variable.

   ```
   oidc_provider=$(aws eks describe-cluster --name my-cluster --region $AWS_REGION --query "cluster.identity.oidc.issuer" --output text | sed -e "s/^https:\/\///")
   ```

1. Set variables for the namespace and name of the service account. Be sure to use your own values.

   ```
   export namespace=default
   export service_account=my-service-account
   ```

1. Create a trust policy file with the following command. To grant all service accounts within a namespace access to the role, replace `StringEquals` with `StringLike` and replace `$service_account` with `*`.

   ```
   cat >trust-relationship.json <<EOF
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::$account_id:oidc-provider/$oidc_provider"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "$oidc_provider:aud": "sts.amazonaws.com",
             "$oidc_provider:sub": "system:serviceaccount:$namespace:$service_account"
           }
         }
       }
     ]
   }
   EOF
   ```

1. Create the role.

   ```
   aws iam create-role --role-name my-role --assume-role-policy-document file://trust-relationship.json --description "my-role-description"
   ```
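
   The trust policy controls who can assume the role; the role still needs the permissions policy from the earlier step attached to it. A sketch, assuming the `my-policy` and `my-role` names used above and that `$account_id` is still set:

   ```shell
   # Attach the permissions policy created earlier to the new role.
   aws iam attach-role-policy \
     --role-name my-role \
     --policy-arn "arn:aws:iam::$account_id:policy/my-policy"
   ```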

1. Use the following Helm install command, setting `serviceAccount.executionRoleArn` to map IRSA. You can find the corresponding `ECR-registry-account` value for your AWS Region from [Amazon ECR registry accounts by Region](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/docker-custom-images-tag.html#docker-custom-images-ECR).

   ```
   helm install livy-demo \
     oci://895885662937.dkr.ecr.<region-id>.amazonaws.com/livy \
     --version 7.12.0 \
     --namespace livy-ns \
     --set image=<ECR-registry-account>.dkr.ecr.<region-id>.amazonaws.com/livy/emr-7.12.0:latest \
     --set sparkNamespace=spark-ns \
     --set serviceAccount.executionRoleArn=arn:aws:iam::123456789012:role/my-role
   ```

## Mapping IRSA to a Spark service account
<a name="job-runs-apache-livy-irsa-spark"></a>

Before you map IRSA to a Spark service account, make sure that you have completed the following items:
+ Make sure that you have finished [setting up Apache Livy for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-setup.html) and are in the middle of [installing Apache Livy with Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-install.html). 
+ You must have an existing IAM OpenID Connect (OIDC) provider for your cluster. To see whether you already have one, or to create one, see [Create an IAM OIDC provider for your cluster](https://docs.aws.amazon.com/eks/latest/userguide/enable-iam-roles-for-service-accounts.html).
+ Make sure that you have version 0.171.0 or later of the `eksctl` CLI installed, or use AWS CloudShell. To install or update `eksctl`, see [Installation](https://eksctl.io/installation/) in the `eksctl` documentation.

Follow these steps to map IRSA to your Spark service account:

1. Use the following command to get the name of the Spark service account.

   ```
   SPARK_NAMESPACE=<spark-ns>
   LIVY_APP_NAME=<livy-app-name>
   kubectl --namespace $SPARK_NAMESPACE describe sa -l "app.kubernetes.io/instance=$LIVY_APP_NAME" | awk '/^Name:/ {print $2}'
   ```

1. Set your variables for the namespace and name of the service account.

   ```
   export namespace=default
   export service_account=my-service-account
   ```

1. Use the following command to create a trust policy file for the IAM role. The following example limits the role to the service account that you specified. To give all service accounts within the namespace permission to use the role, replace `StringEquals` with `StringLike` and replace `$service_account` with `*`. Before you run the command, make sure that the `$account_id` and `$oidc_provider` shell variables are set for your account and cluster.

   ```
   cat >trust-relationship.json <<EOF
   {
     "Version": "2012-10-17",		 	 	 
     "Statement": [
       {
         "Effect": "Allow",
         "Principal": {
           "Federated": "arn:aws:iam::$account_id:oidc-provider/$oidc_provider"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
           "StringEquals": {
             "$oidc_provider:aud": "sts.amazonaws.com",
             "$oidc_provider:sub": "system:serviceaccount:$namespace:$service_account"
           }
         }
       }
     ]
   }
   EOF
   ```
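
   If you apply the namespace-wide variant described above, the `Condition` block of the resulting trust policy would instead look like the following sketch (shown with the shell variables unexpanded):

   ```
   "Condition": {
     "StringLike": {
       "$oidc_provider:aud": "sts.amazonaws.com",
       "$oidc_provider:sub": "system:serviceaccount:$namespace:*"
     }
   }
   ```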

1. Create the role.

   ```
   aws iam create-role --role-name my-role --assume-role-policy-document file://trust-relationship.json --description "my-role-description"
   ```

1. Map the Spark service account to the IAM role with the following `eksctl` command. Replace the example values with your own.

   ```
   eksctl create iamserviceaccount --name spark-sa \
     --namespace spark-namespace --cluster livy-eks-cluster \
     --attach-role-arn arn:aws:iam::123456789012:role/my-role \
     --approve --override-existing-serviceaccounts
   ```

# Installation properties for Apache Livy on Amazon EMR on EKS releases
<a name="job-runs-apache-livy-installation-properties"></a>

When you install Apache Livy, you can select a version of the Livy Helm chart. The Helm chart offers a variety of properties to customize your installation and setup experience. These properties are supported for Amazon EMR on EKS releases 7.1.0 and higher.

**Topics**
+ [Amazon EMR 7.1.0 installation properties](#job-runs-apache-livy-installation-properties-710)

## Amazon EMR 7.1.0 installation properties
<a name="job-runs-apache-livy-installation-properties-710"></a>

The following table describes all of the supported Livy properties. When installing Apache Livy, you can choose the Livy Helm chart version. To set a property during the installation, use the command `--set <property>=<value>`.


| Property | Description | Default | 
| --- | --- | --- | 
| image | The Amazon EMR release URI of the Livy server. This is a required configuration. | "" | 
| sparkNamespace | Namespace to run Livy Spark sessions. For example, specify "livy". This is a required configuration. | "" | 
| nameOverride | The name to use instead of livy. The name is set as a label on all Livy resources. | "livy" | 
| fullnameOverride | Provide a name to use instead of the full names of resources. | "" | 
| ssl.enabled | Enables end-to-end SSL from Livy endpoint to Livy server. | FALSE | 
| ssl.certificateArn | If SSL is enabled, this is the ACM certificate ARN for the NLB created by the service. | "" | 
| ssl.secretProviderClassName | If SSL is enabled, this is the secret provider class name to secure NLB for the Livy server connection with SSL. | "" | 
| ssl.keyStoreObjectName | If SSL is enabled, the object name for the keystore certificate in the secret provider class. | "" | 
| ssl.keyPasswordsObjectName | If SSL is enabled, the object name for the secret that has the keystore and key password. | "" | 
| rbac.create | If true, creates RBAC resources. | FALSE | 
| serviceAccount.create | If true, creates a Livy service account. | TRUE | 
| serviceAccount.name | The name of the service account to use for Livy. If you don't set this property and create a service account, Amazon EMR on EKS automatically generates a name using the fullname override property. | "emr-containers-sa-livy" | 
| serviceAccount.executionRoleArn | The execution role ARN of the Livy service account. | "" | 
| sparkServiceAccount.create | If true, creates the Spark service account in `.Release.Namespace`. | TRUE | 
| sparkServiceAccount.name | The name of the service account to use for Spark. If you don't set this property and create a Spark service account, Amazon EMR on EKS automatically generates a name from the fullnameOverride property with the -spark-livy suffix. | "emr-containers-sa-spark-livy" | 
| service.name | Name of the Livy service. | "emr-containers-livy" | 
| service.annotations | Livy service annotations. | \{\} | 
| loadbalancer.enabled | Whether to create a load balancer for the Livy service used to expose the Livy endpoint outside of the Amazon EKS cluster. | FALSE | 
| loadbalancer.internal | Whether to configure the Livy endpoint as internal to the VPC or external. Setting this property to `FALSE` exposes the endpoint to sources outside of the VPC. We recommend securing your endpoint with TLS/SSL. For more information, see [Setting up TLS and SSL encryption](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-security.html#job-runs-apache-livy-security-tls). | FALSE | 
| imagePullSecrets | The list of imagePullSecret names to use to pull Livy image from private repositories. | [] | 
| resources | The resource requests and limits for Livy containers. | \{\} | 
| nodeSelector | The nodes on which to schedule Livy pods. | \{\} | 
| tolerations | A list of tolerations to define for the Livy pods. | [] | 
| affinity | The Livy pods affinity rules. | \{\} | 
| persistence.enabled | If true, enables persistence for session directories. | FALSE | 
| persistence.subPath | The PVC subpath to mount to sessions directories. | "" | 
| persistence.existingClaim | The PVC to use instead of creating a new one. | \{\} | 
| persistence.storageClass | The storage class to use. To define this parameter, use the format storageClassName: <storageClass>. Setting this parameter to "-" disables dynamic provisioning. If you set this parameter to null or don't specify anything, Amazon EMR on EKS doesn't set a storageClassName and uses the default provisioner. | "" | 
| persistence.accessMode | The PVC access mode. | ReadWriteOnce | 
| persistence.size | The PVC size. | 20Gi | 
| persistence.annotations | Additional annotations for the PVC. | \{\} | 
| env.\* | Additional environment variables to set for the Livy container. For more information, see [Inputting your own Livy and Spark configurations while installing Livy](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/job-runs-apache-livy-troubleshooting.html). | \{\} | 
| envFrom.\* | Additional environment variables to set for Livy from a Kubernetes config map or secret. | [] | 
| livyConf.\* | Additional livy.conf entries to set from a mounted Kubernetes config map or secret. | \{\} | 
| sparkDefaultsConf.\* | Additional spark-defaults.conf entries to set from a mounted Kubernetes config map or secret. | \{\} | 
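
The following is an illustrative example, not a required configuration, that combines several of these properties to enable the load balancer and session persistence. The registry account, Region, chart version, and names are placeholders that you replace with your own values:

```
helm install livy-demo \
  oci://ECR-registry-account.dkr.ecr.region-id.amazonaws.com/livy \
  --version 7.12.0 \
  --namespace livy-ns \
  --set image=ECR-registry-account.dkr.ecr.region-id.amazonaws.com/livy/emr-7.12.0:latest \
  --set sparkNamespace=spark-ns \
  --set loadbalancer.enabled=true \
  --set persistence.enabled=true \
  --set persistence.size=50Gi
```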

# Troubleshoot common environment-variable format errors
<a name="job-runs-apache-livy-troubleshooting"></a>

When you input Livy and Spark configurations, some environment-variable formats aren't supported and can cause errors. The following procedure takes you through the steps to convert a configuration key to a supported format.

**Inputting your own Livy and Spark configurations while installing Livy**

You can configure any Apache Livy or Apache Spark environment variable with the `env.*` Helm property. Follow the steps below to convert the example configuration `example.config.with-dash.withUppercase` to a supported environment variable format.

1. Replace each uppercase letter with 1 followed by the lowercase version of that letter. For example, `example.config.with-dash.withUppercase` becomes `example.config.with-dash.with1uppercase`.

1. Replace dashes (-) with 0. For example, `example.config.with-dash.with1uppercase` becomes `example.config.with0dash.with1uppercase`.

1. Replace dots (.) with underscores (\_). For example, `example.config.with0dash.with1uppercase` becomes `example_config_with0dash_with1uppercase`.

1. Replace all lowercase letters with uppercase letters.

1. Add the prefix `LIVY_` to the variable name.

1. Use the variable while installing Livy through the Helm chart, using the format --set env.*YOUR\_VARIABLE\_NAME*.value=*yourvalue*.

For example, to set the Livy and Spark configurations `livy.server.recovery.state-store = filesystem` and `spark.kubernetes.executor.podNamePrefix = my-prefix`, use these Helm properties:

```
--set env.LIVY_LIVY_SERVER_RECOVERY_STATE0STORE.value=filesystem
--set env.LIVY_SPARK_KUBERNETES_EXECUTOR_POD1NAME1PREFIX.value=my-prefix
```
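
The conversion steps above can be sketched as a small shell helper. This is an illustrative sketch, not part of the Livy chart, and it assumes GNU sed for the `\l` (lowercase next character) replacement escape:

```
# Sketch: convert a Livy or Spark configuration key to its Helm env-variable name.
# Illustrative only; assumes GNU sed for the \l escape.
to_livy_env() {
  printf 'LIVY_%s\n' "$(printf '%s' "$1" \
    | sed -E 's/([A-Z])/1\l&/g; s/-/0/g; s/\./_/g' \
    | tr '[:lower:]' '[:upper:]')"
}

to_livy_env "example.config.with-dash.withUppercase"
# LIVY_EXAMPLE_CONFIG_WITH0DASH_WITH1UPPERCASE
```

The single `sed` call performs steps 1 through 3 in order (uppercase letter to `1` plus lowercase, dash to `0`, dot to underscore), and `tr` then uppercases the result before the `LIVY_` prefix is added.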