

# Running Spark jobs with the Spark operator

Amazon EMR releases 6.10.0 and higher support the Kubernetes operator for Apache Spark, or *the Spark operator*, as a job submission model for Amazon EMR on EKS. With the Spark operator, you can deploy and manage Spark applications with the Amazon EMR release runtime on your own Amazon EKS clusters. Once you deploy the Spark operator in your Amazon EKS cluster, you can directly submit Spark applications with the operator. The operator manages the lifecycle of Spark applications.

**Note**  
Amazon EMR calculates pricing on Amazon EKS based on vCPU and memory consumption. This calculation applies to driver and executor pods. This calculation starts from when you download your Amazon EMR application image until the Amazon EKS pod terminates and is rounded to the nearest second.

**Topics**
+ [Setting up the Spark operator for Amazon EMR on EKS](spark-operator-setup.md)
+ [Getting started with the Spark operator for Amazon EMR on EKS](spark-operator-gs.md)
+ [Use vertical autoscaling with the Spark operator for Amazon EMR on EKS](spark-operator-vas.md)
+ [Uninstalling the Spark operator for Amazon EMR on EKS](spark-operator-uninstall.md)
+ [Using monitoring configuration to monitor the Spark Kubernetes operator and Spark jobs](spark-operator-monitoring-configuration.md)
+ [Security and the Spark operator with Amazon EMR on EKS](spark-operator-security.md)

# Setting up the Spark operator for Amazon EMR on EKS

Complete the following tasks before you install the Spark operator on Amazon EKS. If you've already signed up for Amazon Web Services (AWS) and use Amazon EKS, you're almost ready to use Amazon EMR on EKS. If you've already completed any of the prerequisites, you can skip them and move on to the next one.
+ **[Install or update to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)** – If you've already installed the AWS CLI, confirm that you have the latest version.
+ **[Set up kubectl and eksctl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html)** – kubectl is a command line tool that you use to communicate with the Kubernetes API server. eksctl is a command line tool for creating and managing Amazon EKS clusters.
+ **[Install Helm](https://docs.aws.amazon.com/eks/latest/userguide/helm.html)** – The Helm package manager for Kubernetes helps you install and manage applications on your Kubernetes cluster.
+ **[Get started with Amazon EKS – eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html)** – Follow the steps to create a new Kubernetes cluster with nodes in Amazon EKS.
+ **[Select an Amazon EMR base image URI](docker-custom-images-tag.md) (release 6.10.0 or higher)** – The Spark operator is supported with Amazon EMR releases 6.10.0 and higher.
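A quick way to confirm the CLI prerequisites is to check what's on your `PATH`. This is just a convenience sketch; it reports whether each tool is installed but doesn't validate versions:

```shell
# Report which prerequisite CLI tools are installed.
for tool in aws kubectl eksctl helm; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: found"
  else
    echo "$tool: missing"
  fi
done
```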

# Getting started with the Spark operator for Amazon EMR on EKS

This topic helps you start to use the Spark operator on Amazon EKS by deploying a Spark application or a scheduled Spark application.

## Install the Spark operator

Use the following steps to install the Kubernetes operator for Apache Spark.

1. If you haven't already, complete the steps in [Setting up the Spark operator for Amazon EMR on EKS](spark-operator-setup.md).

1. Authenticate your Helm client to the Amazon ECR registry. In the following command, replace the *region-id* values with your preferred AWS Region, and the corresponding *ECR-registry-account* value for the Region from the [Amazon ECR registry accounts by Region](docker-custom-images-tag.md#docker-custom-images-ECR) page.

   ```
   aws ecr get-login-password \
   --region region-id | helm registry login \
   --username AWS \
   --password-stdin ECR-registry-account.dkr.ecr.region-id.amazonaws.com
   ```

1. Install the Spark operator with the following command.

   For the Helm chart `--version` parameter, use your Amazon EMR release label with the `emr-` prefix and date suffix removed. For example, with the `emr-6.12.0-java17-latest` release, specify `6.12.0-java17`. The example in the following command uses the `emr-7.12.0-latest` release, so it specifies `7.12.0` for the Helm chart `--version`.

   ```
   helm install spark-operator-demo \
     oci://895885662937.dkr.ecr.region-id.amazonaws.com/spark-operator \
     --set emrContainers.awsRegion=region-id \
     --version 7.12.0 \
     --namespace spark-operator \
     --create-namespace
   ```

   By default, the command creates service account `emr-containers-sa-spark-operator` for the Spark operator. To use a different service account, provide the argument `serviceAccounts.sparkoperator.name`. For example:

   ```
   --set serviceAccounts.sparkoperator.name=my-service-account-for-spark-operator
   ```

   If you want to [use vertical autoscaling with the Spark operator](spark-operator-vas.md), add the following line to the installation command to allow webhooks for the operator:

   ```
   --set webhook.enable=true
   ```
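   To make the release-label mapping concrete, here's a small shell sketch (a hypothetical helper, not part of the Amazon EMR tooling) that derives the Helm chart `--version` value from a `-latest` release label by stripping the `emr-` prefix and the `-latest` suffix:

   ```shell
   # Derive the Helm chart --version from an Amazon EMR release label.
   release_label="emr-6.12.0-java17-latest"
   chart_version="${release_label#emr-}"      # strips the "emr-" prefix
   chart_version="${chart_version%-latest}"   # strips the "-latest" suffix
   echo "$chart_version"                      # 6.12.0-java17
   ```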

1. Verify that you installed the Helm chart with the `helm list` command:

   ```
   helm list --namespace spark-operator -o yaml
   ```

   The `helm list` command should return your newly deployed Helm chart release information:

   ```
   app_version: v1beta2-1.3.8-3.1.1
   chart: spark-operator-7.12.0
   name: spark-operator-demo
   namespace: spark-operator
   revision: "1"
   status: deployed
   updated: 2023-03-14 18:20:02.721638196 +0000 UTC
   ```

1. Complete installation with any additional options that you require. For more information, see the [spark-operator-chart](https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/charts/spark-operator-chart/README.md) documentation on GitHub.

## Run a Spark application

The Spark operator is supported with Amazon EMR 6.10.0 or higher. When you install the Spark operator, it creates the service account `emr-containers-sa-spark` to run Spark applications by default. Use the following steps to run a Spark application with the Spark operator on Amazon EMR on EKS 6.10.0 or higher.

1. Before you can run a Spark application with the Spark operator, complete the steps in [Setting up the Spark operator for Amazon EMR on EKS](spark-operator-setup.md) and [Install the Spark operator](#spark-operator-install). 

1. Create a `SparkApplication` definition file `spark-pi.yaml` with the following example contents: 

   ```
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: spark-pi
     namespace: spark-operator
   spec:
     type: Scala
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.SparkPi
     mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar"
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     volumes:
       - name: "test-volume"
         hostPath:
           path: "/tmp"
           type: Directory
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: emr-containers-sa-spark
       volumeMounts:
         - name: "test-volume"
           mountPath: "/tmp"
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
       volumeMounts:
         - name: "test-volume"
           mountPath: "/tmp"
   ```

1. Now, submit the Spark application with the following command. This will also create a `SparkApplication` object named `spark-pi`:

   ```
   kubectl apply -f spark-pi.yaml
   ```

1. Check events for the `SparkApplication` object with the following command: 

   ```
   kubectl describe sparkapplication spark-pi --namespace spark-operator
   ```

For more information on submitting applications to Spark through the Spark operator, see [Using a `SparkApplication`](https://www.kubeflow.org/docs/components/spark-operator/user-guide/using-sparkapplication/) in the `spark-on-k8s-operator` documentation on GitHub.

## Use Amazon S3 for storage


To use Amazon S3 as your file storage option, add the following configurations to your YAML file.

```
hadoopConf:
# EMRFS filesystem
  fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
  fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
  fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
  fs.s3.buffer.dir: /mnt/s3
  fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
  mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
  mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
sparkConf:
 # Required for EMR Runtime
 spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
 spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
 spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
 spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
```

If you use Amazon EMR releases 7.2.0 and higher, these configurations are included by default. In that case, you can set the application file path to `s3://<bucket_name>/<file_path>` instead of `local://<file_path>` in the Spark application YAML file.
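For example, with Amazon EMR releases 7.2.0 and higher, the application file reference in the `SparkApplication` spec might look like the following (the bucket name and key here are placeholders):

```
mainApplicationFile: "s3://amzn-s3-demo-bucket/jars/spark-examples.jar"
```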

Then submit the Spark application as normal.

# Use vertical autoscaling with the Spark operator for Amazon EMR on EKS

Starting with Amazon EMR 7.0, you can use Amazon EMR on EKS vertical autoscaling to simplify resource management. It automatically tunes memory and CPU resources to adapt to the needs of the workload that you provide for Amazon EMR Spark applications. For more information, see [Using vertical autoscaling with Amazon EMR Spark jobs](jobruns-vas.md).

This section describes how to configure the Spark operator to use vertical autoscaling.

## Prerequisites

Before you configure monitoring, be sure to complete the following setup tasks:
+ Complete the steps in [Setting up the Spark operator for Amazon EMR on EKS](spark-operator-setup.md).
+ (Optional) If you previously installed an older version of the Spark operator, delete the `SparkApplication` and `ScheduledSparkApplication` CRDs.

  ```
  kubectl delete crd sparkapplications.sparkoperator.k8s.io
  kubectl delete crd scheduledsparkapplications.sparkoperator.k8s.io
  ```
+ Complete the steps in [Install the Spark operator](spark-operator-gs.md#spark-operator-install). In step 3, add the following line to the installation command to allow webhooks for the operator:

  ```
  --set webhook.enable=true
  ```
+ Complete the steps in [Setting up vertical autoscaling for Amazon EMR on EKS](jobruns-vas-setup.md).
+ Give access to the files in your Amazon S3 location:

  1. Annotate your driver and operator service account with the `JobExecutionRole` that has S3 permissions.

     ```
     kubectl annotate serviceaccount -n spark-operator emr-containers-sa-spark eks.amazonaws.com/role-arn=JobExecutionRole
     kubectl annotate serviceaccount -n spark-operator emr-containers-sa-spark-operator eks.amazonaws.com/role-arn=JobExecutionRole
     ```

  1. Update the trust policy of your job execution role in that namespace.

     ```
     aws emr-containers update-role-trust-policy \
     --cluster-name cluster \
     --namespace ${Namespace} \
     --role-name iam_role_name_for_job_execution
     ```

  1. Edit the IAM role trust policy of your job execution role and update the `serviceaccount` from `emr-containers-sa-spark-*-*-xxxx` to `emr-containers-sa-*`.

     ```
     {
         "Effect": "Allow",
         "Principal": {
             "Federated": "OIDC-provider"
         },
         "Action": "sts:AssumeRoleWithWebIdentity",
         "Condition": {
             "StringLike": {
                 "OIDC": "system:serviceaccount:${Namespace}:emr-containers-sa-*"
             }
         }
     }
     ```

  1. If you're using Amazon S3 as your file storage, add the following defaults to your yaml file.

     ```
     hadoopConf:
     # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
      # Required for EMR Runtime
      spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
      spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
      spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
      spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     ```

## Run a job with vertical autoscaling on the Spark operator

Before you can run a Spark application with the Spark operator, you must complete the steps in [Prerequisites](#spark-operator-vas-prereqs). 

To use vertical autoscaling with the Spark operator, add the following configuration to the driver for your Spark Application spec to turn on vertical autoscaling:

```
dynamicSizing:
  mode: Off
  signature: "my-signature"
```

The `dynamicSizing` configuration enables vertical autoscaling, and `signature` is a required field that lets you choose a signature for your job.

For more information on the configurations and parameter values, see [Configuring vertical autoscaling for Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/jobruns-vas-configure.html). By default, your job submits in the monitoring-only **Off** mode of vertical autoscaling. This monitoring state lets you compute and view resource recommendations without performing autoscaling. For more information, see [Vertical autoscaling modes](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/jobruns-vas-configure.html#jobruns-vas-parameters-opt-mode).

The following is a sample `SparkApplication` definition file named `spark-pi.yaml` with the required configurations to use vertical autoscaling.

```
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: spark-operator
spec:
  type: Scala
  mode: cluster
  image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-7.12.0:latest"
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar"
  sparkVersion: "3.4.1"
  dynamicSizing:
    mode: Off
    signature: "my-signature"
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.4.1
    serviceAccount: emr-containers-sa-spark
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.4.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
```

Now, submit the Spark application with the following command. This will also create a `SparkApplication` object named `spark-pi`:

```
kubectl apply -f spark-pi.yaml
```

For more information on submitting applications to Spark through the Spark operator, see [Using a `SparkApplication`](https://www.kubeflow.org/docs/components/spark-operator/user-guide/using-sparkapplication/) in the `spark-on-k8s-operator` documentation on GitHub.

## Verifying the vertical autoscaling functionality

To verify that vertical autoscaling works correctly for the submitted job, use kubectl to get the `verticalpodautoscaler` custom resource and view your scaling recommendations.

```
kubectl get verticalpodautoscalers --all-namespaces \
-l=emr-containers.amazonaws.com/dynamic.sizing.signature=my-signature
```

The output from this query should resemble the following:

```
NAMESPACE        NAME                                                          MODE   CPU   MEM         PROVIDED   AGE
spark-operator   ds-p73j6mkosvc4xeb3gr7x4xol2bfcw5evqimzqojrlysvj3giozuq-vpa   Off          580026651   True       15m
```

If your output doesn't look similar or contains an error code, see [Troubleshooting Amazon EMR on EKS vertical autoscaling](troubleshooting-vas.md) for steps to help resolve the issue.

To remove the pods and applications, run the following command:

```
kubectl delete sparkapplication spark-pi
```

# Uninstalling the Spark operator for Amazon EMR on EKS

Use the following steps to uninstall the Spark operator.

1. Delete the Spark operator Helm release from its namespace. In this example, the release name is `spark-operator-demo` and the namespace is `spark-operator`.

   ```
   helm uninstall spark-operator-demo -n spark-operator
   ```

1. Delete the Spark operator service account:

   ```
   kubectl delete sa emr-containers-sa-spark-operator -n spark-operator
   ```

1. Delete the Spark operator `CustomResourceDefinitions` (CRDs):

   ```
   kubectl delete crd sparkapplications.sparkoperator.k8s.io
   kubectl delete crd scheduledsparkapplications.sparkoperator.k8s.io
   ```

# Using monitoring configuration to monitor the Spark Kubernetes operator and Spark jobs

Monitoring configuration lets you easily set up log archiving of your Spark application and operator logs to Amazon S3 or to Amazon CloudWatch. You can choose one or both. Doing so adds a log agent sidecar to your spark operator pod, driver, and executor pods, and subsequently forwards these components' logs to your configured sinks.

## Prerequisites

Before you configure monitoring, be sure to complete the following setup tasks:

1. (Optional) If you previously installed an older version of the Spark operator, delete the `SparkApplication` and `ScheduledSparkApplication` CRDs.

   ```
   kubectl delete crd scheduledsparkapplications.sparkoperator.k8s.io
   kubectl delete crd sparkapplications.sparkoperator.k8s.io
   ```

1. Create an operator/job execution role in IAM if you don’t have one already.

1. Run the following command to update the trust policy of the operator/job execution role you just created:

   ```
   aws emr-containers update-role-trust-policy \
   --cluster-name cluster \
   --namespace namespace \
   --role-name iam_role_name_for_operator/job_execution_role
   ```

1. Edit the IAM role trust policy of your operator/job execution role to the following:

   ```
   {
       "Effect": "Allow",
       "Principal": {
           "Federated": "${OIDC-provider}"
       },
       "Action": "sts:AssumeRoleWithWebIdentity",
       "Condition": {
           "StringLike": {
               "OIDC_PROVIDER:sub": "system:serviceaccount:${Namespace}:emr-containers-sa-*"
           }
       }
   }
   ```

1. Create a *monitoringConfiguration* policy in IAM with the following permissions:

   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "logs:DescribeLogStreams",
           "logs:CreateLogStream",
           "logs:CreateLogGroup",
           "logs:PutLogEvents"
         ],
         "Resource": [
           "arn:aws:logs:*:*:log-group:log_group_name",
           "arn:aws:logs:*:*:log-group:log_group_name:*"
         ],
         "Sid": "AllowLOGSDescribelogstreams"
       },
       {
         "Effect": "Allow",
         "Action": [
           "logs:DescribeLogGroups"
         ],
         "Resource": [
           "*"
         ],
         "Sid": "AllowLOGSDescribeloggroups"
       },
       {
         "Effect": "Allow",
         "Action": [
           "s3:PutObject",
           "s3:GetObject",
           "s3:ListBucket"
         ],
         "Resource": [
           "arn:aws:s3:::bucket_name",
           "arn:aws:s3:::bucket_name/*"
         ],
         "Sid": "AllowS3Putobject"
       }
     ]
   }
   ```


1. Attach the above policy to your operator/job execution role.

## Spark Operator Logs

You can define the monitoring configuration in the following way when you run `helm install`:

```
helm install spark-operator spark-operator \
--namespace namespace \
--set emrContainers.awsRegion=aws_region \
--set emrContainers.monitoringConfiguration.image=log_agent_image_url \
--set emrContainers.monitoringConfiguration.s3MonitoringConfiguration.logUri=S3_bucket_uri \
--set emrContainers.monitoringConfiguration.cloudWatchMonitoringConfiguration.logGroupName=log_group_name \
--set emrContainers.monitoringConfiguration.cloudWatchMonitoringConfiguration.logStreamNamePrefix=log_stream_prefix \
--set emrContainers.monitoringConfiguration.sideCarResources.limits.cpuLimit=500m \
--set emrContainers.monitoringConfiguration.sideCarResources.limits.memoryLimit=512Mi \
--set emrContainers.monitoringConfiguration.containerLogRotationConfiguration.rotationSize=2GB \
--set emrContainers.monitoringConfiguration.containerLogRotationConfiguration.maxFilesToKeep=10 \
--set webhook.enable=true \
--set emrContainers.operatorExecutionRoleArn=operator_execution_role_arn
```
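Equivalently, you can keep the same settings in a values file and pass it to `helm install` with `-f monitoring-values.yaml`. This is a sketch; the keys mirror the `--set` flags above, and all values are placeholders:

```
emrContainers:
  awsRegion: aws_region
  operatorExecutionRoleArn: operator_execution_role_arn
  monitoringConfiguration:
    image: log_agent_image_url
    s3MonitoringConfiguration:
      logUri: S3_bucket_uri
    cloudWatchMonitoringConfiguration:
      logGroupName: log_group_name
      logStreamNamePrefix: log_stream_prefix
    sideCarResources:
      limits:
        cpuLimit: 500m
        memoryLimit: 512Mi
    containerLogRotationConfiguration:
      rotationSize: 2GB
      maxFilesToKeep: 10
webhook:
  enable: true
```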

**Monitoring configuration**

The following are the available configuration options under **monitoringConfiguration**.
+ **image** (optional) – The log agent image URL. If you don't provide one, Amazon EMR fetches the image based on the `emrReleaseLabel`.
+ **s3MonitoringConfiguration** – Set this option to archive logs to Amazon S3.
  + **logUri** (required) – The Amazon S3 bucket path where you want to store your logs.
  + The following are sample formats for the Amazon S3 bucket paths after the logs are uploaded. The first example shows the format with log rotation disabled.

    ```
    s3://${logUri}/${POD NAME}/operator/stdout.gz
    s3://${logUri}/${POD NAME}/operator/stderr.gz
    ```

    Log rotation is enabled by default. You can see both a rotated file, with an incrementing index, and a current file, which has the same format as the previous sample.

    ```
    s3://${logUri}/${POD NAME}/operator/stdout_YYYYMMDD_index.gz
    s3://${logUri}/${POD NAME}/operator/stderr_YYYYMMDD_index.gz
    ```
+ **cloudWatchMonitoringConfiguration** – The configuration key to set up forwarding to Amazon CloudWatch.
  + **logGroupName** (required) – The name of the Amazon CloudWatch log group that you want to send logs to. The group is created automatically if it doesn't exist.
  + **logStreamNamePrefix** (optional) – Name of the log stream that you want to send logs into. The default value is an empty string. The format in Amazon CloudWatch is as follows:

    ```
    ${logStreamNamePrefix}/${POD NAME}/STDOUT or STDERR
    ```
+ **sideCarResources** (optional) – The configuration key to set resource limits on the launched Fluentd sidecar container.
  + **memoryLimit** (optional) – The memory limit. Adjust according to your needs. The default is 512Mi.
  + **cpuLimit** (optional) – The CPU limit. Adjust according to your needs. The default is 500m.
+ **containerLogRotationConfiguration** (optional) – Controls the container log rotation behavior. It is enabled by default.
  + **rotationSize** (required) – Specifies file size for the log rotation. The range of possible values is from 2KB to 2GB. The numeric unit portion of the rotationSize parameter is passed as an integer. Since decimal values aren't supported, you can specify a rotation size of 1.5GB, for example, with the value 1500MB. The default is 2GB.
  + **maxFilesToKeep** (required) – Specifies the maximum number of files to retain in the container after rotation has taken place. The minimum value is 1, and the maximum value is 50. The default is 10.
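As a quick check of the decimal-to-integer rule above, here's a shell sketch (a hypothetical helper) that converts a fractional GB size into the whole-MB value that `rotationSize` accepts:

```shell
# Convert a fractional GB rotation size to whole MB (e.g. 1.5GB -> 1500MB),
# because rotationSize doesn't accept decimal values.
gb="1.5"
mb=$(awk -v g="$gb" 'BEGIN { printf "%d", g * 1000 }')
echo "${mb}MB"
```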

After you configure *monitoringConfiguration*, you can check the Spark operator pod logs in an Amazon S3 bucket, in Amazon CloudWatch, or both. For an Amazon S3 bucket, allow about 2 minutes for the first log file to be flushed.

To find the logs in Amazon CloudWatch, navigate to the following: **CloudWatch** > **Log groups** > ***Log group name*** > *Pod name***/operator/stderr**

Or you can navigate to: **CloudWatch** > **Log groups** > ***Log group name*** > *Pod name***/operator/stdout**

## Spark Application Logs

You can define this configuration in the following way.

```
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
  namespace: namespace
spec:
  type: Scala
  mode: cluster
  imagePullPolicy: Always
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: "local:///usr/lib/spark/examples/jars/spark-examples.jar"
  sparkVersion: "3.3.1"
  emrReleaseLabel: emr_release_label
  executionRoleArn: job_execution_role_arn
  restartPolicy:
    type: Never
  volumes:
    - name: "test-volume"
      hostPath:
        path: "/tmp"
        type: Directory
  driver:
    cores: 1
    coreLimit: "1200m"
    memory: "512m"
    labels:
      version: 3.3.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  executor:
    cores: 1
    instances: 1
    memory: "512m"
    labels:
      version: 3.3.1
    volumeMounts:
      - name: "test-volume"
        mountPath: "/tmp"
  monitoringConfiguration:
    image: "log_agent_image"
    s3MonitoringConfiguration:
      logUri: "S3_bucket_uri"
    cloudWatchMonitoringConfiguration:
      logGroupName: "log_group_name"
      logStreamNamePrefix: "log_stream_prefix"
    sideCarResources:
      limits:
        cpuLimit: "500m"
        memoryLimit: "250Mi"
    containerLogRotationConfiguration:
      rotationSize: "2GB"
      maxFilesToKeep: "10"
```

The following are the available configuration options under **monitoringConfiguration**.
+ **image** (optional) – The log agent image URL. If you don't provide one, Amazon EMR fetches the image based on the `emrReleaseLabel`.
+ **s3MonitoringConfiguration** – Set this option to archive logs to Amazon S3.
  + **logUri** (required) – The Amazon S3 bucket path where you want to store your logs. The first example shows the format with log rotation disabled:

    ```
    s3://${logUri}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stdout.gz
    s3://${logUri}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stderr.gz
    ```

    Log rotation is enabled by default. You can see both a rotated file (with an incrementing index) and a current file (one without the date stamp).

    ```
    s3://${logUri}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stdout_YYYYMMDD_index.gz
    s3://${logUri}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stderr_YYYYMMDD_index.gz
    ```
+ **cloudWatchMonitoringConfiguration** – The configuration key to set up forwarding to Amazon CloudWatch.
  + **logGroupName** (required) – The name of the Amazon CloudWatch log group that you want to send logs to. The group is created automatically if it doesn't exist.
  + **logStreamNamePrefix** (optional) – The name of the log stream that you want to send logs into. The default value is an empty string. The format in CloudWatch is as follows:

    ```
    ${logStreamNamePrefix}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stdout
    ${logStreamNamePrefix}/${APPLICATION NAME}-${APPLICATION UID}/${POD NAME}/stderr
    ```
+ **sideCarResources** (optional) – The configuration key to set resource limits on the launched Fluentd sidecar container.
  + **memoryLimit** (optional) – The memory limit. Adjust according to your needs. The default is 250Mi.
  + **cpuLimit** – The CPU limit. Adjust according to your needs. The default is 500m.
+ **containerLogRotationConfiguration** (optional) – Controls the container log rotation behavior. It is enabled by default.
  + **rotationSize** (required) – Specifies file size for the log rotation. The range of possible values is from 2KB to 2GB. The numeric unit portion of the rotationSize parameter is passed as an integer. Since decimal values aren't supported, you can specify a rotation size of 1.5GB, for example, with the value 1500MB. The default is 2GB.
  + **maxFilesToKeep** (required) – Specifies the maximum number of files to retain in the container after rotation has taken place. The minimum value is 1. The maximum value is 50. The default is 10.
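To illustrate the application log stream name format described above, here's a small shell sketch; all of the values are placeholders:

```shell
# Build the CloudWatch log stream name for a driver pod's stdout.
logStreamNamePrefix="my-prefix"
app_name="spark-pi"
app_uid="a1b2c3d4"
pod_name="spark-pi-driver"
stream="${logStreamNamePrefix}/${app_name}-${app_uid}/${pod_name}/stdout"
echo "$stream"
```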

After you configure *monitoringConfiguration*, you can check your Spark application driver and executor logs in an Amazon S3 bucket, in CloudWatch, or both. For an Amazon S3 bucket, allow about 2 minutes for the first log file to be flushed. For example, in Amazon S3 the bucket path appears like the following:

**Amazon S3** > **Buckets** > ***Bucket name*** > *Spark application name - UUID* > *Pod Name* > **stderr.gz**

Or:

**Amazon S3** > **Buckets** > ***Bucket name*** > *Spark application name - UUID* > *Pod Name* > **stdout.gz**

In CloudWatch, the path appears like the following:

**CloudWatch** > **Log groups** > ***Log group name*** > *Spark application name - UUID*/ *Pod name***/stderr**

Or:

**CloudWatch** > **Log groups** > ***Log group name*** > *Spark application name - UUID*/ *Pod name***/stdout**

# Security and the Spark operator with Amazon EMR on EKS

There are two ways to set up cluster-access permissions when you use the Spark operator. The first is role-based access control (RBAC), which restricts access based on a person's role within an organization and has become a primary way to manage access. The second is to assume an AWS Identity and Access Management (IAM) role, which provides resource access through specific assigned permissions.

**Topics**
+ [

# Setting up cluster access permissions with role-based access control (RBAC)
](spark-operator-security-rbac.md)
+ [

# Setting up cluster access permissions with IAM roles for service accounts (IRSA)
](spark-operator-security-irsa.md)

# Setting up cluster access permissions with role-based access control (RBAC)
Role-based access control (RBAC)

To deploy the Spark operator, Amazon EMR on EKS creates two roles and service accounts: one for the Spark operator itself and one for Spark applications.

**Topics**
+ [

## Operator service account and role
](#spark-operator-sa-oper)
+ [

## Spark service account and role
](#spark-operator-sa-spark)

## Operator service account and role
Operator service account and role

Amazon EMR on EKS creates the **operator service account and role** to manage `SparkApplications` for Spark jobs and for other resources such as services.

The default name for this service account is `emr-containers-sa-spark-operator`.

The following rules apply to this service role. The `{{- if }}` blocks are Helm chart conditionals; those rules apply only if you enable the corresponding feature (the `volcano` batch scheduler or the webhook) in the chart values.

```
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - services
  - configmaps
  - secrets
  verbs:
  - create
  - get
  - delete
  - update
- apiGroups:
  - extensions
  - networking.k8s.io
  resources:
  - ingresses
  verbs:
  - create
  - get
  - delete
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - update
  - patch
- apiGroups:
  - ""
  resources:
  - resourcequotas
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - apiextensions.k8s.io
  resources:
  - customresourcedefinitions
  verbs:
  - create
  - get
  - update
  - delete
- apiGroups:
  - admissionregistration.k8s.io
  resources:
  - mutatingwebhookconfigurations
  - validatingwebhookconfigurations
  verbs:
  - create
  - get
  - update
  - delete
- apiGroups:
  - sparkoperator.k8s.io
  resources:
  - sparkapplications
  - sparkapplications/status
  - scheduledsparkapplications
  - scheduledsparkapplications/status
  verbs:
  - "*"
  {{- if .Values.batchScheduler.enable }}
  # required for the `volcano` batch scheduler
- apiGroups:
  - scheduling.incubator.k8s.io
  - scheduling.sigs.dev
  - scheduling.volcano.sh
  resources:
  - podgroups
  verbs:
  - "*"
  {{- end }}
  {{ if .Values.webhook.enable }}
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - delete
  {{- end }}
```

## Spark service account and role
Spark service account and role

A Spark driver pod needs a Kubernetes service account in the same namespace as the pod. This service account needs permissions to create, get, list, patch, and delete executor pods, and to create a Kubernetes headless service for the driver. Without such a service account, the driver fails and exits, unless the default service account in the pod's namespace already has the required permissions.

The default name for this service account is `emr-containers-sa-spark`.

The following rules apply to this service role: 

```
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - services
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - "*"
- apiGroups:
  - ""
  resources:
  - persistentvolumeclaims
  verbs:
  - "*"
```
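
In a `SparkApplication` spec, the driver selects this service account by name. The following minimal sketch uses the default name from above; the full examples later on this page set the same field the same way:

```
spec:
  driver:
    serviceAccount: emr-containers-sa-spark
```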

# Setting up cluster access permissions with IAM roles for service accounts (IRSA)
IAM roles for service accounts (IRSA)

This section uses an example to demonstrate how to configure a Kubernetes service account to assume an AWS Identity and Access Management role. Pods that use the service account can then access any AWS service that the role has permissions to access.

The following example runs a Spark application to count the words from a file in Amazon S3. To do this, you can set up IAM roles for service accounts (IRSA) to authenticate and authorize Kubernetes service accounts.
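
With IRSA, the IAM role trusts the cluster's OIDC identity provider and scopes that trust to a single Kubernetes service account. The trust policy that `eksctl` attaches in the steps below looks roughly like the following; the account ID, Region, and OIDC provider ID shown here are placeholders:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::111122223333:oidc-provider/oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE:sub": "system:serviceaccount:spark-operator:driver-account-sa"
                }
            }
        }
    ]
}
```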

**Note**  
This example uses the `spark-operator` namespace both for the Spark operator and for the namespace where you submit the Spark application.

## Prerequisites


Before you try the example on this page, complete the following prerequisites:
+ [Get set up for the Spark operator](spark-operator-setup.md).
+ [Install the Spark operator](spark-operator-gs.md#spark-operator-install).
+ [Create an Amazon S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html).
+ Save your favorite poem in a text file named `poem.txt`, and upload the file to your S3 bucket. The Spark application that you create on this page will read the contents of the text file. For more information on uploading files to S3, see [Upload an object to your bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/uploading-an-object-bucket.html) in the *Amazon Simple Storage Service User Guide*.

## Configure a Kubernetes service account to assume an IAM role
Configure the Kubernetes service account

Use the following steps to configure a Kubernetes service account to assume an IAM role that pods can use to access AWS services that the role has permissions to access.

1. After completing the [Prerequisites](#spark-operator-security-irsa-prereqs), create an `example-policy.json` file that allows read-only access to the file that you uploaded to Amazon S3:

   ```
   cat >example-policy.json <<EOF
   {
       "Version": "2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "s3:GetObject",
                   "s3:ListBucket"
               ],
               "Resource": [
                   "arn:aws:s3:::my-pod-bucket",
                   "arn:aws:s3:::my-pod-bucket/*"
               ]
           }
       ]
   }
   EOF
   ```

1. Then, create an IAM policy `example-policy`:

   ```
   aws iam create-policy --policy-name example-policy --policy-document file://example-policy.json
   ```

1. Next, create an IAM role `example-role` and associate it with a Kubernetes service account for the Spark driver:

   ```
   eksctl create iamserviceaccount --name driver-account-sa --namespace spark-operator \
   --cluster my-cluster --role-name "example-role" \
   --attach-policy-arn arn:aws:iam::111122223333:policy/example-policy --approve
   ```

1. Create a YAML file with the service account and cluster role binding that the Spark driver requires:

   ```
   cat >spark-rbac.yaml <<EOF
   apiVersion: v1
   kind: ServiceAccount
   metadata:
     name: driver-account-sa
   ---
   apiVersion: rbac.authorization.k8s.io/v1
   kind: ClusterRoleBinding
   metadata:
     name: spark-role
   roleRef:
     apiGroup: rbac.authorization.k8s.io
     kind: ClusterRole
     name: edit
   subjects:
     - kind: ServiceAccount
       name: driver-account-sa
       namespace: spark-operator
   EOF
   ```

1. Apply the cluster role binding configurations:

   ```
   kubectl apply -f spark-rbac.yaml
   ```

The kubectl command should confirm successful creation of the account:

```
serviceaccount/driver-account-sa created
clusterrolebinding.rbac.authorization.k8s.io/spark-role configured
```

## Running an application from the Spark operator
Run the application

After you [configure the Kubernetes service account](), you can run a Spark application that counts the number of words in the text file that you uploaded as part of the [Prerequisites](#spark-operator-security-irsa-prereqs).

1. Create a new file named `word-count.yaml` that contains a `SparkApplication` definition for your word-count application, based on an Amazon EMR 6.x release.

   ```
   cat >word-count.yaml <<EOF
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: word-count
     namespace: spark-operator
   spec:
     type: Java
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-6.10.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.JavaWordCount
     mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
     arguments:
       - s3://my-pod-bucket/poem.txt
     hadoopConf:
      # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
       # Required for EMR Runtime
       spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
       spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: driver-account-sa
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
   EOF
   ```

   If you're using the Spark operator with an Amazon EMR 7.x release, adjust some of the configuration values as follows:

   ```
   cat >word-count.yaml <<EOF
   apiVersion: "sparkoperator.k8s.io/v1beta2"
   kind: SparkApplication
   metadata:
     name: word-count
     namespace: spark-operator
   spec:
     type: Java
     mode: cluster
     image: "895885662937.dkr.ecr.us-west-2.amazonaws.com/spark/emr-7.7.0:latest"
     imagePullPolicy: Always
     mainClass: org.apache.spark.examples.JavaWordCount
     mainApplicationFile: local:///usr/lib/spark/examples/jars/spark-examples.jar
     arguments:
       - s3://my-pod-bucket/poem.txt
     hadoopConf:
      # EMRFS filesystem
       fs.s3.customAWSCredentialsProvider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
       fs.s3.impl: com.amazon.ws.emr.hadoop.fs.EmrFileSystem
       fs.AbstractFileSystem.s3.impl: org.apache.hadoop.fs.s3.EMRFSDelegate
       fs.s3.buffer.dir: /mnt/s3
       fs.s3.getObject.initialSocketTimeoutMilliseconds: "2000"
       mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem: "2"
       mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem: "true"
     sparkConf:
       # Required for EMR Runtime
       spark.driver.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/aws-java-sdk-v2/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.driver.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
       spark.executor.extraClassPath: /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/aws-java-sdk-v2/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/home/hadoop/extrajars/*
       spark.executor.extraLibraryPath: /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
     sparkVersion: "3.3.1"
     restartPolicy:
       type: Never
     driver:
       cores: 1
       coreLimit: "1200m"
       memory: "512m"
       labels:
         version: 3.3.1
       serviceAccount: driver-account-sa
     executor:
       cores: 1
       instances: 1
       memory: "512m"
       labels:
         version: 3.3.1
   EOF
   ```

1. Submit the Spark application.

   ```
   kubectl apply -f word-count.yaml
   ```

   The kubectl command should return confirmation that you successfully created a `SparkApplication` object called `word-count`.

   ```
   sparkapplication.sparkoperator.k8s.io/word-count configured
   ```

1. To check events for the `SparkApplication` object, run the following command:

   ```
   kubectl describe sparkapplication word-count -n spark-operator
   ```

   The kubectl command should return the description of the `SparkApplication` with the events:

   ```
   Events:
     Type     Reason                               Age                    From            Message
     ----     ------                               ----                   ----            -------
     Normal   SparkApplicationSpecUpdateProcessed  3m2s (x2 over 17h)     spark-operator  Successfully processed spec update for SparkApplication word-count
     Warning  SparkApplicationPendingRerun         3m2s (x2 over 17h)     spark-operator  SparkApplication word-count is pending rerun
     Normal   SparkApplicationSubmitted            2m58s (x2 over 17h)    spark-operator  SparkApplication word-count was submitted successfully
     Normal   SparkDriverRunning                   2m56s (x2 over 17h)    spark-operator  Driver word-count-driver is running
     Normal   SparkExecutorPending                 2m50s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is pending
     Normal   SparkExecutorRunning                 2m48s                  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] is running
     Normal   SparkDriverCompleted                 2m31s (x2 over 17h)    spark-operator  Driver word-count-driver completed
     Normal   SparkApplicationCompleted            2m31s (x2 over 17h)    spark-operator  SparkApplication word-count completed
     Normal   SparkExecutorCompleted               2m31s (x2 over 2m31s)  spark-operator  Executor [javawordcount-fdd1698807392c66-exec-1] completed
   ```

The application is now counting the words in your S3 file. To find the word counts, check the log files for your driver:

```
kubectl logs pod/word-count-driver -n spark-operator
```

The kubectl command should return the contents of the log file with the results of your word-count application.

```
INFO DAGScheduler: Job 0 finished: collect at JavaWordCount.java:53, took 5.146519 s
                Software: 1
```
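
The output above comes from `JavaWordCount`, which splits each line on whitespace and counts occurrences of each token. You can reproduce that counting logic locally with standard shell tools, no cluster required; the sample text here is made up:

```
# Count whitespace-delimited words, like JavaWordCount does
printf 'Software is eating the world Software\n' \
  | tr -s ' ' '\n' \
  | sort \
  | uniq -c
```

Each output line pairs a count with a word, with `Software` counted twice.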

For more information on how to submit applications to Spark through the Spark operator, see [Using a SparkApplication](https://www.kubeflow.org/docs/components/spark-operator/user-guide/using-sparkapplication/) in the *Kubernetes Operator for Apache Spark (spark-on-k8s-operator)* documentation on GitHub.