# Using vertical autoscaling with Amazon EMR Spark jobs
<a name="jobruns-vas"></a>

Amazon EMR on EKS vertical autoscaling automatically tunes memory and CPU resources to match the needs of your Amazon EMR Spark application workloads, which simplifies resource management.

To track the real-time and historic resource utilization of your Amazon EMR Spark applications, vertical autoscaling leverages the Kubernetes [Vertical Pod Autoscaler (VPA)](https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler). The vertical autoscaling capability uses the data that VPA collects to automatically tune the memory and CPU resources assigned to your Spark applications. This simplified process enhances reliability and optimizes cost.

**Topics**
+ [Setting up](jobruns-vas-setup.md)
+ [Getting started](jobruns-vas-gs.md)
+ [Configuration](jobruns-vas-configure.md)
+ [Monitoring the recommendations](jobruns-vas-monitor.md)
+ [Uninstalling](jobruns-vas-uninstall-operator.md)

# Setting up vertical autoscaling for Amazon EMR on EKS
<a name="jobruns-vas-setup"></a>

This topic helps you get your Amazon EKS cluster ready to submit Amazon EMR Spark jobs with vertical autoscaling. The setup process requires you to confirm or complete the tasks in the following sections:

**Topics**
+ [Prerequisites](#jobruns-vas-prereqs)
+ [Install the Operator Lifecycle Manager (OLM) on your Amazon EKS cluster](#jobruns-vas-install-olm)
+ [Install the Amazon EMR on EKS vertical autoscaling operator](#jobruns-vas-install-operator)

## Prerequisites
<a name="jobruns-vas-prereqs"></a>

Complete the following tasks before you install the vertical autoscaling Kubernetes operator on your cluster. If you've already completed any of the prerequisites, you can skip those and move on to the next one.
+ **[Install or update to the latest version of the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html)** – If you've already installed the AWS CLI, confirm that you have the latest version.
+ **[Install kubectl](https://docs.aws.amazon.com/eks/latest/userguide/install-kubectl.html)** – kubectl is a command line tool that you use to communicate with the Kubernetes API server. You need kubectl to install and monitor vertical autoscaling-related artifacts on your Amazon EKS cluster.
+ **[Install the Operator SDK](https://sdk.operatorframework.io/docs/installation/)** – Amazon EMR on EKS uses the Operator SDK as a package manager to manage the lifecycle of the vertical autoscaling operator that you install on your cluster.
+ **[Install Docker](https://docs.docker.com/get-docker/)** – You need access to the Docker CLI to authenticate and fetch the vertical autoscaling-related Docker images to install on your Amazon EKS cluster.
+ **[Install the Kubernetes Metrics Server](https://docs.aws.amazon.com/eks/latest/userguide/metrics-server.html)** – You must install the Metrics Server first so that the Vertical Pod Autoscaler can fetch metrics from the Kubernetes API server.
+ **[Get started with Amazon EKS – eksctl](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) (version 1.24 or higher)** – Vertical autoscaling is supported with Amazon EKS versions 1.24 and higher. Once you create the cluster, [register it for use with Amazon EMR](setting-up-registration.md).
+ **[Select an Amazon EMR base image URI](docker-custom-images-tag.md) (release 6.10.0 or higher)** – Vertical autoscaling is supported with Amazon EMR releases 6.10.0 and higher.
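
Before you continue, you might want to confirm that the command line tools from this list are on your `PATH`. The following is a minimal sketch, not part of the official setup. The function names `missing_tools` and `check_prereqs` are hypothetical, and the tool list assumes the default executable names (`aws`, `kubectl`, `operator-sdk`, `docker`).

```shell
# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Report which of the prerequisite CLIs still need to be installed.
# Returns non-zero if at least one tool is missing.
check_prereqs() {
  missing=$(missing_tools aws kubectl operator-sdk docker)
  if [ -n "$missing" ]; then
    printf 'Missing tools:\n%s\n' "$missing" >&2
    return 1
  fi
  echo "All prerequisite CLIs found."
}

# Usage (run manually):
#   check_prereqs
```

This check only confirms that the executables exist. It doesn't verify versions, the Metrics Server installation, or the cluster registration.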

## Install the Operator Lifecycle Manager (OLM) on your Amazon EKS cluster
<a name="jobruns-vas-install-olm"></a>

Use the Operator SDK CLI to install the Operator Lifecycle Manager (OLM) on the Amazon EMR on EKS cluster where you want to set up vertical autoscaling, as shown in the following example. Once you set it up, you can use OLM to install and manage the lifecycle of the [Amazon EMR vertical autoscaling operator](#jobruns-vas-install-operator).

```
operator-sdk olm install
```

To validate the installation, run the `olm status` command:

```
operator-sdk olm status
```

Verify that the command returns a successful result, similar to the following example output:

```
INFO[0007] Successfully got OLM status for version X.XX
```

If your installation doesn't succeed, see [Troubleshooting Amazon EMR on EKS vertical autoscaling](troubleshooting-vas.md).

## Install the Amazon EMR on EKS vertical autoscaling operator
<a name="jobruns-vas-install-operator"></a>

Use the following steps to install the vertical autoscaling operator on your Amazon EKS cluster:

1. Set up the following environment variables that you will use to complete the installation:
   + **`$REGION`** points to the AWS Region for your cluster. For example, `us-west-2`.
   + **`$ACCOUNT_ID`** points to the Amazon ECR account ID for your Region. For more information, see [Amazon ECR registry accounts by Region](docker-custom-images-tag.md#docker-custom-images-ECR).
   + **`$RELEASE`** points to the Amazon EMR release that you want to use for your cluster. With vertical autoscaling, you must use Amazon EMR release 6.10.0 or higher.

1. Next, get authentication tokens to the [Amazon ECR registry](docker-custom-images-tag.md#docker-custom-images-ECR) for the operator.

   ```
   aws ecr get-login-password \
    --region $REGION | docker login \
    --username AWS \
    --password-stdin $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com
   ```

1. Install the Amazon EMR on EKS vertical autoscaling operator with the following command:

   ```
   ECR_URL=$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com && \
   REPO_DEST=dynamic-sizing-k8s-operator-olm-bundle && \
   BUNDLE_IMG=emr-$RELEASE-dynamic-sizing-k8s-operator && \
   operator-sdk run bundle \
   $ECR_URL/$REPO_DEST/$BUNDLE_IMG\:latest
   ```

   This creates a release of the vertical autoscaling operator in the default namespace of your Amazon EKS cluster. To install the operator in a different namespace, use the following command:

   ```
   operator-sdk run bundle \
   $ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/dynamic-sizing-k8s-operator-olm-bundle/emr-$RELEASE-dynamic-sizing-k8s-operator:latest \
   -n operator-namespace
   ```
**Note**  
If the namespace that you specify doesn't exist, OLM won't install the operator. For more information, see [Kubernetes namespace not found](troubleshooting-vas.md).

1. Verify that you successfully installed the operator with the kubectl Kubernetes command-line tool.

   ```
   kubectl get csv -n operator-namespace
   ```

   The `kubectl` command should return your newly deployed vertical autoscaling operator with a **Phase** status of **Succeeded**. If you have trouble with installation or setup, see [Troubleshooting Amazon EMR on EKS vertical autoscaling](troubleshooting-vas.md).
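
The bundle image URI in the installation step is assembled from the three environment variables. As an illustration, the following sketch builds that URI in one place and fails early if a value is missing. The function name `bundle_image_uri` is hypothetical, and the account ID `123456789012` in the usage comment is a placeholder, not a real registry account.

```shell
# Build the OLM bundle image URI from the account ID, Region, and release.
# Arguments: ACCOUNT_ID REGION RELEASE
bundle_image_uri() {
  if [ "$#" -ne 3 ] || [ -z "$1" ] || [ -z "$2" ] || [ -z "$3" ]; then
    echo "usage: bundle_image_uri ACCOUNT_ID REGION RELEASE" >&2
    return 1
  fi
  printf '%s.dkr.ecr.%s.amazonaws.com/dynamic-sizing-k8s-operator-olm-bundle/emr-%s-dynamic-sizing-k8s-operator:latest\n' \
    "$1" "$2" "$3"
}

# Usage (run manually, with your own values):
#   operator-sdk run bundle "$(bundle_image_uri 123456789012 us-west-2 6.10.0)" \
#     -n operator-namespace
```

Building the URI in a function keeps the default-namespace and custom-namespace commands from drifting apart when you change the release.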

# Getting started with vertical autoscaling for Amazon EMR on EKS
<a name="jobruns-vas-gs"></a>

Use vertical autoscaling for Amazon EMR on EKS when you want automatic tuning of memory and CPU resources to adapt to your Amazon EMR Spark application workload. For more information, see [Using vertical autoscaling with Amazon EMR Spark jobs](jobruns-vas.md).

## Submitting a Spark job with vertical autoscaling
<a name="jobruns-vas-spark-submit"></a>

When you submit a job through the [StartJobRun](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_StartJobRun.html) API, add the following two configurations to the driver for your Spark job to turn on vertical autoscaling:

```
"spark.kubernetes.driver.label.emr-containers.amazonaws.com/dynamic.sizing":"true",
"spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.signature":"YOUR_JOB_SIGNATURE"
```

In the code above, the first line enables the vertical autoscaling capability. The next line is a required signature configuration that lets you choose a signature for your job.

For more information on these configurations and acceptable parameter values, see [Configuring vertical autoscaling for Amazon EMR on EKS](jobruns-vas-configure.md). By default, your job submits in the monitoring-only **Off** mode of vertical autoscaling. This monitoring state lets you compute and view resource recommendations without performing autoscaling. For more information, see [Vertical autoscaling modes](jobruns-vas-configure.md#jobruns-vas-parameters-opt-mode).

The following example shows how to complete a sample `start-job-run` command with vertical autoscaling:

```
aws emr-containers start-job-run \
--virtual-cluster-id $VIRTUAL_CLUSTER_ID \
--name $JOB_NAME \
--execution-role-arn $EMR_ROLE_ARN \
--release-label emr-6.10.0-latest \
--job-driver '{
  "sparkSubmitJobDriver": {
     "entryPoint": "local:///usr/lib/spark/examples/src/main/python/pi.py"
   }
 }' \
--configuration-overrides '{
    "applicationConfiguration": [{
        "classification": "spark-defaults",
        "properties": {
          "spark.kubernetes.driver.label.emr-containers.amazonaws.com/dynamic.sizing": "true",
          "spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.signature": "test-signature"
        }
    }]
  }'
```

## Verifying the vertical autoscaling functionality
<a name="jobruns-vas-verify"></a>

To verify that vertical autoscaling works correctly for the submitted job, use kubectl to get the `verticalpodautoscaler` custom resource and view your scaling recommendations. For example, the following command queries for recommendations on the example job from the [Submitting a Spark job with vertical autoscaling](#jobruns-vas-spark-submit) section:

```
kubectl get verticalpodautoscalers --all-namespaces \
-l=emr-containers.amazonaws.com/dynamic.sizing.signature=test-signature
```

The output from this query should resemble the following:

```
NAME                                                          MODE   CPU   MEM          PROVIDED   AGE
ds-jceyefkxnhrvdzw6djum3naf2abm6o63a6dvjkkedqtkhlrf25eq-vpa   Off          3304504865   True       87m
```

If your output doesn't look similar or contains an error code, see [Troubleshooting Amazon EMR on EKS vertical autoscaling](troubleshooting-vas.md) for steps to help resolve the issue.

# Configuring vertical autoscaling for Amazon EMR on EKS
<a name="jobruns-vas-configure"></a>

You can configure vertical autoscaling when you submit Amazon EMR Spark jobs through the [StartJobRun](https://docs.aws.amazon.com/emr-on-eks/latest/APIReference/API_StartJobRun.html) API. Set the autoscaling-related configuration parameters on the Spark driver pod as shown in the example in [Submitting a Spark job with vertical autoscaling](jobruns-vas-gs.md#jobruns-vas-spark-submit).

The Amazon EMR on EKS vertical autoscaling operator watches for driver pods that have vertical autoscaling turned on, then sets up integration with the Kubernetes Vertical Pod Autoscaler (VPA) based on the settings on the driver pod. This integration lets VPA track resource usage and autoscale the Spark executor pods.

The following sections describe the parameters that you can use when you configure vertical autoscaling for your Amazon EKS cluster.

**Note**  
Configure the feature toggle parameter as a label, and configure the remaining parameters as annotations on the Spark driver pod. The autoscaling parameters belong to the `emr-containers.amazonaws.com/` domain and have the `dynamic.sizing` prefix.

## Required parameters
<a name="jobruns-vas-parameters-req"></a>

You must include the following two parameters on the Spark job driver when you submit your job:


| Key | Description | Accepted values | Default value | Type | Spark parameter<sup>1</sup> | 
| --- | --- | --- | --- | --- | --- | 
|  `dynamic.sizing`  |  Feature toggle  |  `true`, `false`  |  not set  |  label  |  `spark.kubernetes.driver.label.emr-containers.amazonaws.com/dynamic.sizing`  | 
|  `dynamic.sizing.signature`  |  Job signature  |  *string*  |  not set  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.signature`  | 

<sup>1</sup> Use this parameter as a `SparkSubmitParameter` or `ConfigurationOverride` in the `StartJobRun` API.
+ **`dynamic.sizing`** – You can turn vertical autoscaling on and off with the `dynamic.sizing` label. To turn on vertical autoscaling, set `dynamic.sizing` to `true` on the Spark driver pod. If you omit this label or set it to any value other than `true`, vertical autoscaling is off.
+ **`dynamic.sizing.signature`** – Set the job signature with the `dynamic.sizing.signature` annotation on the driver pod. Vertical autoscaling aggregates your resource usage data across different runs of Amazon EMR Spark jobs to derive resource recommendations. You provide the unique identifier to tie the jobs together.

  
**Note**  
If your job recurs at a fixed interval such as daily or weekly, then your job signature should remain the same for each new instance of the job. This ensures that vertical autoscaling can compute and aggregate recommendations across different runs of the job.
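
If you prefer to pass the required parameters as Spark submit parameters rather than as configuration overrides, they take the form of `--conf` flags. The following sketch renders both required settings for a given signature. The function name `vas_required_conf` and the variable `VAS_DOMAIN` are hypothetical helpers. The keys themselves come from the preceding table.

```shell
# Domain shared by all vertical autoscaling keys.
VAS_DOMAIN="emr-containers.amazonaws.com"

# Render the two required settings as spark-submit --conf flags.
# $1 is the job signature. Keep it stable across runs of a recurring job.
vas_required_conf() {
  if [ -z "$1" ]; then
    echo "usage: vas_required_conf JOB_SIGNATURE" >&2
    return 1
  fi
  printf '%s %s\n' \
    "--conf spark.kubernetes.driver.label.${VAS_DOMAIN}/dynamic.sizing=true" \
    "--conf spark.kubernetes.driver.annotation.${VAS_DOMAIN}/dynamic.sizing.signature=$1"
}

# Usage (run manually): place the output in the "sparkSubmitParameters"
# field of "sparkSubmitJobDriver" in your start-job-run request.
#   vas_required_conf test-signature
```

The sketch assumes that the signature contains no spaces or shell metacharacters, because the output is a single space-separated string.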


## Optional parameters
<a name="jobruns-vas-parameters-opt"></a>

Vertical autoscaling also supports the following optional parameters. Set them as annotations on the driver pod.


| Key | Description | Accepted values | Default value | Type | Spark parameter<sup>1</sup> | 
| --- | --- | --- | --- | --- | --- | 
|  `dynamic.sizing.mode`  |  Vertical autoscaling mode  |  `Off`, `Initial`, `Auto`  |  `Off`  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.mode`  | 
|  `dynamic.sizing.scale.memory`  |  Turn memory scaling on or off  |  `true`, `false`  |  `true`  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.memory`  | 
|  `dynamic.sizing.scale.cpu`  |  Turn CPU scaling on or off  |  `true`, `false`  |  `false`  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.cpu`  | 
|  `dynamic.sizing.scale.memory.min`  |  Minimum limit for memory scaling  |  *string*, [K8s resource quantity](https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#Quantity), for example `1G`  |  not set  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.memory.min`  | 
|  `dynamic.sizing.scale.memory.max`  |  Maximum limit for memory scaling  |  *string*, [K8s resource quantity](https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#Quantity), for example `4G`  |  not set  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.memory.max`  | 
|  `dynamic.sizing.scale.cpu.min`  |  Minimum limit for CPU scaling  |  *string*, [K8s resource quantity](https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#Quantity), for example `1`  |  not set  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.cpu.min`  | 
|  `dynamic.sizing.scale.cpu.max`  |  Maximum limit for CPU scaling  |  *string*, [K8s resource quantity](https://pkg.go.dev/k8s.io/apimachinery/pkg/api/resource#Quantity), for example `2`  |  not set  |  annotation  |  `spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing.scale.cpu.max`  | 

<sup>1</sup> Use this parameter as a `SparkSubmitParameter` or `ConfigurationOverride` in the `StartJobRun` API.

### Vertical autoscaling modes
<a name="jobruns-vas-parameters-opt-mode"></a>

The `mode` parameter maps to the different autoscaling modes that the VPA supports. Use the `dynamic.sizing.mode` annotation on the driver pod to set the mode. The following values are supported for this parameter:
+ **Off** – A dry-run mode where you can monitor recommendations, but autoscaling is not performed. This is the default mode for vertical autoscaling. In this mode, the associated vertical pod autoscaler resource computes recommendations, and you can monitor the recommendations through tools like kubectl, Prometheus, and Grafana.
+ **Initial** – In this mode, VPA autoscales resources when the job starts if recommendations are available based on historic runs of the job, such as in the case of a recurring job.
+ **Auto** – In this mode, VPA evicts running Spark executor pods and applies the recommended resource settings when the Spark driver pod restarts them. Because the job must retry each interrupted executor, this mode can add latency.

### Resource scaling
<a name="jobruns-vas-parameters-opt-rs"></a>

When you set up vertical autoscaling, you can choose whether to scale CPU and memory resources. Set the `dynamic.sizing.scale.cpu` and `dynamic.sizing.scale.memory` annotations to `true` or `false`. By default, CPU scaling is set to `false`, and memory scaling is set to `true`.

### Resource minimums and maximums (Bounds)
<a name="jobruns-vas-parameters-opt-bounds"></a>

Optionally, you can also set boundaries on the CPU and memory resources. Choose a minimum and maximum value for these resources with the `dynamic.sizing.scale.[memory/cpu].[min/max]` annotations when you enable autoscaling. By default, the resources have no limitations. Set the annotations as string values that represent a Kubernetes resource quantity. For example, set `dynamic.sizing.scale.memory.max` to `4G` to represent 4 GB.
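
To see how the mode and bounds annotations fit together with the required parameters, consider the following sketch. It prints a `properties` object that you could paste into a `spark-defaults` classification. The function names `vas_annotation` and `vas_properties` and the signature `nightly-etl` are hypothetical. The key names follow the optional parameters table, and the sketch assumes that no value contains a double quote.

```shell
# Emit one "key": "value" JSON member for a vertical autoscaling annotation.
# $1 is the key suffix after "dynamic.sizing" (for example ".mode"), $2 the value.
vas_annotation() {
  printf '"spark.kubernetes.driver.annotation.emr-containers.amazonaws.com/dynamic.sizing%s": "%s"' "$1" "$2"
}

# Print a spark-defaults "properties" object with the feature toggle,
# signature, mode, and memory bounds.
# Arguments: SIGNATURE MODE MEMORY_MIN MEMORY_MAX
vas_properties() {
  case "$2" in
    Off|Initial|Auto) ;;
    *) echo "mode must be Off, Initial, or Auto" >&2; return 1 ;;
  esac
  printf '{\n'
  printf '  "spark.kubernetes.driver.label.emr-containers.amazonaws.com/dynamic.sizing": "true",\n'
  printf '  %s,\n' "$(vas_annotation .signature "$1")"
  printf '  %s,\n' "$(vas_annotation .mode "$2")"
  printf '  %s,\n' "$(vas_annotation .scale.memory.min "$3")"
  printf '  %s\n' "$(vas_annotation .scale.memory.max "$4")"
  printf '}\n'
}

# Usage (run manually):
#   vas_properties nightly-etl Initial 1G 4G
```

Rejecting unknown modes up front is a design choice in this sketch. It catches typos such as `initial` before the job is submitted.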

# Monitoring vertical autoscaling for Amazon EMR on EKS
<a name="jobruns-vas-monitor"></a>

You can use the **kubectl** Kubernetes command line tool to list the active, vertical autoscaling-related recommendations on your cluster. You can also view your tracked job signatures, and purge any unneeded resources that are associated with the signatures.


## List the vertical autoscaling recommendations for your cluster
<a name="jobruns-vas-monitor-list"></a>

Use kubectl to get the `verticalpodautoscaler` resource, and view the current status and recommendations. The following example query returns all active resources on your Amazon EKS cluster.

```
kubectl get verticalpodautoscalers \
-o custom-columns="NAME:.metadata.name,"\
"SIGNATURE:.metadata.labels.emr-containers\.amazonaws\.com/dynamic\.sizing\.signature,"\
"MODE:.spec.updatePolicy.updateMode,"\
"MEM:.status.recommendation.containerRecommendations[0].target.memory" \
--all-namespaces
```

The output from this query resembles the following:

```
NAME                  SIGNATURE                MODE      MEM
ds-example-id-1-vpa   job-signature-1          Off       none
ds-example-id-2-vpa   job-signature-2          Initial   12936384283
```
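
The `MEM` column can be hard to read at a glance. Assuming that the value is a plain byte count, as in the example output, a small helper like the following converts it to GiB. The function name `bytes_to_gib` is hypothetical.

```shell
# Convert a byte count (as shown in the MEM column) to GiB with two decimals.
bytes_to_gib() {
  awk -v bytes="$1" 'BEGIN { printf "%.2f GiB\n", bytes / (1024 * 1024 * 1024) }'
}

# Usage (run manually):
#   bytes_to_gib 12936384283
```

For the second row of the example output, this prints `12.05 GiB`. The helper doesn't handle values that carry a unit suffix, and it isn't meant for rows without a recommendation.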

## Query and delete the vertical autoscaling recommendations for your cluster
<a name="jobruns-vas-monitor-query"></a>

When you delete an Amazon EMR vertical autoscaling job-run resource, it automatically deletes the associated VPA object that tracks and stores recommendations.

The following example uses kubectl to purge recommendations for a job that is identified by a signature:

```
kubectl delete jobrun -n emr -l=emr-containers\.amazonaws\.com/dynamic\.sizing\.signature=integ-test
jobrun.dynamicsizing.emr.services.k8s.aws "ds-job-signature" deleted
```

If you don't know the specific job signature, or want to purge all of the resources on the cluster, you can use `--all` and `--all-namespaces` in your command instead of the job signature, as shown in the following example:

```
kubectl delete jobruns --all --all-namespaces
jobrun.dynamicsizing.emr.services.k8s.aws "ds-example-id" deleted
```
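
Because deletion also removes the stored recommendations, you might want to list the matching resources before you purge them. The following sketch wraps that two-step flow. The function names `signature_selector` and `purge_signature` are hypothetical, and the sketch assumes that the `jobrun` resources carry the signature label used in the previous examples.

```shell
# Build the label selector that matches resources for one job signature.
signature_selector() {
  printf 'emr-containers.amazonaws.com/dynamic.sizing.signature=%s\n' "$1"
}

# List, then delete, the jobrun resources for a signature.
# $1 is the namespace, $2 is the job signature.
purge_signature() {
  selector=$(signature_selector "$2")
  # Show the matching resources first so that nothing is deleted by surprise.
  kubectl get jobrun -n "$1" -l "$selector" || return 1
  kubectl delete jobrun -n "$1" -l "$selector"
}

# Usage (run manually):
#   purge_signature emr integ-test
```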

# Uninstall the Amazon EMR on EKS vertical autoscaling operator
<a name="jobruns-vas-uninstall-operator"></a>

If you want to remove the vertical autoscaling operator from your Amazon EKS cluster, use the `cleanup` command with the Operator SDK CLI as shown in the following example. This also deletes upstream dependencies that were installed with the operator, such as the Vertical Pod Autoscaler.

```
operator-sdk cleanup emr-dynamic-sizing
```

If there are any running jobs on the cluster when you delete the operator, those jobs continue to run without vertical autoscaling. If you submit jobs on the cluster after you delete the operator, Amazon EMR on EKS will ignore any vertical autoscaling-related parameters that you may have defined during [configuration](jobruns-vas-configure.md).