

# Manage Amazon EMR clusters
<a name="emr-manage"></a>

 After you've launched your cluster, you can connect to it and manage it. Amazon EMR provides a collection of tools you can use to do this. This section provides guidance for connecting to your cluster, giving it work to do, monitoring work as it's done, and troubleshooting issues. 

**Topics**
+ [Connect to an Amazon EMR cluster](emr-connect-master-node.md)
+ [Submit work to an Amazon EMR cluster](emr-work-with-steps.md)
+ [View and monitor an Amazon EMR cluster as it performs work](emr-manage-view.md)
+ [Use Amazon EMR cluster scaling to adjust for changing workloads](emr-scale-on-demand.md)
+ [Terminate an Amazon EMR cluster in the starting, running, or waiting states](UsingEMR_TerminateJobFlow.md)
+ [Clone an Amazon EMR cluster using the console](clone-console.md)
+ [Automate recurring Amazon EMR clusters with AWS Data Pipeline](emr-manage-recurring.md)

# Connect to an Amazon EMR cluster
<a name="emr-connect-master-node"></a>

When you run an Amazon EMR cluster, often all you need to do is run an application to analyze your data and then collect the output from an Amazon S3 bucket. At other times, you might want to interact with the primary node while the cluster is running. For example, you might want to connect to the primary node to run interactive queries, check log files, debug a problem with the cluster, or monitor performance using an application such as Ganglia that runs on the primary node. The following sections describe techniques that you can use to connect to the primary node. 

In an EMR cluster, the primary node is an Amazon EC2 instance that coordinates the EC2 instances that are running as task and core nodes. The primary node exposes a public DNS name that you can use to connect to it. By default, Amazon EMR creates security group rules for the primary node, and for core and task nodes, that determine how you access the nodes.

**Note**  
You can connect to the primary node only while the cluster is running. When the cluster terminates, the EC2 instance acting as the primary node is terminated and is no longer available. To connect to the primary node, you must also authenticate to the cluster. You can either use Kerberos for authentication, or specify an Amazon EC2 key pair private key when you launch the cluster. For more information about configuring Kerberos, and then connecting, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md). When you launch a cluster from the console, the Amazon EC2 key pair private key is specified in the **Security and Access** section on the **Create Cluster** page. 

By default, the ElasticMapReduce-master security group does not permit inbound SSH access. You may need to add an inbound rule that allows SSH access (TCP port 22) from the sources you want to have access. For more information about modifying security group rules, see [Adding rules to a security group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html) in the *Amazon EC2 User Guide*.

**Important**  
Do not modify the remaining rules in the ElasticMapReduce-master security group. Modifying these rules may interfere with the operation of the cluster. 

**Topics**
+ [Before you connect to Amazon EMR: Authorize inbound traffic](emr-connect-ssh-prereqs.md)
+ [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md)

# Before you connect to Amazon EMR: Authorize inbound traffic
<a name="emr-connect-ssh-prereqs"></a>

Before you connect to an Amazon EMR cluster, you must authorize inbound SSH traffic (port 22) from trusted sources, such as your computer's IP address. To do so, edit the managed security group rules for the nodes that you want to connect to. For example, the following instructions show you how to add an inbound rule for SSH access to the default ElasticMapReduce-master security group.

For more information about using security groups with Amazon EMR, see [Control network traffic with security groups for your Amazon EMR cluster](emr-security-groups.md).

------
#### [ Console ]

**To grant trusted sources SSH access to the primary security group with the console**

To edit your security groups, you must have permission to manage security groups for the VPC that the cluster is in. For more information, see [Changing Permissions for a user](https://docs.aws.amazon.com//IAM/latest/UserGuide/id_users_change-permissions.html) and the [Example Policy](https://docs.aws.amazon.com//IAM/latest/UserGuide/reference_policies_examples_ec2_securitygroups-vpc.html) that allows managing EC2 security groups in the *IAM User Guide*.

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose the cluster that you want to update. This opens the cluster details page with the **Properties** tab preselected.

1. Under **Networking** in the **Properties** tab, select the arrow next to **EC2 security groups (firewall)** to expand this section. Under **Primary node**, select the security group link. This opens the EC2 console.

1. Choose the **Inbound rules** tab and then choose **Edit inbound rules**.

1. Check for an inbound rule that allows public access with the following settings. If it exists, choose **Delete** to remove it.
   + **Type**: SSH
   + **Port**: 22
   + **Source**: Custom 0.0.0.0/0
**Warning**  
Before December 2020, the ElasticMapReduce-master security group had a pre-configured rule to allow inbound traffic on Port 22 from all sources. This rule was created to simplify initial SSH connections to the primary node. We strongly recommend that you remove this inbound rule and restrict traffic to trusted sources.

1. Scroll to the bottom of the list of rules and choose **Add Rule**.

1. For **Type**, select **SSH**. This selection automatically enters **TCP** for **Protocol** and **22** for **Port Range**.

1. For **Source**, select **My IP** to automatically add your IP address as the source address. You can also add a range of **Custom** trusted client IP addresses, or create additional rules for other clients. Many network environments dynamically allocate IP addresses, so you might need to update your IP addresses for trusted clients in the future.

1. Choose **Save**.

1. Optionally, return to step 3, choose **Core and task nodes**, and repeat steps 4 through 8. This grants SSH access to the core and task nodes.

------
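
If you prefer the command line, the same rule can be added with the AWS CLI (a hedged sketch, not the documented procedure; the security group ID `sg-0123456789abcdef0` is a placeholder for the ElasticMapReduce-master group ID shown in the console):

```shell
# Look up your current public IP, then allow SSH from that address only.
# Replace the placeholder group ID with your cluster's primary security group.
MY_IP=$(curl -s https://checkip.amazonaws.com)
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 22 \
    --cidr "${MY_IP}/32"
```

As with the console steps, a `/32` CIDR limits the rule to a single trusted address rather than opening port 22 to the internet.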

# Connect to the Amazon EMR cluster primary node using SSH
<a name="emr-connect-master-node-ssh"></a>

Secure Shell (SSH) is a network protocol you can use to create a secure connection to a remote computer. After you make a connection, the terminal on your local computer behaves as if it is running on the remote computer. Commands you issue locally run on the remote computer, and the command output from the remote computer appears in your terminal window. 

When you use SSH with AWS, you are connecting to an EC2 instance, which is a virtual server running in the cloud. When working with Amazon EMR, the most common use of SSH is to connect to the EC2 instance that is acting as the primary node of the cluster.

Using SSH to connect to the primary node gives you the ability to monitor and interact with the cluster. You can issue Linux commands on the primary node, run applications such as Hive and Pig interactively, browse directories, read log files, and so on. You can also create a tunnel in your SSH connection to view the web interfaces hosted on the primary node. For more information, see [View web interfaces hosted on Amazon EMR clusters](emr-web-interfaces.md).

To connect to the primary node using SSH, you need the public DNS name of the primary node. In addition, the security group associated with the primary node must have an inbound rule that allows SSH (TCP port 22) traffic from a source that includes the client where the SSH connection originates. You may need to add a rule to allow an SSH connection from your client. For more information about modifying security group rules, see [Control network traffic with security groups for your Amazon EMR cluster](emr-security-groups.md) and [Adding rules to a security group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html) in the *Amazon EC2 User Guide*.

## Retrieve the public DNS name of the primary node
<a name="emr-connect-master-dns"></a>

You can retrieve the primary node's public DNS name using the Amazon EMR console or the AWS CLI. 

------
#### [ Console ]

**To retrieve the public DNS name of the primary node with the new console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then select the cluster where you want to retrieve the public DNS name.

1. Note the **Primary node public DNS** value in the **Summary** section of the cluster details page.

------
#### [ CLI ]<a name="emr-connect-master-dns-cli"></a>

**To retrieve the public DNS name of the primary node with the AWS CLI**

1. To retrieve the cluster identifier, type the following command.

   ```
   aws emr list-clusters
   ```

   The output lists your clusters including the cluster IDs. Note the cluster ID for the cluster to which you are connecting.

   ```
   "Status": {
       "Timeline": {
           "ReadyDateTime": 1408040782.374,
           "CreationDateTime": 1408040501.213
       },
       "State": "WAITING",
       "StateChangeReason": {
           "Message": "Waiting after step completed"
       }
   },
   "NormalizedInstanceHours": 4,
   "Id": "j-2AL4XXXXXX5T9",
   "Name": "My cluster"
   ```

1. To list the cluster instances including the public DNS name for the cluster, type one of the following commands. Replace *j-2AL4XXXXXX5T9* with the cluster ID returned by the previous command.

   ```
   aws emr list-instances --cluster-id j-2AL4XXXXXX5T9
   ```

   Or:

   ```
   aws emr describe-cluster --cluster-id j-2AL4XXXXXX5T9
   ```

   The output lists the cluster instances including DNS names and IP addresses. Note the value for `PublicDnsName`. 

   ```
   "Status": {
       "Timeline": {
           "ReadyDateTime": 1408040779.263,
           "CreationDateTime": 1408040515.535
       },
       "State": "RUNNING",
       "StateChangeReason": {}
   },
   "Ec2InstanceId": "i-e89b45e7",
   "PublicDnsName": "ec2-###-##-##-###.us-west-2.compute.amazonaws.com",
   "PrivateDnsName": "ip-###-##-##-###.us-west-2.compute.internal",
   "PublicIpAddress": "##.###.###.##",
   "Id": "ci-12XXXXXXXXFMH",
   "PrivateIpAddress": "###.##.#.###"
   ```

For more information, see [Amazon EMR commands in the AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/emr).
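
If you only need the DNS name, a JMESPath `--query` can extract it directly from `describe-cluster` (a sketch using the example cluster ID from above; `Cluster.MasterPublicDnsName` is populated for running clusters):

```shell
# Print only the primary node's public DNS name as plain text.
aws emr describe-cluster --cluster-id j-2AL4XXXXXX5T9 \
    --query 'Cluster.MasterPublicDnsName' --output text
```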

------

## Connect to the primary node using SSH and an Amazon EC2 private key on Linux, Unix, and Mac OS X
<a name="emr-connect-linux"></a>

To create an SSH connection authenticated with a private key file, you need to specify the Amazon EC2 key pair private key when you launch a cluster. For more information about accessing your key pair, see [Amazon EC2 key pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*.

Your Linux computer most likely includes an SSH client by default. For example, OpenSSH is installed on most Linux, Unix, and macOS operating systems. You can check for an SSH client by typing **ssh** at the command line. If your computer does not recognize the command, install an SSH client to connect to the primary node. The OpenSSH project provides a free implementation of the full suite of SSH tools. For more information, see the [OpenSSH](http://www.openssh.org/) website.

The following instructions demonstrate opening an SSH connection to the Amazon EMR primary node on Linux, Unix, and Mac OS X. <a name="emr-keypair-file-permission-config"></a>

**To configure the key pair private key file permissions**

Before you can use your Amazon EC2 key pair private key to create an SSH connection, you must set permissions on the `.pem` file so that only the key owner has permission to access the file. This is required for creating an SSH connection using terminal or the AWS CLI.

1. Ensure you've allowed inbound SSH traffic. For instructions, see [Before you connect to Amazon EMR: Authorize inbound traffic](emr-connect-ssh-prereqs.md).

1. Locate your `.pem` file. These instructions assume that the file is named `mykeypair.pem` and that it is stored in the current user's home directory.

1. Type the following command to set the permissions. Replace *~/mykeypair.pem* with the full path and file name of your key pair private key file. For example, `~/.ssh/mykeypair.pem`.

   ```
   chmod 400 ~/mykeypair.pem
   ```

   If you do not set permissions on the `.pem` file, you will receive an error indicating that your key file is unprotected and the key will be rejected. To connect, you only need to set permissions on the key pair private key file the first time you use it.<a name="emr-ssh"></a>
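
To confirm that the permissions took effect, you can inspect the file mode (a sketch using a stand-in file; `stat -c` is the GNU coreutils form, so on Mac OS X use `stat -f %Lp` instead):

```shell
# Create a stand-in key file, restrict it to owner-read-only, and verify.
touch /tmp/mykeypair.pem
chmod 400 /tmp/mykeypair.pem
stat -c %a /tmp/mykeypair.pem    # prints 400 on Linux
```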

**To connect to the primary node using the terminal**

1. Open a terminal window. On Mac OS X, choose **Applications > Utilities > Terminal**. On other Linux distributions, terminal is typically found at **Applications > Accessories > Terminal**.

1. To establish a connection to the primary node, type the following command. Replace *ec2-###-##-##-###.compute-1.amazonaws.com* with the primary node public DNS name of your cluster and replace *~/mykeypair.pem* with the full path and file name of your `.pem` file. For example, `~/.ssh/mykeypair.pem`.

   ```
   ssh hadoop@ec2-###-##-##-###.compute-1.amazonaws.com -i ~/mykeypair.pem
   ```
**Important**  
You must use the login name `hadoop` when you connect to the Amazon EMR primary node; otherwise, you may see an error similar to `Server refused our key`.

1. A warning states that the authenticity of the host you are connecting to cannot be verified. Type `yes` to continue.

1.  When you are done working on the primary node, type the following command to close the SSH connection. 

   ```
   exit
   ```

If you're experiencing difficulty with using SSH to connect to your primary node, see [Troubleshoot connecting to your instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesConnecting.html).

## Connect to the primary node using SSH on Windows
<a name="emr-connect-win"></a>

Windows users can use an SSH client such as PuTTY to connect to the primary node. Before connecting to the Amazon EMR primary node, you should download and install PuTTY and PuTTYgen. You can download these tools from the [PuTTY download page](http://www.chiark.greenend.org.uk/~sgtatham/putty/).

PuTTY does not natively support the key pair private key file format (`.pem`) generated by Amazon EC2. Use PuTTYgen to convert your key file to the required PuTTY format (`.ppk`) before you attempt to connect to the primary node using PuTTY.

For more information about converting your key, see [Converting your private key using PuTTYgen](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html) in the *Amazon EC2 User Guide*.<a name="emr-ssh-windows"></a>
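
If you have the `puttygen` command-line tool (available on Linux through the putty-tools package), the conversion can also be scripted (a sketch; it assumes `mykeypair.pem` is in the current directory):

```shell
# Convert an EC2 .pem private key to PuTTY's .ppk format.
puttygen mykeypair.pem -O private -o mykeypair.ppk
```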

**To connect to the primary node using PuTTY**

1. Ensure you've allowed inbound SSH traffic. For instructions, see [Before you connect to Amazon EMR: Authorize inbound traffic](emr-connect-ssh-prereqs.md).

1. Open `putty.exe`. You can also launch PuTTY from the Windows programs list. 

1. If necessary, in the **Category** list, choose **Session**.

1. For **Host Name (or IP address)**, type `hadoop@`*MasterPublicDNS*. For example: `hadoop@ec2-###-##-##-###.compute-1.amazonaws.com`. 

1. In the **Category** list, choose **Connection > SSH**, **Auth**.

1. For **Private key file for authentication**, choose **Browse** and select the `.ppk` file that you generated. 

1. Choose **Open** and then **Yes** to dismiss the PuTTY security alert. 
**Important**  
When logging in to the primary node, type `hadoop` if you are prompted for a user name.

1. When you are done working on the primary node, you can close the SSH connection by closing PuTTY.
**Note**  
To prevent the SSH connection from timing out, you can choose **Connection** in the **Category** list and select the option **Enable TCP keepalives**. If you have an active SSH session in PuTTY, you can change your settings by opening the context menu (right-click) for the PuTTY title bar and choosing **Change Settings**.

If you're experiencing difficulty with using SSH to connect to your primary node, see [Troubleshoot connecting to your instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesConnecting.html).

## Connect to the primary node using the AWS CLI
<a name="emr-connect-cli"></a>

You can create an SSH connection with the primary node using the AWS CLI on Windows and on Linux, Unix, and Mac OS X. Regardless of the platform, you need the public DNS name of the primary node and your Amazon EC2 key pair private key. If you are using the AWS CLI on Linux, Unix, or Mac OS X, you must also set permissions on the private key (`.pem` or `.ppk`) file as shown in [To configure the key pair private key file permissions](#emr-keypair-file-permission-config).<a name="emr-ssh-cli"></a>

**To connect to the primary node using the AWS CLI**

1. Ensure you've allowed inbound SSH traffic. For instructions, see [Before you connect to Amazon EMR: Authorize inbound traffic](emr-connect-ssh-prereqs.md).

1. To retrieve the cluster identifier, type:

   ```
   aws emr list-clusters
   ```

   The output lists your clusters including the cluster IDs. Note the cluster ID for the cluster to which you are connecting.

   ```
   "Status": {
       "Timeline": {
           "ReadyDateTime": 1408040782.374,
           "CreationDateTime": 1408040501.213
       },
       "State": "WAITING",
       "StateChangeReason": {
           "Message": "Waiting after step completed"
       }
   },
   "NormalizedInstanceHours": 4,
   "Id": "j-2AL4XXXXXX5T9",
   "Name": "AWS CLI cluster"
   ```

1. Type the following command to open an SSH connection to the primary node. In the following example, replace *j-2AL4XXXXXX5T9* with the cluster ID and replace *~/mykeypair.key* with the full path and file name of your `.pem` file (for Linux, Unix, and Mac OS X) or `.ppk` file (for Windows). For example, `C:\Users\<username>\.ssh\mykeypair.ppk`.

   ```
   aws emr ssh --cluster-id j-2AL4XXXXXX5T9 --key-pair-file ~/mykeypair.key
   ```

1. When you are done working on the primary node, close the AWS CLI window. 

   For more information, see [Amazon EMR commands in the AWS CLI](https://docs.aws.amazon.com/cli/latest/reference/emr). If you're experiencing difficulty with using SSH to connect to your primary node, see [Troubleshoot connecting to your instance](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/TroubleshootingInstancesConnecting.html).

# Amazon EMR service ports
<a name="emr-service-ports"></a>

**Note**  
The following are interfaces and service ports for components on Amazon EMR. This is not a complete list of service ports; nondefault services, such as SSL ports and ports for other protocols, are not listed.

**Important**  
Use caution when you edit security group rules to open ports. Be sure to add rules that only allow traffic from trusted and authenticated clients for the protocols and ports that are required to run your workloads.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-service-ports.html)

# View web interfaces hosted on Amazon EMR clusters
<a name="emr-web-interfaces"></a>

**Important**  
It is possible to configure a custom security group to allow inbound access to these web interfaces. Keep in mind that any port on which you allow inbound traffic represents a potential security vulnerability. Carefully review custom security groups to ensure that you minimize vulnerabilities. For more information, see [Control network traffic with security groups for your Amazon EMR cluster](emr-security-groups.md).

Hadoop and other applications that you install on your EMR cluster publish user interfaces as web sites that are hosted on the primary node. For security reasons, when using Amazon EMR Managed Security Groups, these web sites are only available on the primary node's local web server. For that reason, you need to connect to the primary node to view the web interfaces. For more information, see [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md). Hadoop also publishes user interfaces as web sites hosted on the core and task nodes. These web sites are also only available on local web servers on the nodes. 

The following table lists web interfaces that you can view on cluster instances. These Hadoop interfaces are available on all clusters. For the master instance interfaces, replace *master-public-dns-name* with the **Master public DNS** listed on the cluster **Summary** tab in the Amazon EMR console. For core and task instance interfaces, replace *coretask-public-dns-name* with the **Public DNS name** listed for the instance. To find an instance's **Public DNS name**, in the Amazon EMR console, choose your cluster from the list, choose the **Hardware** tab, choose the **ID** of the instance group that contains the instance you want to connect to, and then note the **Public DNS name** listed for the instance.


|  Name of interface |   URI  | 
| --- | --- | 
| Flink history server (EMR version 5.33 and later) | http://master-public-dns-name:8082/ | 
| Ganglia | http://master-public-dns-name/ganglia/ | 
| Hadoop HDFS NameNode (EMR version pre-6.x) | http://master-public-dns-name:50070/ | 
| Hadoop HDFS NameNode (EMR version pre-6.x) | https://master-public-dns-name:50470/ | 
| Hadoop HDFS NameNode (EMR version 6.x) | http://master-public-dns-name:9870/ | 
| Hadoop HDFS NameNode (EMR version 6.x) | https://master-public-dns-name:9871/ | 
| Hadoop HDFS DataNode (EMR version pre-6.x) | http://coretask-public-dns-name:50075/ | 
| Hadoop HDFS DataNode (EMR version pre-6.x) | https://coretask-public-dns-name:50475/ | 
| Hadoop HDFS DataNode (EMR version 6.x) | http://coretask-public-dns-name:9864/ | 
| Hadoop HDFS DataNode (EMR version 6.x) | https://coretask-public-dns-name:9865/ | 
| HBase | http://master-public-dns-name:16010/ | 
| Hue | http://master-public-dns-name:8888/ | 
| JupyterHub | https://master-public-dns-name:9443/ | 
| Livy | http://master-public-dns-name:8998/ | 
| Spark HistoryServer | http://master-public-dns-name:18080/ | 
| Tez | http://master-public-dns-name:8080/tez-ui | 
| YARN NodeManager | http://coretask-public-dns-name:8042/ | 
| YARN ResourceManager | http://master-public-dns-name:8088/ | 
| Zeppelin | http://master-public-dns-name:8890/ | 

Because several application-specific interfaces are available on the primary node but not on the core and task nodes, the instructions in this document are specific to the Amazon EMR primary node. You can access the web interfaces on the core and task nodes in the same way that you access them on the primary node. 

There are several ways you can access the web interfaces on the primary node. The easiest and quickest method is to use SSH to connect to the primary node and use the text-based browser, Lynx, to view the web sites in your SSH client. However, Lynx is a text-based browser with a limited user interface that cannot display graphics. The following example shows how to open the Hadoop ResourceManager interface using Lynx (Lynx URLs are also provided when you log into the primary node using SSH). 

```
lynx http://ip-###-##-##-###.us-west-2.compute.internal:8088/
```

There are two remaining options for accessing web interfaces on the primary node that provide full browser functionality. Choose one of the following: 
+ Option 1 (recommended for more technical users): Use an SSH client to connect to the primary node, configure SSH tunneling with local port forwarding, and use an Internet browser to open web interfaces hosted on the primary node. This method allows you to configure web interface access without using a SOCKS proxy.
+ Option 2 (recommended for new users): Use an SSH client to connect to the primary node, configure SSH tunneling with dynamic port forwarding, and configure your Internet browser to use an add-on such as FoxyProxy for Firefox or SwitchyOmega for Chrome to manage your SOCKS proxy settings. This method lets you automatically filter URLs based on text patterns and limit the proxy settings to domains that match the form of the primary node's DNS name. For more information about how to configure FoxyProxy for Firefox and Google Chrome, see [Option 2, part 2: Configure proxy settings to view websites hosted on the Amazon EMR cluster primary node](emr-connect-master-node-proxy.md).

**Note**  
If you modify the port where an application runs via cluster configuration, the hyperlink to the port will not update in the Amazon EMR console. This is because the console can't read the `server.port` configuration.

With Amazon EMR version 5.25.0 or later, you can access Spark history server UI from the console without setting up a web proxy through an SSH connection. For more information, see [One-click access to persistent Spark history server](https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html).

**Topics**
+ [Option 1: Set up an SSH tunnel to the Amazon EMR primary node using local port forwarding](emr-ssh-tunnel-local.md)
+ [Option 2, part 1: Set up an SSH tunnel to the primary node using dynamic port forwarding](emr-ssh-tunnel.md)
+ [Option 2, part 2: Configure proxy settings to view websites hosted on the Amazon EMR cluster primary node](emr-connect-master-node-proxy.md)

# Option 1: Set up an SSH tunnel to the Amazon EMR primary node using local port forwarding
<a name="emr-ssh-tunnel-local"></a>

To connect to the local web server on the primary node, you create an SSH tunnel between your computer and the primary node. This is also known as *port forwarding*. If you do not wish to use a SOCKS proxy, you can set up an SSH tunnel to the primary node using local port forwarding. With local port forwarding, you specify unused local ports that are used to forward traffic to specific remote ports on the primary node's local web server. 

Setting up an SSH tunnel using local port forwarding requires the public DNS name of the primary node and your key pair private key file. For information about how to locate the master public DNS name, see [Retrieve the public DNS name of the primary node](emr-connect-master-node-ssh.md#emr-connect-master-dns). For more information about accessing your key pair, see [Amazon EC2 key pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*. For more information about the sites you might want to view on the primary node, see [View web interfaces hosted on Amazon EMR clusters](emr-web-interfaces.md).

## Set up an SSH tunnel to the primary node using local port forwarding with OpenSSH
<a name="ssh-tunnel-local-linux"></a><a name="tunnel-local-linux"></a>

**To set up an SSH tunnel using local port forwarding in terminal**

1. Ensure you've allowed inbound SSH traffic. For instructions, see [Before you connect to Amazon EMR: Authorize inbound traffic](emr-connect-ssh-prereqs.md).

1. Open a terminal window. On Mac OS X, choose **Applications > Utilities > Terminal**. On other Linux distributions, terminal is typically found at **Applications > Accessories > Terminal**.

1. Type the following command to open an SSH tunnel on your local machine. This example command accesses the ResourceManager web interface by forwarding traffic on local port 8157 (a randomly chosen unused local port) to port 8088 on the primary node's local web server. 

   In the command, replace *~/mykeypair.pem* with the location and file name of your `.pem` file and replace *ec2-###-##-##-###.compute-1.amazonaws.com* with the primary node public DNS name of your cluster. To access a different web interface, replace `8088` with the appropriate port number. For example, replace `8088` with `8890` for the Zeppelin interface.

   ```
   ssh -i ~/mykeypair.pem -N -L 8157:ec2-###-##-##-###.compute-1.amazonaws.com:8088 hadoop@ec2-###-##-##-###.compute-1.amazonaws.com
   ```

   `-L` signifies the use of local port forwarding, which allows you to specify a local port used to forward data to the identified remote port on the primary node's local web server.

   After you issue this command, the terminal remains open and does not return a response. 

1. To open the ResourceManager web interface in your browser, type `http://localhost:8157/` in the address bar. 

1. When you are done working with the web interfaces on the primary node, close the terminal windows.

# Option 2, part 1: Set up an SSH tunnel to the primary node using dynamic port forwarding
<a name="emr-ssh-tunnel"></a>

To connect to the local web server on the primary node, you create an SSH tunnel between your computer and the primary node. This is also known as *port forwarding*. If you create your SSH tunnel using dynamic port forwarding, all traffic routed to a specified unused local port is forwarded to the local web server on the primary node. This creates a SOCKS proxy. You can then configure your Internet browser to use an add-on such as FoxyProxy or SwitchyOmega to manage your SOCKS proxy settings. 

Using a proxy management add-on allows you to automatically filter URLs based on text patterns and to limit the proxy settings to domains that match the form of the primary node's public DNS name. The browser add-on automatically handles turning the proxy on and off when you switch between viewing websites hosted on the primary node, and those on the Internet. 

Before you begin, you need the public DNS name of the primary node and your key pair private key file. For information about how to locate the primary public DNS name, see [Retrieve the public DNS name of the primary node](emr-connect-master-node-ssh.md#emr-connect-master-dns). For more information about accessing your key pair, see [Amazon EC2 key pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*. For more information about the sites you might want to view on the primary node, see [View web interfaces hosted on Amazon EMR clusters](emr-web-interfaces.md).

## Set up an SSH tunnel to the primary node using dynamic port forwarding with OpenSSH
<a name="emr-ssh-tunnel-linux"></a><a name="emr-ssh-tunnel-unix"></a>

**To set up an SSH tunnel using dynamic port forwarding with OpenSSH**

1. Ensure you've allowed inbound SSH traffic. For instructions, see [Before you connect to Amazon EMR: Authorize inbound traffic](emr-connect-ssh-prereqs.md).

1. Open a terminal window. On Mac OS X, choose **Applications > Utilities > Terminal**. On other Linux distributions, terminal is typically found at **Applications > Accessories > Terminal**.

1. Type the following command to open an SSH tunnel on your local machine. Replace *~/mykeypair.pem* with the location and file name of your `.pem` file, replace *8157* with an unused, local port number, and replace *ec2-###-##-##-###.compute-1.amazonaws.com* with the primary public DNS name of your cluster. 

   ```
   ssh -i ~/mykeypair.pem -N -D 8157 hadoop@ec2-###-##-##-###.compute-1.amazonaws.com
   ```

   After you issue this command, the terminal remains open and does not return a response. 
**Note**  
`-D` signifies the use of dynamic port forwarding which allows you to specify a local port used to forward data to all remote ports on the primary node's local web server. Dynamic port forwarding creates a local SOCKS proxy listening on the port specified in the command.

1. After the tunnel is active, configure a SOCKS proxy for your browser. For more information, see [Option 2, part 2: Configure proxy settings to view websites hosted on the Amazon EMR cluster primary node](emr-connect-master-node-proxy.md).

1. When you are done working with the web interfaces on the primary node, close the terminal window.
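Before you configure the browser, you can confirm from a second terminal window that the tunnel is working. The following sketch routes a request through the SOCKS proxy to port 8088 on the primary node, which is where the YARN ResourceManager web UI typically listens; the DNS name is a placeholder for your cluster's primary public DNS name, and the command requires a running cluster.

```shell
# Send a test request through the SOCKS proxy that ssh -D created.
# Replace the DNS name with your cluster's primary public DNS name.
curl --silent --socks5-hostname localhost:8157 \
  "http://ec2-###-##-##-###.compute-1.amazonaws.com:8088/" | head -n 5
```

If the tunnel is up, the command prints the first few lines of the ResourceManager page HTML; if not, `curl` reports a connection failure on port 8157.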

## Set up an SSH tunnel using dynamic port forwarding with the AWS CLI
<a name="emr-ssh-tunnel-cli"></a>

You can create an SSH connection with the primary node using the AWS CLI on Windows and on Linux, Unix, and Mac OS X. If you are using the AWS CLI on Linux, Unix, or Mac OS X, you must set permissions on the `.pem` file as shown in [To configure the key pair private key file permissions](emr-connect-master-node-ssh.md#emr-keypair-file-permission-config). If you are using the AWS CLI on Windows, PuTTY must appear in the path environment variable, or you may receive an error such as `OpenSSH or PuTTY not available`.<a name="ssh-tunnel-cli"></a>

**To set up an SSH tunnel using dynamic port forwarding with the AWS CLI**

1. Ensure you've allowed inbound SSH traffic. For instructions, see [Before you connect to Amazon EMR: Authorize inbound traffic](emr-connect-ssh-prereqs.md).

1. Create an SSH connection with the primary node as shown in [Connect to the primary node using the AWS CLI](emr-connect-master-node-ssh.md#emr-connect-cli). 

1. To retrieve the cluster identifier, type:

   ```
   aws emr list-clusters
   ```

   The output lists your clusters including the cluster IDs. Note the cluster ID for the cluster to which you are connecting.

   ```
   "Status": {
       "Timeline": {
           "ReadyDateTime": 1408040782.374,
           "CreationDateTime": 1408040501.213
       },
       "State": "WAITING",
       "StateChangeReason": {
           "Message": "Waiting after step completed"
       }
   },
   "NormalizedInstanceHours": 4,
   "Id": "j-2AL4XXXXXX5T9",
   "Name": "AWS CLI cluster"
   ```

1. Type the following command to open an SSH tunnel to the primary node using dynamic port forwarding. In the following example, replace *j-2AL4XXXXXX5T9* with the cluster ID and replace *~/mykeypair.key* with the location and file name of your `.pem` file (for Linux, Unix, and Mac OS X) or `.ppk` file (for Windows).

   ```
   aws emr socks --cluster-id j-2AL4XXXXXX5T9 --key-pair-file ~/mykeypair.key
   ```
**Note**  
The `socks` command automatically configures dynamic port forwarding on local port 8157. Currently, this setting cannot be modified.

1. After the tunnel is active, configure a SOCKS proxy for your browser. For more information, see [Option 2, part 2: Configure proxy settings to view websites hosted on the Amazon EMR cluster primary node](emr-connect-master-node-proxy.md).

1. When you are done working with the web interfaces on the primary node, close the AWS CLI window. 

   For more information on using Amazon EMR commands in the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/emr](https://docs.aws.amazon.com/cli/latest/reference/emr).
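As a supplement to the `list-clusters` step above, you can filter the JSON output instead of reading it by eye. The following sketch assumes `jq` is installed and runs the filter against a saved copy of output shaped like the earlier example; the file name and cluster values are illustrative.

```shell
# Save example list-clusters output, then extract the ID of each
# cluster in the WAITING state (file name is illustrative).
cat > clusters.json <<'EOF'
{
  "Clusters": [
    {
      "Id": "j-2AL4XXXXXX5T9",
      "Name": "AWS CLI cluster",
      "Status": { "State": "WAITING" }
    }
  ]
}
EOF
jq -r '.Clusters[] | select(.Status.State == "WAITING") | .Id' clusters.json
```

With live credentials you can get a similar result without `jq` by using the CLI's built-in filtering, for example `aws emr list-clusters --active --query 'Clusters[].Id' --output text`.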

## Set up an SSH tunnel to the primary node using PuTTY
<a name="emr-ssh-tunnel-win"></a>

Windows users can use an SSH client such as PuTTY to create an SSH tunnel to the primary node. Before connecting to the Amazon EMR primary node, you should download and install PuTTY and PuTTYgen. You can download these tools from the [PuTTY download page](http://www.chiark.greenend.org.uk/~sgtatham/putty/).

PuTTY does not natively support the key pair private key file format (`.pem`) generated by Amazon EC2. You use PuTTYgen to convert your key file to the required PuTTY format (`.ppk`). You must convert your key into this format (`.ppk`) before attempting to connect to the primary node using PuTTY.

For more information about converting your key, see [Converting your private key using PuTTYgen](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html) in the *Amazon EC2 User Guide*.<a name="emr-ssh-tunnel-putty"></a>

**To set up an SSH tunnel using dynamic port forwarding using PuTTY**

1. Ensure you've allowed inbound SSH traffic. For instructions, see [Before you connect to Amazon EMR: Authorize inbound traffic](emr-connect-ssh-prereqs.md).

1. Double-click `putty.exe` to start PuTTY. You can also launch PuTTY from the Windows programs list. 
**Note**  
If you already have an active SSH session with the primary node, you can add a tunnel by right-clicking the PuTTY title bar and choosing **Change Settings**. 

1. If necessary, in the **Category** list, choose **Session**.

1. In the **Host Name** field, type **hadoop@***MasterPublicDNS*. For example: **hadoop@***ec2-###-##-##-###.compute-1.amazonaws.com*. 

1. In the **Category** list, expand **Connection > SSH**, and then choose **Auth**.

1. For **Private key file for authentication**, choose **Browse** and select the `.ppk` file that you generated. 
**Note**  
PuTTY does not natively support the key pair private key file format (`.pem`) generated by Amazon EC2. You use PuTTYgen to convert your key file to the required PuTTY format (`.ppk`). You must convert your key into this format (`.ppk`) before attempting to connect to the primary node using PuTTY.

1. In the **Category** list, expand **Connection > SSH**, and then choose **Tunnels**. 

1. In the **Source port** field, type `8157` (an unused local port), and then choose **Add**.

1. Leave the **Destination** field blank.

1. Select the **Dynamic** and **Auto** options.

1. Choose **Open**. 

1. Choose **Yes** to dismiss the PuTTY security alert.
**Important**  
When you log in to the primary node, type `hadoop` if you are prompted for a user name.

1. After the tunnel is active, configure a SOCKS proxy for your browser. For more information, see [Option 2, part 2: Configure proxy settings to view websites hosted on the Amazon EMR cluster primary node](emr-connect-master-node-proxy.md).

1. When you are done working with the web interfaces on the primary node, close the PuTTY window. 

# Option 2, part 2: Configure proxy settings to view websites hosted on the Amazon EMR cluster primary node
<a name="emr-connect-master-node-proxy"></a>

If you use an SSH tunnel with dynamic port forwarding, you must use a SOCKS proxy management add-on to control the proxy settings in your browser. Using a SOCKS proxy management tool allows you to automatically filter URLs based on text patterns and to limit the proxy settings to domains that match the form of the primary node's public DNS name. The browser add-on automatically handles turning the proxy on and off when you switch between viewing websites hosted on the primary node and those on the Internet. To manage your proxy settings, configure your browser to use an add-on such as FoxyProxy or SwitchyOmega. 

For more information about creating an SSH tunnel, see [Option 2, part 1: Set up an SSH tunnel to the primary node using dynamic port forwarding](emr-ssh-tunnel.md). For more information about the available web interfaces, see [View web interfaces hosted on Amazon EMR clusters](emr-web-interfaces.md). 



Include the following settings when you set up your proxy add-on:
+ Use **localhost** as the host address.
+ Use the same local port number that you selected to establish the SSH tunnel with the primary node in [Option 2, part 1: Set up an SSH tunnel to the primary node using dynamic port forwarding](emr-ssh-tunnel.md). For example, port *8157*. This port must also match the port number you use in PuTTY or any other terminal emulator you use to connect.
+ Specify the **SOCKS v5** protocol. SOCKS v5 lets you optionally set up user authorization.
+ **URL Patterns**

  The following URL patterns should be allow-listed and specified with a wildcard pattern type:
  + The **\*ec2\*.\*compute\*.amazonaws.com\*** and **\*10\*.amazonaws.com\*** patterns to match the public DNS name of clusters in US regions.
  + The **\*ec2\*.compute\*** and **\*10\*.compute\*** patterns to match the public DNS name of clusters in all other regions.
  + The **10.\*** pattern to provide access to the JobTracker log files in Hadoop. Alter this filter if it conflicts with your network access plan.
  + The **\*.ec2.internal\*** and **\*.compute.internal\*** patterns to match the private (internal) DNS names of clusters in the `us-east-1` region and all other regions, respectively.

## Example: Configure FoxyProxy for Firefox
<a name="emr-connect-foxy-proxy-chrome"></a>

The following example demonstrates a FoxyProxy Standard (version 7.5.1) configuration for Mozilla Firefox.

FoxyProxy provides a set of proxy management tools. It lets you use a proxy server for URLs that match patterns corresponding to domains used by the Amazon EC2 instances in your Amazon EMR cluster.<a name="foxy-proxy"></a>

**To install and configure FoxyProxy using Mozilla Firefox**

1. In Firefox, go to [https://addons.mozilla.org/](https://addons.mozilla.org/), search for FoxyProxy Standard, and follow the instructions to add FoxyProxy to Firefox.

1. Using a text editor, create a JSON file named `foxyproxy-settings.json` from the following example configuration.

   ```
   {
     "k20d21508277536715": {
       "active": true,
       "address": "localhost",
       "port": 8157,
       "username": "",
       "password": "",
       "type": 3,
       "proxyDNS": true,
       "title": "emr-socks-proxy",
       "color": "#0055E5",
       "index": 9007199254740991,
       "whitePatterns": [
         {
           "title": "*ec2*.*compute*.amazonaws.com*",
           "active": true,
           "pattern": "*ec2*.*compute*.amazonaws.com*",
           "importedPattern": "*ec2*.*compute*.amazonaws.com*",
           "type": 1,
           "protocols": 1
         },
         {
           "title": "*ec2*.compute*",
           "active": true,
           "pattern": "*ec2*.compute*",
           "importedPattern": "*ec2*.compute*",
           "type": 1,
           "protocols": 1
         },
         {
           "title": "10.*",
           "active": true,
           "pattern": "10.*",
           "importedPattern": "http://10.*",
           "type": 1,
           "protocols": 2
         },
         {
           "title": "*10*.amazonaws.com*",
           "active": true,
           "pattern": "*10*.amazonaws.com*",
           "importedPattern": "*10*.amazonaws.com*",
           "type": 1,
           "protocols": 1
         },
         {
           "title": "*10*.compute*",
           "active": true,
           "pattern": "*10*.compute*",
           "importedPattern": "*10*.compute*",
           "type": 1,
           "protocols": 1
         },
         {
           "title": "*.compute.internal*",
           "active": true,
           "pattern": "*.compute.internal*",
           "importedPattern": "*.compute.internal*",
           "type": 1,
           "protocols": 1
         },
         {
           "title": "*.ec2.internal* ",
           "active": true,
           "pattern": "*.ec2.internal*",
           "importedPattern": "*.ec2.internal*",
           "type": 1,
           "protocols": 1
         }
       ],
       "blackPatterns": []
     },
     "logging": {
       "size": 100,
       "active": false
     },
     "mode": "patterns",
     "browserVersion": "68.12.0",
     "foxyProxyVersion": "7.5.1",
     "foxyProxyEdition": "standard"
   }
   ```

1. Open the Firefox **Manage Your Extensions** page (go to **about:addons**, then choose **Extensions**).

1. Choose **FoxyProxy Standard**, then choose the more options button (the button that looks like an ellipsis).

1. Select **Options** from the dropdown.

1. Choose **Import Settings** from the left menu.

1. On the **Import Settings** page, choose **Import Settings** under **Import Settings from FoxyProxy 6.0+**, browse to the location of the `foxyproxy-settings.json` file you created, select the file, and choose **Open**. 

1. Choose **OK** when prompted to overwrite the existing settings and save your new configuration.

## Example: Configure SwitchyOmega for Chrome
<a name="switchyomega"></a>

The following example demonstrates how to set up the SwitchyOmega extension for Google Chrome. SwitchyOmega lets you configure, manage, and switch between multiple proxies.

**To install and configure SwitchyOmega using Google Chrome**

1. Go to [https://chrome.google.com/webstore/category/extensions](https://chrome.google.com/webstore/category/extensions), search for **Proxy SwitchyOmega**, and add it to Chrome.

1. Choose **New profile** and enter `emr-socks-proxy` as the profile name.

1. Choose **PAC profile** and then **Create**. [Proxy Auto-Configuration (PAC)](https://developer.mozilla.org/en-US/docs/Web/HTTP/Proxy_servers_and_tunneling/Proxy_Auto-Configuration_(PAC)_file) files help you define an allow list for browser requests that should be forwarded to a web proxy server.

1. In the **PAC Script** field, replace the contents with the following script that defines which URLs should be forwarded through your web proxy server. If you specified a different port number when you set up your SSH tunnel, replace *8157* with your port number.

   ```
   function FindProxyForURL(url, host) {
       if (shExpMatch(url, "*ec2*.*compute*.amazonaws.com*")) return 'SOCKS5 localhost:8157';
       if (shExpMatch(url, "*ec2*.compute*")) return 'SOCKS5 localhost:8157';
       if (shExpMatch(url, "http://10.*")) return 'SOCKS5 localhost:8157';
       if (shExpMatch(url, "*10*.compute*")) return 'SOCKS5 localhost:8157';
       if (shExpMatch(url, "*10*.amazonaws.com*")) return 'SOCKS5 localhost:8157';
       if (shExpMatch(url, "*.compute.internal*")) return 'SOCKS5 localhost:8157';
       if (shExpMatch(url, "*ec2.internal*")) return 'SOCKS5 localhost:8157';
       return 'DIRECT';
   }
   ```

1. Under **Actions**, choose **Apply changes** to save your proxy settings.

1. On the Chrome toolbar, choose SwitchyOmega and select the `emr-socks-proxy` profile.

## Access a web interface in the browser
<a name="connect-to-web-ui-browser"></a>

To open a web interface, enter the public DNS name of your primary or core node followed by the port number for your chosen interface into your browser address bar. The following example shows the URL you would enter to connect to the Spark HistoryServer.

```
http://master-public-dns-name:18080/
```

For instructions on retrieving the public DNS name of a node, see [Retrieve the public DNS name of the primary node](emr-connect-master-node-ssh.md#emr-connect-master-dns). For a complete list of web interface URLs, see [View web interfaces hosted on Amazon EMR clusters](emr-web-interfaces.md).

# Submit work to an Amazon EMR cluster
<a name="emr-work-with-steps"></a>

This section describes the methods that you can use to submit work to an Amazon EMR cluster. To submit work, you can add steps, or you can interactively submit Hadoop jobs to the primary node.

Consider the following rules of step behavior when you submit steps to a cluster:
+ A step ID can contain up to 256 characters.
+ You can have up to 256 PENDING and RUNNING steps in a cluster.
+ Even if you have 256 active steps running on a cluster, you can interactively submit jobs to the primary node. You can submit an unlimited number of steps over the lifetime of a long-running cluster, but only 256 steps can be RUNNING or PENDING at any given time.
+ With Amazon EMR versions 4.8.0 and later, except version 5.0.0, you can cancel pending steps. For more information, see [Cancel steps when you submit work to an Amazon EMR cluster](emr-cancel-steps.md).
+ With Amazon EMR versions 5.28.0 and later, you can cancel both pending and running steps. You can also choose to run multiple steps in parallel to improve cluster utilization and save cost. For more information, see [Considerations for running multiple steps in parallel when you submit work to Amazon EMR](emr-concurrent-steps.md).

**Note**  
For the best performance, we recommend that you store custom bootstrap actions, scripts, and other files that you want to use with Amazon EMR in an Amazon S3 bucket that is in the same AWS Region as your cluster.
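For example, you might stage a step script with the AWS CLI before referencing it from a step or bootstrap action. In the following sketch, the bucket name, script name, and Region are placeholders; choose the Region where you plan to run the cluster.

```shell
# Create a bucket in the cluster's Region and upload a step script.
# Bucket name, script name, and Region are placeholders.
aws s3 mb s3://amzn-s3-demo-bucket --region us-east-1
aws s3 cp my-script.sh s3://amzn-s3-demo-bucket/my-script.sh
```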

**Topics**
+ [Adding steps to a cluster with the Amazon EMR Management Console](emr-add-steps-console.md)
+ [Adding steps to an Amazon EMR cluster with the AWS CLI](add-step-cli.md)
+ [Considerations for running multiple steps in parallel when you submit work to Amazon EMR](emr-concurrent-steps.md)
+ [Viewing steps after submitting work to an Amazon EMR cluster](emr-view-steps.md)
+ [Cancel steps when you submit work to an Amazon EMR cluster](emr-cancel-steps.md)

# Adding steps to a cluster with the Amazon EMR Management Console
<a name="emr-add-steps-console"></a>

Use the following procedures to add steps to a cluster with the AWS Management Console. For detailed information about how to submit steps for specific big data applications, see the following sections of the *[Amazon EMR Release Guide](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-components.html)*:
+ [Submit a custom JAR step ](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-launch-custom-jar-cli.html) 
+ [Submit a Hadoop streaming step ](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/CLI_CreateStreaming.html) 
+ [Submit a Spark step ](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html) 
+ [Submit a Pig step](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-pig-launch.html#ConsoleCreatingaPigJob) 
+ [Run a command or script as a step ](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-launch-custom-jar-cli.html) 
+ [Pass values into steps to run Hive scripts](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-differences.html#emr-hive-additional-features) 

## Add steps during cluster creation
<a name="emr-add-steps-console-cluster-creation"></a>

From the AWS Management Console, you can add steps when you create a cluster.

------
#### [ Console ]

**To add steps when you create a cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. Under **Steps**, choose **Add step**. Enter appropriate values in the fields in the **Add step** dialog. For information on formatting your step arguments, see [Add step arguments](#emr-add-steps-console-arguments). Options differ depending on the step type. To add your step and exit the dialog, select **Add step**.

1. Choose any other options that apply to your cluster. 

1. To launch your cluster, choose **Create cluster**.

------

## Add steps to a running cluster
<a name="emr-add-steps-console-running-cluster"></a>

With the AWS Management Console, you can add steps to a cluster with the auto-terminate option disabled. 

------
#### [ Console ]

**To add steps to a running cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and select the cluster that you want to update.

1. On the **Steps** tab on the cluster details page, select **Add step**. To clone an existing step, choose the **Actions** dropdown menu and select **Clone step**.

1. Enter appropriate values in the fields in the **Add step** dialog. Options differ depending on the step type. To add your step and exit the dialog, choose **Add step**.

------

## Modify the step concurrency level in a running cluster
<a name="emr-add-steps-console-modify-concurrency"></a>

With the AWS Management Console, you can modify the step concurrency level in a running cluster. 

**Note**  
You can only run multiple steps in parallel with Amazon EMR version 5.28.0 and later. 

------
#### [ Console ]

**To modify step concurrency in a running cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and select the cluster that you want to update. The cluster must be running to change its concurrency attribute.

1. On the **Steps** tab on the cluster details page, find the **Attributes** section. Select **Edit** to change the concurrency. Enter a value between 1 and 256.

------

## Add step arguments
<a name="emr-add-steps-console-arguments"></a>

When you use the AWS Management Console to add a step to your cluster, you can specify arguments for that step in the **Arguments** field. You must separate arguments with whitespace and surround string arguments that consist of characters *and* whitespace with quotation marks.

**Example : Correct arguments**  
The following example arguments are formatted correctly for the AWS Management Console, with quotation marks around the final string argument.  

```
bash -c "aws s3 cp s3://amzn-s3-demo-bucket/my-script.sh ."
```
You can also put each argument on a separate line for readability as shown in the following example.  

```
bash 
-c 
"aws s3 cp s3://amzn-s3-demo-bucket/my-script.sh ."
```

**Example : Incorrect arguments**  
The following example arguments are improperly formatted for the AWS Management Console. Notice that the final string argument, `aws s3 cp s3://amzn-s3-demo-bucket/my-script.sh .`, contains whitespace and is not surrounded by quotation marks.  

```
bash -c aws s3 cp s3://amzn-s3-demo-bucket/my-script.sh .
```

# Adding steps to an Amazon EMR cluster with the AWS CLI
<a name="add-step-cli"></a>

The following procedures demonstrate how to add steps to a newly created cluster and to a running cluster with the AWS CLI. Both examples use the `--steps` subcommand to add steps to the cluster. 

**To add steps during cluster creation**
+ Type the following command to create a cluster and add a Spark step. Make sure to replace *`myKey`* with the name of your Amazon EC2 key pair.

  ```
  aws emr create-cluster --name "Test cluster" \
  --applications Name=Spark \
  --use-default-roles \
  --ec2-attributes KeyName=myKey \
  --instance-groups InstanceGroupType=PRIMARY,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
  --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--class","org.apache.spark.examples.SparkPi","/usr/lib/spark/examples/jars/spark-examples.jar","5"],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]'
  ```
**Note**  
The list of arguments changes depending on the type of step.

  By default, the step concurrency level is `1`. You can set the step concurrency level with the `StepConcurrencyLevel` parameter when you create a cluster. 

  The output is a cluster identifier similar to the following. 

  ```
  {
      "ClusterId": "j-2AXXXXXXGAPLF"
  }
  ```
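The `StepConcurrencyLevel` parameter mentioned above maps to the `--step-concurrency-level` option of `create-cluster`. The following sketch mirrors the instance configuration of the example above; the concurrency level of `5` and the other values are illustrative, and levels greater than 1 require Amazon EMR version 5.28.0 or later.

```shell
# Create a cluster that can run up to 5 steps in parallel.
# Key pair name and instance configuration are placeholders.
aws emr create-cluster --name "Concurrent steps cluster" \
--applications Name=Spark \
--use-default-roles \
--ec2-attributes KeyName=myKey \
--instance-groups InstanceGroupType=PRIMARY,InstanceCount=1,InstanceType=m5.xlarge InstanceGroupType=CORE,InstanceCount=2,InstanceType=m5.xlarge \
--step-concurrency-level 5
```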

**To add a step to a running cluster**
+ Type the following command to add a step to a running cluster. Replace `j-2AXXXXXXGAPLF` with your own cluster ID.

  ```
  aws emr add-steps --cluster-id j-2AXXXXXXGAPLF \
  --steps '[{"Args":["spark-submit","--deploy-mode","cluster","--class","org.apache.spark.examples.SparkPi","/usr/lib/spark/examples/jars/spark-examples.jar","5"],"Type":"CUSTOM_JAR","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Name":"Spark application"}]'
  ```

   The output is a step identifier similar to the following.

  ```
  {
      "StepIds": [
          "s-Y9XXXXXXAPMD"
      ]
  }
  ```

**To modify the StepConcurrencyLevel in a running cluster**

1. In a running cluster, you can modify the `StepConcurrencyLevel` with the `ModifyCluster` API. For example, type the following command to increase the `StepConcurrencyLevel` to `10`. Replace `j-2AXXXXXXGAPLF` with your cluster ID.

   ```
   aws emr modify-cluster --cluster-id j-2AXXXXXXGAPLF --step-concurrency-level 10
   ```

1. The output is similar to the following.

   ```
   {
       "StepConcurrencyLevel": 10
   }
   ```

For more information on using Amazon EMR commands in the AWS CLI, see the [AWS CLI Command Reference](https://docs.aws.amazon.com/cli/latest/reference/emr).

# Considerations for running multiple steps in parallel when you submit work to Amazon EMR
<a name="emr-concurrent-steps"></a>

Running multiple steps in parallel when you submit work to Amazon EMR requires some up-front decisions about resource planning, and it changes how the cluster behaves. Consider the following:
+ Steps running in parallel can complete in any order, but pending steps in the queue transition to the running state in the order in which they were submitted.
+ When you select a step concurrency level for your cluster, consider whether the primary node instance type meets the memory requirements of your workloads. The main step executor process runs on the primary node for each step, so running multiple steps in parallel requires more memory and CPU from the primary node than running one step at a time. 
+ To achieve complex scheduling and resource management of concurrent steps, you can use YARN scheduling features such as `FairScheduler` or `CapacityScheduler`. For example, you can use `FairScheduler` with a `queueMaxAppsDefault` set to prevent more than a certain number of jobs from running at a time. 
+ The step concurrency level is subject to the configurations of resource managers. For example, if YARN is configured with only a parallelism of `5`, then you can only have five YARN applications running in parallel even if the `StepConcurrencyLevel` is set to `10`. For more information about configuring resource managers, see [Configure applications](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html) in the *Amazon EMR Release Guide*.
+ You cannot add a step with an `ActionOnFailure` value other than `CONTINUE` while the step concurrency level of the cluster is greater than one.
+ If the step concurrency level of a cluster is greater than one, the step `ActionOnFailure` feature does not activate.
+ If a cluster has a step concurrency level of `1` but has multiple running steps, `TERMINATE_CLUSTER ActionOnFailure` may activate, but `CANCEL_AND_WAIT ActionOnFailure` will not. This edge case arises when the step concurrency level was greater than one but was lowered while multiple steps were running.
+ You can use EMR automatic scaling to scale up and down based on the YARN resources to prevent resource contention. For more information, see [Using automatic scaling with a custom policy for instance groups](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-automatic-scaling.html) in the *Amazon EMR Management Guide*.
+ When you decrease the step concurrency level, EMR allows any running steps to complete before it reduces the number of concurrently running steps. If the cluster runs out of resources because too many steps are running concurrently, we recommend that you manually cancel some running steps to free up resources.

# Viewing steps after submitting work to an Amazon EMR cluster
<a name="emr-view-steps"></a>

You can view up to 10,000 steps that Amazon EMR completed within the last seven days, and up to 1,000 steps that Amazon EMR completed at any time. This total includes both user-submitted and system steps.

If you submit new steps after the cluster reaches the 1,000-step record limit, Amazon EMR deletes inactive user-submitted steps whose status has been COMPLETED, CANCELLED, or FAILED for more than seven days. If you submit steps beyond the 10,000-step record limit, Amazon EMR deletes inactive user-submitted step records regardless of how long they have been inactive. Amazon EMR doesn't remove these records from the log files, but it removes them from the AWS console, and they aren't returned when you use the AWS CLI or API to retrieve cluster information. System step records are never removed.

The step information you can view depends on the mechanism used to retrieve cluster information. The following table indicates the step information returned by each of the available options. 

 


| Option | DescribeJobFlow or --describe --jobflow | ListSteps or list-steps | 
| --- | --- | --- | 
| SDK | 256 steps | Up to 10,000 steps | 
| Amazon EMR CLI | 256 steps | NA | 
| AWS CLI | NA | Up to 10,000 steps | 
| API | 256 steps | Up to 10,000 steps | 
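To retrieve step records with the AWS CLI option from the table above, you can combine `list-steps` with the CLI's built-in output filtering. The cluster ID in the following sketch is a placeholder, and the command requires live AWS credentials.

```shell
# Show the ID, name, and state of each step on a cluster as a table.
aws emr list-steps --cluster-id j-2AL4XXXXXX5T9 \
--query 'Steps[].{Id:Id,Name:Name,State:Status.State}' \
--output table
```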

# Cancel steps when you submit work to an Amazon EMR cluster
<a name="emr-cancel-steps"></a>

When you submit work to your cluster, you can cancel pending and running steps from the AWS Management Console, the AWS CLI, or the Amazon EMR API.

------
#### [ Console ]

**To cancel steps with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then select the cluster that you want to update.

1. On the **Steps** tab on the cluster details page, select the check box next to the step that you want to cancel. Choose the **Actions** dropdown menu and then select **Cancel steps**.

1. In the **Cancel the step** dialog, choose to either cancel the step and wait for it to exit, or cancel the step and force it to exit. Then choose **Confirm**.

1. The status of the steps in the **Steps** table changes to `CANCELLED`. 

------
#### [ CLI ]

**To cancel steps using the AWS CLI**
+ Use the `aws emr cancel-steps` command, specifying the cluster and steps to cancel. The following example demonstrates an AWS CLI command to cancel two steps.

  ```
  aws emr cancel-steps --cluster-id j-2QUAXXXXXXXXX \
  --step-ids s-3M8DXXXXXXXXX s-3M8DXXXXXXXXX \
  --step-cancellation-option SEND_INTERRUPT
  ```

With Amazon EMR version 5.28.0 and later, you can choose one of the following two cancellation options for the `StepCancellationOption` parameter when you cancel steps. 
+ `SEND_INTERRUPT` – This is the default option. When a step cancellation request is received, EMR sends a `SIGTERM` signal to the step. Add a `SIGTERM` signal handler to your step logic to catch this signal and terminate descendant step processes or wait for them to complete.
+ `TERMINATE_PROCESS` – When this option is selected, EMR sends a `SIGKILL` signal to the step and all of its descendant processes, which terminates them immediately.

------

**Considerations for canceling steps**
+ Canceling a running or pending step removes that step from the active step count.
+ Canceling a running step does not allow a pending step to start running, assuming no change to `stepConcurrencyLevel`.
+ Canceling a running step does not trigger the step `ActionOnFailure`.
+ For EMR 5.32.0 and later, the `SEND_INTERRUPT` `StepCancellationOption` sends a `SIGTERM` signal to the step child process. Your step logic should catch this signal, clean up, and shut down gracefully. The `TERMINATE_PROCESS` `StepCancellationOption` sends a `SIGKILL` signal to the step child process and all of its descendant processes; however, asynchronous processes are not affected.

# View and monitor an Amazon EMR cluster as it performs work
<a name="emr-manage-view"></a>

Amazon EMR provides several tools that you can use to gather information about your cluster. You can access information about the cluster from the console, the AWS CLI, or programmatically. The standard Hadoop web interfaces and log files are available on the primary node. You can also use monitoring services such as CloudWatch and Ganglia to track the performance of your cluster. 

Application history is also available from the console using the persistent application UIs for the Spark History Server, starting with Amazon EMR 5.25.0. With Amazon EMR 6.x, persistent YARN timeline server and Tez user interfaces are also available. These services are hosted off-cluster, so you can access application history for 30 days after the cluster terminates, without the need for an SSH connection or web proxy. See [View application history](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-cluster-application-history.html).

**Topics**
+ [View Amazon EMR cluster status and details](emr-manage-view-clusters.md)
+ [Enhanced step debugging with Amazon EMR](emr-enhanced-step-debugging.md)
+ [View Amazon EMR application history](emr-cluster-application-history.md)
+ [View Amazon EMR log files](emr-manage-view-web-log-files.md)
+ [View cluster instances in Amazon EC2](UsingEMR_Tagging.md)
+ [CloudWatch events and metrics from Amazon EMR](emr-manage-cluster-cloudwatch.md)
+ [View cluster application metrics using Ganglia with Amazon EMR](ViewingGangliaMetrics.md)
+ [Logging AWS EMR API calls using AWS CloudTrail](logging-using-cloudtrail.md)
+ [EMR Observability Best Practices](emr-metrics-observability.md)

# View Amazon EMR cluster status and details
<a name="emr-manage-view-clusters"></a>

After you create a cluster, you can monitor its status and get detailed information about its execution and any errors that may have occurred, even after the cluster has terminated. Amazon EMR saves metadata about terminated clusters for your reference for two months, after which the metadata is deleted. You can't delete clusters from the cluster history, but you can focus on the clusters that you care about by using the **Filter** in the AWS Management Console, or by using options with the `list-clusters` command in the AWS CLI.

You can access application history stored on-cluster for one week from the time it is recorded, regardless of whether the cluster is running or terminated. In addition, persistent application user interfaces store application history off-cluster for 30 days after a cluster terminates. See [View application history](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-cluster-application-history.html).

For more information about cluster states, such as Waiting and Running, see [Understanding the cluster lifecycle](emr-overview.md#emr-overview-cluster-lifecycle).

## View cluster details using the AWS Management Console
<a name="emr-view-cluster-console"></a>

The **Clusters** list in the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr) shows all the clusters in your account and AWS Region, including terminated clusters. The list shows the following for each cluster: the **Name** and **ID**, the **Status** and **Status details**, the **Creation time**, the **Elapsed time** that the cluster was running, and the **Normalized instance hours** that have accrued for all EC2 instances in the cluster. This list is the starting point for monitoring the status of your clusters. It's designed so that you can drill down into each cluster's details for analysis and troubleshooting.

------
#### [ Console ]

**To view cluster information with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and select the cluster that you want to view.

1. Use the **Summary** panel to view the basics of your cluster configuration, such as cluster status, the open-source applications that Amazon EMR installed on the cluster, and the version of Amazon EMR that you used to create the cluster. Use the tabs below the **Summary** panel to view additional information about the cluster.

------

## View cluster details using the AWS CLI
<a name="view-cluser-cli"></a>

The following examples demonstrate how to retrieve cluster details using the AWS CLI. For more information about available commands, see the [AWS CLI Command Reference for Amazon EMR](https://docs.aws.amazon.com/cli/latest/reference/emr). You can use the [describe-cluster](https://docs.aws.amazon.com/cli/latest/reference/emr/describe-cluster.html) command to view cluster-level details including status, hardware and software configuration, VPC settings, bootstrap actions, instance groups, and so on. For more information about cluster states, see [Understanding the cluster lifecycle](emr-overview.md#emr-overview-cluster-lifecycle). The following example demonstrates using the `describe-cluster` command, followed by examples of the [list-clusters](https://docs.aws.amazon.com/cli/latest/reference/emr/list-clusters.html) command.

**Example Viewing cluster status**  
To use the `describe-cluster` command, you need the cluster ID. You can get cluster IDs with the `list-clusters` command, as the later examples in this section demonstrate. This example uses a cluster ID to retrieve details about an individual cluster's status.  
The following command describes cluster *j-1K48XXXXXXHCB*, which you replace with your cluster ID.  

```
aws emr describe-cluster --cluster-id j-1K48XXXXXXHCB
```
The output of your command is similar to the following:  

```
{
    "Cluster": {
        "Status": {
            "Timeline": {
                "ReadyDateTime": 1438281058.061, 
                "CreationDateTime": 1438280702.498
            }, 
            "State": "WAITING", 
            "StateChangeReason": {
                "Message": "Waiting for steps to run"
            }
        }, 
        "Ec2InstanceAttributes": {
            "EmrManagedMasterSecurityGroup": "sg-cXXXXX0", 
            "IamInstanceProfile": "EMR_EC2_DefaultRole", 
            "Ec2KeyName": "myKey", 
            "Ec2AvailabilityZone": "us-east-1c", 
            "EmrManagedSlaveSecurityGroup": "sg-example"
        }, 
        "Name": "Development Cluster", 
        "ServiceRole": "EMR_DefaultRole", 
        "Tags": [], 
        "TerminationProtected": false, 
        "ReleaseLabel": "emr-4.0.0", 
        "NormalizedInstanceHours": 16, 
        "InstanceGroups": [
            {
                "RequestedInstanceCount": 1, 
                "Status": {
                    "Timeline": {
                        "ReadyDateTime": 1438281058.101, 
                        "CreationDateTime": 1438280702.499
                    }, 
                    "State": "RUNNING", 
                    "StateChangeReason": {
                        "Message": ""
                    }
                }, 
                "Name": "CORE", 
                "InstanceGroupType": "CORE", 
                "Id": "ig-2EEXAMPLEXXP", 
                "Configurations": [], 
                "InstanceType": "m5.xlarge", 
                "Market": "ON_DEMAND", 
                "RunningInstanceCount": 1
            }, 
            {
                "RequestedInstanceCount": 1, 
                "Status": {
                    "Timeline": {
                        "ReadyDateTime": 1438281023.879, 
                        "CreationDateTime": 1438280702.499
                    }, 
                    "State": "RUNNING", 
                    "StateChangeReason": {
                        "Message": ""
                    }
                }, 
                "Name": "MASTER", 
                "InstanceGroupType": "MASTER", 
                "Id": "ig-2A1234567XP", 
                "Configurations": [], 
                "InstanceType": "m5.xlarge", 
                "Market": "ON_DEMAND", 
                "RunningInstanceCount": 1
            }
        ], 
        "Applications": [
            {
                "Version": "1.0.0", 
                "Name": "Hive"
            }, 
            {
                "Version": "2.6.0", 
                "Name": "Hadoop"
            }, 
            {
                "Version": "0.14.0", 
                "Name": "Pig"
            }, 
            {
                "Version": "1.4.1", 
                "Name": "Spark"
            }
        ], 
        "BootstrapActions": [], 
        "MasterPublicDnsName": "ec2-X-X-X-X.compute-1.amazonaws.com", 
        "AutoTerminate": false, 
        "Id": "j-jobFlowID", 
        "Configurations": [
            {
                "Properties": {
                    "hadoop.security.groups.cache.secs": "250"
                }, 
                "Classification": "core-site"
            }, 
            {
                "Properties": {
                    "mapreduce.tasktracker.reduce.tasks.maximum": "5", 
                    "mapred.tasktracker.map.tasks.maximum": "2", 
                    "mapreduce.map.sort.spill.percent": "90"
                }, 
                "Classification": "mapred-site"
            }, 
            {
                "Properties": {
                    "hive.join.emit.interval": "1000", 
                    "hive.merge.mapfiles": "true"
                }, 
                "Classification": "hive-site"
            }
        ]
    }
}
```
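If a script needs to block until a cluster is ready rather than repeatedly parse `describe-cluster` output, the AWS CLI also ships a waiter. This is a sketch: substitute your own cluster ID, and note that the command requires configured AWS credentials.

```shell
# Polls cluster status until the cluster reaches the RUNNING or WAITING
# state; exits nonzero if the cluster terminates first.
aws emr wait cluster-running --cluster-id j-1K48XXXXXXHCB
```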

**Example Listing clusters by creation date**  
To retrieve clusters created within a specific date range, use the `list-clusters` command with the `--created-after` and `--created-before` parameters.  
The following command lists all clusters created between October 09, 2019 and October 12, 2019.  

```
aws emr list-clusters --created-after 2019-10-09T00:12:00 --created-before 2019-10-12T00:12:00
```
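When you script against `list-clusters`, the AWS CLI's global `--query` option (a JMESPath expression) can trim the response to just the fields you need. The following sketch uses the same date range; the field names follow the shape of the `ListClusters` response.

```shell
# List only the ID, name, and state of each cluster in the date range.
aws emr list-clusters \
  --created-after 2019-10-09T00:12:00 \
  --created-before 2019-10-12T00:12:00 \
  --query 'Clusters[].{Id: Id, Name: Name, State: Status.State}' \
  --output table
```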

**Example Listing clusters by state**  
To list clusters by state, use the `list-clusters` command with the `--cluster-states` parameter. Valid cluster states include: STARTING, BOOTSTRAPPING, RUNNING, WAITING, TERMINATING, TERMINATED, and TERMINATED_WITH_ERRORS.  

```
aws emr list-clusters --cluster-states TERMINATED
```
You can also use the following shortcut parameters to list all clusters in the specified states:  
+ `--active` filters clusters in the STARTING, BOOTSTRAPPING, RUNNING, WAITING, or TERMINATING states.
+ `--terminated` filters clusters in the TERMINATED state.
+ `--failed` filters clusters in the TERMINATED_WITH_ERRORS state.
The following commands return the same result.  

```
aws emr list-clusters --cluster-states TERMINATED
```

```
aws emr list-clusters --terminated
```
For more information about cluster states, see [Understanding the cluster lifecycle](emr-overview.md#emr-overview-cluster-lifecycle).

# Enhanced step debugging with Amazon EMR
<a name="emr-enhanced-step-debugging"></a>

If an Amazon EMR step fails and you submitted your work using the Step API operation with an AMI of version 5.x or later, Amazon EMR can in some cases identify and return the root cause of the step failure through the API, along with the name of the relevant log file and a portion of the application stack trace. For example, the following failures can be identified: 
+ A common Hadoop error such as the output directory already exists, the input directory does not exist, or an application runs out of memory.
+ Java errors such as an application that was compiled with an incompatible version of Java or run with a main class that is not found.
+ An issue accessing objects stored in Amazon S3.

This information is available using the [DescribeStep](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_DescribeStep.html) and [ListSteps](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_ListSteps.html) API operations. The failure details appear in the [FailureDetails](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_FailureDetails.html) field of the [StepSummary](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_StepSummary.html) structure returned by those operations. To access the `FailureDetails` information, use the AWS CLI, console, or AWS SDK.

------
#### [ Console ]

The new Amazon EMR console doesn't offer step debugging. However, you can view cluster termination details with the following steps.

**To view failure details with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then select the cluster that you want to view.

1. Note the **Status** value in the **Summary** section of the cluster details page. If the status is **Terminated with errors**, hover over the text to view cluster failure details.

------
#### [ CLI ]

**To view failure details with the AWS CLI**
+ To get failure details for a step with the AWS CLI, use the `describe-step` command.

  ```
  aws emr describe-step --cluster-id j-1K48XXXXXHCB --step-id s-3QM0XXXXXM1W
  ```

  The output will look similar to the following:

  ```
  {
    "Step": {
      "Status": {
        "FailureDetails": {
          "LogFile": "s3://amzn-s3-demo-bucket/logs/j-1K48XXXXXHCB/steps/s-3QM0XXXXXM1W/stderr.gz",
          "Message": "org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory s3://amzn-s3-demo-bucket/logs/beta already exists",
          "Reason": "Output directory already exists."
        },
        "Timeline": {
          "EndDateTime": 1469034209.143,
          "CreationDateTime": 1469033847.105,
          "StartDateTime": 1469034202.881
        },
        "State": "FAILED",
        "StateChangeReason": {}
      },
      "Config": {
        "Args": [
          "wordcount",
          "s3://amzn-s3-demo-bucket/input/input.txt",
          "s3://amzn-s3-demo-bucket/logs/beta"
        ],
        "Jar": "s3://amzn-s3-demo-bucket/jars/hadoop-mapreduce-examples-2.7.2-amzn-1.jar",
        "Properties": {}
      },
      "Id": "s-3QM0XXXXXM1W",
      "ActionOnFailure": "CONTINUE",
      "Name": "ExampleJob"
    }
  }
  ```
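When scripting, you may need only one or two fields from a saved `describe-step` response. The following sketch extracts the failure reason with standard tools alone; the trimmed JSON and the `/tmp` paths are illustrative.

```shell
# Save a trimmed copy of the describe-step response, then pull the
# FailureDetails reason out of it without jq.
cat > /tmp/step.json <<'EOF'
{
  "Step": {
    "Status": {
      "FailureDetails": {
        "Reason": "Output directory already exists."
      },
      "State": "FAILED"
    }
  }
}
EOF
REASON=$(grep -o '"Reason": "[^"]*"' /tmp/step.json | cut -d'"' -f4)
echo "$REASON"
```

In practice, you would pipe `aws emr describe-step` output into the same filter, or use the CLI's built-in `--query` option to select the field directly.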

------

# View Amazon EMR application history
<a name="emr-cluster-application-history"></a>

You can view Spark History Server and YARN timeline service application details with the cluster's detail page in the console. Amazon EMR application history makes it easier for you to troubleshoot and analyze active jobs and job history. 

**Note**  
To augment the security for the off-console applications that you might use with Amazon EMR, the application hosting domains are registered in the Public Suffix List (PSL). Examples of these hosting domains include the following: `emrstudio-prod.us-east-1.amazonaws.com`, `emrnotebooks-prod.us-east-1.amazonaws.com`, `emrappui-prod.us-east-1.amazonaws.com`. For further security, if you ever need to set sensitive cookies in the default domain name, we recommend that you use cookies with a `__Host-` prefix. This helps to defend your domain against cross-site request forgery (CSRF) attempts. For more information, see the [https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#cookie_prefixes](https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie#cookie_prefixes) page in the *Mozilla Developer Network*. 

The **Application user interfaces** section of the **Applications** tab provides several viewing options, depending on the cluster status and the applications you installed on the cluster.
+ [Off-cluster access to persistent application user interfaces](https://docs.aws.amazon.com/emr/latest/ManagementGuide/app-history-spark-UI.html) – Starting with Amazon EMR version 5.25.0, persistent application user interface links are available for the Spark UI and the Spark History Server. With Amazon EMR version 5.30.1 and later, the Tez UI and the YARN timeline server also have persistent application user interfaces. The YARN timeline server and Tez UI are open-source applications that provide metrics for active and terminated clusters. The Spark user interface provides details about scheduler stages and tasks, RDD sizes and memory usage, environmental information, and information about the running executors. Persistent application UIs run off-cluster, so cluster information and logs are available for 30 days after an application terminates. Unlike on-cluster application user interfaces, persistent application UIs don't require you to set up a web proxy through an SSH connection.
+ [On-cluster application user interfaces](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-web-interfaces.html) – A variety of application history user interfaces can run on a cluster. On-cluster user interfaces are hosted on the master node and require you to set up an SSH connection to the web server. On-cluster application user interfaces keep application history for one week after an application terminates. For more information and instructions on setting up an SSH tunnel, see [View web interfaces hosted on Amazon EMR clusters](emr-web-interfaces.md).

  With the exception of the Spark History Server, YARN timeline server, and Hive applications, on-cluster application history can only be viewed while the cluster is running.

# View persistent application user interfaces in Amazon EMR
<a name="app-history-spark-UI"></a>

Starting with Amazon EMR version 5.25.0, you can connect to the persistent Spark History Server application details hosted off-cluster using the cluster **Summary** page or the **Application user interfaces** tab in the console. Tez UI and YARN timeline server persistent application interfaces are available starting with Amazon EMR version 5.30.1. One-click link access to persistent application history provides the following benefits: 
+ You can quickly analyze and troubleshoot active jobs and job history without setting up a web proxy through an SSH connection.
+ You can access application history and relevant log files for active and terminated clusters. The logs are available for 30 days after the application ends. 

Navigate to your cluster details in the console, and select the **Applications** tab. After your cluster has launched, select the application UI that you want to view. The application UI opens in a new browser tab. For more information, see [Monitoring and instrumentation](https://spark.apache.org/docs/latest/monitoring.html).

You can view YARN container logs through the links on the Spark history server, YARN timeline server, and Tez UI. 

**Note**  
To access YARN container logs from the Spark history server, YARN timeline server, and Tez UI, you must enable logging to Amazon S3 for your cluster. If you don't enable logging, the links to YARN container logs won't work. 

## Logs collection
<a name="app-history-spark-UI-event-logs"></a>

To enable one-click access to persistent application user interfaces, Amazon EMR collects two types of logs: 
+ **Application event logs** are collected into an EMR system bucket. The event logs are encrypted at rest using Server-Side Encryption with Amazon S3 Managed Keys (SSE-S3). If you use a private subnet for your cluster, make sure to include the correct system bucket ARNs in the resource list of the Amazon S3 policy for the private subnet. For more information, see [Minimum Amazon S3 policy for private subnet](https://docs.aws.amazon.com/emr/latest/ManagementGuide/private-subnet-iampolicy.html).
+ **YARN container logs** are collected into an Amazon S3 bucket that you own. You must enable logging for your cluster to access YARN container logs. For more information, see [Configure cluster logging and debugging](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-debugging.html).
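Because the YARN container log links depend on cluster logging, it helps to set the log destination when you create the cluster. The following sketch enables logging to a bucket you own; the bucket name and instance settings are placeholders.

```shell
# Create a cluster with logging enabled so that YARN container log
# links in the persistent UIs work (bucket and sizes are placeholders).
aws emr create-cluster \
  --name "Logging enabled" \
  --release-label emr-6.6.0 \
  --applications Name=Spark \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --log-uri s3://amzn-s3-demo-bucket/emr-logs/
```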

If you need to disable this feature for privacy reasons, you can stop the daemon by using a bootstrap script when you create a cluster, as the following example demonstrates.

```
aws emr create-cluster --name "Stop Application UI Support" --release-label emr-7.12.0 \
--applications Name=Hadoop Name=Spark --ec2-attributes KeyName=<myEMRKeyPairName> \
--instance-groups InstanceGroupType=MASTER,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=CORE,InstanceCount=1,InstanceType=m3.xlarge InstanceGroupType=TASK,InstanceCount=1,InstanceType=m3.xlarge \
--use-default-roles --bootstrap-actions Path=s3://region.elasticmapreduce/bootstrap-actions/run-if,Args=["instance.isMaster=true","echo Stop Application UI | sudo tee /etc/apppusher/run-apppusher; sudo systemctl stop apppusher || exit 0"]
```

After you run this bootstrap script, Amazon EMR will not collect any Spark History Server or YARN timeline server event logs into the EMR system bucket. No application history information will be available on the **Application user interfaces** tab, and you will lose access to all application user interfaces from the console.

## Large Spark event log files
<a name="app-history-spark-UI-large-event-logs"></a>

In some cases, long-running Spark jobs, such as Spark streaming, and large jobs, such as Spark SQL queries, can generate large event logs. With large event logs, you can quickly use up disk space on compute instances and encounter `OutOfMemory` errors when you load persistent UIs. To avoid these issues, we recommend that you turn on the Spark event log rolling and compaction feature. This feature is available on Amazon EMR releases emr-6.1.0 and later. For more details about rolling and compaction, see [Applying compaction on rolling event log files](https://spark.apache.org/docs/latest/monitoring.html#applying-compaction-on-rolling-event-log-files) in the Spark documentation.

To activate the Spark event log rolling and compaction feature, turn on the following Spark configuration settings.
+ `spark.eventLog.rolling.enabled` – Turns on event log rolling based on size. This setting is deactivated by default.
+ `spark.eventLog.rolling.maxFileSize` – When rolling is activated, specifies the maximum size of the event log file before it rolls over. The default is 128 MB.
+ `spark.history.fs.eventLog.rolling.maxFilesToRetain` – Specifies the maximum number of non-compacted event log files to retain. By default, all event log files are retained. Set to a lower number to compact older event logs. The lowest value is 1.
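The first two settings are application-side, so you can also supply them per job at submit time rather than cluster-wide; `spark.history.fs.eventLog.rolling.maxFilesToRetain` configures the Spark History Server itself and belongs in cluster configuration. A sketch; the application file name is a placeholder.

```shell
# Enable size-based event log rolling for a single long-running job.
spark-submit \
  --conf spark.eventLog.rolling.enabled=true \
  --conf spark.eventLog.rolling.maxFileSize=128m \
  my_streaming_job.py   # placeholder application
```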

Note that compaction tries to discard events that point to outdated information, such as the following. If compaction discards events, you no longer see them on the Spark History Server UI.
+ Events for finished jobs and related stage or task events.
+ Events for terminated executors.
+ Events for completed SQL executions, and related job, stage, and task events.

**To launch a cluster with rolling and compaction enabled**

1. Create a `spark-configuration.json` file with the following configuration.

   ```
   [
      {
        "Classification": "spark-defaults",
        "Properties": {
          "spark.eventLog.rolling.enabled": "true",
          "spark.history.fs.eventLog.rolling.maxFilesToRetain": "1"
        }
      }
   ]
   ```

1. Create your cluster with the Spark rolling compaction configuration as follows.

   ```
   aws emr create-cluster \
   --release-label emr-6.6.0 \
   --instance-type m4.large \
   --instance-count 2 \
   --use-default-roles \
   --configurations file://spark-configuration.json
   ```

## Permissions for viewing persistent application user interfaces
<a name="app-history-spark-UI-permissions"></a>

The following sample shows the role permissions required for access to persistent application user interfaces. For clusters with runtime roles enabled, these permissions allow users to access only the applications that they submitted with the same user identity and runtime role.

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:CreatePersistentAppUI",
        "elasticmapreduce:DescribePersistentAppUI"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:*:123456789012:cluster/clusterId"
      ],
      "Sid": "AllowELASTICMAPREDUCECreatepersistentappui"
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:GetPersistentAppUIPresignedURL"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:*:123456789012:cluster/clusterId",
        "arn:aws:elasticmapreduce:*:123456789012:persistent-app-ui/*"
      ],
      "Condition": {
        "StringEqualsIfExists": {
          "elasticmapreduce:ExecutionRoleArn": [
            "arn:aws:iam::123456789012:role/executionRoleArn"
          ]
        }
      },
      "Sid": "AllowELASTICMAPREDUCEGetpersistentappuipresignedurl"
    }
  ]
}
```

------

The following sample shows the role permissions required to remove the restrictions on viewing applications in persistent application user interfaces for clusters with runtime roles enabled.

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:CreatePersistentAppUI",
        "elasticmapreduce:DescribePersistentAppUI",
        "elasticmapreduce:AccessAllEventLogs"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/j-XXXXXXXXXXXXX"
      ],
      "Sid": "AllowELASTICMAPREDUCECreatepersistentappui"
    },
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:GetPersistentAppUIPresignedURL"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/j-XXXXXXXXXXXXX",
        "arn:aws:elasticmapreduce:us-east-1:123456789012:persistent-app-ui/*"
      ],
      "Condition": {
        "StringEqualsIfExists": {
          "elasticmapreduce:ExecutionRoleArn": [
            "arn:aws:iam::123456789012:role/YourExecutionRoleName"
          ]
        }
      },
      "Sid": "AllowELASTICMAPREDUCEGetpersistentappuipresignedurl"
    }
  ]
}
```

------

## Considerations and limitations
<a name="app-history-spark-UI-limitations"></a>

One-click access to persistent application user interfaces currently has the following limitations.
+ There is a delay of at least two minutes before application details appear on the Spark History Server UI.
+ This feature works only when the event log directory for the application is in HDFS. By default, Amazon EMR stores event logs in an HDFS directory. If you change the default directory to a different file system, such as Amazon S3, this feature will not work. 
+ This feature is currently not available for EMR clusters with multiple master nodes or for EMR clusters integrated with AWS Lake Formation. 
+ To enable one-click access to persistent application user interfaces, you must have permission to use the `CreatePersistentAppUI`, `DescribePersistentAppUI`, and `GetPersistentAppUIPresignedURL` actions for Amazon EMR. If you deny an IAM principal's permission to these actions, it takes approximately five minutes for the permission change to propagate.
+ For clusters with runtime roles enabled, when you access the Spark History Server from the persistent application UI, you can access only Spark jobs that were submitted with a runtime role.
+ For clusters with runtime roles enabled, each user can access only the applications that they submitted with the same user identity and runtime role.
+ The `AccessAllEventLogs` action for Amazon EMR is required to view all applications in persistent application user interfaces for clusters with runtime roles enabled.
+ If you reconfigure applications on a running cluster, the application history is not available through the application UI. 
+ For each AWS account, the default limit for active application UIs is 200.
+ In the following AWS Regions, you can access application UIs from the console with Amazon EMR 6.14.0 and higher: 
  + Asia Pacific (Jakarta) (ap-southeast-3)
  + Europe (Spain) (eu-south-2)
  + Asia Pacific (Melbourne) (ap-southeast-4)
  + Israel (Tel Aviv) (il-central-1)
  + Middle East (UAE) (me-central-1)
+ In the following AWS Regions, you can access application UIs from the console with Amazon EMR 5.25.0 and higher: 
  + US East (N. Virginia) (us-east-1)
  + US West (Oregon) (us-west-2)
  + Asia Pacific (Mumbai) (ap-south-1)
  + Asia Pacific (Seoul) (ap-northeast-2)
  + Asia Pacific (Singapore) (ap-southeast-1)
  + Asia Pacific (Sydney) (ap-southeast-2)
  + Asia Pacific (Tokyo) (ap-northeast-1)
  + Canada (Central) (ca-central-1)
  + South America (São Paulo) (sa-east-1)
  + Europe (Frankfurt) (eu-central-1)
  + Europe (Ireland) (eu-west-1)
  + Europe (London) (eu-west-2)
  + Europe (Paris) (eu-west-3)
  + Europe (Stockholm) (eu-north-1)
  + China (Beijing) (cn-north-1)
  + China (Ningxia) (cn-northwest-1)

# View a high-level application history in Amazon EMR
<a name="app-history-summary"></a>

**Note**  
We recommend that you use the persistent application interface for an improved user experience that retains app history for up to 30 days. The high-level application history described on this page isn't available in the new Amazon EMR console ([https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr)). For more information, see [View persistent application user interfaces in Amazon EMR](app-history-spark-UI.md).

With Amazon EMR releases 5.8.0 to 5.36.0 and 6.x releases up to 6.8.0, you can view a high-level application history from the **Application user interfaces** tab in the old Amazon EMR console. An Amazon EMR **Application user interface** keeps the summary of application history for 7 days after an application has completed. 

## Considerations and limitations
<a name="app-history-limitations"></a>

Consider the following limitations when you use the **Application user interfaces** tab in the old Amazon EMR console.
+ You can only access the high-level application history feature when using Amazon EMR releases 5.8.0 to 5.36.0 and 6.x releases up to 6.8.0. Effective January 23, 2023, Amazon EMR will discontinue high-level application history for all versions. If you use Amazon EMR version 5.25.0 or higher, we recommend that you use the persistent application user interface instead.
+ The high-level application history feature does not support Spark Streaming applications.
+ One-click access to persistent application user interfaces is currently not available for Amazon EMR clusters with multiple master nodes or for Amazon EMR clusters integrated with AWS Lake Formation.

## Example: View a high-level application history
<a name="app-history-example"></a>

The following sequence demonstrates a drill-down through a Spark or YARN application into job details using the **Application user interfaces** tab on the cluster details page of the old console. 

To view cluster details, select a cluster **Name** from the **Clusters** list. To view information about YARN container logs, you must enable logging for your cluster. For more information, see [Configure cluster logging and debugging](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-debugging.html). For Spark application history, the information provided in the summary table is only a subset of the information available through the Spark history server UI.

In the **Application user interfaces** tab under **High-level application history**, you can expand a row to show the diagnostic summary for a Spark application or select an **Application ID** link to view details about a different application.

![\[Application user interfaces tab showing persistent and on-cluster UIs, with YARN application history.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/app-history-app.png)


When you select an **Application ID** link, the UI changes to show the **YARN application** details for that application. In the **Jobs** tab of **YARN application** details, you can choose the **Description** link for a job to display details for that job.

![\[YARN application details showing job history with completed Spark tasks and their statuses.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/app-history-job-1.png)


On the job details page, you can expand information about individual job stages, and then select the **Description** link to see stage details.

![\[EMR cluster interface showing persistent and on-cluster application UIs, with job details and stages.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/app-history-job-2.png)


On the stage details page, you can view key metrics for stage tasks and executors. You can also view task and executor logs using the **View logs** links.

![\[Application history page showing task metrics, executor details, and log access links for a Spark job.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/app-history-job-3.png)


# View Amazon EMR log files
<a name="emr-manage-view-web-log-files"></a>

 Amazon EMR and Hadoop both produce log files that report status on the cluster. By default, these are written to the primary node in the `/mnt/var/log/` directory. Depending on how you configured your cluster when you launched it, these logs may also be archived to Amazon S3 and may be viewable through the graphical debugging tool. 

 There are many types of logs written to the primary node. Amazon EMR writes step, bootstrap action, and instance state logs. Apache Hadoop writes logs to report the processing of jobs, tasks, and task attempts. Hadoop also records logs of its daemons. For more information about the logs written by Hadoop, go to [http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html](http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/ClusterSetup.html). 

## View log files on the primary node
<a name="emr-manage-view-web-log-files-master-node"></a>

The following table lists some of the log files you'll find on the primary node.


| Location | Description | 
| --- | --- | 
|  /emr/instance-controller/log/bootstrap-actions  | Logs written during the processing of the bootstrap actions. | 
|  /mnt/var/log/hadoop-state-pusher  | Logs written by the Hadoop state pusher process. | 
|  /emr/instance-controller/log  | Instance controller logs. | 
|  /emr/instance-state  | Instance state logs. These contain information about the CPU, memory state, and garbage collector threads of the node. | 
|  /emr/service-nanny  | Logs written by the service nanny process. | 
|  /mnt/var/log/*application*  | Logs specific to an application such as Hadoop, Spark, or Hive. | 
|  /mnt/var/log/hadoop/steps/*N*  | Step logs that contain information about the processing of the step. The value of *N* is the step ID that Amazon EMR assigns. For example, a cluster has two steps: `s-1234ABCDEFGH` and `s-5678IJKLMNOP`. The first step's logs are in `/mnt/var/log/hadoop/steps/s-1234ABCDEFGH/` and the second step's in `/mnt/var/log/hadoop/steps/s-5678IJKLMNOP/`.  The step logs written by Amazon EMR are as follows.  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-web-log-files.html)  | 

**To view log files on the primary node**

1.  Use SSH to connect to the primary node as described in [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md). 

1.  Navigate to the directory that contains the log file information you want to view. The preceding table lists the types of log files that are available and where you will find them. The following example shows the command for navigating to the logs for the step with ID `s-1234ABCDEFGH`. 

   ```
   cd /mnt/var/log/hadoop/steps/s-1234ABCDEFGH/
   ```

1. Use a file viewer of your choice to view the log file. The following example uses the Linux `less` command to view the `controller` log file.

   ```
   less controller
   ```

## View log files archived to Amazon S3
<a name="emr-manage-view-web-log-files-s3"></a>

By default, Amazon EMR clusters launched using the console automatically archive log files to Amazon S3. You can specify your own log path, or you can allow the console to automatically generate a log path for you. For clusters launched using the CLI or API, you must configure Amazon S3 log archiving manually. 

 When Amazon EMR is configured to archive log files to Amazon S3, it stores the files in the S3 location you specified, in the /*cluster-id*/ folder, where *cluster-id* is the cluster ID. 

The following table lists some of the log files you'll find on Amazon S3.


| Location | Description | 
| --- | --- | 
|  /*cluster-id*/node/  | Node logs, including bootstrap action, instance state, and application logs for the node. The logs for each node are stored in a folder labeled with the identifier of the EC2 instance of that node. | 
|  /*cluster-id*/node/*instance-id*/*application*  | The logs created by each application or daemon associated with an application. For example, the Hive server log is located at `cluster-id/node/instance-id/hive/hive-server.log`. | 
|  /*cluster-id*/steps/*step-id*/  | Step logs that contain information about the processing of the step. The value of *step-id* indicates the step ID assigned by Amazon EMR. For example, a cluster has two steps: `s-1234ABCDEFGH` and `s-5678IJKLMNOP`. The first step's logs are in `/cluster-id/steps/s-1234ABCDEFGH/` and the second step's in `/cluster-id/steps/s-5678IJKLMNOP/`.  The step logs written by Amazon EMR are as follows.  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-view-web-log-files.html)  | 
|  /*cluster-id*/containers  |  Application container logs. The logs for each YARN application are stored in these locations.  | 
|  /*cluster-id*/hadoop-mapreduce/  | The logs that contain information about configuration details and job history of MapReduce jobs.  | 
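The layout above is predictable enough to construct programmatically. The following sketch builds the S3 prefixes for a cluster's node, step, and container logs; the bucket name, cluster ID, step ID, and instance ID are placeholders, and the helper name is our own, not part of any AWS SDK.

```python
def emr_log_prefixes(log_uri: str, cluster_id: str, step_id: str, instance_id: str) -> dict:
    """Build the S3 prefixes Amazon EMR uses when archiving logs,
    following the /cluster-id/... layout described in the table above."""
    base = f"{log_uri.rstrip('/')}/{cluster_id}"
    return {
        "node": f"{base}/node/{instance_id}/",
        "step": f"{base}/steps/{step_id}/",
        "containers": f"{base}/containers/",
    }

# Placeholder identifiers for illustration only.
prefixes = emr_log_prefixes(
    "s3://my-log-bucket/emr-logs",
    "j-1234ABCDEFGH", "s-1234ABCDEFGH", "i-0abc123def456",
)
print(prefixes["step"])  # s3://my-log-bucket/emr-logs/j-1234ABCDEFGH/steps/s-1234ABCDEFGH/
```

You could pass any of these prefixes to an S3 listing call to enumerate the archived log objects for that node or step.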

**To view log files archived to Amazon S3 with the Amazon S3 console**

1. Sign in to the AWS Management Console and open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. Open the S3 bucket specified when you configured the cluster to archive log files in Amazon S3. 

1. Navigate to the log file containing the information to display. The preceding table gives a list of the types of log files that are available and where you will find them. 

1. Download the log file object to view it. For instructions, see [Downloading an object](https://docs.aws.amazon.com/AmazonS3/latest/userguide/download-objects.html).

# View cluster instances in Amazon EC2
<a name="UsingEMR_Tagging"></a>

 To help you manage your resources, Amazon EC2 allows you to assign metadata to resources in the form of tags. Each Amazon EC2 tag consists of a key and a value. Tags allow you to categorize your Amazon EC2 resources in different ways: for example, by purpose, owner, or environment. 

 You can search and filter resources based on the tags. The tags that you assign to resources through your AWS account are available only to you. Other accounts that share the same resource can't view your tags. 

Amazon EMR automatically tags each EC2 instance that it launches with key-value pairs. The keys identify the cluster and the instance group to which the instance belongs. This makes it easy to filter your EC2 instances to show, for example, only those instances that belong to a particular cluster, or all currently running instances in a particular instance group. This is especially useful if you run several clusters concurrently or manage large numbers of EC2 instances.

These are the predefined key-value pairs that Amazon EMR assigns:


| Key | Value | Value definition | 
| --- | --- | --- | 
| aws:elasticmapreduce:job-flow-id |  `job-flow-identifier`  | The ID of the cluster that the instance is provisioned for. It appears in the format `j-XXXXXXXXXXXXX` and can be up to 256 characters long. | 
| aws:elasticmapreduce:instance-group-role |  `group-role`  | The type of instance group, entered as one of the following values: `master`, `core`, or `task`. | 
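The following sketch shows how you might filter instance descriptions (for example, those returned by the EC2 `DescribeInstances` API) down to the members of one cluster and group role using these system tags. The sample data is illustrative, not real API output.

```python
def emr_instances(instances, cluster_id, role=None):
    """Return the IDs of instances whose EMR system tags match the given
    cluster ID and, optionally, instance group role (master/core/task)."""
    matches = []
    for inst in instances:
        tags = {t["Key"]: t["Value"] for t in inst.get("Tags", [])}
        if tags.get("aws:elasticmapreduce:job-flow-id") != cluster_id:
            continue
        if role and tags.get("aws:elasticmapreduce:instance-group-role") != role:
            continue
        matches.append(inst["InstanceId"])
    return matches

# Hypothetical sample data shaped like DescribeInstances output.
sample = [
    {"InstanceId": "i-aaa", "Tags": [
        {"Key": "aws:elasticmapreduce:job-flow-id", "Value": "j-1234ABCDEFGH"},
        {"Key": "aws:elasticmapreduce:instance-group-role", "Value": "master"}]},
    {"InstanceId": "i-bbb", "Tags": [
        {"Key": "aws:elasticmapreduce:job-flow-id", "Value": "j-1234ABCDEFGH"},
        {"Key": "aws:elasticmapreduce:instance-group-role", "Value": "core"}]},
]
print(emr_instances(sample, "j-1234ABCDEFGH", role="core"))  # ['i-bbb']
```

In practice you would let EC2 do this filtering server-side with tag filters, but the logic is the same.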

 You can view and filter on the tags that Amazon EMR adds. For more information, see [Using tags](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/Using_Tags.html) in the *Amazon EC2 User Guide*. Because the tags set by Amazon EMR are system tags and cannot be edited or deleted, the sections on displaying and filtering tags are the most relevant. 

**Note**  
 Amazon EMR adds tags to an EC2 instance when its status updates to **Running**. If there is a delay between the time the EC2 instance is provisioned and the time its status is set to **Running**, the tags that Amazon EMR sets don't appear until the instance starts. If you don't see the tags, wait a few minutes and refresh the view.

# CloudWatch events and metrics from Amazon EMR
<a name="emr-manage-cluster-cloudwatch"></a>

Use events and metrics to track the activity and health of an Amazon EMR cluster. Events are useful for monitoring a specific occurrence within a cluster, such as when a cluster changes state from starting to running. Metrics are useful for monitoring a specific value, such as the percentage of available disk space that HDFS is using within a cluster.

For more information about CloudWatch Events, see the [Amazon CloudWatch Events User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/). For more information about CloudWatch metrics, see [Using Amazon CloudWatch metrics](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/working_with_metrics.html) and [Creating Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html) in the *Amazon CloudWatch User Guide*.

**Topics**
+ [Monitoring Amazon EMR metrics with CloudWatch](UsingEMR_ViewingMetrics.md)
+ [Monitoring Amazon EMR events with CloudWatch](emr-manage-cloudwatch-events.md)
+ [Responding to CloudWatch events from Amazon EMR](emr-events-response.md)

# Monitoring Amazon EMR metrics with CloudWatch
<a name="UsingEMR_ViewingMetrics"></a>

Amazon EMR automatically collects metrics for every cluster and pushes them to CloudWatch at five-minute intervals. This interval is not configurable. There is no charge for the Amazon EMR metrics reported in CloudWatch. The five-minute datapoints are archived for 63 days, after which the data is discarded. 

## How do I use Amazon EMR metrics?
<a name="UsingEMR_ViewingMetrics_HowDoI"></a>

The following table shows common uses for metrics reported by Amazon EMR. These are suggestions to get you started, not a comprehensive list. For a complete list of metrics reported by Amazon EMR, see [Metrics reported by Amazon EMR in CloudWatch](#UsingEMR_ViewingMetrics_MetricsReported). 



| How do I? | Relevant metrics | 
| --- | --- | 
| Track the progress of my cluster | Look at the RunningMapTasks, RemainingMapTasks, RunningReduceTasks, and RemainingReduceTasks metrics.  | 
| Detect clusters that are idle | The IsIdle metric tracks whether a cluster is live, but not currently running tasks. You can set an alarm to fire when the cluster has been idle for a given period of time, such as thirty minutes.  | 
| Detect when a node runs out of storage | The MRUnhealthyNodes metric tracks when one or more core or task nodes run out of local disk storage and transition to an UNHEALTHY YARN state. Nodes in this state cannot run tasks. | 
| Detect when a cluster runs out of storage | The HDFSUtilization metric reports the percentage of the cluster's combined HDFS capacity in use. High HDFS utilization can affect jobs and cluster health, and may mean you need to resize the cluster to add more core nodes.  | 
| Detect when a cluster is running at reduced capacity | The MRLostNodes metric tracks when one or more core or task nodes are unable to communicate with, and are unreachable by, the master node. | 

For more information, see [Amazon EMR cluster terminates with NO\_SLAVE\_LEFT and core nodes FAILED\_BY\_MASTER](emr-cluster-NO_SLAVE_LEFT-FAILED_BY_MASTER.md) and [AWSSupport-AnalyzeEMRLogs](https://docs.aws.amazon.com/systems-manager-automation-runbooks/latest/userguide/automation-awssupport-analyzeemrlogs.html). 
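Because `IsIdle` is sampled at five-minute intervals and a single 1 only means the cluster was idle at check time, an idle alarm should require several consecutive idle checks. The following sketch evaluates that rule over a series of datapoints; the function name and thresholds are our own, for illustration.

```python
def should_alarm_idle(datapoints, idle_minutes=30, period_minutes=5):
    """Return True when the trailing IsIdle datapoints have all been 1
    for at least idle_minutes (six consecutive 5-minute checks = 30 min)."""
    needed = idle_minutes // period_minutes
    if len(datapoints) < needed:
        return False
    return all(v == 1 for v in datapoints[-needed:])

print(should_alarm_idle([0, 1, 1, 1, 1, 1, 1]))  # True: idle for the last 30 minutes
print(should_alarm_idle([1, 1, 0, 1, 1, 1]))     # False: a busy check interrupts the run
```

In CloudWatch itself, the equivalent is an alarm on `IsIdle` with a 300-second period that requires multiple consecutive evaluation periods to breach.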

## Access CloudWatch metrics for Amazon EMR
<a name="UsingEMR_ViewingMetrics_Access"></a>

You can view the metrics that Amazon EMR reports to CloudWatch using the Amazon EMR console or the CloudWatch console. You can also retrieve metrics using the CloudWatch CLI command [`mon-get-stats`](https://docs.aws.amazon.com/AmazonCloudWatch/latest/cli/cli-mon-get-stats.html) or the CloudWatch [`GetMetricStatistics`](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_GetMetricStatistics.html) API. For more information about viewing or retrieving metrics for Amazon EMR using CloudWatch, see the [Amazon CloudWatch User Guide](https://docs.aws.amazon.com/AmazonCloudWatch/latest/DeveloperGuide/).
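As a sketch of what a `GetMetricStatistics` request for an EMR metric looks like, the function below builds the parameter set you might pass to an SDK call such as boto3's `cloudwatch.get_metric_statistics(**params)`. The cluster ID and the fixed end time are placeholders for the example.

```python
from datetime import datetime, timedelta, timezone

def hdfs_utilization_params(cluster_id, hours=24):
    """Build GetMetricStatistics parameters for the HDFSUtilization metric
    of one cluster, averaged over five-minute periods."""
    end = datetime(2023, 1, 1, tzinfo=timezone.utc)  # fixed time for the example
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "HDFSUtilization",
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 300,          # Amazon EMR reports at five-minute intervals
        "Statistics": ["Average"],
    }

params = hdfs_utilization_params("j-1234ABCDEFGH")
print(params["Namespace"], params["Period"])  # AWS/ElasticMapReduce 300
```

The `JobFlowId` dimension shown here is the one listed under [Dimensions for Amazon EMR metrics](#emr-metrics-dimensions).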

------
#### [ Console ]

**To view metrics with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose the cluster that you want to view metrics for. This opens the cluster details page.

1. Select the **Monitoring** tab on the cluster details page. Choose any one of the **Cluster status**, **Node status**, or **Inputs and outputs** options to load the reports about the progress and health of the cluster. 

1. After you choose a metric to view, you can enlarge each graph. To filter the time frame of your graph, select a prefilled option or choose **Custom**.

------

## Metrics reported by Amazon EMR in CloudWatch
<a name="UsingEMR_ViewingMetrics_MetricsReported"></a>

The following tables list the metrics that Amazon EMR reports in the console and pushes to CloudWatch.

### Amazon EMR metrics
<a name="emr-metrics-reported"></a>

Amazon EMR sends data for several metrics to CloudWatch. All Amazon EMR clusters automatically send metrics in five-minute intervals. Metrics are archived for 63 days; after that period, the data is discarded. 

The `AWS/ElasticMapReduce` namespace includes the following metrics.

**Note**  
Amazon EMR pulls metrics from a cluster. If a cluster becomes unreachable, no metrics are reported until the cluster becomes available again.

The following metrics are available for clusters running Hadoop 2.x versions.


| Metric | Description | 
| --- | --- | 
| Cluster Status | 
| IsIdle  | Indicates that a cluster is no longer performing work, but is still alive and accruing charges. It is set to 1 if no tasks are running and no jobs are running, and set to 0 otherwise. This value is checked at five-minute intervals and a value of 1 indicates only that the cluster was idle when checked, not that it was idle for the entire five minutes. To avoid false positives, you should raise an alarm when this value has been 1 for more than one consecutive 5-minute check. For example, you might raise an alarm on this value if it has been 1 for thirty minutes or longer. Use case: Monitor cluster performance Units: *Boolean*  | 
| ContainerAllocated  | The number of resource containers allocated by the ResourceManager. Use case: Monitor cluster progress Units: *Count*  | 
| ContainerReserved  | The number of containers reserved. Use case: Monitor cluster progress Units: *Count*  | 
| ContainerPending  | The number of containers in the queue that have not yet been allocated. Use case: Monitor cluster progress Units: *Count*  | 
| ContainerPendingRatio  | The ratio of pending containers to containers allocated (ContainerPendingRatio = ContainerPending / ContainerAllocated). If ContainerAllocated = 0, then ContainerPendingRatio = ContainerPending. The value of ContainerPendingRatio represents a number, not a percentage. This value is useful for scaling cluster resources based on container allocation behavior. Units: *Count*  | 
| AppsCompleted  | The number of applications submitted to YARN that have completed. Use case: Monitor cluster progress Units: *Count*  | 
| AppsFailed  | The number of applications submitted to YARN that have failed to complete. Use case: Monitor cluster progress, Monitor cluster health Units: *Count*  | 
| AppsKilled  | The number of applications submitted to YARN that have been killed. Use case: Monitor cluster progress, Monitor cluster health Units: *Count*  | 
| AppsPending  | The number of applications submitted to YARN that are in a pending state. Use case: Monitor cluster progress Units: *Count*  | 
| AppsRunning  | The number of applications submitted to YARN that are running. Use case: Monitor cluster progress Units: *Count*  | 
| AppsSubmitted  | The number of applications submitted to YARN. Use case: Monitor cluster progress Units: *Count*  | 
| Node Status | 
| CoreNodesRunning  | The number of core nodes working. Data points for this metric are reported only when a corresponding instance group exists. Use case: Monitor cluster health Units: *Count*  | 
| CoreNodesPending  | The number of core nodes waiting to be assigned. All of the core nodes requested may not be immediately available; this metric reports the pending requests. Data points for this metric are reported only when a corresponding instance group exists. Use case: Monitor cluster health Units: *Count*  | 
| LiveDataNodes  | The percentage of data nodes that are receiving work from Hadoop. Use case: Monitor cluster health Units: *Percent*  | 
| MRTotalNodes  | The number of nodes presently available to MapReduce jobs. Equivalent to YARN metric `mapred.resourcemanager.TotalNodes`. Use case: Monitor cluster progress Units: *Count* Note: MRTotalNodes counts only currently active nodes; YARN automatically removes terminated nodes from this count and stops tracking them.  | 
| MRActiveNodes  | The number of nodes presently running MapReduce tasks or jobs. Equivalent to YARN metric `mapred.resourcemanager.NoOfActiveNodes`. Use case: Monitor cluster progress Units: *Count*  | 
| MRLostNodes  | The number of nodes allocated to MapReduce that have been marked in a LOST state. Equivalent to YARN metric `mapred.resourcemanager.NoOfLostNodes`. Use case: Monitor cluster health, Monitor cluster progress Units: *Count*  | 
| MRUnhealthyNodes  | The number of nodes available to MapReduce jobs marked in an UNHEALTHY state. Equivalent to YARN metric `mapred.resourcemanager.NoOfUnhealthyNodes`. Use case: Monitor cluster progress Units: *Count*  | 
| MRDecommissionedNodes  | The number of nodes allocated to MapReduce applications that have been marked in a DECOMMISSIONED state. Equivalent to YARN metric `mapred.resourcemanager.NoOfDecommissionedNodes`. Use case: Monitor cluster health, Monitor cluster progress Units: *Count*  | 
| MRRebootedNodes  | The number of nodes available to MapReduce that have been rebooted and marked in a REBOOTED state. Equivalent to YARN metric `mapred.resourcemanager.NoOfRebootedNodes`. Use case: Monitor cluster health, Monitor cluster progress Units: *Count*  | 
| MultiMasterInstanceGroupNodesRunning  | The number of running master nodes. Use case: Monitor master node failure and replacement Units: *Count*  | 
| MultiMasterInstanceGroupNodesRunningPercentage  | The percentage of master nodes that are running over the requested master node instance count.  Use case: Monitor master node failure and replacement Units: *Percent*  | 
| MultiMasterInstanceGroupNodesRequested  | The number of requested master nodes.  Use case: Monitor master node failure and replacement Units: *Count*  | 
| IO | 
| S3BytesWritten  | The number of bytes written to Amazon S3. This metric aggregates MapReduce jobs only, and does not apply for other workloads on Amazon EMR.  Use case: Analyze cluster performance, Monitor cluster progress Units: *Count*  | 
| S3BytesRead  | The number of bytes read from Amazon S3. This metric aggregates MapReduce jobs only, and does not apply for other workloads on Amazon EMR.  Use case: Analyze cluster performance, Monitor cluster progress Units: *Count*  | 
| HDFSUtilization  | The percentage of HDFS storage currently used. Use case: Analyze cluster performance Units: *Percent*  | 
| HDFSBytesRead  | The number of bytes read from HDFS. This metric aggregates MapReduce jobs only, and does not apply for other workloads on Amazon EMR. Use case: Analyze cluster performance, Monitor cluster progress Units: *Count*  | 
| HDFSBytesWritten  | The number of bytes written to HDFS. This metric aggregates MapReduce jobs only, and does not apply for other workloads on Amazon EMR. Use case: Analyze cluster performance, Monitor cluster progress Units: *Count*  | 
| MissingBlocks  | The number of blocks in which HDFS has no replicas. These might be corrupt blocks. Use case: Monitor cluster health Units: *Count*  | 
| CorruptBlocks  | The number of blocks that HDFS reports as corrupted. Use case: Monitor cluster health Units: *Count*  | 
| TotalLoad  | The total number of concurrent data transfers. Use case: Monitor cluster health Units: *Count*  | 
| MemoryTotalMB  | The total amount of memory in the cluster. Use case: Monitor cluster progress Units: *Count*  | 
| MemoryReservedMB  | The amount of memory reserved. Use case: Monitor cluster progress Units: *Count*  | 
| MemoryAvailableMB  | The amount of memory available to be allocated. Use case: Monitor cluster progress Units: *Count*  | 
| YARNMemoryAvailablePercentage  | The percentage of remaining memory available to YARN (YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB). This value is useful for scaling cluster resources based on YARN memory usage. Units: *Percent*  | 
| MemoryAllocatedMB  | The amount of memory allocated to the cluster. Use case: Monitor cluster progress Units: *Count*  | 
| PendingDeletionBlocks  | The number of blocks marked for deletion. Use case: Monitor cluster progress, Monitor cluster health Units: *Count*  | 
| UnderReplicatedBlocks  | The number of blocks that need to be replicated one or more times. Use case: Monitor cluster progress, Monitor cluster health Units: *Count*  | 
| DfsPendingReplicationBlocks  | The status of block replication: blocks being replicated, age of replication requests, and unsuccessful replication requests. Use case: Monitor cluster progress, Monitor cluster health Units: *Count*  | 
| CapacityRemainingGB  | The amount of remaining HDFS disk capacity.  Use case: Monitor cluster progress, Monitor cluster health Units: *Count*  | 
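Two of the metrics in the table above are derived from others by simple formulas, which makes them natural inputs for automatic scaling rules. The sketch below implements both formulas as stated in the table; the function names are our own.

```python
def container_pending_ratio(pending, allocated):
    """ContainerPendingRatio = ContainerPending / ContainerAllocated;
    if ContainerAllocated is 0, the ratio is defined as ContainerPending."""
    return pending if allocated == 0 else pending / allocated

def yarn_memory_available_pct(available_mb, total_mb):
    """YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB,
    expressed as a percentage."""
    return 100.0 * available_mb / total_mb

print(container_pending_ratio(10, 4))              # 2.5
print(container_pending_ratio(10, 0))              # 10
print(yarn_memory_available_pct(4096, 16384))      # 25.0
```

A scale-out rule might trigger, for example, when `ContainerPendingRatio` stays above 0.75 or `YARNMemoryAvailablePercentage` drops below 15 for several consecutive periods; those thresholds are illustrative, not recommendations.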

The following are Hadoop 1 metrics:


| Metric | Description | 
| --- | --- | 
| Cluster Status | 
| IsIdle  | Indicates that a cluster is no longer performing work, but is still alive and accruing charges. It is set to 1 if no tasks are running and no jobs are running, and set to 0 otherwise. This value is checked at five-minute intervals and a value of 1 indicates only that the cluster was idle when checked, not that it was idle for the entire five minutes. To avoid false positives, you should raise an alarm when this value has been 1 for more than one consecutive 5-minute check. For example, you might raise an alarm on this value if it has been 1 for thirty minutes or longer. Use case: Monitor cluster performance Units: *Boolean*  | 
| JobsRunning  | The number of jobs in the cluster that are currently running. Use case: Monitor cluster health Units: *Count*  | 
| JobsFailed  | The number of jobs in the cluster that have failed. Use case: Monitor cluster health Units: *Count*  | 
| Map/Reduce | 
| MapTasksRunning  | The number of running map tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. Use case: Monitor cluster progress Units: *Count*  | 
| MapTasksRemaining  | The number of remaining map tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. A remaining map task is one that is not in any of the following states: Running, Killed, or Completed. Use case: Monitor cluster progress Units: *Count*  | 
| MapSlotsOpen  | The unused map task capacity. This is calculated as the maximum number of map tasks for a given cluster, less the total number of map tasks currently running in that cluster. Use case: Analyze cluster performance Units: *Count*  | 
| RemainingMapTasksPerSlot  | The ratio of the total map tasks remaining to the total map slots available in the cluster. Use case: Analyze cluster performance Units: *Ratio*  | 
| ReduceTasksRunning  | The number of running reduce tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. Use case: Monitor cluster progress Units: *Count*  | 
| ReduceTasksRemaining  | The number of remaining reduce tasks for each job. If you have a scheduler installed and multiple jobs running, multiple graphs are generated. Use case: Monitor cluster progress Units: *Count*  | 
| ReduceSlotsOpen  | Unused reduce task capacity. This is calculated as the maximum reduce task capacity for a given cluster, less the number of reduce tasks currently running in that cluster. Use case: Analyze cluster performance Units: *Count*  | 
| Node Status | 
| CoreNodesRunning  | The number of core nodes working. Data points for this metric are reported only when a corresponding instance group exists. Use case: Monitor cluster health Units: *Count*  | 
| CoreNodesPending  | The number of core nodes waiting to be assigned. All of the core nodes requested may not be immediately available; this metric reports the pending requests. Data points for this metric are reported only when a corresponding instance group exists. Use case: Monitor cluster health Units: *Count*  | 
| LiveDataNodes  | The percentage of data nodes that are receiving work from Hadoop. Use case: Monitor cluster health Units: *Percent*  | 
| TaskNodesRunning  | The number of task nodes working. Data points for this metric are reported only when a corresponding instance group exists. Use case: Monitor cluster health Units: *Count*  | 
| TaskNodesPending  | The number of task nodes waiting to be assigned. All of the task nodes requested may not be immediately available; this metric reports the pending requests. Data points for this metric are reported only when a corresponding instance group exists. Use case: Monitor cluster health Units: *Count*  | 
| LiveTaskTrackers  | The percentage of task trackers that are functional. Use case: Monitor cluster health Units: *Percent*  | 
| IO | 
| S3BytesWritten  | The number of bytes written to Amazon S3. This metric aggregates MapReduce jobs only, and does not apply for other workloads on Amazon EMR. Use case: Analyze cluster performance, Monitor cluster progress Units: *Count*  | 
| S3BytesRead  | The number of bytes read from Amazon S3. This metric aggregates MapReduce jobs only, and does not apply for other workloads on Amazon EMR. Use case: Analyze cluster performance, Monitor cluster progress Units: *Count*  | 
| HDFSUtilization  | The percentage of HDFS storage currently used. Use case: Analyze cluster performance Units: *Percent*  | 
| HDFSBytesRead  | The number of bytes read from HDFS. Use case: Analyze cluster performance, Monitor cluster progress Units: *Count*  | 
| HDFSBytesWritten  | The number of bytes written to HDFS. Use case: Analyze cluster performance, Monitor cluster progress Units: *Count*  | 
| MissingBlocks  | The number of blocks in which HDFS has no replicas. These might be corrupt blocks. Use case: Monitor cluster health Units: *Count*  | 
| TotalLoad  | The current, total number of readers and writers reported by all DataNodes in a cluster. Use case: Diagnose the degree to which high I/O might be contributing to poor job execution performance. Worker nodes running the DataNode daemon must also perform map and reduce tasks. Persistently high TotalLoad values over time can indicate that high I/O might be a contributing factor to poor performance. Occasional spikes in this value are typical and do not usually indicate a problem. Units: *Count*  | 

#### Cluster capacity metrics
<a name="emr-metrics-managed-scaling"></a>

The following metrics indicate the current or target capacities of a cluster. These metrics are only available when managed scaling or auto-termination is enabled. 

For clusters composed of instance fleets, the cluster capacity metrics are measured in `Units`. For clusters composed of instance groups, the cluster capacity metrics are measured in `Nodes` or `VCPU` based on the unit type used in the managed scaling policy. For more information, see [Using EMR-managed scaling](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-scaling.html) in the *Amazon EMR Management Guide*.


| Metric | Description | 
| --- | --- | 
| [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html) | The target total number of units/nodes/vCPUs in a cluster as determined by managed scaling. Units: *Count*  | 
| [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html)  | The current total number of units/nodes/vCPUs available in a running cluster. When a cluster resize is requested, this metric will be updated after the new instances are added or removed from the cluster. Units: *Count*  | 
| [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html)  | The target number of CORE units/nodes/vCPUs in a cluster as determined by managed scaling. Units: *Count*  | 
| [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html)  | The current number of CORE units/nodes/vCPUs running in a cluster. Units: *Count*  | 
| [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html)  | The target number of TASK units/nodes/vCPUs in a cluster as determined by managed scaling. Units: *Count*  | 
| [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html)  | The current number of TASK units/nodes/vCPUs running in a cluster. Units: *Count*  | 
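
As an illustration of how these capacity metrics can be consumed, the following sketch builds a CloudWatch `GetMetricStatistics` request for a managed scaling metric such as `TotalUnitsRunning`, using the `AWS/ElasticMapReduce` namespace and the `JobFlowId` dimension. The helper names and the placeholder cluster ID are this example's assumptions, not part of the Amazon EMR API.

```python
from datetime import datetime, timedelta, timezone

def capacity_metric_query(cluster_id, metric_name, minutes=60):
    """Build GetMetricStatistics parameters for an EMR capacity metric.

    Capacity metrics such as TotalUnitsRunning are published in the
    AWS/ElasticMapReduce namespace and filtered by the JobFlowId
    dimension (the cluster ID).
    """
    now = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": metric_name,
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "StartTime": now - timedelta(minutes=minutes),
        "EndTime": now,
        "Period": 300,          # five-minute buckets
        "Statistics": ["Average"],
    }

def print_capacity(cluster_id, metric_name="TotalUnitsRunning"):
    """Call CloudWatch with the parameters above (needs AWS credentials)."""
    import boto3  # imported here so the builder stays dependency-free
    cloudwatch = boto3.client("cloudwatch")
    stats = cloudwatch.get_metric_statistics(
        **capacity_metric_query(cluster_id, metric_name)
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Average"])
```

For example, `print_capacity("j-XXXXXXXXXXXXX")` (with a real cluster ID) would print the recent average cluster capacity in units.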

Amazon EMR emits the following metrics at a one-minute granularity when you enable auto-termination using an auto-termination policy. Some metrics are only available for Amazon EMR versions 6.4.0 and later. To learn more about auto-termination, see [Using an auto-termination policy for Amazon EMR cluster cleanup](emr-auto-termination-policy.md).



| Metric | Description | 
| --- | --- | 
| TotalNotebookKernels | The total number of running and idle notebook kernels on the cluster. This metric is only available for Amazon EMR versions 6.4.0 and later. | 
| AutoTerminationIsClusterIdle | Indicates whether the cluster is in use. A value of **0** indicates that the cluster is in active use by one of the following components: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_ViewingMetrics.html). A value of **1** indicates that the cluster is idle. Amazon EMR checks for continuous cluster idleness (`AutoTerminationIsClusterIdle` = 1). When a cluster's idle time equals the `IdleTimeout` value in your auto-termination policy, Amazon EMR terminates the cluster.  | 
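
As a sketch of attaching an auto-termination policy programmatically with the boto3 `put_auto_termination_policy` API: the helper names and the 60-second-to-7-day bounds check are this example's assumptions.

```python
def idle_timeout_policy(seconds):
    """Build an AutoTerminationPolicy payload. IdleTimeout is expressed
    in seconds (assumed range: 60 seconds up to 7 days)."""
    if not 60 <= seconds <= 7 * 24 * 3600:
        raise ValueError("IdleTimeout must be between 60 seconds and 7 days")
    return {"IdleTimeout": seconds}

def enable_auto_termination(cluster_id, idle_seconds=3600):
    """Attach the policy to a running cluster (needs AWS credentials)."""
    import boto3  # imported here so the builder stays dependency-free
    emr = boto3.client("emr")
    emr.put_auto_termination_policy(
        ClusterId=cluster_id,
        AutoTerminationPolicy=idle_timeout_policy(idle_seconds),
    )
```

With this policy attached, Amazon EMR terminates the cluster once `AutoTerminationIsClusterIdle` has stayed at 1 for the configured `IdleTimeout`.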

### Dimensions for Amazon EMR metrics
<a name="emr-metrics-dimensions"></a>

Amazon EMR data can be filtered using any of the dimensions in the following table. 


| Dimension  | Description  | 
| --- | --- | 
| JobFlowId | The same as cluster ID, which is the unique identifier of a cluster in the form j-XXXXXXXXXXXXX. Find this value by clicking on the cluster in the Amazon EMR console.  | 

# Monitoring Amazon EMR events with CloudWatch
<a name="emr-manage-cloudwatch-events"></a>

Amazon EMR tracks events and keeps information about them for up to seven days in the Amazon EMR console. Amazon EMR records events when there is a change in the state of clusters, instance groups, instance fleets, automatic scaling policies, or steps. Events capture the date and time the event occurred, details about the affected elements, and other critical data points.

The following table lists Amazon EMR events, along with the state or state change that the event indicates, the severity of the event, event type, event code, and event messages. Amazon EMR represents events as JSON objects and automatically sends them to an event stream. The JSON object is important when you set up rules for event processing using CloudWatch Events because rules seek to match patterns in the JSON object. For more information, see [Events and event patterns](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/CloudWatchEventsandEventPatterns.html) and [Amazon EMR events](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#emr_event_type) in the *Amazon CloudWatch Events User Guide*.
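
For example, a rule that matches cluster state-change events can be sketched as follows with boto3. The rule name and helper functions are hypothetical; the `aws.emr` source and `EMR Cluster State Change` detail type come from the CloudWatch Events documentation referenced above.

```python
import json

def emr_state_change_pattern(states):
    """An event pattern that matches Amazon EMR cluster state-change
    events; CloudWatch Events compares each field in the pattern against
    the corresponding field of the event's JSON object."""
    return {
        "source": ["aws.emr"],
        "detail-type": ["EMR Cluster State Change"],
        "detail": {"state": sorted(states)},
    }

def create_rule(name, states):
    """Register the rule with CloudWatch Events (needs AWS credentials)."""
    import boto3  # imported here so the builder stays dependency-free
    events = boto3.client("events")
    events.put_rule(
        Name=name,
        EventPattern=json.dumps(emr_state_change_pattern(states)),
    )
```

For instance, `create_rule("emr-cluster-failures", ["TERMINATED_WITH_ERRORS"])` (a hypothetical rule name) would match the cluster termination events listed later in this section.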

**Note**  
EMR periodically emits events with the event code **EC2 provisioning - Insufficient Instance Capacity**. These events occur when your Amazon EMR cluster encounters an insufficient capacity error from Amazon EC2 for your instance fleet or instance group during a cluster creation or resize operation. An event might not include all of the instance types and Availability Zones that you provided, because EMR only includes the instance types and Availability Zones in which it attempted to provision capacity since the last Insufficient Instance Capacity event was emitted. For information on how to respond to these events, see [Responding to Amazon EMR cluster insufficient instance capacity events](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-events-response-insuff-capacity.html).

## Cluster start events
<a name="emr-cloudwatch-cluster-events"></a>


| State or state change | Severity | Event type | Event code | Message | 
| --- | --- | --- | --- | --- | 
| CREATING | WARN | EMR instance fleet provisioning | EC2 provisioning - Insufficient Instance Capacity | We are not able to create your Amazon EMR cluster ClusterId (ClusterName) for Instance Fleet InstanceFleetID. Amazon EC2 has insufficient Spot capacity for Instance type [Instancetype1, Instancetype2] and insufficient On-Demand capacity for Instance type [Instancetype3, Instancetype4] in Availability Zone [AvailabilityZone1, AvailabilityZone2]. For more information on how to respond to this event, see the [documentation](emr-EC2_INSUFFICIENT_CAPACITY-error.md). | 
| CREATING | WARN | EMR instance group provisioning | EC2 provisioning - Insufficient Instance Capacity | We are not able to create your Amazon EMR cluster ClusterId (ClusterName) for Instance Group InstanceGroupID. Amazon EC2 has insufficient Spot capacity for Instance type [Instancetype1, Instancetype2] and insufficient On-Demand capacity for Instance type [Instancetype3, Instancetype4] in Availability Zone [AvailabilityZone1, AvailabilityZone2]. For more information on how to respond to this event, see the [documentation](emr-EC2_INSUFFICIENT_CAPACITY-error.md). | 
| CREATING | WARN | EMR instance fleet provisioning | EC2 provisioning - Insufficient Free Addresses In Subnet | We can’t create the Amazon EMR cluster ClusterId (ClusterName) that you requested for instance fleet InstanceFleetID because the specified subnet [Subnet1, Subnet2] doesn't contain enough free private IP addresses to fulfill your request. Use the DescribeSubnets operation to see how many IP addresses are available (unused) in your subnet. For information on how to respond to this event, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html). | 
| CREATING | WARN | EMR instance group provisioning | EC2 provisioning - Insufficient Free Addresses In Subnet | We can’t create the Amazon EMR cluster ClusterId (ClusterName) that you requested for instance group InstanceGroupID because the specified subnet [Subnet1, Subnet2] doesn't contain enough free private IP addresses to fulfill your request. Use the DescribeSubnets operation to see how many IP addresses are available (unused) in your subnet. For information on how to respond to this event, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html). | 
| CREATING  | WARN  | EMR instance fleet provisioning  | EC2 Provisioning – vCPU Limit Exceeded  | The provision of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterId (ClusterName) is delayed because you've reached the limit on the number of vCPUs (virtual processing units) assigned to the running instances in your account (accountId). For more information, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html).  | 
| CREATING  | WARN  | EMR instance group provisioning  | EC2 Provisioning – vCPU Limit Exceeded  | The provision of instance group InstanceGroupID in the Amazon EMR cluster ClusterId (ClusterName) is delayed because you've reached the limit on the number of vCPUs (virtual processing units) assigned to the running instances in your account (accountId). For more information, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html).  | 
| CREATING  | WARN  | EMR instance fleet provisioning  | EC2 Provisioning – Spot Instance Count Limit Exceeded  | The provision of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of Spot Instances that you can launch in your account (accountId). For more information, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html).  | 
| CREATING  | WARN  | EMR instance group provisioning  | EC2 Provisioning – Spot Instance Count Limit Exceeded  | The provision of instance group InstanceGroupID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of Spot Instances that you can launch in your account (accountId). For more information, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html).  | 
| CREATING  | WARN  | EMR instance fleet provisioning  | EC2 Provisioning – Instance Limit Exceeded  | The provision of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterId (ClusterName) is delayed because you've reached the limit on the number of instances you can run concurrently in your account (accountID). For more information on Amazon EC2 service limits, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html).  | 
| CREATING  | WARN  | EMR instance group provisioning  | EC2 Provisioning – Instance Limit Exceeded  | The provision of instance group InstanceGroupID in the Amazon EMR cluster ClusterId (ClusterName) is delayed because you've reached the limit on the number of instances you can run concurrently in your account (accountID). For more information on Amazon EC2 service limits, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html).  | 
| STARTING  | INFO  | EMR cluster state change  | *none*  | Amazon EMR cluster `ClusterId (ClusterName)` was requested at `Time` and is being created.  | 
| STARTING  | INFO  | EMR cluster state change  | *none*  |  Applies only to clusters with the instance fleets configuration and multiple Availability Zones selected within Amazon EC2.  Amazon EMR cluster `ClusterId (ClusterName)` is being created in zone (`AvailabilityZoneID`), which was chosen from the specified Availability Zone options.  | 
| STARTING  | INFO  | EMR cluster state change  | *none*  | Amazon EMR cluster `ClusterId (ClusterName)` began running steps at `Time`.  | 
| WAITING  | INFO  | EMR cluster state change  | *none*  | Amazon EMR cluster `ClusterId (ClusterName)` was created at `Time` and is ready for use. - or -  Amazon EMR cluster `ClusterId (ClusterName)` finished running all pending steps at `Time`.  A cluster in the `WAITING` state may still be processing jobs.   | 

**Note**  
Amazon EMR periodically emits events with the event code `EC2 provisioning - Insufficient Instance Capacity` when your EMR cluster encounters an insufficient capacity error from Amazon EC2 for your instance fleet or instance group during a cluster creation or resize operation. For information on how to respond to these events, see [Responding to Amazon EMR cluster insufficient instance capacity events](emr-events-response-insuff-capacity.md).

## Cluster termination events
<a name="emr-cloudwatch-cluster-termination-events"></a>


| State or state change | Severity | Event type | Event code | Message | 
| --- | --- | --- | --- | --- | 
| TERMINATED  | The severity depends on the reason for the state change, as shown in the following: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-cloudwatch-events.html)  | EMR cluster state change  | *none*  | Amazon EMR Cluster `ClusterId (ClusterName)` has terminated at `Time` with a reason of `StateChangeReason:Code`.  | 
| TERMINATED_WITH_ERRORS  | CRITICAL  | EMR cluster state change  | *none*  | Amazon EMR Cluster `ClusterId (ClusterName)` has terminated with errors at `Time` with a reason of `StateChangeReason:Code`.  | 

## Instance fleet state-change events
<a name="emr-cloudwatch-instance-fleet-events"></a>

**Note**  
The instance fleets configuration is available only in Amazon EMR releases 4.8.0 and later, excluding 5.0.0 and 5.0.3.



| State or state change | Severity | Event type | Event code | Message | 
| --- | --- | --- | --- | --- | 
| From `PROVISIONING` to `WAITING`  | INFO  |  | none | Provisioning for instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` is complete. Provisioning started at `Time` and took `Num` minutes. The instance fleet now has On-Demand capacity of `Num` and Spot capacity of `Num`. Target On-Demand capacity was `Num`, and target Spot capacity was `Num`.  | 
| From `WAITING` to `RESIZING`  | INFO  |  | none | A resize for instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` started at `Time`. The instance fleet is resizing from an On-Demand capacity of `Num` to a target of `Num`, and from a Spot capacity of `Num` to a target of `Num`.  | 
| From `RESIZING` to `WAITING`  | INFO  |  | none | The resizing operation for instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` is complete. The resize started at `Time` and took `Num` minutes. The instance fleet now has On-Demand capacity of `Num` and Spot capacity of `Num`. Target On-Demand capacity was `Num` and target Spot capacity was `Num`.  | 
| From `RESIZING` to `WAITING`  | INFO  |  | none | The resizing operation for instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` has reached the timeout and stopped. The resize started at `Time` and stopped after `Num` minutes. The instance fleet now has On-Demand capacity of `Num` and Spot capacity of `Num`. Target On-Demand capacity was `Num` and target Spot capacity was `Num`.  | 
| SUSPENDED  | ERROR  |  | none | Instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` was arrested at `Time` for the following reason: `ReasonDesc`.  | 
| RESIZING  | WARNING  |  | none | The resizing operation for instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` is stuck for the following reason: `ReasonDesc`.  | 
| `WAITING` or `RUNNING`  | INFO  |  | none | The resizing operation for instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` couldn't complete while Amazon EMR added Spot capacity in Availability Zone `AvailabilityZone`. We've cancelled your request to provision additional Spot capacity. For recommended actions, check [Availability Zone flexibility for an Amazon EMR cluster](emr-flexibility.md) and try again.  | 
| `WAITING` or `RUNNING`  | INFO  |  | none | A resizing operation for instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` was initiated by `Entity` at `Time`.  | 

## Instance fleet reconfiguration events
<a name="emr-cloudwatch-instance-fleet-events-reconfig"></a>



| State or state change | Severity | Message | 
| --- | --- | --- | 
| Instance Fleet Reconfiguration Requested  | INFO  | A user has requested to reconfigure the instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId` (`ClusterName`).  | 
| Instance Fleet Reconfiguration Start  | INFO  | Amazon EMR has started a reconfiguration of the instance fleet `InstanceFleetID` in the Amazon EMR cluster `ClusterId` (`ClusterName`) at `Time`.  | 
| Instance Fleet Reconfiguration Completed  | INFO  | Amazon EMR has finished reconfiguring instance fleet `InstanceFleetID` in the Amazon EMR cluster `ClusterId` (`ClusterName`).  | 
| Instance Fleet Reconfiguration Failed  | WARNING  | Amazon EMR failed to reconfigure the instance fleet `InstanceFleetID` in the Amazon EMR cluster `ClusterId` (`ClusterName`) at `Time`. The reconfiguration failed because `Reason`.  | 
| Instance Fleet Reconfiguration Reversion Start  | INFO  | Amazon EMR is reverting the instance fleet `InstanceFleetID` in the Amazon EMR cluster `ClusterId` (`ClusterName`) to the previous successful configuration.  | 
| Instance Fleet Reconfiguration Reversion Completed  | INFO  | Amazon EMR finished reverting the instance fleet `InstanceFleetID` in the Amazon EMR cluster `ClusterId` (`ClusterName`) to the previous successful configuration.  | 
| Instance Fleet Reconfiguration Reversion Failed  | CRITICAL  | Amazon EMR couldn't revert the instance fleet `InstanceFleetID` in the Amazon EMR cluster `ClusterId` (`ClusterName`) to the previously successful configuration at `Time`. The reconfiguration reversion failed because of `Reason`.  | 
| Instance Fleet Reconfiguration Reversion Blocked  | INFO  | Amazon EMR temporarily blocked the instance fleet `InstanceFleetID` in the Amazon EMR cluster `ClusterId` (`ClusterName`) at `Time` because the instance fleet is in the `State` state.  | 

## Instance fleet resize events
<a name="emr-cloudwatch-instance-fleet-resize-events"></a>



| Event type | Severity | Event code | Message | 
| --- | --- | --- | --- | 
| EMR instance fleet resize   | ERROR | Spot Provisioning timeout  | The resize operation for instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` couldn't complete while acquiring Spot capacity in Availability Zone `AvailabilityZone`. We cancelled your request and stopped trying to provision additional Spot capacity; the instance fleet has provisioned Spot capacity of `num`, and target Spot capacity was `num`. For more information and recommended actions, see [Availability Zone flexibility for an Amazon EMR cluster](emr-flexibility.md), then retry.  | 
| EMR instance fleet resize   | ERROR | On-Demand Provisioning timeout  | The resize operation for instance fleet `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` couldn't complete while acquiring On-Demand capacity in Availability Zone `AvailabilityZone`. We cancelled your request and stopped trying to provision additional On-Demand capacity; the instance fleet has provisioned On-Demand capacity of `num`, and target On-Demand capacity was `num`. For more information and recommended actions, see [Availability Zone flexibility for an Amazon EMR cluster](emr-flexibility.md), then retry.  | 
| EMR instance fleet resize   | WARNING | EC2 provisioning - Insufficient Instance Capacity | We are not able to complete the resize operation for Instance Fleet `InstanceFleetID` in EMR cluster `ClusterId (ClusterName)` because Amazon EC2 has insufficient Spot capacity for Instance types `[Instancetype1, Instancetype2]` and insufficient On-Demand capacity for Instance types `[Instancetype3, Instancetype4]` in Availability Zone `[AvailabilityZone1]`. So far, the instance fleet has provisioned On-Demand capacity of `num` and target On-Demand capacity was `num`. Provisioned Spot capacity is `num` and target Spot capacity was `num`. For more information on how to respond to this event, see the [documentation](emr-EC2_INSUFFICIENT_CAPACITY-error.md).  | 
| EMR instance fleet resize   | WARNING | Spot Provisioning Timeout - Continuing Resize  | We're still provisioning Spot capacity for the instance fleet resize operation that was initiated at `time` for instance fleet ID `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` for `[Instancetype1, Instancetype2]` in AZ `AvailabilityZone`. For the previous resize operation that was initiated at `time`, the timeout period expired, so Amazon EMR stopped provisioning Spot capacity after adding `num` of the requested `num` instances to your instance fleet. For more information, see [Availability Zone flexibility for an Amazon EMR cluster](emr-flexibility.md). | 
| EMR instance fleet resize   | WARNING | On-Demand Provisioning Timeout - Continuing Resize  | We're still provisioning On-Demand capacity for the instance fleet resize operation that was initiated at `time` for instance fleet ID `InstanceFleetID` in Amazon EMR cluster `ClusterId (ClusterName)` for `[Instancetype1, Instancetype2]` in AZ `AvailabilityZone`. For the previous resize operation that was initiated at `time`, the timeout period expired, so Amazon EMR stopped provisioning On-Demand capacity after adding `num` of the requested `num` instances to your instance fleet. For more information, see [Availability Zone flexibility for an Amazon EMR cluster](emr-flexibility.md). | 
| EMR instance fleet resize   | WARNING | EC2 Provisioning - Insufficient Free Address in Subnet  | We can't complete the resize operation for instance fleet InstanceFleetID in Amazon EMR cluster ClusterId (ClusterName) because the specified subnet [Subnet1, Subnet2] doesn't contain enough free private IP addresses to fulfill your request. Use the DescribeSubnets operation to view how many IP addresses are available (unused) in your subnet. For information on how to respond to this event, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html). | 
| EMR instance fleet resize   | WARNING | EC2 Provisioning - vCPU Limit Exceeded  | The resize of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterName is delayed because you've reached the limit on the number of vCPUs (virtual processing units) assigned to the running instances in your account (accountId). For more information, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html). | 
| EMR instance fleet resize  | WARNING | EC2 Provisioning - Spot Instance Count Limit Exceeded  | The provision of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of Spot Instances that you can launch in your account (accountId). For more information, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html).  | 
| EMR instance fleet resize   | WARNING | EC2 Provisioning - Instance Limit Exceeded  | The provision of instance fleet InstanceFleetID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of On-Demand Instances you can run in your account (accountId). For more information, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html).  | 

**Note**  
The provisioning timeout events are emitted when Amazon EMR stops provisioning Spot or On-Demand capacity for the fleet after the timeout expires. For information on how to respond to these events, see [Responding to Amazon EMR cluster instance fleet resize timeout events](emr-events-response-timeout-events.md).

## Instance group events
<a name="emr-cloudwatch-instance-group-events"></a>



| Event type | Severity | Event code | Message | 
| --- | --- | --- | --- | 
| From `RESIZING` to `RUNNING`  | INFO  | none | The resizing operation for instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` is complete. It now has an instance count of `Num`. The resize started at `Time` and took `Num` minutes to complete.  | 
| From `RUNNING` to `RESIZING`  | INFO  | none | A resize for instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` started at `Time`. It is resizing from an instance count of `Num` to `Num`.  | 
| SUSPENDED  | ERROR  | none | Instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` was arrested at `Time` for the following reason: `ReasonDesc`.  | 
| RESIZING  | WARNING  | none | The resizing operation for instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` is stuck for the following reason: `ReasonDesc`.  | 
| EMR instance group resize   | WARNING | EC2 provisioning - Insufficient Instance Capacity | We are not able to complete the resize operation that started at `time` for Instance Group `InstanceGroupID` in EMR cluster `ClusterId (ClusterName)` because Amazon EC2 has insufficient `Spot/On Demand` capacity for Instance type `[Instancetype]` in Availability Zone `[AvailabilityZone1]`. So far, the instance group has a running instance count of `num` and the requested instance count was `num`. For more information on how to respond to this event, see the [documentation](emr-EC2_INSUFFICIENT_CAPACITY-error.md).  | 
| EMR instance group resize   | WARNING | EC2 Provisioning - Insufficient Free Address in Subnet  | We can't complete the resize operation for instance group InstanceGroupID in Amazon EMR cluster ClusterId (ClusterName) because the specified subnet [Subnet1, Subnet2] doesn't contain enough free private IP addresses to fulfill your request. Use the DescribeSubnets operation to view how many IP addresses are available (unused) in your subnet. For information on how to respond to this event, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html). | 
| EMR instance group resize   | WARNING | EC2 Provisioning - vCPU Limit Exceeded  | The resize of instance group InstanceGroupID in the Amazon EMR cluster ClusterName is delayed because you've reached the limit on the number of vCPUs (virtual processing units) assigned to the running instances in your account (accountId). For more information, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html). | 
| EMR instance group resize   | WARNING | EC2 Provisioning - Spot Instance Count Limit Exceeded  | The provision of instance group InstanceGroupID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of Spot Instances that you can launch in your account (accountId). For more information, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html).  | 
| EMR instance group resize   | WARNING | EC2 Provisioning - Instance Limit Exceeded  | The provision of instance group InstanceGroupID in the Amazon EMR cluster ClusterID (ClusterName) is delayed because you've reached the limit on the number of On-Demand Instances you can run in your account (accountId). For more information, see [Error codes for the Amazon EC2 API](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/errors-overview.html).  | 
| From `RUNNING` to `RESIZING`  | INFO  | none | A resize for instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` was initiated by `Entity` at `Time`.  | 

**Note**  
With Amazon EMR version 5.21.0 and later, you can override cluster configurations and specify additional configuration classifications for each instance group in a running cluster. You do this by using the Amazon EMR console, the AWS Command Line Interface (AWS CLI), or the AWS SDK. For more information, see [Supplying a Configuration for an Instance Group in a Running Cluster](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html).
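
A minimal sketch of such a reconfiguration through boto3's `modify_instance_groups` call. The cluster and instance group IDs are placeholders, and the `yarn-site` classification shown is only a sample override; the helper names are this example's assumptions.

```python
def reconfigure_request(cluster_id, instance_group_id, configurations):
    """Build ModifyInstanceGroups parameters that override configuration
    classifications for one instance group in a running cluster."""
    return {
        "ClusterId": cluster_id,
        "InstanceGroups": [
            {
                "InstanceGroupId": instance_group_id,
                "Configurations": configurations,
            }
        ],
    }

def apply_reconfiguration(cluster_id, instance_group_id, configurations):
    """Send the reconfiguration to Amazon EMR (needs AWS credentials)."""
    import boto3  # imported here so the builder stays dependency-free
    emr = boto3.client("emr")
    emr.modify_instance_groups(
        **reconfigure_request(cluster_id, instance_group_id, configurations)
    )

# Sample override: raise a YARN NodeManager memory limit (values illustrative).
sample_configurations = [{
    "Classification": "yarn-site",
    "Properties": {"yarn.nodemanager.resource.memory-mb": "8192"},
}]
```

Submitting the request triggers the `RECONFIGURING` state changes and reconfiguration events listed in the table that follows.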

The following table lists Amazon EMR events for the reconfiguration operation, along with the state or state change that the event indicates, the severity of the event, and event messages. 



| State or state change | Severity | Message | 
| --- | --- | --- | 
| RUNNING  | INFO  | A reconfiguration for instance group `InstanceGroupID` in the Amazon EMR cluster `ClusterId (ClusterName)` was initiated by a user at `Time`. The version of the requested configuration is `Num`.  | 
| From `RECONFIGURING` to `RUNNING` | INFO  | The reconfiguration operation for instance group `InstanceGroupID` in the Amazon EMR cluster `ClusterId (ClusterName)` is complete. The reconfiguration started at `Time` and took `Num` minutes to complete. The current configuration version is `Num`.  | 
| From `RUNNING` to `RECONFIGURING`  | INFO  | A reconfiguration for instance group `InstanceGroupID` in the Amazon EMR cluster `ClusterId (ClusterName)` started at `Time`. It is reconfiguring from version number `Num` to version number `Num`.  | 
| RESIZING  | INFO  | The reconfiguration operation towards configuration version `Num` for instance group `InstanceGroupID` in the Amazon EMR cluster `ClusterId (ClusterName)` is temporarily blocked at `Time` because the instance group is in the `State` state.  | 
| RECONFIGURING  | INFO  | The resizing operation towards instance count `Num` for instance group `InstanceGroupID` in the Amazon EMR cluster `ClusterId (ClusterName)` is temporarily blocked at `Time` because the instance group is in the `State` state. | 
| RECONFIGURING  | WARNING  | The reconfiguration operation for instance group `InstanceGroupID` in the Amazon EMR cluster `ClusterId (ClusterName)` failed at `Time` and took `Num` minutes to fail. Failed configuration version is `Num`.   | 
| RECONFIGURING  | INFO  | Configurations are reverting to the previous successful version number `Num` for instance group `InstanceGroupID` in the Amazon EMR cluster `ClusterId (ClusterName)` at `Time`. The new configuration version is `Num`.   | 
| From `RECONFIGURING` to `RUNNING` | INFO  | Configurations were successfully reverted to the previous successful version `Num` for instance group `InstanceGroupID` in the Amazon EMR cluster `ClusterId (ClusterName)` at `Time`. The new configuration version is `Num`.  | 
| From `RECONFIGURING` to `SUSPENDED`  | CRITICAL  | Failed to revert to the previous successful version `Num` for Instance group `InstanceGroupID` in the Amazon EMR cluster `ClusterId (ClusterName)` at `Time`.  | 

## Automatic scaling policy events
<a name="emr-cloudwatch-autoscale-events"></a>



| State or state change | Severity | Message | 
| --- | --- | --- | 
| PENDING  | INFO  | An Auto Scaling policy was added to instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` at `Time`. The policy is pending attachment. - or -  The Auto Scaling policy for instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` was updated at `Time`. The policy is pending attachment.  | 
| ATTACHED  | INFO  | The Auto Scaling policy for instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` was attached at `Time`.  | 
| `DETACHED`  | INFO  | The Auto Scaling policy for instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` was detached at `Time`.  | 
| FAILED  | ERROR  | The Auto Scaling policy for instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` could not attach and failed at `Time`. - or -  The Auto Scaling policy for instance group `InstanceGroupID` in Amazon EMR cluster `ClusterId (ClusterName)` could not detach and failed at `Time`.  | 

## Step events
<a name="emr-cloudwatch-step-events"></a>



| State or state change | Severity | Message | 
| --- | --- | --- | 
| PENDING  | INFO  | Step `StepID (StepName)` was added to Amazon EMR cluster `ClusterId (ClusterName)` at `Time` and is pending execution.   | 
| CANCEL_PENDING  | WARN  | Step `StepID (StepName)` in Amazon EMR cluster `ClusterId (ClusterName)` was cancelled at `Time` and is pending cancellation.   | 
| RUNNING  | INFO  | Step `StepID (StepName)` in Amazon EMR cluster `ClusterId (ClusterName)` started running at `Time`.   | 
| COMPLETED  | INFO  | Step `StepID (StepName)` in Amazon EMR cluster `ClusterId (ClusterName)` completed execution at `Time`. The step started running at `Time` and took `Num` minutes to complete.  | 
| CANCELLED  | WARN  | Cancellation request has succeeded for cluster step `StepID (StepName)` in Amazon EMR cluster `ClusterId (ClusterName)` at `Time`, and the step is now cancelled.   | 
| FAILED  | ERROR  | Step `StepID (StepName)` in Amazon EMR cluster `ClusterId (ClusterName)` failed at `Time`.  | 
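
When you route step events to a target such as an AWS Lambda function, the target receives the event as a JSON object. A minimal sketch of summarizing one, assuming `detail` field names (`state`, `stepId`, `clusterId`) that you should verify against a sample event from your own stream:

```python
def summarize_step_event(event):
    """Pull the interesting fields out of an EMR step event delivered to
    a target. Field names are assumptions based on sample events;
    confirm them against your own event stream."""
    detail = event.get("detail", {})
    return "{state}: step {stepId} on cluster {clusterId}".format(
        state=detail.get("state", "UNKNOWN"),
        stepId=detail.get("stepId", "?"),
        clusterId=detail.get("clusterId", "?"),
    )
```

A handler could log this summary, or forward it to an Amazon SNS topic, whenever a step reaches the `FAILED` or `CANCELLED` state.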

## Unhealthy node replacement events
<a name="emr-cloudwatch-unhealthy-node-replacement-events"></a>


| Event type | Severity | Event code | Message | 
| --- | --- | --- | --- | 
| Amazon EMR unhealthy node replacement | INFO | Unhealthy core node detected | Amazon EMR has identified that core instance `[instanceID (InstanceName)]` in `InstanceGroup/Fleet` in the Amazon EMR cluster `clusterID (ClusterName)` is `UNHEALTHY`. Amazon EMR will attempt to recover or gracefully replace the `UNHEALTHY` instance.  | 
| Amazon EMR unhealthy node replacement | INFO | Core node unhealthy - replacement disabled | Amazon EMR has identified that core instance `[instanceID (InstanceName)]` in `InstanceGroup/Fleet` in the Amazon EMR cluster `(clusterID) (ClusterName)` is `UNHEALTHY`. Turn on graceful unhealthy core node replacement in your cluster to let Amazon EMR gracefully replace the `UNHEALTHY` instances in the event that they can’t be recovered.  | 
| Amazon EMR unhealthy node replacement | WARN | Unhealthy core node not replaced | Amazon EMR can't replace your `UNHEALTHY` core instance `[instanceID (InstanceName)]` in `InstanceGroup/Fleet` in the Amazon EMR cluster `clusterID (ClusterName)` because of *reason*. The reason why Amazon EMR can't replace your core node depends on your scenario. For example, Amazon EMR might not remove a node because the cluster would otherwise have no remaining core nodes.  | 
| Amazon EMR unhealthy node replacement | INFO | Unhealthy core node recovered | Amazon EMR has recovered your `UNHEALTHY` core instance `[instanceID (InstanceName)]` in `InstanceGroup/Fleet` in the Amazon EMR cluster `clusterID (ClusterName)`.  | 

For more information about unhealthy node replacement, see [Replacing unhealthy nodes](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-node-replacement.html).

## Viewing events with the Amazon EMR console
<a name="emr-events-console"></a>

For each cluster, you can view a simple list of events in the details pane, which lists events in descending order of occurrence. You can also view all events for all clusters in a region in descending order of occurrence.

If you don't want a user to see all cluster events for a region, add a statement that denies permission (`"Effect": "Deny"`) for the `elasticmapreduce:ViewEventsFromAllClustersInConsole` action to a policy that is attached to the user. 
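
The deny statement described above might look like the following. This is a minimal sketch; attach it to an identity-based policy for the user, and scope it further if your organization requires.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": "elasticmapreduce:ViewEventsFromAllClustersInConsole",
      "Resource": "*"
    }
  ]
}
```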

**To view events for all clusters in a Region with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Events**.

**To view events for a particular cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose a cluster.

1. To view all of your events, select the **Events** tab on the cluster details page.

# Responding to CloudWatch events from Amazon EMR
<a name="emr-events-response"></a>

This section describes various ways that you can respond to actionable events that Amazon EMR emits as [CloudWatch event messages](emr-manage-cloudwatch-events.md). Ways you can respond to events include creating rules, setting alarms, and other responses. The sections that follow include links to procedures and recommended responses to common events.

**Topics**
+ [Creating rules for Amazon EMR events with CloudWatch](emr-events-cloudwatch-console.md)
+ [Setting alarms on CloudWatch metrics from Amazon EMR](UsingEMR_ViewingMetrics_Alarm.md)
+ [Responding to Amazon EMR cluster insufficient instance capacity events](emr-events-response-insuff-capacity.md)
+ [Responding to Amazon EMR cluster instance fleet resize timeout events](emr-events-response-timeout-events.md)

# Creating rules for Amazon EMR events with CloudWatch
<a name="emr-events-cloudwatch-console"></a>

Amazon EMR automatically sends events to a CloudWatch event stream. You can create rules that match events according to a specified pattern, and route the events to targets to take action, such as sending an email notification. Patterns are matched against the event JSON object. For more information about Amazon EMR event details, see [Amazon EMR events](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/EventTypes.html#emr_event_type) in the *Amazon CloudWatch Events User Guide*.

For information about setting up CloudWatch event rules, see [Creating a CloudWatch rule that triggers on an event](https://docs.aws.amazon.com/AmazonCloudWatch/latest/events/Create-CloudWatch-Events-Rule.html).
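
The following sketch shows what such a pattern might look like, together with a simplified matcher that mimics how a pattern is compared against the event JSON (each pattern field lists acceptable values; nested objects recurse). The `aws.emr` source and `EMR Step Status Change` detail type follow the documented Amazon EMR event format; the sample event values are illustrative.

```python
import json

# A pattern that matches Amazon EMR step state-change events whose state is FAILED
EMR_STEP_FAILED_PATTERN = {
    "source": ["aws.emr"],
    "detail-type": ["EMR Step Status Change"],
    "detail": {"state": ["FAILED"]},
}

def matches(pattern, event):
    # An event matches when, for every field in the pattern, the event's
    # value appears in the pattern's list (nested dicts recurse).
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True

sample_event = {
    "source": "aws.emr",
    "detail-type": "EMR Step Status Change",
    "detail": {"state": "FAILED", "clusterId": "j-2WDJCGEG4E6AJ"},
}
print(matches(EMR_STEP_FAILED_PATTERN, sample_event))  # True
```

When you create a rule, you supply the pattern itself (for example, as `json.dumps(EMR_STEP_FAILED_PATTERN)`); the service performs the matching for you.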

# Setting alarms on CloudWatch metrics from Amazon EMR
<a name="UsingEMR_ViewingMetrics_Alarm"></a>

Amazon EMR pushes metrics to Amazon CloudWatch. In response, you can use CloudWatch to set alarms on your Amazon EMR metrics. For example, you can configure an alarm in CloudWatch to send you an email any time the HDFS utilization rises above 80%. For detailed instructions, see [Create or edit a CloudWatch alarm](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/ConsoleAlarms.html) in the *Amazon CloudWatch User Guide*. 
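
As a hedged sketch of the 80% HDFS utilization example, the following builds the request you might pass to CloudWatch's `put_metric_alarm`. The metric name (`HDFSUtilization`), namespace (`AWS/ElasticMapReduce`), and `JobFlowId` dimension are the documented Amazon EMR metric identifiers; the alarm name and SNS topic ARN are placeholders.

```python
def build_hdfs_alarm_request(cluster_id, topic_arn):
    # Alarm when average HDFS utilization exceeds 80% over a 5-minute period
    return {
        "AlarmName": f"emr-hdfs-utilization-{cluster_id}",
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "HDFSUtilization",
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 80.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],  # placeholder SNS topic that emails you
    }

request = build_hdfs_alarm_request(
    "j-2WDJCGEG4E6AJ", "arn:aws:sns:us-east-1:123456789012:emr-alerts"
)
# You would then call: boto3.client("cloudwatch").put_metric_alarm(**request)
print(request["Threshold"])  # 80.0
```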

# Responding to Amazon EMR cluster insufficient instance capacity events
<a name="emr-events-response-insuff-capacity"></a>

## Overview
<a name="emr-events-response-insuff-capacity-overview"></a>

Amazon EMR clusters return the event code `EC2 provisioning - Insufficient Instance Capacity` when the selected Availability Zone doesn't have enough capacity to fulfill your cluster start or resize request. Amazon EMR emits the event periodically, for both instance groups and instance fleets, when it repeatedly encounters insufficient capacity exceptions and can't fulfill your provisioning request for a cluster start or cluster resize operation.

This page describes how you can best respond to this event type when it occurs for your EMR cluster.

## Recommended response to an insufficient capacity event
<a name="emr-events-response-insuff-capacity-rec"></a>

We recommend that you respond to an insufficient-capacity event in one of the following ways:
+ Wait for capacity to recover. Capacity shifts frequently, so an insufficient capacity exception can recover on its own. Your clusters will start or finish resizing as soon as Amazon EC2 capacity becomes available.
+ Alternatively, you can terminate your cluster, modify your instance type configurations, and create a new cluster with the updated cluster configuration request. For more information, see [Availability Zone flexibility for an Amazon EMR cluster](emr-flexibility.md).

You can also set up rules or automated responses to an insufficient capacity event, as described in the next section.

## Automated recovery from an insufficient capacity event
<a name="emr-events-response-insuff-capacity-ex"></a>

You can build automation in response to Amazon EMR events such as the ones with event code `EC2 provisioning - Insufficient Instance Capacity`. For example, the following AWS Lambda function terminates an EMR cluster with an instance group that uses On-Demand instances, and then creates a new EMR cluster with an instance group that contains different instance types than the original request.

The following conditions trigger the automated process to occur:
+ Amazon EMR has emitted the insufficient capacity event for primary or core nodes continuously for more than 20 minutes.
+ The cluster is not in a **READY** or **WAITING** state. For more information about EMR cluster states, see [Understanding the cluster lifecycle](emr-overview.md#emr-overview-cluster-lifecycle).

**Note**  
When you build an automated process for an insufficient capacity exception, you should consider that the insufficient capacity event is recoverable. Capacity often shifts and your clusters will resume the resize or start operation as soon as Amazon EC2 capacity becomes available.

**Example function to respond to insufficient capacity event**  

```
# Lambda code with Python 3.10; the handler is lambda_function.lambda_handler
# Note: the related IAM role requires permission to use Amazon EMR

import json
import boto3
import datetime
from datetime import timezone

INSUFFICIENT_CAPACITY_EXCEPTION_DETAIL_TYPE = "EMR Instance Group Provisioning"
INSUFFICIENT_CAPACITY_EXCEPTION_EVENT_CODE = (
    "EC2 provisioning - Insufficient Instance Capacity"
)
ALLOWED_INSTANCE_TYPES_TO_USE = [
    "m5.xlarge",
    "c5.xlarge",
    "m5.4xlarge",
    "m5.2xlarge",
    "t3.xlarge",
]
CLUSTER_START_ACCEPTABLE_STATES = ["WAITING", "RUNNING"]
CLUSTER_START_SLA = 20

CLIENT = boto3.client("emr", region_name="us-east-1")

# checks if the incoming event is 'EMR Instance Group Provisioning' with eventCode 'EC2 provisioning - Insufficient Instance Capacity'
def is_insufficient_capacity_event(event):
    if not event["detail"]:
        return False
    else:
        return (
            event["detail-type"] == INSUFFICIENT_CAPACITY_EXCEPTION_DETAIL_TYPE
            and event["detail"]["eventCode"]
            == INSUFFICIENT_CAPACITY_EXCEPTION_EVENT_CODE
        )


# checks if the cluster is eligible for termination
def is_cluster_eligible_for_termination(event, describeClusterResponse):
    # instanceGroupType could be CORE, MASTER OR TASK
    instanceGroupType = event["detail"]["instanceGroupType"]
    clusterCreationTime = describeClusterResponse["Cluster"]["Status"]["Timeline"][
        "CreationDateTime"
    ]
    clusterState = describeClusterResponse["Cluster"]["Status"]["State"]

    now = datetime.datetime.now()
    now = now.replace(tzinfo=timezone.utc)
    isClusterStartSlaBreached = clusterCreationTime < now - datetime.timedelta(
        minutes=CLUSTER_START_SLA
    )

    # Check if instance group receiving Insufficient capacity exception is CORE or PRIMARY (MASTER),
    # and it's been more than 20 minutes since the cluster was created, but the cluster state is still not RUNNING or WAITING
    if (
        (instanceGroupType == "CORE" or instanceGroupType == "MASTER")
        and isClusterStartSlaBreached
        and clusterState not in CLUSTER_START_ACCEPTABLE_STATES
    ):
        return True
    else:
        return False


# Choose item from the list except the exempt value
def choice_excluding(exempt):
    for i in ALLOWED_INSTANCE_TYPES_TO_USE:
        if i != exempt:
            return i


# Create a new cluster by choosing different InstanceType.
def create_cluster(event):
    # instanceGroupType could be CORE, MASTER OR TASK
    instanceGroupType = event["detail"]["instanceGroupType"]

    # The following two lines assume that you already know which instance types were used in the original request
    instanceTypesFromOriginalRequestMaster = "m5.xlarge"
    instanceTypesFromOriginalRequestCore = "m5.xlarge"

    # Select new instance types to include in the new createCluster request
    instanceTypeForMaster = (
        instanceTypesFromOriginalRequestMaster
        if instanceGroupType != "MASTER"
        else choice_excluding(instanceTypesFromOriginalRequestMaster)
    )
    instanceTypeForCore = (
        instanceTypesFromOriginalRequestCore
        if instanceGroupType != "CORE"
        else choice_excluding(instanceTypesFromOriginalRequestCore)
    )

    print("Starting to create cluster...")
    instances = {
        "InstanceGroups": [
            {
                "InstanceRole": "MASTER",
                "InstanceCount": 1,
                "InstanceType": instanceTypeForMaster,
                "Market": "ON_DEMAND",
                "Name": "Master",
            },
            {
                "InstanceRole": "CORE",
                "InstanceCount": 1,
                "InstanceType": instanceTypeForCore,
                "Market": "ON_DEMAND",
                "Name": "Core",
            },
        ]
    }
    response = CLIENT.run_job_flow(
        Name="Test Cluster",
        Instances=instances,
        VisibleToAllUsers=True,
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
        ReleaseLabel="emr-6.10.0",
    )

    return response["JobFlowId"]


# Terminates the cluster using the clusterId received in the event
def terminate_cluster(event):
    print("Trying to terminate cluster, clusterId: " + event["detail"]["clusterId"])
    response = CLIENT.terminate_job_flows(JobFlowIds=[event["detail"]["clusterId"]])
    print(f"Terminate cluster response: {response}")


def describe_cluster(event):
    response = CLIENT.describe_cluster(ClusterId=event["detail"]["clusterId"])
    return response


def lambda_handler(event, context):
    if is_insufficient_capacity_event(event):
        print(
            "Received insufficient capacity event for instanceGroup, clusterId: "
            + event["detail"]["clusterId"]
        )

        describeClusterResponse = describe_cluster(event)

        shouldTerminateCluster = is_cluster_eligible_for_termination(
            event, describeClusterResponse
        )
        if shouldTerminateCluster:
            terminate_cluster(event)

            clusterId = create_cluster(event)
            print("Created a new cluster, clusterId: " + clusterId)
        else:
            print(
                "Cluster is not eligible for termination, clusterId: "
                + event["detail"]["clusterId"]
            )

    else:
        print("Received event is not insufficient capacity event, skipping")
```
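
To wire a function like this to the event, you can create a rule whose pattern matches the insufficient capacity event and targets the Lambda function. The following is a possible event pattern; the `aws.emr` source is the documented source for Amazon EMR events, and the detail type and event code match the constants used in the function.

```python
import json

# Pattern for a rule that routes insufficient-capacity events to the Lambda function
INSUFFICIENT_CAPACITY_RULE_PATTERN = {
    "source": ["aws.emr"],
    "detail-type": ["EMR Instance Group Provisioning"],
    "detail": {
        "eventCode": ["EC2 provisioning - Insufficient Instance Capacity"]
    },
}

# You would pass this as EventPattern=json.dumps(...) when creating the rule,
# then add the Lambda function as the rule's target.
print(json.dumps(INSUFFICIENT_CAPACITY_RULE_PATTERN, indent=2))
```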

# Responding to Amazon EMR cluster instance fleet resize timeout events
<a name="emr-events-response-timeout-events"></a>

## Overview
<a name="emr-events-response-timeout-events-overview"></a>

Amazon EMR clusters emit [events](emr-manage-cloudwatch-events.md#emr-cloudwatch-instance-fleet-resize-events) while performing resize operations on instance fleet clusters. Provisioning timeout events are emitted when Amazon EMR stops provisioning Spot or On-Demand capacity for the fleet after the timeout expires. You can configure the timeout duration as part of the [resize specifications](https://docs.aws.amazon.com/emr/latest/APIReference/API_InstanceFleetResizingSpecifications.html) for the instance fleet. When the same instance fleet undergoes consecutive resizes, Amazon EMR emits the `Spot provisioning timeout - continuing resize` or `On-Demand provisioning timeout - continuing resize` event when the timeout for the current resize operation expires. It then starts provisioning capacity for the fleet's next resize operation.

## Responding to instance fleet resize timeout events
<a name="emr-events-response-timeout-events-rec"></a>

We recommend that you respond to a provisioning timeout event in one of the following ways:
+ Revisit the [resize specifications](https://docs.aws.amazon.com/emr/latest/APIReference/API_InstanceFleetResizingSpecifications.html) and retry the resize operation. Because capacity shifts frequently, your cluster will successfully resize as soon as Amazon EC2 capacity becomes available. For jobs that require stricter SLAs, we recommend configuring a lower timeout duration.
+ Alternatively, you can either:
  + Launch a new cluster with diversified instance types based on the [best practices for instance and Availability Zone flexibility](emr-flexibility.md#emr-flexibility-types) or
  + Launch a cluster with On-Demand capacity.
+ For the provisioning timeout - continuing resize event, you can additionally wait for resize operations to be processed. Amazon EMR will continue to sequentially process the resize operations triggered for the fleet, respecting the configured resize specifications.
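
The retry in the first option can be sketched as follows. This builds a `modify_instance_fleet` request that resubmits the resize with a lower Spot provisioning timeout; the `ResizeSpecifications` shape follows the `InstanceFleetResizingSpecifications` API, and the cluster ID, fleet ID, and capacity values are placeholders.

```python
def build_resize_request(cluster_id, fleet_id, target_spot, timeout_minutes):
    # Request body for emr.modify_instance_fleet with a Spot resize timeout
    return {
        "ClusterId": cluster_id,
        "InstanceFleet": {
            "InstanceFleetId": fleet_id,
            "TargetSpotCapacity": target_spot,
            "ResizeSpecifications": {
                "SpotResizeSpecification": {
                    "TimeoutDurationMinutes": timeout_minutes
                }
            },
        },
    }

# Placeholder IDs; retry the resize with a stricter 15-minute timeout
request = build_resize_request("j-2WDJCGEG4E6AJ", "if-XXXXXXXX", 50, 15)
# You would then call: boto3.client("emr").modify_instance_fleet(**request)
```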

You can also set up rules or automated responses to this event as described in the next section.

## Automated recovery from a provisioning timeout event
<a name="emr-events-response-timeout-events-ex"></a>

You can build automation in response to Amazon EMR events with the `Spot Provisioning timeout` event code. For example, the following AWS Lambda function shuts down an EMR cluster with an instance fleet that uses Spot instances for Task nodes, and then creates a new EMR cluster with an instance fleet that contains more diversified instance types than the original request. In this example, the `Spot Provisioning timeout` event emitted for task nodes will trigger the execution of the Lambda function.

**Example function to respond to `Spot Provisioning timeout` event**  

```
# Lambda code with Python 3.10; the handler is lambda_function.lambda_handler
# Note: the related IAM role requires permission to use Amazon EMR
 
import json
import boto3
import datetime
from datetime import timezone
 
SPOT_PROVISIONING_TIMEOUT_EXCEPTION_DETAIL_TYPE = "EMR Instance Fleet Resize"
SPOT_PROVISIONING_TIMEOUT_EXCEPTION_EVENT_CODE = (
    "Spot Provisioning timeout"
)
 
CLIENT = boto3.client("emr", region_name="us-east-1")
 
# checks if the incoming event is 'EMR Instance Fleet Resize' with eventCode 'Spot Provisioning timeout'
def is_spot_provisioning_timeout_event(event):
    if not event["detail"]:
        return False
    else:
        return (
            event["detail-type"] == SPOT_PROVISIONING_TIMEOUT_EXCEPTION_DETAIL_TYPE
            and event["detail"]["eventCode"]
            == SPOT_PROVISIONING_TIMEOUT_EXCEPTION_EVENT_CODE
        )
 
 
# checks if the cluster is eligible for termination
def is_cluster_eligible_for_termination(event, describeClusterResponse):
    # instanceFleetType could be CORE, MASTER OR TASK
    instanceFleetType = event["detail"]["instanceFleetType"]
 
    # Check if instance fleet receiving Spot provisioning timeout event is TASK
    if (instanceFleetType == "TASK"):
        return True
    else:
        return False
 
 
# create a new cluster by choosing different InstanceType.
def create_cluster(event):
    # instanceFleetType could be CORE, MASTER OR TASK
    instanceFleetType = event["detail"]["instanceFleetType"]
 
    # The following two lines assume that you already know which instance types were used in the original request
    instanceTypesFromOriginalRequestMaster = "m5.xlarge"
    instanceTypesFromOriginalRequestCore = "m5.xlarge"
   
    # select new instance types to include in the new createCluster request
    instanceTypesForTask = [
        "m5.xlarge",
        "m5.2xlarge",
        "m5.4xlarge",
        "m5.8xlarge",
        "m5.12xlarge"
    ]
    
    print("Starting to create cluster...")
    instances = {
        "InstanceFleets": [
            {
                "InstanceFleetType":"MASTER",
                "TargetOnDemandCapacity":1,
                "TargetSpotCapacity":0,
                "InstanceTypeConfigs":[
                    {
                        'InstanceType': instanceTypesFromOriginalRequestMaster,
                        "WeightedCapacity":1,
                    }
                ]
            },
            {
                "InstanceFleetType":"CORE",
                "TargetOnDemandCapacity":1,
                "TargetSpotCapacity":0,
                "InstanceTypeConfigs":[
                    {
                        'InstanceType': instanceTypesFromOriginalRequestCore,
                        "WeightedCapacity":1,
                    }
                ]
            },
            {
                "InstanceFleetType":"TASK",
                "TargetOnDemandCapacity":0,
                "TargetSpotCapacity":100,
                "LaunchSpecifications":{},
                "InstanceTypeConfigs":[
                    {
                        'InstanceType': instanceTypesForTask[0],
                        "WeightedCapacity":1,
                    },
                    {
                        'InstanceType': instanceTypesForTask[1],
                        "WeightedCapacity":2,
                    },
                    {
                        'InstanceType': instanceTypesForTask[2],
                        "WeightedCapacity":4,
                    },
                    {
                        'InstanceType': instanceTypesForTask[3],
                        "WeightedCapacity":8,
                    },
                    {
                        'InstanceType': instanceTypesForTask[4],
                        "WeightedCapacity":12,
                    }
                ],
                "ResizeSpecifications": {
                    "SpotResizeSpecification": {
                        "TimeoutDurationMinutes": 30
                    }
                }
            }
        ]
    }
    response = CLIENT.run_job_flow(
        Name="Test Cluster",
        Instances=instances,
        VisibleToAllUsers=True,
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
        ReleaseLabel="emr-6.10.0",
    )
 
    return response["JobFlowId"]
 
 
# Terminates the cluster using the clusterId received in the event
def terminate_cluster(event):
    print("Trying to terminate cluster, clusterId: " + event["detail"]["clusterId"])
    response = CLIENT.terminate_job_flows(JobFlowIds=[event["detail"]["clusterId"]])
    print(f"Terminate cluster response: {response}")
 
 
def describe_cluster(event):
    response = CLIENT.describe_cluster(ClusterId=event["detail"]["clusterId"])
    return response
 
 
def lambda_handler(event, context):
    if is_spot_provisioning_timeout_event(event):
        print(
            "Received spot provisioning timeout event for instanceFleet, clusterId: "
            + event["detail"]["clusterId"]
        )
 
        describeClusterResponse = describe_cluster(event)
 
        shouldTerminateCluster = is_cluster_eligible_for_termination(
            event, describeClusterResponse
        )
        if shouldTerminateCluster:
            terminate_cluster(event)
 
            clusterId = create_cluster(event)
            print("Created a new cluster, clusterId: " + clusterId)
        else:
            print(
                "Cluster is not eligible for termination, clusterId: "
                + event["detail"]["clusterId"]
            )
 
    else:
        print("Received event is not spot provisioning timeout event, skipping")
```

# View cluster application metrics using Ganglia with Amazon EMR
<a name="ViewingGangliaMetrics"></a>

Ganglia is available with Amazon EMR releases 4.2 through 6.15. Ganglia is an open source, scalable, distributed system designed to monitor clusters and grids while minimizing the impact on their performance. When you enable Ganglia on your cluster, you can generate reports and view the performance of the cluster as a whole, as well as inspect the performance of individual node instances. Ganglia is also configured to ingest and visualize Hadoop and Spark metrics. For more information, see [Ganglia](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-ganglia.html) in the *Amazon EMR Release Guide*.

# Logging Amazon EMR API calls using AWS CloudTrail
<a name="logging-using-cloudtrail"></a>

Amazon EMR is integrated with [AWS CloudTrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html), a service that provides a record of actions taken by a user, role, or an AWS service. CloudTrail captures all API calls for Amazon EMR as events. The calls captured include calls from the Amazon EMR console and code calls to the Amazon EMR API operations. Using the information collected by CloudTrail, you can determine the request that was made to Amazon EMR, the IP address from which the request was made, when it was made, and additional details.

Every event or log entry contains information about who generated the request. The identity information helps you determine the following:
+ Whether the request was made with root user or user credentials.
+ Whether the request was made on behalf of an IAM Identity Center user.
+ Whether the request was made with temporary security credentials for a role or federated user.
+ Whether the request was made by another AWS service.

CloudTrail is active in your AWS account when you create the account and you automatically have access to the CloudTrail **Event history**. The CloudTrail **Event history** provides a viewable, searchable, downloadable, and immutable record of the past 90 days of recorded management events in an AWS Region. For more information, see [Working with CloudTrail Event history](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/view-cloudtrail-events.html) in the *AWS CloudTrail User Guide*. There are no CloudTrail charges for viewing the **Event history**.

For an ongoing record of events in your AWS account past 90 days, create a trail or a [CloudTrail Lake](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake.html) event data store.

**CloudTrail trails**  
A *trail* enables CloudTrail to deliver log files to an Amazon S3 bucket. All trails created using the AWS Management Console are multi-Region. You can create a single-Region or a multi-Region trail by using the AWS CLI. Creating a multi-Region trail is recommended because you capture activity in all AWS Regions in your account. If you create a single-Region trail, you can view only the events logged in the trail's AWS Region. For more information about trails, see [Creating a trail for your AWS account](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-create-and-update-a-trail.html) and [Creating a trail for an organization](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/creating-trail-organization.html) in the *AWS CloudTrail User Guide*.  
You can deliver one copy of your ongoing management events to your Amazon S3 bucket at no charge from CloudTrail by creating a trail; however, Amazon S3 storage charges apply. For more information about CloudTrail pricing, see [AWS CloudTrail Pricing](https://aws.amazon.com/cloudtrail/pricing/). For information about Amazon S3 pricing, see [Amazon S3 Pricing](https://aws.amazon.com/s3/pricing/).

**CloudTrail Lake event data stores**  
*CloudTrail Lake* lets you run SQL-based queries on your events. CloudTrail Lake converts existing events in row-based JSON format to [ Apache ORC](https://orc.apache.org/) format. ORC is a columnar storage format that is optimized for fast retrieval of data. Events are aggregated into *event data stores*, which are immutable collections of events based on criteria that you select by applying [advanced event selectors](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake-concepts.html#adv-event-selectors). The selectors that you apply to an event data store control which events persist and are available for you to query. For more information about CloudTrail Lake, see [Working with AWS CloudTrail Lake](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake.html) in the *AWS CloudTrail User Guide*.  
CloudTrail Lake event data stores and queries incur costs. When you create an event data store, you choose the [pricing option](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-lake-manage-costs.html#cloudtrail-lake-manage-costs-pricing-option) you want to use for the event data store. The pricing option determines the cost for ingesting and storing events, and the default and maximum retention period for the event data store. For more information about CloudTrail pricing, see [AWS CloudTrail Pricing](https://aws.amazon.com/cloudtrail/pricing/).

## Amazon EMR data events in CloudTrail
<a name="cloudtrail-data-events"></a>

[Data events](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/logging-data-events-with-cloudtrail.html#logging-data-events) provide information about the resource operations performed on or in a resource (for example, reading or writing to an Amazon S3 object). These are also known as data plane operations. Data events are often high-volume activities. By default, CloudTrail doesn’t log data events. The CloudTrail **Event history** doesn't record data events.

Additional charges apply for data events. For more information about CloudTrail pricing, see [AWS CloudTrail Pricing](https://aws.amazon.com/cloudtrail/pricing/).

You can log data events for the Amazon EMR resource types by using the CloudTrail console, AWS CLI, or CloudTrail API operations. For more information about how to log data events, see [Logging data events with the AWS Management Console](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/logging-data-events-with-cloudtrail.html#logging-data-events-console) and [Logging data events with the AWS Command Line Interface](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/logging-data-events-with-cloudtrail.html#creating-data-event-selectors-with-the-AWS-CLI) in the *AWS CloudTrail User Guide*.

The following table lists the Amazon EMR resource types for which you can log data events. The **Data event type (console)** column shows the value to choose from the **Data event type** list on the CloudTrail console. The **resources.type value** column shows the `resources.type` value, which you would specify when configuring advanced event selectors using the AWS CLI or CloudTrail APIs. The **Data APIs logged to CloudTrail** column shows the API calls logged to CloudTrail for the resource type.

For more information about these API operations, see the [Amazon EMR WAL (EMRWAL) CLI reference](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emrwalcli-ref.html). Amazon EMR also logs some Data API operations to CloudTrail that are HBase system operations you never call directly; these operations aren't in the EMRWAL CLI reference.


| Data event type (console) | resources.type value | Data APIs logged to CloudTrail | 
| --- | --- | --- | 
| Amazon EMR write-ahead log workspace |  AWS::EMRWAL::Workspace  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/logging-using-cloudtrail.html)  | 

You can configure advanced event selectors to filter on the `eventName`, `readOnly`, and `resources.ARN` fields to log only those events that are important to you. For more information about these fields, see [https://docs.aws.amazon.com/awscloudtrail/latest/APIReference/API_AdvancedFieldSelector.html](https://docs.aws.amazon.com/awscloudtrail/latest/APIReference/API_AdvancedFieldSelector.html) in the *AWS CloudTrail API Reference*.
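
As a hedged sketch, the following builds the advanced event selectors you might pass to CloudTrail's `put_event_selectors` to log EMR WAL data events, filtered to write operations only. The `resources.type` value comes from the table above; the selector name is a placeholder.

```python
def build_emrwal_selectors():
    # Advanced event selectors: data events for EMR WAL workspaces, writes only
    return [
        {
            "Name": "Log EMR WAL write data events",
            "FieldSelectors": [
                {"Field": "eventCategory", "Equals": ["Data"]},
                {"Field": "resources.type", "Equals": ["AWS::EMRWAL::Workspace"]},
                {"Field": "readOnly", "Equals": ["false"]},
            ],
        }
    ]

selectors = build_emrwal_selectors()
# You would then call:
# boto3.client("cloudtrail").put_event_selectors(
#     TrailName="my-trail", AdvancedEventSelectors=selectors)  # placeholder trail name
```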

## Amazon EMR management events in CloudTrail
<a name="cloudtrail-management-events"></a>

[Management events](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/logging-management-events-with-cloudtrail.html#logging-management-events) provide information about management operations that are performed on resources in your AWS account. These are also known as control plane operations. By default, CloudTrail logs management events.

Amazon EMR logs all of its control plane operations as management events. For a list of the Amazon EMR control plane operations that are logged to CloudTrail, see the [Amazon EMR API Reference](https://docs.aws.amazon.com/emr/latest/APIReference/Welcome.html).

## Amazon EMR event examples
<a name="cloudtrail-event-examples"></a>

An event represents a single request from any source and includes information about the requested API operation, the date and time of the operation, request parameters, and so on. CloudTrail log files aren't an ordered stack trace of the public API calls, so events don't appear in any specific order.

The following example shows a CloudTrail log entry that demonstrates the **RunJobFlow** action.

```
{
   "Records": [
      {
         "eventVersion":"1.01",
         "userIdentity":{
            "type":"IAMUser",
            "principalId":"EX_PRINCIPAL_ID",
            "arn":"arn:aws:iam::123456789012:user/temporary-user-xx-7M",
            "accountId":"123456789012",
            "userName":"temporary-user-xx-7M"
         },
         "eventTime":"2018-03-31T17:59:21Z",
         "eventSource":"elasticmapreduce.amazonaws.com",
         "eventName":"RunJobFlow",
         "awsRegion":"us-west-2",
         "sourceIPAddress":"192.0.2.1",
         "userAgent":"aws-sdk-java/unknown-version Linux/xx Java_HotSpot(TM)_64-Bit_Server_VM/xx",
         "requestParameters":{
            "tags":[
               {
                  "value":"prod",
                  "key":"domain"
               },
               {
                  "value":"us-west-2",
                  "key":"realm"
               },
               {
                  "value":"VERIFICATION",
                  "key":"executionType"
               }
            ],
            "instances":{
               "slaveInstanceType":"m5.xlarge",
               "ec2KeyName":"emr-integtest",
               "instanceCount":1,
               "masterInstanceType":"m5.xlarge",
               "keepJobFlowAliveWhenNoSteps":true,
               "terminationProtected":false
            },
            "visibleToAllUsers":false,
            "name":"MyCluster",
            "ReleaseLabel":"emr-5.16.0"
         },
         "responseElements":{
            "jobFlowId":"j-2WDJCGEG4E6AJ"
         },
         "requestID":"2f482daf-b8fe-11e3-89e7-75a3d0e071c5",
         "eventID":"b348a38d-f744-4097-8b2a-e68c9b424698"
      },
      ...additional entries
   ]
}
```

For information about CloudTrail record contents, see [CloudTrail record contents](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-event-reference-record-contents.html) in the *AWS CloudTrail User Guide*.
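
A trail can accumulate many records, and a short script makes it easy to pull out just the Amazon EMR events you care about. The following Python sketch (not part of any AWS SDK; it assumes you have already downloaded a log file and parsed it into a dictionary) collects the event time and cluster ID from each `RunJobFlow` record:

```
def run_job_flow_events(log):
    """Return (eventTime, jobFlowId) pairs for each RunJobFlow
    record in a parsed CloudTrail log file."""
    events = []
    for record in log.get("Records", []):
        if record.get("eventName") == "RunJobFlow":
            # jobFlowId appears in responseElements, as in the example above
            cluster_id = record.get("responseElements", {}).get("jobFlowId")
            events.append((record.get("eventTime"), cluster_id))
    return events

# A minimal log shaped like the example entry above
log = {"Records": [{"eventName": "RunJobFlow",
                    "eventTime": "2018-03-31T17:59:21Z",
                    "responseElements": {"jobFlowId": "j-2WDJCGEG4E6AJ"}}]}
print(run_job_flow_events(log))   # [('2018-03-31T17:59:21Z', 'j-2WDJCGEG4E6AJ')]
```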

# EMR Observability Best Practices
<a name="emr-metrics-observability"></a>

Amazon EMR observability encompasses a comprehensive monitoring and management approach for EMR clusters. The foundation rests on Amazon CloudWatch as the primary monitoring service, complemented by EMR Studio and third-party tools such as Prometheus and Grafana for enhanced visibility. This section explores specific aspects of cluster observability:

1. *[Spark observability](https://github.com/aws/aws-emr-best-practices/blob/main/website/docs/bestpractices/Applications/Spark/observability.md)* (GitHub) – The three options that Amazon EMR offers for the Spark user interface.

1. *[Spark troubleshooting](https://github.com/aws/aws-emr-best-practices/blob/main/website/docs/bestpractices/Applications/Spark/troubleshooting.md)* (GitHub) – Resolutions for common Spark errors.

1. *[EMR Cluster monitoring](https://aws.github.io/aws-emr-best-practices/docs/bestpractices/Observability/best_practices/)* (GitHub) – Monitoring cluster performance.

1. *[Troubleshooting EMR](https://github.com/aws/aws-emr-best-practices/blob/main/website/docs/bestpractices/Troubleshooting/Troubleshooting%20EMR.md)* (GitHub) – Identify, diagnose, and resolve common EMR cluster problems.

1. *[Cost optimization](https://github.com/aws/aws-emr-best-practices/blob/main/website/docs/bestpractices/Cost%20Optimizations/best_practices.md)* (GitHub) – Best practices for running cost-effective workloads.

## Performance Optimization Tool for Apache Spark Applications
<a name="performance-optimization"></a>

1. The [AWS EMR Advisor](https://github.com/aws-samples/aws-emr-advisor) tool analyzes Spark event logs to provide tailored recommendations for optimizing EMR cluster configurations, enhancing performance, and reducing costs. By leveraging historical data, it suggests ideal executor sizes and infrastructure settings, enabling more efficient resource utilization and improved overall cluster performance.

1. The [Amazon CodeGuru Profiler](https://github.com/amzn/amazon-codeguru-profiler-for-spark) tool helps developers identify performance bottlenecks and inefficiencies in their Spark applications by collecting and analyzing runtime data. The tool integrates seamlessly with existing Spark applications, requires minimal setup, and provides detailed insights through the AWS Management Console about CPU usage, memory patterns, and performance hotspots.

# Use Amazon EMR cluster scaling to adjust for changing workloads
<a name="emr-scale-on-demand"></a>

You can adjust the number of Amazon EC2 instances available to an Amazon EMR cluster automatically or manually in response to workloads that have varying demands. To use automatic scaling, you have two options. You can enable Amazon EMR managed scaling or create a custom automatic scaling policy. The following table describes the differences between the two options.


|  | Amazon EMR managed scaling | Custom automatic scaling | 
| --- | --- | --- | 
|  Scaling policies and rules  |  No policy required. Amazon EMR manages the automatic scaling activity by continuously evaluating cluster metrics and making optimized scaling decisions.   |  You need to define and manage the automatic scaling policies and rules, such as the specific conditions that trigger scaling activities, evaluation periods, cooldown periods, etc.  | 
|  Supported Amazon EMR releases  |  Amazon EMR version 5.30.0 and higher (except Amazon EMR version 6.0.0)  |  Amazon EMR version 4.0.0 and higher  | 
|  Supported cluster composition  | Instance groups or instance fleets |  Instance groups only  | 
| Scaling limits configuration |  Scaling limits are configured for the entire cluster.  |  Scaling limits can only be configured for each instance group.  | 
|  Metrics evaluation frequency   |  Every 5 to 10 seconds. More frequent evaluation of metrics allows Amazon EMR to make more precise scaling decisions.  |  You can define the evaluation periods only in five-minute increments.  | 
|  Supported applications  |  Only YARN applications are supported, such as Spark, Hadoop, Hive, and Flink. Amazon EMR managed scaling does not support applications that are not based on YARN, such as Presto or HBase.  |  You can choose which applications are supported when defining the automatic scaling rules.   | 

## Considerations
<a name="emr-scaling-considerations"></a>
+ An Amazon EMR cluster always comprises one or three primary nodes. Once you initially configure the cluster, you can only scale core and task nodes. You can't scale the number of primary nodes for the cluster. 
+ For instance groups, reconfiguration operations and resize operations occur consecutively and not concurrently. If you initiate a reconfiguration while an instance group is resizing, the reconfiguration starts once the instance group completes the resize in progress. Conversely, if you initiate a resize operation while an instance group is busy with reconfiguration, the resizing starts once the reconfiguration is complete. 

# Using managed scaling in Amazon EMR
<a name="emr-managed-scaling"></a>

**Important**  
We strongly recommend that you use the latest Amazon EMR release (Amazon EMR 7.12.0) for managed scaling. In some early releases, you might experience intermittent application failures or delays in scaling. Amazon EMR resolved this issue with 5.x releases 5.30.2, 5.31.1, 5.32.1, 5.33.1 and higher, and with 6.x releases 6.1.1, 6.2.1, 6.3.1 and higher. For more information about Region and release availability, see [Managed scaling availability](#emr-managed-scaling-availability).

## Overview
<a name="emr-managed-scaling-overview"></a>

With Amazon EMR versions 5.30.0 and higher (except for Amazon EMR 6.0.0), you can enable Amazon EMR managed scaling. Managed scaling lets you automatically increase or decrease the number of instances or units in your cluster based on workload. Amazon EMR continuously evaluates cluster metrics to make scaling decisions that optimize your clusters for cost and speed. Managed scaling is available for clusters composed of either instance groups or instance fleets.

## Managed scaling availability
<a name="emr-managed-scaling-availability"></a>
+ In the following AWS Regions, Amazon EMR managed scaling is available with Amazon EMR 6.14.0 and higher:
  + Asia Pacific (Taipei) (ap-east-2)
  + Asia Pacific (Melbourne) (ap-southeast-4)
  + Asia Pacific (Malaysia) (ap-southeast-5)
  + Asia Pacific (New Zealand) (ap-southeast-6)
  + Asia Pacific (Thailand) (ap-southeast-7)
  + Canada West (Calgary) (ca-west-1)
  + Europe (Spain) (eu-south-2)
  + Mexico (Central) (mx-central-1)
+ In the following AWS Regions, Amazon EMR managed scaling is available with Amazon EMR 5.30.0 and 6.1.0 and higher:
  + US East (N. Virginia) (us-east-1)
  + US East (Ohio) (us-east-2)
  + US West (Oregon) (us-west-2)
  + US West (N. California) (us-west-1)
  + Africa (Cape Town) (af-south-1)
  + Asia Pacific (Hong Kong) (ap-east-1)
  + Asia Pacific (Mumbai) (ap-south-1)
  + Asia Pacific (Hyderabad) (ap-south-2)
  + Asia Pacific (Seoul) (ap-northeast-2)
  + Asia Pacific (Singapore) (ap-southeast-1)
  + Asia Pacific (Sydney) (ap-southeast-2)
  + Asia Pacific (Jakarta) (ap-southeast-3)
  + Asia Pacific (Tokyo) (ap-northeast-1)
  + Asia Pacific (Osaka) (ap-northeast-3)
  + Canada (Central) (ca-central-1)
  + South America (São Paulo) (sa-east-1)
  + Europe (Frankfurt) (eu-central-1)
  + Europe (Zurich) (eu-central-2)
  + Europe (Ireland) (eu-west-1)
  + Europe (London) (eu-west-2)
  + Europe (Milan) (eu-south-1)
  + Europe (Paris) (eu-west-3)
  + Europe (Stockholm) (eu-north-1)
  + Israel (Tel Aviv) (il-central-1)
  + Middle East (UAE) (me-central-1)
  + China (Beijing) (cn-north-1)
  + China (Ningxia) (cn-northwest-1)
  + AWS GovCloud (US-East) (us-gov-east-1)
  + AWS GovCloud (US-West) (us-gov-west-1)
+ Amazon EMR managed scaling only works with YARN applications, such as Spark, Hadoop, Hive, and Flink. It doesn't support applications that are not based on YARN, such as Presto and HBase.

## Managed scaling parameters
<a name="emr-managed-scaling-parameters"></a>

You must configure the following parameters for managed scaling. The limits apply only to core and task nodes; you can't scale the primary node after initial configuration.
+ **Minimum** (`MinimumCapacityUnits`) – The lower boundary of allowed EC2 capacity in a cluster. It is measured through virtual central processing unit (vCPU) cores or instances for instance groups. It is measured through units for instance fleets. 
+ **Maximum** (`MaximumCapacityUnits`) – The upper boundary of allowed EC2 capacity in a cluster. It is measured through virtual central processing unit (vCPU) cores or instances for instance groups. It is measured through units for instance fleets. 
+ **On-Demand limit** (`MaximumOnDemandCapacityUnits`) (Optional) – The upper boundary of allowed EC2 capacity for On-Demand market type in a cluster. If this parameter is not specified, it defaults to the value of `MaximumCapacityUnits`. 
  + This parameter is used to split capacity allocation between On-Demand and Spot Instances. For example, if you set the minimum to 2 instances, the maximum to 100 instances, and the On-Demand limit to 10 instances, Amazon EMR managed scaling scales up to 10 On-Demand Instances and allocates the remaining capacity to Spot Instances. For more information, see [Node allocation scenarios](managed-scaling-allocation-strategy.md#node-allocation-scenarios).
+ **Maximum core nodes** (`MaximumCoreCapacityUnits`) (Optional) – The upper boundary of allowed EC2 capacity for the core node type in a cluster. If this parameter is not specified, it defaults to the value of `MaximumCapacityUnits`. 
  + This parameter is used to split capacity allocation between core and task nodes. For example, if you set the minimum to 2 instances, the maximum to 100 instances, and the maximum core nodes to 17 instances, Amazon EMR managed scaling scales up to 17 core nodes and allocates the remaining 83 instances to task nodes. For more information, see [Node allocation scenarios](managed-scaling-allocation-strategy.md#node-allocation-scenarios). 
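
Taken together, the example values above (a minimum of 2 instances, a maximum of 100, an On-Demand limit of 10, and a maximum core capacity of 17) correspond to a `ComputeLimits` configuration like the following sketch:

```
{
  "ComputeLimits": {
    "UnitType": "Instances",
    "MinimumCapacityUnits": 2,
    "MaximumCapacityUnits": 100,
    "MaximumOnDemandCapacityUnits": 10,
    "MaximumCoreCapacityUnits": 17
  }
}
```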

For more information about managed scaling parameters, see [https://docs.aws.amazon.com/emr/latest/APIReference/API_ComputeLimits.html](https://docs.aws.amazon.com/emr/latest/APIReference/API_ComputeLimits.html).

## Considerations for Amazon EMR managed scaling
<a name="emr-managed-scaling-considerations"></a>
+ Managed scaling is supported in limited AWS Regions and Amazon EMR releases. For more information, see [Managed scaling availability](#emr-managed-scaling-availability).
+ You must configure the required parameters for Amazon EMR managed scaling. For more information, see [Managed scaling parameters](#emr-managed-scaling-parameters). 
+ To use managed scaling, the metrics-collector process must be able to connect to the public API endpoint for managed scaling in API Gateway. If you use a private DNS name with Amazon Virtual Private Cloud, managed scaling won't function properly. To ensure that managed scaling works, we recommend that you take one of the following actions:
  + Remove the API Gateway interface VPC endpoint from your Amazon VPC.
  + Follow the instructions in [Why do I get an HTTP 403 Forbidden error when connecting to my API Gateway APIs from a VPC?](https://aws.amazon.com/premiumsupport/knowledge-center/api-gateway-vpc-connections/) to disable the private DNS name setting.
  + Launch your cluster in a private subnet instead. For more information, see the topic on [Private subnets](emr-clusters-in-a-vpc.md#emr-vpc-private-subnet).
+ If your YARN jobs are intermittently slow during scale down, and YARN Resource Manager logs show that most of your nodes were deny-listed during that time, you can adjust the decommissioning timeout threshold.

  Reduce the `spark.blacklist.decommissioning.timeout` from one hour to one minute to make the node available for other pending containers to continue task processing.

  You should also set `yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs` to a larger value to ensure that Amazon EMR doesn't force-terminate the node while the longest Spark task is still running on it. The current default is 60 minutes, which means YARN force-terminates the container 60 minutes after the node enters the decommissioning state.
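
  For example, a sketch of how you might apply both settings with configuration classifications when you create the cluster (the timeout values shown are illustrative):

  ```
  [
    {
      "Classification": "spark-defaults",
      "Properties": {
        "spark.blacklist.decommissioning.timeout": "60s"
      }
    },
    {
      "Classification": "yarn-site",
      "Properties": {
        "yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs": "7200"
      }
    }
  ]
  ```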

  The following example YARN Resource Manager log line shows nodes added to the decommissioning state:

  ```
  2021-10-20 15:55:26,994 INFO org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor (IPC Server handler 37 on default port 8030): blacklist are updated in Scheduler.blacklistAdditions: [ip-10-10-27-207.us-west-2.compute.internal, ip-10-10-29-216.us-west-2.compute.internal, ip-10-10-31-13.us-west-2.compute.internal, ... , ip-10-10-30-77.us-west-2.compute.internal], blacklistRemovals: []
  ```

  See more [details on how Amazon EMR integrates with YARN deny listing during decommissioning of nodes](https://aws.amazon.com/blogs/big-data/spark-enhancements-for-elasticity-and-resiliency-on-amazon-emr/), [cases when nodes in Amazon EMR can be deny listed](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-error-resource-3.html), and [configuring Spark node-decommissioning behavior](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#spark-decommissioning).
+ For Spark workloads, disabling Spark dynamic resource allocation (DRA) by changing the Spark property **spark.dynamicAllocation.enabled** to `FALSE` can cause managed scaling issues, where clusters can be scaled up more than required for your workloads (up to the maximum compute). When using managed scaling for these workloads, we recommend that you keep Spark DRA enabled, which is the default state of this property.
+ Over-utilization of EBS volumes can cause managed scaling issues. We recommend that you maintain EBS volume utilization below 90%. For more information, see [Instance storage options and behavior in Amazon EMR](emr-plan-storage.md).
+ Amazon CloudWatch metrics are critical for Amazon EMR managed scaling to operate. We recommend that you closely monitor Amazon CloudWatch metrics to make sure data is not missing. For more information about how you can configure CloudWatch alarms to detect missing metrics, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html). 
+ Managed scaling operations on 5.30.0 and 5.30.1 clusters without Presto installed may cause application failures or cause a uniform instance group or instance fleet to stay in the `ARRESTED` state, particularly when a scale down operation is followed quickly by a scale up operation.

  As a workaround, choose Presto as an application to install when you create a cluster with Amazon EMR releases 5.30.0 and 5.30.1, even if your job does not require Presto.
+ When you set the maximum core node and the On-Demand limit for Amazon EMR managed scaling, consider the differences between instance groups and instance fleets. Each instance group consists of the same instance type and the same purchasing option for instances: On-Demand or Spot. For each instance fleet, you can specify up to five instance types, which can be provisioned as On-Demand and Spot Instances. For more information, see [Create a cluster with instance fleets or uniform instance groups](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-group-configuration.html), [Instance fleet options](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-fleet.html#emr-instance-fleet-options), and [Node allocation scenarios](managed-scaling-allocation-strategy.md#node-allocation-scenarios).
+ With Amazon EMR 5.30.0 and higher, if you remove the default **Allow All** outbound rule to 0.0.0.0/0 for the master security group, you must add a rule that allows outbound TCP connectivity to your security group for service access on port 9443. Your security group for service access must also allow inbound TCP traffic on port 9443 from the master security group. For more information about configuring security groups, see [Amazon EMR-managed security group for the primary instance (private subnets)](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-man-sec-groups.html#emr-sg-elasticmapreduce-master-private).
+ You can use AWS CloudFormation to configure Amazon EMR managed scaling. For more information, see [AWS::EMR::Cluster](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-elasticmapreduce-cluster.html) in the *AWS CloudFormation User Guide*. 
+ If you're using Spot nodes, consider using node labels to prevent Amazon EMR from removing application processes when Amazon EMR removes Spot nodes. For more information about node labels, see [Task nodes](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html#emr-plan-task).
+ Node labeling is not supported by default in Amazon EMR releases 6.15 or lower. For more information, see [Understand node types: primary, core, and task nodes.](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html)
+ If you're using Amazon EMR releases 6.15 or lower, you can only assign node labels by node type, such as core and task nodes. However, if you're using Amazon EMR release 7.0 or higher, you can configure node labels by node type and market type, such as On-Demand and Spot.
+ If application process demand increases and executor demand decreases when you restricted the application process to core nodes, you can add back core nodes and remove task nodes in the same resize operation. For more information, see [Understanding node allocation strategy and scenarios](https://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html).
+ Amazon EMR doesn't label task nodes, so you can't set the YARN properties to restrict application processes only for task nodes. However, if you want to use market types as node labels, you can use the `ON_DEMAND` or `SPOT` labels for application process placement. We don't recommend using Spot nodes for application primary processes.
+ When using node labels, the total running units in the cluster can temporarily exceed the max compute set in your managed scaling policy while Amazon EMR decommissions some of your instances. Total requested units will always stay at or below your policy’s max compute. 
+ Managed scaling only supports the node labels `ON_DEMAND` and `SPOT` or `CORE` and `TASK`. Custom node labels aren't supported.
+ Amazon EMR creates node labels when creating the cluster and provisioning resources. Amazon EMR doesn't support adding node labels when you reconfigure the cluster. You also can't modify the node labels when configuring managed scaling after launching the cluster.
+ Managed scaling scales core and task nodes independently based on application process and executor demand. To prevent HDFS data loss issues during core scale down, follow standard practice for core nodes. To learn more about best practices about core nodes and HDFS replication, see [Considerations and best practices](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-ha-considerations.html).
+ You can't place both the application process and executors on only the `core` or the `ON_DEMAND` node. If you want to add both the application process and executors on one of the nodes, don't use the `yarn.node-labels.am.default-node-label-expression` configuration.

  For example, to place both the application process and executors in `ON_DEMAND` nodes, set max compute to the same as the maximum in the `ON_DEMAND` node. Also remove the `yarn.node-labels.am.default-node-label-expression` configuration.

  To add both the application process and executors on `core` nodes, remove the `yarn.node-labels.am.default-node-label-expression` configuration.
+  When you use managed scaling with node labels, set the property `yarn.scheduler.capacity.maximum-am-resource-percent: 1` if you plan to run multiple applications in parallel. Doing so ensures that your application processes fully utilize the available `CORE` or `ON_DEMAND` nodes. 
+ If you use managed scaling with node labels, set the property `yarn.resourcemanager.decommissioning.timeout` to a value that's longer than the longest-running application on your cluster. Doing so reduces the chance that Amazon EMR managed scaling needs to reschedule your applications to recommission `CORE` or `ON_DEMAND` nodes. 
+ To reduce the risk of application failures due to shuffle data loss, Amazon EMR collects metrics from the cluster to determine which nodes have existing transient shuffle data from the current and previous stages. In rare cases, metrics can continue to report stale data for applications that have already completed or terminated, which can delay timely scale down of instances in your cluster. For clusters that have a large amount of shuffle data, consider using Amazon EMR releases 6.13 and later.
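
As a sketch, the two YARN properties recommended above for node labels could be set together through configuration classifications when you create the cluster (the decommissioning timeout value is illustrative; choose one longer than your longest-running application):

```
[
  {
    "Classification": "capacity-scheduler",
    "Properties": {
      "yarn.scheduler.capacity.maximum-am-resource-percent": "1"
    }
  },
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.resourcemanager.decommissioning.timeout": "14400"
    }
  }
]
```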

## Feature history
<a name="emr-managed-scaling-history"></a>

This table lists updates to the Amazon EMR managed scaling capability.


| Release date | Capability | Amazon EMR versions | 
| --- | --- | --- | 
| November 20, 2024 | Managed scaling is available in the il-central-1 Israel (Tel Aviv), me-central-1 Middle East (UAE), and ap-northeast-3 Asia Pacific (Osaka) regions. | 5.30.0 and 6.1.0 and higher | 
| November 15, 2024 | Managed scaling is available in the eu-central-2 Europe (Zurich) Region. | 5.30.0 and 6.1.0 and higher | 
| August 20, 2024 | Node labels are now available in managed scaling, so you can label your instances based on market type or node type to improve automatic scaling. | 7.2.0 and higher | 
| March 31, 2024 | Managed scaling is available in the ap-south-2 Asia Pacific (Hyderabad) Region. | 6.14.0 and higher | 
| February 13, 2024 | Managed scaling is available in the eu-south-2 Europe (Spain) Region. | 6.14.0 and higher | 
| October 10, 2023 | Managed scaling is available in the ap-southeast-3 Asia Pacific (Jakarta) Region. | 6.14.0 and higher | 
| July 28, 2023 | Enhanced managed scaling to switch to different task instance group on scale-up when Amazon EMR experiences a delay in scale-up with the current instance group. | 5.34.0 and higher, 6.4.0 and higher | 
| June 16, 2023 | Enhanced managed scaling to be aware of the nodes running application master so that those nodes are not scaled down. For more information, see [Understanding Amazon EMR node allocation strategy and scenarios](managed-scaling-allocation-strategy.md). | 5.34.0 and higher, 6.4.0 and higher | 
| March 21, 2022 | Added Spark shuffle data awareness used when scaling-down clusters. For Amazon EMR clusters with Apache Spark and the managed scaling feature enabled, Amazon EMR continuously monitors Spark executors and intermediate shuffle data locations. Using this information, Amazon EMR scales-down only under-utilized instances which don't contain actively used shuffle data. This prevents recomputation of lost shuffle data, helping to lower cost and improve job performance. For more information, see the [Spark Programming Guide](https://spark.apache.org/docs/latest/rdd-programming-guide.html#shuffle-operations). | 5.34.0 and higher, 6.4.0 and higher | 

# Configure managed scaling for Amazon EMR
<a name="managed-scaling-configure"></a>

The following sections explain how to launch an EMR cluster that uses managed scaling with the AWS Management Console, the AWS SDK for Java, or the AWS Command Line Interface.

**Topics**
+ [Use the AWS Management Console to configure managed scaling](#managed-scaling-console)
+ [Use the AWS CLI to configure managed scaling](#managed-scaling-cli)
+ [Use AWS SDK for Java to configure managed scaling](#managed-scaling-sdk)

## Use the AWS Management Console to configure managed scaling
<a name="managed-scaling-console"></a>

You can use the Amazon EMR console to configure managed scaling when you create a cluster or to change a managed scaling policy for a running cluster.

------
#### [ Console ]

**To configure managed scaling when you create a cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. Choose an Amazon EMR release **emr-5.30.0** or later, except version **emr-6.0.0**. 

1. Under **Cluster scaling and provisioning option**, choose **Use EMR-managed scaling**. Specify the **Minimum** and **Maximum** number of instances, the **Maximum core node** instances, and the **Maximum On-Demand** instances.

1. Choose any other options that apply to your cluster. 

1. To launch your cluster, choose **Create cluster**.

**To configure managed scaling on an existing cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and select the cluster that you want to update.

1. On the **Instances** tab of the cluster details page, find the **Instance group settings** section. Select **Edit cluster scaling** to specify new values for the **Minimum** and **Maximum** number of instances and the **On-Demand** limit.

------

## Use the AWS CLI to configure managed scaling
<a name="managed-scaling-cli"></a>

You can use AWS CLI commands for Amazon EMR to configure managed scaling when you create a cluster. You can use a shorthand syntax, specifying the JSON configuration inline within the relevant commands, or you can reference a file containing the configuration JSON. You can also apply a managed scaling policy to an existing cluster and remove a managed scaling policy that was previously applied. In addition, you can retrieve details of a scaling policy configuration from a running cluster.

**Enabling Managed Scaling During Cluster Launch**

You can enable managed scaling during cluster launch as the following example demonstrates.

```
aws emr create-cluster \
 --service-role EMR_DefaultRole \
 --release-label emr-7.12.0 \
 --name EMR_Managed_Scaling_Enabled_Cluster \
 --applications Name=Spark Name=Hbase \
 --ec2-attributes KeyName=keyName,InstanceProfile=EMR_EC2_DefaultRole \
 --instance-groups InstanceType=m4.xlarge,InstanceGroupType=MASTER,InstanceCount=1 InstanceType=m4.xlarge,InstanceGroupType=CORE,InstanceCount=2 \
 --region us-east-1 \
 --managed-scaling-policy ComputeLimits='{MinimumCapacityUnits=2,MaximumCapacityUnits=4,UnitType=Instances}'
```

Instead of the inline shorthand shown in the preceding example, you can also reference a JSON configuration file with the `--managed-scaling-policy` option when you use `create-cluster`.

**Applying a Managed Scaling Policy to an Existing Cluster**

You can apply a managed scaling policy to an existing cluster as the following example demonstrates.

```
aws emr put-managed-scaling-policy \
 --cluster-id j-123456 \
 --managed-scaling-policy ComputeLimits='{MinimumCapacityUnits=1,MaximumCapacityUnits=10,MaximumOnDemandCapacityUnits=10,UnitType=Instances}'
```

Alternatively, you can reference a file that contains the policy. The following example references a JSON file, `managedscaleconfig.json`, that specifies the managed scaling policy configuration.

```
aws emr put-managed-scaling-policy --cluster-id j-123456 --managed-scaling-policy file://./managedscaleconfig.json
```

The following example shows the contents of the `managedscaleconfig.json` file, which defines the managed scaling policy.

```
{
    "ComputeLimits": {
        "UnitType": "Instances",
        "MinimumCapacityUnits": 1,
        "MaximumCapacityUnits": 10,
        "MaximumOnDemandCapacityUnits": 10
    }
}
```

**Retrieving a Managed Scaling Policy Configuration**

The `GetManagedScalingPolicy` command retrieves the policy configuration. For example, the following command retrieves the configuration for the cluster with a cluster ID of `j-123456`.

```
aws emr get-managed-scaling-policy --cluster-id j-123456
```

The command produces the following example output.

```
{
   "ManagedScalingPolicy": {
      "ComputeLimits": {
         "MinimumCapacityUnits": 1,
         "MaximumOnDemandCapacityUnits": 10,
         "MaximumCapacityUnits": 10,
         "UnitType": "Instances"
      }
   }
}
```

For more information about using Amazon EMR commands in the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/emr](https://docs.aws.amazon.com/cli/latest/reference/emr).

**Removing Managed Scaling Policy**

The `RemoveManagedScalingPolicy` command removes the policy configuration. For example, the following command removes the configuration for the cluster with a cluster ID of `j-123456`.

```
aws emr remove-managed-scaling-policy --cluster-id j-123456
```

## Use AWS SDK for Java to configure managed scaling
<a name="managed-scaling-sdk"></a>

The following program excerpt shows how to configure managed scaling using the AWS SDK for Java:

```
package com.amazonaws.emr.sample;

import java.util.ArrayList;
import java.util.List;

import com.amazonaws.AmazonClientException;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.Application;
import com.amazonaws.services.elasticmapreduce.model.ComputeLimits;
import com.amazonaws.services.elasticmapreduce.model.ComputeLimitsUnitType;
import com.amazonaws.services.elasticmapreduce.model.InstanceGroupConfig;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.ManagedScalingPolicy;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

public class CreateClusterWithManagedScalingWithIG {

	public static void main(String[] args) {
		AWSCredentials credentialsFromProfile = getCredentials("AWS-Profile-Name-Here");
		
		/**
		 * Create an Amazon EMR client with the credentials and region specified in order to create the cluster
		 */
		AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.standard()
			.withCredentials(new AWSStaticCredentialsProvider(credentialsFromProfile))
			.withRegion(Regions.US_EAST_1)
			.build();
		
		/**
		 * Create Instance Groups - Primary, Core, Task
		 */
		InstanceGroupConfig instanceGroupConfigMaster = new InstanceGroupConfig()
				.withInstanceCount(1)
				.withInstanceRole("MASTER")
				.withInstanceType("m4.large")
				.withMarket("ON_DEMAND"); 
				
		InstanceGroupConfig instanceGroupConfigCore = new InstanceGroupConfig()
			.withInstanceCount(4)
			.withInstanceRole("CORE")
			.withInstanceType("m4.large")
			.withMarket("ON_DEMAND");
			
		InstanceGroupConfig instanceGroupConfigTask = new InstanceGroupConfig()
			.withInstanceCount(5)
			.withInstanceRole("TASK")
			.withInstanceType("m4.large")
			.withMarket("ON_DEMAND");

		List<InstanceGroupConfig> igConfigs = new ArrayList<>();
		igConfigs.add(instanceGroupConfigMaster);
		igConfigs.add(instanceGroupConfigCore);
		igConfigs.add(instanceGroupConfigTask);
		
        /**
         *  specify applications to be installed and configured when Amazon EMR creates the cluster
         */
		Application hive = new Application().withName("Hive");
		Application spark = new Application().withName("Spark");
		Application ganglia = new Application().withName("Ganglia");
		Application zeppelin = new Application().withName("Zeppelin");
		
		/** 
		 * Managed Scaling Configuration - 
         * Using UnitType=Instances for clusters composed of instance groups
		 *
         * Other options are: 
         * UnitType = VCPU ( for clusters composed of instance groups)
         * UnitType = InstanceFleetUnits ( for clusters composed of instance fleets)
         **/
		ComputeLimits computeLimits = new ComputeLimits()
				.withMinimumCapacityUnits(1)
				.withMaximumCapacityUnits(20)
				.withUnitType(ComputeLimitsUnitType.Instances);
		
		ManagedScalingPolicy managedScalingPolicy = new ManagedScalingPolicy();
		managedScalingPolicy.setComputeLimits(computeLimits);
		
		// create the cluster with a managed scaling policy
		RunJobFlowRequest request = new RunJobFlowRequest()
	       		.withName("EMR_Managed_Scaling_TestCluster")
	       		.withReleaseLabel("emr-7.12.0")          // Specifies the version label for the Amazon EMR release; we recommend the latest release
	       		.withApplications(hive,spark,ganglia,zeppelin)
	       		.withLogUri("s3://path/to/my/emr/logs")  // A URI in S3 for log files is required when debugging is enabled.
	       		.withServiceRole("EMR_DefaultRole")      // If you use a custom IAM service role, replace the default role with the custom role.
	       		.withJobFlowRole("EMR_EC2_DefaultRole")  // If you use a custom Amazon EMR role for EC2 instance profile, replace the default role with the custom Amazon EMR role.
	       		.withInstances(new JobFlowInstancesConfig().withInstanceGroups(igConfigs)
	       	   		.withEc2SubnetId("subnet-123456789012345")
	           		.withEc2KeyName("my-ec2-key-name") 
	           		.withKeepJobFlowAliveWhenNoSteps(true))    
	       		.withManagedScalingPolicy(managedScalingPolicy);
	   RunJobFlowResult result = emr.runJobFlow(request); 
	   
	   System.out.println("The cluster ID is " + result.getJobFlowId());
	}
	
	public static AWSCredentials getCredentials(String profileName) {
		// specifies any named profile in .aws/credentials as the credentials provider
		try {
			return new ProfileCredentialsProvider(profileName)
					.getCredentials(); 
        } catch (Exception e) {
            throw new AmazonClientException(
                    "Cannot load credentials from .aws/credentials file. " +
                    "Make sure that the credentials file exists and that the profile name is defined within it.",
                    e);
        }
	}
	
	public CreateClusterWithManagedScalingWithIG() { }
}
```

# Advanced Scaling for Amazon EMR
<a name="managed-scaling-allocation-strategy-optimized"></a>

Starting with Amazon EMR on EC2 version 7.0, you can leverage Advanced Scaling to control your cluster's resource utilization. Advanced Scaling introduces a utilization-performance scale for tuning your resource utilization and performance level according to your business needs. The value you set determines whether your cluster is weighted more to resource conservation or to scaling up to handle service-level-agreement (SLA) sensitive workloads, where quick completion is critical. When the scaling value is adjusted, managed scaling interprets your intent and intelligently scales to optimize resources. For more information about managed scaling, see [Configure managed scaling for Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-configure.html).

## Advanced Scaling settings
<a name="managed-scaling-allocation-strategy-optimized-strategies"></a>

The value you set for Advanced Scaling tunes your cluster to your requirements. The scale ranges from **1** to **100**, but only the discrete values **1**, **25**, **50**, **75**, and **100** are accepted. Setting the index to any other value results in a validation error. 

Scaling values map to resource-utilization strategies. The following list defines several of these:
+ **Utilization optimized [1]** – This setting prevents resource overprovisioning. Use a low value when you want to keep costs low and prioritize efficient resource utilization. It causes the cluster to scale up less aggressively, which works well when workload spikes occur regularly and you don't want resources to ramp up too quickly.
+ **Balanced [50]** – This balances resource utilization and job performance. This setting is suitable for steady workloads where most stages have a stable runtime. It's also suitable for workloads with a mix of short and long-running stages. We recommend starting with this setting if you aren't sure which to choose.
+ **Performance optimized [100]** – This strategy prioritizes performance. The cluster scales up aggressively to ensure that jobs complete quickly and meet performance targets. Performance optimized is suitable for service-level-agreement (SLA) sensitive workloads where fast run time is critical.

**Note**  
The intermediate values (**25** and **75**) provide a middle ground between these strategies so you can fine-tune your cluster's Advanced Scaling behavior.

## Benefits of Advanced Scaling
<a name="managed-scaling-allocation-strategy-optimized-benefits"></a>

As your environment and requirements change, for example with shifting data volumes, cost targets, and SLAs, cluster scaling can help you adjust your cluster configuration to achieve your objectives. Key benefits include:
+ **Enhanced granular control** – The introduction of the utilization-performance setting allows you to easily adjust your cluster's scaling behavior according to your requirements. You can scale up to meet demand for compute resources or scale down to save resources, based on your use patterns.
+ **Improved cost optimization** – You can choose a low utilization value as requirements dictate to more easily meet your cost objectives.

## Getting started with optimization
<a name="managed-scaling-allocation-strategy-optimized-getting-started"></a>

**Setup and configuration**

Use these steps to set the performance index and optimize your scaling strategy.

1. The following command updates an existing cluster with the utilization-optimized `[1]` scaling strategy:

   ```
   aws emr put-managed-scaling-policy --cluster-id 'cluster-id' \
    --managed-scaling-policy '{
     "ComputeLimits": {
       "UnitType": "Instances",
       "MinimumCapacityUnits": 1,
       "MaximumCapacityUnits": 2,
       "MaximumOnDemandCapacityUnits": 2,
       "MaximumCoreCapacityUnits": 2
     },
     "ScalingStrategy": "ADVANCED",
     "UtilizationPerformanceIndex": "1"
   }' \
    --region "region-name"
   ```

   The attributes `ScalingStrategy` and `UtilizationPerformanceIndex` are new and relevant to scaling optimization. You can select different scaling strategies by setting corresponding values (1, 25, 50, 75, and 100) for the `UtilizationPerformanceIndex` attribute in the managed-scaling policy.

1. (Optional) To revert to the default managed-scaling strategy, run the `put-managed-scaling-policy` command without the `ScalingStrategy` and `UtilizationPerformanceIndex` attributes, as in the following example:

   ```
   aws emr put-managed-scaling-policy \
   --cluster-id 'cluster-id' \
   --managed-scaling-policy '{"ComputeLimits":{"UnitType":"Instances","MinimumCapacityUnits":1,"MaximumCapacityUnits":2,"MaximumOnDemandCapacityUnits":2,"MaximumCoreCapacityUnits":2}}' \
   --region "region-name"
   ```
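As a sketch of the payload shape, the following Python assembles the advanced-scaling policy document from step 1 and enforces the five allowed index values client-side. The `advanced_scaling_policy` helper is hypothetical, for illustration only:

```python
import json

ALLOWED_INDEXES = {1, 25, 50, 75, 100}

def advanced_scaling_policy(min_units, max_units, index):
    """Assemble the --managed-scaling-policy payload with the ADVANCED
    strategy. Only the five discrete index values are accepted."""
    if index not in ALLOWED_INDEXES:
        raise ValueError(
            f"UtilizationPerformanceIndex must be one of {sorted(ALLOWED_INDEXES)}, got {index}"
        )
    return json.dumps({
        "ComputeLimits": {
            "UnitType": "Instances",
            "MinimumCapacityUnits": min_units,
            "MaximumCapacityUnits": max_units,
        },
        "ScalingStrategy": "ADVANCED",
        "UtilizationPerformanceIndex": str(index),
    })
```

The returned string can be passed directly as the `--managed-scaling-policy` argument.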

**Using monitoring metrics to track cluster utilization**

Starting with EMR version 7.3.0, Amazon EMR publishes four new metrics related to memory and virtual CPU. You can use these to measure cluster utilization across scaling strategies. These metrics are available for any use case, but you can use the details provided here for monitoring Advanced Scaling.

Helpful metrics available include the following:
+ **YarnContainersUsedMemoryGBSeconds** – Amount of memory consumed by applications managed by YARN.
+ **YarnContainersTotalMemoryGBSeconds** – Total memory capacity allocated to YARN within the cluster.
+ **YarnNodesUsedVCPUSeconds** – Total VCPU seconds for each application managed by YARN.
+ **YarnNodesTotalVCPUSeconds** – Aggregated total VCPU seconds for the cluster, including the time window when YARN is not ready.

You can analyze resource metrics using Amazon CloudWatch Logs Insights. Features include a purpose-built query language that helps you extract metrics specific to resource use and scaling.

The following query, which you can run in the Amazon CloudWatch console, uses metric math to calculate the average memory utilization (e1) by dividing the running sum of consumed memory (e2) by the running sum of total memory (e3):

```
{
    "metrics": [
        [ { "expression": "e2/e3", "label": "Average Mem Utilization", "id": "e1", "yAxis": "right" } ],
        [ { "expression": "RUNNING_SUM(m1)", "label": "RunningTotal-YarnContainersUsedMemoryGBSeconds", "id": "e2", "visible": false } ],
        [ { "expression": "RUNNING_SUM(m2)", "label": "RunningTotal-YarnContainersTotalMemoryGBSeconds", "id": "e3", "visible": false } ],
        [ "AWS_EMR_ManagedResize", "YarnContainersUsedMemoryGBSeconds", "ACCOUNT_ID", "793684541905", "COMPONENT", "ManagerService", "JOB_FLOW_ID", "cluster-id", { "id": "m1", "label": "YarnContainersUsedMemoryGBSeconds" } ],
        [ ".", "YarnContainersTotalMemoryGBSeconds", ".", ".", ".", ".", ".", ".", { "id": "m2", "label": "YarnContainersTotalMemoryGBSeconds" } ]
    ],
    "view": "timeSeries",
    "stacked": false,
    "region": "region",
    "period": 60,
    "stat": "Sum",
    "title": "Memory Utilization"
}
```

To query logs, you can select CloudWatch in the AWS console. For more information about writing queries for CloudWatch, see [Analyzing log data with CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html) in the Amazon CloudWatch Logs User Guide.
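To make the metric math concrete, the following sketch reproduces the `e2/e3` calculation offline, using hypothetical one-minute samples for the two YARN memory metrics:

```python
from itertools import accumulate

# Hypothetical one-minute samples (GB-seconds) for the two memory metrics.
used_gbs  = [120.0, 300.0, 280.0, 150.0]   # YarnContainersUsedMemoryGBSeconds (m1)
total_gbs = [240.0, 480.0, 480.0, 480.0]   # YarnContainersTotalMemoryGBSeconds (m2)

running_used  = list(accumulate(used_gbs))    # e2 = RUNNING_SUM(m1)
running_total = list(accumulate(total_gbs))   # e3 = RUNNING_SUM(m2)

# e1 = e2 / e3: cumulative average memory utilization at each sample.
avg_utilization = [u / t for u, t in zip(running_used, running_total)]
print(avg_utilization[-1])  # overall utilization across the window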

The following image shows these metrics for a sample cluster:

![\[Graph that shows utilization statistics.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/scaling_graph_EMR.png)


## Considerations and limitations
<a name="managed-scaling-allocation-strategy-optimized-considerations"></a>
+ The effectiveness of scaling strategies might vary, depending on your unique workload characteristics and cluster configuration. We encourage you to experiment with the scaling setting to determine an optimal index value for your use case.
+ Amazon EMR Advanced Scaling is particularly well suited for batch workloads. For SQL/data-warehousing and streaming workloads, we recommend using the default managed-scaling strategy for optimal performance.
+ Amazon EMR Advanced Scaling is not supported when Node Label Configurations are enabled in the cluster. If both Advanced Scaling and Node Label Configurations are enabled in a cluster, the cluster scales as if the default managed-scaling strategy were enabled.
+ The performance-optimized scaling strategy enables faster job execution by maintaining high compute resources for a longer period than the default managed-scaling strategy. This mode prioritizes quickly scaling up to meet resource demands, resulting in quicker job completion. This might result in higher costs when compared with the default strategy.
+ In cases where the cluster is already optimized and fully utilized, enabling Advanced Scaling might not provide additional benefits. In some situations, enabling Advanced Scaling might lead to increased costs as workloads may run longer. In these cases, we recommend using the default managed-scaling strategy to ensure optimal resource allocation and cost efficiency.
+ In the context of managed scaling, the emphasis shifts towards resource utilization over execution time as the setting is adjusted from performance-optimized [**100**] to utilization-optimized [**1**]. However, it is important to note that the outcomes might vary, based on the nature of the workload and the cluster's topology. To ensure optimal results for your use case, we strongly recommend testing the scaling strategies with your workloads to determine the most suitable setting.
+ The `UtilizationPerformanceIndex` attribute accepts only the following values:
  + **1**
  + **25**
  + **50**
  + **75**
  + **100**

  Any other values submitted result in a validation error.

# Understanding Amazon EMR node allocation strategy and scenarios
<a name="managed-scaling-allocation-strategy"></a>

This section gives an overview of node allocation strategy and common scaling scenarios that you can use with Amazon EMR managed scaling. 

## Node allocation strategy
<a name="node-allocation-strategy"></a>

Amazon EMR managed scaling allocates core and task nodes based on the following scale-up and scale-down strategies: 

**Scale-up strategy**
+ For Amazon EMR releases 7.2 and higher, managed scaling first adds nodes based on node labels and the application process restriction YARN property. 
+ For Amazon EMR releases 7.2 and higher, if you enabled node labels and restricted application processes to `CORE` nodes, Amazon EMR managed scaling scales up core nodes when application process demand increases and task nodes when executor demand increases. Similarly, if you enabled node labels and restricted application processes to `ON_DEMAND` nodes, managed scaling scales up On-Demand nodes when application process demand increases and Spot nodes when executor demand increases.
+ If node labels aren't enabled, application process placement isn't restricted to any node or market type.
+ By using node labels, managed scaling can scale different instance groups and instance fleets up and down in the same resize operation. For example, suppose that `instance_group1` has an `ON_DEMAND` node, `instance_group2` has a `SPOT` node, node labels are enabled, and application processes are restricted to nodes with the `ON_DEMAND` label. If application process demand decreases and executor demand increases, managed scaling scales down `instance_group1` and scales up `instance_group2`. 
+ When Amazon EMR experiences a delay in scale-up with the current instance group, clusters that use managed scaling automatically switch to a different task instance group.
+ If the `MaximumCoreCapacityUnits` parameter is set, then Amazon EMR scales core nodes until the core units reach the maximum allowed limit. All the remaining capacity is added to task nodes. 
+ If the `MaximumOnDemandCapacityUnits` parameter is set, then Amazon EMR scales the cluster by using the On-Demand Instances until the On-Demand units reach the maximum allowed limit. All the remaining capacity is added using Spot Instances. 
+ If both the `MaximumCoreCapacityUnits` and `MaximumOnDemandCapacityUnits` parameters are set, Amazon EMR considers both limits during scaling. 

  For example, if the `MaximumCoreCapacityUnits` is less than `MaximumOnDemandCapacityUnits`, Amazon EMR first scales core nodes until the core capacity limit is reached. For the remaining capacity, Amazon EMR first uses On-Demand Instances to scale task nodes until the On-Demand limit is reached, and then uses Spot Instances for task nodes. 
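The split described above can be sketched as a simple allocation function. This is an illustrative model only; it ignores current cluster capacity and other factors that managed scaling considers:

```python
def allocate_scale_up(target, max_core, max_on_demand):
    """Split a scale-up target (in units) following the rules above:
    core nodes fill first, up to max_core, and the remainder goes to
    task nodes; On-Demand capacity fills first, up to max_on_demand,
    and the remainder is Spot."""
    core = min(target, max_core)
    task = target - core
    on_demand = min(target, max_on_demand)
    spot = target - on_demand
    return {"core": core, "task": task, "on_demand": on_demand, "spot": spot}

# For a target of 10 units with max_core=4 and max_on_demand=6, this
# yields 4 core and 6 task units, of which 6 are On-Demand and 4 are Spot.
```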

**Scale-down strategy**
+ Similar to the scale-up strategy, Amazon EMR removes nodes based on node labels. For more information about node labels, see [Understand node types: primary, core, and task nodes](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html).
+ If you haven't enabled node labels, managed scaling removes task nodes and then removes core nodes until it achieves the desired scale-down target capacity. Managed scaling never scales down the cluster below the minimum constraints specified in the managed scaling policy. 
+ Amazon EMR releases 5.34.0 and higher, and releases 6.4.0 and higher, support Spark shuffle data awareness, which prevents managed scaling from removing an instance while it detects existing shuffle data on that instance. For more information on shuffle operations, see the [Spark Programming Guide](https://spark.apache.org/docs/latest/rdd-programming-guide.html#shuffle-operations). Managed scaling makes a best effort to avoid scaling down nodes with shuffle data from the current and previous stage of any active Spark application, up to a maximum of 30 minutes. This helps minimize unintended shuffle data loss, avoiding the need for job re-attempts and recomputation of intermediate data. However, prevention of shuffle data loss is not guaranteed. For improved protection, we recommend enabling improved Spark shuffle protection on clusters with release label 7.4 or higher. Add the following flags to the cluster configuration to enable it.
  + If either the `yarn.nodemanager.shuffledata-monitor.interval-ms` flag (default 30000 ms) or the `spark.dynamicAllocation.executorIdleTimeout` (default 60 sec) has been changed from the default values, ensure the condition `spark.dynamicAllocation.executorIdleTimeout > yarn.nodemanager.shuffledata-monitor.interval-ms` remains `true` by updating the necessary flag.

    ```
    [
    	{
    		"Classification": "yarn-site",
    		"Properties": { 
    		"yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-shuffle-data": "true"
    		}
    	},
    	{
    		"Classification": "spark-defaults",
    		"Properties": {
    		"spark.dynamicAllocation.enabled": "true",
    		"spark.shuffle.service.removeShuffle": "true"
    		}
    	}
    ]
    ```
+ For clusters that are launched with Amazon EMR 5.x releases 5.34.0 and higher, and 6.x releases 6.4.0 and higher, Amazon EMR Managed Scaling doesn’t scale down nodes that have `ApplicationMaster` for Apache Spark, if there are active stages in the applications running on them. This minimizes job failures and retries, which helps to improve job performance and reduce costs. To confirm which nodes in your cluster are running `ApplicationMaster`, visit the Spark History Server and filter for the driver under the **Executors** tab of your Spark application ID.
+ While intelligent scaling with EMR Managed Scaling minimizes shuffle data loss for Spark, there can be instances when transient shuffle data might not be protected during a scale-down. To provide enhanced resiliency of shuffle data during scale-down, we recommend enabling **Graceful Decommissioning for Shuffle Data** in YARN. When **Graceful Decommissioning for Shuffle Data** is enabled in YARN, nodes selected for scale-down that have shuffle data enter the **Decommissioning** state and continue to serve shuffle files. The YARN ResourceManager waits until nodes report no shuffle files present before removing the nodes from the cluster.
  + Amazon EMR version 6.11.0 and higher support YARN-based graceful decommissioning for **Hive** shuffle data for both the Tez and MapReduce shuffle handlers.
    + Enable Graceful Decommissioning for Shuffle Data by setting `yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-shuffle-data` to `true`.
  + Amazon EMR version 7.4.0 and higher support YARN-based graceful decommissioning for Spark shuffle data when the external shuffle service is enabled (enabled by default in EMR on EC2).
    + The default behavior of the Spark external shuffle service, when running Spark on Yarn, is for the Yarn NodeManager to remove application shuffle files at time of application termination. This may have an impact on the speed of node decommissioning and compute utilization. For long running applications, consider setting `spark.shuffle.service.removeShuffle` to `true` to remove shuffle files no longer in use to enable faster decommissioning of nodes with no active shuffle data.
  + To minimize Spark shuffle data loss in Amazon EMR version 7.4.0 and higher, consider setting the following flags.
    + If either the `yarn.nodemanager.shuffledata-monitor.interval-ms` flag (default 30000 ms) or the `spark.dynamicAllocation.executorIdleTimeout` (default 60 sec) has been changed from the default values, ensure that the condition `spark.dynamicAllocation.executorIdleTimeout > yarn.nodemanager.shuffledata-monitor.interval-ms` remains `true` by updating the necessary flag.

      ```
      [
      	{
      		"Classification": "yarn-site",
      		"Properties": { 
      		"yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-shuffle-data": "true"
      		}
      	},
      	{
      		"Classification": "spark-defaults",
      		"Properties": {
      		"spark.dynamicAllocation.enabled": "true",
      		"spark.shuffle.service.removeShuffle": "true"
      		}
      	}
      ]
      ```

If the cluster does not have any load, then Amazon EMR cancels the addition of new instances from a previous evaluation and performs scale-down operations. If the cluster has a heavy load, Amazon EMR cancels the removal of instances and performs scale-up operations.
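The timeout condition noted in the scale-down strategy, `spark.dynamicAllocation.executorIdleTimeout` greater than `yarn.nodemanager.shuffledata-monitor.interval-ms`, can be checked mechanically. Note the unit mismatch between the two properties; this sketch normalizes both to milliseconds:

```python
def shuffle_condition_ok(executor_idle_timeout_s: float,
                         shuffle_monitor_interval_ms: float) -> bool:
    """Check spark.dynamicAllocation.executorIdleTimeout (seconds) >
    yarn.nodemanager.shuffledata-monitor.interval-ms (milliseconds),
    comparing both values in milliseconds."""
    return executor_idle_timeout_s * 1000 > shuffle_monitor_interval_ms

# With the defaults (60 s idle timeout, 30000 ms monitor interval),
# the condition holds and no flag changes are needed.
```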

## Node allocation considerations
<a name="node-allocation-considerations"></a>

We recommend that you use the On-Demand purchasing option for core nodes to avoid HDFS data loss in case of Spot reclamation. You can use the Spot purchasing option for task nodes to reduce costs and get faster job execution when more Spot Instances are added to task nodes.

## Node allocation scenarios
<a name="node-allocation-scenarios"></a>

You can create various scaling scenarios based on your needs by setting up the Maximum, Minimum, On-Demand limit, and Maximum core node parameters in different combinations. 

**Scenario 1: Scale Core Nodes Only**

To scale core nodes only, the managed scaling parameters must meet the following requirements: 
+ The On-Demand limit is equal to the maximum boundary.
+ The maximum core node is equal to the maximum boundary. 

When the On-Demand limit and the maximum core node parameters are not specified, both parameters default to the maximum boundary. 

This scenario isn't applicable if you use managed scaling with node labels and restrict your application processes to only run on `CORE` nodes, because managed scaling scales task nodes to accommodate executor demand.

The following examples demonstrate the scenario of scaling core nodes only.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

**Scenario 2: Scale task nodes only**

To scale task nodes only, the managed scaling parameters must meet the following requirement: 
+ The maximum core node must be equal to the minimum boundary.

The following examples demonstrate the scenario of scaling task nodes only.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

**Scenario 3: Only On-Demand Instances in the cluster**

To have On-Demand Instances only, your cluster and the managed scaling parameters must meet the following requirement: 
+ The On-Demand limit is equal to the maximum boundary. 

  When the On-Demand limit is not specified, the parameter value defaults to the maximum boundary. The default value indicates that Amazon EMR scales On-Demand Instances only. 

If the maximum core node is less than the maximum boundary, the maximum core node parameter can be used to split capacity allocation between core and task nodes. 

To enable this scenario in a cluster composed of instance groups, all node groups in the cluster must use the On-Demand market type during initial configuration. 

This scenario is not applicable if you use managed scaling with node labels and restrict your application processes to only run on `ON_DEMAND` nodes, because managed scaling scales `Spot` nodes to accommodate executor demand.

The following examples demonstrate the scenario of having On-Demand Instances in the entire cluster.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

**Scenario 4: Only Spot Instances in the cluster**

To have Spot Instances only, the managed scaling parameters must meet the following requirement: 
+ On-Demand limit is set to 0.

If the maximum core node is less than the maximum boundary, the maximum core node parameter can be used to split capacity allocation between core and task nodes.

To enable this scenario in a cluster composed of instance groups, the core instance group must use the Spot purchasing option during initial configuration. If there is no Spot Instance in the task instance group, Amazon EMR managed scaling creates a task group using Spot Instances when needed. 

This scenario isn't applicable if you use managed scaling with node labels and restrict your application processes to only run on `ON_DEMAND` nodes, because managed scaling scales `ON_DEMAND` nodes to accommodate application process demand.

The following examples demonstrate the scenario of having Spot Instances in the entire cluster.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

**Scenario 5: Scale On-Demand Instances on core nodes and Spot Instances on task nodes**

To scale On-Demand Instances on core nodes and Spot Instances on task nodes, the managed scaling parameters must meet the following requirements: 
+ The On-Demand limit must be equal to the maximum core node.
+ Both the On-Demand limit and the maximum core node must be less than the maximum boundary.

To enable this scenario in a cluster composed of instance groups, the core node group must use the On-Demand purchasing option.

This scenario isn't applicable if you use managed scaling with node labels and restrict your application processes to only run on `ON_DEMAND` nodes or `CORE` nodes. 

The following examples demonstrate the scenario of scaling On-Demand Instances on core nodes and Spot Instances on task nodes.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)
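The parameter requirements for scenarios 1 through 5 can be summarized in a small decision sketch. The `classify_scaling_scenario` helper is hypothetical and assumes node labels are not enabled:

```python
def classify_scaling_scenario(minimum, maximum, on_demand_limit=None, max_core=None):
    """Map a managed scaling parameter combination to the scenarios
    described in this section. Unspecified limits default to the
    maximum boundary, as noted above."""
    on_demand_limit = maximum if on_demand_limit is None else on_demand_limit
    max_core = maximum if max_core is None else max_core
    if on_demand_limit == maximum and max_core == maximum:
        return "Scenario 1: scale core nodes only"
    if max_core == minimum:
        return "Scenario 2: scale task nodes only"
    if on_demand_limit == 0:
        return "Scenario 4: Spot Instances only"
    if on_demand_limit == max_core and on_demand_limit < maximum:
        return "Scenario 5: On-Demand core nodes, Spot task nodes"
    if on_demand_limit == maximum:
        return "Scenario 3: On-Demand Instances only"
    return "Mixed configuration"
```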

**Scenario 6: Scale `CORE` instances for application process demand and `TASK` instances for executor demand.**

This scenario is only applicable if you use managed scaling with node labels and restrict application processes to only run on `CORE` nodes.

To scale `CORE` nodes based on application process demand and `TASK` nodes based on executor demand, you must set the following configurations at cluster launch:
+  `yarn.node-labels.enabled:true` 
+  `yarn.node-labels.am.default-node-label-expression: 'CORE'` 

If you don't specify the `ON_DEMAND` limit and the maximum `CORE` node parameters, both parameters default to the maximum boundary.

If the maximum `ON_DEMAND` node is less than the maximum boundary, managed scaling uses the maximum `ON_DEMAND` node parameter to split capacity allocation between `ON_DEMAND` and `SPOT` nodes. If you set the maximum `CORE` node parameter to less than or equal to the minimum capacity parameter, `CORE` nodes remain static at the maximum core capacity.

The following examples demonstrate the scenario of scaling CORE instances based on application process demand and TASK instances based on executor demand.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

**Scenario 7: Scale `ON_DEMAND` instances for application process demand and `SPOT` instances for executor demand.**

This scenario is only applicable if you use managed scaling with node labels and restrict application processes to only run on `ON_DEMAND` nodes.

To scale `ON_DEMAND` nodes based on application process demand and `SPOT` nodes based on executor demand, you must set the following configurations at cluster launch:
+  `yarn.node-labels.enabled:true` 
+  `yarn.node-labels.am.default-node-label-expression: 'ON_DEMAND'` 

If you don't specify the `ON_DEMAND` limit and the maximum `CORE` node parameters, both parameters default to the maximum boundary.

If the maximum `CORE` node is less than the maximum boundary, managed scaling uses the maximum `CORE` node parameter to split capacity allocation between `CORE` and `TASK` nodes. If you set the maximum `CORE` node parameter to less than or equal to the minimum capacity parameter, `CORE` nodes remain static at the maximum core capacity.

The following examples demonstrate the scenario of scaling On-Demand Instances based on application process demand and Spot instances based on executor demand.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

# Understanding managed scaling metrics in Amazon EMR
<a name="managed-scaling-metrics"></a>

Amazon EMR publishes high-resolution metrics with data at a one-minute granularity when managed scaling is enabled for a cluster. You can view events on every resize initiation and completion controlled by managed scaling with the Amazon EMR console or the Amazon CloudWatch console. CloudWatch metrics are critical for Amazon EMR managed scaling to operate. We recommend that you closely monitor CloudWatch metrics to make sure data is not missing. For more information about how you can configure CloudWatch alarms to detect missing metrics, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com//AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html). For more information about using CloudWatch events with Amazon EMR, see [Monitor CloudWatch events](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-cloudwatch-events.html).

The following metrics indicate the current or target capacities of a cluster. These metrics are only available when managed scaling is enabled. For clusters composed of instance fleets, the cluster capacity metrics are measured in `Units`. For clusters composed of instance groups, the cluster capacity metrics are measured in `Nodes` or `vCPU` based on the unit type used in the managed scaling policy. 


| Metric | Description | 
| --- | --- | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The target total number of units/nodes/vCPUs in a cluster as determined by managed scaling. Units: *Count*  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The current total number of units/nodes/vCPUs available in a running cluster. When a cluster resize is requested, this metric will be updated after the new instances are added or removed from the cluster. Units: *Count*  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The target number of CORE units/nodes/vCPUs in a cluster as determined by managed scaling. Units: *Count*  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The current number of CORE units/nodes/vCPUs running in a cluster. Units: *Count*  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The target number of TASK units/nodes/vCPUs in a cluster as determined by managed scaling. Units: *Count*  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The current number of TASK units/nodes/vCPUs running in a cluster. Units: *Count*  | 

The following metrics indicate the usage status of the cluster and its applications. These metrics are available for all Amazon EMR features, but are published at a higher resolution with data at a one-minute granularity when managed scaling is enabled for a cluster. You can correlate the following metrics with the cluster capacity metrics in the previous table to understand the managed scaling decisions. 


| Metric | Description | 
| --- | --- | 
|  `AppsCompleted`  |  The number of applications submitted to YARN that have completed. Use case: Monitor cluster progress Units: *Count*  | 
|  `AppsPending`  |  The number of applications submitted to YARN that are in a pending state. Use case: Monitor cluster progress Units: *Count*  | 
|  `AppsRunning`  |  The number of applications submitted to YARN that are running. Use case: Monitor cluster progress Units: *Count*  | 
| ContainerAllocated |  The number of resource containers allocated by the ResourceManager. Use case: Monitor cluster progress Units: *Count*  | 
|  `ContainerPending`  |  The number of containers in the queue that have not yet been allocated. Use case: Monitor cluster progress Units: *Count*  | 
| ContainerPendingRatio |  The ratio of pending containers to containers allocated (ContainerPendingRatio = ContainerPending / ContainerAllocated). If ContainerAllocated = 0, then ContainerPendingRatio = ContainerPending. The value of ContainerPendingRatio represents a number, not a percentage. This value is useful for scaling cluster resources based on container allocation behavior. Units: *Count*  | 
|  `HDFSUtilization`  |  The percentage of HDFS storage currently used. Use case: Analyze cluster performance Units: *Percent*  | 
|  `IsIdle`  |  Indicates that a cluster is no longer performing work, but is still alive and accruing charges. It is set to 1 if no tasks are running and no jobs are running, and set to 0 otherwise. This value is checked at five-minute intervals and a value of 1 indicates only that the cluster was idle when checked, not that it was idle for the entire five minutes. To avoid false positives, you should raise an alarm when this value has been 1 for more than one consecutive five-minute check. For example, you might raise an alarm on this value if it has been 1 for thirty minutes or longer. Use case: Monitor cluster performance Units: *Boolean*  | 
|  `MemoryAvailableMB`  |  The amount of memory available to be allocated. Use case: Monitor cluster progress Units: *Count*  | 
|  `MRActiveNodes`  |  The number of nodes presently running MapReduce tasks or jobs. Equivalent to YARN metric `mapred.resourcemanager.NoOfActiveNodes`. Use case: Monitor cluster progress Units: *Count*  | 
|  `YARNMemoryAvailablePercentage`  |  The percentage of remaining memory available to YARN (YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB). This value is useful for scaling cluster resources based on YARN memory usage. Units: *Percent*  | 
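Two of the derived metrics above are simple ratios. The following sketch shows how they are computed from the raw YARN counts, per the definitions in the table; the function names are illustrative, not part of any AWS API:

```python
def container_pending_ratio(pending: int, allocated: int) -> float:
    """ContainerPendingRatio = ContainerPending / ContainerAllocated;
    falls back to ContainerPending when nothing is allocated."""
    if allocated == 0:
        return float(pending)
    return pending / allocated

def yarn_memory_available_percentage(available_mb: int, total_mb: int) -> float:
    """YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB,
    expressed as a percentage."""
    return 100.0 * available_mb / total_mb

print(container_pending_ratio(30, 10))                # 3.0
print(yarn_memory_available_percentage(1536, 12288))  # 12.5
```

A value of 12.5 percent available YARN memory, for example, is below the default 15 percent scale-out threshold used in the automatic scaling examples later in this guide.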

The following metrics provide information about resources used by YARN containers and nodes. These metrics from the YARN resource manager offer insights into the resources used by containers and nodes running in the cluster. Comparing these metrics to the previous table’s cluster capacity metrics provides a clearer picture of the impact of managed scaling:


| Metric | Associated releases | Description | 
| --- | --- | --- | 
|  `YarnContainersUsedMemoryGBSeconds`  |  Available to release label 7.3.0 and higher  |  The consumed container memory GB-seconds for the publishing period. **Units:** GB-seconds  | 
|  `YarnContainersTotalMemoryGBSeconds`  |  Available to release label 7.3.0 and higher  |  The total container memory GB-seconds for the publishing period. **Units:** GB-seconds  | 
|  `YarnContainersUsedVCPUSeconds`  |  Available to release label 7.5.0 and higher  |  The consumed container vCPU-seconds for the publishing period. **Units:** vCPU-seconds  | 
|  `YarnContainersTotalVCPUSeconds`  |  Available to release label 7.5.0 and higher  |  The total container vCPU-seconds for the publishing period. **Units:** vCPU-seconds  | 
|  `YarnNodesUsedMemoryGBSeconds`  |  Available to release label 7.5.0 and higher  |  The consumed node memory GB-seconds for the publishing period. **Units:** GB-seconds  | 
|  `YarnNodesTotalMemoryGBSeconds`  |  Available to release label 7.5.0 and higher  |  The total node memory GB-seconds for the publishing period. **Units:** GB-seconds  | 
|  `YarnNodesUsedVCPUSeconds`  |  Available to release label 7.3.0 and higher  |  The consumed node vCPU-seconds for the publishing period. **Units:** vCPU-seconds  | 
|  `YarnNodesTotalVCPUSeconds`  |  Available to release label 7.3.0 and higher  |  The total node vCPU-seconds for the publishing period. **Units:** vCPU-seconds  | 

## Graphing managed scaling metrics
<a name="managed-scaling-graphic"></a>

You can graph metrics to visualize your cluster's workload patterns and corresponding scaling decisions made by Amazon EMR managed scaling as the following steps demonstrate. 

**To graph managed scaling metrics in the CloudWatch console**

1. Open the [CloudWatch console](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Amazon EMR**. You can search for the identifier of the cluster that you want to monitor.

1. Scroll down to the metric to graph. Open a metric to display the graph.

1. To graph one or more metrics, select the check box next to each metric. 

The following example illustrates the Amazon EMR managed scaling activity of a cluster. The graph shows three automatic scale-down periods, which save costs when there is a less active workload. 

![\[Graph managed scaling metrics\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/Managed_Scaling_Decision.png)


All the cluster capacity and usage metrics are published at one-minute intervals. Additional statistical information is also associated with each one-minute data point, which allows you to plot functions such as `Percentiles`, `Min`, `Max`, `Sum`, `Average`, and `SampleCount`.
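These per-minute statistics can be reproduced from raw samples. The following sketch uses made-up metric values and a nearest-rank percentile, which is one common convention; CloudWatch's own percentile calculation interpolates and may differ slightly.

```python
# Illustrative one-minute samples of a metric such as
# YARNMemoryAvailablePercentage (values are made up).
samples = [72.0, 68.5, 15.0, 14.2, 80.1, 79.9, 13.8, 75.0]

# The basic statistics CloudWatch associates with each datum.
stats = {
    "Min": min(samples),
    "Max": max(samples),
    "Sum": sum(samples),
    "Average": sum(samples) / len(samples),
    "SampleCount": len(samples),
}

def percentile(values, p):
    """Nearest-rank percentile over the raw samples."""
    ordered = sorted(values)
    rank = max(1, round(p / 100 * len(ordered)))
    return ordered[rank - 1]

stats["p50"] = percentile(samples, 50)
print(stats["Min"], stats["Max"], stats["p50"])
```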

For example, the following graph plots the same `YARNMemoryAvailablePercentage` metric at different percentiles, P10, P50, P90, P99, along with `Sum`, `Average`, `Min`, `SampleCount`.

![\[Graph managed scaling metrics with different percentiles\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/Managed_Scaling_Metrics.png)


# Using automatic scaling with a custom policy for instance groups in Amazon EMR
<a name="emr-automatic-scaling"></a>

Automatic scaling with a custom policy in Amazon EMR releases 4.0 and higher allows you to programmatically scale out and scale in core nodes and task nodes based on a CloudWatch metric and other parameters that you specify in a *scaling policy*. Automatic scaling with a custom policy is available with the instance groups configuration and is not available when you use instance fleets. For more information about instance groups and instance fleets, see [Create an Amazon EMR cluster with instance fleets or uniform instance groups](emr-instance-group-configuration.md).

The scaling policy is part of an instance group configuration. You can specify a policy during initial configuration of an instance group, or by modifying an instance group in an existing cluster, even when that instance group is active. Each instance group in a cluster, except the primary instance group, can have its own scaling policy, which consists of scale-out and scale-in rules. Scale-out and scale-in rules can be configured independently, with different parameters for each rule.

You can configure scaling policies with the AWS Management Console, the AWS CLI, or the Amazon EMR API. When you use the AWS CLI or the Amazon EMR API, you specify the scaling policy in JSON format. In addition, the AWS CLI and the Amazon EMR API let you specify custom CloudWatch metrics, which are not available for selection in the AWS Management Console. When you initially create a scaling policy with the console, a default policy suitable for many applications is pre-configured to help you get started. You can delete or modify the default rules.

Even though automatic scaling allows you to adjust EMR cluster capacity on-the-fly, you should still consider baseline workload requirements and plan your node and instance group configurations. For more information, see [Cluster configuration guidelines](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html).

**Note**  
For most workloads, setting up both scale-in and scale-out rules is desirable to optimize resource utilization. Setting either rule without the other means that you need to manually resize the instance count after a scaling activity. In other words, this sets up a "one-way" automatic scale-out or scale-in policy with a manual reset.

## Creating the IAM role for automatic scaling
<a name="emr-automatic-scaling-iam-role"></a>

Automatic scaling in Amazon EMR requires an IAM role with permissions to add and terminate instances when scaling activities are triggered. A default role configured with the appropriate role policy and trust policy, `EMR_AutoScaling_DefaultRole`, is available for this purpose. When you create a cluster with a scaling policy for the first time with the AWS Management Console, Amazon EMR creates the default role and attaches the default managed policy for permissions, `AmazonElasticMapReduceforAutoScalingRole`.

When you create a cluster with an automatic scaling policy with the AWS CLI, you must first ensure that either the default IAM role exists, or that you have a custom IAM role with a policy attached that provides the appropriate permissions. To create the default role, you can run the `create-default-roles` command before you create a cluster. You can then specify the `--auto-scaling-role EMR_AutoScaling_DefaultRole` option when you create a cluster. Alternatively, you can create a custom automatic scaling role and then specify it when you create a cluster, for example `--auto-scaling-role MyEMRAutoScalingRole`. If you create a customized automatic scaling role for Amazon EMR, we recommend that you base the permissions policies for your custom role on the managed policy. For more information, see [Configure IAM service roles for Amazon EMR permissions to AWS services and resources](emr-iam-roles.md).

## Understanding automatic scaling rules
<a name="emr-scaling-rules"></a>

When a scale-out rule triggers a scaling activity for an instance group, Amazon EC2 instances are added to the instance group according to your rules. New nodes can be used by applications such as Apache Spark, Apache Hive, and Presto as soon as the Amazon EC2 instance enters the `InService` state. You can also set up a scale-in rule that terminates instances and removes nodes. For more information about the lifecycle of Amazon EC2 instances that scale automatically, see [Auto Scaling lifecycle](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html) in the *Amazon EC2 Auto Scaling User Guide*.

You can configure how a cluster terminates Amazon EC2 instances. You can choose to either terminate at the Amazon EC2 instance-hour boundary for billing, or upon task completion. This setting applies both to automatic scaling and to manual resizing operations. For more information about this configuration, see [Cluster scale-down options for Amazon EMR clusters](emr-scaledown-behavior.md).

The following parameters for each rule in a policy determine automatic scaling behavior.

**Note**  
The parameters listed here are based on the AWS Management Console for Amazon EMR. When you use the AWS CLI or Amazon EMR API, additional advanced configuration options are available. For more information about advanced options, see [SimpleScalingPolicyConfiguration](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_PutAutoScalingPolicy.html) in the *Amazon EMR API Reference*.
+ Maximum instances and minimum instances. The **Maximum instances** constraint specifies the maximum number of Amazon EC2 instances that can be in the instance group, and applies to all scale-out rules. Similarly, the **Minimum instances** constraint specifies the minimum number of Amazon EC2 instances and applies to all scale-in rules.
+ The **Rule name**, which must be unique within the policy.
+ The **scaling adjustment**, which determines the number of EC2 instances to add (for scale-out rules) or terminate (for scale-in rules) during the scaling activity triggered by the rule. 
+ The **CloudWatch metric**, which is watched for an alarm condition.
+ A **comparison operator**, which is used to compare the CloudWatch metric to the **Threshold** value and determine a trigger condition.
+ An **evaluation period**, in five-minute increments, for which the CloudWatch metric must be in a trigger condition before scaling activity is triggered.
+ A **Cooldown period**, in seconds, which determines the amount of time that must elapse between a scaling activity started by a rule and the start of the next scaling activity, regardless of the rule that triggers it. When an instance group has finished a scaling activity and reached its post-scale state, the cooldown period provides an opportunity for the CloudWatch metrics that might trigger subsequent scaling activities to stabilize. For more information, see [Auto Scaling cooldowns](https://docs.aws.amazon.com/autoscaling/ec2/userguide/Cooldown.html) in the *Amazon EC2 Auto Scaling User Guide*.  
![\[AWS Management Console automatic scaling rule parameters for Amazon EMR.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/auto-scaling-rule-params.png)
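The same parameters appear as fields of a rule object in the JSON policy used with the AWS CLI and Amazon EMR API. The following fragment sketches one scale-out rule with placeholder values; the **Maximum instances** and **Minimum instances** constraints map to `MaxCapacity` and `MinCapacity` in a `Constraints` object that sits alongside `Rules` in the policy:

```json
{
  "Name": "Default-scale-out",
  "Action": {
    "SimpleScalingPolicyConfiguration": {
      "AdjustmentType": "CHANGE_IN_CAPACITY",
      "ScalingAdjustment": 1,
      "CoolDown": 300
    }
  },
  "Trigger": {
    "CloudWatchAlarmDefinition": {
      "MetricName": "YARNMemoryAvailablePercentage",
      "ComparisonOperator": "LESS_THAN",
      "Threshold": 15,
      "EvaluationPeriods": 1,
      "Period": 300
    }
  }
}
```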

## Considerations and limitations
<a name="emr-automatic-scaling-considerations"></a>
+ Amazon CloudWatch metrics are critical for Amazon EMR automatic scaling to operate. We recommend that you closely monitor Amazon CloudWatch metrics to make sure data is not missing. For more information about how you can configure Amazon CloudWatch alarms to detect missing metrics, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html).
+ Over-utilization of EBS volumes can cause managed scaling issues. We recommend that you monitor EBS volume usage closely to make sure that utilization remains below 90%. See [Instance storage](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-storage.html) for information on specifying additional EBS volumes.
+ Automatic scaling with a custom policy in Amazon EMR releases 5.18 to 5.28 may experience scaling failure caused by data intermittently missing in Amazon CloudWatch metrics. We recommend that you use the most recent Amazon EMR versions for improved autoscaling. You can also contact [AWS Support](https://aws.amazon.com/premiumsupport/) for a patch if you need to use an Amazon EMR release between 5.18 and 5.28.

## Using the AWS Management Console to configure automatic scaling
<a name="emr-automatic-scale-console"></a>

When you create a cluster, you configure a scaling policy for instance groups with the advanced cluster configuration options. You can also create or modify a scaling policy for an instance group that is in service by modifying instance groups in the **Hardware** settings of an existing cluster.

1. Navigate to the new Amazon EMR console and select **Switch to the old console** from the side navigation. For more information on what to expect when you switch to the old console, see [Using the old console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html#console-opt-in).

1. If you are creating a cluster, in the Amazon EMR console, select **Create Cluster**, select **Go to advanced options**, choose options for **Step 1: Software and Steps**, and then go to **Step 2: Hardware Configuration**.

   **- or -**

   If you are modifying an instance group in a running cluster, select your cluster from the cluster list, and then expand the **Hardware** section.

1. In the **Cluster scaling and provisioning option** section, select **Enable cluster scaling**. Then select **Create a custom automatic scaling policy**.

   In the table of **Custom automatic scaling policies**, click the pencil icon that appears in the row of the instance group you want to configure. The Auto Scaling Rules screen opens. 

1. Type the **Maximum instances** you want the instance group to contain after it scales out, and type the **Minimum instances** you want the instance group to contain after it scales in.

1. Click the pencil to edit rule parameters, click the **X** to remove a rule from the policy, and click **Add rule** to add additional rules.

1. Choose rule parameters as described earlier in this topic. For descriptions of available CloudWatch metrics for Amazon EMR, see [Amazon EMR metrics and dimensions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/emr-metricscollected.html) in the *Amazon CloudWatch User Guide*.

## Using the AWS CLI to configure automatic scaling
<a name="emr-automatic-scale-cli"></a>

You can use AWS CLI commands for Amazon EMR to configure automatic scaling when you create a cluster and when you create an instance group. You can use a shorthand syntax, specifying the JSON configuration inline within the relevant commands, or you can reference a file containing the configuration JSON. You can also apply an automatic scaling policy to an existing instance group and remove an automatic scaling policy that was previously applied. In addition, you can retrieve details of a scaling policy configuration from a running cluster.

**Important**  
When you create a cluster that has an automatic scaling policy, you must specify the IAM role for automatic scaling with the `--auto-scaling-role MyAutoScalingRole` option. The default role is `EMR_AutoScaling_DefaultRole` and can be created with the `create-default-roles` command. The role can only be added when the cluster is created, and cannot be added to an existing cluster.

For a detailed description of the parameters available when configuring an automatic scaling policy, see [PutAutoScalingPolicy](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_PutAutoScalingPolicy.html) in the *Amazon EMR API Reference*.

### Creating a cluster with an automatic scaling policy applied to an instance group
<a name="emr-autoscale-cli-createcluster"></a>

You can specify an automatic scaling configuration within the `--instance-groups` option of the `aws emr create-cluster` command. The following example illustrates a create-cluster command where an automatic scaling policy for the core instance group is provided inline. The command creates a scaling configuration equivalent to the default scale-out policy that appears when you create an automatic scaling policy with the AWS Management Console for Amazon EMR. For brevity, a scale-in policy is not shown. We do not recommend creating a scale-out rule without a scale-in rule.

```
aws emr create-cluster --release-label emr-5.2.0 --service-role EMR_DefaultRole --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole --auto-scaling-role EMR_AutoScaling_DefaultRole  --instance-groups Name=MyMasterIG,InstanceGroupType=MASTER,InstanceType=m5.xlarge,InstanceCount=1 'Name=MyCoreIG,InstanceGroupType=CORE,InstanceType=m5.xlarge,InstanceCount=2,AutoScalingPolicy={Constraints={MinCapacity=2,MaxCapacity=10},Rules=[{Name=Default-scale-out,Description=Replicates the default scale-out rule in the console.,Action={SimpleScalingPolicyConfiguration={AdjustmentType=CHANGE_IN_CAPACITY,ScalingAdjustment=1,CoolDown=300}},Trigger={CloudWatchAlarmDefinition={ComparisonOperator=LESS_THAN,EvaluationPeriods=1,MetricName=YARNMemoryAvailablePercentage,Namespace=AWS/ElasticMapReduce,Period=300,Statistic=AVERAGE,Threshold=15,Unit=PERCENT,Dimensions=[{Key=JobFlowId,Value="${emr.clusterId}"}]}}}]}'				
```

The following command shows how to provide the automatic scaling policy definition as part of an instance group configuration file named `instancegroupconfig.json`.

```
aws emr create-cluster --release-label emr-5.2.0 --service-role EMR_DefaultRole --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole --instance-groups file://your/path/to/instancegroupconfig.json --auto-scaling-role EMR_AutoScaling_DefaultRole								
```

With the contents of the configuration file as follows:

```
[
{
  "InstanceCount": 1,
  "Name": "MyMasterIG",
  "InstanceGroupType": "MASTER",
  "InstanceType": "m5.xlarge"
},
{
  "InstanceCount": 2,
  "Name": "MyCoreIG",
  "InstanceGroupType": "CORE",
  "InstanceType": "m5.xlarge",
  "AutoScalingPolicy":
    {
     "Constraints":
      {
       "MinCapacity": 2,
       "MaxCapacity": 10
      },
     "Rules":
     [
      {
       "Name": "Default-scale-out",
       "Description": "Replicates the default scale-out rule in the console for YARN memory.",
       "Action":{
        "SimpleScalingPolicyConfiguration":{
          "AdjustmentType": "CHANGE_IN_CAPACITY",
          "ScalingAdjustment": 1,
          "CoolDown": 300
        }
       },
       "Trigger":{
        "CloudWatchAlarmDefinition":{
          "ComparisonOperator": "LESS_THAN",
          "EvaluationPeriods": 1,
          "MetricName": "YARNMemoryAvailablePercentage",
          "Namespace": "AWS/ElasticMapReduce",
          "Period": 300,
          "Threshold": 15,
          "Statistic": "AVERAGE",
          "Unit": "PERCENT",
          "Dimensions":[
             {
               "Key" : "JobFlowId",
               "Value" : "${emr.clusterId}"
             }
          ]
        }
       }
      }
     ]
   }
}
]
```

### Adding an instance group with an automatic scaling policy to a cluster
<a name="emr-autoscale-cli-createinstancegroup"></a>

You can specify a scaling policy configuration with the `--instance-groups` option with the `add-instance-groups` command in the same way you can when you use `create-cluster`. The following example uses a reference to a JSON file, `instancegroupconfig.json`, with the instance group configuration.

```
aws emr add-instance-groups --cluster-id j-1EKZ3TYEVF1S2 --instance-groups file://your/path/to/instancegroupconfig.json
```

### Applying an automatic scaling policy to an existing instance group or modifying an applied policy
<a name="emr-autoscale-cli-modifyinstancegroup"></a>

Use the `aws emr put-auto-scaling-policy` command to apply an automatic scaling policy to an existing instance group. The instance group must be part of a cluster that uses the automatic scaling IAM role. The following example uses a reference to a JSON file, `autoscaleconfig.json`, that specifies the automatic scaling policy configuration.

```
aws emr put-auto-scaling-policy --cluster-id j-1EKZ3TYEVF1S2 --instance-group-id ig-3PLUZBA6WLS07 --auto-scaling-policy file://your/path/to/autoscaleconfig.json 
```

The contents of the `autoscaleconfig.json` file, which defines the same scale-out rule as shown in the previous example, are shown below.

```
{
          "Constraints": {
                  "MaxCapacity": 10,
                  "MinCapacity": 2
          },
          "Rules": [{
                  "Action": {
                          "SimpleScalingPolicyConfiguration": {
                                  "AdjustmentType": "CHANGE_IN_CAPACITY",
                                  "CoolDown": 300,
                                  "ScalingAdjustment": 1
                          }
                  },
                  "Description": "Replicates the default scale-out rule in the console for YARN memory",
                  "Name": "Default-scale-out",
                  "Trigger": {
                          "CloudWatchAlarmDefinition": {
                                  "ComparisonOperator": "LESS_THAN",
                                  "Dimensions": [{
                                          "Key": "JobFlowId",
                                          "Value": "${emr.clusterID}"
                                  }],
                                  "EvaluationPeriods": 1,
                                  "MetricName": "YARNMemoryAvailablePercentage",
                                  "Namespace": "AWS/ElasticMapReduce",
                                  "Period": 300,
                                  "Statistic": "AVERAGE",
                                  "Threshold": 15,
                                  "Unit": "PERCENT"
                          }
                  }
          }]
  }
```

### Removing an automatic scaling policy from an instance group
<a name="emr-autoscale-cli-removepolicy"></a>

```
aws emr remove-auto-scaling-policy --cluster-id j-1EKZ3TYEVF1S2 --instance-group-id ig-3PLUZBA6WLS07
```

### Retrieving an automatic scaling policy configuration
<a name="emr-autoscale-cli-getpolicy"></a>

The `describe-cluster` command retrieves the policy configuration in the `InstanceGroups` block. For example, the following command retrieves the configuration for the cluster with a cluster ID of `j-1CWOHP4PI30VJ`.

```
aws emr describe-cluster --cluster-id j-1CWOHP4PI30VJ
```

The command produces the following example output.

```
{
    "Cluster": {
        "Configurations": [],
        "Id": "j-1CWOHP4PI30VJ",
        "NormalizedInstanceHours": 48,
        "Name": "Auto Scaling Cluster",
        "ReleaseLabel": "emr-5.2.0",
        "ServiceRole": "EMR_DefaultRole",
        "AutoTerminate": false,
        "TerminationProtected": true,
        "MasterPublicDnsName": "ec2-54-167-31-38.compute-1.amazonaws.com",
        "LogUri": "s3n://aws-logs-232939870606-us-east-1/elasticmapreduce/",
        "Ec2InstanceAttributes": {
            "Ec2KeyName": "performance",
            "AdditionalMasterSecurityGroups": [],
            "AdditionalSlaveSecurityGroups": [],
            "EmrManagedSlaveSecurityGroup": "sg-09fc9362",
            "Ec2AvailabilityZone": "us-east-1d",
            "EmrManagedMasterSecurityGroup": "sg-0bfc9360",
            "IamInstanceProfile": "EMR_EC2_DefaultRole"
        },
        "Applications": [
            {
                "Name": "Hadoop",
                "Version": "2.7.3"
            }
        ],
        "InstanceGroups": [
            {
                "AutoScalingPolicy": {
                    "Status": {
                        "State": "ATTACHED",
                        "StateChangeReason": {
                            "Message": ""
                        }
                    },
                    "Constraints": {
                        "MaxCapacity": 10,
                        "MinCapacity": 2
                    },
                    "Rules": [
                        {
                            "Name": "Default-scale-out",
                            "Trigger": {
                                "CloudWatchAlarmDefinition": {
                                    "MetricName": "YARNMemoryAvailablePercentage",
                                    "Unit": "PERCENT",
                                    "Namespace": "AWS/ElasticMapReduce",
                                    "Threshold": 15,
                                    "Dimensions": [
                                        {
                                            "Key": "JobFlowId",
                                            "Value": "j-1CWOHP4PI30VJ"
                                        }
                                    ],
                                    "EvaluationPeriods": 1,
                                    "Period": 300,
                                    "ComparisonOperator": "LESS_THAN",
                                    "Statistic": "AVERAGE"
                                }
                            },
                            "Description": "",
                            "Action": {
                                "SimpleScalingPolicyConfiguration": {
                                    "CoolDown": 300,
                                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                                    "ScalingAdjustment": 1
                                }
                            }
                        },
                        {
                            "Name": "Default-scale-in",
                            "Trigger": {
                                "CloudWatchAlarmDefinition": {
                                    "MetricName": "YARNMemoryAvailablePercentage",
                                    "Unit": "PERCENT",
                                    "Namespace": "AWS/ElasticMapReduce",
                                    "Threshold": 75,
                                    "Dimensions": [
                                        {
                                            "Key": "JobFlowId",
                                            "Value": "j-1CWOHP4PI30VJ"
                                        }
                                    ],
                                    "EvaluationPeriods": 1,
                                    "Period": 300,
                                    "ComparisonOperator": "GREATER_THAN",
                                    "Statistic": "AVERAGE"
                                }
                            },
                            "Description": "",
                            "Action": {
                                "SimpleScalingPolicyConfiguration": {
                                    "CoolDown": 300,
                                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                                    "ScalingAdjustment": -1
                                }
                            }
                        }
                    ]
                },
                "Configurations": [],
                "InstanceType": "m5.xlarge",
                "Market": "ON_DEMAND",
                "Name": "Core - 2",
                "ShrinkPolicy": {},
                "Status": {
                    "Timeline": {
                        "CreationDateTime": 1479413437.342,
                        "ReadyDateTime": 1479413864.615
                    },
                    "State": "RUNNING",
                    "StateChangeReason": {
                        "Message": ""
                    }
                },
                "RunningInstanceCount": 2,
                "Id": "ig-3M16XBE8C3PH1",
                "InstanceGroupType": "CORE",
                "RequestedInstanceCount": 2,
                "EbsBlockDevices": []
            },
            {
                "Configurations": [],
                "Id": "ig-OP62I28NSE8M",
                "InstanceGroupType": "MASTER",
                "InstanceType": "m5.xlarge",
                "Market": "ON_DEMAND",
                "Name": "Master - 1",
                "ShrinkPolicy": {},
                "EbsBlockDevices": [],
                "RequestedInstanceCount": 1,
                "Status": {
                    "Timeline": {
                        "CreationDateTime": 1479413437.342,
                        "ReadyDateTime": 1479413752.088
                    },
                    "State": "RUNNING",
                    "StateChangeReason": {
                        "Message": ""
                    }
                },
                "RunningInstanceCount": 1
            }
        ],
        "AutoScalingRole": "EMR_AutoScaling_DefaultRole",
        "Tags": [],
        "BootstrapActions": [],
        "Status": {
            "Timeline": {
                "CreationDateTime": 1479413437.339,
                "ReadyDateTime": 1479413863.666
            },
            "State": "WAITING",
            "StateChangeReason": {
                "Message": "Cluster ready after last step completed."
            }
        }
    }
}
```

# Manually resize a running Amazon EMR cluster
<a name="emr-manage-resize"></a>

You can add and remove instances from core and task instance groups and instance fleets in a running cluster with the AWS Management Console, AWS CLI, or the Amazon EMR API. If a cluster uses instance groups, you explicitly change the instance count. If your cluster uses instance fleets, you can change the target units for On-Demand Instances and Spot Instances. The instance fleet then adds and removes instances to meet the new target. For more information, see [Instance fleet options](emr-instance-fleet.md#emr-instance-fleet-options). Applications can use newly provisioned Amazon EC2 instances to host nodes as soon as the instances are available. When instances are removed, Amazon EMR shuts down tasks in a way that does not interrupt jobs and safeguards against data loss. For more information, see [Terminate at task completion](emr-scaledown-behavior.md#emr-scaledown-terminate-task).

## Resize a cluster with the console
<a name="resize-console"></a>

You can use the Amazon EMR console to resize a running cluster.

------
#### [ Console ]

**To change the instance count for an existing cluster with the new console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and select the cluster that you want to update. The cluster must be running; you can't resize a provisioning or terminated cluster.

1. On the **Instances** tab on the cluster details page, view the **Instance groups** panel. 

1. To resize an existing instance group, select the radio button next to the core or task instance group that you want to resize and then choose **Resize instance group**. Specify the new number of instances for the instance group, then select **Resize**.
**Note**  
If you choose to reduce the size of a running instance group, Amazon EMR will intelligently select the instances to remove from the group for minimal data loss. For more granular control of your resize action, you can select the **ID** for the instance group, choose the instances you want to remove, and then use the **Terminate** option. For more information on intelligent scale-down behavior, see [Cluster scale-down options for Amazon EMR clusters](emr-scaledown-behavior.md).

1. If you want to cancel a resize, select the radio button for an instance group with the status **Resizing**, and then choose **Stop resize** from the list of actions.

1. To add one or more task instance groups to your cluster in response to an increased workload, choose **Add task instance group** from the list of actions. Choose the Amazon EC2 instance type, enter the number of instances for the task group, and then select **Add task instance group** to return to the **Instance groups** panel for your cluster.

------

When you make a change to the number of nodes, the **Status** of the instance group updates. When the change you requested is complete, the **Status** is **Running**.

## Resize a cluster with the AWS CLI
<a name="ResizingParameters"></a>

You can use the AWS CLI to resize a running cluster. You can increase or decrease the number of task nodes, and you can increase the number of core nodes in a running cluster. It is also possible to shut down an instance in the core instance group with the AWS CLI or the API. This should be done with caution. Shutting down an instance in the core instance group risks data loss, and the instance is not automatically replaced.

In addition to resizing the core and task groups, you can also add one or more task instance groups to a running cluster with the AWS CLI. <a name="IncreaseDecreaseNodesawscli"></a>

**To resize a cluster by changing the instance count with the AWS CLI**

You can add instances to the core group or task group, and you can remove instances from the task group with the AWS CLI `modify-instance-groups` subcommand with the `InstanceCount` parameter. To add instances to the core or task groups, increase the `InstanceCount`. To reduce the number of instances in the task group, decrease the `InstanceCount`. Changing the instance count of the task group to 0 removes all instances but not the instance group.
+ To increase the number of instances in the task instance group from 3 to 4, type the following command and replace *ig-31JXXXXXXBTO* with the instance group ID.

  ```
  aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-31JXXXXXXBTO,InstanceCount=4
  ```

  To retrieve the `InstanceGroupId`, use the `describe-cluster` subcommand. The output is a JSON object called `Cluster` that contains the ID of each instance group. To use this command, you need the cluster ID (which you can retrieve with the `aws emr list-clusters` command or the console). To retrieve the instance group ID, type the following command and replace *j-2AXXXXXXGAPLF* with the cluster ID.

  ```
  aws emr describe-cluster --cluster-id j-2AXXXXXXGAPLF
  ```
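If you script this lookup, you can filter the `describe-cluster` output instead of reading the JSON by eye. The following is a minimal sketch in Python; the JSON is abbreviated from the sample output shown earlier in this guide, and the IDs are examples:

```python
import json

# Abbreviated describe-cluster output; the IDs are sample values.
output = json.loads("""
{
    "Cluster": {
        "InstanceGroups": [
            {"Id": "ig-OP62I28NSE8M", "InstanceGroupType": "MASTER"},
            {"Id": "ig-3M16XBE8C3PH1", "InstanceGroupType": "CORE"}
        ]
    }
}
""")

# Map each instance group type to its ID so a resize script can look up
# the group it needs by role rather than by position in the list.
ids_by_type = {g["InstanceGroupType"]: g["Id"]
               for g in output["Cluster"]["InstanceGroups"]}
print(ids_by_type["CORE"])  # the InstanceGroupId to pass to modify-instance-groups
```

In practice you would pipe the output of `aws emr describe-cluster` into a script like this, or use the CLI's built-in `--query` option to perform the same filtering.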

  With the AWS CLI, you can also terminate an instance in the core instance group with the `modify-instance-groups` subcommand.
**Warning**  
Specifying `EC2InstanceIdsToTerminate` must be done with caution. Instances are terminated immediately, regardless of the status of applications running on them, and the instance is not automatically replaced. This is true regardless of the cluster's **Scale down behavior** configuration. Terminating an instance in this way risks data loss and unpredictable cluster behavior.

  To terminate a specific instance, you need the instance group ID (returned by the `aws emr describe-cluster --cluster-id` subcommand) and the instance ID (returned by the `aws emr list-instances --cluster-id` subcommand). Type the following command, replacing *ig-6RXXXXXX07SA* with the instance group ID and *i-f9XXXXf2* with the instance ID.

  ```
  aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-6RXXXXXX07SA,EC2InstanceIdsToTerminate=i-f9XXXXf2
  ```

  For more information about using Amazon EMR commands in the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/emr](https://docs.aws.amazon.com/cli/latest/reference/emr).

**To resize a cluster by adding task instance groups with the AWS CLI**

With the AWS CLI, you can add 1–48 task instance groups to a cluster with the `--add-instance-groups` subcommand. Task instance groups can only be added to a cluster that has a primary instance group and a core instance group. You can add up to five task instance groups in a single `--add-instance-groups` invocation.

1. To add a single task instance group to a cluster, type the following command and replace *j-JXBXXXXXX37R* with the cluster ID.

   ```
   aws emr add-instance-groups --cluster-id j-JXBXXXXXX37R --instance-groups InstanceCount=6,InstanceGroupType=task,InstanceType=m5.xlarge
   ```

1. To add multiple task instance groups to a cluster, type the following command and replace *j-JXBXXXXXX37R* with the cluster ID. You can add up to five task instance groups in a single command.

   ```
   aws emr add-instance-groups --cluster-id j-JXBXXXXXX37R --instance-groups InstanceCount=6,InstanceGroupType=task,InstanceType=m5.xlarge InstanceCount=10,InstanceGroupType=task,InstanceType=m5.xlarge
   ```

   For more information about using Amazon EMR commands in the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/emr](https://docs.aws.amazon.com/cli/latest/reference/emr).

## Interrupting a resize
<a name="interruptible-resize"></a>

With Amazon EMR version 4.1.0 or later, you can issue a resize while another resize operation is in progress. You can stop a previously submitted resize request, or submit a new request that overrides a previous one without waiting for it to finish. You can also stop an in-progress resize from the console, or with the `ModifyInstanceGroups` API call by setting the target count equal to the cluster's current count.

The following screenshot shows a task instance group that is resizing but can be stopped by choosing **Stop**.

![\[Task instance group showing resizing status with options to resize or stop.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/resize-stop.png)


**To interrupt a resize with the AWS CLI**

You can use the AWS CLI to stop a resize with the `modify-instance-groups` subcommand. Assume that you have six instances in your instance group and you want to increase this to 10. You later decide that you would like to cancel this request:
+ The initial request:

  ```
  aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-myInstanceGroupId,InstanceCount=10
  ```

  The second request to stop the first request:

  ```
  aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-myInstanceGroupId,InstanceCount=6
  ```

**Note**  
Because this process is asynchronous, you may see instance counts change with respect to previous API requests before subsequent requests are honored. When you shrink an instance group, nodes that have work running on them may not be removed until they have completed their work.
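One way to account for this asynchrony in automation is to compare the requested and running counts that `describe-cluster` reports for each instance group, and treat a resize as settled only when they match. The following is a hypothetical helper, shown with hard-coded sample data shaped like the `InstanceGroups` list in `describe-cluster` output:

```python
def resize_settled(instance_groups):
    """Return True when every instance group has reached its requested size."""
    return all(g["RunningInstanceCount"] == g["RequestedInstanceCount"]
               for g in instance_groups)

# Sample data: the core group has been asked to grow from 6 to 10 nodes.
groups = [
    {"Name": "CORE", "RequestedInstanceCount": 10, "RunningInstanceCount": 6},
    {"Name": "MASTER", "RequestedInstanceCount": 1, "RunningInstanceCount": 1},
]
print(resize_settled(groups))  # False: the core group is still resizing
```

A polling script would call `aws emr describe-cluster` periodically and feed the `InstanceGroups` list to a check like this before submitting follow-up requests.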

## Suspended state
<a name="emr-manage-resizeSuspended"></a>

An instance group goes into a suspended state if it encounters too many errors while trying to start the new cluster nodes. For example, if new nodes fail while performing bootstrap actions, the instance group goes into a *SUSPENDED* state, rather than continuously provisioning new nodes. After you resolve the underlying issue, reset the desired number of nodes on the cluster's instance group, and then the instance group resumes allocating nodes. Modifying an instance group instructs Amazon EMR to attempt to provision nodes again. No running nodes are restarted or terminated.

In the AWS CLI, the `list-instances` subcommand returns all instances and their states as does the `describe-cluster` subcommand. If Amazon EMR detects a fault with an instance group, it changes the group's state to `SUSPENDED`. 

**To reset a cluster in a SUSPENDED state with the AWS CLI**

Type the `describe-cluster` subcommand with the `--cluster-id` parameter to view the state of the instances in your cluster.
+ To view information on all instances and instance groups in a cluster, type the following command and replace *j-3KVXXXXXXY7UG* with the cluster ID.

  ```
  aws emr describe-cluster --cluster-id j-3KVXXXXXXY7UG
  ```

  The output displays information about your instance groups and the state of the instances:

  ```
  {
      "Cluster": {
          "Status": {
              "Timeline": {
                  "ReadyDateTime": 1413187781.245,
                  "CreationDateTime": 1413187405.356
              },
              "State": "WAITING",
              "StateChangeReason": {
                  "Message": "Waiting after step completed"
              }
          },
          "Ec2InstanceAttributes": {
              "Ec2AvailabilityZone": "us-west-2b"
          },
          "Name": "Development Cluster",
          "Tags": [],
          "TerminationProtected": false,
          "RunningAmiVersion": "3.2.1",
          "NormalizedInstanceHours": 16,
          "InstanceGroups": [
              {
                  "RequestedInstanceCount": 1,
                  "Status": {
                      "Timeline": {
                          "ReadyDateTime": 1413187775.749,
                          "CreationDateTime": 1413187405.357
                      },
                      "State": "RUNNING",
                      "StateChangeReason": {
                          "Message": ""
                      }
                  },
                  "Name": "MASTER",
                  "InstanceGroupType": "MASTER",
                  "InstanceType": "m5.xlarge",
                  "Id": "ig-3ETXXXXXXFYV8",
                  "Market": "ON_DEMAND",
                  "RunningInstanceCount": 1
              },
              {
                  "RequestedInstanceCount": 1,
                  "Status": {
                      "Timeline": {
                          "ReadyDateTime": 1413187781.301,
                          "CreationDateTime": 1413187405.357
                      },
                      "State": "RUNNING",
                      "StateChangeReason": {
                          "Message": ""
                      }
                  },
                  "Name": "CORE",
                  "InstanceGroupType": "CORE",
                  "InstanceType": "m5.xlarge",
                  "Id": "ig-3SUXXXXXXQ9ZM",
                  "Market": "ON_DEMAND",
                  "RunningInstanceCount": 1
              }
  ...
  }
  ```

  To view information about a particular instance group, type the `list-instances` subcommand with the `--cluster-id` and `--instance-group-types` parameters. You can view information for the primary, core, or task groups.

  ```
  aws emr list-instances --cluster-id j-3KVXXXXXXY7UG --instance-group-types "CORE"
  ```

  Use the `modify-instance-groups` subcommand with the `--instance-groups` parameter to reset a cluster in the `SUSPENDED` state. The instance group ID is returned by the `describe-cluster` subcommand.

  ```
  aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-3SUXXXXXXQ9ZM,InstanceCount=3
  ```

## Considerations when reducing cluster size
<a name="resize-considerations"></a>

If you choose to reduce the size of a running cluster, consider the following Amazon EMR behavior and best practices:
+ To reduce impact on jobs that are in progress, Amazon EMR intelligently selects the instances to remove. For more information on cluster scale-down behavior, see [Terminate at task completion](emr-scaledown-behavior.md#emr-scaledown-terminate-task) in the Amazon EMR Management Guide. 
+ When you scale down the size of a cluster, Amazon EMR copies the data from the instances that it removes to the instances that remain. Ensure that there is sufficient storage capacity for this data in the instances that remain in the group.
+ Amazon EMR attempts to decommission HDFS on instances in the group. Before you reduce the size of a cluster, we recommend that you minimize HDFS write I/O.
+ For the most granular control when you reduce the size of a cluster, you can view the cluster in the console and navigate to the **Instances** tab. Select the **ID** for the instance group that you want to resize. Then use the **Terminate** option for the specific instances that you want to remove. 

# Configuring provisioning timeouts to control capacity in Amazon EMR
<a name="emr-provisioning-timeout"></a>

When you use instance fleets, you can configure *provisioning timeouts*. A provisioning timeout instructs Amazon EMR to stop provisioning instance capacity if the cluster exceeds a specified time threshold during cluster launch or cluster scaling operations. The following topics cover how to configure a provisioning timeout for cluster launch and for cluster scale-up operations.

**Topics**
+ [Configure provisioning timeouts for cluster launch in Amazon EMR](emr-provisioning-timeout-launch.md)
+ [Customize a provisioning timeout period for cluster resize in Amazon EMR](emr-provisioning-timeout-resize.md)

# Configure provisioning timeouts for cluster launch in Amazon EMR
<a name="emr-provisioning-timeout-launch"></a>

You can define a timeout period to provision Spot Instances for each fleet in your cluster. If Amazon EMR can't provision Spot capacity, you can choose either to terminate the cluster or to provision On-Demand capacity instead. If the timeout period ends during the cluster resizing process, Amazon EMR cancels unprovisioned Spot requests. Unprovisioned Spot instances aren't transferred to On-Demand capacity.

Perform the following steps to customize a provisioning timeout period for cluster launch with the Amazon EMR console.

------
#### [ Console ]

**To configure the provisioning timeout when you create a cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. On the **Create Cluster** page, navigate to **Cluster configuration** and select **Instance Fleets**.

1. Under **Cluster scaling and provisioning option**, specify the Spot size for your core and task fleets.

1. Under **Spot timeout configuration**, select either **Terminate cluster after Spot timeout** or **Switch to On-Demand after Spot timeout**. Then, specify the timeout period for provisioning Spot Instances. The default value is 1 hour.

1. Choose any other options that apply for your cluster.

1. To launch your cluster with the configured timeout, choose **Create cluster**.

------
#### [ AWS CLI ]

**To specify a provisioning timeout with the `create-cluster` command**

```
aws emr create-cluster \
--release-label emr-5.35.0 \
--service-role EMR_DefaultRole \
--ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","SubnetIds":["subnet-XXXXX"]}' \
--instance-fleets '[{"InstanceFleetType":"MASTER","TargetOnDemandCapacity":1,"TargetSpotCapacity":0,"LaunchSpecifications":{"OnDemandSpecification":{"AllocationStrategy":"lowest-price"}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"m5.xlarge"}],"Name":"Master - 1"},{"InstanceFleetType":"CORE","TargetOnDemandCapacity":1,"TargetSpotCapacity":1,"LaunchSpecifications":{"SpotSpecification":{"TimeoutDurationMinutes":120,"TimeoutAction":"SWITCH_TO_ON_DEMAND"},"OnDemandSpecification":{"AllocationStrategy":"lowest-price"}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":1,"InstanceType":"m5.xlarge"}],"Name":"Core - 2"}]'
```

------

# Customize a provisioning timeout period for cluster resize in Amazon EMR
<a name="emr-provisioning-timeout-resize"></a>

You can define a timeout period for provisioning Spot Instances for each fleet in your cluster. If Amazon EMR can't provision the Spot capacity, it cancels the resize request and stops its attempts to provision additional Spot capacity. When you create a cluster, you can configure the timeout. For a running cluster, you can add or update a timeout.

When the timeout period expires, Amazon EMR automatically sends events to an Amazon CloudWatch Events stream. With CloudWatch, you can create rules that match events according to a specified pattern, and then route the events to targets to take action. For example, you might configure a rule to send an email notification. For more information on how to create rules, see [Creating rules for Amazon EMR events with CloudWatch](emr-events-cloudwatch-console.md). For more information about different event details, see [Instance fleet state-change events](emr-manage-cloudwatch-events.md#emr-cloudwatch-instance-fleet-events).

## Examples of provisioning timeouts for cluster resize
<a name="emr-provisioning-timeout-examples"></a>

**Specify a provisioning timeout for resize with the AWS CLI**

The following example uses the `create-cluster` command to add a provisioning timeout for resize.

```
aws emr create-cluster \
--release-label emr-5.35.0 \
--service-role EMR_DefaultRole \
--ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","SubnetIds":["subnet-XXXXX"]}' \
--instance-fleets '[{"InstanceFleetType":"MASTER","TargetOnDemandCapacity":1,"TargetSpotCapacity":0,"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"m5.xlarge"}],"Name":"Master - 1"},{"InstanceFleetType":"CORE","TargetOnDemandCapacity":1,"TargetSpotCapacity":1,"LaunchSpecifications":{"SpotSpecification":{"TimeoutDurationMinutes":120,"TimeoutAction":"SWITCH_TO_ON_DEMAND"},"OnDemandSpecification":{"AllocationStrategy":"lowest-price"}},"ResizeSpecifications":{"SpotResizeSpecification":{"TimeoutDurationMinutes":20},"OnDemandResizeSpecification":{"TimeoutDurationMinutes":25}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":1,"InstanceType":"m5.xlarge"}],"Name":"Core - 2"}]'
```

The following example uses the `modify-instance-fleet` command to add a provisioning timeout for resize.

```
aws emr modify-instance-fleet \
--cluster-id j-XXXXXXXXXXXXX \
--instance-fleet '{"InstanceFleetId":"if-XXXXXXXXXXXX","ResizeSpecifications":{"SpotResizeSpecification":{"TimeoutDurationMinutes":30},"OnDemandResizeSpecification":{"TimeoutDurationMinutes":60}}}' \
--region us-east-1
```

The following example uses the `add-instance-fleet` command to add a provisioning timeout for resize.

```
aws emr add-instance-fleet \
--cluster-id j-XXXXXXXXXXXXX \
--instance-fleet '{"InstanceFleetType":"TASK","TargetOnDemandCapacity":1,"TargetSpotCapacity":0,"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"m5.xlarge"}],"Name":"TaskFleet","ResizeSpecifications":{"SpotResizeSpecification":{"TimeoutDurationMinutes":30},"OnDemandResizeSpecification":{"TimeoutDurationMinutes":35}}}' \
--region us-east-1
```

**Specify a provisioning timeout for resize and launch with the AWS CLI**

The following example uses the `create-cluster` command to add a provisioning timeout for resize and launch.

```
aws emr create-cluster \
--release-label emr-5.35.0 \
--service-role EMR_DefaultRole \
--ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","SubnetIds":["subnet-XXXXX"]}' \
--instance-fleets '[{"InstanceFleetType":"MASTER","TargetOnDemandCapacity":1,"TargetSpotCapacity":0,"LaunchSpecifications":{"OnDemandSpecification":{"AllocationStrategy":"lowest-price"}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"m5.xlarge"}],"Name":"Master - 1"},{"InstanceFleetType":"CORE","TargetOnDemandCapacity":1,"TargetSpotCapacity":1,"LaunchSpecifications":{"SpotSpecification":{"TimeoutDurationMinutes":120,"TimeoutAction":"SWITCH_TO_ON_DEMAND"},"OnDemandSpecification":{"AllocationStrategy":"lowest-price"}},"ResizeSpecifications":{"SpotResizeSpecification":{"TimeoutDurationMinutes":20},"OnDemandResizeSpecification":{"TimeoutDurationMinutes":25}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":1,"InstanceType":"m5.xlarge"}],"Name":"Core - 2"}]'
```

## Considerations for resize provisioning timeouts
<a name="emr-provisioning-timeout-considerations"></a>

When you configure cluster provisioning timeouts for your instance fleets, consider the following behaviors.
+ You can configure provisioning timeouts for both Spot and On-Demand Instances. The minimum provisioning timeout is 5 minutes. The maximum provisioning timeout is 7 days.
+ You can only configure provisioning timeouts for an EMR cluster that uses instance fleets. You must configure each core and task fleet separately.
+ When you create a cluster, you can configure provisioning timeouts. You can add a timeout or update an existing timeout for a running cluster.
+ If you submit multiple resize operations, then Amazon EMR tracks provisioning timeouts for every resize operation. For example, set the provisioning timeout on a cluster to *60* minutes. Then, submit a resize operation *R1* at time *T1*. Submit a second resize operation *R2* at time *T2*. The provisioning timeout for R1 expires at *T1 + 60 minutes*. The provisioning timeout for R2 expires at *T2 + 60 minutes*.
+ If you submit a new scale-up resize operation before the timeout expires, Amazon EMR continues its attempt to provision capacity for your EMR cluster.
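The per-operation timeout tracking described above amounts to simple arithmetic: each resize request carries its own expiry, which is its submission time plus the fleet's configured timeout. The following sketch uses example times and a 60-minute timeout:

```python
from datetime import datetime, timedelta

TIMEOUT = timedelta(minutes=60)  # the fleet's resize provisioning timeout

# Two resize operations submitted at different times each get their own
# expiry, independent of any other in-flight resize request.
t1 = datetime(2024, 1, 1, 9, 0)   # resize R1 submitted
t2 = datetime(2024, 1, 1, 9, 30)  # resize R2 submitted

r1_expiry = t1 + TIMEOUT
r2_expiry = t2 + TIMEOUT
print(r1_expiry.time(), r2_expiry.time())  # 10:00:00 10:30:00
```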

# Cluster scale-down options for Amazon EMR clusters
<a name="emr-scaledown-behavior"></a>

**Note**  
Scale-down behavior options are no longer supported beginning with Amazon EMR release 5.10.0. Because of the introduction of per-second billing in Amazon EC2, the default scale-down behavior for Amazon EMR clusters is now to terminate at task completion.

With Amazon EMR releases 5.1.0 through 5.9.1, there are two options for scale-down behavior: terminate at the instance-hour boundary for Amazon EC2 billing, or terminate at task completion. Starting with Amazon EMR release 5.10.0, the setting for termination at instance-hour boundary is deprecated because of the introduction of per-second billing in Amazon EC2. We do not recommend specifying termination at the instance-hour boundary in versions where the option is available.

**Warning**  
If you use the AWS CLI to issue a `modify-instance-groups` with `EC2InstanceIdsToTerminate`, these instances are terminated immediately, without consideration for these settings, and regardless of the status of applications running on them. Terminating an instance in this way risks data loss and unpredictable cluster behavior.

When terminate at task completion is specified, Amazon EMR deny lists and drains tasks from nodes before terminating the Amazon EC2 instances. With either behavior specified, Amazon EMR does not terminate Amazon EC2 instances in core instance groups if it could lead to HDFS corruption. 

## Terminate at task completion
<a name="emr-scaledown-terminate-task"></a>

Amazon EMR allows you to scale down your cluster without affecting your workload. Amazon EMR attempts to gracefully decommission YARN, HDFS, and other daemons on core and task nodes during a resize down operation without losing data or interrupting jobs. Amazon EMR only reduces instance group size if the work assigned to the groups has completed and they are idle. For YARN NodeManager Graceful Decommission, you can manually adjust the time a node waits for decommissioning.

**Note**  
When graceful decommissioning occurs, there can be data loss. Be sure to back up your data.

**Important**  
It is possible that HDFS data can be permanently lost during the graceful replacement of an unhealthy core instance. We recommend that you always back up your data.

This time is set using a property in the `yarn-site` configuration classification. With Amazon EMR release 5.12.0 and higher, specify the `yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs` property. With earlier Amazon EMR releases, specify the `yarn.resourcemanager.decommissioning.timeout` property.

If there are still running containers or YARN applications when the decommissioning timeout passes, the node is forced to be decommissioned and YARN reschedules affected containers on other nodes. The default value is 3600s (one hour). You can set this timeout to be an arbitrarily high value to force graceful reduction to wait longer. For more information, see [Graceful Decommission of YARN nodes](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html) in the Apache Hadoop documentation.
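As a sketch, a configuration classification that sets this timeout on Amazon EMR release 5.12.0 or higher might look like the following; the two-hour value is only an example:

```json
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs": "7200"
    }
  }
]
```

You could pass a file with this content to `create-cluster` with the `--configurations` option.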

### Task node groups
<a name="emr-scaledown-task-nodes"></a>

Amazon EMR intelligently selects instances that do not have tasks running against any step or application, and removes those instances from a cluster first. If all instances in the cluster are in use, Amazon EMR waits for tasks to complete on an instance before removing it from the cluster. The default wait time is 1 hour. This value can be changed with the `yarn.resourcemanager.decommissioning.timeout` setting. Amazon EMR dynamically uses the new setting. You can set this to an arbitrarily large number to ensure that Amazon EMR doesn't terminate any tasks while reducing the cluster size.

### Core node groups
<a name="emr-scaledown-core-nodes"></a>

On core nodes, both YARN NodeManager and HDFS DataNode daemons must be decommissioned for the instance group to reduce. For YARN, graceful reduction ensures that a node marked for decommissioning is only transitioned to the `DECOMMISSIONED` state if there are no pending or incomplete containers or applications. The decommissioning finishes immediately if there are no running containers on the node at the beginning of decommissioning. 

For HDFS, graceful reduction ensures that the target capacity of HDFS is large enough to fit all existing blocks. If the target capacity is not large enough, only a partial amount of core instances are decommissioned such that the remaining nodes can handle the current data residing in HDFS. You should ensure additional HDFS capacity to allow further decommissioning. You should also try to minimize write I/O before attempting to reduce instance groups. Excessive write I/O might delay completion of the resize operation. 

Another limit is the default replication factor, `dfs.replication`, inside `/etc/hadoop/conf/hdfs-site.xml`. When it creates a cluster, Amazon EMR configures the value based on the number of instances in the cluster: `1` for clusters with 1-3 instances, `2` for clusters with 4-9 instances, and `3` for clusters with 10 or more instances. 

**Warning**  
Setting `dfs.replication` to 1 on clusters with fewer than four nodes can lead to HDFS data loss if a single node goes down. We recommend you use a cluster with at least four core nodes for production workloads.
Amazon EMR will not allow clusters to scale core nodes below `dfs.replication`. For example, if `dfs.replication = 2`, the minimum number of core nodes is 2.
When you use Managed Scaling, Auto-scaling, or choose to manually resize your cluster, we recommend that you set `dfs.replication` to 2 or higher.

Graceful reduction doesn't let you reduce core nodes below the HDFS replication factor. This prevents HDFS from being unable to close files because of insufficient replicas. To circumvent this limit, lower the replication factor and restart the NameNode daemon. 
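The default `dfs.replication` values above can be sketched as a simple rule. This is a hypothetical helper for illustration only, not part of any Amazon EMR tooling:

```shell
# Mirrors the defaults described above: dfs.replication is 1 for
# clusters with 1-3 instances, 2 for 4-9 instances, and 3 for
# clusters with 10 or more instances.
default_dfs_replication() {
  local instances=$1
  if [ "$instances" -le 3 ]; then
    echo 1
  elif [ "$instances" -le 9 ]; then
    echo 2
  else
    echo 3
  fi
}
```

For example, `default_dfs_replication 4` prints `2`, which is why a four-node cluster can lose one core node without HDFS data loss.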

# Configure Amazon EMR scale-down behavior
<a name="emr-scaledown-configure"></a>

**Note**  
The terminate at instance hour scale-down behavior option is no longer supported for Amazon EMR release 5.10.0 and higher. The following scale-down behavior options only appear in the Amazon EMR console for releases 5.1.0 through 5.9.1.

You can use the AWS Management Console, the AWS CLI, or the Amazon EMR API to configure scale-down behavior when you create a cluster. 

------
#### [ Console ]

**To configure scale-down behavior with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. In the **Cluster scaling and provisioning option** section, choose **Use custom automatic scaling**. Under **Custom automatic scaling policies**, choose the plus (+) button to add **Scale in** policies. We recommend that you add both **Scale in** and **Scale out** policies. If you add only one set of policies, Amazon EMR performs only one-way scaling, and you must perform the other scaling actions manually.

1. Choose any other options that apply to your cluster. 

1. To launch your cluster, choose **Create cluster**.

------
#### [ AWS CLI ]

**To configure scale-down behavior with the AWS CLI**
+ Use the `--scale-down-behavior` option to specify either `TERMINATE_AT_INSTANCE_HOUR` or `TERMINATE_AT_TASK_COMPLETION`.
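As a sketch, the option is set at cluster creation; every option here besides `--scale-down-behavior` is a placeholder for a minimal cluster and would need to match your environment:

```
aws emr create-cluster \
  --release-label emr-5.9.0 \
  --instance-type m4.large \
  --instance-count 3 \
  --use-default-roles \
  --scale-down-behavior TERMINATE_AT_TASK_COMPLETION
```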

------

# Terminate an Amazon EMR cluster in the starting, running, or waiting states
<a name="UsingEMR_TerminateJobFlow"></a>

**Warning**  
A terminated EMR cluster is irrecoverable. Before you terminate a cluster, ensure that you no longer need it or any data stored on HDFS or in Jupyter notebooks. HDFS data is lost when the cluster is terminated.

This section describes the methods of terminating a cluster. For information about enabling termination protection and auto-terminating clusters, see [Control Amazon EMR cluster termination](emr-plan-termination.md). You can terminate clusters in the `STARTING`, `RUNNING`, or `WAITING` states. A cluster in the `WAITING` state must be terminated or it runs indefinitely, generating charges to your account. You can terminate a cluster that fails to leave the `STARTING` state or is unable to complete a step. 

If you want to terminate a cluster that has termination protection set on it, you must disable termination protection before you can terminate the cluster. Clusters can be terminated using the console, the AWS CLI, or programmatically using the `TerminateJobFlows` API.

Depending on the configuration of the cluster, it could take from 5 to 20 minutes for the cluster to completely terminate and release allocated resources, such as EC2 instances.

**Note**  
You can't restart a terminated cluster, but you can clone a terminated cluster to reuse its configuration for a new cluster. For more information, see [Clone an Amazon EMR cluster using the console](clone-console.md).

**Important**  
Amazon EMR uses the [Amazon EMR service role](emr-iam-role.md) and the `AWSServiceRoleForEMRCleanup` role to clean up cluster resources in your account that you no longer use, such as Amazon EC2 instances. You must include actions for the role policies to delete or terminate the resources. Otherwise, Amazon EMR can’t perform these cleanup actions, and you might incur costs for unused resources that remain on the cluster.

## Terminate a cluster with the console
<a name="emr-dev-terminate-job-flow-console"></a>

You can terminate one or more clusters using the Amazon EMR console. The steps to terminate a cluster in the console vary depending on whether termination protection is on or off. To terminate a protected cluster, you must first disable termination protection. 

------
#### [ Console ]

**To terminate a cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Choose **Clusters**, and then choose the cluster you want to terminate.

1. Under the **Actions** dropdown menu, choose **Terminate cluster** to open the **Terminate cluster** prompt.

1. At the prompt, choose **Terminate**. Depending on the cluster configuration, termination may take 5 to 10 minutes. For more information on how to terminate Amazon EMR clusters, see [Terminate an Amazon EMR cluster in the starting, running, or waiting states](#UsingEMR_TerminateJobFlow).

------

## Terminate a cluster with the AWS CLI
<a name="emr-dev-terminate-job-flow-cli"></a>

**To terminate an unprotected cluster using the AWS CLI**

To terminate an unprotected cluster using the AWS CLI, use the `terminate-clusters` subcommand with the `--cluster-ids` parameter. 
+ Type the following command to terminate a single cluster and replace *j-3KVXXXXXXX7UG* with your cluster ID.

  ```
  aws emr terminate-clusters --cluster-ids j-3KVXXXXXXX7UG
  ```

  To terminate multiple clusters, type the following command and replace *j-3KVXXXXXXX7UG* and *j-WJ2XXXXXX8EU* with your cluster IDs.

  ```
  aws emr terminate-clusters --cluster-ids j-3KVXXXXXXX7UG j-WJ2XXXXXX8EU
  ```

  For more information on using Amazon EMR commands in the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/emr](https://docs.aws.amazon.com/cli/latest/reference/emr).

**To terminate a protected cluster using the AWS CLI**

To terminate a protected cluster using the AWS CLI, first disable termination protection using the `modify-cluster-attributes` subcommand with the `--no-termination-protected` parameter. Then use the `terminate-clusters` subcommand with the `--cluster-ids` parameter to terminate it. 

1. Type the following command to disable termination protection and replace *j-3KVTXXXXXX7UG* with your cluster ID.

   ```
   aws emr modify-cluster-attributes --cluster-id j-3KVTXXXXXX7UG --no-termination-protected
   ```

1. To terminate the cluster, type the following command and replace *j-3KVXXXXXXX7UG* with your cluster ID.

   ```
   aws emr terminate-clusters --cluster-ids j-3KVXXXXXXX7UG
   ```

   To terminate multiple clusters, type the following command and replace *j-3KVXXXXXXX7UG* and *j-WJ2XXXXXX8EU* with your cluster IDs.

   ```
   aws emr terminate-clusters --cluster-ids j-3KVXXXXXXX7UG j-WJ2XXXXXX8EU
   ```

   For more information on using Amazon EMR commands in the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/emr](https://docs.aws.amazon.com/cli/latest/reference/emr).

## Terminate a cluster with the API
<a name="emr-dev-terminate-job-flow-api"></a>

The `TerminateJobFlows` operation ends step processing, uploads any log data from Amazon EC2 to Amazon S3 (if configured), and terminates the Hadoop cluster. A cluster also terminates automatically if you set `KeepJobFlowAliveWhenNoSteps` to `False` in a `RunJobFlow` request.

You can use this action to terminate either a single cluster or a list of clusters by their cluster IDs.

For more information about the input parameters unique to `TerminateJobFlows`, see [TerminateJobFlows](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_TerminateJobFlows.html). For more information about the generic parameters in the request, see [Common request parameters](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/CommonParameters.html).
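As a sketch, the request body for `TerminateJobFlows` is simply a list of cluster (job flow) IDs; the ID below is a placeholder:

```
{
  "JobFlowIds": ["j-3KVXXXXXXX7UG"]
}
```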

# Clone an Amazon EMR cluster using the console
<a name="clone-console"></a>

You can use the Amazon EMR console to clone a cluster, which makes a copy of the configuration of the original cluster to use as the basis for a new cluster. 

------
#### [ Console ]

**To clone a cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**.

1.  *To clone a cluster from the cluster list*

   1. Use the search and filter options to find the cluster that you want to clone in the list view.

   1. Select the check box to the left of the row for the cluster that you want to clone. 

   1. The **Clone** option will now be available at the top of the list view. Select **Clone** to initiate the cloning process. If the cluster has steps configured, choose **Include steps** and **Continue** if you want to clone the steps along with the other cluster configurations.

   1. Review the settings for the new cluster that have been copied over from the original cluster. Adjust the settings if needed. When you are satisfied with the new cluster's configuration, select **Create cluster** to launch the new cluster.

1. *To clone a cluster from a cluster detail page*

   1. To navigate to the detail page of the cluster that you want to clone, select its **Cluster ID** from the cluster list view.

   1. At the top of the cluster detail page, select **Clone cluster** from the **Actions** menu to initiate the cloning process. If the cluster has steps configured, choose **Include steps** and **Continue** if you want to clone the steps along with the other cluster configurations.

   1. Review the settings for the new cluster that have been copied over from the original cluster. Adjust the settings if needed. When you are satisfied with the new cluster's configuration, select **Create cluster** to launch the new cluster.

------

# Automate recurring Amazon EMR clusters with AWS Data Pipeline
<a name="emr-manage-recurring"></a>

**Note**  
AWS Data Pipeline is no longer available to new customers. Existing customers of AWS Data Pipeline can continue to use the service as normal.

 AWS Data Pipeline is a service that automates the movement and transformation of data. You can use it to schedule moving input data into Amazon S3 and to schedule launching clusters to process that data. For example, consider the case where you have a web server recording traffic logs. If you want to run a weekly cluster to analyze the traffic data, you can use AWS Data Pipeline to schedule those clusters. AWS Data Pipeline uses data-driven workflows, so one task (launching the cluster) can depend on another task (moving the input data to Amazon S3). It also has robust retry functionality. 
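As an illustrative sketch only, a weekly EMR pipeline definition might look like the following. The object IDs, schedule, cluster sizing, release label, and step are all placeholders, and you would need to verify the fields against the AWS Data Pipeline object reference for your use case:

```
{
  "objects": [
    {
      "id": "WeeklySchedule",
      "type": "Schedule",
      "period": "1 week",
      "startAt": "FIRST_ACTIVATION_DATE_TIME"
    },
    {
      "id": "MyEmrCluster",
      "type": "EmrCluster",
      "releaseLabel": "emr-5.36.0",
      "masterInstanceType": "m5.xlarge",
      "coreInstanceType": "m5.xlarge",
      "coreInstanceCount": "2",
      "schedule": { "ref": "WeeklySchedule" }
    },
    {
      "id": "AnalyzeLogsActivity",
      "type": "EmrActivity",
      "runsOn": { "ref": "MyEmrCluster" },
      "step": "command-runner.jar,spark-submit,s3://amzn-s3-demo-bucket/analyze-logs.py",
      "schedule": { "ref": "WeeklySchedule" }
    }
  ]
}
```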

 For more information about AWS Data Pipeline, see the [AWS Data Pipeline Developer Guide](https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html), especially the tutorials regarding Amazon EMR: 
+  [Tutorial: Launch an Amazon EMR job flow](https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-launch-emr-jobflow.html) 
+  [Getting started: Process web logs with AWS Data Pipeline, Amazon EMR, and Hive](https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-process-logs.html) 
+  [Tutorial: Amazon DynamoDB import and export using AWS Data Pipeline](https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-importexport-ddb.html) 