

# Use Amazon EMR cluster scaling to adjust for changing workloads
<a name="emr-scale-on-demand"></a>

You can adjust the number of Amazon EC2 instances available to an Amazon EMR cluster automatically or manually in response to workloads that have varying demands. To use automatic scaling, you have two options. You can enable Amazon EMR managed scaling or create a custom automatic scaling policy. The following table describes the differences between the two options.


|  | Amazon EMR managed scaling | Custom automatic scaling | 
| --- | --- | --- | 
|  Scaling policies and rules  |  No policy required. Amazon EMR manages the automatic scaling activity by continuously evaluating cluster metrics and making optimized scaling decisions.   |  You define and manage the automatic scaling policies and rules, such as the specific conditions that trigger scaling activities, evaluation periods, and cooldown periods.  | 
|  Supported Amazon EMR releases  |  Amazon EMR version 5.30.0 and higher (except Amazon EMR version 6.0.0)  |  Amazon EMR version 4.0.0 and higher  | 
|  Supported cluster composition  | Instance groups or instance fleets |  Instance groups only  | 
| Scaling limits configuration |  Scaling limits are configured for the entire cluster.  |  Scaling limits are configured separately for each instance group.  | 
|  Metrics evaluation frequency   |  Every 5 to 10 seconds. More frequent evaluation of metrics allows Amazon EMR to make more precise scaling decisions.  |  You can define evaluation periods only in five-minute increments.  | 
|  Supported applications  |  Only YARN applications are supported, such as Spark, Hadoop, Hive, and Flink. Amazon EMR managed scaling does not support applications that are not based on YARN, such as Presto or HBase.  |  You can choose which applications to scale when you define the automatic scaling rules.   | 

## Considerations
<a name="emr-scaling-considerations"></a>
+ An Amazon EMR cluster always comprises one or three primary nodes. After you initially configure the cluster, you can scale only the core and task nodes. You can't change the number of primary nodes for the cluster. 
+ For instance groups, reconfiguration operations and resize operations occur consecutively and not concurrently. If you initiate a reconfiguration while an instance group is resizing, the reconfiguration starts once the instance group completes the resize in progress. Conversely, if you initiate a resize operation while an instance group is busy with reconfiguration, the resizing starts once the reconfiguration is complete. 

# Using managed scaling in Amazon EMR
<a name="emr-managed-scaling"></a>

**Important**  
We strongly recommend that you use the latest Amazon EMR release (Amazon EMR 7.12.0) for managed scaling. In some early releases, you might experience intermittent application failures or delays in scaling. Amazon EMR resolved this issue with 5.x releases 5.30.2, 5.31.1, 5.32.1, 5.33.1 and higher, and with 6.x releases 6.1.1, 6.2.1, 6.3.1 and higher. For more information about Region and release availability, see [Managed scaling availability](#emr-managed-scaling-availability).

## Overview
<a name="emr-managed-scaling-overview"></a>

With Amazon EMR versions 5.30.0 and higher (except for Amazon EMR 6.0.0), you can enable Amazon EMR managed scaling. Managed scaling lets you automatically increase or decrease the number of instances or units in your cluster based on workload. Amazon EMR continuously evaluates cluster metrics to make scaling decisions that optimize your clusters for cost and speed. Managed scaling is available for clusters composed of either instance groups or instance fleets.

## Managed scaling availability
<a name="emr-managed-scaling-availability"></a>
+ In the following AWS Regions, Amazon EMR managed scaling is available with Amazon EMR 6.14.0 and higher:
  + Asia Pacific (Taipei) (ap-east-2)
  + Asia Pacific (Melbourne) (ap-southeast-4)
  + Asia Pacific (Malaysia) (ap-southeast-5)
  + Asia Pacific (New Zealand) (ap-southeast-6)
  + Asia Pacific (Thailand) (ap-southeast-7)
  + Canada West (Calgary) (ca-west-1)
  + Europe (Spain) (eu-south-2)
  + Mexico (Central) (mx-central-1)
+ In the following AWS Regions, Amazon EMR managed scaling is available with Amazon EMR 5.30.0 and 6.1.0 and higher:
  + US East (N. Virginia) (us-east-1)
  + US East (Ohio) (us-east-2)
  + US West (Oregon) (us-west-2)
  + US West (N. California) (us-west-1)
  + Africa (Cape Town) (af-south-1)
  + Asia Pacific (Hong Kong) (ap-east-1)
  + Asia Pacific (Mumbai) (ap-south-1)
  + Asia Pacific (Hyderabad) (ap-south-2)
  + Asia Pacific (Seoul) (ap-northeast-2)
  + Asia Pacific (Singapore) (ap-southeast-1)
  + Asia Pacific (Sydney) (ap-southeast-2)
  + Asia Pacific (Jakarta) (ap-southeast-3)
  + Asia Pacific (Tokyo) (ap-northeast-1)
  + Asia Pacific (Osaka) (ap-northeast-3)
  + Canada (Central) (ca-central-1)
  + South America (São Paulo) (sa-east-1)
  + Europe (Frankfurt) (eu-central-1)
  + Europe (Zurich) (eu-central-2)
  + Europe (Ireland) (eu-west-1)
  + Europe (London) (eu-west-2)
  + Europe (Milan) (eu-south-1)
  + Europe (Paris) (eu-west-3)
  + Europe (Stockholm) (eu-north-1)
  + Israel (Tel Aviv) (il-central-1)
  + Middle East (UAE) (me-central-1)
  + China (Beijing) (cn-north-1)
  + China (Ningxia) (cn-northwest-1)
  + AWS GovCloud (US-East) (us-gov-east-1)
  + AWS GovCloud (US-West) (us-gov-west-1)
+ Amazon EMR managed scaling only works with YARN applications, such as Spark, Hadoop, Hive, and Flink. It doesn't support applications that are not based on YARN, such as Presto and HBase.

## Managed scaling parameters
<a name="emr-managed-scaling-parameters"></a>

You must configure the following parameters for managed scaling. These limits apply only to the core and task nodes; you can't scale the primary node after initial configuration.
+ **Minimum** (`MinimumCapacityUnits`) – The lower boundary of allowed EC2 capacity in a cluster. Capacity is measured in virtual central processing unit (vCPU) cores or instances for instance groups, and in units for instance fleets. 
+ **Maximum** (`MaximumCapacityUnits`) – The upper boundary of allowed EC2 capacity in a cluster. Capacity is measured in vCPU cores or instances for instance groups, and in units for instance fleets. 
+ **On-Demand limit** (`MaximumOnDemandCapacityUnits`) (Optional) – The upper boundary of allowed EC2 capacity for On-Demand market type in a cluster. If this parameter is not specified, it defaults to the value of `MaximumCapacityUnits`. 
  + This parameter is used to split capacity allocation between On-Demand and Spot Instances. For example, if you set the minimum parameter as 2 instances, the maximum parameter as 100 instances, the On-Demand limit as 10 instances, then Amazon EMR managed scaling scales up to 10 On-Demand Instances and allocates the remaining capacity to Spot Instances. For more information, see [Node allocation scenarios](managed-scaling-allocation-strategy.md#node-allocation-scenarios).
+ **Maximum core nodes** (`MaximumCoreCapacityUnits`) (Optional) – The upper boundary of allowed EC2 capacity for core node type in a cluster. If this parameter is not specified, it defaults to the value of `MaximumCapacityUnits`. 
  + This parameter is used to split capacity allocation between core and task nodes. For example, if you set the minimum parameter as 2 instances, the maximum as 100 instances, the maximum core node as 17 instances, then Amazon EMR managed scaling scales up to 17 core nodes and allocates the remaining 83 instances to task nodes. For more information, see [Node allocation scenarios](managed-scaling-allocation-strategy.md#node-allocation-scenarios). 
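
Combining the values from the examples above (a minimum of 2 instances, maximum of 100, On-Demand limit of 10, and maximum of 17 core nodes), a complete `ComputeLimits` configuration for an instance-group cluster might look like the following:

```
{
    "ComputeLimits": {
        "UnitType": "Instances",
        "MinimumCapacityUnits": 2,
        "MaximumCapacityUnits": 100,
        "MaximumOnDemandCapacityUnits": 10,
        "MaximumCoreCapacityUnits": 17
    }
}
```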

For more information about managed scaling parameters, see [https://docs.aws.amazon.com/emr/latest/APIReference/API_ComputeLimits.html](https://docs.aws.amazon.com/emr/latest/APIReference/API_ComputeLimits.html).

## Considerations for Amazon EMR managed scaling
<a name="emr-managed-scaling-considerations"></a>
+ Managed scaling is supported in limited AWS Regions and Amazon EMR releases. For more information, see [Managed scaling availability](#emr-managed-scaling-availability).
+ You must configure the required parameters for Amazon EMR managed scaling. For more information, see [Managed scaling parameters](#emr-managed-scaling-parameters). 
+ To use managed scaling, the metrics-collector process must be able to connect to the public API endpoint for managed scaling in API Gateway. If you use a private DNS name with Amazon Virtual Private Cloud, managed scaling won't function properly. To ensure that managed scaling works, we recommend that you take one of the following actions:
  + Remove the API Gateway interface VPC endpoint from your Amazon VPC.
  + Follow the instructions in [Why do I get an HTTP 403 Forbidden error when connecting to my API Gateway APIs from a VPC?](https://aws.amazon.com/premiumsupport/knowledge-center/api-gateway-vpc-connections/) to disable the private DNS name setting.
  + Launch your cluster in a private subnet instead. For more information, see the topic on [Private subnets](emr-clusters-in-a-vpc.md#emr-vpc-private-subnet).
+ If your YARN jobs are intermittently slow during scale down, and YARN Resource Manager logs show that most of your nodes were deny-listed during that time, you can adjust the decommissioning timeout threshold.

  Reduce the `spark.blacklist.decommissioning.timeout` from one hour to one minute to make the node available for other pending containers to continue task processing.

  You should also set `yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs` to a larger value to ensure that Amazon EMR doesn't force-terminate the node while the longest-running Spark task is still running on it. The current default is 60 minutes, which means that YARN force-terminates the container 60 minutes after the node enters the decommissioning state.

  The following example YARN Resource Manager log line shows nodes added to the decommissioning state:

  ```
  2021-10-20 15:55:26,994 INFO org.apache.hadoop.yarn.server.resourcemanager.DefaultAMSProcessor (IPC Server handler 37 on default port 8030): blacklist are updated in Scheduler.blacklistAdditions: [ip-10-10-27-207.us-west-2.compute.internal, ip-10-10-29-216.us-west-2.compute.internal, ip-10-10-31-13.us-west-2.compute.internal, ... , ip-10-10-30-77.us-west-2.compute.internal], blacklistRemovals: []
  ```

  See more [details on how Amazon EMR integrates with YARN deny listing during decommissioning of nodes](https://aws.amazon.com/blogs/big-data/spark-enhancements-for-elasticity-and-resiliency-on-amazon-emr/), [cases when nodes in Amazon EMR can be deny listed](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-troubleshoot-error-resource-3.html), and [configuring Spark node-decommissioning behavior](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-configure.html#spark-decommissioning).
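
  You can apply both timeout settings with configuration classifications when you create the cluster. The following sketch uses the `spark-defaults` and `yarn-site` classifications; the timeout values shown are illustrative, so choose them based on your longest-running tasks:

  ```
  [
    {
      "Classification": "spark-defaults",
      "Properties": {
        "spark.blacklist.decommissioning.timeout": "1m"
      }
    },
    {
      "Classification": "yarn-site",
      "Properties": {
        "yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs": "7200"
      }
    }
  ]
  ```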
+ For Spark workloads, disabling Spark dynamic resource allocation (DRA) by setting the Spark property **spark.dynamicAllocation.enabled** to `false` can cause managed scaling issues, where clusters scale up more than your workloads require (up to the maximum compute). When you use managed scaling for these workloads, we recommend that you keep Spark DRA enabled, which is the default state of this property.
+ Over-utilization of EBS volumes can cause managed scaling issues. We recommend that you maintain EBS volume utilization below 90 percent. For more information, see [Instance storage options and behavior in Amazon EMR](emr-plan-storage.md).
+ Amazon CloudWatch metrics are critical for Amazon EMR managed scaling to operate. We recommend that you closely monitor Amazon CloudWatch metrics to make sure data is not missing. For more information about how you can configure CloudWatch alarms to detect missing metrics, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html). 
+ Managed scaling operations on 5.30.0 and 5.30.1 clusters without Presto installed may cause application failures or cause a uniform instance group or instance fleet to stay in the `ARRESTED` state, particularly when a scale down operation is followed quickly by a scale up operation.

  As a workaround, choose Presto as an application to install when you create a cluster with Amazon EMR releases 5.30.0 and 5.30.1, even if your job does not require Presto.
+ When you set the maximum core node and the On-Demand limit for Amazon EMR managed scaling, consider the differences between instance groups and instance fleets. Each instance group consists of the same instance type and the same purchasing option for instances: On-Demand or Spot. For each instance fleet, you can specify up to five instance types, which can be provisioned as On-Demand and Spot Instances. For more information, see [Create a cluster with instance fleets or uniform instance groups](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-group-configuration.html), [Instance fleet options](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-instance-fleet.html#emr-instance-fleet-options), and [Node allocation scenarios](managed-scaling-allocation-strategy.md#node-allocation-scenarios).
+ With Amazon EMR 5.30.0 and higher, if you remove the default **Allow All** outbound rule to 0.0.0.0/0 for the master security group, you must add a rule that allows outbound TCP connectivity to your security group for service access on port 9443. Your security group for service access must also allow inbound TCP traffic on port 9443 from the master security group. For more information about configuring security groups, see [Amazon EMR-managed security group for the primary instance (private subnets)](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-man-sec-groups.html#emr-sg-elasticmapreduce-master-private).
+ You can use AWS CloudFormation to configure Amazon EMR managed scaling. For more information, see [AWS::EMR::Cluster](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-elasticmapreduce-cluster.html) in the *AWS CloudFormation User Guide*. 
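
  For example, the `AWS::EMR::Cluster` resource in a CloudFormation template can include a `ManagedScalingPolicy` property. The following is a minimal sketch in YAML; the limit values are illustrative:

  ```
  ManagedScalingPolicy:
    ComputeLimits:
      UnitType: Instances
      MinimumCapacityUnits: 2
      MaximumCapacityUnits: 10
  ```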
+ If you're using Spot nodes, consider using node labels to prevent Amazon EMR from removing application processes when Amazon EMR removes Spot nodes. For more information about node labels, see [Task nodes](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html#emr-plan-task).
+ Node labeling is not supported by default in Amazon EMR releases 6.15 or lower. For more information, see [Understand node types: primary, core, and task nodes.](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html)
+ If you're using Amazon EMR releases 6.15 or lower, you can only assign node labels by node type, such as core and task nodes. However, if you're using Amazon EMR release 7.0 or higher, you can configure node labels by node type and market type, such as On-Demand and Spot.
+ If application process demand increases and executor demand decreases when you restricted the application process to core nodes, you can add back core nodes and remove task nodes in the same resize operation. For more information, see [Understanding node allocation strategy and scenarios](https://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html).
+ Amazon EMR doesn't label task nodes, so you can't set the YARN properties to restrict application processes only for task nodes. However, if you want to use market types as node labels, you can use the `ON_DEMAND` or `SPOT` labels for application process placement. We don't recommend using Spot nodes for application primary processes.
+ When using node labels, the total running units in the cluster can temporarily exceed the max compute set in your managed scaling policy while Amazon EMR decommissions some of your instances. Total requested units will always stay at or below your policy’s max compute. 
+ Managed scaling only supports the node labels `ON_DEMAND` and `SPOT` or `CORE` and `TASK`. Custom node labels aren't supported.
+ Amazon EMR creates node labels when creating the cluster and provisioning resources. Amazon EMR doesn't support adding node labels when you reconfigure the cluster. You also can't modify the node labels when configuring managed scaling after launching the cluster.
+ Managed scaling scales core and task nodes independently based on application process and executor demand. To prevent HDFS data loss issues during core scale down, follow standard practice for core nodes. To learn more about best practices about core nodes and HDFS replication, see [Considerations and best practices](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-ha-considerations.html).
+ You can't place both the application process and executors on only the `core` or the `ON_DEMAND` node. If you want to add both the application process and executors on one of the nodes, don't use the `yarn.node-labels.am.default-node-label-expression` configuration.

  For example, to place both the application process and executors in `ON_DEMAND` nodes, set max compute to the same as the maximum in the `ON_DEMAND` node. Also remove the `yarn.node-labels.am.default-node-label-expression` configuration.

  To add both the application process and executors on `core` nodes, remove the `yarn.node-labels.am.default-node-label-expression` configuration.
+  When you use managed scaling with node labels, set the property `yarn.scheduler.capacity.maximum-am-resource-percent: 1` if you plan to run multiple applications in parallel. Doing so ensures that your application processes fully utilize the available `CORE` or `ON_DEMAND` nodes. 
+  If you use managed scaling with node labels, set the property `yarn.resourcemanager.decommissioning.timeout` to a value that's longer than the longest running application on your cluster. Doing so reduces the chance that Amazon EMR managed scaling needs to reschedule your applications to recommission `CORE` or `ON_DEMAND` nodes. 
+ To reduce the risk of application failures due to shuffle data loss, Amazon EMR collects metrics from the cluster to determine which nodes have existing transient shuffle data from the current and previous stages. In rare cases, metrics can continue to report stale data for applications that have already completed or terminated, which can delay timely scale-down of instances in your cluster. For clusters that have a large amount of shuffle data, consider using Amazon EMR releases 6.13.0 and higher.
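
The YARN properties from the preceding considerations, `yarn.scheduler.capacity.maximum-am-resource-percent` and `yarn.resourcemanager.decommissioning.timeout`, can be set with configuration classifications when you create the cluster. The following is a sketch; the timeout value of four hours is illustrative and should exceed your longest-running application:

```
[
    {
        "Classification": "capacity-scheduler",
        "Properties": {
            "yarn.scheduler.capacity.maximum-am-resource-percent": "1"
        }
    },
    {
        "Classification": "yarn-site",
        "Properties": {
            "yarn.resourcemanager.decommissioning.timeout": "14400"
        }
    }
]
```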

## Feature history
<a name="emr-managed-scaling-history"></a>

This table lists updates to the Amazon EMR managed scaling capability.


| Release date | Capability | Amazon EMR versions | 
| --- | --- | --- | 
| November 20, 2024 | Managed scaling is available in the il-central-1 Israel (Tel Aviv), me-central-1 Middle East (UAE), and ap-northeast-3 Asia Pacific (Osaka) Regions. | 5.30.0 and 6.1.0 and higher | 
| November 15, 2024 | Managed scaling is available in the eu-central-2 Europe (Zurich) Region. | 5.30.0 and 6.1.0 and higher | 
| August 20, 2024 | Node labels are now available in managed scaling, so you can label your instances based on market type or node type to improve automatic scaling. | 7.2.0 and higher | 
| March 31, 2024 | Managed scaling is available in the ap-south-2 Asia Pacific (Hyderabad) Region. | 6.14.0 and higher | 
| February 13, 2024 | Managed scaling is available in the eu-south-2 Europe (Spain) Region. | 6.14.0 and higher | 
| October 10, 2023 | Managed scaling is available in the ap-southeast-3 Asia Pacific (Jakarta) Region. | 6.14.0 and higher | 
| July 28, 2023 | Enhanced managed scaling to switch to a different task instance group on scale-up when Amazon EMR experiences a delay in scale-up with the current instance group. | 5.34.0 and higher, 6.4.0 and higher | 
| June 16, 2023 | Enhanced managed scaling to be aware of the nodes running application master so that those nodes are not scaled down. For more information, see [Understanding Amazon EMR node allocation strategy and scenarios](managed-scaling-allocation-strategy.md). | 5.34.0 and higher, 6.4.0 and higher | 
| March 21, 2022 | Added Spark shuffle data awareness used when scaling down clusters. For Amazon EMR clusters with Apache Spark and the managed scaling feature enabled, Amazon EMR continuously monitors Spark executors and intermediate shuffle data locations. Using this information, Amazon EMR scales down only underutilized instances that don't contain actively used shuffle data. This prevents recomputation of lost shuffle data, helping to lower cost and improve job performance. For more information, see the [Spark Programming Guide](https://spark.apache.org/docs/latest/rdd-programming-guide.html#shuffle-operations). | 5.34.0 and higher, 6.4.0 and higher | 

# Configure managed scaling for Amazon EMR
<a name="managed-scaling-configure"></a>

The following sections explain how to launch an EMR cluster that uses managed scaling with the AWS Management Console, the AWS Command Line Interface, or the AWS SDK for Java.

**Topics**
+ [Use the AWS Management Console to configure managed scaling](#managed-scaling-console)
+ [Use the AWS CLI to configure managed scaling](#managed-scaling-cli)
+ [Use AWS SDK for Java to configure managed scaling](#managed-scaling-sdk)

## Use the AWS Management Console to configure managed scaling
<a name="managed-scaling-console"></a>

You can use the Amazon EMR console to configure managed scaling when you create a cluster or to change a managed scaling policy for a running cluster.

------
#### [ Console ]

**To configure managed scaling when you create a cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. Choose an Amazon EMR release **emr-5.30.0** or later, except version **emr-6.0.0**. 

1. Under **Cluster scaling and provisioning option**, choose **Use EMR-managed scaling**. Specify the **Minimum** and **Maximum** number of instances, the **Maximum core node** instances, and the **Maximum On-Demand** instances.

1. Choose any other options that apply to your cluster. 

1. To launch your cluster, choose **Create cluster**.

**To configure managed scaling on an existing cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and select the cluster that you want to update.

1. On the **Instances** tab of the cluster details page, find the **Instance group settings** section. Select **Edit cluster scaling** to specify new values for the **Minimum** and **Maximum** number of instances and the **On-Demand** limit.

------

## Use the AWS CLI to configure managed scaling
<a name="managed-scaling-cli"></a>

You can use AWS CLI commands for Amazon EMR to configure managed scaling when you create a cluster. You can use a shorthand syntax, specifying the JSON configuration inline within the relevant commands, or you can reference a file containing the configuration JSON. You can also apply a managed scaling policy to an existing cluster and remove a managed scaling policy that was previously applied. In addition, you can retrieve details of a scaling policy configuration from a running cluster.

**Enabling Managed Scaling During Cluster Launch**

You can enable managed scaling during cluster launch as the following example demonstrates.

```
aws emr create-cluster \
 --service-role EMR_DefaultRole \
 --release-label emr-7.12.0 \
 --name EMR_Managed_Scaling_Enabled_Cluster \
 --applications Name=Spark Name=Hbase \
 --ec2-attributes KeyName=keyName,InstanceProfile=EMR_EC2_DefaultRole \
 --instance-groups InstanceType=m4.xlarge,InstanceGroupType=MASTER,InstanceCount=1 InstanceType=m4.xlarge,InstanceGroupType=CORE,InstanceCount=2 \
 --region us-east-1 \
 --managed-scaling-policy ComputeLimits='{MinimumCapacityUnits=2,MaximumCapacityUnits=4,UnitType=Instances}'
```

When you use `create-cluster`, you can specify the managed scaling policy inline with the `--managed-scaling-policy` option, as shown in the preceding example, or reference a JSON file that contains the policy configuration. 

**Applying a Managed Scaling Policy to an Existing Cluster**

You can apply a managed scaling policy to an existing cluster as the following example demonstrates.

```
aws emr put-managed-scaling-policy \
--cluster-id j-123456 \
--managed-scaling-policy ComputeLimits='{MinimumCapacityUnits=1,MaximumCapacityUnits=10,MaximumOnDemandCapacityUnits=10,UnitType=Instances}'
```

You can also specify the managed scaling policy in a JSON file. The following example references a JSON file, `managedscaleconfig.json`, that specifies the managed scaling policy configuration.

```
aws emr put-managed-scaling-policy --cluster-id j-123456 --managed-scaling-policy file://./managedscaleconfig.json
```

The following example shows the contents of the `managedscaleconfig.json` file, which defines the managed scaling policy.

```
{
    "ComputeLimits": {
        "UnitType": "Instances",
        "MinimumCapacityUnits": 1,
        "MaximumCapacityUnits": 10,
        "MaximumOnDemandCapacityUnits": 10
    }
}
```

**Retrieving a Managed Scaling Policy Configuration**

The `GetManagedScalingPolicy` command retrieves the policy configuration. For example, the following command retrieves the configuration for the cluster with a cluster ID of `j-123456`.

```
aws emr get-managed-scaling-policy --cluster-id j-123456
```

The command produces the following example output.

```
{
    "ManagedScalingPolicy": {
        "ComputeLimits": {
            "MinimumCapacityUnits": 1,
            "MaximumOnDemandCapacityUnits": 10,
            "MaximumCapacityUnits": 10,
            "UnitType": "Instances"
        }
    }
}
```

For more information about using Amazon EMR commands in the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/emr](https://docs.aws.amazon.com/cli/latest/reference/emr).

**Removing a Managed Scaling Policy**

The `RemoveManagedScalingPolicy` command removes the policy configuration. For example, the following command removes the configuration for the cluster with a cluster ID of `j-123456`.

```
aws emr remove-managed-scaling-policy --cluster-id j-123456
```

## Use AWS SDK for Java to configure managed scaling
<a name="managed-scaling-sdk"></a>

The following program excerpt shows how to configure managed scaling using the AWS SDK for Java:

```
package com.amazonaws.emr.sample;

import java.util.ArrayList;
import java.util.List;

import com.amazonaws.AmazonClientException;
import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.Application;
import com.amazonaws.services.elasticmapreduce.model.ComputeLimits;
import com.amazonaws.services.elasticmapreduce.model.ComputeLimitsUnitType;
import com.amazonaws.services.elasticmapreduce.model.InstanceGroupConfig;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.ManagedScalingPolicy;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;

public class CreateClusterWithManagedScalingWithIG {

	public static void main(String[] args) {
		AWSCredentials credentialsFromProfile = getCredentials("AWS-Profile-Name-Here");
		
		/**
		 * Create an Amazon EMR client with the credentials and region specified in order to create the cluster
		 */
		AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.standard()
			.withCredentials(new AWSStaticCredentialsProvider(credentialsFromProfile))
			.withRegion(Regions.US_EAST_1)
			.build();
		
		/**
		 * Create Instance Groups - Primary, Core, Task
		 */
		InstanceGroupConfig instanceGroupConfigMaster = new InstanceGroupConfig()
				.withInstanceCount(1)
				.withInstanceRole("MASTER")
				.withInstanceType("m4.large")
				.withMarket("ON_DEMAND"); 
				
		InstanceGroupConfig instanceGroupConfigCore = new InstanceGroupConfig()
			.withInstanceCount(4)
			.withInstanceRole("CORE")
			.withInstanceType("m4.large")
			.withMarket("ON_DEMAND");
			
		InstanceGroupConfig instanceGroupConfigTask = new InstanceGroupConfig()
			.withInstanceCount(5)
			.withInstanceRole("TASK")
			.withInstanceType("m4.large")
			.withMarket("ON_DEMAND");

		List<InstanceGroupConfig> igConfigs = new ArrayList<>();
		igConfigs.add(instanceGroupConfigMaster);
		igConfigs.add(instanceGroupConfigCore);
		igConfigs.add(instanceGroupConfigTask);
		
        /**
         *  specify applications to be installed and configured when Amazon EMR creates the cluster
         */
		Application hive = new Application().withName("Hive");
		Application spark = new Application().withName("Spark");
		Application ganglia = new Application().withName("Ganglia");
		Application zeppelin = new Application().withName("Zeppelin");
		
		/** 
		 * Managed Scaling Configuration - 
         * Using UnitType=Instances for clusters composed of instance groups
		 *
         * Other options are: 
         * UnitType = VCPU ( for clusters composed of instance groups)
         * UnitType = InstanceFleetUnits ( for clusters composed of instance fleets)
         **/
		ComputeLimits computeLimits = new ComputeLimits()
				.withMinimumCapacityUnits(1)
				.withMaximumCapacityUnits(20)
				.withUnitType(ComputeLimitsUnitType.Instances);
		
		ManagedScalingPolicy managedScalingPolicy = new ManagedScalingPolicy();
		managedScalingPolicy.setComputeLimits(computeLimits);
		
		// create the cluster with a managed scaling policy
		RunJobFlowRequest request = new RunJobFlowRequest()
	       		.withName("EMR_Managed_Scaling_TestCluster")
	       		.withReleaseLabel("emr-7.12.0")          // Specifies the version label for the Amazon EMR release; we recommend the latest release
	       		.withApplications(hive,spark,ganglia,zeppelin)
	       		.withLogUri("s3://path/to/my/emr/logs")  // A URI in S3 for log files is required when debugging is enabled.
	       		.withServiceRole("EMR_DefaultRole")      // If you use a custom IAM service role, replace the default role with the custom role.
	       		.withJobFlowRole("EMR_EC2_DefaultRole")  // If you use a custom Amazon EMR role for EC2 instance profile, replace the default role with the custom Amazon EMR role.
	       		.withInstances(new JobFlowInstancesConfig().withInstanceGroups(igConfigs)
	       	   		.withEc2SubnetId("subnet-123456789012345")
	           		.withEc2KeyName("my-ec2-key-name") 
	           		.withKeepJobFlowAliveWhenNoSteps(true))    
	       		.withManagedScalingPolicy(managedScalingPolicy);
	   RunJobFlowResult result = emr.runJobFlow(request); 
	   
	   System.out.println("The cluster ID is " + result.getJobFlowId());
	}
	
	public static AWSCredentials getCredentials(String profileName) {
		// uses the named profile in .aws/credentials as the credentials provider
		try {
			return new ProfileCredentialsProvider(profileName)
					.getCredentials(); 
        } catch (Exception e) {
            throw new AmazonClientException(
                    "Cannot load credentials from .aws/credentials file. " +
                    "Make sure that the credentials file exists and that the profile name is defined within it.",
                    e);
        }
	}
	
	public CreateClusterWithManagedScalingWithIG() { }
}
```

# Advanced Scaling for Amazon EMR
<a name="managed-scaling-allocation-strategy-optimized"></a>

Starting with Amazon EMR on EC2 version 7.0, you can leverage Advanced Scaling to control your cluster's resource utilization. Advanced Scaling introduces a utilization-performance scale for tuning your resource utilization and performance level according to your business needs. The value you set determines whether your cluster is weighted more to resource conservation or to scaling up to handle service-level-agreement (SLA) sensitive workloads, where quick completion is critical. When the scaling value is adjusted, managed scaling interprets your intent and intelligently scales to optimize resources. For more information about managed scaling, see [Configure managed scaling for Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-configure.html).

## Advanced Scaling settings
<a name="managed-scaling-allocation-strategy-optimized-strategies"></a>

The value you set for Advanced Scaling optimizes your cluster to your requirements. Values range from **1** to **100**, but only **1**, **25**, **50**, **75**, and **100** are accepted. Setting the index to any other value results in a validation error. 

Scaling values map to resource-utilization strategies. The following list defines several of these:
+ **Utilization optimized [1]** – This setting prevents resource overprovisioning. Use a low value when you want to keep costs low and prioritize efficient resource utilization. The cluster scales up less aggressively. This works well when workload spikes occur regularly and you don't want resources to ramp up too quickly.
+ **Balanced [50]** – This balances resource utilization and job performance. This setting is suitable for steady workloads where most stages have a stable runtime. It's also suitable for workloads with a mix of short and long-running stages. We recommend starting with this setting if you aren't sure which to choose.
+ **Performance optimized [100]** – This strategy prioritizes performance. The cluster scales up aggressively to ensure that jobs complete quickly and meet performance targets. Performance optimized is suitable for service-level-agreement (SLA) sensitive workloads where fast run time is critical.

**Note**  
The intermediate values provide a middle ground between strategies so that you can fine-tune your cluster's Advanced Scaling behavior.

## Benefits of Advanced Scaling
<a name="managed-scaling-allocation-strategy-optimized-benefits"></a>

Because your environment and requirements vary over time, with changing data volumes, cost targets, and SLAs, cluster scaling can help you adjust your cluster configuration to achieve your objectives. Key benefits include:
+ **Enhanced granular control** – The introduction of the utilization-performance setting allows you to easily adjust your cluster's scaling behavior according to your requirements. You can scale up to meet demand for compute resources or scale down to save resources, based on your use patterns.
+ **Improved cost optimization** – You can choose a low utilization value as requirements dictate to more easily meet your cost objectives.

## Getting started with optimization
<a name="managed-scaling-allocation-strategy-optimized-getting-started"></a>

**Setup and configuration**

Use these steps to set the performance index and optimize your scaling strategy.

1. The following command updates an existing cluster with the utilization-optimized `[1]` scaling strategy:

   ```
   aws emr put-managed-scaling-policy --cluster-id 'cluster-id' \
    --managed-scaling-policy '{
     "ComputeLimits": {
       "UnitType": "Instances",
       "MinimumCapacityUnits": 1,
       "MaximumCapacityUnits": 2,
       "MaximumOnDemandCapacityUnits": 2,
       "MaximumCoreCapacityUnits": 2
     },
     "ScalingStrategy": "ADVANCED",
     "UtilizationPerformanceIndex": "1"
   }' \
    --region "region-name"
   ```

   The attributes `ScalingStrategy` and `UtilizationPerformanceIndex` are new and relevant to scaling optimization. You can select different scaling strategies by setting corresponding values (1, 25, 50, 75, and 100) for the `UtilizationPerformanceIndex` attribute in the managed-scaling policy.

1. (Optional) To revert to the default managed-scaling strategy, run the `put-managed-scaling-policy` command without the `ScalingStrategy` and `UtilizationPerformanceIndex` attributes, as in the following example:

   ```
   aws emr put-managed-scaling-policy \
   --cluster-id 'cluster-id' \
   --managed-scaling-policy '{"ComputeLimits":{"UnitType":"Instances","MinimumCapacityUnits":1,"MaximumCapacityUnits":2,"MaximumOnDemandCapacityUnits":2,"MaximumCoreCapacityUnits":2}}' \
   --region "region-name"
   ```
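Only the five listed values are accepted for `UtilizationPerformanceIndex`. As a sketch, a client-side check (the class and method names here are hypothetical, not part of any AWS SDK) can mirror that service-side validation before you call `put-managed-scaling-policy`:

```java
import java.util.Set;

// Hypothetical client-side check mirroring the service-side validation:
// UtilizationPerformanceIndex accepts only 1, 25, 50, 75, and 100.
public class UtilizationIndexValidator {
    private static final Set<Integer> ALLOWED = Set.of(1, 25, 50, 75, 100);

    public static boolean isValidIndex(int index) {
        return ALLOWED.contains(index);
    }

    public static void main(String[] args) {
        System.out.println(isValidIndex(50));  // true
        System.out.println(isValidIndex(60));  // false: the service would return a validation error
    }
}
```

Checking the value locally avoids a round trip that would end in a validation error.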

**Using monitoring metrics to track cluster utilization**

Starting with EMR version 7.3.0, Amazon EMR publishes four new metrics related to memory and virtual CPU. You can use these to measure cluster utilization across scaling strategies. These metrics are available for any use case, but you can use the details provided here for monitoring Advanced Scaling.

Helpful metrics available include the following:
+ **YarnContainersUsedMemoryGBSeconds** – Amount of memory consumed by applications managed by YARN.
+ **YarnContainersTotalMemoryGBSeconds** – Total memory capacity allocated to YARN within the cluster.
+ **YarnNodesUsedVCPUSeconds** – VCPU seconds consumed on nodes managed by YARN.
+ **YarnNodesTotalVCPUSeconds** – Aggregate VCPU seconds available across YARN nodes, including time windows when YARN is not ready.

You can analyze resource metrics using Amazon CloudWatch Logs Insights, which provides a purpose-built query language for extracting metrics specific to resource use and scaling.

The following query, which you can run in the Amazon CloudWatch console, uses metric math to calculate the average memory utilization (e1) by dividing the running sum of consumed memory (e2) by the running sum of total memory (e3):

```
{
    "metrics": [
        [ { "expression": "e2/e3", "label": "Average Mem Utilization", "id": "e1", "yAxis": "right" } ],
        [ { "expression": "RUNNING_SUM(m1)", "label": "RunningTotal-YarnContainersUsedMemoryGBSeconds", "id": "e2", "visible": false } ],
        [ { "expression": "RUNNING_SUM(m2)", "label": "RunningTotal-YarnContainersTotalMemoryGBSeconds", "id": "e3", "visible": false } ],
        [ "AWS_EMR_ManagedResize", "YarnContainersUsedMemoryGBSeconds", "ACCOUNT_ID", "793684541905", "COMPONENT", "ManagerService", "JOB_FLOW_ID", "cluster-id", { "id": "m1", "label": "YarnContainersUsedMemoryGBSeconds" } ],
        [ ".", "YarnContainersTotalMemoryGBSeconds", ".", ".", ".", ".", ".", ".", { "id": "m2", "label": "YarnContainersTotalMemoryGBSeconds" } ]
    ],
    "view": "timeSeries",
    "stacked": false,
    "region": "region",
    "period": 60,
    "stat": "Sum",
    "title": "Memory Utilization"
}
```
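The same running-sum calculation can be reproduced locally. The following sketch (the class name and sample values are illustrative, not real cluster data) computes `e1 = RUNNING_SUM(m1) / RUNNING_SUM(m2)` from per-minute datapoints:

```java
// Sketch of the metric-math expression above, computed locally:
// average utilization = running sum of used GB-seconds / running sum of total GB-seconds.
public class MemoryUtilization {
    /** Returns the running-sum average utilization after each period. */
    public static double[] averageUtilization(double[] usedGBSeconds, double[] totalGBSeconds) {
        double[] out = new double[usedGBSeconds.length];
        double usedSum = 0, totalSum = 0;
        for (int i = 0; i < usedGBSeconds.length; i++) {
            usedSum += usedGBSeconds[i];
            totalSum += totalGBSeconds[i];
            out[i] = totalSum == 0 ? 0 : usedSum / totalSum;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] used  = {120, 180, 240};   // YarnContainersUsedMemoryGBSeconds per minute (sample)
        double[] total = {600, 600, 600};   // YarnContainersTotalMemoryGBSeconds per minute (sample)
        for (double u : averageUtilization(used, total)) {
            System.out.printf("%.2f%n", u); // 0.20, 0.25, 0.30
        }
    }
}
```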

To query logs, you can select CloudWatch in the AWS console. For more information about writing queries for CloudWatch, see [Analyzing log data with CloudWatch Logs Insights](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/AnalyzingLogData.html) in the Amazon CloudWatch Logs User Guide.

The following image shows these metrics for a sample cluster:

![\[Graph that shows utilization statistics.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/scaling_graph_EMR.png)


## Considerations and limitations
<a name="managed-scaling-allocation-strategy-optimized-considerations"></a>
+ The effectiveness of scaling strategies might vary, depending on your unique workload characteristics and cluster configuration. We encourage you to experiment with the scaling setting to determine an optimal index value for your use case.
+ Amazon EMR Advanced Scaling is particularly well suited for batch workloads. For SQL/data-warehousing and streaming workloads, we recommend using the default managed-scaling strategy for optimal performance.
+ Amazon EMR Advanced Scaling is not supported when Node Label Configurations are enabled in the cluster. If both Advanced Scaling and Node Label Configurations are enabled in a cluster, scaling behaves as if the default managed scaling setting were enabled.
+ The performance-optimized scaling strategy enables faster job execution by maintaining high compute resources for a longer period than the default managed-scaling strategy. This mode prioritizes quickly scaling up to meet resource demands, resulting in quicker job completion. This might result in higher costs when compared with the default strategy.
+ In cases where the cluster is already optimized and fully utilized, enabling Advanced Scaling might not provide additional benefits. In some situations, enabling Advanced Scaling might lead to increased costs as workloads may run longer. In these cases, we recommend using the default managed-scaling strategy to ensure optimal resource allocation and cost efficiency.
+ In the context of managed scaling, the emphasis shifts towards resource utilization over execution time as the setting is adjusted from performance-optimized [**100**] to utilization-optimized [**1**]. However, it is important to note that the outcomes might vary, based on the nature of the workload and the cluster's topology. To ensure optimal results for your use case, we strongly recommend testing the scaling strategies with your workloads to determine the most suitable setting.
+ The `UtilizationPerformanceIndex` attribute accepts only the following values:
  + **1**
  + **25**
  + **50**
  + **75**
  + **100**

  Any other values submitted result in a validation error.

# Understanding Amazon EMR node allocation strategy and scenarios
<a name="managed-scaling-allocation-strategy"></a>

This section gives an overview of node allocation strategy and common scaling scenarios that you can use with Amazon EMR managed scaling. 

## Node allocation strategy
<a name="node-allocation-strategy"></a>

Amazon EMR managed scaling allocates core and task nodes based on the following scale-up and scale-down strategies: 

**Scale-up strategy**
+ For Amazon EMR releases 7.2 and higher, managed scaling first adds nodes based on node labels and the application process restriction YARN property. 
+ For Amazon EMR releases 7.2 and higher, if you enabled node labels and restricted application processes to `CORE` nodes, Amazon EMR managed scaling scales up core nodes and task nodes if application process demand increases and executor demand increases. Similarly, if you enabled node labels and restricted application processes to `ON_DEMAND` nodes, managed scaling scales up on-demand nodes if application process demand increases and scales up spot nodes if executor demand increases.
+ If node labels aren't enabled, application process placement isn't restricted to any node or market type.
+ By using node labels, managed scaling can scale up and scale down different instance groups and instance fleets in the same resize operation. For example, consider a scenario in which `instance_group1` has an `ON_DEMAND` node, `instance_group2` has a `SPOT` node, node labels are enabled, and application processes are restricted to nodes with the `ON_DEMAND` label. If application process demand decreases and executor demand increases, managed scaling scales down `instance_group1` and scales up `instance_group2`. 
+ When Amazon EMR experiences a delay in scale-up with the current instance group, clusters that use managed scaling automatically switch to a different task instance group.
+ If the `MaximumCoreCapacityUnits` parameter is set, then Amazon EMR scales core nodes until the core units reach the maximum allowed limit. All the remaining capacity is added to task nodes. 
+ If the `MaximumOnDemandCapacityUnits` parameter is set, then Amazon EMR scales the cluster by using the On-Demand Instances until the On-Demand units reach the maximum allowed limit. All the remaining capacity is added using Spot Instances. 
+ If both the `MaximumCoreCapacityUnits` and `MaximumOnDemandCapacityUnits` parameters are set, Amazon EMR considers both limits during scaling. 

  For example, if the `MaximumCoreCapacityUnits` is less than `MaximumOnDemandCapacityUnits`, Amazon EMR first scales core nodes until the core capacity limit is reached. For the remaining capacity, Amazon EMR first uses On-Demand Instances to scale task nodes until the On-Demand limit is reached, and then uses Spot Instances for task nodes. 
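A minimal sketch of this capacity split, assuming a fixed target capacity and both limits set (an illustration of the documented rules with hypothetical names, not the actual managed scaling implementation):

```java
// Illustrative split of a target capacity into core/task and On-Demand/Spot units,
// following the scale-up rules described above.
public class CapacitySplit {
    /** Returns {core, task, onDemand, spot} unit counts for the given target. */
    public static int[] split(int target, int maxCoreCapacityUnits, int maxOnDemandCapacityUnits) {
        int core = Math.min(target, maxCoreCapacityUnits);          // core nodes up to the core limit
        int task = target - core;                                   // remainder goes to task nodes
        int onDemand = Math.min(target, maxOnDemandCapacityUnits);  // On-Demand units up to the On-Demand limit
        int spot = target - onDemand;                               // remainder uses Spot Instances
        return new int[] { core, task, onDemand, spot };
    }

    public static void main(String[] args) {
        // Target of 20 units with MaximumCoreCapacityUnits=5 and MaximumOnDemandCapacityUnits=10
        // yields 5 core + 15 task units, of which 10 are On-Demand and 10 are Spot.
        int[] s = split(20, 5, 10);
        System.out.println(s[0] + " core, " + s[1] + " task, "
                + s[2] + " On-Demand, " + s[3] + " Spot");
    }
}
```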

**Scale-down strategy**
+ Similar to the scale-up strategy, Amazon EMR removes nodes based on node labels. For more information about node labels, see [Understand node types: primary, core, and task nodes](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-master-core-task-nodes.html).
+ If you haven't enabled node labels, managed scaling removes task nodes and then removes core nodes until it achieves the desired scale-down target capacity. Managed scaling never scales down the cluster below the minimum constraints specified in the managed scaling policy. 
+ Amazon EMR versions 5.34.0 and higher, and Amazon EMR versions 6.4.0 and higher, support Spark shuffle data awareness, which prevents an instance from scaling down while Managed Scaling is aware of existing shuffle data. For more information on shuffle operations, see the [Spark Programming Guide](https://spark.apache.org/docs/latest/rdd-programming-guide.html#shuffle-operations). Managed Scaling makes a best effort to prevent scaling down nodes with shuffle data from the current and previous stage of any active Spark application, up to a maximum of 30 minutes. This helps minimize unintended shuffle data loss, avoiding the need for job re-attempts and recomputation of intermediate data. However, prevention of shuffle data loss is not guaranteed. For improved protection, we recommend enabling improved Spark shuffle protection on clusters with release label 7.4.0 or higher by adding the following flags to the cluster configuration.
  + If either the `yarn.nodemanager.shuffledata-monitor.interval-ms` flag (default 30000 ms) or the `spark.dynamicAllocation.executorIdleTimeout` (default 60 sec) has been changed from the default values, ensure the condition `spark.dynamicAllocation.executorIdleTimeout > yarn.nodemanager.shuffledata-monitor.interval-ms` remains `true` by updating the necessary flag.

    ```
    [
    	{
    		"Classification": "yarn-site",
    		"Properties": { 
    		"yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-shuffle-data": "true"
    		}
    	},
    	{
    		"Classification": "spark-defaults",
    		"Properties": {
    		"spark.dynamicAllocation.enabled": "true",
    		"spark.shuffle.service.removeShuffle": "true"
    		}
    	}
    ]
    ```
+ For clusters that are launched with Amazon EMR 5.x releases 5.34.0 and higher, and 6.x releases 6.4.0 and higher, Amazon EMR Managed Scaling doesn’t scale down nodes that have `ApplicationMaster` for Apache Spark, if there are active stages in the applications running on them. This minimizes job failures and retries, which helps to improve job performance and reduce costs. To confirm which nodes in your cluster are running `ApplicationMaster`, visit the Spark History Server and filter for the driver under the **Executors** tab of your Spark application ID.
+ While the intelligent scaling with EMR Managed Scaling minimizes shuffle data loss for Spark, there can be instances when transient shuffle data might not be protected during a scale-down. To provide enhanced resiliency of shuffle data during scale-down, we recommend enabling **Graceful Decommissioning for Shuffle Data** in YARN. When **Graceful Decommissioning for Shuffle Data** is enabled in YARN, nodes selected for scale-down that have shuffle data will enter the **Decommissioning** state and continue to serve shuffle files. The YARN ResourceManager waits until nodes report no shuffle files present before removing the nodes from the cluster.
  + Amazon EMR version 6.11.0 and higher support Yarn-based graceful decommissioning for **Hive** shuffle data for both the Tez and MapReduce Shuffle Handlers.
    + Enable Graceful Decommissioning for Shuffle Data by setting `yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-shuffle-data` to `true`.
  + Amazon EMR version 7.4.0 and higher support Yarn-based graceful decommissioning for Spark shuffle data when the external shuffle service is enabled (enabled by default in EMR on EC2).
    + The default behavior of the Spark external shuffle service, when running Spark on Yarn, is for the Yarn NodeManager to remove application shuffle files at time of application termination. This may have an impact on the speed of node decommissioning and compute utilization. For long running applications, consider setting `spark.shuffle.service.removeShuffle` to `true` to remove shuffle files no longer in use to enable faster decommissioning of nodes with no active shuffle data.
  + To minimize Spark shuffle data loss in Amazon EMR version 7.4.0 and higher, consider setting the following flags.
    + If either the `yarn.nodemanager.shuffledata-monitor.interval-ms` flag (default 30000 ms) or the `spark.dynamicAllocation.executorIdleTimeout` (default 60 sec) has been changed from the default values, ensure that the condition `spark.dynamicAllocation.executorIdleTimeout > yarn.nodemanager.shuffledata-monitor.interval-ms` remains `true` by updating the necessary flag.

      ```
      [
      	{
      		"Classification": "yarn-site",
      		"Properties": { 
      		"yarn.resourcemanager.decommissioning-nodes-watcher.wait-for-shuffle-data": "true"
      		}
      	},
      	{
      		"Classification": "spark-defaults",
      		"Properties": {
      		"spark.dynamicAllocation.enabled": "true",
      		"spark.shuffle.service.removeShuffle": "true"
      		}
      	}
      ]
      ```
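Note that the two properties in the condition `spark.dynamicAllocation.executorIdleTimeout > yarn.nodemanager.shuffledata-monitor.interval-ms` use different units (seconds versus milliseconds). A small hypothetical helper makes the comparison explicit:

```java
// Checks the documented invariant
//   spark.dynamicAllocation.executorIdleTimeout > yarn.nodemanager.shuffledata-monitor.interval-ms,
// converting the Spark timeout (seconds) to milliseconds before comparing.
public class ShuffleConfigCheck {
    public static boolean invariantHolds(long executorIdleTimeoutSec, long shuffleMonitorIntervalMs) {
        return executorIdleTimeoutSec * 1000L > shuffleMonitorIntervalMs;
    }

    public static void main(String[] args) {
        System.out.println(invariantHolds(60, 30000));  // defaults: 60 s > 30000 ms, prints true
        System.out.println(invariantHolds(20, 30000));  // 20 s = 20000 ms, prints false: adjust a flag
    }
}
```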

If the cluster does not have any load, then Amazon EMR cancels the addition of new instances from a previous evaluation and performs scale-down operations. If the cluster has a heavy load, Amazon EMR cancels the removal of instances and performs scale-up operations.

## Node allocation considerations
<a name="node-allocation-considerations"></a>

We recommend that you use the On-Demand purchasing option for core nodes to avoid HDFS data loss in case of Spot reclamation. You can use the Spot purchasing option for task nodes to reduce costs and get faster job execution when more Spot Instances are added to task nodes.

## Node allocation scenarios
<a name="node-allocation-scenarios"></a>

You can create various scaling scenarios based on your needs by setting up the Maximum, Minimum, On-Demand limit, and Maximum core node parameters in different combinations. 

**Scenario 1: Scale Core Nodes Only**

To scale core nodes only, the managed scaling parameters must meet the following requirements: 
+ The On-Demand limit is equal to the maximum boundary.
+ The maximum core node is equal to the maximum boundary. 

When the On-Demand limit and the maximum core node parameters are not specified, both parameters default to the maximum boundary. 

This scenario isn't applicable if you use managed scaling with node labels and restrict your application processes to only run on `CORE` nodes, because managed scaling scales task nodes to accommodate executor demand.

The following examples demonstrate the scenario of scaling core nodes only.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

**Scenario 2: Scale task nodes only**

To scale task nodes only, the managed scaling parameters must meet the following requirement: 
+ The maximum core node must be equal to the minimum boundary.

The following examples demonstrate the scenario of scaling task nodes only.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

**Scenario 3: Only On-Demand Instances in the cluster**

To have On-Demand Instances only, your cluster and the managed scaling parameters must meet the following requirement: 
+ The On-Demand limit is equal to the maximum boundary. 

  When the On-Demand limit is not specified, the parameter value defaults to the maximum boundary. The default value indicates that Amazon EMR scales On-Demand Instances only. 

If the maximum core node is less than the maximum boundary, the maximum core node parameter can be used to split capacity allocation between core and task nodes. 

To enable this scenario in a cluster composed of instance groups, all node groups in the cluster must use the On-Demand market type during initial configuration. 

This scenario is not applicable if you use managed scaling with node labels and restrict your application processes to only run on `ON_DEMAND` nodes, because managed scaling scales `Spot` nodes to accommodate executor demand.

The following examples demonstrate the scenario of having On-Demand Instances in the entire cluster.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

**Scenario 4: Only Spot Instances in the cluster**

To have Spot Instances only, the managed scaling parameters must meet the following requirement: 
+ On-Demand limit is set to 0.

If the maximum core node is less than the maximum boundary, the maximum core node parameter can be used to split capacity allocation between core and task nodes.

To enable this scenario in a cluster composed of instance groups, the core instance group must use the Spot purchasing option during initial configuration. If there is no Spot Instance in the task instance group, Amazon EMR managed scaling creates a task group using Spot Instances when needed. 

This scenario isn't applicable if you use managed scaling with node labels and restrict your application processes to only run on `ON_DEMAND` nodes, because managed scaling scales `ON_DEMAND` nodes to accommodate application process demand.

The following examples demonstrate the scenario of having Spot Instances in the entire cluster.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

**Scenario 5: Scale On-Demand Instances on core nodes and Spot Instances on task nodes**

To scale On-Demand Instances on core nodes and Spot Instances on task nodes, the managed scaling parameters must meet the following requirements: 
+ The On-Demand limit must be equal to the maximum core node.
+ Both the On-Demand limit and the maximum core node must be less than the maximum boundary.

To enable this scenario in a cluster composed of instance groups, the core node group must use the On-Demand purchasing option.

This scenario isn't applicable if you use managed scaling with node labels and restrict your application processes to only run on `ON_DEMAND` nodes or `CORE` nodes. 

The following examples demonstrate the scenario of scaling On-Demand Instances on core nodes and Spot Instances on task nodes.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)
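The parameter conditions for scenarios 1 through 5 can be restated as simple predicates. The following sketch uses hypothetical names; `min` and `max` are the minimum and maximum boundaries, `odLimit` is the On-Demand limit, and `maxCore` is the maximum core node parameter:

```java
// Hypothetical predicates restating the managed scaling parameter conditions
// for scenarios 1-5. Note scenario 1 is a special case of scenario 3.
public class ScalingScenarios {
    public static boolean scaleCoreOnly(int max, int odLimit, int maxCore) {
        return odLimit == max && maxCore == max;      // Scenario 1
    }
    public static boolean scaleTaskOnly(int min, int maxCore) {
        return maxCore == min;                        // Scenario 2
    }
    public static boolean onDemandOnly(int max, int odLimit) {
        return odLimit == max;                        // Scenario 3
    }
    public static boolean spotOnly(int odLimit) {
        return odLimit == 0;                          // Scenario 4
    }
    public static boolean onDemandCoreSpotTask(int max, int odLimit, int maxCore) {
        return odLimit == maxCore && odLimit < max;   // Scenario 5
    }

    public static void main(String[] args) {
        // Maximum boundary 20, On-Demand limit 5, maximum core node 5: Scenario 5.
        System.out.println(onDemandCoreSpotTask(20, 5, 5));  // true
    }
}
```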

**Scenario 6: Scale `CORE` instances for application process demand and `TASK` instances for executor demand.**

This scenario is only applicable if you use managed scaling with node labels and restrict application processes to only run on `CORE` nodes.

To scale `CORE` nodes based on application process demand and `TASK` nodes based on executor demand, you must set the following configurations at cluster launch:
+  `yarn.node-labels.enabled:true` 
+  `yarn.node-labels.am.default-node-label-expression: 'CORE'` 

If you don't specify the `ON_DEMAND` limit and the maximum `CORE` node parameters, both parameters default to the maximum boundary.

If the maximum `ON_DEMAND` node is less than the maximum boundary, managed scaling uses the maximum `ON_DEMAND` node parameter to split capacity allocation between `ON_DEMAND` and `SPOT` nodes. If you set the maximum `CORE` node parameter to less than or equal to the minimum capacity parameter, `CORE` nodes remain static at the maximum core capacity.

The following examples demonstrate the scenario of scaling CORE instances based on application process demand and TASK instances based on executor demand.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

**Scenario 7: Scale `ON_DEMAND` instances for application process demand and `SPOT` instances for executor demand.**

This scenario is only applicable if you use managed scaling with node labels and restrict application processes to only run on `ON_DEMAND` nodes.

To scale `ON_DEMAND` nodes based on application process demand and `SPOT` nodes based on executor demand, you must set the following configurations at cluster launch:
+  `yarn.node-labels.enabled:true` 
+  `yarn.node-labels.am.default-node-label-expression: 'ON_DEMAND'` 

If you don't specify the `ON_DEMAND` limit and the maximum `CORE` node parameters, both parameters default to the maximum boundary.

If the maximum `CORE` node is less than the maximum boundary, managed scaling uses the maximum `CORE` node parameter to split capacity allocation between `CORE` and `TASK` nodes. If you set the maximum `CORE` node parameter to less than or equal to the minimum capacity parameter, `CORE` nodes remain static at the maximum core capacity.

The following examples demonstrate the scenario of scaling On-Demand Instances based on application process demand and Spot instances based on executor demand.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-allocation-strategy.html)

# Understanding managed scaling metrics in Amazon EMR
<a name="managed-scaling-metrics"></a>

Amazon EMR publishes high-resolution metrics with data at a one-minute granularity when managed scaling is enabled for a cluster. You can view events on every resize initiation and completion controlled by managed scaling with the Amazon EMR console or the Amazon CloudWatch console. CloudWatch metrics are critical for Amazon EMR managed scaling to operate. We recommend that you closely monitor CloudWatch metrics to make sure data is not missing. For more information about how you can configure CloudWatch alarms to detect missing metrics, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com//AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html). For more information about using CloudWatch events with Amazon EMR, see [Monitor CloudWatch events](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-manage-cloudwatch-events.html).

The following metrics indicate the current or target capacities of a cluster. These metrics are only available when managed scaling is enabled. For clusters composed of instance fleets, the cluster capacity metrics are measured in `Units`. For clusters composed of instance groups, the cluster capacity metrics are measured in `Nodes` or `vCPU` based on the unit type used in the managed scaling policy. 


| Metric | Description | 
| --- | --- | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The target total number of units/nodes/vCPUs in a cluster as determined by managed scaling. Units: *Count*  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The current total number of units/nodes/vCPUs available in a running cluster. When a cluster resize is requested, this metric will be updated after the new instances are added or removed from the cluster. Units: *Count*  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The target number of CORE units/nodes/vCPUs in a cluster as determined by managed scaling. Units: *Count*  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The current number of CORE units/nodes/vCPUs running in a cluster. Units: *Count*  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The target number of TASK units/nodes/vCPUs in a cluster as determined by managed scaling. Units: *Count*  | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/managed-scaling-metrics.html)  |  The current number of TASK units/nodes/vCPUs running in a cluster. Units: *Count*  | 

The following metrics indicate the usage status of cluster and applications. These metrics are available for all Amazon EMR features, but are published at a higher resolution with data at a one-minute granularity when managed scaling is enabled for a cluster. You can correlate the following metrics with the cluster capacity metrics in the previous table to understand the managed scaling decisions. 


| Metric | Description | 
| --- | --- | 
|  `AppsCompleted`  |  The number of applications submitted to YARN that have completed. Use case: Monitor cluster progress Units: *Count*  | 
|  `AppsPending`  |  The number of applications submitted to YARN that are in a pending state. Use case: Monitor cluster progress Units: *Count*  | 
|  `AppsRunning`  |  The number of applications submitted to YARN that are running. Use case: Monitor cluster progress Units: *Count*  | 
| ContainerAllocated |  The number of resource containers allocated by the ResourceManager. Use case: Monitor cluster progress Units: *Count*  | 
|  `ContainerPending`  |  The number of containers in the queue that have not yet been allocated. Use case: Monitor cluster progress Units: *Count*  | 
| ContainerPendingRatio |  The ratio of pending containers to containers allocated (ContainerPendingRatio = ContainerPending / ContainerAllocated). If ContainerAllocated = 0, then ContainerPendingRatio = ContainerPending. The value of ContainerPendingRatio represents a number, not a percentage. This value is useful for scaling cluster resources based on container allocation behavior. Units: *Count*  | 
|  `HDFSUtilization`  |  The percentage of HDFS storage currently used. Use case: Analyze cluster performance Units: *Percent*  | 
|  `IsIdle`  |  Indicates that a cluster is no longer performing work, but is still alive and accruing charges. It is set to 1 if no tasks are running and no jobs are running, and set to 0 otherwise. This value is checked at five-minute intervals and a value of 1 indicates only that the cluster was idle when checked, not that it was idle for the entire five minutes. To avoid false positives, you should raise an alarm when this value has been 1 for more than one consecutive five-minute check. For example, you might raise an alarm on this value if it has been 1 for thirty minutes or longer. Use case: Monitor cluster performance Units: *Boolean*  | 
|  `MemoryAvailableMB`  |  The amount of memory available to be allocated. Use case: Monitor cluster progress Units: *Count*  | 
|  `MRActiveNodes`  |  The number of nodes presently running MapReduce tasks or jobs. Equivalent to YARN metric `mapred.resourcemanager.NoOfActiveNodes`. Use case: Monitor cluster progress Units: *Count*  | 
|  `YARNMemoryAvailablePercentage`  |  The percentage of remaining memory available to YARN (YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB). This value is useful for scaling cluster resources based on YARN memory usage. Units: *Percent*  | 
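
The derived metrics in this table are simple ratios, so you can sanity-check dashboard values by hand. The following shell sketch recomputes `ContainerPendingRatio` and `YARNMemoryAvailablePercentage` from hypothetical sample values (the numbers are illustrative, not from a real cluster):

```
# Hypothetical sample values, for illustration only.
container_pending=12
container_allocated=48
memory_available_mb=24576
memory_total_mb=122880

# ContainerPendingRatio = ContainerPending / ContainerAllocated;
# if ContainerAllocated is 0, the ratio is reported as ContainerPending.
if [ "$container_allocated" -eq 0 ]; then
  pending_ratio=$container_pending
else
  pending_ratio=$(awk "BEGIN { printf \"%.2f\", $container_pending / $container_allocated }")
fi

# YARNMemoryAvailablePercentage = MemoryAvailableMB / MemoryTotalMB, as a percent.
yarn_mem_pct=$(awk "BEGIN { printf \"%.0f\", 100 * $memory_available_mb / $memory_total_mb }")

echo "ContainerPendingRatio: $pending_ratio"
echo "YARNMemoryAvailablePercentage: $yarn_mem_pct"
```

With these sample values, 12 pending containers against 48 allocated gives a ratio of 0.25, and 24576 MB available out of 122880 MB total gives 20 percent available YARN memory.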

The following metrics provide information about resources used by YARN containers and nodes. These metrics from the YARN resource manager offer insights into the resources used by containers and nodes running in the cluster. Comparing these metrics to the previous table’s cluster capacity metrics provides a clearer picture of the impact of managed scaling:


| Metric | Associated releases | Description | 
| --- | --- | --- | 
|  `YarnContainersUsedMemoryGBSeconds`  |  Available to release label 7.3.0 and higher  |  The consumed container memory, in GB-seconds, for the publishing period. **Units:** GB-seconds  | 
|  `YarnContainersTotalMemoryGBSeconds`  |  Available to release label 7.3.0 and higher  |  The total YARN container memory, in GB-seconds, for the publishing period. **Units:** GB-seconds  | 
|  `YarnContainersUsedVCPUSeconds`  |  Available to release label 7.5.0 and higher  |  The consumed container VCPU-seconds for the publishing period. **Units:** VCPU-seconds  | 
|  `YarnContainersTotalVCPUSeconds`  |  Available to release label 7.5.0 and higher  |  The total container VCPU-seconds for the publishing period. **Units:** VCPU-seconds  | 
|  `YarnNodesUsedMemoryGBSeconds`  |  Available to release label 7.5.0 and higher  |  The consumed node memory, in GB-seconds, for the publishing period. **Units:** GB-seconds  | 
|  `YarnNodesTotalMemoryGBSeconds`  |  Available to release label 7.5.0 and higher  |  The total node memory, in GB-seconds, for the publishing period. **Units:** GB-seconds  | 
|  `YarnNodesUsedVCPUSeconds`  |  Available to release label 7.3.0 and higher  |  The consumed node VCPU-seconds for the publishing period. **Units:** VCPU-seconds  | 
|  `YarnNodesTotalVCPUSeconds`  |  Available to release label 7.3.0 and higher  |  The total node VCPU-seconds for the publishing period. **Units:** VCPU-seconds  | 
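
Because each used metric has a matching total metric for the same publishing period, dividing the two yields a utilization percentage. The following shell sketch computes container memory utilization from hypothetical values (the numbers are illustrative only):

```
# Hypothetical values for one publishing period, for illustration only.
used_gb_seconds=1800     # e.g. YarnContainersUsedMemoryGBSeconds
total_gb_seconds=7200    # e.g. YarnContainersTotalMemoryGBSeconds

# Percent of provisioned memory capacity actually consumed by containers.
mem_utilization=$(awk "BEGIN { printf \"%.0f\", 100 * $used_gb_seconds / $total_gb_seconds }")

echo "Container memory utilization: ${mem_utilization}%"
```

A low utilization percentage over many periods suggests the cluster is over-provisioned and managed scaling has room to scale in.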

## Graphing managed scaling metrics
<a name="managed-scaling-graphic"></a>

You can graph metrics to visualize your cluster's workload patterns and corresponding scaling decisions made by Amazon EMR managed scaling as the following steps demonstrate. 

**To graph managed scaling metrics in the CloudWatch console**

1. Open the [CloudWatch console](https://console.aws.amazon.com/cloudwatch/).

1. In the navigation pane, choose **Amazon EMR**. You can search by cluster identifier to find the cluster you want to monitor.

1. Scroll down to the metric to graph. Open a metric to display the graph.

1. To graph one or more metrics, select the check box next to each metric. 

The following example illustrates the Amazon EMR managed scaling activity of a cluster. The graph shows three automatic scale-down periods, which save costs when there is a less active workload. 

![\[Graph managed scaling metrics\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/Managed_Scaling_Decision.png)


All the cluster capacity and usage metrics are published at one-minute intervals. Additional statistical information is associated with each one-minute data point, which allows you to plot functions such as `Percentiles`, `Min`, `Max`, `Sum`, `Average`, and `SampleCount`.

For example, the following graph plots the same `YARNMemoryAvailablePercentage` metric at different percentiles (P10, P50, P90, and P99) along with `Sum`, `Average`, `Min`, and `SampleCount`.

![\[Graph managed scaling metrics with different percentiles\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/Managed_Scaling_Metrics.png)
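
You can also retrieve these statistics outside the console. The following command template is a sketch of pulling the p90 statistic for `YARNMemoryAvailablePercentage` with the AWS CLI; the cluster ID and time range are placeholders that you must replace with your own values:

```
aws cloudwatch get-metric-statistics \
  --namespace AWS/ElasticMapReduce \
  --metric-name YARNMemoryAvailablePercentage \
  --dimensions Name=JobFlowId,Value=j-XXXXXXXXXXXXX \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-01T01:00:00Z \
  --period 60 \
  --extended-statistics p90
```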


# Using automatic scaling with a custom policy for instance groups in Amazon EMR
<a name="emr-automatic-scaling"></a>

Automatic scaling with a custom policy in Amazon EMR releases 4.0 and higher allows you to programmatically scale out and scale in core nodes and task nodes based on a CloudWatch metric and other parameters that you specify in a *scaling policy*. Automatic scaling with a custom policy is available with the instance groups configuration and is not available when you use instance fleets. For more information about instance groups and instance fleets, see [Create an Amazon EMR cluster with instance fleets or uniform instance groups](emr-instance-group-configuration.md).

The scaling policy is part of an instance group configuration. You can specify a policy during initial configuration of an instance group, or by modifying an instance group in an existing cluster, even when that instance group is active. Each instance group in a cluster, except the primary instance group, can have its own scaling policy, which consists of scale-out and scale-in rules. Scale-out and scale-in rules can be configured independently, with different parameters for each rule.

You can configure scaling policies with the AWS Management Console, the AWS CLI, or the Amazon EMR API. When you use the AWS CLI or Amazon EMR API, you specify the scaling policy in JSON format. In addition, when you use the AWS CLI or the Amazon EMR API, you can specify custom CloudWatch metrics. Custom metrics are not available for selection with the AWS Management Console. When you initially create a scaling policy with the console, a default policy suitable for many applications is pre-configured to help you get started. You can delete or modify the default rules.

Even though automatic scaling allows you to adjust EMR cluster capacity on-the-fly, you should still consider baseline workload requirements and plan your node and instance group configurations. For more information, see [Cluster configuration guidelines](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-instances-guidelines.html).

**Note**  
For most workloads, setting up both scale-in and scale-out rules is desirable to optimize resource utilization. Setting either rule without the other means that you need to manually resize the instance count after a scaling activity. In other words, this sets up a "one-way" automatic scale-out or scale-in policy with a manual reset.

## Creating the IAM role for automatic scaling
<a name="emr-automatic-scaling-iam-role"></a>

Automatic scaling in Amazon EMR requires an IAM role with permissions to add and terminate instances when scaling activities are triggered. A default role configured with the appropriate role policy and trust policy, `EMR_AutoScaling_DefaultRole`, is available for this purpose. When you create a cluster with a scaling policy for the first time with the AWS Management Console, Amazon EMR creates the default role and attaches the default managed policy for permissions, `AmazonElasticMapReduceforAutoScalingRole`.

When you create a cluster with an automatic scaling policy with the AWS CLI, you must first ensure that either the default IAM role exists, or that you have a custom IAM role with a policy attached that provides the appropriate permissions. To create the default role, you can run the `create-default-roles` command before you create a cluster. You can then specify the `--auto-scaling-role EMR_AutoScaling_DefaultRole` option when you create a cluster. Alternatively, you can create a custom automatic scaling role and then specify it when you create a cluster, for example `--auto-scaling-role MyEMRAutoScalingRole`. If you create a customized automatic scaling role for Amazon EMR, we recommend that you base the permissions policies for your custom role on the managed policy. For more information, see [Configure IAM service roles for Amazon EMR permissions to AWS services and resources](emr-iam-roles.md).
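
As a sketch, the command sequence for using the default role looks like the following; the cluster name and release label are illustrative, and the trailing ellipsis stands in for the rest of your cluster configuration:

```
# Create the default Amazon EMR roles, including EMR_AutoScaling_DefaultRole,
# if they do not already exist.
aws emr create-default-roles

# Reference the automatic scaling role when creating the cluster.
aws emr create-cluster --name "MyScalingCluster" \
  --release-label emr-5.2.0 \
  --service-role EMR_DefaultRole \
  --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole \
  --auto-scaling-role EMR_AutoScaling_DefaultRole \
  ...
```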

## Understanding automatic scaling rules
<a name="emr-scaling-rules"></a>

When a scale-out rule triggers a scaling activity for an instance group, Amazon EC2 instances are added to the instance group according to your rules. New nodes can be used by applications such as Apache Spark, Apache Hive, and Presto as soon as the Amazon EC2 instance enters the `InService` state. You can also set up a scale-in rule that terminates instances and removes nodes. For more information about the lifecycle of Amazon EC2 instances that scale automatically, see [Auto Scaling lifecycle](https://docs.aws.amazon.com/autoscaling/ec2/userguide/AutoScalingGroupLifecycle.html) in the *Amazon EC2 Auto Scaling User Guide*.

You can configure how a cluster terminates Amazon EC2 instances. You can choose to either terminate at the Amazon EC2 instance-hour boundary for billing, or upon task completion. This setting applies both to automatic scaling and to manual resizing operations. For more information about this configuration, see [Cluster scale-down options for Amazon EMR clusters](emr-scaledown-behavior.md).

The following parameters for each rule in a policy determine automatic scaling behavior.

**Note**  
The parameters listed here are based on the AWS Management Console for Amazon EMR. When you use the AWS CLI or Amazon EMR API, additional advanced configuration options are available. For more information about advanced options, see [SimpleScalingPolicyConfiguration](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_PutAutoScalingPolicy.html) in the *Amazon EMR API Reference*.
+ Maximum instances and minimum instances. The **Maximum instances** constraint specifies the maximum number of Amazon EC2 instances that can be in the instance group, and applies to all scale-out rules. Similarly, the **Minimum instances** constraint specifies the minimum number of Amazon EC2 instances and applies to all scale-in rules.
+ The **Rule name**, which must be unique within the policy.
+ The **scaling adjustment**, which determines the number of EC2 instances to add (for scale-out rules) or terminate (for scale-in rules) during the scaling activity triggered by the rule. 
+ The **CloudWatch metric**, which is watched for an alarm condition.
+ A **comparison operator**, which is used to compare the CloudWatch metric to the **Threshold** value and determine a trigger condition.
+ An **evaluation period**, in five-minute increments, for which the CloudWatch metric must be in a trigger condition before scaling activity is triggered.
+ A **Cooldown period**, in seconds, which determines the amount of time that must elapse between a scaling activity started by a rule and the start of the next scaling activity, regardless of the rule that triggers it. When an instance group has finished a scaling activity and reached its post-scale state, the cooldown period provides an opportunity for the CloudWatch metrics that might trigger subsequent scaling activities to stabilize. For more information, see [Auto Scaling cooldowns](https://docs.aws.amazon.com/autoscaling/ec2/userguide/Cooldown.html) in the *Amazon EC2 Auto Scaling User Guide*.  
![\[AWS Management Console automatic scaling rule parameters for Amazon EMR.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/auto-scaling-rule-params.png)

## Considerations and limitations
<a name="emr-automatic-scaling-considerations"></a>
+ Amazon CloudWatch metrics are critical for Amazon EMR automatic scaling to operate. We recommend that you closely monitor Amazon CloudWatch metrics to make sure data is not missing. For more information about how you can configure Amazon CloudWatch alarms to detect missing metrics, see [Using Amazon CloudWatch alarms](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/AlarmThatSendsEmail.html).
+ Over-utilization of EBS volumes can cause managed scaling issues. We recommend that you monitor EBS volume usage closely to make sure utilization stays below 90 percent. See [Instance storage](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-storage.html) for information on specifying additional EBS volumes.
+ Automatic scaling with a custom policy in Amazon EMR releases 5.18 to 5.28 may experience scaling failure caused by data intermittently missing in Amazon CloudWatch metrics. We recommend that you use the most recent Amazon EMR versions for improved autoscaling. You can also contact [AWS Support](https://aws.amazon.com/premiumsupport/) for a patch if you need to use an Amazon EMR release between 5.18 and 5.28.

## Using the AWS Management Console to configure automatic scaling
<a name="emr-automatic-scale-console"></a>

When you create a cluster, you configure a scaling policy for instance groups with the advanced cluster configuration options. You can also create or modify a scaling policy for an in-service instance group by modifying instance groups in the **Hardware** settings of an existing cluster.

1. Navigate to the new Amazon EMR console and select **Switch to the old console** from the side navigation. For more information on what to expect when you switch to the old console, see [Using the old console](https://docs.aws.amazon.com/emr/latest/ManagementGuide/whats-new-in-console.html#console-opt-in).

1. If you are creating a cluster, in the Amazon EMR console, select **Create Cluster**, select **Go to advanced options**, choose options for **Step 1: Software and Steps**, and then go to **Step 2: Hardware Configuration**.

   **- or -**

   If you are modifying an instance group in a running cluster, select your cluster from the cluster list, and then expand the **Hardware** section.

1. In the **Cluster scaling and provisioning option** section, select **Enable cluster scaling**. Then select **Create a custom automatic scaling policy**.

   In the table of **Custom automatic scaling policies**, click the pencil icon that appears in the row of the instance group you want to configure. The Auto Scaling Rules screen opens. 

1. Type the **Maximum instances** you want the instance group to contain after it scales out, and type the **Minimum instances** you want the instance group to contain after it scales in.

1. Click the pencil to edit rule parameters, click the **X** to remove a rule from the policy, and click **Add rule** to add additional rules.

1. Choose rule parameters as described earlier in this topic. For descriptions of available CloudWatch metrics for Amazon EMR, see [Amazon EMR metrics and dimensions](https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/emr-metricscollected.html) in the *Amazon CloudWatch User Guide*.

## Using the AWS CLI to configure automatic scaling
<a name="emr-automatic-scale-cli"></a>

You can use AWS CLI commands for Amazon EMR to configure automatic scaling when you create a cluster and when you create an instance group. You can use a shorthand syntax, specifying the JSON configuration inline within the relevant commands, or you can reference a file containing the configuration JSON. You can also apply an automatic scaling policy to an existing instance group and remove an automatic scaling policy that was previously applied. In addition, you can retrieve details of a scaling policy configuration from a running cluster.

**Important**  
When you create a cluster that has an automatic scaling policy, you must use the `--auto-scaling-role MyAutoScalingRole` option to specify the IAM role for automatic scaling. The default role is `EMR_AutoScaling_DefaultRole` and can be created with the `create-default-roles` command. The role can only be added when the cluster is created, and cannot be added to an existing cluster.

For a detailed description of the parameters available when configuring an automatic scaling policy, see [PutAutoScalingPolicy](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_PutAutoScalingPolicy.html) in the *Amazon EMR API Reference*.

### Creating a cluster with an automatic scaling policy applied to an instance group
<a name="emr-autoscale-cli-createcluster"></a>

You can specify an automatic scaling configuration within the `--instance-groups` option of the `aws emr create-cluster` command. The following example illustrates a `create-cluster` command where an automatic scaling policy for the core instance group is provided inline. The command creates a scaling configuration equivalent to the default scale-out policy that appears when you create an automatic scaling policy with the AWS Management Console for Amazon EMR. For brevity, a scale-in policy is not shown. We do not recommend creating a scale-out rule without a scale-in rule.

```
aws emr create-cluster --release-label emr-5.2.0 --service-role EMR_DefaultRole --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole --auto-scaling-role EMR_AutoScaling_DefaultRole  --instance-groups Name=MyMasterIG,InstanceGroupType=MASTER,InstanceType=m5.xlarge,InstanceCount=1 'Name=MyCoreIG,InstanceGroupType=CORE,InstanceType=m5.xlarge,InstanceCount=2,AutoScalingPolicy={Constraints={MinCapacity=2,MaxCapacity=10},Rules=[{Name=Default-scale-out,Description=Replicates the default scale-out rule in the console.,Action={SimpleScalingPolicyConfiguration={AdjustmentType=CHANGE_IN_CAPACITY,ScalingAdjustment=1,CoolDown=300}},Trigger={CloudWatchAlarmDefinition={ComparisonOperator=LESS_THAN,EvaluationPeriods=1,MetricName=YARNMemoryAvailablePercentage,Namespace=AWS/ElasticMapReduce,Period=300,Statistic=AVERAGE,Threshold=15,Unit=PERCENT,Dimensions=[{Key=JobFlowId,Value="${emr.clusterId}"}]}}}]}'				
```

The following command illustrates how to provide the automatic scaling policy definition in an instance group configuration file named `instancegroupconfig.json`.

```
aws emr create-cluster --release-label emr-5.2.0 --service-role EMR_DefaultRole --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole --instance-groups file://your/path/to/instancegroupconfig.json --auto-scaling-role EMR_AutoScaling_DefaultRole								
```

With the contents of the configuration file as follows:

```
[
{
  "InstanceCount": 1,
  "Name": "MyMasterIG",
  "InstanceGroupType": "MASTER",
  "InstanceType": "m5.xlarge"
},
{
  "InstanceCount": 2,
  "Name": "MyCoreIG",
  "InstanceGroupType": "CORE",
  "InstanceType": "m5.xlarge",
  "AutoScalingPolicy":
    {
     "Constraints":
      {
       "MinCapacity": 2,
       "MaxCapacity": 10
      },
     "Rules":
     [
      {
       "Name": "Default-scale-out",
       "Description": "Replicates the default scale-out rule in the console for YARN memory.",
       "Action":{
        "SimpleScalingPolicyConfiguration":{
          "AdjustmentType": "CHANGE_IN_CAPACITY",
          "ScalingAdjustment": 1,
          "CoolDown": 300
        }
       },
       "Trigger":{
        "CloudWatchAlarmDefinition":{
          "ComparisonOperator": "LESS_THAN",
          "EvaluationPeriods": 1,
          "MetricName": "YARNMemoryAvailablePercentage",
          "Namespace": "AWS/ElasticMapReduce",
          "Period": 300,
          "Threshold": 15,
          "Statistic": "AVERAGE",
          "Unit": "PERCENT",
          "Dimensions":[
             {
               "Key" : "JobFlowId",
               "Value" : "${emr.clusterId}"
             }
          ]
        }
       }
      }
     ]
   }
}
]
```

### Adding an instance group with an automatic scaling policy to a cluster
<a name="emr-autoscale-cli-createinstancegroup"></a>

You can specify a scaling policy configuration with the `--instance-groups` option of the `add-instance-groups` command in the same way you can when you use `create-cluster`. The following example uses a reference to a JSON file, `instancegroupconfig.json`, with the instance group configuration.

```
aws emr add-instance-groups --cluster-id j-1EKZ3TYEVF1S2 --instance-groups file://your/path/to/instancegroupconfig.json
```
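
The referenced file can define any supported group type. For example, a minimal sketch of a configuration file that adds a task instance group with its own scaling policy might look like the following; the group name, instance type, capacities, and rule values are illustrative:

```
[
  {
    "InstanceCount": 2,
    "Name": "MyTaskIG",
    "InstanceGroupType": "TASK",
    "InstanceType": "m5.xlarge",
    "AutoScalingPolicy": {
      "Constraints": {
        "MinCapacity": 2,
        "MaxCapacity": 8
      },
      "Rules": [
        {
          "Name": "Task-scale-out",
          "Description": "Add a task node when available YARN memory runs low.",
          "Action": {
            "SimpleScalingPolicyConfiguration": {
              "AdjustmentType": "CHANGE_IN_CAPACITY",
              "ScalingAdjustment": 1,
              "CoolDown": 300
            }
          },
          "Trigger": {
            "CloudWatchAlarmDefinition": {
              "ComparisonOperator": "LESS_THAN",
              "EvaluationPeriods": 1,
              "MetricName": "YARNMemoryAvailablePercentage",
              "Namespace": "AWS/ElasticMapReduce",
              "Period": 300,
              "Statistic": "AVERAGE",
              "Threshold": 15,
              "Unit": "PERCENT",
              "Dimensions": [
                {
                  "Key": "JobFlowId",
                  "Value": "${emr.clusterId}"
                }
              ]
            }
          }
        }
      ]
    }
  }
]
```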

### Applying an automatic scaling policy to an existing instance group or modifying an applied policy
<a name="emr-autoscale-cli-modifyinstancegroup"></a>

Use the `aws emr put-auto-scaling-policy` command to apply an automatic scaling policy to an existing instance group. The instance group must be part of a cluster that uses the automatic scaling IAM role. The following example uses a reference to a JSON file, `autoscaleconfig.json`, that specifies the automatic scaling policy configuration.

```
aws emr put-auto-scaling-policy --cluster-id j-1EKZ3TYEVF1S2 --instance-group-id ig-3PLUZBA6WLS07 --auto-scaling-policy file://your/path/to/autoscaleconfig.json 
```

The contents of the `autoscaleconfig.json` file, which defines the same scale-out rule as shown in the previous example, are shown below.

```
{
          "Constraints": {
                  "MaxCapacity": 10,
                  "MinCapacity": 2
          },
          "Rules": [{
                  "Action": {
                          "SimpleScalingPolicyConfiguration": {
                                  "AdjustmentType": "CHANGE_IN_CAPACITY",
                                  "CoolDown": 300,
                                  "ScalingAdjustment": 1
                          }
                  },
                  "Description": "Replicates the default scale-out rule in the console for YARN memory",
                  "Name": "Default-scale-out",
                  "Trigger": {
                          "CloudWatchAlarmDefinition": {
                                  "ComparisonOperator": "LESS_THAN",
                                  "Dimensions": [{
                                          "Key": "JobFlowId",
                                        "Value": "${emr.clusterId}"
                                  }],
                                  "EvaluationPeriods": 1,
                                  "MetricName": "YARNMemoryAvailablePercentage",
                                  "Namespace": "AWS/ElasticMapReduce",
                                  "Period": 300,
                                  "Statistic": "AVERAGE",
                                  "Threshold": 15,
                                  "Unit": "PERCENT"
                          }
                  }
          }]
  }
```

### Removing an automatic scaling policy from an instance group
<a name="emr-autoscale-cli-removepolicy"></a>

Use the `aws emr remove-auto-scaling-policy` command to remove an automatic scaling policy from an instance group, as the following example shows.

```
aws emr remove-auto-scaling-policy --cluster-id j-1EKZ3TYEVF1S2 --instance-group-id ig-3PLUZBA6WLS07
```

### Retrieving an automatic scaling policy configuration
<a name="emr-autoscale-cli-getpolicy"></a>

The `describe-cluster` command retrieves the policy configuration in the `InstanceGroups` block. For example, the following command retrieves the configuration for the cluster with a cluster ID of `j-1CWOHP4PI30VJ`.

```
aws emr describe-cluster --cluster-id j-1CWOHP4PI30VJ
```

The command produces the following example output.

```
{
    "Cluster": {
        "Configurations": [],
        "Id": "j-1CWOHP4PI30VJ",
        "NormalizedInstanceHours": 48,
        "Name": "Auto Scaling Cluster",
        "ReleaseLabel": "emr-5.2.0",
        "ServiceRole": "EMR_DefaultRole",
        "AutoTerminate": false,
        "TerminationProtected": true,
        "MasterPublicDnsName": "ec2-54-167-31-38.compute-1.amazonaws.com",
        "LogUri": "s3n://aws-logs-232939870606-us-east-1/elasticmapreduce/",
        "Ec2InstanceAttributes": {
            "Ec2KeyName": "performance",
            "AdditionalMasterSecurityGroups": [],
            "AdditionalSlaveSecurityGroups": [],
            "EmrManagedSlaveSecurityGroup": "sg-09fc9362",
            "Ec2AvailabilityZone": "us-east-1d",
            "EmrManagedMasterSecurityGroup": "sg-0bfc9360",
            "IamInstanceProfile": "EMR_EC2_DefaultRole"
        },
        "Applications": [
            {
                "Name": "Hadoop",
                "Version": "2.7.3"
            }
        ],
        "InstanceGroups": [
            {
                "AutoScalingPolicy": {
                    "Status": {
                        "State": "ATTACHED",
                        "StateChangeReason": {
                            "Message": ""
                        }
                    },
                    "Constraints": {
                        "MaxCapacity": 10,
                        "MinCapacity": 2
                    },
                    "Rules": [
                        {
                            "Name": "Default-scale-out",
                            "Trigger": {
                                "CloudWatchAlarmDefinition": {
                                    "MetricName": "YARNMemoryAvailablePercentage",
                                    "Unit": "PERCENT",
                                    "Namespace": "AWS/ElasticMapReduce",
                                    "Threshold": 15,
                                    "Dimensions": [
                                        {
                                            "Key": "JobFlowId",
                                            "Value": "j-1CWOHP4PI30VJ"
                                        }
                                    ],
                                    "EvaluationPeriods": 1,
                                    "Period": 300,
                                    "ComparisonOperator": "LESS_THAN",
                                    "Statistic": "AVERAGE"
                                }
                            },
                            "Description": "",
                            "Action": {
                                "SimpleScalingPolicyConfiguration": {
                                    "CoolDown": 300,
                                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                                    "ScalingAdjustment": 1
                                }
                            }
                        },
                        {
                            "Name": "Default-scale-in",
                            "Trigger": {
                                "CloudWatchAlarmDefinition": {
                                    "MetricName": "YARNMemoryAvailablePercentage",
                                    "Unit": "PERCENT",
                                    "Namespace": "AWS/ElasticMapReduce",
                                    "Threshold": 75,
                                    "Dimensions": [
                                        {
                                            "Key": "JobFlowId",
                                            "Value": "j-1CWOHP4PI30VJ"
                                        }
                                    ],
                                    "EvaluationPeriods": 1,
                                    "Period": 300,
                                    "ComparisonOperator": "GREATER_THAN",
                                    "Statistic": "AVERAGE"
                                }
                            },
                            "Description": "",
                            "Action": {
                                "SimpleScalingPolicyConfiguration": {
                                    "CoolDown": 300,
                                    "AdjustmentType": "CHANGE_IN_CAPACITY",
                                    "ScalingAdjustment": -1
                                }
                            }
                        }
                    ]
                },
                "Configurations": [],
                "InstanceType": "m5.xlarge",
                "Market": "ON_DEMAND",
                "Name": "Core - 2",
                "ShrinkPolicy": {},
                "Status": {
                    "Timeline": {
                        "CreationDateTime": 1479413437.342,
                        "ReadyDateTime": 1479413864.615
                    },
                    "State": "RUNNING",
                    "StateChangeReason": {
                        "Message": ""
                    }
                },
                "RunningInstanceCount": 2,
                "Id": "ig-3M16XBE8C3PH1",
                "InstanceGroupType": "CORE",
                "RequestedInstanceCount": 2,
                "EbsBlockDevices": []
            },
            {
                "Configurations": [],
                "Id": "ig-OP62I28NSE8M",
                "InstanceGroupType": "MASTER",
                "InstanceType": "m5.xlarge",
                "Market": "ON_DEMAND",
                "Name": "Master - 1",
                "ShrinkPolicy": {},
                "EbsBlockDevices": [],
                "RequestedInstanceCount": 1,
                "Status": {
                    "Timeline": {
                        "CreationDateTime": 1479413437.342,
                        "ReadyDateTime": 1479413752.088
                    },
                    "State": "RUNNING",
                    "StateChangeReason": {
                        "Message": ""
                    }
                },
                "RunningInstanceCount": 1
            }
        ],
        "AutoScalingRole": "EMR_AutoScaling_DefaultRole",
        "Tags": [],
        "BootstrapActions": [],
        "Status": {
            "Timeline": {
                "CreationDateTime": 1479413437.339,
                "ReadyDateTime": 1479413863.666
            },
            "State": "WAITING",
            "StateChangeReason": {
                "Message": "Cluster ready after last step completed."
            }
        }
    }
}
```

# Manually resize a running Amazon EMR cluster
<a name="emr-manage-resize"></a>

You can add and remove instances from core and task instance groups and instance fleets in a running cluster with the AWS Management Console, AWS CLI, or the Amazon EMR API. If a cluster uses instance groups, you explicitly change the instance count. If your cluster uses instance fleets, you can change the target units for On-Demand Instances and Spot Instances. The instance fleet then adds and removes instances to meet the new target. For more information, see [Instance fleet options](emr-instance-fleet.md#emr-instance-fleet-options). Applications can use newly provisioned Amazon EC2 instances to host nodes as soon as the instances are available. When instances are removed, Amazon EMR shuts down tasks in a way that does not interrupt jobs and safeguards against data loss. For more information, see [Terminate at task completion](emr-scaledown-behavior.md#emr-scaledown-terminate-task).
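
For an instance-fleet cluster, changing the target units is a single CLI call. The following command template is a sketch; the cluster ID, fleet ID, and target capacities are placeholders that you must replace with your own values:

```
aws emr modify-instance-fleet --cluster-id j-XXXXXXXXXXXXX \
  --instance-fleet InstanceFleetId=if-XXXXXXXXXXXXX,TargetOnDemandCapacity=4,TargetSpotCapacity=6
```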

## Resize a cluster with the console
<a name="resize-console"></a>

You can use the Amazon EMR console to resize a running cluster.

------
#### [ Console ]

**To change the instance count for an existing cluster with the new console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and select the cluster that you want to update. The cluster must be running; you can't resize a provisioning or terminated cluster.

1. On the **Instances** tab on the cluster details page, view the **Instance groups** panel. 

1. To resize an existing instance group, select the radio button next to the core or task instance group that you want to resize and then choose **Resize instance group**. Specify the new number of instances for the instance group, then select **Resize**.
**Note**  
If you choose to reduce the size of a running instance group, Amazon EMR will intelligently select the instances to remove from the group for minimal data loss. For more granular control of your resize action, you can select the **ID** for the instance group, choose the instances you want to remove, and then use the **Terminate** option. For more information on intelligent scale-down behavior, see [Cluster scale-down options for Amazon EMR clusters](emr-scaledown-behavior.md).

1. If you want to cancel the resizing action, you can select the radio button for an instance group with the status **Resizing** and then choose **Stop resize** from the list actions.

1. To add one or more task instance groups to your cluster in response to increasing workload, choose **Add task instance group** from the list actions. Choose the Amazon EC2 instance type, enter the number of instances for the task group, then select **Add task instance group** to return to the **Instance groups** panel for your cluster.

------

When you make a change to the number of nodes, the **Status** of the instance group updates. When the change you requested is complete, the **Status** is **Running**.

## Resize a cluster with the AWS CLI
<a name="ResizingParameters"></a>

You can use the AWS CLI to resize a running cluster. You can increase or decrease the number of task nodes, and you can increase the number of core nodes in a running cluster. You can also shut down an instance in the core instance group with the AWS CLI or the API, but do so with caution: shutting down a core instance risks data loss, and the instance is not automatically replaced.

In addition to resizing the core and task groups, you can also add one or more task instance groups to a running cluster with the AWS CLI. <a name="IncreaseDecreaseNodesawscli"></a>

**To resize a cluster by changing the instance count with the AWS CLI**

You can add instances to the core group or task group, and you can remove instances from the task group with the AWS CLI `modify-instance-groups` subcommand with the `InstanceCount` parameter. To add instances to the core or task groups, increase the `InstanceCount`. To reduce the number of instances in the task group, decrease the `InstanceCount`. Changing the instance count of the task group to 0 removes all instances but not the instance group.
+ To increase the number of instances in the task instance group from 3 to 4, type the following command and replace *ig-31JXXXXXXBTO* with the instance group ID.

  ```
  aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-31JXXXXXXBTO,InstanceCount=4
  ```

  To retrieve the `InstanceGroupId`, use the `describe-cluster` subcommand. The output is a JSON object called `Cluster` that contains the ID of each instance group. To use this command, you need the cluster ID (which you can retrieve with the `aws emr list-clusters` command or the console). To retrieve the instance group ID, type the following command and replace *j-2AXXXXXXGAPLF* with the cluster ID.

  ```
  aws emr describe-cluster --cluster-id j-2AXXXXXXGAPLF
  ```

  With the AWS CLI, you can also terminate an instance in the core instance group with the `modify-instance-groups` subcommand and the `EC2InstanceIdsToTerminate` parameter.
**Warning**  
Specifying `EC2InstanceIdsToTerminate` must be done with caution. Instances are terminated immediately, regardless of the status of applications running on them, and the instance is not automatically replaced. This is true regardless of the cluster's **Scale down behavior** configuration. Terminating an instance in this way risks data loss and unpredictable cluster behavior.

  To terminate a specific instance, you need the instance group ID (returned by the `aws emr describe-cluster --cluster-id` subcommand) and the instance ID (returned by the `aws emr list-instances --cluster-id` subcommand). Type the following command, replacing *ig-6RXXXXXX07SA* with the instance group ID and *i-f9XXXXf2* with the instance ID.

  ```
  aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-6RXXXXXX07SA,EC2InstanceIdsToTerminate=i-f9XXXXf2
  ```

  For more information about using Amazon EMR commands in the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/emr](https://docs.aws.amazon.com/cli/latest/reference/emr).
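If you script these lookups, you can extract an instance group ID from the `describe-cluster` output with a JSON processor such as `jq`. The cluster document below is a hypothetical, trimmed sample; only the fields used by the filter are shown.

```shell
# Hypothetical, trimmed sample of the "Cluster" object that describe-cluster returns.
cat > /tmp/describe-cluster.json <<'EOF'
{
  "Cluster": {
    "InstanceGroups": [
      {"Id": "ig-3ETXXXXXXFYV8", "InstanceGroupType": "MASTER"},
      {"Id": "ig-3SUXXXXXXQ9ZM", "InstanceGroupType": "CORE"},
      {"Id": "ig-31JXXXXXXBTO", "InstanceGroupType": "TASK"}
    ]
  }
}
EOF

# Select the task instance group and print its ID.
jq -r '.Cluster.InstanceGroups[]
       | select(.InstanceGroupType == "TASK")
       | .Id' /tmp/describe-cluster.json
```

In a live shell, you would pipe the output of `aws emr describe-cluster --cluster-id <cluster-id>` directly into the same `jq` filter instead of reading from a file.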

**To resize a cluster by adding task instance groups with the AWS CLI**

With the AWS CLI, you can add 1 to 48 task instance groups to a cluster with the `add-instance-groups` subcommand. Task instance groups can only be added to a cluster that contains a primary instance group and a core instance group. You can add up to five task instance groups each time you use the `add-instance-groups` subcommand.

1. To add a single task instance group to a cluster, type the following command and replace *j-JXBXXXXXX37R* with the cluster ID.

   ```
   aws emr add-instance-groups --cluster-id j-JXBXXXXXX37R --instance-groups InstanceCount=6,InstanceGroupType=task,InstanceType=m5.xlarge
   ```

1. To add multiple task instance groups to a cluster, type the following command and replace *j-JXBXXXXXX37R* with the cluster ID. You can add up to five task instance groups in a single command.

   ```
   aws emr add-instance-groups --cluster-id j-JXBXXXXXX37R --instance-groups InstanceCount=6,InstanceGroupType=task,InstanceType=m5.xlarge InstanceCount=10,InstanceGroupType=task,InstanceType=m5.xlarge
   ```

   For more information about using Amazon EMR commands in the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/emr](https://docs.aws.amazon.com/cli/latest/reference/emr).

## Interrupting a resize
<a name="interruptible-resize"></a>

With Amazon EMR version 4.1.0 or later, you can issue a resize while an earlier resize operation is still in progress. You can stop a previously submitted resize request, or submit a new request that overrides a previous request without waiting for it to finish. You can also stop an in-progress resize from the console, or with the `ModifyInstanceGroups` API call by setting the target count to the cluster's current instance count.

The following screenshot shows a task instance group that is resizing but can be stopped by choosing **Stop**.

![\[Task instance group showing resizing status with options to resize or stop.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/resize-stop.png)


**To interrupt a resize with the AWS CLI**

You can use the AWS CLI to stop a resize with the `modify-instance-groups` subcommand. Assume that you have six instances in your instance group and you want to increase this to 10. You later decide that you would like to cancel this request:
+ The initial request:

  ```
  aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-myInstanceGroupId,InstanceCount=10
  ```

  The second request to stop the first request:

  ```
  aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-myInstanceGroupId,InstanceCount=6
  ```

**Note**  
Because this process is asynchronous, you might see instance counts change in response to earlier API requests before subsequent requests are honored. When you shrink an instance group, nodes that are running work might not be removed until they have completed their work.

## Suspended state
<a name="emr-manage-resizeSuspended"></a>

An instance group goes into a suspended state if it encounters too many errors while trying to start the new cluster nodes. For example, if new nodes fail while performing bootstrap actions, the instance group goes into a *SUSPENDED* state, rather than continuously provisioning new nodes. After you resolve the underlying issue, reset the desired number of nodes on the cluster's instance group, and then the instance group resumes allocating nodes. Modifying an instance group instructs Amazon EMR to attempt to provision nodes again. No running nodes are restarted or terminated.

In the AWS CLI, the `list-instances` subcommand returns all instances and their states as does the `describe-cluster` subcommand. If Amazon EMR detects a fault with an instance group, it changes the group's state to `SUSPENDED`. 
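As a scripted check, you can filter the `describe-cluster` output for suspended groups with `jq`. The JSON below is a hypothetical sample with one healthy group and one suspended group, not real API output.

```shell
# Hypothetical sample: one RUNNING core group and one SUSPENDED task group.
cat > /tmp/cluster-status.json <<'EOF'
{
  "Cluster": {
    "InstanceGroups": [
      {"Id": "ig-3SUXXXXXXQ9ZM", "InstanceGroupType": "CORE", "Status": {"State": "RUNNING"}},
      {"Id": "ig-6RXXXXXX07SA", "InstanceGroupType": "TASK", "Status": {"State": "SUSPENDED"}}
    ]
  }
}
EOF

# Print the ID of every instance group in the SUSPENDED state.
jq -r '.Cluster.InstanceGroups[]
       | select(.Status.State == "SUSPENDED")
       | .Id' /tmp/cluster-status.json
```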

**To reset a cluster in a SUSPENDED state with the AWS CLI**

Type the `describe-cluster` subcommand with the `--cluster-id` parameter to view the state of the instances in your cluster.
+ To view information on all instances and instance groups in a cluster, type the following command and replace *j-3KVXXXXXXY7UG* with the cluster ID.

  ```
  aws emr describe-cluster --cluster-id j-3KVXXXXXXY7UG
  ```

  The output displays information about your instance groups and the state of the instances:

  ```
  {
      "Cluster": {
          "Status": {
              "Timeline": {
                  "ReadyDateTime": 1413187781.245,
                  "CreationDateTime": 1413187405.356
              },
              "State": "WAITING",
              "StateChangeReason": {
                  "Message": "Waiting after step completed"
              }
          },
          "Ec2InstanceAttributes": {
              "Ec2AvailabilityZone": "us-west-2b"
          },
          "Name": "Development Cluster",
          "Tags": [],
          "TerminationProtected": false,
          "RunningAmiVersion": "3.2.1",
          "NormalizedInstanceHours": 16,
          "InstanceGroups": [
              {
                  "RequestedInstanceCount": 1,
                  "Status": {
                      "Timeline": {
                          "ReadyDateTime": 1413187775.749,
                          "CreationDateTime": 1413187405.357
                      },
                      "State": "RUNNING",
                      "StateChangeReason": {
                          "Message": ""
                      }
                  },
                  "Name": "MASTER",
                  "InstanceGroupType": "MASTER",
                  "InstanceType": "m5.xlarge",
                  "Id": "ig-3ETXXXXXXFYV8",
                  "Market": "ON_DEMAND",
                  "RunningInstanceCount": 1
              },
              {
                  "RequestedInstanceCount": 1,
                  "Status": {
                      "Timeline": {
                          "ReadyDateTime": 1413187781.301,
                          "CreationDateTime": 1413187405.357
                      },
                      "State": "RUNNING",
                      "StateChangeReason": {
                          "Message": ""
                      }
                  },
                  "Name": "CORE",
                  "InstanceGroupType": "CORE",
                  "InstanceType": "m5.xlarge",
                  "Id": "ig-3SUXXXXXXQ9ZM",
                  "Market": "ON_DEMAND",
                  "RunningInstanceCount": 1
              }
  ...
  }
  ```

  To view information about a particular instance group, type the `list-instances` subcommand with the `--cluster-id` and `--instance-group-types` parameters. You can view information for the primary, core, or task groups.

  ```
  aws emr list-instances --cluster-id j-3KVXXXXXXY7UG --instance-group-types "CORE"
  ```

  Use the `modify-instance-groups` subcommand with the `--instance-groups` parameter to reset a cluster in the `SUSPENDED` state. The instance group ID is returned by the `describe-cluster` subcommand.

  ```
  aws emr modify-instance-groups --instance-groups InstanceGroupId=ig-3SUXXXXXXQ9ZM,InstanceCount=3
  ```

## Considerations when reducing cluster size
<a name="resize-considerations"></a>

If you choose to reduce the size of a running cluster, consider the following Amazon EMR behavior and best practices:
+ To reduce impact on jobs that are in progress, Amazon EMR intelligently selects the instances to remove. For more information on cluster scale-down behavior, see [Terminate at task completion](emr-scaledown-behavior.md#emr-scaledown-terminate-task) in the Amazon EMR Management Guide. 
+ When you scale down the size of a cluster, Amazon EMR copies the data from the instances that it removes to the instances that remain. Ensure that there is sufficient storage capacity for this data in the instances that remain in the group.
+ Amazon EMR attempts to decommission HDFS on instances in the group. Before you reduce the size of a cluster, we recommend that you minimize HDFS write I/O.
+ For the most granular control when you reduce the size of a cluster, you can view the cluster in the console and navigate to the **Instances** tab. Select the **ID** for the instance group that you want to resize. Then use the **Terminate** option for the specific instances that you want to remove. 

# Configuring provisioning timeouts to control capacity in Amazon EMR
<a name="emr-provisioning-timeout"></a>

When you use instance fleets, you can configure *provisioning timeouts*. A provisioning timeout instructs Amazon EMR to stop provisioning instance capacity if the cluster exceeds a specified time threshold during cluster launch or cluster scaling operations. The following topics cover how to configure a provisioning timeout for cluster launch and for cluster scale-up operations.

**Topics**
+ [Configure provisioning timeouts for cluster launch in Amazon EMR](emr-provisioning-timeout-launch.md)
+ [Customize a provisioning timeout period for cluster resize in Amazon EMR](emr-provisioning-timeout-resize.md)

# Configure provisioning timeouts for cluster launch in Amazon EMR
<a name="emr-provisioning-timeout-launch"></a>

You can define a timeout period to provision Spot Instances for each fleet in your cluster. If Amazon EMR can't provision Spot capacity, you can choose either to terminate the cluster or to provision On-Demand capacity instead. If the timeout period ends during the cluster resizing process, Amazon EMR cancels unprovisioned Spot requests. Unprovisioned Spot Instances aren't transferred to On-Demand capacity.

Perform the following steps to customize a provisioning timeout period for cluster launch with the Amazon EMR console.

------
#### [ Console ]

**To configure the provisioning timeout when you create a cluster with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. On the **Create Cluster** page, navigate to **Cluster configuration** and select **Instance Fleets**.

1. Under **Cluster scaling and provisioning option**, specify the Spot size for your core and task fleets.

1. Under **Spot timeout configuration**, select either **Terminate cluster after Spot timeout** or **Switch to On-Demand after Spot timeout**. Then, specify the timeout period for provisioning Spot Instances. The default value is 1 hour.

1. Choose any other options that apply for your cluster.

1. To launch your cluster with the configured timeout, choose **Create cluster**.

------
#### [ AWS CLI ]

**To specify a provisioning timeout with the `create-cluster` command**

```
aws emr create-cluster \
--release-label emr-5.35.0 \
--service-role EMR_DefaultRole \
--ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","SubnetIds":["subnet-XXXXX"]}' \
--instance-fleets '[{"InstanceFleetType":"MASTER","TargetOnDemandCapacity":1,"TargetSpotCapacity":0,"LaunchSpecifications":{"OnDemandSpecification":{"AllocationStrategy":"lowest-price"}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"m5.xlarge"}],"Name":"Master - 1"},{"InstanceFleetType":"CORE","TargetOnDemandCapacity":1,"TargetSpotCapacity":1,"LaunchSpecifications":{"SpotSpecification":{"TimeoutDurationMinutes":120,"TimeoutAction":"SWITCH_TO_ON_DEMAND"},"OnDemandSpecification":{"AllocationStrategy":"lowest-price"}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":1,"InstanceType":"m5.xlarge"}],"Name":"Core - 2"}]'
```

------

# Customize a provisioning timeout period for cluster resize in Amazon EMR
<a name="emr-provisioning-timeout-resize"></a>

You can define a timeout period for provisioning Spot Instances for each fleet in your cluster. If Amazon EMR can't provision the Spot capacity, it cancels the resize request and stops its attempts to provision additional Spot capacity. When you create a cluster, you can configure the timeout. For a running cluster, you can add or update a timeout.

When the timeout period expires, Amazon EMR automatically sends events to an Amazon CloudWatch Events stream. With CloudWatch, you can create rules that match events according to a specified pattern, and then route the events to targets to take action. For example, you might configure a rule to send an email notification. For more information on how to create rules, see [Creating rules for Amazon EMR events with CloudWatch](emr-events-cloudwatch-console.md). For more information about different event details, see [Instance fleet state-change events](emr-manage-cloudwatch-events.md#emr-cloudwatch-instance-fleet-events).
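As a sketch, a CloudWatch Events (EventBridge) rule that routes instance fleet events might use an event pattern like the following. The `detail-type` string shown here is an assumption; confirm the exact value against the event details page linked above before you rely on it.

```
{
  "source": ["aws.emr"],
  "detail-type": ["EMR Instance Fleet State Change"]
}
```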

## Examples of provisioning timeouts for cluster resize
<a name="emr-provisioning-timeout-examples"></a>

**Specify a provisioning timeout for resize with the AWS CLI**

The following example uses the `create-cluster` command to add a provisioning timeout for resize.

```
aws emr create-cluster \
--release-label emr-5.35.0 \
--service-role EMR_DefaultRole \
--ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","SubnetIds":["subnet-XXXXX"]}' \
--instance-fleets '[{"InstanceFleetType":"MASTER","TargetOnDemandCapacity":1,"TargetSpotCapacity":0,"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"m5.xlarge"}],"Name":"Master - 1"},{"InstanceFleetType":"CORE","TargetOnDemandCapacity":1,"TargetSpotCapacity":1,"LaunchSpecifications":{"SpotSpecification":{"TimeoutDurationMinutes":120,"TimeoutAction":"SWITCH_TO_ON_DEMAND"},"OnDemandSpecification":{"AllocationStrategy":"lowest-price"}},"ResizeSpecifications":{"SpotResizeSpecification":{"TimeoutDurationMinutes":20},"OnDemandResizeSpecification":{"TimeoutDurationMinutes":25}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":1,"InstanceType":"m5.xlarge"}],"Name":"Core - 2"}]'
```

The following example uses the `modify-instance-fleet` command to add a provisioning timeout for resize.

```
aws emr modify-instance-fleet \
--cluster-id j-XXXXXXXXXXXXX \
--instance-fleet '{"InstanceFleetId":"if-XXXXXXXXXXXX","ResizeSpecifications":{"SpotResizeSpecification":{"TimeoutDurationMinutes":30},"OnDemandResizeSpecification":{"TimeoutDurationMinutes":60}}}' \
--region us-east-1
```

The following example uses the `add-instance-fleet` command to add a provisioning timeout for resize.

```
aws emr add-instance-fleet \
--cluster-id j-XXXXXXXXXXXXX \
--instance-fleet '{"InstanceFleetType":"TASK","TargetOnDemandCapacity":1,"TargetSpotCapacity":0,"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"m5.xlarge"}],"Name":"TaskFleet","ResizeSpecifications":{"SpotResizeSpecification":{"TimeoutDurationMinutes":30},"OnDemandResizeSpecification":{"TimeoutDurationMinutes":35}}}' \
--region us-east-1
```

**Specify a provisioning timeout for resize and launch with the AWS CLI**

The following example uses the `create-cluster` command to add a provisioning timeout for resize and launch.

```
aws emr create-cluster \
--release-label emr-5.35.0 \
--service-role EMR_DefaultRole \
--ec2-attributes '{"InstanceProfile":"EMR_EC2_DefaultRole","SubnetIds":["subnet-XXXXX"]}' \
--instance-fleets '[{"InstanceFleetType":"MASTER","TargetOnDemandCapacity":1,"TargetSpotCapacity":0,"LaunchSpecifications":{"OnDemandSpecification":{"AllocationStrategy":"lowest-price"}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":100,"InstanceType":"m5.xlarge"}],"Name":"Master - 1"},{"InstanceFleetType":"CORE","TargetOnDemandCapacity":1,"TargetSpotCapacity":1,"LaunchSpecifications":{"SpotSpecification":{"TimeoutDurationMinutes":120,"TimeoutAction":"SWITCH_TO_ON_DEMAND"},"OnDemandSpecification":{"AllocationStrategy":"lowest-price"}},"ResizeSpecifications":{"SpotResizeSpecification":{"TimeoutDurationMinutes":20},"OnDemandResizeSpecification":{"TimeoutDurationMinutes":25}},"InstanceTypeConfigs":[{"WeightedCapacity":1,"EbsConfiguration":{"EbsBlockDeviceConfigs":[{"VolumeSpecification":{"SizeInGB":32,"VolumeType":"gp2"},"VolumesPerInstance":2}]},"BidPriceAsPercentageOfOnDemandPrice":1,"InstanceType":"m5.xlarge"}],"Name":"Core - 2"}]'
```

## Considerations for resize provisioning timeouts
<a name="emr-provisioning-timeout-considerations"></a>

When you configure cluster provisioning timeouts for your instance fleets, consider the following behaviors.
+ You can configure provisioning timeouts for both Spot and On-Demand Instances. The minimum provisioning timeout is 5 minutes. The maximum provisioning timeout is 7 days.
+ You can only configure provisioning timeouts for an EMR cluster that uses instance fleets. You must configure each core and task fleet separately.
+ When you create a cluster, you can configure provisioning timeouts. You can add a timeout or update an existing timeout for a running cluster.
+ If you submit multiple resize operations, then Amazon EMR tracks provisioning timeouts for every resize operation. For example, set the provisioning timeout on a cluster to *60* minutes. Then, submit a resize operation *R1* at time *T1*. Submit a second resize operation *R2* at time *T2*. The provisioning timeout for R1 expires at *T1 + 60 minutes*. The provisioning timeout for R2 expires at *T2 + 60 minutes*.
+ If you submit a new scale-up resize operation before the timeout expires, Amazon EMR continues its attempt to provision capacity for your EMR cluster.
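The per-request expiry arithmetic described above can be sketched in a few lines of shell. The epoch timestamps are hypothetical values, not real request times.

```shell
TIMEOUT_MINUTES=60      # provisioning timeout configured on the cluster
T1=1700000000           # epoch second at which resize R1 was submitted
T2=1700000900           # R2 submitted 15 minutes later

# Each resize request gets its own deadline: submission time plus the timeout.
R1_EXPIRES=$((T1 + TIMEOUT_MINUTES * 60))
R2_EXPIRES=$((T2 + TIMEOUT_MINUTES * 60))
echo "$R1_EXPIRES $R2_EXPIRES"   # prints "1700003600 1700004500"
```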

# Cluster scale-down options for Amazon EMR clusters
<a name="emr-scaledown-behavior"></a>

**Note**  
Scale-down behavior options are no longer supported in Amazon EMR release 5.10.0 and higher. Because of the introduction of per-second billing in Amazon EC2, the default scale-down behavior for Amazon EMR clusters is now termination at task completion.

With Amazon EMR releases 5.1.0 through 5.9.1, there are two options for scale-down behavior: terminate at the instance-hour boundary for Amazon EC2 billing, or terminate at task completion. Starting with Amazon EMR release 5.10.0, the setting for termination at instance-hour boundary is deprecated because of the introduction of per-second billing in Amazon EC2. We do not recommend specifying termination at the instance-hour boundary in versions where the option is available.

**Warning**  
If you use the AWS CLI to issue a `modify-instance-groups` with `EC2InstanceIdsToTerminate`, these instances are terminated immediately, without consideration for these settings, and regardless of the status of applications running on them. Terminating an instance in this way risks data loss and unpredictable cluster behavior.

When terminate at task completion is specified, Amazon EMR denylists and drains tasks from nodes before terminating the Amazon EC2 instances. With either behavior specified, Amazon EMR does not terminate Amazon EC2 instances in core instance groups if doing so could lead to HDFS corruption. 

## Terminate at task completion
<a name="emr-scaledown-terminate-task"></a>

Amazon EMR allows you to scale down your cluster without affecting your workload. Amazon EMR attempts to gracefully decommission YARN, HDFS, and other daemons on core and task nodes during a resize down operation without losing data or interrupting jobs. Amazon EMR only reduces instance group size if the work assigned to the groups has completed and they are idle. For YARN NodeManager Graceful Decommission, you can manually adjust the time a node waits for decommissioning.

**Note**  
When graceful decommissioning occurs, there can be data loss. Be sure to back up your data.

**Important**  
It is possible that HDFS data can be permanently lost during the graceful replacement of an unhealthy core instance. We recommend that you always back up your data.

This time is set using a property in the `yarn-site` configuration classification. With Amazon EMR release 5.12.0 and higher, specify the `yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs` property. With earlier Amazon EMR releases, specify the `yarn.resourcemanager.decommissioning.timeout` property.

If there are still running containers or YARN applications when the decommissioning timeout passes, the node is forced to be decommissioned and YARN reschedules affected containers on other nodes. The default value is 3600s (one hour). You can set this timeout to be an arbitrarily high value to force graceful reduction to wait longer. For more information, see [Graceful Decommission of YARN nodes](http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/GracefulDecommission.html) in the Apache Hadoop documentation.
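For example, on Amazon EMR release 5.12.0 or higher you might raise the decommission timeout with a `yarn-site` configuration classification when you create the cluster. The 7200-second value below is an arbitrary illustration, not a recommended setting:

```
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.resourcemanager.nodemanager-graceful-decommission-timeout-secs": "7200"
    }
  }
]
```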

### Task node groups
<a name="emr-scaledown-task-nodes"></a>

Amazon EMR intelligently selects instances that do not have tasks running against any step or application, and removes those instances from a cluster first. If all instances in the cluster are in use, Amazon EMR waits for tasks to complete on an instance before removing it from the cluster. The default wait time is 1 hour. You can change this value with the `yarn.resourcemanager.decommissioning.timeout` setting, and Amazon EMR dynamically uses the new setting. You can set this to an arbitrarily large number to ensure that Amazon EMR doesn't terminate any tasks while reducing the cluster size.

### Core node groups
<a name="emr-scaledown-core-nodes"></a>

On core nodes, both YARN NodeManager and HDFS DataNode daemons must be decommissioned for the instance group to reduce. For YARN, graceful reduction ensures that a node marked for decommissioning is only transitioned to the `DECOMMISSIONED` state if there are no pending or incomplete containers or applications. The decommissioning finishes immediately if there are no running containers on the node at the beginning of decommissioning. 

For HDFS, graceful reduction ensures that the target capacity of HDFS is large enough to fit all existing blocks. If the target capacity is not large enough, only some of the core instances are decommissioned, so that the remaining nodes can handle the data currently residing in HDFS. Ensure that there is additional HDFS capacity to allow further decommissioning, and try to minimize write I/O before you attempt to reduce instance groups. Excessive write I/O might delay completion of the resize operation. 

Another limit is the default replication factor, `dfs.replication`, in `/etc/hadoop/conf/hdfs-site`. When it creates a cluster, Amazon EMR configures the value based on the number of instances in the cluster: `1` for clusters with 1-3 instances, `2` for clusters with 4-9 instances, and `3` for clusters with 10 or more instances. 

**Warning**  
Setting `dfs.replication` to 1 on clusters with fewer than four nodes can lead to HDFS data loss if a single node goes down. We recommend you use a cluster with at least four core nodes for production workloads.
Amazon EMR will not allow clusters to scale core nodes below `dfs.replication`. For example, if `dfs.replication = 2`, the minimum number of core nodes is 2.
When you use Managed Scaling, Auto-scaling, or choose to manually resize your cluster, we recommend that you set `dfs.replication` to 2 or higher.

Graceful reduction doesn't let you reduce core nodes below the HDFS replication factor; otherwise, HDFS might be unable to close files because of insufficient replicas. To circumvent this limit, lower the replication factor and restart the NameNode daemon. 
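If you manage the replication factor yourself, you can set it with an `hdfs-site` configuration classification when you create the cluster. The value `2` below follows the recommendation above and is shown only as an example:

```
[
  {
    "Classification": "hdfs-site",
    "Properties": {
      "dfs.replication": "2"
    }
  }
]
```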

# Configure Amazon EMR scale-down behavior
<a name="emr-scaledown-configure"></a>

**Note**  
The terminate at instance hour scale-down behavior option is no longer supported for Amazon EMR release 5.10.0 and higher. The following scale-down behavior options only appear in the Amazon EMR console for releases 5.1.0 through 5.9.1.

You can use the AWS Management Console, the AWS CLI, or the Amazon EMR API to configure scale-down behavior when you create a cluster. 

------
#### [ Console ]

**To configure scale-down behavior with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. In the **Cluster scaling and provisioning option** section, choose **Use custom automatic scaling**. Under **Custom automatic scaling policies**, choose the **plus action button** to add **Scale in** policies. We recommend that you add both **Scale in** and **Scale out** policies. If you add only one set of policies, Amazon EMR performs scaling in one direction only, and you have to perform the opposite resize actions manually.

1. Choose any other options that apply to your cluster. 

1. To launch your cluster, choose **Create cluster**.

------
#### [ AWS CLI ]

**To configure scale-down behavior with the AWS CLI**
+ Use the `--scale-down-behavior` option to specify either `TERMINATE_AT_INSTANCE_HOUR` or `TERMINATE_AT_TASK_COMPLETION`.

------