

# Security in Amazon EMR
<a name="emr-security"></a>

Security and compliance is a responsibility you share with AWS. This shared responsibility model can help relieve your operational burden, because AWS operates, manages, and controls the components from the host operating system and virtualization layer down to the physical security of the facilities in which EMR clusters operate. You assume responsibility for managing and updating your Amazon EMR clusters, as well as for configuring the application software and the AWS-provided security controls. This division of responsibility is commonly referred to as security *of* the cloud versus security *in* the cloud. 
+ Security of the cloud – AWS is responsible for protecting the infrastructure that runs AWS services in AWS. AWS also provides you with services that you can use securely. Third-party auditors regularly test and verify the effectiveness of our security as part of the [AWS compliance programs](https://aws.amazon.com/compliance/programs/). To learn about the compliance programs that apply to Amazon EMR, see [AWS services in scope by compliance program](https://aws.amazon.com/compliance/services-in-scope/).
+ Security in the cloud – you are responsible for performing all of the necessary security configuration and management tasks to secure an Amazon EMR cluster. Customers who deploy an Amazon EMR cluster are responsible for managing the application software installed on the instances, and for configuring AWS-provided features such as security groups, encryption, and access control according to their requirements, applicable laws, and regulations.

This documentation helps you understand how to apply the shared responsibility model when using Amazon EMR. The topics in this chapter show you how to configure Amazon EMR and use other AWS services to meet your security and compliance objectives.

## Network and infrastructure security
<a name="w2aac30b9"></a>

As a managed service, Amazon EMR is protected by the AWS global network security procedures that are described in the [Amazon Web Services: Overview of security processes](https://d0.awsstatic.com/whitepapers/Security/AWS_Security_Whitepaper.pdf) whitepaper. AWS network and infrastructure protection services give you fine-grained protections at both the host and network-level boundaries. Amazon EMR supports AWS services and application features that address your network protection and compliance requirements.
+ **Amazon EC2 security groups** act as a virtual firewall for Amazon EMR cluster instances, limiting inbound and outbound network traffic. For more information, see [Control network traffic with security groups](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-security-groups.html).
+ **Amazon EMR block public access (BPA)** prevents you from launching a cluster in a public subnet if the cluster has a security configuration that allows inbound traffic from public IP addresses on a port. For more information, see [Using Amazon EMR block public access](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-block-public-access.html).
+ **Secure Shell (SSH)** helps provide a secure way for users to connect to the command line on cluster instances. You can also use SSH to view web interfaces that applications host on the master node of a cluster. For more information, see [Use an EC2 key pair for SSH credentials](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-access-ssh.html) and [Connect to a cluster](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-connect-master-node.html).
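Once the key pair and security group rules are in place, an SSH connection to the master node typically looks like the following sketch. The key file name and public DNS name are placeholders; `hadoop` is the default user on EMR cluster instances.

```
ssh -i ~/mykeypair.pem hadoop@ec2-198-51-100-1.compute-1.amazonaws.com
```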

## Updates to the default Amazon Linux AMI for Amazon EMR
<a name="w2aac30c11"></a>

**Important**  
EMR clusters that run Amazon Linux or Amazon Linux 2 Amazon Machine Images (AMIs) use default Amazon Linux behavior, and do not automatically download and install important and critical kernel updates that require a reboot. This is the same behavior as other Amazon EC2 instances that run the default Amazon Linux AMI. If new Amazon Linux software updates that require a reboot (such as kernel, NVIDIA, and CUDA updates) become available after an Amazon EMR release becomes available, EMR cluster instances that run the default AMI do not automatically download and install those updates. To get kernel updates, you can [customize your Amazon EMR AMI](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-custom-ami.html) to [use the latest Amazon Linux AMI](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/finding-an-ami.html).

Depending on the security posture of your application and the length of time that a cluster runs, you may choose to periodically reboot your cluster to apply security updates, or create a bootstrap action to customize package installation and updates. You may also choose to test and then install selected security updates on running cluster instances. For more information, see [Using the default Amazon Linux AMI for Amazon EMR](emr-default-ami.md). Note that your networking configuration must allow HTTP and HTTPS egress to the Linux repositories in Amazon S3; otherwise, security updates will not succeed.
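As an illustration, a bootstrap action for applying updates could be a small script like the following sketch. The script name is illustrative, and the choice of `yum update -y --security` (security updates only) is an assumption; adjust the package selection to your own posture.

```shell
# Write a hypothetical bootstrap action script (name is illustrative)
# that applies available security updates when each instance launches.
cat > update-packages.sh <<'EOF'
#!/bin/bash
# Apply only security updates; requires HTTP/HTTPS egress to the
# Linux repositories, as noted above.
sudo yum update -y --security
EOF
chmod +x update-packages.sh
```

You would then upload the script to Amazon S3 and reference it at cluster creation, for example with `--bootstrap-actions Path=s3://amzn-s3-demo-bucket1/update-packages.sh` (bucket and key are placeholders).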

## AWS Identity and Access Management with Amazon EMR
<a name="w2aac30c13"></a>

AWS Identity and Access Management (IAM) is an AWS service that helps an administrator securely control access to AWS resources. IAM administrators control who can be *authenticated* (signed in) and *authorized* (have permissions) to use Amazon EMR resources. IAM identities include users, groups, and roles. An IAM role is similar to an IAM user, but is not associated with a specific person, and is intended to be assumable by any user who needs permissions. For more information, see [AWS Identity and Access Management for Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-access-iam.html). Amazon EMR uses multiple IAM roles to help you implement access controls for Amazon EMR clusters. IAM is an AWS service that you can use with no additional charge.
+ **IAM role for Amazon EMR (EMR role)** – controls how Amazon EMR service is able to access other AWS services on your behalf, such as provisioning Amazon EC2 instances when the Amazon EMR cluster launches. For more information, see [Configure IAM service roles for Amazon EMR permissions to AWS services and resources](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles.html).
+ **IAM role for cluster EC2 instances (EC2 instance profile)** – a role that is assigned to every EC2 instance in the Amazon EMR cluster when the instance launches. Application processes that run on the cluster use this role to interact with other AWS services, such as Amazon S3. For more information, see [IAM role for cluster’s EC2 instances](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-role-for-ec2.html).
+ **IAM role for applications (runtime role)** – an IAM role that you can specify when you submit a job or query to an Amazon EMR cluster. The job or query that you submit to your Amazon EMR cluster uses the runtime role to access AWS resources, such as objects in Amazon S3. You can specify runtime roles with Amazon EMR for Spark and Hive jobs. By using runtime roles, you can isolate jobs running on the same cluster by using different IAM roles. For more information, see [Using IAM role as runtime role with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-steps-runtime-roles.html).
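As an illustration of runtime roles, a Spark job can be submitted as a step with a specific execution role by passing `--execution-role-arn`. The cluster ID, role ARN, bucket, and script path below are placeholders.

```
aws emr add-steps \
  --cluster-id j-XXXXXXXXXXXXX \
  --execution-role-arn arn:aws:iam::123456789012:role/MyRuntimeRole \
  --steps 'Type=SPARK,Name=MyJobWithRuntimeRole,ActionOnFailure=CONTINUE,Args=[s3://amzn-s3-demo-bucket1/scripts/my-job.py]'
```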

Workforce identities refer to users who build or operate workloads in AWS. Amazon EMR provides support for workforce identities with the following:
+ **AWS IAM Identity Center (IdC)** is the recommended AWS service for managing user access to AWS resources. It is a single place where you can give your workforce identities consistent access to multiple AWS accounts and applications. Amazon EMR supports workforce identities through trusted identity propagation. With trusted identity propagation, a user can sign in to an application, and that application can pass the user's identity to other AWS services to authorize access to data or resources. For more information, see [Enabling support for AWS IAM Identity Center with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-idc.html).

  **Lightweight Directory Access Protocol (LDAP)** is an open, vendor-neutral, industry-standard application protocol for accessing and maintaining information about users, systems, services, and applications over the network. LDAP is commonly used to authenticate users against corporate identity servers such as Active Directory (AD) and OpenLDAP. By enabling LDAP on EMR clusters, you allow your users to use their existing credentials to authenticate and access clusters. For more information, see [Enabling support for LDAP with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/ldap.html).

  **Kerberos** is a network authentication protocol designed to provide strong authentication for client/server applications by using secret-key cryptography. When you use Kerberos, Amazon EMR configures Kerberos for the applications, components, and subsystems that it installs on the cluster so that they are authenticated with each other. To access a cluster that has Kerberos configured, a Kerberos principal must be present in the Key Distribution Center (KDC). For more information, see [Enabling support for Kerberos with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-kerberos.html).

### Single-tenant and multi-tenant clusters
<a name="w2aac30c13c11"></a>

By default, a cluster is configured for single tenancy with the EC2 instance profile as the IAM identity. In a single-tenant cluster, every job has full access to the cluster, and access to all AWS services and resources is based on the EC2 instance profile. In a multi-tenant cluster, tenants are isolated from each other and don't have full access to the cluster or its EC2 instances. The identity on a multi-tenant cluster is either a runtime role or a workforce identity. In a multi-tenant cluster, you can also enable support for fine-grained access control (FGAC) via AWS Lake Formation or Apache Ranger. On a cluster that has runtime roles or FGAC enabled, access to the EC2 instance profile is also disabled via iptables.

**Important**  
Any user who has access to a single-tenant cluster can install software on the Linux operating system (OS), change or remove software components installed by Amazon EMR, and impact the EC2 instances that are part of the cluster. If you want to ensure that users can't install software or change the configuration of an Amazon EMR cluster, we recommend that you enable multi-tenancy for the cluster. You can enable multi-tenancy on a cluster by enabling support for runtime roles, AWS IAM Identity Center, Kerberos, or LDAP.

## Data protection
<a name="w2aac30c15"></a>

With AWS, you control your data by using AWS services and tools to determine how the data is secured and who has access to it. Services such as AWS Identity and Access Management (IAM) let you securely manage access to AWS services and resources. AWS CloudTrail enables detection and auditing. Amazon EMR makes it easy for you to encrypt data at rest in Amazon S3 by using keys that are either managed by AWS or fully managed by you. Amazon EMR also supports enabling encryption for data in transit. For more information, see [Encrypt data at rest and in transit](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-data-encryption.html).

### Data Access Control
<a name="w2aac30c15b5"></a>

With data access control, you can control what data an IAM identity or a workforce identity can access. Amazon EMR supports the following access controls:
+ **IAM identity-based policies** – manage permissions for IAM roles that you use with Amazon EMR. IAM policies can be combined with tagging to control access on a cluster-by-cluster basis. For more information, see [AWS Identity and Access Management for Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-access-iam.html).
+ **AWS Lake Formation** centralizes permissions management of your data and makes it easier to share across your organization and externally. You can use Lake Formation to enable fine-grained, column-level access to databases and tables in the AWS Glue Data Catalog. For more information, see [Using AWS Lake Formation with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-lake-formation.html).
+ **Amazon S3 Access Grants** map identities in directories such as Active Directory, or AWS Identity and Access Management (IAM) principals, to datasets in S3. Additionally, S3 Access Grants log the end-user identity and the application used to access S3 data in AWS CloudTrail. For more information, see [Using Amazon S3 Access Grants with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-access-grants.html).
+ **Apache Ranger** is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. Amazon EMR supports Apache Ranger-based fine-grained access control for the Apache Hive Metastore and Amazon S3. For more information, see [Integrate Apache Ranger with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ranger.html).

# Use security configurations to set up Amazon EMR cluster security
<a name="emr-security-configurations"></a>

You can use Amazon EMR security configurations to configure data encryption, Kerberos authentication, and Amazon S3 authorization for EMRFS on your clusters. First, you create a security configuration. Then, the security configuration is available to use and reuse when you create clusters.

You can use the AWS Management Console, the AWS Command Line Interface (AWS CLI), or the AWS SDKs to create security configurations. You can also use an AWS CloudFormation template to create a security configuration. For more information, see [AWS CloudFormation User Guide](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/) and the template reference for [AWS::EMR::SecurityConfiguration](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-emr-securityconfiguration.html#cfn-emr-securityconfiguration-securityconfiguration).
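For example, a CloudFormation resource for a security configuration might look like the following sketch, which enables only SSE-S3 at-rest encryption (the logical ID and name are placeholders):

```
Resources:
  MySecurityConfiguration:
    Type: AWS::EMR::SecurityConfiguration
    Properties:
      Name: MySecConfig
      SecurityConfiguration:
        EncryptionConfiguration:
          EnableInTransitEncryption: false
          EnableAtRestEncryption: true
          AtRestEncryptionConfiguration:
            S3EncryptionConfiguration:
              EncryptionMode: SSE-S3
```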

**Topics**
+ [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md)
+ [Specify a security configuration for an Amazon EMR cluster](emr-specify-security-configuration.md)

# Create a security configuration with the Amazon EMR console or with the AWS CLI
<a name="emr-create-security-configuration"></a>

This topic covers general procedures to create a security configuration with the Amazon EMR console and the AWS CLI, followed by a reference for the parameters that comprise encryption, authentication, and IAM roles for EMRFS. For more information about these features, see the following topics:
+ [Encrypt data at rest and in transit with Amazon EMR](emr-data-encryption.md)
+ [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md)
+ [Configure IAM roles for EMRFS requests to Amazon S3](emr-emrfs-iam-roles.md)

**To create a security configuration using the console**

1. Open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr/).

1. In the navigation pane, choose **Security Configurations**, **Create security configuration**. 

1. Type a **Name** for the security configuration.

1. Choose options for **Encryption** and **Authentication** as described in the sections below and then choose **Create**.

**To create a security configuration using the AWS CLI**
+ Use the `create-security-configuration` command as shown in the following example.
  + For *SecConfigName*, specify the name of the security configuration. This is the name you specify when you create a cluster that uses this security configuration.
  + For *SecConfigDef*, specify an inline JSON structure or the path to a local JSON file, such as `file://MySecConfig.json`. The JSON parameters define options for **Encryption**, **IAM Roles for EMRFS access to Amazon S3**, and **Authentication** as described in the sections below.

  ```
  aws emr create-security-configuration --name "SecConfigName" --security-configuration SecConfigDef
  ```
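For longer configurations, it can be convenient to write the JSON to a local file and pass it with `file://`. The following sketch writes a configuration that enables only SSE-S3 at-rest encryption; the file name and settings are illustrative.

```shell
# Write a security configuration to a local file (name is illustrative).
cat > MySecConfig.json <<'EOF'
{
  "EncryptionConfiguration": {
    "EnableInTransitEncryption": false,
    "EnableAtRestEncryption": true,
    "AtRestEncryptionConfiguration": {
      "S3EncryptionConfiguration": { "EncryptionMode": "SSE-S3" }
    }
  }
}
EOF
```

You would then create the security configuration with `aws emr create-security-configuration --name "MySecConfig" --security-configuration file://MySecConfig.json`.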

## Configure data encryption
<a name="emr-security-configuration-encryption"></a>

Before you configure encryption in a security configuration, create the keys and certificates that are used for encryption. For more information, see [Providing keys for encrypting data at rest](emr-encryption-enable.md#emr-encryption-create-keys) and [Providing certificates for encrypting data in transit with Amazon EMR encryption](emr-encryption-enable.md#emr-encryption-certificates).

When you create a security configuration, you specify two sets of encryption options: at-rest data encryption and in-transit data encryption. Options for at-rest data encryption include both Amazon S3 with EMRFS and local-disk encryption. In-transit encryption options enable the open-source encryption features for certain applications that support Transport Layer Security (TLS). At-rest options and in-transit options can be enabled together or separately. For more information, see [Encrypt data at rest and in transit with Amazon EMR](emr-data-encryption.md).

**Note**  
When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see [AWS KMS Pricing](https://aws.amazon.com/kms/pricing/).

### Specifying encryption options using the console
<a name="emr-security-configuration-encryption-console"></a>

Choose options under **Encryption** according to the following guidelines.
+ Choose options under **At rest encryption** to encrypt data stored within the file system. 

  You can choose to encrypt data in Amazon S3, local disks, or both. 
+ Under **S3 data encryption**, for **Encryption mode**, choose a value to determine how Amazon EMR encrypts Amazon S3 data with EMRFS. 

  What you do next depends on the encryption mode you chose:
  + **SSE-S3**

    Specifies [Server-side encryption with Amazon S3-managed encryption keys](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingServerSideEncryption.html). You don't need to do anything more because Amazon S3 handles keys for you.
  + **SSE-KMS** or **CSE-KMS**

    Specifies [server-side encryption with AWS KMS-managed keys (SSE-KMS)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingKMSEncryption.html) or [client-side encryption with AWS KMS-managed keys (CSE-KMS)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingClientSideEncryption.html). For **AWS KMS key**, select a key. The key must exist in the same region as your EMR cluster. For key requirements, see [Using AWS KMS keys for encryption](emr-encryption-enable.md#emr-awskms-keys).
  + **CSE-Custom**

    Specifies [client-side encryption using a custom client-side root key (CSE-custom)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingClientSideEncryption.html#client-side-encryption-client-side-master-key-intro). For **S3 object**, enter the location in Amazon S3, or the Amazon S3 ARN, of your custom key-provider JAR file. Then, for **Key provider class**, enter the full class name of a class declared in your application that implements the EncryptionMaterialsProvider interface.
+ Under **Local disk encryption**, choose a value for **Key provider type**.
  + **AWS KMS key**

    Select this option to specify an AWS KMS key. For **AWS KMS key**, select a key. The key must exist in the same region as your EMR cluster. For more information about key requirements, see [Using AWS KMS keys for encryption](emr-encryption-enable.md#emr-awskms-keys).

    **EBS Encryption**

    When you specify AWS KMS as your key provider, you can enable EBS encryption to encrypt the EBS root device and storage volumes. To enable this option, you must grant the Amazon EMR service role `EMR_DefaultRole` permissions to use the AWS KMS key that you specify. For more information about key requirements, see [Enabling EBS encryption by providing additional permissions for KMS keys](emr-encryption-enable.md#emr-awskms-ebs-encryption).
  + **Custom**

    Select this option to specify a custom key provider. For **S3 object**, enter the location in Amazon S3, or the Amazon S3 ARN, of your custom key-provider JAR file. For **Key provider class**, enter the full class name of a class declared in your application that implements the EncryptionMaterialsProvider interface. The class name you provide here must be different from the class name provided for CSE-Custom.
+ Choose **In-transit encryption** to enable the open-source TLS encryption features for in-transit data. Choose a **Certificate provider type** according to the following guidelines: 
  + **PEM**

    Select this option to use PEM files that you provide within a zip file. Two artifacts are required within the zip file: privateKey.pem and certificateChain.pem. A third file, trustedCertificates.pem, is optional. See [Providing certificates for encrypting data in transit with Amazon EMR encryption](emr-encryption-enable.md#emr-encryption-certificates) for details. For **S3 object**, specify the location in Amazon S3, or the Amazon S3 ARN, of the zip file. 
  + **Custom**

    Select this option to specify a custom certificate provider and then, for **S3 object**, enter the location in Amazon S3, or the Amazon S3 ARN, of your custom certificate-provider JAR file. For **Key provider class**, enter the full class name of a class declared in your application that implements the TLSArtifactsProvider interface. 
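For the EBS encryption option described above, the service role needs permission to use the KMS key. A hedged sketch of a key policy statement granting the default role that access follows; the account ID is a placeholder, and the exact action list your setup requires is described in the linked topic on enabling EBS encryption.

```
{
  "Sid": "Allow the EMR service role to use the key",
  "Effect": "Allow",
  "Principal": {
    "AWS": "arn:aws:iam::123456789012:role/EMR_DefaultRole"
  },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey",
    "kms:CreateGrant"
  ],
  "Resource": "*"
}
```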

### Specifying encryption options using the AWS CLI
<a name="emr-security-configuration-encryption-cli"></a>

The sections that follow use sample scenarios to illustrate well-formed **--security-configuration** JSON for different configurations and key providers, followed by a reference for the JSON parameters and appropriate values.

#### Example in-transit data encryption options
<a name="emr-encryption-intransit-cli"></a>

The example below illustrates the following scenario:
+ In-transit data encryption is enabled and at-rest data encryption is disabled.
+ A zip file with certificates in Amazon S3 is used as the key provider (see [Providing certificates for encrypting data in transit with Amazon EMR encryption](emr-encryption-enable.md#emr-encryption-certificates) for certificate requirements).

```
aws emr create-security-configuration --name "MySecConfig" --security-configuration '{
	"EncryptionConfiguration": {
		"EnableInTransitEncryption": true,
		"EnableAtRestEncryption": false,
		"InTransitEncryptionConfiguration": {
			"TLSCertificateConfiguration": {
				"CertificateProviderType": "PEM",
				"S3Object": "s3://MyConfigStore/artifacts/MyCerts.zip"
			}
		}
	}
}'
```

The example below illustrates the following scenario:
+ In-transit data encryption is enabled and at-rest data encryption is disabled.
+ A custom key provider is used (see [Providing certificates for encrypting data in transit with Amazon EMR encryption](emr-encryption-enable.md#emr-encryption-certificates) for certificate requirements).

```
aws emr create-security-configuration --name "MySecConfig" --security-configuration '{
	"EncryptionConfiguration": {
		"EnableInTransitEncryption": true,
		"EnableAtRestEncryption": false,
		"InTransitEncryptionConfiguration": {
			"TLSCertificateConfiguration": {
				"CertificateProviderType": "Custom",
				"S3Object": "s3://MyConfig/artifacts/MyCerts.jar",
				"CertificateProviderClass": "com.mycompany.MyCertProvider"
			}
		}
 	}
}'
```

#### Example at-rest data encryption options
<a name="emr-encryption-atrest-cli"></a>

The example below illustrates the following scenario:
+ In-transit data encryption is disabled and at-rest data encryption is enabled.
+ SSE-S3 is used for Amazon S3 encryption.
+ Local disk encryption uses AWS KMS as the key provider.

```
aws emr create-security-configuration --name "MySecConfig" --security-configuration '{
	"EncryptionConfiguration": {
		"EnableInTransitEncryption": false,
		"EnableAtRestEncryption": true,
		"AtRestEncryptionConfiguration": {
			"S3EncryptionConfiguration": {
				"EncryptionMode": "SSE-S3"
			},
			"LocalDiskEncryptionConfiguration": {
				"EncryptionKeyProviderType": "AwsKms",
				"AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
			}
		}
 	}
}'
```

The example below illustrates the following scenario:
+ In-transit data encryption is enabled and references a zip file with PEM certificates in Amazon S3, using the ARN.
+ SSE-KMS is used for Amazon S3 encryption.
+ Local disk encryption uses AWS KMS as the key provider.

```
aws emr create-security-configuration --name "MySecConfig" --security-configuration '{
	"EncryptionConfiguration": {
		"EnableInTransitEncryption": true,
		"EnableAtRestEncryption": true,
		"InTransitEncryptionConfiguration": {
			"TLSCertificateConfiguration": {
				"CertificateProviderType": "PEM",
				"S3Object": "arn:aws:s3:::MyConfigStore/artifacts/MyCerts.zip"
			}
		},
		"AtRestEncryptionConfiguration": {
			"S3EncryptionConfiguration": {
				"EncryptionMode": "SSE-KMS",
				"AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
			},
			"LocalDiskEncryptionConfiguration": {
				"EncryptionKeyProviderType": "AwsKms",
				"AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
			}
		}
	}
}'
```

The example below illustrates the following scenario:
+ In-transit data encryption is enabled and references a zip file with PEM certificates in Amazon S3.
+ CSE-KMS is used for Amazon S3 encryption.
+ Local disk encryption uses a custom key provider referenced by its ARN.

```
aws emr create-security-configuration --name "MySecConfig" --security-configuration '{
	"EncryptionConfiguration": {
		"EnableInTransitEncryption": true,
		"EnableAtRestEncryption": true,
		"InTransitEncryptionConfiguration": {
			"TLSCertificateConfiguration": {
				"CertificateProviderType": "PEM",
				"S3Object": "s3://MyConfigStore/artifacts/MyCerts.zip"
			}
		},
		"AtRestEncryptionConfiguration": {
			"S3EncryptionConfiguration": {
				"EncryptionMode": "CSE-KMS",
				"AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
			},
			"LocalDiskEncryptionConfiguration": {
				"EncryptionKeyProviderType": "Custom",
				"S3Object": "arn:aws:s3:::artifacts/MyKeyProvider.jar",
				"EncryptionKeyProviderClass": "com.mycompany.MyKeyProvider"
			}
		}
	}
}'
```

The example below illustrates the following scenario:
+ In-transit data encryption is enabled with a custom key provider.
+ CSE-Custom is used for Amazon S3 data.
+ Local disk encryption uses a custom key provider.

```
aws emr create-security-configuration --name "MySecConfig" --security-configuration '{
	"EncryptionConfiguration": {
		"EnableInTransitEncryption": true,
		"EnableAtRestEncryption": true,
		"InTransitEncryptionConfiguration": {
			"TLSCertificateConfiguration": {
				"CertificateProviderType": "Custom",
				"S3Object": "s3://MyConfig/artifacts/MyCerts.jar", 
				"CertificateProviderClass": "com.mycompany.MyCertProvider"
			}
		},
		"AtRestEncryptionConfiguration": {
			"S3EncryptionConfiguration": {
				"EncryptionMode": "CSE-Custom",
				"S3Object": "s3://MyConfig/artifacts/MyCerts.jar", 
				"EncryptionKeyProviderClass": "com.mycompany.MyKeyProvider"
			},
			"LocalDiskEncryptionConfiguration": {
				"EncryptionKeyProviderType": "Custom",
				"S3Object": "s3://MyConfig/artifacts/MyCerts.jar",
				"EncryptionKeyProviderClass": "com.mycompany.MyKeyProvider"
			}
		}
	}
}'
```

The example below illustrates the following scenario:
+ In-transit data encryption is disabled and at-rest data encryption is enabled.
+ Amazon S3 encryption is enabled with SSE-KMS.
+ Multiple AWS KMS keys are used, one per each S3 bucket, and encryption exceptions are applied to these individual S3 buckets.
+ Local disk encryption is disabled.

```
aws emr create-security-configuration --name "MySecConfig" --security-configuration '{
	"EncryptionConfiguration": {
		"AtRestEncryptionConfiguration": {
			"S3EncryptionConfiguration": {
				"EncryptionMode": "SSE-KMS",
				"AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012",
				"Overrides": [
					{
						"BucketName": "amzn-s3-demo-bucket1",
						"EncryptionMode": "SSE-S3"
					},
					{
						"BucketName": "amzn-s3-demo-bucket2",
						"EncryptionMode": "CSE-KMS",
						"AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
					},
					{
						"BucketName": "amzn-s3-demo-bucket3",
						"EncryptionMode": "SSE-KMS",
						"AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
					}
				]
			}
		},
		"EnableInTransitEncryption": false,
		"EnableAtRestEncryption": true
	}
}'
```
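The per-bucket behavior of `Overrides` can be pictured as a simple lookup: an override whose `BucketName` matches the bucket being accessed wins, and any other bucket falls back to the top-level encryption mode. The following is a didactic sketch of that selection rule over the example's values, written as a shell function; it is not EMRFS source code.

```shell
# Sketch of the Overrides selection rule: a matching BucketName wins,
# otherwise the top-level EncryptionMode (SSE-KMS here) applies.
resolve_mode() {
  case "$1" in
    amzn-s3-demo-bucket1) echo "SSE-S3" ;;   # per-bucket override
    amzn-s3-demo-bucket2) echo "CSE-KMS" ;;  # per-bucket override
    *)                    echo "SSE-KMS" ;;  # top-level default
  esac
}

resolve_mode amzn-s3-demo-bucket1   # SSE-S3
resolve_mode some-other-bucket      # SSE-KMS
```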

The example below illustrates the following scenario:
+ In-transit data encryption is disabled and at-rest data encryption is enabled.
+ Amazon S3 encryption is enabled with SSE-S3 and local disk encryption is disabled.

```
aws emr create-security-configuration --name "MyS3EncryptionConfig" --security-configuration '{
    "EncryptionConfiguration": {
        "EnableInTransitEncryption": false,
        "EnableAtRestEncryption": true,
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {
                "EncryptionMode": "SSE-S3"
            }
        }
     }
}'
```

The example below illustrates the following scenario:
+ In-transit data encryption is disabled and at-rest data encryption is enabled.
+ Local disk encryption is enabled with AWS KMS as the key provider and Amazon S3 encryption is disabled.

```
aws emr create-security-configuration --name "MyLocalDiskEncryptionConfig" --security-configuration '{
    "EncryptionConfiguration": {
        "EnableInTransitEncryption": false,
        "EnableAtRestEncryption": true,
        "AtRestEncryptionConfiguration": {
            "LocalDiskEncryptionConfiguration": {
                "EncryptionKeyProviderType": "AwsKms",
                "AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
            }
        }
     }
}'
```

The example below illustrates the following scenario:
+ In-transit data encryption is disabled and at-rest data encryption is enabled.
+ Local disk encryption is enabled with AWS KMS as the key provider and Amazon S3 encryption is disabled.
+ EBS encryption is enabled. 

```
aws emr create-security-configuration --name "MyLocalDiskEncryptionConfig" --security-configuration '{
    "EncryptionConfiguration": {
        "EnableInTransitEncryption": false,
        "EnableAtRestEncryption": true,
        "AtRestEncryptionConfiguration": {
            "LocalDiskEncryptionConfiguration": {
                "EnableEbsEncryption": true,
                "EncryptionKeyProviderType": "AwsKms",
                "AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
            }
        }
     }
}'
```

The example below illustrates the following scenario:
+ SSE-EMR-WAL is used for EMR WAL encryption

```
aws emr create-security-configuration --name "MySecConfig" \
    --security-configuration '{
        "EncryptionConfiguration": {
            "EMRWALEncryptionConfiguration":{ },
            "EnableInTransitEncryption":false, "EnableAtRestEncryption":false
        }
    }'
```

`EnableInTransitEncryption` and `EnableAtRestEncryption` can still be set to `true` if you want to enable the related encryption.

The example below illustrates the following scenario:
+ SSE-KMS-WAL is used for EMR WAL encryption
+ Server side encryption uses AWS Key Management Service as the key provider

```
aws emr create-security-configuration --name "MySecConfig" \
    --security-configuration '{
        "EncryptionConfiguration": {
            "EMRWALEncryptionConfiguration":{
                "AwsKmsKey":"arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
                },
            "EnableInTransitEncryption":false, "EnableAtRestEncryption":false
        }
    }'
```

`EnableInTransitEncryption` and `EnableAtRestEncryption` can still be set to `true` if you want to enable the related encryption.

#### JSON reference for encryption settings
<a name="emr-encryption-cli-parameters"></a>

The following table lists the JSON parameters for encryption settings and provides a description of acceptable values for each parameter.


| Parameter | Description | 
| --- |--- |
| "EnableInTransitEncryption" : true \| false | Specify true to enable in-transit encryption and false to disable it. If omitted, false is assumed, and in-transit encryption is disabled. | 
| "EnableAtRestEncryption" : true \| false | Specify true to enable at-rest encryption and false to disable it. If omitted, false is assumed, and at-rest encryption is disabled. | 
| **In-transit encryption parameters** |  | 
| "InTransitEncryptionConfiguration" : | Specifies a collection of values used to configure in-transit encryption when EnableInTransitEncryption is true. | 
| "CertificateProviderType" : "PEM" \| "Custom" | Specifies whether to use PEM certificates referenced with a zipped file, or a Custom certificate provider. If PEM is specified, S3Object must be a reference to the location in Amazon S3 of a zip file containing the certificates. If Custom is specified, S3Object must be a reference to the location in Amazon S3 of a JAR file, followed by a CertificateProviderClass entry. | 
| "S3Object" : "ZipLocation" \| "JarLocation" | Provides the location in Amazon S3 of a zip file when PEM is specified, or of a JAR file when Custom is specified. The format can be a path (for example, s3://MyConfig/artifacts/CertFiles.zip) or an ARN (for example, arn:aws:s3:::Code/MyCertProvider.jar). If a zip file is specified, it must contain files named exactly privateKey.pem and certificateChain.pem. A file named trustedCertificates.pem is optional. | 
| "CertificateProviderClass" : "MyClassID" | Required only if Custom is specified for CertificateProviderType. MyClassID specifies a full class name declared in the JAR file that implements the TLSArtifactsProvider interface. For example, com.mycompany.MyCertProvider. | 
| **At-rest encryption parameters** |  | 
| "AtRestEncryptionConfiguration" :  | Specifies a collection of values for at-rest encryption when EnableAtRestEncryption is true, including Amazon S3 encryption and local disk encryption. | 
| **Amazon S3 encryption parameters** |  | 
| "S3EncryptionConfiguration" : | Specifies a collection of values used for Amazon S3 encryption with the Amazon EMR File System (EMRFS). | 
| "EncryptionMode" : "SSE-S3" \| "SSE-KMS" \| "CSE-KMS" \| "CSE-Custom" | Specifies the type of Amazon S3 encryption to use. If SSE-S3 is specified, no further Amazon S3 encryption values are required. If either SSE-KMS or CSE-KMS is specified, an AWS KMS key ARN must be specified as the AwsKmsKey value. If CSE-Custom is specified, S3Object and EncryptionKeyProviderClass values must be specified. | 
| "AwsKmsKey" : "MyKeyARN" | Required only when either SSE-KMS or CSE-KMS is specified for EncryptionMode. MyKeyARN must be a fully specified ARN to a key (for example, arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012). | 
| "S3Object" : "JarLocation" | Required only when CSE-Custom is specified for EncryptionMode. JarLocation provides the location in Amazon S3 of a JAR file. The format can be a path (for example, s3://MyConfig/artifacts/MyKeyProvider.jar) or an ARN (for example, arn:aws:s3:::Code/MyKeyProvider.jar). | 
| "EncryptionKeyProviderClass" : "MyS3KeyClassID" | Required only when CSE-Custom is specified for EncryptionMode. MyS3KeyClassID specifies the full class name of a class declared in the application that implements the EncryptionMaterialsProvider interface; for example, com.mycompany.MyS3KeyProvider. | 
| **Local disk encryption parameters** |  | 
| "LocalDiskEncryptionConfiguration" : | Specifies the key provider and corresponding values to be used for local disk encryption. | 
| "EnableEbsEncryption" : true \| false | Specify true to enable EBS encryption. EBS encryption encrypts the EBS root device volume and attached storage volumes. To use EBS encryption, you must specify AwsKms as your EncryptionKeyProviderType. | 
| "EncryptionKeyProviderType" : "AwsKms" \| "Custom" | Specifies the key provider. If AwsKms is specified, an AWS KMS key ARN must be specified as the AwsKmsKey value. If Custom is specified, S3Object and EncryptionKeyProviderClass values must be specified. | 
| "AwsKmsKey" : "MyKeyARN" | Required only when AwsKms is specified for EncryptionKeyProviderType. MyKeyARN must be a fully specified ARN to a key (for example, arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-456789012123). | 
| "S3Object" : "JarLocation" | Required only when Custom is specified for EncryptionKeyProviderType. JarLocation provides the location in Amazon S3 of a JAR file. The format can be a path (for example, s3://MyConfig/artifacts/MyKeyProvider.jar) or an ARN (for example, arn:aws:s3:::Code/MyKeyProvider.jar). | 
| "EncryptionKeyProviderClass" : "MyLocalDiskKeyClassID" | Required only when Custom is specified for EncryptionKeyProviderType. MyLocalDiskKeyClassID specifies the full class name of a class declared in the application that implements the EncryptionMaterialsProvider interface; for example, com.mycompany.MyLocalDiskKeyProvider. | 
| **EMR WAL encryption parameters** |  | 
| "EMRWALEncryptionConfiguration" | Specifies the collection of values for EMR WAL encryption. | 
| "AwsKmsKey" | Specifies the ARN of the AWS KMS key used for EMR WAL encryption. | 
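Because the security configuration is passed to the CLI as a single JSON string, it can be convenient to assemble it programmatically. The following sketch builds an at-rest configuration from the parameters in the table above; `build_at_rest_config` is a hypothetical helper (not part of any AWS SDK), and the key ARN is a placeholder.

```python
import json

# Sketch: assemble the at-rest encryption settings documented above into the
# JSON that `aws emr create-security-configuration` expects for its
# --security-configuration argument. Key names follow the reference table.
def build_at_rest_config(kms_key_arn, s3_mode="SSE-KMS", enable_ebs=True):
    s3_cfg = {"EncryptionMode": s3_mode}
    # AwsKmsKey is required only for the KMS-based Amazon S3 encryption modes.
    if s3_mode in ("SSE-KMS", "CSE-KMS"):
        s3_cfg["AwsKmsKey"] = kms_key_arn
    return {
        "EncryptionConfiguration": {
            "EnableInTransitEncryption": False,
            "EnableAtRestEncryption": True,
            "AtRestEncryptionConfiguration": {
                "S3EncryptionConfiguration": s3_cfg,
                "LocalDiskEncryptionConfiguration": {
                    "EnableEbsEncryption": enable_ebs,
                    "EncryptionKeyProviderType": "AwsKms",
                    "AwsKmsKey": kms_key_arn,
                },
            },
        }
    }

arn = "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
print(json.dumps(build_at_rest_config(arn), indent=4))
```

You would then pass the printed JSON to `aws emr create-security-configuration --security-configuration '<json>'`.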

## Configure Kerberos authentication
<a name="emr-security-configuration-kerberos"></a>

A security configuration with Kerberos settings can only be used by a cluster that is created with Kerberos attributes; otherwise, an error occurs. For more information, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md). Kerberos is only available in Amazon EMR release version 5.10.0 and later.

### Specifying Kerberos settings using the console
<a name="emr-security-configuration-console-kerberos"></a>

Choose options under **Kerberos authentication** according to the following guidelines.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-create-security-configuration.html)

### Specifying Kerberos settings using the AWS CLI
<a name="emr-kerberos-cli-parameters"></a>

The following reference table shows JSON parameters for Kerberos settings in a security configuration. For example configurations, see [Configuration examples](emr-kerberos-config-examples.md).

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-create-security-configuration.html)

## Configure IAM roles for EMRFS requests to Amazon S3
<a name="emr-security-configuration-emrfs"></a>

IAM roles for EMRFS allow you to provide different permissions to EMRFS data in Amazon S3. You create mappings that specify an IAM role that is used for permissions when an access request contains an identifier that you specify. The identifier can be a Hadoop user or role, or an Amazon S3 prefix. 

For more information, see [Configure IAM roles for EMRFS requests to Amazon S3](emr-emrfs-iam-roles.md).

### Specifying IAM roles for EMRFS using the AWS CLI
<a name="w2aac30c17b9c15b7"></a>

The following is an example JSON snippet for specifying custom IAM roles for EMRFS within a security configuration. It demonstrates role mappings for the three different identifier types, followed by a parameter reference. 

```
{
  "AuthorizationConfiguration": {
    "EmrFsConfiguration": {
      "RoleMappings": [{
        "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_user1",
        "IdentifierType": "User",
        "Identifiers": [ "user1" ]
      },{
        "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_to_demo_s3_buckets",
        "IdentifierType": "Prefix",
        "Identifiers": [ "s3://amzn-s3-demo-bucket1/","s3://amzn-s3-demo-bucket2/" ]
      },{
        "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_AdminGroup",
        "IdentifierType": "Group",
        "Identifiers": [ "AdminGroup" ]
      }]
    }
  }
}
```


| Parameter | Description | 
| --- | --- | 
|  `"AuthorizationConfiguration":`  |  Required.  | 
|   `"EmrFsConfiguration":`  |  Required. Contains role mappings.  | 
|    `"RoleMappings":`  |  Required. Contains one or more role mapping definitions. Role mappings are evaluated in the top-down order that they appear. If a role mapping evaluates as true for an EMRFS call for data in Amazon S3, no further role mappings are evaluated and EMRFS uses the specified IAM role for the request. Role mappings consist of the following required parameters: | 
|    `"Role":` | Specifies the ARN identifier of an IAM role in the format `arn:aws:iam::account-id:role/role-name`. This is the IAM role that Amazon EMR assumes if the EMRFS request to Amazon S3 matches any of the `Identifiers` specified. | 
|    `"IdentifierType":` | Can be one of the following: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-create-security-configuration.html)  | 
|     `"Identifiers":`  |  Specifies one or more identifiers of the appropriate identifier type. Separate multiple identifiers by commas with no spaces.  | 
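Role mappings are evaluated top-down, and the first match wins. The following sketch illustrates that evaluation order using the mappings from the example above; `resolve_role` is a hypothetical illustration of the documented behavior, not actual EMRFS code, and the fallback role name is a placeholder.

```python
# Illustrative sketch of EMRFS role-mapping evaluation: mappings are checked
# in the order they appear, and the first mapping that matches the request
# determines the IAM role that Amazon EMR assumes.
ROLE_MAPPINGS = [
    {"Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_user1",
     "IdentifierType": "User", "Identifiers": ["user1"]},
    {"Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_to_demo_s3_buckets",
     "IdentifierType": "Prefix",
     "Identifiers": ["s3://amzn-s3-demo-bucket1/", "s3://amzn-s3-demo-bucket2/"]},
    {"Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_AdminGroup",
     "IdentifierType": "Group", "Identifiers": ["AdminGroup"]},
]

def resolve_role(user, groups, s3_path, mappings=ROLE_MAPPINGS,
                 default_role="DEFAULT_CLUSTER_ROLE"):  # placeholder fallback
    for m in mappings:
        if m["IdentifierType"] == "User" and user in m["Identifiers"]:
            return m["Role"]
        if m["IdentifierType"] == "Prefix" and any(
                s3_path.startswith(p) for p in m["Identifiers"]):
            return m["Role"]
        if m["IdentifierType"] == "Group" and any(
                g in m["Identifiers"] for g in groups):
            return m["Role"]
    return default_role  # no mapping matched

# user1 matches the first (User) mapping even though the prefix also matches.
print(resolve_role("user1", [], "s3://amzn-s3-demo-bucket2/data"))
```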

## Configure metadata service requests to Amazon EC2 instances
<a name="emr-security-configuration-imdsv2"></a>

Instance metadata is data about your instance that you can use to configure or manage the running instance. You can access instance metadata from a running instance using one of the following methods:
+ Instance Metadata Service Version 1 (IMDSv1) - a request/response method
+ Instance Metadata Service Version 2 (IMDSv2) - a session-oriented method

While Amazon EC2 supports both IMDSv1 and IMDSv2, Amazon EMR supports IMDSv2 in Amazon EMR 5.23.1, 5.27.1, 5.32 or later, and 6.2 or later. In these releases, Amazon EMR components use IMDSv2 for all IMDS calls. For IMDS calls in your application code, you can use both IMDSv1 and IMDSv2, or configure the IMDS to use only IMDSv2 for added security. When you specify that IMDSv2 must be used, IMDSv1 no longer works.

For more information, see [Configure the instance metadata service](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html) in the *Amazon EC2 User Guide*.

**Note**  
In earlier Amazon EMR 5.x and 6.x releases, turning off IMDSv1 causes cluster startup to fail because Amazon EMR components use IMDSv1 for all IMDS calls. Before you turn off IMDSv1, ensure that any custom software that uses IMDSv1 is updated to IMDSv2.

### Specifying instance metadata service configuration using the AWS CLI
<a name="w2aac30c17b9c17c13"></a>

The following is an example JSON snippet for specifying Amazon EC2 instance metadata service (IMDS) within a security configuration. Using a custom security configuration is optional.

```
{
  "InstanceMetadataServiceConfiguration" : {
      "MinimumInstanceMetadataServiceVersion": integer,
      "HttpPutResponseHopLimit": integer
   }
}
```


| Parameter | Description | 
| --- | --- | 
|  `"InstanceMetadataServiceConfiguration":`  |  If you don't specify IMDS within a security configuration and use an Amazon EMR release that requires IMDSv1, Amazon EMR defaults to using IMDSv1 as the minimum instance metadata service version. If you want to use your own configuration, both of the following parameters are required.  | 
|   `"MinimumInstanceMetadataServiceVersion":`  |  Required. Specify `1` or `2`. A value of `1` allows IMDSv1 and IMDSv2. A value of `2` allows only IMDSv2.  | 
|   `"HttpPutResponseHopLimit":`  |  Required. The desired HTTP PUT response hop limit for instance metadata requests. The larger the number, the further instance metadata requests can travel. Default: `1`. Specify an integer from `1` to `64`. | 
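Since both parameters are required and have constrained ranges, it can help to validate them before calling the CLI. The sketch below builds the IMDS block described above; `build_imds_config` is a hypothetical helper, not part of any AWS SDK.

```python
import json

# Sketch: build and validate the InstanceMetadataServiceConfiguration block
# before passing it to `aws emr create-security-configuration`.
def build_imds_config(min_version, hop_limit):
    if min_version not in (1, 2):
        raise ValueError("MinimumInstanceMetadataServiceVersion must be 1 or 2")
    if not 1 <= hop_limit <= 64:
        raise ValueError("HttpPutResponseHopLimit must be between 1 and 64")
    return {
        "InstanceMetadataServiceConfiguration": {
            "MinimumInstanceMetadataServiceVersion": min_version,
            "HttpPutResponseHopLimit": hop_limit,
        }
    }

# Require IMDSv2 only, with the default hop limit of 1.
print(json.dumps(build_imds_config(2, 1), indent=2))
```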

### Specifying instance metadata service configuration using the console
<a name="emr-security-configuration-imdsv2-console"></a>

You can configure the use of IMDS for a cluster when you launch it from the Amazon EMR console.

**To configure the use of IMDS using the console:**

1. When creating a new security configuration on the **Security configurations** page, select **Configure EC2 Instance metadata service** under the **EC2 Instance Metadata Service** setting. This configuration is supported only in Amazon EMR 5.23.1, 5.27.1, 5.32 or later, and 6.2 or later.

1. For the **Minimum Instance Metadata Service Version** option, select either:
   + **Turn off IMDSv1 and only allow IMDSv2**, if you want to allow only IMDSv2 on this cluster. See [Transition to using instance metadata service version 2](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html#instance-metadata-transition-to-version-2) in the *Amazon EC2 User Guide*.
   + **Allow both IMDSv1 and IMDSv2 on cluster**, if you want to allow IMDSv1 and session-oriented IMDSv2 on this cluster.

1. For IMDSv2, you can also configure the allowable number of network hops for the metadata token by setting the **HTTP put response hop limit** to an integer between `1` and `64`.

For more information, see [Configure the instance metadata service](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html) in the *Amazon EC2 User Guide*.

See [Configure instance details](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/launching-instance.html#configure_instance_details_step) and [Configure the instance metadata service](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configuring-instance-metadata-service.html) in the *Amazon EC2 User Guide*.

# Specify a security configuration for an Amazon EMR cluster
<a name="emr-specify-security-configuration"></a>

You can specify encryption settings when you create a cluster by specifying the security configuration. You can use the AWS Management Console or the AWS CLI.

------
#### [ Console ]

**To specify a security configuration with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. Under **Security configuration and permissions**, find the **Security configuration** field. Select the dropdown menu or choose **Browse** to select the name of a security configuration that you created previously. Alternatively, choose **Create security configuration** to create a configuration that you can use for your cluster.

1. Choose any other options that apply to your cluster.

1. To launch your cluster, choose **Create cluster**.

------
#### [ CLI ]

**To specify a security configuration with the AWS CLI**
+ Use `aws emr create-cluster` to optionally apply a security configuration with `--security-configuration MySecConfig`, where `MySecConfig` is the name of the security configuration, as shown in the following example. The `--release-label` you specify must be 4.8.0 or later, and the `--instance-type` can be any available instance type.

  ```
  aws emr create-cluster --instance-type m5.xlarge --release-label emr-5.0.0 --security-configuration MySecConfig
  ```

------

# Data protection in Amazon EMR
<a name="data-protection"></a>

The AWS [shared responsibility model](https://aws.amazon.com/compliance/shared-responsibility-model/) applies to data protection in Amazon EMR. As described in this model, AWS is responsible for protecting the global infrastructure that runs all of the AWS Cloud. You are responsible for maintaining control over your content that is hosted on this infrastructure. This content includes the security configuration and management tasks for the AWS services that you use. For more information about data privacy, see the [Data Privacy FAQ](https://aws.amazon.com/compliance/data-privacy-faq/). For information about data protection in Europe, see [the AWS shared responsibility model and GDPR](http://aws.amazon.com/blogs/security/the-aws-shared-responsibility-model-and-gdpr/) blog post on the AWS Security Blog.

For data protection purposes, we recommend that you protect AWS account credentials and set up individual accounts with AWS Identity and Access Management. That way each user is given only the permissions necessary to fulfill their job duties. We also recommend that you secure your data in the following ways:
+ Use multi-factor authentication (MFA) with each account.
+ Use TLS to communicate with AWS resources. We require TLS 1.2.
+ Set up API and user activity logging with AWS CloudTrail.
+ Use AWS encryption solutions, along with all default security controls within AWS services.
+ Use advanced managed security services such as Amazon Macie, which assists in discovering and securing personal data that is stored in Amazon S3.
+ If you require FIPS 140-2 validated cryptographic modules when accessing AWS through a command line interface or an API, use a FIPS endpoint. For more information about the available FIPS endpoints, see [Federal Information Processing Standard (FIPS) 140-2](https://aws.amazon.com/compliance/fips/).

We strongly recommend that you never put sensitive identifying information, such as your customers' account numbers, into free-form fields such as a **Name** field. This includes when you work with Amazon EMR or other AWS services using the console, API, AWS CLI, or AWS SDKs. Any data that you enter into Amazon EMR or other services might get picked up for inclusion in diagnostic logs. When you provide a URL to an external server, don't include credentials information in the URL to validate your request to that server.

# Encrypt data at rest and in transit with Amazon EMR
<a name="emr-data-encryption"></a>

Data encryption helps prevent unauthorized users from reading data on a cluster and associated data storage systems. This includes data saved to persistent media, known as data *at rest*, and data that may be intercepted as it travels the network, known as data *in transit*.

Beginning with Amazon EMR version 4.8.0, you can use Amazon EMR security configurations to configure data encryption settings for clusters more easily. Security configurations offer settings to enable security for data in transit and data at rest in Amazon Elastic Block Store (Amazon EBS) volumes and EMRFS on Amazon S3. 

Optionally, with Amazon EMR release version 4.1.0 and later, you can choose to configure transparent encryption in HDFS, which is not configured using security configurations. For more information, see [Transparent encryption in HDFS on Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-encryption-tdehdfs.html) in the *Amazon EMR Release Guide*.

**Topics**
+ [Encryption options for Amazon EMR](emr-data-encryption-options.md)
+ [Encryption at rest using a customer KMS key for the EMR WAL service](encryption-at-rest-kms.md)
+ [Create keys and certificates for data encryption with Amazon EMR](emr-encryption-enable.md)
+ [Understanding in-transit encryption](emr-encryption-support-matrix.md)

# Encryption options for Amazon EMR
<a name="emr-data-encryption-options"></a>

With Amazon EMR releases 4.8.0 and higher, you can use a security configuration to specify settings for encrypting data at rest, data in transit, or both. When you enable at-rest data encryption, you can choose to encrypt EMRFS data in Amazon S3, data in local disks, or both. Each security configuration that you create is stored in Amazon EMR rather than in the cluster configuration, so you can easily reuse a configuration to specify data encryption settings whenever you create a cluster. For more information, see [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md).

The following diagram shows the different data encryption options available with security configurations. 

![\[There are several in-transit and at-rest encryption options available with Amazon EMR.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/emr-encryption-options.png)


The following encryption options are also available and are not configured using a security configuration:
+ Optionally, with Amazon EMR versions 4.1.0 and later, you can choose to configure transparent encryption in HDFS. For more information, see [Transparent encryption in HDFS on Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-encryption-tdehdfs.html) in the *Amazon EMR Release Guide*.
+ If you are using a release version of Amazon EMR that does not support security configurations, you can configure encryption for EMRFS data in Amazon S3 manually. For more information, see [Specifying Amazon S3 encryption using EMRFS properties](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-emrfs-encryption.html).
+  If you are using an Amazon EMR version earlier than 5.24.0, an encrypted EBS root device volume is supported only when using a custom AMI. For more information, see [Creating a custom AMI with an encrypted Amazon EBS root device volume](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-custom-ami.html#emr-custom-ami-encrypted) in the *Amazon EMR Management Guide*.

**Note**  
Beginning with Amazon EMR version 5.24.0, you can use a security configuration option to encrypt EBS root device and storage volumes when you specify AWS KMS as your key provider. For more information, see [Local disk encryption](#emr-encryption-localdisk).

Data encryption requires keys and certificates. A security configuration gives you the flexibility to choose from several options, including keys managed by AWS Key Management Service, keys managed by Amazon S3, and keys and certificates from custom providers that you supply. When using AWS KMS as your key provider, charges apply for the storage and use of encryption keys. For more information, see [AWS KMS pricing](https://aws.amazon.com/kms/pricing/).

Before you specify encryption options, decide on the key and certificate management systems you want to use, so you can first create the keys and certificates or the custom providers that you specify as part of encryption settings.

## Encryption at rest for EMRFS data in Amazon S3
<a name="emr-encryption-s3"></a>

Amazon S3 encryption works with the Amazon EMR File System (EMRFS) objects read from and written to Amazon S3. You specify Amazon S3 server-side encryption (SSE) or client-side encryption (CSE) as the **Default encryption mode** when you enable encryption at rest. Optionally, you can specify different encryption methods for individual buckets using **Per bucket encryption overrides**. Regardless of whether Amazon S3 encryption is enabled, Transport Layer Security (TLS) encrypts the EMRFS objects in transit between EMR cluster nodes and Amazon S3. For more information about Amazon S3 encryption, see [Protecting data using encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingEncryption.html) in the *Amazon Simple Storage Service User Guide*.
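As an illustration of combining a default mode with per-bucket overrides, a security configuration's `S3EncryptionConfiguration` block can look like the following sketch. The bucket name and key ARN are placeholders, and the `Overrides` syntax shown here is an assumption — verify it against the current CLI reference before use.

```
"S3EncryptionConfiguration": {
    "EncryptionMode": "SSE-S3",
    "Overrides": [
        {
            "BucketName": "amzn-s3-demo-bucket1",
            "EncryptionMode": "SSE-KMS",
            "AwsKmsKey": "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
        }
    ]
}
```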

**Note**  
When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see [AWS KMS Pricing](https://aws.amazon.com/kms/pricing/).

### Amazon S3 server-side encryption
<a name="emr-encryption-s3-sse"></a>

All Amazon S3 buckets have encryption configured by default, and all new objects that are uploaded to an S3 bucket are automatically encrypted at rest. Amazon S3 encrypts data at the object level as it writes the data to disk and decrypts the data when it is accessed. For more information about SSE, see [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html) in the *Amazon Simple Storage Service User Guide*.

You can choose between two different key management systems when you specify SSE in Amazon EMR: 
+ **SSE-S3** – Amazon S3 manages keys for you.
+ **SSE-KMS** – You use an AWS KMS key that is set up with policies suitable for Amazon EMR. For more information about key requirements for Amazon EMR, see [Using AWS KMS keys for encryption](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-enable.html#emr-awskms-keys).

SSE with customer-provided keys (SSE-C) is not available for use with Amazon EMR.

### Amazon S3 client-side encryption
<a name="emr-encryption-s3-cse"></a>

With Amazon S3 client-side encryption, the Amazon S3 encryption and decryption takes place in the EMRFS client on your cluster. Objects are encrypted before being uploaded to Amazon S3 and decrypted after they are downloaded. The provider you specify supplies the encryption key that the client uses. The client can use keys provided by AWS KMS (CSE-KMS) or a custom Java class that provides the client-side root key (CSE-C). The encryption specifics are slightly different between CSE-KMS and CSE-C, depending on the specified provider and the metadata of the object being decrypted or encrypted. For more information about these differences, see [Protecting data using client-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingClientSideEncryption.html) in the *Amazon Simple Storage Service User Guide*.

**Note**  
Amazon S3 CSE only ensures that EMRFS data exchanged with Amazon S3 is encrypted; not all data on cluster instance volumes is encrypted. Furthermore, because Hue does not use EMRFS, objects that the Hue S3 File Browser writes to Amazon S3 are not encrypted.

## Encryption at rest for data in Amazon EMR WAL
<a name="emr-encryption-wal"></a>

When you set up server-side encryption (SSE) for write-ahead logging (WAL), Amazon EMR encrypts data at rest. You can choose from two different key management systems when you specify SSE in Amazon EMR:

**SSE-EMR-WAL**  
Amazon EMR manages keys for you. By default, Amazon EMR encrypts the data that you store in Amazon EMR WAL with SSE-EMR-WAL.

**SSE-KMS-WAL**  
You use an AWS KMS key to set up policies that apply to Amazon EMR WAL. For more information about configuring encryption at rest for EMR WAL using a customer KMS key, see [Encryption at rest using a customer KMS key for the EMR WAL service](https://docs.aws.amazon.com/emr/latest/ManagementGuide/encryption-at-rest-kms.html).

**Note**  
You can't use your own key with SSE when you enable WAL with Amazon EMR. For more information, see [Write-ahead logs (WAL) for Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-wal.html).

## Local disk encryption
<a name="emr-encryption-localdisk"></a>

The following mechanisms work together to encrypt local disks when you enable local disk encryption using an Amazon EMR security configuration.

### Open-source HDFS encryption
<a name="w2aac30c19c13c11c23b5"></a>

HDFS exchanges data between cluster instances during distributed processing. It also reads from and writes data to instance store volumes and the EBS volumes attached to instances. The following open-source Hadoop encryption options are activated when you enable local disk encryption:
+ [Secure Hadoop RPC](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_RPC) is set to `Privacy`, which uses Simple Authentication Security Layer (SASL). 
+ [Data encryption on HDFS block data transfer](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_Block_data_transfer.) is set to `true` and is configured to use AES 256 encryption.

**Note**  
You can activate additional Apache Hadoop encryption by enabling in-transit encryption. For more information, see [Encryption in transit](#emr-encryption-intransit). These encryption settings do not activate HDFS transparent encryption, which you can configure manually. For more information, see [Transparent encryption in HDFS on Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-encryption-tdehdfs.html) in the *Amazon EMR Release Guide*.

### Instance store encryption
<a name="w2aac30c19c13c11c23b7"></a>

For EC2 instance types that use NVMe-based SSDs as the instance store volume, NVMe encryption is used regardless of Amazon EMR encryption settings. For more information, see [NVMe SSD volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#nvme-ssd-volumes) in the *Amazon EC2 User Guide*. For other instance store volumes, Amazon EMR uses LUKS to encrypt the instance store volume when local disk encryption is enabled regardless of whether EBS volumes are encrypted using EBS encryption or LUKS.

### EBS volume encryption
<a name="w2aac30c19c13c11c23b9"></a>

If you create a cluster in a Region where Amazon EC2 encryption of EBS volumes is enabled by default for your account, EBS volumes are encrypted even if local disk encryption is not enabled. For more information, see [Encryption by default](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html#encryption-by-default) in the *Amazon EC2 User Guide*. With local disk encryption enabled in a security configuration, the Amazon EMR settings take precedence over the Amazon EC2 encryption-by-default settings for cluster EC2 instances.

The following options are available to encrypt EBS volumes using a security configuration:
+ **EBS encryption** – Beginning with Amazon EMR version 5.24.0, you can choose to enable EBS encryption. The EBS encryption option encrypts the EBS root device volume and attached storage volumes. The EBS encryption option is available only when you specify AWS Key Management Service as your key provider. We recommend using EBS encryption. 
+ **LUKS encryption** – If you choose to use LUKS encryption for Amazon EBS volumes, the LUKS encryption applies only to attached storage volumes, not to the root device volume. For more information about LUKS encryption, see the [LUKS on-disk specification](https://gitlab.com/cryptsetup/cryptsetup/wikis/Specification).

  For your key provider, you can set up an AWS KMS key with policies suitable for Amazon EMR, or a custom Java class that provides the encryption artifacts. When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see [AWS KMS pricing](https://aws.amazon.com/kms/pricing/).

**Note**  
To check whether EBS encryption is enabled on your cluster, we recommend that you use the `DescribeVolumes` API call. For more information, see [DescribeVolumes](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeVolumes.html). Running `lsblk` on the cluster only checks the status of LUKS encryption, not EBS encryption.
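As a sketch of how you might act on a `DescribeVolumes` result, the following hypothetical helper (not part of any AWS SDK) filters a dict in the `DescribeVolumes` response shape down to the volumes that aren't EBS-encrypted; retrieving the response itself (for example, with an SDK) is not shown:

```python
def unencrypted_volumes(describe_volumes_response):
    """Return the IDs of volumes that are not EBS-encrypted.

    Expects a dict in the shape of an EC2 DescribeVolumes API response.
    """
    return [
        volume["VolumeId"]
        for volume in describe_volumes_response.get("Volumes", [])
        if not volume.get("Encrypted", False)
    ]

# Sample fragment in the DescribeVolumes response shape:
sample_response = {
    "Volumes": [
        {"VolumeId": "vol-0aaa1111bbb2222cc", "Encrypted": True},
        {"VolumeId": "vol-0ddd3333eee4444ff", "Encrypted": False},
    ]
}
print(unencrypted_volumes(sample_response))  # ['vol-0ddd3333eee4444ff']
```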

## Encryption in transit
<a name="emr-encryption-intransit"></a>

Several encryption mechanisms are enabled with in-transit encryption. These are open-source features, are application-specific, and might vary by Amazon EMR release. To enable in-transit encryption, use [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md) in Amazon EMR. For EMR clusters with in-transit encryption enabled, Amazon EMR automatically configures the open-source application configurations to enable in-transit encryption. For advanced use cases, you can configure open-source application configurations directly to override the default behavior in Amazon EMR. For more information, see [in-transit encryption support matrix](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-support-matrix.html) and [Configure applications](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html).

See the following to learn more specific details about open-source applications relevant to in-transit encryption:
+ When you enable in-transit encryption with a security configuration, Amazon EMR enables in-transit encryption for all open-source application endpoints that support in-transit encryption. Support for in-transit encryption for different application endpoints varies by the Amazon EMR release version. For more information, see the [in-transit encryption support matrix](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-support-matrix.html).
+ You can override open-source configurations, which lets you do the following:
  + Disable TLS hostname verification if your user-provided TLS certificates don't meet requirements
  + Disable in-transit encryption for certain endpoints based on your performance and compatibility requirements
  + Control which TLS versions and cipher suites to use

  You can find more details about the application-specific configurations in the [in-transit encryption support matrix](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-support-matrix.html).
+ Aside from enabling in-transit encryption with a security configuration, some communication channels also require additional security configurations for you to enable in-transit encryption. For example, some open-source application endpoints use Simple Authentication and Security Layer (SASL) for in-transit encryption, which requires that Kerberos authentication is enabled in the security configuration of the EMR cluster. To learn more about these endpoints, see the [in-transit encryption support matrix](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-support-matrix.html). 
+ We recommend that you use software that supports TLS v1.2 or higher. Amazon EMR on EC2 ships with the default Corretto JDK distribution, which determines the TLS versions, cipher suites, and key sizes that are allowed for the open-source frameworks that run on Java. At this time, most open-source frameworks enforce TLS v1.2 or higher on Amazon EMR 7.0.0 and higher releases, because most open-source frameworks run on Java 17 in those releases. Older Amazon EMR release versions might support TLS v1.0 and v1.1 because they use older Java versions, but Corretto JDK might change which TLS versions Java supports, which can affect existing Amazon EMR releases.

You specify the encryption artifacts used for in-transit encryption in one of two ways: either by providing a zipped file of certificates that you upload to Amazon S3, or by referencing a custom Java class that provides encryption artifacts. For more information, see [Providing certificates for encrypting data in transit with Amazon EMR encryption](emr-encryption-enable.md#emr-encryption-certificates).
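For example, a security configuration that enables in-transit encryption with a zipped certificate bundle in Amazon S3 might look like the following sketch (the bucket name and object key are placeholders):

```
{
  "EncryptionConfiguration": {
    "EnableInTransitEncryption": true,
    "EnableAtRestEncryption": false,
    "InTransitEncryptionConfiguration": {
      "TLSCertificateConfiguration": {
        "CertificateProviderType": "PEM",
        "S3Object": "s3://amzn-s3-demo-bucket/my-certs.zip"
      }
    }
  }
}
```

For a custom provider, you instead set `CertificateProviderType` to `Custom` and reference the JAR location and the provider's full class name.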

# Encryption at rest using a customer KMS key for the EMR WAL service
<a name="encryption-at-rest-kms"></a>

EMR write-ahead logs (WAL) support encryption at rest with a customer KMS key. The following describes, at a high level, how Amazon EMR WAL integrates with AWS KMS:

EMR WAL interacts with AWS KMS during the following operations: `CreateWAL`, `AppendEdit`, `ArchiveWALCheckPoint`, `CompleteWALFlush`, `DeleteWAL`, `GetCurrentWALTime`, `ReplayEdits`, and `TrimWAL`, using the `EMR_EC2_DefaultRole` by default. When any of these operations is invoked, EMR WAL makes `Decrypt` and `GenerateDataKey` calls against the KMS key.

## Considerations
<a name="encryption-at-rest-considerations"></a>

Consider the following when using AWS KMS based encryption for EMR WAL:
+ The encryption configuration can't be changed after an EMR WAL is created.
+ When you use KMS encryption with your own KMS key, the key must exist in the same Region as your Amazon EMR cluster.
+ You are responsible for maintaining all required IAM permissions, and we recommend that you don't revoke the needed permissions during the life of the WAL. Otherwise, unexpected failures can occur, such as the inability to delete the EMR WAL because the associated encryption key is no longer available.
+ There is a cost associated with using AWS KMS keys. For more information, see [AWS Key Management Service pricing](https://aws.amazon.com/kms/pricing/).

## Required IAM permissions
<a name="encryption-at-rest-required-iam-permissions"></a>

To use your customer KMS key to encrypt EMR WAL at rest, you need to set the proper permissions for the EMR WAL client role and the EMR WAL service principal `emrwal.amazonaws.com`.

### Permissions for the EMR WAL client role
<a name="encryption-at-rest-permissions-client-role"></a>

The following is the IAM policy needed for the EMR WAL client role:

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey*"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowKMSDecrypt"
    }
  ]
}
```

------

The EMR WAL client on an EMR cluster uses `EMR_EC2_DefaultRole` by default. If you use a different role for the instance profile in the EMR cluster, make sure that each role has the appropriate permissions.

For more information about managing the role policy, refer to [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).

### Permissions for the KMS key policy
<a name="encryption-at-rest-permissions-kms-key-policy"></a>

You need to give the EMR WAL client role and the EMR WAL service principal `Decrypt` and `GenerateDataKey*` permissions in your KMS key policy. For more information about key policy management, refer to [KMS key policy](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html).

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey*"
      ],
      "Resource": [
        "arn:aws:kms:*:123456789012:key/*"
      ],
      "Sid": "AllowKMSDecrypt"
    }
  ]
}
```

------

Replace the example account ID and key ARN in the snippet with your own values. If you use a role other than the default, update the key policy accordingly.

## Monitoring Amazon EMR WAL interaction with AWS KMS
<a name="encryption-at-rest-monitoring-emr-wal-kms"></a>

### Amazon EMR WAL encryption context
<a name="encryption-at-rest-encryption-context"></a>

An encryption context is a set of key–value pairs that contains arbitrary non-secret data. When you include an encryption context in a request to encrypt data, AWS KMS cryptographically binds the encryption context to the encrypted data. To decrypt the data, you must pass in the same encryption context.

In its [GenerateDataKey](https://docs.aws.amazon.com/kms/latest/APIReference/API_GenerateDataKey.html) and [Decrypt](https://docs.aws.amazon.com/kms/latest/APIReference/API_Decrypt.html) requests to AWS KMS, Amazon EMR WAL uses an encryption context with one name–value pair that identifies the EMR WAL name.

```
"encryptionContext": {
    "aws:emrwal:walname": "111222333444555-testworkspace-emrwalclustertest-emrwaltestwalname"
}
```

You can use the encryption context to identify these cryptographic operations in audit records and logs, such as AWS CloudTrail and [Amazon CloudWatch Logs](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html), and as a condition for authorization in policies and grants.
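For example, a key policy statement might use the encryption context as an authorization condition, restricting the EMR WAL service principal to keys used for matching WAL names. This is a sketch, not a verified policy; the condition value is a placeholder pattern:

```
{
  "Sid": "AllowEMRWALWithEncryptionContext",
  "Effect": "Allow",
  "Principal": { "Service": "emrwal.amazonaws.com" },
  "Action": [
    "kms:Decrypt",
    "kms:GenerateDataKey*"
  ],
  "Resource": "*",
  "Condition": {
    "StringLike": {
      "kms:EncryptionContext:aws:emrwal:walname": "111222333444555-testworkspace-*"
    }
  }
}
```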

# Create keys and certificates for data encryption with Amazon EMR
<a name="emr-encryption-enable"></a>

Before you specify encryption options using a security configuration, decide on the provider you want to use for keys and encryption artifacts. For example, you can use AWS KMS or a custom provider that you create. Next, create the keys or key provider as described in this section.

## Providing keys for encrypting data at rest
<a name="emr-encryption-create-keys"></a>

You can use AWS Key Management Service (AWS KMS) or a custom key provider for at-rest data encryption in Amazon EMR. When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see [AWS KMS pricing](https://aws.amazon.com/kms/pricing/). 

This topic provides key policy details for a KMS key to be used with Amazon EMR, as well as guidelines and code examples for writing a custom key provider class for Amazon S3 encryption. For more information about creating keys, see [Creating keys](https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html) in the *AWS Key Management Service Developer Guide*.

### Using AWS KMS keys for encryption
<a name="emr-awskms-keys"></a>

The AWS KMS encryption key must be created in the same Region as your Amazon EMR cluster instance and the Amazon S3 buckets used with EMRFS. If the key that you specify is in a different account from the one that you use to configure a cluster, you must specify the key using its ARN.

The role for the Amazon EC2 instance profile must have permissions to use the KMS key you specify. The default role for the instance profile in Amazon EMR is `EMR_EC2_DefaultRole`. If you use a different role for the instance profile, or you use IAM roles for EMRFS requests to Amazon S3, make sure that each role is added as a key user as appropriate. This gives the role permissions to use the KMS key. For more information, see [Using Key Policies](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html#key-policy-default-allow-users) in the *AWS Key Management Service Developer Guide* and [Configure IAM roles for EMRFS requests to Amazon S3](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html).

You can use the AWS Management Console to add your instance profile or EC2 instance profile to the list of key users for the specified KMS key, or you can use the AWS CLI or an AWS SDK to attach an appropriate key policy.

Note that Amazon EMR supports only [symmetric KMS keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#symmetric-cmks). You cannot use an [asymmetric KMS key](https://docs.aws.amazon.com/kms/latest/developerguide/symmetric-asymmetric.html#asymmetric-cmks) to encrypt data at rest in an Amazon EMR cluster. For help determining whether a KMS key is symmetric or asymmetric, see [Identifying symmetric and asymmetric KMS keys](https://docs.aws.amazon.com/kms/latest/developerguide/find-symm-asymm.html).
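Because only symmetric keys are supported, you might want to validate a key before referencing it in a security configuration. The following hypothetical helper inspects a dict in the shape of a KMS `DescribeKey` response; fetching the response with an SDK is not shown:

```python
def is_symmetric_kms_key(describe_key_response):
    """Return True if the KMS key is a symmetric encryption key.

    Expects a dict in the shape of a KMS DescribeKey API response.
    Symmetric encryption KMS keys report the SYMMETRIC_DEFAULT key spec.
    """
    metadata = describe_key_response["KeyMetadata"]
    # Newer responses use KeySpec; older ones used CustomerMasterKeySpec.
    key_spec = metadata.get("KeySpec") or metadata.get("CustomerMasterKeySpec")
    return key_spec == "SYMMETRIC_DEFAULT"

sample_response = {"KeyMetadata": {"KeyId": "1234abcd-12ab-34cd-56ef-1234567890ab",
                                   "KeySpec": "SYMMETRIC_DEFAULT"}}
print(is_symmetric_kms_key(sample_response))  # True
```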

The procedure below describes how to add the default Amazon EMR instance profile, `EMR_EC2_DefaultRole`, as a *key user* using the AWS Management Console. It assumes that you have already created a KMS key. To create a new KMS key, see [Creating Keys](https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html) in the *AWS Key Management Service Developer Guide*.

**To add the EC2 instance profile for Amazon EMR to the list of encryption key users**

1. Sign in to the AWS Management Console and open the AWS Key Management Service (AWS KMS) console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms).

1. To change the AWS Region, use the Region selector in the upper-right corner of the page.

1. Select the alias of the KMS key to modify.

1. On the key details page under **Key Users**, choose **Add**.

1. In the **Add key users** dialog box, select the appropriate role. The name of the default role is `EMR_EC2_DefaultRole`.

1. Choose **Add**.
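Adding a role as a key user is equivalent to a key policy statement like the following sketch (the account ID is a placeholder; the action list mirrors the key-user permissions that the console grants):

```
{
  "Sid": "AllowUseOfTheKey",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole" },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
}
```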

### Enabling EBS encryption by providing additional permissions for KMS keys
<a name="emr-awskms-ebs-encryption"></a>

Beginning with Amazon EMR version 5.24.0, you can encrypt the EBS root device and storage volumes by using a security configuration option. To enable this option, you must specify AWS KMS as your key provider. Additionally, you must grant the service role `EMR_DefaultRole` permissions to use the AWS KMS key that you specify.

You can use the AWS Management Console to add the service role to the list of key users for the specified KMS key, or you can use the AWS CLI or an AWS SDK to attach an appropriate key policy.

The following procedure describes how to use the AWS Management Console to add the default Amazon EMR service role `EMR_DefaultRole` as a *key user*. It assumes that you have already created a KMS key. To create a new KMS key, see [Creating keys](https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html) in the *AWS Key Management Service Developer Guide*.

**To add the Amazon EMR service role to the list of encryption key users**

1. Sign in to the AWS Management Console and open the AWS Key Management Service (AWS KMS) console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms).

1. To change the AWS Region, use the Region selector in the upper-right corner of the page.

1. Choose **Customer managed keys** in the left sidebar.

1. Select the alias of the KMS key to modify.

1. On the key details page under **Key Users**, choose **Add**.

1. In the **Add key users** section, select the appropriate role. The name of the default service role for Amazon EMR is `EMR_DefaultRole`.

1. Choose **Add**.

### Creating a custom key provider
<a name="emr-custom-keys"></a>

When using a security configuration, you must specify a different provider class name for local disk encryption and Amazon S3 encryption. The requirements for a custom key provider depend on whether you use local disk encryption or Amazon S3 encryption, as well as on the Amazon EMR release version.

Depending on the type of encryption you use when creating a custom key provider, the application must also implement different EncryptionMaterialsProvider interfaces. Both interfaces are available in the AWS SDK for Java version 1.11.0 and later.
+ To implement Amazon S3 encryption, use the [com.amazonaws.services.s3.model.EncryptionMaterialsProvider interface](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/EncryptionMaterialsProvider.html).
+ To implement local disk encryption, use the [com.amazonaws.services.elasticmapreduce.spi.security.EncryptionMaterialsProvider interface](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/spi/security/EncryptionMaterialsProvider.html).

You can use any strategy to provide encryption materials for the implementation. For example, you might choose to provide static encryption materials or integrate with a more complex key management system.

If you’re using Amazon S3 encryption, you must use the encryption algorithms **AES/GCM/NoPadding** for custom encryption materials.

If you’re using local disk encryption, the encryption algorithm to use for custom encryption materials varies by EMR release. For Amazon EMR 7.0.0 and lower, you must use **AES/GCM/NoPadding**. For Amazon EMR 7.1.0 and higher, you must use **AES**.

The EncryptionMaterialsProvider class gets encryption materials by encryption context. Amazon EMR populates encryption context information at runtime to help the caller determine the correct encryption materials to return.

**Example: Using a custom key provider for Amazon S3 encryption with EMRFS**  
When Amazon EMR fetches the encryption materials from the EncryptionMaterialsProvider class to perform encryption, EMRFS optionally populates the materialsDescription argument with two fields: the Amazon S3 URI for the object and the JobFlowId of the cluster, which can be used by the EncryptionMaterialsProvider class to return encryption materials selectively.  
For example, the provider may return different keys for different Amazon S3 URI prefixes. It is the description of the returned encryption materials that is eventually stored with the Amazon S3 object rather than the materialsDescription value that is generated by EMRFS and passed to the provider. While decrypting an Amazon S3 object, the encryption materials description is passed to the EncryptionMaterialsProvider class, so that it can, again, selectively return the matching key to decrypt the object.  
An EncryptionMaterialsProvider reference implementation is provided below. Another custom provider, [EMRFSRSAEncryptionMaterialsProvider](https://github.com/awslabs/emr-sample-apps/tree/master/emrfs-plugins/EMRFSRSAEncryptionMaterialsProvider), is available from GitHub.   

```
import com.amazonaws.services.s3.model.EncryptionMaterials;
import com.amazonaws.services.s3.model.EncryptionMaterialsProvider;
import com.amazonaws.services.s3.model.KMSEncryptionMaterials;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;

import java.util.Map;

/**
 * Provides KMSEncryptionMaterials according to Configuration
 */
public class MyEncryptionMaterialsProviders implements EncryptionMaterialsProvider, Configurable {
  private Configuration conf;
  private String kmsKeyId;
  private EncryptionMaterials encryptionMaterials;

  private void init() {
    this.kmsKeyId = conf.get("my.kms.key.id");
    this.encryptionMaterials = new KMSEncryptionMaterials(kmsKeyId);
  }

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    init();
  }

  @Override
  public Configuration getConf() {
    return this.conf;
  }

  @Override
  public void refresh() {

  }

  @Override
  public EncryptionMaterials getEncryptionMaterials(Map<String, String> materialsDescription) {
    return this.encryptionMaterials;
  }

  @Override
  public EncryptionMaterials getEncryptionMaterials() {
    return this.encryptionMaterials;
  }
}
```
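To use a custom provider like the class above for Amazon S3 encryption, reference its JAR and full class name in the security configuration. The following is a sketch (the bucket name and object key are placeholders):

```
{
  "EncryptionConfiguration": {
    "EnableInTransitEncryption": false,
    "EnableAtRestEncryption": true,
    "AtRestEncryptionConfiguration": {
      "S3EncryptionConfiguration": {
        "EncryptionMode": "CSE-Custom",
        "S3Object": "s3://amzn-s3-demo-bucket/MyEncryptionMaterialsProviders.jar",
        "EncryptionKeyProviderClass": "MyEncryptionMaterialsProviders"
      }
    }
  }
}
```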

## Providing certificates for encrypting data in transit with Amazon EMR encryption
<a name="emr-encryption-certificates"></a>

With Amazon EMR release version 4.8.0 or later, you have two options for specifying artifacts for encrypting data in transit using a security configuration: 
+ You can manually create PEM certificates, include them in a .zip file, and then reference the .zip file in Amazon S3.
+ You can implement a custom certificate provider as a Java class. You specify the JAR file of the application in Amazon S3, and then provide the full class name of the provider as declared in the application. The class must implement the [TLSArtifactsProvider](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/spi/security/TLSArtifactsProvider.html) interface available beginning with the AWS SDK for Java version 1.11.0.

Amazon EMR automatically downloads artifacts to each node in the cluster and later uses them to implement the open-source, in-transit encryption features. For more information about available options, see [Encryption in transit](emr-data-encryption-options.md#emr-encryption-intransit).

### Using PEM certificates
<a name="emr-encryption-pem-certificate"></a>

When you specify a .zip file for in-transit encryption, the security configuration expects PEM files within the .zip file to be named exactly as they appear below:


**In-transit encryption certificates**  

| File name | Required/optional | Details | 
| --- | --- | --- | 
| privateKey.pem | Required | Private key | 
| certificateChain.pem | Required | Certificate chain | 
| trustedCertificates.pem | Optional | We recommend that you provide a certificate that isn't signed by the Java default trusted root certification authority (CA) or an intermediate CA that can link to the Java default trusted root CA. We don't recommend that you use public CAs when you use wildcard certificates or when you disable hostname verification. | 

You likely want to configure the private key PEM file to be a wildcard certificate that enables access to the Amazon VPC domain in which your cluster instances reside. For example, if your cluster resides in us-east-1 (N. Virginia), you could specify a common name in the certificate configuration that allows access to the cluster by specifying `CN=*.ec2.internal` in the certificate subject definition. If your cluster resides in us-west-2 (Oregon), you could specify `CN=*.us-west-2.compute.internal`.

If the provided PEM file in the encryption artifact doesn't have a wildcard character for the domain in the common name, you must change the value of `hadoop.ssl.hostname.verifier` to `ALLOW_ALL`. To do so in Amazon EMR releases 7.3.0 and higher, add the `core-site` classification when you submit configurations to a cluster. In releases lower than 7.3.0, add the configuration `"hadoop.ssl.hostname.verifier": "ALLOW_ALL"` directly into the `core-site.xml` file. This change is required because the default hostname verifier requires a hostname without the wildcard because all hosts in the cluster use it. For more information about EMR cluster configuration within an Amazon VPC, see [Configure networking in a VPC for Amazon EMR](emr-plan-vpc-subnet.md).
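For example, in Amazon EMR releases 7.3.0 and higher, you can submit the `core-site` classification like this:

```
[
  {
    "Classification": "core-site",
    "Properties": {
      "hadoop.ssl.hostname.verifier": "ALLOW_ALL"
    }
  }
]
```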

The following example demonstrates how to use [OpenSSL](https://www.openssl.org/) to generate a self-signed X.509 certificate with a 2048-bit RSA private key. The key allows access to the issuer's Amazon EMR cluster instances in the `us-west-2` (Oregon) Region as specified by the `*.us-west-2.compute.internal` domain name as the common name.

Other optional subject items, such as country (C), state (S), and Locale (L), are specified. Because a self-signed certificate is generated, the second command in the example copies the `certificateChain.pem` file to the `trustedCertificates.pem` file. The third command uses `zip` to create the `my-certs.zip` file that contains the certificates.



**Important**  
This example is a proof-of-concept demonstration only. Using self-signed certificates is not recommended and presents a potential security risk. For production systems, use a trusted certification authority (CA) to issue certificates.

```
$ openssl req -x509 -newkey rsa:2048 -keyout privateKey.pem -out certificateChain.pem -days 365 -nodes -subj '/C=US/ST=Washington/L=Seattle/O=MyOrg/OU=MyDept/CN=*.us-west-2.compute.internal'
$ cp certificateChain.pem trustedCertificates.pem
$ zip -r -X my-certs.zip certificateChain.pem privateKey.pem trustedCertificates.pem
```
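Before you upload the .zip file, you can check that it contains exactly the expected file names. The following hypothetical helper (standard library only) reports missing required files and unexpected extras:

```python
import zipfile

REQUIRED = {"privateKey.pem", "certificateChain.pem"}
OPTIONAL = {"trustedCertificates.pem"}

def check_cert_zip(path):
    """Return (missing, unexpected) file-name sets for a certificate .zip."""
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
    missing = REQUIRED - names
    unexpected = names - REQUIRED - OPTIONAL
    return missing, unexpected
```

If both returned sets are empty, the archive uses the naming that the security configuration expects.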

# Understanding in-transit encryption
<a name="emr-encryption-support-matrix"></a>

You can configure an EMR cluster to run open-source frameworks such as [Apache Spark](https://aws.amazon.com/emr/features/spark/), [Apache Hive](https://aws.amazon.com/emr/features/hive/), and [Presto](https://aws.amazon.com/emr/features/presto/). Each of these open-source frameworks has a set of processes running on the EC2 instances of a cluster, and each of these processes can host network endpoints for network communication.

If in-transit encryption is enabled on an EMR cluster, different network endpoints use different encryption mechanisms. See the following sections to learn more about the specific open-source framework network endpoints supported with in-transit encryption, the related encryption mechanisms, and which Amazon EMR release added the support. Each open-source application might also have different best practices and open-source framework configurations that you can change. 

 For the most in-transit encryption coverage, we recommend that you enable both in-transit encryption and Kerberos. If you only enable in-transit encryption, then in-transit encryption will be available only for the network endpoints that support TLS. Kerberos is necessary because some open-source framework network endpoints use Simple Authentication and Security Layer (SASL) for in-transit encryption.

Note that open-source frameworks that aren't supported in Amazon EMR 7.x releases are not included.

## Spark
<a name="emr-encryption-support-matrix-spark"></a>

When you enable in-transit encryption in security configurations, `spark.authenticate` is automatically set to `true` and uses AES-based encryption for RPC connections.

Starting with Amazon EMR 7.3.0, if you use in-transit encryption and Kerberos authentication, you can't use Spark applications that depend on the Hive metastore. Hive 3 fixes this issue in [HIVE-16340](https://issues.apache.org/jira/browse/HIVE-16340). [SPARK-44114](https://issues.apache.org/jira/browse/SPARK-44114) fully resolves this issue when open-source Spark can upgrade to Hive 3. In the meantime, you can set `hive.metastore.use.SSL` to `false` to work around this issue. For more information, see [Configure applications](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html).

For more information, see [Spark security](https://spark.apache.org/docs/latest/security) in the Apache Spark documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Spark History Server  |  spark.ssl.history.port  |  18480  |  TLS  |  emr-5.3.0+, emr-6.0.0+, emr-7.0.0+  | 
|  Spark UI  |  spark.ui.port  |  4440  |  TLS  |  emr-5.3.0+, emr-6.0.0+, emr-7.0.0+  | 
|  Spark Driver  |  spark.driver.port  |  Dynamic  |  Spark AES-based encryption  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  Spark Executor  |  Executor port (no named configuration)  |  Dynamic  |  Spark AES-based encryption  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  YARN NodeManager  |  spark.shuffle.service.port¹  |  7337  |  Spark AES-based encryption  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 

¹ `spark.shuffle.service.port` is hosted on the YARN NodeManager but is only used by Apache Spark.

**Known issue**

On clusters with in-transit encryption enabled, the `spark.yarn.historyServer.address` configuration currently uses port `18080`, which prevents access to the Spark application UI through the YARN tracking URL. **Affects versions:** Amazon EMR 7.3.0 through 7.9.0.

Use the following workaround:

1. On a running cluster, modify the `spark.yarn.historyServer.address` configuration in `/etc/spark/conf/spark-defaults.conf` to use the `HTTPS` port number `18480`.

1. Alternatively, provide the setting in a configuration override when you launch the cluster.

Example configuration:

```
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.yarn.historyServer.address": "${hadoopconf-yarn.resourcemanager.hostname}:18480"
    }
  }
]
```

## Hadoop YARN
<a name="emr-encryption-support-matrix-hadoop-yarn"></a>

[Secure Hadoop RPC](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_RPC) is set to `privacy` and uses SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration. If you don't want in-transit encryption for Hadoop RPC, configure `hadoop.rpc.protection = authentication`. We recommend that you use the default configuration for maximum security.

If your TLS certificates can't meet TLS hostname verification requirements, you can configure `hadoop.ssl.hostname.verifier = ALLOW_ALL`. We recommend that you use the default configuration of `hadoop.ssl.hostname.verifier = DEFAULT`, which enforces TLS hostname verification. 

To disable HTTPS for the YARN web application endpoints, configure `yarn.http.policy = HTTP_ONLY`. This makes it so that traffic to these endpoints stays unencrypted. We recommend that you use the default configuration for maximum security.
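If you do choose to relax these defaults, the overrides described above map to configuration classifications like the following sketch. We recommend keeping the defaults, because these overrides reduce in-transit encryption coverage:

```
[
  {
    "Classification": "core-site",
    "Properties": {
      "hadoop.rpc.protection": "authentication"
    }
  },
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.http.policy": "HTTP_ONLY"
    }
  }
]
```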

For more information, see [Hadoop in secure mode](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html) in the Apache Hadoop documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
| ResourceManager |  yarn.resourcemanager.webapp.address  |  8088  |  TLS  |  emr-7.3.0+  | 
| ResourceManager |  yarn.resourcemanager.resource-tracker.address  |  8025  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
| ResourceManager |  yarn.resourcemanager.scheduler.address  |  8030  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
| ResourceManager |  yarn.resourcemanager.address  |  8032  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
| ResourceManager |  yarn.resourcemanager.admin.address  |  8033  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
| TimelineServer |  yarn.timeline-service.address  |  10200  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
| TimelineServer |  yarn.timeline-service.webapp.address  |  8188  |  TLS  |  emr-7.3.0+  | 
|  WebApplicationProxy  |  yarn.web-proxy.address  |  20888  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  NodeManager  |  yarn.nodemanager.address  |  8041  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  NodeManager  |  yarn.nodemanager.localizer.address  |  8040  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  NodeManager  |  yarn.nodemanager.webapp.address  |  8044  |  TLS  |  emr-7.3.0+  | 
|  NodeManager  |  mapreduce.shuffle.port¹  |  13562  |  TLS  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  NodeManager  |  spark.shuffle.service.port²  |  7337  |  Spark AES-based encryption  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 

¹ `mapreduce.shuffle.port` is hosted on the YARN NodeManager but is only used by Hadoop MapReduce.

² `spark.shuffle.service.port` is hosted on the YARN NodeManager but is only used by Apache Spark.

**Known issue**

The `yarn.log.server.url` configuration currently uses HTTP with port `19888`, which prevents access to application logs from the ResourceManager UI. **Affects versions:** Amazon EMR 7.3.0 through 7.8.0.

Use the following workaround:

1. Modify the `yarn.log.server.url` configuration in `yarn-site.xml` to use the `HTTPS` protocol and port number `19890`.

1. Restart YARN Resource Manager: `sudo systemctl restart hadoop-yarn-resourcemanager.service`.

## Hadoop HDFS
<a name="emr-encryption-support-matrix-hadoop-hdfs"></a>

The Hadoop name node, data node, and journal node all support TLS by default if in-transit encryption is enabled in EMR clusters.

[Secure Hadoop RPC](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_RPC) is set to `privacy` and uses SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration.

We recommend that you don't change the default ports used for HTTPS endpoints.

[Data encryption on HDFS block transfer](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_Block_data_transfer.) uses AES-256 and requires that at-rest encryption is enabled in the security configuration.

For more information, see [Hadoop in secure mode](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html) in the Apache Hadoop documentation.
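Both in-transit and at-rest encryption are turned on through an Amazon EMR security configuration. The following sketch shows the general shape of such a configuration; the S3 certificate location and the KMS key ARN are placeholders, and the provider choices shown are one option among several:

```python
import json

# Sketch of an EMR security configuration enabling in-transit and at-rest
# encryption. The S3 object and KMS key ARN below are placeholders.
security_configuration = {
    "EncryptionConfiguration": {
        "EnableInTransitEncryption": True,
        "EnableAtRestEncryption": True,
        "InTransitEncryptionConfiguration": {
            "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": "s3://amzn-s3-demo-bucket/my-certs.zip",
            }
        },
        "AtRestEncryptionConfiguration": {
            "S3EncryptionConfiguration": {"EncryptionMode": "SSE-S3"},
            "LocalDiskEncryptionConfiguration": {
                "EncryptionKeyProviderType": "AwsKms",
                "AwsKmsKey": "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
            },
        },
    }
}

# Pass this JSON to `aws emr create-security-configuration`.
print(json.dumps(security_configuration, indent=2))
```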


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Namenode  |  dfs.namenode.https-address  |  9871  |  TLS  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  Namenode  |  dfs.namenode.rpc-address  |  8020  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  Datanode  |  dfs.datanode.https.address  |  9865  |  TLS  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  Datanode  |  dfs.datanode.address  |  9866  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  Journal Node  |  dfs.journalnode.https-address  |  8481  |  TLS  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  Journal Node  |  dfs.journalnode.rpc-address  |  8485  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 
|  DFSZKFailoverController  |  dfs.ha.zkfc.port  |  8019  |  None  |  TLS for ZKFC is only supported in Hadoop 3.4.0; see [HADOOP-18919](https://issues.apache.org/jira/browse/HADOOP-18919) for more information. Amazon EMR release 7.1.0 uses Hadoop 3.3.6; later Amazon EMR releases will move to Hadoop 3.4.0  | 

## Hadoop MapReduce
<a name="emr-encryption-support-matrix-hadoop-mapreduce"></a>

Hadoop MapReduce, job history server, and MapReduce shuffle all support TLS by default when in-transit encryption is enabled in EMR clusters.

[Hadoop MapReduce encrypted shuffle](https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html) uses TLS.

We recommend that you don't change the default ports for HTTPS endpoints.

For more information, see [Hadoop in secure mode](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html) in the Apache Hadoop documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  JobHistoryServer  |  mapreduce.jobhistory.webapp.https.address  |  19890  |  TLS  |  emr-7.3.0+  | 
|  YARN NodeManager  |  mapreduce.shuffle.port¹  |  13562  |  TLS  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  | 

¹ `mapreduce.shuffle.port` is hosted on YARN NodeManager but is only used by Hadoop MapReduce.

## Presto
<a name="emr-encryption-support-matrix-presto"></a>

In Amazon EMR releases 5.6.0 and higher, internal communication between the Presto coordinator and workers uses TLS. Amazon EMR sets up all the required configurations to enable [secure internal communication](https://prestodb.io/docs/current/security/internal-communication.html) in Presto.

If the connector uses the Hive metastore as the metadata store, communication between the coordinator and the Hive metastore is also encrypted with TLS.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Presto Coordinator  |  http-server.https.port  |  8446  |  TLS  |  emr-5.6.0+, emr-6.0.0+, emr-7.0.0+  | 
|  Presto Worker  |  http-server.https.port  |  8446  |  TLS  |  emr-5.6.0+, emr-6.0.0+, emr-7.0.0+  | 

## Trino
<a name="emr-encryption-support-matrix-trino"></a>

In Amazon EMR releases 6.1.0 and higher, internal communication between the Trino coordinator and workers uses TLS. Amazon EMR sets up all the required configurations to enable [secure internal communication](https://trino.io/docs/current/security/internal-communication.html) in Trino.

If the connector uses the Hive metastore as the metadata store, communication between the coordinator and the Hive metastore is also encrypted with TLS.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Trino Coordinator  |  http-server.https.port  |  8446  |  TLS  |  emr-6.1.0+, emr-7.0.0+  | 
|  Trino Worker  |  http-server.https.port  |  8446  |  TLS  |  emr-6.1.0+, emr-7.0.0+  | 

## Hive and Tez
<a name="emr-encryption-support-matrix-hive-tez"></a>

By default, Hive server 2, Hive metastore server, Hive LLAP Daemon web UI, and Hive LLAP shuffle all support TLS when in-transit encryption is enabled in the EMR clusters. For more information about the Hive configurations, see [Configuration properties](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties).

The Tez UI hosted on the Tomcat server is also HTTPS-enabled when in-transit encryption is enabled in the EMR cluster. However, HTTPS is disabled for the Tez AM web UI service because AM users don't have access to the keystore file needed to open an SSL listener. You can control this behavior with the Boolean configurations `tez.am.tez-ui.webservice.enable.ssl` and `tez.am.tez-ui.webservice.enable.client.auth`.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  HiveServer2  |  hive.server2.thrift.port  |  10000  |  TLS  |  emr-6.9.0+, emr-7.0.0+  | 
|  HiveServer2  |  hive.server2.thrift.http.port  |  10001  |  TLS  |  emr-6.9.0+, emr-7.0.0+  | 
|  HiveServer2  |  hive.server2.webui.port  |  10002  |  TLS  |  emr-7.3.0+  | 
|  HiveMetastoreServer  |  hive.metastore.port  |  9083  |  TLS  |  emr-7.3.0+  | 
|  LLAP Daemon  |  hive.llap.daemon.yarn.shuffle.port  |  15551  |  TLS  |  emr-7.3.0+  | 
|  LLAP Daemon  |  hive.llap.daemon.web.port  |  15002  |  TLS  |  emr-7.3.0+  | 
|  LLAP Daemon  |  hive.llap.daemon.output.service.port  |  15003  |  None  |  Hive doesn't support in-transit encryption for this endpoint  | 
|  LLAP Daemon  |  hive.llap.management.rpc.port  |  15004  |  None  |  Hive doesn't support in-transit encryption for this endpoint  | 
|  LLAP Daemon  |  hive.llap.plugin.rpc.port  |  Dynamic  |  None  |  Hive doesn't support in-transit encryption for this endpoint  | 
|  LLAP Daemon  |  hive.llap.daemon.rpc.port  |  Dynamic  |  None  |  Hive doesn't support in-transit encryption for this endpoint  | 
|  WebHCat  |  templeton.port  |  50111  |  TLS  |  emr-7.3.0+  | 
|  Tez Application Master  |  tez.am.client.am.port-range, tez.am.task.am.port-range  |  Dynamic  |  None  |  Tez doesn't support in-transit encryption for this endpoint  | 
|  Tez Application Master  |  tez.am.tez-ui.webservice.port-range  |  Dynamic  |  None  |  Disabled by default. Can be enabled using Tez configurations in emr-7.3.0+  | 
|  Tez Task  |  N/A - not configurable  |  Dynamic  |  None  |  Tez doesn't support in-transit encryption for this endpoint  | 
|  Tez UI  |  Configurable via the Tomcat server on which the Tez UI is hosted  |  8080  |  TLS  |  emr-7.3.0+  | 

## Flink
<a name="emr-encryption-support-matrix-flink"></a>

Apache Flink REST endpoints and internal communication between Flink processes support TLS by default when you enable in-transit encryption in EMR clusters.

[`security.ssl.internal.enabled`](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#security-ssl-internal-enabled) is set to `true` and uses in-transit encryption for internal communication between the Flink processes. If you don't want in-transit encryption for internal communication, disable that configuration. We recommend you use the default configuration for maximum security.

Amazon EMR sets [`security.ssl.rest.enabled`](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#security-ssl-rest-enabled) to `true` and uses in-transit encryption for the REST endpoints. Additionally, Amazon EMR sets [`historyserver.web.ssl.enabled`](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#historyserver-web-ssl-enabled) to `true` to use TLS communication with the Flink history server. If you don't want in-transit encryption for the REST endpoints, disable these configurations. We recommend you use the default configuration for maximum security.

Amazon EMR uses [`security.ssl.algorithms`](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#security-ssl-algorithms) to specify the list of ciphers that use AES-based encryption. Override this configuration to use the ciphers you want.
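For example, the cipher list could be narrowed with a `flink-conf` configuration classification at cluster launch. The sketch below assumes that mechanism; the suite names shown are standard TLS 1.3 ciphers chosen only for illustration:

```python
import json

# Hypothetical flink-conf override restricting Flink's TLS cipher suites.
# The two suites listed are standard TLS 1.3 AES-GCM ciphers.
flink_cipher_override = [
    {
        "Classification": "flink-conf",
        "Properties": {
            "security.ssl.algorithms": "TLS_AES_128_GCM_SHA256,TLS_AES_256_GCM_SHA384"
        },
    }
]

print(json.dumps(flink_cipher_override, indent=2))
```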

For more information, see [SSL Setup](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/security/security-ssl/) in the Flink documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Flink History Server  |  historyserver.web.port  |  8082  |  TLS  |  emr-7.3.0+  | 
|  Job Manager Rest Server  |  rest.bind-port, rest.port  |  Dynamic  |  TLS  |  emr-7.3.0+  | 

## HBase
<a name="emr-encryption-support-matrix-hbase"></a>

Amazon EMR sets [Secure Hadoop RPC](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_RPC) to `privacy`. HMaster and RegionServer use SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration.

Amazon EMR sets `hbase.ssl.enabled` to `true` and uses TLS for UI endpoints. If you don't want to use TLS for UI endpoints, disable this configuration. We recommend that you use the default configuration for maximum security.

Amazon EMR sets `hbase.rest.ssl.enabled` and `hbase.thrift.ssl.enabled` to `true` and uses TLS for the REST and Thrift server endpoints, respectively. If you don't want to use TLS for these endpoints, disable these configurations. We recommend that you use the default configuration for maximum security.

Starting with EMR 7.6.0, TLS is supported on HMaster and RegionServer endpoints. Amazon EMR sets `hbase.server.netty.tls.enabled` and `hbase.client.netty.tls.enabled` to `true`. If you don't want to use TLS for these endpoints, disable these configurations. We recommend that you use the default configuration, which provides encryption and thus higher security. To learn more, see [Transport Level Security (TLS) in HBase RPC communication](https://hbase.apache.org/book.html#_transport_level_security_tls_in_hbase_rpc_communication) in the *Apache HBase Reference Guide*.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  HMaster  |  HMaster  |  16000  |  SASL + Kerberos, TLS  |  SASL + Kerberos in emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, and emr-7.0.0+; TLS in emr-7.6.0+  | 
|  HMaster  |  HMaster UI  |  16010  |  TLS  |  emr-7.3.0+  | 
|  RegionServer  |  RegionServer  |  16020  |  SASL + Kerberos, TLS  |  SASL + Kerberos in emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, and emr-7.0.0+; TLS in emr-7.6.0+  | 
|  RegionServer  |  RegionServer Info  |  16030  |  TLS  |  emr-7.3.0+  | 
|  HBase Rest Server  |  Rest Server  |  8070  |  TLS  |  emr-7.3.0+  | 
|  HBase Rest Server  |  Rest UI  |  8085  |  TLS  |  emr-7.3.0+  | 
|  HBase Thrift Server  |  Thrift Server  |  9090  |  TLS  |  emr-7.3.0+  | 
|  HBase Thrift Server  |  Thrift Server UI  |  9095  |  TLS  |  emr-7.3.0+  | 

## Phoenix
<a name="emr-encryption-support-matrix-phoenix"></a>

If you enabled in-transit encryption in your EMR cluster, Phoenix Query Server supports TLS through the property `phoenix.queryserver.tls.enabled`, which is set to `true` by default.

To learn more, see [Configurations relating to HTTPS](https://phoenix.apache.org/docs/features/query-server#query-server-configuration) in the Phoenix Query Server documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Query Server  |  phoenix.queryserver.http.port  |  8765  |  TLS  |  emr-7.3.0+  | 

## Oozie
<a name="emr-encryption-support-matrix-oozie"></a>

[OOZIE-3673](https://issues.apache.org/jira/browse/OOZIE-3673) is available on Amazon EMR if you run Oozie on Amazon EMR 7.3.0 and higher. If you need to configure custom SSL or TLS protocols when you run an email action, you can set the property `oozie.email.smtp.ssl.protocols` in the `oozie-site.xml` file. By default, if you enable in-transit encryption, Amazon EMR uses the TLS v1.3 protocol.

[OOZIE-3677](https://issues.apache.org/jira/browse/OOZIE-3677) and [OOZIE-3674](https://issues.apache.org/jira/browse/OOZIE-3674) are also available on Amazon EMR if you run Oozie on Amazon EMR 7.3.0 and higher. OOZIE-3677 lets you specify the properties `keyStoreType` and `trustStoreType` in `oozie-site.xml`, and OOZIE-3674 adds the `--insecure` parameter to the Oozie client so it can ignore certificate errors.

Oozie enforces TLS hostname verification, which means that any certificate you use for in-transit encryption must meet hostname verification requirements. If the certificate doesn't meet the criteria, the cluster might get stuck at the `oozie share lib update` stage when Amazon EMR provisions the cluster. We recommend that you update your certificates to make sure they're compliant with hostname verification. However, if you can't update the certificates, you can disable SSL for Oozie by setting the `oozie.https.enabled` property to `false` in cluster configuration. 
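If you do need to disable SSL for Oozie, the property can be set through the `oozie-site` classification when you create the cluster. A minimal sketch follows; keep in mind that disabling TLS reduces security and should be a last resort:

```python
# Hypothetical oozie-site override that turns off Oozie's HTTPS listener when
# certificates can't pass hostname verification. Prefer compliant certificates.
oozie_ssl_off = [
    {
        "Classification": "oozie-site",
        "Properties": {"oozie.https.enabled": "false"},
    }
]

# Pass this list as the --configurations argument to `aws emr create-cluster`.
print(oozie_ssl_off)
```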


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  EmbeddedOozieServer  |  oozie.https.port  |  11443  |  TLS  |  emr-7.3.0+  | 
|  EmbeddedOozieServer  |  oozie.email.smtp.port  |  25  |  TLS  |  emr-7.3.0+  | 

## Hue
<a name="emr-encryption-support-matrix-hue"></a>

By default, Hue supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information about Hue configurations, see [Configure Hue with HTTPS / SSL](https://gethue.com/configure-hue-with-https-ssl/). 


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Hue  |  http_port  |  8888  |  TLS  |  emr-7.4.0+  | 

## Livy
<a name="emr-encryption-support-matrix-livy"></a>

By default, Livy supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information about Livy configurations, see [Enabling HTTPS with Apache Livy](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/enabling-https.html).

Starting with Amazon EMR 7.3.0, if you use in-transit encryption and Kerberos authentication, you can't use the Livy server for Spark applications that depend on the Hive metastore. This issue is fixed in [HIVE-16340](https://issues.apache.org/jira/browse/HIVE-16340) and is fully resolved in [SPARK-44114](https://issues.apache.org/jira/browse/SPARK-44114) when the open-source Spark application can upgrade to Hive 3. In the meantime, you can work around this issue if you set `hive.metastore.use.SSL` to `false`. For more information, see [Configure applications](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html).


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  livy-server  |  livy.server.port  |  8998  |  TLS  |  emr-7.4.0+  | 

## JupyterEnterpriseGateway
<a name="emr-encryption-matrix-jupyter-enterprise"></a>

By default, Jupyter Enterprise Gateway supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information about the Jupyter Enterprise Gateway configurations, see [Securing Enterprise Gateway Server](https://jupyter-enterprise-gateway.readthedocs.io/en/v1.2.0/getting-started-security.html#securing-enterprise-gateway-server).


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  jupyter_enterprise_gateway  |  c.EnterpriseGatewayApp.port  |  9547  |  TLS  |  emr-7.4.0+  | 

## JupyterHub
<a name="emr-encryption-matrix-jupyter-hub"></a>

By default, JupyterHub supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information, see [Enabling SSL encryption](https://jupyterhub.readthedocs.io/en/latest/tutorial/getting-started/security-basics.html#enabling-ssl-encryption) in the JupyterHub documentation. We recommend that you don't disable encryption.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  jupyter_hub  |  c.JupyterHub.port  |  9443  |  TLS  |  emr-5.14.0+, emr-6.0.0+, emr-7.0.0+  | 

## Zeppelin
<a name="emr-encryption-matrix-zeppelin"></a>

By default, Zeppelin supports TLS when you enable in-transit encryption in your EMR cluster. For more information about the Zeppelin configurations, see [SSL Configuration](https://zeppelin.apache.org/docs/0.11.1/setup/operation/configuration.html#ssl-configuration) in the Zeppelin documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  zeppelin  |  zeppelin.server.ssl.port  |  8890  |  TLS  |  emr-7.3.0+  | 

## Zookeeper
<a name="emr-encryption-matrix-zookeeper"></a>

Amazon EMR sets `serverCnxnFactory` to `org.apache.zookeeper.server.NettyServerCnxnFactory` to enable TLS for the Zookeeper quorum and client communication.

`secureClientPort` specifies the port that listens for TLS connections. If a client doesn't support TLS connections to Zookeeper, it can connect to the insecure port 2181, specified in `clientPort`. You can override or disable these two ports.

Amazon EMR sets both `sslQuorum` and `admin.forceHttps` to `true` to enable TLS communication for the quorum and admin server. If you don't want in-transit encryption for the quorum and the admin server, you can disable those configurations. We recommend that you use the default configurations for maximum security.
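Taken together, these defaults correspond to a `zoo.cfg` along the following lines. This is a sketch only; the keystore and truststore paths are placeholders that vary by cluster:

```
# TLS-related Zookeeper settings as applied by the defaults described above
serverCnxnFactory=org.apache.zookeeper.server.NettyServerCnxnFactory
clientPort=2181
secureClientPort=2281
sslQuorum=true
admin.forceHttps=true
ssl.keyStore.location=/path/to/keystore.jks
ssl.trustStore.location=/path/to/truststore.jks
```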

For more information, see [Encryption, Authentication, Authorization Options](https://zookeeper.apache.org/doc/r3.9.2/zookeeperAdmin.html#sc_authOptions) in the Zookeeper documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Zookeeper Server  |  secureClientPort  |  2281  |  TLS  |  emr-7.4.0+  | 
|  Zookeeper Server  |  Quorum ports  |  2888 (followers connect to the leader) and 3888 (leader election)  |  TLS  |  emr-7.4.0+  | 
|  Zookeeper Server  |  admin.serverPort  |  8341  |  TLS  |  emr-7.4.0+  | 

# AWS Identity and Access Management for Amazon EMR
<a name="emr-plan-access-iam"></a>

AWS Identity and Access Management (IAM) is an AWS service that helps an administrator securely control access to AWS resources. IAM administrators control who can be *authenticated* (signed in) and *authorized* (have permissions) to use Amazon EMR resources. IAM is an AWS service that you can use with no additional charge.

**Topics**
+ [Audience](#security_iam_audience)
+ [Authenticating with identities](#security_iam_authentication)
+ [Managing access using policies](#security_iam_access-manage)
+ [How Amazon EMR works with IAM](security_iam_service-with-iam.md)
+ [Runtime roles for Amazon EMR steps](emr-steps-runtime-roles.md)
+ [Configure IAM service roles for Amazon EMR permissions to AWS services and resources](emr-iam-roles.md)
+ [Amazon EMR identity-based policy examples](security_iam_id-based-policy-examples.md)

## Audience
<a name="security_iam_audience"></a>

How you use AWS Identity and Access Management (IAM) differs based on your role:
+ **Service user** - request permissions from your administrator if you cannot access features (see [Troubleshooting Amazon EMR identity and access](security_iam_troubleshoot.md))
+ **Service administrator** - determine user access and submit permission requests (see [How Amazon EMR works with IAM](security_iam_service-with-iam.md))
+ **IAM administrator** - write policies to manage access (see [Amazon EMR identity-based policy examples](security_iam_id-based-policy-examples.md))

## Authenticating with identities
<a name="security_iam_authentication"></a>

Authentication is how you sign in to AWS using your identity credentials. You must be authenticated as the AWS account root user, an IAM user, or by assuming an IAM role.

You can sign in as a federated identity using credentials from an identity source like AWS IAM Identity Center (IAM Identity Center), single sign-on authentication, or Google/Facebook credentials. For more information about signing in, see [How to sign in to your AWS account](https://docs.aws.amazon.com/signin/latest/userguide/how-to-sign-in.html) in the *AWS Sign-In User Guide*.

For programmatic access, AWS provides an SDK and CLI to cryptographically sign requests. For more information, see [AWS Signature Version 4 for API requests](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_sigv.html) in the *IAM User Guide*.

### AWS account root user
<a name="security_iam_authentication-rootuser"></a>

 When you create an AWS account, you begin with one sign-in identity called the AWS account *root user* that has complete access to all AWS services and resources. We strongly recommend that you don't use the root user for everyday tasks. For tasks that require root user credentials, see [Tasks that require root user credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_root-user.html#root-user-tasks) in the *IAM User Guide*. 

### Federated identity
<a name="security_iam_authentication-federated"></a>

As a best practice, require human users to use federation with an identity provider to access AWS services using temporary credentials.

A *federated identity* is a user from your enterprise directory, web identity provider, or Directory Service that accesses AWS services using credentials from an identity source. Federated identities assume roles that provide temporary credentials.

For centralized access management, we recommend AWS IAM Identity Center. For more information, see [What is IAM Identity Center?](https://docs.aws.amazon.com/singlesignon/latest/userguide/what-is.html) in the *AWS IAM Identity Center User Guide*.

### IAM users and groups
<a name="security_iam_authentication-iamuser"></a>

An *[IAM user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html)* is an identity with specific permissions for a single person or application. We recommend using temporary credentials instead of IAM users with long-term credentials. For more information, see [Require human users to use federation with an identity provider to access AWS using temporary credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#bp-users-federation-idp) in the *IAM User Guide*.

An *[IAM group](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_groups.html)* specifies a collection of IAM users and makes permissions easier to manage for large sets of users. For more information, see [Use cases for IAM users](https://docs.aws.amazon.com/IAM/latest/UserGuide/gs-identities-iam-users.html) in the *IAM User Guide*.

### IAM roles
<a name="security_iam_authentication-iamrole"></a>

An *[IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html)* is an identity with specific permissions that provides temporary credentials. You can assume a role by [switching from a user to an IAM role (console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-console.html) or by calling an AWS CLI or AWS API operation. For more information, see [Methods to assume a role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_manage-assume.html) in the *IAM User Guide*.

IAM roles are useful for federated user access, temporary IAM user permissions, cross-account access, cross-service access, and applications running on Amazon EC2. For more information, see [Cross account resource access in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies-cross-account-resource-access.html) in the *IAM User Guide*.

## Managing access using policies
<a name="security_iam_access-manage"></a>

You control access in AWS by creating policies and attaching them to AWS identities or resources. A policy defines permissions when associated with an identity or resource. AWS evaluates these policies when a principal makes a request. Most policies are stored in AWS as JSON documents. For more information about JSON policy documents, see [Overview of JSON policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html#access_policies-json) in the *IAM User Guide*.

Using policies, administrators specify who has access to what by defining which **principal** can perform **actions** on what **resources**, and under what **conditions**.

By default, users and roles have no permissions. An IAM administrator creates IAM policies and adds them to roles, which users can then assume. IAM policies define permissions regardless of the method used to perform the operation.

### Identity-based policies
<a name="security_iam_access-manage-id-based-policies"></a>

Identity-based policies are JSON permissions policy documents that you attach to an identity (user, group, or role). These policies control what actions identities can perform, on which resources, and under what conditions. To learn how to create an identity-based policy, see [Define custom IAM permissions with customer managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) in the *IAM User Guide*.
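As a sketch, an identity-based policy that grants read-only visibility into clusters might look like the following. The action names are Amazon EMR actions; the particular selection and the `Sid` are illustrative only:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EmrReadOnlyExample",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:ListSteps"
      ],
      "Resource": "*"
    }
  ]
}
```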

Identity-based policies can be *inline policies* (embedded directly into a single identity) or *managed policies* (standalone policies attached to multiple identities). To learn how to choose between managed and inline policies, see [Choose between managed policies and inline policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies-choosing-managed-or-inline.html) in the *IAM User Guide*.

### Resource-based policies
<a name="security_iam_access-manage-resource-based-policies"></a>

Resource-based policies are JSON policy documents that you attach to a resource. Examples include IAM *role trust policies* and Amazon S3 *bucket policies*. In services that support resource-based policies, service administrators can use them to control access to a specific resource. You must [specify a principal](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html) in a resource-based policy.

Resource-based policies are inline policies that are located in that service. You can't use AWS managed policies from IAM in a resource-based policy.

### Other policy types
<a name="security_iam_access-manage-other-policies"></a>

AWS supports additional policy types that can set the maximum permissions granted by more common policy types:
+ **Permissions boundaries** – Set the maximum permissions that an identity-based policy can grant to an IAM entity. For more information, see [Permissions boundaries for IAM entities](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_boundaries.html) in the *IAM User Guide*.
+ **Service control policies (SCPs)** – Specify the maximum permissions for an organization or organizational unit in AWS Organizations. For more information, see [Service control policies](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html) in the *AWS Organizations User Guide*.
+ **Resource control policies (RCPs)** – Set the maximum available permissions for resources in your accounts. For more information, see [Resource control policies (RCPs)](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_rcps.html) in the *AWS Organizations User Guide*.
+ **Session policies** – Advanced policies passed as a parameter when creating a temporary session for a role or federated user. For more information, see [Session policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html#policies_session) in the *IAM User Guide*.

### Multiple policy types
<a name="security_iam_access-manage-multiple-policies"></a>

When multiple types of policies apply to a request, the resulting permissions are more complicated to understand. To learn how AWS determines whether to allow a request when multiple policy types are involved, see [Policy evaluation logic](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_evaluation-logic.html) in the *IAM User Guide*.

# How Amazon EMR works with IAM
<a name="security_iam_service-with-iam"></a>

Before you use IAM to manage access to Amazon EMR, learn what IAM features are available to use with Amazon EMR.


**IAM features you can use with Amazon EMR**  

| IAM feature | Amazon EMR support | 
| --- | --- | 
|  [Identity-based policies](#security_iam_service-with-iam-id-based-policies)  |   Yes  | 
|  [Resource-based policies](#security_iam_service-with-iam-resource-based-policies)  |   Yes  | 
|  [Policy actions](#security_iam_service-with-iam-id-based-policies-actions)  |   Yes  | 
|  [Policy resources](#security_iam_service-with-iam-id-based-policies-resources)  |   Yes  | 
|  [Policy condition keys](#security_iam_service-with-iam-id-based-policies-conditionkeys)  |   Yes  | 
|  [ACLs](#security_iam_service-with-iam-acls)  |   No   | 
|  [ABAC (tags in policies)](#security_iam_service-with-iam-tags)  |  Yes  | 
|  [Temporary credentials](#security_iam_service-with-iam-roles-tempcreds)  |   Yes  | 
|  [Principal permissions](#security_iam_service-with-iam-principal-permissions)  |   Yes  | 
|  [Service roles](#security_iam_service-with-iam-roles-service)  | No | 
|  [Service-linked roles](#security_iam_service-with-iam-roles-service-linked)  |  Yes  | 

To get a high-level view of how Amazon EMR and other AWS services work with most IAM features, see [AWS services that work with IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_aws-services-that-work-with-iam.html) in the *IAM User Guide*.

## Identity-based policies for Amazon EMR
<a name="security_iam_service-with-iam-id-based-policies"></a>

**Supports identity-based policies:** Yes

Identity-based policies are JSON permissions policy documents that you can attach to an identity, such as an IAM user, group of users, or role. These policies control what actions users and roles can perform, on which resources, and under what conditions. To learn how to create an identity-based policy, see [Define custom IAM permissions with customer managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) in the *IAM User Guide*.

With IAM identity-based policies, you can specify allowed or denied actions and resources as well as the conditions under which actions are allowed or denied. To learn about all of the elements that you can use in a JSON policy, see [IAM JSON policy elements reference](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements.html) in the *IAM User Guide*.
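As a sketch, the following identity-based policy grants read-only access to cluster information. The action names are published Amazon EMR actions, but the policy itself is illustrative rather than a recommended baseline:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "EMRReadOnlySketch",
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:DescribeCluster",
                "elasticmapreduce:ListClusters",
                "elasticmapreduce:ListSteps"
            ],
            "Resource": "*"
        }
    ]
}
```

You can attach a policy like this to an IAM user, group, or role to grant list and describe access without permitting mutating actions.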

### Identity-based policy examples for Amazon EMR
<a name="security_iam_service-with-iam-id-based-policies-examples"></a>



To view examples of Amazon EMR identity-based policies, see [Amazon EMR identity-based policy examples](security_iam_id-based-policy-examples.md).

## Resource-based policies within Amazon EMR
<a name="security_iam_service-with-iam-resource-based-policies"></a>

**Supports resource-based policies:** Yes

Resource-based policies are JSON policy documents that you attach to a resource. Examples of resource-based policies are IAM *role trust policies* and Amazon S3 *bucket policies*. In services that support resource-based policies, service administrators can use them to control access to a specific resource. For the resource where the policy is attached, the policy defines what actions a specified principal can perform on that resource and under what conditions. You must [specify a principal](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_principal.html) in a resource-based policy. Principals can include accounts, users, roles, federated users, or AWS services.

To enable cross-account access, you can specify an entire account or IAM entities in another account as the principal in a resource-based policy. For more information, see [Cross account resource access in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies-cross-account-resource-access.html) in the *IAM User Guide*.

## Policy actions for Amazon EMR
<a name="security_iam_service-with-iam-id-based-policies-actions"></a>

**Supports policy actions:** Yes

Administrators can use AWS JSON policies to specify who has access to what. That is, which **principal** can perform **actions** on what **resources**, and under what **conditions**.

The `Action` element of a JSON policy describes the actions that you can use to allow or deny access in a policy. Include actions in a policy to grant permissions to perform the associated operation.



To see a list of Amazon EMR actions, see [Actions, resources, and condition keys for Amazon EMR](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelasticmapreduce.html) in the *Service Authorization Reference*.

Policy actions in Amazon EMR use the following prefix before the action:

```
elasticmapreduce
```

To specify multiple actions in a single statement, separate them with commas.

```
"Action": [
      "elasticmapreduce:action1",
      "elasticmapreduce:action2"
]
```





To view examples of Amazon EMR identity-based policies, see [Amazon EMR identity-based policy examples](security_iam_id-based-policy-examples.md).

## Policy resources for Amazon EMR
<a name="security_iam_service-with-iam-id-based-policies-resources"></a>

**Supports policy resources:** Yes

Administrators can use AWS JSON policies to specify who has access to what. That is, which **principal** can perform **actions** on what **resources**, and under what **conditions**.

The `Resource` JSON policy element specifies the object or objects to which the action applies. As a best practice, specify a resource using its [Amazon Resource Name (ARN)](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html). For actions that don't support resource-level permissions, use a wildcard (`*`) to indicate that the statement applies to all resources.

```
"Resource": "*"
```

To see a list of Amazon EMR resource types and their ARNs, see [Resources Defined by Amazon EMR](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelasticmapreduce.html#amazonelasticmapreduce-resources-for-iam-policies) in the *Service Authorization Reference*. To learn with which actions you can specify the ARN of each resource, see [Actions, resources, and condition keys for Amazon EMR](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelasticmapreduce.html).
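For example, a statement can scope an action to a single cluster by its ARN. The following sketch assumes the cluster ARN format documented in the Service Authorization Reference; the Region, account ID, and cluster ID are placeholders:

```
{
    "Sid": "DescribeOneCluster",
    "Effect": "Allow",
    "Action": "elasticmapreduce:DescribeCluster",
    "Resource": "arn:aws:elasticmapreduce:us-east-1:111122223333:cluster/j-1234567890ABC"
}
```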





To view examples of Amazon EMR identity-based policies, see [Amazon EMR identity-based policy examples](security_iam_id-based-policy-examples.md).

## Policy condition keys for Amazon EMR
<a name="security_iam_service-with-iam-id-based-policies-conditionkeys"></a>

**Supports service-specific policy condition keys:** Yes

Administrators can use AWS JSON policies to specify who has access to what. That is, which **principal** can perform **actions** on what **resources**, and under what **conditions**.

The `Condition` element specifies when statements execute based on defined criteria. You can create conditional expressions that use [condition operators](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_condition_operators.html), such as equals or less than, to match the condition in the policy with values in the request. To see all AWS global condition keys, see [AWS global condition context keys](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html) in the *IAM User Guide*.

To see a list of Amazon EMR condition keys and to learn the actions and resources with which you can use a condition key, see [Actions, resources, and condition keys for Amazon EMR](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelasticmapreduce.html) in the *Service Authorization Reference*.
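For example, the following statement sketch uses the `elasticmapreduce:ResourceTag` condition key to allow an action only on clusters with a matching tag; the `department` key and `analytics` value are hypothetical:

```
{
    "Sid": "AllowOnTaggedClusters",
    "Effect": "Allow",
    "Action": "elasticmapreduce:DescribeCluster",
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "elasticmapreduce:ResourceTag/department": "analytics"
        }
    }
}
```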

To view examples of Amazon EMR identity-based policies, see [Amazon EMR identity-based policy examples](security_iam_id-based-policy-examples.md).

## Access control lists (ACLs) in Amazon EMR
<a name="security_iam_service-with-iam-acls"></a>

**Supports ACLs:** No 

Access control lists (ACLs) control which principals (account members, users, or roles) have permissions to access a resource. ACLs are similar to resource-based policies, although they do not use the JSON policy document format.

## Attribute-based access control (ABAC) with Amazon EMR
<a name="security_iam_service-with-iam-tags"></a>


**Supports ABAC (tags in policies):** Yes

Attribute-based access control (ABAC) is an authorization strategy that defines permissions based on attributes called tags. You can attach tags to IAM entities and AWS resources, then design ABAC policies to allow operations when the principal's tag matches the tag on the resource.

To control access based on tags, you provide tag information in the [condition element](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_condition.html) of a policy using the `aws:ResourceTag/key-name`, `aws:RequestTag/key-name`, or `aws:TagKeys` condition keys.

If a service supports all three condition keys for every resource type, then the value is **Yes** for the service. If a service supports all three condition keys for only some resource types, then the value is **Partial**.

For more information about ABAC, see [Define permissions with ABAC authorization](https://docs.aws.amazon.com/IAM/latest/UserGuide/introduction_attribute-based-access-control.html) in the *IAM User Guide*. To view a tutorial with steps for setting up ABAC, see [Use attribute-based access control (ABAC)](https://docs.aws.amazon.com/IAM/latest/UserGuide/tutorial_attribute-based-access-control.html) in the *IAM User Guide*.
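As an ABAC sketch, the following statement allows an action only when the resource's tag value matches the same tag on the calling principal; the `project` tag key is a hypothetical example:

```
{
    "Sid": "MatchProjectTag",
    "Effect": "Allow",
    "Action": "elasticmapreduce:AddJobFlowSteps",
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "aws:ResourceTag/project": "${aws:PrincipalTag/project}"
        }
    }
}
```

With this pattern, you grant new teams access by tagging principals and resources rather than editing policies.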

## Using temporary credentials with Amazon EMR
<a name="security_iam_service-with-iam-roles-tempcreds"></a>

**Supports temporary credentials:** Yes

Temporary credentials provide short-term access to AWS resources and are automatically created when you use federation or switch roles. AWS recommends that you dynamically generate temporary credentials instead of using long-term access keys. For more information, see [Temporary security credentials in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp.html) and [AWS services that work with IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_aws-services-that-work-with-iam.html) in the *IAM User Guide*.

## Cross-service principal permissions for Amazon EMR
<a name="security_iam_service-with-iam-principal-permissions"></a>

**Supports forward access sessions (FAS):** Yes

Forward access sessions (FAS) use the permissions of the principal calling an AWS service, combined with the permissions of the requesting AWS service, to make requests to downstream services. For policy details when making FAS requests, see [Forward access sessions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_forward_access_sessions.html).

## Service roles for Amazon EMR
<a name="security_iam_service-with-iam-roles-service"></a>


**Supports service roles:** No

## Service-linked roles for Amazon EMR
<a name="security_iam_service-with-iam-roles-service-linked"></a>


**Supports service-linked roles:** Yes

For details about creating or managing service-linked roles, see [AWS services that work with IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_aws-services-that-work-with-iam.html). Find a service in the table that includes a `Yes` in the **Service-linked role** column. Choose the **Yes** link to view the service-linked role documentation for that service.

## Use cluster and Notebook tags with IAM policies for access control
<a name="emr-tag-based-access"></a>

You can fine-tune permissions for Amazon EMR actions associated with EMR Notebooks and EMR clusters by using tag-based access control with identity-based IAM policies. You can use *condition keys* within a `Condition` element (also called a `Condition` block) to allow certain actions only when a notebook, cluster, or both has a certain tag key or key-value combination. You can also limit the `CreateEditor` action (which creates an EMR notebook) and the `RunJobFlow` action (which creates a cluster) so that a tag must be supplied in the request when the resource is created.

In Amazon EMR, the condition keys that can be used in a `Condition` element apply only to those Amazon EMR API actions where `ClusterID` or `NotebookID` is a required request parameter. For example, the [ModifyInstanceGroups](https://docs.aws.amazon.com/ElasticMapReduce/latest/API/API_ModifyInstanceGroups.html) action does not support context keys because `ClusterID` is an optional parameter.

When you create an EMR notebook, a default tag is applied with a key string of `creatorUserId` set to the value of the IAM user ID who created the notebook. This is useful for limiting allowed actions for the notebook only to the creator.

The following condition keys are available in Amazon EMR:
+ Use the `elasticmapreduce:ResourceTag/TagKeyString` condition context key to allow or deny user actions on clusters or notebooks with tags that have the `TagKeyString` that you specify. If an action passes both `ClusterID` and `NotebookID`, the condition applies to both the cluster and the notebook. This means that both resources must have the tag key string or key-value combination that you specify. You can use the `Resource` element to limit the statement so that it applies only to clusters or notebooks as required. For more information, see [Amazon EMR identity-based policy examples](security_iam_id-based-policy-examples.md).
+ Use the `elasticmapreduce:RequestTag/TagKeyString` condition context key to require a specific tag with an action or API call. For example, you can use this condition context key with the `CreateEditor` action to require that a key with `TagKeyString` is applied to a notebook when it is created.
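The second pattern can be sketched as follows: the statement allows `CreateEditor` only when the request tags the new notebook with a hypothetical `team` key set to `data-eng`:

```
{
    "Sid": "RequireTeamTagOnNewNotebooks",
    "Effect": "Allow",
    "Action": "elasticmapreduce:CreateEditor",
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "elasticmapreduce:RequestTag/team": "data-eng"
        }
    }
}
```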

## Examples
<a name="security_iam_service-with-iam-id-based-policies-examples"></a>

To see a list of Amazon EMR actions, see [Actions Defined by Amazon EMR](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonelasticmapreduce.html#amazonelasticmapreduce-actions-as-permissions) in the *Service Authorization Reference*.

# Runtime roles for Amazon EMR steps
<a name="emr-steps-runtime-roles"></a>

A *runtime role* is an AWS Identity and Access Management (IAM) role that you can specify when you submit a job or query to an Amazon EMR cluster. The job or query that you submit to your Amazon EMR cluster uses the runtime role to access AWS resources, such as objects in Amazon S3. You can specify runtime roles with Amazon EMR for Spark and Hive jobs.

You can also specify runtime roles when you connect to Amazon EMR clusters in Amazon SageMaker AI and when you attach an Amazon EMR Studio Workspace to an EMR cluster. For more information, see [Connect to an Amazon EMR cluster from SageMaker AI Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/connect-emr-clusters.html) and [Run an EMR Studio Workspace with a runtime role](emr-studio-runtime.md).

Previously, Amazon EMR clusters ran Amazon EMR jobs or queries with permissions based on the IAM policy attached to the instance profile that you used to launch the cluster. This meant that the policies had to contain the union of all the permissions for all jobs and queries that ran on an Amazon EMR cluster. With runtime roles, you can now manage access control for each job or query individually, instead of sharing the Amazon EMR instance profile of the cluster.

On Amazon EMR clusters with runtime roles, you can also apply AWS Lake Formation based access control to Spark, Hive, and Presto jobs and queries against your data lakes. To learn more on how to integrate with AWS Lake Formation, see [Integrate Amazon EMR with AWS Lake Formation](emr-lake-formation.md).

**Note**  
When you specify a runtime role for an Amazon EMR step, the jobs or queries that you submit can only access AWS resources that the policies attached to the runtime role allow. These jobs and queries can't access the Instance Metadata Service on the EC2 instances of the cluster or use the EC2 instance profile of the cluster to access any AWS resources. 

## Prerequisites for launching an Amazon EMR cluster with a runtime role
<a name="emr-steps-runtime-roles-configure"></a>

**Topics**
+ [Step 1: Set up security configurations in Amazon EMR](#configure-security)
+ [Step 2: Set up an EC2 instance profile for the Amazon EMR cluster](#configure-ec2-profile)
+ [Step 3: Set up a trust policy](#configure-trust-policy)

### Step 1: Set up security configurations in Amazon EMR
<a name="configure-security"></a>

Use the following JSON structure to create a security configuration on the AWS Command Line Interface (AWS CLI), and set `EnableApplicationScopedIAMRole` to `true`. For more information about security configurations, see [Use security configurations to set up Amazon EMR cluster security](emr-security-configurations.md).

```
{
    "AuthorizationConfiguration":{
        "IAMConfiguration":{
            "EnableApplicationScopedIAMRole":true
        }
    }
}
```

We recommend that you always enable the in-transit encryption options in the security configuration, so that data transferred over the internet is encrypted rather than sent in plain text. You can skip these options if you don’t want to connect to Amazon EMR clusters with runtime roles from SageMaker AI Studio or EMR Studio. To configure data encryption, see [Configure data encryption](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-create-security-configuration.html#emr-security-configuration-encryption).

Alternatively, you can create a security configuration with custom settings with the [AWS Management Console](https://console.aws.amazon.com/emr/home#/securityConfigs).
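As a sketch, a security configuration that combines runtime role authorization with in-transit encryption might look like the following. The certificate location is a placeholder, and the exact encryption options to enable depend on your requirements:

```
{
    "AuthorizationConfiguration":{
        "IAMConfiguration":{
            "EnableApplicationScopedIAMRole":true
        }
    },
    "EncryptionConfiguration":{
        "EnableInTransitEncryption":true,
        "EnableAtRestEncryption":false,
        "InTransitEncryptionConfiguration":{
            "TLSCertificateConfiguration":{
                "CertificateProviderType":"PEM",
                "S3Object":"s3://DOC-EXAMPLE-BUCKET/certs/my-certs.zip"
            }
        }
    }
}
```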

### Step 2: Set up an EC2 instance profile for the Amazon EMR cluster
<a name="configure-ec2-profile"></a>

Amazon EMR clusters use the Amazon EC2 instance profile role to assume the runtime roles. To use runtime roles with Amazon EMR steps, add the following policies to the IAM role that you plan to use as the instance profile role. To add policies to an IAM role or edit an existing inline or managed policy, see [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",
  "Statement": [
    {
      "Sid": "AllowRuntimeRoleUsage",
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ],
      "Resource": [
        "arn:aws:iam::123456789012:role/EMRRuntimeRole"
      ]
    }
  ]
}
```

------

### Step 3: Set up a trust policy
<a name="configure-trust-policy"></a>

For each IAM role that you plan to use as a runtime role, set the following trust policy, replacing `EMR_EC2_DefaultRole` with your instance profile role. To modify the trust policy of an IAM role, see [Modifying a role trust policy](https://docs.aws.amazon.com//IAM/latest/UserGuide/roles-managingrole-editing-console.html).

```
{
    "Sid":"AllowAssumeRole",
    "Effect":"Allow",
    "Principal":{
        "AWS":"arn:aws:iam::<AWS_ACCOUNT_ID>:role/EMR_EC2_DefaultRole"
    },
    "Action":[
             "sts:AssumeRole",
             "sts:TagSession"
            ]
}
```

## Launch an Amazon EMR cluster with role-based access control
<a name="emr-steps-runtime-roles-launch"></a>

After you set up your configurations, you can launch an Amazon EMR cluster with the security configuration from [Step 1: Set up security configurations in Amazon EMR](#configure-security). To use runtime roles with Amazon EMR steps, use release label `emr-6.7.0` or later, and select Hive, Spark, or both as your cluster application. Amazon EMR releases 7.6.0 and higher also support the CloudWatch agent on clusters that use runtime roles. To connect from SageMaker AI Studio, use release `emr-6.9.0` or later, and select Livy, Spark, Hive, or Presto as your cluster application. For instructions on how to launch your cluster, see [Specify a security configuration for an Amazon EMR cluster](emr-specify-security-configuration.md).

### Submit Spark jobs using Amazon EMR steps
<a name="launch-spark"></a>

The following is an example of how to run the HdfsTest example included with Apache Spark. This API call only succeeds if the provided Amazon EMR runtime role can access the `S3_LOCATION`.

```
RUNTIME_ROLE_ARN=<runtime-role-arn>
S3_LOCATION=<s3-path>
REGION=<aws-region>
CLUSTER_ID=<cluster-id>

aws emr add-steps --cluster-id $CLUSTER_ID \
--steps '[{ "Name": "Spark Example", "ActionOnFailure": "CONTINUE", "Jar":"command-runner.jar","Args" : ["spark-example","HdfsTest", "'"$S3_LOCATION"'"] }]' \
--execution-role-arn $RUNTIME_ROLE_ARN \
--region $REGION
```

**Note**  
We recommend that you turn off SSH access to the Amazon EMR cluster and allow access to the cluster only through the Amazon EMR `AddJobFlowSteps` API.

### Submit Hive jobs using Amazon EMR steps
<a name="launch-hive"></a>

The following example uses Apache Hive with Amazon EMR steps to submit a job to run the `QUERY_FILE.hql` file. This query only succeeds if the provided runtime role can access the Amazon S3 path of the query file.

```
RUNTIME_ROLE_ARN=<runtime-role-arn>
REGION=<aws-region>
CLUSTER_ID=<cluster-id>

aws emr add-steps --cluster-id $CLUSTER_ID \
--steps '[{ "Name": "Run hive query using command-runner.jar - simple select","ActionOnFailure":"CONTINUE", "Jar": "command-runner.jar","Args" :["hive","-f","s3://DOC_EXAMPLE_BUCKET/QUERY_FILE.hql"] }]' \
--execution-role-arn $RUNTIME_ROLE_ARN \
--region $REGION
```

### Connect to Amazon EMR clusters with runtime roles from a SageMaker AI Studio notebook
<a name="sagemaker"></a>

You can apply Amazon EMR runtime roles to queries that you run in Amazon EMR clusters from SageMaker AI Studio. To do so, go through the following steps.

1. Follow the instructions in [Launch Amazon SageMaker AI Studio]() to set up SageMaker AI Studio.

1. In the SageMaker AI Studio UI, start a notebook with supported kernels. For example, start a SparkMagic image with a PySpark kernel.

1. Choose an Amazon EMR cluster in SageMaker AI Studio, and then choose **Connect**.

1. Choose a runtime role, and then choose **Connect**. 

This creates a SageMaker AI notebook cell with magic commands to connect to your Amazon EMR cluster with the chosen Amazon EMR runtime role. In the notebook cell, you can enter and run queries with runtime role and Lake Formation based access control. For a more detailed example, see [Apply fine-grained data access controls with AWS Lake Formation and Amazon EMR from Amazon SageMaker AI Studio](https://aws.amazon.com/blogs/machine-learning/apply-fine-grained-data-access-controls-with-aws-lake-formation-and-amazon-emr-from-amazon-sagemaker-studio).

### Control access to the Amazon EMR runtime role
<a name="role-access"></a>

You can control access to the runtime role with the condition key `elasticmapreduce:ExecutionRoleArn`. The following policy allows an IAM principal to use an IAM role named `Caller`, or any IAM role that begins with the string `CallerTeamRole`, as the runtime role.

**Important**  
You must create a condition based on the `elasticmapreduce:ExecutionRoleArn` context key when you grant a caller access to call the `AddJobFlowSteps` or `GetClusterSessionCredentials` APIs, as the following example shows.

```
{
    "Sid":"AddStepsWithSpecificExecRoleArn",
    "Effect":"Allow",
    "Action":[
        "elasticmapreduce:AddJobFlowSteps"
    ],
    "Resource":"*",
    "Condition":{
        "StringLike":{
            "elasticmapreduce:ExecutionRoleArn":[
                "arn:aws:iam::<AWS_ACCOUNT_ID>:role/Caller",
                "arn:aws:iam::<AWS_ACCOUNT_ID>:role/CallerTeamRole*"
            ]
        }
    }
}
```

### Establish trust between runtime roles and Amazon EMR clusters
<a name="external-id"></a>

Amazon EMR generates a unique identifier, `ExternalId`, for each security configuration that has runtime role authorization enabled. The external ID lets each team maintain its own set of runtime roles for use on the clusters that the team owns. For example, in an enterprise, each department can use its external ID to update the trust policy on its own set of runtime roles.

You can find the external ID with the Amazon EMR `DescribeSecurityConfiguration` API, as shown in the following example.

```
aws emr describe-security-configuration --name 'iamconfig-with-lf'
{
    "Name": "iamconfig-with-lf",
    "SecurityConfiguration": "{\"AuthorizationConfiguration\":{\"IAMConfiguration\":{\"EnableApplicationScopedIAMRole\":true,\"ApplicationScopedIAMRoleConfiguration\":{\"PropagateSourceIdentity\":true,\"ExternalId\":\"FXH5TSACFDWUCDSR3YQE2O7ETPUSM4OBCGLYWODSCUZDNZ4Y\"}},\"LakeFormationConfiguration\":{\"AuthorizedSessionTagValue\":\"Amazon EMR\"}}}",
    "CreationDateTime": "2022-06-03T12:52:35.308000-07:00"
}
```

For information about how to use an external ID, see [How to use an external ID when granting access to your AWS resources to a third party](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html). 

### Audit
<a name="audit-source-identity"></a>

To monitor and control actions that end users take with IAM roles, you can turn on the source identity feature. To learn more about source identity, see [Monitor and control actions taken with assumed roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_control-access_monitor.html).

To track source identity, set `ApplicationScopedIAMRoleConfiguration/PropagateSourceIdentity` to `true` in your security configuration, as follows.

```
{
    "AuthorizationConfiguration":{
        "IAMConfiguration":{
            "EnableApplicationScopedIAMRole":true,
            "ApplicationScopedIAMRoleConfiguration":{
                "PropagateSourceIdentity":true
            }
        }
    }
}
```

When you set `PropagateSourceIdentity` to `true`, Amazon EMR applies the source identity from the calling credentials to a job or query session that you create with the runtime role. If no source identity is present in the calling credentials, Amazon EMR doesn't set the source identity.

To use this property, provide `sts:SetSourceIdentity` permissions to your instance profile, as follows.

```
{
    "Sid":"PropagateSourceIdentity",
    "Effect":"Allow",
    "Action":"sts:SetSourceIdentity",
    "Resource":[
        "<runtime-role-ARN>"
    ],
    "Condition":{
        "StringEquals":{
            "sts:SourceIdentity":"<source-identity>"
        }
    }
}
```

You must also add the `AllowSetSourceIdentity` statement to the trust policy of your runtime roles.

```
{
    "Sid":"AllowSetSourceIdentity",
    "Effect":"Allow",
    "Principal":{
        "AWS":"arn:aws:iam::<AWS_ACCOUNT_ID>:role/EMR_EC2_DefaultRole"
    },
    "Action":[
        "sts:SetSourceIdentity",
        "sts:AssumeRole"
    ],
    "Condition":{
        "StringEquals":{
"sts:SourceIdentity":"<source-identity>"
        }
    }
}
```

## Additional considerations
<a name="emr-steps-runtime-roles-considerations"></a>

**Note**  
With Amazon EMR release `emr-6.9.0`, you might experience intermittent failures when you connect to Amazon EMR clusters from SageMaker AI Studio. To address this issue, you can install the patch with a bootstrap action when you launch the cluster. For patch details, see [Amazon EMR release 6.9.0 known issues](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-690-release.html#emr-690-relnotes).

Additionally, consider the following when you configure runtime roles for Amazon EMR.
+ Amazon EMR supports runtime roles in all commercial AWS Regions.
+ Amazon EMR steps support Apache Spark and Apache Hive jobs with runtime roles when you use release `emr-6.7.0` or later.
+ SageMaker AI Studio supports Spark, Hive, and Presto queries with runtime roles when you use release `emr-6.9.0` or later. 
+ The following notebook kernels in SageMaker AI support runtime roles:
  + DataScience – Python 3 kernel
  + DataScience 2.0 – Python 3 kernel
  + DataScience 3.0 – Python 3 kernel
  + SparkAnalytics 1.0 – SparkMagic and PySpark kernels
  + SparkAnalytics 2.0 – SparkMagic and PySpark kernels
  + SparkMagic – PySpark kernel
+ Amazon EMR supports submitting steps with `RunJobFlow` only at the time of cluster creation, and this API doesn't support runtime roles.
+ Amazon EMR doesn't support runtime roles on clusters that you configure to be highly available.
+ With Amazon EMR releases 7.5.0 and higher, runtime roles support viewing the Spark and YARN user interfaces (UIs): the Spark live UI, Spark History Server, YARN NodeManager, and YARN ResourceManager. When you navigate to these UIs, you are prompted for a username and password, which you can generate with the Amazon EMR `GetClusterSessionCredentials` API. For usage details, see [GetClusterSessionCredentials](https://docs.aws.amazon.com/emr/latest/APIReference/API_GetClusterSessionCredentials.html).

  The following example shows how to use the `GetClusterSessionCredentials` API:

  ```
  aws emr get-cluster-session-credentials --cluster-id <cluster_ID> --execution-role-arn <IAM_role_arn>
  ```
+ You must escape your Bash command arguments when running commands with the `command-runner.jar` JAR file:

  ```
  aws emr add-steps --cluster-id <cluster-id> --steps '[{"Name":"sample-step","ActionOnFailure":"CONTINUE","Jar":"command-runner.jar","Properties":"","Args":["bash","-c","\"aws s3 ls\""],"Type":"CUSTOM_JAR"}]' --execution-role-arn <IAM_ROLE_ARN>
  ```

  Additionally, you must escape your Bash command arguments when running commands with the script runner. The following is a sample that shows setting Spark properties, with included escape characters:

  ```
  "\"--conf spark.sql.autoBroadcastJoinThreshold=-1\""
  ```
+ Runtime roles don't provide support for controlling access to on-cluster resources, such as HDFS and HMS.
+ Runtime roles don't provide support for Docker containers.

# Configure IAM service roles for Amazon EMR permissions to AWS services and resources
<a name="emr-iam-roles"></a>

Amazon EMR and applications such as Hadoop and Spark need permissions to access other AWS resources and perform actions when they run. Each cluster in Amazon EMR must have a *service role* and a role for the Amazon EC2 *instance profile*. For more information, see [IAM roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) and [Using instance profiles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2_instance-profiles.html) in the *IAM User Guide*. The IAM policies attached to these roles provide permissions for the cluster to interoperate with other AWS services on behalf of a user.

An additional role, the Auto Scaling role, is required if your cluster uses automatic scaling in Amazon EMR. The AWS service role for EMR Notebooks is required if you use EMR Notebooks.

Amazon EMR provides default roles and default managed policies that determine permissions for each role. Managed policies are created and maintained by AWS, so they are updated automatically if service requirements change. See [AWS managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html#aws-managed-policies) in the *IAM User Guide*.

If you are creating a cluster or notebook for the first time in an account, roles for Amazon EMR do not yet exist. After you create them, you can view the roles, the policies attached to them, and the permissions allowed or denied by the policies in the IAM console ([https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/)). You can have Amazon EMR create and use default roles, create your own roles and specify them individually when you create a cluster to customize permissions, or specify default roles to be used when you create a cluster with the AWS CLI. For more information, see [Customize IAM roles with Amazon EMR](emr-iam-roles-custom.md).

## Modifying identity-based policies for permissions to pass service roles for Amazon EMR
<a name="emr-iam-roles-passrole"></a>

The Amazon EMR full-permissions default managed policies incorporate `iam:PassRole` security configurations, including the following:
+ `iam:PassRole` permissions only for specific default Amazon EMR roles.
+ `iam:PassedToService` conditions that limit use of the policy to specified AWS services, such as `elasticmapreduce.amazonaws.com` and `ec2.amazonaws.com`.
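A statement combining these controls might look like the following sketch; the account ID is a placeholder, and the role names shown are the Amazon EMR defaults:

```
{
    "Sid": "PassRoleForEMR",
    "Effect": "Allow",
    "Action": "iam:PassRole",
    "Resource": [
        "arn:aws:iam::111122223333:role/EMR_DefaultRole_V2",
        "arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole"
    ],
    "Condition": {
        "StringEquals": {
            "iam:PassedToService": [
                "elasticmapreduce.amazonaws.com",
                "ec2.amazonaws.com"
            ]
        }
    }
}
```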

You can view the JSON version of the [AmazonEMRFullAccessPolicy_v2](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AmazonEMRFullAccessPolicy_v2) and [AmazonEMRServicePolicy_v2](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AmazonEMRServicePolicy_v2) policies in the IAM console. We recommend that you create new clusters with the v2 managed policies.

## Service role summary
<a name="emr-iam-roles-summary"></a>

The following table lists the IAM service roles associated with Amazon EMR for quick reference.


| Function | Default role | Description | Default managed policy | 
| --- | --- | --- | --- | 
|  [Service role for Amazon EMR (EMR role)](emr-iam-role.md)  |  `EMR_DefaultRole_V2`  |  Allows Amazon EMR to call other AWS services on your behalf when provisioning resources and performing service-level actions. This role is required for all clusters.  |  `AmazonEMRServicePolicy_v2`  A service-linked role is required to request Spot Instances. If this role doesn't exist, the Amazon EMR service role must have permission to create it or a permission error occurs. If you plan to request Spot Instances, you must update this policy to include a statement that allows the creation of this service-linked role. For more information, see [Service role for Amazon EMR (EMR role)](emr-iam-role.md) and [Service-linked role for Spot Instance requests](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-requests.html#service-linked-roles-spot-instance-requests) in the *Amazon EC2 User Guide*.    | 
| [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md) |  `EMR_EC2_DefaultRole`  |  Application processes that run on top of the Hadoop ecosystem on cluster instances use this role when they call other AWS services. For accessing data in Amazon S3 using EMRFS, you can specify different roles to be assumed based on the location of data in Amazon S3. For example, multiple teams can access a single Amazon S3 data "storage account." For more information, see [Configure IAM roles for EMRFS requests to Amazon S3](emr-emrfs-iam-roles.md). This role is required for all clusters.  |  `AmazonElasticMapReduceforEC2Role`. For more information, see [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md).  | 
| [Service role for automatic scaling in Amazon EMR (Auto Scaling role)](emr-iam-role-automatic-scaling.md) |  `EMR_AutoScaling_DefaultRole`  |  Allows additional actions for dynamically scaling environments. Required only for clusters that use automatic scaling in Amazon EMR. For more information, see [Using automatic scaling with a custom policy for instance groups in Amazon EMR](emr-automatic-scaling.md).  |  `AmazonElasticMapReduceforAutoScalingRole`. For more information, see [Service role for automatic scaling in Amazon EMR (Auto Scaling role)](emr-iam-role-automatic-scaling.md).  | 
| [Service role for EMR Notebooks](emr-managed-notebooks-service-role.md) |  `EMR_Notebooks_DefaultRole`  |  Provides permissions that an EMR notebook needs to access other AWS resources and perform actions. Required only if EMR Notebooks is used.  |  `AmazonElasticMapReduceEditorsRole`. For more information, see [Service role for EMR Notebooks](emr-managed-notebooks-service-role.md). `S3FullAccessPolicy` is also attached by default; its contents follow the table.  | 
| [Service-Linked Role](using-service-linked-roles.md) | `AWSServiceRoleForEMRCleanup` | Amazon EMR automatically creates a service-linked role. If the service for Amazon EMR has lost the ability to clean up Amazon EC2 resources, Amazon EMR can use this role to clean up. If a cluster uses Spot Instances, the permissions policy attached to the [Service role for Amazon EMR (EMR role)](emr-iam-role.md) must allow the creation of a service-linked role. For more information, see [Using service-linked roles for Amazon EMR](using-service-linked-roles.md). | `AmazonEMRCleanupPolicy` | 

The following is the content of the `S3FullAccessPolicy` policy.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowS3"
    }
  ]
}
```

**Topics**
+ [Modifying identity-based policies for permissions to pass service roles for Amazon EMR](#emr-iam-roles-passrole)
+ [Service role summary](#emr-iam-roles-summary)
+ [IAM service roles used by Amazon EMR](emr-iam-service-roles.md)
+ [Customize IAM roles with Amazon EMR](emr-iam-roles-custom.md)
+ [Configure IAM roles for EMRFS requests to Amazon S3](emr-emrfs-iam-roles.md)
+ [Use resource-based policies for Amazon EMR access to AWS Glue Data Catalog](emr-iam-roles-glue.md)
+ [Use IAM roles with applications that call AWS services directly](emr-iam-roles-calling.md)
+ [Allow users and groups to create and modify roles](emr-iam-roles-create-permissions.md)

# IAM service roles used by Amazon EMR
<a name="emr-iam-service-roles"></a>

Amazon EMR uses IAM service roles to perform actions on your behalf when provisioning cluster resources, running applications, dynamically scaling resources, and creating and running EMR Notebooks. Amazon EMR uses the following roles when interacting with other AWS services. Each role has a unique function within Amazon EMR. The topics in this section describe the role function and provide the default roles and permissions policy for each role.

If you have application code on your cluster that calls AWS services directly, you may need to use the SDK to specify roles. For more information, see [Use IAM roles with applications that call AWS services directly](emr-iam-roles-calling.md).

**Topics**
+ [Service role for Amazon EMR (EMR role)](emr-iam-role.md)
+ [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md)
+ [Service role for automatic scaling in Amazon EMR (Auto Scaling role)](emr-iam-role-automatic-scaling.md)
+ [Service role for EMR Notebooks](emr-managed-notebooks-service-role.md)
+ [Using service-linked roles for Amazon EMR](using-service-linked-roles.md)

# Service role for Amazon EMR (EMR role)
<a name="emr-iam-role"></a>

The Amazon EMR role defines the allowable actions for Amazon EMR when it provisions resources and performs service-level tasks that aren't performed in the context of an Amazon EC2 instance running within a cluster. For example, the service role is used to provision EC2 instances when a cluster launches.
+ The default role name is `EMR_DefaultRole_V2`.
+ The Amazon EMR scoped default managed policy attached to `EMR_DefaultRole_V2` is `AmazonEMRServicePolicy_v2`. This v2 policy replaces the deprecated default managed policy, `AmazonElasticMapReduceRole`.

`AmazonEMRServicePolicy_v2` relies on scoped-down access to the resources that Amazon EMR provisions or uses. When you use this policy, you must pass the user tag `for-use-with-amazon-emr-managed-policies = true` when you provision the cluster. Amazon EMR automatically propagates this tag. Additionally, you might need to manually add the user tag to specific types of resources, such as EC2 security groups that were not created by Amazon EMR. See [Tagging resources to use managed policies](emr-managed-iam-policies.md#manually-tagged-resources).
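For example, with the AWS SDK for Python the tag is passed through the `Tags` parameter of a `RunJobFlow` request. The sketch below only assembles the request locally; the cluster name, release label, and other settings are placeholder values, and required parameters such as `Instances` are omitted for brevity.

```python
# Assemble the RunJobFlow parameters that carry the user tag required by
# the v2 managed policies. Only the Tags entry matters here; the other
# values are illustrative placeholders.
run_job_flow_params = {
    "Name": "example-cluster",
    "ReleaseLabel": "emr-6.15.0",
    "ServiceRole": "EMR_DefaultRole_V2",
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "Tags": [
        # Required so AmazonEMRServicePolicy_v2 can act on the
        # resources the cluster creates.
        {"Key": "for-use-with-amazon-emr-managed-policies", "Value": "true"}
    ],
}
# With credentials configured, boto3.client("emr").run_job_flow(
#     **run_job_flow_params) would submit the request.
```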

**Important**  
Amazon EMR uses this service role and the `AWSServiceRoleForEMRCleanup` role to clean up cluster resources in your account that you no longer use, such as Amazon EC2 instances. The role policies must include the actions needed to delete or terminate those resources. Otherwise, Amazon EMR can't perform these cleanup actions, and you might incur costs for unused resources that remain on the cluster.

The following shows the contents of the current `AmazonEMRServicePolicy_v2` policy. You can also see the current content of the [AmazonEMRServicePolicy_v2](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AmazonEMRServicePolicy_v2) managed policy in the IAM console.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "CreateInTaggedNetwork",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface",
        "ec2:RunInstances",
        "ec2:CreateFleet",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:subnet/*",
        "arn:aws:ec2:*:*:security-group/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "CreateWithEMRTaggedLaunchTemplate",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateFleet",
        "ec2:RunInstances",
        "ec2:CreateLaunchTemplateVersion"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:launch-template/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "CreateEMRTaggedLaunchTemplate",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateLaunchTemplate"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:launch-template/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "CreateEMRTaggedInstancesAndVolumes",
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:CreateFleet"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:ec2:*:*:volume/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "ResourcesToLaunchEC2",
      "Effect": "Allow",
      "Action": [
        "ec2:RunInstances",
        "ec2:CreateFleet",
        "ec2:CreateLaunchTemplate",
        "ec2:CreateLaunchTemplateVersion"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:network-interface/*",
        "arn:aws:ec2:*::image/ami-*",
        "arn:aws:ec2:*:*:key-pair/*",
        "arn:aws:ec2:*:*:capacity-reservation/*",
        "arn:aws:ec2:*:*:placement-group/pg-*",
        "arn:aws:ec2:*:*:fleet/*",
        "arn:aws:ec2:*:*:dedicated-host/*",
        "arn:aws:resource-groups:*:*:group/*"
      ]
    },
    {
      "Sid": "ManageEMRTaggedResources",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateLaunchTemplateVersion",
        "ec2:DeleteLaunchTemplate",
        "ec2:DeleteNetworkInterface",
        "ec2:ModifyInstanceAttribute",
        "ec2:TerminateInstances"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "ManageTagsOnEMRTaggedResources",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags",
        "ec2:DeleteTags"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:ec2:*:*:volume/*",
        "arn:aws:ec2:*:*:network-interface/*",
        "arn:aws:ec2:*:*:launch-template/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "CreateNetworkInterfaceNeededForPrivateSubnet",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateNetworkInterface"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:network-interface/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "TagOnCreateTaggedEMRResources",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:network-interface/*",
        "arn:aws:ec2:*:*:instance/*",
        "arn:aws:ec2:*:*:volume/*",
        "arn:aws:ec2:*:*:launch-template/*"
      ],
      "Condition": {
        "StringEquals": {
          "ec2:CreateAction": [
            "RunInstances",
            "CreateFleet",
            "CreateLaunchTemplate",
            "CreateNetworkInterface"
          ]
        }
      }
    },
    {
      "Sid": "TagPlacementGroups",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags",
        "ec2:DeleteTags"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:placement-group/pg-*"
      ]
    },
    {
      "Sid": "ListActionsForEC2Resources",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeAccountAttributes",
        "ec2:DescribeCapacityReservations",
        "ec2:DescribeDhcpOptions",
        "ec2:DescribeImages",
        "ec2:DescribeInstances",
        "ec2:DescribeInstanceTypeOfferings",
        "ec2:DescribeLaunchTemplates",
        "ec2:DescribeNetworkAcls",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribePlacementGroups",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVolumes",
        "ec2:DescribeVolumeStatus",
        "ec2:DescribeVpcAttribute",
        "ec2:DescribeVpcEndpoints",
        "ec2:DescribeVpcs"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "CreateDefaultSecurityGroupWithEMRTags",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSecurityGroup"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:security-group/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "CreateDefaultSecurityGroupInVPCWithEMRTags",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateSecurityGroup"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:vpc/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "TagOnCreateDefaultSecurityGroupWithEMRTags",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:security-group/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true",
          "ec2:CreateAction": "CreateSecurityGroup"
        }
      }
    },
    {
      "Sid": "ManageSecurityGroups",
      "Effect": "Allow",
      "Action": [
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:RevokeSecurityGroupIngress"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "CreateEMRPlacementGroups",
      "Effect": "Allow",
      "Action": [
        "ec2:CreatePlacementGroup"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:placement-group/pg-*"
      ]
    },
    {
      "Sid": "DeletePlacementGroups",
      "Effect": "Allow",
      "Action": [
        "ec2:DeletePlacementGroup"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "AutoScaling",
      "Effect": "Allow",
      "Action": [
        "application-autoscaling:DeleteScalingPolicy",
        "application-autoscaling:DeregisterScalableTarget",
        "application-autoscaling:DescribeScalableTargets",
        "application-autoscaling:DescribeScalingPolicies",
        "application-autoscaling:PutScalingPolicy",
        "application-autoscaling:RegisterScalableTarget"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "ResourceGroupsForCapacityReservations",
      "Effect": "Allow",
      "Action": [
        "resource-groups:ListGroupResources"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "AutoScalingCloudWatch",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricAlarm",
        "cloudwatch:DeleteAlarms",
        "cloudwatch:DescribeAlarms"
      ],
      "Resource": [
        "arn:aws:cloudwatch:*:*:alarm:*_EMR_Auto_Scaling"
      ]
    },
    {
      "Sid": "PassRoleForAutoScaling",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/EMR_AutoScaling_DefaultRole"
      ],
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "application-autoscaling.amazonaws.com*"
        }
      }
    },
    {
      "Sid": "PassRoleForEC2",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/EMR_EC2_DefaultRole"
      ],
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "ec2.amazonaws.com*"
        }
      }
    },
    {
      "Sid": "CreateAndModifyEmrServiceVPCEndpoint",
      "Effect": "Allow",
      "Action": [
        "ec2:ModifyVpcEndpoint",
        "ec2:CreateVpcEndpoint"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:vpc-endpoint/*",
        "arn:aws:ec2:*:*:subnet/*",
        "arn:aws:ec2:*:*:security-group/*",
        "arn:aws:ec2:*:*:vpc/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "CreateEmrServiceVPCEndpoint",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateVpcEndpoint"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:vpc-endpoint/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true",
          "aws:RequestTag/Name": "emr-service-vpce"
        }
      }
    },
    {
      "Sid": "TagEmrServiceVPCEndpoint",
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:vpc-endpoint/*"
      ],
      "Condition": {
        "StringEquals": {
          "ec2:CreateAction": "CreateVpcEndpoint",
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true",
          "aws:RequestTag/Name": "emr-service-vpce"
        }
      }
    }
  ]
}
```

------

Your service role should use the following trust policy.

**Important**  
The following trust policy includes the [aws:SourceArn](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourcearn) and [aws:SourceAccount](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourceaccount) global condition keys, which limit the permissions that you give Amazon EMR to particular resources in your account. Using them can protect you against [the confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html).

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "AllowSTSAssumerole",
      "Effect": "Allow",
      "Principal": {
        "Service": "elasticmapreduce.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "123456789012"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:elasticmapreduce:*:123456789012:*"
        }
      }
    }
  ]
}
```

------
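Before attaching a trust policy, you can confirm locally that it carries these confused-deputy protections. The following is a sketch; the embedded policy document and account ID are illustrative.

```python
import json

# An example trust policy document, as it might appear before attachment.
trust_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {"Service": "elasticmapreduce.amazonaws.com"},
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {"aws:SourceAccount": "123456789012"},
        "ArnLike": {"aws:SourceArn": "arn:aws:elasticmapreduce:*:123456789012:*"}
      }
    }
  ]
}
""")

def has_confused_deputy_guards(policy: dict) -> bool:
    """Return True when every statement pins both aws:SourceAccount
    and aws:SourceArn in its Condition block."""
    for stmt in policy["Statement"]:
        condition = stmt.get("Condition", {})
        keys = {key for block in condition.values() for key in block}
        if not {"aws:SourceAccount", "aws:SourceArn"} <= keys:
            return False
    return True

print(has_confused_deputy_guards(trust_policy))  # True for the policy above
```

A statement that omits either condition key fails the check, flagging a trust policy that a cross-account caller could potentially abuse.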

# Service role for cluster EC2 instances (EC2 instance profile)
<a name="emr-iam-role-for-ec2"></a>

The service role for cluster EC2 instances (also called the EC2 instance profile for Amazon EMR) is a special type of service role that is assigned to every EC2 instance in an Amazon EMR cluster when the instance launches. Application processes that run on top of the Hadoop ecosystem assume this role for permissions to interact with other AWS services.

For more information about service roles for EC2 instances, see [Using an IAM role to grant permissions to applications running on Amazon EC2 instances](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html) in the *IAM User Guide*.

**Important**  
The default service role for cluster EC2 instances and its associated AWS default managed policy, `AmazonElasticMapReduceforEC2Role`, are on the path to deprecation, with no replacement AWS managed policies provided. You need to create and specify an instance profile to replace the deprecated role and default policy.

## Default role and managed policy
<a name="emr-ec2-role-default"></a>
+ The default role name is `EMR_EC2_DefaultRole`.
+ The `EMR_EC2_DefaultRole` default managed policy, `AmazonElasticMapReduceforEC2Role`, is nearing end of support. Instead of using a default managed policy for the EC2 instance profile, apply resource-based policies to S3 buckets and other resources that Amazon EMR needs, or use your own customer-managed policy with an IAM role as an instance profile. For more information, see [Creating a service role for cluster EC2 instances with least-privilege permissions](#emr-ec2-role-least-privilege).

The following shows the contents of version 3 of `AmazonElasticMapReduceforEC2Role`.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Resource": [
        "*"
      ],
      "Action": [
        "cloudwatch:*",
        "dynamodb:*",
        "ec2:Describe*",
        "elasticmapreduce:Describe*",
        "elasticmapreduce:ListBootstrapActions",
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:ListSteps",
        "kinesis:CreateStream",
        "kinesis:DeleteStream",
        "kinesis:DescribeStream",
        "kinesis:GetRecords",
        "kinesis:GetShardIterator",
        "kinesis:MergeShards",
        "kinesis:PutRecord",
        "kinesis:SplitShard",
        "rds:Describe*",
        "s3:*",
        "sdb:*",
        "sns:*",
        "sqs:*",
        "glue:CreateDatabase",
        "glue:UpdateDatabase",
        "glue:DeleteDatabase",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetTableVersions",
        "glue:CreatePartition",
        "glue:BatchCreatePartition",
        "glue:UpdatePartition",
        "glue:DeletePartition",
        "glue:BatchDeletePartition",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition",
        "glue:CreateUserDefinedFunction",
        "glue:UpdateUserDefinedFunction",
        "glue:DeleteUserDefinedFunction",
        "glue:GetUserDefinedFunction",
        "glue:GetUserDefinedFunctions"
      ],
      "Sid": "AllowCLOUDWATCH"
    }
  ]
}
```

------

Your service role should use the following trust policy.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "AllowSTSAssumerole",
      "Effect": "Allow",
      "Principal": {
        "Service": "ec2.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

------

## Creating a service role for cluster EC2 instances with least-privilege permissions
<a name="emr-ec2-role-least-privilege"></a>

As a best practice, we strongly recommend that you create a service role for cluster EC2 instances and a permissions policy that grants the minimum permissions to other AWS services that your application requires.

The default managed policy, `AmazonElasticMapReduceforEC2Role`, provides permissions that make it easy to launch an initial cluster. However, `AmazonElasticMapReduceforEC2Role` is on the path to deprecation, and Amazon EMR will not provide a replacement AWS managed default policy for the deprecated role. To launch an initial cluster, you need to provide a customer-managed resource-based or identity-based policy.

The following policy statements provide examples of the permissions required for different features of Amazon EMR. We recommend that you use these permissions to create a permissions policy that restricts access to only those features and resources that your cluster requires. All example policy statements use the *us-west-2* Region and the fictional AWS account ID *123456789012*. Replace these as appropriate for your cluster.
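Because every ARN in these statements embeds the Region and account ID, a small helper can rewrite the examples for your own environment. The following is a sketch; the input policy fragment and the substituted values are illustrative.

```python
import json

# The fictional values used throughout the example policy statements.
EXAMPLE_ACCOUNT = "123456789012"
EXAMPLE_REGION = "us-west-2"

def localize_policy(policy_json: str, account_id: str, region: str) -> dict:
    """Swap the documentation's fictional account ID and Region for real
    ones, then parse the result to confirm it is still valid JSON."""
    body = policy_json.replace(EXAMPLE_ACCOUNT, account_id)
    body = body.replace(EXAMPLE_REGION, region)
    return json.loads(body)

example = '{"Resource": "arn:aws:dynamodb:us-west-2:123456789012:table/EmrFSMetadata"}'
localized = localize_policy(example, "111122223333", "eu-west-1")
print(localized["Resource"])
# arn:aws:dynamodb:eu-west-1:111122223333:table/EmrFSMetadata
```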

For more information about creating and specifying custom roles, see [Customize IAM roles with Amazon EMR](emr-iam-roles-custom.md).

**Note**  
If you create a custom EMR role for EC2, follow the basic workflow, which automatically creates an instance profile of the same name. Amazon EC2 allows you to create instance profiles and roles with different names, but Amazon EMR does not support this configuration; it results in an "invalid instance profile" error when you create the cluster.

### Reading and writing data to Amazon S3 using EMRFS
<a name="emr-ec2-role-EMRFS"></a>

When an application running on an Amazon EMR cluster references data using the `s3://mydata` format, Amazon EMR uses the EC2 instance profile to make the request. Clusters typically read and write data to Amazon S3 in this way, and Amazon EMR uses the permissions attached to the service role for cluster EC2 instances by default. For more information, see [Configure IAM roles for EMRFS requests to Amazon S3](emr-emrfs-iam-roles.md).

Because IAM roles for EMRFS fall back to the permissions attached to the service role for cluster EC2 instances, as a best practice we recommend that you use IAM roles for EMRFS and that you limit the EMRFS and Amazon S3 permissions attached to the service role for cluster EC2 instances.

The sample statement below demonstrates the permissions that EMRFS requires to make requests to Amazon S3.
+ *my-data-bucket-in-s3-for-emrfs-reads-and-writes* specifies the bucket in Amazon S3 where the cluster reads and writes data, and `/*` specifies all of its subfolders. Add only those buckets and folders that your application requires.
+ The policy statement that allows `dynamodb` actions is required only if EMRFS consistent view is enabled. *EmrFSMetadata* specifies the default folder for EMRFS consistent view.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:AbortMultipartUpload",
        "s3:CreateBucket",
        "s3:DeleteObject",
        "s3:GetBucketVersioning",
        "s3:GetObject",
        "s3:GetObjectTagging",
        "s3:GetObjectVersion",
        "s3:ListBucket",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions",
        "s3:ListMultipartUploadParts",
        "s3:PutBucketVersioning",
        "s3:PutObject",
        "s3:PutObjectTagging"
      ],
      "Resource": [
        "arn:aws:s3:::my-data-bucket-in-s3-for-emrfs-reads-and-writes",
        "arn:aws:s3:::my-data-bucket-in-s3-for-emrfs-reads-and-writes/*"
      ],
      "Sid": "AllowS3Abortmultipartupload"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:CreateTable",
        "dynamodb:BatchGetItem",
        "dynamodb:BatchWriteItem",
        "dynamodb:PutItem",
        "dynamodb:DescribeTable",
        "dynamodb:DeleteItem",
        "dynamodb:GetItem",
        "dynamodb:Scan",
        "dynamodb:Query",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteTable",
        "dynamodb:UpdateTable"
      ],
      "Resource": [
        "arn:aws:dynamodb:*:123456789012:table/EmrFSMetadata"
      ],
      "Sid": "AllowDYNAMODBCreatetable"
    },
    {
      "Effect": "Allow",
      "Action": [
        "cloudwatch:PutMetricData",
        "dynamodb:ListTables",
        "s3:ListBucket"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowCLOUDWATCHPutmetricdata"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sqs:GetQueueUrl",
        "sqs:ReceiveMessage",
        "sqs:DeleteQueue",
        "sqs:SendMessage",
        "sqs:CreateQueue"
      ],
      "Resource": [
        "arn:aws:sqs:*:123456789012:EMRFS-Inconsistency-*"
      ],
      "Sid": "AllowSQSGetqueueurl"
    }
  ]
}
```

------
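Because the bucket in the statement above varies per application, it can help to generate the Amazon S3 portion of the policy from a bucket name. The following is a sketch; the action list is a trimmed subset of the statement above, and the bucket name is illustrative.

```python
import json

# A trimmed subset of the EMRFS S3 actions shown above; extend this list
# with the actions your workload actually needs.
EMRFS_S3_ACTIONS = [
    "s3:AbortMultipartUpload",
    "s3:DeleteObject",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:PutObject",
]

def emrfs_s3_statement(bucket: str) -> dict:
    """Build an EMRFS-style S3 statement scoped to one bucket and the
    objects under it."""
    return {
        "Effect": "Allow",
        "Action": EMRFS_S3_ACTIONS,
        "Resource": [
            f"arn:aws:s3:::{bucket}",            # bucket-level actions
            f"arn:aws:s3:::{bucket}/*",          # object-level actions
        ],
    }

stmt = emrfs_s3_statement("my-data-bucket-in-s3-for-emrfs-reads-and-writes")
print(json.dumps(stmt, indent=2))
```

Generating one statement per bucket keeps the instance profile scoped to exactly the data each cluster requires.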

### Archiving log files to Amazon S3
<a name="emr-ec2-role-s3-logs"></a>

The following policy statement allows the Amazon EMR cluster to archive log files to the specified Amazon S3 location. In the example below, when the cluster was created, *s3://MyLoggingBucket/MyEMRClusterLogs* was specified using the **Log folder S3 location** field in the console, the `--log-uri` option in the AWS CLI, or the `LogUri` parameter of the `RunJobFlow` API. For more information, see [Archive log files to Amazon S3](emr-plan-debugging.md#emr-plan-debugging-logs-archive).

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::MyLoggingBucket/MyEMRClusterLogs/*"
      ],
      "Sid": "AllowS3Putobject"
    }
  ]
}
```

------
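The `Resource` ARN in this statement can be derived directly from the cluster's log URI. The following is a sketch of that mapping:

```python
def log_policy_resource(log_uri: str) -> str:
    """Turn an s3:// log URI (as passed to --log-uri or LogUri) into the
    s3:PutObject Resource ARN that the logging policy statement needs."""
    assert log_uri.startswith("s3://"), "log URI must use the s3:// scheme"
    bucket_and_prefix = log_uri[len("s3://"):].rstrip("/")
    return f"arn:aws:s3:::{bucket_and_prefix}/*"

print(log_policy_resource("s3://MyLoggingBucket/MyEMRClusterLogs"))
# arn:aws:s3:::MyLoggingBucket/MyEMRClusterLogs/*
```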

### Using the AWS Glue Data Catalog
<a name="emr-ec2-role-glue"></a>

The following policy statement allows actions that are required if you use the AWS Glue Data Catalog as the metastore for applications. For more information, see [Using the AWS Glue Data Catalog as the metastore for Spark SQL](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html), [Using the AWS Glue Data Catalog as the metastore for Hive](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-metastore-glue.html), and [Using Presto with the AWS Glue Data Catalog](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-presto-glue.html) in the *Amazon EMR Release Guide*.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "glue:CreateDatabase",
        "glue:UpdateDatabase",
        "glue:DeleteDatabase",
        "glue:GetDatabase",
        "glue:GetDatabases",
        "glue:CreateTable",
        "glue:UpdateTable",
        "glue:DeleteTable",
        "glue:GetTable",
        "glue:GetTables",
        "glue:GetTableVersions",
        "glue:CreatePartition",
        "glue:BatchCreatePartition",
        "glue:UpdatePartition",
        "glue:DeletePartition",
        "glue:BatchDeletePartition",
        "glue:GetPartition",
        "glue:GetPartitions",
        "glue:BatchGetPartition",
        "glue:CreateUserDefinedFunction",
        "glue:UpdateUserDefinedFunction",
        "glue:DeleteUserDefinedFunction",
        "glue:GetUserDefinedFunction",
        "glue:GetUserDefinedFunctions"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowGLUECreatedatabase"
    }
  ]
}
```

------

# Service role for automatic scaling in Amazon EMR (Auto Scaling role)
<a name="emr-iam-role-automatic-scaling"></a>

The Auto Scaling role for Amazon EMR performs a function similar to that of the service role, but allows additional actions for dynamically scaling environments.
+ The default role name is `EMR_AutoScaling_DefaultRole`.
+ The default managed policy attached to `EMR_AutoScaling_DefaultRole` is `AmazonElasticMapReduceforAutoScalingRole`.

The contents of version 1 of `AmazonElasticMapReduceforAutoScalingRole` are shown below.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "cloudwatch:DescribeAlarms",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ModifyInstanceGroups"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ],
      "Sid": "AllowCLOUDWATCHDescribealarms"
    }
  ]
}
```

------

Your service role should use the following trust policy.

**Important**  
The following trust policy includes the [`aws:SourceArn`](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourcearn) and [`aws:SourceAccount`](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourceaccount) global condition keys, which limit the permissions that you give Amazon EMR to particular resources in your account. Using them can protect you against [the confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html).

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": [
          "elasticmapreduce.amazonaws.com",
          "application-autoscaling.amazonaws.com"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "123456789012"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:application-autoscaling:*:123456789012:scalable-target/*"
        }
      }
    }
  ]
}
```

------
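
The `Condition` block in the trust policy above has a fixed shape, so you can generate it for your own account instead of editing the sample values by hand. A minimal sketch; the account ID and ARN pattern are placeholders:

```python
import json

def confused_deputy_condition(account_id, source_arn_pattern):
    """Build the condition block that limits who can assume the role.

    aws:SourceAccount pins the calling account, and aws:SourceArn
    restricts the resource on whose behalf Amazon EMR is acting.
    """
    return {
        "StringEquals": {"aws:SourceAccount": account_id},
        "ArnLike": {"aws:SourceArn": source_arn_pattern},
    }

condition = confused_deputy_condition(
    "123456789012",
    "arn:aws:application-autoscaling:*:123456789012:scalable-target/*",
)
print(json.dumps(condition, indent=2))
```

The same helper works for the EMR Notebooks trust policy later in this topic; only the `aws:SourceArn` pattern changes.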

# Service role for EMR Notebooks
<a name="emr-managed-notebooks-service-role"></a>

Each EMR notebook needs permissions to access other AWS resources and perform actions. The IAM policies attached to this service role provide permissions for the notebook to interoperate with other AWS services. When you create a notebook using the AWS Management Console, you specify an *AWS service role*. You can use the default role, `EMR_Notebooks_DefaultRole`, or specify a role that you create. If the default role doesn't exist yet, you can choose to have it created for you when you create your first notebook.
+ The default role name is `EMR_Notebooks_DefaultRole`.
+ The default managed policies attached to `EMR_Notebooks_DefaultRole` are `AmazonElasticMapReduceEditorsRole` and `S3FullAccessPolicy`.

Your service role should use the following trust policy.

**Important**  
The following trust policy includes the [`aws:SourceArn`](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourcearn) and [`aws:SourceAccount`](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-sourceaccount) global condition keys, which limit the permissions that you give Amazon EMR to particular resources in your account. Using them can protect you against [the confused deputy problem](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html).

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "elasticmapreduce.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "123456789012"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:elasticmapreduce:*:123456789012:*"
        }
      }
    }
  ]
}
```

------

The contents of version 1 of `AmazonElasticMapReduceEditorsRole` are as follows.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:AuthorizeSecurityGroupEgress",
        "ec2:AuthorizeSecurityGroupIngress",
        "ec2:CreateSecurityGroup",
        "ec2:DescribeSecurityGroups",
        "ec2:RevokeSecurityGroupEgress",
        "ec2:CreateNetworkInterface",
        "ec2:CreateNetworkInterfacePermission",
        "ec2:DeleteNetworkInterface",
        "ec2:DeleteNetworkInterfacePermission",
        "ec2:DescribeNetworkInterfaces",
        "ec2:ModifyNetworkInterfaceAttribute",
        "ec2:DescribeTags",
        "ec2:DescribeInstances",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:ListSteps"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowEC2Authorizesecuritygroupegress"
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:CreateTags"
      ],
      "Resource": [
        "arn:aws:ec2:*:*:network-interface/*"
      ],
      "Condition": {
        "ForAllValues:StringEquals": {
          "aws:TagKeys": [
            "aws:elasticmapreduce:editor-id",
            "aws:elasticmapreduce:job-flow-id"
          ]
        }
      },
      "Sid": "AllowEC2Createtags"
    }
  ]
}
```

------

The following is the content of `S3FullAccessPolicy`, which allows your service role for EMR Notebooks to perform all Amazon S3 actions on objects in your AWS account. When you create a custom service role for EMR Notebooks, you must give the role Amazon S3 permissions.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:*"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowS3"
    }
  ]
}
```

------

You can scope down read and write access for your service role to the Amazon S3 location where you want to save your notebook files. Use the following minimum set of Amazon S3 permissions.

```
"s3:PutObject",
"s3:GetObject",
"s3:GetEncryptionConfiguration",
"s3:ListBucket",
"s3:DeleteObject"
```

If your Amazon S3 bucket is encrypted, you must include the following permissions for AWS Key Management Service.

```
"kms:Decrypt",
"kms:GenerateDataKey",
"kms:ReEncryptFrom",
"kms:ReEncryptTo",
"kms:DescribeKey"
```
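
Putting the two permission lists above together, a scoped-down storage policy can be assembled programmatically. A minimal sketch; the bucket name, prefix, and KMS key ARN are placeholders you supply:

```python
import json

def notebook_storage_policy(bucket, prefix, kms_key_arn=None):
    """Assemble the minimum S3 (and optional KMS) statements listed above.

    The KMS statement is added only when the notebook bucket is
    encrypted with a KMS key.
    """
    statements = [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:GetEncryptionConfiguration",
                "s3:ListBucket",
                "s3:DeleteObject",
            ],
            # Bucket-level ARN for ListBucket, object-level ARN for the rest.
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/{prefix}/*",
            ],
        }
    ]
    if kms_key_arn:
        statements.append(
            {
                "Effect": "Allow",
                "Action": [
                    "kms:Decrypt",
                    "kms:GenerateDataKey",
                    "kms:ReEncryptFrom",
                    "kms:ReEncryptTo",
                    "kms:DescribeKey",
                ],
                "Resource": [kms_key_arn],
            }
        )
    return {"Version": "2012-10-17", "Statement": statements}

policy = notebook_storage_policy("amzn-s3-demo-bucket", "notebooks")
print(json.dumps(policy, indent=2))
```

Splitting the bucket-level and object-level ARNs matters because `s3:ListBucket` applies to the bucket, while the object actions apply only to keys under the prefix.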

When you link Git repositories to your notebook and need to create a secret for the repository, you must add the `secretsmanager:GetSecretValue` permission to the IAM policy attached to the service role for EMR Notebooks. The following is an example policy:

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "secretsmanager:GetSecretValue"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

------

## EMR Notebooks service role permissions
<a name="emr-managed-notebooks-service-role-permissions"></a>

This table lists the actions that EMR Notebooks takes using the service role, along with the permissions that are needed for each action.


****  

| Action | Permissions | 
| --- | --- | 
| Establish a secure network channel between a notebook and an Amazon EMR cluster, and perform necessary cleanup actions. |  <pre>"ec2:CreateNetworkInterface", <br />"ec2:CreateNetworkInterfacePermission", <br />"ec2:DeleteNetworkInterface", <br />"ec2:DeleteNetworkInterfacePermission", <br />"ec2:DescribeNetworkInterfaces", <br />"ec2:ModifyNetworkInterfaceAttribute", <br />"ec2:AuthorizeSecurityGroupEgress", <br />"ec2:AuthorizeSecurityGroupIngress", <br />"ec2:CreateSecurityGroup",<br />"ec2:DescribeSecurityGroups", <br />"ec2:RevokeSecurityGroupEgress",<br />"ec2:DescribeTags",<br />"ec2:DescribeInstances",<br />"ec2:DescribeSubnets",<br />"ec2:DescribeVpcs",<br />"elasticmapreduce:ListInstances", <br />"elasticmapreduce:DescribeCluster", <br />"elasticmapreduce:ListSteps"</pre>  | 
| Use Git credentials stored in AWS Secrets Manager to link Git repositories to a notebook. |  <pre>"secretsmanager:GetSecretValue"</pre>  | 
| Apply AWS tags to the network interface and default security groups that EMR Notebooks creates while setting up the secure network channel. For more information, see [Tagging AWS resources](https://docs.aws.amazon.com/general/latest/gr/aws_tagging.html). |  <pre>"ec2:CreateTags"</pre>  | 
| Access or upload notebook files and metadata to Amazon S3. |  <pre>"s3:PutObject",<br />"s3:GetObject",<br />"s3:GetEncryptionConfiguration",<br />"s3:ListBucket",<br />"s3:DeleteObject" </pre> The following permissions are only required if you use an encrypted Amazon S3 bucket. <pre>"kms:Decrypt",<br />"kms:GenerateDataKey",<br />"kms:ReEncryptFrom",<br />"kms:ReEncryptTo",<br />"kms:DescribeKey"</pre>  | 
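
If you maintain a custom service role, a quick way to catch omissions is to check a policy document against the action groups from the table above. A minimal sketch; it compares literal action names only, so wildcard actions (for example `ec2:*`) and `Deny` statements are not accounted for, and the candidate policy shown is hypothetical:

```python
import json

# One action group from the table above: secure network channel setup/cleanup
# (abbreviated to a few representative actions for the example).
NETWORK_CHANNEL_ACTIONS = {
    "ec2:CreateNetworkInterface",
    "ec2:DeleteNetworkInterface",
    "ec2:DescribeNetworkInterfaces",
    "elasticmapreduce:ListInstances",
    "elasticmapreduce:DescribeCluster",
    "elasticmapreduce:ListSteps",
}

def missing_actions(policy, required):
    """Return the required actions not granted by any Allow statement.

    Simplified audit helper: it matches action names literally and
    ignores wildcards, Deny statements, and resource constraints.
    """
    granted = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        granted.update([actions] if isinstance(actions, str) else actions)
    return sorted(required - granted)

candidate = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["ec2:CreateNetworkInterface"], "Resource": "*"}
    ],
}
print(json.dumps(missing_actions(candidate, NETWORK_CHANNEL_ACTIONS), indent=2))
```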

## EMR Notebooks updates to AWS managed policies
<a name="notebooks-slr-updates"></a>

View details about updates to AWS managed policies for EMR Notebooks since March 1, 2021.


| Change | Description | Date | 
| --- | --- | --- | 
| AmazonElasticMapReduceEditorsRole - Added permissions | EMR Notebooks added `ec2:DescribeVpcs` and `elasticmapreduce:ListSteps` permissions to `AmazonElasticMapReduceEditorsRole`.  | Feb 8, 2023  | 
| EMR Notebooks started tracking changes  |  EMR Notebooks started tracking changes for its AWS managed policies.  | Feb 8, 2023  | 

# Using service-linked roles for Amazon EMR
<a name="using-service-linked-roles"></a>

Amazon EMR uses AWS Identity and Access Management (IAM) [service-linked roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_terms-and-concepts.html#iam-term-service-linked-role). A service-linked role is a unique type of IAM role that is linked directly to Amazon EMR. Service-linked roles are predefined by Amazon EMR and include all the permissions that the service requires to call other AWS services on your behalf.

**Topics**
+ [Using service-linked roles for Amazon EMR for cleanup](using-service-linked-roles-cleanup.md)
+ [Using service-linked roles with Amazon EMR for write-ahead logging](using-service-linked-roles-wal.md)

For information about other services that support service-linked roles, see [AWS services that work with IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_aws-services-that-work-with-iam.html) and look for the services that have **Yes** in the **Service-linked roles** column. Choose a **Yes** with a link to view the service-linked role documentation for that service.

# Using service-linked roles for Amazon EMR for cleanup
<a name="using-service-linked-roles-cleanup"></a>

Amazon EMR uses AWS Identity and Access Management (IAM) [service-linked roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_terms-and-concepts.html#iam-term-service-linked-role). A service-linked role is a unique type of IAM role that is linked directly to Amazon EMR. Service-linked roles are predefined by Amazon EMR and include all the permissions that the service requires to call other AWS services on your behalf.

Service-linked roles work together with the Amazon EMR service role and Amazon EC2 instance profile for Amazon EMR. For more information about the service role and instance profile, see [Configure IAM service roles for Amazon EMR permissions to AWS services and resources](emr-iam-roles.md).

A service-linked role makes setting up Amazon EMR easier because you don’t have to manually add the necessary permissions. Amazon EMR defines the permissions of its service-linked roles, and unless defined otherwise, only Amazon EMR can assume its roles. The defined permissions include the trust policy and the permissions policy, and that permissions policy cannot be attached to any other IAM entity.

You can delete this service-linked role for Amazon EMR only after you delete any related resources and terminate all EMR clusters in the account. This protects your Amazon EMR resources so that you can't inadvertently remove permission to access the resources.

## Using service-linked roles for cleanup
<a name="using-service-linked-roles-permissions-cleanup"></a>

Amazon EMR uses the service-linked role **AWSServiceRoleForEMRCleanup** to terminate and delete Amazon EC2 resources on your behalf if the Amazon EMR service role loses that ability. Amazon EMR creates the service-linked role automatically during cluster creation if it doesn't already exist.

The AWSServiceRoleForEMRCleanup service-linked role trusts the following services to assume the role:
+ `elasticmapreduce.amazonaws.com`

The AWSServiceRoleForEMRCleanup service-linked role permissions policy allows Amazon EMR to complete the following actions on the specified resources:
+ Action: `DescribeInstances` on `ec2`
+ Action: `DescribeLaunchTemplates` on `ec2`
+ Action: `DeleteLaunchTemplate` on `ec2`
+ Action: `DescribeSpotInstanceRequests` on `ec2`
+ Action: `ModifyInstanceAttribute` on `ec2`
+ Action: `TerminateInstances` on `ec2`
+ Action: `CancelSpotInstanceRequests` on `ec2`
+ Action: `DeleteNetworkInterface` on `ec2`
+ Action: `DescribeInstanceAttribute` on `ec2`
+ Action: `DescribeVolumeStatus` on `ec2`
+ Action: `DescribeVolumes` on `ec2`
+ Action: `DetachVolume` on `ec2`
+ Action: `DeleteVolume` on `ec2`
+ Action: `DescribePlacementGroups` on `ec2`
+ Action: `DeletePlacementGroup` on `ec2`

You must configure permissions to allow an IAM entity (such as a user, group, or role) to create, edit, or delete a service-linked role.

## Creating a service-linked role for Amazon EMR
<a name="create-service-linked-role"></a>

You don't need to manually create the AWSServiceRoleForEMRCleanup role. When you launch a cluster, either for the first time or when the service-linked role is not present, Amazon EMR creates the AWSServiceRoleForEMRCleanup service-linked role for you. You must have permissions to create a service-linked role.

Add the following statement to the permissions policy for the IAM entity that needs to create the service-linked role.

```
{
    "Sid": "ElasticMapReduceServiceLinkedRole",
    "Effect": "Allow",
    "Action": "iam:CreateServiceLinkedRole",
    "Resource": "arn:aws:iam::*:role/aws-service-role/elasticmapreduce.amazonaws.com*/AWSServiceRoleForEMRCleanup*",
    "Condition": {
        "StringEquals": {
            "iam:AWSServiceName": [
                "elasticmapreduce.amazonaws.com",
                "elasticmapreduce.amazonaws.com.rproxy.govskope.us.cn"
            ]
        }
    }
}
```
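
If you manage entity policies as JSON files, you can merge the statement above into an existing document programmatically instead of pasting it by hand. A minimal sketch; the `Sid`-based de-duplication is an illustrative convention, not an IAM requirement:

```python
import json

# The CreateServiceLinkedRole statement from the example above.
SLR_STATEMENT = {
    "Sid": "ElasticMapReduceServiceLinkedRole",
    "Effect": "Allow",
    "Action": "iam:CreateServiceLinkedRole",
    "Resource": "arn:aws:iam::*:role/aws-service-role/elasticmapreduce.amazonaws.com*/AWSServiceRoleForEMRCleanup*",
    "Condition": {
        "StringEquals": {
            "iam:AWSServiceName": [
                "elasticmapreduce.amazonaws.com",
                "elasticmapreduce.amazonaws.com.rproxy.govskope.us.cn",
            ]
        }
    },
}

def add_statement(policy, statement):
    """Append a statement unless one with the same Sid already exists."""
    sids = {s.get("Sid") for s in policy.get("Statement", [])}
    if statement.get("Sid") not in sids:
        policy.setdefault("Statement", []).append(statement)
    return policy

policy = {"Version": "2012-10-17", "Statement": []}
add_statement(policy, SLR_STATEMENT)
print(json.dumps(policy, indent=2))
```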

**Important**  
If you used Amazon EMR before October 24, 2017, when service-linked roles weren't supported, then Amazon EMR created the AWSServiceRoleForEMRCleanup service-linked role in your account. For more information, see [A new role appeared in my IAM account](https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_roles.html#troubleshoot_roles_new-role-appeared).

## Editing a service-linked role for Amazon EMR
<a name="edit-service-linked-role"></a>

Amazon EMR doesn't allow you to edit the AWSServiceRoleForEMRCleanup service-linked role. After you create a service-linked role, you can't change the name of the service-linked role because various entities might reference the service-linked role. However, you can edit the description of the service-linked role using IAM.

### Editing a service-linked role description (IAM console)
<a name="edit-service-linked-role-iam-console"></a>

You can use the IAM console to edit the description of a service-linked role.

**To edit the description of a service-linked role (console)**

1. In the navigation pane of the IAM console, choose **Roles**.

1. Choose the name of the role to modify.

1. To the right of the **Role description**, choose **Edit**. 

1. Enter a new description in the box and choose **Save changes**.

### Editing a service-linked role description (IAM CLI)
<a name="edit-service-linked-role-iam-cli"></a>

You can use IAM commands from the AWS Command Line Interface to edit the description of a service-linked role.

**To change the description of a service-linked role (CLI)**

1. (Optional) To view the current description for a role, use the following commands:

   ```
   $ aws iam get-role --role-name role-name
   ```

   Use the role name, not the ARN, to refer to roles with the CLI commands. For example, if a role has the following ARN: `arn:aws:iam::123456789012:role/myrole`, you refer to the role as **myrole**.

1. To update a service-linked role's description, use one of the following commands:

   ```
   $ aws iam update-role-description --role-name role-name --description description
   ```

### Editing a service-linked role description (IAM API)
<a name="edit-service-linked-role-iam-api"></a>

You can use the IAM API to edit the description of a service-linked role.

**To change the description of a service-linked role (API)**

1. (Optional) To view the current description for a role, use the following command:

   IAM API: [GetRole](https://docs.aws.amazon.com/IAM/latest/APIReference/API_GetRole.html) 

1. To update a role's description, use the following command: 

   IAM API: [UpdateRoleDescription](https://docs.aws.amazon.com/IAM/latest/APIReference/API_UpdateRoleDescription.html)

## Deleting a service-linked role for Amazon EMR
<a name="delete-service-linked-role"></a>

If you no longer need to use a feature or service that requires a service-linked role, we recommend that you delete that service-linked role. That way, you don't have an unused entity that is not being actively monitored or maintained. However, you must clean up your service-linked role before you can delete it.

### Cleaning up a service-linked role
<a name="service-linked-role-review-before-delete"></a>

Before you can use IAM to delete a service-linked role, you must first confirm that the service-linked role has no active sessions and remove any resources used by the service-linked role.

**To check whether the service-linked role has an active session in the IAM console**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Roles**. Select the name (not the check box) of the AWSServiceRoleForEMRCleanup service-linked role.

1. On the **Summary** page for the selected service-linked role, choose **Access Advisor**.

1. On the **Access Advisor** tab, review the recent activity for the service-linked role.
**Note**  
If you are unsure whether Amazon EMR is using the AWSServiceRoleForEMRCleanup service-linked role, you can try to delete the service-linked role. If the service is using the service-linked role, then the deletion fails and you can view the Regions where the service-linked role is being used. If the service-linked role is being used, then you must wait for the session to end before you can delete the service-linked role. You cannot revoke the session for a service-linked role. 

**To remove Amazon EMR resources used by the AWSServiceRoleForEMRCleanup**
+ Terminate all clusters in your account. For more information, see [Terminate an Amazon EMR cluster in the starting, running, or waiting states](UsingEMR_TerminateJobFlow.md).

### Deleting a service-linked role (IAM console)
<a name="delete-service-linked-role-iam-console"></a>

You can use the IAM console to delete a service-linked role.

**To delete a service-linked role (console)**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Roles**. Select the check box next to AWSServiceRoleForEMRCleanup, not the name or row itself. 

1. For **Role actions** at the top of the page, choose **Delete role**.

1. In the confirmation dialog box, review the service last accessed data, which shows when each of the selected roles last accessed an AWS service. This helps you to confirm whether the role is currently active. To proceed, choose **Yes, Delete**.

1. Watch the IAM console notifications to monitor the progress of the service-linked role deletion. Because the IAM service-linked role deletion is asynchronous, after you submit the service-linked role for deletion, the deletion task can succeed or fail. If the task fails, you can choose **View details** or **View Resources** from the notifications to learn why the deletion failed. If the deletion fails because there are resources in the service that are being used by the role, then the reason for the failure includes a list of resources.

### Deleting a service-linked role (IAM CLI)
<a name="delete-service-linked-role-iam-cli"></a>

You can use IAM commands from the AWS Command Line Interface to delete a service-linked role. Because a service-linked role can't be deleted if it is being used or has associated resources, you must submit a deletion request. The request can be denied if these conditions aren't met.

**To delete a service-linked role (CLI)**

1. Type the following command to submit a service-linked role deletion request. To check the status of the deletion task later, you must capture the `deletion-task-id` from the response.

   ```
   $ aws iam delete-service-linked-role --role-name AWSServiceRoleForEMRCleanup
   ```

1. Type the following command to check the status of the deletion task:

   ```
   $ aws iam get-service-linked-role-deletion-status --deletion-task-id deletion-task-id
   ```

   The status of the deletion task can be `NOT_STARTED`, `IN_PROGRESS`, `SUCCEEDED`, or `FAILED`. If the deletion fails, the call returns the reason that it failed so that you can troubleshoot.

### Deleting a service-linked role (IAM API)
<a name="delete-service-linked-role-iam-api"></a>

You can use the IAM API to delete a service-linked role. Because a service-linked role can't be deleted if it is being used or has associated resources, you must submit a deletion request. The request can be denied if these conditions aren't met.

**To delete a service-linked role (API)**

1. To submit a deletion request for a service-linked role, call [DeleteServiceLinkedRole](https://docs.aws.amazon.com/IAM/latest/APIReference/API_DeleteServiceLinkedRole.html). In the request, specify the AWSServiceRoleForEMRCleanup role name.

   To check the status of the deletion task, you must capture the `DeletionTaskId` from the response.

1. To check the status of the deletion, call [GetServiceLinkedRoleDeletionStatus](https://docs.aws.amazon.com/IAM/latest/APIReference/API_GetServiceLinkedRoleDeletionStatus.html). In the request, specify the `DeletionTaskId`.

   The status of the deletion task can be `NOT_STARTED`, `IN_PROGRESS`, `SUCCEEDED`, or `FAILED`. If the deletion fails, the call returns the reason that it failed so that you can troubleshoot.

## Supported Regions for AWSServiceRoleForEMRCleanup
<a name="emr-slr-regions"></a>

Amazon EMR supports using the AWSServiceRoleForEMRCleanup service-linked role in the following Regions.


****  

| Region name | Region identity | Support in Amazon EMR | 
| --- | --- | --- | 
| US East (N. Virginia) | us-east-1 | Yes | 
| US East (Ohio) | us-east-2 | Yes | 
| US West (N. California) | us-west-1 | Yes | 
| US West (Oregon) | us-west-2 | Yes | 
| Asia Pacific (Mumbai) | ap-south-1 | Yes | 
| Asia Pacific (Osaka) | ap-northeast-3 | Yes | 
| Asia Pacific (Seoul) | ap-northeast-2 | Yes | 
| Asia Pacific (Singapore) | ap-southeast-1 | Yes | 
| Asia Pacific (Sydney) | ap-southeast-2 | Yes | 
| Asia Pacific (Tokyo) | ap-northeast-1 | Yes | 
| Canada (Central) | ca-central-1 | Yes | 
| Europe (Frankfurt) | eu-central-1 | Yes | 
| Europe (Ireland) | eu-west-1 | Yes | 
| Europe (London) | eu-west-2 | Yes | 
| Europe (Paris) | eu-west-3 | Yes | 
| South America (São Paulo) | sa-east-1 | Yes | 

# Using service-linked roles with Amazon EMR for write-ahead logging
<a name="using-service-linked-roles-wal"></a>

Amazon EMR uses AWS Identity and Access Management (IAM) [service-linked roles](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_terms-and-concepts.html#iam-term-service-linked-role). A service-linked role is a unique type of IAM role that is linked directly to Amazon EMR. Service-linked roles are predefined by Amazon EMR and include all the permissions that the service requires to call other AWS services on your behalf.

Service-linked roles work together with the Amazon EMR service role and Amazon EC2 instance profile for Amazon EMR. For more information about the service role and instance profile, see [Configure IAM service roles for Amazon EMR permissions to AWS services and resources](emr-iam-roles.md).

A service-linked role makes setting up Amazon EMR easier because you don’t have to manually add the necessary permissions. Amazon EMR defines the permissions of its service-linked roles, and unless defined otherwise, only Amazon EMR can assume its roles. The defined permissions include the trust policy and the permissions policy, and that permissions policy cannot be attached to any other IAM entity.

You can delete this service-linked role for Amazon EMR only after you delete any related resources and terminate all EMR clusters in the account. This protects your Amazon EMR resources so that you can't inadvertently remove permission to access the resources.

## Service-linked role permissions for write-ahead logging (WAL)
<a name="using-service-linked-roles-permissions-wal"></a>

Amazon EMR uses the service-linked role **AWSServiceRoleForEMRWAL** to retrieve cluster status.

The AWSServiceRoleForEMRWAL service-linked role trusts the following services to assume the role:
+ `emrwal.amazonaws.com`

The [`EMRDescribeClusterPolicyForEMRWAL`](EMRDescribeClusterPolicyForEMRWAL.md) permissions policy for the service-linked role allows Amazon EMR to complete the following actions on the specified resources:
+ Action: `DescribeCluster` on `*`

You must configure permissions to allow an IAM entity (in this case, Amazon EMR WAL) to create, edit, or delete a service-linked role. Add the following statements as needed to the permissions policy for your instance profile:

## CreateServiceLinkedRole
<a name="iam-create-wal"></a>

**To allow an IAM entity to create the AWSServiceRoleForEMRWAL service-linked role**

Add the following statement to the permissions policy for the IAM entity that needs to create the service-linked role:

```
{
    "Effect": "Allow",
    "Action": [
        "iam:CreateServiceLinkedRole",
        "iam:PutRolePolicy"
    ],
    "Resource": "arn:aws:iam::*:role/aws-service-role/emrwal.amazonaws.com*/AWSServiceRoleForEMRWAL*",
    "Condition": {
        "StringLike": {
            "iam:AWSServiceName": [
                "emrwal.amazonaws.com",
                "elasticmapreduce.amazonaws.com.rproxy.govskope.us.cn"
            ]
        }
    }
}
```

## UpdateRoleDescription
<a name="iam-update-wal"></a>

**To allow an IAM entity to edit the description of the AWSServiceRoleForEMRWAL service-linked role**

Add the following statement to the permissions policy for the IAM entity that needs to edit the description of a service-linked role:

```
{
    "Effect": "Allow",
    "Action": [
        "iam:UpdateRoleDescription"
    ],
    "Resource": "arn:aws:iam::*:role/aws-service-role/emrwal.amazonaws.com*/AWSServiceRoleForEMRWAL*",
    "Condition": {
        "StringLike": {
            "iam:AWSServiceName": [
                "emrwal.amazonaws.com",
                "elasticmapreduce.amazonaws.com.rproxy.govskope.us.cn"
            ]
        }
    }
}
```

## DeleteServiceLinkedRole
<a name="iam-delete-wal"></a>

**To allow an IAM entity to delete the AWSServiceRoleForEMRWAL service-linked role**

Add the following statement to the permissions policy for the IAM entity that needs to delete a service-linked role:

```
{
    "Effect": "Allow",
    "Action": [
        "iam:DeleteServiceLinkedRole",
        "iam:GetServiceLinkedRoleDeletionStatus"
    ],
    "Resource": "arn:aws:iam::*:role/aws-service-role/elasticmapreduce.amazonaws.com*/AWSServiceRoleForEMRCleanup*",
    "Condition": {
        "StringLike": {
            "iam:AWSServiceName": [
                "emrwal.amazonaws.com",
                "elasticmapreduce.amazonaws.com.rproxy.govskope.us.cn"
            ]
        }
    }
}
```

## Creating a service-linked role for Amazon EMR
<a name="create-service-linked-role-wal"></a>

You don't need to manually create the AWSServiceRoleForEMRWAL role. Amazon EMR creates this service-linked role automatically when you create a WAL workspace with the EMRWAL CLI or from AWS CloudFormation. HBase also creates the service-linked role when you configure a workspace for Amazon EMR WAL and the role doesn't yet exist. You must have permissions to create a service-linked role. For example statements that add this capability to the permissions policy of an IAM entity (such as a user, group, or role), see [Service-linked role permissions for write-ahead logging (WAL)](#using-service-linked-roles-permissions-wal).

## Editing a service-linked role for Amazon EMR
<a name="edit-service-linked-role-wal"></a>

Amazon EMR doesn't allow you to edit the AWSServiceRoleForEMRWAL service-linked role. After you create a service-linked role, you can't change the name of the service-linked role because various entities might reference the service-linked role. However, you can edit the description of the service-linked role using IAM.

### Editing a service-linked role description (IAM console)
<a name="edit-service-linked-role-iam-console"></a>

You can use the IAM console to edit the description of a service-linked role.

**To edit the description of a service-linked role (console)**

1. In the navigation pane of the IAM console, choose **Roles**.

1. Choose the name of the role to modify.

1. To the right of the **Role description**, choose **Edit**. 

1. Enter a new description in the box and choose **Save changes**.

### Editing a service-linked role description (IAM CLI)
<a name="edit-service-linked-role-iam-cli"></a>

You can use IAM commands from the AWS Command Line Interface to edit the description of a service-linked role.

**To change the description of a service-linked role (CLI)**

1. (Optional) To view the current description for a role, use the following commands:

   ```
   $ aws iam get-role --role-name role-name
   ```

   Use the role name, not the ARN, to refer to roles with the CLI commands. For example, if a role has the following ARN: `arn:aws:iam::123456789012:role/myrole`, you refer to the role as **myrole**.

1. To update a service-linked role's description, use one of the following commands:

   ```
   $ aws iam update-role-description --role-name role-name --description description
   ```

### Editing a service-linked role description (IAM API)
<a name="edit-service-linked-role-iam-api"></a>

You can use the IAM API to edit the description of a service-linked role.

**To change the description of a service-linked role (API)**

1. (Optional) To view the current description for a role, use the following command:

   IAM API: [GetRole](https://docs.aws.amazon.com/IAM/latest/APIReference/API_GetRole.html) 

1. To update a role's description, use the following command: 

   IAM API: [UpdateRoleDescription](https://docs.aws.amazon.com/IAM/latest/APIReference/API_UpdateRoleDescription.html)

## Deleting a service-linked role for Amazon EMR
<a name="delete-service-linked-role-wal"></a>

If you no longer need to use a feature or service that requires a service-linked role, we recommend that you delete that service-linked role. That way, you don't have an unused entity that is not being actively monitored or maintained. However, you must clean up your service-linked role before you can delete it.

**Note**  
Write-ahead logging isn't affected if you delete the AWSServiceRoleForEMRWAL role, but Amazon EMR no longer automatically deletes the WAL logs that it created after your EMR cluster terminates. If you delete the service-linked role, you must delete the Amazon EMR WAL logs manually.

### Cleaning up a service-linked role
<a name="service-linked-role-review-before-delete"></a>

Before you can use IAM to delete a service-linked role, you must first confirm that the role has no active sessions and remove any resources used by the role.

**To check whether the service-linked role has an active session in the IAM console**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Roles**. Select the name (not the check box) of the AWSServiceRoleForEMRWAL role.

1. On the **Summary** page for the selected role, choose **Access Advisor**.

1. On the **Access Advisor** tab, review the recent activity for the service-linked role.
**Note**  
If you are unsure whether Amazon EMR is using the AWSServiceRoleForEMRWAL role, you can try to delete the service-linked role. If the service is using the role, then the deletion fails and you can view the Regions where the service-linked role is being used. If the service-linked role is being used, then you must wait for the session to end before you can delete the service-linked role. You cannot revoke the session for a service-linked role. 

**To remove Amazon EMR resources used by the AWSServiceRoleForEMRWAL**
+ Terminate all clusters in your account. For more information, see [Terminate an Amazon EMR cluster in the starting, running, or waiting states](UsingEMR_TerminateJobFlow.md).

### Deleting a service-linked role (IAM console)
<a name="delete-service-linked-role-iam-console"></a>

You can use the IAM console to delete a service-linked role.

**To delete a service-linked role (console)**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the navigation pane, choose **Roles**. Select the check box next to AWSServiceRoleForEMRWAL, not the name or row itself. 

1. For **Role actions** at the top of the page, choose **Delete role**.

1. In the confirmation dialog box, review the service last accessed data, which shows when each of the selected roles last accessed an AWS service. This helps you to confirm whether the role is currently active. To proceed, choose **Yes, Delete**.

1. Watch the IAM console notifications to monitor the progress of the service-linked role deletion. Because the IAM service-linked role deletion is asynchronous, after you submit the role for deletion, the deletion task can succeed or fail. If the task fails, you can choose **View details** or **View Resources** from the notifications to learn why the deletion failed. If the deletion fails because there are resources in the service that are being used by the role, then the reason for the failure includes a list of resources.

### Deleting a service-linked role (IAM CLI)
<a name="delete-service-linked-role-iam-cli"></a>

You can use IAM commands from the AWS Command Line Interface to delete a service-linked role. Because a service-linked role can't be deleted while it is in use or has associated resources, you must submit a deletion request. The request can be denied if these conditions aren't met.

**To delete a service-linked role (CLI)**

1. Type the following command to submit a service-linked role deletion request. From the response, capture the `deletion-task-id`, which you need to check the status of the deletion task:

   ```
   $ aws iam delete-service-linked-role --role-name AWSServiceRoleForEMRWAL
   ```

1. Type the following command to check the status of the deletion task:

   ```
   $ aws iam get-service-linked-role-deletion-status --deletion-task-id deletion-task-id
   ```

   The status of the deletion task can be `NOT_STARTED`, `IN_PROGRESS`, `SUCCEEDED`, or `FAILED`. If the deletion fails, the call returns the reason that it failed so that you can troubleshoot.
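
The check-status loop can be sketched as follows. This is a local illustration of the status flow only: the `fetch_status` callable stands in for a real call to `get-service-linked-role-deletion-status` (for example, through an AWS SDK), and the simulated responses are made up.

```python
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}

def wait_for_deletion(fetch_status, interval_seconds=0, max_polls=10):
    """Poll a status callable until the deletion task reaches a terminal state.

    fetch_status() returns one of: NOT_STARTED, IN_PROGRESS, SUCCEEDED, FAILED.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_seconds)
    return "TIMED_OUT"

# Simulated sequence of responses standing in for successive API calls.
responses = iter(["NOT_STARTED", "IN_PROGRESS", "SUCCEEDED"])
result = wait_for_deletion(lambda: next(responses))
print(result)  # SUCCEEDED
```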

### Deleting a service-linked role (IAM API)
<a name="delete-service-linked-role-iam-api"></a>

You can use the IAM API to delete a service-linked role. Because a service-linked role can't be deleted while it is in use or has associated resources, you must submit a deletion request. The request can be denied if these conditions aren't met.

**To delete a service-linked role (API)**

1. To submit a deletion request for a service-linked role, call [DeleteServiceLinkedRole](https://docs.aws.amazon.com/IAM/latest/APIReference/API_DeleteServiceLinkedRole.html). In the request, specify the AWSServiceRoleForEMRWAL role name.

   To check the status of the deletion task, you must capture the `DeletionTaskId` from the response.

1. To check the status of the deletion, call [GetServiceLinkedRoleDeletionStatus](https://docs.aws.amazon.com/IAM/latest/APIReference/API_GetServiceLinkedRoleDeletionStatus.html). In the request, specify the `DeletionTaskId`.

   The status of the deletion task can be `NOT_STARTED`, `IN_PROGRESS`, `SUCCEEDED`, or `FAILED`. If the deletion fails, the call returns the reason that it failed so that you can troubleshoot.

## Supported Regions for AWSServiceRoleForEMRWAL
<a name="emr-slr-regions-wal"></a>

Amazon EMR supports using the AWSServiceRoleForEMRWAL service-linked role in the following Regions.



| Region name | Region identity | Support in Amazon EMR | 
| --- | --- | --- | 
| US East (N. Virginia) | us-east-1 | Yes | 
| US East (Ohio) | us-east-2 | Yes | 
| US West (N. California) | us-west-1 | Yes | 
| US West (Oregon) | us-west-2 | Yes | 
| Asia Pacific (Mumbai) | ap-south-1 | Yes | 
| Asia Pacific (Singapore) | ap-southeast-1 | Yes | 
| Asia Pacific (Sydney) | ap-southeast-2 | Yes | 
| Asia Pacific (Tokyo) | ap-northeast-1 | Yes | 
| Europe (Frankfurt) | eu-central-1 | Yes | 
| Europe (Ireland) | eu-west-1 | Yes | 

# Customize IAM roles with Amazon EMR
<a name="emr-iam-roles-custom"></a>

You may want to customize the IAM service roles and permissions to limit privileges according to your security requirements. To customize permissions, we recommend that you create new roles and policies. Begin with the permissions in the managed policies for the default roles (for example, `AmazonElasticMapReduceforEC2Role` and `AmazonElasticMapReduceRole`). Then, copy and paste the contents to new policy statements, modify the permissions as appropriate, and attach the modified permissions policies to the roles that you create. You must have the appropriate IAM permissions to work with roles and policies. For more information, see [Allow users and groups to create and modify roles](emr-iam-roles-create-permissions.md).
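
As a local sketch of the copy-and-narrow approach described above, the following hypothetical helper takes a policy document (as a Python dict) and keeps only an allow-listed set of actions. The policy contents and action names here are illustrative, not a recommended minimal set.

```python
import json

def narrow_policy(policy_doc, allowed_actions):
    """Return a copy of the policy that keeps only the allow-listed actions."""
    narrowed = {"Version": policy_doc["Version"], "Statement": []}
    for stmt in policy_doc["Statement"]:
        actions = stmt["Action"]
        if isinstance(actions, str):  # a single action may appear as a string
            actions = [actions]
        kept = [a for a in actions if a in allowed_actions]
        if kept:
            new_stmt = dict(stmt)
            new_stmt["Action"] = kept
            narrowed["Statement"].append(new_stmt)
    return narrowed

# A shortened, hypothetical stand-in for a copied managed policy.
managed = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
            "Resource": "*",
        }
    ],
}
custom = narrow_policy(managed, {"s3:GetObject", "s3:PutObject"})
print(json.dumps(custom, indent=2))
```

Attach the narrowed policy to a new role rather than editing the default roles in place.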

If you create a custom EMR role for EC2, follow the basic workflow, which automatically creates an instance profile of the same name. Amazon EC2 allows you to create instance profiles and roles with different names, but Amazon EMR does not support this configuration, and it results in an "invalid instance profile" error when you create the cluster.

**Important**  
Inline policies are not automatically updated when service requirements change. If you create and attach inline policies, be aware that service updates might occur that suddenly cause permissions errors. For more information, see [Managed Policies and Inline Policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/policies_managed-vs-inline.html) in the *IAM User Guide* and [Specify custom IAM roles when you create a cluster](#emr-iam-roles-launch-jobflow).

For more information about working with IAM roles, see the following topics in the *IAM User Guide*:
+  [Creating a role to delegate permissions to an AWS service](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html) 
+  [Modifying a role](https://docs.aws.amazon.com/IAM/latest/UserGuide/modifying-role.html) 
+  [Deleting a role](https://docs.aws.amazon.com/IAM/latest/UserGuide/deleting-roles.html) 

## Specify custom IAM roles when you create a cluster
<a name="emr-iam-roles-launch-jobflow"></a>

You specify the service role for Amazon EMR and the role for the Amazon EC2 instance profile when you create a cluster. The user who is creating clusters needs permissions to retrieve and assign roles to Amazon EMR and EC2 instances. Otherwise, an **account is not authorized to call EC2** error occurs. For more information, see [Allow users and groups to create and modify roles](emr-iam-roles-create-permissions.md).

### Use the console to specify custom roles
<a name="emr-iam-roles-launch-console"></a>

When you create a cluster, you can specify a custom service role for Amazon EMR, a custom role for the EC2 instance profile, and a custom Auto Scaling role using **Advanced options**. When you use **Quick options**, the default service role and the default role for the EC2 instance profile are specified. For more information, see [IAM service roles used by Amazon EMR](emr-iam-service-roles.md).

------
#### [ Console ]

**To specify custom IAM roles with the console**

When you create a cluster with the console, you must specify a custom service role for Amazon EMR and a custom role for the EC2 instance profile. For more information, see [IAM service roles used by Amazon EMR](emr-iam-service-roles.md).

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. Under **Security configuration and permissions**, find the **IAM role for instance profile** and **Service role for Amazon EMR** fields. For each role type, select a role from the list. Only roles within your account that have the appropriate trust policy for that role type are listed.

1. Choose any other options that apply to your cluster. 

1. To launch your cluster, choose **Create cluster**.

------

### Use the AWS CLI to specify custom roles
<a name="emr-iam-roles-launch-cli"></a>

You can specify a service role for Amazon EMR and a service role for cluster EC2 instances explicitly using options with the `create-cluster` command from the AWS CLI. Use the `--service-role` option to specify the service role. Use the `InstanceProfile` argument of the `--ec2-attributes` option to specify the role for the EC2 instance profile.

The Auto Scaling role is specified using a separate option, `--auto-scaling-role`. For more information, see [Using automatic scaling with a custom policy for instance groups in Amazon EMR](emr-automatic-scaling.md).

**To specify custom IAM roles using the AWS CLI**
+ The following command specifies the custom service role, *MyCustomServiceRoleForEMR*, and a custom role for the EC2 instance profile, *MyCustomServiceRoleForClusterEC2Instances*, when launching a cluster. This example uses the default Amazon EMR role.
**Note**  
Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

  ```
  aws emr create-cluster --name "Test cluster" --release-label emr-7.12.0 \
  --applications Name=Hive Name=Pig --service-role MyCustomServiceRoleForEMR \
  --ec2-attributes InstanceProfile=MyCustomServiceRoleForClusterEC2Instances,\
  KeyName=myKey --instance-type m5.xlarge --instance-count 3
  ```

You can use these options to specify default roles explicitly rather than using the `--use-default-roles` option. The `--use-default-roles` option specifies the service role and the role for the EC2 instance profile defined in the `config` file for the AWS CLI.

The following example demonstrates the contents of a `config` file for the AWS CLI that specifies custom roles for Amazon EMR. With this configuration file, when the `--use-default-roles` option is specified, the cluster is created using the *MyCustomServiceRoleForEMR* and *MyCustomServiceRoleForClusterEC2Instances* roles. If these values aren't set, the default `service_role` is `EMR_DefaultRole` and the default `instance_profile` is `EMR_EC2_DefaultRole`.

```
[default]
output = json
region = us-west-1
aws_access_key_id = myAccessKeyID
aws_secret_access_key = mySecretAccessKey
emr =
     service_role = MyCustomServiceRoleForEMR
     instance_profile = MyCustomServiceRoleForClusterEC2Instances
```

# Configure IAM roles for EMRFS requests to Amazon S3
<a name="emr-emrfs-iam-roles"></a>

**Note**  
The EMRFS role mapping capability described on this page has been improved upon with the introduction of Amazon S3 Access Grants in Amazon EMR 6.15.0. For a scalable access control solution for your data in Amazon S3, we recommend that you use [S3 Access Grants with Amazon EMR](emr-access-grants.md).

When an application running on a cluster references data using the `s3://mydata` format, Amazon EMR uses EMRFS to make the request. To interact with Amazon S3, EMRFS assumes the permissions policies that are attached to your [Amazon EC2 instance profile](emr-iam-role-for-ec2.md). The same Amazon EC2 instance profile is used regardless of the user or group running the application or the location of the data in Amazon S3. 

If you have a cluster with multiple users who need different levels of access to data in Amazon S3 through EMRFS, you can set up a security configuration with IAM roles for EMRFS. EMRFS can assume a different service role for cluster EC2 instances based on the user or group making the request, or based on the location of data in Amazon S3. Each IAM role for EMRFS can have different permissions for data access in Amazon S3. For more information about the service role for cluster EC2 instances, see [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md).

Using custom IAM roles for EMRFS is supported in Amazon EMR versions 5.10.0 and later. If you use an earlier version or have requirements beyond what IAM roles for EMRFS provide, you can create a custom credentials provider instead. For more information, see [Authorizing access to EMRFS data in Amazon S3](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-plan-credentialsprovider). 

When you use a security configuration to specify IAM roles for EMRFS, you set up role mappings. Each role mapping specifies an IAM role that corresponds to identifiers. These identifiers determine the basis for access to Amazon S3 through EMRFS. The identifiers can be users, groups, or Amazon S3 prefixes that indicate a data location. When EMRFS makes a request to Amazon S3, if the request matches the basis for access, EMRFS has cluster EC2 instances assume the corresponding IAM role for the request. The IAM permissions attached to that role apply instead of the IAM permissions attached to the service role for cluster EC2 instances.

The users and groups in a role mapping are Hadoop users and groups that are defined on the cluster. Users and groups are passed to EMRFS in the context of the application using it (for example, YARN user impersonation). The Amazon S3 prefix can be a bucket specifier of any depth (for example, `s3://amzn-s3-demo-bucket` or `s3://amzn-s3-demo-bucket/myproject/mydata`). You can specify multiple identifiers within a single role mapping, but they all must be of the same type.

**Important**  
IAM roles for EMRFS provide application-level isolation between users of the application. They do not provide host-level isolation between users on the host. Any user with access to the cluster can bypass the isolation to assume any of the roles.

When a cluster application makes a request to Amazon S3 through EMRFS, EMRFS evaluates role mappings in the top-down order that they appear in the security configuration. If a request made through EMRFS doesn't match any identifier, EMRFS falls back to using the service role for cluster EC2 instances. For this reason, we recommend that the policies attached to this role limit permissions to Amazon S3. For more information, see [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md).
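
The top-down evaluation described above can be sketched in a few lines of Python. This illustrates only the matching order and fallback behavior; the role and mapping values are made up, and the actual matching logic inside EMRFS is not a public API.

```python
def resolve_role(mappings, user, groups, s3_path, fallback_role):
    """Return the role of the first matching mapping, or the fallback role.

    mappings is an ordered list of dicts shaped like the RoleMappings
    entries in a security configuration.
    """
    for mapping in mappings:
        kind, identifiers = mapping["IdentifierType"], mapping["Identifiers"]
        if kind == "User" and user in identifiers:
            return mapping["Role"]
        if kind == "Group" and any(g in identifiers for g in groups):
            return mapping["Role"]
        if kind == "Prefix" and any(s3_path.startswith(p) for p in identifiers):
            return mapping["Role"]
    # No identifier matched: fall back to the service role for cluster
    # EC2 instances.
    return fallback_role

mappings = [
    {"IdentifierType": "User", "Identifiers": ["analyst1"], "Role": "AnalystRole"},
    {"IdentifierType": "Prefix", "Identifiers": ["s3://amzn-s3-demo-bucket/"], "Role": "BucketRole"},
]
# analyst1 matches the first mapping even though the prefix also matches.
print(resolve_role(mappings, "analyst1", [], "s3://amzn-s3-demo-bucket/x", "EC2Role"))  # AnalystRole
print(resolve_role(mappings, "other", [], "s3://elsewhere/x", "EC2Role"))  # EC2Role
```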

## Configure roles
<a name="emr-emrfs-iam-roles-role-configuration"></a>

Before you set up a security configuration with IAM roles for EMRFS, plan and create the roles and permission policies to attach to the roles. For more information, see [How do roles for EC2 instances work?](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html) in the *IAM User Guide*. When creating permissions policies, we recommend that you start with the managed policy attached to the default Amazon EMR role for EC2, and then edit this policy according to your requirements. The default role name is `EMR_EC2_DefaultRole`, and the default managed policy to edit is `AmazonElasticMapReduceforEC2Role`. For more information, see [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md).

### Updating trust policies to assume role permissions
<a name="emr-emrfs-iam-role-trust-policy"></a>

Each role that EMRFS uses must have a trust policy that allows the cluster's Amazon EMR role for EC2 to assume it. In turn, the cluster's Amazon EMR role for EC2 must have a permissions policy that allows it to assume the EMRFS roles.

The following example trust policy is attached to roles for EMRFS. The statement allows the default Amazon EMR role for EC2 to assume the role. For example, if you have two fictitious EMRFS roles, `EMRFSRole_First` and `EMRFSRole_Second`, this policy statement is added to the trust policies for each of them.

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowEMREC2RoleToAssumeRole",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/EMR_EC2_DefaultRole"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

------

In addition, the following example permissions policy statement is added to the `EMR_EC2_DefaultRole` to allow it to assume the two fictitious EMRFS roles.

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "sts:AssumeRole"
      ],
      "Resource": [
        "arn:aws:iam::123456789012:role/EMRFSRole_First",
        "arn:aws:iam::123456789012:role/EMRFSRole_Second"
      ],
      "Sid": "AllowSTSAssumerole"
    }
  ]
}
```

------

**To update the trust policy of an IAM role**

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Choose **Roles**, enter the name of the role in **Search**, and then select its **Role name**.

1. Choose **Trust relationships**, **Edit trust relationship**.

1. Edit the **Policy document** to add a trust statement according to the guidelines above, and then choose **Update trust policy**.

### Specifying a role as a key user
<a name="emr-emrfs-iam-role-key-user"></a>

If a role allows access to a location in Amazon S3 that is encrypted using an AWS KMS key, make sure that the role is specified as a key user. This gives the role permission to use the KMS key. For more information, see [Key policies in AWS KMS](https://docs.aws.amazon.com//kms/latest/developerguide/key-policies.html#key-policy-default-allow-users) in the *AWS Key Management Service Developer Guide*.
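
A key policy statement that designates a fictitious EMRFS role as a key user might look like the following sketch. The action list mirrors the key-user section of the default key policy and is illustrative; adjust it to your requirements.

```json
{
  "Sid": "AllowEMRFSRoleUseOfTheKey",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::123456789012:role/EMRFSRole_First" },
  "Action": [
    "kms:Encrypt",
    "kms:Decrypt",
    "kms:ReEncrypt*",
    "kms:GenerateDataKey*",
    "kms:DescribeKey"
  ],
  "Resource": "*"
}
```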

## Set up a security configuration with IAM roles for EMRFS
<a name="emr-emrfs-iam-roles-setup"></a>

**Important**  
If none of the IAM roles for EMRFS that you specify apply, EMRFS falls back to the Amazon EMR role for EC2. Consider customizing this role to restrict permissions to Amazon S3 as appropriate for your application and then specifying this custom role instead of `EMR_EC2_DefaultRole` when you create a cluster. For more information, see [Customize IAM roles with Amazon EMR](emr-iam-roles-custom.md) and [Specify custom IAM roles when you create a cluster](emr-iam-roles-custom.md#emr-iam-roles-launch-jobflow).

**To specify IAM roles for EMRFS requests to Amazon S3 using the console**

1. Create a security configuration that specifies role mappings:

   1. In the Amazon EMR console, select **Security configurations**, **Create**.

   1. Type a **Name** for the security configuration. You use this name to specify the security configuration when you create a cluster.

   1. Choose **Use IAM roles for EMRFS requests to Amazon S3**.

   1. Select an **IAM role** to apply, and under **Basis for access** select an identifier type (**Users**, **Groups**, or **S3 prefixes**) from the list and enter corresponding identifiers. If you use multiple identifiers, separate them with a comma and no space. For more information about each identifier type, see the [JSON configuration reference](#emrfs-seccfg-json) below.

   1. Choose **Add role** to set up additional role mappings as described in the previous step.

   1. Set up other security configuration options as appropriate and choose **Create**. For more information, see [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md).

1. Specify the security configuration you created above when you create a cluster. For more information, see [Specify a security configuration for an Amazon EMR cluster](emr-specify-security-configuration.md).

**To specify IAM roles for EMRFS requests to Amazon S3 using the AWS CLI**

1. Use the `aws emr create-security-configuration` command, specifying a name for the security configuration, and the security configuration details in JSON format.

   The example command shown below creates a security configuration with the name `EMRFS_Roles_Security_Configuration`. It is based on a JSON structure in the file `MyEmrFsSecConfig.json`, which is saved in the same directory where you run the command.

   ```
   aws emr create-security-configuration --name EMRFS_Roles_Security_Configuration --security-configuration file://MyEmrFsSecConfig.json
   ```

   Use the following guidelines for the structure of the `MyEmrFsSecConfig.json` file. You can specify this structure along with structures for other security configuration options. For more information, see [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md).

   The following is an example JSON snippet for specifying custom IAM roles for EMRFS within a security configuration. It demonstrates role mappings for the three different identifier types, followed by a parameter reference. 

   ```
   {
     "AuthorizationConfiguration": {
       "EmrFsConfiguration": {
         "RoleMappings": [{
           "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_user1",
           "IdentifierType": "User",
           "Identifiers": [ "user1" ]
         },{
           "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_to_demo_s3_buckets",
           "IdentifierType": "Prefix",
           "Identifiers": [ "s3://amzn-s3-demo-bucket1/","s3://amzn-s3-demo-bucket2/" ]
         },{
           "Role": "arn:aws:iam::123456789101:role/allow_EMRFS_access_for_AdminGroup",
           "IdentifierType": "Group",
           "Identifiers": [ "AdminGroup" ]
         }]
       }
     }
   }
   ```    
   In this structure, `Role` is the ARN of the IAM role that EMRFS assumes when a request matches. `IdentifierType` is `"User"`, `"Group"`, or `"Prefix"`, and `Identifiers` is a list of identifiers of that type: Hadoop users, Hadoop groups, or Amazon S3 prefixes that indicate a data location.

1. Use the `aws emr create-cluster` command to create a cluster and specify the security configuration you created in the previous step. 

   The following example creates a cluster with default core Hadoop applications installed. The cluster uses the security configuration created above as `EMRFS_Roles_Security_Configuration` and also uses a custom Amazon EMR role for EC2, `EC2_Role_EMR_Restrict_S3`, which is specified using the `InstanceProfile` argument of the `--ec2-attributes` parameter.
**Note**  
Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

   ```
   aws emr create-cluster --name MyEmrFsS3RolesCluster \
   --release-label emr-7.12.0 --ec2-attributes InstanceProfile=EC2_Role_EMR_Restrict_S3,KeyName=MyKey \
   --instance-type m5.xlarge --instance-count 3 \
   --security-configuration EMRFS_Roles_Security_Configuration
   ```

# Use resource-based policies for Amazon EMR access to AWS Glue Data Catalog
<a name="emr-iam-roles-glue"></a>

If you use AWS Glue in conjunction with Hive, Spark, or Presto in Amazon EMR, AWS Glue supports resource-based policies to control access to Data Catalog resources. These resources include databases, tables, connections, and user-defined functions. For more information, see [AWS Glue Resource Policies](https://docs.aws.amazon.com/glue/latest/dg/glue-resource-policies.html) in the *AWS Glue Developer Guide*.

When using resource-based policies to limit access to AWS Glue from within Amazon EMR, the principal that you specify in the permissions policy must be the role ARN associated with the EC2 instance profile that is specified when a cluster is created. For example, for a resource-based policy attached to a catalog, you can specify the role ARN for the default service role for cluster EC2 instances, *EMR\_EC2\_DefaultRole*, as the `Principal`, using the format shown in the following example:

```
arn:aws:iam::acct-id:role/EMR_EC2_DefaultRole
```

The *acct-id* can be different from the AWS Glue account ID. This enables access from EMR clusters in different accounts. You can specify multiple principals, each from a different account.
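
For example, a Data Catalog resource policy that grants the role of the EC2 instance profile read access might look like the following sketch. The account IDs, Region, and action list are illustrative assumptions; note that the principal's account (the EMR cluster's account) can differ from the catalog's account.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111122223333:role/EMR_EC2_DefaultRole" },
      "Action": [ "glue:GetDatabase", "glue:GetTable", "glue:GetPartitions" ],
      "Resource": [
        "arn:aws:glue:us-west-2:123456789012:catalog",
        "arn:aws:glue:us-west-2:123456789012:database/default",
        "arn:aws:glue:us-west-2:123456789012:table/default/*"
      ]
    }
  ]
}
```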

# Use IAM roles with applications that call AWS services directly
<a name="emr-iam-roles-calling"></a>

Applications running on the EC2 instances of a cluster can use the EC2 instance profile to obtain temporary security credentials when calling AWS services.

The versions of Hadoop available with Amazon EMR release 2.3.0 and later have already been updated to make use of IAM roles. If your application runs strictly on top of the Hadoop architecture, and does not directly call any service in AWS, it should work with IAM roles with no modification.

If your application calls services in AWS directly, you need to update it to take advantage of IAM roles. This means that instead of obtaining account credentials from `/etc/hadoop/conf/core-site.xml` on the EC2 instances in the cluster, your application uses an SDK to access the resources using IAM roles, or calls the EC2 instance metadata to obtain the temporary credentials.

**To access AWS resources with IAM roles using an SDK**
+ The following topics show how to use several of the AWS SDKs to access temporary credentials using IAM roles. Each topic starts with a version of an application that does not use IAM roles and then walks you through the process of converting that application to use IAM roles. 
  +  [Using IAM roles for Amazon EC2 instances with the SDK for Java](https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/java-dg-roles.html) in the *AWS SDK for Java Developer Guide* 
  +  [Using IAM roles for Amazon EC2 instances with the SDK for .NET](https://docs.aws.amazon.com/sdk-for-net/v4/developer-guide/net-dg-hosm.html) in the *AWS SDK for .NET Developer Guide* 
  +  [Using IAM roles for Amazon EC2 instances with the SDK for PHP](https://docs.aws.amazon.com/sdk-for-php/v3/developer-guide/guide_credentials_assume_role.html) in the *AWS SDK for PHP Developer Guide* 
  +  [Using IAM roles for Amazon EC2 instances with the SDK for Ruby](https://docs.aws.amazon.com/sdk-for-ruby/v3/developer-guide/credential-providers.html) in the *AWS SDK for Ruby Developer Guide* 

**To obtain temporary credentials from EC2 instance metadata**
+ Call the following URL from an EC2 instance that is running with the specified IAM role, which returns the associated temporary security credentials (`AccessKeyId`, `SecretAccessKey`, the session token in the `Token` field, and `Expiration`). The following example uses the default instance profile for Amazon EMR, `EMR_EC2_DefaultRole`. 

  ```
  GET http://169.254.169.254/latest/meta-data/iam/security-credentials/EMR_EC2_DefaultRole
  ```
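
The response body is a JSON document. The following sketch parses a sample payload with placeholder values in place of a live metadata call; the field names mirror the metadata response.

```python
import json
from datetime import datetime, timezone

# Sample payload with placeholder values, shaped like the metadata response.
sample = """{
  "Code": "Success",
  "Type": "AWS-HMAC",
  "AccessKeyId": "ASIAEXAMPLE",
  "SecretAccessKey": "secret-placeholder",
  "Token": "session-token-placeholder",
  "Expiration": "2030-01-01T00:00:00Z"
}"""

creds = json.loads(sample)
expiration = datetime.strptime(
    creds["Expiration"], "%Y-%m-%dT%H:%M:%SZ"
).replace(tzinfo=timezone.utc)
# Refresh the credentials from the metadata endpoint before they expire.
print(creds["AccessKeyId"], "expires:", expiration.isoformat())
```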

For more information about writing applications that use IAM roles, see [Granting applications that run on Amazon EC2 instances access to AWS resources](https://docs.aws.amazon.com/IAM/latest/UserGuide/role-usecase-ec2app.html).

For more information about temporary security credentials, see [Using temporary security credentials](https://docs.aws.amazon.com/STS/latest/UsingSTS/using-temp-creds.html) in the *Using Temporary Security Credentials* guide. 

# Allow users and groups to create and modify roles
<a name="emr-iam-roles-create-permissions"></a>

IAM principals (users and groups) who create, modify, and specify roles for a cluster, including default roles, must be allowed to perform the following actions. For details about each action, see [Actions](https://docs.aws.amazon.com/IAM/latest/APIReference/API_Operations.html) in the *IAM API Reference*.
+ `iam:CreateRole`
+ `iam:PutRolePolicy`
+ `iam:CreateInstanceProfile`
+ `iam:AddRoleToInstanceProfile`
+ `iam:ListRoles`
+ `iam:GetPolicy`
+ `iam:GetInstanceProfile`
+ `iam:GetPolicyVersion`
+ `iam:AttachRolePolicy`
+ `iam:PassRole`

The `iam:PassRole` permission allows cluster creation. The remaining permissions allow the creation of the default roles.
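
Collected into a single statement, these permissions might be granted with a policy like the following sketch. Scope the `Resource` element down to specific role and instance profile ARNs where possible, rather than `*` as shown here.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:CreateRole",
        "iam:PutRolePolicy",
        "iam:CreateInstanceProfile",
        "iam:AddRoleToInstanceProfile",
        "iam:ListRoles",
        "iam:GetPolicy",
        "iam:GetInstanceProfile",
        "iam:GetPolicyVersion",
        "iam:AttachRolePolicy",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
```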

For information about assigning permissions to a user, see [Changing permissions for a user](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_change-permissions.html) in the *IAM User Guide*.

# Amazon EMR identity-based policy examples
<a name="security_iam_id-based-policy-examples"></a>

By default, users and roles don't have permission to create or modify Amazon EMR resources. They also can't perform tasks using the AWS Management Console, AWS CLI, or AWS API. An IAM administrator must create IAM policies that grant users and roles permission to perform specific API operations on the resources they need. The administrator must then attach those policies to the users or groups that require those permissions.

To learn how to create an IAM identity-based policy using these example JSON policy documents, see [Creating policies on the JSON tab](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html#access_policies_create-json-editor) in the *IAM User Guide*.

**Topics**
+ [Policy best practices for Amazon EMR](security_iam_service-with-iam-policy-best-practices.md)
+ [Allow users to view their own permissions](security_iam_id-based-policy-examples-view-own-permissions.md)
+ [Amazon EMR managed policies](emr-managed-iam-policies.md)
+ [IAM policies for tag-based access to clusters and EMR notebooks](emr-fine-grained-cluster-access.md)
+ [Denying the ModifyInstanceGroup action in Amazon EMR](emr-cluster-deny-modifyinstancegroup.md)
+ [Troubleshooting Amazon EMR identity and access](security_iam_troubleshoot.md)

# Policy best practices for Amazon EMR
<a name="security_iam_service-with-iam-policy-best-practices"></a>

Identity-based policies are very powerful. They determine whether someone can create, access, or delete Amazon EMR resources in your account. These actions can incur costs for your AWS account. When you create or edit identity-based policies, follow these guidelines and recommendations:
+ **Get Started Using AWS Managed Policies** – To start using Amazon EMR quickly, use AWS managed policies to give your employees the permissions they need. These policies are already available in your account and are maintained and updated by AWS. For more information, see [Get started using permissions with AWS managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#bp-use-aws-defined-policies) in the *IAM User Guide* and [Amazon EMR managed policies](emr-managed-iam-policies.md).
+ **Grant Least Privilege** – When you create custom policies, grant only the permissions required to perform a task. Start with a minimum set of permissions and grant additional permissions as necessary. Doing so is more secure than starting with permissions that are too lenient and then trying to tighten them later. For more information, see [Grant least privilege](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html#grant-least-privilege) in the *IAM User Guide*.
+ **Enable MFA for Sensitive Operations** – For extra security, require users to use multi-factor authentication (MFA) to access sensitive resources or API operations. For more information, see [Using multi-factor authentication (MFA) in AWS](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_mfa.html) in the *IAM User Guide*.
+ **Use Policy Conditions for Extra Security** – To the extent that it's practical, define the conditions under which your identity-based policies allow access to a resource. For example, you can write conditions to specify a range of allowable IP addresses that a request must come from. You can also write conditions to allow requests only within a specified date or time range, or to require the use of SSL or MFA. For more information, see [IAM JSON policy elements: Condition](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_condition.html) in the *IAM User Guide*.
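For example, the following illustrative `Condition` block (shown here outside of a complete policy statement, with a hypothetical corporate IP range) allows requests only from that range and only when MFA is present:

```
"Condition": {
  "IpAddress": {
    "aws:SourceIp": "203.0.113.0/24"
  },
  "Bool": {
    "aws:MultiFactorAuthPresent": "true"
  }
}
```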

# Allow users to view their own permissions
<a name="security_iam_id-based-policy-examples-view-own-permissions"></a>

This example shows how you might create a policy that allows users to view the inline and managed policies that are attached to their user identity. This policy includes permissions to complete this action on the console or programmatically using the AWS CLI or AWS API.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ViewOwnUserInfo",
      "Effect": "Allow",
      "Action": [
        "iam:GetUser",
        "iam:GetUserPolicy",
        "iam:ListAttachedUserPolicies",
        "iam:ListGroupsForUser",
        "iam:ListUserPolicies"
      ],
      "Resource": [
        "arn:aws:iam::*:user/${aws:username}"
      ]
    },
    {
      "Sid": "NavigateInConsole",
      "Effect": "Allow",
      "Action": [
        "iam:GetGroupPolicy",
        "iam:GetPolicy",
        "iam:GetPolicyVersion",
        "iam:ListAttachedGroupPolicies",
        "iam:ListGroupPolicies",
        "iam:ListGroups",
        "iam:ListPolicies",
        "iam:ListPolicyVersions",
        "iam:ListUsers"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

------

# Amazon EMR managed policies
<a name="emr-managed-iam-policies"></a>

The easiest way to grant full access or read-only access to required Amazon EMR actions is to use the IAM managed policies for Amazon EMR. Managed policies offer the benefit of updating automatically when permission requirements change. If you use inline policies instead, service changes can cause permission errors to appear. 

Amazon EMR will be deprecating existing managed policies (v1 policies) in favor of new managed policies (v2 policies). The new managed policies have been scoped-down to align with AWS best practices. After the existing v1 managed policies are deprecated, you will not be able to attach these policies to any new IAM roles or users. Existing roles and users that use deprecated policies can continue to use them. The v2 managed policies restrict access using tags. They allow only specified Amazon EMR actions and require cluster resources that are tagged with an EMR-specific key. We recommend that you carefully review the documentation before using the new v2 policies.

The v1 policies will be marked deprecated with a warning icon next to them in the **Policies** list in the IAM console. The deprecated policies will have the following characteristics:
+ They will continue to work for all currently attached users, groups, and roles. Nothing breaks.
+ They cannot be attached to new users, groups, or roles. If you detach one of the policies from a current entity, you cannot reattach it.
+ After you detach a v1 policy from all current entities, the policy will no longer be visible and can no longer be used.

The following table summarizes the changes between current policies (v1) and v2 policies.


**Amazon EMR managed policy changes**  

| Policy type | Policy names | Policy purpose | Changes to v2 policy | 
| --- | --- | --- | --- | 
|  Default EMR service role and attached managed policy  |   Role name: **EMR\_DefaultRole** V1 policy (to be deprecated): **AmazonElasticMapReduceRole** (EMR Service Role)  V2 (scoped-down) policy name: [`AmazonEMRServicePolicy_v2`](emr-iam-role.md)  |  Allows Amazon EMR to call other AWS services on your behalf when provisioning resources and performing service-level actions. This role is required for all clusters.  |  The v2 service role and v2 default policy replace the deprecated role and policy. The policy adds the new permission `"ec2:DescribeInstanceTypeOfferings"`, an API operation that returns the instance types supported in the given Availability Zones. It also adds a prerequisite that users must add user tags to resources before they can use this policy. See [Tagging resources to use managed policies](#manually-tagged-resources) and [Service role for Amazon EMR (EMR role)](emr-iam-role.md).  | 
|  IAM managed policy for full Amazon EMR access by attached user, role, or group  |   V1 policy (to be deprecated): [`AmazonElasticMapReduceFullAccess`](emr-managed-policy-fullaccess.md)  V2 (scoped) policy name: [`AmazonEMRFullAccessPolicy_v2`](emr-managed-policy-fullaccess-v2.md)  |  Allows users full permissions for Amazon EMR actions. Includes `iam:PassRole` permissions for resources.  |  Policy adds a prerequisite that users must add user tags to resources before they can use this policy. See [Tagging resources to use managed policies](#manually-tagged-resources). The `iam:PassRole` action requires the `iam:PassedToService` condition set to specified services. Access to Amazon EC2, Amazon S3, and other services is not allowed by default. See [IAM managed policy for full access (v2 managed default policy)](emr-managed-policy-fullaccess-v2.md).  | 
|  IAM managed policy for read-only access by attached user, role, or group  |  V1 policy (to be deprecated): [`AmazonElasticMapReduceReadOnlyAccess`](emr-managed-policy-readonly.md)  V2 (scoped) policy name: [`AmazonEMRReadOnlyAccessPolicy_v2`](emr-managed-policy-readonly-v2.md)  |  Allows users read-only permissions for Amazon EMR actions.  |  Permissions allow only specified `elasticmapreduce` read-only actions. Access to Amazon S3 is not allowed by default. See [IAM managed policy for read-only access (v2 managed default policy)](emr-managed-policy-readonly-v2.md).  | 
|  Service role for cluster EC2 instances (EC2 instance profile)  |  Role name: **EMR\_EC2\_DefaultRole** Deprecated policy name: **AmazonElasticMapReduceforEC2Role**  |  Allows applications running on an EMR cluster to access other AWS resources, such as Amazon S3. For example, if you run Apache Spark jobs that process data from Amazon S3, the policy needs to allow access to such resources.  |  Both the default role and default policy are on the path to deprecation. There is no replacement AWS default managed role or policy. You need to provide a resource-based or identity-based policy. This means that, by default, applications running on an EMR cluster do not have access to Amazon S3 or other resources unless you manually add the permissions to the policy. See [Default role and managed policy](emr-iam-role-for-ec2.md#emr-ec2-role-default).  | 
|  Other EC2 service role policies  |  Current policy names: **AmazonElasticMapReduceforAutoScalingRole, AmazonElasticMapReduceEditorsRole, AmazonEMRCleanupPolicy**  |  Provides permissions that Amazon EMR needs to access other AWS resources and perform actions if using auto scaling, notebooks, or to clean up EC2 resources.  |  No changes for v2.  | 

## Securing iam:PassRole
<a name="securing-iampassrole"></a>

The Amazon EMR full-permissions default managed policies incorporate `iam:PassRole` security configurations, including the following:
+ `iam:PassRole` permissions only for specific default Amazon EMR roles.
+ `iam:PassedToService` conditions that allow you to use the policy with only specified AWS services, such as `elasticmapreduce.amazonaws.com` and `ec2.amazonaws.com`.
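Together, these controls take a form like the following condensed statement, patterned after the policy JSON shown later in this section:

```
{
  "Sid": "PassRoleForEC2",
  "Effect": "Allow",
  "Action": "iam:PassRole",
  "Resource": "arn:aws:iam::*:role/EMR_EC2_DefaultRole",
  "Condition": {
    "StringLike": {
      "iam:PassedToService": "ec2.amazonaws.com*"
    }
  }
}
```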

You can view the JSON version of the [AmazonEMRFullAccessPolicy\_v2](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AmazonEMRFullAccessPolicy_v2) and [AmazonEMRServicePolicy\_v2](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AmazonEMRServicePolicy_v2) policies in the IAM console. We recommend that you create new clusters with the v2 managed policies.

To create custom policies, we recommend that you begin with the managed policies and edit them according to your requirements.

For information about how to attach policies to users (principals), see [Working with managed policies using the AWS Management Console](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-using.html#policies_using-managed-console) in the *IAM User Guide*.

## Tagging resources to use managed policies
<a name="manually-tagged-resources"></a>

**AmazonEMRServicePolicy\_v2** and **AmazonEMRFullAccessPolicy\_v2** depend on scoped-down access to resources that Amazon EMR provisions or uses. The scope down is achieved by restricting access to only those resources that have a predefined user tag associated with them. When you use either of these two policies, you must pass the predefined user tag `for-use-with-amazon-emr-managed-policies = true` when you provision the cluster. Amazon EMR will then automatically propagate that tag. Additionally, you must add a user tag to the resources listed in the following section. If you use the Amazon EMR console to launch your cluster, see [Considerations for using the Amazon EMR console to launch clusters with v2 managed policies](#emr-cluster-v2policy-awsconsole-launch).

To use managed policies, pass the user tag `for-use-with-amazon-emr-managed-policies = true` when you provision a cluster with the CLI, SDK, or another method.

When you pass the tag, Amazon EMR propagates the tag to the private subnet ENIs, EC2 instances, and EBS volumes that it creates. Amazon EMR also automatically tags security groups that it creates. However, if you want Amazon EMR to launch with a certain security group, you must tag it. For resources that are not created by Amazon EMR, you must add tags to those resources. For example, you must tag Amazon EC2 subnets, EC2 security groups (if not created by Amazon EMR), and VPCs (if you want Amazon EMR to create security groups). To launch clusters with v2 managed policies in VPCs, you must tag those VPCs with the predefined user tag. See [Considerations for using the Amazon EMR console to launch clusters with v2 managed policies](#emr-cluster-v2policy-awsconsole-launch).
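For example, when you provision a cluster through the `RunJobFlow` API, the request's `Tags` parameter would include the predefined tag, as in the following fragment (any other tags you pass alongside it are up to you):

```
"Tags": [
  {
    "Key": "for-use-with-amazon-emr-managed-policies",
    "Value": "true"
  }
]
```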

**Propagated user-specified tagging**  
Amazon EMR tags resources that it creates using the Amazon EMR tags that you specify when creating a cluster. Amazon EMR applies tags to the resources it creates during the lifetime of the cluster.

Amazon EMR propagates user tags for the following resources:
+ Private Subnet ENI (service access elastic network interfaces)
+ EC2 Instances
+ EBS Volumes
+ EC2 Launch Template

**Automatically-tagged security groups**  
Amazon EMR tags EC2 security groups that it creates with the tag that is required for v2 managed policies for Amazon EMR, `for-use-with-amazon-emr-managed-policies`, regardless of which tags you specify in the create cluster command. For a security group that was created before the introduction of v2 managed policies, Amazon EMR does not automatically tag the security group. If you want to use v2 managed policies with the default security groups that already exist in the account, you need to tag the security groups manually with `for-use-with-amazon-emr-managed-policies = true`.

**Manually-tagged cluster resources**  
You must manually tag some cluster resources so that they can be accessed by Amazon EMR default roles.
+ You must manually tag EC2 security groups and EC2 subnets with the Amazon EMR managed policy tag `for-use-with-amazon-emr-managed-policies`.
+ You must manually tag a VPC if you want Amazon EMR to create default security groups. EMR will try to create a security group with the specific tag if the default security group doesn't already exist.
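This tag-based scope-down works because the v2 policies condition access on the resource's tag, along the lines of the following simplified, illustrative fragment. See the actual `AmazonEMRServicePolicy_v2` JSON in the IAM console for the exact statements.

```
"Condition": {
  "StringEquals": {
    "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
  }
}
```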

Amazon EMR automatically tags the following resources:
+ EMR-created EC2 Security Groups

You must manually tag the following resources:
+ EC2 Subnet
+ EC2 Security Groups

Optionally, you can manually tag the following resources:
+ VPC - only when you want Amazon EMR to create security groups

## Considerations for using the Amazon EMR console to launch clusters with v2 managed policies
<a name="emr-cluster-v2policy-awsconsole-launch"></a>

You can provision clusters with v2 managed policies using the Amazon EMR console. Here are some considerations when you use the console to launch Amazon EMR clusters.
+ You do not need to pass the predefined tag. Amazon EMR automatically adds the tag and propagates it to the appropriate components.
+ For components that need to be manually tagged, the old Amazon EMR console tries to tag them automatically if you have the required permissions to tag resources. If you don't have the permissions to tag resources, or if you don't use the console, ask your administrator to tag those resources. 
+ You cannot launch clusters with v2 managed policies unless all the prerequisites are met.
+ The old Amazon EMR console shows you which resources (VPC/Subnets) need to be tagged.

# IAM managed policy for full access (v2 managed default policy) for Amazon EMR
<a name="emr-managed-policy-fullaccess-v2"></a>

The v2 scoped Amazon EMR default managed policies grant specific access privileges to users. They require a predefined Amazon EMR resource tag on the resources that Amazon EMR uses, such as the subnet and security groups you use to launch your cluster, and they apply `iam:PassRole` condition keys.

To grant required actions scoped for Amazon EMR, attach the `AmazonEMRFullAccessPolicy_v2` managed policy. This updated default managed policy replaces the [`AmazonElasticMapReduceFullAccess`](emr-managed-policy-fullaccess.md) managed policy.

`AmazonEMRFullAccessPolicy_v2` depends on scoped-down access to resources that Amazon EMR provisions or uses. When you use this policy, you need to pass the user tag `for-use-with-amazon-emr-managed-policies = true` when provisioning the cluster. Amazon EMR will automatically propagate the tag. Additionally, you may need to manually add a user tag to specific types of resources, such as EC2 security groups that were not created by Amazon EMR. For more information, see [Tagging resources to use managed policies](emr-managed-iam-policies.md#manually-tagged-resources).

The [https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEMRFullAccessPolicy_v2](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEMRFullAccessPolicy_v2) policy secures resources by doing the following:
+ Requires resources to be tagged with the pre-defined Amazon EMR managed policies tag `for-use-with-amazon-emr-managed-policies` for cluster creation and Amazon EMR access.
+ Restricts the `iam:PassRole` action to specific default roles and `iam:PassedToService` access to specific services.
+ No longer provides access to Amazon EC2, Amazon S3, and other services by default.

Following are the contents of this policy.

**Note**  
You can also use the console link [https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEMRFullAccessPolicy_v2](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEMRFullAccessPolicy_v2) to view the policy.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RunJobFlowExplicitlyWithEMRManagedTag",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:RunJobFlow"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/for-use-with-amazon-emr-managed-policies": "true"
        }
      }
    },
    {
      "Sid": "ElasticMapReduceActions",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:AddInstanceFleet",
        "elasticmapreduce:AddInstanceGroups",
        "elasticmapreduce:AddJobFlowSteps",
        "elasticmapreduce:AddTags",
        "elasticmapreduce:CancelSteps",
        "elasticmapreduce:CreateEditor",
        "elasticmapreduce:CreatePersistentAppUI",
        "elasticmapreduce:CreateSecurityConfiguration",
        "elasticmapreduce:DeleteEditor",
        "elasticmapreduce:DeleteSecurityConfiguration",
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:DescribeEditor",
        "elasticmapreduce:DescribeJobFlows",
        "elasticmapreduce:DescribePersistentAppUI",
        "elasticmapreduce:DescribeSecurityConfiguration",
        "elasticmapreduce:DescribeStep",
        "elasticmapreduce:DescribeReleaseLabel",
        "elasticmapreduce:GetBlockPublicAccessConfiguration",
        "elasticmapreduce:GetManagedScalingPolicy",
        "elasticmapreduce:GetPersistentAppUIPresignedURL",
        "elasticmapreduce:GetAutoTerminationPolicy",
        "elasticmapreduce:ListBootstrapActions",
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:ListEditors",
        "elasticmapreduce:ListInstanceFleets",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:ListSecurityConfigurations",
        "elasticmapreduce:ListSteps",
        "elasticmapreduce:ListSupportedInstanceTypes",
        "elasticmapreduce:ModifyCluster",
        "elasticmapreduce:ModifyInstanceFleet",
        "elasticmapreduce:ModifyInstanceGroups",
        "elasticmapreduce:OpenEditorInConsole",
        "elasticmapreduce:PutAutoScalingPolicy",
        "elasticmapreduce:PutBlockPublicAccessConfiguration",
        "elasticmapreduce:PutManagedScalingPolicy",
        "elasticmapreduce:RemoveAutoScalingPolicy",
        "elasticmapreduce:RemoveManagedScalingPolicy",
        "elasticmapreduce:RemoveTags",
        "elasticmapreduce:SetTerminationProtection",
        "elasticmapreduce:StartEditor",
        "elasticmapreduce:StopEditor",
        "elasticmapreduce:TerminateJobFlows",
        "elasticmapreduce:ViewEventsFromAllClustersInConsole"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "ViewMetricsInEMRConsole",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "PassRoleForElasticMapReduce",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/EMR_DefaultRole",
        "arn:aws:iam::*:role/EMR_DefaultRole_V2"
      ],
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "elasticmapreduce.amazonaws.com*"
        }
      }
    },
    {
      "Sid": "PassRoleForEC2",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/EMR_EC2_DefaultRole"
      ],
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "ec2.amazonaws.com*"
        }
      }
    },
    {
      "Sid": "PassRoleForAutoScaling",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/EMR_AutoScaling_DefaultRole"
      ],
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "application-autoscaling.amazonaws.com*"
        }
      }
    },
    {
      "Sid": "ElasticMapReduceServiceLinkedRole",
      "Effect": "Allow",
      "Action": [
        "iam:CreateServiceLinkedRole"
      ],
      "Resource": [
        "arn:aws:iam::*:role/aws-service-role/elasticmapreduce.amazonaws.com*/AWSServiceRoleForEMRCleanup*"
      ],
      "Condition": {
        "StringEquals": {
          "iam:AWSServiceName": [
            "elasticmapreduce.amazonaws.com",
            "elasticmapreduce.amazonaws.com.rproxy.govskope.us.cn"
          ]
        }
      }
    },
    {
      "Sid": "ConsoleUIActions",
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeAccountAttributes",
        "ec2:DescribeAvailabilityZones",
        "ec2:DescribeImages",
        "ec2:DescribeKeyPairs",
        "ec2:DescribeNatGateways",
        "ec2:DescribeRouteTables",
        "ec2:DescribeSecurityGroups",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs",
        "ec2:DescribeVpcEndpoints",
        "s3:ListAllMyBuckets",
        "iam:ListRoles"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

------

# IAM managed policy for full access (on path to deprecation)
<a name="emr-managed-policy-fullaccess"></a>

The `AmazonElasticMapReduceFullAccess` and `AmazonEMRFullAccessPolicy_v2` AWS Identity and Access Management (IAM) managed policies grant all the required actions for Amazon EMR and other services.

**Important**  
The `AmazonElasticMapReduceFullAccess` managed policy is on the path to deprecation, and no longer recommended for use with Amazon EMR. Instead, use [`AmazonEMRFullAccessPolicy_v2`](emr-managed-policy-fullaccess-v2.md). When the IAM service eventually deprecates the v1 policy, you won't be able to attach it to a role. However, you can attach an existing role to a cluster even if that role uses the deprecated policy.

The Amazon EMR full-permissions default managed policies incorporate `iam:PassRole` security configurations, including the following:
+ `iam:PassRole` permissions only for specific default Amazon EMR roles.
+ `iam:PassedToService` conditions that allow you to use the policy with only specified AWS services, such as `elasticmapreduce.amazonaws.com` and `ec2.amazonaws.com`.

You can view the JSON version of the [AmazonEMRFullAccessPolicy\_v2](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AmazonEMRFullAccessPolicy_v2) and [AmazonEMRServicePolicy\_v2](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/service-role/AmazonEMRServicePolicy_v2) policies in the IAM console. We recommend that you create new clusters with the v2 managed policies.

You can view the contents of the deprecated v1 policy in the AWS Management Console at [https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonElasticMapReduceFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonElasticMapReduceFullAccess). The `ec2:TerminateInstances` action in the policy grants a user or role permission to terminate any of the Amazon EC2 instances in the account, including instances that are not part of an EMR cluster.
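If the breadth of `ec2:TerminateInstances` is a concern while you still use the v1 policy, one possible mitigation (illustrative only; the tag key shown is the one that Amazon EMR applies when v2 managed-policy tagging is in use) is to attach an additional statement that explicitly denies termination of instances that don't carry the EMR tag:

```
{
  "Sid": "DenyTerminateNonEmrInstances",
  "Effect": "Deny",
  "Action": "ec2:TerminateInstances",
  "Resource": "*",
  "Condition": {
    "StringNotEquals": {
      "aws:ResourceTag/for-use-with-amazon-emr-managed-policies": "true"
    }
  }
}
```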

# IAM managed policy for read-only access (v2 managed default policy) for Amazon EMR
<a name="emr-managed-policy-readonly-v2"></a>

To grant read-only privileges to Amazon EMR, attach the `AmazonEMRReadOnlyAccessPolicy_v2` managed policy. This default managed policy replaces the [`AmazonElasticMapReduceReadOnlyAccess`](emr-managed-policy-readonly.md) managed policy. The content of this policy statement is shown in the following snippet. Compared with the `AmazonElasticMapReduceReadOnlyAccess` policy, the `AmazonEMRReadOnlyAccessPolicy_v2` policy does not use wildcard characters for the `elasticmapreduce` element. Instead, the default v2 policy scopes the allowable `elasticmapreduce` actions.

**Note**  
You can also use the AWS Management Console link [https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEMRReadOnlyAccessPolicy_v2](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AmazonEMRReadOnlyAccessPolicy_v2) to view the policy.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ElasticMapReduceActions",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:DescribeEditor",
        "elasticmapreduce:DescribeJobFlows",
        "elasticmapreduce:DescribeSecurityConfiguration",
        "elasticmapreduce:DescribeStep",
        "elasticmapreduce:DescribeReleaseLabel",
        "elasticmapreduce:GetBlockPublicAccessConfiguration",
        "elasticmapreduce:GetManagedScalingPolicy",
        "elasticmapreduce:GetAutoTerminationPolicy",
        "elasticmapreduce:ListBootstrapActions",
        "elasticmapreduce:ListClusters",
        "elasticmapreduce:ListEditors",
        "elasticmapreduce:ListInstanceFleets",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:ListSecurityConfigurations",
        "elasticmapreduce:ListSteps",
        "elasticmapreduce:ListSupportedInstanceTypes",
        "elasticmapreduce:ViewEventsFromAllClustersInConsole"
      ],
      "Resource": [
        "*"
      ]
    },
    {
      "Sid": "ViewMetricsInEMRConsole",
      "Effect": "Allow",
      "Action": [
        "cloudwatch:GetMetricStatistics"
      ],
      "Resource": [
        "*"
      ]
    }
  ]
}
```

------

# IAM managed policy for read-only access (on path to deprecation)
<a name="emr-managed-policy-readonly"></a>

The `AmazonElasticMapReduceReadOnlyAccess` managed policy is on the path to deprecation. You cannot attach this policy when launching new clusters. `AmazonElasticMapReduceReadOnlyAccess` has been replaced with [`AmazonEMRReadOnlyAccessPolicy_v2`](emr-managed-policy-readonly-v2.md) as the Amazon EMR default managed policy. The content of this policy statement is shown in the following snippet. Wildcard characters for the `elasticmapreduce` element specify that only actions that begin with the specified strings are allowed. Keep in mind that because this policy does not explicitly deny actions, a different policy statement may still be used to grant access to specified actions.

**Note**  
You can also use the AWS Management Console to view the policy.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:Describe*",
        "elasticmapreduce:List*",
        "elasticmapreduce:ViewEventsFromAllClustersInConsole",
        "s3:GetObject",
        "s3:ListAllMyBuckets",
        "s3:ListBucket",
        "sdb:Select",
        "cloudwatch:GetMetricStatistics"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowELASTICMAPREDUCEDescribe"
    }
  ]
}
```

------

# AWS managed policy: EMRDescribeClusterPolicyForEMRWAL
<a name="EMRDescribeClusterPolicyForEMRWAL"></a>

You can't attach `EMRDescribeClusterPolicyForEMRWAL` to your IAM entities. This policy is attached to a service-linked role that allows Amazon EMR to perform actions on your behalf. For more information on this service-linked role, see [Using service-linked roles with Amazon EMR for write-ahead logging](using-service-linked-roles-wal.md). 

This policy grants read-only permissions that allow the WAL service for Amazon EMR to find and return the status of a cluster. For more information about Amazon EMR WAL, see [Write-ahead logs (WAL) for Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-wal.html).

**Permissions details**

This policy includes the following permissions:
+ `emr` – Allows principals to describe cluster status from Amazon EMR. This is required so that Amazon EMR can confirm when a cluster has terminated and then, after thirty days, clean up any WAL logs left behind by the cluster.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:DescribeCluster"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowELASTICMAPREDUCEDescribecluster"
    }
  ]
}
```

------

## AWS managed policies for Amazon EMR
<a name="security-iam-awsmanpol"></a>

An AWS managed policy is a standalone policy that is created and administered by AWS. AWS managed policies are designed to provide permissions for many common use cases so that you can start assigning permissions to users, groups, and roles.

Keep in mind that AWS managed policies might not grant least-privilege permissions for your specific use cases because they're available for all AWS customers to use. We recommend that you reduce permissions further by defining [customer managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html#customer-managed-policies) that are specific to your use cases.

You cannot change the permissions defined in AWS managed policies. If AWS updates the permissions defined in an AWS managed policy, the update affects all principal identities (users, groups, and roles) that the policy is attached to. AWS is most likely to update an AWS managed policy when a new AWS service is launched or new API operations become available for existing services.

For more information, see [AWS managed policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_managed-vs-inline.html#aws-managed-policies) in the *IAM User Guide*.

# Amazon EMR updates to AWS managed policies
<a name="security-iam-awsmanpol-updates"></a>

View details about updates to AWS managed policies for Amazon EMR since this service began tracking these changes. 




| Change | Description | Date | 
| --- | --- | --- | 
| [`AmazonEMRServicePolicy_v2`](emr-iam-role.md) – Update to an existing policy | Added ec2:CreateVpcEndpoint, ec2:ModifyVpcEndpoint, and ec2:CreateTags required for optimal experience, starting with Amazon EMR release 7.5.0. | March 4, 2025 | 
| [`AmazonEMRServicePolicy_v2`](emr-iam-role.md) – Update to an existing policy | Added elasticmapreduce:CreatePersistentAppUI, elasticmapreduce:DescribePersistentAppUI, and elasticmapreduce:GetPersistentAppUIPresignedURL. | February 28, 2025 | 
| [`EMRDescribeClusterPolicyForEMRWAL`](EMRDescribeClusterPolicyForEMRWAL.md) – New policy | Added a new policy so that Amazon EMR can determine cluster status for WAL cleanup thirty days after cluster termination. | August 10, 2023 | 
| [`AmazonEMRFullAccessPolicy_v2`](emr-managed-policy-fullaccess-v2.md) and [`AmazonEMRReadOnlyAccessPolicy_v2`](emr-managed-policy-readonly-v2.md) – Update to an existing policy | Added elasticmapreduce:DescribeReleaseLabel and elasticmapreduce:GetAutoTerminationPolicy. | April 21, 2022 | 
| [`AmazonEMRFullAccessPolicy_v2`](emr-managed-policy-fullaccess-v2.md) – Update to an existing policy | Added ec2:DescribeImages for [Using a custom AMI to provide more flexibility for Amazon EMR cluster configuration](emr-custom-ami.md). | February 15, 2022 | 
|  [**Amazon EMR managed policies**](emr-managed-iam-policies.md)  |  Updated to clarify use of predefined user tags. Added section on using the AWS console to launch clusters with v2 managed policies.  | September 29, 2021 | 
|  [`AmazonEMRFullAccessPolicy_v2`](emr-managed-policy-fullaccess-v2.md) – Update to an existing policy  | Changed the PassRoleForAutoScaling and PassRoleForEC2 actions to use the StringLike condition operator to match "iam:PassedToService":"application-autoscaling.amazonaws.com*" and "iam:PassedToService":"ec2.amazonaws.com*", respectively. | May 20, 2021 | 
|  [`AmazonEMRFullAccessPolicy_v2`](emr-managed-policy-fullaccess-v2.md) – Update to an existing policy  |  Removed the invalid action `s3:ListBuckets` and replaced it with the `s3:ListAllMyBuckets` action. Updated service-linked role (SLR) creation to be explicitly scoped down to the only SLR that Amazon EMR has with explicit service principals. The SLRs that can be created are exactly the same as before this change.  | March 23, 2021 | 
|  [`AmazonEMRFullAccessPolicy_v2`](emr-managed-policy-fullaccess-v2.md) – New policy  |  Amazon EMR added new permissions to scope access to resources and to add a prerequisite that users must add a predefined user tag to resources before they can use Amazon EMR managed policies. The `iam:PassRole` action requires the `iam:PassedToService` condition to be set to the specified service. Access to Amazon EC2, Amazon S3, and other services is not allowed by default.   | March 11, 2021 | 
| [`AmazonEMRServicePolicy_v2`](emr-iam-role.md) – New policy |  Adds a prerequisite that users must add user tags to resources before they can use this policy.  | March 11, 2021 | 
| [`AmazonEMRReadOnlyAccessPolicy_v2`](emr-managed-policy-readonly-v2.md) – New policy |  Permissions allow only specified elasticmapreduce read-only actions. Access to Amazon S3 is not allowed by default.  | March 11, 2021 | 
|  Amazon EMR started tracking changes  |  Amazon EMR started tracking changes for its AWS managed policies.  | March 11, 2021 | 

# IAM policies for tag-based access to clusters and EMR notebooks
<a name="emr-fine-grained-cluster-access"></a>

You can use conditions in your identity-based policy to control access to clusters and EMR notebooks based on tags.

For more information about adding tags to clusters, see [Tagging EMR clusters](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-tags.html). 

The following examples demonstrate different scenarios and ways to use condition operators with Amazon EMR condition keys. These IAM policy statements are intended for demonstration purposes only and should not be used in production environments. There are multiple ways to combine policy statements to grant and deny permissions according to your requirements. For more information about planning and testing IAM policies, see the [IAM User Guide](https://docs.aws.amazon.com/IAM/latest/UserGuide/).

**Important**  
Explicitly denying permission for tagging actions is an important consideration. This prevents users from tagging a resource and thereby granting themselves permissions that you did not intend to grant. If you don't deny tagging actions for a resource, a user can modify tags and circumvent the intention of the tag-based policies.

## Example identity-based policy statements for clusters
<a name="emr-cluster-access-resourcetag"></a>

The following examples demonstrate identity-based permissions policies that are used to control the actions that are allowed with EMR clusters.

**Important**  
The `ModifyInstanceGroup` action in Amazon EMR does not require that you specify a cluster ID. For that reason, denying this action based on cluster tags requires additional consideration. For more information, see [Denying the ModifyInstanceGroup action in Amazon EMR](emr-cluster-deny-modifyinstancegroup.md).

**Topics**
+ [Allow actions only on clusters with specific tag values](#emr-cluster-access-example-tagvalue)
+ [Require cluster tagging when a cluster is created](#emr-cluster-access-example-require-tagging)
+ [Allow actions on clusters with a specific tag, regardless of tag value](#emr-cluster-access-example-tag)

### Allow actions only on clusters with specific tag values
<a name="emr-cluster-access-example-tagvalue"></a>

The following examples demonstrate a policy that allows a user to perform actions based on the cluster tag `department` with the value `dev` and also allows a user to tag clusters with that same tag. The final policy example demonstrates how to deny privileges to tag EMR clusters with anything but that same tag.

In the following policy example, the `StringEquals` condition operator tries to match `dev` with the value for the tag `department`. If the tag `department` hasn't been added to the cluster, or doesn't contain the value `dev`, the policy doesn't apply, and the actions aren't allowed by this policy. If no other policy statements allow the actions, the user can only work with clusters that have this tag with this value.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "Stmt12345678901234",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:DescribeCluster",
        "elasticmapreduce:ListSteps",
        "elasticmapreduce:TerminateJobFlows",
        "elasticmapreduce:SetTerminationProtection",
        "elasticmapreduce:ListInstances",
        "elasticmapreduce:ListInstanceGroups",
        "elasticmapreduce:ListBootstrapActions",
        "elasticmapreduce:DescribeStep"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/department": "dev"
        }
      }
    }
  ]
}
```

------

You can also specify multiple tag values using a condition operator. For example, to allow all actions on clusters where the `department` tag contains the value `dev` or `test`, you could replace the condition block in the earlier example with the following. 

```
            "Condition": {
              "StringEquals": {
                "elasticmapreduce:ResourceTag/department":["dev", "test"]
              }
            }
```
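The matching behavior of the `StringEquals` condition operator against a cluster tag can be sketched in a few lines of Python. This is an illustrative simulation of the condition logic only, not the real IAM evaluation engine; the function name is made up for this example.

```python
# Illustrative sketch of how a StringEquals condition on
# elasticmapreduce:ResourceTag/department matches a cluster's tags.
# Simulation only -- not the IAM evaluation engine.

def string_equals_matches(cluster_tags, tag_key, allowed_values):
    """Return True if the cluster has tag_key set to one of allowed_values."""
    value = cluster_tags.get(tag_key)
    if value is None:
        # Tag absent: the condition cannot match, so the statement doesn't apply.
        return False
    return value in allowed_values

# A cluster tagged department=dev matches the single- and multi-value forms.
dev_cluster = {"department": "dev"}
print(string_equals_matches(dev_cluster, "department", ["dev"]))          # True
print(string_equals_matches(dev_cluster, "department", ["dev", "test"]))  # True
print(string_equals_matches({}, "department", ["dev"]))                   # False
```

Note that an untagged cluster fails the condition, which is why the surrounding policy only grants access to clusters that carry the expected tag.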

### Require cluster tagging when a cluster is created
<a name="emr-cluster-access-example-require-tagging"></a>

As in the prior example, the following policy example looks for the same matching tag: the value `dev` for the `department` tag. In this example, however, the `aws:RequestTag` condition key matches the tags passed in the creation request, so you must create the cluster with a tag that matches the specified value. 

To create a cluster with a tag, you must also have permission for the `elasticmapreduce:AddTags` action. For this statement, the `elasticmapreduce:ResourceTag` condition key ensures that IAM only grants access to tag resources with the value `dev` on the `department` tag. The `Resource` element is used to limit this permission to cluster resources.

For the `PassRole` resources, you must provide the AWS account ID or alias, the service role name in the `PassRoleForEMR` statement, and the instance profile name in the `PassRoleForEC2` statement. For more information about the IAM ARN format, see [IAM ARNs](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-arns) in the *IAM User Guide*. 

For more information about matching tag-key values, see [https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-requesttag](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-requesttag) in the *IAM User Guide*.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "RunJobFlowExplicitlyWithTag",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:RunJobFlow"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:RequestTag/department": "dev"
        }
      }
    },
    {
      "Sid": "AddTagsForDevClusters",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:AddTags"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:*:*:cluster/*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/department": "dev"
        }
      }
    },
    {
      "Sid": "PassRoleForEMR",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::123456789012:role/Role-Name-With-Path"
      ],
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "elasticmapreduce.amazonaws.com*"
        }
      }
    },
    {
      "Sid": "PassRoleForEC2",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::123456789012:role/Role-Name-With-Path"
      ],
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "ec2.amazonaws.com*"
        }
      }
    }
  ]
}
```

------
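The key difference between the two condition keys used above is what they inspect: `aws:RequestTag` looks at the tags carried in the API request itself, while `elasticmapreduce:ResourceTag` looks at tags already on the resource. The following sketch simulates the request-side check; it is illustrative only, not the IAM evaluation engine.

```python
# Sketch of the aws:RequestTag check: RunJobFlow is allowed only when the
# creation request itself carries the required tag. Simulation only.

def request_tag_allows(request_tags, key, required_value):
    """Return True if the API request includes key=required_value."""
    return request_tags.get(key) == required_value

# A create request that tags the new cluster department=dev is allowed...
print(request_tag_allows({"department": "dev"}, "department", "dev"))   # True
# ...while an untagged request, or one with a different value, is not.
print(request_tag_allows({}, "department", "dev"))                      # False
print(request_tag_allows({"department": "prod"}, "department", "dev"))  # False
```

Because the check runs against the request, a user cannot first create an untagged cluster and add the tag later; the tag must be present at creation time.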

### Allow actions on clusters with a specific tag, regardless of tag value
<a name="emr-cluster-access-example-tag"></a>

You can also allow actions only on clusters that have a particular tag, regardless of the tag value. To do this, you can use the `Null` operator. For more information, see [Condition operator to check existence of condition keys](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements.html#Conditions_Null) in the *IAM User Guide*. For example, to allow actions only on EMR clusters that have the `department` tag, regardless of the value it contains, you could replace the Condition blocks in the earlier example with the following one. The `Null` operator checks for the presence of the tag `department` on an EMR cluster. If the tag exists, the condition key is not null, which matches the `false` value specified in the condition, and the appropriate actions are allowed. 

```
1. "Condition": {
2.   "Null": {
3.     "elasticmapreduce:ResourceTag/department":"false"
4.   }
5. }
```
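The `Null` operator's semantics can be confusing, so here is a small sketch of its logic: a value of `"false"` requires the condition key to be present (non-null), while `"true"` would require it to be absent. This is an illustrative simulation only, not the IAM evaluation engine.

```python
# Sketch of the IAM Null condition operator: {"Null": {key: "false"}} matches
# when the condition key exists, whatever its value. Simulation only.

def null_condition_matches(cluster_tags, tag_key, null_value):
    """null_value "true" requires the key to be absent; "false" requires it present."""
    key_is_null = tag_key not in cluster_tags
    return key_is_null == (null_value == "true")

# Any value at all for the department tag satisfies {"Null": {...: "false"}}.
print(null_condition_matches({"department": "dev"}, "department", "false"))   # True
print(null_condition_matches({"department": "other"}, "department", "false")) # True
# An untagged cluster does not.
print(null_condition_matches({}, "department", "false"))                      # False
```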

The following policy statement allows a user to create an EMR cluster only if the cluster will have a `department` tag, which can contain any value. For the `PassRole` resource, you need to provide the AWS account ID or alias, and the service role name. For more information about the IAM ARN format, see [IAM ARNs](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-arns) in the *IAM User Guide*.

For more information about specifying the null ("false") condition operator, see [Condition operator to check existence of condition keys](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_condition_operators.html#Conditions_Null) in the *IAM User Guide*.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Sid": "CreateClusterTagNullCondition",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:RunJobFlow"
      ],
      "Resource": [
        "*"
      ],
      "Condition": {
        "Null": {
          "aws:RequestTag/department": "false"
        }
      }
    },
    {
      "Sid": "AddTagsNullCondition",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:AddTags"
      ],
      "Resource": [
        "arn:aws:elasticmapreduce:*:*:cluster/*"
      ],
      "Condition": {
        "Null": {
          "elasticmapreduce:ResourceTag/department": "false"
        }
      }
    },
    {
      "Sid": "PassRoleForElasticMapReduce",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::123456789012:role/Role-Name-With-Path"
      ],
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "elasticmapreduce.amazonaws.com*"
        }
      }
    },
    {
      "Sid": "PassRoleForEC2",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::123456789012:role/Role-Name-With-Path"
      ],
      "Condition": {
        "StringLike": {
          "iam:PassedToService": "ec2.amazonaws.com*"
        }
      }
    }
  ]
}
```

------

## Example identity-based policy statements for EMR Notebooks
<a name="emr-managed-notebooks-tags-examples"></a>

The example IAM policy statements in this section demonstrate common scenarios for using condition keys to limit the actions that are allowed with EMR Notebooks. As long as no other policy associated with the principal (user) allows the actions, the condition context keys limit allowed actions as indicated.

**Example – Allow access only to EMR Notebooks that a user creates based on tagging**  
The following example policy statement, when attached to a role or user, allows a user to work only with notebooks that they have created. This policy statement uses the default tag applied when a notebook is created.  
In the example, the `StringEquals` condition operator tries to match a variable representing the current user's ID (`${aws:userId}`) with the value of the tag `creatorUserId`. If the tag `creatorUserId` hasn't been added to the notebook, or doesn't contain the value of the current user's ID, the policy doesn't apply, and the actions aren't allowed by this policy. If no other policy statements allow the actions, the user can only work with notebooks that have this tag with this value.    
****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:DescribeEditor",
        "elasticmapreduce:StartEditor",
        "elasticmapreduce:StopEditor",
        "elasticmapreduce:DeleteEditor",
        "elasticmapreduce:OpenEditorInConsole"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/creatorUserId": "${aws:userId}"
        }
      },
      "Sid": "AllowELASTICMAPREDUCEDescribeeditor"
    }
  ]
}
```
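The policy variable in the statement above is resolved before the comparison happens: IAM substitutes the caller's unique user ID for `${aws:userId}` and then matches the result against the notebook's `creatorUserId` tag. The following sketch simulates that substitution; it is illustrative only, and the user IDs shown are placeholders.

```python
# Sketch of ${aws:userId} policy-variable substitution: the variable in the
# condition value is replaced with the caller's unique ID before matching it
# against the notebook's creatorUserId tag. Simulation only.

def allowed_for_caller(notebook_tags, caller_user_id):
    policy_condition_value = "${aws:userId}"
    # IAM resolves the variable to the caller's unique ID at evaluation time.
    resolved = policy_condition_value.replace("${aws:userId}", caller_user_id)
    return notebook_tags.get("creatorUserId") == resolved

# The notebook's creator can act on it; another user cannot.
print(allowed_for_caller({"creatorUserId": "AIDACKCEVSQ6C2EXAMPLE"},
                         "AIDACKCEVSQ6C2EXAMPLE"))   # True
print(allowed_for_caller({"creatorUserId": "AIDACKCEVSQ6C2EXAMPLE"},
                         "AIDAOTHERUSEREXAMPLE"))    # False
```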

**Example – Require notebook tagging when a notebook is created**  
In this example, the `RequestTag` context key is used. The `CreateEditor` action is allowed only if the user does not change or delete the `creatorUserId` tag that is added by default. The variable `${aws:userid}` specifies the currently active user's ID, which is the default value of the tag.  
The policy statement can be used to help ensure that users do not remove the `creatorUserId` tag or change its value.    
****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:CreateEditor"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:RequestTag/creatorUserId": "${aws:userid}"
        }
      },
      "Sid": "AllowELASTICMAPREDUCECreateeditor"
    }
  ]
}
```
This example requires that the user create the notebook with a tag having the key string `dept` and a value set to one of the following: `datascience`, `analytics`, or `operations`.    
****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:CreateEditor"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:RequestTag/dept": [
            "datascience",
            "analytics",
            "operations"
          ]
        }
      },
      "Sid": "AllowELASTICMAPREDUCECreateeditor"
    }
  ]
}
```

**Example – Limit notebook creation to tagged clusters, and require notebook tags**  
This example allows notebook creation only if the notebook is created with a tag that has the key string `owner` set to one of the specified values. In addition, the notebook can be created only if the cluster has a tag with the key string `department` set to one of the specified values.    
****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:CreateEditor"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:RequestTag/owner": [
            "owner1",
            "owner2",
            "owner3"
          ],
          "elasticmapreduce:ResourceTag/department": [
            "dep1",
            "dep3"
          ]
        }
      },
      "Sid": "AllowELASTICMAPREDUCECreateeditor"
    }
  ]
}
```

**Example – Limit the ability to start a notebook based on tags**  
This example limits the ability to start notebooks only to those notebooks that have a tag with the key string `owner` set to one of the specified values. Because the `Resource` element is used to specify only the `editor`, the condition does not apply to the cluster, and it does not need to be tagged.    
****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:StartEditor"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:elasticmapreduce:*:123456789012:editor/*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/owner": [
            "owner1",
            "owner2"
          ]
        }
      },
      "Sid": "AllowELASTICMAPREDUCEStarteditor"
    }
  ]
}
```
This example is similar to the one above. However, the limit only applies to tagged clusters, not notebooks.    
****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:StartEditor"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:elasticmapreduce:*:123456789012:cluster/*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/department": [
            "dep1",
            "dep3"
          ]
        }
      },
      "Sid": "AllowELASTICMAPREDUCEStarteditor"
    }
  ]
}
```
This example uses a different set of notebook and cluster tags. It allows a notebook to be started only if:  
+ The notebook has a tag with the key string `owner` set to any of the specified values

  —and—
+ The cluster has a tag with the key string `department` set to any of the specified values  
****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:StartEditor"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:elasticmapreduce:*:123456789012:editor/*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/owner": [
            "user1",
            "user2"
          ]
        }
      },
      "Sid": "AllowELASTICMAPREDUCEStarteditorByOwner"
    },
    {
      "Action": [
        "elasticmapreduce:StartEditor"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:elasticmapreduce:*:123456789012:cluster/*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/department": [
            "datascience",
            "analytics"
          ]
        }
      },
      "Sid": "AllowELASTICMAPREDUCEStarteditorByDepartment"
    }
  ]
}
```
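A way to read the two-statement policy above: starting a notebook involves both an editor resource and a cluster resource, and each resource must be allowed by a statement that matches it. The following sketch simulates that combined evaluation; the tag values are the ones from the example, and the logic is illustrative only.

```python
# Sketch of evaluating the two-statement StartEditor policy above: the editor
# resource and the cluster resource must each be allowed by a matching
# statement for the action to succeed. Simulation only.

def start_editor_allowed(notebook_tags, cluster_tags):
    notebook_ok = notebook_tags.get("owner") in ("user1", "user2")
    cluster_ok = cluster_tags.get("department") in ("datascience", "analytics")
    return notebook_ok and cluster_ok

print(start_editor_allowed({"owner": "user1"},
                           {"department": "analytics"}))  # True
print(start_editor_allowed({"owner": "user1"}, {}))       # False: cluster untagged
print(start_editor_allowed({}, {"department": "analytics"}))  # False: notebook untagged
```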

**Example – Limit the ability to open the notebook editor based on tags**  
This example allows the notebook editor to be opened only if:  
+ The notebook has a tag with the key string `owner` set to any of the specified values.

  —and—
+ The cluster has a tag with the key string `department` set to any of the specified values.  
****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:OpenEditorInConsole"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:elasticmapreduce:*:123456789012:editor/*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/owner": [
            "user1",
            "user2"
          ]
        }
      },
      "Sid": "AllowELASTICMAPREDUCEOpeneditorconsoleByOwner"
    },
    {
      "Action": [
        "elasticmapreduce:OpenEditorInConsole"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:elasticmapreduce:*:123456789012:cluster/*"
      ],
      "Condition": {
        "StringEquals": {
          "elasticmapreduce:ResourceTag/department": [
            "datascience",
            "analytics"
          ]
        }
      },
      "Sid": "AllowELASTICMAPREDUCEOpeneditorconsoleByDepartment"
    }
  ]
}
```

# Denying the ModifyInstanceGroup action in Amazon EMR
<a name="emr-cluster-deny-modifyinstancegroup"></a>

The [ModifyInstanceGroups](https://docs.aws.amazon.com/emr/latest/APIReference/API_ModifyInstanceGroups.html) action in Amazon EMR does not require that you provide a cluster ID with the action. Instead, you can specify only an instance group ID. For this reason, an apparently simple deny policy for this action based on cluster ID or a cluster tag may not have the intended effect. Consider the following example policy.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:ModifyInstanceGroups"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ],
      "Sid": "AllowELASTICMAPREDUCEModifyinstancegroups"
    },
    {
      "Action": [
        "elasticmapreduce:ModifyInstanceGroups"
      ],
      "Effect": "Deny",
      "Resource": [
        "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/j-12345ABCDEFG67"
      ],
      "Sid": "DenyELASTICMAPREDUCEModifyinstancegroups"
    }
  ]
}
```

------

If a user with this policy attached performs a `ModifyInstanceGroups` action and specifies only the instance group ID, the policy does not apply. Because the action is allowed on all other resources, the action is successful.

A solution to this issue is to attach a policy statement to the identity that uses a [NotResource](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_elements_notresource.html) element to deny any `ModifyInstanceGroups` action issued without a cluster ID. The following example policy adds such a deny statement so that any `ModifyInstanceGroups` request fails unless a cluster ID is specified. Because an identity must then specify a cluster ID with the action, deny statements based on cluster ID are effective.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:ModifyInstanceGroups"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ],
      "Sid": "AllowELASTICMAPREDUCEModifyinstancegroups"
    },
    {
      "Action": [
        "elasticmapreduce:ModifyInstanceGroups"
      ],
      "Effect": "Deny",
      "Resource": [
        "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/j-12345ABCDEFG67"
      ],
      "Sid": "DenyELASTICMAPREDUCEModifyinstancegroupsSpecificCluster"
    },
    {
      "Action": [
        "elasticmapreduce:ModifyInstanceGroups"
      ],
      "Effect": "Deny",
      "NotResource": "arn:*:elasticmapreduce:*:*:cluster/*",
      "Sid": "DenyELASTICMAPREDUCEModifyinstancegroupsNonCluster"
    }
  ]
}
```

------
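Why the `NotResource` statement closes the gap can be sketched with a pattern match: the deny applies to every resource that does not match the cluster ARN pattern, which includes a request that names only an instance group. This is an illustrative simulation; the instance group ARN format shown is a placeholder, not a documented ARN.

```python
# Sketch of the NotResource deny above: the deny applies to any resource that
# does NOT match the cluster ARN pattern, so a ModifyInstanceGroups request
# that omits the cluster ID is denied. Simulation only; the instance group
# ARN below is an illustrative placeholder.
import fnmatch

CLUSTER_PATTERN = "arn:*:elasticmapreduce:*:*:cluster/*"

def denied_by_notresource(request_resource_arn):
    """Deny everything except resources matching the cluster ARN pattern."""
    return not fnmatch.fnmatchcase(request_resource_arn, CLUSTER_PATTERN)

# A request scoped to a cluster ARN is not caught by this deny statement...
print(denied_by_notresource(
    "arn:aws:elasticmapreduce:us-east-1:123456789012:cluster/j-12345ABCDEFG67"))  # False
# ...but a request that names only an instance group is denied.
print(denied_by_notresource(
    "arn:aws:elasticmapreduce:us-east-1:123456789012:instancegroup/ig-EXAMPLE"))  # True
```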

A similar issue exists when you want to deny the `ModifyInstanceGroups` action based on the value associated with a cluster tag. The solution is similar. In addition to a deny statement that specifies the tag value, you can add a policy statement that denies the `ModifyInstanceGroups` action if the tag that you specify is not present, regardless of value.

The following example demonstrates a policy that, when attached to an identity, denies the identity the `ModifyInstanceGroups` action on any cluster with the tag `department` set to `dev`. This policy is effective only because of the final deny statement, which uses the `StringNotLike` condition to deny the action unless the `department` tag is present.

------
#### [ JSON ]

****  

```
{
  "Version":"2012-10-17",		 	 	 
  "Statement": [
    {
      "Action": [
        "elasticmapreduce:ModifyInstanceGroups"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ],
      "Sid": "AllowELASTICMAPREDUCEModifyinstancegroups"
    },
    {
      "Action": [
        "elasticmapreduce:ModifyInstanceGroups"
      ],
      "Condition": {
        "StringEquals": {
          "aws:ResourceTag/department": "dev"
        }
      },
      "Effect": "Deny",
      "Resource": [
        "*"
      ],
      "Sid": "DenyELASTICMAPREDUCEModifyinstancegroupsDevDepartment"
    },
    {
      "Action": [
        "elasticmapreduce:ModifyInstanceGroups"
      ],
      "Condition": {
        "StringNotLike": {
          "aws:ResourceTag/department": "?*"
        }
      },
      "Effect": "Deny",
      "Resource": [
        "*"
      ],
      "Sid": "DenyELASTICMAPREDUCEModifyinstancegroupsNoDepartmentTag"
    }
  ]
}
```

------
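The `"?*"` pattern in the final statement is what makes the tag-presence check work: `?` requires at least one character and `*` matches the rest, so the pattern matches any non-empty string. The following sketch simulates the `StringNotLike` deny; it is illustrative only, not the IAM evaluation engine.

```python
# Sketch of the StringNotLike "?*" deny above: "?*" matches any non-empty
# string, so StringNotLike is true (and the deny applies) only when the
# department tag is missing or empty. Simulation only.
import fnmatch

def denied_for_missing_tag(cluster_tags):
    value = cluster_tags.get("department")
    if value is None:
        # Missing key: StringNotLike evaluates to true, so the deny applies.
        return True
    return not fnmatch.fnmatchcase(value, "?*")

print(denied_for_missing_tag({}))                       # True: no tag, denied
print(denied_for_missing_tag({"department": "dev"}))    # False: any value passes
print(denied_for_missing_tag({"department": ""}))       # True: empty value, denied
```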

# Troubleshooting Amazon EMR identity and access
<a name="security_iam_troubleshoot"></a>

Use the following information to help you diagnose and fix common issues that you might encounter when working with Amazon EMR and IAM.

**Topics**
+ [I am not authorized to perform an action in Amazon EMR](#security_iam_troubleshoot-no-permissions)
+ [I am not authorized to perform iam:PassRole](#security_iam_troubleshoot-passrole)
+ [I want to allow people outside of my AWS account to access my Amazon EMR resources](#security_iam_troubleshoot-cross-account-access)

## I am not authorized to perform an action in Amazon EMR
<a name="security_iam_troubleshoot-no-permissions"></a>

If the AWS Management Console tells you that you're not authorized to perform an action, then you must contact your administrator for assistance. Your administrator is the person who provided you with your user name and password.

The following example error occurs when the `mateojackson` user tries to use the console to view details about a fictional `my-example-widget` resource but does not have the fictional `EMR:GetWidget` permissions.

```
User: arn:aws:iam::123456789012:user/mateojackson is not authorized to perform: EMR:GetWidget on resource: my-example-widget
```

In this case, Mateo asks his administrator to update his policies to allow him to access the `my-example-widget` resource using the `EMR:GetWidget` action.

## I am not authorized to perform iam:PassRole
<a name="security_iam_troubleshoot-passrole"></a>

If you receive an error that you're not authorized to perform the `iam:PassRole` action, your policies must be updated to allow you to pass a role to Amazon EMR.

Some AWS services allow you to pass an existing role to that service instead of creating a new service role or service-linked role. To do this, you must have permissions to pass the role to the service.

The following example error occurs when an IAM user named `marymajor` tries to use the console to perform an action in Amazon EMR. However, the action requires the service to have permissions that are granted by a service role. Mary does not have permissions to pass the role to the service.

```
User: arn:aws:iam::123456789012:user/marymajor is not authorized to perform: iam:PassRole
```

In this case, Mary's policies must be updated to allow her to perform the `iam:PassRole` action.

If you need help, contact your AWS administrator. Your administrator is the person who provided you with your sign-in credentials.

## I want to allow people outside of my AWS account to access my Amazon EMR resources
<a name="security_iam_troubleshoot-cross-account-access"></a>

You can create a role that users in other accounts or people outside of your organization can use to access your resources. You can specify who is trusted to assume the role. For services that support resource-based policies or access control lists (ACLs), you can use those policies to grant people access to your resources.

To learn more, consult the following:
+ To learn whether Amazon EMR supports these features, see [How Amazon EMR works with IAM](security_iam_service-with-iam.md).
+ To learn how to provide access to your resources across AWS accounts that you own, see [Providing access to an IAM user in another AWS account that you own](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_aws-accounts.html) in the *IAM User Guide*.
+ To learn how to provide access to your resources to third-party AWS accounts, see [Providing access to AWS accounts owned by third parties](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_third-party.html) in the *IAM User Guide*.
+ To learn how to provide access through identity federation, see [Providing access to externally authenticated users (identity federation)](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_common-scenarios_federated-users.html) in the *IAM User Guide*.
+ To learn the difference between using roles and resource-based policies for cross-account access, see [Cross account resource access in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies-cross-account-resource-access.html) in the *IAM User Guide*.

# Using Amazon S3 Access Grants with Amazon EMR
<a name="emr-access-grants"></a>

## S3 Access Grants overview for Amazon EMR
<a name="emr-access-grants-overview"></a>

With Amazon EMR releases 6.15.0 and higher, Amazon S3 Access Grants provides a scalable access control solution that you can use to augment access to your Amazon S3 data from Amazon EMR. If you have a complex or large permission configuration for your S3 data, you can use Access Grants to scale S3 data permissions for users, roles, and applications on your cluster.

Use S3 Access Grants to augment access to Amazon S3 data beyond the permissions that are granted by the runtime role or the IAM roles that are attached to the identities with access to your EMR cluster. For more information, see [Managing access with S3 Access Grants](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-grants.html) in the *Amazon S3 User Guide*.

For steps to use S3 Access Grants with other Amazon EMR deployments, see the following documentation: 
+ [Using S3 Access Grants with Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/access-grants.html)
+ [Using S3 Access Grants with Amazon EMR Serverless](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/access-grants.html)

## How Amazon EMR works with S3 Access Grants
<a name="emr-access-grants-howitworks"></a>

Amazon EMR releases 6.15.0 and higher provide a native integration with S3 Access Grants. You can enable S3 Access Grants on Amazon EMR and run Spark jobs. When a Spark job makes a request for S3 data, Amazon S3 provides temporary credentials that are scoped to the specific bucket, prefix, or object.

The following is a high-level overview of how Amazon EMR gets access to data that's protected by S3 Access Grants.

![\[How Amazon EMR works with S3 Access Grants\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/access-grants-overview.png)


1. A user submits an Amazon EMR Spark job that uses data stored in Amazon S3. 

1. Amazon EMR makes a request for S3 Access Grants to allow access to the bucket, prefix, or object on behalf of that user. 

1. Amazon S3 returns temporary credentials in the form of an AWS Security Token Service (STS) token for the user. The token is scoped to access the S3 bucket, prefix, or object.

1. Amazon EMR uses the STS token to retrieve data from S3. 

1. Amazon EMR receives the data from S3 and returns the results to the user.

## S3 Access Grants considerations with Amazon EMR
<a name="emr-access-grants-considerations"></a>

Take note of the following behaviors and limitations when you use S3 Access Grants with Amazon EMR.

### Feature support
<a name="emr-access-grants-support"></a>
+ S3 Access Grants is supported with Amazon EMR releases 6.15.0 and higher.
+ Spark is the only supported query engine when you use S3 Access Grants with Amazon EMR.
+ Delta Lake and Hudi are the only supported open-table formats when you use S3 Access Grants with Amazon EMR.
+ The following Amazon EMR capabilities are not supported for use with S3 Access Grants:
  + Apache Iceberg tables
  + LDAP native authentication 
  + Apache Ranger native authentication 
  + AWS CLI requests to Amazon S3 that use IAM roles
  + S3 access through the open-source S3A protocol
+ The `fallbackToIAM` option isn't supported for EMR clusters that use trusted identity propagation with IAM Identity Center.
+ [S3 Access Grants with AWS Lake Formation](#emr-access-grants-lf) is only supported with Amazon EMR clusters that run on Amazon EC2.

### Behavioral considerations
<a name="emr-access-grants-behavior"></a>
+ The Apache Ranger native integration with Amazon EMR provides functionality that overlaps with S3 Access Grants as part of the EMRFS S3 Apache Ranger plugin. If you use Apache Ranger for fine-grained access control (FGAC), we recommend that you use that plugin instead of S3 Access Grants.
+ Amazon EMR provides a credentials cache in EMRFS so that a user doesn't need to make repeated requests for the same credentials within a Spark job. Because of this cache, Amazon EMR always requests the default privilege level when it requests credentials. For more information, see [Request access to S3 data](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-grants-credentials.html) in the *Amazon S3 User Guide*.
+ If a user performs an action that S3 Access Grants doesn't support, Amazon EMR falls back to the IAM role that was specified for job execution. For more information, see [Fall back to IAM roles](#emr-access-grants-fallback).

## Launch an Amazon EMR cluster with S3 Access Grants
<a name="emr-access-grants-launch-ec2"></a>

This section describes how to launch an EMR cluster that runs on Amazon EC2, and uses S3 Access Grants to manage access to data in Amazon S3. For steps to use S3 Access Grants with other Amazon EMR deployments, see the following documentation: 
+ [Using S3 Access Grants with Amazon EMR on EKS](https://docs.aws.amazon.com/emr/latest/EMR-on-EKS-DevelopmentGuide/access-grants.html)
+ [Using S3 Access Grants with EMR Serverless ](https://docs.aws.amazon.com/emr/latest/EMR-Serverless-UserGuide/access-grants.html)

Use the following steps to launch an EMR cluster that runs on Amazon EC2, and uses S3 Access Grants to manage access to data in Amazon S3.

1. Set up a job execution role for your EMR cluster. In addition to the IAM permissions that your Spark jobs require, include the `s3:GetDataAccess` and `s3:GetAccessGrantsInstanceForPrefix` actions. In the `Resource` element, list every S3 Access Grants instance ARN that the role is allowed to query:

   ```
   {
       "Effect": "Allow",
       "Action": [
           "s3:GetDataAccess",
           "s3:GetAccessGrantsInstanceForPrefix"
       ],
       "Resource": [
            "arn:aws_partition:s3:Region:account-id1:access-grants/default",
            "arn:aws_partition:s3:Region:account-id2:access-grants/default"
       ]
   }
   ```
**Note**  
With Amazon EMR, S3 Access Grants augment the permissions that are set in IAM roles. If the IAM roles that you specify for job execution contain permissions to access S3 directly, then users might be able to access more data than just the data that you define in S3 Access Grants.

1. Next, use the AWS CLI to create a cluster with Amazon EMR release 6.15.0 or higher and the `emrfs-site` classification to enable S3 Access Grants, similar to the following example:

   ```
   aws emr create-cluster \
     --release-label emr-6.15.0 \
     --instance-count 3 \
     --instance-type m5.xlarge \
     --configurations '[{"Classification":"emrfs-site", "Properties":{"fs.s3.s3AccessGrants.enabled":"true", "fs.s3.s3AccessGrants.fallbackToIAM":"false"}}]'
   ```
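The grants themselves are managed in your S3 Access Grants instance, outside of Amazon EMR. As a sketch of that step, the following AWS CLI command grants a job execution role read access to a prefix; the account ID, prefix, and role ARN are placeholder values, and the command assumes you have already registered the `default` location in your Access Grants instance:

```
aws s3control create-access-grant \
  --account-id 111122223333 \
  --access-grants-location-id default \
  --access-grants-location-configuration S3SubPrefix="my-data/*" \
  --grantee GranteeType=IAM,GranteeIdentifier=arn:aws:iam::111122223333:role/my-job-role \
  --permission READ
```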

## S3 Access Grants with AWS Lake Formation
<a name="emr-access-grants-lf"></a>

If you use Amazon EMR with the [AWS Lake Formation integration](emr-lake-formation.md), you can use Amazon S3 Access Grants for direct or tabular access to data in Amazon S3. 

**Note**  
S3 Access Grants with AWS Lake Formation is only supported with Amazon EMR clusters that run on Amazon EC2.

**Direct access**  
Direct access refers to any call that accesses S3 data without invoking the API for the AWS Glue service that Lake Formation uses as a metastore with Amazon EMR, for example, a call to `spark.read`:  

```
spark.read.csv("s3://...")
```
When you use S3 Access Grants with AWS Lake Formation on Amazon EMR, all direct access patterns go through S3 Access Grants to get temporary S3 credentials.

**Tabular access**  
Tabular access occurs when Lake Formation invokes the metastore API to access your S3 location, for example, to query table data:  

```
spark.sql("select * from test_tbl")
```
When you use S3 Access Grants with AWS Lake Formation on Amazon EMR, all tabular access patterns go through Lake Formation.

## Fall back to IAM roles
<a name="emr-access-grants-fallback"></a>

If a user attempts to perform an action that S3 Access Grants doesn't support, and the `fallbackToIAM` configuration is `true`, Amazon EMR falls back to the IAM role that was specified for job execution. This lets users rely on their job execution role to provide credentials for S3 access in scenarios that S3 Access Grants doesn't cover.

With `fallbackToIAM` enabled, users can access the data that an access grant allows. If there is no S3 Access Grants credential for the target data, then Amazon EMR checks for the permission on the job execution role instead.
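To enable the fallback behavior, set `fs.s3.s3AccessGrants.fallbackToIAM` to `true` in the `emrfs-site` classification when you create the cluster, for example:

```
[
    {
        "Classification": "emrfs-site",
        "Properties": {
            "fs.s3.s3AccessGrants.enabled": "true",
            "fs.s3.s3AccessGrants.fallbackToIAM": "true"
        }
    }
]
```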

**Note**  
We recommend that you test your access permissions with the `fallbackToIAM` configuration enabled even if you plan to disable the option for production workloads. With Spark jobs, users might also be able to reach Amazon S3 with their IAM credentials in ways that bypass S3 Access Grants. When S3 Access Grants is enabled on EMR clusters, grants give Spark jobs access to S3 locations, so ensure that you protect those locations from access outside of EMRFS. For example, protect the S3 locations from access by S3 clients used in notebooks, or by applications that S3 Access Grants doesn't support, such as Hive or Presto.

# Authenticate to Amazon EMR cluster nodes
<a name="emr-authenticate-cluster-connections"></a>

SSH clients can use an Amazon EC2 key pair to authenticate to cluster instances. Alternatively, with Amazon EMR releases 5.10.0 and higher, you can configure Kerberos to authenticate users and SSH connections to the primary node. And with Amazon EMR releases 5.12.0 and higher, you can authenticate with LDAP.

**Topics**
+ [Use an EC2 key pair for SSH credentials for Amazon EMR](emr-plan-access-ssh.md)
+ [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md)
+ [Use Active Directory or LDAP servers for authentication with Amazon EMR](ldap.md)

# Use an EC2 key pair for SSH credentials for Amazon EMR
<a name="emr-plan-access-ssh"></a>

Amazon EMR cluster nodes run on Amazon EC2 instances. You can connect to cluster nodes in the same way that you can connect to Amazon EC2 instances. You can use Amazon EC2 to create a key pair, or you can import a key pair. When you create a cluster, you can specify the Amazon EC2 key pair that will be used for SSH connections to all cluster instances. You can also create a cluster without a key pair. This is usually done with transient clusters that start, run steps, and then terminate automatically.

The SSH client that you use to connect to the cluster needs the private key file associated with this key pair. For SSH clients on Linux, Unix, and macOS, this is a .pem file. You must set permissions so that only the key owner can access the file. For SSH clients on Windows, this is a .ppk file, which is usually created from the .pem file.
+ For more information about creating an Amazon EC2 key pair, see [Amazon EC2 key pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*.
+ For instructions about using PuTTYgen to create a .ppk file from a .pem file, see [Converting your private key using PuTTYgen](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html#putty-private-key) in the *Amazon EC2 User Guide*.
+ For more information about setting .pem file permissions and about how to connect to an EMR cluster's primary node using different methods, including `ssh` from Linux or macOS, PuTTY from Windows, or the AWS CLI from any supported operating system, see [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md).
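For example, on Linux or macOS you can set the required permissions on the .pem file before you connect. The key file name and the primary node's public DNS name below are placeholder values:

```
touch my-key-pair.pem        # placeholder; use the key file that you downloaded
chmod 400 my-key-pair.pem    # only the key owner can read the file
# Then connect to the primary node (hostname is a placeholder):
# ssh -i my-key-pair.pem hadoop@ec2-xx-xx-xx-xx.compute-1.amazonaws.com
```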

# Use Kerberos for authentication with Amazon EMR
<a name="emr-kerberos"></a>

Amazon EMR releases 5.10.0 and higher support Kerberos. Kerberos is a network authentication protocol that uses secret-key cryptography to provide strong authentication so that passwords or other credentials aren't sent over the network in an unencrypted format.

In Kerberos, services and users that need to authenticate are known as *principals*. Principals exist within a Kerberos *realm*. Within the realm, a Kerberos server known as the *key distribution center (KDC)* provides the means for principals to authenticate. The KDC does this by issuing *tickets* for authentication. The KDC maintains a database of the principals within its realm, their passwords, and other administrative information about each principal. A KDC can also accept authentication credentials from principals in other realms, which is known as a *cross-realm trust*. In addition, an EMR cluster can use an external KDC to authenticate principals.

A common scenario for establishing a cross-realm trust or using an external KDC is to authenticate users from an Active Directory domain. This allows users to access an EMR cluster with their domain account when they use SSH to connect to a cluster or work with big data applications.
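For example, after a domain user connects to the cluster over SSH, they typically obtain a Kerberos ticket before working with cluster applications; the principal name below is a placeholder:

```
kinit user1@AD.DOMAIN.COM
klist
```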

When you use Kerberos authentication, Amazon EMR configures Kerberos for the applications, components, and subsystems that it installs on the cluster so that they are authenticated with each other.

**Important**  
Amazon EMR does not support AWS Directory Service for Microsoft Active Directory in a cross-realm trust or as an external KDC.

Before you configure Kerberos using Amazon EMR, we recommend that you become familiar with Kerberos concepts, the services that run on a KDC, and the tools for administering Kerberos services. For more information, see [MIT Kerberos documentation](http://web.mit.edu/kerberos/krb5-latest/doc/), which is published by the [Kerberos consortium](http://kerberos.org/).

**Topics**
+ [Supported applications with Amazon EMR](emr-kerberos-principals.md)
+ [Kerberos architecture options with Amazon EMR](emr-kerberos-options.md)
+ [Configuring Kerberos on Amazon EMR](emr-kerberos-configure.md)
+ [Using SSH to connect to Kerberized clusters with Amazon EMR](emr-kerberos-connect-ssh.md)
+ [Tutorial: Configure a cluster-dedicated KDC with Amazon EMR](emr-kerberos-cluster-kdc.md)
+ [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md)

# Supported applications with Amazon EMR
<a name="emr-kerberos-principals"></a>

Within an EMR cluster, Kerberos principals are the big data application services and subsystems that run on all cluster nodes. Amazon EMR can configure the applications and components listed below to use Kerberos. Each application has a Kerberos user principal associated with it.

Amazon EMR does not support cross-realm trusts with AWS Directory Service for Microsoft Active Directory.

Amazon EMR only configures the open-source Kerberos authentication features for the applications and components listed below. Any other applications installed are not Kerberized, which can result in an inability to communicate with Kerberized components and cause application errors. Applications and components that are not Kerberized do not have authentication enabled. Supported applications and components may vary for different Amazon EMR releases.

The Livy user interface is the only web user interface hosted on the cluster that is Kerberized.
+ **Hadoop MapReduce**
+ **HBase**
+ **HCatalog**
+ **HDFS**
+ **Hive**
  + Do not enable Hive with LDAP authentication. This may cause issues communicating with Kerberized YARN.
+ **Hue**
  + Hue user authentication isn't set automatically and can be configured using the configuration API.
  + Hue server is Kerberized. The Hue front-end (UI) is not configured for authentication. LDAP authentication can be configured for the Hue UI. 
+ **Livy**
  + Livy impersonation with Kerberized clusters is supported in Amazon EMR releases 5.22.0 and higher.
+ **Oozie**
+ **Phoenix**
+ **Presto**
  + Presto supports Kerberos authentication in Amazon EMR releases 6.9.0 and higher.
  + To use Kerberos authentication for Presto, you must enable [in-transit encryption](emr-data-encryption-options.md#emr-encryption-intransit).
+ **Spark**
+ **Tez**
+ **Trino**
  + Trino supports Kerberos authentication in Amazon EMR releases 6.11.0 and higher.
  + To use Kerberos authentication for Trino, you must enable [in-transit encryption](emr-data-encryption-options.md#emr-encryption-intransit).
+ **YARN**
+ **Zeppelin**
  + Zeppelin is only configured to use Kerberos with the Spark interpreter. It is not configured for other interpreters.
  + User impersonation is not supported for Kerberized Zeppelin interpreters other than Spark.
+ **Zookeeper**
  + The ZooKeeper client is not supported.

# Kerberos architecture options with Amazon EMR
<a name="emr-kerberos-options"></a>

When you use Kerberos with Amazon EMR, you can choose from the architectures listed in this section. Regardless of the architecture that you choose, you configure Kerberos using the same steps. You create a security configuration, you specify the security configuration and compatible cluster-specific Kerberos options when you create the cluster, and you create HDFS directories for Linux users on the cluster that match user principals in the KDC. For an explanation of configuration options and example configurations for each architecture, see [Configuring Kerberos on Amazon EMR](emr-kerberos-configure.md).

## Cluster-dedicated KDC (KDC on primary node)
<a name="emr-kerberos-localkdc-summary"></a>

This configuration is available with Amazon EMR releases 5.10.0 and higher.

![\[Amazon EMR cluster architecture with master node, core nodes, and task node within a Kerberos realm.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-cluster-dedicated-kdc.png)


**Advantages**
+ Amazon EMR has full ownership of the KDC.
+ The KDC on the EMR cluster is independent from centralized KDC implementations such as Microsoft Active Directory or AWS Managed Microsoft AD.
+ Performance impact is minimal because the KDC manages authentication only for local nodes within the cluster.
+ Optionally, other Kerberized clusters can reference the KDC as an external KDC. For more information, see [External KDC—primary node on a different cluster](#emr-kerberos-extkdc-cluster-summary).

**Considerations and limitations**
+ Kerberized clusters cannot authenticate to one another, so applications cannot interoperate. If cluster applications need to interoperate, you must establish a cross-realm trust between clusters, or set up one cluster as the external KDC for other clusters. If a cross-realm trust is established, the KDCs must have different Kerberos realms.
+ You must create Linux users on the EC2 instance of the primary node that correspond to KDC user principals, along with the HDFS directories for each user.
+ User principals must use an EC2 private key file and `kinit` credentials to connect to the cluster using SSH.

## Cross-realm trust
<a name="emr-kerberos-crossrealm-summary"></a>

In this configuration, principals (usually users) from a different Kerberos realm authenticate to application components on a Kerberized EMR cluster, which has its own KDC. The KDC on the primary node establishes a trust relationship with another KDC using a *cross-realm principal* that exists in both KDCs. The principal name and the password match precisely in each KDC. Cross-realm trusts are most common with Active Directory implementations, as shown in the following diagram. Cross-realm trusts with an external MIT KDC or a KDC on another Amazon EMR cluster are also supported.

![\[Amazon EMR clusters in different Kerberos realms with cross-realm trust to Active Directory.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-cross-realm-trust.png)


**Advantages**
+ The EMR cluster on which the KDC is installed maintains full ownership of the KDC.
+ With Active Directory, Amazon EMR automatically creates Linux users that correspond to user principals from the KDC. You still must create HDFS directories for each user. In addition, user principals in the Active Directory domain can access Kerberized clusters using `kinit` credentials, without the EC2 private key file. This eliminates the need to share the private key file among cluster users.
+ Because each cluster KDC manages authentication for the nodes in the cluster, the effects of network latency and processing overhead for a large number of nodes across clusters are minimized.

**Considerations and limitations**
+ If you are establishing a trust with an Active Directory realm, you must provide an Active Directory user name and password with permissions to join principals to the domain when you create the cluster.
+ Cross-realm trusts cannot be established between Kerberos realms with the same name.
+ Cross-realm trusts must be established explicitly. For example, if Cluster A and Cluster B both establish a cross-realm trust with a KDC, they do not inherently trust one another and their applications cannot authenticate to one another to interoperate.
+ KDCs must be maintained independently and coordinated so that credentials of user principals match precisely.

## External KDC
<a name="emr-kerberos-extkdc-summary"></a>

Configurations with an external KDC are supported with Amazon EMR releases 5.20.0 and higher.
+ [External KDC—MIT KDC](#emr-kerberos-extkdc-mit-summary)
+ [External KDC—primary node on a different cluster](#emr-kerberos-extkdc-cluster-summary)
+ [External KDC—cluster KDC on a different cluster with Active Directory cross-realm trust](#emr-kerberos-extkdc-ad-trust-summary)

### External KDC—MIT KDC
<a name="emr-kerberos-extkdc-mit-summary"></a>

This configuration allows one or more EMR clusters to use principals defined and maintained in an MIT KDC server.

![\[Amazon EMR cluster architecture with Kerberos realm, showing master, core, and task nodes.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-external-kdc.png)


**Advantages**
+ Managing principals is consolidated in a single KDC.
+ Multiple clusters can use the same KDC in the same Kerberos realm. For more information, see [Requirements for using multiple clusters with the same KDC](#emr-kerberos-multi-kdc).
+ The primary node on a Kerberized cluster does not have the performance burden associated with maintaining the KDC.

**Considerations and limitations**
+ You must create Linux users on the EC2 instance of each Kerberized cluster's primary node that correspond to KDC user principals, along with the HDFS directories for each user.
+ User principals must use an EC2 private key file and `kinit` credentials to connect to Kerberized clusters using SSH.
+ Each node in Kerberized EMR clusters must have a network route to the KDC.
+ Each node in Kerberized clusters places an authentication burden on the external KDC, so the configuration of the KDC affects cluster performance. When you configure the hardware of the KDC server, consider the maximum number of Amazon EMR nodes to be supported simultaneously.
+ Cluster performance is dependent on the network latency between nodes in Kerberized clusters and the KDC.
+ Troubleshooting can be more difficult because of interdependencies.

### External KDC—primary node on a different cluster
<a name="emr-kerberos-extkdc-cluster-summary"></a>

This configuration is nearly identical to the external MIT KDC implementation above, except that the KDC is on the primary node of an EMR cluster. For more information, see [Cluster-dedicated KDC (KDC on primary node)](#emr-kerberos-localkdc-summary) and [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md).

![\[Diagram of Amazon EMR clusters with Kerberos realm, showing master and core nodes.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-external-cluster-kdc.png)


**Advantages**
+ Managing principals is consolidated in a single KDC.
+ Multiple clusters can use the same KDC in the same Kerberos realm. For more information, see [Requirements for using multiple clusters with the same KDC](#emr-kerberos-multi-kdc).

**Considerations and limitations**
+ You must create Linux users on the EC2 instance of each Kerberized cluster's primary node that correspond to KDC user principals, along with the HDFS directories for each user.
+ User principals must use an EC2 private key file and `kinit` credentials to connect to Kerberized clusters using SSH.
+ Each node in each EMR cluster must have a network route to the KDC.
+ Each Amazon EMR node in Kerberized clusters places an authentication burden on the external KDC, so the configuration of the KDC affects cluster performance. When you configure the hardware of the KDC server, consider the maximum number of Amazon EMR nodes to be supported simultaneously.
+ Cluster performance is dependent on the network latency between nodes in the clusters and the KDC.
+ Troubleshooting can be more difficult because of interdependencies.

### External KDC—cluster KDC on a different cluster with Active Directory cross-realm trust
<a name="emr-kerberos-extkdc-ad-trust-summary"></a>

In this configuration, you first create a cluster with a cluster-dedicated KDC that has a one-way cross-realm trust with Active Directory. For a detailed tutorial, see [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md). You then launch additional clusters, referencing the cluster KDC that has the trust as an external KDC. For an example, see [External cluster KDC with Active Directory cross-realm trust](emr-kerberos-config-examples.md#emr-kerberos-example-extkdc-ad-trust). This allows each Amazon EMR cluster that uses the external KDC to authenticate principals defined and maintained in a Microsoft Active Directory domain.

![\[Amazon EMR clusters with Kerberos authentication and Active Directory integration diagram.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-external-ad-trust-kdc.png)


**Advantages**
+ Managing principals is consolidated in the Active Directory domain.
+ Amazon EMR joins the Active Directory realm, which eliminates the need to create Linux users that correspond to Active Directory users. You still must create HDFS directories for each user.
+ Multiple clusters can use the same KDC in the same Kerberos realm. For more information, see [Requirements for using multiple clusters with the same KDC](#emr-kerberos-multi-kdc).
+ User principals in the Active Directory domain can access Kerberized clusters using `kinit` credentials, without the EC2 private key file. This eliminates the need to share the private key file among cluster users.
+ Only one Amazon EMR primary node has the burden of maintaining the KDC, and only that cluster must be created with Active Directory credentials for the cross-realm trust between the KDC and Active Directory.

**Considerations and limitations**
+ Each node in each EMR cluster must have a network route to the KDC and the Active Directory domain controller.
+ Each Amazon EMR node places an authentication burden on the external KDC, so the configuration of the KDC affects cluster performance. When you configure the hardware of the KDC server, consider the maximum number of Amazon EMR nodes to be supported simultaneously.
+ Cluster performance is dependent on the network latency between nodes in the clusters and the KDC server.
+ Troubleshooting can be more difficult because of interdependencies.

## Requirements for using multiple clusters with the same KDC
<a name="emr-kerberos-multi-kdc"></a>

Multiple clusters can use the same KDC in the same Kerberos realm. However, if the clusters run concurrently, they might fail if they use conflicting Kerberos service principal names.

If you have multiple concurrent clusters with the same external KDC, then ensure that the clusters use different Kerberos realms. If the clusters must use the same Kerberos realm, then ensure that the clusters are in different subnets, and that their CIDR ranges don’t overlap. 
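If the clusters must share a realm, you can quickly confirm that two candidate subnet CIDR ranges don't overlap before you launch. The following is a local convenience check, assuming `python3` is available on your workstation; the CIDR values are placeholders:

```
# Prints "False" when the two ranges do not overlap
python3 -c 'from ipaddress import ip_network as n; print(n("10.0.1.0/24").overlaps(n("10.0.2.0/24")))'
```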

# Configuring Kerberos on Amazon EMR
<a name="emr-kerberos-configure"></a>

This section provides configuration details and examples for setting up Kerberos with common architectures. Regardless of the architecture you choose, the configuration basics are the same and consist of three steps. If you use an external KDC or set up a cross-realm trust, you must ensure that every node in a cluster has a network route to the external KDC, including configuring the applicable security groups to allow inbound and outbound Kerberos traffic.

## Step 1: Create a security configuration with Kerberos properties
<a name="emr-kerberos-step1-summary"></a>

The security configuration specifies details about the Kerberos KDC, and allows the Kerberos configuration to be re-used each time you create a cluster. You can create a security configuration using the Amazon EMR console, the AWS CLI, or the EMR API. The security configuration can also contain other security options, such as encryption. For more information about creating security configurations and specifying a security configuration when you create a cluster, see [Use security configurations to set up Amazon EMR cluster security](emr-security-configurations.md). For information about Kerberos properties in a security configuration, see [Kerberos settings for security configurations](emr-kerberos-configure-settings.md#emr-kerberos-security-configuration).
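For example, the following AWS CLI sketch creates a security configuration for a cluster-dedicated KDC; the configuration name and ticket lifetime are placeholder values that you should adjust for your environment:

```
aws emr create-security-configuration --name MyKerberosConfig \
  --security-configuration '{
    "AuthenticationConfiguration": {
      "KerberosConfiguration": {
        "Provider": "ClusterDedicatedKdc",
        "ClusterDedicatedKdcConfiguration": {
          "TicketLifetimeInHours": 24
        }
      }
    }
  }'
```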

## Step 2: Create a cluster and specify cluster-specific Kerberos attributes
<a name="emr-kerberos-step2-summary"></a>

When you create a cluster, you specify a Kerberos security configuration along with cluster-specific Kerberos options. When you use the Amazon EMR console, only the Kerberos options compatible with the specified security configuration are available. When you use the AWS CLI or Amazon EMR API, ensure that you specify Kerberos options compatible with the specified security configuration. For example, if you specify a principal password for a cross-realm trust when you create a cluster using the CLI, and the specified security configuration is not configured with cross-realm trust parameters, an error occurs. For more information, see [Kerberos settings for clusters](emr-kerberos-configure-settings.md#emr-kerberos-cluster-configuration).
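For example, with the AWS CLI you pass both the security configuration name and the cluster-specific Kerberos attributes. In the following sketch, the cluster name, key pair, security configuration name, and KDC admin password are placeholder values, and the security configuration is one that you created previously:

```
aws emr create-cluster --name "MyKerberizedCluster" \
  --release-label emr-6.15.0 \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --ec2-attributes KeyName=my-key-pair \
  --security-configuration MyKerberosConfig \
  --kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=MyClusterKDCAdminPwd
```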

## Step 3: Configure the cluster primary node
<a name="emr-kerberos-step3-summary"></a>

Depending on the requirements of your architecture and implementation, additional setup on the cluster might be required. You can do this after you create the cluster, or during the creation process by using steps or bootstrap actions.

For each Kerberos-authenticated user that connects to the cluster using SSH, you must ensure that Linux accounts are created that correspond to the Kerberos user. If user principals are provided by an Active Directory domain controller, either as the external KDC or through a cross-realm trust, Amazon EMR creates Linux accounts automatically. If Active Directory is not used, you must create principals for each user that correspond to their Linux user. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).

Each user must also have an HDFS user directory that they own, which you must create. In addition, SSH must be configured with GSSAPI enabled to allow connections from Kerberos-authenticated users. GSSAPI must be enabled on the primary node, and the client SSH application must be configured to use GSSAPI. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).
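For example, from the primary node you could create a user's HDFS directory with commands like the following. The user name is a placeholder, and the commands assume that you run them as a user with HDFS superuser rights (such as `hdfs`):

```
hdfs dfs -mkdir /user/user1
hdfs dfs -chown user1:user1 /user/user1
```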

# Security configuration and cluster settings for Kerberos on Amazon EMR
<a name="emr-kerberos-configure-settings"></a>

When you create a Kerberized cluster, you specify the security configuration together with Kerberos attributes that are specific to the cluster. You can't specify one set without the other, or an error occurs.

This topic provides an overview of the configuration parameters available for Kerberos when you create a security configuration and a cluster. In addition, CLI examples for creating compatible security configurations and clusters are provided for common architectures.

## Kerberos settings for security configurations
<a name="emr-kerberos-security-configuration"></a>

You can create a security configuration that specifies Kerberos attributes using the Amazon EMR console, the AWS CLI, or the EMR API. The security configuration can also contain other security options, such as encryption. For more information, see [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md).

Use the following references to understand the available security configuration settings for the Kerberos architecture that you choose. Amazon EMR console settings are shown. For corresponding CLI options, see [Specifying Kerberos settings using the AWS CLI](emr-create-security-configuration.md#emr-kerberos-cli-parameters) or [Configuration examples](emr-kerberos-config-examples.md).

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-kerberos-configure-settings.html)

## Kerberos settings for clusters
<a name="emr-kerberos-cluster-configuration"></a>

You can specify Kerberos settings when you create a cluster using the Amazon EMR console, the AWS CLI, or the EMR API.

Use the following references to understand the available cluster configuration settings for the Kerberos architecture that you choose. Amazon EMR console settings are shown. For corresponding CLI options, see [Configuration examples](emr-kerberos-config-examples.md).


| Parameter | Description | 
| --- | --- | 
|  Realm  |  The Kerberos realm name for the cluster. The Kerberos convention is to set this to be the same as the domain name, but in uppercase. For example, for the domain `ec2.internal`, use `EC2.INTERNAL` as the realm name.  | 
|  KDC admin password  |  The password used within the cluster for `kadmin` or `kadmin.local`. These are command-line interfaces to the Kerberos V5 administration system, which maintains Kerberos principals, password policies, and keytabs for the cluster.   | 
|  Cross-realm trust principal password (optional)  |  Required when establishing a cross-realm trust. The cross-realm principal password, which must be identical across realms. Use a strong password.  | 
|  Active Directory domain join user (optional)  |  Required when using Active Directory in a cross-realm trust. This is the user logon name of an Active Directory account with permission to join computers to the domain. Amazon EMR uses this identity to join the cluster to the domain. For more information, see [Step 3: Add accounts to the domain for the EMR Cluster](emr-kerberos-cross-realm.md#emr-kerberos-ad-users).  | 
|  Active Directory domain join password (optional)  |  The password for the Active Directory domain join user. For more information, see [Step 3: Add accounts to the domain for the EMR Cluster](emr-kerberos-cross-realm.md#emr-kerberos-ad-users).  | 
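The realm-name convention in the table above is mechanical enough to script. The following sketch assumes the default `ec2.internal` domain used in `us-east-1`; substitute your VPC's DNS domain as needed.

```shell
# Uppercase the cluster's DNS domain to produce the conventional realm name.
domain="ec2.internal"
realm=$(printf '%s' "$domain" | tr '[:lower:]' '[:upper:]')
echo "$realm"
```

This prints `EC2.INTERNAL`, the realm value used in the examples that follow.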

# Configuration examples
<a name="emr-kerberos-config-examples"></a>

The following examples demonstrate security configurations and cluster configurations for common scenarios. AWS CLI commands are shown for brevity.

## Local KDC
<a name="emr-kerberos-example-local-kdc"></a>

The following commands create a cluster with a cluster-dedicated KDC running on the primary node. Additional configuration on the cluster is required. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name LocalKDCSecurityConfig \
--security-configuration '{"AuthenticationConfiguration": \
{"KerberosConfiguration": {"Provider": "ClusterDedicatedKdc",\
"ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24 }}}}'
```
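A quoting mistake in the inline JSON surfaces only when the service rejects the call. One way to validate the document locally first (a sketch that assumes `python3` is available on your workstation):

```shell
# Validate the security-configuration JSON before calling the AWS CLI.
config='{"AuthenticationConfiguration": {"KerberosConfiguration": {"Provider": "ClusterDedicatedKdc", "ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24}}}}'
echo "$config" | python3 -m json.tool > /dev/null && echo "JSON is valid"
```

If the JSON is malformed, `json.tool` prints the parse error and the `echo` never runs.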

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge \
--applications Name=Hadoop Name=Hive --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole \
--security-configuration LocalKDCSecurityConfig \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=MyPassword
```

## Cluster-dedicated KDC with Active Directory cross-realm trust
<a name="emr-kerberos-example-crossrealm"></a>

The following commands create a cluster with a cluster-dedicated KDC running on the primary node with a cross-realm trust to an Active Directory domain. Additional configuration on the cluster and in Active Directory is required. For more information, see [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name LocalKDCWithADTrustSecurityConfig \
--security-configuration '{"AuthenticationConfiguration": \
{"KerberosConfiguration": {"Provider": "ClusterDedicatedKdc", \
"ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24, \
"CrossRealmTrustConfiguration": {"Realm":"AD.DOMAIN.COM", \
"Domain":"ad.domain.com", "AdminServer":"ad.domain.com", \
"KdcServer":"ad.domain.com"}}}}}'
```

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge --applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole --security-configuration LocalKDCWithADTrustSecurityConfig \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=MyClusterKDCAdminPassword,\
ADDomainJoinUser=ADUserLogonName,ADDomainJoinPassword=ADUserPassword,\
CrossRealmTrustPrincipalPassword=MatchADTrustPassword
```

## External KDC on a different cluster
<a name="emr-kerberos-example-extkdc-cluster"></a>

The following commands create a cluster that references a cluster-dedicated KDC on the primary node of a different cluster to authenticate principals. Additional configuration on the cluster is required. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name ExtKDCOnDifferentCluster \
--security-configuration '{"AuthenticationConfiguration": \
{"KerberosConfiguration": {"Provider": "ExternalKdc", \
"ExternalKdcConfiguration": {"KdcServerType": "Single", \
"AdminServer": "MasterDNSOfKDCMaster:749", \
"KdcServer": "MasterDNSOfKDCMaster:88"}}}}'
```

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge \
--applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole --security-configuration ExtKDCOnDifferentCluster \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=KDCOnMasterPassword
```

## External cluster KDC with Active Directory cross-realm trust
<a name="emr-kerberos-example-extkdc-ad-trust"></a>

The following commands create a cluster with no KDC. The cluster references a cluster-dedicated KDC running on the primary node of another cluster to authenticate principals. That KDC has a cross-realm trust with an Active Directory domain controller. Additional configuration on the primary node with the KDC is required. For more information, see [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name ExtKDCWithADIntegration \
--security-configuration '{"AuthenticationConfiguration": \
{"KerberosConfiguration": {"Provider": "ExternalKdc", \
"ExternalKdcConfiguration": {"KdcServerType": "Single", \
"AdminServer": "MasterDNSofClusterKDC:749", \
"KdcServer": "MasterDNSofClusterKDC:88", \
"AdIntegrationConfiguration": {"AdRealm":"AD.DOMAIN.COM", \
"AdDomain":"ad.domain.com", \
"AdServer":"ad.domain.com"}}}}}'
```

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge --applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole --security-configuration ExtKDCWithADIntegration \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=KDCOnMasterPassword,\
ADDomainJoinUser=MyPrivilegedADUserName,ADDomainJoinPassword=PasswordForADDomainJoinUser
```

# Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections
<a name="emr-kerberos-configuration-users"></a>

Amazon EMR creates Kerberos-authenticated user clients for the applications that run on the cluster—for example, the `hadoop` user, `spark` user, and others. You can also add users who are authenticated to cluster processes using Kerberos. Authenticated users can then connect to the cluster with their Kerberos credentials and work with applications. For a user to authenticate to the cluster, the following configurations are required:
+ A Linux account matching the Kerberos principal in the KDC must exist on the cluster. Amazon EMR does this automatically in architectures that integrate with Active Directory.
+ You must create an HDFS user directory on the primary node for each user, and give the user permissions to the directory.
+ You must configure the SSH service so that GSSAPI is enabled on the primary node. In addition, users must have an SSH client with GSSAPI enabled.

## Adding Linux users and Kerberos principals to the primary node
<a name="emr-kerberos-configure-linux-kdc"></a>

If you do not use Active Directory, you must create Linux accounts on the cluster primary node and add principals for these Linux users to the KDC. In addition to the user principals, the KDC running on the primary node needs a principal for the local host.

When your architecture includes Active Directory integration, Linux users and principals on the local KDC, if applicable, are created automatically. You can skip this step. For more information, see [Cross-realm trust](emr-kerberos-options.md#emr-kerberos-crossrealm-summary) and [External KDC—cluster KDC on a different cluster with Active Directory cross-realm trust](emr-kerberos-options.md#emr-kerberos-extkdc-ad-trust-summary).

**Important**  
The KDC, along with the database of principals, is lost when the primary node terminates because the primary node uses ephemeral storage. If you create users for SSH connections, we recommend that you establish a cross-realm trust with an external KDC configured for high-availability. Alternatively, if you create users for SSH connections using Linux accounts, automate the account creation process using bootstrap actions and scripts so that it can be repeated when you create a new cluster.

The easiest way to add users and KDC principals is to submit a step to the cluster, either during or after cluster creation. Alternatively, you can connect to the primary node using an EC2 key pair as the default `hadoop` user to run the commands. For more information, see [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md).

The following example submits a bash script `configureCluster.sh` to a cluster that already exists, referencing its cluster ID. The script is saved to Amazon S3. 

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://amzn-s3-demo-bucket/configureCluster.sh"]
```

The following example demonstrates the contents of the `configureCluster.sh` script. The script also handles creating HDFS user directories and enabling GSSAPI for SSH, which are covered in the following sections.

```
#!/bin/bash
#Add a principal to the KDC for the primary node, using the primary node's returned host name
sudo kadmin.local -q "ktadd -k /etc/krb5.keytab host/`hostname -f`"
#Declare an associative array of user names and passwords to add
declare -A arr
arr=([lijuan]=pwd1 [marymajor]=pwd2 [richardroe]=pwd3)
for i in ${!arr[@]}; do
    #Assign plain language variables for clarity
     name=${i} 
     password=${arr[${i}]}

     # Create a principal for each user in the primary node and require a new password on first logon
     sudo kadmin.local -q "addprinc -pw $password +needchange $name"

     #Add hdfs directory for each user
     hdfs dfs -mkdir /user/$name

     #Change owner of each user's hdfs directory to that user
     hdfs dfs -chown $name:$name /user/$name
done

# Enable GSSAPI authentication for SSH and restart SSH service
sudo sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' /etc/ssh/sshd_config
sudo sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```

## Adding user HDFS directories
<a name="emr-kerberos-configure-HDFS"></a>

To allow your users to log in to the cluster to run Hadoop jobs, you must add HDFS user directories for their Linux accounts, and grant each user ownership of their directory.

The easiest way to create HDFS directories is to submit a step to the cluster, either during or after cluster creation. Alternatively, you can connect to the primary node using an EC2 key pair as the default `hadoop` user to run the commands. For more information, see [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md).

The following example submits a bash script `AddHDFSUsers.sh` to a cluster that already exists, referencing its cluster ID. The script is saved to Amazon S3. 

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/AddHDFSUsers.sh"]
```

The following example demonstrates the contents of the `AddHDFSUsers.sh` script.

```
#!/bin/bash
# AddHDFSUsers.sh script

# Initialize an array of user names from AD, or Linux users created manually on the cluster
ADUSERS=("lijuan" "marymajor" "richardroe" "myusername")

# For each user listed, create an HDFS user directory
# and change ownership to the user

for username in ${ADUSERS[@]}; do
     hdfs dfs -mkdir /user/$username
     hdfs dfs -chown $username:$username /user/$username
done
```
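To preview what the script will do before submitting it as a step, you can echo the HDFS commands instead of running them. This dry-run variant is a hypothetical illustration, not part of the EMR workflow:

```shell
#!/bin/bash
# Dry run: print the HDFS commands that AddHDFSUsers.sh would issue.
ADUSERS=("lijuan" "marymajor" "richardroe" "myusername")
for username in "${ADUSERS[@]}"; do
     echo "hdfs dfs -mkdir /user/$username"
     echo "hdfs dfs -chown $username:$username /user/$username"
done
```

Run it on any workstation to inspect the two commands generated per user, then submit the real script once the user list is correct.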

## Enabling GSSAPI for SSH
<a name="emr-kerberos-ssh-config"></a>

For Kerberos-authenticated users to connect to the primary node using SSH, the SSH service must have GSSAPI authentication enabled. To enable GSSAPI, run the following commands from the primary node command line or use a step to run it as a script. After reconfiguring SSH, you must restart the service.

```
sudo sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' /etc/ssh/sshd_config
sudo sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```
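Because each `sed` expression rewrites any line that mentions the option, you can rehearse the substitutions against a throwaway copy before editing the live `/etc/ssh/sshd_config` (a sketch that assumes GNU `sed`):

```shell
# Rehearse the GSSAPI substitutions on a temporary copy of sshd_config.
tmpconf=$(mktemp)
printf '#GSSAPIAuthentication no\n#GSSAPICleanupCredentials no\n' > "$tmpconf"
sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' "$tmpconf"
sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' "$tmpconf"
grep '^GSSAPI' "$tmpconf"
rm -f "$tmpconf"
```

The `grep` output should show both options uncommented and set to `yes`.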

# Using SSH to connect to Kerberized clusters with Amazon EMR
<a name="emr-kerberos-connect-ssh"></a>

This section demonstrates the steps for a Kerberos-authenticated user to connect to the primary node of an EMR cluster.

Each computer that is used for an SSH connection must have SSH client and Kerberos client applications installed. Linux computers most likely include these by default. For example, OpenSSH is installed on most Linux, Unix, and macOS operating systems. You can check for an SSH client by typing **ssh** at the command line. If your computer does not recognize the command, install an SSH client to connect to the primary node. The OpenSSH project provides a free implementation of the full suite of SSH tools. For more information, see the [OpenSSH](http://www.openssh.org/) website. Windows users can use applications such as [PuTTY](http://www.chiark.greenend.org.uk/~sgtatham/putty/) as an SSH client. 

For more information about SSH connections, see [Connect to an Amazon EMR cluster](emr-connect-master-node.md).

SSH uses GSSAPI for authenticating Kerberos clients, and you must enable GSSAPI authentication for the SSH service on the cluster primary node. For more information, see [Enabling GSSAPI for SSH](emr-kerberos-configuration-users.md#emr-kerberos-ssh-config). SSH clients must also use GSSAPI.

In the following examples, for *MasterPublicDNS* use the value that appears for **Master public DNS** on the **Summary** tab of the cluster details pane—for example, *ec2-11-222-33-44.compute-1.amazonaws.com*.

## Prerequisite for krb5.conf (non-Active Directory)
<a name="emr-kerberos-conffile"></a>

When using a configuration without Active Directory integration, in addition to the SSH client and Kerberos client applications, each client computer must have a copy of the `/etc/krb5.conf` file that matches the `/etc/krb5.conf` file on the cluster primary node.

**To copy the krb5.conf file**

1. Use SSH to connect to the primary node using an EC2 key pair and the default `hadoop` user—for example, `hadoop@MasterPublicDNS`. For detailed instructions, see [Connect to an Amazon EMR cluster](emr-connect-master-node.md).

1. From the primary node, copy the contents of the `/etc/krb5.conf` file.

1. On each client computer that will connect to the cluster, create an identical `/etc/krb5.conf` file based on the copy that you made in the previous step.

## Using kinit and SSH
<a name="emr-kerberos-kinit-ssh"></a>

Each time a user connects from a client computer using Kerberos credentials, the user must first renew Kerberos tickets for their user on the client computer. In addition, the SSH client must be configured to use GSSAPI authentication.

**To use SSH to connect to a Kerberized EMR cluster**

1. Use `kinit` to renew your Kerberos tickets as shown in the following example.

   ```
   kinit user1
   ```

1. Use an `ssh` client along with the principal that you created in the cluster-dedicated KDC or Active Directory user name. Make sure that GSSAPI authentication is enabled as shown in the following examples.

   **Example: Linux users**

   The `-K` option specifies GSSAPI authentication.

   ```
   ssh -K user1@MasterPublicDNS
   ```

   **Example: Windows users (PuTTY)**

   Make sure that the GSSAPI authentication option for the session is enabled as shown:  
![\[PuTTY Configuration window showing GSSAPI authentication options and library preferences.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-gssapi-putty.png)

# Tutorial: Configure a cluster-dedicated KDC with Amazon EMR
<a name="emr-kerberos-cluster-kdc"></a>

This topic guides you through creating a cluster with a cluster-dedicated *key distribution center (KDC)*, manually adding Linux accounts to all cluster nodes, adding Kerberos principals to the KDC on the primary node, and ensuring that client computers have a Kerberos client installed.

For more information on Amazon EMR support for Kerberos and KDC, as well as links to MIT Kerberos Documentation, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md).

## Step 1: Create the Kerberized cluster
<a name="emr-kerberos-clusterdedicated-cluster"></a>

1. Create a security configuration that enables Kerberos. The following example demonstrates a `create-security-configuration` command using the AWS CLI that specifies the security configuration as an inline JSON structure. You can also reference a file saved locally.

   ```
   aws emr create-security-configuration --name MyKerberosConfig \
   --security-configuration '{"AuthenticationConfiguration": {"KerberosConfiguration": 
   {"Provider": "ClusterDedicatedKdc", "ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24}}}}'
   ```

1. Create a cluster that references the security configuration, establishes Kerberos attributes for the cluster, and adds Linux accounts using a bootstrap action. The following example demonstrates a `create-cluster `command using the AWS CLI. The command references the security configuration that you created above, `MyKerberosConfig`. It also references a simple script, `createlinuxusers.sh`, as a bootstrap action, which you create and upload to Amazon S3 before creating the cluster.

   ```
   aws emr create-cluster --name "MyKerberosCluster" \
   --release-label emr-7.12.0 \
   --instance-type m5.xlarge \
   --instance-count 3 \
   --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair \
   --service-role EMR_DefaultRole \
   --security-configuration MyKerberosConfig \
   --applications Name=Hadoop Name=Hive Name=Oozie Name=Hue Name=HCatalog Name=Spark \
   --kerberos-attributes Realm=EC2.INTERNAL,\
   KdcAdminPassword=MyClusterKDCAdminPwd \
   --bootstrap-actions Path=s3://amzn-s3-demo-bucket/createlinuxusers.sh
   ```

   The following code demonstrates the contents of the `createlinuxusers.sh` script, which adds user1, user2, and user3 to each node in the cluster. In the next step, you add these users as KDC principals.

   ```
   #!/bin/bash
   sudo adduser user1
   sudo adduser user2
   sudo adduser user3
   ```
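Bootstrap actions run on every node, and `adduser` can fail if the account already exists (for example, when the same script is reused across cluster launches). A more defensive sketch guards each call; here `echo` stands in for the real `sudo adduser` so the behavior can be inspected off-cluster:

```shell
#!/bin/bash
# Idempotent sketch: skip accounts that already exist before adding them.
for u in user1 user2 user3; do
    if id "$u" >/dev/null 2>&1; then
        echo "$u already exists; skipping"
    else
        echo "sudo adduser $u"   # replace echo with the real command on the cluster
    fi
done
```

One line prints per user either way, which makes the decision path easy to verify before you upload the script to Amazon S3.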

## Step 2: Add principals to the KDC, create HDFS user directories, and configure SSH
<a name="emr-kerberos-clusterdedicated-KDC"></a>

The KDC running on the primary node needs a principal added for the local host and for each user that you create on the cluster. You may also create HDFS directories for each user if they need to connect to the cluster and run Hadoop jobs. Similarly, configure the SSH service to enable GSSAPI authentication, which is required for Kerberos. After you enable GSSAPI, restart the SSH service.

The easiest way to accomplish these tasks is to submit a step to the cluster. The following example submits a bash script `configurekdc.sh` to the cluster you created in the previous step, referencing its cluster ID. The script is saved to Amazon S3. Alternatively, you can connect to the primary node using an EC2 key pair to run the commands or submit the step during cluster creation.

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://amzn-s3-demo-bucket/configurekdc.sh"]
```

The following code demonstrates the contents of the `configurekdc.sh` script.

```
#!/bin/bash
#Add a principal to the KDC for the primary node, using the primary node's returned host name
sudo kadmin.local -q "ktadd -k /etc/krb5.keytab host/`hostname -f`"
#Declare an associative array of user names and passwords to add
declare -A arr
arr=([user1]=pwd1 [user2]=pwd2 [user3]=pwd3)
for i in ${!arr[@]}; do
    #Assign plain language variables for clarity
     name=${i} 
     password=${arr[${i}]}

     # Create a principal for each user in the primary node and require a new password on first logon
     sudo kadmin.local -q "addprinc -pw $password +needchange $name"

     #Add user hdfs directory
     hdfs dfs -mkdir /user/$name

     #Change owner of user's hdfs directory to user
     hdfs dfs -chown $name:$name /user/$name
done

# Enable GSSAPI authentication for SSH and restart SSH service
sudo sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' /etc/ssh/sshd_config
sudo sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```

The users that you added should now be able to connect to the cluster using SSH. For more information, see [Using SSH to connect to Kerberized clusters with Amazon EMR](emr-kerberos-connect-ssh.md).

# Tutorial: Configure a cross-realm trust with an Active Directory domain
<a name="emr-kerberos-cross-realm"></a>

When you set up a cross-realm trust, you allow principals (usually users) from a different Kerberos realm to authenticate to application components on the EMR cluster. The cluster-dedicated *key distribution center (KDC)* establishes a trust relationship with another KDC using a *cross-realm principal* that exists in both KDCs. The principal name and the password match precisely.
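Because the trust rests entirely on this shared secret, generate the password rather than choosing it by hand. A minimal sketch, assuming `openssl` is installed (any strong generator works):

```shell
# 18 random bytes encode to a 24-character base64 password with no padding.
trust_pw=$(openssl rand -base64 18)
echo "${#trust_pw}"
```

Store the value securely; you enter the same string on the Active Directory side and as the cluster's cross-realm trust principal password.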

A cross-realm trust requires that the KDCs can reach one another over the network and resolve each other's domain names. Steps for establishing a cross-realm trust relationship with a Microsoft AD domain controller running as an EC2 instance are provided below, along with an example network setup that provides the required connectivity and domain-name resolution. Any network setup that allows the required network traffic between KDCs is acceptable.

Optionally, after you establish a cross-realm trust with Active Directory using a KDC on one cluster, you can create another cluster using a different security configuration to reference the KDC on the first cluster as an external KDC. For an example security configuration and cluster set up, see [External cluster KDC with Active Directory cross-realm trust](emr-kerberos-config-examples.md#emr-kerberos-example-extkdc-ad-trust).

For more information on Amazon EMR support for Kerberos and KDC, as well as links to MIT Kerberos Documentation, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md).

**Important**  
Amazon EMR does not support cross-realm trusts with AWS Directory Service for Microsoft Active Directory.

[Step 1: Set up the VPC and subnet](#emr-kerberos-ad-network)

[Step 2: Launch and install the Active Directory domain controller](#emr-kerberos-ad-dc)

[Step 3: Add accounts to the domain for the EMR Cluster](#emr-kerberos-ad-users)

[Step 4: Configure an incoming trust on the Active Directory domain controller](#emr-kerberos-ad-configure-trust)

[Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server](#emr-kerberos-ad-DHCP)

[Step 6: Launch a Kerberized EMR Cluster](#emr-kerberos-ad-cluster)

[Step 7: Create HDFS users and set permissions on the cluster for Active Directory accounts](#emr-kerberos-ad-hadoopuser)

## Step 1: Set up the VPC and subnet
<a name="emr-kerberos-ad-network"></a>

The following steps demonstrate creating a VPC and subnet so that the cluster-dedicated KDC can reach the Active Directory domain controller and resolve its domain name. In these steps, domain-name resolution is provided by referencing the Active Directory domain controller as the domain name server in the DHCP option set. For more information, see [Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server](#emr-kerberos-ad-DHCP).

The KDC and the Active Directory domain controller must be able to resolve each other's domain names. This allows Amazon EMR to join computers to the domain and automatically configure corresponding Linux accounts and SSH parameters on cluster instances. 

If Amazon EMR can't resolve the domain name, you can reference the trust using the Active Directory domain controller's IP address. However, you must manually add Linux accounts, add corresponding principals to the cluster-dedicated KDC, and configure SSH.

**To set up the VPC and subnet**

1. Create an Amazon VPC with a single public subnet. For more information, see [Step 1: Create the VPC](https://docs.aws.amazon.com/AmazonVPC/latest/GettingStartedGuide/getting-started-ipv4.html#getting-started-create-vpc) in the *Amazon VPC Getting Started Guide*.
**Important**  
When you use a Microsoft Active Directory domain controller, choose a CIDR block for the EMR cluster so that all IPv4 addresses are fewer than nine characters in length (for example, 10.0.0.0/16). This is because the DNS names of cluster computers are used when the computers join the Active Directory domain. AWS assigns [DNS hostnames](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-hostnames) based on the IPv4 address, so longer IP addresses can produce DNS names longer than 15 characters. Active Directory has a 15-character limit for registering joined computer names, and truncates longer names, which can cause unpredictable errors.

1. Remove the default DHCP option set assigned to the VPC. For more information, see [Changing a VPC to use No DHCP options](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_DHCP_Options.html#DHCP_Use_No_Options). Later on, you add a new one that specifies the Active Directory domain controller as the DNS server. 

1. Confirm that DNS support is enabled for the VPC, that is, that DNS Hostnames and DNS Resolution are both enabled. They are enabled by default. For more information, see [Updating DNS support for your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-updating).

1. Confirm that your VPC has an internet gateway attached, which is the default. For more information, see [Creating and attaching an internet gateway](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html#Add_IGW_Attach_Gateway).
**Note**  
An internet gateway is used in this example because you are establishing a new domain controller for the VPC. An internet gateway may not be required for your application. The only requirement is that the cluster-dedicated KDC can access the Active Directory domain controller.

1. Create a custom route table, add a route that targets the Internet Gateway, and then attach it to your subnet. For more information, see [Create a custom route table](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html#Add_IGW_Routing).

1. When you launch the EC2 instance for the domain controller, it must have a static public IPv4 address for you to connect to it using RDP. The easiest way to do this is to configure your subnet to auto-assign public IPv4 addresses. This is not the default setting when a subnet is created. For more information, see [Modifying the public IPv4 addressing attribute of your subnet](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-ip-addressing.html#subnet-public-ip). Optionally, you can assign the address when you launch the instance. For more information, see [Assigning a public IPv4 address during instance launch](https://docs.aws.amazon.com/vpc/latest/userguide/using-instance-addressing.html#public-ip-addresses).

1. When you finish, make a note of your VPC and subnet IDs. You use them later when you launch the Active Directory domain controller and the cluster.
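The computer-name constraint from the first step can be checked ahead of time. Private DNS hostnames follow the `ip-<dashed-IPv4>` pattern, so the name length that Active Directory sees is easy to compute (an illustrative sketch; verify the pattern against your VPC's actual hostnames):

```shell
# Compute the computer-name length that Active Directory will register.
ip="10.0.0.5"
name="ip-$(printf '%s' "$ip" | tr '.' '-')"
echo "$name has ${#name} characters"
```

Here the result is `ip-10-0-0-5 has 11 characters`, safely under the 15-character limit; an address such as `192.168.100.200` would produce an 18-character name that Active Directory truncates.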

## Step 2: Launch and install the Active Directory domain controller
<a name="emr-kerberos-ad-dc"></a>

1. Launch an EC2 instance based on the Microsoft Windows Server 2016 Base AMI. We recommend an m4.xlarge or better instance type. For more information, see [Launching an AWS Marketplace instance](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/launch-marketplace-console.html) in the *Amazon EC2 User Guide*.

1. Make a note of the Group ID of the security group associated with the EC2 instance. You need it for [Step 6: Launch a Kerberized EMR Cluster](#emr-kerberos-ad-cluster). We use *sg-012xrlmdomain345*. Alternatively, you can specify different security groups for the EMR cluster and this instance that allows traffic between them. For more information, see [Amazon EC2 security groups for Linux instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html) in the *Amazon EC2 User Guide*.

1. Connect to the EC2 instance using RDP. For more information, see [Connecting to your Windows instance](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/connecting_to_windows_instance.html) in the *Amazon EC2 User Guide*.

1. Start **Server Manager** to install and configure the Active Directory Domain Services role on the server. Promote the server to a domain controller and assign a domain name (the example we use here is `ad.domain.com`). Make a note of the domain name because you need it later when you create the EMR security configuration and cluster. If you are new to setting up Active Directory, you can follow the instructions in [How to set up Active Directory (AD) in Windows Server 2016](https://ittutorials.net/microsoft/windows-server-2016/setting-up-active-directory-ad-in-windows-server-2016/).

   The instance restarts when you finish.

## Step 3: Add accounts to the domain for the EMR Cluster
<a name="emr-kerberos-ad-users"></a>

RDP to the Active Directory domain controller to create accounts in Active Directory Users and Computers for each cluster user. For more information, see [Create a User Account in Active Directory Users and Computers](https://technet.microsoft.com/en-us/library/dd894463(v=ws.10).aspx) on the *Microsoft Learn* site. Make a note of each user's **User logon name**. You need these later when you configure the cluster. 

In addition, create an account with sufficient privileges to join computers to the domain. You specify this account when you create a cluster. Amazon EMR uses it to join cluster instances to the domain. You specify this account and its password in [Step 6: Launch a Kerberized EMR Cluster](#emr-kerberos-ad-cluster). To delegate computer join privileges to the account, we recommend that you create a group with join privileges and then assign the user to the group. For instructions, see [Delegating directory join privileges](https://docs.aws.amazon.com/directoryservice/latest/admin-guide/directory_join_privileges.html) in the *AWS Directory Service Administration Guide*.

## Step 4: Configure an incoming trust on the Active Directory domain controller
<a name="emr-kerberos-ad-configure-trust"></a>

The example commands below create a one-way, incoming, non-transitive realm trust in Active Directory with the cluster-dedicated KDC. The example realm for the cluster is `EC2.INTERNAL`. Replace *KDC-FQDN* with the **Public DNS** name listed for the Amazon EMR primary node that hosts the KDC. The `passwordt` parameter specifies the **cross-realm principal password**, which you specify along with the cluster **realm** when you create a cluster. The realm name is derived from the cluster's default domain name, which is `ec2.internal` in `us-east-1`. `Domain` is the Active Directory domain in which you are creating the trust, which is lowercase by convention. The example uses `ad.domain.com`.

Open the Windows command prompt with administrator privileges and type the following commands to create the trust relationship on the Active Directory domain controller:

```
C:\Users\Administrator> ksetup /addkdc EC2.INTERNAL KDC-FQDN
C:\Users\Administrator> netdom trust EC2.INTERNAL /Domain:ad.domain.com /add /realm /passwordt:MyVeryStrongPassword
C:\Users\Administrator> ksetup /SetEncTypeAttr EC2.INTERNAL AES256-CTS-HMAC-SHA1-96
```

## Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server
<a name="emr-kerberos-ad-DHCP"></a>

Now that the Active Directory domain controller is configured, you must configure the VPC to use it as a domain name server for name resolution within your VPC. To do this, attach a DHCP options set. Specify **Domain name** as the domain name of your cluster, for example, `ec2.internal` if your cluster is in `us-east-1`, or `region.compute.internal` for other Regions. For **Domain name servers**, you must specify the IP address of the Active Directory domain controller (which must be reachable from the cluster) as the first entry, followed by **AmazonProvidedDNS** (for example, ***xx.xx.xx.xx*,AmazonProvidedDNS**). For more information, see [Changing DHCP option sets](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_DHCP_Options.html#DHCPOptions).
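If you use the AWS CLI, you can create and attach the DHCP options set with commands like the following sketch. The domain controller IP address (*10.0.0.10*) and the options set and VPC IDs are placeholder values; replace them with your own.

```
aws ec2 create-dhcp-options \
--dhcp-configurations "Key=domain-name,Values=ec2.internal" \
"Key=domain-name-servers,Values=10.0.0.10,AmazonProvidedDNS"

aws ec2 associate-dhcp-options \
--dhcp-options-id dopt-1a2b3c4d \
--vpc-id vpc-1a2b3c4d
```

The first command returns a `DhcpOptionsId`, which the second command associates with your VPC.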

## Step 6: Launch a Kerberized EMR Cluster
<a name="emr-kerberos-ad-cluster"></a>

1. In Amazon EMR, create a security configuration that specifies the Active Directory domain controller you created in the previous steps. An example command is shown below. Replace the domain, `ad.domain.com`, with the name of the domain you specified in [Step 2: Launch and install the Active Directory domain controller](#emr-kerberos-ad-dc).

   ```
   aws emr create-security-configuration --name MyKerberosConfig \
   --security-configuration '{
     "AuthenticationConfiguration": {
       "KerberosConfiguration": {
         "Provider": "ClusterDedicatedKdc",
         "ClusterDedicatedKdcConfiguration": {
           "TicketLifetimeInHours": 24,
           "CrossRealmTrustConfiguration": {
             "Realm": "AD.DOMAIN.COM",
             "Domain": "ad.domain.com",
             "AdminServer": "ad.domain.com",
             "KdcServer": "ad.domain.com"
           }
         }
       }
     }
   }'
   ```

1. Create the cluster with the following attributes:
   + Use the `--security-configuration` option to specify the security configuration that you created. We use *MyKerberosConfig* in the example.
   + Use the `SubnetId` property of the `--ec2-attributes option` to specify the subnet that you created in [Step 1: Set up the VPC and subnet](#emr-kerberos-ad-network). We use *step1-subnet* in the example.
   + Use the `AdditionalMasterSecurityGroups` and `AdditionalSlaveSecurityGroups` of the `--ec2-attributes` option to specify that the security group associated with the AD domain controller from [Step 2: Launch and install the Active Directory domain controller](#emr-kerberos-ad-dc) is associated with the cluster primary node as well as core and task nodes. We use *sg-012xrlmdomain345* in the example.

   Use `--kerberos-attributes` to specify the following cluster-specific Kerberos attributes:
   + The realm for the cluster that you specified when you set up the Active Directory domain controller.
   + The cross-realm trust principal password that you specified as `passwordt` in [Step 4: Configure an incoming trust on the Active Directory domain controller](#emr-kerberos-ad-configure-trust).
   + A `KdcAdminPassword`, which you can use to administer the cluster-dedicated KDC.
   + The user logon name and password of the Active Directory account with computer join privileges that you created in [Step 3: Add accounts to the domain for the EMR Cluster](#emr-kerberos-ad-users).

   The following example launches a Kerberized cluster.

   ```
   aws emr create-cluster --name "MyKerberosCluster" \
   --release-label emr-5.10.0 \
   --instance-type m5.xlarge \
   --instance-count 3 \
   --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair,\
   SubnetId=step1-subnet,AdditionalMasterSecurityGroups=sg-012xrlmdomain345,\
   AdditionalSlaveSecurityGroups=sg-012xrlmdomain345 \
   --service-role EMR_DefaultRole \
   --security-configuration MyKerberosConfig \
   --applications Name=Hadoop Name=Hive Name=Oozie Name=Hue Name=HCatalog Name=Spark \
   --kerberos-attributes Realm=EC2.INTERNAL,\
   KdcAdminPassword=MyClusterKDCAdminPwd,\
   ADDomainJoinUser=ADUserLogonName,ADDomainJoinPassword=ADUserPassword,\
   CrossRealmTrustPrincipalPassword=MatchADTrustPwd
   ```

## Step 7: Create HDFS users and set permissions on the cluster for Active Directory accounts
<a name="emr-kerberos-ad-hadoopuser"></a>

When setting up a trust relationship with Active Directory, Amazon EMR creates Linux users on the cluster for each Active Directory account. For example, the user logon name `LiJuan` in Active Directory corresponds to the Linux account `lijuan`. Active Directory user names can contain uppercase letters, but the Linux user names that Amazon EMR creates are always lowercase.
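The mapping from an Active Directory logon name to its Linux account is a simple lowercasing, which you can reproduce in the shell (`LiJuan` here is the hypothetical logon name from the example above):

```
# Lowercase an AD logon name to get the corresponding Linux user name
AD_LOGON="LiJuan"
LINUX_USER=$(printf '%s' "$AD_LOGON" | tr '[:upper:]' '[:lower:]')
echo "$LINUX_USER"    # prints lijuan
```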

To allow your users to log in to the cluster to run Hadoop jobs, you must add HDFS user directories for their Linux accounts, and grant each user ownership of their directory. To do this, we recommend that you run a script saved to Amazon S3 as a cluster step. Alternatively, you can run the commands in the script below from the command line on the primary node. Use the EC2 key pair that you specified when you created the cluster to connect to the primary node over SSH as the Hadoop user. For more information, see [Use an EC2 key pair for SSH credentials for Amazon EMR](emr-plan-access-ssh.md).

Run the following command to add a step to the cluster that runs a script, *AddHDFSUsers.sh*.

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/AddHDFSUsers.sh"]
```

The contents of the file *AddHDFSUsers.sh* are as follows.

```
#!/bin/bash
# AddHDFSUsers.sh script

# Initialize an array of user names from AD or Linux users and KDC principals created manually on the cluster
ADUSERS=("lijuan" "marymajor" "richardroe" "myusername")

# For each user listed, create an HDFS user directory
# and change ownership to the user

for username in "${ADUSERS[@]}"; do
     hdfs dfs -mkdir "/user/$username"
     hdfs dfs -chown "$username:$username" "/user/$username"
done
```

### Active Directory groups mapped to Hadoop groups
<a name="emr-kerberos-ad-group"></a>

Amazon EMR uses the System Security Services Daemon (SSSD) to map Active Directory groups to Hadoop groups. To confirm group mappings, log in to the primary node as described in [Using SSH to connect to Kerberized clusters with Amazon EMR](emr-kerberos-connect-ssh.md), then use the `hdfs groups` command to confirm that the Active Directory groups to which your Active Directory account belongs have been mapped to Hadoop groups for the corresponding Hadoop user on the cluster. You can also check other users' group mappings by specifying one or more user names with the command, for example `hdfs groups lijuan`. For more information, see [groups](https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#groups) in the [Apache HDFS Commands Guide](https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html).

# Use Active Directory or LDAP servers for authentication with Amazon EMR
<a name="ldap"></a>

With Amazon EMR releases 6.12.0 and higher, you can use the LDAP over SSL (LDAPS) protocol to launch a cluster that natively integrates with your corporate identity server. LDAP (Lightweight Directory Access Protocol) is an open, vendor-neutral application protocol for accessing and maintaining directory information. LDAP is commonly used for user authentication against corporate identity servers such as Active Directory (AD) and OpenLDAP. With this native integration, you can use your LDAP server to authenticate users on Amazon EMR.

Highlights of the Amazon EMR LDAP integration include:
+ Amazon EMR configures the supported applications to authenticate with LDAP authentication on your behalf.
+ Amazon EMR configures and maintains security for the supported applications with the Kerberos protocol. You don't need to input any commands or scripts.
+ You get fine-grained access control (FGAC) through Apache Ranger authorization for Hive Metastore database and tables. See [Integrate Amazon EMR with Apache Ranger](emr-ranger.md) for more information.
+ When you require LDAP credentials to access a cluster, you get fine-grained access control (FGAC) over who can access your EMR clusters through SSH.

The following pages provide a conceptual overview, prerequisites, and steps to launch an EMR cluster with the Amazon EMR LDAP integration.

**Topics**
+ [Overview of LDAP with Amazon EMR](ldap-overview.md)
+ [LDAP components for Amazon EMR](ldap-components.md)
+ [Application support and considerations with LDAP for Amazon EMR](ldap-considerations.md)
+ [Configure and launch an EMR cluster with LDAP](ldap-setup.md)
+ [Examples using LDAP with Amazon EMR](ldap-examples.md)

# Overview of LDAP with Amazon EMR
<a name="ldap-overview"></a>

Lightweight Directory Access Protocol (LDAP) is a software protocol that network administrators use to manage and control access to data by authenticating users within a company’s network. The LDAP protocol stores information in a hierarchical, tree directory structure. For more information, see [Basic LDAP Concepts](https://ldap.com/basic-ldap-concepts/) on *LDAP.com*.

Within a company’s network, many applications might use the LDAP protocol to authenticate users. With the Amazon EMR LDAP integration, EMR clusters can natively use the same LDAP protocol with an added security configuration.

There are two major implementations of the LDAP protocol that Amazon EMR supports: **Active Directory** and **OpenLDAP**. Other implementations are possible, but most follow the same authentication patterns as Active Directory or OpenLDAP.

## Active Directory (AD)
<a name="ldap-ad"></a>

Active Directory (AD) is a directory service from Microsoft for Windows domain networks. AD is included on most Windows Server operating systems, and can communicate with clients over the LDAP and LDAPS protocols. For authentication, Amazon EMR attempts a user-bind with your AD instance with the User Principal Name (UPN) as the distinguished name and password. The UPN uses the standard format `username@domain_name`.

## OpenLDAP
<a name="ldap-openldap"></a>

OpenLDAP is a free, open-source implementation of the LDAP protocol. For authentication, Amazon EMR attempts a user-bind with your OpenLDAP instance with the fully qualified domain name (FQDN) as the distinguished name and password. The FQDN uses the standard format `username_attribute=username,LDAP_user_search_base`. Commonly, the `username_attribute` value is `uid`, and the `LDAP_user_search_base` value contains the attributes of the tree that leads to the user. For example, `ou=People,dc=example,dc=com`.

Other free and open-source implementations of the LDAP protocol typically follow a similar FQDN as OpenLDAP for the distinguished names of their users. 
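To make the two bind formats concrete, the following sketch builds an OpenLDAP-style distinguished name and an Active Directory-style UPN. The values (`lijuan`, `uid`, `ou=People,dc=example,dc=com`, and `ad.domain.com`) are the illustrative values used elsewhere in this guide; substitute your own.

```
USERNAME="lijuan"

# OpenLDAP: username_attribute=username,LDAP_user_search_base
USERNAME_ATTRIBUTE="uid"
USER_SEARCH_BASE="ou=People,dc=example,dc=com"
OPENLDAP_DN="${USERNAME_ATTRIBUTE}=${USERNAME},${USER_SEARCH_BASE}"
echo "$OPENLDAP_DN"    # uid=lijuan,ou=People,dc=example,dc=com

# Active Directory: username@domain_name
AD_DOMAIN="ad.domain.com"
AD_UPN="${USERNAME}@${AD_DOMAIN}"
echo "$AD_UPN"         # lijuan@ad.domain.com
```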

# LDAP components for Amazon EMR
<a name="ldap-components"></a>

You can use your LDAP server to authenticate users with Amazon EMR, and with any applications that they use directly on the EMR cluster, through the following components. 

**Secret Agent**  
The *Secret Agent* is an on-cluster process that authenticates all user requests. The Secret Agent creates the user bind to your LDAP server on behalf of the supported applications on the EMR cluster. The Secret Agent runs as the `emrsecretagent` user, and it writes logs to the `/emr/secretagent/log` directory. These logs provide details about the state of each user's authentication request and any errors that might surface during user authentication.

**System Security Services Daemon (SSSD)**  
*SSSD* is a daemon that runs on each node of an LDAP-enabled EMR cluster. SSSD creates and manages a UNIX user to sync your remote corporate identity to each node. YARN-based applications such as Hive and Spark require that a local UNIX user exists on every node that runs a query for a user.

# Application support and considerations with LDAP for Amazon EMR
<a name="ldap-considerations"></a>

This topic lists supported applications, supported features, and unsupported features.

## Supported applications with LDAP for Amazon EMR
<a name="ldap-considerations-apps"></a>

**Important**  
The applications listed on this page are the only applications that Amazon EMR supports for LDAP. To ensure cluster security, you can only include LDAP-compatible applications when you create an EMR cluster with LDAP enabled. If you attempt to install other, unsupported applications, Amazon EMR will reject your request for a new cluster.

Amazon EMR releases 6.12 and higher support LDAP integration with the following applications:
+ Apache Livy
+ Apache Hive through HiveServer2 (HS2)
+ Trino
+ Presto
+ Hue

You can also install the following applications on an EMR cluster and configure them to meet your security needs:
+ Apache Spark
+ Apache Hadoop

## Supported features with LDAP for Amazon EMR
<a name="ldap-considerations-features"></a>

You can use the following Amazon EMR features with the LDAP integration:

**Note**  
To keep LDAP credentials secure, you must use in-transit encryption to secure the flow of data on and off the cluster. For more information about in-transit encryption, see [Encrypt data at rest and in transit with Amazon EMR](emr-data-encryption.md).
+ Encryption in transit (required) and at rest
+ Instance groups, instance fleets, and Spot Instances
+ Reconfiguration of applications on a running cluster
+ EMRFS server-side encryption (SSE)

## Unsupported features
<a name="ldap-considerations-limitations"></a>

Consider the following limitations when you use the Amazon EMR LDAP integration:
+ Amazon EMR disables steps for clusters with LDAP enabled.
+ Amazon EMR doesn't support runtime roles and AWS Lake Formation integrations for clusters with LDAP enabled.
+ Amazon EMR doesn't support LDAP with StartTLS.
+ Amazon EMR doesn't support high-availability mode (clusters with multiple primary nodes) for clusters with LDAP enabled.
+ You can't rotate bind credentials or certificates for clusters with LDAP enabled. If any of those fields were rotated, we recommend that you start a new cluster with the updated bind credentials or certificates.
+ You must use exact search bases with LDAP. The LDAP user and group search base doesn't support LDAP search filters.

# Configure and launch an EMR cluster with LDAP
<a name="ldap-setup"></a>

This section covers how to configure Amazon EMR for use with LDAP authentication.

**Topics**
+ [Add AWS Secrets Manager permissions to the Amazon EMR instance role](ldap-setup-asm.md)
+ [Create the Amazon EMR security configuration for LDAP integration](ldap-setup-security.md)
+ [Launch an EMR cluster that authenticates with LDAP](ldap-setup-launch.md)

# Add AWS Secrets Manager permissions to the Amazon EMR instance role
<a name="ldap-setup-asm"></a>

Amazon EMR uses an IAM service role to perform actions on your behalf to provision and manage clusters. The service role for cluster EC2 instances, also called *the EC2 instance profile for Amazon EMR*, is a special type of service role that Amazon EMR assigns to every EC2 instance in a cluster at launch.

To define permissions for an EMR cluster to interact with Amazon S3 data and other AWS services, define a custom Amazon EC2 instance profile instead of the `EMR_EC2_DefaultRole` when you launch your cluster. For more information, see [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md) and [Customize IAM roles with Amazon EMR](emr-iam-roles-custom.md).

Add the following statements to the default EC2 instance profile to allow Amazon EMR to tag sessions and access the AWS Secrets Manager secrets that store the LDAP certificates.

```
    {
      "Sid": "AllowAssumeOfRolesAndTagging",
      "Effect": "Allow",
      "Action": ["sts:TagSession", "sts:AssumeRole"],
      "Resource": [
        "arn:aws:iam::111122223333:role/LDAP_DATA_ACCESS_ROLE_NAME",
        "arn:aws:iam::111122223333:role/LDAP_USER_ACCESS_ROLE_NAME"
      ]
    },
    {
        "Sid": "AllowSecretsRetrieval",
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": [
            "arn:aws:secretsmanager:us-east-1:111122223333:secret:LDAP_SECRET_NAME*",
            "arn:aws:secretsmanager:us-east-1:111122223333:secret:ADMIN_LDAP_SECRET_NAME*"
        ]
    }
```

**Note**  
Your cluster requests will fail if you forget the wildcard `*` character at the end of the secret name when you set Secrets Manager permissions. The wildcard represents the secret versions.  
You should also limit the scope of the AWS Secrets Manager policy to only the certificates that your cluster needs to provision instances.

# Create the Amazon EMR security configuration for LDAP integration
<a name="ldap-setup-security"></a>

Before you can launch an EMR cluster with LDAP integration, use the steps in [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md) to create an Amazon EMR security configuration for the cluster. Complete the following configurations in the `LDAPConfiguration` block under `AuthenticationConfiguration`, or in the corresponding fields in the Amazon EMR console **Security Configurations** section:

**`EnableLDAPAuthentication`**  
Console option: **Authentication protocol: LDAP**  
To use the LDAP integration, set this option to `true` or select it as your authentication protocol when you create a cluster in the console. By default, `EnableLDAPAuthentication` is `true` when you create a security configuration in the Amazon EMR console.

**`LDAPServerURL`**  
Console option: **LDAP server location**  
The location of the LDAP server including the prefix: `ldaps://location_of_server`.

**`BindCertificateARN`**  
Console option: **LDAP SSL certificate**  
The AWS Secrets Manager ARN that contains the certificate used to sign the SSL certificate that the LDAP server uses. If your LDAP server is signed by a public Certificate Authority (CA), you can provide an AWS Secrets Manager ARN with a blank file. For more information on how to store your certificate in Secrets Manager, see [Store TLS certificates in AWS Secrets Manager](emr-ranger-tls-certificates.md).

**`BindCredentialsARN`**  
Console option: **LDAP server bind credentials**  
An AWS Secrets Manager ARN that contains the LDAP admin user bind credentials. The credentials are stored as a JSON object. There is only one key-value pair in this secret; the key in the pair is the username, and the value is the password. For example, `{"uid=admin,cn=People,dc=example,dc=com": "AdminPassword1"}`. This is an optional field unless you enable SSH login for your EMR cluster. In many configurations, Active Directory instances require bind credentials to allow SSSD to sync users.

**`LDAPAccessFilter`**  
Console option: **LDAP access filter**  
Specifies the subset of objects within your LDAP server that can authenticate. For example, to grant access to all users with the `posixAccount` object class in your LDAP server, define the access filter as `(objectClass=posixAccount)`.

**`LDAPUserSearchBase`**  
Console option: **LDAP user search base**  
The search base under which your users reside in your LDAP server. For example, `cn=People,dc=example,dc=com`.

**`LDAPGroupSearchBase`**  
Console option: **LDAP group search base**  
The search base under which your groups reside in your LDAP server. For example, `cn=Groups,dc=example,dc=com`.

**`EnableSSHLogin`**  
Console option: **SSH login**  
Specifies whether to allow password authentication with LDAP credentials. We don't recommend that you enable this option, because key pairs are a more secure way to access EMR clusters. This field is optional and defaults to `false`. 

**`LDAPServerType`**  
Console option: **LDAP server type**  
Specifies the type of LDAP server that Amazon EMR connects to. Supported options are Active Directory and OpenLDAP. Other LDAP server types might work, but Amazon EMR doesn't officially support other server types. For more information, see [LDAP components for Amazon EMR](ldap-components.md).

**`ActiveDirectoryConfigurations`**  
A required sub-block for security configurations that use the Active Directory server type.

**`ADDomain`**  
Console option: **Active Directory domain**  
The domain name used to create the User Principal Name (UPN) for user authentication with security configurations that use the Active Directory server type.
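Putting the fields above together, a security configuration for an Active Directory server might look like the following sketch. All values are placeholders, the exact string expected for `LDAPServerType` is an assumption, and the required in-transit encryption configuration is omitted for brevity.

```
aws emr create-security-configuration --name MyLDAPConfig \
--security-configuration '{
  "AuthenticationConfiguration": {
    "LDAPConfiguration": {
      "EnableLDAPAuthentication": true,
      "LDAPServerURL": "ldaps://ldap.example.com",
      "LDAPServerType": "ActiveDirectory",
      "BindCertificateARN": "arn:aws:secretsmanager:us-east-1:111122223333:secret:LDAP_SECRET_NAME",
      "BindCredentialsARN": "arn:aws:secretsmanager:us-east-1:111122223333:secret:ADMIN_LDAP_SECRET_NAME",
      "LDAPAccessFilter": "(objectClass=posixAccount)",
      "LDAPUserSearchBase": "cn=People,dc=example,dc=com",
      "LDAPGroupSearchBase": "cn=Groups,dc=example,dc=com",
      "EnableSSHLogin": false,
      "ActiveDirectoryConfigurations": {
        "ADDomain": "ad.domain.com"
      }
    }
  }
}'
```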

## Considerations for security configurations with LDAP and Amazon EMR
<a name="ldap-setup-security-considerations"></a>
+ To create a security configuration with Amazon EMR LDAP integration, you must use in-transit encryption. For information about in-transit encryption, see [Encrypt data at rest and in transit with Amazon EMR](emr-data-encryption.md).
+ You can't define a Kerberos configuration in the same security configuration. Amazon EMR automatically provisions a KDC that is dedicated to the cluster, and manages the admin password for this KDC. Users can't access this admin password.
+ You can't define IAM runtime roles and AWS Lake Formation in the same security configuration.
+ The `LDAPServerURL` must have the `ldaps://` protocol in its value.
+ The `LDAPAccessFilter` can't be empty. 

## Use LDAP with the Apache Ranger integration for Amazon EMR
<a name="ldap-setup-ranger"></a>

With the LDAP integration for Amazon EMR, you can further integrate with Apache Ranger. When you pull your LDAP users into Ranger, you can then associate those users with an Apache Ranger policy server to integrate with Amazon EMR and other applications. To do this, define the `RangerConfiguration` field within `AuthorizationConfiguration` in the security configuration that you use with your LDAP cluster. For more information on how to set up the security configuration, see [Create the EMR security configuration](emr-ranger-security-config.md).

When you use LDAP with Amazon EMR, you don't need to provide a `KerberosConfiguration` with the Amazon EMR integration for Apache Ranger. 

# Launch an EMR cluster that authenticates with LDAP
<a name="ldap-setup-launch"></a>

Use the following steps to launch an EMR cluster with LDAP or Active Directory. 

1. Set up your environment:
   + Make sure that the nodes on your EMR cluster can communicate with Amazon S3 and AWS Secrets Manager. For more information on how to modify your EC2 instance profile role to communicate with these services, see [Add AWS Secrets Manager permissions to the Amazon EMR instance role](ldap-setup-asm.md).
   + If you plan to run your EMR cluster in a private subnet, you should use AWS PrivateLink and Amazon VPC endpoints, or use network address translation (NAT) to configure the VPC to communicate with S3 and Secrets Manager. For more information, see [AWS PrivateLink and VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/endpoint-services-overview.html) and [NAT instances](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Instance.html) in the *Amazon VPC Getting Started Guide*.
   + Make sure that there is network connectivity between your EMR cluster and the LDAP server. Your EMR clusters must access your LDAP server over the network. The primary, core, and task nodes for the cluster communicate with the LDAP server to sync user data. If your LDAP server runs on Amazon EC2, update the EC2 security group to accept traffic from the EMR cluster. For more information, see [Add AWS Secrets Manager permissions to the Amazon EMR instance role](ldap-setup-asm.md).

1. Create an Amazon EMR security configuration for the LDAP integration. For more information, see [Create the Amazon EMR security configuration for LDAP integration](ldap-setup-security.md).

1. Now that you're set up, use the steps in [Launch an Amazon EMR cluster](emr-gs.md#emr-getting-started-launch-sample-cluster) to launch your cluster with the following configurations:
   + Select Amazon EMR release 6.12 or higher. We recommend that you use the latest Amazon EMR release.
   + Only specify or select applications for your cluster that support LDAP. For a list of LDAP-supported applications with Amazon EMR, see [Application support and considerations with LDAP for Amazon EMR](ldap-considerations.md).
   + Apply the security configuration that you created in the previous step.

# Examples using LDAP with Amazon EMR
<a name="ldap-examples"></a>

Once you [provision an EMR cluster that uses LDAP](ldap-setup-launch.md) integration, you can provide your LDAP credentials to any [supported application](ldap-considerations.md#ldap-considerations-apps) through its built-in username and password authentication mechanism. This page shows some examples.

## Using LDAP authentication with Apache Hive
<a name="ldap-examples-"></a>

**Example - Apache Hive**  
The following example command starts an Apache Hive session through HiveServer2 and Beeline:  

```
beeline -u "jdbc:hive2://$HOSTNAME:10000/default;ssl=true;sslTrustStore=$TRUSTSTORE_PATH;trustStorePassword=$TRUSTSTORE_PASS"  -n LDAP_USERNAME -p LDAP_PASSWORD
```

## Using LDAP authentication with Apache Livy
<a name="ldap-examples-livy"></a>

**Example - Apache Livy**  
The following example command starts a Livy session through cURL. Replace `ENCODED-KEYPAIR` with a Base64-encoded string for `username:password`.  

```
curl -X POST --data '{"proxyUser":"LDAP_USERNAME","kind": "pyspark"}' -H "Content-Type: application/json" -H "Authorization: Basic ENCODED-KEYPAIR" DNS_OF_PRIMARY_NODE:8998/sessions
```
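One way to produce the `ENCODED-KEYPAIR` value is with the `base64` utility. The `user` and `pass` values below stand in for your real LDAP username and password:

```
# Base64-encode username:password for the Authorization header
LDAP_USERNAME="user"
LDAP_PASSWORD="pass"
ENCODED_KEYPAIR=$(printf '%s:%s' "$LDAP_USERNAME" "$LDAP_PASSWORD" | base64)
echo "$ENCODED_KEYPAIR"    # dXNlcjpwYXNz
```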

## Using LDAP authentication with Presto
<a name="ldap-examples-presto"></a>

**Example - Presto**  
The following example command starts a Presto session through the Presto CLI:  

```
presto-cli --user "LDAP_USERNAME" --password --catalog hive
```
After you run this command, enter the LDAP password at the prompt.

## Using LDAP authentication with Trino
<a name="ldap-examples-trino"></a>

**Example - Trino**  
The following example command starts a Trino session through the Trino CLI:  

```
trino-cli --user "LDAP_USERNAME" --password --catalog hive
```
After you run this command, enter the LDAP password at the prompt.

## Using LDAP authentication with Hue
<a name="ldap-examples-hue"></a>

You can access the Hue UI through an SSH tunnel that you create on the cluster, or you can set up a proxy server to expose the Hue connection publicly. Because Hue doesn't run in HTTPS mode by default, we recommend that you use an additional encryption layer to ensure that communication between clients and the Hue UI is encrypted with HTTPS. This reduces the chance that you might accidentally expose user credentials in plain text.

To use the Hue UI, open the Hue UI in your browser and enter your LDAP username and password to log in. If the credentials are correct, Hue logs you in and uses your identity to authenticate you with all supported applications.

## Using SSH for password authentication and Kerberos tickets for other applications
<a name="ldap-examples-ssh"></a>

**Important**  
We don't recommend that you use password authentication to SSH into an EMR cluster.

You can use your LDAP credentials to SSH to an EMR cluster. To do this, set the `EnableSSHLogin` configuration to `true` in the Amazon EMR security configuration that you use to start the cluster. Then, use the following command to SSH to the cluster after it launches:

```
ssh username@EMR_PRIMARY_DNS_NAME
```

After you run this command, enter the LDAP password at the prompt.

Amazon EMR includes an on-cluster script that allows users to generate a Kerberos keytab file and ticket to use with supported applications that don't accept LDAP credentials directly. Some of these applications include `spark-submit`, Spark SQL, and PySpark.

Run `ldap-kinit` and follow the prompts. If the authentication succeeds, the Kerberos keytab file appears in your home directory with a valid Kerberos ticket. Use the Kerberos ticket to run applications as you would on any Kerberized environment.
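The flow might look like the following sketch. `klist` is the standard Kerberos utility for listing tickets; the exact prompts that `ldap-kinit` presents can vary:

```
# Generate a keytab file and Kerberos ticket from your LDAP credentials
ldap-kinit

# Confirm that a valid ticket was issued
klist
```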

# Integrate Amazon EMR with AWS IAM Identity Center
<a name="emr-idc"></a>

With Amazon EMR releases 6.15.0 and higher, you can use identities from AWS IAM Identity Center to authenticate with an Amazon EMR cluster. The following sections provide a conceptual overview, prerequisites, and the steps required to launch an EMR cluster with Identity Center integration.

**Topics**
+ [Overview](#emr-idc-overview)
+ [Features and benefits](#emr-idc-features)
+ [Getting started with AWS IAM Identity Center and Amazon EMR](emr-idc-start.md)
+ [User background sessions](user-background-sessions.md)
+ [Considerations and limitations for Amazon EMR with the Identity Center integration](emr-idc-considerations.md)

## Overview
<a name="emr-idc-overview"></a>

[Identity Center](https://docs.aws.amazon.com/singlesignon/latest/userguide/what-is.html) is the recommended approach for workforce authentication and authorization on AWS for organizations of any size and type. With Identity Center, you can create and manage user identities in AWS, or connect your existing identity source, including Microsoft Active Directory, Okta, Ping Identity, JumpCloud, Google Workspace, and Microsoft Entra ID (formerly Azure AD).

[Trusted identity propagation](https://docs.aws.amazon.com//singlesignon/latest/userguide/trustedidentitypropagation-overview.html) is an AWS IAM Identity Center feature that administrators of connected AWS services can use to grant and audit access to service data. Access to this data is based on user attributes such as group associations. Setting up trusted identity propagation requires collaboration between the administrators of connected AWS services and the IAM Identity Center administrators. For more information, see [Prerequisites and considerations](https://docs.aws.amazon.com//singlesignon/latest/userguide/trustedidentitypropagation-overall-prerequisites.html).

## Features and benefits
<a name="emr-idc-features"></a>

The Amazon EMR integration with IAM Identity Center provides the following benefits:
+ Amazon EMR provides credentials that relay your Identity Center identity to an EMR cluster.
+ Amazon EMR configures all supported applications to authenticate with the cluster credentials.
+ Amazon EMR configures and maintains security for the supported applications with the Kerberos protocol, with no commands or scripts required from you.
+ The ability to enforce Amazon S3 prefix-level authorization with Identity Center identities on S3 Access Grants-managed S3 prefixes.
+ The ability to enforce table-level authorization with Identity Center identities on AWS Lake Formation managed AWS Glue tables. 

# Getting started with AWS IAM Identity Center and Amazon EMR
<a name="emr-idc-start"></a>

This section helps you configure Amazon EMR to integrate with AWS IAM Identity Center.

**Topics**
+ [Create an Identity Center instance](#emr-idc-start-instance)
+ [Create an IAM role for Identity Center](#emr-idc-start-role)
+ [Add permissions for services not integrated with IAM Identity Center](#emr-idc-start-securityconfig-nonidc)
+ [Create an Identity Center enabled security configuration](#emr-idc-start-securityconfig)
+ [Create and launch an Identity Center enabled cluster](#emr-idc-cluster)
+ [Configure Lake Formation for an IAM Identity Center enabled EMR cluster](emr-idc-lf.md)
+ [Working with S3 Access Grants on an IAM Identity Center enabled EMR cluster](emr-idc-s3ag.md)

**Note**  
To use Identity Center integration with EMR, you must enable Lake Formation, S3 Access Grants, or both. If neither is enabled, Identity Center integration isn't supported.

## Create an Identity Center instance
<a name="emr-idc-start-instance"></a>

If you don't already have one, create an Identity Center instance in the AWS Region where you want to launch your EMR cluster. An Identity Center instance can only exist in a single Region for an AWS account.

Use the following AWS CLI command to create a new instance named `MyInstance`:

```
aws sso-admin create-instance --name MyInstance
```

## Create an IAM role for Identity Center
<a name="emr-idc-start-role"></a>

To integrate Amazon EMR with AWS IAM Identity Center, create an IAM role that authenticates with Identity Center from the EMR cluster. Under the hood, Amazon EMR uses SigV4 credentials to relay the Identity Center identity to downstream services such as AWS Lake Formation. The role must also have the permissions required to invoke those downstream services.

When you create the role, use the following permissions policy:

```
{
  "Statement": [
    {
      "Sid": "IdCPermissions",
      "Effect": "Allow",
      "Action": [
        "sso-oauth:*"
      ],
      "Resource": "*"
    },
    {
      "Sid": "GlueandLakePermissions",
      "Effect": "Allow",
      "Action": [
        "glue:*",
        "lakeformation:GetDataAccess"
      ],
      "Resource": "*"
    },
    {
      "Sid": "AccessGrantsPermissions",
      "Effect": "Allow",
      "Action": [
        "s3:GetDataAccess",
        "s3:GetAccessGrantsInstanceForPrefix"
      ],
      "Resource": "*"
    }
  ]
}
```
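
Before you attach the permissions policy, you can sanity-check it locally. The following sketch (the file name is an assumption) confirms that the JSON parses and includes the actions needed for the downstream services:

```
# Hypothetical file name; save the permissions policy shown above
cat > idc-role-policy.json <<'EOF'
{
  "Statement": [
    {"Sid": "IdCPermissions", "Effect": "Allow",
     "Action": ["sso-oauth:*"], "Resource": "*"},
    {"Sid": "GlueandLakePermissions", "Effect": "Allow",
     "Action": ["glue:*", "lakeformation:GetDataAccess"], "Resource": "*"},
    {"Sid": "AccessGrantsPermissions", "Effect": "Allow",
     "Action": ["s3:GetDataAccess", "s3:GetAccessGrantsInstanceForPrefix"], "Resource": "*"}
  ]
}
EOF
# Verify that the policy parses and grants the downstream-service actions
python3 - <<'EOF'
import json
doc = json.load(open("idc-role-policy.json"))
actions = [a for s in doc["Statement"] for a in s["Action"]]
assert "lakeformation:GetDataAccess" in actions
assert "s3:GetDataAccess" in actions
print("policy OK")
EOF
```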

The trust policy for this role allows the cluster's EC2 instance profile role (for example, `EMR_EC2_DefaultRole`) to assume it.

```
{
    "Sid": "AssumeRole",
    "Effect": "Allow",
    "Principal": {
        "AWS": "arn:aws:iam::12345678912:role/EMR_EC2_DefaultRole"
    },
    "Action": [
        "sts:AssumeRole",
        "sts:SetContext"
    ]
}
```

If the role doesn't have trusted credentials and accesses a Lake Formation-protected table, Amazon EMR automatically sets the `principalId` of the assumed role to `userID-untrusted`. The following is a snippet of a CloudTrail event that displays the `principalId`.

```
{
    "eventVersion": "1.09",
    "userIdentity": {
        "type": "AssumedRole",
        "principalId": "ABCDEFGH1JKLMNO2PQR3TU:5000-untrusted",
        "arn": "arn:aws:sts::123456789012:assumed-role/EMR_TIP/5000-untrusted",
        "accountId": "123456789012",
        "accessKeyId": "ABCDEFGH1IJKLMNOPQ7R3"
        ...
```
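
You can use the `-untrusted` suffix to flag such sessions when you review CloudTrail logs. The following is an illustrative sketch against a hypothetical sample event (the values mirror the snippet above):

```
# Hypothetical sample event; values mirror the CloudTrail snippet above
cat > sample-event.json <<'EOF'
{"userIdentity": {"type": "AssumedRole",
  "principalId": "ABCDEFGH1JKLMNO2PQR3TU:5000-untrusted"}}
EOF
# Flag events whose role session name carries the -untrusted suffix
if grep -q '"principalId": ".*-untrusted"' sample-event.json; then
  echo "untrusted session detected"
fi
```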

## Add permissions for services not integrated with IAM Identity Center
<a name="emr-idc-start-securityconfig-nonidc"></a>

AWS credentials that use trusted identity propagation rely on the IAM policies defined in the IAM role for any calls made to services that aren't integrated with IAM Identity Center, such as AWS Key Management Service. Your role should therefore define IAM permissions for any such services that you intend to access. Currently supported IAM Identity Center integrated services include AWS Lake Formation and Amazon S3 Access Grants.

To learn more about Trusted Identity Propagation, see [Trusted Identity Propagation across applications](https://docs.aws.amazon.com/singlesignon/latest/userguide/trustedidentitypropagation.html).

## Create an Identity Center enabled security configuration
<a name="emr-idc-start-securityconfig"></a>

To launch an EMR cluster with IAM Identity Center integration, use the following example command to create an Amazon EMR security configuration that has Identity Center enabled. Each configuration is explained below.

```
aws emr create-security-configuration --name "IdentityCenterConfiguration-with-lf-accessgrants" --region "us-west-2" --security-configuration '{
    "AuthenticationConfiguration":{
        "IdentityCenterConfiguration":{
            "EnableIdentityCenter":true,
            "IdentityCenterApplicationAssigmentRequired":false,
            "IdentityCenterInstanceARN": "arn:aws:sso:::instance/ssoins-123xxxxxxxxxx789"
        }
    },
    "AuthorizationConfiguration": {
        "LakeFormationConfiguration": {
            "AuthorizedSessionTagValue": "Amazon EMR"
        },
        "IAMConfiguration": {
          "EnableApplicationScopedIAMRole": true,
          "ApplicationScopedIAMRoleConfiguration": {
            "PropagateSourceIdentity": true
          }
        }
    },
    "EncryptionConfiguration": {
        "EnableInTransitEncryption": true,
        "EnableAtRestEncryption": false,
        "InTransitEncryptionConfiguration": {
            "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": "s3://amzn-s3-demo-bucket/cert/my-certs.zip"
            }
        }
    }
}'
```
+ **`EnableIdentityCenter`** – (required) Enables Identity Center integration.
+ **`IdentityCenterInstanceARN`** – (optional) The Identity Center instance ARN. If this isn't included, the existing IAM Identity Center instance ARN is looked up as part of the configuration step.
+ **`IAMRoleForEMRIdentityCenterApplicationARN`** – (required) The IAM role that procures Identity Center tokens from the cluster.
+ **`IdentityCenterApplicationAssignmentRequired`** – (optional, boolean) Governs whether an assignment is required to use the Identity Center application. If a value isn't provided, the default is `false`.
+ **`AuthorizationConfiguration` / `LakeFormationConfiguration`** – Optionally, configure authorization:
  + **`IAMConfiguration`** – Enables the Amazon EMR runtime roles feature in addition to your trusted identity propagation identity. If you enable this configuration, then you (or the calling AWS service) must specify an IAM runtime role in each call to the EMR Steps or EMR `GetClusterSessionCredentials` APIs. If the EMR cluster is used with SageMaker Unified Studio, this option is required when trusted identity propagation is also enabled.
  + **`EnableLakeFormation`** – Enable Lake Formation authorization on the cluster.

To enable Identity Center integration with Amazon EMR, you must specify `EncryptionConfiguration` and `InTransitEncryptionConfiguration`.
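
Because a malformed security configuration is only rejected when you create it or launch the cluster, it can help to validate the JSON locally first. The following sketch (the file name and the trimmed-down configuration are assumptions) checks the two requirements above:

```
# Hypothetical file name; a trimmed-down version of the configuration above
cat > idc-sec-config.json <<'EOF'
{
  "AuthenticationConfiguration": {
    "IdentityCenterConfiguration": {
      "EnableIdentityCenter": true,
      "IdentityCenterInstanceARN": "arn:aws:sso:::instance/ssoins-123xxxxxxxxxx789"
    }
  },
  "EncryptionConfiguration": {
    "EnableInTransitEncryption": true,
    "InTransitEncryptionConfiguration": {
      "TLSCertificateConfiguration": {
        "CertificateProviderType": "PEM",
        "S3Object": "s3://amzn-s3-demo-bucket/cert/my-certs.zip"
      }
    }
  }
}
EOF
# Confirm Identity Center and in-transit encryption are both configured
python3 - <<'EOF'
import json
cfg = json.load(open("idc-sec-config.json"))
assert cfg["AuthenticationConfiguration"]["IdentityCenterConfiguration"]["EnableIdentityCenter"] is True
assert "InTransitEncryptionConfiguration" in cfg["EncryptionConfiguration"]
print("security configuration OK")
EOF
```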

## Create and launch an Identity Center enabled cluster
<a name="emr-idc-cluster"></a>

Now that you've set up the IAM role that authenticates with Identity Center, and created an Amazon EMR security configuration that has Identity Center enabled, you can create and launch your identity-aware cluster. For steps to launch your cluster with the required security configuration, see [Specify a security configuration for an Amazon EMR cluster](emr-specify-security-configuration.md).

The following sections describe how to configure your Identity Center enabled cluster with security options that Amazon EMR supports:
+ [Working with S3 Access Grants on an IAM Identity Center enabled EMR cluster](emr-idc-s3ag.md)
+ [Configure Lake Formation for an IAM Identity Center enabled EMR cluster](emr-idc-lf.md)

# Configure Lake Formation for an IAM Identity Center enabled EMR cluster
<a name="emr-idc-lf"></a>

You can integrate [AWS Lake Formation](https://docs.aws.amazon.com/lake-formation/latest/dg/) with your AWS IAM Identity Center enabled EMR cluster.

First, be sure you have an Identity Center instance set up in the same Region as your cluster. For more information, see [Create an Identity Center instance](emr-idc-start.md#emr-idc-start-instance). You can find the instance ARN in the IAM Identity Center console when you view the instance details, or use the following command to view details for all your instances from the CLI:

```
aws sso-admin list-instances
```

Then use the ARN and your AWS account ID with the following command to configure Lake Formation to be compatible with IAM Identity Center:

```
aws lakeformation create-lake-formation-identity-center-configuration --cli-input-json file://create-lake-formation-idc-config.json 
json input:
{
    "CatalogId": "account-id/org-account-id",
    "InstanceArn": "identity-center-instance-arn"
}
```

Now, call `put-data-lake-settings` and enable `AllowFullTableExternalDataAccess` with Lake Formation:

```
aws lakeformation put-data-lake-settings --cli-input-json file://put-data-lake-settings.json 
json input:
{
    "DataLakeSettings": {
        "DataLakeAdmins": [
            {
                "DataLakePrincipalIdentifier": "admin-ARN"
            }
        ],
        "CreateDatabaseDefaultPermissions": [...],
        "CreateTableDefaultPermissions": [...],
        "AllowExternalDataFiltering": true,
        "AllowFullTableExternalDataAccess": true
    }
}
```

Finally, grant full table permissions to the identity ARN for the user that accesses the EMR cluster. The ARN contains the user ID from Identity Center. Navigate to Identity Center in the console, select **Users**, and then select the user to view their **General information** settings.

Copy the User ID and paste it into the following ARN for `user-id`:

```
arn:aws:identitystore:::user/user-id
```
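
For scripting, you can assemble the principal ARN from the user ID. The following is a sketch; the user ID is a placeholder, and the commented lookup assumes you know your identity store ID:

```
# Hypothetical user ID copied from the Identity Center console
USER_ID="12345678-1234-1234-1234-123456789012"
# You can also look up user IDs with:
#   aws identitystore list-users --identity-store-id <identity-store-id>
PRINCIPAL_ARN="arn:aws:identitystore:::user/${USER_ID}"
echo "${PRINCIPAL_ARN}"
```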

**Note**  
Queries on the EMR cluster only work if the IAM Identity Center identity has full table access on the Lake Formation protected table. If the identity doesn't have full table access, then the query will fail.

Use the following command to grant the user full table access:

```
aws lakeformation grant-permissions --cli-input-json file://grantpermissions.json
json input:
{
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:identitystore:::user/user-id"
    },
    "Resource": {
        "Table": {
            "DatabaseName": "tip_db",
            "Name": "tip_table"
        }
    },
    "Permissions": [
        "ALL"
    ],
    "PermissionsWithGrantOption": [
        "ALL"
    ]
}
```
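
Because queries fail without full table access, you can check the grant input locally before submitting it. A minimal sketch (the file name matches the command above; the check itself is illustrative):

```
# Same grant input as in the command above
cat > grantpermissions.json <<'EOF'
{
  "Principal": {"DataLakePrincipalIdentifier": "arn:aws:identitystore:::user/user-id"},
  "Resource": {"Table": {"DatabaseName": "tip_db", "Name": "tip_table"}},
  "Permissions": ["ALL"],
  "PermissionsWithGrantOption": ["ALL"]
}
EOF
# Confirm the grant gives the identity full table access
python3 - <<'EOF'
import json
grant = json.load(open("grantpermissions.json"))
assert "ALL" in grant["Permissions"], "identity needs full table access"
print("grant OK")
EOF
```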

## Adding the application ARN to IDC for Lake Formation integration
<a name="emr-idc-enabled-idc"></a>

To query Lake Formation enabled resources, you must add the application ARN of the Amazon EMR managed Identity Center application to the Lake Formation integration. To do this, follow these steps:

1. On the console, choose **AWS Lake Formation**.

1. Choose **IAM Identity Center integration**, and then under **Lake Formation application integration**, select the entry that matches the application ARN. The ARN appears in the **Application ID** list.

# Working with S3 Access Grants on an IAM Identity Center enabled EMR cluster
<a name="emr-idc-s3ag"></a>

You can integrate [S3 Access Grants](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-grants.html) with your AWS IAM Identity Center enabled EMR cluster.

Use S3 Access Grants to authorize access to your data sets from clusters that use Identity Center. Create grants to augment the permissions that you set for IAM users, groups, roles, or for a corporate directory. For more information, see [Using S3 Access Grants with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-access-grants.html).

**Topics**
+ [Create an S3 Access Grants instance and location](#emr-idc-s3ag-instance)
+ [Create grants for Identity Center identities](#emr-idc-s3ag-identities)

## Create an S3 Access Grants instance and location
<a name="emr-idc-s3ag-instance"></a>

If you don't already have one, create an S3 Access Grants instance in the AWS Region where you want to launch your EMR cluster. 

Use the following AWS CLI command to create a new instance named `MyInstance`:

```
aws s3control-access-grants create-access-grants-instance \
--account-id 12345678912 \
--identity-center-arn "identity-center-instance-arn"
```

Then, create an S3 Access Grants location, replacing the example values with your own:

```
aws s3control-access-grants create-access-grants-location \
--account-id 12345678912 \
--location-scope s3:// \
--iam-role-arn "access-grant-role-arn" \
--region aa-example-1
```

**Note**  
Define the `iam-role-arn` parameter as the `accessGrantRole` ARN.

## Create grants for Identity Center identities
<a name="emr-idc-s3ag-identities"></a>

Finally, create the grants for the identities that have access to your cluster:

```
aws s3control-access-grants create-access-grant \
--account-id 12345678912 \
--access-grants-location-id "default" \
--access-grants-location-configuration S3SubPrefix="s3-bucket-prefix" \
--permission READ \
--grantee GranteeType=DIRECTORY_USER,GranteeIdentifier="your-identity-center-user-id"
```

Example output:

```
{
    "CreatedAt": "2023-09-21T23:47:24.870000+00:00",
    "AccessGrantId": "1234-12345-1234-1234567",
    "AccessGrantArn": "arn:aws:s3:aa-example-1-1:123456789012:access-grants/default/grant/xxxx1234-1234-5678-1234-1234567890",
    "Grantee": {
        "GranteeType": "DIRECTORY_USER",
        "GranteeIdentifier": "5678-56789-5678-567890"
    },
    "AccessGrantsLocationId": "default",
    "AccessGrantsLocationConfiguration": {
        "S3SubPrefix": "myprefix/*"
    },
    "Permission": "READ",
    "GrantScope": "s3://myprefix/*"
}
```
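
When you script grant creation, you can capture the grant ID from the response for later auditing or cleanup. A sketch that parses a response shaped like the example output above (the file is a stand-in for the captured CLI output):

```
# Hypothetical captured response; in practice redirect the
# create-access-grant output to this file
cat > grant-response.json <<'EOF'
{"AccessGrantId": "1234-12345-1234-1234567", "Permission": "READ",
 "GrantScope": "s3://myprefix/*"}
EOF
GRANT_ID=$(python3 -c "import json; print(json.load(open('grant-response.json'))['AccessGrantId'])")
echo "created grant ${GRANT_ID}"
```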

# User background sessions
<a name="user-background-sessions"></a>

User background sessions enable long-running analytics and machine learning workloads to continue even after the user has logged off from their notebook interface. Starting with EMR on EC2 release 7.11, this capability is available through the trusted identity propagation feature of Amazon EMR on EC2. The following sections explain the configuration options and behaviors for user background sessions.

**Note**  
User background session settings only affect Spark workloads launched through SageMaker Unified Studio. Changes to this setting apply to new Livy sessions—existing active sessions remain unaffected.

## Configure user background sessions
<a name="w2aac30c29c15b7"></a>

User background sessions must be enabled at two levels for proper functionality:

1. **IAM Identity Center instance level** (configured by IdC administrators)

1. **EMR cluster level** (configured by EMR cluster administrators)

### Enable user background sessions for Amazon EMR
<a name="w2aac30c29c15b7b7"></a>

To enable user background sessions, set the `EnableUserBackgroundSession` parameter to `true` in the `IdentityCenterConfiguration` section when you create the EMR security configuration.

**Prerequisites:**
+ The IAM role used to create or update the EMR security configuration requires the `sso:PutApplicationSessionConfiguration` permission. This permission enables user background sessions for the Amazon EMR managed IAM Identity Center application.
+ Create an IAM role for IAM Identity Center
  + To integrate Amazon EMR with IAM Identity Center, create an IAM role that authenticates with IAM Identity Center from the EMR cluster. Amazon EMR uses SigV4 credentials to relay the IAM Identity Center identity to downstream services such as AWS Lake Formation. Your role should also have the required permissions to invoke the downstream services.
  + [Configure Lake Formation for an IAM Identity Center enabled EMR cluster](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-idc-lf.html). For required role permissions see: [Create an IAM role for Identity Center.](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-idc-start.html#emr-idc-start-role) 
+ Launch your EMR cluster with release 7.11 or later and enable trusted identity propagation.

**Step 1 - Create an Identity Center UserBackgroundSession enabled EMR security configuration**

Set the `EnableUserBackgroundSession` flag to `true`, which allows the EMR service to enable user background sessions at the EMR managed Identity Center application level. If this flag is set to `false` or isn't set, EMR disables user background sessions by default.

**Example of using the AWS CLI:**

```
aws emr create-security-configuration --name "idc-userBackgroundSession-enabled-secConfig" \
--region AWS_REGION \
--security-configuration '{
    "AuthenticationConfiguration": {
        "IdentityCenterConfiguration": {
            "EnableIdentityCenter": true,
            "IdentityCenterInstanceARN": "arn:aws:sso:::instance/ssoins-123xxxxxxxxxx789",
            "IdentityCenterApplicationAssigmentRequired": false,
            "EnableUserBackgroundSession": true,
            "IAMRoleForEMRIdentityCenterApplicationARN": "arn:aws:iam::12345678912:role/YOUR_ROLE"
        }
    },
    "AuthorizationConfiguration": {
        "IAMConfiguration": {
            "EnableApplicationScopedIAMRole": true,
            "ApplicationScopedIAMRoleConfiguration": {
                "PropagateSourceIdentity": true
            }
        },
        "LakeFormationConfiguration": {
            "AuthorizedSessionTagValue": "Amazon EMR"
        }
    },
    "EncryptionConfiguration": {
        "EnableInTransitEncryption": true,
        "EnableAtRestEncryption": false,
        "InTransitEncryptionConfiguration": {
            "TLSCertificateConfiguration": {
                "CertificateProviderType": "PEM",
                "S3Object": "s3://amzn-s3-demo-bucket/cert/my-certs.zip"
            }
        }
    }
}'
```

**Step 2 - Create and launch an Identity Center enabled cluster**

Now that you've set up the IAM role that authenticates with Identity Center, and created an Amazon EMR security configuration that has Identity Center enabled, you can create and launch your identity-aware cluster. For steps to launch your cluster with the required security configuration, see [Specify a security configuration for an Amazon EMR cluster](emr-specify-security-configuration.md).

### Configuration Matrix
<a name="security-trusted-prop-user-background-matrix"></a>

The user background session behavior depends on both the EMR-EC2 setting and the IAM Identity Center instance-level settings:


**User Background Session Configuration Matrix**  

| IAM Identity Center userBackgroundSession enabled | Amazon EMR userBackgroundSessionsEnabled | Behavior | 
| --- | --- | --- | 
| Yes | Yes | User background session enabled | 
| Yes | No | Session expires with user logout | 
| No | Yes | Session expires with user logout | 
| No | No | Session expires with user logout | 

### Default user background session duration
<a name="security-trusted-prop-user-background-duration"></a>

By default, all user background sessions have a duration limit of 7 days in IAM Identity Center. Administrators can modify this duration in the IAM Identity Center console. This setting applies at the IAM Identity Center instance level, affecting all supported IAM Identity Center applications within that instance.
+ Duration can be set to any value from 15 minutes up to 90 days.
+ This setting is configured in the IAM Identity Center console under **Settings** → **Authentication** → **Configure** (See Non-Interactive Jobs section)
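
If you automate instance configuration, a quick bounds check on a proposed duration (expressed in minutes, per the 15-minute to 90-day limits above) might look like this illustrative sketch:

```
# Duration limits from the text, expressed in minutes
MIN_MINUTES=15
MAX_MINUTES=$((90 * 24 * 60))   # 90 days = 129600 minutes
requested=10080                  # example: 7 days, the default

if [ "$requested" -ge "$MIN_MINUTES" ] && [ "$requested" -le "$MAX_MINUTES" ]; then
  echo "duration accepted"
else
  echo "duration out of range"
fi
```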

### Impact of disabling user background sessions
<a name="security-trusted-prop-user-background-disabling"></a>

When user background sessions are disabled in IAM Identity Center:

Existing Livy sessions  
+ Continue to run without interruption if they were started with user background sessions enabled. These sessions will continue using their existing background session tokens until they terminate naturally or are explicitly stopped.

New Livy sessions  
+ Will use the standard trusted identity propagation flow and will terminate when the user logs out or their interactive session expires (such as when closing an Amazon SageMaker Unified Studio JupyterLab notebook).

### Changing user background sessions duration
<a name="security-trusted-prop-user-background-changing-duration"></a>

When the duration setting for user background sessions is modified in IAM Identity Center:

Existing Livy sessions  
+ Continue to run with the same background session duration with which they were started.

New Livy sessions  
+ Will use the new session duration for background sessions.

### Considerations
<a name="security-trusted-prop-user-background-considerations"></a>

#### Feature Availability
<a name="prop-user-background-additional-feature-availability"></a>

User background sessions for Amazon EMR are available for:
+ Spark engine only (Hive engine is not supported)
+ Livy interactive sessions only (batch jobs and streaming jobs are not supported)
+ Amazon EMR release labels 7.11 and later. With EMR release 7.11, you need to install a bootstrap action script to enable user background sessions when creating a cluster. Please contact AWS Support for additional details. 
**Note**  
If you are using SageMaker Unified Studio provisioned cluster, you do not need the bootstrap action script to use this feature.

#### Cost Implications
<a name="prop-user-background-additional-data-persistence-cost"></a>
+ Jobs will continue to run to completion even after users end their Amazon SageMaker Unified Studio JupyterLab session and will incur charges for the entire duration of the completed run.
+ Monitor your active background sessions to avoid unnecessary costs from forgotten or abandoned sessions.

#### Livy Session Termination Conditions
<a name="security-trusted-prop-user-background-considerations-session"></a>

When using user background sessions, a Livy session will continue running until one of the following occurs:
+ The user background session expires (based on IdC configuration, up to 90 days).
+ The user background session is manually revoked by an administrator.
+ The Livy session reaches its idle timeout (default: 8 hours after the last executed statement).
+ The user explicitly stops or restarts the notebook kernel.

# Considerations and limitations for Amazon EMR with the Identity Center integration
<a name="emr-idc-considerations"></a>

Consider the following points when you use IAM Identity Center with Amazon EMR: 
+ Trusted Identity Propagation through Identity Center is supported on Amazon EMR 6.15.0 and higher, and only with Apache Spark. Also, Trusted Identity Propagation through Identity Center using EMR Runtime Roles feature is supported on Amazon EMR 7.8.0 and higher, and only with Apache Spark.
+ To enable EMR clusters with trusted identity propagation, you must use the AWS CLI to create a security configuration that has trusted identity propagation enabled, and use that security configuration when you launch your cluster. For more information, see [Create an Identity Center enabled security configuration](emr-idc-start.md#emr-idc-start-securityconfig).
+ Fine-grained access controls using AWS Lake Formation that use Trusted Identity Propagation are available for Amazon EMR clusters on EMR version 7.2.0 and higher. Between EMR versions 6.15.0 and 7.1.0, only table-level access control, based on AWS Lake Formation, is available.
+ With Amazon EMR clusters that use Trusted Identity Propagation, operations that support access control based on Lake Formation with Apache Spark include SELECT, ALTER TABLE, INSERT INTO, and DROP TABLE.
+ Fine-grained access control using AWS Lake Formation with trusted identity propagation requires that you update the Lake Formation Identity Center configuration by adding the EMR managed IAM Identity Center application ARN as an authorized target. You can find the Amazon EMR managed IAM Identity Center application ARN by calling the EMR `describe-security-configuration` API and looking for the `IdCApplicationARN` field. For details on how to set up Lake Formation with an IAM Identity Center configuration, see [Updating IAM Identity Center integration](https://docs.aws.amazon.com/lake-formation/latest/dg/update-lf-identity-center-connection.html).
+ To use fine-grained access control with AWS Lake Formation and trusted identity propagation, IAM Identity Center users must be granted Lake Formation permissions on the default database. For details, see [Configure Lake Formation for an IAM Identity Center enabled EMR cluster](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-idc-lf.html).
+ Trusted Identity Propagation with Amazon EMR is supported in the following AWS Regions: 
  + `af-south-1` – Africa (Cape Town)
  + `ap-east-1` – Asia Pacific (Hong Kong)
  + `ap-northeast-1` – Asia Pacific (Tokyo)
  + `ap-northeast-2` – Asia Pacific (Seoul)
  + `ap-northeast-3` – Asia Pacific (Osaka)
  + `ap-south-1` – Asia Pacific (Mumbai)
  + `ap-south-2` – Asia Pacific (Hyderabad)
  + `ap-southeast-1` – Asia Pacific (Singapore)
  + `ap-southeast-2` – Asia Pacific (Sydney)
  + `ap-southeast-3` – Asia Pacific (Jakarta)
  + `ap-southeast-4` – Asia Pacific (Melbourne)
  + `ca-central-1` – Canada (Central)
  + `eu-central-1` – Europe (Frankfurt)
  + `eu-central-2` – Europe (Zurich)
  + `eu-north-1` – Europe (Stockholm)
  + `eu-south-1` – Europe (Milan)
  + `eu-south-2` – Europe (Spain)
  + `eu-west-1` – Europe (Ireland)
  + `eu-west-2` – Europe (London)
  + `eu-west-3` – Europe (Paris)
  + `il-central-1` – Israel (Tel Aviv)
  + `me-central-1` – Middle East (UAE)
  + `me-south-1` – Middle East (Bahrain)
  + `sa-east-1` – South America (São Paulo)
  + `us-east-1` – US East (N. Virginia)
  + `us-east-2` – US East (Ohio)
  + `us-west-1` – US West (N. California)
  + `us-west-2` – US West (Oregon)
+ If the IAM role for Identity Center is accidentally deleted and recreated, the recreated role has a different principal ID. For example, *NewRole* would have principal ID *456*, which wouldn't match the recorded principal ID *123*. The only way to resolve this is to reset the principal in the downstream resource policies in every downstream account.

# Integrate Amazon EMR with AWS Lake Formation
<a name="emr-lake-formation"></a>

AWS Lake Formation is a managed service that helps you discover, catalog, cleanse, and secure data in an Amazon Simple Storage Service (S3) data lake. Lake Formation provides fine-grained access at the column, row, or cell level to databases and tables in the AWS Glue Data Catalog. For more information, see [What is AWS Lake Formation?](https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html)

With Amazon EMR release 6.7.0 and later, you can apply Lake Formation based access control to Spark, Hive, and Presto jobs that you submit to Amazon EMR clusters. To integrate with Lake Formation, you must create an EMR cluster with a *runtime role*. A runtime role is an AWS Identity and Access Management (IAM) role that you associate with Amazon EMR jobs or queries. Amazon EMR then uses this role to access AWS resources. For more information, see [Runtime roles for Amazon EMR steps](emr-steps-runtime-roles.md).

## How Amazon EMR works with Lake Formation
<a name="how-emr-lf-works"></a>

After you integrate Amazon EMR with Lake Formation, you can execute queries to Amazon EMR clusters with the [`Step` API](https://docs.aws.amazon.com/emr/latest/APIReference/API_Step.html) or with SageMaker AI Studio. Then, Lake Formation provides access to data through temporary credentials for Amazon EMR. This process is called credential vending. For more information, see [What is AWS Lake Formation?](https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html)

The following is a high-level overview of how Amazon EMR gets access to data protected by Lake Formation security policies.

![\[How Amazon EMR accesses data protected by Lake Formation security policies\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/lf-emr-security.png)


1. A user submits an Amazon EMR query for data in Lake Formation.

1. Amazon EMR requests temporary credentials from Lake Formation to give the user data access.

1. Lake Formation returns temporary credentials.

1. Amazon EMR sends the query request to retrieve data from Amazon S3.

1. Amazon EMR receives the data from Amazon S3, filters it, and returns results based on the permissions defined for the user in Lake Formation.

For more information about adding users and groups to Lake Formation policies, see [Granting Data Catalog permissions](https://docs.aws.amazon.com/lake-formation/latest/dg/granting-catalog-permissions.html).

## Prerequisites
<a name="prerequisites"></a>

You must meet the following requirements before you integrate Amazon EMR and Lake Formation:
+ Turn on runtime role authorization on your Amazon EMR cluster.
+ Use the AWS Glue Data Catalog as your metadata store.
+ Define and manage permissions in Lake Formation to access databases, tables, and columns in AWS Glue Data Catalog. For more information, see [What is AWS Lake Formation?](https://docs.aws.amazon.com/lake-formation/latest/dg/what-is-lake-formation.html)

# Fine-grained access with Lake Formation
<a name="lake-formation-fine-grained-access"></a>

Amazon EMR releases 6.15.0 and higher include support for fine-grained access control at the row, column, or cell level based on AWS Lake Formation. The topics in this section cover how you can access Lake Formation protected AWS Glue Data Catalog tables from EMR Spark jobs or interactive sessions with fine-grained access control.

# Enable Lake Formation with Amazon EMR
<a name="emr-lf-enable"></a>

With Amazon EMR 6.15.0 and higher, when you run Spark jobs on Amazon EMR on EC2 clusters that access data in the AWS Glue Data Catalog, you can use AWS Lake Formation to apply table, row, column, and cell level permissions on Hudi, Iceberg, or Delta Lake based tables.

In this section, we cover how to create a security configuration and set up Lake Formation to work with Amazon EMR. We also go over how to launch a cluster with the security configuration that you created for Lake Formation. 

## Step 1: Set up a runtime role for your EMR cluster
<a name="emr-lf-launch-cluster"></a>

To use a runtime role for your EMR cluster, you must create a security configuration. With a security configuration, you can apply consistent security, authorization, and authentication options across your clusters. 

1. Create a file called `lf-runtime-roles-sec-cfg.json` with the following security configuration.

   ```
   {
       "AuthorizationConfiguration": {
           "IAMConfiguration": {
               "EnableApplicationScopedIAMRole": true,
               "ApplicationScopedIAMRoleConfiguration": {
                   "PropagateSourceIdentity": true
               }
           },
           "LakeFormationConfiguration": {
               "AuthorizedSessionTagValue": "Amazon EMR"
           }
       },
       "EncryptionConfiguration": {
           "EnableAtRestEncryption": false,
           "EnableInTransitEncryption": true,
           "InTransitEncryptionConfiguration": {
               "TLSCertificateConfiguration": {<certificate-configuration>}
           }
       }
   }
   ```

   The following example shows a `TLSCertificateConfiguration` that uses a zip file with certificates in Amazon S3 as the key provider. (See [Providing certificates for encrypting data in transit with Amazon EMR encryption](emr-encryption-enable.md#emr-encryption-certificates) for certificate requirements.)

   ```
   "TLSCertificateConfiguration": {
       "CertificateProviderType": "PEM",
       "S3Object": "s3://MyConfigStore/artifacts/MyCerts.zip"
   }
   ```

   The following example shows a `TLSCertificateConfiguration` that uses a custom key provider. (See [Providing certificates for encrypting data in transit with Amazon EMR encryption](emr-encryption-enable.md#emr-encryption-certificates) for certificate requirements.)

   ```
   "TLSCertificateConfiguration": {
       "CertificateProviderType": "Custom",
       "S3Object": "s3://MyConfig/artifacts/MyCerts.jar",
       "CertificateProviderClass": "com.mycompany.MyCertProvider"
   }
   ```

1. To ensure that Lake Formation can authorize the session tag, confirm that the `LakeFormationConfiguration/AuthorizedSessionTagValue` property is set to `Amazon EMR`.

1. Use the following command to create the Amazon EMR security configuration.

   ```
   aws emr create-security-configuration \
   --name 'iamconfig-with-iam-lf' \
   --security-configuration file://lf-runtime-roles-sec-cfg.json
   ```

   Alternatively, you can use the [Amazon EMR console](https://console.aws.amazon.com//emr) to create a security configuration with custom settings.
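If you prefer to script this step, the security configuration can also be assembled programmatically. The following sketch builds the same JSON document; it assumes a PEM zip in Amazon S3 as the TLS certificate provider, and the boto3 call that would create the configuration is commented out so the sketch only constructs the document.

```python
import json

def build_lf_security_config(tls_cert_s3_zip: str) -> str:
    """Build the Lake Formation security configuration JSON shown above."""
    config = {
        "AuthorizationConfiguration": {
            "IAMConfiguration": {
                "EnableApplicationScopedIAMRole": True,
                "ApplicationScopedIAMRoleConfiguration": {
                    "PropagateSourceIdentity": True
                },
            },
            "LakeFormationConfiguration": {
                # Must be exactly "Amazon EMR" so Lake Formation authorizes the session tag.
                "AuthorizedSessionTagValue": "Amazon EMR"
            },
        },
        "EncryptionConfiguration": {
            "EnableAtRestEncryption": False,
            "EnableInTransitEncryption": True,
            "InTransitEncryptionConfiguration": {
                "TLSCertificateConfiguration": {
                    "CertificateProviderType": "PEM",
                    "S3Object": tls_cert_s3_zip,
                }
            },
        },
    }
    return json.dumps(config, indent=2)

security_config_json = build_lf_security_config("s3://MyConfigStore/artifacts/MyCerts.zip")
# With boto3 and AWS credentials configured, you could then create it:
# import boto3
# boto3.client("emr").create_security_configuration(
#     Name="iamconfig-with-iam-lf", SecurityConfiguration=security_config_json)
```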

## Step 2: Launch an Amazon EMR cluster
<a name="emr-lf-launch-cluster"></a>

Now you’re ready to launch an EMR cluster with the security configuration that you created in the previous step. For more information on security configurations, see [Use security configurations to set up Amazon EMR cluster security](emr-security-configurations.md) and [Runtime roles for Amazon EMR steps](emr-steps-runtime-roles.md).
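If you script the launch, the following sketch shows the kind of arguments you might pass to the EMR `RunJobFlow` API so the cluster picks up the security configuration from the previous step. The cluster name, instance types, and counts are illustrative placeholders, and the boto3 call itself is commented out.

```python
# Sketch: RunJobFlow arguments that reference the Lake Formation security
# configuration created in Step 1. Names and sizes here are placeholders.
cluster_args = {
    "Name": "emr-lf-cluster",                          # hypothetical cluster name
    "ReleaseLabel": "emr-6.15.0",                      # must be 6.15.0 or higher
    "Applications": [{"Name": "Spark"}],
    "SecurityConfiguration": "iamconfig-with-iam-lf",  # from Step 1
    "ServiceRole": "EMR_DefaultRole",
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "Instances": {
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
    },
}
# With boto3 and AWS credentials configured, you could then run:
# import boto3
# boto3.client("emr").run_job_flow(**cluster_args)
```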

## Step 3: Set up Lake Formation-based column, row, or cell-level permissions with Amazon EMR runtime roles
<a name="emr-lf-fgac-perms"></a>

To apply fine-grained access control at the column, row, or cell level with Lake Formation, the data lake administrator for Lake Formation must set `Amazon EMR` as the value for the session tag configuration, `AuthorizedSessionTagValue`. Lake Formation uses this session tag to authorize callers and provide access to the data lake. You can set this session tag in the **Application integration settings** section of the Lake Formation console.

## Step 4: Configure AWS Glue and Lake Formation grants for Amazon EMR runtime roles
<a name="emr-lf-trust-policy"></a>

To continue with your setup of Lake Formation based access control with Amazon EMR runtime roles, you must configure AWS Glue and Lake Formation grants for Amazon EMR runtime roles. To allow your IAM runtime roles to interact with Lake Formation, grant them access with `lakeformation:GetDataAccess` and `glue:Get*`.

Lake Formation permissions control access to AWS Glue Data Catalog resources, Amazon S3 locations, and the underlying data at those locations. IAM permissions control access to the Lake Formation and AWS Glue APIs and resources. Although you might have the Lake Formation permission to access a table in the data catalog (SELECT), your operation fails if you don’t have the IAM permission on the `glue:Get*` API. For more details about Lake Formation access control, see [Lake Formation access control overview](https://docs.aws.amazon.com/lake-formation/latest/dg/lf-permissions-overview.html).

1.  Create the `emr-runtime-roles-lake-formation-policy.json` file with the following content. 

------
#### [ JSON ]


   ```
   {
     "Version": "2012-10-17",
     "Statement": [
       {
         "Sid": "LakeFormationManagedAccess",
         "Effect": "Allow",
         "Action": [
           "lakeformation:GetDataAccess",
           "glue:Get*",
           "glue:Create*",
           "glue:Update*"
         ],
         "Resource": [
           "*"
         ]
       }
     ]
   }
   ```

------

1. Create the related IAM policy.

   ```
   aws iam create-policy \
   --policy-name emr-runtime-roles-lake-formation-policy \
   --policy-document file://emr-runtime-roles-lake-formation-policy.json
   ```

1. To assign this policy to your IAM runtime roles, follow the steps in [Managing AWS Lake Formation permissions](https://docs.aws.amazon.com/lake-formation/latest/dg/managing-permissions.html).

You can now use runtime roles and Lake Formation to apply table and column level permissions. You can also use a source identity to control actions and monitor operations with AWS CloudTrail.
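As an illustration of such a grant, the sketch below assembles a Lake Formation `GrantPermissions` request that gives a runtime role column-level `SELECT` on a table. The role ARN, database, table, and column names are placeholders; the boto3 call is commented out.

```python
# Sketch: a Lake Formation grant for a runtime role. All identifiers below
# are hypothetical placeholders; substitute your own values.
grant_request = {
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/my-runtime-role"
    },
    "Resource": {
        "TableWithColumns": {
            "DatabaseName": "my_database",
            "Name": "source_table",
            # Restricting the grant to specific columns gives column-level access control.
            "ColumnNames": ["order_date", "totalprice"],
        }
    },
    "Permissions": ["SELECT"],
}
# With boto3 and AWS credentials configured:
# import boto3
# boto3.client("lakeformation").grant_permissions(**grant_request)
```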

For each IAM role that you plan to use as a runtime role, set the following trust policy. Replace `<AWS_ACCOUNT_ID>` with your AWS account ID and `EMR_EC2_DefaultRole` with your instance profile role. To modify the trust policy of an IAM role, see [Modifying a role trust policy](https://docs.aws.amazon.com//IAM/latest/UserGuide/roles-managingrole-editing-console.html).

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAssumeRole",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/EMR_EC2_DefaultRole"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:TagSession"
      ]
    }
  ]
}
```

For a detailed, end-to-end example, see [Introducing runtime roles for Amazon EMR steps](https://aws.amazon.com/blogs/big-data/introducing-runtime-roles-for-amazon-emr-steps-use-iam-roles-and-aws-lake-formation-for-access-control-with-amazon-emr/).<a name="iceberg-with-lake-formation-spark-catalog-integration-lf-ec2"></a>

For information about how to integrate with Iceberg and AWS Glue Data Catalog for a multi-catalog hierarchy, see [Configure Spark to access a multi-catalog hierarchy in AWS Glue Data Catalog](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-multi-catalog.html#emr-lakehouse-using-spark-access).

# Open-table format support
<a name="emr-lf-fgac1"></a>

Amazon EMR releases 6.15.0 and higher include support for fine-grained access control at the table, row, column, and cell level based on AWS Lake Formation with Apache Hive, Apache Iceberg, Apache Hudi, and Delta Lake tables when you read and write data with Spark SQL. Starting with EMR 7.12, DML and DDL operations that modify table data are supported for Apache Hive, Apache Iceberg, and Delta Lake tables using Lake Formation vended credentials.

The topics in this section cover how you can access Lake Formation registered tables in open table formats from EMR Spark jobs or interactive sessions with fine-grained access control.

## Permission requirements
<a name="emr-lf-perm"></a>

### Tables not registered in AWS Lake Formation
<a name="emr-lf-tbl-reg"></a>

For tables not registered with AWS Lake Formation, the job runtime role accesses both the AWS Glue Data Catalog and the underlying table data in Amazon S3. This requires the job runtime role to have appropriate IAM permissions for both AWS Glue and Amazon S3 operations. 

### Tables registered in AWS Lake Formation
<a name="emr-lf-tbl-not-reg"></a>

For tables registered with AWS Lake Formation, the job runtime role accesses the AWS Glue Data Catalog metadata, while temporary credentials vended by Lake Formation access the underlying table data in Amazon S3. The Lake Formation permissions required to execute an operation depend on the AWS Glue Data Catalog and Amazon S3 API calls that the Spark job initiates and can be summarized as follows:
+ **DESCRIBE** permission allows the runtime role to read table or database metadata in the Data Catalog
+ **ALTER** permission allows the runtime role to modify table or database metadata in the Data Catalog
+ **DROP** permission allows the runtime role to delete table or database metadata from the Data Catalog
+ **SELECT** permission allows the runtime role to read table data from Amazon S3
+ **INSERT** permission allows the runtime role to write table data to Amazon S3
+ **DELETE** permission allows the runtime role to delete table data from Amazon S3
**Note**  
Lake Formation evaluates permissions lazily when a Spark job calls AWS Glue to retrieve table metadata and Amazon S3 to retrieve table data. Jobs that use a runtime role with insufficient permissions will not fail until Spark makes an AWS Glue or Amazon S3 call that requires the missing permission.
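The permission list above can be restated as a small lookup table. The sketch below is purely illustrative (not an AWS API); it also derives a minimal grant set for a job's needs under the assumption that metadata reads, data reads, and data writes map to `DESCRIBE`, `SELECT`, and `INSERT` respectively.

```python
# Quick reference restating the Lake Formation permission list above.
lf_permission_scope = {
    "DESCRIBE": "read table or database metadata in the Data Catalog",
    "ALTER":    "modify table or database metadata in the Data Catalog",
    "DROP":     "delete table or database metadata from the Data Catalog",
    "SELECT":   "read table data from Amazon S3",
    "INSERT":   "write table data to Amazon S3",
    "DELETE":   "delete table data from Amazon S3",
}

def permissions_for(read_metadata=False, read_data=False, write_data=False):
    """Sketch: derive the minimal grant set for a job's needs."""
    perms = []
    if read_metadata:
        perms.append("DESCRIBE")
    if read_data:
        perms.append("SELECT")
    if write_data:
        perms.append("INSERT")
    return perms
```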

**Note**  
In the following supported table matrix:   
Operations marked as **Supported** exclusively use Lake Formation credentials to access table data for tables registered with Lake Formation. If Lake Formation permissions are insufficient, the operation will not fall back to runtime role credentials. For tables not registered with Lake Formation, the job runtime role credentials access the table data.
Operations marked as **Supported with IAM permissions on Amazon S3 location** do not use Lake Formation credentials to access underlying table data in Amazon S3. To run these operations, the job runtime role must have the necessary Amazon S3 IAM permissions to access the table data, regardless of whether the table is registered with Lake Formation.

------
#### [ Hive ]

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-lf-fgac1.html)

------
#### [ Iceberg ]

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-lf-fgac1.html)

**Spark configuration for Iceberg:** If you want to use Iceberg format, set the following configurations. Replace `DB_LOCATION` with the Amazon S3 path where your Iceberg tables are located, and replace the region and account ID placeholders with your own values.

```
spark-sql \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog \
--conf spark.sql.catalog.spark_catalog.warehouse=s3://DB_LOCATION \
--conf spark.sql.catalog.spark_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
--conf spark.sql.catalog.spark_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
--conf spark.sql.catalog.spark_catalog.glue.account-id=ACCOUNT_ID \
--conf spark.sql.catalog.spark_catalog.glue.id=ACCOUNT_ID \
--conf spark.sql.catalog.spark_catalog.client.region=AWS_REGION
```

If you want to use Iceberg format on earlier EMR versions, use the following command instead:

```
spark-sql \
--conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,com.amazonaws.emr.recordserver.connector.spark.sql.RecordServerSQLExtension \
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog \
--conf spark.sql.catalog.spark_catalog.warehouse=s3://DB_LOCATION \
--conf spark.sql.catalog.spark_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
--conf spark.sql.catalog.spark_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO \
--conf spark.sql.catalog.spark_catalog.glue.account-id=ACCOUNT_ID \
--conf spark.sql.catalog.spark_catalog.glue.id=ACCOUNT_ID \
--conf spark.sql.catalog.spark_catalog.client.assume-role.region=AWS_REGION \
--conf spark.sql.catalog.spark_catalog.lf.managed=true
```

**Examples:**

Here are some examples of working with Iceberg tables:

```
-- Create an Iceberg table
CREATE TABLE my_iceberg_table (
    id BIGINT,
    name STRING,
    created_at TIMESTAMP
) USING ICEBERG;

-- Insert data
INSERT INTO my_iceberg_table VALUES (1, 'Alice', current_timestamp());

-- Query the table
SELECT * FROM my_iceberg_table;
```

------
#### [ Hudi ]

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-lf-fgac1.html)

**Spark configuration for Hudi:**

To start the Spark shell on EMR 7.10 or higher versions, use the following command:

```
spark-sql \
--jars /usr/lib/hudi/hudi-spark-bundle.jar \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
--conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
```

To start the Spark shell on earlier EMR versions, use the following command instead:

```
spark-sql \
--jars /usr/lib/hudi/hudi-spark-bundle.jar \
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog \
--conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension,com.amazonaws.emr.recordserver.connector.spark.sql.RecordServerSQLExtension \
--conf spark.sql.catalog.spark_catalog.lf.managed=true
```

**Examples:**

Here are some examples of working with Hudi tables:

```
-- Create a Hudi table
CREATE TABLE my_hudi_table (
    id BIGINT,
    name STRING,
    created_at TIMESTAMP
) USING HUDI
TBLPROPERTIES (
    'type' = 'cow',
    'primaryKey' = 'id'
);

-- Insert data
INSERT INTO my_hudi_table VALUES (1, 'Alice', current_timestamp());

-- Query the latest snapshot
SELECT * FROM my_hudi_table;
```

To query the latest snapshot of copy-on-write tables:

```
SELECT * FROM my_hudi_cow_table
```

```
spark.read.table("my_hudi_cow_table")
```

To query the latest compacted data of `MOR` tables, you can query the read-optimized table that is suffixed with `_ro`:

```
SELECT * FROM my_hudi_mor_table_ro
```

```
spark.read.table("my_hudi_mor_table_ro")
```

------
#### [ Delta Lake ]

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-lf-fgac1.html)

**Spark configuration for Delta Lake:**

To use Delta Lake with Lake Formation on EMR 7.10 and higher, run the following command:

```
spark-sql \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog
```

To use Delta Lake with Lake Formation on EMR 6.15 to 7.9, run the following command:

```
spark-sql \
  --conf spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension,com.amazonaws.emr.recordserver.connector.spark.sql.RecordServerSQLExtension \
  --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog \
  --conf spark.sql.catalog.spark_catalog.lf.managed=true
```

If you want Lake Formation to use record server to manage your Spark catalog, set `spark.sql.catalog.<managed_catalog_name>.lf.managed` to `true`.
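The configurations above can also be assembled as a dict and rendered into a `spark-sql` invocation. The following sketch mirrors the EMR 6.15 to 7.9 command; the catalog name `spark_catalog` matches the examples in this section.

```python
# Sketch: assemble the Delta Lake + Lake Formation Spark configuration as a
# dict, then render it as spark-sql flags (mirrors the command shown above).
delta_lf_confs = {
    "spark.sql.extensions": (
        "io.delta.sql.DeltaSparkSessionExtension,"
        "com.amazonaws.emr.recordserver.connector.spark.sql.RecordServerSQLExtension"
    ),
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    # Set lf.managed so Lake Formation uses record server to manage the catalog.
    "spark.sql.catalog.spark_catalog.lf.managed": "true",
}
flags = " ".join(f"--conf {key}={value}" for key, value in delta_lf_confs.items())
command = f"spark-sql {flags}"
```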

**Examples:**

Here are some examples of working with Delta Lake tables:

```
-- Create a Delta Lake table
CREATE TABLE my_delta_table (
    id BIGINT,
    name STRING,
    created_at TIMESTAMP
) USING DELTA;

-- Insert data
INSERT INTO my_delta_table VALUES (1, 'Alice', current_timestamp());

-- Query the table
SELECT * FROM my_delta_table;

-- Update data
UPDATE my_delta_table SET name = 'Alice Smith' WHERE id = 1;

-- Merge data
MERGE INTO my_delta_table AS target
USING (SELECT 2 as id, 'Bob' as name, current_timestamp() as created_at) AS source
ON target.id = source.id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

**Creating a Delta Lake table in AWS Glue Data Catalog**

Amazon EMR with Lake Formation doesn't support DDL commands and Delta table creation in EMR releases earlier than 7.12. Follow these steps to create tables in the AWS Glue Data Catalog.

1. Use the following example to create a Delta table. Make sure that your S3 location exists.

   ```
   spark-sql \
   --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
   --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
   
   > CREATE DATABASE if not exists <DATABASE_NAME> LOCATION 's3://<S3_LOCATION>/transactionaldata/native-delta/<DATABASE_NAME>/';
   > CREATE TABLE <TABLE_NAME> (x INT, y STRING, z STRING) USING delta;
   > INSERT INTO <TABLE_NAME> VALUES (1, 'a1', 'b1');
   ```

1. To see the details of your table, go to [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the left navigation, expand **Data Catalog**, choose **Tables**, then choose the table you created. Under **Schema**, you should see that the Delta table you created with Spark stores all columns in a data type of `array<string>` in AWS Glue.

1. To define column and cell-level filters in Lake Formation, remove the `col` column from your schema, and then add the columns that are in your table schema. In this example, add the columns `x`, `y`, and `z`.

------

With this feature, you can run snapshot queries on copy-on-write tables to query the latest snapshot of the table at a given commit or compaction instant. Currently, a Lake Formation-enabled Amazon EMR cluster must retrieve Hudi's commit time column to perform incremental queries and time travel queries. It doesn't support Spark's `timestamp as of` syntax or the `spark.read()` function. The correct syntax is `select * from table where _hoodie_commit_time <= point_in_time`. For more information, see [Point in time Time-Travel queries on Hudi table](https://cwiki.apache.org/confluence/display/HUDI/RFC+-+07+%3A+Point+in+time+Time-Travel+queries+on+Hudi+table).
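A minimal sketch of composing such a point-in-time query follows; the table name is hypothetical, and the commit instant is assumed to use Hudi's `yyyyMMddHHmmss` format.

```python
# Sketch: build a point-in-time query that filters on Hudi's commit time
# column, instead of Spark's unsupported `timestamp as of` syntax.
def hudi_point_in_time_query(table: str, commit_time: str) -> str:
    # commit_time is a Hudi commit instant, e.g. "20240115093000"
    return f"SELECT * FROM {table} WHERE _hoodie_commit_time <= '{commit_time}'"

query = hudi_point_in_time_query("my_hudi_table", "20240115093000")
```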

**Note**  
The performance of reads on Lake Formation clusters might be slower because of optimizations that are not supported. These features include file listing based on Hudi metadata, and data skipping. We recommend that you test your application performance to ensure that it meets your requirements.

# Working with Glue Data Catalog views in Amazon EMR
<a name="SECTION-jobs-glue-data-catalog-views-ec2"></a>

**Note**  
Creating and managing AWS Glue Data Catalog views for use with EMR on EC2 is available with [Amazon EMR release 7.10.0](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-7100-release.html) and later.

You can create and manage views in the AWS Glue Data Catalog for use with EMR on EC2. These are commonly known as AWS Glue Data Catalog views. These views are useful because they support multiple SQL query engines, so you can access the same view across different AWS services, such as EMR on EC2, Amazon Athena, and Amazon Redshift.

By creating a view in the Data Catalog, you can use resource grants and tag-based access controls in AWS Lake Formation to grant access to it. Using this method of access control, you don't have to configure additional access to the tables you referenced when creating the view. This method of granting permissions is called definer semantics, and these views are called definer views. For more information about access control in Lake Formation, see [Granting and revoking permissions on Data Catalog resources](https://docs.aws.amazon.com/lake-formation/latest/dg/granting-catalog-permissions.html) in the AWS Lake Formation Developer Guide.

Data Catalog views are useful for the following use cases:
+ **Granular access control** – You can create a view that restricts data access based on the permissions the user needs. For example, you can use views in the Data Catalog to prevent employees who don’t work in the HR department from seeing personally identifiable information (PII).
+ **Complete view definition** – By applying filters on your view in the Data Catalog, you ensure that the data records available through the view are always complete.
+ **Enhanced security** – The query definition used to create the view must be complete. This means that views in the Data Catalog are less susceptible to malicious SQL commands.
+ **Simple data sharing** – Share data with other AWS accounts without moving data. For more information, see [Cross-account data sharing in Lake Formation](https://docs.aws.amazon.com/lake-formation/latest/dg/cross-account-permissions.html).

## Creating a Data Catalog view
<a name="SECTION-jobs-glue-data-catalog-views-create-ec2"></a>

There are different ways to create a Data Catalog view, including with Spark SQL or the AWS CLI. A few examples follow.

------
#### [ Using SQL ]

The following shows the syntax for creating a Data Catalog view. Note the `MULTI DIALECT` view type. This distinguishes the Data Catalog view from other views. The `SECURITY` predicate is specified as `DEFINER`. This indicates a Data Catalog view with `DEFINER` semantics.

```
CREATE [ OR REPLACE ] PROTECTED MULTI DIALECT VIEW [IF NOT EXISTS] view_name
[(column_name [COMMENT column_comment], ...) ]
[ COMMENT view_comment ]
[TBLPROPERTIES (property_name = property_value, ... )]
SECURITY DEFINER
AS query;
```

The following is a sample `CREATE` statement, following the syntax:

```
CREATE PROTECTED MULTI DIALECT VIEW catalog_view
SECURITY DEFINER
AS
SELECT order_date, sum(totalprice) AS price
FROM source_table
GROUP BY order_date
```

You can also create a view in dry-run mode using SQL to test view creation without actually creating the resource. A dry run validates the input and, if the validation succeeds, returns the JSON of the AWS Glue table object that would represent the view. The actual view isn't created.

```
CREATE [ OR REPLACE ] PROTECTED MULTI DIALECT VIEW view_name
SECURITY DEFINER 
[ SHOW VIEW JSON ]
AS view-sql
```

------
#### [ Using the AWS CLI ]

**Note**  
When you use the CLI command, the SQL used to create the view isn't parsed. This can result in a case where the view is created, but queries aren't successful. Be sure to test your SQL syntax prior to creating the view.

You use the following CLI command to create a view:

```
aws glue create-table --cli-input-json '{
  "DatabaseName": "database",
  "TableInput": {
    "Name": "view",
    "StorageDescriptor": {
      "Columns": [
        {
          "Name": "col1",
          "Type": "data-type"
        },
        ...
        {
          "Name": "col_n",
          "Type": "data-type"
        }
      ],
      "SerdeInfo": {}
    },
    "ViewDefinition": {
      "SubObjects": [
        "arn:aws:glue:aws-region:aws-account-id:table/database/referenced-table1",
        ...
        "arn:aws:glue:aws-region:aws-account-id:table/database/referenced-tableN",
       ],
      "IsProtected": true,
      "Representations": [
        {
          "Dialect": "SPARK",
          "DialectVersion": "1.0",
          "ViewOriginalText": "Spark-SQL",
          "ViewExpandedText": "Spark-SQL"
        }
      ]
    }
  }
}'
```

------
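The CLI request body above can also be assembled programmatically. The following sketch mirrors that JSON; the database, view, column, and table ARN values are placeholders, and, as noted, AWS Glue doesn't parse the Spark SQL, so test the query before creating the view.

```python
def build_view_input(database, view_name, columns, spark_sql, subobject_arns):
    """Sketch: assemble a CreateTable input for a Data Catalog view, mirroring
    the CLI JSON above. `columns` is a list of (name, type) pairs."""
    return {
        "DatabaseName": database,
        "TableInput": {
            "Name": view_name,
            "StorageDescriptor": {
                "Columns": [{"Name": name, "Type": col_type} for name, col_type in columns],
                "SerdeInfo": {},
            },
            "ViewDefinition": {
                "SubObjects": subobject_arns,
                "IsProtected": True,  # only PROTECTED views are supported
                "Representations": [
                    {
                        "Dialect": "SPARK",
                        "DialectVersion": "1.0",
                        "ViewOriginalText": spark_sql,
                        "ViewExpandedText": spark_sql,
                    }
                ],
            },
        },
    }

view_input = build_view_input(
    "my_database",
    "catalog_view",
    [("order_date", "date"), ("price", "double")],
    "SELECT order_date, sum(totalprice) AS price FROM source_table GROUP BY order_date",
    ["arn:aws:glue:us-east-1:123456789012:table/my_database/source_table"],  # placeholder ARN
)
# With boto3 and AWS credentials configured:
# import boto3
# boto3.client("glue").create_table(**view_input)
```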

## Supported view operations
<a name="SECTION-jobs-glue-data-catalog-views-supported-operations-ec2"></a>

The following command fragments show you various ways to work with Data Catalog views:
+ **CREATE VIEW**

  Creates a Data Catalog view. The following is a sample that shows creating a view from an existing table:

  ```
  CREATE PROTECTED MULTI DIALECT VIEW catalog_view 
  SECURITY DEFINER AS SELECT * FROM my_catalog.my_database.source_table
  ```
+ **ALTER VIEW**

  Available syntax:
  + `ALTER VIEW view_name [FORCE] ADD DIALECT AS query`
  + `ALTER VIEW view_name [FORCE] UPDATE DIALECT AS query`
  + `ALTER VIEW view_name DROP DIALECT`

  You can use the `FORCE ADD DIALECT` option to force update the schema and sub objects as per the new engine dialect. Note that doing this can result in query errors if you don't also use `FORCE` to update other engine dialects. The following shows a sample:

  ```
  ALTER VIEW catalog_view FORCE ADD DIALECT
  AS
  SELECT order_date, sum(totalprice) AS price
  FROM source_table
  GROUP BY order_date;
  ```

  The following shows how to alter a view in order to update the dialect:

  ```
  ALTER VIEW catalog_view UPDATE DIALECT AS 
  SELECT count(*) FROM my_catalog.my_database.source_table;
  ```
+ **DESCRIBE VIEW**

  Available syntax for describing a view:
  + `SHOW COLUMNS {FROM|IN} view_name [{FROM|IN} database_name]` – If the user has the required AWS Glue and Lake Formation permissions to describe the view, they can list the columns. The following shows a couple sample commands for showing columns:

    ```
    SHOW COLUMNS FROM my_database.catalog_view;
    SHOW COLUMNS IN my_database.catalog_view;
    ```
  + `DESCRIBE view_name` – If the user has the required AWS Glue and Lake Formation permissions to describe the view, they can list the columns in the view along with its metadata.
+ **DROP VIEW**

  Available syntax:
  + `DROP VIEW [ IF EXISTS ] view_name`

    The following sample shows a `DROP` statement that tests if a view exists prior to dropping it:

    ```
    DROP VIEW IF EXISTS catalog_view;
    ```
+ **SHOW CREATE VIEW**
  + `SHOW CREATE VIEW view_name` – Shows the SQL statement that creates the specified view. The following sample shows the command and its output for a Data Catalog view:

    ```
    SHOW CREATE TABLE my_database.catalog_view;
    CREATE PROTECTED MULTI DIALECT VIEW my_catalog.my_database.catalog_view (
      net_profit,
      customer_id,
      item_id,
      sold_date)
    TBLPROPERTIES (
      'transient_lastDdlTime' = '1736267222')
    SECURITY DEFINER AS SELECT * FROM
    my_database.store_sales_partitioned_lf WHERE customer_id IN (SELECT customer_id from source_table limit 10)
    ```
+ **SHOW VIEWS**

  Lists all views in the catalog, such as regular views, multi-dialect views (MDV), and MDVs without a Spark dialect. Available syntax is the following:
  + `SHOW VIEWS [{ FROM | IN } database_name] [LIKE regex_pattern]`:

    The following shows a sample command to show views:

    ```
    SHOW VIEWS IN marketing_analytics LIKE 'catalog_view*';
    ```

For more information about creating and configuring Data Catalog views, see [Building AWS Glue Data Catalog views](https://docs.aws.amazon.com/lake-formation/latest/dg/working-with-views.html) in the AWS Lake Formation Developer Guide.

## Querying a Data Catalog view
<a name="SECTION-jobs-glue-data-catalog-views-querying-ec2"></a>

 After creating a Data Catalog view, you can query it using an Amazon EMR Spark job that has AWS Lake Formation fine-grained access control enabled. The job runtime role must have the Lake Formation `SELECT` permission on the Data Catalog view. You don't need to grant access to the underlying tables referenced in the view. 

Once you have everything set up, you can query your view. For example, after creating an Amazon EMR application in EMR Studio, you can run the following query to access a view.

```
SELECT * from my_database.catalog_view LIMIT 10;
```

A helpful function is `invoker_principal()`. It returns the unique identifier of the EMR job runtime role, which you can use to control the view output based on the invoking principal. For example, you can add a condition in your view that refines query results based on the calling role. The job runtime role must have permission to the `lakeformation:GetDataLakePrincipal` IAM action to use this function.

```
select invoker_principal();
```

You can add this function to a `WHERE` clause, for instance, to refine query results.
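As an illustration, the following sketch composes a hypothetical view definition whose `WHERE` clause filters on `invoker_principal()`. The role identifier is assumed here to be ARN-shaped; treat it and the table and column names as placeholders.

```python
# Sketch: build a view definition that refines results based on the calling
# role, using the invoker_principal() function described above. All
# identifiers are hypothetical placeholders.
def role_filtered_view_sql(view_name: str, source: str, role_identifier: str) -> str:
    return (
        f"CREATE PROTECTED MULTI DIALECT VIEW {view_name}\n"
        f"SECURITY DEFINER AS\n"
        f"SELECT * FROM {source}\n"
        f"WHERE invoker_principal() = '{role_identifier}'"
    )

view_sql = role_filtered_view_sql(
    "filtered_view",
    "my_database.source_table",
    "arn:aws:iam::123456789012:role/analyst-role",  # placeholder
)
```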

## Considerations and limitations
<a name="SECTION-jobs-glue-data-catalog-views-considerations-ec2"></a>

When you create Data Catalog views, the following apply:
+ You can only create Data Catalog views with Amazon EMR 7.10 and above.
+ The Data Catalog view definer must have `SELECT` access to the underlying base tables accessed by the view. Creating the Data Catalog view fails if a specific base table has any Lake Formation filters imposed on the definer role.
+ Base tables must not have the `IAMAllowedPrincipals` data lake permission in Lake Formation. If present, the error *Multi Dialect views may only reference tables without IAMAllowedPrincipals permissions* occurs.
+ The table's Amazon S3 location must be registered as a Lake Formation data lake location. If the table isn't registered, the error *Multi Dialect views may only reference Lake Formation managed tables* occurs. For information about how to register Amazon S3 locations in Lake Formation, see [Registering an Amazon S3 location](https://docs.aws.amazon.com/lake-formation/latest/dg/register-location.html) in the AWS Lake Formation Developer Guide.
+ You can only create `PROTECTED` Data Catalog views. `UNPROTECTED` views aren't supported.
+ You can't reference tables in another AWS account in a Data Catalog view definition. You also can't reference a table in the same account that's in a separate region.
+ To share data across an account or Region, the entire view must be shared cross-account and cross-Region using Lake Formation resource links.
+ User-defined functions (UDFs) aren't supported.
+ You can use views based on Iceberg tables. The open-table formats Apache Hudi and Delta Lake are also supported.
+ You can't reference other views in Data Catalog views.
+ An AWS Glue Data Catalog view schema is always stored using lowercase. For example, if you use a DDL statement to create a Glue Data Catalog view with a column named `Castle`, the column created in the Glue Data Catalog will be made lowercase, to `castle`. If you then specify the column name in a DML query as `Castle` or `CASTLE`, EMR Spark will make the name lowercase for you in order to run the query. But the column heading displays using the casing that you specified in the query. 

  If you want a query to fail in a case where a column name specified in the DML query does not match the column name in the Glue Data Catalog, you can set `spark.sql.caseSensitive=true`.
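The lowercase behavior described above can be sketched as follows; the schema and column names are illustrative stand-ins for what the Glue Data Catalog would store.

```python
# Sketch: Glue stores view columns lowercase, and (unless
# spark.sql.caseSensitive=true) EMR Spark lowercases the column names you
# write in DML before matching them against the schema.
glue_schema = ["castle", "moat_depth"]  # as stored by the Data Catalog

def resolve_column(user_name: str, schema=glue_schema) -> str:
    """Mimic the case-insensitive name resolution: lowercase, then match."""
    lowered = user_name.lower()
    if lowered not in schema:
        raise KeyError(f"no column {user_name!r} in schema")
    return lowered

assert resolve_column("Castle") == "castle"
assert resolve_column("CASTLE") == "castle"
```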

# Considerations for Amazon EMR with Lake Formation
<a name="emr-lf-limitations-cont"></a>

Amazon EMR with Lake Formation is available in all [supported Regions](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-region.html).

## Considerations for Amazon EMR with Lake Formation for version 7.9 and earlier
<a name="emr-lf-limitations-early"></a>

Consider the following when using AWS Lake Formation on EMR 7.9 and earlier versions.
+ [Fine-grained access control](emr-lf-enable.md#emr-lf-fgac-perms) at row, column, and cell level is available on clusters with Amazon EMR releases 6.15 and higher.
+ Users with access to a table can access all the properties of that table. If you have Lake Formation based access control on a table, review the table to make sure that the properties don't contain any sensitive data or information.
+ Amazon EMR clusters with Lake Formation don't support Spark's fallback to HDFS when Spark collects table statistics, which ordinarily helps optimize query performance.
+ Operations that support access controls based on Lake Formation with non-governed Apache Spark tables include `INSERT INTO` and `INSERT OVERWRITE`.
+ Operations that support access controls based on Lake Formation with Apache Spark and Apache Hive include `SELECT`, `DESCRIBE`, `SHOW DATABASE`, `SHOW TABLE`, `SHOW COLUMN`, and `SHOW PARTITION`.
+ Amazon EMR doesn't support access control to the following Lake Formation based operations: 
  + Writes to governed tables
  + `CREATE TABLE` statements. However, Amazon EMR 6.10.0 and higher supports `ALTER TABLE`.
  + DML statements other than `INSERT` commands.
+ There are performance differences between the same query with and without Lake Formation based access control.
+ You can only use Amazon EMR with Lake Formation for Spark jobs.
+ Trusted Identity propagation is not supported with multi-catalog hierarchy in Glue Data Catalog. For more information, see [Working with a multi-catalog hierarchy in AWS Glue Data Catalog](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-multi-catalog.html).

## Considerations for Amazon EMR with Lake Formation for version 7.10 and later
<a name="emr-lf-limitations"></a>

Consider the following when using Amazon EMR with AWS Lake Formation on EMR 7.10 and later versions.
+ Amazon EMR supports fine-grained access control via Lake Formation only for Apache Hive, Apache Iceberg, Delta Lake, and Apache Hudi tables. Supported Apache Hive formats include Parquet, ORC, and xSV (CSV, TSV).
+ For Lake Formation–enabled applications, Spark logs are written to Amazon S3 in two groups: system space logs and user space logs. System space logs may contain sensitive information such as the full table schema. To safeguard this data, Amazon EMR stores system space logs in a separate location from user space logs. It is strongly recommended that account administrators do not grant users access to system space logs.
+ If you register a table location with Lake Formation, data access will be controlled exclusively by the permissions of the role used for registration, rather than by the Amazon EMR job runtime role. If the registration role is misconfigured, jobs that attempt to access the table will fail.
+ You can't turn off `DynamicResourceAllocation` for Lake Formation jobs.
+ You can only use Lake Formation with Spark jobs.
+ Amazon EMR with Lake Formation only supports a single Spark session throughout a job.
+ Amazon EMR with Lake Formation only supports cross-account table queries shared through resource links.
+ The following aren't supported:
  + Resilient distributed datasets (RDD)
  + Spark streaming
  + Write with Lake Formation granted permissions
  + Access control for nested columns
+ Amazon EMR blocks functionality that might undermine the complete isolation of the system driver, including the following:
  + UDTs, HiveUDFs, and any user-defined function that involves custom classes
  + Custom data sources
  + Supply of additional jars for Spark extension, connector, or metastore
  + `ANALYZE TABLE` command
+ To enforce access controls, `EXPLAIN PLAN` and DDL operations such as `DESCRIBE TABLE` don't expose restricted information.
+ Amazon EMR restricts access to system driver Spark logs on Lake Formation-enabled applications. Since the system driver runs with elevated permissions, events and logs that the system driver generates can include sensitive information. To prevent unauthorized users or code from accessing this sensitive data, Amazon EMR disables access to system driver logs.

  System profile logs are always persisted in managed storage – this is a mandatory setting that cannot be disabled. These logs are stored securely and encrypted using either a Customer Managed KMS key or an AWS Managed KMS key. 

  If your Amazon EMR application is in a private subnet with VPC endpoints for Amazon S3 and you attach an endpoint policy to control access, you must include the permissions detailed in [Managed storage](logging.html#jobs-log-storage-managed-storage) in the policy for your S3 gateway endpoint before your jobs can send log data to AWS managed Amazon S3. For troubleshooting requests, contact AWS Support.
+ If you registered a table location with Lake Formation, the data access path goes through the Lake Formation stored credentials regardless of the IAM permission for the Amazon EMR job runtime role. If you misconfigure the role registered with table location, jobs submitted that use the role with S3 IAM permission to the table location will fail.
+ Writing to a Lake Formation table uses IAM permission rather than Lake Formation granted permissions. If your job runtime role has the necessary S3 permissions, you can use it to run write operations.

The following are considerations and limitations when using Apache Iceberg:
+ You can only use Apache Iceberg with session catalog and not arbitrarily named catalogs.
+ Iceberg tables that are registered in Lake Formation only support the metadata tables `history`, `metadata_log_entries`, `snapshots`, `files`, `manifests`, and `refs`. Amazon EMR hides the columns that might have sensitive data, such as `partitions`, `path`, and `summaries`. This limitation doesn't apply to Iceberg tables that aren't registered in Lake Formation.
+ Tables that you don't register in Lake Formation support all Iceberg stored procedures. The `register_table` and `migrate` procedures aren't supported for any tables.
+ We recommend that you use Iceberg DataFrameWriterV2 instead of V1.
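
The V2 writer recommendation can be sketched as follows. This is an illustrative fragment, not a runnable program: it assumes a Spark session on an Amazon EMR cluster with Lake Formation enabled, an existing Iceberg table in the session catalog, and hypothetical names:

```
# DataFrameWriterV2 (recommended): explicit, catalog-aware append
df.writeTo("examples_db.events").append()

# DataFrameWriterV1 equivalent (not recommended for these tables):
# df.write.mode("append").insertInto("examples_db.events")
```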

## Considerations for Amazon EMR with Lake Formation for version 7.12 and later
<a name="emr-lf-limit-712"></a>

### General
<a name="emr-lf-limits-g"></a>

Review the following limitations when using Lake Formation with Amazon EMR.
+ You can't turn off `DynamicResourceAllocation` for Lake Formation jobs.
+ You can only use Lake Formation with Spark jobs.
+ Amazon EMR with Lake Formation only supports a single Spark session throughout a job.
+ Amazon EMR with Lake Formation only supports cross-account table queries shared through resource links.
+ The following aren't supported:
  + Resilient distributed datasets (RDD)
  + Spark streaming
  + Access control for nested columns
+ Amazon EMR blocks functionality that might undermine the complete isolation of the system driver, including the following:
  + UDTs, HiveUDFs, and any user-defined function that involves custom classes
  + Custom data sources
  + Supply of additional jars for Spark extension, connector, or metastore
  + `ANALYZE TABLE` command
+ If your Amazon EMR application is in a private subnet with VPC endpoints for Amazon S3 and you attach an endpoint policy to control access, you must include the permissions detailed in [Managed storage](logging.html#jobs-log-storage-managed-storage) in the policy for your S3 gateway endpoint before your jobs can send log data to AWS managed Amazon S3. For troubleshooting requests, contact AWS Support.
+ Starting with Amazon EMR 7.9.0, Spark FGAC supports S3AFileSystem when used with the s3a:// scheme.
+ Amazon EMR 7.11 supports creating managed tables using CTAS.
+ Amazon EMR 7.12 supports creating managed and external tables using CTAS.

## Permissions
<a name="emr-lf-permissions"></a>
+ To enforce access controls, `EXPLAIN PLAN` and DDL operations such as `DESCRIBE TABLE` don't expose restricted information.
+ When you register a table location with Lake Formation, data access uses Lake Formation stored credentials instead of the Amazon EMR job runtime role's IAM permissions. Jobs will fail if the registered role for the table location is misconfigured, even when the runtime role has S3 IAM permissions for that location.
+ Starting with Amazon EMR 7.12, you can write to existing Hive and Iceberg tables using DataFrameWriter (V2) with Lake Formation credentials in append mode. For overwrite operations or when creating new tables, EMR uses the runtime role credentials to modify table data.
+ The following limitations apply when using views or cached tables as source data (these limitations do not apply to AWS Glue Data Catalog views):
  + For MERGE, DELETE, and UPDATE operations:
    + Supported: Using views and cached tables as source tables.
    + Not supported: Using views and cached tables in assignment and condition clauses.
  + For CREATE OR REPLACE and REPLACE TABLE AS SELECT operations:
    + Not supported: Using views and cached tables as source tables.
+ Delta Lake tables with UDFs in source data support MERGE, DELETE, and UPDATE operations only when deletion vector is enabled.
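
To illustrate the view limitations above, the following Spark SQL sketch uses hypothetical names. Using a view as the MERGE source is supported; referencing a view inside a condition or assignment clause is not:

```
-- Supported: a view as the source table of a MERGE
MERGE INTO examples_db.target t
USING examples_db.updates_view s
  ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.status = s.status;

-- Not supported: a view referenced inside a condition clause, for example
-- WHEN MATCHED AND t.id IN (SELECT id FROM examples_db.other_view) THEN ...
```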

## Logs and debugging
<a name="emr-lf-logs-debugging"></a>
+ Amazon EMR restricts access to system driver Spark logs on Lake Formation-enabled applications. Since the system driver runs with elevated permissions, events and logs that the system driver generates can include sensitive information. To prevent unauthorized users or code from accessing this sensitive data, Amazon EMR disables access to system driver logs.

  System profile logs are always persisted in managed storage – this is a mandatory setting that cannot be disabled. These logs are stored securely and encrypted using either a Customer Managed KMS key or an AWS Managed KMS key. 

## Iceberg
<a name="emr-lf-iceberg-considerations"></a>

Review the following considerations when using Apache Iceberg:
+ You can only use Apache Iceberg with session catalog and not arbitrarily named catalogs.
+ Iceberg tables that are registered in Lake Formation only support the metadata tables `history`, `metadata_log_entries`, `snapshots`, `files`, `manifests`, and `refs`. Amazon EMR hides the columns that might have sensitive data, such as `partitions`, `path`, and `summaries`. This limitation doesn't apply to Iceberg tables that aren't registered in Lake Formation.
+ Tables that are not registered in Lake Formation support all Iceberg stored procedures. The `register_table` and `migrate` procedures aren't supported for any tables.
+ We suggest that you use Iceberg DataFrameWriterV2 instead of V1.

# Spark native fine-grained access control allow-listed PySpark API
<a name="clean-rooms-spark-fgac-pyspark-api-allowlist"></a>

To maintain security and data access controls, Spark fine-grained access control (FGAC) restricts certain PySpark functions. These restrictions are enforced through:
+ Explicit blocking that prevents function execution
+ Architecture incompatibilities that make functions non-functional
+ Functions that may throw errors, return access denied messages, or do nothing when called

The following PySpark features aren't supported in Spark FGAC:
+ RDD operations (blocked with SparkRDDUnsupportedException)
+ Spark Connect (unsupported)
+ Spark Streaming (unsupported)

While we've tested the listed functions in a Native Spark FGAC environment and confirmed they work as expected, our testing typically covers only basic usage of each API. Functions with multiple input types or complex logic paths may have untested scenarios.

For any functions not listed here and not clearly part of the unsupported categories above, we recommend:
+ Testing them first in a test environment or small-scale deployment
+ Verifying their behavior before using them in production

**Note**  
If you see a class method listed but not its base class, the method should still work; it just means we haven't explicitly verified the base class constructor.

The PySpark API is organized into modules. General support for methods within each module is detailed in the table below.


| Module name | Status | Notes | 
| --- | --- | --- | 
|  pyspark.core  |  Supported  |  This module contains the main RDD classes, and these functions are mostly unsupported.  | 
|  pyspark.sql  |  Supported  |  | 
|  pyspark.testing  |  Supported  |  | 
|  pyspark.resource  |  Supported  |  | 
|  pyspark.streaming  |  Blocked  |  Streaming usage is blocked in Spark FGAC.  | 
|  pyspark.mllib  |  Experimental  |  This module contains RDD based ML operations, and these functions are mostly unsupported. This module isn't thoroughly tested.  | 
|  pyspark.ml  |  Experimental  |  This module contains DataFrame based ML operations, and these functions are mostly supported. This module isn't thoroughly tested.  | 
|  pyspark.pandas  |  Supported  |    | 
|  pyspark.pandas.slow  |  Supported  |    | 
| pyspark.connect |  Blocked  |  Spark Connect usage is blocked in Spark FGAC.  | 
| pyspark.pandas.connect |  Blocked  |  Spark Connect usage is blocked in Spark FGAC.  | 
| pyspark.pandas.slow.connect |  Blocked  |  Spark Connect usage is blocked in Spark FGAC.  | 
| pyspark.errors |  Experimental  |  This module is not thoroughly tested. Custom error classes can't be utilized.  | 

**API Allowlist**

For a downloadable, easier-to-search list, a file with the modules and classes is available at [Python functions allowed in Native FGAC](samples/Python functions allowed in Native FGAC.zip).

# Lake Formation full table access for Amazon EMR on EC2
<a name="lake-formation-unfiltered-ec2-access"></a>

With Amazon EMR releases 7.8.0 and higher, you can leverage AWS Lake Formation with Glue Data Catalog where the job runtime role has full table permissions without the limitations of fine-grained access control. This capability allows you to read and write to tables that are protected by Lake Formation from your Amazon EMR on EC2 Spark batch and interactive jobs. See the following sections to learn more about Lake Formation and how to use it with Amazon EMR on EC2.

## Using Lake Formation with full table access
<a name="lake-formation-unfiltered-ec2-full-access"></a>

You can access AWS Lake Formation protected Glue Data Catalog tables from Amazon EMR on EC2 Spark jobs or interactive sessions where the job's runtime role has full table access. You do not need to enable AWS Lake Formation on the Amazon EMR on EC2 application. When a Spark job is configured for Full Table Access (FTA), AWS Lake Formation credentials are used to read and write S3 data for AWS Lake Formation registered tables, while the job's runtime role credentials are used to read and write tables not registered with AWS Lake Formation.

**Important**  
Do not enable AWS Lake Formation for fine-grained access control. A job cannot simultaneously run Full Table Access (FTA) and Fine-Grained Access Control (FGAC) on the same EMR cluster or application.

### Step 1: Enable Full Table Access in Lake Formation
<a name="lake-formation-unfiltered-ec2-full-table-access"></a>

To use Full Table Access (FTA) mode, you must allow third-party query engines to access data without the IAM session tag validation in AWS Lake Formation. To enable it, follow the steps in [Application integration for full table access](https://docs.aws.amazon.com/lake-formation/latest/dg/full-table-credential-vending.html).

**Note**  
When accessing cross-account tables, full table access must be enabled in both the producer and consumer accounts. Similarly, when accessing cross-Region tables, the setting must be enabled in both the producer and consumer Regions.

### Step 2: Setup IAM permissions for job runtime role
<a name="lake-formation-unfiltered-ec2-iam-permissions"></a>

For read or write access to underlying data, in addition to Lake Formation permissions, a job runtime role needs the `lakeformation:GetDataAccess` IAM permission. With this permission, Lake Formation grants the request for temporary credentials to access the data.

The following example policy shows how to provide IAM permissions for accessing a script in Amazon S3, uploading logs to Amazon S3, calling AWS Glue APIs, and accessing Lake Formation.
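
The following is a minimal sketch of such a policy. The bucket name, paths, and resource scoping are placeholders; scope the actions and resources down to what your jobs actually need:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ScriptAndLogAccess",
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket/scripts/*",
                "arn:aws:s3:::amzn-s3-demo-bucket/logs/*"
            ]
        },
        {
            "Sid": "GlueCatalogAccess",
            "Effect": "Allow",
            "Action": ["glue:GetDatabase", "glue:GetTable", "glue:GetPartitions"],
            "Resource": "*"
        },
        {
            "Sid": "LakeFormationDataAccess",
            "Effect": "Allow",
            "Action": "lakeformation:GetDataAccess",
            "Resource": "*"
        }
    ]
}
```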

#### Step 2.1 Configure Lake Formation permissions
<a name="lake-formation-unfiltered-ec2-permission-model"></a>
+ Spark jobs that read data from S3 require Lake Formation SELECT permission.
+ Spark jobs that write/delete data in S3 require Lake Formation ALL (SUPER) permission.
+ Spark jobs that interact with the Glue Data Catalog require DESCRIBE, ALTER, and DROP permissions as appropriate.

For more information, refer to [Granting permissions on Data Catalog resources](https://docs.aws.amazon.com/lake-formation/latest/dg/granting-catalog-permissions.html).

### Step 3: Initialize a Spark session for Full Table Access using Lake Formation
<a name="lake-formation-unfiltered-ec2-spark-session"></a>

#### Prerequisites
<a name="lake-formation-unfiltered-ec2-spark-session-prereq"></a>

AWS Glue Data Catalog must be configured as a metastore to access Lake Formation tables.

Use the following settings to configure the Glue Data Catalog as a metastore:

```
--conf spark.sql.catalogImplementation=hive
--conf spark.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory
```

For more information on enabling Data Catalog for Amazon EMR on EC2, refer to [Metastore configuration for Amazon EMR on EC2](metastore-config.html).

To access tables registered with AWS Lake Formation, set the following configurations during Spark initialization so that Spark uses AWS Lake Formation credentials.

------
#### [ Hive ]

```
--conf spark.hadoop.fs.s3.credentialsResolverClass=com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver
--conf spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true 
--conf spark.hadoop.fs.s3.folderObject.autoAction.disabled=true
--conf spark.sql.catalog.skipLocationValidationOnCreateTable.enabled=true
--conf spark.sql.catalog.createDirectoryAfterTable.enabled=true
--conf spark.sql.catalog.dropDirectoryBeforeTable.enabled=true
```

------
#### [ Iceberg ]

```
--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
--conf spark.sql.catalog.spark_catalog.warehouse=S3_DATA_LOCATION
--conf spark.sql.catalog.spark_catalog.client.region=REGION
--conf spark.sql.catalog.spark_catalog.type=glue
--conf spark.sql.catalog.spark_catalog.glue.account-id=ACCOUNT_ID
--conf spark.sql.catalog.spark_catalog.glue.lakeformation-enabled=true
--conf spark.sql.catalog.dropDirectoryBeforeTable.enabled=true
```

------
#### [ Delta Lake ]

```
--conf spark.hadoop.fs.s3.credentialsResolverClass=com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver
--conf spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true 
--conf spark.hadoop.fs.s3.folderObject.autoAction.disabled=true
--conf spark.sql.catalog.skipLocationValidationOnCreateTable.enabled=true
--conf spark.sql.catalog.createDirectoryAfterTable.enabled=true
--conf spark.sql.catalog.dropDirectoryBeforeTable.enabled=true
```

------
#### [ Hudi ]

```
--conf spark.hadoop.fs.s3.credentialsResolverClass=com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver
--conf spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true 
--conf spark.hadoop.fs.s3.folderObject.autoAction.disabled=true
--conf spark.sql.catalog.skipLocationValidationOnCreateTable.enabled=true
--conf spark.sql.catalog.createDirectoryAfterTable.enabled=true
--conf spark.sql.catalog.dropDirectoryBeforeTable.enabled=true
--conf spark.jars=/usr/lib/hudi/hudi-spark-bundle.jar
--conf spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension
--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog
--conf spark.serializer=org.apache.spark.serializer.KryoSerializer
```

------
+ `spark.hadoop.fs.s3.credentialsResolverClass=com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver`: Configures the EMR File System (EMRFS) or S3A to use AWS Lake Formation S3 credentials for Lake Formation registered tables. If the table is not registered, the job's runtime role credentials are used.
+ `spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true` and `spark.hadoop.fs.s3.folderObject.autoAction.disabled=true`: Configure EMRFS to use the content type header `application/x-directory` instead of the `$folder$` suffix when creating S3 folders. This is required when reading Lake Formation tables, because Lake Formation credentials do not allow reading table folders with the `$folder$` suffix.
+ `spark.sql.catalog.skipLocationValidationOnCreateTable.enabled=true`: Configures Spark to skip validating that the table location is empty before creation. This is necessary for Lake Formation registered tables, because the Lake Formation credentials needed to verify the empty location are available only after the Glue Data Catalog table is created. Without this configuration, the job's runtime role credentials validate the empty table location.
+ `spark.sql.catalog.createDirectoryAfterTable.enabled=true`: Configures Spark to create the Amazon S3 folder after the table is created in the Hive metastore. This is required for Lake Formation registered tables, because the Lake Formation credentials needed to create the S3 folder are available only after the Glue Data Catalog table is created.
+ `spark.sql.catalog.dropDirectoryBeforeTable.enabled=true`: Configures Spark to drop the S3 folder before the table is deleted from the Hive metastore. This is necessary for Lake Formation registered tables, because the Lake Formation credentials needed to drop the S3 folder are not available after the table is deleted from the Glue Data Catalog.
+ `spark.sql.catalog.<catalog>.glue.lakeformation-enabled=true`: Configures the Iceberg catalog to use AWS Lake Formation S3 credentials for Lake Formation registered tables. If the table is not registered, the default environment credentials are used.
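
Putting the pieces together, here is a sketch of a spark-submit invocation for the Hive-table case. The script path is hypothetical, and the configuration lines are the same ones documented above; swap in the Iceberg, Delta Lake, or Hudi block as appropriate:

```
spark-submit \
  --conf spark.sql.catalogImplementation=hive \
  --conf spark.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory \
  --conf spark.hadoop.fs.s3.credentialsResolverClass=com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver \
  --conf spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true \
  --conf spark.hadoop.fs.s3.folderObject.autoAction.disabled=true \
  --conf spark.sql.catalog.skipLocationValidationOnCreateTable.enabled=true \
  --conf spark.sql.catalog.createDirectoryAfterTable.enabled=true \
  --conf spark.sql.catalog.dropDirectoryBeforeTable.enabled=true \
  s3://amzn-s3-demo-bucket/scripts/my_fta_job.py
```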

#### Configure full table access mode in SageMaker Unified Studio
<a name="lake-formation-unfiltered-ec2-full-table"></a>

To access Lake Formation registered tables from interactive Spark sessions in JupyterLab notebooks, use compatibility permission mode. Use the `%%configure` magic command to set up your Spark configuration. Choose the configuration based on your table type:

------
#### [ For Hive tables ]

```
%%configure -f
{
    "conf": {
        "spark.hadoop.fs.s3.credentialsResolverClass": "com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver",
        "spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject": true,
        "spark.hadoop.fs.s3.folderObject.autoAction.disabled": true,
        "spark.sql.catalog.skipLocationValidationOnCreateTable.enabled": true,
        "spark.sql.catalog.createDirectoryAfterTable.enabled": true,
        "spark.sql.catalog.dropDirectoryBeforeTable.enabled": true
    }
}
```

------
#### [ For Iceberg tables ]

```
%%configure -f
{
    "conf": {
        "spark.sql.catalog.spark_catalog": "org.apache.iceberg.spark.SparkSessionCatalog",
        "spark.sql.catalog.spark_catalog.warehouse": "S3_DATA_LOCATION",
        "spark.sql.catalog.spark_catalog.client.region": "REGION",
        "spark.sql.catalog.spark_catalog.type": "glue",
        "spark.sql.catalog.spark_catalog.glue.account-id": "ACCOUNT_ID",
        "spark.sql.catalog.spark_catalog.glue.lakeformation-enabled": "true",
        "spark.sql.catalog.dropDirectoryBeforeTable.enabled": "true"
    }
}
```

------
#### [ For Delta Lake tables ]

```
%%configure -f
{
    "conf": {
        "spark.hadoop.fs.s3.credentialsResolverClass": "com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver",
        "spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject": true,
        "spark.hadoop.fs.s3.folderObject.autoAction.disabled": true,
        "spark.sql.catalog.skipLocationValidationOnCreateTable.enabled": true,
        "spark.sql.catalog.createDirectoryAfterTable.enabled": true,
        "spark.sql.catalog.dropDirectoryBeforeTable.enabled": true
    }
}
```

------
#### [ For Hudi tables ]

```
%%configure -f
{
    "conf": {
        "spark.hadoop.fs.s3.credentialsResolverClass": "com.amazonaws.glue.accesscontrol.AWSLakeFormationCredentialResolver",
        "spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject": true,
        "spark.hadoop.fs.s3.folderObject.autoAction.disabled": true,
        "spark.sql.catalog.skipLocationValidationOnCreateTable.enabled": true,
        "spark.sql.catalog.createDirectoryAfterTable.enabled": true,
        "spark.sql.catalog.dropDirectoryBeforeTable.enabled": true,
        "spark.jars": "/usr/lib/hudi/hudi-spark-bundle.jar",
        "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
        "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer"
    }
}
```

------

Replace the placeholders:
+ `S3_DATA_LOCATION`: Your S3 bucket path
+ `REGION`: AWS region (e.g., us-east-1)
+ `ACCOUNT_ID`: Your AWS account ID

**Note**  
You must set these configurations before executing any Spark operations in your notebook.

#### Supported Operations
<a name="lake-formation-unfiltered-ec2-supported-operations"></a>

The following operations use AWS Lake Formation credentials to access table data.
+ CREATE TABLE
+ ALTER TABLE
+ INSERT INTO
+ INSERT OVERWRITE
+ UPDATE
+ MERGE INTO
+ DELETE FROM
+ ANALYZE TABLE
+ REPAIR TABLE
+ DROP TABLE
+ Spark datasource queries
+ Spark datasource writes

**Note**  
Operations not listed above will continue to use IAM permissions to access table data.

#### Considerations
<a name="considerations"></a>
+ If a Hive table is created using a job that doesn’t have full table access enabled, and no records are inserted, subsequent reads or writes from a job with full table access will fail. This is because EMR Spark without full table access adds the `$folder$` suffix to the table folder name. To resolve this, you can either:
  + Insert at least one row into the table from a job that does not have FTA enabled.
  + Configure the job that does not have FTA enabled to not use `$folder$` suffix in folder name in S3. This can be achieved by setting Spark configuration `spark.hadoop.fs.s3.useDirectoryHeaderAsFolderObject=true`.
  + Create an S3 folder at the table location `s3://path/to/table/table_name` using the Amazon S3 console or the AWS CLI.
+ Full Table Access is supported with the EMR Filesystem (EMRFS) starting in Amazon EMR release 7.8.0, and with the S3A filesystem starting in Amazon EMR release 7.10.0.
+ Full Table Access is supported for Hive, Iceberg, Delta, and Hudi tables.
+ **Hudi FTA Write Support considerations:**
  + Hudi FTA writes require using HoodieCredentialedHadoopStorage for credential vending during job execution. Set the following configuration when running Hudi jobs: `hoodie.storage.class=org.apache.spark.sql.hudi.storage.HoodieCredentialedHadoopStorage`
  + Full Table Access (FTA) write support for Hudi is available starting from Amazon EMR release 7.12.
  + Hudi FTA write support currently works only with the default Hudi configurations. Custom or non-default Hudi settings may not be fully supported and could result in unexpected behavior.
  + Clustering for Hudi Merge-On-Read (MOR) tables is not supported at this point under FTA write mode.
+ Jobs referencing tables with Lake Formation Fine-Grained Access Control (FGAC) rules or Glue Data Catalog views will fail. To query a table with FGAC rules or a Glue Data Catalog view, you must use FGAC mode. You can enable FGAC mode by following the steps in [Using Amazon EMR on EC2 with AWS Lake Formation for fine-grained access control](emr-serverless-lf-enable.html).
+ Full table access does not support Spark Streaming.
+ When writing Spark DataFrame to a Lake Formation table, only APPEND mode is supported for Hive and Iceberg tables: `df.write.mode("append").saveAsTable(table_name)`
+ Creating external tables requires IAM permissions.
+ Because Lake Formation temporarily caches credentials within a Spark job, a Spark batch job or interactive session that is currently running might not reflect permission changes.
+ You must use a user-defined role, not a service-linked role. For more information, see [Lake Formation requirements for roles](https://docs.aws.amazon.com/lake-formation/latest/dg/registration-role.html).
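
For the `$folder$` workaround described in the considerations above, the empty folder can be created with the AWS CLI. The bucket name and key are placeholders; a key ending in `/` creates a zero-byte folder marker:

```
aws s3api put-object \
    --bucket amzn-s3-demo-bucket \
    --key path/to/table/table_name/
```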

#### Hudi FTA Write Support - Supported Operations
<a name="hudi-fta-supported-operations"></a>

The following table shows the supported write operations for Hudi Copy-On-Write (COW) and Merge-On-Read (MOR) tables under Full Table Access mode:


**Hudi FTA Supported Write Operations**  

| Table Type | Operation | SQL Write Command | Status | 
| --- | --- | --- | --- | 
| COW | INSERT | INSERT INTO TABLE | Supported | 
| COW | INSERT | INSERT INTO TABLE - PARTITION (Static, Dynamic) | Supported | 
| COW | INSERT | INSERT OVERWRITE | Supported | 
| COW | INSERT | INSERT OVERWRITE - PARTITION (Static, Dynamic) | Supported | 
| COW | UPDATE | UPDATE TABLE | Supported | 
| COW | UPDATE | UPDATE TABLE - Change Partition | Not Supported | 
| COW | DELETE | DELETE FROM TABLE | Supported | 
| COW | ALTER | ALTER TABLE - RENAME TO | Not Supported | 
| COW | ALTER | ALTER TABLE - SET TBLPROPERTIES | Supported | 
| COW | ALTER | ALTER TABLE - UNSET TBLPROPERTIES | Supported | 
| COW | ALTER | ALTER TABLE - ALTER COLUMN | Supported | 
| COW | ALTER | ALTER TABLE - ADD COLUMNS | Supported | 
| COW | ALTER | ALTER TABLE - ADD PARTITION | Supported | 
| COW | ALTER | ALTER TABLE - DROP PARTITION | Supported | 
| COW | ALTER | ALTER TABLE - RECOVER PARTITIONS | Supported | 
| COW | ALTER | REPAIR TABLE SYNC PARTITIONS | Supported | 
| COW | DROP | DROP TABLE | Supported | 
| COW | DROP | DROP TABLE - PURGE | Supported | 
| COW | CREATE | CREATE TABLE - Managed | Supported | 
| COW | CREATE | CREATE TABLE - PARTITION BY | Supported | 
| COW | CREATE | CREATE TABLE IF NOT EXISTS | Supported | 
| COW | CREATE | CREATE TABLE LIKE | Supported | 
| COW | CREATE | CREATE TABLE AS SELECT | Supported | 
| COW | CREATE | CREATE TABLE with LOCATION - External Table | Not Supported | 
| COW | DATAFRAME(INSERT) | saveAsTable.Overwrite | Supported | 
| COW | DATAFRAME(INSERT) | saveAsTable.Append | Not Supported | 
| COW | DATAFRAME(INSERT) | saveAsTable.Ignore | Supported | 
| COW | DATAFRAME(INSERT) | saveAsTable.ErrorIfExists | Supported | 
| COW | DATAFRAME(INSERT) | saveAsTable - External table (Path) | Not Supported | 
| COW | DATAFRAME(INSERT) | save(path) - DF v1 | Not Supported | 
| MOR | INSERT | INSERT INTO TABLE | Supported | 
| MOR | INSERT | INSERT INTO TABLE - PARTITION (Static, Dynamic) | Supported | 
| MOR | INSERT | INSERT OVERWRITE | Supported | 
| MOR | INSERT | INSERT OVERWRITE - PARTITION (Static, Dynamic) | Supported | 
| MOR | UPDATE | UPDATE TABLE | Supported | 
| MOR | UPDATE | UPDATE TABLE - Change Partition | Not Supported | 
| MOR | DELETE | DELETE FROM TABLE | Supported | 
| MOR | ALTER | ALTER TABLE - RENAME TO | Not Supported | 
| MOR | ALTER | ALTER TABLE - SET TBLPROPERTIES | Supported | 
| MOR | ALTER | ALTER TABLE - UNSET TBLPROPERTIES | Supported | 
| MOR | ALTER | ALTER TABLE - ALTER COLUMN | Supported | 
| MOR | ALTER | ALTER TABLE - ADD COLUMNS | Supported | 
| MOR | ALTER | ALTER TABLE - ADD PARTITION | Supported | 
| MOR | ALTER | ALTER TABLE - DROP PARTITION | Supported | 
| MOR | ALTER | ALTER TABLE - RECOVER PARTITIONS | Supported | 
| MOR | ALTER | REPAIR TABLE SYNC PARTITIONS | Supported | 
| MOR | DROP | DROP TABLE | Supported | 
| MOR | DROP | DROP TABLE - PURGE | Supported | 
| MOR | CREATE | CREATE TABLE - Managed | Supported | 
| MOR | CREATE | CREATE TABLE - PARTITION BY | Supported | 
| MOR | CREATE | CREATE TABLE IF NOT EXISTS | Supported | 
| MOR | CREATE | CREATE TABLE LIKE | Supported | 
| MOR | CREATE | CREATE TABLE AS SELECT | Supported | 
| MOR | CREATE | CREATE TABLE with LOCATION - External Table | Not Supported | 
| MOR | DATAFRAME(UPSERT) | saveAsTable.Overwrite | Supported | 
| MOR | DATAFRAME(UPSERT) | saveAsTable.Append | Not Supported | 
| MOR | DATAFRAME(UPSERT) | saveAsTable.Ignore | Supported | 
| MOR | DATAFRAME(UPSERT) | saveAsTable.ErrorIfExists | Supported | 
| MOR | DATAFRAME(UPSERT) | saveAsTable - External table (Path) | Not Supported | 
| MOR | DATAFRAME(UPSERT) | save(path) - DF v1 | Not Supported | 
| MOR | DATAFRAME(DELETE) | saveAsTable.Append | Not Supported | 
| MOR | DATAFRAME(DELETE) | saveAsTable - External table (Path) | Not Supported | 
| MOR | DATAFRAME(DELETE) | save(path) - DF v1 | Not Supported | 
| MOR | DATAFRAME(BULK_INSERT) | saveAsTable.Overwrite | Supported | 
| MOR | DATAFRAME(BULK_INSERT) | saveAsTable.Append | Not Supported | 
| MOR | DATAFRAME(BULK_INSERT) | saveAsTable.Ignore | Supported | 
| MOR | DATAFRAME(BULK_INSERT) | saveAsTable.ErrorIfExists | Supported | 
| MOR | DATAFRAME(BULK_INSERT) | saveAsTable - External table (Path) | Not Supported | 
| MOR | DATAFRAME(BULK_INSERT) | save(path) - DF v1 | Not Supported | 

# Integrate Amazon EMR with Apache Ranger
<a name="emr-ranger"></a>

Beginning with Amazon EMR 5.32.0, you can launch a cluster that natively integrates with Apache Ranger. Apache Ranger is an open-source framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. For more information, see [Apache Ranger](https://ranger.apache.org/). With native integration, you can bring your own Apache Ranger to enforce fine-grained data access control on Amazon EMR.

This section provides a conceptual overview of Amazon EMR integration with Apache Ranger. It also includes the prerequisites and steps required to launch an Amazon EMR cluster integrated with Apache Ranger.

Natively integrating Amazon EMR with Apache Ranger provides the following key benefits: 
+ Fine-grained access control to Hive Metastore databases and tables, which enables you to define data filtering policies at the level of database, table, and column for Apache Spark and Apache Hive applications. Row-level filtering and data masking are supported with Hive applications.
+ The ability to use your existing Hive policies directly with Amazon EMR for Hive applications.
+ Access control to Amazon S3 data at the prefix and object level, which enables you to define data filtering policies for access to S3 data using the EMR File System.
+ The ability to use CloudWatch Logs for centralized auditing.
+ Amazon EMR installs and manages the Apache Ranger plugins on your behalf.

**Important**  
Amazon EMR does not support Apache Ranger integration starting with Amazon EMR release 7.4. For more information, see [Amazon EMR release 7.4.0](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-740-release.html).

# Apache Ranger with Amazon EMR
<a name="emr-ranger-overview"></a>

Apache Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform.

Apache Ranger has the following features:
+ Centralized security administration to manage all security related tasks in a central UI or using REST APIs.
+ Fine-grained authorization to do a specific action or operation with a Hadoop component or tool, managed through a central administration tool.
+ A standardized authorization method across all Hadoop components.
+ Enhanced support for various authorization methods.
+ Centralized auditing of user access and administrative actions (security related) within all the components of Hadoop.

Apache Ranger uses two key components for authorization: 
+ **Apache Ranger policy admin server** - This server allows you to define the authorization policies for Hadoop applications. When integrating with Amazon EMR, you can define and enforce policies for Apache Spark and Hive to access the Hive Metastore, and for access to Amazon S3 data through the [EMR File System (EMRFS)](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-fs). You can set up a new Apache Ranger policy admin server or use an existing one to integrate with Amazon EMR.
+ **Apache Ranger plugin** - This plugin validates the access of a user against the authorization policies defined in the Apache Ranger policy admin server. Amazon EMR installs and configures the Apache Ranger plugin automatically for each Hadoop application selected in the Apache Ranger configuration. 

**Topics**
+ [Architecture of Amazon EMR integration with Apache Ranger](emr-ranger-architecture.md)
+ [Amazon EMR components for use with Apache Ranger](emr-ranger-components.md)

# Architecture of Amazon EMR integration with Apache Ranger
<a name="emr-ranger-architecture"></a>

![\[Amazon EMR and Apache Ranger architecture diagram.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/emr-ranger-architecture.png)


# Amazon EMR components for use with Apache Ranger
<a name="emr-ranger-components"></a>

Amazon EMR enables fine-grained access control with Apache Ranger through the following components. See the [architecture diagram](emr-ranger-architecture.md) for a visual representation of these Amazon EMR components with the Apache Ranger plugins.

**Secret agent** – The secret agent securely stores secrets and distributes secrets to other Amazon EMR components or applications. The secrets can include temporary user credentials, encryption keys, or Kerberos tickets. The secret agent runs on every node in the cluster and intercepts calls to the Instance Metadata Service. For requests to the instance profile role credentials, the Secret Agent vends credentials depending on the requesting user and requested resources after authorizing the request with the EMRFS S3 Ranger plugin. The secret agent runs as the *`emrsecretagent`* user, and it writes logs to the /emr/secretagent/log directory. The process relies on a specific set of `iptables` rules to function. It is important to ensure that `iptables` is not disabled. If you customize `iptables` configuration, the NAT table rules must be preserved and left unaltered.

**EMR record server** – The record server receives requests to access data from Spark. It then authorizes requests by forwarding the requested resources to the Spark Ranger plugin for Amazon EMR. The record server reads data from Amazon S3 and returns filtered data that the user is authorized to access based on Ranger policy. The record server runs on every node in the cluster as the emr_record_server user and writes logs to the /var/log/emr-record-server directory.

# Considerations for using Amazon EMR with Apache Ranger
<a name="emr-ranger-app-support"></a>

## Supported applications for Amazon EMR with Apache Ranger
<a name="emr-ranger-app-support-list"></a>

The integration between Amazon EMR and Apache Ranger in which EMR installs Ranger plugins currently supports the following applications:
+ Apache Spark (Available with EMR 5.32.0 and EMR 6.3.0)
+ Apache Hive (Available with EMR 5.32.0 and EMR 6.3.0)
+ S3 Access through EMRFS (Available with EMR 5.32.0 and EMR 6.3.0)

The following applications can be installed on an EMR cluster and may need to be configured to meet your security needs:
+ Apache Hadoop (Available with EMR 5.32.0 and EMR 6.3.0, including YARN and HDFS)
+ Apache Livy (Available with EMR 5.32.0 and EMR 6.3.0)
+ Apache Zeppelin (Available with EMR 5.32.0 and EMR 6.3.0)
+ Apache Hue (Available with EMR 5.32.0 and EMR 6.3.0)
+ Ganglia (Available with EMR 5.32.0 and EMR 6.3.0)
+ HCatalog (Available with EMR 5.32.0 and EMR 6.3.0)
+ Mahout (Available with EMR 5.32.0 and EMR 6.3.0)
+ MXNet (Available with EMR 5.32.0 and EMR 6.3.0)
+ TensorFlow (Available with EMR 5.32.0 and EMR 6.3.0)
+ Tez (Available with EMR 5.32.0 and EMR 6.3.0)
+ Trino (Available with EMR 6.7.0)
+ ZooKeeper (Available with EMR 5.32.0 and EMR 6.3.0)

**Important**  
The applications listed above are the only applications that are currently supported. To ensure cluster security, when Apache Ranger is enabled you can create an EMR cluster only with applications from this list.  
Attempting to install any other application causes your cluster launch to be rejected.  
AWS Glue Data Catalog and Open table formats such as Apache Hudi, Delta Lake, and Apache Iceberg aren't supported.

**Supported Amazon EMR features with Apache Ranger**  
The following Amazon EMR features are supported when you use Amazon EMR with Apache Ranger:
+ Encryption at rest and in transit
+ Kerberos authentication (required)
+ Instance groups, instance fleets, and Spot Instances
+ Reconfiguration of applications on a running cluster
+ EMRFS server-side encryption (SSE)

**Note**  
Amazon EMR encryption settings govern SSE. For more information, see [Encryption Options](emr-data-encryption-options.md).

## Application limitations
<a name="emr-ranger-app-support-limitations"></a>

There are several limitations to keep in mind when you integrate Amazon EMR and Apache Ranger:
+ You cannot currently use the console to create a security configuration that specifies the AWS Ranger integration option in the AWS GovCloud (US) Region. Create the security configuration using the CLI instead.
+ Kerberos must be installed on your cluster.
+ Application UIs (user interfaces) such as the YARN Resource Manager UI, HDFS NameNode UI, and Livy UI are not set with authentication by default.
+ The HDFS default `umask` is configured so that newly created objects are world-readable by default.
+ Amazon EMR doesn't support high-availability (multiple primary) mode with Apache Ranger.
+ For additional limitations, see limitations for each application.


## Plugin limitations
<a name="plugin-limitations"></a>

Each plugin has specific limitations. For the Apache Hive plugin's limitations, see [Apache Hive plugin limitations](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ranger-hive.html#emr-ranger-hive-limitations). For the Apache Spark plugin's limitations, see [Apache Spark plugin limitations](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ranger-spark.html#emr-ranger-spark-limitations). For the EMRFS S3 plugin's limitations, see [EMRFS S3 plugin limitations](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ranger-emrfs.html#emr-ranger-emrfs-limitations).

# Set up Amazon EMR for Apache Ranger
<a name="emr-ranger-begin"></a>

Before you install Apache Ranger, review the information in this section to make sure that Amazon EMR is properly configured.

**Topics**
+ [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md)
+ [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md)
+ [Create the EMR security configuration](emr-ranger-security-config.md)
+ [Store TLS certificates in AWS Secrets Manager](emr-ranger-tls-certificates.md)
+ [Start an EMR cluster with Apache Ranger](emr-ranger-start-emr-cluster.md)
+ [Configure Zeppelin for Apache Ranger-enabled Amazon EMR clusters](emr-ranger-configure-zeppelin.md)
+ [Known issues for Amazon EMR integration](emr-ranger-security-considerations.md)

# Set up a Ranger Admin server to integrate with Amazon EMR
<a name="emr-ranger-admin"></a>

For Amazon EMR integration, the Apache Ranger application plugins must communicate with the Admin server using TLS/SSL.

**Prerequisite: Ranger Admin Server SSL Enablement**

Apache Ranger on Amazon EMR requires two-way SSL communication between plugins and the Ranger Admin server. To ensure that plugins communicate with the Apache Ranger server over SSL, enable the following attribute within ranger-admin-site.xml on the Ranger Admin server.

```
<property>
    <name>ranger.service.https.attrib.ssl.enabled</name>
    <value>true</value>
</property>
```

In addition, the following configurations are needed.

```
<property>
    <name>ranger.https.attrib.keystore.file</name>
    <value><PATH_TO_KEYSTORE></value>
</property>

<property>
    <name>ranger.service.https.attrib.keystore.file</name>
    <value><PATH_TO_KEYSTORE></value>
</property>

<property>
    <name>ranger.service.https.attrib.keystore.pass</name>
    <value><KEYSTORE_PASSWORD></value>
</property>

<property>
    <name>ranger.service.https.attrib.keystore.keyalias</name>
    <value><PRIVATE_CERTIFICATE_KEY_ALIAS></value>
</property>

<property>
    <name>ranger.service.https.attrib.clientAuth</name>
    <value>want</value>
</property>

<property>
    <name>ranger.service.https.port</name>
    <value>6182</value>
</property>
```
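The keystore referenced by `ranger.service.https.attrib.keystore.file` can be created with standard tools. The following is a minimal, self-contained sketch using OpenSSL; every name, path, and password here is a placeholder, and the certificate is self-signed only so the example runs on its own — production servers should use CA-issued certificates (see the note on self-signed certificates later in this section).

```shell
# Sketch only: throwaway names, paths, and passwords.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Generate a private key and a self-signed certificate (illustration only).
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout ranger-admin-key.pem -out ranger-admin-cert.pem \
  -days 365 -subj "/CN=ranger-admin.example.internal"

# Bundle them into a PKCS12 keystore; the file maps to
# ranger.service.https.attrib.keystore.file and the alias to
# ranger.service.https.attrib.keystore.keyalias.
openssl pkcs12 -export \
  -in ranger-admin-cert.pem -inkey ranger-admin-key.pem \
  -name rangeradmin -out ranger-admin-keystore.p12 \
  -password pass:changeit

echo "keystore created: $workdir/ranger-admin-keystore.p12"
```

The keystore password used here (`changeit`) is the value you would place in `ranger.service.https.attrib.keystore.pass`.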

# TLS certificates for Apache Ranger integration with Amazon EMR
<a name="emr-ranger-admin-tls"></a>

Apache Ranger integration with Amazon EMR requires that traffic from Amazon EMR nodes to the Ranger Admin server is encrypted using TLS, and that Ranger plugins authenticate to the Apache Ranger server using two-way (mutual) TLS authentication. The Amazon EMR service needs the public certificate of your Ranger Admin server (specified in the previous example) and the private certificate.

**Apache Ranger plugin certificates**

Apache Ranger plugin public TLS certificates must be accessible to the Apache Ranger Admin server so that it can validate the plugins when they connect. There are three methods to make them accessible.

**Method 1: Configure a truststore in Apache Ranger Admin server**

Fill in the following configurations in ranger-admin-site.xml to configure a truststore.

```
<property>
    <name>ranger.truststore.file</name>
    <value><LOCATION TO TRUSTSTORE></value>
</property>

<property>
    <name>ranger.truststore.password</name>
    <value><PASSWORD FOR TRUSTSTORE></value>
</property>
```

**Method 2: Load the certificate into Java cacerts truststore**

If your Ranger Admin server doesn't specify a truststore in its JVM options, then you can put the plugin public certificates in the default cacerts store.
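For example, a plugin's public certificate could be imported into the default cacerts store with `keytool`. The alias, certificate file name, and cacerts path below are placeholders (the cacerts location varies by JDK version), and `changeit` is the conventional default cacerts password:

```
keytool -importcert \
  -alias emr-ranger-plugin \
  -file plugin-cert.pem \
  -keystore "$JAVA_HOME/lib/security/cacerts" \
  -storepass changeit
```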

**Method 3: Create a truststore and specify as part of JVM Options**

Within `{RANGER_HOME_DIRECTORY}/ews/ranger-admin-services.sh`, modify `JAVA_OPTS` to include `"-Djavax.net.ssl.trustStore=<TRUSTSTORE_LOCATION>"` and `"-Djavax.net.ssl.trustStorePassword=<TRUSTSTORE_PASSWORD>"`. For example, add the following line after the existing JAVA_OPTS.

```
JAVA_OPTS=" ${JAVA_OPTS} -Djavax.net.ssl.trustStore=${RANGER_HOME}/truststore/truststore.jck -Djavax.net.ssl.trustStorePassword=changeit"
```

**Note**  
This specification may expose the truststore password if any user is able to log into the Apache Ranger Admin server and see running processes, such as when using the `ps` command.

**Using Self-Signed Certificates**

Self-signed certificates are not recommended. They cannot be revoked, and they may not conform to internal security requirements.

# Service definition installation for Ranger integration with Amazon EMR
<a name="emr-ranger-admin-servicedef-install"></a>

A service definition is used by the Ranger Admin server to describe the attributes of policies for an application. The policies are then stored in a policy repository for clients to download. 

To be able to configure service definitions, REST calls must be made to the Ranger Admin server. See [Apache Ranger PublicAPIsv2](https://ranger.apache.org/apidocs/resource_PublicAPIsv2.html#resource_PublicAPIsv2_createServiceDef_POST) for the APIs required in the following section.

**Installing Apache Spark's Service Definition**

To install Apache Spark's service definition, see [Apache Spark plugin for Ranger integration with Amazon EMR](emr-ranger-spark.md).

**Installing EMRFS Service Definition**

To install the S3 service definition for Amazon EMR, see [EMRFS S3 plugin for Ranger integration with Amazon EMR](emr-ranger-emrfs.md).

**Using Hive Service Definition**

Apache Hive can use the existing Ranger service definition that ships with Apache Ranger 2.0 and later. For more information, see [Apache Hive plugin for Ranger integration with Amazon EMR](emr-ranger-hive.md).

# Network traffic rules for integrating with Amazon EMR
<a name="emr-ranger-network"></a>

When Apache Ranger is integrated with your EMR cluster, the cluster needs to communicate with additional servers and AWS.

All Amazon EMR nodes, including core and task nodes, must be able to communicate with the Apache Ranger Admin server to download policies. If your Apache Ranger Admin server is running on Amazon EC2, update its security group to allow inbound traffic from the EMR cluster.

In addition to communicating with the Ranger Admin server, all nodes need to be able to communicate with the following AWS services:
+ Amazon S3
+ AWS KMS (if using EMRFS SSE-KMS)
+ Amazon CloudWatch
+ AWS STS

If you plan to run your EMR cluster within a private subnet, configure the VPC so that it can communicate with these services using either [AWS PrivateLink and VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/endpoint-services-overview.html) or a [network address translation (NAT) instance](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Instance.html), both described in the *Amazon VPC User Guide*.
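For the VPC-endpoint approach, the following sketch shows the general shape of the calls. All identifiers (`<VPC_ID>`, `<REGION>`, and so on) are placeholders; Amazon S3 uses a gateway endpoint, while the other services use interface endpoints.

```
# Gateway endpoint for Amazon S3
aws ec2 create-vpc-endpoint \
  --vpc-id <VPC_ID> \
  --service-name com.amazonaws.<REGION>.s3 \
  --route-table-ids <ROUTE_TABLE_ID>

# Interface endpoints for AWS KMS, CloudWatch Logs, CloudWatch, and AWS STS
for svc in kms logs monitoring sts; do
  aws ec2 create-vpc-endpoint \
    --vpc-id <VPC_ID> \
    --vpc-endpoint-type Interface \
    --service-name com.amazonaws.<REGION>.$svc \
    --subnet-ids <SUBNET_ID> \
    --security-group-ids <SECURITY_GROUP_ID>
done
```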

# IAM roles for native integration with Apache Ranger
<a name="emr-ranger-iam"></a>

The integration between Amazon EMR and Apache Ranger relies on three key roles that you should create before you launch your cluster:
+ A custom Amazon EC2 instance profile for Amazon EMR
+ An IAM role for Apache Ranger Engines
+ An IAM role for other AWS services

This section gives an overview of these roles and the policies that you need to include for each IAM role. For information about creating these roles, see [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

# EC2 instance profile for Amazon EMR
<a name="emr-ranger-iam-ec2"></a>

Amazon EMR uses an IAM service role to perform actions on your behalf to provision and manage clusters. The service role for cluster EC2 instances, also called the EC2 instance profile for Amazon EMR, is a special type of service role assigned to every EC2 instance in a cluster at launch.

To define permissions for EMR cluster interaction with Amazon S3 data and with Hive metastore protected by Apache Ranger and other AWS services, define a custom EC2 instance profile to use instead of the `EMR_EC2_DefaultRole` when you launch your cluster.

For more information, see [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md) and [Customize IAM roles with Amazon EMR](emr-iam-roles-custom.md).

You need to add the following statements to the default EC2 Instance Profile for Amazon EMR to be able to tag sessions and access the AWS Secrets Manager that stores TLS certificates.

```
    {
      "Sid": "AllowAssumeOfRolesAndTagging",
      "Effect": "Allow",
      "Action": ["sts:TagSession", "sts:AssumeRole"],
      "Resource": [
        "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<RANGER_ENGINE-PLUGIN_DATA_ACCESS_ROLE_NAME>",
        "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<RANGER_USER_ACCESS_ROLE_NAME>"
      ]
    },
    {
        "Sid": "AllowSecretsRetrieval",
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": [
            "arn:aws:secretsmanager:<REGION>:<AWS_ACCOUNT_ID>:secret:<PLUGIN_TLS_SECRET_NAME>*",
            "arn:aws:secretsmanager:<REGION>:<AWS_ACCOUNT_ID>:secret:<ADMIN_RANGER_SERVER_TLS_SECRET_NAME>*"
        ]
    }
```

**Note**  
For the Secrets Manager permissions, do not forget the wildcard ("*") at the end of the secret name, or your requests will fail. The wildcard is for secret versions.

**Note**  
Limit the scope of the AWS Secrets Manager policy to only the certificates that are required for provisioning.
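One way to attach the statements above is as an inline policy on the instance profile role. In this sketch, the statements are assumed to have been merged into a complete policy document (with `Version` and `Statement` fields) saved locally; the role and policy names are placeholders.

```
aws iam put-role-policy \
  --role-name <EC2_INSTANCE_PROFILE_ROLE_NAME> \
  --policy-name emr-ranger-assume-and-secrets \
  --policy-document file://emr-ranger-instance-profile-policy.json
```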

# IAM role for Apache Ranger
<a name="emr-ranger-iam-ranger"></a>

This role provides credentials for trusted execution engines, such as Apache Hive and the Amazon EMR record server, to access Amazon S3 data. Use only this role to access Amazon S3 data, including any KMS keys if you are using S3 SSE-KMS.

This role must be created with the minimum policy stated in the following example.

------
#### [ JSON ]

****  

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudwatchLogsPermissions",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:logs:*:123456789012:log-group:CLOUDWATCH_LOG_GROUP_NAME_IN_SECURITY_CONFIGURATION:*"
      ]
    },
    {
      "Sid": "BucketPermissionsInS3Buckets",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucket",
        "s3:ListAllMyBuckets",
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket1",
        "arn:aws:s3:::amzn-s3-demo-bucket2"
      ]
    },
    {
      "Sid": "ObjectPermissionsInS3Objects",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket1/*",
        "arn:aws:s3:::amzn-s3-demo-bucket2/*"
      ]
    }
  ]
}
```

------

**Important**  
The asterisk ("*") at the end of the CloudWatch Logs resource must be included to provide permission to write to the log streams.

**Note**  
If you are using EMRFS consistent view or S3 SSE-KMS encryption, add permissions to the DynamoDB tables and KMS keys so that the execution engines can interact with those resources.

The IAM role for Apache Ranger is assumed by the EC2 Instance Profile Role. Use the following example to create a trust policy that allows the IAM role for Apache Ranger to be assumed by the EC2 instance profile role.

```
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<EC2 INSTANCE PROFILE ROLE NAME eg. EMR_EC2_DefaultRole>"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
```

# IAM role for other AWS services for Amazon EMR integration
<a name="emr-ranger-iam-other-AWS"></a>

This role provides users who are not trusted execution engines with credentials to interact with AWS services, if needed. Do not use this IAM role to allow access to Amazon S3 data, unless it's data that should be accessible by all users.

This role is assumed by the EC2 instance profile role. Use the following example to create a trust policy that allows the IAM role for other AWS services to be assumed by the EC2 instance profile role.

```
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<EC2 INSTANCE PROFILE ROLE NAME eg. EMR_EC2_DefaultRole>"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
```

# Validate your permissions for Amazon EMR integration with Apache Ranger
<a name="emr-ranger-iam-validate"></a>

See [Apache Ranger troubleshooting](emr-ranger-troubleshooting.md) for instructions on validating permissions.

# Create the EMR security configuration
<a name="emr-ranger-security-config"></a>

**Creating an Amazon EMR Security Configuration for Apache Ranger**

Before you launch an Amazon EMR cluster integrated with Apache Ranger, create a security configuration.

------
#### [ Console ]

**To create a security configuration that specifies the AWS Ranger integration option**

1. In the Amazon EMR console, select **Security configurations**, then **Create**.

1. Type a **Name** for the security configuration. You use this name to specify the security configuration when you create a cluster.

1. Under **AWS Ranger Integration**, select **Enable fine-grained access control managed by Apache Ranger**.

1. Select your **IAM role for Apache Ranger** to apply. For more information, see [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md).

1. Select your **IAM role for other AWS services** to apply.

1. Configure the plugins to connect to the Ranger Admin server by entering the Secrets Manager ARN for the Admin server and the address.

1. Select the applications to configure Ranger plugins. Enter the Secrets Manager ARN that contains the private TLS certificate for the plugin.

   If Apache Spark or Apache Hive is selected as an application for your cluster but is not configured here, the request fails.

1. Set up other security configuration options as appropriate and choose **Create**. You must enable Kerberos authentication using the cluster-dedicated or external KDC.

**Note**  
You cannot currently use the console to create a security configuration that specifies the AWS Ranger integration option in the AWS GovCloud (US) Region. Create the security configuration using the CLI instead.

------
#### [ CLI ]

**To create a security configuration for Apache Ranger integration**

1. Replace `<ACCOUNT ID>` with your AWS account ID.

1. Replace `<REGION>` with the Region that the resource is in.

1. Specify a value for `TicketLifetimeInHours` to determine the period for which a Kerberos ticket issued by the KDC is valid.

1. Specify the address of the Ranger Admin server for `AdminServerURL`.

```
{
    "AuthenticationConfiguration": {
        "KerberosConfiguration": {
            "Provider": "ClusterDedicatedKdc",
            "ClusterDedicatedKdcConfiguration": {
                "TicketLifetimeInHours": 24
            }
        }
    },
    "AuthorizationConfiguration":{
      "RangerConfiguration":{
         "AdminServerURL":"https://<RANGER ADMIN SERVER IP>:6182",
         "RoleForRangerPluginsARN":"arn:aws:iam::<ACCOUNT ID>:role/<RANGER PLUGIN DATA ACCESS ROLE NAME>",
         "RoleForOtherAWSServicesARN":"arn:aws:iam::<ACCOUNT ID>:role/<USER ACCESS ROLE NAME>",
         "AdminServerSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES ADMIN SERVERS PUBLIC TLS CERTIFICATE WITHOUT VERSION>",
         "RangerPluginConfigurations":[
            {
               "App":"Spark",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES SPARK PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<SPARK SERVICE NAME eg. amazon-emr-spark>"
            },
            {
               "App":"Hive",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES HIVE PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<HIVE SERVICE NAME eg. Hivedev>"
            },
            {
               "App":"EMRFS-S3",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES EMRFS S3 PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<EMRFS S3 SERVICE NAME eg. amazon-emr-emrfs>"
            },
            {
               "App":"Trino",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES TRINO PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<TRINO SERVICE NAME eg. amazon-emr-trino>"
            }
         ],
         "AuditConfiguration":{
            "Destinations":{
               "AmazonCloudWatchLogs":{
                  "CloudWatchLogGroup":"arn:aws:logs:<REGION>:<ACCOUNT ID>:log-group:<LOG GROUP NAME FOR AUDIT EVENTS>"
               }
               }
            }
         }
      }
   }
}
```

The `PolicyRepositoryName` values are the service names that are specified in your Apache Ranger Admin server.

Create an Amazon EMR security configuration with the following command. Replace `security-configuration` with a name of your choice. You select this configuration by name when you create your cluster.

```
aws emr create-security-configuration \
--security-configuration file://./security-configuration.json \
--name security-configuration
```
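Before calling the CLI, it can be useful to confirm that the security configuration file is well-formed JSON, since a malformed file fails the request. The following sketch creates a minimal stand-in file so it is self-contained; substitute your own `security-configuration.json`.

```shell
# Optional sanity check: validate the JSON before passing it to the AWS CLI.
set -e
workdir=$(mktemp -d)
cat > "$workdir/security-configuration.json" <<'EOF'
{
    "AuthenticationConfiguration": {
        "KerberosConfiguration": {
            "Provider": "ClusterDedicatedKdc",
            "ClusterDedicatedKdcConfiguration": {
                "TicketLifetimeInHours": 24
            }
        }
    }
}
EOF

# json.tool exits nonzero on malformed JSON.
python3 -m json.tool "$workdir/security-configuration.json" > /dev/null \
  && echo "security-configuration.json is valid JSON"
```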

------

**Configure Additional Security Features**

To securely integrate Amazon EMR with Apache Ranger, configure the following EMR security features:
+ Enable Kerberos authentication using the cluster-dedicated or external KDC. For instructions, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md).
+ (Optional) Enable encryption in transit or at rest. For more information, see [Encryption options for Amazon EMR](emr-data-encryption-options.md).

For more information, see [Security in Amazon EMR](emr-security.md).

# Store TLS certificates in AWS Secrets Manager
<a name="emr-ranger-tls-certificates"></a>

The Ranger plugins installed on an Amazon EMR cluster and the Ranger Admin server must communicate over TLS so that policy data and other information cannot be read if intercepted. EMR also requires the plugins to authenticate to the Ranger Admin server by providing their own TLS certificates and performing two-way TLS authentication. This setup requires four certificates: two pairs of private and public TLS certificates. For instructions on installing the certificate on your Ranger Admin server, see [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md). To complete the setup, the Ranger plugins installed on the EMR cluster need two certificates: the public TLS certificate of your Admin server, and the private certificate that the plugins use to authenticate to the Ranger Admin server. These TLS certificates must be stored in AWS Secrets Manager and provided in an EMR security configuration.

**Note**  
It is strongly recommended, but not required, that you create a certificate pair for each of your applications to limit the impact if one of the plugin certificates is compromised.

**Note**  
You need to track and rotate certificates prior to their expiration date. 

## Certificate format
<a name="emr-ranger-tls-cert-format"></a>

The process for importing a certificate into AWS Secrets Manager is the same whether it is a private plugin certificate or the public Ranger Admin certificate. Before you import them, the TLS certificates must be in X.509 PEM format.

An example of a public certificate is in the format:

```
-----BEGIN CERTIFICATE-----
...Certificate Body...
-----END CERTIFICATE-----
```

An example of a private certificate is in the format:

```
-----BEGIN PRIVATE KEY-----
...Private Certificate Body...
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
...Trust Certificate Body...
-----END CERTIFICATE-----
```

The private certificate file should also contain its trust certificate.

You can validate that the certificates are in the correct format by running the following command:

```
openssl x509 -in <PEM FILE> -text
```
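Beyond checking the format, you may also want to confirm that a private key and its public certificate actually belong together before uploading them. The following sketch compares their public keys; a throwaway self-signed pair is generated on the spot so the example is self-contained — substitute your own PEM files.

```shell
# Sketch: verify that a private key matches a certificate.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Throwaway pair for illustration only.
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout plugin-key.pem -out plugin-cert.pem \
  -days 1 -subj "/CN=emr-ranger-plugin.example.internal"

# Extract the public key from each file; the two must be identical.
cert_pub=$(openssl x509 -in plugin-cert.pem -noout -pubkey)
key_pub=$(openssl pkey -in plugin-key.pem -pubout)
[ "$cert_pub" = "$key_pub" ] && echo "certificate and key match"
```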

## Importing a certificate to the AWS Secrets Manager
<a name="emr-ranger-tls-cert-import"></a>

When creating your Secret in the Secrets Manager, choose **Other type of secrets** under **secret type** and paste your PEM encoded certificate in the **Plaintext** field.

![\[Importing a certificate to AWS Secrets Manager.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-tls-cert-import.png)
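As an alternative to the console, a certificate can be imported with the AWS CLI by storing the PEM text as the plaintext secret string. The secret name and file name below are placeholders.

```
aws secretsmanager create-secret \
  --name emr-ranger-admin-server-cert \
  --description "Public TLS certificate for the Ranger Admin server" \
  --secret-string file://ranger-admin-cert.pem
```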


# Start an EMR cluster with Apache Ranger
<a name="emr-ranger-start-emr-cluster"></a>

Before you launch an Amazon EMR cluster with Apache Ranger, make sure each component meets the following minimum version requirement:
+ Amazon EMR 5.32.0 or later, or 6.3.0 or later. We recommend that you use the latest Amazon EMR release version.
+ Apache Ranger Admin server 2.x.

Complete the following steps:
+ Install Apache Ranger if you haven't already. For more information, see [Apache Ranger 0.5.0 installation](https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+0.5.0+Installation).
+ Make sure there is network connectivity between your Amazon EMR cluster and the Apache Ranger Admin server. See [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).
+ Create the necessary IAM roles. See [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md).
+ Create an EMR security configuration for the Apache Ranger installation. For more information, see [Create the EMR security configuration](emr-ranger-security-config.md).
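After these steps, the cluster can be launched with the AWS CLI. The following sketch shows the general shape of the call only; the release label, applications (which must come from the supported list), Kerberos password, instance profile, and other angle-bracketed values are placeholders to replace with your own.

```
aws emr create-cluster \
  --name "emr-ranger-cluster" \
  --release-label emr-6.3.0 \
  --applications Name=Spark Name=Hive \
  --security-configuration security-configuration \
  --kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=<PASSWORD> \
  --ec2-attributes InstanceProfile=<CUSTOM_EC2_INSTANCE_PROFILE>,SubnetId=<SUBNET_ID>,KeyName=<KEY_NAME> \
  --service-role EMR_DefaultRole \
  --instance-type m5.xlarge \
  --instance-count 3
```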

# Configure Zeppelin for Apache Ranger-enabled Amazon EMR clusters
<a name="emr-ranger-configure-zeppelin"></a>

This topic covers how to configure [Apache Zeppelin](https://zeppelin.apache.org/) for an Apache Ranger-enabled Amazon EMR cluster so that you can use Zeppelin as a notebook for interactive data exploration. Zeppelin is included in Amazon EMR release versions 5.0.0 and later. Earlier release versions include Zeppelin as a sandbox application. For more information, see [Amazon EMR 4.x release versions](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-4x.html) in the *Amazon EMR Release Guide*.

By default, Zeppelin is configured with a default login and password, which is not secure in a multi-tenant environment.

To configure Zeppelin, complete the following steps.

1. **Modify the authentication mechanism**. 

   Modify the `shiro.ini` file to implement your preferred authentication mechanism. Zeppelin supports Active Directory, LDAP, PAM, and Knox SSO. See [Apache Shiro authentication for Apache Zeppelin](https://zeppelin.apache.org/docs/0.8.2/setup/security/shiro_authentication.html) for more information.

1. **Configure Zeppelin to impersonate the end user**

   When you allow Zeppelin to impersonate the end user, jobs submitted by Zeppelin can be run as that end user. Add the following configuration to `core-site.xml`:

   ```
   [
     {
       "Classification": "core-site",
       "Properties": {
         "hadoop.proxyuser.zeppelin.hosts": "*",
         "hadoop.proxyuser.zeppelin.groups": "*"
       },
       "Configurations": [
       ]
     }
   ]
   ```

   Next, add the following configuration to `hadoop-kms-site.xml` located in `/etc/hadoop/conf`:

   ```
   [
     {
       "Classification": "hadoop-kms-site",
       "Properties": {
         "hadoop.kms.proxyuser.zeppelin.hosts": "*",
         "hadoop.kms.proxyuser.zeppelin.groups": "*"
       },
       "Configurations": [
       ]
     }
   ]
   ```

   You can also add these configurations to your Amazon EMR cluster using the console by following the steps in [Reconfigure an instance group in the console](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html#emr-configure-apps-running-cluster-console).

1. **Allow Zeppelin to sudo as the end user**

   Create a file `/etc/sudoers.d/90-zeppelin-user` that contains the following:

   ```
   zeppelin ALL=(ALL) NOPASSWD:ALL
   ```

1. **Modify interpreters settings to run user jobs in their own processes**.

   Configure each interpreter to instantiate "Per User" in "isolated" processes.  
![\[Zeppelin interpreter settings showing "Per User" in isolated processes.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/per_user.png)

1. **Modify `zeppelin-env.sh`**

   Add the following to `zeppelin-env.sh` so that Zeppelin launches interpreters as the end user:

   ```
   ZEPPELIN_IMPERSONATE_USER=`echo ${ZEPPELIN_IMPERSONATE_USER} | cut -d @ -f1`
   export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c'
   ```

   Add the following to `zeppelin-env.sh` to change the default notebook permissions so that new notebooks are readable only by their creator:

   ```
   export ZEPPELIN_NOTEBOOK_PUBLIC="false"
   ```

   Finally, add the following to `zeppelin-env.sh` to include the EMR RecordServer class path after the first `CLASSPATH` statement:

   ```
   export CLASSPATH="$CLASSPATH:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-connector-common.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-spark-connector.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-client.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-common.jar:/usr/share/aws/emr/record-server/lib/jars/secret-agent-interface.jar"
   ```

1. **Restart Zeppelin.**

   Run the following command to restart Zeppelin:

   ```
   sudo systemctl restart zeppelin
   ```

# Known issues for Amazon EMR integration
<a name="emr-ranger-security-considerations"></a>

**Known Issues**

There is a known issue within Amazon EMR release 5.32 in which the permissions for `hive-site.xml` were changed so that only privileged users can read it, because it may contain stored credentials. This can prevent Hue from reading `hive-site.xml` and cause its webpages to reload continuously. If you experience this issue, add the following configuration to fix it:

```
[
  {
    "Classification": "hue-ini",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "desktop",
        "Properties": {
          "server_group":"hive_site_reader"
         },
        "Configurations":[
        ]
      }
    ]
  }
]
```

There is a known issue that the EMRFS S3 plugin for Apache Ranger currently does not support Apache Ranger’s Security Zone feature. Access control restrictions defined using the Security Zone feature are not applied on your Amazon EMR clusters.

**Application UIs**

By default, application UIs do not perform authentication. This includes the ResourceManager UI, the NodeManager UI, and the Livy UI, among others. In addition, any user who can access the UIs can view information about all other users' jobs.

If this behavior is not desired, use a security group to restrict user access to the application UIs.

**HDFS Default Permissions**

By default, the objects that users create in HDFS are given world-readable permissions. This can make data readable by users who should not have access to it. To change this behavior so that the default file permissions grant read and write access only to the creator of the job, perform the following steps.

When creating your EMR cluster, provide the following configuration:

```
[
  {
    "Classification": "hdfs-site",
    "Properties": {
      "dfs.namenode.acls.enabled": "true",
      "fs.permissions.umask-mode": "077",
      "dfs.permissions.superusergroup": "hdfsadmingroup"
    }
  }
]
```

In addition, run the following bootstrap action:

```
--bootstrap-actions Name='HDFS UMask Setup',Path=s3://elasticmapreduce/hdfs/umask/umask-main.sh
```
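As a sanity check, the effect of the `077` umask on default permissions can be computed directly; new files start from a base of `666` and directories from `777`, with the umask bits cleared:

```
# fs.permissions.umask-mode=077 clears the group and other permission bits
umask = 0o077
print(oct(0o666 & ~umask))  # new files       -> rw------- (600)
print(oct(0o777 & ~umask))  # new directories -> rwx------ (700)
```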

# Apache Ranger plugins for Amazon EMR integration scenarios
<a name="emr-ranger-plugins"></a>

Apache Ranger plugins validate the access of a user against the authorization policies defined in the Apache Ranger policy admin server.

**Topics**
+ [Apache Hive plugin for Ranger integration with Amazon EMR](emr-ranger-hive.md)
+ [Apache Spark plugin for Ranger integration with Amazon EMR](emr-ranger-spark.md)
+ [EMRFS S3 plugin for Ranger integration with Amazon EMR](emr-ranger-emrfs.md)
+ [Trino plugin for Ranger integration with Amazon EMR](emr-ranger-trino.md)

# Apache Hive plugin for Ranger integration with Amazon EMR
<a name="emr-ranger-hive"></a>

Apache Hive is a popular execution engine within the Hadoop ecosystem. Amazon EMR provides an Apache Ranger plugin that enables fine-grained access controls for Hive. The plugin is compatible with open source Apache Ranger Admin server version 2.0 and later.

**Topics**
+ [Supported features](#emr-ranger-supported-features)
+ [Installation of service configuration](#emr-ranger-hive-service-config)
+ [Considerations](#emr-ranger-hive-considerations)
+ [Limitations](#emr-ranger-hive-limitations)

## Supported features
<a name="emr-ranger-supported-features"></a>

The Apache Ranger plugin for Hive on EMR supports all the functionality of the open source plugin, which includes database-, table-, and column-level access controls, as well as row filtering and data masking. For a table of Hive commands and associated Ranger permissions, see [Hive commands to Ranger permission mapping](https://cwiki.apache.org/confluence/display/RANGER/Hive+Commands+to+Ranger+Permission+Mapping).

## Installation of service configuration
<a name="emr-ranger-hive-service-config"></a>

The Apache Hive plugin is compatible with the existing Hive service definition within Apache Hive Hadoop SQL.

![\[Apache Hive service definition for Hadoop SQL.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger_service_mgr.png)


If you do not have an instance of the service under Hadoop SQL, as shown above, you can create one. Choose the **+** next to Hadoop SQL.

1. **Service Name (If displayed)**: Enter the service name. The suggested value is **amazonemrhive**. Make a note of this service name -- it's needed when creating an EMR security configuration.

1. **Display Name**: Enter the name to be displayed for the service. The suggested value is **amazonemrhive**.

![\[Apache Hive service details for Hadoop SQL.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger_create_service.png)


The Apache Hive config properties are used to establish a connection from your Apache Ranger Admin server to HiveServer2 so that auto-complete works when creating policies. If you do not have a persistent HiveServer2 process, the following properties do not need to be accurate and can be filled with any information.
+ **Username**: Enter a user name for the JDBC connection to a HiveServer2 instance.
+ **Password**: Enter the password for the user name above.
+ **jdbc.driver.ClassName**: Enter the JDBC class name for Apache Hive connectivity. The default value can be used.
+ **jdbc.url**: Enter the JDBC connection string to use when connecting to HiveServer2.
+ **Common Name for Certificate**: The CN field within the certificate used to connect to the admin server from a client plugin. This value must match the CN field in your TLS certificate that was created for the plugin.
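For illustration, a filled-in set of these properties might look like the following. The driver class shown is the standard Hive JDBC driver; the host in `jdbc.url` and the certificate CN are placeholders, and as noted above, these values need not be accurate if you do not have a persistent HiveServer2.

```
Username: hive
Password: ********
jdbc.driver.ClassName: org.apache.hive.jdbc.HiveDriver
jdbc.url: jdbc:hive2://ip-xxx-xxx-xxx-xxx.ec2.internal:10000
Common Name for Certificate: hiveserver2-plugin-cn
```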

![\[Apache Hive service configuration properties.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger_config_props.png)


The **Test Connection** button tests whether the values above can be used to successfully connect to the HiveServer2 instance. After the service is successfully created, the Service Manager should look like the following:

![\[Connected to the HiveServer2 instance\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger_config_connected.png)


## Considerations
<a name="emr-ranger-hive-considerations"></a>

**Hive metadata server**

To protect against unauthorized access, the Hive metadata server can be accessed only by trusted engines, specifically Hive and `emr_record_server`. The Hive metadata server is accessed by all nodes on the cluster, so port 9083 must allow all nodes to reach the main node.

**Authentication**

By default, Apache Hive is configured to authenticate using Kerberos as configured in the EMR Security configuration. HiveServer2 can be configured to authenticate users using LDAP as well. See [Implementing LDAP authentication for Hive on a multi-tenant Amazon EMR cluster](https://aws.amazon.com/blogs/big-data/implementing-ldap-authentication-for-hive-on-a-multi-tenant-amazon-emr-cluster/) for information.

## Limitations
<a name="emr-ranger-hive-limitations"></a>

The following are current limitations for the Apache Hive plugin on Amazon EMR 5.x:
+ Hive roles are not currently supported. Grant and Revoke statements are not supported.
+ The Hive CLI is not supported. JDBC/Beeline is the only authorized way to connect to Hive.
+ The `hive.server2.builtin.udf.blacklist` configuration should be populated with any UDFs that you deem unsafe.

# Apache Spark plugin for Ranger integration with Amazon EMR
<a name="emr-ranger-spark"></a>

Amazon EMR has integrated EMR RecordServer to provide fine-grained access control for SparkSQL. EMR's RecordServer is a privileged process running on all nodes on an Apache Ranger-enabled cluster. When a Spark driver or executor runs a SparkSQL statement, all metadata and data requests go through the RecordServer. To learn more about EMR RecordServer, see the [Amazon EMR components for use with Apache Ranger](emr-ranger-components.md) page.

**Topics**
+ [Supported features](#emr-ranger-spark-supported-features)
+ [Redeploy service definition to use INSERT, ALTER, or DDL statements](#emr-ranger-spark-redeploy-service-definition)
+ [Installation of service definition](#emr-ranger-spark-install-servicedef)
+ [Creating SparkSQL policies](#emr-ranger-spark-create-sparksql)
+ [Considerations](#emr-ranger-spark-considerations)
+ [Limitations](#emr-ranger-spark-limitations)

## Supported features
<a name="emr-ranger-spark-supported-features"></a>


| SQL statement/Ranger action | STATUS | Supported EMR release | 
| --- | --- | --- | 
|  SELECT  |  Supported  |  As of 5.32  | 
|  SHOW DATABASES  |  Supported  |  As of 5.32  | 
|  SHOW COLUMNS  |  Supported  |  As of 5.32  | 
|  SHOW TABLES  |  Supported  |  As of 5.32  | 
|  SHOW TABLE PROPERTIES  |  Supported  |  As of 5.32  | 
|  DESCRIBE TABLE  |  Supported  |  As of 5.32  | 
|  INSERT OVERWRITE  |  Supported  |  As of 5.34 and 6.4  | 
| INSERT INTO | Supported | As of 5.34 and 6.4 | 
|  ALTER TABLE  |  Supported  |  As of 6.4  | 
|  CREATE TABLE  |  Supported  |  As of 5.35 and 6.7  | 
|  CREATE DATABASE  |  Supported  |  As of 5.35 and 6.7  | 
|  DROP TABLE  |  Supported  |  As of 5.35 and 6.7  | 
|  DROP DATABASE  |  Supported  |  As of 5.35 and 6.7  | 
|  DROP VIEW  |  Supported  |  As of 5.35 and 6.7  | 
|  CREATE VIEW  |  Not Supported  |    | 

The following features are supported when using SparkSQL:
+ Fine-grained access control on tables within the Hive Metastore, and policies can be created at a database, table, and column level.
+ Apache Ranger policies can include grant policies and deny policies to users and groups.
+ Audit events are submitted to CloudWatch Logs.

## Redeploy service definition to use INSERT, ALTER, or DDL statements
<a name="emr-ranger-spark-redeploy-service-definition"></a>

**Note**  
Starting with Amazon EMR 6.4, you can use Spark SQL with the statements INSERT INTO, INSERT OVERWRITE, and ALTER TABLE. Starting with Amazon EMR 6.7, you can use Spark SQL to create or drop databases and tables. If you have an existing installation of the Apache Ranger server with the Apache Spark service definition deployed, use the following code to redeploy the service definition.  

```
# Get existing Spark service definition id calling Ranger REST API and JSON processor
curl --silent -f -u <admin_user_login>:<password_for_ranger_admin_user> \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef/name/amazon-emr-spark' | jq .id

# Download the latest Service definition
wget https://s3.amazonaws.com/elasticmapreduce/ranger/service-definitions/version-2.0/ranger-servicedef-amazon-emr-spark.json

# Update the service definition using the Ranger REST API
curl -u <admin_user_login>:<password_for_ranger_admin_user> -X PUT -d @ranger-servicedef-amazon-emr-spark.json \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef/<Spark service definition id from step 1>'
```

## Installation of service definition
<a name="emr-ranger-spark-install-servicedef"></a>

The installation of EMR's Apache Spark service definition requires the Ranger Admin server to be setup. See [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

Follow these steps to install the Apache Spark service definition:

**Step 1: SSH into the Apache Ranger Admin server**

For example:

```
ssh ec2-user@ip-xxx-xxx-xxx-xxx.ec2.internal
```

**Step 2: Download the service definition and Apache Ranger Admin server plugin**

In a temporary directory, download the service definition. This service definition is supported by Ranger 2.x versions.

```
mkdir /tmp/emr-spark-plugin/
cd /tmp/emr-spark-plugin/

wget https://s3.amazonaws.com/elasticmapreduce/ranger/service-definitions/version-2.0/ranger-spark-plugin-2.x.jar
wget https://s3.amazonaws.com/elasticmapreduce/ranger/service-definitions/version-2.0/ranger-servicedef-amazon-emr-spark.json
```

**Step 3: Install the Apache Spark plugin for Amazon EMR**

```
export RANGER_HOME=.. # Replace with the Ranger Admin home directory, for example /usr/lib/ranger/ranger-2.0.0-admin
mkdir $RANGER_HOME/ews/webapp/WEB-INF/classes/ranger-plugins/amazon-emr-spark
mv ranger-spark-plugin-2.x.jar $RANGER_HOME/ews/webapp/WEB-INF/classes/ranger-plugins/amazon-emr-spark
```

**Step 4: Register the Apache Spark service definition for Amazon EMR**

```
curl -u <admin_user_login>:<password_for_ranger_admin_user> -X POST -d @ranger-servicedef-amazon-emr-spark.json \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef'
```

If this command runs successfully, you see a new service in your Ranger Admin UI called "AMAZON-EMR-SPARK", as shown in the following image (Ranger version 2.0 is shown).

![\["AMAZON-EMR-SPARK" registered in Ranger Admin.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-amazon-emr-spark.png)


**Step 5: Create an instance of the AMAZON-EMR-SPARK application**

**Service Name (If displayed):** The service name that will be used. The suggested value is **amazonemrspark**. Note this service name as it will be needed when creating an EMR security configuration.

**Display Name:** The name to be displayed for this instance. The suggested value is **amazonemrspark**.

**Common Name For Certificate:** The CN field within the certificate used to connect to the admin server from a client plugin. This value must match the CN field in your TLS certificate that was created for the plugin.

![\[Ranger Admin create service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-service.png)


**Note**  
The TLS certificate for this plugin should have been registered in the trust store on the Ranger Admin server. See [TLS certificates for Apache Ranger integration with Amazon EMR](emr-ranger-admin-tls.md) for more details.

## Creating SparkSQL policies
<a name="emr-ranger-spark-create-sparksql"></a>

When creating a new policy, the fields to fill in are:

**Policy Name**: The name of this policy.

**Policy Label**: A label that you can put on this policy.

**Database**: The database that this policy applies to. The wildcard "\*" represents all databases.

**Table**: The tables that this policy applies to. The wildcard "\*" represents all tables.

**EMR Spark Column**: The columns that this policy applies to. The wildcard "\*" represents all columns.

**Description**: A description of this policy.

![\[Ranger Admin create SparkSQL policy details.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-policy-details.png)


To grant permissions, enter the users and groups under the **allow** conditions. You can also specify exclusions for the **allow** conditions and **deny** conditions.

![\[Ranger Admin SparkSQL policy details allow conditions.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-policy-allow-conditions.png)


After specifying the allow and deny conditions, click **Save**.

## Considerations
<a name="emr-ranger-spark-considerations"></a>

Each node within the EMR cluster must be able to connect to the main node on port 9083.

## Limitations
<a name="emr-ranger-spark-limitations"></a>

The following are current limitations for the Apache Spark plugin:
+ RecordServer always connects to the Hive Metastore (HMS) running on the Amazon EMR cluster. Configure HMS in remote mode, if required. Do not put configuration values inside the Apache Spark `hive-site.xml` configuration file.
+ Tables created using Spark datasources on CSV or Avro are not readable through EMR RecordServer. Use Hive to create and write data, and read it using RecordServer.
+ Delta Lake, Hudi and Iceberg tables aren't supported.
+ Users must have access to the default database. This is a requirement for Apache Spark.
+ Ranger Admin server does not support auto-complete.
+ The SparkSQL plugin for Amazon EMR does not support row filters or data masking.
+ When using ALTER TABLE with Spark SQL, a partition location must be the child directory of a table location. Inserting data into a partition where the partition location is different from the table location is not supported.

# EMRFS S3 plugin for Ranger integration with Amazon EMR
<a name="emr-ranger-emrfs"></a>

The EMRFS S3 plugin provides access controls for data in S3 when it is accessed through EMRFS, making it easier to control access to S3 objects on a multi-tenant cluster. You can allow access to S3 resources at the user and group level.

When your application attempts to access data in S3, EMRFS sends a request for credentials to the Secret Agent process, which authenticates the request and authorizes it against an Apache Ranger plugin. If the request is authorized, the Secret Agent assumes the IAM role for Apache Ranger engines with a restricted policy, generating credentials that permit only the access granted by the matching Ranger policy. The credentials are then passed back to EMRFS to access S3.

**Topics**
+ [Supported features](#emr-ranger-emrfs-features)
+ [Installation of service configuration](#emr-ranger-emrfs-service-config)
+ [Creating EMRFS S3 policies](#emr-ranger-emrfs-create-policies)
+ [EMRFS S3 policies usage notes](#emr-ranger-emrfs-considerations)
+ [Limitations](#emr-ranger-emrfs-limitations)

## Supported features
<a name="emr-ranger-emrfs-features"></a>

The EMRFS S3 plugin provides storage-level authorization. You can create policies that give users and groups access to S3 buckets and prefixes. Authorization is done only against EMRFS.

## Installation of service configuration
<a name="emr-ranger-emrfs-service-config"></a>

To install the EMRFS service definition, you must set up the Ranger Admin server. To set up the server, see [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

Follow these steps to install the EMRFS service definition.

**Step 1: SSH into the Apache Ranger Admin server**.

For example:

```
ssh ec2-user@ip-xxx-xxx-xxx-xxx.ec2.internal
```

**Step 2: Download the EMRFS service definition**.

In a temporary directory, download the Amazon EMR service definition. This service definition is supported by Ranger 2.x versions.

```
wget https://s3.amazonaws.com/elasticmapreduce/ranger/service-definitions/version-2.0/ranger-servicedef-amazon-emr-emrfs.json
```

**Step 3: Register EMRFS S3 service definition**.

```
curl -u <admin_user_login>:<password_for_ranger_admin_user> -X POST -d @ranger-servicedef-amazon-emr-emrfs.json \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef'
```

If this command runs successfully, you see a new service in the Ranger Admin UI called "AMAZON-EMR-S3", as shown in the following image (Ranger version 2.0 is shown).

![\[Ranger Admin create EMRFS S3 service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-service-EMRFS.png)


**Step 4: Create an instance of the AMAZON-EMR-EMRFS application**.

Create an instance of the service definition.
+ Choose the **+** next to AMAZON-EMR-EMRFS.

Fill in the following fields:

**Service Name (If displayed)**: The suggested value is **amazonemrs3**. Note this service name as it will be needed when creating an EMR security configuration. 

**Display Name**: The name displayed for this service. The suggested value is **amazonemrs3**.

**Common Name For Certificate**: The CN field within the certificate used to connect to the admin server from a client plugin. This value must match the CN field in the TLS certificate that was created for the plugin.

![\[Ranger Admin edit EMRFS S3 service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-edit-service-EMRFS.png)


**Note**  
The TLS certificate for this plugin should have been registered in the trust store on the Ranger Admin server. See [TLS certificates for Apache Ranger integration with Amazon EMR](emr-ranger-admin-tls.md) for more details.

When the service is created, the Service Manager includes "AMAZON-EMR-EMRFS", as shown in the following image.

![\[Ranger Admin showing new EMRFS S3 service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-new-service-EMRFS.png)


## Creating EMRFS S3 policies
<a name="emr-ranger-emrfs-create-policies"></a>

To create a new policy in the **Create policy** page of the Service Manager, fill in the following fields.

**Policy Name**: The name of this policy.

**Policy Label**: A label that you can put on this policy.

**S3 Resource**: A resource starting with the bucket and optional prefix. See [EMRFS S3 policies usage notes](#emr-ranger-emrfs-considerations) for information on best practices. Resources in Ranger Admin server should not contain **s3://**, **s3a://** or **s3n://**.

![\[Ranger Admin showing create policy for EMRFS S3 service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-policy-EMRFS.png)


You can specify users and groups to grant permissions. You can also specify exclusions for **allow** conditions and **deny** conditions.

![\[Ranger Admin showing user/group permissions for EMRFS S3 policy.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-permissions-EMRFS.png)


**Note**  
A maximum of three resources is allowed for each policy. Adding more than three resources may result in an error when this policy is used on an EMR cluster, and a reminder about the resource limit is displayed.

## EMRFS S3 policies usage notes
<a name="emr-ranger-emrfs-considerations"></a>

When creating S3 policies within Apache Ranger, there are some usage considerations to be aware of.

### Permissions to multiple S3 objects
<a name="emr-ranger-emrfs-considerations-s3objects"></a>

You can use recursive policies and wildcard expressions to give permissions to multiple S3 objects with common prefixes. Recursive policies give permissions to all objects with a common prefix. Wildcard expressions select multiple prefixes. Together, they give permissions to all objects with multiple common prefixes as shown in the following examples.

**Example Using a recursive policy**  
Suppose you want permissions to list all the parquet files in an S3 bucket organized as follows.  

```
s3://sales-reports/americas/
    +- year=2000
    |      +- data-q1.parquet
    |      +- data-q2.parquet
    +- year=2019
    |      +- data-q1.json
    |      +- data-q2.json
    |      +- data-q3.json
    |      +- data-q4.json
    |
    +- year=2020
    |      +- data-q1.parquet
    |      +- data-q2.parquet
    |      +- data-q3.parquet
    |      +- data-q4.parquet
    |      +- annual-summary.parquet    
    +- year=2021
```
First, consider the parquet files with the prefix `s3://sales-reports/americas/year=2020`. You can grant GetObject permissions to all of them in two ways:  
**Using non-recursive policies**: One option is to use two separate non-recursive policies, one for the directory and the other for the files.   
The first policy grants permission to the prefix `s3://sales-reports/americas/year=2020` (there is no trailing `/`).  

```
- S3 resource = "sales-reports/americas/year=2020"
- permission = "GetObject"
- user = "analyst"
```
The second policy uses a wildcard expression to grant permissions to all the files with the prefix `sales-reports/americas/year=2020/` (note the trailing `/`).  

```
- S3 resource = "sales-reports/americas/year=2020/*"
- permission = "GetObject"
- user = "analyst"
```
**Using a recursive policy**: A more convenient alternative is to use a single recursive policy and grant recursive permission to the prefix.  

```
 - S3 resource = "sales-reports/americas/year=2020"
 - permission = "GetObject"
 - user = "analyst"
 - is recursive = "True"
```
So far, only the parquet files with the prefix `s3://sales-reports/americas/year=2020` have been included. You can now also include the parquet files with a different prefix, `s3://sales-reports/americas/year=2000`, in the same recursive policy by introducing a wildcard expression as follows.  

```
 - S3 resource = "sales-reports/americas/year=20?0"
 - permission = "GetObject"
 - user = "analyst"
 - is recursive = "True"
```

### Policies for PutObject and DeleteObject permissions
<a name="emr-ranger-emrfs-considerations-putobject"></a>

Writing policies for `PutObject` and `DeleteObject` permissions to files on EMRFS needs special care because, unlike `GetObject` permissions, they also require recursive permissions granted on the prefix.

**Example Policies for PutObject and DeleteObject permissions**  
For example, deleting the file `annual-summary.parquet` requires not only a DeleteObject permission to the actual file.  

```
- S3 resource = "sales-reports/americas/year=2020/annual-summary.parquet"
- permission = "DeleteObject"
- user = "analyst"
```
It also requires a policy granting recursive `GetObject` and `PutObject` permissions to its prefix.  
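A sketch of such a prefix policy, using the same notation as the examples above:  

```
- S3 resource = "sales-reports/americas/year=2020"
- permission = "GetObject, PutObject"
- user = "analyst"
- is recursive = "True"
```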
Similarly, modifying the file `annual-summary.parquet`, requires not only a `PutObject` permission to the actual file.  

```
- S3 resource = "sales-reports/americas/year=2020/annual-summary.parquet"
- permission = "PutObject"
- user = "analyst"
```
It also requires a policy granting recursive `GetObject` permission to its prefix.  

```
- S3 resource = "sales-reports/americas/year=2020"
- permission = "GetObject"
- user = "analyst"
- is recursive = "True"
```

### Wildcards in policies
<a name="emr-ranger-emrfs-considerations-wildcards"></a>

There are two areas in which wildcards can be specified. When specifying an S3 resource, "\*" and "?" can be used. The "\*" wildcard matches everything after the prefix in an S3 path. For example, consider the following policy.

```
S3 resource = "sales-reports/americas/*"
```

This matches the following S3 paths.

```
sales-reports/americas/year=2020/
sales-reports/americas/year=2019/
sales-reports/americas/year=2019/month=12/day=1/afile.parquet 
sales-reports/americas/year=2018/month=6/day=1/afile.parquet 
sales-reports/americas/year=2017/afile.parquet
```

The "?" wildcard matches only a single character. For example, consider the following policy.

```
S3 resource = "sales-reports/americas/year=201?/"
```

This matches the following S3 paths.

```
sales-reports/americas/year=2019/
sales-reports/americas/year=2018/
sales-reports/americas/year=2017/
```
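These wildcard semantics can be approximated with Python's `fnmatch` module, where "\*" and "?" behave the same way. This is an illustrative sketch only; Ranger's actual resource matcher also handles recursion flags and other policy options:

```
from fnmatch import fnmatch

paths = [
    "sales-reports/americas/year=2019/",
    "sales-reports/americas/year=2018/month=6/day=1/afile.parquet",
    "sales-reports/europe/year=2019/",
]

# "*" matches everything after the prefix, including "/"
print([p for p in paths if fnmatch(p, "sales-reports/americas/*")])

# "?" matches exactly one character
print([p for p in paths if fnmatch(p, "sales-reports/americas/year=201?/")])
```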

### Wildcards in users
<a name="emr-ranger-emrfs-considerations-wildcards-in-users"></a>

There are two built-in wildcards for granting access to users. The first is the "{USER}" wildcard, which grants access to all users. The second is the "{OWNER}" wildcard, which grants access to the owner of a particular object or directory. However, the "{USER}" wildcard is currently not supported.

## Limitations
<a name="emr-ranger-emrfs-limitations"></a>

The following are current limitations of the EMRFS S3 plugin:
+ Each Apache Ranger policy can include at most three S3 resources.
+ Access to S3 must go through EMRFS and can be used with Hadoop-related applications. The following are not supported:

  - Boto3 libraries

  - AWS SDKs and the AWS CLI

  - The S3A open source connector
+ Apache Ranger deny policies are not supported.
+ Operations on S3 with keys having CSE-KMS encryption are currently not supported.
+ Cross-Region access is not supported.
+ Apache Ranger’s Security Zone feature is not supported. Access control restrictions defined using the Security Zone feature are not applied on your Amazon EMR clusters.
+ The Hadoop user does not generate any audit events because Hadoop always accesses S3 through the EC2 instance profile.
+ We recommend that you disable EMRFS consistent view. Amazon S3 is now strongly consistent, so consistent view is no longer needed. See [Amazon S3 strong consistency](https://aws.amazon.com/s3/consistency/) for more information.
+ The EMRFS S3 plugin makes numerous STS calls. We recommend that you do load testing on a development account and monitor STS call volume. We also recommend that you request an increase to your STS AssumeRole service limits.
+ The Ranger Admin server doesn't support auto-complete.

# Trino plugin for Ranger integration with Amazon EMR
<a name="emr-ranger-trino"></a>

Trino (previously PrestoSQL) is a SQL query engine that you can use to run queries on data sources such as HDFS, object storage, relational databases, and NoSQL databases. It eliminates the need to migrate data into a central location and allows you to query the data from wherever it sits. Amazon EMR provides an Apache Ranger plugin to provide fine-grained access controls for Trino. The plugin is compatible with open source Apache Ranger Admin server version 2.0 and later.

**Topics**
+ [Supported features](#emr-ranger-trino-features)
+ [Installation of service configuration](#emr-ranger-trino-service-config)
+ [Creating Trino policies](#emr-ranger-trino-create-policies)
+ [Considerations](#emr-ranger-trino-considerations)
+ [Limitations](#emr-ranger-trino-limitations)

## Supported features
<a name="emr-ranger-trino-features"></a>

The Apache Ranger plugin for Trino on Amazon EMR supports all of the Trino query engine functionality that is protected by fine-grained access control. This includes database-, table-, and column-level access controls, as well as row filtering and data masking. Apache Ranger policies can include grant policies and deny policies for users and groups. Audit events are also submitted to CloudWatch Logs.

## Installation of service configuration
<a name="emr-ranger-trino-service-config"></a>

The installation of the Trino service definition requires that the Ranger Admin server be set up. To set up the Ranger Admin server, see [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

Follow these steps to install the Trino service definition.

1. SSH into the Apache Ranger Admin server.

   ```
   ssh ec2-user@ip-xxx-xxx-xxx-xxx.ec2.internal
   ```

   

1. Uninstall the Presto server plugin, if it exists, by running the following command. If the command fails with a “Service not found” error, the Presto server plugin wasn't installed on your server; proceed to the next step.

   ```
   curl -f -u <admin user login>:<password for ranger admin user> -X DELETE -k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef/name/presto'
   ```

1. Download the service definition and Apache Ranger Admin server plugin. In a temporary directory, download the service definition. This service definition is supported by Ranger 2.x versions.

   ```
   wget https://s3.amazonaws.com/elasticmapreduce/ranger/service-definitions/version-2.0/ranger-servicedef-amazon-emr-trino.json
   ```

1. Register the Apache Trino service definition for Amazon EMR.

   ```
   curl -u <admin user login>:<password for ranger admin user> -X POST -d @ranger-servicedef-amazon-emr-trino.json \
   -H "Accept: application/json" \
   -H "Content-Type: application/json" \
   -k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef'
   ```

   If this command runs successfully, you see a new service in your Ranger Admin UI called `TRINO`, as shown in the following image.  
![\[Ranger Admin create service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-service-trino.png)

1. Create an instance of the `TRINO` application, entering the following information.

   **Service Name**: The service name that you'll use. The suggested value is `amazonemrtrino`. Note this service name, as it will be needed when creating an Amazon EMR security configuration.

   **Display Name**: The name to be displayed for this instance. The suggested value is `amazonemrtrino`.  
![\[Ranger Admin display name.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-display-name-trino.png)

   **jdbc.driver.ClassName**: The class name of the JDBC driver for Trino connectivity. You can use the default value.

   **jdbc.url**: The JDBC connection string to use when connecting to the Trino coordinator.

   **Common Name For Certificate**: The CN field within the certificate used to connect to the admin server from a client plugin. This value must match the CN field in your TLS certificate that was created for the plugin.  
![\[Ranger Admin common name.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-common-name-trino.png)

   Note that the TLS certificate for this plugin should have been registered in the trust store on the Ranger Admin server. For more information, see [TLS certificates](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ranger-admin-tls.html).
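For reference, the **jdbc.url** value takes a standard Trino JDBC connection string. A hedged example, assuming the coordinator runs on the cluster's primary node with TLS enabled on port 8446 (the actual host, port, and SSL settings depend on your Trino configuration):

```
jdbc:trino://ip-xxx-xxx-xxx-xxx.ec2.internal:8446?SSL=true
```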

## Creating Trino policies
<a name="emr-ranger-trino-create-policies"></a>

When you create a new policy, fill in the following fields.

**Policy Name**: The name of this policy.

**Policy Label**: A label that you can put on this policy.

**Catalog**: The catalog that this policy applies to. The wildcard `*` represents all catalogs.

**Schema**: The schemas that this policy applies to. The wildcard `*` represents all schemas.

**Table**: The tables that this policy applies to. The wildcard `*` represents all tables.

**Column**: The columns that this policy applies to. The wildcard `*` represents all columns.

**Description**: A description of this policy.

Other types of policies exist for the **Trino User** (for user impersonation access), the **Trino System/Session Property** (for altering engine system or session properties), **Functions/Procedures** (for allowing function or procedure calls), and the **URL** (for granting read/write access to the engine on data locations).

![\[Ranger Admin create policy details.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-policy-details-trino.png)


To grant permissions to specific users and groups, enter the users and groups. You can also specify exclusions for **allow** conditions and **deny** conditions.

![\[Ranger Admin policy details allow deny conditions.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-policy-allow-conditions-trino.png)


After specifying the allow and deny conditions, choose **Save**.

## Considerations
<a name="emr-ranger-trino-considerations"></a>

When creating Trino policies within Apache Ranger, there are some usage considerations to be aware of.

**Hive metadata server**

To protect against unauthorized access, the Hive metadata server can be accessed only by trusted engines, specifically the Trino engine. The Hive metadata server is also accessed by all nodes on the cluster, so port 9083 on the main node must be reachable from all cluster nodes.

**Authentication**

By default, Trino is configured to authenticate using Kerberos as configured in the Amazon EMR security configuration.

**In-transit encryption required**

The Trino plugin requires you to have in-transit encryption enabled in the Amazon EMR security configuration. To enable encryption, see [Encryption in transit](emr-data-encryption-options.md#emr-encryption-intransit).

## Limitations
<a name="emr-ranger-trino-limitations"></a>

The following are current limitations of the Trino plugin:
+ Ranger Admin server doesn't support auto-complete.

# Apache Ranger troubleshooting
<a name="emr-ranger-troubleshooting"></a>

Here are some commonly diagnosed issues related to using Apache Ranger.

## Recommendations
<a name="emr-ranger-troubleshooting-recommendations"></a>
+ **Test using a single main node cluster:** Single-node clusters provision faster than multi-node clusters, which shortens each testing iteration.
+ **Set development mode on the cluster.** When starting your EMR cluster, set the `--additional-info` parameter to:

  `'{"clusterType":"development"}'`

  This parameter can only be set through the AWS CLI or AWS SDK and is not available through the Amazon EMR console. When this flag is set and the main node fails to provision, the Amazon EMR service keeps the cluster alive for some time before it decommissions it. This window is useful for probing various log files before the cluster is terminated.
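Quoting mistakes are easy to make with this flag, so it can help to sanity-check the JSON locally before passing it to `--additional-info`. A small sketch (the `python3` invocation here is just a convenient local JSON validator):

```shell
# Validate the development-mode JSON locally before using it in create-cluster
ADDITIONAL_INFO='{"clusterType":"development"}'
echo "$ADDITIONAL_INFO" | python3 -c 'import json,sys; json.load(sys.stdin)' && echo "valid JSON"
```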

# EMR cluster failed to provision
<a name="emr-ranger-troubleshooting-cluster-failed"></a>

There are several reasons why an Amazon EMR cluster may fail to start. Here are a few ways to diagnose the issue.

**Check EMR provisioning logs**

Amazon EMR uses Puppet to install and configure applications on a cluster. The logs provide details about any errors that occur during the provisioning phase of a cluster. The logs are accessible on the cluster, or in S3 if the cluster is configured to push logs to S3.

The logs are stored at `/var/log/provision-node/apps-phase/0/{UUID}/puppet.log` on disk, and at `s3://<LOG LOCATION>/<CLUSTER ID>/node/<EC2 INSTANCE ID>/provision-node/apps-phase/0/{UUID}/puppet.log.gz` in S3.
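When scripting log retrieval, the S3 key can be assembled from the cluster and instance identifiers. A minimal sketch of that path construction (the log location, cluster ID, instance ID, and UUID below are hypothetical placeholders):

```python
def puppet_log_s3_uri(log_location, cluster_id, instance_id, uuid):
    """Build the S3 URI for a node's provisioning puppet log."""
    return (f"{log_location}/{cluster_id}/node/{instance_id}"
            f"/provision-node/apps-phase/0/{uuid}/puppet.log.gz")

# Placeholder values for illustration only
uri = puppet_log_s3_uri("s3://my-emr-logs", "j-XXXXXXXXXXXXX",
                        "i-0abc123def4567890", "0000-uuid")
print(uri)
```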

**Common Error Messages**


| Error message | Cause | 
| --- | --- | 
|  Puppet (err): Systemd start for emr-record-server failed. journalctl log for emr-record-server:  |  EMR Record Server failed to start. See the EMR Record Server logs below.  | 
|  Puppet (err): Systemd start for emrsecretagent failed. journalctl log for emrsecretagent:  |  EMR Secret Agent failed to start. See the Secret Agent logs below.  | 
|  /Stage[main]/Ranger_plugins::Ranger_hive_plugin/Ranger_plugins::Prepare_two_way_tls[configure 2-way TLS in Hive plugin]/Exec[create keystore and truststore for Ranger Hive plugin]/returns (notice): 140408606197664:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:707:Expecting: ANY PRIVATE KEY  |  The private TLS certificate in Secrets Manager for the Apache Ranger plugin is not in the correct format or is not a private key. See [TLS certificates for Apache Ranger integration with Amazon EMR](emr-ranger-admin-tls.md) for certificate formats.  | 
|  /Stage[main]/Ranger_plugins::Ranger_s3_plugin/Ranger_plugins::Prepare_two_way_tls[configure 2-way TLS in Ranger s3 plugin]/Exec[create keystore and truststore for Ranger amazon-emr-s3 plugin]/returns (notice): An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:sts::XXXXXXXXXXX:assumed-role/EMR_EC2_DefaultRole/i-XXXXXXXXXXXX is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:us-east-1:XXXXXXXXXX:secret:AdminServer-XXXXX  |  The EC2 instance profile role does not have the correct permissions to retrieve the TLS certificates from Secrets Manager.  | 

**Check SecretAgent logs**

Secret Agent logs are located at `/emr/secretagent/log/` on an EMR node, or in the `s3://<LOG LOCATION>/<CLUSTER ID>/node/<EC2 INSTANCE ID>/daemons/secretagent/` directory in S3.

**Common Error Messages**


| Error message | Cause | 
| --- | --- | 
|  Exception in thread "main" com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: User: arn:aws:sts::XXXXXXXXXXXX:assumed-role/EMR_EC2_DefaultRole/i-XXXXXXXXXXXXXXX is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::XXXXXXXXXXXX:role/RangerPluginDataAccessRole (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX; Proxy: null)  |  This exception means that the EMR EC2 instance profile role does not have permissions to assume the role **RangerPluginDataAccessRole**. See [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md).  | 
|  ERROR qtp54617902-149: Web App Exception Occurred javax.ws.rs.NotAllowedException: HTTP 405 Method Not Allowed  |  These errors can be safely ignored.  | 

**Check Record Server Logs (for SparkSQL)**

EMR Record Server logs are available at `/var/log/emr-record-server/` on an EMR node, or in the `s3://<LOG LOCATION>/<CLUSTER ID>/node/<EC2 INSTANCE ID>/daemons/emr-record-server/` directory in S3.

**Common Error Messages**


| Error message | Cause | 
| --- | --- | 
|  InstanceMetadataServiceResourceFetcher:105 - [] Fail to retrieve token com.amazonaws.SdkClientException: Failed to connect to service endpoint   |  The EMR SecretAgent failed to come up or is having an issue. Inspect the SecretAgent logs for errors and the puppet script to determine if there were any provisioning errors.  | 

# Queries are unexpectedly failing for Ranger integration with Amazon EMR
<a name="emr-ranger-troubleshooting-queries-failed"></a>

**Check Apache Ranger plugin logs (Apache Hive, EMR RecordServer, EMR SecretAgent, etc., logs)**

This section is common across all applications that integrate with the Ranger plugin, such as Apache Hive, EMR Record Server, and EMR SecretAgent.

**Common Error Messages**


| Error message | Cause | 
| --- | --- | 
|  ERROR PolicyRefresher:272 - [] PolicyRefresher(serviceName=policy-repository): failed to find service. Will clean up local cache of policies (-1)   |  This error message means that the service name you provided in the EMR security configuration does not match a service policy repository in the Ranger Admin server.  | 

If your AMAZON-EMR-SPARK service in the Ranger Admin server looks like the following, enter **amazonemrspark** as the service name.

![\[Ranger Admin server showing AMAZON-EMR-SPARK troubleshooting.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-amazon-emr-spark-troubleshooting.png)


# Control network traffic with security groups for your Amazon EMR cluster
<a name="emr-security-groups"></a>

Security groups act as virtual firewalls for EC2 instances in your cluster to control inbound and outbound traffic. Each security group has a set of rules that control inbound traffic, and a separate set of rules to control outbound traffic. For more information, see [Amazon EC2 security groups for Linux instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html) in the *Amazon EC2 User Guide*.

You use two classes of security groups with Amazon EMR: *Amazon EMR-managed security groups* and *additional security groups*.

Every cluster has managed security groups associated with it. You can use the default managed security groups that Amazon EMR creates, or specify custom managed security groups. Either way, Amazon EMR automatically adds rules to managed security groups that a cluster needs to communicate between cluster instances and AWS services.

Additional security groups are optional. You can specify them in addition to managed security groups to tailor access to cluster instances. Additional security groups contain only rules that you define. Amazon EMR does not modify them.

The rules that Amazon EMR creates in managed security groups allow the cluster to communicate among internal components. To allow users and applications to access a cluster from outside the cluster, you can edit rules in managed security groups, you can create additional security groups with additional rules, or do both.

**Important**  
Editing rules in managed security groups may have unintended consequences. You may inadvertently block the traffic required for clusters to function properly and cause errors because nodes are unreachable. Carefully plan and test security group configurations before implementation.

You can specify security groups only when you create a cluster. They can't be added to a cluster or cluster instances while a cluster is running, but you can edit, add, and remove rules from existing security groups. The rules take effect as soon as you save them.

Security groups are restrictive by default. Unless a rule is added that allows traffic, the traffic is rejected. If there is more than one rule that applies to the same traffic and the same source, the most permissive rule applies. For example, if you have a rule that allows SSH from IP address 192.0.2.12/32, and another rule that allows access to all TCP traffic from the range 192.0.2.0/24, the rule that allows all TCP traffic from the range that includes 192.0.2.12 takes precedence. In this case, the client at 192.0.2.12 might have more access than you intended. 
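The overlap in that example can be checked with Python's standard `ipaddress` module, which shows why the broader rule also matches the single client address:

```python
import ipaddress

client = ipaddress.ip_address("192.0.2.12")
ssh_rule = ipaddress.ip_network("192.0.2.12/32")   # narrow SSH-only rule
tcp_rule = ipaddress.ip_network("192.0.2.0/24")    # broad all-TCP rule

# Both rules match the client address, so the more permissive rule
# governs what traffic the client can send.
print(client in ssh_rule, client in tcp_rule)
```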

**Important**  
Use caution when you edit security group rules to open ports. Be sure to add rules that only allow traffic from trusted and authenticated clients for the protocols and ports that are required to run your workloads.

You can configure Amazon EMR *block public access* in each Region that you use to prevent cluster creation if a rule allows public access on any port that you don't add to a list of exceptions. For AWS accounts created after July 2019, Amazon EMR block public access is on by default. For AWS accounts that created a cluster before July 2019, Amazon EMR block public access is off by default. For more information, see [Using Amazon EMR block public access](emr-block-public-access.md).
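You can inspect the current block public access settings for a Region with the AWS CLI. A sketch using the `get-block-public-access-configuration` subcommand (the Region value is a placeholder):

```
aws emr get-block-public-access-configuration --region us-east-1
```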

**Topics**
+ [Working with Amazon EMR-managed security groups](emr-man-sec-groups.md)
+ [Working with additional security groups for an Amazon EMR cluster](emr-additional-sec-groups.md)
+ [Specifying Amazon EMR-managed and additional security groups](emr-sg-specify.md)
+ [Specifying EC2 security groups for EMR Notebooks](emr-managed-notebooks-security-groups.md)
+ [Using Amazon EMR block public access](emr-block-public-access.md)

**Note**  
Amazon EMR aims to use inclusive alternatives for potentially offensive or non-inclusive industry terms such as "master" and "slave". We've transitioned to new terminology to foster a more inclusive experience and to facilitate your understanding of the service components.  
We now describe "nodes" as **instances**, and we describe Amazon EMR instance types as **primary**, **core**, and **task** instances. During the transition, you might still find legacy references to the outdated terms, such as those that pertain to security groups for Amazon EMR.

# Working with Amazon EMR-managed security groups
<a name="emr-man-sec-groups"></a>


Different managed security groups are associated with the primary instance and with the core and task instances in a cluster. An additional managed security group for service access is required when you create a cluster in a private subnet. For more information about the role of managed security groups with respect to your network configuration, see [Amazon VPC options when you launch a cluster](emr-clusters-in-a-vpc.md).

When you specify managed security groups for a cluster, you must use the same type of security group, default or custom, for all managed security groups. For example, you can't specify a custom security group for the primary instance, and then not specify a custom security group for core and task instances.

If you use default managed security groups, you don't need to specify them when you create a cluster. Amazon EMR automatically uses the defaults. Moreover, if the defaults don't exist in the cluster's VPC yet, Amazon EMR creates them. Amazon EMR also creates them if you explicitly specify them and they don't exist yet.

You can edit rules in managed security groups after clusters are created. When you create a new cluster, Amazon EMR checks the rules in the managed security groups that you specify, and then creates any missing *inbound* rules that the new cluster needs in addition to rules that may have been added earlier. Unless specifically stated otherwise, each rule for default Amazon EMR-managed security groups is also added to custom Amazon EMR-managed security groups that you specify.

The default managed security groups are as follows:
+ **ElasticMapReduce-primary**

  For rules in this security group, see [Amazon EMR-managed security group for the primary instance (public subnets)](#emr-sg-elasticmapreduce-master).
+ **ElasticMapReduce-core**

  For rules in this security group, see [Amazon EMR-managed security group for core and task instances (public subnets)](#emr-sg-elasticmapreduce-slave).
+ **ElasticMapReduce-Primary-Private**

  For rules in this security group, see [Amazon EMR-managed security group for the primary instance (private subnets)](#emr-sg-elasticmapreduce-master-private).
+ **ElasticMapReduce-Core-Private**

  For rules in this security group, see [Amazon EMR-managed security group for core and task instances (private subnets)](#emr-sg-elasticmapreduce-slave-private).
+ **ElasticMapReduce-ServiceAccess**

  For rules in this security group, see [Amazon EMR-managed security group for service access (private subnets)](#emr-sg-elasticmapreduce-sa-private).

## Amazon EMR-managed security group for the primary instance (public subnets)
<a name="emr-sg-elasticmapreduce-master"></a>

The default managed security group for the primary instance in public subnets has the **Group Name** of **ElasticMapReduce-primary**. It has the following rules. If you specify a custom managed security group, Amazon EMR adds all the same rules to your custom security group.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-man-sec-groups.html)

**To grant trusted sources SSH access to the primary security group with the console**

To edit your security groups, you must have permission to manage security groups for the VPC that the cluster is in. For more information, see [Changing Permissions for a user](https://docs.aws.amazon.com//IAM/latest/UserGuide/id_users_change-permissions.html) and the [Example Policy](https://docs.aws.amazon.com//IAM/latest/UserGuide/reference_policies_examples_ec2_securitygroups-vpc.html) that allows managing EC2 security groups in the *IAM User Guide*.

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Choose **Clusters**. Choose the **ID** of the cluster you want to modify.

1. In the **Network and security** pane, expand the **EC2 security groups (firewall)** dropdown.

1. Under **Primary node**, choose your security group.

1. Choose **Edit inbound rules**.

1. Check for an inbound rule that allows public access with the following settings. If it exists, choose **Delete** to remove it.
   + **Type**

     SSH
   + **Port**

     22
   + **Source**

     Custom 0.0.0.0/0
**Warning**  
Before December 2020, there was a pre-configured rule to allow inbound traffic on Port 22 from all sources. This rule was created to simplify initial SSH connections to the primary node. We strongly recommend that you remove this inbound rule and restrict traffic to trusted sources.

1. Scroll to the bottom of the list of rules and choose **Add Rule**.

1. For **Type**, select **SSH**.

   Selecting SSH automatically enters **TCP** for **Protocol** and **22** for **Port Range**.

1. For source, select **My IP** to automatically add your IP address as the source address. You can also add a range of **Custom** trusted client IP addresses, or create additional rules for other clients. Many network environments dynamically allocate IP addresses, so you might need to update your IP addresses for trusted clients in the future.

1. Choose **Save**.

1. Optionally, choose the other security group under **Core and task nodes** in the **Network and security** pane and repeat the steps above to allow SSH client access to core and task nodes.

## Amazon EMR-managed security group for core and task instances (public subnets)
<a name="emr-sg-elasticmapreduce-slave"></a>

The default managed security group for core and task instances in public subnets has the **Group Name** of **ElasticMapReduce-core**. The default managed security group has the following rules, and Amazon EMR adds the same rules if you specify a custom managed security group.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-man-sec-groups.html)

## Amazon EMR-managed security group for the primary instance (private subnets)
<a name="emr-sg-elasticmapreduce-master-private"></a>

The default managed security group for the primary instance in private subnets has the **Group Name** of **ElasticMapReduce-Primary-Private**. The default managed security group has the following rules, and Amazon EMR adds the same rules if you specify a custom managed security group.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-man-sec-groups.html)

## Amazon EMR-managed security group for core and task instances (private subnets)
<a name="emr-sg-elasticmapreduce-slave-private"></a>

The default managed security group for core and task instances in private subnets has the **Group Name** of **ElasticMapReduce-Core-Private**. The default managed security group has the following rules, and Amazon EMR adds the same rules if you specify a custom managed security group.

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-man-sec-groups.html)

### Editing outbound rules
<a name="private-sg-egress-rules"></a>

By default, Amazon EMR creates this security group with outbound rules that allow all outbound traffic on all protocols and ports. The defaults allow all outbound traffic because the various Amazon EMR and customer applications that can run on a cluster may require different egress rules, and Amazon EMR can't anticipate these specific settings when it creates the default security groups. You can scope down egress in your security groups to include only the rules that suit your use cases and security policies. At minimum, this security group requires the following outbound rules, but some applications might need additional egress.


| Type | Protocol | Port range | Destination | Details | 
| --- | --- | --- | --- | --- | 
| All TCP | TCP | All | pl-xxxxxxxx | Managed Amazon S3 prefix list com.amazonaws.MyRegion.s3. | 
| All Traffic | All | All | sg-xxxxxxxxxxxxxxxxx | The ID of the ElasticMapReduce-Core-Private security group. | 
| All Traffic | All | All | sg-xxxxxxxxxxxxxxxxx | The ID of the ElasticMapReduce-Primary-Private security group. | 
| Custom TCP | TCP | 9443 | sg-xxxxxxxxxxxxxxxxx | The ID of the ElasticMapReduce-ServiceAccess security group. | 

## Amazon EMR-managed security group for service access (private subnets)
<a name="emr-sg-elasticmapreduce-sa-private"></a>

The default managed security group for service access in private subnets has the **Group Name** of **ElasticMapReduce-ServiceAccess**. It has inbound and outbound rules that allow traffic over HTTPS (ports 8443 and 9443) to and from the other managed security groups in private subnets. These rules allow the cluster manager to communicate with the primary node and with core and task nodes. The same rules are needed if you use custom security groups.


| Type | Protocol | Port range | Source | Details | 
| --- | --- | --- | --- | --- | 
| Inbound rules Required for Amazon EMR clusters with Amazon EMR release 5.30.0 and later. | 
| Custom TCP | TCP | 9443 | The Group ID of the managed security group for the primary instance.  |  This rule allows communication from the primary instance's security group to the service access security group. | 
| Outbound rules Required for all Amazon EMR clusters | 
| Custom TCP | TCP | 8443 | The Group ID of the managed security group for primary instance.  |  These rules allow the cluster manager to communicate with the primary node and with core and task nodes. | 
| Custom TCP | TCP | 8443 | The Group ID of the managed security group for core and task instances.  |  These rules allow the cluster manager to communicate with the primary node and with core and task nodes.  | 

# Working with additional security groups for an Amazon EMR cluster
<a name="emr-additional-sec-groups"></a>

Whether you use the default managed security groups or specify custom managed security groups, you can use additional security groups. Additional security groups give you the flexibility to tailor access between different clusters and from external clients, resources, and applications.

Consider the following scenario as an example. You have multiple clusters that you need to communicate with each other, but you want to allow inbound SSH access to the primary instance for only a particular subset of clusters. To do this, you can use the same set of managed security groups for the clusters. You then create additional security groups that allow inbound SSH access from trusted clients, and specify the additional security groups for the primary instance to each cluster in the subset.

You can apply up to 15 additional security groups for the primary instance, 15 for core and task instances, and 15 for service access (in private subnets). If necessary, you can specify the same additional security group for primary instances, core and task instances, and service access. The maximum number of security groups and rules in your account is subject to account limits. For more information, see [Security group limits](https://docs.aws.amazon.com/vpc/latest/userguide/amazon-vpc-limits.html#vpc-limits-security-groups) in the *Amazon VPC User Guide*. 

# Specifying Amazon EMR-managed and additional security groups
<a name="emr-sg-specify"></a>

You can specify security groups using the AWS Management Console, the AWS CLI, or the Amazon EMR API. If you don't specify security groups, Amazon EMR creates default security groups. Specifying additional security groups is optional. You can assign additional security groups for primary instances, core and task instances, and service access (private subnets only).

------
#### [ Console ]

**To specify security groups with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. Under **Networking**, select the arrow next to **EC2 security groups (firewall)** to expand this section. Under **Primary node** and **Core and task nodes**, the default Amazon EMR managed security groups are selected by default. If you use a private subnet, you also have the option to select a security group for **Service access**.

1. To change your Amazon EMR managed security group, use the **Choose security groups** dropdown menu to select a different option from the **Amazon EMR-managed security group** list of options. You have one Amazon EMR managed security group for both **Primary node** and **Core and task nodes**.

1. To add custom security groups, use the same **Choose security groups** dropdown menu to select up to four custom security groups from the **Custom security group** list of options. You can have up to four custom security groups for both **Primary node** and **Core and task nodes**.

1. Choose any other options that apply to your cluster. 

1. To launch your cluster, choose **Create cluster**.

------

## Specifying security groups with the AWS CLI
<a name="emr-sg-specify-cli"></a>

To specify security groups using the AWS CLI, use the `create-cluster` command with the following parameters of the `--ec2-attributes` option:


| Parameter | Description | 
| --- | --- | 
|  `EmrManagedPrimarySecurityGroup`  |  Use this parameter to specify a custom managed security group for the primary instance. If this parameter is specified, `EmrManagedCoreSecurityGroup` must also be specified. For clusters in private subnets, `ServiceAccessSecurityGroup` must also be specified.  | 
|  `EmrManagedCoreSecurityGroup`  |  Use this parameter to specify a custom managed security group for core and task instances. If this parameter is specified, `EmrManagedPrimarySecurityGroup` must also be specified. For clusters in private subnets, `ServiceAccessSecurityGroup` must also be specified.  | 
|  `ServiceAccessSecurityGroup`  |  Use this parameter to specify a custom managed security group for service access, which applies only to clusters in private subnets. The security group you specify as `ServiceAccessSecurityGroup` should not be used for any other purpose and should also be reserved for Amazon EMR. If this parameter is specified, `EmrManagedPrimarySecurityGroup` must also be specified.  | 
|  `AdditionalPrimarySecurityGroups`  |  Use this parameter to specify up to four additional security groups for the primary instance.  | 
|  `AdditionalCoreSecurityGroups`  |  Use this parameter to specify up to four additional security groups for core and task instances.  | 

**Example — specify custom Amazon EMR-managed security groups and additional security groups**  
The following example specifies custom Amazon EMR managed security groups for a cluster in a private subnet, multiple additional security groups for the primary instance, and a single additional security group for core and task instances.  
Linux line continuation characters (\) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

```
aws emr create-cluster --name "ClusterCustomManagedAndAdditionalSGs" \
--release-label emr-7.12.0 --applications Name=Hue Name=Hive \
Name=Pig --use-default-roles --ec2-attributes \
SubnetIds=subnet-xxxxxxxxxxxx,KeyName=myKey,\
ServiceAccessSecurityGroup=sg-xxxxxxxxxxxx,\
EmrManagedPrimarySecurityGroup=sg-xxxxxxxxxxxx,\
EmrManagedCoreSecurityGroup=sg-xxxxxxxxxxx,\
AdditionalPrimarySecurityGroups=['sg-xxxxxxxxxxx',\
'sg-xxxxxxxxxxx','sg-xxxxxxxxxx'],\
AdditionalCoreSecurityGroups=sg-xxxxxxxxxxx \
--instance-type m5.xlarge
```

For more information, see [create-cluster](https://docs.aws.amazon.com/cli/latest/reference/emr/create-cluster.html) in the *AWS CLI Command Reference*.

# Specifying EC2 security groups for EMR Notebooks
<a name="emr-managed-notebooks-security-groups"></a>

When you create an EMR notebook, two security groups are used to control network traffic between the EMR notebook and the Amazon EMR cluster when you use the notebook editor. The default security groups have minimal rules that allow only network traffic between the EMR Notebooks service and the clusters to which notebooks are attached.

An EMR notebook uses [Apache Livy](https://livy.incubator.apache.org/) to communicate with the cluster via a proxy through TCP port 18888. When you create custom security groups with rules that you tailor to your environment, you can limit network traffic so that only a subset of notebooks can run code within the notebook editor on particular clusters. The cluster uses your custom security groups in addition to its default security groups. For more information, see [Control network traffic with security groups](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-security-groups.html) in the *Amazon EMR Management Guide*.
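As an illustration of the rule pairing described above, the following Python sketch builds the request parameters for EC2's `authorize_security_group_ingress` API (for example, via boto3), allowing a custom notebook-editor security group to reach Livy on TCP port 18888 on a custom primary-instance group. The group IDs are placeholders, and the sketch only constructs the payload; it does not call AWS.

```python
# Sketch: build an ingress rule that lets a custom notebook editor security
# group reach Livy (TCP 18888) on a custom primary-instance security group.
# The group IDs used below are placeholders. Pass the result to an EC2
# client, e.g. ec2_client.authorize_security_group_ingress(**params).

LIVY_PORT = 18888

def livy_ingress_params(primary_sg_id: str, editor_sg_id: str) -> dict:
    """Request parameters allowing editor_sg_id -> primary_sg_id on the Livy port."""
    return {
        "GroupId": primary_sg_id,  # rule is attached to the primary instance's group
        "IpPermissions": [
            {
                "IpProtocol": "tcp",
                "FromPort": LIVY_PORT,
                "ToPort": LIVY_PORT,
                # The source is the editor's security group, not a CIDR block,
                # so only traffic originating from that group is allowed.
                "UserIdGroupPairs": [{"GroupId": editor_sg_id}],
            }
        ],
    }

params = livy_ingress_params("sg-1111aaaa", "sg-2222bbbb")
```

Because the source is a security group rather than an IP range, the rule automatically covers whichever notebook editors you attach to that group.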

## Default EC2 security group for the primary instance
<a name="emr-managed-notebooks-security-group-for-master"></a>

The default EC2 security group for the primary instance is associated with the primary instance in addition to the cluster's security groups for the primary instance.

Group Name: **ElasticMapReduceEditors-Livy**

**Rules**
+ Inbound

  Allow TCP Port 18888 from any resources in the default EC2 security group for EMR Notebooks
+ Outbound

  None

## Default EC2 security group for EMR Notebooks
<a name="emr-managed-notebooks-security-group-for-notebooks"></a>

The default EC2 security group for the EMR notebook is associated with the notebook editor for any EMR notebook to which it is assigned.

Group Name: **ElasticMapReduceEditors-Editor**

**Rules**
+ Inbound

  None
+ Outbound

  Allow TCP Port 18888 to any resources in the default EC2 security group for EMR Notebooks.

## Custom EC2 security group for EMR Notebooks when associating Notebooks with Git repositories
<a name="emr-managed-notebooks-security-group-for-notebooks-git"></a>

To link a Git repository to your notebook, the security group for the EMR notebook must include an outbound rule so that the notebook can route traffic to the internet. It is recommended that you create a new security group for this purpose. Updating the default **ElasticMapReduceEditors-Editor** security group may give the same outbound rules to other notebooks that are attached to this security group. 

**Rules**
+ Inbound

  None
+ Outbound

  Allow the notebook to route traffic to the internet via the cluster, as the following example demonstrates. The value 0.0.0.0/0 is used for example purposes. You can modify this rule to specify the IP address(es) for your Git-based repositories.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-managed-notebooks-security-groups.html)

# Using Amazon EMR block public access
<a name="emr-block-public-access"></a>

Amazon EMR *block public access (BPA)* prevents you from launching a cluster in a public subnet if a security group associated with the cluster has a rule that allows inbound traffic from public IP addresses on a port.

**Important**  
*Block public access* is enabled by default. To increase account protection, we recommend that you keep it enabled.

## Understanding block public access
<a name="emr-block-public-access-about"></a>

You can use the *block public access* account-level configuration to centrally manage public network access to Amazon EMR clusters.

When a user from your AWS account launches a cluster, Amazon EMR checks the port rules in the security group for the cluster and compares them with your inbound traffic rules. If the security group has an inbound rule that opens ports to the public IPv4 address range 0.0.0.0/0 or the IPv6 range ::/0, and those ports aren't specified as exceptions for your account, Amazon EMR doesn't let the user create the cluster.

If a user modifies the security group rules for a running cluster in a public subnet to add a public access rule that violates the BPA configuration for your account, Amazon EMR revokes the new rule if it has permission to do so. If Amazon EMR doesn't have permission to revoke the rule, it creates an event in the AWS Health Dashboard that describes the violation. To grant Amazon EMR permission to revoke rules, see [Configure Amazon EMR to revoke security group rules](#revoke-block-public-access).

Block public access is enabled by default for all clusters in every AWS Region for your AWS account. BPA applies to the entire lifecycle of a cluster, but doesn't apply to clusters that you create in private subnets. You can configure exceptions to the BPA rule; port 22 is an exception by default. For more information on setting exceptions, see [Configure block public access](#configure-block-public-access).

## Configure block public access
<a name="configure-block-public-access"></a>

You can update security groups and the block public access configuration in your accounts at any time.

You can turn block public access (BPA) settings on and off with the AWS Management Console, the AWS Command Line Interface (AWS CLI), and the Amazon EMR API. Settings apply across your account on a Region-by-Region basis. To maintain cluster security, we recommend that you use BPA.

------
#### [ Console ]

**To configure block public access with the console**

1. Sign in to the AWS Management Console, then open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. On the top navigation bar, select the **Region** that you want to configure if it's not already selected.

1. Under **EMR on EC2** in the left navigation pane, choose **Block public access**.

1. Under **Block public access settings**, complete the following steps.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-block-public-access.html)

------
#### [ AWS CLI ]

**To configure block public access using the AWS CLI**
+ Use the `aws emr put-block-public-access-configuration` command to configure block public access as shown in the following examples.    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-block-public-access.html)
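  As a hedged sketch of what such a configuration looks like, the following Python builds a BPA configuration that keeps blocking enabled and permits only port 22, matching the default exception. The resulting dict is the shape you would pass as the `BlockPublicAccessConfiguration` argument of the Amazon EMR `put-block-public-access-configuration` operation (for example, through boto3); the sketch itself makes no AWS calls.

  ```python
  # Sketch: a block public access (BPA) configuration that keeps BPA enabled
  # and allows only the default exception, SSH on port 22. Pass this dict as
  # the BlockPublicAccessConfiguration argument of the EMR
  # put_block_public_access_configuration API call.

  def bpa_configuration(permitted_ports=(22,)) -> dict:
      """Build a BPA configuration permitting only the given ports publicly."""
      return {
          "BlockPublicSecurityGroupRules": True,  # keep BPA on (the default)
          "PermittedPublicSecurityGroupRuleRanges": [
              {"MinRange": port, "MaxRange": port} for port in permitted_ports
          ],
      }

  config = bpa_configuration()
  ```

  Each entry in `PermittedPublicSecurityGroupRuleRanges` is an exception; keep the list as short as possible, and remove entries once they are no longer needed.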

------

## Configure Amazon EMR to revoke security group rules
<a name="revoke-block-public-access"></a>

Amazon EMR needs permission to revoke security group rules and comply with your block public access configuration. You can use one of the following approaches to give Amazon EMR the permission that it needs:
+ **(Recommended)** Attach the `AmazonEMRServicePolicy_v2` managed policy to the service role. For more information, see [Service role for Amazon EMR (EMR role)](emr-iam-role.md).
+ Create a new inline policy that allows the `ec2:RevokeSecurityGroupIngress` action on security groups. For more information about how to modify a role permissions policy, see **Modifying a role permissions policy** with the [IAM console](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-console.html#roles-modify_permissions-policy), [AWS API](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-api.html#roles-modify_permissions-policy-api), and [AWS CLI](https://docs.aws.amazon.com/IAM/latest/UserGuide/roles-managingrole-editing-cli.html#roles-modify_permissions-policy-cli) in the *IAM User Guide*.
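For the second option, an inline policy granting only the revoke permission might look like the following. This is a minimal sketch; you can scope `Resource` down from `*` to the ARNs of the specific security groups that your clusters use.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "ec2:RevokeSecurityGroupIngress",
            "Resource": "*"
        }
    ]
}
```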

## Resolve block public access violations
<a name="resolve-block-public-access"></a>

If a block public access violation occurs, you can mitigate it with one of the following actions:
+ If you want to access a web interface on your cluster, use one of the options described in [View web interfaces hosted on Amazon EMR clusters](emr-web-interfaces.md) to access the interface through SSH (port 22).
+ To allow traffic to the cluster from specific IP addresses rather than from all public IP addresses, add a security group rule. For more information, see [Add rules to a security group](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/working-with-security-groups.html#adding-security-group-rule) in the *Amazon EC2 User Guide*.
+ **(Not recommended)** You can configure Amazon EMR BPA exceptions to include the desired port or range of ports. When you specify a BPA exception, you introduce risk with an unprotected port. If you plan to specify an exception, you should remove the exception as soon as it's no longer needed. For more information, see [Configure block public access](#configure-block-public-access).

## Identify clusters associated with security group rules
<a name="identify-block-public-access"></a>

You might need to identify all of the clusters that are associated with a given security group rule, or to find the security group rule for a given cluster.
+ If you know the security group, then you can identify its associated clusters if you find the network interfaces for the security group. For more information, see [How can I find the resources associated with an Amazon EC2 security group?](https://forums.aws.amazon.com/knowledge-center/ec2-find-security-group-resources) on AWS re:Post. The Amazon EC2 instances that are attached to these network interfaces will be tagged with the ID of the cluster that they belong to.
+ If you want to find the security groups for a known cluster, follow the steps in [View Amazon EMR cluster status and details](emr-manage-view-clusters.md). You can find the security groups for the cluster in the **Network and security** panel in the console, or in the `Ec2InstanceAttributes` field from the AWS CLI.
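The lookup in the first bullet can be scripted. The following Python sketch assumes you have already retrieved the instance descriptions for the security group's network interfaces (for example, with EC2's `describe_instances` via boto3) and extracts the EMR cluster IDs from them. It relies on the `aws:elasticmapreduce:job-flow-id` tag that Amazon EMR applies to cluster instances; the sample response is hypothetical data in the shape boto3 returns.

```python
# Sketch: given instance descriptions in the shape returned by EC2's
# describe_instances API, collect the EMR cluster IDs they belong to.
# Relies on the aws:elasticmapreduce:job-flow-id tag, which Amazon EMR
# places on cluster instances and which carries the cluster ID.

CLUSTER_TAG = "aws:elasticmapreduce:job-flow-id"

def cluster_ids(describe_instances_response: dict) -> set:
    """Return the set of EMR cluster IDs found in the instances' tags."""
    ids = set()
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            for tag in instance.get("Tags", []):
                if tag.get("Key") == CLUSTER_TAG:
                    ids.add(tag["Value"])
    return ids

# Hypothetical sample response with one EMR instance and one unrelated instance.
sample = {
    "Reservations": [
        {"Instances": [
            {"Tags": [{"Key": CLUSTER_TAG, "Value": "j-A1B2CD34EF5G"}]},
            {"Tags": [{"Key": "Name", "Value": "unrelated-instance"}]},
        ]}
    ]
}
```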

# Compliance validation for Amazon EMR
<a name="emr-compliance"></a>

Third-party auditors assess the security and compliance of Amazon EMR as part of multiple AWS compliance programs. These include SOC, PCI, FedRAMP, HIPAA, and others.

For a list of AWS services in scope of specific compliance programs, see [AWS services in scope by compliance program](https://aws.amazon.com/compliance/services-in-scope/). For general information, see [AWS compliance programs](https://aws.amazon.com/compliance/programs/).

You can download third-party audit reports using AWS Artifact. For more information, see [Downloading reports in AWS Artifact](https://docs.aws.amazon.com/artifact/latest/ug/downloading-documents.html). 

Your compliance responsibility when using Amazon EMR is determined by the sensitivity of your data, your company's compliance objectives, and applicable laws and regulations. If your use of Amazon EMR is subject to compliance with standards such as HIPAA, PCI, or FedRAMP, AWS provides resources to help:
+ [Security and compliance Quick Start Guides](https://aws.amazon.com/quickstart/?awsf.quickstart-homepage-filter=categories%23security-identity-compliance) – These deployment guides discuss architectural considerations and provide steps for deploying security- and compliance-focused baseline environments on AWS.
+ [Architecting for HIPAA Security and Compliance whitepaper ](https://docs.aws.amazon.com/whitepapers/latest/architecting-hipaa-security-and-compliance-on-aws/architecting-hipaa-security-and-compliance-on-aws.html) – This whitepaper describes how companies can use AWS to create HIPAA-compliant applications.
+ [AWS compliance resources](https://aws.amazon.com/compliance/resources/) – This collection of workbooks and guides might apply to your industry and location.
+ [AWS Config](https://docs.aws.amazon.com/config/latest/developerguide/evaluate-config.html) – This AWS service assesses how well your resource configurations comply with internal practices, industry guidelines, and regulations.
+ [AWS Security Hub CSPM](https://docs.aws.amazon.com/securityhub/latest/userguide/what-is-securityhub.html) – This AWS service provides a comprehensive view of your security state within AWS that helps you check your compliance with security industry standards and best practices.

# Resilience in Amazon EMR
<a name="disaster-recovery-resiliency"></a>

The AWS global infrastructure is built around AWS Regions and Availability Zones. AWS Regions provide multiple physically separated and isolated Availability Zones, which are connected with low-latency, high-throughput, and highly redundant networking. With Availability Zones, you can design and operate applications and databases that automatically fail over between Availability Zones without interruption. Availability Zones are more highly available, fault tolerant, and scalable than traditional single or multiple data center infrastructures. 

For more information about AWS Regions and Availability Zones, see [AWS global infrastructure](https://aws.amazon.com/about-aws/global-infrastructure/).

In addition to the AWS global infrastructure, Amazon EMR offers several features to help support your data resiliency and backup needs.
+ **Integration with Amazon S3 through EMRFS**
+ **Support for multiple master nodes**

# Infrastructure security in Amazon EMR
<a name="infrastructure-security"></a>

As a managed service, Amazon EMR is protected by AWS global network security. For information about AWS security services and how AWS protects infrastructure, see [AWS Cloud Security](https://aws.amazon.com/security/). To design your AWS environment using the best practices for infrastructure security, see [Infrastructure Protection](https://docs.aws.amazon.com/wellarchitected/latest/security-pillar/infrastructure-protection.html) in the *Security Pillar – AWS Well-Architected Framework*.

You use AWS published API calls to access Amazon EMR through the network. Clients must support the following:
+ Transport Layer Security (TLS). We require TLS 1.2 and recommend TLS 1.3.
+ Cipher suites with perfect forward secrecy (PFS) such as DHE (Ephemeral Diffie-Hellman) or ECDHE (Elliptic Curve Ephemeral Diffie-Hellman). Most modern systems such as Java 7 and later support these modes.
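Modern SDKs generally satisfy these requirements by default. As an illustration on the client side, the following Python uses the standard library's `ssl` module to enforce the TLS 1.2 floor explicitly; an HTTP client built on this context would refuse anything older.

```python
import ssl

# Sketch: a client-side TLS context that refuses protocol versions older
# than TLS 1.2, matching the minimum Amazon EMR requires.
# create_default_context() also enables certificate verification and modern
# cipher suites, including ECDHE-based suites with forward secrecy.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2
```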

**Topics**
+ [Connect to Amazon EMR using an interface VPC endpoint](interface-vpc-endpoint.md)

# Connect to Amazon EMR using an interface VPC endpoint
<a name="interface-vpc-endpoint"></a>

You can connect directly to Amazon EMR using an [interface VPC endpoint (AWS PrivateLink)](https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpce-interface.html) in your Virtual Private Cloud (VPC) instead of connecting over the internet. When you use an interface VPC endpoint, communication between your VPC and Amazon EMR is conducted entirely within the AWS network. Each VPC endpoint is represented by one or more [Elastic network interfaces](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-eni.html) (ENIs) with private IP addresses in your VPC subnets.

The interface VPC endpoint connects your VPC directly to Amazon EMR without an internet gateway, NAT device, VPN connection, or Direct Connect connection. The instances in your VPC don't need public IP addresses to communicate with the Amazon EMR API.

To use Amazon EMR through your VPC, you must connect from an instance that is inside the VPC or connect your private network to your VPC by using an Amazon Virtual Private Network (VPN) or Direct Connect. For information about Amazon VPN, see [VPN connections](https://docs.aws.amazon.com/vpc/latest/userguide/vpn-connections.html) in the *Amazon Virtual Private Cloud User Guide*. For information about AWS Direct Connect, see [Creating a connection](https://docs.aws.amazon.com/directconnect/latest/UserGuide/create-connection.html) in the *Direct Connect User Guide*.

You can create an interface VPC endpoint to connect to Amazon EMR using the AWS console or AWS Command Line Interface (AWS CLI) commands. For more information, see [Creating an interface endpoint](https://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpce-interface.html#create-interface-endpoint).

After you create an interface VPC endpoint, if you enable private DNS hostnames for the endpoint, the default Amazon EMR endpoint resolves to your VPC endpoint. The default service name endpoint for Amazon EMR is in the following format.

```
elasticmapreduce.Region.amazonaws.com
```

If you do not enable private DNS hostnames, Amazon VPC provides a DNS endpoint name that you can use in the following format.

```
VPC_Endpoint_ID.elasticmapreduce.Region.vpce.amazonaws.com
```
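The two formats differ only in the hostname. When private DNS is disabled, you might derive the endpoint URL to hand to an SDK client as in the following sketch; the endpoint ID and region are placeholders, and the boto3 usage in the comment is an assumption about your client setup.

```python
# Sketch: build the Amazon EMR endpoint URL for an interface VPC endpoint
# when private DNS hostnames are disabled. With private DNS enabled, you
# keep using the default regional hostname instead. The endpoint ID below
# is a placeholder.

def emr_vpce_url(vpc_endpoint_id: str, region: str) -> str:
    """DNS name Amazon VPC provides for an interface endpoint to Amazon EMR."""
    return f"https://{vpc_endpoint_id}.elasticmapreduce.{region}.vpce.amazonaws.com"

url = emr_vpce_url("vpce-0123456789abcdef0", "us-west-2")
# An SDK client would then be created against this endpoint, e.g.
# boto3.client("emr", region_name="us-west-2", endpoint_url=url)
```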

For more information, see [Interface VPC endpoints (AWS PrivateLink)](https://docs.aws.amazon.com/vpc/latest/userguide/vpce-interface.html) in the *Amazon VPC User Guide*.

Amazon EMR supports making calls to all of its [API actions](https://docs.aws.amazon.com/emr/latest/APIReference/API_Operations.html) inside your VPC.

You can attach VPC endpoint policies to a VPC endpoint to control access for IAM principals. You can also associate security groups with a VPC endpoint to control inbound and outbound access based on the origin and destination of network traffic, such as a range of IP addresses. For more information, see [Controlling access to services with VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-access.html).

## Create a VPC endpoint policy for Amazon EMR
<a name="api-private-link-policy"></a>

You can create a policy for Amazon VPC endpoints for Amazon EMR to specify the following:
+ The principal that can or cannot perform actions
+ The actions that can be performed
+ The resources on which actions can be performed

For more information, see [Controlling access to services with VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints-access.html) in the *Amazon VPC User Guide*.

**Example – VPC endpoint policy to deny all access from a specified AWS account**  
The following VPC endpoint policy denies AWS account *123456789012* all access to resources using the endpoint.  

```
{
    "Statement": [
        {
            "Action": "*",
            "Effect": "Allow",
            "Resource": "*",
            "Principal": "*"
        },
        {
            "Action": "*",
            "Effect": "Deny",
            "Resource": "*",
            "Principal": {
                "AWS": [
                    "123456789012"
                ]
            }
        }
    ]
}
```

**Example – VPC endpoint policy to allow VPC access only to a specified IAM principal (user)**  
The following VPC endpoint policy allows full access only to the user *lijuan* in AWS account *123456789012*. All other IAM principals are denied access through the endpoint.  

```
{
    "Statement": [
        {
            "Action": "*",
            "Effect": "Allow",
            "Resource": "*",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::123456789012:user/lijuan"
                ]
            }
        }]
}
```

**Example – VPC endpoint policy to allow read-only EMR operations**  
The following VPC endpoint policy allows only AWS account *123456789012* to perform the specified Amazon EMR actions.  
The actions specified provide the equivalent of read-only access for Amazon EMR. All other actions on the VPC are denied for the specified account. All other accounts are denied any access. For a list of Amazon EMR actions, see [Actions, resources, and condition keys for Amazon EMR](https://docs.aws.amazon.com/IAM/latest/UserGuide/list_amazonelasticmapreduce.html).  

```
{
    "Statement": [
        {
            "Action": [
                "elasticmapreduce:DescribeSecurityConfiguration",
                "elasticmapreduce:GetBlockPublicAccessConfiguration",
                "elasticmapreduce:ListBootstrapActions",
                "elasticmapreduce:ViewEventsFromAllClustersInConsole",
                "elasticmapreduce:ListSteps",
                "elasticmapreduce:ListInstanceFleets",
                "elasticmapreduce:DescribeCluster",
                "elasticmapreduce:ListInstanceGroups",
                "elasticmapreduce:DescribeStep",
                "elasticmapreduce:ListInstances",
                "elasticmapreduce:ListSecurityConfigurations",
                "elasticmapreduce:DescribeEditor",
                "elasticmapreduce:ListClusters",
                "elasticmapreduce:ListEditors"            
            ],
            "Effect": "Allow",
            "Resource": "*",
            "Principal": {
                "AWS": [
                    "123456789012"
                ]
            }
        }
    ]
}
```

**Example – VPC endpoint policy denying access to a specified cluster**  

The following VPC endpoint policy allows full access for all accounts and principals, but denies any access for AWS account *123456789012* to actions performed on the Amazon EMR cluster with cluster ID *j-A1B2CD34EF5G*. Other Amazon EMR actions that don't support resource-level permissions for clusters are still allowed. For a list of Amazon EMR actions and their corresponding resource type, see [Actions, resources, and condition keys for Amazon EMR](https://docs.aws.amazon.com/IAM/latest/UserGuide/list_amazonelasticmapreduce.html).

```
{
    "Statement": [
        {
            "Action": "*",
            "Effect": "Allow",
            "Resource": "*",
            "Principal": "*"
        },
        {
            "Action": "*",
            "Effect": "Deny",
            "Resource": "arn:aws:elasticmapreduce:us-west-2:123456789012:cluster/j-A1B2CD34EF5G",
            "Principal": {
                "AWS": [
                    "123456789012"
                ]
            }
        }
    ]
}
```