

# Data protection in Amazon EMR
<a name="data-protection"></a>

The AWS [shared responsibility model](https://aws.amazon.com/compliance/shared-responsibility-model/) applies to data protection in Amazon EMR. As described in this model, AWS is responsible for protecting the global infrastructure that runs all of the AWS Cloud. You are responsible for maintaining control over your content that is hosted on this infrastructure. This content includes the security configuration and management tasks for the AWS services that you use. For more information about data privacy, see the [Data Privacy FAQ](https://aws.amazon.com/compliance/data-privacy-faq/). For information about data protection in Europe, see the [AWS Shared Responsibility Model and GDPR](http://aws.amazon.com/blogs/security/the-aws-shared-responsibility-model-and-gdpr/) blog post on the AWS Security Blog.

For data protection purposes, we recommend that you protect AWS account credentials and set up individual accounts with AWS Identity and Access Management. That way each user is given only the permissions necessary to fulfill their job duties. We also recommend that you secure your data in the following ways:
+ Use multi-factor authentication (MFA) with each account.
+ Use TLS to communicate with AWS resources. We require TLS 1.2.
+ Set up API and user activity logging with AWS CloudTrail.
+ Use AWS encryption solutions, along with all default security controls within AWS services.
+ Use advanced managed security services such as Amazon Macie, which assists in discovering and securing personal data that is stored in Amazon S3.
+ If you require FIPS 140-2 validated cryptographic modules when accessing AWS through a command line interface or an API, use a FIPS endpoint. For more information about the available FIPS endpoints, see [Federal Information Processing Standard (FIPS) 140-2](https://aws.amazon.com/compliance/fips/).

We strongly recommend that you never put sensitive identifying information, such as your customers' account numbers, into free-form fields such as a **Name** field. This includes when you work with Amazon EMR or other AWS services using the console, API, AWS CLI, or AWS SDKs. Any data that you enter into Amazon EMR or other services might get picked up for inclusion in diagnostic logs. When you provide a URL to an external server, don't include credentials information in the URL to validate your request to that server.

# Encrypt data at rest and in transit with Amazon EMR
<a name="emr-data-encryption"></a>

Data encryption helps prevent unauthorized users from reading data on a cluster and associated data storage systems. This includes data saved to persistent media, known as data *at rest*, and data that may be intercepted as it travels the network, known as data *in transit*.

Beginning with Amazon EMR version 4.8.0, you can use Amazon EMR security configurations to configure data encryption settings for clusters more easily. Security configurations offer settings to enable encryption for data in transit and for data at rest in Amazon Elastic Block Store (Amazon EBS) volumes and EMRFS on Amazon S3.

Optionally, beginning with Amazon EMR release version 4.1.0, you can configure transparent encryption in HDFS, which is not configured using security configurations. For more information, see [Transparent encryption in HDFS on Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-encryption-tdehdfs.html) in the *Amazon EMR Release Guide*.

**Topics**
+ [Encryption options for Amazon EMR](emr-data-encryption-options.md)
+ [Encryption at rest using a customer KMS key for the EMR WAL service](encryption-at-rest-kms.md)
+ [Create keys and certificates for data encryption with Amazon EMR](emr-encryption-enable.md)
+ [Understanding in-transit encryption](emr-encryption-support-matrix.md)

# Encryption options for Amazon EMR
<a name="emr-data-encryption-options"></a>

With Amazon EMR releases 4.8.0 and higher, you can use a security configuration to specify settings for encrypting data at rest, data in transit, or both. When you enable at-rest data encryption, you can choose to encrypt EMRFS data in Amazon S3, data in local disks, or both. Each security configuration that you create is stored in Amazon EMR rather than in the cluster configuration, so you can easily reuse a configuration to specify data encryption settings whenever you create a cluster. For more information, see [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md).

The following diagram shows the different data encryption options available with security configurations. 

![\[There are several in-transit and at-rest encryption options available with Amazon EMR.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/emr-encryption-options.png)


The following encryption options are also available and are not configured using a security configuration:
+ Optionally, with Amazon EMR versions 4.1.0 and later, you can choose to configure transparent encryption in HDFS. For more information, see [Transparent encryption in HDFS on Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-encryption-tdehdfs.html) in the *Amazon EMR Release Guide*.
+ If you are using a release version of Amazon EMR that does not support security configurations, you can configure encryption for EMRFS data in Amazon S3 manually. For more information, see [Specifying Amazon S3 encryption using EMRFS properties](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-emrfs-encryption.html).
+ If you are using an Amazon EMR version earlier than 5.24.0, an encrypted EBS root device volume is supported only when using a custom AMI. For more information, see [Creating a custom AMI with an encrypted Amazon EBS root device volume](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-custom-ami.html#emr-custom-ami-encrypted) in the *Amazon EMR Management Guide*.

**Note**  
Beginning with Amazon EMR version 5.24.0, you can use a security configuration option to encrypt EBS root device and storage volumes when you specify AWS KMS as your key provider. For more information, see [Local disk encryption](#emr-encryption-localdisk).

Data encryption requires keys and certificates. A security configuration gives you the flexibility to choose from several options, including keys managed by AWS Key Management Service, keys managed by Amazon S3, and keys and certificates from custom providers that you supply. When using AWS KMS as your key provider, charges apply for the storage and use of encryption keys. For more information, see [AWS KMS pricing](https://aws.amazon.com/kms/pricing/).

Before you specify encryption options, decide on the key and certificate management systems you want to use, so you can first create the keys and certificates or the custom providers that you specify as part of encryption settings.

## Encryption at rest for EMRFS data in Amazon S3
<a name="emr-encryption-s3"></a>

Amazon S3 encryption works with the Amazon EMR File System (EMRFS) objects read from and written to Amazon S3. You specify Amazon S3 server-side encryption (SSE) or client-side encryption (CSE) as the **Default encryption mode** when you enable encryption at rest. Optionally, you can specify different encryption methods for individual buckets using **Per bucket encryption overrides**. Regardless of whether Amazon S3 encryption is enabled, Transport Layer Security (TLS) encrypts the EMRFS objects in transit between EMR cluster nodes and Amazon S3. For more information about Amazon S3 encryption, see [Protecting data using encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingEncryption.html) in the *Amazon Simple Storage Service User Guide*.

**Note**  
When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see [AWS KMS Pricing](https://aws.amazon.com/kms/pricing/).

### Amazon S3 server-side encryption
<a name="emr-encryption-s3-sse"></a>

All Amazon S3 buckets have encryption configured by default, and all new objects that are uploaded to an S3 bucket are automatically encrypted at rest. Amazon S3 encrypts data at the object level as it writes the data to disk and decrypts the data when it is accessed. For more information about SSE, see [Protecting data using server-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/serv-side-encryption.html) in the *Amazon Simple Storage Service User Guide*.

You can choose between two different key management systems when you specify SSE in Amazon EMR: 
+ **SSE-S3** – Amazon S3 manages keys for you.
+ **SSE-KMS** – You use an AWS KMS key that is set up with policies suitable for Amazon EMR. For more information about key requirements for Amazon EMR, see [Using AWS KMS keys for encryption](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-enable.html#emr-awskms-keys).

SSE with customer-provided keys (SSE-C) is not available for use with Amazon EMR.

### Amazon S3 client-side encryption
<a name="emr-encryption-s3-cse"></a>

With Amazon S3 client-side encryption, the Amazon S3 encryption and decryption takes place in the EMRFS client on your cluster. Objects are encrypted before being uploaded to Amazon S3 and decrypted after they are downloaded. The provider you specify supplies the encryption key that the client uses. The client can use keys provided by AWS KMS (CSE-KMS) or a custom Java class that provides the client-side root key (CSE-C). The encryption specifics are slightly different between CSE-KMS and CSE-C, depending on the specified provider and the metadata of the object being decrypted or encrypted. For more information about these differences, see [Protecting data using client-side encryption](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingClientSideEncryption.html) in the *Amazon Simple Storage Service User Guide*.

**Note**  
Amazon S3 CSE only ensures that EMRFS data exchanged with Amazon S3 is encrypted; not all data on cluster instance volumes is encrypted. Furthermore, because Hue does not use EMRFS, objects that the Hue S3 File Browser writes to Amazon S3 are not encrypted.

## Encryption at rest for data in Amazon EMR WAL
<a name="emr-encryption-wal"></a>

When you set up server-side encryption (SSE) for write-ahead logging (WAL), Amazon EMR encrypts data at rest. You can choose from two different key management systems when you specify SSE in Amazon EMR:

**SSE-EMR-WAL**  
Amazon EMR manages keys for you. By default, Amazon EMR encrypts the data that you store in Amazon EMR WAL with SSE-EMR-WAL.

**SSE-KMS-WAL**  
You use an AWS KMS key to set up policies that apply to Amazon EMR WAL. For more information about configuring encryption at rest for EMR WAL using a customer KMS key, see [Encryption at rest using a customer KMS key for the EMR WAL service](https://docs.aws.amazon.com/emr/latest/ManagementGuide/encryption-at-rest-kms.html).

**Note**  
You can't use your own key with SSE when you enable WAL with Amazon EMR. For more information, see [Write-ahead logs (WAL) for Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hbase-wal.html).

## Local disk encryption
<a name="emr-encryption-localdisk"></a>

The following mechanisms work together to encrypt local disks when you enable local disk encryption using an Amazon EMR security configuration.

### Open-source HDFS encryption
<a name="w2aac30c19c13c11c23b5"></a>

HDFS exchanges data between cluster instances during distributed processing. It also reads from and writes data to instance store volumes and the EBS volumes attached to instances. The following open-source Hadoop encryption options are activated when you enable local disk encryption:
+ [Secure Hadoop RPC](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_RPC) is set to `Privacy`, which uses Simple Authentication Security Layer (SASL). 
+ [Data encryption on HDFS block data transfer](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_Block_data_transfer.) is set to `true` and is configured to use AES 256 encryption.

**Note**  
You can activate additional Apache Hadoop encryption by enabling in-transit encryption. For more information, see [Encryption in transit](#emr-encryption-intransit). These encryption settings do not activate HDFS transparent encryption, which you can configure manually. For more information, see [Transparent encryption in HDFS on Amazon EMR](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-encryption-tdehdfs.html) in the *Amazon EMR Release Guide*.

### Instance store encryption
<a name="w2aac30c19c13c11c23b7"></a>

For EC2 instance types that use NVMe-based SSDs as the instance store volume, NVMe encryption is used regardless of Amazon EMR encryption settings. For more information, see [NVMe SSD volumes](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ssd-instance-store.html#nvme-ssd-volumes) in the *Amazon EC2 User Guide*. For other instance store volumes, Amazon EMR uses LUKS to encrypt the instance store volume when local disk encryption is enabled. This applies whether EBS volumes are encrypted using EBS encryption or LUKS.

### EBS volume encryption
<a name="w2aac30c19c13c11c23b9"></a>

If you create a cluster in a Region where Amazon EC2 encryption of EBS volumes is enabled by default for your account, EBS volumes are encrypted even if local disk encryption is not enabled. For more information, see [Encryption by default](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSEncryption.html#encryption-by-default) in the *Amazon EC2 User Guide*. With local disk encryption enabled in a security configuration, the Amazon EMR settings take precedence over the Amazon EC2 encryption-by-default settings for cluster EC2 instances.

The following options are available to encrypt EBS volumes using a security configuration:
+ **EBS encryption** – Beginning with Amazon EMR version 5.24.0, you can choose to enable EBS encryption. The EBS encryption option encrypts the EBS root device volume and attached storage volumes. The EBS encryption option is available only when you specify AWS Key Management Service as your key provider. We recommend using EBS encryption. 
+ **LUKS encryption** – If you choose to use LUKS encryption for Amazon EBS volumes, the LUKS encryption applies only to attached storage volumes, not to the root device volume. For more information about LUKS encryption, see the [LUKS on-disk specification](https://gitlab.com/cryptsetup/cryptsetup/wikis/Specification).

  For your key provider, you can set up an AWS KMS key with policies suitable for Amazon EMR, or a custom Java class that provides the encryption artifacts. When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see [AWS KMS pricing](https://aws.amazon.com/kms/pricing/).

**Note**  
To check whether EBS encryption is enabled on your cluster, we recommend that you use the `DescribeVolumes` API call. For more information, see [DescribeVolumes](https://docs.aws.amazon.com/AWSEC2/latest/APIReference/API_DescribeVolumes.html). Running `lsblk` on the cluster reports only the status of LUKS encryption, not EBS encryption.

## Encryption in transit
<a name="emr-encryption-intransit"></a>

Several encryption mechanisms are enabled with in-transit encryption. These open-source features are application-specific and might vary by Amazon EMR release. To enable in-transit encryption, create a security configuration as described in [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md). For EMR clusters with in-transit encryption enabled, Amazon EMR automatically configures the open-source application configurations to enable in-transit encryption. For advanced use cases, you can configure open-source application configurations directly to override the default behavior in Amazon EMR. For more information, see [in-transit encryption support matrix](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-support-matrix.html) and [Configure applications](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html).

See the following to learn more specific details about open-source applications relevant to in-transit encryption:
+ When you enable in-transit encryption with a security configuration, Amazon EMR enables in-transit encryption for all open-source application endpoints that support in-transit encryption. Support for in-transit encryption for different application endpoints varies by the Amazon EMR release version. For more information, see the [in-transit encryption support matrix](https://docs.aws.amazon.com/).
+ You can override open-source configurations, which lets you do the following:
  + Disable TLS hostname verification if your user-provided TLS certificates don't meet requirements.
  + Disable in-transit encryption for certain endpoints based on your performance and compatibility requirements.
  + Control which TLS versions and cipher suites to use.

  You can find more details about the application-specific configurations in the [in-transit encryption support matrix](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-support-matrix.html).
+ Aside from enabling in-transit encryption with a security configuration, some communication channels also require additional security configurations for you to enable in-transit encryption. For example, some open-source application endpoints use Simple Authentication and Security Layer (SASL) for in-transit encryption, which requires that Kerberos authentication is enabled in the security configuration of the EMR cluster. To learn more about these endpoints, see the [in-transit encryption support matrix](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-support-matrix.html). 
+ We recommend that you use software that supports TLS v1.2 or higher. Amazon EMR on EC2 ships the default Corretto JDK distribution, which determines which TLS versions, cipher suites, and key sizes are allowed by the open-source frameworks that run on Java. At this time, most open-source frameworks enforce TLS v1.2 or higher for Amazon EMR 7.0.0 and higher releases, because most of them run on Java 17 in those releases. Older Amazon EMR release versions might support TLS v1.0 and v1.1 because they use older Java versions. However, Corretto JDK might change which TLS versions Java supports, which might impact existing Amazon EMR releases.
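
Because the JDK determines which TLS versions are allowed, it can help to check what a given Java runtime enables by default. The following is a minimal sketch that uses only the standard `javax.net.ssl` API. The class name `TlsDefaultsCheck` is hypothetical, and the result reflects the JDK that runs the code, not any particular Amazon EMR release or application configuration.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

/** Lists the TLS protocol versions that the local JDK enables by default. */
final class TlsDefaultsCheck {

  /** Returns the protocol names enabled by the default SSLContext. */
  static List<String> enabledProtocols() {
    try {
      SSLParameters params = SSLContext.getDefault().getDefaultSSLParameters();
      return Arrays.asList(params.getProtocols());
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("No default SSLContext is available", e);
    }
  }

  /** Returns true if a protocol older than TLS v1.2 is still enabled by default. */
  static boolean allowsLegacyTls() {
    List<String> protocols = enabledProtocols();
    return protocols.contains("SSLv3")
        || protocols.contains("TLSv1")
        || protocols.contains("TLSv1.1");
  }

  public static void main(String[] args) {
    System.out.println("Enabled protocols: " + enabledProtocols());
    System.out.println("Legacy TLS allowed: " + allowsLegacyTls());
  }
}
```

Individual applications can still narrow or widen these defaults through their own configuration, so treat the output as a starting point rather than a statement of what a cluster endpoint accepts.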

You specify the encryption artifacts used for in-transit encryption in one of two ways: either by providing a zipped file of certificates that you upload to Amazon S3, or by referencing a custom Java class that provides encryption artifacts. For more information, see [Providing certificates for encrypting data in transit with Amazon EMR encryption](emr-encryption-enable.md#emr-encryption-certificates).

# Encryption at rest using a customer KMS key for the EMR WAL service
<a name="encryption-at-rest-kms"></a>

EMR write-ahead logs (WAL) support encryption at rest with a customer KMS key. At a high level, Amazon EMR WAL integrates with AWS KMS as follows:

EMR WAL interacts with AWS during the following operations: `CreateWAL`, `AppendEdit`, `ArchiveWALCheckPoint`, `CompleteWALFlush`, `DeleteWAL`, `GetCurrentWALTime`, `ReplayEdits`, and `TrimWAL`. By default, it does so through the `EMR_EC2_DefaultRole`. When any of these operations is invoked, EMR WAL makes `Decrypt` and `GenerateDataKey` calls against the KMS key.

## Considerations
<a name="encryption-at-rest-considerations"></a>

Consider the following when using AWS KMS based encryption for EMR WAL:
+ The encryption configuration can't be changed after an EMR WAL is created.
+ When you use KMS encryption with your own KMS key, the key must exist in the same Region as your Amazon EMR cluster.
+ You are responsible for maintaining all required IAM permissions. We recommend that you don't revoke the needed permissions during the life of the WAL. Otherwise, unexpected failures can occur, such as the inability to delete an EMR WAL because the associated encryption key is no longer available.
+ There is a cost associated with using AWS KMS keys. For more information, see [AWS Key Management Service pricing](https://aws.amazon.com/kms/pricing/).

## Required IAM permissions
<a name="encryption-at-rest-required-iam-permissions"></a>

To use your customer KMS key to encrypt EMR WAL at rest, grant the proper permissions to the EMR WAL client role and to the EMR WAL service principal `emrwal.amazonaws.com`.

### Permissions for the EMR WAL client role
<a name="encryption-at-rest-permissions-client-role"></a>

Below is the IAM policy needed for the EMR WAL client role:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey*"
      ],
      "Resource": [
        "*"
      ],
      "Sid": "AllowKMSDecrypt"
    }
  ]
}
```


The EMR WAL client on the EMR cluster uses `EMR_EC2_DefaultRole` by default. If you use a different role for the instance profile in the EMR cluster, make sure that each role has the appropriate permissions.

For more information about managing the role policy, refer to [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html).

### Permissions for the KMS key policy
<a name="encryption-at-rest-permissions-kms-key-policy"></a>

In your KMS key policy, grant the EMR WAL client role and the EMR WAL service the `Decrypt` and `GenerateDataKey*` permissions. For more information about managing key policies, refer to [KMS key policy](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html).

```
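{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/EMR_EC2_DefaultRole",
        "Service": "emrwal.amazonaws.com"
      },
      "Action": [
        "kms:Decrypt",
        "kms:GenerateDataKey*"
      ],
      "Resource": [
        "arn:aws:kms:*:123456789012:key/*"
      ],
      "Sid": "AllowKMSDecrypt"
    }
  ]
}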
```


The role specified in the snippet can change if you change the default role.

## Monitoring Amazon EMR WAL interaction with AWS KMS
<a name="encryption-at-rest-monitoring-emr-wal-kms"></a>

### Amazon EMR WAL encryption context
<a name="encryption-at-rest-encryption-context"></a>

An encryption context is a set of key–value pairs that contains arbitrary non-secret data. When you include an encryption context in a request to encrypt data, AWS KMS cryptographically binds the encryption context to the encrypted data. To decrypt the data, you must pass in the same encryption context.

In its [GenerateDataKey](https://docs.aws.amazon.com/kms/latest/APIReference/API_GenerateDataKey.html) and [Decrypt](https://docs.aws.amazon.com/kms/latest/APIReference/API_Decrypt.html) requests to AWS KMS, Amazon EMR WAL uses an encryption context with one name–value pair that identifies the EMR WAL name.

```
"encryptionContext": {
    "aws:emrwal:walname": "111222333444555-testworkspace-emrwalclustertest-emrwaltestwalname"
}
```

You can use the encryption context to identify these cryptographic operations in audit records and logs, such as [AWS CloudTrail](https://docs.aws.amazon.com/awscloudtrail/latest/userguide/cloudtrail-user-guide.html) and Amazon CloudWatch Logs, and as a condition for authorization in policies and grants.
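
To make the binding concrete, the following sketch shows the general idea with local AES-GCM encryption, where the context is supplied as additional authenticated data. It is an analogy only: it does not call AWS KMS, and it is not how Amazon EMR WAL is implemented. The class name `EncryptionContextSketch` and the flattened `name=value` context string are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Local analogy: binds a non-secret context string to ciphertext with AES-GCM. */
final class EncryptionContextSketch {
  private static final int TAG_BITS = 128;
  private static final int IV_BYTES = 12;

  /** Generates a random 256-bit AES key. */
  static SecretKey newKey() {
    try {
      KeyGenerator generator = KeyGenerator.getInstance("AES");
      generator.init(256);
      return generator.generateKey();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Encrypts plaintext, authenticating the context; returns the IV followed by the ciphertext. */
  static byte[] encrypt(SecretKey key, String context, byte[] plaintext) {
    try {
      byte[] iv = new byte[IV_BYTES];
      new SecureRandom().nextBytes(iv);
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
      cipher.updateAAD(context.getBytes(StandardCharsets.UTF_8));
      byte[] ciphertext = cipher.doFinal(plaintext);
      byte[] message = new byte[iv.length + ciphertext.length];
      System.arraycopy(iv, 0, message, 0, iv.length);
      System.arraycopy(ciphertext, 0, message, iv.length, ciphertext.length);
      return message;
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  /** Decrypts the message; returns null when the supplied context does not match. */
  static byte[] decrypt(SecretKey key, String context, byte[] message) {
    try {
      Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
      cipher.init(Cipher.DECRYPT_MODE, key,
          new GCMParameterSpec(TAG_BITS, message, 0, IV_BYTES));
      cipher.updateAAD(context.getBytes(StandardCharsets.UTF_8));
      return cipher.doFinal(message, IV_BYTES, message.length - IV_BYTES);
    } catch (AEADBadTagException e) {
      return null; // Authentication failed: wrong context or tampered data.
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    SecretKey key = newKey();
    byte[] message = encrypt(key, "aws:emrwal:walname=wal-a",
        "edit".getBytes(StandardCharsets.UTF_8));
    System.out.println("Same context decrypts: "
        + (decrypt(key, "aws:emrwal:walname=wal-a", message) != null));
    System.out.println("Different context decrypts: "
        + (decrypt(key, "aws:emrwal:walname=wal-b", message) != null));
  }
}
```

In this sketch the context is not secret and is not stored in the ciphertext, yet decryption succeeds only when the caller presents the same value that was used for encryption.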

# Create keys and certificates for data encryption with Amazon EMR
<a name="emr-encryption-enable"></a>

Before you specify encryption options using a security configuration, decide on the provider you want to use for keys and encryption artifacts. For example, you can use AWS KMS or a custom provider that you create. Next, create the keys or key provider as described in this section.

## Providing keys for encrypting data at rest
<a name="emr-encryption-create-keys"></a>

You can use AWS Key Management Service (AWS KMS) or a custom key provider for at-rest data encryption in Amazon EMR. When you use AWS KMS, charges apply for the storage and use of encryption keys. For more information, see [AWS KMS pricing](https://aws.amazon.com/kms/pricing/). 

This topic provides key policy details for a KMS key to be used with Amazon EMR, as well as guidelines and code examples for writing a custom key provider class for Amazon S3 encryption. For more information about creating keys, see [Creating keys](https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html) in the *AWS Key Management Service Developer Guide*.

### Using AWS KMS keys for encryption
<a name="emr-awskms-keys"></a>

The AWS KMS encryption key must be created in the same Region as your Amazon EMR cluster instance and the Amazon S3 buckets used with EMRFS. If the key that you specify is in a different account from the one that you use to configure a cluster, you must specify the key using its ARN.

The role for the Amazon EC2 instance profile must have permissions to use the KMS key you specify. The default role for the instance profile in Amazon EMR is `EMR_EC2_DefaultRole`. If you use a different role for the instance profile, or you use IAM roles for EMRFS requests to Amazon S3, make sure that each role is added as a key user as appropriate. This gives the role permissions to use the KMS key. For more information, see [Using Key Policies](https://docs.aws.amazon.com/kms/latest/developerguide/key-policies.html#key-policy-default-allow-users) in the *AWS Key Management Service Developer Guide* and [Configure IAM roles for EMRFS requests to Amazon S3](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-emrfs-iam-roles.html).

You can use the AWS Management Console to add the role for your EC2 instance profile to the list of key users for the specified KMS key, or you can use the AWS CLI or an AWS SDK to attach an appropriate key policy.

Note that Amazon EMR supports only [symmetric KMS keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#symmetric-cmks). You cannot use an [asymmetric KMS key](https://docs.aws.amazon.com/kms/latest/developerguide/symmetric-asymmetric.html#asymmetric-cmks) to encrypt data at rest in an Amazon EMR cluster. For help determining whether a KMS key is symmetric or asymmetric, see [Identifying symmetric and asymmetric KMS keys](https://docs.aws.amazon.com/kms/latest/developerguide/find-symm-asymm.html).

The procedure below describes how to add the default Amazon EMR instance profile, `EMR_EC2_DefaultRole`, as a *key user* using the AWS Management Console. It assumes that you have already created a KMS key. To create a new KMS key, see [Creating Keys](https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html) in the *AWS Key Management Service Developer Guide*.

**To add the EC2 instance profile for Amazon EMR to the list of encryption key users**

1. Sign in to the AWS Management Console and open the AWS Key Management Service (AWS KMS) console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms).

1. To change the AWS Region, use the Region selector in the upper-right corner of the page.

1. Select the alias of the KMS key to modify.

1. On the key details page under **Key Users**, choose **Add**.

1. In the **Add key users** dialog box, select the appropriate role. The name of the default role is `EMR_EC2_DefaultRole`.

1. Choose **Add**.

### Enabling EBS encryption by providing additional permissions for KMS keys
<a name="emr-awskms-ebs-encryption"></a>

Beginning with Amazon EMR version 5.24.0, you can encrypt EBS root device and storage volumes by using a security configuration option. To enable this option, you must specify AWS KMS as your key provider. Additionally, you must grant the service role `EMR_DefaultRole` permissions to use the AWS KMS key that you specify.

You can use the AWS Management Console to add the service role to the list of key users for the specified KMS key, or you can use the AWS CLI or an AWS SDK to attach an appropriate key policy.

The following procedure describes how to use the AWS Management Console to add the default Amazon EMR service role `EMR_DefaultRole` as a *key user*. It assumes that you have already created a KMS key. To create a new KMS key, see [Creating keys](https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html) in the *AWS Key Management Service Developer Guide*.

**To add the Amazon EMR service role to the list of encryption key users**

1. Sign in to the AWS Management Console and open the AWS Key Management Service (AWS KMS) console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms).

1. To change the AWS Region, use the Region selector in the upper-right corner of the page.

1. Choose **Customer managed keys** in the left sidebar.

1. Select the alias of the KMS key to modify.

1. On the key details page under **Key Users**, choose **Add**.

1. In the **Add key users** section, select the appropriate role. The name of the default service role for Amazon EMR is `EMR_DefaultRole`.

1. Choose **Add**.

### Creating a custom key provider
<a name="emr-custom-keys"></a>

When using a security configuration, you must specify a different provider class name for local disk encryption and Amazon S3 encryption. The requirements for a custom key provider depend on whether you use local disk encryption or Amazon S3 encryption, as well as on the Amazon EMR release version.

Depending on the type of encryption you use when creating a custom key provider, the application must also implement different EncryptionMaterialsProvider interfaces. Both interfaces are available in the AWS SDK for Java version 1.11.0 and later.
+ To implement Amazon S3 encryption, use the [com.amazonaws.services.s3.model.EncryptionMaterialsProvider interface](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/model/EncryptionMaterialsProvider.html).
+ To implement local disk encryption, use the [com.amazonaws.services.elasticmapreduce.spi.security.EncryptionMaterialsProvider interface](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/spi/security/EncryptionMaterialsProvider.html).

You can use any strategy to provide encryption materials for the implementation. For example, you might choose to provide static encryption materials or integrate with a more complex key management system.

If you’re using Amazon S3 encryption, you must use the encryption algorithm **AES/GCM/NoPadding** for custom encryption materials.

If you’re using local disk encryption, the encryption algorithm to use for custom encryption materials varies by EMR release. For Amazon EMR 7.0.0 and lower, you must use **AES/GCM/NoPadding**. For Amazon EMR 7.1.0 and higher, you must use **AES**.

The EncryptionMaterialsProvider class gets encryption materials by encryption context. Amazon EMR populates encryption context information at runtime to help the caller determine the correct encryption materials to return.

**Example: Using a custom key provider for Amazon S3 encryption with EMRFS**  
When Amazon EMR fetches the encryption materials from the EncryptionMaterialsProvider class to perform encryption, EMRFS optionally populates the materialsDescription argument with two fields: the Amazon S3 URI for the object and the JobFlowId of the cluster, which can be used by the EncryptionMaterialsProvider class to return encryption materials selectively.  
For example, the provider may return different keys for different Amazon S3 URI prefixes. It is the description of the returned encryption materials that is eventually stored with the Amazon S3 object rather than the materialsDescription value that is generated by EMRFS and passed to the provider. While decrypting an Amazon S3 object, the encryption materials description is passed to the EncryptionMaterialsProvider class, so that it can, again, selectively return the matching key to decrypt the object.  
An EncryptionMaterialsProvider reference implementation is provided below. Another custom provider, [EMRFSRSAEncryptionMaterialsProvider](https://github.com/awslabs/emr-sample-apps/tree/master/emrfs-plugins/EMRFSRSAEncryptionMaterialsProvider), is available from GitHub.   

```
import com.amazonaws.services.s3.model.EncryptionMaterials;
import com.amazonaws.services.s3.model.EncryptionMaterialsProvider;
import com.amazonaws.services.s3.model.KMSEncryptionMaterials;
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;

import java.util.Map;

/**
 * Provides KMSEncryptionMaterials according to Configuration.
 * The KMS key ID is read from the "my.kms.key.id" property.
 */
public class MyEncryptionMaterialsProviders implements EncryptionMaterialsProvider, Configurable {
  private Configuration conf;
  private String kmsKeyId;
  private EncryptionMaterials encryptionMaterials;

  private void init() {
    this.kmsKeyId = conf.get("my.kms.key.id");
    this.encryptionMaterials = new KMSEncryptionMaterials(kmsKeyId);
  }

  @Override
  public void setConf(Configuration conf) {
    this.conf = conf;
    init();
  }

  @Override
  public Configuration getConf() {
    return this.conf;
  }

  @Override
  public void refresh() {
    // No-op: this provider always returns the same materials.
  }

  @Override
  public EncryptionMaterials getEncryptionMaterials(Map<String, String> materialsDescription) {
    return this.encryptionMaterials;
  }

  @Override
  public EncryptionMaterials getEncryptionMaterials() {
    return this.encryptionMaterials;
  }
}
```
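The provider above reads its key ID from the Hadoop `Configuration` under the `my.kms.key.id` property. One way to supply that value is a configuration classification at cluster launch. The following is a minimal sketch that assumes properties set under the `emrfs-site` classification reach the `Configuration` object passed to `setConf`. The key ARN is a placeholder, not a real key.

```json
[
  {
    "Classification": "emrfs-site",
    "Properties": {
      "my.kms.key.id": "arn:aws:kms:us-west-2:111122223333:key/EXAMPLE-KEY-ID"
    }
  }
]
```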

## Providing certificates for encrypting data in transit with Amazon EMR encryption
<a name="emr-encryption-certificates"></a>

With Amazon EMR release version 4.8.0 or later, you have two options for specifying artifacts for encrypting data in transit using a security configuration: 
+ You can manually create PEM certificates, include them in a .zip file, and then reference the .zip file in Amazon S3.
+ You can implement a custom certificate provider as a Java class. You specify the JAR file of the application in Amazon S3, and then provide the full class name of the provider as declared in the application. The class must implement the [TLSArtifactsProvider](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/spi/security/TLSArtifactsProvider.html) interface available beginning with the AWS SDK for Java version 1.11.0.

Amazon EMR automatically downloads artifacts to each node in the cluster and later uses them to implement the open-source, in-transit encryption features. For more information about available options, see [Encryption in transit](emr-data-encryption-options.md#emr-encryption-intransit).
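As an illustration of the second option, a security configuration that references a custom certificate provider might look like the following sketch. The bucket, JAR name, and class name are hypothetical placeholders. The field names follow the security configuration JSON format as we understand it, so verify them against the security configuration reference before you use them.

```json
{
  "EncryptionConfiguration": {
    "EnableInTransitEncryption": true,
    "EnableAtRestEncryption": false,
    "InTransitEncryptionConfiguration": {
      "TLSCertificateConfiguration": {
        "CertificateProviderType": "Custom",
        "S3Object": "s3://amzn-s3-demo-bucket/artifacts/my-cert-provider.jar",
        "CertificateProviderClass": "com.example.MyTLSArtifactsProvider"
      }
    }
  }
}
```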

### Using PEM certificates
<a name="emr-encryption-pem-certificate"></a>

When you specify a .zip file for in-transit encryption, the security configuration expects PEM files within the .zip file to be named exactly as they appear below:


**In-transit encryption certificates**  

| File name | Required/optional | Details | 
| --- | --- | --- | 
| privateKey.pem | Required | Private key | 
| certificateChain.pem | Required | Certificate chain | 
| trustedCertificates.pem | Optional | We recommend that you provide a certificate that isn't signed by the Java default trusted root certification authority (CA) or an intermediate CA that can link to the Java default trusted root CA. We don't recommend that you use public CAs when you use wildcard certificates or when you disable hostname verification. | 

You likely want to configure the private key PEM file to be a wildcard certificate that enables access to the Amazon VPC domain in which your cluster instances reside. For example, if your cluster resides in us-east-1 (N. Virginia), you could specify a common name in the certificate configuration that allows access to the cluster by specifying `CN=*.ec2.internal` in the certificate subject definition. If your cluster resides in us-west-2 (Oregon), you could specify `CN=*.us-west-2.compute.internal`.

If the provided PEM file in the encryption artifact doesn't have a wildcard character for the domain in the common name, you must change the value of `hadoop.ssl.hostname.verifier` to `ALLOW_ALL`. To do so in Amazon EMR releases 7.3.0 and higher, add the `core-site` classification when you submit configurations to a cluster. In releases lower than 7.3.0, add the configuration `"hadoop.ssl.hostname.verifier": "ALLOW_ALL"` directly into the `core-site.xml` file. This change is required because the default hostname verifier requires a hostname without the wildcard because all hosts in the cluster use it. For more information about EMR cluster configuration within an Amazon VPC, see [Configure networking in a VPC for Amazon EMR](emr-plan-vpc-subnet.md).
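For Amazon EMR releases 7.3.0 and higher, the `core-site` override described above could be sketched as follows. This is only an illustration of the classification shape. Because `ALLOW_ALL` turns off hostname verification, use it only when the certificate can't be reissued with a wildcard common name.

```json
[
  {
    "Classification": "core-site",
    "Properties": {
      "hadoop.ssl.hostname.verifier": "ALLOW_ALL"
    }
  }
]
```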

The following example demonstrates how to use [OpenSSL](https://www.openssl.org/) to generate a self-signed X.509 certificate with a 2048-bit RSA private key. The key allows access to the issuer's Amazon EMR cluster instances in the `us-west-2` (Oregon) Region as specified by the `*.us-west-2.compute.internal` domain name as the common name.

Other optional subject items, such as country (C), state (ST), and locality (L), are specified. Because a self-signed certificate is generated, the second command in the example copies the `certificateChain.pem` file to the `trustedCertificates.pem` file. The third command uses `zip` to create the `my-certs.zip` file that contains the certificates.



**Important**  
This example is a proof-of-concept demonstration only. Using self-signed certificates is not recommended and presents a potential security risk. For production systems, use a trusted certification authority (CA) to issue certificates.

```
$ openssl req -x509 -newkey rsa:2048 -keyout privateKey.pem -out certificateChain.pem -days 365 -nodes -subj '/C=US/ST=Washington/L=Seattle/O=MyOrg/OU=MyDept/CN=*.us-west-2.compute.internal'
$ cp certificateChain.pem trustedCertificates.pem
$ zip -r -X my-certs.zip certificateChain.pem privateKey.pem trustedCertificates.pem
```
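After you upload `my-certs.zip` to Amazon S3, a security configuration can reference it. The following sketch assumes a hypothetical bucket and key prefix. The field names follow the security configuration JSON format as we understand it, so check them against the security configuration reference.

```json
{
  "EncryptionConfiguration": {
    "EnableInTransitEncryption": true,
    "EnableAtRestEncryption": false,
    "InTransitEncryptionConfiguration": {
      "TLSCertificateConfiguration": {
        "CertificateProviderType": "PEM",
        "S3Object": "s3://amzn-s3-demo-bucket/artifacts/my-certs.zip"
      }
    }
  }
}
```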

# Understanding in-transit encryption
<a name="emr-encryption-support-matrix"></a>

You can configure an EMR cluster to run open-source frameworks such as [Apache Spark](https://aws.amazon.com/emr/features/spark/), [Apache Hive](https://aws.amazon.com/emr/features/hive/), and [Presto](https://aws.amazon.com/emr/features/presto/). Each of these open-source frameworks has a set of processes running on the EC2 instances of a cluster. Each of these processes can host network endpoints for network communication.

If in-transit encryption is enabled on an EMR cluster, different network endpoints use different encryption mechanisms. See the following sections to learn more about the specific open-source framework network endpoints supported with in-transit encryption, the related encryption mechanisms, and which Amazon EMR release added the support. Each open-source application might also have different best practices and open-source framework configurations that you can change. 

 For the most in-transit encryption coverage, we recommend that you enable both in-transit encryption and Kerberos. If you only enable in-transit encryption, then in-transit encryption will be available only for the network endpoints that support TLS. Kerberos is necessary because some open-source framework network endpoints use Simple Authentication and Security Layer (SASL) for in-transit encryption.

Note that any open-source frameworks not supported in Amazon EMR 7.x.x releases are not included.

## Spark
<a name="emr-encryption-support-matrix-spark"></a>

When you enable in-transit encryption in security configurations, `spark.authenticate` is automatically set to `true` and uses AES-based encryption for RPC connections.

Starting with Amazon EMR 7.3.0, if you use in-transit encryption and Kerberos authentication, you can't use Spark applications that depend on the Hive metastore. Hive 3 fixes this issue in [HIVE-16340](https://issues.apache.org/jira/browse/HIVE-16340). [SPARK-44114](https://issues.apache.org/jira/browse/SPARK-44114) fully resolves this issue when open-source Spark can upgrade to Hive 3. In the meantime, you can set `hive.metastore.use.SSL` to `false` to work around this issue. For more information, see [Configure applications](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html).
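A configuration override for the `hive.metastore.use.SSL` workaround might look like the following sketch. The choice of classifications is an assumption: `hive-site` covers Hive itself, and `spark-hive-site` covers the copy of the Hive settings that Spark reads. Confirm which one your workload needs. Note that this setting leaves metastore traffic unencrypted.

```json
[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.metastore.use.SSL": "false"
    }
  },
  {
    "Classification": "spark-hive-site",
    "Properties": {
      "hive.metastore.use.SSL": "false"
    }
  }
]
```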

For more information, see [Spark security](https://spark.apache.org/docs/latest/security) in the Apache Spark documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Spark History Server  |  spark.ssl.history.port  |  18480  |  TLS  |  emr-5.3.0+, emr-6.0.0+, emr-7.0.0+  |
|  Spark UI  |  spark.ui.port  |  4440  |  TLS  |  emr-5.3.0+, emr-6.0.0+, emr-7.0.0+  |
|  Spark Driver  |  spark.driver.port  |  Dynamic  |  Spark AES-based encryption  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  Spark Executor  |  Executor Port (no named config)  |  Dynamic  |  Spark AES-based encryption  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  YARN NodeManager  |  spark.shuffle.service.port<sup>1</sup>  |  7337  |  Spark AES-based encryption  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |

<sup>1</sup> `spark.shuffle.service.port` is hosted on YARN NodeManager but is only used by Apache Spark.

**Known issue**

On clusters with in-transit encryption enabled, the `spark.yarn.historyServer.address` configuration currently uses port `18080`, which prevents access to the Spark application UI through the YARN tracking URL. **Affects versions:** Amazon EMR 7.3.0 through 7.9.0.

Use one of the following workarounds:

1. On a running cluster, modify the `spark.yarn.historyServer.address` configuration in `/etc/spark/conf/spark-defaults.conf` to use the `HTTPS` port number `18480`.

1. Alternatively, provide the setting as a configuration override when you launch the cluster.

Example configuration:

```
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.yarn.historyServer.address": "${hadoopconf-yarn.resourcemanager.hostname}:18480"
    }
  }
]
```

## Hadoop YARN
<a name="emr-encryption-support-matrix-hadoop-yarn"></a>

[ Secure Hadoop RPC](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_RPC) is set to `privacy` and uses SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration. If you don't want in-transit encryption for Hadoop RPC, configure `hadoop.rpc.protection = authentication`. We recommend that you use the default configuration for maximum security.

If your TLS certificates can't meet TLS hostname verification requirements, you can configure `hadoop.ssl.hostname.verifier = ALLOW_ALL`. We recommend that you use the default configuration of `hadoop.ssl.hostname.verifier = DEFAULT`, which enforces TLS hostname verification. 

To disable HTTPS for the YARN web application endpoints, configure `yarn.http.policy = HTTP_ONLY`. With this setting, traffic to these endpoints is unencrypted. We recommend that you use the default configuration for maximum security.
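If you do need to opt out of the defaults described above, the overrides could be expressed as configuration classifications. The following sketch assumes that `hadoop.rpc.protection` belongs in `core-site` and `yarn.http.policy` in `yarn-site`. Both settings reduce in-transit protection, so treat this as an illustration of the shape rather than a recommended configuration.

```json
[
  {
    "Classification": "core-site",
    "Properties": {
      "hadoop.rpc.protection": "authentication"
    }
  },
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.http.policy": "HTTP_ONLY"
    }
  }
]
```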

For more information, see [Hadoop in secure mode](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html) in the Apache Hadoop documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
| ResourceManager |  yarn.resourcemanager.webapp.address  |  8088  |  TLS  |  emr-7.3.0+  |
| ResourceManager |  yarn.resourcemanager.resource-tracker.address  |  8025  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
| ResourceManager |  yarn.resourcemanager.scheduler.address  |  8030  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
| ResourceManager |  yarn.resourcemanager.address  |  8032  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
| ResourceManager |  yarn.resourcemanager.admin.address  |  8033  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
| TimelineServer |  yarn.timeline-service.address  |  10200  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
| TimelineServer |  yarn.timeline-service.webapp.address  |  8188  |  TLS  |  emr-7.3.0+  |
|  WebApplicationProxy  |  yarn.web-proxy.address  |  20888  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  NodeManager  |  yarn.nodemanager.address  |  8041  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  NodeManager  |  yarn.nodemanager.localizer.address  |  8040  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  NodeManager  |  yarn.nodemanager.webapp.address  |  8044  |  TLS  |  emr-7.3.0+  |
|  NodeManager  |  mapreduce.shuffle.port<sup>1</sup>  |  13562  |  TLS  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  NodeManager  |  spark.shuffle.service.port<sup>2</sup>  |  7337  |  Spark AES-based encryption  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |

<sup>1</sup> `mapreduce.shuffle.port` is hosted on YARN NodeManager but is only used by Hadoop MapReduce.

<sup>2</sup> `spark.shuffle.service.port` is hosted on YARN NodeManager but is only used by Apache Spark.

**Known issue**

The `yarn.log.server.url` configuration currently uses HTTP with port 19888, which prevents access to application logs from the Resource Manager UI. **Affects versions:** Amazon EMR 7.3.0 through 7.8.0.

Use the following workaround:

1. Modify the `yarn.log.server.url` configuration in `yarn-site.xml` to use the `HTTPS` protocol and port number `19890`.

1. Restart YARN Resource Manager: `sudo systemctl restart hadoop-yarn-resourcemanager.service`.
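The same change could also be supplied as a configuration override at cluster launch, by analogy with the Spark workaround earlier on this page. The following sketch rests on two assumptions: that the `yarn-site` classification is honored for this property, and that the log server path is `/jobhistory/logs`, the usual Hadoop value. `PRIMARY_NODE_DNS` is a placeholder for your primary node's host name.

```json
[
  {
    "Classification": "yarn-site",
    "Properties": {
      "yarn.log.server.url": "https://PRIMARY_NODE_DNS:19890/jobhistory/logs"
    }
  }
]
```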

## Hadoop HDFS
<a name="emr-encryption-support-matrix-hadoop-hdfs"></a>

The Hadoop name node, data node, and journal node all support TLS by default if in-transit encryption is enabled in EMR clusters.

[Secure Hadoop RPC](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_RPC) is set to `privacy` and uses SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration.

We recommend that you don't change the default ports used for HTTPS endpoints.

[Data encryption on HDFS block transfer](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_Block_data_transfer.) uses AES 256 and requires that at-rest encryption is enabled in the security configuration.

For more information, see [Hadoop in secure mode](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html) in the Apache Hadoop documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Namenode  |  dfs.namenode.https-address  |  9871  |  TLS  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  Namenode  |  dfs.namenode.rpc-address  |  8020  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  Datanode  |  dfs.datanode.https.address  |  9865  |  TLS  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  Datanode  |  dfs.datanode.address  |  9866  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  Journal Node  |  dfs.journalnode.https-address  |  8481  |  TLS  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  Journal Node  |  dfs.journalnode.rpc-address  |  8485  |  SASL + Kerberos  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |
|  DFSZKFailoverController  |  dfs.ha.zkfc.port  |  8019  |  None  |  TLS for ZKFC is only supported in Hadoop 3.4.0. See [HADOOP-18919](https://issues.apache.org/jira/browse/HADOOP-18919) for more information. Amazon EMR release 7.1.0 is currently on Hadoop 3.3.6. Higher Amazon EMR releases will move to Hadoop 3.4.0 in the future.  |

## Hadoop MapReduce
<a name="emr-encryption-support-matrix-hadoop-mapreduce"></a>

Hadoop MapReduce, job history server, and MapReduce shuffle all support TLS by default when in-transit encryption is enabled in EMR clusters.

[ Hadoop MapReduce encrypted shuffle](https://hadoop.apache.org/docs/r2.7.1/hadoop-mapreduce-client/hadoop-mapreduce-client-core/EncryptedShuffle.html) uses TLS.

We recommend that you don't change the default ports for HTTPS endpoints.

For more information, see [Hadoop in secure mode](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html) in the Apache Hadoop documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  JobHistoryServer  |  mapreduce.jobhistory.webapp.https.address  |  19890  |  TLS  |  emr-7.3.0+  |
|  YARN NodeManager  |  mapreduce.shuffle.port<sup>1</sup>  |  13562  |  TLS  |  emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, emr-7.0.0+  |

<sup>1</sup> `mapreduce.shuffle.port` is hosted on YARN NodeManager but is only used by Hadoop MapReduce.

## Presto
<a name="emr-encryption-support-matrix-presto"></a>

In Amazon EMR releases 5.6.0 and higher, internal communication between the Presto coordinator and workers uses TLS. Amazon EMR sets up all the required configurations to enable [secure internal communication](https://prestodb.io/docs/current/security/internal-communication.html) in Presto.

If the connector uses the Hive metastore as the metadata store, communication between the coordinator and the Hive metastore is also encrypted with TLS.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Presto Coordinator  |  http-server.https.port  |  8446  |  TLS  |  emr-5.6.0+, emr-6.0.0+, emr-7.0.0+  |
|  Presto Worker  |  http-server.https.port  |  8446  |  TLS  |  emr-5.6.0+, emr-6.0.0+, emr-7.0.0+  |

## Trino
<a name="emr-encryption-support-matrix-trino"></a>

In Amazon EMR releases 6.1.0 and higher, internal communication between the Trino coordinator and workers uses TLS. Amazon EMR sets up all the required configurations to enable [secure internal communication](https://trino.io/docs/current/security/internal-communication.html) in Trino.

If the connector uses the Hive metastore as the metadata store, communication between the coordinator and the Hive metastore is also encrypted with TLS.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Trino Coordinator  |  http-server.https.port  |  8446  |  TLS  |  emr-6.1.0+, emr-7.0.0+  |
|  Trino Worker  |  http-server.https.port  |  8446  |  TLS  |  emr-6.1.0+, emr-7.0.0+  |

## Hive and Tez
<a name="emr-encryption-support-matrix-hive-tez"></a>

By default, Hive server 2, Hive metastore server, Hive LLAP Daemon web UI, and Hive LLAP shuffle all support TLS when in-transit encryption is enabled in the EMR clusters. For more information about the Hive configurations, see [Configuration properties](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties).

The Tez UI that's hosted on the Tomcat server is also HTTPS-enabled when in-transit encryption is enabled in the EMR cluster. However, HTTPS is disabled for the Tez AM web UI service because AM users don't have access to the keystore file needed to open the SSL listener. You can enable HTTPS for this service with the Boolean configurations `tez.am.tez-ui.webservice.enable.ssl` and `tez.am.tez-ui.webservice.enable.client.auth`.
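A configuration override that turns on HTTPS for the Tez AM web UI service might look like the following sketch. It assumes the properties are set through the `tez-site` classification. The value chosen for client authentication is illustrative only, and the AM user still needs read access to the keystore for the listener to start.

```json
[
  {
    "Classification": "tez-site",
    "Properties": {
      "tez.am.tez-ui.webservice.enable.ssl": "true",
      "tez.am.tez-ui.webservice.enable.client.auth": "false"
    }
  }
]
```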


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  HiveServer2  |  hive.server2.thrift.port  |  10000  |  TLS  |  emr-6.9.0+, emr-7.0.0+  |
|  HiveServer2  |  hive.server2.thrift.http.port  |  10001  |  TLS  |  emr-6.9.0+, emr-7.0.0+  |
|  HiveServer2  |  hive.server2.webui.port  |  10002  |  TLS  |  emr-7.3.0+  |
|  HiveMetastoreServer  |  hive.metastore.port  |  9083  |  TLS  |  emr-7.3.0+  |
|  LLAP Daemon  |  hive.llap.daemon.yarn.shuffle.port  |  15551  |  TLS  |  emr-7.3.0+  |
|  LLAP Daemon  |  hive.llap.daemon.web.port  |  15002  |  TLS  |  emr-7.3.0+  |
|  LLAP Daemon  |  hive.llap.daemon.output.service.port  |  15003  |  None  |  Hive doesn't support in-transit encryption for this endpoint  |
|  LLAP Daemon  |  hive.llap.management.rpc.port  |  15004  |  None  |  Hive doesn't support in-transit encryption for this endpoint  |
|  LLAP Daemon  |  hive.llap.plugin.rpc.port  |  Dynamic  |  None  |  Hive doesn't support in-transit encryption for this endpoint  |
|  LLAP Daemon  |  hive.llap.daemon.rpc.port  |  Dynamic  |  None  |  Hive doesn't support in-transit encryption for this endpoint  |
|  WebHCat  |  templeton.port  |  50111  |  TLS  |  emr-7.3.0+  |
|  Tez Application Master  |  tez.am.client.am.port-range, tez.am.task.am.port-range  |  Dynamic  |  None  |  Tez doesn't support in-transit encryption for this endpoint  |
|  Tez Application Master  |  tez.am.tez-ui.webservice.port-range  |  Dynamic  |  None  |  Disabled by default. Can be enabled using Tez configurations in emr-7.3.0+  |
|  Tez Task  |  N/A - not configurable  |  Dynamic  |  None  |  Tez doesn't support in-transit encryption for this endpoint  |
|  Tez UI  |  Configurable via Tomcat server on which Tez UI is hosted  |  8080  |  TLS  |  emr-7.3.0+  |

## Flink
<a name="emr-encryption-support-matrix-flink"></a>

Apache Flink REST endpoints and internal communication between Flink processes support TLS by default when you enable in-transit encryption in EMR clusters.

Amazon EMR sets [`security.ssl.internal.enabled`](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#security-ssl-internal-enabled) to `true` and uses in-transit encryption for internal communication between the Flink processes. If you don't want in-transit encryption for internal communication, disable that configuration. We recommend you use the default configuration for maximum security.

Amazon EMR sets [`security.ssl.rest.enabled`](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#security-ssl-rest-enabled) to `true` and uses in-transit encryption for the REST endpoints. Additionally, Amazon EMR sets [`historyserver.web.ssl.enabled`](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#historyserver-web-ssl-enabled) to `true` to use TLS communication with the Flink history server. If you don't want in-transit encryption for the REST endpoints, disable these configurations. We recommend you use the default configuration for maximum security.

Amazon EMR uses [`security.ssl.algorithms`](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/config/#security-ssl-algorithms) to specify the list of ciphers that use AES-based encryption. Override this configuration to use the ciphers you want.
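To illustrate a cipher override, the following sketch sets the Flink `security.ssl.algorithms` option through the `flink-conf` classification. The two cipher suites are examples only and are not the list that Amazon EMR configures by default. Choose suites that your JVM and your clients both support.

```json
[
  {
    "Classification": "flink-conf",
    "Properties": {
      "security.ssl.algorithms": "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384"
    }
  }
]
```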

For more information, see [SSL Setup](https://nightlies.apache.org/flink/flink-docs-release-1.18/docs/deployment/security/security-ssl/) in the Flink documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Flink History Server  |  historyserver.web.port  |  8082  |  TLS  |  emr-7.3.0+  |
|  Job Manager Rest Server  |  rest.bind-port, rest.port  |  Dynamic  |  TLS  |  emr-7.3.0+  |

## HBase
<a name="emr-encryption-support-matrix-hbase"></a>

 Amazon EMR sets [ Secure Hadoop RPC](https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SecureMode.html#Data_Encryption_on_RPC) to `privacy`. HMaster and RegionServer use SASL-based in-transit encryption. This requires that Kerberos authentication is enabled in the security configuration. 

Amazon EMR sets `hbase.ssl.enabled` to `true` and uses TLS for UI endpoints. If you don't want to use TLS for UI endpoints, disable this configuration. We recommend that you use the default configuration for maximum security.

Amazon EMR sets `hbase.rest.ssl.enabled` and `hbase.thrift.ssl.enabled` and uses TLS for the REST and Thrift server endpoints, respectively. If you don't want to use TLS for these endpoints, disable these configurations. We recommend that you use the default configuration for maximum security.

Starting with EMR 7.6.0, TLS is supported on HMaster and RegionServer endpoints. Amazon EMR also sets `hbase.server.netty.tls.enabled` and `hbase.client.netty.tls.enabled`. If you don’t want to use TLS for these endpoints, disable this configuration. We recommend that you use the default configuration, which provides encryption and thus higher security. To learn more, see [Transport Level Security (TLS) in HBase RPC communication](https://hbase.apache.org/book.html#_transport_level_security_tls_in_hbase_rpc_communication) in the *Apache HBase Reference Guide*. 
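As an illustration of how one of these settings could be overridden, the following sketch turns off TLS for only the REST and Thrift servers, for example when a legacy client can't negotiate TLS. It assumes the properties are set through the `hbase-site` classification. The UI and RPC settings are left at their defaults, which remains the recommended state for all of these properties.

```json
[
  {
    "Classification": "hbase-site",
    "Properties": {
      "hbase.rest.ssl.enabled": "false",
      "hbase.thrift.ssl.enabled": "false"
    }
  }
]
```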


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  HMaster  |  HMaster  |  16000  |  SASL + Kerberos, TLS  |  SASL + Kerberos in emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, and emr-7.0.0+; TLS in emr-7.6.0+  |
|  HMaster  |  HMaster UI  |  16010  |  TLS  |  emr-7.3.0+  |
|  RegionServer  |  RegionServer  |  16020  |  SASL + Kerberos, TLS  |  SASL + Kerberos in emr-4.8.0+, emr-5.0.0+, emr-6.0.0+, and emr-7.0.0+; TLS in emr-7.6.0+  |
|  RegionServer  |  RegionServer Info  |  16030  |  TLS  |  emr-7.3.0+  |
|  HBase Rest Server  |  Rest Server  |  8070  |  TLS  |  emr-7.3.0+  |
|  HBase Rest Server  |  Rest UI  |  8085  |  TLS  |  emr-7.3.0+  |
|  HBase Thrift Server  |  Thrift Server  |  9090  |  TLS  |  emr-7.3.0+  |
|  HBase Thrift Server  |  Thrift Server UI  |  9095  |  TLS  |  emr-7.3.0+  |

## Phoenix
<a name="emr-encryption-support-matrix-phoenix"></a>

If you enabled in-transit encryption in your EMR cluster, Phoenix Query Server supports the TLS property `phoenix.queryserver.tls.enabled`, which is set to `true` by default.

To learn more, see [ Configurations relating to HTTPS](https://phoenix.apache.org/server.html#Configuration) in the Phoenix Query Server documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Query Server  |  phoenix.queryserver.http.port  |  8765  |  TLS  |  emr-7.3.0+  |

## Oozie
<a name="emr-encryption-support-matrix-oozie"></a>

[OOZIE-3673](https://issues.apache.org/jira/browse/OOZIE-3673) is available on Amazon EMR if you run Oozie on Amazon EMR 7.3.0 and higher. If you need to configure custom SSL or TLS protocols when you run an email action, you can set the property `oozie.email.smtp.ssl.protocols` in the `oozie-site.xml` file. By default, if you enabled in-transit encryption, Amazon EMR uses the TLS v1.3 protocol.
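For example, if your SMTP server doesn't accept TLS v1.3, an override for the email action might look like the following sketch. It assumes the property is set through the `oozie-site` classification. The single value `TLSv1.2` is illustrative, and the separator for multiple protocols isn't specified here, so check the Oozie documentation before you list more than one.

```json
[
  {
    "Classification": "oozie-site",
    "Properties": {
      "oozie.email.smtp.ssl.protocols": "TLSv1.2"
    }
  }
]
```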

[OOZIE-3677](https://issues.apache.org/jira/browse/OOZIE-3677) and [OOZIE-3674](https://issues.apache.org/jira/browse/OOZIE-3674) are also available on Amazon EMR if you run Oozie on Amazon EMR 7.3.0 and higher. This lets you specify the properties `keyStoreType` and `trustStoreType` in `oozie-site.xml`. OOZIE-3674 adds the parameter `--insecure` to the Oozie client so it can ignore certificate errors.

Oozie enforces TLS hostname verification, which means that any certificate you use for in-transit encryption must meet hostname verification requirements. If the certificate doesn't meet the criteria, the cluster might get stuck at the `oozie share lib update` stage when Amazon EMR provisions the cluster. We recommend that you update your certificates to make sure they're compliant with hostname verification. However, if you can't update the certificates, you can disable SSL for Oozie by setting the `oozie.https.enabled` property to `false` in cluster configuration. 
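If you can't update the certificates, the fallback described above could be expressed as the following sketch, which assumes the `oozie-site` classification. With this setting, traffic to the Oozie server is no longer encrypted, so treat it as a last resort.

```json
[
  {
    "Classification": "oozie-site",
    "Properties": {
      "oozie.https.enabled": "false"
    }
  }
]
```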


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  EmbeddedOozieServer  |  oozie.https.port  |  11443  |  TLS  |  emr-7.3.0+  |
|  EmbeddedOozieServer  |  oozie.email.smtp.port  |  25  |  TLS  |  emr-7.3.0+  |

## Hue
<a name="emr-encryption-support-matrix-hue"></a>

By default, Hue supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information about Hue configurations, see [Configure Hue with HTTPS / SSL](https://gethue.com/configure-hue-with-https-ssl/). 


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Hue  |  http_port  |  8888  |  TLS  |  emr-7.4.0+  |

## Livy
<a name="emr-encryption-support-matrix-livy"></a>

By default, Livy supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information about Livy configurations, see [Enabling HTTPS with Apache Livy](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/enabling-https.html).

Starting with Amazon EMR 7.3.0, if you use in-transit encryption and Kerberos authentication, you can't use the Livy server for Spark applications that depend on the Hive metastore. This issue is fixed in [HIVE-16340](https://issues.apache.org/jira/browse/HIVE-16340) and is fully resolved in [SPARK-44114](https://issues.apache.org/jira/browse/SPARK-44114) when the open-source Spark application can upgrade to Hive 3. In the meantime, you can work around this issue if you set `hive.metastore.use.SSL` to `false`. For more information, see [Configure applications](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html).



| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  livy-server  |  livy.server.port  |  8998  |  TLS  |  emr-7.4.0+  |

## JupyterEnterpriseGateway
<a name="emr-encryption-matrix-jupyter-enterprise"></a>

By default, Jupyter Enterprise Gateway supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information about the Jupyter Enterprise Gateway configurations, see [Securing Enterprise Gateway Server](https://jupyter-enterprise-gateway.readthedocs.io/en/v1.2.0/getting-started-security.html#securing-enterprise-gateway-server).


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  jupyter_enterprise_gateway  |  c.EnterpriseGatewayApp.port  |  9547  |  TLS  |  emr-7.4.0+  |

## JupyterHub
<a name="emr-encryption-matrix-jupyter-hub"></a>

By default, JupyterHub supports TLS when in-transit encryption is enabled in Amazon EMR clusters. For more information, see [Enabling SSL encryption](https://jupyterhub.readthedocs.io/en/latest/tutorial/getting-started/security-basics.html#enabling-ssl-encryption) in the JupyterHub documentation. We recommend that you don't disable encryption.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  jupyter_hub  |  c.JupyterHub.port  |  9443  |  TLS  |  emr-5.14.0+, emr-6.0.0+, emr-7.0.0+  |

## Zeppelin
<a name="emr-encryption-matrix-zeppelin"></a>

 By default, Zeppelin supports TLS when you enable in-transit encryption in your EMR cluster. For more information about the Zeppelin configurations, see [ SSL Configuration](https://zeppelin.apache.org/docs/0.11.1/setup/operation/configuration.html#ssl-configuration) in the Zeppelin documentation. 


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  zeppelin  |  zeppelin.server.ssl.port  |  8890  |  TLS  |  emr-7.3.0+  |

## Zookeeper
<a name="emr-encryption-matrix-zookeeper"></a>

Amazon EMR sets `serverCnxnFactory` to `org.apache.zookeeper.server.NettyServerCnxnFactory` to enable TLS for the Zookeeper quorum and client communication.

`secureClientPort` specifies the port that listens for TLS connections. If a client doesn't support TLS connections to Zookeeper, it can connect to the insecure port 2181 specified in `clientPort`. You can override or disable these two ports.

Amazon EMR sets both `sslQuorum` and `admin.forceHttps` to `true` to enable TLS communication for the quorum and admin server. If you don't want in-transit encryption for the quorum and the admin server, you can disable those configurations. We recommend that you use the default configurations for maximum security.

For more information, see [Encryption, Authentication, Authorization Options](https://zookeeper.apache.org/doc/r3.9.2/zookeeperAdmin.html#sc_authOptions) in the Zookeeper documentation.


| Component | Endpoint | Port | In-Transit Encryption Mechanism | Supported from Release | 
| --- | --- | --- | --- | --- | 
|  Zookeeper Server  |  secureClientPort  |  2281  |  TLS  |  emr-7.4.0+  |
|  Zookeeper Server  |  Quorum Ports  |  2888 (followers connect to the leader), 3888 (leader election)  |  TLS  |  emr-7.4.0+  |
|  Zookeeper Server  |  admin.serverPort  |  8341  |  TLS  |  emr-7.4.0+  |