

# Authenticate to Amazon EMR cluster nodes
<a name="emr-authenticate-cluster-connections"></a>

SSH clients can use an Amazon EC2 key pair to authenticate to cluster instances. Alternatively, with Amazon EMR releases 5.10.0 and higher, you can configure Kerberos to authenticate users and SSH connections to the primary node. With Amazon EMR releases 6.12.0 and higher, you can also authenticate with LDAP.

**Topics**
+ [Use an EC2 key pair for SSH credentials for Amazon EMR](emr-plan-access-ssh.md)
+ [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md)
+ [Use Active Directory or LDAP servers for authentication with Amazon EMR](ldap.md)

# Use an EC2 key pair for SSH credentials for Amazon EMR
<a name="emr-plan-access-ssh"></a>

Amazon EMR cluster nodes run on Amazon EC2 instances. You can connect to cluster nodes in the same way that you can connect to Amazon EC2 instances. You can use Amazon EC2 to create a key pair, or you can import a key pair. When you create a cluster, you can specify the Amazon EC2 key pair that will be used for SSH connections to all cluster instances. You can also create a cluster without a key pair. This is usually done with transient clusters that start, run steps, and then terminate automatically.

The SSH client that you use to connect to the cluster must use the private key file associated with this key pair. For SSH clients on Linux, Unix, and macOS, this is a .pem file, and you must set permissions so that only the key owner can access the file. For SSH clients on Windows, this is a .ppk file, which is usually created from the .pem file.
+ For more information about creating an Amazon EC2 key pair, see [Amazon EC2 key pairs](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html) in the *Amazon EC2 User Guide*.
+ For instructions about using PuTTYgen to create a .ppk file from a .pem file, see [Converting your private key using PuTTYgen](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/putty.html#putty-private-key) in the *Amazon EC2 User Guide*.
+ For more information about setting .pem file permissions and connecting to an EMR cluster's primary node using different methods, including `ssh` from Linux or macOS, PuTTY from Windows, or the AWS CLI from any supported operating system, see [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md).
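
As a minimal sketch of the Linux or macOS case described above, the following shell functions restrict a private key file to its owner and then open an SSH session as the default `hadoop` user. The function names, the key path, and the host name are hypothetical placeholders, not values from this guide.

```shell
# Restrict a private key file so that only its owner can read it.
# $1: path to the .pem file (hypothetical example: ~/MyEC2Key.pem)
restrict_key() {
  chmod 400 "$1"
}

# Open an SSH session to the primary node as the default hadoop user.
# $1: path to the .pem file
# $2: public DNS name of the primary node (placeholder, supply your own)
connect_primary() {
  ssh -i "$1" "hadoop@$2"
}
```

For example, you might run `restrict_key ~/MyEC2Key.pem` once, and then `connect_primary ~/MyEC2Key.pem` followed by the public DNS name of your primary node.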

# Use Kerberos for authentication with Amazon EMR
<a name="emr-kerberos"></a>

Amazon EMR releases 5.10.0 and higher support Kerberos. Kerberos is a network authentication protocol that uses secret-key cryptography to provide strong authentication so that passwords or other credentials aren't sent over the network in an unencrypted format.

In Kerberos, services and users that need to authenticate are known as *principals*. Principals exist within a Kerberos *realm*. Within the realm, a Kerberos server known as the *key distribution center (KDC)* provides the means for principals to authenticate. The KDC does this by issuing *tickets* for authentication. The KDC maintains a database of the principals within its realm, their passwords, and other administrative information about each principal. A KDC can also accept authentication credentials from principals in other realms, which is known as a *cross-realm trust*. In addition, an EMR cluster can use an external KDC to authenticate principals.
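
To make the terminology concrete: a principal name conventionally has the form `primary/instance@REALM`, where the instance part is optional. The sketch below splits a principal into those parts using POSIX parameter expansion. The function name and the example principal are illustrative assumptions, not output taken from an EMR cluster.

```shell
# Split a Kerberos principal of the form primary[/instance]@REALM.
# Prints three lines: primary, instance (empty if absent), realm.
split_principal() {
  principal="$1"
  realm="${principal##*@}"          # text after the last "@"
  name="${principal%@*}"            # text before the last "@"
  primary="${name%%/*}"             # text before the first "/"
  case "$name" in
    */*) instance="${name#*/}" ;;   # text after the first "/"
    *)   instance="" ;;
  esac
  printf '%s\n%s\n%s\n' "$primary" "$instance" "$realm"
}

# Hypothetical service principal on a cluster node.
split_principal "hive/ip-10-0-0-12.ec2.internal@EC2.INTERNAL"
```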

A common scenario for establishing a cross-realm trust or using an external KDC is to authenticate users from an Active Directory domain. This allows users to access an EMR cluster with their domain account when they use SSH to connect to a cluster or work with big data applications.

When you use Kerberos authentication, Amazon EMR configures Kerberos for the applications, components, and subsystems that it installs on the cluster so that they authenticate with each other.

**Important**  
Amazon EMR does not support AWS Directory Service for Microsoft Active Directory in a cross-realm trust or as an external KDC.

Before you configure Kerberos using Amazon EMR, we recommend that you become familiar with Kerberos concepts, the services that run on a KDC, and the tools for administering Kerberos services. For more information, see [MIT Kerberos documentation](http://web.mit.edu/kerberos/krb5-latest/doc/), which is published by the [Kerberos consortium](http://kerberos.org/).

**Topics**
+ [Supported applications with Amazon EMR](emr-kerberos-principals.md)
+ [Kerberos architecture options with Amazon EMR](emr-kerberos-options.md)
+ [Configuring Kerberos on Amazon EMR](emr-kerberos-configure.md)
+ [Using SSH to connect to Kerberized clusters with Amazon EMR](emr-kerberos-connect-ssh.md)
+ [Tutorial: Configure a cluster-dedicated KDC with Amazon EMR](emr-kerberos-cluster-kdc.md)
+ [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md)

# Supported applications with Amazon EMR
<a name="emr-kerberos-principals"></a>

Within an EMR cluster, Kerberos principals are the big data application services and subsystems that run on all cluster nodes. Amazon EMR can configure the applications and components listed below to use Kerberos. Each application has a Kerberos user principal associated with it.

Amazon EMR does not support cross-realm trusts with AWS Directory Service for Microsoft Active Directory.

Amazon EMR only configures the open-source Kerberos authentication features for the applications and components listed below. Any other applications installed are not Kerberized, which can result in an inability to communicate with Kerberized components and cause application errors. Applications and components that are not Kerberized do not have authentication enabled. Supported applications and components may vary for different Amazon EMR releases.

The Livy user interface is the only web user interface hosted on the cluster that is Kerberized.

+ **Hadoop MapReduce**
+ **HBase**
+ **HCatalog**
+ **HDFS**
+ **Hive**
  + Do not enable Hive with LDAP authentication. This may cause issues communicating with Kerberized YARN.
+ **Hue**
  + Hue user authentication isn't set automatically and can be configured using the configuration API.
  + Hue server is Kerberized. The Hue front-end (UI) is not configured for authentication. LDAP authentication can be configured for the Hue UI. 
+ **Livy**
  + Livy impersonation with Kerberized clusters is supported in Amazon EMR releases 5.22.0 and higher.
+ **Oozie**
+ **Phoenix**
+ **Presto**
  + Presto supports Kerberos authentication in Amazon EMR releases 6.9.0 and higher.
  + To use Kerberos authentication for Presto, you must enable [in-transit encryption](emr-data-encryption-options.md#emr-encryption-intransit).
+ **Spark**
+ **Tez**
+ **Trino**
  + Trino supports Kerberos authentication in Amazon EMR releases 6.11.0 and higher.
  + To use Kerberos authentication for Trino, you must enable [in-transit encryption](emr-data-encryption-options.md#emr-encryption-intransit).
+ **YARN**
+ **Zeppelin**
  + Zeppelin is only configured to use Kerberos with the Spark interpreter. It is not configured for other interpreters.
  + User impersonation is not supported for Kerberized Zeppelin interpreters other than Spark.
+ **ZooKeeper**
  + ZooKeeper client is not supported.

# Kerberos architecture options with Amazon EMR
<a name="emr-kerberos-options"></a>

When you use Kerberos with Amazon EMR, you can choose from the architectures listed in this section. Regardless of the architecture that you choose, you configure Kerberos using the same steps. You create a security configuration, you specify the security configuration and compatible cluster-specific Kerberos options when you create the cluster, and you create HDFS directories for Linux users on the cluster that match user principals in the KDC. For an explanation of configuration options and example configurations for each architecture, see [Configuring Kerberos on Amazon EMR](emr-kerberos-configure.md).

## Cluster-dedicated KDC (KDC on primary node)
<a name="emr-kerberos-localkdc-summary"></a>

This configuration is available with Amazon EMR releases 5.10.0 and higher.

![\[Amazon EMR cluster architecture with master node, core nodes, and task node within a Kerberos realm.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-cluster-dedicated-kdc.png)


**Advantages**
+ Amazon EMR has full ownership of the KDC.
+ The KDC on the EMR cluster is independent from centralized KDC implementations such as Microsoft Active Directory or AWS Managed Microsoft AD.
+ Performance impact is minimal because the KDC manages authentication only for local nodes within the cluster.
+ Optionally, other Kerberized clusters can reference the KDC as an external KDC. For more information, see [External KDC—primary node on a different cluster](#emr-kerberos-extkdc-cluster-summary).

**Considerations and limitations**
+ Kerberized clusters cannot authenticate to one another, so applications cannot interoperate. If cluster applications need to interoperate, you must establish a cross-realm trust between clusters, or set up one cluster as the external KDC for other clusters. If a cross-realm trust is established, the KDCs must have different Kerberos realms.
+ You must create Linux users on the EC2 instance of the primary node that correspond to KDC user principals, along with the HDFS directories for each user.
+ User principals must use an EC2 private key file and `kinit` credentials to connect to the cluster using SSH.

## Cross-realm trust
<a name="emr-kerberos-crossrealm-summary"></a>

In this configuration, principals (usually users) from a different Kerberos realm authenticate to application components on a Kerberized EMR cluster, which has its own KDC. The KDC on the primary node establishes a trust relationship with another KDC using a *cross-realm principal* that exists in both KDCs. The principal name and the password match precisely in each KDC. Cross-realm trusts are most common with Active Directory implementations, as shown in the following diagram. Cross-realm trusts with an external MIT KDC or a KDC on another Amazon EMR cluster are also supported.

![\[Amazon EMR clusters in different Kerberos realms with cross-realm trust to Active Directory.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-cross-realm-trust.png)


**Advantages**
+ The EMR cluster on which the KDC is installed maintains full ownership of the KDC.
+ With Active Directory, Amazon EMR automatically creates Linux users that correspond to user principals from the KDC. You still must create HDFS directories for each user. In addition, user principals in the Active Directory domain can access Kerberized clusters using `kinit` credentials, without the EC2 private key file. This eliminates the need to share the private key file among cluster users.
+ Because each cluster KDC manages authentication for the nodes in the cluster, the effects of network latency and processing overhead for a large number of nodes across clusters are minimized.

**Considerations and limitations**
+ If you are establishing a trust with an Active Directory realm, you must provide an Active Directory user name and password with permissions to join principals to the domain when you create the cluster.
+ Cross-realm trusts cannot be established between Kerberos realms with the same name.
+ Cross-realm trusts must be established explicitly. For example, if Cluster A and Cluster B both establish a cross-realm trust with a KDC, they do not inherently trust one another and their applications cannot authenticate to one another to interoperate.
+ KDCs must be maintained independently and coordinated so that credentials of user principals match precisely.

## External KDC
<a name="emr-kerberos-extkdc-summary"></a>

Configurations with an external KDC are supported with Amazon EMR releases 5.20.0 and higher.
+ [External KDC—MIT KDC](#emr-kerberos-extkdc-mit-summary)
+ [External KDC—primary node on a different cluster](#emr-kerberos-extkdc-cluster-summary)
+ [External KDC—cluster KDC on a different cluster with Active Directory cross-realm trust](#emr-kerberos-extkdc-ad-trust-summary)

### External KDC—MIT KDC
<a name="emr-kerberos-extkdc-mit-summary"></a>

This configuration allows one or more EMR clusters to use principals defined and maintained in an MIT KDC server.

![\[Amazon EMR cluster architecture with Kerberos realm, showing master, core, and task nodes.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-external-kdc.png)


**Advantages**
+ Managing principals is consolidated in a single KDC.
+ Multiple clusters can use the same KDC in the same Kerberos realm. For more information, see [Requirements for using multiple clusters with the same KDC](#emr-kerberos-multi-kdc).
+ The primary node on a Kerberized cluster does not have the performance burden associated with maintaining the KDC.

**Considerations and limitations**
+ You must create Linux users on the EC2 instance of each Kerberized cluster's primary node that correspond to KDC user principals, along with the HDFS directories for each user.
+ User principals must use an EC2 private key file and `kinit` credentials to connect to Kerberized clusters using SSH.
+ Each node in Kerberized EMR clusters must have a network route to the KDC.
+ Each node in Kerberized clusters places an authentication burden on the external KDC, so the configuration of the KDC affects cluster performance. When you configure the hardware of the KDC server, consider the maximum number of Amazon EMR nodes to be supported simultaneously.
+ Cluster performance is dependent on the network latency between nodes in Kerberized clusters and the KDC.
+ Troubleshooting can be more difficult because of interdependencies.

### External KDC—primary node on a different cluster
<a name="emr-kerberos-extkdc-cluster-summary"></a>

This configuration is nearly identical to the external MIT KDC implementation above, except that the KDC is on the primary node of an EMR cluster. For more information, see [Cluster-dedicated KDC (KDC on primary node)](#emr-kerberos-localkdc-summary) and [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md).

![\[Diagram of Amazon EMR clusters with Kerberos realm, showing master and core nodes.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-external-cluster-kdc.png)


**Advantages**
+ Managing principals is consolidated in a single KDC.
+ Multiple clusters can use the same KDC in the same Kerberos realm. For more information, see [Requirements for using multiple clusters with the same KDC](#emr-kerberos-multi-kdc).

**Considerations and limitations**
+ You must create Linux users on the EC2 instance of each Kerberized cluster's primary node that correspond to KDC user principals, along with the HDFS directories for each user.
+ User principals must use an EC2 private key file and `kinit` credentials to connect to Kerberized clusters using SSH.
+ Each node in each EMR cluster must have a network route to the KDC.
+ Each Amazon EMR node in Kerberized clusters places an authentication burden on the external KDC, so the configuration of the KDC affects cluster performance. When you configure the hardware of the KDC server, consider the maximum number of Amazon EMR nodes to be supported simultaneously.
+ Cluster performance is dependent on the network latency between nodes in the clusters and the KDC.
+ Troubleshooting can be more difficult because of interdependencies.

### External KDC—cluster KDC on a different cluster with Active Directory cross-realm trust
<a name="emr-kerberos-extkdc-ad-trust-summary"></a>

In this configuration, you first create a cluster with a cluster-dedicated KDC that has a one-way cross-realm trust with Active Directory. For a detailed tutorial, see [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md). You then launch additional clusters, referencing the cluster KDC that has the trust as an external KDC. For an example, see [External cluster KDC with Active Directory cross-realm trust](emr-kerberos-config-examples.md#emr-kerberos-example-extkdc-ad-trust). This allows each Amazon EMR cluster that uses the external KDC to authenticate principals defined and maintained in a Microsoft Active Directory domain.

![\[Amazon EMR clusters with Kerberos authentication and Active Directory integration diagram.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-external-ad-trust-kdc.png)


**Advantages**
+ Managing principals is consolidated in the Active Directory domain.
+ Amazon EMR joins the Active Directory realm, which eliminates the need to create Linux users that correspond to Active Directory users. You still must create HDFS directories for each user.
+ Multiple clusters can use the same KDC in the same Kerberos realm. For more information, see [Requirements for using multiple clusters with the same KDC](#emr-kerberos-multi-kdc).
+ User principals in the Active Directory domain can access Kerberized clusters using `kinit` credentials, without the EC2 private key file. This eliminates the need to share the private key file among cluster users.
+ Only one Amazon EMR primary node has the burden of maintaining the KDC, and only that cluster must be created with Active Directory credentials for the cross-realm trust between the KDC and Active Directory.

**Considerations and limitations**
+ Each node in each EMR cluster must have a network route to the KDC and the Active Directory domain controller.
+ Each Amazon EMR node places an authentication burden on the external KDC, so the configuration of the KDC affects cluster performance. When you configure the hardware of the KDC server, consider the maximum number of Amazon EMR nodes to be supported simultaneously.
+ Cluster performance is dependent on the network latency between nodes in the clusters and the KDC server.
+ Troubleshooting can be more difficult because of interdependencies.

## Requirements for using multiple clusters with the same KDC
<a name="emr-kerberos-multi-kdc"></a>

Multiple clusters can use the same KDC in the same Kerberos realm. However, if the clusters run concurrently, they might fail if they use conflicting Kerberos service principal names.

If you have multiple concurrent clusters with the same external KDC, then ensure that the clusters use different Kerberos realms. If the clusters must use the same Kerberos realm, then ensure that the clusters are in different subnets, and that their CIDR ranges don’t overlap. 
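
One way to check the non-overlap requirement before you launch clusters is to compare the subnet CIDR blocks directly. The following is a rough sketch for IPv4 only. The function names are hypothetical, and the check relies on the fact that two CIDR blocks overlap exactly when they agree on the bits of the shorter prefix.

```shell
# Convert a dotted-quad IPv4 address to an integer.
# The function body runs in a subshell so the IFS change does not leak.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed (exit status 0) if two IPv4 CIDR blocks overlap.
# $1, $2: CIDR blocks such as 10.0.0.0/24
cidrs_overlap() {
  a_ip="${1%/*}"; a_len="${1#*/}"
  b_ip="${2%/*}"; b_len="${2#*/}"
  # Compare both addresses under the mask of the shorter prefix.
  len=$(( a_len < b_len ? a_len : b_len ))
  mask=$(( (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$a_ip") & mask )) -eq $(( $(ip_to_int "$b_ip") & mask )) ]
}

# Example subnets (illustrative values only).
if cidrs_overlap 10.0.0.0/24 10.0.1.0/24; then
  echo "CIDR ranges overlap"
else
  echo "CIDR ranges do not overlap"
fi
```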

# Configuring Kerberos on Amazon EMR
<a name="emr-kerberos-configure"></a>

This section provides configuration details and examples for setting up Kerberos with common architectures. Regardless of the architecture you choose, the configuration basics are the same and done in three steps. If you use an external KDC or set up a cross-realm trust, you must ensure that every node in a cluster has a network route to the external KDC, including the configuration of applicable security groups to allow inbound and outbound Kerberos traffic.
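
As an illustration of the security group requirement, the sketch below adds inbound rules for Kerberos traffic to the security group of a KDC host. It assumes the conventional default ports (88 for the KDC over TCP and UDP, 749 for the admin server), which match the ports used in the configuration examples in this guide. Your KDC might use different ports. The security group ID, the CIDR range, and the function names are hypothetical. By default the script only prints the commands. Set `DRY_RUN=0` to run them.

```shell
# Print commands instead of running them unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# Allow inbound Kerberos traffic to the KDC's security group.
# $1: security group ID of the KDC host (placeholder)
# $2: CIDR range of the cluster nodes (placeholder)
allow_kerberos_ingress() {
  run aws ec2 authorize-security-group-ingress --group-id "$1" --protocol tcp --port 88 --cidr "$2"
  run aws ec2 authorize-security-group-ingress --group-id "$1" --protocol udp --port 88 --cidr "$2"
  run aws ec2 authorize-security-group-ingress --group-id "$1" --protocol tcp --port 749 --cidr "$2"
}

allow_kerberos_ingress sg-0123456789abcdef0 10.0.0.0/16
```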

## Step 1: Create a security configuration with Kerberos properties
<a name="emr-kerberos-step1-summary"></a>

The security configuration specifies details about the Kerberos KDC, and allows the Kerberos configuration to be re-used each time you create a cluster. You can create a security configuration using the Amazon EMR console, the AWS CLI, or the EMR API. The security configuration can also contain other security options, such as encryption. For more information about creating security configurations and specifying a security configuration when you create a cluster, see [Use security configurations to set up Amazon EMR cluster security](emr-security-configurations.md). For information about Kerberos properties in a security configuration, see [Kerberos settings for security configurations](emr-kerberos-configure-settings.md#emr-kerberos-security-configuration).
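
Because the security configuration is a JSON document, one option is to keep it in a file and pass it to the AWS CLI with the `file://` prefix instead of quoting it inline. The following sketch writes the cluster-dedicated KDC settings to a file and wraps the create call in a function. The file name and the function name are assumptions chosen for illustration.

```shell
# Write the Kerberos settings to a JSON file (hypothetical file name).
cat > kerberos-security-config.json <<'EOF'
{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": {
        "TicketLifetimeInHours": 24
      }
    }
  }
}
EOF

# Create a security configuration from the file.
# $1: name for the new security configuration
create_kerberos_security_config() {
  aws emr create-security-configuration --name "$1" \
    --security-configuration file://kerberos-security-config.json
}
```

Calling `create_kerberos_security_config LocalKDCSecurityConfig` would then create a reusable configuration that you can reference each time you create a cluster.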

## Step 2: Create a cluster and specify cluster-specific Kerberos attributes
<a name="emr-kerberos-step2-summary"></a>

When you create a cluster, you specify a Kerberos security configuration along with cluster-specific Kerberos options. When you use the Amazon EMR console, only the Kerberos options compatible with the specified security configuration are available. When you use the AWS CLI or Amazon EMR API, ensure that you specify Kerberos options compatible with the specified security configuration. For example, if you specify a principal password for a cross-realm trust when you create a cluster using the CLI, and the specified security configuration is not configured with cross-realm trust parameters, an error occurs. For more information, see [Kerberos settings for clusters](emr-kerberos-configure-settings.md#emr-kerberos-cluster-configuration).

## Step 3: Configure the cluster primary node
<a name="emr-kerberos-step3-summary"></a>

Depending on the requirements of your architecture and implementation, additional setup on the cluster might be required. You can do this after you create the cluster, or by using steps or bootstrap actions during the creation process.

For each Kerberos-authenticated user that connects to the cluster using SSH, you must ensure that Linux accounts are created that correspond to the Kerberos user. If user principals are provided by an Active Directory domain controller, either as the external KDC or through a cross-realm trust, Amazon EMR creates Linux accounts automatically. If Active Directory is not used, you must create principals for each user that correspond to their Linux user. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).

Each user must also have an HDFS user directory that they own, which you must create. In addition, SSH must be configured with GSSAPI enabled to allow connections from Kerberos-authenticated users. GSSAPI must be enabled on the primary node, and the client SSH application must be configured to use GSSAPI. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).
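
The GSSAPI requirement on the primary node can be sketched as an edit to the SSH daemon configuration. The function below is a hypothetical helper, not part of Amazon EMR. It rewrites any existing `GSSAPIAuthentication` and `GSSAPICleanupCredentials` lines (including commented-out ones) in the file that you pass to it, and it does not add the lines if they are missing.

```shell
# Enable GSSAPI authentication in an sshd configuration file.
# $1: path to the file (on a primary node, typically /etc/ssh/sshd_config)
enable_gssapi() {
  conf="$1"
  tmp="$(mktemp)"
  # Rewrite existing (possibly commented-out) lines for the two options,
  # then overwrite the original in place to keep its ownership and mode.
  sed -e 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' \
      -e 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' \
      "$conf" > "$tmp" && cat "$tmp" > "$conf"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

On a real cluster you would run this with sufficient privileges to write the file and then restart the SSH service for the change to take effect. On the client side, OpenSSH accepts `-o GSSAPIAuthentication=yes` on the `ssh` command line after the user has obtained a ticket with `kinit`.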

# Security configuration and cluster settings for Kerberos on Amazon EMR
<a name="emr-kerberos-configure-settings"></a>

When you create a Kerberized cluster, you specify the security configuration together with Kerberos attributes that are specific to the cluster. You can't specify one set without the other, or an error occurs.

This topic provides an overview of the configuration parameters available for Kerberos when you create a security configuration and a cluster. In addition, CLI examples for creating compatible security configurations and clusters are provided for common architectures.

## Kerberos settings for security configurations
<a name="emr-kerberos-security-configuration"></a>

You can create a security configuration that specifies Kerberos attributes using the Amazon EMR console, the AWS CLI, or the EMR API. The security configuration can also contain other security options, such as encryption. For more information, see [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md).

Use the following references to understand the available security configuration settings for the Kerberos architecture that you choose. Amazon EMR console settings are shown. For corresponding CLI options, see [Specifying Kerberos settings using the AWS CLI](emr-create-security-configuration.md#emr-kerberos-cli-parameters) or [Configuration examples](emr-kerberos-config-examples.md).

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-kerberos-configure-settings.html)

## Kerberos settings for clusters
<a name="emr-kerberos-cluster-configuration"></a>

You can specify Kerberos settings when you create a cluster using the Amazon EMR console, the AWS CLI, or the EMR API.

Use the following references to understand the available cluster configuration settings for the Kerberos architecture that you choose. Amazon EMR console settings are shown. For corresponding CLI options, see [Configuration examples](emr-kerberos-config-examples.md).


| Parameter | Description | 
| --- | --- | 
|  Realm  |  The Kerberos realm name for the cluster. The Kerberos convention is to set this to be the same as the domain name, but in uppercase. For example, for the domain `ec2.internal`, use `EC2.INTERNAL` as the realm name.  | 
|  KDC admin password  |  The password used within the cluster for `kadmin` or `kadmin.local`. These are command-line interfaces to the Kerberos V5 administration system, which maintains Kerberos principals, password policies, and keytabs for the cluster.   | 
|  Cross-realm trust principal password (optional)  |  Required when establishing a cross-realm trust. The cross-realm principal password, which must be identical across realms. Use a strong password.  | 
|  Active Directory domain join user (optional)  |  Required when using Active Directory in a cross-realm trust. This is the user logon name of an Active Directory account with permission to join computers to the domain. Amazon EMR uses this identity to join the cluster to the domain. For more information, see [Step 3: Add accounts to the domain for the EMR Cluster](emr-kerberos-cross-realm.md#emr-kerberos-ad-users).  | 
|  Active Directory domain join password (optional)  |  The password for the Active Directory domain join user. For more information, see [Step 3: Add accounts to the domain for the EMR Cluster](emr-kerberos-cross-realm.md#emr-kerberos-ad-users).  | 
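
The realm naming convention in the table can be expressed as a one-line helper. This is only a convenience sketch, and the function name is hypothetical.

```shell
# Derive a conventional realm name by uppercasing a DNS domain name.
realm_from_domain() {
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

realm_from_domain ec2.internal   # prints EC2.INTERNAL
```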

# Configuration examples
<a name="emr-kerberos-config-examples"></a>

The following examples demonstrate security configurations and cluster configurations for common scenarios. AWS CLI commands are shown for brevity.

## Local KDC
<a name="emr-kerberos-example-local-kdc"></a>

The following commands create a cluster with a cluster-dedicated KDC running on the primary node. Additional configuration on the cluster is required. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name LocalKDCSecurityConfig \
--security-configuration '{"AuthenticationConfiguration":
{"KerberosConfiguration": {"Provider": "ClusterDedicatedKdc",
"ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24 }}}}'
```

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge \
--applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole \
--security-configuration LocalKDCSecurityConfig \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=MyPassword
```

## Cluster-dedicated KDC with Active Directory cross-realm trust
<a name="emr-kerberos-example-crossrealm"></a>

The following commands create a cluster with a cluster-dedicated KDC running on the primary node with a cross-realm trust to an Active Directory domain. Additional configuration on the cluster and in Active Directory is required. For more information, see [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name LocalKDCWithADTrustSecurityConfig \
--security-configuration '{"AuthenticationConfiguration":
{"KerberosConfiguration": {"Provider": "ClusterDedicatedKdc",
"ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24,
"CrossRealmTrustConfiguration": {"Realm":"AD.DOMAIN.COM",
"Domain":"ad.domain.com", "AdminServer":"ad.domain.com",
"KdcServer":"ad.domain.com"}}}}}'
```

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge --applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole --security-configuration LocalKDCWithADTrustSecurityConfig \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=MyClusterKDCAdminPassword,\
ADDomainJoinUser=ADUserLogonName,ADDomainJoinPassword=ADUserPassword,\
CrossRealmTrustPrincipalPassword=MatchADTrustPassword
```

## External KDC on a different cluster
<a name="emr-kerberos-example-extkdc-cluster"></a>

The following commands create a cluster that references a cluster-dedicated KDC on the primary node of a different cluster to authenticate principals. Additional configuration on the cluster is required. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name ExtKDCOnDifferentCluster \
--security-configuration '{"AuthenticationConfiguration":
{"KerberosConfiguration": {"Provider": "ExternalKdc",
"ExternalKdcConfiguration": {"KdcServerType": "Single",
"AdminServer": "MasterDNSOfKDCMaster:749",
"KdcServer": "MasterDNSOfKDCMaster:88"}}}}'
```

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge \
--applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole --security-configuration ExtKDCOnDifferentCluster \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=KDCOnMasterPassword
```

## External cluster KDC with Active Directory cross-realm trust
<a name="emr-kerberos-example-extkdc-ad-trust"></a>

The following commands create a cluster with no KDC. The cluster references a cluster-dedicated KDC running on the primary node of another cluster to authenticate principals. That KDC has a cross-realm trust with an Active Directory domain controller. Additional configuration on the primary node with the KDC is required. For more information, see [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name ExtKDCWithADIntegration \
--security-configuration '{"AuthenticationConfiguration":
{"KerberosConfiguration": {"Provider": "ExternalKdc",
"ExternalKdcConfiguration": {"KdcServerType": "Single",
"AdminServer": "MasterDNSofClusterKDC:749",
"KdcServer": "MasterDNSofClusterKDC:88",
"AdIntegrationConfiguration": {"AdRealm":"AD.DOMAIN.COM",
"AdDomain":"ad.domain.com",
"AdServer":"ad.domain.com"}}}}}'
```

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge --applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole --security-configuration ExtKDCWithADIntegration \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=KDCOnMasterPassword,\
ADDomainJoinUser=MyPrivilegedADUserName,ADDomainJoinPassword=PasswordForADDomainJoinUser
```

# Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections
<a name="emr-kerberos-configuration-users"></a>

Amazon EMR creates Kerberos-authenticated user clients for the applications that run on the cluster—for example, the `hadoop` user, `spark` user, and others. You can also add users who are authenticated to cluster processes using Kerberos. Authenticated users can then connect to the cluster with their Kerberos credentials and work with applications. For a user to authenticate to the cluster, the following configurations are required:
+ A Linux account matching the Kerberos principal in the KDC must exist on the cluster. Amazon EMR does this automatically in architectures that integrate with Active Directory.
+ You must create an HDFS user directory on the primary node for each user, and give the user permissions to the directory.
+ You must configure the SSH service so that GSSAPI is enabled on the primary node. In addition, users must have an SSH client with GSSAPI enabled.

## Adding Linux users and Kerberos principals to the primary node
<a name="emr-kerberos-configure-linux-kdc"></a>

If you do not use Active Directory, you must create Linux accounts on the cluster primary node and add principals for these Linux users to the KDC. In addition to the user principals, the KDC running on the primary node needs a principal for the local host.

When your architecture includes Active Directory integration, Linux users and principals on the local KDC, if applicable, are created automatically. You can skip this step. For more information, see [Cross-realm trust](emr-kerberos-options.md#emr-kerberos-crossrealm-summary) and [External KDC—cluster KDC on a different cluster with Active Directory cross-realm trust](emr-kerberos-options.md#emr-kerberos-extkdc-ad-trust-summary).

**Important**  
The KDC, along with the database of principals, is lost when the primary node terminates because the primary node uses ephemeral storage. If you create users for SSH connections, we recommend that you establish a cross-realm trust with an external KDC configured for high-availability. Alternatively, if you create users for SSH connections using Linux accounts, automate the account creation process using bootstrap actions and scripts so that it can be repeated when you create a new cluster.

Submitting a step to the cluster after you create it or when you create the cluster is the easiest way to add users and KDC principals. Alternatively, you can connect to the primary node using an EC2 key pair as the default `hadoop` user to run the commands. For more information, see [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md).

The following example submits a bash script `configureCluster.sh` to a cluster that already exists, referencing its cluster ID. The script is saved to Amazon S3. 

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://amzn-s3-demo-bucket/configureCluster.sh"]
```
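
The `--steps` shorthand in the `add-steps` command is a single comma-separated string, so a typo in the Region or the script path is easy to miss. The following sketch builds that string from two values. `script_runner_step` is a hypothetical helper, not part of the AWS CLI, and the Region, bucket, and cluster ID are the placeholder values from the example. The command is printed rather than run.

```shell
#!/bin/bash
# Sketch: build the --steps shorthand for a script-runner step.
# script_runner_step is a hypothetical helper, not part of the AWS CLI.
# $1: AWS Region of the cluster, for example us-east-1
# $2: S3 URI of the script to run
script_runner_step() {
  printf 'Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,'
  printf 'Jar=s3://%s.elasticmapreduce/libs/script-runner/script-runner.jar,' "$1"
  printf 'Args=[%s]\n' "$2"
}

steps=$(script_runner_step us-east-1 s3://amzn-s3-demo-bucket/configureCluster.sh)

# Print the command for review; remove "echo" to submit the step.
echo aws emr add-steps --cluster-id j-2AL4XXXXXX5T9 --steps "$steps"
```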

The following example demonstrates the contents of the `configureCluster.sh` script. The script also handles creating HDFS user directories and enabling GSSAPI for SSH, which are covered in the following sections.

```
#!/bin/bash
# Add a principal to the KDC for the primary node, using the primary node's returned host name
sudo kadmin.local -q "ktadd -k /etc/krb5.keytab host/$(hostname -f)"
# Declare an associative array of user names and passwords to add
declare -A arr
arr=([lijuan]=pwd1 [marymajor]=pwd2 [richardroe]=pwd3)
for i in "${!arr[@]}"; do
     # Assign plain language variables for clarity
     name=${i}
     password=${arr[${i}]}

     # Create a principal for each user in the primary node and require a new password on first logon
     sudo kadmin.local -q "addprinc -pw $password +needchange $name"

     # Add an HDFS directory for each user
     hdfs dfs -mkdir "/user/$name"

     # Change owner of each user's HDFS directory to that user
     hdfs dfs -chown "$name:$name" "/user/$name"
done

# Enable GSSAPI authentication for SSH and restart SSH service
sudo sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' /etc/ssh/sshd_config
sudo sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```

## Adding user HDFS directories
<a name="emr-kerberos-configure-HDFS"></a>

To allow your users to log in to the cluster to run Hadoop jobs, you must add HDFS user directories for their Linux accounts, and grant each user ownership of their directory.

Submitting a step to the cluster after you create it or when you create the cluster is the easiest way to create HDFS directories. Alternatively, you could connect to the primary node using an EC2 key pair as the default `hadoop` user to run the commands. For more information, see [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md).

The following example submits a bash script `AddHDFSUsers.sh` to a cluster that already exists, referencing its cluster ID. The script is saved to Amazon S3. 

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/AddHDFSUsers.sh"]
```

The following example demonstrates the contents of the `AddHDFSUsers.sh` script.

```
#!/bin/bash
# AddHDFSUsers.sh script

# Initialize an array of user names from AD, or Linux users created manually on the cluster
ADUSERS=("lijuan" "marymajor" "richardroe" "myusername")

# For each user listed, create an HDFS user directory
# and change ownership to the user

for username in "${ADUSERS[@]}"; do
     hdfs dfs -mkdir "/user/$username"
     hdfs dfs -chown "$username:$username" "/user/$username"
done
```
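
If the step might run more than once, a variant that tolerates existing directories can be convenient. The following is a sketch, not the documented script. It relies on the `-p` option of `hdfs dfs -mkdir`, and `APPLY` is a hypothetical switch introduced here so that the commands are only printed unless you set `APPLY=1` on a node where the `hdfs` command is available.

```shell
#!/bin/bash
# Sketch: a re-runnable variant of AddHDFSUsers.sh.
# APPLY is a hypothetical switch: commands are only printed unless APPLY=1.
run() {
  if [ "${APPLY:-0}" = "1" ]; then
    "$@"
  else
    echo "$*"
  fi
}

for username in lijuan marymajor richardroe myusername; do
  # -p keeps mkdir from failing when the directory already exists
  run hdfs dfs -mkdir -p "/user/$username"
  run hdfs dfs -chown "$username:$username" "/user/$username"
done
```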

## Enabling GSSAPI for SSH
<a name="emr-kerberos-ssh-config"></a>

For Kerberos-authenticated users to connect to the primary node using SSH, the SSH service must have GSSAPI authentication enabled. To enable GSSAPI, run the following commands from the primary node command line or use a step to run them as a script. After reconfiguring SSH, you must restart the service.

```
sudo sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' /etc/ssh/sshd_config
sudo sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```
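
To see what the two substitutions do before touching the real configuration, you can try them on a throwaway file. The following sketch assumes nothing about your actual `sshd_config`. The sample content is made up for illustration, and the result is written to a new file instead of being edited in place.

```shell
#!/bin/bash
# Sketch: try the GSSAPI substitutions on a throwaway sample file first.
# The sample content is made up for illustration; it is not a real sshd_config.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#GSSAPIAuthentication no
GSSAPICleanupCredentials no
PasswordAuthentication no
EOF

# Same expressions as above, written to a new file instead of editing in place.
sed -e 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' \
    -e 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' \
    "$sample" > "$sample.new"

# Report whether GSSAPI authentication ended up enabled.
if grep -q '^GSSAPIAuthentication yes$' "$sample.new"; then
  echo "GSSAPI authentication enabled"
else
  echo "GSSAPIAuthentication line not found" >&2
fi
```

Because the `s` command only rewrites lines that already mention the option, a configuration file with no such line is left unchanged. On the primary node, `sudo sshd -T | grep -i gssapi` prints the effective settings.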

# Using SSH to connect to Kerberized clusters with Amazon EMR
<a name="emr-kerberos-connect-ssh"></a>

This section demonstrates the steps for a Kerberos-authenticated user to connect to the primary node of an EMR cluster.

Each computer that is used for an SSH connection must have SSH client and Kerberos client applications installed. Linux computers most likely include these by default. For example, OpenSSH is installed on most Linux, Unix, and macOS operating systems. You can check for an SSH client by typing **ssh** at the command line. If your computer does not recognize the command, install an SSH client to connect to the primary node. The OpenSSH project provides a free implementation of the full suite of SSH tools. For more information, see the [OpenSSH](http://www.openssh.org/) website. Windows users can use applications such as [PuTTY](http://www.chiark.greenend.org.uk/~sgtatham/putty/) as an SSH client. 

For more information about SSH connections, see [Connect to an Amazon EMR cluster](emr-connect-master-node.md).

SSH uses GSSAPI for authenticating Kerberos clients, and you must enable GSSAPI authentication for the SSH service on the cluster primary node. For more information, see [Enabling GSSAPI for SSH](emr-kerberos-configuration-users.md#emr-kerberos-ssh-config). SSH clients must also use GSSAPI.

In the following examples, for *MasterPublicDNS* use the value that appears for **Master public DNS** on the **Summary** tab of the cluster details pane—for example, *ec2-11-222-33-44.compute-1.amazonaws.com*.

## Prerequisite for krb5.conf (non-Active Directory)
<a name="emr-kerberos-conffile"></a>

When using a configuration without Active Directory integration, in addition to the SSH client and Kerberos client applications, each client computer must have a copy of the `/etc/krb5.conf` file that matches the `/etc/krb5.conf` file on the cluster primary node.

**To copy the krb5.conf file**

1. Use SSH to connect to the primary node using an EC2 key pair and the default `hadoop` user—for example, `hadoop@MasterPublicDNS`. For detailed instructions, see [Connect to an Amazon EMR cluster](emr-connect-master-node.md).

1. From the primary node, copy the contents of the `/etc/krb5.conf` file. For more information, see [Connect to an Amazon EMR cluster](emr-connect-master-node.md).

1. On each client computer that will connect to the cluster, create an identical `/etc/krb5.conf` file based on the copy that you made in the previous step.
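
Once a client has its own copy, a byte-for-byte comparison is a quick way to confirm that the two files match. The following sketch uses `cmp -s` on two throwaway files that stand in for the real ones. `krb5_conf_matches` is a hypothetical helper, and the sample content is illustrative only.

```shell
#!/bin/bash
# Sketch: check that a client's krb5.conf matches a copy taken from the primary node.
# krb5_conf_matches is a hypothetical helper; both arguments are file paths.
krb5_conf_matches() {
  cmp -s "$1" "$2"
}

# Two throwaway files stand in for the real configuration files.
primary_copy=$(mktemp)
client_copy=$(mktemp)
printf '[libdefaults]\n    default_realm = EC2.INTERNAL\n' > "$primary_copy"
cp "$primary_copy" "$client_copy"

if krb5_conf_matches "$primary_copy" "$client_copy"; then
  echo "krb5.conf files match"
else
  echo "krb5.conf files differ" >&2
fi
```

In practice, the first argument would be a file fetched from the primary node (for example with `scp`), and the second would be the client's `/etc/krb5.conf`.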

## Using kinit and SSH
<a name="emr-kerberos-kinit-ssh"></a>

Each time a user connects from a client computer using Kerberos credentials, the user must first obtain or renew Kerberos tickets for their principal on the client computer. In addition, the SSH client must be configured to use GSSAPI authentication.

**To use SSH to connect to a Kerberized EMR cluster**

1. Use `kinit` to obtain or renew your Kerberos tickets, as shown in the following example.

   ```
   kinit user1
   ```

1. Use an `ssh` client along with the principal that you created in the cluster-dedicated KDC or Active Directory user name. Make sure that GSSAPI authentication is enabled as shown in the following examples.

   **Example: Linux users**

   The `-K` option enables GSSAPI authentication and forwards (delegates) your GSSAPI credentials to the server.

   ```
   ssh -K user1@MasterPublicDNS
   ```

   **Example: Windows users (PuTTY)**

   Make sure that the GSSAPI authentication option for the session is enabled as shown:  
![\[PuTTY Configuration window showing GSSAPI authentication options and library preferences.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-gssapi-putty.png)
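
On Linux, Unix, and macOS clients, the two steps can be wrapped in a small function that requests a ticket only when the cache has no valid one. The following is a sketch that assumes an MIT Kerberos client, whose `klist -s` exits with a non-zero status when there is no valid ticket. `connect_kerberized` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: get a ticket only when needed, then connect with GSSAPI.
# connect_kerberized is a hypothetical helper name.
# $1: Kerberos principal or Active Directory user name
# $2: public DNS name of the primary node
connect_kerberized() {
  # klist -s is silent and exits non-zero when no valid ticket is cached
  if ! klist -s; then
    kinit "$1" || return 1
  fi
  ssh -K "$1@$2"
}

# Usage: connect_kerberized user1 MasterPublicDNS
```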

# Tutorial: Configure a cluster-dedicated KDC with Amazon EMR
<a name="emr-kerberos-cluster-kdc"></a>

This topic guides you through creating a cluster with a cluster-dedicated *key distribution center (KDC)*, manually adding Linux accounts to all cluster nodes, adding Kerberos principals to the KDC on the primary node, and ensuring that client computers have a Kerberos client installed.

For more information on Amazon EMR support for Kerberos and KDC, as well as links to MIT Kerberos Documentation, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md).

## Step 1: Create the Kerberized cluster
<a name="emr-kerberos-clusterdedicated-cluster"></a>

1. Create a security configuration that enables Kerberos. The following example demonstrates a `create-security-configuration` command using the AWS CLI that specifies the security configuration as an inline JSON structure. You can also reference a file saved locally.

   ```
   aws emr create-security-configuration --name MyKerberosConfig \
   --security-configuration '{"AuthenticationConfiguration": {"KerberosConfiguration": 
   {"Provider": "ClusterDedicatedKdc", "ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24}}}}'
   ```

1. Create a cluster that references the security configuration, establishes Kerberos attributes for the cluster, and adds Linux accounts using a bootstrap action. The following example demonstrates a `create-cluster` command using the AWS CLI. The command references the security configuration that you created above, `MyKerberosConfig`. It also references a simple script, `createlinuxusers.sh`, as a bootstrap action, which you create and upload to Amazon S3 before creating the cluster.

   ```
   aws emr create-cluster --name "MyKerberosCluster" \
   --release-label emr-7.12.0 \
   --instance-type m5.xlarge \
   --instance-count 3 \
   --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair \
   --service-role EMR_DefaultRole \
   --security-configuration MyKerberosConfig \
   --applications Name=Hadoop Name=Hive Name=Oozie Name=Hue Name=HCatalog Name=Spark \
   --kerberos-attributes Realm=EC2.INTERNAL,\
   KdcAdminPassword=MyClusterKDCAdminPwd \
   --bootstrap-actions Path=s3://amzn-s3-demo-bucket/createlinuxusers.sh
   ```

   The following code demonstrates the contents of the `createlinuxusers.sh` script, which adds user1, user2, and user3 to each node in the cluster. In the next step, you add these users as KDC principals.

   ```
   #!/bin/bash
   sudo adduser user1
   sudo adduser user2
   sudo adduser user3
   ```

## Step 2: Add principals to the KDC, create HDFS user directories, and configure SSH
<a name="emr-kerberos-clusterdedicated-KDC"></a>

The KDC running on the primary node needs a principal added for the local host and for each user that you create on the cluster. You may also create HDFS directories for each user if they need to connect to the cluster and run Hadoop jobs. Similarly, configure the SSH service to enable GSSAPI authentication, which is required for Kerberos. After you enable GSSAPI, restart the SSH service.

The easiest way to accomplish these tasks is to submit a step to the cluster. The following example submits a bash script `configurekdc.sh` to the cluster you created in the previous step, referencing its cluster ID. The script is saved to Amazon S3. Alternatively, you can connect to the primary node using an EC2 key pair to run the commands or submit the step during cluster creation.

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://amzn-s3-demo-bucket/configurekdc.sh"]
```

The following code demonstrates the contents of the `configurekdc.sh` script.

```
#!/bin/bash
# Add a principal to the KDC for the primary node, using the primary node's returned host name
sudo kadmin.local -q "ktadd -k /etc/krb5.keytab host/$(hostname -f)"
# Declare an associative array of user names and passwords to add
declare -A arr
arr=([user1]=pwd1 [user2]=pwd2 [user3]=pwd3)
for i in "${!arr[@]}"; do
     # Assign plain language variables for clarity
     name=${i}
     password=${arr[${i}]}

     # Create a principal for each user in the primary node and require a new password on first logon
     sudo kadmin.local -q "addprinc -pw $password +needchange $name"

     # Add an HDFS directory for each user
     hdfs dfs -mkdir "/user/$name"

     # Change owner of each user's HDFS directory to that user
     hdfs dfs -chown "$name:$name" "/user/$name"
done

# Enable GSSAPI authentication for SSH and restart SSH service
sudo sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' /etc/ssh/sshd_config
sudo sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```

The users that you added should now be able to connect to the cluster using SSH. For more information, see [Using SSH to connect to Kerberized clusters with Amazon EMR](emr-kerberos-connect-ssh.md).

# Tutorial: Configure a cross-realm trust with an Active Directory domain
<a name="emr-kerberos-cross-realm"></a>

When you set up a cross-realm trust, you allow principals (usually users) from a different Kerberos realm to authenticate to application components on the EMR cluster. The cluster-dedicated *key distribution center (KDC)* establishes a trust relationship with another KDC using a *cross-realm principal* that exists in both KDCs. The principal name and the password match precisely.

A cross-realm trust requires that the KDCs can reach one another over the network and resolve each other's domain names. Steps for establishing a cross-realm trust relationship with a Microsoft AD domain controller running as an EC2 instance are provided below, along with an example network setup that provides the required connectivity and domain-name resolution. Any network setup that allows the required network traffic between KDCs is acceptable.

Optionally, after you establish a cross-realm trust with Active Directory using a KDC on one cluster, you can create another cluster using a different security configuration to reference the KDC on the first cluster as an external KDC. For an example security configuration and cluster set up, see [External cluster KDC with Active Directory cross-realm trust](emr-kerberos-config-examples.md#emr-kerberos-example-extkdc-ad-trust).

For more information on Amazon EMR support for Kerberos and KDC, as well as links to MIT Kerberos Documentation, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md).

**Important**  
Amazon EMR does not support cross-realm trusts with AWS Directory Service for Microsoft Active Directory.

[Step 1: Set up the VPC and subnet](#emr-kerberos-ad-network)

[Step 2: Launch and install the Active Directory domain controller](#emr-kerberos-ad-dc)

[Step 3: Add accounts to the domain for the EMR Cluster](#emr-kerberos-ad-users)

[Step 4: Configure an incoming trust on the Active Directory domain controller](#emr-kerberos-ad-configure-trust)

[Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server](#emr-kerberos-ad-DHCP)

[Step 6: Launch a Kerberized EMR Cluster](#emr-kerberos-ad-cluster)

[Step 7: Create HDFS users and set permissions on the cluster for Active Directory accounts](#emr-kerberos-ad-hadoopuser)

## Step 1: Set up the VPC and subnet
<a name="emr-kerberos-ad-network"></a>

The following steps demonstrate creating a VPC and subnet so that the cluster-dedicated KDC can reach the Active Directory domain controller and resolve its domain name. In these steps, domain-name resolution is provided by referencing the Active Directory domain controller as the domain name server in the DHCP option set. For more information, see [Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server](#emr-kerberos-ad-DHCP).

The KDC and the Active Directory domain controller must be able to resolve one another's domain names. This allows Amazon EMR to join computers to the domain and automatically configure corresponding Linux accounts and SSH parameters on cluster instances.

If Amazon EMR can't resolve the domain name, you can reference the trust using the Active Directory domain controller's IP address. However, you must manually add Linux accounts, add corresponding principals to the cluster-dedicated KDC, and configure SSH.

**To set up the VPC and subnet**

1. Create an Amazon VPC with a single public subnet. For more information, see [Step 1: Create the VPC](https://docs.aws.amazon.com/AmazonVPC/latest/GettingStartedGuide/getting-started-ipv4.html#getting-started-create-vpc) in the *Amazon VPC Getting Started Guide*.
**Important**  
When you use a Microsoft Active Directory domain controller, choose a CIDR block for the EMR cluster so that all IPv4 addresses are fewer than nine characters in length (for example, 10.0.0.0/16). This is because the DNS names of cluster computers are used when the computers join the Active Directory directory. AWS assigns [DNS hostnames](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-hostnames) based on IPv4 address in a way that longer IP addresses may result in DNS names longer than 15 characters. Active Directory has a 15-character limit for registering joined computer names, and truncates longer names, which can cause unpredictable errors.

1. Remove the default DHCP option set assigned to the VPC. For more information, see [Changing a VPC to use No DHCP options](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_DHCP_Options.html#DHCP_Use_No_Options). Later on, you add a new one that specifies the Active Directory domain controller as the DNS server. 

1. Confirm that DNS support is enabled for the VPC, that is, that DNS Hostnames and DNS Resolution are both enabled. They are enabled by default. For more information, see [Updating DNS support for your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-updating).

1. Confirm that your VPC has an internet gateway attached, which is the default. For more information, see [Creating and attaching an internet gateway](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html#Add_IGW_Attach_Gateway).
**Note**  
An internet gateway is used in this example because you are establishing a new domain controller for the VPC. An internet gateway may not be required for your application. The only requirement is that the cluster-dedicated KDC can access the Active Directory domain controller.

1. Create a custom route table, add a route that targets the internet gateway, and then attach it to your subnet. For more information, see [Create a custom route table](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html#Add_IGW_Routing).

1. When you launch the EC2 instance for the domain controller, it must have a static public IPv4 address for you to connect to it using RDP. The easiest way to do this is to configure your subnet to auto-assign public IPv4 addresses. This is not the default setting when a subnet is created. For more information, see [Modifying the public IPv4 addressing attribute of your subnet](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-ip-addressing.html#subnet-public-ip). Optionally, you can assign the address when you launch the instance. For more information, see [Assigning a public IPv4 address during instance launch](https://docs.aws.amazon.com/vpc/latest/userguide/using-instance-addressing.html#public-ip-addresses).

1. When you finish, make a note of your VPC and subnet IDs. You use them later when you launch the Active Directory domain controller and the cluster.
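
The 15-character limit described in step 1 can be checked ahead of time for candidate addresses. The following sketch assumes the default IP-based host names of the form `ip-10-0-0-12`, derived from the private IPv4 address. `hostname_for_ip` and `fits_ad_limit` are hypothetical helpers, and the addresses are examples only.

```shell
#!/bin/bash
# Sketch: check whether an IP-based host name fits the 15-character limit.
# Assumes host names of the form ip-a-b-c-d derived from the private IPv4 address.
hostname_for_ip() {
  printf 'ip-%s\n' "$(printf '%s' "$1" | sed 's/\./-/g')"
}

fits_ad_limit() {
  name=$(hostname_for_ip "$1")
  [ "${#name}" -le 15 ]
}

for ip in 10.0.0.12 10.0.255.255 192.168.100.200; do
  if fits_ad_limit "$ip"; then
    echo "$ip -> $(hostname_for_ip "$ip"): fits"
  else
    echo "$ip -> $(hostname_for_ip "$ip"): longer than 15 characters"
  fi
done
```

Under this assumption, every address in a block such as 10.0.0.0/16 stays within the limit, while addresses in a range such as 192.168.100.0/24 do not.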

## Step 2: Launch and install the Active Directory domain controller
<a name="emr-kerberos-ad-dc"></a>

1. Launch an EC2 instance based on the Microsoft Windows Server 2016 Base AMI. We recommend an m4.xlarge or better instance type. For more information, see [Launching an AWS Marketplace instance](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/launch-marketplace-console.html) in the *Amazon EC2 User Guide*.

1. Make a note of the Group ID of the security group associated with the EC2 instance. You need it for [Step 6: Launch a Kerberized EMR Cluster](#emr-kerberos-ad-cluster). We use *sg-012xrlmdomain345*. Alternatively, you can specify different security groups for the EMR cluster and this instance that allow traffic between them. For more information, see [Amazon EC2 security groups for Linux instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html) in the *Amazon EC2 User Guide*.

1. Connect to the EC2 instance using RDP. For more information, see [Connecting to your Windows instance](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/connecting_to_windows_instance.html) in the *Amazon EC2 User Guide*.

1. Start **Server Manager** to install and configure the Active Directory Domain Services role on the server. Promote the server to a domain controller and assign a domain name (the example we use here is `ad.domain.com`). Make a note of the domain name because you need it later when you create the EMR security configuration and cluster. If you are new to setting up Active Directory, you can follow the instructions in [How to set up Active Directory (AD) in Windows Server 2016](https://ittutorials.net/microsoft/windows-server-2016/setting-up-active-directory-ad-in-windows-server-2016/).

   The instance restarts when you finish.

## Step 3: Add accounts to the domain for the EMR Cluster
<a name="emr-kerberos-ad-users"></a>

Connect to the Active Directory domain controller using RDP, and create accounts in Active Directory Users and Computers for each cluster user. For more information, see [Create a User Account in Active Directory Users and Computers](https://technet.microsoft.com/en-us/library/dd894463(v=ws.10).aspx) on the *Microsoft Learn* site. Make a note of each user's **User logon name**. You need these later when you configure the cluster.

In addition, create an account with sufficient privileges to join computers to the domain. Amazon EMR uses this account to join cluster instances to the domain. You specify the account and its password when you create the cluster in [Step 6: Launch a Kerberized EMR Cluster](#emr-kerberos-ad-cluster). To delegate computer join privileges to the account, we recommend that you create a group with join privileges and then assign the user to the group. For instructions, see [Delegating directory join privileges](https://docs.aws.amazon.com/directoryservice/latest/admin-guide/directory_join_privileges.html) in the *AWS Directory Service Administration Guide*.

## Step 4: Configure an incoming trust on the Active Directory domain controller
<a name="emr-kerberos-ad-configure-trust"></a>

The example commands below create a trust in Active Directory, which is a one-way, incoming, non-transitive, realm trust with the cluster-dedicated KDC. The example we use for the cluster's realm is `EC2.INTERNAL`. Replace the *KDC-FQDN* with the **Public DNS** name listed for the Amazon EMR primary node hosting the KDC. The `passwordt` parameter specifies the **cross-realm principal password**, which you specify along with the cluster **realm** when you create a cluster. The realm name is derived from the default domain name in `us-east-1` for the cluster. The `Domain` is the Active Directory domain in which you are creating the trust, which is lower case by convention. The example uses `ad.domain.com`.

Open the Windows command prompt with administrator privileges and type the following commands to create the trust relationship on the Active Directory domain controller:

```
C:\Users\Administrator> ksetup /addkdc EC2.INTERNAL KDC-FQDN
C:\Users\Administrator> netdom trust EC2.INTERNAL /Domain:ad.domain.com /add /realm /passwordt:MyVeryStrongPassword
C:\Users\Administrator> ksetup /SetEncTypeAttr EC2.INTERNAL AES256-CTS-HMAC-SHA1-96
```

## Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server
<a name="emr-kerberos-ad-DHCP"></a>

Now that the Active Directory domain controller is configured, you must configure the VPC to use it as a domain name server for name resolution within your VPC. To do this, attach a DHCP option set. Specify the **Domain name** as the domain name of your cluster, for example, `ec2.internal` if your cluster is in us-east-1 or `region.compute.internal` for other Regions. For **Domain name servers**, you must specify the IP address of the Active Directory domain controller (which must be reachable from the cluster) as the first entry, followed by **AmazonProvidedDNS** (for example, ***xx.xx.xx.xx*,AmazonProvidedDNS**). For more information, see [Changing DHCP option sets](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_DHCP_Options.html#DHCPOptions).
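
The same settings can be expressed with the AWS CLI. The following sketch derives the domain name from the Region as described above and prints `create-dhcp-options` and `associate-dhcp-options` commands for review. The domain controller IP address `10.0.0.10` and the IDs `dopt-EXAMPLE` and `vpc-EXAMPLE` are placeholders, and `cluster_domain_name` is a hypothetical helper.

```shell
#!/bin/bash
# Sketch: derive the cluster domain name for a Region and print the
# DHCP option set commands. The IP address and resource IDs are placeholders.
cluster_domain_name() {
  if [ "$1" = "us-east-1" ]; then
    echo "ec2.internal"
  else
    echo "$1.compute.internal"
  fi
}

region="us-east-1"
ad_dc_ip="10.0.0.10"   # hypothetical private IP of the domain controller
domain_name=$(cluster_domain_name "$region")

# Print the commands for review; remove "echo" to run them.
echo aws ec2 create-dhcp-options --dhcp-configurations \
  "Key=domain-name,Values=$domain_name" \
  "Key=domain-name-servers,Values=$ad_dc_ip,AmazonProvidedDNS"
echo aws ec2 associate-dhcp-options \
  --dhcp-options-id dopt-EXAMPLE --vpc-id vpc-EXAMPLE
```

The domain controller address is listed before **AmazonProvidedDNS** so that it is the first name server that instances in the VPC consult.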

## Step 6: Launch a Kerberized EMR Cluster
<a name="emr-kerberos-ad-cluster"></a>

1. In Amazon EMR, create a security configuration that specifies the Active Directory domain controller you created in the previous steps. An example command is shown below. Replace the domain, `ad.domain.com`, with the name of the domain you specified in [Step 2: Launch and install the Active Directory domain controller](#emr-kerberos-ad-dc).

   ```
   aws emr create-security-configuration --name MyKerberosConfig \
   --security-configuration '{
     "AuthenticationConfiguration": {
       "KerberosConfiguration": {
         "Provider": "ClusterDedicatedKdc",
         "ClusterDedicatedKdcConfiguration": {
           "TicketLifetimeInHours": 24,
           "CrossRealmTrustConfiguration": {
             "Realm": "AD.DOMAIN.COM",
             "Domain": "ad.domain.com",
             "AdminServer": "ad.domain.com",
             "KdcServer": "ad.domain.com"
           }
         }
       }
     }
   }'
   ```

1. Create the cluster with the following attributes:
   + Use the `--security-configuration` option to specify the security configuration that you created. We use *MyKerberosConfig* in the example.
   + Use the `SubnetId` property of the `--ec2-attributes` option to specify the subnet that you created in [Step 1: Set up the VPC and subnet](#emr-kerberos-ad-network). We use *step1-subnet* in the example.
   + Use the `AdditionalMasterSecurityGroups` and `AdditionalSlaveSecurityGroups` of the `--ec2-attributes` option to specify that the security group associated with the AD domain controller from [Step 2: Launch and install the Active Directory domain controller](#emr-kerberos-ad-dc) is associated with the cluster primary node as well as core and task nodes. We use *sg-012xrlmdomain345* in the example.

   Use `--kerberos-attributes` to specify the following cluster-specific Kerberos attributes:
   + The realm for the cluster that you specified when you set up the Active Directory domain controller.
   + The cross-realm trust principal password that you specified as `passwordt` in [Step 4: Configure an incoming trust on the Active Directory domain controller](#emr-kerberos-ad-configure-trust).
   + A `KdcAdminPassword`, which you can use to administer the cluster-dedicated KDC.
   + The user logon name and password of the Active Directory account with computer join privileges that you created in [Step 3: Add accounts to the domain for the EMR Cluster](#emr-kerberos-ad-users).

   The following example launches a Kerberized cluster.

   ```
   aws emr create-cluster --name "MyKerberosCluster" \
   --release-label emr-5.10.0 \
   --instance-type m5.xlarge \
   --instance-count 3 \
   --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair,\
   SubnetId=step1-subnet,AdditionalMasterSecurityGroups=sg-012xrlmdomain345,\
   AdditionalSlaveSecurityGroups=sg-012xrlmdomain345 \
   --service-role EMR_DefaultRole \
   --security-configuration MyKerberosConfig \
   --applications Name=Hadoop Name=Hive Name=Oozie Name=Hue Name=HCatalog Name=Spark \
   --kerberos-attributes Realm=EC2.INTERNAL,\
   KdcAdminPassword=MyClusterKDCAdminPwd,\
   ADDomainJoinUser=ADUserLogonName,ADDomainJoinPassword=ADUserPassword,\
   CrossRealmTrustPrincipalPassword=MatchADTrustPwd
   ```

## Step 7: Create HDFS users and set permissions on the cluster for Active Directory accounts
<a name="emr-kerberos-ad-hadoopuser"></a>

When setting up a trust relationship with Active Directory, Amazon EMR creates Linux users on the cluster for each Active Directory account. For example, the user logon name `LiJuan` in Active Directory has a Linux account of `lijuan`. Active Directory user names can contain upper-case letters, but Linux does not honor Active Directory casing.

To allow your users to log in to the cluster to run Hadoop jobs, you must add HDFS user directories for their Linux accounts, and grant each user ownership of their directory. To do this, we recommend that you run a script saved to Amazon S3 as a cluster step. Alternatively, you can run the commands in the script below from the command line on the primary node. Use the EC2 key pair that you specified when you created the cluster to connect to the primary node over SSH as the `hadoop` user. For more information, see [Use an EC2 key pair for SSH credentials for Amazon EMR](emr-plan-access-ssh.md).

Run the following command to add a step to the cluster that runs a script, *AddHDFSUsers.sh*.

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/AddHDFSUsers.sh"]
```

The contents of the file *AddHDFSUsers.sh* are as follows.

```
#!/bin/bash
# AddHDFSUsers.sh script

# Initialize an array of user names from AD or Linux users and KDC principals created manually on the cluster
ADUSERS=("lijuan" "marymajor" "richardroe" "myusername")

# For each user listed, create an HDFS user directory
# and change ownership to the user

for username in "${ADUSERS[@]}"; do
     hdfs dfs -mkdir "/user/$username"
     hdfs dfs -chown "$username:$username" "/user/$username"
done
```
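
Because the Linux account names are lower-case forms of the Active Directory logon names, the list for the script can be derived rather than typed by hand. The following sketch assumes that lower-casing is the only difference, as in the `LiJuan` example. `linux_account_for` is a hypothetical helper, and the logon names are examples.

```shell
#!/bin/bash
# Sketch: derive Linux account names from Active Directory logon names.
# linux_account_for is a hypothetical helper that lower-cases its argument.
linux_account_for() {
  printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

for logon_name in LiJuan MaryMajor RichardRoe; do
  echo "$logon_name -> $(linux_account_for "$logon_name")"
done
```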

### Active Directory groups mapped to Hadoop groups
<a name="emr-kerberos-ad-group"></a>

Amazon EMR uses System Security Services Daemon (SSSD) to map Active Directory groups to Hadoop groups. To confirm group mappings, log in to the primary node as described in [Using SSH to connect to Kerberized clusters with Amazon EMR](emr-kerberos-connect-ssh.md), and then run the `hdfs groups` command. The output shows whether the Active Directory groups to which your Active Directory account belongs have been mapped to Hadoop groups for the corresponding Hadoop user on the cluster. You can also check other users' group mappings by specifying one or more user names with the command, for example `hdfs groups lijuan`. For more information, see [groups](https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#groups) in the [Apache HDFS Commands Guide](https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html).

# Use Active Directory or LDAP servers for authentication with Amazon EMR
<a name="ldap"></a>

With Amazon EMR releases 6.12.0 and higher, you can use the LDAP over SSL (LDAPS) protocol to launch a cluster that natively integrates with your corporate identity server. LDAP (Lightweight Directory Access Protocol) is an open, vendor-neutral application protocol for accessing and maintaining directory data. LDAP is commonly used for user authentication against corporate identity servers such as Active Directory (AD) and OpenLDAP. With this native integration, you can use your LDAP server to authenticate users on Amazon EMR.

Highlights of the Amazon EMR LDAP integration include:
+ Amazon EMR configures the supported applications to authenticate with LDAP authentication on your behalf.
+ Amazon EMR configures and maintains security for the supported applications with the Kerberos protocol. You don't need to input any commands or scripts.
+ You get fine-grained access control (FGAC) through Apache Ranger authorization for Hive Metastore database and tables. See [Integrate Amazon EMR with Apache Ranger](emr-ranger.md) for more information.
+ When you require LDAP credentials to access a cluster, you get fine-grained access control (FGAC) over who can access your EMR clusters through SSH.

The following pages provide a conceptual overview, prerequisites, and steps to launch an EMR cluster with the Amazon EMR LDAP integration.

**Topics**
+ [Overview of LDAP with Amazon EMR](ldap-overview.md)
+ [LDAP components for Amazon EMR](ldap-components.md)
+ [Application support and considerations with LDAP for Amazon EMR](ldap-considerations.md)
+ [Configure and launch an EMR cluster with LDAP](ldap-setup.md)
+ [Examples using LDAP with Amazon EMR](ldap-examples.md)

# Overview of LDAP with Amazon EMR
<a name="ldap-overview"></a>

Lightweight Directory Access Protocol (LDAP) is a software protocol that network administrators use to manage and control access to data by authenticating users within a company’s network. The LDAP protocol stores information in a hierarchical, tree directory structure. For more information, see [Basic LDAP Concepts](https://ldap.com/basic-ldap-concepts/) on *LDAP.com*.

Within a company’s network, many applications might use the LDAP protocol to authenticate users. With the Amazon EMR LDAP integration, EMR clusters can natively use the same LDAP protocol with an added security configuration.

Amazon EMR supports two major implementations of the LDAP protocol: **Active Directory** and **OpenLDAP**. While other implementations are possible, most follow the same authentication conventions as Active Directory or OpenLDAP.

## Active Directory (AD)
<a name="ldap-ad"></a>

Active Directory (AD) is a directory service from Microsoft for Windows domain networks. AD is included on most Windows Server operating systems, and can communicate with clients over the LDAP and LDAPS protocols. For authentication, Amazon EMR attempts a user-bind with your AD instance with the User Principal Name (UPN) as the distinguished name and password. The UPN uses the standard format `username@domain_name`.

## OpenLDAP
<a name="ldap-openldap"></a>

OpenLDAP is a free, open-source implementation of the LDAP protocol. For authentication, Amazon EMR attempts a user-bind with your OpenLDAP instance with the fully qualified domain name (FQDN) as the distinguished name and password. The FQDN uses the standard format `username_attribute=username,LDAP_user_search_base`. Commonly, the `username_attribute` value is `uid`, and the `LDAP_user_search_base` value contains the attributes of the tree that leads to the user. For example, `ou=People,dc=example,dc=com`.

Other free and open-source implementations of the LDAP protocol typically use an FQDN format similar to OpenLDAP for the distinguished names of their users.
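
The two bind formats described above can be sketched as simple string templates. The function names `build_upn` and `build_openldap_dn` are hypothetical and only illustrate how the values are assembled. They don't escape special characters, and they aren't how Amazon EMR itself builds the bind request.

```shell
# Hypothetical helpers that assemble the bind identities described above.

# Active Directory: username@domain_name
build_upn() {
  printf '%s@%s\n' "$1" "$2"
}

# OpenLDAP: username_attribute=username,LDAP_user_search_base
build_openldap_dn() {
  printf '%s=%s,%s\n' "$1" "$2" "$3"
}

build_upn lijuan example.com
build_openldap_dn uid lijuan "ou=People,dc=example,dc=com"
```

With the sample values, the first call prints `lijuan@example.com` and the second prints `uid=lijuan,ou=People,dc=example,dc=com`.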

# LDAP components for Amazon EMR
<a name="ldap-components"></a>

You can use your LDAP server to authenticate with Amazon EMR and any applications that the user directly utilizes on the EMR cluster through the following components. 

**Secret Agent**  
The *Secret Agent* is an on-cluster process that authenticates all user requests. The Secret Agent creates the user bind to your LDAP server on behalf of the supported applications on the EMR cluster. The Secret Agent runs as the `emrsecretagent` user, and it writes logs to the `/emr/secretagent/log` directory. These logs provide details about the state of each user's authentication request and any errors that might surface during user authentication.

**System Security Services Daemon (SSSD)**  
*SSSD* is a daemon that runs on each node of an LDAP-enabled EMR cluster. SSSD creates and manages a UNIX user to sync your remote corporate identity to each node. YARN-based applications such as Hive and Spark require that a local UNIX user exists on every node that runs a query for a user.
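
To check whether a node can resolve a synced user, you can query the operating system's user database. The following is a minimal sketch. The function name `unix_user_exists` is hypothetical, and the sketch assumes that user lookups with the standard `id` command go through SSSD on an LDAP-enabled cluster.

```shell
# Hypothetical helper: succeeds if the operating system can resolve the user.
# On an LDAP-enabled cluster, this lookup is expected to go through SSSD.
unix_user_exists() {
  id -u "$1" >/dev/null 2>&1
}

# Replace lijuan with one of your LDAP user names.
if unix_user_exists lijuan; then
  echo "lijuan resolves on this node"
else
  echo "lijuan does not resolve on this node"
fi
```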

# Application support and considerations with LDAP for Amazon EMR
<a name="ldap-considerations"></a>

This topic lists supported applications, supported features and unsupported features.

## Supported applications with LDAP for Amazon EMR
<a name="ldap-considerations-apps"></a>

**Important**  
The applications listed on this page are the only applications that Amazon EMR supports for LDAP. To ensure cluster security, you can only include LDAP-compatible applications when you create an EMR cluster with LDAP enabled. If you attempt to install other, unsupported applications, Amazon EMR will reject your request for a new cluster.

Amazon EMR releases 6.12 and higher support LDAP integration with the following applications:
+ Apache Livy
+ Apache Hive through HiveServer2 (HS2)
+ Trino
+ Presto
+ Hue

You can also install the following applications on an EMR cluster and configure them to meet your security needs:
+ Apache Spark
+ Apache Hadoop
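
Because Amazon EMR rejects cluster requests that include other applications, you might check your application list before you create the cluster. The following is a sketch of such a check. The function name `unsupported_apps`, the variable `LDAP_ALLOWED_APPS`, and the short application names are assumptions for illustration, based on the two lists above.

```shell
# Hypothetical pre-flight check: prints every requested application that is
# not in the lists above. Names are compared case-insensitively.
LDAP_ALLOWED_APPS="livy hive trino presto hue spark hadoop"

unsupported_apps() {
  for app in "$@"; do
    lower=$(printf '%s' "$app" | tr '[:upper:]' '[:lower:]')
    case " $LDAP_ALLOWED_APPS " in
      *" $lower "*) ;;                  # in the list, print nothing
      *) printf '%s\n' "$app" ;;        # not in the list
    esac
  done
}

unsupported_apps Hive Spark Flink
```

In this example, the function prints `Flink` because it doesn't appear in either list.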

## Supported features with LDAP for Amazon EMR
<a name="ldap-considerations-features"></a>

You can use the following Amazon EMR features with the LDAP integration:
+ Encryption in transit (required) and at rest
+ Instance groups, instance fleets, and Spot Instances
+ Reconfiguration of applications on a running cluster
+ EMRFS server-side encryption (SSE)

**Note**  
To keep LDAP credentials secure, you must use in-transit encryption to secure the flow of data on and off the cluster. For more information about in-transit encryption, see [Encrypt data at rest and in transit with Amazon EMR](emr-data-encryption.md).
## Unsupported features
<a name="ldap-considerations-limitations"></a>

Consider the following limitations when you use the Amazon EMR LDAP integration:
+ Amazon EMR disables steps for clusters with LDAP enabled.
+ Amazon EMR doesn't support runtime roles and AWS Lake Formation integrations for clusters with LDAP enabled.
+ Amazon EMR doesn't support LDAP with StartTLS.
+ Amazon EMR doesn't support high-availability mode (clusters with multiple primary nodes) for clusters with LDAP enabled.
+ You can't rotate bind credentials or certificates for clusters with LDAP enabled. If any of these values change, we recommend that you start a new cluster with the updated bind credentials or certificates.
+ You must use exact search bases with LDAP. The LDAP user and group search base doesn't support LDAP search filters.

# Configure and launch an EMR cluster with LDAP
<a name="ldap-setup"></a>

This section covers how to configure Amazon EMR for use with LDAP authentication.

**Topics**
+ [Add AWS Secrets Manager permissions to the Amazon EMR instance role](ldap-setup-asm.md)
+ [Create the Amazon EMR security configuration for LDAP integration](ldap-setup-security.md)
+ [Launch an EMR cluster that authenticates with LDAP](ldap-setup-launch.md)

# Add AWS Secrets Manager permissions to the Amazon EMR instance role
<a name="ldap-setup-asm"></a>

Amazon EMR uses an IAM service role to perform actions on your behalf to provision and manage clusters. The service role for cluster EC2 instances, also called *the EC2 instance profile for Amazon EMR*, is a special type of service role that Amazon EMR assigns to every EC2 instance in a cluster at launch.

To define permissions for an EMR cluster to interact with Amazon S3 data and other AWS services, define a custom Amazon EC2 instance profile instead of the `EMR_EC2_DefaultRole` when you launch your cluster. For more information, see [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md) and [Customize IAM roles with Amazon EMR](emr-iam-roles-custom.md).

Add the following statements to the default EC2 instance profile to allow Amazon EMR to tag sessions and access the AWS Secrets Manager secrets that store LDAP certificates.

```
    {
        "Sid": "AllowAssumeOfRolesAndTagging",
        "Effect": "Allow",
        "Action": ["sts:TagSession", "sts:AssumeRole"],
        "Resource": [
            "arn:aws:iam::111122223333:role/LDAP_DATA_ACCESS_ROLE_NAME",
            "arn:aws:iam::111122223333:role/LDAP_USER_ACCESS_ROLE_NAME"
        ]
    },
    {
        "Sid": "AllowSecretsRetrieval",
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": [
            "arn:aws:secretsmanager:us-east-1:111122223333:secret:LDAP_SECRET_NAME*",
            "arn:aws:secretsmanager:us-east-1:111122223333:secret:ADMIN_LDAP_SECRET_NAME*"
        ]
    }
```

**Note**  
Your cluster requests will fail if you forget the wildcard `*` character at the end of the secret name when you set Secrets Manager permissions. The wildcard represents the secret versions.  
You should also limit the scope of the AWS Secrets Manager policy to only the certificates that your cluster needs to provision instances.
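
To reduce the chance of omitting the wildcard, you might generate the `Resource` values instead of typing them. The following is a sketch. The function names `secret_resource_pattern` and `resource_has_wildcard` are hypothetical, and the ARN layout assumes the standard `aws` partition that the policy example above uses.

```shell
# Hypothetical helper: builds the Resource value for a Secrets Manager secret,
# including the trailing wildcard that the note above calls for.
# $1 = Region, $2 = account ID, $3 = secret name
secret_resource_pattern() {
  printf 'arn:aws:secretsmanager:%s:%s:secret:%s*\n' "$1" "$2" "$3"
}

# Hypothetical check: succeeds only if a Resource value ends with "*".
resource_has_wildcard() {
  case "$1" in
    *'*') return 0 ;;
    *)    return 1 ;;
  esac
}

secret_resource_pattern us-east-1 111122223333 LDAP_SECRET_NAME
```

With the sample values, the function prints `arn:aws:secretsmanager:us-east-1:111122223333:secret:LDAP_SECRET_NAME*`.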

# Create the Amazon EMR security configuration for LDAP integration
<a name="ldap-setup-security"></a>

Before you can launch an EMR cluster with LDAP integration, use the steps in [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md) to create an Amazon EMR security configuration for the cluster. Complete the following configurations in the `LDAPConfiguration` block under `AuthenticationConfiguration`, or in the corresponding fields in the Amazon EMR console **Security Configurations** section:

**`EnableLDAPAuthentication`**  
Console option: **Authentication protocol: LDAP**  
To use the LDAP integration, set this option to `true` or select it as your authentication protocol when you create a cluster in the console. By default, `EnableLDAPAuthentication` is `true` when you create a security configuration in the Amazon EMR console.

**`LDAPServerURL`**  
Console option: **LDAP server location**  
The location of the LDAP server including the prefix: `ldaps://location_of_server`.

**`BindCertificateARN`**  
Console option: **LDAP SSL certificate**  
The AWS Secrets Manager ARN of the secret that contains the certificate used to sign the SSL certificate that the LDAP server uses. If your LDAP server's certificate is signed by a public Certificate Authority (CA), you can provide an AWS Secrets Manager ARN with a blank file. For more information on how to store your certificate in Secrets Manager, see [Store TLS certificates in AWS Secrets Manager](emr-ranger-tls-certificates.md).

**`BindCredentialsARN`**  
Console option: **LDAP server bind credentials**  
An AWS Secrets Manager ARN that contains the LDAP admin user bind credentials. The credentials are stored as a JSON object. There is only one key-value pair in this secret; the key in the pair is the username, and the value is the password. For example, `{"uid=admin,cn=People,dc=example,dc=com": "AdminPassword1"}`. This is an optional field unless you enable SSH login for your EMR cluster. In many configurations, Active Directory instances require bind credentials to allow SSSD to sync users.

**`LDAPAccessFilter`**  
Console option: **LDAP access filter**  
Specifies the subset of objects within your LDAP server that can authenticate. For example, if you want to grant access to all users with the `posixAccount` object class in your LDAP server, define the access filter as `(objectClass=posixAccount)`.

**`LDAPUserSearchBase`**  
Console option: **LDAP user search base**  
The search base that your users belong under within your LDAP server. For example, `cn=People,dc=example,dc=com`.

**`LDAPGroupSearchBase`**  
Console option: **LDAP group search base**  
The search base that your groups belong under within your LDAP server. For example, `cn=Groups,dc=example,dc=com`.

**`EnableSSHLogin`**  
Console option: **SSH login**  
Specifies whether or not to allow password authentication with LDAP credentials. We don't recommend that you enable this option. Key pairs are a more secure route to allow access into EMR clusters. This field is optional and defaults to `false`. 

**`LDAPServerType`**  
Console option: **LDAP server type**  
Specifies the type of LDAP server that Amazon EMR connects to. Supported options are Active Directory and OpenLDAP. Other LDAP server types might work, but Amazon EMR doesn't officially support other server types. For more information, see [LDAP components for Amazon EMR](ldap-components.md).

**`ActiveDirectoryConfigurations`**  
A required sub-block for security configurations that use the Active Directory server type.

**`ADDomain`**  
Console option: **Active Directory domain**  
The domain name used to create the User Principal Name (UPN) for user authentication with security configurations that use the Active Directory server type.
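
The JSON object that the `BindCredentialsARN` secret holds has a single key-value pair, so you can generate it from the bind user name and password. The following is a sketch. The function name `bind_credentials_json` and the variables `BIND_DN` and `BIND_PASSWORD` are hypothetical, and the sketch doesn't JSON-escape its inputs.

```shell
# Hypothetical helper: prints the single key-value JSON object for the
# bind credentials secret. $1 = bind user name, $2 = password.
# Assumes neither value contains a double quote or a backslash, because
# this sketch does not JSON-escape its inputs.
bind_credentials_json() {
  printf '{"%s": "%s"}\n' "$1" "$2"
}

bind_credentials_json "uid=admin,cn=People,dc=example,dc=com" "AdminPassword1"

# You might then store the output with the AWS CLI, for example:
#   aws secretsmanager create-secret --name LDAP_SECRET_NAME \
#     --secret-string "$(bind_credentials_json "$BIND_DN" "$BIND_PASSWORD")"
```

With the sample values, the function prints the same object as the `BindCredentialsARN` example above.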

## Considerations for security configurations with LDAP and Amazon EMR
<a name="ldap-setup-security-considerations"></a>
+ To create a security configuration with Amazon EMR LDAP integration, you must use in-transit encryption. For information about in-transit encryption, see [Encrypt data at rest and in transit with Amazon EMR](emr-data-encryption.md).
+ You can't define Kerberos configuration in the same security configuration. Amazon EMR automatically provisions a KDC that is dedicated to the cluster, and manages the admin password for this KDC. Users can't access this admin password.
+ You can't define IAM runtime roles and AWS Lake Formation in the same security configuration.
+ The `LDAPServerURL` must have the `ldaps://` protocol in its value.
+ The `LDAPAccessFilter` can't be empty. 
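
You can check the last two considerations before you create the security configuration. The following is a sketch of such a check. The function name `validate_ldap_settings` and the server name `ldap.example.com` are hypothetical, and the sketch covers only the URL prefix and the empty-filter rule.

```shell
# Hypothetical pre-flight check for two of the considerations above.
# $1 = LDAPServerURL, $2 = LDAPAccessFilter
validate_ldap_settings() {
  case "$1" in
    ldaps://?*) ;;   # has the required prefix followed by a location
    *) echo "LDAPServerURL must start with ldaps://" >&2; return 1 ;;
  esac
  if [ -z "$2" ]; then
    echo "LDAPAccessFilter can't be empty" >&2
    return 1
  fi
  return 0
}

validate_ldap_settings "ldaps://ldap.example.com" "(objectClass=posixAccount)" \
  && echo "settings look valid"
```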

## Use LDAP with the Apache Ranger integration for Amazon EMR
<a name="ldap-setup-ranger"></a>

With the LDAP integration for Amazon EMR, you can further integrate with Apache Ranger. When you pull your LDAP users into Ranger, you can then associate those users with an Apache Ranger policy server to integrate with Amazon EMR and other applications. To do this, define the `RangerConfiguration` field within `AuthorizationConfiguration` in the security configuration that you use with your LDAP cluster. For more information on how to set up the security configuration, see [Create the EMR security configuration](emr-ranger-security-config.md).

When you use LDAP with Amazon EMR, you don't need to provide a `KerberosConfiguration` with the Amazon EMR integration for Apache Ranger. 

# Launch an EMR cluster that authenticates with LDAP
<a name="ldap-setup-launch"></a>

Use the following steps to launch an EMR cluster with LDAP or Active Directory. 

1. Set up your environment:
   + Make sure that the nodes on your EMR cluster can communicate with Amazon S3 and AWS Secrets Manager. For more information on how to modify your EC2 instance profile role to communicate with these services, see [Add AWS Secrets Manager permissions to the Amazon EMR instance role](ldap-setup-asm.md).
   + If you plan to run your EMR cluster in a private subnet, you should use AWS PrivateLink and Amazon VPC endpoints, or use network address translation (NAT) to configure the VPC to communicate with S3 and Secrets Manager. For more information, see [AWS PrivateLink and VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/endpoint-services-overview.html) and [NAT instances](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Instance.html) in the *Amazon VPC Getting Started Guide*.
   + Make sure that there is network connectivity between your EMR cluster and the LDAP server. Your EMR clusters must access your LDAP server over the network. The primary, core, and task nodes for the cluster communicate with the LDAP server to sync user data. If your LDAP server runs on Amazon EC2, update the EC2 security group to accept traffic from the EMR cluster. For more information, see [Add AWS Secrets Manager permissions to the Amazon EMR instance role](ldap-setup-asm.md).

1. Create an Amazon EMR security configuration for the LDAP integration. For more information, see [Create the Amazon EMR security configuration for LDAP integration](ldap-setup-security.md).

1. Now that you're set up, use the steps in [Launch an Amazon EMR cluster](emr-gs.md#emr-getting-started-launch-sample-cluster) to launch your cluster with the following configurations:
   + Select Amazon EMR release 6.12 or higher. We recommend that you use the latest Amazon EMR release.
   + Only specify or select applications for your cluster that support LDAP. For a list of LDAP-supported applications with Amazon EMR, see [Application support and considerations with LDAP for Amazon EMR](ldap-considerations.md).
   + Apply the security configuration that you created in the previous step.
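
The network connectivity requirement in step 1 can be checked from a host in the same subnet as the cluster before you launch. The following is a sketch. The function names `ldaps_endpoint` and `check_ldaps_reachable` and the server name `ldap.example.com` are hypothetical. The sketch assumes that the `openssl` command is installed and that your server uses port 636, the conventional LDAPS port, when the URL doesn't specify one.

```shell
# Hypothetical helper: extracts host:port from an ldaps:// URL.
# Falls back to port 636 when the URL does not include a port.
ldaps_endpoint() {
  hostport=${1#ldaps://}     # drop the scheme
  hostport=${hostport%%/*}   # drop any trailing path
  case "$hostport" in
    *:*) printf '%s\n' "$hostport" ;;
    *)   printf '%s:636\n' "$hostport" ;;
  esac
}

# Hypothetical check: attempts a TLS connection to the LDAP server.
# Requires the openssl command on the host where you run it.
check_ldaps_reachable() {
  openssl s_client -connect "$(ldaps_endpoint "$1")" </dev/null >/dev/null 2>&1
}

ldaps_endpoint "ldaps://ldap.example.com"
# From a host in the cluster's subnet you might run:
#   check_ldaps_reachable "ldaps://ldap.example.com" && echo "reachable"
```

With the sample URL, `ldaps_endpoint` prints `ldap.example.com:636`. The sketch doesn't handle IPv6 literal addresses.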

# Examples using LDAP with Amazon EMR
<a name="ldap-examples"></a>

Once you [provision an EMR cluster that uses LDAP](ldap-setup-launch.md) integration, you can provide your LDAP credentials to any [supported application](ldap-considerations.md#ldap-considerations-apps) through its built-in username and password authentication mechanism. This page shows some examples.

## Using LDAP authentication with Apache Hive
<a name="ldap-examples-"></a>

**Example - Apache Hive**  
The following example command starts an Apache Hive session through HiveServer2 and Beeline:  

```
beeline -u "jdbc:hive2://$HOSTNAME:10000/default;ssl=true;sslTrustStore=$TRUSTSTORE_PATH;trustStorePassword=$TRUSTSTORE_PASS"  -n LDAP_USERNAME -p LDAP_PASSWORD
```

## Using LDAP authentication with Apache Livy
<a name="ldap-examples-livy"></a>

**Example - Apache Livy**  
The following example command starts a Livy session through cURL. Replace `ENCODED-KEYPAIR` with a Base64-encoded string for `username:password`.  

```
curl -X POST --data '{"proxyUser":"LDAP_USERNAME","kind": "pyspark"}' -H "Content-Type: application/json" -H "Authorization: Basic ENCODED-KEYPAIR" DNS_OF_PRIMARY_NODE:8998/sessions
```
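
To produce the value for `ENCODED-KEYPAIR`, you can Base64-encode the `username:password` string with standard tools. The following is a sketch. The function name `encode_basic_credentials` is hypothetical, and `user` and `pass` are placeholder credentials.

```shell
# Hypothetical helper: Base64-encodes "username:password" for HTTP Basic
# authentication. tr removes the line breaks that some base64
# implementations add to long output.
encode_basic_credentials() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}

encode_basic_credentials user pass   # prints dXNlcjpwYXNz
echo
```

Passing a password as a command argument can leave it in your shell history, so you might read it from a prompt or a protected file instead.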

## Using LDAP authentication with Presto
<a name="ldap-examples-presto"></a>

**Example - Presto**  
The following example command starts a Presto session through the Presto CLI:  

```
presto-cli --user "LDAP_USERNAME" --password --catalog hive
```
After you run this command, enter the LDAP password at the prompt.

## Using LDAP authentication with Trino
<a name="ldap-examples-trino"></a>

**Example - Trino**  
The following example command starts a Trino session through the Trino CLI:  

```
trino-cli --user "LDAP_USERNAME" --password --catalog hive
```
After you run this command, enter the LDAP password at the prompt.

## Using LDAP authentication with Hue
<a name="ldap-examples-hue"></a>

You can access Hue UI through an SSH tunnel that you create on the cluster, or you can set a proxy server to publicly broadcast the connection to Hue. Because Hue doesn't run in HTTPS mode by default, we recommend that you use an additional encryption layer to ensure that communication between clients and the Hue UI is encrypted with HTTPS. This reduces the chance that you might accidentally expose user credentials in plain text.

To use the Hue UI, open the Hue UI in your browser and enter your LDAP username and password to log in. If the credentials are correct, Hue logs you in and uses your identity to authenticate you with all supported applications.
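
The SSH tunnel option can be sketched with local port forwarding. The function name `hue_tunnel_command` and the key file name `mykeypair.pem` are hypothetical. The sketch assumes that Hue listens on port 8888 on the primary node and that you connect as the `hadoop` user with the cluster's EC2 key pair.

```shell
# Hypothetical helper: prints an SSH local port forwarding command for Hue.
# $1 = private key file, $2 = primary node DNS name,
# $3 = Hue port (optional; 8888 is assumed when omitted)
hue_tunnel_command() {
  port=${3:-8888}
  printf 'ssh -i %s -N -L %s:localhost:%s hadoop@%s\n' "$1" "$port" "$port" "$2"
}

hue_tunnel_command mykeypair.pem EMR_PRIMARY_DNS_NAME
```

After you run the printed command, you would open `localhost` on the forwarded port in your browser while the tunnel stays open.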

## Using SSH for password authentication and Kerberos tickets for other applications
<a name="ldap-examples-ssh"></a>

**Important**  
We don't recommend that you use password authentication to SSH into an EMR cluster.

You can use your LDAP credentials to SSH to an EMR cluster. To do this, set the `EnableSSHLogin` configuration to `true` in the Amazon EMR security configuration that you use to start the cluster. Then, use the following command to SSH to the cluster after it launches:

```
ssh username@EMR_PRIMARY_DNS_NAME
```

After you run this command, enter the LDAP password at the prompt.

Amazon EMR includes an on-cluster script that allows users to generate a Kerberos keytab file and ticket to use with supported applications that don't accept LDAP credentials directly. Some of these applications include `spark-submit`, Spark SQL, and PySpark.

Run `ldap-kinit` and follow the prompts. If the authentication succeeds, the Kerberos keytab file appears in your home directory with a valid Kerberos ticket. Use the Kerberos ticket to run applications as you would on any Kerberized environment.