

# Use Kerberos for authentication with Amazon EMR
<a name="emr-kerberos"></a>

Amazon EMR releases 5.10.0 and higher support Kerberos. Kerberos is a network authentication protocol that uses secret-key cryptography to provide strong authentication so that passwords or other credentials aren't sent over the network in an unencrypted format.

In Kerberos, services and users that need to authenticate are known as *principals*. Principals exist within a Kerberos *realm*. Within the realm, a Kerberos server known as the *key distribution center (KDC)* provides the means for principals to authenticate. The KDC does this by issuing *tickets* for authentication. The KDC maintains a database of the principals within its realm, their passwords, and other administrative information about each principal. A KDC can also accept authentication credentials from principals in other realms, which is known as a *cross-realm trust*. In addition, an EMR cluster can use an external KDC to authenticate principals.
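As a rough illustration of how principal names are structured, MIT Kerberos writes a principal as `primary/instance@REALM`, where the instance part is optional. The following sketch splits a name into those parts. The function name `parse_principal` and the sample principals are hypothetical and are not part of Amazon EMR or of the Kerberos tools.

```shell
#!/bin/bash
# Split a Kerberos principal of the form primary[/instance]@REALM into its parts.
# parse_principal is a hypothetical helper, not an Amazon EMR or MIT Kerberos command.
parse_principal() {
    local principal="$1"
    local name="${principal%@*}"      # everything before the last "@"
    local realm="${principal##*@}"    # everything after the last "@"
    local primary="${name%%/*}"       # text before the first "/"
    local instance=""
    # Only principals such as host/node.example.com carry an instance component
    case "$name" in
        */*) instance="${name#*/}" ;;
    esac
    echo "primary=${primary} instance=${instance} realm=${realm}"
}

parse_principal "lijuan@EC2.INTERNAL"
parse_principal "host/ip-10-0-0-1.ec2.internal@EC2.INTERNAL"
```

The sketch assumes that the input contains an `@` separator. It does not validate the name against a KDC.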

A common scenario for establishing a cross-realm trust or using an external KDC is to authenticate users from an Active Directory domain. This allows users to access an EMR cluster with their domain account when they use SSH to connect to a cluster or work with big data applications.

When you use Kerberos authentication, Amazon EMR configures Kerberos for the applications, components, and subsystems that it installs on the cluster so that they are authenticated with each other.

**Important**  
Amazon EMR does not support AWS Directory Service for Microsoft Active Directory in a cross-realm trust or as an external KDC.

Before you configure Kerberos using Amazon EMR, we recommend that you become familiar with Kerberos concepts, the services that run on a KDC, and the tools for administering Kerberos services. For more information, see [MIT Kerberos documentation](http://web.mit.edu/kerberos/krb5-latest/doc/), which is published by the [Kerberos consortium](http://kerberos.org/).

**Topics**
+ [Supported applications with Amazon EMR](emr-kerberos-principals.md)
+ [Kerberos architecture options with Amazon EMR](emr-kerberos-options.md)
+ [Configuring Kerberos on Amazon EMR](emr-kerberos-configure.md)
+ [Using SSH to connect to Kerberized clusters with Amazon EMR](emr-kerberos-connect-ssh.md)
+ [Tutorial: Configure a cluster-dedicated KDC with Amazon EMR](emr-kerberos-cluster-kdc.md)
+ [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md)

# Supported applications with Amazon EMR
<a name="emr-kerberos-principals"></a>

Within an EMR cluster, Kerberos principals are the big data application services and subsystems that run on all cluster nodes. Amazon EMR can configure the applications and components listed below to use Kerberos. Each application has a Kerberos user principal associated with it.

Amazon EMR does not support cross-realm trusts with AWS Directory Service for Microsoft Active Directory.

Amazon EMR only configures the open-source Kerberos authentication features for the applications and components listed below. Any other applications installed are not Kerberized, which can result in an inability to communicate with Kerberized components and cause application errors. Applications and components that are not Kerberized do not have authentication enabled. Supported applications and components may vary for different Amazon EMR releases.

The Livy user interface is the only web user interface hosted on the cluster that is Kerberized.
+ **Hadoop MapReduce**
+ **HBase**
+ **HCatalog**
+ **HDFS**
+ **Hive**
  + Do not enable Hive with LDAP authentication. This may cause issues communicating with Kerberized YARN.
+ **Hue**
  + Hue user authentication isn't set automatically and can be configured using the configuration API.
  + Hue server is Kerberized. The Hue front-end (UI) is not configured for authentication. LDAP authentication can be configured for the Hue UI. 
+ **Livy**
  + Livy impersonation with Kerberized clusters is supported in Amazon EMR releases 5.22.0 and higher.
+ **Oozie**
+ **Phoenix**
+ **Presto**
  + Presto supports Kerberos authentication in Amazon EMR releases 6.9.0 and higher.
  + To use Kerberos authentication for Presto, you must enable [in-transit encryption](emr-data-encryption-options.md#emr-encryption-intransit).
+ **Spark**
+ **Tez**
+ **Trino**
  + Trino supports Kerberos authentication in Amazon EMR releases 6.11.0 and higher.
  + To use Kerberos authentication for Trino, you must enable [in-transit encryption](emr-data-encryption-options.md#emr-encryption-intransit).
+ **YARN**
+ **Zeppelin**
  + Zeppelin is only configured to use Kerberos with the Spark interpreter. It is not configured for other interpreters.
  + User impersonation is not supported for Kerberized Zeppelin interpreters other than Spark.
+ **ZooKeeper**
  + The ZooKeeper client is not supported.

# Kerberos architecture options with Amazon EMR
<a name="emr-kerberos-options"></a>

When you use Kerberos with Amazon EMR, you can choose from the architectures listed in this section. Regardless of the architecture that you choose, you configure Kerberos using the same steps. You create a security configuration, you specify the security configuration and compatible cluster-specific Kerberos options when you create the cluster, and you create HDFS directories for Linux users on the cluster that match user principals in the KDC. For an explanation of configuration options and example configurations for each architecture, see [Configuring Kerberos on Amazon EMR](emr-kerberos-configure.md).

## Cluster-dedicated KDC (KDC on primary node)
<a name="emr-kerberos-localkdc-summary"></a>

This configuration is available with Amazon EMR releases 5.10.0 and higher.

![\[Amazon EMR cluster architecture with master node, core nodes, and task node within a Kerberos realm.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-cluster-dedicated-kdc.png)


**Advantages**
+ Amazon EMR has full ownership of the KDC.
+ The KDC on the EMR cluster is independent from centralized KDC implementations such as Microsoft Active Directory or AWS Managed Microsoft AD.
+ Performance impact is minimal because the KDC manages authentication only for local nodes within the cluster.
+ Optionally, other Kerberized clusters can reference the KDC as an external KDC. For more information, see [External KDC—primary node on a different cluster](#emr-kerberos-extkdc-cluster-summary).

**Considerations and limitations**
+ Kerberized clusters cannot authenticate to one another, so applications cannot interoperate. If cluster applications need to interoperate, you must establish a cross-realm trust between clusters, or set up one cluster as the external KDC for other clusters. If a cross-realm trust is established, the KDCs must have different Kerberos realms.
+ You must create Linux users on the EC2 instance of the primary node that correspond to KDC user principals, along with the HDFS directories for each user.
+ User principals must use an EC2 private key file and `kinit` credentials to connect to the cluster using SSH.

## Cross-realm trust
<a name="emr-kerberos-crossrealm-summary"></a>

In this configuration, principals (usually users) from a different Kerberos realm authenticate to application components on a Kerberized EMR cluster, which has its own KDC. The KDC on the primary node establishes a trust relationship with another KDC using a *cross-realm principal* that exists in both KDCs. The principal name and the password match precisely in each KDC. Cross-realm trusts are most common with Active Directory implementations, as shown in the following diagram. Cross-realm trusts with an external MIT KDC or a KDC on another Amazon EMR cluster are also supported.

![\[Amazon EMR clusters in different Kerberos realms with cross-realm trust to Active Directory.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-cross-realm-trust.png)


**Advantages**
+ The EMR cluster on which the KDC is installed maintains full ownership of the KDC.
+ With Active Directory, Amazon EMR automatically creates Linux users that correspond to user principals from the KDC. You still must create HDFS directories for each user. In addition, user principals in the Active Directory domain can access Kerberized clusters using `kinit` credentials, without the EC2 private key file. This eliminates the need to share the private key file among cluster users.
+ Because each cluster KDC manages authentication for the nodes in the cluster, the effects of network latency and processing overhead for a large number of nodes across clusters are minimized.

**Considerations and limitations**
+ If you are establishing a trust with an Active Directory realm, you must provide an Active Directory user name and password with permissions to join principals to the domain when you create the cluster.
+ Cross-realm trusts cannot be established between Kerberos realms with the same name.
+ Cross-realm trusts must be established explicitly. For example, if Cluster A and Cluster B both establish a cross-realm trust with a KDC, they do not inherently trust one another and their applications cannot authenticate to one another to interoperate.
+ KDCs must be maintained independently and coordinated so that credentials of user principals match precisely.

## External KDC
<a name="emr-kerberos-extkdc-summary"></a>

Configurations with an external KDC are supported with Amazon EMR releases 5.20.0 and higher.
+ [External KDC—MIT KDC](#emr-kerberos-extkdc-mit-summary)
+ [External KDC—primary node on a different cluster](#emr-kerberos-extkdc-cluster-summary)
+ [External KDC—cluster KDC on a different cluster with Active Directory cross-realm trust](#emr-kerberos-extkdc-ad-trust-summary)

### External KDC—MIT KDC
<a name="emr-kerberos-extkdc-mit-summary"></a>

This configuration allows one or more EMR clusters to use principals defined and maintained in an MIT KDC server.

![\[Amazon EMR cluster architecture with Kerberos realm, showing master, core, and task nodes.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-external-kdc.png)


**Advantages**
+ Managing principals is consolidated in a single KDC.
+ Multiple clusters can use the same KDC in the same Kerberos realm. For more information, see [Requirements for using multiple clusters with the same KDC](#emr-kerberos-multi-kdc).
+ The primary node on a Kerberized cluster does not have the performance burden associated with maintaining the KDC.

**Considerations and limitations**
+ You must create Linux users on the EC2 instance of each Kerberized cluster's primary node that correspond to KDC user principals, along with the HDFS directories for each user.
+ User principals must use an EC2 private key file and `kinit` credentials to connect to Kerberized clusters using SSH.
+ Each node in Kerberized EMR clusters must have a network route to the KDC.
+ Each node in Kerberized clusters places an authentication burden on the external KDC, so the configuration of the KDC affects cluster performance. When you configure the hardware of the KDC server, consider the maximum number of Amazon EMR nodes to be supported simultaneously.
+ Cluster performance is dependent on the network latency between nodes in Kerberized clusters and the KDC.
+ Troubleshooting can be more difficult because of interdependencies.

### External KDC—primary node on a different cluster
<a name="emr-kerberos-extkdc-cluster-summary"></a>

This configuration is nearly identical to the external MIT KDC implementation above, except that the KDC is on the primary node of an EMR cluster. For more information, see [Cluster-dedicated KDC (KDC on primary node)](#emr-kerberos-localkdc-summary) and [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md).

![\[Diagram of Amazon EMR clusters with Kerberos realm, showing master and core nodes.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-external-cluster-kdc.png)


**Advantages**
+ Managing principals is consolidated in a single KDC.
+ Multiple clusters can use the same KDC in the same Kerberos realm. For more information, see [Requirements for using multiple clusters with the same KDC](#emr-kerberos-multi-kdc).

**Considerations and limitations**
+ You must create Linux users on the EC2 instance of each Kerberized cluster's primary node that correspond to KDC user principals, along with the HDFS directories for each user.
+ User principals must use an EC2 private key file and `kinit` credentials to connect to Kerberized clusters using SSH.
+ Each node in each EMR cluster must have a network route to the KDC.
+ Each Amazon EMR node in Kerberized clusters places an authentication burden on the external KDC, so the configuration of the KDC affects cluster performance. When you configure the hardware of the KDC server, consider the maximum number of Amazon EMR nodes to be supported simultaneously.
+ Cluster performance is dependent on the network latency between nodes in the clusters and the KDC.
+ Troubleshooting can be more difficult because of interdependencies.

### External KDC—cluster KDC on a different cluster with Active Directory cross-realm trust
<a name="emr-kerberos-extkdc-ad-trust-summary"></a>

In this configuration, you first create a cluster with a cluster-dedicated KDC that has a one-way cross-realm trust with Active Directory. For a detailed tutorial, see [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md). You then launch additional clusters, referencing the cluster KDC that has the trust as an external KDC. For an example, see [External cluster KDC with Active Directory cross-realm trust](emr-kerberos-config-examples.md#emr-kerberos-example-extkdc-ad-trust). This allows each Amazon EMR cluster that uses the external KDC to authenticate principals defined and maintained in a Microsoft Active Directory domain.

![\[Amazon EMR clusters with Kerberos authentication and Active Directory integration diagram.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-external-ad-trust-kdc.png)


**Advantages**
+ Managing principals is consolidated in the Active Directory domain.
+ Amazon EMR joins the Active Directory realm, which eliminates the need to create Linux users that correspond to Active Directory users. You still must create HDFS directories for each user.
+ Multiple clusters can use the same KDC in the same Kerberos realm. For more information, see [Requirements for using multiple clusters with the same KDC](#emr-kerberos-multi-kdc).
+ User principals in the Active Directory domain can access Kerberized clusters using `kinit` credentials, without the EC2 private key file. This eliminates the need to share the private key file among cluster users.
+ Only one Amazon EMR primary node has the burden of maintaining the KDC, and only that cluster must be created with Active Directory credentials for the cross-realm trust between the KDC and Active Directory.

**Considerations and limitations**
+ Each node in each EMR cluster must have a network route to the KDC and the Active Directory domain controller.
+ Each Amazon EMR node places an authentication burden on the external KDC, so the configuration of the KDC affects cluster performance. When you configure the hardware of the KDC server, consider the maximum number of Amazon EMR nodes to be supported simultaneously.
+ Cluster performance is dependent on the network latency between nodes in the clusters and the KDC server.
+ Troubleshooting can be more difficult because of interdependencies.

## Requirements for using multiple clusters with the same KDC
<a name="emr-kerberos-multi-kdc"></a>

Multiple clusters can use the same KDC in the same Kerberos realm. However, if the clusters run concurrently, they might fail if they use Kerberos service principal names that conflict.

If you have multiple concurrent clusters with the same external KDC, then ensure that the clusters use different Kerberos realms. If the clusters must use the same Kerberos realm, then ensure that the clusters are in different subnets, and that their CIDR ranges don’t overlap. 
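One way to check the non-overlap requirement before you launch clusters is to compare the CIDR blocks of the candidate subnets. The following is a minimal sketch for IPv4 only. The helper names `ip_to_int` and `cidrs_overlap` and the sample ranges are hypothetical and are not part of the AWS CLI.

```shell
#!/bin/bash
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    local IFS=.
    # Split the address on dots into the positional parameters $1..$4
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if two IPv4 CIDR blocks overlap, 1 otherwise.
# cidrs_overlap is a hypothetical helper for illustration only.
cidrs_overlap() {
    local net1="${1%/*}" len1="${1#*/}"
    local net2="${2%/*}" len2="${2#*/}"
    # Two blocks overlap exactly when they agree on the shorter of the two prefixes
    local len=$(( len1 < len2 ? len1 : len2 ))
    local mask=$(( len == 0 ? 0 : (0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$net1") & mask )) -eq $(( $(ip_to_int "$net2") & mask )) ]
}

cidrs_overlap 10.0.0.0/24 10.0.1.0/24 && echo "overlap" || echo "no overlap"
cidrs_overlap 10.0.0.0/16 10.0.1.0/24 && echo "overlap" || echo "no overlap"
```

In practice you would feed the function the CIDR blocks of your cluster subnets, for example as reported by `aws ec2 describe-subnets`. The sketch does not validate its input and does not handle IPv6.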

# Configuring Kerberos on Amazon EMR
<a name="emr-kerberos-configure"></a>

This section provides configuration details and examples for setting up Kerberos with common architectures. Regardless of the architecture you choose, the configuration basics are the same and done in three steps. If you use an external KDC or set up a cross-realm trust, you must ensure that every node in a cluster has a network route to the external KDC, including the configuration of applicable security groups to allow inbound and outbound Kerberos traffic.
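To confirm that a node has a network route to an external KDC, you can test the Kerberos ports from that node. The following sketch assumes the KDC listens on TCP 88 and the admin server on TCP 749, which are the ports used in the configuration examples in this guide. The function names and the host name in the usage line are hypothetical.

```shell
#!/bin/bash
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# kdc_port_open is a hypothetical helper; it relies on bash's /dev/tcp
# feature and on the coreutils timeout command.
kdc_port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Report the state of the KDC port (88) and the admin server port (749).
check_kdc_ports() {
    local host="$1" port
    for port in 88 749; do
        if kdc_port_open "$host" "$port"; then
            echo "TCP ${port} reachable on ${host}"
        else
            echo "TCP ${port} NOT reachable on ${host}"
        fi
    done
}
```

For example, you might run `check_kdc_ports kdc.example.internal` on a cluster node, where the host name is a placeholder for your KDC. Kerberos can also use UDP, which this TCP-only check does not cover, so treat a passing result as a partial confirmation of your security group rules.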

## Step 1: Create a security configuration with Kerberos properties
<a name="emr-kerberos-step1-summary"></a>

The security configuration specifies details about the Kerberos KDC, and allows the Kerberos configuration to be re-used each time you create a cluster. You can create a security configuration using the Amazon EMR console, the AWS CLI, or the EMR API. The security configuration can also contain other security options, such as encryption. For more information about creating security configurations and specifying a security configuration when you create a cluster, see [Use security configurations to set up Amazon EMR cluster security](emr-security-configurations.md). For information about Kerberos properties in a security configuration, see [Kerberos settings for security configurations](emr-kerberos-configure-settings.md#emr-kerberos-security-configuration).

## Step 2: Create a cluster and specify cluster-specific Kerberos attributes
<a name="emr-kerberos-step2-summary"></a>

When you create a cluster, you specify a Kerberos security configuration along with cluster-specific Kerberos options. When you use the Amazon EMR console, only the Kerberos options compatible with the specified security configuration are available. When you use the AWS CLI or Amazon EMR API, ensure that you specify Kerberos options compatible with the specified security configuration. For example, if you specify a principal password for a cross-realm trust when you create a cluster using the CLI, and the specified security configuration is not configured with cross-realm trust parameters, an error occurs. For more information, see [Kerberos settings for clusters](emr-kerberos-configure-settings.md#emr-kerberos-cluster-configuration).
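When you script cluster creation, one way to keep the cluster attributes compatible with the security configuration is to build the `--kerberos-attributes` value conditionally. The following sketch adds the cross-realm password only when one is supplied. The function name `build_kerberos_attributes` is hypothetical. The attribute keys are the ones used in the configuration examples in this guide.

```shell
#!/bin/bash
# Build the value for --kerberos-attributes from individual settings.
# build_kerberos_attributes is a hypothetical helper, not an AWS CLI feature.
build_kerberos_attributes() {
    local realm="$1" kdc_admin_password="$2" cross_realm_password="${3:-}"
    local attrs="Realm=${realm},KdcAdminPassword=${kdc_admin_password}"
    # Only add the cross-realm key when the security configuration defines a trust
    if [ -n "$cross_realm_password" ]; then
        attrs="${attrs},CrossRealmTrustPrincipalPassword=${cross_realm_password}"
    fi
    echo "$attrs"
}
```

You could then pass the result to the CLI as `--kerberos-attributes "$(build_kerberos_attributes EC2.INTERNAL "$KDC_ADMIN_PASSWORD")"`, where `KDC_ADMIN_PASSWORD` is an assumed environment variable. The sketch does not escape special characters, so a password that contains a comma would need different handling.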

## Step 3: Configure the cluster primary node
<a name="emr-kerberos-step3-summary"></a>

Depending on the requirements of your architecture and implementation, additional setup on the cluster may be required. You can perform this setup after you create the cluster, or by using steps or bootstrap actions during cluster creation.

For each Kerberos-authenticated user that connects to the cluster using SSH, you must ensure that a Linux account exists that corresponds to the Kerberos user. If user principals are provided by an Active Directory domain controller, either as the external KDC or through a cross-realm trust, Amazon EMR creates Linux accounts automatically. If Active Directory is not used, you must create a principal for each user that corresponds to the user's Linux account. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).

Each user must also have an HDFS user directory that they own, which you must create. In addition, SSH must be configured with GSSAPI enabled to allow connections from Kerberos-authenticated users. GSSAPI must be enabled on the primary node, and the client SSH application must be configured to use GSSAPI. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).

# Security configuration and cluster settings for Kerberos on Amazon EMR
<a name="emr-kerberos-configure-settings"></a>

When you create a Kerberized cluster, you specify the security configuration together with Kerberos attributes that are specific to the cluster. You can't specify one set without the other, or an error occurs.

This topic provides an overview of the configuration parameters available for Kerberos when you create a security configuration and a cluster. In addition, CLI examples for creating compatible security configurations and clusters are provided for common architectures.

## Kerberos settings for security configurations
<a name="emr-kerberos-security-configuration"></a>

You can create a security configuration that specifies Kerberos attributes using the Amazon EMR console, the AWS CLI, or the EMR API. The security configuration can also contain other security options, such as encryption. For more information, see [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md).

Use the following references to understand the available security configuration settings for the Kerberos architecture that you choose. Amazon EMR console settings are shown. For corresponding CLI options, see [Specifying Kerberos settings using the AWS CLI](emr-create-security-configuration.md#emr-kerberos-cli-parameters) or [Configuration examples](emr-kerberos-config-examples.md).

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-kerberos-configure-settings.html)

## Kerberos settings for clusters
<a name="emr-kerberos-cluster-configuration"></a>

You can specify Kerberos settings when you create a cluster using the Amazon EMR console, the AWS CLI, or the EMR API.

Use the following references to understand the available cluster configuration settings for the Kerberos architecture that you choose. Amazon EMR console settings are shown. For corresponding CLI options, see [Configuration examples](emr-kerberos-config-examples.md).


| Parameter | Description | 
| --- | --- | 
|  Realm  |  The Kerberos realm name for the cluster. The Kerberos convention is to set this to be the same as the domain name, but in uppercase. For example, for the domain `ec2.internal`, use `EC2.INTERNAL` as the realm name.  | 
|  KDC admin password  |  The password used within the cluster for `kadmin` or `kadmin.local`. These are command-line interfaces to the Kerberos V5 administration system, which maintains Kerberos principals, password policies, and keytabs for the cluster.   | 
|  Cross-realm trust principal password (optional)  |  Required when establishing a cross-realm trust. The cross-realm principal password, which must be identical across realms. Use a strong password.  | 
|  Active Directory domain join user (optional)  |  Required when using Active Directory in a cross-realm trust. This is the user logon name of an Active Directory account with permission to join computers to the domain. Amazon EMR uses this identity to join the cluster to the domain. For more information, see [Step 3: Add accounts to the domain for the EMR Cluster](emr-kerberos-cross-realm.md#emr-kerberos-ad-users).  | 
|  Active Directory domain join password (optional)  |  The password for the Active Directory domain join user. For more information, see [Step 3: Add accounts to the domain for the EMR Cluster](emr-kerberos-cross-realm.md#emr-kerberos-ad-users).  | 
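The realm naming convention in the table can be applied mechanically. The following sketch derives a realm name by converting the domain name to uppercase. The function name `realm_from_domain` is hypothetical.

```shell
#!/bin/bash
# Derive a realm name from a DNS domain by uppercasing it,
# following the Kerberos naming convention.
# realm_from_domain is a hypothetical helper, not an AWS CLI command.
realm_from_domain() {
    printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]'
}

realm_from_domain "ec2.internal"
```

The convention is a recommendation only. The realm you specify must still match the realm that your KDC is configured to use.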

# Configuration examples
<a name="emr-kerberos-config-examples"></a>

The following examples demonstrate security configurations and cluster configurations for common scenarios. AWS CLI commands are shown for brevity.

## Local KDC
<a name="emr-kerberos-example-local-kdc"></a>

The following commands create a cluster with a cluster-dedicated KDC running on the primary node. Additional configuration on the cluster is required. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name LocalKDCSecurityConfig \
--security-configuration '{"AuthenticationConfiguration":
{"KerberosConfiguration": {"Provider": "ClusterDedicatedKdc",
"ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24 }}}}'
```
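An alternative to inline JSON is to keep the security configuration in a file and reference it with the AWS CLI `file://` prefix, which avoids shell quoting problems. The following sketch writes the same local KDC configuration to a file. The file name `local-kdc-security-config.json` and the wrapper function name are arbitrary choices for this illustration.

```shell
#!/bin/bash
# Write the Kerberos security configuration to a file (the file name is an arbitrary choice).
CONFIG_FILE="local-kdc-security-config.json"
cat > "$CONFIG_FILE" <<'EOF'
{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": {
        "TicketLifetimeInHours": 24
      }
    }
  }
}
EOF

# Wrapped in a function so that nothing is sent to AWS until you call it explicitly.
create_local_kdc_security_config() {
    aws emr create-security-configuration --name LocalKDCSecurityConfig \
        --security-configuration "file://${CONFIG_FILE}"
}
```

Calling `create_local_kdc_security_config` submits the request using your configured AWS credentials and Region.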

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge \
--applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole \
--security-configuration LocalKDCSecurityConfig \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=MyPassword
```

## Cluster-dedicated KDC with Active Directory cross-realm trust
<a name="emr-kerberos-example-crossrealm"></a>

The following commands create a cluster with a cluster-dedicated KDC running on the primary node with a cross-realm trust to an Active Directory domain. Additional configuration on the cluster and in Active Directory is required. For more information, see [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name LocalKDCWithADTrustSecurityConfig \
--security-configuration '{"AuthenticationConfiguration":
{"KerberosConfiguration": {"Provider": "ClusterDedicatedKdc",
"ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24,
"CrossRealmTrustConfiguration": {"Realm":"AD.DOMAIN.COM",
"Domain":"ad.domain.com", "AdminServer":"ad.domain.com",
"KdcServer":"ad.domain.com"}}}}}'
```

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge --applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole --security-configuration LocalKDCWithADTrustSecurityConfig \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=MyClusterKDCAdminPassword,\
ADDomainJoinUser=ADUserLogonName,ADDomainJoinPassword=ADUserPassword,\
CrossRealmTrustPrincipalPassword=MatchADTrustPassword
```

## External KDC on a different cluster
<a name="emr-kerberos-example-extkdc-cluster"></a>

The following commands create a cluster that references a cluster-dedicated KDC on the primary node of a different cluster to authenticate principals. Additional configuration on the cluster is required. For more information, see [Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections](emr-kerberos-configuration-users.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name ExtKDCOnDifferentCluster \
--security-configuration '{"AuthenticationConfiguration":
{"KerberosConfiguration": {"Provider": "ExternalKdc",
"ExternalKdcConfiguration": {"KdcServerType": "Single",
"AdminServer": "MasterDNSOfKDCMaster:749",
"KdcServer": "MasterDNSOfKDCMaster:88"}}}}'
```

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge \
--applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole --security-configuration ExtKDCOnDifferentCluster \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=KDCOnMasterPassword
```

## External cluster KDC with Active Directory cross-realm trust
<a name="emr-kerberos-example-extkdc-ad-trust"></a>

The following commands create a cluster with no KDC. The cluster references a cluster-dedicated KDC running on the primary node of another cluster to authenticate principals. That KDC has a cross-realm trust with an Active Directory domain controller. Additional configuration on the primary node with the KDC is required. For more information, see [Tutorial: Configure a cross-realm trust with an Active Directory domain](emr-kerberos-cross-realm.md).

**Create Security Configuration**

```
aws emr create-security-configuration --name ExtKDCWithADIntegration \
--security-configuration '{"AuthenticationConfiguration":
{"KerberosConfiguration": {"Provider": "ExternalKdc",
"ExternalKdcConfiguration": {"KdcServerType": "Single",
"AdminServer": "MasterDNSofClusterKDC:749",
"KdcServer": "MasterDNSofClusterKDC:88",
"AdIntegrationConfiguration": {"AdRealm":"AD.DOMAIN.COM",
"AdDomain":"ad.domain.com",
"AdServer":"ad.domain.com"}}}}}'
```

**Create Cluster**

```
aws emr create-cluster --release-label emr-7.12.0 \
--instance-count 3 --instance-type m5.xlarge --applications Name=Hadoop Name=Hive \
--ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2Key \
--service-role EMR_DefaultRole --security-configuration ExtKDCWithADIntegration \
--kerberos-attributes Realm=EC2.INTERNAL,KdcAdminPassword=KDCOnMasterPassword,\
ADDomainJoinUser=MyPrivilegedADUserName,ADDomainJoinPassword=PasswordForADDomainJoinUser
```

# Configuring an Amazon EMR cluster for Kerberos-authenticated HDFS users and SSH connections
<a name="emr-kerberos-configuration-users"></a>

Amazon EMR creates Kerberos-authenticated user clients for the applications that run on the cluster—for example, the `hadoop` user, `spark` user, and others. You can also add users who are authenticated to cluster processes using Kerberos. Authenticated users can then connect to the cluster with their Kerberos credentials and work with applications. For a user to authenticate to the cluster, the following configurations are required:
+ A Linux account matching the Kerberos principal in the KDC must exist on the cluster. Amazon EMR does this automatically in architectures that integrate with Active Directory.
+ You must create an HDFS user directory on the primary node for each user, and give the user permissions to the directory.
+ You must configure the SSH service so that GSSAPI is enabled on the primary node. In addition, users must have an SSH client with GSSAPI enabled.
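After these requirements are met, a user's connection from a client machine might look like the following sketch. It assumes that the MIT Kerberos client tools and an OpenSSH client are installed and that the client's Kerberos configuration points to the correct KDC. The function name `connect_with_kerberos` and its arguments are hypothetical placeholders.

```shell
#!/bin/bash
# Obtain a Kerberos ticket, then open an SSH session with GSSAPI enabled.
# connect_with_kerberos is a hypothetical helper; user, realm, and host are placeholders.
connect_with_kerberos() {
    local user="$1" realm="$2" primary_dns="$3"
    # Obtain a ticket-granting ticket; kinit prompts for the principal's password
    kinit "${user}@${realm}" || return 1
    # Enable GSSAPI for this connection only, without editing the SSH client configuration file
    ssh -o GSSAPIAuthentication=yes -o GSSAPIDelegateCredentials=yes \
        "${user}@${primary_dns}"
}
```

A call such as `connect_with_kerberos lijuan EC2.INTERNAL <primary-node-public-DNS>` would request a ticket and then connect. In architectures without Active Directory integration, you also need to supply the EC2 private key file, for example with the `ssh -i` option.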

## Adding Linux users and Kerberos principals to the primary node
<a name="emr-kerberos-configure-linux-kdc"></a>

If you do not use Active Directory, you must create Linux accounts on the cluster primary node and add principals for these Linux users to the KDC. This includes a principal in the KDC for the primary node. In addition to the user principals, the KDC running on the primary node needs a principal for the local host.

When your architecture includes Active Directory integration, Linux users and principals on the local KDC, if applicable, are created automatically. You can skip this step. For more information, see [Cross-realm trust](emr-kerberos-options.md#emr-kerberos-crossrealm-summary) and [External KDC—cluster KDC on a different cluster with Active Directory cross-realm trust](emr-kerberos-options.md#emr-kerberos-extkdc-ad-trust-summary).

**Important**  
The KDC, along with the database of principals, is lost when the primary node terminates because the primary node uses ephemeral storage. If you create users for SSH connections, we recommend that you establish a cross-realm trust with an external KDC configured for high-availability. Alternatively, if you create users for SSH connections using Linux accounts, automate the account creation process using bootstrap actions and scripts so that it can be repeated when you create a new cluster.

Submitting a step to the cluster after you create it or when you create the cluster is the easiest way to add users and KDC principals. Alternatively, you can connect to the primary node using an EC2 key pair as the default `hadoop` user to run the commands. For more information, see [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md).

The following example submits a bash script `configureCluster.sh` to a cluster that already exists, referencing its cluster ID. The script is saved to Amazon S3. 

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://amzn-s3-demo-bucket/configureCluster.sh"]
```
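
In the command above, `region` in the `Jar` path stands for the AWS Region where the cluster runs. As a minimal sketch, the `--steps` value could be assembled with a small helper so that the Region and script location are supplied once. The function name `script_runner_step` and its argument order are illustrative and are not part of the AWS CLI.

```shell
#!/bin/bash
# Build the value for --steps that runs a script from Amazon S3 through
# script-runner.jar. Arguments: <AWS Region> <S3 URI of the script>
# The function name and argument order are illustrative.
script_runner_step() {
    local region="$1" script_uri="$2"
    printf 'Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,Jar=s3://%s.elasticmapreduce/libs/script-runner/script-runner.jar,Args=[%s]\n' \
        "$region" "$script_uri"
}

# Preview the value for a cluster in us-west-2.
script_runner_step us-west-2 s3://amzn-s3-demo-bucket/configureCluster.sh

# Example submission (commented out because it needs the AWS CLI and credentials):
# aws emr add-steps --cluster-id j-2AL4XXXXXX5T9 \
#     --steps "$(script_runner_step us-west-2 s3://amzn-s3-demo-bucket/configureCluster.sh)"
```

Quoting the whole `--steps` value keeps the shell from interpreting the square brackets in `Args`.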

The following example demonstrates the contents of the `configureCluster.sh` script. The script also handles creating HDFS user directories and enabling GSSAPI for SSH, which are covered in the following sections.

```
#!/bin/bash
# Add a principal to the KDC for the primary node, using the primary node's returned host name
sudo kadmin.local -q "ktadd -k /etc/krb5.keytab host/$(hostname -f)"
# Declare an associative array of user names and passwords to add
declare -A arr
arr=([lijuan]=pwd1 [marymajor]=pwd2 [richardroe]=pwd3)
for i in "${!arr[@]}"; do
    # Assign plain language variables for clarity
    name=${i}
    password=${arr[${i}]}

    # Create a principal for each user in the primary node and require a new password on first logon
    sudo kadmin.local -q "addprinc -pw $password +needchange $name"

    # Add an HDFS directory for each user
    hdfs dfs -mkdir "/user/$name"

    # Change the owner of each user's HDFS directory to that user
    hdfs dfs -chown "$name:$name" "/user/$name"
done

# Enable GSSAPI authentication for SSH and restart SSH service
sudo sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' /etc/ssh/sshd_config
sudo sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```

## Adding user HDFS directories
<a name="emr-kerberos-configure-HDFS"></a>

To allow your users to log in to the cluster to run Hadoop jobs, you must add HDFS user directories for their Linux accounts, and grant each user ownership of their directory.

Submitting a step to the cluster after you create it or when you create the cluster is the easiest way to create HDFS directories. Alternatively, you could connect to the primary node using an EC2 key pair as the default `hadoop` user to run the commands. For more information, see [Connect to the Amazon EMR cluster primary node using SSH](emr-connect-master-node-ssh.md).

The following example submits a bash script `AddHDFSUsers.sh` to a cluster that already exists, referencing its cluster ID. The script is saved to Amazon S3. 

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/AddHDFSUsers.sh"]
```

The following example demonstrates the contents of the `AddHDFSUsers.sh` script.

```
#!/bin/bash
# AddHDFSUsers.sh script

# Initialize an array of user names from AD, or Linux users created manually on the cluster
ADUSERS=("lijuan" "marymajor" "richardroe" "myusername")

# For each user listed, create an HDFS user directory
# and change ownership to the user

for username in "${ADUSERS[@]}"; do
     hdfs dfs -mkdir "/user/$username"
     hdfs dfs -chown "$username:$username" "/user/$username"
done
```

## Enabling GSSAPI for SSH
<a name="emr-kerberos-ssh-config"></a>

For Kerberos-authenticated users to connect to the primary node using SSH, the SSH service must have GSSAPI authentication enabled. To enable GSSAPI, run the following commands from the primary node command line or use a step to run it as a script. After reconfiguring SSH, you must restart the service.

```
sudo sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' /etc/ssh/sshd_config
sudo sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```
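
The `sed` commands above rewrite only lines that already mention the setting, so a configuration file with no `GSSAPIAuthentication` line at all is left unchanged. The following sketch shows one way to confirm the result before relying on it. The function name `gssapi_enabled` is illustrative.

```shell
#!/bin/bash
# Succeed if the given sshd_config-style file contains an uncommented
# "GSSAPIAuthentication yes" line. The function name is illustrative.
gssapi_enabled() {
    local config_file="$1"
    grep -Eiq '^[[:space:]]*GSSAPIAuthentication[[:space:]]+yes[[:space:]]*$' "$config_file"
}

# Check the live configuration when it is readable.
if [ -r /etc/ssh/sshd_config ]; then
    if gssapi_enabled /etc/ssh/sshd_config; then
        echo "GSSAPI authentication is enabled"
    else
        echo "GSSAPI authentication is not enabled"
    fi
fi
```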

# Using SSH to connect to Kerberized clusters with Amazon EMR
<a name="emr-kerberos-connect-ssh"></a>

This section demonstrates the steps for a Kerberos-authenticated user to connect to the primary node of an EMR cluster.

Each computer that is used for an SSH connection must have SSH client and Kerberos client applications installed. Linux computers most likely include these by default. For example, OpenSSH is installed on most Linux, Unix, and macOS operating systems. You can check for an SSH client by typing **ssh** at the command line. If your computer does not recognize the command, install an SSH client to connect to the primary node. The OpenSSH project provides a free implementation of the full suite of SSH tools. For more information, see the [OpenSSH](http://www.openssh.org/) website. Windows users can use applications such as [PuTTY](http://www.chiark.greenend.org.uk/~sgtatham/putty/) as an SSH client. 

For more information about SSH connections, see [Connect to an Amazon EMR cluster](emr-connect-master-node.md).

SSH uses GSSAPI for authenticating Kerberos clients, and you must enable GSSAPI authentication for the SSH service on the cluster primary node. For more information, see [Enabling GSSAPI for SSH](emr-kerberos-configuration-users.md#emr-kerberos-ssh-config). SSH clients must also use GSSAPI.

In the following examples, for *MasterPublicDNS* use the value that appears for **Master public DNS** on the **Summary** tab of the cluster details pane—for example, *ec2-11-222-33-44.compute-1.amazonaws.com*.

## Prerequisite for krb5.conf (non-Active Directory)
<a name="emr-kerberos-conffile"></a>

When using a configuration without Active Directory integration, in addition to the SSH client and Kerberos client applications, each client computer must have a copy of the `/etc/krb5.conf` file that matches the `/etc/krb5.conf` file on the cluster primary node.

**To copy the krb5.conf file**

1. Use SSH to connect to the primary node using an EC2 key pair and the default `hadoop` user—for example, `hadoop@MasterPublicDNS`. For detailed instructions, see [Connect to an Amazon EMR cluster](emr-connect-master-node.md).

1. From the primary node, copy the contents of the `/etc/krb5.conf` file. For more information, see [Connect to an Amazon EMR cluster](emr-connect-master-node.md).

1. On each client computer that will connect to the cluster, create an identical `/etc/krb5.conf` file based on the copy that you made in the previous step.
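
The procedure above could be scripted along the following lines. This is a sketch under assumptions: the function names are illustrative, the key pair path is whatever you use for the `hadoop` user, and the comparison simply checks that two files are byte-for-byte identical.

```shell
#!/bin/bash
# Fetch the primary node's krb5.conf over SSH into a local file.
# Arguments: <path to EC2 key pair file> <MasterPublicDNS> <destination file>
fetch_krb5_conf() {
    local key_file="$1" primary_dns="$2" dest="$3"
    ssh -i "$key_file" "hadoop@${primary_dns}" 'cat /etc/krb5.conf' > "$dest"
}

# Succeed only if the two files are byte-for-byte identical.
krb5_conf_matches() {
    cmp -s "$1" "$2"
}

# Example (commented out because it needs a running cluster):
# fetch_krb5_conf ~/MyEC2KeyPair.pem MasterPublicDNS ./krb5.conf.emr
# krb5_conf_matches ./krb5.conf.emr /etc/krb5.conf || echo "Local krb5.conf differs from the cluster copy"
```

Because the KDC is recreated with each new cluster, repeating the comparison after you launch a replacement cluster can catch a stale client file.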

## Using kinit and SSH
<a name="emr-kerberos-kinit-ssh"></a>

Each time a user connects from a client computer using Kerberos credentials, the user must first renew Kerberos tickets for their user on the client computer. In addition, the SSH client must be configured to use GSSAPI authentication.

**To use SSH to connect to a Kerberized EMR cluster**

1. Use `kinit` to renew your Kerberos tickets, as shown in the following example.

   ```
   kinit user1
   ```

1. Use an `ssh` client along with the principal that you created in the cluster-dedicated KDC or Active Directory user name. Make sure that GSSAPI authentication is enabled as shown in the following examples.

   **Example: Linux users**

   The `-K` option specifies GSSAPI authentication.

   ```
   ssh -K user1@MasterPublicDNS
   ```

   **Example: Windows users (PuTTY)**

   Make sure that the GSSAPI authentication option for the session is enabled as shown:  
![\[PuTTY Configuration window showing GSSAPI authentication options and library preferences.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/kerb-gssapi-putty.png)
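
For OpenSSH clients, an alternative to passing `-K` on every connection is a per-host entry in the client configuration file. The sketch below prints such an entry. The host alias `emr-primary` and the function name are illustrative, and the two GSSAPI options are assumed to be the configuration-file counterparts of what `-K` enables.

```shell
#!/bin/bash
# Print an OpenSSH client configuration stanza that turns on GSSAPI
# authentication for one host.
# Arguments: <host alias> <MasterPublicDNS> <user>
gssapi_ssh_stanza() {
    local host_alias="$1" primary_dns="$2" user="$3"
    cat <<EOF
Host ${host_alias}
    HostName ${primary_dns}
    User ${user}
    GSSAPIAuthentication yes
    GSSAPIDelegateCredentials yes
EOF
}

# Preview the stanza; append it to ~/.ssh/config after reviewing it.
gssapi_ssh_stanza emr-primary MasterPublicDNS user1
```

With the entry in place, `ssh emr-primary` after a successful `kinit` would use the same settings.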

# Tutorial: Configure a cluster-dedicated KDC with Amazon EMR
<a name="emr-kerberos-cluster-kdc"></a>

This topic guides you through creating a cluster with a cluster-dedicated *key distribution center (KDC)*, manually adding Linux accounts to all cluster nodes, adding Kerberos principals to the KDC on the primary node, and ensuring that client computers have a Kerberos client installed.

For more information on Amazon EMR support for Kerberos and KDC, as well as links to MIT Kerberos Documentation, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md).

## Step 1: Create the Kerberized cluster
<a name="emr-kerberos-clusterdedicated-cluster"></a>

1. Create a security configuration that enables Kerberos. The following example demonstrates a `create-security-configuration` command using the AWS CLI that specifies the security configuration as an inline JSON structure. You can also reference a file saved locally.

   ```
   aws emr create-security-configuration --name MyKerberosConfig \
   --security-configuration '{"AuthenticationConfiguration": {"KerberosConfiguration": 
   {"Provider": "ClusterDedicatedKdc", "ClusterDedicatedKdcConfiguration": {"TicketLifetimeInHours": 24}}}}'
   ```

1. Create a cluster that references the security configuration, establishes Kerberos attributes for the cluster, and adds Linux accounts using a bootstrap action. The following example demonstrates a `create-cluster` command using the AWS CLI. The command references the security configuration that you created above, `MyKerberosConfig`. It also references a simple script, `createlinuxusers.sh`, as a bootstrap action, which you create and upload to Amazon S3 before creating the cluster.

   ```
   aws emr create-cluster --name "MyKerberosCluster" \
   --release-label emr-7.12.0 \
   --instance-type m5.xlarge \
   --instance-count 3 \
   --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair \
   --service-role EMR_DefaultRole \
   --security-configuration MyKerberosConfig \
   --applications Name=Hadoop Name=Hive Name=Oozie Name=Hue Name=HCatalog Name=Spark \
   --kerberos-attributes Realm=EC2.INTERNAL,\
   KdcAdminPassword=MyClusterKDCAdminPwd \
   --bootstrap-actions Path=s3://amzn-s3-demo-bucket/createlinuxusers.sh
   ```

   The following code demonstrates the contents of the `createlinuxusers.sh` script, which adds user1, user2, and user3 to each node in the cluster. In the next step, you add these users as KDC principals.

   ```
   #!/bin/bash
   sudo adduser user1
   sudo adduser user2
   sudo adduser user3
   ```
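
The first step above notes that the security configuration can also come from a file saved locally. As a minimal sketch, the same JSON could be written to a file and passed with the AWS CLI `file://` prefix. The file location and function name are illustrative.

```shell
#!/bin/bash
# Write the Kerberos security configuration from the first step to a file.
# Arguments: <output path> <ticket lifetime in hours>
write_kerberos_security_config() {
    local out_file="$1" ticket_hours="$2"
    cat > "$out_file" <<EOF
{
  "AuthenticationConfiguration": {
    "KerberosConfiguration": {
      "Provider": "ClusterDedicatedKdc",
      "ClusterDedicatedKdcConfiguration": {
        "TicketLifetimeInHours": ${ticket_hours}
      }
    }
  }
}
EOF
}

# Illustrative location for the generated file.
config_file="${TMPDIR:-/tmp}/kerberos-security-config.json"
write_kerberos_security_config "$config_file" 24
echo "Wrote $config_file"

# Then pass the file to the AWS CLI (commented out because it needs credentials):
# aws emr create-security-configuration --name MyKerberosConfig \
#     --security-configuration "file://$config_file"
```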

## Step 2: Add principals to the KDC, create HDFS user directories, and configure SSH
<a name="emr-kerberos-clusterdedicated-KDC"></a>

The KDC running on the primary node needs a principal added for the local host and for each user that you create on the cluster. You may also create HDFS directories for each user if they need to connect to the cluster and run Hadoop jobs. Similarly, configure the SSH service to enable GSSAPI authentication, which is required for Kerberos. After you enable GSSAPI, restart the SSH service.

The easiest way to accomplish these tasks is to submit a step to the cluster. The following example submits a bash script `configurekdc.sh` to the cluster you created in the previous step, referencing its cluster ID. The script is saved to Amazon S3. Alternatively, you can connect to the primary node using an EC2 key pair to run the commands or submit the step during cluster creation.

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://myregion.elasticmapreduce/libs/script-runner/script-runner.jar,\
Args=["s3://amzn-s3-demo-bucket/configurekdc.sh"]
```

The following code demonstrates the contents of the `configurekdc.sh` script.

```
#!/bin/bash
# Add a principal to the KDC for the primary node, using the primary node's returned host name
sudo kadmin.local -q "ktadd -k /etc/krb5.keytab host/$(hostname -f)"
# Declare an associative array of user names and passwords to add
declare -A arr
arr=([user1]=pwd1 [user2]=pwd2 [user3]=pwd3)
for i in "${!arr[@]}"; do
    # Assign plain language variables for clarity
    name=${i}
    password=${arr[${i}]}

    # Create a principal for each user in the primary node and require a new password on first logon
    sudo kadmin.local -q "addprinc -pw $password +needchange $name"

    # Add an HDFS directory for each user
    hdfs dfs -mkdir "/user/$name"

    # Change the owner of each user's HDFS directory to that user
    hdfs dfs -chown "$name:$name" "/user/$name"
done

# Enable GSSAPI authentication for SSH and restart SSH service
sudo sed -i 's/^.*GSSAPIAuthentication.*$/GSSAPIAuthentication yes/' /etc/ssh/sshd_config
sudo sed -i 's/^.*GSSAPICleanupCredentials.*$/GSSAPICleanupCredentials yes/' /etc/ssh/sshd_config
sudo systemctl restart sshd
```

The users that you added should now be able to connect to the cluster using SSH. For more information, see [Using SSH to connect to Kerberized clusters with Amazon EMR](emr-kerberos-connect-ssh.md).
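
Before users try to connect, it can help to confirm that every expected principal exists in the KDC. The sketch below reads a principal listing on standard input and prints any expected user that is missing. It assumes the listing has one `user@REALM` entry per line, as produced by the `listprincs` command in `kadmin.local`. The function name is illustrative.

```shell
#!/bin/bash
# Read a principal listing on standard input (one principal per line) and
# print each expected user that has no <user>@<realm> entry.
# Arguments: <realm> <user>...
missing_principals() {
    local realm="$1"
    shift
    local listing user
    listing="$(cat)"
    for user in "$@"; do
        if ! grep -Fxq "${user}@${realm}" <<<"$listing"; then
            echo "$user"
        fi
    done
}

# Example on the primary node (commented out because it needs the KDC):
# sudo kadmin.local -q "listprincs" | missing_principals EC2.INTERNAL user1 user2 user3
```

No output means that all of the listed users have principals in the realm.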

# Tutorial: Configure a cross-realm trust with an Active Directory domain
<a name="emr-kerberos-cross-realm"></a>

When you set up a cross-realm trust, you allow principals (usually users) from a different Kerberos realm to authenticate to application components on the EMR cluster. The cluster-dedicated *key distribution center (KDC)* establishes a trust relationship with another KDC using a *cross-realm principal* that exists in both KDCs. The principal name and the password match precisely.

A cross-realm trust requires that the KDCs can reach one another over the network and resolve each other's domain names. Steps for establishing a cross-realm trust relationship with a Microsoft AD domain controller running as an EC2 instance are provided below, along with an example network setup that provides the required connectivity and domain-name resolution. Any network setup that allows the required network traffic between KDCs is acceptable.

Optionally, after you establish a cross-realm trust with Active Directory using a KDC on one cluster, you can create another cluster using a different security configuration to reference the KDC on the first cluster as an external KDC. For an example security configuration and cluster setup, see [External cluster KDC with Active Directory cross-realm trust](emr-kerberos-config-examples.md#emr-kerberos-example-extkdc-ad-trust).

For more information on Amazon EMR support for Kerberos and KDC, as well as links to MIT Kerberos Documentation, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md).

**Important**  
Amazon EMR does not support cross-realm trusts with AWS Directory Service for Microsoft Active Directory.

[Step 1: Set up the VPC and subnet](#emr-kerberos-ad-network)

[Step 2: Launch and install the Active Directory domain controller](#emr-kerberos-ad-dc)

[Step 3: Add accounts to the domain for the EMR Cluster](#emr-kerberos-ad-users)

[Step 4: Configure an incoming trust on the Active Directory domain controller](#emr-kerberos-ad-configure-trust)

[Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server](#emr-kerberos-ad-DHCP)

[Step 6: Launch a Kerberized EMR Cluster](#emr-kerberos-ad-cluster)

[Step 7: Create HDFS users and set permissions on the cluster for Active Directory accounts](#emr-kerberos-ad-hadoopuser)

## Step 1: Set up the VPC and subnet
<a name="emr-kerberos-ad-network"></a>

The following steps demonstrate creating a VPC and subnet so that the cluster-dedicated KDC can reach the Active Directory domain controller and resolve its domain name. In these steps, domain-name resolution is provided by referencing the Active Directory domain controller as the domain name server in the DHCP option set. For more information, see [Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server](#emr-kerberos-ad-DHCP).

The KDC and the Active Directory domain controller must be able to resolve one another's domain names. This allows Amazon EMR to join computers to the domain and automatically configure corresponding Linux accounts and SSH parameters on cluster instances. 

If Amazon EMR can't resolve the domain name, you can reference the trust using the Active Directory domain controller's IP address. However, you must manually add Linux accounts, add corresponding principals to the cluster-dedicated KDC, and configure SSH.

**To set up the VPC and subnet**

1. Create an Amazon VPC with a single public subnet. For more information, see [Step 1: Create the VPC](https://docs.aws.amazon.com/AmazonVPC/latest/GettingStartedGuide/getting-started-ipv4.html#getting-started-create-vpc) in the *Amazon VPC Getting Started Guide*.
**Important**  
When you use a Microsoft Active Directory domain controller, choose a CIDR block for the EMR cluster so that all IPv4 addresses are fewer than nine characters in length (for example, 10.0.0.0/16). This is because the DNS names of cluster computers are used when the computers join the Active Directory directory. AWS assigns [DNS hostnames](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-hostnames) based on IPv4 address in a way that longer IP addresses may result in DNS names longer than 15 characters. Active Directory has a 15-character limit for registering joined computer names, and truncates longer names, which can cause unpredictable errors.

1. Remove the default DHCP option set assigned to the VPC. For more information, see [Changing a VPC to use No DHCP options](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_DHCP_Options.html#DHCP_Use_No_Options). Later on, you add a new one that specifies the Active Directory domain controller as the DNS server. 

1. Confirm that DNS support is enabled for the VPC, that is, that DNS Hostnames and DNS Resolution are both enabled. They are enabled by default. For more information, see [Updating DNS support for your VPC](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-updating).

1. Confirm that your VPC has an internet gateway attached, which is the default. For more information, see [Creating and attaching an internet gateway](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html#Add_IGW_Attach_Gateway).
**Note**  
An internet gateway is used in this example because you are establishing a new domain controller for the VPC. An internet gateway may not be required for your application. The only requirement is that the cluster-dedicated KDC can access the Active Directory domain controller.

1. Create a custom route table, add a route that targets the Internet Gateway, and then attach it to your subnet. For more information, see [Create a custom route table](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_Internet_Gateway.html#Add_IGW_Routing).

1. When you launch the EC2 instance for the domain controller, it must have a static public IPv4 address for you to connect to it using RDP. The easiest way to do this is to configure your subnet to auto-assign public IPv4 addresses. This is not the default setting when a subnet is created. For more information, see [Modifying the public IPv4 addressing attribute of your subnet](https://docs.aws.amazon.com/vpc/latest/userguide/vpc-ip-addressing.html#subnet-public-ip). Optionally, you can assign the address when you launch the instance. For more information, see [Assigning a public IPv4 address during instance launch](https://docs.aws.amazon.com/vpc/latest/userguide/using-instance-addressing.html#public-ip-addresses).

1. When you finish, make a note of your VPC and subnet IDs. You use them later when you launch the Active Directory domain controller and the cluster.
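
The CIDR guidance in the first step comes down to the length of the host name that a cluster instance registers with Active Directory. The following sketch checks a private IPv4 address against the 15-character limit. It assumes the default IP-based host name form `ip-a-b-c-d`, and the function names are illustrative.

```shell
#!/bin/bash
# Derive the IP-based host label (for example, ip-10-0-0-12) from a private
# IPv4 address. Assumes the default IP-based naming form.
ec2_host_label() {
    local ipv4="$1"
    echo "ip-${ipv4//./-}"
}

# Succeed if the host label fits within the 15-character limit that
# Active Directory applies to joined computer names.
fits_ad_name_limit() {
    local label
    label="$(ec2_host_label "$1")"
    [ "${#label}" -le 15 ]
}

# Report on a few sample addresses.
for ip in 10.0.0.12 10.0.255.255 172.31.255.255; do
    if fits_ad_name_limit "$ip"; then
        echo "$ip -> $(ec2_host_label "$ip") fits"
    else
        echo "$ip -> $(ec2_host_label "$ip") is too long"
    fi
done
```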

## Step 2: Launch and install the Active Directory domain controller
<a name="emr-kerberos-ad-dc"></a>

1. Launch an EC2 instance based on the Microsoft Windows Server 2016 Base AMI. We recommend an m4.xlarge or better instance type. For more information, see [Launching an AWS Marketplace instance](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/launch-marketplace-console.html) in the *Amazon EC2 User Guide*.

1. Make a note of the Group ID of the security group associated with the EC2 instance. You need it for [Step 6: Launch a Kerberized EMR Cluster](#emr-kerberos-ad-cluster). We use *sg-012xrlmdomain345*. Alternatively, you can specify different security groups for the EMR cluster and this instance that allow traffic between them. For more information, see [Amazon EC2 security groups for Linux instances](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html) in the *Amazon EC2 User Guide*.

1. Connect to the EC2 instance using RDP. For more information, see [Connecting to your Windows instance](https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/connecting_to_windows_instance.html) in the *Amazon EC2 User Guide*.

1. Start **Server Manager** to install and configure the Active Directory Domain Services role on the server. Promote the server to a domain controller and assign a domain name (the example we use here is `ad.domain.com`). Make a note of the domain name because you need it later when you create the EMR security configuration and cluster. If you are new to setting up Active Directory, you can follow the instructions in [How to set up Active Directory (AD) in Windows Server 2016](https://ittutorials.net/microsoft/windows-server-2016/setting-up-active-directory-ad-in-windows-server-2016/).

   The instance restarts when you finish.

## Step 3: Add accounts to the domain for the EMR Cluster
<a name="emr-kerberos-ad-users"></a>

RDP to the Active Directory domain controller to create accounts in Active Directory Users and Computers for each cluster user. For more information, see [Create a User Account in Active Directory Users and Computers](https://technet.microsoft.com/en-us/library/dd894463(v=ws.10).aspx) on the *Microsoft Learn* site. Make a note of each user's **User logon name**. You need these later when you configure the cluster. 

In addition, create an account with sufficient privileges to join computers to the domain. Amazon EMR uses this account to join cluster instances to the domain. You specify the account and its password when you create the cluster in [Step 6: Launch a Kerberized EMR Cluster](#emr-kerberos-ad-cluster). To delegate computer join privileges to the account, we recommend that you create a group with join privileges and then assign the user to the group. For instructions, see [Delegating directory join privileges](https://docs.aws.amazon.com/directoryservice/latest/admin-guide/directory_join_privileges.html) in the *AWS Directory Service Administration Guide*.

## Step 4: Configure an incoming trust on the Active Directory domain controller
<a name="emr-kerberos-ad-configure-trust"></a>

The example commands below create a trust in Active Directory, which is a one-way, incoming, non-transitive, realm trust with the cluster-dedicated KDC. The example we use for the cluster's realm is `EC2.INTERNAL`. Replace the *KDC-FQDN* with the **Public DNS** name listed for the Amazon EMR primary node hosting the KDC. The `passwordt` parameter specifies the **cross-realm principal password**, which you specify along with the cluster **realm** when you create a cluster. The realm name is derived from the default domain name in `us-east-1` for the cluster. The `Domain` is the Active Directory domain in which you are creating the trust, which is lower case by convention. The example uses `ad.domain.com`.

Open the Windows command prompt with administrator privileges and type the following commands to create the trust relationship on the Active Directory domain controller:

```
C:\Users\Administrator> ksetup /addkdc EC2.INTERNAL KDC-FQDN
C:\Users\Administrator> netdom trust EC2.INTERNAL /Domain:ad.domain.com /add /realm /passwordt:MyVeryStrongPassword
C:\Users\Administrator> ksetup /SetEncTypeAttr EC2.INTERNAL AES256-CTS-HMAC-SHA1-96
```

## Step 5: Use a DHCP option set to specify the Active Directory domain controller as a VPC DNS server
<a name="emr-kerberos-ad-DHCP"></a>

Now that the Active Directory domain controller is configured, you must configure the VPC to use it as a domain name server for name resolution within your VPC. To do this, attach a DHCP options set. Specify the **Domain name** as the domain name of your cluster, for example, `ec2.internal` if your cluster is in us-east-1 or `region.compute.internal` for other Regions. For **Domain name servers**, you must specify the IP address of the Active Directory domain controller (which must be reachable from the cluster) as the first entry, followed by **AmazonProvidedDNS** (for example, ***xx.xx.xx.xx*,AmazonProvidedDNS**). For more information, see [Changing DHCP option sets](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_DHCP_Options.html#DHCPOptions).
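
As a sketch of the two values described above, the following helpers derive the domain name for a Region and build the **Domain name servers** entry. The function names, the domain controller address `10.0.0.10`, and the resource IDs in the commented AWS CLI commands are illustrative placeholders.

```shell
#!/bin/bash
# Return the cluster domain name for a Region: ec2.internal in us-east-1,
# <region>.compute.internal elsewhere.
cluster_domain_name() {
    local region="$1"
    if [ "$region" = "us-east-1" ]; then
        echo "ec2.internal"
    else
        echo "${region}.compute.internal"
    fi
}

# Return the Domain name servers value: the domain controller first,
# then AmazonProvidedDNS.
domain_name_servers() {
    local dc_ip="$1"
    echo "${dc_ip},AmazonProvidedDNS"
}

# Preview the values for a hypothetical domain controller at 10.0.0.10.
echo "domain-name:         $(cluster_domain_name us-east-1)"
echo "domain-name-servers: $(domain_name_servers 10.0.0.10)"

# With the AWS CLI (commented out; the IDs are placeholders):
# aws ec2 create-dhcp-options --dhcp-configurations \
#     "Key=domain-name,Values=$(cluster_domain_name us-east-1)" \
#     "Key=domain-name-servers,Values=$(domain_name_servers 10.0.0.10)"
# aws ec2 associate-dhcp-options --dhcp-options-id dopt-EXAMPLE --vpc-id vpc-EXAMPLE
```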

## Step 6: Launch a Kerberized EMR Cluster
<a name="emr-kerberos-ad-cluster"></a>

1. In Amazon EMR, create a security configuration that specifies the Active Directory domain controller you created in the previous steps. An example command is shown below. Replace the domain, `ad.domain.com`, with the name of the domain you specified in [Step 2: Launch and install the Active Directory domain controller](#emr-kerberos-ad-dc).

   ```
   aws emr create-security-configuration --name MyKerberosConfig \
   --security-configuration '{
     "AuthenticationConfiguration": {
       "KerberosConfiguration": {
         "Provider": "ClusterDedicatedKdc",
         "ClusterDedicatedKdcConfiguration": {
           "TicketLifetimeInHours": 24,
           "CrossRealmTrustConfiguration": {
             "Realm": "AD.DOMAIN.COM",
             "Domain": "ad.domain.com",
             "AdminServer": "ad.domain.com",
             "KdcServer": "ad.domain.com"
           }
         }
       }
     }
   }'
   ```

1. Create the cluster with the following attributes:
   + Use the `--security-configuration` option to specify the security configuration that you created. We use *MyKerberosConfig* in the example.
   + Use the `SubnetId` property of the `--ec2-attributes` option to specify the subnet that you created in [Step 1: Set up the VPC and subnet](#emr-kerberos-ad-network). We use *step1-subnet* in the example.
   + Use the `AdditionalMasterSecurityGroups` and `AdditionalSlaveSecurityGroups` of the `--ec2-attributes` option to specify that the security group associated with the AD domain controller from [Step 2: Launch and install the Active Directory domain controller](#emr-kerberos-ad-dc) is associated with the cluster primary node as well as core and task nodes. We use *sg-012xrlmdomain345* in the example.

   Use `--kerberos-attributes` to specify the following cluster-specific Kerberos attributes:
   + The realm for the cluster, which you specified when you configured the trust in [Step 4: Configure an incoming trust on the Active Directory domain controller](#emr-kerberos-ad-configure-trust).
   + The cross-realm trust principal password that you specified as `passwordt` in [Step 4: Configure an incoming trust on the Active Directory domain controller](#emr-kerberos-ad-configure-trust).
   + A `KdcAdminPassword`, which you can use to administer the cluster-dedicated KDC.
   + The user logon name and password of the Active Directory account with computer join privileges that you created in [Step 3: Add accounts to the domain for the EMR Cluster](#emr-kerberos-ad-users).

   The following example launches a Kerberized cluster.

   ```
   aws emr create-cluster --name "MyKerberosCluster" \
   --release-label emr-5.10.0 \
   --instance-type m5.xlarge \
   --instance-count 3 \
   --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,KeyName=MyEC2KeyPair,\
   SubnetId=step1-subnet,AdditionalMasterSecurityGroups=sg-012xrlmdomain345,\
   AdditionalSlaveSecurityGroups=sg-012xrlmdomain345 \
   --service-role EMR_DefaultRole \
   --security-configuration MyKerberosConfig \
   --applications Name=Hadoop Name=Hive Name=Oozie Name=Hue Name=HCatalog Name=Spark \
   --kerberos-attributes Realm=EC2.INTERNAL,\
   KdcAdminPassword=MyClusterKDCAdminPwd,\
   ADDomainJoinUser=ADUserLogonName,ADDomainJoinPassword=ADUserPassword,\
   CrossRealmTrustPrincipalPassword=MatchADTrustPwd
   ```

## Step 7: Create HDFS users and set permissions on the cluster for Active Directory accounts
<a name="emr-kerberos-ad-hadoopuser"></a>

When setting up a trust relationship with Active Directory, Amazon EMR creates Linux users on the cluster for each Active Directory account. For example, the user logon name `LiJuan` in Active Directory has a Linux account of `lijuan`. Active Directory user names can contain upper-case letters, but Linux does not honor Active Directory casing.

To allow your users to log in to the cluster to run Hadoop jobs, you must add HDFS user directories for their Linux accounts, and grant each user ownership of their directory. To do this, we recommend that you run a script saved to Amazon S3 as a cluster step. Alternatively, you can run the commands in the script below from the command line on the primary node. Use the EC2 key pair that you specified when you created the cluster to connect to the primary node over SSH as the `hadoop` user. For more information, see [Use an EC2 key pair for SSH credentials for Amazon EMR](emr-plan-access-ssh.md).

Run the following command to add a step to the cluster that runs a script, *AddHDFSUsers.sh*.

```
aws emr add-steps --cluster-id <j-2AL4XXXXXX5T9> \
--steps Type=CUSTOM_JAR,Name=CustomJAR,ActionOnFailure=CONTINUE,\
Jar=s3://region.elasticmapreduce/libs/script-runner/script-runner.jar,Args=["s3://amzn-s3-demo-bucket/AddHDFSUsers.sh"]
```

The contents of the file *AddHDFSUsers.sh* are as follows.

```
#!/bin/bash
# AddHDFSUsers.sh script

# Initialize an array of user names from AD or Linux users and KDC principals created manually on the cluster
ADUSERS=("lijuan" "marymajor" "richardroe" "myusername")

# For each user listed, create an HDFS user directory
# and change ownership to the user

for username in "${ADUSERS[@]}"; do
     hdfs dfs -mkdir "/user/$username"
     hdfs dfs -chown "$username:$username" "/user/$username"
done
```
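
Because the Linux account names are lower case, a user list exported from Active Directory may need converting before it goes into the `ADUSERS` array. The following sketch lower-cases logon names and prints the resulting HDFS commands for review instead of running them. The function name and the sample logon names are illustrative.

```shell
#!/bin/bash
# Convert an Active Directory user logon name to the lower-case form used for
# the Linux account (for example, LiJuan becomes lijuan).
linux_account_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]'
}

# Print the HDFS commands for each logon name so they can be reviewed first.
for logon_name in LiJuan MaryMajor richardroe; do
    account="$(linux_account_name "$logon_name")"
    echo "hdfs dfs -mkdir /user/${account}"
    echo "hdfs dfs -chown ${account}:${account} /user/${account}"
done
```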

### Active Directory groups mapped to Hadoop groups
<a name="emr-kerberos-ad-group"></a>

Amazon EMR uses System Security Services Daemon (SSSD) to map Active Directory groups to Hadoop groups. To confirm group mappings, log in to the primary node as described in [Using SSH to connect to Kerberized clusters with Amazon EMR](emr-kerberos-connect-ssh.md), and then run the `hdfs groups` command. The output shows whether the Active Directory groups to which your Active Directory account belongs have been mapped to Hadoop groups for the corresponding Hadoop user on the cluster. You can also check other users' group mappings by specifying one or more user names with the command, for example `hdfs groups lijuan`. For more information, see [groups](https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#groups) in the [Apache HDFS Commands Guide](https://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html).