

# Use Active Directory or LDAP servers for authentication with Amazon EMR
<a name="ldap"></a>

With Amazon EMR releases 6.12.0 and higher, you can use the LDAP over SSL (LDAPS) protocol to launch a cluster that natively integrates with your corporate identity server. LDAP (Lightweight Directory Access Protocol) is an open, vendor-neutral application protocol that accesses and maintains data. LDAP is commonly used for user authentication against corporate identity servers that are hosted on applications such as Active Directory (AD) and OpenLDAP. With this native integration, you can use your LDAP server to authenticate users on Amazon EMR.

Highlights of the Amazon EMR LDAP integration include:
+ Amazon EMR configures the supported applications to authenticate with LDAP authentication on your behalf.
+ Amazon EMR configures and maintains security for the supported applications with the Kerberos protocol. You don't need to input any commands or scripts.
+ You get fine-grained access control (FGAC) through Apache Ranger authorization for Hive Metastore database and tables. See [Integrate Amazon EMR with Apache Ranger](emr-ranger.md) for more information.
+ When you require LDAP credentials to access a cluster, you get fine-grained access control (FGAC) over who can access your EMR clusters through SSH.

The following pages provide a conceptual overview, prerequisites, and steps to launch an EMR cluster with the Amazon EMR LDAP integration.

**Topics**
+ [Overview of LDAP with Amazon EMR](ldap-overview.md)
+ [LDAP components for Amazon EMR](ldap-components.md)
+ [Application support and considerations with LDAP for Amazon EMR](ldap-considerations.md)
+ [Configure and launch an EMR cluster with LDAP](ldap-setup.md)
+ [Examples using LDAP with Amazon EMR](ldap-examples.md)

# Overview of LDAP with Amazon EMR
<a name="ldap-overview"></a>

Lightweight Directory Access Protocol (LDAP) is a software protocol that network administrators use to manage and control access to data by authenticating users within a company’s network. The LDAP protocol stores information in a hierarchical, tree directory structure. For more information, see [Basic LDAP Concepts](https://ldap.com/basic-ldap-concepts/) on *LDAP.com*.

Within a company’s network, many applications might use the LDAP protocol to authenticate users. With the Amazon EMR LDAP integration, EMR clusters can natively use the same LDAP protocol with an added security configuration.

There are two major implementations of the LDAP protocol that Amazon EMR supports: **Active Directory** and **OpenLDAP**. While other implementations are possible, most fit the same authentication protocols as Active Directory or OpenLDAP.

## Active Directory (AD)
<a name="ldap-ad"></a>

Active Directory (AD) is a directory service from Microsoft for Windows domain networks. AD is included on most Windows Server operating systems, and can communicate with clients over the LDAP and LDAPS protocols. For authentication, Amazon EMR attempts a user-bind with your AD instance with the User Principal Name (UPN) as the distinguished name and password. The UPN uses the standard format `username@domain_name`.

## OpenLDAP
<a name="ldap-openldap"></a>

OpenLDAP is a free, open-source implementation of the LDAP protocol. For authentication, Amazon EMR attempts a user-bind with your OpenLDAP instance with the fully qualified domain name (FQDN) as the distinguished name and password. The FQDN uses the standard format `username_attribute=username,LDAP_user_search_base`. Commonly, the `username_attribute` value is `uid`, and the `LDAP_user_search_base` value contains the attributes of the tree that leads to the user. For example, `ou=People,dc=example,dc=com`.

Other free and open-source implementations of the LDAP protocol typically follow a similar FQDN as OpenLDAP for the distinguished names of their users. 
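
The two bind-name formats described above can be sketched in a few lines of Python. This is only an illustration of the string formats, not the code that Amazon EMR runs. The function name `bind_name` and the `"ActiveDirectory"` and `"OpenLDAP"` labels are hypothetical, and the sketch doesn't escape special characters in user names.

```python
def bind_name(server_type: str, username: str, *, ad_domain: str = "",
              username_attribute: str = "uid", user_search_base: str = "") -> str:
    """Return the name used for the user bind, in the format for the server type.

    The server_type labels are illustrative only (assumption).
    """
    if server_type == "ActiveDirectory":
        # UPN format: username@domain_name
        return f"{username}@{ad_domain}"
    if server_type == "OpenLDAP":
        # Format: username_attribute=username,LDAP_user_search_base
        return f"{username_attribute}={username},{user_search_base}"
    raise ValueError(f"unknown server type: {server_type!r}")


print(bind_name("ActiveDirectory", "jdoe", ad_domain="example.com"))
print(bind_name("OpenLDAP", "jdoe", user_search_base="ou=People,dc=example,dc=com"))
```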

# LDAP components for Amazon EMR
<a name="ldap-components"></a>

You can use your LDAP server to authenticate with Amazon EMR and any applications that the user directly utilizes on the EMR cluster through the following components. 

**Secret Agent**  
The *Secret Agent* is an on-cluster process that authenticates all user requests. The Secret Agent creates the user bind to your LDAP server on behalf of the supported applications on the EMR cluster. The Secret Agent runs as the `emrsecretagent` user, and it writes logs to the `/emr/secretagent/log` directory. These logs provide details about the state of each user's authentication request and any errors that might surface during user authentication.

**System Security Services Daemon (SSSD)**  
*SSSD* is a daemon that runs on each node of an LDAP-enabled EMR cluster. SSSD creates and manages a UNIX user to sync your remote corporate identity to each node. YARN-based applications such as Hive and Spark require that a local UNIX user exists on every node that runs a query for a user.
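
If a query fails because a user is missing on a node, one way to check whether that node can resolve the user is to look the name up in the system's user database. The following is a minimal diagnostic sketch, assuming that SSSD is wired into the node's name service lookups so that Python's standard `pwd` module can see synced users. The function name `local_user_exists` is hypothetical.

```python
import pwd


def local_user_exists(username: str) -> bool:
    """Return True if the node can resolve a UNIX user with this name."""
    try:
        # getpwnam raises KeyError when no matching entry is found.
        pwd.getpwnam(username)
    except KeyError:
        return False
    return True


print(local_user_exists("root"))
```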

# Application support and considerations with LDAP for Amazon EMR
<a name="ldap-considerations"></a>

This topic lists supported applications, supported features and unsupported features.

## Supported applications with LDAP for Amazon EMR
<a name="ldap-considerations-apps"></a>

**Important**  
The applications listed on this page are the only applications that Amazon EMR supports for LDAP. To ensure cluster security, you can only include LDAP-compatible applications when you create an EMR cluster with LDAP enabled. If you attempt to install other, unsupported applications, Amazon EMR will reject your request for a new cluster.

The Amazon EMR releases 6.12 and higher support LDAP integration with the following applications:
+ Apache Livy
+ Apache Hive through HiveServer2 (HS2)
+ Trino
+ Presto
+ Hue

You can also install the following applications on an EMR cluster and configure them to meet your security needs:
+ Apache Spark
+ Apache Hadoop
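
Because Amazon EMR rejects cluster requests that include other applications, it can help to check an application list before you submit the request. The following sketch is one way to do that in Python. The short application names (for example, `Hive` for Apache Hive) and the function name `unsupported_applications` are assumptions for illustration, and the comparison is case-sensitive.

```python
# Applications that Amazon EMR integrates with LDAP, per the list above.
LDAP_INTEGRATED = {"Livy", "Hive", "Trino", "Presto", "Hue"}
# Applications that you can also install and configure yourself.
ALSO_INSTALLABLE = {"Spark", "Hadoop"}


def unsupported_applications(requested):
    """Return the requested application names that aren't on either list."""
    allowed = LDAP_INTEGRATED | ALSO_INSTALLABLE
    return sorted(app for app in requested if app not in allowed)


print(unsupported_applications(["Hive", "Spark", "Flink"]))
```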

## Supported features with LDAP for Amazon EMR
<a name="ldap-considerations-features"></a>

You can use the following Amazon EMR features with the LDAP integration:

**Note**  
To keep LDAP credentials secure, you must use in-transit encryption to secure the flow of data on and off the cluster. For more information about in-transit encryption, see [Encrypt data at rest and in transit with Amazon EMR](emr-data-encryption.md).
+ Encryption in transit (required) and at rest
+ Instance groups, instance fleets, and Spot Instances
+ Reconfiguration of applications on a running cluster
+ EMRFS server-side encryption (SSE)

## Unsupported features
<a name="ldap-considerations-limitations"></a>

Consider the following limitations when you use the Amazon EMR LDAP integration:
+ Amazon EMR disables steps for clusters with LDAP enabled.
+ Amazon EMR doesn't support runtime roles and AWS Lake Formation integrations for clusters with LDAP enabled.
+ Amazon EMR doesn't support LDAP with StartTLS.
+ Amazon EMR doesn't support high-availability mode (clusters with multiple primary nodes) for clusters with LDAP enabled.
+ You can't rotate bind credentials or certificates for clusters with LDAP enabled. If you rotate any of these values, we recommend that you start a new cluster with the updated bind credentials or certificates.
+ You must use exact search bases with LDAP. The LDAP user and group search base doesn't support LDAP search filters.

# Configure and launch an EMR cluster with LDAP
<a name="ldap-setup"></a>

This section covers how to configure Amazon EMR for use with LDAP authentication.

**Topics**
+ [Add AWS Secrets Manager permissions to the Amazon EMR instance role](ldap-setup-asm.md)
+ [Create the Amazon EMR security configuration for LDAP integration](ldap-setup-security.md)
+ [Launch an EMR cluster that authenticates with LDAP](ldap-setup-launch.md)

# Add AWS Secrets Manager permissions to the Amazon EMR instance role
<a name="ldap-setup-asm"></a>

Amazon EMR uses an IAM service role to perform actions on your behalf to provision and manage clusters. The service role for cluster EC2 instances, also called *the EC2 instance profile for Amazon EMR*, is a special type of service role that Amazon EMR assigns to every EC2 instance in a cluster at launch.

To define permissions for an EMR cluster to interact with Amazon S3 data and other AWS services, define a custom Amazon EC2 instance profile instead of the `EMR_EC2_DefaultRole` when you launch your cluster. For more information, see [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md) and [Customize IAM roles with Amazon EMR](emr-iam-roles-custom.md).

Add the following statements to the default EC2 instance profile to allow Amazon EMR to tag sessions and access the AWS Secrets Manager secrets that store LDAP certificates.

```
    {
      "Sid": "AllowAssumeOfRolesAndTagging",
      "Effect": "Allow",
      "Action": ["sts:TagSession", "sts:AssumeRole"],
      "Resource": [
        "arn:aws:iam::111122223333:role/LDAP_DATA_ACCESS_ROLE_NAME",
        "arn:aws:iam::111122223333:role/LDAP_USER_ACCESS_ROLE_NAME"
      ]
    },
    {
      "Sid": "AllowSecretsRetrieval",
      "Effect": "Allow",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": [
        "arn:aws:secretsmanager:us-east-1:111122223333:secret:LDAP_SECRET_NAME*",
        "arn:aws:secretsmanager:us-east-1:111122223333:secret:ADMIN_LDAP_SECRET_NAME*"
      ]
    }
```

**Note**  
Your cluster requests will fail if you forget the wildcard `*` character at the end of the secret name when you set Secrets Manager permissions. The wildcard represents the secret versions.  
You should also limit the scope of the AWS Secrets Manager policy to only the certificates that your cluster needs to provision instances.
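
To avoid forgetting the trailing wildcard, you can generate the Secrets Manager statement from a list of secret names. The following Python sketch builds the same `AllowSecretsRetrieval` statement as the policy above. The function name `secrets_statement` is hypothetical, and the Region, account ID, and secret names are placeholders.

```python
import json


def secrets_statement(region: str, account_id: str, secret_names) -> dict:
    """Build the GetSecretValue statement, scoped to the named secrets only."""
    resources = [
        # The trailing * is required, as described in the note above.
        f"arn:aws:secretsmanager:{region}:{account_id}:secret:{name}*"
        for name in secret_names
    ]
    return {
        "Sid": "AllowSecretsRetrieval",
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": resources,
    }


statement = secrets_statement(
    "us-east-1", "111122223333", ["LDAP_SECRET_NAME", "ADMIN_LDAP_SECRET_NAME"]
)
print(json.dumps(statement, indent=2))
```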

# Create the Amazon EMR security configuration for LDAP integration
<a name="ldap-setup-security"></a>

Before you can launch an EMR cluster with LDAP integration, use the steps in [Create a security configuration with the Amazon EMR console or with the AWS CLI](emr-create-security-configuration.md) to create an Amazon EMR security configuration for the cluster. Complete the following configurations in the `LDAPConfiguration` block under `AuthenticationConfiguration`, or in the corresponding fields in the Amazon EMR console **Security Configurations** section:

**`EnableLDAPAuthentication`**  
Console option: **Authentication protocol: LDAP**  
To use the LDAP integration, set this option to `true` or select it as your authentication protocol when you create a cluster in the console. By default, `EnableLDAPAuthentication` is `true` when you create a security configuration in the Amazon EMR console.

**`LDAPServerURL`**  
Console option: **LDAP server location**  
The location of the LDAP server including the prefix: `ldaps://location_of_server`.

**`BindCertificateARN`**  
Console option: **LDAP SSL certificate**  
The ARN of the AWS Secrets Manager secret that contains the certificate used to sign the SSL certificate that the LDAP server uses. If a public Certificate Authority (CA) signed your LDAP server's certificate, you can provide an AWS Secrets Manager ARN with a blank file. For more information on how to store your certificate in Secrets Manager, see [Store TLS certificates in AWS Secrets Manager](emr-ranger-tls-certificates.md).

**`BindCredentialsARN`**  
Console option: **LDAP server bind credentials**  
An AWS Secrets Manager ARN that contains the LDAP admin user bind credentials. The credentials are stored as a JSON object. There is only one key-value pair in this secret; the key in the pair is the username, and the value is the password. For example, `{"uid=admin,cn=People,dc=example,dc=com": "AdminPassword1"}`. This is an optional field unless you enable SSH login for your EMR cluster. In many configurations, Active Directory instances require bind credentials to allow SSSD to sync users.

**`LDAPAccessFilter`**  
Console option: **LDAP access filter**  
Specifies the subset of objects within your LDAP server that can authenticate. For example, if you want to grant access to all users with the `posixAccount` object class in your LDAP server, define the access filter as `(objectClass=posixAccount)`.

**`LDAPUserSearchBase`**  
Console option: **LDAP user search base**  
The search base that your users belong under within your LDAP server. For example, `cn=People,dc=example,dc=com`.

**`LDAPGroupSearchBase`**  
Console option: **LDAP group search base**  
The search base that your groups belong under within your LDAP server. For example, `cn=Groups,dc=example,dc=com`.

**`EnableSSHLogin`**  
Console option: **SSH login**  
Specifies whether or not to allow password authentication with LDAP credentials. We don't recommend that you enable this option. Key pairs are a more secure route to allow access into EMR clusters. This field is optional and defaults to `false`. 

**`LDAPServerType`**  
Console option: **LDAP server type**  
Specifies the type of LDAP server that Amazon EMR connects to. Supported options are Active Directory and OpenLDAP. Other LDAP server types might work, but Amazon EMR doesn't officially support other server types. For more information, see [LDAP components for Amazon EMR](ldap-components.md).

**`ActiveDirectoryConfigurations`**  
A required sub-block for security configurations that use the Active Directory server type.

**`ADDomain`**  
Console option: **Active Directory domain**  
The domain name used to create the User Principal Name (UPN) for user authentication with security configurations that use the Active Directory server type.
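
As a rough illustration of how these fields fit together, the following Python sketch assembles the `LDAPConfiguration` block and the single key-value body for the bind credentials secret. The field names come from the list above. The function names, the ARNs, the server URL, and the `"OpenLDAP"` string used for `LDAPServerType` are placeholders or assumptions, so confirm the exact accepted values before you use them. The sketch covers only the authentication block, and a complete security configuration also needs in-transit encryption.

```python
import json


def bind_credentials_secret(bind_user: str, password: str) -> str:
    """Secret body with exactly one key-value pair: bind username -> password."""
    return json.dumps({bind_user: password})


def ldap_configuration(server_url, certificate_arn, access_filter,
                       user_search_base, group_search_base, server_type, *,
                       credentials_arn=None, enable_ssh_login=False,
                       ad_domain=None) -> dict:
    """Assemble the LDAPConfiguration block under AuthenticationConfiguration."""
    config = {
        "EnableLDAPAuthentication": True,
        "LDAPServerURL": server_url,
        "BindCertificateARN": certificate_arn,
        "LDAPAccessFilter": access_filter,
        "LDAPUserSearchBase": user_search_base,
        "LDAPGroupSearchBase": group_search_base,
        "EnableSSHLogin": enable_ssh_login,
        "LDAPServerType": server_type,  # exact accepted strings are an assumption
    }
    if credentials_arn is not None:
        config["BindCredentialsARN"] = credentials_arn
    if ad_domain is not None:
        # Sub-block used only with the Active Directory server type.
        config["ActiveDirectoryConfigurations"] = {"ADDomain": ad_domain}
    return {"AuthenticationConfiguration": {"LDAPConfiguration": config}}


secret_body = bind_credentials_secret(
    "uid=admin,cn=People,dc=example,dc=com", "AdminPassword1"
)
security_configuration = ldap_configuration(
    "ldaps://ldap.example.com",  # placeholder server location
    "arn:aws:secretsmanager:us-east-1:111122223333:secret:LDAP_SECRET_NAME",
    "(objectClass=posixAccount)",
    "cn=People,dc=example,dc=com",
    "cn=Groups,dc=example,dc=com",
    "OpenLDAP",
)
print(json.dumps(security_configuration, indent=2))
```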

## Considerations for security configurations with LDAP and Amazon EMR
<a name="ldap-setup-security-considerations"></a>
+ To create a security configuration with Amazon EMR LDAP integration, you must use in-transit encryption. For information about in-transit encryption, see [Encrypt data at rest and in transit with Amazon EMR](emr-data-encryption.md).
+ You can't define a Kerberos configuration in the same security configuration. Amazon EMR automatically provisions a KDC that is dedicated to the cluster, and manages the admin password for this KDC. Users can't access this admin password.
+ You can't define IAM runtime roles and AWS Lake Formation in the same security configuration.
+ The `LDAPServerURL` must have the `ldaps://` protocol in its value.
+ The `LDAPAccessFilter` can't be empty. 
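
Several of these considerations can be checked locally before you create the security configuration. The following Python sketch is one possible pre-flight check, not a replacement for the validation that Amazon EMR performs. The function name is hypothetical. The sketch assumes that `KerberosConfiguration` sits next to `LDAPConfiguration` under `AuthenticationConfiguration`, and that in-transit encryption is expressed as `EnableInTransitEncryption` under `EncryptionConfiguration`. Neither placement is stated on this page.

```python
def check_ldap_security_configuration(security_configuration: dict) -> list:
    """Return a list of problems found; an empty list means no problem was detected."""
    problems = []
    auth = security_configuration.get("AuthenticationConfiguration", {})
    ldap = auth.get("LDAPConfiguration", {})
    encryption = security_configuration.get("EncryptionConfiguration", {})

    if not encryption.get("EnableInTransitEncryption"):  # key name is an assumption
        problems.append("in-transit encryption is required")
    if "KerberosConfiguration" in auth:  # placement is an assumption
        problems.append("Kerberos configuration can't be combined with LDAP")
    if not str(ldap.get("LDAPServerURL", "")).startswith("ldaps://"):
        problems.append("LDAPServerURL must use the ldaps:// protocol")
    if not str(ldap.get("LDAPAccessFilter", "")).strip():
        problems.append("LDAPAccessFilter can't be empty")
    if ldap.get("EnableSSHLogin") and not ldap.get("BindCredentialsARN"):
        problems.append("BindCredentialsARN is required when SSH login is enabled")
    return problems


example = {
    "EncryptionConfiguration": {"EnableInTransitEncryption": True},
    "AuthenticationConfiguration": {
        "LDAPConfiguration": {
            "LDAPServerURL": "ldap://ldap.example.com",  # wrong scheme on purpose
            "LDAPAccessFilter": "(objectClass=posixAccount)",
        }
    },
}
print(check_ldap_security_configuration(example))
```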

## Use LDAP with the Apache Ranger integration for Amazon EMR
<a name="ldap-setup-ranger"></a>

With the LDAP integration for Amazon EMR, you can further integrate with Apache Ranger. When you pull your LDAP users into Ranger, you can then associate those users with an Apache Ranger policy server to integrate with Amazon EMR and other applications. To do this, define the `RangerConfiguration` field within `AuthorizationConfiguration` in the security configuration that you use with your LDAP cluster. For more information on how to set up the security configuration, see [Create the EMR security configuration](emr-ranger-security-config.md).

When you use LDAP with Amazon EMR, you don't need to provide a `KerberosConfiguration` with the Amazon EMR integration for Apache Ranger. 

# Launch an EMR cluster that authenticates with LDAP
<a name="ldap-setup-launch"></a>

Use the following steps to launch an EMR cluster with LDAP or Active Directory. 

1. Set up your environment:
   + Make sure that the nodes on your EMR cluster can communicate with Amazon S3 and AWS Secrets Manager. For more information on how to modify your EC2 instance profile role to communicate with these services, see [Add AWS Secrets Manager permissions to the Amazon EMR instance role](ldap-setup-asm.md).
   + If you plan to run your EMR cluster in a private subnet, you should use AWS PrivateLink and Amazon VPC endpoints, or use network address translation (NAT) to configure the VPC to communicate with S3 and Secrets Manager. For more information, see [AWS PrivateLink and VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/endpoint-services-overview.html) and [NAT instances](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Instance.html) in the *Amazon VPC Getting Started Guide*.
   + Make sure that there is network connectivity between your EMR cluster and the LDAP server. Your EMR clusters must access your LDAP server over the network. The primary, core, and task nodes for the cluster communicate with the LDAP server to sync user data. If your LDAP server runs on Amazon EC2, update the EC2 security group to accept traffic from the EMR cluster. For more information, see [Add AWS Secrets Manager permissions to the Amazon EMR instance role](ldap-setup-asm.md).

1. Create an Amazon EMR security configuration for the LDAP integration. For more information, see [Create the Amazon EMR security configuration for LDAP integration](ldap-setup-security.md).

1. Now that you're set up, use the steps in [Launch an Amazon EMR cluster](emr-gs.md#emr-getting-started-launch-sample-cluster) to launch your cluster with the following configurations:
   + Select Amazon EMR release 6.12 or higher. We recommend that you use the latest Amazon EMR release.
   + Only specify or select applications for your cluster that support LDAP. For a list of LDAP-supported applications with Amazon EMR, see [Application support and considerations with LDAP for Amazon EMR](ldap-considerations.md).
   + Apply the security configuration that you created in the previous step.
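
If you script the launch, the last step can be sketched as follows. This Python example checks the release label against the 6.12 minimum and then assembles an `aws emr create-cluster` command line. The cluster name, security configuration name, instance profile name, instance type, and instance count are hypothetical placeholders, and your environment might need further options, such as a subnet. The sketch only prints the command and doesn't run it.

```python
import re
import shlex

MIN_LDAP_RELEASE = (6, 12)


def supports_ldap(release_label: str) -> bool:
    """True if a label such as 'emr-6.12.0' is at or above the 6.12 minimum."""
    match = re.fullmatch(r"emr-(\d+)\.(\d+)\.(\d+)", release_label)
    if match is None:
        raise ValueError(f"unrecognized release label: {release_label!r}")
    major, minor, _patch = (int(part) for part in match.groups())
    return (major, minor) >= MIN_LDAP_RELEASE


def create_cluster_command(release_label, applications,
                           security_configuration, instance_profile):
    """Build the argument list for 'aws emr create-cluster' (placeholder values)."""
    if not supports_ldap(release_label):
        raise ValueError("the LDAP integration requires Amazon EMR 6.12 or higher")
    return [
        "aws", "emr", "create-cluster",
        "--name", "ldap-cluster",
        "--release-label", release_label,
        "--applications", *[f"Name={app}" for app in applications],
        "--security-configuration", security_configuration,
        "--service-role", "EMR_DefaultRole",
        "--ec2-attributes", f"InstanceProfile={instance_profile}",
        "--instance-type", "m5.xlarge",
        "--instance-count", "3",
    ]


command = create_cluster_command(
    "emr-6.12.0", ["Hive", "Livy"], "my-ldap-security-config", "MyLdapInstanceProfile"
)
print(shlex.join(command))
```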

# Examples using LDAP with Amazon EMR
<a name="ldap-examples"></a>

After you [provision an EMR cluster that uses the LDAP integration](ldap-setup-launch.md), you can provide your LDAP credentials to any [supported application](ldap-considerations.md#ldap-considerations-apps) through its built-in username and password authentication mechanism. This page shows some examples.

## Using LDAP authentication with Apache Hive
<a name="ldap-examples-"></a>

**Example - Apache Hive**  
The following example command starts an Apache Hive session through HiveServer2 and Beeline:  

```
beeline -u "jdbc:hive2://$HOSTNAME:10000/default;ssl=true;sslTrustStore=$TRUSTSTORE_PATH;trustStorePassword=$TRUSTSTORE_PASS"  -n LDAP_USERNAME -p LDAP_PASSWORD
```

## Using LDAP authentication with Apache Livy
<a name="ldap-examples-livy"></a>

**Example - Apache Livy**  
The following example command starts a Livy session through cURL. Replace `ENCODED-KEYPAIR` with a Base64-encoded string for `username:password`.  

```
curl -X POST --data '{"proxyUser":"LDAP_USERNAME","kind": "pyspark"}' -H "Content-Type: application/json" -H "Authorization: Basic ENCODED-KEYPAIR" DNS_OF_PRIMARY_NODE:8998/sessions
```
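
To produce the value for `ENCODED-KEYPAIR`, Base64-encode the string `username:password`. The following Python sketch shows one way to build the full `Authorization` header value with the standard library. The function name `basic_auth_value` and the sample credentials are placeholders. Base64 is an encoding, not encryption, so treat the result as sensitive as the password itself.

```python
import base64


def basic_auth_value(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP Basic authentication."""
    pair = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(pair).decode("ascii")


print(basic_auth_value("user", "pass"))  # prints "Basic dXNlcjpwYXNz"
```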

## Using LDAP authentication with Presto
<a name="ldap-examples-presto"></a>

**Example - Presto**  
The following example command starts a Presto session through the Presto CLI:  

```
presto-cli --user "LDAP_USERNAME" --password --catalog hive
```
After you run this command, enter the LDAP password at the prompt.

## Using LDAP authentication with Trino
<a name="ldap-examples-trino"></a>

**Example - Trino**  
The following example command starts a Trino session through the Trino CLI:  

```
trino-cli --user "LDAP_USERNAME" --password --catalog hive
```
After you run this command, enter the LDAP password at the prompt.

## Using LDAP authentication with Hue
<a name="ldap-examples-hue"></a>

You can access the Hue UI through an SSH tunnel that you create on the cluster, or you can set up a proxy server to publicly broadcast the connection to Hue. Because Hue doesn't run in HTTPS mode by default, we recommend that you use an additional encryption layer to ensure that communication between clients and the Hue UI is encrypted with HTTPS. This reduces the chance that you might accidentally expose user credentials in plain text.

To use the Hue UI, open it in your browser and enter your LDAP username and password to log in. If the credentials are correct, Hue logs you in and uses your identity to authenticate you with all supported applications.

## Using SSH for password authentication and Kerberos tickets for other applications
<a name="ldap-examples-ssh"></a>

**Important**  
We don't recommend that you use password authentication to SSH into an EMR cluster.

You can use your LDAP credentials to SSH to an EMR cluster. To do this, set the `EnableSSHLogin` configuration to `true` in the Amazon EMR security configuration that you use to start the cluster. Then, use the following command to SSH to the cluster after it launches:

```
ssh username@EMR_PRIMARY_DNS_NAME
```

After you run this command, enter the LDAP password at the prompt.

Amazon EMR includes an on-cluster script that allows users to generate a Kerberos keytab file and ticket to use with supported applications that don't accept LDAP credentials directly. Some of these applications include `spark-submit`, Spark SQL, and PySpark.

Run `ldap-kinit` and follow the prompts. If the authentication succeeds, the Kerberos keytab file appears in your home directory with a valid Kerberos ticket. Use the Kerberos ticket to run applications as you would on any Kerberized environment.