

# Integrate Amazon EMR with Apache Ranger
<a name="emr-ranger"></a>

Beginning with Amazon EMR 5.32.0, you can launch a cluster that natively integrates with Apache Ranger. Apache Ranger is an open-source framework to enable, monitor, and manage comprehensive data security across the Hadoop platform. For more information, see [Apache Ranger](https://ranger.apache.org/). With native integration, you can bring your own Apache Ranger to enforce fine-grained data access control on Amazon EMR.

This section provides a conceptual overview of Amazon EMR integration with Apache Ranger. It also includes the prerequisites and steps required to launch an Amazon EMR cluster integrated with Apache Ranger.

Natively integrating Amazon EMR with Apache Ranger provides the following key benefits: 
+ Fine-grained access control to Hive Metastore databases and tables, which enables you to define data filtering policies at the level of database, table, and column for Apache Spark and Apache Hive applications. Row-level filtering and data masking are supported with Hive applications.
+ The ability to use your existing Hive policies directly with Amazon EMR for Hive applications.
+ Access control to Amazon S3 data at the prefix and object level, which enables you to define data filtering policies for access to S3 data using the EMR File System.
+ The ability to use CloudWatch Logs for centralized auditing.
+ Amazon EMR installs and manages the Apache Ranger plugins on your behalf.

**Important**  
Amazon EMR does not support Apache Ranger integration starting with Amazon EMR release 7.4. For more information, see [Amazon EMR release 7.4.0](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-740-release.html).

# Apache Ranger with Amazon EMR
<a name="emr-ranger-overview"></a>

Apache Ranger is a framework to enable, monitor, and manage comprehensive data security across the Hadoop platform.

Apache Ranger has the following features:
+ Centralized security administration to manage all security related tasks in a central UI or using REST APIs.
+ Fine-grained authorization to do a specific action or operation with a Hadoop component or tool, managed through a central administration tool.
+ A standardized authorization method across all Hadoop components.
+ Enhanced support for various authorization methods.
+ Centralized auditing of user access and administrative actions (security related) within all the components of Hadoop.

Apache Ranger uses two key components for authorization: 
+ **Apache Ranger policy admin server** - This server allows you to define the authorization policies for Hadoop applications. When integrating with Amazon EMR, you can define and enforce policies for Apache Spark and Hive to access Hive Metastore, and to access Amazon S3 data through the [EMR File System (EMRFS)](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-fs). You can set up a new Apache Ranger policy admin server or use an existing one to integrate with Amazon EMR.
+ **Apache Ranger plugin** - This plugin validates the access of a user against the authorization policies defined in the Apache Ranger policy admin server. Amazon EMR installs and configures the Apache Ranger plugin automatically for each Hadoop application selected in the Apache Ranger configuration. 

**Topics**
+ [Architecture of Amazon EMR integration with Apache Ranger](emr-ranger-architecture.md)
+ [Amazon EMR components for use with Apache Ranger](emr-ranger-components.md)

# Architecture of Amazon EMR integration with Apache Ranger
<a name="emr-ranger-architecture"></a>

![\[Amazon EMR and Apache Ranger architecture diagram.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/emr-ranger-architecture.png)


# Amazon EMR components for use with Apache Ranger
<a name="emr-ranger-components"></a>

Amazon EMR enables fine-grained access control with Apache Ranger through the following components. See the [architecture diagram](emr-ranger-architecture.md) for a visual representation of these Amazon EMR components with the Apache Ranger plugins.

**Secret agent** – The secret agent securely stores secrets and distributes secrets to other Amazon EMR components or applications. The secrets can include temporary user credentials, encryption keys, or Kerberos tickets. The secret agent runs on every node in the cluster and intercepts calls to the Instance Metadata Service. For requests to the instance profile role credentials, the secret agent vends credentials depending on the requesting user and requested resources after authorizing the request with the EMRFS S3 Ranger plugin. The secret agent runs as the *`emrsecretagent`* user, and it writes logs to the `/emr/secretagent/log` directory. The process relies on a specific set of `iptables` rules to function, so make sure that `iptables` is not disabled. If you customize the `iptables` configuration, you must preserve the NAT table rules and leave them unaltered.

**EMR record server** – The record server receives requests to access data from Spark. It then authorizes requests by forwarding the requested resources to the Spark Ranger plugin for Amazon EMR. The record server reads data from Amazon S3 and returns filtered data that the user is authorized to access based on Ranger policy. The record server runs on every node in the cluster as the `emr_record_server` user and writes logs to the `/var/log/emr-record-server` directory.

# Considerations for using Amazon EMR with Apache Ranger
<a name="emr-ranger-app-support"></a>

## Supported applications for Amazon EMR with Apache Ranger
<a name="emr-ranger-app-support-list"></a>

The integration between Amazon EMR and Apache Ranger in which EMR installs Ranger plugins currently supports the following applications:
+ Apache Spark (Available with EMR 5.32+ and EMR 6.3+)
+ Apache Hive (Available with EMR 5.32+ and EMR 6.3+)
+ S3 Access through EMRFS (Available with EMR 5.32+ and EMR 6.3+)

The following applications can be installed on an EMR cluster and may need to be configured to meet your security needs:
+ Apache Hadoop (Available with EMR 5.32+ and EMR 6.3+ including YARN and HDFS)
+ Apache Livy (Available with EMR 5.32+ and EMR 6.3+)
+ Apache Zeppelin (Available with EMR 5.32+ and EMR 6.3+)
+ Apache Hue (Available with EMR 5.32+ and EMR 6.3+)
+ Ganglia (Available with EMR 5.32+ and EMR 6.3+)
+ HCatalog (Available with EMR 5.32+ and EMR 6.3+)
+ Mahout (Available with EMR 5.32+ and EMR 6.3+)
+ MXNet (Available with EMR 5.32+ and EMR 6.3+)
+ TensorFlow (Available with EMR 5.32+ and EMR 6.3+)
+ Tez (Available with EMR 5.32+ and EMR 6.3+)
+ Trino (Available with EMR 6.7+)
+ ZooKeeper (Available with EMR 5.32+ and EMR 6.3+)

**Important**  
The applications listed above are the only applications that are currently supported. To ensure cluster security, when Apache Ranger is enabled you can create an EMR cluster with only the applications in this list. A request to create a cluster with any other application is rejected.  
AWS Glue Data Catalog and open table formats such as Apache Hudi, Delta Lake, and Apache Iceberg aren't supported.

**Supported Amazon EMR features with Apache Ranger**  
The following Amazon EMR features are supported when you use Amazon EMR with Apache Ranger:
+ Encryption at rest and in transit
+ Kerberos authentication (required)
+ Instance groups, instance fleets, and Spot Instances
+ Reconfiguration of applications on a running cluster
+ EMRFS server-side encryption (SSE)

**Note**  
Amazon EMR encryption settings govern SSE. For more information, see [Encryption Options](emr-data-encryption-options.md).

## Application limitations
<a name="emr-ranger-app-support-limitations"></a>

There are several limitations to keep in mind when you integrate Amazon EMR and Apache Ranger:
+ You cannot currently use the console to create a security configuration that specifies the AWS Ranger integration option in the AWS GovCloud (US) Region. Security configuration can be done using the CLI.
+ Kerberos must be installed on your cluster.
+ Application UIs (user interfaces) such as the YARN Resource Manager UI, HDFS NameNode UI, and Livy UI are not set with authentication by default.
+ The default HDFS `umask` is configured so that newly created objects are world-readable by default.
+ Amazon EMR doesn't support high-availability (multiple primary) mode with Apache Ranger.
+ For additional limitations, see limitations for each application.

## Plugin limitations
<a name="plugin-limitations"></a>

Each plugin has specific limitations. For the Apache Hive plugin's limitations, see [Apache Hive plugin limitations](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ranger-hive.html#emr-ranger-hive-limitations). For the Apache Spark plugin's limitations, see [Apache Spark plugin limitations](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ranger-spark.html#emr-ranger-spark-limitations). For the EMRFS S3 plugin's limitations, see [EMRFS S3 plugin limitations](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ranger-emrfs.html#emr-ranger-emrfs-limitations).

# Set up Amazon EMR for Apache Ranger
<a name="emr-ranger-begin"></a>

Before you install Apache Ranger, review the information in this section to make sure that Amazon EMR is properly configured.

**Topics**
+ [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md)
+ [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md)
+ [Create the EMR security configuration](emr-ranger-security-config.md)
+ [Store TLS certificates in AWS Secrets Manager](emr-ranger-tls-certificates.md)
+ [Start an EMR cluster with Apache Ranger](emr-ranger-start-emr-cluster.md)
+ [Configure Zeppelin for Apache Ranger-enabled Amazon EMR clusters](emr-ranger-configure-zeppelin.md)
+ [Known issues for Amazon EMR integration](emr-ranger-security-considerations.md)

# Set up a Ranger Admin server to integrate with Amazon EMR
<a name="emr-ranger-admin"></a>

For Amazon EMR integration, the Apache Ranger application plugins must communicate with the Admin server using TLS/SSL.

**Prerequisite: Ranger Admin Server SSL Enablement**

Apache Ranger on Amazon EMR requires two-way SSL communication between plugins and the Ranger Admin server. To ensure that plugins communicate with the Apache Ranger server over SSL, enable the following attribute within ranger-admin-site.xml on the Ranger Admin server.

```
<property>
    <name>ranger.service.https.attrib.ssl.enabled</name>
    <value>true</value>
</property>
```

In addition, the following configurations are needed.

```
<property>
    <name>ranger.https.attrib.keystore.file</name>
    <value><PATH_TO_KEYSTORE></value>
</property>

<property>
    <name>ranger.service.https.attrib.keystore.file</name>
    <value><PATH_TO_KEYSTORE></value>
</property>

<property>
    <name>ranger.service.https.attrib.keystore.pass</name>
    <value><KEYSTORE_PASSWORD></value>
</property>

<property>
    <name>ranger.service.https.attrib.keystore.keyalias</name>
    <value><PRIVATE_CERTIFICATE_KEY_ALIAS></value>
</property>

<property>
    <name>ranger.service.https.attrib.clientAuth</name>
    <value>want</value>
</property>

<property>
    <name>ranger.service.https.port</name>
    <value>6182</value>
</property>
```
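
Before you restart the Ranger Admin server, you may want to confirm that the file actually contains these settings. The following Python sketch is one way to do that. It assumes the file uses the usual Hadoop-style `<configuration>`/`<property>` layout, and the helper names `read_properties` and `find_ssl_problems` are illustrative only, not part of Apache Ranger or Amazon EMR.

```python
import xml.etree.ElementTree as ET

# Property names copied from the examples above; the check itself is illustrative.
REQUIRED_PROPERTIES = (
    "ranger.service.https.attrib.ssl.enabled",
    "ranger.service.https.attrib.keystore.file",
    "ranger.service.https.attrib.keystore.pass",
    "ranger.service.https.attrib.keystore.keyalias",
    "ranger.service.https.attrib.clientAuth",
    "ranger.service.https.port",
)


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def find_ssl_problems(xml_text):
    """List human-readable problems with the SSL-related settings."""
    props = read_properties(xml_text)
    problems = [
        f"missing property: {name}"
        for name in REQUIRED_PROPERTIES
        if name not in props
    ]
    enabled = props.get("ranger.service.https.attrib.ssl.enabled", "")
    if enabled.lower() != "true":
        problems.append("ranger.service.https.attrib.ssl.enabled is not 'true'")
    return problems
```

To use it, read your `ranger-admin-site.xml` into a string and pass it to `find_ssl_problems`. An empty list means that every property from the examples is present and SSL is enabled. The sketch does not verify that the keystore path or password is valid.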

# TLS certificates for Apache Ranger integration with Amazon EMR
<a name="emr-ranger-admin-tls"></a>

Apache Ranger integration with Amazon EMR requires that traffic from Amazon EMR nodes to the Ranger Admin server is encrypted using TLS, and that Ranger plugins authenticate to the Apache Ranger server using mutual (two-way) TLS authentication. Amazon EMR needs the public certificate of your Ranger Admin server (specified in the previous example) and the private certificate that the plugins use to authenticate.

**Apache Ranger plugin certificates**

The public TLS certificates of the Apache Ranger plugins must be accessible to the Apache Ranger Admin server so that it can validate the plugins when they connect. You can do this in one of three ways.

**Method 1: Configure a truststore in Apache Ranger Admin server**

Fill in the following configurations in ranger-admin-site.xml to configure a truststore.

```
<property>
    <name>ranger.truststore.file</name>
    <value><LOCATION TO TRUSTSTORE></value>
</property>

<property>
    <name>ranger.truststore.password</name>
    <value><PASSWORD FOR TRUSTSTORE></value>
</property>
```

**Method 2: Load the certificate into Java cacerts truststore**

If your Ranger Admin server doesn't specify a truststore in its JVM options, then you can put the plugin public certificates in the default cacerts store.

**Method 3: Create a truststore and specify as part of JVM Options**

Within `{RANGER_HOME_DIRECTORY}/ews/ranger-admin-services.sh`, modify `JAVA_OPTS` to include `"-Djavax.net.ssl.trustStore=<TRUSTSTORE_LOCATION>"` and `"-Djavax.net.ssl.trustStorePassword=<TRUSTSTORE_PASSWORD>"`. For example, add the following line after the existing `JAVA_OPTS`.

```
JAVA_OPTS=" ${JAVA_OPTS} -Djavax.net.ssl.trustStore=${RANGER_HOME}/truststore/truststore.jck -Djavax.net.ssl.trustStorePassword=changeit"
```

**Note**  
This specification may expose the truststore password if any user is able to log into the Apache Ranger Admin server and see running processes, such as when using the `ps` command.

**Using Self-Signed Certificates**

We don't recommend self-signed certificates. Self-signed certificates can't be revoked, and they might not conform to your internal security requirements.

# Service definition installation for Ranger integration with Amazon EMR
<a name="emr-ranger-admin-servicedef-install"></a>

A service definition is used by the Ranger Admin server to describe the attributes of policies for an application. The policies are then stored in a policy repository for clients to download. 

To configure service definitions, you must make REST calls to the Ranger Admin server. See [Apache Ranger PublicAPIsv2](https://ranger.apache.org/apidocs/resource_PublicAPIsv2.html#resource_PublicAPIsv2_createServiceDef_POST) for the APIs required in the following section.
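
As a rough illustration of such a REST call, the following Python sketch builds (but does not send) a request that creates a service definition. It assumes the `createServiceDef` endpoint path `/service/public/v2/api/servicedef` from the linked API reference and HTTP basic authentication with a Ranger Admin user. The function name, host, and credentials are hypothetical, and the service definition JSON itself comes from the plugin-specific topics that follow.

```python
import base64
import json
import urllib.request


def build_create_servicedef_request(admin_url, service_def, user, password):
    """Build (but do not send) a POST request that creates a service definition."""
    # HTTP basic authentication header for a Ranger Admin user (assumed auth method).
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=admin_url.rstrip("/") + "/service/public/v2/api/servicedef",
        data=json.dumps(service_def).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
```

To send the request, pass it to `urllib.request.urlopen` together with an `ssl.SSLContext` that trusts the Ranger Admin server's certificate. Keeping construction separate from sending makes it easier to inspect the URL and payload first.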

**Installing Apache Spark's Service Definition**

To install Apache Spark's service definition, see [Apache Spark plugin for Ranger integration with Amazon EMR](emr-ranger-spark.md).

**Installing EMRFS Service Definition**

To install the S3 service definition for Amazon EMR, see [EMRFS S3 plugin for Ranger integration with Amazon EMR](emr-ranger-emrfs.md).

**Using Hive Service Definition**

Apache Hive can use the existing Ranger service definition that ships with Apache Ranger 2.0 and later. For more information, see [Apache Hive plugin for Ranger integration with Amazon EMR](emr-ranger-hive.md).

# Network traffic rules for integrating with Amazon EMR
<a name="emr-ranger-network"></a>

When Apache Ranger is integrated with your EMR cluster, the cluster needs to communicate with additional servers and AWS.

All Amazon EMR nodes, including core and task nodes, must be able to communicate with the Apache Ranger Admin servers to download policies. If your Apache Ranger Admin is running on Amazon EC2, you need to update the security group to be able to take traffic from the EMR cluster.

In addition to communicating with the Ranger Admin server, all nodes need to be able to communicate with the following AWS services:
+ Amazon S3
+ AWS KMS (if using EMRFS SSE-KMS)
+ Amazon CloudWatch
+ AWS STS

If you plan to run your EMR cluster within a private subnet, configure the VPC so that it can communicate with these services using either [AWS PrivateLink and VPC endpoints](https://docs.aws.amazon.com/vpc/latest/userguide/endpoint-services-overview.html) or a [network address translation (NAT) instance](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_NAT_Instance.html), both described in the *Amazon VPC User Guide*.
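
To spot-check the connectivity described above from a cluster node, a small TCP probe can be enough. The following Python sketch only tests whether a TCP connection can be opened; it does not validate TLS or permissions. The function name `can_connect` and the host name in the usage note are illustrative assumptions.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `can_connect("ranger-admin.example.internal", 6182)` probes a hypothetical Ranger Admin host on the HTTPS port used in the earlier configuration example. A `False` result usually points to a security group, route, or DNS issue rather than a Ranger problem.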

# IAM roles for native integration with Apache Ranger
<a name="emr-ranger-iam"></a>

The integration between Amazon EMR and Apache Ranger relies on three key roles that you should create before you launch your cluster:
+ A custom Amazon EC2 instance profile for Amazon EMR
+ An IAM role for Apache Ranger Engines
+ An IAM role for other AWS services

This section gives an overview of these roles and the policies that you need to include for each IAM role. For information about creating these roles, see [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

# EC2 instance profile for Amazon EMR
<a name="emr-ranger-iam-ec2"></a>

Amazon EMR uses an IAM service role to perform actions on your behalf to provision and manage clusters. The service role for cluster EC2 instances, also called the EC2 instance profile for Amazon EMR, is a special type of service role assigned to every EC2 instance in a cluster at launch.

To define permissions for EMR cluster interaction with Amazon S3 data and with Hive metastore protected by Apache Ranger and other AWS services, define a custom EC2 instance profile to use instead of the `EMR_EC2_DefaultRole` when you launch your cluster.

For more information, see [Service role for cluster EC2 instances (EC2 instance profile)](emr-iam-role-for-ec2.md) and [Customize IAM roles with Amazon EMR](emr-iam-roles-custom.md).

Add the following statements to the default EC2 instance profile for Amazon EMR so that it can tag sessions and access the AWS Secrets Manager secrets that store the TLS certificates.

```
    {
      "Sid": "AllowAssumeOfRolesAndTagging",
      "Effect": "Allow",
      "Action": ["sts:TagSession", "sts:AssumeRole"],
      "Resource": [
        "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<RANGER_ENGINE-PLUGIN_DATA_ACCESS_ROLE_NAME>",
        "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<RANGER_USER_ACCESS_ROLE_NAME>"
      ]
    },
    {
        "Sid": "AllowSecretsRetrieval",
        "Effect": "Allow",
        "Action": "secretsmanager:GetSecretValue",
        "Resource": [
            "arn:aws:secretsmanager:<REGION>:<AWS_ACCOUNT_ID>:secret:<PLUGIN_TLS_SECRET_NAME>*",
            "arn:aws:secretsmanager:<REGION>:<AWS_ACCOUNT_ID>:secret:<ADMIN_RANGER_SERVER_TLS_SECRET_NAME>*"
        ]
    }
```

**Note**  
For the Secrets Manager permissions, do not forget the wildcard ("*") at the end of the secret name or your requests will fail. The wildcard is for secret versions.

**Note**  
Limit the scope of the AWS Secrets Manager policy to only the certificates that are required for provisioning.
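
If you generate these statements from a script, the trailing wildcard is easy to apply consistently. The following Python sketch fills in the two statements shown above from concrete names. The function name and all role and secret names in the example are placeholders chosen for illustration.

```python
import json


def instance_profile_statements(region, account_id, role_names, secret_names):
    """Build the two statements shown above from concrete names."""
    return [
        {
            "Sid": "AllowAssumeOfRolesAndTagging",
            "Effect": "Allow",
            "Action": ["sts:TagSession", "sts:AssumeRole"],
            "Resource": [
                f"arn:aws:iam::{account_id}:role/{name}" for name in role_names
            ],
        },
        {
            "Sid": "AllowSecretsRetrieval",
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": [
                # Trailing wildcard, as the note above requires.
                f"arn:aws:secretsmanager:{region}:{account_id}:secret:{name}*"
                for name in secret_names
            ],
        },
    ]


if __name__ == "__main__":
    # All names below are placeholders for illustration only.
    statements = instance_profile_statements(
        "us-east-1",
        "123456789012",
        ["RangerPluginDataAccessRole", "RangerUserAccessRole"],
        ["ranger-plugin-tls", "ranger-admin-tls"],
    )
    print(json.dumps(statements, indent=2))
```

Listing only the secrets that the cluster needs keeps the policy scoped to the certificates required for provisioning.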

# IAM role for Apache Ranger
<a name="emr-ranger-iam-ranger"></a>

This role provides credentials for trusted execution engines, such as Apache Hive and the Amazon EMR record server, to access Amazon S3 data. Use only this role to access Amazon S3 data, including any KMS keys if you are using S3 SSE-KMS.

This role must be created with the minimum policy stated in the following example.

------
#### [ JSON ]

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CloudwatchLogsPermissions",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:logs:*:123456789012:log-group:CLOUDWATCH_LOG_GROUP_NAME_IN_SECURITY_CONFIGURATION:*"
      ]
    },
    {
      "Sid": "BucketPermissionsInS3Buckets",
      "Action": [
        "s3:CreateBucket",
        "s3:DeleteBucket",
        "s3:ListAllMyBuckets",
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket1",
        "arn:aws:s3:::amzn-s3-demo-bucket2"
      ]
    },
    {
      "Sid": "ObjectPermissionsInS3Objects",
      "Action": [
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:PutObject"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket1/*",
        "arn:aws:s3:::amzn-s3-demo-bucket2/*"
      ]
    }
  ]
}
```

------

**Important**  
The asterisk ("*") at the end of the CloudWatch Logs resource must be included to provide permission to write to the log streams.

**Note**  
If you are using EMRFS consistent view or S3 SSE encryption, add permissions for the DynamoDB tables and KMS keys so that execution engines can interact with those resources.

The IAM role for Apache Ranger is assumed by the EC2 Instance Profile Role. Use the following example to create a trust policy that allows the IAM role for Apache Ranger to be assumed by the EC2 instance profile role.

```
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<EC2 INSTANCE PROFILE ROLE NAME eg. EMR_EC2_DefaultRole>"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
```
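
The example above is a single statement. A complete trust policy wraps it in a policy document with `Version` and `Statement` elements. The following Python sketch shows one way to assemble that document; the function name `build_trust_policy` and the role name in the usage note are illustrative assumptions.

```python
import json


def build_trust_policy(account_id, instance_profile_role_name):
    """Wrap the trust statement shown above in a complete policy document."""
    principal_arn = f"arn:aws:iam::{account_id}:role/{instance_profile_role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": ["sts:AssumeRole", "sts:TagSession"],
            }
        ],
    }
```

For example, `json.dumps(build_trust_policy("123456789012", "EMR_EC2_DefaultRole"), indent=2)` produces JSON that you could paste into the role's trust relationship. Substitute your own account ID and custom instance profile role name.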

# IAM role for other AWS services for Amazon EMR integration
<a name="emr-ranger-iam-other-AWS"></a>

This role provides users who are not trusted execution engines with credentials to interact with AWS services, if needed. Do not use this IAM role to allow access to Amazon S3 data, unless it's data that should be accessible by all users.

This role is assumed by the EC2 instance profile role. Use the following example to create a trust policy that allows the IAM role for other AWS services to be assumed by the EC2 instance profile role.

```
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::<AWS_ACCOUNT_ID>:role/<EC2 INSTANCE PROFILE ROLE NAME eg. EMR_EC2_DefaultRole>"
      },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
```

# Validate your permissions for Amazon EMR integration with Apache Ranger
<a name="emr-ranger-iam-validate"></a>

See [Apache Ranger troubleshooting](emr-ranger-troubleshooting.md) for instructions on validating permissions.

# Create the EMR security configuration
<a name="emr-ranger-security-config"></a>

**Creating an Amazon EMR Security Configuration for Apache Ranger**

Before you launch an Amazon EMR cluster integrated with Apache Ranger, create a security configuration.

------
#### [ Console ]

**To create a security configuration that specifies the AWS Ranger integration option**

1. In the Amazon EMR console, select **Security configurations**, then **Create**.

1. Type a **Name** for the security configuration. You use this name to specify the security configuration when you create a cluster.

1. Under **AWS Ranger Integration**, select **Enable fine-grained access control managed by Apache Ranger**.

1. Select your **IAM role for Apache Ranger** to apply. For more information, see [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md).

1. Select your **IAM role for other AWS services** to apply.

1. Configure the plugins to connect to the Ranger Admin server by entering the Secrets Manager ARN for the Admin server and the address.

1. Select the applications to configure Ranger plugins. Enter the Secrets Manager ARN that contains the private TLS certificate for the plugin.

   If you do not configure Apache Spark or Apache Hive, and they are selected as an application for your cluster, the request fails.

1. Set up other security configuration options as appropriate and choose **Create**. You must enable Kerberos authentication using the cluster-dedicated or external KDC.

**Note**  
You cannot currently use the console to create a security configuration that specifies the AWS Ranger integration option in the AWS GovCloud (US) Region. Security configuration can be done using the CLI.

------
#### [ CLI ]

**To create a security configuration for Apache Ranger integration**

1. Replace `<ACCOUNT ID>` with your AWS account ID.

1. Replace `<REGION>` with the Region that the resource is in.

1. Specify a value for `TicketLifetimeInHours` to determine the period for which a Kerberos ticket issued by the KDC is valid.

1. Specify the address of the Ranger Admin server for `AdminServerURL`.

```
{
    "AuthenticationConfiguration": {
        "KerberosConfiguration": {
            "Provider": "ClusterDedicatedKdc",
            "ClusterDedicatedKdcConfiguration": {
                "TicketLifetimeInHours": 24
            }
        }
    },
    "AuthorizationConfiguration":{
      "RangerConfiguration":{
         "AdminServerURL":"https://<RANGER ADMIN SERVER IP>:6182",
         "RoleForRangerPluginsARN":"arn:aws:iam::<ACCOUNT ID>:role/<RANGER PLUGIN DATA ACCESS ROLE NAME>",
         "RoleForOtherAWSServicesARN":"arn:aws:iam::<ACCOUNT ID>:role/<USER ACCESS ROLE NAME>",
         "AdminServerSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES ADMIN SERVERS PUBLIC TLS CERTIFICATE WITHOUT VERSION>",
         "RangerPluginConfigurations":[
            {
               "App":"Spark",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES SPARK PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<SPARK SERVICE NAME eg. amazon-emr-spark>"
            },
            {
               "App":"Hive",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES Hive PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<HIVE SERVICE NAME eg. Hivedev>"
            },
            {
               "App":"EMRFS-S3",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES EMRFS S3 PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<EMRFS S3 SERVICE NAME eg amazon-emr-emrfs>"
            },
            {
               "App":"Trino",
               "ClientSecretARN":"arn:aws:secretsmanager:<REGION>:<ACCOUNT ID>:secret:<SECRET NAME THAT PROVIDES TRINO PLUGIN PRIVATE TLS CERTIFICATE WITHOUT VERSION>",
               "PolicyRepositoryName":"<TRINO SERVICE NAME eg amazon-emr-trino>"
            }
         ],
         "AuditConfiguration":{
            "Destinations":{
               "AmazonCloudWatchLogs":{
                  "CloudWatchLogGroup":"arn:aws:logs:<REGION>:<ACCOUNT ID>:log-group:<LOG GROUP NAME FOR AUDIT EVENTS>"
               }
            }
         }
      }
   }
}
```

The `PolicyRepositoryName` values are the service names that are specified in your Apache Ranger Admin.
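
Because the template contains many angle-bracket placeholders, it can help to confirm that none are left before you create the security configuration. The following Python sketch walks a parsed JSON document and reports every string that still contains a `<...>` placeholder. The helper name `find_placeholders` and the JSONPath-style location strings are illustrative, not part of any AWS tool.

```python
import json
import re

# Matches any leftover angle-bracket placeholder such as <REGION>.
PLACEHOLDER = re.compile(r"<[^<>]+>")


def find_placeholders(node, path="$"):
    """Yield (location, value) for every string that still contains <...>."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_placeholders(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from find_placeholders(value, f"{path}[{index}]")
    elif isinstance(node, str) and PLACEHOLDER.search(node):
        yield path, node
```

A typical use is `list(find_placeholders(json.load(open("security-configuration.json"))))`, where the file name matches the one in the command that follows. An empty list suggests that every placeholder has been replaced; it does not confirm that the ARNs point to existing resources.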

Create an Amazon EMR security configuration with the following command. Replace `security-configuration` after `--name` with a name of your choice. You select this configuration by name when you create your cluster.

```
aws emr create-security-configuration \
--security-configuration file://./security-configuration.json \
--name security-configuration
```

------

**Configure Additional Security Features**

To securely integrate Amazon EMR with Apache Ranger, configure the following EMR security features:
+ Enable Kerberos authentication using the cluster-dedicated or external KDC. For instructions, see [Use Kerberos for authentication with Amazon EMR](emr-kerberos.md).
+ (Optional) Enable encryption in transit or at rest. For more information, see [Encryption options for Amazon EMR](emr-data-encryption-options.md).

For more information, see [Security in Amazon EMR](emr-security.md).

# Store TLS certificates in AWS Secrets Manager
<a name="emr-ranger-tls-certificates"></a>

The Ranger plugins installed on an Amazon EMR cluster and the Ranger Admin server must communicate over TLS so that policy data and other information cannot be read if it is intercepted. Amazon EMR also requires that each plugin authenticate to the Ranger Admin server by presenting its own TLS certificate, which results in two-way TLS authentication. This setup requires four certificates: two pairs of private and public TLS certificates. For instructions on installing the certificates on your Ranger Admin server, see [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

To complete the setup, the Ranger plugins installed on the EMR cluster need two certificates: the public TLS certificate of your Admin server, and the private certificate that the plugin uses to authenticate against the Ranger Admin server. To provide these TLS certificates, store them in AWS Secrets Manager and reference them in an EMR security configuration.

**Note**  
It is strongly recommended, but not required, to create a certificate pair for each of your applications to limit impact if one of the plugin certificates becomes compromised.

**Note**  
You need to track and rotate certificates prior to their expiration date. 

## Certificate format
<a name="emr-ranger-tls-cert-format"></a>

The process for importing a certificate to AWS Secrets Manager is the same whether it is the private plugin certificate or the public Ranger Admin certificate. Before you import the TLS certificates, make sure that they are X.509 certificates in PEM format.

An example of a public certificate is in the format:

```
-----BEGIN CERTIFICATE-----
...Certificate Body...
-----END CERTIFICATE-----
```

An example of a private certificate is in the format:

```
-----BEGIN PRIVATE KEY-----
...Private Certificate Body...
-----END PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
...Trust Certificate Body...
-----END CERTIFICATE-----
```

The private certificate should also contain a trust certificate.

You can validate that the certificates are in the correct format by running the following command:

```
openssl x509 -in <PEM FILE> -text
```
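
As a complementary check on the overall layout (not a cryptographic validation), you could confirm that a file contains the PEM blocks in the order shown above. The following Python sketch assumes the block labels from the two examples, `PRIVATE KEY` and `CERTIFICATE`. The function names are illustrative, and keys with other labels (for example `RSA PRIVATE KEY`) would need an adjusted check.

```python
import re

# Matches one PEM block and captures its label, for example "CERTIFICATE".
PEM_BLOCK = re.compile(
    r"-----BEGIN ([A-Z0-9 ]+)-----(.*?)-----END \1-----", re.DOTALL
)


def pem_block_labels(pem_text):
    """Return the labels of all PEM blocks in order of appearance."""
    return [match.group(1) for match in PEM_BLOCK.finditer(pem_text)]


def looks_like_public_certificate(pem_text):
    """True if the text contains only CERTIFICATE blocks (at least one)."""
    labels = pem_block_labels(pem_text)
    return bool(labels) and all(label == "CERTIFICATE" for label in labels)


def looks_like_private_certificate(pem_text):
    """True if a PRIVATE KEY block is followed by one or more CERTIFICATE blocks."""
    labels = pem_block_labels(pem_text)
    return (
        len(labels) >= 2
        and labels[0] == "PRIVATE KEY"
        and all(label == "CERTIFICATE" for label in labels[1:])
    )
```

These helpers only look at the `BEGIN`/`END` markers, so a file can pass them and still contain an invalid or expired certificate. Use them alongside the `openssl` command rather than in place of it.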

## Importing a certificate to the AWS Secrets Manager
<a name="emr-ranger-tls-cert-import"></a>

When creating your Secret in the Secrets Manager, choose **Other type of secrets** under **secret type** and paste your PEM encoded certificate in the **Plaintext** field.

![\[Importing a certificate to AWS Secrets Manager.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-tls-cert-import.png)
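
If you prefer to script the import instead of using the console, the same result can probably be achieved with the AWS SDK. The following Python sketch assumes a boto3 Secrets Manager client, whose `create_secret` operation accepts `Name` and `SecretString` and returns the new secret's `ARN`. The function name `store_certificate` and the secret name in the usage note are hypothetical.

```python
def store_certificate(secrets_client, secret_name, pem_text):
    """Create a secret whose plaintext value is the PEM text; return its ARN.

    secrets_client is expected to behave like boto3.client("secretsmanager").
    """
    response = secrets_client.create_secret(
        Name=secret_name,
        SecretString=pem_text,
    )
    return response["ARN"]
```

With boto3 installed and credentials configured, a call might look like `store_certificate(boto3.client("secretsmanager"), "emr/ranger/spark-plugin-tls", pem_text)`. The returned ARN is the value that the security configuration expects for the corresponding secret.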


# Start an EMR cluster with Apache Ranger
<a name="emr-ranger-start-emr-cluster"></a>

Before you launch an Amazon EMR cluster with Apache Ranger, make sure each component meets the following minimum version requirement:
+ Amazon EMR 5.32.0 or later, or 6.3.0 or later. We recommend that you use the latest Amazon EMR release version.
+ Apache Ranger Admin server 2.x.

Complete the following steps.
+ Install Apache Ranger if you haven't already. For more information, see [Apache Ranger 0.5.0 installation](https://cwiki.apache.org/confluence/display/RANGER/Apache+Ranger+0.5.0+Installation).
+ Make sure there is network connectivity between your Amazon EMR cluster and the Apache Ranger Admin server. See [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).
+ Create the necessary IAM roles. See [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md).
+ Create an EMR security configuration for the Apache Ranger installation. For more information, see [Create the EMR security configuration](emr-ranger-security-config.md).

# Configure Zeppelin for Apache Ranger-enabled Amazon EMR clusters
<a name="emr-ranger-configure-zeppelin"></a>

This topic covers how to configure [Apache Zeppelin](https://zeppelin.apache.org/) for an Apache Ranger-enabled Amazon EMR cluster so that you can use Zeppelin as a notebook for interactive data exploration. Zeppelin is included in Amazon EMR release versions 5.0.0 and later. Earlier release versions include Zeppelin as a sandbox application. For more information, see [Amazon EMR 4.x release versions](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-4x.html) in the *Amazon EMR Release Guide*.

By default, Zeppelin is configured with a default login and password, which is not secure in a multi-tenant environment.

To configure Zeppelin, complete the following steps.

1. **Modify the authentication mechanism**. 

   Modify the `shiro.ini` file to implement your preferred authentication mechanism. Zeppelin supports Active Directory, LDAP, PAM, and Knox SSO. See [Apache Shiro authentication for Apache Zeppelin](https://zeppelin.apache.org/docs/0.8.2/setup/security/shiro_authentication.html) for more information.

1. **Configure Zeppelin to impersonate the end user**

   When you allow Zeppelin to impersonate the end user, jobs submitted by Zeppelin can be run as that end user. Add the following configuration to `core-site.xml`:

   ```
   [
     {
       "Classification": "core-site",
       "Properties": {
         "hadoop.proxyuser.zeppelin.hosts": "*",
         "hadoop.proxyuser.zeppelin.groups": "*"
       },
       "Configurations": [
       ]
     }
   ]
   ```

   Next, add the following configuration to `hadoop-kms-site.xml` located in `/etc/hadoop/conf`:

   ```
   [
     {
       "Classification": "hadoop-kms-site",
       "Properties": {
         "hadoop.kms.proxyuser.zeppelin.hosts": "*",
         "hadoop.kms.proxyuser.zeppelin.groups": "*"
       },
       "Configurations": [
       ]
     }
   ]
   ```

   You can also add these configurations to your Amazon EMR cluster using the console by following the steps in [Reconfigure an instance group in the console](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps-running-cluster.html#emr-configure-apps-running-cluster-console).

1. **Allow Zeppelin to sudo as the end user**

   Create a file `/etc/sudoers.d/90-zeppelin-user` that contains the following:

   ```
   zeppelin ALL=(ALL) NOPASSWD:ALL
   ```

1. **Modify interpreter settings to run user jobs in their own processes**.

   Configure all interpreters to be instantiated "Per User" in "isolated" processes.  
![\[Amazon EMR and Apache Ranger architecture diagram.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/per_user.png)

1. **Modify `zeppelin-env.sh`**

   Add the following to `zeppelin-env.sh` so that Zeppelin launches interpreters as the end user:

   ```
   ZEPPELIN_IMPERSONATE_USER=`echo ${ZEPPELIN_IMPERSONATE_USER} | cut -d @ -f1`
   export ZEPPELIN_IMPERSONATE_CMD='sudo -H -u ${ZEPPELIN_IMPERSONATE_USER} bash -c'
   ```

   Add the following to `zeppelin-env.sh` so that new notebooks are private to their creator by default:

   ```
   export ZEPPELIN_NOTEBOOK_PUBLIC="false"
   ```

   Finally, add the following to `zeppelin-env.sh` to include the EMR RecordServer class path after the first `CLASSPATH` statement:

   ```
   export CLASSPATH="$CLASSPATH:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-connector-common.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-spark-connector.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-client.jar:/usr/share/aws/emr/record-server/lib/aws-emr-record-server-common.jar:/usr/share/aws/emr/record-server/lib/jars/secret-agent-interface.jar"
   ```

1. **Restart Zeppelin.**

   Run the following command to restart Zeppelin:

   ```
   sudo systemctl restart zeppelin
   ```
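
The two proxy-user classifications from step 2 can also be generated together, which is convenient if you pass them as one configuration array when creating the cluster. This is a minimal sketch: the helper name `zeppelin_proxyuser_configurations` and its parameters are hypothetical, and only the property names are taken from the step above.

```python
import json


def zeppelin_proxyuser_configurations(user="zeppelin", hosts="*", groups="*"):
    """Build the core-site and hadoop-kms-site classifications for a proxy user."""

    def classification(name, prefix):
        return {
            "Classification": name,
            "Properties": {
                f"{prefix}.{user}.hosts": hosts,
                f"{prefix}.{user}.groups": groups,
            },
            "Configurations": [],
        }

    return [
        classification("core-site", "hadoop.proxyuser"),
        classification("hadoop-kms-site", "hadoop.kms.proxyuser"),
    ]


# Print a single JSON array that contains both classifications.
print(json.dumps(zeppelin_proxyuser_configurations(), indent=2))
```

In a multi-tenant cluster, consider passing narrower `hosts` and `groups` values than `*`.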

# Known issues for Amazon EMR integration
<a name="emr-ranger-security-considerations"></a>

**Known Issues**

In Amazon EMR release 5.32, the permissions for `hive-site.xml` were changed so that only privileged users can read the file, because it may contain credentials. This change can prevent Hue from reading `hive-site.xml` and cause webpages to reload continuously. If you experience this issue, add the following configuration:

```
[
  {
    "Classification": "hue-ini",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "desktop",
        "Properties": {
          "server_group":"hive_site_reader"
         },
        "Configurations":[
        ]
      }
    ]
  }
]
```

There is a known issue that the EMRFS S3 plugin for Apache Ranger currently does not support Apache Ranger’s Security Zone feature. Access control restrictions defined using the Security Zone feature are not applied on your Amazon EMR clusters.

**Application UIs**

By default, application UIs do not perform authentication. This includes the ResourceManager UI, NodeManager UI, and Livy UI, among others. In addition, any user who can access the UIs can view information about all other users' jobs.

If this behavior is not desired, use a security group to restrict user access to the application UIs.

**HDFS Default Permissions**

By default, the objects that users create in HDFS are given world-readable permissions. This can make data readable by users who should not have access to it. To change this behavior so that the default file permissions allow read and write access only to the creator of the job, perform these steps.

When creating your EMR cluster, provide the following configuration:

```
[
  {
    "Classification": "hdfs-site",
    "Properties": {
      "dfs.namenode.acls.enabled": "true",
      "fs.permissions.umask-mode": "077",
      "dfs.permissions.superusergroup": "hdfsadmingroup"
    }
  }
]
```

In addition, run the following bootstrap action:

```
--bootstrap-actions Name='HDFS UMask Setup',Path=s3://elasticmapreduce/hdfs/umask/umask-main.sh
```
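
To see what the `077` value of `fs.permissions.umask-mode` does, it helps to work through the arithmetic. The following sketch assumes the usual HDFS behavior in which new files start from mode `666` and new directories from mode `777` before the umask is applied, and it compares the result with the common default umask of `022`. The function name `effective_mode` is hypothetical.

```python
def effective_mode(requested_mode: int, umask: int) -> str:
    """Apply a umask to a requested permission mode and return it in octal."""
    return format(requested_mode & ~umask, "03o")


print(effective_mode(0o666, 0o077))  # files with umask 077: 600
print(effective_mode(0o777, 0o077))  # directories with umask 077: 700
print(effective_mode(0o666, 0o022))  # files with umask 022: 644
```

With `077`, the group and other permission bits are cleared, so only the owner keeps read and write access.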

# Apache Ranger plugins for Amazon EMR integration scenarios
<a name="emr-ranger-plugins"></a>

Apache Ranger plugins validate the access of a user against the authorization policies defined in the Apache Ranger policy admin server.

**Topics**
+ [Apache Hive plugin for Ranger integration with Amazon EMR](emr-ranger-hive.md)
+ [Apache Spark plugin for Ranger integration with Amazon EMR](emr-ranger-spark.md)
+ [EMRFS S3 plugin for Ranger integration with Amazon EMR](emr-ranger-emrfs.md)
+ [Trino plugin for Ranger integration with Amazon EMR](emr-ranger-trino.md)

# Apache Hive plugin for Ranger integration with Amazon EMR
<a name="emr-ranger-hive"></a>

Apache Hive is a popular execution engine within the Hadoop ecosystem. Amazon EMR provides an Apache Ranger plugin that provides fine-grained access controls for Hive. The plugin is compatible with open source Apache Ranger Admin server version 2.0 and later.

**Topics**
+ [Supported features](#emr-ranger-supported-features)
+ [Installation of service configuration](#emr-ranger-hive-service-config)
+ [Considerations](#emr-ranger-hive-considerations)
+ [Limitations](#emr-ranger-hive-limitations)

## Supported features
<a name="emr-ranger-supported-features"></a>

The Apache Ranger plugin for Hive on EMR supports all the functionality of the open source plugin, which includes database-, table-, and column-level access controls, as well as row filtering and data masking. For a table of Hive commands and associated Ranger permissions, see [Hive commands to Ranger permission mapping](https://cwiki.apache.org/confluence/display/RANGER/Hive+Commands+to+Ranger+Permission+Mapping).

## Installation of service configuration
<a name="emr-ranger-hive-service-config"></a>

The Apache Hive plugin is compatible with the existing Hive service definition within Apache Hive Hadoop SQL.

![\[Apache Hive service definition for Hadoop SQL.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger_service_mgr.png)


If you do not have an instance of the service under Hadoop SQL, as shown above, you can create one. Click the **+** next to Hadoop SQL.

1. **Service Name (If displayed)**: Enter the service name. The suggested value is **amazonemrhive**. Make a note of this service name -- it's needed when creating an EMR security configuration.

1. **Display Name**: Enter the name to be displayed for the service. The suggested value is **amazonemrhive**.

![\[Apache Hive service details for Hadoop SQL.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger_create_service.png)


The Apache Hive Config Properties are used to establish a connection from your Apache Ranger Admin server to a HiveServer2 instance, which enables auto-complete when you create policies. If you do not have a persistent HiveServer2 process, the properties below do not need to be accurate and can be filled with any information.
+ **Username**: Enter a user name for the JDBC connection to a HiveServer2 instance.
+ **Password**: Enter the password for the user name above.
+ **jdbc.driver.ClassName**: Enter the class name of JDBC class for Apache Hive connectivity. The default value can be used.
+ **jdbc.url**: Enter the JDBC connection string to use when connecting to HiveServer2.
+ **Common Name for Certificate**: The CN field within the certificate used to connect to the admin server from a client plugin. This value must match the CN field in your TLS certificate that was created for the plugin.

![\[Apache Hive service configuration properties.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger_config_props.png)


The **Test Connection** button tests whether the values above can be used to successfully connect to the HiveServer2 instance. After the service is successfully created, the Service Manager should look like the following:

![\[Connected to the HiveServer2 instance\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger_config_connected.png)


## Considerations
<a name="emr-ranger-hive-considerations"></a>

**Hive metadata server**

The Hive metadata server can only be accessed by trusted engines, specifically Hive and `emr_record_server`, to protect against unauthorized access. The Hive metadata server is also accessed by all nodes on the cluster. The required port 9083 provides all nodes access to the main node.

**Authentication**

By default, Apache Hive is configured to authenticate using Kerberos as configured in the EMR Security configuration. HiveServer2 can be configured to authenticate users using LDAP as well. See [Implementing LDAP authentication for Hive on a multi-tenant Amazon EMR cluster](https://aws.amazon.com/blogs/big-data/implementing-ldap-authentication-for-hive-on-a-multi-tenant-amazon-emr-cluster/) for information.

## Limitations
<a name="emr-ranger-hive-limitations"></a>

The following are current limitations for the Apache Hive plugin on Amazon EMR 5.x:
+ Hive roles are not currently supported. `GRANT` and `REVOKE` statements are not supported.
+ Hive CLI is not supported. JDBC/Beeline is the only authorized way to connect to Hive.
+ The `hive.server2.builtin.udf.blacklist` configuration should be populated with UDFs that you deem unsafe.

# Apache Spark plugin for Ranger integration with Amazon EMR
<a name="emr-ranger-spark"></a>

Amazon EMR has integrated EMR RecordServer to provide fine-grained access control for SparkSQL. EMR's RecordServer is a privileged process running on all nodes on an Apache Ranger-enabled cluster. When a Spark driver or executor runs a SparkSQL statement, all metadata and data requests go through the RecordServer. To learn more about EMR RecordServer, see the [Amazon EMR components for use with Apache Ranger](emr-ranger-components.md) page.

**Topics**
+ [Supported features](#emr-ranger-spark-supported-features)
+ [Redeploy service definition to use INSERT, ALTER, or DDL statements](#emr-ranger-spark-redeploy-service-definition)
+ [Installation of service definition](#emr-ranger-spark-install-servicedef)
+ [Creating SparkSQL policies](#emr-ranger-spark-create-sparksql)
+ [Considerations](#emr-ranger-spark-considerations)
+ [Limitations](#emr-ranger-spark-limitations)

## Supported features
<a name="emr-ranger-spark-supported-features"></a>


| SQL statement/Ranger action | Status | Supported EMR release | 
| --- | --- | --- | 
|  SELECT  |  Supported  |  As of 5.32  | 
|  SHOW DATABASES  |  Supported  |  As of 5.32  | 
|  SHOW COLUMNS  |  Supported  |  As of 5.32  | 
|  SHOW TABLES  |  Supported  |  As of 5.32  | 
|  SHOW TABLE PROPERTIES  |  Supported  |  As of 5.32  | 
|  DESCRIBE TABLE  |  Supported  |  As of 5.32  | 
|  INSERT OVERWRITE  |  Supported  |  As of 5.34 and 6.4  | 
| INSERT INTO | Supported | As of 5.34 and 6.4 | 
|  ALTER TABLE  |  Supported  |  As of 6.4  | 
|  CREATE TABLE  |  Supported  |  As of 5.35 and 6.7  | 
|  CREATE DATABASE  |  Supported  |  As of 5.35 and 6.7  | 
|  DROP TABLE  |  Supported  |  As of 5.35 and 6.7  | 
|  DROP DATABASE  |  Supported  |  As of 5.35 and 6.7  | 
|  DROP VIEW  |  Supported  |  As of 5.35 and 6.7  | 
|  CREATE VIEW  |  Not Supported  |    | 

The following features are supported when using SparkSQL:
+ Fine-grained access control on tables within the Hive Metastore. Policies can be created at the database, table, and column level.
+ Apache Ranger policies can include grant policies and deny policies to users and groups.
+ Audit events are submitted to CloudWatch Logs.

## Redeploy service definition to use INSERT, ALTER, or DDL statements
<a name="emr-ranger-spark-redeploy-service-definition"></a>

**Note**  
Starting with Amazon EMR 6.4, you can use Spark SQL with the statements: INSERT INTO, INSERT OVERWRITE, or ALTER TABLE. Starting with Amazon EMR 6.7, you can use Spark SQL to create or drop databases and tables. If you have an existing installation on Apache Ranger server with Apache Spark service definitions deployed, use the following code to redeploy the service definitions.  

```
# Get the existing Spark service definition ID using the Ranger REST API and the jq JSON processor
curl --silent -f -u <admin_user_login>:<password_for_ranger_admin_user> \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef/name/amazon-emr-spark' | jq .id

# Download the latest Service definition
wget https://s3.amazonaws.com/elasticmapreduce/ranger/service-definitions/version-2.0/ranger-servicedef-amazon-emr-spark.json

# Update the service definition using the Ranger REST API
curl -u <admin_user_login>:<password_for_ranger_admin_user> -X PUT -d @ranger-servicedef-amazon-emr-spark.json \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef/<Spark service definition id from step 1>'
```
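
If you prefer to script the redeployment in Python rather than with `curl`, the same two requests can be built with the standard library. This is only a sketch under stated assumptions: the helper `servicedef_request`, the host `ranger.example.internal`, the credentials, and the ID `42` are all placeholders, and the requests are built but not sent.

```python
import base64
import json
import urllib.request


def servicedef_request(base_url, user, password, method, path, body=None):
    """Build (but do not send) a request against the Ranger servicedef API."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Accept": "application/json",
        "Content-Type": "application/json",
        "Authorization": f"Basic {token}",
    }
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        f"{base_url}/service/public/v2/api/servicedef/{path}",
        data=data,
        headers=headers,
        method=method,
    )


# Placeholder host, credentials, ID, and body; replace with your own values.
base = "https://ranger.example.internal:6182"
lookup = servicedef_request(base, "admin", "secret", "GET", "name/amazon-emr-spark")
update = servicedef_request(
    base, "admin", "secret", "PUT", "42", body={"name": "amazon-emr-spark"}
)
print(lookup.get_method(), lookup.full_url)
print(update.get_method(), update.full_url)
```

In practice, the body of the `PUT` request would be the contents of the downloaded `ranger-servicedef-amazon-emr-spark.json` file, and the ID would come from the `id` field of the lookup response. To send a request, pass it to `urllib.request.urlopen` with an SSL context that trusts your Ranger Admin server certificate.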

## Installation of service definition
<a name="emr-ranger-spark-install-servicedef"></a>

The installation of EMR's Apache Spark service definition requires the Ranger Admin server to be set up. See [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

Follow these steps to install the Apache Spark service definition:

**Step 1: SSH into the Apache Ranger Admin server**

For example:

```
ssh ec2-user@ip-xxx-xxx-xxx-xxx.ec2.internal
```

**Step 2: Download the service definition and Apache Ranger Admin server plugin**

In a temporary directory, download the service definition. This service definition is supported by Ranger 2.x versions.

```
mkdir /tmp/emr-spark-plugin/
cd /tmp/emr-spark-plugin/

wget https://s3.amazonaws.com/elasticmapreduce/ranger/service-definitions/version-2.0/ranger-spark-plugin-2.x.jar
wget https://s3.amazonaws.com/elasticmapreduce/ranger/service-definitions/version-2.0/ranger-servicedef-amazon-emr-spark.json
```

**Step 3: Install the Apache Spark plugin for Amazon EMR**

```
export RANGER_HOME=.. # Replace with the Ranger Admin home directory, for example /usr/lib/ranger/ranger-2.0.0-admin
mkdir $RANGER_HOME/ews/webapp/WEB-INF/classes/ranger-plugins/amazon-emr-spark
mv ranger-spark-plugin-2.x.jar $RANGER_HOME/ews/webapp/WEB-INF/classes/ranger-plugins/amazon-emr-spark
```

**Step 4: Register the Apache Spark service definition for Amazon EMR**

```
curl -u <admin_user_login>:<password_for_ranger_admin_user> -X POST -d @ranger-servicedef-amazon-emr-spark.json \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef'
```

If this command runs successfully, you see a new service in your Ranger Admin UI called "AMAZON-EMR-SPARK", as shown in the following image (Ranger version 2.0 is shown).

![\["AMAZON-EMR-SPARK" registered in Ranger Admin.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-amazon-emr-spark.png)


**Step 5: Create an instance of the AMAZON-EMR-SPARK application**

**Service Name (If displayed):** The service name that will be used. The suggested value is **amazonemrspark**. Note this service name as it will be needed when creating an EMR security configuration.

**Display Name:** The name to be displayed for this instance. The suggested value is **amazonemrspark**.

**Common Name For Certificate:** The CN field within the certificate used to connect to the admin server from a client plugin. This value must match the CN field in your TLS certificate that was created for the plugin.

![\[Ranger Admin create service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-service.png)


**Note**  
The TLS certificate for this plugin should have been registered in the trust store on the Ranger Admin server. See [TLS certificates for Apache Ranger integration with Amazon EMR](emr-ranger-admin-tls.md) for more details.

## Creating SparkSQL policies
<a name="emr-ranger-spark-create-sparksql"></a>

When creating a new policy, the fields to fill in are:

**Policy Name**: The name of this policy.

**Policy Label**: A label that you can put on this policy.

**Database**: The database that this policy applies to. The wildcard "\*" represents all databases.

**Table**: The tables that this policy applies to. The wildcard "\*" represents all tables.

**EMR Spark Column**: The columns that this policy applies to. The wildcard "\*" represents all columns.

**Description**: A description of this policy.

![\[Ranger Admin create SparkSQL policy details.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-policy-details.png)


To grant permissions, enter the users and groups below. You can also specify exclusions for the **allow** conditions and **deny** conditions.

![\[Ranger Admin SparkSQL policy details allow conditions.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-policy-allow-conditions.png)


After specifying the allow and deny conditions, click **Save**.

## Considerations
<a name="emr-ranger-spark-considerations"></a>

Each node within the EMR cluster must be able to connect to the main node on port 9083.
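
One way to confirm this connectivity from a node is a plain TCP connection test. The following is a minimal sketch that uses only the Python standard library. The function name `can_reach` is hypothetical, and the host name in the usage line is a placeholder for your main node's address.

```python
import socket


def can_reach(host: str, port: int = 9083, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage with a placeholder host name:
# print(can_reach("ip-xxx-xxx-xxx-xxx.ec2.internal"))
```

A `False` result usually points to a security group or network ACL that blocks port 9083, or to a metastore process that isn't running.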

## Limitations
<a name="emr-ranger-spark-limitations"></a>

The following are current limitations for the Apache Spark plugin:
+ EMR RecordServer always connects to the Hive Metastore (HMS) running on the Amazon EMR cluster. Configure HMS to connect in remote mode, if required. Do not put configuration values inside the Apache Spark `hive-site.xml` configuration file.
+ Tables created using Spark datasources on CSV or Avro are not readable using EMR RecordServer. Use Hive to create and write data, and read using EMR RecordServer.
+ Delta Lake, Hudi and Iceberg tables aren't supported.
+ Users must have access to the default database. This is a requirement for Apache Spark.
+ Ranger Admin server does not support auto-complete.
+ The SparkSQL plugin for Amazon EMR does not support row filters or data masking.
+ When using ALTER TABLE with Spark SQL, a partition location must be the child directory of a table location. Inserting data into a partition where the partition location is different from the table location is not supported.

# EMRFS S3 plugin for Ranger integration with Amazon EMR
<a name="emr-ranger-emrfs"></a>

To make it easier to provide access controls against objects in S3 on a multi-tenant cluster, the EMRFS S3 plugin provides access controls to the data within S3 when accessing it through EMRFS. You can allow access to S3 resources at a user and group level.

To achieve this, when your application attempts to access data within S3, EMRFS sends a request for credentials to the Secret Agent process, where the request is authenticated and authorized against an Apache Ranger plugin. If the request is authorized, the Secret Agent assumes the IAM role for Apache Ranger Engines with a restricted policy to generate credentials that have access only to the resources covered by the Ranger policy that allowed the access. The credentials are then passed back to EMRFS to access S3.

**Topics**
+ [Supported features](#emr-ranger-emrfs-features)
+ [Installation of service configuration](#emr-ranger-emrfs-service-config)
+ [Creating EMRFS S3 policies](#emr-ranger-emrfs-create-policies)
+ [EMRFS S3 policies usage notes](#emr-ranger-emrfs-considerations)
+ [Limitations](#emr-ranger-emrfs-limitations)

## Supported features
<a name="emr-ranger-emrfs-features"></a>

The EMRFS S3 plugin provides storage-level authorization. Policies can be created to give users and groups access to S3 buckets and prefixes. Authorization is done only against EMRFS.

## Installation of service configuration
<a name="emr-ranger-emrfs-service-config"></a>

To install the EMRFS service definition, you must set up the Ranger Admin server. To set up the server, see [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

Follow these steps to install the EMRFS service definition.

**Step 1: SSH into the Apache Ranger Admin server**.

For example:

```
ssh ec2-user@ip-xxx-xxx-xxx-xxx.ec2.internal
```

**Step 2: Download the EMRFS service definition**.

In a temporary directory, download the Amazon EMR service definition. This service definition is supported by Ranger 2.x versions.

```
wget https://s3.amazonaws.com/elasticmapreduce/ranger/service-definitions/version-2.0/ranger-servicedef-amazon-emr-emrfs.json
```

**Step 3: Register EMRFS S3 service definition**.

```
curl -u <admin_user_login>:<password_for_ranger_admin_user> -X POST -d @ranger-servicedef-amazon-emr-emrfs.json \
-H "Accept: application/json" \
-H "Content-Type: application/json" \
-k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef'
```

If this command runs successfully, you see a new service in the Ranger Admin UI called "AMAZON-EMR-EMRFS", as shown in the following image (Ranger version 2.0 is shown).

![\[Ranger Admin create EMRFS S3 service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-service-EMRFS.png)


**Step 4: Create an instance of the AMAZON-EMR-EMRFS application**.

Create an instance of the service definition.
+ Click the **+** next to AMAZON-EMR-EMRFS.

Fill in the following fields:

**Service Name (If displayed)**: The suggested value is **amazonemrs3**. Note this service name as it will be needed when creating an EMR security configuration. 

**Display Name**: The name displayed for this service. The suggested value is **amazonemrs3**.

**Common Name For Certificate**: The CN field within the certificate used to connect to the admin server from a client plugin. This value must match the CN field in the TLS certificate that was created for the plugin.

![\[Ranger Admin edit EMRFS S3 service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-edit-service-EMRFS.png)


**Note**  
The TLS certificate for this plugin should have been registered in the trust store on the Ranger Admin server. See [TLS certificates for Apache Ranger integration with Amazon EMR](emr-ranger-admin-tls.md) for more details.

When the service is created, the Service Manager includes "AMAZON-EMR-EMRFS", as shown in the following image.

![\[Ranger Admin showing new EMRFS S3 service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-new-service-EMRFS.png)


## Creating EMRFS S3 policies
<a name="emr-ranger-emrfs-create-policies"></a>

To create a new policy in the **Create policy** page of the Service Manager, fill in the following fields.

**Policy Name**: The name of this policy.

**Policy Label**: A label that you can put on this policy.

**S3 Resource**: A resource starting with the bucket and optional prefix. See [EMRFS S3 policies usage notes](#emr-ranger-emrfs-considerations) for information on best practices. Resources in Ranger Admin server should not contain **s3://**, **s3a://** or **s3n://**.

![\[Ranger Admin showing create policy for EMRFS S3 service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-policy-EMRFS.png)


You can specify users and groups to grant permissions. You can also specify exclusions for **allow** conditions and **deny** conditions.

![\[Ranger Admin showing user/group permissions for EMRFS S3 policy.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-permissions-EMRFS.png)


**Note**  
A maximum of three resources are allowed for each policy. Adding more than three resources may result in an error when this policy is used on an EMR cluster. Adding more than three policies displays a reminder about the policy limit.
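
The two rules above (no `s3://`, `s3a://`, or `s3n://` scheme in a resource, and at most three resources for each policy) can be checked before you enter values in Ranger Admin. This is a minimal sketch with a hypothetical helper name, `to_policy_resources`. It doesn't call any Ranger API.

```python
MAX_RESOURCES_PER_POLICY = 3  # limit stated in the note above


def to_policy_resources(uris):
    """Strip s3://, s3a://, or s3n:// and enforce the per-policy resource limit."""
    resources = []
    for uri in uris:
        for scheme in ("s3://", "s3a://", "s3n://"):
            if uri.startswith(scheme):
                uri = uri[len(scheme):]
                break
        resources.append(uri)
    if len(resources) > MAX_RESOURCES_PER_POLICY:
        raise ValueError(
            f"a policy allows at most {MAX_RESOURCES_PER_POLICY} resources, "
            f"got {len(resources)}"
        )
    return resources


print(to_policy_resources(["s3://sales-reports/americas/year=2020"]))
```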

## EMRFS S3 policies usage notes
<a name="emr-ranger-emrfs-considerations"></a>

When creating S3 policies within Apache Ranger, there are some usage considerations to be aware of.

### Permissions to multiple S3 objects
<a name="emr-ranger-emrfs-considerations-s3objects"></a>

You can use recursive policies and wildcard expressions to give permissions to multiple S3 objects with common prefixes. Recursive policies give permissions to all objects with a common prefix. Wildcard expressions select multiple prefixes. Together, they give permissions to all objects with multiple common prefixes as shown in the following examples.

**Example Using a recursive policy**  
Suppose you want permissions to list all the parquet files in an S3 bucket organized as follows.  

```
s3://sales-reports/americas/
    +- year=2000
    |      +- data-q1.parquet
    |      +- data-q2.parquet
    +- year=2019
    |      +- data-q1.json
    |      +- data-q2.json
    |      +- data-q3.json
    |      +- data-q4.json
    |
    +- year=2020
    |      +- data-q1.parquet
    |      +- data-q2.parquet
    |      +- data-q3.parquet
    |      +- data-q4.parquet
    |      +- annual-summary.parquet    
    +- year=2021
```
First, consider the parquet files with the prefix `s3://sales-reports/americas/year=2000`. You can grant GetObject permissions to all of them in two ways:  
**Using non-recursive policies**: One option is to use two separate non-recursive policies, one for the directory and the other for the files.   
The first policy grants permission to the prefix `s3://sales-reports/americas/year=2000` (there is no trailing `/`).  

```
- S3 resource = "sales-reports/americas/year=2000"
- permission = "GetObject"
- user = "analyst"
```
The second policy uses a wildcard expression to grant permissions to all the files with the prefix `sales-reports/americas/year=2000/` (note the trailing `/`).  

```
- S3 resource = "sales-reports/americas/year=2000/*"
- permission = "GetObject"
- user = "analyst"
```
**Using a recursive policy**: A more convenient alternative is to use a single recursive policy and grant recursive permission to the prefix.  

```
 - S3 resource = "sales-reports/americas/year=2000"
 - permission = "GetObject"
 - user = "analyst"
 - is recursive = "True"
```
So far, only the parquet files with the prefix `s3://sales-reports/americas/year=2000` have been included. You can now also include the parquet files with a different prefix, `s3://sales-reports/americas/year=2020`, in the same recursive policy by introducing a wildcard expression as follows.  

```
 - S3 resource = "sales-reports/americas/year=20?0"
 - permission = "GetObject"
 - user = "analyst"
 - is recursive = "True"
```
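
To reason about which keys a resource expression covers, you can approximate the matching locally. The following sketch uses Python's `fnmatch` module and treats a recursive resource as also covering everything beneath the matched prefix. It is an approximation for illustration, not the Ranger matcher itself: the function name `resource_matches` is hypothetical, and `fnmatch` also interprets `[...]` character classes, which this guide does not describe.

```python
from fnmatch import fnmatchcase


def resource_matches(resource: str, key: str, recursive: bool = False) -> bool:
    """Approximate whether a policy resource covers an S3 key.

    '*' matches any run of characters and '?' matches exactly one. A
    recursive resource also covers every key beneath the matched prefix.
    """
    if fnmatchcase(key, resource):
        return True
    return recursive and fnmatchcase(key, resource.rstrip("/") + "/*")


keys = [
    "sales-reports/americas/year=2000/data-q1.parquet",
    "sales-reports/americas/year=2019/data-q1.json",
    "sales-reports/americas/year=2020/annual-summary.parquet",
]
for key in keys:
    covered = resource_matches("sales-reports/americas/year=20?0", key, recursive=True)
    print(key, covered)
```

With the bucket layout above, the recursive `year=20?0` resource covers the parquet files under `year=2000` and `year=2020` but not the JSON files under `year=2019`.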

### Policies for PutObject and DeleteObject permissions
<a name="emr-ranger-emrfs-considerations-putobject"></a>

Writing policies for `PutObject` and `DeleteObject` permissions to files on EMRFS requires special care because, unlike `GetObject` permissions, they need additional recursive permissions granted to the prefix.

**Example Policies for PutObject and DeleteObject permissions**  
For example, deleting the file `annual-summary.parquet` requires not only a DeleteObject permission to the actual file.  

```
- S3 resource = "sales-reports/americas/year=2020/annual-summary.parquet"
- permission = "DeleteObject"
- user = "analyst"
```
It also requires a policy granting recursive `GetObject` and `PutObject` permissions to its prefix.  
Similarly, modifying the file `annual-summary.parquet` requires not only a `PutObject` permission to the actual file.  

```
- S3 resource = "sales-reports/americas/year=2020/annual-summary.parquet"
- permission = "PutObject"
- user = "analyst"
```
It also requires a policy granting recursive `GetObject` permission to its prefix.  

```
- S3 resource = "sales-reports/americas/year=2020"
- permission = "GetObject"
- user = "analyst"
- is recursive = "True"
```
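
The pairing rule described above can be summarized in a small helper that lists the policies needed for one object. This is a sketch for planning purposes only: the function `required_policies` and the dictionary keys are hypothetical and are not Ranger API fields.

```python
def required_policies(key: str, action: str, user: str):
    """List the policies this guide describes for writing or deleting one object."""
    prefix = key.rsplit("/", 1)[0]
    if action == "DeleteObject":
        prefix_permissions = ["GetObject", "PutObject"]
    elif action == "PutObject":
        prefix_permissions = ["GetObject"]
    else:
        raise ValueError(f"unsupported action: {action}")
    return [
        # Permission on the object itself.
        {"resource": key, "permissions": [action], "user": user, "recursive": False},
        # Recursive permissions on the object's prefix.
        {"resource": prefix, "permissions": prefix_permissions, "user": user, "recursive": True},
    ]


for policy in required_policies(
    "sales-reports/americas/year=2020/annual-summary.parquet", "DeleteObject", "analyst"
):
    print(policy)
```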

### Wildcards in policies
<a name="emr-ranger-emrfs-considerations-wildcards"></a>

There are two areas in which wildcards can be specified. When specifying an S3 resource, you can use the "\*" and "?" wildcards. The "\*" wildcard matches against an S3 path and matches everything after the prefix. For example, consider the following policy.

```
S3 resource = "sales-reports/americas/*"
```

This matches the following S3 paths.

```
sales-reports/americas/year=2020/
sales-reports/americas/year=2019/
sales-reports/americas/year=2019/month=12/day=1/afile.parquet 
sales-reports/americas/year=2018/month=6/day=1/afile.parquet 
sales-reports/americas/year=2017/afile.parquet
```

The "?" wildcard matches only a single character. For example, consider the following policy.

```
S3 resource = "sales-reports/americas/year=201?/"
```

This matches the following S3 paths.

```
sales-reports/americas/year=2019/
sales-reports/americas/year=2018/
sales-reports/americas/year=2017/
```

### Wildcards in users
<a name="emr-ranger-emrfs-considerations-wildcards-in-users"></a>

There are two built-in wildcards that you can use when assigning users to a policy. The first is the "{USER}" wildcard, which provides access to all users. The second is the "{OWNER}" wildcard, which provides access to the owner of a particular object or directory. However, the "{USER}" wildcard is currently not supported.

## Limitations
<a name="emr-ranger-emrfs-limitations"></a>

The following are current limitations of the EMRFS S3 plugin:
+ An Apache Ranger policy can have at most three resources.
+ Access to S3 must be done through EMRFS and can be used with Hadoop-related applications. The following are not supported:

  - Boto3 libraries

  - AWS SDK and AWS CLI

  - S3A open source connector
+ Apache Ranger deny policies are not supported.
+ Operations on S3 with keys having CSE-KMS encryption are currently not supported.
+ Cross-Region access is not supported.
+ Apache Ranger’s Security Zone feature is not supported. Access control restrictions defined using the Security Zone feature are not applied on your Amazon EMR clusters.
+ The Hadoop user does not generate any audit events as Hadoop always accesses the EC2 Instance Profile.
+ It's recommended that you disable Amazon EMR Consistency View. S3 is strongly consistent, so it's no longer needed. See [Amazon S3 strong consistency](https://aws.amazon.com/s3/consistency/) for more information.
+ The EMRFS S3 plugin makes numerous STS calls. It's advised that you do load testing on a development account and monitor STS call volume. It is also recommended that you make an STS request to raise AssumeRole service limits.
+ The Ranger Admin server doesn't support auto-complete.

# Trino plugin for Ranger integration with Amazon EMR
<a name="emr-ranger-trino"></a>

Trino (previously PrestoSQL) is a SQL query engine that you can use to run queries on data sources such as HDFS, object storage, relational databases, and NoSQL databases. It eliminates the need to migrate data into a central location and allows you to query the data from wherever it sits. Amazon EMR provides an Apache Ranger plugin to provide fine-grained access controls for Trino. The plugin is compatible with open source Apache Ranger Admin server version 2.0 and later.

**Topics**
+ [Supported features](#emr-ranger-trino-features)
+ [Installation of service configuration](#emr-ranger-trino-service-config)
+ [Creating Trino policies](#emr-ranger-trino-create-policies)
+ [Considerations](#emr-ranger-trino-considerations)
+ [Limitations](#emr-ranger-trino-limitations)

## Supported features
<a name="emr-ranger-trino-features"></a>

The Apache Ranger plugin for Trino on Amazon EMR supports all the functionality of the Trino query engine that is protected by fine-grained access control. This includes access controls at the database, table, and column level, as well as row filtering and data masking. Apache Ranger policies can include grant policies and deny policies for users and groups. Audit events are also submitted to CloudWatch Logs.

## Installation of service configuration
<a name="emr-ranger-trino-service-config"></a>

The installation of the Trino service definition requires that the Ranger Admin server be set up. To set up the Ranger Admin server, see [Set up a Ranger Admin server to integrate with Amazon EMR](emr-ranger-admin.md).

Follow these steps to install the Trino service definition.

1. SSH into the Apache Ranger Admin server.

   ```
   ssh ec2-user@ip-xxx-xxx-xxx-xxx.ec2.internal
   ```

   

1. Uninstall the Presto server plugin, if it exists. Run the following command. If this errors out with a “Service not found” error, this means the Presto server plugin wasn't installed on your server. Proceed to the next step.

   ```
   curl -f -u <admin users login>:<password for ranger admin user> -X DELETE -k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef/name/presto'
   ```

1. Download the service definition and Apache Ranger Admin server plugin. In a temporary directory, download the service definition. This service definition is supported by Ranger 2.x versions.

   ```
   wget https://s3.amazonaws.com/elasticmapreduce/ranger/service-definitions/version-2.0/ranger-servicedef-amazon-emr-trino.json
   ```

1. Register the Trino service definition for Amazon EMR.

   ```
   curl -u <admin users login>:<password for ranger admin user> -X POST -d @ranger-servicedef-amazon-emr-trino.json \
   -H "Accept: application/json" \
   -H "Content-Type: application/json" \
   -k 'https://<RANGER SERVER ADDRESS>:6182/service/public/v2/api/servicedef'
   ```

   If this command runs successfully, you see a new service in your Ranger Admin UI called `TRINO`, as shown in the following image.  
![\[Ranger Admin create service.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-service-trino.png)

1. Create an instance of the `TRINO` application, entering the following information.

   **Service Name**: The service name that you'll use. The suggested value is `amazonemrtrino`. Note this service name, as it will be needed when creating an Amazon EMR security configuration.

   **Display Name**: The name to be displayed for this instance. The suggested value is `amazonemrtrino`.  
![\[Ranger Admin display name.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-display-name-trino.png)

   **jdbc.driver.ClassName**: The class name of the JDBC driver for Trino connectivity. You can use the default value.

   **jdbc.url**: The JDBC connection string to use when connecting to the Trino coordinator.

   **Common Name For Certificate**: The CN field within the certificate used to connect to the admin server from a client plugin. This value must match the CN field in your TLS certificate that was created for the plugin.  
![\[Ranger Admin common name.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-common-name-trino.png)

   Note that the TLS certificate for this plugin should have been registered in the trust store on the Ranger Admin server. For more information, see [TLS certificates](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-ranger-admin-tls.html).
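
The uninstall step above expects the `DELETE` call to fail when the Presto plugin was never installed. As a minimal sketch of how that step could be scripted, the following helpers build a Ranger Admin URL and interpret the exit code of `curl -f`. The function names and the `RANGER_USER` and `RANGER_HOST` variables are hypothetical and not part of Amazon EMR or Apache Ranger. The sketch relies on curl returning exit code 22 when `-f` is set and the server answers with an HTTP status of 400 or above.

```shell
# Hypothetical helpers; the names are illustrative only.

# Build a Ranger Admin public v2 API URL from a host name and an API path.
ranger_api_url() {
  printf 'https://%s:6182/service/public/v2/api/%s\n' "$1" "$2"
}

# Translate the exit code of `curl -f ... -X DELETE` into a next action.
describe_delete_result() {
  case "$1" in
    0)  echo "removed" ;;
    22) echo "http-error-continue" ;;   # for example, "Service not found"
    *)  echo "connection-problem" ;;    # DNS, TLS, or network failure
  esac
}

# Usage (not run here). Passing only the user name to -u makes curl
# prompt for the password instead of exposing it on the command line.
#   curl -f -u "$RANGER_USER" -X DELETE -k \
#     "$(ranger_api_url "$RANGER_HOST" servicedef/name/presto)"
#   describe_delete_result $?
```

Only the `http-error-continue` case matches the situation the procedure describes as safe to ignore. A `connection-problem` result suggests checking the server address and TLS setup before continuing.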

## Creating Trino policies
<a name="emr-ranger-trino-create-policies"></a>

When you create a new policy, fill in the following fields.

**Policy Name**: The name of this policy.

**Policy Label**: A label that you can put on this policy.

**Catalog**: The catalog that this policy applies to. The wildcard "\*" represents all catalogs.

**Schema**: The schemas that this policy applies to. The wildcard "\*" represents all schemas.

**Table**: The tables that this policy applies to. The wildcard "\*" represents all tables.

**Column**: The columns that this policy applies to. The wildcard "\*" represents all columns.

**Description**: A description of this policy.

Other types of policies exist for the **Trino User** (for user impersonation access), the **Trino System/Session Property** (for altering engine system or session properties), **Functions/Procedures** (for allowing function or procedure calls), and the **URL** (for granting read/write access to the engine on data locations).

![\[Ranger Admin create policy details.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-policy-details-trino.png)


To grant permissions to specific users and groups, enter the users and groups. You can also specify exclusions for **allow** conditions and **deny** conditions.

![\[Ranger Admin policy details allow deny conditions.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-create-policy-allow-conditions-trino.png)


After specifying the allow and deny conditions, choose **Save**.
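
The same fields can also be expressed as a JSON document for the Ranger Admin REST API. The following is a rough sketch only: the function name is hypothetical, and the resource keys (`catalog`, `schema`, `table`, `column`) and the `select` access type are assumptions based on the form fields above. Check them against the service definition you registered before using them.

```shell
# Print a Ranger policy document as JSON.
# Arguments: service name, policy name, catalog, schema, table, user.
# Assumption: resource keys and the "select" access type mirror the UI
# fields; they are not confirmed by the service definition itself.
trino_policy_json() {
  cat <<EOF
{
  "service": "$1",
  "name": "$2",
  "isEnabled": true,
  "resources": {
    "catalog": { "values": ["$3"] },
    "schema":  { "values": ["$4"] },
    "table":   { "values": ["$5"] },
    "column":  { "values": ["*"] }
  },
  "policyItems": [
    {
      "users": ["$6"],
      "accesses": [ { "type": "select", "isAllowed": true } ]
    }
  ]
}
EOF
}

# Usage (not run here): write the document to a file, then POST it.
#   trino_policy_json amazonemrtrino sales-read hive sales orders analyst1 > policy.json
#   curl -u "$RANGER_USER" -X POST -d @policy.json \
#     -H "Content-Type: application/json" \
#     -k "https://$RANGER_HOST:6182/service/public/v2/api/policy"
```

The catalog, schema, table, and user names in the usage comment are placeholders.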

## Considerations
<a name="emr-ranger-trino-considerations"></a>

When creating Trino policies within Apache Ranger, there are some usage considerations to be aware of.

**Hive metadata server**

The Hive metadata server can only be accessed by trusted engines, specifically the Trino engine, to protect against unauthorized access. The Hive metadata server is also accessed by all nodes on the cluster. The required port 9083 provides all nodes access to the main node.

**Authentication**

By default, Trino is configured to authenticate using Kerberos as configured in the Amazon EMR security configuration.

**In-transit encryption required**

The Trino plugin requires you to have in-transit encryption enabled in the Amazon EMR security configuration. To enable encryption, see [Encryption in transit](emr-data-encryption-options.md#emr-encryption-intransit).

## Limitations
<a name="emr-ranger-trino-limitations"></a>

The following are current limitations of the Trino plugin:
+ Ranger Admin server doesn't support auto-complete.

# Apache Ranger troubleshooting
<a name="emr-ranger-troubleshooting"></a>

Here are some commonly diagnosed issues related to using Apache Ranger.

## Recommendations
<a name="emr-ranger-troubleshooting-recommendations"></a>
+ **Test using a cluster with a single main node.** A cluster with a single main node provisions faster than a multi-node cluster, which shortens each testing iteration.
+ **Set development mode on the cluster.** When starting your EMR cluster, set the `--additional-info` parameter to:

  `'{"clusterType":"development"}'`

  This parameter can only be set through the AWS CLI or AWS SDK and is not available through the Amazon EMR console. When this flag is set and the main node fails to provision, the Amazon EMR service keeps the cluster alive for some time before it decommissions it. This window is useful for inspecting log files before the cluster is terminated.
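
As a sketch of how the development-mode flag might be passed with the AWS CLI, the following helper prints a `create-cluster` call rather than running it. The function name and the example values are hypothetical, and the options a real cluster needs (instances, roles, applications, and so on) are deliberately left out.

```shell
# Value for --additional-info, kept in one variable to avoid quoting mistakes.
ADDITIONAL_INFO='{"clusterType":"development"}'

# Print the AWS CLI call instead of running it, so it can be reviewed first.
# Arguments: cluster name, release label, security configuration name.
# This is not a complete command: other required options are omitted.
print_dev_cluster_command() {
  printf "aws emr create-cluster --name '%s' --release-label '%s' --security-configuration '%s' --additional-info '%s'\n" \
    "$1" "$2" "$3" "$ADDITIONAL_INFO"
}

# Usage with placeholder values:
#   print_dev_cluster_command ranger-test emr-6.9.0 my-ranger-config
```

Wrapping the JSON in single quotes keeps the shell from interpreting the braces and double quotes.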

# EMR cluster failed to provision
<a name="emr-ranger-troubleshooting-cluster-failed"></a>

There are several reasons why an Amazon EMR cluster may fail to start. Here are a few ways to diagnose the issue.

**Check EMR provisioning logs**

Amazon EMR uses Puppet to install and configure applications on a cluster. The Puppet logs show whether any errors occurred during the provisioning phase of a cluster. The logs are accessible on the cluster, or in S3 if the cluster is configured to push logs to S3.

The logs are stored in `/var/log/provision-node/apps-phase/0/{UUID}/puppet.log` on the disk and in `s3://<LOG LOCATION>/<CLUSTER ID>/node/<EC2 INSTANCE ID>/provision-node/apps-phase/0/{UUID}/puppet.log.gz`.
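
To narrow a long Puppet log down to its failures, a small filter can help. The sketch below assumes that error lines carry the `(err)` marker, as the messages in the following table do. The function name is hypothetical.

```shell
# Print only the error lines from a Puppet provisioning log.
# Argument: path to a puppet.log file (plain text, already decompressed).
# Assumption: error lines contain the literal marker "(err)".
puppet_errors() {
  grep -F '(err)' "$1"
}

# Usage on a cluster node (the UUID directory differs per node, hence the glob):
#   for f in /var/log/provision-node/apps-phase/0/*/puppet.log; do
#     puppet_errors "$f"
#   done
```

Note that `grep` returns a non-zero exit status when no line matches, which in this case means no errors were found.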

**Common Error Messages**


| Error message | Cause | 
| --- | --- | 
|  Puppet (err): Systemd start for emr-record-server failed! journalctl log for emr-record-server:  |  EMR Record Server failed to start. See EMR Record Server logs below.  | 
|  Puppet (err): Systemd start for emr-record-server failed! journalctl log for emrsecretagent:  |  EMR Secret Agent failed to start. See Check Secret Agent logs below.  | 
|  /Stage[main]/Ranger_plugins::Ranger_hive_plugin/Ranger_plugins::Prepare_two_way_tls[configure 2-way TLS in Hive plugin]/Exec[create keystore and truststore for Ranger Hive plugin]/returns (notice): 140408606197664:error:0906D06C:PEM routines:PEM_read_bio:no start line:pem_lib.c:707:Expecting: ANY PRIVATE KEY  |  The private TLS certificate in Secrets Manager for the Apache Ranger plugin certificate is not in the correct format or is not a private certificate. See [TLS certificates for Apache Ranger integration with Amazon EMR](emr-ranger-admin-tls.md) for certificate formats.  | 
|  /Stage[main]/Ranger_plugins::Ranger_s3_plugin/Ranger_plugins::Prepare_two_way_tls[configure 2-way TLS in Ranger s3 plugin]/Exec[create keystore and truststore for Ranger amazon-emr-s3 plugin]/returns (notice): An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:sts::XXXXXXXXXXX:assumed-role/EMR_EC2_DefaultRole/i-XXXXXXXXXXXX is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:us-east-1:XXXXXXXXXX:secret:AdminServer-XXXXX  |  The EC2 instance profile role does not have the correct permissions to retrieve the TLS certificates from Secrets Manager.  | 

**Check SecretAgent logs**

Secret Agent logs are located at `/emr/secretagent/log/` on an EMR node, or in the `s3://<LOG LOCATION>/<CLUSTER ID>/node/<EC2 INSTANCE ID>/daemons/secretagent/` directory in S3.

**Common Error Messages**


| Error message | Cause | 
| --- | --- | 
|  Exception in thread "main" com.amazonaws.services.securitytoken.model.AWSSecurityTokenServiceException: User: arn:aws:sts::XXXXXXXXXXXX:assumed-role/EMR_EC2_DefaultRole/i-XXXXXXXXXXXXXXX is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::XXXXXXXXXXXX:role/*RangerPluginDataAccessRole* (Service: AWSSecurityTokenService; Status Code: 403; Error Code: AccessDenied; Request ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX; Proxy: null)  |  This exception means that the EMR EC2 instance profile role does not have permissions to assume the role **RangerPluginDataAccessRole**. See [IAM roles for native integration with Apache Ranger](emr-ranger-iam.md).  | 
|  ERROR qtp54617902-149: Web App Exception Occurred javax.ws.rs.NotAllowedException: HTTP 405 Method Not Allowed  |  These errors can be safely ignored.  | 

**Check Record Server Logs (for SparkSQL)**

EMR Record Server logs are available at `/var/log/emr-record-server/` on an EMR node, or in the `s3://<LOG LOCATION>/<CLUSTER ID>/node/<EC2 INSTANCE ID>/daemons/emr-record-server/` directory in S3.
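
The Secret Agent and Record Server logs share the same S3 layout and differ only in the final directory. As a small sketch, the following hypothetical helper assembles the prefix from its parts so that it can be passed to a command such as `aws s3 ls`. The bucket, cluster ID, and instance ID in the usage comment are placeholders.

```shell
# Build the S3 prefix for a daemon's logs.
# Arguments: log location (bucket and optional prefix, without "s3://"),
#            cluster ID, EC2 instance ID,
#            daemon directory (secretagent or emr-record-server).
daemon_log_prefix() {
  printf 's3://%s/%s/node/%s/daemons/%s/\n' "$1" "$2" "$3" "$4"
}

# Usage (not run here):
#   aws s3 ls "$(daemon_log_prefix my-bucket/logs j-EXAMPLE i-0123456789abcdef0 secretagent)"
```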

**Common Error Messages**


| Error message | Cause | 
| --- | --- | 
|  InstanceMetadataServiceResourceFetcher:105 - [] Fail to retrieve token com.amazonaws.SdkClientException: Failed to connect to service endpoint   |  The EMR SecretAgent failed to come up or is having an issue. Inspect the SecretAgent logs for errors, and check the Puppet logs to determine whether there were any provisioning errors.  | 

# Queries are unexpectedly failing for Ranger integration with Amazon EMR
<a name="emr-ranger-troubleshooting-queries-failed"></a>

**Check Apache Ranger plugin logs (Apache Hive, EMR RecordServer, EMR SecretAgent, etc., logs)**

This section is common across all applications that integrate with the Ranger plugin, such as Apache Hive, EMR Record Server, and EMR SecretAgent.

**Common Error Messages**


| Error message | Cause | 
| --- | --- | 
|  ERROR PolicyRefresher:272 - [] PolicyRefresher(serviceName=policy-repository): failed to find service. Will clean up local cache of policies (-1)   |  This error message means that the service name you provided in the EMR security configuration does not match a service policy repository in the Ranger Admin server.  | 

For example, if your AMAZON-EMR-SPARK service in the Ranger Admin server looks like the following image, enter **amazonemrspark** as the service name.

![\[Ranger Admin server showing AMAZON-EMR-SPARK troubleshooting.\]](http://docs.aws.amazon.com/emr/latest/ManagementGuide/images/ranger-amazon-emr-spark-troubleshooting.png)
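
To find out which service name a plugin is actually requesting, you can pull it out of the `PolicyRefresher` error shown in the table above. The following sketch assumes the log line has exactly that shape. The function name and the log path in the usage comment are hypothetical.

```shell
# Print the distinct service names that a plugin failed to find.
# Reads the log files given as arguments, or standard input if none are given.
# Assumption: lines follow the "PolicyRefresher(serviceName=...): failed to
# find service" pattern from the error table.
missing_ranger_services() {
  sed -n 's/.*PolicyRefresher(serviceName=\([^)]*\)): failed to find service.*/\1/p' "$@" | sort -u
}

# Usage (not run here; the path is a placeholder for your plugin log file):
#   missing_ranger_services /path/to/plugin.log
```

Compare each printed name with the service names listed in the Ranger Admin UI and with the value in your Amazon EMR security configuration.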
