# Apache Hive
<a name="emr-hive"></a>

Hive is an open-source data warehouse and analytics package that runs on top of a Hadoop cluster. Hive scripts use an SQL-like language called Hive QL (query language) that abstracts programming models and supports typical data warehouse interactions. With Hive, you can avoid the complexity of writing Tez jobs based on directed acyclic graphs (DAGs) or MapReduce programs in a lower-level programming language, such as Java.

Hive extends the SQL paradigm by including serialization formats. You can also customize query processing by creating table schemas that match your data, without touching the data itself. While SQL only supports primitive value types (such as dates, numbers, and strings), Hive table values are structured elements, such as JSON objects, any user-defined data type, or any function written in Java.

For more information about Hive, see [http://hive.apache.org/](http://hive.apache.org/).

The following table lists the version of Hive included in the latest release of the Amazon EMR 7.x series, along with the components that Amazon EMR installs with Hive.

For the version of components installed with Hive in this release, see [Release 7.12.0 Component Versions](emr-7120-release.md).


**Hive version information for emr-7.12.0**  

| Amazon EMR Release Label | Hive Version | Components Installed With Hive | 
| --- | --- | --- | 
| emr-7.12.0 | Hive 3.1.3-amzn-21 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 

The following table lists the version of Hive included in the latest release of the Amazon EMR 6.x series, along with the components that Amazon EMR installs with Hive.

For the version of components installed with Hive in this release, see [Release 6.15.0 Component Versions](emr-6150-release.md).


**Hive version information for emr-6.15.0**  

| Amazon EMR Release Label | Hive Version | Components Installed With Hive | 
| --- | --- | --- | 
| emr-6.15.0 | Hive 3.1.3-amzn-8 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 

The following table lists the version of Hive included in the latest release of the Amazon EMR 5.x series, along with the components that Amazon EMR installs with Hive.

For the version of components installed with Hive in this release, see [Release 5.36.2 Component Versions](emr-5362-release.md).


**Hive version information for emr-5.36.2**  

| Amazon EMR Release Label | Hive Version | Components Installed With Hive | 
| --- | --- | --- | 
| emr-5.36.2 | Hive 2.3.9-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 

Beginning with Amazon EMR 5.18.0, you can use the Amazon EMR artifact repository to build your job code against the exact versions of libraries and dependencies that are available with specific Amazon EMR releases. For more information, see [Checking dependencies using the Amazon EMR artifact repository](emr-artifact-repository.md).

**Topics**
+ [Differences and considerations for Hive on Amazon EMR](emr-hive-differences.md)
+ [Configuring an external metastore for Hive](emr-metastore-external-hive.md)
+ [Use the Hive JDBC driver](HiveJDBCDriver.md)
+ [Improve Hive performance](emr-hive-s3-performance.md)
+ [Using Hive Live Long and Process (LLAP)](emr-hive-llap.md)
+ [Encryption in Hive](hive-encryption.md)
+ [Hive release history](Hive-release-history.md)

# Differences and considerations for Hive on Amazon EMR
<a name="emr-hive-differences"></a>

## Differences between Apache Hive on Amazon EMR and Apache Hive
<a name="emr-hive-apache-diff"></a>

This section describes the differences between Hive on Amazon EMR and the default versions of Hive available at [http://svn.apache.org/viewvc/hive/branches/](http://svn.apache.org/viewvc/hive/branches/). 

### Hive authorization
<a name="emr-hive-authorization"></a>

 Amazon EMR supports [Hive authorization](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization) for HDFS but not for EMRFS and Amazon S3. Amazon EMR clusters run with authorization disabled by default.

### Hive file merge behavior with Amazon S3
<a name="emr-hive-filemerge"></a>

Apache Hive merges small files at the end of a map-only job if `hive.merge.mapfiles` is `true`, and it triggers the merge only if the average output size of the job is less than the `hive.merge.smallfiles.avgsize` setting. Hive on Amazon EMR behaves in exactly the same way if the final output path is in HDFS. If the output path is in Amazon S3, the `hive.merge.smallfiles.avgsize` parameter is ignored. In that situation, the merge task is always triggered if `hive.merge.mapfiles` is set to `true`.
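
As a minimal sketch, both merge properties can be supplied at cluster creation through the `hive-site` configuration classification. The threshold shown here is illustrative rather than a recommendation. It is expressed in bytes, and it only influences jobs whose final output path is in HDFS.

```
[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.merge.mapfiles": "true",
      "hive.merge.smallfiles.avgsize": "16000000"
    }
  }
]
```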

### ACID transactions and Amazon S3
<a name="emr-hive-acid"></a>

Amazon EMR 6.1.0 and later support Hive ACID (atomicity, consistency, isolation, durability) transactions, so Hive complies with the ACID properties of a database. With this feature, you can run INSERT, UPDATE, DELETE, and MERGE operations in Hive managed tables with data in Amazon Simple Storage Service (Amazon S3).
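
The Apache Hive transactions documentation lists client-side properties that need to be set before ACID operations work. The following sketch places those upstream properties in a `hive-site` classification. Treat it as an assumption drawn from Apache Hive rather than a confirmed Amazon EMR default, and check the ACID guidance for your release before relying on it.

```
[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.support.concurrency": "true",
      "hive.exec.dynamic.partition.mode": "nonstrict",
      "hive.txn.manager": "org.apache.hadoop.hive.ql.lockmgr.DbTxnManager"
    }
  }
]
```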

### Hive Live Long and Process (LLAP)
<a name="emr-hive-LLAP"></a>

[LLAP functionality](https://cwiki.apache.org/confluence/display/Hive/LLAP) added in version 2.0 of default Apache Hive is not supported in Hive 2.1.0 on Amazon EMR release 5.0.

Amazon EMR version 6.0.0 and later supports the Live Long and Process (LLAP) functionality for Hive. For more information, see [Using Hive LLAP](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-llap.html). 

## Differences in Hive between Amazon EMR release version 4.x and 5.x
<a name="emr-hive-diff"></a>

This section covers differences to consider before you migrate a Hive implementation from Hive version 1.0.0 on Amazon EMR release 4.x to Hive 2.x on Amazon EMR release 5.x.

### Operational differences and considerations
<a name="emr-hive-diffs-ops"></a>
+ **Support added for [ACID (atomicity, consistency, isolation, and durability) transactions](https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions):** This difference between Hive 1.0.0 on Amazon EMR 4.x and default Apache Hive has been eliminated.
+ **Direct writes to Amazon S3 eliminated:** This difference between Hive 1.0.0 on Amazon EMR and the default Apache Hive has been eliminated. Hive 2.1.0 on Amazon EMR release 5.x now creates, reads from, and writes to temporary files stored in Amazon S3. As a result, to read from and write to the same table you no longer have to create a temporary table in the cluster's local HDFS file system as a workaround. If you use versioned buckets, be sure to manage these temporary files as described below.
+ **Manage temp files when using Amazon S3 versioned buckets:** When you run Hive queries where the destination of generated data is Amazon S3, many temporary files and directories are created. This is new behavior as described earlier. If you use versioned S3 buckets, these temp files clutter Amazon S3 and incur cost if they're not deleted. Adjust your lifecycle rules so that data with a `/_tmp` prefix is deleted after a short period, such as five days. See [Specifying a lifecycle configuration](https://docs.aws.amazon.com/AmazonS3/latest/userguide/how-to-set-lifecycle-configuration-intro.html) for more information.
+ **Log4j updated to log4j 2:** If you use log4j, you may need to change your logging configuration because of this upgrade. See [Apache log4j 2](http://logging.apache.org/log4j/2.x/) for details.

### Performance differences and considerations
<a name="emr-hive-diffs-perf"></a>
+ **Performance differences with Tez:** With Amazon EMR release 5.x, Tez is the default execution engine for Hive instead of MapReduce. Tez provides improved performance for most workflows.
+ **Tables with many partitions:** Queries that generate a large number of dynamic partitions may fail, and queries that select from tables with many partitions may take longer than expected to execute. For example, a select from 100,000 partitions may take 10 minutes or more.
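
To compare a migrated workload against its earlier behavior, the execution engine can be set explicitly. The sketch below uses the standard Hive property `hive.execution.engine` with the value `mr` to select MapReduce in place of Tez. Whether MapReduce remains a practical choice depends on your Hive version, so treat this as a migration aid under that assumption, not a recommended setting.

```
[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.execution.engine": "mr"
    }
  }
]
```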

## Additional features of Hive on Amazon EMR
<a name="emr-hive-additional-features"></a>

Amazon EMR extends Hive with new features that support Hive integration with other AWS services, such as the ability to read from and write to Amazon Simple Storage Service (Amazon S3) and DynamoDB.

### Variables in Hive
<a name="emr-hive-variables"></a>

 You can include variables in your scripts by using the dollar sign and curly braces. 

```
add jar ${LIB}/jsonserde.jar
```

 You pass the values of these variables to Hive on the command line using the `-d` parameter, as in the following example: 

```
-d LIB=s3://elasticmapreduce/samples/hive-ads/lib
```

 You can also pass the values into steps that execute Hive scripts. 

**To pass variable values into Hive steps using the console**

1. Open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr/).

1. Choose **Create cluster**.

1. In the **Steps** section, for **Add Step**, choose **Hive Program** from the list and **Configure and add**.

1.  In the **Add Step** dialog, specify the parameters using the following table as a guide, and then choose **Add**.     
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hive-differences.html)

1. Select values as necessary and choose **Create cluster**.

**To pass variable values into Hive steps using the AWS CLI**

To pass variable values into Hive steps using the AWS CLI, use the `--steps` parameter and include an arguments list.
+ 
**Note**  
Linux line continuation characters (`\`) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

  ```
  aws emr create-cluster --name "Test cluster" --release-label emr-7.12.0 \
  --applications Name=Hive Name=Pig --use-default-roles --ec2-attributes KeyName=myKey --instance-type m5.xlarge --instance-count 3 \
  --steps Type=Hive,Name="Hive Program",ActionOnFailure=CONTINUE,Args=[-f,s3://elasticmapreduce/samples/hive-ads/libs/response-time-stats.q,-d,INPUT=s3://elasticmapreduce/samples/hive-ads/tables,-d,OUTPUT=s3://amzn-s3-demo-bucket/hive-ads/output/,-d,SAMPLE=s3://elasticmapreduce/samples/hive-ads/]
  ```

  For more information on using Amazon EMR commands in the AWS CLI, see [https://docs.aws.amazon.com/cli/latest/reference/emr](https://docs.aws.amazon.com/cli/latest/reference/emr).

**To pass variable values into Hive steps using the Java SDK**
+ The following example demonstrates how to pass variables into steps using the SDK. For more information, see [Class StepFactory](https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/elasticmapreduce/util/StepFactory.html) in the *AWS SDK for Java API Reference*. 

  ```
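  import com.amazonaws.services.elasticmapreduce.model.StepConfig;
  import com.amazonaws.services.elasticmapreduce.util.StepFactory;

  StepFactory stepFactory = new StepFactory();

  // Pass each -d NAME=value pair as trailing arguments to the Hive script step.
  StepConfig runHive = new StepConfig()
    .withName("Run Hive Script")
    .withActionOnFailure("TERMINATE_JOB_FLOW")
    .withHadoopJarStep(stepFactory.newRunHiveScriptStep(
      "s3://amzn-s3-demo-bucket/script.q",
      "-d", "LIB=s3://elasticmapreduce/samples/hive-ads/lib"));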
  ```

### Amazon EMR Hive queries to accommodate partial DynamoDB schemas
<a name="emr-hive-partial-schema"></a>

Amazon EMR Hive lets you query DynamoDB tables by specifying only a subset of columns on which to filter data, rather than requiring your query to include all columns. This partial schema query technique is effective when you have a sparse database schema and want to filter records based on a few columns, such as filtering on timestamps.

 The following example shows how to use a Hive query to: 
+ Create a DynamoDB table.
+ Select a subset of items (rows) in DynamoDB and further narrow the data to certain columns.
+ Copy the resulting data to Amazon S3. 

```
DROP TABLE dynamodb; 
DROP TABLE s3;

CREATE EXTERNAL TABLE dynamodb(hashKey STRING, recordTimeStamp BIGINT, fullColumn map<String, String>)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' 
    TBLPROPERTIES ( 
     "dynamodb.table.name" = "myTable",
     "dynamodb.throughput.read.percent" = ".1000", 
     "dynamodb.column.mapping" = "hashKey:HashKey,recordTimeStamp:RangeKey"); 

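-- The Amazon S3 table stores each exported item as a single map column.
CREATE EXTERNAL TABLE s3(item map<String, String>)
     ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
     LOCATION 's3://bucketname/path/subpath/';

-- Copy only the full-item map for rows that match the range-key filter.
INSERT OVERWRITE TABLE s3 SELECT fullColumn FROM dynamodb WHERE recordTimeStamp < "2012-01-01";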
```

The following table shows the query syntax for selecting any combination of items from DynamoDB.


| Query example | Result description | 
| --- | --- | 
| `SELECT * FROM table_name;` | Selects all items (rows) from a given table and includes data from all columns available for those items. | 
| `SELECT * FROM table_name WHERE field_name = value;` | Selects some items (rows) from a given table and includes data from all columns available for those items. | 
| `SELECT column1_name, column2_name, column3_name FROM table_name;` | Selects all items (rows) from a given table and includes data from some columns available for those items. | 
| `SELECT column1_name, column2_name, column3_name FROM table_name WHERE field_name = value;` | Selects some items (rows) from a given table and includes data from some columns available for those items. | 

### Copy data between DynamoDB tables in different AWS Regions
<a name="emr-hive-cross-region-ddb-copy"></a>

Amazon EMR Hive provides a `dynamodb.region` property that you can set per DynamoDB table. When `dynamodb.region` is set differently on two tables, any data that you copy between the tables is automatically transferred between the specified Regions.

 The following example shows you how to create a DynamoDB table with a Hive script that sets the `dynamodb.region` property:

**Note**  
Per-table region properties override the global Hive properties.

```
CREATE EXTERNAL TABLE dynamodb(hashKey STRING, recordTimeStamp BIGINT, fullColumn map<String, String>)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' 
    TBLPROPERTIES ( 
     "dynamodb.table.name" = "myTable",
     "dynamodb.region" = "eu-west-1", 
     "dynamodb.throughput.read.percent" = ".1000", 
     "dynamodb.column.mapping" = "hashKey:HashKey,recordTimeStamp:RangeKey");
```

### Set DynamoDB throughput values per table
<a name="emr-hive-set-ddb-throughput"></a>

Amazon EMR Hive enables you to set the DynamoDB readThroughputPercent and writeThroughputPercent settings on a per-table basis in the table definition, using the `dynamodb.throughput.read.percent` and `dynamodb.throughput.write.percent` table properties. The following Amazon EMR Hive script shows how to set the throughput values. For more information about DynamoDB throughput values, see [Specifying read and write requirements for tables](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/WorkingWithDDTables.html#ProvisionedThroughput). 

```
CREATE EXTERNAL TABLE dynamodb(hashKey STRING, recordTimeStamp BIGINT, fullColumn map<String, String>)
    STORED BY 'org.apache.hadoop.hive.dynamodb.DynamoDBStorageHandler' 
    TBLPROPERTIES ( 
     "dynamodb.table.name" = "myTable",
     "dynamodb.throughput.read.percent" = ".4",
     "dynamodb.throughput.write.percent" = "1.0",
     "dynamodb.column.mapping" = "hashKey:HashKey,recordTimeStamp:RangeKey");
```

# Configuring an external metastore for Hive
<a name="emr-metastore-external-hive"></a>

By default, Hive records metastore information in a MySQL database on the primary node's file system. The metastore contains a description of the table and the underlying data on which it is built, including the partition names, data types, and so on. When a cluster terminates, all cluster nodes shut down, including the primary node. When this happens, local data is lost because node file systems use ephemeral storage. If you need the metastore to persist, you must create an *external metastore* that exists outside the cluster.

You have two options for an external metastore:
+ AWS Glue Data Catalog (Amazon EMR release 5.8.0 or later only).

  For more information, see [Using the AWS Glue Data Catalog as the metastore for Hive](emr-hive-metastore-glue.md).
+ Amazon RDS or Amazon Aurora.

  For more information, see [Using an external MySQL database or Amazon Aurora](emr-hive-metastore-external.md).

**Note**  
If you're using Hive 3 and encounter too many connections to the Hive metastore, configure the parameter `datanucleus.connectionPool.maxPoolSize` to have a smaller value, or increase the number of connections that the database server can handle. The increased number of connections is due to the way Hive computes the maximum number of JDBC connections. To calculate the optimal value for performance, see [Hive Configuration Properties](https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-datanucleus.connectionPool.maxPoolSize.1).
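
As a sketch of the first option in the note above, the pool size can be lowered through the `hive-site` configuration classification. The value `5` is purely illustrative. Derive a suitable number from the Hive configuration reference and from the connection limit of your database server.

```
[
  {
    "Classification": "hive-site",
    "Properties": {
      "datanucleus.connectionPool.maxPoolSize": "5"
    }
  }
]
```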

# Using the AWS Glue Data Catalog as the metastore for Hive
<a name="emr-hive-metastore-glue"></a>

Using Amazon EMR release 5.8.0 or later, you can configure Hive to use the AWS Glue Data Catalog as its metastore. We recommend this configuration when you require a persistent metastore or a metastore shared by different clusters, services, applications, or AWS accounts.

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it simple and cost-effective to categorize your data, clean it, enrich it, and move it reliably between various data stores. The AWS Glue Data Catalog provides a unified metadata repository across a variety of data sources and data formats, integrating with Amazon EMR as well as Amazon RDS, Amazon Redshift, Redshift Spectrum, Athena, and any application compatible with the Apache Hive metastore. AWS Glue crawlers can automatically infer schema from source data in Amazon S3 and store the associated metadata in the Data Catalog. For more information about the Data Catalog, see [Populating the AWS Glue Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/populate-data-catalog.html) in the *AWS Glue Developer Guide*.

Separate charges apply for AWS Glue. There is a monthly rate for storing and accessing the metadata in the Data Catalog, an hourly rate billed per minute for AWS Glue ETL jobs and crawler runtime, and an hourly rate billed per minute for each provisioned development endpoint. The Data Catalog allows you to store up to a million objects at no charge. If you store more than a million objects, you are charged USD $1 for each 100,000 objects over a million. An object in the Data Catalog is a table, partition, or database. For more information, see [Glue Pricing](https://aws.amazon.com/glue/pricing).

**Important**  
If you created tables using Amazon Athena or Amazon Redshift Spectrum before August 14, 2017, databases and tables are stored in an Athena-managed catalog, which is separate from the AWS Glue Data Catalog. To integrate Amazon EMR with these tables, you must upgrade to the AWS Glue Data Catalog. For more information, see [Upgrading to the AWS Glue Data Catalog](https://docs.aws.amazon.com/athena/latest/ug/glue-upgrade.html) in the *Amazon Athena User Guide*.

## Specifying AWS Glue Data Catalog as the metastore
<a name="emr-hive-glue-configure"></a>

You can specify the AWS Glue Data Catalog as the metastore using the AWS Management Console, AWS CLI, or Amazon EMR API. When you use the CLI or API, you use the configuration classification for Hive to specify the Data Catalog. In addition, with Amazon EMR 5.16.0 and later, you can use the configuration classification to specify a Data Catalog in a different AWS account. When you use the console, you can specify the Data Catalog using **Advanced Options** or **Quick Options**.

------
#### [ Console ]

**To specify AWS Glue Data Catalog as the Hive metastore with the console**

1. Sign in to the AWS Management Console, and open the Amazon EMR console at [https://console.aws.amazon.com/emr](https://console.aws.amazon.com/emr).

1. Under **EMR on EC2** in the left navigation pane, choose **Clusters**, and then choose **Create cluster**.

1. Under **Application bundle**, choose **Core Hadoop**, **HBase**, or **Custom**. If you customize your cluster, make sure that you select Hive or HCatalog as one of your applications.

1. Under **AWS Glue Data Catalog settings**, select the **Use for Hive table metadata** check box.

1. Choose any other options that apply to your cluster. 

1. To launch your cluster, choose **Create cluster**.

------
#### [ CLI ]

**To specify the AWS Glue Data Catalog as the Hive metastore with the AWS CLI**

For more information about specifying a configuration classification using the AWS CLI and EMR API, see [Configure applications](emr-configure-apps.md).
+ Specify the value for `hive.metastore.client.factory.class` using the `hive-site` configuration classification as shown in the following example:

  ```
  [
    {
      "Classification": "hive-site",
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
      }
    }
  ]
  ```

  On EMR release versions 5.28.0, 5.28.1, 5.29.0, or 6.x, if you're creating a cluster using the AWS Glue Data Catalog as the metastore, set the `hive.metastore.schema.verification` to `false`. This prevents Hive and HCatalog from validating the metastore schema against MySQL. Without this configuration, the primary instance group will become suspended after reconfiguration on Hive or HCatalog. 

  ```
  [
    {
      "Classification": "hive-site",
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "hive.metastore.schema.verification": "false"
      }
    }
  ]
  ```

  If you already have a cluster on EMR release version 5.28.0, 5.28.1, or 5.29.0, you can set `hive.metastore.schema.verification` to `false` for the primary instance group with the following information:

  ```
     
      Classification = hive-site
      Property       = hive.metastore.schema.verification
      Value          = false
  ```

  To specify a Data Catalog in a different AWS account, add the `hive.metastore.glue.catalogid` property as shown in the following example. Replace `acct-id` with the AWS account of the Data Catalog.

  ```
  [
    {
      "Classification": "hive-site",
      "Properties": {
        "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
        "hive.metastore.schema.verification": "false",
        "hive.metastore.glue.catalogid": "acct-id"
      }
    }
  ]
  ```

------

## IAM permissions
<a name="emr-hive-glue-permissions"></a>

The EC2 instance profile for a cluster must have IAM permissions for AWS Glue actions. In addition, if you enable encryption for AWS Glue Data Catalog objects, the role must also be allowed to encrypt, decrypt and generate the AWS KMS key used for encryption.

### Permissions for AWS Glue actions
<a name="emr-hive-glue-permissions-actions"></a>

If you use the default EC2 instance profile for Amazon EMR, no action is required. The `AmazonElasticMapReduceforEC2Role` managed policy that is attached to the `EMR_EC2_DefaultRole` allows all necessary AWS Glue actions. However, if you specify a custom EC2 instance profile and permissions, you must configure the appropriate AWS Glue actions. Use the `AmazonElasticMapReduceforEC2Role` managed policy as a starting point. For more information, see [Service role for cluster EC2 instances (EC2 instance profile)](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-role-for-ec2.html) in the *Amazon EMR Management Guide*.

### Permissions for encrypting and decrypting AWS Glue Data Catalog
<a name="emr-hive-glue-permissions-encrypt"></a>

Your instance profile needs permission to encrypt and decrypt data using your key. You do *not* need to configure these permissions if both of the following statements apply:
+ You enable encryption for AWS Glue Data Catalog objects using managed keys for AWS Glue.
+ You use a cluster that's in the same AWS account as the AWS Glue Data Catalog.

Otherwise, you must add the following statement to the permissions policy attached to your EC2 instance profile. 
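
A statement of the following general shape grants the instance profile those AWS KMS permissions. This is a sketch: the Region, account ID, and key ID in the `Resource` ARN are placeholders, and the action list assumes that encrypting, decrypting, and generating data keys are the only operations your cluster needs.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt",
        "kms:Encrypt",
        "kms:GenerateDataKey"
      ],
      "Resource": "arn:aws:kms:region:acct-id:key/12345678-1234-1234-1234-123456789012"
    }
  ]
}
```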

For more information about AWS Glue Data Catalog encryption, see [Encrypting your data catalog](https://docs.aws.amazon.com/glue/latest/dg/encrypt-glue-data-catalog.html) in the *AWS Glue Developer Guide*.

### Resource-based permissions
<a name="emr-hive-glue-permissions-resource"></a>

If you use AWS Glue in conjunction with Hive, Spark, or Presto in Amazon EMR, AWS Glue supports resource-based policies to control access to Data Catalog resources. These resources include databases, tables, connections, and user-defined functions. For more information, see [AWS Glue Resource Policies](https://docs.aws.amazon.com/glue/latest/dg/glue-resource-policies.html) in the *AWS Glue Developer Guide*.

When using resource-based policies to limit access to AWS Glue from within Amazon EMR, the principal that you specify in the permissions policy must be the role ARN associated with the EC2 instance profile that is specified when a cluster is created. For example, for a resource-based policy attached to a catalog, you can specify the role ARN for the default service role for cluster EC2 instances, `EMR_EC2_DefaultRole`, as the `Principal`, using the format shown in the following example:

```
arn:aws:iam::acct-id:role/EMR_EC2_DefaultRole
```

The *acct-id* can be different from the AWS Glue account ID. This enables access from EMR clusters in different accounts. You can specify multiple principals, each from a different account.
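
As an illustration, a resource-based policy attached to the catalog might look like the following sketch. The actions, the `region`, `glue-acct-id`, and `db-name` placeholders, and the resource list are assumptions chosen for a read-only example. Substitute the actions and resources that your clusters actually need.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::acct-id:role/EMR_EC2_DefaultRole"
      },
      "Action": [
        "glue:GetDatabase",
        "glue:GetTable",
        "glue:GetPartitions"
      ],
      "Resource": [
        "arn:aws:glue:region:glue-acct-id:catalog",
        "arn:aws:glue:region:glue-acct-id:database/db-name",
        "arn:aws:glue:region:glue-acct-id:table/db-name/*"
      ]
    }
  ]
}
```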

## Considerations when using AWS Glue Data Catalog
<a name="emr-hive-glue-considerations-hive"></a>

Consider the following items when using the AWS Glue Data Catalog as the metastore with Hive:
+ Adding auxiliary JARs using the Hive shell is not supported. As a workaround, use the `hive-site` configuration classification to set the `hive.aux.jars.path` property, which adds auxiliary JARs into the Hive classpath.
+ [Hive transactions](https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions) are not supported.
+ Renaming tables from within AWS Glue is not supported.
+ When you create a Hive table without specifying a `LOCATION`, the table data is stored in the location specified by the `hive.metastore.warehouse.dir` property. By default, this is a location in HDFS. If another cluster needs to access the table, it fails unless it has adequate permissions to the cluster that created the table. Furthermore, because HDFS storage is transient, if the cluster terminates, the table data is lost, and the table must be recreated. We recommend that you specify a `LOCATION` in Amazon S3 when you create a Hive table using AWS Glue. Alternatively, you can use the `hive-site` configuration classification to specify a location in Amazon S3 for `hive.metastore.warehouse.dir`, which applies to all Hive tables. If a table is created in an HDFS location and the cluster that created it is still running, you can update the table location to Amazon S3 from within AWS Glue. For more information, see [Working with Tables on the AWS Glue Console](https://docs.aws.amazon.com/glue/latest/dg/console-tables.html) in the *AWS Glue Developer Guide*. 
+ Partition values containing quotes and apostrophes are not supported, for example, `PARTITION (owner="Doe's")`.
+ [Column statistics](https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ColumnStatistics) are supported for emr-5.31.0 and later.
+ Using [Hive authorization](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Authorization) is not supported. As an alternative, consider using [AWS Glue Resource-Based Policies](https://docs.aws.amazon.com/glue/latest/dg/glue-resource-policies.html). For more information, see [Use Resource-Based Policies for Amazon EMR Access to AWS Glue Data Catalog](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-iam-roles-glue.html).
+ [Hive constraints](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Constraints) are not supported.
+ [Cost-based Optimization in Hive](https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive) is not supported.
+ Setting `hive.metastore.partition.inherit.table.properties` is not supported. 
+ Using the following metastore constants is not supported: `BUCKET_COUNT, BUCKET_FIELD_NAME, DDL_TIME, FIELD_TO_DIMENSION, FILE_INPUT_FORMAT, FILE_OUTPUT_FORMAT, HIVE_FILTER_FIELD_LAST_ACCESS, HIVE_FILTER_FIELD_OWNER, HIVE_FILTER_FIELD_PARAMS, IS_ARCHIVED, META_TABLE_COLUMNS, META_TABLE_COLUMN_TYPES, META_TABLE_DB, META_TABLE_LOCATION, META_TABLE_NAME, META_TABLE_PARTITION_COLUMNS, META_TABLE_SERDE, META_TABLE_STORAGE, ORIGINAL_LOCATION`.
+ When you use a predicate expression, explicit values must be on the right side of the comparison operator, or queries might fail.
  + **Correct**: `SELECT * FROM mytable WHERE time > 11`
  + **Incorrect**: `SELECT * FROM mytable WHERE 11 > time`
+ Amazon EMR versions 5.32.0 and 6.3.0 and later support using user-defined functions (UDFs) in predicate expressions. When using earlier versions, your queries may fail because of the way Hive tries to optimize query execution.
+ [Temporary tables](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-TemporaryTables) are not supported.
+ We recommend creating tables using applications through Amazon EMR rather than creating them directly using AWS Glue. Creating a table through AWS Glue may cause required fields to be missing and cause query exceptions.
+ In EMR 5.20.0 or later, parallel partition pruning is enabled automatically for Spark and Hive when AWS Glue Data Catalog is used as the metastore. This change significantly reduces query planning time by executing multiple requests in parallel to retrieve partitions. The total number of segments that can be executed concurrently ranges between 1 and 10. The default value is 5, which is the recommended setting. You can change it by specifying the property `aws.glue.partition.num.segments` in the `hive-site` configuration classification. If throttling occurs, you can turn off the feature by changing the value to 1. For more information, see [AWS Glue Segment Structure](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-partitions.html#aws-glue-api-catalog-partitions-Segment).
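
Several of the considerations above are addressed through the `hive-site` configuration classification. The following sketch combines three of them: an auxiliary JAR path, a default warehouse location in Amazon S3, and the partition segment count. The JAR path and bucket name are hypothetical placeholders, and the segment count simply restates the default.

```
[
  {
    "Classification": "hive-site",
    "Properties": {
      "hive.metastore.client.factory.class": "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory",
      "hive.aux.jars.path": "/home/hadoop/lib/my-serde.jar",
      "hive.metastore.warehouse.dir": "s3://amzn-s3-demo-bucket/hive/warehouse/",
      "aws.glue.partition.num.segments": "5"
    }
  }
]
```

Setting the warehouse directory this way applies to every Hive table created without an explicit `LOCATION`, so an explicit `LOCATION` per table remains the more targeted choice when only some tables need to persist.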

# Using an external MySQL database or Amazon Aurora
<a name="emr-hive-metastore-external"></a>

To use an external MySQL database or Amazon Aurora as your Hive metastore, you override the default configuration values for the metastore in Hive to specify the external database location, either on an Amazon RDS MySQL instance or an Amazon Aurora PostgreSQL instance.

**Note**  
Hive neither supports nor prevents concurrent write access to metastore tables. If you share metastore information between two clusters, you must ensure that you do not write to the same metastore table concurrently, unless you are writing to different partitions of the same metastore table.

The following procedure shows you how to override the default configuration values for the Hive metastore location and start a cluster using the reconfigured metastore location.

**To create a metastore located outside of the EMR cluster**

1. Create a MySQL or Aurora PostgreSQL database. If you use PostgreSQL, you must configure it after you've provisioned your cluster. Only MySQL is supported at cluster creation. For information about the differences between Aurora MySQL and Aurora PostgreSQL, see [Overview of Amazon Aurora MySQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraMySQL.Overview.html) and [Working with Amazon Aurora PostgreSQL](https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/Aurora.AuroraPostgreSQL.html). For information about how to create an Amazon RDS database in general, see [https://aws.amazon.com/rds/](https://aws.amazon.com/rds/).

1. Modify your security groups to allow JDBC connections between your database and the **ElasticMapReduce-Master** security group. For information about how to modify your security groups for access, see [Working with Amazon EMR-managed security groups](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-man-sec-groups.html).

1. Set JDBC configuration values in `hive-site.xml`:
**Important**  
If you supply sensitive information, such as passwords, to the Amazon EMR configuration API, this information is displayed for those accounts that have sufficient permissions. If you are concerned that this information could be displayed to other users, create the cluster with an administrative account and limit other users (IAM users or those with delegated credentials) to accessing services on the cluster by creating a role that explicitly denies permissions to the `elasticmapreduce:DescribeCluster` API action.

   1. Create a configuration file called `hiveConfiguration.json` containing edits to `hive-site.xml` as shown in the following example.

      Replace *hostname* with the DNS address of your Amazon RDS instance running the database, and *username* and *password* with the credentials for your database. For more information about connecting to MySQL and Aurora database instances, see [Connecting to a DB instance running the MySQL database engine](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ConnectToInstance.html) and [Connecting to an Amazon Aurora DB cluster](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.Connect.html) in the *Amazon RDS User Guide*. `javax.jdo.option.ConnectionURL` is the JDBC connect string for a JDBC metastore. `javax.jdo.option.ConnectionDriverName` is the driver class name for a JDBC metastore.

      The MySQL JDBC drivers are installed by Amazon EMR. 

      Property values cannot contain spaces or carriage returns. Each value must appear on a single line.

      ```
      [
          {
            "Classification": "hive-site",
            "Properties": {
              "javax.jdo.option.ConnectionURL": "jdbc:mysql://hostname:3306/hive?createDatabaseIfNotExist=true",
              "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
              "javax.jdo.option.ConnectionUserName": "username",
              "javax.jdo.option.ConnectionPassword": "password"
            }
          }
      ]
      ```

   1. Reference the `hiveConfiguration.json` file when you create the cluster as shown in the following AWS CLI command. In this command, the file is stored locally. You can also upload the file to Amazon S3 and reference it there, for example, `s3://DOC-EXAMPLE-BUCKET/hiveConfiguration.json`.
**Note**  
Linux line continuation characters (`\`) are included for readability. They can be removed or used in Linux commands. For Windows, remove them or replace them with a caret (^).

      ```
      aws emr create-cluster --release-label emr-7.12.0 --instance-type m5.xlarge --instance-count 2 \
      --applications Name=Hive --configurations file://hiveConfiguration.json --use-default-roles
      ```

1. Connect to the primary node of your cluster. 

   For information about how to connect to the primary node, see [Connect to the primary node using SSH](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-connect-master-node-ssh.html) in the *Amazon EMR Management Guide*.

1. Create your Hive tables specifying the location on Amazon S3 by entering a command similar to the following:

   ```
   CREATE EXTERNAL TABLE IF NOT EXISTS table_name
   (
   key int,
   value int
   )
   LOCATION 's3://DOC-EXAMPLE-BUCKET/hdfs/';
   ```

1. Add your Hive script to the running cluster.

Your Hive cluster runs using the metastore located in Amazon RDS. Launch all additional Hive clusters that share this metastore by specifying the metastore location. 
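The configuration file from step 3 can also be generated from shell variables, which makes it easier to check that no property value contains whitespace. The following is a rough sketch only: the variable names `DB_HOST`, `DB_USER`, and `DB_PASS` and their values are hypothetical placeholders, and the whitespace check is an illustration rather than an Amazon EMR tool.

```shell
# Hypothetical connection details; replace them with your own values.
DB_HOST="mydb.example.us-east-1.rds.amazonaws.com"
DB_USER="hiveuser"
DB_PASS="examplepassword"

# Generate hiveConfiguration.json with the same properties as the example in step 3.
cat > hiveConfiguration.json <<EOF
[
  {
    "Classification": "hive-site",
    "Properties": {
      "javax.jdo.option.ConnectionURL": "jdbc:mysql://${DB_HOST}:3306/hive?createDatabaseIfNotExist=true",
      "javax.jdo.option.ConnectionDriverName": "org.mariadb.jdbc.Driver",
      "javax.jdo.option.ConnectionUserName": "${DB_USER}",
      "javax.jdo.option.ConnectionPassword": "${DB_PASS}"
    }
  }
]
EOF

# Flag any property value that contains a space, tab, or carriage return.
if grep -E '": "[^"]*[[:space:]][^"]*"' hiveConfiguration.json; then
  echo "error: a property value contains whitespace" >&2
  config_ok=no
else
  config_ok=yes
fi
echo "config_ok=$config_ok"
```

Because the file contains the database password, treat it as sensitive and remove it after the cluster is created.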

# Use the Hive JDBC driver
<a name="HiveJDBCDriver"></a>

You can use popular business intelligence tools like Microsoft Excel, MicroStrategy, QlikView, and Tableau with Amazon EMR to explore and visualize your data. Many of these tools require a Java Database Connectivity (JDBC) driver or an Open Database Connectivity (ODBC) driver. Amazon EMR supports both JDBC and ODBC connectivity to a Spark, Hive, or Presto cluster.

The example below demonstrates using SQL Workbench/J as a SQL client to connect to a Hive cluster in Amazon EMR. For additional drivers, see [Use business intelligence tools with Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-bi-tools.html).

Before you install and work with SQL Workbench/J, download the driver package and install the driver. The drivers included in the package support the Hive versions available in Amazon EMR release versions 4.0 and later. For detailed release notes and documentation, see the PDF documentation included in the package.
+ **The latest JDBC and ODBC driver packages**

  [http://awssupportdatasvcs.com/bootstrap-actions/Simba/](http://awssupportdatasvcs.com/bootstrap-actions/Simba/)

**To install and configure SQL Workbench**

1. Download the SQL Workbench/J client for your operating system from [http://www.sql-workbench.net/downloads.html](http://www.sql-workbench.net/downloads.html).

1. Install SQL Workbench/J. For more information, see [Installing and starting SQL Workbench/J](http://www.sql-workbench.net/manual/install.html) in the SQL Workbench/J User's Manual.

1. **Linux, Unix, Mac OS X users**: In a terminal session, create an SSH tunnel to the master node of your cluster using the following command. Replace *master-public-dns-name* with the public DNS name of the master node and *path-to-key-file* with the location and file name of your Amazon EC2 private key (`.pem`) file.

   ```
   ssh -o ServerAliveInterval=10 -i path-to-key-file -N -L 10000:localhost:10000 hadoop@master-public-dns-name
   ```

   **Windows users**: In a PuTTY session, create an SSH tunnel to the master node of your cluster (using local port forwarding) with `10000` for **Source port** and `master-public-dns-name:10000` for **Destination**. Replace `master-public-dns-name` with the public DNS name of the master node.

1. Add the JDBC driver to SQL Workbench.

   1. In the **Select Connection Profile** dialog box, click **Manage Drivers**. 

   1. Click the **Create a new entry** (blank page) icon.

   1. In the **Name** field, type **Hive JDBC**.

   1. For **Library**, click the **Select the JAR file(s)** icon.

   1. Navigate to the location containing the extracted drivers. Select the drivers that are included in the JDBC driver package version that you downloaded, and click **Open**.

      For example, your JDBC driver package may include the following JARs.

      ```
      hive_metastore.jar
      hive_service.jar
      HiveJDBC41.jar
      libfb303-0.9.0.jar
      libthrift-0.9.0.jar
      log4j-1.2.14.jar
      ql.jar
      slf4j-api-1.5.11.jar
      slf4j-log4j12-1.5.11.jar
      TCLIServiceClient.jar
      zookeeper-3.4.6.jar
      ```

   1. In the **Please select one driver** dialog box, select `com.amazon.hive.jdbc41.HS2Driver`, and then choose **OK**.

1. When you return to the **Manage Drivers** dialog box, verify that the **Classname** field is populated and select **OK**. 

1. When you return to the **Select Connection Profile** dialog box, verify that the **Driver** field is set to **Hive JDBC** and provide the following JDBC connection string in the **URL** field: `jdbc:hive2://localhost:10000/default`.

1. Select **OK** to connect. After the connection is complete, connection details appear at the top of the SQL Workbench/J window.

For more information about using Hive and the JDBC interface, see [HiveClient](https://cwiki.apache.org/confluence/display/Hive/HiveClient) and [HiveJDBCInterface](https://cwiki.apache.org/confluence/display/Hive/HiveJDBCInterface) in the Apache Hive documentation.

# Improve Hive performance
<a name="emr-hive-s3-performance"></a>

Amazon EMR offers features to help optimize performance when using Hive to query, read and write data saved in Amazon S3.

S3 Select can improve query performance for CSV and JSON files in some applications by “pushing down” processing to Amazon S3.

The EMRFS S3 optimized committer is an alternative to the [OutputCommitter](https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/OutputCommitter.html) class that eliminates list and rename operations to improve performance when writing files to Amazon S3 using EMRFS.

**Topics**
+ [Enabling Hive EMRFS S3 optimized committer](hive-optimized-committer.md)
+ [Using S3 Select with Hive to improve performance](emr-hive-s3select.md)
+ [MSCK Optimization](emr-msck-optimization.md)

# Enabling Hive EMRFS S3 optimized committer
<a name="hive-optimized-committer"></a>

The Hive EMRFS S3 optimized committer is an alternative way for Hive on Amazon EMR to write files for insert queries when using EMRFS. The committer eliminates list and rename operations on Amazon S3 and improves application performance. The feature is available beginning with Amazon EMR 5.34 and 6.5.

## Enabling the committer
<a name="enabling-hive-committer"></a>

If you want to enable EMR Hive to use `HiveEMRFSOptimizedCommitter` to commit data as the default for all Hive managed and external tables, use the following `hive-site` configuration in EMR 6.5.0 or EMR 5.34.0 clusters.

```
[
   {
      "Classification": "hive-site",
      "Properties": {
         "hive.blobstore.use.output-committer": "true"
      }
   }
]
```

**Note**  
Do not turn this feature on when `hive.exec.parallel` is set to `true`.

## Limitations
<a name="hive-committer-limitations"></a>

The following limitations apply to the Hive EMRFS S3 optimized committer:
+ Enabling Hive to merge small files automatically is not supported. The default Hive commit logic will be used even when the optimized committer is enabled.
+ Hive ACID tables are not supported. The default Hive commit logic will be used even when the optimized committer is enabled.
+ The naming pattern for written files changes from Hive’s `<task_id>_<attempt_id>_<copy_n>` to `<task_id>_<attempt_id>_<copy_n>_<query_id>`. For example, a file named `s3://warehouse/table/partition=1/000000_0` becomes `s3://warehouse/table/partition=1/000000_0-hadoop_20210714130459_ba7c23ec-5695-4947-9d98-8a40ef759222-1`. The `query_id` here is a combination of the username, timestamp, and UUID.
+ When custom partitions are on different file systems (HDFS, S3), this feature is automatically disabled. The default Hive commit logic is used even when the optimized committer is enabled.

# Using S3 Select with Hive to improve performance
<a name="emr-hive-s3select"></a>

**Important**  
Amazon S3 Select is no longer available to new customers. Existing customers of Amazon S3 Select can continue to use the feature as usual. [Learn more](https://aws.amazon.com/blogs/storage/how-to-optimize-querying-your-data-in-amazon-s3/) 

With Amazon EMR release version 5.18.0 and later, you can use [S3 Select](https://aws.amazon.com/blogs/aws/s3-glacier-select/) with Hive on Amazon EMR. S3 Select allows applications to retrieve only a subset of data from an object. For Amazon EMR, the computational work of filtering large datasets for processing is "pushed down" from the cluster to Amazon S3, which can improve performance in some applications and reduce the amount of data transferred between Amazon EMR and Amazon S3.

S3 Select is supported with Hive tables based on CSV and JSON files. To use it, set the `s3select.filter` configuration variable to `true` during your Hive session. For more information and examples, see [Specifying S3 Select in your code](#emr-hive-s3select-specify).

## Is S3 Select right for my application?
<a name="emr-hive-s3select-apps"></a>

We recommend that you benchmark your applications with and without S3 Select to determine whether it is suitable for your application.

Use the following guidelines to determine if your application is a candidate for using S3 Select:
+ Your query filters out more than half of the original dataset.
+ Your query filter predicates use columns that have a data type supported by Amazon S3 Select. For more information, see [Data types](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference-data-types.html) in the *Amazon Simple Storage Service User Guide*.
+ Your network connection between Amazon S3 and the Amazon EMR cluster has good transfer speed and available bandwidth. Amazon S3 does not compress HTTP responses, so the response size is likely to increase for compressed input files.

## Considerations and limitations
<a name="emr-hive-s3select-considerations"></a>
+ Amazon S3 server-side encryption with customer-provided encryption keys (SSE-C) and client-side encryption are not supported. 
+ The `AllowQuotedRecordDelimiters` property is not supported. If this property is specified, the query fails.
+ Only CSV and JSON files in UTF-8 format are supported. Multi-line CSVs and JSON are not supported.
+ Only uncompressed, gzip, or bzip2 files are supported.
+ Comment characters in the last line are not supported.
+ Empty lines at the end of a file are not processed.
+ Hive on Amazon EMR supports the primitive data types that S3 Select supports. For more information, see [Data types](https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference-data-types.html) in the *Amazon Simple Storage Service User Guide*.

## Specifying S3 Select in your code
<a name="emr-hive-s3select-specify"></a>

To use S3 Select in your Hive table, create the table by specifying `com.amazonaws.emr.s3select.hive.S3SelectableTextInputFormat` as the `INPUTFORMAT` class name, and specify a value for the `s3select.format` property using the `TBLPROPERTIES` clause.

By default, S3 Select is disabled when you run queries. Enable S3 Select by setting `s3select.filter` to `true` in your Hive session as shown below. The examples below demonstrate how to specify S3 Select when creating a table from underlying CSV and JSON files and then querying the table using a simple select statement.

**Example CREATE TABLE statement for CSV-based table**  

```
CREATE TABLE mys3selecttable (
col1 string,
col2 int,
col3 boolean
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS
INPUTFORMAT
  'com.amazonaws.emr.s3select.hive.S3SelectableTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://path/to/mycsvfile/'
TBLPROPERTIES (
  "s3select.format" = "csv",
  "s3select.headerInfo" = "ignore"
);
```

**Example CREATE TABLE statement for JSON-based table**  

```
CREATE TABLE mys3selecttable (
col1 string,
col2 int,
col3 boolean
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS
INPUTFORMAT
  'com.amazonaws.emr.s3select.hive.S3SelectableTextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://path/to/json/'
TBLPROPERTIES (
  "s3select.format" = "json"
);
```

**Example SELECT statement**  

```
SET s3select.filter=true;
SELECT * FROM mys3selecttable WHERE col2 > 10;
```

# MSCK Optimization
<a name="emr-msck-optimization"></a>

Hive stores a list of partitions for each table in its metastore. However, when partitions are directly added to or removed from the file system, the Hive metastore is unaware of these changes. The [MSCK command](https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl#LanguageManualDDL-RecoverPartitions(MSCKREPAIRTABLE)) updates the partition metadata in the Hive metastore for partitions that were directly added to or removed from the file system. The syntax for the command is:

```
MSCK [REPAIR] TABLE table_name [ADD/DROP/SYNC PARTITIONS];
```

Hive implements this command as follows:

1. Hive retrieves all the partitions for the table from the metastore. From these, it identifies the partition paths that do not exist in the file system and creates a list of partitions to drop from the metastore.

1. Hive gathers the partition paths present in the file system, compares them with the list of partitions from the metastore, and generates a list of partitions that need to be added to the metastore.

1. Hive updates the metastore using `ADD`, `DROP`, or `SYNC` mode.

**Note**  
When there are many partitions in the metastore, the step to check if a partition does not exist in the file system takes a long time to run because the file system's `exists` API call must be made for each partition.

In Amazon EMR 6.5.0, Hive introduced a flag called `hive.emr.optimize.msck.fs.check`. When enabled, this flag causes Hive to check for the presence of a partition from the list of partition paths from the file system that is generated in step 2 above instead of making file system API calls. In Amazon EMR 6.8.0, Hive enabled this optimization by default, eliminating the need to set the flag `hive.emr.optimize.msck.fs.check`.
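The comparison in steps 1 and 2 amounts to a set difference between two partition lists. The following shell sketch illustrates the idea with made-up partition names. It is not Hive's implementation; it only shows how comparing the metastore list against a single file system listing yields both the partitions to drop and the partitions to add, without a per-partition `exists` call.

```shell
# Illustration only: the partition names below are hypothetical sample data.
workdir=$(mktemp -d)

# Partitions that the metastore knows about (step 1).
printf '%s\n' 'dt=2024-01-01' 'dt=2024-01-02' 'dt=2024-01-03' | sort > "$workdir/metastore.txt"

# Partition paths found by listing the file system once (step 2).
printf '%s\n' 'dt=2024-01-02' 'dt=2024-01-03' 'dt=2024-01-04' | sort > "$workdir/filesystem.txt"

# In the metastore but missing from the file system: candidates to drop.
to_drop=$(comm -23 "$workdir/metastore.txt" "$workdir/filesystem.txt")

# In the file system but missing from the metastore: candidates to add.
to_add=$(comm -13 "$workdir/metastore.txt" "$workdir/filesystem.txt")

echo "DROP: $to_drop"
echo "ADD: $to_add"
```

In this sample, `dt=2024-01-01` is the only drop candidate and `dt=2024-01-04` is the only add candidate.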

# Using Hive Live Long and Process (LLAP)
<a name="emr-hive-llap"></a>

Amazon EMR 6.0.0 supports the Live Long and Process (LLAP) functionality for Hive. LLAP uses persistent daemons with intelligent in-memory caching to improve query performance compared to the previous default Tez container execution mode.

The Hive LLAP daemons are managed and run as a YARN Service. Since a YARN service can be considered a long-running YARN application, some of your cluster resources are dedicated to Hive LLAP and cannot be used for other workloads. For more information, see [LLAP](https://cwiki.apache.org/confluence/display/Hive/LLAP) and [YARN Service API](https://hadoop.apache.org/docs/r3.2.1/hadoop-yarn/hadoop-yarn-site/yarn-service/YarnServiceAPI.html).

## Enable Hive LLAP on Amazon EMR
<a name="emr-llap-enable"></a>

To enable Hive LLAP on Amazon EMR, supply the following configuration when you launch a cluster. 

```
[
  {
    "Classification": "hive",
    "Properties": {
      "hive.llap.enabled": "true"
    }
  }
]
```

For more information, see [Configuring applications](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-configure-apps.html).

By default, Amazon EMR allocates about 60 percent of cluster YARN resources to Hive LLAP daemons. You can configure the percentage of cluster YARN resources allocated to Hive LLAP and the number of task and core nodes to be considered for the Hive LLAP allocation.

For example, the following configuration starts Hive LLAP with three daemons on three task or core nodes and allocates 40 percent of the YARN resources of those three nodes to the Hive LLAP daemons.

```
[
  {
    "Classification": "hive",
    "Properties": {
      "hive.llap.enabled": "true",
      "hive.llap.percent-allocation": "0.4",
      "hive.llap.num-instances": "3"
    }
  }
]
```

You can use the following `hive-site` configurations in the classification API to override default LLAP resource settings.


| Property | Description | 
| --- | --- | 
| hive.llap.daemon.yarn.container.mb | Total LLAP daemon container size (in MB) | 
| hive.llap.daemon.memory.per.instance.mb |  The total memory used by executors in the LLAP daemon container (in MB)  | 
| hive.llap.io.memory.size |  Cache size for LLAP Input/Output  | 
| hive.llap.daemon.num.executors |  Number of executors per LLAP daemon  | 

## Start Hive LLAP on your cluster manually
<a name="emr-llap-manually"></a>

All dependencies and configurations used by LLAP are packaged into the LLAP tar archive as part of cluster startup. If LLAP is enabled using `"hive.llap.enabled": "true"`, we recommend that you use Amazon EMR reconfiguration to make configuration changes to LLAP.

Otherwise, for any manual changes to `hive-site.xml`, you must rebuild the LLAP tar archive by using the `hive --service llap` command, as the following example demonstrates. 

```
# Define how many resources you want to allocate to Hive LLAP

LLAP_INSTANCES=<how many llap daemons to run on cluster>
LLAP_SIZE=<total container size per llap daemon>
LLAP_EXECUTORS=<number of executors per daemon>
LLAP_XMX=<Memory used by executors>
LLAP_CACHE=<Max cache size for IO allocator>

yarn app -enableFastLaunch

hive --service llap \
--instances $LLAP_INSTANCES \
--size ${LLAP_SIZE}m \
--executors $LLAP_EXECUTORS \
--xmx ${LLAP_XMX}m \
--cache ${LLAP_CACHE}m \
--name llap0 \
--auxhbase=false \
--startImmediately
```

## Check Hive LLAP status
<a name="emr-llap-check"></a>

Use the following command to check the status of Hive LLAP through Hive.

```
hive --service llapstatus
```

Use the following command to check the status of Hive LLAP using YARN.

```
yarn app -status (name-of-llap-service)

# example: 
yarn app -status llap0 | jq
```

## Start or stop Hive LLAP
<a name="emr-llap-start"></a>

Since Hive LLAP runs as a persistent YARN service, you stop or restart the YARN service to stop or restart Hive LLAP. The following commands demonstrate this. 

```
yarn app -stop llap0
yarn app -start llap0
```

## Resize the number of Hive LLAP daemons
<a name="emr-llap-resize"></a>

Use the following command to reduce the number of LLAP instances by one.

```
yarn app -flex llap0 -component llap -1
```

For more information, see [Flex a component of a service](https://hadoop.apache.org/docs/r3.2.1/hadoop-yarn/hadoop-yarn-site/yarn-service/QuickStart.html#Flex_a_component_of_a_service). 

# Encryption in Hive
<a name="hive-encryption"></a>

This section describes the encryption types Amazon EMR supports.

# Parquet modular encryption in Hive
<a name="hive-parquet-modular-encryption"></a>

Parquet modular encryption provides columnar level access control and encryption to enhance privacy and data integrity for data stored in Parquet file format. This feature is available in Amazon EMR Hive starting with release 6.6.0.

Previously supported solutions for security and integrity, which include encrypting files or encrypting the storage layer, are described in [Encryption Options](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-data-encryption-options.html) in the Amazon EMR Management Guide. These solutions can be used for Parquet files, but the integrated Parquet encryption mechanism provides granular access control at the column level, as well as improvements in performance and security. Learn more about this feature on the Apache GitHub page [Parquet Modular Encryption](https://github.com/apache/parquet-format/blob/master/Encryption.md).

Users pass configurations to Parquet readers and writers using Hadoop configurations. The detailed configurations that enable encryption in readers and writers and toggle advanced features are documented in [PARQUET-1854: Properties-driven Interface to Parquet Encryption Management](https://docs.google.com/document/d/1boH6HPkG0ZhgxcaRkGk3QpZ8X_J91uXZwVGwYN45St4/edit).

## Usage examples
<a name="usage-examples"></a>

The following example covers creating and writing to a Hive table using AWS KMS for managing encryption keys.

1. Implement a KmsClient for the AWS KMS service as described in the document [PARQUET-1373: Encryption Key Management Tools](https://docs.google.com/document/d/1bEu903840yb95k9q2X-BlsYKuXoygE4VnMDl9xz_zhk/edit). The following sample shows an implementation snippet.

   ```
   package org.apache.parquet.crypto.keytools;
   
   import com.amazonaws.AmazonClientException;
   import com.amazonaws.AmazonServiceException;
   import com.amazonaws.regions.Regions;
   import com.amazonaws.services.kms.AWSKMS;
   import com.amazonaws.services.kms.AWSKMSClientBuilder;
   import com.amazonaws.services.kms.model.DecryptRequest;
   import com.amazonaws.services.kms.model.EncryptRequest;
   import com.amazonaws.util.Base64;
   import org.apache.hadoop.conf.Configuration;
   import org.apache.parquet.crypto.KeyAccessDeniedException;
   import org.apache.parquet.crypto.ParquetCryptoRuntimeException;
   import org.apache.parquet.crypto.keytools.KmsClient;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;
   
   import java.nio.ByteBuffer;
   import java.nio.charset.Charset;
   import java.nio.charset.StandardCharsets;
   
   public class AwsKmsClient implements KmsClient {
   
       private static final AWSKMS AWSKMS_CLIENT = AWSKMSClientBuilder
               .standard()
               .withRegion(Regions.US_WEST_2)
               .build();
       public static final Logger LOG = LoggerFactory.getLogger(AwsKmsClient.class);
   
       private String kmsToken;
       private Configuration hadoopConfiguration;
   
       @Override
       public void initialize(Configuration configuration, String kmsInstanceID, String kmsInstanceURL, String accessToken) throws KeyAccessDeniedException {
           hadoopConfiguration = configuration;
           kmsToken = accessToken;
   
       }
   
       @Override
       public String wrapKey(byte[] keyBytes, String masterKeyIdentifier) throws KeyAccessDeniedException {
           String value = null;
           try {
               ByteBuffer plaintext = ByteBuffer.wrap(keyBytes);
   
               EncryptRequest req = new EncryptRequest().withKeyId(masterKeyIdentifier).withPlaintext(plaintext);
               ByteBuffer ciphertext = AWSKMS_CLIENT.encrypt(req).getCiphertextBlob();
   
               byte[] base64EncodedValue = Base64.encode(ciphertext.array());
               value = new String(base64EncodedValue, Charset.forName("UTF-8"));
           } catch (AmazonClientException ae) {
               throw new KeyAccessDeniedException(ae.getMessage());
           }
           return value;
       }
   
       @Override
       public byte[] unwrapKey(String wrappedKey, String masterKeyIdentifier) throws KeyAccessDeniedException {
           byte[] arr = null;
           try {
               ByteBuffer ciphertext  = ByteBuffer.wrap(Base64.decode(wrappedKey.getBytes(StandardCharsets.UTF_8)));
               DecryptRequest request = new DecryptRequest().withKeyId(masterKeyIdentifier).withCiphertextBlob(ciphertext);
               ByteBuffer decipheredtext = AWSKMS_CLIENT.decrypt(request).getPlaintext();
               arr = new byte[decipheredtext.remaining()];
               decipheredtext.get(arr);
           } catch (AmazonClientException ae) {
               throw new KeyAccessDeniedException(ae.getMessage());
           }
           return arr;
       }
   }
   ```

1. Create your AWS KMS encryption keys for the footer as well as the columns, and give your IAM roles access to them, as described in [Creating keys](https://docs.aws.amazon.com/kms/latest/developerguide/create-keys.html) in the *AWS Key Management Service Developer Guide*. The default IAM role is `EMR_ECS_default`.

1. On the Hive application on an Amazon EMR cluster, add the client above using the `ADD JAR` statement, as described in the [Apache Hive Resources documentation](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli#LanguageManualCli-HiveResources). The following is an example statement:

   ```
   ADD JAR 's3://location-to-custom-jar';
   ```

   An alternative method is to add the JAR to the `auxlib` of Hive using a bootstrap action. The following is an example line to be added to the bootstrap action:

   ```
   aws s3 cp 's3://location-to-custom-jar' /usr/lib/hive/auxlib 
   ```

1. Set the following configurations:

   ```
   set parquet.crypto.factory.class=org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory;
   set parquet.encryption.kms.client.class=org.apache.parquet.crypto.keytools.AwsKmsClient;
   ```

1. Create a Hive table with Parquet format and specify the AWS KMS keys in SERDEPROPERTIES and insert some data to it:

   ```
   CREATE TABLE my_table(name STRING, credit_card STRING)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
   WITH SERDEPROPERTIES (
     'parquet.encryption.column.key'='<aws-kms-key-id-for-column-1>: credit_card',
     'parquet.encryption.footer.key'='<aws-kms-key-id-for-footer>')
   STORED AS parquet
   LOCATION "s3://<bucket>/<warehouse-location>/my_table";
   
   INSERT INTO my_table SELECT 
   java_method ('org.apache.commons.lang.RandomStringUtils','randomAlphabetic',5) as name,
   java_method ('org.apache.commons.lang.RandomStringUtils','randomAlphabetic',10) as credit_card
   from (select 1) x lateral view posexplode(split(space(100),' ')) pe as i,x;
   
   select * from my_table;
   ```

1. Verify that when you create an external table at the same location with no access to AWS KMS keys (for example, IAM role access denied), you cannot read the data.

   ```
   CREATE EXTERNAL TABLE ext_table (name STRING, credit_card STRING)
   ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
   STORED AS parquet
   LOCATION "s3://<bucket>/<warehouse-location>/my_table";
   
   SELECT * FROM ext_table;
   ```

1. The last statement should throw the following exception:

   ```
   Failed with exception java.io.IOException:org.apache.parquet.crypto.KeyAccessDeniedException: Footer key: access denied
   ```

# In-transit encryption in HiveServer2
<a name="hs2-encryption-intransit"></a>

Starting with Amazon EMR release 6.9.0, HiveServer2 (HS2) is TLS/SSL-enabled as part of [In-transit encryption in HiveServer2](#hs2-encryption-intransit) security configuration. This affects how you connect to HS2 running on an Amazon EMR cluster with in-transit encryption enabled. To connect to HS2, you must modify the `TRUSTSTORE_PATH` and `TRUSTSTORE_PASSWORD` parameter values in the JDBC URL. The following URL is an example of a JDBC connection for HS2 with the required parameters: 

```
jdbc:hive2://HOST_NAME:10000/default;ssl=true;sslTrustStore=TRUSTSTORE_PATH;trustStorePassword=TRUSTSTORE_PASSWORD
```

Use the appropriate instructions for on-cluster or off-cluster HiveServer2 encryption below.

------
#### [ On-cluster HS2 access ]

If you are accessing HiveServer2 using the Beeline client after you SSH to the primary node, then reference `/etc/hadoop/conf/ssl-server.xml` to find the `TRUSTSTORE_PATH` and `TRUSTSTORE_PASSWORD` parameter values using configuration `ssl.server.truststore.location` and `ssl.server.truststore.password`.

The following example commands can help you retrieve these configurations:

```
TRUSTSTORE_PATH=$(sed -n '/ssl.server.truststore.location/,+2p' /etc/hadoop/conf/ssl-server.xml | awk -F "[><]" '/value/{print $3}')
TRUSTSTORE_PASSWORD=$(sed -n '/ssl.server.truststore.password/,+2p' /etc/hadoop/conf/ssl-server.xml | awk -F "[><]" '/value/{print $3}')
```

------
#### [ Off-cluster HS2 access ]

If you are accessing HiveServer2 from a client outside the Amazon EMR cluster, you can use one of the following approaches to get the `TRUSTSTORE_PATH` and `TRUSTSTORE_PASSWORD`:
+ Convert the PEM file that was created during [security configuration](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-encryption-enable.html) to a JKS file and use that file in the JDBC connection URL. For example, with openssl and keytool, use the following commands:

  ```
  openssl pkcs12 -export -in trustedCertificates.pem -inkey privateKey.pem -out trustedCertificates.p12 -name "certificate"
  keytool -importkeystore -srckeystore trustedCertificates.p12 -srcstoretype pkcs12 -destkeystore trustedCertificates.jks
  ```
+ Alternatively, reference `/etc/hadoop/conf/ssl-server.xml` on the cluster to find the `TRUSTSTORE_PATH` and `TRUSTSTORE_PASSWORD` parameter values, which are stored in the `ssl.server.truststore.location` and `ssl.server.truststore.password` configuration properties. Download the truststore file to the client machine and use the path on the client machine as the `TRUSTSTORE_PATH`.

  For more information on accessing applications from a client outside of the Amazon EMR cluster, see [Use the Hive JDBC driver](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/HiveJDBCDriver.html). 

------

# Hive release history
<a name="Hive-release-history"></a>

The following table lists the version of Hive included in each release version of Amazon EMR, along with the components installed with the application. For component versions in each release, see the Component Version section for your release in [Amazon EMR 7.x release versions](emr-release-7x.md), [Amazon EMR 6.x release versions](emr-release-6x.md), or [Amazon EMR 5.x release versions](emr-release-5x.md).


**Hive version information**  

| Amazon EMR Release Label | Hive Version | Components Installed With Hive | 
| --- | --- | --- | 
| emr-7.12.0 | 3.1.3-amzn-21 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.11.0 | 3.1.3-amzn-20 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-hdfs-zkfc, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.10.0 | 3.1.3-amzn-19 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.9.0 | 3.1.3-amzn-18 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.8.0 | 3.1.3-amzn-17 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.7.0 | 3.1.3-amzn-16 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.6.0 | 3.1.3-amzn-15 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.5.0 | 3.1.3-amzn-14 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.4.0 | 3.1.3-amzn-13 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.3.0 | 3.1.3-amzn-12 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.2.0 | 3.1.3-amzn-11 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-5.36.2 | 2.3.9-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-7.1.0 | 3.1.3-amzn-10 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-7.0.0 | 3.1.3-amzn-9 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-6.15.0 | 3.1.3-amzn-8 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-6.14.0 | 3.1.3-amzn-7 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-6.13.0 | 3.1.3-amzn-6 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-6.12.0 | 3.1.3-amzn-5 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-6.11.1 | 3.1.3-amzn-4.1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-6.11.0 | 3.1.3-amzn-4 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-6.10.1 | 3.1.3-amzn-3.1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-6.10.0 | 3.1.3-amzn-3 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, tez-on-worker, zookeeper-client, zookeeper-server | 
| emr-6.9.1 | 3.1.3-amzn-2.1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.9.0 | 3.1.3-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.8.1 | 3.1.3-amzn-1.1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.8.0 | 3.1.3-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.7.0 | 3.1.3-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-5.36.1 | 2.3.9-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.36.0 | 2.3.9-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-6.6.0 | 3.1.2-amzn-7 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-5.35.0 | 2.3.9-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-6.5.0 | 3.1.2-amzn-6 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.4.0 | 3.1.2-amzn-5 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.3.1 | 3.1.2-amzn-4 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.3.0 | 3.1.2-amzn-4 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.2.1 | 3.1.2-amzn-3 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.2.0 | 3.1.2-amzn-3 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.1.1 | 3.1.2-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.1.0 | 3.1.2-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.0.1 | 3.1.2-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-6.0.0 | 3.1.2-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn, zookeeper-client, zookeeper-server | 
| emr-5.34.0 | 2.3.8-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.33.1 | 2.3.7-amzn-4 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.33.0 | 2.3.7-amzn-4 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.32.1 | 2.3.7-amzn-3 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.32.0 | 2.3.7-amzn-3 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.31.1 | 2.3.7-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.31.0 | 2.3.7-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.30.2 | 2.3.6-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.30.1 | 2.3.6-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.30.0 | 2.3.6-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mariadb-server, tez-on-yarn | 
| emr-5.29.0 | 2.3.6-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mysql-server, tez-on-yarn | 
| emr-5.28.1 | 2.3.6-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mysql-server, tez-on-yarn | 
| emr-5.28.0 | 2.3.6-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, hudi, mysql-server, tez-on-yarn | 
| emr-5.27.1 | 2.3.5-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.27.0 | 2.3.5-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.26.0 | 2.3.5-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.25.0 | 2.3.5-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.24.1 | 2.3.4-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.24.0 | 2.3.4-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.23.1 | 2.3.4-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.23.0 | 2.3.4-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.22.0 | 2.3.4-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.21.2 | 2.3.4-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.21.1 | 2.3.4-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.21.0 | 2.3.4-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.20.1 | 2.3.4-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.20.0 | 2.3.4-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.19.1 | 2.3.3-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.19.0 | 2.3.3-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.18.1 | 2.3.3-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.18.0 | 2.3.3-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.17.2 | 2.3.3-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.17.1 | 2.3.3-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.17.0 | 2.3.3-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, emr-s3-select, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.16.1 | 2.3.3-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.16.0 | 2.3.3-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.15.1 | 2.3.3-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.15.0 | 2.3.3-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.14.2 | 2.3.2-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.14.1 | 2.3.2-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.14.0 | 2.3.2-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.13.1 | 2.3.2-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.13.0 | 2.3.2-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.12.3 | 2.3.2-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.12.2 | 2.3.2-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.12.1 | 2.3.2-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.12.0 | 2.3.2-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.11.4 | 2.3.2-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.11.3 | 2.3.2-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.11.2 | 2.3.2-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.11.1 | 2.3.2-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.11.0 | 2.3.2-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.10.1 | 2.3.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.10.0 | 2.3.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.9.1 | 2.3.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.9.0 | 2.3.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.8.3 | 2.3.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.8.2 | 2.3.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.8.1 | 2.3.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.8.0 | 2.3.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.7.1 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.7.0 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.6.1 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.6.0 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.5.4 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.5.3 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.5.2 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.5.1 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.5.0 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.4.1 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.4.0 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hive-hbase, hcatalog-server, hive-server2, mysql-server, tez-on-yarn | 
| emr-5.3.2 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.3.1 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.3.0 | 2.1.1-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.2.3 | 2.1.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.2.2 | 2.1.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.2.1 | 2.1.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.2.0 | 2.1.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.1.1 | 2.1.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.1.0 | 2.1.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.0.3 | 2.1.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.0.2 | 2.1.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.0.1 | 2.1.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-5.0.0 | 2.1.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hadoop-yarn-timeline-server, hive-client, hcatalog-server, hive-server, mysql-server, tez-on-yarn | 
| emr-4.9.6 | 1.0.0-amzn-9 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.9.5 | 1.0.0-amzn-9 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.9.4 | 1.0.0-amzn-9 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.9.3 | 1.0.0-amzn-9 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.9.2 | 1.0.0-amzn-9 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.9.1 | 1.0.0-amzn-9 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.8.5 | 1.0.0-amzn-8 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.8.4 | 1.0.0-amzn-8 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.8.3 | 1.0.0-amzn-8 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.8.2 | 1.0.0-amzn-7 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.8.1 | 1.0.0-amzn-7 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.8.0 | 1.0.0-amzn-7 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.7.4 | 1.0.0-amzn-6 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.7.3 | 1.0.0-amzn-6 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.7.2 | 1.0.0-amzn-6 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.7.1 | 1.0.0-amzn-5 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.7.0 | 1.0.0-amzn-5 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-kms-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hcatalog-server, hive-server, mysql-server | 
| emr-4.6.1 | 1.0.0-amzn-4 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hive-metastore-server, hive-server, mysql-server | 
| emr-4.6.0 | 1.0.0-amzn-4 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hive-metastore-server, hive-server, mysql-server | 
| emr-4.5.0 | 1.0.0-amzn-4 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hive-metastore-server, hive-server, mysql-server | 
| emr-4.4.0 | 1.0.0-amzn-3 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hive-metastore-server, hive-server, mysql-server | 
| emr-4.3.0 | 1.0.0-amzn-2 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hive-metastore-server, hive-server, mysql-server | 
| emr-4.2.0 | 1.0.0-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hive-metastore-server, hive-server, mysql-server | 
| emr-4.1.0 | 1.0.0-amzn-1 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-library, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hive-metastore-server, hive-server, mysql-server | 
| emr-4.0.0 | 1.0.0-amzn-0 | emrfs, emr-ddb, emr-goodies, emr-kinesis, emr-s3-dist-cp, hadoop-client, hadoop-mapred, hadoop-hdfs-datanode, hadoop-hdfs-namenode, hadoop-httpfs-server, hadoop-yarn-nodemanager, hadoop-yarn-resourcemanager, hive-client, hive-metastore-server, hive-server, mysql-server | 

# Hive release notes by version
<a name="Hive-release-history-versions"></a>

**Topics**
+ [Amazon EMR 7.10.0 - Hive release notes](Hive-release-history-7100.md)
+ [Amazon EMR 7.9.0 - Hive release notes](Hive-release-history-790.md)
+ [Amazon EMR 7.8.0 - Hive release notes](Hive-release-history-780.md)
+ [Amazon EMR 7.7.0 - Hive release notes](Hive-release-history-770.md)
+ [Amazon EMR 7.6.0 - Hive release notes](Hive-release-history-760.md)
+ [Amazon EMR 7.5.0 - Hive release notes](Hive-release-history-750.md)
+ [Amazon EMR 7.4.0 - Hive release notes](Hive-release-history-740.md)
+ [Amazon EMR 7.3.0 - Hive release notes](Hive-release-history-730.md)
+ [Amazon EMR 7.2.0 - Hive release notes](Hive-release-history-720.md)
+ [Amazon EMR 7.1.0 - Hive release notes](Hive-release-history-710.md)
+ [Amazon EMR 7.0.0 - Hive release notes](Hive-release-history-700.md)
+ [Amazon EMR 6.15.0 - Hive release notes](Hive-release-history-6150.md)
+ [Amazon EMR 6.14.0 - Hive release notes](Hive-release-history-6140.md)
+ [Amazon EMR 6.13.0 - Hive release notes](Hive-release-history-6130.md)
+ [Amazon EMR 6.12.0 - Hive release notes](Hive-release-history-6120.md)
+ [Amazon EMR 6.11.0 - Hive release notes](Hive-release-history-6110.md)
+ [Amazon EMR 6.10.0 - Hive release notes](Hive-release-history-6100.md)
+ [Amazon EMR 6.9.0 - Hive release notes](Hive-release-history-690.md)
+ [Amazon EMR 6.8.0 - Hive release notes](Hive-release-history-680.md)
+ [Amazon EMR 6.7.0 - Hive release notes](Hive-release-history-670.md)
+ [Amazon EMR 6.6.0 - Hive release notes](Hive-release-history-660.md)

# Amazon EMR 7.10.0 - Hive release notes
<a name="Hive-release-history-7100"></a>

## Amazon EMR 7.10.0 - Hive changes
<a name="Hive-release-history-changes-7100"></a>



| Type | Description | 
| --- | --- | 
| Bug Fix | Hive-side fix for [TEZ-4595](https://issues.apache.org/jira/browse/TEZ-4595). | 

**Known issues**
+ Starting with EMR-7.10.0, Amazon EMR uses S3A as the default filesystem, replacing EMRFS. As a result, Hive operations no longer create `_$folder$` marker objects in S3, and the intermediate manifest files used by Hive write queries are now stored in S3 rather than in HDFS, where they were kept under EMRFS. For considerations when using S3A, see the [migration guide](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-s3a-migrate.html).
+ From EMR-7.3.0 to EMR-7.10.0, a bug in the Hive Iceberg integration causes HBase table creation in Hive to fail when AWS Glue Data Catalog is used as the metastore. Contact the AWS support team if you encounter this issue.

# Amazon EMR 7.9.0 - Hive release notes
<a name="Hive-release-history-790"></a>

## Amazon EMR 7.9.0 - Hive changes
<a name="Hive-release-history-changes-790"></a>


****  

| Type | Description | 
| --- | --- | 
| Bug Fix | Hive Blobstore Committer should not be used if the table being created via CTAS is ACID. | 
| Bug Fix | [HIVE-26096](https://issues.apache.org/jira/browse/HIVE-26096): Select on single column MultiDelimitSerDe table throws AIOBE (#3158). | 
| Upgrade | Upgrade Avro version to 1.11.4 by backporting [HIVE-26954](https://issues.apache.org/jira/browse/HIVE-26954), [HIVE-27877](https://issues.apache.org/jira/browse/HIVE-27877) and [HIVE-28574](https://issues.apache.org/jira/browse/HIVE-28574). | 

**Known issues**
+ For Hive `INSERT OVERWRITE` queries with Amazon S3 Express One Zone as the output location, set the core-site configuration `fs.s3a.directory.operations.purge.uploads` to `false`.
+ From EMR-7.3.0 to EMR-7.10.0, a bug related to the Hive Iceberg integration causes HBase table creation in Hive to fail when AWS Glue Data Catalog is used as the metastore. Contact the AWS support team if you encounter this issue.
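
The core-site setting in the first known issue can be supplied as a configuration classification when the cluster is created. The following minimal Python sketch builds such a classification and prints it as JSON. The helper name `build_purge_uploads_override` is hypothetical; only the `core-site` classification and the property name come from the note above.

```python
import json


def build_purge_uploads_override():
    """Return an EMR configuration list that sets the core-site property
    named in the known issue above to "false"."""
    return [
        {
            "Classification": "core-site",
            "Properties": {
                # EMR configuration property values are passed as strings.
                "fs.s3a.directory.operations.purge.uploads": "false",
            },
        }
    ]


if __name__ == "__main__":
    # Print JSON that can be saved to a file and used as cluster configuration.
    print(json.dumps(build_purge_uploads_override(), indent=2))
```

One way to use the output is to save it to a file and pass that file to the `--configurations` option of `aws emr create-cluster`.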

# Amazon EMR 7.8.0 - Hive release notes
<a name="Hive-release-history-780"></a>

## Amazon EMR 7.8.0 - Hive changes
<a name="Hive-release-history-changes-780"></a>


****  

| Type | Description | 
| --- | --- | 
| Bug Fix | Fixes CVE-2024-23953: Apache Hive: Timing Attack Against Signature in LLAP. | 

**Known issues**
+ For Hive `INSERT OVERWRITE` queries with Amazon S3 Express One Zone as the output location, set the core-site configuration `fs.s3a.directory.operations.purge.uploads` to `false`.

# Amazon EMR 7.7.0 - Hive release notes
<a name="Hive-release-history-770"></a>

## Amazon EMR 7.7.0 - Hive changes
<a name="Hive-release-history-changes-770"></a>


****  

| Type | Description | 
| --- | --- | 
| Bug Fix | Fixes CVE-2024-29869: Apache Hive: Credentials file created with non restrictive permissions. | 
| Bug Fix | Fixes SemanticException when a row-level filtering policy is enabled in Apache Ranger. | 
| Bug Fix | Disable Tez Async Init RR when LLAP or ACID is enabled. | 

**Known issues**
+ For Hive `INSERT OVERWRITE` queries with Amazon S3 Express One Zone as the output location, set the core-site configuration `fs.s3a.directory.operations.purge.uploads` to `false`.

# Amazon EMR 7.6.0 - Hive release notes
<a name="Hive-release-history-760"></a>

## Amazon EMR 7.6.0 - Hive changes
<a name="Hive-release-history-changes-760"></a>


****  

| Type | Description | 
| --- | --- | 
| Improvement | Added a fast S3 prefix listing feature for ORC non-ACID partitioned tables. | 
| Feature | Added support for Magic Committers for Hive write queries on S3AFileSystem. | 

**Known issues**
+ For Hive `INSERT OVERWRITE` queries with Amazon S3 Express One Zone as the output location, set the core-site configuration `fs.s3a.directory.operations.purge.uploads` to `false`.

### Amazon EMR 7.6.0 - New configurations
<a name="Hive-release-history-changes-760-new-configs"></a>


****  

| Classification | Name | Default | Description | 
| --- | --- | --- | --- | 
| hive-site | `hive.exec.fast.s3.partition.discovery.enabled` | true | Whether to use fast S3 partition discovery for split calculation. This enables prefix-based listing for supported file formats (currently ORC). Note that this feature uses an S3 API parameter that the S3 Express One Zone storage class doesn't support. Disable this feature when you use S3 Express One Zone. | 
| hive-site | `hive.exec.fast.s3.partition.discovery.max.thread.threshold` | 128 | The maximum degree of parallelism for fast S3 partition discovery. | 
| hive-site | `hive.exec.fast.s3.partition.discovery.parallelism` | 10 | The degree of parallelism of a single run of fast S3 partition discovery. This configuration has an effect only if `hive.exec.fast.s3.partition.discovery.enabled` is set to `true`. | 
| hive-site | `hive.blobstore.output-committer.magic.track.commits.in.memory.enabled` | true | Whether the Magic committer with Hive tracks all pending commits in memory. The Magic committer can store the commit data in memory, which can speed up the TaskCommit operation by making fewer S3 calls. This configuration overrides the Hadoop configuration `fs.s3a.committer.magic.track.commits.in.memory.enabled`. | 
| hive-site | `hive.blobstore.output-committer.dp.skip.task.staging.dir.creation` | true | Whether the Magic committer skips creating the dp staging paths in the blobstore. This flag applies only when Hive uses the Magic Committer and tracks commits in memory through `hive.blobstore.output-committer.magic.track.commits.in.memory.enabled`. It is set to true by default, but it takes effect only if that property is enabled, in which case it saves the additional S3 calls that create task attempt paths in the blobstore. | 
| hive-site | `hive.blobstore.output-committer.magic.disable.fs.cache.for.llap` | true | Whether blobstore FS caches are disabled in write flows for LLAP when using the Magic Committer. This flag applies only when LLAP is enabled, and is set to true by default. | 
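
As an illustration of the S3 Express One Zone caveat in the table above, the following Python sketch builds a `hive-site` classification that turns fast S3 partition discovery off when tables use that storage class. The function name and its parameters are hypothetical; the property names and the default parallelism of 10 are taken from the table.

```python
import json


def fast_listing_overrides(uses_s3_express_one_zone, parallelism=10):
    """Build a hive-site classification for fast S3 partition discovery.

    The feature is turned off for S3 Express One Zone, because the table
    above notes that this storage class doesn't support it.
    """
    enabled = not uses_s3_express_one_zone
    properties = {
        "hive.exec.fast.s3.partition.discovery.enabled": str(enabled).lower(),
    }
    if enabled:
        # The parallelism setting has an effect only while the feature is on.
        properties["hive.exec.fast.s3.partition.discovery.parallelism"] = str(parallelism)
    return [{"Classification": "hive-site", "Properties": properties}]


print(json.dumps(fast_listing_overrides(uses_s3_express_one_zone=True), indent=2))
```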

# Amazon EMR 7.5.0 - Hive release notes
<a name="Hive-release-history-750"></a>

## Amazon EMR 7.5.0 - Hive changes
<a name="Hive-release-history-changes-750"></a>


****  

| Type | Description | 
| --- | --- | 
| Improvement | Increased maximum time to wait for Tez session to be opened while trying to use the existing session in HiveCLI to 10s | 
| Improvement | Tuned configs to improve performance in simple select queries with LIMIT | 

# Amazon EMR 7.4.0 - Hive release notes
<a name="Hive-release-history-740"></a>

## Amazon EMR 7.4.0 - Hive changes
<a name="Hive-release-history-changes-740"></a>


****  

| Type | Description | 
| --- | --- | 
| Upgrade | [HIVE-28191](https://issues.apache.org/jira/browse/HIVE-28191): Upgrade Hadoop Version to 3.4.0 | 
| Upgrade |  Upgrade hadoop shaded protobuf to 3.21 | 
| Upgrade | Upgrade commons-cli to 1.5.0 | 
| Upgrade | Upgrade commons-compress to 1.24.0 | 
| Upgrade | Upgrade commons-io to 2.14.0 | 
| Upgrade | Upgrade commons-lang3 to 3.21.0 | 
| Improvement | Change time to wait for Tez session to be opened while trying to use the existing session in HiveCLI to 10s | 
| Improvement | Enable short circuit mechanism in Tez DAG for simple select queries with LIMIT | 
| Improvement | [HIVE-21100](https://issues.apache.org/jira/browse/HIVE-21100): Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause | 
| Bug Fix | [HIVE-25095](https://issues.apache.org/jira/browse/HIVE-25095): Beeline/hive -e command can't deal with query with trailing quote | 
| Bug Fix | [HIVE-13781](https://issues.apache.org/jira/browse/HIVE-13781): Tez Job failed with FileNotFoundException when partition dir doesn't exist | 
| Bug Fix | [HIVE-28480](https://issues.apache.org/jira/browse/HIVE-28480): Disable SMB on partition hash generator mismatch across join branches in previous RS | 

### Amazon EMR 7.4.0 - New configurations
<a name="Hive-release-history-changes-740-new-configs"></a>


****  

| Classification | Name | Default | Description | 
| --- | --- | --- | --- | 
| hive-site | hive.ignore.failure.partition.dir.not.found | false | Ignores failure if the table partition exists but the actual object storage path does not exist. | 
| hive-site | hive.tez.union.flatten.subdirectories | false | When writing data into a table and UNION ALL is the last step of the query, Hive on Tez creates a subdirectory for each branch of the UNION ALL. When this property is enabled, the subdirectories are removed, and the files are renamed and moved to the parent directory. Note that this has no effect when hive.blobstore.use.output-committer is enabled. | 
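
The interaction between `hive.tez.union.flatten.subdirectories` and `hive.blobstore.use.output-committer` described above can be sketched as follows. This is a minimal Python example with a hypothetical helper name. The caller states explicitly whether the output committer is enabled, because its default value isn't given here.

```python
import json


def union_flatten_config(flatten, ignore_missing_partition_dirs, output_committer_enabled):
    """Build a hive-site classification for the two EMR 7.4.0 properties.

    Returns (configurations, notes). A note is added when flattening is
    requested while hive.blobstore.use.output-committer is enabled, because
    the table above says flattening has no effect in that case.
    """
    properties = {
        "hive.tez.union.flatten.subdirectories": str(flatten).lower(),
        "hive.ignore.failure.partition.dir.not.found": str(ignore_missing_partition_dirs).lower(),
    }
    notes = []
    if flatten and output_committer_enabled:
        notes.append(
            "hive.tez.union.flatten.subdirectories has no effect while "
            "hive.blobstore.use.output-committer is enabled"
        )
    return [{"Classification": "hive-site", "Properties": properties}], notes


configurations, notes = union_flatten_config(
    flatten=True, ignore_missing_partition_dirs=True, output_committer_enabled=False
)
print(json.dumps(configurations, indent=2))
for note in notes:
    print("note:", note)
```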

# Amazon EMR 7.3.0 - Hive release notes
<a name="Hive-release-history-730"></a>

## Amazon EMR 7.3.0 - Hive changes
<a name="Hive-release-history-changes-730"></a>


****  

| Type | Description | 
| --- | --- | 
| Feature | [HIVE-18728](https://issues.apache.org/jira/browse/HIVE-18728) – Secure webHCat with SSL. | 
| Improvement | Support configuring SSL keystore credentials for LLAP daemon web UI. | 
| Improvement | Provide option to control SSL hostname verification for Hive metastore server. | 
| Bug Fix | [HIVE-26541](https://issues.apache.org/jira/browse/HIVE-26541) – NPE when starting WebHCat Service. | 
| Bug Fix | [HIVE-23011](https://issues.apache.org/jira/browse/HIVE-23011) – Shared work optimizer should check residual predicates when comparing joins. | 
| Bug Fix | Fix **javax.security.sasl.SaslException**: No common protection layer between client and server between HMS and Namenode when In-Transit Encryption is enabled. | 
| Bug Fix | Fix **IOException** where end of orc split overlaps with the start of a block location. | 
| Bug Fix | Use column name delimiter instead of always splitting by comma when column names contain comma character and using CSVSerde. | 

### Amazon EMR 7.3.0 - New configurations
<a name="Hive-release-history-changes-730-new-configs"></a>


****  

| Classification | Name | Default | Description | 
| --- | --- | --- | --- | 
| hcatalog-webhcat-site | templeton.use.ssl | false | Set this to true for using SSL encryption for WebHCat server. | 
| hcatalog-webhcat-site | templeton.keystore.path |  | SSL certificate keystore location for WebHCat server. | 
| hcatalog-webhcat-site | templeton.keystore.password |  | SSL certificate keystore password for WebHCat server. | 
| hcatalog-webhcat-site | templeton.ssl.protocol.blacklist | SSLv2, SSLv3 | SSL Versions to disable for WebHCat server. | 
| hcatalog-webhcat-site | templeton.host | 0.0.0.0 | The host address the WebHCat server will listen on. | 
| hive-site | hive.metastore.ssl.enable.hostname.verification | false | Control hostname verification during SSL/TLS handshaking. | 
| hive-site | hive.llap.daemon.web.ssl.keystore.path |  | SSL certificate keystore location for LLAP daemon web UI. | 
| hive-site | hive.llap.daemon.web.ssl.keystore.password |  | SSL certificate keystore password for LLAP daemon web UI. | 
| hive-site | hive.metastore.hadoop.rpc.protection.override.to.authentication | false | When enabled, HMS always overrides the value for hadoop.rpc.protection for authentication in its set of configurations. | 
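
To show how the WebHCat SSL properties above fit together, here is a minimal Python sketch that builds an `hcatalog-webhcat-site` classification. The function name, the keystore path, and the password are placeholders for illustration, not values from any real cluster.

```python
import json


def webhcat_ssl_config(keystore_path, keystore_password):
    """Build an hcatalog-webhcat-site classification that turns on SSL.

    Both keystore values default to empty in the table above, so this
    sketch refuses to build a configuration without them.
    """
    if not keystore_path or not keystore_password:
        raise ValueError("keystore path and password are required for SSL")
    return [
        {
            "Classification": "hcatalog-webhcat-site",
            "Properties": {
                "templeton.use.ssl": "true",
                "templeton.keystore.path": keystore_path,
                "templeton.keystore.password": keystore_password,
            },
        }
    ]


# Placeholder values for illustration only; substitute your own keystore
# location and load the password from a secure source.
example = webhcat_ssl_config("/path/to/keystore.jks", "example-password")
print(json.dumps(example, indent=2))
```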

# Amazon EMR 7.2.0 - Hive release notes
<a name="Hive-release-history-720"></a>

## Amazon EMR 7.2.0 - Hive changes
<a name="Hive-release-history-changes-720"></a>


****  

| Type | Description | 
| --- | --- | 
| Upgrade | [Parquet 1.13.1](https://github.com/apache/parquet-java/blob/apache-parquet-1.13.1/CHANGES.md) – Parquet is upgraded to 1.13.1. | 
| Improvement | [HIVE-12930](https://issues.apache.org/jira/browse/HIVE-12930) – Support SSL shuffle for LLAP. | 
| Improvement | [HIVE-23062](https://issues.apache.org/jira/browse/HIVE-23062) – Hive to check Yarn RM URL in TLS and Yarn HA mode for custom Tez queue. | 
| Bug Fix | [HIVE-27952](https://issues.apache.org/jira/browse/HIVE-27952) – Hive fails to create SslContextFactory when KeyStore has multiple certificates. | 
| Bug Fix | [HIVE-28085](https://issues.apache.org/jira/browse/HIVE-28085) – YarnQueueHelper fails to access HTTPS enabled YARN WebService. | 
| Bug Fix | [HIVE-26436](https://issues.apache.org/jira/browse/HIVE-26436) – Hive on MR NullPointerException when initializeOp has not been called and close is called. If the operator has not been initialized, skip close. | 

### Amazon EMR 7.2.0 - New configurations
<a name="Hive-release-history-changes-720-new-configs"></a>


****  

| Classification | Name | Default | Description | 
| --- | --- | --- | --- | 
| hive-site | hive.llap.shuffle.ssl.enabled | false | Set to true along with *tez.runtime.shuffle.ssl.enable* to enable SSL shuffle for LLAP. | 
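
The two properties named in the table above can be set together in one configuration list. The following Python sketch is illustrative only: the function name is hypothetical, and placing `tez.runtime.shuffle.ssl.enable` under the `tez-site` classification is an assumption, because the table doesn't say which classification carries it.

```python
import json


def llap_ssl_shuffle_config():
    """Return configurations that enable SSL shuffle for LLAP."""
    return [
        {
            "Classification": "hive-site",
            "Properties": {"hive.llap.shuffle.ssl.enabled": "true"},
        },
        {
            # Assumption: the Tez property is set through tez-site.
            "Classification": "tez-site",
            "Properties": {"tez.runtime.shuffle.ssl.enable": "true"},
        },
    ]


print(json.dumps(llap_ssl_shuffle_config(), indent=2))
```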

# Amazon EMR 7.1.0 - Hive release notes
<a name="Hive-release-history-710"></a>

## Amazon EMR 7.1.0 - Hive changes
<a name="Hive-release-history-changes-6150"></a>


****  

| Type | Description | 
| --- | --- | 
| Bug Fix | [HIVE-24381](https://issues.apache.org/jira/browse/HIVE-24381) – Compressed text input returns 0 rows if skip header/footer is included. | 
| Bug Fix | [HIVE-24190](https://issues.apache.org/jira/browse/HIVE-24190) – LLAP: ShuffleHandler might return DISK_ERROR_EXCEPTION according to TEZ-4233. | 
| Bug Fix | [HIVE-23073](https://issues.apache.org/jira/browse/HIVE-23073) – Shade Netty and upgrade to netty 4.1.48.Final. | 
| Bug Fix | [HIVE-23148](https://issues.apache.org/jira/browse/HIVE-23148) – Llap external client flow is broken due to netty shading. | 
| Bug Fix | [HIVE-25180](https://issues.apache.org/jira/browse/HIVE-25180) – Upgrades Netty. | 
| Bug Fix | [HIVE-24524](https://issues.apache.org/jira/browse/HIVE-24524) – LLAP ShuffleHandler: upgrade to Netty4 and remove Netty3 dependency from hive where it's possible. | 
| Bug Fix | [HIVE-28000](https://issues.apache.org/jira/browse/HIVE-28000) – Hive QL: the "not in" clause gives incorrect results when type coercion cannot take place. | 
| Bug Fix | [HIVE-27993](https://issues.apache.org/jira/browse/HIVE-27993) – Netty4 ShuffleHandler should use 1 boss thread. | 
| Upgrade | Upgrades Netty to 4.1.100.Final | 
| Upgrade | Upgrades Jetty to 9.4.53.v20231009 | 
| Upgrade | Upgrades Zookeeper to 3.9.1 | 

## Amazon EMR 7.1.0 - Hive known issues
<a name="emr-Hive-710-issues"></a>
+ Amazon EMR 7.1 upgrades Hive to Netty 4.1.100.Final to address the security vulnerabilities in Netty 3. Because `hive-druid-handler` depends on Netty 3, Amazon EMR 7.1 doesn't include the `hive-druid-handler` JAR in Hive's classpath. An upcoming Amazon EMR release will include it in Hive's classpath once the Druid handler supports Netty 4.1.100.Final or later. Contact AWS support if you need the `hive-druid-handler` JAR in Amazon EMR releases 7.1 or higher.

# Amazon EMR 7.0.0 - Hive release notes
<a name="Hive-release-history-700"></a>

## Amazon EMR 7.0.0 - Hive changes
<a name="Hive-release-history-changes-700"></a>


****  

| Type | Description | 
| --- | --- | 
| Upgrade | Hive Runtime now uses Java 17 by default. For more details, see the [EMR 7.0.0 Release Guide](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-700-release.html). | 
| Backport | [HIVE-17709](https://issues.apache.org/jira/browse/HIVE-17709): remove sun.misc.Cleaner references | 
| Bug Fix | Disable Tez Async Init RR when LLAP or ACID is enabled  | 

# Amazon EMR 6.15.0 - Hive release notes
<a name="Hive-release-history-6150"></a>

## Amazon EMR 6.15.0 - Hive changes
<a name="Hive-release-history-changes-6150"></a>


****  

| Type | Description | 
| --- | --- | 
| Feature | Support for [TEZ-4397](https://issues.apache.org/jira/browse/TEZ-4397) – For Tez asynchronous split opening, Hive now supports the Tez configs described in [Tez asynchronous split opening](tez-configure.md#tez-configure-async). | 
| Bug Fix | [HIVE-25400](https://issues.apache.org/jira/browse/HIVE-25400) – Move the offset updating in `BytesColumnVector` to `setValPreallocated`. | 
| Bug Fix | [HIVE-25190](https://issues.apache.org/jira/browse/HIVE-25190) – Fix many small allocations in `BytesColumnVector`. | 
| Bug Fix | Packaging netty modules with llap server to avoid *NoClassDefFound* exception when starting *LLapDaemon* on worker nodes. | 
| Upgrade | Upgrade Apache Hadoop to 3.3.6. | 
| Upgrade | [HIVE-26684](https://issues.apache.org/jira/browse/HIVE-26684) – Upgrade `maven-shade-plugin` to 3.4.1. | 
| Improvement | To reduce Amazon EMR cluster startup time, remove 15 seconds of sleep time from the HCatalog startup script. | 

# Amazon EMR 6.14.0 - Hive release notes
<a name="Hive-release-history-6140"></a>

## Amazon EMR 6.14.0 - Hive changes
<a name="Hive-release-history-changes-6140"></a>


****  

| Type | Description | 
| --- | --- | 
|  Improvement  |  [HIVE-26762](https://issues.apache.org/jira/browse/HIVE-26762): Remove operand pruning in `HiveFilterSetOpTransposeRule` | 
|  Bug fix  |  [HIVE-27582](https://issues.apache.org/jira/browse/HIVE-27582): Do not cache HBase table input format in FetchOperator | 
|  Bug fix  |  [HIVE-26452](https://issues.apache.org/jira/browse/HIVE-26452): NPE when converting JOIN to MAPJOIN and JOIN column referenced more than once | 
|  Bug fix  |  [HIVE-26416](https://issues.apache.org/jira/browse/HIVE-26416): `AcidUtils.isRawFormatFile()` throws `InvalidProtocolBufferException` for non-ORC file  | 
|  Bug fix  |  [HIVE-26105](https://issues.apache.org/jira/browse/HIVE-26105): **Show columns** shows extra values if column **Comments** contains specific Chinese character  | 
|  Bug fix  |  [HIVE-25864](https://issues.apache.org/jira/browse/HIVE-25864): Hive query optimisation creates wrong plan for predicate pushdown with windowing function  | 
|  Bug fix  |  [HIVE-25224](https://issues.apache.org/jira/browse/HIVE-25224): Multiple INSERT statements involving tables with different `bucketing_versions` results in error | 
|  Bug fix  |  [HIVE-24151](https://issues.apache.org/jira/browse/HIVE-24151): `MultiDelimitSerDe` shifts data if strings contain non-ASCII characters | 
|  Bug fix  |  [HIVE-23606](https://issues.apache.org/jira/browse/HIVE-23606): (LLAP) Delay In `DirectByteBuffer` cleanup for `EncodedReaderImpl` | 
|  Bug fix  |  [HIVE-22165](https://issues.apache.org/jira/browse/HIVE-22165): Synchronisation introduced by [HIVE-14296](https://issues.apache.org/jira/browse/HIVE-14296) on `SessionManager.closeSession` causes high latency in a busy Hive server  | 
|  Bug fix  |  [HIVE-21304](https://issues.apache.org/jira/browse/HIVE-21304): Make bucketing version usage more robust | 

# Amazon EMR 6.13.0 - Hive release notes
<a name="Hive-release-history-6130"></a>

## Amazon EMR 6.13.0 - Hive changes
<a name="Hive-release-history-changes-6130"></a>


****  

| Type | Description | 
| --- | --- | 
|  Improvement  |  Upgrade Python Scripts to Support Python3  | 
|  Improvement  |  [HIVE-27097](https://issues.apache.org/jira/browse/HIVE-27097): Improve the retry strategy for MetaStore client and server  | 
|  Bug Fix  |  [HIVE-21778](https://issues.apache.org/jira/browse/HIVE-21778): CBO: “Struct is not null” gets evaluated as nullable always causing filter miss in the query  | 
|  Bug Fix  |  [HIVE-21009](https://issues.apache.org/jira/browse/HIVE-21009): Adding ability for user to set bind user  | 
|  Bug Fix  |  [HIVE-22661](https://issues.apache.org/jira/browse/HIVE-22661): Compaction fails on non bucketed table with data loaded inpath  | 
|  Bug Fix  |  [HIVE-19718](https://issues.apache.org/jira/browse/HIVE-19718): Adding partitions in bulk also fetches table for each partition  | 
|  Bug Fix  |  [HIVE-22173](https://issues.apache.org/jira/browse/HIVE-22173): Query with multiple lateral views hangs during compilation  | 
|  Bug Fix  |  [HIVE-27088](https://issues.apache.org/jira/browse/HIVE-27088): Incorrect results when inner and outer joins with post join filters are merged  | 
|  Bug Fix  |  [HIVE-21935](https://issues.apache.org/jira/browse/HIVE-21935): Hive Vectorization : degraded performance with vectorize UDF  | 
|  Bug Fix  |  [HIVE-25299](https://issues.apache.org/jira/browse/HIVE-25299): Casting timestamp to numeric data types is incorrect for non-UTC timezones  | 
|  Bug Fix  |  [HIVE-24626](https://issues.apache.org/jira/browse/HIVE-24626): LLAP: reader threads could be starved if all IO elevator threads are busy enqueueing to other readers with full queues  | 
|  Bug Fix  |  [HIVE-27029](https://issues.apache.org/jira/browse/HIVE-27029): hive query fails with Filesystem closed error, Rework done for HIVE-26352  | 
|  Bug Fix  |  [HIVE-26352](https://issues.apache.org/jira/browse/HIVE-26352): Tez queue access check fails with GSS Exception on Compaction  | 
|  Bug Fix  |  [HIVE-24590](https://issues.apache.org/jira/browse/HIVE-24590): Operation logging still leaks log4j appenders  | 
|  Bug Fix  |  [HIVE-24552](https://issues.apache.org/jira/browse/HIVE-24552): Possible HMS connections leak or accumulation in loadDynamicPartitions  | 
|  Bug Fix  |  [HIVE-27069](https://issues.apache.org/jira/browse/HIVE-27069): Incorrect results with bucket map join  | 
|  Bug Fix  |  [HIVE-27344](https://issues.apache.org/jira/browse/HIVE-27344): Add a null check in `RecordReaderImpl#close`  | 
|  Bug Fix  |  [HIVE-27439](https://issues.apache.org/jira/browse/HIVE-27439): Support space in Decimal  | 
|  Bug Fix  |  [HIVE-27267](https://issues.apache.org/jira/browse/HIVE-27267): Incorrect results when doing bucket map join on decimal bucketed column with subquery  | 
|  Bug Fix  |  [HIVE-21986](https://issues.apache.org/jira/browse/HIVE-21986): HiveServer Web UI: Setting the Strict-Transport-Security in default response header  | 
|  Bug Fix  |  [HIVE-22148](https://issues.apache.org/jira/browse/HIVE-22148): S3A delegation tokens are not added in the job config of the Compactor.  | 
|  Bug Fix  |  [HIVE-22622](https://issues.apache.org/jira/browse/HIVE-22622): Hive allows to create a struct with duplicate attribute names  | 
|  Bug Fix  |  [HIVE-22008](https://issues.apache.org/jira/browse/HIVE-22008): LIKE Operator should match multi-line input  | 
|  Bug Fix  |  [HIVE-23144](https://issues.apache.org/jira/browse/HIVE-23144): LLAP: Let QueryTracker cleanup on serviceStop  | 
|  Bug Fix  |  [HIVE-22391](https://issues.apache.org/jira/browse/HIVE-22391): NPE while checking Hive query results cache  | 
|  Bug Fix  |  [HIVE-23305](https://issues.apache.org/jira/browse/HIVE-23305): NullPointerException in LlapTaskSchedulerService addNode due to race condition  | 
|  Bug Fix  |  [HIVE-22178](https://issues.apache.org/jira/browse/HIVE-22178): Parquet FilterPredicate throws CastException after SchemaEvolution  | 
|  Bug Fix  |  [HIVE-21517](https://issues.apache.org/jira/browse/HIVE-21517): Fix AggregateStatsCache  | 
|  Bug Fix  |  [HIVE-21825](https://issues.apache.org/jira/browse/HIVE-21825): Improve client error msg when Active/Passive HA is enabled  | 
|  Bug Fix  |  [HIVE-23389](https://issues.apache.org/jira/browse/HIVE-23389): FilterMergeRule can lead to AssertionError  | 
|  Bug Fix  |  [HIVE-22767](https://issues.apache.org/jira/browse/HIVE-22767): Beeline doesn’t parse semicolons in comments properly  | 
|  Bug Fix  |  [HIVE-22996](https://issues.apache.org/jira/browse/HIVE-22996): BasicStats parsing should check proactively for null or empty string  | 
|  Bug Fix  |  [HIVE-22808](https://issues.apache.org/jira/browse/HIVE-22808): HiveRelFieldTrimmer does not handle HiveTableFunctionScan  | 
|  Bug Fix  |  [HIVE-22437](https://issues.apache.org/jira/browse/HIVE-22437): LLAP Metadata cache NPE on locking metadata.  | 
|  Bug Fix  |  [HIVE-22606](https://issues.apache.org/jira/browse/HIVE-22606): AvroSerde logs avro.schema.literal under INFO level  | 
|  Bug Fix  |  [HIVE-22713](https://issues.apache.org/jira/browse/HIVE-22713): Constant propagation shouldn’t be done for Join-Fil(\*)-RS structure  | 
|  Bug Fix  |  [HIVE-21624](https://issues.apache.org/jira/browse/HIVE-21624): LLAP: Cpu metrics at thread level is broken  | 
|  Bug Fix  |  [HIVE-22815](https://issues.apache.org/jira/browse/HIVE-22815): reduce the unnecessary file system object creation in MROutput  | 
|  Bug Fix  |  [HIVE-23060](https://issues.apache.org/jira/browse/HIVE-23060): Query failing with error “Grouping sets expression is not in GROUP BY key. Error encountered near token”  | 
|  Bug Fix  |  [HIVE-22236](https://issues.apache.org/jira/browse/HIVE-22236): Fail to create View selecting View containing `NOT IN` subquery  | 
|  Bug Fix  |  [HIVE-19886](https://issues.apache.org/jira/browse/HIVE-19886): Logs may be directed to 2 files if `--hiveconf hive.log.file` is used  | 
|  Bug Fix  |  [HIVE-20620](https://issues.apache.org/jira/browse/HIVE-20620): manifest collisions when inserting into bucketed sorted MM tables with dynamic partitioning  | 
|  Bug Fix  |  [HIVE-14557](https://issues.apache.org/jira/browse/HIVE-14557): Nullpointer When both SkewJoin and Mapjoin Enabled  | 
|  Bug Fix  |  [HIVE-20471](https://issues.apache.org/jira/browse/HIVE-20471): issues getting the default database path  | 
|  Bug Fix  |  [HIVE-20598](https://issues.apache.org/jira/browse/HIVE-20598): Fix typos in HiveAlgorithmsUtil calculations  | 
|  Bug Fix  |  [HIVE-14737](https://issues.apache.org/jira/browse/HIVE-14737): Problem accessing /logs in a Kerberized Hive Server 2 Web UI  | 
|  Bug Fix  |  [HIVE-20733](https://issues.apache.org/jira/browse/HIVE-20733): GenericUDFOPEqualNS may not use = in plan descriptions  | 
|  Bug Fix  |  [HIVE-20848](https://issues.apache.org/jira/browse/HIVE-20848): After setting UpdateInputAccessTimeHook query fail with Table Not Found.  | 
|  Bug Fix  |  [HIVE-18929](https://issues.apache.org/jira/browse/HIVE-18929): The method humanReadableInt in HiveStringUtils.java has a race condition.  | 
|  Bug Fix  |  [HIVE-20841](https://issues.apache.org/jira/browse/HIVE-20841): LLAP: Make dynamic ports configurable  | 
|  Bug Fix  |  [HIVE-20930](https://issues.apache.org/jira/browse/HIVE-20930): VectorCoalesce in FILTER mode doesn’t take effect  | 
|  Bug Fix  |  [HIVE-21007](https://issues.apache.org/jira/browse/HIVE-21007): Semi join + Union can lead to wrong plans  | 
|  Bug Fix  |  [HIVE-21074](https://issues.apache.org/jira/browse/HIVE-21074): Hive bucketed table query pruning does not work for IS NOT NULL condition  | 
|  Bug Fix  |  [HIVE-21223](https://issues.apache.org/jira/browse/HIVE-21223): CachedStore returns null partition when partition does not exist  | 
|  Bug Fix  |  [HIVE-19625](https://issues.apache.org/jira/browse/HIVE-19625): Potential NPE and hiding actual exception in `Hive#copyFiles`  | 
|  Bug Fix  |  [HIVE-17020](https://issues.apache.org/jira/browse/HIVE-17020): Aggressive RS dedup can incorrectly remove OP tree branch  | 
|  Bug Fix  |  [HIVE-20168](https://issues.apache.org/jira/browse/HIVE-20168): ReduceSinkOperator Logging Hidden  | 
|  Bug Fix  |  [HIVE-20879](https://issues.apache.org/jira/browse/HIVE-20879): Using null in a projection expression leads to CastException  | 
|  Bug Fix  |  [HIVE-20888](https://issues.apache.org/jira/browse/HIVE-20888): TxnHandler: sort() called on immutable lists  | 
|  Bug Fix  |  [HIVE-19948](https://issues.apache.org/jira/browse/HIVE-19948): HiveCli is not splitting the command by semicolon properly if quotes are inside the string  | 
|  Bug Fix  |  [HIVE-20621](https://issues.apache.org/jira/browse/HIVE-20621): GetOperationStatus called in resultset.next causing incremental slowness  | 
|  Bug Fix  |  [HIVE-20854](https://issues.apache.org/jira/browse/HIVE-20854): Sensible Defaults: Hive’s Zookeeper heartbeat interval is 20 minutes, change to 2  | 
|  Bug Fix  |  [HIVE-20330](https://issues.apache.org/jira/browse/HIVE-20330): HCatLoader cannot handle multiple InputJobInfo objects for a job with multiple inputs  | 
|  Bug Fix  |  [HIVE-20787](https://issues.apache.org/jira/browse/HIVE-20787): MapJoinBytesTableContainer dummyRow case doesn’t handle reuse  | 
|  Bug Fix  |  [HIVE-20331](https://issues.apache.org/jira/browse/HIVE-20331): Query with union all, lateral view and Join fails with “cannot find parent in the child operator”  | 
|  Bug Fix  |  [HIVE-19968](https://issues.apache.org/jira/browse/HIVE-19968): UDF exception is not thrown out  | 
|  Bug Fix  |  [HIVE-20410](https://issues.apache.org/jira/browse/HIVE-20410): aborted Insert Overwrite on transactional table causes “Not enough history available for…” error  | 
|  Bug Fix  |  [HIVE-20059](https://issues.apache.org/jira/browse/HIVE-20059): Hive streaming should try shade prefix unconditionally on exception  | 
|  Bug Fix  |  [HIVE-19424](https://issues.apache.org/jira/browse/HIVE-19424): NPE In MetaDataFormatters  | 
|  Bug Fix  |  [HIVE-20355](https://issues.apache.org/jira/browse/HIVE-20355): Clean up parameter of HiveConnection.setSchema  | 
|  Bug Fix  |  [HIVE-20858](https://issues.apache.org/jira/browse/HIVE-20858): Serializer is not correctly initialized with configuration in Utilities.createEmptyBuckets  | 
|  Bug Fix  |  [HIVE-20424](https://issues.apache.org/jira/browse/HIVE-20424): schematool shall not pollute beeline history  | 
|  Bug Fix  |  [HIVE-20338](https://issues.apache.org/jira/browse/HIVE-20338): LLAP: Force synthetic file-id for filesystems which have HDFS protocol impls with POSIX mutation semantics  | 
|  Bug Fix  |  [HIVE-11708](https://issues.apache.org/jira/browse/HIVE-11708): Logical operators raises ClassCastExceptions with NULL  | 
|  Bug Fix  |  [HIVE-21082](https://issues.apache.org/jira/browse/HIVE-21082): In HPL/SQL, declare statement does not support variable of type character  | 
|  Bug Fix  |  [HIVE-16690](https://issues.apache.org/jira/browse/HIVE-16690): Configure Tez cartesian product edge based on LLAP cluster size  | 
|  Bug Fix  |  [HIVE-14516](https://issues.apache.org/jira/browse/HIVE-14516): OrcInputFormat.SplitGenerator.callInternal  | 
|  Bug Fix  |  [HIVE-20981](https://issues.apache.org/jira/browse/HIVE-20981): streaming/AbstractRecordWriter leaks HeapMemoryMonitor  | 
|  Bug Fix  |  [HIVE-20043](https://issues.apache.org/jira/browse/HIVE-20043): HiveServer2: SessionState has a static sync block around an AtomicBoolean  | 
|  Bug Fix  |  [HIVE-20191](https://issues.apache.org/jira/browse/HIVE-20191): PreCommit patch application doesn’t fail if patch is empty  | 
|  Bug Fix  |  [HIVE-20400](https://issues.apache.org/jira/browse/HIVE-20400): create table should always use a fully qualified path to avoid potential FS ambiguity  | 
|  Bug Fix  |  Add null check for skewedInfo before accessing skewed columns  | 

# Amazon EMR 6.12.0 - Hive release notes
<a name="Hive-release-history-6120"></a>

## Amazon EMR 6.12.0 - Hive changes
<a name="Hive-release-history-changes-6120"></a>


****  

| Type | Description | 
| --- | --- | 
| Improvement | Added Support For JDK 11 and JDK 17 Runtime | 
| Improvement | Add support to query case sensitive and reserved keyword column names when using S3 Select. To use it, define table property in format "s3select.column.mapping" = "column1:fieldName1, column2:fieldName2,..." | 
| Improvement | [HIVE-23133](https://issues.apache.org/jira/browse/HIVE-23133): Numeric operations can have different result across hardware archs | 
| Improvement | [HIVE-27145](https://issues.apache.org/jira/browse/HIVE-27145): Use StrictMath for remaining Math functions as followup of HIVE-23133 | 
| Bug Fix | Fix wildcard incompatibility in the `get_partitions_by_filter` and `get_num_partitions_by_filter` HMS APIs caused by porting [HIVE-22900](https://issues.apache.org/jira/browse/HIVE-22900) in EMR Hive 6.4.0 | 
| Bug Fix | [HIVE-26736](https://issues.apache.org/jira/browse/HIVE-26736): Authorization failure for nested Views having WITH clause | 
| Bug Fix | [HIVE-22416](https://issues.apache.org/jira/browse/HIVE-22416): MR-related operation logs missing when parallel execution is enabled | 
| Bug Fix | [HIVE-19653](https://issues.apache.org/jira/browse/HIVE-19653): Incorrect predicate pushdown for groupby with grouping sets | 
| Bug Fix | [HIVE-22094](https://issues.apache.org/jira/browse/HIVE-22094): queries failing with ClassCastException: hive.ql.exec.vector.DecimalColumnVector cannot be cast to hive.ql.exec.vector.Decimal64ColumnVector | 
| Bug Fix | [HIVE-26340](https://issues.apache.org/jira/browse/HIVE-26340): Vectorized PTF operator fails if query has upper case window function | 
| Bug Fix | [HIVE-26184](https://issues.apache.org/jira/browse/HIVE-26184): COLLECT_SET with GROUP BY is very slow when some keys are highly skewed | 
| Bug Fix | [HIVE-26373](https://issues.apache.org/jira/browse/HIVE-26373): ClassCastException when reading timestamps from HBase table with Avro data | 
| Bug Fix | [HIVE-26388](https://issues.apache.org/jira/browse/HIVE-26388): ClassCastException when there is non string type column in source table of CTAS query | 
| Upgrade | [HIVE-26172](https://issues.apache.org/jira/browse/HIVE-26172): Hive - Upgrade Ant to 1.10.11 due to CVE-2021-36373 and CVE-2021-36374 | 
| Bug Fix | [HIVE-26114](https://issues.apache.org/jira/browse/HIVE-26114): Fix jdbc connection hiveserver2 using dfs command with prefix space will cause exception | 
| Bug Fix | [HIVE-26396](https://issues.apache.org/jira/browse/HIVE-26396): The trunc function has a problem with precision interception and the result has many 0 | 
| Bug Fix | [HIVE-26446](https://issues.apache.org/jira/browse/HIVE-26446): HiveProtoLoggingHook fails to populate TablesWritten field for partitioned tables. | 
| Bug Fix | [HIVE-26639](https://issues.apache.org/jira/browse/HIVE-26639): ConstantVectorExpression and ExplainTask shouldn't rely on default charset | 
| Bug Fix | [HIVE-22670](https://issues.apache.org/jira/browse/HIVE-22670): ArrayIndexOutOfBoundsException when vectorized reader is used for reading a parquet file | 
| Bug Fix | [HIVE-23607](https://issues.apache.org/jira/browse/HIVE-23607): Permission Issue: Create view on another view succeeds but alter view fails | 
| Bug Fix | [HIVE-25498](https://issues.apache.org/jira/browse/HIVE-25498): Query with more than 31 count distinct functions returns wrong result | 
| Bug Fix | [HIVE-25780](https://issues.apache.org/jira/browse/HIVE-25780): DistinctExpansion creates more than 64 grouping sets II | 
| Bug Fix | [HIVE-23868](https://issues.apache.org/jira/browse/HIVE-23868): Windowing function spec: support 0 preceding/following | 
| Bug Fix | [HIVE-24539](https://issues.apache.org/jira/browse/HIVE-24539): OrcInputFormat schema generation should respect column delimiter | 
| Bug Fix | [HIVE-23476](https://issues.apache.org/jira/browse/HIVE-23476): LLAP: Preallocate arenas for mmap case as well | 
| Bug Fix | [HIVE-25806](https://issues.apache.org/jira/browse/HIVE-25806): Possible leak in LlapCacheAwareFs - Parquet, LLAP IO | 
| Bug Fix | [HIVE-23498](https://issues.apache.org/jira/browse/HIVE-23498): Disable HTTP Trace method on ThriftHttpCliService | 
| Bug Fix | [HIVE-25729](https://issues.apache.org/jira/browse/HIVE-25729): ThriftUnionObjectInspector should be notified when fully inited | 
| Bug Fix | [HIVE-23846](https://issues.apache.org/jira/browse/HIVE-23846): Avoid unnecessary serialization and deserialization of bitvectors | 
| Bug Fix | [HIVE-24233](https://issues.apache.org/jira/browse/HIVE-24233): except subquery throws nullpointer with cbo disabled | 
| Bug Fix | [HIVE-24276](https://issues.apache.org/jira/browse/HIVE-24276): HiveServer2 loggerconf jsp Cross-Site Scripting (XSS) Vulnerability | 
| Bug Fix | [HIVE-25721](https://issues.apache.org/jira/browse/HIVE-25721): Outer join result is wrong | 
| Bug Fix | [HIVE-25223](https://issues.apache.org/jira/browse/HIVE-25223): Select with limit returns no rows on non native table | 
| Bug Fix | [HIVE-25794](https://issues.apache.org/jira/browse/HIVE-25794): CombineHiveRecordReader: log statements in a loop leads to memory pressure | 
| Bug Fix | [HIVE-23602](https://issues.apache.org/jira/browse/HIVE-23602): Use Java Concurrent Package for Operation Handle Set | 
| Bug Fix | [HIVE-24045](https://issues.apache.org/jira/browse/HIVE-24045): No logging related to when default database is created | 
| Bug Fix | [HIVE-24305](https://issues.apache.org/jira/browse/HIVE-24305): avro decimal schema is not properly populating scale/precision if value is enclosed in quote | 
| Bug Fix | [HIVE-25040](https://issues.apache.org/jira/browse/HIVE-25040): Drop database cascade cannot remove persistent functions | 
| Bug Fix | [HIVE-23501](https://issues.apache.org/jira/browse/HIVE-23501): AOOB in VectorDeserializeRow when complex types are converted to primitive types | 
| Bug Fix | [HIVE-23704](https://issues.apache.org/jira/browse/HIVE-23704): Thrift HTTP Server Does Not Handle Auth Handle Correctly | 
| Bug Fix | [HIVE-23529](https://issues.apache.org/jira/browse/HIVE-23529): CTAS is broken for uniontype when row_deserialize | 
| Bug Fix | [HIVE-24144](https://issues.apache.org/jira/browse/HIVE-24144): getIdentifierQuoteString in HiveDatabaseMetaData returns incorrect value | 
| Bug Fix | [HIVE-23850](https://issues.apache.org/jira/browse/HIVE-23850): Allow PPD when subject is not a column with grouping sets present | 
| Bug Fix | [HIVE-25919](https://issues.apache.org/jira/browse/HIVE-25919): ClassCastException when pushing boolean column predicate in HBaseStorageHandler | 
| Bug Fix | [HIVE-25261](https://issues.apache.org/jira/browse/HIVE-25261): RetryingHMSHandler should wrap the MetaException with short description of the target | 
| Bug Fix | [HIVE-24792](https://issues.apache.org/jira/browse/HIVE-24792): Potential thread leak in Operation | 
| Bug Fix | [HIVE-23409](https://issues.apache.org/jira/browse/HIVE-23409): If TezSession application reopen fails for Timeline service down, default TezSession from SessionPool is closed after a retry | 
| Bug Fix | [HIVE-23615](https://issues.apache.org/jira/browse/HIVE-23615): Do not dereference null pointers in Beeline Commands Class | 
| Bug Fix | [HIVE-24849](https://issues.apache.org/jira/browse/HIVE-24849): Create external table socket timeout when location has large number of files (affects 3.1.2) | 
| Bug Fix | [HIVE-25209](https://issues.apache.org/jira/browse/HIVE-25209): SELECT query with SUM function producing unexpected result | 
| Bug Fix | [HIVE-23666](https://issues.apache.org/jira/browse/HIVE-23666): checkHashModeEfficiency is skipped when a groupby operator doesn't have a grouping set | 
| Bug Fix | [HIVE-23873](https://issues.apache.org/jira/browse/HIVE-23873): Querying Hive JDBCStorageHandler table fails with NPE when CBO is off | 
| Bug Fix | [HIVE-24149](https://issues.apache.org/jira/browse/HIVE-24149): HiveStreamingConnection doesn't close HMS connection | 
| Bug Fix | [HIVE-25561](https://issues.apache.org/jira/browse/HIVE-25561): Killed task should not commit file. (affects 2.x and 3.x versions) | 
| Bug Fix | [HIVE-25683](https://issues.apache.org/jira/browse/HIVE-25683): Close reader in AcidUtils.isRawFormatFile | 
| Bug Fix | [HIVE-24294](https://issues.apache.org/jira/browse/HIVE-24294): TezSessionPool sessions can throw AssertionError | 
| Bug Fix | [HIVE-24182](https://issues.apache.org/jira/browse/HIVE-24182): Ranger authorization issue with permanent UDFs | 
| Bug Fix | [HIVE-22805](https://issues.apache.org/jira/browse/HIVE-22805): Vectorization with conditional array or map is not implemented and throws an error | 
| Bug Fix | [HIVE-22828](https://issues.apache.org/jira/browse/HIVE-22828): Decimal64: NVL and CASE statements implicitly convert decimal64 to 128 | 
| Bug Fix | [HIVE-21398](https://issues.apache.org/jira/browse/HIVE-21398): Columns which has estimated statistics should not be considered as unique keys | 
| Bug Fix | [HIVE-22490](https://issues.apache.org/jira/browse/HIVE-22490): Adding jars with special characters in their path throws error | 
| Bug Fix | [HIVE-22700](https://issues.apache.org/jira/browse/HIVE-22700): Compactions may leak memory when unauthorized | 
| Bug Fix | [HIVE-22053](https://issues.apache.org/jira/browse/HIVE-22053): Function name is not normalized when creating function | 
| Bug Fix | [HIVE-22595](https://issues.apache.org/jira/browse/HIVE-22595): Dynamic partition inserts fail on Avro table with external schema | 
| Bug Fix | [HIVE-21795](https://issues.apache.org/jira/browse/HIVE-21795): Rollup summary row might be missing when a mapjoin is happening on a partitioned table | 
| Bug Fix | [HIVE-22987](https://issues.apache.org/jira/browse/HIVE-22987): ClassCastException in VectorCoalesce when DataTypePhysicalVariation is null | 
| Bug Fix | [HIVE-22219](https://issues.apache.org/jira/browse/HIVE-22219): Bringing a node manager down blocks restart of LLAP service | 
| Bug Fix | [HIVE-21793](https://issues.apache.org/jira/browse/HIVE-21793): CBO retrieves column stats even if hive.stats.fetch.column.stats is set to false | 
| Bug Fix | [HIVE-22163](https://issues.apache.org/jira/browse/HIVE-22163): CBO: Enabling CBO turns on stats estimation, even when the estimation is disabled | 
| Bug Fix | [HIVE-18735](https://issues.apache.org/jira/browse/HIVE-18735): Create table like loses transactional attribute | 
| Bug Fix | [HIVE-22433](https://issues.apache.org/jira/browse/HIVE-22433): Hive JDBC Storage Handler: Incorrect results fetched from BOOLEAN and TIMESTAMP DataType From JDBC Data Source | 
| Bug Fix | [HIVE-19430](https://issues.apache.org/jira/browse/HIVE-19430): ObjectStore.cleanNotificationEvents OutOfMemory on large number of pending events | 
| Bug Fix | [HIVE-20785](https://issues.apache.org/jira/browse/HIVE-20785): Wrong key name in the JDBC DatabaseMetaData.getPrimaryKeys method | 
| Bug Fix | [HIVE-16116](https://issues.apache.org/jira/browse/HIVE-16116): Beeline throws NPE when beeline.hiveconfvariables={} in beeline.properties | 
| Bug Fix | [HIVE-20066](https://issues.apache.org/jira/browse/HIVE-20066): hive.load.data.owner is compared to full principal | 
| Bug Fix | [HIVE-20489](https://issues.apache.org/jira/browse/HIVE-20489): Explain plan of query hangs | 
| Bug Fix | [HIVE-21033](https://issues.apache.org/jira/browse/HIVE-21033): Forgetting to close operation cuts off any more HiveServer2 output | 
| Bug Fix | [HIVE-19888](https://issues.apache.org/jira/browse/HIVE-19888): Misleading "METASTORE_FILTER_HOOK will be ignored" warning from SessionState | 
| Bug Fix | [HIVE-20303](https://issues.apache.org/jira/browse/HIVE-20303): INSERT OVERWRITE TABLE db.table PARTITION (...) IF NOT EXISTS throws InvalidTableException | 
| Bug Fix | [HIVE-16144](https://issues.apache.org/jira/browse/HIVE-16144): CompactionInfo doesn't have equals/hashCode but used in Set | 
| Bug Fix | [HIVE-20818](https://issues.apache.org/jira/browse/HIVE-20818): Views created with a WHERE subquery will regard views referenced in the subquery as direct input | 
| Bug Fix | [HIVE-21005](https://issues.apache.org/jira/browse/HIVE-21005): LLAP: Reading more stripes per-split leaks ZlibCodecs | 
| Bug Fix | [HIVE-20771](https://issues.apache.org/jira/browse/HIVE-20771): LazyBinarySerDe fails on empty structs. | 
| Bug Fix | [HIVE-18852](https://issues.apache.org/jira/browse/HIVE-18852): Misleading error message in alter table validation | 
| Bug Fix | [HIVE-21124](https://issues.apache.org/jira/browse/HIVE-21124): HPL/SQL does not support the CREATE TABLE LIKE statement | 
| Bug Fix | [HIVE-20935](https://issues.apache.org/jira/browse/HIVE-20935): Upload of llap package tarball fails in EC2 causing LLAP service start failure | 
| Bug Fix | [HIVE-20409](https://issues.apache.org/jira/browse/HIVE-20409): Hive ACID: Update/delete/merge does not clean hdfs staging directory | 
| Bug Fix | [HIVE-20570](https://issues.apache.org/jira/browse/HIVE-20570): Union ALL with hive.optimize.union.remove=true has incorrect plan | 
| Bug Fix | [HIVE-20421](https://issues.apache.org/jira/browse/HIVE-20421): Illegal character entity '\b' in hive-default.xml.template | 
| Bug Fix | [HIVE-19133](https://issues.apache.org/jira/browse/HIVE-19133): HS2 WebUI phase-wise performance metrics not showing correctly | 
| Bug Fix | [HIVE-18977](https://issues.apache.org/jira/browse/HIVE-18977): Listing partitions returns different results with JDO and direct SQL | 
| Bug Fix | [HIVE-20034](https://issues.apache.org/jira/browse/HIVE-20034): Roll back MetaStore exception handling changes for backward compatibility | 
| Bug Fix | [HIVE-20672](https://issues.apache.org/jira/browse/HIVE-20672): Logging thread in LlapTaskSchedulerService should report every fixed interval | 
| Bug Fix | [HIVE-12812](https://issues.apache.org/jira/browse/HIVE-12812): Enable mapred.input.dir.recursive by default to support union with aggregate function | 
| Bug Fix | [HIVE-20147](https://issues.apache.org/jira/browse/HIVE-20147): Hive streaming ingest is contended on synchronized logging | 
| Bug Fix | [HIVE-19203](https://issues.apache.org/jira/browse/HIVE-19203): Thread-Safety Issue in HiveMetaStore | 
| Bug Fix | [HIVE-20091](https://issues.apache.org/jira/browse/HIVE-20091): Tez: Add security credentials for FileSinkOperator output | 
| Bug Fix | [HIVE-16906](https://issues.apache.org/jira/browse/HIVE-16906): Hive ATSHook should check for yarn.timeline-service.enabled before connecting to ATS | 
| Bug Fix | [HIVE-20714](https://issues.apache.org/jira/browse/HIVE-20714): SHOW tblproperties for a single property returns the value in the name column | 
| Bug Fix | [HIVE-24730](https://issues.apache.org/jira/browse/HIVE-24730): Shims classes override values from hive-site.xml and tez-site.xml silently | 
| Bug Fix | [HIVE-22055](https://issues.apache.org/jira/browse/HIVE-22055): select count gives incorrect result after loading data from text file | 
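
The `s3select.column.mapping` table property listed in the table above takes a comma-separated list of `column:fieldName` pairs. The following is a minimal sketch, not an official tool, that formats such a value from a Python dictionary and wraps it in a standard Hive `ALTER TABLE ... SET TBLPROPERTIES` statement. The table name, column names, and field names are hypothetical, and the example assumes that the left side of each pair is the table column and the right side is the field name in the data.

```python
def s3select_column_mapping(mapping):
    """Format a {column: fieldName} dict as an s3select.column.mapping value."""
    return ", ".join(f"{column}:{field}" for column, field in mapping.items())


def alter_table_statement(table, mapping):
    """Build an ALTER TABLE statement that sets the mapping table property."""
    value = s3select_column_mapping(mapping)
    return (
        f"ALTER TABLE {table} "
        f'SET TBLPROPERTIES ("s3select.column.mapping" = "{value}")'
    )


# Hypothetical table, column, and field names, for illustration only.
print(alter_table_statement("sales_csv", {"order_date": "Date", "customer": "Customer"}))
```

You can run the generated statement from any Hive client. Whether the property should be set at table creation or afterward is not stated in these release notes, so treat the `ALTER TABLE` form as one possible approach.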

# Amazon EMR 6.11.0 - Hive release notes
<a name="Hive-release-history-6110"></a>

## Amazon EMR 6.11.0 - Hive changes
<a name="Hive-release-history-changes-6110"></a>



| Type | Description | 
| --- | --- | 
| Improvement | Added support for multithreaded partition drops to improve drop-partition performance | 
| Improvement | Support reading encoded Hive query files | 
| Improvement | Enabled Tez Shuffle Handler by default for Hive on Tez jobs | 
| Bug | Added an option to enable deterministic distribution of keys to reducers to fix incorrect result when hive.groupby.skewindata is enabled (reported in [HIVE-20220](https://issues.apache.org/jira/browse/HIVE-20220)) | 
| Bug | Fixed stats computation failure when default partition name is configured | 
| Bug | Respect any custom SSL classification parameters passed when SSL is configured out of the box for HiveServer2 in a cluster with in-transit encryption enabled | 
| Backport | [HIVE-23617](https://issues.apache.org/jira/browse/HIVE-23617): Fixed storage-api FindBug issues | 
| Backport |  [HIVE-26408](https://issues.apache.org/jira/browse/HIVE-26408): Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output | 
| Backport | [HIVE-23614](https://issues.apache.org/jira/browse/HIVE-23614): Always pass HiveConfig to removeTempOrDuplicateFiles | 
| Backport | [HIVE-23354](https://issues.apache.org/jira/browse/HIVE-23354): Remove file size sanity checking from compareTempOrDuplicateFiles | 
| Backport | [HIVE-20344](https://issues.apache.org/jira/browse/HIVE-20344): Fixed PrivilegeSynchronizer for SBA throwing AccessControlException. Also introduced property hive.privilege.synchronizer to disable privilege synchronizer | 
| Backport | [HIVE-15826](https://issues.apache.org/jira/browse/HIVE-15826): Support configuring 'serialization.encoding' for all SerDes | 
| Backport | [HIVE-18284](https://issues.apache.org/jira/browse/HIVE-18284): Fix NPE when inserting data with 'distribute by' clause with dynpart sort optimization | 
| Backport | [HIVE-24930](https://issues.apache.org/jira/browse/HIVE-24930): Operator.setDone() short-circuit from child op is not used in vectorized codepath (if childSize == 1) | 
| Backport | [HIVE-24523](https://issues.apache.org/jira/browse/HIVE-24523): Vectorized read path for LazySimpleSerde does not honor the SERDEPROPERTIES for timestamp | 
| Backport | [HIVE-23265](https://issues.apache.org/jira/browse/HIVE-23265): Duplicate rowsets are returned with Limit and Offset set | 
| Backport | [HIVE-21492](https://issues.apache.org/jira/browse/HIVE-21492): VectorizedParquetRecordReader can't read parquet file generated using thrift/custom tool | 
| Backport | [HIVE-22540](https://issues.apache.org/jira/browse/HIVE-22540): Vectorization: Decimal64 columns don't work with VectorizedBatchUtil.makeLikeColumnVector() | 
| Backport | [HIVE-22588](https://issues.apache.org/jira/browse/HIVE-22588): Flush the remaining rows for the rest of the grouping sets when switching the vector groupby mode | 
| Backport | [HIVE-22551](https://issues.apache.org/jira/browse/HIVE-22551): BytesColumnVector initBuffer should clean vector and length consistently | 
| Backport | [HIVE-22448](https://issues.apache.org/jira/browse/HIVE-22448): CBO: Expand the multiple count distinct with a group-by key | 
| Backport | [HIVE-22248](https://issues.apache.org/jira/browse/HIVE-22248): Fix statistics persisting issues | 
| Backport | [HIVE-22210](https://issues.apache.org/jira/browse/HIVE-22210): Vectorization may reuse computation output columns involved in filtering | 
| Backport | [HIVE-21531](https://issues.apache.org/jira/browse/HIVE-21531): Vectorization: all NULL hashcodes are not computed using Murmur3 | 
| Backport | [HIVE-20419](https://issues.apache.org/jira/browse/HIVE-20419): Vectorization: Prevent mutation of VectorPartitionDesc after being used in a hashmap key | 
| Backport | [HIVE-19388](https://issues.apache.org/jira/browse/HIVE-19388): ClassCastException during VectorMapJoinCommonOperator initialization | 
| Backport | [HIVE-21584](https://issues.apache.org/jira/browse/HIVE-21584): Java 11 preparation: system class loader is not URLClassLoader | 
| Backport | [HIVE-25107](https://issues.apache.org/jira/browse/HIVE-25107): Classpath logging should be on DEBUG level (#2271) | 
| Backport | [HIVE-22097](https://issues.apache.org/jira/browse/HIVE-22097): Incompatible java.util.ArrayList for java 11 | 
| Backport | [HIVE-23938](https://issues.apache.org/jira/browse/HIVE-23938): LLAP: JDK11 - some GC log file rotation related jvm arguments cannot be used anymore | 
| Backport | [HIVE-26226](https://issues.apache.org/jira/browse/HIVE-26226): Exclude jdk.tools dep from hive-metastore in upgrade-acid | 
| Backport | [HIVE-17879](https://issues.apache.org/jira/browse/HIVE-17879): Upgrade Datanucleus Maven Plugin | 
| Backport | [HIVE-27004](https://issues.apache.org/jira/browse/HIVE-27004): DateTimeFormatterBuilder#appendZoneText cannot parse 'UTC+' in Java versions higher than 8 | 
| Backport | [HIVE-16812](https://issues.apache.org/jira/browse/HIVE-16812): VectorizedOrcAcidRowBatchReader doesn't filter delete events | 
| Backport | [HIVE-17917](https://issues.apache.org/jira/browse/HIVE-17917): VectorizedOrcAcidRowBatchReader.computeOffsetAndBucket optimization | 
| Backport | [HIVE-19985](https://issues.apache.org/jira/browse/HIVE-19985): ACID: Skip decoding the ROW__ID sections for read-only queries | 
| Backport | [HIVE-20635](https://issues.apache.org/jira/browse/HIVE-20635): VectorizedOrcAcidRowBatchReader doesn't filter delete events for original files | 
| Upgrade | Upgrade Javadoc to 3.3.1 | 
| Upgrade | Upgrade Javassist to 3.24.1-GA | 
| Upgrade | Update apache-directory-server to 2.0.0-M14 | 

## New configurations
<a name="Hive-release-history-changes-6110-new-configurations"></a>



| Name | Classification | Description | 
| --- | --- | --- | 
| hive.metastore.fs.drop.partition.threads | hive-site | Number of core threads in the drop partition thread pool. | 
| hive.metastore.fs.drop.partition.keepalive.time | hive-site | Time in seconds that an idle drop partition async thread (from the thread pool) will wait for a new task to arrive before terminating. | 
| hive.metastore.fs.drop.partition.threadpool.max.queue.size | hive-site | Maximum queue size to be used in thread pool for dropping of partitions from file system. | 
| hive.groupby.enable.deterministic.distribution | hive-site | Enable deterministic distribution of keys to reducers. It will pass a constant seed value while calling the rand function used for random partitioning. | 
| hive.privilege.synchronizer | hive-site | Whether to synchronize privileges from external authorizer periodically in HiveServer2. | 
| hive.cli.query.file.encoding | hive-site | File encoding for all types of query files (query file, init query file, rc file, etc.) provided in the CLI arguments. | 
| hive.emr.tez.shuffle.enabled | hive-site | Hive on Tez jobs now use tez_shuffle instead of mapreduce_shuffle as the default Shuffle Handler. | 
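
Because each of these properties belongs to the `hive-site` classification, you can supply them as an Amazon EMR configuration object when you create a cluster. The following sketch builds that object in Python and prints it as JSON. The property names come from the table above, but every value shown is a placeholder chosen for illustration, not a default or a recommendation.

```python
import json

# Property names come from the table above. The values are placeholders
# for illustration only, not defaults or recommended settings.
NEW_HIVE_SITE_PROPERTIES = {
    "hive.metastore.fs.drop.partition.threads": "8",
    "hive.metastore.fs.drop.partition.keepalive.time": "60",
    "hive.groupby.enable.deterministic.distribution": "true",
}


def hive_site_classification(properties):
    """Wrap properties in an Amazon EMR configuration list for hive-site."""
    return [{"Classification": "hive-site", "Properties": dict(properties)}]


print(json.dumps(hive_site_classification(NEW_HIVE_SITE_PROPERTIES), indent=2))
```

The printed JSON can be saved to a file and passed as the cluster configuration. Tune the values for your workload before you use them.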

## Deprecated configurations
<a name="Hive-release-history-changes-6110-old-configurations"></a>

The following configuration properties are deprecated as a result of [HIVE-23354](https://issues.apache.org/jira/browse/HIVE-23354) and are no longer supported with Amazon EMR releases 6.11.0 and higher.


| Name | Default value | 
| --- | --- | 
| `hive.mapred.reduce.tasks.speculative.execution` | `false` | 
| `tez.am.speculation.enabled` | `false` | 
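
Before you move a cluster configuration to Amazon EMR 6.11.0 or higher, it can help to check whether it still sets either of these properties. The following sketch scans a configuration list in the standard Amazon EMR classification format and reports any matches. The sample configuration, including its classification names and the `hive.exec.parallel` entry, is hypothetical and is there only to exercise the check.

```python
# Properties listed as deprecated in the table above.
DEPRECATED_PROPERTIES = {
    "hive.mapred.reduce.tasks.speculative.execution",
    "tez.am.speculation.enabled",
}


def find_deprecated(configurations):
    """Return (classification, property) pairs that use a deprecated name."""
    hits = []
    for config in configurations:
        for name in config.get("Properties", {}):
            if name in DEPRECATED_PROPERTIES:
                hits.append((config.get("Classification"), name))
    return hits


# Hypothetical existing cluster configuration, for illustration only.
sample = [
    {"Classification": "hive-site",
     "Properties": {"hive.mapred.reduce.tasks.speculative.execution": "true",
                    "hive.exec.parallel": "true"}},
    {"Classification": "tez-site",
     "Properties": {"tez.am.speculation.enabled": "true"}},
]

for classification, name in find_deprecated(sample):
    print(f"{classification}: remove {name}")
```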

# Amazon EMR 6.10.0 - Hive release notes
<a name="Hive-release-history-6100"></a>

## Amazon EMR 6.10.0 - Hive changes
<a name="Hive-release-history-changes-6100"></a>



| Type | Description | 
| --- | --- | 
| Feature | Enable AWS Lake Formation based access controls for Apache Hive queries (write) via [IAM Passthrough (HiveCLI/Steps API)](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-steps-runtime-roles.html). | 
| Improvement | Disable config hive.log.explain.output by default to reduce log size | 
| Backport | [HIVE-26408](https://issues.apache.org/jira/browse/HIVE-26408): Vectorization: Fix deallocation of scratch columns, don't reuse a child ConstantVectorExpression as an output | 
| Backport | [HIVE-22269](https://issues.apache.org/jira/browse/HIVE-22269): Fix wrong reducers count in insert queries with dynamic partition due to missing of stats caused by [HIVE-20703](https://issues.apache.org/jira/browse/HIVE-20703). | 
| Backport | [HIVE-22891](https://issues.apache.org/jira/browse/HIVE-22891): Skip PartitionDesc Extraction In CombineHiveRecord For Non-LLAP Execution Mode  | 
| Backport | [HIVE-23804](https://issues.apache.org/jira/browse/HIVE-23804): Add default database for column stats specific tables in Hive metastore schema to make them backward compatible | 
| Backport | [HIVE-25277](https://issues.apache.org/jira/browse/HIVE-25277): Slow Hive partition deletion for Cloud object stores with expensive ListFiles | 
| Backport | [HIVE-19202](https://issues.apache.org/jira/browse/HIVE-19202): CBO failed due to NullPointerException in HiveAggregate.isBucketedInput() | 
| Backport | [HIVE-19048](https://issues.apache.org/jira/browse/HIVE-19048): Fix beeline Initscript errors are ignored | 
| Backport | [HIVE-21085](https://issues.apache.org/jira/browse/HIVE-21085): Materialized views registry starts non-external tez session | 
| Backport | [HIVE-21675](https://issues.apache.org/jira/browse/HIVE-21675): CREATE VIEW IF NOT EXISTS returns an error rather than "OK" if the view already exists. This is a regression from Hive 2.  | 
| Backport | [HIVE-21646](https://issues.apache.org/jira/browse/HIVE-21646): Tez: Prevent TezTasks from escaping thread logging context | 
| Backport | [HIVE-22054](https://issues.apache.org/jira/browse/HIVE-22054): Avoid recursive listing to check if a directory is empty | 
| Backport | [HIVE-16587](https://issues.apache.org/jira/browse/HIVE-16587): NPE when inserting complex types with nested null values | 
| Backport | [HIVE-22647](https://issues.apache.org/jira/browse/HIVE-22647): Enable session pool by default | 
| Backport | [HIVE-13288](https://issues.apache.org/jira/browse/HIVE-13288): Confusing exception message in DagUtils.localizeResource | 
| Backport | [HIVE-23870](https://issues.apache.org/jira/browse/HIVE-23870): Optimise multiple text conversions in WritableHiveCharObjectInspector.getPrimitiveJavaObject / HiveCharWritable | 
| Backport | [HIVE-21498](https://issues.apache.org/jira/browse/HIVE-21498): Upgrade Thrift to 0.13.0 | 
| Backport | [HIVE-24378](https://issues.apache.org/jira/browse/HIVE-24378): Leading and trailing spaces are not removed before decimal conversion | 
| Backport | [HIVE-21341](https://issues.apache.org/jira/browse/HIVE-21341): Sensible defaults : hive.server2.idle.operation.timeout and hive.server2.idle.session.timeout are too high | 
| Backport | [HIVE-22465](https://issues.apache.org/jira/browse/HIVE-22465): Add ssl conf in TezConfigurationFactory | 
| Backport | [HIVE-24710](https://issues.apache.org/jira/browse/HIVE-24710): Optimise PTF iteration for count(*) to reduce CPU and IO cost | 
| Backport | [HIVE-15406](https://issues.apache.org/jira/browse/HIVE-15406): Consider vectorizing the new 'trunc' function | 
| Backport | [HIVE-21541](https://issues.apache.org/jira/browse/HIVE-21541): Fix missing asf headers from HIVE-15406 | 
| Backport | [HIVE-24808](https://issues.apache.org/jira/browse/HIVE-24808): Cache Parsed Dates | 
| Backport | [HIVE-24746](https://issues.apache.org/jira/browse/HIVE-24746): PTF: TimestampValueBoundaryScanner can be optimised during range computation | 
| Backport | [HIVE-25059](https://issues.apache.org/jira/browse/HIVE-25059): Alter event is converted to rename during replication | 
| Backport | [HIVE-25142](https://issues.apache.org/jira/browse/HIVE-25142): Rehashing in map join fast hash table causing corruption for large keys | 
| Backport | [HIVE-23756](https://issues.apache.org/jira/browse/HIVE-23756): Added more constraints to the package.jdo file | 
| Backport | [HIVE-25150](https://issues.apache.org/jira/browse/HIVE-25150): Tab characters are not removed before decimal conversion similar to space character which is fixed as part of HIVE-24378 | 
| Backport | [HIVE-25093](https://issues.apache.org/jira/browse/HIVE-25093): date_format() UDF is returning output in UTC time zone only | 
| Backport | [HIVE-25268](https://issues.apache.org/jira/browse/HIVE-25268): date_format udf returns wrong results for dates prior to 1900 if the local timezone is other than UTC | 
| Backport | [HIVE-25338](https://issues.apache.org/jira/browse/HIVE-25338): AIOBE in conv UDF if input is empty | 
| Backport | [HIVE-22400](https://issues.apache.org/jira/browse/HIVE-22400): UDF minute with time returns NULL | 
| Backport | [HIVE-25058](https://issues.apache.org/jira/browse/HIVE-25058): PTF: TimestampValueBoundaryScanner can be optimised during range computation pt2 - isDistanceGreater | 
| Backport | [HIVE-25449](https://issues.apache.org/jira/browse/HIVE-25449): datediff() gives wrong output when run in a tez task with some non-UTC timezone | 
| Backport | [HIVE-23688](https://issues.apache.org/jira/browse/HIVE-23688): Vectorization: IndexArrayOutOfBoundsException For map type column which includes null value | 
| Backport | [HIVE-22247](https://issues.apache.org/jira/browse/HIVE-22247): HiveHFileOutputFormat throws FileNotFoundException when partition's task output empty | 
| Backport | [HIVE-25570](https://issues.apache.org/jira/browse/HIVE-25570): Hive should send full URL path for authorization for the command insert overwrite location | 
| Backport | [HIVE-22903](https://issues.apache.org/jira/browse/HIVE-22903): Vectorized row_number() resets the row number after one batch in case of constant expression in partition clause | 
| Backport | [HIVE-25549](https://issues.apache.org/jira/browse/HIVE-25549): Wrong results for window function with expression in PARTITION BY or ORDER BY clause | 
| Backport | [HIVE-25579](https://issues.apache.org/jira/browse/HIVE-25579): LOAD overwrite appends rather than overwriting | 
| Backport | [HIVE-25659](https://issues.apache.org/jira/browse/HIVE-25659): Metastore direct sql queries with IN/(NOT IN) should be split based on max parameters allowed by SQL DB | 
| Backport | [HIVE-20502](https://issues.apache.org/jira/browse/HIVE-20502): Fix NPE while running skewjoin_mapjoin10.q when column stats is used. | 
| Backport | [HIVE-25765](https://issues.apache.org/jira/browse/HIVE-25765): skip.header.line.count property skips rows of each block in FetchOperator when file size is larger | 
| Bug | Fix NPE on insert in certain scenarios when hive.stats.column.autogather and hive.groupby.skewindata are both enabled  | 
| Bug | Fix NPE when mapred.tasktracker.expiry.interval value is not set | 

# Amazon EMR 6.9.0 - Hive release notes
<a name="Hive-release-history-690"></a>

## Amazon EMR 6.9.0 - Hive changes
<a name="Hive-release-history-changes-690"></a>



| Type | Description | 
| --- | --- | 
| Upgrade | Upgrade Jetty to [9.4.48.v20220622](https://github.com/eclipse/jetty.project/releases/tag/jetty-9.4.48.v20220622) | 
| Upgrade | Support for Hadoop 3.3.3 | 
| Feature | Amazon EMR Hive integration with Lake Formation for interactive workloads using GCSC API. | 
| Feature | Amazon EMR Hive integration with Iceberg. | 
| Improvement | Enable SSL in HiveServer2 when [in-transit encryption](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-data-encryption-options.html#emr-encryption-intransit) is enabled using Amazon EMR security configurations. | 
| Improvement | Enable Hive EMRFS Amazon S3 optimized committer by default. For more information, see [Enabling Hive EMRFS S3 optimized committer](hive-optimized-committer.md). | 
| Improvement | Add HiveHBaseTableInputFormatV2 that inherits only mapred version of InputFormat to fix [SPARK-34210](https://issues.apache.org/jira/browse/SPARK-34210). Set hive.hbase.inputformat.v2 to true to use it. | 
| Improvement | Wait for TezAM to launch in the background with `hive.cli.tez.session.async` instead of terminating it and immediately launching a new one. Use `hive.emr.cli.tez.session.open.timeout` to set this timeout in seconds. | 
| Improvement | Add option `hive.conf.restricted.list.append` to append comma-separated configs to the existing restricted config list `hive.conf.restricted.list`. | 
| Improvement | Clearer error message when Hive query fails because location is not defined for database. | 
| Backport | [HIVE-24484](https://issues.apache.org/jira/browse/HIVE-24484): Upgrade Hadoop to 3.3.1 And Tez to 0.10.2  | 
| Backport | [HIVE-22398](https://issues.apache.org/jira/browse/HIVE-22398): Remove YARN queue management via ShimLoader.  | 
| Backport | [HIVE-23190](https://issues.apache.org/jira/browse/HIVE-23190): LLAP: modify IndexCache to pass filesystem object to TezSpillRecord. | 
| Backport | [HIVE-22185](https://issues.apache.org/jira/browse/HIVE-22185): HADOOP-15832 will cause problems with tests using MiniYarn clusters. | 
| Backport | [HIVE-21670](https://issues.apache.org/jira/browse/HIVE-21670): Replacing mockito-all with mockito-core dependency. | 
| Backport | [HIVE-24542](https://issues.apache.org/jira/browse/HIVE-24542): Prepare Guava for Upgrades. | 
| Backport | [HIVE-23751](https://issues.apache.org/jira/browse/HIVE-23751): QTest: Override #mkdirs() method in ProxyFileSystem to align after HADOOP-16582. | 
| Backport | [HIVE-21603](https://issues.apache.org/jira/browse/HIVE-21603): Java 11 preparation: update powermock version.  | 
| Backport | [HIVE-24083](https://issues.apache.org/jira/browse/HIVE-24083): hcatalog error in Hadoop 3.3.0: authentication type needed. | 
| Backport | [HIVE-24282](https://issues.apache.org/jira/browse/HIVE-24282): Show columns shouldn't sort output columns unless explicitly mentioned. | 
| Backport | [HIVE-20656](https://issues.apache.org/jira/browse/HIVE-20656): Sensible defaults: Map aggregation memory configs are too aggressive. | 
| Backport | [HIVE-25443](https://issues.apache.org/jira/browse/HIVE-25443): Arrow SerDe cannot serialize/deserialize complex data types when there are more than 1024 values | 
| Backport | [HIVE-19792](https://issues.apache.org/jira/browse/HIVE-19792): Upgrade orc to 1.5.2 and enable decimal_64 schema evolution tests. | 
| Backport | [HIVE-20437](https://issues.apache.org/jira/browse/HIVE-20437): Handle schema evolution from float, double, and decimal.  | 
| Backport | [HIVE-21987](https://issues.apache.org/jira/browse/HIVE-21987): Hive is unable to read Parquet int32 annotated with decimal.  | 
| Backport | [HIVE-20038](https://issues.apache.org/jira/browse/HIVE-20038): Update queries on non-bucketed and partitioned tables throws NPE. | 

## Amazon EMR 6.9.0 - Hive known issues
<a name="emr-Hive-690-issues"></a>
+ With Amazon EMR 6.6.0 through 6.9.x, INSERT queries with dynamic partition and an ORDER BY or SORT BY clause will always have two reducers. This issue is caused by OSS change [HIVE-20703](https://issues.apache.org/jira/browse/HIVE-20703), which puts dynamic sort partition optimization under cost-based decision. If your workload doesn't require sorting of dynamic partitions, we recommend that you set the `hive.optimize.sort.dynamic.partition.threshold` property to `-1` to disable the new feature and get the correctly calculated number of reducers. This issue is fixed in OSS Hive as part of [HIVE-22269](https://issues.apache.org/jira/browse/HIVE-22269) and is fixed in Amazon EMR 6.10.0.
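One way to apply the recommended setting cluster-wide is through the `hive-site` configuration classification when you create the cluster. The following is a minimal sketch that builds that configuration object in Python. The helper name `hive_site_classification` is an assumption for illustration and isn't part of any AWS SDK. The property name and value come from the known issue above.

```python
import json


def hive_site_classification(properties):
    """Build an Amazon EMR configuration list for the hive-site classification.

    Hypothetical helper for illustration; not part of any AWS SDK.
    """
    return [
        {
            "Classification": "hive-site",
            # Property values are passed as strings in EMR configuration objects.
            "Properties": {key: str(value) for key, value in properties.items()},
        }
    ]


# Disable the cost-based dynamic sort partition optimization.
config = hive_site_classification(
    {"hive.optimize.sort.dynamic.partition.threshold": -1}
)
print(json.dumps(config, indent=2))
```

You can supply the printed JSON as the cluster's configuration at launch. Set the property this way only if your workload doesn't rely on sorted dynamic partitions.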

# Amazon EMR 6.8.0 - Hive release notes
<a name="Hive-release-history-680"></a>

## Amazon EMR 6.8.0 - Hive changes
<a name="Hive-release-history-changes-680"></a>


| Type | Description | 
| --- | --- | 
| Improvement | Reduce file system calls in msck command. Performance improvements (~15-20x on 10k+ partitions) | 
| Backport | [HIVE-20678](https://issues.apache.org/jira/browse/HIVE-20678): HiveHBaseTableOutputFormat should implement HiveOutputFormat to ensure compatibility | 
| Backport | [HIVE-21040](https://issues.apache.org/jira/browse/HIVE-21040): msck does unnecessary file listing at last level of directory tree | 
| Backport | [HIVE-21460](https://issues.apache.org/jira/browse/HIVE-21460): Load data followed by a select \* query results in incorrect results | 
| Backport | [HIVE-21660](https://issues.apache.org/jira/browse/HIVE-21660): Wrong result when union all and later view with explode is used | 
| Backport | [HIVE-22505](https://issues.apache.org/jira/browse/HIVE-22505): ClassCastException caused by wrong Vectorized operator selection | 
| Backport | [HIVE-22513](https://issues.apache.org/jira/browse/HIVE-22513): Constant propagation of casted column in filter ops can cause incorrect results | 
| Backport | [HIVE-23435](https://issues.apache.org/jira/browse/HIVE-23435): Full outer join result is missing rows | 
| Backport | [HIVE-24209](https://issues.apache.org/jira/browse/HIVE-24209): Incorrect search argument conversion for NOT BETWEEN operation when vectorization is enabled | 
| Backport | [HIVE-24934](https://issues.apache.org/jira/browse/HIVE-24934): VectorizedExpressions annotation is not needed in GenericUDFSQCountCheck | 
| Backport | [HIVE-25278](https://issues.apache.org/jira/browse/HIVE-25278): HiveProjectJoinTransposeRule may do invalid transformations with windowing expressions | 
| Backport | [HIVE-25505](https://issues.apache.org/jira/browse/HIVE-25505): Incorrect results with header. skip.header.line.count if first line is blank | 
| Backport | [HIVE-26080](https://issues.apache.org/jira/browse/HIVE-26080): Upgrade accumulo-core to 1.10.1 | 
| Backport | [HIVE-26235](https://issues.apache.org/jira/browse/HIVE-26235): OR Condition on binary column is returning empty result | 
| Bug | Fix multiple SLF4J bindings warning logs in stderr during launch | 
| Bug | Fix SHOW TABLE EXTENDED query failing with Wrong FS error when partition and table are on different file systems. | 

## Amazon EMR 6.8.0 - Hive known issues
<a name="emr-Hive-680-issues"></a>
+ With Amazon EMR 6.6.0 through 6.9.x, INSERT queries with dynamic partition and an ORDER BY or SORT BY clause will always have two reducers. This issue is caused by OSS change [HIVE-20703](https://issues.apache.org/jira/browse/HIVE-20703), which puts dynamic sort partition optimization under cost-based decision. If your workload doesn't require sorting of dynamic partitions, we recommend that you set the `hive.optimize.sort.dynamic.partition.threshold` property to `-1` to disable the new feature and get the correctly calculated number of reducers. This issue is fixed in OSS Hive as part of [HIVE-22269](https://issues.apache.org/jira/browse/HIVE-22269) and is fixed in Amazon EMR 6.10.0.

# Amazon EMR 6.7.0 - Hive release notes
<a name="Hive-release-history-670"></a>

## Amazon EMR 6.7.0 - Hive changes
<a name="Hive-release-history-changes-670"></a>


| Type | Description | 
| --- | --- | 
| Feature | [Amazon EMR Hive integration with LakeFormation](https://aws.amazon.com/about-aws/whats-new/2022/07/fine-grained-access-controls-job-scoped-iam-roles-integration-aws-lake-formation-apache-spark-hive-amazon-emr-ec2-clusters/). | 
| Feature | Additional audit logging for Hive EMRFS Amazon S3 optimized committer. Hive config: `hive.blobstore.output-committer.logging`, default: `false` | 
| Feature | Delete the target directory on INSERT OVERWRITE with an empty SELECT result into an unpartitioned table or static partition, to match the behavior of Hive 2.x. Hive config: `hive.emr.iow.clean.target.dir`, default: `false` | 
| Bug | Fixed intermittent query failure when using Hive EMRFS Amazon S3 optimized committer with partition bucket sorting. | 
| Upgrade | Hive version upgraded to 3.1.3. Refer to [Apache Hive 3.1.3 release notes](https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12346277&styleName=Html&projectId=12310843) for more details.  | 
| Upgrade | Upgraded Parquet to [1.12.2](https://github.com/apache/parquet-mr/blob/apache-parquet-1.12.2/CHANGES.md). | 
| Backport | [HIVE-20065](https://issues.apache.org/jira/browse/HIVE-20065): Metastore should not rely on jackson 1.x | 
| Backport | [HIVE-20071](https://issues.apache.org/jira/browse/HIVE-20071): Migrate to jackson 2.x and prevent usage | 
| Backport | [HIVE-20607](https://issues.apache.org/jira/browse/HIVE-20607): TxnHandler should use PreparedStatement to execute direct SQL queries | 
| Backport | [HIVE-20740](https://issues.apache.org/jira/browse/HIVE-20740): Remove global lock in ObjectStore.setConf method | 
| Backport | [HIVE-20961](https://issues.apache.org/jira/browse/HIVE-20961): Retire NVL implementation | 
| Backport | [HIVE-22059](https://issues.apache.org/jira/browse/HIVE-22059): hive-exec jar doesn't contain (fasterxml) jackson library | 
| Backport | [HIVE-22351](https://issues.apache.org/jira/browse/HIVE-22351): Fix incorrect threaded ObjectStore usage in TestObjectStore | 
| Backport | [HIVE-23534](https://issues.apache.org/jira/browse/HIVE-23534): NPE in RetryingMetaStoreClient#invoke when catching MetaException with no message | 
| Backport | [HIVE-24048](https://issues.apache.org/jira/browse/HIVE-24048): Harmonise Jackson components to version 2.10.latest - Hive | 
| Backport | [HIVE-24768](https://issues.apache.org/jira/browse/HIVE-24768): Use jackson-bom everywhere for version replacement | 
| Backport | [HIVE-24816](https://issues.apache.org/jira/browse/HIVE-24816): Upgrade jackson to 2.10.5.1 or 2.11.0+ due to CVE-2020-25649 | 
| Backport | [HIVE-25971](https://issues.apache.org/jira/browse/HIVE-25971): Tez task shutdown getting delayed due to cached thread pool not closed | 
| Backport | [HIVE-26036](https://issues.apache.org/jira/browse/HIVE-26036): NPE caused by getMTable() in ObjectStore | 

## Amazon EMR 6.7.0 - Hive known issues
<a name="emr-Hive-670-issues"></a>
+ Queries with windowing functions on the same column as a join may lead to invalid transformations, as reported in [HIVE-25278](https://issues.apache.org/jira/browse/HIVE-25278), and cause incorrect results or query failures. As a workaround, disable CBO at the query level for such queries. The fix will be available in an Amazon EMR release following 6.7.0. For more information, contact AWS support.
+ With Amazon EMR 6.6.0 through 6.9.x, INSERT queries with dynamic partition and an ORDER BY or SORT BY clause will always have two reducers. This issue is caused by OSS change [HIVE-20703](https://issues.apache.org/jira/browse/HIVE-20703), which puts dynamic sort partition optimization under cost-based decision. If your workload doesn't require sorting of dynamic partitions, we recommend that you set the `hive.optimize.sort.dynamic.partition.threshold` property to `-1` to disable the new feature and get the correctly calculated number of reducers. This issue is fixed in OSS Hive as part of [HIVE-22269](https://issues.apache.org/jira/browse/HIVE-22269) and is fixed in Amazon EMR 6.10.0.
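The CBO workaround in the first item can be applied per query by issuing a `SET` command in the same session before the affected statement. The following is a rough sketch, assuming you assemble Hive scripts in Python. `hive.cbo.enable` is the Hive property that controls cost-based optimization. The helper `with_session_settings` and the table and column names are hypothetical.

```python
def with_session_settings(query, settings):
    """Prefix one HiveQL statement with SET commands (hypothetical helper)."""
    set_lines = [f"SET {key}={value};" for key, value in settings.items()]
    # Normalize the statement so it ends with exactly one semicolon.
    statement = query.strip().rstrip(";") + ";"
    return "\n".join(set_lines + [statement])


# Hypothetical query: a window function partitioned on the join column.
query = """
SELECT o.customer_id,
       rank() OVER (PARTITION BY o.customer_id ORDER BY o.order_ts) AS rnk
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
"""

script = with_session_settings(query, {"hive.cbo.enable": "false"})
print(script)
```

A `SET` value stays in effect for the rest of the session, so consider setting `hive.cbo.enable` back to `true` after the affected query if later statements in the same session benefit from CBO.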

# Amazon EMR 6.6.0 - Hive release notes
<a name="Hive-release-history-660"></a>

## Amazon EMR 6.6.0 - Hive changes
<a name="Hive-release-history-changes-660"></a>


| Type | Description | 
| --- | --- | 
| Upgrade |  Upgrade Parquet to [1.12.1](https://issues.apache.org/jira/browse/HIVE-24408).  | 
| Upgrade |  Upgrade jetty jars version to 9.4.43.v20210629  | 
| Bug | Fixed an issue that was causing Hive to be installed on all task/core nodes when LLAP was enabled on a Hive cluster. | 
| Backport | [HIVE-25942](https://issues.apache.org/jira/browse/HIVE-25942): Upgrade commons-io to 2.8.0 due to CVE-2021-29425 | 
| Backport | [HIVE-25726](https://issues.apache.org/jira/browse/HIVE-25726): Upgrade velocity to 2.3 due to CVE-2020-13936 | 
| Backport | [HIVE-25680](https://issues.apache.org/jira/browse/HIVE-25680): Authorize #get_table_meta HiveMetastore Server API to use any of the HiveMetastore Authorization model. | 
| Backport | [HIVE-25554](https://issues.apache.org/jira/browse/HIVE-25554): Upgrade arrow version to 0.15 | 
| Backport | [HIVE-25242](https://issues.apache.org/jira/browse/HIVE-25242): Query performs extremely slow with vectorized.adaptor = chosen | 
| Backport | [HIVE-25085](https://issues.apache.org/jira/browse/HIVE-25085): MetaStore Clients no longer shared across sessions. | 
| Backport | [HIVE-24827](https://issues.apache.org/jira/browse/HIVE-24827): Hive aggregation query returns incorrect results for non text files. | 
| Backport | [HIVE-24683](https://issues.apache.org/jira/browse/HIVE-24683): Hadoop23Shims getFileId prone to NPE for non-existing paths | 
| Backport | [HIVE-24656](https://issues.apache.org/jira/browse/HIVE-24656): CBO fails for queries with is null on map and array types | 
| Backport | [HIVE-24556](https://issues.apache.org/jira/browse/HIVE-24556): Optimize DefaultGraphWalker for case with no grandchild | 
| Backport | [HIVE-24408](https://issues.apache.org/jira/browse/HIVE-24408): Upgrade Parquet to 1.11.1 | 
| Backport | [HIVE-24391](https://issues.apache.org/jira/browse/HIVE-24391): Fix TestOrcFile failures in branch-3.1 | 
| Backport | [HIVE-24362](https://issues.apache.org/jira/browse/HIVE-24362): AST tree processing is suboptimal for tree with large number of nodes | 
| Backport | [HIVE-24316](https://issues.apache.org/jira/browse/HIVE-24316): Upgrade ORC from 1.5.6 to 1.5.8 in branch-3.1 | 
| Backport | [HIVE-24307](https://issues.apache.org/jira/browse/HIVE-24307): Beeline with property-file and -e parameter is failing | 
| Backport | [HIVE-24245](https://issues.apache.org/jira/browse/HIVE-24245): Vectorized PTF with count and distinct over partition producing incorrect results. | 
| Backport | [HIVE-24224](https://issues.apache.org/jira/browse/HIVE-24224): Fix skipping header/footer for Hive on Tez on compressed file | 
| Backport | [HIVE-24157](https://issues.apache.org/jira/browse/HIVE-24157): Strict mode to fail on CAST timestamp ↔ numeric | 
| Backport | [HIVE-24113](https://issues.apache.org/jira/browse/HIVE-24113): NPE in GenericUDFToUnixTimeStamp | 
| Backport | [HIVE-23987](https://issues.apache.org/jira/browse/HIVE-23987): Upgrade arrow version to 0.11.0 | 
| Backport | [HIVE-23972](https://issues.apache.org/jira/browse/HIVE-23972): Add external client ID to LLAP external client | 
| Backport | [HIVE-23806](https://issues.apache.org/jira/browse/HIVE-23806): Avoid clearing column stat states in all partition in case schema is extended. This improves runtime of alter table add columns statement. | 
| Backport | [HIVE-23779](https://issues.apache.org/jira/browse/HIVE-23779): BasicStatsTask Info is not getting printed in beeline console | 
| Backport | [HIVE-23306](https://issues.apache.org/jira/browse/HIVE-23306): RESET command does not work if there is a config set by System.getProperty | 
| Backport | [HIVE-23164](https://issues.apache.org/jira/browse/HIVE-23164): Server is not properly terminated because of non-daemon threads | 
| Backport | [HIVE-22967](https://issues.apache.org/jira/browse/HIVE-22967): Support hive.reloadable.aux.jars.path for Hive on Tez | 
| Backport | [HIVE-22934](https://issues.apache.org/jira/browse/HIVE-22934): Hive server interactive log counters to error stream | 
| Backport | [HIVE-22901](https://issues.apache.org/jira/browse/HIVE-22901): Variable substitution can lead to OOM on circular references | 
| Backport | [HIVE-22769](https://issues.apache.org/jira/browse/HIVE-22769): Incorrect query results and query failure during split generation for compressed text files | 
| Backport | [HIVE-22716](https://issues.apache.org/jira/browse/HIVE-22716): Reading to ByteBuffer is broken in ParquetFooterInputFromCache | 
| Backport | [HIVE-22648](https://issues.apache.org/jira/browse/HIVE-22648): Upgrade Parquet to 1.11.0 | 
| Backport | [HIVE-22640](https://issues.apache.org/jira/browse/HIVE-22640): Decimal64ColumnVector: ClassCastException when partition column type is Decimal | 
| Backport | [HIVE-22621](https://issues.apache.org/jira/browse/HIVE-22621): unstable testcase: TestLlapSignerImpl.testSigning | 
| Backport | [HIVE-22533](https://issues.apache.org/jira/browse/HIVE-22533): Fix possible LLAP daemon web UI vulnerabilities | 
| Backport | [HIVE-22532](https://issues.apache.org/jira/browse/HIVE-22532): PTFPPD may push limit incorrectly through Rank/DenseRank function | 
| Backport | [HIVE-22514](https://issues.apache.org/jira/browse/HIVE-22514): HiveProtoLoggingHook might consume lots of memory | 
| Backport | [HIVE-22476](https://issues.apache.org/jira/browse/HIVE-22476): Hive datediff function provided inconsistent results when hive.fetch.task.conversion is set to none | 
| Backport | [HIVE-22429](https://issues.apache.org/jira/browse/HIVE-22429): Migrated clustered tables using bucketing_version 1 on hive 3 uses bucketing_version 2 for inserts | 
| Backport | [HIVE-22412](https://issues.apache.org/jira/browse/HIVE-22412): StatsUtils throw NPE when explain | 
| Backport | [HIVE-22360](https://issues.apache.org/jira/browse/HIVE-22360): MultiDelimitSerDe returns wrong results in last column when the loaded file has more columns than those in table schema | 
| Backport | [HIVE-22332](https://issues.apache.org/jira/browse/HIVE-22332): Hive should ensure valid schema evolution settings since ORC-540 | 
| Backport | [HIVE-22331](https://issues.apache.org/jira/browse/HIVE-22331): unix_timestamp without argument returns timestamp in millisecond instead of second | 
| Backport | [HIVE-22275](https://issues.apache.org/jira/browse/HIVE-22275): OperationManager.queryIdOperation does not properly clean up multiple queryIds | 
| Backport | [HIVE-22273](https://issues.apache.org/jira/browse/HIVE-22273): Access check is failed when a temporary directory is removed | 
| Backport | [HIVE-22270](https://issues.apache.org/jira/browse/HIVE-22270): Upgrade commons-io to 2.6 | 
| Backport | [HIVE-22241](https://issues.apache.org/jira/browse/HIVE-22241): Implement UDF to interpret date/timestamp using its internal representation and Gregorian-Julian hybrid calendar | 
| Backport | [HIVE-22232](https://issues.apache.org/jira/browse/HIVE-22232): NPE when hive.order.columnalignment is set to false | 
| Backport | [HIVE-22231](https://issues.apache.org/jira/browse/HIVE-22231): Hive query with big size via knox fails with Broken pipe Write failed | 
| Backport | [HIVE-22221](https://issues.apache.org/jira/browse/HIVE-22221): Llap external client - Need to reduce LlapBaseInputFormat#getSplits | 
| Backport | [HIVE-22208](https://issues.apache.org/jira/browse/HIVE-22208): Column name with reserved keyword is unescaped when query including join on table with mask column is re-written | 
| Backport | [HIVE-22197](https://issues.apache.org/jira/browse/HIVE-22197): Common Merge join throwing class cast exception. | 
| Backport | [HIVE-22170](https://issues.apache.org/jira/browse/HIVE-22170): from_unixtime and unix_timestamp should use user session time zone | 
| Backport | [HIVE-22169](https://issues.apache.org/jira/browse/HIVE-22169): Tez: SplitGenerator tries to look for plan files which won't exist for Tez | 
| Backport | [HIVE-22168](https://issues.apache.org/jira/browse/HIVE-22168): Remove very expensive logging from the llap cache hotpath | 
| Backport | [HIVE-22161](https://issues.apache.org/jira/browse/HIVE-22161): UDF: FunctionRegistry synchronizes on org.apache.hadoop.hive.ql.udf.UDFType class | 
| Backport | [HIVE-22120](https://issues.apache.org/jira/browse/HIVE-22120): Fix wrong results/ArrayOutOfBound exception in left outer map joins on specific boundary conditions | 
| Backport | [HIVE-22115](https://issues.apache.org/jira/browse/HIVE-22115): Prevent the creation of query routing appender if property is set to false | 
| Backport | [HIVE-22113](https://issues.apache.org/jira/browse/HIVE-22113): Prevent LLAP shutdown on AMReporter related RuntimeException | 
| Backport | [HIVE-22106](https://issues.apache.org/jira/browse/HIVE-22106): Remove cross-query synchronization for the partition-eval | 
| Backport | [HIVE-22099](https://issues.apache.org/jira/browse/HIVE-22099): Several date related UDFs can't handle Julian dates properly since HIVE-20007 | 
| Backport | [HIVE-22037](https://issues.apache.org/jira/browse/HIVE-22037): HS2 should log when shutting down due to OOM | 
| Backport | [HIVE-21976](https://issues.apache.org/jira/browse/HIVE-21976): Offset should be null instead of zero in Calcite HiveSortLimit | 
| Backport | [HIVE-21924](https://issues.apache.org/jira/browse/HIVE-21924): Split text files even if header/footer exists | 
| Backport | [HIVE-21913](https://issues.apache.org/jira/browse/HIVE-21913): GenericUDTFGetSplits should handle usernames in the same way as LLAP | 
| Backport | [HIVE-21905](https://issues.apache.org/jira/browse/HIVE-21905): Generics improvement around the FetchOperator class | 
| Backport | [HIVE-21902](https://issues.apache.org/jira/browse/HIVE-21902): HiveServer2 UI: jetty response header needs X-Frame-Options | 
| Backport | [HIVE-21888](https://issues.apache.org/jira/browse/HIVE-21888): Set hive.parquet.timestamp.skip.conversion default to true | 
| Backport | [HIVE-21868](https://issues.apache.org/jira/browse/HIVE-21868): Vectorize CAST...FORMAT | 
| Backport | [HIVE-21864](https://issues.apache.org/jira/browse/HIVE-21864): LlapBaseInputFormat#closeAll | 
| Backport | [HIVE-21863](https://issues.apache.org/jira/browse/HIVE-21863): Improve Vectorizer type casting for WHEN expression | 
| Backport | [HIVE-21862](https://issues.apache.org/jira/browse/HIVE-21862): ORC ppd produces wrong result with timestamp | 
| Backport | [HIVE-21846](https://issues.apache.org/jira/browse/HIVE-21846): Create a thread in TezAM which periodically fetches LlapDaemon metrics | 
| Backport | [HIVE-21837](https://issues.apache.org/jira/browse/HIVE-21837): MapJoin is throwing exception when selected column is having completely null values | 
| Backport | [HIVE-21834](https://issues.apache.org/jira/browse/HIVE-21834): Avoid unnecessary calls to simplify filter conditions | 
| Backport | [HIVE-21832](https://issues.apache.org/jira/browse/HIVE-21832): New metrics to get the average queue/serving/response time | 
| Backport | [HIVE-21827](https://issues.apache.org/jira/browse/HIVE-21827): Multiple calls in SemanticAnalyzer do not go through getTableObjectByName method | 
| Backport | [HIVE-21822](https://issues.apache.org/jira/browse/HIVE-21822): Expose LlapDaemon metrics through a new API method | 
| Backport | [HIVE-21818](https://issues.apache.org/jira/browse/HIVE-21818): CBO: Copying TableRelOptHiveTable has metastore traffic | 
| Backport | [HIVE-21815](https://issues.apache.org/jira/browse/HIVE-21815): Stats in ORC file are parsed twice | 
| Backport | [HIVE-21805](https://issues.apache.org/jira/browse/HIVE-21805): HiveServer2: Use the fast ShutdownHookManager APIs | 
| Backport | [HIVE-21799](https://issues.apache.org/jira/browse/HIVE-21799): NullPointerException in DynamicPartitionPruningOptimization, when join key is on aggregation column | 
| Backport | [HIVE-21794](https://issues.apache.org/jira/browse/HIVE-21794): Add materialized view parameters to sqlStdAuthSafeVarNameRegexes | 
| Backport | [HIVE-21768](https://issues.apache.org/jira/browse/HIVE-21768): JDBC: Strip the default union prefix for un-enclosed UNION queries | 
| Backport | [HIVE-21746](https://issues.apache.org/jira/browse/HIVE-21746): ArrayIndexOutOfBoundsException during dynamically partitioned hash join, with CBO disabled | 
| Backport | [HIVE-21717](https://issues.apache.org/jira/browse/HIVE-21717): Rename is failing for directory in move task. | 
| Backport | [HIVE-21685](https://issues.apache.org/jira/browse/HIVE-21685): Wrong simplification in query with multiple IN clauses | 
| Backport | [HIVE-21681](https://issues.apache.org/jira/browse/HIVE-21681): Describe formatted shows incorrect information for multiple primary keys | 
| Backport | [HIVE-21651](https://issues.apache.org/jira/browse/HIVE-21651): Move protobuf serde into hive-exec. | 
| Backport | [HIVE-21619](https://issues.apache.org/jira/browse/HIVE-21619): Print timestamp type without precision in SQL explain extended | 
| Backport | [HIVE-21592](https://issues.apache.org/jira/browse/HIVE-21592): OptimizedSql is not shown when the expression contains CONCAT | 
| Backport | [HIVE-21576](https://issues.apache.org/jira/browse/HIVE-21576): Introduce CAST...FORMAT and limited list of SQL:2016 datetime formats | 
| Backport | [HIVE-21573](https://issues.apache.org/jira/browse/HIVE-21573): Binary transport shall ignore principal if auth is set to delegationToken | 
| Backport | [HIVE-21550](https://issues.apache.org/jira/browse/HIVE-21550): TestObjectStore tests are flaky - A lock could not be obtained within the time requested | 
| Backport | [HIVE-21544](https://issues.apache.org/jira/browse/HIVE-21544): Constant propagation corrupts coalesce/case/when expressions during folding | 
| Backport | [HIVE-21539](https://issues.apache.org/jira/browse/HIVE-21539): GroupBy + where clause on same column results in incorrect query rewrite | 
| Backport | [HIVE-21538](https://issues.apache.org/jira/browse/HIVE-21538): Beeline: password source though the console reader did not pass to connection param | 
| Backport | [HIVE-21509](https://issues.apache.org/jira/browse/HIVE-21509): LLAP may cache corrupted column vectors and return wrong query result | 
| Backport | [HIVE-21499](https://issues.apache.org/jira/browse/HIVE-21499): should not remove the function from registry if create command failed with AlreadyExistsException | 
| Backport | [HIVE-21496](https://issues.apache.org/jira/browse/HIVE-21496): Automatic sizing of unordered buffer can overflow | 
| Backport | [HIVE-21468](https://issues.apache.org/jira/browse/HIVE-21468): Case sensitivity in identifier names for JDBC storage handler | 
| Backport | [HIVE-21467](https://issues.apache.org/jira/browse/HIVE-21467): Remove deprecated junit.framework.Assert imports | 
| Backport | [HIVE-21435](https://issues.apache.org/jira/browse/HIVE-21435): LlapBaseInputFormat should get task number from TASK_ATTEMPT_ID conf if present, while building SubmitWorkRequestProto | 
| Backport | [HIVE-21389](https://issues.apache.org/jira/browse/HIVE-21389): Hive distribution miss javax.ws.rs-api.jar after HIVE-21247 | 
| Backport | [HIVE-21385](https://issues.apache.org/jira/browse/HIVE-21385): Allow disabling pushdown of non-splittable computation to JDBC sources | 
| Backport | [HIVE-21383](https://issues.apache.org/jira/browse/HIVE-21383): JDBC storage handler: Use catalog and schema to retrieve tables if specified | 
| Backport | [HIVE-21382](https://issues.apache.org/jira/browse/HIVE-21382): Group by keys reduction optimization - keys are not reduced in query23 | 
| Backport | [HIVE-21362](https://issues.apache.org/jira/browse/HIVE-21362): Add an input format and serde to read from protobuf files. | 
| Backport | [HIVE-21340](https://issues.apache.org/jira/browse/HIVE-21340): CBO: Prune non-key columns feeding into a SemiJoin | 
| Backport | [HIVE-21332](https://issues.apache.org/jira/browse/HIVE-21332): Purge the non locked buffers instead of locked ones | 
| Backport | [HIVE-21329](https://issues.apache.org/jira/browse/HIVE-21329): Custom Tez runtime unordered output buffer size depending on operator pipeline | 
| Backport | [HIVE-21295](https://issues.apache.org/jira/browse/HIVE-21295): StorageHandler shall convert date to string using Hive convention | 
| Backport | [HIVE-21294](https://issues.apache.org/jira/browse/HIVE-21294): Vectorization: 1-reducer Shuffle can skip the object hash functions | 
| Backport | [HIVE-21255](https://issues.apache.org/jira/browse/HIVE-21255): Remove QueryConditionBuilder in JdbcStorageHandler | 
| Backport | [HIVE-21253](https://issues.apache.org/jira/browse/HIVE-21253): Support DB2 in JDBC StorageHandler | 
| Backport | [HIVE-21232](https://issues.apache.org/jira/browse/HIVE-21232): LLAP: Add a cache-miss friendly split affinity provider | 
| Backport | [HIVE-21214](https://issues.apache.org/jira/browse/HIVE-21214): MoveTask : Use attemptId instead of file size for deduplication of files compareTempOrDuplicateFiles | 
| Backport | [HIVE-21184](https://issues.apache.org/jira/browse/HIVE-21184): Add explain and explain formatted CBO plan with cost information | 
| Backport | [HIVE-21182](https://issues.apache.org/jira/browse/HIVE-21182): Skip setting up hive scratch dir during planning | 
| Backport | [HIVE-21171](https://issues.apache.org/jira/browse/HIVE-21171): Skip creating scratch dirs for tez if RPC is on | 
| Backport | [HIVE-21126](https://issues.apache.org/jira/browse/HIVE-21126): Allow session level queries in LlapBaseInputFormat#getSplit | 
| Backport | [HIVE-21107](https://issues.apache.org/jira/browse/HIVE-21107): "Cannot find field" error during dynamically partitioned hash join | 
| Backport | [HIVE-21061](https://issues.apache.org/jira/browse/HIVE-21061): CTAS query fails with IllegalStateException for empty source | 
| Backport | [HIVE-21041](https://issues.apache.org/jira/browse/HIVE-21041): NPE, ParseException in getting schema from logical plan | 
| Backport | [HIVE-21013](https://issues.apache.org/jira/browse/HIVE-21013): JdbcStorageHandler fail to find partition column in Oracle | 
| Backport | [HIVE-21006](https://issues.apache.org/jira/browse/HIVE-21006): Extend SharedWorkOptimizer to remove semijoins when there is a reutilization opportunity | 
| Backport | [HIVE-20992](https://issues.apache.org/jira/browse/HIVE-20992): Split the config hive.metastore.dbaccess.ssl.properties into more meaningful configs | 
| Backport | [HIVE-20989](https://issues.apache.org/jira/browse/HIVE-20989): JDBC - The GetOperationStatus + log can block query progress via sleep | 
| Backport | [HIVE-20988](https://issues.apache.org/jira/browse/HIVE-20988): Wrong results for group by queries with primary key on multiple columns | 
| Backport | [HIVE-20985](https://issues.apache.org/jira/browse/HIVE-20985): If select operator inputs are temporary columns vectorization may reuse some of them as output | 
| Backport | [HIVE-20978](https://issues.apache.org/jira/browse/HIVE-20978): "hive.jdbc.\*" should add to sqlStdAuthSafeVarNameRegexes | 
| Backport | [HIVE-20953](https://issues.apache.org/jira/browse/HIVE-20953): Remove a function from function registry when it can not be added to the metastore when creating it. | 
| Backport | [HIVE-20952](https://issues.apache.org/jira/browse/HIVE-20952): Cleaning VectorizationContext.java | 
| Backport | [HIVE-20951](https://issues.apache.org/jira/browse/HIVE-20951): LLAP: Set Xms to 50% always | 
| Backport | [HIVE-20949](https://issues.apache.org/jira/browse/HIVE-20949): Improve PKFK cardinality estimation in physical planning | 
| Backport | [HIVE-20944](https://issues.apache.org/jira/browse/HIVE-20944): Not validate stats during query compilation | 
| Backport | [HIVE-20940](https://issues.apache.org/jira/browse/HIVE-20940): Bridge cases in which Calcite's type resolution is stricter than Hive. | 
| Backport | [HIVE-20937](https://issues.apache.org/jira/browse/HIVE-20937): Postgres jdbc query fail with "LIMIT must not be negative" | 
| Backport | [HIVE-20926](https://issues.apache.org/jira/browse/HIVE-20926): Semi join reduction hint fails when bloom filter entries are high or when there are no stats | 
| Backport | [HIVE-20920](https://issues.apache.org/jira/browse/HIVE-20920): Use SQL constraints to improve join reordering algorithm | 
| Backport | [HIVE-20918](https://issues.apache.org/jira/browse/HIVE-20918): Flag to enable/disable pushdown of computation from Calcite into JDBC connection | 
| Backport | [HIVE-20915](https://issues.apache.org/jira/browse/HIVE-20915): Make dynamic sort partition optimization available to HoS and MR | 
| Backport | [HIVE-20910](https://issues.apache.org/jira/browse/HIVE-20910): Insert in bucketed table fails due to dynamic partition sort optimization | 
| Backport | [HIVE-20899](https://issues.apache.org/jira/browse/HIVE-20899): Keytab URI for LLAP YARN Service is restrictive to support HDFS only | 
| Backport | [HIVE-20898](https://issues.apache.org/jira/browse/HIVE-20898): For time related functions arguments may not be casted to a non nullable type | 
| Backport | [HIVE-20881](https://issues.apache.org/jira/browse/HIVE-20881): Constant propagation oversimplifies projections | 
| Backport | [HIVE-20880](https://issues.apache.org/jira/browse/HIVE-20880): Update default value for hive.stats.filter.in.min.ratio | 
| Backport | [HIVE-20873](https://issues.apache.org/jira/browse/HIVE-20873): Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision | 
| Backport | [HIVE-20868](https://issues.apache.org/jira/browse/HIVE-20868): SMB Join fails intermittently when TezDummyOperator has child op in getFinalOp in MapRecordProcessor | 
| Backport | [HIVE-20853](https://issues.apache.org/jira/browse/HIVE-20853): Expose ShuffleHandler.registerDag in the llap daemon API | 
| Backport | [HIVE-20850](https://issues.apache.org/jira/browse/HIVE-20850): Push case conditional from projections to dimension tables if possible | 
| Backport | [HIVE-20842](https://issues.apache.org/jira/browse/HIVE-20842): Fix logic introduced in HIVE-20660 to estimate statistics for group by | 
| Backport | [HIVE-20839](https://issues.apache.org/jira/browse/HIVE-20839): "Cannot find field" error during dynamically partitioned hash join | 
| Backport | [HIVE-20835](https://issues.apache.org/jira/browse/HIVE-20835): Interaction between constraints and MV rewriting may create loop in Calcite planner | 
| Backport | [HIVE-20834](https://issues.apache.org/jira/browse/HIVE-20834): Hive QueryResultCache entries keeping reference to SemanticAnalyzer from cached query | 
| Backport | [HIVE-20830](https://issues.apache.org/jira/browse/HIVE-20830): JdbcStorageHandler range query assertion failure in some cases | 
| Backport | [HIVE-20829](https://issues.apache.org/jira/browse/HIVE-20829): JdbcStorageHandler range split throws NPE | 
| Backport | [HIVE-20827](https://issues.apache.org/jira/browse/HIVE-20827): Inconsistent results for empty arrays | 
| Backport | [HIVE-20826](https://issues.apache.org/jira/browse/HIVE-20826): Enhance HiveSemiJoin rule to convert join + group by on left side to Left Semi Join | 
| Backport | [HIVE-20821](https://issues.apache.org/jira/browse/HIVE-20821): Rewrite SUM0 into SUM + COALESCE combination | 
| Backport | [HIVE-20815](https://issues.apache.org/jira/browse/HIVE-20815): JdbcRecordReader.next shall not eat exception | 
| Backport | [HIVE-20813](https://issues.apache.org/jira/browse/HIVE-20813): udf to_epoch_milli need to support timestamp without time zone as well. | 
| Backport | [HIVE-20804](https://issues.apache.org/jira/browse/HIVE-20804): Further improvements to group by optimization with constraints | 
| Backport | [HIVE-20792](https://issues.apache.org/jira/browse/HIVE-20792): Inserting timestamp with zones truncates the data | 
| Backport | [HIVE-20788](https://issues.apache.org/jira/browse/HIVE-20788): Extended SJ reduction may backtrack columns incorrectly when creating filters | 
| Backport | [HIVE-20778](https://issues.apache.org/jira/browse/HIVE-20778): Join reordering may not be triggered if all joins in plan are created by decorrelation logic | 
| Backport | [HIVE-20772](https://issues.apache.org/jira/browse/HIVE-20772): record per-task CPU counters in LLAP | 
| Backport | [HIVE-20768](https://issues.apache.org/jira/browse/HIVE-20768): Adding Tumbling Window UDF | 
| Backport | [HIVE-20767](https://issues.apache.org/jira/browse/HIVE-20767): Multiple project between join operators may affect join reordering using constraints | 
| Backport | [HIVE-20762](https://issues.apache.org/jira/browse/HIVE-20762): NOTIFICATION_LOG cleanup interval is hardcoded as 60s and is too small | 
| Backport | [HIVE-20761](https://issues.apache.org/jira/browse/HIVE-20761): Select for update on notification_sequence table has retry interval and retries count too small | 
| Backport | [HIVE-20751](https://issues.apache.org/jira/browse/HIVE-20751): Upgrade arrow version to 0.10.0 | 
| Backport | [HIVE-20746](https://issues.apache.org/jira/browse/HIVE-20746): HiveProtoHookLogger does not close file at end of day. | 
| Backport | [HIVE-20744](https://issues.apache.org/jira/browse/HIVE-20744): Use SQL constraints to improve join reordering algorithm | 
| Backport | [HIVE-20740](https://issues.apache.org/jira/browse/HIVE-20740): Remove global lock in ObjectStore.setConf method. This cherrypick backports HIVE-20740 intended for Hive 3.2 and 4.x to 3.1.x | 
| Backport | [HIVE-20734](https://issues.apache.org/jira/browse/HIVE-20734): Beeline: When beeline-site.xml is present and hive CLI redirects to beeline, it should use the system username/dummy password instead of prompting for one | 
| Backport | [HIVE-20731](https://issues.apache.org/jira/browse/HIVE-20731): keystore file in JdbcStorageHandler should be authorized | 
| Backport | [HIVE-20720](https://issues.apache.org/jira/browse/HIVE-20720): Add partition column option to JDBC handler | 
| Backport | [HIVE-20719](https://issues.apache.org/jira/browse/HIVE-20719): SELECT statement fails after UPDATE with hive.optimize.sort.dynamic.partition optimization and vectorization on | 
| Backport | [HIVE-20718](https://issues.apache.org/jira/browse/HIVE-20718): Add perf cli driver with constraints | 
| Backport | [HIVE-20716](https://issues.apache.org/jira/browse/HIVE-20716): Set default value for hive.cbo.stats.correlated.multi.key.joins to true | 
| Backport | [HIVE-20712](https://issues.apache.org/jira/browse/HIVE-20712): HivePointLookupOptimizer should extract deep cases | 
| Backport | [HIVE-20710](https://issues.apache.org/jira/browse/HIVE-20710): Constant folding may not create null constants without types | 
| Backport | [HIVE-20706](https://issues.apache.org/jira/browse/HIVE-20706): external_jdbc_table2.q failing intermittently | 
| Backport | [HIVE-20704](https://issues.apache.org/jira/browse/HIVE-20704): Extend HivePreFilteringRule to support other functions | 
| Backport | [HIVE-20703](https://issues.apache.org/jira/browse/HIVE-20703): Put dynamic sort partition optimization under cost based decision | 
| Backport | [HIVE-20702](https://issues.apache.org/jira/browse/HIVE-20702): Account for overhead from datastructure aware estimations during mapjoin selection | 
| Backport | [HIVE-20692](https://issues.apache.org/jira/browse/HIVE-20692): Enable folding of NOT x IS (NOT) [TRUE\|FALSE] expressions | 
| Backport | [HIVE-20691](https://issues.apache.org/jira/browse/HIVE-20691): Fix org.apache.hadoop.hive.cli.TestMiniLlapCliDriver.testCliDriver[cttl] | 
| Backport | [HIVE-20682](https://issues.apache.org/jira/browse/HIVE-20682): Async query execution can potentially fail if shared sessionHive is closed by master thread | 
| Backport | [HIVE-20676](https://issues.apache.org/jira/browse/HIVE-20676): HiveServer2: PrivilegeSynchronizer is not set to daemon status | 
| Backport | [HIVE-20660](https://issues.apache.org/jira/browse/HIVE-20660): Group by statistics estimation could be improved by bounding the total number of rows to source table | 
| Backport | [HIVE-20652](https://issues.apache.org/jira/browse/HIVE-20652): JdbcStorageHandler push join of two different datasource to jdbc driver | 
| Backport | [HIVE-20651](https://issues.apache.org/jira/browse/HIVE-20651): JdbcStorageHandler password should be encrypted | 
| Backport | [HIVE-20649](https://issues.apache.org/jira/browse/HIVE-20649): LLAP aware memory manager for Orc writers | 
| Backport | [HIVE-20648](https://issues.apache.org/jira/browse/HIVE-20648): LLAP: Vector group by operator should use memory per executor | 
| Backport | [HIVE-20646](https://issues.apache.org/jira/browse/HIVE-20646): Partition filter condition is not pushed down to metastore query if it has IS NOT NULL | 
| Backport | [HIVE-20644](https://issues.apache.org/jira/browse/HIVE-20644): Avoid exposing sensitive information through a Hive Runtime exception | 
| Backport | [HIVE-20636](https://issues.apache.org/jira/browse/HIVE-20636): Improve number of null values estimation after outer join | 
| Backport | [HIVE-20632](https://issues.apache.org/jira/browse/HIVE-20632): Query with get_splits UDF fails if materialized view is created on queried table | 
| Backport | [HIVE-20627](https://issues.apache.org/jira/browse/HIVE-20627): Concurrent async queries intermittently fails with LockException and cause memory leak | 
| Backport | [HIVE-20623](https://issues.apache.org/jira/browse/HIVE-20623): Shared work: Extend sharing of map-join cache entries in LLAP | 
| Backport | [HIVE-20619](https://issues.apache.org/jira/browse/HIVE-20619): Include MultiDelimitSerDe in HiveServer2 By Default | 
| Backport | [HIVE-20618](https://issues.apache.org/jira/browse/HIVE-20618): During join selection BucketMapJoin might be chosen for non bucketed tables | 
| Backport | [HIVE-20617](https://issues.apache.org/jira/browse/HIVE-20617): Fix type of constants in IN expressions to have correct type | 
| Backport | [HIVE-20612](https://issues.apache.org/jira/browse/HIVE-20612): Create new join multi-key correlation flag for CBO | 
| Backport | [HIVE-20603](https://issues.apache.org/jira/browse/HIVE-20603): "Wrong FS" error when inserting to partition after changing table location filesystem | 
| Backport | [HIVE-20601](https://issues.apache.org/jira/browse/HIVE-20601): EnvironmentContext null in ALTER_PARTITION event in DbNotificationListener | 
| Backport | [HIVE-20583](https://issues.apache.org/jira/browse/HIVE-20583): Use canonical hostname only for kerberos auth in HiveConnection | 
| Backport | [HIVE-20582](https://issues.apache.org/jira/browse/HIVE-20582): Make hflush in hive proto logging configurable | 
| Backport | [HIVE-20563](https://issues.apache.org/jira/browse/HIVE-20563): Vectorization: CASE WHEN expression fails when THEN/ELSE type and result type are different | 
| Backport | [HIVE-20558](https://issues.apache.org/jira/browse/HIVE-20558): Change default of hive.hashtable.key.count.adjustment to 0.99 | 
| Backport | [HIVE-20552](https://issues.apache.org/jira/browse/HIVE-20552): Get Schema from LogicalPlan faster | 
| Backport | [HIVE-20550](https://issues.apache.org/jira/browse/HIVE-20550): Switch WebHCat to use beeline to submit Hive queries | 
| Backport | [HIVE-20537](https://issues.apache.org/jira/browse/HIVE-20537): Multi-column joins estimates with uncorrelated columns different in CBO and Hive | 
| Backport | [HIVE-20524](https://issues.apache.org/jira/browse/HIVE-20524): Schema Evolution checking is broken in going from Hive version 2 to version 3 for ALTER TABLE VARCHAR to DECIMAL | 
| Backport | [HIVE-20522](https://issues.apache.org/jira/browse/HIVE-20522): HiveFilterSetOpTransposeRule may throw assertion error due to nullability of fields | 
| Backport | [HIVE-20521](https://issues.apache.org/jira/browse/HIVE-20521): HS2 doAs=true has permission issue with hadoop.tmp.dir, with MR and S3A filesystem | 
| Backport | [HIVE-20515](https://issues.apache.org/jira/browse/HIVE-20515): Empty query results when using results cache and query temp dir, results cache dir in different filesystems | 
| Backport | [HIVE-20508](https://issues.apache.org/jira/browse/HIVE-20508): Hive does not support user names of type "user@realm" | 
| Backport | [HIVE-20507](https://issues.apache.org/jira/browse/HIVE-20507): Beeline: Add a utility command to retrieve all uris from beeline-site.xml | 
| Backport | [HIVE-20505](https://issues.apache.org/jira/browse/HIVE-20505): upgrade org.openjdk.jmh:jmh-core to 1.21 | 
| Backport | [HIVE-20503](https://issues.apache.org/jira/browse/HIVE-20503): Use datastructure aware estimations during mapjoin selection | 
| Backport | [HIVE-20498](https://issues.apache.org/jira/browse/HIVE-20498): Support date type for column stats autogather | 
| Backport | [HIVE-20496](https://issues.apache.org/jira/browse/HIVE-20496): Vectorization: Vectorized PTF IllegalStateException | 
| Backport | [HIVE-20494](https://issues.apache.org/jira/browse/HIVE-20494): GenericUDFRestrictInformationSchema is broken after HIVE-19440 | 
| Backport | [HIVE-20477](https://issues.apache.org/jira/browse/HIVE-20477): OptimizedSql is not shown if the expression contains INs | 
| Backport | [HIVE-20467](https://issues.apache.org/jira/browse/HIVE-20467): Allow IF NOT EXISTS/IF EXISTS in Resource plan creation/drop | 
| Backport | [HIVE-20462](https://issues.apache.org/jira/browse/HIVE-20462): "CREATE VIEW IF NOT EXISTS" fails if view already exists | 
| Backport | [HIVE-20455](https://issues.apache.org/jira/browse/HIVE-20455): Log spew from security.authorization.PrivilegeSynchonizer.run | 
| Backport | [HIVE-20439](https://issues.apache.org/jira/browse/HIVE-20439): Use the inflated memory limit during join selection for llap | 
| Backport | [HIVE-20433](https://issues.apache.org/jira/browse/HIVE-20433): Implicit String to Timestamp conversion is slow | 
| Backport | [HIVE-20432](https://issues.apache.org/jira/browse/HIVE-20432): Rewrite BETWEEN to IN for integer types for stats estimation | 
| Backport | [HIVE-20423](https://issues.apache.org/jira/browse/HIVE-20423): Set NULLS LAST as the default null ordering | 
| Backport | [HIVE-20418](https://issues.apache.org/jira/browse/HIVE-20418): LLAP IO may not handle ORC files that have row index disabled correctly for queries with no columns selected | 
| Backport | [HIVE-20412](https://issues.apache.org/jira/browse/HIVE-20412): NPE in HiveMetaHook | 
| Backport | [HIVE-20406](https://issues.apache.org/jira/browse/HIVE-20406): Nested Coalesce giving incorrect results | 
| Backport | [HIVE-20399](https://issues.apache.org/jira/browse/HIVE-20399): CTAS w/a custom table location that is not fully qualified fails for MM tables | 
| Backport | [HIVE-20393](https://issues.apache.org/jira/browse/HIVE-20393): Semijoin Reduction : markSemiJoinForDPP behaves inconsistently | 
| Backport | [HIVE-20391](https://issues.apache.org/jira/browse/HIVE-20391): HiveAggregateReduceFunctionsRule may infer wrong return type when decomposing aggregate function | 
| Backport | [HIVE-20383](https://issues.apache.org/jira/browse/HIVE-20383): Invalid queue name and synchronisation issues in hive proto events hook. | 
| Backport | [HIVE-20367](https://issues.apache.org/jira/browse/HIVE-20367): Vectorization: Support streaming for PTF AVG, MAX, MIN, SUM | 
| Backport | [HIVE-20366](https://issues.apache.org/jira/browse/HIVE-20366): TPC-DS query78 stats estimates are off for is null filter | 
| Backport | [HIVE-20364](https://issues.apache.org/jira/browse/HIVE-20364): Update default for hive.map.aggr.hash.min.reduction | 
| Backport | [HIVE-20352](https://issues.apache.org/jira/browse/HIVE-20352): Vectorization: Support grouping function | 
| Backport | [HIVE-20347](https://issues.apache.org/jira/browse/HIVE-20347): hive.optimize.sort.dynamic.partition should work with partitioned CTAS and MV | 
| Backport | [HIVE-20345](https://issues.apache.org/jira/browse/HIVE-20345): Drop database may hang if the tables get deleted from a different call | 
| Backport | [HIVE-20343](https://issues.apache.org/jira/browse/HIVE-20343): Hive 3: CTAS does not respect transactional_properties | 
| Backport | [HIVE-20340](https://issues.apache.org/jira/browse/HIVE-20340): Druid Needs Explicit CASTs from Timestamp to STRING when the output of timestamp function is used as String | 
| Backport | [HIVE-20339](https://issues.apache.org/jira/browse/HIVE-20339): Vectorization: Lift unneeded restriction causing some PTF with RANK not to be vectorized | 
| Backport | [HIVE-20337](https://issues.apache.org/jira/browse/HIVE-20337): CachedStore: getPartitionsByExpr is not populating the partition list correctly | 
| Backport | [HIVE-20336](https://issues.apache.org/jira/browse/HIVE-20336): Masking and filtering policies for materialized views | 
| Backport | [HIVE-20326](https://issues.apache.org/jira/browse/HIVE-20326): Create constraints with RELY as default instead of NO RELY | 
| Backport | [HIVE-20321](https://issues.apache.org/jira/browse/HIVE-20321): Vectorization: Cut down memory size of 1 col VectorHashKeyWrapper to <1 CacheLine | 
| Backport | [HIVE-20320](https://issues.apache.org/jira/browse/HIVE-20320): Turn on hive.optimize.remove.sq_count_check flag | 
| Backport | [HIVE-20315](https://issues.apache.org/jira/browse/HIVE-20315): Vectorization: Fix more NULL / Wrong Results issues and avoid unnecessary casts/conversions | 
| Backport | [HIVE-20314](https://issues.apache.org/jira/browse/HIVE-20314): Include partition pruning in materialized view rewriting | 
| Backport | [HIVE-20312](https://issues.apache.org/jira/browse/HIVE-20312): Allow arrow clients to use their own BufferAllocator with LlapOutputFormatService | 
| Backport | [HIVE-20302](https://issues.apache.org/jira/browse/HIVE-20302): LLAP: non-vectorized execution in IO ignores virtual columns, including ROW__ID | 
| Backport | [HIVE-20300](https://issues.apache.org/jira/browse/HIVE-20300): VectorFileSinkArrowOperator | 
| Backport | [HIVE-20299](https://issues.apache.org/jira/browse/HIVE-20299): potential race in LLAP signer unit test | 
| Backport | [HIVE-20296](https://issues.apache.org/jira/browse/HIVE-20296): Improve HivePointLookupOptimizerRule to be able to extract from more sophisticated contexts | 
| Backport | [HIVE-20294](https://issues.apache.org/jira/browse/HIVE-20294): Vectorization: Fix NULL / Wrong Results issues in COALESCE / ELT | 
| Backport | [HIVE-20292](https://issues.apache.org/jira/browse/HIVE-20292): Bad join ordering in tpcds query93 with primary constraint defined | 
| Backport | [HIVE-20290](https://issues.apache.org/jira/browse/HIVE-20290): Lazy initialize ArrowColumnarBatchSerDe so it doesn't allocate buffers during GetSplits | 
| Backport | [HIVE-20281](https://issues.apache.org/jira/browse/HIVE-20281): SharedWorkOptimizer fails with 'operator cache contents and actual plan differ' | 
| Backport | [HIVE-20277](https://issues.apache.org/jira/browse/HIVE-20277): Vectorization: Case expressions that return BOOLEAN are not supported for FILTER | 
| Backport | [HIVE-20267](https://issues.apache.org/jira/browse/HIVE-20267): Expanding WebUI to include form to dynamically config log levels | 
| Backport | [HIVE-20263](https://issues.apache.org/jira/browse/HIVE-20263): Typo in HiveReduceExpressionsWithStatsRule variable | 
| Backport | [HIVE-20260](https://issues.apache.org/jira/browse/HIVE-20260): NDV of a column shouldn't be scaled when row count is changed by filter on another column | 
| Backport | [HIVE-20252](https://issues.apache.org/jira/browse/HIVE-20252): Semijoin Reduction : Cycles due to semi join branch may remain undetected if small table side has a map join upstream. | 
| Backport | [HIVE-20245](https://issues.apache.org/jira/browse/HIVE-20245): Vectorization: Fix NULL / Wrong Results issues in BETWEEN / IN | 
| Backport | [HIVE-20241](https://issues.apache.org/jira/browse/HIVE-20241): Support partitioning spec in CTAS statements | 
| Backport | [HIVE-20240](https://issues.apache.org/jira/browse/HIVE-20240): Semijoin Reduction: Use local variable to check for external table condition | 
| Backport | [HIVE-20226](https://issues.apache.org/jira/browse/HIVE-20226): HMS getNextNotification will throw exception when request maxEvents exceed table's max_rows | 
| Backport | [HIVE-20225](https://issues.apache.org/jira/browse/HIVE-20225): SerDe to support Teradata Binary Format | 
| Backport | [HIVE-20213](https://issues.apache.org/jira/browse/HIVE-20213): Upgrade Calcite to 1.17.0 | 
| Backport | [HIVE-20212](https://issues.apache.org/jira/browse/HIVE-20212): Hiveserver2 in http mode emitting metric default.General.open_connections incorrectly | 
| Backport | [HIVE-20210](https://issues.apache.org/jira/browse/HIVE-20210): Simple Fetch optimizer should lead to MapReduce when filter on non-partition column and conversion is minimal | 
| Backport | [HIVE-20209](https://issues.apache.org/jira/browse/HIVE-20209): Metastore connection fails for first attempt in repl dump | 
| Backport | [HIVE-20207](https://issues.apache.org/jira/browse/HIVE-20207): Vectorization: Fix NULL / Wrong Results issues in Filter / Compare | 
| Backport | [HIVE-20204](https://issues.apache.org/jira/browse/HIVE-20204): Type conversion during IN | 
| Backport | [HIVE-20203](https://issues.apache.org/jira/browse/HIVE-20203): Arrow SerDe leaks a DirectByteBuffer | 
| Backport | [HIVE-20197](https://issues.apache.org/jira/browse/HIVE-20197): Vectorization: Add DECIMAL_64 testing, add Date/Interval/Timestamp arithmetic, and add more GROUP BY Aggregation | 
| Backport | [HIVE-20193](https://issues.apache.org/jira/browse/HIVE-20193): cboInfo is not present in the explain plan json | 
| Backport | [HIVE-20192](https://issues.apache.org/jira/browse/HIVE-20192): HS2 with embedded metastore is leaking JDOPersistenceManager objects | 
| Backport | [HIVE-20183](https://issues.apache.org/jira/browse/HIVE-20183): Inserting from bucketed table can cause data loss, if the source table contains empty bucket | 
| Backport | [HIVE-20177](https://issues.apache.org/jira/browse/HIVE-20177): Vectorization: Reduce KeyWrapper allocation in GroupBy Streaming mode | 
| Backport | [HIVE-20174](https://issues.apache.org/jira/browse/HIVE-20174): Vectorization: Fix NULL / Wrong Results issues in GROUP BY Aggregation Functions | 
| Backport | [HIVE-20172](https://issues.apache.org/jira/browse/HIVE-20172): StatsUpdater failed with GSS Exception while trying to connect to remote metastore | 
| Backport | [HIVE-20153](https://issues.apache.org/jira/browse/HIVE-20153): Count and Sum UDF consume more memory in Hive 2+ | 
| Backport | [HIVE-20152](https://issues.apache.org/jira/browse/HIVE-20152): reset db state, when repl dump fails, so rename table can be done | 
| Backport | [HIVE-20149](https://issues.apache.org/jira/browse/HIVE-20149): TestHiveCli failing/timing out | 
| Backport | [HIVE-20130](https://issues.apache.org/jira/browse/HIVE-20130): Better logging for information schema synchronizer | 
| Backport | [HIVE-20129](https://issues.apache.org/jira/browse/HIVE-20129): Revert to position based schema evolution for orc tables | 
| Backport | [HIVE-20118](https://issues.apache.org/jira/browse/HIVE-20118): SessionStateUserAuthenticator.getGroupNames | 
| Backport | [HIVE-20116](https://issues.apache.org/jira/browse/HIVE-20116): TezTask is using parent logger | 
| Backport | [HIVE-20115](https://issues.apache.org/jira/browse/HIVE-20115): Acid tables should not use footer scan for analyze | 
| Backport | [HIVE-20103](https://issues.apache.org/jira/browse/HIVE-20103): WM: Only Aggregate DAG counters if at least one is used | 
| Backport | [HIVE-20101](https://issues.apache.org/jira/browse/HIVE-20101): BloomKFilter: Avoid using the local byte[] arrays entirely | 
| Backport | [HIVE-20100](https://issues.apache.org/jira/browse/HIVE-20100): OpTraits : Select Optraits should stop when a mismatch is detected | 
| Backport | [HIVE-20098](https://issues.apache.org/jira/browse/HIVE-20098): Statistics: NPE when getting Date column partition statistics | 
| Backport | [HIVE-20095](https://issues.apache.org/jira/browse/HIVE-20095): Fix feature to push computation to jdbc external tables | 
| Backport | [HIVE-20093](https://issues.apache.org/jira/browse/HIVE-20093): LlapOutputFormatService: Use ArrowBuf with Netty for Accounting | 
| Backport | [HIVE-20090](https://issues.apache.org/jira/browse/HIVE-20090): Extend creation of semijoin reduction filters to be able to discover new opportunities | 
| Backport | [HIVE-20088](https://issues.apache.org/jira/browse/HIVE-20088): Beeline config location path is assembled incorrectly | 
| Backport | [HIVE-20082](https://issues.apache.org/jira/browse/HIVE-20082): HiveDecimal to string conversion doesn't format the decimal correctly | 
| Backport | [HIVE-20069](https://issues.apache.org/jira/browse/HIVE-20069): Fix reoptimization in case of DPP and Semijoin optimization | 
| Backport | [HIVE-20051](https://issues.apache.org/jira/browse/HIVE-20051): Skip authorization for temp tables | 
| Backport | [HIVE-20044](https://issues.apache.org/jira/browse/HIVE-20044): Arrow Serde should pad char values and handle empty strings correctly | 
| Backport | [HIVE-20028](https://issues.apache.org/jira/browse/HIVE-20028): Metastore client cache config is used incorrectly | 
| Backport | [HIVE-20025](https://issues.apache.org/jira/browse/HIVE-20025): Clean-up of event files created by HiveProtoLoggingHook | 
| Backport | [HIVE-20020](https://issues.apache.org/jira/browse/HIVE-20020): Hive contrib jar should not be in lib | 
| Backport | [HIVE-20013](https://issues.apache.org/jira/browse/HIVE-20013): Add an Implicit cast to date type for to_date function | 
| Backport | [HIVE-20011](https://issues.apache.org/jira/browse/HIVE-20011): Move away from append mode in proto logging hook | 
| Backport | [HIVE-20005](https://issues.apache.org/jira/browse/HIVE-20005): acid_table_stats, acid_no_buckets, etc - query result change on the branch | 
| Backport | [HIVE-20004](https://issues.apache.org/jira/browse/HIVE-20004): Wrong scale used by ConvertDecimal64ToDecimal results in incorrect results | 
| Backport | [HIVE-19995](https://issues.apache.org/jira/browse/HIVE-19995): Aggregate row traffic for acid tables | 
| Backport | [HIVE-19993](https://issues.apache.org/jira/browse/HIVE-19993): Using a table alias which also appears as a column name is not possible | 
| Backport | [HIVE-19992](https://issues.apache.org/jira/browse/HIVE-19992): Vectorization: Follow-on to HIVE-19951 --> add call to SchemaEvolution.isOnlyImplicitConversion to disable encoded LLAP I/O for ORC only when data type conversion is not implicit | 
| Backport | [HIVE-19989](https://issues.apache.org/jira/browse/HIVE-19989): Metastore uses wrong application name for HADOOP2 metrics | 
| Backport | [HIVE-19981](https://issues.apache.org/jira/browse/HIVE-19981): Managed tables converted to external tables by the HiveStrictManagedMigration utility should be set to delete data when the table is dropped | 
| Backport | [HIVE-19967](https://issues.apache.org/jira/browse/HIVE-19967): SMB Join : Need Optraits for PTFOperator ala GBY Op | 
| Backport | [HIVE-19935](https://issues.apache.org/jira/browse/HIVE-19935): Hive WM session killed: Failed to update LLAP tasks count | 
| Backport | [HIVE-19924](https://issues.apache.org/jira/browse/HIVE-19924): Tag distcp jobs run by Repl Load | 
| Backport | [HIVE-19891](https://issues.apache.org/jira/browse/HIVE-19891): inserting into external tables with custom partition directories may cause data loss | 
| Backport | [HIVE-19850](https://issues.apache.org/jira/browse/HIVE-19850): Dynamic partition pruning in Tez is leading to 'No work found for tablescan' error | 
| Backport | [HIVE-19806](https://issues.apache.org/jira/browse/HIVE-19806): Sort qtests output to avoid flakiness in test results | 
| Backport | [HIVE-19770](https://issues.apache.org/jira/browse/HIVE-19770): Support for CBO for queries with multiple same columns in select | 
| Backport | [HIVE-19769](https://issues.apache.org/jira/browse/HIVE-19769): Create dedicated objects for DB and Table names | 
| Backport | [HIVE-19765](https://issues.apache.org/jira/browse/HIVE-19765): Add Parquet specific tests to BlobstoreCliDriver | 
| Backport | [HIVE-19759](https://issues.apache.org/jira/browse/HIVE-19759): Flaky test: TestRpc#testServerPort | 
| Backport | [HIVE-19711](https://issues.apache.org/jira/browse/HIVE-19711): Refactor Hive Schema Tool | 
| Backport | [HIVE-19701](https://issues.apache.org/jira/browse/HIVE-19701): getDelegationTokenFromMetaStore doesn't need to be synchronized | 
| Backport | [HIVE-19694](https://issues.apache.org/jira/browse/HIVE-19694): Create Materialized View statement should check for MV name conflicts before running MV's SQL statement. | 
| Backport | [HIVE-19674](https://issues.apache.org/jira/browse/HIVE-19674): Group by Decimal Constants push down to Druid table | 
| Backport | [HIVE-19668](https://issues.apache.org/jira/browse/HIVE-19668): Over 30% of the heap wasted by duplicate org.antlr.runtime.CommonToken's and duplicate strings | 
| Backport | [HIVE-19663](https://issues.apache.org/jira/browse/HIVE-19663): refactor LLAP IO report generation | 
| Backport | [HIVE-19661](https://issues.apache.org/jira/browse/HIVE-19661): switch Hive UDFs to use Re2J regex engine | 
| Backport | [HIVE-19628](https://issues.apache.org/jira/browse/HIVE-19628): possible NPE in LLAP testSigning | 
| Backport | [HIVE-19568](https://issues.apache.org/jira/browse/HIVE-19568): Active/Passive HS2 HA: Disallow direct connection to passive HS2 instance | 
| Backport | [HIVE-19564](https://issues.apache.org/jira/browse/HIVE-19564): Vectorization: Fix NULL / Wrong Results issues in Arithmetic | 
| Backport | [HIVE-19552](https://issues.apache.org/jira/browse/HIVE-19552): Enable TestMiniDruidKafkaCliDriver#druidkafkamini_basic.q | 
| Backport | [HIVE-19432](https://issues.apache.org/jira/browse/HIVE-19432): GetTablesOperation is too slow if the hive has too many databases and tables | 
| Backport | [HIVE-19360](https://issues.apache.org/jira/browse/HIVE-19360): CBO: Add an "optimizedSQL" to QueryPlan object | 
| Backport | [HIVE-19326](https://issues.apache.org/jira/browse/HIVE-19326): stats auto gather: incorrect aggregation during UNION queries | 
| Backport | [HIVE-19313](https://issues.apache.org/jira/browse/HIVE-19313): TestJdbcWithDBTokenStoreNoDoAs tests are failing | 
| Backport | [HIVE-19285](https://issues.apache.org/jira/browse/HIVE-19285): Add logs to the subclasses of MetaDataOperation | 
| Backport | [HIVE-19235](https://issues.apache.org/jira/browse/HIVE-19235): Update golden files for Minimr tests | 
| Backport | [HIVE-19104](https://issues.apache.org/jira/browse/HIVE-19104): When test MetaStore is started with retry the instances should be independent | 
| Backport | [HIVE-18986](https://issues.apache.org/jira/browse/HIVE-18986): Table rename will run java.lang.StackOverflowError in dataNucleus if the table contains large number of columns | 
| Backport | [HIVE-18920](https://issues.apache.org/jira/browse/HIVE-18920): CBO: Initialize the Janino providers ahead of 1st query | 
| Backport | [HIVE-18873](https://issues.apache.org/jira/browse/HIVE-18873): Skipping predicate pushdown for MR silently at HiveInputFormat can cause storage handlers to produce erroneous result | 
| Backport | [HIVE-18871](https://issues.apache.org/jira/browse/HIVE-18871): hive on tez execution error due to set hive.aux.jars.path to hdfs:// | 
| Backport | [HIVE-18725](https://issues.apache.org/jira/browse/HIVE-18725): Improve error handling for subqueries if there is wrong column reference | 
| Backport | [HIVE-18696](https://issues.apache.org/jira/browse/HIVE-18696): The partition folders might not get cleaned up properly in the HiveMetaStore.add_partitions_core method if an | 
| Backport | [HIVE-18453](https://issues.apache.org/jira/browse/HIVE-18453): ACID: Add "CREATE TRANSACTIONAL TABLE" syntax to unify ACID ORC & Parquet support | 
| Backport | [HIVE-18201](https://issues.apache.org/jira/browse/HIVE-18201): Disable XPROD_EDGE for sq_count_check | 
| Backport | [HIVE-18140](https://issues.apache.org/jira/browse/HIVE-18140): Partitioned tables statistics can go wrong in basic stats mixed case | 
| Backport | [HIVE-17921](https://issues.apache.org/jira/browse/HIVE-17921): Aggregation with struct in LLAP produces wrong result | 
| Backport | [HIVE-17896](https://issues.apache.org/jira/browse/HIVE-17896): TopNKey: Create a standalone vectorizable TopNKey operator | 
| Backport | [HIVE-17840](https://issues.apache.org/jira/browse/HIVE-17840): HiveMetaStore eats exception if transactionalListeners.notifyEvent fail | 
| Backport | [HIVE-17043](https://issues.apache.org/jira/browse/HIVE-17043): Remove non unique columns from group by keys if not referenced later | 
| Backport | [HIVE-17040](https://issues.apache.org/jira/browse/HIVE-17040): Join elimination in the presence of FK relationship | 
| Backport | [HIVE-16839](https://issues.apache.org/jira/browse/HIVE-16839): Unbalanced calls to openTransaction/commitTransaction when alter the same partition concurrently | 
| Backport | [HIVE-16100](https://issues.apache.org/jira/browse/HIVE-16100): Dynamic Sorted Partition optimizer loses sibling operators | 
| Backport | [HIVE-15956](https://issues.apache.org/jira/browse/HIVE-15956): StackOverflowError when drop lots of partitions | 
| Backport | [HIVE-15177](https://issues.apache.org/jira/browse/HIVE-15177): Authentication with hive fails when kerberos auth type is set to fromSubject and principal contains _HOST | 
| Backport | [HIVE-14898](https://issues.apache.org/jira/browse/HIVE-14898): HS2 shouldn't log callstack for an empty auth header error | 
| Backport | [HIVE-14493](https://issues.apache.org/jira/browse/HIVE-14493): Partitioning support for materialized views | 
| Backport | [HIVE-14431](https://issues.apache.org/jira/browse/HIVE-14431): Recognize COALESCE as CASE | 
| Backport | [HIVE-13457](https://issues.apache.org/jira/browse/HIVE-13457): Create HS2 REST API endpoints for monitoring information | 
| Backport | [HIVE-12342](https://issues.apache.org/jira/browse/HIVE-12342): Set default value of hive.optimize.index.filter to true | 
| Backport | [HIVE-10296](https://issues.apache.org/jira/browse/HIVE-10296): Cast exception observed when hive runs a multi join query on metastore | 
| Backport | [HIVE-6980](https://issues.apache.org/jira/browse/HIVE-6980): Drop table by using direct sql | 

## Amazon EMR 6.6.0 - Hive configuration changes
<a name="emr-Hive-660-configs"></a>
+ As part of OSS change [HIVE-20703](https://issues.apache.org/jira/browse/HIVE-20703), the property to sort dynamic partitions, `hive.optimize.sort.dynamic.partition`, has been replaced with `hive.optimize.sort.dynamic.partition.threshold`. 

  The `hive.optimize.sort.dynamic.partition.threshold` configuration has the following potential values:    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/emr/latest/ReleaseGuide/Hive-release-history-660.html)

## Amazon EMR 6.6.0 - Hive known issues
<a name="emr-Hive-660-issues"></a>
+ Queries that use windowing functions on the same column as a join can trigger invalid transformations, as reported in [HIVE-25278](https://issues.apache.org/jira/browse/HIVE-25278), and produce incorrect results or query failures. As a workaround, you can disable cost-based optimization (CBO) at the query level for such queries. Contact AWS support for further information.
+ Amazon EMR 6.6.0 includes Hive software version 3.1.2. Hive 3.1.2 introduces a feature that splits text files if they contain a header and footer ([HIVE-21924](https://issues.apache.org/jira/browse/HIVE-21924)). The Apache Tez App Master reads each of your files to determine offset points in the data range. Combined, these behaviors can degrade performance if your queries read a large number of small text files. As a workaround, use `CombineHiveInputFormat` and tune the maximum split size by configuring the following properties:

  ```
  SET hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
  SET mapreduce.input.fileinputformat.split.maxsize=16777216;
  ```
+ With Amazon EMR 6.6.0 through 6.9.x, INSERT queries with dynamic partition and an ORDER BY or SORT BY clause will always have two reducers. This issue is caused by OSS change [HIVE-20703](https://issues.apache.org/jira/browse/HIVE-20703), which puts dynamic sort partition optimization under cost-based decision. If your workload doesn't require sorting of dynamic partitions, we recommend that you set the `hive.optimize.sort.dynamic.partition.threshold` property to `-1` to disable the new feature and get the correctly calculated number of reducers. This issue is fixed in OSS Hive as part of [HIVE-22269](https://issues.apache.org/jira/browse/HIVE-22269) and is fixed in Amazon EMR 6.10.0.
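
The CBO and dynamic-partition workarounds in the list above can be sketched as session-level settings. This is a minimal sketch rather than an official procedure: `hive.cbo.enable` is the standard Hive property that toggles cost-based optimization, but this section does not name it, so treat its use here as an assumption. The `orders` and `customers` tables and their columns are hypothetical.

```
-- Workaround for HIVE-25278: turn off CBO for the affected query only.
SET hive.cbo.enable=false;

-- Hypothetical query: a windowing function over the same column used in the join.
SELECT o.customer_id,
       ROW_NUMBER() OVER (PARTITION BY o.customer_id ORDER BY o.order_ts) AS rn
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id;

-- Re-enable CBO for the remaining queries in the session.
SET hive.cbo.enable=true;

-- Workaround for the two-reducer issue (Amazon EMR 6.6.0 through 6.9.x):
-- disable the cost-based sorting of dynamic partitions.
SET hive.optimize.sort.dynamic.partition.threshold=-1;
```

Scoping the CBO change to a single query keeps the optimizer active for the rest of the session, which is usually preferable to disabling it cluster-wide.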