

# Use Athena ACID transactions
<a name="acid-transactions"></a>

The term "ACID transactions" refers to a set of properties ([atomicity](https://en.wikipedia.org/wiki/Atomicity_(database_systems)), [consistency](https://en.wikipedia.org/wiki/Consistency_(database_systems)), [isolation](https://en.wikipedia.org/wiki/Isolation_(database_systems)), and [durability](https://en.wikipedia.org/wiki/Durability_(database_systems))) that ensure data integrity in database transactions. ACID transactions enable multiple users to concurrently and reliably add and delete Amazon S3 objects in an atomic manner, while isolating any existing queries by maintaining read consistency for queries against the data lake. Athena ACID transactions add single-table support for insert, delete, update, and time travel operations to the Athena SQL data manipulation language (DML). You and multiple concurrent users can use Athena ACID transactions to make reliable, row-level modifications to Amazon S3 data. Athena transactions automatically manage locking semantics and coordination and do not require a custom record locking solution.

Athena ACID transactions and familiar SQL syntax simplify updates to your business and regulatory data. For example, to respond to a data erasure request, you can perform a SQL `DELETE` operation. To make manual record corrections, you can use a single `UPDATE` statement. To recover data that was recently deleted, you can issue time travel queries using a `SELECT` statement.
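For example, on an Apache Iceberg table, these operations use standard Athena SQL. The table and column names below are illustrative:

```
-- Respond to a data erasure request
DELETE FROM customers WHERE customer_id = '12345'

-- Make a manual record correction
UPDATE customers SET email = 'corrected@example.com' WHERE customer_id = '67890'

-- Time travel: view the table as of an earlier point in time
SELECT * FROM customers FOR TIMESTAMP AS OF TIMESTAMP '2024-01-01 00:00:00 UTC'
```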

Because they are built on shared table formats, Athena ACID transactions are compatible with other services and engines, such as [Amazon EMR](https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-what-is-emr.html) and [Apache Spark](https://spark.apache.org/), that also support those formats.

Athena transactions are available through the Athena console, API operations, and ODBC and JDBC drivers.

**Topics**
+ [Query Linux Foundation Delta Lake tables](delta-lake-tables.md)
+ [Query Apache Hudi datasets](querying-hudi.md)
+ [Query Apache Iceberg tables](querying-iceberg.md)

# Query Linux Foundation Delta Lake tables
<a name="delta-lake-tables"></a>

Linux Foundation [Delta Lake](https://delta.io/) is a table format for big data analytics. You can use Amazon Athena to read Delta Lake tables stored in Amazon S3 directly without having to generate manifest files or run the `MSCK REPAIR` statement.

The Delta Lake format stores the minimum and maximum values per column of each data file. The Athena implementation makes use of this information to enable file-skipping on predicates to eliminate unwanted files from consideration.

## Considerations and limitations
<a name="delta-lake-tables-considerations-and-limitations"></a>

Delta Lake support in Athena has the following considerations and limitations:
+ **Tables with AWS Glue catalog only** – Native Delta Lake support is available only through tables registered with AWS Glue. If your Delta Lake table is registered with another metastore, you can keep that metastore as your primary one. Because Delta Lake metadata is stored in the file system (for example, in Amazon S3) rather than in the metastore, Athena requires only the location property in AWS Glue to read from your Delta Lake tables.
+ **V3 engine only** – Delta Lake queries are supported only on Athena engine version 3. You must ensure that the workgroup you create is configured to use Athena engine version 3.
+ **No time travel support** – There is no support for queries that use Delta Lake’s time travel capabilities.
+ **Read only** – Write DML statements like `UPDATE`, `INSERT`, or `DELETE` are not supported.
+ **Lake Formation support** – Lake Formation integration is available for Delta Lake tables with their schema in sync with AWS Glue. For more information, see [Using AWS Lake Formation with Amazon Athena](https://docs.aws.amazon.com/lake-formation/latest/dg/athena-lf.html) and [Set up permissions for a Delta Lake table](https://docs.aws.amazon.com/lake-formation/latest/dg/set-up-delta-table.html) in the *AWS Lake Formation Developer Guide*.
+ **Limited DDL support** – The following DDL statements are supported: `CREATE EXTERNAL TABLE`, `SHOW COLUMNS`, `SHOW TBLPROPERTIES`, `SHOW PARTITIONS`, `SHOW CREATE TABLE`, and `DESCRIBE`. For information on using the `CREATE EXTERNAL TABLE` statement, see the [Get started with Delta Lake tables](delta-lake-tables-getting-started.md) section.
+ **Skipping Amazon Glacier objects not supported** – If objects in the Linux Foundation Delta Lake table are in an Amazon Glacier storage class, setting the `read_restored_glacier_objects` table property to `false` has no effect.

  For example, suppose you issue the following command:

  ```
  ALTER TABLE table_name SET TBLPROPERTIES ('read_restored_glacier_objects' = 'false')
  ```

  For Iceberg and Delta Lake tables, the command produces the error `Unsupported table property key: read_restored_glacier_objects`. For Hudi tables, the `ALTER TABLE` command does not produce an error, but Amazon Glacier objects are still not skipped. Running `SELECT` queries after the `ALTER TABLE` command continues to return all objects.
+ **Encrypted tables** – Athena does not natively support reading CSE-KMS encrypted Delta Lake tables. This includes `SELECT` statements and DDL statements.

### Delta Lake versioning and Athena
<a name="delta-lake-tables-versioning"></a>

Athena does not use the [versioning](https://docs.delta.io/latest/releases.html) listed in the Delta Lake documentation. To determine whether your Delta Lake tables are compatible with Athena, consider the following two characteristics:
+ **Reader version** – Every Delta Lake table has a reader version. Currently, this is a number between 1 and 3. Queries that include a table with a reader version that Athena does not support will fail.
+ **Table features** – Every Delta Lake table can also declare a set of reader/writer features. Because Athena's support of Delta Lake is read-only, table writer feature compatibility does not apply. However, queries on tables with unsupported table reader features will fail.

The following table shows the Delta Lake reader versions and Delta Lake table reader features that Athena supports.

| Query type | Supported reader versions | Supported reader features | 
| --- | --- | --- | 
| DQL (SELECT statements) | <= 3 | [Column mapping](https://docs.delta.io/latest/delta-column-mapping.html), [timestampNtz](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#timestamp-without-timezone-timestampntz), [deletion vectors](https://docs.delta.io/latest/delta-deletion-vectors.html) | 
| DDL | <= 1 | Not applicable. Reader features can be declared only on tables with a reader version of 2 or greater. | 
+ For a list of Delta Lake table features, see [Valid feature names in table features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features) on GitHub.com.
+ For a list of Delta Lake features by protocol version, see [Features by protocol version](https://docs.delta.io/latest/versioning.html#features-by-protocol-version) in the Delta Lake documentation.

To create a Delta Lake table in Athena with a reader version greater than 1, see [Synchronize Delta Lake metadata](delta-lake-tables-syncing-metadata.md).

**Topics**
+ [Considerations and limitations](#delta-lake-tables-considerations-and-limitations)
+ [Supported column data types](delta-lake-tables-supported-data-types-columns.md)
+ [Get started with Delta Lake tables](delta-lake-tables-getting-started.md)
+ [Query Delta Lake tables with SQL](delta-lake-tables-querying.md)
+ [Synchronize Delta Lake metadata](delta-lake-tables-syncing-metadata.md)
+ [Additional resources](delta-lake-tables-additional-resources.md)

# Supported column data types
<a name="delta-lake-tables-supported-data-types-columns"></a>

This section describes the supported data types for non-partition and partition columns. 

## Supported non-partition column data types
<a name="delta-lake-tables-supported-data-types-non-partition-columns"></a>

For non-partition columns, all data types that Athena supports except `CHAR` are supported (`CHAR` is not supported in the Delta Lake protocol itself). Supported data types include:

```
boolean
tinyint
smallint
integer
bigint
double
float
decimal
varchar
string
binary
date
timestamp
array
map
struct
```

## Supported partition column data types
<a name="delta-lake-tables-supported-data-types-partition-columns"></a>

For partition columns, Athena supports tables with the following data types:

```
boolean
integer
smallint
tinyint
bigint
decimal
float
double
date
timestamp
varchar
```

For more information about the data types in Athena, see [Data types in Amazon Athena](data-types.md).

# Get started with Delta Lake tables
<a name="delta-lake-tables-getting-started"></a>

To be queryable, your Delta Lake table must exist in AWS Glue. If your table is in Amazon S3 but not in AWS Glue, run a `CREATE EXTERNAL TABLE` statement using the following syntax. If your table already exists in AWS Glue (for example, because you are using Apache Spark or another engine with AWS Glue), you can skip this step. Note the omission of column definitions, SerDe library, and other table properties. Unlike traditional Hive tables, Delta Lake table metadata is inferred from the Delta Lake transaction log and synchronized directly to AWS Glue.

```
CREATE EXTERNAL TABLE
  [db_name.]table_name
  LOCATION 's3://amzn-s3-demo-bucket/your-folder/'
  TBLPROPERTIES ('table_type' = 'DELTA')
```

**Note**  
This statement is not compatible with S3 buckets that have requester pays enabled. If you want to create a Delta Lake table against an S3 bucket with requester pays enabled, follow the instructions and DDL statement in [Synchronize Delta Lake metadata](delta-lake-tables-syncing-metadata.md).
For Delta Lake tables, `CREATE TABLE` statements that include anything other than the `LOCATION` clause and the `table_type` table property are not allowed.

# Query Delta Lake tables with SQL
<a name="delta-lake-tables-querying"></a>

To query a Delta Lake table, use standard SQL `SELECT` syntax:

```
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expression [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST] [, ...] ]
[ OFFSET count [ ROW | ROWS ] ]
[ LIMIT [ count | ALL ] ]
```

For more information about `SELECT` syntax, see [SELECT](select.md) in the Athena documentation.

The Delta Lake format stores the minimum and maximum values per column of each data file. Athena makes use of this information to enable file skipping on predicates to eliminate unnecessary files from consideration.
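For example, a range predicate lets Athena skip data files whose stored per-column minimum and maximum values fall entirely outside the filter. The table and column names below are illustrative:

```
SELECT event_id, event_date
FROM my_delta_table
WHERE event_date >= DATE '2024-01-01'
```

Data files whose `event_date` maximum is earlier than `2024-01-01` are eliminated from consideration without being read.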

# Synchronize Delta Lake metadata
<a name="delta-lake-tables-syncing-metadata"></a>

Athena synchronizes table metadata, including schema, partition columns, and table properties, to AWS Glue if you use Athena to create your Delta Lake table. As time passes, this metadata can lose its synchronization with the underlying table metadata in the transaction log. To keep your table up to date, you can choose one of the following options:
+ Use the AWS Glue crawler for Delta Lake tables. For more information, see [Introducing native Delta Lake table support with AWS Glue crawlers](https://aws.amazon.com/blogs/big-data/introducing-native-delta-lake-table-support-with-aws-glue-crawlers/) in the *AWS Big Data Blog* and [Scheduling an AWS Glue crawler](https://docs.aws.amazon.com/glue/latest/dg/schedule-crawler.html) in the AWS Glue Developer Guide.
+ Drop and recreate the table in Athena.
+ Use the SDK, CLI, or AWS Glue console to manually update the schema in AWS Glue.

Note that the following features require the schema in AWS Glue to always match the schema in the transaction log:
+ Lake Formation
+ Views
+ Row and column filters

If your workflow does not require any of this functionality, and you prefer not to maintain this compatibility, you can use `CREATE TABLE` DDL in Athena and then add the Amazon S3 path as a SerDe parameter in AWS Glue.

## Create a Delta Lake table using the Athena and AWS Glue consoles
<a name="delta-lake-tables-syncing-metadata-console"></a>

You can use the following procedure to create a Delta Lake table with the Athena and AWS Glue consoles.

**To create a Delta Lake table using the Athena and AWS Glue consoles**

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. In the Athena query editor, use the following DDL to create your Delta Lake table. Note that when using this method, the value for `TBLPROPERTIES` must be `'spark.sql.sources.provider' = 'delta'` and not `'table_type' = 'delta'`.

   Note that this same schema (with a single column named `col` of type `array<string>`) is created when you use Apache Spark (Athena for Apache Spark) or most other engines to create your table.

   ```
   CREATE EXTERNAL TABLE
      [db_name.]table_name(col array<string>)
      LOCATION 's3://amzn-s3-demo-bucket/your-folder/'
      TBLPROPERTIES ('spark.sql.sources.provider' = 'delta')
   ```

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the navigation pane, choose **Data Catalog**, **Tables**.

1. In the list of tables, choose the link for your table.

1. On the page for the table, choose **Actions**, **Edit table**.

1. In the **Serde parameters** section, add the key **path** with the value **s3://amzn-s3-demo-bucket/*your-folder*/**.

1. Choose **Save**.

## Create a Delta Lake table using the AWS CLI
<a name="delta-lake-tables-syncing-metadata-cli"></a>

To create a Delta Lake table using the AWS CLI, enter a command like the following.

```
aws glue create-table --database-name dbname \
    --table-input '{"Name" : "tablename", "StorageDescriptor":{
            "Columns" : [
                {
                    "Name": "col",
                    "Type": "array<string>"
                }
            ],
            "Location" : "s3://amzn-s3-demo-bucket/<prefix>/",
            "SerdeInfo" : {
                "Parameters" : {
                    "serialization.format" : "1",
                    "path" : "s3://amzn-s3-demo-bucket/<prefix>/"
                }
            }
        },
        "PartitionKeys": [],
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "EXTERNAL": "TRUE",
            "spark.sql.sources.provider": "delta"
        }
    }'
```

# Additional resources
<a name="delta-lake-tables-additional-resources"></a>

For a discussion of using Delta Lake tables with AWS Glue and querying them with Athena, see [Handle UPSERT data operations using open-source Delta Lake and AWS Glue](https://aws.amazon.com/blogs/big-data/handle-upsert-data-operations-using-open-source-delta-lake-and-aws-glue/) in the *AWS Big Data Blog*.

# Query Apache Hudi datasets
<a name="querying-hudi"></a>

[Apache Hudi](https://hudi.apache.org/) is an open-source data management framework that simplifies incremental data processing. Record-level insert, update, upsert, and delete actions are processed at a finer granularity, reducing overhead. `Upsert` refers to the ability to insert records into an existing dataset if they do not already exist or to update them if they do.

Hudi handles data insertion and update events without creating many small files that can cause performance issues for analytics. Apache Hudi automatically tracks changes and merges files so that they remain optimally sized. This avoids the need to build custom solutions that monitor and re-write many small files into fewer large files.

Hudi datasets are suitable for the following use cases:
+ Complying with privacy regulations like the [General Data Protection Regulation](https://en.wikipedia.org/wiki/General_Data_Protection_Regulation) (GDPR) and the [California Consumer Privacy Act](https://en.wikipedia.org/wiki/California_Consumer_Privacy_Act) (CCPA) that enforce people's right to remove personal information or change how their data is used.
+ Working with streaming data from sensors and other Internet of Things (IoT) devices that require specific data insertion and update events.
+ Implementing a [change data capture (CDC) system](https://en.wikipedia.org/wiki/Change_data_capture).

A Hudi dataset can be one of the following types:
+ **Copy on Write (CoW)** – Data is stored in a columnar format (Parquet), and each update creates a new version of files during a write.
+ **Merge on Read (MoR)** – Data is stored using a combination of columnar (Parquet) and row-based (Avro) formats. Updates are logged to row-based `delta` files and are compacted as needed to create new versions of the columnar files.

With CoW datasets, each time there is an update to a record, the file that contains the record is rewritten with the updated values. With a MoR dataset, each time there is an update, Hudi writes only the row for the changed record. MoR is better suited for write- or change-heavy workloads with fewer reads. CoW is better suited for read-heavy workloads on data that changes less frequently.

Hudi provides three query types for accessing the data:
+ **Snapshot queries** – Queries that see the latest snapshot of the table as of a given commit or compaction action. For MoR tables, snapshot queries expose the most recent state of the table by merging the base and delta files of the latest file slice at the time of the query. 
+ **Incremental queries** – Queries see only new data written to the table since a given commit/compaction. This effectively provides change streams that enable incremental data pipelines.
+ **Read optimized queries** – For MoR tables, queries see the latest data compacted. For CoW tables, queries see the latest data committed.

The following table shows the possible Hudi query types for each table type.


| Table type | Possible Hudi query types | 
| --- | --- | 
| Copy On Write | snapshot, incremental | 
| Merge On Read | snapshot, incremental, read optimized | 

For more information about the tradeoffs between table and query types, see [Table & Query Types](https://hudi.apache.org/docs/table_types/) in the Apache Hudi documentation.

## Hudi terminology change: Views are now queries
<a name="querying-hudi-hudi-dataset-table-types-terminology"></a>

Starting in Apache Hudi release version 0.5.1, what were formerly called views are now called queries. The following table summarizes the changes between the old and new terms.


| Old term | New term | 
| --- | --- | 
|  CoW: read optimized view; MoR: realtime view  |  Snapshot queries  | 
| Incremental view | Incremental query | 
| MoR read optimized view | Read optimized query | 

**Topics**
+ [Hudi terminology change: Views are now queries](#querying-hudi-hudi-dataset-table-types-terminology)
+ [Considerations and limitations](querying-hudi-in-athena-considerations-and-limitations.md)
+ [Copy on write (CoW) create table examples](querying-hudi-copy-on-write-create-table-examples.md)
+ [Merge on read (MoR) create table examples](querying-hudi-merge-on-read-create-table-examples.md)
+ [Use Hudi metadata for improved performance](querying-hudi-metadata-table.md)
+ [Additional resources](querying-hudi-additional-resources.md)

# Considerations and limitations
<a name="querying-hudi-in-athena-considerations-and-limitations"></a>

When you use Athena to read Apache Hudi tables, consider the following points.
+ **Read and write operations** – Athena can read compacted Hudi datasets but not write Hudi data.
+ **Hudi versions** – Athena supports Hudi version 0.14.0 (default) and 0.15.0. Athena cannot guarantee read compatibility with tables that are created with later versions of Hudi. For more information about Hudi features and versioning, see the [Hudi documentation](https://hudi.apache.org/) on the Apache website. Note that version 0.15.0 of the Hudi connector on Athena does not support bootstrapped tables. To use 0.15.0 of the Hudi connector, set the following table property:

  ```
  ALTER TABLE table_name SET TBLPROPERTIES ('athena_enable_native_hudi_connector_implementation' = 'true')
  ```
+ **Cross account queries** – Version 0.15.0 of the Hudi connector does not support cross account queries.
+ **Query types** – Currently, Athena supports snapshot queries and read optimized queries, but not incremental queries. On MoR tables, all data exposed to read optimized queries is compacted. This provides good performance but does not include the latest delta commits. Snapshot queries contain the freshest data but incur some computational overhead, which makes these queries less performant. For more information about the tradeoffs between table and query types, see [Table & Query Types](https://hudi.apache.org/docs/table_types/) in the Apache Hudi documentation.
+ **Incremental queries** – Athena does not support incremental queries.
+ **CTAS** – Athena does not support [CTAS](ctas.md) or [INSERT INTO](insert-into.md) on Hudi data. If you would like Athena support for writing Hudi datasets, send feedback to athena-feedback@amazon.com.

  For more information about writing Hudi data, see the following resources:
  + [Working with a Hudi dataset](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-hudi-work-with-dataset.html) in the [Amazon EMR Release Guide](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/).
  + [Writing Data](https://hudi.apache.org/docs/0.8.0/writing_data.html) in the Apache Hudi documentation.
+ **MSCK REPAIR TABLE** – Using `MSCK REPAIR TABLE` on Hudi tables in Athena is not supported. If you need to load a Hudi table not created in AWS Glue, use [ALTER TABLE ADD PARTITION](alter-table-add-partition.md).
+ **Skipping Amazon Glacier objects not supported** – If objects in the Apache Hudi table are in an Amazon Glacier storage class, setting the `read_restored_glacier_objects` table property to `false` has no effect.

  For example, suppose you issue the following command:

  ```
  ALTER TABLE table_name SET TBLPROPERTIES ('read_restored_glacier_objects' = 'false')
  ```

  For Iceberg and Delta Lake tables, the command produces the error `Unsupported table property key: read_restored_glacier_objects`. For Hudi tables, the `ALTER TABLE` command does not produce an error, but Amazon Glacier objects are still not skipped. Running `SELECT` queries after the `ALTER TABLE` command continues to return all objects.
+ **Timestamp queries** – Currently, queries that attempt to read timestamp columns in Hudi real time tables either fail or produce empty results. This limitation applies only to queries that read a timestamp column. Queries that include only non-timestamp columns from the same table succeed. 

  Failed queries return a message similar to the following: 

  `GENERIC_INTERNAL_ERROR: class org.apache.hadoop.io.ArrayWritable cannot be cast to class org.apache.hadoop.hive.serde2.io.TimestampWritableV2 (org.apache.hadoop.io.ArrayWritable and org.apache.hadoop.hive.serde2.io.TimestampWritableV2 are in unnamed module of loader io.trino.server.PluginClassLoader @75c67992)`
+ **Lake Formation Permissions on 0.15.0 Hudi Connector** – This limitation applies only when you opt in to using the native Hudi connector (version 0.15.0) by setting the table property `athena_enable_native_hudi_connector_implementation` to `true`. By default, Athena uses Hudi connector version 0.14.0, which does not require this additional permission. To query a Lake Formation protected table, you must grant Lake Formation permissions to both the table's data location and the `.hoodie` metadata directory. For example, if your Hudi table is located at `s3://bucket/hudi-table/`, you must register and grant permissions to both `s3://bucket/hudi-table/` and `s3://bucket/hudi-table/.hoodie/` in Lake Formation. The `.hoodie` directory contains metadata files (such as `hoodie.properties`) that Athena needs to read during query planning. Without permissions to the `.hoodie` directory, queries will fail with permission denied errors.

# Copy on write (CoW) create table examples
<a name="querying-hudi-copy-on-write-create-table-examples"></a>

If you have Hudi tables already created in AWS Glue, you can query them directly in Athena. When you create partitioned Hudi tables in Athena, you must run `ALTER TABLE ADD PARTITION` to load the Hudi data before you can query it.

## Nonpartitioned CoW table
<a name="querying-hudi-nonpartitioned-cow-table"></a>

The following example creates a nonpartitioned CoW table in Athena.

```
CREATE EXTERNAL TABLE `non_partition_cow`(
  `_hoodie_commit_time` string,
  `_hoodie_commit_seqno` string,
  `_hoodie_record_key` string,
  `_hoodie_partition_path` string,
  `_hoodie_file_name` string,
  `event_id` string,
  `event_time` string,
  `event_name` string,
  `event_guests` int,
  `event_type` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://amzn-s3-demo-bucket/folder/non_partition_cow/'
```

## Partitioned CoW table
<a name="querying-hudi-partitioned-cow-table"></a>

The following example creates a partitioned CoW table in Athena.

```
CREATE EXTERNAL TABLE `partition_cow`(
  `_hoodie_commit_time` string, 
  `_hoodie_commit_seqno` string, 
  `_hoodie_record_key` string, 
  `_hoodie_partition_path` string, 
  `_hoodie_file_name` string, 
  `event_id` string, 
  `event_time` string, 
  `event_name` string, 
  `event_guests` int)
PARTITIONED BY ( 
  `event_type` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
LOCATION
  's3://amzn-s3-demo-bucket/folder/partition_cow/'
```

The following `ALTER TABLE ADD PARTITION` example adds two partitions to the example `partition_cow` table.

```
ALTER TABLE partition_cow ADD
  PARTITION (event_type = 'one') LOCATION 's3://amzn-s3-demo-bucket/folder/partition_cow/one/' 
  PARTITION (event_type = 'two') LOCATION 's3://amzn-s3-demo-bucket/folder/partition_cow/two/'
```

# Merge on read (MoR) create table examples
<a name="querying-hudi-merge-on-read-create-table-examples"></a>

Hudi creates two tables in the metastore for MoR: a table for snapshot queries, and a table for read optimized queries. Both tables are queryable. In Hudi versions prior to 0.5.1, the table for read optimized queries had the name that you specified when you created the table. Starting in Hudi version 0.5.1, the table name is suffixed with `_ro` by default. The name of the table for snapshot queries is the name that you specified appended with `_rt`.
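For example, for a MoR table created with the name `events` on Hudi 0.5.1 or later, the metastore contains two queryable tables (names here are illustrative), each accessed with standard `SELECT` syntax:

```
-- Read optimized query: compacted columnar data only
SELECT * FROM events_ro LIMIT 10

-- Snapshot query: merges base files with the latest delta files
SELECT * FROM events_rt LIMIT 10
```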

## Nonpartitioned merge on read (MoR) table
<a name="querying-hudi-nonpartitioned-merge-on-read-table"></a>

The following example creates a nonpartitioned MoR table in Athena for read optimized queries. Note that read optimized queries use the input format `HoodieParquetInputFormat`.

```
CREATE EXTERNAL TABLE `nonpartition_mor`(
  `_hoodie_commit_time` string, 
  `_hoodie_commit_seqno` string, 
  `_hoodie_record_key` string, 
  `_hoodie_partition_path` string, 
  `_hoodie_file_name` string, 
  `event_id` string, 
  `event_time` string, 
  `event_name` string, 
  `event_guests` int, 
  `event_type` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
LOCATION
  's3://amzn-s3-demo-bucket/folder/nonpartition_mor/'
```

The following example creates a nonpartitioned MoR table in Athena for snapshot queries. For snapshot queries, use the input format `HoodieParquetRealtimeInputFormat`.

```
CREATE EXTERNAL TABLE `nonpartition_mor_rt`(
  `_hoodie_commit_time` string, 
  `_hoodie_commit_seqno` string, 
  `_hoodie_record_key` string, 
  `_hoodie_partition_path` string, 
  `_hoodie_file_name` string, 
  `event_id` string, 
  `event_time` string, 
  `event_name` string, 
  `event_guests` int, 
  `event_type` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' 
LOCATION
  's3://amzn-s3-demo-bucket/folder/nonpartition_mor/'
```

## Partitioned merge on read (MoR) table
<a name="querying-hudi-partitioned-merge-on-read-table"></a>

The following example creates a partitioned MoR table in Athena for read optimized queries.

```
CREATE EXTERNAL TABLE `partition_mor`(
  `_hoodie_commit_time` string, 
  `_hoodie_commit_seqno` string, 
  `_hoodie_record_key` string, 
  `_hoodie_partition_path` string, 
  `_hoodie_file_name` string, 
  `event_id` string, 
  `event_time` string, 
  `event_name` string, 
  `event_guests` int)
PARTITIONED BY ( 
  `event_type` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.HoodieParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://amzn-s3-demo-bucket/folder/partition_mor/'
```

The following `ALTER TABLE ADD PARTITION` example adds two partitions to the example `partition_mor` table.

```
ALTER TABLE partition_mor ADD
  PARTITION (event_type = 'one') LOCATION 's3://amzn-s3-demo-bucket/folder/partition_mor/one/'
  PARTITION (event_type = 'two') LOCATION 's3://amzn-s3-demo-bucket/folder/partition_mor/two/'
```

The following example creates a partitioned MoR table in Athena for snapshot queries.

```
CREATE EXTERNAL TABLE `partition_mor_rt`(
  `_hoodie_commit_time` string, 
  `_hoodie_commit_seqno` string, 
  `_hoodie_record_key` string, 
  `_hoodie_partition_path` string, 
  `_hoodie_file_name` string, 
  `event_id` string, 
  `event_time` string, 
  `event_name` string, 
  `event_guests` int)
PARTITIONED BY ( 
  `event_type` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT 
  'org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat'
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://amzn-s3-demo-bucket/folder/partition_mor/'
```

Similarly, the following `ALTER TABLE ADD PARTITION` example adds two partitions to the example `partition_mor_rt` table.

```
ALTER TABLE partition_mor_rt ADD
  PARTITION (event_type = 'one') LOCATION 's3://amzn-s3-demo-bucket/folder/partition_mor/one/'
  PARTITION (event_type = 'two') LOCATION 's3://amzn-s3-demo-bucket/folder/partition_mor/two/'
```

# Use Hudi metadata for improved performance
<a name="querying-hudi-metadata-table"></a>

Apache Hudi has a [metadata table](https://hudi.apache.org/docs/next/metadata/) that contains indexing features for improved performance, such as file listing, data skipping using column statistics, and a bloom filter based index.

Of these features, Athena currently supports only the file listing index. The file listing index eliminates file system calls such as "list files" by fetching the information from an index that maintains a partition-to-files mapping. This removes the need to recursively list every partition under the table path to get a view of the file system. When you work with large datasets, this indexing drastically reduces the latency that would otherwise occur when getting the list of files during writes and queries. It also avoids bottlenecks like request limit throttling on Amazon S3 `LIST` calls.

**Note**  
Athena does not support data skipping or bloom filter indexing at this time.

## Enabling the Hudi metadata table
<a name="querying-hudi-metadata-table-enabling-the-hudi-metadata-table"></a>

Metadata table based file listing is disabled by default. To enable the Hudi metadata table and the related file listing functionality, set the `hudi.metadata-listing-enabled` table property to `TRUE`.

**Example**  
The following `ALTER TABLE SET TBLPROPERTIES` example enables the metadata table on the example `partition_cow` table.

```
ALTER TABLE partition_cow SET TBLPROPERTIES('hudi.metadata-listing-enabled'='TRUE')
```

## Use bootstrap generated metadata
<a name="querying-hudi-hudi-dataset-table-types-bootstrap"></a>

Starting in Apache Hudi version 0.6.0, the bootstrap operation feature provides better performance with existing Parquet datasets. Instead of rewriting the dataset, a bootstrap operation can generate metadata only, leaving the dataset in place. 

You can use Athena to query tables from a bootstrap operation just like other tables based on data in Amazon S3. In your `CREATE TABLE` statement, specify the Hudi table path in your `LOCATION` clause. 
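As a sketch, a `CREATE TABLE` statement for a bootstrapped copy-on-write Hudi table might look like the following. The table, column, and bucket names are hypothetical, and the input format shown assumes a CoW table (a MoR table would use the realtime input format, as in the earlier `partition_mor_rt` example).

```
CREATE EXTERNAL TABLE bootstrap_table (
  `id` string,
  `value` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hudi.hadoop.HoodieParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://amzn-s3-demo-bucket/folder/bootstrap_table/'
```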

For more information about creating Hudi tables using the bootstrap operation in Amazon EMR, see the article [New features from Apache Hudi available in Amazon EMR](https://aws.amazon.com/blogs/big-data/new-features-from-apache-hudi-available-in-amazon-emr/) in the AWS Big Data Blog.

# Additional resources
<a name="querying-hudi-additional-resources"></a>

For more information about using Apache Hudi with Athena, see the following resources.

## Video
<a name="querying-hudi-videos"></a>

The following video shows how you can use Amazon Athena to query a read-optimized Apache Hudi dataset in your Amazon S3-based data lake.

[![AWS Videos](http://img.youtube.com/vi/TVcreqxBaGA/0.jpg)](http://www.youtube.com/watch?v=TVcreqxBaGA)


## Blog posts
<a name="querying-hudi-big-data-blogs"></a>

The following AWS Big Data Blog posts include descriptions of how you can use Apache Hudi with Athena.
+ [Use AWS Data Exchange to seamlessly share Apache Hudi datasets](https://aws.amazon.com/blogs/big-data/use-aws-data-exchange-to-seamlessly-share-apache-hudi-datasets/) 
+ [Create an Apache Hudi-based near-real-time transactional data lake using AWS DMS, Amazon Kinesis, AWS Glue streaming ETL, and data visualization using Amazon QuickSight](https://aws.amazon.com/blogs/big-data/create-an-apache-hudi-based-near-real-time-transactional-data-lake-using-aws-dms-amazon-kinesis-aws-glue-streaming-etl-and-data-visualization-using-amazon-quicksight/) 
+ For information about using AWS Glue custom connectors and AWS Glue 2.0 jobs to create an Apache Hudi table that you can query with Athena, see [Writing to Apache Hudi tables using AWS Glue custom connector](https://aws.amazon.com/blogs/big-data/writing-to-apache-hudi-tables-using-aws-glue-connector/).
+ For an article about using Apache Hudi, AWS Glue, and Amazon Athena to build a data processing framework for a data lake, see [Simplify operational data processing in data lakes using AWS Glue and Apache Hudi](https://aws.amazon.com/blogs/big-data/simplify-operational-data-processing-in-data-lakes-using-aws-glue-and-apache-hudi/).

# Query Apache Iceberg tables
<a name="querying-iceberg"></a>

You can use Athena to perform read, time travel, write, and DDL queries on Apache Iceberg tables.

[Apache Iceberg](https://iceberg.apache.org/) is an open table format for very large analytic datasets. Iceberg manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries. The Iceberg specification allows seamless table evolution such as schema and partition evolution and is designed for optimized usage on Amazon S3. Iceberg also helps guarantee data correctness under concurrent write scenarios.

For more information about Apache Iceberg, see [https://iceberg.apache.org/](https://iceberg.apache.org/).

## Considerations and limitations
<a name="querying-iceberg-considerations-and-limitations"></a>

Athena support for Iceberg tables has the following considerations and limitations:
+ **Iceberg version support** – Athena supports Apache Iceberg version 1.4.2. 
+ **Tables registered with Lake Formation** – Athena does not currently support DDL operations on Iceberg tables that are registered with Lake Formation. 
+ **Queries against information schema** – When querying the information schema of Iceberg tables, Athena uses S3 metadata as the source of truth for column metadata. This means that column information is derived from the underlying S3 files rather than from the catalog metadata. This behavior differs from other table formats where catalog metadata might be the primary source for column information.
+ **Tables with AWS Glue catalog only** – Athena supports only Iceberg tables created against the AWS Glue catalog based on specifications defined by the [open source glue catalog implementation](https://iceberg.apache.org/docs/latest/aws/#glue-catalog).
+ **Table locking support by AWS Glue only** – Unlike the open source Glue catalog implementation, which supports plug-in custom locking, Athena supports AWS Glue optimistic locking only. Using Athena to modify an Iceberg table that uses any other lock implementation can cause data loss and broken transactions.
+ **Supported file formats** – Athena engine version 3 supports the following Iceberg file formats.
  + Parquet
  + ORC
  + Avro
+ **Iceberg restricted metadata** – Lake Formation does not evaluate the Iceberg metadata tables. As a result, the metadata tables are restricted if the base table has any Lake Formation row or cell filters, or if you do not have permissions to view all columns in the base table. In these cases, queries against the `$partitions`, `$files`, `$manifests`, and `$snapshots` Iceberg metadata tables fail with an `AccessDeniedException` error. The metadata column `$path` has the same Lake Formation restrictions and fails when selected in a query. All other metadata tables can be queried regardless of the Lake Formation filters. For more information, see [Metadata tables](https://trino.io/docs/current/connector/iceberg.html#metadata-tables).
+ **Iceberg v2 tables** – Athena only creates and operates on Iceberg v2 tables. For the difference between v1 and v2 tables, see [Format version changes](https://iceberg.apache.org/spec/#appendix-e-format-version-changes) in the Apache Iceberg documentation.
+ **Display of time types without time zone** – The time and timestamp without time zone types are displayed in UTC. If the time zone is unspecified in a filter expression on a time column, UTC is used.
+ **Timestamp related data precision** – Although Iceberg supports microsecond precision for the timestamp data type, Athena supports only millisecond precision for timestamps in both reads and writes. For data in time related columns that is rewritten during manual compaction operations, Athena retains only millisecond precision.
+ **Unsupported operations** – The following Athena operations are not supported for Iceberg tables. 
  + [ALTER TABLE SET LOCATION](alter-table-set-location.md)
+ **Views** – Use `CREATE VIEW` to create Athena views as described in [Work with views](views.md). If you are interested in using the [Iceberg view specification](https://github.com/apache/iceberg/blob/master/format/view-spec.md) to create views, contact [athena-feedback@amazon.com](mailto:athena-feedback@amazon.com). 
+ **TTF management commands not supported in AWS Lake Formation** – Although you can use Lake Formation to manage read access permissions for transactional table formats (TTFs) like Apache Iceberg, Apache Hudi, and Linux Foundation Delta Lake, you cannot use Lake Formation to manage permissions for operations like `VACUUM`, `MERGE`, `UPDATE`, or `OPTIMIZE` with these table formats. For more information about Lake Formation integration with Athena, see [Using AWS Lake Formation with Amazon Athena](https://docs.aws.amazon.com/lake-formation/latest/dg/athena-lf.html) in the *AWS Lake Formation Developer Guide*.
+ **Partitioning by nested fields** – Partitioning by nested fields is not supported. Attempting to do so produces the message NOT\_SUPPORTED: Partitioning by nested field is unsupported: *column\_name*.*nested\_field\_name*.
+ **Skipping Amazon Glacier objects not supported** – If objects in the Apache Iceberg table are in an Amazon Glacier storage class, setting the `read_restored_glacier_objects` table property to `false` has no effect.

  For example, suppose you issue the following command:

  ```
  ALTER TABLE table_name SET TBLPROPERTIES ('read_restored_glacier_objects' = 'false')
  ```

  For Iceberg and Delta Lake tables, the command produces the error Unsupported table property key: read\_restored\_glacier\_objects. For Hudi tables, the `ALTER TABLE` command does not produce an error, but Amazon Glacier objects are still not skipped. Running `SELECT` queries after the `ALTER TABLE` command continues to return all objects.

If you would like Athena to support a particular feature, send feedback to [athena-feedback@amazon.com](mailto:athena-feedback@amazon.com).

**Topics**
+ [Considerations and limitations](#querying-iceberg-considerations-and-limitations)
+ [Create Iceberg tables](querying-iceberg-creating-tables.md)
+ [Query Iceberg table data](querying-iceberg-table-data.md)
+ [Perform time travel and version travel queries](querying-iceberg-time-travel-and-version-travel-queries.md)
+ [Update Iceberg table data](querying-iceberg-updating-iceberg-table-data.md)
+ [Manage Iceberg tables](querying-iceberg-managing-tables.md)
+ [Evolve Iceberg table schema](querying-iceberg-evolving-table-schema.md)
+ [Perform other DDL operations on Iceberg tables](querying-iceberg-additional-operations.md)
+ [Optimize Iceberg tables](querying-iceberg-data-optimization.md)
+ [Query AWS Glue Data Catalog materialized views](querying-iceberg-gdc-mv.md)
+ [Supported data types for Iceberg tables in Athena](querying-iceberg-supported-data-types.md)
+ [Additional resources](querying-iceberg-additional-resources.md)

# Create Iceberg tables
<a name="querying-iceberg-creating-tables"></a>

To create an Iceberg table for use in Athena, you can use a `CREATE TABLE` statement as documented on this page, or you can use an AWS Glue crawler.

## Use a CREATE TABLE statement
<a name="querying-iceberg-creating-tables-query-editor"></a>

Athena creates Iceberg v2 tables. For the difference between v1 and v2 tables, see [Format version changes](https://iceberg.apache.org/spec/#appendix-e-format-version-changes) in the Apache Iceberg documentation.

Athena `CREATE TABLE` creates an Iceberg table with no data. You can query a table from external systems such as Apache Spark directly if the table uses the [Iceberg open source glue catalog](https://iceberg.apache.org/docs/latest/aws/#glue-catalog). You do not have to create an external table.

**Warning**  
Running `CREATE EXTERNAL TABLE` results in the error message External keyword not supported for table type ICEBERG. 

To create an Iceberg table from Athena, set the `'table_type'` table property to `'ICEBERG'` in the `TBLPROPERTIES` clause, as in the following syntax summary.

```
CREATE TABLE
  [db_name.]table_name (col_name data_type [COMMENT col_comment] [, ...] )
  [PARTITIONED BY (col_name | transform, ... )]
  LOCATION 's3://amzn-s3-demo-bucket/your-folder/'
  TBLPROPERTIES ( 'table_type' ='ICEBERG' [, property_name=property_value] )
```

For information about the data types that you can query in Iceberg tables, see [Supported data types for Iceberg tables in Athena](querying-iceberg-supported-data-types.md).

### Use partitions
<a name="querying-iceberg-partitioning"></a>

To create Iceberg tables with partitions, use `PARTITIONED BY` syntax. Columns used for partitioning must be specified first in the column declarations. Do not include the column type in the `PARTITIONED BY` clause. You can also define [partition transforms](https://iceberg.apache.org/spec/#partition-transforms) in `CREATE TABLE` syntax. To specify multiple columns for partitioning, separate the columns with the comma (`,`) character, as in the following example.

```
CREATE TABLE iceberg_table (id bigint, data string, category string)
  PARTITIONED BY (category, bucket(16, id))
  LOCATION 's3://amzn-s3-demo-bucket/your-folder/'
  TBLPROPERTIES ( 'table_type' = 'ICEBERG' )
```

The following table shows the available partition transform functions.


****  

| Function | Description | Supported types | 
| --- | --- | --- | 
| year(ts) | Partition by year | date, timestamp | 
| month(ts) | Partition by month | date, timestamp | 
| day(ts)  | Partition by day | date, timestamp | 
| hour(ts) | Partition by hour | timestamp | 
| bucket(N, col) | Partition by hashed value mod N buckets. This is the same concept as hash bucketing for Hive tables. | int, long, decimal, date, timestamp, string, binary  | 
| truncate(L, col) | Partition by value truncated to L | int, long, decimal, string | 

Athena supports Iceberg's hidden partitioning. For more information, see [Iceberg's hidden partitioning](https://iceberg.apache.org/docs/latest/partitioning/#icebergs-hidden-partitioning) in the Apache Iceberg documentation.
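To build intuition for how the `bucket` and `truncate` transforms derive partition values, the following Python sketch mimics their behavior. This is illustrative only: Iceberg's actual `bucket` transform uses a 32-bit Murmur3 hash defined in the table spec, not the generic hash used here.

```python
import hashlib

def bucket(n, value):
    """Illustrative bucket transform: hash the value, then take mod N.
    (Iceberg itself uses a 32-bit Murmur3 hash defined by the spec.)"""
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % n

def truncate(width, value):
    """Truncate transform: ints round down to a multiple of width,
    strings are cut to the first `width` characters."""
    if isinstance(value, int):
        return value - (value % width)
    return value[:width]

# Rows with the same transform output land in the same partition.
print(truncate(10, 1234))              # 1230
print(truncate(3, "iceberg"))          # ice
print(bucket(16, 42) == bucket(16, 42))  # True; the transform is deterministic
```

Because the transform output, not the raw column value, determines the partition, queries can still filter on the original column and let Iceberg's hidden partitioning map predicates to partitions.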

### Specify table properties
<a name="querying-iceberg-table-properties"></a>

This section describes table properties that you can specify as key-value pairs in the `TBLPROPERTIES` clause of the `CREATE TABLE` statement. Athena allows only a predefined list of key-value pairs in the table properties for creating or altering Iceberg tables. The following tables show the table properties that you can specify. For more information about the compaction options, see [Optimize Iceberg tables](querying-iceberg-data-optimization.md) in this documentation. If you would like Athena to support a specific open source table configuration property, send feedback to [athena-feedback@amazon.com](mailto:athena-feedback@amazon.com). 

***format***


****  

|  |  | 
| --- |--- |
| Description | File data format | 
| Allowed property values | Supported file format and compression combinations vary by Athena engine version. For more information, see [Use Iceberg table compression](compression-support-iceberg.md). | 
| Default value | parquet | 

***write\_compression***


****  

|  |  | 
| --- |--- |
| Description | File compression codec | 
| Allowed property values | Supported file format and compression combinations vary by Athena engine version. For more information, see [Use Iceberg table compression](compression-support-iceberg.md). | 
| Default value |  Default write compression varies by Athena engine version. For more information, see [Use Iceberg table compression](compression-support-iceberg.md).  | 

***optimize\_rewrite\_data\_file\_threshold***


****  

|  |  | 
| --- |--- |
| Description | Data optimization specific configuration. If there are fewer data files that require optimization than the given threshold, the files are not rewritten. This allows the accumulation of more data files to produce files closer to the target size and skip unnecessary computation for cost saving. | 
| Allowed property values | A positive number. Must be less than 50. | 
| Default value | 5 | 

***optimize\_rewrite\_delete\_file\_threshold***


****  

|  |  | 
| --- |--- |
| Description | Data optimization specific configuration. If there are fewer delete files associated with a data file than the threshold, the data file is not rewritten. This allows the accumulation of more delete files for each data file for cost saving. | 
| Allowed property values | A positive number. Must be less than 50. | 
| Default value | 2 | 

***vacuum\_min\_snapshots\_to\_keep***


****  

|  |  | 
| --- |--- |
| Description |  Minimum number of snapshots to retain on a table's main branch. This value takes precedence over the `vacuum_max_snapshot_age_seconds` property. If the minimum remaining snapshots are older than the age specified by `vacuum_max_snapshot_age_seconds`, the snapshots are kept, and the value of `vacuum_max_snapshot_age_seconds` is ignored.  | 
| Allowed property values | A positive number. | 
| Default value | 1 | 

***vacuum\_max\_snapshot\_age\_seconds***


****  

|  |  | 
| --- |--- |
| Description | Maximum age of the snapshots to retain on the main branch. This value is ignored if the remaining minimum of snapshots specified by `vacuum_min_snapshots_to_keep` are older than the age specified. This table behavior property corresponds to the `history.expire.max-snapshot-age-ms` property in Apache Iceberg configuration. | 
| Allowed property values | A positive number. | 
| Default value | 432000 seconds (5 days) | 

***vacuum\_max\_metadata\_files\_to\_keep***


****  

|  |  | 
| --- |--- |
| Description | The maximum number of previous metadata files to retain on the table's main branch. | 
| Allowed property values | A positive number. | 
| Default value | 100 | 

***write\_data\_path\_enabled***


****  

|  |  | 
| --- |--- |
| Description | When set to true, the Iceberg table is created with the write.data.path property instead of the deprecated write.object-storage.path property. Use this option to ensure compatibility with Iceberg 1.9.0 and later, which no longer supports the deprecated property. | 
| Allowed property values | true, false | 
| Default value | false | 

### Example CREATE TABLE statement
<a name="querying-iceberg-example-create-table-statement"></a>

The following example creates an Iceberg table that has three columns.

```
CREATE TABLE iceberg_table (
  id int,
  data string,
  category string) 
PARTITIONED BY (category, bucket(16,id)) 
LOCATION 's3://amzn-s3-demo-bucket/iceberg-folder' 
TBLPROPERTIES (
  'table_type'='ICEBERG',
  'format'='parquet',
  'write_compression'='snappy',
  'optimize_rewrite_delete_file_threshold'='10'
)
```

## Use CREATE TABLE AS SELECT (CTAS)
<a name="querying-iceberg-creating-tables-ctas"></a>

For information about creating an Iceberg table using the `CREATE TABLE AS` statement, see [CREATE TABLE AS](create-table-as.md), with particular attention to the [CTAS table properties](create-table-as.md#ctas-table-properties) section.

## Use an AWS Glue crawler
<a name="querying-iceberg-creating-tables-crawler"></a>

You can use an AWS Glue crawler to automatically register your Iceberg tables into the AWS Glue Data Catalog. If you want to migrate from another Iceberg catalog, you can create and schedule an AWS Glue crawler and provide the Amazon S3 paths where the Iceberg tables are located. You can specify the maximum depth of the Amazon S3 paths that the AWS Glue crawler can traverse. After you schedule an AWS Glue crawler, the crawler extracts schema information and updates the AWS Glue Data Catalog with the schema changes every time it runs. The AWS Glue crawler supports schema merging across snapshots and updates the latest metadata file location in the AWS Glue Data Catalog. For more information, see [Data Catalog and crawlers in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html). 
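As a sketch, a crawler request for Iceberg tables might look like the following Python dictionary. The names, role ARN, and paths are hypothetical; the `IcebergTargets` field and its keys follow the AWS Glue `CreateCrawler` API. The boto3 call is shown in a comment so the snippet stays self-contained.

```python
# Illustrative AWS Glue crawler configuration for Iceberg tables.
# The names, role ARN, and S3 path below are hypothetical.
crawler_config = {
    "Name": "iceberg-table-crawler",
    "Role": "arn:aws:iam::111122223333:role/GlueCrawlerRole",
    "DatabaseName": "my_iceberg_db",
    "Targets": {
        "IcebergTargets": [
            {
                "Paths": ["s3://amzn-s3-demo-bucket/iceberg-tables/"],
                # Limit how deep the crawler traverses under each path.
                "MaximumTraversalDepth": 10,
            }
        ]
    },
    # Run daily so schema changes are picked up on a schedule.
    "Schedule": "cron(0 2 * * ? *)",
}

# To create the crawler, pass the configuration to the Glue client:
#   import boto3
#   boto3.client("glue").create_crawler(**crawler_config)
print(crawler_config["Targets"]["IcebergTargets"][0]["Paths"])
```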

# Query Iceberg table data
<a name="querying-iceberg-table-data"></a>

To query an Iceberg dataset, use a standard `SELECT` statement like the following. Queries follow the Apache Iceberg [format v2 spec](https://iceberg.apache.org/spec/#format-versioning) and perform merge-on-read of both position and equality deletes.

```
SELECT * FROM [db_name.]table_name [WHERE predicate]
```

To optimize query times, all predicates are pushed down to where the data lives.

For information about time travel and version travel queries, see [Perform time travel and version travel queries](querying-iceberg-time-travel-and-version-travel-queries.md).

## Create and query views with Iceberg tables
<a name="querying-iceberg-views"></a>

To create and query Athena views on Iceberg tables, use `CREATE VIEW` as described in [Work with views](views.md).

Example:

```
CREATE VIEW view1 AS SELECT * FROM iceberg_table
```

```
SELECT * FROM view1 
```

If you are interested in using the [Iceberg view specification](https://github.com/apache/iceberg/blob/master/format/view-spec.md) to create views, contact [athena-feedback@amazon.com](mailto:athena-feedback@amazon.com). 

## Query Iceberg table metadata
<a name="querying-iceberg-table-metadata"></a>

In a `SELECT` query, you can use the following properties after *table\_name* to query Iceberg table metadata:
+ **\$files** – Shows a table's current data files.
+ **\$manifests** – Shows a table's current file manifests.
+ **\$history** – Shows a table's history.
+ **\$partitions** – Shows a table's current partitions.
+ **\$snapshots** – Shows a table's snapshots.
+ **\$refs** – Shows a table's references.

### Examples
<a name="querying-iceberg-table-metadata-syntax"></a>

The following statement lists the files for an Iceberg table.

```
SELECT * FROM "dbname"."tablename$files"
```

The following statement lists the manifests for an Iceberg table.

```
SELECT * FROM "dbname"."tablename$manifests" 
```

The following statement shows the history for an Iceberg table.

```
SELECT * FROM "dbname"."tablename$history"
```

The following example shows the partitions for an Iceberg table.

```
SELECT * FROM "dbname"."tablename$partitions" 
```

The following example lists the snapshots for an Iceberg table.

```
SELECT * FROM "dbname"."tablename$snapshots" 
```

The following example shows the references for an Iceberg table.

```
SELECT * FROM "dbname"."tablename$refs" 
```

## Use Lake Formation fine-grained access control
<a name="querying-iceberg-working-with-lf-fgac"></a>

Athena engine version 3 supports Lake Formation fine-grained access control with Iceberg tables, including column level and row level security access control. This access control works with time travel queries and with tables that have performed schema evolution. For more information, see [Lake Formation fine-grained access control and Athena workgroups](lf-athena-limitations.md#lf-athena-limitations-fine-grained-access-control).

If you created your Iceberg table outside of Athena, use [Apache Iceberg SDK](https://iceberg.apache.org/releases/) version 0.13.0 or higher so that your Iceberg table column information is populated in the AWS Glue Data Catalog. If your Iceberg table does not contain column information in AWS Glue, you can use the Athena [ALTER TABLE SET TBLPROPERTIES](querying-iceberg-alter-table-set-properties.md) statement or the latest Iceberg SDK to fix the table and update the column information in AWS Glue. 
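For example, setting any allowed table property causes Athena to update the table metadata in AWS Glue; the following sketch uses a hypothetical table name and one of the allowed properties described in [Specify table properties](querying-iceberg-creating-tables.md#querying-iceberg-table-properties).

```
ALTER TABLE my_db.my_iceberg_table
  SET TBLPROPERTIES ('optimize_rewrite_data_file_threshold'='5')
```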

# Perform time travel and version travel queries
<a name="querying-iceberg-time-travel-and-version-travel-queries"></a>

Each Apache Iceberg table maintains a versioned manifest of the Amazon S3 objects that it contains. Previous versions of the manifest can be used for time travel and version travel queries.

Time travel queries in Athena query Amazon S3 for historical data from a consistent snapshot as of a specified date and time. Version travel queries in Athena query Amazon S3 for historical data as of a specified snapshot ID.

## Time travel queries
<a name="querying-iceberg-time-travel-queries"></a>

To run a time travel query, use `FOR TIMESTAMP AS OF timestamp` after the table name in the `SELECT` statement, as in the following example.

```
SELECT * FROM iceberg_table FOR TIMESTAMP AS OF timestamp
```

The system time to travel to is specified as either a timestamp or a timestamp with a time zone. If no time zone is specified, Athena treats the value as a timestamp in UTC.

The following example time travel queries select data for the specified date and time.

```
SELECT * FROM iceberg_table FOR TIMESTAMP AS OF TIMESTAMP '2020-01-01 10:00:00 UTC'
```

```
SELECT * FROM iceberg_table FOR TIMESTAMP AS OF (current_timestamp - interval '1' day)
```

## Version travel queries
<a name="querying-iceberg-version-travel-queries"></a>

To execute a version travel query (that is, view a consistent snapshot as of a specified version), use `FOR VERSION AS OF version` after the table name in the `SELECT` statement, as in the following example.

```
SELECT * FROM [db_name.]table_name FOR VERSION AS OF version         
```

The *version* parameter is the `bigint` snapshot ID associated with an Iceberg table version.

The following example version travel query selects data for the specified version.

```
SELECT * FROM iceberg_table FOR VERSION AS OF 949530903748831860
```

**Note**  
The `FOR SYSTEM_TIME AS OF` and `FOR SYSTEM_VERSION AS OF` clauses in Athena engine version 2 have been replaced by the `FOR TIMESTAMP AS OF` and `FOR VERSION AS OF` clauses in Athena engine version 3.

### Retrieve the snapshot ID
<a name="querying-iceberg-table-snapshot-id"></a>

You can use the Java [SnapshotUtil](https://iceberg.apache.org/javadoc/1.6.0/org/apache/iceberg/util/SnapshotUtil.html) class provided by Iceberg to retrieve the Iceberg snapshot ID, as in the following example.

```
import org.apache.iceberg.Table;
import org.apache.iceberg.aws.glue.GlueCatalog;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.util.SnapshotUtil;

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

Catalog catalog = new GlueCatalog();

Map<String, String> properties = new HashMap<String, String>();
properties.put("warehouse", "s3://amzn-s3-demo-bucket/my-folder");
catalog.initialize("my_catalog", properties);

Date date = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss").parse("2022/01/01 00:00:00");
long millis = date.getTime();

TableIdentifier name = TableIdentifier.of("db", "table");
Table table = catalog.loadTable(name);
long oldestSnapshotIdAfter2022 = SnapshotUtil.oldestAncestorAfter(table, millis);
```

## Combine time and version travel
<a name="querying-iceberg-combining-time-and-version-travel"></a>

You can use time travel and version travel syntax in the same query to specify different timing and versioning conditions, as in the following example.

```
SELECT table1.*, table2.* FROM 
  [db_name.]table_name FOR TIMESTAMP AS OF (current_timestamp - interval '1' day) AS table1 
  FULL JOIN 
  [db_name.]table_name FOR VERSION AS OF 5487432386996890161 AS table2 
  ON table1.ts = table2.ts 
  WHERE (table1.id IS NULL OR table2.id IS NULL)
```

# Update Iceberg table data
<a name="querying-iceberg-updating-iceberg-table-data"></a>

You can manage Iceberg table data directly on Athena by using `INSERT`, `UPDATE`, and `DELETE` queries. Each data management transaction produces a new snapshot, which can be queried using time travel. The `UPDATE` and `DELETE` statements follow the Iceberg format v2 row-level [position delete](https://iceberg.apache.org/spec/#position-delete-files) specification and enforce snapshot isolation.

**Note**  
Athena SQL does not currently support the copy-on-write approach. The `UPDATE`, `MERGE INTO`, and `DELETE FROM` operations always use the merge-on-read approach with positional deletes, regardless of the table properties specified. If you have set table properties such as `write.update.mode`, `write.merge.mode`, or `write.delete.mode` to use copy-on-write, your queries do not fail; Athena ignores those properties and continues to use merge-on-read. 

Use the following commands to perform data management operations on Iceberg tables.

**Topics**
+ [INSERT INTO](querying-iceberg-insert-into.md)
+ [DELETE](querying-iceberg-delete.md)
+ [UPDATE](querying-iceberg-update.md)
+ [MERGE INTO](querying-iceberg-merge-into.md)

# INSERT INTO
<a name="querying-iceberg-insert-into"></a>

Inserts data into an Iceberg table. Athena Iceberg `INSERT INTO` is charged the same as current `INSERT INTO` queries for external Hive tables by the amount of data scanned. To insert data into an Iceberg table, use the following syntax, where *query* can be either `VALUES (val1, val2, ...)` or `SELECT (col1, col2, …) FROM [db_name.]table_name WHERE predicate`. For SQL syntax and semantic details, see [INSERT INTO](insert-into.md).

```
INSERT INTO [db_name.]table_name [(col1, col2, …)] query
```

The following examples insert values into the table `iceberg_table`.

```
INSERT INTO iceberg_table VALUES (1,'a','c1')
```

```
INSERT INTO iceberg_table (col1, col2, ...) VALUES (val1, val2, ...)
```

```
INSERT INTO iceberg_table SELECT * FROM another_table
```

# DELETE
<a name="querying-iceberg-delete"></a>

Athena Iceberg `DELETE` writes Iceberg position delete files to a table. This is known as a merge-on-read delete. In contrast to a copy-on-write delete, a merge-on-read delete is more efficient because it does not rewrite file data. When Athena reads Iceberg data, it merges the Iceberg position delete files with data files to produce the latest view of a table. To remove these position delete files, you can run the [REWRITE DATA compaction action](querying-iceberg-data-optimization.md#querying-iceberg-data-optimization-rewrite-data-action). `DELETE` operations are charged by the amount of data scanned. For syntax, see [DELETE](delete-statement.md).

The following example deletes rows from `iceberg_table` that have `c3` as the value for `category`.

```
DELETE FROM iceberg_table WHERE category='c3'
```
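After many deletes accumulate, you can compact the position delete files away with the `OPTIMIZE` compaction action mentioned above. The following sketch compacts the files for the partition deleted from in the preceding example; see [Optimize Iceberg tables](querying-iceberg-data-optimization.md) for the full syntax.

```
OPTIMIZE iceberg_table REWRITE DATA USING BIN_PACK
  WHERE category = 'c3'
```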

# UPDATE
<a name="querying-iceberg-update"></a>

Athena Iceberg `UPDATE` writes Iceberg position delete files and newly updated rows as data files in the same transaction. `UPDATE` can be thought of as a combination of `INSERT INTO` and `DELETE`. `UPDATE` operations are charged by the amount of data scanned. For syntax, see [UPDATE](update-statement.md).

The following example updates the specified values in the table `iceberg_table`.

```
UPDATE iceberg_table SET category='c2' WHERE category='c1'
```

# MERGE INTO
<a name="querying-iceberg-merge-into"></a>

Conditionally updates, deletes, or inserts rows into an Iceberg table. A single statement can combine update, delete, and insert actions. For syntax, see [MERGE INTO](merge-into-statement.md).

**Note**  
`MERGE INTO` is transactional and is supported only for Apache Iceberg tables in Athena engine version 3.

The following example deletes all customers from table `t` that are in the source table `s`.

```
MERGE INTO accounts t USING monthly_accounts_update s
ON t.customer = s.customer
WHEN MATCHED
THEN DELETE
```

The following example updates target table `t` with customer information from source table `s`. For customer rows in table `t` that have matching customer rows in table `s`, the example increments the purchases in table t. If table `t` has no match for a customer row in table `s`, the example inserts the customer row from table `s` into table `t`.

```
MERGE INTO accounts t USING monthly_accounts_update s
    ON (t.customer = s.customer)
    WHEN MATCHED
        THEN UPDATE SET purchases = s.purchases + t.purchases
    WHEN NOT MATCHED
        THEN INSERT (customer, purchases, address)
              VALUES(s.customer, s.purchases, s.address)
```

The following example conditionally updates target table `t` with information from the source table `s`. The example deletes any matching target row for which the source address is Centreville. For all other matching rows, the example adds the source purchases and sets the target address to the source address. If there is no match in the target table, the example inserts the row from the source table.

```
MERGE INTO accounts t USING monthly_accounts_update s
    ON (t.customer = s.customer)
    WHEN MATCHED AND s.address = 'Centreville'
        THEN DELETE
    WHEN MATCHED
        THEN UPDATE
            SET purchases = s.purchases + t.purchases, address = s.address
    WHEN NOT MATCHED
        THEN INSERT (customer, purchases, address)
              VALUES(s.customer, s.purchases, s.address)
```

# Manage Iceberg tables
<a name="querying-iceberg-managing-tables"></a>

Athena supports the following table DDL operations for Iceberg tables.

**Topics**
+ [ALTER TABLE RENAME](querying-iceberg-alter-table-rename.md)
+ [ALTER TABLE SET TBLPROPERTIES](querying-iceberg-alter-table-set-properties.md)
+ [ALTER TABLE UNSET TBLPROPERTIES](querying-iceberg-alter-table-unset-properties.md)
+ [DESCRIBE](querying-iceberg-describe-table.md)
+ [DROP TABLE](querying-iceberg-drop-table.md)
+ [SHOW CREATE TABLE](querying-iceberg-show-create-table.md)
+ [SHOW TBLPROPERTIES](querying-iceberg-show-table-properties.md)

# ALTER TABLE RENAME
<a name="querying-iceberg-alter-table-rename"></a>

Renames a table.

Because the table metadata of an Iceberg table is stored in Amazon S3, you can update the database and table name of an Iceberg managed table without affecting underlying table information.

## Synopsis
<a name="querying-iceberg-alter-table-rename-synopsis"></a>

```
ALTER TABLE [db_name.]table_name RENAME TO [new_db_name.]new_table_name
```

## Example
<a name="querying-iceberg-alter-table-rename-example"></a>

```
ALTER TABLE my_db.my_table RENAME TO my_db2.my_table2
```

# ALTER TABLE SET TBLPROPERTIES
<a name="querying-iceberg-alter-table-set-properties"></a>

Adds properties to an Iceberg table and sets their assigned values.

In accordance with the [Iceberg specification](https://iceberg.apache.org/spec/#table-metadata-fields), table properties are stored in the Iceberg table metadata file rather than in AWS Glue. Athena does not accept custom table properties. For the allowed key-value pairs, see [Specify table properties](querying-iceberg-creating-tables.md#querying-iceberg-table-properties). You can also use `ALTER TABLE SET TBLPROPERTIES` and `ALTER TABLE UNSET TBLPROPERTIES` to set or remove the `write.data.path` and `write.object-storage.path` Iceberg table properties. If you would like Athena to support a specific open source table configuration property, send feedback to [athena-feedback@amazon.com](mailto:athena-feedback@amazon.com).

## Synopsis
<a name="querying-iceberg-alter-table-set-properties-synopsis"></a>

```
ALTER TABLE [db_name.]table_name SET TBLPROPERTIES ('property_name' = 'property_value' [ , ... ])
```

## Example
<a name="querying-iceberg-alter-table-set-properties-example"></a>

```
ALTER TABLE iceberg_table SET TBLPROPERTIES (
  'format'='parquet',
  'write_compression'='snappy',
  'optimize_rewrite_delete_file_threshold'='10'
)
```

The following example sets the `write.data.path` property on an existing Iceberg table.

```
ALTER TABLE iceberg_table SET TBLPROPERTIES (
  'write.data.path'='s3://amzn-s3-demo-bucket/your-folder/data'
)
```

# ALTER TABLE UNSET TBLPROPERTIES
<a name="querying-iceberg-alter-table-unset-properties"></a>

Drops existing properties from an Iceberg table.

## Synopsis
<a name="querying-iceberg-alter-table-unset-properties-synopsis"></a>

```
ALTER TABLE [db_name.]table_name UNSET TBLPROPERTIES ('property_name' [ , ... ])
```

## Example
<a name="querying-iceberg-alter-table-unset-properties-example"></a>

```
ALTER TABLE iceberg_table UNSET TBLPROPERTIES ('write_compression')
```

The following example removes the `write.data.path` property from an Iceberg table.

```
ALTER TABLE iceberg_table UNSET TBLPROPERTIES ('write.data.path')
```

# DESCRIBE
<a name="querying-iceberg-describe-table"></a>

Describes table information.

## Synopsis
<a name="querying-iceberg-describe-table-synopsis"></a>

```
DESCRIBE [FORMATTED] [db_name.]table_name
```

When the `FORMATTED` option is specified, the output displays additional information such as table location and properties.

## Example
<a name="querying-iceberg-describe-table-example"></a>

```
DESCRIBE iceberg_table
```

# DROP TABLE
<a name="querying-iceberg-drop-table"></a>

Drops an Iceberg table.

**Warning**  
Because Iceberg tables are considered managed tables in Athena, dropping an Iceberg table also removes all the data in the table.

## Synopsis
<a name="querying-iceberg-drop-table-synopsis"></a>

```
DROP TABLE [IF EXISTS] [db_name.]table_name
```

## Example
<a name="querying-iceberg-drop-table-example"></a>

```
DROP TABLE iceberg_table
```

# SHOW CREATE TABLE
<a name="querying-iceberg-show-create-table"></a>

Displays a `CREATE TABLE` DDL statement that can be used to recreate the Iceberg table in Athena. If Athena cannot reproduce the table structure (for example, because custom table properties are specified in the table), Athena returns an UNSUPPORTED error.

## Synopsis
<a name="querying-iceberg-show-create-table-synopsis"></a>

```
SHOW CREATE TABLE [db_name.]table_name
```

## Example
<a name="querying-iceberg-show-create-table-example"></a>

```
SHOW CREATE TABLE iceberg_table
```

# SHOW TBLPROPERTIES
<a name="querying-iceberg-show-table-properties"></a>

Shows one or more table properties of an Iceberg table. Only Athena-supported table properties are shown.

## Synopsis
<a name="querying-iceberg-show-table-properties-synopsis"></a>

```
SHOW TBLPROPERTIES [db_name.]table_name [('property_name')]
```

## Example
<a name="querying-iceberg-show-table-properties-example"></a>

```
SHOW TBLPROPERTIES iceberg_table
```

# Evolve Iceberg table schema
<a name="querying-iceberg-evolving-table-schema"></a>

Iceberg schema updates are metadata-only changes. No data files are changed when you perform a schema update. 

The Iceberg format supports the following schema evolution changes:
+ **Add** – Adds a new column to a table or to a nested `struct`.
+ **Drop** – Removes an existing column from a table or nested `struct`.
+ **Rename** – Renames an existing column or field in a nested `struct`.
+ **Reorder** – Changes the order of columns.
+  **Type promotion** – Widens the type of a column, `struct` field, `map` key, `map` value, or `list` element. Currently, the following cases are supported for Iceberg tables: 
  + integer to big integer
  + float to double
  + increasing the precision of a decimal type

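For example, the following statement widens a column from `int` to `bigint` by using `ALTER TABLE CHANGE COLUMN` (a sketch that assumes `iceberg_table` has an `id` column of type `int`). Because the change is metadata-only, no data files are rewritten.

```
ALTER TABLE iceberg_table CHANGE COLUMN id id bigint
```
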
You can use the DDL statements in this section to modify Iceberg table schema.

**Topics**
+ [ALTER TABLE ADD COLUMNS](querying-iceberg-alter-table-add-columns.md)
+ [ALTER TABLE DROP COLUMN](querying-iceberg-alter-table-drop-column.md)
+ [ALTER TABLE CHANGE COLUMN](querying-iceberg-alter-table-change-column.md)
+ [SHOW COLUMNS](querying-iceberg-show-columns.md)

# ALTER TABLE ADD COLUMNS
<a name="querying-iceberg-alter-table-add-columns"></a>

Adds one or more columns to an existing Iceberg table.

## Synopsis
<a name="querying-iceberg-alter-table-add-columns-synopsis"></a>

```
ALTER TABLE [db_name.]table_name ADD COLUMNS (col_name data_type [,...])
```

## Examples
<a name="querying-iceberg-alter-table-add-columns-example"></a>

The following example adds a `comment` column of type `string` to an Iceberg table.

```
ALTER TABLE iceberg_table ADD COLUMNS (comment string)
```

The following example adds a `point` column of type `struct` to an Iceberg table.

```
ALTER TABLE iceberg_table 
ADD COLUMNS (point struct<x: double, y: double>)
```

The following example adds a `points` column that is an array of structs to an Iceberg table.

```
ALTER TABLE iceberg_table 
ADD COLUMNS (points array<struct<x: double, y: double>>)
```

# ALTER TABLE DROP COLUMN
<a name="querying-iceberg-alter-table-drop-column"></a>

Drops a column from an existing Iceberg table.

## Synopsis
<a name="querying-iceberg-alter-table-drop-column-synopsis"></a>

```
ALTER TABLE [db_name.]table_name DROP COLUMN col_name
```

## Example
<a name="querying-iceberg-alter-table-drop-column-example"></a>

```
ALTER TABLE iceberg_table DROP COLUMN userid
```

# ALTER TABLE CHANGE COLUMN
<a name="querying-iceberg-alter-table-change-column"></a>

Changes the name, type, order, or comment of a column in an Iceberg table.

**Note**  
`ALTER TABLE REPLACE COLUMNS` is not supported for Iceberg tables because it removes all existing columns and then adds new ones. Use `CHANGE COLUMN` for schema evolution instead.

## Synopsis
<a name="querying-iceberg-alter-table-change-column-synopsis"></a>

```
ALTER TABLE [db_name.]table_name
  CHANGE [COLUMN] col_old_name col_new_name column_type 
  [COMMENT col_comment] [FIRST|AFTER column_name]
```

## Example
<a name="querying-iceberg-alter-table-change-column-example"></a>

```
ALTER TABLE iceberg_table CHANGE comment blog_comment string AFTER id
```

# SHOW COLUMNS
<a name="querying-iceberg-show-columns"></a>

Shows the columns in a table.

## Synopsis
<a name="querying-iceberg-show-columns-synopsis"></a>

```
SHOW COLUMNS (FROM|IN) [db_name.]table_name
```

## Example
<a name="querying-iceberg-show-columns-example"></a>

```
SHOW COLUMNS FROM iceberg_table
```

# Perform other DDL operations on Iceberg tables
<a name="querying-iceberg-additional-operations"></a>

In addition to the schema evolution operations described in [Evolve Iceberg table schema](querying-iceberg-evolving-table-schema.md), you can also perform the following DDL operations on Apache Iceberg tables in Athena.

## Database level operations
<a name="querying-iceberg-additional-operations-database-level-operations"></a>

When you use [DROP DATABASE](drop-database.md) with the `CASCADE` option, any Iceberg table data is also removed. The following DDL operations have no effect on Iceberg tables.
+ [CREATE DATABASE](create-database.md)
+ [ALTER DATABASE SET DBPROPERTIES](alter-database-set-dbproperties.md)
+ [SHOW DATABASES](show-databases.md)
+ [SHOW TABLES](show-tables.md)
+ [SHOW VIEWS](show-views.md)
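
When you use `DROP DATABASE` with the `CASCADE` option, Iceberg table data is removed along with the database. The following sketch assumes a database named `my_iceberg_db`:

```
DROP DATABASE my_iceberg_db CASCADE
```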

## Partition related operations
<a name="querying-iceberg-additional-operations-partition-related-operations"></a>

Because Iceberg tables use [hidden partitioning](https://iceberg.apache.org/docs/latest/partitioning/#icebergs-hidden-partitioning), you do not have to work with physical partitions directly. As a result, Iceberg tables in Athena do not support the following partition-related DDL operations:
+ [SHOW PARTITIONS](show-partitions.md)
+ [ALTER TABLE ADD PARTITION](alter-table-add-partition.md)
+ [ALTER TABLE DROP PARTITION](alter-table-drop-partition.md)
+ [ALTER TABLE RENAME PARTITION](alter-table-rename-partition.md)

If you would like to see Iceberg [partition evolution](https://iceberg.apache.org/docs/latest/evolution/#partition-evolution) in Athena, send feedback to [athena-feedback@amazon.com](mailto:athena-feedback@amazon.com).

## Unload Iceberg tables
<a name="querying-iceberg-additional-operations-unload-iceberg-table"></a>

Iceberg tables can be unloaded to files in a folder on Amazon S3. For information, see [UNLOAD](unload.md).
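
For example, the following query unloads the rows of `iceberg_table` to Parquet files in an Amazon S3 folder (`amzn-s3-demo-bucket` is a placeholder bucket name):

```
UNLOAD (SELECT * FROM iceberg_table)
TO 's3://amzn-s3-demo-bucket/unload-output/'
WITH (format = 'PARQUET')
```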

## MSCK REPAIR
<a name="querying-iceberg-additional-operations-msck-repair"></a>

Because Iceberg tables keep track of table layout information, running [MSCK REPAIR TABLE](msck-repair-table.md) as you would for Hive tables is neither necessary nor supported.

# Optimize Iceberg tables
<a name="querying-iceberg-data-optimization"></a>

Athena provides several optimization features to improve query performance on Apache Iceberg tables. As data accumulates, queries can become less efficient due to increased file processing overhead and the computational cost of applying row-level deletes stored in Iceberg delete files. To address these challenges, Athena supports manual compaction and vacuum operations to optimize table structure. Athena also works with Iceberg statistics to enable cost-based query optimization and Parquet column indexing for precise data pruning during query execution. These features work together to reduce query execution time, minimize data scanning, and lower costs. This topic describes how to use these optimization capabilities to maintain high-performance queries on your Iceberg tables.

## OPTIMIZE
<a name="querying-iceberg-data-optimization-rewrite-data-action"></a>

The `OPTIMIZE table REWRITE DATA` compaction action rewrites data files into a more optimized layout based on their size and number of associated delete files. For syntax and table property details, see [OPTIMIZE](optimize-statement.md).

### Example
<a name="querying-iceberg-data-optimization-example"></a>

The following example merges delete files into data files and produces files near the target file size for rows where the value of `category` is `c1`.

```
OPTIMIZE iceberg_table REWRITE DATA USING BIN_PACK
  WHERE category = 'c1'
```

## VACUUM
<a name="querying-iceberg-vacuum"></a>

`VACUUM` performs [snapshot expiration](https://iceberg.apache.org/docs/latest/spark-procedures/#expire_snapshots) and [orphan file removal](https://iceberg.apache.org/docs/latest/spark-procedures/#remove_orphan_files). These actions reduce metadata size and remove files not in the current table state that are also older than the retention period specified for the table. For syntax details, see [VACUUM](vacuum-statement.md).

### Example
<a name="querying-iceberg-vacuum-example"></a>

The following example uses a table property to configure the table `iceberg_table` to retain snapshots for the last three days (259,200 seconds), then uses `VACUUM` to expire the older snapshots and remove the orphan files from the table.

```
ALTER TABLE iceberg_table SET TBLPROPERTIES (
  'vacuum_max_snapshot_age_seconds'='259200'
)

VACUUM iceberg_table
```

## Use Iceberg table statistics
<a name="querying-iceberg-data-optimization-statistics"></a>

Athena's cost-based optimizer uses Iceberg statistics to produce optimal query plans. When statistics have been generated for your Iceberg tables, Athena automatically uses this information to make intelligent decisions about join ordering, filters, and aggregation behavior, often improving query performance and reducing costs.

Iceberg statistics are turned on by default when you use S3 Tables. For other Iceberg tables, Athena uses the table property `use_iceberg_statistics` to determine whether to leverage statistics for cost-based optimization. To get started, see [Optimizing query performance using column statistics](https://docs.aws.amazon.com/glue/latest/dg/column-statistics.html) in the *AWS Glue User Guide* or use the [Athena console](https://docs.aws.amazon.com/athena/latest/ug/cost-based-optimizer.html) to generate on-demand statistics on your Iceberg tables.
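
For an Iceberg table that is not an S3 table, the following sketch turns the property on; it assumes that Athena accepts `use_iceberg_statistics` through `ALTER TABLE SET TBLPROPERTIES`. If it does not, set the property in AWS Glue instead.

```
-- Assumption: Athena accepts this property in ALTER TABLE SET TBLPROPERTIES
ALTER TABLE iceberg_table SET TBLPROPERTIES (
  'use_iceberg_statistics'='true'
)
```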

## Use Parquet column indexing
<a name="querying-iceberg-data-optimization-parquet-column-indexing"></a>

Parquet column indexing makes it possible for Athena to perform more precise data pruning during query execution by leveraging page-level min/max statistics in addition to row group-level statistics. Athena can then skip unnecessary pages within row groups, significantly reducing the amount of data read from Amazon S3 and improving query performance. Column indexing works best for queries with selective filter predicates on sorted columns.

Athena uses Parquet column indexes by default with S3 Tables if column indexes are present in the underlying Parquet files. For other Iceberg tables, Athena uses the `use_iceberg_parquet_column_index` property to determine whether to utilize the column indexes in the Parquet file. Set this table property using the AWS Glue console or `UpdateTable` API.

# Query AWS Glue Data Catalog materialized views
<a name="querying-iceberg-gdc-mv"></a>

You can use Athena to query AWS Glue Data Catalog materialized views, which store pre-computed results of SQL queries as Apache Iceberg tables.

When you create Glue Data Catalog materialized views using Apache Spark in Amazon EMR or AWS Glue, the view definitions and metadata are stored in the AWS Glue Data Catalog. The pre-computed results are stored as Apache Iceberg tables in Amazon S3. You can query these materialized views from Athena using standard SQL `SELECT` statements, just as you would query regular Iceberg tables.

## Prerequisites
<a name="querying-iceberg-gdc-mv-prerequisites"></a>

Before you query materialized views in Athena, ensure the following:
+ The materialized view exists in the AWS Glue Data Catalog and was created using Apache Spark (Amazon EMR release 7.12.0 or later, or AWS Glue version 5.1 or later).
+ You have the following AWS Lake Formation permissions:
  + `SELECT` permission on the materialized view
  + `DESCRIBE` permission on the materialized view
  + Access to the underlying Amazon S3 location where the materialized view data is stored
+ The materialized view's underlying data is stored in Amazon S3 table buckets or Amazon S3 general purpose buckets.
+ You have access to the AWS Glue Data Catalog database that contains the materialized view.
+ For materialized views stored in Amazon S3 table buckets, your IAM role has the necessary permissions to access the S3 Tables catalog.

## Considerations and limitations
<a name="querying-iceberg-gdc-mv-considerations"></a>
+ Athena does not support the following operations on materialized views: `ALTER`, `CREATE MATERIALIZED VIEW`, `REFRESH MATERIALIZED VIEW`, `DROP`, `INSERT`, `UPDATE`, `MERGE`, `DELETE`, `OPTIMIZE`, `VACUUM`. To create materialized views, use Apache Spark in Amazon EMR or AWS Glue. Refresh operations must be performed through the AWS Glue Data Catalog API or Apache Spark. Modify materialized views using Apache Spark.

## Querying materialized views
<a name="querying-iceberg-gdc-mv-operations"></a>

Athena treats materialized views as standard Iceberg tables for read operations, allowing you to access the pre-computed data without requiring special syntax or configuration changes.

To query a materialized view in Athena, use standard `SELECT` statements:

```
SELECT * FROM my_database.sales_summary_mv;
```

You can apply filters, aggregations, and joins just as you would with regular tables:

```
SELECT
  region,
  SUM(total_sales) as sales_total
FROM my_database.sales_summary_mv
WHERE year = 2025
GROUP BY region
ORDER BY sales_total DESC;
```

## Supported operations
<a name="querying-iceberg-gdc-mv-supported"></a>

Athena supports the following operations on materialized views:
+ `SELECT` queries – Read data from materialized views using standard SQL `SELECT` statements
+ `DESCRIBE` – View the schema and metadata of materialized views
+ `SHOW TABLES` – List materialized views along with other tables in a database
+ `JOIN` operations – Join materialized views with other tables or views
+ Filtering and aggregation – Apply `WHERE` clauses, `GROUP BY`, and aggregate functions

# Supported data types for Iceberg tables in Athena
<a name="querying-iceberg-supported-data-types"></a>

Athena can query Iceberg tables that contain the following data types:

```
binary
boolean
date
decimal
double
float
int
list
long
map
string
struct
timestamp without time zone
```

For more information about Iceberg table types, see the [schemas page for Iceberg](https://iceberg.apache.org/docs/latest/schemas/) in the Apache documentation.

The following table shows the relationship between Athena data types and Iceberg table data types.


[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-supported-data-types.html)

For more information about data types in Athena, see [Data types in Amazon Athena](data-types.md).

# Additional resources
<a name="querying-iceberg-additional-resources"></a>

The following article is in the AWS Prescriptive Guidance documentation.
+ [Working with Apache Iceberg tables by using Amazon Athena SQL](https://docs.aws.amazon.com/prescriptive-guidance/latest/apache-iceberg-on-aws/iceberg-athena.html) 

For in-depth articles on using Athena with Apache Iceberg tables, see the following posts in the *AWS Big Data Blog*.
+ [Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena](https://aws.amazon.com/blogs/big-data/implement-a-serverless-cdc-process-with-apache-iceberg-using-amazon-dynamodb-and-amazon-athena/) 
+ [Accelerate data science feature engineering on transactional data lakes using Amazon Athena with Apache Iceberg](https://aws.amazon.com/blogs/big-data/accelerate-data-science-feature-engineering-on-transactional-data-lakes-using-amazon-athena-with-apache-iceberg/) 
+ [Build an Apache Iceberg data lake using Amazon Athena, Amazon EMR, and AWS Glue](https://aws.amazon.com/blogs/big-data/build-an-apache-iceberg-data-lake-using-amazon-athena-amazon-emr-and-aws-glue/) 
+ [Perform upserts in a data lake using Amazon Athena and Apache Iceberg](https://aws.amazon.com/blogs/big-data/perform-upserts-in-a-data-lake-using-amazon-athena-and-apache-iceberg/) 
+ [Build a transactional data lake using Apache Iceberg, AWS Glue, and cross-account data shares using AWS Lake Formation and Amazon Athena](https://aws.amazon.com/blogs/big-data/build-a-transactional-data-lake-using-apache-iceberg-aws-glue-and-cross-account-data-shares-using-aws-lake-formation-and-amazon-athena/) 
+ [Use Apache Iceberg in a data lake to support incremental data processing](https://aws.amazon.com/blogs/big-data/use-apache-iceberg-in-a-data-lake-to-support-incremental-data-processing/) 
+ [Build a real-time GDPR-aligned Apache Iceberg data lake](https://aws.amazon.com/blogs/big-data/build-a-real-time-gdpr-aligned-apache-iceberg-data-lake/) 
+ [Automate replication of relational sources into a transactional data lake with Apache Iceberg and AWS Glue](https://aws.amazon.com/blogs/big-data/automate-replication-of-relational-sources-into-a-transactional-data-lake-with-apache-iceberg-and-aws-glue/) 
+ [Interact with Apache Iceberg tables using Amazon Athena and cross account fine-grained permissions using AWS Lake Formation](https://aws.amazon.com/blogs/big-data/interact-with-apache-iceberg-tables-using-amazon-athena-and-cross-account-fine-grained-permissions-using-aws-lake-formation/) 
+ [Build a serverless transactional data lake with Apache Iceberg, Amazon EMR Serverless, and Amazon Athena](https://aws.amazon.com/blogs/big-data/build-a-serverless-transactional-data-lake-with-apache-iceberg-amazon-emr-serverless-and-amazon-athena/) 