

# Query Linux Foundation Delta Lake tables
<a name="delta-lake-tables"></a>

Linux Foundation [Delta Lake](https://delta.io/) is a table format for big data analytics. You can use Amazon Athena to read Delta Lake tables stored in Amazon S3 directly, without having to generate manifest files or run the `MSCK REPAIR TABLE` statement.

The Delta Lake format stores the minimum and maximum values per column of each data file. Athena uses this information to skip data files that cannot match a query's predicates, eliminating them from consideration.
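For example, a range predicate lets Athena compare the query's bounds against each file's stored minimum and maximum values for that column. The following hypothetical query illustrates the idea; the database, table, and column names are assumptions for illustration only:

```sql
-- Files whose stored min/max range for event_time falls entirely outside
-- this window can be skipped without being read. Names are illustrative.
SELECT *
FROM delta_db.events
WHERE event_time BETWEEN TIMESTAMP '2023-06-01 00:00:00'
                     AND TIMESTAMP '2023-06-02 00:00:00'
```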

## Considerations and limitations
<a name="delta-lake-tables-considerations-and-limitations"></a>

Delta Lake support in Athena has the following considerations and limitations:
+ **Tables with AWS Glue catalog only** – Native Delta Lake support is available only through tables registered with AWS Glue. If you have a Delta Lake table that is registered with another metastore, you can still keep it and treat it as your primary metastore. Because Delta Lake metadata is stored in the file system (for example, in Amazon S3) rather than in the metastore, Athena requires only the location property in AWS Glue to read from your Delta Lake tables.
+ **V3 engine only** – Delta Lake queries are supported only on Athena engine version 3. You must ensure that the workgroup you create is configured to use Athena engine version 3.
+ **No time travel support** – There is no support for queries that use Delta Lake’s time travel capabilities.
+ **Read only** – Write DML statements like `UPDATE`, `INSERT`, or `DELETE` are not supported.
+ **Lake Formation support** – Lake Formation integration is available for Delta Lake tables with their schema in sync with AWS Glue. For more information, see [Using AWS Lake Formation with Amazon Athena](https://docs.aws.amazon.com/lake-formation/latest/dg/athena-lf.html) and [Set up permissions for a Delta Lake table](https://docs.aws.amazon.com/lake-formation/latest/dg/set-up-delta-table.html) in the *AWS Lake Formation Developer Guide*.
+ **Limited DDL support** – The following DDL statements are supported: `CREATE EXTERNAL TABLE`, `SHOW COLUMNS`, `SHOW TBLPROPERTIES`, `SHOW PARTITIONS`, `SHOW CREATE TABLE`, and `DESCRIBE`. For information on using the `CREATE EXTERNAL TABLE` statement, see the [Get started with Delta Lake tables](delta-lake-tables-getting-started.md) section.
+ **Skipping Amazon Glacier objects not supported** – If objects in the Linux Foundation Delta Lake table are in an Amazon Glacier storage class, setting the `read_restored_glacier_objects` table property to `false` has no effect.

  For example, suppose you issue the following command:

  ```
  ALTER TABLE table_name SET TBLPROPERTIES ('read_restored_glacier_objects' = 'false')
  ```

  For Iceberg and Delta Lake tables, the command produces the error `Unsupported table property key: read_restored_glacier_objects`. For Hudi tables, the `ALTER TABLE` command does not produce an error, but Amazon Glacier objects are still not skipped. Running `SELECT` queries after the `ALTER TABLE` command continues to return all objects.
+ **Encrypted tables** – Athena does not natively support reading CSE-KMS encrypted Delta Lake tables. This applies to both `SELECT` statements and DDL statements.

### Delta Lake versioning and Athena
<a name="delta-lake-tables-versioning"></a>

Athena does not use the [versioning](https://docs.delta.io/latest/releases.html) listed in the Delta Lake documentation. To determine whether your Delta Lake tables are compatible with Athena, consider the following two characteristics:
+ **Reader version** – Every Delta Lake table has a reader version. Currently, this is a number between 1 and 3. Queries that include a table with a reader version that Athena does not support will fail.
+ **Table features** – Every Delta Lake table also can declare a set of reader/writer features. Because Athena's support of Delta Lake is read-only, table writer feature compatibility does not apply. However, queries on tables with unsupported table reader features will fail.

The following table shows the Delta Lake reader versions and Delta Lake table reader features that Athena supports.



| Query type | Supported reader versions | Supported reader features | 
| --- | --- | --- | 
| DQL (SELECT statements) | <= 3 | [Column mapping](https://docs.delta.io/latest/delta-column-mapping.html), [timestampNtz](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#timestamp-without-timezone-timestampntz), [deletion vectors](https://docs.delta.io/latest/delta-deletion-vectors.html) | 
| DDL | <= 1 | Not applicable. Reader features can be declared only on tables with a reader version of 2 or greater. | 
+ For a list of Delta Lake table features, see [Valid feature names in table features](https://github.com/delta-io/delta/blob/master/PROTOCOL.md#valid-feature-names-in-table-features) on GitHub.com.
+ For a list of Delta Lake features by protocol version, see [Features by protocol version](https://docs.delta.io/latest/versioning.html#features-by-protocol-version) in the Delta Lake documentation.

To create a Delta Lake table in Athena with a reader version greater than 1, see [Synchronize Delta Lake metadata](delta-lake-tables-syncing-metadata.md).

**Topics**
+ [Considerations and limitations](#delta-lake-tables-considerations-and-limitations)
+ [Supported column data types](delta-lake-tables-supported-data-types-columns.md)
+ [Get started with Delta Lake tables](delta-lake-tables-getting-started.md)
+ [Query Delta Lake tables with SQL](delta-lake-tables-querying.md)
+ [Synchronize Delta Lake metadata](delta-lake-tables-syncing-metadata.md)
+ [Additional resources](delta-lake-tables-additional-resources.md)

# Supported column data types
<a name="delta-lake-tables-supported-data-types-columns"></a>

This section describes the supported data types for non-partition and partition columns. 

## Supported non-partition column data types
<a name="delta-lake-tables-supported-data-types-non-partition-columns"></a>

For non-partition columns, all data types that Athena supports except `CHAR` are supported (`CHAR` is not supported in the Delta Lake protocol itself). Supported data types include:

```
boolean
tinyint
smallint
integer
bigint
double
float
decimal
varchar
string
binary
date
timestamp
array
map
struct
```

## Supported partition column data types
<a name="delta-lake-tables-supported-data-types-partition-columns"></a>

For partition columns, Athena supports tables with the following data types:

```
boolean
integer
smallint
tinyint
bigint
decimal
float
double
date
timestamp
varchar
```

For more information about the data types in Athena, see [Data types in Amazon Athena](data-types.md).

# Get started with Delta Lake tables
<a name="delta-lake-tables-getting-started"></a>

To be queryable, your Delta Lake table must exist in AWS Glue. If your table is in Amazon S3 but not in AWS Glue, run a `CREATE EXTERNAL TABLE` statement using the following syntax. If your table already exists in AWS Glue (for example, because you are using Apache Spark or another engine with AWS Glue), you can skip this step. Note the omission of column definitions, SerDe library, and other table properties. Unlike traditional Hive tables, Delta Lake table metadata is inferred from the Delta Lake transaction log and synchronized directly to AWS Glue.

```
CREATE EXTERNAL TABLE
  [db_name.]table_name
  LOCATION 's3://amzn-s3-demo-bucket/your-folder/'
  TBLPROPERTIES ('table_type' = 'DELTA')
```

**Note**  
This statement is not compatible with S3 buckets that have requester pays enabled. If you want to create a Delta Lake table against an S3 bucket with requester pays enabled, follow the instructions and DDL statement in [Synchronize Delta Lake metadata](delta-lake-tables-syncing-metadata.md).
For Delta Lake tables, `CREATE TABLE` statements that specify anything more than the `LOCATION` and the `table_type` property are not allowed.
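
The following hypothetical example registers an existing Delta Lake table and is a sketch only; the database, table, and bucket names are assumptions for illustration:

```sql
-- Register an existing Delta Lake table in AWS Glue; the schema is
-- inferred from the Delta Lake transaction log. Names are illustrative.
CREATE EXTERNAL TABLE delta_db.sales_delta
LOCATION 's3://amzn-s3-demo-bucket/sales/'
TBLPROPERTIES ('table_type' = 'DELTA')
```

After the statement succeeds, you can query the table like any other Athena table, for example with `SELECT * FROM delta_db.sales_delta LIMIT 10`.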

# Query Delta Lake tables with SQL
<a name="delta-lake-tables-querying"></a>

To query a Delta Lake table, use standard SQL `SELECT` syntax:

```
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expression [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST] [, ...] ]
[ OFFSET count [ ROW | ROWS ] ]
[ LIMIT [ count | ALL ] ]
```

For more information about `SELECT` syntax, see [SELECT](select.md) in the Athena documentation.

The Delta Lake format stores the minimum and maximum values per column of each data file. Athena uses this information to skip data files that cannot match a query's predicates, eliminating them from consideration.
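
As an illustration, the following query combines several of the optional clauses above against a hypothetical Delta Lake table; the table and column names are assumptions:

```sql
-- Aggregate over a Delta Lake table. The WHERE predicate on sale_date
-- can also benefit from file skipping. Names are illustrative only.
SELECT product_id, SUM(quantity) AS total_sold
FROM delta_db.sales_delta
WHERE sale_date >= DATE '2023-01-01'
GROUP BY product_id
ORDER BY total_sold DESC
LIMIT 10
```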

# Synchronize Delta Lake metadata
<a name="delta-lake-tables-syncing-metadata"></a>

Athena synchronizes table metadata, including schema, partition columns, and table properties, to AWS Glue if you use Athena to create your Delta Lake table. As time passes, this metadata can lose its synchronization with the underlying table metadata in the transaction log. To keep your table up to date, you can choose one of the following options:
+ Use the AWS Glue crawler for Delta Lake tables. For more information, see [Introducing native Delta Lake table support with AWS Glue crawlers](https://aws.amazon.com/blogs/big-data/introducing-native-delta-lake-table-support-with-aws-glue-crawlers/) in the *AWS Big Data Blog* and [Scheduling an AWS Glue crawler](https://docs.aws.amazon.com/glue/latest/dg/schedule-crawler.html) in the AWS Glue Developer Guide.
+ Drop and recreate the table in Athena.
+ Use the SDK, CLI, or AWS Glue console to manually update the schema in AWS Glue.

Note that the following features require the schema in AWS Glue to always match the schema in the transaction log:
+ Lake Formation
+ Views
+ Row and column filters

If your workflow does not require any of this functionality, and you prefer not to maintain this compatibility, you can use `CREATE TABLE` DDL in Athena and then add the Amazon S3 path as a SerDe parameter in AWS Glue.
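
As a sketch of the drop-and-recreate option, the following statements (run one at a time, with illustrative names) remove the stale registration and let Athena re-infer the current schema from the transaction log:

```sql
-- Drop the out-of-date registration, then re-register the table so the
-- schema is inferred again from the Delta Lake transaction log.
DROP TABLE delta_db.sales_delta;

CREATE EXTERNAL TABLE delta_db.sales_delta
LOCATION 's3://amzn-s3-demo-bucket/sales/'
TBLPROPERTIES ('table_type' = 'DELTA')
```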

## Create a Delta Lake table using the Athena and AWS Glue consoles
<a name="delta-lake-tables-syncing-metadata-console"></a>

You can use the following procedure to create a Delta Lake table with the Athena and AWS Glue consoles.

**To create a Delta Lake table using the Athena and AWS Glue consoles**

1. Open the Athena console at [https://console.aws.amazon.com/athena/](https://console.aws.amazon.com/athena/home).

1. In the Athena query editor, use the following DDL to create your Delta Lake table. Note that when using this method, the value for `TBLPROPERTIES` must be `'spark.sql.sources.provider' = 'delta'` and not `'table_type' = 'delta'`.

   Note that the same schema (a single column named `col` of type `array<string>`) is created when you use Apache Spark (Athena for Apache Spark) or most other engines to create your table.

   ```
   CREATE EXTERNAL TABLE
      [db_name.]table_name(col array<string>)
      LOCATION 's3://amzn-s3-demo-bucket/your-folder/'
      TBLPROPERTIES ('spark.sql.sources.provider' = 'delta')
   ```

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the navigation pane, choose **Data Catalog**, **Tables**.

1. In the list of tables, choose the link for your table.

1. On the page for the table, choose **Actions**, **Edit table**.

1. In the **Serde parameters** section, add the key **path** with the value **s3://amzn-s3-demo-bucket/*your-folder*/**.

1. Choose **Save**.

## Create a Delta Lake table using the AWS CLI
<a name="delta-lake-tables-syncing-metadata-cli"></a>

To create a Delta Lake table using the AWS CLI, enter a command like the following.

```
aws glue create-table --database-name dbname \
    --table-input '{"Name" : "tablename", "StorageDescriptor":{
            "Columns" : [
                {
                    "Name": "col",
                    "Type": "array<string>"
                }
            ],
            "Location" : "s3://amzn-s3-demo-bucket/<prefix>/",
            "SerdeInfo" : {
                "Parameters" : {
                    "serialization.format" : "1",
                    "path" : "s3://amzn-s3-demo-bucket/<prefix>/"
                }
            }
        },
        "PartitionKeys": [],
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {
            "EXTERNAL": "TRUE",
            "spark.sql.sources.provider": "delta"
        }
    }'
```
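
After the table is created in AWS Glue, you can confirm the registration from the Athena query editor, for example by inspecting the table properties (the names below match the CLI example; adjust them to your own):

```sql
-- Verify the AWS Glue registration from Athena
SHOW TBLPROPERTIES dbname.tablename
```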

# Additional resources
<a name="delta-lake-tables-additional-resources"></a>

For a discussion of using Delta Lake tables with AWS Glue and querying them with Athena, see [Handle UPSERT data operations using open-source Delta Lake and AWS Glue](https://aws.amazon.com/blogs/big-data/handle-upsert-data-operations-using-open-source-delta-lake-and-aws-glue/) in the *AWS Big Data Blog*.