

# DML queries, functions, and operators
<a name="dml-queries-functions-operators"></a>

The Athena DML query engine generally supports Trino and Presto syntax and adds its own improvements. Athena does not support all Trino or Presto features. For more information, see the topics for specific statements in this section and [Considerations and limitations](other-notable-limitations.md). For information about functions, see [Functions in Amazon Athena](functions.md). For information about Athena engine versions, see [Athena engine versioning](engine-versions.md). 

For information about DDL statements, see [DDL statements](ddl-reference.md). For a list of unsupported DDL statements, see [Unsupported DDL](unsupported-ddl.md).

**Topics**
+ [SELECT](select.md)
+ [INSERT INTO](insert-into.md)
+ [VALUES](values-statement.md)
+ [DELETE](delete-statement.md)
+ [UPDATE](update-statement.md)
+ [MERGE INTO](merge-into-statement.md)
+ [OPTIMIZE](optimize-statement.md)
+ [VACUUM](vacuum-statement.md)
+ [EXPLAIN and EXPLAIN ANALYZE](athena-explain-statement.md)
+ [PREPARE](sql-prepare.md)
+ [UNLOAD](unload.md)
+ [Functions](functions.md)
+ [Use supported time zones](athena-supported-time-zones.md)

# SELECT
<a name="select"></a>

Retrieves rows of data from zero or more tables.

**Note**  
This topic provides summary information for reference. Comprehensive information about using `SELECT` and the SQL language is beyond the scope of this documentation. For information about using SQL that is specific to Athena, see [Considerations and limitations for SQL queries in Amazon Athena](other-notable-limitations.md) and [Run SQL queries in Amazon Athena](querying-athena-tables.md). For an example of creating a database, creating a table, and running a `SELECT` query on the table in Athena, see [Get started](getting-started.md).

## Synopsis
<a name="synopsis"></a>

```
[ WITH with_query [, ...] ]
SELECT [ ALL | DISTINCT ] select_expression [, ...]
[ FROM from_item [, ...] ]
[ WHERE condition ]
[ GROUP BY [ ALL | DISTINCT ] grouping_element [, ...] ]
[ HAVING condition ]
[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] select ]
[ ORDER BY expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST] [, ...] ]
[ OFFSET count [ ROW | ROWS ] ]
[ LIMIT [ count | ALL ] ]
```

**Note**  
Reserved words in SQL SELECT statements must be enclosed in double quotes. For more information, see [Reserved keywords to escape in SQL SELECT statements](reserved-words.md#list-of-reserved-words-sql-select).

## Parameters
<a name="select-parameters"></a>

**[ WITH with_query [, ...] ]**  
You can use `WITH` to flatten nested queries, or to simplify subqueries.  
Using the `WITH` clause to create recursive queries is supported starting in Athena engine version 3. The maximum recursion depth is 10.  
The `WITH` clause precedes the `SELECT` list in a query and defines one or more subqueries for use within the `SELECT` query.   
Each subquery defines a temporary table, similar to a view definition, which you can reference in the `FROM` clause. The tables are used only when the query runs.   
`with_query` syntax is:  

```
subquery_table_name [ ( column_name [, ...] ) ] AS (subquery)
```
Where:  
+  `subquery_table_name` is a unique name for a temporary table that defines the results of the `WITH` clause subquery. Each `subquery` must have a table name that can be referenced in the `FROM` clause.
+  `column_name [, ...]` is an optional list of output column names. The number of column names must be equal to or less than the number of columns defined by `subquery`.
+  `subquery` is any query statement.
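As an illustrative sketch, the following query uses `WITH` to define a subquery and then references it in the `FROM` clause. The table and column names are hypothetical.

```
WITH daily_totals (order_date, total) AS (
    SELECT order_date, SUM(amount)
    FROM "mydatabase"."orders"
    GROUP BY order_date
)
SELECT order_date, total
FROM daily_totals
WHERE total > 1000
```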

**[ ALL | DISTINCT ] select_expression**  
 `select_expression` determines the rows to be selected. A `select_expression` can use one of the following formats:  

```
expression [ [ AS ] column_alias ] [, ...]
```

```
row_expression.* [ AS ( column_alias [, ...] ) ]
```

```
relation.*
```

```
*
```
+ The `expression [ [ AS ] column_alias ]` syntax specifies an output column. The optional `[AS] column_alias` syntax specifies a custom heading name to be used for the column in the output.
+ For `row_expression.* [ AS ( column_alias [, ...] ) ]`, `row_expression` is an arbitrary expression of data type `ROW`. The fields of the row define the output columns to be included in the result.
+ For `relation.*`, the columns of `relation` are included in the result. This syntax does not permit the use of column aliases.
+ The asterisk `*` specifies that all columns be included in the result set.
+ In the result set, the order of columns is the same as the order of their specification by the select expression. If a select expression returns multiple columns, the column order follows the order used in the source relation or row type expression.
+ When column aliases are specified, the aliases override preexisting column or row field names. If the select expression does not have column names, zero-indexed anonymous column names (`_col0`, `_col1`, `_col2, ...`) are displayed in the output.
+  `ALL` is the default. Using `ALL` is treated the same as if it were omitted; all rows for all columns are selected and duplicates are kept.
+ Use `DISTINCT` to return only distinct values when a column contains duplicate values.
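For example, the following query (a sketch that reuses the `"webdata"."impressions"` table from the examples later in this topic) combines `DISTINCT` with a column alias.

```
SELECT DISTINCT referrer AS source_site
FROM "webdata"."impressions"
```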

**FROM from_item [, ...]**  
Indicates the input to the query, where `from_item` can be a view, a join construct, or a subquery as described below.  
The `from_item` can be either:  
+  `table_name [ [ AS ] alias [ (column_alias [, ...]) ] ]` 

  Where `table_name` is the name of the target table from which to select rows, `alias` is the name to give the output of the `SELECT` statement, and `column_alias` defines the columns for the `alias` specified.
 **-OR-**   
+  `join_type from_item [ ON join_condition | USING ( join_column [, ...] ) ]` 

  Where `join_type` is one of:
  +  `[ INNER ] JOIN` 
  +  `LEFT [ OUTER ] JOIN` 
  +  `RIGHT [ OUTER ] JOIN` 
  +  `FULL [ OUTER ] JOIN` 
  +  `CROSS JOIN` 
  +  `ON join_condition | USING (join_column [, ...])` Where using `join_condition` allows you to specify column names for join keys in multiple tables, and using `join_column` requires `join_column` to exist in both tables.
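As a sketch with hypothetical table and column names, the following query joins an impressions table to a users table with an `ON` condition. If the join key had the same name in both tables, `USING (user_id)` could replace the `ON` clause.

```
SELECT i.referrer, u.username
FROM "webdata"."impressions" i
LEFT JOIN "webdata"."users" u
    ON i.user_id = u.user_id
```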

**[ WHERE condition ]**  
Filters results according to the `condition` you specify, where `condition` generally has the following syntax.  

```
column_name operator value [[[AND | OR] column_name operator value] ...]
```
The *operator* can be one of the comparators `=`, `>`, `<`, `>=`, `<=`, `<>`, `!=`.   
The following subquery expressions can also be used in the `WHERE` clause.  
+ `[NOT] BETWEEN integer_A AND integer_B` – Specifies a range between two integers, as in the following example. If the column data type is `varchar`, the column must be cast to integer first.

  ```
  SELECT DISTINCT processid FROM "webdata"."impressions"
  WHERE cast(processid as int) BETWEEN 1500 and 1800
  ORDER BY processid
  ```
+ `[NOT] LIKE value` – Searches for the pattern specified. Use the percent sign (`%`) as a wildcard character, as in the following example.

  ```
  SELECT * FROM "webdata"."impressions"
  WHERE referrer LIKE '%.org'
  ```
+ `[NOT] IN (value[, value[, ...]])` – Specifies a list of possible values for a column, as in the following example.

  ```
  SELECT * FROM "webdata"."impressions"
  WHERE referrer IN ('example.com','example.net','example.org')
  ```

**[ GROUP BY [ ALL | DISTINCT ] grouping_expressions [, ...] ]**  
Divides the output of the `SELECT` statement into rows with matching values.  
 `ALL` and `DISTINCT` determine whether duplicate grouping sets each produce distinct output rows. If omitted, `ALL` is assumed.   
`grouping_expressions` allow you to perform complex grouping operations. You can use complex grouping operations to perform analysis that requires aggregation on multiple sets of columns in a single query.  
The `grouping_expressions` element can be any function, such as `SUM`, `AVG`, or `COUNT`, performed on input columns.   
`GROUP BY` expressions can group output by input column names that don't appear in the output of the `SELECT` statement.   
All output expressions must be either aggregate functions or columns present in the `GROUP BY` clause.   
You can use a single query to perform analysis that requires aggregating multiple column sets.   
Athena supports complex aggregations using `GROUPING SETS`, `CUBE` and `ROLLUP`. `GROUP BY GROUPING SETS` specifies multiple lists of columns to group on. `GROUP BY CUBE` generates all possible grouping sets for a given set of columns. `GROUP BY ROLLUP` generates all possible subtotals for a given set of columns. Complex grouping operations do not support grouping on expressions composed of input columns. Only column names are allowed.   
You can often use `UNION ALL` to achieve the same results as these `GROUP BY` operations, but queries that use `GROUP BY` have the advantage of reading the data one time, whereas the equivalent `UNION ALL` query reads the underlying data once per `SELECT` and may produce inconsistent results when the data source is subject to change.
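The following sketch, with a hypothetical `shipping` table, uses `GROUPING SETS` to aggregate by origin state, by origin and destination state, and overall, in a single pass over the data.

```
SELECT origin_state, destination_state, SUM(package_weight)
FROM "mydatabase"."shipping"
GROUP BY GROUPING SETS (
    (origin_state),
    (origin_state, destination_state),
    ())
```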

**[ HAVING condition ]**  
Used with aggregate functions and the `GROUP BY` clause. Controls which groups are selected, eliminating groups that don't satisfy `condition`. This filtering occurs after groups and aggregates are computed.
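For example, the following sketch (reusing the `"webdata"."impressions"` table from earlier examples) keeps only referrers that appear more than 100 times.

```
SELECT referrer, COUNT(*) AS hits
FROM "webdata"."impressions"
GROUP BY referrer
HAVING COUNT(*) > 100
```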

**[ { UNION | INTERSECT | EXCEPT } [ ALL | DISTINCT ] union_query ]**  
`UNION`, `INTERSECT`, and `EXCEPT` combine the results of more than one `SELECT` statement into a single query. `ALL` or `DISTINCT` control the uniqueness of the rows included in the final result set.   
`UNION` combines the rows resulting from the first query with the rows resulting from the second query. To eliminate duplicates, `UNION` builds a hash table, which consumes memory. For better performance, consider using `UNION ALL` if your query does not require the elimination of duplicates. Multiple `UNION` clauses are processed left to right unless you use parentheses to explicitly define the order of processing.  
`INTERSECT` returns only the rows that are present in the results of both the first and the second queries.  
`EXCEPT` returns the rows from the results of the first query, excluding the rows found by the second query.  
`ALL` causes all rows to be included, even if the rows are identical.  
`DISTINCT` causes only unique rows to be included in the combined result set.
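As a sketch with hypothetical table names, the following query concatenates the rows of two tables without deduplication, avoiding the hash table that plain `UNION` would build.

```
SELECT referrer FROM "webdata"."impressions_2023"
UNION ALL
SELECT referrer FROM "webdata"."impressions_2024"
```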

**[ ORDER BY expression [ ASC | DESC ] [ NULLS FIRST | NULLS LAST ] [, ...] ]**  
Sorts a result set by one or more output `expression`.   
When the clause contains multiple expressions, the result set is sorted according to the first `expression`. Then the second `expression` is applied to rows that have matching values from the first expression, and so on.   
Each `expression` may specify output columns from `SELECT` or an ordinal number for an output column by position, starting at one.  
`ORDER BY` is evaluated as the last step after any `GROUP BY` or `HAVING` clause. `ASC` and `DESC` determine whether results are sorted in ascending or descending order. The default sorting order is ascending (`ASC`). The default null ordering is `NULLS LAST`, regardless of ascending or descending sort order.
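For example, the following sketch (the `requesttime` column is hypothetical) sorts primarily by request time in descending order with nulls last, and breaks ties by referrer in ascending order.

```
SELECT referrer, requesttime
FROM "webdata"."impressions"
ORDER BY requesttime DESC NULLS LAST, referrer ASC
```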

**[ OFFSET count [ ROW | ROWS ] ]**  
Use the `OFFSET` clause to discard a number of leading rows from the result set. If the `ORDER BY` clause is present, the `OFFSET` clause is evaluated over a sorted result set, and the set remains sorted after the skipped rows are discarded. If the query has no `ORDER BY` clause, it is arbitrary which rows are discarded. If the count specified by `OFFSET` equals or exceeds the size of the result set, the final result is empty. 

**LIMIT [ count | ALL ]**  
Restricts the number of rows in the result set to `count`. `LIMIT ALL` is the same as omitting the `LIMIT` clause. If the query has no `ORDER BY` clause, the results are arbitrary.
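`OFFSET` and `LIMIT` are often combined with `ORDER BY` for deterministic pagination, as in the following sketch, which skips the first 20 rows of the sorted result and returns the next 10.

```
SELECT processid
FROM "webdata"."impressions"
ORDER BY processid
OFFSET 20 ROWS
LIMIT 10
```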

**TABLESAMPLE [ BERNOULLI | SYSTEM ] (percentage)**  
Optional operator to select rows from a table based on a sampling method.  
 `BERNOULLI` selects each row to be in the table sample with a probability of `percentage`. All physical blocks of the table are scanned, and certain rows are skipped based on a comparison between the sample `percentage` and a random value calculated at runtime.   
With `SYSTEM`, the table is divided into logical segments of data, and the table is sampled at this granularity.   
Either all rows from a particular segment are selected, or the segment is skipped based on a comparison between the sample `percentage` and a random value calculated at runtime. `SYSTEM` sampling is dependent on the connector. This method does not guarantee independent sampling probabilities.
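For example, the following sketch samples roughly 10 percent of the rows of a table using the `BERNOULLI` method.

```
SELECT *
FROM "webdata"."impressions" TABLESAMPLE BERNOULLI (10)
```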

**[ UNNEST (array_or_map) [WITH ORDINALITY] ]**  
Expands an array or map into a relation. Arrays are expanded into a single column. Maps are expanded into two columns (*key*, *value*).   
You can use `UNNEST` with multiple arguments, which are expanded into multiple columns with as many rows as the highest cardinality argument.   
Other columns are padded with nulls.   
The `WITH ORDINALITY` clause adds an ordinality column to the end.  
 `UNNEST` is usually used with a `JOIN` and can reference columns from relations on the left side of the `JOIN`.
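The following sketch assumes a hypothetical `exam_results` table with a `student` column and an array column `scores`. `UNNEST` expands the array into one row per element, and `WITH ORDINALITY` adds the element's position.

```
SELECT student, score, position
FROM "mydatabase"."exam_results"
CROSS JOIN UNNEST(scores) WITH ORDINALITY AS t (score, position)
```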

## Getting the file locations for source data in Amazon S3
<a name="select-path"></a>

To see the Amazon S3 file location for the data in a table row, you can use `"$path"` in a `SELECT` query, as in the following example:

```
SELECT "$path" FROM "my_database"."my_table" WHERE year=2019;
```

This returns a result like the following:

```
s3://amzn-s3-demo-bucket/datasets_mytable/year=2019/data_file1.json
```

To return a sorted, unique list of the S3 filename paths for the data in a table, you can use `SELECT DISTINCT` and `ORDER BY`, as in the following example.

```
SELECT DISTINCT "$path" AS data_source_file
FROM sampledb.elb_logs
ORDER BY data_source_file ASC
```

To return only the filenames without the path, you can pass `"$path"` as a parameter to the `regexp_extract` function, as in the following example.

```
SELECT DISTINCT regexp_extract("$path", '[^/]+$') AS data_source_file
FROM sampledb.elb_logs
ORDER BY data_source_file ASC
```

To return the data from a specific file, specify the file in the `WHERE` clause, as in the following example.

```
SELECT *,"$path" FROM my_database.my_table WHERE "$path" = 's3://amzn-s3-demo-bucket/my_table/my_partition/file-01.csv'
```

For more information and examples, see the Knowledge Center article [How can I see the Amazon S3 source file for a row in an Athena table?](https://aws.amazon.com/premiumsupport/knowledge-center/find-s3-source-file-athena-table-row/).

**Note**  
In Athena, the Hive or Iceberg hidden metadata columns `$bucket`, `$file_modified_time`, `$file_size`, and `$partition` are not supported for views.

## Escaping single quotes
<a name="select-escaping"></a>

 To escape a single quote, precede it with another single quote, as in the following example. Do not confuse this with a double quote. 

```
SELECT 'O''Reilly'
```

**Results**  
`O'Reilly`

## Additional resources
<a name="select-additional-resources"></a>

For more information about using `SELECT` statements in Athena, see the following resources.


| For information about this | See this | 
| --- | --- | 
| Running queries in Athena | [Run SQL queries in Amazon Athena](querying-athena-tables.md) | 
| Using SELECT to create a table | [Create a table from query results (CTAS)](ctas.md) | 
| Inserting data from a SELECT query into another table | [INSERT INTO](insert-into.md) | 
| Using built-in functions in SELECT statements | [Functions in Amazon Athena](functions.md) | 
| Using user defined functions in SELECT statements | [Query with user defined functions](querying-udf.md) | 
| Querying Data Catalog metadata | [Query the AWS Glue Data Catalog](querying-glue-catalog.md) | 

# INSERT INTO
<a name="insert-into"></a>

Inserts new rows into a destination table based on a `SELECT` query statement that runs on a source table, or based on a set of `VALUES` provided as part of the statement. When the source table is based on underlying data in one format, such as CSV or JSON, and the destination table is based on another format, such as Parquet or ORC, you can use `INSERT INTO` queries to transform selected data into the destination table's format. 

## Considerations and limitations
<a name="insert-into-limitations"></a>

Consider the following when using `INSERT` queries with Athena.
+ When running an `INSERT` query on a table with underlying data that is encrypted in Amazon S3, the output files that the `INSERT` query writes are not encrypted by default. We recommend that you encrypt `INSERT` query results if you are inserting into tables with encrypted data. 

  For more information about encrypting query results using the console, see [Encrypt Athena query results stored in Amazon S3](encrypting-query-results-stored-in-s3.md). To enable encryption using the AWS CLI or Athena API, use the `EncryptionConfiguration` properties of the [StartQueryExecution](https://docs.aws.amazon.com/athena/latest/APIReference/API_StartQueryExecution.html) action to specify Amazon S3 encryption options according to your requirements.
+ For `INSERT INTO` statements, the expected bucket owner setting does not apply to the destination table location in Amazon S3. The expected bucket owner setting applies only to the Amazon S3 output location that you specify for Athena query results. For more information, see [Specify a query result location using the Athena console](query-results-specify-location-console.md).
+ For ACID compliant `INSERT INTO` statements, see the `INSERT INTO` section of [Update Iceberg table data](querying-iceberg-updating-iceberg-table-data.md).

### Supported formats and SerDes
<a name="insert-into-supported-formats"></a>

You can run an `INSERT` query on tables created from data with the following formats and SerDes.


| Data format | SerDe | 
| --- | --- | 
|  Avro  |  org.apache.hadoop.hive.serde2.avro.AvroSerDe  | 
| Ion | com.amazon.ionhiveserde.IonHiveSerDe | 
|  JSON  |  org.apache.hive.hcatalog.data.JsonSerDe  | 
|  ORC  |  org.apache.hadoop.hive.ql.io.orc.OrcSerde  | 
|  Parquet  |  org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe  | 
|  Text file  |  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe  TSV and custom-delimited files are supported.   | 
| CSV | org.apache.hadoop.hive.serde2.OpenCSVSerde Writes are supported only for string data types. From Athena, you cannot write to tables that contain non-string data types in the AWS Glue schema. For more information, see [CSV SerDe](csv-serde.md#csv-serde-opencsvserde-considerations-non-string).  | 

### Bucketed tables not supported
<a name="insert-into-bucketed-tables-not-supported"></a>

`INSERT INTO` is not supported on bucketed tables. For more information, see [Use partitioning and bucketing](ctas-partitioning-and-bucketing.md).

### Federated queries not supported
<a name="insert-into-federated-queries-not-supported"></a>

`INSERT INTO` is not supported for federated queries. Attempting to do so may result in the error message *This operation is currently not supported for external catalogs*. For information about federated queries, see [Use Amazon Athena Federated Query](federated-queries.md).

### Partitioning
<a name="insert-into-limitations-partitioning"></a>

Consider the points in this section when using partitioning with `INSERT INTO` or `CREATE TABLE AS SELECT` queries.

#### Limits
<a name="insert-into-partition-limits"></a>

The `INSERT INTO` statement supports writing a maximum of 100 partitions to the destination table. If the `SELECT` query produces more than 100 partitions, the query fails unless you limit it to 100 partitions or fewer.

For information about working around this limitation, see [Use CTAS and INSERT INTO to work around the 100 partition limit](ctas-insert-into.md).

#### Column ordering
<a name="insert-into-partition-detection"></a>

`INSERT INTO` or `CREATE TABLE AS SELECT` statements expect the partitioned column to be the last column in the list of projected columns in a `SELECT` statement. 

If the source table is non-partitioned, or partitioned on different columns compared to the destination table, queries like `INSERT INTO destination_table SELECT * FROM source_table` consider the values in the last column of the source table to be values for a partition column in the destination table. Keep this in mind when trying to create a partitioned table from a non-partitioned table.
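As a sketch with hypothetical table names, if `dest_table` is partitioned on `dt`, list `dt` last in the projected columns.

```
-- dest_table is partitioned on dt, so dt must be the last projected column
INSERT INTO dest_table
SELECT col1, col2, dt
FROM source_table
```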

#### Resources
<a name="insert-into-partition-resources"></a>

For more information about using `INSERT INTO` with partitioning, see the following resources.
+ For inserting partitioned data into a partitioned table, see [Use CTAS and INSERT INTO to work around the 100 partition limit](ctas-insert-into.md).
+ For inserting unpartitioned data into a partitioned table, see [Use CTAS and INSERT INTO for ETL and data analysis](ctas-insert-into-etl.md). 

### Files written to Amazon S3
<a name="insert-into-files-written-to-s3"></a>

Athena writes files to source data locations in Amazon S3 as a result of the `INSERT` command. Each `INSERT` operation creates a new file, rather than appending to an existing file. The file locations depend on the structure of the table and the `SELECT` query, if present. Athena generates a data manifest file for each `INSERT` query. The manifest tracks the files that the query wrote. It is saved to the Athena query result location in Amazon S3. For more information, see [Identify query output files](querying-finding-output-files.md#querying-identifying-output-files).

### Avoid highly transactional updates
<a name="insert-into-transactional-caveat"></a>

When you use `INSERT INTO` to add rows to a table in Amazon S3, Athena does not rewrite or modify existing files. Instead, it writes the rows as one or more new files. Because tables with [many small files result in lower query performance](performance-tuning-data-optimization-techniques.md#performance-tuning-avoid-having-too-many-files), and write and read operations such as `PutObject` and `GetObject` result in higher costs from Amazon S3, consider the following options when using `INSERT INTO`:
+ Run `INSERT INTO` operations less frequently on larger batches of rows.
+ For large data ingestion volumes, consider using a service like [Amazon Data Firehose](https://docs.aws.amazon.com/firehose/latest/dev/what-is-this-service.html).
+ Avoid using `INSERT INTO` altogether. Instead, accumulate rows into larger files and upload them directly to Amazon S3 where they can be queried by Athena.

### Locating orphaned files
<a name="insert-into-files-partial-data"></a>

If a `CTAS` or `INSERT INTO` statement fails, orphaned data can be left in the data location and might be read in subsequent queries. To locate orphaned files for inspection or deletion, you can use the data manifest file that Athena provides to track the list of files to be written. For more information, see [Identify query output files](querying-finding-output-files.md#querying-identifying-output-files) and [DataManifestLocation](https://docs.aws.amazon.com/athena/latest/APIReference/API_QueryExecutionStatistics.html#athena-Type-QueryExecutionStatistics-DataManifestLocation).

## INSERT INTO...SELECT
<a name="insert-into-select"></a>

Specifies the query to run on one table, `source_table`, which determines rows to insert into a second table, `destination_table`. If the `SELECT` query specifies columns in the `source_table`, the columns must precisely match those in the `destination_table`.

For more information about `SELECT` queries, see [SELECT](select.md).

### Synopsis
<a name="insert-into-select-synopsis"></a>

```
INSERT INTO destination_table 
SELECT select_query 
FROM source_table_or_view
```

### Examples
<a name="insert-into-select-examples"></a>

Select all rows in the `vancouver_pageviews` table and insert them into the `canada_pageviews` table:

```
INSERT INTO canada_pageviews 
SELECT * 
FROM vancouver_pageviews;
```

Select only those rows in the `vancouver_pageviews` table where the `date` column has a value between `2019-07-01` and `2019-07-31`, and then insert them into `canada_july_pageviews`:

```
INSERT INTO canada_july_pageviews
SELECT *
FROM vancouver_pageviews
WHERE date
    BETWEEN date '2019-07-01'
        AND date '2019-07-31';
```

Select the values in the `city` and `state` columns in the `cities_world` table only from those rows with a value of `usa` in the `country` column and insert them into the `city` and `state` columns in the `cities_usa` table:

```
INSERT INTO cities_usa (city,state)
SELECT city,state
FROM cities_world
    WHERE country='usa'
```

## INSERT INTO...VALUES
<a name="insert-into-values"></a>

Inserts rows into an existing table by specifying columns and values. Specified columns and associated data types must precisely match the columns and data types in the destination table.

**Important**  
We do not recommend inserting rows using `VALUES` because Athena generates files for each `INSERT` operation. This can cause many small files to be created and degrade the table's query performance. To identify files that an `INSERT` query creates, examine the data manifest file. For more information, see [Work with query results and recent queries](querying.md).

### Synopsis
<a name="insert-into-values-synopsis"></a>

```
INSERT INTO destination_table [(col1,col2,...)] 
VALUES (col1value,col2value,...)[,
       (col1value,col2value,...)][,
       ...]
```

### Examples
<a name="insert-into-values-examples"></a>

In the following examples, the `cities` table has four columns: `id`, `city`, `state`, and `state_motto`. The `id` column is type `INT` and all other columns are type `VARCHAR`.

Insert a single row into the `cities` table, with all column values specified:

```
INSERT INTO cities 
VALUES (1,'Lansing','MI','Si quaeris peninsulam amoenam circumspice')
```

Insert two rows into the `cities` table:

```
INSERT INTO cities 
VALUES (1,'Lansing','MI','Si quaeris peninsulam amoenam circumspice'),
       (3,'Boise','ID','Esto perpetua')
```

# VALUES
<a name="values-statement"></a>

Creates a literal inline table. The table can be anonymous, or you can use the `AS` clause to specify a table name, column names, or both.

## Synopsis
<a name="values-statement-synopsis"></a>

```
VALUES row [, ...]
```

## Parameters
<a name="values-statement-parameters"></a>

**row**  
The `row` parameter can be a single expression or `( column_expression [, ...] )`.

## Examples
<a name="values-statement-examples"></a>

Return a table with one column and three rows:

```
VALUES 1, 2, 3
```

Return a table with two columns and three rows:

```
VALUES
    (1, 'a'),
    (2, 'b'),
    (3, 'c')
```

Return a table with the columns `id` and `name`:

```
SELECT * FROM (
    VALUES
        (1, 'a'),
        (2, 'b'),
        (3, 'c')
) AS t (id, name)
```

Create a table called `customers` with the columns `id` and `name`:

```
CREATE TABLE customers AS
SELECT * FROM (
    VALUES
        (1, 'a'),
        (2, 'b'),
        (3, 'c')
) AS t (id, name)
```

## See also
<a name="values-statement-see-also"></a>

[INSERT INTO...VALUES](insert-into.md#insert-into-values)

# DELETE
<a name="delete-statement"></a>

Deletes rows in an Apache Iceberg table. `DELETE` is transactional and is supported only for Apache Iceberg tables.

## Synopsis
<a name="delete-statement-synopsis"></a>

To delete the rows from an Iceberg table, use the following syntax.

```
DELETE FROM [db_name.]table_name [WHERE predicate]
```
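For example, the following sketch, with hypothetical database, table, and column names, deletes matching rows from an Iceberg table.

```
DELETE FROM my_database.my_iceberg_table
WHERE category = 'obsolete'
```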

For more information and examples, see the `DELETE` section of [Update Iceberg table data](querying-iceberg-updating-iceberg-table-data.md).

# UPDATE
<a name="update-statement"></a>

Updates rows in an Apache Iceberg table. `UPDATE` is transactional and is supported only for Apache Iceberg tables. The statement works only on existing rows and cannot be used to insert or append a row.

## Synopsis
<a name="update-statement-synopsis"></a>

To update the rows in an Iceberg table, use the following syntax.

```
UPDATE [db_name.]table_name SET xx=yy[,...] [WHERE predicate]
```
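For example, the following sketch, with hypothetical database, table, and column names, updates a column value for matching rows in an Iceberg table.

```
UPDATE my_database.my_iceberg_table
SET status = 'archived'
WHERE year < 2020
```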

For more information and examples, see the `UPDATE` section of [Update Iceberg table data](querying-iceberg-updating-iceberg-table-data.md).

# MERGE INTO
<a name="merge-into-statement"></a>

Conditionally updates, deletes, or inserts rows into an Apache Iceberg table. A single statement can combine update, delete, and insert actions.

**Note**  
`MERGE INTO` is transactional and is supported only for Apache Iceberg tables in Athena engine version 3.

## Synopsis
<a name="merge-into-statement-synopsis"></a>

To conditionally update, delete, or insert rows from an Iceberg table, use the following syntax.

```
MERGE INTO target_table [ [ AS ]  target_alias ]
USING { source_table | query } [ [ AS ] source_alias ]
ON search_condition
when_clause [...]
```

The *when_clause* is one of the following:

```
WHEN MATCHED [ AND condition ]
    THEN DELETE
```

```
WHEN MATCHED [ AND condition ]
    THEN UPDATE SET ( column = expression [, ...] )
```

```
WHEN NOT MATCHED [ AND condition ]
    THEN INSERT (column_name[, column_name ...]) VALUES (expression, ...)
```

`MERGE` supports an arbitrary number of `WHEN` clauses with different `MATCHED` conditions. The condition clauses execute the `DELETE`, `UPDATE` or `INSERT` operation in the first `WHEN` clause selected by the `MATCHED` state and the match condition.

For each source row, the `WHEN` clauses are processed in order. Only the first matching `WHEN` clause is executed. Subsequent clauses are ignored. A user error is raised when a single target table row matches more than one source row.

If a source row is not matched by any `WHEN` clause and there is no `WHEN NOT MATCHED` clause, the source row is ignored.

In `WHEN` clauses that have `UPDATE` operations, the column value expressions can refer to any field of the target or the source. In the `NOT MATCHED` case, the `INSERT` expressions can refer to any field of the source.

**Example**  
The following example merges rows from the second table into the first table if the rows don't exist in the first table. Note that the columns listed in the `VALUES` clause must be prefixed by the source table alias. The target columns listed in the `INSERT` clause must *not* be so prefixed.

```
MERGE INTO iceberg_table_sample as ice1
USING iceberg2_table_sample as ice2
ON ice1.col1 = ice2.col1
WHEN NOT MATCHED 
THEN INSERT (col1)
      VALUES (ice2.col1)
```
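A fuller sketch, with hypothetical table and column names, combines the clause types following the synopsis above. Closed accounts are deleted, other matched accounts are updated, and unmatched source rows are inserted.

```
MERGE INTO accounts AS t
USING monthly_updates AS s
ON t.customer_id = s.customer_id
WHEN MATCHED AND s.is_closed = true
    THEN DELETE
WHEN MATCHED
    THEN UPDATE SET (balance = s.balance)
WHEN NOT MATCHED
    THEN INSERT (customer_id, balance) VALUES (s.customer_id, s.balance)
```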

For more `MERGE INTO` examples, see [Update Iceberg table data](querying-iceberg-updating-iceberg-table-data.md).

# OPTIMIZE
<a name="optimize-statement"></a>

Optimizes rows in an Apache Iceberg table by rewriting data files into a more optimized layout based on their size and number of associated delete files.

**Note**  
`OPTIMIZE` is transactional and is supported only for Apache Iceberg tables.

## Syntax
<a name="optimize-statement-syntax"></a>

The following syntax summary shows how to optimize data layout for an Iceberg table.

```
OPTIMIZE [db_name.]table_name REWRITE DATA USING BIN_PACK
  [WHERE predicate]
```

**Note**  
Only partition columns are allowed in the `WHERE` clause *predicate*. Specifying a non-partition column will cause the query to fail. 

The compaction action is charged by the amount of data scanned during the rewrite process. The `REWRITE DATA` action uses predicates to select for files that contain matching rows. If any row in the file matches the predicate, the file is selected for optimization. Thus, to control the number of files affected by the compaction operation, you can specify a `WHERE` clause.
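For example, the following sketch, with hypothetical names, restricts compaction to a single partition. The `category` column is assumed to be a partition column of the table.

```
OPTIMIZE my_database.my_iceberg_table REWRITE DATA USING BIN_PACK
WHERE category = 'c1'
```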

## Configuring compaction properties
<a name="optimize-statement-configuring-compaction-properties"></a>

To control the size of the files to be selected for compaction and the resulting file size after compaction, you can use table property parameters. You can use the [ALTER TABLE SET TBLPROPERTIES](querying-iceberg-alter-table-set-properties.md) command to configure the related [table properties](querying-iceberg-creating-tables.md#querying-iceberg-table-properties).
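As a sketch, the following statement sets minimum and maximum data file sizes for compaction on a hypothetical table. The property names are assumed from the linked table properties reference; verify them against that page before use.

```
ALTER TABLE my_database.my_iceberg_table SET TBLPROPERTIES (
  'optimize_rewrite_min_data_file_size_bytes'='134217728',
  'optimize_rewrite_max_data_file_size_bytes'='536870912'
)
```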

## Additional resources
<a name="optimize-statement-additional-resources"></a>

[Optimize Iceberg tables](querying-iceberg-data-optimization.md)

# VACUUM
<a name="vacuum-statement"></a>

The `VACUUM` statement performs table maintenance on Apache Iceberg tables by performing [snapshot expiration](https://iceberg.apache.org/docs/latest/spark-procedures/#expire_snapshots) and [orphan file removal](https://iceberg.apache.org/docs/latest/spark-procedures/#remove_orphan_files).

**Note**  
`VACUUM` is transactional and is supported only for Apache Iceberg tables in Athena engine version 3.

The `VACUUM` statement optimizes Iceberg tables by reducing storage consumption. For more information about using `VACUUM`, see [Optimize Iceberg tables](querying-iceberg-data-optimization.md). Note that, because the `VACUUM` statement makes API calls to Amazon S3, charges apply for the associated requests to Amazon S3.

**Warning**  
If you run a snapshot expiration operation, you can no longer time travel to expired snapshots.

## Synopsis
<a name="vacuum-statement-synopsis"></a>

To remove data files no longer needed for an Iceberg table, use the following syntax.

```
VACUUM [database_name.]target_table
```
+ `VACUUM` expects the Iceberg data to be in an Amazon S3 folder rather than at the root of an Amazon S3 bucket. For example, if your Iceberg data is at `s3://amzn-s3-demo-bucket/` instead of `s3://amzn-s3-demo-bucket/myicebergfolder/`, the `VACUUM` statement fails with the error message GENERIC_INTERNAL_ERROR: Path missing in file system location: `s3://amzn-s3-demo-bucket`.
+ For `VACUUM` to be able to delete data files, your query execution role must have `s3:DeleteObject` permissions on the bucket where your Iceberg tables, metadata, snapshots, and data files are located. If the permission is not present, the `VACUUM` query succeeds, but the files are not deleted.
+ To run `VACUUM` on a table with a name that begins with an underscore (for example, `_mytable`), enclose the table name in backticks, as in the following example. If you prefix the table name with a database name, do not enclose the database name in backticks. Note that double quotes will not work in place of backticks. 

  This behavior is particular to `VACUUM`. The `CREATE` and `INSERT INTO` statements do not require backticks for table names that begin with underscores.

  ```
  VACUUM `_mytable`
  VACUUM my_database.`_mytable`
  ```

## Operations performed
<a name="vacuum-statement-operations-performed"></a>

`VACUUM` performs the following operations:
+ Removes snapshots that are older than the amount of time that is specified by the `vacuum_max_snapshot_age_seconds` table property. By default, this property is set to 432000 seconds (5 days).
+ Removes snapshots that fall outside the retention period and that exceed the number specified by the `vacuum_min_snapshots_to_keep` table property. The default is 1.

  You can specify these table properties in your `CREATE TABLE` statement. After the table has been created, you can use the [ALTER TABLE SET TBLPROPERTIES](querying-iceberg-alter-table-set-properties.md) statement to update them. 
+ Removes any metadata and data files that are unreachable as a result of the snapshot removal. You can configure the number of old metadata files to be retained by setting the `vacuum_max_metadata_files_to_keep` table property. The default value is 100.
+ Removes orphan files that are older than the time specified in the `vacuum_max_snapshot_age_seconds` table property. Orphan files are files in the table's data directory that are not part of the table state.
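For example, the following statement configures a hypothetical table to keep at least five snapshots and to expire snapshots (and remove orphan files) older than three days (259200 seconds). Both property names come from the list above.

```
ALTER TABLE my_database.my_iceberg_table SET TBLPROPERTIES (
  'vacuum_max_snapshot_age_seconds'='259200',
  'vacuum_min_snapshots_to_keep'='5'
)
```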

For more information about creating and managing Apache Iceberg tables in Athena, see [Create Iceberg tables](querying-iceberg-creating-tables.md) and [Manage Iceberg tables](querying-iceberg-managing-tables.md).

# Using EXPLAIN and EXPLAIN ANALYZE in Athena
<a name="athena-explain-statement"></a>

The `EXPLAIN` statement shows the logical or distributed execution plan of a specified SQL statement, or validates the SQL statement. You can output the results in text format or in a data format for rendering into a graph.

**Note**  
You can view graphical representations of logical and distributed plans for your queries in the Athena console without using the `EXPLAIN` syntax. For more information, see [View execution plans for SQL queries](query-plans.md).

The `EXPLAIN ANALYZE` statement shows both the distributed execution plan of a specified SQL statement and the computational cost of each operation in a SQL query. You can output the results in text or JSON format. 

## Considerations and limitations
<a name="athena-explain-statement-considerations-and-limitations"></a>

The `EXPLAIN` and `EXPLAIN ANALYZE` statements in Athena have the following limitations.
+ Because `EXPLAIN` queries do not scan any data, Athena does not charge for them. However, because `EXPLAIN` queries make calls to AWS Glue to retrieve table metadata, you may incur charges from AWS Glue if the calls exceed the [free tier limit for AWS Glue](https://aws.amazon.com/free/?all-free-tier.sort-by=item.additionalFields.SortRank&all-free-tier.sort-order=asc&awsf.Free%20Tier%20Categories=categories%23analytics&all-free-tier.q=glue&all-free-tier.q_operator=AND).
+ Because `EXPLAIN ANALYZE` queries are executed, they do scan data, and Athena charges for the amount of data scanned.
+ Row or cell filtering information defined in Lake Formation and query stats information are not shown in the output of `EXPLAIN` and `EXPLAIN ANALYZE`.

## EXPLAIN syntax
<a name="athena-explain-statement-syntax-athena-engine-version-2"></a>

```
EXPLAIN [ ( option [, ...]) ] statement
```

*option* can be one of the following:

```
FORMAT { TEXT | GRAPHVIZ | JSON }
TYPE { LOGICAL | DISTRIBUTED | VALIDATE | IO }
```

If the `FORMAT` option is not specified, the output defaults to `TEXT` format. The `IO` type provides information about the tables and schemas that the query reads. 
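For example, the following statement uses the `IO` type to list the tables and schemas that a query reads. The table name `my_database.my_table` is hypothetical.

```
EXPLAIN (TYPE IO)
SELECT * FROM my_database.my_table
```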

## EXPLAIN ANALYZE syntax
<a name="athena-explain-analyze-statement"></a>

In addition to the output included in `EXPLAIN`, `EXPLAIN ANALYZE` output also includes runtime statistics for the specified query such as CPU usage, the number of rows input, and the number of rows output.

```
EXPLAIN ANALYZE [ ( option [, ...]) ] statement
```

*option* can be one of the following:

```
FORMAT { TEXT | JSON }
```

If the `FORMAT` option is not specified, the output defaults to `TEXT` format. Because all queries for `EXPLAIN ANALYZE` are `DISTRIBUTED`, the `TYPE` option is not available for `EXPLAIN ANALYZE`. 

*statement* can be one of the following:

```
SELECT
CREATE TABLE AS SELECT
INSERT
UNLOAD
```

## EXPLAIN examples
<a name="athena-explain-statement-examples"></a>

The following `EXPLAIN` examples progress from simpler to more complex.

### Example 1: Use the EXPLAIN statement to show a query plan in text format
<a name="athena-explain-statement-example-text-query-plan"></a>

In the following example, `EXPLAIN` shows the execution plan for a `SELECT` query on Elastic Load Balancing logs. The format defaults to text output.

```
EXPLAIN 
SELECT 
   request_timestamp, 
   elb_name, 
   request_ip 
FROM sampledb.elb_logs;
```

#### Results
<a name="athena-explain-statement-example-text-query-plan-results"></a>

```
- Output[request_timestamp, elb_name, request_ip] => [[request_timestamp, elb_name, request_ip]]
    - RemoteExchange[GATHER] => [[request_timestamp, elb_name, request_ip]]
        - TableScan[awsdatacatalog:HiveTableHandle{schemaName=sampledb, tableName=elb_logs, 
analyzePartitionValues=Optional.empty}] => [[request_timestamp, elb_name, request_ip]]
                LAYOUT: sampledb.elb_logs
                request_ip := request_ip:string:2:REGULAR
                request_timestamp := request_timestamp:string:0:REGULAR
                elb_name := elb_name:string:1:REGULAR
```

### Example 2: Use EXPLAIN to graph a query plan
<a name="athena-explain-statement-example-graph-a-query-plan"></a>

You can use the Athena console to graph a query plan for you. Enter a `SELECT` statement like the following into the Athena query editor, and then choose **EXPLAIN**.

```
SELECT 
      c.c_custkey,
      o.o_orderkey,
      o.o_orderstatus
   FROM tpch100.customer c 
   JOIN tpch100.orders o 
       ON c.c_custkey = o.o_custkey
```

The **Explain** page of the Athena query editor opens and shows you a distributed plan and a logical plan for the query. The following graph shows the logical plan for the example.

![\[Graph of the query plan rendered by the Athena query editor.\]](http://docs.aws.amazon.com/athena/latest/ug/images/athena-explain-statement-tpch.png)


**Important**  
Currently, some partition filters may not be visible in the nested operator tree graph even though Athena does apply them to your query. To verify the effect of such filters, run `EXPLAIN` or `EXPLAIN ANALYZE` on your query and view the results.

For more information about using the query plan graphing features in the Athena console, see [View execution plans for SQL queries](query-plans.md).

### Example 3: Use the EXPLAIN statement to verify partition pruning
<a name="athena-explain-statement-example-verify-partition-pruning"></a>

When you use a filtering predicate on a partitioned key to query a partitioned table, the query engine applies the predicate to the partitioned key to reduce the amount of data read.

The following example uses an `EXPLAIN` query to verify partition pruning for a `SELECT` query on a partitioned table. First, a `CREATE TABLE` statement creates the `tpch100.orders_partitioned` table. The table is partitioned on column `o_orderdate`.

```
CREATE TABLE `tpch100.orders_partitioned`(
  `o_orderkey` int, 
  `o_custkey` int, 
  `o_orderstatus` string, 
  `o_totalprice` double, 
  `o_orderpriority` string, 
  `o_clerk` string, 
  `o_shippriority` int, 
  `o_comment` string)
PARTITIONED BY ( 
  `o_orderdate` string)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://amzn-s3-demo-bucket/<your_directory_path>/'
```

The `tpch100.orders_partitioned` table has several partitions on `o_orderdate`, as shown by the `SHOW PARTITIONS` command.

```
SHOW PARTITIONS tpch100.orders_partitioned;

o_orderdate=1994
o_orderdate=2015
o_orderdate=1998
o_orderdate=1995
o_orderdate=1993
o_orderdate=1997
o_orderdate=1992
o_orderdate=1996
```

The following `EXPLAIN` query verifies partition pruning on the specified `SELECT` statement.

```
EXPLAIN 
SELECT 
   o_orderkey, 
   o_custkey, 
   o_orderdate 
FROM tpch100.orders_partitioned
WHERE o_orderdate = '1995'
```

#### Results
<a name="athena-explain-statement-example-verify-partition-pruning-results"></a>

```
Query Plan
- Output[o_orderkey, o_custkey, o_orderdate] => [[o_orderkey, o_custkey, o_orderdate]]
    - RemoteExchange[GATHER] => [[o_orderkey, o_custkey, o_orderdate]]
        - TableScan[awsdatacatalog:HiveTableHandle{schemaName=tpch100, tableName=orders_partitioned, 
analyzePartitionValues=Optional.empty}] => [[o_orderkey, o_custkey, o_orderdate]]
                LAYOUT: tpch100.orders_partitioned
                o_orderdate := o_orderdate:string:-1:PARTITION_KEY
                    :: [[1995]]
                o_custkey := o_custkey:int:1:REGULAR
                o_orderkey := o_orderkey:int:0:REGULAR
```

The `PARTITION_KEY` lines in the result show that the predicate `o_orderdate = '1995'` was applied on the partition key.

### Example 4: Use an EXPLAIN query to check the join order and join type
<a name="athena-explain-statement-example-check-join-order-and-type"></a>

The following `EXPLAIN` query checks the `SELECT` statement's join order and join type. Use a query like this to examine query memory usage so that you can reduce the chances of getting an `EXCEEDED_LOCAL_MEMORY_LIMIT` error.

```
EXPLAIN (TYPE DISTRIBUTED)
   SELECT 
      c.c_custkey, 
      o.o_orderkey,
      o.o_orderstatus
   FROM tpch100.customer c 
   JOIN tpch100.orders o 
       ON c.c_custkey = o.o_custkey 
   WHERE c.c_custkey = 123
```

#### Results
<a name="athena-explain-statement-example-check-join-order-and-type-results"></a>

```
Query Plan
Fragment 0 [SINGLE]
    Output layout: [c_custkey, o_orderkey, o_orderstatus]
    Output partitioning: SINGLE []
    Stage Execution Strategy: UNGROUPED_EXECUTION
    - Output[c_custkey, o_orderkey, o_orderstatus] => [[c_custkey, o_orderkey, o_orderstatus]]
        - RemoteSource[1] => [[c_custkey, o_orderstatus, o_orderkey]]

Fragment 1 [SOURCE]
    Output layout: [c_custkey, o_orderstatus, o_orderkey]
    Output partitioning: SINGLE []
    Stage Execution Strategy: UNGROUPED_EXECUTION
    - CrossJoin => [[c_custkey, o_orderstatus, o_orderkey]]
            Distribution: REPLICATED
        - ScanFilter[table = awsdatacatalog:HiveTableHandle{schemaName=tpch100, 
tableName=customer, analyzePartitionValues=Optional.empty}, grouped = false, 
filterPredicate = ("c_custkey" = 123)] => [[c_custkey]]
                LAYOUT: tpch100.customer
                c_custkey := c_custkey:int:0:REGULAR
        - LocalExchange[SINGLE] () => [[o_orderstatus, o_orderkey]]
            - RemoteSource[2] => [[o_orderstatus, o_orderkey]]

Fragment 2 [SOURCE]
    Output layout: [o_orderstatus, o_orderkey]
    Output partitioning: BROADCAST []
    Stage Execution Strategy: UNGROUPED_EXECUTION
    - ScanFilterProject[table = awsdatacatalog:HiveTableHandle{schemaName=tpch100, 
tableName=orders, analyzePartitionValues=Optional.empty}, grouped = false, 
filterPredicate = ("o_custkey" = 123)] => [[o_orderstatus, o_orderkey]]
            LAYOUT: tpch100.orders
            o_orderstatus := o_orderstatus:string:2:REGULAR
            o_custkey := o_custkey:int:1:REGULAR
            o_orderkey := o_orderkey:int:0:REGULAR
```

The example query was optimized into a cross join for better performance. The results show that `tpch100.orders` will be distributed with the `BROADCAST` distribution type. This means that the `tpch100.orders` table is distributed to all nodes that perform the join operation. The `BROADCAST` distribution type requires that all of the filtered results of the `tpch100.orders` table fit into the memory of each node that performs the join operation.

However, the `tpch100.customer` table is smaller than `tpch100.orders`. Because `tpch100.customer` requires less memory, you can rewrite the query to broadcast `tpch100.customer` instead of `tpch100.orders`. This reduces the chance of the query receiving the `EXCEEDED_LOCAL_MEMORY_LIMIT` error. This strategy assumes the following points:
+ `tpch100.customer.c_custkey` is unique in the `tpch100.customer` table.
+ There is a one-to-many relationship between `tpch100.customer` and `tpch100.orders`.

The following example shows the rewritten query.

```
SELECT 
    c.c_custkey,
    o.o_orderkey,
    o.o_orderstatus
FROM tpch100.orders o
JOIN tpch100.customer c -- the filtered results of tpch100.customer are distributed to all nodes.
    ON c.c_custkey = o.o_custkey 
WHERE c.c_custkey = 123
```

### Example 5: Use an EXPLAIN query to remove predicates that have no effect
<a name="athena-explain-statement-example-remove-unneeded-predicates"></a>

You can use an `EXPLAIN` query to check the effectiveness of filtering predicates. You can use the results to remove predicates that have no effect, as in the following example.

```
EXPLAIN
   SELECT 
      c.c_name
   FROM tpch100.customer c
   WHERE c.c_custkey = CAST(RANDOM() * 1000 AS INT)
   AND c.c_custkey BETWEEN 1000 AND 2000
   AND c.c_custkey = 1500
```

#### Results
<a name="athena-explain-statement-example-remove-unneeded-predicates-results"></a>

```
Query Plan
- Output[c_name] => [[c_name]]
    - RemoteExchange[GATHER] => [[c_name]]
        - ScanFilterProject[table = 
awsdatacatalog:HiveTableHandle{schemaName=tpch100, 
tableName=customer, analyzePartitionValues=Optional.empty}, 
filterPredicate = (("c_custkey" = 1500) AND ("c_custkey" = 
CAST(("random"() * 1E3) AS int)))] => [[c_name]]
                LAYOUT: tpch100.customer
                c_custkey := c_custkey:int:0:REGULAR
                c_name := c_name:string:1:REGULAR
```

The `filterPredicate` in the results shows that the optimizer merged the original three predicates into two predicates and changed their order of application.

```
filterPredicate = (("c_custkey" = 1500) AND ("c_custkey" = CAST(("random"() * 1E3) AS int)))
```

Because the results show that the predicate `AND c.c_custkey BETWEEN 1000 AND 2000` has no effect, you can remove this predicate without changing the query results.
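After removing the ineffective `BETWEEN` predicate, the query can be written as follows and returns the same results:

```
SELECT 
   c.c_name
FROM tpch100.customer c
WHERE c.c_custkey = CAST(RANDOM() * 1000 AS INT)
AND c.c_custkey = 1500
```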

For information about the terms used in the results of `EXPLAIN` queries, see [Understand Athena EXPLAIN statement results](athena-explain-statement-understanding.md).

## EXPLAIN ANALYZE examples
<a name="athena-explain-analyze-examples"></a>

The following examples show example `EXPLAIN ANALYZE` queries and outputs.

### Example 1: Use EXPLAIN ANALYZE to show a query plan and computational cost in text format
<a name="athena-explain-analyze-example-cflogs-text"></a>

In the following example, `EXPLAIN ANALYZE` shows the execution plan and computational costs for a `SELECT` query on CloudFront logs. The format defaults to text output.

```
EXPLAIN ANALYZE SELECT * FROM cloudfront_logs LIMIT 10
```

#### Results
<a name="athena-explain-analyze-example-cflogs-text-results"></a>

```
 Fragment 1
     CPU: 24.60ms, Input: 10 rows (1.48kB); per task: std.dev.: 0.00, Output: 10 rows (1.48kB)
     Output layout: [date, time, location, bytes, requestip, method, host, uri, status, referrer,\
       os, browser, browserversion]
Limit[10] => [[date, time, location, bytes, requestip, method, host, uri, status, referrer, os,\
  browser, browserversion]]
             CPU: 1.00ms (0.03%), Output: 10 rows (1.48kB)
             Input avg.: 10.00 rows, Input std.dev.: 0.00%
LocalExchange[SINGLE] () => [[date, time, location, bytes, requestip, method, host, uri, status, referrer, os,\
 browser, browserversion]]
                 CPU: 0.00ns (0.00%), Output: 10 rows (1.48kB)
                 Input avg.: 0.63 rows, Input std.dev.: 387.30%
RemoteSource[2] => [[date, time, location, bytes, requestip, method, host, uri, status, referrer, os,\
  browser, browserversion]]
                     CPU: 1.00ms (0.03%), Output: 10 rows (1.48kB)
                     Input avg.: 0.63 rows, Input std.dev.: 387.30%

 Fragment 2
     CPU: 3.83s, Input: 998 rows (147.21kB); per task: std.dev.: 0.00, Output: 20 rows (2.95kB)
     Output layout: [date, time, location, bytes, requestip, method, host, uri, status, referrer, os,\
       browser, browserversion]
LimitPartial[10] => [[date, time, location, bytes, requestip, method, host, uri, status, referrer, os,\
  browser, browserversion]]
             CPU: 5.00ms (0.13%), Output: 20 rows (2.95kB)
             Input avg.: 166.33 rows, Input std.dev.: 141.42%
TableScan[awsdatacatalog:HiveTableHandle{schemaName=default, tableName=cloudfront_logs,\
  analyzePartitionValues=Optional.empty}, 
grouped = false] => [[date, time, location, bytes, requestip, method, host, uri, st
                 CPU: 3.82s (99.82%), Output: 998 rows (147.21kB)
                 Input avg.: 166.33 rows, Input std.dev.: 141.42%
                 LAYOUT: default.cloudfront_logs
                 date := date:date:0:REGULAR
                 referrer := referrer:string:9:REGULAR
                 os := os:string:10:REGULAR
                 method := method:string:5:REGULAR
                 bytes := bytes:int:3:REGULAR
                 browser := browser:string:11:REGULAR
                 host := host:string:6:REGULAR
                 requestip := requestip:string:4:REGULAR
                 location := location:string:2:REGULAR
                 time := time:string:1:REGULAR
                 uri := uri:string:7:REGULAR
                 browserversion := browserversion:string:12:REGULAR
                 status := status:int:8:REGULAR
```

### Example 2: Use EXPLAIN ANALYZE to show a query plan in JSON format
<a name="athena-explain-analyze-example-cflogs-json"></a>

The following example shows the execution plan and computational costs for a `SELECT` query on CloudFront logs. The example specifies JSON as the output format.

```
EXPLAIN ANALYZE (FORMAT JSON) SELECT * FROM cloudfront_logs LIMIT 10
```

#### Results
<a name="athena-explain-analyze-example-cflogs-json-results"></a>

```
{ 
    "fragments": [{ 
        "id": "1", 
 
        "stageStats": { 
            "totalCpuTime": "3.31ms", 
            "inputRows": "10 rows", 
            "inputDataSize": "1514B", 
            "stdDevInputRows": "0.00", 
            "outputRows": "10 rows", 
            "outputDataSize": "1514B" 
        }, 
        "outputLayout": "date, time, location, bytes, requestip, method, host,\
           uri, status, referrer, os, browser, browserversion", 
 
        "logicalPlan": { 
            "1": [{ 
                "name": "Limit", 
                "identifier": "[10]", 
                "outputs": ["date", "time", "location", "bytes", "requestip", "method", "host",\
                  "uri", "status", "referrer", "os", "browser", "browserversion"], 
                "details": "", 
                "distributedNodeStats": { 
                    "nodeCpuTime": "0.00ns", 
                    "nodeOutputRows": 10, 
                    "nodeOutputDataSize": "1514B", 
                    "operatorInputRowsStats": [{ 
                        "nodeInputRows": 10.0, 
                        "nodeInputRowsStdDev": 0.0 
                    }] 
                }, 
                "children": [{ 
                    "name": "LocalExchange", 
                    "identifier": "[SINGLE] ()", 
                    "outputs": ["date", "time", "location", "bytes", "requestip", "method", "host",\
                      "uri", "status", "referrer", "os", "browser", "browserversion"], 
                    "details": "", 
                    "distributedNodeStats": { 
                        "nodeCpuTime": "0.00ns", 
                        "nodeOutputRows": 10, 
                        "nodeOutputDataSize": "1514B", 
                        "operatorInputRowsStats": [{ 
                            "nodeInputRows": 0.625, 
                            "nodeInputRowsStdDev": 387.2983346207417 
                        }] 
                    }, 
                    "children": [{ 
                        "name": "RemoteSource", 
                        "identifier": "[2]", 
                        "outputs": ["date", "time", "location", "bytes", "requestip", "method", "host",\
                          "uri", "status", "referrer", "os", "browser", "browserversion"], 
                        "details": "", 
                        "distributedNodeStats": { 
                            "nodeCpuTime": "0.00ns", 
                            "nodeOutputRows": 10, 
                            "nodeOutputDataSize": "1514B", 
                            "operatorInputRowsStats": [{ 
                                "nodeInputRows": 0.625, 
                                "nodeInputRowsStdDev": 387.2983346207417 
                            }] 
                        }, 
                        "children": [] 
                    }] 
                }] 
            }] 
        } 
    }, { 
        "id": "2", 
 
        "stageStats": { 
            "totalCpuTime": "1.62s", 
            "inputRows": "500 rows", 
            "inputDataSize": "75564B", 
            "stdDevInputRows": "0.00", 
            "outputRows": "10 rows", 
            "outputDataSize": "1514B" 
        }, 
        "outputLayout": "date, time, location, bytes, requestip, method, host, uri, status,\
           referrer, os, browser, browserversion", 
 
        "logicalPlan": { 
            "1": [{ 
                "name": "LimitPartial", 
                "identifier": "[10]", 
                "outputs": ["date", "time", "location", "bytes", "requestip", "method", "host", "uri",\
                  "status", "referrer", "os", "browser", "browserversion"], 
                "details": "", 
                "distributedNodeStats": { 
                    "nodeCpuTime": "0.00ns", 
                    "nodeOutputRows": 10, 
                    "nodeOutputDataSize": "1514B", 
                    "operatorInputRowsStats": [{ 
                        "nodeInputRows": 83.33333333333333, 
                        "nodeInputRowsStdDev": 223.60679774997897 
                    }] 
                }, 
                "children": [{ 
                    "name": "TableScan", 
                    "identifier": "[awsdatacatalog:HiveTableHandle{schemaName=default,\
                       tableName=cloudfront_logs, analyzePartitionValues=Optional.empty},\
                       grouped = false]", 
                    "outputs": ["date", "time", "location", "bytes", "requestip", "method", "host", "uri",\
                       "status", "referrer", "os", "browser", "browserversion"], 
                    "details": "LAYOUT: default.cloudfront_logs\ndate := date:date:0:REGULAR\nreferrer :=\
                       referrer: string:9:REGULAR\nos := os:string:10:REGULAR\nmethod := method:string:5:\
                       REGULAR\nbytes := bytes:int:3:REGULAR\nbrowser := browser:string:11:REGULAR\nhost :=\
                       host:string:6:REGULAR\nrequestip := requestip:string:4:REGULAR\nlocation :=\
                       location:string:2:REGULAR\ntime := time:string:1: REGULAR\nuri := uri:string:7:\
                       REGULAR\nbrowserversion := browserversion:string:12:REGULAR\nstatus :=\
                       status:int:8:REGULAR\n", 
                    "distributedNodeStats": { 
                        "nodeCpuTime": "1.62s", 
                        "nodeOutputRows": 500, 
                        "nodeOutputDataSize": "75564B", 
                        "operatorInputRowsStats": [{ 
                            "nodeInputRows": 83.33333333333333, 
                            "nodeInputRowsStdDev": 223.60679774997897 
                        }] 
                    }, 
                    "children": [] 
                }] 
            }] 
        } 
    }] 
}
```

## Additional resources
<a name="athena-explain-statement-additional-resources"></a>

For additional information, see the following resources.
+  [Understand Athena EXPLAIN statement results](athena-explain-statement-understanding.md)
+  [View execution plans for SQL queries](query-plans.md)
+  [View statistics and execution details for completed queries](query-stats.md)
+ Trino [https://trino.io/docs/current/sql/explain.html](https://trino.io/docs/current/sql/explain.html) documentation
+ Trino [https://trino.io/docs/current/sql/explain-analyze.html](https://trino.io/docs/current/sql/explain-analyze.html) documentation
+  [Optimize Federated Query Performance using EXPLAIN and EXPLAIN ANALYZE in Amazon Athena](https://aws.amazon.com/blogs/big-data/optimize-federated-query-performance-using-explain-and-explain-analyze-in-amazon-athena/) in the *AWS Big Data Blog*. 

[![AWS Videos](http://img.youtube.com/vi/7JUyTqglmNU/0.jpg)](http://www.youtube.com/watch?v=7JUyTqglmNU)


# Understand Athena EXPLAIN statement results
<a name="athena-explain-statement-understanding"></a>

This topic provides a brief guide to the operational terms used in Athena `EXPLAIN` statement results.

## EXPLAIN statement output types
<a name="athena-explain-statement-understanding-explain-plan-types"></a>

`EXPLAIN` statement outputs can be one of two types:
+ **Logical plan** – Shows the logical plan that the SQL engine uses to execute a statement. The syntax for this option is `EXPLAIN` or `EXPLAIN (TYPE LOGICAL)`.
+ **Distributed plan** – Shows an execution plan in a distributed environment. The output shows fragments, which are processing stages. Each plan fragment is processed by one or more nodes. Data can be exchanged between the nodes that process the fragments. The syntax for this option is `EXPLAIN (TYPE DISTRIBUTED)`.

  In the output for a distributed plan, fragments (processing stages) are indicated by `Fragment` *number* [*fragment_type*], where *number* is a zero-based integer and *fragment_type* specifies how the fragment is executed by the nodes. Fragment types, which provide insight into the layout of the data exchange, are described in the following table.
**Distributed plan fragment types**    
[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/athena/latest/ug/athena-explain-statement-understanding.html)

## Exchange
<a name="athena-explain-statement-understanding-exchange-types"></a>

Exchange-related terms describe how data is exchanged between worker nodes. Transfers can be either local or remote. 

**LocalExchange [*exchange_type*]**  
Transfers data locally within worker nodes for different stages of a query. The value for *exchange_type* can be one of the logical or distributed exchange types described later in this section.

**RemoteExchange [*exchange_type*]**  
Transfers data between worker nodes for different stages of a query. The value for *exchange_type* can be one of the logical or distributed exchange types described later in this section.

### Logical Exchange types
<a name="athena-explain-statement-understanding-exchange-types-logical"></a>

The following exchange types describe actions taken during the exchange phase of a logical plan.
+ **`GATHER`** – A single worker node gathers output from all other worker nodes. For example, the last stage of a select query gathers results from all nodes and writes the results to Amazon S3.
+ **`REPARTITION`** – Sends the row data to a specific worker based on the partitioning scheme that the next operator requires.
+ **`REPLICATE`** – Copies the row data to all workers.

### Distributed Exchange types
<a name="athena-explain-statement-understanding-exchange-types-distributed"></a>

The following exchange types indicate the layout of the data when it is exchanged between nodes in a distributed plan.
+ **`HASH`** – The exchange distributes data to multiple destinations using a hash function.
+ **`SINGLE`** – The exchange distributes data to a single destination.

## Scanning
<a name="athena-explain-statement-understanding-scanning"></a>

The following terms describe how data is scanned during a query.

**TableScan **  
Scans a table's source data from Amazon S3 or an Apache Hive connector and applies partition pruning generated from the filter predicate.

**ScanFilter **  
Scans a table's source data from Amazon S3 or a Hive connector and applies partition pruning generated from the filter predicate and from additional filter predicates not applied through partition pruning.

**ScanFilterProject **  
First, scans a table's source data from Amazon S3 or a Hive connector and applies partition pruning generated from the filter predicate and from additional filter predicates not applied through partition pruning. Then, modifies the memory layout of the output data into a new projection to improve performance of later stages.

## Join
<a name="athena-explain-statement-understanding-join"></a>

Joins data between two tables. Joins can be categorized by join type and by distribution type.

### Join types
<a name="athena-explain-statement-understanding-join-types"></a>

Join types define the way in which the join operation occurs.

**CrossJoin** – Produces the Cartesian product of the two tables joined.

**InnerJoin** – Selects records that have matching values in both tables.

**LeftJoin** – Selects all records from the left table and the matching records from the right table. If no match occurs, the result on the right side is NULL.

**RightJoin** – Selects all records from the right table, and the matching records from the left table. If no match occurs, the result on the left side is NULL.

**FullJoin** – Selects all records that have a match in the left or right table. The joined table contains all records from both tables and fills in NULL for missing matches on either side.

**Note**  
For performance reasons, the query engine can rewrite a join query into a different join type that produces the same results. For example, an inner join query with a predicate on one table can be rewritten into a `CrossJoin`. This pushes the predicate down to the scanning phase of that table so that less data is scanned.

### Join distribution types
<a name="athena-explain-statement-understanding-join-distribution-types"></a>

Distribution types define how data is exchanged between worker nodes when the join operation is performed.

**Partitioned** – Both the left and right table are hash-partitioned across all worker nodes. Partitioned distribution consumes less memory in each node. Partitioned distribution can be much slower than replicated joins. Partitioned joins are suitable when you join two large tables.

**Replicated** – One table is hash-partitioned across all worker nodes and the other table is replicated to all worker nodes to perform the join operation. Replicated distribution can be much faster than partitioned joins, but it consumes more memory in each worker node. If the replicated table is too large, the worker node can experience an out-of-memory error. Replicated joins are suitable when one of the joined tables is small.
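The replicated case can be sketched as a broadcast hash join (an illustration of the idea, not Athena's implementation): every worker holds the small build table in full as a hash table and probes it with its own share of the large table's rows.

```python
def replicated_hash_join(probe_rows, build_rows, key):
    """Each worker holds the whole build (small) table and probes with its own rows."""
    table = {}
    for b in build_rows:
        table.setdefault(b[key], []).append(b)
    # the memory cost of `table` is paid on every worker, which is why the
    # replicated strategy suits only small build tables
    return [{**p, **b} for p in probe_rows for b in table.get(p[key], [])]
```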

# PREPARE
<a name="sql-prepare"></a>

Creates a SQL statement with the name `statement_name` to be run at a later time. The statement can include parameters represented by question marks. To supply values for the parameters and run the prepared statement, use [EXECUTE](sql-execute.md).

## Synopsis
<a name="sql-prepare-synopsis"></a>

```
PREPARE statement_name FROM statement
```

The following table describes the parameters.


| Parameter | Description | 
| --- | --- | 
| statement\_name | The name of the statement to be prepared. The name must be unique within the workgroup. | 
| statement | A SELECT, CTAS, or INSERT INTO query. | 

**Note**  
The maximum number of prepared statements in a workgroup is 1000.

## Examples
<a name="sql-prepare-examples"></a>

The following example prepares a select query without parameters.

```
PREPARE my_select1 FROM 
SELECT * FROM nation
```

The following example prepares a select query that includes parameters. The values for `productid` and `quantity` will be supplied by the `USING` clause of an `EXECUTE` statement:

```
PREPARE my_select2 FROM 
SELECT order FROM orders WHERE productid = ? and quantity < ?
```

The following example prepares an insert query.

```
PREPARE my_insert FROM 
INSERT INTO cities_usa (city, state) 
SELECT city, state 
FROM cities_world 
WHERE country = ?
```

## Additional resources
<a name="sql-prepare-additional-resources"></a>

[Use prepared statements](querying-with-prepared-statements-querying.md)

[EXECUTE](sql-execute.md)

[DEALLOCATE PREPARE](sql-deallocate-prepare.md)

[INSERT INTO](insert-into.md)

# EXECUTE
<a name="sql-execute"></a>

Runs a prepared statement with the name `statement_name`. Parameter values for the question marks in the prepared statement are defined in the `USING` clause in a comma-separated list. To create a prepared statement, use [PREPARE](sql-prepare.md).

## Synopsis
<a name="sql-execute-synopsis"></a>

```
EXECUTE statement_name [ USING parameter1[, parameter2, ... ] ]
```
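Conceptually, `USING` binds values to the question marks positionally, left to right. The following sketch shows that pairing for illustration only; it is not how Athena processes statements, which bind parameters server-side:

```python
def bind_parameters(sql, params):
    """Substitute values for '?' placeholders positionally, left to right."""
    parts = sql.split("?")
    if len(parts) - 1 != len(params):
        raise ValueError("parameter count does not match placeholder count")
    bound = parts[0]
    for value, part in zip(params, parts[1:]):
        # quote strings as SQL literals; pass numbers through unchanged
        literal = "'" + value.replace("'", "''") + "'" if isinstance(value, str) else str(value)
        bound += literal + part
    return bound
```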

## Examples
<a name="sql-execute-examples"></a>

The following example prepares and executes a query with no parameters.

```
PREPARE my_select1 FROM 
SELECT name FROM nation 
EXECUTE my_select1
```

The following example prepares and executes a query with a single parameter.

```
PREPARE my_select2 FROM 
SELECT * FROM "my_database"."my_table" WHERE year = ? 
EXECUTE my_select2 USING 2012
```

This is equivalent to:

```
SELECT * FROM "my_database"."my_table" WHERE year = 2012
```

The following example prepares and executes a query with two parameters.

```
PREPARE my_select3 FROM 
SELECT order FROM orders WHERE productid = ? and quantity < ? 
EXECUTE my_select3 USING 346078, 12
```

## Additional resources
<a name="sql-execute-additional-resources"></a>

[Use prepared statements](querying-with-prepared-statements-querying.md)

[PREPARE](sql-prepare.md)

[INSERT INTO](insert-into.md)

# DEALLOCATE PREPARE
<a name="sql-deallocate-prepare"></a>

Removes the prepared statement with the specified name from the prepared statements in the current workgroup.

## Synopsis
<a name="sql-deallocate-prepare-synopsis"></a>

```
DEALLOCATE PREPARE statement_name
```

## Examples
<a name="sql-deallocate-prepare-examples"></a>

The following example removes the `my_select1` prepared statement from the current workgroup.

```
DEALLOCATE PREPARE my_select1
```

## Additional resources
<a name="sql-deallocate-prepare-additional-resources"></a>

[Use prepared statements](querying-with-prepared-statements-querying.md)

[PREPARE](sql-prepare.md)

# UNLOAD
<a name="unload"></a>

Writes query results from a `SELECT` statement to the specified data format. Supported formats for `UNLOAD` include Apache Parquet, ORC, Apache Avro, and JSON. CSV is the only output format that the Athena `SELECT` command supports, but you can wrap your `SELECT` query in an `UNLOAD` statement to write its output in any of the formats that `UNLOAD` supports. 

Although you can use the `CREATE TABLE AS` (CTAS) statement to output data in formats other than CSV, CTAS statements require the creation of a table in Athena. The `UNLOAD` statement is useful when you want to output the results of a `SELECT` query in a non-CSV format but do not want the associated table. For example, a downstream application might require the results of a `SELECT` query to be in JSON format, and Parquet or ORC might provide a performance advantage over CSV if you intend to use the results of the `SELECT` query for additional analysis.

## Considerations and limitations
<a name="unload-considerations-and-limitations"></a>

When you use the `UNLOAD` statement in Athena, keep in mind the following points:
+ **No global ordering of files** – `UNLOAD` results are written to multiple files in parallel. If the `SELECT` query in the `UNLOAD` statement specifies a sort order, each file's contents are in sorted order, but the files are not sorted relative to each other.
+ **Orphaned data not deleted** – In the case of a failure, Athena does not attempt to delete orphaned data. This behavior is the same as that for CTAS and `INSERT INTO` statements.
+ **Maximum partitions** – The maximum number of partitions that can be used with `UNLOAD` is 100.
+ **Metadata and manifest files** – Athena generates a metadata file and data manifest file for each `UNLOAD` query. The manifest tracks the files that the query wrote. Both files are saved to your Athena query result location in Amazon S3. For more information, see [Identify query output files](querying-finding-output-files.md#querying-identifying-output-files).
+ **Encryption** – `UNLOAD` output files are encrypted according to the encryption configuration used for Amazon S3. To set up encryption configuration to encrypt your `UNLOAD` result, you can use the [EncryptionConfiguration API](https://docs.aws.amazon.com/athena/latest/APIReference/API_EncryptionConfiguration.html).
+ **Prepared statements** – `UNLOAD` can be used with prepared statements. For information about prepared statements in Athena, see [Use parameterized queries](querying-with-prepared-statements.md).
+ **Service quotas** – `UNLOAD` uses DML query quotas. For quota information, see [Service Quotas](service-limits.md).
+ **Expected bucket owner** – The expected bucket owner setting does not apply to the destination Amazon S3 location specified in the `UNLOAD` query. The expected bucket owner setting applies only to the Amazon S3 output location that you specify for Athena query results. For more information, see [Specify a query result location using the Athena console](query-results-specify-location-console.md).

## Syntax
<a name="unload-syntax"></a>

The `UNLOAD` statement uses the following syntax.

```
UNLOAD (SELECT col_name[, ...] FROM old_table) 
TO 's3://amzn-s3-demo-bucket/my_folder/' 
WITH ( property_name = 'expression' [, ...] )
```

Except when writing to partitions, the `TO` destination must specify a location in Amazon S3 that has no data. Before the `UNLOAD` query writes to the location specified, it verifies that the bucket location is empty. Because `UNLOAD` does not write data to the specified location if the location already has data in it, `UNLOAD` does not overwrite existing data. To reuse a bucket location as a destination for `UNLOAD`, delete the data in the bucket location, and then run the query again. 

Note that when `UNLOAD` writes to partitions, this behavior is different. If you run the same `UNLOAD` query multiple times with the same `SELECT` statement, the same `TO` location, and the same partitions, each run unloads data again to the specified location and partitions.

### Parameters
<a name="unload-parameters"></a>

Possible values for *property\_name* are as follows.

**format = '*file\_format*'**  
Required. Specifies the file format of the output. Possible values for *file\_format* are `ORC`, `PARQUET`, `AVRO`, `JSON`, or `TEXTFILE`.

**compression = '*compression\_format*'**  
Optional. This option is specific to the ORC and Parquet formats. For ORC, the default is `zlib`, and for Parquet, the default is `gzip`. For information about supported compression formats, see [Athena compression support](https://docs.aws.amazon.com/athena/latest/ug/compression-formats.html).   
This option does not apply to the `AVRO` format. Athena uses `gzip` for the `JSON` and `TEXTFILE` formats.

**compression\_level = *compression\_level***  
Optional. The compression level to use for ZSTD compression. This property applies only to ZSTD compression. For more information, see [Use ZSTD compression levels](compression-support-zstd-levels.md).

**field\_delimiter = '*delimiter*'**  
Optional. Specifies a single-character field delimiter for files in CSV, TSV, and other text formats. The following example specifies a comma delimiter.  

```
WITH (field_delimiter = ',')
```
Currently, multicharacter field delimiters are not supported. If you do not specify a field delimiter, the octal character `\001` (^A) is used.

**partitioned\_by = ARRAY[ *col\_name* [, ...] ]**  
Optional. An array list of columns by which the output is partitioned.  
In your `SELECT` statement, make sure that the names of the partitioned columns are last in your list of columns. 

## Examples
<a name="unload-examples"></a>

The following example writes the output of a `SELECT` query to the Amazon S3 location `s3://amzn-s3-demo-bucket/unload_test_1/` using JSON format.

```
UNLOAD (SELECT * FROM old_table) 
TO 's3://amzn-s3-demo-bucket/unload_test_1/' 
WITH (format = 'JSON')
```

The following example writes the output of a `SELECT` query in Parquet format using Snappy compression.

```
UNLOAD (SELECT * FROM old_table) 
TO 's3://amzn-s3-demo-bucket/' 
WITH (format = 'PARQUET',compression = 'SNAPPY')
```

The following example writes four columns in text format, with the output partitioned by the last column.

```
UNLOAD (SELECT name1, address1, comment1, key1 FROM table1) 
TO 's3://amzn-s3-demo-bucket/partitioned/' 
WITH (format = 'TEXTFILE', partitioned_by = ARRAY['key1'])
```

The following example unloads the query results to the specified location using the Parquet file format, ZSTD compression, and ZSTD compression level 4.

```
UNLOAD (SELECT * FROM old_table) 
TO 's3://amzn-s3-demo-bucket/' 
WITH (format = 'PARQUET', compression = 'ZSTD', compression_level = 4)
```

## Additional resources
<a name="unload-additional-resources"></a>
+ [Simplify your ETL and ML pipelines using the Amazon Athena UNLOAD feature](https://aws.amazon.com/blogs/big-data/simplify-your-etl-and-ml-pipelines-using-the-amazon-athena-unload-feature/) in the *AWS Big Data Blog*. 

# Functions in Amazon Athena
<a name="functions"></a>

For changes in functions between Athena engine versions, see [Athena engine versioning](engine-versions.md). For a list of the time zones that can be used with the `AT TIME ZONE` operator, see [Use supported time zones](athena-supported-time-zones.md).

**Topics**
+ [Athena engine version 3](functions-env3.md)

# Athena engine version 3 functions
<a name="functions-env3"></a>

Functions in Athena engine version 3 are based on Trino. For information about Trino functions, operators, and expressions, see [Functions and operators](https://trino.io/docs/current/functions.html) and the following subsections from the Trino documentation.
+  [Aggregate](https://trino.io/docs/current/functions/aggregate.html) 
+  [Array](https://trino.io/docs/current/functions/array.html) 
+  [Binary](https://trino.io/docs/current/functions/binary.html) 
+  [Bitwise](https://trino.io/docs/current/functions/bitwise.html) 
+  [Color](https://trino.io/docs/current/functions/color.html) 
+  [Comparison](https://trino.io/docs/current/functions/comparison.html) 
+  [Conditional](https://trino.io/docs/current/functions/conditional.html) 
+  [Conversion](https://trino.io/docs/current/functions/conversion.html) 
+  [Date and time](https://trino.io/docs/current/functions/datetime.html) 
+  [Decimal](https://trino.io/docs/current/functions/decimal.html) 
+  [Geospatial](https://trino.io/docs/current/functions/geospatial.html) 
+  [HyperLogLog](https://trino.io/docs/current/functions/hyperloglog.html) 
+  [IP Address](https://trino.io/docs/current/functions/ipaddress.html) 
+  [JSON](https://trino.io/docs/current/functions/json.html) 
+  [Lambda](https://trino.io/docs/current/functions/lambda.html) 
+  [Logical](https://trino.io/docs/current/functions/logical.html) 
+  [Machine learning](https://trino.io/docs/current/functions/ml.html) 
+  [Map](https://trino.io/docs/current/functions/map.html) 
+  [Math](https://trino.io/docs/current/functions/math.html) 
+  [Quantile digest](https://trino.io/docs/current/functions/qdigest.html) 
+  [Regular expression](https://trino.io/docs/current/functions/regexp.html) 
+  [Session](https://trino.io/docs/current/functions/session.html) 
+  [Set Digest](https://trino.io/docs/current/functions/setdigest.html) 
+  [String](https://trino.io/docs/current/functions/string.html) 
+  [Table](https://trino.io/docs/current/functions/table.html) 
+  [Teradata](https://trino.io/docs/current/functions/teradata.html) 
+  [T-Digest](https://trino.io/docs/current/functions/tdigest.html) 
+  [URL](https://trino.io/docs/current/functions/url.html) 
+  [UUID](https://trino.io/docs/current/functions/uuid.html) 
+  [Window](https://trino.io/docs/current/functions/window.html) 

## invoker\_principal() function
<a name="functions-env3-invoker-principal"></a>

The `invoker_principal` function is unique to Athena engine version 3 and is not found in Trino.

Returns a `VARCHAR` that contains the ARN of the principal (IAM role or Identity Center identity) that ran the query calling the function. For example, if the query invoker uses the permissions of an IAM role to run the query, the function returns the ARN of the IAM role. The role that runs the query must allow the `LakeFormation:GetDataLakePrincipal` action. 

### Usage
<a name="functions-invoker-principal-usage"></a>

```
SELECT invoker_principal()
```

The following table shows an example result.


| # | \_col0 | 
| --- | --- | 
| 1 | arn:aws:iam::111122223333:role/Admin | 

# Use supported time zones
<a name="athena-supported-time-zones"></a>

You can use the `AT TIME ZONE` operator in a `SELECT timestamp` statement to specify the time zone for the timestamp that is returned, as in the following example:

```
SELECT timestamp '2012-10-31 01:00 UTC' AT TIME ZONE 'America/Los_Angeles' AS la_time;
```

**Results**

```
la_time

2012-10-30 18:00:00.000 America/Los_Angeles
```

For a list of supported time zones in Athena, expand the [List of supported time zones](#athena-supported-time-zones-list) at the end of this topic.

## Timezone functions and examples
<a name="athena-supported-time-zones-functions-examples"></a>

Following are some additional timezone-related functions and examples.
+ **at\_timezone(*timestamp*, *zone*)** – Returns the value of *timestamp* in the corresponding local time for *zone*.

  **Example**

  ```
  SELECT at_timezone(timestamp '2021-08-22 00:00 UTC', 'Canada/Newfoundland')
  ```

  **Result**

  ```
  2021-08-21 21:30:00.000 Canada/Newfoundland
  ```
+ **timezone\_hour(*timestamp*)** – Returns the hour of the time zone offset from *timestamp* as a `bigint`.

  **Example**

  ```
  SELECT timezone_hour(timestamp '2021-08-22 04:00 UTC' AT TIME ZONE 'Canada/Newfoundland')
  ```

  **Result**

  ```
  -2
  ```
+ **timezone\_minute(*timestamp*)** – Returns the minute of the time zone offset from *timestamp* as a `bigint`.

  **Example**

  ```
  SELECT timezone_minute(timestamp '2021-08-22 04:00 UTC' AT TIME ZONE 'Canada/Newfoundland')
  ```

  **Result**

  ```
  -30
  ```
+ **with\_timezone(*timestamp*, *zone*)** – Returns a timestamp with time zone from the specified *timestamp* and *zone* values.

  **Example**

  ```
  SELECT with_timezone(timestamp '2021-08-22 04:00', 'Canada/Newfoundland')
  ```

  **Result**

  ```
  2021-08-22 04:00:00.000 Canada/Newfoundland
  ```
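The `timezone_hour` and `timezone_minute` results above are the two components of a UTC offset. The equivalent computation in Python's standard library, shown only to clarify the semantics and not as an Athena interface:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ts = datetime(2021, 8, 22, 4, 0, tzinfo=timezone.utc)
local = ts.astimezone(ZoneInfo("Canada/Newfoundland"))

# Canada/Newfoundland observes UTC-03:30 standard time and UTC-02:30 during DST
total_minutes = int(local.utcoffset().total_seconds() // 60)   # -150
hours = int(total_minutes / 60)        # truncates toward zero: -2
minutes = total_minutes - hours * 60   # remainder keeps the sign: -30
```

These match the `timezone_hour` and `timezone_minute` example results of `-2` and `-30`.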

## List of supported time zones
<a name="athena-supported-time-zones-list"></a>

The following list contains the time zones that can be used with the `AT TIME ZONE` operator in Athena. For additional timezone-related functions and examples, see [Timezone functions and examples](#athena-supported-time-zones-functions-examples).

```
Africa/Abidjan
Africa/Accra
Africa/Addis_Ababa
Africa/Algiers
Africa/Asmara
Africa/Asmera
Africa/Bamako
Africa/Bangui
Africa/Banjul
Africa/Bissau
Africa/Blantyre
Africa/Brazzaville
Africa/Bujumbura
Africa/Cairo
Africa/Casablanca
Africa/Ceuta
Africa/Conakry
Africa/Dakar
Africa/Dar_es_Salaam
Africa/Djibouti
Africa/Douala
Africa/El_Aaiun
Africa/Freetown
Africa/Gaborone
Africa/Harare
Africa/Johannesburg
Africa/Juba
Africa/Kampala
Africa/Khartoum
Africa/Kigali
Africa/Kinshasa
Africa/Lagos
Africa/Libreville
Africa/Lome
Africa/Luanda
Africa/Lubumbashi
Africa/Lusaka
Africa/Malabo
Africa/Maputo
Africa/Maseru
Africa/Mbabane
Africa/Mogadishu
Africa/Monrovia
Africa/Nairobi
Africa/Ndjamena
Africa/Niamey
Africa/Nouakchott
Africa/Ouagadougou
Africa/Porto-Novo
Africa/Sao_Tome
Africa/Timbuktu
Africa/Tripoli
Africa/Tunis
Africa/Windhoek
America/Adak
America/Anchorage
America/Anguilla
America/Antigua
America/Araguaina
America/Argentina/Buenos_Aires
America/Argentina/Catamarca
America/Argentina/ComodRivadavia
America/Argentina/Cordoba
America/Argentina/Jujuy
America/Argentina/La_Rioja
America/Argentina/Mendoza
America/Argentina/Rio_Gallegos
America/Argentina/Salta
America/Argentina/San_Juan
America/Argentina/San_Luis
America/Argentina/Tucuman
America/Argentina/Ushuaia
America/Aruba
America/Asuncion
America/Atikokan
America/Atka
America/Bahia
America/Bahia_Banderas
America/Barbados
America/Belem
America/Belize
America/Blanc-Sablon
America/Boa_Vista
America/Bogota
America/Boise
America/Buenos_Aires
America/Cambridge_Bay
America/Campo_Grande
America/Cancun
America/Caracas
America/Catamarca
America/Cayenne
America/Cayman
America/Chicago
America/Chihuahua
America/Coral_Harbour
America/Cordoba
America/Costa_Rica
America/Creston
America/Cuiaba
America/Curacao
America/Danmarkshavn
America/Dawson
America/Dawson_Creek
America/Denver
America/Detroit
America/Dominica
America/Edmonton
America/Eirunepe
America/El_Salvador
America/Ensenada
America/Fort_Nelson
America/Fort_Wayne
America/Fortaleza
America/Glace_Bay
America/Godthab
America/Goose_Bay
America/Grand_Turk
America/Grenada
America/Guadeloupe
America/Guatemala
America/Guayaquil
America/Guyana
America/Halifax
America/Havana
America/Hermosillo
America/Indiana/Indianapolis
America/Indiana/Knox
America/Indiana/Marengo
America/Indiana/Petersburg
America/Indiana/Tell_City
America/Indiana/Vevay
America/Indiana/Vincennes
America/Indiana/Winamac
America/Indianapolis
America/Inuvik
America/Iqaluit
America/Jamaica
America/Jujuy
America/Juneau
America/Kentucky/Louisville
America/Kentucky/Monticello
America/Knox_IN
America/Kralendijk
America/La_Paz
America/Lima
America/Los_Angeles
America/Louisville
America/Lower_Princes
America/Maceio
America/Managua
America/Manaus
America/Marigot
America/Martinique
America/Matamoros
America/Mazatlan
America/Mendoza
America/Menominee
America/Merida
America/Metlakatla
America/Mexico_City
America/Miquelon
America/Moncton
America/Monterrey
America/Montevideo
America/Montreal
America/Montserrat
America/Nassau
America/New_York
America/Nipigon
America/Nome
America/Noronha
America/North_Dakota/Beulah
America/North_Dakota/Center
America/North_Dakota/New_Salem
America/Ojinaga
America/Panama
America/Pangnirtung
America/Paramaribo
America/Phoenix
America/Port-au-Prince
America/Port_of_Spain
America/Porto_Acre
America/Porto_Velho
America/Puerto_Rico
America/Punta_Arenas
America/Rainy_River
America/Rankin_Inlet
America/Recife
America/Regina
America/Resolute
America/Rio_Branco
America/Rosario
America/Santa_Isabel
America/Santarem
America/Santiago
America/Santo_Domingo
America/Sao_Paulo
America/Scoresbysund
America/Shiprock
America/Sitka
America/St_Barthelemy
America/St_Johns
America/St_Kitts
America/St_Lucia
America/St_Thomas
America/St_Vincent
America/Swift_Current
America/Tegucigalpa
America/Thule
America/Thunder_Bay
America/Tijuana
America/Toronto
America/Tortola
America/Vancouver
America/Virgin
America/Whitehorse
America/Winnipeg
America/Yakutat
America/Yellowknife
Antarctica/Casey
Antarctica/Davis
Antarctica/DumontDUrville
Antarctica/Macquarie
Antarctica/Mawson
Antarctica/McMurdo
Antarctica/Palmer
Antarctica/Rothera
Antarctica/South_Pole
Antarctica/Syowa
Antarctica/Troll
Antarctica/Vostok
Arctic/Longyearbyen
Asia/Aden
Asia/Almaty
Asia/Amman
Asia/Anadyr
Asia/Aqtau
Asia/Aqtobe
Asia/Ashgabat
Asia/Ashkhabad
Asia/Atyrau
Asia/Baghdad
Asia/Bahrain
Asia/Baku
Asia/Bangkok
Asia/Barnaul
Asia/Beirut
Asia/Bishkek
Asia/Brunei
Asia/Calcutta
Asia/Chita
Asia/Choibalsan
Asia/Chongqing
Asia/Chungking
Asia/Colombo
Asia/Dacca
Asia/Damascus
Asia/Dhaka
Asia/Dili
Asia/Dubai
Asia/Dushanbe
Asia/Gaza
Asia/Harbin
Asia/Hebron
Asia/Ho_Chi_Minh
Asia/Hong_Kong
Asia/Hovd
Asia/Irkutsk
Asia/Istanbul
Asia/Jakarta
Asia/Jayapura
Asia/Jerusalem
Asia/Kabul
Asia/Kamchatka
Asia/Karachi
Asia/Kashgar
Asia/Kathmandu
Asia/Katmandu
Asia/Khandyga
Asia/Kolkata
Asia/Krasnoyarsk
Asia/Kuala_Lumpur
Asia/Kuching
Asia/Kuwait
Asia/Macao
Asia/Macau
Asia/Magadan
Asia/Makassar
Asia/Manila
Asia/Muscat
Asia/Nicosia
Asia/Novokuznetsk
Asia/Novosibirsk
Asia/Omsk
Asia/Oral
Asia/Phnom_Penh
Asia/Pontianak
Asia/Pyongyang
Asia/Qatar
Asia/Qyzylorda
Asia/Rangoon
Asia/Riyadh
Asia/Saigon
Asia/Sakhalin
Asia/Samarkand
Asia/Seoul
Asia/Shanghai
Asia/Singapore
Asia/Srednekolymsk
Asia/Taipei
Asia/Tashkent
Asia/Tbilisi
Asia/Tehran
Asia/Tel_Aviv
Asia/Thimbu
Asia/Thimphu
Asia/Tokyo
Asia/Tomsk
Asia/Ujung_Pandang
Asia/Ulaanbaatar
Asia/Ulan_Bator
Asia/Urumqi
Asia/Ust-Nera
Asia/Vientiane
Asia/Vladivostok
Asia/Yakutsk
Asia/Yangon
Asia/Yekaterinburg
Asia/Yerevan
Atlantic/Azores
Atlantic/Bermuda
Atlantic/Canary
Atlantic/Cape_Verde
Atlantic/Faeroe
Atlantic/Faroe
Atlantic/Jan_Mayen
Atlantic/Madeira
Atlantic/Reykjavik
Atlantic/South_Georgia
Atlantic/St_Helena
Atlantic/Stanley
Australia/ACT
Australia/Adelaide
Australia/Brisbane
Australia/Broken_Hill
Australia/Canberra
Australia/Currie
Australia/Darwin
Australia/Eucla
Australia/Hobart
Australia/LHI
Australia/Lindeman
Australia/Lord_Howe
Australia/Melbourne
Australia/NSW
Australia/North
Australia/Perth
Australia/Queensland
Australia/South
Australia/Sydney
Australia/Tasmania
Australia/Victoria
Australia/West
Australia/Yancowinna
Brazil/Acre
Brazil/DeNoronha
Brazil/East
Brazil/West
CET
CST6CDT
Canada/Atlantic
Canada/Central
Canada/Eastern
Canada/Mountain
Canada/Newfoundland
Canada/Pacific
Canada/Saskatchewan
Canada/Yukon
Chile/Continental
Chile/EasterIsland
Cuba
EET
EST5EDT
Egypt
Eire
Europe/Amsterdam
Europe/Andorra
Europe/Astrakhan
Europe/Athens
Europe/Belfast
Europe/Belgrade
Europe/Berlin
Europe/Bratislava
Europe/Brussels
Europe/Bucharest
Europe/Budapest
Europe/Busingen
Europe/Chisinau
Europe/Copenhagen
Europe/Dublin
Europe/Gibraltar
Europe/Guernsey
Europe/Helsinki
Europe/Isle_of_Man
Europe/Istanbul
Europe/Jersey
Europe/Kaliningrad
Europe/Kiev
Europe/Kirov
Europe/Lisbon
Europe/Ljubljana
Europe/London
Europe/Luxembourg
Europe/Madrid
Europe/Malta
Europe/Mariehamn
Europe/Minsk
Europe/Monaco
Europe/Moscow
Europe/Nicosia
Europe/Oslo
Europe/Paris
Europe/Podgorica
Europe/Prague
Europe/Riga
Europe/Rome
Europe/Samara
Europe/San_Marino
Europe/Sarajevo
Europe/Simferopol
Europe/Skopje
Europe/Sofia
Europe/Stockholm
Europe/Tallinn
Europe/Tirane
Europe/Tiraspol
Europe/Ulyanovsk
Europe/Uzhgorod
Europe/Vaduz
Europe/Vatican
Europe/Vienna
Europe/Vilnius
Europe/Volgograd
Europe/Warsaw
Europe/Zagreb
Europe/Zaporozhye
Europe/Zurich
GB
GB-Eire
Hongkong
Iceland
Indian/Antananarivo
Indian/Chagos
Indian/Christmas
Indian/Cocos
Indian/Comoro
Indian/Kerguelen
Indian/Mahe
Indian/Maldives
Indian/Mauritius
Indian/Mayotte
Indian/Reunion
Iran
Israel
Jamaica
Japan
Kwajalein
Libya
MET
MST7MDT
Mexico/BajaNorte
Mexico/BajaSur
Mexico/General
NZ
NZ-CHAT
Navajo
PRC
PST8PDT
Pacific/Apia
Pacific/Auckland
Pacific/Bougainville
Pacific/Chatham
Pacific/Chuuk
Pacific/Easter
Pacific/Efate
Pacific/Enderbury
Pacific/Fakaofo
Pacific/Fiji
Pacific/Funafuti
Pacific/Galapagos
Pacific/Gambier
Pacific/Guadalcanal
Pacific/Guam
Pacific/Honolulu
Pacific/Johnston
Pacific/Kiritimati
Pacific/Kosrae
Pacific/Kwajalein
Pacific/Majuro
Pacific/Marquesas
Pacific/Midway
Pacific/Nauru
Pacific/Niue
Pacific/Norfolk
Pacific/Noumea
Pacific/Pago_Pago
Pacific/Palau
Pacific/Pitcairn
Pacific/Pohnpei
Pacific/Ponape
Pacific/Port_Moresby
Pacific/Rarotonga
Pacific/Saipan
Pacific/Samoa
Pacific/Tahiti
Pacific/Tarawa
Pacific/Tongatapu
Pacific/Truk
Pacific/Wake
Pacific/Wallis
Pacific/Yap
Poland
Portugal
ROK
Singapore
Turkey
US/Alaska
US/Aleutian
US/Arizona
US/Central
US/East-Indiana
US/Eastern
US/Hawaii
US/Indiana-Starke
US/Michigan
US/Mountain
US/Pacific
US/Pacific-New
US/Samoa
W-SU
WET
```