

# Using data lake frameworks with AWS Glue ETL jobs
<a name="aws-glue-programming-etl-datalake-native-frameworks"></a>

Open-source data lake frameworks simplify incremental data processing for files that you store in data lakes built on Amazon S3. AWS Glue 3.0 and later supports the following open-source data lake frameworks:
+ Apache Hudi
+ Linux Foundation Delta Lake
+ Apache Iceberg

We provide native support for these frameworks so that you can read and write data that you store in Amazon S3 in a transactionally consistent manner. There's no need to install a separate connector or complete extra configuration steps in order to use these frameworks in AWS Glue ETL jobs.

When you manage datasets through the AWS Glue Data Catalog, you can use AWS Glue methods to read and write data lake tables with Spark DataFrames. You can also read and write Amazon S3 data using the Spark DataFrame API.

In this video, you can learn the basics of how Apache Hudi, Apache Iceberg, and Delta Lake work, including how to insert, update, and delete data in your data lake with each framework.

[![AWS Videos](http://img.youtube.com/vi/fryfx0Zg7KA/0.jpg)](http://www.youtube.com/watch?v=fryfx0Zg7KA)


**Topics**
+ [Limitations](aws-glue-programming-etl-datalake-native-frameworks-limitations.md)
+ [Using the Hudi framework in AWS Glue](aws-glue-programming-etl-format-hudi.md)
+ [Using the Delta Lake framework in AWS Glue](aws-glue-programming-etl-format-delta-lake.md)
+ [Using the Iceberg framework in AWS Glue](aws-glue-programming-etl-format-iceberg.md)

# Limitations
<a name="aws-glue-programming-etl-datalake-native-frameworks-limitations"></a>

Consider the following limitations before you use data lake frameworks with AWS Glue.
+ The following AWS Glue `GlueContext` methods for DynamicFrame don't support reading and writing data lake framework tables. Use the `GlueContext` methods for DataFrame or the Spark DataFrame API instead.
  + `create_dynamic_frame.from_catalog`
  + `write_dynamic_frame.from_catalog`
  + `getDynamicFrame`
  + `writeDynamicFrame`
+ The following `GlueContext` methods for DataFrame are supported with Lake Formation permission control:
  + `create_data_frame.from_catalog`
  + `write_data_frame.from_catalog`
  + `getDataFrame`
  + `writeDataFrame`
+ [Grouping small files](grouping-input-files.md) is not supported.
+ [Job bookmarks](monitor-continuations.md) are not supported.
+ Apache Hudi 0.10.1 for AWS Glue 3.0 doesn't support Hudi Merge on Read (MoR) tables.
+ `ALTER TABLE … RENAME TO` is not available for Apache Iceberg 0.13.1 for AWS Glue 3.0.

## Limitations for data lake format tables managed by Lake Formation permissions
<a name="w2aac67c11c24c11c31c17b7"></a>

The data lake formats are integrated with AWS Glue ETL via Lake Formation permissions. Creating a DynamicFrame using `create_dynamic_frame` is not supported. For more information, see the following examples:
+ [Example: Read and write Iceberg table with Lake Formation permission control](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-iceberg.html#aws-glue-programming-etl-format-iceberg-read-write-lake-formation-tables)
+ [Example: Read and write Hudi table with Lake Formation permission control](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html#aws-glue-programming-etl-format-hudi-read-write-lake-formation-tables)
+ [Example: Read and write Delta Lake table with Lake Formation permission control](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-delta-lake.html#aws-glue-programming-etl-format-delta-lake-read-write-lake-formation-tables)

**Note**  
The integration with AWS Glue ETL via Lake Formation permissions for Apache Hudi, Apache Iceberg, and Delta Lake is supported only in AWS Glue version 4.0.

Apache Iceberg has the most complete integration with AWS Glue ETL via Lake Formation permissions. It supports almost all operations and includes SQL support.

Hudi supports most basic operations, with the exception of administrative operations, because those operations are generally performed by writing DataFrames with settings passed through `additional_options`. Because Spark SQL is not supported, you need to use the AWS Glue APIs to create DataFrames for your operations.

Delta Lake supports only reading, appending to, and overwriting table data. Other tasks, such as updates, require Delta Lake's own libraries.

The following features are not available for Iceberg tables managed by Lake Formation permissions:
+ Compaction using AWS Glue ETL
+ Spark SQL support via AWS Glue ETL

The following are limitations of Hudi tables managed by Lake Formation permissions:
+ Removal of orphaned files

The following are limitations of Delta Lake tables managed by Lake Formation permissions:
+ All features other than inserting into and reading from Delta Lake tables.

# Using the Hudi framework in AWS Glue
<a name="aws-glue-programming-etl-format-hudi"></a>

AWS Glue 3.0 and later supports the Apache Hudi framework for data lakes. Hudi is an open-source data lake storage framework that simplifies incremental data processing and data pipeline development. This topic covers available features for using your data in AWS Glue when you transport or store your data in a Hudi table. To learn more about Hudi, see the official [Apache Hudi documentation](https://hudi.apache.org/docs/overview/).

You can use AWS Glue to perform read and write operations on Hudi tables in Amazon S3, or work with Hudi tables using the AWS Glue Data Catalog. Additional operations, such as insert and update, as well as all of the [Apache Spark operations](https://hudi.apache.org/docs/quick-start-guide/), are also supported.

**Note**  
The Apache Hudi 0.15.0 implementation in AWS Glue 5.0 internally reverts [HUDI-7001](https://github.com/apache/hudi/pull/9936). It does not exhibit the regression related to complex key generation when the record key consists of a single field. However, this behavior differs from OSS Apache Hudi 0.15.0.  
Apache Hudi 0.10.1 for AWS Glue 3.0 doesn't support Hudi Merge on Read (MoR) tables.

The following table lists the Hudi version that is included in each AWS Glue version.



| AWS Glue version | Supported Hudi version | 
| --- | --- | 
| 5.1 | 1.0.2 | 
| 5.0 | 0.15.0 | 
| 4.0 | 0.12.1 | 
| 3.0 | 0.10.1 | 

To learn more about the data lake frameworks that AWS Glue supports, see [Using data lake frameworks with AWS Glue ETL jobs](aws-glue-programming-etl-datalake-native-frameworks.md).

## Enabling Hudi
<a name="aws-glue-programming-etl-format-hudi-enable"></a>

To enable Hudi for AWS Glue, complete the following tasks:
+ Specify `hudi` as a value for the `--datalake-formats` job parameter. For more information, see [Using job parameters in AWS Glue jobs](aws-glue-programming-etl-glue-arguments.md).
+ Create a key named `--conf` for your AWS Glue job, and set it to the following value. Alternatively, you can set the following configuration using `SparkConf` in your script. These settings help Apache Spark correctly handle Hudi tables.

  ```
  spark.serializer=org.apache.spark.serializer.KryoSerializer
  ```
+ Lake Formation permission support for Hudi is enabled by default for AWS Glue 4.0. No additional configuration is needed for reading/writing to Lake Formation-registered Hudi tables. To read a registered Hudi table, the AWS Glue job IAM role must have the SELECT permission. To write to a registered Hudi table, the AWS Glue job IAM role must have the SUPER permission. To learn more about managing Lake Formation permissions, see [Granting and revoking permissions on Data Catalog resources](https://docs.aws.amazon.com/lake-formation/latest/dg/granting-catalog-permissions.html).
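
Taken together, the parameters above can be sketched as a job's default arguments. The dict below is illustrative (you'd supply these values through the console, the AWS CLI, or the `CreateJob` API); the parameter names and the `--conf` value come from this section.

```python
# Illustrative sketch: default arguments for an AWS Glue job with Hudi enabled.
default_arguments = {
    # Enables the built-in Hudi libraries for the job.
    "--datalake-formats": "hudi",
    # Helps Spark handle Hudi tables correctly.
    "--conf": "spark.serializer=org.apache.spark.serializer.KryoSerializer",
}
```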

**Using a different Hudi version**

To use a version of Hudi that AWS Glue doesn't support, specify your own Hudi JAR files using the `--extra-jars` job parameter. Do not include `hudi` as a value for the `--datalake-formats` job parameter. If you use AWS Glue 5.0 or later, you must also set the `--user-jars-first` job parameter to `true`.
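
Sketched as default arguments, the bring-your-own-version setup might look like the following. The S3 path and JAR name are hypothetical placeholders; note that `--datalake-formats` is deliberately absent.

```python
# Illustrative sketch: job arguments when supplying your own Hudi JARs.
default_arguments = {
    # Hypothetical S3 path to your Hudi bundle JAR.
    "--extra-jars": "s3://amzn-s3-demo-bucket/jars/hudi-custom-bundle.jar",
    # Required on AWS Glue 5.0 and later when bringing your own JARs.
    "--user-jars-first": "true",
}
```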

## Example: Write a Hudi table to Amazon S3 and register it in the AWS Glue Data Catalog
<a name="aws-glue-programming-etl-format-hudi-write"></a>

This example script demonstrates how to write a Hudi table to Amazon S3 and register the table to the AWS Glue Data Catalog. The example uses the Hudi [Hive Sync tool](https://hudi.apache.org/docs/syncing_metastore/) to register the table.

**Note**  
This example requires you to set the `--enable-glue-datacatalog` job parameter in order to use the AWS Glue Data Catalog as an Apache Spark Hive metastore. To learn more, see [Using job parameters in AWS Glue jobs](aws-glue-programming-etl-glue-arguments.md).

------
#### [ Python ]

```
# Example: Create a Hudi table from a DataFrame 
# and register the table to Glue Data Catalog

additional_options={
    "hoodie.table.name": "<your_table_name>",
    "hoodie.database.name": "<your_database_name>",
    "hoodie.datasource.write.storage.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.operation": "upsert",
    "hoodie.datasource.write.recordkey.field": "<your_recordkey_field>",
    "hoodie.datasource.write.precombine.field": "<your_precombine_field>",
    "hoodie.datasource.write.partitionpath.field": "<your_partitionkey_field>",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.database": "<your_database_name>",
    "hoodie.datasource.hive_sync.table": "<your_table_name>",
    "hoodie.datasource.hive_sync.partition_fields": "<your_partitionkey_field>",
    "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
    "hoodie.datasource.hive_sync.use_jdbc": "false",
    "hoodie.datasource.hive_sync.mode": "hms",
    "path": "s3://<s3Path/>"
}

dataFrame.write.format("hudi") \
    .options(**additional_options) \
    .mode("overwrite") \
    .save()
```

------
#### [ Scala ]

```
// Example: Create a Hudi table from a DataFrame
// and register the table to Glue Data Catalog

val additionalOptions = Map(
  "hoodie.table.name" -> "<your_table_name>",
  "hoodie.database.name" -> "<your_database_name>",
  "hoodie.datasource.write.storage.type" -> "COPY_ON_WRITE",
  "hoodie.datasource.write.operation" -> "upsert",
  "hoodie.datasource.write.recordkey.field" -> "<your_recordkey_field>",
  "hoodie.datasource.write.precombine.field" -> "<your_precombine_field>",
  "hoodie.datasource.write.partitionpath.field" -> "<your_partitionkey_field>",
  "hoodie.datasource.write.hive_style_partitioning" -> "true",
  "hoodie.datasource.hive_sync.enable" -> "true",
  "hoodie.datasource.hive_sync.database" -> "<your_database_name>",
  "hoodie.datasource.hive_sync.table" -> "<your_table_name>",
  "hoodie.datasource.hive_sync.partition_fields" -> "<your_partitionkey_field>",
  "hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
  "hoodie.datasource.hive_sync.use_jdbc" -> "false",
  "hoodie.datasource.hive_sync.mode" -> "hms",
  "path" -> "s3://<s3Path/>")

dataFrame.write.format("hudi")
  .options(additionalOptions)
  .mode("append")
  .save()
```

------

## Example: Read a Hudi table from Amazon S3 using the AWS Glue Data Catalog
<a name="aws-glue-programming-etl-format-hudi-read"></a>

This example reads the Hudi table that you created in the [Example: Write a Hudi table to Amazon S3 and register it in the AWS Glue Data Catalog](#aws-glue-programming-etl-format-hudi-write) from Amazon S3.

**Note**  
This example requires you to set the `--enable-glue-datacatalog` job parameter in order to use the AWS Glue Data Catalog as an Apache Spark Hive metastore. To learn more, see [Using job parameters in AWS Glue jobs](aws-glue-programming-etl-glue-arguments.md).

------
#### [ Python ]

For this example, use the `GlueContext.create_data_frame.from_catalog()` method.

```
# Example: Read a Hudi table from Glue Data Catalog

from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)

dataFrame = glueContext.create_data_frame.from_catalog(
    database = "<your_database_name>",
    table_name = "<your_table_name>"
)
```

------
#### [ Scala ]

For this example, use the [getCatalogSource](glue-etl-scala-apis-glue-gluecontext.md#glue-etl-scala-apis-glue-gluecontext-defs-getCatalogSource) method.

```
// Example: Read a Hudi table from Glue Data Catalog

import com.amazonaws.services.glue.GlueContext
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    
    val dataFrame = glueContext.getCatalogSource(
      database = "<your_database_name>",
      tableName = "<your_table_name>"
    ).getDataFrame()
  }
}
```

------

## Example: Update and insert a `DataFrame` into a Hudi table in Amazon S3
<a name="aws-glue-programming-etl-format-hudi-update-insert"></a>

This example uses the AWS Glue Data Catalog to insert a DataFrame into the Hudi table that you created in [Example: Write a Hudi table to Amazon S3 and register it in the AWS Glue Data Catalog](#aws-glue-programming-etl-format-hudi-write).

**Note**  
This example requires you to set the `--enable-glue-datacatalog` job parameter in order to use the AWS Glue Data Catalog as an Apache Spark Hive metastore. To learn more, see [Using job parameters in AWS Glue jobs](aws-glue-programming-etl-glue-arguments.md).

------
#### [ Python ]

For this example, use the `GlueContext.write_data_frame.from_catalog()` method.

```
# Example: Upsert a Hudi table from Glue Data Catalog

from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)

glueContext.write_data_frame.from_catalog(
    frame = dataFrame,
    database = "<your_database_name>",
    table_name = "<your_table_name>",
    additional_options={
        "hoodie.table.name": "<your_table_name>",
        "hoodie.database.name": "<your_database_name>",
        "hoodie.datasource.write.storage.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.recordkey.field": "<your_recordkey_field>",
        "hoodie.datasource.write.precombine.field": "<your_precombine_field>",
        "hoodie.datasource.write.partitionpath.field": "<your_partitionkey_field>",
        "hoodie.datasource.write.hive_style_partitioning": "true",
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.database": "<your_database_name>",
        "hoodie.datasource.hive_sync.table": "<your_table_name>",
        "hoodie.datasource.hive_sync.partition_fields": "<your_partitionkey_field>",
        "hoodie.datasource.hive_sync.partition_extractor_class": "org.apache.hudi.hive.MultiPartKeysValueExtractor",
        "hoodie.datasource.hive_sync.use_jdbc": "false",
        "hoodie.datasource.hive_sync.mode": "hms"
    }
)
```

------
#### [ Scala ]

For this example, use the [getCatalogSink](glue-etl-scala-apis-glue-gluecontext.md#glue-etl-scala-apis-glue-gluecontext-defs-getCatalogSink) method.

```
// Example: Upsert a Hudi table from Glue Data Catalog

import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.JsonOptions
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    glueContext.getCatalogSink("<your_database_name>", "<your_table_name>",
      additionalOptions = JsonOptions(Map(
        "hoodie.table.name" -> "<your_table_name>",
        "hoodie.database.name" -> "<your_database_name>",
        "hoodie.datasource.write.storage.type" -> "COPY_ON_WRITE",
        "hoodie.datasource.write.operation" -> "upsert",
        "hoodie.datasource.write.recordkey.field" -> "<your_recordkey_field>",
        "hoodie.datasource.write.precombine.field" -> "<your_precombine_field>",
        "hoodie.datasource.write.partitionpath.field" -> "<your_partitionkey_field>",
        "hoodie.datasource.write.hive_style_partitioning" -> "true",
        "hoodie.datasource.hive_sync.enable" -> "true",
        "hoodie.datasource.hive_sync.database" -> "<your_database_name>",
        "hoodie.datasource.hive_sync.table" -> "<your_table_name>",
        "hoodie.datasource.hive_sync.partition_fields" -> "<your_partitionkey_field>",
        "hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
        "hoodie.datasource.hive_sync.use_jdbc" -> "false",
        "hoodie.datasource.hive_sync.mode" -> "hms"
      )))
      .writeDataFrame(dataFrame, glueContext)
  }
}
```

------

## Example: Read a Hudi table from Amazon S3 using Spark
<a name="aws-glue-programming-etl-format-hudi-read-spark"></a>

This example reads a Hudi table from Amazon S3 using the Spark DataFrame API.

------
#### [ Python ]

```
# Example: Read a Hudi table from S3 using a Spark DataFrame

dataFrame = spark.read.format("hudi").load("s3://<s3path/>")
```

------
#### [ Scala ]

```
// Example: Read a Hudi table from S3 using a Spark DataFrame

val dataFrame = spark.read.format("hudi").load("s3://<s3path/>")
```

------

## Example: Write a Hudi table to Amazon S3 using Spark
<a name="aws-glue-programming-etl-format-hudi-write-spark"></a>

This example writes a Hudi table to Amazon S3 using Spark.

------
#### [ Python ]

```
# Example: Write a Hudi table to S3 using a Spark DataFrame

dataFrame.write.format("hudi") \
    .options(**additional_options) \
    .mode("overwrite") \
    .save("s3://<s3Path/>")
```

------
#### [ Scala ]

```
// Example: Write a Hudi table to S3 using a Spark DataFrame

dataFrame.write.format("hudi")
  .options(additionalOptions)
  .mode("overwrite")
  .save("s3://<s3path/>")
```

------

## Example: Read and write Hudi table with Lake Formation permission control
<a name="aws-glue-programming-etl-format-hudi-read-write-lake-formation-tables"></a>

This example reads and writes a Hudi table with Lake Formation permission control.

1. Create a Hudi table and register it in Lake Formation.

   1. To enable Lake Formation permission control, you first need to register the table's Amazon S3 path with Lake Formation. For more information, see [Registering an Amazon S3 location](https://docs.aws.amazon.com/lake-formation/latest/dg/register-location.html). You can register it either from the Lake Formation console or by using the AWS CLI:

      ```
      aws lakeformation register-resource --resource-arn arn:aws:s3:::<s3-bucket>/<s3-folder> --use-service-linked-role --region <REGION>
      ```

      Once you register an Amazon S3 location, any AWS Glue table that points to the location (or any of its child locations) will return the value `true` for the `IsRegisteredWithLakeFormation` parameter in the `GetTable` call.

   1. Create a Hudi table that points to the registered Amazon S3 path through the Spark dataframe API:

      ```
      hudi_options = {
          'hoodie.table.name': table_name,
          'hoodie.database.name': database_name,
          'hoodie.datasource.write.storage.type': 'COPY_ON_WRITE',
          'hoodie.datasource.write.recordkey.field': 'product_id',
          'hoodie.datasource.write.table.name': table_name,
          'hoodie.datasource.write.operation': 'upsert',
          'hoodie.datasource.write.precombine.field': 'updated_at',
          'hoodie.datasource.write.hive_style_partitioning': 'true',
          'hoodie.upsert.shuffle.parallelism': 2,
          'hoodie.insert.shuffle.parallelism': 2,
          'path': '<S3_TABLE_LOCATION>',
          'hoodie.datasource.hive_sync.enable': 'true',
          'hoodie.datasource.hive_sync.database': database_name,
          'hoodie.datasource.hive_sync.table': table_name,
          'hoodie.datasource.hive_sync.use_jdbc': 'false',
          'hoodie.datasource.hive_sync.mode': 'hms'
      }
      
      df_products.write.format("hudi")  \
          .options(**hudi_options)  \
          .mode("overwrite")  \
          .save()
      ```

1. Grant Lake Formation permissions to the AWS Glue job IAM role. You can grant permissions either from the Lake Formation console or by using the AWS CLI. For more information, see [Granting table permissions using the Lake Formation console and the named resource method](https://docs.aws.amazon.com/lake-formation/latest/dg/granting-table-permissions.html).

1. Read the Hudi table registered in Lake Formation. The code is the same as reading a non-registered Hudi table. Note that the AWS Glue job IAM role needs the SELECT permission for the read to succeed.

   ```
   val dataFrame = glueContext.getCatalogSource(
     database = "<your_database_name>",
     tableName = "<your_table_name>"
   ).getDataFrame()
   ```

1. Write to a Hudi table registered in Lake Formation. The code is the same as writing to a non-registered Hudi table. Note that the AWS Glue job IAM role needs the SUPER permission for the write to succeed.

   ```
   glueContext.getCatalogSink("<your_database_name>", "<your_table_name>",
         additionalOptions = JsonOptions(Map(
           "hoodie.table.name" -> "<your_table_name>",
           "hoodie.database.name" -> "<your_database_name>",
           "hoodie.datasource.write.storage.type" -> "COPY_ON_WRITE",
           "hoodie.datasource.write.operation" -> "<write_operation>",
           "hoodie.datasource.write.recordkey.field" -> "<your_recordkey_field>",
           "hoodie.datasource.write.precombine.field" -> "<your_precombine_field>",
           "hoodie.datasource.write.partitionpath.field" -> "<your_partitionkey_field>",
           "hoodie.datasource.write.hive_style_partitioning" -> "true",
           "hoodie.datasource.hive_sync.enable" -> "true",
           "hoodie.datasource.hive_sync.database" -> "<your_database_name>",
           "hoodie.datasource.hive_sync.table" -> "<your_table_name>",
           "hoodie.datasource.hive_sync.partition_fields" -> "<your_partitionkey_field>",
           "hoodie.datasource.hive_sync.partition_extractor_class" -> "org.apache.hudi.hive.MultiPartKeysValueExtractor",
           "hoodie.datasource.hive_sync.use_jdbc" -> "false",
           "hoodie.datasource.hive_sync.mode" -> "hms"
         )))
         .writeDataFrame(dataFrame, glueContext)
   ```
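
Step 1.a notes that registration is reflected in the `GetTable` response. If you want to verify it programmatically, a small helper along these lines can check the flag. The helper and the abbreviated response shape below are illustrative; in practice you would obtain the response by calling `get_table` on the AWS Glue client.

```python
# Illustrative helper (not part of the AWS Glue API): inspect a
# glue get_table response to see whether the table's S3 location
# is registered with Lake Formation.
def is_registered_with_lake_formation(get_table_response):
    table = get_table_response.get("Table", {})
    # GetTable returns the flag on the Table object; default to False if absent.
    return bool(table.get("IsRegisteredWithLakeFormation", False))

# Abbreviated shape of a GetTable response for a registered table:
sample_response = {"Table": {"Name": "products", "IsRegisteredWithLakeFormation": True}}
```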

# Using the Delta Lake framework in AWS Glue
<a name="aws-glue-programming-etl-format-delta-lake"></a>

AWS Glue 3.0 and later supports the Linux Foundation Delta Lake framework. Delta Lake is an open-source data lake storage framework that helps you perform ACID transactions, scale metadata handling, and unify streaming and batch data processing. This topic covers available features for using your data in AWS Glue when you transport or store your data in a Delta Lake table. To learn more about Delta Lake, see the official [Delta Lake documentation](https://docs.delta.io/latest/delta-intro.html). 

You can use AWS Glue to perform read and write operations on Delta Lake tables in Amazon S3, or work with Delta Lake tables using the AWS Glue Data Catalog. Additional operations such as insert, update, and [Table batch reads and writes](https://docs.delta.io/0.7.0/api/python/index.html) are also supported. When you use Delta Lake tables, you also have the option to use methods from the Delta Lake Python library such as `DeltaTable.forPath`. For more information about the Delta Lake Python library, see Delta Lake's Python documentation.

The following table lists the version of Delta Lake included in each AWS Glue version.



| AWS Glue version | Supported Delta Lake version | 
| --- | --- | 
| 5.1 | 3.3.2 | 
| 5.0 | 3.3.0 | 
| 4.0 | 2.1.0 | 
| 3.0 | 1.0.0 | 

To learn more about the data lake frameworks that AWS Glue supports, see [Using data lake frameworks with AWS Glue ETL jobs](aws-glue-programming-etl-datalake-native-frameworks.md).

## Enabling Delta Lake for AWS Glue
<a name="aws-glue-programming-etl-format-delta-lake-enable"></a>

To enable Delta Lake for AWS Glue, complete the following tasks:
+ Specify `delta` as a value for the `--datalake-formats` job parameter. For more information, see [Using job parameters in AWS Glue jobs](aws-glue-programming-etl-glue-arguments.md).
+ Create a key named `--conf` for your AWS Glue job, and set it to the following value. Alternatively, you can set the following configuration using `SparkConf` in your script. These settings help Apache Spark correctly handle Delta Lake tables.

  ```
  spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension --conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog --conf spark.delta.logStore.class=org.apache.spark.sql.delta.storage.S3SingleDriverLogStore
  ```
+ Lake Formation permission support for Delta tables is enabled by default for AWS Glue 4.0. No additional configuration is needed for reading/writing to Lake Formation-registered Delta tables. To read a registered Delta table, the AWS Glue job IAM role must have the SELECT permission. To write to a registered Delta table, the AWS Glue job IAM role must have the SUPER permission. To learn more about managing Lake Formation permissions, see [Granting and revoking permissions on Data Catalog resources](https://docs.aws.amazon.com/lake-formation/latest/dg/granting-catalog-permissions.html).
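
Sketched as a job's default arguments, the setup above might look like the following. The dict is illustrative; note that the `--conf` value chains the three Spark settings together with embedded `--conf` separators, exactly as shown in the bullet above.

```python
# Illustrative sketch: default arguments for an AWS Glue job with Delta Lake enabled.
default_arguments = {
    # Enables the built-in Delta Lake libraries for the job.
    "--datalake-formats": "delta",
    # Spark settings that help Spark handle Delta Lake tables correctly.
    "--conf": (
        "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension "
        "--conf spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog "
        "--conf spark.delta.logStore.class=org.apache.spark.sql.delta.storage.S3SingleDriverLogStore"
    ),
}
```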

**Using a different Delta Lake version**

To use a version of Delta Lake that AWS Glue doesn't support, specify your own Delta Lake JAR files using the `--extra-jars` job parameter. Do not include `delta` as a value for the `--datalake-formats` job parameter. If you use AWS Glue 5.0 or later, you must also set the `--user-jars-first` job parameter to `true`. To use the Delta Lake Python library in this case, you must specify the library JAR files using the `--extra-py-files` job parameter; the Python library comes packaged in the Delta Lake JAR files.
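
Because the Python library ships inside the Delta Lake JAR, the same (hypothetical) S3 path appears in both `--extra-jars` and `--extra-py-files` in a sketch of this setup:

```python
# Illustrative sketch: job arguments when supplying your own Delta Lake JARs.
delta_jar = "s3://amzn-s3-demo-bucket/jars/delta-custom.jar"  # hypothetical path

default_arguments = {
    "--extra-jars": delta_jar,
    # The Delta Lake Python library is packaged in the same JAR.
    "--extra-py-files": delta_jar,
    # Required on AWS Glue 5.0 and later when bringing your own JARs.
    "--user-jars-first": "true",
}
```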

## Example: Write a Delta Lake table to Amazon S3 and register it to the AWS Glue Data Catalog
<a name="aws-glue-programming-etl-format-delta-lake-write"></a>

The following AWS Glue ETL script demonstrates how to write a Delta Lake table to Amazon S3 and register the table to the AWS Glue Data Catalog.

------
#### [ Python ]

```
# Example: Create a Delta Lake table from a DataFrame 
# and register the table to Glue Data Catalog

additional_options = {
    "path": "s3://<s3Path>"
}
dataFrame.write \
    .format("delta") \
    .options(**additional_options) \
    .mode("append") \
    .partitionBy("<your_partitionkey_field>") \
    .saveAsTable("<your_database_name>.<your_table_name>")
```

------
#### [ Scala ]

```
// Example: Create a Delta Lake table from a DataFrame
// and register the table to Glue Data Catalog

val additional_options = Map(
  "path" -> "s3://<s3Path>"
)
dataFrame.write.format("delta")
  .options(additional_options)
  .mode("append")
  .partitionBy("<your_partitionkey_field>")
  .saveAsTable("<your_database_name>.<your_table_name>")
```

------

## Example: Read a Delta Lake table from Amazon S3 using the AWS Glue Data Catalog
<a name="aws-glue-programming-etl-format-delta-lake-read"></a>

The following AWS Glue ETL script reads the Delta Lake table that you created in [Example: Write a Delta Lake table to Amazon S3 and register it to the AWS Glue Data Catalog](#aws-glue-programming-etl-format-delta-lake-write).

------
#### [ Python ]

For this example, use the [create_data_frame.from_catalog](aws-glue-api-crawler-pyspark-extensions-glue-context.md#aws-glue-api-crawler-pyspark-extensions-glue-context-create-dataframe-from-catalog) method.

```
# Example: Read a Delta Lake table from Glue Data Catalog

from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)

df = glueContext.create_data_frame.from_catalog(
    database="<your_database_name>",
    table_name="<your_table_name>",
    additional_options=additional_options
)
```

------
#### [ Scala ]

For this example, use the [getCatalogSource](glue-etl-scala-apis-glue-gluecontext.md#glue-etl-scala-apis-glue-gluecontext-defs-getCatalogSource) method.

```
// Example: Read a Delta Lake table from Glue Data Catalog

import com.amazonaws.services.glue.GlueContext
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    val df = glueContext.getCatalogSource("<your_database_name>", "<your_table_name>",
      additionalOptions = additionalOptions)
      .getDataFrame()
  }
}
```

------

## Example: Insert a `DataFrame` into a Delta Lake table in Amazon S3 using the AWS Glue Data Catalog
<a name="aws-glue-programming-etl-format-delta-lake-insert"></a>

This example inserts data into the Delta Lake table that you created in [Example: Write a Delta Lake table to Amazon S3 and register it to the AWS Glue Data Catalog](#aws-glue-programming-etl-format-delta-lake-write).

**Note**  
This example requires you to set the `--enable-glue-datacatalog` job parameter in order to use the AWS Glue Data Catalog as an Apache Spark Hive metastore. To learn more, see [Using job parameters in AWS Glue jobs](aws-glue-programming-etl-glue-arguments.md).

------
#### [ Python ]

For this example, use the [write_data_frame.from_catalog](aws-glue-api-crawler-pyspark-extensions-glue-context.md#aws-glue-api-crawler-pyspark-extensions-glue-context-write_data_frame_from_catalog) method.

```
# Example: Insert into a Delta Lake table in S3 using Glue Data Catalog

from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)

glueContext.write_data_frame.from_catalog(
    frame=dataFrame,
    database="<your_database_name>",
    table_name="<your_table_name>",
    additional_options=additional_options
)
```

------
#### [ Scala ]

For this example, use the [getCatalogSink](glue-etl-scala-apis-glue-gluecontext.md#glue-etl-scala-apis-glue-gluecontext-defs-getCatalogSink) method.

```
// Example: Insert into a Delta Lake table in S3 using Glue Data Catalog

import com.amazonaws.services.glue.GlueContext
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    glueContext.getCatalogSink("<your_database_name>", "<your_table_name>",
      additionalOptions = additionalOptions)
      .writeDataFrame(dataFrame, glueContext)
  }
}
```

------

## Example: Read a Delta Lake table from Amazon S3 using the Spark API
<a name="aws-glue-programming-etl-format-delta_lake-read-spark"></a>

This example reads a Delta Lake table from Amazon S3 using the Spark API.

------
#### [ Python ]

```
# Example: Read a Delta Lake table from S3 using a Spark DataFrame

dataFrame = spark.read.format("delta").load("s3://<s3path/>")
```

------
#### [ Scala ]

```
// Example: Read a Delta Lake table from S3 using a Spark DataFrame

val dataFrame = spark.read.format("delta").load("s3://<s3path/>")
```

------

## Example: Write a Delta Lake table to Amazon S3 using Spark
<a name="aws-glue-programming-etl-format-delta_lake-write-spark"></a>

This example writes a Delta Lake table to Amazon S3 using Spark.

------
#### [ Python ]

```
# Example: Write a Delta Lake table to S3 using a Spark DataFrame

dataFrame.write.format("delta") \
    .options(**additional_options) \
    .mode("overwrite") \
    .partitionBy("<your_partitionkey_field>") \
    .save("s3://<s3Path>")
```

------
#### [ Scala ]

```
// Example: Write a Delta Lake table to S3 using a Spark DataFrame

dataFrame.write.format("delta")
  .options(additionalOptions)
  .mode("overwrite")
  .partitionBy("<your_partitionkey_field>")
  .save("s3://<s3path/>")
```

------

## Example: Read and write Delta Lake table with Lake Formation permission control
<a name="aws-glue-programming-etl-format-delta-lake-read-write-lake-formation-tables"></a>

This example reads and writes a Delta Lake table with Lake Formation permission control.

1. Create a Delta table and register it in Lake Formation

   1. To enable Lake Formation permission control, you must first register the table's Amazon S3 path with Lake Formation. For more information, see [Registering an Amazon S3 location](https://docs.aws.amazon.com/lake-formation/latest/dg/register-location.html). You can register the path either from the Lake Formation console or by using the AWS CLI:

      ```
      aws lakeformation register-resource --resource-arn arn:aws:s3:::<s3-bucket>/<s3-folder> --use-service-linked-role --region <REGION>
      ```

      Once you register an Amazon S3 location, any AWS Glue table that points to the location (or any of its child locations) returns `true` for the `IsRegisteredWithLakeFormation` parameter in the `GetTable` response.

   1. Create a Delta table that points to the registered Amazon S3 path through Spark:
**Note**  
The following are Python examples.

      ```
      dataFrame.write \
      	.format("delta") \
      	.mode("overwrite") \
      	.partitionBy("<your_partitionkey_field>") \
      	.save("s3://<the_s3_path>")
      ```

      After the data has been written to Amazon S3, use the AWS Glue crawler to create a new Delta catalog table. For more information, see [Introducing native Delta Lake table support with AWS Glue crawlers](https://aws.amazon.com/blogs/big-data/introducing-native-delta-lake-table-support-with-aws-glue-crawlers/).

      You can also create the table manually through the AWS Glue `CreateTable` API.

1. Grant Lake Formation permissions to the AWS Glue job IAM role. You can grant permissions either from the Lake Formation console or by using the AWS CLI. For more information, see [Granting table permissions using the Lake Formation console and the named resource method](https://docs.aws.amazon.com/lake-formation/latest/dg/granting-table-permissions.html).

1. Read the Delta table registered in Lake Formation. The code is the same as reading a non-registered Delta table. Note that the AWS Glue job IAM role needs the SELECT permission for the read to succeed.

   ```
   # Example: Read a Delta Lake table from Glue Data Catalog
   
   df = glueContext.create_data_frame.from_catalog(
       database="<your_database_name>",
       table_name="<your_table_name>",
       additional_options=additional_options
   )
   ```

1. Write to a Delta table registered in Lake Formation. The code is the same as writing to a non-registered Delta table. Note that the AWS Glue job IAM role needs the SUPER permission for the write to succeed.

   By default, AWS Glue uses `Append` as the saveMode. You can change it by setting the saveMode option in `additional_options`. For information about saveMode support in Delta tables, see [Write to a table](https://docs.delta.io/latest/delta-batch.html#write-to-a-table).

   ```
   glueContext.write_data_frame.from_catalog(
       frame=dataFrame,
       database="<your_database_name>",
       table_name="<your_table_name>",
       additional_options=additional_options
   )
   ```
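   As a sketch of the saveMode override described above, the option might be passed through `additional_options`. The `"mode"` key name here is an assumption, not confirmed by this guide, so verify the exact option name for your AWS Glue version before relying on it:

   ```
   # Hypothetical sketch: overriding the default Append saveMode.
   # The "mode" key is an assumption; check the write_data_frame.from_catalog
   # documentation for the exact option name your AWS Glue version expects.
   additional_options = {
       "path": "s3://<s3Path>",   # table location (placeholder)
       "mode": "overwrite",       # replaces the default Append behavior
   }
   ```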

# Using the Iceberg framework in AWS Glue
<a name="aws-glue-programming-etl-format-iceberg"></a>

AWS Glue 3.0 and later supports the Apache Iceberg framework for data lakes. Iceberg provides a high-performance table format that works just like a SQL table. This topic covers available features for using your data in AWS Glue when you transport or store your data in an Iceberg table. To learn more about Iceberg, see the official [Apache Iceberg documentation](https://iceberg.apache.org/docs/latest/). 

You can use AWS Glue to perform read and write operations on Iceberg tables in Amazon S3, or work with Iceberg tables using the AWS Glue Data Catalog. Additional operations, including insert and all [Spark Queries](https://iceberg.apache.org/docs/latest/spark-queries/) and [Spark Writes](https://iceberg.apache.org/docs/latest/spark-writes/), are also supported. Update is not supported for Iceberg tables.

**Note**  
`ALTER TABLE … RENAME TO` is not available for Apache Iceberg 0.13.1 for AWS Glue 3.0.

The following table lists the version of Iceberg included in each AWS Glue version.


****  

| AWS Glue version | Supported Iceberg version | 
| --- | --- | 
| 5.1 | 1.10.0 | 
| 5.0 | 1.7.1 | 
| 4.0 | 1.0.0 | 
| 3.0 | 0.13.1 | 

To learn more about the data lake frameworks that AWS Glue supports, see [Using data lake frameworks with AWS Glue ETL jobs](aws-glue-programming-etl-datalake-native-frameworks.md).

## Enabling the Iceberg framework
<a name="aws-glue-programming-etl-format-iceberg-enable"></a>

To enable Iceberg for AWS Glue, complete the following tasks:
+ Specify `iceberg` as a value for the `--datalake-formats` job parameter. For more information, see [Using job parameters in AWS Glue jobs](aws-glue-programming-etl-glue-arguments.md).
+ Create a key named `--conf` for your AWS Glue job, and set it to the following value. Alternatively, you can set the following configuration using `SparkConf` in your script. These settings help Apache Spark correctly handle Iceberg tables.

  ```
  spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions 
  --conf spark.sql.catalog.glue_catalog=org.apache.iceberg.spark.SparkCatalog 
  --conf spark.sql.catalog.glue_catalog.warehouse=s3://<your-warehouse-dir>/ 
  --conf spark.sql.catalog.glue_catalog.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog 
  --conf spark.sql.catalog.glue_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO
  ```
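  As an alternative to the `--conf` job parameter, the same settings can be applied with `SparkConf` in your script. The following is a minimal sketch that mirrors the block above; `<your-warehouse-dir>` is a placeholder for your warehouse path:

  ```
  # Sketch: the same Iceberg settings applied through SparkConf in the script
  # rather than through the --conf job parameter.
  from pyspark.conf import SparkConf

  conf = SparkConf()
  conf.set("spark.sql.extensions",
           "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
  conf.set("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
  conf.set("spark.sql.catalog.glue_catalog.warehouse", "s3://<your-warehouse-dir>/")
  conf.set("spark.sql.catalog.glue_catalog.catalog-impl",
           "org.apache.iceberg.aws.glue.GlueCatalog")
  conf.set("spark.sql.catalog.glue_catalog.io-impl",
           "org.apache.iceberg.aws.s3.S3FileIO")
  ```

  Pass `conf` to the `SparkContext` constructor when you create it, for example `SparkContext(conf=conf)`.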

  If you are reading or writing to Iceberg tables that are registered with Lake Formation, follow the guidance in [Using AWS Glue with AWS Lake Formation for fine-grained access control](security-lf-enable.md) in AWS Glue 5.0 and later. In AWS Glue 4.0, add the following configuration to enable Lake Formation support.

  ```
  --conf spark.sql.catalog.glue_catalog.glue.lakeformation-enabled=true
  --conf spark.sql.catalog.glue_catalog.glue.id=<table-catalog-id>
  ```

  If you use AWS Glue 3.0 with Iceberg 0.13.1, you must set the following additional configurations to use the Amazon DynamoDB lock manager, which ensures atomic transactions. AWS Glue 4.0 and later uses optimistic locking by default. For more information, see [Iceberg AWS Integrations](https://iceberg.apache.org/docs/latest/aws/#dynamodb-lock-manager) in the official Apache Iceberg documentation.

  ```
  --conf spark.sql.catalog.glue_catalog.lock-impl=org.apache.iceberg.aws.glue.DynamoLockManager 
  --conf spark.sql.catalog.glue_catalog.lock.table=<your-dynamodb-table-name>
  ```

**Using a different Iceberg version**

To use a version of Iceberg that AWS Glue doesn't support, specify your own Iceberg JAR files using the `--extra-jars` job parameter. Do not include `iceberg` as a value for the `--datalake-formats` parameter. If you use AWS Glue 5.0 or later, you must also set the `--user-jars-first` job parameter to `true`.
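For example, the job's default arguments might include entries like the following. The bucket and JAR file name are placeholders for whichever Iceberg runtime build you supply:

```
"--extra-jars": "s3://<your-bucket>/jars/iceberg-spark-runtime.jar",
"--user-jars-first": "true"
```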

**Enabling encryption for Iceberg tables**

**Note**  
Iceberg tables have their own mechanisms to enable server-side encryption. You should enable this configuration in addition to AWS Glue's security configuration.

To enable server-side encryption on Iceberg tables, review the guidance from the [Iceberg documentation](https://iceberg.apache.org/docs/latest/aws/#s3-server-side-encryption).
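As a sketch, assuming the `S3FileIO` properties described in that documentation, SSE-KMS might be enabled with catalog properties like the following. The KMS key ARN is a placeholder; confirm the property names against the Iceberg documentation for your Iceberg version:

```
--conf spark.sql.catalog.glue_catalog.s3.sse.type=kms 
--conf spark.sql.catalog.glue_catalog.s3.sse.key=<your-kms-key-arn>
```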

**Add Spark configuration for cross-Region Iceberg table access**

To add Spark configuration for cross-Region Iceberg table access with the AWS Glue Data Catalog and AWS Lake Formation, follow these steps:

1. Create a [Multi-region access point](https://docs.aws.amazon.com/AmazonS3/latest/userguide/multi-region-access-point-create-examples.html).

1. Set the following Spark properties:

   ```
   --conf spark.sql.catalog.glue_catalog.s3.use-arn-region-enabled=true \
   --conf spark.sql.catalog.glue_catalog.s3.access-points.bucket1=arn:aws:s3::<account-id>:accesspoint/<mrap-id>.mrap \
   --conf spark.sql.catalog.glue_catalog.s3.access-points.bucket2=arn:aws:s3::<account-id>:accesspoint/<mrap-id>.mrap
   ```

## Example: Write an Iceberg table to Amazon S3 and register it to the AWS Glue Data Catalog
<a name="aws-glue-programming-etl-format-iceberg-write"></a>

This example script demonstrates how to write an Iceberg table to Amazon S3. The example uses [Iceberg AWS Integrations](https://iceberg.apache.org/docs/latest/aws/) to register the table to the AWS Glue Data Catalog.

------
#### [ Python ]

```
# Example: Create an Iceberg table from a DataFrame 
# and register the table to Glue Data Catalog

dataFrame.createOrReplaceTempView("tmp_<your_table_name>")

query = f"""
CREATE TABLE glue_catalog.<your_database_name>.<your_table_name>
USING iceberg
TBLPROPERTIES ("format-version"="2")
AS SELECT * FROM tmp_<your_table_name>
"""
spark.sql(query)
```

------
#### [ Scala ]

```
// Example: Create an Iceberg table from a DataFrame
// and register the table to Glue Data Catalog

dataFrame.createOrReplaceTempView("tmp_<your_table_name>")

val query = """CREATE TABLE glue_catalog.<your_database_name>.<your_table_name>
USING iceberg
TBLPROPERTIES ("format-version"="2")
AS SELECT * FROM tmp_<your_table_name>
"""
spark.sql(query)
```

------

Alternatively, you can write an Iceberg table to Amazon S3 and the Data Catalog using Spark methods.

Prerequisites: You will need to provision a catalog for the Iceberg library to use. When using the AWS Glue Data Catalog, AWS Glue makes this straightforward. The AWS Glue Data Catalog is pre-configured for use by the Spark libraries as `glue_catalog`. Data Catalog tables are identified by a *databaseName* and a *tableName*. For more information about the AWS Glue Data Catalog, see [Data discovery and cataloging in AWS Glue](catalog-and-crawler.md).

If you are not using the AWS Glue Data Catalog, you will need to provision a catalog through the Spark APIs. For more information, see [Spark Configuration](https://iceberg.apache.org/docs/latest/spark-configuration/) in the Iceberg documentation.

This example writes an Iceberg table to Amazon S3 and the Data Catalog using Spark.

------
#### [ Python ]

```
# Example: Write an Iceberg table to S3 on the Glue Data Catalog

# Create (equivalent to CREATE TABLE AS SELECT)
dataFrame.writeTo("glue_catalog.databaseName.tableName") \
    .tableProperty("format-version", "2") \
    .create()

# Append (equivalent to INSERT INTO)
dataFrame.writeTo("glue_catalog.databaseName.tableName") \
    .tableProperty("format-version", "2") \
    .append()
```

------
#### [ Scala ]

```
// Example: Write an Iceberg table to S3 on the Glue Data Catalog

// Create (equivalent to CREATE TABLE AS SELECT)
dataFrame.writeTo("glue_catalog.databaseName.tableName")
    .tableProperty("format-version", "2")
    .create()

// Append (equivalent to INSERT INTO)
dataFrame.writeTo("glue_catalog.databaseName.tableName")
    .tableProperty("format-version", "2")
    .append()
```

------

## Example: Read an Iceberg table from Amazon S3 using the AWS Glue Data Catalog
<a name="aws-glue-programming-etl-format-iceberg-read"></a>

This example reads the Iceberg table that you created in [Example: Write an Iceberg table to Amazon S3 and register it to the AWS Glue Data Catalog](#aws-glue-programming-etl-format-iceberg-write).

------
#### [ Python ]

For this example, use the `GlueContext.create_data_frame.from_catalog()` method.

```
# Example: Read an Iceberg table from Glue Data Catalog

from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)

df = glueContext.create_data_frame.from_catalog(
    database="<your_database_name>",
    table_name="<your_table_name>",
    additional_options=additional_options
)
```

------
#### [ Scala ]

For this example, use the [getCatalogSource](glue-etl-scala-apis-glue-gluecontext.md#glue-etl-scala-apis-glue-gluecontext-defs-getCatalogSource) method.

```
// Example: Read an Iceberg table from Glue Data Catalog

import com.amazonaws.services.glue.GlueContext
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    val df = glueContext.getCatalogSource("<your_database_name>", "<your_table_name>",
      additionalOptions = additionalOptions)
      .getDataFrame()
  }
}
```

------

## Example: Insert a `DataFrame` into an Iceberg table in Amazon S3 using the AWS Glue Data Catalog
<a name="aws-glue-programming-etl-format-iceberg-insert"></a>

This example inserts data into the Iceberg table that you created in [Example: Write an Iceberg table to Amazon S3 and register it to the AWS Glue Data Catalog](#aws-glue-programming-etl-format-iceberg-write).

**Note**  
This example requires you to set the `--enable-glue-datacatalog` job parameter in order to use the AWS Glue Data Catalog as an Apache Spark Hive metastore. To learn more, see [Using job parameters in AWS Glue jobs](aws-glue-programming-etl-glue-arguments.md).

------
#### [ Python ]

For this example, use the `GlueContext.write_data_frame.from_catalog()` method.

```
# Example: Insert into an Iceberg table from Glue Data Catalog

from awsglue.context import GlueContext
from pyspark.context import SparkContext

sc = SparkContext()
glueContext = GlueContext(sc)

glueContext.write_data_frame.from_catalog(
    frame=dataFrame,
    database="<your_database_name>",
    table_name="<your_table_name>",
    additional_options=additional_options
)
```

------
#### [ Scala ]

For this example, use the [getCatalogSink](glue-etl-scala-apis-glue-gluecontext.md#glue-etl-scala-apis-glue-gluecontext-defs-getCatalogSink) method.

```
// Example: Insert into an Iceberg table from Glue Data Catalog

import com.amazonaws.services.glue.GlueContext
import org.apache.spark.SparkContext

object GlueApp {
  def main(sysArgs: Array[String]): Unit = {
    val spark: SparkContext = new SparkContext()
    val glueContext: GlueContext = new GlueContext(spark)
    glueContext.getCatalogSink("<your_database_name>", "<your_table_name>",
      additionalOptions = additionalOptions)
      .writeDataFrame(dataFrame, glueContext)
  }
}
```

------

## Example: Read an Iceberg table from Amazon S3 using Spark
<a name="aws-glue-programming-etl-format-iceberg-read-spark"></a>

Prerequisites: You will need to provision a catalog for the Iceberg library to use. When using the AWS Glue Data Catalog, AWS Glue makes this straightforward. The AWS Glue Data Catalog is pre-configured for use by the Spark libraries as `glue_catalog`. Data Catalog tables are identified by a *databaseName* and a *tableName*. For more information about the AWS Glue Data Catalog, see [Data discovery and cataloging in AWS Glue](catalog-and-crawler.md).

If you are not using the AWS Glue Data Catalog, you will need to provision a catalog through the Spark APIs. For more information, see [Spark Configuration](https://iceberg.apache.org/docs/latest/spark-configuration/) in the Iceberg documentation.

This example reads an Iceberg table in Amazon S3 from the Data Catalog using Spark.

------
#### [ Python ]

```
# Example: Read an Iceberg table on S3 as a DataFrame from the Glue Data Catalog

dataFrame = spark.read.format("iceberg").load("glue_catalog.databaseName.tableName")
```

------
#### [ Scala ]

```
// Example: Read an Iceberg table on S3 as a DataFrame from the Glue Data Catalog

val dataFrame = spark.read.format("iceberg").load("glue_catalog.databaseName.tableName")
```

------

## Example: Read and write Iceberg table with Lake Formation permission control
<a name="aws-glue-programming-etl-format-iceberg-read-write-lake-formation-tables"></a>

This example reads and writes an Iceberg table with Lake Formation permission control.

**Note**  
This example works only in AWS Glue 4.0. In AWS Glue 5.0 and later, follow the guidance in [Using AWS Glue with AWS Lake Formation for fine-grained access control](security-lf-enable.md).

1. Create an Iceberg table and register it in Lake Formation:

   1. To enable Lake Formation permission control, you must first register the table's Amazon S3 path with Lake Formation. For more information, see [Registering an Amazon S3 location](https://docs.aws.amazon.com/lake-formation/latest/dg/register-location.html). You can register the path either from the Lake Formation console or by using the AWS CLI:

      ```
      aws lakeformation register-resource --resource-arn arn:aws:s3:::<s3-bucket>/<s3-folder> --use-service-linked-role --region <REGION>
      ```

      Once you register an Amazon S3 location, any AWS Glue table that points to the location (or any of its child locations) returns `true` for the `IsRegisteredWithLakeFormation` parameter in the `GetTable` response.

   1. Create an Iceberg table that points to the registered path through Spark SQL:
**Note**  
The following are Python examples.

      ```
      dataFrame.createOrReplaceTempView("tmp_<your_table_name>")
      
      query = f"""
      CREATE TABLE glue_catalog.<your_database_name>.<your_table_name>
      USING iceberg
      AS SELECT * FROM tmp_<your_table_name>
      """
      spark.sql(query)
      ```

      You can also create the table manually through the AWS Glue `CreateTable` API. For more information, see [Creating Apache Iceberg tables](https://docs.aws.amazon.com/lake-formation/latest/dg/creating-iceberg-tables.html).
**Note**  
The `UpdateTable` API does not currently support Iceberg table format as an input to the operation.

1. Grant Lake Formation permissions to the job IAM role. You can grant permissions either from the Lake Formation console or by using the AWS CLI. For more information, see [Granting table permissions using the Lake Formation console and the named resource method](https://docs.aws.amazon.com/lake-formation/latest/dg/granting-table-permissions.html).

1. Read an Iceberg table registered with Lake Formation. The code is the same as reading a non-registered Iceberg table. Note that your AWS Glue job IAM role needs the SELECT permission for the read to succeed.

   ```
   # Example: Read an Iceberg table from the AWS Glue Data Catalog
   from awsglue.context import GlueContext
   from pyspark.context import SparkContext
   
   sc = SparkContext()
   glueContext = GlueContext(sc)
   
   df = glueContext.create_data_frame.from_catalog(
       database="<your_database_name>",
       table_name="<your_table_name>",
       additional_options=additional_options
   )
   ```

1. Write to an Iceberg table registered with Lake Formation. The code is the same as writing to a non-registered Iceberg table. Note that your AWS Glue job IAM role needs the SUPER permission for the write to succeed.

   ```
   glueContext.write_data_frame.from_catalog(
       frame=dataFrame,
       database="<your_database_name>",
       table_name="<your_table_name>",
       additional_options=additional_options
   )
   ```