

# Snapshot retention optimization
<a name="snapshot-retention-management"></a>

The Apache Iceberg snapshot retention feature lets you query historical data at specific points in time and revert unwanted modifications to your tables. In the AWS Glue Data Catalog, the snapshot retention configuration controls how long these snapshots (versions of the table data) are kept before they are expired and removed. This helps manage storage costs and metadata overhead by automatically removing older snapshots based on a configured retention period or a maximum number of snapshots to keep.

You can configure the retention period in days and the maximum number of snapshots to retain for a table. AWS Glue removes snapshots that are older than the specified retention period from the table metadata, while keeping the most recent snapshots up to the configured limit. After removing old snapshots from the metadata, AWS Glue deletes the data and metadata files that are unique to the expired snapshots and no longer referenced. As a result, time travel queries can reach only the snapshots that remain, and the storage space used by expired snapshot data is reclaimed.
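The expiration rule described above can be illustrated with a small, simplified model. This is a sketch and not how AWS Glue implements the optimizer: the function name `select_expired` and the sample timestamps are hypothetical, and the model ignores branch-level and tag-level retention.

```python
from datetime import datetime, timedelta, timezone


def select_expired(snapshot_times, now, retention_days=5, min_to_keep=1):
    """Return the snapshot timestamps that would expire in this simplified model.

    A snapshot expires only if it is older than the retention period AND it is
    not among the `min_to_keep` most recent snapshots.
    """
    newest_first = sorted(snapshot_times, reverse=True)
    cutoff = now - timedelta(days=retention_days)
    expired = []
    for index, taken_at in enumerate(newest_first):
        protected = index < min_to_keep  # always keep the newest N snapshots
        if taken_at < cutoff and not protected:
            expired.append(taken_at)
    return expired


# Hypothetical table with snapshots taken 0, 2, 6, and 9 days ago.
now = datetime(2024, 6, 10, tzinfo=timezone.utc)
snapshots = [now - timedelta(days=age) for age in (0, 2, 6, 9)]

expired = select_expired(snapshots, now, retention_days=5, min_to_keep=3)
print([(now - taken_at).days for taken_at in expired])  # prints [9]
```

With a 5-day retention period and a minimum of 3 snapshots, the 6-day-old snapshot survives because it is still among the three most recent; only the 9-day-old snapshot expires.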

**Topics**
+ [Enabling snapshot retention optimizer](enable-snapshot-retention.md)
+ [Updating snapshot retention optimizer](update-snapshot-retention.md)
+ [Disabling snapshot retention optimizer](disable-snapshot-retention.md)

# Enabling snapshot retention optimizer
<a name="enable-snapshot-retention"></a>

You can use the AWS Glue console, AWS CLI, or AWS API to enable snapshot retention optimizers for your Apache Iceberg tables in the Data Catalog. For new tables, you can choose Apache Iceberg as the table format and enable the snapshot retention optimizer when you create the table. Snapshot retention is disabled by default for new tables.

------
#### [ Console ]

**To enable snapshot retention optimizer**

1.  Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/) and sign in as a data lake administrator, the table creator, or a user who has been granted the `glue:UpdateTable` and `lakeformation:GetDataAccess` permissions on the table. 

1. In the navigation pane, under **Data Catalog**, choose **Tables**.

1. On the **Tables** page, choose the Iceberg table that you want to enable the snapshot retention optimizer for. Then, on the **Actions** menu, choose **Enable** under **Optimization**.

   You can also enable optimization by selecting the table and opening the **Table details** page. Choose the **Table optimization** tab on the lower section of the page, and choose **Enable snapshot retention**. 

1. On the **Enable optimization** page, under **Optimization configuration**, you have two options: **Use default setting** or **Customize settings**. If you choose the default settings, AWS Glue uses the properties defined in the Iceberg table configuration to determine the snapshot retention period and the number of snapshots to retain. If the table has no such configuration, AWS Glue retains one snapshot for five days and deletes the files associated with the expired snapshots.

1.  Next, choose an IAM role that AWS Glue can assume on your behalf to run the optimizer. For details about the permissions required for the IAM role, see the [Table optimization prerequisites](optimization-prerequisites.md) section.

   Follow the steps below to update an existing IAM role: 

   1.  To update the permissions policy for the IAM role, in the IAM console, go to the IAM role that is being used for running the snapshot retention optimizer. 

   1.  In the Add permissions section, choose Create policy. In the newly opened browser window, create a new policy to use with your role. 

   1. On the Create policy page, choose the JSON tab. Copy the JSON code shown in the Prerequisites into the policy editor field.

1. If you prefer to set the values for the **Snapshot retention configuration** manually, choose **Customize settings**.   
![\[Apache Iceberg table details page with Enable retention>Customize settings option.\]](http://docs.aws.amazon.com/glue/latest/dg/images/table-enable-retention.png)

1. Select the **Apply the selected IAM role to the selected optimizers** check box to use a single IAM role for all of the optimizers that you enable.

1. If you have security policy configurations where the Iceberg table optimizer needs to access Amazon S3 buckets from a specific Virtual Private Cloud (VPC), create an AWS Glue network connection or use an existing one.

   If you don't have an AWS Glue VPC Connection set up already, create a new one by following the steps in the [Creating connections for connectors](https://docs.aws.amazon.com/glue/latest/dg/creating-connections.html) section using the AWS Glue console or the AWS CLI/SDK.

1. Next, under **Snapshot retention configuration**, either choose to use the values specified in the [Iceberg table configuration](https://iceberg.apache.org/docs/1.5.2/configuration/#table-behavior-properties), or specify custom values for the snapshot retention period (`history.expire.max-snapshot-age-ms`), the minimum number of snapshots to retain (`history.expire.min-snapshots-to-keep`), and the time in hours between consecutive snapshot deletion job runs.

1.  Choose **Delete associated files** to delete underlying files when the table optimizer deletes old snapshots from the table metadata.

    If you don't choose this option, when older snapshots are removed from the table metadata, their associated files will remain in the storage as orphaned files. 

1. Next, read the caution statement, and choose **I acknowledge** to proceed.
**Note**  
 In the Data Catalog, the snapshot retention optimizer honors the lifecycle that is controlled by branch and tag level retention policies. For more information, see [Branching and tagging](https://iceberg.apache.org/docs/latest/branching/#overview) section in the Iceberg documentation.

1. Review the configuration and choose **Enable optimization**.

   Wait a few minutes for the retention optimizer to run and expire old snapshots based on the configuration.
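The way the retention values are chosen in the steps above (custom values first, then the Iceberg table properties, then the defaults of five days and one snapshot) can be sketched as follows. The helper name `resolve_retention` is hypothetical, and rounding the millisecond property down to whole days is an assumption made for illustration; the optimizer's exact conversion may differ.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000

DEFAULT_RETENTION_DAYS = 5  # default stated in this section
DEFAULT_SNAPSHOTS_TO_RETAIN = 1  # default stated in this section


def resolve_retention(table_properties, custom=None):
    """Pick effective retention values: custom > table property > default."""
    custom = custom or {}

    days = custom.get("snapshotRetentionPeriodInDays")
    if days is None:
        age_ms = table_properties.get("history.expire.max-snapshot-age-ms")
        # Assumption: convert milliseconds to whole days by rounding down.
        days = int(age_ms) // MS_PER_DAY if age_ms is not None else DEFAULT_RETENTION_DAYS

    keep = custom.get("numberOfSnapshotsToRetain")
    if keep is None:
        min_keep = table_properties.get("history.expire.min-snapshots-to-keep")
        keep = int(min_keep) if min_keep is not None else DEFAULT_SNAPSHOTS_TO_RETAIN

    return {"snapshotRetentionPeriodInDays": days, "numberOfSnapshotsToRetain": keep}


# A table whose properties ask for 7 days (604800000 ms) of snapshot history.
props = {"history.expire.max-snapshot-age-ms": "604800000"}
print(resolve_retention(props))
print(resolve_retention(props, {"snapshotRetentionPeriodInDays": 3}))
```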

------
#### [ AWS CLI ]

 To enable snapshot retention for new Iceberg tables in AWS Glue, you need to create a table optimizer of type `retention` and set the `enabled` field to `true` in the `table-optimizer-configuration`. You can do this using the AWS CLI command `create-table-optimizer` or `update-table-optimizer`. Additionally, you need to specify the retention configuration fields like `snapshotRetentionPeriodInDays` and `numberOfSnapshotsToRetain` based on your requirements.

The following example shows how to enable the snapshot retention optimizer. Replace the account ID with a valid AWS account ID. Replace the database name and table name with the names of your database and Iceberg table. Replace the `roleArn` value with the Amazon Resource Name (ARN) of the IAM role that has the required permissions to run the snapshot retention optimizer.

```
aws glue create-table-optimizer \
  --catalog-id 123456789012 \
  --database-name iceberg_db \
  --table-name iceberg_table \
  --table-optimizer-configuration '{"roleArn":"arn:aws:iam::123456789012:role/optimizer_role","enabled":true,"vpcConfiguration":{"glueConnectionName":"glue_connection_name"},"retentionConfiguration":{"icebergConfiguration":{"snapshotRetentionPeriodInDays":7,"numberOfSnapshotsToRetain":3,"cleanExpiredFiles":true}}}' \
  --type retention
```

This command creates a retention optimizer for the specified Iceberg table in the given catalog, database, and Region. The `table-optimizer-configuration` specifies the IAM role ARN to use, enables the optimizer, and sets the retention configuration. In this example, it retains snapshots for 7 days, keeps a minimum of 3 snapshots, and cleans expired files.
+ `snapshotRetentionPeriodInDays` – The number of days to retain snapshots before expiring them. The default value is `5`.
+ `numberOfSnapshotsToRetain` – The minimum number of snapshots to keep, even if they are older than the retention period. The default value is `1`.
+ `cleanExpiredFiles` – A boolean indicating whether to delete expired data files after expiring snapshots. The default value is `true`.

   When set to true, older snapshots are removed from table metadata, and their underlying files are deleted. If this parameter is set to false, older snapshots are removed from table metadata but their underlying files remain in the storage as orphan files. 
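Because the `--table-optimizer-configuration` value is a JSON document nested inside a shell argument, quoting mistakes are easy to make. One way to avoid them is to generate the command programmatically. The following is a minimal sketch; the helper name `build_create_command` is hypothetical, and it omits the optional `vpcConfiguration` block.

```python
import json
import shlex


def build_create_command(catalog_id, database, table, role_arn,
                         days=5, keep=1, clean_expired_files=True):
    """Return a shell-safe `aws glue create-table-optimizer` command string."""
    config = {
        "roleArn": role_arn,
        "enabled": True,
        "retentionConfiguration": {
            "icebergConfiguration": {
                "snapshotRetentionPeriodInDays": days,
                "numberOfSnapshotsToRetain": keep,
                "cleanExpiredFiles": clean_expired_files,
            }
        },
    }
    args = [
        "aws", "glue", "create-table-optimizer",
        "--catalog-id", catalog_id,
        "--database-name", database,
        "--table-name", table,
        # json.dumps emits valid JSON booleans (true/false), and
        # shlex.join adds the shell quoting around the whole document.
        "--table-optimizer-configuration", json.dumps(config),
        "--type", "retention",
    ]
    return shlex.join(args)


command = build_create_command(
    "123456789012", "iceberg_db", "iceberg_table",
    "arn:aws:iam::123456789012:role/optimizer_role", days=7, keep=3,
)
print(command)
```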

------
#### [ AWS API ]

Call [CreateTableOptimizer](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-table-optimizers.html#aws-glue-api-table-optimizers-CreateTableOptimizer) operation to enable snapshot retention optimizer for a table.
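As an illustration, the same request can be sent with the AWS SDK for Python (Boto3) through the Glue client's `create_table_optimizer` method. This is a sketch under stated assumptions: the wrapper name `enable_snapshot_retention` is hypothetical, the identifiers in the usage comment are the placeholder values from the CLI example, and the optional `vpcConfiguration` block is omitted.

```python
def enable_snapshot_retention(glue, catalog_id, database, table, role_arn,
                              days=7, keep=3, clean_expired_files=True):
    """Create an enabled retention optimizer for one Iceberg table.

    `glue` is expected to be a Boto3 Glue client, for example
    boto3.client("glue").
    """
    return glue.create_table_optimizer(
        CatalogId=catalog_id,
        DatabaseName=database,
        TableName=table,
        Type="retention",
        TableOptimizerConfiguration={
            "roleArn": role_arn,
            "enabled": True,
            "retentionConfiguration": {
                "icebergConfiguration": {
                    "snapshotRetentionPeriodInDays": days,
                    "numberOfSnapshotsToRetain": keep,
                    "cleanExpiredFiles": clean_expired_files,
                }
            },
        },
    )


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   enable_snapshot_retention(
#       boto3.client("glue"), "123456789012", "iceberg_db", "iceberg_table",
#       "arn:aws:iam::123456789012:role/optimizer_role",
#   )
```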

------

After you enable snapshot retention, the **Table optimization** tab shows the following snapshot retention details (after approximately 15-20 minutes):

Start time  
The time at which the snapshot retention optimizer started. The value is a timestamp in UTC time. 

Run time  
The amount of time the optimizer took to complete the task.

Status  
The status of the optimizer run. Values are success or fail.

Data files deleted  
Total number of files deleted.

Manifest files deleted  
Total number of manifest files deleted.

Manifest lists deleted  
Total number of manifest lists deleted.
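The run details shown on the **Table optimization** tab can also be retrieved programmatically. The following sketch uses the Boto3 Glue client's `get_table_optimizer` method. The helper name `last_retention_run` is hypothetical, and the `TableOptimizer` and `lastRun` response keys reflect my understanding of the response shape; verify them against the current API reference before relying on them.

```python
def last_retention_run(glue, catalog_id, database, table):
    """Return the most recent retention optimizer run, or None if absent.

    `glue` is expected to be a Boto3 Glue client.
    """
    response = glue.get_table_optimizer(
        CatalogId=catalog_id,
        DatabaseName=database,
        TableName=table,
        Type="retention",
    )
    # Assumed response shape: {"TableOptimizer": {"lastRun": {...}, ...}}
    optimizer = response.get("TableOptimizer", {})
    return optimizer.get("lastRun")


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   run = last_retention_run(
#       boto3.client("glue"), "123456789012", "iceberg_db", "iceberg_table"
#   )
#   print(run)
```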

# Updating snapshot retention optimizer
<a name="update-snapshot-retention"></a>

You can update the existing configuration of a snapshot retention optimizer for a particular Apache Iceberg table using the AWS Glue console, AWS CLI, or the UpdateTableOptimizer API. 

------
#### [ Console ]

**To update snapshot retention configuration**

1. Sign in to the AWS Management Console and open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. Choose **Data Catalog** and choose **Tables**. From the tables list, choose the Iceberg table whose snapshot retention optimizer configuration you want to update.

1. In the lower section of the **Table details** page, select the **Table optimization** tab, and then choose **Edit**. You can also choose **Edit** under **Optimization** from the **Actions** menu located in the top-right corner of the page.

1.  On the **Edit optimization** page, make the desired changes. 

1.  Choose **Save**. 

------
#### [ AWS CLI ]

 To update a snapshot retention optimizer using the AWS CLI, you can use the following command: 

```
aws glue update-table-optimizer \
 --catalog-id 123456789012 \
 --database-name iceberg_db \
 --table-name iceberg_table \
 --table-optimizer-configuration '{"roleArn":"arn:aws:iam::123456789012:role/optimizer_role","enabled":true,"vpcConfiguration":{"glueConnectionName":"glue_connection_name"},"retentionConfiguration":{"icebergConfiguration":{"snapshotRetentionPeriodInDays":7,"numberOfSnapshotsToRetain":3,"cleanExpiredFiles":true}}}' \
 --type retention
```

This command updates the retention configuration for the specified table in the given catalog, database, and Region. The key parameters are:
+ `snapshotRetentionPeriodInDays` – The number of days to retain snapshots before expiring them. The default value is `5`.
+ `numberOfSnapshotsToRetain` – The minimum number of snapshots to keep, even if they are older than the retention period. The default value is `1`.
+ `cleanExpiredFiles` – A boolean indicating whether to delete expired data files after expiring snapshots. The default value is `true`.

   When set to true, older snapshots are removed from table metadata, and their underlying files are deleted. If this parameter is set to false, older snapshots are removed from table metadata but their underlying files remain in the storage as orphan files. 

------
#### [ AWS API ]

To update a table optimizer, you can use the `UpdateTableOptimizer` API. This API allows you to update the configuration of an existing table optimizer for compaction, retention, or orphan file removal. The request parameters include:
+ catalogId (required): The ID of the catalog containing the table 
+  databaseName (required): The name of the database containing the table 
+  tableName (required): The name of the table 
+  type (required): The type of table optimizer (`compaction`, `retention`, or `orphan_file_deletion`) 
+  tableOptimizerConfiguration (required): The updated configuration for the table optimizer, including role ARN, enabled status, retention configuration, and orphan file removal configuration. 
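One possible pattern for an update is read-modify-write: fetch the current configuration, change only the retention values, and send the full configuration back. The sketch below uses the Boto3 Glue client methods `get_table_optimizer` and `update_table_optimizer`. The helper name `update_retention_settings` is hypothetical, and the assumption that the current settings are returned under `TableOptimizer` and `configuration` should be checked against the API reference.

```python
import copy


def update_retention_settings(glue, catalog_id, database, table, **iceberg_overrides):
    """Merge new Iceberg retention values into the existing optimizer config.

    `glue` is expected to be a Boto3 Glue client. Keyword arguments such as
    snapshotRetentionPeriodInDays=10 override the stored values.
    """
    target = {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "retention",
    }
    current = glue.get_table_optimizer(**target)
    # Assumed response shape: {"TableOptimizer": {"configuration": {...}}}
    config = copy.deepcopy(current["TableOptimizer"]["configuration"])

    retention = config.setdefault("retentionConfiguration", {})
    iceberg = retention.setdefault("icebergConfiguration", {})
    iceberg.update(iceberg_overrides)

    glue.update_table_optimizer(TableOptimizerConfiguration=config, **target)
    return config


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   update_retention_settings(
#       boto3.client("glue"), "123456789012", "iceberg_db", "iceberg_table",
#       snapshotRetentionPeriodInDays=10,
#   )
```

Reusing the stored configuration keeps the role ARN and any VPC settings unchanged, so only the values you pass are modified.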

------

# Disabling snapshot retention optimizer
<a name="disable-snapshot-retention"></a>

You can disable the snapshot retention optimizer for a particular Apache Iceberg table using the AWS Glue console, AWS CLI, or AWS API. 

------
#### [ Console ]

**To disable snapshot retention**

1. Sign in to the AWS Management Console and open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. Choose **Data Catalog** and choose **Tables**. From the tables list, choose the Iceberg table that you want to disable the snapshot retention optimizer for.

1. In the lower section of the **Table details** page, choose the **Table optimization** tab, and then under **Actions**, choose **Disable**, **Snapshot retention**.

   You can also choose **Disable** under **Optimization** from the **Actions** menu located in the top-right corner of the page.

1.  Choose **Disable** on the confirmation message. You can re-enable the snapshot retention optimizer at a later time. 

    After you confirm, the snapshot retention optimizer is disabled and the status for snapshot retention returns to `Not enabled`.

------
#### [ AWS CLI ]

In the following example, replace the account ID with a valid AWS account ID. Replace the database name and table name with the names of your database and Iceberg table. Replace the `roleArn` value with the Amazon Resource Name (ARN) of the IAM role that has the required permissions to run the retention optimizer.

```
aws glue update-table-optimizer \
  --catalog-id 123456789012 \
  --database-name iceberg_db \
  --table-name iceberg_table \
  --table-optimizer-configuration '{"roleArn":"arn:aws:iam::123456789012:role/optimizer_role","vpcConfiguration":{"glueConnectionName":"glue_connection_name"},"enabled":false}' \
  --type retention
```

------
#### [ AWS API ]

Call [UpdateTableOptimizer](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-table-optimizers.html#aws-glue-api-table-optimizers-UpdateTableOptimizer) operation to disable the snapshot retention optimizer for a specific table.
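For illustration, a minimal Boto3 sketch of this call follows. It mirrors the CLI example by sending the role ARN with `enabled` set to `False`. The wrapper name `disable_snapshot_retention` is hypothetical, and the optional `vpcConfiguration` block is omitted.

```python
def disable_snapshot_retention(glue, catalog_id, database, table, role_arn):
    """Turn off the retention optimizer for one Iceberg table.

    `glue` is expected to be a Boto3 Glue client.
    """
    return glue.update_table_optimizer(
        CatalogId=catalog_id,
        DatabaseName=database,
        TableName=table,
        Type="retention",
        TableOptimizerConfiguration={
            "roleArn": role_arn,
            "enabled": False,  # the optimizer stays defined but stops running
        },
    )


# Usage (requires boto3 and AWS credentials):
#   import boto3
#   disable_snapshot_retention(
#       boto3.client("glue"), "123456789012", "iceberg_db", "iceberg_table",
#       "arn:aws:iam::123456789012:role/optimizer_role",
#   )
```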

------