

# Accessing table data
<a name="s3-tables-access"></a>

There are multiple ways to access tables in Amazon S3 table buckets. You can integrate tables with AWS analytics services using AWS Glue Data Catalog, or access tables directly using the Amazon S3 Tables Iceberg REST endpoint or the Amazon S3 Tables Catalog for Apache Iceberg. The access method you use will depend on your catalog setup, governance model, and access control needs. The following is an overview of these access methods.

**AWS Glue Data Catalog integration**  
This is the recommended access method for working with tables in S3 table buckets. This integration gives you a unified view of your data estate across multiple AWS analytics services through the AWS Glue Data Catalog. After integration, you can query tables using services such as Athena and Amazon Redshift. Access to tables is managed using IAM permissions. To access tables using this integration, the IAM identity you use needs access to your S3 Tables resources and actions, AWS Glue Data Catalog objects, and the query engine you're using. For more information, see [Access management for S3 Tables](s3-tables-setting-up.md).

**Direct access**  
Use this method if you need to work with AWS Partner Network (APN) catalog implementations, custom catalog implementations, or if you only need to perform basic read/write operations on tables within a single table bucket. Access to tables is managed using IAM permissions. To access tables, the IAM identity you use needs access to your table resources and S3 Tables actions. For more information, see [Access management for S3 Tables](s3-tables-setting-up.md).

## Accessing tables through the AWS Glue Data Catalog integration
<a name="table-access-gdc-integration"></a>

You can integrate S3 table buckets with AWS Glue Data Catalog to access tables from AWS analytics services, such as Amazon Athena, Amazon Redshift, and Amazon QuickSight. The integration populates the AWS Glue Data Catalog with your table resources and federates access to those resources. For more information about integrating, see [Integrating Amazon S3 Tables with AWS analytics services](s3-tables-integrating-aws.md).

The following AWS analytics services can access tables through this integration:
+ [Amazon Athena](s3-tables-integrating-athena.md)
+ [Amazon Redshift](s3-tables-integrating-redshift.md)
+ [Amazon EMR](s3-tables-integrating-emr.md)
+ [Amazon QuickSight](s3-tables-integrating-quicksight.md)
+ [Amazon Data Firehose](s3-tables-integrating-firehose.md)
+ [AWS Glue ETL](s3-tables-integrating-glue.md)
+ [Querying S3 Tables with SageMaker Unified Studio](s3-tables-integrating-sagemaker.md)

### Accessing tables using the AWS Glue Iceberg REST endpoint
<a name="table-access-glue-irc"></a>

Once your S3 table buckets are integrated with AWS Glue Data Catalog, you can also use the AWS Glue Iceberg REST endpoint to connect to S3 tables from third-party query engines that support Iceberg. For more information, see [Accessing Amazon S3 tables using the AWS Glue Iceberg REST endpoint](s3-tables-integrating-glue-endpoint.md).

We recommend using the AWS Glue Iceberg REST endpoint when you want to access tables from Spark, PyIceberg, or other Iceberg-compatible clients.

The following clients can access tables directly through the AWS Glue Iceberg REST endpoint:
+ Any Iceberg client, including Spark, PyIceberg, and more.

## Accessing tables directly
<a name="table-access-direct"></a>

You can access tables directly from open source query engines through methods that bridge S3 Tables management operations to your Apache Iceberg analytics applications. There are two direct access methods: the Amazon S3 Tables Iceberg REST endpoint and the Amazon S3 Tables Catalog for Apache Iceberg. Of the two, the REST endpoint is recommended.

We recommend direct access if your tables are in self-managed catalog implementations, or if you only need to perform basic read/write operations on tables in a single table bucket. For other access scenarios, we recommend the AWS Glue Data Catalog integration.

Direct access to tables is managed through either IAM identity-based policies or resource-based policies attached to tables and table buckets.
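For example, an identity-based policy that allows basic read/write operations on tables in a single table bucket might look like the following sketch. The Region, account ID, and bucket name are placeholders, and the exact set of `s3tables` actions your client needs depends on the operations it performs.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3tables:GetTableBucket",
                "s3tables:GetNamespace",
                "s3tables:GetTable",
                "s3tables:GetTableMetadataLocation",
                "s3tables:UpdateTableMetadataLocation",
                "s3tables:GetTableData",
                "s3tables:PutTableData"
            ],
            "Resource": [
                "arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket",
                "arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket/table/*"
            ]
        }
    ]
}
```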

### Accessing tables through the Amazon S3 Tables Iceberg REST endpoint
<a name="access-tables-irc"></a>

You can use the Amazon S3 Tables Iceberg REST endpoint to access your tables directly from any Iceberg REST-compatible client through HTTP endpoints. For more information, see [Accessing tables using the Amazon S3 Tables Iceberg REST endpoint](s3-tables-integrating-open-source.md).

The following AWS analytics services and query engines can access tables directly using the Amazon S3 Tables Iceberg REST endpoint:

**Supported query engines**
+ Any Iceberg client, including Spark, PyIceberg, and more.
+ [Amazon EMR](s3-tables-integrating-emr.md)
+ [AWS Glue ETL](s3-tables-integrating-glue.md)

### Accessing tables directly through the Amazon S3 Tables Catalog for Apache Iceberg
<a name="access-client-catalog"></a>

You can also access tables directly from query engines like Apache Spark by using the S3 Tables client catalog. For more information, see [Accessing Amazon S3 tables with the Amazon S3 Tables Catalog for Apache Iceberg](s3-tables-client-catalog.md). However, we recommend using the Amazon S3 Tables Iceberg REST endpoint for direct access because it supports more applications without requiring language-specific or engine-specific code.

The following query engines can access tables directly using the client catalog:
+ [Apache Spark](s3-tables-client-catalog.md#s3-tables-integrating-open-source-spark)

# Amazon S3 Tables integration with AWS analytics services overview
<a name="s3-tables-integration-overview"></a>

To make tables in your account accessible by AWS analytics services, you integrate your Amazon S3 table buckets with AWS Glue Data Catalog. This integration allows AWS analytics services to automatically discover and access your table data. You can use this integration to work with your tables in these services:
+ [Amazon Athena](s3-tables-integrating-athena.md)
+ [Amazon Redshift](s3-tables-integrating-redshift.md)
+ [Amazon EMR](s3-tables-integrating-emr.md)
+ [Amazon QuickSight](s3-tables-integrating-quicksight.md)
+ [Amazon Data Firehose](s3-tables-integrating-firehose.md)

**Note**  
This integration uses the AWS Glue and AWS Lake Formation services and might incur AWS Glue request and storage costs. For more information, see [AWS Glue Pricing](https://aws.amazon.com/glue/pricing/).  
Additional pricing applies for running queries on your S3 tables. For more information, see pricing information for the query engine that you're using.

## How the integration works
<a name="how-table-integration-works"></a>

When you integrate S3 Tables with AWS analytics services, Amazon S3 adds a catalog named `s3tablescatalog` to the AWS Glue Data Catalog in the current Region. Adding the `s3tablescatalog` catalog allows all your table buckets, namespaces, and tables to be populated in the Data Catalog.

**Note**  
These actions are automated through the Amazon S3 console. If you perform this integration programmatically, you must manually take these actions.

You integrate your table buckets once per AWS Region. After the integration is completed, all current and future table buckets, namespaces, and tables are added to the AWS Glue Data Catalog in that Region.

The following illustration shows how the `s3tablescatalog` catalog automatically populates table buckets, namespaces, and tables in the current Region as corresponding objects in the Data Catalog. Table buckets are populated as subcatalogs. Namespaces within a table bucket are populated as databases within their respective subcatalogs. Tables are populated as tables in their respective databases.

![\[The ways that table resources are represented in AWS Glue Data Catalog.\]](http://docs.aws.amazon.com/AmazonS3/latest/userguide/images/S3Tables-glue-catalog.png)


After integrating with the Data Catalog, you can create Apache Iceberg tables in table buckets and access them from AWS analytics engines such as Amazon Athena and Amazon EMR, as well as from third-party analytics engines.
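You can observe this mapping from the AWS CLI. The following sketch assumes the integration is complete and uses illustrative account and bucket values; it lists the databases (that is, your namespaces) under a table bucket's subcatalog by passing the multi-level catalog ID:

```
aws glue get-databases \
--catalog-id "111122223333:s3tablescatalog/amzn-s3-demo-table-bucket"
```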

**How permissions work**  
We recommend integrating your table buckets with AWS analytics services so that you can work with your table data across services that use the AWS Glue Data Catalog as a metadata store. Once the integration is enabled, you can use AWS Identity and Access Management (IAM) permissions to grant access to S3 Tables resources and their associated Data Catalog objects.

Make sure that you follow the steps in [Integrating S3 Tables with AWS analytics services](s3-tables-integrating-aws.md) so that you have the appropriate permissions to access the AWS Glue Data Catalog and your table resources, and to work with AWS analytics services.

## Regions supported
<a name="regions-supported-integration-overview"></a>

S3 Tables integration with AWS analytics services uses the AWS Glue Data Catalog with IAM-based access controls in the following Regions. In all other Regions, the integration also requires AWS Lake Formation.
+ US East (N. Virginia)
+ US East (Ohio)
+ US West (N. California)
+ US West (Oregon)
+ Africa (Cape Town)
+ Asia Pacific (Hong Kong)
+ Asia Pacific (Taipei)
+ Asia Pacific (Tokyo)
+ Asia Pacific (Seoul)
+ Asia Pacific (Osaka)
+ Asia Pacific (Mumbai)
+ Asia Pacific (Hyderabad)
+ Asia Pacific (Singapore)
+ Asia Pacific (Sydney)
+ Asia Pacific (Jakarta)
+ Asia Pacific (Melbourne)
+ Asia Pacific (Malaysia)
+ Asia Pacific (New Zealand)
+ Asia Pacific (Thailand)
+ Canada (Central)
+ Canada West (Calgary)
+ Europe (Frankfurt)
+ Europe (Zurich)
+ Europe (Stockholm)
+ Europe (Milan)
+ Europe (Spain)
+ Europe (Ireland)
+ Europe (London)
+ Europe (Paris)
+ Israel (Tel Aviv)
+ Mexico (Central)
+ South America (São Paulo)

## Next steps
<a name="next-steps-integration-overview"></a>
+ [Integrating S3 Tables with AWS analytics services](s3-tables-integrating-aws.md)
+ [Create a namespace](s3-tables-namespace-create.md)
+ [Create a table](s3-tables-create.md)

# Integrating Amazon S3 Tables with AWS analytics services
<a name="s3-tables-integrating-aws"></a>

This topic covers the prerequisites and procedures needed to integrate your Amazon S3 table buckets with AWS analytics services. For an overview of how the integration works, see [S3 Tables integration overview](s3-tables-integration-overview.md).

**Note**  
This integration uses the AWS Glue Data Catalog and might incur AWS Glue request and storage costs. For more information, see [AWS Glue Pricing](https://aws.amazon.com/glue/pricing/).  
Additional pricing applies for running queries on S3 Tables. For more information, see pricing information for the query engine that you're using.

## Prerequisites for integration
<a name="table-integration-prerequisites"></a>

The following prerequisites are required to integrate table buckets with AWS analytics services:
+ [Create a table bucket.](s3-tables-buckets-create.md)
+ Add the following AWS Glue permissions to your AWS Identity and Access Management (IAM) principal:
  + `glue:CreateCatalog`, which is required to create the `s3tablescatalog` federated catalog in the Data Catalog.
  + `glue:PassConnection`, which grants the calling principal the right to delegate `aws:s3tables` connection creation to the Amazon S3 service.
+ [Update to the latest version of the AWS Command Line Interface (AWS CLI)](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html#getting-started-install-instructions).

**Important**  
When creating tables, make sure that you use all lowercase letters in your table names and table definitions. For example, make sure that your column names are all lowercase. If your table name or table definition contains capital letters, the table isn't supported by AWS Lake Formation or the AWS Glue Data Catalog. In this case, your table won't be visible to AWS analytics services such as Amazon Athena, even if your table buckets are integrated with AWS analytics services.   
If your table definition contains capital letters, you receive the following error message when running a `SELECT` query in Athena: "GENERIC\_INTERNAL\_ERROR: Get table request failed: com.amazonaws.services.glue.model.ValidationException: Unsupported Federation Resource - Invalid table or column names."
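Because of this requirement, it can help to validate names client-side before creating tables. The following Python sketch assumes a simplified rule (lowercase letters, digits, and underscores only); refer to the S3 Tables naming rules for the authoritative constraints:

```
import re

def is_catalog_compatible(name: str) -> bool:
    """Illustrative check: True if a table or column name contains only
    lowercase letters, digits, and underscores."""
    return re.fullmatch(r"[a-z0-9_]+", name) is not None

print(is_catalog_compatible("daily_sales"))  # True
print(is_catalog_compatible("DailySales"))   # False
```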

## Integrating table buckets with AWS analytics services
<a name="table-integration-procedures"></a>

You can integrate table buckets with Data Catalog and AWS analytics services using IAM access controls by default, or optionally use Lake Formation access controls.

When you integrate using IAM access controls, the IAM identity that you use requires permissions for your Amazon S3 table buckets and tables, the Data Catalog objects, and the query engine that you're using. If you choose to integrate using Lake Formation, both IAM access controls and Lake Formation grants determine access to Data Catalog resources. To learn more about the Lake Formation integration, see [https://docs.aws.amazon.com/lake-formation/latest/dg/create-s3-tables-catalog.html](https://docs.aws.amazon.com/lake-formation/latest/dg/create-s3-tables-catalog.html).

The following sections describe how to use the Amazon S3 console or the AWS CLI to configure the integration with IAM access controls.

### Using the S3 console
<a name="integrate-console"></a>

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. In the left navigation pane, choose **Table buckets**.

1. Choose **Create table bucket**.

   The **Create table bucket** page opens.

1. Enter a **Table bucket name** and make sure that the **Enable integration** checkbox is selected.

1. Choose **Create table bucket**. Amazon S3 will attempt to automatically integrate your table buckets in that Region.

### Using the AWS CLI
<a name="integrate-cli"></a>

**To integrate table buckets with IAM access controls using the AWS CLI**

The following steps show how to use the AWS CLI to integrate table buckets. To use these steps, replace the `user input placeholders` with your own information.

1. Create a table bucket.

   ```
   aws s3tables create-table-bucket \
   --region us-east-1 \
   --name amzn-s3-demo-table-bucket
   ```

1. Create a file called `catalog.json` that contains the following catalog:

   ```
   {
      "Name": "s3tablescatalog",
      "CatalogInput": {
         "FederatedCatalog": {
            "Identifier": "arn:aws:s3tables:us-east-1:111122223333:bucket/*",
            "ConnectionName": "aws:s3tables"
         },
         "CreateDatabaseDefaultPermissions": [
            {
               "Principal": {
                  "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
               },
               "Permissions": ["ALL"]
            }
         ],
         "CreateTableDefaultPermissions": [
            {
               "Principal": {
                  "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
               },
               "Permissions": ["ALL"]
            }
         ],
         "AllowFullTableExternalDataAccess": "True"
      }
   }
   ```

   Create the `s3tablescatalog` catalog by using the following command. Creating this catalog populates the AWS Glue Data Catalog with objects corresponding to table buckets, namespaces, and tables.

   ```
   aws glue create-catalog \
   --region us-east-1 \
   --cli-input-json file://catalog.json
   ```

1. Verify that the `s3tablescatalog` catalog was added in AWS Glue by using the following command:

   ```
   aws glue get-catalog --catalog-id s3tablescatalog
   ```

### Migrating to the updated integration process
<a name="migrate-integrate-console"></a>

The AWS analytics services integration process has been updated to use IAM permissions by default. If you've already set up the integration, you can continue to use your current integration. However, if you want to change your existing integration to use IAM permissions instead, see [https://docs.aws.amazon.com/lake-formation/latest/dg/create-s3-tables-catalog.html](https://docs.aws.amazon.com/lake-formation/latest/dg/create-s3-tables-catalog.html). Alternatively, you can delete your existing setup in the AWS Glue Data Catalog and AWS Lake Formation and then re-run the integration, as described in the following steps. Doing so removes all existing Lake Formation grants and associated access permissions for the `s3tablescatalog` catalog.

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/), and sign in as a data lake administrator. For more information about how to create a data lake administrator, see [Create a data lake administrator](https://docs.aws.amazon.com/lake-formation/latest/dg/initial-lf-config.html#create-data-lake-admin) in the *AWS Lake Formation Developer Guide*.

1. Delete your `s3tablescatalog` catalog by doing the following: 
   + In the left navigation pane, choose **Catalogs**. 
   + Select the option button next to the `s3tablescatalog` catalog in the **Catalogs** list. On the **Actions** menu, choose **Delete**.

1. Deregister the data location for the `s3tablescatalog` catalog by doing the following:
   + In the left navigation pane, go to the **Administration** section, and choose **Data lake locations**. 
   + Select the option button next to the `s3tablescatalog` data lake location, for example, `s3://tables:region:account-id:bucket/*`. 
   + On the **Actions** menu, choose **Remove**. 
   + In the confirmation dialog box that appears, choose **Remove**. 

1. Now that you've deleted your `s3tablescatalog` catalog and data lake location, you can follow the steps to [integrate your table buckets with AWS analytics services](#table-integration-procedures) by using the updated integration process. 

**Note**  
If you want to work with SSE-KMS encrypted tables in integrated AWS analytics services, the role you use needs to have permission to use your AWS KMS key for encryption operations. For more information, see [Granting IAM principals permissions to work with encrypted tables in integrated AWS analytics services](s3-tables-kms-permissions.md#tables-kms-integration-permissions).

**Next steps**
+ [Create a namespace](s3-tables-namespace-create.md).
+ [Create a table](s3-tables-create.md).

# Accessing Amazon S3 tables using the AWS Glue Iceberg REST endpoint
<a name="s3-tables-integrating-glue-endpoint"></a>

After your S3 table buckets are integrated with the AWS Glue Data Catalog, you can use the AWS Glue Iceberg REST endpoint to connect to your S3 tables from Apache Iceberg-compatible clients, such as PyIceberg or Spark. The AWS Glue Iceberg REST endpoint implements the [Iceberg REST Catalog Open API specification](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml), which provides a standardized interface for interacting with Iceberg tables. To access S3 tables using the endpoint, you need to configure permissions through a combination of IAM policies and AWS Lake Formation grants. The following sections explain how to set up access, including creating the necessary IAM role, defining the required policies, and establishing Lake Formation permissions for both database-level and table-level access.

For an end-to-end walkthrough using PyIceberg, see [Access data in Amazon S3 Tables using PyIceberg through the AWS Glue Iceberg REST endpoint](https://aws.amazon.com/blogs/storage/access-data-in-amazon-s3-tables-using-pyiceberg-through-the-aws-glue-iceberg-rest-endpoint/).

**Prerequisites**
+ [Integrate your table buckets with AWS analytics services](s3-tables-integrating-aws.md)
+ [Create a table namespace](s3-tables-namespace-create.md)
+ [Have access to a data lake administrator account](https://docs.aws.amazon.com//lake-formation/latest/dg/initial-lf-config.html#create-data-lake-admin)

## Create an IAM role for your client
<a name="glue-endpoint-create-iam-role"></a>

To access tables through AWS Glue endpoints, you need to create an IAM role with permissions to AWS Glue and Lake Formation actions. This procedure explains how to create this role and configure its permissions.

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the left navigation pane, choose **Policies**.

1. Choose **Create policy**, and then choose **JSON** in the policy editor.

1. Add the following inline policy that grants permissions to access AWS Glue and Lake Formation actions:

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "VisualEditor0",
               "Effect": "Allow",
               "Action": [
                   "glue:GetCatalog",
                   "glue:GetDatabase",
                   "glue:GetDatabases",
                   "glue:GetTable",
                   "glue:GetTables",
                   "glue:CreateTable",
                   "glue:UpdateTable"
               ],
               "Resource": [
                   "arn:aws:glue:us-east-1:111122223333:catalog",
                   "arn:aws:glue:us-east-1:111122223333:catalog/s3tablescatalog",
                   "arn:aws:glue:us-east-1:111122223333:catalog/s3tablescatalog/amzn-s3-demo-table-bucket",
                   "arn:aws:glue:us-east-1:111122223333:table/s3tablescatalog/amzn-s3-demo-table-bucket/<namespace>/*",
                   "arn:aws:glue:us-east-1:111122223333:database/s3tablescatalog/amzn-s3-demo-table-bucket/<namespace>"
               ]
           },
           {
               "Effect": "Allow",
               "Action": [
                   "lakeformation:GetDataAccess"
               ],
               "Resource": "*"
           }
       ]
   }
   ```

------

1. After you create the policy, create an IAM role and choose **Custom trust policy** as the **Trusted entity type**.

1. Enter the following for the **Custom trust policy**.

------
#### [ JSON ]


   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Principal": {
                   "AWS": "arn:aws:iam::111122223333:role/Admin_role"
               },
               "Action": "sts:AssumeRole",
               "Condition": {}
           }
       ]
   }
   ```

------

## Define access in Lake Formation
<a name="define-access-lakeformation"></a>

Lake Formation provides fine-grained access control for your data lake tables. When you integrated your S3 bucket with the AWS Glue Data Catalog, your tables were automatically registered as resources in Lake Formation. To access these tables, you must grant specific Lake Formation permissions to your IAM identity, in addition to its IAM policy permissions.

The following steps explain how to apply Lake Formation access controls to allow your Iceberg client to connect to your tables. You must sign in as a data lake administrator to apply these permissions.

### Allow external engines to access table data
<a name="allow-external-engines"></a>

In Lake Formation, you must enable full table access for external engines to access data. This allows third-party applications to get temporary credentials from Lake Formation when using an IAM role that has full permissions on the requested table.

1. Open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/), and sign in as a data lake administrator.

1. In the navigation pane under **Administration**, choose **Application integration settings**.

1. Select **Allow external engines to access data in Amazon S3 locations with full table access**. Then choose **Save**.

### Grant Lake Formation permissions on your table resources
<a name="grant-lakeformation-permissions"></a>

Next, grant Lake Formation permissions to the IAM role you created for your Iceberg-compatible client. These permissions will allow the role to create and manage tables in your namespace. You need to provide both database and table-level permissions. For more information, see [Granting Lake Formation permission on a table or database](grant-permissions-tables.md#grant-lf-table).
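As an illustrative sketch of what these grants can look like from the AWS CLI (the role name, namespace, and catalog ID values here are assumptions; follow the linked procedure for the authoritative steps), you might grant database-level `DESCRIBE` and table-level permissions to the role:

```
aws lakeformation grant-permissions \
--principal DataLakePrincipalIdentifier="arn:aws:iam::111122223333:role/glue-irc-role" \
--permissions "DESCRIBE" \
--resource '{"Database": {"CatalogId": "111122223333:s3tablescatalog/amzn-s3-demo-table-bucket", "Name": "my_namespace"}}'

aws lakeformation grant-permissions \
--principal DataLakePrincipalIdentifier="arn:aws:iam::111122223333:role/glue-irc-role" \
--permissions "ALL" \
--resource '{"Table": {"CatalogId": "111122223333:s3tablescatalog/amzn-s3-demo-table-bucket", "DatabaseName": "my_namespace", "TableWildcard": {}}}'
```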

## Set up your environment to use the endpoint
<a name="setup-client-glue-irc"></a>

After you have set up the IAM role with the permissions required for table access, you can use it to run Iceberg clients from your local machine. Assume the role with the following AWS CLI command, and configure your client with the temporary credentials that the command returns:

```
aws sts assume-role --role-arn "arn:aws:iam::<accountid>:role/<glue-irc-role>" --role-session-name <glue-irc-role>
```
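The `assume-role` command returns temporary credentials rather than configuring anything itself. One way to export them for your client to pick up, sketched here assuming `jq` is installed and using a placeholder role name:

```
creds=$(aws sts assume-role \
--role-arn "arn:aws:iam::111122223333:role/glue-irc-role" \
--role-session-name glue-irc-session \
--query 'Credentials' --output json)

export AWS_ACCESS_KEY_ID=$(echo "$creds" | jq -r .AccessKeyId)
export AWS_SECRET_ACCESS_KEY=$(echo "$creds" | jq -r .SecretAccessKey)
export AWS_SESSION_TOKEN=$(echo "$creds" | jq -r .SessionToken)
```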

To access tables through the AWS Glue Iceberg REST endpoint, you need to initialize a catalog in your Iceberg-compatible client. This initialization requires specifying custom properties, including SigV4 properties, the endpoint URI, and the warehouse location. Specify these properties as follows:
+ SigV4 properties - SigV4 signing must be enabled, and the signing name is `glue`.
+ Warehouse location - This is your table bucket, specified in this format: `<accountid>:s3tablescatalog/<table-bucket-name>`.
+ Endpoint URI - Refer to the AWS Glue service endpoints reference for the Region-specific endpoint.

The following example shows how to initialize a PyIceberg catalog. Replace the *placeholder values* with your own information.

```
from pyiceberg.catalog import load_catalog

rest_catalog = load_catalog(
    "s3tablescatalog",
    **{
        "type": "rest",
        "warehouse": "<accountid>:s3tablescatalog/<table-bucket-name>",
        "uri": "https://glue.<region>.amazonaws.com/iceberg",
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "glue",
        "rest.signing-region": "<region>"
    }
)
```

For additional information about the AWS Glue Iceberg REST endpoint implementation, see [Connecting to the Data Catalog using AWS Glue Iceberg REST endpoint](https://docs.aws.amazon.com/glue/latest/dg/connect-glu-iceberg-rest.html) in the *AWS Glue User Guide*.

# Accessing tables using the Amazon S3 Tables Iceberg REST endpoint
<a name="s3-tables-integrating-open-source"></a>

You can connect your Iceberg REST client to the Amazon S3 Tables Iceberg REST endpoint and make REST API calls to create, update, or query tables in S3 table buckets. The endpoint implements a set of standardized Iceberg REST APIs specified in the [Apache Iceberg REST Catalog Open API specification](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml). The endpoint works by translating Iceberg REST API operations into corresponding S3 Tables operations.

**Note**  
The Amazon S3 Tables Iceberg REST endpoint can be used to access tables in AWS Partner Network (APN) catalog implementations or custom catalog implementations. It can also be used if you only need basic read/write access to a single table bucket. For other access scenarios, we recommend using the AWS Glue Iceberg REST endpoint to connect to tables, which provides unified table management, centralized governance, and fine-grained access control. For more information, see [Accessing Amazon S3 tables using the AWS Glue Iceberg REST endpoint](s3-tables-integrating-glue-endpoint.md).

## Configuring the endpoint
<a name="configure-endpoint"></a>

You connect to the Amazon S3 Tables Iceberg REST endpoint using the service endpoint. S3 Tables Iceberg REST endpoints have the following format:

```
https://s3tables.<REGION>.amazonaws.com/iceberg
```

Refer to [S3 Tables AWS Regions and endpoints](s3-tables-regions-quotas.md#s3-tables-regions) for the Region-specific endpoints.

**Catalog configuration properties**

When using an Iceberg client to connect an analytics engine to the service endpoint, you must specify the following configuration properties when you initialize the catalog. Replace the *placeholder values* with the information for your Region and table bucket.
+ The region-specific endpoint as the endpoint URI: `https://s3tables.<REGION>.amazonaws.com/iceberg`
+ Your table bucket ARN as the warehouse location: `arn:aws:s3tables:<region>:<accountID>:bucket/<bucketname>`
+ SigV4 properties for authentication. The SigV4 signing name for the service endpoint requests is: `s3tables`

The following examples show you how to configure different clients to use the Amazon S3 Tables Iceberg REST endpoint.

------
#### [ PyIceberg ]

To use the Amazon S3 Tables Iceberg REST endpoint with PyIceberg, specify the following application configuration properties:

```
rest_catalog = load_catalog(
  catalog_name,
  **{
    "type": "rest",    
    "warehouse":"arn:aws:s3tables:<Region>:<accountID>:bucket/<bucketname>",
    "uri": "https://s3tables.<Region>.amazonaws.com/iceberg",
    "rest.sigv4-enabled": "true",
    "rest.signing-name": "s3tables",
    "rest.signing-region": "<Region>"
  }
)
```

------
#### [ Apache Spark ]

To use the Amazon S3 Tables Iceberg REST endpoint with Spark, specify the following application configuration properties, replacing the *placeholder values* with the information for your Region and table bucket.

```
spark-shell \
  --packages "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.4.1,software.amazon.awssdk:bundle:2.20.160,software.amazon.awssdk:url-connection-client:2.20.160" \
  --master "local[*]" \
  --conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
  --conf "spark.sql.defaultCatalog=spark_catalog" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkCatalog" \
  --conf "spark.sql.catalog.spark_catalog.type=rest" \
  --conf "spark.sql.catalog.spark_catalog.uri=https://s3tables.<Region>.amazonaws.com/iceberg" \
  --conf "spark.sql.catalog.spark_catalog.warehouse=arn:aws:s3tables:<Region>:<accountID>:bucket/<bucketname>" \
  --conf "spark.sql.catalog.spark_catalog.rest.sigv4-enabled=true" \
  --conf "spark.sql.catalog.spark_catalog.rest.signing-name=s3tables" \
  --conf "spark.sql.catalog.spark_catalog.rest.signing-region=<Region>" \
  --conf "spark.sql.catalog.spark_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO" \
  --conf "spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider" \
  --conf "spark.sql.catalog.spark_catalog.rest-metrics-reporting-enabled=false"
```

------

## Authenticating and authorizing access to the endpoint
<a name="tables-endpoint-auth"></a>

API requests to the S3 Tables service endpoints are authenticated using AWS Signature Version 4 (SigV4). See [AWS Signature Version 4 for API requests](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_sigv.html) to learn more about AWS SigV4.

The SigV4 signing name for Amazon S3 Tables Iceberg REST endpoint requests is: `s3tables`

Requests to the Amazon S3 Tables Iceberg REST endpoint are authorized using `s3tables` IAM actions corresponding to the REST API operations. These permissions can be defined in either IAM identity-based policies or resource-based policies attached to tables and table buckets. For more information, see [Access management for S3 Tables](s3-tables-setting-up.md).

You can track requests made to your tables through the REST endpoint with AWS CloudTrail. Requests are logged as their corresponding S3 Tables actions. For example, a `LoadTable` API call generates a management event for the `GetTableMetadataLocation` operation and a data event for the `GetTableData` operation. For more information, see [Logging with AWS CloudTrail for S3 Tables](s3-tables-logging.md).

## Prefix and path parameters
<a name="endpoint-parameter"></a>

Iceberg REST catalog APIs include a free-form prefix in their request URLs. For example, the `ListNamespaces` API call uses the `GET /v1/{prefix}/namespaces` URL format. For S3 Tables, the REST path `{prefix}` is always your URL-encoded table bucket ARN.

For example, for the table bucket ARN `arn:aws:s3tables:us-east-1:111122223333:bucket/bucketname`, the prefix is `arn%3Aaws%3As3tables%3Aus-east-1%3A111122223333%3Abucket%2Fbucketname`.
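Because the prefix is just a URL-encoded ARN, you can compute it with any standard URL-encoding routine. The following is an illustrative sketch in Python, not part of any AWS SDK:

```python
from urllib.parse import quote

# Table bucket ARN to use as the REST path {prefix}
table_bucket_arn = "arn:aws:s3tables:us-east-1:111122223333:bucket/bucketname"

# safe="" forces every reserved character to be encoded, including ':' and '/'
prefix = quote(table_bucket_arn, safe="")
print(prefix)
# arn%3Aaws%3As3tables%3Aus-east-1%3A111122223333%3Abucket%2Fbucketname
```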

### Namespace path parameter
<a name="endpoint-parameter-namespace"></a>

Namespaces in an Iceberg REST catalog API path can have multiple levels, but S3 Tables supports only single-level namespaces. To work with a namespace in a multi-level catalog hierarchy, connect to the catalog one level above the namespace and then reference the namespace directly. This lets any query engine that supports the three-part `catalog.namespace.table` notation access objects in the S3 Tables catalog hierarchy without the compatibility issues of multi-level namespaces.

## Supported Iceberg REST API operations
<a name="endpoint-supported-api"></a>

The following table contains the supported Iceberg REST APIs and how they correspond to S3 Tables actions. 


| Iceberg REST operation | REST path | S3 Tables IAM action | CloudTrail EventName | 
| --- | --- | --- | --- | 
|  `getConfig`  |  `GET /v1/config`  |  `s3tables:GetTableBucket`  |  `s3tables:GetTableBucket`  | 
|  `listNamespaces`  |  `GET /v1/{prefix}/namespaces`  |  `s3tables:ListNamespaces`  |  `s3tables:ListNamespaces`  | 
|  `createNamespace`  |  `POST /v1/{prefix}/namespaces`  |  `s3tables:CreateNamespace`  |  `s3tables:CreateNamespace`  | 
|  `loadNamespaceMetadata`  |  `GET /v1/{prefix}/namespaces/{namespace}`  |  `s3tables:GetNamespace`  |  `s3tables:GetNamespace`  | 
|  `dropNamespace`  |  `DELETE /v1/{prefix}/namespaces/{namespace}`  |  `s3tables:DeleteNamespace`  |  `s3tables:DeleteNamespace`  | 
|  `listTables`  |  `GET /v1/{prefix}/namespaces/{namespace}/tables`  |  `s3tables:ListTables`  |  `s3tables:ListTables`  | 
|  `createTable`  |  `POST /v1/{prefix}/namespaces/{namespace}/tables`  |  `s3tables:CreateTable`, `s3tables:PutTableData`  |  `s3tables:CreateTable`, `s3tables:PutObject`  | 
|  `loadTable`  |  `GET /v1/{prefix}/namespaces/{namespace}/tables/{table}`  |  `s3tables:GetTableMetadataLocation`, `s3tables:GetTableData`  |  `s3tables:GetTableMetadataLocation`, `s3tables:GetObject`  | 
|  `updateTable`  |  `POST /v1/{prefix}/namespaces/{namespace}/tables/{table}`  |  `s3tables:UpdateTableMetadataLocation`, `s3tables:PutTableData`, `s3tables:GetTableData`  |  `s3tables:UpdateTableMetadataLocation`, `s3tables:PutObject`, `s3tables:GetObject`  | 
|  `dropTable`  |  `DELETE /v1/{prefix}/namespaces/{namespace}/tables/{table}`  |  `s3tables:DeleteTable`  |  `s3tables:DeleteTable`  | 
|  `renameTable`  |  `POST /v1/{prefix}/tables/rename`  |  `s3tables:RenameTable`  |  `s3tables:RenameTable`  | 
|  `tableExists`  |  `HEAD /v1/{prefix}/namespaces/{namespace}/tables/{table}`  |  `s3tables:GetTable`  |  `s3tables:GetTable`  | 
|  `namespaceExists`  |  `HEAD /v1/{prefix}/namespaces/{namespace}`  |  `s3tables:GetNamespace`  |  `s3tables:GetNamespace`  | 

## Considerations and limitations
<a name="endpoint-considerations"></a>

Following are considerations and limitations when using the Amazon S3 Tables Iceberg REST endpoint.

**Considerations**
+ **CreateTable API behavior** – The `stage-create` option is not supported for this operation, and results in a `400 Bad Request` error. This means you cannot create a table from query results using `CREATE TABLE AS SELECT` (CTAS).
+ **DeleteTable API behavior** – You can only drop tables with purge enabled. Dropping tables with `purge=false` is not supported and results in a `400 Bad Request` error. Some versions of Spark always set this flag to `false`, even when running `DROP TABLE PURGE` commands. If `DROP TABLE PURGE` fails, use the S3 Tables [DeleteTable](https://docs.aws.amazon.com/AmazonS3/latest/API/API_s3TableBuckets_DeleteTable.html) operation to delete the table.
+  The endpoint only supports standard table metadata operations. For table maintenance, such as snapshot management and compaction, use S3 Tables maintenance API operations. For more information, see [S3 Tables maintenance](s3-tables-maintenance-overview.md). 

**Limitations**
+ Multilevel namespaces are not supported.
+ OAuth-based authentication is not supported.
+ Only the `owner` property is supported for namespaces.
+ View-related APIs defined in the [Apache Iceberg REST Open API specification](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml) are not supported.
+ Running operations on a table with a `metadata.json` file over 50 MB is not supported, and returns a `400 Bad Request` error. To control the size of your `metadata.json` files, use table maintenance operations. For more information, see [S3 Tables maintenance](s3-tables-maintenance-overview.md).

# Accessing Amazon S3 tables with the Amazon S3 Tables Catalog for Apache Iceberg
<a name="s3-tables-client-catalog"></a>

You can access S3 tables from open source query engines like Apache Spark by using the Amazon S3 Tables Catalog for Apache Iceberg client catalog. Amazon S3 Tables Catalog for Apache Iceberg is an open source library hosted by AWS Labs. It works by translating Apache Iceberg operations in your query engines (such as table discovery, metadata updates, and adding or removing tables) into S3 Tables API operations.

Amazon S3 Tables Catalog for Apache Iceberg is distributed as a Maven JAR called `s3-tables-catalog-for-iceberg.jar`. You can build the client catalog JAR from the [AWS Labs GitHub repository](https://github.com/awslabs/s3-tables-catalog) or download it from [Maven](https://mvnrepository.com/artifact/software.amazon.s3tables/s3-tables-catalog-for-iceberg). When connecting to tables, the client catalog JAR is used as a dependency when you initialize a Spark session for Apache Iceberg.

## Using the Amazon S3 Tables Catalog for Apache Iceberg with Apache Spark
<a name="s3-tables-integrating-open-source-spark"></a>

You can use the Amazon S3 Tables Catalog for Apache Iceberg client catalog to connect to tables from open-source applications when you initialize a Spark session. In your session configuration you specify Iceberg and Amazon S3 dependencies, and create a custom catalog that uses your table bucket as the metadata warehouse.

****Prerequisites****
+ An IAM identity with access to your table bucket and S3 Tables actions. For more information, see [Access management for S3 Tables](s3-tables-setting-up.md).

**To initialize a Spark session using the Amazon S3 Tables Catalog for Apache Iceberg**
+ Initialize Spark using the following command. To use the command, replace the Amazon S3 Tables Catalog for Apache Iceberg *version number* with the latest version from the [AWS Labs GitHub repository](https://github.com/awslabs/s3-tables-catalog), and the *table bucket ARN* with your own table bucket ARN.

  ```
  spark-shell \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1,software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime:0.1.4 \
  --conf spark.sql.catalog.s3tablesbucket=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.s3tablesbucket.catalog-impl=software.amazon.s3tables.iceberg.S3TablesCatalog \
  --conf spark.sql.catalog.s3tablesbucket.warehouse=arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-table-bucket \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
  ```

### Querying S3 tables with Spark SQL
<a name="query-with-spark"></a>

Using Spark, you can run DQL, DML, and DDL operations on S3 tables. When you query tables you use the fully qualified table name, including the session catalog name following this pattern:

`CatalogName.NamespaceName.TableName`

The following example queries show some ways you can interact with S3 tables. To use these example queries in your query engine, replace the *user input placeholder* values with your own.

**To query tables with Spark**
+ Create a namespace

  ```
  spark.sql(" CREATE NAMESPACE IF NOT EXISTS s3tablesbucket.my_namespace")
  ```
+ Create a table

  ```
  spark.sql(" CREATE TABLE IF NOT EXISTS s3tablesbucket.my_namespace.`my_table` 
  ( id INT, name STRING, value INT ) USING iceberg ")
  ```
+ Query a table

  ```
  spark.sql(" SELECT * FROM s3tablesbucket.my_namespace.`my_table` ").show()
  ```
+ Insert data into a table

  ```
  spark.sql(
  """
      INSERT INTO s3tablesbucket.my_namespace.my_table 
      VALUES 
          (1, 'ABC', 100), 
          (2, 'XYZ', 200)
  """)
  ```
+ Load an existing data file into a table

  1. Read the data into Spark.

     ```
     val data_file_location = "Path such as S3 URI to data file"
     val data_file = spark.read.parquet(data_file_location)
     ```

  1. Write the data into an Iceberg table.

     ```
     data_file.writeTo("s3tablesbucket.my_namespace.my_table").using("Iceberg").tableProperty ("format-version", "2").createOrReplace()
     ```

# Querying Amazon S3 tables with Athena
<a name="s3-tables-integrating-athena"></a>

Amazon Athena is an interactive query service that you can use to analyze data directly in Amazon S3 by using standard SQL. For more information, see [What is Amazon Athena?](https://docs.aws.amazon.com//athena/latest/ug/what-is.html) in the *Amazon Athena User Guide*.

After you integrate your table buckets with AWS analytics services, you can run Data Definition Language (DDL), Data Manipulation Language (DML), and Data Query Language (DQL) queries on S3 tables by using Athena. For more information about how to query tables in a table bucket, see [Register S3 Table bucket catalogs](https://docs.aws.amazon.com//athena/latest/ug/gdc-register-s3-table-bucket-cat.html) in the *Amazon Athena User Guide*.

You can also run queries in Athena from the Amazon S3 console. 

**Important**  
When creating tables, make sure that you use all lowercase letters in your table names and table definitions. For example, make sure that your column names are all lowercase. If your table name or table definition contains capital letters, the table isn't supported by AWS Lake Formation or the AWS Glue Data Catalog. In this case, your table won't be visible to AWS analytics services such as Amazon Athena, even if your table buckets are integrated with AWS analytics services.   
If your table definition contains capital letters, you receive the following error message when running a `SELECT` query in Athena: "GENERIC_INTERNAL_ERROR: Get table request failed: com.amazonaws.services.glue.model.ValidationException: Unsupported Federation Resource - Invalid table or column names."
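Because uppercase characters only surface as errors at query time, it can help to validate table and column names before creating tables. The following is an illustrative sketch; the `is_lake_formation_safe` helper is hypothetical, not an AWS API, and the name pattern is an assumption for demonstration:

```python
import re

# Assumed convention: lowercase letters, digits, and underscores only
NAME_PATTERN = re.compile(r"^[a-z0-9_]+$")

def is_lake_formation_safe(name: str) -> bool:
    """Return True if a table or column name contains no capital letters."""
    return bool(NAME_PATTERN.match(name))

print(is_lake_formation_safe("daily_sales"))  # True
print(is_lake_formation_safe("DailySales"))   # False: capital letters
```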

## Using the S3 console and Amazon Athena
<a name="query-table-console"></a>

The following procedure uses the Amazon S3 console to access the Athena query editor so that you can query a table with Amazon Athena. 

**Note**  
Before performing the following steps, make sure that you've integrated your table buckets with AWS analytics services in this Region. For more information, see [Integrating Amazon S3 Tables with AWS analytics services](s3-tables-integrating-aws.md).

**To query a table**

1. Sign in to the AWS Management Console and open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. In the left navigation pane, choose **Table buckets**.

1. On the **Table buckets** page, choose the bucket that contains the table that you want to query.

1. On the bucket details page, choose the option button next to the name of the table that you want to query. 

1. Choose **Query table with Athena**.

1. The Amazon Athena console opens and the Athena query editor appears with a sample `SELECT` query loaded for you. Modify this query as needed for your use case.

   In the query editor, the **Catalog** field should be populated with **s3tablescatalog/** followed by the name of your table bucket, for example, **s3tablescatalog/*amzn-s3-demo-bucket***. The **Database** field should be populated with the namespace where your table is stored. 
**Note**  
If you don't see these values in the **Catalog** and **Database** fields, make sure that you've integrated your table buckets with AWS analytics services in this Region. For more information, see [Integrating Amazon S3 Tables with AWS analytics services](s3-tables-integrating-aws.md). 

1. To run the query, choose **Run**.
**Note**  
If you receive the error "Insufficient permissions to execute the query. Principal does not have any privilege on specified resource" when you try to run a query in Athena, you must be granted the necessary Lake Formation permissions on the table. For more information, see [Granting Lake Formation permission on a table or database](grant-permissions-tables.md#grant-lf-table).
If you receive the error "Iceberg cannot access the requested resource" when you try to run the query, go to the AWS Lake Formation console and make sure that you've granted yourself permissions on the table bucket catalog and database (namespace) that you created. Don't specify a table when granting these permissions. For more information, see [Granting Lake Formation permission on a table or database](grant-permissions-tables.md#grant-lf-table). 
If you receive the following error message when running a `SELECT` query in Athena, it's caused by capital letters in your table name or in the column names of your table definition: "GENERIC_INTERNAL_ERROR: Get table request failed: com.amazonaws.services.glue.model.ValidationException: Unsupported Federation Resource - Invalid table or column names." Make sure that your table and column names are all lowercase.

# Accessing Amazon S3 tables with Amazon Redshift
<a name="s3-tables-integrating-redshift"></a>

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools. Redshift Serverless lets you access and analyze data without all of the configurations of a provisioned data warehouse. For more information, see [Get started with serverless data warehouses](https://docs.aws.amazon.com//redshift/latest/gsg/new-user-serverless.html) in the *Amazon Redshift Getting Started Guide*.

## Query Amazon S3 tables with Amazon Redshift
<a name="rs-query-table"></a>

**Prerequisites**
+ [Integrate your table buckets with AWS analytics services](s3-tables-integrating-aws.md).
  + [Create a namespace](s3-tables-namespace-create.md).
  + [Create a table](s3-tables-create.md).
+ [Managing access to a table or database with Lake Formation](grant-permissions-tables.md).

After you complete the prerequisites, you can begin using Amazon Redshift to query tables in one of the following ways:
+ [Using the Amazon Redshift query editor v2](https://docs.aws.amazon.com//redshift/latest/mgmt/query-editor-v2.html)
+ [Connecting to an Amazon Redshift data warehouse using SQL client tools](https://docs.aws.amazon.com//redshift/latest/mgmt/connecting-to-cluster.html)
+ [Using the Amazon Redshift Data API](https://docs.aws.amazon.com//redshift/latest/mgmt/data-api.html)

# Accessing Amazon S3 tables with Amazon EMR
<a name="s3-tables-integrating-emr"></a>

Amazon EMR (previously called Amazon Elastic MapReduce) is a managed cluster platform that simplifies running big data frameworks, such as Apache Hadoop and Apache Spark, on AWS to process and analyze vast amounts of data. Using these frameworks and related open-source projects, you can process data for analytics purposes and business intelligence workloads. Amazon EMR also lets you transform and move large amounts of data into and out of other AWS data stores and databases.

You can use Apache Iceberg clusters in Amazon EMR to work with S3 tables by connecting to table buckets in a Spark session. To connect to table buckets in Amazon EMR, you can use the AWS analytics services integration through AWS Glue Data Catalog, or you can use the open source Amazon S3 Tables Catalog for Apache Iceberg client catalog.

**Note**  
S3 Tables is supported on [Amazon EMR version 7.5](https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-release-components.html) or higher.

## Connecting to S3 table buckets with Spark on an Amazon EMR Iceberg cluster
<a name="emr-setup-cluster-spark"></a>

In this procedure, you set up an Amazon EMR cluster configured for Apache Iceberg and then launch a Spark session that connects to your table buckets. You can set this up using the AWS analytics services integration through AWS Glue, or you can use the open source Amazon S3 Tables Catalog for Apache Iceberg client catalog. For information about the client catalog, see [Accessing tables using the Amazon S3 Tables Iceberg REST endpoint](s3-tables-integrating-open-source.md). 

Choose your method of using tables with Amazon EMR from the following options.

------
#### [ Amazon S3 Tables Catalog for Apache Iceberg ]

The following prerequisites are required to query tables with Spark on Amazon EMR using the Amazon S3 Tables Catalog for Apache Iceberg.

For the latest version of the client catalog JAR, see the [s3-tables-catalog GitHub repository](https://github.com/awslabs/s3-tables-catalog).

**Prerequisites**
+ Attach the `AmazonS3TablesFullAccess` policy to the IAM role you use for Amazon EMR.

**To set up an Amazon EMR cluster to query tables with Spark**

1. Create a cluster with the following configuration. To use this example, replace the `user input placeholders` with your own information.

   ```
   aws emr create-cluster --release-label emr-7.5.0 \
   --applications Name=Spark \
   --configurations file://configurations.json \
   --region us-east-1 \
   --name My_Spark_Iceberg_Cluster \
   --log-uri s3://amzn-s3-demo-bucket/ \
   --instance-type m5.xlarge \
   --instance-count 2 \
   --service-role EMR_DefaultRole \
   --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,SubnetId=subnet-1234567890abcdef0,KeyName=my-key-pair
   ```

   `configurations.json`:

   ```
   [{
   "Classification":"iceberg-defaults",
   "Properties":{"iceberg.enabled":"true"}
   }]
   ```

1. [Connect to the Spark primary node using SSH](https://docs.aws.amazon.com//emr/latest/ManagementGuide/emr-connect-master-node-ssh.html#emr-connect-cli).

1. To initialize a Spark session for Iceberg that connects to your table bucket, enter the following command. Replace the `user input placeholders` with your table bucket ARN.

   ```
   spark-shell \
   --packages software.amazon.s3tables:s3-tables-catalog-for-iceberg-runtime:0.1.8 \
   --conf spark.sql.catalog.s3tablesbucket=org.apache.iceberg.spark.SparkCatalog \
   --conf spark.sql.catalog.s3tablesbucket.catalog-impl=software.amazon.s3tables.iceberg.S3TablesCatalog \
   --conf spark.sql.catalog.s3tablesbucket.warehouse=arn:aws:s3tables:us-east-1:111122223333:bucket/amzn-s3-demo-bucket1 \
   --conf spark.sql.defaultCatalog=s3tablesbucket \
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   ```

1. Query your tables with Spark SQL. For example queries, see [Querying S3 tables with Spark SQL](s3-tables-client-catalog.md#query-with-spark).

------
#### [ AWS analytics services integration ]

The following prerequisites are required to query tables with Spark on Amazon EMR using the AWS analytics services integration.

**Prerequisites**
+ [Integrate your table buckets with AWS analytics services](s3-tables-integrating-aws.md).
+ Create the default service role for Amazon EMR (`EMR_DefaultRole_V2`). For details, see [Service role for Amazon EMR (EMR role) ](https://docs.aws.amazon.com//emr/latest/ManagementGuide/emr-iam-role.html).
+ Create the Amazon EC2 instance profile for Amazon EMR (`EMR_EC2_DefaultRole`). For details, see [Service role for cluster EC2 instances (EC2 instance profile)](https://docs.aws.amazon.com//emr/latest/ManagementGuide/emr-iam-role-ec2.html). 
  + Attach the `AmazonS3TablesFullAccess` policy to `EMR_EC2_DefaultRole`.

**To set up an Amazon EMR cluster to query tables with Spark**

1. Create a cluster with the following configuration. To use this example, replace the `user input placeholder` values with your own information.

   ```
   aws emr create-cluster --release-label emr-7.5.0 \
   --applications Name=Spark \
   --configurations file://configurations.json \
   --region us-east-1 \
   --name My_Spark_Iceberg_Cluster \
   --log-uri s3://amzn-s3-demo-bucket/ \
   --instance-type m5.xlarge \
   --instance-count 2 \
   --service-role EMR_DefaultRole \
   --ec2-attributes InstanceProfile=EMR_EC2_DefaultRole,SubnetId=subnet-1234567890abcdef0,KeyName=my-key-pair
   ```

   `configurations.json`:

   ```
   [{
   "Classification":"iceberg-defaults",
   "Properties":{"iceberg.enabled":"true"}
   }]
   ```

1. [Connect to the Spark primary node using SSH](https://docs.aws.amazon.com//emr/latest/ManagementGuide/emr-connect-master-node-ssh.html#emr-connect-cli).

1. Enter the following command to initialize a Spark session for Iceberg that connects to your tables. Replace the `user input placeholders` for Region, account ID and table bucket name with your own information.

   ```
   spark-shell \
   --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
   --conf spark.sql.defaultCatalog=s3tables \
   --conf spark.sql.catalog.s3tables=org.apache.iceberg.spark.SparkCatalog \
   --conf spark.sql.catalog.s3tables.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog \
   --conf spark.sql.catalog.s3tables.client.region=us-east-1 \
   --conf spark.sql.catalog.s3tables.glue.id=111122223333:s3tablescatalog/amzn-s3-demo-table-bucket
   ```

1. Query your tables with Spark SQL. For example queries, see [Querying S3 tables with Spark SQL](s3-tables-client-catalog.md#query-with-spark).

------

**Note**  
If you are using the `DROP TABLE PURGE` command with Amazon EMR:  
Amazon EMR version 7.5  
Set the Spark config `spark.sql.catalog.your-catalog-name.cache-enabled` to `false`. If this config is set to `true`, run the command in a new session or application so the table cache is not activated.
Amazon EMR versions higher than 7.5  
`DROP TABLE` is not supported. You can use the S3 Tables `DeleteTable` REST API to delete a table.

# Visualizing table data with Quick
<a name="s3-tables-integrating-quicksight"></a>

Quick is a fast business analytics service to build visualizations, perform ad hoc analysis, and quickly get business insights from your data. Quick seamlessly discovers AWS data sources, enables organizations to scale to hundreds of thousands of users, and delivers fast and responsive query performance by using the Quick Super-fast, Parallel, In-Memory, Calculation Engine (SPICE). For more information, see [What is Quick?](https://docs.aws.amazon.com//quicksight/latest/user/welcome.html) in the *Quick user guide*.

After you [Integrate your table buckets with AWS analytics services](s3-tables-integrating-aws.md), you can create data sets from your tables and work with them in Quick using SPICE or direct SQL queries from your query engine. Quick supports Athena as a data source for S3 tables.

## Configure permissions for Quick to access tables
<a name="quicksight-permissions-tables"></a>

Before working with S3 table data in Quick, you must grant the Quick service role and the Quick admin user permissions on the tables you want to access. Additionally, if you use AWS Lake Formation, you must also grant your Quick admin user Lake Formation permissions on those tables.

**Grant permissions to the Quick service role**

When you set up Quick for the first time in your account, AWS creates a service role that allows Quick to access data sources in other AWS services, such as Athena or Amazon Redshift. The default role name is `aws-quicksight-service-role-v0`.

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Choose **Roles** and select the Quick service role. The default name is `aws-quicksight-service-role-v0`.

1. Choose **Add permissions** and then **Create inline policy**.

1. Select **JSON** to open the JSON policy editor, then add the following policy.

------
#### [ JSON ]


   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Sid": "VisualEditor0",
         "Effect": "Allow",
         "Action": "glue:GetCatalog",
         "Resource": "*"
       }
     ]
   }
   ```

------

1. Choose **Next**, enter a **Policy name**, and then choose **Create policy**.

**To configure Lake Formation permissions for the Quick admin user**

1. Run the following AWS CLI command to find the ARN of your Quick admin user.

   ```
   aws quicksight list-users --aws-account-id 111122223333 --namespace default --region region
   ```

1. Grant Lake Formation permissions to this ARN. For details, see [Managing access to a table or database with Lake Formation](grant-permissions-tables.md).

## Using table data in Quick
<a name="quicksight-connect-tables"></a>

You can connect to table data using Athena as a data source.

**Prerequisites**
+ [Integrate your table buckets with AWS analytics services](s3-tables-integrating-aws.md).
  + [Create a namespace](s3-tables-namespace-create.md).
  + [Create a table](s3-tables-create.md).
  + [Configure permissions for Quick to access tables](#quicksight-permissions-tables).
+ [Sign up for Quick](https://docs.aws.amazon.com/quicksight/latest/user/signing-up.html).

1. Sign in to your Quick account at [https://quicksight.aws.amazon.com/](https://quicksight.aws.amazon.com/).

1. In the dashboard, choose **New analysis**.

1. Choose **New dataset**.

1. Select **Athena**.

1. Enter a **Data source name**, then choose **Create data source**.

1. Choose **Use custom SQL**. You won't be able to select your table from the **Choose your table** pane.

1. Enter an Athena SQL query that captures the columns you want to visualize, then choose **Confirm query**. For example, use the following query to select all columns:

   ```
   SELECT * FROM "s3tablescatalog/table-bucket-name".namespace.table-name
   ```

1. Choose **Visualize** to analyze data and start building dashboards. For more information, see [Visualizing data in Quick](https://docs.aws.amazon.com//quicksight/latest/user/working-with-visuals.html) and [Exploring interactive dashboards in Quick](https://docs.aws.amazon.com//quicksight/latest/user/using-dashboards.html).

# Streaming data to tables with Amazon Data Firehose
<a name="s3-tables-integrating-firehose"></a>

Amazon Data Firehose is a fully managed service for delivering real-time [streaming data](https://aws.amazon.com//streaming-data/) to destinations such as Amazon S3, Amazon Redshift, Amazon OpenSearch Service, Splunk, Apache Iceberg tables, and custom HTTP endpoints or HTTP endpoints owned by supported third-party service providers. With Amazon Data Firehose, you don't need to write applications or manage resources. You configure your data producers to send data to Firehose, and it automatically delivers the data to the destination that you specified. You can also configure Firehose to transform your data before delivering it. To learn more about Amazon Data Firehose, see [What is Amazon Data Firehose?](https://docs.aws.amazon.com//firehose/latest/dev/what-is-this-service.html)

Complete these steps to set up Firehose streaming to tables in S3 table buckets:

1.  [Integrate your table buckets with AWS analytics services](s3-tables-integrating-aws.md). 

1. Configure Firehose to deliver data into your S3 tables. To do so, you [create an AWS Identity and Access Management (IAM) service role that allows Firehose to access your tables](#firehose-role-s3tables).

1. Grant the Firehose service role explicit permissions to your table or table's namespace. For more information, see [Grant necessary permissions](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-prereq.html#s3-tables-prerequisites).

1. [Create a Firehose stream that routes data to your table.](#firehose-stream-tables)

## Creating a role for Firehose to use S3 tables as a destination
<a name="firehose-role-s3tables"></a>

Firehose needs an IAM [service role](https://docs.aws.amazon.com//IAM/latest/UserGuide/id_roles_create_for-service.html) with specific permissions to access AWS Glue tables and write data to S3 tables. You need to provide this IAM role when you create a Firehose stream.

1. Open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. In the left navigation pane, choose **Policies**.

1. Choose **Create policy**, and then choose **JSON** in the policy editor.

1. Add the following inline policy that grants permissions to all databases and tables in your data catalog. If you want, you can give permissions only to specific tables and databases. To use this policy, replace the `user input placeholders` with your own information.

------
#### [ JSON ]


   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Sid": "S3TableAccessViaGlueFederation",
               "Effect": "Allow",
               "Action": [
                   "glue:GetTable",
                   "glue:GetDatabase",
                   "glue:UpdateTable"
               ],
               "Resource": [
                   "arn:aws:glue:us-east-1:111122223333:catalog/s3tablescatalog/*",
                   "arn:aws:glue:us-east-1:111122223333:catalog/s3tablescatalog",
                   "arn:aws:glue:us-east-1:111122223333:catalog",
                   "arn:aws:glue:us-east-1:111122223333:database/*",
                   "arn:aws:glue:us-east-1:111122223333:table/*/*"
               ]
           },
           {
               "Sid": "S3DeliveryErrorBucketPermission",
               "Effect": "Allow",
               "Action": [
                   "s3:AbortMultipartUpload",
                   "s3:GetBucketLocation",
                   "s3:GetObject",
                   "s3:ListBucket",
                   "s3:ListBucketMultipartUploads",
                   "s3:PutObject"
               ],
               "Resource": [
                   "arn:aws:s3:::error delivery bucket",
                   "arn:aws:s3:::error delivery bucket/*"
               ]
           },
           {
               "Sid": "RequiredWhenUsingKinesisDataStreamsAsSource",
               "Effect": "Allow",
               "Action": [
                   "kinesis:DescribeStream",
                   "kinesis:GetShardIterator",
                   "kinesis:GetRecords",
                   "kinesis:ListShards"
               ],
               "Resource": "arn:aws:kinesis:us-east-1:111122223333:stream/stream-name"
           },
           {
               "Sid": "RequiredWhenDoingMetadataReadsANDDataAndMetadataWriteViaLakeformation",
               "Effect": "Allow",
               "Action": [
                   "lakeformation:GetDataAccess"
               ],
               "Resource": "*"
           },
           {
               "Sid": "RequiredWhenUsingKMSEncryptionForS3ErrorBucketDelivery",
               "Effect": "Allow",
               "Action": [
                   "kms:Decrypt",
                   "kms:GenerateDataKey"
               ],
               "Resource": [
                   "arn:aws:kms:us-east-1:111122223333:key/KMS-key-id"
               ],
               "Condition": {
                   "StringEquals": {
                       "kms:ViaService": "s3.us-east-1.amazonaws.com"
                   },
                   "StringLike": {
                       "kms:EncryptionContext:aws:s3:arn": "arn:aws:s3:::error delivery bucket/prefix*"
                   }
               }
           },
           {
               "Sid": "LoggingInCloudWatch",
               "Effect": "Allow",
               "Action": [
                   "logs:PutLogEvents"
               ],
               "Resource": [
                   "arn:aws:logs:us-east-1:111122223333:log-group:log-group-name:log-stream:log-stream-name"
               ]
           },
           {
               "Sid": "RequiredWhenAttachingLambdaToFirehose",
               "Effect": "Allow",
               "Action": [
                   "lambda:InvokeFunction",
                   "lambda:GetFunctionConfiguration"
               ],
               "Resource": [
                   "arn:aws:lambda:us-east-1:111122223333:function:function-name:function-version"
               ]
           }
       ]
   }
   ```

------

   This policy has statements that allow access to Kinesis Data Streams, invocation of Lambda functions, and access to AWS KMS keys. If you don't use any of these resources, you can remove the respective statements.
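If you maintain the policy as code, the optional statements can be dropped by filtering on their `Sid` values. The following sketch is illustrative (a hypothetical helper, not an AWS API); the `Sid` names match the statements in the policy above:

```python
import json

# Sids of the optional statements in the policy above.
OPTIONAL_SIDS = {
    "RequiredWhenUsingKinesisDataStreamsAsSource",
    "RequiredWhenAttachingLambdaToFirehose",
    "RequiredWhenUsingKMSEncryptionForS3ErrorBucketDelivery",
}

def prune_statements(policy: dict, sids_to_remove: set) -> dict:
    """Return a copy of the policy without statements whose Sid is in sids_to_remove."""
    pruned = dict(policy)
    pruned["Statement"] = [
        s for s in policy["Statement"] if s.get("Sid") not in sids_to_remove
    ]
    return pruned

# Abbreviated policy for illustration; use your full policy document.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "S3TableAccessViaGlueFederation", "Effect": "Allow"},
        {"Sid": "RequiredWhenUsingKinesisDataStreamsAsSource", "Effect": "Allow"},
    ],
}

print(json.dumps(prune_statements(policy, OPTIONAL_SIDS), indent=2))
```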

   If error logging is enabled, Firehose also sends data delivery errors to your CloudWatch log group and streams. To use error logging, you must configure log group and log stream names. For more information, see [Monitor Amazon Data Firehose Using CloudWatch Logs](https://docs.aws.amazon.com/firehose/latest/dev/controlling-access.html#using-iam-iceberg).
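The `Resource` value in the `LoggingInCloudWatch` statement above follows the pattern `arn:aws:logs:Region:account-id:log-group:log-group-name:log-stream:log-stream-name`. A small illustrative helper (not part of any AWS SDK) builds that ARN from its components:

```python
def cloudwatch_logs_stream_arn(region: str, account_id: str,
                               log_group: str, log_stream: str) -> str:
    """Build the log-stream ARN used in the policy's LoggingInCloudWatch Resource."""
    return (f"arn:aws:logs:{region}:{account_id}:"
            f"log-group:{log_group}:log-stream:{log_stream}")

# Example values; substitute your own Region, account, and names.
print(cloudwatch_logs_stream_arn("us-east-1", "111122223333",
                                 "firehose-error-logs", "iceberg-delivery"))
```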

1. After you create the policy, create an IAM role with **AWS service** as the **Trusted entity type**.

1. For **Service or use case**, choose **Kinesis**. For **Use case**, choose **Kinesis Firehose**.

1. Choose **Next**, and then select the policy you created earlier.

1. Give your role a name. Review your role details, and choose **Create role**. The role will have the following trust policy.

------
#### [ JSON ]


   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "sts:AssumeRole"
               ],
               "Principal": {
                   "Service": [
                       "firehose.amazonaws.com"
                   ]
               }
           }
       ]
   }
   ```

------

## Creating a Firehose stream to S3 tables
<a name="firehose-stream-tables"></a>

The following procedure shows how to create a Firehose stream to deliver data to S3 tables using the console. The following prerequisites are required to set up a Firehose stream to S3 tables.

**Prerequisites**
+ [Integrate your table buckets with AWS analytics services](s3-tables-integrating-aws.md).
  + [Create a namespace](s3-tables-namespace-create.md).
  + [Create a table](s3-tables-create.md).
+ Create the [ Role for Firehose to access S3 Tables](#firehose-role-s3tables).
+ [Grant necessary permissions](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-prereq.html#s3-tables-prerequisites) to the Firehose service role you created to access tables.

To provide routing information to Firehose when you configure a stream, you use your namespace as the database name and the name of a table in that namespace as the table name. You can use these values in the Unique Key configuration section of a Firehose stream to route data to a single table. You can also use these values to route to a table using JSONQuery expressions. For more information, see [Route incoming records to a single Iceberg table](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-format-input-record.html).
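As an illustration of the routing idea, the following sketch (plain Python with a hypothetical record shape; the `destination_db` and `destination_table` field names are invented for the example) shows how per-record fields or fixed defaults determine the database and table a record is routed to:

```python
def route_record(record: dict, default_database: str, default_table: str):
    """Pick the destination (database, table) for a record.

    Mirrors the idea behind Firehose routing: if the record carries
    explicit routing fields, use them; otherwise fall back to the fixed
    namespace (database) and table from the Unique Key configuration.
    """
    db = record.get("destination_db", default_database)
    table = record.get("destination_table", default_table)
    return db, table

# Route to a single table: this record falls back to the configured names.
print(route_record({"id": 1}, "my_namespace", "my_table"))
# Route dynamically: this record names its own destination table.
print(route_record({"id": 2, "destination_table": "events"}, "my_namespace", "my_table"))
```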

**To set up a Firehose stream to S3 tables (Console)**

1. Open the Firehose console at [https://console.aws.amazon.com/firehose/](https://console.aws.amazon.com/firehose/).

1. Choose **Create Firehose stream**.

1. For **Source**, choose one of the following sources:
   + Amazon Kinesis Data Streams
   + Amazon MSK
   + Direct PUT

1. For **Destination**, choose **Apache Iceberg Tables**.

1. Enter a **Firehose stream name**.

1. Configure your **Source settings**.

1. For **Destination settings**, choose **Current account** to stream to tables in your account or **Cross-account** for tables in another account.
   + For tables in the current account, select your S3 Tables catalog from the **Catalog** dropdown.
   + For tables in another account, enter the **Catalog ARN** of the catalog you want to stream to.

1. Configure database and table names using **Unique Key configuration**, JSONQuery expressions, or in a Lambda function. For more information, see [Route incoming records to a single Iceberg table](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-format-input-record.html) and [Route incoming records to different Iceberg tables](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-format-input-record-different.html) in the *Amazon Data Firehose Developer Guide*.

1. Under **Backup settings**, specify an **S3 backup bucket**.

1. For **Existing IAM roles** under **Advanced settings**, select the IAM role you created for Firehose.

1. Choose **Create Firehose stream**.

For more information about the other settings that you can configure for a stream, see [Set up the Firehose stream](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-stream.html) in the *Amazon Data Firehose Developer Guide*.

# Running ETL jobs on Amazon S3 tables with AWS Glue
<a name="s3-tables-integrating-glue"></a>

AWS Glue is a serverless data integration service that makes it easy for analytics users to discover, prepare, move, and integrate data from multiple sources. You can use AWS Glue jobs to run extract, transform, and load (ETL) pipelines to load data into your data lakes. For more information about AWS Glue, see [What is AWS Glue?](https://docs.aws.amazon.com/glue/latest/dg/what-is-glue.html) in the *AWS Glue Developer Guide*.

An AWS Glue job encapsulates a script that connects to your source data, processes it, and then writes it out to your data target. Typically, a job runs extract, transform, and load (ETL) scripts. Jobs can run scripts designed for Apache Spark runtime environments. You can monitor job runs to understand runtime metrics such as completion status, duration, and start time.

You can use AWS Glue jobs to process data in your S3 tables by connecting to your tables through the integration with AWS analytics services, or by connecting directly using the Amazon S3 Tables Iceberg REST endpoint or the Amazon S3 Tables Catalog for Apache Iceberg. This guide covers the basic steps to get started using AWS Glue with S3 Tables, including:

**Topics**
+ [Step 1 – Prerequisites](#glue-etl-prereqs)
+ [Step 2 – Create a script to connect to table buckets](#glue-etl-script)
+ [Step 3 – Create an AWS Glue job that queries tables](#glue-etl-job)

Choose your access method based on your specific AWS Glue ETL job requirements:
+ **AWS analytics services integration (Recommended)** – Recommended when you need centralized metadata management across multiple AWS analytics services, need to leverage existing AWS Glue Data Catalog permissions and optionally Lake Formation, or are building production ETL pipelines that integrate with other AWS services like Athena or Amazon EMR.
+ **Amazon S3 Tables Iceberg REST endpoint** – Recommended when you need to connect to S3 tables from third-party query engines that support Apache Iceberg, build custom ETL applications that need direct REST API access, or when you require control over catalog operations without dependencies on AWS Glue Data Catalog.
+ **Amazon S3 Tables Catalog for Apache Iceberg** – Use only for legacy applications or specific programmatic scenarios that require the Java client library. This method is not recommended for new AWS Glue ETL job implementations due to additional `JAR` dependency management and complexity.

**Note**  
S3 Tables is supported on [AWS Glue version 5.0 or higher](https://docs.aws.amazon.com/glue/latest/dg/release-notes.html).

## Step 1 – Prerequisites
<a name="glue-etl-prereqs"></a>

Before you can query tables from an AWS Glue job, you must configure an IAM role that AWS Glue can use to run the job. Choose your access method to see the specific prerequisites for that method.

------
#### [ AWS analytics services integration (Recommended) ]

Prerequisites required to use the S3 Tables AWS analytics services integration to run AWS Glue ETL jobs.
+ [Integrate your table buckets with AWS analytics services](s3-tables-integrating-aws.md).
+ [Create an IAM role for AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html).
  + Attach the `AmazonS3TablesFullAccess` managed policy to the role.
  + Attach the `AmazonS3FullAccess` managed policy to the role.

------
#### [ Amazon S3 Tables Iceberg REST endpoint ]

Prerequisites required to use the Amazon S3 Tables Iceberg REST endpoint to run AWS Glue ETL jobs.
+ [Create an IAM role for AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html).
  + Attach the `AmazonS3TablesFullAccess` managed policy to the role.
  + Attach the `AmazonS3FullAccess` managed policy to the role.

------
#### [ Amazon S3 Tables Catalog for Apache Iceberg ]

Prerequisites required to use the Amazon S3 Tables Catalog for Apache Iceberg to run AWS Glue ETL jobs.
+ [Create an IAM role for AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/create-an-iam-role.html).
  + Attach the `AmazonS3TablesFullAccess` managed policy to the role.
  + Attach the `AmazonS3FullAccess` managed policy to the role.
  + To use the Amazon S3 Tables Catalog for Apache Iceberg, you need to download the client catalog JAR and upload it to an S3 bucket.

****Downloading the catalog JAR****

    1. Check for the latest version on [Maven Central](https://mvnrepository.com/artifact/software.amazon.s3tables/s3-tables-catalog-for-iceberg-runtime). You can download the JAR from Maven Central using your browser, or by using the following command. Make sure to replace the *version number* with the latest version.

       ```
       wget https://repo1.maven.org/maven2/software/amazon/s3tables/s3-tables-catalog-for-iceberg-runtime/0.1.5/s3-tables-catalog-for-iceberg-runtime-0.1.5.jar                       
       ```

    1. Upload the downloaded JAR to an S3 bucket that your AWS Glue IAM role can access. You can use the following AWS CLI command to upload the JAR. Make sure to replace the *version number* with the latest version, and the *bucket name* and *path* with your own.

       ```
       aws s3 cp s3-tables-catalog-for-iceberg-runtime-0.1.5.jar s3://amzn-s3-demo-bucket/jars/
       ```

------

## Step 2 – Create a script to connect to table buckets
<a name="glue-etl-script"></a>

To access your table data when you run an AWS Glue ETL job, you configure a Spark session for Apache Iceberg that connects to your S3 table bucket. You can modify an existing script to connect to your table bucket or create a new script. For more information on creating AWS Glue scripts, see [Tutorial: Writing an AWS Glue for Spark script](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-intro-tutorial.html) in the *AWS Glue Developer Guide*.

You can configure the session to connect to your table buckets through any of the following S3 Tables access methods:
+ S3 Tables AWS analytics services integration (Recommended)
+ Amazon S3 Tables Iceberg REST endpoint
+ Amazon S3 Tables Catalog for Apache Iceberg

Choose from the following access methods to view setup instructions and configuration examples.

------
#### [ AWS analytics services integration (Recommended) ]

As a prerequisite to querying tables with Spark on AWS Glue using the AWS analytics services integration, you must [Integrate your table buckets with AWS analytics services](s3-tables-integrating-aws.md).

You can configure the connection to your table bucket through a Spark session in a job or with AWS Glue Studio magics in an interactive session. To use the following examples, replace the *placeholder values* with the information for your own table bucket.

**Using a PySpark script**  
Use the following code snippet in a PySpark script to configure an AWS Glue job to connect to your table bucket using the integration.  

```
spark = SparkSession.builder.appName("SparkIcebergSQL") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.2") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog","s3tables") \
    .config("spark.sql.catalog.s3tables", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.s3tables.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config("spark.sql.catalog.s3tables.glue.id", "111122223333:s3tablescatalog/amzn-s3-demo-table-bucket") \
    .config("spark.sql.catalog.s3tables.warehouse", "s3://amzn-s3-demo-table-bucket/warehouse/") \
    .getOrCreate()
```

**Using an interactive AWS Glue session**  
If you are using an interactive notebook session with AWS Glue 5.0, specify the same configurations using the `%%configure` magic in a cell prior to code execution.  

```
%%configure
{"conf": "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.defaultCatalog=s3tables --conf spark.sql.catalog.s3tables=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.s3tables.catalog-impl=org.apache.iceberg.aws.glue.GlueCatalog --conf spark.sql.catalog.s3tables.glue.id=111122223333:s3tablescatalog/amzn-s3-demo-table-bucket --conf spark.sql.catalog.s3tables.warehouse=s3://amzn-s3-demo-table-bucket/warehouse/"}
```
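Because the `%%configure` value packs every setting into one string, it's easy to mistype. The following sketch (an illustrative helper, not part of AWS Glue) assembles that string from a dict of Spark settings using the same `--conf`-separated format shown above:

```python
def build_configure_conf(settings: dict) -> str:
    """Join Spark settings into the single string the %%configure magic expects.

    The first key=value pair appears bare; each subsequent pair is
    prefixed with " --conf ", matching the format in the example above.
    """
    pairs = [f"{key}={value}" for key, value in settings.items()]
    return " --conf ".join(pairs)

# A subset of the settings from the example above, for illustration.
settings = {
    "spark.sql.extensions": "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
    "spark.sql.defaultCatalog": "s3tables",
    "spark.sql.catalog.s3tables": "org.apache.iceberg.spark.SparkCatalog",
}
print(build_configure_conf(settings))
```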

------
#### [ Amazon S3 Tables Iceberg REST endpoint ]

You can configure the connection to your table bucket through a Spark session in a job or with AWS Glue Studio magics in an interactive session. To use the following examples, replace the *placeholder values* with the information for your own table bucket.

**Using a PySpark script**  
Use the following code snippet in a PySpark script to configure an AWS Glue job to connect to your table bucket using the endpoint.  

```
spark = SparkSession.builder.appName("glue-s3-tables-rest") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.2") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog", "s3_rest_catalog") \
    .config("spark.sql.catalog.s3_rest_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.s3_rest_catalog.type", "rest") \
    .config("spark.sql.catalog.s3_rest_catalog.uri", "https://s3tables.Region.amazonaws.com/iceberg") \
    .config("spark.sql.catalog.s3_rest_catalog.warehouse", "arn:aws:s3tables:Region:111122223333:bucket/amzn-s3-demo-table-bucket") \
    .config("spark.sql.catalog.s3_rest_catalog.rest.sigv4-enabled", "true") \
    .config("spark.sql.catalog.s3_rest_catalog.rest.signing-name", "s3tables") \
    .config("spark.sql.catalog.s3_rest_catalog.rest.signing-region", "Region") \
    .config('spark.sql.catalog.s3_rest_catalog.io-impl','org.apache.iceberg.aws.s3.S3FileIO') \
    .config('spark.sql.catalog.s3_rest_catalog.rest-metrics-reporting-enabled','false') \
    .getOrCreate()
```

**Using an interactive AWS Glue session**  
If you are using an interactive notebook session with AWS Glue 5.0, specify the same configurations using the `%%configure` magic in a cell prior to code execution. Replace the placeholder values with the information for your own table bucket.  

```
%%configure
{"conf": "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.defaultCatalog=s3_rest_catalog --conf spark.sql.catalog.s3_rest_catalog=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.s3_rest_catalog.type=rest --conf spark.sql.catalog.s3_rest_catalog.uri=https://s3tables.Region.amazonaws.com/iceberg --conf spark.sql.catalog.s3_rest_catalog.warehouse=arn:aws:s3tables:Region:111122223333:bucket/amzn-s3-demo-table-bucket --conf spark.sql.catalog.s3_rest_catalog.rest.sigv4-enabled=true --conf spark.sql.catalog.s3_rest_catalog.rest.signing-name=s3tables --conf spark.sql.catalog.s3_rest_catalog.rest.signing-region=Region --conf spark.sql.catalog.s3_rest_catalog.io-impl=org.apache.iceberg.aws.s3.S3FileIO --conf spark.sql.catalog.s3_rest_catalog.rest-metrics-reporting-enabled=false"}
```

------
#### [ Amazon S3 Tables Catalog for Apache Iceberg ]

As a prerequisite to connecting to tables using the Amazon S3 Tables Catalog for Apache Iceberg, you must first download the latest client catalog JAR and upload it to an S3 bucket. Then, when you create your job, you add the path to the client catalog JAR as a special parameter. For more information about job parameters in AWS Glue, see [Special parameters used in AWS Glue jobs](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-glue-arguments.html) in the *AWS Glue Developer Guide*.

You can configure the connection to your table bucket through a Spark session in a job or with AWS Glue Studio magics in an interactive session. To use the following examples, replace the *placeholder values* with the information for your own table bucket.

**Using a PySpark script**  
Use the following code snippet in a PySpark script to configure an AWS Glue job to connect to your table bucket using the JAR. Replace the placeholder values with the information for your own table bucket.  

```
spark = SparkSession.builder.appName("glue-s3-tables") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.2") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog", "s3tablesbucket") \
    .config("spark.sql.catalog.s3tablesbucket", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.s3tablesbucket.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
    .config("spark.sql.catalog.s3tablesbucket.warehouse", "arn:aws:s3tables:Region:111122223333:bucket/amzn-s3-demo-table-bucket") \
    .getOrCreate()
```

**Using an interactive AWS Glue session**  
If you are using an interactive notebook session with AWS Glue 5.0, specify the same configurations using the `%%configure` magic in a cell prior to code execution. Replace the placeholder values with the information for your own table bucket.  

```
%%configure
{"conf": "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions --conf spark.sql.defaultCatalog=s3tablesbucket --conf spark.sql.catalog.s3tablesbucket=org.apache.iceberg.spark.SparkCatalog --conf spark.sql.catalog.s3tablesbucket.catalog-impl=software.amazon.s3tables.iceberg.S3TablesCatalog --conf spark.sql.catalog.s3tablesbucket.warehouse=arn:aws:s3tables:Region:111122223333:bucket/amzn-s3-demo-table-bucket", "extra-jars": "s3://amzn-s3-demo-bucket/jars/s3-tables-catalog-for-iceberg-runtime-0.1.5.jar"}
```

------

### Sample scripts
<a name="w2aac20c25c29c19c13"></a>

The following example PySpark scripts can be used to test querying S3 tables with an AWS Glue job. These scripts connect to your table bucket and run queries to create a new namespace, create a sample table, insert data into the table, and return the table data. To use the scripts, replace the *placeholder values* with the information for your own table bucket.

Choose from the following scripts based on your S3 Tables access method.

------
#### [ S3 Tables integration with AWS analytics services ]

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("SparkIcebergSQL") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.2") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog","s3tables")
    .config("spark.sql.catalog.s3tables", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.s3tables.catalog-impl", "org.apache.iceberg.aws.glue.GlueCatalog") \
    .config("spark.sql.catalog.s3tables.glue.id", "111122223333:s3tablescatalog/amzn-s3-demo-table-bucket") \
    .config("spark.sql.catalog.s3tables.warehouse", "s3://amzn-s3-demo-table-bucket/bucket/amzn-s3-demo-table-bucket") \
    .getOrCreate()

namespace = "new_namespace"
table = "new_table"

spark.sql("SHOW DATABASES").show()

spark.sql(f"DESCRIBE NAMESPACE {namespace}").show()

spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {namespace}.{table} (
       id INT,
       name STRING,
       value INT
    )
""")

spark.sql(f"""
    INSERT INTO {namespace}.{table}
    VALUES 
       (1, 'ABC', 100),
       (2, 'XYZ', 200)
""")

spark.sql(f"SELECT * FROM {namespace}.{table} LIMIT 10").show()
```

------
#### [ Amazon S3 Tables Iceberg REST endpoint ]

```
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("glue-s3-tables-rest") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.2") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog", "s3_rest_catalog") \
    .config("spark.sql.catalog.s3_rest_catalog", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.s3_rest_catalog.type", "rest") \
    .config("spark.sql.catalog.s3_rest_catalog.uri", "https://s3tables.Region.amazonaws.com/iceberg") \
    .config("spark.sql.catalog.s3_rest_catalog.warehouse", "arn:aws:s3tables:Region:111122223333:bucket/amzn-s3-demo-table-bucket") \
    .config("spark.sql.catalog.s3_rest_catalog.rest.sigv4-enabled", "true") \
    .config("spark.sql.catalog.s3_rest_catalog.rest.signing-name", "s3tables") \
    .config("spark.sql.catalog.s3_rest_catalog.rest.signing-region", "Region") \
    .config('spark.sql.catalog.s3_rest_catalog.io-impl','org.apache.iceberg.aws.s3.S3FileIO') \
    .config('spark.sql.catalog.s3_rest_catalog.rest-metrics-reporting-enabled','false') \
    .getOrCreate()

namespace = "s3_tables_rest_namespace"
table = "new_table_s3_rest"

spark.sql("SHOW DATABASES").show()

spark.sql(f"DESCRIBE NAMESPACE {namespace}").show()

spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {namespace}.{table} (
       id INT,
       name STRING,
       value INT
    )
""")

spark.sql(f"""
    INSERT INTO {namespace}.{table}
    VALUES 
       (1, 'ABC', 100),
       (2, 'XYZ', 200)
""")

spark.sql(f"SELECT * FROM {namespace}.{table} LIMIT 10").show()
```

------
#### [ Amazon S3 Tables Catalog for Apache Iceberg ]

```
from pyspark.sql import SparkSession

#Spark session configurations
spark = SparkSession.builder.appName("glue-s3-tables") \
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.4.2") \
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions") \
    .config("spark.sql.defaultCatalog", "s3tablesbucket") \
    .config("spark.sql.catalog.s3tablesbucket", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.s3tablesbucket.catalog-impl", "software.amazon.s3tables.iceberg.S3TablesCatalog") \
    .config("spark.sql.catalog.s3tablesbucket.warehouse", "arn:aws:s3tables:Region:111122223333:bucket/amzn-s3-demo-table-bucket") \
    .getOrCreate()

#Script
namespace = "new_namespace"
table = "new_table"

spark.sql(f"CREATE NAMESPACE IF NOT EXISTS s3tablesbucket.{namespace}")
spark.sql(f"DESCRIBE NAMESPACE {namespace}").show()

spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {namespace}.{table} (
       id INT,
       name STRING,
       value INT
    )
""")

spark.sql(f"""
    INSERT INTO {namespace}.{table}
    VALUES 
       (1, 'ABC', 100),
       (2, 'XYZ', 200)
""")

spark.sql(f"SELECT * FROM {namespace}.{table} LIMIT 10").show()
```

------

## Step 3 – Create an AWS Glue job that queries tables
<a name="glue-etl-job"></a>

The following procedures show how to set up AWS Glue jobs that connect to your S3 table buckets. You can do this using the AWS CLI or the AWS Glue Studio script editor in the console. For more information, see [Authoring jobs in AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/author-job-glue.html) in the *AWS Glue User Guide*.

### Using AWS Glue Studio script editor
<a name="tables-glue-studio-job"></a>

The following procedure shows how to use the AWS Glue Studio script editor to create an ETL job that queries your S3 tables.

**Prerequisites**
+ [Step 1 – Prerequisites](#glue-etl-prereqs)
+ [Step 2 – Create a script to connect to table buckets](#glue-etl-script)

1. Open the AWS Glue console at [https://console.aws.amazon.com/glue/](https://console.aws.amazon.com/glue/).

1. In the navigation pane, choose **ETL jobs**.

1. Choose **Script editor**, then choose **Upload script** and upload the PySpark script you created to query S3 tables.

1. Select the **Job details** tab and enter the following for **Basic properties**.
   + For **Name**, enter a name for the job.
   + For **IAM Role**, select the role you created for AWS Glue.

1. (Optional) If you are using the Amazon S3 Tables Catalog for Apache Iceberg access method, expand **Advanced properties**, and for **Dependent JARs path**, enter the S3 URI of the client catalog JAR you uploaded to an S3 bucket as a prerequisite. For example, s3://*amzn-s3-demo-bucket1*/*jars*/s3-tables-catalog-for-iceberg-runtime-*0.1.5*.jar.

1. Choose **Save** to create the job.

1. Choose **Run** to start the job, and review the job status under the **Runs** tab.

### Using the AWS CLI
<a name="tables-glue-cli-job"></a>

The following procedure shows how to use the AWS CLI to create an ETL job that queries your S3 tables. To use the commands, replace the *placeholder values* with your own.

**Prerequisites**
+ [Step 1 – Prerequisites](#glue-etl-prereqs)
+ [Step 2 – Create a script to connect to table buckets](#glue-etl-script) and upload it to an S3 bucket.

1. Create an AWS Glue job.

   ```
   aws glue create-job \
   --name etl-tables-job \
   --role arn:aws:iam::111122223333:role/AWSGlueServiceRole \
   --command '{
       "Name": "glueetl",
       "ScriptLocation": "s3://amzn-s3-demo-bucket1/scripts/glue-etl-query.py",
       "PythonVersion": "3"
   }' \
   --default-arguments '{
       "--job-language": "python",
       "--class": "GlueApp"
   }' \
   --glue-version "5.0"
   ```
**Note**  
(Optional) If you are using the Amazon S3 Tables Catalog for Apache Iceberg access method, add the client catalog JAR to the `--default-arguments` using the `--extra-jars` parameter. Replace the *input placeholders* with your own values when you add the parameter.  

   ```
   "--extra-jars": "s3://amzn-s3-demo-bucket/jar-path/s3-tables-catalog-for-iceberg-runtime-0.1.5.jar"
   ```

1. Start your job.

   ```
   aws glue start-job-run \
   --job-name etl-tables-job
   ```

1. To review your job status, copy the run ID returned by the previous command and enter it in the following command.

   ```
   aws glue get-job-run --job-name etl-tables-job \
   --run-id jr_ec9a8a302e71f8483060f87b6c309601ea9ee9c1ffc2db56706dfcceb3d0e1ad
   ```
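The `get-job-run` response reports the status in `JobRun.JobRunState`. The following sketch (illustrative; the state names are the documented AWS Glue job run states) shows one way to check whether a run has finished:

```python
# Terminal vs. in-progress Glue job run states (JobRun.JobRunState values).
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}

def run_finished(get_job_run_response: dict) -> bool:
    """Return True once the job run has reached a terminal state."""
    state = get_job_run_response["JobRun"]["JobRunState"]
    return state in TERMINAL_STATES

# The dict shape mirrors the AWS CLI/boto3 get-job-run response.
print(run_finished({"JobRun": {"JobRunState": "RUNNING"}}))    # False
print(run_finished({"JobRun": {"JobRunState": "SUCCEEDED"}}))  # True
```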

# Getting started querying S3 Tables with Amazon SageMaker Unified Studio
<a name="s3-tables-integrating-sagemaker"></a>

Amazon SageMaker Unified Studio is a comprehensive analytics service that enables you to query and derive insights from your data using SQL, natural language, and interactive notebooks. It supports team collaboration and analysis workflows across AWS data repositories and third-party sources within a unified interface. SageMaker Unified Studio integrates directly with S3 Tables, providing a seamless transition from data storage to analysis within the Amazon S3 console.

You can integrate S3 Tables with SageMaker Unified Studio through the Amazon S3 console or SageMaker Unified Studio console.

For setup through the SageMaker Unified Studio console, see the [SageMaker Unified Studio documentation](https://docs.aws.amazon.com/next-generation-sagemaker/latest/userguide/s3-tables-integration.html).

## Requirements for querying S3 Tables with SageMaker Unified Studio
<a name="sagemaker-unified-studio-requirements"></a>

Using SageMaker Unified Studio with S3 Tables requires the following:
+ Your table buckets have been integrated with AWS analytics services in the current Region. For more information, see [Integrating S3 Tables with AWS analytics services](s3-tables-integrating-aws.md).
+ You are using an IAM role with permissions to create and view resources in SageMaker Unified Studio. For more information, see [Setup IAM-based domains in SageMaker Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/setup-iam-based-domains.html).
+ You have a SageMaker domain and project. For more information, see [Domains](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/working-with-domains.html) in the *SageMaker Unified Studio Administrator Guide*, and [Projects](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/projects.html) in the *SageMaker Unified Studio User Guide*.

If you haven't already performed these actions or created these resources, S3 Tables can automatically complete this setup for you so that you can begin querying with SageMaker Unified Studio.

## Getting started querying S3 Tables with SageMaker Unified Studio
<a name="sagemaker-unified-studio-getting-started"></a>

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/).

1. In the left navigation pane, choose **Table buckets**.

1. On the **Table buckets** page, choose the bucket that contains the table that you want to query.

1. On the bucket details page, select the table that you want to query.

1. Choose **Query**. 

1. Choose **Query table in SageMaker Unified Studio**.

   1. If you've already configured SageMaker Unified Studio for your tables, the SageMaker Unified Studio console opens to the query editor with a sample `SELECT` query loaded for you. Modify this query as needed for your use case.

   1. If you haven't already configured SageMaker Unified Studio for S3 Tables, a setup page appears with a single step to enable integration with AWS analytics services, which integrates your tables with services like SageMaker Unified Studio. This step runs automatically, and then you're redirected to a page in the SageMaker Unified Studio console with the following options to configure your account for querying S3 Tables:

      1. In **Setting you up as an administrator**, your current federated IAM role is selected. If your current role doesn't already have the required permissions, you need to [set up an IAM-based domain in SageMaker Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/setup-iam-based-domains.html) and assign permissions to your role so that you can sign in to SageMaker Unified Studio.

      1. In **Project data and administrative control**, select **Auto-create a new role with required permissions** to automatically create a role with the required permissions, or select **Use an existing role** and choose a role. If the chosen role doesn't already have the required permissions, you need to [set up an IAM-based domain in SageMaker Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/setup-iam-based-domains.html) and assign permissions to your admin execution role so that you can access data in SageMaker Unified Studio.

      1. In **Data encryption**, select **Use AWS owned key** to let AWS own and manage a key for you, or **Choose a different AWS KMS key (advanced)** to use an existing key or to create a new one.

      1. Select **Set up SageMaker Unified Studio**.

      1. Next, the SageMaker Unified Studio console opens to the query editor with a sample `SELECT` query loaded for you. Modify this query as needed for your use case.

         In the query editor, the **Catalog** field should be populated with `s3tablescatalog/` followed by the name of your table bucket, for example, `s3tablescatalog/amzn-s3-demo-table-bucket`. The **Database** field is populated with the namespace where your table is stored.