

# Federating into external data sources in the AWS Glue Data Catalog
<a name="federated-catalog-data-connection"></a>

You can connect the AWS Glue Data Catalog (Data Catalog) to data warehouses such as Amazon Redshift and Snowflake, cloud databases such as Amazon RDS, Amazon DynamoDB, and Oracle, streaming services such as Amazon MSK, and on-premises systems such as Teradata by using AWS Glue connections. These connections are stored in the AWS Glue Data Catalog and registered with AWS Lake Formation, allowing you to create a federated catalog for each available data source.

A *federated catalog* is a top-level container that points to a database in an external data system. It enables you to query the data directly from the external data system without an extract, transform, and load (ETL) process.

For more information about AWS Glue connections, see [Connecting to data](https://docs.aws.amazon.com/glue/latest/dg/glue-connections.html) in the AWS Glue Developer Guide.

Data lake administrators can create federated catalogs using [Amazon SageMaker Lakehouse](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/lakehouse.html) or [Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/connect-to-a-data-source.html).

Data lake administrators can then grant fine-grained permissions on the objects within the catalog using Lake Formation, controlling access at various levels such as catalog, database, table, column, row, or cell. Data analysts can discover and query the cataloged data sources using Athena, with Lake Formation enforcing the defined access policies. Analysts can join data across multiple sources in a single query without needing to connect to each source individually. 

**Topics**
+ [Workflow](#connect-data-source-workflow)
+ [Prerequisites for connecting the Data Catalog to external data sources](connect-data-source-prerequisites.md)
+ [Creating a federated catalog using an AWS Glue connection](create-fed-catalog-data-source.md)
+ [Viewing catalog objects](view-fed-glue-connection-catalog.md)
+ [Deleting a federated catalog](delete-glue-fed-catalog.md)
+ [Querying federated catalogs](query-glue-fed-catalog.md)
+ [Additional resources](additional-resources-fed-connection.md)

## Workflow
<a name="connect-data-source-workflow"></a>

A data lake administrator or a user with the required permissions completes the following steps to connect the AWS Glue Data Catalog to an external data source.

1.  Creates an AWS Glue connection to the data source. When you register the connection, the IAM role used in registering the connection must have access to the Lambda function and the Amazon S3 spill bucket location. 

1.  Registers the connection with Lake Formation. 

1.  Creates a federated catalog in the Data Catalog using an AWS Glue connection to connect to the available data sources. The databases, tables, and views are automatically cataloged in the Data Catalog and registered with Lake Formation. 

1.  Grants access to specific catalogs, databases, and tables to data analysts using Lake Formation permissions. Fine-grained access control policies can be defined across data lakes, warehouses, and OLTP sources using Lake Formation, enabling row-level and column-level security filters. 

    Data analysts can then access all data through the Data Catalog using SQL queries in Athena, without needing separate connections or data source credentials. Analysts can run federated SQL queries that scan data from multiple sources, joining data in-place without complex data pipelines. 

# Prerequisites for connecting the Data Catalog to external data sources
<a name="connect-data-source-prerequisites"></a>

To connect the AWS Glue Data Catalog to external data sources, register the connection with Lake Formation, and set up federated catalogs, you must complete the following requirements:

**Note**  
We recommend that a Lake Formation data lake administrator creates the AWS Glue connections to external data sources and creates the federated catalogs. 

1. **Create IAM roles.**
   +  Create a role that has the necessary permissions to deploy resources (Lambda function, Amazon S3 spill bucket, IAM role, and the AWS Glue connection) required to create a connection to the external data source. 
   + Create a role that has the necessary minimum permissions to access the AWS Glue connection properties (the Lambda function and the Amazon S3 spill bucket). This is the role that you'll include when you register the connection with Lake Formation.

     To use Lake Formation to manage and secure the data in your data lake, you must register the AWS Glue connection with Lake Formation. By doing so, Lake Formation can vend credentials to Amazon Athena for querying the federated data sources. 

     The role must have `Select` or `Describe` permissions on the Amazon S3 bucket and the Lambda function.
     + `s3:ListBucket`
     + `s3:GetObject`
     + `lambda:InvokeFunction`

     ```
     {
       "Version": "2012-10-17",
       "Statement": [
         {
           "Effect": "Allow",
           "Action": [
             "s3:*"
           ],
           "Resource": [
             "arn:aws:s3:::amzn-s3-demo-bucket1/object/*",
             "arn:aws:s3:::amzn-s3-demo-bucket1/object"
           ]
         },
         {
           "Sid": "lambdainvoke",
           "Effect": "Allow",
           "Action": "lambda:InvokeFunction",
           "Resource": "arn:aws:lambda:us-east-1:123456789012:function:example-lambda-function"
         },
         {
           "Sid": "gluepolicy",
           "Effect": "Allow",
           "Action": "glue:*",
           "Resource": "*"
         }
       ]
     }
     ```

   + Add the following trust policy to the IAM role that is used in registering the connection:

     ```
     {
         "Version": "2012-10-17",
         "Statement": [
             {
                 "Effect": "Allow",
                 "Principal": {
                     "Service": [
                         "lakeformation.amazonaws.com"
                     ]
                 },
                 "Action": "sts:AssumeRole"
             }
         ]
     }
     ```

   + The data lake administrator who registers the connection must have the `iam:PassRole` permission on the role.

     The following is an inline policy that grants this permission. Replace `111122223333` with a valid AWS account number, and replace `example-role-name` with the name of the role.

     ```
     {
         "Version": "2012-10-17",
         "Statement": [
             {
                 "Sid": "PassRolePermissions",
                 "Effect": "Allow",
                 "Action": [
                     "iam:PassRole"
                 ],
                 "Resource": [
                     "arn:aws:iam::111122223333:role/example-role-name"
                 ]
             }
         ]
     }
     ```

   +  To create federated catalogs in the Data Catalog, make sure that the IAM role you’re using is a Lake Formation data lake administrator by checking the data lake settings (`aws lakeformation get-data-lake-settings`).

      If you're not a data lake administrator, you need the Lake Formation `CREATE_CATALOG` permission to create a catalog. The following example shows how to grant the required permissions to create catalogs. 

     ```
     aws lakeformation grant-permissions \
     --cli-input-json \
             '{
                 "Principal": {
                  "DataLakePrincipalIdentifier":"arn:aws:iam::123456789012:role/non-admin"
                 },
                 "Resource": {
                     "Catalog": {
                     }
                 },
                 "Permissions": [
                     "CREATE_CATALOG",
                     "DESCRIBE"
                 ]
             }'
     ```

1. If you're using a customer managed key to encrypt the data in the data source, add the following key policy to the AWS KMS key. Replace the account number with a valid AWS account number, and specify the role name. By default, the data is encrypted using a KMS key. Lake Formation also provides an option to create your own custom KMS key for encryption. 

   For more information about managing the permissions of a customer managed key, see [Customer managed keys](https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#customer-cmk).

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "kms:Encrypt",
                   "kms:Decrypt",
                   "kms:ReEncrypt*",
                   "kms:GenerateDataKey*",
                   "kms:DescribeKey"
               ],
               "Resource": "arn:aws:kms:us-east-1:123456789012:key/key-1"
           }
       ]
   }
   ```
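
As a rough illustration of the role requirements in this section, the following Python sketch assembles a permissions policy limited to the three actions listed earlier (`s3:ListBucket`, `s3:GetObject`, and `lambda:InvokeFunction`) together with the Lake Formation trust policy. The function names, the `Sid` values, and the choice to scope `s3:ListBucket` to the bucket ARN and `s3:GetObject` to the objects in the bucket are assumptions made for illustration and are not taken from this guide. Confirm that a narrower policy like this one is sufficient for your connector before you rely on it.

```python
import json


def build_registration_role_policy(spill_bucket: str, lambda_function_arn: str) -> dict:
    """Build a permissions policy for the role used to register the connection.

    Assumption: the three actions listed in this section are sufficient
    for your connector. The Sid values are hypothetical labels.
    """
    bucket_arn = f"arn:aws:s3:::{spill_bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListSpillBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,  # bucket-level action
            },
            {
                "Sid": "ReadSpillObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"{bucket_arn}/*",  # object-level action
            },
            {
                "Sid": "InvokeConnectorLambda",
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": lambda_function_arn,
            },
        ],
    }


def build_lake_formation_trust_policy() -> dict:
    """Build the trust policy that lets Lake Formation assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": ["lakeformation.amazonaws.com"]},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    policy = build_registration_role_policy(
        "amzn-s3-demo-bucket1",
        "arn:aws:lambda:us-east-1:123456789012:function:example-lambda-function",
    )
    # Print both documents so they can be pasted into the IAM console or a file.
    print(json.dumps(policy, indent=2))
    print(json.dumps(build_lake_formation_trust_policy(), indent=2))
```

Generating the documents from code keeps the bucket name and function ARN in one place, which can help when you create roles for several connections.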

# Creating a federated catalog using an AWS Glue connection
<a name="create-fed-catalog-data-source"></a>

To connect the AWS Glue Data Catalog to external data sources, you need to use AWS Glue connections that enable communication with the external data sources. You can create AWS Glue connections using the AWS Glue console, the [CreateConnection](https://docs.aws.amazon.com/glue/latest/webapi/API_CreateConnection.html) API, or the Amazon SageMaker Lakehouse console. 

For step-by-step instructions for creating an AWS Glue connection, see [Connecting to data](https://docs.aws.amazon.com/glue/latest/dg/glue-connections.html) in the AWS Glue Developer Guide or [Creating connections in Amazon SageMaker Lakehouse](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/lakehouse-create-connection.html). 

When a user runs a query on federated tables, Lake Formation vends credentials that invoke an AWS Lambda function specified in the AWS Glue connection to retrieve metadata objects from the data source. 

------
#### [ AWS Management Console ]

**To create a federated catalog from an external data source and set up permissions (console)**

1. Open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. In the navigation pane, choose **Catalogs** under **Data Catalog**.

1. Choose **Create catalog**. 

1. On the **Set Catalog** details page, enter the following information:   
![\[The create catalog page with options.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/create-glue-connection-catalog.png)
   + **Name** – A unique name for your federated catalog. The name can't be changed and must be in lowercase. The name can contain a maximum of 255 characters. 
   + **Type** – Choose federated catalog as the catalog type.
   + **Source** – Choose a data source from the dropdown. The data sources for which you've created connections are displayed. For more information about creating an AWS Glue connection to an external data source, see [Creating connections for connectors](https://docs.aws.amazon.com/glue/latest/dg/creating-connections.html) in the AWS Glue Developer Guide or [Creating connections in Amazon SageMaker Lakehouse](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/lakehouse-create-connection.html).
   + **Connection** – Choose an existing AWS Glue connection to the data source.
   + **Description** – Enter a description for the catalog created from the data source.

1. Choose an **IAM role** for Lake Formation to assume to vend credentials for the querying engine to access data from the data source. This role must have the required permissions to access the AWS Glue connection and invoke the Lambda function to access data from the external data source.

   You can also **Create a new role** in the IAM console.

   See the [Prerequisites for connecting the Data Catalog to external data sources](connect-data-source-prerequisites.md) section for the required permissions.

1.  Select the option **Activate the connector to connect to the data source** to enable Athena to run federated queries.

   For the supported list of connectors, see [Register your connection](https://docs.aws.amazon.com/athena/latest/ug/register-connection-as-gdc.html) in the Amazon Athena User Guide. 

1. **Encryption options** – Choose the **Customize encryption settings** option if you want to use a custom key to encrypt the catalog. To use a custom key, you must add a customer managed key policy to your KMS key. 

1. Choose **Next** to grant permissions to other principals. 

1. On the **Grant permissions** page, choose **Add permissions**.

1.  On the **Add permissions** screen, choose the principals and the types of permissions to grant.   
![\[The catalog permissions page with principal type and grant options.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/catalog-permissions.png)
   +  In the **Principals** section, choose a principal type and then specify principals to grant permissions. 
     + **IAM users and roles** – Choose one or more users or roles from the IAM users and roles list.
     + **SAML users and groups** – For SAML and Amazon Quick users and groups, enter one or more Amazon Resource Names (ARNs) for users or groups federated through SAML, or ARNs for Amazon Quick users or groups. Press **Enter** after each ARN. 
   +  In the **Permissions** section, select permissions and grantable permissions.

     Under **Catalog permissions**, select one or more permissions to grant.

     Choose **Super user** to grant unrestricted administrative permissions on all resources within the catalog.

      Under **Grantable permissions**, select the permissions that the grant recipient can grant to other principals in their AWS account. This option is not supported when you are granting permissions to an IAM principal from an external account. 

1. Choose **Next** to review the information and create the catalog. The **Catalogs** list shows the new federated catalog.

   The **Data locations** list shows the newly registered federated connection.  
![\[The data locations list with the federated connections.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/federated_data_lake_location.png)

------
#### [ AWS CLI ]

**To create a federated catalog from an external data source and set up permissions**

1.  The following example shows how to create an AWS Glue connection. 

   ```
   aws glue create-connection \
     --connection-input \
         '{
            "Name": "DynamoDB connection",
            "ConnectionType": "DYNAMODB",
            "Description": "A connection created for DynamoDB",
            "ConnectionProperties": {},
            "AthenaProperties": {
               "spill_prefix": "your_spill_prefix",
               "lambda_function_arn": "Lambda_function_arn",
               "spill_bucket": "Your_Bucket_name"
            },
            "AuthenticationConfiguration": {}
         }'
   ```

1.  The following example shows how to register an AWS Glue connection with Lake Formation. 

   ```
   aws lakeformation register-resource \
     --cli-input-json \
       '{"ResourceArn":"arn:aws:glue:us-east-1:123456789012:connection/dynamo","RoleArn":"arn:aws:iam::123456789012:role/AdminTelemetry","WithFederation":true}'
   ```

1.  The following example shows how to create a federated catalog. 

   ```
   aws glue create-catalog \
     --cli-input-json \
         '{
            "Name": "ddbcatalog",
            "CatalogInput": {
               "CatalogProperties": {
                  "DataLakeAccessProperties": {
                     "DataTransferRole": "arn:aws:iam::123456789012:role/role-name"
                  }
               },
               "CreateDatabaseDefaultPermissions": [],
               "CreateTableDefaultPermissions": [],
               "FederatedCatalog": {
                  "ConnectionName": "dynamo",
                  "Identifier": "dynamo"
               }
            }
         }'
   ```

------
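
If you script these three steps instead of typing the JSON by hand, a sketch along the following lines can help keep the connection name, role ARNs, and catalog name consistent. It only builds the request payloads as Python dictionaries and calls no AWS service. The helper names are hypothetical, and the sketch assumes that `AthenaProperties` is a map that holds the spill and Lambda settings. The lowercase and 255-character checks mirror the catalog naming rules described in the console procedure.

```python
import json


def build_connection_input(name, connection_type, lambda_function_arn,
                           spill_bucket, spill_prefix, description=""):
    """Payload for the --connection-input option of create-connection."""
    return {
        "Name": name,
        "ConnectionType": connection_type,
        "Description": description,
        "ConnectionProperties": {},
        # Assumption: AthenaProperties is a map with these three keys.
        "AthenaProperties": {
            "spill_prefix": spill_prefix,
            "lambda_function_arn": lambda_function_arn,
            "spill_bucket": spill_bucket,
        },
        "AuthenticationConfiguration": {},
    }


def build_register_resource_request(connection_arn, role_arn):
    """Payload for register-resource with federation turned on."""
    return {
        "ResourceArn": connection_arn,
        "RoleArn": role_arn,
        "WithFederation": True,
    }


def build_create_catalog_request(catalog_name, connection_name, identifier,
                                 data_transfer_role_arn):
    """Payload for create-catalog that points at an existing connection."""
    if catalog_name != catalog_name.lower():
        raise ValueError("catalog name must be lowercase")
    if not 0 < len(catalog_name) <= 255:
        raise ValueError("catalog name must be 1 to 255 characters long")
    return {
        "Name": catalog_name,
        "CatalogInput": {
            "CatalogProperties": {
                "DataLakeAccessProperties": {
                    "DataTransferRole": data_transfer_role_arn,
                }
            },
            "CreateDatabaseDefaultPermissions": [],
            "CreateTableDefaultPermissions": [],
            "FederatedCatalog": {
                "ConnectionName": connection_name,
                "Identifier": identifier,
            },
        },
    }


if __name__ == "__main__":
    request = build_create_catalog_request(
        "ddbcatalog", "dynamo", "dynamo",
        "arn:aws:iam::123456789012:role/role-name",
    )
    # The printed JSON can be saved to a file and passed to --cli-input-json.
    print(json.dumps(request, indent=2))
```

Each dictionary serializes to JSON that has the same shape as the corresponding CLI input, so you can write it to a file and reference it with `file://` in `--cli-input-json` or `--connection-input`.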

# Viewing catalog objects
<a name="view-fed-glue-connection-catalog"></a>

For each available data source, AWS Glue creates a corresponding catalog in the AWS Glue Data Catalog. After you create a catalog, you can view the databases and tables in the catalog using the Lake Formation console or the AWS CLI.

1. Open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. Choose **Catalogs** under **Data Catalog**. The catalogs page shows the catalogs that you have permissions on.  
![\[View catalogs.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/view-catalogs.png)

1. Choose a catalog from the list to view the databases and tables contained in the catalog. The list contains the databases in your account and resource links, which are links to shared databases and tables in external accounts, and are used for cross-account access to data in the data lake.  
![\[View catalogs/databases.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/catalog-database-view.png)

1. Choose the **Tables** option under **View** to view and manage the tables in the database. 

**AWS CLI examples for viewing catalogs and databases**  
The following example shows how to view a catalog using the AWS CLI.

```
aws glue get-catalog \
--catalog-id 123456789012:dynamodbcatalog
```

The following example shows how to request all catalogs in the account.

```
aws glue get-catalogs \
 --recursive
```

The following example shows how to get a database in the catalog. Replace `database-name` with the name of the database.

```
aws glue get-database \
--catalog-id 123456789012:dynamodbcatalog \
--name database-name
```
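
The `--catalog-id` values in the examples in this section join a 12-digit AWS account ID and a catalog name with a colon. The following Python sketch shows one way to build and split such identifiers in a script. The function names are hypothetical, and the validation is a simplifying assumption based on the format used in these examples.

```python
import re
from typing import Tuple

# AWS account IDs are 12 decimal digits.
_ACCOUNT_ID_PATTERN = re.compile(r"[0-9]{12}")


def format_catalog_id(account_id: str, catalog_name: str) -> str:
    """Join an account ID and a catalog name into a --catalog-id value."""
    if not _ACCOUNT_ID_PATTERN.fullmatch(account_id):
        raise ValueError("account ID must be exactly 12 digits")
    if not catalog_name:
        raise ValueError("catalog name must not be empty")
    return f"{account_id}:{catalog_name}"


def parse_catalog_id(catalog_id: str) -> Tuple[str, str]:
    """Split a --catalog-id value into (account_id, catalog_name)."""
    # Split on the first colon only, so the catalog name is kept whole.
    account_id, separator, catalog_name = catalog_id.partition(":")
    if not separator or not catalog_name:
        raise ValueError("expected the form <account-id>:<catalog-name>")
    if not _ACCOUNT_ID_PATTERN.fullmatch(account_id):
        raise ValueError("account ID must be exactly 12 digits")
    return account_id, catalog_name


if __name__ == "__main__":
    print(format_catalog_id("123456789012", "dynamodbcatalog"))
    print(parse_catalog_id("123456789012:dynamodbcatalog"))
```

Validating the identifier before you call the AWS CLI surfaces a malformed account ID or an empty catalog name early, instead of as a service error.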

# Deleting a federated catalog
<a name="delete-glue-fed-catalog"></a>

 You can delete the federated catalogs that you created in the AWS Glue Data Catalog using the `glue:DeleteCatalog` operation or the AWS Lake Formation console. 

**To delete a federated catalog (console)**

1. Open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. In the navigation pane, choose **Catalogs** under **Data Catalog**.

1. Choose the catalog that you want to delete from the catalogs list.

1. Choose **Delete** from **Actions**. 

1. Choose **Drop** to confirm. The federated catalog is deleted from the Data Catalog.  
![\[The delete catalog confirmation.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/delete-fed-catalog.png)

**To delete a federated catalog (CLI)**
+ Run the following command. Replace the account ID and `catalog-name` with your own values.

  ```
  aws glue delete-catalog \
    --catalog-id 123456789012:catalog-name
  ```

# Querying federated catalogs
<a name="query-glue-fed-catalog"></a>

After you grant permissions to other principals, they can sign in and start querying the tables in the federated catalogs using Athena.

To create and delete tables in the federated database, the principal must have the Lake Formation `Create table` and `Drop` permissions.

 For more information on granting Data Catalog permissions, see [Granting permissions on Data Catalog resources](granting-catalog-permissions.md). 

For more information on querying the Data Catalog from Amazon Athena, see [Querying AWS Glue Data Catalog from Amazon Athena](https://docs.aws.amazon.com/athena/latest/ug/gdc-register.html) in the Amazon Athena User Guide. 

# Additional resources
<a name="additional-resources-fed-connection"></a>

The following blog post shows how data analysts can securely access and query data stored outside Amazon S3 data lakes, including Amazon Redshift data warehouses and Amazon DynamoDB databases, through a single, unified experience. It also describes how administrators can apply access controls at different levels of granularity to keep sensitive data protected while expanding data access.
+ [Catalog and govern Amazon Athena federated queries with Amazon SageMaker Lakehouse](https://aws.amazon.com/blogs/big-data/catalog-and-govern-amazon-athena-federated-queries-with-amazon-sagemaker-lakehouse/)