

# Onboarding to Lake Formation permissions
<a name="onboarding-lf-permissions"></a>

AWS Lake Formation uses the AWS Glue Data Catalog (Data Catalog) to store metadata for Amazon S3 data lakes and for external data sources such as Amazon Redshift. Metadata in the Data Catalog is organized in a three-level hierarchy of catalogs, databases, and tables: catalogs are logical containers that organize data from various sources, and databases are collections of tables. The Data Catalog also contains resource links, which are links to shared databases and tables in external accounts and are used for cross-account access to data in the data lake. Each AWS account has one Data Catalog per AWS Region.

Lake Formation provides a relational database management system (RDBMS)-style permissions model to grant or revoke access to catalogs, databases, tables, and columns in the Data Catalog, and to the underlying data in Amazon S3.

Before you learn about the details of the Lake Formation permissions model, it is helpful to review the following background information:
+ Data lakes managed by Lake Formation reside in designated locations in Amazon Simple Storage Service (Amazon S3). The Data Catalog also contains catalog objects. Each catalog represents data from sources like Amazon Redshift data warehouses, Amazon DynamoDB databases, and third-party data sources such as Snowflake, MySQL, and over 30 external data sources, which are integrated through federated connectors.
+ Lake Formation maintains a Data Catalog that contains metadata about source data to be imported into your data lakes, such as data in logs and relational databases, and about data in your data lakes in Amazon S3. The Data Catalog also contains metadata about data from external data sources other than Amazon S3. The metadata is organized as catalogs, databases, and tables. Metadata tables contain schema, location, partitioning, and other information about the data that they represent. Metadata databases are collections of tables.
+ The Lake Formation Data Catalog is the same Data Catalog used by AWS Glue. You can use AWS Glue crawlers to create Data Catalog tables, and you can use AWS Glue extract, transform, and load (ETL) jobs to populate the underlying data in your data lakes.
+ The catalogs, databases, and tables in the Data Catalog are referred to as *Data Catalog resources*. Tables in the Data Catalog are referred to as *metadata tables* to distinguish them from tables in data sources or tabular data in Amazon S3. The data that the metadata tables point to in Amazon S3 or in data sources is referred to as *underlying data*.
+ A *principal* is a user or role, an Amazon QuickSight user or group, a user or group that authenticates with Lake Formation through a SAML provider, or for cross-account access control, an AWS account ID, organization ID, or organizational unit ID.
+ AWS Glue crawlers create metadata tables, but you can also manually create metadata tables with the Lake Formation console, the API, or the AWS Command Line Interface (AWS CLI). When creating a metadata table, you must specify a location. When you create a database, the location is optional. Table locations can be Amazon S3 locations or data source locations such as an Amazon Relational Database Service (Amazon RDS) database. Database locations are always Amazon S3 locations.
+ Services that integrate with Lake Formation, such as Amazon Athena and Amazon Redshift, can access the Data Catalog to obtain metadata and to check authorization for running queries. For a complete list of integrated services, see [AWS service integrations with Lake Formation](service-integrations.md).

**Topics**
+ [Overview of Lake Formation permissions](lf-permissions-overview.md)
+ [Lake Formation personas and IAM permissions reference](permissions-reference.md)
+ [Changing the default settings for your data lake](change-settings.md)
+ [Implicit Lake Formation permissions](implicit-permissions.md)
+ [Lake Formation permissions reference](lf-permissions-reference.md)
+ [Integrating IAM Identity Center](identity-center-integration.md)
+ [Adding an Amazon S3 location to your data lake](register-data-lake.md)
+ [Hybrid access mode](hybrid-access-mode.md)
+ [Creating objects in the AWS Glue Data Catalog](populating-catalog.md)
+ [Importing data using workflows in Lake Formation](workflows.md)

# Overview of Lake Formation permissions
<a name="lf-permissions-overview"></a>

There are two main types of permissions in AWS Lake Formation:
+ Metadata access – Permissions on Data Catalog resources (*Data Catalog permissions*). 

  These permissions enable principals to create, read, update, and delete metadata databases and tables in the Data Catalog. 
+ Underlying data access – Permissions on locations in Amazon Simple Storage Service (Amazon S3) (*data access permissions* and *data location permissions*). 
  + Data access permissions enable principals to read and write *underlying* data at Amazon S3 locations (the data pointed to by Data Catalog resources). 
  + Data location permissions enable principals to create and alter metadata databases and tables that point to specific Amazon S3 locations. 

For both areas, Lake Formation uses a combination of Lake Formation permissions and AWS Identity and Access Management (IAM) permissions. The IAM permissions model consists of IAM policies. The Lake Formation permissions model is implemented as DBMS-style GRANT/REVOKE commands, such as `Grant SELECT on tableName to userName`.

When a principal makes a request to access Data Catalog resources or underlying data, for the request to succeed, it must pass permission checks by both IAM and Lake Formation.

![\[A requestor's request must pass through two "doors" to get to resources: Lake Formation permissions and IAM permissions.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/permissions_doors.png)


Lake Formation permissions control access to Data Catalog resources, Amazon S3 locations, and the underlying data at those locations. IAM permissions control access to the Lake Formation and AWS Glue APIs and resources. So although you might have the Lake Formation permission to create a metadata table in the Data Catalog (`CREATE_TABLE`), your operation fails if you don't have the IAM permission on the `glue:CreateTable` API. (Why a `glue:` permission? Because Lake Formation uses the AWS Glue Data Catalog.)

**Note**  
Lake Formation permissions apply only in the Region in which they were granted.

AWS Lake Formation requires that each principal (user or role) be authorized to perform actions on Lake Formation–managed resources. A principal is granted the necessary authorizations by the data lake administrator or another principal with the permissions to grant Lake Formation permissions.

When you grant a Lake Formation permission to a principal, you can optionally grant the ability to pass that permission to another principal.
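As a sketch, passing on the grant ability through the AWS CLI uses the `--permissions-with-grant-option` parameter. The account ID, user name, and database name below are placeholders:

```shell
# Hypothetical example: grant CREATE_TABLE on the "retail" database and also
# allow the grantee to grant CREATE_TABLE to other principals.
aws lakeformation grant-permissions \
    --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 \
    --permissions "CREATE_TABLE" \
    --permissions-with-grant-option "CREATE_TABLE" \
    --resource '{ "Database": {"Name": "retail"} }'
```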

You can use the Lake Formation API, the AWS Command Line Interface (AWS CLI), or the **Data permissions** and **Data locations** pages of the Lake Formation console to grant and revoke Lake Formation permissions.

# Methods for fine-grained access control
<a name="access-control-fine-grained"></a>

With a data lake, the goal is to have fine-grained access control to data. In Lake Formation, this means fine-grained access control to Data Catalog resources and Amazon S3 locations. You can achieve fine-grained access control with one of the following methods.


| Method | Lake Formation Permissions | IAM Permissions | Comments | 
| --- | --- | --- | --- | 
| Method 1 | Open | Fine-grained |  **This is the default method** for backward compatibility with AWS Glue. [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/access-control-fine-grained.html) On the Lake Formation console, this method appears as **Use only IAM access control**.  | 
| Method 2 | Fine-grained | Coarse-grained |  **This is the recommended method.** [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/access-control-fine-grained.html)  | 

**Important**  
Be aware of the following:  
By default, Lake Formation has the **Use only IAM access control** settings enabled for compatibility with existing AWS Glue Data Catalog behavior. We recommend that you disable these settings after you transition to using Lake Formation permissions. For more information, see [Changing the default settings for your data lake](change-settings.md).
Data lake administrators and database creators have implicit Lake Formation permissions that you must understand. For more information, see [Implicit Lake Formation permissions](implicit-permissions.md).
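As an illustration, one way to stop granting the default `ALL` permission to `IAM_ALLOWED_PRINCIPALS` on newly created resources is to clear the default Data Catalog permissions with the `put-data-lake-settings` CLI command. This is a sketch rather than a complete settings document, and the administrator ARN is a placeholder; note that this command replaces the existing settings, so include any administrators you want to keep:

```shell
# Hypothetical sketch: keep one data lake administrator and remove the default
# grants to IAM_ALLOWED_PRINCIPALS on new databases and tables.
aws lakeformation put-data-lake-settings --data-lake-settings '{
    "DataLakeAdmins": [
        {"DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:user/datalake_admin"}
    ],
    "CreateDatabaseDefaultPermissions": [],
    "CreateTableDefaultPermissions": []
}'
```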

# Metadata access control
<a name="access-control-metadata"></a>

For access control for Data Catalog resources, the following discussion assumes fine-grained access control with Lake Formation permissions and coarse-grained access control with IAM policies.

There are two distinct methods for granting Lake Formation permissions on Data Catalog resources:
+ **Named resource access control** – With this method, you grant permissions on specific databases or tables by specifying database or table names. The grants have this form:

  Grant *permissions* to *principals* on *resources* [with grant option].

  With the grant option, you can allow the grantee to grant the permissions to other principals.
+ **Tag-based access control** – With this method, you assign one or more LF-Tags to Data Catalog databases, tables, and columns, and grant permissions on one or more LF-Tags to principals. Each LF-Tag is a key-value pair, such as `department=sales`. A principal that has LF-Tags that match the LF-Tags on a Data Catalog resource can access that resource. This method is recommended for data lakes with a large number of databases and tables. It's explained in detail in [Lake Formation tag-based access control](tag-based-access-control.md).
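For instance, a tag-based grant through the AWS CLI might look like the following sketch, where the account ID, principal, and the `department=sales` LF-Tag are placeholder values:

```shell
# Hypothetical example: grant DESCRIBE on every database whose LF-Tags
# match the expression department=sales.
aws lakeformation grant-permissions \
    --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 \
    --permissions "DESCRIBE" \
    --resource '{ "LFTagPolicy": {
        "ResourceType": "DATABASE",
        "Expression": [{"TagKey": "department", "TagValues": ["sales"]}]
    } }'
```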

The permissions that a principal has on a resource are the union of the permissions granted by both methods.

The following table summarizes the available Lake Formation permissions on Data Catalog resources. The column headings indicate the resource on which the permission is granted.


| Catalog | Database | Table | 
| --- | --- | --- | 
| CREATE\_DATABASE | CREATE\_TABLE | ALTER | 
|  | ALTER | DROP | 
|  | DROP | DESCRIBE | 
|  | DESCRIBE | SELECT\* | 
|  |  | INSERT\* | 
|  |  | DELETE\* | 

For example, the `CREATE_TABLE` permission is granted on a database. This means that the principal is allowed to create tables in that database.

The permissions with an asterisk (\*) are granted on Data Catalog resources, but they apply to the underlying data. For example, the `DROP` permission on a metadata table enables you to drop the table from the Data Catalog. However, the `DELETE` permission granted on the same table enables you to delete the table's underlying data in Amazon S3, using, for example, a SQL `DELETE` statement. With these permissions, you also can view the table on the Lake Formation console and retrieve information about the table with the AWS Glue API. Thus, `SELECT`, `INSERT`, and `DELETE` are both Data Catalog permissions and data access permissions.

When granting `SELECT` on a table, you can add a filter that includes or excludes one or more columns. This permits fine-grained access control on metadata table columns, limiting the columns that users of integrated services can see when running queries. This capability is not available using just IAM policies.
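As a sketch, a column-filtered grant uses the `TableWithColumns` resource type, as in the following AWS CLI example (the database, table, and column names are placeholders):

```shell
# Hypothetical example: grant SELECT on only two columns of the
# retail.customers table.
aws lakeformation grant-permissions \
    --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 \
    --permissions "SELECT" \
    --resource '{ "TableWithColumns": {
        "DatabaseName": "retail",
        "Name": "customers",
        "ColumnNames": ["customer_id", "region"]
    } }'
```

To exclude columns instead of including them, a `ColumnWildcard` with `ExcludedColumnNames` can be used in place of `ColumnNames`.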

There is also a special permission named `Super`. The `Super` permission enables a principal to perform every supported Lake Formation operation on the database or table on which it is granted. This permission can coexist with the other Lake Formation permissions. For example, you can grant `Super`, `SELECT`, and `INSERT` on a metadata table. The principal can perform all supported actions on the table, and when you revoke `Super`, the `SELECT` and `INSERT` permissions remain.

For details on each permission, see [Lake Formation permissions reference](lf-permissions-reference.md).

**Important**  
To be able to see a Data Catalog table created by another user, you must be granted at least one Lake Formation permission on the table. If you are granted at least one permission on the table, you can also see the table's containing database.

You can grant or revoke Data Catalog permissions using the Lake Formation console, the API, or the AWS Command Line Interface (AWS CLI). The following is an example of an AWS CLI command that grants the user `datalake_user1` permission to create tables in the `retail` database.

```
aws lakeformation grant-permissions \
    --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 \
    --permissions "CREATE_TABLE" \
    --resource '{ "Database": {"Name": "retail"} }'
```
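The corresponding `revoke-permissions` command takes the same shape. For example, to undo the grant of `CREATE_TABLE` on the `retail` database to `datalake_user1`:

```shell
# Revoke the CREATE_TABLE permission that was previously granted.
aws lakeformation revoke-permissions \
    --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 \
    --permissions "CREATE_TABLE" \
    --resource '{ "Database": {"Name": "retail"} }'
```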

The following is an example of a coarse-grained access control IAM policy that complements fine-grained access control with Lake Formation permissions. It permits all operations on any metadata database or table.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:*Database*",
                "glue:*Table*",
                "glue:*Partition*"
            ],
            "Resource": "*"
        }
    ]
}
```

------

The next example is also coarse-grained but somewhat more restrictive. It permits read-only operations on all metadata databases and tables in the Data Catalog in the designated account and Region.

------
#### [ JSON ]

****  

```
{  
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetTable",
                "glue:GetDatabase", 
                "glue:GetDatabases"
            ],
            "Resource": "arn:aws:glue:us-east-1:111122223333:*"
        } 
    ]   
}
```

------

Compare these policies to the following policy, which implements IAM-based fine-grained access control. It grants permissions only on a subset of tables in the customer relationship management (CRM) metadata database in the designated account and Region.

------
#### [ JSON ]

****  

```
{  
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetTables",
                "glue:SearchTables",
                "glue:GetTable",
                "glue:GetDatabase", 
                "glue:GetDatabases"
            ],
            "Resource": [
                "arn:aws:glue:us-east-1:111122223333:catalog",
                "arn:aws:glue:us-east-1:111122223333:database/CRM",
                "arn:aws:glue:us-east-1:111122223333:table/CRM/P*"
            ]
        } 
    ]   
}
```

------

For more examples of coarse-grained access control policies, see [Lake Formation personas and IAM permissions reference](permissions-reference.md).

# Underlying data access control
<a name="access-control-underlying-data"></a>

When an integrated AWS service requests access to data in an Amazon S3 location that is access-controlled by AWS Lake Formation, Lake Formation supplies temporary credentials to access the data.

To enable Lake Formation to control access to underlying data at an Amazon S3 location, you *register* that location with Lake Formation.
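For example, registration can be done with the `register-resource` CLI command. The bucket name and path below are placeholders, and using the service-linked role is one option among several:

```shell
# Hypothetical example: register an S3 path with Lake Formation, using the
# service-linked role for data access.
aws lakeformation register-resource \
    --resource-arn arn:aws:s3:::amzn-s3-demo-bucket/sales-data \
    --use-service-linked-role
```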

After you register an Amazon S3 location, you can start granting the following Lake Formation permissions:
+ Data access permissions (`SELECT`, `INSERT`, and `DELETE`) on Data Catalog tables that point to that location.
+ Data location permissions on that location.

Lake Formation data location permissions control the ability to create Data Catalog resources that point to particular Amazon S3 locations. Data location permissions provide an extra layer of security to locations within the data lake. When you grant the `CREATE_TABLE` or `ALTER` permission to a principal, you also grant data location permissions to limit the locations for which the principal can create or alter metadata tables. 

Amazon S3 locations are buckets or prefixes under a bucket, but not individual Amazon S3 objects.

You can grant data location permissions to a principal by using the Lake Formation console, the API, or the AWS CLI. The general form of a grant is as follows: 

```
grant DATA_LOCATION_ACCESS to principal on S3 location [with grant option]
```

If you include `with grant option`, the grantee can grant the permissions to other principals.
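Expressed with the AWS CLI, such a grant might look like the following sketch (the account ID, principal, and S3 ARN are placeholders):

```shell
# Hypothetical example: allow a principal to create Data Catalog resources
# that point to this registered S3 location.
aws lakeformation grant-permissions \
    --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 \
    --permissions "DATA_LOCATION_ACCESS" \
    --resource '{ "DataLocation": {"ResourceArn": "arn:aws:s3:::amzn-s3-demo-bucket/sales-data"} }'
```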

Recall that Lake Formation permissions always work in combination with AWS Identity and Access Management (IAM) permissions for fine-grained access control. For read/write permissions on underlying Amazon S3 data, IAM permissions are granted as follows:

When you register a location, you specify an IAM role that grants read/write permissions on that location. Lake Formation assumes that role when supplying temporary credentials to integrated AWS services. A typical role might have the following policy attached, where the registered location is the bucket `amzn-s3-demo-bucket`.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket"
            ]
        }
    ]
}
```

------

Lake Formation provides a service-linked role that you can use during registration to automatically create policies like this. For more information, see [Using service-linked roles for Lake Formation](service-linked-roles.md).

Therefore, registering an Amazon S3 location grants the required IAM `s3:` permissions on that location, where the permissions are specified by the role used to register the location.

**Important**  
Avoid registering an Amazon S3 bucket that has **Requester pays** enabled. For buckets registered with Lake Formation, the role used to register the bucket is always viewed as the requester. If the bucket is accessed by another AWS account, the bucket owner is charged for data access if the role belongs to the same account as the bucket owner.

For read/write access to underlying data, in addition to Lake Formation permissions, principals also need the `lakeformation:GetDataAccess` IAM permission. With this permission, Lake Formation grants the request for temporary credentials to access the data.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lakeformation:GetDataAccess",
            "Resource": "*"
        }
    ]
}
```

------

In the preceding policy, you must set `Resource` to `*` (all resources). Specifying any other resource for this permission is not supported; the data that a principal can actually access is determined by the Lake Formation permissions granted to that principal.

**Note**  
Amazon Athena requires the user to have the `lakeformation:GetDataAccess` permission. Other integrated services require their underlying execution role to have the `lakeformation:GetDataAccess` permission.

This permission is included in the suggested policies in the [Lake Formation personas and IAM permissions reference](permissions-reference.md).

To summarize, to enable Lake Formation principals to read and write underlying data with access controlled by Lake Formation permissions:
+ Register the Amazon S3 locations that contain the data with Lake Formation.
+ Principals who create Data Catalog tables that point to underlying data locations must have data location permissions.
+ Principals who read and write underlying data must have Lake Formation data access permissions on the Data Catalog tables that point to the underlying data locations.
+ Principals who read and write underlying data must have the `lakeformation:GetDataAccess` IAM permission when the underlying data location is registered with Lake Formation.

**Note**  
The Lake Formation permissions model doesn't prevent access to Amazon S3 locations through the Amazon S3 API or console if you have access to them through IAM or Amazon S3 policies. You can attach IAM policies to principals to block this access.

**More on data location permissions**  
Data location permissions govern the outcome of create and update operations on Data Catalog databases and tables. The rules are as follows:
+ A principal must have explicit or implicit data location permissions on an Amazon S3 location to create or update a database or table that specifies that location.
+ The explicit permission `DATA_LOCATION_ACCESS` is granted using the console, API, or AWS CLI.
+ Implicit permissions are granted when a database has a location property that points to a registered location, the principal has the `CREATE_TABLE` permission on the database, and the principal tries to create a table at that location or a child location.
+ If a principal is granted data location permissions on a location, the principal has data location permissions on all child locations.
+ A principal does not need data location permissions to perform read/write operations on the underlying data. It is sufficient to have the `SELECT` or `INSERT` data access permissions. Data location permissions apply only to creating Data Catalog resources that point to the location.

Consider the scenario shown in the following diagram.

![\[Folder hierarchy and two databases, database A and B, with database B pointing to the Customer service folder.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/location-permissions-example.png)


In this diagram:
+ The Amazon S3 buckets `Products`, `Finance`, and `Customer Service` are registered with Lake Formation.
+ `Database A` has no location property, and `Database B` has a location property that points to the `Customer Service` bucket.
+ User `datalake_user` has `CREATE_TABLE` on both databases.
+ User `datalake_user` has been granted data location permissions only on the `Products` bucket. 

The following are the results when user `datalake_user` tries to create a catalog table in a particular database at a particular location.


**Location where `datalake_user` tries to create a table**  

| Database and Location | Succeeds or Fails | Reason | 
| --- | --- | --- | 
| Database A at Finance/Sales | Fails | No data location permission | 
| Database A at Products | Succeeds | Has data location permission | 
| Database A at HR/Plans | Succeeds | Location is not registered | 
| Database B at Customer Service/Incidents | Succeeds | Database has location property at Customer Service | 

For more information, see the following:
+ [Adding an Amazon S3 location to your data lake](register-data-lake.md)
+ [Lake Formation permissions reference](lf-permissions-reference.md)
+ [Lake Formation personas and IAM permissions reference](permissions-reference.md)

# Lake Formation personas and IAM permissions reference
<a name="permissions-reference"></a>

This section lists some suggested Lake Formation personas and their suggested AWS Identity and Access Management (IAM) permissions. For information about Lake Formation permissions, see [Lake Formation permissions reference](lf-permissions-reference.md).

## AWS Lake Formation personas
<a name="lf-personas"></a>

The following table lists the suggested AWS Lake Formation personas.


**Lake Formation Personas**  

| Persona | Description | 
| --- | --- | 
| IAM administrator (superuser) | (Required) User who can create IAM users and roles. Has the AdministratorAccess AWS managed policy. Has all permissions on all Lake Formation resources. Can add data lake administrators. Cannot grant Lake Formation permissions if not also designated a data lake administrator. | 
| Data lake administrator | (Required) User who can register Amazon S3 locations, access the Data Catalog, create databases, create and run workflows, grant Lake Formation permissions to other users, and view AWS CloudTrail logs. Has fewer IAM permissions than the IAM administrator, but enough to administer the data lake. Cannot add other data lake administrators. | 
| Read only administrator | (Optional) User who can view principals, Data Catalog resources, permissions, and AWS CloudTrail logs, without the permissions to make updates. | 
| Data engineer | (Optional) User who can create databases, create and run crawlers and workflows, and grant Lake Formation permissions on the Data Catalog tables that the crawlers and workflows create. We recommend that you make all data engineers database creators. For more information, see [Creating a database](creating-database.md). | 
| Data analyst | (Optional) User who can run queries against the data lake using, for example, Amazon Athena. Has only enough permissions to run queries. | 
| Workflow role | (Required) Role that runs a workflow on behalf of a user. You specify this role when you create a workflow from a blueprint. | 

**Note**  
In Lake Formation, data lake administrators who are added after a database is created can grant permissions, but they don't automatically have data access permissions, such as `SELECT` or `DESCRIBE`, on that database. Administrators who create databases receive `Super` permissions on those databases. This behavior is intentional: all administrators can grant themselves the necessary permissions, but those permissions aren't applied automatically to pre-existing resources. Administrators must therefore explicitly grant themselves access to databases that existed before they were given admin privileges. 

## AWS managed policies for Lake Formation
<a name="lf-managed-policies"></a>

You can grant the AWS Identity and Access Management (IAM) permissions that are required to work with AWS Lake Formation by using AWS managed policies and inline policies. The following AWS managed policies are available for Lake Formation.

### AWS managed policy: AWSLakeFormationDataAdmin
<a name="lf-data-admin"></a>

The [AWSLakeFormationDataAdmin](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AWSLakeFormationDataAdmin) policy grants administrative access to AWS Lake Formation and related services, such as AWS Glue, to manage data lakes.

You can attach `AWSLakeFormationDataAdmin` to your users, groups, and roles.
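Attaching the managed policy can be scripted. For example, the following sketch attaches it to a hypothetical IAM user:

```shell
# Hypothetical example: attach the AWSLakeFormationDataAdmin managed policy
# to an existing IAM user named datalake_admin.
aws iam attach-user-policy \
    --user-name datalake_admin \
    --policy-arn arn:aws:iam::aws:policy/AWSLakeFormationDataAdmin
```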

**Permission details**
+ `CloudTrail` – Allows principals to view AWS CloudTrail logs. This is required to review any errors in the setup of the data lake.
+ `Glue` – Allows principals to view, create, and update metadata tables and databases in the Data Catalog. This includes API operations that start with `Get`, `List`, `Create`, `Update`, `Delete`, and `Search`. This is required to manage the metadata of the data lake tables.
+ `IAM` – Allows principals to retrieve information about IAM users, roles, and policies attached to the roles. This is required for the data admin to review and list IAM users and roles to grant Lake Formation permissions.
+ `Lake Formation` – Grants data lake admins required Lake Formation permissions to manage data lakes.
+ `S3` – Allows principals to retrieve information about Amazon S3 buckets and their locations in order to set up the data location for data lakes.

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AWSLakeFormationDataAdminAllow",
            "Effect": "Allow",
            "Action": [
                "lakeformation:*",
                "cloudtrail:DescribeTrails",
                "cloudtrail:LookupEvents",
                "glue:CreateCatalog",
                "glue:UpdateCatalog",
                "glue:DeleteCatalog",
                "glue:GetCatalog",
                "glue:GetCatalogs",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "glue:CreateDatabase",
                "glue:UpdateDatabase",
                "glue:DeleteDatabase",
                "glue:GetConnections",
                "glue:SearchTables",
                "glue:GetTable",
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:GetTableVersions",
                "glue:GetPartitions",
                "glue:GetTables",
                "glue:ListWorkflows",
                "glue:BatchGetWorkflows",
                "glue:DeleteWorkflow",
                "glue:GetWorkflowRuns",
                "glue:StartWorkflowRun",
                "glue:GetWorkflow",
                "s3:ListBucket",
                "s3:GetBucketLocation",
                "s3:ListAllMyBuckets",
                "s3:GetBucketAcl",
                "iam:ListUsers",
                "iam:ListRoles",
                "iam:GetRole",
                "iam:GetRolePolicy"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AWSLakeFormationDataAdminDeny",
            "Effect": "Deny",
            "Action": [
                "lakeformation:PutDataLakeSettings"
            ],
            "Resource": "*"
        }
    ]
}
```

**Note**  
The `AWSLakeFormationDataAdmin` policy does not grant every required permission for data lake administrators. Additional permissions are needed to create and run workflows and to register locations with the service-linked role `AWSServiceRoleForLakeFormationDataAccess`. For more information, see [Create a data lake administrator](initial-lf-config.md#create-data-lake-admin) and [Using service-linked roles for Lake Formation](service-linked-roles.md).

### AWS managed policy: AWSLakeFormationCrossAccountManager
<a name="lf-cross-account-manager"></a>

The [AWSLakeFormationCrossAccountManager](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AWSLakeFormationCrossAccountManager) policy provides cross-account access to AWS Glue resources through Lake Formation, and grants read access to other required services, such as AWS Organizations and AWS RAM.

You can attach `AWSLakeFormationCrossAccountManager` to your users, groups, and roles.

**Permission details**

This policy includes the following permissions.
+ `Glue` – Allows principals to set or delete the Data Catalog resource policy for access control.
+ `Organizations` – Allows principals to retrieve account and organizational unit (OU) information for an organization.
+ `ram:CreateResourceShare` – Allows principals to create a resource share.
+ `ram:UpdateResourceShare` – Allows principals to modify some properties of the specified resource share.
+ `ram:DeleteResourceShare` – Allows principals to delete the specified resource share.
+ `ram:AssociateResourceShare` – Allows principals to add the specified list of principals and list of resources to a resource share.
+ `ram:DisassociateResourceShare` – Allows principals to remove the specified principals or resources from participating in the specified resource share. 
+ `ram:GetResourceShares` – Allows principals to retrieve details about the resource shares that you own or that are shared with you. 
+ `ram:RequestedResourceType` – Allows principals to retrieve the resource type (database, table, or catalog).
+ `ram:AssociateResourceSharePermission` – Allows principals to add or replace the AWS RAM permission for a resource type included in a resource share. You can have exactly one permission associated with each resource type in the resource share.

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [{
            "Sid": "AllowCreateResourceShare",
            "Effect": "Allow",
            "Action": [
                "ram:CreateResourceShare"
            ],
            "Resource": "*",
            "Condition": {
                "StringLikeIfExists": {
                    "ram:RequestedResourceType": [
                        "glue:Table",
                        "glue:Database",
                        "glue:Catalog"
                    ]
                }
            }
        },
        {
            "Sid": "AllowManageResourceShare",
            "Effect": "Allow",
            "Action": [
                "ram:UpdateResourceShare",
                "ram:DeleteResourceShare",
                "ram:AssociateResourceShare",
                "ram:DisassociateResourceShare",
                "ram:GetResourceShares"
            ],
            "Resource": "*",
            "Condition": {
                "StringLike": {
                    "ram:ResourceShareName": [
                        "LakeFormation*"
                    ]
                }
            }
        },
        {
            "Sid": "AllowManageResourceSharePermissions",
            "Effect": "Allow",
            "Action": [
                "ram:AssociateResourceSharePermission"
            ],
            "Resource": "*",
            "Condition": {
                "ArnLike": {
                    "ram:PermissionArn": [
                        "arn:aws:ram::aws:permission/AWSRAMLFEnabled*"
                    ]
                }
            }
        },
        {
            "Sid": "AllowXAcctManagerPermissions",
            "Effect": "Allow",
            "Action": [
                "glue:PutResourcePolicy",
                "glue:DeleteResourcePolicy",
                "organizations:DescribeOrganization",
                "organizations:DescribeAccount",
                "ram:Get*",
                "ram:List*"
            ],
            "Resource": "*"
        },
        {
            "Sid": "AllowOrganizationsPermissions",
            "Effect": "Allow",
            "Action": [
                "organizations:ListRoots",
                "organizations:ListAccountsForParent",
                "organizations:ListOrganizationalUnitsForParent"
            ],
            "Resource": "*"
        }
    ]
}
```

------

### AWS managed policy: AWSGlueConsoleFullAccess
<a name="glue-console-access-policy"></a>

The [AWSGlueConsoleFullAccess](https://console.aws.amazon.com/iam/home#policies/arn:aws:iam::aws:policy/AWSGlueConsoleFullAccess) policy grants full access to AWS Glue resources when an identity that the policy is attached to uses the AWS Management Console. If you follow the naming convention for resources specified in this policy, users have full console capabilities. This policy is typically attached to users of the AWS Glue console.

In addition, AWS Glue and Lake Formation assume the service role `AWSGlueServiceRole` to allow access to related services, including Amazon Elastic Compute Cloud (Amazon EC2), Amazon Simple Storage Service (Amazon S3), and Amazon CloudWatch.

### AWS managed policy: LakeFormationDataAccessServiceRolePolicy
<a name="lake-formation-data-access-service-role-policy"></a>

This policy is attached to a service-linked role named `ServiceRoleForLakeFormationDataAccess` that allows the service to perform actions on resources at your request. You can't attach this policy to your IAM identities.

This policy allows Lake Formation integrated AWS services, such as Amazon Athena and Amazon Redshift, to use the service-linked role to discover Amazon S3 resources.

For more information, see [Using service-linked roles for Lake Formation](service-linked-roles.md).

**Permission details**

This policy includes the following permission.
+ `s3:ListAllMyBuckets` – Returns a list of all buckets owned by the authenticated sender of the request.

------
#### [ JSON ]

****  

```
{
	"Version":"2012-10-17",		 	 	 
	"Statement": [
		{
			"Sid": "LakeFormationDataAccessServiceRolePolicy",
			"Effect": "Allow",
			"Action": [
				"s3:ListAllMyBuckets"
			],
			"Resource": [
				"arn:aws:s3:::*"
			]
		}
	]
}
```

------

**Lake Formation updates to AWS managed policies**  
View details about updates to AWS managed policies for Lake Formation since this service began tracking these changes.


| Change | Description | Date | 
| --- | --- | --- | 
| Lake Formation updated AWSLakeFormationCrossAccountManager policy.  | Lake Formation enhanced the [AWSLakeFormationCrossAccountManager](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AWSLakeFormationCrossAccountManager) policy by replacing the StringLike condition operator with the ArnLike operator, which allows IAM to perform the ARN format check. | January 2025 | 
| Lake Formation updated AWSLakeFormationDataAdmin policy.  | Lake Formation enhanced the [AWSLakeFormationDataAdmin](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AWSLakeFormationDataAdmin) policy by adding the following AWS Glue Data Catalog CRUD APIs as part of the multi-catalog feature: [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html) This managed policy change ensures that the Lake Formation administrator persona has IAM permission on these new operations by default. | December 2024 | 
| Lake Formation updated AWSLakeFormationCrossAccountManager policy.  | Lake Formation enhanced the [AWSLakeFormationCrossAccountManager](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AWSLakeFormationCrossAccountManager) policy by adding Sid elements to the policy statement. | March 2024 | 
| Lake Formation updated AWSLakeFormationDataAdmin policy.  | Lake Formation enhanced the [AWSLakeFormationDataAdmin](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AWSLakeFormationDataAdmin) policy by adding a Sid element to the policy statement and removing a redundant action. | March 2024 | 
| Lake Formation updated LakeFormationDataAccessServiceRolePolicy policy.  | Lake Formation enhanced the [LakeFormationDataAccessServiceRolePolicy](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/LakeFormationDataAccessServiceRolePolicy) policy by adding a Sid element to the policy statement. | February 2024 | 
| Lake Formation updated AWSLakeFormationCrossAccountManager policy.  | Lake Formation enhanced the [AWSLakeFormationCrossAccountManager](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AWSLakeFormationCrossAccountManager) policy by adding a new permission to enable cross-account data sharing in hybrid access mode. | October 2023 | 
| Lake Formation updated AWSLakeFormationCrossAccountManager policy.  | Lake Formation enhanced the [AWSLakeFormationCrossAccountManager](https://console.aws.amazon.com/iam/home#/policies/arn:aws:iam::aws:policy/AWSLakeFormationCrossAccountManager) policy to create only one resource share per recipient account when a resource is first shared. All resources shared thereafter with the same account are attached to the same resource share. | May 6, 2022 | 
| Lake Formation started tracking changes. | Lake Formation started tracking changes for its AWS managed policies. | May 6, 2022 | 

## Personas suggested permissions
<a name="lf-permissions-tables"></a>

The following are the suggested permissions for each persona. The IAM administrator is not included because that user has all permissions on all resources.

**Topics**
+ [Data lake administrator permissions](#persona-dl-admin)
+ [Read only administrator permissions](#persona-read-only-admin)
+ [Data engineer permissions](#persona-engineer)
+ [Data analyst permissions](#persona-user)
+ [Workflow role permissions](#persona-workflow-role)

### Data lake administrator permissions
<a name="persona-dl-admin"></a>

**Important**  
In the following policies, replace *<account-id>* with a valid AWS account number, and replace *<workflow\_role>* with the name of a role that has permissions to run a workflow, as defined in [Workflow role permissions](#persona-workflow-role).


| Policy Type | Policy | 
| --- | --- | 
| AWS managed policies |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html) For information about the optional AWS managed policies, see [Create a data lake administrator](initial-lf-config.md#create-data-lake-admin).  | 
| Inline policy (for creating the Lake Formation service-linked role) |  <pre>{<br />    "Version": "2012-10-17",<br />    "Statement": [<br />        {<br />            "Effect": "Allow",<br />            "Action": "iam:CreateServiceLinkedRole",<br />            "Resource": "*",<br />            "Condition": {<br />                "StringEquals": {<br />                    "iam:AWSServiceName": "lakeformation.amazonaws.com"<br />                }<br />            }<br />        },<br />        {<br />            "Effect": "Allow",<br />            "Action": [<br />                "iam:PutRolePolicy"<br />            ],<br />            "Resource": "arn:aws:iam::<account-id>:role/aws-service-role/lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess"<br />        }<br />    ]<br />}<br /></pre>  | 
| (Optional) Inline policy (passrole policy for the workflow role). This is required only if the data lake administrator creates and runs workflows. |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html)  | 
| (Optional) Inline policy (if your account is granting or receiving cross-account Lake Formation permissions). This policy is for accepting or rejecting AWS RAM resource share invitations, and for enabling the granting of cross-account permissions to organizations. ram:EnableSharingWithAwsOrganization is required only for data lake administrators in the AWS Organizations management account. |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html)  | 

### Read only administrator permissions
<a name="persona-read-only-admin"></a>


| Policy type | Policy | 
| --- | --- | 
| Inline policy (basic) |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html)  | 

### Data engineer permissions
<a name="persona-engineer"></a>

**Important**  
In the following policies, replace *<account-id>* with a valid AWS account number, and replace *<workflow\_role>* with the name of the workflow role.


| Policy Type | Policy | 
| --- | --- | 
| AWS managed policy | AWSGlueConsoleFullAccess | 
| Inline policy (basic) |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html)  | 
| Inline policy (for operations on governed tables, including operations within transactions) |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html)  | 
| Inline policy (for metadata access control using the Lake Formation tag-based access control (LF-TBAC) method) |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html)  | 
| Inline policy (passrole policy for the workflow role) |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html)  | 

### Data analyst permissions
<a name="persona-user"></a>


| Policy Type | Policy | 
| --- | --- | 
| AWS managed policy | AmazonAthenaFullAccess | 
| Inline policy (basic) |  <pre>{<br />    "Version": "2012-10-17",<br />    "Statement": [<br />        {<br />            "Effect": "Allow",<br />            "Action": [<br />                "lakeformation:GetDataAccess",<br />                "glue:GetTable",<br />                "glue:GetTables",<br />                "glue:SearchTables",<br />                "glue:GetDatabase",<br />                "glue:GetDatabases",<br />                "glue:GetPartitions",<br />                "lakeformation:GetResourceLFTags",<br />                "lakeformation:ListLFTags",<br />                "lakeformation:GetLFTag",<br />                "lakeformation:SearchTablesByLFTags",<br />                "lakeformation:SearchDatabasesByLFTags"<br />            ],<br />            "Resource": "*"<br />        }<br />    ]<br />}</pre>  | 
| (Optional) Inline policy (for operations on governed tables, including operations within transactions) |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html)  | 

### Workflow role permissions
<a name="persona-workflow-role"></a>

This role has the permissions required to run a workflow. You specify a role with these permissions when you create a workflow.

**Important**  
In the following policies, replace *<region>* with a valid AWS Region identifier (for example `us-east-1`), *<account-id>* with a valid AWS account number, *<workflow\_role>* with the name of the workflow role, and *<your-s3-cloudtrail-bucket>* with the Amazon S3 path to your AWS CloudTrail logs.


| Policy Type | Policy | 
| --- | --- | 
| AWS managed policy | AWSGlueServiceRole  | 
| Inline policy (data access) |  <pre>{<br />    "Version": "2012-10-17",<br />    "Statement": [<br />        {<br />            "Sid": "Lakeformation",<br />            "Effect": "Allow",<br />            "Action": [<br />                "lakeformation:GetDataAccess",<br />                "lakeformation:GrantPermissions"<br />            ],<br />            "Resource": "*"<br />        }<br />    ]<br />}</pre>  | 
| Inline policy (passrole policy for the workflow role) |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html)  | 
| Inline policy (for ingesting data outside the data lake, for example, AWS CloudTrail logs) |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/permissions-reference.html)  | 

# Changing the default settings for your data lake
<a name="change-settings"></a>

To maintain backward compatibility with AWS Glue, AWS Lake Formation has the following initial security settings:
+ The `Super` permission is granted to the group `IAMAllowedPrincipals` on all existing AWS Glue Data Catalog resources.
+ "Use only IAM access control" settings are enabled for new Data Catalog resources.

These settings effectively cause access to Data Catalog resources and Amazon S3 locations to be controlled solely by AWS Identity and Access Management (IAM) policies. Individual Lake Formation permissions are not in effect.

The `IAMAllowedPrincipals` group includes any IAM users and roles that are allowed access to your Data Catalog resources by your IAM policies. The `Super` permission enables a principal to perform every supported Lake Formation operation on the database or table on which it is granted.

To change security settings so that access to Data Catalog resources (databases and tables) is managed by Lake Formation permissions, do the following:

1. Change the default security settings for new resources. For instructions, see [Change the default permission model or use hybrid access mode](initial-lf-config.md#setup-change-cat-settings).

1. Change the settings for existing Data Catalog resources. For instructions, see [Upgrading AWS Glue data permissions to the AWS Lake Formation model](upgrade-glue-lake-formation.md).

**Changing the default security settings using the Lake Formation `PutDataLakeSettings` API operation**  
You can also change default security settings by using the Lake Formation [PutDataLakeSettings](https://docs.aws.amazon.com/lake-formation/latest/APIReference/API_PutDataLakeSettings.html) API operation. This action takes as arguments an optional catalog ID and a [DataLakeSettings](https://docs.aws.amazon.com/lake-formation/latest/APIReference/API_DataLakeSettings.html) structure.

To enforce metadata and underlying data access control by Lake Formation on new databases and tables, code the `DataLakeSettings` structure as follows.

**Note**  
Replace *<AccountId>* with a valid AWS account ID and *<Username>* with a valid IAM user name. You can specify more than one user as a data lake administrator.

```
{
    "DataLakeSettings": {
        "DataLakeAdmins": [
            {
                "DataLakePrincipalIdentifier": "arn:aws:iam::<AccountId>:user/<Username>"
            }
        ],
        "CreateDatabaseDefaultPermissions": [],
        "CreateTableDefaultPermissions": []
    }
}
```

You can also code the structure as follows. Omitting the `CreateDatabaseDefaultPermissions` or `CreateTableDefaultPermissions` parameter is equivalent to passing an empty list.

```
{
    "DataLakeSettings": {
        "DataLakeAdmins": [
            {
                "DataLakePrincipalIdentifier": "arn:aws:iam::<AccountId>:user/<Username>"
            }
        ]
    }
}
```

This action effectively revokes all Lake Formation permissions from the `IAMAllowedPrincipals` group on new databases and tables. When you create a database, you can override this setting.
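If you script this call with the AWS SDK for Python (Boto3), it might look like the following sketch. This is an illustration under stated assumptions, not the definitive implementation: the admin ARN is a placeholder, and the Boto3 call is shown commented out because it requires configured credentials.

```python
# Sketch: build a DataLakeSettings structure that enforces Lake Formation
# permissions on new Data Catalog resources. The admin ARN is a placeholder.

def build_data_lake_settings(admin_arn):
    """Return DataLakeSettings that disable the default
    IAMAllowedPrincipals grants on new databases and tables."""
    return {
        "DataLakeAdmins": [{"DataLakePrincipalIdentifier": admin_arn}],
        "CreateDatabaseDefaultPermissions": [],  # no default grants on new databases
        "CreateTableDefaultPermissions": [],     # no default grants on new tables
    }

settings = build_data_lake_settings("arn:aws:iam::111122223333:user/datalake_admin")

# To apply the settings (requires an AWS session with appropriate permissions):
# import boto3
# boto3.client("lakeformation").put_data_lake_settings(DataLakeSettings=settings)
```

Passing empty lists here has the same effect as the first JSON example above: new databases and tables get no default `IAMAllowedPrincipals` grants.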

To enforce metadata and underlying data access control only by IAM on new databases and tables, code the `DataLakeSettings` structure as follows.

```
{
    "DataLakeSettings": {
        "DataLakeAdmins": [
            {
                "DataLakePrincipalIdentifier": "arn:aws:iam::<AccountId>:user/<Username>"
            }
        ],
        "CreateDatabaseDefaultPermissions": [
            {
                "Principal": {
                    "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                },
                "Permissions": [
                    "ALL"
                ]
            }
        ],
        "CreateTableDefaultPermissions": [
            {
                "Principal": {
                    "DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"
                },
                "Permissions": [
                    "ALL"
                ]
            }
        ]
    }
}
```

This grants the `Super` Lake Formation permission to the `IAMAllowedPrincipals` group on new databases and tables. When you create a database, you can override this setting.

**Note**  
In the preceding `DataLakeSettings` structure, the only permitted value for `DataLakePrincipalIdentifier` is `IAM_ALLOWED_PRINCIPALS`, and the only permitted value for `Permissions` is `ALL`.

# Implicit Lake Formation permissions
<a name="implicit-permissions"></a>

AWS Lake Formation grants the following implicit permissions to data lake administrators, database creators, and table creators.

**Data lake administrators**  
+ Have `Describe` access to all resources in the Data Catalog except for resources shared from another account directly to a different principal. This access cannot be revoked from an administrator.
+ Have data location permissions everywhere in the data lake.
+ Can grant or revoke access to any resources in the Data Catalog to any principal (including self). This access cannot be revoked from an administrator.
+ Can create databases in the Data Catalog.
+ Can grant the permission to create a database to another user.
Data lake administrators can register Amazon S3 locations only if they have IAM permissions to do so. The suggested data lake administrator policies in this guide grant those permissions. Also, data lake administrators do not have implicit permissions to drop databases or alter/drop tables created by others. However, they can grant themselves permissions to do so.
For more information about data lake administrators, see [Create a data lake administrator](initial-lf-config.md#create-data-lake-admin).
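As an illustration of how an administrator can grant themselves permissions they do not hold implicitly, the request body for a `GrantPermissions` call granting `DROP` on a table might be built as follows. This is a hedged sketch: the account ID, database, and table names are placeholders, and the Boto3 call is commented out because it requires credentials.

```python
# Sketch: a data lake administrator granting themselves DROP on a table
# created by another user. All identifiers below are placeholders.

def build_grant_request(principal_arn, database, table):
    """Build the request body for the Lake Formation GrantPermissions API."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["DROP"],
    }

request = build_grant_request(
    "arn:aws:iam::111122223333:user/datalake_admin", "retail", "inventory"
)

# To issue the grant (requires configured credentials):
# import boto3
# boto3.client("lakeformation").grant_permissions(**request)
```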

**Catalog creators**  
+ Have all catalog permissions on catalogs that they create, have permissions on databases and tables that they create in the catalog, and can grant other principals in the same AWS account permission to create databases and tables in the catalog. A catalog creator who also has the `AWSLakeFormationCrossAccountManager` AWS managed policy can grant permissions on the catalog to other AWS accounts or organizations.

  Data lake administrators can use the Lake Formation console or API to designate catalog creators.
**Note**  
Catalog creators do not implicitly have permissions on databases and tables that others create in the catalog.
For more information on creating catalogs, see [Bringing your data into the AWS Glue Data Catalog](bring-your-data-overview.md).

**Database creators**  
+ Have all database permissions on databases that they create, have permissions on tables that they create in the database, and can grant other principals in the same AWS account permission to create tables in the database. A database creator who also has the `AWSLakeFormationCrossAccountManager` AWS managed policy can grant permissions on the database to other AWS accounts or organizations.

  Data lake administrators can use the Lake Formation console or API to designate database creators.
**Note**  
Database creators do not implicitly have permissions on tables that others create in the database.
For more information, see [Creating a database](creating-database.md).

**Table creators**  
+ Have all permissions on tables that they create.
+ Can grant permissions on all tables that they create to principals in the same AWS account.
+ Can grant permissions on all tables that they create to other AWS accounts or organizations if they have the `AWSLakeFormationCrossAccountManager` AWS managed policy.
+ Can view the databases that contain the tables that they create.

# Lake Formation permissions reference
<a name="lf-permissions-reference"></a>

To perform AWS Lake Formation operations, principals need both Lake Formation permissions and AWS Identity and Access Management (IAM) permissions. You typically grant IAM permissions using *coarse-grained* access control policies, as described in [Overview of Lake Formation permissions](lf-permissions-overview.md). You can grant Lake Formation permissions by using the console, the API, or the AWS Command Line Interface (AWS CLI). 

To learn how to grant or revoke Lake Formation permissions, see [Granting permissions on Data Catalog resources](granting-catalog-permissions.md) and [Granting data location permissions](granting-location-permissions.md).

**Note**  
The examples in this section show how to grant permissions to principals in the same AWS account. For examples of cross-account grants, see [Cross-account data sharing in Lake Formation](cross-account-permissions.md). 

## Lake Formation permissions per resource type
<a name="lf-resource-permissions-summary"></a>

Following are the valid Lake Formation permissions available for each type of resource:

[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/lf-permissions-reference.html)

**Topics**
+ [Lake Formation permissions per resource type](#lf-resource-permissions-summary)
+ [Lake Formation grant and revoke AWS CLI commands](#perm-command-format)
+ [Lake Formation permissions](#lf-permissions)

## Lake Formation grant and revoke AWS CLI commands
<a name="perm-command-format"></a>

Each permission description in this section includes examples of granting the permission using an AWS CLI command. The following are the synopses of the Lake Formation **grant-permissions** and **revoke-permissions** AWS CLI commands.

```
grant-permissions
[--catalog-id <value>]
--principal <value>
--resource <value>
--permissions <value>
[--permissions-with-grant-option <value>]
[--cli-input-json <value>]
[--generate-cli-skeleton <value>]
```

```
revoke-permissions
[--catalog-id <value>]
--principal <value>
--resource <value>
--permissions <value>
[--permissions-with-grant-option <value>]
[--cli-input-json <value>]
[--generate-cli-skeleton <value>]
```

For detailed descriptions of these commands, see [grant-permissions](https://docs.aws.amazon.com/cli/latest/reference/lakeformation/grant-permissions.html) and [revoke-permissions](https://docs.aws.amazon.com/cli/latest/reference/lakeformation/revoke-permissions.html) in the *AWS CLI Command Reference*. This section provides additional information on the `--principal` option.

The value of the `--principal` option is one of the following:
+ Amazon Resource Name (ARN) for an AWS Identity and Access Management (IAM) user or role
+ ARN for a user or group that authenticates through a SAML provider, such as Microsoft Active Directory Federation Service (AD FS)
+ ARN for an Amazon QuickSight user or group
+ For cross-account permissions, an AWS account ID, an organization ID, or an organizational unit ID
+ For an IAM Identity Center user or group, the user or group ARN from IAM Identity Center

The following are syntax and examples for all `--principal` types.

**Principal is an IAM user**  
Syntax:  

```
--principal DataLakePrincipalIdentifier=arn:aws:iam::<account-id>:user/<user-name>
```
Example:  

```
--principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1
```

**Principal is an IAM role**  
Syntax:  

```
--principal DataLakePrincipalIdentifier=arn:aws:iam::<account-id>:role/<role-name>
```
Example:  

```
--principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:role/workflowrole
```

**Principal is a user authenticating through a SAML provider**  
Syntax:  

```
--principal DataLakePrincipalIdentifier=arn:aws:iam::<account-id>:saml-provider/<SAMLproviderName>:user/<user-name>
```
Examples:  

```
--principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:saml-provider/idp1:user/datalake_user1
```

```
--principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:saml-provider/AthenaLakeFormationOkta:user/athena-user@example.com
```

**Principal is a group authenticating through a SAML provider**  
Syntax:  

```
--principal DataLakePrincipalIdentifier=arn:aws:iam::<account-id>:saml-provider/<SAMLproviderName>:group/<group-name> 
```
Examples:  

```
--principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:saml-provider/idp1:group/data-scientists
```

```
--principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:saml-provider/AthenaLakeFormationOkta:group/my-group
```

**Principal is an Amazon QuickSight Enterprise Edition user**  
Syntax:  

```
--principal DataLakePrincipalIdentifier=arn:aws:quicksight:<region>:<account-id>:user/<namespace>/<user-name>
```
For *<namespace>*, you must specify `default`.
Example:  

```
--principal DataLakePrincipalIdentifier=arn:aws:quicksight:us-east-1:111122223333:user/default/bi_user1
```

**Principal is an Amazon QuickSight Enterprise Edition group**  
Syntax:  

```
--principal DataLakePrincipalIdentifier=arn:aws:quicksight:<region>:<account-id>:group/<namespace>/<group-name> 
```
For *<namespace>*, you must specify `default`.
Example:  

```
--principal DataLakePrincipalIdentifier=arn:aws:quicksight:us-east-1:111122223333:group/default/data_scientists
```

**Principal is an AWS account**  
Syntax:  

```
--principal DataLakePrincipalIdentifier=<account-id>
```
Example:  

```
--principal DataLakePrincipalIdentifier=111122223333
```

**Principal is an organization**  
Syntax:  

```
--principal DataLakePrincipalIdentifier=arn:aws:organizations::<account-id>:organization/<organization-id>
```
Example:  

```
--principal DataLakePrincipalIdentifier=arn:aws:organizations::111122223333:organization/o-abcdefghijkl
```

**Principal is an organizational unit**  
Syntax:  

```
--principal DataLakePrincipalIdentifier=arn:aws:organizations::<account-id>:ou/<organization-id>/<organizational-unit-id>
```
Example:  

```
--principal DataLakePrincipalIdentifier=arn:aws:organizations::111122223333:ou/o-abcdefghijkl/ou-ab00-cdefghij
```

**Principal is an IAM Identity Center user or group**  
Example (user):  

```
--principal DataLakePrincipalIdentifier=arn:aws:identitystore:::user/<UserID>
```
Example (group):  

```
--principal DataLakePrincipalIdentifier=arn:aws:identitystore:::group/<GroupID>
```

**Principal is an IAM group - `IAMAllowedPrincipals`**  
By default, Lake Formation grants the `Super` permission on all databases and tables in the Data Catalog to a group called `IAMAllowedPrincipals`. If this group permission exists on a database or a table, all principals in your account can access the resource through the IAM principal policies for AWS Glue. This provides backward compatibility when you start using Lake Formation permissions to secure Data Catalog resources that were previously protected by IAM policies for AWS Glue.  
When you use Lake Formation to manage permissions for your Data Catalog resources, you must first revoke the `IAMAllowedPrincipals` permission on the resources, or opt the principals and the resources in to hybrid access mode, for Lake Formation permissions to take effect.   
Example:  

```
--principal DataLakePrincipalIdentifier=IAM_ALLOWED_PRINCIPALS
```

**Principal is an IAM group - `ALLIAMPrincipals`**  
When you grant permissions to the `ALLIAMPrincipals` group on a Data Catalog resource, every principal in the account gets access to the resource through both Lake Formation permissions and IAM permissions.  
Example:  

```
--principal DataLakePrincipalIdentifier=123456789012:IAMPrincipals
```
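The `DataLakePrincipalIdentifier` formats above can be collected into a few small helpers. The function names are hypothetical conveniences; the string formats follow the syntax shown in this section.

```python
# Sketch: helpers that format DataLakePrincipalIdentifier values using the
# syntax shown above. Function names are hypothetical, not part of any SDK.

def iam_user(account_id, user_name):
    """ARN for an IAM user."""
    return f"arn:aws:iam::{account_id}:user/{user_name}"

def iam_role(account_id, role_name):
    """ARN for an IAM role."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"

def saml_group(account_id, provider_name, group_name):
    """ARN for a group that authenticates through a SAML provider."""
    return f"arn:aws:iam::{account_id}:saml-provider/{provider_name}:group/{group_name}"

def org_unit(account_id, org_id, ou_id):
    """ARN for an organizational unit (cross-account grants)."""
    return f"arn:aws:organizations::{account_id}:ou/{org_id}/{ou_id}"

print(iam_user("111122223333", "datalake_user1"))
# arn:aws:iam::111122223333:user/datalake_user1
```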

## Lake Formation permissions
<a name="lf-permissions"></a>

This section contains the available Lake Formation permissions that you can grant to principals.

### `ALTER`
<a name="perm-alter"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| ALTER | DATABASE | glue:UpdateDatabase  | 
| ALTER | TABLE | glue:UpdateTable | 
| ALTER | LF-Tag | lakeformation:UpdateLFTag | 

A principal with this permission can alter metadata for a database or table in the Data Catalog. For tables, you can change the column schema and add column parameters. You cannot alter columns in the underlying data that a metadata table points to.

If the property that is being altered is a registered Amazon Simple Storage Service (Amazon S3) location, the principal must have data location permissions on the new location.

**Example**  
The following example grants the `ALTER` permission to user `datalake_user1` on the database `retail` in AWS account 1111-2222-3333.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "ALTER" --resource '{ "Database": {"Name":"retail"}}'
```

**Example**  
The following example grants `ALTER` to user `datalake_user1` on the table `inventory` in the database `retail`.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "ALTER" --resource '{ "Table": {"DatabaseName":"retail", "Name":"inventory"}}'
```

### `CREATE_DATABASE`
<a name="perm-create-database"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| CREATE\_DATABASE | Data Catalog | glue:CreateDatabase | 

A principal with this permission can create a metadata database or resource link in the Data Catalog. The principal can also create tables in the database.

**Example**  
The following example grants `CREATE_DATABASE` to user `datalake_user1` in AWS account 1111-2222-3333.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "CREATE_DATABASE" --resource '{ "Catalog": {}}'
```

When a principal creates a database in the Data Catalog, no permissions to underlying data are granted. The following additional metadata permissions are granted (along with the ability to grant these permissions to others):
+ `CREATE_TABLE` in the database
+ `ALTER` database
+ `DROP` database

When creating a database, the principal can optionally specify an Amazon S3 location. Depending on whether the principal has data location permissions, the `CREATE_DATABASE` permission might not be sufficient to create databases in all cases. It is important to keep the following three cases in mind.


| Create database use case | Permissions needed | 
| --- | --- | 
| The location property is unspecified. | CREATE_DATABASE is sufficient. | 
| The location property is specified, and the location is not managed by Lake Formation (is not registered). | CREATE_DATABASE is sufficient. | 
| The location property is specified, and the location is managed by Lake Formation (is registered). | CREATE_DATABASE is required plus data location permissions on the specified location. | 
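These three cases can be summarized in a small decision helper. The following Python sketch is illustrative only; the function name and signature are hypothetical and not part of any AWS SDK:

```python
from typing import Optional

def permissions_to_create_database(location: Optional[str],
                                   is_registered: bool) -> set:
    """Lake Formation permissions needed to create a database (sketch)."""
    if location is None or not is_registered:
        # No location property, or the location is not managed by
        # Lake Formation: CREATE_DATABASE alone is sufficient.
        return {"CREATE_DATABASE"}
    # The location is registered: data location permissions are also required.
    return {"CREATE_DATABASE", "DATA_LOCATION_ACCESS"}
```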

### `CREATE_TABLE`
<a name="perm-create-table"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| CREATE_TABLE | DATABASE | glue:CreateTable  | 

A principal with this permission can create a metadata table or resource link in the Data Catalog within the specified database.

**Example**  
The following example grants the user `datalake_user1` permission to create tables in the `retail` database in AWS account 1111-2222-3333.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "CREATE_TABLE" --resource '{ "Database": {"Name":"retail"}}'
```

When a principal creates a table in the Data Catalog, all Lake Formation permissions on the table are granted to the principal, with the ability to grant these permissions to others.

**Cross-account Grants**  
If a database owner account grants `CREATE_TABLE` to a recipient account, and a user in the recipient account successfully creates a table in the owner account's database, the following rules apply:
+ The user and data lake administrators in the recipient account have all Lake Formation permissions on the table. They can grant permissions on the table to other principals in their account. They can't grant permissions to principals in the owner account or any other accounts.
+ Data lake administrators in the owner account can grant permissions on the table to other principals in their account.

**Data Location Permissions**  
When you attempt to create a table that points to an Amazon S3 location, depending on whether you have data location permissions, the `CREATE_TABLE` permission might not be sufficient to create a table. It's important to keep the following three cases in mind.


| Create table use case | Permissions needed | 
| --- | --- | 
| The specified location is not managed by Lake Formation (is not registered). | CREATE_TABLE is sufficient. | 
| The specified location is managed by Lake Formation (is registered), and the containing database has no location property or has a location property that is not an Amazon S3 prefix of the table location. | CREATE_TABLE is required plus data location permissions on the specified location. | 
| The specified location is managed by Lake Formation (is registered), and the containing database has a location property that points to a location that is registered and is an Amazon S3 prefix of the table location. | CREATE_TABLE is sufficient. | 
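The prefix rule in these cases can be sketched as follows. This Python helper is hypothetical (not an AWS API) and only illustrates when data location permissions are required in addition to `CREATE_TABLE`:

```python
from typing import Optional

def is_s3_prefix(prefix: str, location: str) -> bool:
    # s3://products is a prefix of s3://products/retail, but not of s3://products2.
    p = prefix.rstrip("/")
    return location == p or location.startswith(p + "/")

def permissions_to_create_table(table_location: str,
                                table_location_registered: bool,
                                database_location: Optional[str]) -> set:
    """Lake Formation permissions needed to create a table (sketch)."""
    if not table_location_registered:
        # Location is not managed by Lake Formation.
        return {"CREATE_TABLE"}
    if database_location and is_s3_prefix(database_location, table_location):
        # The registered database location covers the table location.
        return {"CREATE_TABLE"}
    return {"CREATE_TABLE", "DATA_LOCATION_ACCESS"}
```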

### `DATA_LOCATION_ACCESS`
<a name="perm-location"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| DATA_LOCATION_ACCESS | Amazon S3 location | (Amazon S3 permissions on the location, which must be specified by the role used to register the location.) | 

This is the only data location permission. A principal with this permission can create a metadata database or table that points to the specified Amazon S3 location. The location must be registered. A principal who has data location permissions on a location also has location permissions on child locations.

**Example**  
The following example grants data location permissions on `s3://products/retail` to user `datalake_user1` in AWS account 1111-2222-3333.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "DATA_LOCATION_ACCESS" --resource '{ "DataLocation": {"ResourceArn":"arn:aws:s3:::products/retail"}}'
```

`DATA_LOCATION_ACCESS` is not needed to query or update underlying data. This permission applies only to creating Data Catalog resources.

For more information about data location permissions, see [Underlying data access control](access-control-underlying-data.md#data-location-permissions).

### `DELETE`
<a name="perm-delete"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| DELETE | TABLE | (No additional IAM permissions are needed if the location is registered.) | 

A principal with this permission can delete underlying data at the Amazon S3 location specified by the table. The principal can also view the table in the Lake Formation console and retrieve information about the table with the AWS Glue API.

**Example**  
The following example grants the `DELETE` permission to the user `datalake_user1` on the table `inventory` in the database `retail` in AWS account 1111-2222-3333.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "DELETE" --resource '{ "Table": {"DatabaseName":"retail", "Name":"inventory"}}'
```

This permission applies only to data in Amazon S3, and not to data in other data stores such as Amazon Relational Database Service (Amazon RDS).

### `DESCRIBE`
<a name="perm-describe"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| DESCRIBE |  Table resource link Database resource link  |  `glue:GetTable` `glue:GetDatabase`  | 
| DESCRIBE | DATABASE | glue:GetDatabase | 
| DESCRIBE | TABLE | glue:GetTable | 
| DESCRIBE | LF-Tag |  `glue:GetTable` `glue:GetDatabase` `lakeformation:GetResourceLFTags` `lakeformation:ListLFTags` `lakeformation:GetLFTag` `lakeformation:SearchTablesByLFTags` `lakeformation:SearchDatabasesByLFTags`  | 

A principal with this permission can view the specified database, table, or resource link. No other Data Catalog permissions are implicitly granted, and no data access permissions are implicitly granted. Databases and tables appear in the query editors of integrated services, but no queries can be made against them unless other Lake Formation permissions (for example, `SELECT`) are granted.

For example, a user who has `DESCRIBE` on a database can see the database and all database metadata (description, location, and so on). However, the user can't find out which tables the database contains, and can't drop, alter, or create tables in the database. Similarly, a user who has `DESCRIBE` on a table can see the table and table metadata (description, schema, location, and so on), but can't drop, alter, or run queries against the table.

The following are some additional rules for `DESCRIBE`:
+ If a user has other Lake Formation permissions on a database, table, or resource link, `DESCRIBE` is implicitly granted.
+ If a user has `SELECT` on only a subset of columns for a table (partial `SELECT`), the user is restricted to seeing just those columns.
+ You can't grant `DESCRIBE` to a user who has partial `SELECT` on a table. Conversely, you can't specify column inclusion or exclusion lists for tables that `DESCRIBE` is granted on.

**Example**  
The following example grants the `DESCRIBE` permission to the user `datalake_user1` on the table resource link `inventory-link` in the database `retail` in AWS account 1111-2222-3333.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "DESCRIBE" --resource '{ "Table": {"DatabaseName":"retail", "Name":"inventory-link"}}'
```

### `DROP`
<a name="perm-drop"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| DROP | DATABASE | glue:DeleteDatabase | 
| DROP | TABLE | glue:DeleteTable  | 
| DROP | LF-Tag | lakeformation:DeleteLFTag  | 
| DROP |  Database resource link Table resource link  | `glue:DeleteDatabase` `glue:DeleteTable`  | 

A principal with this permission can drop a database, table, or resource link in the Data Catalog. You can't grant `DROP` on a database to an external account or organization.

**Warning**  
Dropping a database drops all tables in the database.

**Example**  
The following example grants the `DROP` permission to the user `datalake_user1` on the database `retail` in AWS account 1111-2222-3333.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "DROP" --resource '{ "Database": {"Name":"retail"}}'
```

**Example**  
The following example grants `DROP` to the user `datalake_user1` on the table `inventory` in the database `retail`.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "DROP" --resource '{ "Table": {"DatabaseName":"retail", "Name":"inventory"}}'
```

**Example**  
The following example grants `DROP` to the user `datalake_user1` on the table resource link `inventory-link` in the database `retail`.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "DROP" --resource '{ "Table": {"DatabaseName":"retail", "Name":"inventory-link"}}'
```

### `INSERT`
<a name="perm-insert"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| INSERT | TABLE | (No additional IAM permissions are needed if the location is registered.) | 

A principal with this permission can insert, update, and read underlying data at the Amazon S3 location specified by the table. The principal can also view the table in the Lake Formation console and retrieve information about the table with the AWS Glue API.

**Example**  
The following example grants the `INSERT` permission to the user `datalake_user1` on the table `inventory` in the database `retail` in AWS account 1111-2222-3333.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "INSERT" --resource '{ "Table": {"DatabaseName":"retail", "Name":"inventory"}}'
```

This permission applies only to data in Amazon S3, and not to data in other data stores such as Amazon RDS.

### `SELECT`
<a name="perm-select"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| SELECT | TABLE | (No additional IAM permissions are needed if the location is registered.) | 

A principal with this permission can view a table in the Data Catalog, and can query the underlying data in Amazon S3 at the location specified by the table. The principal can view the table in the Lake Formation console and retrieve information about the table with the AWS Glue API. If column filtering was applied when this permission was granted, the principal can view the metadata only for the included columns and can query data only from the included columns.

**Note**  
It is the responsibility of the integrated analytics service to apply the column filtering when processing a query.

**Example**  
The following example grants the `SELECT` permission to the user `datalake_user1` on the table `inventory` in the database `retail` in AWS account 1111-2222-3333.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "SELECT" --resource '{ "Table": {"DatabaseName":"retail", "Name":"inventory"}}'
```

This permission applies only to data in Amazon S3, and not to data in other data stores such as Amazon RDS.

You can filter (restrict the access to) specific columns with an optional inclusion list or an exclusion list. An inclusion list specifies the columns that can be accessed. An exclusion list specifies the columns that can't be accessed. In the absence of an inclusion or exclusion list, all table columns are accessible.

The results of `glue:GetTable` return only the columns that the caller has permission to view. Integrated services such as Amazon Athena and Amazon Redshift honor column inclusion and exclusion lists.
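The effect of the two list types can be sketched with a small filter. This Python helper is hypothetical and only mirrors the filtering behavior described above:

```python
from typing import Optional, Sequence

def visible_columns(all_columns: Sequence[str],
                    include: Optional[Sequence[str]] = None,
                    exclude: Optional[Sequence[str]] = None):
    """Columns a principal can see under an inclusion or exclusion list (sketch)."""
    if include is not None:
        # Inclusion list: only the listed columns are accessible.
        allowed = set(include)
        return [c for c in all_columns if c in allowed]
    if exclude is not None:
        # Exclusion list: everything except the listed columns is accessible.
        denied = set(exclude)
        return [c for c in all_columns if c not in denied]
    # No list: all table columns are accessible.
    return list(all_columns)
```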

**Example**  
The following example grants `SELECT` to the user `datalake_user1` on the table `inventory` using an inclusion list.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "SELECT"  --resource '{ "TableWithColumns": {"DatabaseName":"retail", "Name":"inventory", "ColumnNames": ["prodcode","location","period","withdrawals"]}}'
```

**Example**  
This next example grants `SELECT` on the `inventory` table using an exclusion list.  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "SELECT"  --resource '{ "TableWithColumns": {"DatabaseName":"retail", "Name":"inventory", "ColumnWildcard": {"ExcludedColumnNames": ["intkey", "prodcode"]}}}'
```

The following restrictions apply to the `SELECT` permission:
+ When granting `SELECT`, you can't include the grant option if column filtering is applied.
+ You cannot restrict access control on columns that are partition keys.
+ A principal with the `SELECT` permission on a subset of columns in a table cannot be granted the `ALTER`, `DROP`, `DELETE`, or `INSERT` permission on that table. Similarly, a principal with the `ALTER`, `DROP`, `DELETE`, or `INSERT` permission on a table cannot be granted the `SELECT` permission with column filtering.

The `SELECT` permission always appears on the **Data permissions** page of the Lake Formation console as a separate row. The following image shows that `SELECT` is granted to the users `datalake_user2` and `datalake_user3` on all columns in the `inventory` table.

![\[The Data permissions page shows four rows. The first and third rows list the Delete and Insert permissions with resource type Table with the resource shown as inventory, and the second and fourth rows list the Select permission with resource type Column, and with the resource shown as retail.inventory.*.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/data-permissions-dialog-select-cross.png)


### `Super`
<a name="perm-super"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| Super | DATABASE | glue:\*Database\*  | 
| Super | TABLE | glue:\*Table\*, glue:\*Partition\* | 

This permission allows a principal to perform every supported Lake Formation operation on the database or table. You can't grant `Super` on a database to an external account.

This permission can coexist with the other Lake Formation permissions. For example, you can grant the `Super`, `SELECT`, and `INSERT` permissions on a metadata table. The principal can then perform all supported operations on the table. When you revoke `Super`, the `SELECT` and `INSERT` permissions remain, and the principal can perform only select and insert operations.

Instead of granting `Super` to an individual principal, you can grant it to the group `IAMAllowedPrincipals`. The `IAMAllowedPrincipals` group is automatically created and includes all IAM users and roles that are permitted access to your Data Catalog resources by your IAM policies. When `Super` is granted to `IAMAllowedPrincipals` for a Data Catalog resource, access to the resource is effectively controlled solely by IAM policies.
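In the Lake Formation API and AWS CLI, `Super` is represented by the permission value `ALL`, and the group is addressed with the principal identifier `IAM_ALLOWED_PRINCIPALS`. The following Python sketch builds the corresponding `grant-permissions` request payload; the database name `retail` is an illustrative example:

```python
import json

# grant-permissions request payload granting Super (ALL in the API) on the
# example database "retail" to the IAMAllowedPrincipals group.
request = {
    "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
    "Permissions": ["ALL"],
    "Resource": {"Database": {"Name": "retail"}},
}
print(json.dumps(request, indent=2))
```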

You can have the `Super` permission automatically granted to `IAMAllowedPrincipals` for new Data Catalog resources by using the options on the **Settings** page of the Lake Formation console.

![\[The Data catalog settings dialog box has the subtitle "Default permissions for newly created databases and tables," and has two check boxes, which are described in the text.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/settings-page.png)

+ To grant `Super` to `IAMAllowedPrincipals` for all new databases, select **Use only IAM access control for new databases**.
+ To grant `Super` to `IAMAllowedPrincipals` for all new tables in new databases, select **Use only IAM access control for new tables in new databases**.
**Note**  
This option causes the check box **Use only IAM access control for new tables in this database** in the **Create database** dialog box to be selected by default. It does nothing more than that. It is the check box in the **Create database** dialog box that enables the grant of `Super` to `IAMAllowedPrincipals`.

These **Settings** page options are enabled by default. For more information, see the following:
+ [Changing the default settings for your data lake](change-settings.md)
+ [Upgrading AWS Glue data permissions to the AWS Lake Formation model](upgrade-glue-lake-formation.md)

### `SUPER_USER`
<a name="perm-super-user"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| Super user | Catalog | glue:GetCatalog  | 

You can grant the `Super user` permission only to specific principals on catalogs within the default Data Catalog. You can't grant it on the default catalog itself, on other resource types such as databases and tables, or to principals in external accounts. The `Super user` permission allows a principal to perform every supported Lake Formation operation on the databases and tables within the granted catalog. 

With the `Super user` permission, the principal (grantee) is able to perform the following actions on the resources (catalogs, databases, and tables) within the catalog:
+ `CREATE_DATABASE`, `DESCRIBE` permissions on the catalog.
+ `DROP`, `ALTER`, `CREATE_TABLE`, `DESCRIBE` (effectively `SUPER`) permissions on all databases within the catalog.
+ `DROP`, `ALTER`, `DESCRIBE`, `SELECT`, `INSERT`, `DELETE` (effectively `SUPER`) permissions on all tables within all databases within the catalog.
+ `All` (effectively SUPER) permissions on catalogs within the catalog.
+ Grantable (the ability to grant these permissions to other principals) permissions on all catalogs, databases, and tables within the catalog.

With the `Super user` permission on a catalog resource, the grantee is not allowed to perform or delegate `ALTER` and `DROP` actions on the catalog.

### `ASSOCIATE`
<a name="perm-associate"></a>


| Permission | Granted on this resource | Grantee also needs | 
| --- | --- | --- | 
| ASSOCIATE | LF-Tag |   `glue:GetDatabase` `glue:GetTable`  `lakeformation:AddLFTagsToResource` `lakeformation:RemoveLFTagsFromResource` `lakeformation:GetResourceLFTags` `lakeformation:ListLFTags` `lakeformation:GetLFTag` `lakeformation:SearchTablesByLFTags` `lakeformation:SearchDatabasesByLFTags`  | 

A principal with this permission on a LF-Tag can assign the LF-Tag to a Data Catalog resource. Granting `ASSOCIATE` implicitly grants `DESCRIBE`.

**Example**  
This example grants to user `datalake_user1` the `ASSOCIATE` permission on the LF-Tag with the key `module`. It grants permissions to view and assign all values for that key, as indicated by the asterisk (`*`).  

```
aws lakeformation grant-permissions --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:user/datalake_user1 --permissions "ASSOCIATE" --resource '{ "LFTag": {"CatalogId":"111122223333","TagKey":"module","TagValues":["*"]}}'
```

# Integrating IAM Identity Center
<a name="identity-center-integration"></a>

With AWS IAM Identity Center, you can connect to identity providers (IdPs) and centrally manage access for users and groups across AWS analytics services. You can integrate identity providers such as Okta, Ping, and Microsoft Entra ID (formerly Azure Active Directory) with IAM Identity Center for users in your organization to access data using a single-sign on experience. IAM Identity Center also supports connecting additional third-party identity providers.

For more information, see [Supported identity providers](https://docs.aws.amazon.com/singlesignon/latest/userguide/supported-idps.html) in the AWS IAM Identity Center User Guide.

You can configure AWS Lake Formation as an enabled application in IAM Identity Center, and data lake administrators can grant fine-grained permissions to authorized users and groups on AWS Glue Data Catalog resources. 

Users from your organization can sign in to any IAM Identity Center enabled application using your organization's identity provider and query datasets with Lake Formation permissions applied. With this integration, you can manage access to AWS services without creating multiple IAM roles.

[Trusted identity propagation](https://docs.aws.amazon.com/singlesignon/latest/userguide/trustedidentitypropagation-overview.html) is an AWS IAM Identity Center feature that administrators of connected AWS services can use to grant and audit access to service data. Access to this data is based on user attributes such as group associations. Setting up trusted identity propagation requires collaboration between the administrators of connected AWS services and the IAM Identity Center administrators. For more information, see [Prerequisites and considerations](https://docs.aws.amazon.com/singlesignon/latest/userguide/trustedidentitypropagation-overall-prerequisites.html).

For limitations, see [IAM Identity Center integration limitations](identity-center-lf-notes.md).

**Topics**
+ [Prerequisites for IAM Identity Center integration with Lake Formation](prerequisites-identity-center.md)
+ [Connecting Lake Formation with IAM Identity Center](connect-lf-identity-center.md)
+ [Updating IAM Identity Center integration](update-lf-identity-center-connection.md)
+ [Deleting a Lake Formation connection with IAM Identity Center](delete-lf-identity-center-connection.md)
+ [Granting permissions to users and groups](grant-permissions-sso.md)
+ [Including IAM Identity Center user context in CloudTrail logs](identity-center-ct-logs.md)

# Prerequisites for IAM Identity Center integration with Lake Formation
<a name="prerequisites-identity-center"></a>

 The following are the prerequisites for integrating IAM Identity Center with Lake Formation. 

1. Enable IAM Identity Center – Enabling IAM Identity Center is a prerequisite to support authentication and identity propagation.

1. Choose your identity source – After you enable IAM Identity Center, you must have an identity provider to manage users and groups. You can either use the built-in Identity Center directory as an identity source or use an external IdP, such as Microsoft Entra ID or Okta. 

    For more information, see [Manage your identity source](https://docs.aws.amazon.com/singlesignon/latest/userguide/manage-your-identity-source.html) and [Connect to an external identity provider](https://docs.aws.amazon.com/singlesignon/latest/userguide/manage-your-identity-source-idp.html) in the AWS IAM Identity Center User Guide. 

1. Create an IAM role – The role that creates the IAM Identity Center connection requires permissions to create and modify the application configuration in Lake Formation and IAM Identity Center, as shown in the following inline policy. 

   Add permissions in line with IAM best practices; the specific permissions required are detailed in the procedures that follow. For more information, see [Getting started with IAM Identity Center](https://docs.aws.amazon.com/singlesignon/latest/userguide/get-started-enable-identity-center.html).

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect": "Allow",
               "Action": [
                   "lakeformation:CreateLakeFormationIdentityCenterConfiguration",
                   "sso:CreateApplication",
                   "sso:PutApplicationAssignmentConfiguration",
                   "sso:PutApplicationAuthenticationMethod",
                   "sso:PutApplicationGrant",
                   "sso:PutApplicationAccessScope"
               ],
               "Resource": [
                   "*"
               ]
           }
       ]
   }
   ```

------

    If you're sharing Data Catalog resources with external AWS accounts or organizations, you must have the AWS Resource Access Manager (AWS RAM) permissions for creating resource shares. For more information about the permissions required to share resources, see [Cross-account data sharing prerequisites](cross-account-prereqs.md). 

The following inline policies contain specific permissions required to view, update, and delete properties of Lake Formation integration with IAM Identity Center.
+ Use the following inline policy to allow an IAM role to view a Lake Formation integration with IAM Identity Center.

------
#### [ JSON ]

****  

  ```
  {
      "Version":"2012-10-17",		 	 	 
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "lakeformation:DescribeLakeFormationIdentityCenterConfiguration",
                  "sso:DescribeApplication"
              ],
              "Resource": [
                  "*"
              ]
          }
      ]
  }
  ```

------
+ Use the following inline policy to allow an IAM role to update a Lake Formation integration with IAM Identity Center. The policy also includes optional permissions required to share resources with external accounts.

------
#### [ JSON ]

****  

  ```
  {
      "Version":"2012-10-17",		 	 	 
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "lakeformation:UpdateLakeFormationIdentityCenterConfiguration",
                  "lakeformation:DescribeLakeFormationIdentityCenterConfiguration",
                  "sso:DescribeApplication",
                  "sso:UpdateApplication"
              ],
              "Resource": [
                  "*"
              ]
          }
      ]
  }
  ```

------
+ Use the following inline policy to allow an IAM role to delete a Lake Formation integration with IAM Identity Center.

------
#### [ JSON ]

****  

  ```
  {
      "Version":"2012-10-17",		 	 	 
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "lakeformation:DeleteLakeFormationIdentityCenterConfiguration",
                  "sso:DeleteApplication"
              ],
              "Resource": [
                  "*"
              ]
          }
      ]
  }
  ```

------
+ For IAM permissions required to grant or revoke data lake permissions for IAM Identity Center users and groups, see [IAM permissions required to grant or revoke Lake Formation permissions](required-permissions-for-grant.md). 

*Permissions description*
+ `lakeformation:CreateLakeFormationIdentityCenterConfiguration` – Creates the Lake Formation IdC configuration.
+ `lakeformation:DescribeLakeFormationIdentityCenterConfiguration` – Describes an existing IdC configuration.
+ `lakeformation:DeleteLakeFormationIdentityCenterConfiguration` – Gives the ability to delete an existing Lake Formation IdC configuration. 
+ `lakeformation:UpdateLakeFormationIdentityCenterConfiguration` – Used to change an existing Lake Formation configuration.
+ `sso:CreateApplication` – Used to create an IAM Identity Center application.
+ `sso:DeleteApplication` – Used to delete an IAM Identity Center application.
+ `sso:UpdateApplication` – Used to update an IAM Identity Center application.
+ `sso:PutApplicationGrant` – Used to change the trusted token issuer information.
+ `sso:PutApplicationAuthenticationMethod` – Grants Lake Formation authentication access.
+ `sso:GetApplicationGrant` – Used to list trusted token issuer information.
+ `sso:DeleteApplicationGrant` – Deletes the trusted token issuer information.
+ `sso:PutApplicationAccessScope` – Adds or updates the list of authorized targets for an IAM Identity Center access scope for an application.
+ `sso:PutApplicationAssignmentConfiguration` – Used to configure how users gain access to an application.

# Connecting Lake Formation with IAM Identity Center
<a name="connect-lf-identity-center"></a>

Before you can use IAM Identity Center to manage identities to grant access to Data Catalog resources using Lake Formation, you must complete the following steps. You can create the IAM Identity Center integration using the Lake Formation console or AWS CLI. 

------
#### [ AWS Management Console ]

**To connect Lake Formation with IAM Identity Center**

1. Sign in to the AWS Management Console, and open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. In the left navigation pane, select **IAM Identity Center integration**.   
![\[IAM Identity Center integration screen with Identity Center ARN.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/identity-center-integ.png)

1. (Optional) Enter one or more valid AWS account IDs, organization IDs, or organizational unit IDs to allow external accounts to access the Data Catalog resources. When IAM Identity Center users or groups try to access Lake Formation managed Data Catalog resources, Lake Formation assumes an IAM role to authorize metadata access. If the IAM role belongs to an external account that does not have an AWS Glue resource policy and an AWS RAM resource share, the IAM Identity Center users and groups can't access the resource even if they have Lake Formation permissions.

   Lake Formation uses the AWS Resource Access Manager (AWS RAM) service to share the resource with external accounts and organizations. AWS RAM sends an invitation to the grantee account to accept or reject the resource share. 

   For more information, see [Accepting a resource share invitation from AWS RAM](accepting-ram-invite.md).
**Note**  
Lake Formation permits IAM roles from external accounts to act as carrier roles on behalf of IAM Identity Center users and groups for accessing Data Catalog resources, but permissions can be granted only on Data Catalog resources within the owning account. If you try to grant permissions to IAM Identity Center users and groups on Data Catalog resources in an external account, Lake Formation returns the following error: "Cross-account grants are not supported for the principal." 

1. (Optional) On the **Create Lake Formation integration** screen, specify the ARNs of third-party applications that can access data in Amazon S3 locations registered with Lake Formation. Lake Formation vends scoped-down temporary credentials in the form of AWS STS tokens to registered Amazon S3 locations based on the effective permissions, so that authorized applications can access data on behalf of users.

1. (Optional) On the **Create Lake Formation integration** screen, select the **Amazon Redshift Connect** check box under **Trusted identity propagation** to enable Amazon Redshift federated permissions discovery through IAM Identity Center. Lake Formation propagates the user identity downstream based on the effective permissions, so that authorized applications can access data on behalf of users.

1. Select **Submit**.

   After the Lake Formation administrator finishes these steps and creates the integration, the IAM Identity Center properties appear in the Lake Formation console, and Lake Formation becomes an IAM Identity Center enabled application. The properties include the integration status, which reads `Success` when the IAM Identity Center configuration is complete. 

------
#### [ AWS CLI ]
+ The following example shows how to create a Lake Formation integration with IAM Identity Center. You can also specify the `Status` (`ENABLED` or `DISABLED`) of the applications. 

  ```
  aws lakeformation create-lake-formation-identity-center-configuration \
      --catalog-id <123456789012> \
      --instance-arn <arn:aws:sso:::instance/ssoins-112111f12ca1122p> \
      --share-recipients '[{"DataLakePrincipalIdentifier": "<123456789012>"},
                          {"DataLakePrincipalIdentifier": "<555555555555>"}]' \
      --external-filtering '{"AuthorizedTargets": ["<app arn1>", "<app arn2>"], "Status": "ENABLED"}'
  ```
+ The following example shows how to view a Lake Formation integration with IAM Identity Center.

  ```
  aws lakeformation describe-lake-formation-identity-center-configuration \
      --catalog-id <123456789012>
  ```
+ The following example shows how to enable `Redshift:Connect` authorization. Authorization can be `ENABLED` or `DISABLED`.

  ```
  aws lakeformation create-lake-formation-identity-center-configuration \
  --instance-arn <arn:aws:sso:::instance/ssoins-112111f12ca1122p> \
  --service-integrations '[{
    "Redshift": [{
      "RedshiftConnect": {
        "Authorization": "ENABLED"
      }
    }]
  }]'
  ```
+ Use the `describe-lake-formation-identity-center-configuration` command to describe the Lake Formation IAM Identity Center application. The `Redshift:Connect` service integration is required for cross-service and cross-cluster IAM Identity Center identity propagation:

  ```
  aws lakeformation describe-lake-formation-identity-center-configuration --catalog-id <123456789012>
  ```

  Response:

  ```
  {
      "CatalogId": "CATALOG ID",
      "InstanceArn": "INSTANCE ARN",
      "ApplicationArn": "APPLICATION ARN",
      "ShareRecipients": [],
      "ServiceIntegrations": [
          {
              "Redshift": [
                  {
                      "RedshiftConnect": {
                          "Authorization": "ENABLED"
                      }
                  }
              ]
          }
      ]
  }
  ```

------
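The `ServiceIntegrations` structure in the describe response is nested two levels deep. As an illustration only (this parsing helper is not part of any AWS SDK), a script could check whether `Redshift:Connect` authorization is enabled like this:

```python
import json

# Sample response shaped like the describe-lake-formation-identity-center-configuration
# output shown above (values are placeholders).
response = json.loads("""
{
    "CatalogId": "123456789012",
    "ServiceIntegrations": [
        {"Redshift": [{"RedshiftConnect": {"Authorization": "ENABLED"}}]}
    ]
}
""")

def redshift_connect_enabled(config: dict) -> bool:
    """Return True if any Redshift:Connect integration reports ENABLED."""
    for integration in config.get("ServiceIntegrations", []):
        for redshift in integration.get("Redshift", []):
            if redshift.get("RedshiftConnect", {}).get("Authorization") == "ENABLED":
                return True
    return False

print(redshift_connect_enabled(response))  # True for the sample above
```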

## Using IAM Identity Center across multiple AWS Regions
<a name="connect-lf-identity-center-multi-region"></a>

Lake Formation supports IAM Identity Center in multiple AWS Regions. You can extend IAM Identity Center from your primary AWS Region to additional Regions for improved performance through proximity to users and reliability. When a new Region is added in IAM Identity Center, you can create Lake Formation Identity Center applications in the new Region without replicating identities from the primary Region. For more details to get started with IAM Identity Center in multiple Regions, see [Multi-Region IAM Identity Center](https://docs.aws.amazon.com/singlesignon/latest/userguide/multi-region-iam-identity-center.html) in the *IAM Identity Center User Guide*.

# Updating IAM Identity Center integration
<a name="update-lf-identity-center-connection"></a>

After creating the connection, you can add third-party applications to the IAM Identity Center integration so that they integrate with Lake Formation and get access to Amazon S3 data on behalf of users. You can also remove existing applications from the integration. You can add or remove applications using the Lake Formation console, the AWS CLI, or the [UpdateLakeFormationIdentityCenterConfiguration](https://docs.aws.amazon.com/lake-formation/latest/APIReference/API_UpdateLakeFormationIdentityCenterConfiguration.html) operation. 

**Note**  
After creating IAM Identity Center integration, you can't update the instance `ARN`.

------
#### [ AWS Management Console ]

**To update an existing IAM Identity Center connection with Lake Formation**

1. Sign in to the AWS Management Console, and open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. In the left navigation pane, select **IAM Identity Center integration**.

1. Select **Add** on the **IAM Identity Center integration** page.

1. Enter one or more valid AWS account IDs, organization IDs, and/or organizational unit IDs to allow external accounts to access the Data Catalog resources. 

1. On the **Add applications** screen, enter the application IDs of the third-party applications that you want to integrate with Lake Formation. 

1. Select **Add**.

1. (Optional) On the **IAM Identity Center integration** page, you can enable or disable trusted identity propagation for Amazon Redshift Connect. Lake Formation propagates the user's identity downstream based on the effective permissions, so that authorized applications can access data on behalf of users.

------
#### [ AWS CLI ]

You can add or remove third-party applications for the IAM Identity Center integration by running the following AWS CLI command. When you set the external filtering status to `ENABLED`, IAM Identity Center provides identity management for third-party applications that access data managed by Lake Formation. You can also enable or disable the IAM Identity Center integration by setting the application status. 

```
aws lakeformation update-lake-formation-identity-center-configuration \
 --external-filtering '{"AuthorizedTargets": ["<app arn1>", "<app arn2>"], "Status": "ENABLED"}' \
 --share-recipients '[{"DataLakePrincipalIdentifier": "<444455556666>"},
                     {"DataLakePrincipalIdentifier": "<777788889999>"}]' \
 --application-status ENABLED
```

If you have an existing Lake Formation IAM Identity Center application and want to add the `Redshift:Connect` authorization, run the following command to update the application. Authorization can be `ENABLED` or `DISABLED`.

```
aws lakeformation update-lake-formation-identity-center-configuration \
--service-integrations '[{
  "Redshift": [{
    "RedshiftConnect": {
      "Authorization": "ENABLED"
    }
  }]
}]'
```

------

# Deleting a Lake Formation connection with IAM Identity Center
<a name="delete-lf-identity-center-connection"></a>

 If you would like to delete an existing IAM Identity Center integration, you can do it using Lake Formation console, AWS CLI, or [DeleteLakeFormationIdentityCenterConfiguration](https://docs.aws.amazon.com/lake-formation/latest/APIReference/API_DeleteLakeFormationIdentityCenterConfiguration.html) operation.

------
#### [ AWS Management Console ]

**To delete an existing IAM Identity Center connection with Lake Formation**

1. Sign in to the AWS Management Console, and open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. In the left navigation pane, select **IAM Identity Center integration**.

1. Select **Delete** on the **IAM Identity Center integration** page.

1. On the **Confirm integration** screen, confirm the action, and select **Delete**.

------
#### [ AWS CLI ]

You can delete IAM Identity Center integration by running the following AWS CLI command. 

```
 aws lakeformation delete-lake-formation-identity-center-configuration \
     --catalog-id <123456789012>
```

------

# Granting permissions to users and groups
<a name="grant-permissions-sso"></a>

Your data lake administrator can grant permissions to IAM Identity Center users and groups on Data Catalog resources (databases, tables, and views) to allow easy data access. To grant or revoke data lake permissions, the grantor requires permissions for the following IAM Identity Center actions.
+ [DescribeUser](https://docs.aws.amazon.com/singlesignon/latest/IdentityStoreAPIReference/API_DescribeUser.html)
+ [DescribeGroup](https://docs.aws.amazon.com/singlesignon/latest/IdentityStoreAPIReference/API_DescribeGroup.html)
+ [DescribeInstance](https://docs.aws.amazon.com/singlesignon/latest/APIReference/API_DescribeInstance.html)

You can grant permissions by using the Lake Formation console, the API, or the AWS CLI.

For more information on granting permissions, see [Granting permissions on Data Catalog resources](granting-catalog-permissions.md). 

**Note**  
You can only grant permissions on resources in your account. To cascade permissions to users and groups on resources shared with you, you must use AWS RAM resource shares.

------
#### [ AWS Management Console ]

**To grant permissions to users and groups**

1. Sign in to the AWS Management Console, and open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. Select **Data lake permissions** under **Permissions** in the Lake Formation console. 

1. Select **Grant**.

1. On the **Grant data lake permissions** page, choose **IAM Identity Center users and groups**. 

1. Select **Add** to choose the users and groups to grant permissions.  
![\[Grant data lake permissions screen with IAM Identity Center users and groups selected.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/identity-center-grant-perm.png)

1. On the **Assign users and groups** screen, choose the users and/or groups to grant permissions.

   Select **Assign**.  
![\[Grant data lake permissions screen with IAM Identity Center users and groups selected.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/identity-center-assign-users-groups.png)

1. Next, choose the method to grant permissions.

   For instructions on granting permissions using the named resource method, see [Granting data permissions using the named resource method](granting-cat-perms-named-resource.md).

   For instructions on granting permission using LF-Tags, see [Granting data lake permissions using the LF-TBAC method](granting-catalog-perms-TBAC.md).

1. Choose the Data Catalog resources on which you want to grant permissions.

1. Choose the Data Catalog permissions to grant.

1. Select **Grant**.

------
#### [ AWS CLI ]

The following example shows how to grant an IAM Identity Center user the `SELECT` permission on all tables in a database.

```
aws lakeformation grant-permissions \
--principal DataLakePrincipalIdentifier=arn:aws:identitystore:::user/<UserId> \
--permissions "SELECT" \
--resource '{ "Table": { "DatabaseName": "retail", "TableWildcard": {} } }'
```

To retrieve the `UserId` value from IAM Identity Center, see the [GetUserId](https://docs.aws.amazon.com/singlesignon/latest/IdentityStoreAPIReference/API_GetUserId.html) operation in the *IAM Identity Center API Reference*.

------
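When granting to many users from a script, the principal and resource arguments can be assembled programmatically. A minimal sketch, assuming the `arn:aws:identitystore:::user/<UserId>` principal format shown in the CLI example above (the helper names and the sample user ID are hypothetical):

```python
def identity_center_user_principal(user_id: str) -> str:
    """Build the DataLakePrincipalIdentifier for an IAM Identity Center user."""
    return f"arn:aws:identitystore:::user/{user_id}"

def select_on_all_tables(database: str, user_id: str) -> dict:
    """Parameters mirroring the grant-permissions CLI example above:
    SELECT on every table in the named database."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": identity_center_user_principal(user_id)},
        "Permissions": ["SELECT"],
        "Resource": {"Table": {"DatabaseName": database, "TableWildcard": {}}},
    }

params = select_on_all_tables("retail", "93a01234-5678-90ab-cdef-example11111")
print(params["Principal"]["DataLakePrincipalIdentifier"])
```

A dictionary built this way could then be passed as keyword arguments to an SDK `grant_permissions` call.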

# Including IAM Identity Center user context in CloudTrail logs
<a name="identity-center-ct-logs"></a>

Lake Formation uses [credential vending](using-cred-vending.md) functionality to provide temporary access to Amazon S3 data. By default, when an IAM Identity Center user submits a query to an integrated analytics service, the CloudTrail logs only include the IAM role assumed by the service to provide short term access. If you use a user-defined role to register the Amazon S3 data location with Lake Formation, you can opt in to include the IAM Identity Center user's context in the CloudTrail events, and then track the users that access your resources.

**Important**  
To include object-level Amazon S3 API requests in CloudTrail, you need to enable CloudTrail event logging for Amazon S3 buckets and objects. For more information, see [Enabling CloudTrail event logging for Amazon S3 buckets and objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/enable-cloudtrail-logging-for-s3.html) in the *Amazon S3 User Guide*.

**To enable credential vending auditing on data lake locations registered with user-defined roles**

1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. In the left-side navigation, expand **Administration**, and choose **Data Catalog settings**.

1. Under **Enhanced auditing**, choose **Propagate provided context.**

1. Choose **Save**.

 You can also enable the enhanced auditing option by setting the `Parameters` attribute in the [PutDataLakeSettings](https://docs.aws.amazon.com/lake-formation/latest/APIReference/API_PutDataLakeSettings.html) operation. By default, the `SET_CONTEXT` parameter value is set to `"true"`.

```
{
    "DataLakeSettings": {
        "Parameters": {"SET_CONTEXT": "true"},
    }
}
```
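If you build the settings document in code, a small helper can toggle the flag before the document is passed to `PutDataLakeSettings`. This is a sketch only; the helper name is hypothetical:

```python
def with_set_context(settings: dict, enabled: bool = True) -> dict:
    """Return a copy of a DataLakeSettings document with SET_CONTEXT set."""
    updated = dict(settings)
    params = dict(updated.get("Parameters", {}))
    params["SET_CONTEXT"] = "true" if enabled else "false"
    updated["Parameters"] = params
    return updated

settings = with_set_context({"Parameters": {}}, enabled=True)
print(settings)  # {'Parameters': {'SET_CONTEXT': 'true'}}
```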

The following is an excerpt from a CloudTrail event with the enhanced auditing option. This log includes both the IAM Identity Center user's session context and the user-defined IAM role assumed by Lake Formation to access the Amazon S3 data location. See the `onBehalfOf` parameter in the following excerpt.

```
{
         "eventVersion":"1.09",
         "userIdentity":{
            "type":"AssumedRole",
            "principalId":"AROAW7F7MOX4OYE6FLIFN:access-grants-e653760c-4e8b-44fd-94d9-309e035b75ab",
            "arn":"arn:aws:sts::123456789012:assumed-role/accessGrantsTestRole/access-grants-e653760c-4e8b-44fd-94d9-309e035b75ab",           
            "accountId":"123456789012",
            "accessKeyId":"ASIAW7F7MOX4CQLD4JIZN",
            "sessionContext":{
               "sessionIssuer":{
                  "type":"Role",
                  "principalId":"AROAW7F7MOX4OYE6FLIFN",
                  "arn":"arn:aws:iam::123456789012:role/accessGrantsTestRole",
                  "accountId":"123456789012",
                  "userName":"accessGrantsTestRole"
               },
               "attributes":{
                  "creationDate":"2023-08-09T17:24:02Z",
                  "mfaAuthenticated":"false"
               }
            },
            "onBehalfOf":{
                "userId": "<identityStoreUserId>",
                "identityStoreArn": "arn:aws:identitystore::<restOfIdentityStoreArn>"
            }
         },
         "eventTime":"2023-08-09T17:25:43Z",
         "eventSource":"s3.amazonaws.com",
         "eventName":"GetObject",
    ....
```
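When processing these events programmatically, the Identity Center user can be recovered from `userIdentity.onBehalfOf`, which is present only when the enhanced auditing option is on. A sketch, with field names taken from the excerpt above and a placeholder user ID:

```python
def identity_center_user(event: dict):
    """Return the Identity Center userId from a CloudTrail event, or None
    if the event has no onBehalfOf context."""
    return event.get("userIdentity", {}).get("onBehalfOf", {}).get("userId")

event = {
    "userIdentity": {
        "type": "AssumedRole",
        "onBehalfOf": {"userId": "93a01234-5678-90ab-cdef-example11111"},
    },
    "eventName": "GetObject",
}
print(identity_center_user(event))
```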

# Adding an Amazon S3 location to your data lake
<a name="register-data-lake"></a>

To add a data location as storage in your data lake, you *register* the location (**Data lake location**) with AWS Lake Formation. You can then use Lake Formation permissions for fine-grained access control to AWS Glue Data Catalog objects that point to this location and to the underlying data in the location.

Lake Formation also allows you to register a data location in hybrid access mode, which provides the flexibility to selectively enable Lake Formation permissions for databases and tables in your Data Catalog. With hybrid access mode, you have an incremental path that allows you to set Lake Formation permissions for a specific set of users without interrupting the permission policies of other existing users or workloads.

For more information on setting up hybrid access mode, see [Hybrid access mode](hybrid-access-mode.md).

When you register a location, that Amazon S3 path and all folders under that path are registered.

For example, suppose that you have an Amazon S3 path organization like the following:

`/mybucket/accounting/sales/`

If you register `s3://mybucket/accounting`, the `sales` folder is also registered and under Lake Formation management.
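The effect of prefix registration can be pictured as a simple path-prefix check. The following is illustrative only, not how Lake Formation is implemented:

```python
def is_under_registered_location(path: str, registered: str) -> bool:
    """True if an s3:// path equals the registered location or falls under it."""
    registered = registered.rstrip("/")
    # Append "/" before the prefix test so that a sibling such as
    # "accounting-archive" does not match "accounting".
    return path == registered or path.startswith(registered + "/")

registered = "s3://mybucket/accounting"
print(is_under_registered_location("s3://mybucket/accounting/sales/q1.csv", registered))  # True
print(is_under_registered_location("s3://mybucket/hr/roster.csv", registered))            # False
```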

For more information about registering locations, see [Underlying data access control](access-control-underlying-data.md#underlying-data-access-control).

**Note**  
Lake Formation permissions are recommended for structured data (arranged in tables with rows and columns). If your data contains object-based unstructured data, consider using Amazon S3 access grants to manage data access.

**Topics**
+ [Requirements for roles used to register locations](registration-role.md)
+ [Registering an Amazon S3 location](register-location.md)
+ [Registering an encrypted Amazon S3 location](register-encrypted.md)
+ [Registering an Amazon S3 location in another AWS account](register-cross-account.md)
+ [Registering an encrypted Amazon S3 location across AWS accounts](register-cross-encrypted.md)
+ [Deregistering an Amazon S3 location](unregister-location.md)

# Requirements for roles used to register locations
<a name="registration-role"></a>

You must specify an AWS Identity and Access Management (IAM) role when you register an Amazon Simple Storage Service (Amazon S3) location. AWS Lake Formation assumes that role when accessing the data in that location.

You can use one of the following role types to register a location:
+ The Lake Formation service-linked role. This role grants the required permissions on the location. Using this role is the simplest way to register the location. For more information, see [Using service-linked roles for Lake Formation](service-linked-roles.md) and [Service-linked role limitations](service-linked-role-limitations.md).
+ A user-defined role. Use a user-defined role when you need to grant more permissions than the service-linked role provides.

  You must use a user-defined role in the following circumstances:
  + When registering a location in another account.

    For more information, see [Registering an Amazon S3 location in another AWS account](register-cross-account.md) and [Registering an encrypted Amazon S3 location across AWS accounts](register-cross-encrypted.md).
  + If you used an AWS managed CMK (`aws/s3`) to encrypt the Amazon S3 location.

    For more information, see [Registering an encrypted Amazon S3 location](register-encrypted.md).
  + If you plan to access the location using Amazon EMR.

    If you already registered a location with the service-linked role and want to begin accessing the location with Amazon EMR, you must deregister the location and reregister it with a user-defined role. For more information, see [Deregistering an Amazon S3 location](unregister-location.md).

# Using service-linked roles for Lake Formation
<a name="service-linked-roles"></a>

AWS Lake Formation uses an AWS Identity and Access Management (IAM) *service-linked role*. A service-linked role is a unique type of IAM role that is linked directly to Lake Formation. The service-linked role is predefined by Lake Formation and includes all the permissions that the service requires to call other AWS services on your behalf.

A service-linked role makes setting up Lake Formation easier because you don’t have to create a role and manually add the necessary permissions. Lake Formation defines the permissions of its service-linked role, and unless defined otherwise, only Lake Formation can assume its roles. The defined permissions include the trust policy and the permissions policy, and that permissions policy can't be attached to any other IAM entity.

This service-linked role trusts the following services to assume the role:
+ `lakeformation.amazonaws.com`

When you use a service-linked role in account A to register an Amazon S3 location that is owned by account B, the Amazon S3 bucket policy (a resource-based policy) in account B must grant access permissions to the service-linked role in account A.

For information about using a service-linked role to register a data location, see [Service-linked role limitations](service-linked-role-limitations.md).

**Note**  
Service control policies (SCPs) don't affect service-linked roles.   
For more information, see [Service control policies (SCPs)](https://docs.aws.amazon.com/organizations/latest/userguide/orgs_manage_policies_scps.html) in the *AWS Organizations User Guide*.

## Service-linked role permissions for Lake Formation
<a name="service-linked-role-permissions"></a>

Lake Formation uses the service-linked role named `AWSServiceRoleForLakeFormationDataAccess`. This role provides a set of Amazon Simple Storage Service (Amazon S3) permissions that enable Lake Formation integrated services (such as Amazon Athena) to access registered locations. When you register a data lake location, you must provide a role that has the required Amazon S3 read/write permissions on that location. Instead of creating a role with the required Amazon S3 permissions, you can use this service-linked role.

The first time that you name the service-linked role as the role with which to register a path, the service-linked role and a new IAM policy are created on your behalf. Lake Formation adds the path to the inline policy and attaches it to the service-linked role. When you register subsequent paths with the service-linked role, Lake Formation adds the path to the existing policy.

While signed in as a data lake administrator, register a data lake location. Then, in the IAM console, search for the role `AWSServiceRoleForLakeFormationDataAccess` and view its attached policies.

For example, after you register the location `s3://my-kinesis-test/logs`, Lake Formation creates the following inline policy and attaches it to `AWSServiceRoleForLakeFormationDataAccess`.

------
#### [ JSON ]

****  

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "LakeFormationDataAccessPermissionsForS3",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts"
            ],
            "Resource": [
                "arn:aws:s3:::my-kinesis-test/logs/*"
            ]
        },
        {
            "Sid": "LakeFormationDataAccessPermissionsForS3ListBucket",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads"
            ],
            "Resource": [
                "arn:aws:s3:::my-kinesis-test"
            ]
        }
    ]
}
```

------
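The object and bucket ARNs in that inline policy follow directly from the registered path. As an illustration (this helper is hypothetical, not part of Lake Formation):

```python
def policy_resources(s3_path: str):
    """Split a registered s3://bucket/prefix path into the object ARN and
    bucket ARN that appear in the inline policy above."""
    path = s3_path[len("s3://"):] if s3_path.lower().startswith("s3://") else s3_path
    bucket, _, prefix = path.partition("/")
    object_arn = (f"arn:aws:s3:::{bucket}/{prefix.rstrip('/')}/*"
                  if prefix else f"arn:aws:s3:::{bucket}/*")
    return object_arn, f"arn:aws:s3:::{bucket}"

print(policy_resources("s3://my-kinesis-test/logs"))
# ('arn:aws:s3:::my-kinesis-test/logs/*', 'arn:aws:s3:::my-kinesis-test')
```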

## Creating a service-linked role for Lake Formation
<a name="create-slr"></a>

You don't need to manually create a service-linked role. When you register an Amazon S3 location with Lake Formation in the AWS Management Console, the AWS CLI, or the AWS API, Lake Formation creates the service-linked role for you. 

**Important**  
This service-linked role can appear in your account if you completed an action in another service that uses the features supported by this role. To learn more, see [A new role appeared in my IAM account](https://docs.aws.amazon.com/IAM/latest/UserGuide/troubleshoot_roles.html#troubleshoot_roles_new-role-appeared).

If you delete this service-linked role, and then need to create it again, you can use the same process to recreate the role in your account. When you register an Amazon S3 location with Lake Formation, Lake Formation creates the service-linked role for you again. 

You can also use the IAM console to create a service-linked role with the **Lake Formation** use case. In the AWS CLI or the AWS API, create a service-linked role with the `lakeformation.amazonaws.com` service name. For more information, see [Creating a service-linked role](https://docs.aws.amazon.com/IAM/latest/UserGuide/using-service-linked-roles.html#create-service-linked-role) in the *IAM User Guide*. If you delete this service-linked role, you can use this same process to create the role again.

## Editing a service-linked role for Lake Formation
<a name="edit-slr"></a>

Lake Formation does not allow you to edit the `AWSServiceRoleForLakeFormationDataAccess` service-linked role. After you create a service-linked role, you cannot change the name of the role because various entities might reference the role. However, you can edit the description of the role using IAM. For more information, see [Editing a Service-Linked Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/using-service-linked-roles.html#edit-service-linked-role) in the *IAM User Guide*.

## Deleting a service-linked role for Lake Formation
<a name="delete-slr"></a>

If you no longer need to use a feature or service that requires a service-linked role, we recommend that you delete that role. That way you don’t have an unused entity that is not actively monitored or maintained. However, you must clean up the resources for your service-linked role before you can manually delete it.

**Note**  
If the Lake Formation service is using the role when you try to delete the resources, then the deletion might fail. If that happens, wait for a few minutes and try the operation again.

**To delete resources used by the Lake Formation service-linked role**
+ If you've used the service-linked role to register Amazon S3 locations with Lake Formation, before deleting the service-linked role, you need to deregister the location and reregister it using a custom role.

**To manually delete the service-linked role using IAM**

Use the IAM console, the AWS CLI, or the AWS API to delete the `AWSServiceRoleForLakeFormationDataAccess` service-linked role. For more information, see [Deleting a Service-Linked Role](https://docs.aws.amazon.com/IAM/latest/UserGuide/using-service-linked-roles.html#delete-service-linked-role) in the *IAM User Guide*.

The following are the requirements for a user-defined role:
+ When creating the new role, on the **Create role** page of the IAM console, choose **AWS service**, and then under **Choose a use case**, choose **Lake Formation**.

  If you create the role using a different path, ensure that the role has a trust relationship with `lakeformation.amazonaws.com`. For more information, see [Modifying a role trust policy (Console)](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_update-role-trust-policy.html).
+ The role must have an inline policy that grants Amazon S3 read/write permissions on the location. The following is a typical policy.

------
#### [ JSON ]

****  

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Effect": "Allow",
              "Action": [
                  "s3:PutObject",
                  "s3:GetObject",
                  "s3:DeleteObject"
              ],
              "Resource": [
                  "arn:aws:s3:::awsexamplebucket/*"
              ]
          },
          {
              "Effect": "Allow",
              "Action": [
                  "s3:ListBucket"
              ],
              "Resource": [
                  "arn:aws:s3:::awsexamplebucket"
              ]
          }
      ]
  }
  ```

------
+ Add the following trust policy to the IAM role to allow the Lake Formation service to assume the role and vend temporary credentials to the integrated analytical engines.

  To include IAM Identity Center user context in the CloudTrail logs, the trust policy must have the permission for the `sts:SetContext` action.

------
#### [ JSON ]

****  

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Sid": "DataCatalogViewDefinerAssumeRole1",
              "Effect": "Allow",
              "Principal": {
                 "Service": [                    
                      "lakeformation.amazonaws.com"
                   ]
              },
              "Action": [
                  "sts:AssumeRole",
                  "sts:SetContext"
              ]
          }
      ]
  }
  ```

------
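A quick structural check of the trust policy can catch a missing `sts:SetContext` before you register the location. The validation below is a sketch, not an official tool:

```python
def trusts_lake_formation_with_set_context(policy: dict) -> bool:
    """True if some Allow statement lets lakeformation.amazonaws.com
    call both sts:AssumeRole and sts:SetContext."""
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        services = stmt.get("Principal", {}).get("Service", [])
        if isinstance(services, str):
            services = [services]
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if ("lakeformation.amazonaws.com" in services
                and {"sts:AssumeRole", "sts:SetContext"} <= set(actions)):
            return True
    return False

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": ["lakeformation.amazonaws.com"]},
        "Action": ["sts:AssumeRole", "sts:SetContext"],
    }],
}
print(trusts_lake_formation_with_set_context(trust_policy))  # True
```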
+ The data lake administrator who registers the location must have the `iam:PassRole` permission on the role.

  The following is an inline policy that grants this permission. Replace *111122223333* with a valid AWS account number, and replace *<role-name>* with the name of the role.

------
#### [ JSON ]

****  

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Sid": "PassRolePermissions",
              "Effect": "Allow",
              "Action": [
                  "iam:PassRole"
              ],
              "Resource": [
                  "arn:aws:iam::111122223333:role/<role-name>"
              ]
          }
      ]
  }
  ```

------
+ To permit Lake Formation to add logs in CloudWatch Logs and publish metrics, add the following inline policy.
**Note**  
Writing to CloudWatch Logs incurs a charge.

------
#### [ JSON ]

****  

  ```
  {
      "Version": "2012-10-17",
      "Statement": [
          {
              "Sid": "Sid1",
              "Effect": "Allow",
              "Action": [
                  "logs:CreateLogStream",
                  "logs:CreateLogGroup",
                  "logs:PutLogEvents"
              ],
              "Resource": [
                   "arn:aws:logs:us-east-1:111122223333:log-group:/aws-lakeformation-acceleration/*",
                   "arn:aws:logs:us-east-1:111122223333:log-group:/aws-lakeformation-acceleration/*:log-stream:*"
              ]
          }
      ]
  }
  ```

------

# Registering an Amazon S3 location
<a name="register-location"></a>

You must specify an AWS Identity and Access Management (IAM) role when you register an Amazon Simple Storage Service (Amazon S3) location. Lake Formation assumes that role when it grants temporary credentials to integrated AWS services that access the data in that location.

**Important**  
Avoid registering an Amazon S3 bucket that has **Requester pays** enabled. For buckets registered with Lake Formation, the role used to register the bucket is always viewed as the requester. If the bucket is accessed by another AWS account, the bucket owner is charged for data access if the role belongs to the same account as the bucket owner.

You can use the AWS Lake Formation console, Lake Formation API, or AWS Command Line Interface (AWS CLI) to register an Amazon S3 location.

**Before you begin**  
Review the [requirements for the role used to register the location](registration-role.md).

**To register a location (console)**
**Important**  
The following procedures assume that the Amazon S3 location is in the same AWS account as the Data Catalog and that the data in the location is not encrypted. Other sections in this chapter cover cross-account registration and registration of encrypted locations.

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as the data lake administrator or as a user with the `lakeformation:RegisterResource` IAM permission.

1. In the navigation pane, under **Administration**, select **Data lake locations**.

1. Choose **Register location**, and then choose **Browse** to select an Amazon Simple Storage Service (Amazon S3) path.

1. (Optional, but strongly recommended) Select **Review location permissions** to view a list of all existing resources in the selected Amazon S3 location and their permissions. 

   Registering the selected location might result in your Lake Formation users gaining access to data already at that location. Viewing this list helps you ensure that existing data remains secure.

1. For **IAM role**, choose either the `AWSServiceRoleForLakeFormationDataAccess` service-linked role (the default) or a custom IAM role that meets the requirements in [Requirements for roles used to register locations](registration-role.md).

   You can update a registered location's details only when it is registered with a custom IAM role. To edit a location registered using the service-linked role, you must deregister the location and register it again. 

1. Choose the **Enable Data Catalog Federation** option to allow Lake Formation to assume a role and vend temporary credentials to integrated AWS services so that they can access tables under federated databases. If a location is already registered with Lake Formation and you want to use the same location for a table under a federated database, you must register the location again with the **Enable Data Catalog Federation** option selected.

1. Choose **Hybrid access mode** if you don't want to enforce Lake Formation permissions by default. When you register an Amazon S3 location in hybrid access mode, you can enable Lake Formation permissions by opting in principals for databases and tables under that location.

   For more information on setting up hybrid access mode, see [Hybrid access mode](hybrid-access-mode.md).

1. Choose **Register location**.

**To register a location (AWS CLI)**

1. 

**Register a new location with Lake Formation**

   This example uses the service-linked role to register the location. To supply your own role instead, use the `--role-arn` argument.

   Replace *<s3-path>* with a valid Amazon S3 path, *<123456789012>* with a valid AWS account ID, and *<s3-access-role>* with an IAM role that has permissions to register a data location.
**Note**  
You can't edit the properties of a registered location if it was registered using the service-linked role.

   ```
   aws lakeformation register-resource \
    --resource-arn arn:aws:s3:::<s3-path> \
    --use-service-linked-role
   ```

   The following example uses a custom role to register the location.

   ```
   aws lakeformation register-resource \
    --resource-arn arn:aws:s3:::<s3-path> \
    --role-arn arn:aws:iam::<123456789012>:role/<s3-access-role>
   ```

1. 

**Update a location registered with Lake Formation**

   You can edit a registered location only if it was registered using a custom IAM role. For a location registered with the service-linked role, deregister the location and register it again. For more information, see [Deregistering an Amazon S3 location](unregister-location.md).

   ```
   aws lakeformation update-resource \
    --role-arn arn:aws:iam::<123456789012>:role/<s3-access-role> \
    --resource-arn arn:aws:s3:::<s3-path>
   ```

   The following example updates the location to use the service-linked role instead.

   ```
   aws lakeformation update-resource \
    --resource-arn arn:aws:s3:::<s3-path> \
    --use-service-linked-role
   ```

1. 

**Register a data location in hybrid access mode or with federation**

   The following example registers a location in hybrid access mode using a custom role.

   ```
   aws lakeformation register-resource \
    --resource-arn arn:aws:s3:::<s3-path> \
    --role-arn arn:aws:iam::<123456789012>:role/<s3-access-role> \
    --hybrid-access-enabled
   ```

   The following example registers a location with Data Catalog federation enabled.

   ```
   aws lakeformation register-resource \
    --resource-arn arn:aws:s3:::<s3-path> \
    --role-arn arn:aws:iam::<123456789012>:role/<s3-access-role> \
    --with-federation
   ```

   The following example updates an existing registration to hybrid access mode.

   ```
   aws lakeformation update-resource \
    --resource-arn arn:aws:s3:::<s3-path> \
    --role-arn arn:aws:iam::<123456789012>:role/<s3-access-role> \
    --hybrid-access-enabled
   ```

For more information, see the [RegisterResource](https://docs.aws.amazon.com/lake-formation/latest/APIReference/API_RegisterResource.html) API operation.

**Note**  
After you register an Amazon S3 location, any AWS Glue table that points to the location (or to any of its child locations) returns `true` for the `IsRegisteredWithLakeFormation` parameter in the `GetTable` call. There is a known limitation: Data Catalog API operations such as `GetTables` and `SearchTables` do not update the value of the `IsRegisteredWithLakeFormation` parameter and return the default value, `false`. We recommend using the `GetTable` API operation to view the correct value for this parameter.
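
For example, you can check the value for a single table with the AWS CLI. This is a sketch; the database and table names are placeholders.

```
aws glue get-table \
 --database-name <database-name> \
 --name <table-name> \
 --query 'Table.IsRegisteredWithLakeFormation'
```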

# Registering an encrypted Amazon S3 location
<a name="register-encrypted"></a>

Lake Formation integrates with [AWS Key Management Service](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html) (AWS KMS) to enable you to more easily set up other integrated services to encrypt and decrypt data in Amazon Simple Storage Service (Amazon S3) locations.

Both customer managed AWS KMS keys and AWS managed keys are supported. Currently, client-side encryption/decryption is supported only with Athena.

You must specify an AWS Identity and Access Management (IAM) role when you register an Amazon S3 location. For encrypted Amazon S3 locations, either the role must have permission to encrypt and decrypt data with the AWS KMS key, or the KMS key policy must grant permissions on the key to the role.

**Important**  
Avoid registering an Amazon S3 bucket that has **Requester pays** enabled. For buckets registered with Lake Formation, the role used to register the bucket is always viewed as the requester. If the bucket is accessed by another AWS account, the bucket owner is charged for data access if the role belongs to the same account as the bucket owner.

Lake Formation uses a service-linked role to register your data locations. However, this role has several [limitations](service-linked-role-limitations.md). Because of these constraints, we recommend creating and using a custom IAM role for more flexibility and control. The custom role that you create to register the location must meet the requirements specified in [Requirements for roles used to register locations](registration-role.md).

**Important**  
If you used an AWS managed key to encrypt the Amazon S3 location, you can't use the Lake Formation service-linked role. You must use a custom role and add IAM permissions on the key to the role. Details are provided later in this section.

The following procedures explain how to register an Amazon S3 location that is encrypted with either a customer managed key or an AWS managed key.
+ [Registering a location encrypted with a customer managed key](#proc-register-cust-cmk)
+ [Registering a location encrypted with an AWS managed key](#proc-register-aws-cmk)

**Before you begin**  
Review the [requirements for the role used to register the location](registration-role.md).<a name="proc-register-cust-cmk"></a>

**To register an Amazon S3 location encrypted with a customer managed key**
**Note**  
If the KMS key or the Amazon S3 location is not in the same AWS account as the Data Catalog, follow the instructions in [Registering an encrypted Amazon S3 location across AWS accounts](register-cross-encrypted.md) instead.

1. Open the AWS KMS console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms) and log in as an AWS Identity and Access Management (IAM) administrative user or as a user who can modify the key policy of the KMS key used to encrypt the location.

1. In the navigation pane, choose **Customer managed keys**, and then choose the name of the desired KMS key.

1. On the KMS key details page, choose the **Key policy** tab, and then do one of the following to add your custom role or the Lake Formation service-linked role as a KMS key user:
   + **If the default view is showing** (with **Key administrators**, **Key deletion**, **Key users**, and **Other AWS accounts** sections) – Under the **Key users** section, add your custom role or the Lake Formation service-linked role `AWSServiceRoleForLakeFormationDataAccess`.
   + **If the key policy (JSON) is showing** – Edit the policy to add your custom role or the Lake Formation service-linked role `AWSServiceRoleForLakeFormationDataAccess` to the object "Allow use of the key," as shown in the following example.
**Note**  
If that object is missing, add it with the permissions shown in the example. The example uses the service-linked role.

     ```
             ...
             {
                 "Sid": "Allow use of the key",
                 "Effect": "Allow",
                 "Principal": {
                     "AWS": [
                         "arn:aws:iam::111122223333:role/aws-service-role/lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess",
                         "arn:aws:iam::111122223333:user/keyuser"
                     ]
                 },
                 "Action": [
                     "kms:Encrypt",
                     "kms:Decrypt",
                     "kms:ReEncrypt*",
                     "kms:GenerateDataKey*",
                     "kms:DescribeKey"
                 ],
                 "Resource": "*"
             },
             ...
     ```

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as the data lake administrator or as a user with the `lakeformation:RegisterResource` IAM permission.

1. In the navigation pane, under **Administration**, choose **Data lake locations**.

1. Choose **Register location**, and then choose **Browse** to select an Amazon Simple Storage Service (Amazon S3) path.

1. (Optional, but strongly recommended) Choose **Review location permissions** to view a list of all existing resources in the selected Amazon S3 location and their permissions. 

   Registering the selected location might result in your Lake Formation users gaining access to data already at that location. Viewing this list helps you ensure that existing data remains secure.

1. For **IAM role**, choose either the `AWSServiceRoleForLakeFormationDataAccess` service-linked role (the default) or your custom role that meets the [Requirements for roles used to register locations](registration-role.md).

1. Choose **Register location**.

For more information about the service-linked role, see [Service-linked role permissions for Lake Formation](service-linked-roles.md#service-linked-role-permissions).<a name="proc-register-aws-cmk"></a>

**To register an Amazon S3 location encrypted with an AWS managed key**
**Important**  
If the Amazon S3 location is not in the same AWS account as the Data Catalog, follow the instructions in [Registering an encrypted Amazon S3 location across AWS accounts](register-cross-encrypted.md) instead.

1. Create an IAM role to use to register the location. Ensure that it meets the requirements listed in [Requirements for roles used to register locations](registration-role.md).

1. Add the following inline policy to the role. It grants permissions on the key to the role. The `Resource` specification must designate the Amazon Resource Name (ARN) of the AWS managed key. You can obtain the ARN from the AWS KMS console. To get the correct ARN, ensure that you log in to the AWS KMS console with the same AWS account and Region as the AWS managed key that was used to encrypt the location.

------
#### [ JSON ]

****  

   ```
   {
     "Version":"2012-10-17",		 	 	 
     "Statement": [
       {
         "Effect": "Allow",
         "Action": [
           "kms:Encrypt",
           "kms:Decrypt",
           "kms:ReEncrypt*",
           "kms:GenerateDataKey*",
           "kms:DescribeKey"
         ],
         "Resource": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
       }
     ]
   }
   ```

------

   You can use a KMS key alias ARN instead of the key ARN, in the format `arn:aws:kms:region:account-id:alias/your-key-alias`.

   For more information, see [Aliases in AWS KMS](https://docs.aws.amazon.com/kms/latest/developerguide/kms-alias.html) in the *AWS Key Management Service Developer Guide*.

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as the data lake administrator or as a user with the `lakeformation:RegisterResource` IAM permission.

1. In the navigation pane, under **Administration**, choose **Data lake locations**.

1. Choose **Register location**, and then choose **Browse** to select an Amazon S3 path.

1. (Optional, but strongly recommended) Choose **Review location permissions** to view a list of all existing resources in the selected Amazon S3 location and their permissions. 

   Registering the selected location might result in your Lake Formation users gaining access to data already at that location. Viewing this list helps you ensure that existing data remains secure.

1. For **IAM role**, choose the role that you created in Step 1.

1. Choose **Register location**.

# Registering an Amazon S3 location in another AWS account
<a name="register-cross-account"></a>

AWS Lake Formation enables you to register Amazon Simple Storage Service (Amazon S3) locations across AWS accounts. For example, if the AWS Glue Data Catalog is in account A, a user in account A can register an Amazon S3 bucket in account B.

Registering an Amazon S3 bucket in AWS account B using an AWS Identity and Access Management (IAM) role in AWS account A requires the following permissions:
+ The role in account A must have a policy that grants permissions on the bucket in account B.
+ The bucket policy in account B must grant access permissions to the role in account A.

**Important**  
Avoid registering an Amazon S3 bucket that has **Requester pays** enabled. For buckets registered with Lake Formation, the role used to register the bucket is always viewed as the requester. If the bucket is accessed by another AWS account, the bucket owner is charged for data access if the role belongs to the same account as the bucket owner.  
You can't use the Lake Formation service-linked role to register a location in another account. You must use a user-defined role instead. The role must meet the requirements in [Requirements for roles used to register locations](registration-role.md). For more information about the service-linked role, see [Service-linked role permissions for Lake Formation](service-linked-roles.md#service-linked-role-permissions).

**Before you begin**  
Review the [requirements for the role used to register the location](registration-role.md).

**To register a location in another AWS account**
**Note**  
If the location is encrypted, follow the instructions in [Registering an encrypted Amazon S3 location across AWS accounts](register-cross-encrypted.md) instead.

The following procedure assumes that a principal in account 1111-2222-3333, which contains the Data Catalog, wants to register the Amazon S3 bucket `awsexamplebucket1`, which is in account 1234-5678-9012.

1. In account 1111-2222-3333, sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Create a new role or view an existing role that meets the requirements in [Requirements for roles used to register locations](registration-role.md). Ensure that the role grants Amazon S3 permissions on `awsexamplebucket1`.

1. Open the Amazon S3 console at [https://console.aws.amazon.com/s3/](https://console.aws.amazon.com/s3/). Sign in with account 1234-5678-9012.

1. In the **Bucket name** list, choose the bucket name, `awsexamplebucket1`.

1. Choose **Permissions**.

1. On the **Permissions** page, choose **Bucket Policy**.

1. In the **Bucket policy editor**, paste the following policy. Replace *<role-name>* with the name of your role.

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect":"Allow",
               "Principal": {
                   "AWS":"arn:aws:iam::111122223333:role/<role-name>"
               },
               "Action":"s3:ListBucket",
               "Resource":"arn:aws:s3:::awsexamplebucket1"
           },
           {
               "Effect":"Allow",
               "Principal": {
                   "AWS":"arn:aws:iam::111122223333:role/<role-name>"
               },
               "Action": [
                   "s3:DeleteObject",
                   "s3:GetObject",
                   "s3:PutObject"
               ],
               "Resource":"arn:aws:s3:::awsexamplebucket1/*"
           }
       ]
   }
   ```

------

1. Choose **Save**.

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in to account 1111-2222-3333 as the data lake administrator or as a user with sufficient permissions to register locations.

1. In the navigation pane, under **Administration**, choose **Data lake locations**.

1. On the **Data lake locations** page, choose **Register location**.

1. On the **Register location** page, for **Amazon S3 path**, enter the path `s3://awsexamplebucket1`.
**Note**  
You must type the bucket name because cross-account buckets do not appear in the list when you choose **Browse**.

1. For **IAM role**, choose your role.

1. Choose **Register location**.
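
Alternatively, you can perform the registration step from the AWS CLI in account 1111-2222-3333. This is a sketch; replace *<role-name>* with the role from earlier in this procedure.

```
aws lakeformation register-resource \
 --resource-arn arn:aws:s3:::awsexamplebucket1 \
 --role-arn arn:aws:iam::111122223333:role/<role-name>
```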

# Registering an encrypted Amazon S3 location across AWS accounts
<a name="register-cross-encrypted"></a>

AWS Lake Formation integrates with [AWS Key Management Service](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html) (AWS KMS) to enable you to more easily set up other integrated services to encrypt and decrypt data in Amazon Simple Storage Service (Amazon S3) locations.

Both customer managed keys and AWS managed keys are supported. Client-side encryption/decryption is not supported.

**Important**  
Avoid registering an Amazon S3 bucket that has **Requester pays** enabled. For buckets registered with Lake Formation, the role used to register the bucket is always viewed as the requester. If the bucket is accessed by another AWS account, the bucket owner is charged for data access if the role belongs to the same account as the bucket owner.

This section explains how to register an Amazon S3 location under the following circumstances:
+ The data in the Amazon S3 location is encrypted with a KMS key created in AWS KMS.
+ The Amazon S3 location is not in the same AWS account as the AWS Glue Data Catalog.
+ The KMS key either is or is not in the same AWS account as the Data Catalog.

Registering an AWS KMS–encrypted Amazon S3 bucket in AWS account B using an AWS Identity and Access Management (IAM) role in AWS account A requires the following permissions:
+ The role in account A must have a policy that grants permissions on the bucket in account B.
+ The bucket policy in account B must grant access permissions to the role in account A.
+ If the KMS key is in account B, the key policy must grant access to the role in account A, and the role in account A must have a policy that grants permissions on the KMS key.

In the following procedure, you create a role in the AWS account that contains the Data Catalog (account A in the previous discussion). Then, you use this role to register the location. Lake Formation assumes this role when accessing underlying data in Amazon S3. The assumed role has the required permissions on the KMS key. As a result, you don't have to grant permissions on the KMS key to principals accessing underlying data with ETL jobs or with integrated services such as Amazon Athena.

**Important**  
You can't use the Lake Formation service-linked role to register a location in another account. You must use a user-defined role instead. The role must meet the requirements in [Requirements for roles used to register locations](registration-role.md). For more information about the service-linked role, see [Service-linked role permissions for Lake Formation](service-linked-roles.md#service-linked-role-permissions).

**Before you begin**  
Review the [requirements for the role used to register the location](registration-role.md).

**To register an encrypted Amazon S3 location across AWS accounts**

1. In the same AWS account as the Data Catalog, sign in to the AWS Management Console and open the IAM console at [https://console.aws.amazon.com/iam/](https://console.aws.amazon.com/iam/).

1. Create a new role or view an existing role that meets the requirements in [Requirements for roles used to register locations](registration-role.md). Ensure that the role includes a policy that grants Amazon S3 permissions on the location.

1. If the KMS key is not in the same account as the Data Catalog, add to the role an inline policy that grants the required permissions on the KMS key. The following is an example policy. Replace the Region and account ID with those of the KMS key, and replace *<key-id>* with the key ID.

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
           "Effect": "Allow",
           "Action": [
               "kms:Encrypt",
               "kms:Decrypt",
               "kms:ReEncrypt*",
               "kms:GenerateDataKey*",
               "kms:DescribeKey"
            ],
           "Resource": "arn:aws:kms:us-east-1:111122223333:key/<key-id>"
           }
       ]
   }
   ```

------

1. On the Amazon S3 console, add a bucket policy granting the required Amazon S3 permissions to the role. The following is an example bucket policy. Replace the account ID with the AWS account number of the Data Catalog, *<role-name>* with the name of your role, and *<bucket-name>* with the name of the bucket.

------
#### [ JSON ]

****  

   ```
   {
       "Version":"2012-10-17",		 	 	 
       "Statement": [
           {
               "Effect":"Allow",
               "Principal": {
                   "AWS":"arn:aws:iam::111122223333:role/<role-name>"
               },
               "Action":"s3:ListBucket",
               "Resource":"arn:aws:s3:::<bucket-name>"
           },
           {
               "Effect":"Allow",
               "Principal": {
                   "AWS":"arn:aws:iam::111122223333:role/<role-name>"
               },
               "Action": [
                   "s3:DeleteObject",
                   "s3:GetObject",
                   "s3:PutObject"
               ],
               "Resource":"arn:aws:s3:::<bucket-name>/*"
           }
       ]
   }
   ```

------

1. In AWS KMS, add the role as a user of the KMS key.

   1. Open the AWS KMS console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms). Then, sign in as an administrator user or as a user who can modify the key policy of the KMS key used to encrypt the location.

   1. In the navigation pane, choose **Customer managed keys**, and then choose the name of the KMS key.

   1. On the KMS key details page, under the **Key policy** tab, if the JSON view of the key policy is not showing, choose **Switch to policy view**.

   1. In the **Key policy** section, choose **Edit**, and add the Amazon Resource Name (ARN) of the role to the `Allow use of the key` object, as shown in the following example.
**Note**  
If that object is missing, add it with the permissions shown in the example.

      ```
              ...
              {
                  "Sid": "Allow use of the key",
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": [
                          "arn:aws:iam::<catalog-account-id>:role/<role-name>"
                      ]
                  },
                  "Action": [
                      "kms:Encrypt",
                      "kms:Decrypt",
                      "kms:ReEncrypt*",
                      "kms:GenerateDataKey*",
                      "kms:DescribeKey"
                  ],
                  "Resource": "*"
              },
              ...
      ```

      For more information, see [Allowing Users in Other Accounts to Use a KMS key](https://docs.amazonaws.cn/en_us/kms/latest/developerguide/key-policy-modifying-external-accounts.html) in the *AWS Key Management Service Developer Guide*.

       

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in to the Data Catalog AWS account as the data lake administrator.

1. In the navigation pane, under **Administration**, choose **Data lake locations**.

1. Choose **Register location**.

1. On the **Register location** page, for **Amazon S3 path**, enter the location path as **s3://*<bucket>*/*<prefix>***. Replace *<bucket>* with the name of the bucket and *<prefix>* with the rest of the path for the location.
**Note**  
You must type the path because cross-account buckets do not appear in the list when you choose **Browse**.

1. For **IAM role**, choose the role from Step 2.

1. Choose **Register location**.

# Deregistering an Amazon S3 location
<a name="unregister-location"></a>

You can deregister an Amazon Simple Storage Service (Amazon S3) location if you no longer want it to be managed by Lake Formation. Deregistering a location does not affect Lake Formation data location permissions that are granted on that location. You can reregister a location that you deregistered, and the data location permissions remain in effect. You can use a different role to reregister the location.

**To deregister a location (console)**

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as the data lake administrator or as a user with the `lakeformation:RegisterResource` IAM permission.

1. In the navigation pane, under **Administration**, choose **Data lake locations**.

1. Select a location, and on the **Actions** menu, choose **Remove**.

1. When prompted for confirmation, choose **Remove**.
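
You can also deregister a location with the AWS CLI by using the `deregister-resource` command. This is a sketch; replace *<s3-path>* with the registered path.

```
aws lakeformation deregister-resource \
 --resource-arn arn:aws:s3:::<s3-path>
```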

# Hybrid access mode
<a name="hybrid-access-mode"></a>

AWS Lake Formation *hybrid access mode* supports two permission pathways to the same AWS Glue Data Catalog objects. In the first pathway, Lake Formation allows you to select specific principals and grant them Lake Formation permissions to access catalogs, databases, tables, and views by opting them in. The second pathway allows all other principals to access these resources through the default IAM principal policies for Amazon S3 and AWS Glue actions.

When registering an Amazon S3 location with Lake Formation, you can either enforce Lake Formation permissions for all resources at that location or use hybrid access mode. By default, hybrid access mode enforces only the `CREATE_TABLE`, `CREATE_PARTITION`, and `UPDATE_TABLE` permissions. When an Amazon S3 location is in hybrid access mode, you can enable Lake Formation permissions by opting in principals for the Data Catalog objects under that location. Opted-in principals then require both Lake Formation permissions and IAM permissions to access the data, while principals that are not opted in continue to access data using only IAM permissions.
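
For example, you can opt a principal in for a specific table with the AWS CLI. This is a sketch using the `create-lake-formation-opt-in` command; the role ARN, database name, and table name are placeholders.

```
aws lakeformation create-lake-formation-opt-in \
 --principal DataLakePrincipalIdentifier=arn:aws:iam::<123456789012>:role/<analyst-role> \
 --resource '{"Table": {"DatabaseName": "<database-name>", "Name": "<table-name>"}}'
```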

Thus, hybrid access mode provides the flexibility to selectively enable Lake Formation for catalogs, databases, and tables in your Data Catalog for a specific set of users without interrupting the access for other existing users or workloads.

![\[AWS account architecture showing data flow between S3, Glue, Lake Formation, Athena, and IAM roles.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/hybrid-access-mode-concept.png)


For considerations and limitations, see [Hybrid access mode considerations and limitations](notes-hybrid.md).

**Terms and definitions**

 Here are the definitions of Data Catalog resources based on how you set up access permissions: 

Lake Formation resource  
 A resource that is registered with Lake Formation. Users require Lake Formation permissions to access the resource. 

AWS Glue resource  
A resource that is not registered with Lake Formation. Users require only IAM permissions to access the resource because it has `IAMAllowedPrincipals` group permissions. Lake Formation permissions are not enforced.  
For more information on `IAMAllowedPrincipals` group permissions, see [Metadata permissions](metadata-permissions.md).

Hybrid resource  
A resource that is registered in hybrid access mode. Depending on the user accessing it, the resource dynamically switches between being a Lake Formation resource and an AWS Glue resource.

## Common hybrid access mode use cases
<a name="hybrid-access-mode-use-cases"></a>

You can use hybrid access mode to provide access in single account and cross-account data sharing scenarios:

**Single account scenarios**
+ **Convert an AWS Glue resource to a hybrid resource** – In this scenario, you are not currently using Lake Formation but want to adopt Lake Formation permissions for Data Catalog objects. When you register the Amazon S3 location in hybrid access mode, you can grant Lake Formation permissions to users and opt them in for specific databases and tables pointing to that location.
+ **Convert a Lake Formation resource to a hybrid resource** – Currently, you are using Lake Formation permissions to control access for a Data Catalog database but want to provide access to new principals using IAM permissions for Amazon S3 and AWS Glue without interrupting the existing Lake Formation permissions.

  When you update a data location registration to hybrid access mode, new principals can access the Data Catalog database pointing to the Amazon S3 location using IAM permissions policies, without interrupting existing users' Lake Formation permissions.

  Before updating the data location registration to enable hybrid access mode, you need to first opt in the principals that currently access the resource with Lake Formation permissions. This prevents potential interruption to current workflows. You also need to grant `Super` permission on the tables in the database to the `IAMAllowedPrincipals` group.
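
  Granting `Super` permission on all tables in a database to the `IAMAllowedPrincipals` group can be sketched with the AWS CLI. Note that `Super` in the console corresponds to `ALL` in the API, and the database name below is a placeholder.

  ```
  aws lakeformation grant-permissions \
   --principal DataLakePrincipalIdentifier=IAM_ALLOWED_PRINCIPALS \
   --permissions "ALL" \
   --resource '{"Table": {"DatabaseName": "<database-name>", "TableWildcard": {}}}'
  ```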

**Cross-account data sharing scenarios**
+ **Share AWS Glue resources using hybrid access mode** – In this scenario, the producer account has tables in a database that are currently shared with a consumer account using IAM permissions policies for Amazon S3 and AWS Glue actions. The data location of the database is not registered with Lake Formation.

   Before registering the data location in hybrid access mode, you need to update the **Cross account version settings** to version 4. Version 4 provides the new AWS RAM permission policies required for cross-account sharing when the `IAMAllowedPrincipals` group has `Super` permission on the resource. For resources with `IAMAllowedPrincipals` group permissions, you can grant Lake Formation permissions to external accounts and opt them in to use Lake Formation permissions. The data lake administrator in the recipient account can then grant Lake Formation permissions to principals in the account and opt them in to enforce Lake Formation permissions.
+ **Share Lake Formation resources using hybrid access mode** – Currently, the producer account has tables in a database that are shared with a consumer account enforcing Lake Formation permissions. The data location of the database is registered with Lake Formation.

  In this case, you can update the Amazon S3 location registration to hybrid access mode, and share the data from Amazon S3 and metadata from Data Catalog using Amazon S3 bucket policies and Data Catalog resource policies to principals in the consumer account. You need to re-grant the existing Lake Formation permissions and opt in the principals before updating the Amazon S3 location registration. Also, you need to grant `Super` permission on the tables in the database to the `IAMAllowedPrincipals` group.

**Topics**
+ [Common hybrid access mode use cases](#hybrid-access-mode-use-cases)
+ [How hybrid access mode works](hybrid-access-workflow.md)
+ [Setting up hybrid access mode - common scenarios](hybrid-access-setup.md)
+ [Removing principals and resources from hybrid access mode](delete-hybrid-access.md)
+ [Viewing principals and resources in hybrid access mode](view-hybrid-access.md)
+ [Additional resources](additional-resources-hybrid.md)

# How hybrid access mode works
<a name="hybrid-access-workflow"></a>

The following diagram shows how Lake Formation authorization works in hybrid access mode when you query the Data Catalog resources.

![\[AWS Lake Formation authorization process flowchart for hybrid access mode queries.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/hybrid-workflow.png)


Before accessing data in your data lake, a data lake administrator or a user with administrative permissions sets up individual Data Catalog table user policies to allow or deny access to tables in your Data Catalog. Then, a principal who has permission to perform the `RegisterResource` operation registers the Amazon S3 location of the table with Lake Formation in hybrid access mode. If a data location is not registered with Lake Formation, the administrator grants Lake Formation permissions to specific users on the Data Catalog databases and tables and opts them in to use Lake Formation permissions for those databases and tables in hybrid access mode.

1. **Submits a query** - A principal submits a query or an ETL script using an integrated service such as Amazon Athena, AWS Glue, Amazon EMR, or Amazon Redshift Spectrum.

1. **Requests data** - The integrated analytical engine identifies the table that is being requested and sends the metadata request to the Data Catalog (`GetTable`, `GetDatabase`).

1. **Checks permissions** - The Data Catalog verifies the querying principal’s access permissions with Lake Formation.

   1. If the table doesn't have `IAMAllowedPrincipals` group permissions attached, Lake Formation permissions are enforced.

   1. If the principal has opted in to use Lake Formation permissions in the hybrid access mode, and the table has `IAMAllowedPrincipals` group permissions attached, Lake Formation permissions are enforced. The query engine applies the filters it received from Lake Formation and returns the data to the user.

   1. If the table location is not registered with Lake Formation and the principal has not opted in to use Lake Formation permissions in hybrid access mode, the Data Catalog checks whether the table has `IAMAllowedPrincipals` group permissions attached to it. If this permission exists on the table, all principals in the account get `Super` or `All` permissions on the table.

      Lake Formation credential vending is not available, even when opted in, unless the data location is registered with Lake Formation.

1. **Get credentials** – The Data Catalog checks and lets the engine know if the table location is registered with Lake Formation or not. If the underlying data is registered with Lake Formation, the analytical engine requests Lake Formation for temporary credentials to access data in the Amazon S3 bucket. 

1. **Get data** – If the principal is authorized to access the table data, Lake Formation provides temporary access to the integrated analytical engine. Using the temporary access, the analytical engine fetches the data from Amazon S3, and performs necessary filtering such as column, row, or cell filtering. When the engine finishes running the job, it returns the results back to the user. This process is called credential vending. For more information,  see [Integrating third-party services with Lake Formation](Integrating-with-LakeFormation.md).

1.  If the data location of the table is not registered with Lake Formation, the second call from the analytical engine is made directly to Amazon S3. The relevant Amazon S3 bucket policy and IAM user policy are evaluated for data access. Whenever you use IAM policies, make sure that you follow IAM best practices. For more information, see [Security best practices in IAM](https://docs.aws.amazon.com/IAM/latest/UserGuide/best-practices.html) in the *IAM User Guide*.
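The permission checks in step 3 can be summarized as a small decision function. The following is a minimal sketch of that logic; the function and parameter names are illustrative and not part of any AWS API:

```python
def lake_formation_enforced(iam_allowed_principals_attached: bool,
                            principal_opted_in: bool) -> bool:
    """Sketch: is Lake Formation enforcement applied for a table whose
    location is registered in hybrid access mode?"""
    # 3a: no IAMAllowedPrincipals permissions on the table ->
    #     Lake Formation permissions are enforced.
    if not iam_allowed_principals_attached:
        return True
    # 3b: the principal opted in to hybrid access mode ->
    #     Lake Formation permissions are enforced even though
    #     IAMAllowedPrincipals permissions exist on the table.
    if principal_opted_in:
        return True
    # 3c: not opted in, and IAMAllowedPrincipals permissions exist ->
    #     all principals in the account fall back to IAM-based
    #     (Super/All) access via AWS Glue and Amazon S3 policies.
    return False
```

For example, a principal who has not opted in, querying a table that still carries `IAMAllowedPrincipals` permissions, falls through to IAM-based access (the function returns `False`).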

# Setting up hybrid access mode - common scenarios
<a name="hybrid-access-setup"></a>

As with Lake Formation permissions, you generally have two types of scenarios in which you can use hybrid access mode to manage data access: providing access to principals within one AWS account, and providing access to an external AWS account or to principals in another account.

 This section provides instructions for setting up hybrid access mode in the following scenarios: 

**Manage permissions in hybrid access mode within one AWS account**
+ [Converting an AWS Glue resource to a hybrid resource](hybrid-access-mode-new.md) – You are currently providing access to tables in a database for all principals in your account using IAM permissions for Amazon S3 and AWS Glue but want to adopt Lake Formation to manage permissions incrementally. 
+ [Converting a Lake Formation resource to a hybrid resource](hybrid-access-mode-update.md) – You are currently using Lake Formation to manage access for tables in a database for all principals in your account but want to use Lake Formation only for specific principals. You want to provide access to new principals by using IAM permissions for AWS Glue and Amazon S3 on the same database and tables.

**Manage permissions in hybrid access mode across AWS accounts**
+ [Sharing an AWS Glue resource using hybrid access mode](hybrid-access-mode-cross-account.md) – You're currently not using Lake Formation to manage permissions for a table but want to apply Lake Formation permissions to provide access for principals in another account.
+ [Sharing a Lake Formation resource using hybrid access mode](hybrid-access-mode-cross-account-IAM.md) – You're using Lake Formation to manage access for a table but want to provide access for principals in another account by using IAM permissions for AWS Glue and Amazon S3 on the same database and tables. 

**Setting up hybrid access mode – High-level steps**

1. Register the Amazon S3 data location with Lake Formation by selecting **Hybrid access mode**. 

1. Principals must have the `DATA_LOCATION_ACCESS` permission on a data lake location to create Data Catalog tables or databases that point to that location. 

1.  Set the **Cross-account version setting** to Version 4. 

1. Grant fine-grained permissions to specific IAM users or roles on databases and tables. At the same time, make sure to set `Super` or `All` permissions to the `IAMAllowedPrincipals` group on the database and all or selected tables in the database.

1. Opt in the principals and resources. Other principals in the account can continue accessing the databases and tables using IAM permission policies for AWS Glue and Amazon S3 actions.

1. Optionally clean up IAM permission policies for Amazon S3 for the principals that are opted in to use Lake Formation permissions.

# Prerequisites for setting up hybrid access mode
<a name="hybrid-access-prerequisites"></a>

The following are the prerequisites for setting up hybrid access mode: 

**Note**  
 We recommend that a Lake Formation administrator registers the Amazon S3 location in hybrid access mode, and opts in principals and resources. 

1. Grant data location permission (`DATA_LOCATION_ACCESS`) to create Data Catalog resources that point to the Amazon S3 locations. Data location permissions control the ability to create Data Catalog catalogs, databases, and tables that point to particular Amazon S3 locations. 

1. To share Data Catalog resources with another account in hybrid access mode (without removing `IAMAllowedPrincipals` group permissions from the resource), you need to update the **Cross account version settings** to Version 4 or higher. To update the version using the Lake Formation console, choose **Version 4** or **Version 5** under **Cross account version settings** on the **Data Catalog settings** page. 

   You can also use the `put-data-lake-settings` AWS CLI command to set the `CROSS_ACCOUNT_VERSION` parameter to version 4 or 5:

   ```
   aws lakeformation put-data-lake-settings --region us-east-1 --data-lake-settings file://settings

   # Contents of the settings file:
   {
       "DataLakeAdmins": [
           {
               "DataLakePrincipalIdentifier": "arn:aws:iam::<111122223333>:user/<user-name>"
           }
       ],
       "CreateDatabaseDefaultPermissions": [],
       "CreateTableDefaultPermissions": [],
       "Parameters": {
           "CROSS_ACCOUNT_VERSION": "5"
       }
   }
   ```

1.  To grant cross-account permissions in hybrid access mode, the grantor must have the required IAM permissions for the AWS Glue and AWS RAM services. The AWS managed policy `AWSLakeFormationCrossAccountManager` grants the required permissions. To enable cross-account data sharing in hybrid access mode, we've updated the `AWSLakeFormationCrossAccountManager` managed policy by adding two new IAM permissions:
   + `ram:ListResourceSharePermissions`
   + `ram:AssociateResourceSharePermission`
**Note**  
If you are not using the AWS managed policy for the grantor role, add the above permissions to your custom policy.
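If you manage the grantor role with a custom policy instead of the managed policy, a minimal IAM policy statement covering the two AWS RAM permissions above might look like the following. The `Sid` is illustrative, and you should scope `Resource` down as appropriate for your environment:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "HybridAccessModeRamPermissions",
            "Effect": "Allow",
            "Action": [
                "ram:ListResourceSharePermissions",
                "ram:AssociateResourceSharePermission"
            ],
            "Resource": "*"
        }
    ]
}
```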

## Amazon S3 bucket location and user access
<a name="w2aac11c34c21c15b9"></a>

When you create a catalog, database, or table in the AWS Glue Data Catalog, you can specify the Amazon S3 bucket location of the underlying data and register it with Lake Formation. The tables below describe how permissions work for AWS Glue and Lake Formation users (principals) based on the Amazon S3 data location of the table or database. 


**Amazon S3 location registered with Lake Formation**  

| Amazon S3 location of a database | AWS Glue users | Lake Formation users | 
| --- | --- | --- | 
|  Registered with Lake Formation (in hybrid access mode or in Lake Formation mode)  |  Have read/write access to the Amazon S3 data location through the `IAMAllowedPrincipals` group (`Super`) permissions.  | Can create tables based on their granted CREATE TABLE permission. | 
| No associated Amazon S3 location |  Require explicit DATA LOCATION permission for running CREATE TABLE and INSERT TABLE statements.  |  Require explicit DATA LOCATION permission for running CREATE TABLE and INSERT TABLE statements.  | 

**`IsRegisteredWithLakeFormation` table property**  
The `IsRegisteredWithLakeFormation` property of a table indicates whether the data location of the table is registered with Lake Formation for the requester. If the location is registered in Lake Formation mode, then the `IsRegisteredWithLakeFormation` property is `true` for all users accessing the data location, because all users are considered opted in for that table. If the location is registered in hybrid access mode, then the value is set to `true` only for users who have opted in for that table. 


**How `IsRegisteredWithLakeFormation` works**  

| Permission mode | Users/Roles |  `IsRegisteredWithLakeFormation`  | Description | 
| --- | --- | --- | --- | 
|  Lake Formation  | All | True |  When a location is registered with Lake Formation, the `IsRegisteredWithLakeFormation` property will be set to true for all users. This means that the permissions defined in Lake Formation apply to the registered location. Credential vending will be done by Lake Formation.  | 
| Hybrid access mode | Opted in | True |  For users who have opted in to using Lake Formation for data access and governance for a table, the `IsRegisteredWithLakeFormation` property will be set to `true` for that table. They are subject to the permission policies defined in Lake Formation for the registered location.  | 
| Hybrid access mode | Not opted in | False |  For users who have not opted in to using Lake Formation permissions, the `IsRegisteredWithLakeFormation` property is set to `false`. They are not subject to the permission policies defined in Lake Formation for the registered location. Instead, users will follow the Amazon S3 permissions policies.  | 
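The table above can be expressed as a small helper function. This is an illustrative sketch; the mode strings are placeholders, not actual Data Catalog values:

```python
def is_registered_with_lake_formation(permission_mode: str,
                                      opted_in: bool) -> bool:
    """Sketch: compute the IsRegisteredWithLakeFormation table property
    for a requester, following the table above."""
    if permission_mode == "lake-formation":
        return True       # all users are considered opted in
    if permission_mode == "hybrid":
        return opted_in   # true only for users who opted in for the table
    return False          # location not registered with Lake Formation
```

A user who has not opted in to a hybrid-mode location sees `False` and is governed by Amazon S3 permission policies instead of Lake Formation.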

# Converting an AWS Glue resource to a hybrid resource
<a name="hybrid-access-mode-new"></a>

Follow these steps to register an Amazon S3 location in hybrid access mode and onboard new Lake Formation users without interrupting the existing Data Catalog users' data access. 

Scenario description - The data location is not registered with Lake Formation, and users' access to the Data Catalog database and tables is determined by IAM permissions policies for Amazon S3 and AWS Glue actions.  The `IAMAllowedPrincipals` group by default has `Super` permissions on all tables in the database. 

**To enable hybrid access mode for a data location that is not registered with Lake Formation**

1. 

**Register an Amazon S3 location enabling hybrid access mode.**

------
#### [ Console ]

   1. Sign in to the [Lake Formation console](https://console.aws.amazon.com/lakeformation/) as a data lake administrator. 

   1. In the navigation pane, choose **Data lake locations** under **Administration**.

   1. Choose **Register location**.  
![\[Register location form for Amazon S3 data lake with path input, IAM role selection, and permission mode options.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/hybrid-access-register-s3.png)

   1. On the **Register location** window, choose the **Amazon S3** path that you want to register with Lake Formation. 

   1. For **IAM role**, choose either the `AWSServiceRoleForLakeFormationDataAccess` service-linked role (the default) or a custom IAM  role that meets the requirements in [Requirements for roles used to register locations](registration-role.md). 

   1. Choose **Hybrid access mode** to apply fine-grained Lake Formation access control policies to opted-in principals and Data Catalog databases and tables pointing to the registered location. 

      Alternatively, choose **Lake Formation** to have Lake Formation authorize all access requests to the registered location. 

   1. Choose **Register location**.

------
#### [ AWS CLI ]

   Following is an example of registering a data location with Lake Formation with `HybridAccessEnabled` set to `true` or `false`. The default value for the `HybridAccessEnabled` parameter is `false`. Replace the Amazon S3 path, role name, and AWS account ID with valid values.

   ```
   aws lakeformation register-resource --cli-input-json file://<file-path>

   # Contents of the input file:
   {
       "ResourceArn": "arn:aws:s3:::<s3-path>",
       "UseServiceLinkedRole": false,
       "RoleArn": "arn:aws:iam::<123456789012>:role/<role-name>",
       "HybridAccessEnabled": true
   }
   ```

------

1. 

**Grant permissions and opt in principals to use Lake Formation permissions for resources in hybrid access mode**

   Before you opt in principals and resources in hybrid access mode, verify that `Super` or `All` permission for the `IAMAllowedPrincipals` group exists on the databases and tables whose location is registered with Lake Formation in hybrid access mode.
**Note**  
You can't grant the `IAMAllowedPrincipals` group permission on `All tables` within a database. You need to select each table separately from the drop-down menu and grant permissions. Also, when you create new tables in the database, you can choose the `Use only IAM access control for new tables in new databases` option in the **Data Catalog Settings**. This option grants `Super` permission to the `IAMAllowedPrincipals` group automatically when you create new tables within the database. 

------
#### [ Console ]

   1. On the Lake Formation console, under **Data Catalog**, choose **Catalogs**, **Databases**, or **Tables**.

   1. Select a catalog, a database, or a table from the list, and choose **Grant** from the **Actions** menu.

   1. Choose principals to grant permissions on the database, tables, and columns using named resource method or LF-Tags.

      Alternatively, choose **Data permissions**, select the principals to grant permissions from the list, and choose **Grant**.

      For more details on granting data permissions, see [Granting permissions on Data Catalog resources](granting-catalog-permissions.md).
**Note**  
If you’re granting a principal the `Create table` permission, you also need to grant data location permissions (`DATA_LOCATION_ACCESS`) to the principal. This permission is not needed to update tables.  
For more information, see [Granting data location permissions](granting-location-permissions.md).

   1. When you use **Named resource method** to grant permissions, the option to opt in principals and resources is available on the lower section of the **Grant data permission** page. 

      Choose **Make Lake Formation permissions effective immediately** to enable Lake Formation permissions for the principals and resources.  
![\[The option to choose hybrid access mode for the Data Catalog resource.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/hybrid-access-grant-option.png)

   1. Choose **Grant**.

       When you opt in a principal on a table that points to a data location registered in hybrid access mode, that principal accesses the table's location using Lake Formation permissions. 

------
#### [ AWS CLI ]

   Following is an example of opting in a principal and a table in hybrid access mode. Replace the role name, AWS account ID, database name, and table name with valid values.

   ```
   aws lakeformation create-lake-formation-opt-in --cli-input-json file://<file-path>

   # Contents of the input file:
   {
       "Principal": {
           "DataLakePrincipalIdentifier": "arn:aws:iam::<123456789012>:role/<hybrid-access-role>"
       },
       "Resource": {
           "Table": {
               "CatalogId": "<123456789012>",
               "DatabaseName": "<hybrid_test>",
               "Name": "<hybrid_test_table>"
           }
       }
   }
   ```

------

   1. If you choose LF-Tags to grant permissions, you can opt in principals to use Lake Formation permissions in a separate step. You can do this by choosing **Hybrid access mode** under **Permissions** from the left navigation bar.

   1.  On the lower section of the **Hybrid access mode** page, choose **Add** to add resources and principals to hybrid access mode. 

   1.  On the **Add resources and principals** page, choose the catalogs, databases and tables registered in hybrid access mode. 

      You can choose `All tables` under a database to grant access.  
![\[The interface to add catalogs, databases, and tables in hybrid access mode.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/hybrid-access-opt-in.png)

   1. Choose the principals to opt in to use Lake Formation permissions in hybrid access mode.
      +  **Principals** – You can choose IAM users and roles in the same account or in another account. You can also choose SAML users and groups.
      + **Attributes** – Select attributes to grant permissions based on attributes.  
![\[The interface to add principals and resources with an attribute expression.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/abac-hybrid-access.png)
      + Enter the key-value pair to create a grant based on attributes. Review the Cedar policy expression on the console. For more information about Cedar, see [What is Cedar?](https://docs.cedarpolicy.com/) in the *Cedar Policy Language Reference Guide*.
      + Choose **Add**.

        All IAM roles/users with matching attributes are granted access.

   1. Choose **Add**.

# Converting a Lake Formation resource to a hybrid resource
<a name="hybrid-access-mode-update"></a>

In cases where you're currently using Lake Formation permissions for your Data Catalog databases and tables, you can edit the location registration properties to enable hybrid access mode. This allows you to provide new principals access to the same resources using IAM permission policies for Amazon S3 and AWS Glue actions without interrupting existing Lake Formation permissions.

 Scenario description - The following steps assume that you have a data location registered with Lake Formation, and that you've set up permissions for principals on databases, tables, or columns pointing to that location. If the location was registered with a service-linked role, you can’t update the location parameters to enable hybrid access mode. The `IAMAllowedPrincipals` group by default has `Super` permissions on the database and all its tables. 

**Important**  
Don’t update a location registration to hybrid access mode without opting in the principals that are accessing data in this location.

**Enabling hybrid access mode for a data location registered with Lake Formation**

1. 
**Warning**  
We don't recommend converting a Lake Formation managed data location to hybrid access mode, because doing so can interrupt the permission policies of other existing users or workloads.

   Opt in the existing principals who have Lake Formation permissions.

   1. List and review the permissions you’ve granted to principals on catalogs, databases and tables. For more information, see [Viewing database and table permissions in Lake Formation](viewing-permissions.md). 

   1. Choose **Hybrid access mode** under **Permissions** from the left navigation bar, and choose **Add**. 

   1. On the **Add principals and resources** page, choose the catalogs, databases, and tables from the Amazon S3 data location that you want to use in hybrid access mode. Choose the principals that already have Lake Formation permissions. 

   1.  Choose **Add** to opt in the principals to use Lake Formation permissions in hybrid access mode.

1.  Update the Amazon S3 bucket/prefix registration by choosing the **Hybrid access mode** option. 

------
#### [ Console ]

   1. Sign in to the Lake Formation console as the data lake administrator.

   1.  In the navigation pane, under **Register and Ingest**, choose **Data lake locations**.

   1. Select a location, and on the **Actions** menu, choose **Edit**.

   1. Choose **Hybrid access mode**. 

   1. Choose **Save**. 

   1. Under Data Catalog, select the database or table and grant `Super` or `All` permissions to the virtual group called `IAMAllowedPrincipals`. 

   1.  Verify that your existing Lake Formation users' access is not interrupted when you update the location registration properties. Sign in to the Athena console as a Lake Formation principal and run a sample query on a table that points to the updated location. 

      Similarly, verify the access of AWS Glue users who are using IAM permissions policies to access the database and tables.

------
#### [ AWS CLI ]

   Following is an example of updating a data location registration with `HybridAccessEnabled` set to `true` or `false`. The default value for the `HybridAccessEnabled` parameter is `false`. Replace the Amazon S3 path, role name, and AWS account ID with valid values.

   ```
   aws lakeformation update-resource --cli-input-json file://<file-path>

   # Contents of the input file:
   {
       "ResourceArn": "arn:aws:s3:::<s3-path>",
       "RoleArn": "arn:aws:iam::<123456789012>:role/<test>",
       "HybridAccessEnabled": true
   }
   ```

------

# Sharing an AWS Glue resource using hybrid access mode
<a name="hybrid-access-mode-cross-account"></a>

Share data with another AWS account or a principal in another AWS account enforcing Lake Formation permissions, without interrupting existing Data Catalog users' IAM-based access. 

Scenario description - The producer account has a Data Catalog database that has access controlled using IAM principal policies for Amazon S3 and AWS Glue actions. The data location of the database is not registered with Lake Formation. The `IAMAllowedPrincipals` group, by default, has `Super` permissions on the database and all its tables. 

**Granting cross-account Lake Formation permissions in hybrid access mode**

1. 

**Producer account set up**

   1. Sign in to the Lake Formation console using a role that has `lakeformation:PutDataLakeSettings` IAM permission.

   1. Go to **Data Catalog settings**, and choose `Version 4` for the **Cross account version settings**.

      If you're currently using version 1 or 2, see [Updating cross-account data sharing version settings](optimize-ram.md) for instructions on updating to version 3. 

      There are no permission policy changes required when upgrading from version 3 to 4.

   1. Register the Amazon S3 location of the database or table that you're planning to share in hybrid access mode.

   1. Verify that `Super` permission to the `IAMAllowedPrincipals` group exists on the databases and tables of which you registered the data location in hybrid access mode in the above step. 

   1. Grant Lake Formation permissions to AWS organizations, organizational units (OUs), or directly with an IAM principal in another account.

   1. If you're granting permissions directly to an IAM principal, opt in the principal from the consumer account to enforce Lake Formation permissions in hybrid access mode by enabling the option **Make Lake Formation permissions effective immediately**.

        If you're granting cross-account permissions to another AWS account, when you opt in the account, Lake Formation permissions are enforced only for the data lake administrators of that account. The data lake administrator in the recipient account needs to cascade the permissions down and opt in the principals in the account to enforce Lake Formation permissions for the shared resources that are in hybrid access mode.

      If you choose the **Resources matched by LF-Tags** option to grant cross-account permissions, you need to first complete the grant permissions step. You can then opt in principals and resources to hybrid access mode as a separate step by choosing **Hybrid access mode** under **Permissions** on the left navigation bar of the Lake Formation console. Then choose **Add** to add the resources and principals for which you want to enforce Lake Formation permissions. 

1. 

**Consumer account set up**

   1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as a data lake administrator.

   1. Go to [https://console.aws.amazon.com/ram/home](https://console.aws.amazon.com/ram/home), and accept the resource share invitation. The **Shared with me** tab in the AWS RAM console displays the database and tables that are shared with your account.

   1.  Create a resource link to the shared database and/or table in Lake Formation.

   1.  Grant `Describe` permission on resource link and `Grant on target` permission (on the original shared resource) to the IAM principals in your (consumer) account. 

   1.  Grant Lake Formation permissions on the database or table shared with you to the principals in your account. Opt in the principals and resources to enforce Lake Formation permissions in hybrid access mode by enabling the option **Make Lake Formation permissions effective immediately**.

   1.  Test the principal's Lake Formation permissions by running sample Athena queries. Test the existing access of your AWS Glue users with IAM principal policies for Amazon S3 and AWS Glue actions.

      (Optional) Remove the Amazon S3 bucket policy for data access and IAM principal policies for AWS Glue and Amazon S3 data access for the principals that you configured to use Lake Formation permissions.
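The resource link in the consumer setup above is created through the AWS Glue `CreateDatabase` API with a `TargetDatabase` pointing at the shared database. The following is a sketch of the CLI input; the resource link name, database name, and account ID are placeholders:

```
aws glue create-database --cli-input-json file://<file-path>

# Contents of the input file:
{
    "DatabaseInput": {
        "Name": "<shared_db_resource_link>",
        "TargetDatabase": {
            "CatalogId": "<producer-account-id>",
            "DatabaseName": "<shared_db>"
        }
    }
}
```

Queries in the consumer account reference the resource link name, while permission checks are evaluated against the target database in the producer account.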

# Sharing a Lake Formation resource using hybrid access mode
<a name="hybrid-access-mode-cross-account-IAM"></a>

Allow new Data Catalog users in an external account to access Data Catalog databases and tables using IAM based policies without interrupting the existing Lake Formation cross-account sharing permissions.

Scenario description - The producer account has Lake Formation managed database and tables that are shared with an external (consumer) account at account-level or IAM principal-level. The data location of the database is registered with Lake Formation. The `IAMAllowedPrincipals` group does not have `Super` permissions on the database and its tables. 

**Granting cross-account access to new Data Catalog users via IAM based policies without interrupting existing Lake Formation permissions**

1. 

**Producer account set up**

   1. Sign in to the Lake Formation console using a role that has the `lakeformation:PutDataLakeSettings` IAM permission. 

   1. Under **Data Catalog settings**, choose `Version 4` for the **Cross account version settings**.

      If you're currently using version 1 or 2, see [Updating cross-account data sharing version settings](optimize-ram.md) for instructions on updating to version 3. 

      There are no permission policy changes required to upgrade from version 3 to 4.

   1. List the permissions you’ve granted to principals on databases and tables. For more information, see [Viewing database and table permissions in Lake Formation](viewing-permissions.md). 

   1.  Regrant existing Lake Formation cross-account permissions by opting in principals and resources.
**Note**  
Before updating a data location registration to hybrid access mode to grant cross-account permissions, you need to regrant at least one cross-account data share per account. This step is necessary to update the AWS RAM managed permissions attached to the AWS RAM resource share.  
In July 2023, Lake Formation updated the AWS RAM managed permissions used for sharing databases and tables:  
`arn:aws:ram::aws:permission/AWSRAMLFEnabledGlueAllTablesReadWriteForDatabase` (database-level share policy)
`arn:aws:ram::aws:permission/AWSRAMLFEnabledGlueTableReadWrite` (table-level share policy) 
The cross-account permission grants made before July 2023 don't have these updated AWS RAM permissions.   
If you've granted cross-account permissions directly to principals, you need to individually regrant those permissions to the principals. If you skip this step, the principals accessing the shared resource might get an illegal combination error. 

   1. Go to [https://console.aws.amazon.com/ram/home](https://console.aws.amazon.com/ram/home). 

   1. The **Shared by me** tab in the AWS RAM console displays the database and table names that you've shared with an external account or principal.

        Ensure that the permissions attached to the shared resource have the correct ARNs. 

   1. Verify that the resources in the AWS RAM share are in `Associated` status. If the status shows as `Associating`, wait until they go into the `Associated` state. If the status becomes `Failed`, stop and contact the Lake Formation service team. 

   1. Choose **Hybrid access mode** under **Permissions** from the left navigation bar, and choose **Add**. 

   1.  The **Add principals and resources** page shows the databases, and/or tables and the principals that have access. You can make the required updates by adding or removing principals and resources.

   1.  Choose the principals with Lake Formation permissions for the database and tables that you want to change to hybrid access mode. Choose the databases and tables. 

   1.  Choose **Add** to opt in the principals to enforce Lake Formation permissions in hybrid access mode.

   1.  Grant `Super` permission to the virtual group `IAMAllowedPrincipals` on your database and selected tables. 

   1. Edit the Amazon S3 location Lake Formation registration to hybrid access mode.

   1. Grant permissions for the AWS Glue users in the external (consumer) account using IAM permission policies for Amazon S3 and AWS Glue actions. 

1. 

**Consumer account set up**

   1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/) as a data lake administrator. 

   1. Go to [https://console.aws.amazon.com/ram/home](https://console.aws.amazon.com/ram/home) and accept the resource share invitation. The **Resources shared with me** tab in the AWS RAM console displays the database and table names that are shared with your account.

       For the AWS RAM share, ensure that the attached permission has the correct ARN. Verify that the resources in the AWS RAM share are in `Associated` status. If the status shows as `Associating`, wait until they go into the `Associated` state. If the status becomes `Failed`, stop and contact the Lake Formation service team. 

   1.  Create a resource link to the shared database or table in Lake Formation.

   1.  Grant `Describe` permission on the resource link and `Grant on target` permission (on the original shared resource) to the IAM principals in your (consumer) account. 

   1. Next, set up Lake Formation permissions for principals in your account on the shared database or table.

      On the left navigation bar, under **Permissions**, choose **Hybrid access mode**.

   1.  Choose **Add** in the lower section of the **Hybrid access mode** page to opt in the principals and the database or table shared with you from the producer account.

   1.  Grant permissions to the AWS Glue users in your account using IAM permission policies for Amazon S3 and AWS Glue actions. 

   1.  Test users' Lake Formation permissions and AWS Glue permissions by running separate sample queries on the table using Athena.

      (Optional) Clean up IAM permission policies for Amazon S3 for the principals that are in the hybrid access mode.
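If you prefer to script the opt-in steps, the following sketch builds the request payload that the `CreateLakeFormationOptIn` API operation expects. The role, database, and table names are hypothetical placeholders, and the boto3 call is shown commented out because it requires data lake administrator credentials.

```python
def build_opt_in_request(principal_arn: str, catalog_id: str,
                         database: str, table: str) -> dict:
    """Build the request payload for the CreateLakeFormationOptIn operation."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "Table": {
                "CatalogId": catalog_id,
                "DatabaseName": database,
                "Name": table,
            }
        },
    }

# Hypothetical principal and resource names.
request = build_opt_in_request(
    "arn:aws:iam::111122223333:role/DataAnalyst",
    "111122223333",
    "retail_db",
    "orders",
)

# With boto3 configured as a data lake administrator, the call would be:
# import boto3
# boto3.client("lakeformation").create_lake_formation_opt_in(**request)
```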

# Removing principals and resources from hybrid access mode
<a name="delete-hybrid-access"></a>

 Follow these steps to remove databases, tables, and principals from hybrid access mode. 

------
#### [ Console ]

1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. Under **Permissions**, choose **Hybrid access mode**.

1.  On the **Hybrid access mode** page, select the checkbox next to the database or table name and choose **Remove**. 

1. A warning message prompts you to confirm the action. Choose **Remove**.

   Lake Formation no longer enforces permissions on those resources; access to them is controlled by IAM and AWS Glue permissions instead. Users who don't have the appropriate IAM permissions may lose access to the resources. 

------
#### [ AWS CLI ]

 The following example shows how to remove resources from hybrid access mode. 

```
aws lakeformation delete-lake-formation-opt-in --cli-input-json file://<file path>

json:
{
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:iam::<account-id>:role/<role name>"
    },
    "Resource": {
        "Table": {
            "CatalogId": "<account-id>",
            "DatabaseName": "<database name>",
            "Name": "<table name>"
        }
    }
}
```
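To generate the `--cli-input-json` payload programmatically, a minimal sketch (the account ID, role, database, and table names are hypothetical placeholders):

```python
import json

# Payload for delete-lake-formation-opt-in.
opt_in_delete = {
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:iam::111122223333:role/DataAnalyst"
    },
    "Resource": {
        "Table": {
            "CatalogId": "111122223333",
            "DatabaseName": "retail_db",
            "Name": "orders",
        }
    },
}

# Save this JSON to a file, then run:
#   aws lakeformation delete-lake-formation-opt-in --cli-input-json file://opt-in.json
payload = json.dumps(opt_in_delete, indent=4)
```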

------

# Viewing principals and resources in hybrid access mode
<a name="view-hybrid-access"></a>

 Follow these steps to view databases, tables, and principals in hybrid access mode. 

------
#### [ Console ]

1. Sign in to the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. Under **Permissions**, choose **Hybrid access mode**.

1.  The **Hybrid access mode** page shows the resources and principals that are currently in hybrid access mode. 

------
#### [ AWS CLI ]

 The following example shows how to list all principals and resources that are opted in to hybrid access mode. 

```
aws lakeformation list-lake-formation-opt-ins
```

 The following example shows how to list the opt-in status for a specific principal-resource pair.

```
aws lakeformation list-lake-formation-opt-ins --cli-input-json file://<file path>

json:
{
    "Principal": {
        "DataLakePrincipalIdentifier": "arn:aws:iam::<account-id>:role/<role name>"
    },
    "Resource": {
        "Table": {
            "CatalogId": "<account-id>",
            "DatabaseName": "<database name>",
            "Name": "<table name>"
          }
    }
}
```

------

# Additional resources
<a name="additional-resources-hybrid"></a>

The following blog post walks you through onboarding Lake Formation permissions in hybrid access mode for selected users while the database remains accessible to other users through IAM and Amazon S3 permissions. It covers setting up hybrid access mode within a single AWS account and between two accounts. 
+ [ Introducing hybrid access mode for AWS Glue Data Catalog to secure access using Lake Formation and IAM and Amazon S3 policies. ](https://aws.amazon.com/blogs/big-data/introducing-hybrid-access-mode-for-aws-glue-data-catalog-to-secure-access-using-aws-lake-formation-and-iam-and-amazon-s3-policies/)

# Creating objects in the AWS Glue Data Catalog
<a name="populating-catalog"></a>

AWS Lake Formation uses the AWS Glue Data Catalog (Data Catalog) to store metadata about data lakes, data sources, transforms, and targets. Metadata is data about the underlying data in your dataset. Each AWS account has one Data Catalog per AWS Region.

Metadata in the Data Catalog is organized in a three-level data hierarchy comprising catalogs, databases, and tables. It organizes data from various sources into logical containers called catalogs. Each catalog represents data from sources like Amazon Redshift data warehouses, Amazon DynamoDB databases, and third-party data sources such as Snowflake, MySQL, and over 30 external data sources, which are integrated through federated connectors. You can also create new catalogs in the Data Catalog to store data in S3 Table Buckets or Redshift Managed Storage (RMS).

Tables store information about the underlying data, including schema information, partition information, and data location. Databases are collections of tables. The Data Catalog also contains resource links, which are links to shared catalogs, databases and tables in external accounts, and are used for cross-account access to data in the data lake.

The Data Catalog is a nested catalog object that contains catalogs, databases and tables. It is referenced by the AWS account ID, and is the default catalog in an account and an AWS Region. The Data Catalog uses a three-level hierarchy (catalog.database.table) to organize tables. 
+ Catalog – The top-most level of Data Catalog’s three level metadata hierarchy. You can add multiple catalogs in a Data Catalog through federation.
+ Database – The second level of the metadata hierarchy, comprising tables and views. A database is also referred to as a schema in many data systems, such as Amazon Redshift and Trino.
+ Table and view – The third level of the Data Catalog's three-level data hierarchy.

All Iceberg tables in Amazon S3 are stored in the default Data Catalog, whose catalog ID is the AWS account ID. You can create federated catalogs in the AWS Glue Data Catalog that store definitions of tables in Amazon Redshift, Amazon S3 table storage, or other third-party data sources through federation. 
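As a small illustration of the `catalog.database.table` convention described above, a fully qualified table reference can be composed as follows (all names are hypothetical placeholders):

```python
def qualified_name(catalog: str, database: str, table: str) -> str:
    """Compose the three-level Data Catalog identifier: catalog.database.table."""
    return ".".join([catalog, database, table])

# For the default Data Catalog, the catalog ID is the AWS account ID.
name = qualified_name("111122223333", "retail_db", "orders")
```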

**Topics**
+ [Creating a catalog](creating-catalog.md)
+ [Creating a database](creating-database.md)
+ [Creating tables](creating-tables.md)
+ [Building AWS Glue Data Catalog views](working-with-views.md)

# Creating a catalog
<a name="creating-catalog"></a>

Catalogs represent the highest or top-most level in the three-level metadata hierarchy of the AWS Glue Data Catalog. You can use multiple methods to bring data into the Data Catalog and create multi-level catalogs. 

 For more information on creating catalogs from external data sources, see [Bringing your data into the AWS Glue Data Catalog](bring-your-data-overview.md). 

 To create a catalog using the Lake Formation console, you must be signed in as a data lake administrator or a *catalog creator*. A catalog creator is a principal who has been granted the Lake Formation `CREATE_CATALOG` permission. You can see a list of catalog creators on the **Administrative roles and tasks** page of the Lake Formation console. To view this list, you must have the `lakeformation:ListPermissions` IAM permission and be signed in as a data lake administrator or as a catalog creator with the grant option on the `CREATE_CATALOG` permission.

# Creating a database
<a name="creating-database"></a>

Metadata tables in the Data Catalog are stored within databases. You can create as many databases as you need, and you can grant different Lake Formation permissions on each database.

Databases can have an optional location property. This location is typically within an Amazon Simple Storage Service (Amazon S3) location that is registered with Lake Formation. When you specify a location, principals do not need data location permissions to create Data Catalog tables that point to locations within the database location. For more information, see [Underlying data access control](access-control-underlying-data.md#data-location-permissions).

To create a database using the Lake Formation console, you must be signed in as a data lake administrator or *database creator*. A database creator is a principal who has been granted the Lake Formation `CREATE_DATABASE` permission. You can see a list of database creators on the **Administrative roles and tasks** page of the Lake Formation console. To view this list, you must have the `lakeformation:ListPermissions` IAM permission and be signed in as a data lake administrator or as a database creator with the grant option on the `CREATE_DATABASE` permission.

**To create a database**

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/), and sign in as a data lake administrator or database creator.

1. In the navigation pane, under **Data catalog**, choose **Databases**.

1. Choose **Create database**.

1. In the **Create database** dialog box, enter a database name, optional location, and optional description.

1. Optionally select **Use only IAM access control for new tables in this database**.

   For information about this option, see [Changing the default settings for your data lake](change-settings.md).

1. Choose **Create database**.
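The console steps above map to the AWS Glue `CreateDatabase` API operation. A minimal sketch follows; the names and location are hypothetical placeholders, and the boto3 call is commented out because it requires credentials and database-creator permissions.

```python
# DatabaseInput for the AWS Glue CreateDatabase operation.
database_input = {
    "Name": "retail_db",
    "Description": "Sales and order data",
    # Optional: an Amazon S3 location, ideally one registered with Lake Formation.
    "LocationUri": "s3://amzn-s3-demo-bucket/retail/",
}

# import boto3
# boto3.client("glue").create_database(DatabaseInput=database_input)
```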

# Creating tables
<a name="creating-tables"></a>

AWS Lake Formation metadata tables contain information about data in the data lake, including schema information, partition information, and data location. These tables are stored in the AWS Glue Data Catalog. You use them to access underlying data in the data lake and manage that data with Lake Formation permissions. Tables are stored within databases in the Data Catalog.

There are several ways to create Data Catalog tables:
+ Run a crawler in AWS Glue. See [Defining crawlers](https://docs.aws.amazon.com/glue/latest/dg/add-crawler.html) in the *AWS Glue Developer Guide*.
+ Create and run a workflow. See [Importing data using workflows in Lake Formation](workflows.md).
+ Create a table manually using the Lake Formation console, AWS Glue API, or AWS Command Line Interface (AWS CLI).
+ Create a table using Amazon Athena.
+ Create a resource link to a table in an external account. See [Creating resource links](creating-resource-links.md).
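For the manual route through the AWS Glue API, the following sketch shows a minimal `TableInput` payload. The table name, columns, and location are hypothetical, and the input format and SerDe shown are common choices for delimited text data, not requirements; the boto3 call is commented out because it requires credentials.

```python
table_input = {
    "Name": "orders",
    "StorageDescriptor": {
        "Columns": [
            {"Name": "order_id", "Type": "int"},
            {"Name": "status", "Type": "string"},
        ],
        "Location": "s3://amzn-s3-demo-bucket/orders/",
        # Assumed format settings for CSV-style data:
        "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
        "SerdeInfo": {
            "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"
        },
    },
}

# import boto3
# boto3.client("glue").create_table(DatabaseName="retail_db", TableInput=table_input)
```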

# Creating Apache Iceberg tables
<a name="creating-iceberg-tables"></a>

 AWS Lake Formation supports creating Apache Iceberg tables that use the Apache Parquet data format in the AWS Glue Data Catalog with data residing in Amazon S3. A table in the Data Catalog is the metadata definition that represents the data in a data store. By default, Lake Formation creates Iceberg v2 tables. For the difference between v1 and v2 tables, see [Format version changes](https://iceberg.apache.org/spec/#appendix-e-format-version-changes) in the Apache Iceberg documentation.

 [Apache Iceberg](https://iceberg.apache.org/) is an open table format for very large analytic datasets. Iceberg allows for easy changes to your schema, also known as schema evolution, meaning that users can add, rename, or remove columns from a data table without disrupting the underlying data. Iceberg also supports data versioning, which allows users to track changes to data over time. This enables the time travel feature, which lets users access and query historical versions of data and analyze changes between updates and deletes.

You can use the Lake Formation console or the `CreateTable` operation in the AWS Glue API to create an Iceberg table in the Data Catalog. For more information, see [CreateTable action (Python: create_table)](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-catalog-tables.html#aws-glue-api-catalog-tables-CreateTable).

When you create an Iceberg table in the Data Catalog, you must specify the table format and metadata file path in Amazon S3 to be able to perform reads and writes.

 You can use Lake Formation to secure your Iceberg table using fine-grained access control permissions when you register the Amazon S3 data location with AWS Lake Formation. For source data in Amazon S3 and metadata that is not registered with Lake Formation, access is determined by IAM permissions policies for Amazon S3 and AWS Glue actions. For more information, see [Managing Lake Formation permissions](managing-permissions.md). 

**Note**  
The Data Catalog doesn't support creating partitions or adding Iceberg table properties.

**Topics**
+ [Prerequisites](#iceberg-prerequisites)
+ [Creating an Iceberg table](#create-iceberg-table)

## Prerequisites
<a name="iceberg-prerequisites"></a>

 To create Iceberg tables in the Data Catalog and set up Lake Formation data access permissions, complete the following requirements: 

1. 

**Permissions required to create Iceberg tables without the data registered with Lake Formation.**

   In addition to the permissions required to create a table in the Data Catalog, the table creator requires the following permissions:
   + `s3:PutObject` on resource arn:aws:s3:::<bucket-name>
   + `s3:GetObject` on resource arn:aws:s3:::<bucket-name>
   + `s3:DeleteObject` on resource arn:aws:s3:::<bucket-name>

1. 

**Permissions required to create Iceberg tables with data registered with Lake Formation:**

   To use Lake Formation to manage and secure the data in your data lake, register your Amazon S3 location that has the data for tables with Lake Formation. This is so that Lake Formation can vend credentials to AWS analytical services such as Athena, Redshift Spectrum, and Amazon EMR to access data. For more information on registering an Amazon S3 location, see [Adding an Amazon S3 location to your data lake](register-data-lake.md). 

   A principal who reads and writes the underlying data that is registered with Lake Formation requires the following permissions:
   + `lakeformation:GetDataAccess`
   + `DATA_LOCATION_ACCESS`

     A principal who has data location permissions on a location also has location permissions on all child locations.

     For more information on data location permissions, see [Underlying data access control](access-control-underlying-data.md).

 To enable compaction, the service needs to assume an IAM role that has permissions to update tables in the Data Catalog. For details, see [Table optimization prerequisites](https://docs.aws.amazon.com/glue/latest/dg/optimization-prerequisites.html). 
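Expressed as an IAM policy document, the three Amazon S3 permissions in the first prerequisite could look like the following sketch. The bucket name is a hypothetical placeholder; scope the `Resource` to your own bucket or prefix.

```python
import json

s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
            # Hypothetical bucket; narrow this to the table's data prefix.
            "Resource": "arn:aws:s3:::amzn-s3-demo-bucket/*",
        }
    ],
}
print(json.dumps(s3_policy, indent=2))
```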

## Creating an Iceberg table
<a name="create-iceberg-table"></a>

You can create Iceberg v1 and v2 tables using the Lake Formation console or the AWS Command Line Interface, as documented on this page. You can also create Iceberg tables using the AWS Glue console or an AWS Glue crawler. For more information, see [Data Catalog and Crawlers](https://docs.aws.amazon.com/glue/latest/dg/catalog-and-crawler.html) in the *AWS Glue Developer Guide*.

**To create an Iceberg table**

------
#### [ Console ]

1. Sign in to the AWS Management Console, and open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. Under Data Catalog, choose **Tables**, and use the **Create table** button to specify the following attributes:
   + **Table name**: Enter a name for the table. If you’re using Athena to access tables, use these [naming tips](https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html) in the Amazon Athena User Guide.
   + **Database**: Choose an existing database or create a new one.
   + **Description**: Optionally, enter a description to help you understand the contents of the table.
   + **Table format**: For **Table format**, choose Apache Iceberg.  
![\[Apache Iceberg table option selected with table optimization options.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/table-optimization.png)
   + **Table optimization**
     + **Compaction** – Data files are merged and rewritten to remove obsolete data and consolidate fragmented data into larger, more efficient files.
     + **Snapshot retention** – Snapshots are timestamped versions of an Iceberg table. Snapshot retention configurations allow customers to enforce how long to retain snapshots and how many snapshots to retain. Configuring a snapshot retention optimizer can help manage storage overhead by removing older, unnecessary snapshots and their associated underlying files.
     + **Orphan file deletion** – Orphan files are files that are no longer referenced by the Iceberg table metadata. These files can accumulate over time, especially after operations like table deletions or failed ETL jobs. Enabling orphan file deletion allows AWS Glue to periodically identify and remove these unnecessary files, freeing up storage.

     For more information, see [Optimizing Iceberg tables](https://docs.aws.amazon.com/glue/latest/dg/table-optimizers.html).
   + **IAM role**: To run compaction, the service assumes an IAM role on your behalf. You can choose an IAM role using the drop-down. Ensure that the role has the permissions required to enable compaction.

     To learn more about the required permissions, see [Table optimization prerequisites](https://docs.aws.amazon.com/glue/latest/dg/optimization-prerequisites.html).
   + **Location**: Specify the Amazon S3 path to the folder that stores the table metadata. Iceberg needs a metadata file and location in the Data Catalog to perform reads and writes.
   + **Schema**: Choose **Add columns** to add columns and data types of the columns. You have the option to create an empty table and update the schema later. Data Catalog supports Hive data types. For more information, see [Hive data types](https://cwiki.apache.org/confluence/plugins/servlet/mobile?contentId=27838462#content/view/27838462). 

      Iceberg allows you to evolve schema and partition after you create the table. You can use [Athena queries](https://docs.aws.amazon.com/athena/latest/ug/querying-iceberg-evolving-table-schema.html) to update the table schema and [Spark queries](https://iceberg.apache.org/docs/latest/spark-ddl/#alter-table-sql-extensions) for updating partitions. 

------
#### [ AWS CLI ]

```
aws glue create-table \
    --database-name iceberg-db \
    --region us-west-2 \
    --open-table-format-input '{
      "IcebergInput": { 
           "MetadataOperation": "CREATE",
           "Version": "2"
         }
      }' \
    --table-input '{"Name":"test-iceberg-input-demo",
            "TableType": "EXTERNAL_TABLE",
            "StorageDescriptor":{ 
               "Columns":[ 
                   {"Name":"col1", "Type":"int"}, 
                   {"Name":"col2", "Type":"int"}, 
                   {"Name":"col3", "Type":"string"}
                ], 
               "Location":"s3://DOC_EXAMPLE_BUCKET_ICEBERG/"
            }
        }'
```
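The same request can be expressed in Python with boto3, mirroring the CLI example above (the call itself is commented out because it requires AWS credentials):

```python
open_table_format_input = {
    "IcebergInput": {"MetadataOperation": "CREATE", "Version": "2"}
}
table_input = {
    "Name": "test-iceberg-input-demo",
    "TableType": "EXTERNAL_TABLE",
    "StorageDescriptor": {
        "Columns": [
            {"Name": "col1", "Type": "int"},
            {"Name": "col2", "Type": "int"},
            {"Name": "col3", "Type": "string"},
        ],
        "Location": "s3://DOC_EXAMPLE_BUCKET_ICEBERG/",
    },
}

# import boto3
# boto3.client("glue", region_name="us-west-2").create_table(
#     DatabaseName="iceberg-db",
#     TableInput=table_input,
#     OpenTableFormatInput=open_table_format_input,
# )
```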

------

# Optimizing Iceberg tables
<a name="data-compaction"></a>

Lake Formation supports multiple table optimization options to enhance the management and performance of Apache Iceberg tables used by the AWS analytical engines and ETL jobs. These optimizers provide efficient storage utilization, improved query performance, and effective data management. There are three types of table optimizers available in Lake Formation: 
+ **Compaction** – Data compaction compacts small data files to reduce storage usage and improve read performance. Data files are merged and rewritten to remove obsolete data and consolidate fragmented data into larger, more efficient files. Compaction can be configured to run automatically or be triggered manually as needed. 
+ **Snapshot retention** – Snapshots are timestamped versions of an Iceberg table. Snapshot retention configurations allow customers to enforce how long to retain snapshots and how many snapshots to retain. Configuring a snapshot retention optimizer can help manage storage overhead by removing older, unnecessary snapshots and their associated underlying files.
+ **Orphan file deletion** – Orphan files are files that are no longer referenced by the Iceberg table metadata. These files can accumulate over time, especially after operations like table deletions or failed ETL jobs. Enabling orphan file deletion allows AWS Glue to periodically identify and remove these unnecessary files, freeing up storage.

You can enable or disable compaction, snapshot retention, and orphan file deletion optimizers for individual Iceberg tables in the Data Catalog using the AWS Glue console, AWS CLI, or AWS Glue API operations.

For more information, see [Optimizing Iceberg tables](https://docs.aws.amazon.com/glue/latest/dg/table-optimizers.html) in the AWS Glue Developer Guide.

# Searching for tables
<a name="searching-for-tables"></a>

You can use the AWS Lake Formation console to search for Data Catalog tables by name, location, containing database, and more. The search results show only the tables that you have Lake Formation permissions on.

**To search for tables (console)**

1. Sign in to the AWS Management Console and open the Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/).

1. In the navigation pane, choose **Tables**.

1. Position the cursor in the search field at the top of the page. The field has the placeholder text *Find table by properties*.

   The **Properties** menu appears, showing the various table properties to search by.  
![\[The properties menu is dropped down from the search field and contains these entries: Name, Classification, Database, Location, Catalog ID\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/search-for-tables.png)

1. Do one of the following:
   + Search by containing database.

     1. Choose **Database** from the **Properties** menu, and then either choose a database from the **Databases** menu that appears or type a database name and press **Enter**.

        The tables that you have permissions on in the database are listed.

     1. (Optional) To narrow down the list to a single table in the database, position the cursor in the search field again, choose **Name** from the **Properties** menu, and either choose a table name from the **Tables** menu that appears or type a table name and press **Enter**.

        The single table is listed, and both the database name and table name appear as tiles under the search field.  
![\[Beneath the search field are two tiles: one labeled Database, which includes the selected database name, and one labeled Table, which includes the selected table name. To the right of the tiles is a Clear filter button.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/search-for-tables-with-filter.png)

        To adjust the filter, close either of the tiles or choose **Clear filter**.
   + Search by other properties.

     1. Choose a search property from the **Properties** menu.

        To search by AWS account ID, choose **Catalog ID** from the **Properties** menu, enter a valid AWS account ID (for example, 111122223333), and press **Enter**.

        To search by location, choose **Location** from the **Properties** menu, and select a location from the **Locations** menu that appears. All tables in the root location of the selected location (for example, Amazon S3) are returned.

**Searching tables using AWS CLI**
+ The following example shows how to run a partial search. The `--search-text` parameter allows you to search for tables that contain the specified text in their metadata. In this case, it returns all tables that have "customer" in their name, description, or other metadata fields.

  ```
  aws glue search-tables \
        --search-text "customer" \
        --region <AWS Region> \
        --max-results 10 \
        --sort-criteria "FieldName=Name,Sort=ASC"
  ```
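The equivalent boto3 parameters for the `SearchTables` operation would look like the following sketch (the call is commented out because it requires AWS credentials):

```python
search_params = {
    "SearchText": "customer",
    "MaxResults": 10,
    "SortCriteria": [{"FieldName": "Name", "Sort": "ASC"}],
}

# import boto3
# response = boto3.client("glue").search_tables(**search_params)
# for t in response["TableList"]:
#     print(t["DatabaseName"], t["Name"])
```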

# Sharing Data Catalog tables and databases across AWS Accounts
<a name="sharing-catalog-resources"></a>

You can share Data Catalog resources (databases and tables) with external AWS accounts by granting Lake Formation permissions on the resources to the external accounts. Users can then run queries and jobs that join and query tables across multiple accounts. With some restrictions, when you share a Data Catalog resource with another account, principals in that account can operate on that resource as if the resource were in their Data Catalog.

You don't share resources with specific principals in external AWS accounts—you share the resources with an AWS account or organization. When you share a resource with an AWS organization, you're sharing the resource with all accounts at all levels in that organization. The data lake administrator in each external account must then grant permissions on the shared resources to principals in their account.

For more information, see [Cross-account data sharing in Lake Formation](cross-account-permissions.md) and [Granting permissions on Data Catalog resources](granting-catalog-permissions.md).

**See Also:**  
[Accessing and viewing shared Data Catalog tables and databases](viewing-shared-resources.md)
[Prerequisites](cross-account-prereqs.md)

# Building AWS Glue Data Catalog views
<a name="working-with-views"></a>

In the AWS Glue Data Catalog, a *view* is a virtual table whose contents are defined by a SQL query that references one or more tables. You can create a Data Catalog view that references up to 10 tables using the SQL editors for Amazon Athena or Amazon Redshift, or using Apache Spark on EMR Serverless or AWS Glue version 5.0. The underlying reference tables for a view can belong to the same database or to different databases within the same AWS account's Data Catalog.

You can reference standard AWS Glue tables and tables in open table formats (OTF) such as [Apache Hudi](https://hudi.incubator.apache.org/), Linux Foundation [Delta Lake](https://delta.io/), and [Apache Iceberg](https://iceberg.apache.org/), with underlying data stored in Amazon S3 locations registered with AWS Lake Formation. Additionally, you can create views from federated tables from Amazon Redshift datashares that are shared with Lake Formation. 

## Differentiating Data Catalog views from other view types
<a name="diff-views"></a>

Data Catalog views differ from Apache Hive, Apache Spark, and Amazon Athena views. The Data Catalog view is a native feature of the AWS Glue Data Catalog, and is a multi-dialect, definer-created view. You can create a Data Catalog view using one of the supported analytics services, such as Athena or Amazon Redshift Spectrum, and access the same view using other supported analytics services. In contrast, Apache Hive, Apache Spark, and Athena views are created independently in each analytics service and are visible and accessible only within that service.

## What is a definer view?
<a name="definer-view"></a>

 A definer view is a SQL view that operates based on the permissions of the principal that created it. The definer role has the necessary permissions to access the referenced tables, and it runs the SQL statement that defines the view. The definer creates the view and shares it with other users through AWS Lake Formation's fine-grained access control. 

When a user queries the definer view, the query engine uses the definer role's permissions to access the underlying reference tables. This approach enables users to interact with the view without requiring direct access to the source tables, enhancing security and simplifying data access management.

To set up a definer view, the definer IAM role can be within the same AWS account as the base tables, or in a different account using cross-account definer roles. For more information about the permissions required for the definer role, see [Prerequisites for creating views](views-prereqs.md). 

## A framework for multi-dialect views
<a name="multi-dialect"></a>

The Data Catalog supports creating views using multiple structured query language (SQL) dialects. SQL is a language used for storing and processing information in a relational database, and each AWS analytical engine uses its own variation of SQL, or SQL dialect.

You create a Data Catalog view in one SQL dialect using one of the supported analytics query engines. Subsequently, you can update the view using the `ALTER VIEW` statement in a different SQL dialect from any other supported analytics engine. However, each dialect must reference the same set of tables, columns, and data types.

You can access the multiple dialects available for a view using the `GetTable` API operation, the AWS CLI, or the AWS Management Console. Thus, the Data Catalog view is visible and available to query across the supported analytics engines.

By defining a common view schema and metadata object that you can query from multiple engines, Data Catalog views enable you to use uniform views across your data lake.
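Because the dialects are stored on the view's table object, you can enumerate them from the `GetTable` response. The following is a sketch that assumes the `ViewDefinition`/`Representations` shape of the AWS Glue `GetTable` response; the database and view names are hypothetical, and the boto3 call is commented out because it requires credentials.

```python
def list_view_dialects(table: dict) -> list:
    """Return the SQL dialects stored for a Data Catalog view."""
    view_def = table.get("ViewDefinition") or {}
    return [rep["Dialect"] for rep in view_def.get("Representations", [])]

# import boto3
# table = boto3.client("glue").get_table(
#     DatabaseName="reporting_db", Name="sales_view")["Table"]
# print(list_view_dialects(table))

# A sample response fragment in the assumed shape:
sample_table = {
    "ViewDefinition": {
        "Representations": [
            {"Dialect": "ATHENA", "DialectVersion": "3"},
            {"Dialect": "REDSHIFT", "DialectVersion": "1.0"},
        ]
    }
}
```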

For more details on how the schema is resolved for each dialect, see, [link to the API reference](). For more details on the matching rules for different types, see, [link to the relevant section in the API doc]().

## Integrating with Lake Formation permissions
<a name="lf-view-integ"></a>

You can use AWS Lake Formation to centralize permissions management on AWS Glue Data Catalog views for users. You can grant fine-grained permissions on the Data Catalog views using the named resource method or LF-Tags, and share them across AWS accounts, AWS organizations, and organizational units. You can also share and access the Data Catalog views across AWS Regions using resource links. This lets you provide data access without duplicating the data source or sharing the underlying tables.

The `CREATE VIEW` DDL statement of a Data Catalog view can reference standard AWS Glue tables and tables in open table formats (OTF) such as Hudi, Delta Lake, and Iceberg, with underlying data stored in Amazon S3 locations registered with Lake Formation. It can also reference federated tables from an Amazon Redshift datashare that is shared with Lake Formation. The tables can be of any file format, as long as the engine used to query the view supports that format. A view can also reference the built-in functions of the engine on which it runs, but other engine-specific resources may not be allowed. For more details, see [Data Catalog views considerations and limitations](views-notes.md).

## Use cases
<a name="views-use-cases"></a>

The following are important use cases for Data Catalog views:
+ Create and manage permissions on a single view schema. This helps you avoid the risk of inconsistent permissions on duplicate views created in multiple engines.
+ Grant permissions to users on a view that references multiple tables without granting permissions directly on the underlying reference tables.
+ Achieve row level filtering on tables using LF-Tags (where LF-Tags cascade only up to column level) by applying LF-Tags on views and granting LF-Tags based permissions to users. 

## Supported AWS analytics services for views
<a name="views-supported-engines"></a>

The following AWS analytics services support creating Data Catalog views:
+ Amazon Redshift
+ Amazon Athena version 3
+ Apache Spark on EMR Serverless
+ Apache Spark on AWS Glue version 5.0

## Additional resources
<a name="views-addtional-resources"></a>

You can learn more about the Data Catalog in this guide, as well as by using the following resources:

The following video demonstrates how to create views and query them from Athena and Amazon Redshift.

[![AWS Videos](http://img.youtube.com/vi/rFO2OoxVYxE/0.jpg)](http://www.youtube.com/watch?v=rFO2OoxVYxE)


**Topics**
+ [Differentiating Data Catalog views from other view types](#diff-views)
+ [What is a definer view?](#definer-view)
+ [A framework for multi-dialect views](#multi-dialect)
+ [Integrating with Lake Formation permissions](#lf-view-integ)
+ [Use cases](#views-use-cases)
+ [Supported AWS analytics services for views](#views-supported-engines)
+ [Additional resources](#views-addtional-resources)
+ [Prerequisites for creating views](views-prereqs.md)
+ [Creating Data Catalog views using DDL statements](create-views.md)
+ [Creating Data Catalog views using AWS Glue APIs](views-api-usage.md)
+ [Granting permissions on Data Catalog views](grant-perms-views.md)
+ [Materialized views](materialized-views.md)

# Prerequisites for creating views
<a name="views-prereqs"></a>
+ To create views in Data Catalog, you must register the underlying Amazon S3 data locations of the reference tables with Lake Formation. For details on registering data with Lake Formation, see [Adding an Amazon S3 location to your data lake](register-data-lake.md). 
+ Only IAM roles can create Data Catalog views. Other IAM identities can't create Data Catalog views.
+ The IAM role that defines the view must have the following permissions:
  + Lake Formation `SELECT` permission with the `Grantable` option on all reference tables, all columns included.
  + Lake Formation `CREATE_TABLE` permission on the target database where views are being created.
  + A trust policy for the Lake Formation and AWS Glue services to assume the role. 

------
#### [ JSON ]

****  

    ```
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DataCatalogViewDefinerAssumeRole1",
                "Effect": "Allow",
                "Principal": {
                    "Service": [
                        "glue.amazonaws.com",
                        "lakeformation.amazonaws.com"
                    ]
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
    ```

------
  + The `iam:PassRole` permission for AWS Glue and Lake Formation.

------
#### [ JSON ]

****  

    ```
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DataCatalogViewDefinerPassRole1",
                "Action": [
                    "iam:PassRole"
                ],
                "Effect": "Allow",
                "Resource": "*",
                "Condition": {
                    "StringEquals": {
                        "iam:PassedToService": [
                            "glue.amazonaws.com",
                            "lakeformation.amazonaws.com"
                        ]
                    }
                }
            }
        ]
    }
    ```

------
  + AWS Glue and Lake Formation permissions.

------
#### [ JSON ]

****  

    ```
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetDatabases",
                    "glue:CreateTable",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:BatchGetPartition",
                    "glue:GetPartitions",
                    "glue:GetPartition",
                    "glue:GetTableVersion",
                    "glue:GetTableVersions",
                    "glue:PassConnection",
                    "lakeformation:GetDataAccess"
                ],
                "Resource": "*"
            }
        ]
    }
    ```

------
+ You can't create views in a database that has `Super` or `ALL` permission granted to the `IAMAllowedPrincipals` group. You can either revoke the `Super` permission for the `IAMAllowedPrincipals` group on the database (see [Step 4: Switch your data stores to the Lake Formation permissions model](upgrade-glue-lake-formation.md#upgrade-glue-lake-formation-step4)), or create a new database with the **Use only IAM access control for new tables in this database** box cleared under **Default permissions for newly created tables**.
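The `Super` permission revocation for the `IAMAllowedPrincipals` group can also be scripted. The following is a minimal sketch using boto3 (assumed available); the database name and account ID are placeholders:

```python
def build_revoke_request(database_name, catalog_id):
    """Build the RevokePermissions request body that removes the Super (ALL)
    permission from the IAMAllowedPrincipals group on a database."""
    return {
        "CatalogId": catalog_id,
        "Principal": {"DataLakePrincipalIdentifier": "IAM_ALLOWED_PRINCIPALS"},
        "Resource": {"Database": {"CatalogId": catalog_id, "Name": database_name}},
        "Permissions": ["ALL"],
    }

def revoke_super(database_name, catalog_id):
    """Call Lake Formation to revoke the permission. Requires credentials
    with lakeformation:RevokePermissions."""
    import boto3  # imported here so the builder above stays usable offline
    boto3.client("lakeformation").revoke_permissions(
        **build_revoke_request(database_name, catalog_id)
    )
```

For example, `revoke_super("example_db", "123456789012")` removes the grant that would otherwise block view creation in `example_db`.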

# Creating Data Catalog views using DDL statements
<a name="create-views"></a>

You can create AWS Glue Data Catalog views using the SQL editors for Athena and Amazon Redshift, or by using the AWS Glue APIs or the AWS CLI.

To create a Data Catalog view using a SQL editor, choose Athena or Redshift Spectrum, and create the view using a `CREATE VIEW` Data Definition Language (DDL) statement. After creating the view in the dialect of the first engine, you can use an `ALTER VIEW` DDL statement from the second engine to add additional dialects.

When defining views, it is important to consider the following:
+ **Defining multi-dialect views** – When you define a view with multiple dialects, the schemas of the different dialects must match. Each SQL dialect will have a slightly different syntax specification. The query syntax defining the Data Catalog view should resolve to the exact same column list, including both types and names, across all the dialects. This information is stored in the `StorageDescriptor` of the view. The dialects must also reference the same underlying table objects from the Data Catalog.

  To add another dialect to a view using DDL, use the `ALTER VIEW` statement. If an `ALTER VIEW` statement tries to change the view definition, such as modifying the storage descriptor or the underlying tables of the view, the statement fails with the error "Input and existing storage descriptor mismatch". You can use SQL cast operations to ensure that the view column types match.
+ **Updating a view** – To update the view, you can use the `UpdateTable` API. If you update the view without matching storage descriptors or reference tables, you can provide the `FORCE` flag (see the engine's SQL documentation for syntax). After a force update, the view takes on the forced `StorageDescriptor` and reference tables, and any further `ALTER VIEW` DDL statements must match the modified values. A view that has been updated to have incompatible dialects is placed in a "Stale" status. The view status is visible in the Lake Formation console and through the `GetTable` operation.
+ **Referencing a varchar column type as a string** – It is not possible to cast a Redshift Spectrum varchar column type to a string. If a view is created in Redshift Spectrum with a varchar column type and a subsequent dialect references that field as a string, the Data Catalog treats it as a string without requiring the `FORCE` flag.
+ **Treatment of complex type fields** – Amazon Redshift treats all complex types as [SUPER types](https://docs.aws.amazon.com/redshift/latest/dg/r_SUPER_type.html), while Athena specifies the complex type. If a view has a `SUPER` type field, and another engine references that column as a particular complex type, such as a struct (`<street_address:struct<street_number:int, street_name:string, street_type:string>>`), the Data Catalog assumes the field to be that specific complex type and uses it in the storage descriptor, without requiring the `FORCE` flag.

For more information about the syntax for creating and managing Data Catalog views, see:
+ [Using AWS Glue Data Catalog views](https://docs.aws.amazon.com/athena/latest/ug/views-glue.html) in the Amazon Athena User Guide.
+ [Glue Data Catalog view query syntax](https://docs.aws.amazon.com/athena/latest/ug/views-glue-ddl.html) in the Amazon Athena User Guide.
+ [Creating views in the AWS Glue Data Catalog](https://docs.aws.amazon.com/redshift/latest/dg/data-catalog-views-overview.html) in the Amazon Redshift Database Developer Guide.

  For more information about the SQL commands related to views in the Data Catalog, see [CREATE EXTERNAL VIEW](https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_EXTERNAL_VIEW.html), [ALTER EXTERNAL VIEW](https://docs.aws.amazon.com/redshift/latest/dg/r_ALTER_EXTERNAL_VIEW.html), and [DROP EXTERNAL VIEW](https://docs.aws.amazon.com/redshift/latest/dg/r_DROP_EXTERNAL_VIEW.html).

After you create a Data Catalog view, the details of the view are available in the Lake Formation console.

1. Choose **Views** under Data Catalog in the Lake Formation console.

1. A list of available views appears on the views page.

1. Choose a view from the list and the details page shows the attributes of the view.

![\[The lower section contains five tabs arranged horizontally where each tab includes corresponding information.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/view-definition.png)


Schema  
Choose a `Column` row, and select **Edit LF-Tags** to update tag values or assign new LF-Tags.

SQL definitions  
You can see a list of available SQL definitions. Choose **Add SQL definition**, and choose a query engine to add a SQL definition. To update a SQL definition, choose a query engine (Athena or Amazon Redshift) under the `Edit definition` column.

LF-Tags  
Choose **Edit LF-Tags** to edit values for a tag or assign new tags. You can use LF-Tags to grant permissions on views.

Cross-account access  
You can see a list of AWS accounts, organizations, and organizational units (OUs) with which you've shared the Data Catalog view.

Underlying tables  
The underlying tables referenced in the SQL definition used to create the view are shown under this tab.

# Creating Data Catalog views using AWS Glue APIs
<a name="views-api-usage"></a>

You can use the AWS Glue [CreateTable](https://docs.aws.amazon.com/glue/latest/webapi/API_CreateTable.html) and [UpdateTable](https://docs.aws.amazon.com/glue/latest/webapi/API_UpdateTable.html) APIs to create and update views in the Data Catalog. The `CreateTable` and `UpdateTable` operations accept a new `ViewDefinition` structure in their `TableInput`, while the `SearchTables`, `GetTable`, `GetTables`, `GetTableVersion`, and `GetTableVersions` operations include the `ViewDefinition` in their output for views. Additionally, there is a new `Status` field in the `GetTable` API output.

Two new AWS Glue connection types are available for validating the SQL dialect for each supported query engine: Amazon Athena and Amazon Redshift.

The `CreateTable` and `UpdateTable` APIs are asynchronous when used with views. When these APIs are called with multiple SQL dialects, the call is validated with each engine to determine whether the dialect can run on that engine and whether the resulting view schema from each dialect matches. The AWS Glue service uses these connections to make internal calls to the analytical engines. These calls simulate what the engine would do if a `CREATE VIEW` or `ALTER VIEW` SQL DDL statement were executed on the engine.

If the SQL provided is valid, and the schemas match between view dialects, the AWS Glue API atomically commits the result. Atomicity allows views with multiple dialects to be created or altered without any downtime. 
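As an illustration, such a call might look like the following sketch using boto3 (assumed available). All names, ARNs, and connection names are placeholders; the request shape mirrors the `TableInput` examples later in this chapter:

```python
def build_view_table_input(name, columns, definer_arn, sub_objects, representations):
    """Assemble the TableInput for a multi-dialect Data Catalog view.

    columns         -- list of (name, type) pairs for the view schema
    sub_objects     -- ARNs of the underlying Data Catalog tables
    representations -- one dict per dialect, each with Dialect, DialectVersion,
                       ViewOriginalText, and ValidationConnection keys
    """
    return {
        "Name": name,
        "StorageDescriptor": {
            "Columns": [{"Name": n, "Type": t} for n, t in columns],
        },
        "ViewDefinition": {
            "Definer": definer_arn,
            "SubObjects": sub_objects,
            "Representations": representations,
        },
    }

def create_view(database_name, table_input):
    """Submit the asynchronous CreateTable call; poll GetTable to track status."""
    import boto3  # imported here so the builder above stays usable offline
    boto3.client("glue").create_table(
        DatabaseName=database_name, TableInput=table_input
    )
```

Because the call is asynchronous, a successful return only means the request was accepted; check the `Status` field in the `GetTable` output to confirm the view was committed.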

**Topics**
+ [Creating AWS Glue connections to validate status](views-api-usage-connection.md)
+ [Validating the view generation status](views-api-usage-get-table.md)
+ [Asynchronous states and operations](views-api-usage-async-states.md)
+ [View creation failure scenarios during asynchronous operations](views-api-usage-errors.md)

# Creating AWS Glue connections to validate status
<a name="views-api-usage-connection"></a>

To create or update an AWS Glue Data Catalog view using the `CreateTable` or `UpdateTable` operations, you must create a new type of AWS Glue connection for validation, and provide it to the supported analytics engine. These connections are required to use Data Catalog views with Athena or Amazon Redshift. You can create these connections only using the AWS CLI, AWS SDKs, or AWS Glue APIs. You can't use the AWS Management Console to create the AWS Glue connection.

**Note**  
If the view definer role and the role calling `CreateTable` or `UpdateTable` are different, then both of them require `glue:PassConnection` permission in their IAM policy statement.

For more information, see the [create-connection](https://awscli.amazonaws.com/v2/documentation/api/latest/reference/glue/create-connection.html) AWS CLI documentation.

**AWS CLI command for creating a connection**  
The following is an AWS CLI command for creating a connection:

```
aws glue create-connection --region us-east-1 \
--endpoint-url https://glue.us-east-1.amazonaws.com \
--cli-input-json file:///root/path/to/create-connection.json
```

**AWS CLI input JSON**  
For Amazon Redshift:

```
{
    "CatalogId": "123456789012",
    "ConnectionInput": {
        "ConnectionType": "VIEW_VALIDATION_REDSHIFT",
        "Name": "views-preview-cluster-connection-2",
        "Description": "My first Amazon Redshift validation connection",
        "ConnectionProperties": {
            "DATABASE": "dev",
            "CLUSTER_IDENTIFIER": "glue-data-catalog-views-preview-cluster"
        }
    }
}
```

For Amazon Athena:

```
{
    "CatalogId": "123456789012",
    "ConnectionInput": {
        "ConnectionType": "VIEW_VALIDATION_ATHENA",
        "Name": "views-preview-cluster-connection-3",
        "Description": "My first Amazon Athena validation connection",
        "ConnectionProperties": {
            "WORKGROUP_NAME": "workgroup-name"
        }
    }
}
```

# Validating the view generation status
<a name="views-api-usage-get-table"></a>

When you run the `CreateTable` or `UpdateTable` operations, the `Status` field in the `GetTable` API output shows the details of the view creation status. For `CreateTable` requests where the table does not already exist, AWS Glue creates an empty table for the duration of the asynchronous process. When calling `GetTable`, you can pass an optional Boolean flag, `IncludeStatusDetails`, which adds diagnostic information about the request. In the case of a failure, the response includes an error message with the individual status of each dialect.

Errors during view create, read, update, and delete (CRUD) operations can occur either during processing in the AWS Glue/Lake Formation service or during view SQL validation in Amazon Redshift or Athena. When an error occurs during validation in an engine, the AWS Glue service provides the error message that the engine returns.

**Status fields**  
The following are the status fields:
+ State – A generic status that is agnostic to the type of operation:
  + QUEUED
  + IN_PROGRESS
  + SUCCESS
  + FAILED
+ Action – Indicates which action was called on the table. Currently, only `CREATE` and `UPDATE` operations are available.

  Distinguishing between `UPDATE` and `CREATE` operations is important when working with views, because the operation type determines whether you can query the table.

  An `UPDATE` operation signifies that the table already exists in the Data Catalog. In this case, you can continue querying the previously created table. A `CREATE` operation indicates that the table has never been successfully created. If a table is marked for `CREATE`, attempting to query it fails because the table does not yet exist. Therefore, identify the operation type (`UPDATE` or `CREATE`) before attempting to query a table.
+ RequestedBy – The ARN of the user who requested the asynchronous change.
+ UpdatedBy – The ARN of the user who last manually altered the asynchronous change process, such as by requesting a cancellation or modification.
+ Error – This field appears only when the state is **FAILED**. It is a parent-level exception message; there may be different errors for each dialect.
  + ErrorCode – The type of exception.
  + ErrorMessage – A brief description of the exception.
+ RequestTime – An ISO 8601-formatted date string indicating the time that the change was initiated.
+ UpdateTime – An ISO 8601-formatted date string indicating the time that the state was last updated.
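The following sketch (assuming boto3 is available) fetches a view with `IncludeStatusDetails` and reduces the response `Status` block to the fields above; `summarize_view_status` works on any `GetTable` response dictionary, including the failure examples later in this chapter:

```python
def summarize_view_status(table):
    """Reduce the Status block of a GetTable response to the key fields.
    Returns None for tables whose last update was synchronous (no Status)."""
    status = table.get("Status")
    if status is None:
        return None
    summary = {"Action": status["Action"], "State": status["State"]}
    error = status.get("Error")
    if error is not None:
        # Error only appears when the state is FAILED
        summary["Error"] = "%s: %s" % (error["ErrorCode"], error["ErrorMessage"])
    return summary

def get_view_status(database_name, name):
    """Fetch a view with per-dialect diagnostics and summarize its status."""
    import boto3  # imported here so the summarizer above stays usable offline
    response = boto3.client("glue").get_table(
        DatabaseName=database_name, Name=name, IncludeStatusDetails=True
    )
    return summarize_view_status(response["Table"])
```

You might call `get_view_status` in a polling loop after `CreateTable` or `UpdateTable`, stopping once the state reaches `SUCCESS` or `FAILED`.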

# Asynchronous states and operations
<a name="views-api-usage-async-states"></a>

When you run a `glue:CreateTable` request, the asynchronous creation of the Data Catalog view begins. The following sections describe the `Status` of an AWS Glue view, which is available in the `glue:GetTable` response. For brevity, this section omits the full response.

```
{
    "Table": {
        ...
        "Status": {
            ...
            "Action": "CREATE",
            "State": "QUEUED"
        }
    }
}
```

Both of these attributes carry important diagnostic information that indicates the state of the asynchronous operation, as well as the actions that can be performed on the view. The following are the possible values that these attributes can take.

1. `Status.Action`

   1. CREATE

   1. UPDATE

1. `Status.State`

   1. QUEUED

   1. IN_PROGRESS

   1. SUCCESS

   1. FAILED

It is also important to note that some updates on a Data Catalog view don't require an asynchronous operation. For example, you might want to update the `Description` attribute of the table. Because this does not require an asynchronous operation, the resulting table metadata will not have a `Status`, and the attribute will be `NULL`.

```
{
    "Table": {
        ...,
        "Description": "I changed this attribute!"
    }
}
```

The following sections describe how this status information affects the operations that you can perform on an AWS Glue view.

**glue:CreateTable**  
There are no changes for this API when compared to how `glue:CreateTable` functions for any AWS Glue table. `CreateTable` can be called for any table name that does not already exist.

**glue:UpdateTable**  
This operation cannot be performed on an AWS Glue view that has the following status information:

1. Action == CREATE and State == QUEUED

1. Action == CREATE and State == IN_PROGRESS

1. Action == CREATE and State == FAILED

1. Action == UPDATE and State == QUEUED

1. Action == UPDATE and State == IN_PROGRESS

To summarize, you can update a Data Catalog view only when it meets one of the following requirements:

1. It has been successfully created for the first time.

   1. Action == CREATE and State == SUCCESS

1. It has reached a terminal state after an asynchronous update operation.

   1. Action == UPDATE and State == SUCCESS

   1. Action == UPDATE and State == FAILED

1. It has a `NULL` state attribute as a result of a synchronous update.
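These requirements can be condensed into a single predicate, sketched below; `None` values for both arguments stand for the `NULL` status that results from a synchronous update:

```python
def can_update_view(action, state):
    """Return True when glue:UpdateTable is allowed on a Data Catalog view,
    per the requirements above. None/None represents a NULL Status attribute
    left by a synchronous update."""
    if action is None and state is None:
        return True
    if action == "CREATE":
        return state == "SUCCESS"              # first creation must have succeeded
    if action == "UPDATE":
        return state in ("SUCCESS", "FAILED")  # any terminal state of an update
    return False
```

Checking this predicate before calling `UpdateTable` avoids the error you would otherwise receive while an asynchronous operation is still queued or in progress.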

**glue:DeleteTable**  
There are no changes for this operation when compared to how `glue:DeleteTable` functions for any AWS Glue table. You can delete a Data Catalog view regardless of its state.

**glue:GetTable**  
There are no changes for this operation when compared to how `glue:GetTable` functions for any AWS Glue table. However, you can't query a Data Catalog view from the analytical engines until it has been successfully created for the first time (`Action == CREATE` and `State == SUCCESS`). After a Data Catalog view has been successfully created for the first time, you can query it regardless of its status.

**Note**  
All of the information in this section applies to all table read APIs such as `GetTable`, `GetTables`, and `SearchTables`.
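The queryability rule above can be sketched the same way: `CREATE` must have reached `SUCCESS`, while an `UPDATE` action (or a `NULL` status from a synchronous update) implies the view already exists:

```python
def is_view_queryable(action, state):
    """True once the view has been successfully created for the first time."""
    if action == "CREATE":
        return state == "SUCCESS"
    # An UPDATE action (or a NULL status from a synchronous update) implies the
    # view already exists, so the previously created version stays queryable.
    return True
```

This mirrors the guidance earlier in this chapter: identify the operation type before attempting to query a view.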

# View creation failure scenarios during asynchronous operations
<a name="views-api-usage-errors"></a>

The following examples are representative of the types of errors that can result from `CreateTable` or `UpdateTable` view API calls. They are not exhaustive, because the error surface of SQL query failures is quite large.

## Scenario 1: Amazon Redshift query failure
<a name="views-api-usage-errors-scenario-1"></a>

The query provided for Amazon Redshift includes a misspelled table name that can't be found in the Data Catalog during validation. The resulting error is shown in the `Status` field in the `GetTable` response for the view.

`CreateTable` request:

```
{
    "CatalogId": "123456789012",
    "DatabaseName": "async-view-test-db",
    "TableInput": {
        "Name": "view-athena-redshift-72",
        "Description": "This is an atomic operation",
        "StorageDescriptor": {
            "Columns": [
                { "Name": "col1", "Type": "int" },
                { "Name": "col2", "Type": "string" },
                { "Name": "col3", "Type": "double" }
            ]
        },
        "ViewDefinition": {
            "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
            "SubObjects": [ "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1" ],
            "Representations": [
                {
                    "Dialect": "ATHENA",
                    "DialectVersion": "3",
                    "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                    "ValidationConnection": "athena-connection"
                },
                {
                    "Dialect": "REDSHIFT",
                    "DialectVersion": "1.0",
                    "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table__1\";",
                    "ValidationConnection": "redshift-connection"
                }
            ]
        }
    }
}
```

`GetTable` response:

```
IncludeStatusDetails = FALSE
{
    "Table": {
        "Name": "view-athena-redshift-72",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:39:19-07:00",
        "UpdateTime": "2024-07-11T11:39:19-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:39:19-07:00",
            "UpdateTime": "2024-07-11T11:40:06-07:00",
            "Action": "CREATE",
            "State": "FAILED"
        }
    }
}

IncludeStatusDetails = TRUE
{
    "Table": {
        "Name": "view-athena-redshift-72",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:39:19-07:00",
        "UpdateTime": "2024-07-11T11:39:19-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:39:19-07:00",
            "UpdateTime": "2024-07-11T11:40:06-07:00",
            "Action": "CREATE",
            "State": "FAILED",
            "Error": {
                "ErrorCode": "QueryExecutionException",
                "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: redshift-connection | Query Execution Id: ddb711d3-2415-4aa9-b251-6a76ab4f41b1 | Timestamp: Thu Jul 11 18:39:37 UTC 2024]: Redshift returned error for the statement: ERROR: AwsClientException: EntityNotFoundException from glue - Entity Not Found"
            },
            "Details": {
                "RequestedChange": {
                    "Name": "view-athena-redshift-72",
                    "DatabaseName": "async-view-test-db",
                    "Description": "This is an atomic operation",
                    "Retention": 0,
                    "StorageDescriptor": {
                        "Columns": [
                            {
                                "Name": "col1",
                                "Type": "int"
                            },
                            {
                                "Name": "col2",
                                "Type": "string"
                            },
                            {
                                "Name": "col3",
                                "Type": "double"
                            }
                        ],
                        "Compressed": false,
                        "NumberOfBuckets": 0,
                        "SortColumns": [],
                        "StoredAsSubDirectories": false
                    },
                    "TableType": "VIRTUAL_VIEW",
                    "IsRegisteredWithLakeFormation": false,
                    "CatalogId": "123456789012",
                    "IsRowFilteringEnabled": false,
                    "VersionId": "-1",
                    "DatabaseId": "<databaseID>",
                    "ViewDefinition": {
                        "IsProtected": true,
                        "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
                        "SubObjects": [
                            "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1"
                        ],
                        "Representations": [
                            {
                                "Dialect": "ATHENA",
                                "DialectVersion": "3",
                                "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                                "IsStale": false
                            },
                            {
                                "Dialect": "REDSHIFT",
                                "DialectVersion": "1.0",
                                "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table__1\";",
                                "IsStale": false
                            }
                        ]
                    },
                    "IsMultiDialectView": true
                },
                "ViewValidations": [
                    {
                        "Dialect": "ATHENA",
                        "DialectVersion": "3",
                        "ViewValidationText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                        "UpdateTime": "2024-07-11T11:40:06-07:00",
                        "State": "SUCCESS"
                    },
                    {
                        "Dialect": "REDSHIFT",
                        "DialectVersion": "1.0",
                        "ViewValidationText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table__1\";",
                        "UpdateTime": "2024-07-11T11:39:37-07:00",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "QueryExecutionException",
                            "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: redshift-connection | Query Execution Id: ddb711d3-2415-4aa9-b251-6a76ab4f41b1 | Timestamp: Thu Jul 11 18:39:37 UTC 2024]: Redshift returned error for the statement: ERROR: AwsClientException: EntityNotFoundException from glue - Entity Not Found"
                        }
                    }
                ]
            }
        }
    }
}
```

## Scenario 2: Invalid Amazon Redshift connection
<a name="views-api-usage-errors-scenario-2"></a>

The Amazon Redshift connection in the following example is malformed because it refers to an Amazon Redshift database that doesn't exist in the provided cluster or serverless endpoint. Amazon Redshift can't validate the view, and the `Status` field in the `GetTable` response shows the error (`"State": "FAILED"`) returned from Amazon Redshift.

`CreateTable` request:

```
{
    "CatalogId": "123456789012",
    "DatabaseName": "async-view-test-db",
    "TableInput": {
        "Name": "view-athena-redshift-73",
        "Description": "This is an atomic operation",
        "StorageDescriptor": {
            "Columns": [
                { "Name": "col1", "Type": "int" },
                { "Name": "col2", "Type": "string" },
                { "Name": "col3", "Type": "double" }
            ]
        },
        "ViewDefinition": {
            "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
            "SubObjects": [ "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1" ],
            "Representations": [
                {
                    "Dialect": "ATHENA",
                    "DialectVersion": "3",
                    "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                    "ValidationConnection": "athena-connection"
                },
                {
                    "Dialect": "REDSHIFT",
                    "DialectVersion": "1.0",
                    "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                    "ValidationConnection": "redshift-connection-malformed"
                }
            ]
        }
    }
}
```

`GetTable` response:

```
IncludeStatusDetails = FALSE
{
    "Table": {
        "Name": "view-athena-redshift-73",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:43:27-07:00",
        "UpdateTime": "2024-07-11T11:43:27-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:43:27-07:00",
            "UpdateTime": "2024-07-11T11:43:40-07:00",
            "Action": "CREATE",
            "State": "FAILED"
        }
    }
}

IncludeStatusDetails = TRUE
{
    "Table": {
        "Name": "view-athena-redshift-73",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:43:27-07:00",
        "UpdateTime": "2024-07-11T11:43:27-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:43:27-07:00",
            "UpdateTime": "2024-07-11T11:43:40-07:00",
            "Action": "CREATE",
            "State": "FAILED",
            "Error": {
                "ErrorCode": "QueryExecutionException",
                "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: redshift-connection-malformed | Query Execution Id: 69bfafd4-3d51-4cb0-9320-7ce5404b1809 | Timestamp: Thu Jul 11 18:43:38 UTC 2024]: Redshift returned error for the statement: FATAL: database \"devooo\" does not exist"
            },
            "Details": {
                "RequestedChange": {
                    "Name": "view-athena-redshift-73",
                    "DatabaseName": "async-view-test-db",
                    "Description": "This is an atomic operation",
                    "Retention": 0,
                    "StorageDescriptor": {
                        "Columns": [
                            {
                                "Name": "col1",
                                "Type": "int"
                            },
                            {
                                "Name": "col2",
                                "Type": "string"
                            },
                            {
                                "Name": "col3",
                                "Type": "double"
                            }
                        ],
                        "Compressed": false,
                        "NumberOfBuckets": 0,
                        "SortColumns": [],
                        "StoredAsSubDirectories": false
                    },
                    "TableType": "VIRTUAL_VIEW",
                    "IsRegisteredWithLakeFormation": false,
                    "CatalogId": "123456789012",
                    "IsRowFilteringEnabled": false,
                    "VersionId": "-1",
                    "DatabaseId": "<databaseID>",
                    "ViewDefinition": {
                        "IsProtected": true,
                        "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
                        "SubObjects": [
                            "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1"
                        ],
                        "Representations": [
                            {
                                "Dialect": "ATHENA",
                                "DialectVersion": "3",
                                "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                                "IsStale": false
                            },
                            {
                                "Dialect": "REDSHIFT",
                                "DialectVersion": "1.0",
                                "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                                "IsStale": false
                            }
                        ]
                    },
                    "IsMultiDialectView": true
                },
                "ViewValidations": [
                    {
                        "Dialect": "ATHENA",
                        "DialectVersion": "3",
                        "ViewValidationText": "SELECT * FROM \"gdc-view-playground-db\".\"table_1\"",
                        "UpdateTime": "2024-07-11T11:43:40-07:00",
                        "State": "SUCCESS"
                    },
                    {
                        "Dialect": "REDSHIFT",
                        "DialectVersion": "1.0",
                        "ViewValidationText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                        "UpdateTime": "2024-07-11T11:43:38-07:00",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "QueryExecutionException",
                            "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: redshift-connection-malformed | Query Execution Id: 69bfafd4-3d51-4cb0-9320-7ce5404b1809 | Timestamp: Thu Jul 11 18:43:38 UTC 2024]: Redshift returned error for the statement: FATAL: database \"devooo\" does not exist"
                        }
                    }
                ]
            }
        }
    }
}
```
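When `IncludeStatusDetails` is set to `TRUE`, the per-dialect results in `Details.ViewValidations` identify exactly which representation failed. As a minimal sketch, the failures can be pulled out of the response programmatically; the dict below is a trimmed-down copy of the response above, and in practice it would come from a `GetTable` call:

```python
def failed_view_validations(get_table_response):
    """Return (dialect, error message) pairs for failed dialect
    validations from a GetTable response with IncludeStatusDetails=True."""
    status = get_table_response.get("Table", {}).get("Status", {})
    if status.get("State") != "FAILED":
        return []
    validations = status.get("Details", {}).get("ViewValidations", [])
    return [
        (v["Dialect"], v["Error"]["ErrorMessage"])
        for v in validations
        if v.get("State") == "FAILED"
    ]

# Trimmed-down copy of the response above
response = {
    "Table": {
        "Status": {
            "State": "FAILED",
            "Details": {
                "ViewValidations": [
                    {"Dialect": "ATHENA", "State": "SUCCESS"},
                    {
                        "Dialect": "REDSHIFT",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "QueryExecutionException",
                            "ErrorMessage": "FATAL: database \"devooo\" does not exist",
                        },
                    },
                ]
            },
        }
    }
}

print(failed_view_validations(response))
# [('REDSHIFT', 'FATAL: database "devooo" does not exist')]
```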

## Scenario 3: Athena query failure
<a name="views-api-usage-errors-scenario-3"></a>

The Athena SQL in this scenario is invalid because the query misspells the database name (`gdc--view-playground-db`). Athena query validation catches this, and the resulting error is surfaced through the `Status` object in a `GetTable` call.

`CreateTable` request:

```
{
    "CatalogId": "123456789012",
    "DatabaseName": "async-view-test-db",
    "TableInput": {
        "Name": "view-athena-redshift-70",
        "Description": "This is an atomic operation",
        "StorageDescriptor": {
            "Columns": [
                { "Name": "col1", "Type": "int" },
                { "Name": "col2", "Type": "string" },
                { "Name": "col3", "Type": "double" }
            ]
        },
        "ViewDefinition": {
            "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
            "SubObjects": [ "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1" ],
            "Representations": [
                {
                    "Dialect": "ATHENA",
                    "DialectVersion": "3",
                    "ViewOriginalText": "SELECT * FROM \"gdc--view-playground-db\".\"table_1\"",
                    "ValidationConnection": "athena-connection"
                },
                {
                    "Dialect": "REDSHIFT",
                    "DialectVersion": "1.0",
                    "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                    "ValidationConnection": "redshift-connection"
                }
            ]
        }
    }
}
```

`GetTable` response:

```
IncludeStatusDetails = FALSE
{
    "Table": {
        "Name": "view-athena-redshift-70",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:09:53-07:00",
        "UpdateTime": "2024-07-11T11:09:53-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:09:54-07:00",
            "UpdateTime": "2024-07-11T11:10:41-07:00",
            "Action": "CREATE",
            "State": "FAILED"
        }
    }
}

IncludeStatusDetails = TRUE
{
    "Table": {
        "Name": "view-athena-redshift-70",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:09:53-07:00",
        "UpdateTime": "2024-07-11T11:09:53-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:09:54-07:00",
            "UpdateTime": "2024-07-11T11:10:41-07:00",
            "Action": "CREATE",
            "State": "FAILED",
            "Error": {
                "ErrorCode": "QueryExecutionException",
                "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: athena-connection | Query Execution Id: d9bb1e6d-ce26-4b35-8276-8a199af966aa | Timestamp: Thu Jul 11 18:10:41 UTC 2024]: Athena validation FAILED: {ErrorCategory: 2,ErrorType: 1301,Retryable: false,ErrorMessage: line 1:118: Schema 'gdc--view-playground-db' does not exist}"
            },
            "Details": {
                "RequestedChange": {
                    "Name": "view-athena-redshift-70",
                    "DatabaseName": "async-view-test-db",
                    "Description": "This is an atomic operation",
                    "Retention": 0,
                    "StorageDescriptor": {
                        "Columns": [
                            {
                                "Name": "col1",
                                "Type": "int"
                            },
                            {
                                "Name": "col2",
                                "Type": "string"
                            },
                            {
                                "Name": "col3",
                                "Type": "double"
                            }
                        ],
                        "Compressed": false,
                        "NumberOfBuckets": 0,
                        "SortColumns": [],
                        "StoredAsSubDirectories": false
                    },
                    "TableType": "VIRTUAL_VIEW",
                    "IsRegisteredWithLakeFormation": false,
                    "CatalogId": "123456789012",
                    "IsRowFilteringEnabled": false,
                    "VersionId": "-1",
                    "DatabaseId": "<databaseID>",
                    "ViewDefinition": {
                        "IsProtected": true,
                        "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
                        "SubObjects": [
                            "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1"
                        ],
                        "Representations": [
                            {
                                "Dialect": "ATHENA",
                                "DialectVersion": "3",
                                "ViewOriginalText": "SELECT * FROM \"gdc--view-playground-db\".\"table_1\"",
                                "IsStale": false
                            },
                            {
                                "Dialect": "REDSHIFT",
                                "DialectVersion": "1.0",
                                "ViewOriginalText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                                "IsStale": false
                            }
                        ]
                    },
                    "IsMultiDialectView": true
                },
                "ViewValidations": [
                    {
                        "Dialect": "ATHENA",
                        "DialectVersion": "3",
                        "ViewValidationText": "SELECT * FROM \"gdc--view-playground-db\".\"table_1\"",
                        "UpdateTime": "2024-07-11T11:10:41-07:00",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "QueryExecutionException",
                            "ErrorMessage": "Error received during view SQL validation using a connection: [Connection Name: athena-connection | Query Execution Id: d9bb1e6d-ce26-4b35-8276-8a199af966aa | Timestamp: Thu Jul 11 18:10:41 UTC 2024]: Athena validation FAILED: {ErrorCategory: 2,ErrorType: 1301,Retryable: false,ErrorMessage: line 1:118: Schema 'gdc--view-playground-db' does not exist}"
                        }
                    },
                    {
                        "Dialect": "REDSHIFT",
                        "DialectVersion": "1.0",
                        "ViewValidationText": "SELECT * FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                        "UpdateTime": "2024-07-11T11:10:41-07:00",
                        "State": "SUCCESS"
                    }
                ]
            }
        }
    }
}
```

## Scenario 4: Mismatched storage descriptors
<a name="views-api-usage-errors-scenario-4"></a>

The SQL provided for the Athena dialect selects `col1` and `col2`, while the SQL for Redshift selects only `col1`. Neither select list matches the three columns declared in the storage descriptor, so the create operation fails with a storage descriptor mismatch error.

`CreateTable` request:

```
{
    "CatalogId": "123456789012",
    "DatabaseName": "async-view-test-db",
    "TableInput": {
        "Name": "view-athena-redshift-71",
        "Description": "This is an atomic operation",
        "StorageDescriptor": {
            "Columns": [
                { "Name": "col1", "Type": "int" },
                { "Name": "col2", "Type": "string" },
                { "Name": "col3", "Type": "double" }
            ]
        },
        "ViewDefinition": {
            "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
            "SubObjects": [ "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1" ],
            "Representations": [
                {
                    "Dialect": "ATHENA",
                    "DialectVersion": "3",
                    "ViewOriginalText": "SELECT col1, col2 FROM \"gdc-view-playground-db\".\"table_1\"",
                    "ValidationConnection": "athena-connection"
                },
                {
                    "Dialect": "REDSHIFT",
                    "DialectVersion": "1.0",
                    "ViewOriginalText": "SELECT col1 FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                    "ValidationConnection": "redshift-connection"
                }
            ]
        }
    }
}
```

`GetTable` response:

```
IncludeStatusDetails = FALSE

{
    "Table": {
        "Name": "view-athena-redshift-71",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:22:02-07:00",
        "UpdateTime": "2024-07-11T11:22:02-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:22:02-07:00",
            "UpdateTime": "2024-07-11T11:23:19-07:00",
            "Action": "CREATE",
            "State": "FAILED"
        }
    }
}

IncludeStatusDetails = TRUE

{
    "Table": {
        "Name": "view-athena-redshift-71",
        "DatabaseName": "async-view-test-db",
        "Description": "",
        "CreateTime": "2024-07-11T11:22:02-07:00",
        "UpdateTime": "2024-07-11T11:22:02-07:00",
        "Retention": 0,
        "ViewOriginalText": "",
        "ViewExpandedText": "",
        "TableType": "",
        "CreatedBy": "arn:aws:iam::123456789012:user/zcaisse",
        "IsRegisteredWithLakeFormation": false,
        "CatalogId": "123456789012",
        "IsRowFilteringEnabled": false,
        "VersionId": "-1",
        "DatabaseId": "<databaseID>",
        "IsMultiDialectView": false,
        "Status": {
            "RequestedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "UpdatedBy": "arn:aws:iam::123456789012:user/zcaisse",
            "RequestTime": "2024-07-11T11:22:02-07:00",
            "UpdateTime": "2024-07-11T11:23:19-07:00",
            "Action": "CREATE",
            "State": "FAILED",
            "Error": {
                "ErrorCode": "InvalidInputException",
                "ErrorMessage": "Engine and existing storage descriptor mismatch"
            },
            "Details": {
                "RequestedChange": {
                    "Name": "view-athena-redshift-71",
                    "DatabaseName": "async-view-test-db",
                    "Description": "This is an atomic operation",
                    "Retention": 0,
                    "StorageDescriptor": {
                        "Columns": [
                            {
                                "Name": "col1",
                                "Type": "int"
                            },
                            {
                                "Name": "col2",
                                "Type": "string"
                            },
                            {
                                "Name": "col3",
                                "Type": "double"
                            }
                        ],
                        "Compressed": false,
                        "NumberOfBuckets": 0,
                        "SortColumns": [],
                        "StoredAsSubDirectories": false
                    },
                    "TableType": "VIRTUAL_VIEW",
                    "IsRegisteredWithLakeFormation": false,
                    "CatalogId": "123456789012",
                    "IsRowFilteringEnabled": false,
                    "VersionId": "-1",
                    "DatabaseId": "<databaseID>",
                    "ViewDefinition": {
                        "IsProtected": true,
                        "Definer": "arn:aws:iam::123456789012:role/GDCViewDefiner",
                        "SubObjects": [
                            "arn:aws:glue:us-east-1:123456789012:table/gdc-view-playground-db/table_1"
                        ],
                        "Representations": [
                            {
                                "Dialect": "ATHENA",
                                "DialectVersion": "3",
                                "ViewOriginalText": "SELECT col1, col2 FROM \"gdc-view-playground-db\".\"table_1\"",
                                "IsStale": false
                            },
                            {
                                "Dialect": "REDSHIFT",
                                "DialectVersion": "1.0",
                                "ViewOriginalText": "SELECT col1 FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                                "IsStale": false
                            }
                        ]
                    },
                    "IsMultiDialectView": true
                },
                "ViewValidations": [
                    {
                        "Dialect": "ATHENA",
                        "DialectVersion": "3",
                        "ViewValidationText": "SELECT col1, col2 FROM \"gdc-view-playground-db\".\"table_1\"",
                        "UpdateTime": "2024-07-11T11:23:19-07:00",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "InvalidInputException",
                            "ErrorMessage": "Engine and existing storage descriptor mismatch"
                        }
                    },
                    {
                        "Dialect": "REDSHIFT",
                        "DialectVersion": "1.0",
                        "ViewValidationText": "SELECT col1 FROM \"gdc-view-playground-external-schema\".\"table_1\";",
                        "UpdateTime": "2024-07-11T11:22:49-07:00",
                        "State": "FAILED",
                        "Error": {
                            "ErrorCode": "InvalidInputException",
                            "ErrorMessage": "Engine and existing storage descriptor mismatch"
                        }
                    }
                ]
            }
        }
    }
}
```
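A mismatch like this can often be caught before the `CreateTable` call by comparing each dialect's select list against the declared storage descriptor. A rough pre-flight sketch follows; the naive regex handles only plain `SELECT a, b FROM ...` select lists, not expressions or `SELECT *`:

```python
import re

def selected_columns(sql):
    """Naively extract the select list from a simple
    'SELECT a, b FROM ...' statement."""
    m = re.match(r"\s*SELECT\s+(.*?)\s+FROM\s", sql, re.IGNORECASE | re.DOTALL)
    if not m:
        return None
    return [c.strip() for c in m.group(1).split(",")]

# Columns declared in the storage descriptor of the request above
storage_columns = ["col1", "col2", "col3"]

athena_sql = 'SELECT col1, col2 FROM "gdc-view-playground-db"."table_1"'
redshift_sql = 'SELECT col1 FROM "gdc-view-playground-external-schema"."table_1";'

for dialect, sql in [("ATHENA", athena_sql), ("REDSHIFT", redshift_sql)]:
    cols = selected_columns(sql)
    if cols != storage_columns:
        print(f"{dialect}: select list {cols} does not match {storage_columns}")
```

Both dialects fail the check here, which matches the `ViewValidations` results in the response above.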

# Granting permissions on Data Catalog views
<a name="grant-perms-views"></a>

After creating views in the AWS Glue Data Catalog, you can grant data lake permissions on those views to principals across AWS accounts, organizations, and organizational units. You can grant permissions using LF-Tags or the named resource method. For more information on tagging resources, see [Lake Formation tag-based access control](tag-based-access-control.md). For more information on granting permissions on views directly, see [Granting permissions on views using the named resource method](granting-view-permissions.md).

# Materialized views
<a name="materialized-views"></a>

**Topics**
+ [Differentiating materialized views from other view types](#materialized-views-differentiating)
+ [Use cases](#materialized-views-use-cases)
+ [Key concepts](#materialized-views-key-concepts)
+ [Permissions for materialized views](#materialized-views-permissions)
+ [Creating and managing materialized views](#materialized-views-creating-managing)
+ [Storage and data access](#materialized-views-storage-access)
+ [Integrating with AWS Lake Formation permissions](#materialized-views-lake-formation)
+ [Monitoring and debugging](#materialized-views-monitoring-debugging)
+ [Managing refresh jobs](#materialized-views-managing-refresh-jobs)
+ [Monitoring and troubleshooting](#materialized-views-monitoring-troubleshooting)
+ [Considerations and limitations](#materialized-views-considerations-limitations)

In the AWS Glue Data Catalog, a materialized view is a managed table that stores the precomputed result of a SQL query in Apache Iceberg format. Unlike standard Data Catalog views that execute the query each time they are accessed, materialized views physically store the query results and update them as the underlying source tables change. You can create materialized views using Apache Spark version 3.5.6 in Amazon Athena, Amazon EMR, or AWS Glue.

Materialized views reference Apache Iceberg tables registered in the AWS Glue Data Catalog, with precomputed data stored as Apache Iceberg tables in Amazon S3 Tables buckets or Amazon S3 general purpose buckets, making them accessible from multiple query engines including Amazon Athena, Amazon Redshift, and third-party Iceberg-compatible engines.

## Differentiating materialized views from other view types
<a name="materialized-views-differentiating"></a>

Materialized views differ from AWS Glue Data Catalog views, Apache Spark views, and Amazon Athena views in fundamental ways. While Data Catalog views are virtual tables that execute the SQL query definition each time they are accessed, materialized views physically store precomputed query results. This eliminates redundant computation and significantly improves query performance for frequently accessed complex transformations.

Materialized views also differ from traditional data transformation pipelines built with AWS Glue ETL or custom Spark jobs. Instead of writing custom code to handle change detection, incremental updates, and workflow orchestration, you define materialized views using standard SQL syntax. The AWS Glue Data Catalog automatically monitors source tables, detects changes, and refreshes materialized views using fully managed compute infrastructure.

## Use cases
<a name="materialized-views-use-cases"></a>

Following are important use cases for materialized views:
+ **Accelerate complex analytical queries** – Create materialized views that precompute expensive joins, aggregations, and window functions. Spark engines automatically rewrite subsequent queries to use the precomputed results, reducing query latency and compute costs.
+ **Simplify data transformation pipelines** – Replace complex ETL jobs that handle change detection, incremental updates, and workflow orchestration with simple SQL-based materialized view definitions. The AWS Glue Data Catalog manages all operational complexity automatically.
+ **Enable self-service analytics with governed data access** – Create curated materialized views that transform raw data into business-ready datasets. Grant users access to materialized views without exposing underlying source tables, simplifying security management while empowering self-service analytics.
+ **Optimize feature engineering for machine learning** – Define materialized views that implement feature transformations for ML models. The automatic refresh capability ensures feature stores remain current as source data evolves, while incremental refresh minimizes compute costs.
+ **Implement efficient data sharing** – Create materialized views that filter and transform data for specific consumers. Share materialized views across accounts and regions using AWS Lake Formation, eliminating the need for data duplication while maintaining centralized governance.

## Key concepts
<a name="materialized-views-key-concepts"></a>

### Automatic refresh
<a name="materialized-views-automatic-refresh"></a>

Automatic refresh is a capability that continuously monitors your source tables and updates materialized views according to a schedule you define. When you create a materialized view, you can specify a refresh frequency using time-based scheduling with intervals as frequent as one hour. The AWS Glue Data Catalog uses managed Spark compute infrastructure to execute refresh operations in the background, transparently handling all aspects of change detection and incremental updates.

When source data changes between refresh intervals, the materialized view becomes temporarily stale. Queries directly accessing the materialized view may return outdated results until the next scheduled refresh completes. For scenarios requiring immediate access to the most current data, you can execute a manual refresh using the `REFRESH MATERIALIZED VIEW` SQL command.

### Incremental refresh
<a name="materialized-views-incremental-refresh"></a>

Incremental refresh is an optimization technique that processes only the data that has changed in source tables since the last refresh, rather than recomputing the entire materialized view. The AWS Glue Data Catalog leverages Apache Iceberg's metadata layer to efficiently track changes in source tables and determine which portions of the materialized view require updates.

This approach significantly reduces compute costs and refresh duration compared to full refresh operations, particularly for large datasets where only a small percentage of data changes between refresh cycles. The incremental refresh mechanism operates automatically; you don't need to write custom logic to detect or process changed data.

### Automatic query rewrite
<a name="materialized-views-automatic-query-rewrite"></a>

Automatic query rewrite is a query optimization capability available in Spark engines across Amazon Athena, Amazon EMR, and AWS Glue. When you execute a query against base tables, the Spark optimizer analyzes your query plan and automatically determines whether available materialized views can satisfy the query more efficiently. If a suitable materialized view exists, the optimizer transparently rewrites the query to use the precomputed results instead of processing the base tables.

This optimization occurs without requiring any changes to your application code or query statements. The Spark optimizer ensures that automatic query rewrite only applies when the materialized view is current and can produce accurate results. If a materialized view is stale or doesn't fully match the query requirements, the optimizer executes the original query plan against base tables, prioritizing correctness over performance.

### View definer role
<a name="materialized-views-view-definer-role"></a>

A materialized view operates based on the permissions of the IAM role that created it, known as the view definer role. The definer role must have read access to all base tables referenced in the materialized view definition and create table permissions on the target database. When the AWS Glue Data Catalog refreshes a materialized view, it assumes the definer role to access source tables and write updated results.

This security model enables you to grant users access to materialized views without granting them direct permissions on the underlying source tables. If the view definer role loses access to any base table, subsequent refresh operations will fail until permissions are restored.

## Permissions for materialized views
<a name="materialized-views-permissions"></a>

To create and manage materialized views, you must configure AWS Lake Formation permissions. The IAM role creating the materialized view (the definer role) requires specific permissions on source tables and target databases.

### Required permissions for the definer role
<a name="materialized-views-required-permissions-definer-role"></a>

The definer role must have the following Lake Formation permissions:
+ On source tables – SELECT or ALL permissions without row, column, or cell filters
+ On the target database – CREATE\_TABLE permission
+ On the AWS Glue Data Catalog – GetTable and CreateTable API permissions

When you create a materialized view, the definer role's ARN is stored in the view definition. The AWS Glue Data Catalog assumes this role when executing automatic refresh operations. If the definer role loses access to source tables, refresh operations will fail until permissions are restored.
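As a sketch, the grants the definer role needs can be assembled as Lake Formation `GrantPermissions` request payloads. The role ARN, database, and table names below are placeholders:

```python
import json

# Hypothetical definer role ARN
DEFINER_ROLE = "arn:aws:iam::111122223333:role/materialized-view-role-name"

def definer_role_grants(source_tables, target_database):
    """Build one GrantPermissions payload per required grant:
    SELECT on each source table, CREATE_TABLE on the target database."""
    grants = [
        {
            "Principal": {"DataLakePrincipalIdentifier": DEFINER_ROLE},
            "Resource": {"Table": {"DatabaseName": db, "Name": name}},
            "Permissions": ["SELECT"],
        }
        for db, name in source_tables
    ]
    grants.append(
        {
            "Principal": {"DataLakePrincipalIdentifier": DEFINER_ROLE},
            "Resource": {"Database": {"Name": target_database}},
            "Permissions": ["CREATE_TABLE"],
        }
    )
    return grants

# Each payload has the shape expected by the GrantPermissions API,
# for example via `aws lakeformation grant-permissions --cli-input-json`.
print(json.dumps(definer_role_grants([("source_db", "sales_transactions")],
                                     "target_db"), indent=2))
```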

### IAM permissions for AWS Glue jobs
<a name="materialized-views-iam-permissions-glue-jobs"></a>

Your AWS Glue job's IAM role requires the following permissions:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetCatalog",
                "glue:GetCatalogs",
                "glue:GetTable",
                "glue:GetTables",
                "glue:CreateTable",
                "glue:UpdateTable",
                "glue:DeleteTable",
                "glue:GetDatabase",
                "glue:GetDatabases",
                "cloudwatch:PutMetricData"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject",
                "s3:DeleteObject"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::amzn-s3-demo-bucket"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:*:*:*:/aws-glue/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "lakeformation:GetDataAccess"
            ],
            "Resource": "*"
        }
    ]
}
```

The role that you use for materialized view auto-refresh must have the `iam:PassRole` permission on that role.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::111122223333:role/materialized-view-role-name"
      ]
    }
  ]
}
```

To let AWS Glue automatically refresh the materialized view for you, the role must also have the following trust policy, which allows the service to assume the role.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
```

If the materialized view is stored in an Amazon S3 Tables bucket, you also need to add the following permission to the role.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3tables:PutTableMaintenanceConfiguration"
      ],
      "Resource": "arn:aws:s3tables:*:123456789012:*"
    }
  ]
}
```

### Granting access to materialized views
<a name="materialized-views-granting-access"></a>

To grant other users access to query a materialized view, use AWS Lake Formation to grant SELECT permission on the materialized view table. Users can query the materialized view without requiring direct access to the underlying source tables.
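For example, you can grant `SELECT` with the named resource method using the AWS CLI. This is a sketch; the principal ARN, database name, and view name below are hypothetical:

```shell
# Grant SELECT on the materialized view table (names are hypothetical).
aws lakeformation grant-permissions \
    --principal DataLakePrincipalIdentifier=arn:aws:iam::111122223333:role/analyst-role \
    --permissions SELECT \
    --resource '{"Table": {"DatabaseName": "reporting_db", "Name": "sales_summary"}}'
```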

For detailed information about configuring Lake Formation permissions, see Granting and revoking permissions on Data Catalog resources in the AWS Lake Formation Developer Guide.

## Creating and managing materialized views
<a name="materialized-views-creating-managing"></a>

You create materialized views using the `CREATE MATERIALIZED VIEW` SQL statement in Spark engines. The view definition specifies the SQL query that defines the transformation logic, the target database and table name, and optional refresh configuration. You can define complex transformations including aggregations, joins across multiple tables, filters, and window functions.

```
CREATE MATERIALIZED VIEW sales_summary
AS
SELECT 
    region,
    product_category,
    SUM(sales_amount) as total_sales,
    COUNT(DISTINCT customer_id) as unique_customers
FROM sales_transactions
WHERE transaction_date >= current_date - interval '90' day
GROUP BY region, product_category;
```

To configure automatic refresh, include the refresh schedule in your view definition:

```
CREATE MATERIALIZED VIEW sales_summary
SCHEDULE REFRESH EVERY 1 HOUR
AS
SELECT region, product_category, SUM(sales_amount) as total_sales
FROM sales_transactions
GROUP BY region, product_category;
```

You can manually refresh a materialized view at any time using the `REFRESH MATERIALIZED VIEW` command:

```
REFRESH MATERIALIZED VIEW sales_summary;
```

To modify an existing materialized view's refresh schedule, use the `ALTER MATERIALIZED VIEW` statement:

```
ALTER MATERIALIZED VIEW sales_summary
ADD SCHEDULE REFRESH EVERY 2 HOURS;
```

### Nested materialized views
<a name="materialized-views-nested"></a>

You can create materialized views that reference other materialized views as base tables, enabling multi-stage data transformations. When you create nested materialized views, the AWS Glue Data Catalog tracks dependencies and automatically propagates updates through the materialized view hierarchy. When a base materialized view refreshes, all downstream materialized views that depend on it are updated accordingly.

This capability allows you to decompose complex transformations into logical stages, improving maintainability and enabling selective refresh of transformation layers based on your data freshness requirements.

## Storage and data access
<a name="materialized-views-storage-access"></a>

Materialized views store precomputed results as Apache Iceberg tables in S3 Tables buckets or general purpose S3 buckets within your AWS account. The AWS Glue Data Catalog manages all aspects of Iceberg table maintenance, including compaction and snapshot retention, through S3 Tables' automated optimization capabilities.

Because materialized views are stored as Iceberg tables, you can read them directly from any Iceberg-compatible engine, including Amazon Athena, Amazon Redshift, and third-party analytics platforms. This multi-engine accessibility ensures your precomputed data remains accessible across your entire analytics ecosystem without data duplication or format conversion.
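For example, because a materialized view is an ordinary Iceberg table in the Data Catalog, you can query it from Amazon Athena through the AWS CLI. This sketch assumes a workgroup with a query result location already configured; the view name is hypothetical:

```shell
# Query the materialized view like any other Iceberg table.
aws athena start-query-execution \
    --query-string "SELECT * FROM sales_summary LIMIT 10" \
    --query-execution-context Database=<DATABASE_NAME> \
    --work-group primary
```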

## Integrating with AWS Lake Formation permissions
<a name="materialized-views-lake-formation"></a>

You can use AWS Lake Formation to manage fine-grained permissions on materialized views. The view creator automatically becomes the owner of the materialized view and can grant other users or roles permissions using AWS Lake Formation's named resource method or LF-Tags.

When you grant a user `SELECT` permission on a materialized view, they can query the precomputed results without requiring access to the underlying source tables. This security model simplifies data access management and enables you to implement the principle of least privilege, providing users with access to only the specific data transformations they need.

You can share materialized views across AWS accounts, AWS organizations, and organizational units using AWS Lake Formation's cross-account sharing capabilities. You can also access materialized views across AWS Regions using resource links, enabling centralized data governance with distributed data access.

## Monitoring and debugging
<a name="materialized-views-monitoring-debugging"></a>

The AWS Glue Data Catalog publishes all materialized view refresh operations and associated metrics to Amazon CloudWatch. You can monitor refresh start time, end time, duration, data volume processed, and refresh status through CloudWatch metrics. When refresh operations fail, error messages and diagnostic information are captured in CloudWatch Logs.

You can set up CloudWatch alarms to receive notifications when refresh jobs exceed expected duration or fail repeatedly. The AWS Glue Data Catalog also publishes change events to Amazon EventBridge for both successful and failed refresh runs, enabling you to integrate materialized view operations into broader workflow automation.
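As a sketch, an alarm on repeated failures might look like the following. The metric name is an assumption; verify it against the metrics the Data Catalog actually publishes in the `AWS/Glue` namespace for your account:

```shell
# Hypothetical failure metric name; check the AWS/Glue namespace for the real one.
aws cloudwatch put-metric-alarm \
    --alarm-name mv-refresh-failures \
    --namespace AWS/Glue \
    --metric-name <FAILURE_METRIC_NAME> \
    --dimensions Name=CatalogId,Value=<CATALOG_ID> \
                 Name=DatabaseName,Value=<DATABASE_NAME> \
                 Name=TableName,Value=<TABLE_NAME> \
                 Name=TaskType,Value=MaterializedViewRefresh \
    --statistic Sum \
    --period 3600 \
    --evaluation-periods 2 \
    --threshold 1 \
    --comparison-operator GreaterThanOrEqualToThreshold \
    --alarm-actions arn:aws:sns:<REGION>:<ACCOUNT_ID>:<TOPIC_NAME>
```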

To check the current status of a materialized view, use the `DESCRIBE MATERIALIZED VIEW` SQL command, which returns metadata including staleness status, last refresh timestamp, and refresh schedule configuration.

## Managing refresh jobs
<a name="materialized-views-managing-refresh-jobs"></a>

### Starting a manual refresh
<a name="materialized-views-manual-refresh"></a>

Trigger an immediate refresh outside the scheduled interval.

Required Permission: The AWS credentials used to make the API call must have `glue:GetTable` permission for the materialized view.

For S3 Tables Catalog:

```
aws glue start-materialized-view-refresh-task-run \
    --catalog-id <ACCOUNT_ID>:s3tablescatalog/<CATALOG_NAME> \
    --database-name <DATABASE_NAME> \
    --table-name <MV_TABLE_NAME>
```

For Root Catalog:

```
aws glue start-materialized-view-refresh-task-run \
    --catalog-id <ACCOUNT_ID> \
    --database-name <DATABASE_NAME> \
    --table-name <MV_TABLE_NAME>
```

### Checking refresh status
<a name="materialized-views-checking-refresh-status"></a>

Get the status of a specific refresh job:

```
aws glue get-materialized-view-refresh-task-run \
    --catalog-id <CATALOG_ID> \
    --materialized-view-refresh-task-run-id <TASK_RUN_ID>
```

### Listing refresh history
<a name="materialized-views-listing-refresh-history"></a>

View all refresh jobs for a materialized view:

```
aws glue list-materialized-view-refresh-task-runs \
    --catalog-id <CATALOG_ID> \
    --database-name <DATABASE_NAME> \
    --table-name <MV_TABLE_NAME>
```

**Note**  
Use `<ACCOUNT_ID>:s3tablescatalog/<CATALOG_NAME>` as the catalog ID for S3 Tables catalogs, or `<ACCOUNT_ID>` for the root catalog.
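As a quick sketch, the two `--catalog-id` forms can be assembled from your account ID and table bucket catalog name (the values below are hypothetical):

```shell
# Hypothetical account ID and catalog name for illustration.
ACCOUNT_ID="123456789012"
CATALOG_NAME="analytics"

# S3 Tables catalog identifier
S3TABLES_CATALOG_ID="${ACCOUNT_ID}:s3tablescatalog/${CATALOG_NAME}"
# Root (default) catalog identifier
ROOT_CATALOG_ID="${ACCOUNT_ID}"

echo "$S3TABLES_CATALOG_ID"   # 123456789012:s3tablescatalog/analytics
```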

### Stopping a running refresh
<a name="materialized-views-stopping-refresh"></a>

Cancel an in-progress refresh job:

```
aws glue stop-materialized-view-refresh-task-run \
    --catalog-id <CATALOG_ID> \
    --database-name <DATABASE_NAME> \
    --table-name <MV_TABLE_NAME>
```

## Monitoring and troubleshooting
<a name="materialized-views-monitoring-troubleshooting"></a>

You can monitor materialized view refresh jobs in the following ways:

### CloudWatch Metrics
<a name="materialized-views-cloudwatch-metrics"></a>

View aggregated metrics for all your materialized view refresh jobs in CloudWatch:

Available Metrics:
+ AWS/Glue namespace with dimensions:
  + CatalogId: Your catalog identifier
  + DatabaseName: Database containing the materialized view
  + TableName: Materialized view name
  + TaskType: Set to "MaterializedViewRefresh"

Viewing in Console:

1. Navigate to CloudWatch Console → Metrics

1. Select AWS/Glue namespace

1. Filter by dimensions: CatalogId, DatabaseName, TableName, TaskType

1. View metrics for job success, failure, and duration

Example CloudWatch Metrics Query:

```
{AWS/Glue,CatalogId,DatabaseName,TableName,TaskType} MaterializedViewRefresh
```

Using AWS CLI:

```
aws cloudwatch get-metric-statistics \
    --namespace AWS/Glue \
    --metric-name <MetricName> \
    --dimensions Name=CatalogId,Value=<CATALOG_ID> \
                 Name=DatabaseName,Value=<DATABASE_NAME> \
                 Name=TableName,Value=<TABLE_NAME> \
                 Name=TaskType,Value=MaterializedViewRefresh \
    --start-time <START_TIME> \
    --end-time <END_TIME> \
    --period 3600 \
    --statistics Sum \
    --region <REGION>
```

### CloudWatch Logs
<a name="materialized-views-cloudwatch-logs"></a>

View detailed execution logs for individual refresh task runs:

Log Group: `/aws-glue/materialized-views/<task_run_id>`

Where `<task_run_id>` is a UUID (e.g., abc12345-def6-7890-ghij-klmnopqrstuv).

Viewing Logs:

```
# List log streams for a task run
aws logs describe-log-streams \
    --log-group-name /aws-glue/materialized-views/<TASK_RUN_ID> \
    --region <REGION>

# Get log events
aws logs get-log-events \
    --log-group-name /aws-glue/materialized-views/<TASK_RUN_ID> \
    --log-stream-name <LOG_STREAM_NAME> \
    --region <REGION>
```

In CloudWatch Console:

1. Navigate to CloudWatch → Log groups

1. Search for /aws-glue/materialized-views/

1. Select the log group with your task run ID

1. View detailed execution logs, errors, and Spark job output

### Notifications
<a name="materialized-views-eventbridge"></a>

Subscribe to Amazon EventBridge events for real-time notifications about refresh job state changes:

Available Event Types:
+ Glue Materialized View Refresh Task Started
+ Glue Materialized View Refresh Task Succeeded
+ Glue Materialized View Refresh Task Failed
+ Glue Materialized View Auto-Refresh Invocation Failure

Creating an EventBridge rule:

```
aws events put-rule \
    --name materialized-view-refresh-notifications \
    --event-pattern '{
        "source": ["aws.glue"],
        "detail-type": [
            "Glue Materialized View Refresh Task Started",
            "Glue Materialized View Refresh Task Succeeded",
            "Glue Materialized View Refresh Task Failed",
            "Glue Materialized View Auto-Refresh Invocation Failure"
        ]
    }' \
    --region <REGION>
```

Adding a Target (e.g., SNS Topic):

```
aws events put-targets \
    --rule materialized-view-refresh-notifications \
    --targets "Id"="1","Arn"="arn:aws:sns:<REGION>:<ACCOUNT_ID>:<TOPIC_NAME>" \
    --region <REGION>
```
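EventBridge also needs permission to publish to the target SNS topic. A sketch of a resource policy you can attach to the topic (the topic ARN placeholders match the target above):

```shell
# Allow EventBridge to publish to the SNS topic.
aws sns set-topic-attributes \
    --topic-arn arn:aws:sns:<REGION>:<ACCOUNT_ID>:<TOPIC_NAME> \
    --attribute-name Policy \
    --attribute-value '{
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "events.amazonaws.com"},
        "Action": "sns:Publish",
        "Resource": "arn:aws:sns:<REGION>:<ACCOUNT_ID>:<TOPIC_NAME>"
      }]
    }'
```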

### Viewing refresh status
<a name="materialized-views-refresh-status"></a>

Check the status of your materialized view refresh jobs using the AWS Glue API:

```
aws glue get-materialized-view-refresh-task-run \
    --catalog-id <CATALOG_ID> \
    --materialized-view-refresh-task-run-id <TASK_RUN_ID> \
    --region <REGION>
```

Or list all recent refresh runs:

```
aws glue list-materialized-view-refresh-task-runs \
    --catalog-id <CATALOG_ID> \
    --database-name <DATABASE_NAME> \
    --table-name <MV_TABLE_NAME> \
    --region <REGION>
```

This shows:
+ Last refresh time
+ Refresh status (SUCCEEDED, FAILED, RUNNING, STOPPED)
+ Task run ID
+ Error messages (if failed)

Common Refresh States:
+ RUNNING: Refresh job is currently executing
+ SUCCEEDED: Refresh completed successfully
+ FAILED: Refresh encountered an error
+ STOPPED: Refresh was manually cancelled
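If you script around these states, a simple polling loop can wait for a terminal state. This is a sketch: the `TaskRun.Status` response path used with `--query` is an assumption, so check it against the actual API response shape:

```shell
# Poll a refresh task run until it reaches a terminal state.
# The TaskRun.Status field path is an assumption.
while true; do
  STATUS=$(aws glue get-materialized-view-refresh-task-run \
      --catalog-id <CATALOG_ID> \
      --materialized-view-refresh-task-run-id <TASK_RUN_ID> \
      --query 'TaskRun.Status' --output text)
  case "$STATUS" in
    SUCCEEDED|FAILED|STOPPED) echo "Terminal state: $STATUS"; break ;;
    *) sleep 30 ;;
  esac
done
```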

Troubleshooting Failed Refreshes:

If a refresh fails, check:

1. IAM Permissions: Ensure the definer role has access to all base tables and the materialized view location

1. Base Table Availability: Verify all referenced tables exist and are accessible

1. Query Validity: Confirm the SQL query is valid for Spark SQL dialect

1. Resource Limits: Check if you've reached concurrent refresh limits for your account

Use the GetMaterializedViewRefreshTaskRun API to retrieve detailed error messages.

## Considerations and limitations
<a name="materialized-views-considerations-limitations"></a>
+ Materialized views can only reference Apache Iceberg tables registered in the AWS Glue Data Catalog as base tables.
+ View creation and automatic query rewrite are available only from Spark engines in Apache Spark version 3.5.6 and above across Amazon Athena, Amazon EMR, and AWS Glue (Version 5.1).
+ Materialized views are eventually consistent with base tables. During the refresh window, queries directly accessing the materialized view may return outdated data. For immediate access to current data, execute a manual refresh.
+ The minimum automatic refresh interval is one hour. For use cases requiring more frequent updates, execute manual refreshes programmatically using the `REFRESH MATERIALIZED VIEW` command.
+ Query rewrite prioritizes correctness over performance. If a materialized view is stale or cannot satisfy query requirements accurately, Spark engines execute the original query against base tables.

# Importing data using workflows in Lake Formation
<a name="workflows"></a>

With AWS Lake Formation, you can import your data using *workflows*. A workflow defines the data source and schedule to import data into your data lake. It is a container for AWS Glue crawlers, jobs, and triggers that are used to orchestrate the processes to load and update the data lake. 

**Topics**
+ [Blueprints and workflows in Lake Formation](workflows-about.md)
+ [Creating a workflow](workflows-creating.md)
+ [Running a workflow](workflows-running.md)

# Blueprints and workflows in Lake Formation
<a name="workflows-about"></a>

A workflow encapsulates a complex multi-job extract, transform, and load (ETL) activity. Workflows generate AWS Glue crawlers, jobs, and triggers to orchestrate the loading and update of data. Lake Formation executes and tracks a workflow as a single entity. You can configure a workflow to run on demand or on a schedule.

**Note**  
Spark parquet writer doesn't support special characters in column names. This is a technical limitation of the writer itself, not a configuration issue.

Workflows that you create in Lake Formation are visible in the AWS Glue console as a directed acyclic graph (DAG). Each DAG node is a job, crawler, or trigger. To monitor progress and troubleshoot, you can track the status of each node in the workflow.

When a Lake Formation workflow has completed, the user who ran the workflow is granted the Lake Formation `SELECT` permission on the Data Catalog tables that the workflow creates. 

You can also create workflows in AWS Glue. However, because Lake Formation enables you to create a workflow from a blueprint, creating workflows is much simpler and more automated in Lake Formation. Lake Formation provides the following types of blueprints:
+ **Database snapshot** – Loads or reloads data from all tables into the data lake from a JDBC source. You can exclude some data from the source based on an exclude pattern.
+ **Incremental database** – Loads only new data into the data lake from a JDBC source, based on previously set bookmarks. You specify the individual tables in the JDBC source database to include. For each table, you choose the bookmark columns and bookmark sort order to keep track of data that has previously been loaded. The first time that you run an incremental database blueprint against a set of tables, the workflow loads all data from the tables and sets bookmarks for the next incremental database blueprint run. You can therefore use an incremental database blueprint instead of the database snapshot blueprint to load all data, provided that you specify each table in the data source as a parameter.
+ **Log file** – Bulk loads data from log file sources, including AWS CloudTrail, Elastic Load Balancing logs, and Application Load Balancer logs.

Use the following table to help decide whether to use a database snapshot or incremental database blueprint.


| Use database snapshot when... | Use incremental database when... | 
| --- | --- | 
|  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/workflows-about.html)  |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/lake-formation/latest/dg/workflows-about.html)  | 

**Note**  
Users cannot edit blueprints and workflows created by Lake Formation.

# Creating a workflow
<a name="workflows-creating"></a>

Before you start, ensure that you have granted the required data permissions and data location permissions to the role `LakeFormationWorkflowRole`. This is so the workflow can create metadata tables in the Data Catalog and write data to target locations in Amazon S3. For more information, see [(Optional) Create an IAM role for workflows](initial-lf-config.md#iam-create-blueprint-role) and [Overview of Lake Formation permissions](lf-permissions-overview.md).

**Note**  
Lake Formation uses `GetTemplateInstance`, `GetTemplateInstances`, and `InstantiateTemplate` operations to create workflows from blueprints. These operations are not publicly available, and are used only internally for creating resources on your behalf. You receive CloudTrail events for creating workflows.

**To create a workflow from a blueprint**

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as the data lake administrator or as a user who has data engineer permissions. For more information, see [Lake Formation personas and IAM permissions reference](permissions-reference.md).

1. In the navigation pane, choose **Blueprints**, and then choose **Use blueprint**.

1. On the **Use a blueprint** page, choose a tile to select the blueprint type.

1. Under **Import source**, specify the data source. 

   If you are importing from a JDBC source, specify the following:
   + **Database connection** – Choose a connection from the list. Create additional connections using the AWS Glue console. The JDBC user name and password in the connection determine the database objects that the workflow has access to.
   + **Source data path** – Enter *<database>*/*<schema>*/*<table>* or *<database>*/*<table>*, depending on the database product. Oracle Database and MySQL don’t support schema in the path. You can substitute the percent (%) character for *<schema>* or *<table>*. For example, for an Oracle database with a system identifier (SID) of `orcl`, enter `orcl/%` to import all tables that the user named in the connection has access to.
**Important**  
This field is case sensitive. The workflow will fail if there is a case mismatch for any of the components.

     If you specify a MySQL database, AWS Glue ETL uses the MySQL 5 JDBC driver by default, so MySQL 8 is not natively supported. You can edit the ETL job script to specify a `customJdbcDriverS3Path` parameter, as described in [JDBC connectionType Values](https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-connect.html#aws-glue-programming-etl-connect-jdbc) in the *AWS Glue Developer Guide*, to use a different JDBC driver that supports MySQL 8.

   If you are importing from a log file, ensure that the role that you specify for the workflow (the "workflow role") has the required IAM permissions to access the data source. For example, to import AWS CloudTrail logs, the user must have the `cloudtrail:DescribeTrails` and `cloudtrail:LookupEvents` permissions to see the list of CloudTrail logs while creating the workflow, and the workflow role must have permissions on the CloudTrail location in Amazon S3.

1. Do one of the following:
   + For the **Database snapshot** blueprint type, optionally identify a subset of data to import by specifying one or more exclude patterns. These exclude patterns are Unix-style `glob` patterns. They are stored as a property of the tables that are created by the workflow.

     For details on the available exclude patterns, see [Include and Exclude Patterns](https://docs.aws.amazon.com/glue/latest/dg/define-crawler.html#crawler-data-stores-exclude) in the *AWS Glue Developer Guide*.
   + For the **Incremental database** blueprint type, specify the following fields. Add a row for each table to import.  
**Table name**  
Table to import. Must be all lower case.  
**Bookmark keys**  
Comma-delimited list of column names that define the bookmark keys. If blank, the primary key is used to determine new data. Case for each column must match the case as defined in the data source.  
The primary key qualifies as the default bookmark key only if it is sequentially increasing or decreasing (with no gaps). If you want to use the primary key as the bookmark key and it has gaps, you must name the primary key column as a bookmark key.  
**Bookmark order**  
When you choose **Ascending**, rows with values greater than bookmarked values are identified as new rows. When you choose **Descending**, rows with values less than bookmarked values are identified as new rows.  
**Partitioning scheme**  
(Optional) List of partitioning key columns, delimited by slashes (/). Example: `year/month/day`.  
![\[The Incremental data section of the console includes these fields: Table name, Bookmark keys, Bookmark order, Partitioning scheme. You can add or remove rows, where each row is for a different table.\]](http://docs.aws.amazon.com/lake-formation/latest/dg/images/incremental-data.png)

     For more information, see [Tracking Processed Data Using Job Bookmarks](https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html) in the *AWS Glue Developer Guide*.

1. Under **Import target**, specify the target database, target Amazon S3 location, and data format.

   Ensure that the workflow role has the required Lake Formation permissions on the database and Amazon S3 target location.
**Note**  
Currently, blueprints do not support encrypting data at the target.

1. Choose an import frequency.

   You can specify a `cron` expression with the **Custom** option.
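
   The **Custom** option uses the AWS Glue cron syntax: six fields (minutes, hours, day-of-month, month, day-of-week, year), where either the day-of-month or day-of-week field must be `?`. For example, a sketch of an expression that imports daily at 12:15 UTC:

   ```
   cron(15 12 * * ? *)
   ```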

1. Under **Import options**:

   1. Enter a workflow name.

   1. For role, choose the role `LakeFormationWorkflowRole`, which you created in [(Optional) Create an IAM role for workflows](initial-lf-config.md#iam-create-blueprint-role). 

   1. Optionally specify a table prefix. The prefix is prepended to the names of Data Catalog tables that the workflow creates.

1. Choose **Create**, and wait for the console to report that the workflow was successfully created.
**Tip**  
Did you get the following error message?  
`User: arn:aws:iam::<account-id>:user/<username> is not authorized to perform: iam:PassRole on resource:arn:aws:iam::<account-id>:role/<rolename>...`  
If so, check that you replaced *<account-id>* with a valid AWS account number in all policies.

**See also:**  
[Blueprints and workflows in Lake Formation](workflows-about.md)

# Running a workflow
<a name="workflows-running"></a>

You can run a workflow using the Lake Formation console, the AWS Glue console, the AWS Command Line Interface (AWS CLI), or the AWS Glue API.

**To run a workflow (Lake Formation console)**

1. Open the AWS Lake Formation console at [https://console.aws.amazon.com/lakeformation/](https://console.aws.amazon.com/lakeformation/). Sign in as the data lake administrator or as a user who has data engineer permissions. For more information, see [Lake Formation personas and IAM permissions reference](permissions-reference.md).

1. In the navigation pane, choose **Blueprints**.

1. On the **Blueprints** page, select the workflow. Then on the **Actions** menu, choose **Start**.

1. As the workflow runs, view its progress in the **Last run status** column. Choose the refresh button occasionally.

   The status goes from **RUNNING** to **Discovering** to **Importing** to **COMPLETED**.

   When the workflow is complete:
   + The Data Catalog has new metadata tables.
   + Your data is ingested into the data lake.

   If the workflow fails, do the following:

   1. Select the workflow. Choose **Actions**, and then choose **View graph**.

      The workflow opens in the AWS Glue console.

   1. Ensure that the workflow is selected, and choose the **History** tab.

   1. Under **History**, select the most recent run and choose **View run details**.

   1. Select a failed job or crawler in the dynamic (runtime) graph, and review the error message. Failed nodes are either red or yellow.

**See also:**  
[Blueprints and workflows in Lake Formation](workflows-about.md)