S3 path-based access using Lake Formation for Amazon EMR Spark
With Amazon EMR releases 7.13.0 and higher, you can use the AWS Lake Formation Access Grants Plugin to obtain temporary credentials for S3 paths registered with Lake Formation. This enables Spark queries that directly reference S3 paths to use Lake Formation-vended credentials, in addition to the existing table-name based credential vending provided by Full Table Access (FTA).
The Lake Formation Access Grants Plugin integrates with the S3A filesystem in Spark. When enabled, S3A uses the plugin to call the Lake Formation GetTemporaryDataLocationCredentials API to obtain temporary credentials for S3 paths that belong to Lake Formation-registered tables.
This plugin is useful when your Spark jobs read or write data by referencing S3 paths directly (for example, spark.read.parquet("s3a://my-bucket/my-table/")) rather than using table names. The plugin also supports optional fallback to S3 Access Grants or IAM role credentials when Lake Formation access is denied.
Relationship to Full Table Access (FTA)
The existing Full Table Access feature uses the GetTemporaryGlueTableCredentials API, which vends credentials based on table names. The Lake Formation Access Grants Plugin uses the GetTemporaryDataLocationCredentials API, which vends credentials based on S3 paths.
You can use both features together. When both are enabled, FTA credentials take precedence for table-name queries, while the plugin handles direct S3 path queries.
Prerequisites
Before you use the Lake Formation Access Grants Plugin, complete the following steps:
- Set up Full Table Access (FTA) for your Amazon EMR environment. For instructions, see .
- In the Lake Formation console, register the S3 path with the owner account:
  - Navigate to Data lake locations.
  - Select the location and enable Register S3 path with Owner Account.
- Grant the runtime role the appropriate Lake Formation permissions on the table:
  - To read table data, grant the runtime role full SELECT permission on the table.
  - To modify table data, grant the runtime role SUPER permission on the table.
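The permission grants in the last prerequisite can also be scripted. The following is a minimal sketch, not an official procedure: it only builds the keyword arguments for the Lake Formation GrantPermissions API. The role ARN, database name, and table name are hypothetical placeholders, and the sketch assumes that the console's SUPER permission corresponds to ALL in the API.

```python
from typing import Any, Dict

# Assumption: the console "Super" permission is expressed as ALL in the API.
_PERMISSIONS_BY_INTENT = {"read": ["SELECT"], "modify": ["ALL"]}


def build_grant_request(
    role_arn: str, database: str, table: str, intent: str
) -> Dict[str, Any]:
    """Build keyword arguments for a Lake Formation GrantPermissions call."""
    if intent not in _PERMISSIONS_BY_INTENT:
        raise ValueError(f"intent must be one of {sorted(_PERMISSIONS_BY_INTENT)}")
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": _PERMISSIONS_BY_INTENT[intent],
    }


# Hypothetical runtime role, database, and table names.
request = build_grant_request(
    "arn:aws:iam::111122223333:role/emr-runtime-role",
    "sales_db",
    "orders",
    "read",
)
print(request)
```

With boto3 installed and credentials configured, the resulting dictionary could be passed as boto3.client("lakeformation").grant_permissions(**request). Building the request separately keeps the sketch runnable without AWS access.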
Enable the Lake Formation Access Grants Plugin
To enable the plugin, set the following S3A configuration properties in your Spark session:
| Property | Default | Description |
|---|---|---|
| fs.s3a.lakeformation.access.grants.enabled | FALSE | Enables the Lake Formation Access Grants Plugin. |
| fs.s3a.lakeformation.access.grants.fallback.to.iam | FALSE | Enables fallback to S3 Access Grants or IAM role credentials if Lake Formation denies access. |
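As an illustration, the two properties can be assembled into Spark configuration entries. This is a minimal sketch under one assumption: that the properties are passed with Spark's spark.hadoop. prefix, which copies a property into the Hadoop configuration that S3A reads. The helper names plugin_spark_conf and spark_submit_args are hypothetical.

```python
from typing import Dict, List

PLUGIN_ENABLED = "fs.s3a.lakeformation.access.grants.enabled"
FALLBACK_TO_IAM = "fs.s3a.lakeformation.access.grants.fallback.to.iam"


def plugin_spark_conf(fallback_to_iam: bool = False) -> Dict[str, str]:
    """Return Spark properties that enable the plugin.

    Assumption: the spark.hadoop. prefix is used so that Spark forwards
    each property to the Hadoop (S3A) configuration.
    """
    return {
        f"spark.hadoop.{PLUGIN_ENABLED}": "true",
        f"spark.hadoop.{FALLBACK_TO_IAM}": str(fallback_to_iam).lower(),
    }


def spark_submit_args(conf: Dict[str, str]) -> List[str]:
    """Render the properties as --conf arguments for spark-submit."""
    args: List[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


conf = plugin_spark_conf(fallback_to_iam=True)
print(" ".join(spark_submit_args(conf)))
```

In a PySpark script, the same key-value pairs could instead be applied with SparkSession.builder.config(key, value) before calling getOrCreate(), so that later path reads such as spark.read.parquet("s3a://my-bucket/my-table/") run with the plugin enabled.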
Fallback behavior
When fs.s3a.lakeformation.access.grants.fallback.to.iam is set to true, the plugin uses a fallback chain if Lake Formation denies access. This is useful in scenarios where some S3 paths are registered with Lake Formation and others are managed through S3 Access Grants or IAM policies.
When fallback is enabled, the plugin attempts to obtain credentials in the following order:
- Lake Formation: The plugin calls GetTemporaryDataLocationCredentials. If Lake Formation grants access, the plugin returns those credentials.
- S3 Access Grants: If Lake Formation denies access, the plugin checks whether S3 Access Grants can provide credentials for the requested path. If an S3 Access Grants instance covers the path and the caller has a matching grant, the plugin uses those credentials.
- IAM role: If both Lake Formation and S3 Access Grants deny access, the plugin falls back to the IAM role credentials attached to the job.
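The ordering above can be modeled as a simple first-match chain. The following sketch only illustrates the described order and is not the plugin's implementation: the three provider functions, the bucket names, and the credential strings are hypothetical stand-ins, and a provider signals "access denied" by returning None.

```python
from typing import Callable, List, Optional, Tuple

# A provider returns credentials (a plain string here) or None if access is denied.
Provider = Callable[[str], Optional[str]]


def lake_formation(path: str) -> Optional[str]:
    # Stand-in: only paths under this hypothetical bucket are registered.
    return "lf-creds" if path.startswith("s3a://registered-bucket/") else None


def s3_access_grants(path: str) -> Optional[str]:
    # Stand-in: only paths under this hypothetical bucket have a matching grant.
    return "s3ag-creds" if path.startswith("s3a://granted-bucket/") else None


def iam_role(path: str) -> Optional[str]:
    # Stand-in: the job's IAM role credentials are always available.
    return "iam-role-creds"


CHAIN: List[Tuple[str, Provider]] = [
    ("Lake Formation", lake_formation),
    ("S3 Access Grants", s3_access_grants),
    ("IAM role", iam_role),
]


def resolve_credentials(
    path: str, chain: List[Tuple[str, Provider]], fallback_enabled: bool
) -> Tuple[str, str]:
    """Return (source name, credentials) from the first provider that grants access."""
    # Without fallback, only the first provider (Lake Formation) is consulted.
    candidates = chain if fallback_enabled else chain[:1]
    for name, provider in candidates:
        credentials = provider(path)
        if credentials is not None:
            return name, credentials
    raise PermissionError(f"no credentials available for {path}")


print(resolve_credentials("s3a://granted-bucket/data/", CHAIN, fallback_enabled=True))
```

With fallback disabled, the same call raises PermissionError for any path that the Lake Formation stand-in rejects, which mirrors the default value of FALSE for the fallback property.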
Considerations and limitations
- The plugin supports Apache Hive, Apache Hudi, and Delta Lake table formats. Apache Iceberg is not currently supported.
- The plugin is not currently supported with Amazon EMR Spark Fine-Grained Access Control (FGAC) mode.
- Path-based credential vending requires a 1:1 mapping between a table and its S3 location in the AWS Glue Data Catalog.
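One way to check the 1:1 mapping requirement ahead of time is to look for tables that share a location. The sketch below assumes you have already collected a mapping of table names to S3 locations (for example, from the Data Catalog); the table names and paths are hypothetical. It interprets "1:1" narrowly as "no two tables point to the same location" and does not detect nested prefixes, so treat it as a starting point.

```python
from collections import defaultdict
from typing import Dict, List


def normalize(location: str) -> str:
    """Treat locations that differ only by a trailing slash as the same."""
    return location.rstrip("/") + "/"


def shared_locations(table_locations: Dict[str, str]) -> Dict[str, List[str]]:
    """Return each S3 location that is used by more than one table."""
    by_location: Dict[str, List[str]] = defaultdict(list)
    for table, location in table_locations.items():
        by_location[normalize(location)].append(table)
    return {
        location: sorted(tables)
        for location, tables in by_location.items()
        if len(tables) > 1
    }


# Hypothetical catalog contents.
tables = {
    "sales_db.orders": "s3://my-bucket/orders/",
    "sales_db.orders_copy": "s3://my-bucket/orders",
    "sales_db.customers": "s3://my-bucket/customers/",
}
conflicts = shared_locations(tables)
print(conflicts)
```

An empty result means that every table in the mapping has its own location under this narrow interpretation.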