Enable Lake Formation with Feature Groups
When you enable AWS Lake Formation on a feature group in Amazon SageMaker Feature Store, you can enforce column-level, row-level, and cell-level security for the feature data in your offline store. Instead of managing access through individual IAM policies on Amazon S3 and AWS Glue resources, you use the Lake Formation grant and revoke permission model to control which users and roles can access specific features and records. For more information about Lake Formation, see the AWS Lake Formation Developer Guide.
Important
Lake Formation access control applies to the offline store only. The offline store is backed by Amazon S3 and registered in the AWS Glue Data Catalog, which Lake Formation governs. Online store access continues to be controlled through IAM policies.
To set up Lake Formation, you use the FeatureGroupManager and LakeFormationConfig classes from the SageMaker AI Python SDK (sagemaker.mlops.feature_store). Lake Formation supports hybrid access mode, which allows both IAM policies and Lake Formation permissions to coexist during a gradual migration.
Prerequisites
Before you enable Lake Formation, verify that you have the following:
-
A SageMaker AI feature group with an offline store configured. If you don't have one, you can create a new feature group with an offline store as part of the setup. Lake Formation requires an offline store because it governs access through the AWS Glue Data Catalog table that the offline store creates.
-
An IAM execution role with appropriate permissions. The following example shows the minimum IAM policy required. Replace the placeholder values with your own.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "SageMakerFeatureGroupOperations",
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreateFeatureGroup",
                    "sagemaker:DescribeFeatureGroup"
                ],
                "Resource": "arn:aws:sagemaker:*:*:feature-group/*"
            },
            {
                "Sid": "LakeFormation",
                "Effect": "Allow",
                "Action": [
                    "lakeformation:RegisterResource",
                    "lakeformation:GrantPermissions",
                    "lakeformation:RevokePermissions",
                    "lakeformation:ListPermissions"
                ],
                "Resource": "*"
            },
            {
                "Sid": "GlueCatalogReadAccess",
                "Effect": "Allow",
                "Action": [
                    "glue:GetTable",
                    "glue:GetDatabase"
                ],
                "Resource": [
                    "arn:aws:glue:*:*:catalog",
                    "arn:aws:glue:*:*:database/sagemaker_featurestore",
                    "arn:aws:glue:*:*:table/sagemaker_featurestore/*"
                ]
            },
            {
                "Sid": "GlueCatalogTableCreate",
                "Effect": "Allow",
                "Action": [
                    "glue:CreateTable"
                ],
                "Resource": [
                    "arn:aws:glue:*:*:catalog",
                    "arn:aws:glue:*:*:database/sagemaker_featurestore",
                    "arn:aws:glue:*:*:table/sagemaker_featurestore/*"
                ]
            },
            {
                "Sid": "PassOfflineStoreRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": "arn:aws:iam::<account-id>:role/<offline-store-role-name>"
            },
            {
                "Sid": "S3FeatureStoreStorage",
                "Effect": "Allow",
                "Action": [
                    "s3:GetBucketAcl",
                    "s3:GetBucketLocation",
                    "s3:ListBucket"
                ],
                "Resource": "arn:aws:s3:::<offline-store-bucket-name>"
            },
            {
                "Sid": "LakeFormationRegistrationRole",
                "Effect": "Allow",
                "Action": [
                    "iam:GetRole",
                    "iam:GetRolePolicy"
                ],
                "Resource": "arn:aws:iam::<account-id>:role/<registration-role-name>"
            }
        ]
    }
Note
The LakeFormationRegistrationRole statement grants permissions to read the role used to register the Amazon S3 location with Lake Formation. If you use the Lake Formation service-linked role (use_service_linked_role=True), set the resource to arn:aws:iam::<account-id>:role/aws-service-role/lakeformation.amazonaws.com/AWSServiceRoleForLakeFormationDataAccess. If you provide your own registration role, set it to that role's ARN.
Note
If hybrid access mode is disabled on the sagemaker_featurestore database, the caller must also have the Lake Formation CREATE_TABLE permission on the database. The Lake Formation administrator can grant this permission through the Lake Formation console or API.
-
A Lake Formation administrator configured in your account. You must designate at least one IAM user or role as a Lake Formation administrator. For setup instructions, see Getting started with Lake Formation in the Lake Formation documentation.
-
The SageMaker AI Python SDK version 3.8.0 or later. Install or upgrade the sagemaker package. Quote the requirement so that the shell does not interpret >= as an output redirection:
    pip install --upgrade "sagemaker>=3.8.0"
-
Cross-account configuration (if applicable). If the feature group's offline store Amazon S3 bucket is in a different AWS account, additional Lake Formation cross-account sharing configuration is required, including AWS Glue resource policies, AWS RAM share acceptance, resource links, and consumer database setup. For more information, see Sharing data across accounts in the Lake Formation documentation.
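To confirm the SDK version prerequisite before you run the setup, you can check the installed package version from Python. The following is a minimal sketch that uses only the standard library. The helper names are hypothetical, and the version parsing is a simplification that ignores pre-release suffixes.

```python
from importlib import metadata


def parse_version(text):
    """Turn '3.8.0' into (3, 8, 0); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum="3.8.0"):
    """Return True when the installed version is at least the minimum."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that '3.8' and '3.8.0' compare as equal.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def installed_sagemaker_version():
    """Return the installed sagemaker version string, or None if absent."""
    try:
        return metadata.version("sagemaker")
    except metadata.PackageNotFoundError:
        return None
```

For example, meets_minimum(installed_sagemaker_version() or "0") returns False when the package is missing or older than 3.8.0.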
Key concepts
The following concepts are important for understanding how Lake Formation works with Feature Store.
Lake Formation permission model compared to IAM-only model
By default, access to AWS Glue Data Catalog tables, including those created by Feature Store, is controlled
through IAM policies alone. When you enable Lake Formation, access requires both
IAM permissions and Lake Formation permissions. Lake Formation uses a grant and revoke
model where you explicitly grant permissions, such as SELECT or
DESCRIBE, on specific databases, tables, or columns to IAM principals.
Hybrid access mode
When you enable Lake Formation, you choose whether to use hybrid access mode or Lake Formation-only mode:
-
Hybrid access mode (hybrid_access_mode_enabled=True): Both IAM policies and Lake Formation permissions are evaluated. Principals that have access through existing IAM policies continue to have access, and you can additionally grant fine-grained access through Lake Formation. This is useful for gradual migration.
-
Lake Formation-only mode (hybrid_access_mode_enabled=False): Only Lake Formation permissions are evaluated. Existing IAM-based access to the AWS Glue table is revoked. This provides the strongest access control but can break existing workloads.
Warning
When you set hybrid_access_mode_enabled=False, the SDK revokes the
IAMAllowedPrincipal grant on the AWS Glue table. Any existing jobs, notebooks,
or pipelines that access this table through IAM permissions alone immediately lose
access. Verify that you have granted the necessary Lake Formation permissions to all
principals that need access before you disable hybrid access mode.
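One way to act on this warning is to list the existing Lake Formation grants on the table and compare them with the principals that you expect to need access. The following is a rough sketch, not part of the SageMaker AI SDK. It assumes a boto3 Lake Formation client (boto3.client("lakeformation")) passed in as lf_client, uses the ListPermissions API, and checks only for an explicit SELECT grant. The helper names are hypothetical.

```python
def principals_with_permission(lf_client, database, table, permission="SELECT"):
    """Collect principal identifiers that hold `permission` on the table."""
    found = set()
    kwargs = {"Resource": {"Table": {"DatabaseName": database, "Name": table}}}
    while True:
        page = lf_client.list_permissions(**kwargs)
        for entry in page.get("PrincipalResourcePermissions", []):
            if permission in entry.get("Permissions", []):
                found.add(entry["Principal"]["DataLakePrincipalIdentifier"])
        token = page.get("NextToken")
        if not token:
            return found
        # Continue with the next page of results.
        kwargs["NextToken"] = token


def missing_principals(lf_client, database, table, required):
    """Return the required principals that have no SELECT grant yet."""
    granted = principals_with_permission(lf_client, database, table)
    return sorted(set(required) - granted)
```

If missing_principals() returns a non-empty list, grant those principals Lake Formation permissions before you set hybrid_access_mode_enabled=False.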
Note
You must disable hybrid access mode for cross-account access when the table format is Iceberg.
S3 deny policy
Even after you enable Lake Formation on the AWS Glue Data Catalog table, users with direct Amazon S3 access (through IAM policies) can bypass Lake Formation by reading the underlying Amazon S3 objects directly. To close this gap, apply an Amazon S3 bucket policy that denies direct access to the offline store prefix for all principals except the Lake Formation service role and the Feature Store execution role.
Important
The SDK does not automatically apply the Amazon S3 deny policy. After you enable Lake Formation, the SDK logs a recommended bucket policy as a warning message. Review this policy and apply it to your Amazon S3 bucket to enforce access control end-to-end.
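As an illustration of what such a deny policy can look like, the following sketch builds a bucket policy document in Python. It is not the policy that the SDK logs; treat the SDK's recommended policy as the reference. The function name, the statement ID, the list of denied actions, and the use of the aws:PrincipalArn condition key with ArnNotEquals are assumptions made for this example.

```python
import json


def build_deny_policy(bucket, prefix, allowed_role_arns):
    """Deny object access under the prefix to every principal not listed."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyDirectOfflineStoreAccess",  # hypothetical name
                "Effect": "Deny",
                "Principal": "*",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
                "Condition": {
                    # Exempt the Lake Formation registration role and the
                    # Feature Store execution role from the deny.
                    "ArnNotEquals": {"aws:PrincipalArn": list(allowed_role_arns)}
                },
            }
        ],
    }


def policy_as_json(bucket, prefix, allowed_role_arns):
    """Serialize the policy for review before applying it to the bucket."""
    return json.dumps(build_deny_policy(bucket, prefix, allowed_role_arns), indent=2)
```

Review the generated document carefully before you apply it. A deny statement that omits a role you still depend on locks that role out of the offline store prefix.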
Configuration reference
Use the LakeFormationConfig class to configure Lake Formation. You
pass this configuration to FeatureGroupManager.create() when creating a new
feature group, or use the individual parameters directly with
enable_lake_formation() for existing feature groups.
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| enabled | bool | False | No | Set to True to activate Lake Formation on the feature group's offline store. |
| use_service_linked_role | bool | True | No | Whether to use the Lake Formation service-linked role for registering the S3 location. Set to False if you use a custom registration role. You cannot use a service-linked role for cross-account access or when using third-party query engines (such as Apache Spark). For third-party engines, you must provide your own registration role through registration_role_arn. |
| registration_role_arn | str | None | Conditional | The ARN of a custom IAM role for registering the offline store S3 location with Lake Formation. Required when use_service_linked_role is False. |
| hybrid_access_mode_enabled | bool | N/A | Yes | Whether to revoke IAMAllowedPrincipal from the AWS Glue table. False = Lake Formation-only permissions. True = hybrid mode (both IAM and Lake Formation). You must explicitly choose. |
| acknowledge_risk | bool | N/A | Yes | Must be True to proceed. This is a safety confirmation. Setting to False raises a RuntimeError before any operations are performed. |
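The rules in the preceding table can be summarized in code. The following sketch uses a hypothetical stand-in class, not the SDK's LakeFormationConfig, to show how the parameters constrain each other. The RuntimeError for acknowledge_risk comes from the table. The ValueError for a missing registration role is an assumption; the SDK might report that condition differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LakeFormationSettings:
    """Hypothetical mirror of the documented parameters, for illustration."""

    hybrid_access_mode_enabled: bool  # required, no default
    acknowledge_risk: bool            # required, no default
    enabled: bool = False
    use_service_linked_role: bool = True
    registration_role_arn: Optional[str] = None

    def validate(self):
        # Documented: False raises RuntimeError before any operation runs.
        if not self.acknowledge_risk:
            raise RuntimeError("acknowledge_risk must be True to proceed")
        # Documented as conditionally required; the exception type is assumed.
        if not self.use_service_linked_role and not self.registration_role_arn:
            raise ValueError(
                "registration_role_arn is required when "
                "use_service_linked_role is False"
            )
        return self
```

Because hybrid_access_mode_enabled and acknowledge_risk have no defaults in this sketch, omitting either one fails at construction time, which matches the requirement to choose explicitly.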
Understanding acknowledge_risk
The acknowledge_risk parameter is a safety gate. By setting it to
True, you acknowledge the following:
-
If hybrid_access_mode_enabled=False: Existing IAM-based jobs, notebooks, and pipelines that query this AWS Glue table lose access immediately. You must grant Lake Formation permissions to those principals before or shortly after enabling.
-
If Lake Formation permissions are not correctly configured: The data in the feature group might become inaccessible to all users until permissions are corrected.
-
The operation modifies AWS Glue Data Catalog permissions: These changes affect all consumers of the AWS Glue table, not just Feature Store users. Any Athena queries, Spark jobs, or other services that read from this table are affected.
How it works
When you enable Lake Formation, the SDK performs a three-phase setup:
-
Register the S3 location. The SDK registers the offline store's S3 path with Lake Formation using either the Lake Formation service-linked role or a custom registration role you specify. If you use third-party query engines (such as Apache Spark), you must provide your own registration role because the service-linked role does not support third-party engine access.
-
Grant Lake Formation permissions. The SDK grants SELECT, INSERT, DELETE, DESCRIBE, and ALTER permissions on the AWS Glue Data Catalog table to the feature group's execution role.
-
Optionally revoke IAMAllowedPrincipal. If you set hybrid_access_mode_enabled=False, the SDK revokes the IAMAllowedPrincipal grant on the AWS Glue table.
After all phases complete, the SDK logs a recommended S3 bucket deny policy. Review and apply this policy to prevent direct Amazon S3 access that bypasses Lake Formation access control.
Note
If the setup fails partway through (for example, after registering the S3 location but before granting permissions), you can safely rerun the enable_lake_formation() method. The method is idempotent and skips steps that have already completed successfully.
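To make the three phases concrete, the following sketch expresses them as calls to the Lake Formation API through a boto3 client that you pass in as lf_client. This is an approximation written for illustration, not the SDK's actual implementation. The function name and argument names are hypothetical, and the sketch omits the error handling and idempotency checks that the SDK performs.

```python
IAM_ALLOWED_PRINCIPALS = "IAM_ALLOWED_PRINCIPALS"  # Lake Formation API identifier


def enable_lake_formation_sketch(
    lf_client,
    s3_arn,
    database,
    table,
    execution_role_arn,
    hybrid_access_mode_enabled,
    registration_role_arn=None,
):
    table_resource = {"Table": {"DatabaseName": database, "Name": table}}

    # Phase 1: register the offline store S3 location.
    if registration_role_arn:
        lf_client.register_resource(ResourceArn=s3_arn, RoleArn=registration_role_arn)
    else:
        lf_client.register_resource(ResourceArn=s3_arn, UseServiceLinkedRole=True)

    # Phase 2: grant table permissions to the feature group's execution role.
    lf_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": execution_role_arn},
        Resource=table_resource,
        Permissions=["SELECT", "INSERT", "DELETE", "DESCRIBE", "ALTER"],
    )

    # Phase 3: in Lake Formation-only mode, remove the IAM-based grant.
    if not hybrid_access_mode_enabled:
        lf_client.revoke_permissions(
            Principal={"DataLakePrincipalIdentifier": IAM_ALLOWED_PRINCIPALS},
            Resource=table_resource,
            Permissions=["ALL"],
        )
```

In practice, call enable_lake_formation() rather than reproducing these steps yourself. The sketch is meant only to show which Lake Formation operations each phase corresponds to.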
Usage examples
Create a new feature group with Lake Formation access control
The following example creates a new feature group with an offline store and enables Lake Formation access control in a single operation.
    from sagemaker.core.helper.session_helper import Session
    from sagemaker.core.shapes import (
        FeatureDefinition,
        OfflineStoreConfig,
        OnlineStoreConfig,
        S3StorageConfig,
    )
    from sagemaker.mlops.feature_store import FeatureGroupManager, LakeFormationConfig

    session = Session()
    region = session.boto_region_name
    role_arn = "arn:aws:iam::<account-id>:role/<execution-role-name>"
    registration_role_arn = "arn:aws:iam::<account-id>:role/<registration-role-name>"
    fg_name = "my-feature-group"
    bucket = session.default_bucket()
    offline_s3_uri = f"s3://{bucket}/feature-store/{fg_name}"

    feature_definitions = [
        FeatureDefinition(feature_name="customer_id", feature_type="String"),
        FeatureDefinition(feature_name="event_time", feature_type="String"),
        FeatureDefinition(feature_name="age", feature_type="Integral"),
        FeatureDefinition(feature_name="total_purchases", feature_type="Integral"),
        FeatureDefinition(feature_name="avg_order_value", feature_type="Fractional"),
    ]

    fg = FeatureGroupManager.create(
        feature_group_name=fg_name,
        record_identifier_feature_name="customer_id",
        event_time_feature_name="event_time",
        feature_definitions=feature_definitions,
        online_store_config=OnlineStoreConfig(enable_online_store=False),
        offline_store_config=OfflineStoreConfig(
            s3_storage_config=S3StorageConfig(s3_uri=offline_s3_uri),
            table_format="Iceberg",
        ),
        role_arn=role_arn,
        description="A feature group with Lake Formation-managed offline store",
        # Example configuration; adjust parameters for your use case.
        lake_formation_config=LakeFormationConfig(
            enabled=True,
            use_service_linked_role=False,
            registration_role_arn=registration_role_arn,
            # Set to True to keep existing IAM-based access (hybrid mode).
            hybrid_access_mode_enabled=False,
            acknowledge_risk=True,
        ),
        region=region,
    )
Enable Lake Formation on an existing feature group
The following example enables Lake Formation on a feature group that already exists.
    from sagemaker.mlops.feature_store import FeatureGroupManager

    fg = FeatureGroupManager.get(feature_group_name="my-existing-feature-group")

    result = fg.enable_lake_formation(
        use_service_linked_role=False,
        registration_role_arn="arn:aws:iam::<account-id>:role/<registration-role-name>",
        hybrid_access_mode_enabled=False,
        acknowledge_risk=True,
    )
Use Lake Formation-enabled feature groups with SageMaker AI jobs
You can access Lake Formation-enabled feature groups from SageMaker AI training and processing jobs. To read Lake Formation-enabled feature data, you can either query the data through Athena or use the Lake Formation GetTemporaryGlueTableCredentials API to vend temporary Amazon S3 credentials scoped to the AWS Glue table. For more information about configuring Lake Formation permissions for compute roles, see Lake Formation permissions reference in the Lake Formation documentation.
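As an example of the Athena option, the following sketch starts a query against the feature group's AWS Glue table by using the Athena StartQueryExecution API. It assumes a boto3 Athena client (boto3.client("athena")) passed in as athena_client. The function name is hypothetical, and the database, table, column, and output location values are placeholders that you supply. Lake Formation returns only the columns and rows that the calling role is permitted to read.

```python
def start_feature_query(athena_client, database, table, columns, output_s3_uri, limit=10):
    """Start an Athena query over selected feature columns; return its ID."""
    column_list = ", ".join(f'"{name}"' for name in columns)
    query = f'SELECT {column_list} FROM "{database}"."{table}" LIMIT {int(limit)}'
    response = athena_client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_s3_uri},
    )
    return response["QueryExecutionId"]
```

The query fails with an access error if the job's role has not been granted Lake Formation SELECT permission on the requested columns.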
Cross-account access
Lake Formation supports sharing feature group data across AWS accounts. When the producer account enables Lake Formation on a feature group, it can grant cross-account access to consumer accounts using either the named resource method or Lake Formation tag-based access control (LF-TBAC).
Cross-account sharing requires the following setup:
-
The producer account grants cross-account permissions on the AWS Glue Data Catalog table to the consumer account, AWS Organization, or organizational unit.
-
If the accounts are not in the same AWS Organization, the consumer account accepts the AWS RAM resource share invitation. For more information, see Accepting a resource share invitation.
-
The consumer account creates a resource link to the shared table in a local database. Resource links are required for services such as Athena and Amazon Redshift Spectrum to query shared resources. For more information, see About resource links.
-
The Lake Formation administrator in the consumer account grants permissions on the resource link and the underlying shared table to the IAM principals that need access.
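The first and third steps above can be sketched with the named resource method as follows. The example assumes boto3 clients passed in as lf_client (Lake Formation, producer account) and glue_client (AWS Glue, consumer account). The function names and the choice of SELECT and DESCRIBE permissions are assumptions for illustration; the AWS RAM invitation and the consumer-side grants are not shown.

```python
def share_table_with_account(lf_client, database, table, consumer_account_id):
    """Producer account: grant table permissions to the consumer account."""
    lf_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": consumer_account_id},
        Resource={"Table": {"DatabaseName": database, "Name": table}},
        Permissions=["SELECT", "DESCRIBE"],
    )


def create_resource_link(
    glue_client, local_database, link_name, producer_account_id, shared_database, shared_table
):
    """Consumer account: create a resource link that targets the shared table."""
    glue_client.create_table(
        DatabaseName=local_database,
        TableInput={
            "Name": link_name,
            "TargetTable": {
                "CatalogId": producer_account_id,
                "DatabaseName": shared_database,
                "Name": shared_table,
            },
        },
    )
```

Each function runs with credentials for a different account, so in practice they are called from separate sessions.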
Important
If you use third-party query engines (such as Apache Spark), you must enable full table access for the shared table. Third-party engines require full table credential vending because they do not support Lake Formation's column-level or cell-level filtering through session tags. You must also register the Amazon S3 location with a custom registration role instead of the service-linked role. For more information, see Full table access for third-party engines.
For prerequisites, step-by-step instructions, and best practices for cross-account sharing, see the following topics in the AWS Lake Formation Developer Guide:
-
Cross-account access — Overview of cross-account sharing methods and AWS RAM integration.
-
Cross-account prerequisites — Required configuration for both producer and consumer accounts.
-
Sharing Data Catalog tables and databases — Step-by-step instructions for both the named resource method and LF-TBAC method.
-
Tag-based access control — Recommended for managing permissions at scale across multiple accounts.
Grant fine-grained access after setup
After you enable Lake Formation, the Lake Formation administrator can grant fine-grained permissions to other IAM principals using the Lake Formation console, API, or CLI. Lake Formation supports column-level, row-level, and cell-level access control.
For instructions on granting and managing Lake Formation permissions, see the following topics in the AWS Lake Formation Developer Guide:
-
Data filters in Lake Formation for row-level and cell-level filtering
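As a brief illustration of fine-grained grants, the following sketch shows a column-level SELECT grant and a data filter definition through the Lake Formation API. It assumes a boto3 Lake Formation client passed in as lf_client. The function names are hypothetical, and the principal, column names, and filter expression are values that you supply.

```python
def grant_column_select(lf_client, principal_arn, database, table, columns):
    """Grant SELECT on only the listed columns of the table."""
    lf_client.grant_permissions(
        Principal={"DataLakePrincipalIdentifier": principal_arn},
        Resource={
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        Permissions=["SELECT"],
    )


def create_row_filter(lf_client, catalog_id, database, table, filter_name, row_expression, columns):
    """Define a data filter that limits both rows and columns."""
    lf_client.create_data_cells_filter(
        TableData={
            "TableCatalogId": catalog_id,
            "DatabaseName": database,
            "TableName": table,
            "Name": filter_name,
            "RowFilter": {"FilterExpression": row_expression},
            "ColumnNames": list(columns),
        }
    )
```

A data filter has no effect until you grant a principal SELECT permission on it. See the data filters topic for the grant step and for the supported filter expression syntax.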