

# Trusted Identity Propagation with AWS Glue ETL
<a name="security-trusted-identity-propagation"></a>

With IAM Identity Center, you can connect to identity providers (IdPs) and centrally manage access for users and groups across AWS analytics services. You can integrate identity providers such as Okta, Ping, and Microsoft Entra ID (formerly Azure Active Directory) with IAM Identity Center so that users in your organization can access data using a single sign-on experience. IAM Identity Center also supports connecting additional third-party identity providers. 

With AWS Glue 5.0 and higher, you can propagate user identities from IAM Identity Center to AWS Glue interactive sessions. AWS Glue interactive sessions further propagate the supplied identity to downstream services such as Amazon S3 Access Grants, AWS Lake Formation, and Amazon Redshift, enabling secure data access based on user identity in these downstream services. 

## Overview
<a name="security-trusted-identity-propagation-overview"></a>

 [Identity Center](https://docs.aws.amazon.com/singlesignon/latest/userguide/what-is.html) is the recommended approach for workforce authentication and authorization on AWS for organizations of any size and type. With Identity Center, you can create and manage user identities in AWS, or connect your existing identity source, including Microsoft Active Directory, Okta, Ping Identity, JumpCloud, Google Workspace, and Microsoft Entra ID (formerly Azure AD). 

[ Trusted identity propagation](https://docs.aws.amazon.com/singlesignon/latest/userguide/trustedidentitypropagation-overview.html) is an IAM Identity Center feature that administrators of connected AWS services can use to grant and audit access to service data. Access to this data is based on user attributes such as group associations. Setting up trusted identity propagation requires collaboration between the administrators of connected AWS services and the IAM Identity Center administrators. 

## Features and benefits
<a name="security-trusted-identity-propagation-features"></a>

The AWS Glue interactive sessions integration with IAM Identity Center [Trusted identity propagation](https://docs.aws.amazon.com/singlesignon/latest/userguide/trustedidentitypropagation-overview.html) provides the following benefits:
+ The ability to enforce table-level authorization and fine-grained access control with Identity Center identities on Lake Formation managed AWS Glue Data Catalog tables.
+ The ability to enforce authorization with Identity Center identities on Amazon Redshift clusters.
+ The ability to track user actions end to end for auditing.
+ The ability to enforce Amazon S3 prefix-level authorization with Identity Center identities on Amazon S3 Access Grants-managed Amazon S3 prefixes.

## Use cases
<a name="security-trusted-identity-propagation-use-cases"></a>

**Interactive Data Exploration and Analysis**  
 Data engineers use their corporate identities to seamlessly access and analyze data across multiple AWS accounts. Through SageMaker Studio, they launch interactive Spark sessions via AWS Glue ETL, connecting to various data sources including Amazon S3 and the AWS Glue Data Catalog. As engineers explore datasets, Spark enforces fine-grained access controls defined in Lake Formation based on their identities, ensuring they can only view authorized data. All queries and data transformations are logged with the user's identity, creating a clear audit trail. This streamlined approach enables rapid prototyping of new analytics products while maintaining strict data governance across client environments. 

**Data Preparation and Feature Engineering**  
 Data scientists from multiple research teams collaborate on complex projects using a unified data platform. They log into SageMaker Studio with their corporate credentials, immediately accessing a vast, shared data lake that spans multiple AWS accounts. As they begin feature engineering for new machine learning models, Spark sessions launched through AWS Glue ETL enforce Lake Formation's column and row-level security policies based on their propagated identities. Scientists can efficiently prepare data and engineer features using familiar tools, while compliance teams have assurance that every data interaction is automatically tracked and audited. This secure, collaborative environment accelerates research pipelines while maintaining the strict data protection standards required in regulated industries. 

## How it works
<a name="security-trusted-identity-propagation-how-it-works"></a>

![\[Architecture diagram showing AWS Glue Interactive Sessions workflow. A user logs into client-facing applications (SageMaker Unified Studio, or custom applications) through IAM Identity Center. The user's identity is propagated to AWS Glue Interactive Sessions, which connects to access control services including IAM Identity Center, AWS Lake Formation, AWS Glue Data Catalog, and Amazon S3 Access Grant, before finally accessing S3 Storage.\]](http://docs.aws.amazon.com/glue/latest/dg/images/GlueISSMAI.png)


 A user logs in to a client-facing application (SageMaker AI or a custom application) using their corporate identity through IAM Identity Center. This identity is then propagated through the entire data access pipeline. 

 The authenticated user launches AWS Glue interactive sessions, which serve as the compute engine for data processing. These sessions maintain the user's identity context throughout the workflow. 

 AWS Lake Formation and the AWS Glue Data Catalog work together to enforce fine-grained access controls. Lake Formation applies security policies based on the user's propagated identity, while Amazon S3 Access Grants provides additional permission layers, ensuring users can only access data they're authorized to view. 

 Finally, the system connects to Amazon S3 Storage where the actual data resides. All access is governed by the combined security policies, maintaining data governance while enabling interactive data exploration and analysis. This architecture enables secure, identity-based data access across multiple AWS services while maintaining a seamless user experience for data scientists and engineers working with large datasets. 

## Integrations
<a name="security-trusted-identity-propagation-integrations"></a>

### AWS managed development environment
<a name="security-trusted-identity-propagation-aws-managed"></a>

The following AWS managed client-facing applications support trusted identity propagation with AWS Glue interactive sessions:
+ [SageMaker Unified Studio](https://aws.amazon.com/sagemaker/unified-studio/)
+ [Amazon SageMaker AI](https://aws.amazon.com/sagemaker-ai/)

**SageMaker Unified Studio**  
To use trusted identity propagation with SageMaker Unified Studio:

1. Set up a SageMaker Unified Studio project with trusted identity propagation enabled as the client-facing development environment. 

1. Set up [Lake Formation](https://docs.aws.amazon.com/en_us/singlesignon/latest/userguide/tip-tutorial-lf.html) to enable fine-grained access control for AWS Glue tables based on the user or group in IAM Identity Center.

1. [Set up Amazon S3 Access Grants](https://docs.aws.amazon.com/en_us/singlesignon/latest/userguide/tip-tutorial-s3.html) to enable temporary access to the underlying data locations in Amazon S3.

1. Open the SageMaker Unified Studio JupyterLab IDE space and select AWS Glue as the compute for notebook execution.

### Customer-managed self-hosted notebook environment
<a name="security-trusted-identity-propagation-customer-managed"></a>

To enable trusted identity propagation for users of custom-developed applications, see [ Access AWS services programmatically using trusted identity propagation ](https://aws.amazon.com/blogs/security/access-aws-services-programmatically-using-trusted-identity-propagation/) in the AWS Security Blog. 

# Getting started with trusted identity propagation in AWS Glue ETL
<a name="security-trusted-identity-propagation-getting-started"></a>

This section helps you configure AWS Glue interactive sessions to integrate with IAM Identity Center and enable [Trusted identity propagation](https://docs.aws.amazon.com/singlesignon/latest/userguide/trustedidentitypropagation-overview.html). 

## Prerequisites
<a name="security-trusted-identity-propagation-prerequisites"></a>
+ An Identity Center instance in the AWS Region where you want to create AWS Glue interactive sessions with trusted identity propagation enabled. For an AWS account, an Identity Center instance can exist in only a single Region. For more information, see [Enable IAM Identity Center](https://docs.aws.amazon.com/singlesignon/latest/userguide/get-started-enable-identity-center.html) and [provision the users and groups from your source of identities into IAM Identity Center](https://docs.aws.amazon.com/singlesignon/latest/userguide/tutorials.html). 
+ Trusted identity propagation enabled for the downstream services that the interactive workload uses to access data, such as Lake Formation, Amazon S3 Access Grants, or an Amazon Redshift cluster.

## Permissions needed to connect AWS Glue ETL with IAM Identity Center
<a name="security-trusted-identity-propagation-permissions"></a>

**Create an IAM role**  
The role that creates the IAM Identity Center connection requires permissions to create and modify the application configuration in AWS Glue and IAM Identity Center, as shown in the following inline policy.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:CreateGlueIdentityCenterConfiguration",
                "sso:CreateApplication",
                "sso:PutApplicationAssignmentConfiguration",
                "sso:PutApplicationAuthenticationMethod",
                "sso:PutApplicationGrant",
                "sso:PutApplicationAccessScope",
                "sso:ListInstances"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
```

------

The following inline policies contain specific permissions required to view, update, and delete properties of AWS Glue integration with IAM Identity Center.

Use the following inline policy to allow an IAM role to view an AWS Glue integration with IAM Identity Center.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:GetGlueIdentityCenterConfiguration"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
```

------

Use the following inline policy to allow an IAM role to update an AWS Glue integration with IAM Identity Center.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:UpdateGlueIdentityCenterConfiguration",
                "sso:PutApplicationAccessScope",
                "sso:DeleteApplicationAccessScope"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
```

------

Use the following inline policy to allow an IAM role to delete an AWS Glue integration with IAM Identity Center.

------
#### [ JSON ]


```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "glue:DeleteGlueIdentityCenterConfiguration",
                "sso:DeleteApplication"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}
```

------

### Permissions description
<a name="security-trusted-identity-propagation-permissions-description"></a>
+ `glue:CreateGlueIdentityCenterConfiguration` – Grants permission to create the AWS Glue IdC configuration.
+ `glue:GetGlueIdentityCenterConfiguration` – Grants permission to get an existing AWS Glue IdC configuration.
+ `glue:DeleteGlueIdentityCenterConfiguration` – Grants permission to delete an existing AWS Glue IdC configuration.
+ `glue:UpdateGlueIdentityCenterConfiguration` – Grants permission to update an existing AWS Glue IdC configuration.
+ `sso:CreateApplication` – Grants permission to create an AWS Glue managed IAM Identity Center application.
+ `sso:DescribeApplication` – Grants permission to describe an AWS Glue managed IAM Identity Center application.
+ `sso:DeleteApplication` – Grants permission to delete an AWS Glue managed IAM Identity Center application.
+ `sso:UpdateApplication` – Grants permission to update an AWS Glue managed IAM Identity Center application.
+ `sso:PutApplicationGrant` – Grants permission to apply token-exchange, introspectToken, refreshToken, and RevokeToken grants on the IdC application.
+ `sso:PutApplicationAuthenticationMethod` – Grants permission to put the authenticationMethod on the AWS Glue managed IdC application, which allows the AWS Glue service principal to interact with the IdC application.
+ `sso:PutApplicationAccessScope` – Grants permission to add or update the list of authorized downstream service scopes on the AWS Glue managed IdC application.
+ `sso:DeleteApplicationAccessScope` – Grants permission to delete downstream scopes when a scope is removed from the AWS Glue managed IdC application.
+ `sso:PutApplicationAssignmentConfiguration` – Grants permission to set the "User-assignment-not-required" setting on the IdC application.
+ `sso:ListInstances` – Grants permission to list instances and validate the IdC InstanceArn that you specify in the identity-center-configuration parameter.
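
If one role handles several lifecycle operations, the inline policies in this section can be combined into a single policy document. The following Python sketch shows one way to do that. The grouping names (`create`, `view`, `update`, `delete`) and the `build_policy` helper are illustrative assumptions, not part of any AWS SDK; the action lists are copied from the policies in this section.

```python
import json

# Actions copied from the inline policies in this section, grouped by
# lifecycle operation. The group names are illustrative, not AWS terms.
ACTIONS_BY_OPERATION = {
    "create": [
        "glue:CreateGlueIdentityCenterConfiguration",
        "sso:CreateApplication",
        "sso:PutApplicationAssignmentConfiguration",
        "sso:PutApplicationAuthenticationMethod",
        "sso:PutApplicationGrant",
        "sso:PutApplicationAccessScope",
        "sso:ListInstances",
    ],
    "view": ["glue:GetGlueIdentityCenterConfiguration"],
    "update": [
        "glue:UpdateGlueIdentityCenterConfiguration",
        "sso:PutApplicationAccessScope",
        "sso:DeleteApplicationAccessScope",
    ],
    "delete": [
        "glue:DeleteGlueIdentityCenterConfiguration",
        "sso:DeleteApplication",
    ],
}


def build_policy(operations):
    """Return one IAM policy document covering the requested operations."""
    actions = []
    for operation in operations:
        for action in ACTIONS_BY_OPERATION[operation]:
            if action not in actions:  # keep order, drop duplicates
                actions.append(action)
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": ["*"]}
        ],
    }


if __name__ == "__main__":
    # Example: a role that may create and later update the integration.
    print(json.dumps(build_policy(["create", "update"]), indent=4))
```

Granting only the operations a role needs keeps the policy closer to least privilege than attaching all four policies.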

## Connecting AWS Glue with IAM Identity Center
<a name="security-trusted-identity-propagation-connecting"></a>

When AWS Glue is connected to IAM Identity Center, it creates a singleton managed IdC application per account. The following example shows how you can connect AWS Glue with IAM Identity Center:

```
aws glue create-glue-identity-center-configuration \
--instance-arn arn:aws:sso:::instance/ssoins-123456789 \
--scopes '["s3:access_grants:read_write", "redshift:connect","lakeformation:query"]'
```

To update the scopes of the managed application (usually done to propagate to more downstream services), you can use:

```
aws glue update-glue-identity-center-configuration \
--scopes '["s3:access_grants:read_write", "redshift:connect","lakeformation:query"]'
```

The `--scopes` parameter is optional. If you omit it, all scopes are added. The supported values are `s3:access_grants:read_write`, `redshift:connect`, and `lakeformation:query`.
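
Because the scope list is passed as a JSON string, a typo in a scope name is easy to miss. The following Python sketch validates scopes against the three supported values before building the command line. The helper names `scopes_argument` and `update_command` are hypothetical; the sketch only assembles a string and does not call AWS.

```python
import json
import shlex

# Supported values as listed in this section.
SUPPORTED_SCOPES = (
    "s3:access_grants:read_write",
    "redshift:connect",
    "lakeformation:query",
)


def scopes_argument(scopes):
    """Return the JSON string for --scopes, rejecting unsupported values."""
    unknown = [scope for scope in scopes if scope not in SUPPORTED_SCOPES]
    if unknown:
        raise ValueError(f"Unsupported scopes: {unknown}")
    return json.dumps(list(scopes))


def update_command(scopes=None):
    """Build the update command; omit --scopes when no list is supplied."""
    parts = ["aws", "glue", "update-glue-identity-center-configuration"]
    if scopes is not None:
        parts += ["--scopes", scopes_argument(scopes)]
    # shlex.quote wraps the JSON list in single quotes for the shell.
    return " ".join(shlex.quote(part) for part in parts)


if __name__ == "__main__":
    print(update_command(["redshift:connect", "lakeformation:query"]))
```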

To get the details of the configuration, you can use:

```
aws glue get-glue-identity-center-configuration
```

You can delete the connection between AWS Glue and IAM Identity Center by using the following command:

```
aws glue delete-glue-identity-center-configuration
```

**Note**  
AWS Glue creates a service-managed Identity Center application in your account, which the service uses for identity validation and identity propagation to downstream services. This managed Identity Center application is shared across all trusted identity propagation sessions in your account.  
**Warning:** Do not manually modify settings on the managed Identity Center Application. Any changes could affect all trusted-identity-propagation enabled AWS Glue interactive sessions in your account. 

## Creating an AWS Glue Interactive Session with Trusted Identity Propagation Enabled
<a name="security-trusted-identity-propagation-creating-session"></a>

After you connect AWS Glue with IAM Identity Center, you can use [identity-enhanced role credentials](https://docs.aws.amazon.com/singlesignon/latest/userguide/trustedidentitypropagation-identity-enhanced-iam-role-sessions.html) to create an AWS Glue interactive session. You don't need to pass additional parameters when you create an AWS Glue 5.0 session. Because AWS Glue is connected with IAM Identity Center, when AWS Glue detects identity-enhanced role credentials, it automatically propagates the identity information to the downstream services that your statements call. However, the runtime role for the session must have the `sts:SetContext` permission, as shown in the following trust policy. 

**Runtime Role permissions to propagate identity**  
 Because AWS Glue sessions use [identity-enhanced credentials](https://docs.aws.amazon.com/singlesignon/latest/userguide/trustedidentitypropagation-identity-enhanced-iam-role-sessions.html) to propagate identity to downstream AWS services, the runtime role's trust policy must include the additional `sts:SetContext` permission. This permission allows identity propagation to downstream services (Amazon S3 Access Grants, Lake Formation, and Amazon Redshift). To learn more about how to create a runtime role, see [Setting up a runtime role](https://docs.aws.amazon.com/glue/latest/dg/create-service-role.html). 

------
#### [ JSON ]


```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "glue.amazonaws.com"
      },
      "Action": [
        "sts:AssumeRole",
        "sts:SetContext"
      ]
    }
  ]
}
```

------
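
A missing `sts:SetContext` entry in the trust policy is an easy mistake to make when adapting an existing runtime role. The following Python sketch checks a trust policy document for both required actions. The helper name `missing_trust_actions` is hypothetical, and the check is deliberately simple: it assumes explicit action names and ignores wildcards and `Condition` blocks.

```python
REQUIRED_ACTIONS = {"sts:AssumeRole", "sts:SetContext"}
GLUE_PRINCIPAL = "glue.amazonaws.com"


def _as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def missing_trust_actions(trust_policy):
    """Return the required actions that are not granted to the Glue principal."""
    granted = set()
    for statement in _as_list(trust_policy.get("Statement")):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        services = _as_list(principal.get("Service")) if isinstance(principal, dict) else []
        if GLUE_PRINCIPAL in services:
            granted.update(_as_list(statement.get("Action")))
    return REQUIRED_ACTIONS - granted


if __name__ == "__main__":
    legacy_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }
    # Reports which required actions still need to be added.
    print(missing_trust_actions(legacy_policy))
```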

Additionally, the runtime role needs permissions for the downstream AWS services that the session invokes to fetch data using the user identity. To configure Amazon S3 Access Grants and Lake Formation, see the following topics:
+ [Using Lake Formation with AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/security-lake-formation-fgac.html)
+ [Using Amazon S3 Access Grants with AWS Glue](https://docs.aws.amazon.com/glue/latest/dg/security-s3-access-grants.html)

# Considerations and limitations for AWS Glue ETL Trusted Identity Propagation integration
<a name="security-trusted-identity-propagation-considerations"></a>

**Important**  
 By default, sessions are not private, which means that one IdC user can access another IdC user's session. You can use [tagOnCreate](https://docs.aws.amazon.com/glue/latest/dg/glue-is-security.html#glue-is-tagoncreate) to make your sessions private. For example, tag each session with an `owner` tag whose value is the IdC user ID. Then, in the client principal or runtime role policy, use a global condition key such as [identitystore:UserId](https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_policies_condition-keys.html#condition-keys-identity-store-user-id) to validate the caller against the `owner` tag for all session API operations. This ensures that one IdC user can't access another IdC user's session. 
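
The following Python sketch generates one possible shape for such a policy. It is a starting point under stated assumptions, not a verified policy: the tag key `owner`, the list of session actions, and the use of `${identitystore:UserId}` as a policy variable are all assumptions that you should confirm against the tagOnCreate documentation and test before relying on them.

```python
import json

OWNER_TAG = "owner"  # assumed tag key; the text only calls it "an owner tag"
USER_ID_VARIABLE = "${identitystore:UserId}"  # assumed usable as a policy variable

# Session API actions assumed to need the ownership check; adjust as needed.
SESSION_ACTIONS = [
    "glue:GetSession",
    "glue:StopSession",
    "glue:DeleteSession",
    "glue:RunStatement",
    "glue:GetStatement",
    "glue:ListStatements",
    "glue:CancelStatement",
]


def private_session_policy():
    """Return a policy that ties session access to the IdC user ID tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Sessions can be created only when tagged with the caller's ID.
                "Sid": "CreateSessionTaggedWithOwner",
                "Effect": "Allow",
                "Action": ["glue:CreateSession", "glue:TagResource"],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {f"aws:RequestTag/{OWNER_TAG}": USER_ID_VARIABLE}
                },
            },
            {
                # Other session operations require a matching owner tag.
                "Sid": "UseOnlyOwnSessions",
                "Effect": "Allow",
                "Action": SESSION_ACTIONS,
                "Resource": "*",
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{OWNER_TAG}": USER_ID_VARIABLE}
                },
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(private_session_policy(), indent=4))
```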

Consider the following points when you use IAM Identity Center Trusted Identity Propagation with AWS Glue:
+ Trusted Identity Propagation through Identity Center is supported on AWS Glue 5.0 and higher, and only with AWS Glue interactive sessions. 
+ The AWS Glue Data Catalog is covered by the Lake Formation integration with IAM Identity Center.
+ Trusted Identity Propagation is limited to interactive sessions in AWS Glue, excluding other data processing entities like jobs, triggers, workflows, and ML tasks. All AWS Glue APIs, however, record user identities in AWS CloudTrail for auditing.
+ AWS Glue currently supports integration with IAM Identity Center exclusively through API and CLI interfaces, not via the console. 
+ After you enable the IAM Identity Center integration for AWS Glue, use IdC credentials only to create AWS Glue 5.0 sessions. Don't create AWS Glue 4.0 sessions with IdC credentials.
+ Trusted Identity Propagation with AWS Glue is supported in the following AWS Regions:
  + af-south-1 – Africa (Cape Town)
  + ap-east-1 – Asia Pacific (Hong Kong)
  + ap-northeast-1 – Asia Pacific (Tokyo)
  + ap-northeast-2 – Asia Pacific (Seoul)
  + ap-northeast-3 – Asia Pacific (Osaka)
  + ap-south-1 – Asia Pacific (Mumbai)
  + ap-southeast-1 – Asia Pacific (Singapore)
  + ap-southeast-2 – Asia Pacific (Sydney)
  + ap-southeast-3 – Asia Pacific (Jakarta)
  + ca-central-1 – Canada (Central)
  + eu-central-1 – Europe (Frankfurt)
  + eu-north-1 – Europe (Stockholm)
  + eu-south-1 – Europe (Milan)
  + eu-west-1 – Europe (Ireland)
  + eu-west-2 – Europe (London)
  + eu-west-3 – Europe (Paris)
  + me-south-1 – Middle East (Bahrain)
  + sa-east-1 – South America (São Paulo)
  + us-east-1 – US East (N. Virginia)
  + us-east-2 – US East (Ohio)
  + us-west-1 – US West (N. California)
  + us-west-2 – US West (Oregon)
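
The Region and version constraints above can be checked before a session is requested. The following Python sketch is a minimal preflight check; the helper names `parse_version` and `can_use_trusted_identity_propagation` are hypothetical, and the Region set is copied from the list above, so it needs updating whenever that list changes.

```python
# Regions copied from the list above.
SUPPORTED_REGIONS = frozenset({
    "af-south-1", "ap-east-1", "ap-northeast-1", "ap-northeast-2",
    "ap-northeast-3", "ap-south-1", "ap-southeast-1", "ap-southeast-2",
    "ap-southeast-3", "ca-central-1", "eu-central-1", "eu-north-1",
    "eu-south-1", "eu-west-1", "eu-west-2", "eu-west-3", "me-south-1",
    "sa-east-1", "us-east-1", "us-east-2", "us-west-1", "us-west-2",
})
MIN_GLUE_VERSION = (5, 0)


def parse_version(glue_version):
    """Turn a version string such as '5.0' into a comparable tuple."""
    return tuple(int(part) for part in glue_version.split("."))


def can_use_trusted_identity_propagation(region, glue_version):
    """True when both the Region and the AWS Glue version qualify."""
    return (
        region in SUPPORTED_REGIONS
        and parse_version(glue_version) >= MIN_GLUE_VERSION
    )


if __name__ == "__main__":
    print(can_use_trusted_identity_propagation("eu-west-1", "5.0"))
    print(can_use_trusted_identity_propagation("eu-west-1", "4.0"))
```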

# User background sessions for AWS Glue ETL
<a name="user-background-sessions"></a>

User background sessions enable long-running analytics and machine learning workloads to continue even after the user has logged off from their notebook interface. This capability is implemented through AWS Glue's trusted identity propagation feature. This page explains the configuration options and behaviors for user background sessions. 

**Note**  
User background sessions apply to AWS Glue interactive sessions initiated through notebook interfaces like SageMaker Unified Studio. Enabling or disabling this feature affects only new interactive sessions; existing active sessions are not impacted. 

## Configure user background sessions
<a name="configure-user-background-sessions"></a>

User background sessions must be enabled at two levels for proper functionality:

1. IAM Identity Center instance level (configured by IdC administrators)

1. AWS Glue Identity Center configuration level (configured by AWS Glue administrators)

### Enable user background sessions for AWS Glue
<a name="enable-user-background-sessions-glue"></a>

To enable user background sessions for AWS Glue, you must set the `userBackgroundSessionsEnabled` parameter to `true` in the Identity Center configuration when creating or updating the configuration. 

Prerequisites
+ The IAM role that you use to create or update the AWS Glue Identity Center configuration must have the `sso:PutApplicationSessionConfiguration` permission. This permission allows AWS Glue to enable user background sessions at the AWS Glue-managed IdC application level.
+ Your AWS Glue interactive sessions must use AWS Glue version 5.0 or later and must be Trusted Identity Propagation enabled.

To enable user background sessions using the AWS CLI:

```
aws glue create-glue-identity-center-configuration \
    --instance-arn "arn:aws:sso:::instance/ssoins-1234567890abcdef" \
    --user-background-sessions-enabled
```

To update an existing configuration:

```
aws glue update-glue-identity-center-configuration \
    --user-background-sessions-enabled
```

#### Configuration matrix
<a name="configuration-matrix"></a>

The effective user background session configuration depends on both the AWS Glue configuration setting and the IAM Identity Center instance-level settings: 


| User background sessions enabled in IAM Identity Center? | AWS Glue `userBackgroundSessionsEnabled` | Behavior | 
| --- | --- | --- | 
| Yes | TRUE | User background sessions enabled | 
| Yes | FALSE | Session expires with user logout | 
| No | TRUE | Session creation fails with an exception | 
| No | FALSE | Session expires with user logout | 
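
The matrix can be expressed as a small decision function, which is handy for pre-deployment checks. The following Python sketch mirrors the four rows of the table; the function name and the returned strings are illustrative and don't correspond to any AWS API response.

```python
def effective_behavior(idc_enabled: bool, glue_enabled: bool) -> str:
    """Map the two settings to the behavior listed in the matrix."""
    if glue_enabled and not idc_enabled:
        # AWS Glue asks for background sessions that the instance disallows.
        return "session creation fails"
    if glue_enabled and idc_enabled:
        return "user background sessions enabled"
    # AWS Glue setting is FALSE: the instance-level setting has no effect.
    return "session expires with user logout"


if __name__ == "__main__":
    for idc in (True, False):
        for glue in (True, False):
            print(idc, glue, effective_behavior(idc, glue))
```

Note that the only failing combination is the one where AWS Glue enables the feature while the IAM Identity Center instance does not.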

## Default user background session duration
<a name="default-user-background-session-duration"></a>

By default, all user background sessions have a duration limit of 7 days in IAM Identity Center. Administrators can modify this duration in the IAM Identity Center console. This setting applies at the IAM Identity Center instance level, affecting all supported IAM Identity Center applications within that instance. 
+ Duration can be set to any value from 15 minutes up to 90 days
+ This setting is configured in the IAM Identity Center console under Settings → Authentication → Configure (Non-Interactive Jobs section)

**Note**  
AWS Glue interactive sessions have a separate idle timeout limit of 48 hours by default. Sessions will terminate when either the AWS Glue session idle timeout or the user background session duration is reached, whichever comes first. 

## Impact of disabling user background sessions
<a name="impact-disabling-user-background-sessions"></a>

When user background sessions are disabled at the AWS Glue configuration level:
+ **Existing interactive sessions:** Continue to run without interruption if they were started with user background sessions enabled. These sessions will continue using their existing background session tokens until they terminate naturally or are explicitly stopped.
+ **New interactive sessions:** Will use the standard trusted identity propagation flow and will terminate when the user logs out or their interactive session expires (such as when closing a SageMaker Unified Studio JupyterLab notebook).

### Changing user background sessions duration
<a name="changing-user-background-sessions-duration"></a>

When the duration setting for user background sessions is modified in IAM Identity Center:
+ **Existing interactive sessions:** Continue to run with the same background session duration with which they were started
+ **New interactive sessions:** Will use the new session duration for background sessions

## Runtime considerations
<a name="runtime-considerations"></a>

### Session termination conditions
<a name="session-termination-conditions"></a>

When using user background sessions, an AWS Glue interactive session will continue running until one of the following occurs: 
+ The user background session expires (based on IdC configuration, up to 90 days)
+ The user background session is manually revoked by an administrator
+ The AWS Glue interactive session reaches its idle timeout (default: 48 hours after the last executed statement)
+ The user explicitly stops or restarts the notebook kernel
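
The two time-based conditions interact: whichever limit is reached first ends the session. The following Python sketch estimates the latest time a session could still be running, using the defaults stated on this page (7 days and 48 hours). It assumes the background session duration is counted from session start, which this page doesn't state explicitly, and it ignores manual revocation and kernel restarts. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def latest_possible_end(
    session_start: datetime,
    last_statement: datetime,
    background_duration: timedelta = timedelta(days=7),
    idle_timeout: timedelta = timedelta(hours=48),
) -> datetime:
    """Return the earlier of the background expiry and the idle expiry."""
    # Assumption: the background session duration starts at session start.
    background_expiry = session_start + background_duration
    # The idle timeout is measured from the last executed statement.
    idle_expiry = last_statement + idle_timeout
    return min(background_expiry, idle_expiry)


if __name__ == "__main__":
    start = datetime(2025, 1, 1, 9, 0)
    last_run = datetime(2025, 1, 2, 9, 0)
    # Idle timeout is the binding limit in this example.
    print(latest_possible_end(start, last_run))
```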

### Data persistence
<a name="data-persistence"></a>

When using user background sessions:
+ Users cannot reconnect to their notebook interface to view results once they have logged out
+ Configure your Spark statements to write results to persistent storage (such as Amazon S3) before execution completes

### Cost implications
<a name="cost-implications"></a>
+ Jobs will continue to run to completion even after users end their SageMaker Unified Studio JupyterLab session and will incur charges for the entire duration of the completed run
+ Monitor your active background sessions to avoid unnecessary costs from forgotten or abandoned sessions

### Feature availability
<a name="feature-availability"></a>

User background sessions for AWS Glue are available for:
+ AWS Glue interactive sessions only (AWS Glue jobs and streaming jobs are not supported)
+ AWS Glue version 5.0 and later
+ Trusted Identity Propagation enabled configurations only