

# Data and catalog connections in IAM-based domains


Amazon SageMaker Unified Studio notebooks can connect to multiple data sources, including Amazon S3, AWS Glue Data Catalog, Amazon Athena, Amazon Redshift, and third-party sources. You can query data directly from these sources using SQL cells or Python code. The notebook interface provides built-in connectors for AWS services and supports custom connections for external data sources. Data connections are configured at the project level and shared across notebooks.

## Prerequisites


1. Configured data connections in your Amazon SageMaker Unified Studio project

1. Appropriate IAM permissions to access data sources

1. Network connectivity to external data sources if applicable

## Supported data connections


Amazon SageMaker Unified Studio supports the following data connections for IAM-based domains:

### Databases and data warehouses

+ Amazon DocumentDB
+ Amazon DynamoDB
+ Amazon Redshift
+ Aurora MySQL
+ Aurora PostgreSQL
+ Azure SQL
+ Google BigQuery
+ Microsoft SQL Server
+ MySQL
+ Oracle
+ PostgreSQL
+ Snowflake

### Storage

+ Amazon S3

## AWS resources created by connections


When you create a connection in Amazon SageMaker Unified Studio, the following resource is created in your AWS account(s) behind the scenes:
+ AWS Glue connection: a connection object that stores core connection information.

This resource is visible in the AWS account where your Amazon SageMaker Unified Studio domain is hosted. You can discover and describe it through the console, API, SDK, or CLI of the corresponding service (in this case, AWS Glue).
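To inspect the AWS Glue connection from code, you can call the AWS Glue `GetConnection` API. The following is a minimal sketch using the AWS SDK for Python (Boto3). It assumes that you know the connection name (the name `my-project-connection` used below is hypothetical) and that your credentials allow `glue:GetConnection` in the domain account. The client is passed in as a parameter, so the function does not create one itself.

```python
from typing import Any, Dict


def describe_glue_connection(glue_client: Any, name: str) -> Dict[str, Any]:
    """Return a short summary of an AWS Glue connection.

    glue_client is expected to be a Boto3 Glue client, for example
    boto3.client("glue"). HidePassword=True asks AWS Glue not to include
    password values in the response.
    """
    response = glue_client.get_connection(Name=name, HidePassword=True)
    connection = response["Connection"]
    return {
        "name": connection["Name"],
        "type": connection.get("ConnectionType"),
        # Core connection information (for example, a JDBC URL) is kept here.
        "properties": connection.get("ConnectionProperties", {}),
    }
```

In a session with AWS credentials, you would call it as `describe_glue_connection(boto3.client("glue"), "my-project-connection")`.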

# Connecting to a new data source


## Domain and project VPC configuration


The following data sources require your project to be configured with a VPC: Amazon DocumentDB, Amazon Redshift, Aurora MySQL, Aurora PostgreSQL, Azure SQL, PostgreSQL, Oracle, and Microsoft SQL Server. For the configuration steps, see [Step 1](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/configure-vpc-networking-iam-based-domains.html) and [Step 2](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/update-individual-projects-vpc.html). You might need to contact your administrator to complete this configuration.

**To connect to a new data source**

1. In the navigation pane, choose **Connections**.

1. Choose **Create connection**.

1. Select the connection type in the gallery that opens.

1. For **Name**, enter a descriptive name for your connection.

1. Configure the connection parameters based on your selected connection type.

   1. For supported data sources, you can enable an [SSL connection](#ssl-connection).

1. Configure the authentication details. You can either enter **Username** and **Password** directly in the connection details, or use a secret in AWS Secrets Manager that stores the credentials. You might need to contact your administrator to create a new secret for the connection.

1. Choose **Create connection**.

1. If all validations pass, a new connection will be created.

Your connection appears in the **Data** section of the navigation pane. In that section, choose **Connections** to see the list of all available connections. There you can view connection details such as **Name**, **Connection type**, and **Authorization mode**, and you can check the connection status.
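If you use a secret in AWS Secrets Manager for the authentication step above, your administrator can create it with the Secrets Manager `CreateSecret` API. The Boto3 sketch below assumes the secret stores a JSON object with `username` and `password` keys. This is a common convention for AWS Glue connection secrets, but you should confirm the expected keys with your administrator. The secret name `my-connection-credentials` is hypothetical.

```python
import json
from typing import Any


def build_secret_string(username: str, password: str) -> str:
    """Serialize database credentials as a JSON secret string.

    Assumption: the connection reads the keys "username" and "password".
    """
    if not username or not password:
        raise ValueError("username and password must both be non-empty")
    return json.dumps({"username": username, "password": password})


def create_connection_secret(secrets_client: Any, name: str, username: str, password: str) -> str:
    """Create the secret and return its ARN.

    secrets_client is expected to be a Boto3 Secrets Manager client,
    for example boto3.client("secretsmanager").
    """
    response = secrets_client.create_secret(
        Name=name,
        SecretString=build_secret_string(username, password),
    )
    return response["ARN"]
```

You can then choose the resulting secret in the **AWS Secrets Manager** option when you configure the connection.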

## SSL connection


You can configure SSL connection for Amazon DocumentDB, Amazon Redshift, Aurora MySQL, Aurora PostgreSQL, Azure SQL, Microsoft SQL Server, MySQL, Oracle, and PostgreSQL connection types.

SSL connection enables encrypted communication between Amazon SageMaker Unified Studio and your data source. When enabled, all data transmitted between your notebooks and the database is encrypted using SSL/TLS protocols, protecting sensitive information in transit. This setting is recommended for production environments and required when connecting to databases that enforce SSL connections.

To enable an SSL connection, select the **Enforce SSL** check box when you configure the connection properties.

**Note**  
Enabling SSL may slightly increase connection latency due to the encryption overhead.

# Connecting to Amazon S3


You can create a data connection to Amazon S3 when you need to directly access files stored in Amazon S3 buckets from your notebooks. This connection is only required if you want to read or write individual files (such as CSV, JSON, or Parquet files) directly from Amazon S3 storage. If you are working with Data Catalog tables that are backed by Amazon S3, you do not need to create a separate Amazon S3 connection. You can access those tables directly through the catalog.

Before connecting to Amazon S3, complete one of the following prerequisite options:
+ [Prerequisite option 1 (recommended): Gain access using an access role](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/adding-existing-s3-data.html#adding-existing-s3-access-role)
+ [Prerequisite option 2: Gain access using the project role](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/adding-existing-s3-data.html#adding-existing-s3-project-role)

**To connect to Amazon S3**

1. In the navigation pane, choose **Connections**.

1. Choose **Create connection**.

1. In the gallery that opens, select **Amazon S3**.

1. For **Name**, enter a descriptive name for your connection.

1. (Optional) For **S3 URI**, enter the Amazon S3 URI of the location that you want to access. If you do not specify an Amazon S3 URI, SageMaker Unified Studio lists all buckets that are accessible with the provided credentials.

1. For **AWS region**, enter the AWS Region where the S3 bucket is located.

1. (Optional) For **Access role ARN**, select an existing IAM role from the dropdown list. If you are connecting to an S3 bucket in an AWS account other than the one where your SageMaker Unified Studio domain is hosted, you might need to contact your administrator to configure an access role.

1. Choose **Create connection**.

1. If all validations pass, a new Amazon S3 connection will be created.

After you create the connection, you can use it in your notebooks to read and write files directly from the specified S3 location. You can also view all the buckets that you connected to. To do this, choose **Data** in the navigation pane, and then choose the **S3 buckets** tab.
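As an illustration in a Python notebook cell, the sketch below uses plain Boto3 S3 calls (`list_buckets`, `list_objects_v2`, `get_object`). It does not use any API specific to Amazon SageMaker Unified Studio. The sketch mirrors the optional S3 URI behavior described in the steps above: with no URI it lists the buckets that the credentials can see, and with a URI it lists objects under that URI's prefix. It also reads a small CSV file into rows. The client is assumed to be created with something like `boto3.client("s3", region_name="us-east-1")`. The bucket and key names are hypothetical.

```python
import csv
import io
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, prefix).

    Keys that contain '?' or '#' would need extra handling, because
    urlparse treats those characters as query and fragment delimiters.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # urlparse keeps the leading '/' of the path; S3 keys do not start with one.
    return parsed.netloc, parsed.path.lstrip("/")


def list_connection_scope(s3_client: Any, uri: Optional[str] = None) -> List[str]:
    """List bucket names when no URI is given, else object keys under the prefix.

    Pagination is omitted: one list_objects_v2 call returns at most 1,000 keys.
    """
    if uri is None:
        return [bucket["Name"] for bucket in s3_client.list_buckets()["Buckets"]]
    bucket, prefix = parse_s3_uri(uri)
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=prefix)
    # "Contents" is absent from the response when nothing matches the prefix.
    return [obj["Key"] for obj in response.get("Contents", [])]


def read_csv_from_s3(s3_client: Any, uri: str, encoding: str = "utf-8") -> List[Dict[str, str]]:
    """Download one CSV object and return its rows as dictionaries.

    The whole object is read into memory, so this suits small files only.
    """
    bucket, key = parse_s3_uri(uri)
    response = s3_client.get_object(Bucket=bucket, Key=key)
    # Body is a stream; read() returns the full object content as bytes.
    text = response["Body"].read().decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))
```

For example, `read_csv_from_s3(boto3.client("s3"), "s3://my-bucket/data/sales.csv")` returns one dictionary per CSV row, with every value as a string.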

# Connecting to Amazon Redshift


You can create a data connection to Amazon Redshift to query data warehouses from Amazon SageMaker Unified Studio. You can connect to both provisioned clusters and serverless workgroups.

## VPC requirements


Amazon Redshift connections use different connection methods depending on the tool you are using in Amazon SageMaker Unified Studio.
+ Visual ETL and data processing jobs require your Amazon Redshift database to be in the same VPC as your Amazon SageMaker Unified Studio domain. To configure this, see [Step 1](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/configure-vpc-networking-iam-based-domains.html) and [Step 2](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/update-individual-projects-vpc.html). You might need to contact your administrator to complete this configuration.
+ Query Editor and Data notebooks do not require VPC configuration.

## Steps to connect to Amazon Redshift from Amazon SageMaker Unified Studio


1. In the navigation pane, choose **Connections**.

1. Choose **Create connection**.

1. In the gallery that opens, select **Amazon Redshift**.

1. For **Name**, enter a descriptive name for your connection.

1. Configure the connection properties:

   1. **Redshift compute** - Select an existing compute resource or enter a JDBC URL.

      1. JDBC URL format: `jdbc:redshift://endpoint:port/database`.

   1. **EnforceSSL** - Select to enable encrypted communication (recommended).

   1. **JDBC URL Parameters** - Optional configuration parameters for the JDBC/ODBC driver in the following format: `<key1>=<value1>;<key2>=<value2>`.

1. For authentication, select one of the following options:

   1. **IAM** - Generates a database username based on your IAM identity. You can apply grants directly to this user, or use the RedshiftDbUser principal tag on your project execution role to connect as a different database user.

   1. **Username and password** - Provide a username and password for the database that you are connecting to. Amazon SageMaker Unified Studio stores your credentials in AWS Secrets Manager.

   1. **AWS Secrets Manager** - Choose a secret in AWS Secrets Manager that contains the credentials. You might need to contact your administrator to create a secret.

1. **Access role ARN** - optional. Required when connecting to resources in a different AWS account. See [Gaining access to Amazon Redshift resources](compute-prerequisite-redshift.md).

1. Choose **Create connection**.
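You might want to check values before you paste them into the form. The sketch below builds a URL in the `jdbc:redshift://endpoint:port/database` format from the steps above. It also parses a **JDBC URL Parameters** string of the form `<key1>=<value1>;<key2>=<value2>` into a dictionary. It only checks the shape of the input and does not contact the database. The endpoint in the example is a hypothetical placeholder. Port 5439 is the default Amazon Redshift port, but your cluster or workgroup might use a different one.

```python
from typing import Dict


def build_redshift_jdbc_url(endpoint: str, port: int, database: str) -> str:
    """Return a URL of the form jdbc:redshift://endpoint:port/database."""
    if not endpoint or not database:
        raise ValueError("endpoint and database are required")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    return f"jdbc:redshift://{endpoint}:{port}/{database}"


def parse_jdbc_params(params: str) -> Dict[str, str]:
    """Parse 'key1=value1;key2=value2' into {'key1': 'value1', 'key2': 'value2'}."""
    result: Dict[str, str] = {}
    for pair in params.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate empty segments, such as a trailing ';'
        key, sep, value = pair.partition("=")
        if not sep or not key.strip():
            raise ValueError(f"expected key=value, got {pair!r}")
        result[key.strip()] = value.strip()
    return result
```

For example, `build_redshift_jdbc_url("examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com", 5439, "dev")` returns `jdbc:redshift://examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com:5439/dev`. Similarly, `parse_jdbc_params("ssl=true;ApplicationName=notebooks")` returns `{"ssl": "true", "ApplicationName": "notebooks"}`.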

## Steps to connect to Amazon SageMaker Unified Studio from Amazon Redshift console pages


Amazon Redshift customers can automatically create an Amazon Redshift connection in the Amazon SageMaker Unified Studio environment from the Amazon Redshift console. This functionality is available from the following Amazon Redshift console pages:
+ Landing page
+ Provisioned dashboard
+ Serverless dashboard
+ Cluster list
+ Cluster detail
+ Serverless workgroup and namespace list pages
+ Workgroup and namespace detail pages

When a user selects the "Query in SageMaker Unified Studio" option, they will be prompted to choose the Amazon Redshift cluster they want to use. Amazon SageMaker Unified Studio will then issue temporary credentials and provision the session, redirecting the user into the Amazon SageMaker Unified Studio environment where they can begin querying data immediately.

## Steps to connect to Amazon SageMaker Unified Studio from Amazon Redshift Query Editor v2


Amazon Redshift customers can also automatically create an Amazon Redshift connection in the Amazon SageMaker Unified Studio environment from Amazon Redshift Query Editor v2 (QEv2). This functionality is available through the **Run in SageMaker Unified Studio** option in the following locations:
+ As a tooltip option next to the **Run** button in the SQL editor
+ As a tooltip option next to the **Run** button in notebook cells

If the selected connection uses IAM authentication, choosing **Run in SageMaker Unified Studio** runs the query in the active editor (SQL editor or notebook cell) directly in the Amazon SageMaker Unified Studio environment.

## Amazon Redshift access using IAM authentication


IAM authentication generates temporary database credentials based on your AWS identity, eliminating the need to manage database passwords. When you connect using IAM authentication, Amazon SageMaker Unified Studio maps your IAM identity to a database user.

### How IAM authentication works


When connecting with IAM authentication, your database username is automatically generated based on your IAM identity. By default, the username follows the pattern `IAMR:<your-federated-identity>`. You can customize this behavior using the RedshiftDbUser principal tag on your project execution role to specify a different database user.

### Using principal tags


Principal tags allow you to control which database user is used when connecting to Redshift. Configure the RedshiftDbUser tag on your project execution role in IAM. When this tag is set, your connection will use `IAMR:<tag-value>` as the database username instead of the federated IAM identity.

For more information about setting up principal tags, see [Setting up principal tags to connect a cluster or workgroup from query editor v2](https://docs.aws.amazon.com/redshift/latest/mgmt/query-editor-v2-getting-started.html#query-editor-v2-principal-tags-iam).
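An administrator can set the tag with the IAM `TagRole` API. The minimal Boto3 sketch below assumes the caller has the `iam:TagRole` permission. The role name and database user in the example are hypothetical. Substitute your project execution role and the database user that you want to connect as.

```python
from typing import Any

REDSHIFT_DB_USER_TAG = "RedshiftDbUser"


def set_redshift_db_user_tag(iam_client: Any, role_name: str, db_user: str) -> None:
    """Attach the RedshiftDbUser principal tag to an IAM role.

    iam_client is expected to be a Boto3 IAM client, for example boto3.client("iam").
    If the role already has a tag with this key, IAM overwrites its value.
    """
    if not db_user:
        raise ValueError("db_user must be non-empty")
    iam_client.tag_role(
        RoleName=role_name,
        Tags=[{"Key": REDSHIFT_DB_USER_TAG, "Value": db_user}],
    )
```

For example, `set_redshift_db_user_tag(boto3.client("iam"), "my-project-execution-role", "analyst")` would make IAM-authenticated connections use `IAMR:analyst`.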

## Troubleshooting database access


If, after you establish a connection, you do not see the expected databases, schemas, or tables, use the following steps.

First, identify your effective database user by running the following SQL:

```
SELECT current_user;
```

Your database username follows these patterns:
+ Federated: `IAMR:<your-federated-identity>`
+ With RedshiftDbUser tag on the project execution role: `IAMR:<tag-value>`
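The two patterns above can be expressed as a small helper. You can use it to predict the user name to check against the `current_user` result or to give to your database administrator. It is a sketch of the documented naming rule only. The tags are passed as a plain dictionary rather than read from IAM, and the identity and tag values in the example are hypothetical.

```python
from typing import Mapping, Optional


def expected_db_user(federated_identity: str, role_tags: Optional[Mapping[str, str]] = None) -> str:
    """Predict the Amazon Redshift user name for IAM authentication.

    Follows the documented patterns: IAMR:<tag-value> when the project execution
    role has a RedshiftDbUser tag, otherwise IAMR:<federated-identity>.
    """
    tag_value = (role_tags or {}).get("RedshiftDbUser")
    return f"IAMR:{tag_value}" if tag_value else f"IAMR:{federated_identity}"
```

For example, `expected_db_user("jane-doe", {"RedshiftDbUser": "analyst"})` returns `"IAMR:analyst"`, and `expected_db_user("jane-doe")` returns `"IAMR:jane-doe"`.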

### Common access issues


**Databases are not displayed**  
Your database user lacks connection privileges. The database administrator must grant:  

```
GRANT CONNECT ON DATABASE <database_name> TO "IAMR:<your-identity>";
```

**Schemas are not displayed**  
Your database user lacks schema usage privileges. The database administrator must grant:  

```
GRANT USAGE ON SCHEMA <schema_name> TO "IAMR:<your-identity>";
```

**Tables are not displayed**  
Your database user lacks table access privileges. The database administrator must grant:  

```
GRANT SELECT ON ALL TABLES IN SCHEMA <schema_name> TO "IAMR:<your-identity>";
```

For more information on grants in Redshift, see [GRANT](https://docs.aws.amazon.com/redshift/latest/dg/r_GRANT.html).
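Because the generated user names contain a colon, they must be written as double-quoted identifiers, as in the statements above. The sketch below produces those same three statements for one user, database, and schema, so that a database administrator can review and run them. It doubles any embedded double quotes in the user name, which is how delimited identifiers escape quotes. As a simplifying assumption, it accepts only simple unquoted database and schema names. The names in the example are hypothetical.

```python
import re
from typing import List

# Unquoted identifiers: an ASCII letter or underscore, then letters, digits,
# underscores, or dollar signs. Anything else is rejected rather than quoted.
_SIMPLE_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_$]*")


def quote_ident(name: str) -> str:
    """Return name as a double-quoted identifier, doubling embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def grant_statements(db_user: str, database: str, schema: str) -> List[str]:
    """Build the three GRANT statements listed above for one database user."""
    for name in (database, schema):
        if not _SIMPLE_IDENTIFIER.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    user = quote_ident(db_user)  # IAMR:... contains a colon, so it must be quoted
    return [
        f"GRANT CONNECT ON DATABASE {database} TO {user};",
        f"GRANT USAGE ON SCHEMA {schema} TO {user};",
        f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO {user};",
    ]
```

For example, `grant_statements("IAMR:analyst", "dev", "public")` returns the three statements with `"IAMR:analyst"` as the grantee.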