

# IDE spaces in Amazon SageMaker Unified Studio
<a name="ide-spaces"></a>

Amazon SageMaker Unified Studio provides compute spaces for integrated development environments (IDEs) that you can use to code and develop your resources. When you create and use these IDEs in an Amazon SageMaker Unified Studio project, you have access to all the data in that project and can share coding work with other project members.

JupyterLab is the IDE application available in Amazon SageMaker Unified Studio. A JupyterLab space is created in your project by default, and you can create additional spaces as desired.

# Creating a new space
<a name="create-space"></a>

To create a space, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to the project you want to create a space in. You can do this by choosing **Browse all projects** from the center menu.

1.  In the **Build** menu, choose **Spaces** to navigate to the **Spaces** tab on the **Compute** page.

1.  Choose **Create space**.

1. Under **Name**, enter a name for the space. The name must be unique; no other space in the project can use the same name.

1. Under **Application**, choose the IDE application you want to use with the space.

1. Under **Instance**, choose an Amazon EC2 instance that is most compatible with your use case.
**Note**  
 If you use a GPU instance type when configuring your Code Editor application, you must also use a GPU-based image. Within a space, your data is stored in an Amazon EBS volume that persists independently from the life of an instance. You won't lose your data when you change instances. 

1. Under **Image**, choose the image you want to use.

1. Under **EBS space storage**, enter a value from 16 to 100 to choose a storage size.

1. (Optional) Choose the lifecycle configuration for the space.

   1. Find your domain ID. For instructions, see [Get project details](view-project-details.md).

   1. Find your user profile ID:

      1. In the **Compute** panel, choose the **Spaces** tab.

      1. The user profile ID is the string following `default-` in the name of the default JupyterLab space.

   1. [Create a lifecycle configuration](https://docs.aws.amazon.com/sagemaker/latest/dg/jl-lcc-create.html#jl-lcc-create-console) for Amazon SageMaker Unified Studio, referencing the ID values from the previous steps.
**Note**  
Attaching the lifecycle configuration to your Amazon SageMaker AI domain and user profile through the AWS CLI is not currently supported. Use the console to create lifecycle configurations.

   1. Under **Lifecycle configuration**, choose the lifecycle you created.

1. Under **Idle time**, enter the amount of time before the space times out and stops running.

1. (Optional) Attach a custom file system to your space.

   1. To attach a custom file system to a domain, see [Attaching a custom file system to a domain with the AWS CLI](https://docs.aws.amazon.com/sagemaker/latest/dg/domain-custom-file-system.html#domain-custom-file-system-cli), and reference your ID values from step 10.

   1. Under **Attach custom file system**, attach the file system.

1. Choose **Create and start space** to create the space and have it start running.

It might take up to 5 minutes for Amazon SageMaker Unified Studio to finish creating the space. After the space is created, you can view it in the list on the **Spaces** tab and choose **Open** to launch the IDE in your browser.

You can then perform various actions using the three-dot **Actions** menu on the **Spaces** tab in Amazon SageMaker Unified Studio. You can edit the space, start or stop running the application, delete the space, or view information about the space.

# Editing a space
<a name="edit-space"></a>

Project members can edit a space at any time to change certain configurations. Before editing a space, make sure that you have saved any changes to your work in that space.

**Note**  
A space's name and IDE application cannot be changed.

To edit configurations for a space, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to the project you want to edit a space in. You can do this by choosing **Browse all projects** from the center menu.

1.  In the **Build** menu, choose **Spaces** to navigate to the **Spaces** tab on the **Compute** page.

1.  Choose the three-dot **Actions** menu next to the space you want to edit. Then choose **Configure space**.

1. Make the desired changes, then choose **Save and restart** if the space is running, or **Save and start** if the space is stopped.

**Note**  
Saving changes to a running space requires a restart, and any unsaved changes will be lost.

# Deleting a space
<a name="delete-space"></a>

Project members can choose to delete a space in Amazon SageMaker Unified Studio. Deleting a space is permanent and cannot be undone. Before deleting a space, make sure that you have saved all your data and that you are sure that this is the action you would like to take.

**Note**  
You cannot delete the default JupyterLab space in Amazon SageMaker Unified Studio.

To delete a space, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to the project you want to delete a space in. You can do this by choosing **Browse all projects** from the center menu.

1.  In the **Build** menu, choose **Spaces** to navigate to the **Spaces** tab on the **Compute** page.

1.  Choose the three-dot **Actions** menu next to the space you want to delete. Then choose **Delete**.

1. Review the information in the window provided, then type **confirm** where indicated.

1. Choose **Delete this space**.

# Using the coding assistant
<a name="using-the-coding-assistant"></a>

Amazon SageMaker Unified Studio is integrated with Amazon Q. Amazon Q Developer is a coding assistant that can chat about code, provide inline code completions, or generate new code.

For more information about Amazon Q Developer, see [What is Amazon Q Developer](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/what-is.html) in the Amazon Q Developer User Guide.

To use the Amazon Q Developer model for chat:

1. Ensure that your admin has subscribed to Amazon Q Developer and added Amazon Q Developer as an application to your domain in the Amazon Q Developer console, as described in the Amazon SageMaker Unified Studio Administrator Guide. 
**Note**  
When you enable Amazon Q, you can choose between the free or paid tiers of the service. JupyterLab in the default space supports both the free and paid tiers. However, in additional spaces, JupyterLab and Code Editor support the free tier only.  
When using the free tier, request limits are shared at the account level, meaning that one customer can potentially use up all requests. The pro tier of Amazon Q is charged at the user level, with limits set at the user level as well. The pro tier also lets you manage users and policies with enterprise access control.

1. After adding Amazon Q Developer, you can access the chat interface by navigating to the JupyterLab or Code Editor experience and choosing the chat icon in the left navigation panel of your notebook in Amazon SageMaker Unified Studio.  
![Screenshot of Amazon SageMaker Unified Studio UI showing the Amazon Q programming assistant window.](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_programming_modal.png)

1. You are now able to see code completions powered by Amazon Q Developer in your notebook. Amazon Q Developer makes code recommendations automatically as you write your code, based on your existing code and comments. For more information about how inline suggestions work in Amazon Q Developer, see [Generating inline suggestions](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/inline-suggestions.html) in the Amazon Q Developer User Guide.

   Amazon Q Developer provides automatic suggestions for your code by default. To pause or resume automatic suggestions: 

   1. Choose **Amazon Q** from the navigation bar at the bottom of the JupyterLab or Code Editor IDE. Then choose **Pause Auto-Suggestions** or **Resume Auto-Suggestions**, as desired.  
![Screenshot of Amazon SageMaker Unified Studio UI showing shortcut commands and options for Amazon Q.](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/q-dev/q_shortcut.png)

If you want to opt out of Amazon Q data sharing, see the [opt-out section of the Amazon Q developer guide.](https://docs.aws.amazon.com/amazonq/latest/qdeveloper-ug/service-improvement.html#opt-out)

# Bring your own image (BYOI)
<a name="byoi"></a>

An image is a file that identifies the kernels, language packages, and other dependencies required to run your applications. It includes:
+ Programming languages (like Python or R)
+ Kernels
+ Libraries and packages
+ Other necessary software

Amazon SageMaker AI Distribution ([https://gallery.ecr.aws/sagemaker/sagemaker-distribution](https://gallery.ecr.aws/sagemaker/sagemaker-distribution)) is a set of Docker images that include popular frameworks and packages for machine learning, data science, and visualization.

You can also create your own custom image, using an Amazon SageMaker AI Distribution image as a base image, to bring your own image (BYOI). You may want to BYOI when:
+ You need a specific version of a programming language or library
+ You want to include custom tools or packages
+ You're working with specialized software not available in the standard images

**Topics**
+ [Dockerfile specifications](byoi-specifications.md)
+ [How to BYOI](byoi-how-to.md)
+ [Launch your custom image in Amazon SageMaker Unified Studio](byoi-launch-custom-image.md)
+ [Speed up container startup with SOCI](byoi-soci-indexing.md)
+ [Detach and clean up custom image resources](byoi-clean-up.md)

# Dockerfile specifications
<a name="byoi-specifications"></a>

To bring your own image (BYOI), ensure that the following specifications are satisfied when you create your `Dockerfile`.
+ **Base image specifications**: You must use one of the base images from Amazon SageMaker AI Distribution ([https://gallery.ecr.aws/sagemaker/sagemaker-distribution](https://gallery.ecr.aws/sagemaker/sagemaker-distribution)), with the following specifications. These images contain important extensions that are required to run an image on Amazon SageMaker Unified Studio.
  + The base image must be `FROM public.ecr.aws/sagemaker/sagemaker-distribution:version`. You can copy the Image URI of an image from the Image tags tab in [https://gallery.ecr.aws/sagemaker/sagemaker-distribution](https://gallery.ecr.aws/sagemaker/sagemaker-distribution).
  + The chosen image `version` must be greater than or equal to the following:
    + For CPU: `2.6-cpu`
    + For GPU: `2.6-gpu`
+ Follow the [Custom image specifications](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-byoi-specs.html) in the *SageMaker AI Developer Guide*.

## Dockerfile example
<a name="byoi-specifications-example"></a>

The following is an example `Dockerfile` that meets the above criteria. The `version` in this example must satisfy the specification above.

**Note**  
Adding `ENTRYPOINT` in the `Dockerfile` will not work as expected. If you would like to configure a custom entrypoint, you will need to update your [ContainerConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_ContainerConfig.html). For an example, see [Update container configuration](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-byoi-how-to-container-configuration.html) in the *SageMaker AI Developer Guide*.

```
FROM public.ecr.aws/sagemaker/sagemaker-distribution:version

ARG NB_USER="sagemaker-user"
ARG NB_UID=1000
ARG NB_GID=100

ENV MAMBA_USER=$NB_USER

USER root

RUN apt-get update
RUN micromamba install sagemaker-inference --freeze-installed --yes --channel conda-forge --name base

USER $MAMBA_USER
```

# How to BYOI
<a name="byoi-how-to"></a>

When you bring your own image (BYOI) to Amazon SageMaker Unified Studio, you attach a custom image to an Amazon SageMaker Unified Studio project. The following page provides instructions on how to bring your custom image to your Amazon SageMaker Unified Studio project.

**Topics**
+ [Prerequisites](#byoi-how-to-prerequisites)
+ [Step 1: Create your custom image](#byoi-how-to-step-1-create-custom-image)
+ [Step 2: Get the SageMaker AI domain name associated with your Amazon SageMaker Unified Studio project](#byoi-how-to-step-2-get-domain-name)
+ [Step 3: Attach your custom image using the SageMaker AI domain](#byoi-how-to-step-3-attach-custom-image)
+ [Step 4: Access your custom image in Amazon SageMaker Unified Studio](#byoi-how-to-step-4-access-custom-image)

## Prerequisites
<a name="byoi-how-to-prerequisites"></a>

You will need to complete the following prerequisites to bring your own image to Amazon SageMaker Unified Studio.
+ Create an Amazon SageMaker Unified Studio project. For more information, see [Create a new project](create-new-project.md).
+ Set up the Docker application. For more information, see [Get started](https://docs.docker.com/get-started/) in the *Docker documentation*.
+ Install the latest AWS CLI by following the steps in [Getting started with the AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/getting-started-install.html) in the *AWS Command Line Interface User Guide for Version 2*.
+ Permissions to access the Amazon Elastic Container Registry (Amazon ECR) service. For more information, see [Amazon ECR Managed Policies](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-getting-started.html) in the *Amazon ECR User Guide*.
+ An AWS Identity and Access Management role that has the [AmazonSageMakerFullAccess](https://docs.aws.amazon.com/sagemaker/latest/dg/security-iam-awsmanpol.html#security-iam-awsmanpol-AmazonSageMakerFullAccess) policy attached.

## Step 1: Create your custom image
<a name="byoi-how-to-step-1-create-custom-image"></a>

**Important**  
Ensure that you are using the [Dockerfile specifications](byoi-specifications.md) in the following instructions.

Follow the steps in [Create a custom image and push to Amazon ECR](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-byoi-how-to-prepare-image.html) in the *SageMaker AI Developer Guide*.

## Step 2: Get the SageMaker AI domain name associated with your Amazon SageMaker Unified Studio project
<a name="byoi-how-to-step-2-get-domain-name"></a>

An associated SageMaker AI domain is created when you create an Amazon SageMaker Unified Studio project. You will need the SageMaker AI domain name before proceeding to the next step. For instructions, see [View the SageMaker AI domain details associated with your project](view-project-details.md#view-project-details-smai-domain).

## Step 3: Attach your custom image using the SageMaker AI domain
<a name="byoi-how-to-step-3-attach-custom-image"></a>

To attach your custom image to your Amazon SageMaker Unified Studio project, you must attach your custom image to your SageMaker AI domain. Follow the steps in [Attach your custom image to your domain](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-byoi-how-to-attach-to-domain.html) in the *SageMaker AI Developer Guide*, using the SageMaker AI domain obtained from above.

## Step 4: Access your custom image in Amazon SageMaker Unified Studio
<a name="byoi-how-to-step-4-access-custom-image"></a>

Once your custom image is attached to your Amazon SageMaker Unified Studio project, users with access to your project can use it. For instructions on how users can access the custom images, see [Launch your custom image in Amazon SageMaker Unified Studio](byoi-launch-custom-image.md).

# Launch your custom image in Amazon SageMaker Unified Studio
<a name="byoi-launch-custom-image"></a>

The following section provides information on how to launch your custom image in Amazon SageMaker Unified Studio. If you have not already created a custom image, see [How to BYOI](byoi-how-to.md).

**To launch your custom image in Amazon SageMaker Unified Studio**

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your IAM Identity Center (SSO) or AWS credentials. For more information, see [Access Amazon SageMaker Unified Studio](getting-started-access-the-portal.md).

1. Navigate to the Amazon SageMaker Unified Studio home page by choosing the icon located at the top left corner of the page.

1. You can choose to create a project or choose an existing one.

1. Once you choose a project, you will be taken to the **Project overview** page.

1. On the **Project overview** page, open your notebook. To do so:

   1. Choose **New** to expand the dropdown menu, located at the top right of the page.

   1. Choose **Notebook**, then continue with the following steps.

1. Once your notebook is open, choose **Configure space**, located at the top right of the page.

1. In the **Configure space** section, you can choose the image from the **Image** dropdown menu.

1. After you have verified the settings, choose **Save and restart**.

Once completed, this will start a new JupyterLab instance with the custom image.

# Speed up container startup with SOCI
<a name="byoi-soci-indexing"></a>

SOCI (Seekable Open Container Initiative) indexing enables lazy loading of custom container images in Amazon SageMaker Unified Studio or [Amazon SageMaker Studio](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated.html). SOCI reduces startup times by roughly 30-70% for your custom [Bring your own image (BYOI)](byoi.md) containers; the latency improvement varies with the size of the image, hosting instance availability, and other application dependencies. SOCI creates an index that allows containers to launch with only the necessary components, fetching additional files on demand.

SOCI addresses the slow container startup times for custom images that interrupt iterative machine learning (ML) development workflows. As ML workloads become more complex, container images have grown larger, creating startup delays that hamper development cycles.

For more information on SOCI indexing and how to use it, see [Speed up container startup with SOCI](https://docs.aws.amazon.com/sagemaker/latest/dg/soci-indexing.html) in the *Amazon SageMaker AI Developer Guide*.

# Detach and clean up custom image resources
<a name="byoi-clean-up"></a>

An associated SageMaker AI domain is created when you create an Amazon SageMaker Unified Studio project. You will need the SageMaker AI domain details to detach and clean up your custom image resources. For instructions, see [View the SageMaker AI domain details associated with your project](view-project-details.md#view-project-details-smai-domain). 

Follow the instructions in [Detach and clean up custom image resources](https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-byoi-how-to-detach-from-domain.html) in the *Amazon SageMaker AI Developer Guide*. 

# Using Amazon SageMaker Unified Studio Library for Python
<a name="python-library"></a>

The Amazon SageMaker Unified Studio library is an open source library for interacting with Amazon SageMaker Unified Studio resources. With this library, you can access resources such as domains, projects, connections, and databases, all in one place with minimal code. The following examples demonstrate how to use the library in local and remote sessions.

**Note**  
For IAM-based domains, the Amazon SageMaker Unified Studio Library for Python is supported only when using Space Distribution Image version 2.11 or later, or 3.6 or later, in JupyterLab Notebooks or the Code Editor. Earlier versions (e.g., 2.9 or 3.2) do not support the Amazon SageMaker Unified Studio Library for Python.

# Using ClientConfig
<a name="using-client-config"></a>

If you use `ClientConfig` to supply credentials or change the AWS Region name, the `ClientConfig` object must be supplied when initializing any further Amazon SageMaker Studio objects, such as `Domain` or `Project`. If you are using a non-production endpoint for an AWS service, it can also be supplied in the `ClientConfig`. Note: In a SageMaker space, the DataZone endpoint is fetched from the metadata JSON file by default.

```
from sagemaker_studio import ClientConfig, Project
conf = ClientConfig(region="eu-west-1")
proj = Project(config=conf)
```

# Domain
<a name="domain"></a>

`Domain` can be initialized using the following command.

```
from sagemaker_studio import Domain
dom = Domain()
```

If you are not using the Amazon SageMaker Studio library within the Amazon SageMaker Unified Studio JupyterLab IDE, you will need to provide the ID of the domain you want to use.

```
dom = Domain(id="123456")
```

## Domain Properties
<a name="domain-properties"></a>

A `Domain` object has several string properties that can provide information about the domain that you are using.

```
dom.id
dom.root_domain_unit_id
dom.name
dom.domain_execution_role
dom.status
dom.portal_url
```

# Project
<a name="project"></a>

`Project` can be initialized using the following command.

```
from sagemaker_studio import Project
proj = Project()
```

If you are not using the Amazon SageMaker Studio library within the Amazon SageMaker Unified Studio JupyterLab IDE, you will need to provide either the ID or name of the project you would like to use and the domain ID of the project.

```
proj = Project(name="my_proj_name", domain_id="123456")
```

## Project properties
<a name="project-properties"></a>

A `Project` object has several string properties that can provide information about the project that you are using.

```
proj.id
proj.name
proj.domain_id
proj.project_status
proj.domain_unit_id
proj.project_profile_id
proj.user_id
```

### IAM Role ARN
<a name="iam-role-arn"></a>

To retrieve the project IAM role ARN, you can retrieve the `iam_role` field. This gets the IAM role ARN of the default IAM connection within your project.

```
proj.iam_role
```

### AWS KMS Key ARN
<a name="kms-key-arn"></a>

If you are using an AWS KMS key within your project, you can retrieve the `kms_key_arn` field.

```
proj.kms_key_arn
```

# S3 Path
<a name="s3-path"></a>

One of the properties of a `Project` is `s3`. You can access various S3 paths that exist within your project.

```
# S3 path of project root directory
proj.s3.root
# S3 path of datalake consumer Glue DB directory (requires DataLake environment)
proj.s3.datalake_consumer_glue_db
# S3 path of Athena workgroup directory (requires DataLake environment)
proj.s3.datalake_athena_workgroup
# S3 path of workflows output directory (requires Workflows environment)
proj.s3.workflow_output_directory
# S3 path of workflows temp storage directory (requires Workflows environment)
proj.s3.workflow_temp_storage
# S3 path of EMR EC2 log destination directory (requires EMR EC2 environment)
proj.s3.emr_ec2_log_destination
# S3 path of EMR EC2 certificates directory (requires EMR EC2 environment)
proj.s3.emr_ec2_certificates
# S3 path of EMR EC2 log bootstrap directory (requires EMR EC2 environment)
proj.s3.emr_ec2_log_bootstrap
```
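The properties above are plain `s3://` URIs. If you need the bucket and key prefix separately (for example, for a direct boto3 call), you can split them with the standard library. A minimal sketch; the path value is a hypothetical example of what `proj.s3.root` might return:

```python
from urllib.parse import urlparse

def split_s3_path(s3_path):
    """Split an s3:// URI into (bucket, key prefix)."""
    parsed = urlparse(s3_path)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 path: {s3_path}")
    return parsed.netloc, parsed.path.lstrip("/")

# Hypothetical project root path, shaped like proj.s3.root
bucket, prefix = split_s3_path("s3://my-project-bucket/domain-id/project-id/")
print(bucket)  # my-project-bucket
print(prefix)  # domain-id/project-id/
```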

## Other Environment S3 Paths
<a name="other-environment-s3-paths"></a>

You can also access the S3 path of a different environment by providing an environment ID.

```
proj.s3.environment_path(environment_id="env_1234")
```

# Connections
<a name="connections"></a>

You can retrieve a list of connections for a project, or you can retrieve a single connection by providing its name.

```
proj_connections: List[Connection] = proj.connections
proj_redshift_conn = proj.connection("my_redshift_connection_name")
```

Each `Connection` object has several properties that can provide information about the connection.

```
proj_redshift_conn.name
proj_redshift_conn.id
proj_redshift_conn.physical_endpoints[0].host
proj_redshift_conn.iam_role
```

# Retrieving AWS client with SDK for Python (Boto3)
<a name="connection-clients"></a>

You can retrieve an SDK for Python (Boto3) AWS client initialized with the connection's credentials.

**Example**  
The following example shows how to create a Redshift client using `create_client()` from a Redshift connection.  

```
redshift_connection: Connection = proj.connection("project.redshift")
redshift_client = redshift_connection.create_client()
```

Some connections are directly associated with an AWS service, and will default to using that AWS service's client if no service name is specified. Those connections are listed in the following table.


| Connection Type | AWS Service Name | 
| --- | --- | 
| ATHENA | athena | 
| DYNAMODB | dynamodb | 
| REDSHIFT | redshift | 
| S3 | s3 | 
| S3_FOLDER | s3 | 

For other connection types, you must specify an AWS service name.

**Example**  
See the following example for details.  

```
iam_connection: Connection = proj.connection("project.iam")
glue_client = iam_connection.create_client("glue")
```

# Connection data
<a name="connection-data"></a>

To retrieve all properties of a `Connection`, you can access the `data` field to get a `ConnectionData` object. `ConnectionData` fields can be accessed using the dot notation (e.g. `conn_data.top_level_field`). For retrieving further nested data within `ConnectionData`, you can access it as a dictionary. For example: `conn_data.top_level_field["nested_field"]`.

```
conn_data: ConnectionData = proj_redshift_conn.data
red_temp_dir = conn_data.redshiftTempDir
lineage_sync = conn_data.lineageSync
lineage_job_id = lineage_sync["lineageJobId"]
spark_conn = proj.connection("my_spark_glue_connection_name")
id = spark_conn.id
env_id = spark_conn.environment_id
glue_conn = spark_conn.data.glue_connection_name
workers = spark_conn.data.number_of_workers
glue_version = spark_conn.data.glue_version
# Fetching tracking server ARN and tracking server name from an MLFlow connection
ml_flow_conn = proj.connection('<my_ml_flow_connection_name>')
tracking_server_arn = ml_flow_conn.data.tracking_server_arn
tracking_server_name = ml_flow_conn.data.tracking_server_name
```

# Secrets
<a name="secrets"></a>

Retrieve the secret (username, password, other connection-related metadata) for the connection using the following property.

```
snowflake_connection: Connection = proj.connection("project.snowflake")
secret = snowflake_connection.secret
```

Secrets can be a dictionary containing credentials or a single string depending on the connection type.
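Because the secret's shape depends on the connection type, code that consumes it should check the type before indexing into it. A minimal sketch with hypothetical values (the `username` field is an assumption about dictionary-shaped secrets):

```python
def secret_username(secret):
    """Return the username from a dictionary-shaped secret, else None."""
    if isinstance(secret, dict):
        return secret.get("username")
    return None  # plain-string secrets (e.g. a token) carry no fields

print(secret_username({"username": "analyst", "password": "example"}))  # analyst
print(secret_username("example-token"))  # None
```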

# Catalogs, databases, and tables
<a name="databases-and-tables"></a>

If your `Connection` is of the `LAKEHOUSE` or `IAM` type, you can retrieve catalogs, databases, and tables within a project.

## Catalogs
<a name="catalogs"></a>

If your `Connection` is of the `LAKEHOUSE` or `IAM` type, you can retrieve a list of catalogs, or a single catalog by providing its ID.

```
conn_catalogs: List[Catalog] = proj.connection().catalogs
my_default_catalog: Catalog = proj.connection().catalog()
my_catalog: Catalog = proj.connection().catalog("1234567890:catalog1/sub_catalog")
proj.connection("<lakehouse_connection_name>").catalogs
```

Each `Catalog` object has several properties that can provide information about the catalog.

```
my_catalog.name
my_catalog.id
my_catalog.type
my_catalog.spark_catalog_name
my_catalog.resource_arn
```

## Databases
<a name="databases"></a>

You can retrieve a list of databases or a single database within a catalog by providing its name.

```
my_catalog: Catalog
catalog_dbs: List[Database] = my_catalog.databases
my_db: Database = my_catalog.database("my_db")
```

Each `Database` object has several properties that can provide information about the database.

```
my_db.name
my_db.catalog_id
my_db.location_uri
my_db.project_id
my_db.domain_id
```

## Tables
<a name="tables"></a>

You can also retrieve a list of tables or a specific table within a `Database`.

```
my_db_tables: List[Table] = my_db.tables
my_table: Table = my_db.table("my_table")
```

Each `Table` object has several properties that can provide information about the table.

```
my_table.name
my_table.database_name
my_table.catalog_id
my_table.location
```

You can also retrieve a list of the columns within a table. `Column` contains the column name and the data type of the column.

```
my_table_columns: List[Column] = my_table.columns
col_0: Column = my_table_columns[0]
col_0.name
col_0.type
```
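Since every `Column` carries a `name` and a `type`, you can fold `my_table.columns` into a quick schema summary. A runnable sketch that substitutes a namedtuple for the library's `Column` object, so it works without a live project:

```python
from collections import namedtuple

# Stand-in for the library's Column objects (name and type attributes only)
Column = namedtuple("Column", ["name", "type"])

def describe_schema(columns):
    """Render a list of columns as 'name: type' lines."""
    return "\n".join(f"{col.name}: {col.type}" for col in columns)

cols = [Column("customer_id", "bigint"), Column("order_date", "date")]
print(describe_schema(cols))
```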

# Utility Methods
<a name="utility-methods"></a>

The Amazon SageMaker Unified Studio SDK provides utility modules for common data operations including SQL execution, DataFrame operations, and Spark session management.

# SQL Utilities
<a name="sql-utilities"></a>

The SQL utilities module provides a simple interface for executing SQL queries against various database engines within Amazon SageMaker Unified Studio. When no connection is specified, queries are executed locally using DuckDB.

## Supported Database Engines
<a name="supported-database-engines"></a>

The following database engines are supported:
+ Amazon Athena
+ Amazon Redshift
+ MySQL
+ PostgreSQL
+ Snowflake
+ Google BigQuery
+ Amazon DynamoDB
+ Microsoft SQL Server
+ DuckDB (default when no connection specified)

## Basic Usage
<a name="sql-basic-usage"></a>

Import the SQL utilities:

```
from sagemaker_studio import sqlutils
```

### Execute SQL with DuckDB (No Connection)
<a name="execute-sql-duckdb"></a>

When no connection is specified, queries are executed locally using DuckDB:

```
# Simple SELECT query
result = sqlutils.sql("SELECT 1 as test_column")
result

# Query with literal values
result = sqlutils.sql("SELECT * FROM table WHERE id = 123")
```
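`sqlutils` itself is only available inside Amazon SageMaker Unified Studio, but the `:name` placeholder style shown in the parameterized examples is the standard DB-API named-parameter convention, so you can try the same query shapes locally with the standard library's sqlite3 module:

```python
import sqlite3

# Offline stand-in: sqlite3 accepts the same :name placeholder convention
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, status TEXT)")
conn.execute("INSERT INTO t VALUES (123, 'completed')")
rows = conn.execute(
    "SELECT status FROM t WHERE id = :id", {"id": 123}
).fetchall()
print(rows)  # [('completed',)]
```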

### Execute SQL with Project Connections
<a name="execute-sql-project-connections"></a>

Use existing project connections by specifying either connection name or ID:

```
# Using connection name
result = sqlutils.sql(
    "SELECT * FROM my_table",
    connection_name="my_athena_connection"
)

# Using connection ID
result = sqlutils.sql(
    "SELECT * FROM my_table",
    connection_id="conn_12345"
)
```

## Examples by Database Engine
<a name="sql-examples-by-database"></a>

### Amazon Athena
<a name="amazon-athena-examples"></a>

```
# Query Athena using project connection with parameters
result = sqlutils.sql(
    """
    SELECT customer_id, order_date, total_amount
    FROM orders
    WHERE order_date >= :start_date
    """,
    parameters={"start_date": "2024-01-01"},
    connection_name="project.athena"
)

# Create external table in Athena
sqlutils.sql(
    """
    CREATE EXTERNAL TABLE sales_data (
        customer_id bigint,
        order_date date,
        amount decimal(10,2)
    )
    LOCATION 's3://my-bucket/sales-data/'
    """,
    connection_name="project.athena"
)

# Insert data using Create Table As Select (CTAS)
sqlutils.sql(
    """
    CREATE TABLE monthly_sales AS
    SELECT
        DATE_TRUNC('month', order_date) as month,
        SUM(amount) as total_sales
    FROM sales_data
    GROUP BY DATE_TRUNC('month', order_date)
    """,
    connection_name="project.athena"
)
```

### Amazon Redshift
<a name="amazon-redshift-examples"></a>

```
# Query Redshift with parameters
result = sqlutils.sql(
    """
    SELECT product_name, category, price
    FROM products
    WHERE category = :category
    AND price > :min_price
    """,
    parameters={"category": "Electronics", "min_price": 100},
    connection_name="project.redshift"
)

# Create table in Redshift
sqlutils.sql(
    """
    CREATE TABLE customer_summary (
        customer_id INTEGER PRIMARY KEY,
        total_orders INTEGER,
        total_spent DECIMAL(10,2),
        last_order_date DATE
    )
    """,
    connection_name="project.redshift"
)

# Insert aggregated data
sqlutils.sql(
    """
    INSERT INTO customer_summary
    SELECT
        customer_id,
        COUNT(*) as total_orders,
        SUM(amount) as total_spent,
        MAX(order_date) as last_order_date
    FROM orders
    GROUP BY customer_id
    """,
    connection_name="project.redshift"
)

# Update existing records
sqlutils.sql(
    """
    UPDATE products
    SET price = price * 1.1
    WHERE category = 'Electronics'
    """,
    connection_name="project.redshift"
)
```

## Advanced Usage
<a name="sql-advanced-usage"></a>

### Working with DataFrames
<a name="working-with-dataframes"></a>

The `sql` function returns a pandas DataFrame for SELECT queries and a row count for DML operations:

```
import pandas as pd

# Execute query and get DataFrame
df = sqlutils.sql("SELECT * FROM sales_data", connection_name="redshift_conn")

# Use pandas operations
summary = df.groupby('region')['sales'].sum()
print(summary)

# Save to file
df.to_csv('sales_report.csv', index=False)

# DML operations return row counts
rows_affected = sqlutils.sql(
    "UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 123",
    connection_name="redshift_conn"
)
print(f"Updated {rows_affected} inventory records")
```

### Parameterized Queries
<a name="parameterized-queries"></a>

Use parameters to safely pass values to queries:

```
# Dictionary parameters (recommended)
result = sqlutils.sql(
    "SELECT * FROM orders WHERE customer_id = :customer_id AND status = :status",
    parameters={"customer_id": 12345, "status": "completed"},
    connection_name="redshift_connection"
)

# Athena with named parameters
result = sqlutils.sql(
    "SELECT * FROM products WHERE category = :category AND price > :min_price",
    parameters={"category": "Electronics", "min_price": 100},
    connection_name="athena_connection"
)
```
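
The `:name` placeholders above follow the standard DB-API named-parameter style, which keeps values out of the SQL string. As an illustration of the pattern in isolation (independent of sqlutils), the following sketch uses the stdlib `sqlite3` module, which supports the same placeholder syntax:

```python
import sqlite3

# sqlite3 is only a stand-in here for the engines that sqlutils connects to
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (:customer_id, :status)",
    [{"customer_id": 12345, "status": "completed"},
     {"customer_id": 12345, "status": "pending"}],
)

# Values are bound by the driver, never interpolated into the SQL text
rows = conn.execute(
    "SELECT * FROM orders WHERE customer_id = :customer_id AND status = :status",
    {"customer_id": 12345, "status": "completed"},
).fetchall()
print(rows)  # [(12345, 'completed')]
```

Because the driver binds the values, user input can never change the shape of the query, which is why parameters are preferred over string formatting.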

### Getting Database Engine
<a name="getting-database-engine"></a>

You can also get the underlying SQLAlchemy engine for advanced operations:

```
# Get engine for a connection
engine = sqlutils.get_engine(connection_name="redshift_connection")

# Use engine directly with pandas
import pandas as pd

df = pd.read_sql("SELECT * FROM large_table LIMIT 1000", engine)
```

### DuckDB Features
<a name="duckdb-features"></a>

When using DuckDB (no connection specified), you get additional capabilities:

#### Python Integration
<a name="python-integration"></a>

```
# DuckDB can access Python variables directly
import pandas as pd

my_df = pd.DataFrame({'id': [1, 2, 3], 'name': ['A', 'B', 'C']})

result = sqlutils.sql("SELECT * FROM my_df WHERE id > 1")
```

### Notes
<a name="sql-notes"></a>
+ SELECT queries return pandas DataFrames for easy data manipulation
+ DuckDB is automatically configured with Amazon S3 credentials from the environment
+ Connection credentials are managed through Amazon SageMaker Unified Studio project connections
+ The module handles connection pooling and cleanup automatically

# DataFrame Utilities
<a name="dataframe-utilities"></a>

Read from and write to catalog tables using pandas DataFrames with automatic format detection and database management.

Supported catalog types:
+ AwsDataCatalog
+ S3CatalogTables

## Basic Usage
<a name="dataframe-basic-usage"></a>

Import the DataFrame utilities:

```
from sagemaker_studio import dataframeutils
```

## Reading from Catalog Tables
<a name="reading-from-catalog-tables"></a>

Required Inputs:
+ database (str): Database name within the catalog
+ table (str): Table name

Optional Parameters:
+ catalog (str): Catalog identifier (defaults to AwsDataCatalog if not specified)
+ format (str): Data format - auto-detects from table metadata, falls back to parquet
+ \*\*kwargs: Additional arguments
  + for AwsDataCatalog, kwargs can be columns, chunked, etc
  + for S3Tables, kwargs can be limit, row\_filter, selected\_fields, etc

```
import pandas as pd

# Read from AwsDataCatalog
df = pd.read_catalog_table(
    database="my_database",
    table="my_table"
)

# Read from S3 Tables
df = pd.read_catalog_table(
   database="my_database",
   table="my_table",
   catalog="s3tablescatalog/my_s3_tables_catalog",
)
```

### Usage with optional parameters
<a name="reading-with-optional-parameters"></a>

```
import pandas as pd

# Read from AwsDataCatalog by explicitly specifying catalogID and format
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="123456789012",
    format="parquet"
)

# Read from AwsDataCatalog by explicitly specifying catalogID, format, and additional args -> columns
df = pd.read_catalog_table(
    database="my_database",
    table="my_table",
    catalog="123456789012",
    format="parquet",
    columns=['<column_name_1>', '<column_name_2>']
)

# Read from S3 Tables with additional args -> limit
df = pd.read_catalog_table(
   database="my_database",
   table="my_table",
   catalog="s3tablescatalog/my_s3_tables_catalog",
   limit=500
)

# Read from S3 Tables with additional args -> selected_fields
df = pd.read_catalog_table(
   database="my_database",
   table="my_table",
   catalog="s3tablescatalog/my_s3_tables_catalog",
   selected_fields=['<field_name_1>', '<field_name_2>']
)
```

## Writing to Catalog Tables
<a name="writing-to-catalog-tables"></a>

Required Inputs:
+ database (str): Database name within the catalog
+ table (str): Table name

Optional Parameters:
+ catalog (str): Catalog identifier (defaults to AwsDataCatalog if not specified)
+ format (str): Data format used for AwsDataCatalog (default: parquet)
+ path (str): Custom Amazon S3 path for writing to AwsDataCatalog (auto-determined if not provided)
+ \*\*kwargs: Additional arguments

Path Resolution Priority - Amazon S3 path is determined in this order:
+ User-provided path parameter
+ Existing database location \+ table name
+ Existing table location
+ Project default Amazon S3 location
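
As an illustration only (the real resolution happens inside `to_catalog_table`, and these parameter names are hypothetical), the priority order above amounts to picking the first available location:

```python
# Hypothetical sketch of the documented path-resolution order;
# this is not the actual library code.
def resolve_s3_path(table, user_path=None, database_location=None,
                    table_location=None, project_default=None):
    if user_path:                      # 1. user-provided path parameter
        return user_path
    if database_location:              # 2. existing database location + table name
        return f"{database_location.rstrip('/')}/{table}/"
    if table_location:                 # 3. existing table location
        return table_location
    return project_default             # 4. project default S3 location

print(resolve_s3_path("my_table",
                      database_location="s3://my-bucket/my_database/"))
# s3://my-bucket/my_database/my_table/
```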

```
import pandas as pd

# Create sample DataFrame
df = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
    'value': [10.5, 20.3, 15.7]
})

# Write to AwsDataCatalog
df.to_catalog_table(
    database="my_database",
    table="my_table"
)

# Write to S3 Table Catalog
df.to_catalog_table(
    database="my_database",
    table="my_table",
    catalog="s3tablescatalog/my_s3_tables_catalog"
)
```

### Usage with optional parameters
<a name="writing-with-optional-parameters"></a>

```
# Write to AwsDataCatalog with csv format
df.to_catalog_table(
    database="my_database",
    table="my_table",
    format="csv"
)

# Write to AwsDataCatalog at user specified s3 path
df.to_catalog_table(
    database="my_database",
    table="my_table",
    path="s3://my-bucket/custom/path/"
)

# Write to AwsDataCatalog with additional argument -> compression
df.to_catalog_table(
    database="my_database",
    table="my_table",
    compression='gzip'
)
```

# Spark Utilities
<a name="spark-utilities"></a>

The Spark utilities module provides a simple interface for working with Spark Connect sessions and managing Spark configurations for various data sources within Amazon SageMaker Unified Studio. When no connection is specified, a Spark Connect session is created using the default Amazon Athena Spark connection.

## Basic Usage
<a name="spark-basic-usage"></a>

Import the Spark utilities:

```
from sagemaker_studio import sparkutils
```

## Initialize Spark Session
<a name="initialize-spark-session"></a>

Supported connection types:
+ Spark Connect

Optional Parameters:
+ connection\_name (str): Name of the Spark connection to use (e.g., "my\_spark\_connection")

When no connection is specified, a default Amazon Athena Spark session is created:

```
# Default session
spark = sparkutils.init()

# Session with specific connection
spark = sparkutils.init(connection_name="my_spark_connection")
```

## Working with Spark Options
<a name="working-with-spark-options"></a>

Supported connection types:
+ Amazon DocumentDB
+ Amazon DynamoDB
+ Amazon Redshift
+ Aurora MySQL
+ Aurora PostgreSQL
+ Azure SQL
+ Google BigQuery
+ Microsoft SQL Server
+ MySQL
+ PostgreSQL
+ Oracle
+ Snowflake

Required Inputs:
+ connection\_name (str): Name of the connection to get Spark options for (e.g., "my\_redshift\_connection")

Get formatted Spark options for connecting to data sources:

```
# Get options for Redshift connection
options = sparkutils.get_spark_options("my_redshift_connection")
```

## Examples by Operation Type
<a name="spark-examples-by-operation"></a>

### Reading and Writing Data
<a name="reading-and-writing-data"></a>

```
# Create sample DataFrame
df_to_write = spark.createDataFrame(
    [(1, "Alice"), (2, "Bob")],
    ["id", "name"]
)

# Get spark options for Redshift connection
spark_options = sparkutils.get_spark_options("my_redshift_connection")

# Write DataFrame using JDBC
df_to_write.write \
    .format("jdbc") \
    .options(**spark_options) \
    .option("dbtable", "sample_table") \
    .save()

# Read DataFrame using JDBC
df_to_read = spark.read \
    .format('jdbc') \
    .options(**spark_options) \
    .option('dbtable', 'sample_table') \
    .load()

# Display results
df_to_read.show()
```

## Notes
<a name="spark-notes"></a>
+ Spark sessions are automatically configured for Amazon Athena Spark compute
+ Connection credentials are managed through Amazon SageMaker Unified Studio project connections
+ The module handles session management and cleanup automatically
+ Spark options are formatted appropriately for each supported data source
+ When `get_spark_options` is used on EMR Serverless or EMR on EC2 compute and the connection has EnforceSSL enabled, the formatted Spark options do not include the sslrootcert value, so you must pass it explicitly.

# Execution APIs
<a name="execution-apis"></a>

Execution APIs provide the ability to start an execution that runs a notebook headlessly within the same user space or on remote compute.

## Local Execution APIs
<a name="local-execution-apis"></a>

Use the following APIs to start, stop, get, or list executions within the user's space.

### StartExecution
<a name="start-execution"></a>

You can start a notebook execution headlessly within the same user space.

```
from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
from sagemaker_studio import ClientConfig

config = ClientConfig(overrides={
            "execution": {
                "local": True,
            }
        })
sagemaker_studio_api = SageMakerStudioAPI(config)

result = sagemaker_studio_api.execution_client.start_execution(
    execution_name="my-execution",
    input_config={"notebook_config": {
        "input_path": "src/folder2/test.ipynb"}},
    execution_type="NOTEBOOK",
    output_config={"notebook_config": {
        "output_formats": ["NOTEBOOK", "HTML"]
    }}
)
print(result)
```

### GetExecution
<a name="get-execution"></a>

You can retrieve details about a local execution using the `GetExecution` API.

```
from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
from sagemaker_studio import ClientConfig

config = ClientConfig(region="us-west-2", overrides={
            "execution": {
                "local": True,
            }
        })
sagemaker_studio_api = SageMakerStudioAPI(config)

get_response = sagemaker_studio_api.execution_client.get_execution(execution_id="asdf-3b998be2-02dd-42af-8802-593d48d04daa")
print(get_response)
```

### ListExecutions
<a name="list-executions"></a>

You can use the `ListExecutions` API to list all the executions that ran in the user's space.

```
from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
from sagemaker_studio import ClientConfig

config = ClientConfig(region="us-west-2", overrides={
            "execution": {
                "local": True,
            }
        })
sagemaker_studio_api = SageMakerStudioAPI(config)

list_executions_response = sagemaker_studio_api.execution_client.list_executions(status="COMPLETED")
print(list_executions_response)
```

### StopExecution
<a name="stop-execution"></a>

You can use the `StopExecution` API to stop an execution that's running in the user space.

```
from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
from sagemaker_studio import ClientConfig

config = ClientConfig(region="us-west-2", overrides={
            "execution": {
                "local": True,
            }
        })
sagemaker_studio_api = SageMakerStudioAPI(config)

stop_response = sagemaker_studio_api.execution_client.stop_execution(execution_id="asdf-3b998be2-02dd-42af-8802-593d48d04daa")
print(stop_response)
```

# Remote Execution APIs
<a name="remote-execution-apis"></a>

Use the following APIs to start, stop, get, or list executions running on remote compute.

## StartExecution
<a name="remote-start-execution"></a>

You can start a notebook execution headlessly on a remote compute specified in the `StartExecution` request.

```
from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
from sagemaker_studio import ClientConfig

config = ClientConfig(region="us-west-2")
sagemaker_studio_api = SageMakerStudioAPI(config)

result = sagemaker_studio_api.execution_client.start_execution(
    execution_name="my-execution",
    execution_type="NOTEBOOK",
    input_config={"notebook_config": {"input_path": "src/folder2/test.ipynb"}},
    output_config={"notebook_config": {"output_formats": ["NOTEBOOK", "HTML"]}},
    termination_condition={"max_runtime_in_seconds": 9000},
    compute={
        "instance_type": "ml.c5.xlarge",
        "image_details": {
            # Provide either ecr_uri or (image_name and image_version), not both
            "image_name": "sagemaker-distribution-embargoed-loadtest",
            "image_version": "2.2",
            # "ecr_uri": "123456123456.dkr.ecr.us-west-2.amazonaws.com/ImageName:latest",
        }
    }
)
print(result)
```

## GetExecution
<a name="remote-get-execution"></a>

You can retrieve details about an execution running on remote compute using the `GetExecution` API.

```
from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
from sagemaker_studio import ClientConfig

config = ClientConfig(region="us-west-2")
sagemaker_studio_api = SageMakerStudioAPI(config)

get_response = sagemaker_studio_api.execution_client.get_execution(execution_id="asdf-3b998be2-02dd-42af-8802-593d48d04daa")
print(get_response)
```

## ListExecutions
<a name="remote-list-executions"></a>

You can use the `ListExecutions` API to list all the headless executions that ran on remote compute.

```
from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
from sagemaker_studio import ClientConfig

config = ClientConfig(region="us-west-2")
sagemaker_studio_api = SageMakerStudioAPI(config)

list_executions_response = sagemaker_studio_api.execution_client.list_executions(status="COMPLETED")
print(list_executions_response)
```

## StopExecution
<a name="stop-executions"></a>

You can use the `StopExecution` API to stop an execution that's running on remote compute.

```
from sagemaker_studio.sagemaker_studio_api import SageMakerStudioAPI
from sagemaker_studio import ClientConfig

config = ClientConfig(region="us-west-2")
sagemaker_studio_api = SageMakerStudioAPI(config)

stop_response = sagemaker_studio_api.execution_client.stop_execution(execution_id="asdf-3b998be2-02dd-42af-8802-593d48d04daa")
print(stop_response)
```

# Using the JupyterLab IDE in Amazon SageMaker Unified Studio
<a name="jupyterlab"></a>

The JupyterLab page of Amazon SageMaker Unified Studio provides a JupyterLab interactive development environment (IDE) for you to use as you perform data integration, analytics, or machine learning in your projects. Amazon SageMaker Unified Studio notebooks are powered by JupyterLab spaces.

By default, the JupyterLab application comes with the Amazon SageMaker Distribution image. The distribution image includes popular packages such as the following:
+ PyTorch
+ TensorFlow
+ Keras
+ NumPy
+ Pandas
+ Scikit-learn

Amazon SageMaker Unified Studio includes a sample notebook that you can use to get started. You can also choose to create new notebooks for your business use cases.

Amazon SageMaker Unified Studio notebooks include the following key features:
+ Manage configurations to scale the instance vertically if the job being submitted demands it.
+ Access metadata to find out information such as the path to the Amazon S3 bucket where data is being stored.
+ Perform Git operations for version control.
+ Use Amazon Q chat functionality to ask questions and generate code using prompts.
+ Perform code completion using Amazon Q Developer.

**Note**  
The JupyterLab IDE has an idle shutdown feature that stops the IDE after 60 minutes of inactivity; if both the IDE kernel and terminal are unused for an hour, the IDE stops running. To use the IDE again after idle shutdown, navigate to the JupyterLab page and choose **Start** to restart the kernel in the JupyterLab IDE.

# Managing configurations
<a name="managing-configurations"></a>

You can edit your JupyterLab configurations on the JupyterLab page by choosing Configure in the top right corner. A popup appears where you can change the instance type. You can also increase the EBS volume up to 100 GB if allowed by your admin.

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Expand the **Build** menu in the top navigation, then choose **JupyterLab**.

1. Choose the Configure button in the top right corner of the page. A popup appears where you can change the instance type and increase the EBS volume.

1. Specify the instance type and EBS volume that you want for testing.

   **Note**  
   After you increase the EBS volume, you cannot decrease it.

# Configuring Spark compute
<a name="configurations-spark-compute"></a>

Amazon SageMaker Unified Studio provides a set of Jupyter magic commands. Magic commands, or magics, enhance the functionality of the IPython environment. For more information about the magics that Amazon SageMaker Unified Studio provides, run `%help` in a notebook.

Compute-specific configurations can be set by using the `%%configure` Jupyter magic. The `%%configure` magic takes a JSON-formatted dictionary. To use %%configure magic, specify the compute name in the argument `-n`. Including `-f` will restart the session to forcefully apply the new configuration. Otherwise, this configuration will apply when the next session starts. 

For example: `%%configure -n compute_name -f`.

# Library management
<a name="jupyterlab-library-management"></a>

You can use the library management widget in JupyterLab to manage the library installations and configurations in your notebook.

To navigate to the library management of a notebook in Amazon SageMaker Unified Studio, complete the following steps:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to a project. You can do this by choosing **Browse all projects** from the center menu and then selecting a project, or by creating a new project.

1. From the **Build** menu, choose **JupyterLab**.

1. Navigate to a notebook or create a new one by selecting **File** > **New** > **Notebook**.

1. Choose the library management icon from the notebook navigation bar.  
![\[The Amazon SageMaker Unified Studio JupyterLab library icon.\]](http://docs.aws.amazon.com/sagemaker-unified-studio/latest/userguide/images/library-icon.png)

The following library configurations are available:

## Jar
<a name="jupyterlab-library-jar"></a>
+ **Maven artifacts**
+ **S3 paths**
+ **Disk location paths**
+ **Other paths**

## Python
<a name="jupyterlab-library-python"></a>
+ **Conda packages**
+ **PyPI packages**
+ **S3 paths**
+ **Disk location paths**
+ **Other paths**

## Adding JupyterLab library configurations
<a name="jupyterlab-library-add"></a>



1. Navigate to the JupyterLab library management page.

1. Select the configuration method you would like to add from the left navigation of the library management page.

1. Choose **Add**.

1. Input the URL, package name, coordinates, or other information as the fields indicate.

1. In the left navigation of the library management page, check the box **Apply the change to JupyterLab**.

1. Choose **Save all changes**.

# Compute-specific configuration
<a name="jupyterlab-compute-configure"></a>

Amazon SageMaker Unified Studio provides a set of Jupyter magic commands. Magic commands, or magics, enhance the functionality of the IPython environment. For more information about the magics that Amazon SageMaker Unified Studio provides, run `%help` in a notebook.

Compute-specific configurations can be set by using the `%%configure` Jupyter magic. The `%%configure` magic takes a JSON-formatted dictionary. To use the `%%configure` magic, specify the compute name in the argument `-n`. Including `-f` restarts the session to forcefully apply the new configuration; otherwise, the configuration applies when the next session starts.

## Configure an EMR Spark session
<a name="jupyterlab-configure-emr-session"></a>

When working with EMR on EC2 or EMR Serverless, you can use the `%%configure` command to configure the Spark session creation parameters. Using `conf` settings, you can configure any Spark configuration that's mentioned in the configuration documentation for Apache Spark.

```
%%configure -n compute_name -f 
{ 
    "conf": { 
        "spark.sql.shuffle.partitions": "36"
     }
}
```

## Configure a Glue interactive session
<a name="jupyterlab-configure-glue-session"></a>

Use the `--` prefix for run arguments specified for Glue. 

```
%%configure -n project.spark.compatibility -f
{
   "--enable-auto-scaling": "true",
   "--enable-glue-datacatalog": "false"
}
```

For more information on job parameters, see Job parameters.

When working with Glue, you can update the Spark configuration by passing `--conf` in the `%%configure` magic. You can configure any Spark configuration that's mentioned in the configuration documentation for Apache Spark.

```
%%configure -n project.spark.compatibility -f 
{ 
    "--conf": "spark.sql.shuffle.partitions=36" 
}
```

# Accessing metadata
<a name="accessing-metadata"></a>

You can view metadata for your project in the notebook terminal within Amazon SageMaker Unified Studio. This shows you information such as the `ProjectS3Path`, which is the Amazon S3 bucket where your project data is stored. The project metadata is written to a file named resource-metadata.json in the folder /opt/ml/metadata/. You can get the metadata by opening a terminal from within the notebook.

1. Navigate to the Code page within the project you want to view metadata for.

1. Choose File > New > Terminal.

1. Enter the following command:

   ```
   cat /opt/ml/metadata/resource-metadata.json
   ```

   The metadata file information then appears in the terminal window.
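
You can also parse the metadata file from Python with the stdlib `json` module. The snippet below uses an inline sample string so it is self-contained; the key names in your actual resource-metadata.json may differ:

```python
import json

METADATA_PATH = "/opt/ml/metadata/resource-metadata.json"

# Inline sample standing in for the real file; the field name is illustrative.
# In a running space you would use: metadata = json.load(open(METADATA_PATH))
sample = '{"ProjectS3Path": "s3://my-bucket/domain/project/dev/"}'
metadata = json.loads(sample)
print(metadata["ProjectS3Path"])
```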

# Performing Git operations
<a name="performing-git-operations"></a>

The JupyterLab IDE in Amazon SageMaker Unified Studio is configured with Git and initialized with the project repository when a project is created.

To access Git operations in the Amazon SageMaker Unified Studio management console, navigate to the Code page of your project, then choose the Git button in the left panel of the JupyterLab IDE.

This opens a panel where you can view commit history and perform Git operations. You can use this Git extension to commit and push files back to the project repository, switch your working branch or create a new one, and manage tags.

To fetch notebooks committed by other users, do a pull from the project repository.

**Note**  
When you create and enable a connection for Git access and the user accesses this connection in the JupyterLab IDE in Amazon SageMaker Unified Studio, the repository is cloned. In other words, a local copy of the repository is created in the Amazon SageMaker Unified Studio project. If the administrator later disables or deletes this Git connection, the local repository remains in the user's IDE, but users can no longer push or pull files to or from it. For more information, see [Git connections in Amazon SageMaker Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/git-connections.html). 

# Using the Amazon Q data integration in AWS Glue
<a name="using-amazon-q-data-integration"></a>

Amazon SageMaker Unified Studio supports the [Amazon Q data integration](https://docs.aws.amazon.com/glue/latest/dg/q.html) in AWS Glue. It helps data engineers and ETL developers create data integration jobs using natural language, letting you automate aspects of code authoring.

When using the Amazon Q data integration in AWS Glue in the JupyterLab IDE, you enter comments using natural language instructions, and the PySpark kernel generates the code on your behalf. You can customize the generated code to meet your own needs.

1. Open a Python notebook, and ensure the kernel is configured to use a PySpark connection.

1. You can request a response by entering your instruction as a comment, which starts Amazon Q processing.

1. If the prompt is AWS Glue related, the data integration generates an AWS Glue job script using PySpark.

1. Alternatively, you can continue to use your default auto-completions from Amazon Q Developer. If a prompt isn't Glue related, Amazon Q Developer will use autocomplete instead.

# Running SQL and Spark code
<a name="jupyterlab-sql-spark"></a>

You can run code against multiple compute environments in one Jupyter notebook, using different programming languages, through the Jupyter cell magics `%%pyspark`, `%%sql`, and `%%scalaspark`.

For example, to run PySpark code on Spark compute, you can run the following code:

```
%%pyspark compute_name
spark.createDataFrame([('Alice', 1)])
```

The following table shows the supported compute types for each magic:


| magic | supported compute types | 
| --- | --- | 
| %%sql | Redshift, Athena, EMR on EC2, EMR Serverless, Glue Interactive Session | 
| %%pyspark | EMR on EC2, EMR Serverless, Glue Interactive Session | 
| %%scalaspark | EMR on EC2, EMR Serverless, Glue Interactive Session | 
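
For example, to run a SQL query with the `%%sql` magic against a named compute (the compute name and table below are placeholders):

```
%%sql compute_name
SELECT * FROM my_table LIMIT 10
```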

The dropdown at the top of active cells lets you select the Connection and Compute type. If no selection is made, the code in the cell runs against the Compute hosting JupyterLab ("Local Python" / "project.python"). The Connection type selected dictates the Compute available. These selections determine the magics code generated in the cell and where your code runs.

When a new cell is created, it automatically selects the same connection and compute type as the previous cell. To configure the dropdown, go to **Settings** > **Settings editor** > **Connection magics settings**.

# Visualizing results
<a name="jupyterlab-visualizing"></a>

`%display` is a magic that you can apply to any DataFrame to invoke a visualization for tabular data. Use the visualization to scroll through a DataFrame or the results of a Redshift or Athena query.

There are four different views:
+ **Table**. You can change the sampling method, sample size, and rows per page that are displayed.
+ **Summary**. Each column in the summary tab has a button labeled with the column's name. Choosing one of these buttons opens a sub-tab in the **Column** view for that column.
+ **Column**. For each column selected in the column selector above, a sub-tab appears with more details about the contents of the column.
+ **Plotting**. In the default plotting view, you can change the graph type, axes, value types, and aggregation functions for plotting. By installing an optional supported third-party plotting library in the JupyterLab space (pygwalker, ydata-profiling, or dataprep) and running the display magic, you can visualize your data using the installed library.

## Shared Project Storage
<a name="shared-project-storage"></a>

 The JupyterLab visualization widget offers an option to store visualization data in a shared Amazon S3 location within your project bucket. The data is stored using the following structure: 

```
s3://bucket/domain/project/dev/user/{sts_user_identity}/query-result/{data_uuid}/
  
├── dataframe/           # Contains DataFrame in parquet format
├── head/100/            # Sample data (100 rows)
│   ├── metadata.json
│   ├── summary_schema.json
│   └── column_schema/
└── tail/                # Additional sample data
```

## Storage Options
<a name="storage-options"></a>

 The visualization widget supports two storage modes controlled by the `--query-storage` parameter: 
+  **Cell storage** (`--query-storage cell`): Data stored locally in notebook output (current default behavior) 
+  **S3 storage** (`--query-storage s3`): Data stored in the project's shared S3 bucket for persistence and sharing 
  + Choose **Store query result in S3** to store the data in the project's shared S3 bucket.

## Data Access and Security
<a name="data-access-security"></a>

 When using Amazon S3 storage, the visualization data is accessible to all project members. Data persists beyond individual JupyterLab sessions. No individual user permissions can be set on stored visualizations. You should consider data classification before storing sensitive information. The storage uses the project's default runtime role for access control. 

**Note**  
 The Amazon S3 storage location is shared across the entire project. All project members can access visualization data stored by any team member. 

# Data Sharing Across Compute Environments
<a name="jupyterlab-data-sharing-across-compute"></a>

Amazon SageMaker Unified Studio provides magic commands to facilitate data sharing across different compute environments. This section outlines three key commands: `%push`, `%pop`, and `%send_to_remote`.

## %push
<a name="jupyterlab-data-sharing-across-compute-push"></a>

The `%push` command allows you to upload specified variables to your project's shared S3 storage within Amazon SageMaker Unified Studio.

```
%push <var_name>
%push <var_name1>,<var_name2>
%push -v <var_name>
%push -v <var_name> --namespace <namespace_name>
```

**Key Features:**
+ Supports multiple variable uploads when comma-separated
+ -v specifies the variable name (alternative syntax)
+ Optional --namespace argument (defaults to kernel ID)
+ Uploaded variables are accessible to all project members

**Supported Connections:**
+ Local Python connections
+ AWS Glue connections
+ AWS EMR connections

**Supported Language:** Python

## %pop
<a name="jupyterlab-data-sharing-across-compute-pop"></a>

The `%pop` command enables you to download specified variables from the shared project Amazon S3 storage to your current compute environment.

```
%pop <var_name>
%pop <var_name1>,<var_name2>
%pop -v <var_name>
%pop -v <var_name> --namespace <namespace_name>
```

**Key Features:**
+ Supports multiple variable downloads when comma-separated
+ -v specifies the variable name (alternative syntax)
+ Optional --namespace argument (defaults to kernel ID)

**Supported Connections:**
+ Local Python connections
+ AWS Glue connections
+ AWS EMR connections

**Supported Language:** Python

## %send\_to\_remote
<a name="jupyterlab-data-sharing-across-compute-send_to_remote"></a>

The `%send_to_remote` command allows you to send a variable from the local kernel to a remote compute environment.

```
%send_to_remote --name <connection_name> --language <language> --local <local_variable_name> --remote <remote_variable_name>
```

**Key Features:**
+ Supports both Python and Scala in remote environments
+ Python remote supports dict, df, and str data types
+ Scala remote supports df and str data types

**Arguments:**
+ -l or --language: Specifies the connection language
+ -n or --name: Specifies the connection to be used
+ --local: Defines the local variable name
+ -r or --remote: Defines the remote variable name

**Supported Connections:** Local Python connections

**Supported Language:**
+ Python
+ Scala
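
For example, the following cell copies a local DataFrame to a remote Python environment over a named connection (the connection and variable names are illustrative):

```
%send_to_remote --name my_remote_connection --language python --local df_local --remote df_remote
```

After the command completes, the variable `df_remote` is available in the remote environment.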

## Security considerations
<a name="jupyterlab-data-sharing-across-compute-security"></a>

Variables uploaded using `%push` are accessible to all members of your Amazon SageMaker Unified Studio project. Ensure that sensitive data is handled appropriately and in compliance with your organization's data governance policies.

# Using the Code Editor IDE in Amazon SageMaker Unified Studio
<a name="code-editor"></a>

Code Editor, based on Code-OSS (Visual Studio Code Open Source), helps you write, test, debug, and run your analytics and machine learning code. It also supports integrated development environment (IDE) extensions available in the Open VSX Registry.

After creating a Code Editor space in Amazon SageMaker Unified Studio, you can access your Code Editor session directly through the browser. Within your Code Editor environment, you can do the following:
+ Access artifacts from Amazon SageMaker Unified Studio
+ Clone your GitHub repositories and commit changes
+ Access the SageMaker Python SDK

You can return to Amazon SageMaker Unified Studio to review any assets created in your Code Editor environment such as experiments, pipelines, or training jobs.

# Performing Git operations in Code Editor
<a name="code-editor-git"></a>

The Code Editor IDE in Amazon SageMaker Unified Studio is configured with Git and initialized with the project repository when a Code Editor space is created.

You can use Git operations after launching your Code Editor space. To launch your project, choose **Open** next to the space you want to open in the **Spaces** tab of your project.

When Code Editor is open, choose the **Source Control** icon in the left navigation. You can use this window to perform Git operations such as committing and pushing files back to the project repository, switching your working branch or creating a new one, and managing tags.

To fetch notebooks committed by other users, do a pull from the project repository.

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to your project.

1. In the **Build** menu, choose **Spaces**.

1. Choose **Open** next to the Code Editor space you want to open.

1. In the Source Control window of your Code Editor space, choose the three-dot menu. Then choose **Pull**.
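
Under the hood, **Pull** runs a standard `git pull` against the project repository. The following self-contained sketch reproduces that flow with a local stand-in repository (all paths, file names, and commit messages are illustrative):

```
set -e
tmp=$(mktemp -d)

# Stand-in for the project repository
git -c init.defaultBranch=main init -q "$tmp/origin"
cd "$tmp/origin"
git config user.email "you@example.com"
git config user.name "You"
echo "cell 1" > analysis.ipynb
git add analysis.ipynb && git commit -qm "initial commit"

# The clone plays the role of the repository in your Code Editor space
git clone -q "$tmp/origin" "$tmp/workspace"
git -C "$tmp/workspace" config pull.ff only

# A teammate commits a change to the project repository...
echo "cell 2" >> analysis.ipynb
git commit -qam "teammate update"

# ...and choosing Pull in the Source Control window is equivalent to:
git -C "$tmp/workspace" pull -q
```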

**Note**  
When you create and enable a connection for Git access and the user accesses this connection in the Code Editor IDE in Amazon SageMaker Unified Studio, the repository is cloned. In other words, a local copy of the repository is created in the Amazon SageMaker Unified Studio project. If the administrator later disables or deletes this Git connection, the local repository remains in the user's IDE, but users can no longer push or pull files to or from it. For more information, see [Git connections in Amazon SageMaker Unified Studio](https://docs.aws.amazon.com/sagemaker-unified-studio/latest/adminguide/git-connections.html). 

# Checking the version of Code Editor in Amazon SageMaker Unified Studio
<a name="code-editor-version"></a>

The following steps show how to check the version of your Code Editor application.

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to your project.

1. In the **Build** menu, choose **Spaces**.

1. Choose **Open** next to the Code Editor space you want to open.

1. In the upper-left corner of the Code Editor UI, choose the three-bar menu button.

1. Choose **Help**. Then choose **About**.

# Connections and extensions in Code Editor
<a name="code-editor-connections-extensions"></a>

Code Editor supports IDE connections to AWS services as well as extensions available in the [Open VSX Registry](https://open-vsx.org/).

 Code Editor environments are integrated with the AWS Toolkit for VS Code to add connections to AWS services. Within your Code Editor environment, you can add connections to [AWS Explorer](https://docs.aws.amazon.com/toolkit-for-vscode/latest/userguide/aws-explorer.html) to view, modify, and deploy AWS resources in Amazon S3, CloudWatch, and more. 

Code Editor supports IDE extensions available in the [Open VSX Registry](https://open-vsx.org/). To get started with extensions in your Code Editor environment, choose the Extensions icon in the left navigation pane. Here, you can configure connections to AWS by installing the AWS Toolkit. For more information, see [Installing the AWS Toolkit for Visual Studio Code](https://docs.aws.amazon.com/toolkit-for-vscode/latest/userguide/setup-toolkit.html).

In the search bar, you can search directly for additional extensions through the [Open VSX Registry](https://open-vsx.org/), such as the AWS Toolkit, Jupyter, Python, and more.

# Using Strands Agents in Amazon SageMaker Unified Studio
<a name="strands-agents"></a>

Amazon SageMaker Unified Studio now supports Strands Agents integration, streamlining AI agent development for users. This integration enables you to immediately begin building AI agents using Strands SDK's powerful capabilities without the complexity of dependency management or environment configuration.

By making Strands SDK a native component of the Amazon SageMaker ecosystem, developers can focus on creating innovative AI solutions while leveraging Amazon SageMaker's enterprise-grade infrastructure and scaling capabilities.

The `BedrockModel` provider is used by default when creating a basic Agent with the provided Amazon Bedrock model. You can specify which Amazon Bedrock model to use by passing the `model_id` string directly to the Agent constructor. You must use the application inference profile ARN as the `model_id`, as in the following example:

```
from strands import Agent
from strands.models import BedrockModel
from strands_tools import calculator, current_time

# Create a Bedrock model, using the application inference profile ARN as the model ID
bedrock_model = BedrockModel(
    model_id="arn:aws:bedrock:us-west-2:123456789321:application-inference-profile/ab788q84ey7z"
)

# Create an agent with the model and tools
agent = Agent(
    model=bedrock_model,
    tools=[calculator, current_time]
)

# First request will cache the tools
response1 = agent("What time is it?")

# Second request will reuse the cached tools
response2 = agent("What is the square root of 1764?")
```

You can obtain the application inference profile ARN by performing the following procedure:

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to your project.

1. Navigate to your chat agents under the **Build** menu.

1. In the left navigation pane, choose **Models** to see all of the enabled Amazon Bedrock models for the project.

1. Choose a model. Then, on the model details page, locate the application inference profile ARN.
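
An application inference profile ARN follows a predictable shape. As a quick sanity check before passing the copied value to the model constructor, you can validate it locally (the ARN below is a placeholder, and the regular expression is an illustrative approximation rather than an official format specification):

```
import re

# Placeholder ARN of the kind copied from the model details page
arn = "arn:aws:bedrock:us-west-2:123456789321:application-inference-profile/ab788q84ey7z"

# Approximate shape: arn:aws:bedrock:<region>:<account>:application-inference-profile/<id>
pattern = r"^arn:aws:bedrock:[a-z0-9-]+:\d{12}:application-inference-profile/[A-Za-z0-9]+$"
assert re.match(pattern, arn), "not an application inference profile ARN"
print(arn.split(":")[3])  # us-west-2
```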