

# Working with Catalog in IAM-based domains


Amazon SageMaker Unified Studio provides integrated data catalog functionality that allows you to discover, organize, and manage your data assets. The catalog integrates with AWS Glue Data Catalog to provide a unified view of your databases, tables, and Amazon S3 buckets within your project.

You can use the catalog to browse existing AWS Glue Data Catalog assets and Amazon S3 buckets that your project's execution role has access to.

## Key capabilities

+ Browse accessible data assets - Navigate through AWS Glue catalogs and Amazon S3 buckets you have access to using hierarchical browsing
+ Search across all data - Find tables, models, and other assets across the environment using global search, including data you may not have access to
+ Create tables from uploaded data - Upload CSV, JSON, Parquet, or delimited files to create new catalog tables
+ Manage Amazon S3 bucket contents - Browse Amazon S3 bucket hierarchies, create folders, upload files, and manage Amazon S3 objects

# Browse data you have access to


Use browsing when you know where your data is located. You can also use it to explore your existing data assets. Expand catalog hierarchies to view databases and tables. Navigate through Amazon S3 bucket folders to find specific files and objects.

**To browse data assets you have access to**

1. In Amazon SageMaker Unified Studio, choose **Data** in the left navigation panel.

1. To browse catalog assets:

   1. Choose the **Catalogs** tab to view available data catalogs.

   1. Expand AwsDataCatalog to see your databases.

   1. Expand any database to view its tables.

   1. Choose a table to view its schema, columns, and metadata in the right panel.

   1. For the selected table, use the **Overview**, **Preview data**, and **Details** tabs to explore additional metadata and sample data.

1. To browse Amazon S3 bucket contents:

   1. Choose the **Amazon S3 buckets** tab to view accessible Amazon S3 buckets.

   1. Expand any Amazon S3 bucket to navigate through its folder structure.

   1. Choose folders to view their contents.

   1. Select files to view their properties and metadata.
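
The catalog hierarchy described above (catalog, then databases, then tables) can also be walked programmatically. The following is a minimal sketch, assuming a boto3 AWS Glue client is passed in and that its credentials resolve to the project's execution role. The function names `iter_pages` and `list_catalog_tree` are hypothetical helpers, not part of Amazon SageMaker Unified Studio.

```python
from typing import Any, Dict, Iterator, List


def iter_pages(client: Any, operation: str, **kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every response page of a paginated boto3 operation."""
    paginator = client.get_paginator(operation)
    yield from paginator.paginate(**kwargs)


def list_catalog_tree(glue_client: Any) -> Dict[str, List[str]]:
    """Map each database in the default Glue catalog to its table names."""
    tree: Dict[str, List[str]] = {}
    # First level: databases visible to the caller.
    for page in iter_pages(glue_client, "get_databases"):
        for database in page.get("DatabaseList", []):
            tree[database["Name"]] = []
    # Second level: tables inside each database.
    for name in tree:
        for page in iter_pages(glue_client, "get_tables", DatabaseName=name):
            tree[name].extend(table["Name"] for table in page.get("TableList", []))
    return tree
```

With boto3 installed, `list_catalog_tree(boto3.client("glue"))` would return a dictionary keyed by database name. Only databases and tables that the calling role can access are returned.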

# Search and find data


**Note**  
For files in Amazon S3, the search is limited to locations that your IAM role can access.

**To search and find data**

1. Use the search bar at the top of the Amazon SageMaker Unified Studio interface.

1. Enter your search terms, such as:
   + Table names (for example, "churn")
   + Database names
   + Model names
   + Asset descriptions or metadata

1. Use the search suggestions and filters to refine your results:

   1. Select **Show all results for "[search term]"** to see all the results.

   1. Use the **Tables**, **Files** and **Models** filter tabs to focus on specific asset types.

1. Choose any search result to view available details and metadata.

1. Use the **Select**, **Open**, and **Open in new tab** options to work with assets.
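
For AWS Glue tables only, a comparable keyword lookup can be sketched with the AWS Glue `SearchTables` API. This is an illustration under assumptions: it is not confirmed to be what the Studio search bar uses, it does not cover files or models, and `find_tables` is a hypothetical helper that expects a boto3 Glue client.

```python
from typing import Any, Dict, List


def find_tables(glue_client: Any, text: str, limit: int = 50) -> List[str]:
    """Return up to `limit` matches as 'database.table' strings."""
    matches: List[str] = []
    token = None
    while len(matches) < limit:
        kwargs: Dict[str, Any] = {"SearchText": text}
        if token:
            kwargs["NextToken"] = token  # continue from the previous page
        response = glue_client.search_tables(**kwargs)
        for table in response.get("TableList", []):
            matches.append(f'{table["DatabaseName"]}.{table["Name"]}')
        token = response.get("NextToken")
        if not token:
            break  # no more pages
    return matches[:limit]
```

For example, `find_tables(boto3.client("glue"), "churn")` would list tables whose metadata matches "churn", assuming boto3 is installed and the caller is allowed to call `glue:SearchTables`.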

# Create a new catalog table


**To add data to a catalog**

1. In the data explorer, choose the **Add** button.

1. In the Add data dialog, choose one of the following options:
   + Create table: Upload data files to create a new table
   + Create Amazon S3 Tables catalog: Create a new catalog for your Amazon S3 Tables
   + Add connection: Connect to first-party and third-party data sources
   + Add Amazon S3 location: Add an existing Amazon S3 location

1. For creating a table from uploaded data:

   1. Choose the **Create table** option and select **Next**.

   1. In the file upload area, either drag and drop your file or select **Choose file** to browse for your data file.

   1. Configure the table properties:
      + Table type: Select Amazon S3/external table
      + Catalog name: Choose the target catalog
      + Database: Select the database
      + Table name: Enter a name for your table

   1. Set the data format options:
      + Data format: Choose CSV, JSON, or Parquet
      + Compression type: Select compression if applicable
      + Delimiter: Specify the field separator for CSV files
      + Ignore first row: Check if the first row contains column headers

   1. Choose **Next** to proceed to schema configuration.

   1. In the Schema section:
      + Review the automatically detected column names and data types.
      + Modify column names by editing the text fields.
      + Change the data types of columns as needed.

   1. Choose **Create table** to complete the process.

1. The new table will appear in your catalog with the specified schema and uploaded data.
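
To make the schema-detection and table-property steps above concrete, here is a rough sketch of how column types could be inferred from a CSV sample and turned into an AWS Glue `TableInput` structure. The inference rules (`bigint`, then `double`, then `string`) are an assumption for illustration and are not the Studio's actual detection logic. `infer_type`, `infer_columns`, and `build_table_input` are hypothetical names.

```python
import csv
import io
from typing import Any, Dict, List


def infer_type(values: List[str]) -> str:
    """Pick the narrowest of bigint, double, string that fits all non-empty values."""
    non_empty = [v for v in values if v != ""]
    for type_name, parser in (("bigint", int), ("double", float)):
        try:
            for value in non_empty:
                parser(value)
        except ValueError:
            continue  # at least one value does not fit this type
        if non_empty:
            return type_name
    return "string"


def infer_columns(csv_text: str, delimiter: str = ",") -> List[Dict[str, str]]:
    """Treat the first row as headers and infer one type per column."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    return [
        {"Name": name.strip().lower(),
         "Type": infer_type([row[i] for row in data if i < len(row)])}
        for i, name in enumerate(header)
    ]


def build_table_input(name: str, location: str, columns: List[Dict[str, str]],
                      delimiter: str = ",") -> Dict[str, Any]:
    """Build a Glue TableInput for an external CSV table whose first row is a header."""
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "csv", "skip.header.line.count": "1"},
        "StorageDescriptor": {
            "Columns": columns,
            "Location": location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delim": delimiter},
            },
        },
    }
```

The resulting dictionary could be passed to a boto3 Glue client as `create_table(DatabaseName=..., TableInput=...)`. The `skip.header.line.count` parameter plays the role of the **Ignore first row** option, and `field.delim` corresponds to the **Delimiter** setting.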

# Work with Amazon S3 buckets


**To manage an Amazon S3 bucket**

1. In the data explorer, choose the **Amazon S3 buckets** tab.

1. Expand the Amazon S3 bucket nodes to view available buckets in your account.

1. Select a bucket to view its contents.

1. To create a new folder:

   1. Choose the **Actions** menu in the bucket view.

   1. Select **Create folder**.

   1. Enter a folder name and choose **Create**.

1. To upload files:

   1. Choose **Upload files** from the **Actions** menu.

   1. Select files from your local system.

   1. Monitor the upload progress.

1. To create a table from existing Amazon S3 data:

   1. Choose **Create table from contents** from the **Actions** menu.

   1. Follow the table creation workflow to define schema and properties.
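
The folder browsing and folder creation steps above map onto plain Amazon S3 operations, because S3 has no real directories: a "folder" is a key prefix, conventionally marked by a zero-byte object whose key ends in `/`. The sketch below assumes a boto3 S3 client is passed in and that the Studio follows the same convention as the Amazon S3 console. `list_folder` and `create_folder` are hypothetical helper names.

```python
from typing import Any, List, Tuple


def list_folder(s3_client: Any, bucket: str, prefix: str = "") -> Tuple[List[str], List[str]]:
    """Return (subfolders, files) directly under `prefix`, one tree level only."""
    folders: List[str] = []
    files: List[str] = []
    paginator = s3_client.get_paginator("list_objects_v2")
    # Delimiter="/" groups deeper keys into CommonPrefixes instead of listing them.
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        folders.extend(p["Prefix"] for p in page.get("CommonPrefixes", []))
        # Skip the folder marker object itself, if one exists.
        files.extend(o["Key"] for o in page.get("Contents", []) if o["Key"] != prefix)
    return folders, files


def create_folder(s3_client: Any, bucket: str, folder_path: str) -> str:
    """Create a zero-byte marker object so the folder shows up when browsing."""
    key = folder_path.strip("/") + "/"
    s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
    return key
```

Uploading a file into such a folder is then a matter of using a key that starts with the folder prefix, for example with the boto3 S3 client method `upload_file(Filename, Bucket, Key)`.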

# Working with an existing AWS Glue Data Catalog IRC


This document outlines the procedure for onboarding existing AWS Glue IRC federated catalogs managed by AWS Lake Formation into an Amazon SageMaker Unified Studio domain. Successful onboarding requires the Data Lake Administrator to grant the appropriate permissions to the Studio role in AWS Lake Formation.

## IAM-based domain


### Prerequisites

+ Amazon SageMaker Unified Studio deployed with IAM-based domain mode
+ Existing AWS Glue federated catalog managed by AWS Lake Formation
+ Data Lake Administrator credentials

### Step by Step Instructions


**Step 1: Retrieve the Project Execution Role**

1. Access your Amazon SageMaker Unified Studio project

1. Locate and copy the project execution role ARN
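
If you are working inside a notebook rather than the project details page, the role ARN can often be derived from the caller identity. The sketch below is an assumption-laden alternative to copying the ARN from the UI: it presumes the notebook session runs as the project execution role, and it takes the assumed-role ARN that the AWS STS `GetCallerIdentity` call returns in its `Arn` field. `role_arn_from_assumed_role` is a hypothetical helper.

```python
import re

# Matches ARNs such as arn:aws:sts::123456789012:assumed-role/RoleName/session-name
_ASSUMED_ROLE = re.compile(
    r"^arn:(?P<partition>[^:]+):sts::(?P<account>\d{12}):"
    r"assumed-role/(?P<role>[^/]+)/[^/]+$"
)


def role_arn_from_assumed_role(arn: str) -> str:
    """Convert an STS assumed-role ARN into the corresponding IAM role ARN.

    Caveat: assumed-role ARNs omit the IAM role path, so a role created
    under a path such as /service-role/ will not be reconstructed correctly.
    """
    match = _ASSUMED_ROLE.match(arn)
    if match is None:
        raise ValueError(f"not an assumed-role ARN: {arn}")
    return "arn:{partition}:iam::{account}:role/{role}".format(**match.groupdict())
```

Because of the role-path caveat, treat the result as a starting point and confirm it against the role ARN shown for the project before granting permissions.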

**Step 2: Configure Lake Formation Permissions**

1. Sign in to the AWS Management Console using Data Lake Administrator credentials

1. Navigate to AWS Lake Formation and grant permissions (select one option):  
Option 1: Full Catalog Access (Recommended for Admin project)  
Grant the execution role super\_user permissions on the federated catalog. The execution role receives complete access to discover and query all databases and tables within the federated catalog.  
Option 2: Granular Access Control (Recommended for non-Admin project)  
Apply least-privilege permissions by granting specific access levels:  

   1. Catalog Level: Grant DESCRIBE permission on the catalog to the execution role

   1. Database Level: Grant DESCRIBE permission on the target database(s) to the execution role

   1. Table Level: Grant SELECT and DESCRIBE permissions on the target table(s) to the execution role
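
The granular option above amounts to three kinds of Lake Formation grants. The following sketch builds the request payloads for the Lake Formation `GrantPermissions` API without sending them, so they can be reviewed first. Several details are assumptions to verify against the current boto3 documentation: the `Id` key on the catalog resource, and the use of `<account_id>:<federated_catalog_name>` as the catalog ID. `granular_grants` is a hypothetical helper.

```python
from typing import Any, Dict, List


def granular_grants(role_arn: str, catalog_id: str, database: str,
                    tables: List[str]) -> List[Dict[str, Any]]:
    """Build least-privilege grant requests for one database and its tables."""
    principal = {"DataLakePrincipalIdentifier": role_arn}
    requests: List[Dict[str, Any]] = [
        # Catalog level: DESCRIBE only (catalog resource shape is an assumption).
        {"Principal": principal,
         "Resource": {"Catalog": {"Id": catalog_id}},
         "Permissions": ["DESCRIBE"]},
        # Database level: DESCRIBE only.
        {"Principal": principal,
         "Resource": {"Database": {"CatalogId": catalog_id, "Name": database}},
         "Permissions": ["DESCRIBE"]},
    ]
    # Table level: SELECT and DESCRIBE on each target table.
    for table in tables:
        requests.append({
            "Principal": principal,
            "Resource": {"Table": {"CatalogId": catalog_id,
                                   "DatabaseName": database,
                                   "Name": table}},
            "Permissions": ["SELECT", "DESCRIBE"],
        })
    return requests
```

A Data Lake Administrator could then apply each payload with a boto3 Lake Formation client, for example `for request in requests: client.grant_permissions(**request)`. Keeping payload construction separate from the API call makes the grants easy to audit before they take effect.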

**Step 3: Query federated resources from Unified Studio**

1. Use Query Editor:

   1. The federated catalogs are now discoverable in Query Editor, where you can also query them.

1. Use a Data notebook:

   1. Navigate to the notebooks tab in the left navigation panel.

   1. Create a notebook. You can now run SELECT statements against the federated catalog.

   1. For Athena (SQL), run a query such as the following:

      ```
      SELECT * FROM "smus_lfuc_poc"."lfuc"."customer" LIMIT 100
      ```

   1. For Athena (Spark), add the following configuration to enable querying the federated catalog:

      ```
      SET `spark.sql.catalog.<catalog_name>`=`org.apache.iceberg.spark.SparkCatalog`;
      SET `spark.sql.catalog.<catalog_name>.catalog-impl`=`org.apache.iceberg.aws.glue.GlueCatalog`;
      SET `spark.sql.catalog.<catalog_name>.glue.id`=`<account_id>:<federated_catalog_name>`;
      SET `spark.sql.catalog.<catalog_name>.glue.lakeformation-enabled` = `true`;
      SET `spark.sql.catalog.<catalog_name>.glue.account-id` = `<account_id>`;
      SET `spark.sql.catalog.<catalog_name>.client.region` = `<region>`;
      ```

   1. Query the catalog by running the following:

      ```
      select * from <federated_catalog_name>.<database_name>.<table_name>
      ```
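
The Spark settings in the steps above all derive from four values: the Spark catalog name, the account ID, the federated catalog name, and the Region. As a sketch, they can be generated from those values so the placeholders are filled in consistently. `iceberg_catalog_conf` and `as_set_statements` are hypothetical helpers that only mirror the keys shown above and do not add any settings of their own.

```python
from typing import Dict, List


def iceberg_catalog_conf(catalog_name: str, account_id: str,
                         federated_catalog_name: str, region: str) -> Dict[str, str]:
    """Return the Spark properties that register a federated Glue catalog."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        f"{prefix}.glue.id": f"{account_id}:{federated_catalog_name}",
        f"{prefix}.glue.lakeformation-enabled": "true",
        f"{prefix}.glue.account-id": account_id,
        f"{prefix}.client.region": region,
    }


def as_set_statements(conf: Dict[str, str]) -> List[str]:
    """Render the properties as SET statements in the style used above."""
    return [f"SET `{key}`=`{value}`;" for key, value in conf.items()]
```

The dictionary form is convenient when the properties are supplied as session configuration, while `as_set_statements` produces text that can be pasted into a SQL cell.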

## Identity Center-based domain


### Prerequisites

+ Amazon SageMaker Unified Studio deployed with IDC-based domain mode
+ Existing AWS Glue federated catalog managed by AWS Lake Formation
+ Data Lake Administrator credentials

### Step by Step Instructions


**Step 1: Retrieve the Project IAM Role**

1. Access your Amazon SageMaker Unified Studio project

1. Locate and copy the project IAM role ARN

**Step 2: Configure Lake Formation Permissions**

1. Sign in to the AWS Management Console using Data Lake Administrator credentials

1. Navigate to AWS Lake Formation and grant permissions (select one option):  
Option 1: Full Catalog Access  
Grant the project role super\_user permissions on the federated catalog. The project role receives complete access to discover and query all databases and tables within the federated catalog.  
Option 2: Granular Access Control  
Apply least-privilege permissions by granting specific access levels:  

   1. Catalog Level: Grant DESCRIBE permission on the catalog to the project role

   1. Database Level: Grant DESCRIBE permission on the target database(s) to the project role

   1. Table Level: Grant SELECT and DESCRIBE permissions on the target table(s) to the project role

**Step 3: Query federated resources from Unified Studio**
+ Log in to Amazon SageMaker Unified Studio as an IAM Identity Center user and access the federated resource from the data explorer. You can select a resource and query it with Athena and Amazon Redshift.