

# Amazon S3 data in Amazon SageMaker Unified Studio

You can bring in Amazon S3 data to your project and access it on the **Data** page of your project in Amazon SageMaker Unified Studio.

To add S3 tables to your lakehouse in Amazon SageMaker Unified Studio, see [Amazon S3 tables integration](https://docs.aws.amazon.com/sagemaker-lakehouse-architecture/latest/userguide/lakehouse-s3-tables-integration.html).

To add S3 data as assets in your Amazon SageMaker Unified Studio project catalog, see [Adding Amazon S3 data](adding-existing-s3-data.md). In Amazon SageMaker Unified Studio, assets represent specific types of data resources such as database tables, dashboards, S3 buckets or prefixes, or machine learning models. 

For S3 data in projects, SageMaker Catalog supports the creation of an asset type of **S3 Object Collection** for an Amazon S3 bucket or S3 prefix in the project. The S3 Object Collection asset type can be curated with business context metadata by adding business names, descriptions, a README, glossary terms, and metadata forms, including mandatory metadata forms. Assets in Amazon SageMaker Unified Studio are versioned as changes are made to their metadata.

# Adding Amazon S3 data


To bring in Amazon S3 data to your project, you must first gain access to the data and then add the data to your project. You can gain access to the data by using the project role or an access role.

**Note**  
 If you are using a bucket in a different account than the account that contains the project tooling environment, you must use an access role to gain access to the data.

## Prerequisite option 1 (recommended): Gain access using an access role


Work with your admin to complete the following steps:

1. Retrieve the project role ARN and the project ID and send them to your admin.

   1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

   1. Navigate to the project that you want to add Amazon S3 data to. You can do this by choosing **Browse all projects** from the center menu, and then selecting the name of the project.

   1. On the **Project overview** page, copy the project role ARN and the project ID.

1. The admin then must go to the Amazon S3 console and add a CORS policy to the bucket that you want to access in your project.

   1. Navigate to the Amazon S3 console.

   1. Navigate to the bucket you want to grant access to.

   1. On the **Permissions** tab, under **Cross-origin resource sharing (CORS)**, choose **Edit**.

   1. Enter the new CORS policy, replacing `domainUrl` with your Amazon SageMaker Unified Studio domain URL (for example, `https://dzd_abcdefg1234567.sagemaker.us-east-1.on.aws`), then choose **Save changes**.

      ```
      [
          {
              "AllowedHeaders": [
                  "*"
              ],
              "AllowedMethods": [
                  "PUT",
                  "GET",
                  "POST",
                  "DELETE",
                  "HEAD"
              ],
              "AllowedOrigins": [
                  "domainUrl"
              ],
              "ExposeHeaders": [
                  "x-amz-version-id"
              ]
          }
      ]
      ```

   1. Choose the name of an object to view its details. On the **Properties** tab, note the Amazon Resource Name (ARN) and the S3 URI. You will need these later.

1. The admin then must go to the IAM console and create an access role.

   1. Navigate to the IAM console.

   1. On the **Roles** page, choose **Create role**.

   1. Under **Trusted entity type**, choose **Custom trust policy**.

   1. Edit the trust policy to include the project role ARN, the project ID, the domain ID, and the AWS account ID, replacing the placeholder values in the following example.

      ```
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "Principal": {
                      "Service": "access-grants.s3.amazonaws.com"
                  },
                  "Action": [
                      "sts:AssumeRole",
                      "sts:SetSourceIdentity"
                  ],
                  "Condition": {
                      "StringEquals": {
                          "aws:SourceAccount": "111122223333"
                      }
                  }
              },
              {
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": "project-role-arn"
                  },
                  "Action": "sts:AssumeRole",
                  "Condition": {
                      "StringEquals": {
                          "sts:ExternalId": "project-id"
                      }
                  }
              },
              {
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": "project-role-arn"
                  },
                  "Action": [
                      "sts:SetSourceIdentity"
                  ],
                  "Condition": {
                      "StringLike": {
                          "sts:SourceIdentity": "${aws:PrincipalTag/datazone:userId}"
                      }
                  }
              },
              {
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": "project-role-arn"
                  },
                  "Action": "sts:TagSession",
                  "Condition": {
                      "StringEquals": {
                          "aws:RequestTag/AmazonDataZoneProject": "project-id",
                          "aws:RequestTag/AmazonDataZoneDomain": "domain-id"
                      }
                  }
              }
          ]
      }
      ```


   1. Choose **Next** twice.

   1. Enter a name for the role, then choose **Create role**.

   1. Select the access role from the list on the **Roles** page.

   1. On the **Permissions** tab of the role, choose **Add permissions**, then **Create inline policy**.

   1. Use the JSON editor to create a policy that grants Amazon S3 access permissions.
**Note**  
Amazon SageMaker Unified Studio grants access to subscribed assets using S3 Access Grants. To enable granting access to data using S3 Access Grants, an S3 Access Grants instance is required. Amazon SageMaker Unified Studio will use an instance if one is already available or will create one. S3 Access Grants needs one instance per AWS Region in a single AWS account. For more information, see [Working with S3 Access Grants instances](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-grants-instance.html).

   1. Choose **Next**.

   1. Enter a name for the policy, then choose **Create policy**.

   1. Optional: If you want to support cross-account data sharing for S3, add the following statements to the `Statement` array of your policy:

      ```
      {
          "Sid": "CrossAccountS3AGResourceSharingPermissions",
          "Effect": "Allow",
          "Action": [
              "ram:CreateResourceShare"
          ],
          "Resource": "*",
          "Condition": {
              "StringEqualsIfExists": {
                  "ram:RequestedResourceType": [
                      "s3:AccessGrants"
                  ]
              },
              "StringEquals": {
                  "aws:ResourceAccount": "${aws:PrincipalAccount}"
              }
          }
      },
      {
          "Sid": "CrossAccountS3AGResourceSharingPolicyPermissions",
          "Effect": "Allow",
          "Action": [
              "s3:PutAccessGrantsInstanceResourcePolicy"
          ],
          "Resource": "arn:aws:s3:*:*:access-grants/default",
          "Condition": {
              "StringEquals": {
                  "aws:ResourceAccount": "${aws:PrincipalAccount}"
              }
          }
      }
      ```

   1. Choose **Next**.

   1. Enter a name for the policy, then choose **Create policy**.

   1. Optional: If the bucket is in a different account than the access role, ensure cross-account bucket permissions are set by adding a bucket policy that grants cross-account permissions to the access role. For example:


      ```
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Sid": "S3AdditionalBucketPermissions",
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": "access-role-arn"
                  },
                  "Action": [
                      "s3:ListBucket",
                      "s3:GetBucketLocation"
                  ],
                  "Resource": [
                      "arn:aws:s3:::bucketName"
                  ]
              },
              {
                  "Sid": "S3AdditionalObjectPermissions",
                  "Effect": "Allow",
                  "Principal": {
                      "AWS": "access-role-arn"
                  },
                  "Action": [
                      "s3:GetObject*",
                      "s3:PutObject"
                  ],
                  "Resource": [
                      "arn:aws:s3:::bucketName/key/*"
                  ]
              }
          ]
      }
      ```


   1. Choose **Update policy**.
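
The trust policy in the steps above repeats the project role ARN and project ID in several statements, so filling in the placeholders by hand is easy to get wrong. The following Python sketch shows one way to generate the policy text from the four values the admin collects. The function name `build_access_role_trust_policy` and the sample argument values are hypothetical and exist only for this example; the sketch prints JSON that you can paste into the **Custom trust policy** editor and does not call any AWS API.

```python
import json


def build_access_role_trust_policy(project_role_arn, project_id, domain_id, account_id):
    """Return the access role trust policy as a dict with the placeholders filled in."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # S3 Access Grants may assume the role, limited to one source account.
                "Effect": "Allow",
                "Principal": {"Service": "access-grants.s3.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:SetSourceIdentity"],
                "Condition": {"StringEquals": {"aws:SourceAccount": account_id}},
            },
            {
                # The project role may assume the role when it passes the project ID.
                "Effect": "Allow",
                "Principal": {"AWS": project_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": project_id}},
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": project_role_arn},
                "Action": ["sts:SetSourceIdentity"],
                "Condition": {
                    "StringLike": {
                        "sts:SourceIdentity": "${aws:PrincipalTag/datazone:userId}"
                    }
                },
            },
            {
                "Effect": "Allow",
                "Principal": {"AWS": project_role_arn},
                "Action": "sts:TagSession",
                "Condition": {
                    "StringEquals": {
                        "aws:RequestTag/AmazonDataZoneProject": project_id,
                        "aws:RequestTag/AmazonDataZoneDomain": domain_id,
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    # All values below are made-up placeholders for illustration.
    policy = build_access_role_trust_policy(
        project_role_arn="arn:aws:iam::111122223333:role/example-project-role",
        project_id="example-project-id",
        domain_id="dzd_abcdefg1234567",
        account_id="111122223333",
    )
    print(json.dumps(policy, indent=4))
```

Building the document as a dictionary and serializing it with `json.dumps` guarantees that the result is syntactically valid JSON, which the IAM policy editor requires.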

## Prerequisite option 2: Gain access using the project role


Work with your admin to complete the following steps:

1. Retrieve the project role ARN and send it to your admin.

   1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

   1. Navigate to the project that you want to add Amazon S3 data to. You can do this by choosing **Browse all projects** from the center menu, and then selecting the name of the project.

   1. On the **Project overview** page, copy the project role ARN.

1. The admin then must go to the Amazon S3 console and add a CORS policy to the bucket that you want to access in your project.

   1. Navigate to the Amazon S3 console.

   1. Navigate to the bucket you want to grant access to.

   1. On the **Permissions** tab, under **Cross-origin resource sharing (CORS)**, choose **Edit**.

   1. Enter the new CORS policy, replacing `domainUrl` with your Amazon SageMaker Unified Studio domain URL (for example, `https://dzd_abcdefg1234567.sagemaker.us-east-1.on.aws`), then choose **Save changes**.

      ```
      [
          {
              "AllowedHeaders": [
                  "*"
              ],
              "AllowedMethods": [
                  "PUT",
                  "GET",
                  "POST",
                  "DELETE",
                  "HEAD"
              ],
              "AllowedOrigins": [
                  "domainUrl"
              ],
              "ExposeHeaders": [
                  "x-amz-version-id"
              ]
          }
      ]
      ```

   1. Choose the name of an object to view its details. On the **Properties** tab, note the Amazon Resource Name (ARN) and the S3 URI. You will need these later.

1. The admin then must go to the IAM console and update the project role.

   1. Navigate to the IAM console.

   1. On the **Roles** page, search for the project role using the last string in the project role ARN, for example: `datazone_usr_role_1a2b3c45de6789_abcd1efghij2kl`.

   1. Select the project role to navigate to the project role details.

   1. Under the **Permissions** tab, choose **Add permissions**, then choose **Create inline policy**.

   1. Use the JSON editor to create a policy so that the project has access to an Amazon S3 location, using the Amazon S3 resource ARN that you noted in step 2.


      ```
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Sid": "S3AdditionalBucketPermissions",
                  "Effect": "Allow",
                  "Action": [
                      "s3:ListBucket",
                      "s3:GetBucketLocation"
                  ],
                  "Resource": [
                      "arn:aws:s3:::bucketName"
                  ]
              },
              {
                  "Sid": "S3AdditionalObjectPermissions",
                  "Effect": "Allow",
                  "Action": [
                      "s3:GetObject*",
                      "s3:PutObject"
                  ],
                  "Resource": [
                      "arn:aws:s3:::bucketName/key/*"
                  ]
              }
          ]
      }
      ```


   1. Choose **Next**.

   1. Enter a name for the policy, then choose **Create policy**.

1. Under the **Permissions** tab, choose **Add permissions**, then choose **Create inline policy**.

1. Use the JSON editor to create a second policy that lets the project role manage S3 Access Grants locations and grants for the Amazon S3 location that you noted previously. Replace `s3://bucket/folder/` with your S3 URI.

   ```
   {
       "Version": "2012-10-17",
       "Statement": [
           {
               "Sid": "S3AGLocationManagement",
               "Effect": "Allow",
               "Action": [
                   "s3:CreateAccessGrantsLocation",
                   "s3:DeleteAccessGrantsLocation",
                   "s3:GetAccessGrantsLocation"
               ],
               "Resource": [
                   "arn:aws:s3:*:*:access-grants/default/*"
               ],
               "Condition": {
                   "StringEquals": {
                       "s3:accessGrantsLocationScope": "s3://bucket/folder/"
                   }
               }
           },
           {
               "Sid": "S3AGPermissionManagement",
               "Effect": "Allow",
               "Action": [
                   "s3:CreateAccessGrant",
                   "s3:DeleteAccessGrant"
               ],
               "Resource": [
                   "arn:aws:s3:*:*:access-grants/default/location/*",
                   "arn:aws:s3:*:*:access-grants/default/grant/*"
               ],
               "Condition": {
                   "StringLike": {
                       "s3:accessGrantScope": "s3://bucket/folder/*"
                   }
               }
           }
       ]
   }
   ```
**Note**  
Amazon SageMaker Unified Studio grants access to subscribed assets using S3 Access Grants. To enable granting access to data using S3 Access Grants, an S3 Access Grants instance is required. Amazon SageMaker Unified Studio will use an instance if one is already available or will create one. S3 Access Grants needs one instance per AWS Region in a single AWS account. For more information, see [Working with S3 Access Grants instances](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-grants-instance.html).

1. Choose **Next**.

1. Enter a name for the policy, then choose **Create policy**.
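
The `Resource` values in the bucket and object permissions policy above follow mechanically from the S3 URI that the admin noted earlier. As a rough illustration, the following Python sketch derives both ARNs from a URI and assembles the policy. The helper names `split_s3_uri` and `build_project_role_policy` are hypothetical and are not part of any AWS SDK; the sketch only produces JSON for the IAM JSON editor and assumes the standard `aws` partition.

```python
import json
from urllib.parse import urlparse


def split_s3_uri(s3_uri):
    """Split 's3://bucket/prefix/' into (bucket, prefix)."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {s3_uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def build_project_role_policy(s3_uri):
    """Build the bucket and object permissions policy for one S3 location."""
    bucket, prefix = split_s3_uri(s3_uri)
    bucket_arn = f"arn:aws:s3:::{bucket}"
    # With a prefix, scope object access to keys under it;
    # without one, the object ARN covers the whole bucket.
    trimmed = prefix.rstrip("/")
    object_arn = f"{bucket_arn}/{trimmed}/*" if trimmed else f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "S3AdditionalBucketPermissions",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": [bucket_arn],
            },
            {
                "Sid": "S3AdditionalObjectPermissions",
                "Effect": "Allow",
                "Action": ["s3:GetObject*", "s3:PutObject"],
                "Resource": [object_arn],
            },
        ],
    }


if __name__ == "__main__":
    # "bucketName" and "key" are the same placeholders used in the policy above.
    print(json.dumps(build_project_role_policy("s3://bucketName/key/"), indent=4))
```

Bucket-level actions such as `s3:ListBucket` apply to the bucket ARN, while object-level actions such as `s3:GetObject*` apply to the object ARN, which is why the policy keeps them in separate statements.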

## Add the data to your project


When your admin has granted your project access to the Amazon S3 resources, you can add them to your project.

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to the project that you want to add Amazon S3 data to.

1. On the **Data** page, choose the plus icon **+**.

1. Select **Add S3 location**, then choose **Next**.

1. Enter a name for the location path.

1. (Optional) Add a description of the location path.

1. Enter the S3 URI and Region provided by your admin.

1. If your admin has granted you access using an access role instead of the project role, enter the access role ARN from your admin. 

1. Choose **Add S3 location**.

The Amazon S3 data is then accessible within your project in the left navigation on the **Data** page.

# Sharing Amazon S3 data


Sharing data with other users in Amazon SageMaker Unified Studio means that you and other users can access the same data in multiple projects. There are two ways to share data with other users in Amazon SageMaker Unified Studio:
+ Publish Amazon S3 data to the catalog. This means that other projects can create subscription requests to request access to the data you publish. When you approve a subscription request, the other project will then have access to that data. 
+ Share Amazon S3 data directly with consumers. This means that the data you share is available to the projects you specify right away, without needing a subscription process.

In both cases, you can track and manage access to your data on the **Project catalog** page of your project in Amazon SageMaker Unified Studio. You can grant either read-only access or read and write access.

Amazon SageMaker Unified Studio grants access to subscribed assets using Amazon S3 Access Grants. When a subscription is revoked, a project member may still get Amazon S3 Access Grants credentials for up to 5 minutes, and credentials can be used for 15 minutes. As a result, a user may have access to the data for up to 20 minutes after the access is revoked in SageMaker Unified Studio.
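
The 20-minute figure is the sum of the two windows described above. As a small worked example, the following Python sketch computes the latest time at which access could still be possible after a revocation. The constant and function names are hypothetical, the revocation timestamp is made up, and the durations are the ones stated in the preceding paragraph.

```python
from datetime import datetime, timedelta, timezone

# Windows stated in the paragraph above.
CREDENTIAL_ISSUE_WINDOW = timedelta(minutes=5)   # credentials may still be issued
CREDENTIAL_LIFETIME = timedelta(minutes=15)      # each credential stays usable


def latest_possible_access(revoked_at):
    """Return the latest time a user could still reach the data after revocation."""
    return revoked_at + CREDENTIAL_ISSUE_WINDOW + CREDENTIAL_LIFETIME


if __name__ == "__main__":
    revoked = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # made-up example time
    print(latest_possible_access(revoked).isoformat())  # 2025-01-01T12:20:00+00:00
```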

## Publish Amazon S3 data to the catalog


When you publish data to the Amazon SageMaker Catalog, other projects in your Amazon SageMaker Unified Studio domain can create subscription requests to request access to the data you published. When you approve a subscription request, the other project will then have access to that data.

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to the project that contains your Amazon S3 connection.

1. On the **Data** page, in the side navigation, choose **S3** to explore your S3 data assets.

1. Choose the name of the S3 folder or bucket you want to publish.

1. Choose **Actions**, then choose **Publish to Catalog**. A confirmation window appears.

1. Choose **Publish** to confirm that you want the S3 data to be discoverable in the Amazon SageMaker Catalog. This means that members of other projects in the domain can create subscription requests for the data asset. When you review subscription requests, you will have the option to grant read-only or read and write access to the data. If you approve the subscription request, they will have access to the S3 data asset in their project.

The S3 data folder or bucket you published then appears in the Amazon SageMaker Catalog as a data asset of type **S3 Object Collection**.

## Share Amazon S3 data directly with consumers


Sharing data directly makes it available to the projects you specify right away, without a subscription process. You first publish your dataset, and then share it.

1. Navigate to Amazon SageMaker Unified Studio using the URL from your admin and log in using your SSO or AWS credentials. 

1. Navigate to the project that contains your Amazon S3 connection.

1. On the **Data** page, in the side navigation, choose **S3** to explore your S3 data assets.

1. Choose the name of the S3 folder or bucket you want to publish.

1. Choose **Actions**, then choose **Publish to Catalog**. A confirmation window appears.

1. Choose **Publish**.

1. Navigate to the **Assets** page from the left navigation and select the asset you want to share.

1. Choose **Actions**, then choose **Share**.

1. Use the dropdown to select projects that you want to share the S3 data with, then choose **Next**.

1. Select the access type. For read-only access, move to the next step. To grant read and write access to users in the other project, select **Read and write access**.

1. Choose **Share**.

The S3 data asset is then shown in the project catalog under approved subscription requests. You can choose to revoke access at any time. For more information about subscription requests, see [Data discovery, subscription, and consumption](discover-data.md).