# Amazon DataZone data discovery, subscription, and consumption

In Amazon DataZone, once an asset is published to a domain, subscribers can discover it and request a subscription to it. The subscription process begins with a subscriber searching or browsing the catalog to find an asset they want. From the Amazon DataZone portal, they subscribe to the asset by submitting a subscription request that includes a justification for the request. The owner of the asset reviews the request and either approves or rejects it.

After a subscription is granted, a fulfillment process starts to facilitate access to the asset for the subscriber. There are two primary modes of asset access control and fulfillment: those for Amazon DataZone-managed assets and those for assets that are not managed by Amazon DataZone.
+ **Managed assets** – Amazon DataZone can manage fulfillment and permissions for managed assets, such as AWS Glue tables and Amazon Redshift tables and views.
+ **Unmanaged assets** – Amazon DataZone publishes standard events related to your actions (for example, approval given to a subscription request) to Amazon EventBridge. You can use these standard events to integrate with other AWS services or third-party solutions for custom integrations.

**Topics**
+ [Search for and view assets in the Amazon DataZone catalog](search-for-data.md)
+ [Request subscription to assets in Amazon DataZone](subscribe-to-data-assets-managed-by-datazone.md)
+ [Approve or reject a subscription request in Amazon DataZone](approve-reject-subscription-request.md)
+ [Revoke an existing subscription in Amazon DataZone](revoke-subscription.md)
+ [Cancel a subscription request in Amazon DataZone](cancel-subscription-request.md)
+ [Unsubscribe from an asset in Amazon DataZone](unsubscribe-from-subscription.md)
+ [Using existing IAM roles to fulfill Amazon DataZone subscriptions](use-your-own-role.md)
+ [Grant access to managed AWS Glue Data Catalog assets in Amazon DataZone](grant-access-to-glue-asset.md)
+ [Grant access to managed Amazon Redshift assets in Amazon DataZone](grant-access-to-redshift-asset.md)
+ [Grant access for approved subscriptions to unmanaged assets in Amazon DataZone](grant-access-to-unmanaged-asset.md)
+ [Query data in Amazon Athena or Amazon Redshift in Amazon DataZone](query-athena-with-deep-link-in-project.md)
+ [Metadata enforcement rules for subscription requests](metadata-rules.md)
+ [Analyze Amazon DataZone subscribed data with external analytics applications via JDBC connection](query-with-jdbc.md)

# Search for and view assets in the Amazon DataZone catalog

Amazon DataZone provides a streamlined way to search for data. Any Amazon DataZone user with permissions to access the data portal can search for assets in the Amazon DataZone catalog and view asset names and the metadata assigned to them. You can take a closer look at an asset by examining its details page.

**Note**  
To view the actual data that an asset contains, you must first subscribe to the asset and have your subscription request approved and access granted. 

Search in Amazon DataZone (in new and existing domains) includes results based on keyword and semantic matches. The search algorithm prioritizes keyword matches and then appends those with semantic matches. 

The semantic search functionality empowers users across different roles and functions to more effectively discover, access, and leverage their organization's data assets, leading to improved decision-making, collaboration, and overall data-driven capabilities. With semantic search, keyword inputs produce synonym-based and meaning-based search results in addition to simple keyword match results. For example, with semantic search, if you type in 'flower' as your search input, a data asset with the word 'rose' in its name is returned in search results. If you type in 'movie' as your search input, a data asset with the word 'film' in its name is returned in search results. If you type in 'football' as your search input, a data asset with the word 'soccer' in its name can be returned in search results. 

With keyword search, you can input various keywords while searching for your subscribed assets. For example, if you have an asset called `Catalog Sales Data`, it is returned in the search results if you input any of the following keywords: `catalog_sales`, `Catalog Sales`, `CatalogSales`, or `catalogsales`. 

Amazon DataZone also enhances the search experience by enabling precise exact-match and partial-match functionality for technical identifiers such as column and table names. With this new capability, you can perform searches by enclosing your keywords in double quotes (" "), ensuring results that match technical names exactly or partially. This functionality builds upon the keyword and semantic search capabilities, which empower you to discover assets by concepts and related terms. By adding a layer of precision for technical identifiers, this enhancement enables you to manage large data catalogs with complex technical naming conventions.

As you search through your data, you might need to locate specific technical assets to support your use cases. With the ability to search for technical identifiers, you can retrieve assets with accuracy, saving time and streamlining the discovery process. For instance, a query like `"customer_id"` returns columns or tables with the exact identifier, while a partial query such as `"sales_"` can identify related assets like `sales_summary` and `sales_data_2024`. This enhancement helps data consumers find the assets they need efficiently.
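
The same kind of search can be sketched programmatically. The following is a minimal sketch, assuming the `search_listings` operation of the boto3 `datazone` client. The helper names, the domain ID, and the default page size are hypothetical, and whether the API treats double-quoted text the same way as the data portal is an assumption to verify in your own domain.

```python
def build_search_request(domain_id: str, term: str, exact: bool = False,
                         max_results: int = 25) -> dict:
    """Build keyword arguments for a catalog search (hypothetical helper)."""
    # Wrapping the term in double quotes mirrors the portal syntax for
    # technical identifiers such as column and table names.
    text = f'"{term}"' if exact else term
    return {
        "domainIdentifier": domain_id,
        "searchText": text,
        "maxResults": max_results,
    }


def search_catalog(client, domain_id: str, term: str, exact: bool = False) -> list:
    """Run the search with an injected client and return the result items."""
    response = client.search_listings(**build_search_request(domain_id, term, exact))
    return response.get("items", [])


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   items = search_catalog(boto3.client("datazone"), "<DOMAIN_ID>", "customer_id", exact=True)
```

Passing the client in as an argument keeps the request-building logic easy to check without calling AWS.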

**To search for assets in the catalog**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. You can type the name of the asset that you are looking for in the search bar on the home page of the data portal.

1. To browse namespaces, choose **Catalog** from the top right of the page to open the catalog. The catalog provides a faceted search experience for you to find assets by searching on criteria such as data owner and glossary terms.

1. Enter your search term in one of the search boxes. After you run a search, you can apply various filters to narrow the results. The filters include asset type, source account, and the AWS Region to which the asset belongs.

1. To view details about a specific asset, choose the asset to open its details page. The details page includes the following information:
   + The asset name, data source (AWS Glue, Amazon Redshift, or Amazon S3), type (table, view, or S3 object), number of columns, and size.
   + A description of the asset.
   + The current published revision of the asset, the owner, whether approval is required for subscriptions, the namespace, and update history.
   + An **Overview** tab which includes glossary terms and metadata forms.
   + A **Schema** tab which displays the schema of the asset, including business and technical column names, data types, and business descriptions of the columns. The schema tab is visible only for tables and views (not for Amazon S3 objects).
   + A **Subscriptions** tab which includes a list of subscribers to the domain.
   + A **History** tab which includes a list of past revisions of the asset.

# Request subscription to assets in Amazon DataZone

Amazon DataZone allows you to find, access, and consume the assets in the Amazon DataZone catalog. When you find an asset in the catalog that you want to access, you need to *subscribe* to the asset, which creates a subscription request. An approver can then approve or reject your request.

You must be a member of a project in order to request subscription to an asset within that project.

**To subscribe to an asset**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Use the search bar to search for and choose the asset to which you want to subscribe, and then choose **Subscribe**. 

1. In the **Subscribe** pop-up window, provide the following information:
   + The project that you want to subscribe to the asset.
   + A short justification for your subscription request.

1. Choose **Subscribe**.

   You receive a notification in the data portal when the publisher approves your request.

To view the status of the subscription request, locate and choose the project with which you subscribed to the asset. Navigate to the **Data** tab for the project, then choose **Requested data** from the left navigation pane. This page lists the assets to which the project has requested access. You can filter the list by the status of the request.
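
The request and status check above can also be sketched against the API. This is a minimal sketch, assuming the `create_subscription_request` and `list_subscription_requests` operations of the boto3 `datazone` client. The helper names are hypothetical, and treating `owningProjectId` as the subscribing project is an assumption to confirm against the API reference.

```python
def build_subscription_request(domain_id: str, listing_id: str,
                               project_id: str, reason: str) -> dict:
    """Build keyword arguments for a subscription request (hypothetical helper)."""
    if not reason.strip():
        # The portal asks for a short justification, so require one here too.
        raise ValueError("A justification is required")
    return {
        "domainIdentifier": domain_id,
        "requestReason": reason,
        "subscribedListings": [{"identifier": listing_id}],
        "subscribedPrincipals": [{"project": {"identifier": project_id}}],
    }


def pending_requests(client, domain_id: str, project_id: str) -> list:
    """List requests of the project that are still awaiting a decision."""
    response = client.list_subscription_requests(
        domainIdentifier=domain_id,
        owningProjectId=project_id,  # assumed to be the subscribing project
        status="PENDING",
    )
    return response.get("items", [])


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   client = boto3.client("datazone")
#   client.create_subscription_request(
#       **build_subscription_request("<DOMAIN_ID>", "<LISTING_ID>", "<PROJECT_ID>", "Quarterly report"))
```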

# Approve or reject a subscription request in Amazon DataZone

Amazon DataZone allows you to find, access and consume the assets in the Amazon DataZone catalog. When you find an asset in the catalog that you want to access, you must *subscribe* to the asset, which creates a subscription request. An approver can then approve or reject your request.

You must be a member of the owning project (the project that published the asset) to approve or reject a subscription request.

**To approve or reject a subscription request**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. In the data portal, choose **Browse projects list** and select the project that contains the asset with the subscription request.

1. Navigate to the **Data** tab, then choose **Incoming requests** from the left navigation pane.

1. Locate the request and choose **View request**. You can filter by **Pending** to see only requests that are still open.

1. Review the subscription request and reason for access, and decide whether to approve or reject it.

1. To approve, select between the two options:
   + **Full access**: If you choose to approve the subscription with full access option, the subscriber will get access to all the rows and columns in your data asset. 
   + **Approve with row and column filters**: To limit access to specific rows and columns of data, you can choose the option to approve with row and column filters. For more information, see [Fine-grained access control to data in Amazon DataZone](fine-grained-access-control.md). 
     + Select **Choose filters**, and then from the drop down select one or more available filters you want to apply to the subscription. 
     + To create a new filter you can choose Create new filter option, which opens a new page to create a new row or column filter. For more information, see [Create column filters in Amazon DataZone](create-column-filter.md) and [Create row filters in Amazon DataZone](create-row-filter.md).

1. (Optional) Enter a response that explains your reason for accepting or rejecting the request.

1. Choose either **Approve** or **Reject**.

As the project owner, you can revoke the subscription at any time. For more information, see [Revoke an existing subscription in Amazon DataZone](revoke-subscription.md).

To view all subscription requests, see [Amazon DataZone events and notifications](working-with-events-and-notifications.md).

**Note**  
Amazon DataZone supports fine-grained access control for AWS Glue tables, Amazon Redshift tables, and Amazon Redshift views.
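
The approve-or-reject decision from the procedure above can also be made through the API. The following is a minimal sketch, assuming the `accept_subscription_request` and `reject_subscription_request` operations of the boto3 `datazone` client. The function name is hypothetical, and the sketch covers only a full-access approval, not row and column filters.

```python
def decide_subscription_request(client, domain_id: str, request_id: str,
                                approve: bool, comment: str = "") -> dict:
    """Approve or reject a pending subscription request (hypothetical helper)."""
    kwargs = {"domainIdentifier": domain_id, "identifier": request_id}
    if comment:
        # Optional response explaining the decision, as in the portal form.
        kwargs["decisionComment"] = comment
    if approve:
        return client.accept_subscription_request(**kwargs)
    return client.reject_subscription_request(**kwargs)


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   decide_subscription_request(boto3.client("datazone"), "<DOMAIN_ID>",
#                               "<REQUEST_ID>", approve=True, comment="Approved for Q3 analysis")
```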

## Automatic approval of subscription requests


By default, subscription requests to a published asset require manual approval by a data owner. However, Amazon DataZone supports two scenarios where subscription requests can be automatically approved:
+ Approval disabled during asset publishing - when publishing a data asset, you can choose to not require subscription approval. In this case, all incoming subscription requests to that asset are automatically approved. To learn how to disable approval for an asset, see [Publish assets to the Amazon DataZone catalog from the project inventory](publishing-data-asset.md) .
+ Requester is an owner or contributor in the project that published the asset - a subscription request is also automatically approved if the requester is already authorized to approve it manually, that is, if they are a member of both the project that published the asset and the project requesting access.

  To qualify for auto-approval:
  + The requester must be listed as an owner or contributor in the project where the asset was originally published.
  + The requester must also be listed as an owner or contributor in the project making the subscription request.

  This ensures that auto-approval only occurs when the requester has visibility and permissions in both projects — the one sharing the asset and the one requesting access. If the requester meets both conditions, the system auto-approves the request.
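
The two auto-approval scenarios can be summarized as a small decision function. This is an illustrative model of the rules described above, not Amazon DataZone's implementation. The `Project` class, the role names, and the function name are hypothetical.

```python
from dataclasses import dataclass, field

# Roles that are allowed to approve requests, per the rules above.
AUTHORIZED_ROLES = {"owner", "contributor"}


@dataclass
class Project:
    project_id: str
    members: dict = field(default_factory=dict)  # user ID -> role name


def is_auto_approved(requester: str, publishing_project: Project,
                     requesting_project: Project,
                     approval_required: bool = True) -> bool:
    """Return True when a subscription request needs no manual approval."""
    # Scenario 1: approval was disabled when the asset was published.
    if not approval_required:
        return True
    # Scenario 2: the requester is an owner or contributor in both projects.
    in_publisher = publishing_project.members.get(requester) in AUTHORIZED_ROLES
    in_requester = requesting_project.members.get(requester) in AUTHORIZED_ROLES
    return in_publisher and in_requester
```

For example, a user who is a contributor in the requesting project but has no role in the publishing project still needs manual approval unless approval was disabled for the asset.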

# Revoke an existing subscription in Amazon DataZone

Amazon DataZone allows you to find, access, and consume the assets in the Amazon DataZone catalog. When you find an asset in the catalog that you want to access, you need to *subscribe* to the asset, which creates a subscription request. An approver can then approve or reject your request. You might need to revoke a subscription after you have approved it, either because the approval was a mistake, or because the subscriber no longer needs access to the asset.

You must be a member of the owning project (the project that published the asset) to revoke a subscription.

**To revoke a subscription**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project that contains the subscription you want to revoke.

1. Navigate to the **Data** tab, then choose **Incoming requests** from the left navigation pane.

1. Locate the subscription you want to revoke and choose **View subscription**.

1. (Optional) Enable the checkbox to allow the subscriber to keep the asset in the project's subscription targets. A subscription target is a reference to a set of resources where subscribed data can be made available within an environment.

   If you want to revoke access to the asset from the subscription target at a later time, you must do so in AWS Lake Formation.

1. Choose **Revoke subscription**.

You can't re-approve a subscription after you revoke it. The subscriber must subscribe to the asset again in order for you to approve it.

**Note**  
Revoking a subscription affects only the access of the subscriber whose subscription you revoke. The asset and the user (subscriber) both remain intact. The user cannot access the asset until they submit another subscription request and that request is approved.
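
A revocation can be sketched with the `revoke_subscription` operation of the boto3 `datazone` client. The function name is hypothetical, and mapping the optional checkbox in the procedure above to the `retainPermissions` parameter is an assumption.

```python
def revoke(client, domain_id: str, subscription_id: str,
           keep_in_targets: bool = False) -> dict:
    """Revoke an approved subscription (hypothetical helper)."""
    return client.revoke_subscription(
        domainIdentifier=domain_id,
        identifier=subscription_id,
        # Assumed equivalent of letting the subscriber keep the asset
        # in the project's subscription targets.
        retainPermissions=keep_in_targets,
    )


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   revoke(boto3.client("datazone"), "<DOMAIN_ID>", "<SUBSCRIPTION_ID>")
```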

# Cancel a subscription request in Amazon DataZone

Amazon DataZone allows you to find, access, and consume the assets in the Amazon DataZone catalog. When you find an asset in the catalog that you want to access, you need to *subscribe* to the asset, which creates a subscription request. An approver can then approve or reject your request. You might need to cancel a pending subscription request, either because you submitted it by mistake, or because you no longer need read access to the asset.

To cancel a subscription request, you must be either a project owner or contributor.

**To cancel a subscription request**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project that contains the subscription request.

1. Navigate to the **Data** tab for the project, then choose **Requested data** from the left navigation pane. This page lists the assets to which the project has requested access. 

1. Filter by **Requested** to see only requests that are still pending. Locate the request and choose **View request**. 

1. Review the subscription request and choose **Cancel request**.

If you want to re-subscribe to the asset (or to a different asset), see [Request subscription to assets in Amazon DataZone](subscribe-to-data-assets-managed-by-datazone.md).

**Note**  
You can cancel a pending subscription request when you no longer need read access to the asset. Canceling the request does not affect the asset or the user who submitted the request.

# Unsubscribe from an asset in Amazon DataZone

Amazon DataZone allows you to find, access, and consume the assets in the Amazon DataZone catalog. When you find an asset in the catalog that you want to access, you need to *subscribe* to the asset, which creates a subscription request. An approver can then approve or reject your request. You might need to unsubscribe from an asset, either because you subscribed by mistake and were approved, or because you no longer need read access to the asset.

You must be a member of a project in order to unsubscribe from one of its assets.

**To unsubscribe from an asset**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project that contains the asset you want to unsubscribe from.

1. Navigate to the **Data** tab for the project, then choose **Requested data** from the left navigation pane. This page lists the assets to which the project has requested access. 

1. Filter by **Approved** to see only requests that have been approved. Locate the request and choose **View subscription**. 

1. Review the subscription and choose **Unsubscribe**.

If you want to re-subscribe to the asset (or to a different asset), see [Request subscription to assets in Amazon DataZone](subscribe-to-data-assets-managed-by-datazone.md).

**Note**  
When a user no longer needs access to an asset, they can choose the **Unsubscribe** option. The asset remains intact, and no resource is deleted as a result of this action.
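
Canceling a pending request and unsubscribing from an approved subscription are different actions, and a script has to pick the right one. The following is a minimal sketch, assuming that the `delete_subscription_request` operation of the boto3 `datazone` client corresponds to canceling a pending request and that `cancel_subscription` corresponds to unsubscribing. Both mappings and the function name are assumptions to verify against the API reference.

```python
def withdraw(client, domain_id: str, identifier: str, approved: bool) -> str:
    """Cancel a pending request or unsubscribe, depending on its state.

    identifier is the subscription ID when approved is True, and the
    subscription request ID otherwise.
    """
    if approved:
        # Assumed equivalent of choosing Unsubscribe in the data portal.
        client.cancel_subscription(domainIdentifier=domain_id, identifier=identifier)
        return "unsubscribed"
    # Assumed equivalent of choosing Cancel request in the data portal.
    client.delete_subscription_request(domainIdentifier=domain_id, identifier=identifier)
    return "request canceled"
```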

# Using existing IAM roles to fulfill Amazon DataZone subscriptions


In the current release, Amazon DataZone supports using your existing IAM roles to access data. To do this, create a subscription target in the Amazon DataZone environment that you use to fulfill your subscription. To create a subscription target for an environment in one of the associated AWS accounts, use the following steps:

**Step 1: Ensure that your Amazon DataZone domain is using version 2 or higher of the RAM policy**

1. Navigate to the **Shared by me : Resource shares** page in the AWS RAM console.

1. Because AWS RAM resource shares exist in specific AWS Regions, choose the appropriate AWS Region from the dropdown list in the upper-right corner of the console. 

1. Select the resource share corresponding to your Amazon DataZone domain and then choose **Modify**. You can identify the RAM share for the Amazon DataZone domain using the name or ID of the domain as the RAM share is created with the name: `DataZone-<domain-name>-<domain-id>`.

1. Choose **Next** to proceed to the next step where you can check the version of the RAM policy and modify it. 

1. Make sure that the version of the RAM policy is Version 2 or higher. If not, use the dropdown to select Version 2 or higher.

1. Choose **Skip to step 4: Review and update**.

1. Choose **Update resource share**.

**Step 2: Create a subscription target from an associated account**
+ In the current release, Amazon DataZone supports creating subscription targets by using APIs only. Below are some examples of the payload you can use to create a subscription target for fulfilling subscriptions to your AWS Glue tables and Amazon Redshift tables or views. For more information, see [CreateSubscriptionTarget](https://docs.aws.amazon.com/datazone/latest/APIReference/API_CreateSubscriptionTarget.html).

  Example of subscription target for AWS Glue:

  ```
  {
          "domainIdentifier": "<DOMAIN_ID>",
          "environmentIdentifier": "<ENVIRONMENT_ID>",
          "name": "<SUBSCRIPTION_TARGET_NAME>",
          "type": "GlueSubscriptionTargetType",
          "authorizedPrincipals" : ["IAM_ROLE_ARN"],
          "subscriptionTargetConfig" : [{"content": "{\"databaseName\": \"<DATABASE_NAME>\"}", "formName": "GlueSubscriptionTargetConfigForm"}],
          "manageAccessRole": "<GLUE_DATA_ACCESS_ROLE_IN_ASSOCIATED_ACCOUNT_ARN>",
          "applicableAssetTypes" : ["GlueTableAssetType"],
          "provider": "Amazon DataZone"
  }
  ```

  Example of subscription target for Amazon Redshift:

  ```
  {
          "domainIdentifier": "<DOMAIN_ID>",
          "environmentIdentifier": "<ENVIRONMENT_ID>",
          "name": "<SUBSCRIPTION_TARGET_NAME>",
          "type": "RedshiftSubscriptionTargetType",
          "authorizedPrincipals" : ["REDSHIFT_DATABASE_ROLE_NAME"],
          "subscriptionTargetConfig" : [{"content": "{\"databaseName\": \"<DATABASE_NAME>\", \"secretManagerArn\": \"<SECRET_MANAGER_ARN>\",\"clusterIdentifier\": \"<CLUSTER_IDENTIFIER>\"}", "formName": "RedshiftSubscriptionTargetConfigForm"}],
          "manageAccessRole": "<REDSHIFT_DATA_ACCESS_ROLE_IN_ASSOCIATED_ACCOUNT_ARN>",
          "applicableAssetTypes" : ["RedshiftViewAssetType", "RedshiftTableAssetType"],
          "provider": "Amazon DataZone"
  }
  ```
**Important**  
+ The `environmentIdentifier` you use in the API call must exist in the same associated account from which you make the call. Otherwise, the API call does not succeed.
+ The IAM role ARN you use in `authorizedPrincipals` is the role to which Amazon DataZone grants access after a subscribed asset is added to the subscription target. These authorized principals must belong to the same account as the environment in which the subscription target is created.
+ The value of the `provider` field must be `Amazon DataZone` for Amazon DataZone to be able to complete subscription fulfillment.
+ The database name provided in `subscriptionTargetConfig` must already exist in the account in which the target is created. Amazon DataZone does not create this database. Also make sure that the manage access role has `CREATE TABLE` permission on this database.
+ The roles provided as the authorized principals (the IAM role for AWS Glue and the database role for Amazon Redshift) must already exist in the environment account.
+ For Amazon Redshift subscription targets, the role that is assumed when connecting to the cluster needs an additional update: it must have the `RedshiftDbRoles` tag attached. The tag value can be a comma-separated list and should contain the database role that was provided as the authorized principal when creating the subscription target.
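
The AWS Glue payload above can be assembled in code so that the nested `content` string is always valid JSON. This is a minimal sketch, assuming the payload keys map directly to the keyword arguments of `create_subscription_target` in the boto3 `datazone` client. The helper names and the example values are hypothetical.

```python
import json


def build_glue_subscription_target(domain_id: str, environment_id: str, name: str,
                                   database_name: str, principal_role_arn: str,
                                   manage_access_role_arn: str) -> dict:
    """Build the request for an AWS Glue subscription target (hypothetical helper)."""
    # The form content is a JSON document serialized into a string.
    config = {"databaseName": database_name}
    return {
        "domainIdentifier": domain_id,
        "environmentIdentifier": environment_id,
        "name": name,
        "type": "GlueSubscriptionTargetType",
        "authorizedPrincipals": [principal_role_arn],
        "subscriptionTargetConfig": [
            {"formName": "GlueSubscriptionTargetConfigForm", "content": json.dumps(config)}
        ],
        "manageAccessRole": manage_access_role_arn,
        "applicableAssetTypes": ["GlueTableAssetType"],
        "provider": "Amazon DataZone",  # required value for fulfillment
    }


def redshift_db_roles_tag(database_roles: list) -> dict:
    """Build the RedshiftDbRoles tag for the role used to connect to the cluster."""
    return {"Key": "RedshiftDbRoles", "Value": ",".join(database_roles)}


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   boto3.client("datazone").create_subscription_target(
#       **build_glue_subscription_target("<DOMAIN_ID>", "<ENVIRONMENT_ID>", "my-target",
#                                        "<DATABASE_NAME>", "<IAM_ROLE_ARN>", "<MANAGE_ACCESS_ROLE_ARN>"))
```

Using `json.dumps` for the form content avoids hand-escaping the inner quotes shown in the raw payload.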

**Step 3: Subscribe to a new table and fulfill subscription to the new target**
+ Once you have created the subscription target, you can subscribe to a new table and Amazon DataZone will fulfill it to the above target. 

# Grant access to managed AWS Glue Data Catalog assets in Amazon DataZone

In Amazon DataZone, subscription requests and approved or granted subscriptions for **read** access to the assets are managed by asset owners. 

**Note**  
Access management for the AWS Glue Data Catalog assets using the AWS Lake Formation LF-TBAC method is not supported.  
Cross-Region sharing of assets in AWS Glue Data Catalog is not supported.

Once a subscription request to managed AWS Glue Data Catalog assets is approved, Amazon DataZone automatically adds these assets to all the existing data lake environments in the project. Amazon DataZone then grants and manages access to the approved AWS Glue Data Catalog tables on your behalf through AWS Lake Formation. For the subscriber project, assets that are granted appear in the AWS Glue Data Catalog as resources in your account. You can then use Amazon Athena to query the tables.

**Note**  
If a new data lake environment is added to the project after the subscribed AWS Glue Data Catalog assets have been automatically added to the existing data lake environments, you have to manually add these subscribed AWS Glue Data Catalog assets to this new data lake environment. You can do this by choosing the **Add grant** option in the **Data** tab of the project's overview page in the Amazon DataZone data portal.

For Amazon DataZone to be able to grant access to AWS Glue Data Catalog tables, the following conditions must be met.
+ The AWS Glue table must be Lake Formation-managed since Amazon DataZone grants access by managing Lake Formation permissions.
+ The **Manage access role** for the data lake environment used to publish the AWS Glue Data Catalog table must have the following Lake Formation permissions:
  + `DESCRIBE` and `DESCRIBE GRANTABLE` permissions on the AWS Glue database that contains the published table.
  + `DESCRIBE`, `SELECT`, `DESCRIBE GRANTABLE`, `SELECT GRANTABLE` permissions in Lake Formation on the published table itself.

For more information, see [Granting and revoking permissions on catalog resources](https://docs.aws.amazon.com/lake-formation/latest/dg/granting-catalog-permissions.html) in the *AWS Lake Formation Developer Guide*.
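
The Lake Formation permissions listed above can be expressed as two grant requests. The following is a minimal sketch, assuming the `grant_permissions` operation of the boto3 `lakeformation` client. The function name and the example identifiers are hypothetical.

```python
def manage_access_role_grants(role_arn: str, database: str, table: str) -> list:
    """Build the Lake Formation grants needed by the manage access role."""
    principal = {"DataLakePrincipalIdentifier": role_arn}
    return [
        {   # DESCRIBE and DESCRIBE GRANTABLE on the database
            "Principal": principal,
            "Resource": {"Database": {"Name": database}},
            "Permissions": ["DESCRIBE"],
            "PermissionsWithGrantOption": ["DESCRIBE"],
        },
        {   # DESCRIBE and SELECT, both grantable, on the published table
            "Principal": principal,
            "Resource": {"Table": {"DatabaseName": database, "Name": table}},
            "Permissions": ["DESCRIBE", "SELECT"],
            "PermissionsWithGrantOption": ["DESCRIBE", "SELECT"],
        },
    ]


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   lf = boto3.client("lakeformation")
#   for grant in manage_access_role_grants("<MANAGE_ACCESS_ROLE_ARN>", "<DATABASE_NAME>", "<TABLE_NAME>"):
#       lf.grant_permissions(**grant)
```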

# Grant access to managed Amazon Redshift assets in Amazon DataZone

When a subscription to an Amazon Redshift table or view is approved, Amazon DataZone can automatically add the subscribed asset to all the data warehouse environments within the project, so that members of the project can query the data using the Amazon Redshift query editor link within their environments. Under the hood, Amazon DataZone creates the necessary grants and datashares between the source and the subscription target.

The process of granting access varies depending on where the source database (publisher) and the target database (subscriber) are located. 
+ Same cluster, same database - if data must be shared within the same database, Amazon DataZone grants permissions directly on the source table. 
+ Same cluster, different database - if data must be shared across two databases within the same cluster, Amazon DataZone creates a view in the target database and permissions are granted on the created view.
+ Same account, different cluster - Amazon DataZone creates a datashare between the source and target cluster and creates a view on top of the shared table. Permissions are granted on the view.
+ Cross-account - same as above, but two additional steps are required: one to authorize the cross-account datashare on the producer cluster side and another to associate the datashare on the consumer cluster side.
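
The four cases above form a simple decision tree over account, cluster, and database. The following sketch models that selection for illustration only. The `RedshiftLocation` class, the function name, and the returned labels are hypothetical and do not reflect Amazon DataZone internals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RedshiftLocation:
    account_id: str
    cluster_id: str
    database: str


def sharing_mechanism(source: RedshiftLocation, target: RedshiftLocation) -> str:
    """Return which access mechanism applies for a publisher/subscriber pair."""
    if source.account_id != target.account_id:
        # Datashare plus view, with authorization on the producer side
        # and association on the consumer side.
        return "cross-account datashare with view"
    if source.cluster_id != target.cluster_id:
        return "datashare with view"
    if source.database != target.database:
        return "view in target database"
    return "direct grant on source table"
```

The checks run from the widest boundary (account) to the narrowest (database), so the first difference found determines the mechanism.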

**Note**  
If a new data warehouse environment is added to the project after the subscribed Amazon Redshift assets have been automatically added to the existing data warehouse environments, you have to manually add these subscribed Amazon Redshift assets to this new data warehouse environment. You can do this by choosing the **Add grant** option in the **Data** tab of the project's overview page in the Amazon DataZone data portal.

Make sure that your publishing and subscribing Amazon Redshift clusters meet all requirements for Amazon Redshift datashares. For more information, see [Amazon Redshift Developer Guide](https://docs.aws.amazon.com/redshift/latest/dg/welcome.html).

**Note**  
Amazon DataZone supports automatically granting subscriptions to both Amazon Redshift Cluster and Amazon Redshift Serverless assets.  
Cross-Region data sharing using Amazon Redshift is not supported.

# Grant access for approved subscriptions to unmanaged assets in Amazon DataZone

In Amazon DataZone, subscription requests and approved or granted subscriptions for **read** access to the assets are managed by asset owners. 

Amazon DataZone enables users to publish any type of asset in the business data catalog. For some of these assets, Amazon DataZone can automatically manage access grants. These assets are called **managed assets** and include Lake Formation-managed AWS Glue Data Catalog tables and Amazon Redshift tables and views. All other assets, to which Amazon DataZone can't automatically grant subscriptions, are called **unmanaged**.

Amazon DataZone provides a path for you to manage access grants for your unmanaged assets. When a subscription to an asset in the business data catalog is approved by the data owner, Amazon DataZone publishes an event in Amazon EventBridge in your account, along with all the necessary information in the payload to enable you to create the access grants between the source and the target. When you receive this event, you can trigger a custom handler that uses the information in the event to create the necessary grants or permissions. Once you have granted the access, you can report back and update the status of the subscription in Amazon DataZone so that it can notify the users who subscribed to the asset that they can start consuming it. For more information, see [Amazon DataZone events and notifications](working-with-events-and-notifications.md).
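
The receive, grant, and report-back flow can be sketched as a small handler. This is a minimal sketch under stated assumptions: the `aws.datazone` event source value, the shape of the event, and the `GRANTED` and `GRANT_FAILED` status strings are assumptions, and `grant_access` and `report_status` are hypothetical callables that you supply.

```python
DATAZONE_SOURCE = "aws.datazone"  # assumed EventBridge source for Amazon DataZone events


def handle_subscription_event(event: dict, grant_access, report_status) -> bool:
    """Create grants for an approved subscription, then report the outcome.

    grant_access(detail) creates the permissions in your own system.
    report_status(detail, status, failure_cause) updates Amazon DataZone.
    """
    if event.get("source") != DATAZONE_SOURCE:
        return False  # not an Amazon DataZone event, so ignore it
    detail = event.get("detail", {})
    try:
        grant_access(detail)
    except Exception as exc:
        # Report the failure so that the subscription is not shown as usable.
        report_status(detail, "GRANT_FAILED", str(exc))
        return False
    report_status(detail, "GRANTED", None)
    return True
```

In a real integration, `report_status` might wrap the `update_subscription_grant_status` operation of the boto3 `datazone` client, and the function could be called from an AWS Lambda handler that is the target of an EventBridge rule.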

# Query data in Amazon Athena or Amazon Redshift in Amazon DataZone

In Amazon DataZone, once a subscriber has access to an asset in the catalog, they can consume it (query and analyze) using Amazon Athena or Amazon Redshift query editor v2. You must be a project owner or contributor to complete this task. Depending on the blueprints enabled in the project, Amazon DataZone provides links to Amazon Athena and/or Amazon Redshift query editor v2 on the right-hand side pane of the project page in the data portal.

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. In the Amazon DataZone data portal, choose **Browse Projects List** and then find and choose the project where you have the data that you want to analyze.

1. If the Data Lake blueprint is enabled on this project, a link to Amazon Athena is displayed in the right-hand side panel on the project's home page. 

   If the Data Warehouse blueprint is enabled on this project, a link to the query editor is displayed in the right-hand side panel on the project's home page. 
**Note**  
Blueprints are defined in the environment profile with which a project is created.

**Topics**
+ [

## Query data using Amazon Athena
](#query-athena-with-deep-link)
+ [

## Query data using Amazon Redshift
](#query-redshift-with-deep-link)

## Query data using Amazon Athena


Choose the Amazon Athena link to open the Amazon Athena query editor in a new tab in the browser using the project’s credentials for authentication. The Amazon DataZone project you're working with is automatically selected as the current workgroup in the query editor. 

In the Amazon Athena query editor, write and run your queries. Some common tasks include:
+ [Query and analyze your subscribed assets](#query-analyze-subscribed-data)
+ [Create new tables](#create-new-tables)
+ [Create a table from query results (CTAS) from an external S3 bucket](#create-tables-external-s3-bucket)

### Query and analyze your subscribed assets


If access to the assets that your project is subscribed to is not granted automatically by Amazon DataZone, you must be authorized to access the underlying data. For more information on how to grant access to these assets, see [Grant access for approved subscriptions to unmanaged assets in Amazon DataZone](grant-access-to-unmanaged-asset.md).

If access to the assets that your project is subscribed to is [granted automatically by Amazon DataZone](grant-access-to-glue-asset.md), you can run SQL queries on the tables and see the results in Amazon Athena. For more information about using SQL in Amazon Athena, see [SQL reference for Athena](https://docs.aws.amazon.com/athena/latest/ug/ddl-sql-reference.html).

When you navigate to the Amazon Athena query editor after choosing the Amazon Athena link in the right-hand side panel on the project's home page, a **Project** dropdown is displayed in the top-right corner of the Amazon Athena query editor and your project context is automatically selected. 

You can see the following databases in the **Database** dropdown:
+ A publishing database (`{environmentname}_pub_db`). The purpose of this database is to provide you with an environment where you can produce new data within the context of your project and then be able to publish this data into the Amazon DataZone catalog. Project owners and contributors have read and write access to this database. Project viewers have only read access to this database. 
+ A subscription database (`{environmentname}_sub_db`). The purpose of this database is to share with you the data to which you have subscribed as a project member in the Amazon DataZone catalog, and to enable you to query that data.
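
As a small illustration of the naming pattern above, the following sketch derives both database names from an environment name and builds a preview query against the subscription database. The function names are hypothetical, and the sketch applies the `{environmentname}_pub_db` and `{environmentname}_sub_db` patterns literally; any normalization of the environment name is not covered here.

```python
def project_databases(environment_name: str) -> dict:
    """Return the publishing and subscription database names."""
    return {
        "publishing": f"{environment_name}_pub_db",
        "subscription": f"{environment_name}_sub_db",
    }


def preview_query(environment_name: str, table: str, limit: int = 10) -> str:
    """Build a SELECT statement that previews a subscribed table."""
    database = project_databases(environment_name)["subscription"]
    # Double quotes delimit identifiers in Athena SELECT statements.
    return f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'


if __name__ == "__main__":
    print(preview_query("sales", "orders", 5))
```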

### Create new tables


If you have connected to an external Amazon S3 bucket, you can use Amazon Athena to query and analyze the assets in that bucket. In this scenario, Amazon DataZone doesn't have permissions to grant access directly to the underlying data. Amazon S3 data that is created outside the project is not automatically managed in Lake Formation and can't be managed by Amazon DataZone. An alternative is to copy the data from the external Amazon S3 bucket to a new table inside the project’s Amazon S3 bucket using a `CREATE TABLE` statement in Amazon Athena. When you run a `CREATE TABLE` query in Amazon Athena, you register your table with the AWS Glue Data Catalog.

To specify the path to your data in Amazon S3, use the `LOCATION` property, as shown in the following example:

```
CREATE EXTERNAL TABLE `test_table`(
...
)
ROW FORMAT ...
STORED AS INPUTFORMAT ...
OUTPUTFORMAT ...
LOCATION 's3://bucketname/folder/'
```

For more information, see [Table location in Amazon S3](https://docs.aws.amazon.com/athena/latest/ug/tables-location-format.html).

### Create a table from query results (CTAS) from an external S3 bucket


When you subscribe to an asset, access to the underlying data is read-only. You can use Amazon Athena to create a copy of the table. In Amazon Athena, a `CREATE TABLE AS SELECT` (CTAS) query creates a new table from the results of a `SELECT` statement. For information about the CTAS syntax, see [CREATE TABLE AS](https://docs.aws.amazon.com/athena/latest/ug/create-table-as.html). 

The following example creates a table by copying all columns from a table:

```
CREATE TABLE new_table AS
SELECT *
FROM old_table;
```

In the following variation of the same example, your `SELECT` statement also includes a `WHERE` clause. In this case, the query selects only those rows from the table that satisfy the `WHERE` clause:

```
CREATE TABLE new_table AS
SELECT *
FROM old_table WHERE condition;
```

The following example creates a new table from a set of columns in another table:

```
CREATE TABLE new_table AS
SELECT column_1, column_2, ... column_n
FROM old_table;
```

This variation of the same example creates a new table from specific columns from multiple tables:

```
CREATE TABLE new_table AS
SELECT column_1, column_2, ... column_n
FROM old_table_1, old_table_2, ... old_table_n;
```

These newly created tables are now part of your project’s AWS Glue database. You can make them discoverable and share them with other Amazon DataZone projects by publishing the data as an asset to the Amazon DataZone catalog. 
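
If you generate CTAS statements from a script, the variations above can be produced by a single helper. The following is a minimal sketch under stated assumptions: `build_ctas` is a hypothetical helper rather than part of any AWS SDK, and it accepts only simple identifiers (letters, digits, and underscores) so that it never has to quote names.

```python
import re
from typing import Optional, Sequence

# Accept only simple identifiers so that no quoting is needed.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _check(name: str) -> str:
    """Return the name unchanged, or raise if it is not a simple identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"unsupported identifier: {name!r}")
    return name


def build_ctas(
    new_table: str,
    old_table: str,
    columns: Optional[Sequence[str]] = None,
    where: Optional[str] = None,
) -> str:
    """Build a CREATE TABLE AS SELECT statement as a single-line string."""
    select_list = ", ".join(_check(c) for c in columns) if columns else "*"
    sql = (
        f"CREATE TABLE {_check(new_table)} AS "
        f"SELECT {select_list} FROM {_check(old_table)}"
    )
    if where:
        # The condition is inserted verbatim; pass trusted text only.
        sql += f" WHERE {where}"
    return sql


if __name__ == "__main__":
    print(build_ctas("new_table", "old_table", ["column_1", "column_2"]))
```

The helper only builds the statement text. Running it is left to whichever client you already use to submit queries to Amazon Athena.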

## Query data using Amazon Redshift


In the Amazon DataZone data portal, open an environment that uses the data warehouse blueprint. Choose the **Amazon Redshift** link in the right-hand panel on the environment page. This opens a confirmation dialog with the details that you need to establish a connection to your environment’s Amazon Redshift cluster or Amazon Redshift Serverless workgroup in the Amazon Redshift query editor v2.0. Once you have identified the necessary details to establish the connection, choose the **Open Amazon Redshift** button. This opens the Amazon Redshift query editor v2.0 in a new tab in the browser using temporary credentials of the Amazon DataZone environment. 

In the query editor, follow the steps below depending on whether your environment is using an Amazon Redshift Serverless workgroup or an Amazon Redshift cluster. 

For an Amazon Redshift Serverless workgroup:

1. In the query editor, identify your Amazon DataZone environment’s Amazon Redshift Serverless workgroup, right-click it, and choose **Create a connection**. 

1. Choose **Federated User** for authentication.

1. Provide the name of the Amazon DataZone environment's database. 

1. Choose **Create connection**.

For an Amazon Redshift cluster:

1. In the query editor, identify your Amazon DataZone environment’s Amazon Redshift cluster, right-click it, and choose **Create a connection**. 

1. Select **Temporary credentials using your IAM identity** for authentication. 

1. If the above authentication method is not available, open **Account settings** by choosing the gear button in the bottom left corner, choose **Authenticate with IAM credentials** and save. This is a one-time-only setting.

1. Provide the name of the Amazon DataZone environment’s database to create the connection. 

1. Choose **Create connection**.

Now you can start querying against the tables and views within the Amazon Redshift cluster or Amazon Redshift Serverless workgroup configured for your Amazon DataZone environment. 

Any Amazon Redshift tables or views that you have subscribed to are linked to the Amazon Redshift cluster or Amazon Redshift Serverless workgroup that is configured for the environment. You can subscribe to the tables and views as well as publish any new tables and views that you create in your environment’s cluster or database.

For example, consider an environment that is linked to an Amazon Redshift cluster called `redshift-cluster-1` and a database called `dev` in that cluster. Using the Amazon DataZone data portal, you can query the tables and views that are added to your environment. Under the **Analytics tools** section in the right-hand side pane of the data portal, choose the Amazon Redshift link for this environment, which opens the query editor. You can then right-click the `redshift-cluster-1` cluster and create a connection using **Temporary credentials using your IAM identity**. Once the connection is established, you can see all the tables and views to which your environment has access under the `dev` database.

# Metadata enforcement rules for subscription requests


The metadata enforcement rules for subscription requests feature in Amazon DataZone strengthens data governance by enabling domain unit owners to establish clear metadata requirements for data consumers, which also streamlines access requests. This feature enables organizations to align with their own metadata standards, implement custom workflows, and provide a consistent, governed data access experience. 

The feature is supported in all the AWS commercial Regions where Amazon DataZone is currently available.

Domain unit owners can complete the following procedure to configure metadata enforcement in Amazon DataZone:

1. Navigate to the Amazon DataZone data portal using the data portal URL and log in using your SSO or AWS credentials. If you’re an Amazon DataZone administrator, you can obtain the data portal URL by accessing the Amazon DataZone console at https://console.aws.amazon.com/datazone in the AWS account where the Amazon DataZone domain was created.

1. Choose **Domains**, navigate to the **Domain units** tab and choose the domain unit that you want to work with.

1. Choose the **Rules** tab and then choose **Add**.

1. On the **Create required metadata form rule** page, do the following and then choose **Add rule**:
   + Specify a name for your rule.
   + Under **Action**, choose **Subscription request**.
   + Under **Required forms**, choose **Add metadata form**, choose a metadata form within the domain / domain unit that you want to add to this rule, and then choose **Add**. You can add up to 5 metadata forms per rule.
   + Under **Scope**, specify with which data entities you want to associate these forms. You can choose data products and/or data assets.
   + Under **Data asset types**, specify whether the rule applies across all asset types or limit it to selected asset types. 
   + Under **Projects**, specify whether the required forms will be associated with data products and/or assets published by all projects or only selected projects in this domain unit. Also, check **Cascade rule to child domain units** if you want child domain units to inherit this requirement. 
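
If you want to review rule settings before entering them in the data portal, you can model them locally. The following sketch is purely illustrative: the `RequiredMetadataRule` class and its field names are hypothetical and do not correspond to an Amazon DataZone API. It only mirrors the choices in the procedure above and checks the limit of 5 metadata forms per rule.

```python
from dataclasses import dataclass, field
from typing import List

MAX_FORMS_PER_RULE = 5  # limit stated in the procedure above


@dataclass
class RequiredMetadataRule:
    """Local, hypothetical model of a required metadata form rule."""

    name: str
    required_forms: List[str]
    action: str = "Subscription request"
    scope: List[str] = field(
        default_factory=lambda: ["data products", "data assets"]
    )
    cascade_to_child_domain_units: bool = False

    def validate(self) -> None:
        """Raise ValueError if the rule breaks a documented constraint."""
        if not self.name:
            raise ValueError("a rule needs a name")
        if len(self.required_forms) > MAX_FORMS_PER_RULE:
            raise ValueError(
                f"at most {MAX_FORMS_PER_RULE} metadata forms per rule"
            )


if __name__ == "__main__":
    rule = RequiredMetadataRule(name="access-review", required_forms=["form-a"])
    rule.validate()
    print(rule)
```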

Once metadata enforcement is configured, data consumers can complete the following procedure to request access:

1. Navigate to the Amazon DataZone data portal using the data portal URL and log in using your SSO or AWS credentials. If you’re an Amazon DataZone administrator, you can obtain the data portal URL by accessing the Amazon DataZone console at https://console.aws.amazon.com/datazone in the AWS account where the Amazon DataZone domain was created.

1. Use the search bar to search for and choose the asset to which you want to subscribe, and then choose **Subscribe**. 

1. In the **Subscribe** pop up window, provide the following information:
   + The project that you want to subscribe to the asset.
   + A short justification for your subscription request.
   + Complete Required Metadata - specify the required metadata fields as specified by the domain unit. If mandatory fields are incomplete, they are highlighted, and submission is disabled until resolved. Once all the mandatory fields are entered, select **Apply**.

1. Select **Request** to submit the subscription request. After submitting, an event is generated in EventBridge, which can be used in custom workflows outside of Amazon DataZone as needed. You receive a notification in the data portal when the publisher approves your request.

Data producers can complete the following procedure to approve the subscription request:

**To approve or reject a subscription request**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. In the data portal, choose **Browse projects list** and select the project that contains the asset with the subscription request.

1. Navigate to the **Data** tab, then choose **Incoming requests** from the left navigation pane.

1. Locate the request and choose **View request**. You can filter by **Pending** to see only requests that are still open.

1. Review the subscription request and reason for access, and decide whether to approve or reject it.

   Data producers can review the provided metadata, including document links and account IDs, to determine if the request meets compliance and workflow requirements before granting access.

1. To approve, select between the two options:
   + **Full access**: If you choose to approve the subscription with full access option, the subscriber will get access to all the rows and columns in your data asset. 
   + **Approve with row and column filters**: To limit access to specific rows and columns of data, you can choose the option to approve with row and column filters. For more information, see [Fine-grained access control to data in Amazon DataZone](fine-grained-access-control.md). 
     + Select **Choose filters**, and then from the drop down select one or more available filters you want to apply to the subscription. 
     + To create a new filter, choose **Create new filter**, which opens a new page where you can create a row or column filter. For more information, see [Create column filters in Amazon DataZone](create-column-filter.md) and [Create row filters in Amazon DataZone](create-row-filter.md).

1. (Optional) Enter a response that explains your reason for accepting or rejecting the request.

1. Choose either **Approve** or **Reject**.

# Analyze Amazon DataZone subscribed data with external analytics applications via JDBC connection
Analyze your subscribed data with external analytics applications via JDBC connection

Amazon DataZone enables data consumers to easily locate and subscribe to data from multiple sources within a single project and analyze this data using Amazon Athena, Amazon Redshift Query Editor, and Amazon SageMaker.

Amazon DataZone also supports authentication via the Athena JDBC driver that enables users to query their subscribed Amazon DataZone data using popular external SQL and analytics tools, such as SQL Workbench, DBeaver, Tableau, Domino, Power BI and many others. Users can authenticate using their corporate credentials through SSO or IAM and begin analyzing their subscribed data within their Amazon DataZone projects.

Amazon DataZone's support of the Athena JDBC driver provides the following benefits:
+ Greater tool choice for querying and visualization - data consumers can connect to Amazon DataZone using their preferred tools from a wide range of analytics tools that support a JDBC connection. This enables them to continue using the software they are familiar with without the need to learn new tools for data consumption. 
+ Programmatic access - a JDBC connection to access-governed data via servers or custom applications enables data consumers to perform automated and more complex data operations.

You can use your JDBC URL to connect your external analytics tools to your Amazon DataZone subscribed data. To obtain your JDBC URL, perform the following procedure:

**Important**  
In the current release, Amazon DataZone supports authentication using the Amazon Athena JDBC Driver. To complete this procedure, make sure that you have downloaded and installed the latest [Athena JDBC driver](https://docs.aws.amazon.com/athena/latest/ug/jdbc-v3-driver.html) for your analytics application of choice. 

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. In the Amazon DataZone data portal, choose **Browse Projects List** and then find and choose the project where you have the data that you want to analyze.

1. In the right-hand side panel on the project's home page, choose **Connect with JDBC**.

1. In the **JDBC parameters** pop up window, choose your authentication method (SSO credentials or IAM credentials) and then copy the string or the individual parameters of the JDBC URL. You can then use it to connect to your external analytics application. 

When you connect your external analytics application to Amazon DataZone using your JDBC URL or parameters, you invoke the `RedeemAccessToken` API. The `RedeemAccessToken` API exchanges an Identity Center access token for the `AmazonDataZoneDomainExecutionRole` credentials, which are used to call the `GetEnvironmentCredentials` API.

For more information about the authentication mechanism that uses IAM credentials to connect to Amazon DataZone-governed data in Athena, see [DataZone IAM Credentials Provider](https://docs.aws.amazon.com/athena/latest/ug/jdbc-v3-driver-datazone-iamcp.html). For more information about the authentication mechanism that enables connecting to Amazon DataZone-governed data in Athena using IAM Identity Center, see [DataZone Idc Credentials Provider](https://docs.aws.amazon.com/athena/latest/ug/jdbc-v3-driver-datazone-idc.html).

## RedeemAccessToken API Reference


**Request syntax**

```
POST /sso/redeem-token HTTP/1.1
Content-type: application/json

{
   "domainId": "string",
   "accessToken": "string"
}
```

**Request parameters**

The request uses the following parameters.

**domainId**  
The ID of the Amazon DataZone domain.  
Type: string  
Pattern: `^dzd[-_][a-zA-Z0-9_-]{1,36}$`  
Required: yes

**accessToken**  
The Identity Center access token.  
Type: string  
Required: yes
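
The request shape above can be sketched with the Python standard library. This is an illustration under stated assumptions, not a client for the service: `build_redeem_request` is a hypothetical helper, the base URL is a placeholder because this section does not give the endpoint host, and any authentication headers that the service requires are left to the caller. The sketch assumes that domain IDs match `^dzd[-_][a-zA-Z0-9_-]{1,36}$`.

```python
import json
import re
import urllib.request

# Assumed domain ID pattern.
DOMAIN_ID_PATTERN = re.compile(r"^dzd[-_][a-zA-Z0-9_-]{1,36}$")


def build_redeem_request(
    base_url: str, domain_id: str, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) a RedeemAccessToken request."""
    if not DOMAIN_ID_PATTERN.fullmatch(domain_id):
        raise ValueError(f"invalid domain ID: {domain_id!r}")
    if not access_token:
        raise ValueError("accessToken is required")

    body = json.dumps(
        {"domainId": domain_id, "accessToken": access_token}
    ).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/sso/redeem-token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # The host below is a placeholder, not a real service endpoint.
    request = build_redeem_request(
        "https://datazone.example.com", "dzd_abc123", "example-token"
    )
    print(request.get_method(), request.full_url)
```

Validating the domain ID before building the request lets a malformed value fail locally instead of surfacing later as a `ValidationException`.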

**Response syntax**

```
HTTP/1.1 200
Content-type: application/json

{
   "credentials": AwsCredentials
}
```

**Response elements**

**credentials**  
The `AmazonDataZoneDomainExecutionRole` credentials that are used to call the `GetEnvironmentCredentials` API.  
Type: `AwsCredentials` object. This data type includes the following properties:  
+ accessKeyId: AccessKeyId
+ secretAccessKey: SecretAccessKey
+ sessionToken: SessionToken
+ expiration: Timestamp
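
A caller typically parses this response and checks the expiration before using the credentials. The following sketch assumes that `credentials` is a single JSON object, as in the response syntax, and that `expiration` is serialized as epoch seconds. Both are assumptions to verify against a real response, and the class and function names are hypothetical.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class AwsCredentials:
    access_key_id: str
    secret_access_key: str
    session_token: str
    expiration: datetime

    def is_expired(
        self,
        now: Optional[datetime] = None,
        skew: timedelta = timedelta(minutes=1),
    ) -> bool:
        """Treat credentials as expired slightly early to allow for skew."""
        now = now or datetime.now(timezone.utc)
        return now + skew >= self.expiration


def parse_redeem_response(body: str) -> AwsCredentials:
    """Parse the JSON body of a successful RedeemAccessToken response."""
    creds = json.loads(body)["credentials"]
    return AwsCredentials(
        access_key_id=creds["accessKeyId"],
        secret_access_key=creds["secretAccessKey"],
        session_token=creds["sessionToken"],
        # Assumption: the timestamp is expressed in epoch seconds.
        expiration=datetime.fromtimestamp(
            float(creds["expiration"]), tz=timezone.utc
        ),
    )
```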

**Errors**

**AccessDeniedException**  
You do not have sufficient access to perform this action.  
HTTP Status Code: 403

**ResourceNotFoundException**  
The specified resource cannot be found.  
HTTP Status Code: 404

**ValidationException**  
The input fails to satisfy the constraints specified by the AWS service.  
HTTP Status Code: 400

**InternalServerException**  
The request has failed because of an unknown error, exception or failure.  
HTTP Status Code: 500
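
Client code can map the documented status codes back to these error names. The following is a minimal sketch: the helper names are hypothetical, and treating only `InternalServerException` as retryable is an assumption made for the example, not documented service behavior.

```python
from typing import Optional

# Status codes and error names as listed in this section.
ERRORS_BY_STATUS = {
    400: "ValidationException",
    403: "AccessDeniedException",
    404: "ResourceNotFoundException",
    500: "InternalServerException",
}


def classify_status(status: int) -> Optional[str]:
    """Return the documented error name, or None for other status codes."""
    return ERRORS_BY_STATUS.get(status)


def is_retryable(status: int) -> bool:
    """Assumption: retry only failures that are not caused by the request."""
    return classify_status(status) == "InternalServerException"


if __name__ == "__main__":
    for code in (200, 400, 403, 404, 500):
        print(code, classify_status(code), is_retryable(code))
```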