

# Data inventory and publishing in Amazon DataZone

This section describes the tasks and procedures that you can perform to create an inventory of your data in Amazon DataZone and to publish that data in Amazon DataZone.

To use Amazon DataZone to catalog your data, you must first bring your data (assets) into your project's inventory in Amazon DataZone. Creating inventory for a particular project makes the assets discoverable only to that project's members. Project inventory assets are not available to all domain users in search and browse unless they are explicitly published. After creating a project inventory, data owners can curate their inventory assets with the required business metadata by adding or updating business names (asset and schema), descriptions (asset and schema), readmes, glossary terms (asset and schema), and metadata forms. 

The next step in using Amazon DataZone to catalog your data is to make your project's inventory assets discoverable by domain users. You do this by publishing the inventory assets to the Amazon DataZone catalog. Only the latest version of an inventory asset can be published to the catalog, and only the latest published version is active in the discovery catalog. If an inventory asset is updated after it has been published to the Amazon DataZone catalog, you must explicitly publish it again for the latest version to appear in the discovery catalog. 
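Re-publishing after an update can also be done programmatically. The following sketch builds the request for the DataZone `CreateListingChangeSet` API action, which publishes an asset to the catalog; the identifiers are hypothetical, and omitting the revision targets the asset's latest revision:

```
import json

def build_publish_request(domain_id, asset_id, revision=None):
    """Build a CreateListingChangeSet request that publishes an asset.

    When entityRevision is omitted, the latest revision of the asset
    is published.
    """
    request = {
        "domainIdentifier": domain_id,
        "entityIdentifier": asset_id,
        "entityType": "ASSET",
        "action": "PUBLISH",
    }
    if revision is not None:
        request["entityRevision"] = revision
    return request

# Hypothetical identifiers; with the AWS SDK for Python, this request
# could be passed to boto3.client("datazone").create_listing_change_set(**request).
print(json.dumps(build_publish_request("dzd_4bywm0cja1dbb", "asset-1234"), indent=2))
```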

For more information, see [Amazon DataZone terminology and concepts](datazone-concepts.md).

**Topics**
+ [Configure Lake Formation permissions for Amazon DataZone](lake-formation-permissions-for-datazone.md)
+ [Create custom asset types in Amazon DataZone](create-asset-types.md)
+ [Create and run an Amazon DataZone data source for the AWS Glue Data Catalog](create-glue-data-source.md)
+ [Create and run an Amazon DataZone data source for Amazon Redshift](create-redshift-data-source.md)
+ [Edit a data source in Amazon DataZone](edit-data-source.md)
+ [Delete a data source in Amazon DataZone](delete-data-source.md)
+ [Publish assets to the Amazon DataZone catalog from the project inventory](publishing-data-asset.md)
+ [Manage inventory and curate assets in Amazon DataZone](update-metadata.md)
+ [Manually create an asset in Amazon DataZone](create-data-asset-manually.md)
+ [Unpublish an asset from the Amazon DataZone catalog](archive-data-asset.md)
+ [Delete an Amazon DataZone asset](delete-data-asset.md)
+ [Manually start a data source run in Amazon DataZone](manually-start-data-source-run.md)
+ [Asset revisions in Amazon DataZone](asset-versioning.md)
+ [Data quality in Amazon DataZone](datazone-data-quality.md)
+ [Using machine learning and generative AI in Amazon DataZone](autodoc.md)
+ [Data lineage in Amazon DataZone](datazone-data-lineage.md)
+ [Metadata enforcement rules for publishing](metadata-rules-publishing.md)

# Configure Lake Formation permissions for Amazon DataZone


When you create an environment using the built-in data lake blueprint (**DefaultDataLake**), an AWS Glue database is added in Amazon DataZone as part of this environment's creation process. If you want to publish assets from this AWS Glue database, no additional permissions are needed. 

However, if you want to publish assets and subscribe to assets from an AWS Glue database that exists outside of your Amazon DataZone environment, you must explicitly grant Amazon DataZone permissions to access tables in this external AWS Glue database. To do this, configure the following settings in AWS Lake Formation and attach the necessary Lake Formation permissions to the [AmazonDataZoneGlueAccess-<region>-<domainId>](glue-manage-access-role.md) role.
+ Configure the Amazon S3 location for your data lake in AWS Lake Formation with **Lake Formation** permission mode or **Hybrid access mode**. For more information, see [https://docs.aws.amazon.com/lake-formation/latest/dg/register-data-lake.html](https://docs.aws.amazon.com/lake-formation/latest/dg/register-data-lake.html).
+ Remove the `IAMAllowedPrincipals` permission from the AWS Lake Formation tables for which Amazon DataZone handles permissions. For more information, see [https://docs.aws.amazon.com/lake-formation/latest/dg/upgrade-glue-lake-formation-background.html](https://docs.aws.amazon.com/lake-formation/latest/dg/upgrade-glue-lake-formation-background.html).
+ Attach the following AWS Lake Formation permissions to the [AmazonDataZoneGlueAccess-<region>-<domainId>](glue-manage-access-role.md):
  + `Describe` and `Describe grantable` permissions on the database where the tables exist
  + `Describe`, `Select`, `Describe Grantable`, and `Select Grantable` permissions on all the tables in this database for which you want Amazon DataZone to manage access on your behalf
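As an illustration, these grants can be expressed as AWS Lake Formation `GrantPermissions` requests. The following sketch builds the request payloads; the role ARN, database, and table names are placeholders:

```
import json

# Placeholder ARN for the AmazonDataZoneGlueAccess-<region>-<domainId> role.
ROLE_ARN = "arn:aws:iam::111122223333:role/service-role/AmazonDataZoneGlueAccess-us-east-1-dzd_example"

def database_grant(role_arn, database):
    """Describe and Describe-grantable permissions on the database."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Database": {"Name": database}},
        "Permissions": ["DESCRIBE"],
        "PermissionsWithGrantOption": ["DESCRIBE"],
    }

def table_grant(role_arn, database, table):
    """Describe/Select (and grantable) permissions on a managed table."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": role_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["DESCRIBE", "SELECT"],
        "PermissionsWithGrantOption": ["DESCRIBE", "SELECT"],
    }

# With the AWS SDK for Python, each payload could be passed to
# boto3.client("lakeformation").grant_permissions(**grant).
for grant in (database_grant(ROLE_ARN, "sales_db"),
              table_grant(ROLE_ARN, "sales_db", "orders")):
    print(json.dumps(grant))
```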

**Note**  
Amazon DataZone supports AWS Lake Formation hybrid mode. Hybrid mode enables you to start managing permissions on your AWS Glue databases and tables through Lake Formation, while continuing to maintain any existing IAM permissions on these tables and databases. For more information, see [Amazon DataZone integration with AWS Lake Formation hybrid mode](hybrid-mode.md).

For more information, see [Troubleshooting AWS Lake Formation permissions for Amazon DataZone](troubleshooting-datazone.md#troubleshooting-lake-formation-permissions). 

# Amazon DataZone integration with AWS Lake Formation hybrid mode


Amazon DataZone is integrated with AWS Lake Formation hybrid mode. This integration enables you to easily publish and share your AWS Glue tables through Amazon DataZone without the need to register them in AWS Lake Formation first. Hybrid mode allows you to start managing permissions on your AWS Glue tables through AWS Lake Formation while continuing to maintain any existing IAM permissions on these tables. 

To get started, you can enable the **Data location registration** setting under the **DefaultDataLake** blueprint in the Amazon DataZone management console.

**Enable integration with AWS Lake Formation hybrid mode**

1. Navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with your account credentials.

1. Choose **View domains** and choose the domain where you want to enable the integration with AWS Lake Formation hybrid mode.

1. On the domain details page, navigate to the **Blueprints** tab.

1. From the **Blueprints** list, choose the **DefaultDataLake** blueprint. 

1. Make sure that the DefaultDataLake blueprint is enabled. If it’s not enabled, follow the steps in [Enable built-in blueprints in the AWS account that owns the Amazon DataZone domain](working-with-blueprints.md#enable-default-blueprint) to enable it in your AWS Account.

1. On the DefaultDataLake details page, open the **Provisioning** tab and choose the **Edit** button in the top right corner of the page. 

1. Under **Data location registration**, check the box to enable the data location registration.

1. For the data location management role, you can create a new IAM role or select an existing IAM role. Amazon DataZone uses this role to manage read/write access to the chosen Amazon S3 bucket(s) for Data Lake using AWS Lake Formation hybrid access mode. For more information, see [AmazonDataZoneS3Manage-<region>-<domainId>](AmazonDataZoneS3Manage.md). 

1. Optionally, you can choose to exclude certain Amazon S3 locations if you do not want Amazon DataZone to automatically register them in hybrid mode. For this, complete the following steps:
   + Choose the toggle button to exclude specified Amazon S3 locations.
   + Provide the URI of the Amazon S3 bucket you want to exclude. 
   + To add additional buckets, choose **Add S3 location**.
**Note**  
Amazon DataZone only allows excluding a root S3 location. Any S3 locations within the path of a root S3 location will be automatically excluded from registration. 
   + Choose **Save changes**.
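The root-location behavior described in the note can be sketched with a small helper; the bucket names below are hypothetical:

```
def is_excluded(s3_uri, excluded_roots):
    """Return True if s3_uri equals or falls under any excluded root location."""
    for root in excluded_roots:
        root = root.rstrip("/")
        if s3_uri == root or s3_uri.startswith(root + "/"):
            return True
    return False

excluded = ["s3://finance-bucket/raw"]
print(is_excluded("s3://finance-bucket/raw/2024/data.parquet", excluded))  # True
print(is_excluded("s3://finance-bucket/curated/data.parquet", excluded))   # False
```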

 Once you have enabled the data location registration setting in your AWS account, when a data consumer subscribes to an AWS Glue table managed through IAM permissions, Amazon DataZone will first register the Amazon S3 locations of this table in hybrid mode, and then grant access to the data consumer by managing permissions on the table through AWS Lake Formation. This ensures that IAM permissions on the table continue to exist with newly granted AWS Lake Formation permissions, without disrupting any existing workflows.

## How to handle encrypted Amazon S3 locations when enabling AWS Lake Formation hybrid mode integration in Amazon DataZone


If you are using an Amazon S3 location encrypted with a customer managed or AWS managed KMS key, the **AmazonDataZoneS3Manage** role must have permissions to encrypt and decrypt data with the KMS key, or the KMS key policy must grant the role permissions on the key. 

If your Amazon S3 location is encrypted with an AWS managed key, add the following inline policy to the **AmazonDataZoneDataLocationManagement** role:

------
#### [ JSON ]

****  

```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:Encrypt",
                "kms:Decrypt",
                "kms:ReEncrypt*",
                "kms:GenerateDataKey*",
                "kms:DescribeKey"
            ],
            "Resource": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab"
        }
    ]
}
```

------
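If you prefer to script this step, the same inline policy can be attached with the IAM `PutRolePolicy` API. The following is a sketch using the AWS SDK for Python; the role name, policy name, and key ARN are placeholders:

```
import json

# Same policy document as above, with placeholder account and key values.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        "Resource": "arn:aws:kms:us-east-1:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
    }],
}

# With the AWS SDK for Python, this could be attached as follows:
# boto3.client("iam").put_role_policy(
#     RoleName="AmazonDataZoneDataLocationManage-us-east-1-dzd_example",  # placeholder
#     PolicyName="DataZoneKmsAccess",                                     # hypothetical name
#     PolicyDocument=json.dumps(policy),
# )
print(json.dumps(policy, indent=2))
```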

If your Amazon S3 location is encrypted with a customer managed key, do the following:

1. Open the AWS KMS console at [https://console.aws.amazon.com/kms](https://console.aws.amazon.com/kms) and log in as an AWS Identity and Access Management (IAM) administrative user or as a user who can modify the key policy of the KMS key used to encrypt the location.

1. In the navigation pane, choose **Customer managed keys**, and then choose the name of the desired KMS key.

1. On the KMS key details page, choose the **Key policy** tab, and then do one of the following to add your custom role or the Lake Formation service-linked role as a KMS key user:
   + If the default view is showing (with Key administrators, Key deletion, Key users, and Other AWS accounts sections) – under the **Key users** section, add the **AmazonDataZoneDataLocationManagement** role.
   + If the key policy (JSON) is showing – edit the policy to add **AmazonDataZoneDataLocationManagement** role to the object "Allow use of the key," as shown in the following example

     ```
     ...
             {
                 "Sid": "Allow use of the key",
                 "Effect": "Allow",
                 "Principal": {
                     "AWS": [
                         "arn:aws:iam::111122223333:role/service-role/AmazonDataZoneDataLocationManage-<region>-<domain-id>"
                     ]
                 },
                 "Action": [
                     "kms:Encrypt",
                     "kms:Decrypt",
                     "kms:ReEncrypt*",
                     "kms:GenerateDataKey*",
                     "kms:DescribeKey"
                 ],
                 "Resource": "*"
             },
             ...
     ```

**Note**  
If the KMS key or Amazon S3 location are not in the same AWS account as the data catalog, follow the instructions in [Registering an encrypted Amazon S3 location across AWS accounts](https://docs.aws.amazon.com/lake-formation/latest/dg/register-cross-encrypted.html).

# Create custom asset types in Amazon DataZone

In Amazon DataZone, assets represent specific types of data resources, such as database tables, dashboards, or machine learning models. To provide consistency and standardization when describing catalog assets, an Amazon DataZone domain must have a set of asset types that define how assets are represented in the catalog. An asset type defines the schema for a specific type of asset and has a set of required and optional named metadata form types (for example, govForm or GovernanceFormType). Asset types in Amazon DataZone are versioned. When assets are created, they are validated against the schema defined by their asset type (typically the latest version); if an invalid structure is specified, asset creation fails. 

**System asset types** - Amazon DataZone provisions service-owned system asset types (including GlueTableAssetType, GlueViewAssetType, RedshiftTableAssetType, RedshiftViewAssetType, and S3ObjectCollectionAssetType) and system form types (including DataSourceReferenceFormType, AssetCommonDetailsFormType, and SubscriptionTermsFormType). System asset types cannot be edited. 

**Custom asset types** - To create custom asset types, you start by creating the required metadata form types and the glossaries used in those form types. You can then create custom asset types by specifying a name, a description, and the associated metadata forms, which can be required or optional. 

For asset types with structured data, to represent the column schema in the data portal, you can use the `RelationalTableFormType` to add technical metadata to your columns (including column names, descriptions, and data types) and the `ColumnBusinessMetadataFormType` to add business descriptions of the columns (including business names, glossary terms, and custom key-value pairs). 
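As an illustration, form content of this kind is passed to the APIs as a JSON string. The following sketch builds a hypothetical column schema payload; the exact field names accepted by `RelationalTableFormType` are defined by the form type's model:

```
import json

# Hypothetical column metadata for a relational table form payload.
relational_table_form = {
    "tableName": "customer_orders",
    "columns": [
        {"columnName": "order_id", "dataType": "bigint",
         "columnDescription": "Unique identifier of the order"},
        {"columnName": "order_total", "dataType": "decimal(10,2)",
         "columnDescription": "Order amount in USD"},
    ],
}

# Form content is passed to the Amazon DataZone APIs as a JSON string.
content = json.dumps(relational_table_form)
print(content)
```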

To create a custom asset type via the data portal, complete the following steps:

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project where you want to create a custom asset type.

1. Navigate to the **Data** tab for the project.

1. Choose **Asset types** from the left navigation pane, then choose **Create asset type**.

1. Specify the following, and then choose **Create**.
   + **Name** – the name of the custom asset type.
   + **Description** – the description of the custom asset type.
   + Choose **Add metadata forms** to add metadata forms to this custom asset type.

1. Once the custom asset type is created, you can use it to create assets.

To create a custom asset type via the APIs, complete the following steps:

1. Create a metadata form type by invoking the `CreateFormType` API action.

   The following is an Amazon SageMaker example:

   ```
   m_model = "
   
   structure SageMakerModelFormType {
      @required
      @amazon.datazone#searchable
      modelName: String
   
      @required
      modelArn: String
   
      @required
      creationTime: String
   }
   "
   
   CreateFormType(
       domainIdentifier="my-dz-domain",
       owningProjectIdentifier="d4bywm0cja1dbb",
       name="SageMakerModelFormType",
       model=m_model
       status="ENABLED"
       )
   ```

1. Next, create an asset type by invoking the `CreateAssetType` API action. Via the Amazon DataZone APIs, you can create asset types using the available system form types (`SubscriptionTermsFormType` in the example below) or your custom form types. For system form types, the type name must begin with `amazon.datazone`.

   ```
   CreateAssetType(
       domainIdentifier="my-dz-domain",
       owningProjectIdentifier="d4bywm0cja1dbb",
       name="SageMakerModelAssetType",
       formsInput={
           "SageMakerModelForm": {
               "typeIdentifier": "SageMakerModelFormType",
               "typeRevision": 7,
               "required": True,
           },
           "SubscriptionTerms": {
               "typeIdentifier": "amazon.datazone.SubscriptionTermsFormType",
               "typeRevision": 1,
               "required": False,
           },
       },
   )
   ```

   The following is an example for creating an asset type for structured data:

   ```
   CreateAssetType(
       domainIdentifier="my-dz-domain",
       owningProjectIdentifier="d4bywm0cja1dbb",
       name="OnPremMySQLAssetType",
       formsInput={
           "OnpremMySQLForm": {
               "typeIdentifier": "OnpremMySQLFormType",
               "typeRevision": 5,
               "required": True,
           },
           "RelationalTableForm": {
               "typeIdentifier": "amazon.datazone.RelationalTableFormType",
               "typeRevision": 1,
               "required": True,
           },
           "ColumnBusinessMetadataForm": {
               "typeIdentifier": "amazon.datazone.ColumnBusinessMetadataFormType",
               "typeRevision": 1,
               "required": False,
           },
           "SubscriptionTerms": {
               "typeIdentifier": "amazon.datazone.SubscriptionTermsFormType",
               "typeRevision": 1,
               "required": False,
           },
       },
   )
   ```

1. Now you can create an asset using the custom asset types that you created in the previous steps.

   ```
   CreateAsset(
      domainIdentifier="my-dz-domain",
      owningProjectIdentifier="d4bywm0cja1dbb",
      typeIdentifier="SageMakerModelAssetType",
      name="MyModelAsset",
      glossaryTerms="xxx",
      formsInput=[{
           "formName": "SageMakerModelForm",
           "typeIdentifier": "SageMakerModelFormType",
           "content": "{\n \"ModelName\" : \"sample-ModelName\",\n \"ModelArn\" : \"999999911111\",\n \"CreationTime\" : \"2025-01-01 18:00:00.000\"}"
           }
           ]
   )
   ```
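   Hand-escaping the `content` string, as in the example above, is error-prone. The following sketch builds it with `json.dumps` instead (field names follow the `SageMakerModelFormType` defined earlier):

   ```
   import json

   form_content = {
       "modelName": "sample-ModelName",
       "modelArn": "999999911111",
       "creationTime": "2025-01-01 18:00:00.000",
   }

   formsInput = [{
       "formName": "SageMakerModelForm",
       "typeIdentifier": "SageMakerModelFormType",
       "content": json.dumps(form_content),  # correctly escaped JSON string
   }]
   print(formsInput[0]["content"])
   ```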

   The following example creates a structured data asset:

   ```
   CreateAsset(
      domainIdentifier="my-dz-domain",
      owningProjectIdentifier="d4bywm0cja1dbb",
      typeIdentifier="OnPremMySQLAssetType",
      name="MyModelAsset",
      glossaryTerms="xxx",
      formsInput=[{
           "formName": "RelationalTableForm",
           "typeIdentifier": "amazon.datazone.RelationalTableFormType",
           "content": ".."
           },
           {
           "formName": "OnpremMySQLForm",
           "typeIdentifier": "OnpremMySQLFormType",
           "content": ".."
           },
           {
           "formName": "mySQLTableForm",
           "typeIdentifier": "MySQLTableFormType",
           "typeRevision": "1",
           "content": ".."
           },
           {
           "formName": "AssetCommonDetailsForm",
           "typeIdentifier": "amazon.datazone.AssetCommonDetailsFormType",
           "content": "..."
           }, 
           .....
           ]
   )
   ```

# Create and run an Amazon DataZone data source for the AWS Glue Data Catalog

In Amazon DataZone, you can create an AWS Glue Data Catalog data source in order to import technical metadata of database tables from AWS Glue. To add a data source for the AWS Glue Data Catalog, the source database must already exist in AWS Glue. 

When you create and run an AWS Glue data source, you add assets from the source AWS Glue database to your Amazon DataZone project's inventory. You can run your AWS Glue data sources on a set schedule or on demand to create or update your assets' technical metadata. During the data source runs, you can optionally choose to publish your assets to the Amazon DataZone catalog and thus make them discoverable by all domain users. You can also publish your project inventory assets after editing their business metadata. Domain users can search for and discover your published assets, and request subscriptions to these assets. 

**To add an AWS Glue data source**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project to which you want to add the data source.

1. Navigate to the **Data** tab for the project.

1. Choose **Data sources** from the left navigation pane, then choose **Create data source**.

1. Configure the following fields:
   + **Name** – The data source name.
   + **Description** – The data source description.

1. Under **Data source type**, choose **AWS Glue**.

1. Under **Select an environment**, specify an environment in which to publish the AWS Glue tables.

1. Under **Data selection**, provide an AWS Glue database and enter your table selection criteria. For example, if you choose **Include** and enter `*corporate`, the data source will include all source tables whose names end with `corporate`.

   You can either choose an AWS Glue database from the dropdown or type a database name. The dropdown includes two databases: the publishing database and the subscription database of the environment. If you want to bring in assets from a database that was not created by the environment, you must type the name of the database instead of selecting it from the dropdown.

   You can add multiple include and exclude rules for tables within a single database. You can also add multiple databases using the **Add another database** button.
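   The include and exclude criteria behave like wildcard patterns. Their effect can be sketched with Python's `fnmatch` (the table names are hypothetical, and this is only an approximation of the data source's matching semantics):

   ```
   import fnmatch

   tables = ["finance_corporate", "sales_corporate", "sales_archive"]

   # "Include *corporate" keeps only tables whose names end with "corporate".
   included = [t for t in tables if fnmatch.fnmatch(t, "*corporate")]
   print(included)  # ['finance_corporate', 'sales_corporate']
   ```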

   

1. Under **Data quality**, you can choose to **Enable data quality for this data source**. If you do this, Amazon DataZone imports your existing AWS Glue data quality output into your Amazon DataZone catalog. By default, Amazon DataZone imports the 100 most recent existing quality reports with no expiration date from AWS Glue.

   Data quality metrics in Amazon DataZone help you understand the completeness and accuracy of your data sources. Amazon DataZone pulls these data quality metrics from AWS Glue to provide point-in-time context, for example, during a business data catalog search. Data users can see how data quality metrics change over time for their subscribed assets. Data producers can ingest AWS Glue data quality scores on a schedule. The Amazon DataZone business data catalog can also display data quality metrics from third-party systems through data quality APIs. For more information, see [Data quality in Amazon DataZone](datazone-data-quality.md).

1. Choose **Next**.

1. For **Publishing settings**, choose whether assets are immediately discoverable in the business data catalog. If you only add them to the inventory, you can choose subscription terms later and publish them to the business data catalog. 

1. For **Automated business name generation**, choose whether to automatically generate metadata for assets as they're imported from the source.

1. (Optional) For **Metadata forms**, add forms to define the metadata that is collected and saved when the assets are imported into Amazon DataZone. For more information, see [Create a metadata form in Amazon DataZone](create-metadata-form.md).

1. For **Run preference**, choose when to run the data source.
   + **Run on a schedule** – Specify the dates and time to run the data source.
   + **Run on demand** – You can manually initiate data source runs.

1. Choose **Next**.

1. Review your data source configuration and choose **Create**.

**Note**  
When an AWS Glue data source is created, Amazon DataZone creates Lake Formation read-only permissions that enable the IAM role of the environment used to create the data source to access all the tables in the AWS Glue databases used in the data source. You can monitor the status of these grants under data sources on your environment's details page. Amazon DataZone adds the following AWS tag to the AWS Glue database when granting access to the publishing environment's IAM role: `DataZoneDiscoverable_${domainId}: true`  
For environments created prior to the current release of Amazon DataZone, project members will not be able to see granted tables in Amazon Athena.

# Create and run an Amazon DataZone data source for Amazon Redshift

In Amazon DataZone, you can create an Amazon Redshift data source in order to import technical metadata of database tables and views from an Amazon Redshift data warehouse. To add an Amazon DataZone data source for Amazon Redshift, the source data warehouse must already exist in Amazon Redshift.

When you create and run an Amazon Redshift data source, you add assets from the source Amazon Redshift data warehouse to your Amazon DataZone project's inventory. You can run your Amazon Redshift data sources on a set schedule or on demand to create or update your assets' technical metadata. During the data source runs, you can optionally choose to publish your project inventory assets to the Amazon DataZone catalog and thus make them discoverable by all domain users. You can also publish your inventory assets after editing their business metadata. Domain users can search for and discover your published assets and request subscriptions to these assets.

**To add an Amazon Redshift data source**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project to which you want to add the data source.

1. Navigate to the **Data** tab for the project.

1. Choose **Data sources** from the left navigation pane, then choose **Create data source**.

1. Configure the following fields:
   + **Name** – The data source name.
   + **Description** – The data source description.

1. Under **Data source type**, choose **Amazon Redshift**.

1. Under **Select an environment**, specify an environment in which to publish the Amazon Redshift tables.

1. Depending on the environment you select, Amazon DataZone will either automatically apply the Amazon Redshift credentials and other parameters directly from the environment or give you the option to choose your own. 
   + If you selected an environment that only allows publishing from the environment's default Amazon Redshift schema, Amazon DataZone automatically applies the Amazon Redshift credentials and other parameters, including the Amazon Redshift cluster or workgroup name, AWS secret, database name, and schema name. You cannot edit these auto-populated parameters.
   + If you select an environment that does not allow publishing any data, you will not be able to proceed with data source creation.
   + If you select an environment that allows publishing data from any schema, you will see the option to either use the credentials and other Amazon Redshift parameters from the environment or enter your own credentials and parameters. 

1. If you choose to use your own credentials to create the data source, provide the following details:
   + Under **Provide Amazon Redshift credentials**, choose whether to use a provisioned Amazon Redshift cluster or an Amazon Redshift Serverless workgroup as your data source.
   + Depending on your selection in the step above, choose your Amazon Redshift cluster or workgroup from the dropdown menu, then choose the secret in AWS Secrets Manager to use for authentication. You can choose an existing secret or create a new one. 
   + For an existing secret to appear in the dropdown, make sure that your secret in AWS Secrets Manager includes the following tags (key/value):
     + AmazonDataZoneProject: <projectID> 
     + AmazonDataZoneDomain: <domainID>

     If you choose to create a new secret, then the secret is automatically tagged with the tags referenced above and no extra steps are needed. For more information, see [Storing database credentials in AWS Secrets Manager](https://docs.aws.amazon.com/redshift/latest/mgmt/data-api-access.html#data-api-secrets).

     Amazon Redshift users in the AWS secret provided for creating the data source must have `SELECT` permissions on the tables that are to be published. If you want Amazon DataZone to also manage the subscriptions (access) on your behalf, the database users in the AWS secret must also have the following permissions: 
     + `CREATE DATASHARE`
     + `ALTER DATASHARE`
     + `DROP DATASHARE`
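   The tagging requirement can be sketched as an AWS Secrets Manager `CreateSecret` payload (the secret name, credentials, and identifiers below are placeholders):

   ```
   # Placeholder payload; the two tags below are the ones Amazon DataZone
   # requires in order to list the secret in the dropdown.
   create_secret_request = {
       "Name": "redshift-datasource-credentials",
       "SecretString": '{"username": "awsuser", "password": "EXAMPLE-PASSWORD"}',
       "Tags": [
           {"Key": "AmazonDataZoneProject", "Value": "d4bywm0cja1dbb"},
           {"Key": "AmazonDataZoneDomain", "Value": "dzd_4bywm0cja1dbb"},
       ],
   }

   # With the AWS SDK for Python:
   # boto3.client("secretsmanager").create_secret(**create_secret_request)
   print(sorted(t["Key"] for t in create_secret_request["Tags"]))
   ```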

1. Under **Data selection**, provide an Amazon Redshift database and schema, and enter your table or view selection criteria. For example, if you choose **Include** and enter `*corporate`, the data source will include all source tables and views whose names end with `corporate`.

   You can add multiple include rules for tables within a single database. You can also add multiple databases using the **Add another database** button.

1. Choose **Next**.

1. For **Publishing settings**, choose whether assets are immediately discoverable in the data catalog. If you only add them to the inventory, you can choose subscription terms later and publish them to the business data catalog. 

1. For **Automated business name generation**, choose whether to automatically generate metadata for assets as they're published and updated from the source.

1. (Optional) For **Metadata forms**, add forms to define the metadata that is collected and saved when the assets are imported into Amazon DataZone. For more information, see [Create a metadata form in Amazon DataZone](create-metadata-form.md).

1. For **Run preference**, choose when to run the data source.
   + **Run on a schedule** – Specify the dates and time to run the data source.
   + **Run on demand** – You can manually initiate data source runs.

1. Choose **Next**.

1. Review your data source configuration and choose **Create**.

**Note**  
When an Amazon Redshift data source is created, Amazon DataZone grants read-only access that enables the environment used to create the data source to access all the tables in the Amazon Redshift schemas used in the data source. You can monitor the status of these grants under data sources on your environment's details page.  
When using a different Amazon Redshift cluster or Serverless workgroup than the one used to create the environment, you must ensure that the following AWS tag is added to the cluster or workgroup. This is necessary for the environment users to be able to view the granted database in the Amazon Redshift Query Editor V2: `DataZoneDiscoverable_${domainId}: true`   
For environments created prior to the current release of Amazon DataZone, project members will not be able to see granted tables in Amazon Redshift.

# Edit a data source in Amazon DataZone

After you create an Amazon DataZone data source, you can modify it at any time to change the source details or the data selection criteria. When you no longer need a data source, you can delete it.

To complete these steps, you must have the **AmazonDataZoneFullAccess** AWS managed policy attached. For more information, see [AWS managed policies for Amazon DataZone](security-iam-awsmanpol.md).

You can edit an Amazon DataZone data source to modify its data selection settings, including adding, removing, or changing the table selection criteria. You can also add and remove databases. You can't change the data source type or the environment in which a data source is published.

**To edit a data source**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project to which the data source belongs.

1. Navigate to the **Data** tab for the project.

1. Choose **Data sources** from the left navigation pane, then choose the data source that you want to modify.

1. Navigate to the **Data source definition** tab and choose **Edit**.

1. Make your changes to the data source definition. You can update the data source details and make changes to the data selection criteria. 

1. When you're done making changes, choose **Save**.

# Delete a data source in Amazon DataZone
Delete a data source

After you create an Amazon DataZone data source, you can delete it when it's no longer needed.

To complete these steps, you must have the **AmazonDataZoneFullAccess** AWS managed policy attached. For more information, see [AWS managed policies for Amazon DataZone](security-iam-awsmanpol.md).

When you no longer need an Amazon DataZone data source, you can remove it permanently. After you delete a data source, all assets that originated from that data source are still available in the catalog, and users can still subscribe to them. However, the assets stop receiving updates from the source. We recommend that you first move the dependent assets to a different data source before you delete it.

**Note**  
You must remove all fulfillments on the data source before you can delete it. For more information, see [Amazon DataZone data discovery, subscription, and consumption](discover-subscribe-consume-data.md).

**To delete a data source**

1. On the **Data** tab for the project, choose **Data sources** from the left navigation pane.

1. Choose the data source that you want to delete.

1. Choose **Actions**, **Delete data source** and confirm deletion.

# Publish assets to the Amazon DataZone catalog from the project inventory
Publish assets to the catalog from the project inventory

You can publish Amazon DataZone assets and their metadata from project inventories into the Amazon DataZone catalog. You can only publish the most recent version of an asset to the catalog.

Consider the following when publishing assets to the catalog:
+ To publish an asset to the catalog, you must be the owner or the contributor of that project.
+ For Amazon Redshift assets, ensure that both the publisher and subscriber Amazon Redshift clusters meet all the requirements for Amazon Redshift data sharing in order for Amazon DataZone to manage access for Redshift tables and views. See [Data sharing concepts for Amazon Redshift](https://docs.aws.amazon.com/redshift/latest/dg/concepts.html).
+ Amazon DataZone only supports access management for assets published from the AWS Glue Data Catalog and Amazon Redshift. For all other assets, such as Amazon S3 objects, Amazon DataZone does not manage access for approved subscribers. If you subscribe to these unmanaged assets, you're notified with the following message: 

  `Subscription approval does not provide access to data. Subscription grants on this asset are not managed by Amazon DataZone. For more information or help, reach out to your administrator.`
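Publishing can also be scripted. A minimal sketch, assuming hypothetical domain and asset IDs and the `CreateListingChangeSet` API operation (the operation behind publishing); the command is printed for review rather than executed, because it requires credentials:

```shell
# Hypothetical IDs -- substitute your own domain and asset identifiers.
DOMAIN_ID="dzd_example123"
ASSET_ID="4wyh64k2n8czaf"
# Print the publish command for review; run it with real credentials to execute.
echo "aws datazone create-listing-change-set \
  --domain-identifier ${DOMAIN_ID} \
  --entity-identifier ${ASSET_ID} \
  --entity-type ASSET \
  --action PUBLISH"
```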

## Publish an asset in Amazon DataZone
Publish an asset

If you didn't choose to make assets immediately discoverable in the data catalog when you created a data source, perform the following steps to publish them later.

**To publish an asset**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project to which the asset belongs.

1. Navigate to the **Data** tab for the project.

1. Choose **Inventory data** from the left navigation pane, then select the asset that you want to publish.
**Note**  
By default, all assets require subscription approval, which means a data owner must approve all subscription requests to the asset. If you want to change this setting before publishing the asset, open the asset details and choose **Edit** next to **Subscription approval**. You can change this setting later by modifying and re-publishing the asset.

1. Choose **Publish asset**. The asset is directly published to the catalog.

   If you make changes to the asset, such as modifying its approval requirements, you can choose **Re-publish** to publish the updates to the catalog.

# Manage inventory and curate assets in Amazon DataZone
Manage inventory and curate assets

In order to use Amazon DataZone to catalog your data, you must first bring your data (assets) into your project's inventory in Amazon DataZone. Creating an inventory for a particular project makes the assets discoverable only to that project's members. 

Once the assets are created in project inventory, their metadata can be curated. For example, you can edit the asset's name, description, or read me. Each edit to the asset creates a new version of the asset. You can use the History tab on the asset's details page to view all asset versions. 

You can edit the **Read Me** section and add rich descriptions for the asset. The **Read Me** section supports markdown, thus enabling you to format your descriptions as required and describe key information about an asset to consumers. 

Glossary terms can be added at the asset level by filling out available forms. 

To curate the schema, you can review the columns, add business names, descriptions, and add glossary terms at column level. 

If automated metadata generation is enabled when the data source is created, the business names for assets and columns are available to review and accept or reject individually or all at once. 

You can also edit the subscription terms to specify if approval for the asset is required or not. 

Metadata forms in Amazon DataZone enable you to extend a data asset's metadata model by adding custom-defined attributes (for example, sales region, sales year, and sales quarter). The metadata forms that are attached to an asset type are applied to all assets created from that asset type. You can also add additional metadata forms to individual assets as part of the data source run or after it's created. For creating new forms, see [Create a metadata form in Amazon DataZone](create-metadata-form.md). 

To update the metadata of an asset, you must be the owner or the contributor of the project to which the asset belongs.

**To update the metadata of an asset**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project that contains the asset whose metadata you want to update.

1. Navigate to the **Data** tab for the project.

1. Choose **Inventory data** from the left navigation pane, then choose the name of the asset whose metadata you want to update.

1. On the asset details page, under **Metadata forms**, choose **Edit** and edit the existing forms as needed. You can also attach additional metadata forms to the asset. For more information, see [Attach additional metadata forms to assets](#update-metadata-data-steward).

1. When you're done making updates, choose **Save form**.

   When you save the form, Amazon DataZone generates a new inventory version of the asset. To publish the updated version to the catalog, choose **Re-publish asset**.

## Attach additional metadata forms to assets


By default, metadata forms attached to a domain are attached to all assets published to that domain. Data publishers can associate additional metadata forms to individual assets in order to provide additional context.

**To attach additional metadata forms to an asset**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project that contains the asset whose metadata you want to add to.

1. Navigate to the **Data** tab for the project.

1. Choose **Inventory data** from the left navigation pane, then choose the name of the asset whose metadata you want to add to.

1. On the asset details page, under **Metadata forms**, choose **Add forms**.

1. Select the form(s) to add to the asset, then choose **Add forms**.

1. Enter values for each of the metadata fields, then choose **Save form**.

   When you save the form, Amazon DataZone generates a new inventory version of the asset. To publish the updated version to the catalog, choose **Re-publish asset**.

## Publish asset to the catalog after curation in Amazon DataZone
Publish asset to the catalog after curation

Once satisfied with the asset curation, the data owner can publish an asset version to the Amazon DataZone catalog and thus make it discoverable by all domain users. The asset shows the inventory version and the published version. In the discovery catalog, only the latest published version appears. If the metadata is updated after publishing, then a new inventory version will be available for publishing to the catalog. 

# Manually create an asset in Amazon DataZone
Manually create an asset

In Amazon DataZone, an asset is an entity that represents a single physical data object (for example, a table, a dashboard, or a file) or virtual data object (for example, a view). For more information, see [Amazon DataZone terminology and concepts](datazone-concepts.md). Publishing an asset manually is a one-time operation. You don't specify a run schedule for the asset, so it's not updated automatically if its source changes.

To manually create an asset through a project, you must be the owner or the contributor of that project.

**To create an asset manually**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project in which to create the asset.

1. Navigate to the **Data** tab for the project.

1. Choose **Data sources** from the left navigation pane, then choose **Create data asset**.

1. For **Asset details**, configure the following settings:
   + **Asset type** – The type of asset.
   + **Name** – The name of the asset.
   + **Description** – A description of the asset.

1. For **S3 location**, enter the Amazon Resource Name (ARN) of the source S3 bucket.

   Optionally, enter an S3 access point. For more information, see [Managing data access with Amazon S3 access points](https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points.html).

1. For **Publishing settings**, choose whether assets are immediately discoverable in the catalog. If you only add them to the inventory, you can choose subscription terms later to publish them to the catalog.

1. Choose **Create**. 

   Once the asset is created, it's either published directly as an active asset in the catalog or stored in the inventory until you decide to publish it.

# Unpublish an asset from the Amazon DataZone catalog
Unpublish an asset from the catalog

When you unpublish an Amazon DataZone asset from the catalog, it no longer appears in global search results. New users won't be able to find or subscribe to the asset listing in the catalog, but all existing subscriptions remain the same.

To unpublish an asset, you must be the owner or the contributor of the project to which the asset belongs.

**To unpublish an asset**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project to which the asset belongs.

1. Navigate to the **Data** tab for the project.

1. Choose **Published data** from the left navigation pane.

1. Locate the asset from the list of published assets, then choose **Unpublish**.

   The asset is removed from the catalog. You can re-publish the asset at any time by choosing **Publish**.

# Delete an Amazon DataZone asset
Delete an asset

When you no longer need an asset in Amazon DataZone, you can permanently delete it. Deleting an asset is different from unpublishing an asset from the catalog. You can delete an asset and its related listing in the catalog so that it's not visible in any search results. To delete the asset listing, you must first revoke all of its subscriptions. 

To delete an asset, you must be the owner or the contributor of the project to which the asset belongs.

**Note**  
In order to delete an asset listing, you must first revoke all existing subscriptions to the asset. You can't delete an asset listing that has existing subscribers.

**To delete an asset**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project that contains the asset that you want to delete.

1. Navigate to the **Data** tab for the project.

1. Choose **Published data** from the left navigation pane, then locate and choose the asset that you want to delete. This opens the asset details page.

1. Choose **Actions**, **Delete** and confirm deletion.

   Once the asset is deleted, it's no longer available to view and users can't subscribe to it.

# Manually start a data source run in Amazon DataZone
Manually start a data source run

When you run a data source, Amazon DataZone pulls any new or modified metadata from the source and updates the associated assets in the inventory. When you add a data source to Amazon DataZone, you specify the source's run preference, which defines whether the source runs on a schedule or on demand. If your source runs on demand, you must initiate data source runs manually.

Even if your source runs on a schedule, you can still run it manually at any time. After adding business metadata to the assets, you can select assets and publish them to the Amazon DataZone catalog in order for these assets to be discoverable by all domain users. Only published assets are searchable by other domain users.

**To run a data source manually**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project to which the data source belongs.

1. Navigate to the **Data** tab for the project.

1. Choose **Data sources** from the left navigation pane, then locate and choose the data source that you want to run. This opens the data source details page.

1. Choose **Run on demand**.

   The data source status changes to `Running` as Amazon DataZone updates the asset metadata with the most recent data from the source. You can monitor the status of the run on the **Data source runs** tab. 
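The same on-demand run can be started programmatically. A minimal sketch, assuming hypothetical IDs, using the `StartDataSourceRun` API operation; the command is printed for review rather than executed, because it requires credentials:

```shell
# Hypothetical IDs -- substitute your own domain and data source identifiers.
DOMAIN_ID="dzd_example123"
DATA_SOURCE_ID="ds_example456"
# Print the run command for review; run it with real credentials to execute.
echo "aws datazone start-data-source-run \
  --domain-identifier ${DOMAIN_ID} \
  --data-source-identifier ${DATA_SOURCE_ID}"
```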

# Asset revisions in Amazon DataZone
Asset versioning

Amazon DataZone increments the revision of an asset when you edit its business or technical metadata. These edits include modifying the asset name, description, glossary terms, column names, metadata forms, and metadata form field values. These changes can result from manual edits, data source job runs, or API operations. Amazon DataZone automatically generates a new asset revision any time you make an edit to the asset.

After you update an asset and a new revision is generated, you must publish the new revision to the catalog for it to be updated and available to subscribers. For more information, see [Publish assets to the Amazon DataZone catalog from the project inventory](publishing-data-asset.md). You can only publish the most recent version of an asset to the catalog.

**To view past revisions of an asset**

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose **Select project** from the top navigation pane and select the project that contains the asset.

1. Navigate to the **Data** tab for the project, then locate and choose the asset. This opens the asset details page.

1. Navigate to the **History** tab, which displays a list of past revisions of the asset.

# Data quality in Amazon DataZone


Data quality metrics in Amazon DataZone help you understand different quality dimensions, such as completeness, timeliness, and accuracy, of your data sources. Amazon DataZone integrates with AWS Glue Data Quality and offers APIs to integrate data quality metrics from third-party data quality solutions. Data users can see how data quality metrics change over time for their subscribed assets. To author and run the data quality rules, you can use your data quality tool of choice, such as AWS Glue Data Quality. With data quality metrics in Amazon DataZone, data consumers can visualize the data quality scores for assets and columns, helping build trust in the data they use for decisions. 

**Pre-requisites and IAM role changes**

If you are using Amazon DataZone's AWS managed policies, there are no additional configuration steps; these managed policies are automatically updated to support data quality. If you are using your own policies for the roles that grant Amazon DataZone the required permissions to interoperate with supported services, you must update the policies attached to these roles to enable support for reading the AWS Glue data quality information in the [AWS managed policy: AmazonDataZoneGlueManageAccessRolePolicy](security-iam-awsmanpol-AmazonDataZoneGlueManageAccessRolePolicy.md), and to enable support for the time series APIs in the [AWS managed policy: AmazonDataZoneDomainExecutionRolePolicy](security-iam-awsmanpol-AmazonDataZoneDomainExecutionRolePolicy.md) and the [AWS managed policy: AmazonDataZoneFullUserAccess](security-iam-awsmanpol-AmazonDataZoneFullUserAccess.md). 

## Enabling data quality for AWS Glue assets


Amazon DataZone pulls data quality metrics from AWS Glue to provide point-in-time context, for example, during a business data catalog search. Data users can see how data quality metrics change over time for their subscribed assets. Data producers can ingest AWS Glue data quality scores on a schedule. The Amazon DataZone business data catalog can also display data quality metrics from third-party systems through data quality APIs. For more information, see [AWS Glue Data Quality](https://docs.aws.amazon.com/glue/latest/dg/glue-data-quality.html) and [Getting started with AWS Glue Data Quality for the Data Catalog](https://docs.aws.amazon.com/glue/latest/dg/data-quality-getting-started.html).

You can enable data quality metrics for your Amazon DataZone assets in the following ways:
+ Use the Amazon DataZone data portal or the Amazon DataZone APIs to enable data quality for your AWS Glue data source, either while creating a new data source or while editing an existing one.

  For more information on enabling data quality for a data source via the portal, see [Create and run an Amazon DataZone data source for the AWS Glue Data Catalog](create-glue-data-source.md). 
**Note**  
You can use the data portal to enable data quality only for your AWS Glue inventory assets. In this release of Amazon DataZone, enabling data quality for Amazon Redshift or custom type assets via the data portal is not supported.

  You can also use the APIs to enable data quality for your new or existing data sources. You can do this by invoking the [CreateDataSource](https://docs.aws.amazon.com/datazone/latest/APIReference/API_CreateDataSource.html) or [UpdateDataSource](https://docs.aws.amazon.com/datazone/latest/APIReference/API_UpdateDataSource.html) API and setting the `autoImportDataQualityResult` parameter to `true`.

  After data quality is enabled, you can run the data source on demand or on a schedule. Each run can bring in up to 100 metrics per asset. You don't need to create forms or add metrics manually when using a data source for data quality. When the asset is published, the updates that were made to the data quality form (up to 30 data points of history per rule) are reflected in the listing for consumers. Subsequently, each new addition of metrics to the asset is automatically added to the listing. You don't need to republish the asset to make the latest scores available to consumers. 
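As a sketch of the API path, the following builds an `UpdateDataSource` payload with data quality import enabled. The IDs are hypothetical placeholders, and the nesting of `autoImportDataQualityResult` under the Glue run configuration is an assumption to verify against the API reference; `python3` is used only to validate the JSON locally:

```shell
# Hypothetical IDs -- substitute your own domain and data source identifiers.
cat > updateDataSourcePayload.json <<'EOF'
{
  "domainIdentifier": "dzd_example123",
  "identifier": "ds_example456",
  "configuration": {
    "glueRunConfiguration": {
      "autoImportDataQualityResult": true
    }
  }
}
EOF
# Validate the payload locally before sending it.
python3 -m json.tool updateDataSourcePayload.json
# Then, with real credentials:
# aws datazone update-data-source --cli-input-json file://updateDataSourcePayload.json
```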

## Enabling data quality for custom asset types


You can use the Amazon DataZone APIs to enable data quality for any of your custom type assets. For more information, see the following:
+ [PostTimeSeriesDataPoints](https://docs.aws.amazon.com/datazone/latest/APIReference/API_PostTimeSeriesDataPoints.html)
+ [ListTimeSeriesDataPoints](https://docs.aws.amazon.com/datazone/latest/APIReference/API_ListTimeSeriesDataPoints.html)
+ [GetTimeSeriesDataPoint](https://docs.aws.amazon.com/datazone/latest/APIReference/API_GetTimeSeriesDataPoint.html)
+ [DeleteTimeSeriesDataPoints](https://docs.aws.amazon.com/datazone/latest/APIReference/API_DeleteTimeSeriesDataPoints.html)

The following steps provide an example of using the APIs or the AWS CLI to import third-party metrics for your assets in Amazon DataZone:

1. Invoke the `PostTimeSeriesDataPoints` API as follows:

   ```
   aws datazone post-time-series-data-points \
   --cli-input-json file://createTimeSeriesPayload.json
   ```

   with the following payload:

   ```
   {
       "domainId": "dzd_5oo7xzoqltu8mf",
       "entityId": "4wyh64k2n8czaf",
       "entityType": "ASSET",
       "form": {
           "content": "{\n  \"evaluations\" : [ {\n    \"types\" : [ \"MaximumLength\" ],\n    \"description\" : \"ColumnLength \\\"ShippingCountry\\\" <= 6\",\n    \"details\" : { },\n    \"applicableFields\" : [ \"ShippingCountry\" ],\n    \"status\" : \"PASS\"\n  }, {\n    \"types\" : [ \"MaximumLength\" ],\n    \"description\" : \"ColumnLength \\\"ShippingState\\\" <= 2\",\n    \"details\" : { },\n    \"applicableFields\" : [ \"ShippingState\" ],\n    \"status\" : \"PASS\"\n  }, {\n    \"types\" : [ \"MaximumLength\" ],\n    \"description\" : \"ColumnLength \\\"ShippingCity\\\" <= 8\",\n    \"details\" : { },\n    \"applicableFields\" : [ \"ShippingCity\" ],\n    \"status\" : \"PASS\"\n  }, {\n    \"types\" : [ \"Completeness\" ],\n    \"description\" : \"Completeness \\\"ShippingStreet\\\" >= 0.59\",\n    \"details\" : { },\n    \"applicableFields\" : [ \"ShippingStreet\" ],\n    \"status\" : \"PASS\"\n  }, {\n    \"types\" : [ \"MaximumLength\" ],\n    \"description\" : \"ColumnLength \\\"ShippingStreet\\\" <= 101\",\n    \"details\" : { },\n    \"applicableFields\" : [ \"ShippingStreet\" ],\n    \"status\" : \"PASS\"\n  }, {\n    \"types\" : [ \"MaximumLength\" ],\n    \"description\" : \"ColumnLength \\\"BillingCountry\\\" <= 6\",\n    \"details\" : { },\n    \"applicableFields\" : [ \"BillingCountry\" ],\n    \"status\" : \"PASS\"\n  }, {\n    \"types\" : [ \"Completeness\" ],\n    \"description\" : \"Completeness \\\"biLlingcountry\\\" >= 0.5\",\n    \"details\" : {\n      \"EVALUATION_MESSAGE\" : \"Value: 0.26666666666666666 does not meet the constraint requirement!\"\n    },\n    \"applicableFields\" : [ \"biLlingcountry\" ],\n    \"status\" : \"FAIL\"\n  }, {\n    \"types\" : [ \"Completeness\" ],\n    \"description\" : \"Completeness \\\"Billingstreet\\\" >= 0.5\",\n    \"details\" : { },\n    \"applicableFields\" : [ \"Billingstreet\" ],\n    \"status\" : \"PASS\"\n  } ],\n  \"passingPercentage\" : 88.0,\n  \"evaluationsCount\" : 8\n}",
           "formName": "shortschemaruleset",
           "id": "athp9dyw75gzhj",
           "timestamp": 1.71700477757E9,
           "typeIdentifier": "amazon.datazone.DataQualityResultFormType",
           "typeRevision": "8"
       },
       "formName": "shortschemaruleset"
   }
   ```

   You can obtain the model that defines this payload's `content` by invoking the `GetFormType` action:

   ```
   aws datazone get-form-type --domain-identifier <your_domain_id> --form-type-identifier amazon.datazone.DataQualityResultFormType --region <domain_region> --output text --query 'model.smithy'
   ```

1. Invoke the `DeleteTimeSeriesDataPoints` API as follows:

   ```
   aws datazone delete-time-series-data-points \
   --domain-identifier dzd_bqqlk3nz21zp2f \
   --entity-identifier dzd_bqqlk3nz21zp2f \
   --entity-type ASSET \
   --form-name rulesET1
   ```
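The escaped `content` string in the `PostTimeSeriesDataPoints` payload above is an ordinary JSON document serialized into a string field. A minimal sketch of producing it, assuming hypothetical evaluation values and that `python3` is available for the JSON escaping:

```shell
python3 - <<'EOF'
import json

# One hypothetical evaluation result in the DataQualityResultFormType shape.
evaluations = [{
    "types": ["Completeness"],
    "description": 'Completeness "ShippingStreet" >= 0.59',
    "details": {},
    "applicableFields": ["ShippingStreet"],
    "status": "PASS",
}]
content = {
    "evaluations": evaluations,
    "passingPercentage": 100.0,
    "evaluationsCount": len(evaluations),
}
# The form's "content" field carries this document as an escaped string.
form = {"formName": "shortschemaruleset", "content": json.dumps(content)}
print(json.dumps(form, indent=2))
EOF
```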

# Using machine learning and generative AI in Amazon DataZone


**Note**  
Powered by Amazon Bedrock: AWS implements automated abuse detection. Because the AI recommendations for descriptions functionality in Amazon DataZone is built on Amazon Bedrock, users inherit the controls implemented in Amazon Bedrock to enforce safety, security, and the responsible use of AI.

In the current release of Amazon DataZone, you can use the AI recommendations for names & descriptions functionality to automate data discovery and cataloging. Support for generative AI in Amazon DataZone creates business names and descriptions for assets and columns. You can use these names and descriptions to add business context for your data and recommend analysis for datasets, which can help boost data discovery results.

Powered by Amazon Bedrock's large language models, the AI recommendations for data asset names & descriptions in Amazon DataZone help you to ensure that your data is comprehensible and easily discoverable. The AI recommendations also suggest the most pertinent analytical applications for datasets. By reducing manual documentation tasks and advising on appropriate data usage, auto-generated names and descriptions can help you to enhance the trustworthiness of your data and minimize overlooking valuable data to accelerate informed decision making.

## Supported Regions


In the current Amazon DataZone release, the AI recommendations for names and descriptions feature is supported in the following regions:
+ US East (N. Virginia)
+ US West (Oregon)
+ Asia Pacific (Tokyo)
+ Europe (Frankfurt)
+ Asia Pacific (Sydney)
+ Canada (Central)
+ Europe (London)
+ South America (Sao Paulo)
+ Europe (Ireland)
+ Asia Pacific (Singapore)
+ US East (Ohio)
+ Asia Pacific (Seoul)

Amazon DataZone supports Business Description Generation in the following regions:
+ Asia Pacific (Mumbai)
+ Europe (Paris)

Amazon DataZone supports Business Name Generation in the following regions:
+ Europe (Stockholm)

**Bedrock Cross Region Inference**  
Amazon DataZone leverages Amazon Bedrock's cross-Region inference endpoint to serve recommendations for the US East (Ohio) region. All other regions use in-region endpoints.

## Steps to use GenAI


The following procedure describes how to generate AI recommendations for names and descriptions in Amazon DataZone:
+ Navigate to the Amazon DataZone data portal URL, and then sign in using single sign-on (SSO) or your AWS credentials. If you're an Amazon DataZone administrator, navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, and then choose **Open data portal**.
+ In the top navigation pane, choose **Select project**, and then choose the project that contains the asset for which you want to generate AI recommendations for descriptions.

### Generating Business Descriptions and Summaries

+ Navigate to the **Data** tab for the project.
+ In the left navigation pane, choose **Inventory data**, and then choose the name of the asset for which you want to generate AI recommendations for descriptions.
+ On the asset's details page, in the **Business metadata** tab, choose **Generate descriptions**.

### Generating Business Names

+ Navigate to the **Data** tab for the project.
+ In the left navigation pane, choose **Data sources**, and then choose the data source for which you want to enable business name generation.
+ Go to the **details** tab and enable the **AUTOMATED BUSINESS NAME GENERATION** configuration.
+ Business names can also be generated programmatically when creating an asset by enabling the `businessNameGeneration` flag under `predictionConfiguration` in the [CreateAsset API](https://docs.aws.amazon.com/datazone/latest/APIReference/API_CreateAsset.html) payload.
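As a sketch, the following builds a `CreateAsset` payload with business name generation enabled. The IDs and asset type are hypothetical placeholders; verify the exact `predictionConfiguration` shape against the CreateAsset API reference. `python3` is used only to validate the JSON locally:

```shell
# Hypothetical IDs and type identifier -- substitute your own.
cat > createAssetPayload.json <<'EOF'
{
  "domainIdentifier": "dzd_example123",
  "owningProjectIdentifier": "prj_example456",
  "name": "sales_orders",
  "typeIdentifier": "amazon.datazone.GlueTableAssetType",
  "predictionConfiguration": {
    "businessNameGeneration": { "enabled": true }
  }
}
EOF
# Validate the payload locally before sending it.
python3 -m json.tool createAssetPayload.json
# Then, with real credentials:
# aws datazone create-asset --cli-input-json file://createAssetPayload.json
```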

### Accepting/Rejecting Predictions

+ Once the descriptions are generated, you can edit, accept, or reject them.
+ Green icons are displayed next to each automatically generated metadata description for the data asset. In the **Business metadata** tab, you can choose the green icon next to the automatically generated **Summary**, and then choose **Edit**, **Accept**, or **Reject** to address the generated description.
+ You can also choose **Accept all** or **Reject all** options that are displayed at the top of the page when the **Business metadata** tab is selected, and thus perform the selected action on all automatically generated descriptions.
+ Or you can choose the **Schema** tab, and then address automatically generated descriptions individually by choosing the green icon for one column description at a time and then choosing **Accept** or **Reject**.
+ In the **Schema** tab, you can also choose **Accept all** or **Reject all** and thus perform the selected action on all automatically generated descriptions.

To publish the asset to the catalog with the generated descriptions, choose **Publish asset**, and then confirm this action by choosing **Publish asset** again in the **Publish asset** pop up window.

**Note**  
If you don't accept or reject the generated descriptions for an asset, and then you publish this asset, this unreviewed automatically generated metadata is not included in the published data asset.

## Support for custom relational asset types


Amazon DataZone supports generative AI capabilities for custom asset types. Previously, this feature was supported only for the managed AWS Glue and Amazon Redshift asset types.

To enable this feature, create your own asset type definition and attach `RelationalTableFormType` as one of the forms. Amazon DataZone automatically detects the presence of such forms and enables generative AI capabilities for these assets. The overall experience remains the same for generating business names (via `predictionConfiguration` in the CreateAsset API) and business descriptions (via the **Generate descriptions** button on the asset details page).
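As a sketch, a CreateAssetType request that attaches `RelationalTableFormType` might look like the following. The identifiers, the custom type name, and the form key below are placeholder assumptions; only the `amazon.datazone.RelationalTableFormType` identifier comes from the text above.

```
import json

# Hypothetical CreateAssetType payload; attaching RelationalTableFormType
# is what lets DataZone enable generative AI metadata for assets of this type.
payload = {
    "domainIdentifier": "dzd_example123",
    "owningProjectIdentifier": "prj_example456",
    "name": "CustomRelationalAssetType",
    "description": "Custom relational asset type with generative AI metadata support",
    "formsInput": {
        "RelationalTableForm": {
            "typeIdentifier": "amazon.datazone.RelationalTableFormType",
            "required": True,
        }
    },
}
print(json.dumps(payload, indent=2))
# With boto3, for example: boto3.client("datazone").create_asset_type(**payload)
```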

For more information about creating custom asset types, see [Create custom asset types in Amazon DataZone](create-asset-types.md). 

## Quotas


Amazon DataZone supports different quotas for business name generation and business description generation. You can contact AWS Support to request an increase in these quotas.
+ BusinessDescriptionGeneration: 10K invocations/month
+ BusinessNameGeneration: 50K invocations/month

# Data lineage in Amazon DataZone


Data lineage in Amazon DataZone is an OpenLineage-compatible feature that can help you to capture and visualize lineage events, from OpenLineage-enabled systems or through APIs, to trace data origins, track transformations, and view cross-organizational data consumption. It provides you with an overarching view into your data assets to see the origin of assets and their chain of connections. The lineage data includes information on the activities inside the Amazon DataZone's business data catalog, including information about the catalogued assets, the subscribers of those assets, and the activities that happen outside the business data catalog captured programmatically using the APIs.

**Topics**
+ [

## Types of lineage nodes in Amazon DataZone
](#datazone-data-lineage-node-types)
+ [

## Key attributes in lineage nodes
](#datazone-data-lineage-key-attributes)
+ [

## Visualizing data lineage
](#datazone-data-lineage-history)
+ [

## Data lineage authorization in Amazon DataZone
](#datazone-data-lineage-authorization)
+ [

## Data lineage sample experience in Amazon DataZone
](#datazone-data-lineage-sample-experience)
+ [

## Enable data lineage in the management console
](#enable-data-lineage)
+ [

## Using Amazon DataZone data lineage programmatically
](#datazone-data-lineage-apis)
+ [

## Automate lineage for the AWS Glue catalog
](#datazone-data-lineage-automate)
+ [

## Automate lineage from Amazon Redshift
](#datazone-data-lineage-automate-redshift)

Lineage can be set up to be captured automatically from AWS Glue and Amazon Redshift databases when they are added to Amazon DataZone. Additionally, Spark ETL job runs in the AWS Glue (v5.0 and higher) console or notebooks can be configured to send lineage events to Amazon DataZone domains. 

In Amazon DataZone, domain administrators can configure lineage while setting up the built-in data lake and data warehouse blueprints, which ensures that all data source runs created from those resources are enabled for automatic lineage capture.

Using Amazon DataZone's OpenLineage-compatible APIs, domain administrators and data producers can capture and store lineage events beyond what is available in Amazon DataZone, including transformations in Amazon S3, AWS Glue, and other services. This provides a comprehensive view for the data consumers and helps them gain confidence of the asset's origin, while data producers can assess the impact of changes to an asset by understanding its usage. Additionally, Amazon DataZone versions lineage with each event, enabling users to visualize lineage at any point in time or compare transformations across an asset's or job's history. This historical lineage provides a deeper understanding of how data has evolved, essential for troubleshooting, auditing, and ensuring the integrity of data assets.

With data lineage, you can accomplish the following in Amazon DataZone: 
+ Understand the provenance of data: knowing where the data originated fosters trust in data by providing you with a clear understanding of its origins, dependencies, and transformations. This transparency helps in making confident data-driven decisions.
+ Understand the impact of changes to data pipelines: when changes are made to data pipelines, lineage can be used to identify all of the downstream consumers that are to be affected. This helps to ensure that changes are made without disrupting critical data flows.
+ Identify the root cause of data quality issues: if a data quality issue is detected in a downstream report, lineage, especially column-level lineage, can be used to trace the issue back to its source at the column level. This can help data engineers identify and fix the problem.
+ Improve data governance and compliance: column-level lineage can be used to demonstrate compliance with data governance and privacy regulations. For example, column-level lineage can be used to show where sensitive data (such as PII) is stored and how it is processed in downstream activities.

## Types of lineage nodes in Amazon DataZone


In Amazon DataZone, data lineage information is presented in nodes that represent tables and views. Depending on the context of the project (for example, the project selected at the top left in the data portal), producers can view both inventory and published assets, whereas consumers can view only the published assets. When you first open the **Lineage** tab on the asset details page, the catalogued dataset node is the starting point for navigating upstream or downstream through the lineage nodes of your lineage graph.

The following are the types of data lineage nodes that are supported in Amazon DataZone:
+ **Dataset node** - this node type includes data lineage information about a specific data asset. 
  + Dataset nodes that include information about AWS Glue or Amazon Redshift assets published in the Amazon DataZone catalog are auto-generated and include a corresponding AWS Glue or Amazon Redshift icon within the node. 
  + Dataset nodes that include information about assets that are not published in the Amazon DataZone catalog, are created manually by domain administrators (producers) and are represented by a default custom asset icon within the node. 
+ **Job (run) node** - this node type displays the details of the job, including the latest run of a particular job and run details. This node also captures multiple runs of the job and can be viewed in the **History** tab of the node details. You can view node details by choosing the node icon.

## Key attributes in lineage nodes


The `sourceIdentifier` attribute in a lineage node represents the events happening on a dataset. The `sourceIdentifier` of the lineage node is the identifier of the dataset (table, view, and so on). It's used to enforce uniqueness on lineage nodes; for example, there can't be two lineage nodes with the same `sourceIdentifier`. The following are examples of `sourceIdentifier` values for different types of nodes:
+ For dataset node with respective dataset type:
  + Asset: amazon.datazone.asset/<assetId>
  + Listing (published asset): amazon.datazone.listing/<listingId>
  + AWS Glue table: arn:aws:glue:<region>:<account-id>:table/<database>/<table-name> 
  + Amazon Redshift table/view: arn:aws:<redshift/redshift-serverless>:<region>:<account-id>:<table-type(table/view etc)>/<clusterIdentifier/workgroupName>/<database>/<schema>/<table-name> 
  + For any other type of dataset nodes imported using open-lineage run events, <namespace>/<name> of the input/output dataset is used as `sourceIdentifier` of the node.
+ For jobs:
  + For job nodes imported using open-lineage run events, <jobs_namespace>.<job_name> is used as `sourceIdentifier`.
+ For job runs:
  + For job run nodes imported using open-lineage run events, <jobs_namespace>.<job_name>/<run_id> is used as `sourceIdentifier`.
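As an illustration, the dataset, job, and job run patterns above can be composed as plain strings. The Region, account, and names below are made-up values.

```
# Illustrative helpers that compose sourceIdentifier strings following the
# patterns listed above; all concrete values here are placeholders.
def glue_table_source_identifier(region, account_id, database, table):
    # AWS Glue table pattern: arn:aws:glue:<region>:<account-id>:table/<database>/<table-name>
    return f"arn:aws:glue:{region}:{account_id}:table/{database}/{table}"

def job_source_identifier(job_namespace, job_name):
    # Job pattern: <jobs_namespace>.<job_name>
    return f"{job_namespace}.{job_name}"

def job_run_source_identifier(job_namespace, job_name, run_id):
    # Job run pattern: <jobs_namespace>.<job_name>/<run_id>
    return f"{job_namespace}.{job_name}/{run_id}"

print(glue_table_source_identifier("us-east-1", "111122223333", "sales_db", "orders"))
# arn:aws:glue:us-east-1:111122223333:table/sales_db/orders
```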

For assets created using `createAsset` API, the `sourceIdentifier` must be updated using `createAssetRevision` API to enable mapping the asset to upstream resources.

## Visualizing data lineage


Amazon DataZone’s asset details page provides a graphical representation of data lineage, making it easier to visualize data relationships upstream or downstream. The asset details page provides the following capabilities to navigate the graph:
+ Column-level lineage: expand column-level lineage when available in dataset nodes. This automatically shows relationships with upstream or downstream dataset nodes if source column information is available.
+ Column search: by default, 10 columns are displayed in a dataset node. If there are more than 10 columns, pagination is activated so that you can navigate to the rest of the columns. To quickly view a particular column, you can search on the dataset node to list just that column.
+ View dataset nodes only: if you want to view only dataset lineage nodes and filter out the job nodes, you can choose the Open view control icon at the top left of the graph viewer and toggle the **Display dataset nodes only** option. This removes all the job nodes from the graph and lets you navigate just the dataset nodes. Note that when this option is turned on, the graph cannot be expanded upstream or downstream.
+ Details pane: Each lineage node has details captured and displayed when selected.
  + The dataset node has a details pane that displays all the details captured for that node for a given timestamp. Every dataset node has three tabs: **Lineage info**, **Schema**, and **History**. The **History** tab lists the different versions of the lineage event captured for that node. All details captured through the API are displayed using metadata forms or a JSON viewer.
  + The job node has a details pane that displays job details in two tabs: **Job info** and **History**. The details pane also displays queries or expressions captured as part of the job run. The **History** tab lists the different versions of the job run event captured for that job. All details captured through the API are displayed using metadata forms or a JSON viewer.
+ Version tabs: all lineage nodes in Amazon DataZone data lineage have versioning. For every dataset node or job node, the versions are captured as history, which enables you to navigate between the different versions to identify what has changed over time. Each version opens in a new tab on the lineage page to help you compare and contrast versions.

## Data lineage authorization in Amazon DataZone


**Write permissions** - to publish lineage data into Amazon DataZone, you must have an IAM role with a permissions policy that includes an `ALLOW` action on the `PostLineageEvent` API. This IAM authorization happens at API Gateway layer.

**Read permissions** - there are two operations: `GetLineageNode` and `ListLineageNodeHistory` that are included in the `AmazonDataZoneDomainExecutionRolePolicy` managed policy and therefore every user in the Amazon DataZone domain can invoke these to traverse the data lineage graph. 
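A permissions policy statement granting the write access described above might look like the following sketch. The Region, account ID, and domain ID are placeholder values, and the domain ARN format shown is an assumption.

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "datazone:PostLineageEvent",
      "Resource": "arn:aws:datazone:us-east-1:111122223333:domain/dzd_example123"
    }
  ]
}
```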

## Data lineage sample experience in Amazon DataZone


You can use the data lineage sample experience to browse and understand data lineage in Amazon DataZone, including traversing upstream or downstream in your data lineage graph, exploring versions and column-level lineage.

Complete the following procedure to try the sample data lineage experience in Amazon DataZone:

1. Navigate to the Amazon DataZone data portal URL and sign in using single sign-on (SSO) or your AWS credentials. If you’re an Amazon DataZone administrator, you can navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with the AWS account where the domain was created, then choose **Open data portal**.

1. Choose any available data asset to open the asset's details page.

1. On the asset's details page, choose the **Lineage** tab, then mouse over the information icon, and then choose **Try sample lineage**.

1. In the data lineage pop up window, choose **Start guided data lineage tour**.

   At this point, a full-screen tab opens, providing more space for the lineage information. The sample data lineage graph is initially displayed with a base node and one level of depth at each end, upstream and downstream. You can expand the graph upstream or downstream. Column information is also available for you to choose and see how lineage flows through the nodes. 

## Enable data lineage in the management console


You can enable data lineage as part of configuring your Default Data Lake and Default Data Warehouse blueprints.

Complete the following procedure to enable data lineage for your Default Data Lake blueprint.

1. Navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with your account credentials.

1. Choose **View domains** and choose the domain where you want to enable data lineage for your DefaultDataLake blueprint.

1. On the domain details page, navigate to the **Blueprints** tab.

1. On the DefaultDataLake blueprint's details page, choose the **Regions** tab.

1. You can enable data lineage as part of adding a region for your DefaultDataLake blueprint. So if a region is already added but the data lineage functionality in it is not enabled (**No** is displayed in the **Import data lineage** column), you must first remove this region. To enable data lineage, choose **Add region**, then choose the region you want to add, and make sure to check the **Enable importing data lineage** checkbox in the **Add Region** pop up window.

To enable data lineage for your DefaultDataWarehouse blueprint, complete the following procedure.

1. Navigate to the Amazon DataZone console at [https://console.aws.amazon.com/datazone](https://console.aws.amazon.com/datazone) and sign in with your account credentials.

1. Choose **View domains** and choose the domain where you want to enable data lineage for your DefaultDataWarehouse blueprint.

1. On the domain details page, navigate to the **Blueprints** tab.

1. On the DefaultDataWarehouse blueprint's details page, choose the **Parameter sets** tab.

1. You can enable data lineage as part of adding a parameter set for your DefaultDataWarehouse blueprint. To do so, choose **Create parameter set**.

1. On the **Create parameter set** page, specify the following and then choose **Create parameter set**.
   + Name for the parameter set.
   + Description for the parameter set.
   + AWS Region where you want to create environments.
   + Specify whether Amazon DataZone is to use these parameters to establish a connection to your Amazon Redshift cluster or serverless workgroup.
   + Specify an AWS secret.
   + Specify either a cluster or a serverless workgroup that you want to use when creating environments.
   + Specify the name of the database (within the cluster or workgroup you specified) that you want to use when creating environments.
   + Under **Import data lineage**, check the **Enable importing data lineage** checkbox.

## Using Amazon DataZone data lineage programmatically


To use the data lineage functionality in Amazon DataZone, you can invoke the following APIs:
+ [GetLineageNode](https://docs.aws.amazon.com/datazone/latest/APIReference/API_GetLineageNode.html)
+ [ListLineageNodeHistory](https://docs.aws.amazon.com/datazone/latest/APIReference/API_ListLineageNodeHistory.html)
+ [PostLineageEvent](https://docs.aws.amazon.com/datazone/latest/APIReference/API_PostLineageEvent.html)
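As a sketch, a minimal OpenLineage run event posted through `PostLineageEvent` might be built as follows. The domain ID, run ID, namespaces, job and dataset names, and producer URL are placeholder assumptions; the event shape follows the OpenLineage RunEvent specification, and the boto3 call is shown only as a comment.

```
import json
from datetime import datetime, timezone

# Hypothetical minimal OpenLineage RunEvent describing one job run that
# reads one input dataset and writes one output dataset.
run_event = {
    "eventType": "COMPLETE",
    "eventTime": datetime.now(timezone.utc).isoformat(),
    "run": {"runId": "11111111-1111-1111-1111-111111111111"},
    "job": {"namespace": "example_namespace", "name": "example_etl_job"},
    "inputs": [{"namespace": "example_namespace", "name": "raw_orders"}],
    "outputs": [{"namespace": "example_namespace", "name": "clean_orders"}],
    "producer": "https://example.com/lineage-producer",
    "schemaURL": "https://openlineage.io/spec/1-0-5/OpenLineage.json#/definitions/RunEvent",
}

# The event is passed to the API as a serialized string, for example:
# boto3.client("datazone").post_lineage_event(
#     domainIdentifier="dzd_example123", event=json.dumps(run_event))
print(json.dumps(run_event, indent=2))
```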

## Automate lineage for the AWS Glue catalog


When AWS Glue databases and tables are added to the Amazon DataZone catalog, lineage extraction for those tables is automated using data source runs. There are a few ways in which lineage is automated for this source:
+ Blueprint configuration - administrators setting up blueprints can configure blueprints to capture lineage automatically. This enables the administrators to define which data sources are important for lineage capture rather than relying on data producers cataloguing data. For more information, see [Enable data lineage in the management console](#enable-data-lineage). 
+ Data source configuration - data producers, as they configure data source runs for AWS Glue databases, are presented with a view along with Data Quality to inform about automated data lineage for that data source. 
  + The lineage setting can be viewed in the **Data Source Definition** tab. This value is not editable by data producers. 
  + The lineage collection in a data source run fetches information from table metadata to build lineage. The AWS Glue crawler supports different types of sources; the sources for which lineage is captured as part of the data source run include Amazon S3, DynamoDB, Catalog, Delta Lake, Iceberg tables, and Hudi tables stored in Amazon S3. JDBC and DocumentDB or MongoDB are currently not supported as sources. 
  + Limitation - if the number of tables is more than 100, the lineage run fails after 100 tables. Make sure that the AWS Glue crawler is not configured to bring in more than 100 tables in a run. 
+ AWS Glue (v5.0) configuration - while running AWS Glue jobs in AWS Glue Studio, data lineage can be configured for the jobs to send lineage events directly to an Amazon DataZone domain. 

  1. Navigate to the AWS Glue console at https://console.aws.amazon.com/gluestudio and sign in with your account credentials.

  1. Choose **ETL jobs** and either create a new job or choose any of the existing jobs. 

  1. Go to the **Job details** tab (including for ETL Flows jobs) and scroll down to the **Generate lineage events** section. 

  1. Select the checkbox to enable sending lineage events; this expands to display an input field where you can enter the Amazon DataZone domain ID. 
+ AWS Glue (v5.0) notebook configuration - in a notebook, you can automate the collection of lineage from Spark executions by adding a `%%configure` magic. This configuration sends events to the Amazon DataZone domain. 

  ```
  %%configure --name project.spark -f
  {
      "--conf":"spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener --conf spark.openlineage.transport.type=amazon_datazone_api --conf spark.openlineage.transport.domainId={DOMAIN_ID}  --conf spark.glue.accountId={ACCOUNT_ID} --conf spark.openlineage.facets.custom_environment_variables=[AWS_DEFAULT_REGION;GLUE_VERSION;GLUE_COMMAND_CRITERIA;GLUE_PYTHON_VERSION; --conf spark.glue.JOB_NAME={JOB_NAME}"
  }
  ```

  The following are the parameter details:
  + `spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener` - OpenLineageSparkListener will be created and registered with Spark's listener bus
  + `spark.openlineage.transport.type=amazon_datazone_api` - This is an OpenLineage setting that tells the OpenLineage plugin to use the DataZone API transport to emit lineage events to DataZone's PostLineageEvent API. For more information, see [https://openlineage.io/docs/integrations/spark/configuration/spark_conf](https://openlineage.io/docs/integrations/spark/configuration/spark_conf)
  + `spark.openlineage.transport.domainId={DOMAIN_ID}` - This parameter establishes the domain to which the API transport submits the lineage events.
  + `spark.openlineage.facets.custom_environment_variables=[AWS_DEFAULT_REGION;GLUE_VERSION;GLUE_COMMAND_CRITERIA;GLUE_PYTHON_VERSION;]` - The listed environment variables (AWS_DEFAULT_REGION, GLUE_VERSION, GLUE_COMMAND_CRITERIA, and GLUE_PYTHON_VERSION), which the Glue interactive session populates, are added to the lineage event.
  + `spark.glue.accountId=<ACCOUNT_ID>` - Account ID of the Glue Data Catalog where the metadata resides. This account ID is used to construct the Glue ARN in the lineage event.
  + `spark.glue.JOB_NAME` - Job name of the lineage event. The job name in notebook can be set as `spark.glue.JOB_NAME: ${projectId}.${pathToNotebook}`.
+ Set up parameters to configure communication to Amazon DataZone from AWS Glue

  Param key: --conf

  Param value:

  ```
  spark.extraListeners=io.openlineage.spark.agent.OpenLineageSparkListener 
  --conf spark.openlineage.transport.type=amazon_datazone_api 
  --conf spark.openlineage.transport.domainId=<DOMAIN_ID>
  --conf spark.openlineage.facets.custom_environment_variables=[AWS_DEFAULT_REGION;GLUE_VERSION;GLUE_COMMAND_CRITERIA;GLUE_PYTHON_VERSION;] 
  --conf spark.glue.accountId=<ACCOUNT_ID>
  ```

  Replace `<DOMAIN_ID>` and `<ACCOUNT_ID>` with the right values.

  For notebooks, add these additional parameters:

  ```
  --conf spark.glue.JobName=<SessionId> --conf spark.glue.JobRunId=<SessionId or NONE>
  ```

  Replace `<SessionId>` with the right value.

## Automate lineage from Amazon Redshift


When administrators set up the data warehouse blueprint configuration, Amazon DataZone automatically captures lineage from the Amazon Redshift service. The lineage runs capture queries executed for a given database and generate lineage events that are stored in Amazon DataZone and can be visualized by data producers or consumers when they navigate to a particular asset. 

Lineage can be automated using the following configurations:
+ Blueprint configuration: administrators setting up blueprints can configure them to capture lineage automatically. This enables the administrators to define which data sources are important for lineage capture rather than relying on data producers cataloguing data. To set this up, see [Enable data lineage in the management console](#enable-data-lineage).
+ Data source configuration: data producers, as they configure data source runs for Amazon Redshift databases, are presented with automated data lineage setting for that data source. 

  The lineage setting can be viewed in the **Data Source Definition** tab. This value is not editable by data producers. 

# Metadata enforcement rules for publishing


The metadata enforcement rules for publishing in Amazon DataZone strengthen data governance by enabling domain unit owners to establish clear metadata requirements for data producers, streamlining access requests and enhancing data governance.

The feature is supported in all the AWS commercial Regions where Amazon DataZone is currently available.

Domain unit owners can complete the following procedure to configure metadata enforcement in Amazon DataZone:

1. Navigate to the Amazon DataZone data portal using the data portal URL and log in using your SSO or AWS credentials. If you’re an Amazon DataZone administrator, you can obtain the data portal URL by accessing the Amazon DataZone console at https://console.aws.amazon.com/datazone in the AWS account where the Amazon DataZone domain was created.

1. Choose **Domains**, navigate to the **Domain units** tab and choose the domain unit that you want to work with.

1. Choose the **Rules** tab and then choose **Add**.

1. On the **Create required metadata form rule** page, do the following and then choose **Add rule**:
   + Specify a name for your rule.
   + Under **Action**, choose **Data asset and product publishing**.
   + Under **Required forms**, choose **Add metadata form**, choose a metadata form within the domain / domain unit that you want to add to this rule, and then choose **Add**. You can add up to 5 metadata forms per rule.
   + Under **Scope**, specify with which data entities you want to associate these forms. You can choose data products and/or data assets.
   + Under **Data asset types**, specify whether the rule applies across all asset types or limit it to selected asset types. 
   + Under **Projects**, specify whether the required forms will be associated with data products and/or assets published by all projects or only selected projects in this domain unit. Also, check **Cascade rule to child domain units** if you want child domain units to inherit this requirement. 