

# Connecting Amazon S3 to Amazon Q Business using the New connector
<a name="s3-v2-connector"></a>

**Note**  
**Enhanced Version:** With the new connector, you can refresh your index significantly faster than before.

## Known limitations
<a name="s3-v2-limitations"></a>

The Amazon S3 connector has the following known limitations:
+ The [Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) bucket must be in the same AWS Region as your Amazon Q index, and your index must have permissions to access the bucket that contains your documents.
+ VPC connectivity is not supported. (Use the old connector version if VPC support is required.)
+ Custom field mappings are not supported. (Use the old connector version if required.)
+ Document enrichment is not supported. (Use the old connector version if required.)

# Overview
<a name="s3-v2-overview"></a>

The following table gives an overview of the Amazon Q Business Amazon S3 connector and its supported features.


[\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/s3-v2-overview.html)

# Prerequisites
<a name="s3-v2-prereqs"></a>

Before you begin, make sure that you have completed the following prerequisites.

**In Amazon S3, make sure you have:**
+ [Created an Amazon S3 bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html) and copied its name.
**Note**  
Your bucket must be in the same AWS Region as your Amazon Q index, and your index must have permissions to access the bucket that contains your documents.

# Using the console
<a name="s3-v2-console"></a>

The following procedure outlines how to connect Amazon Q Business to Amazon S3 using the AWS Management Console.

**Connecting Amazon Q to Amazon S3**

1. Sign in to the AWS Management Console and open the Amazon Q Business console.

1. From the left navigation menu, choose **Data sources**.

1. From the **Data sources** page, choose **Add data source**.

1. Then, on the **Add data sources** page, from **Data sources**, add the **Amazon S3** data source to your Amazon Q application.

1. Then, on the **Amazon S3** data source page, enter the following information:

1. In **Name and description**, do the following:
   + For **Data source name** – Name your data source for easy tracking.
**Note**  
You can include hyphens (-) but not spaces. Maximum of 1,000 alphanumeric characters.
   + **Description – *optional*** – Add an optional description for your data source. This text is viewed only by Amazon Q Business administrators and can be edited later.

1. **IAM role** – Choose an existing IAM role or create an IAM role to access your repository credentials and index content.
**Note**  
IAM roles used for applications can't be used for data sources. If you are unsure if an existing role is used for an application, choose **Create a new role** to avoid errors.

1. **Data source location** – Choose the location of your Amazon S3 bucket:

   1. **This account** – Selected by default. Choose this option if your Amazon S3 bucket is in the same account as your Amazon Q Business application.

   1. **Other account** – Choose this option if your Amazon S3 bucket is in a different account.

      1. **Account ID** – Specify the ID for the other account that owns the bucket.

1. In **Sync scope**, enter the following information:

   1. **Enter the data source location** – The path to the Amazon S3 bucket where your data is stored.
      + If you selected **This account**, you can select **Browse S3** to find and choose your bucket.
      + If you selected **Other account**, you must manually enter the bucket name as the browse option is not available for cross-account buckets.
**Note**  
Your bucket must be in the same AWS Region as your Amazon Q Business index.

   1. **Maximum file size - *optional*** – You can specify the file size limit in MB for Amazon Q crawling. Amazon Q crawls only files within the defined size limit. The default file size limit is 50 MB. The maximum file size limit is 10 GB.

   1. **Access control list configuration file location - *optional*** – The path to the location of a file containing a JSON structure that specifies access settings for the files stored in your S3 data source.
      + If you selected **This account**, you can select **Browse S3** to locate your ACL file.
      + If you selected **Other account**, you must manually enter the file path as the browse option is not available for cross-account buckets.

   1. **Metadata files folder location - *optional*** – The path to the folder in which your metadata is stored.
      + If you selected **This account**, you can select **Browse S3** to locate your metadata folder.
      + If you selected **Other account**, you must manually enter the folder path as the browse option is not available for cross-account buckets.

   1. **Filter patterns** – Add regex patterns to include or exclude documents from your index. 

      To include or exclude files and folders, you can use a prefix filter (for example `Data/`, where `Data` is a folder containing documents in your S3 bucket). You can also filter using glob patterns and file types.

   1. **Multi-media content configuration – optional** – To enable content extraction from embedded images and visuals in documents, choose **Visual content in documents**. For more information, see [Extracting semantic meaning from embedded images and visuals](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/extracting-meaning-from-images.html).

      To extract audio transcriptions, enable **Audio files**. To extract video content, enable **Video files**. For more information, see [Extracting semantic meaning from audio and video content](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/Audio-video-extraction.html).

   1. **Advanced settings**

      **Document deletion safeguard** – *optional* – To safeguard your documents from deletion during a sync job, select **On** and enter an integer between 0 and 100. If the percentage of documents to be deleted in your sync job exceeds the percentage you entered, the delete phase is skipped and no documents from this data source are deleted from your index. For more information, see [Document deletion safeguard](connector-concepts.md#document-deletion-safeguard).

1. In **Sync run schedule**, for **Frequency** – Choose how often Amazon Q will sync with your data source. For more details, see [Sync run schedule](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/connector-concepts.html#connector-sync-run). To learn how to start a data sync job, see [Starting data source connector sync jobs](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/supported-datasource-actions.html#start-datasource-sync-jobs).

1. **Tags - *optional*** – Add tags to search and filter your resources or track your AWS costs. See [Tags](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/tagging.html) for more details.

1. In **Data source details**, choose **Sync now** to allow Amazon Q to begin syncing (crawling and ingesting) data from your data source. When the sync job finishes, your data source is ready to use.
**Note**  
View CloudWatch logs for your data source sync job by selecting **View CloudWatch logs**. If you encounter a `Resource not found exception` error, wait and try again as logs may not be available immediately.  
You can also view a detailed document-level report by selecting **View Report**. This report shows the status of each document during the crawl, sync, and index stages, including any errors. If the report is empty for an in-progress job, check back later as data is emitted to the report as events occur during the sync process.  
For more information, see [Troubleshooting data source connectors](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/troubleshooting-data-sources.html#troubleshooting-data-sources-not-indexed).
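The prefix and glob filtering described under **Filter patterns** above can be sketched with Python's standard library. The object keys, prefixes, and patterns below are hypothetical examples, and this sketch only approximates the connector's actual matching semantics:

```python
from fnmatch import fnmatch

# Hypothetical object keys in an S3 bucket.
keys = ["Data/report.pdf", "Data/notes.txt", "logs/app.log", "Data/archive.tmp"]

inclusion_prefixes = ["Data/"]           # only crawl under this folder
exclusion_patterns = ["*.tmp", "*.log"]  # skip temporary and log files

def should_index(key):
    """Mimic prefix + glob filtering: include by prefix, then exclude by pattern."""
    if inclusion_prefixes and not any(key.startswith(p) for p in inclusion_prefixes):
        return False
    return not any(fnmatch(key, pat) for pat in exclusion_patterns)

selected = [k for k in keys if should_index(k)]
print(selected)  # ['Data/report.pdf', 'Data/notes.txt']
```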

# Connecting Amazon Q Business to Amazon S3 using APIs
<a name="s3-v2-api"></a>

You use the [CreateDataSource](https://docs.aws.amazon.com/amazonq/latest/api-reference/API_CreateDataSource.html) action to connect a data source to your Amazon Q application. You can also use the [UpdateDataSource](https://docs.aws.amazon.com/amazonq/latest/api-reference/API_UpdateDataSource.html) action to modify an existing data source configuration.

Then, you use the `configuration` parameter to provide a JSON blob that conforms to the AWS-defined JSON schema.

For an example of the API request, see [CreateDataSource](https://docs.aws.amazon.com/amazonq/latest/api-reference/API_CreateDataSource.html) and [UpdateDataSource](https://docs.aws.amazon.com/amazonq/latest/api-reference/API_UpdateDataSource.html) in the Amazon Q API Reference.
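As a minimal sketch, the `configuration` blob is an ordinary JSON-serializable structure that you assemble before calling `CreateDataSource` (for example through the AWS SDK for Python). The bucket name, account ID, and identifiers below are placeholders:

```python
import json

# Assemble the S3V2 configuration blob; all values here are placeholder examples.
configuration = {
    "type": "S3V2",
    "connectionConfiguration": {
        "bucketName": "my-company-data-bucket",
        "bucketOwnerAccountId": "123456789012",
    },
    "filterConfiguration": {
        "inclusionPrefixes": ["documents/"],
        "maxFileSizeInMegaBytes": 100,
    },
}

# With boto3 installed and credentials configured, the call would look roughly
# like this (not executed here; IDs and role ARN are hypothetical):
#   boto3.client("qbusiness").create_data_source(
#       applicationId="app-id", indexId="index-id",
#       displayName="MyS3DataSourceV2", configuration=configuration,
#       roleArn="arn:aws:iam::123456789012:role/example-role")

print(json.dumps(configuration, indent=2))
```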

**Topics**
+ [Amazon S3 configuration properties](#s3-v2-configuration-keys)
+ [Amazon S3 JSON schema](#s3-v2-api-json)
+ [Amazon S3 JSON schema example](#s3-v2-api-json-example)

## Amazon S3 configuration properties
<a name="s3-v2-configuration-keys"></a>

The following provides information about important configuration properties required in the schema.


| Configuration | Description | Type | Required | 
| --- | --- | --- | --- | 
| `type` | The type of data source. Specify `S3V2` as your data source type. | `string` The only allowed value is `S3V2`. | Yes | 
| `connectionConfiguration` | Configuration information for the endpoint for the data source. | `object` This property has sub-properties: `bucketName` and `bucketOwnerAccountId`. | Yes | 
| `bucketName` | The name of your Amazon S3 bucket. This is a sub-property for the `connectionConfiguration`. | `string` | Yes | 
| `bucketOwnerAccountId` | The 12-digit AWS account ID that owns the S3 bucket. This is a sub-property for the `connectionConfiguration`. | `string` Must match pattern: `^\d{12}$` | Yes | 
| `filterConfiguration` | Configuration for filtering which files to include or exclude from indexing. | `object` This property has sub-properties for patterns, prefixes, and file size limits. | No | 
| `inclusionPatterns` | File patterns to include during indexing. This is a sub-property for the `filterConfiguration`. | `array` of `string` | No | 
| `exclusionPatterns` | File patterns to exclude during indexing. This is a sub-property for the `filterConfiguration`. | `array` of `string` | No | 
| `inclusionPrefixes` | S3 key prefixes to include during indexing (e.g., documents/, reports/). This is a sub-property for the `filterConfiguration`. | `array` of `string` | No | 
| `exclusionPrefixes` | S3 key prefixes to exclude during indexing (e.g., temp/, cache/). This is a sub-property for the `filterConfiguration`. | `array` of `string` | No | 
| `maxFileSizeInMegaBytes` | Maximum file size in megabytes to index. Files larger than this will be skipped. This is a sub-property for the `filterConfiguration`. | `number` Minimum: 0, Maximum: 10240 | No | 
| `accessControlConfiguration` | Configuration for access control and permissions. | `object` This property has sub-properties for ACL configuration and default access type. | No | 
| `aclConfigurationFilePath` | Path to the ACL configuration file in your S3 bucket. This is a sub-property for the `accessControlConfiguration`. | `string` Length: 1-1024 characters | No | 
| `deletionProtectionConfiguration` | Configuration for deletion protection to prevent accidental bulk deletions. | `object` This property has sub-properties for enabling deletion protection and setting thresholds. | No | 
| `enableDeletionProtection` | Whether to enable deletion protection. This is a sub-property for the `deletionProtectionConfiguration`. | `boolean` | No | 
| `deletionProtectionThreshold` | Percentage threshold for deletion protection. If more than this percentage of documents would be deleted, the sync will be blocked. This is a sub-property for the `deletionProtectionConfiguration`. | `number` Default: 15 | No | 
| `metadataFilesPrefix` | S3 key prefix where metadata files are stored for enhanced document processing. | `string` Length: 1-1024 characters | No | 
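The effect of `deletionProtectionThreshold` can be illustrated with a small sketch. The exact evaluation inside the connector is not documented here, so treat this as an approximation of the documented behavior (the delete phase is blocked when the share of documents slated for deletion exceeds the threshold):

```python
def delete_phase_allowed(total_docs, docs_to_delete, threshold_percent=15):
    """Return True if the sync's delete phase may proceed.

    The delete phase is blocked when the percentage of documents slated
    for deletion exceeds the configured threshold.
    """
    if total_docs == 0:
        return True
    return (docs_to_delete / total_docs) * 100 <= threshold_percent

print(delete_phase_allowed(1000, 100))  # 10% <= 15% -> True
print(delete_phase_allowed(1000, 200))  # 20% > 15%  -> False
```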

## Amazon S3 JSON schema
<a name="s3-v2-api-json"></a>

The following is the Amazon S3 JSON schema with simplified configuration structure:

```
{
  "type": "object",
  "properties": {
    "type": {
      "type": "string",
      "pattern": "S3V2"
    },
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "bucketName": {
          "type": "string"
        },
        "bucketOwnerAccountId": {
          "type": "string",
          "pattern": "^\\d{12}$"
        }
      },
      "required": ["bucketName", "bucketOwnerAccountId"]
    },
    "filterConfiguration": {
      "type": "object",
      "properties": {
        "inclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionPrefixes": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionPrefixes": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "maxFileSizeInMegaBytes": {
          "type": "number",
          "minimum": 0,
          "maximum": 10240
        }
      }
    },
    "accessControlConfiguration": {
      "type": "object",
      "properties": {
        "aclConfigurationFilePath": {
          "type": "string",
          "minLength": 1,
          "maxLength": 1024
        }
      }
    },
    "deletionProtectionConfiguration": {
      "type": "object",
      "properties": {
        "enableDeletionProtection": {
          "type": "boolean"
        },
        "deletionProtectionThreshold": {
          "type": "number",
          "default": 15
        }
      }
    },
    "metadataFilesPrefix": {
      "type": "string",
      "minLength": 1,
      "maxLength": 1024
    }
  },
  "required": [
    "type",
    "connectionConfiguration"
  ]
}
```
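A few of the schema's constraints can be spot-checked with the standard library alone (a fuller check would use a JSON Schema validator). This sketch covers only the required keys, the account-ID pattern, and the file-size bounds:

```python
import re

def validate_s3v2_config(config):
    """Spot-check an S3V2 configuration against a few schema constraints."""
    errors = []
    if config.get("type") != "S3V2":
        errors.append("type must be 'S3V2'")
    conn = config.get("connectionConfiguration", {})
    if not conn.get("bucketName"):
        errors.append("connectionConfiguration.bucketName is required")
    if not re.fullmatch(r"\d{12}", conn.get("bucketOwnerAccountId", "")):
        errors.append("bucketOwnerAccountId must be a 12-digit account ID")
    size = config.get("filterConfiguration", {}).get("maxFileSizeInMegaBytes")
    if size is not None and not (0 <= size <= 10240):
        errors.append("maxFileSizeInMegaBytes must be between 0 and 10240")
    return errors

good = {"type": "S3V2",
        "connectionConfiguration": {"bucketName": "my-company-data-bucket",
                                    "bucketOwnerAccountId": "123456789012"}}
print(validate_s3v2_config(good))            # []
print(validate_s3v2_config({"type": "S3"}))  # three errors
```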

## Amazon S3 JSON schema example
<a name="s3-v2-api-json-example"></a>

The following is the Amazon S3 JSON schema example with simplified configuration:

```
{
  "type": "S3V2",
  "connectionConfiguration": {
    "bucketName": "my-company-data-bucket",
    "bucketOwnerAccountId": "123456789012"
  },
  "filterConfiguration": {
    "inclusionPrefixes": ["documents/", "reports/"],
    "exclusionPrefixes": ["temp/", "cache/"],
    "maxFileSizeInMegaBytes": 100
  },
  "accessControlConfiguration": {
    "aclConfigurationFilePath": "config/acl-config.json"
  },
  "deletionProtectionConfiguration": {
    "enableDeletionProtection": true,
    "deletionProtectionThreshold": 15
  },
  "metadataFilesPrefix": "metadata/"
}
```

# Connecting Amazon Q Business to Amazon S3 using AWS CloudFormation
<a name="s3-v2-cfn"></a>

You use the [`AWS::QBusiness::DataSource`](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-qbusiness-datasource.html) resource to connect a data source to your Amazon Q application.

Use the [`Configuration`](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-resource-qbusiness-datasource.html#cfn-qbusiness-datasource-configuration) property to provide a JSON or YAML schema with the necessary configuration details specific to your data source connector.

To learn more about AWS CloudFormation, see [What is AWS CloudFormation?](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/Welcome.html) in the *CloudFormation User Guide*.

**Topics**
+ [Amazon S3 configuration properties](#s3-v2-configuration-keys)
+ [Amazon S3 JSON schema for using the configuration property with AWS CloudFormation](#s3-v2-cfn-json)
+ [Amazon S3 YAML schema for using the configuration property with AWS CloudFormation](#s3-v2-cfn-yaml)

## Amazon S3 configuration properties
<a name="s3-v2-configuration-keys"></a>

The following provides information about important configuration properties required in the schema.


| Configuration | Description | Type | Required | 
| --- | --- | --- | --- | 
| `type` | The type of data source. Specify `S3V2` as your data source type. | `string` The only allowed value is `S3V2`. | Yes | 
| `connectionConfiguration` | Configuration information for the endpoint for the data source. | `object` This property has sub-properties: `bucketName` and `bucketOwnerAccountId`. | Yes | 
| `bucketName` | The name of your Amazon S3 bucket. This is a sub-property for the `connectionConfiguration`. | `string` | Yes | 
| `bucketOwnerAccountId` | The 12-digit AWS account ID that owns the S3 bucket. This is a sub-property for the `connectionConfiguration`. | `string` Must match pattern: `^\d{12}$` | Yes | 
| `filterConfiguration` | Configuration for filtering which files to include or exclude from indexing. | `object` This property has sub-properties for patterns, prefixes, and file size limits. | No | 
| `inclusionPatterns` | File patterns to include during indexing. This is a sub-property for the `filterConfiguration`. | `array` of `string` | No | 
| `exclusionPatterns` | File patterns to exclude during indexing. This is a sub-property for the `filterConfiguration`. | `array` of `string` | No | 
| `inclusionPrefixes` | S3 key prefixes to include during indexing (e.g., documents/, reports/). This is a sub-property for the `filterConfiguration`. | `array` of `string` | No | 
| `exclusionPrefixes` | S3 key prefixes to exclude during indexing (e.g., temp/, cache/). This is a sub-property for the `filterConfiguration`. | `array` of `string` | No | 
| `maxFileSizeInMegaBytes` | Maximum file size in megabytes to index. Files larger than this will be skipped. This is a sub-property for the `filterConfiguration`. | `number` Minimum: 0, Maximum: 10240 | No | 
| `accessControlConfiguration` | Configuration for access control and permissions. | `object` This property has sub-properties for ACL configuration and default access type. | No | 
| `aclConfigurationFilePath` | Path to the ACL configuration file in your S3 bucket. This is a sub-property for the `accessControlConfiguration`. | `string` Length: 1-1024 characters | No | 
| `deletionProtectionConfiguration` | Configuration for deletion protection to prevent accidental bulk deletions. | `object` This property has sub-properties for enabling deletion protection and setting thresholds. | No | 
| `enableDeletionProtection` | Whether to enable deletion protection. This is a sub-property for the `deletionProtectionConfiguration`. | `boolean` | No | 
| `deletionProtectionThreshold` | Percentage threshold for deletion protection. If more than this percentage of documents would be deleted, the sync will be blocked. This is a sub-property for the `deletionProtectionConfiguration`. | `number` Default: 15 | No | 
| `metadataFilesPrefix` | S3 key prefix where metadata files are stored for enhanced document processing. | `string` Length: 1-1024 characters | No | 

## Amazon S3 JSON schema for using the configuration property with AWS CloudFormation
<a name="s3-v2-cfn-json"></a>

The following is the Amazon S3 JSON schema and examples for the configuration property for AWS CloudFormation.

**Topics**
+ [Amazon S3 JSON schema for using the configuration property with AWS CloudFormation](#s3-v2-cfn-json-schema)
+ [Amazon S3 JSON schema example for using the configuration property with AWS CloudFormation](#s3-v2-cfn-json-example)

### Amazon S3 JSON schema for using the configuration property with AWS CloudFormation
<a name="s3-v2-cfn-json-schema"></a>

The following is the Amazon S3 JSON schema for the configuration property for CloudFormation.

```
{
  "type": "object",
  "properties": {
    "type": {
      "type": "string",
      "pattern": "S3V2"
    },
    "connectionConfiguration": {
      "type": "object",
      "properties": {
        "bucketName": {
          "type": "string"
        },
        "bucketOwnerAccountId": {
          "type": "string",
          "pattern": "^\\d{12}$"
        }
      },
      "required": ["bucketName", "bucketOwnerAccountId"]
    },
    "filterConfiguration": {
      "type": "object",
      "properties": {
        "inclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionPatterns": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "inclusionPrefixes": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "exclusionPrefixes": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "maxFileSizeInMegaBytes": {
          "type": "number",
          "minimum": 0,
          "maximum": 10240
        }
      }
    },
    "accessControlConfiguration": {
      "type": "object",
      "properties": {
        "aclConfigurationFilePath": {
          "type": "string",
          "minLength": 1,
          "maxLength": 1024
        }
      }
    },
    "deletionProtectionConfiguration": {
      "type": "object",
      "properties": {
        "enableDeletionProtection": {
          "type": "boolean"
        },
        "deletionProtectionThreshold": {
          "type": "number",
          "default": 15
        }
      }
    },
    "metadataFilesPrefix": {
      "type": "string",
      "minLength": 1,
      "maxLength": 1024
    }
  },
  "required": [
    "type",
    "connectionConfiguration"
  ]
}
```

### Amazon S3 JSON schema example for using the configuration property with AWS CloudFormation
<a name="s3-v2-cfn-json-example"></a>

The following is the Amazon S3 JSON example for the Configuration property for CloudFormation.

```
{
  "type": "S3V2",
  "connectionConfiguration": {
    "bucketName": "my-company-data-bucket",
    "bucketOwnerAccountId": "123456789012"
  },
  "filterConfiguration": {
    "inclusionPatterns": ["*.pdf", "*.docx", "*.txt"],
    "exclusionPatterns": ["*.tmp", "*.log"],
    "inclusionPrefixes": ["documents/", "reports/"],
    "exclusionPrefixes": ["temp/", "cache/"],
    "maxFileSizeInMegaBytes": 100
  },
  "accessControlConfiguration": {
    "aclConfigurationFilePath": "config/acl-config.json"
  },
  "deletionProtectionConfiguration": {
    "enableDeletionProtection": true,
    "deletionProtectionThreshold": 15
  },
  "metadataFilesPrefix": "metadata/"
}
```

## Amazon S3 YAML schema for using the configuration property with AWS CloudFormation
<a name="s3-v2-cfn-yaml"></a>

The following is the Amazon S3 YAML schema and examples for the configuration property for AWS CloudFormation:

**Topics**
+ [Amazon S3 YAML schema example for using the configuration property with AWS CloudFormation](#s3-v2-cfn-yaml-example)

### Amazon S3 YAML schema example for using the configuration property with AWS CloudFormation
<a name="s3-v2-cfn-yaml-example"></a>

The following is the Amazon S3 YAML example for the Configuration property for CloudFormation:

```
AWSTemplateFormatVersion: "2010-09-09"
Description: "CloudFormation Amazon S3 Data Source Template"
Resources:
  DataSourceS3V2:
    Type: "AWS::QBusiness::DataSource"
    Properties:
      ApplicationId: app12345-1234-1234-1234-123456789012
      IndexId: indx1234-1234-1234-1234-123456789012
      DisplayName: MyS3DataSourceV2
      RoleArn: arn:aws:iam::123456789012:role/qbusiness-data-source-role
      Configuration:
        type: S3V2
        connectionConfiguration:
          bucketName: my-company-data-bucket
          bucketOwnerAccountId: "123456789012"
        filterConfiguration:
          inclusionPatterns:
            - "*.pdf"
            - "*.docx"
            - "*.txt"
          exclusionPatterns:
            - "*.tmp"
            - "*.log"
          inclusionPrefixes:
            - "documents/"
            - "reports/"
          exclusionPrefixes:
            - "temp/"
            - "cache/"
          maxFileSizeInMegaBytes: 100
        accessControlConfiguration:
          aclConfigurationFilePath: "config/acl-config.json"
        deletionProtectionConfiguration:
          enableDeletionProtection: true
          deletionProtectionThreshold: 15
        metadataFilesPrefix: "metadata/"
```

# IAM role
<a name="s3-v2-iam-role"></a>

Whether you use the AWS Management Console or the [CreateDataSource](https://docs.aws.amazon.com/amazonq/latest/api-reference/API_CreateDataSource.html) API, you must provide an IAM role that allows Amazon Q Business to access your Amazon S3 bucket.

If you use the AWS CLI or an AWS SDK, you must create an AWS Identity and Access Management (IAM) policy before you create an Amazon Q Business resource. When you call the [CreateDataSource](https://docs.aws.amazon.com/amazonq/latest/api-reference/API_CreateDataSource.html) operation, you provide the Amazon Resource Name (ARN) of the role with the policy attached.

If you use the AWS Management Console, you can create a new IAM role in the Amazon Q console or use an existing IAM role while creating your data source.

**Note**  
To learn how to create an IAM role, see [Create a role to delegate permissions to an AWS service](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html).

Cross-account Amazon S3 buckets are supported with Amazon Q Business. However, your bucket must be located in the same AWS Region as your Amazon Q Business index, and your index must have permissions to access the bucket containing your documents.

When you use an Amazon S3 bucket as a data source, you must provide a role that has permissions to:
+ Access your Amazon S3 bucket.
+ Call the [BatchPutDocument](https://docs.aws.amazon.com/amazonq/latest/api-reference/API_BatchPutDocument.html) and [BatchDeleteDocument](https://docs.aws.amazon.com/amazonq/latest/api-reference/API_BatchDeleteDocument.html) API operations in order to ingest documents.
+ Call the Principal Store APIs needed to ingest access control and identity information from documents.

**To allow Amazon Q to use an Amazon S3 bucket as a data source, use the following role policy:**

```
{
  "Version": "2012-10-17",		 	 	 ,
  "Statement": [
    {
      "Sid": "AllowsAmazonQToGetObjectfromS3",
      "Action": [
        "s3:GetObject"
      ],
      "Resource": [
        "arn:aws:s3:::{{input_bucket_name}}/*"
      ],
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "aws:ResourceAccount": "{{account_id}}"
        }
      }
    },
    {
      "Sid": "AllowsAmazonQToListS3Buckets",
      "Action": [
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::{{input_bucket_name}}"
      ],
      "Effect": "Allow",
      "Condition": {
        "StringEquals": {
          "aws:ResourceAccount": "{{account_id}}"
        }
      }
    },
    {
      "Sid": "AllowsAmazonQToIngestDocuments",
      "Effect": "Allow",
      "Action": [
        "qbusiness:BatchPutDocument",
        "qbusiness:BatchDeleteDocument"
      ],
      "Resource": "arn:aws:qbusiness:{{region}}:{{source_account}}:application/{{application_id}}/index/{{index_id}}"
    },
    {
      "Sid": "AllowsAmazonQToCallPrincipalMappingAPIs",
      "Effect": "Allow",
      "Action": [
        "qbusiness:PutGroup",
        "qbusiness:CreateUser",
        "qbusiness:DeleteGroup",
        "qbusiness:UpdateUser",
        "qbusiness:ListGroups"
      ],
      "Resource": [
        "arn:aws:qbusiness:{{region}}:{{account_id}}:application/{{application_id}}",
        "arn:aws:qbusiness:{{region}}:{{account_id}}:application/{{application_id}}/index/{{index_id}}",
        "arn:aws:qbusiness:{{region}}:{{account_id}}:application/{{application_id}}/index/{{index_id}}/data-source/*"
      ]
    },
    {
      "Sid": "AllowsAmazonQToPassCustomerRole",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole"
      ],
      "Resource": [
        "arn:aws:iam::{{account_id}}:role/QBusiness-DataSource-*"
      ],
      "Condition": {
        "StringEquals": {
          "iam:PassedToService": "qbusiness.amazonaws.com"
        }
      }
    }
  ]
}
```
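The `{{...}}` placeholders in the policy above are meant to be replaced with your own values before the policy is attached to the role. A minimal substitution sketch, using hypothetical values and a shortened statement, is:

```python
import json
import re

# Shortened statement template with the same {{name}} placeholder style
# used in the policy above.
template = """{
  "Sid": "AllowsAmazonQToGetObjectfromS3",
  "Effect": "Allow",
  "Action": ["s3:GetObject"],
  "Resource": ["arn:aws:s3:::{{input_bucket_name}}/*"],
  "Condition": {"StringEquals": {"aws:ResourceAccount": "{{account_id}}"}}
}"""

values = {"input_bucket_name": "my-company-data-bucket",
          "account_id": "123456789012"}  # hypothetical values

def render(template, values):
    """Replace every {{name}} placeholder with its configured value."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values[m.group(1)], template)

statement = json.loads(render(template, values))
print(statement["Resource"])  # ['arn:aws:s3:::my-company-data-bucket/*']
```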

**If the documents in the Amazon S3 bucket are encrypted, you must provide the following permissions to use the AWS KMS key to decrypt the documents:**

```
{
      "Sid": "AllowsAmazonQToDecryptSecret",
      "Effect": "Allow",
      "Action": [
        "kms:Decrypt"
      ],
      "Resource": [
        "arn:aws:kms:{{region}}:{{account_id}}:key/[[key_id]]"
      ],
      "Condition": {
        "StringLike": {
          "kms:ViaService": [
            "secretsmanager.*.amazonaws.com"
          ]
        }
      }
    }
```

**To allow Amazon Q to assume a role, use the following trust policy:**

```
{
  "Version": "2012-10-17",		 	 	 ,
  "Statement": [
    {
      "Sid": "AllowsAmazonQToAssumeRoleForServicePrincipal",
      "Effect": "Allow",
      "Principal": {
        "Service": "qbusiness.amazonaws.com"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "{{source_account}}"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:qbusiness:{{region}}:{{source_account}}:application/{{application_id}}"
        }
      }
    }
  ]
}
```

# Clickable URLs
<a name="s3-v2-clickable-links-shared"></a>

The Clickable URL feature allows end users to access source documents through citation links in chat responses, regardless of whether a source URI is configured. 

This feature improves the verification experience by making all documents of supported data source types accessible. Currently, clickable links are supported only for Amazon S3, custom connectors, file upload, and direct `BatchPutDocument` ingestion of documents.

**Configuration Requirements:** While this feature works automatically for new applications, existing customers may need additional configuration:
+ If you already use an Amazon S3 data source for your Amazon Q Business application, you must perform a full sync of the data source for the clickable URLs feature to be available to your users.
+ If you already use an Amazon Q Business web experience, you may need to grant additional permissions to the IAM role for the web experience. See the troubleshooting section below for details.

**Download Concurrency Limit:** For information about file size limits, see [Quotas and regions](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/quotas-regions.html).

**Access Control:** The Clickable URL respects all access control settings:
+ If a user's access to a file is revoked after they've viewed it in a chat, then once a resync is performed, subsequent attempts to access the file are denied with a clear error message.
+ If a file is updated after a chat reference, then once a resync is performed, clicking the link retrieves the current version of the file.
+ If a file is deleted and a resync is performed, users receive a clear error message indicating that the file no longer exists.

# Troubleshooting clickable links
<a name="s3-v2-clickable-links-troubleshooting"></a>

This section helps you resolve errors that you might encounter when using clickable links for source references in your conversations with your Amazon Q Business AI assistant.

## Full sync required
<a name="s3-v2-full-sync-required"></a>

**Note**  
The following suggested troubleshooting steps apply only to existing accounts using the Amazon S3 connector.

**Issue**: When you try to access referenced URLs from an Amazon S3 or uploaded files data source, you receive the following error message.

**Error message**:

```
This document cannot be downloaded because the raw document download feature requires a full connector sync performed after 07/02/2025. Your admin has not yet completed this full sync. Please contact your admin to request a complete sync of the data source.
```

**Solution**: This error occurs when the data source hasn't completed a full sync after the clickable links feature was enabled. To resolve this issue:
+ For S3 data sources: Perform a full sync of the S3 data source.
+ For uploaded files data sources: Delete the files from the upload files data source and upload them again.

## Permission changes
<a name="s3-v2-permission-changes"></a>

**Issue**: When browsing conversation history, you click on a reference URL from an Amazon S3 data source but can't view or download the file.

**Error message**:

```
You no longer have permission to access this document. The access permissions for this document have been changed since you last accessed it. Please contact your admin if you believe you should have access to this content.
```

**Solution**: This error occurs when the permissions for the document in the ACLs on the Amazon S3 bucket changed after your conversation, removing your access to the file. The ACLs were updated in the Amazon Q Business index during a subsequent data source sync. If you believe you should have access to the document, contact your administrator to:
+ Review and update the ACLs
+ Perform a data source sync

## Document not found
<a name="s3-v2-document-not-found"></a>

**Issue**: When browsing conversation history, you click on a reference URL from an Amazon S3 or upload files data source but can't view or download the file.

**Error message**:

```
The document you're trying to access no longer exists in the data source. It may have been deleted or moved since it was last referenced. Please check with the admin if you need access to this document.
```

**Solution**: This error occurs when the document was deleted from the Amazon S3 bucket, moved to a different location, or deleted from the upload files data source after your conversation. The document was also removed from the Amazon Q Business index and staging bucket during a subsequent data source sync. If you believe the document shouldn't have been deleted, contact your administrator to restore the document and perform a data source sync.

## Insufficient permissions
<a name="s3-v2-insufficient-permissions"></a>

**Note**  
The following suggested troubleshooting steps apply only to existing accounts using the Amazon S3 connector.

**Issue**: When you click on a reference URL from an Amazon S3 or upload files data source, you can't view or download the file.

**Error message**:

```
Unable to download this document because your Web Experience lacks the required permissions. Your admin needs to update the IAM role for the Web Experience to include permissions for the GetDocumentContent API. Please contact your admin to request this IAM role update.
```

**Solution**: This error occurs when the web experience doesn't have the required permissions to invoke the `GetDocumentContent` API. Your administrator can resolve this error by updating the IAM role for the web experience as described below.

If you already use an Amazon Q Business web experience, add the following permissions to the [IAM role](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles.html) for the Amazon Q Business web experience:

```
{
    "Sid": "QBusinessGetDocumentContentPermission",
    "Effect": "Allow",
    "Action": ["qbusiness:GetDocumentContent"],
    "Resource": [
        "arn:aws:qbusiness:{{region}}:{{source_account}}:application/{{application_id}}",
        "arn:aws:qbusiness:{{region}}:{{source_account}}:application/{{application_id}}/index/*"
    ]
}
```

# Adding document metadata in Amazon S3
<a name="s3-metadata-v2"></a>

To customize chat results for your end users, you can add metadata or document attributes to documents in an Amazon S3 bucket by using a metadata file. Metadata is additional information about a document, such as its title and the date and time it was created. To learn more about metadata in Amazon Q Business, see [Document attributes in Amazon Q Business](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/doc-attributes.html).

Amazon Q Business supports source attribution with citations. If you specify the `_source_uri` metadata field when you add metadata to your Amazon S3 bucket, the source attribution links returned by Amazon Q Business in the chat results will direct users to the configured URL. If you don't specify a `_source_uri`, users can still access the source documents through clickable citation links that will download the file at query time. This allows users to verify information even when no source URI is configured.

**Topics**
+ [Document metadata location](#s3-metadata-location-v2)
+ [Document metadata structure](#s3-metadata-structure-v2)

## Document metadata location
<a name="s3-metadata-location-v2"></a>

In Amazon S3, each metadata file can be associated with an indexed document. Your metadata files must be stored in the same Amazon S3 bucket as your indexed files. You can specify a location within the Amazon S3 bucket for your metadata files by using the AWS Management Console. Or, you can use the `metadataFilesPrefix` field of the Amazon S3 `configuration` parameter in the JSON schema when you create an Amazon S3 data source using the [CreateDataSource](https://docs.aws.amazon.com/amazonq/latest/api-reference/API_CreateDataSource.html) API.

If you don't specify an Amazon S3 prefix, your metadata files must be stored in the same location as your indexed documents. If you specify an Amazon S3 prefix for your metadata files, they must be in a directory structure parallel to your indexed documents. Amazon Q looks only in the specified directory for your metadata. If the metadata isn't read, check that the directory location matches the location of your metadata.

The following examples show how the indexed document location maps to the metadata file location. The document's Amazon S3 key is appended to the metadata's Amazon S3 prefix and then suffixed with `.metadata.json` to form the metadata file's Amazon S3 path.

**Note**  
The combined Amazon S3 key, the metadata's Amazon S3 prefix, and the `.metadata.json` suffix must total no more than 1,024 characters. We recommend keeping your Amazon S3 key under 1,000 characters to leave room for the prefix and suffix.

```
Bucket name:
     s3://bucketName
Document path:
     documents
Metadata path:
     none
File mapping
     s3://bucketName/documents/file.txt -> 
        s3://bucketName/documents/file.txt.metadata.json
```

```
Bucket name:
     s3://bucketName
Document path:
     documents/legal
Metadata path:
     metadata
File mapping
     s3://bucketName/documents/legal/file.txt -> 
        s3://bucketName/metadata/documents/legal/file.txt.metadata.json
```
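The mapping shown above can be sketched with a small helper. This is an illustrative sketch only; `metadata_key` is not part of any AWS SDK:

```python
def metadata_key(document_key: str, metadata_prefix: str = "") -> str:
    """Return the S3 key of the metadata file for an indexed document.

    The document key is appended to the optional metadata prefix and
    suffixed with ".metadata.json", mirroring the mapping shown above.
    """
    if metadata_prefix and not metadata_prefix.endswith("/"):
        metadata_prefix += "/"
    key = f"{metadata_prefix}{document_key}.metadata.json"
    # The combined key must stay within the 1,024-character limit.
    if len(key) > 1024:
        raise ValueError(f"metadata key exceeds 1,024 characters: {len(key)}")
    return key

# No metadata prefix: the metadata file sits next to the document.
print(metadata_key("documents/file.txt"))
# -> documents/file.txt.metadata.json

# With a "metadata" prefix: a parallel directory structure.
print(metadata_key("documents/legal/file.txt", "metadata"))
# -> metadata/documents/legal/file.txt.metadata.json
```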

## Document metadata structure
<a name="s3-metadata-structure-v2"></a>

You define your document metadata in a JSON file. The file must be a UTF-8 text file without a BOM marker. The file name of the JSON file must be `<document>.<extension>.metadata.json`, where *document* is the name of the document that the metadata applies to and *extension* is the file extension for the document. The document ID specified in each metadata file must be unique.

The content of the JSON file uses the following template:

```
{
    "DocumentId": "document ID",
    "Attributes": {
        "_authors": ["author of the document"],
        "_category": "document category",
        "_created_at": "ISO 8601 encoded string",
        "_last_updated_at": "ISO 8601 encoded string",
        "_source_uri": "document URI",
        "_version": "file version",
        "_view_count": number of times document has been viewed
    },
    "AccessControlList": [
         {
             "Name": "user name",
             "Type": "GROUP | USER",
             "Access": "ALLOW | DENY"
         }
    ],
    "Title": "document title",
    "ContentType": "PDF | HTML | MS_WORD | PLAIN_TEXT | PPT | RTF | XML | XSLT | MS_EXCEL | CSV | JSON | MD"
}
```

If you provide a metadata path, make sure that the directory structure inside the metadata directory exactly matches the directory structure of your data files. For example, if a data file is located at `s3://bucketName/documents/legal/file.txt`, its metadata file must be located at `s3://bucketName/metadata/documents/legal/file.txt.metadata.json`.

All of the attributes and fields are optional, so you don't need to include all of them. However, each attribute that you do include must have a non-empty value.

The `_created_at` and `_last_updated_at` metadata fields are ISO 8601 encoded dates. For example, `2012-03-25T12:30:10+01:00` is the ISO 8601 date-time format for March 25, 2012, at 12:30:10 PM in the Central European Time zone.

You can use the `AccessControlList` field to filter the response from a query. This way, only certain users and groups have access to documents.
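The template above can be turned into a concrete metadata file with a short script. The following sketch uses Python's standard library; the document name, attribute values, and identifiers are hypothetical examples:

```python
import json

# Example metadata for a hypothetical documents/legal/file.txt document.
# Every attribute is optional, but any attribute you include must have a
# non-empty value.
metadata = {
    "DocumentId": "legal-file-001",
    "Attributes": {
        "_authors": ["J. Doe"],
        "_category": "legal",
        "_created_at": "2012-03-25T12:30:10+01:00",  # ISO 8601 with offset
        "_source_uri": "https://example.com/docs/file.txt",
    },
    "AccessControlList": [
        {"Name": "user1@example.com", "Type": "USER", "Access": "ALLOW"},
        {"Name": "group1", "Type": "GROUP", "Access": "DENY"},
    ],
    "Title": "Example legal document",
    "ContentType": "PLAIN_TEXT",
}

# Write UTF-8 without a BOM, named <document>.<extension>.metadata.json.
with open("file.txt.metadata.json", "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=4)
```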

# How Amazon Q Business connector crawls Amazon S3 ACLs
<a name="s3-user-management-2"></a>

You add access control information to a document in an Amazon S3 data source using a metadata file associated with the document. You specify the file using the console, or by setting the `aclConfigurationFilePath` field of the `configuration` parameter when you call the `CreateDataSource` or `UpdateDataSource` API.

**Note**  
The ACL file and data should be stored in the same Amazon S3 bucket.

The configuration file contains a JSON structure that identifies an Amazon S3 prefix and lists the access settings for the prefix. The prefix can be a path, or it can be an individual file. If the prefix is a path, the access settings apply to all of the files in that path.

You provide three pieces of information in the file:
+ The access that the entity should have. You can use `ALLOW` or `DENY`.
+ The type of entity. You can use `USER` or `GROUP`.
+ The name of the entity.

**Important**  
The system grants ALL users access to prefixes that do NOT appear in the ACL file. 

The JSON structure for the configuration file must be in the following format:

```
[
    {
        "keyPrefix": "s3://BUCKETNAME/prefix1/",
        "aclEntries": [
            {
                "Name": "user1@example.com",
                "Type": "USER",
                "Access": "ALLOW"
            },
            {
                "Name": "group1",
                "Type": "GROUP",
                "Access": "DENY"
            }
        ]
    },
    {
        "keyPrefix": "s3://BUCKETNAME/prefix2/",
        "aclEntries": [
            {
                "Name": "user2@example.com",
                "Type": "USER",
                "Access": "ALLOW"
            },
            {
                "Name": "user1@example.com",
                "Type": "USER",
                "Access": "DENY"
            },
            {
                "Name": "group1",
                "Type": "GROUP",
                "Access": "DENY"
            }
        ]
    }
]
```
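To make the evaluation concrete, here is a minimal Python sketch of how these entries could be interpreted for a given user. Only the open-access default for prefixes that don't appear in the ACL file is stated in this section; the precedence of `DENY` over `ALLOW` is an assumption for illustration, not the connector's actual implementation:

```python
# Illustrative sketch only: models the rules described above (prefixes absent
# from the ACL file are open to all users; here an explicit DENY is assumed
# to win over ALLOW).
def is_allowed(s3_uri: str, user: str, groups: set, acl: list) -> bool:
    matched = False
    allowed = False
    for rule in acl:
        if not s3_uri.startswith(rule["keyPrefix"]):
            continue  # this rule's prefix doesn't cover the file
        matched = True
        for entry in rule["aclEntries"]:
            applies = (entry["Type"] == "USER" and entry["Name"] == user) or (
                entry["Type"] == "GROUP" and entry["Name"] in groups
            )
            if applies and entry["Access"] == "DENY":
                return False  # assumed: an explicit DENY always wins
            if applies and entry["Access"] == "ALLOW":
                allowed = True
    # Prefixes that appear nowhere in the ACL file are open to all users.
    return allowed if matched else True

# The ACL structure from the JSON example above:
acl = [
    {
        "keyPrefix": "s3://BUCKETNAME/prefix1/",
        "aclEntries": [
            {"Name": "user1@example.com", "Type": "USER", "Access": "ALLOW"},
            {"Name": "group1", "Type": "GROUP", "Access": "DENY"},
        ],
    },
    {
        "keyPrefix": "s3://BUCKETNAME/prefix2/",
        "aclEntries": [
            {"Name": "user2@example.com", "Type": "USER", "Access": "ALLOW"},
            {"Name": "user1@example.com", "Type": "USER", "Access": "DENY"},
            {"Name": "group1", "Type": "GROUP", "Access": "DENY"},
        ],
    },
]
```

With this ACL, `user1@example.com` can access files under `prefix1/` (explicit `ALLOW`), is denied under `prefix2/` (explicit `DENY`), and can access any prefix that the file doesn't list.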

For more information, see:
+ [Authorization](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/connector-concepts.html#connector-authorization)
+ [Identity crawler](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/connector-concepts.html#connector-identity-crawler)
+ [Understanding User Store](https://docs.aws.amazon.com/amazonq/latest/qbusiness-ug/connector-principal-store.html)