

# Customizing your Queries Responses
<a name="textract-using-adapters"></a>

Amazon Textract lets you customize the output of its pretrained Queries feature using adapters. You can use the [Amazon Textract Console](https://console.aws.amazon.com/textract/) to create an adapter. This adapter can then be referenced when calling the [AnalyzeDocument](API_AnalyzeDocument.md) and [StartDocumentAnalysis](API_StartDocumentAnalysis.md) operations. 

When you create an adapter using the console, you upload your own documents for the purposes of training the adapter and testing its performance. You also add queries to your documents and then annotate your documents by linking these queries to the correct response elements in your documents. Once you have created an adapter and annotated your documents, you can train the adapter, check its performance, and then use it when analyzing documents.

Adapters are modular components that are added to the existing Amazon Textract deep learning model, extending its capabilities for the tasks it's trained on. By fine-tuning a deep learning model with adapters, you can customize the output for document analysis tasks related to your specific use case. 

 To create and use an adapter, you must: 
+ Upload sample documents for training
+ Designate the training and test datasets
+ Annotate your documents with queries and responses
+ Train the adapter
+ Get the AdapterId
+ Use the adapter when calling `AnalyzeDocument`

**Uploading sample documents**

 To train the adapter, you must upload a set of sample documents representative of your use case. You can upload documents directly from your computer or from an Amazon S3 bucket. For best results, provide as many documents for training as possible (up to a maximum of 2,500 pages of training documents and 1,000 test documents). Make sure that the documents represent all aspects of your use case. You must upload a minimum of five training and five testing documents. 

**Designating training and test sets**

You must divide all of your documents into training and test sets. The training set is used to train the adapter. The adapter learns the patterns contained in these annotated documents. The test set is used to evaluate the adapter performance. 

 For more information on training and testing data, see [Preparing training and testing datasets](textract-preparing-training-testing.md). 

**Annotating documents with queries and responses**

When annotating your documents, you have two choices: You can auto-label your documents using the pretrained Queries feature and then edit the labels where needed. Alternatively, you can manually label responses for each of your document queries. 

For more information on best practices for queries, see [Best Practices for Queries](https://docs.aws.amazon.com/textract/latest/dg/bestqueries.html). 

**Train the adapter**

After you annotate the training data, you can initiate the training process for your adapter. Amazon Textract trains an adapter that's tailored to your documents. The adapter training takes 2-30 hours, depending on the size of the dataset and the AWS Region. When the training is complete, you can view the training status in the adapter details page. If the status is `training failed`, see [Debugging training failures](textract-debugging-failures-adapters.md) to debug the failure. 

**Evaluate the adapter**

After each round of adapter training, review the performance metrics in the AWS Management Console to determine how close the adapter is to your desired level of performance. You can then further improve your adapter's accuracy for your documents by uploading a new batch of training documents or by reviewing annotations for documents that have low accuracy scores. After you create an improved version of the adapter, you can use the AWS Management Console to delete any earlier adapter versions that you no longer need. 

For more information on evaluation metrics, see [Evaluating and improving your adapters](textract-evaluating-improving-adapters.md). 

**Get the AdapterId**

Once the adapter has been trained, you can get the unique ID for your adapter to use with the Amazon Textract document analysis API operations. Retrieve the AdapterId by using the [ListAdapterVersions](API_ListAdapterVersions.md) API operation, or by using the AWS Management Console. 
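As a sketch, selecting the newest usable version from a `ListAdapterVersions`-style response might look like the following. The field names follow the AdapterVersionOverview structure in the API reference, and the `"ACTIVE"` status check is an assumption about which status indicates a trained version; verify both against your SDK before relying on this:

```python
# Pick the most recent adapter version with an ACTIVE status from a list of
# version summaries, as returned by ListAdapterVersions. The status value
# "ACTIVE" is assumed to mark a successfully trained version.
from datetime import datetime, timezone

def latest_ready_version(adapter_versions):
    """Return (AdapterId, AdapterVersion) of the newest ACTIVE version, or None."""
    ready = [v for v in adapter_versions if v.get("Status") == "ACTIVE"]
    if not ready:
        return None
    newest = max(ready, key=lambda v: v["CreationTime"])
    return newest["AdapterId"], newest["AdapterVersion"]
```

With credentials configured, you would pass `response["AdapterVersions"]` from a real `list_adapter_versions` call into this helper.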

**Call the AnalyzeDocument API operation**

To apply your custom adapter, provide its ID when calling the [AnalyzeDocument](API_AnalyzeDocument.md) or [StartDocumentAnalysis](API_StartDocumentAnalysis.md) API operations. This enhances predictions on your documents. When calling these API operations, you can specify at most one adapter per page.

**Video demonstration and tutorial**

[![AWS Videos](http://img.youtube.com/vi/we-bv-NXgL0/0.jpg)](http://www.youtube.com/watch?v=we-bv-NXgL0)


# Creating adapters
<a name="textract-creating-using-adapters"></a>

Before you can train an adapter, you must create an adapter. To do so, use the [CreateAdapter](API_CreateAdapter.md) operation. After you create an adapter, get information about it with the [GetAdapter](API_GetAdapter.md) operation. Configuration elements of adapters can be updated with the [UpdateAdapter](API_UpdateAdapter.md) operation. Get a list of adapters with the [ListAdapters](API_ListAdapters.md) operation. Delete an adapter you no longer need with the [DeleteAdapter](API_DeleteAdapter.md) operation.

# Create an Adapter
<a name="textract-create-adapter"></a>

To customize the Amazon Textract base model, create an adapter. To do so, use the [CreateAdapter](API_CreateAdapter.md) operation. When calling `CreateAdapter`, you provide an AdapterName and FeatureType as input. Currently, Queries is the only supported feature type. 

When creating an adapter you can also provide a Description, Tags, and a ClientRequestToken. Finally, you can choose whether the adapter should be auto-updated with the AutoUpdate argument. After creating an adapter, you can start training it on your own sample documents by using the [CreateAdapterVersion](API_CreateAdapterVersion.md) operation. 

To create an adapter with the Amazon Textract console:
+ Sign in to the Amazon Textract console.
+ Select **Custom Queries** from the left navigation panel.
+ Select **Create adapter**.

To create an adapter with the AWS CLI or AWS SDK:
+ If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 2: Set Up the AWS CLI and AWS SDKs](setup-awscli-sdk.md).
+ Use the following code to create an adapter: 

------
#### [ CLI ]

```
aws textract create-adapter \
--adapter-name "test-w2" \
--feature-types '["QUERIES"]' \
--description 'demo'
```

------

# Get adapter
<a name="textract-get-adapter"></a>

You can retrieve configuration information for an adapter at any time by calling the [GetAdapter](API_GetAdapter.md) operation and specifying an AdapterId. GetAdapter returns information on AdapterName, Description, CreationTime, AutoUpdate status, and FeatureTypes.

To see details for your adapter with the console:
+ Sign into the AWS console for Amazon Textract.
+ Select **Custom Queries** from the navigation panel on the left.
+ From the list of **Your adapters**, select the adapter you want to view the details for.
+ Review the details for the adapter on your Adapter details page.

To see details for your adapter with the CLI/SDK:
+ If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 2: Set Up the AWS CLI and AWS SDKs](setup-awscli-sdk.md).
+ Use the following code to get details for an adapter: 

------
#### [ CLI ]

```
aws textract get-adapter \
--adapter-id "abcdef123456"
```

------

# List adapters
<a name="textract-list-adapter"></a>

You can list all of the adapters associated with your account by using the [ListAdapters](API_ListAdapters.md) operation. You can filter the list of returned adapters by the date and time of creation by using the AfterCreationTime and BeforeCreationTime arguments. You can also set a number of maximum results to return using MaxResults.

To see a list of your adapters with the console:
+ Sign into the AWS console for Amazon Textract.
+ Select **Custom Queries** from the navigation panel on the left.
+ View your adapters in the list of your adapters.

To list your adapters with the AWS CLI or AWS SDK:
+ If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 2: Set Up the AWS CLI and AWS SDKs](setup-awscli-sdk.md).
+ Use the following code to list your adapters: 

------
#### [ CLI ]

```
aws textract list-adapters 
```

------

# Update adapter
<a name="textract-update-adapter"></a>

With Amazon Textract, you can update some configuration options of an adapter. Simultaneously, you can update any adapter versions associated with the adapter. To do this, call the [UpdateAdapter](API_UpdateAdapter.md) operation and provide the operation with the AdapterId and configuration elements that you want to update. The AdapterName and FeatureTypes elements cannot be updated. 

To update an adapter with the AWS CLI or AWS SDK:
+ If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 2: Set Up the AWS CLI and AWS SDKs](setup-awscli-sdk.md).
+ Use the following code to update an adapter: 

------
#### [ CLI ]

```
aws textract update-adapter \
--adapter-id 'abcdef123456' \
--description 'demo new'
```

------

# Delete an Adapter
<a name="textract-delete-adapter"></a>

You can delete a custom Amazon Textract adapter at any time by calling the [DeleteAdapter](API_DeleteAdapter.md) API operation. To delete an adapter, provide the DeleteAdapter operation with the AdapterId of the adapter that you want to delete. Invoking DeleteAdapter deletes all adapter versions associated with the adapter ARN.

To delete an adapter with the console:
+ Sign in to the Amazon Textract console.
+ Select **Custom Queries** from the left navigation panel.
+ From the list of your adapters, select the adapter to delete.
+ Select **Delete** and follow the instructions to delete your adapter.

To delete an adapter with the AWS CLI or AWS SDK:
+ If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 2: Set Up the AWS CLI and AWS SDKs](setup-awscli-sdk.md).
+ Use the following code to delete an adapter: 

------
#### [ CLI ]

```
aws textract delete-adapter \
--adapter-id 'abcdef123456'
```

------

# Preparing training and testing datasets
<a name="textract-preparing-training-testing"></a>

**Training and Testing Datasets**  
The training dataset is the basis for creating an adapter. You must provide an annotated training dataset to train an adapter. This training dataset consists of user uploaded document pages, queries, and annotated query answers. The model learns from this dataset to improve its performance on the type of documents you provide.

The testing dataset is used to evaluate the adapter’s performance. The testing dataset is created by using a slice of the original dataset that the model hasn’t seen before. This process assesses the adapter’s performance with new data, creating accurate measurements and metrics. 

You must divide all of your documents into training and test sets. The training set is used to train the adapter. The adapter learns the patterns contained in these annotated documents. The test set is used to evaluate the adapter performance. If you upload fewer than 20 documents, split them equally between train and test. If you upload more than 20 documents, assign 70% of data to training and 30% to testing. When splitting documents in the AWS Management Console, you can let Amazon Textract automatically split your documents. Alternatively, you can manually divide your documents into training and testing sets. 
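The split rule above can be sketched as a small helper. The 50/50 and 70/30 thresholds come directly from the guidance; the function name and the rounding choice for odd counts are illustrative:

```python
import math

def split_counts(total_docs):
    """Return (training_count, testing_count) per the guidance above:
    roughly 50/50 for fewer than 20 documents, otherwise 70/30."""
    if total_docs < 20:
        train = total_docs // 2  # equal split; rounding choice is illustrative
    else:
        train = math.floor(total_docs * 0.7)
    return train, total_docs - train
```

For example, 100 uploaded documents would be split into 70 training and 30 testing documents.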

**Dataset components**  
Datasets contain the following four components, which you can prepare yourself or create by using the AWS Management Console: 
+ Images - Images can be JPEG, PNG, 1-page PDF, or 1-page TIFF. If you are submitting multipage documents, the AWS Management Console will visualize each page separately for annotation.
+ Annotation file - The annotation file follows the Amazon Textract Block structure, though it contains only QUERY and QUERY\_RESULT blocks.
+ Prelabeling files - This is the Block structure from the current Amazon Textract API response, pulled from the result of either the [DetectDocumentText](API_DetectDocumentText.md) or [AnalyzeDocument](API_AnalyzeDocument.md) operations. If you have already called Amazon Textract before and stored the result of the operation, you can provide the references to those results. Amazon Textract accepts multiple prelabeling files in case your document page has multiple response files exported from an asynchronous API.
+ Manifest file - A JSONL-based file where each line points to an annotation file, the prelabeling files, and an image or single-page PDF. Follow the manifest file format shown in the following example when structuring your manifest file. 

Manifest files contain one or more JSON lines, with each line containing information for a single image. The following is a single line in a manifest file:

```
{
    "source-ref": "s3://textract-adapters-sample-bucket-129090f9e-d51c-4034-a732-48caa3b532e7/adapters/0000000000/assets/1003_3_1.png",
    "source-ref-version": "uPNKaY_2I8dxj9Kp2sO0zDUt4q3MAJen",
    "source-ref-metadata": {
        "origin-ref": "s3://textract-adapters-sample-bucket-129090f9e-d51c-4034-a732-48caa3b532e7/adapters/0000000000/original_assets/1003_3.tiff",
        "page-number": 1
    },
    "annotations-ref": "s3://textract-adapters-sample-bucket-129090f9e-d51c-4034-a732-48caa3b532e7/adapters/0000000000/annotations/1003_3_1.png.json",
    "annotations-ref-version": "nwj_MC40zsAae_idwsdEa0r4ZQaVthGs",
    "annotations-ref-metadata": {
        "prelabeling-refs": [{
            "prelabeling-ref": "s3://textract-adapters-sample-bucket-129090f9e-d51c-4034-a732-48caa3b532e7/adapters/0000000000/prelabels/fd958ee156b5b5de1ee6101dd05263120790836856774c871b877baa35e2f373/1"
            "prelabeling-ref-version": "uPNKaY_2I8dxj9Kp2sO0zDUt4q3MAJen"
        ]},
        "assignment": "TRAINING",
        "include": true,
    },
    "schema-version": "1.0"
}
```

 Note that the manifest file contains the following info: 
+ **source-ref:** (Required) The Amazon S3 location of the image or single-page file. The format is "s3://BUCKET/OBJECT\_PATH". 
+ **source-ref-version:** (Optional) The Amazon S3 object version of the image or single-page file. 
+ **source-ref-metadata:** (Optional) Metadata about the **source-ref** when this image or single-page file is part of a multipage document. This information is helpful when you want to evaluate the adapter on multipage documents. When not specified, each **source-ref** is treated as a standalone document. 
+ *origin-ref:* (Required) The Amazon S3 location of the original multipage document. 
+ *page-number:* (Required) Page number of the **source-ref** in the original document. 
+ **annotations-ref:** (Required) The Amazon S3 location of the annotations you performed on the image. The format is "s3://BUCKET/OBJECT\_PATH". 
+ **annotations-ref-metadata:** (Required) Metadata about the annotations attribute. Holds prelabeling references, the assignment type of the manifest line item, and whether to include or exclude the document from training. 
+ *prelabeling-refs:* (Required) A list of files from the Amazon Textract asynchronous API response for the source-ref file. Each file in prelabeling-refs should contain a Block property with at most 1,000 blocks. 
+ *prelabeling-ref:* (Required) The Amazon S3 location of the automatic annotations made on the image using the Amazon Textract API. 
+ *prelabeling-ref-version:* (Optional) The Amazon S3 object version of the prelabeling file. 
+ *assignment:* (Required) Specify "TRAINING" if the image belongs to the training dataset. Otherwise, use "TESTING". 
+ *include:* (Required) Specify true to include the line item for training. Otherwise, use false. 
+ **schema-version:** (Optional) Version of the manifest file. The valid value is 1.0. 
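A minimal manifest line for a standalone single-page image can be assembled as follows. This sketch includes only the required fields described above for the standalone case (no multipage metadata or object versions), and the S3 URIs you pass in are placeholders:

```python
import json

def manifest_line(image_s3_uri, annotations_s3_uri, prelabeling_s3_uris,
                  assignment="TRAINING", include=True):
    """Build one JSON line of the manifest for a standalone document page."""
    record = {
        "source-ref": image_s3_uri,
        "annotations-ref": annotations_s3_uri,
        "annotations-ref-metadata": {
            "prelabeling-refs": [
                {"prelabeling-ref": uri} for uri in prelabeling_s3_uris
            ],
            "assignment": assignment,   # "TRAINING" or "TESTING"
            "include": include,
        },
        "schema-version": "1.0",
    }
    return json.dumps(record)
```

Each call produces one line; join the lines with newlines to form the JSONL manifest file.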

For optimal accuracy improvements, see [Best practices for Amazon Textract Custom Queries](best-practices-adapters.md).

**Annotating the documents with queries and responses**  
When annotating your documents, you can choose to auto-label your documents using the pretrained Queries feature and then edit the labels where needed. Alternatively, you can manually label responses for each of your document queries. 

When manually labeling your documents, Amazon Textract extracts the raw text from the document. After the raw text is extracted, you can use the AWS Management Console annotation interface to create queries for your documents. Link these queries to the relevant answers in your documents to establish a "ground truth" for training. 

When auto-labeling your documents, you specify the appropriate queries for your document. When you finish adding queries to your documents, Amazon Textract attempts to extract the proper elements from your documents, generating annotations. You must then verify the accuracy of these annotations, correcting any that are incorrect. By linking queries to answers, you teach the model what information is important in your documents. 

When creating queries, consider the types of questions you will have to ask to retrieve the relevant data in your documents. For more information about this response structure, see [Query Response Structures](https://docs.aws.amazon.com/textract/latest/dg/queryresponse.html). For more information on best practices for queries, see [Best Practices for Queries](https://docs.aws.amazon.com/textract/latest/dg/bestqueries.html). 

You will need to train an adapter on representative samples of your documents. When you use the AWS Management Console for annotating the documents, the console prepares these files for you automatically. 

# Training adapter versions
<a name="training-adapter-versions"></a>

After you have created an adapter and created training and testing datasets, you can train a version of that adapter using the [CreateAdapterVersion](API_CreateAdapterVersion.md) operation.

# Create adapter version
<a name="textract-create-adapter-version"></a>

To customize the Amazon Textract base model to fit your specific use cases, create an adapter. After you create an adapter, you need to train the adapter. You can start training an adapter by calling the [CreateAdapterVersion](API_CreateAdapterVersion.md) operation. You provide the operation with an AdapterId and use the DatasetConfig to specify an Amazon S3 bucket containing the dataset you want to train the adapter on. The manifest file you provide must follow a specific format. For more information, see [Preparing training and testing datasets](textract-preparing-training-testing.md). You can also provide the operation with an optional KMSKeyId, optional ClientRequestToken, or any Tags to add to the adapter version.

Running this operation requires the appropriate IAM permissions. For a sample IAM policy, see [Permissions needed for CreateAdapterVersion](security_iam_id-based-policy-examples.md#security_iam_create-adapter-version).

To create a new adapter version with the console:
+ Sign in to the Amazon Textract console.
+ Select **Custom Queries** from the left navigation panel.
+ From the list of **Your adapters**, select the adapter.
+ On the adapter details page, select **Modify the dataset**.
+ Select the **Add documents** dropdown menu and add documents to the training dataset.
+ On the following page, choose how to add your training documents (by S3 bucket or directly from your computer).
+ Choose **Add documents** to finish adding your documents to the dataset.
+ Wait until the auto-labeling is complete.
+ Review the annotations by choosing **Review Annotations**.
+ Review each document, choosing **Submit and next**.
+ After you review all annotations, choose **Train adapter** to start training the new adapter.

The number of successful trainings that can be performed per month is limited per AWS account. Refer to [Modifying Default Quotas in Amazon Textract](limits-quotas-explained.md) for more information regarding limits.

To create an adapter version with the AWS CLI or AWS SDK:
+ If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 2: Set Up the AWS CLI and AWS SDKs](setup-awscli-sdk.md).
+ Use the following code to create an adapter version: 

------
#### [ CLI ]

```
aws textract create-adapter-version \
--adapter-id "012345678910" \
--dataset-config '{"ManifestS3Object": {"Bucket":"amzn-s3-demo-source-bucket","Name":"test/sample-manifest.jsonl"}}' \
--output-config '{"S3Bucket": "amzn-s3-demo-destination-bucket", "S3Prefix": "prefix-string"}'
```

------

# Evaluating and improving your adapters
<a name="textract-evaluating-improving-adapters"></a>

Once you have finished the training process and created your adapter, it's important to evaluate how well the adapter is extracting information from your documents.

**Performance metrics**  
 Three metrics are provided in the Amazon Textract console to assist you in analyzing your adapter's performance: 

1.  Precision - Precision measures the percentage of extracted information (predictions) that are correct. The higher the precision rating, the fewer false positives there are. 

1. Recall - Recall measures the percentage of total relevant items that are successfully identified and extracted by the model. The higher the recall value, the fewer false negatives there are.

1. F1 Score - The F1 score combines precision and recall into a single metric, providing a balanced measurement for overall extraction accuracy. 

The values for these measurements range from 0 to 1, with 1 being perfect extraction.

These metrics are calculated by comparing the adapter's extractions to the "ground truth" annotations on the test set. By analyzing the F1, precision, and recall, you can determine where the adapter needs improvement.

For example, low precision means many of the model’s predictions are false positives, therefore the adapter is extracting irrelevant data. In contrast, a low recall value means that the model is missing relevant data. Using these insights, you can refine the training data and retrain the adapter to increase performance.
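Precision, recall, and F1 are standard measures; the service computes them for you by comparing extractions against the test-set ground truth, so the sketch below only illustrates the arithmetic, given counts of true positives, false positives, and false negatives:

```python
def extraction_metrics(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and F1 from extraction counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For instance, 8 correct extractions with 2 false positives and 2 misses yields precision 0.8, recall 0.8, and F1 0.8.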

You can also check the performance of your model by testing it with new documents and queries that you specify. Use the **Try Adapter** option in the console to get predictions for these documents. This way, you can evaluate the adapter with your own test queries and documents and see real-world examples of how the adapter is performing. 

You can also retrieve metrics for an adapter version by using the [GetAdapterVersion](API_GetAdapterVersion.md) operation with an SDK or the AWS CLI. Get a list of the adapter versions that you want to retrieve metrics for by using the [ListAdapterVersions](API_ListAdapterVersions.md) API operation. Delete an adapter version you no longer need with the [DeleteAdapterVersion](API_DeleteAdapterVersion.md) operation.

**Improving your model**  
Adapter deployment is an iterative process, as you’ll likely need to retrain several times to reach your target level of accuracy. After you create and train your adapter, you’ll want to test and evaluate your adapter’s performance on various metrics and queries. 

If your adapter’s accuracy is lacking in any area, add new examples of those documents to increase the adapter’s performance for those queries. Try to provide the adapter with additional, varied examples that reflect the cases where it struggles. Providing your adapter with representative, varied documents enables it to handle diverse real-world examples.

After adding new documents to your training set, retrain the adapter. Then re-evaluate on your test set and queries. Repeat this process until the adapter reaches your desired level of performance. Precision, recall, and F1 scores should gradually improve over successive training iterations. 

# List adapter versions
<a name="textract-list-adapter-version"></a>

An Amazon Textract adapter can have a number of different versions associated with it. To see which adapter versions are associated with a given adapter, call the [ListAdapterVersions](API_ListAdapterVersions.md) operation. The operation returns all versions of an adapter unless you provide filtering criteria using the optional arguments, such as AdapterId, AfterCreationTime, BeforeCreationTime, Statuses, or MaxResults.

To see a list of your adapter versions with the console:
+ Sign in to the Amazon Textract console.
+ Select **Custom Queries** from the left navigation panel.
+ From the list of your adapters, select the adapter.
+ View the adapter versions in the **Adapter versions** box.

To list adapter versions with the AWS CLI or AWS SDK:
+ If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 2: Set Up the AWS CLI and AWS SDKs](setup-awscli-sdk.md).
+ Use the following code to list adapter versions: 

------
#### [ CLI ]

```
aws textract list-adapter-versions 
```

------

# Get an Adapter version
<a name="textract-get-adapter-version"></a>

You can retrieve configuration information and the current status of an adapter version by calling the [GetAdapterVersion](API_GetAdapterVersion.md) operation. When calling GetAdapterVersion, specify the AdapterId and the AdapterVersion. This returns information about the specified adapter version so that you can check the current operational status and configuration options.

To see details for your adapter using the console:
+ Sign in to the Amazon Textract console.
+ Select **Custom Queries** from the left navigation panel.
+ From the list of your adapters, select the adapter.
+ Select the adapter version in the **Adapter versions** box.

To see details for your adapter version using the AWS CLI or AWS SDK:
+ If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 2: Set Up the AWS CLI and AWS SDKs](setup-awscli-sdk.md).
+ Use the following code to get details for an adapter version: 

------
#### [ CLI ]

```
aws textract get-adapter-version \
--adapter-id "abcdef123456" \
--adapter-version "1"
```

------

# Delete adapter version
<a name="textract-delete-adapter-version"></a>

You can delete an adapter version you're no longer using by calling [DeleteAdapterVersion](API_DeleteAdapterVersion.md). To delete an adapter version, provide the DeleteAdapterVersion operation with both the adapter's AdapterId and the specific AdapterVersion that you want to delete. Note that you cannot delete adapter versions with an "IN\_PROGRESS" status.

To delete an adapter version with the console:
+ Sign in to the Amazon Textract console.
+ Select **Custom Queries** from the left navigation panel.
+ From the list of your adapters, select the adapter.
+ Select the adapter version that you want to delete from the list of versions in the **Adapter versions** box.
+ Select **Delete** and follow the instructions to delete your adapter version.

To delete an adapter version with the AWS CLI or AWS SDK:
+ If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 2: Set Up the AWS CLI and AWS SDKs](setup-awscli-sdk.md).
+ Use the following code to delete an adapter version: 

------
#### [ CLI ]

```
aws textract delete-adapter-version \
--adapter-id "abcdef123456" \
--adapter-version "1"
```

------

# Debugging training failures
<a name="textract-debugging-failures-adapters"></a>

If you are notified on the adapter details page that training has failed, refer to the status message to understand the error and correct it. There are two types of errors: creation errors and file errors. Some status messages are returned in the console, while others are displayed in a validation file. 

The validation file that is created alongside a training job contains information on the types of errors encountered during training. If the error message states that the error is a validation error ("Status message = Manifest file contains invalid records. Consult validation error file at OutputConfig path for more details."), refer to the validation file located in the S3 output bucket you chose during adapter training. The generated validation file is named `validation_errors.jsonl`. Each line in the file corresponds to a line in the manifest file, with an error entry for each manifest line that produces an error. 
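Because each line in `validation_errors.jsonl` maps to a manifest line, a quick way to triage a failed training is to tally error names across the file. The exact schema of the validation file is not documented in this section, so the `"errors"` and `"error-name"` keys below are assumptions; adjust them to match your actual file:

```python
import json
from collections import Counter

def summarize_validation_errors(jsonl_text):
    """Tally error names across validation_errors.jsonl content.
    Assumes each line is a JSON object with an "errors" list whose items
    carry an "error-name" field (hypothetical keys; adapt to your file)."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        for err in record.get("errors", []):
            counts[err.get("error-name", "UNKNOWN")] += 1
    return counts
```

Running this over the downloaded validation file shows which error types dominate, so you can fix the most common manifest problems first.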

The following is a list of all creation errors and possible causes:


| Error name | Error description | 
| --- | --- | 
| CREATION\_ERROR | Manifest file contains invalid records. Consult validation error file at OutputConfig path for more details. | 
| CREATION\_ERROR | No manifest file found. Ensure manifest file is provided. | 
| CREATION\_ERROR | Unable to access manifest file in specified S3 bucket. | 
| CREATION\_ERROR | Manifest file located in an unsupported cross-Region S3 bucket. | 
| CREATION\_ERROR | Contents of manifest file are empty. | 
| CREATION\_ERROR | The manifest file size exceeds the maximum supported size. | 
| CREATION\_ERROR | The manifest file has too many training documents. | 
| CREATION\_ERROR | The manifest file has too many testing documents. | 
| CREATION\_ERROR | The manifest file has too few training documents. | 
| CREATION\_ERROR | The manifest file has too few testing documents. | 
| CREATION\_ERROR | The manifest file has too few training and testing documents. | 
| CREATION\_ERROR | The manifest file has too many training and testing documents. | 
| CREATION\_ERROR | The manifest file has invalid encoding. | 
| CREATION\_ERROR | Manifest file contains more training records than allowed limits. | 
| CREATION\_ERROR | Manifest file contains more testing records than allowed limits. | 
| CREATION\_ERROR | Unable to access the specified KMS key. | 
| CREATION\_ERROR | Unable to access the S3 output bucket. | 
| CREATION\_ERROR | Amazon Textract does not support cross-Region Amazon S3 resources. | 

The following is a list of file-related errors:


| Error name | Error description | 
| --- | --- | 
| ERROR\_PAGE\_COUNT\_EXCEEDS\_MAXIMUM | Number of pages for the same document exceeds the maximum limit. (This happens when the customer specified origin-ref and page-number in the source-ref metadata.) | 
| ERROR\_INVALID\_FILE | The *source-ref/annotations-ref/prelabeling-refs* file(s) is invalid. Check the S3 path and/or file properties. | 
| ERROR\_INVALID\_JSON\_LINE | The JSON line format is invalid. | 
| ERROR\_MANIFEST\_JSON\_DECODE\_ERROR | The record is not a valid JSON object. | 
| ERROR\_DUPLICATE\_SOURCE\_REF | A record with this source-ref already exists in the manifest. | 
| ERROR\_IMAGE\_TOO\_LARGE | The image resolution is too large. | 
| ERROR\_INVALID\_PAGE\_COUNT | The file is invalid. Expected the number of pages to be 1. | 
| ERROR\_INVALID\_IMAGE | Unsupported source reference file format. | 
| ERROR\_INVALID\_PDF | Unsupported PDF file. | 
| ERROR\_INVALID\_PDF\_PAGE\_TOO\_LARGE | Unsupported PDF file. PDF page exceeds maximum dimensions. | 
| ERROR\_INVALID\_TIFF | Unsupported TIFF file. | 
| ERROR\_INVALID\_TIFF\_COMPRESSION | Unsupported TIFF compression type. | 
| ERROR\_INVALID\_ANNOTATIONS | Invalid annotation or prelabeling file. | 
| ERROR\_INVALID\_ANNOTATIONS\_FILE\_FORMAT | Invalid annotations file format. | 
| ERROR\_MISSING\_ANNOTATION\_BLOCKS | Missing *PAGE/QUERY/QUERY\_RESULT* block(s). | 
| ERROR\_INVALID\_BLOCK | Invalid *QUERY/QUERY\_RESULT* block(s) found. | 
| ERROR\_FILE\_SIZE\_LIMIT\_EXCEEDED | The size of the *ref\_file\_type* file(s) exceeds the limit: *size\_limit* MB. | 
| ERROR\_INVALID\_PERMISSIONS\_DATASET\_S3\_BUCKET | Unable to access the *ref\_file\_type* file(s). | 
| ERROR\_FILE\_NOT\_FOUND | The *ref\_file\_type* file(s) is not found. | 
| ERROR\_FILE\_NOT\_FOUND\_IN\_REGION | Amazon Textract does not support cross-Region Amazon S3 resources. | 
| ERROR\_QUERY\_RESULT\_TEXT\_LENGTH\_LIMIT\_EXCEEDED | QUERY\_RESULT text length is greater than the maximum length. | 
| ERROR\_QUERY\_PER\_PAGE\_LIMIT\_EXCEEDED | Number of QUERY blocks is greater than the maximum allowed. | 
| ERROR\_INVALID\_DATA\_FORMAT | Invalid data format in *filename*. | 
| ERROR\_BLOCK\_LIMIT\_EXCEEDED | Number of *block\_type* blocks is greater than the maximum allowed. | 
| ERROR\_DUPLICATE\_ORIGIN\_REF\_PAGE\_NUMBER\_COMBINATION | A record with this origin-ref and page-number already exists in the manifest. | 
| ERROR\_INVALID\_BLOCK\_RELATIONSHIP | Invalid block relationship(s) found. | 
| ERROR\_DUPLICATED\_BLOCK\_ID | Block IDs should be unique. | 

To see API error descriptions, see the *Amazon Textract API Reference* for the appropriate operation. If an error occurs when you try to create a new adapter version with the [CreateAdapterVersion](API_CreateAdapterVersion.md) operation, see that operation’s page in the API Reference. If an error occurs when using the Amazon Textract console, read the error pop-up for information on why the operation failed.

# Using Adapters during Inference
<a name="textract-adapter-inference"></a>

After creating an adapter, you are provided with an ID and version for your custom adapter. You can provide this ID and version to [AnalyzeDocument](API_AnalyzeDocument.md) for synchronous document analysis, or the [StartDocumentAnalysis](API_StartDocumentAnalysis.md) operation for asynchronous analysis. Providing the Adapter ID will automatically integrate the adapter into the analysis process and use it to enhance predictions for your documents. 

This way, you can leverage the capabilities of `AnalyzeDocument` while customizing it to fit your needs. When multiple adapters must be applied to different pages in the same document, you can specify one or more adapters and their respective adapter versions as part of the API request. You can use the `Pages` parameter to specify which pages to apply an adapter to. 

This is similar to how the `Pages` parameter for Queries works. Note the following:
+ If a page is not specified, it is set to `["1"]` by default.
+ The following characters are valid in the parameter string: `1 2 3 4 5 6 7 8 9 - *`. Blank spaces are not valid. 
+ When using `*` to indicate all pages, it must be the only element in the list.
+ The `Pages` parameter does not overlap across adapters. A page can only have one adapter applied to it. 

See the following example: 

```
AdaptersConfig={
    'Adapters': [
        {'AdapterId': ADAPTER_ID_1, 'Version': '1', 'Pages': ['1-5']},
        {'AdapterId': ADAPTER_ID_2, 'Version': '1', 'Pages': ['6-*']}
    ]
}
```
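The page-specification rules above can be checked client-side before making a request. The following is a minimal sketch; the function name and checks are illustrative, and the service performs its own validation:

```python
def validate_pages(pages):
    """Check a Pages list against the documented rules (illustrative only)."""
    allowed = set('123456789-*')  # characters listed above; spaces are invalid
    if '*' in pages and pages != ['*']:
        # A bare '*' (all pages) must be the only element in the list.
        raise ValueError('"*" must be the only element when used alone')
    for spec in pages:
        if not spec or not set(spec) <= allowed:
            raise ValueError(f'invalid page specification: {spec!r}')
    return pages

validate_pages(['1-5'])   # ok
validate_pages(['6-*'])   # ok: '*' may end an open range
validate_pages(['*'])     # ok: all pages
```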

# Custom Queries tutorial
<a name="textract-adapters-tutorial"></a>

This tutorial shows you how to create, train, evaluate, use, and manage adapters. 

With adapters, you can improve the accuracy of the Amazon Textract API operations, customizing the model’s behavior to fit your own needs and use cases. After you create an adapter with this tutorial, you can use it when analyzing your own documents with the [AnalyzeDocument](API_AnalyzeDocument.md) API operation, and retrain the adapter for future improvements. 

 In this tutorial you’ll learn how to: 
+ Create an adapter using the AWS Management Console.
+ Create a dataset for training your adapter.
+ Annotate your training data.
+ Train your adapter on your training dataset.
+ Review your adapter’s performance.
+ Retrain your adapter.
+ Use your adapter for document analysis.
+ Delete your adapter.

## Prerequisites
<a name="textract-adapters-tutorial-prereqs"></a>

Before you begin, we recommend that you read [Creating adapters](textract-creating-using-adapters.md). 

You must also set up your AWS account and install and configure an AWS SDK. For the SDK setup instructions, see [Step 2: Set Up the AWS CLI and AWS SDKs](setup-awscli-sdk.md). 

## Create an adapter
<a name="textract-adapters-tutorial-create-adapter"></a>

Before you can train or use an adapter, you must create one. To create an adapter:

1. Sign in to the AWS Management Console and open the [Amazon Textract console](https://console.aws.amazon.com/textract/).

1. In the left pane, choose **Custom Queries**. The Amazon Textract Custom Queries landing page appears.  
![\[Self-service interface showing the Amazon Textract Custom Queries feature for improving information extraction accuracy on business documents, with icons depicting its benefits and workflow steps such as creating adapters, uploading samples, labeling, training models, and checking performance metrics.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image_1.png)

1. The Custom Queries landing page shows you a list of all your adapters and a button to create an adapter. Choose **Create adapter** to create your adapter. The number of successful trainings that can be performed per month is limited per AWS account. Refer to [Modifying Default Quotas in Amazon Textract](limits-quotas-explained.md) for more information regarding limits.  
![\[Your adapters list is empty with a "No adapters" message and a "Create adapter" button.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image_2.png)

1. On the following page, enter the adapter name, choose whether to automatically update your adapter, and optionally add tags to it. Then, select **Create adapter**. When you choose auto-update, Amazon Textract automatically updates your adapter when the pretrained Queries feature is updated. 

After you create your adapter, you can see the details for that adapter, including the adapter name and the Adapter ID. The presence of these details confirms that the adapter was created successfully. 

You can now create the datasets that will be used to train and test your adapter.
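The console flow above has a programmatic counterpart in the CreateAdapter API operation. The following sketch only assembles the request arguments (the adapter name is a placeholder; verify the field names against the API Reference before relying on them):

```python
def build_create_adapter_request(adapter_name, auto_update=True, tags=None):
    """Assemble CreateAdapter arguments for a Custom Queries adapter."""
    request = {
        'AdapterName': adapter_name,
        'FeatureTypes': ['QUERIES'],  # Custom Queries adapters customize Queries
        'AutoUpdate': 'ENABLED' if auto_update else 'DISABLED',
    }
    if tags:
        request['Tags'] = tags
    return request

# Actual call (requires boto3 and AWS credentials):
# import boto3
# textract = boto3.client('textract')
# response = textract.create_adapter(**build_create_adapter_request('my-adapter'))
# adapter_id = response['AdapterId']
```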

## Dataset creation
<a name="textract-adapters-tutorial-dataset"></a>

In this step, you create a training dataset and a test dataset by uploading images from your local computer or from an Amazon S3 bucket. For more information about datasets, see [Preparing training and testing datasets](textract-preparing-training-testing.md). 

When uploading images from your local computer, you can upload up to 30 images at one time. If you have a large number of images to upload, consider creating the datasets by importing the images from an Amazon S3 bucket. 

1. To start creating your dataset, choose your adapter from the list of adapters, and then choose **Create dataset**.  
![\[Textract Custom Queries page for my-test-adapter showing steps to create dataset, queries, verify documents, train adapter, check performance metrics, and improve adapter.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image_3.png)

1. In the **Dataset configuration** section, choose either **Manual split** or **Autosplit**. With manual split, you specify individual images as part of your training and testing datasets. With Autosplit, Amazon Textract defines your training and testing sets automatically when you upload your images. Autosplit is recommended if you are training an adapter for the first time. For now, choose **Autosplit**.   
![\[Interface showing options to create a dataset for Amazon Textract - either Manual split to provide train and test sets yourself, or Autosplit to let Textract automatically split into training and test sets. Import documents from S3 bucket or local files.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image_4.png)

1. In the Training dataset details section, you can choose **Upload documents from your computer** or **Import documents from S3 bucket**. If you choose to import your documents from an Amazon S3 bucket, provide the path to the bucket and folder that contains your training images. If you upload documents directly from your computer, note that you can only upload 30 documents at one time. For the purposes of this tutorial, choose **Upload documents from your computer**.

1. In the Test dataset details section, you can choose **Upload documents from your computer** or **Import documents from S3 bucket**. For the purposes of this tutorial, choose **Upload documents from your computer**.

1. Choose **Create dataset**.

1. After creating the dataset, you are taken to the Dataset details page, which lists all the documents in your dataset and shows which part of the dataset (train or test) each document has been assigned to under the **Dataset assigned to** column. You can also view the following: 
   + Document name
   + Document status
   + Number of pages in the document
   + Document type
   + Document size
   + If the document is part of the training set or the testing set

1. Select **Add documents to dataset** and add at least five documents to both your training and testing datasets. If you previously selected Autosplit, you can add all the documents at once. 

1. If you want to add more documents to your dataset, use the **Add documents** menu to do so. 

Before you can start training your adapter, you need to annotate your training documents with queries. This annotation creates the `annotations-ref` entries of your manifest file. After you add all your documents to the training or testing set, you can start the annotation process. 
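For reference, each document in the dataset corresponds to one JSON line in the manifest file, pairing a `source-ref` with an `annotations-ref`. The following sketch writes such a line; the bucket and paths are placeholders, and the full schema (which may include additional fields) is described in [Preparing training and testing datasets](textract-preparing-training-testing.md):

```python
import json

# Illustrative S3 paths; the real manifest schema may include more fields.
record = {
    'source-ref': 's3://my-bucket/train/check-001.png',
    'annotations-ref': 's3://my-bucket/annotations/check-001.json',
}

# Append one JSON line per document to the manifest.
with open('manifest.jsonl', 'a') as manifest:
    manifest.write(json.dumps(record) + '\n')
```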

## Annotation and verification
<a name="textract-adapters-tutorial-annotation"></a>

In this step, you assign Queries and labels to each document you uploaded to your training and test datasets. You link a Query to the relevant answers on a document page with the AWS Management Console annotation tool.

 To assign queries and answers to your documents:

1. Select Create queries from the Adapter landing page.   
![\[To auto-annotate documents, create queries for use by Textract's pre-trained model by selecting "Create queries" button.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image_5-CreateQueries.png)

1. Add a query by entering it in the text box.   
![\[Interface for creating queries on Amazon Textract, with a text box to "Specify queries that Textract can use to extract the information you need." The example query shown is "What is the check amount?"\]](http://docs.aws.amazon.com/textract/latest/dg/images/proteus-ImageCreateQueries.jpg)

1. To add more queries, choose **Add new query**. Queries can have a raw text response or a binary (Yes/No) response. To create a query with a binary response, use the advanced settings.  
![\[Interface for creating text extraction queries on document files. Allows specifying raw text or binary (yes/no) response types for each query prompt. Interface has fields for entering query text, selecting response type, and adding new queries.\]](http://docs.aws.amazon.com/textract/latest/dg/images/proteus-ImageCreateQueriesYesNo.jpg)

1. After creating your queries, you must assign labels to your documents. To set labels for your documents, select **Auto-labeling** or **Manual labeling**. Auto-labeling is recommended for your first time training the adapter. Select the **Auto-labeling** option, and then choose **Start auto-labeling**.   
![\[Interface showing auto-labelling and manual labelling options for document processing.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image_6-Choose%20labelling%20type.png)

1. The auto-labeling process will take some time to complete. When it's done, you're notified that “Auto-labeling is now completed.” After the labeling process is complete, you must verify the accuracy of the labeling. Select **Verify documents** in the Adapter details panel on the Details page, and then choose **Start reviewing** from the Dataset page.  
![\[Overview of dataset split showing 16 total documents, with 10 in the training set and 6 in the test set. No invalid files. Status indicates dataset is ready for review.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image_7-StartReviewing.png)

1. In the annotation tool, you can select individual documents and view individual pages within those documents. Under the “Review responses” section, select a query that was assigned to your document page. If the answer to the query is incorrect, you can edit the response by clicking the **Edit** button for the query.   
![\[Review responses screen showing example customer name John Doe and prompt to enter amortization type. Includes Apply and Cancel buttons.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image7_EditAnnotations.png)

   For queries with Yes/No answers, select Yes, No, or Empty. Then, choose **Apply**. 

   When editing the OCR for the label assigned to a query, choose the provided response and then draw a bounding box around part of the document image. To do so, use the “B” shortcut key or the bounding box tool on the tool bar at the bottom of the annotation tool. Then, choose **Apply**. 

   If a query should have more than one response element (answer), you can add additional responses. To do so, select the query and then choose **Add a response**. You are then prompted to draw a bounding box on the area of the document that has the answer. Confirm that the label for your bounding box is correct. 

   To add a new query for the document page, choose **Add query**. If you add a query, you must specify the query you want to add and then draw a bounding box for the query label. 

   When you're done, choose **Submit and next** to proceed to the next document and the next set of queries and responses. Repeat until you review all of your queries and responses. 

   After you review and evaluate all your queries and responses, select **Submit and close**. 

## Training
<a name="textract-adapters-tutorial-training"></a>

After you add all of your documents to the training set or the testing set and review the generated responses for your queries, you can train the adapter.

![\[Amazon Textract adapter dataset overview showing 16 total documents, 10 in training set, 6 in test set, with no invalid files. Dataset status is annotation review complete.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image8_TrainAdapter.png)


To train the adapter:

1. Start by choosing **Train adapter** on the Dataset management page.   
![\[Notification indicating you have enough documents in your dataset to train an adapter, with a "Train adapter" button.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image9_TrainButton.png)

1. While initiating the training process, you can specify an Amazon S3 bucket that will contain the output of your adapter training job. If you specify an Amazon S3 bucket location that doesn’t exist yet, the bucket path will be created for you. You can also add tags to your adapter to track it, and customize your encryption settings. Customize the adapter training to fit your needs and then choose **Train Adapter**. 

1. On the following page, choose **Train Adapter** to confirm that you want to start the training process. This will create your first version of your adapter.

After the training process starts, you can monitor the training process status on the Adapter landing page. 

You're notified when the training process completes. Then, you can evaluate the adapter’s performance by inspecting metrics. 
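The console training flow corresponds to the [CreateAdapterVersion](API_CreateAdapterVersion.md) API operation. The following sketch only assembles the request, assuming a manifest already uploaded to S3; the bucket and key names are placeholders, and field names should be verified against the API Reference:

```python
def build_training_request(adapter_id, manifest_bucket, manifest_key,
                           output_bucket, output_prefix):
    """Assemble CreateAdapterVersion arguments for an S3-hosted manifest."""
    return {
        'AdapterId': adapter_id,
        'DatasetConfig': {
            'ManifestS3Object': {'Bucket': manifest_bucket, 'Name': manifest_key},
        },
        'OutputConfig': {'S3Bucket': output_bucket, 'S3Prefix': output_prefix},
    }

# Actual call (requires boto3 and AWS credentials):
# import boto3
# textract = boto3.client('textract')
# version = textract.create_adapter_version(**build_training_request(
#     adapter_id, 'my-bucket', 'manifests/train.jsonl', 'my-bucket', 'output/'))
```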

## Evaluating adapter performance
<a name="textract-adapters-tutorial-metrics"></a>

To evaluate your adapter’s performance, use the left navigation pane to select the adapter version that you want to review.

![\[Adapter performance metrics showing F1 score, Precision, and Recall at 94.4% each.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image10_Evaluating%20adapter%20performance.png)


By examining your adapter’s metrics, you can determine how your adapter is performing on the documents in your dataset and the queries you have defined. You can see the F1 Score, Precision, and Recall for your adapter across different elements of the training data: queries, documents, and pages. To switch between performance for these elements, choose the different tabs below the metrics display pane. 

You can also view baseline metrics at any time by toggling the **Switch to baseline metrics** switch. 

The summary of your adapter version’s performance also contains some tips on how to improve your adapter’s performance. You can review these tips at any time to improve your adapter. For more information about how to manage and improve your adapter, see [Evaluating and improving your adapters](textract-evaluating-improving-adapters.md). 
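Outside the console, the same F1 score, precision, and recall figures can be retrieved with the GetAdapterVersion API operation and compared against the baseline. The following sketch summarizes such a response; the payload shape shown here is an assumption based on the API Reference and should be verified before use:

```python
def summarize_metrics(evaluation_metrics):
    """Report adapter-vs-baseline F1 deltas from a GetAdapterVersion-style
    EvaluationMetrics list (assumed shape; verify against the API Reference)."""
    summary = []
    for metric in evaluation_metrics:
        adapter = metric.get('AdapterVersion', {})
        baseline = metric.get('Baseline', {})
        summary.append({
            'feature': metric.get('FeatureType'),
            'f1': adapter.get('F1Score'),
            'f1_delta': round(adapter.get('F1Score', 0.0)
                              - baseline.get('F1Score', 0.0), 3),
        })
    return summary

# Hypothetical sample payload, for illustration only:
sample = [{'FeatureType': 'QUERIES',
           'AdapterVersion': {'F1Score': 0.944, 'Precision': 0.944, 'Recall': 0.944},
           'Baseline': {'F1Score': 0.82, 'Precision': 0.85, 'Recall': 0.80}}]
```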

To demo your adapter and see its performance on a document:

1.  Choose **Try Adapter**.   
![\[Two buttons: "Try Adapter" and a dropdown menu labeled "Ver. 1".\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image11_try%20adapter.png)

1. On the **Try adapter** page, you can choose a document to analyze with your adapter. Select the **Choose document** button and browse to the document’s location on your device. Alternatively, drag and drop the document into the **Upload a document** pane.

After uploading a document, the Try Adapter page will update to display the results of the adapter’s analysis, including queries, query answers, and confidence levels. If you are satisfied with your adapter’s performance, you can proceed to inference, using your adapter in a call to [AnalyzeDocument](API_AnalyzeDocument.md) or [StartDocumentAnalysis](API_StartDocumentAnalysis.md). Otherwise, you can improve your adapter’s performance by retraining it with additional documents. 

## Improving an adapter
<a name="textract-adapters-tutorial-improving"></a>

To improve your adapter’s performance:

1. Choose **Modify the dataset** on the Adapter details page.   
![\[Workflow diagram showing steps to create, verify, train, and improve a custom dataset for text extraction: 1. Create dataset, 2. Create queries, 3. Verify documents, 4. Train adapter, 5. Check performance metrics, 6. Improve adapter.\]](http://docs.aws.amazon.com/textract/latest/dg/images/TP-Image12_improveadapter.png)

1. On the Dataset overview page, select **Add documents**. To retrain your adapter, add at least five more documents to the training dataset.

1. You are notified that the documents are added to the dataset. Select **Start reviewing** to review the results of the auto-labeling process.

1. Review the queries and responses. After you review and approve all the annotations for the documents you added, choose **Submit and close**.

1. On the dataset management page, choose **Train adapter** to start training your adapter on all of the data in your training dataset, including the new training documents.

Every training job you run creates a new version of the adapter. Note the name of the new adapter version to be sure that you're evaluating the performance of the proper adapter version. 

## Inference
<a name="textract-adapters-tutorial-inference"></a>

After creating an adapter, you are provided with an ID for your custom adapter. You can provide this ID to the [AnalyzeDocument](API_AnalyzeDocument.md) operation for synchronous document analysis, or the [StartDocumentAnalysis](API_StartDocumentAnalysis.md) operation for asynchronous analysis. 

Providing the Adapter ID automatically integrates the adapter into the analysis process and uses it to enhance predictions for your documents. This way, you can leverage the capabilities of AnalyzeDocument while customizing it to fit your needs. 

For an example of how to run inference using your adapter and the AnalyzeDocument API operation, see [Analyzing Document Text with Amazon Textract](analyzing-document-text.md).

When multiple adapters must be applied to different pages in the same document, you can specify one or more adapters and their respective adapter versions as part of the API request. You can use the `Pages` parameter to specify which pages to apply an adapter to.

This is similar to how the `Pages` parameter for Queries works. Note the following:
+ If a page is not specified, it is set to `["1"]` by default.
+ The following characters are valid in the parameter string: `1 2 3 4 5 6 7 8 9 - *`. Blank spaces are not valid. 
+ When using `*` to indicate all pages, it must be the only element in the list.
+ The `Pages` parameter does not overlap across adapters. A page can only have one adapter applied to it. 
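Putting the pieces together, an `AnalyzeDocument` request with an adapter pairs an `AdaptersConfig` with a `QueriesConfig`. The following sketch assembles the request arguments; the adapter ID, S3 object, and query text are placeholders, and the actual call (commented) requires boto3 and AWS credentials:

```python
def build_analyze_request(bucket, name, queries, adapters):
    """Assemble AnalyzeDocument arguments for Queries with custom adapters."""
    return {
        'Document': {'S3Object': {'Bucket': bucket, 'Name': name}},
        'FeatureTypes': ['QUERIES'],
        'QueriesConfig': {'Queries': [{'Text': q} for q in queries]},
        'AdaptersConfig': {'Adapters': adapters},
    }

# Placeholder values for illustration only.
request = build_analyze_request(
    'my-bucket', 'checks/check-001.png',
    queries=['What is the check amount?'],
    adapters=[{'AdapterId': 'EXAMPLE_ADAPTER_ID', 'Version': '1', 'Pages': ['1']}],
)

# import boto3
# response = boto3.client('textract').analyze_document(**request)
```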

## Adapter management
<a name="textract-adapters-tutorial-management"></a>

The following steps are repeated iteratively (after initial training of your adapter):
+ Choose **Modify the dataset** on the Adapter details page of the Amazon Textract console.
+  Select **Add documents**. 
+ Add at least five more documents to the training dataset to retrain your adapter.
+ You're notified that the documents have been added to the dataset. Select **Start reviewing** to review the results of the auto-labeling process.
+ Review the queries and responses. After you review and approve all the annotations for the new documents, select **Train adapter**. 
+ Wait for the adapter to complete the new round of training, then check performance metrics for your new adapter version. 

After you train your model to your target performance level, you can use your adapter for inference in your application. 

 Be sure to delete adapter versions that you no longer need. To delete an adapter: 
+ Go to the Adapters landing page, select the adapter, and choose **Delete**.
+ Type **Delete** into the text box, and then choose **Delete**.

# Copying adapters
<a name="textract-copy-adapters"></a>

Adapter versions can be copied from one AWS account to another within the same AWS Region. 

To copy an adapter, you must first create an adapter in the destination AWS account using the console or the API. You are not required to train an adapter version, but the metadata (adapter name and description) for the adapter must exist. This ensures that you or your organization has access to the destination account. 

**Note**  
Your source and destination AWS accounts must be in the same AWS region to successfully copy an adapter. Please check the account regions before attempting to copy.

Once you have created an adapter, submit a support ticket with the following details. You will need a support subscription before submitting the ticket:

```
Region: xxx

Source:
AWS Account:
Adapter ID:
Adapter Version:

Destination:
AWS Account:
Adapter ID:
```

Once the adapter is copied over, you can use the destination adapter ID and version to make inference calls. You can test the inference API output using the same set of queries you used to train the source adapter. The destination adapter will return the same results as the source adapter.