

# Enhanced data labeling


Amazon SageMaker Ground Truth manages sending your data objects to workers to be labeled. Labeling each data object is a *task*. Workers complete each task until the entire labeling job is complete. Ground Truth divides the total number of tasks into smaller *batches* that are sent to workers. A new batch is sent to workers when the previous one is finished.

Ground Truth provides two features that help improve the accuracy of your data labels and reduce the total cost of labeling your data:
+ *Annotation consolidation* helps to improve the accuracy of your data object labels. It combines the results of multiple workers' annotation tasks into one high-fidelity label.
+ *Automated data labeling* uses machine learning to label portions of your data automatically without having to send them to human workers.

**Topics**
+ [Control the flow of data objects sent to workers](sms-batching.md)
+ [Annotation consolidation](sms-annotation-consolidation.md)
+ [Automate data labeling](sms-automated-labeling.md)
+ [Chaining labeling jobs](sms-reusing-data.md)

# Control the flow of data objects sent to workers

Depending on the type of labeling job you create, Amazon SageMaker Ground Truth sends data objects to workers in batches or in a streaming fashion. You can control the flow of data objects to workers in the following ways:
+ For both types of labeling jobs, you can use `MaxConcurrentTaskCount` to control the total number of data objects available to all workers at a given point in time when the labeling job is running.
+ For streaming labeling jobs, you can control the flow of data objects to workers by monitoring and controlling the number of data objects sent to the Amazon SQS queue associated with your labeling job.

Use the following sections to learn more about these options.

**Topics**
+ [Use MaxConcurrentTaskCount to control the flow of data objects](#sms-batching-maxconcurrenttaskcount)
+ [Use Amazon SQS to control the flow of data objects to streaming labeling jobs](#sms-batching-streaming-sqs)

## Use MaxConcurrentTaskCount to control the flow of data objects


[MaxConcurrentTaskCount](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-MaxConcurrentTaskCount) defines the maximum number of data objects available at one time in the worker-portal task queue. If you use the console, this parameter is set to 1,000. If you use `CreateLabelingJob`, you can set this parameter to any integer between 1 and 5,000, inclusive.

Use the following example to better understand how the number of entries in your manifest file, the `NumberOfHumanWorkersPerDataObject`, and the `MaxConcurrentTaskCount` define what tasks workers see in their task queue in the worker-portal UI.

1. You have an input manifest file with 600 entries.

1. For each entry in your input manifest file, you can use `NumberOfHumanWorkersPerDataObject` to define the number of human workers that label that entry. In this example, you set `NumberOfHumanWorkersPerDataObject` to 3. This creates 3 different tasks for each entry in your input manifest file, and at least 3 different workers must label the object for it to be marked as successfully labeled. In total, this creates 1,800 tasks (600 x 3) to be completed by workers.

1. You want workers to only see 100 tasks at a time in their queue in the worker portal UI. To do this, you set `MaxConcurrentTaskCount` to 100. Ground Truth then keeps the worker-portal task queue filled with up to 100 tasks.

1. What happens next depends on the type of labeling job you are creating, and if it is a streaming labeling job.
   + **Streaming labeling job**: As long as the total number of objects available to workers equals `MaxConcurrentTaskCount`, all remaining dataset objects from your input manifest file, and any objects that you send in real time using Amazon SNS, are placed on an Amazon SQS queue. When the total number of objects available to workers falls below `MaxConcurrentTaskCount` minus `NumberOfHumanWorkersPerDataObject`, a new data object from the queue is used to create `NumberOfHumanWorkersPerDataObject` tasks, which are sent to workers in real time.
   + **Non-streaming labeling job**: As workers finish labeling one set of objects, up to `MaxConcurrentTaskCount` times `NumberOfHumanWorkersPerDataObject` new tasks are sent to workers. This process repeats until all data objects in the input manifest file are labeled.
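The arithmetic in the example above can be sketched in a few lines. The variable names below are illustrative local stand-ins for the API parameters named in the comments, not parameters themselves:

```python
# Illustrative arithmetic for the example above.
manifest_entries = 600
workers_per_object = 3   # NumberOfHumanWorkersPerDataObject
max_concurrent = 100     # MaxConcurrentTaskCount

# Each manifest entry becomes one task per worker:
total_tasks = manifest_entries * workers_per_object   # 1800

# For a streaming job, a new data object is pulled from the Amazon SQS
# queue when the number of objects in flight drops below this point:
refill_point = max_concurrent - workers_per_object    # 97
```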

## Use Amazon SQS to control the flow of data objects to streaming labeling jobs


When you create a streaming labeling job, an Amazon SQS queue is automatically created in your account. Data objects are only added to the Amazon SQS queue when the total number of objects sent to workers is above `MaxConcurrentTaskCount`. Otherwise, objects are sent directly to workers.

You can use this queue to manage the flow of data objects to your labeling job. To learn more, see [Manage labeling requests with an Amazon SQS queue](sms-streaming-how-it-works-sqs.md).
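As a sketch of this kind of monitoring, the snippet below checks the approximate backlog of the queue with boto3. The `GroundTruth-` prefix and lowercased job name are assumptions about how the auto-created queue is named; verify the actual queue name in the Amazon SQS console before relying on it.

```python
def groundtruth_queue_name(labeling_job_name: str) -> str:
    # Assumed naming convention for the auto-created queue; check the
    # Amazon SQS console for the real name in your account.
    return "GroundTruth-" + labeling_job_name.lower()

def pending_objects(labeling_job_name: str) -> int:
    # Approximate number of data objects waiting to be sent to workers.
    import boto3  # imported here so the helper above works without boto3
    sqs = boto3.client("sqs")
    url = sqs.get_queue_url(
        QueueName=groundtruth_queue_name(labeling_job_name)
    )["QueueUrl"]
    attrs = sqs.get_queue_attributes(
        QueueUrl=url, AttributeNames=["ApproximateNumberOfMessages"]
    )
    return int(attrs["Attributes"]["ApproximateNumberOfMessages"])
```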

# Annotation consolidation


An *annotation* is the result of a single worker's labeling task. *Annotation consolidation* combines the annotations of two or more workers into a single label for your data objects. A label, which is assigned to each object in the dataset, is a probabilistic estimate of what the true label should be. Each object in the dataset typically has multiple annotations, but only one label or set of labels.

You decide how many workers annotate each object in your dataset. Using more workers can increase the accuracy of your labels, but also increases the cost of labeling. To learn more about Ground Truth pricing, see [Amazon SageMaker Ground Truth pricing ](https://aws.amazon.com/sagemaker/groundtruth/pricing/).

If you use the Amazon SageMaker AI console to create a labeling job, the following are the defaults for the number of workers who can annotate objects: 
+ Text classification—3 workers
+ Image classification—3 workers
+ Bounding boxes—5 workers
+ Semantic segmentation—3 workers
+ Named entity recognition—3 workers

When you use the [CreateLabelingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html) operation, you set the number of workers who annotate each data object with the `NumberOfHumanWorkersPerDataObject` parameter. You can override the default number of workers that annotate a data object using the console or the `CreateLabelingJob` operation.

Ground Truth provides an annotation consolidation function for each of its predefined labeling tasks: bounding box, image classification, named entity recognition, semantic segmentation, and text classification. These are the functions:
+ Multi-class annotation consolidation for image and text classification uses a variant of the [Expectation Maximization](https://en.wikipedia.org/wiki/Expectation-maximization_algorithm) approach to annotations. It estimates parameters for each worker and uses Bayesian inference to estimate the true class based on the class annotations from individual workers. 
+ Bounding box annotation consolidates bounding boxes from multiple workers. This function finds the most similar boxes from different workers based on the [Jaccard index](https://en.wikipedia.org/wiki/Jaccard_index), or intersection over union, of the boxes and averages them. 
+ Semantic segmentation annotation consolidation treats each pixel in a single image as a multi-class classification. This function treats the pixel annotations from workers as "votes," with more information from surrounding pixels incorporated by applying a smoothing function to the image.
+ Named entity recognition clusters text selections by Jaccard similarity and calculates selection boundaries based on the mode, or the median if the mode isn't clear. The label resolves to the most assigned entity label in the cluster, breaking ties by random selection.
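To make the bounding-box case concrete, here is a heavily simplified sketch of IoU-based consolidation. It is not the Ground Truth implementation, only the core idea: compute the Jaccard index (intersection over union) between boxes, then average the boxes that overlap strongly enough.

```python
def iou(a, b):
    # Jaccard index of two boxes given as (left, top, width, height).
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def consolidate(boxes, threshold=0.5):
    # Average all boxes that overlap the first annotation strongly enough.
    matched = [b for b in boxes if iou(boxes[0], b) >= threshold]
    n = len(matched)
    return tuple(sum(b[i] for b in matched) / n for i in range(4))
```

For example, two annotations of the same object, `(0, 0, 10, 10)` and `(2, 0, 10, 10)`, have an IoU of 2/3 and consolidate to `(1.0, 0.0, 10.0, 10.0)`.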

You can use other algorithms to consolidate annotations. For information, see [Annotation consolidation function creation](consolidation-lambda.md). 

# Annotation consolidation function creation


You can choose to use your own annotation consolidation function to determine the final labels for your labeled objects. There are many possible approaches for writing a function and the approach that you take depends on the nature of the annotations to consolidate. Broadly, consolidation functions look at the annotations from workers, measure the similarity between them, and then use some form of probabilistic judgment to determine what the most probable label should be.

If you want to use other algorithms to create annotation consolidations functions, you can find the worker responses in the `[project-name]/annotations/worker-response` folder of the Amazon S3 bucket where you direct the job output.

## Assess similarity


To assess the similarity between labels, you can use one of the following strategies, or you can use one that meets your data labeling needs:
+ For label spaces that consist of discrete, mutually exclusive categories, such as multi-class classification, assessing similarity can be straightforward. Discrete labels either match or do not match. 
+ For label spaces that don't have discrete values, such as bounding box annotations, find a broad measure of similarity. For bounding boxes, one such measure is the Jaccard index. This measures the ratio of the intersection of two boxes with the union of the boxes to assess how similar they are. For example, if there are three annotations, then there can be a function that determines which annotations represent the same object and should be consolidated.

## Assess the most probable label


With one of the strategies detailed in the previous sections in mind, make some sort of probabilistic judgment on what the consolidated label should be. In the case of discrete, mutually exclusive categories, this can be straightforward. One of the most common ways to do this is to take the results of a majority vote between the annotations. This weights the annotations equally. 
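A minimal majority-vote consolidation for discrete labels can be written in a couple of lines:

```python
from collections import Counter

def majority_vote(annotations):
    # Each worker's annotation counts equally; the most common label wins.
    return Counter(annotations).most_common(1)[0][0]
```

For example, `majority_vote(["cat", "dog", "cat"])` returns `"cat"`.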

Some approaches attempt to estimate the accuracy of different annotators and weight their annotations in proportion to the probability of correctness. An example of this is the Expectation Maximization method, which is used in the default Ground Truth consolidation function for multi-class annotations. 
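The simplest form of such weighting is a vote where each annotation counts in proportion to the worker's estimated accuracy. The sketch below takes the accuracies as given; a full EM approach (in the Dawid-Skene style) would instead re-estimate the accuracies and the labels together, iteratively:

```python
from collections import defaultdict

def weighted_vote(annotations, accuracy):
    # annotations: {worker_id: label}; accuracy: {worker_id: estimated
    # probability that the worker is correct}.
    scores = defaultdict(float)
    for worker, label in annotations.items():
        scores[label] += accuracy[worker]
    return max(scores, key=scores.get)
```

Here a single highly reliable worker can outvote two unreliable ones: with accuracies 0.9, 0.4, and 0.4, one "cat" vote beats two "dog" votes.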

For more information about creating an annotation consolidation function, see [Processing data in a custom labeling workflow with AWS Lambda](sms-custom-templates-step3.md).

# Automate data labeling


If you choose, Amazon SageMaker Ground Truth can use active learning to automate the labeling of your input data for certain built-in task types. *Active learning* is a machine learning technique that identifies data that should be labeled by your workers. In Ground Truth, this functionality is called automated data labeling. Automated data labeling helps to reduce the cost and time that it takes to label your dataset compared to using only humans. When you use automated labeling, you incur SageMaker training and inference costs. 

We recommend using automated data labeling on large datasets because the neural networks used with active learning require a significant amount of data for every new dataset. Typically, as you provide more data, the potential for highly accurate predictions goes up. Data is auto-labeled only if the neural network used in the auto-labeling model can achieve an acceptably high level of accuracy, so larger datasets offer more potential for automatic labeling. Automated data labeling is most appropriate when you have thousands of data objects. The minimum number of objects allowed for automated data labeling is 1,250, but we strongly suggest providing a minimum of 5,000 objects.

Automated data labeling is available only for the following Ground Truth built-in task types: 
+ [Create an image classification job (Single Label)](sms-image-classification.md)
+ [Identify image contents using semantic segmentation](sms-semantic-segmentation.md)
+ Object detection ([Classify image objects using a bounding box](sms-bounding-box.md))
+ [Categorize text with text classification (Single Label)](sms-text-classification.md)

[Streaming labeling jobs](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-streaming-labeling-job.html) do not support automated data labeling.

To learn how to create a custom active learning workflow using your own model, see [Set up an active learning workflow with your own model](#sms-automated-labeling-byom).

Input data quotas apply for automated data labeling jobs. See [Input Data Quotas](input-data-limits.md) for information about dataset size, input data size, and resolution limits.

**Note**  
Before you use the automated-labeling model in production, you need to fine-tune or test it, or both. You might fine-tune the model (or create and tune another supervised model of your choice) on the dataset produced by your labeling job to optimize the model’s architecture and hyperparameters. If you decide to use the model for inference without fine-tuning it, we strongly recommend evaluating its accuracy on a representative (for example, randomly selected) subset of the dataset labeled with Ground Truth and confirming that it matches your expectations.

## How it works


You enable automated data labeling when you create a labeling job. This is how it works:

1. When Ground Truth starts an automated data labeling job, it selects a random sample of input data objects and sends them to human workers. If more than 10% of these data objects fail, the labeling job fails. If the labeling job fails, in addition to reviewing any error message Ground Truth returns, check that your input data displays correctly in the worker UI, that your instructions are clear, and that you have given workers enough time to complete tasks.

1. When the labeled data is returned, it is used to create a training set and a validation set. Ground Truth uses these datasets to train and validate the model used for auto-labeling.

1. Ground Truth runs a batch transform job, using the validated model for inference on the validation data. Batch inference produces a confidence score and quality metric for each object in the validation data.

1. The auto-labeling component uses these quality metrics and confidence scores to create a *confidence score threshold* that ensures quality labels.

1. Ground Truth runs a batch transform job on the unlabeled data in the dataset, using the same validated model for inference. This produces a confidence score for each object. 

1. The Ground Truth auto-labeling component determines whether the confidence score produced in step 5 for each object meets the required threshold determined in step 4. If the confidence score meets the threshold, the expected quality of automated labeling meets or exceeds the requested level of accuracy and that object is considered auto-labeled.

1. Step 6 produces a dataset of unlabeled data with confidence scores. Ground Truth selects data points with low confidence scores from this dataset and sends them to human workers. 

1. Ground Truth uses the existing human-labeled data and this additional labeled data from human workers to update the model.

1. The process is repeated until the dataset is fully labeled or until another stopping condition is met. For example, auto-labeling stops if your human annotation budget is reached.
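Steps 5 through 8 above can be sketched as a single function. This is a toy simulation, not the Ground Truth implementation: `model_score` stands in for batch inference, and the threshold and batch size are supplied by the caller.

```python
def active_learning_round(unlabeled, model_score, threshold, human_batch=200):
    # One pass over steps 5-8: score each object, auto-label those at or
    # above the confidence threshold, and send the lowest-confidence
    # objects to human workers for labeling.
    scored = sorted(((model_score(obj), obj) for obj in unlabeled),
                    key=lambda x: x[0])
    auto_labeled = [obj for score, obj in scored if score >= threshold]
    low_conf = [obj for score, obj in scored if score < threshold]
    to_humans = low_conf[:human_batch]
    still_unlabeled = low_conf[human_batch:]
    return auto_labeled, to_humans, still_unlabeled
```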

The preceding steps happen in iterations. Select each tab in the following table to see an example of the processes that happen in each iteration for an object detection automated labeling job. The number of data objects used in a given step in these images (for example, 200) is specific to this example. If there are fewer than 5,000 objects to label, the validation set size is 20% of the whole dataset. If there are more than 5,000 objects in your input dataset, the validation set size is 10% of the whole dataset. You can control the number of human labels collected per active learning iteration by changing the value for [MaxConcurrentTaskCount](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_HumanTaskConfig.html#sagemaker-Type-HumanTaskConfig-MaxConcurrentTaskCount) when using the API operation [CreateLabelingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html). This value is set to 1,000 when you create a labeling job using the console. In the active learning flow illustrated under the **Active Learning** tab, this value is set to 200.

------
#### [ Model Training ]

![\[Example process of model training.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/sms/auto-labeling/sagemaker-gt-annotate-data-3.png)


------
#### [ Automated Labeling ]

![\[Example process of automated labeling.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/sms/auto-labeling/sagemaker-gt-annotate-data-4.png)


------
#### [ Active Learning ]

![\[Example process of active learning.\]](http://docs.aws.amazon.com/sagemaker/latest/dg/images/sms/auto-labeling/sagemaker-gt-annotate-data-5.png)


------

### Accuracy of automated labels


The definition of *accuracy* depends on the built-in task type that you use with automated labeling. For all task types, these accuracy requirements are pre-determined by Ground Truth and cannot be manually configured.
+ For image classification and text classification, Ground Truth uses logic to find a label-prediction confidence level that corresponds to at least 95% label accuracy. This means Ground Truth expects the accuracy of the automated labels to be at least 95% when compared to the labels that human labelers would provide for those examples.
+ For bounding boxes, the expected mean [Intersection Over Union (IoU) ](https://www.pyimagesearch.com/2016/11/07/intersection-over-union-iou-for-object-detection/) of the auto-labeled images is 0.6. To find the mean IoU, Ground Truth calculates the mean IoU of all the predicted and missed boxes on the image for every class, and then averages these values across classes.
+ For semantic segmentation, the expected mean IoU of the auto-labeled images is 0.7. To find the mean IoU, Ground Truth takes the mean of the IoU values of all the classes in the image (excluding the background).
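The semantic segmentation measure above can be sketched as follows. This is an illustrative computation on flat lists of per-pixel class IDs, not the Ground Truth evaluation code:

```python
def mean_iou(pred, truth, background=0):
    # pred and truth are flat lists of per-pixel class IDs. Average the
    # per-class IoU over every class that appears, excluding background.
    classes = (set(pred) | set(truth)) - {background}
    ious = []
    for c in classes:
        inter = sum(1 for p, t in zip(pred, truth) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, truth) if p == c or t == c)
        ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0
```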

At every iteration of Active Learning (steps 3-6 in the list above), the confidence threshold is found using the human-annotated validation set so that the expected accuracy of the auto-labeled objects satisfies certain predefined accuracy requirements.

## Create an automated data labeling job (console)


To create a labeling job that uses automated labeling in the SageMaker AI console, use the following procedure.

**To create an automated data labeling job (console)**

1. Open the Ground Truth **Labeling jobs** section of the SageMaker AI console: [https://console.aws.amazon.com/sagemaker/groundtruth](https://console.aws.amazon.com/sagemaker/groundtruth).

1. Using [Create a Labeling Job (Console)](sms-create-labeling-job-console.md) as a guide, complete the **Job overview** and **Task type** sections. Note that auto labeling is not supported for custom task types.

1. Under **Workers**, choose your workforce type. 

1. In the same section, choose **Enable automated data labeling**. 

1. Using [Configure the Bounding Box Tool](sms-getting-started.md#sms-getting-started-step4) as a guide, create worker instructions in the section ***Task Type* labeling tool**. For example, if you chose **Semantic segmentation** as your labeling job type, this section is called **Semantic segmentation labeling tool**.

1. To preview your worker instructions and dashboard, choose **Preview**.

1. Choose **Create**. This creates and starts your labeling job and the auto labeling process. 

You can see your labeling job appear in the **Labeling jobs** section of the SageMaker AI console. Your output data appears in the Amazon S3 bucket that you specified when creating the labeling job. For more information about the format and file structure of your labeling job output data, see [Labeling job output data](sms-data-output.md).

## Create an automated data labeling job (API)


To create an automated data labeling job using the SageMaker API, use the [LabelingJobAlgorithmsConfig](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_LabelingJobAlgorithmsConfig.html) parameter of the [CreateLabelingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html) operation. To learn how to start a labeling job using the `CreateLabelingJob` operation, see [Create a Labeling Job (API)](sms-create-labeling-job-api.md).

Specify the Amazon Resource Name (ARN) of the algorithm that you are using for automated data labeling in the [LabelingJobAlgorithmSpecificationArn](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_LabelingJobAlgorithmsConfig.html#SageMaker-Type-LabelingJobAlgorithmsConfig-LabelingJobAlgorithmSpecificationArn) parameter. Choose from one of the four Ground Truth built-in algorithms that are supported with automated labeling:
+ [Create an image classification job (Single Label)](sms-image-classification.md)
+ [Identify image contents using semantic segmentation](sms-semantic-segmentation.md)
+ Object detection ([Classify image objects using a bounding box](sms-bounding-box.md)) 
+ [Categorize text with text classification (Single Label)](sms-text-classification.md)

When an automated data labeling job finishes, Ground Truth returns the ARN of the model it used for the automated data labeling job. Use this model as the starting model for similar auto-labeling job types by providing the ARN, in string format, in the [InitialActiveLearningModelArn](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_LabelingJobAlgorithmsConfig.html#SageMaker-Type-LabelingJobAlgorithmsConfig-InitialActiveLearningModelArn) parameter. To retrieve the model's ARN, use an AWS Command Line Interface (AWS CLI) command similar to the following. 

```
# Fetch the ARN of the model trained in the final iteration of the
# previous Ground Truth labeling job.
pretrained_model_arn = sagemaker_client.describe_labeling_job(LabelingJobName=job_name)['LabelingJobOutput']['FinalActiveLearningModelArn']
```

To encrypt data on the storage volume attached to the ML compute instance(s) that are used in automated labeling, include an AWS Key Management Service (AWS KMS) key in the `VolumeKmsKeyId` parameter. For information about AWS KMS keys, see [What is AWS Key Management Service?](https://docs.aws.amazon.com/kms/latest/developerguide/overview.html) in the *AWS Key Management Service Developer Guide*.
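The pieces above fit together in the `LabelingJobAlgorithmsConfig` argument to `CreateLabelingJob`. In the sketch below, the region, account IDs, and resource names are placeholders, not real values; substitute your own:

```python
# Placeholder ARNs throughout -- substitute your own region, account,
# and resource identifiers before use.
labeling_job_algorithms_config = {
    # Built-in auto-labeling algorithm for the chosen task type:
    "LabelingJobAlgorithmSpecificationArn": (
        "arn:aws:sagemaker:us-west-2:123456789012:"
        "labeling-job-algorithm-specification/image-classification"
    ),
    # Optional: start from the model a prior auto-labeling job produced:
    "InitialActiveLearningModelArn": (
        "arn:aws:sagemaker:us-west-2:123456789012:model/example-model"
    ),
    "LabelingJobResourceConfig": {
        # Encrypts the storage volume attached to the ML compute instances:
        "VolumeKmsKeyId": "arn:aws:kms:us-west-2:123456789012:key/example-key-id"
    },
}
```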

For an example that uses the [CreateLabelingJob](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateLabelingJob.html) operation to create an automated data labeling job, see the **object\_detection\_tutorial** example in the **SageMaker AI Examples**, **Ground Truth Labeling Jobs** section of a SageMaker AI notebook instance. To learn how to create and open a notebook instance, see [Create an Amazon SageMaker notebook instance](howitworks-create-ws.md).

## Amazon EC2 instances required for automated data labeling


The following table lists the Amazon Elastic Compute Cloud (Amazon EC2) instances that you need to run automated data labeling for training and batch inference jobs.


| Automated Data Labeling Job Type | Training Instance Type | Inference Instance Type | 
| --- | --- | --- | 
|  Image classification  |  ml.p3.2xlarge\*  |  ml.c5.xlarge  | 
|  Object detection (bounding box)  |  ml.p3.2xlarge\*  |  ml.c5.4xlarge  | 
|  Text classification  |  ml.c5.2xlarge  |  ml.m4.xlarge  | 
|  Semantic segmentation  |  ml.p3.2xlarge\*  |  ml.p3.2xlarge\*  | 

\* In the Asia Pacific (Mumbai) Region (ap-south-1), use ml.p2.8xlarge instead.

Ground Truth manages the instances that you use for automated data labeling jobs. It creates, configures, and terminates the instances as needed to perform your job. These instances don't appear in your Amazon EC2 instance dashboard.

## Set up an active learning workflow with your own model


You can create an active learning workflow with your own algorithm to run training and inference in that workflow to auto-label your data. The notebook bring\_your\_own\_model\_for\_sagemaker\_labeling\_workflows\_with\_active\_learning.ipynb demonstrates this using the SageMaker AI built-in algorithm [BlazingText](https://docs.aws.amazon.com/sagemaker/latest/dg/blazingtext.html). This notebook provides a CloudFormation stack that you can use to execute this workflow using AWS Step Functions. You can find the notebook and supporting files in this [GitHub repository](https://github.com/awslabs/amazon-sagemaker-examples/tree/master/ground_truth_labeling_jobs/bring_your_own_model_for_sagemaker_labeling_workflows_with_active_learning).

# Chaining labeling jobs


Amazon SageMaker Ground Truth can reuse datasets from prior jobs in two ways: cloning and chaining.

*Cloning* copies the setup of a prior labeling job and allows you to make additional changes before setting it to run.

*Chaining* uses not only the setup of the prior job, but also the results. This allows you to continue an incomplete job and add labels or data objects to a completed job. Chaining is a more complex operation. 

For data processing: 
+  Cloning uses the prior job's *input* manifest, with optional modifications, as the new job's input manifest. 
+  Chaining uses the prior job's *output* manifest as the new job's input manifest. 

Chaining is useful when you need to:
+ Continue a labeling job that was manually stopped.
+ Continue a labeling job that failed mid-job, after fixing issues.
+ Switch to automated data labeling after manually labeling part of a job (or the other way around).
+ Add more data objects to a completed job and start the job from there.
+ Add another annotation to a completed job. For example, if you have a collection of phrases labeled by topic, you can run the set again to categorize them by the topic's implied audience.

In Amazon SageMaker Ground Truth you can configure a chained labeling job with either the console or the API.

## Key term: label attribute name


The *label attribute name* (`LabelAttributeName` in the API) is a string used as the key for the key-value pair formed with the label that a worker assigns to the data object.

The following rules apply for the label attribute name:
+ It can't end with `-metadata`.
+ The names `source` and `source-ref` are reserved and can't be used.
+ For semantic segmentation labeling jobs, it must end with `-ref`. For all other labeling jobs, it *can't* end with `-ref`. If you use the console to create the job, Amazon SageMaker Ground Truth automatically appends `-ref` to the label attribute name for semantic segmentation jobs.
+ For a chained labeling job, if you use the same label attribute name as the originating job and configure the chained job to use auto-labeling, Ground Truth uses the model from the originating job, provided that job was in auto-labeling mode at any point.

In an output manifest, the label attribute name appears similar to the following.

```
{
  "source-ref": "<S3 URI>",
  "<label attribute name>": {
    "annotations": [{
      "class_id": 0,
      "width": 99,
      "top": 87,
      "height": 62,
      "left": 175
    }],
    "image_size": [{
      "width": 344,
      "depth": 3,
      "height": 234
    }]
  },
  "<label attribute name>-metadata": {
    "job-name": "<job name>",
    "class-map": {
      "0": "<label attribute name>"
    },
    "human-annotated": "yes",
    "objects": [{
      "confidence": 0.09
    }],
    "creation-date": "<timestamp>",
    "type": "groundtruth/object-detection"
  }
}
```

If you're creating a job in the console and don't explicitly set the label attribute name value, Ground Truth uses the job name as the label attribute name for the job.

## Start a chained job (console)


Choose a stopped, failed, or completed labeling job from the list of your existing jobs. This enables the **Actions** menu.

From the **Actions** menu, choose **Chain**.

### Job overview panel


In the **Job overview** panel, a new **Job name** is set based on the title of the job from which you are chaining this one. You can change it.

You may also specify a label attribute name different from the labeling job name.

If you're chaining from a completed job, the label attribute name defaults to the name of the new job you're configuring. To change the name, select the check box.

If you're chaining from a stopped or failed job, the label attribute name defaults to the name of the job from which you're chaining. Because the name check box is already selected, you can see and edit the value.

**Label attribute naming considerations**  
**The default** uses the label attribute name that Ground Truth has selected. All data objects without data connected to that label attribute name are labeled.
**Using a label attribute name** not present in the manifest causes the job to process *all* of the objects in the dataset.

The **input dataset location** in this case is automatically set to the output manifest of the job from which you're chaining. The input field is not available, so you cannot change it.

**Adding data objects to a labeling job**  
You cannot specify an alternate manifest file. Manually edit the output manifest from the previous job to add new items before starting a chained job. The Amazon S3 URI helps you locate where you are storing the manifest in your Amazon S3 bucket. Download the manifest file from there, edit it locally on your computer, and then upload the new version to replace it. Make sure you don't introduce errors during editing. We recommend using a JSON linter to check your JSON. Many popular text editors and IDEs have linter plugins available.
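A linter catches malformed JSON, and a few extra lines can also confirm each manifest entry has the expected shape. The sketch below is an illustrative check, assuming entries keyed by `"source-ref"` (text datasets use `"source"` instead):

```python
import json

def validate_manifest(path, required_key="source-ref"):
    # Lightweight lint for an edited output manifest: every non-empty
    # line must be a standalone JSON object containing the data-object
    # key. Returns a list of (line number, problem) pairs.
    errors = []
    with open(path) as f:
        for lineno, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                entry = json.loads(line)
            except json.JSONDecodeError as exc:
                errors.append((lineno, str(exc)))
                continue
            if required_key not in entry:
                errors.append((lineno, f"missing '{required_key}'"))
    return errors
```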

## Start a chained job (API)


The procedure is almost the same as setting up a new labeling job with `CreateLabelingJob`, except for two primary differences:
+ **Manifest location:** Rather than use your original manifest from the prior job, the value for the `ManifestS3Uri` in the `DataSource` should point to the Amazon S3 URI of the *output manifest* from the prior labeling job.
+ **Label attribute name:** Setting the correct `LabelAttributeName` value is important here. This is the key portion of a key-value pair where labeling data is the value. Sample use cases include:
  + **Adding new or more specific labels to a completed job** — Set a new label attribute name.
  + **Labeling the unlabeled items from a prior job** — Use the label attribute name from the prior job.
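For the second use case, the chained job skips entries that already carry the label attribute name. A sketch of that filter, using an illustrative helper name:

```python
import json

def unlabeled_entries(output_manifest_lines, label_attribute_name):
    # From a prior job's output manifest (one JSON object per line),
    # keep the entries that do not yet carry the given label attribute
    # name. Chaining with the same LabelAttributeName sends only these
    # objects to workers.
    remaining = []
    for line in output_manifest_lines:
        entry = json.loads(line)
        if label_attribute_name not in entry:
            remaining.append(entry)
    return remaining
```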

## Use a partially labeled dataset


You can get some chaining benefits if you use an augmented manifest that has already been partially labeled. Select the **Label attribute name** check box and set the name so that it matches the name in your manifest.

If you're using the API, the instructions are the same as those for starting a chained job. However, be sure to upload your manifest to an Amazon S3 bucket and use it instead of using the output manifest from a prior job.

The **Label attribute name** value in the manifest must conform to the naming considerations discussed earlier.