

# Accessing Amazon Rekognition Custom Labels evaluation metrics (SDK)
<a name="im-metrics-api"></a>

The [DescribeProjectVersions](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DescribeProjectVersions) operation provides access to metrics beyond those provided in the console. 

Like the console, `DescribeProjectVersions` provides access to the following metrics as summary information for the testing results and as testing results for each label:
+ [Precision](im-metrics-use.md#im-precision-metric)
+ [Recall](im-metrics-use.md#im-recall-metric)
+ [F1](im-metrics-use.md#im-f1-metric)

The average threshold for all labels and the thresholds for individual labels are returned.

`DescribeProjectVersions` also provides access to the following metrics for classification and image detection (object location on an image):
+ *Confusion Matrix* for image classification. For more information, see [Viewing the confusion matrix for a model](im-confusion-matrix.md).
+ *Mean Average Precision (mAP)* for image detection.
+ *Mean Average Recall (mAR)* for image detection.

`DescribeProjectVersions` also provides access to true positive, false positive, false negative, and true negative values. For more information, see [Metrics for evaluating your model](im-metrics-use.md).

The aggregate F1 score metric is returned directly by `DescribeProjectVersions`. Other metrics are accessible from the [model summary file](im-summary-file-api.md) and the [evaluation manifest snapshot](im-evaluation-manifest-snapshot-api.md) files stored in an Amazon S3 bucket. For more information, see [Accessing the summary file and evaluation manifest snapshot (SDK)](im-access-summary-evaluation-manifest.md).

**Topics**
+ [Accessing the model summary file](im-summary-file-api.md)
+ [Interpreting the evaluation manifest snapshot](im-evaluation-manifest-snapshot-api.md)
+ [Accessing the summary file and evaluation manifest snapshot (SDK)](im-access-summary-evaluation-manifest.md)
+ [Viewing the confusion matrix for a model](im-confusion-matrix.md)
+ [Reference: Training results summary file](im-summary-file.md)

# Accessing the model summary file
<a name="im-summary-file-api"></a>

The summary file contains evaluation results information about the model as a whole and metrics for each label. The metrics are precision, recall, and F1 score. The threshold value for the model is also supplied. The summary file location is accessible from the `EvaluationResult` object returned by `DescribeProjectVersions`. For more information, see [Reference: Training results summary file](im-summary-file.md).

The following is an example summary file.

```
{
  "Version": 1,
  "AggregatedEvaluationResults": {
    "ConfusionMatrix": [
      {
        "GroundTruthLabel": "CAP",
        "PredictedLabel": "CAP",
        "Value": 0.9948717948717949
      },
      {
        "GroundTruthLabel": "CAP",
        "PredictedLabel": "WATCH",
        "Value": 0.008547008547008548
      },
      {
        "GroundTruthLabel": "WATCH",
        "PredictedLabel": "CAP",
        "Value": 0.1794871794871795
      },
      {
        "GroundTruthLabel": "WATCH",
        "PredictedLabel": "WATCH",
        "Value": 0.7008547008547008
      }
    ],
    "F1Score": 0.9726959470546408,
    "Precision": 0.9719115848331294,
    "Recall": 0.9735042735042735
  },
  "EvaluationDetails": {
    "EvaluationEndTimestamp": "2019-11-21T07:30:23.910943",
    "Labels": [
      "CAP",
      "WATCH"
    ],
    "NumberOfTestingImages": 624,
    "NumberOfTrainingImages": 5216,
    "ProjectVersionArn": "arn:aws:rekognition:us-east-1:nnnnnnnnn:project/my-project/version/v0/1574317227432"
  },
  "LabelEvaluationResults": [
    {
      "Label": "CAP",
      "Metrics": {
        "F1Score": 0.9794344473007711,
        "Precision": 0.9819587628865979,
        "Recall": 0.9769230769230769,
        "Threshold": 0.9879502058029175
      },
      "NumberOfTestingImages": 390
    },
    {
      "Label": "WATCH",
      "Metrics": {
        "F1Score": 0.9659574468085106,
        "Precision": 0.961864406779661,
        "Recall": 0.9700854700854701,
        "Threshold": 0.014450683258473873
      },
      "NumberOfTestingImages": 234
    }
  ]
}
```
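The per-label results in the summary file can be pulled out with a few lines of standard JSON handling. The following is a minimal sketch; the inline sample mirrors the structure of the summary file shown above, with shortened values, and the helper name is illustrative:

```python
import json

# Sample data shaped like the summary file shown above (values shortened).
summary = json.loads("""
{
  "AggregatedEvaluationResults": {"F1Score": 0.97, "Precision": 0.97, "Recall": 0.97},
  "LabelEvaluationResults": [
    {"Label": "CAP",
     "Metrics": {"F1Score": 0.979, "Precision": 0.982,
                 "Recall": 0.977, "Threshold": 0.988},
     "NumberOfTestingImages": 390}
  ]
}
""")


def per_label_metrics(summary):
    """Return a {label: metrics} mapping from a parsed summary file."""
    return {result["Label"]: result["Metrics"]
            for result in summary["LabelEvaluationResults"]}


print(per_label_metrics(summary)["CAP"]["F1Score"])  # 0.979
```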

# Interpreting the evaluation manifest snapshot
<a name="im-evaluation-manifest-snapshot-api"></a>

The evaluation manifest snapshot contains detailed information about the test results. The snapshot includes the confidence rating for each prediction. It also includes the classification of the prediction compared to the actual classification of the image (true positive, true negative, false positive, or false negative). 

The files are a snapshot because only images that could be used for testing and training are included. Images that can't be verified, such as images in the wrong format, aren't included in the manifest. The testing snapshot location is accessible from the `TestingDataResult` object returned by `DescribeProjectVersions`. The training snapshot location is accessible from the `TrainingDataResult` object returned by `DescribeProjectVersions`. 

The snapshot is in SageMaker AI Ground Truth manifest output format with fields added to provide additional information, such as the result of a detection's binary classification. The following snippet shows the additional fields.

```
"rekognition-custom-labels-evaluation-details": {
    "version": 1,
    "is-true-positive": true,
    "is-true-negative": false,
    "is-false-positive": false,
    "is-false-negative": false,
    "is-present-in-ground-truth": true,
    "ground-truth-labelling-jobs": ["rekognition-custom-labels-training-job"]
}
```
+ *version* – The version of the `rekognition-custom-labels-evaluation-details` field format within the manifest snapshot.
+ *is-true-positive...* – The binary classification of the prediction based on how the confidence score compares to the minimum threshold for the label.
+ *is-present-in-ground-truth* – True if the prediction made by the model is present in the ground truth information used for training, otherwise false. This value isn't based on whether the confidence score exceeds the minimum threshold calculated by the model. 
+ *ground-truth-labelling-jobs* – A list of ground truth fields in the manifest line that are used for training.

For information about the SageMaker AI Ground Truth manifest format, see [Output](https://docs.aws.amazon.com/sagemaker/latest/dg/sms-data-output.html). 
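A sketch of how these flags might be tallied across a downloaded snapshot: the helper below scans each manifest line for the `rekognition-custom-labels-evaluation-details` field and counts classification outcomes. The field names match the snippet above; everything else (the sample line, the helper name) is illustrative:

```python
import json
from collections import Counter


def tally_outcomes(manifest_lines,
                   detail_key="rekognition-custom-labels-evaluation-details"):
    """Count TP/TN/FP/FN flags across manifest snapshot lines (classification)."""
    counts = Counter()
    for line in manifest_lines:
        entry = json.loads(line)
        # The evaluation details are nested inside the metadata objects.
        for value in entry.values():
            if isinstance(value, dict) and detail_key in value:
                details = value[detail_key]
                for flag in ("is-true-positive", "is-true-negative",
                             "is-false-positive", "is-false-negative"):
                    if details.get(flag):
                        counts[flag] += 1
    return counts


# Example line shaped like the classification snapshot shown below.
line = json.dumps({
    "source-ref": "s3://bucket/image.jpeg",
    "rekognition-custom-labels-evaluation-0-metadata": {
        "rekognition-custom-labels-evaluation-details": {
            "is-true-positive": True, "is-true-negative": False,
            "is-false-positive": False, "is-false-negative": False
        }
    }
})
print(tally_outcomes([line]))  # Counter({'is-true-positive': 1})
```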

The following is an example testing manifest snapshot that shows metrics for image classification and object detection.

```
// For image classification
{
  "source-ref": "s3://amzn-s3-demo-bucket/dataset/beckham.jpeg",
  "rekognition-custom-labels-training-0": 1,
  "rekognition-custom-labels-training-0-metadata": {
    "confidence": 1.0,
    "job-name": "rekognition-custom-labels-training-job",
    "class-name": "Football",
    "human-annotated": "yes",
    "creation-date": "2019-09-06T00:07:25.488243",
    "type": "groundtruth/image-classification"
  },
  "rekognition-custom-labels-evaluation-0": 1,
  "rekognition-custom-labels-evaluation-0-metadata": {
    "confidence": 0.95,
    "job-name": "rekognition-custom-labels-evaluation-job",
    "class-name": "Football",
    "human-annotated": "no",
    "creation-date": "2019-09-06T00:07:25.488243",
    "type": "groundtruth/image-classification",
    "rekognition-custom-labels-evaluation-details": {
      "version": 1,
      "ground-truth-labelling-jobs": ["rekognition-custom-labels-training-job"],
      "is-true-positive": true,
      "is-true-negative": false,
      "is-false-positive": false,
      "is-false-negative": false,
      "is-present-in-ground-truth": true
    }
  }
}


// For object detection
{
  "source-ref": "s3://amzn-s3-demo-bucket/dataset/beckham.jpeg",
  "rekognition-custom-labels-training-0": {
    "annotations": [
      {
        "class_id": 0,
        "width": 39,
        "top": 409,
        "height": 63,
        "left": 712
      },
      ...
    ],
    "image_size": [
      {
        "width": 1024,
        "depth": 3,
        "height": 768
      }
    ]
  },
  "rekognition-custom-labels-training-0-metadata": {
    "job-name": "rekognition-custom-labels-training-job",
    "class-map": {
      "0": "Cap",
      ...
    },
    "human-annotated": "yes",
    "objects": [
      {
        "confidence": 1.0
      },
      ...
    ],
    "creation-date": "2019-10-21T22:02:18.432644",
    "type": "groundtruth/object-detection"
  },
  "rekognition-custom-labels-evaluation": {
    "annotations": [
      {
        "class_id": 0,
        "width": 39,
        "top": 409,
        "height": 63,
        "left": 712
      },
      ...
    ],
    "image_size": [
      {
        "width": 1024,
        "depth": 3,
        "height": 768
      }
    ]
  },
  "rekognition-custom-labels-evaluation-metadata": {
    "confidence": 0.95,
    "job-name": "rekognition-custom-labels-evaluation-job",
    "class-map": {
      "0": "Cap",
      ...
    },
    "human-annotated": "no",
    "objects": [
      {
        "confidence": 0.95,
        "rekognition-custom-labels-evaluation-details": {
          "version": 1,
          "ground-truth-labelling-jobs": ["rekognition-custom-labels-training-job"],
          "is-true-positive": true,
          "is-true-negative": false,
          "is-false-positive": false,
          "is-false-negative": false,
          "is-present-in-ground-truth": true
        }
      },
      ...
    ],
    "creation-date": "2019-10-21T22:02:18.432644",
    "type": "groundtruth/object-detection"
  }
}
```

# Accessing the summary file and evaluation manifest snapshot (SDK)
<a name="im-access-summary-evaluation-manifest"></a>

To get training results, you call [DescribeProjectVersions](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DescribeProjectVersions). For example code, see [Describing a model (SDK)](md-describing-model-sdk.md).

The location of the metrics is returned in the `ProjectVersionDescription` response from `DescribeProjectVersions`.
+ `EvaluationResult` – The location of the summary file.
+ `TestingDataResult` – The location of the evaluation manifest snapshot used for testing. 

The F1 score and summary file location are returned in `EvaluationResult`. For example:

```
"EvaluationResult": {
                "F1Score": 1.0,
                "Summary": {
                    "S3Object": {
                        "Bucket": "echo-dot-scans",
                        "Name": "test-output/EvaluationResultSummary-my-echo-dots-project-v2.json"
                    }
                }
            }
```
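Given a response shaped like the example above, the summary location can be turned into an `s3://` URI before downloading. This is an illustrative sketch; the helper name is our own:

```python
def summary_s3_uri(project_version_description):
    """Build an s3:// URI from the EvaluationResult section of a
    DescribeProjectVersions project version description."""
    s3_object = project_version_description["EvaluationResult"]["Summary"]["S3Object"]
    return f"s3://{s3_object['Bucket']}/{s3_object['Name']}"


# Shaped like the EvaluationResult example above.
description = {
    "EvaluationResult": {
        "F1Score": 1.0,
        "Summary": {
            "S3Object": {
                "Bucket": "echo-dot-scans",
                "Name": "test-output/EvaluationResultSummary-my-echo-dots-project-v2.json"
            }
        }
    }
}
print(summary_s3_uri(description))
```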

The evaluation manifest snapshot is stored in the location specified in the `--output-config` input parameter that you specified in [Training a model (SDK)](training-model.md#tm-sdk). 

**Note**  
The amount of time, in seconds, that you are billed for training is returned in `BillableTrainingTimeInSeconds`. 

For information about the metrics that are returned by Amazon Rekognition Custom Labels, see [Accessing Amazon Rekognition Custom Labels evaluation metrics (SDK)](im-metrics-api.md).

# Viewing the confusion matrix for a model
<a name="im-confusion-matrix"></a>

A confusion matrix lets you see which labels your model confuses with other labels in your model. By using a confusion matrix, you can focus your improvements on the labels that the model confuses most often.

During model evaluation, Amazon Rekognition Custom Labels creates a confusion matrix by using the test images to identify misidentified (confused) labels. Amazon Rekognition Custom Labels creates a confusion matrix only for classification models. The confusion matrix is accessible from the summary file that Amazon Rekognition Custom Labels creates during model training. You can't view the confusion matrix in the Amazon Rekognition Custom Labels console.

**Topics**
+ [Using a confusion matrix](#im-using-confusion-matrix)
+ [Getting the confusion matrix for a model](#im-getting-confusion-matrix)

## Using a confusion matrix
<a name="im-using-confusion-matrix"></a>

The following table is the confusion matrix for the [Rooms image classification](getting-started.md#gs-image-classification-example) example project. Column headings are the ground truth labels assigned to the test images. Row headings are the labels that the model predicts for the test images. Each cell is the percentage of test images with a given ground truth label (column) that the model predicted as the row's label. For example, 67% of bathroom images were correctly predicted as bathrooms, and 33% of bathroom images were incorrectly predicted as kitchens. A high-performing model has high cell values where the predicted label matches the ground truth label. These cells appear as a diagonal from the first to the last predicted and ground truth labels. If a cell value is 0, the model made no predictions of the cell's row label for images whose ground truth label is the cell's column label.

**Note**  
Since models are non-deterministic, the confusion matrix cell values you get from training the Rooms project might differ from the following table. 

The confusion matrix identifies areas to focus on. For example, the confusion matrix shows that 50% of the time the model confused closets for bedrooms. In this situation, you should add more images of closets and bedrooms to your training dataset. Also check that the existing closet and bedroom images are correctly labeled. This should help the model better distinguish between the two labels. To add more images to a dataset, see [Adding more images to a dataset](md-add-images.md).
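The off-diagonal entries of the `ConfusionMatrix` section in the summary file point to the label pairs worth investigating. The following is an illustrative sketch (the helper name and the 20% cutoff are our own choices; the entry shape matches the summary file):

```python
def most_confused(confusion_matrix, threshold=0.2):
    """Return off-diagonal (ground truth, predicted, value) entries at or
    above the threshold, sorted by value in descending order."""
    pairs = [
        (cell["GroundTruthLabel"], cell["PredictedLabel"], cell["Value"])
        for cell in confusion_matrix
        if cell["GroundTruthLabel"] != cell["PredictedLabel"]
        and cell["Value"] >= threshold
    ]
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Entries shaped like the ConfusionMatrix section of the summary file.
matrix = [
    {"GroundTruthLabel": "closet", "PredictedLabel": "closet", "Value": 0.5},
    {"GroundTruthLabel": "closet", "PredictedLabel": "bedroom", "Value": 0.5},
    {"GroundTruthLabel": "bathroom", "PredictedLabel": "kitchen", "Value": 0.33},
]
print(most_confused(matrix))
```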

While the confusion matrix is helpful, it's important to consider other metrics. For example, 100% of the predictions correctly found the floor_plan label, which indicates excellent performance. However, the test dataset has only 2 images with the floor_plan label. It also has 11 images with the living_space label. This imbalance is also in the training dataset (13 living_space images and 2 floor_plan images). To get a more accurate evaluation, balance the training and test datasets by adding more images of under-represented labels (floor plans in this example). To get the number of test images per label, see [Accessing evaluation metrics (Console)](im-access-training-results.md). 

The following table is a sample confusion matrix, comparing the predicted label (on the y-axis) against the ground truth label:


| Predicted label | backyard | bathroom | bedroom | closet | entry_way | floor_plan | front_yard | kitchen | living_space | patio | 
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| backyard | 75% | 0% | 0% | 0% | 0% | 0% | 33% | 0% | 0% | 0% | 
| bathroom | 0% | 67% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 
| bedroom | 0% | 0% | 82% | 50% | 0% | 0% | 0% | 0% | 9% | 0% | 
| closet | 0% | 0% | 0% | 50% | 0% | 0% | 0% | 0% | 0% | 0% | 
| entry_way | 0% | 0% | 0% | 0% | 33% | 0% | 0% | 0% | 0% | 0% | 
| floor_plan | 0% | 0% | 0% | 0% | 0% | 100% | 0% | 0% | 0% | 0% | 
| front_yard | 25% | 0% | 0% | 0% | 0% | 0% | 67% | 0% | 0% | 0% | 
| kitchen | 0% | 33% | 0% | 0% | 0% | 0% | 0% | 88% | 0% | 0% | 
| living_space | 0% | 0% | 18% | 0% | 67% | 0% | 0% | 12% | 91% | 33% | 
| patio | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 0% | 67% | 

## Getting the confusion matrix for a model
<a name="im-getting-confusion-matrix"></a>

The following code uses the [DescribeProjects](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DescribeProjects) and [DescribeProjectVersions](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DescribeProjectVersions) operations to get the [summary file](im-summary-file-api.md) for a model. It then uses the summary file to display the confusion matrix for the model. 

**To display the confusion matrix for a model (SDK)**

1. If you haven't already done so, install and configure the AWS CLI and the AWS SDKs. For more information, see [Step 4: Set up the AWS CLI and AWS SDKs](su-awscli-sdk.md).

1. Use the following code to display the confusion matrix for a model. Supply the following command line arguments:
   + `project_name` – the name of the project you want to use. You can get the project name from the projects page in the Amazon Rekognition Custom Labels console.
   + `version_name` – the version of the model that you want to use. You can get the version name from the project details page in the Amazon Rekognition Custom Labels console.

   ```
   # Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
   # SPDX-License-Identifier: Apache-2.0
   
   """
   Purpose
   
   Shows how to display the confusion matrix for an Amazon Rekognition Custom labels image
   classification model.
   """
   
   
   import json
   import argparse
   import logging
   import boto3
   import pandas as pd
   from botocore.exceptions import ClientError
   
   
   logger = logging.getLogger(__name__)
   
   
   def get_model_summary_location(rek_client, project_name, version_name):
       """
       Get the summary file location for a model.
   
       :param rek_client: A Boto3 Rekognition client.
        :param project_name: The name of the project that contains the model.
        :param version_name: The version name of the model.
       :return: The location of the model summary file.
       """
   
       try:
           logger.info(
               "Getting summary file for model %s in project %s.", version_name, project_name)
   
           summary_location = ""
   
           # Get the project ARN from the project name.
           response = rek_client.describe_projects(ProjectNames=[project_name])
   
           assert len(response['ProjectDescriptions']) > 0, \
               f"Project {project_name} not found."
   
           project_arn = response['ProjectDescriptions'][0]['ProjectArn']
   
           # Get the summary file location for the model.
           describe_response = rek_client.describe_project_versions(ProjectArn=project_arn,
                                                                    VersionNames=[version_name])
           assert len(describe_response['ProjectVersionDescriptions']) > 0, \
               f"Model {version_name} not found."
   
           model=describe_response['ProjectVersionDescriptions'][0]
   
           evaluation_results=model['EvaluationResult']
   
           summary_location=(f"s3://{evaluation_results['Summary']['S3Object']['Bucket']}"
                               f"/{evaluation_results['Summary']['S3Object']['Name']}")
   
           return summary_location
   
       except ClientError as err:
           logger.exception(
               "Couldn't get summary file location: %s", err.response['Error']['Message'])
           raise
   
   
   def show_confusion_matrix(summary):
       """
       Shows the confusion matrix for an Amazon Rekognition Custom Labels
       image classification model.
       :param summary: The summary file JSON object.
       """
       pd.options.display.float_format = '{:.0%}'.format
   
       # Load the model summary JSON into a DataFrame.
   
       summary_df = pd.DataFrame(
           summary['AggregatedEvaluationResults']['ConfusionMatrix'])
   
       # Get the confusion matrix.
       confusion_matrix = summary_df.pivot_table(index='PredictedLabel',
                                                 columns='GroundTruthLabel',
                                                 fill_value=0.0).astype(float)
   
       # Display the confusion matrix.
       print(confusion_matrix)
   
   
    def get_summary(s3_resource, summary):
        """
        Gets the summary file from Amazon S3.

        :param s3_resource: A Boto3 S3 resource.
        :param summary: The s3:// location of the summary file.
        :return: The summary file in bytes.
        """
       try:
           summary_bucket, summary_key = summary.replace(
               "s3://", "").split("/", 1)
   
           bucket = s3_resource.Bucket(summary_bucket)
           obj = bucket.Object(summary_key)
           body = obj.get()['Body'].read()
           logger.info(
               "Got summary file '%s' from bucket '%s'.",
               obj.key, obj.bucket_name)
       except ClientError:
           logger.exception(
               "Couldn't get summary file '%s' from bucket '%s'.",
               obj.key, obj.bucket_name)
           raise
       else:
           return body
   
   
    def add_arguments(parser):
        """
        Adds command line arguments to the parser.

        :param parser: The command line parser.
        """

        parser.add_argument(
            "project_name", help="The name of the project that contains the model."
        )
        parser.add_argument(
            "version_name", help="The version of the model that you want to describe."
        )
   
   
   def main():
       """
       Entry point for script.
       """
   
       logging.basicConfig(level=logging.INFO,
                           format="%(levelname)s: %(message)s")
   
       try:
   
           # Get the command line arguments.
           parser = argparse.ArgumentParser(usage=argparse.SUPPRESS)
           add_arguments(parser)
           args = parser.parse_args()
   
           print(
               f"Showing confusion matrix for: {args.version_name} for project {args.project_name}.")
   
           session = boto3.Session(profile_name='custom-labels-access')
           rekognition_client = session.client("rekognition")
           s3_resource = session.resource('s3')
   
           # Get the summary file for the model.
           summary_location = get_model_summary_location(rekognition_client, args.project_name,
                                                         args.version_name
                                                         )
           summary = json.loads(get_summary(s3_resource, summary_location))
   
           # Check that the confusion matrix is available.
           assert 'ConfusionMatrix' in summary['AggregatedEvaluationResults'], \
               "Confusion matrix not found in summary. Is the model a classification model?"
   
           # Show the confusion matrix.
           show_confusion_matrix(summary)
           print("Done")
   
       except ClientError as err:
           logger.exception("Problem showing confusion matrix: %s", err)
           print(f"Problem describing model: {err}")
   
       except AssertionError as err:
           logger.exception(
               "Error: %s.\n", err)
           print(
               f"Error: {err}\n")
   
   
   if __name__ == "__main__":
       main()
   ```

# Reference: Training results summary file
<a name="im-summary-file"></a>

The training results summary contains metrics that you can use to evaluate your model. The summary file is also used to display metrics on the console training results page. The summary file is stored in an Amazon S3 bucket after training. To get the summary file, call `DescribeProjectVersions`. For example code, see [Accessing the summary file and evaluation manifest snapshot (SDK)](im-access-summary-evaluation-manifest.md). 

## Summary file
<a name="im-summary-reference"></a>

The following JSON is the format of the summary file.



**EvaluationDetails (section 3)**  
Overview information about the training task. This includes the ARN of the model version that was evaluated (`ProjectVersionArn`), the date and time that training finished (`EvaluationEndTimestamp`), and a list of labels detected during training (`Labels`). Also included are the number of images used for training (`NumberOfTrainingImages`) and evaluation (`NumberOfTestingImages`). 

**AggregatedEvaluationResults (section 1)**  
You can use `AggregatedEvaluationResults` to evaluate the overall performance of the trained model when used with the testing dataset. Aggregated metrics are included for `Precision`, `Recall`, and `F1Score` metrics. For object detection (the object location on an image), `AverageRecall` (mAR) and `AveragePrecision` (mAP) metrics are returned. For classification (the type of object in an image), a confusion matrix metric is returned. 

**LabelEvaluationResults (section 2)**  
You can use `LabelEvaluationResults` to evaluate the performance of individual labels. The labels are sorted by the F1 score of each label. The metrics included are `Precision`, `Recall`, `F1Score`, and `Threshold` (used for classification). 

The file name is formatted as follows: `EvaluationSummary-ProjectName-VersionName.json`.

```
{
  "Version": "integer",
  // section-3
  "EvaluationDetails": {
    "ProjectVersionArn": "string",
    "EvaluationEndTimestamp": "string",
    "Labels": "[string]",
    "NumberOfTrainingImages": "int",
    "NumberOfTestingImages": "int"
  },
  // section-1
  "AggregatedEvaluationResults": {
    "Metrics": {
      "Precision": "float",
      "Recall": "float",
      "F1Score": "float",
      // The following 2 fields are only applicable to object detection
      "AveragePrecision": "float",
      "AverageRecall": "float",
      // The following field is only applicable to classification
      "ConfusionMatrix":[
        {
          "GroundTruthLabel": "string",
          "PredictedLabel": "string",
          "Value": "float"
        },
        ...
      ]
    }
  },
  // section-2
  "LabelEvaluationResults": [
    {
      "Label": "string",
      "NumberOfTestingImages": "int",
      "Metrics": {
        "Threshold": "float",
        "Precision": "float",
        "Recall": "float",
        "F1Score": "float"
      }
    },
    ...
  ]
}
```
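As a quick sanity check before parsing, a hypothetical helper (not part of the Rekognition API) can verify that a downloaded summary file contains the top-level sections described above:

```python
REQUIRED_SECTIONS = ("Version", "EvaluationDetails",
                     "AggregatedEvaluationResults", "LabelEvaluationResults")


def missing_sections(summary):
    """Return the top-level sections absent from a parsed summary file,
    in the order listed in REQUIRED_SECTIONS."""
    return [key for key in REQUIRED_SECTIONS if key not in summary]


print(missing_sections({"Version": 1, "EvaluationDetails": {}}))
# ['AggregatedEvaluationResults', 'LabelEvaluationResults']
```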