

# Detecting text
<a name="text-detection"></a>

Amazon Rekognition can detect text in images and videos. It can then convert the detected text into machine-readable text. You can use machine-readable text detection in images to implement solutions such as:
+ Visual search. For example, retrieving and displaying images that contain the same text.
+ Content insights. For example, providing insights into themes that occur in text that's recognized in extracted video frames. Your application can search recognized text for relevant content, such as news, sport scores, athlete numbers, and captions.
+ Navigation. For example, developing a speech-enabled mobile app for visually impaired people that recognizes the names of restaurants, shops, or street signs. 
+ Public safety and transportation support. For example, detecting car license plate numbers from traffic camera images. 
+ Filtering. For example, filtering personally identifiable information (PII) from images. 

For text detection in videos, you can implement solutions such as: 
+ Searching videos for clips with specific text keywords, such as a guest’s name on a graphic in a news show.
+ Moderating content for compliance with organizational standards by detecting accidental text, profanity, or spam.
+ Finding all text overlays on the video timeline for further processing, such as replacing text with text in another language for content internationalization.
+ Finding text locations, so that other graphics can be aligned accordingly.

To detect text in images in JPEG or PNG format, use the [DetectText](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectText.html) operation. To asynchronously detect text in video, use the [StartTextDetection](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_StartTextDetection.html) and [GetTextDetection](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_GetTextDetection.html) operations. Both image and video text detection operations support most fonts, including highly stylized ones. After detecting text, Amazon Rekognition creates a representation of detected words and lines of text, shows the relationship between them, and tells you where the text is on an image or video frame.

The `DetectText` and `GetTextDetection` operations detect words and lines. A *word* is one or more script characters that aren't separated by spaces. `DetectText` can detect up to 100 words in an image. `GetTextDetection` can also detect up to 100 words per frame of video. 

Amazon Rekognition is designed to detect words in English, Arabic, Russian, German, French, Italian, Portuguese, and Spanish.

A *line* is a string of equally spaced words. A line isn't necessarily a complete sentence (periods don't indicate the end of a line). For example, Amazon Rekognition detects a driver's license number as a line. A line ends when there is no aligned text after it or when there's a large gap between words, relative to the length of the words. Depending on the gap between words, Amazon Rekognition might detect multiple lines in text that are aligned in the same direction. If a sentence spans multiple lines, the operation returns multiple lines.

Consider the following image.

![\[Coffee mug with smiley face and text "It's Monday but keep smiling", with bounding boxes and extracted text.\]](http://docs.aws.amazon.com/rekognition/latest/dg/images/text.png)


The blue boxes represent information about the detected text and the location of the text that's returned by the `DetectText` operation. In this example, Amazon Rekognition detects "IT'S", "MONDAY", "but", "keep", and "Smiling" as words. Amazon Rekognition detects "IT'S", "MONDAY", "but keep", and "Smiling" as lines. To be detected, text must be within +/- 90 degrees orientation of the horizontal axis.

For an example, see [Detecting text in an image](text-detecting-text-procedure.md).

**Topics**
+ [Detecting text in an image](text-detecting-text-procedure.md)
+ [Detecting text in a stored video](text-detecting-video-procedure.md)

# Detecting text in an image
<a name="text-detecting-text-procedure"></a>

You can provide an input image as an image byte array (base64-encoded image bytes), or as an Amazon S3 object. In this procedure, you upload a JPEG or PNG image to your S3 bucket and specify the file name. 

**To detect text in an image (API)**

1. If you haven't already, complete the following prerequisites.

   1. Create or update a user with `AmazonRekognitionFullAccess` and `AmazonS3ReadOnlyAccess` permissions. For more information, see [Step 1: Set up an AWS account and create a User](setting-up.md#setting-up-iam).

   1. Install and configure the AWS Command Line Interface and the AWS SDKs. For more information, see [Step 2: Set up the AWS CLI and AWS SDKs](setup-awscli-sdk.md).

1. Upload the image that contains text to your S3 bucket. 

   For instructions, see [Uploading Objects into Amazon S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/UploadingObjectsintoAmazonS3.html) in the *Amazon Simple Storage Service User Guide*.

1. Use the following examples to call the `DetectText` operation.

------
#### [ Java ]

   The following example code displays lines and words that were detected in an image. 

   Replace the values of `amzn-s3-demo-bucket` and `photo` with the names of the S3 bucket and image that you used in step 2. 

   ```
   //Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
   //SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)
   
   package aws.example.rekognition.image;
   import com.amazonaws.services.rekognition.AmazonRekognition;
   import com.amazonaws.services.rekognition.AmazonRekognitionClientBuilder;
   import com.amazonaws.services.rekognition.model.AmazonRekognitionException;
   import com.amazonaws.services.rekognition.model.Image;
   import com.amazonaws.services.rekognition.model.S3Object;
   import com.amazonaws.services.rekognition.model.DetectTextRequest;
   import com.amazonaws.services.rekognition.model.DetectTextResult;
   import com.amazonaws.services.rekognition.model.TextDetection;
   import java.util.List;
   
   
   
   public class DetectText {
   
      public static void main(String[] args) throws Exception {
         
     
         String photo = "inputtext.jpg";
         String bucket = "amzn-s3-demo-bucket";
   
         AmazonRekognition rekognitionClient = AmazonRekognitionClientBuilder.defaultClient();
   
        
         
         DetectTextRequest request = new DetectTextRequest()
                 .withImage(new Image()
                 .withS3Object(new S3Object()
                 .withName(photo)
                 .withBucket(bucket)));
       
   
         try {
            DetectTextResult result = rekognitionClient.detectText(request);
            List<TextDetection> textDetections = result.getTextDetections();
   
            System.out.println("Detected lines and words for " + photo);
            for (TextDetection text: textDetections) {
         
                    System.out.println("Detected: " + text.getDetectedText());
                    System.out.println("Confidence: " + text.getConfidence().toString());
                    System.out.println("Id : " + text.getId());
                    System.out.println("Parent Id: " + text.getParentId());
                    System.out.println("Type: " + text.getType());
                    System.out.println();
            }
         } catch(AmazonRekognitionException e) {
            e.printStackTrace();
         }
      }
   }
   ```

------
#### [ Java V2 ]

   This code is taken from the AWS Documentation SDK examples GitHub repository. See the full example [here](https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javav2/example_code/rekognition/src/main/java/com/example/rekognition/DetectText.java).

   ```
   /**
   *  To run this code example, ensure that you perform the Prerequisites as stated in the Amazon Rekognition Guide:
   *  https://docs.aws.amazon.com/rekognition/latest/dg/video-analyzing-with-sqs.html
   *
   * Also, ensure that you set up your development environment, including your credentials.
   *
   * For information, see this documentation topic:
   *
   * https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
   */
   
   //snippet-start:[rekognition.java2.detect_text.import]
   import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
   import software.amazon.awssdk.core.SdkBytes;
   import software.amazon.awssdk.regions.Region;
   import software.amazon.awssdk.services.rekognition.RekognitionClient;
   import software.amazon.awssdk.services.rekognition.model.DetectTextRequest;
   import software.amazon.awssdk.services.rekognition.model.Image;
   import software.amazon.awssdk.services.rekognition.model.DetectTextResponse;
   import software.amazon.awssdk.services.rekognition.model.TextDetection;
   import software.amazon.awssdk.services.rekognition.model.RekognitionException;
   import java.io.FileInputStream;
   import java.io.FileNotFoundException;
   import java.io.InputStream;
   import java.util.List;
   //snippet-end:[rekognition.java2.detect_text.import]
   
   /**
   * Before running this Java V2 code example, set up your development environment, including your credentials.
   *
   * For more information, see the following documentation topic:
   *
   * https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
   */
   public class DetectTextImage {
   
    public static void main(String[] args) {
   
        final String usage = "\n" +
            "Usage: " +
            "   <sourceImage>\n\n" +
            "Where:\n" +
            "   sourceImage - The path to the image that contains text (for example, C:\\AWS\\pic1.png). \n\n";
   
        if (args.length != 1) {
            System.out.println(usage);
            System.exit(1);
        }
   
        String sourceImage = args[0] ;
        Region region = Region.US_WEST_2;
        RekognitionClient rekClient = RekognitionClient.builder()
            .region(region)
            .credentialsProvider(ProfileCredentialsProvider.create("default"))
            .build();
   
        detectTextLabels(rekClient, sourceImage );
        rekClient.close();
    }
   
    // snippet-start:[rekognition.java2.detect_text.main]
    public static void detectTextLabels(RekognitionClient rekClient, String sourceImage) {
   
        try {
            InputStream sourceStream = new FileInputStream(sourceImage);
            SdkBytes sourceBytes = SdkBytes.fromInputStream(sourceStream);
            Image souImage = Image.builder()
                .bytes(sourceBytes)
                .build();
   
            DetectTextRequest textRequest = DetectTextRequest.builder()
                .image(souImage)
                .build();
   
            DetectTextResponse textResponse = rekClient.detectText(textRequest);
            List<TextDetection> textCollection = textResponse.textDetections();
            System.out.println("Detected lines and words");
            for (TextDetection text: textCollection) {
                System.out.println("Detected: " + text.detectedText());
                System.out.println("Confidence: " + text.confidence().toString());
                System.out.println("Id : " + text.id());
                System.out.println("Parent Id: " + text.parentId());
                System.out.println("Type: " + text.type());
                System.out.println();
            }
   
        } catch (RekognitionException | FileNotFoundException e) {
            System.out.println(e.getMessage());
            System.exit(1);
        }
    }
    // snippet-end:[rekognition.java2.detect_text.main]
   }
   ```

------
#### [ AWS CLI ]

   This AWS CLI command displays the JSON output for the `detect-text` CLI operation. 

   Replace the values of `amzn-s3-demo-bucket` and `image-name` with the names of the S3 bucket and image that you used in step 2. 

   Replace `default` in `--profile default` with the name of your developer profile.

   ```
   aws rekognition detect-text --image '{"S3Object":{"Bucket":"amzn-s3-demo-bucket","Name":"image-name"}}' --profile default
   ```

   If you are accessing the CLI on a Windows device, use double quotes instead of single quotes and escape the inner double quotes with a backslash (that is, `\"`) to avoid parser errors. For example: 

   ```
   aws rekognition detect-text  --image "{\"S3Object\":{\"Bucket\":\"amzn-s3-demo-bucket\",\"Name\":\"image-name\"}}" --profile default
   ```

------
#### [ Python ]

   The following example code displays lines and words detected in an image. 

   Replace the values of `amzn-s3-demo-bucket` and `photo` with the names of the S3 bucket and image that you used in step 2. Replace the value of `profile_name` in the line that creates the Rekognition session with the name of your developer profile.

   ```
   # Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
   # SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)
   
   import boto3
   
   def detect_text(photo, bucket):
   
       session = boto3.Session(profile_name='default')
       client = session.client('rekognition')
   
       response = client.detect_text(Image={'S3Object': {'Bucket': bucket, 'Name': photo}})
   
       textDetections = response['TextDetections']
       print('Detected text\n----------')
       for text in textDetections:
           print('Detected text:' + text['DetectedText'])
           print('Confidence: ' + "{:.2f}".format(text['Confidence']) + "%")
           print('Id: {}'.format(text['Id']))
           if 'ParentId' in text:
               print('Parent Id: {}'.format(text['ParentId']))
           print('Type:' + text['Type'])
           print()
       return len(textDetections)
   
   def main():
       bucket = 'amzn-s3-demo-bucket'
       photo = 'photo-name'
       text_count = detect_text(photo, bucket)
       print("Text detected: " + str(text_count))
   
   if __name__ == "__main__":
       main()
   ```

------
#### [ .NET ]

   The following example code displays lines and words detected in an image. 

   Replace the values of `amzn-s3-demo-bucket` and `photo` with the names of the S3 bucket and image that you used in step 2. 

   ```
   //Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
   //SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)
   
   using System;
   using Amazon.Rekognition;
   using Amazon.Rekognition.Model;
   
   public class DetectText
   {
       public static void Example()
       {
           String photo = "input.jpg";
           String bucket = "amzn-s3-demo-bucket";
   
           AmazonRekognitionClient rekognitionClient = new AmazonRekognitionClient();
   
           DetectTextRequest detectTextRequest = new DetectTextRequest()
           {
               Image = new Image()
               {
                   S3Object = new S3Object()
                   {
                       Name = photo,
                       Bucket = bucket
                   }
               }
           };
   
           try
           {
               DetectTextResponse detectTextResponse = rekognitionClient.DetectText(detectTextRequest);
               Console.WriteLine("Detected lines and words for " + photo);
               foreach (TextDetection text in detectTextResponse.TextDetections)
               {
                   Console.WriteLine("Detected: " + text.DetectedText);
                   Console.WriteLine("Confidence: " + text.Confidence);
                   Console.WriteLine("Id : " + text.Id);
                   Console.WriteLine("Parent Id: " + text.ParentId);
                   Console.WriteLine("Type: " + text.Type);
               }
           }
           catch (Exception e)
           {
               Console.WriteLine(e.Message);
           }
       }
   }
   ```

------
#### [ Node.JS ]

   The following example code displays lines and words detected in an image. 

   Replace the values of `amzn-s3-demo-bucket` and `photo` with the names of the S3 bucket and image that you used in step 2. Replace the value of `region` with the AWS Region found in your `.aws` credentials. The example reads credentials from the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables. 

   ```
   var AWS = require('aws-sdk');
   
   const bucket = 'amzn-s3-demo-bucket' // the bucket name without s3://
   const photo  = 'photo' // the name of the image file
   
   // Apply the credentials and Region to the global SDK configuration
   // so that the Rekognition client created below picks them up.
   AWS.config.update({
     accessKeyId: process.env.AWS_ACCESS_KEY_ID,
     secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY,
     region: 'region'
   });
   const client = new AWS.Rekognition();
   const params = {
     Image: {
       S3Object: {
         Bucket: bucket,
         Name: photo
       },
     },
   }
   client.detectText(params, function(err, response) {
     if (err) {
       console.log(err, err.stack); // handle error if an error occurred
     } else {
       console.log(`Detected Text for: ${photo}`)
       console.log(response)
       response.TextDetections.forEach(label => {
         console.log(`Detected Text: ${label.DetectedText}`)
         console.log(`Type: ${label.Type}`)
         console.log(`ID: ${label.Id}`)
         console.log(`Parent ID: ${label.ParentId}`)
         console.log(`Confidence: ${label.Confidence}`)
         console.log(`Polygon: `)
         console.log(label.Geometry.Polygon)
       } 
       )
     } 
   });
   ```

------

## DetectText operation request
<a name="detecttext-request"></a>

In the `DetectText` operation, you supply an input image either as a base64-encoded byte array or as an image stored in an Amazon S3 bucket. The following example JSON request shows the image loaded from an Amazon S3 bucket.

```
{
    "Image": {
        "S3Object": {
            "Bucket": "amzn-s3-demo-bucket",
            "Name": "inputtext.jpg"
        }
    }
}
```
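
The request above references an image in Amazon S3. As a minimal sketch of the other option, the following Python helper builds the `Image` parameter from either a local file or an S3 object. The helper name `build_image_param` is hypothetical and not part of any SDK. The sketch assumes the AWS SDK for Python (Boto3), which accepts raw bytes in the `Bytes` field and base64-encodes them when it serializes the request.

```python
from pathlib import Path


def build_image_param(local_path=None, bucket=None, name=None):
    """Return the Image parameter for detect_text (hypothetical helper)."""
    if local_path is not None:
        # Pass the raw file bytes; no manual base64 encoding is done here.
        return {"Bytes": Path(local_path).read_bytes()}
    if bucket and name:
        # Reference an object that is already stored in Amazon S3.
        return {"S3Object": {"Bucket": bucket, "Name": name}}
    raise ValueError("Provide either local_path or both bucket and name")


# Usage (requires configured AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("rekognition")
# response = client.detect_text(Image=build_image_param(local_path="inputtext.jpg"))
```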

### Filters
<a name="text-filters"></a>

Filtering by text region, size, and confidence score gives you additional flexibility to control your text detection output. By using regions of interest, you can limit text detection to the regions that are relevant to you, for example, the top right of a profile photo or a fixed location in relation to a reference point when reading part numbers from an image of a machine. You can use the word bounding box size filter to exclude small background text that might be noisy or irrelevant. The word confidence filter lets you remove results that might be unreliable because the text is blurry or smudged. 

For information regarding filter values, see [DetectTextFilters](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectTextFilters.html).

You can use the following filters:
+ **MinConfidence** – Sets the confidence level of word detection. Words with detection confidence below this level are excluded from the result. Values should be between 0 and 100.
+ **MinBoundingBoxWidth** – Sets the minimum width of the word bounding box. Words with bounding boxes that are smaller than this value are excluded from the result. The value is relative to the image frame width.
+ **MinBoundingBoxHeight** – Sets the minimum height of the word bounding box. Words with bounding box heights less than this value are excluded from the result. The value is relative to the image frame height.
+ **RegionsOfInterest** – Limits detection to a specific region of the image frame. The values are relative to the frame's dimensions. For text only partially within a region, the response is undefined.
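
As a hedged illustration of how these filters might be passed with Boto3, the following sketch assembles the keyword arguments for `detect_text`. The helper name `build_detect_text_kwargs` and its parameters are hypothetical. The nesting of the three word-level filters under `WordFilter`, and of each region under `BoundingBox`, follows the `DetectTextFilters` API reference as understood here; check that reference before relying on it.

```python
def build_detect_text_kwargs(bucket, photo, min_confidence=None,
                             min_box_width=None, min_box_height=None,
                             regions=None):
    """Build keyword arguments for detect_text (hypothetical helper).

    regions is a list of (left, top, width, height) tuples. Each value is
    a ratio of the image dimensions, between 0 and 1.
    """
    kwargs = {"Image": {"S3Object": {"Bucket": bucket, "Name": photo}}}

    # Word-level filters: only include the ones the caller set.
    word_filter = {}
    if min_confidence is not None:
        word_filter["MinConfidence"] = float(min_confidence)
    if min_box_width is not None:
        word_filter["MinBoundingBoxWidth"] = float(min_box_width)
    if min_box_height is not None:
        word_filter["MinBoundingBoxHeight"] = float(min_box_height)

    filters = {}
    if word_filter:
        filters["WordFilter"] = word_filter
    if regions:
        filters["RegionsOfInterest"] = [
            {"BoundingBox": {"Left": left, "Top": top,
                             "Width": width, "Height": height}}
            for (left, top, width, height) in regions
        ]
    if filters:
        kwargs["Filters"] = filters
    return kwargs


# Example: keep confident words found in the top-right quarter of the image.
request = build_detect_text_kwargs(
    "amzn-s3-demo-bucket", "inputtext.jpg",
    min_confidence=80, regions=[(0.5, 0.0, 0.5, 0.5)])
# With a Boto3 Rekognition client: client.detect_text(**request)
```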

## DetectText operation response
<a name="text-response"></a>

The `DetectText` operation analyzes the image and returns an array, `TextDetections`, where each element ([TextDetection](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_TextDetection.html)) represents a line or word detected in the image. For each element, `DetectText` returns the following information: 
+ The detected text (`DetectedText`)
+ The relationships between words and lines (`Id` and `ParentId`)
+ The location of text on the image (`Geometry`)
+ The confidence Amazon Rekognition has in the accuracy of the detected text and bounding box (`Confidence`)
+ The type of the detected text (`Type`)

### Detected text
<a name="text-detected-text"></a>

Each `TextDetection` element contains recognized text (words or lines) in the `DetectedText` field. A word is one or more script characters not separated by spaces. `DetectText` can detect up to 100 words in an image. Returned text might include characters that make a word unrecognizable. For example, *C@t* instead of *Cat*. To determine whether a `TextDetection` element represents a line of text or a word, use the `Type` field.

Each `TextDetection` element includes a percentage value that represents the degree of confidence that Amazon Rekognition has in the accuracy of the detected text and of the bounding box that surrounds the text. 
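
As a small sketch of how an application might use the `Type` and `Confidence` fields together, the following Python function separates lines from words and drops low-confidence elements on the client side. The function name `split_by_type` and the trimmed sample data are illustrative assumptions. A real response also contains `Id`, `ParentId`, and `Geometry`.

```python
def split_by_type(text_detections, min_confidence=0.0):
    """Return (lines, words) as lists of detected text strings."""
    lines, words = [], []
    for detection in text_detections:
        # Skip elements below the caller's confidence threshold.
        if detection["Confidence"] < min_confidence:
            continue
        if detection["Type"] == "LINE":
            lines.append(detection["DetectedText"])
        elif detection["Type"] == "WORD":
            words.append(detection["DetectedText"])
    return lines, words


sample = [
    {"DetectedText": "but keep", "Type": "LINE", "Confidence": 99.6},
    {"DetectedText": "but", "Type": "WORD", "Confidence": 99.9},
    {"DetectedText": "keep", "Type": "WORD", "Confidence": 99.2},
    {"DetectedText": "Smiling", "Type": "WORD", "Confidence": 88.9},
]
lines, words = split_by_type(sample, min_confidence=90)
print(lines)  # ['but keep']
print(words)  # ['but', 'keep']
```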

### Word and line relationships
<a name="text-ids"></a>

Each `TextDetection` element has an identifier field, `Id`. The `Id` shows the word's position in a line. If the element is a word, the parent identifier field, `ParentId`, identifies the line where the word was detected. The `ParentId` for a line is null. For example, the line "but keep" in the example image has the following `Id` and `ParentId` values: 


|  Text  |  ID  |  Parent ID  | 
| --- | --- | --- | 
|  but keep  |  2  |     | 
|  but  |  6  |  2  | 
|  keep  |  7  |  2  | 
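
The relationship between `Id` and `ParentId` can be sketched in Python as follows. The function name `group_words_by_line` is hypothetical, and the sample list is a trimmed stand-in for a real `TextDetections` array. As in the Python example earlier in this topic, the sketch treats `ParentId` as possibly absent for lines.

```python
from collections import defaultdict


def group_words_by_line(text_detections):
    """Return a list of (line_text, [word_text, ...]) pairs."""
    line_text = {}
    children = defaultdict(list)
    for detection in text_detections:
        if detection["Type"] == "LINE":
            line_text[detection["Id"]] = detection["DetectedText"]
        elif "ParentId" in detection:
            # A word points at its line through ParentId.
            children[detection["ParentId"]].append(detection["DetectedText"])
    return [(text, children[line_id]) for line_id, text in line_text.items()]


sample = [
    {"Id": 2, "Type": "LINE", "DetectedText": "but keep"},
    {"Id": 6, "ParentId": 2, "Type": "WORD", "DetectedText": "but"},
    {"Id": 7, "ParentId": 2, "Type": "WORD", "DetectedText": "keep"},
]
print(group_words_by_line(sample))  # [('but keep', ['but', 'keep'])]
```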

### Text location on an image
<a name="text-location"></a>

To determine where the recognized text is on an image, use the bounding box ([Geometry](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_Geometry.html)) information that's returned by `DetectText`. The `Geometry` object contains two types of bounding box information for detected lines and words:
+ An axis-aligned coarse rectangular outline in a [BoundingBox](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_BoundingBox.html) object
+ A finer-grained polygon that's made up of multiple X and Y coordinates in a [Point](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_Point.html) array

The bounding box and polygon coordinates show where the text is located on the source image. The coordinate values are a ratio of the overall image size. For more information, see [BoundingBox](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_BoundingBox.html). 
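
Because the coordinates are ratios, drawing a box requires scaling them by the image size. The following Python sketch shows one way to do that. The function names are hypothetical, and the image width and height are assumed to be known to the caller (for example, read from the image file).

```python
def bounding_box_to_pixels(bounding_box, image_width, image_height):
    """Convert a ratio-based BoundingBox to (left, top, width, height) in pixels."""
    return (
        round(bounding_box["Left"] * image_width),
        round(bounding_box["Top"] * image_height),
        round(bounding_box["Width"] * image_width),
        round(bounding_box["Height"] * image_height),
    )


def polygon_to_pixels(polygon, image_width, image_height):
    """Convert a list of ratio-based Point dictionaries to (x, y) pixel tuples."""
    return [(round(point["X"] * image_width), round(point["Y"] * image_height))
            for point in polygon]


box = {"Left": 0.5, "Top": 0.25, "Width": 0.25, "Height": 0.5}
print(bounding_box_to_pixels(box, 800, 600))  # (400, 150, 200, 300)
```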

The following JSON response from the `DetectText` operation shows the words and lines that were detected in the following image.

![\[Smiling coffee mug next to text that says "It's Monday but keep Smiling" on a brick background, with text bounding boxes.\]](http://docs.aws.amazon.com/rekognition/latest/dg/images/text.png)


```
{
 'TextDetections': [{'Confidence': 99.35693359375,
                     'DetectedText': "IT'S",
                     'Geometry': {'BoundingBox': {'Height': 0.09988046437501907,
                                                  'Left': 0.6684935688972473,
                                                  'Top': 0.18226495385169983,
                                                  'Width': 0.1461552083492279},
                                  'Polygon': [{'X': 0.6684935688972473,
                                               'Y': 0.1838926374912262},
                                              {'X': 0.8141663074493408,
                                               'Y': 0.18226495385169983},
                                              {'X': 0.8146487474441528,
                                               'Y': 0.28051772713661194},
                                              {'X': 0.6689760088920593,
                                               'Y': 0.2821454107761383}]},
                     'Id': 0,
                     'Type': 'LINE'},
                    {'Confidence': 99.6207275390625,
                     'DetectedText': 'MONDAY',
                     'Geometry': {'BoundingBox': {'Height': 0.11442459374666214,
                                                  'Left': 0.5566731691360474,
                                                  'Top': 0.3525116443634033,
                                                  'Width': 0.39574965834617615},
                                  'Polygon': [{'X': 0.5566731691360474,
                                               'Y': 0.353712260723114},
                                              {'X': 0.9522717595100403,
                                               'Y': 0.3525116443634033},
                                              {'X': 0.9524227976799011,
                                               'Y': 0.4657355844974518},
                                              {'X': 0.5568241477012634,
                                               'Y': 0.46693623065948486}]},
                     'Id': 1,
                     'Type': 'LINE'},
                    {'Confidence': 99.6160888671875,
                     'DetectedText': 'but keep',
                     'Geometry': {'BoundingBox': {'Height': 0.08314694464206696,
                                                  'Left': 0.6398131847381592,
                                                  'Top': 0.5267938375473022,
                                                  'Width': 0.2021435648202896},
                                  'Polygon': [{'X': 0.640289306640625,
                                               'Y': 0.5267938375473022},
                                              {'X': 0.8419567942619324,
                                               'Y': 0.5295097827911377},
                                              {'X': 0.8414806723594666,
                                               'Y': 0.609940767288208},
                                              {'X': 0.6398131847381592,
                                               'Y': 0.6072247624397278}]},
                     'Id': 2,
                     'Type': 'LINE'},
                    {'Confidence': 88.95134735107422,
                     'DetectedText': 'Smiling',
                     'Geometry': {'BoundingBox': {'Height': 0.4326171875,
                                                  'Left': 0.46289217472076416,
                                                  'Top': 0.5634765625,
                                                  'Width': 0.5371078252792358},
                                  'Polygon': [{'X': 0.46289217472076416,
                                               'Y': 0.5634765625},
                                              {'X': 1.0, 'Y': 0.5634765625},
                                              {'X': 1.0, 'Y': 0.99609375},
                                              {'X': 0.46289217472076416,
                                               'Y': 0.99609375}]},
                     'Id': 3,
                     'Type': 'LINE'},
                    {'Confidence': 99.35693359375,
                     'DetectedText': "IT'S",
                     'Geometry': {'BoundingBox': {'Height': 0.09988046437501907,
                                                  'Left': 0.6684935688972473,
                                                  'Top': 0.18226495385169983,
                                                  'Width': 0.1461552083492279},
                                  'Polygon': [{'X': 0.6684935688972473,
                                               'Y': 0.1838926374912262},
                                              {'X': 0.8141663074493408,
                                               'Y': 0.18226495385169983},
                                              {'X': 0.8146487474441528,
                                               'Y': 0.28051772713661194},
                                              {'X': 0.6689760088920593,
                                               'Y': 0.2821454107761383}]},
                     'Id': 4,
                     'ParentId': 0,
                     'Type': 'WORD'},
                    {'Confidence': 99.6207275390625,
                     'DetectedText': 'MONDAY',
                     'Geometry': {'BoundingBox': {'Height': 0.11442466825246811,
                                                  'Left': 0.5566731691360474,
                                                  'Top': 0.35251158475875854,
                                                  'Width': 0.39574965834617615},
                                  'Polygon': [{'X': 0.5566731691360474,
                                               'Y': 0.3537122905254364},
                                              {'X': 0.9522718787193298,
                                               'Y': 0.35251158475875854},
                                              {'X': 0.9524227976799011,
                                               'Y': 0.4657355546951294},
                                              {'X': 0.5568241477012634,
                                               'Y': 0.46693626046180725}]},
                     'Id': 5,
                     'ParentId': 1,
                     'Type': 'WORD'},
                    {'Confidence': 99.96778869628906,
                     'DetectedText': 'but',
                     'Geometry': {'BoundingBox': {'Height': 0.0625,
                                                  'Left': 0.6402802467346191,
                                                  'Top': 0.5283203125,
                                                  'Width': 0.08027780801057816},
                                  'Polygon': [{'X': 0.6402802467346191,
                                               'Y': 0.5283203125},
                                              {'X': 0.7205580472946167,
                                               'Y': 0.5283203125},
                                              {'X': 0.7205580472946167,
                                               'Y': 0.5908203125},
                                              {'X': 0.6402802467346191,
                                               'Y': 0.5908203125}]},
                     'Id': 6,
                     'ParentId': 2,
                     'Type': 'WORD'},
                    {'Confidence': 99.26438903808594,
                     'DetectedText': 'keep',
                     'Geometry': {'BoundingBox': {'Height': 0.0818721204996109,
                                                  'Left': 0.7344760298728943,
                                                  'Top': 0.5280686020851135,
                                                  'Width': 0.10748066753149033},
                                  'Polygon': [{'X': 0.7349520921707153,
                                               'Y': 0.5280686020851135},
                                              {'X': 0.8419566750526428,
                                               'Y': 0.5295097827911377},
                                              {'X': 0.8414806127548218,
                                               'Y': 0.6099407076835632},
                                              {'X': 0.7344760298728943,
                                               'Y': 0.6084995269775391}]},
                     'Id': 7,
                     'ParentId': 2,
                     'Type': 'WORD'},
                    {'Confidence': 88.95134735107422,
                     'DetectedText': 'Smiling',
                     'Geometry': {'BoundingBox': {'Height': 0.4326171875,
                                                  'Left': 0.46289217472076416,
                                                  'Top': 0.5634765625,
                                                  'Width': 0.5371078252792358},
                                  'Polygon': [{'X': 0.46289217472076416,
                                               'Y': 0.5634765625},
                                              {'X': 1.0, 'Y': 0.5634765625},
                                              {'X': 1.0, 'Y': 0.99609375},
                                              {'X': 0.46289217472076416,
                                               'Y': 0.99609375}]},
                     'Id': 8,
                     'ParentId': 3,
                     'Type': 'WORD'}],
 'TextModelVersion': '3.0'}
```

# Detecting text in a stored video
<a name="text-detecting-video-procedure"></a>

Amazon Rekognition Video text detection in stored videos is an asynchronous operation. To start detecting text, call [StartTextDetection](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_StartTextDetection.html). Amazon Rekognition Video publishes the completion status of the video analysis to an Amazon SNS topic. If the video analysis is successful, call [GetTextDetection](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_GetTextDetection.html) to get the analysis results. For more information about starting video analysis and getting the results, see [Calling Amazon Rekognition Video operations](api-video.md).

This procedure expands on the code in [Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK)](video-analyzing-with-sqs.md). It uses an Amazon SQS queue to get the completion status of a video analysis request.
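
As a rough illustration of what checking the completion status involves, the following Python sketch parses one SQS message body and extracts the job status. It assumes the queue is subscribed to the Amazon SNS topic without raw message delivery, so the body is an SNS envelope whose `Message` field is a JSON string containing `JobId` and `Status`. The function name `parse_completion_message` and the sample body are hypothetical and only for illustration.

```python
import json


def parse_completion_message(sqs_body, expected_job_id):
    """Return the job status if the message is for expected_job_id, else None.

    Assumption: the SQS body is an SNS envelope whose 'Message' field holds
    the Amazon Rekognition Video notification as a JSON string.
    """
    envelope = json.loads(sqs_body)
    notification = json.loads(envelope["Message"])
    if notification.get("JobId") != expected_job_id:
        return None  # Notification for a different job; ignore it.
    return notification.get("Status")


# Hand-written sample body for illustration only (not captured from a real job).
sample_body = json.dumps(
    {"Message": json.dumps({"JobId": "example-job-id", "Status": "SUCCEEDED"})}
)
print(parse_completion_message(sample_body, "example-job-id"))
```

Call `GetTextDetection` only when the status is `SUCCEEDED`, which is what the `GetSQSMessageSuccess` check in the procedure below is for.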

**To detect text in a video stored in an Amazon S3 bucket (SDK)**

1. Perform the steps in [Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK)](video-analyzing-with-sqs.md).

1. Add the following code to the class `VideoDetect` in step 1.

------
#### [ Java ]

   ```
   //Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved.
   //SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)
   
   
   private static void StartTextDetection(String bucket, String video) throws Exception{
              
       NotificationChannel channel= new NotificationChannel()
               .withSNSTopicArn(snsTopicArn)
               .withRoleArn(roleArn);
       
       StartTextDetectionRequest req = new StartTextDetectionRequest()
               .withVideo(new Video()
                       .withS3Object(new S3Object()
                           .withBucket(bucket)
                           .withName(video)))
               .withNotificationChannel(channel);
       
       
       StartTextDetectionResult startTextDetectionResult = rek.startTextDetection(req);
       startJobId=startTextDetectionResult.getJobId();
       
   } 
   
   private static void GetTextDetectionResults() throws Exception{
       
       int maxResults=10;
       String paginationToken=null;
       GetTextDetectionResult textDetectionResult=null;
       
       do{
           if (textDetectionResult !=null){
               paginationToken = textDetectionResult.getNextToken();
   
           }
           
       
           textDetectionResult = rek.getTextDetection(new GetTextDetectionRequest()
                .withJobId(startJobId)
                .withNextToken(paginationToken)
                .withMaxResults(maxResults));
       
           VideoMetadata videoMetaData=textDetectionResult.getVideoMetadata();
               
           System.out.println("Format: " + videoMetaData.getFormat());
           System.out.println("Codec: " + videoMetaData.getCodec());
           System.out.println("Duration: " + videoMetaData.getDurationMillis());
           System.out.println("FrameRate: " + videoMetaData.getFrameRate());
               
               
           //Show text, confidence values
           List<TextDetectionResult> textDetections = textDetectionResult.getTextDetections();
   
   
           for (TextDetectionResult text: textDetections) {
               long seconds=text.getTimestamp()/1000;
               System.out.println("Sec: " + Long.toString(seconds) + " ");
               TextDetection detectedText=text.getTextDetection();
               
               System.out.println("Text Detected: " + detectedText.getDetectedText());
               System.out.println("Confidence: " + detectedText.getConfidence().toString());
               System.out.println("Id : " + detectedText.getId());
               System.out.println("Parent Id: " + detectedText.getParentId());
               System.out.println("Bounding Box: " + detectedText.getGeometry().getBoundingBox().toString());
               System.out.println("Type: " + detectedText.getType());
               System.out.println();
           }
       } while (textDetectionResult !=null && textDetectionResult.getNextToken() != null);
         
           
   }
   ```

   In the function `main`, replace the lines: 

   ```
           StartLabelDetection(bucket, video);
   
           if (GetSQSMessageSuccess()==true)
           	GetLabelDetectionResults();
   ```

   with:

   ```
           StartTextDetection(bucket, video);
   
           if (GetSQSMessageSuccess()==true)
           	GetTextDetectionResults();
   ```

------
#### [ Java V2 ]

   This code is taken from the AWS Documentation SDK examples GitHub repository. See the full example [here](https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/javav2/example_code/rekognition/src/main/java/com/example/rekognition/VideoDetectText.java).

   ```
   //snippet-start:[rekognition.java2.recognize_video_text.import]
   import software.amazon.awssdk.auth.credentials.ProfileCredentialsProvider;
   import software.amazon.awssdk.regions.Region;
   import software.amazon.awssdk.services.rekognition.RekognitionClient;
   import software.amazon.awssdk.services.rekognition.model.S3Object;
   import software.amazon.awssdk.services.rekognition.model.NotificationChannel;
   import software.amazon.awssdk.services.rekognition.model.Video;
   import software.amazon.awssdk.services.rekognition.model.StartTextDetectionRequest;
   import software.amazon.awssdk.services.rekognition.model.StartTextDetectionResponse;
   import software.amazon.awssdk.services.rekognition.model.RekognitionException;
   import software.amazon.awssdk.services.rekognition.model.GetTextDetectionResponse;
   import software.amazon.awssdk.services.rekognition.model.GetTextDetectionRequest;
   import software.amazon.awssdk.services.rekognition.model.VideoMetadata;
   import software.amazon.awssdk.services.rekognition.model.TextDetectionResult;
   import java.util.List;
   //snippet-end:[rekognition.java2.recognize_video_text.import]
   
   /**
   * Before running this Java V2 code example, set up your development environment, including your credentials.
   *
   * For more information, see the following documentation topic:
   *
   * https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/get-started.html
   */
   public class DetectTextVideo {
   
    private static String startJobId ="";
    public static void main(String[] args) {
   
        final String usage = "\n" +
            "Usage: " +
            "   <bucket> <video> <topicArn> <roleArn>\n\n" +
            "Where:\n" +
            "   bucket - The name of the bucket in which the video is located (for example, amzn-s3-demo-bucket). \n\n"+
            "   video - The name of video (for example, people.mp4). \n\n" +
            "   topicArn - The ARN of the Amazon Simple Notification Service (Amazon SNS) topic. \n\n" +
            "   roleArn - The ARN of the AWS Identity and Access Management (IAM) role to use. \n\n" ;
   
        if (args.length != 4) {
            System.out.println(usage);
            System.exit(1);
        }
   
        String bucket = args[0];
        String video = args[1];
        String topicArn = args[2];
        String roleArn = args[3];
   
        Region region = Region.US_EAST_1;
        RekognitionClient rekClient = RekognitionClient.builder()
            .region(region)
            .credentialsProvider(ProfileCredentialsProvider.create("profile-name"))
            .build();
   
        NotificationChannel channel = NotificationChannel.builder()
            .snsTopicArn(topicArn)
            .roleArn(roleArn)
            .build();
   
        startTextLabels(rekClient, channel, bucket, video);
        GetTextResults(rekClient);
        System.out.println("This example is done!");
        rekClient.close();
    }
   
    // snippet-start:[rekognition.java2.recognize_video_text.main]
    public static void startTextLabels(RekognitionClient rekClient,
                                   NotificationChannel channel,
                                   String bucket,
                                   String video) {
        try {
            S3Object s3Obj = S3Object.builder()
                .bucket(bucket)
                .name(video)
                .build();
   
            Video vidOb = Video.builder()
                .s3Object(s3Obj)
                .build();
   
            StartTextDetectionRequest labelDetectionRequest = StartTextDetectionRequest.builder()
                .jobTag("DetectingLabels")
                .notificationChannel(channel)
                .video(vidOb)
                .build();
   
            StartTextDetectionResponse labelDetectionResponse = rekClient.startTextDetection(labelDetectionRequest);
            startJobId = labelDetectionResponse.jobId();
   
        } catch (RekognitionException e) {
            System.out.println(e.getMessage());
            System.exit(1);
        }
    }
   
    public static void GetTextResults(RekognitionClient rekClient) {
   
        try {
            String paginationToken=null;
            GetTextDetectionResponse textDetectionResponse=null;
            boolean finished = false;
            String status;
            int yy=0 ;
   
            do{
                if (textDetectionResponse !=null)
                    paginationToken = textDetectionResponse.nextToken();
   
                GetTextDetectionRequest recognitionRequest = GetTextDetectionRequest.builder()
                    .jobId(startJobId)
                    .nextToken(paginationToken)
                    .maxResults(10)
                    .build();
   
                // Wait until the job succeeds.
                while (!finished) {
                    textDetectionResponse = rekClient.getTextDetection(recognitionRequest);
                    status = textDetectionResponse.jobStatusAsString();
   
                    if (status.compareTo("SUCCEEDED") == 0)
                        finished = true;
                    else {
                        System.out.println(yy + " status is: " + status);
                        Thread.sleep(1000);
                    }
                    yy++;
                }
   
                finished = false;
   
                // Proceed when the job is done - otherwise VideoMetadata is null.
                VideoMetadata videoMetaData=textDetectionResponse.videoMetadata();
                System.out.println("Format: " + videoMetaData.format());
                System.out.println("Codec: " + videoMetaData.codec());
                System.out.println("Duration: " + videoMetaData.durationMillis());
                System.out.println("FrameRate: " + videoMetaData.frameRate());
                System.out.println("Job");
   
                List<TextDetectionResult> labels= textDetectionResponse.textDetections();
                for (TextDetectionResult detectedText: labels) {
                    System.out.println("Confidence: " + detectedText.textDetection().confidence().toString());
                    System.out.println("Id : " + detectedText.textDetection().id());
                    System.out.println("Parent Id: " + detectedText.textDetection().parentId());
                    System.out.println("Type: " + detectedText.textDetection().type());
                    System.out.println("Text: " + detectedText.textDetection().detectedText());
                    System.out.println();
                }
   
            } while (textDetectionResponse !=null && textDetectionResponse.nextToken() != null);
   
        } catch(RekognitionException | InterruptedException e) {
            System.out.println(e.getMessage());
            System.exit(1);
        }
    }
    // snippet-end:[rekognition.java2.recognize_video_text.main]
   }
   ```

------
#### [ Python ]

   ```
   #Copyright 2019 Amazon.com, Inc. or its affiliates. All Rights Reserved.
   #SPDX-License-Identifier: MIT-0 (For details, see https://github.com/awsdocs/amazon-rekognition-developer-guide/blob/master/LICENSE-SAMPLECODE.)
   
       def StartTextDetection(self):
           response=self.rek.start_text_detection(Video={'S3Object': {'Bucket': self.bucket, 'Name': self.video}},
               NotificationChannel={'RoleArn': self.roleArn, 'SNSTopicArn': self.snsTopicArn})
   
           self.startJobId=response['JobId']
           print('Start Job Id: ' + self.startJobId)
     
       def GetTextDetectionResults(self):
           maxResults = 10
           paginationToken = ''
           finished = False
   
           while finished == False:
               response = self.rek.get_text_detection(JobId=self.startJobId,
                                               MaxResults=maxResults,
                                               NextToken=paginationToken)
   
               print('Codec: ' + response['VideoMetadata']['Codec'])
               
               print('Duration: ' + str(response['VideoMetadata']['DurationMillis']))
               print('Format: ' + response['VideoMetadata']['Format'])
               print('Frame rate: ' + str(response['VideoMetadata']['FrameRate']))
               print()
   
               for textDetection in response['TextDetections']:
                   text=textDetection['TextDetection']
   
                   print("Timestamp: " + str(textDetection['Timestamp']))
                   print("   Text Detected: " + text['DetectedText'])
                   print("   Confidence: " +  str(text['Confidence']))
                   print ("      Bounding box")
                   print ("        Top: " + str(text['Geometry']['BoundingBox']['Top']))
                   print ("        Left: " + str(text['Geometry']['BoundingBox']['Left']))
                   print ("        Width: " +  str(text['Geometry']['BoundingBox']['Width']))
                   print ("        Height: " +  str(text['Geometry']['BoundingBox']['Height']))
                   print ("   Type: " + str(text['Type']) )
                   print()
   
               if 'NextToken' in response:
                   paginationToken = response['NextToken']
               else:
                   finished = True
   ```

   In the function `main`, replace the lines:

   ```
       analyzer.StartLabelDetection()
       if analyzer.GetSQSMessageSuccess()==True:
           analyzer.GetLabelDetectionResults()
   ```

   with:

   ```
       analyzer.StartTextDetection()
       if analyzer.GetSQSMessageSuccess()==True:
           analyzer.GetTextDetectionResults()
   ```

------
#### [ CLI ]

   Run the following AWS CLI command to start detecting text in a video.

   ```
    aws rekognition start-text-detection --video '{"S3Object":{"Bucket":"amzn-s3-demo-bucket","Name":"video-name"}}' \
    --notification-channel '{"SNSTopicArn":"topic-arn","RoleArn":"role-arn"}' \
    --region region-name --profile profile-name
   ```

   Update the following values:
   + Change `amzn-s3-demo-bucket` and `video-name` to the Amazon S3 bucket name and file name that you specified in step 2.
   + Change `region-name` to the AWS region that you're using.
   + Replace the value of `profile-name` with the name of your developer profile.
   + Change `topic-arn` to the ARN of the Amazon SNS topic you created in step 3 of [Configuring Amazon Rekognition Video](api-video-roles.md).
   + Change `role-arn` to the ARN of the IAM service role you created in step 7 of [Configuring Amazon Rekognition Video](api-video-roles.md).

   If you are accessing the CLI on a Windows device, use double quotes instead of single quotes and escape the inner double quotes with a backslash (that is, `\"`) to avoid parser errors. For example:

   ```
   aws rekognition start-text-detection --video \
    "{\"S3Object\":{\"Bucket\":\"amzn-s3-demo-bucket\",\"Name\":\"video-name\"}}" \
    --notification-channel "{\"SNSTopicArn\":\"topic-arn\",\"RoleArn\":\"role-arn\"}" \
    --region region-name --profile profile-name
   ```

   After running the preceding command, copy the returned `JobId` and pass it to the following `GetTextDetection` command to get your results, replacing `job-id-number` with the `JobId` that you received:

   ```
   aws rekognition get-text-detection --job-id job-id-number --profile profile-name
   ```

------
**Note**  
If you've already run a video example other than [Analyzing a video stored in an Amazon S3 bucket with Java or Python (SDK)](video-analyzing-with-sqs.md), the code to replace might be different.

1. Run the code. Text that was detected in the video is shown in a list.

## Filters
<a name="text-detection-filters"></a>

Filters are optional request parameters that you can use when you call `StartTextDetection`. Filtering by text region, size, and confidence score gives you additional control over your text detection output. By using regions of interest, you can limit text detection to the regions that are relevant, for example, the bottom third of the frame for graphics or the top-left corner for reading the scoreboard in a soccer game. The word bounding box size filter can be used to exclude small background text that may be noisy or irrelevant. Finally, the word confidence filter lets you remove results that may be unreliable because the text is blurry or smudged.

For information about filter values, see [DetectTextFilters](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_DetectTextFilters.html).

You can use the following filters:
+ **MinConfidence** – Sets the minimum confidence level for word detection. Words with detection confidence below this level are excluded from the result. Values should be between 0 and 100.
+ **MinBoundingBoxWidth** – Sets the minimum width of the word bounding box. Words with bounding boxes that are smaller than this value are excluded from the result. The value is relative to the video frame width.
+ **MinBoundingBoxHeight** – Sets the minimum height of the word bounding box. Words with bounding box heights less than this value are excluded from the result. The value is relative to the video frame height.
+ **RegionsOfInterest** – Limits detection to a specific region of the frame. The values are relative to the frame dimensions. For objects only partially within the regions, the response is undefined.
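
The following Python sketch shows one way these filters might be assembled into the `Filters` request parameter. It assumes the shape used by the AWS SDK for Python (Boto3), where the word-level settings sit under `WordFilter` and each region is a `BoundingBox` inside `RegionsOfInterest`. The helper name `build_text_filters` and the threshold values are hypothetical and chosen only for illustration.

```python
def build_text_filters(min_confidence=None, min_box_width=None,
                       min_box_height=None, regions=None):
    """Build a Filters dict for StartTextDetection, omitting unset values."""
    word_filter = {}
    if min_confidence is not None:
        if not 0 <= min_confidence <= 100:
            raise ValueError("min_confidence must be between 0 and 100")
        word_filter["MinConfidence"] = float(min_confidence)
    if min_box_width is not None:
        word_filter["MinBoundingBoxWidth"] = float(min_box_width)
    if min_box_height is not None:
        word_filter["MinBoundingBoxHeight"] = float(min_box_height)

    filters = {}
    if word_filter:
        filters["WordFilter"] = word_filter
    if regions:
        # Each region is a dict with Left, Top, Width, and Height as ratios
        # of the frame dimensions.
        filters["RegionsOfInterest"] = [{"BoundingBox": dict(r)} for r in regions]
    return filters


# Keep reasonably confident, reasonably tall words in the bottom third of the
# frame (illustrative values, not recommendations).
bottom_third = {"Left": 0.0, "Top": 0.67, "Width": 1.0, "Height": 0.33}
filters = build_text_filters(min_confidence=80, min_box_height=0.05,
                             regions=[bottom_third])
print(filters)

# With a Boto3 Rekognition client named `rek` (not created here), the dict
# would be passed as:
#   rek.start_text_detection(Video={...}, NotificationChannel={...},
#                            Filters=filters)
```

Leaving out unset values keeps the request minimal, so the service applies its own defaults for anything you don't specify.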

## GetTextDetection response
<a name="text-detecting-video-response"></a>

`GetTextDetection` returns an array (`TextDetections`) that contains information about the text detected in the video. Each array element is a `TextDetectionResult` that contains a [TextDetection](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_TextDetection.html) object, and an element exists for each time a word or line is detected in the video. The array elements are sorted by time (in milliseconds) since the start of the video.

The following is a partial JSON response from `GetTextDetection`. In the response, note the following:
+ **Text information** – The `TextDetectionResult` array element contains information about the detected text ([TextDetection](https://docs.aws.amazon.com/rekognition/latest/APIReference/API_TextDetection.html)) and the time that the text was detected in the video (`Timestamp`).
+ **Paging information** – The example shows one page of text detection information. You can specify how many text elements to return in the `MaxResults` input parameter for `GetTextDetection`. If more results than `MaxResults` exist, or there are more results than the default maximum, `GetTextDetection` returns a token (`NextToken`) that's used to get the next page of results. For more information, see [Getting Amazon Rekognition Video analysis results](api-video.md#api-video-get).
+ **Video information** – The response includes information about the video format (`VideoMetadata`) in each page of information that's returned by `GetTextDetection`.

```
{
    "JobStatus": "SUCCEEDED",
    "VideoMetadata": {
        "Codec": "h264",
        "DurationMillis": 174441,
        "Format": "QuickTime / MOV",
        "FrameRate": 29.970029830932617,
        "FrameHeight": 480,
        "FrameWidth": 854
    },
    "TextDetections": [
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle Twinkle Little Star",
                "Type": "LINE",
                "Id": 0,
                "Confidence": 99.91780090332031,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.8337579369544983,
                        "Height": 0.08365312218666077,
                        "Left": 0.08313830941915512,
                        "Top": 0.4663468301296234
                    },
                    "Polygon": [
                        {
                            "X": 0.08313830941915512,
                            "Y": 0.4663468301296234
                        },
                        {
                            "X": 0.9168962240219116,
                            "Y": 0.4674469828605652
                        },
                        {
                            "X": 0.916861355304718,
                            "Y": 0.5511001348495483
                        },
                        {
                            "X": 0.08310343325138092,
                            "Y": 0.5499999523162842
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle",
                "Type": "WORD",
                "Id": 1,
                "ParentId": 0,
                "Confidence": 99.98338317871094,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.2423887550830841,
                        "Height": 0.0833333358168602,
                        "Left": 0.08313817530870438,
                        "Top": 0.46666666865348816
                    },
                    "Polygon": [
                        {
                            "X": 0.08313817530870438,
                            "Y": 0.46666666865348816
                        },
                        {
                            "X": 0.3255269229412079,
                            "Y": 0.46666666865348816
                        },
                        {
                            "X": 0.3255269229412079,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.08313817530870438,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Twinkle",
                "Type": "WORD",
                "Id": 2,
                "ParentId": 0,
                "Confidence": 99.982666015625,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.2423887550830841,
                        "Height": 0.08124999701976776,
                        "Left": 0.3454332649707794,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.3454332649707794,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.5878220200538635,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.5878220200538635,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.3454332649707794,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Little",
                "Type": "WORD",
                "Id": 3,
                "ParentId": 0,
                "Confidence": 99.8787612915039,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.16627635061740875,
                        "Height": 0.08124999701976776,
                        "Left": 0.6053864359855652,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.6053864359855652,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.7716627717018127,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.7716627717018127,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.6053864359855652,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        },
        {
            "Timestamp": 967,
            "TextDetection": {
                "DetectedText": "Star",
                "Type": "WORD",
                "Id": 4,
                "ParentId": 0,
                "Confidence": 99.82640075683594,
                "Geometry": {
                    "BoundingBox": {
                        "Width": 0.12997658550739288,
                        "Height": 0.08124999701976776,
                        "Left": 0.7868852615356445,
                        "Top": 0.46875
                    },
                    "Polygon": [
                        {
                            "X": 0.7868852615356445,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.9168618321418762,
                            "Y": 0.46875
                        },
                        {
                            "X": 0.9168618321418762,
                            "Y": 0.550000011920929
                        },
                        {
                            "X": 0.7868852615356445,
                            "Y": 0.550000011920929
                        }
                    ]
                }
            }
        }
    ],
    "NextToken": "NiHpGbZFnkM/S8kLcukMni15wb05iKtquu/Mwc+Qg1LVlMjjKNOD0Z0GusSPg7TONLe+OZ3P",
    "TextModelVersion": "3.0"
}
```
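
As a minimal sketch of how a response like this could be post-processed, the following Python code groups `WORD` detections under their parent `LINE` by using `Timestamp` and `ParentId`, and converts a ratio-based `BoundingBox` to pixel values by using the frame dimensions in `VideoMetadata`. The helper names `group_words_by_line` and `box_to_pixels` are hypothetical, and the input is a trimmed copy of the response above that keeps only the fields the code reads.

```python
from collections import defaultdict


def group_words_by_line(text_detections):
    """Map (timestamp, line id) to the line text and its child words."""
    lines = {}
    words = defaultdict(list)
    for item in text_detections:
        detection = item["TextDetection"]
        if detection["Type"] == "LINE":
            lines[(item["Timestamp"], detection["Id"])] = detection["DetectedText"]
        elif detection["Type"] == "WORD":
            key = (item["Timestamp"], detection["ParentId"])
            words[key].append(detection["DetectedText"])
    return {key: {"line": text, "words": words.get(key, [])}
            for key, text in lines.items()}


def box_to_pixels(bounding_box, frame_width, frame_height):
    """Convert a ratio-based BoundingBox to (left, top, width, height) pixels."""
    return (
        round(bounding_box["Left"] * frame_width),
        round(bounding_box["Top"] * frame_height),
        round(bounding_box["Width"] * frame_width),
        round(bounding_box["Height"] * frame_height),
    )


# Trimmed copy of the sample response: only the fields used here are kept.
line_box = {"Width": 0.8337579369544983, "Height": 0.08365312218666077,
            "Left": 0.08313830941915512, "Top": 0.4663468301296234}
response = {
    "VideoMetadata": {"FrameHeight": 480, "FrameWidth": 854},
    "TextDetections": [
        {"Timestamp": 967, "TextDetection": {
            "DetectedText": "Twinkle Twinkle Little Star", "Type": "LINE",
            "Id": 0, "Geometry": {"BoundingBox": line_box}}},
        {"Timestamp": 967, "TextDetection": {
            "DetectedText": "Twinkle", "Type": "WORD", "Id": 1, "ParentId": 0}},
        {"Timestamp": 967, "TextDetection": {
            "DetectedText": "Twinkle", "Type": "WORD", "Id": 2, "ParentId": 0}},
        {"Timestamp": 967, "TextDetection": {
            "DetectedText": "Little", "Type": "WORD", "Id": 3, "ParentId": 0}},
        {"Timestamp": 967, "TextDetection": {
            "DetectedText": "Star", "Type": "WORD", "Id": 4, "ParentId": 0}},
    ],
}

grouped = group_words_by_line(response["TextDetections"])
metadata = response["VideoMetadata"]
pixels = box_to_pixels(line_box, metadata["FrameWidth"], metadata["FrameHeight"])
print(grouped)
print(pixels)
```

The grouping key includes `Timestamp` because the same `Id` values can appear again in detections from other points in the video.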