

# Using Nova Embeddings


Amazon Nova Multimodal Embeddings is a state-of-the-art multimodal embeddings model for agentic RAG and semantic search applications. It is the first unified embeddings model that supports text, documents, images, video, and audio through a single model, enabling cross-modal retrieval with leading accuracy. Nova Multimodal Embeddings maps each of these content types into a unified semantic space where developers can perform unimodal, cross-modal, and multimodal vector operations.

The Nova Embeddings API can be leveraged in a variety of applications, such as:
+ Semantic Content Retrieval and Recommendation: Generate embeddings for your content, then use them to find similar items or provide personalized recommendations to your users.
+ Multimodal Search: Combine embeddings from different content types to enable powerful cross-modal search capabilities.
+ RAG: Generate embeddings from multimodal content such as documents with interleaved text and images to power your retrieval workflow for GenAI applications.

# Key Features

+ Support for text, image, document image, video, and audio in a unified semantic space. The maximum input length is 8K tokens for text and 30 seconds for video or audio.
+ Synchronous and asynchronous APIs: The API supports both synchronous and asynchronous use.
+ Large file segmentation: The async API makes it easy to work with large inputs by providing built-in segmentation for long text, video, and audio, controlled by user-defined parameters. The model generates a single embedding for each segment.
+ Video with audio: Process video and its audio simultaneously. The API lets you choose between a single embedding representing both modalities or two separate embeddings, one for the video stream and one for the audio stream.
+ Embedding purpose: Nova Multimodal Embeddings enables you to optimize your embeddings for the intended downstream application. Supported use cases include retrieval (RAG/search), classification, and clustering. The specific values depend on the application (see best practices).
+ Dimension sizes: Four dimension sizes let you trade off embedding accuracy against vector storage cost: 3072, 1024, 384, or 256.
+ Input methods: You can pass content to embed either by specifying an S3 URI or inline as base64-encoded bytes.

# How Nova Multimodal Embeddings works

When a piece of content is passed through Nova Multimodal Embeddings, the model converts it into a universal numerical format referred to as a vector: a list of numbers that captures the content's meaning. Vectors for similar content lie closer together than vectors for dissimilar content, which is what makes them useful for search. For example, content that could be described as "happy" receives a vector closer to that of "joyful" than to that of "sadness".
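
As a minimal sketch of how vector closeness is measured, the following computes cosine similarity, a common metric for comparing embeddings. The short vectors here are hypothetical placeholders, not real model output:

```
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 4-dimensional embeddings, for illustration only.
happy = [0.8, 0.1, 0.3, 0.2]
joyful = [0.7, 0.2, 0.4, 0.1]
sadness = [-0.6, 0.5, -0.2, 0.4]

print(cosine_similarity(happy, joyful))   # Higher score: similar meaning
print(cosine_similarity(happy, sadness))  # Lower score: dissimilar meaning
```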

## Prerequisites


To use Multimodal Embeddings, you need the following:
+ Python installed
+ The AWS CLI installed
+ The AWS CLI configured with access credentials for your AWS account
+ The Nova Multimodal Embeddings model enabled on your AWS Account

With these enabled, you can perform either asynchronous or synchronous embeddings requests.

## Generating embeddings synchronously


For smaller content items, you can use the Bedrock Runtime InvokeModel API. This is a good option for quickly generating embeddings for text, images, or short audio/video files.

The following example generates a synchronous embedding for the text "Hello, World!":

```
import json
import boto3

# Create the Bedrock Runtime client.
bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1",
)

# Define the request body.
request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": 3072,
        "text": {"truncationMode": "END", "value": "Hello, World!"},
    },
}

try:
    # Invoke the Nova Embeddings model.
    response = bedrock_runtime.invoke_model(
        body=json.dumps(request_body),
        modelId="amazon.nova-2-multimodal-embeddings-v1:0",
        accept="application/json",
        contentType="application/json",
    )

    # Print the request ID.
    print("Request ID:", response["ResponseMetadata"]["RequestId"])

    # Parse and print the response body.
    response_body = json.loads(response["body"].read())
    print(json.dumps(response_body, indent=2))

except Exception as e:
    # Add your own exception handling here.
    print(e)
```

The output will look like this:

```
Request ID: fde55db5-c129-423b-c62d-7a8b36cf2859
{
  "embeddings": [
    {
      "embeddingType": "TEXT",
      "embedding": [
        0.031115104,
        0.032478657,
        0.10006265,
        ...
      ]
    }
  ]
}
```
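
The same pattern applies to other modalities. As a sketch, the following embeds a local image by passing it inline as base64. The file name is a placeholder, and the request assumes the SourceObject `bytes` field carries base64-encoded content, as described in the schema reference below:

```
import base64
import json
import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Read a local image and base64-encode it for inline transmission.
with open("photo.png", "rb") as f:  # Placeholder file name
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

request_body = {
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": 3072,
        "image": {
            "detailLevel": "STANDARD_IMAGE",
            "format": "png",
            "source": {"bytes": image_b64},
        },
    },
}

response = bedrock_runtime.invoke_model(
    body=json.dumps(request_body),
    modelId="amazon.nova-2-multimodal-embeddings-v1:0",
    accept="application/json",
    contentType="application/json",
)

# Extract the embedding vector from the response.
vector = json.loads(response["body"].read())["embeddings"][0]["embedding"]
print("Image embedding dimensions:", len(vector))
```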

## Generating embeddings asynchronously


For larger content files, you can use the Bedrock Runtime StartAsyncInvoke function to generate embeddings asynchronously. This allows you to submit a job and retrieve the results later, without blocking application execution. Results are saved to Amazon S3.

The following example starts an asynchronous embedding generation job for a video file:

```
import boto3

# Create the Bedrock Runtime client.
bedrock_runtime = boto3.client(
    service_name="bedrock-runtime",
    region_name="us-east-1",
)

model_input = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": 3072,
        "video": {
            "format": "mp4",
            "embeddingMode": "AUDIO_VIDEO_COMBINED",
            "source": {
                "s3Location": {"uri": "s3://amzn-s3-demo-bucket/path/to/video.mp4"}
            },
            "segmentationConfig": {
                "durationSeconds": 15  # Segment into 15 second chunks
            },
        },
    },
}

try:
    # Start the asynchronous embedding job.
    response = bedrock_runtime.start_async_invoke(
        modelId="amazon.nova-2-multimodal-embeddings-v1:0",
        modelInput=model_input,
        outputDataConfig={
            "s3OutputDataConfig": {
                "s3Uri": "s3://amzn-s3-demo-bucket"
            }
        },
    )

    # Print the request ID.
    print("Request ID:", response["ResponseMetadata"]["RequestId"])

    # Print the invocation ARN.
    print("Invocation ARN:", response["invocationArn"])

except Exception as e:
    # Add your own exception handling here.
    print(e)
```

The output will look like this:

```
Request ID: 07681e80-5ce0-4723-cf52-68bf699cd23e
Invocation ARN: arn:aws:bedrock:us-east-1:111122223333:async-invoke/g7ur3b32a10n
```

After you start the async job, use the invocationArn to check the job status with the GetAsyncInvoke function. To view recent async invocations and their status, use the ListAsyncInvokes function.
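
For example, a minimal polling loop might look like the following. It reuses the `bedrock_runtime` client and `response` from the example above; the 15-second interval is arbitrary:

```
import time

# Poll the job status until it leaves the "InProgress" state.
invocation_arn = response["invocationArn"]
while True:
    job = bedrock_runtime.get_async_invoke(invocationArn=invocation_arn)
    status = job["status"]  # "InProgress" | "Completed" | "Failed"
    if status != "InProgress":
        break
    time.sleep(15)

print("Job finished with status:", status)
```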

When asynchronous embedding generation is complete, artifacts are written to the S3 bucket you specified as the output destination. The files have the following structure:

```
amzn-s3-demo-bucket/
    job-id/
        segmented-embedding-result.json
        embedding-audio.jsonl
        embedding-image.jsonl
        embedding-text.jsonl
        embedding-video.jsonl
        manifest.json
```

# Complete embeddings request and response schema


## Complete synchronous schema




```
{
    "schemaVersion": "nova-multimodal-embed-v1",
    "taskType": "SINGLE_EMBEDDING",
    "singleEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX" | "GENERIC_RETRIEVAL" | "TEXT_RETRIEVAL" | "IMAGE_RETRIEVAL" | "VIDEO_RETRIEVAL" | "DOCUMENT_RETRIEVAL" | "AUDIO_RETRIEVAL" | "CLASSIFICATION" | "CLUSTERING",
        "embeddingDimension": 256 | 384 | 1024 | 3072,
        "text": {
            "truncationMode": "START" | "END" | "NONE",
            "value": string,
            "source": SourceObject,
        },
        "image": {
            "detailLevel": "STANDARD_IMAGE" | "DOCUMENT_IMAGE",
            "format": "png" | "jpeg" | "gif" | "webp",
            "source": SourceObject
        },
        "audio": {
            "format": "mp3" | "wav" | "ogg",
            "source": SourceObject
        },
        "video": {
            "format": "mp4" | "mov" | "mkv" | "webm" | "flv" | "mpeg" | "mpg" | "wmv" | "3gp",
            "source": SourceObject,
            "embeddingMode": "AUDIO_VIDEO_COMBINED" | "AUDIO_VIDEO_SEPARATE"
        }
    }
}
```

The following list includes all of the parameters for the request:
+ `schemaVersion` (Optional) - The schema version for the multimodal embedding model request
  + Type: string
  + Allowed values: "nova-multimodal-embed-v1"
  + Default: "nova-multimodal-embed-v1"
+ `taskType` (Required) - Specifies the type of embedding operation to perform on the input content. `SINGLE_EMBEDDING` generates one embedding per model input; `SEGMENTED_EMBEDDING` first segments the model input per your specification and then generates one embedding per segment.
  + Type: string
  + Allowed values: Must be "SINGLE_EMBEDDING" for synchronous calls.
+ `singleEmbeddingParams` (Required)
  + `embeddingPurpose` (Required) - Nova Multimodal Embeddings enables you to optimize your embeddings depending on the intended application. Examples include MM-RAG, Digital Asset Management for image and video search, similarity comparison for multimodal content, and document classification for Intelligent Document Processing. `embeddingPurpose` enables you to specify the embedding use case. Select the value that matches your use case from the list below.
    + **Search and Retrieval:** Embedding use cases like RAG and search involve two main steps: first, creating an index by generating embeddings for the content, and second, retrieving the most relevant content from the index during search. Use the following values when working with search and retrieval use-cases:
      + Indexing:
        + "GENERIC\$1INDEX" - Creates embeddings optimized for use as indexes in a vector data store. This value should be used irrespective of the modality you are indexing.
      + Search/retrieval: Optimize your embeddings depending on the type of content you are retrieving:
        + "TEXT\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing only text embeddings.
        + "IMAGE\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing only image embeddings created with the "STANDARD\$1IMAGE" detailLevel.
        + "VIDEO\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing only video embeddings or embeddings created with the "AUDIO\$1VIDEO\$1COMBINED" embedding mode.
        + "DOCUMENT\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing only document image embeddings created with the "DOCUMENT\$1IMAGE" detailLevel.
        + "AUDIO\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing only audio embeddings.
        + "GENERIC\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing mixed modality embeddings.
      + Example: In an image search app where users retrieve images using text queries, use `embeddingPurpose = generic_index` when creating an embedding index based on the images and use `embeddingPurpose = image_retrieval` when creating an embedding of the query used to retrieve the images.
    + "CLASSIFICATION" - Creates embeddings optimized for performing classification.
    + "CLUSTERING" - Creates embeddings optimized for clustering.
  + `embeddingDimension` (Optional) - The size of the vector to generate.
    + Type: int
    + Allowed values: 256 | 384 | 1024 | 3072
    + Default: 3072
  + `text` (Optional) - Represents text content. Exactly one of text, image, video, audio must be present.
    + `truncationMode` (Required) - Specifies which part of the text will be truncated in cases where the tokenized version of the text exceeds the maximum supported by the model.
      + Type: string
      + Allowed values:
        + "START" - Omit characters from the start of the text when necessary.
        + "END" - Omit characters from the end of the text when necessary.
        + "NONE" - Fail if text length exceeds the model's maximum token limit.
    + `value` (Optional; Either value or source must be provided) - The text value for which to create the embedding.
      + Type: string
      + Max length: 8192 characters
    + `source` (Optional; Either value or source must be provided) - Reference to a text file stored in S3. Note that the bytes option of the SourceObject is not applicable for text inputs. To pass text inline as part of the request, use the value parameter instead.
      + Type: SourceObject (see "Common Objects" section)
  + `image` (Optional) - Represents image content. Exactly one of text, image, video, audio must be present.
    + `detailLevel` (Optional) - Dictates the resolution at which the image will be processed, with "STANDARD_IMAGE" using a lower image resolution and "DOCUMENT_IMAGE" using a higher resolution to better interpret text.
      + Type: string
      + Allowed values: "STANDARD_IMAGE" | "DOCUMENT_IMAGE"
      + Default: "STANDARD_IMAGE"
    + `format` (Required)
      + Type: string
      + Allowed values: "png" | "jpeg" | "gif" | "webp"
    + `source` (Required) - An image content source.
      + Type: SourceObject (see "Common Objects" section)
  + `audio` (Optional) - Represents audio content. Exactly one of text, image, video, audio must be present.
    + `format` (Required)
      + Type: string
      + Allowed values: "mp3" \$1 "wav" \$1 "ogg"
    + `source` (Required) - An audio content source.
      + Type: SourceObject (see "Common Objects" section)
      + Maximum audio duration: 30 seconds
  + `video` (Optional) - Represents video content. Exactly one of text, image, video, audio must be present.
    + `format` (Required)
      + Type: string
      + Allowed values: "mp4" \$1 "mov" \$1 "mkv" \$1 "webm" \$1 "flv" \$1 "mpeg" \$1 "mpg" \$1 "wmv" \$1 "3gp"
    + `source` (Required) - A video content source.
      + Type: SourceObject (see "Common Objects" section)
      + Maximum video duration: 30 seconds
    + `embeddingMode` (Required)
      + Type: string
      + Values: "AUDIO\$1VIDEO\$1COMBINED" \$1 "AUDIO\$1VIDEO\$1SEPARATE"
        + "AUDIO\$1VIDEO\$1COMBINED" - Will produce a single embedding combining both audible and visual content.
        + "AUDIO\$1VIDEO\$1SEPARATE" - Will produce two embeddings, one for the audible content and one for the visual content.

**InvokeModel Response Body**  
When [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_InvokeModel.html) returns a successful result, the body of the response will have the following structure:

```
{
   "embeddings": [
      {
          "embeddingType": "TEXT" | "IMAGE" | "VIDEO" | "AUDIO" | "AUDIO_VIDEO_COMBINED",
          "embedding": number[],
          "truncatedCharLength": int // Only included if text input was truncated
      }
    ]                       
}
```

The following list includes all of the parameters for the response:
+ `embeddings` (Required) - For most requests, this array contains a single embedding. For video requests where the "AUDIO_VIDEO_SEPARATE" embeddingMode was selected, the array contains two embeddings: one for the video content and one for the audio content.
  + Type: array of embeddings with the following properties
    + `embeddingType` (Required) - Reports the type of embedding that was created.
      + Type: string
      + Allowed values: "TEXT" \$1 "IMAGE" \$1 "VIDEO" \$1 "AUDIO" \$1 "AUDIO\$1VIDEO\$1COMBINED"
    + `embedding` (Required) - The embedding vector.
      + Type: number[]
    + `truncatedCharLength` (Optional) - Only applies to text embedding requests. Returned if the tokenized version of the input text exceeded the model's limitations. The value indicates the character after which the text was truncated before generating the embedding.
      + Type: int

## Complete asynchronous schema


You can generate embeddings asynchronously using the Amazon Bedrock Runtime API functions [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_StartAsyncInvoke.html), [GetAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetAsyncInvoke.html), and [ListAsyncInvokes](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListAsyncInvokes.html). You must use the asynchronous API if you want Nova Embeddings to segment long content, such as long passages of text, or video and audio longer than 30 seconds.

When calling [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_StartAsyncInvoke.html), you must provide `modelId`, `outputDataConfig`, and `modelInput` parameters.

```
response = bedrock_runtime.start_async_invoke(
    modelId="amazon.nova-2-multimodal-embeddings-v1:0",
    outputDataConfig=output_data_config,  # Structure shown below
    modelInput=model_input  # Structure shown below
)
```

`outputDataConfig` specifies the S3 bucket to which you'd like to save the generated output. It has the following structure:

```
{
    "s3OutputDataConfig": {
        "s3Uri": "s3://your-s3-bucket"
    }
}
```

The `s3Uri` is the S3 URI of the destination bucket. For additional optional parameters, see the StartAsyncInvoke documentation.

The following structure is used for the `modelInput` parameter.

```
{
    "schemaVersion": "nova-multimodal-embed-v1",
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX" | "GENERIC_RETRIEVAL" | "TEXT_RETRIEVAL" | "IMAGE_RETRIEVAL" | "VIDEO_RETRIEVAL" | "DOCUMENT_RETRIEVAL" | "AUDIO_RETRIEVAL" | "CLASSIFICATION" | "CLUSTERING",
        "embeddingDimension": 256 | 384 | 1024 | 3072,
        "text": {
            "truncationMode": "START" | "END" | "NONE",
            "value": string,
            "source": {
                "s3Location": {
                    "uri": "s3://Your S3 Object"
                }
            },
            "segmentationConfig": {
                "maxLengthChars": int
            }
        },
        "image": {
            "format": "png" | "jpeg" | "gif" | "webp",
            "source": SourceObject,
            "detailLevel": "STANDARD_IMAGE" | "DOCUMENT_IMAGE"
        },
        "audio": {
            "format": "mp3" | "wav" | "ogg",
            "source": SourceObject,
            "segmentationConfig": {
                "durationSeconds": int
            }
        },
        "video": {
            "format": "mp4" | "mov" | "mkv" | "webm" | "flv" | "mpeg" | "mpg" | "wmv" | "3gp",
            "source": SourceObject,
            "embeddingMode": "AUDIO_VIDEO_COMBINED" | "AUDIO_VIDEO_SEPARATE",
            "segmentationConfig": {
                "durationSeconds": int
            }
        }
    }
}
```

The following list includes all of the parameters for the request:
+ `schemaVersion` (Optional) - The schema version for the multimodal embedding model request
  + Type: string
  + Allowed values: "nova-multimodal-embed-v1"
  + Default: "nova-multimodal-embed-v1"
+ `taskType` (Required) - Specifies the type of embedding operation to perform on the input content. `SINGLE_EMBEDDING` generates one embedding per model input; `SEGMENTED_EMBEDDING` first segments the model input per your specification and then generates one embedding per segment.
  + Type: string
  + Allowed values: Must be "SEGMENTED_EMBEDDING" for asynchronous calls.
+ `segmentedEmbeddingParams` (Required)
  + `embeddingPurpose` (Required) - Nova Multimodal Embeddings enables you to optimize your embeddings depending on the intended application. Examples include MM-RAG, Digital Asset Management for image and video search, similarity comparison for multimodal content, and document classification for Intelligent Document Processing. `embeddingPurpose` enables you to specify the embedding use case. Select the value that matches your use case from the list below.
    + **Search and Retrieval:** Embedding use cases like RAG and search involve two main steps: first, creating an index by generating embeddings for the content, and second, retrieving the most relevant content from the index during search. Use the following values when working with search and retrieval use-cases:
      + Indexing:
        + "GENERIC\$1INDEX" - Creates embeddings optimized for use as indexes in a vector data store. This value should be used irrespective of the modality you are indexing.
      + Search/retrieval: Optimize your embeddings depending on the type of content you are retrieving:
        + "TEXT\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing only text embeddings.
        + "IMAGE\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing only image embeddings created with the "STANDARD\$1IMAGE" detailLevel.
        + "VIDEO\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing only video embeddings or embeddings created with the "AUDIO\$1VIDEO\$1COMBINED" embedding mode.
        + "DOCUMENT\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing only document image embeddings created with the "DOCUMENT\$1IMAGE" detailLevel.
        + "AUDIO\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing only audio embeddings.
        + "GENERIC\$1RETRIEVAL" - Creates embeddings optimized for searching a repository containing mixed modality embeddings.
      + Example: In an image search app where users retrieve images using text queries, use `embeddingPurpose = generic_index` when creating an embedding index based on the images and use `embeddingPurpose = image_retrieval` when creating an embedding of the query used to retrieve the images.
    + "CLASSIFICATION" - Creates embeddings optimized for performing classification.
    + "CLUSTERING" - Creates embeddings optimized for clustering.
  + `embeddingDimension` (Optional) - The size of the vector to generate.
    + Type: int
    + Allowed values: 256 | 384 | 1024 | 3072
    + Default: 3072
  + `text` (Optional) - Represents text content. Exactly one of text, image, video, audio must be present.
    + `truncationMode` (Required) - Specifies which part of the text will be truncated in cases where the tokenized version of the text exceeds the maximum supported by the model.
      + Type: string
      + Allowed values:
        + "START" - Omit characters from the start of the text when necessary.
        + "END" - Omit characters from the end of the text when necessary.
        + "NONE" - Fail if text length exceeds the model's maximum token limit.
    + `value` (Optional; Either value or source must be provided) - The text value for which to create the embedding.
      + Type: string
      + Max length: 8192 characters
    + `source` (Optional; Either value or source must be provided) - Reference to a text file stored in S3. Note that the bytes option of the SourceObject is not applicable for text inputs. To pass text inline as part of the request, use the value parameter instead.
      + Type: SourceObject (see "Common Objects" section)
    + `segmentationConfig` (Required) - Controls how text content should be segmented into multiple embeddings. A sketch using this configuration appears after this parameter list.
      + `maxLengthChars` (Optional) - The maximum length to allow for each segment. The model will attempt to segment only at word boundaries.
        + Type: int
        + Valid range: 800-50,000
        + Default: 32,000
  + `image` (Optional) - Represents image content. Exactly one of text, image, video, audio must be present.
    + `format` (Required)
      + Type: string
      + Allowed values: "png" \$1 "jpeg" \$1 "gif" \$1 "webp"
    + `source` (Required) - An image content source.
      + Type: SourceObject (see "Common Objects" section)
    + `detailLevel` (Optional) - Dictates the resolution at which the image will be processed, with "STANDARD_IMAGE" using a lower image resolution and "DOCUMENT_IMAGE" using a higher resolution to better interpret text.
      + Type: string
      + Allowed values: "STANDARD_IMAGE" | "DOCUMENT_IMAGE"
      + Default: "STANDARD_IMAGE"
  + `audio` (Optional) - Represents audio content. Exactly one of text, image, video, audio must be present.
    + `format` (Required)
      + Type: string
      + Allowed values: "mp3" \$1 "wav" \$1 "ogg"
    + `source` (Required) - An audio content source.
      + Type: SourceObject (see "Common Objects" section)
    + `segmentationConfig` (Required) - Controls how audio content should be segmented into multiple embeddings.
      + `durationSeconds` (Optional) - The maximum duration of audio (in seconds) to use for each segment.
        + Type: int
        + Valid range: 1-30
        + Default: 5
  + `video` (Optional) - Represents video content. Exactly one of text, image, video, audio must be present.
    + `format` (Required)
      + Type: string
      + Allowed values: "mp4" \$1 "mov" \$1 "mkv" \$1 "webm" \$1 "flv" \$1 "mpeg" \$1 "mpg" \$1 "wmv" \$1 "3gp"
    + `source` (Required) - A video content source.
      + Type: SourceObject (see "Common Objects" section)
    + `embeddingMode` (Required)
      + Type: string
      + Values: "AUDIO\$1VIDEO\$1COMBINED" \$1 "AUDIO\$1VIDEO\$1SEPARATE"
        + "AUDIO\$1VIDEO\$1COMBINED" - Will produce a single embedding for each segment combining both audible and visual content.
        + "AUDIO\$1VIDEO\$1SEPARATE" - Will produce two embeddings for each segment, one for the audio content and one for the video content.
    + `segmentationConfig` (Required) - Controls how video content should be segmented into multiple embeddings.
      + `durationSeconds` (Optional) - The maximum duration of video (in seconds) to use for each segment.
        + Type: int
        + Valid range: 1-30
        + Default: 5
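
For comparison with the video example earlier on this page, the following is a sketch of a `modelInput` that segments a long text file stored in S3. The URI and segment length are placeholders:

```
model_input = {
    "taskType": "SEGMENTED_EMBEDDING",
    "segmentedEmbeddingParams": {
        "embeddingPurpose": "GENERIC_INDEX",
        "embeddingDimension": 1024,
        "text": {
            "truncationMode": "END",
            "source": {
                "s3Location": {"uri": "s3://amzn-s3-demo-bucket/path/to/book.txt"}
            },
            "segmentationConfig": {
                "maxLengthChars": 2000  # Each segment holds at most 2,000 characters
            },
        },
    },
}
```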

**StartAsyncInvoke Response**  
The response from a call to [StartAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_StartAsyncInvoke.html) will have the structure below. The `invocationArn` can be used to query the status of the asynchronous job using the [GetAsyncInvoke](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetAsyncInvoke.html) function.

```
{
    "invocationArn": "arn:aws:bedrock:us-east-1:xxxxxxxxxxxx:async-invoke/lvmxrnjf5mo3",
}
```

### Asynchronous Output


When asynchronous embedding generation is complete, output artifacts are written to the S3 bucket you specified as the output destination. The files have the following structure:

```
amzn-s3-demo-bucket/
    job-id/
        segmented-embedding-result.json
        embedding-audio.jsonl
        embedding-image.jsonl
        embedding-text.jsonl
        embedding-video.jsonl
        manifest.json
```

The `segmented-embedding-result.json` file contains the overall job result and references to the corresponding `.jsonl` files that contain the actual embeddings for each modality. Below is a truncated example of the file:

```
{
    "sourceFileUri": string, 
    "embeddingDimension": 256 | 384 | 1024 | 3072,
    "embeddingResults": [
        {
            "embeddingType": "TEXT" | "IMAGE" | "VIDEO" | "AUDIO" | "AUDIO_VIDEO_COMBINED",
            "status": "SUCCESS" | "FAILURE" | "PARTIAL_SUCCESS",
            "failureReason": string, // Granular error codes
            "message": string, // Human-readbale failure message
            "outputFileUri": string // S3 URI to a "embedding-modality.jsonl" file
        }
        ...
    ]
}
```

The `embedding-<modality>.jsonl` files contain the embedding output for each modality. Each line in a `.jsonl` file adheres to the following schema (a parsing sketch follows the parameter list below):

```
{
    "embedding": number[], // The generated embedding vector
    "segmentMetadata": {
        "segmentIndex": number,
        "segmentStartCharPosition": number, // Included for text only
        "segmentEndCharPosition": number, // Included for text only
        "truncatedCharLength": number, // Included only when text gets truncated
        "segmentStartSeconds": number, // Included for audio/video only
        "segmentEndSeconds": number // Included for audio/video only
    },
    "status": "SUCCESS" | "FAILURE",
    "failureReason": string, // Granular error codes
    "message": string // Human-readable failure message
}
```

The following list includes all of the parameters for the response. All text character positions and audio/video time values are zero-based.
+ `embedding` (Required) — The embedding vector.
  + Type: number[]
+ `segmentMetadata` — The metadata for the segment.
  + `segmentIndex` — The index of the segment within the array provided in the request.
  + `segmentStartCharPosition` — For text only. The starting (inclusive) character position of the embedded content within the segment.
  + `segmentEndCharPosition` — For text only. The ending character (exclusive) position of the embedded content within the segment. 
  + `truncatedCharLength` (Optional) — Returned if the tokenized version of the input text exceeded the model’s limitations. The value indicates the character after which the text was truncated before generating the embedding. 
    + Type: integer
  + `segmentStartSeconds` — For audio/video only. The starting time position of the embedded content within the segment.
  + `segmentEndSeconds` — For audio/video only. The ending time position of the embedded content within the segment.
+ `status` — The status for the segment.
+ `failureReason` — The detailed reasons on the failure for the segment.
  + `RAI_VIOLATION_INPUT_TEXT_DEFLECTION` — Input text violates RAI policy.
  + `RAI_VIOLATION_INPUT_IMAGE_DEFLECTION` — Input image violates RAI policy.
  + `INVALID_CONTENT` — Invalid input.
  + `RATE_LIMIT_EXCEEDED` — The embedding request was throttled.
  + `INTERNAL_SERVER_EXCEPTION` — An internal server error occurred.
+ `message` — Related failure message.
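
A minimal sketch for loading these per-segment results after downloading a `.jsonl` file from S3 (the file name is a placeholder; failed segments are skipped):

```
import json

# Parse a downloaded embedding-video.jsonl file: one JSON object per line.
segments = []
with open("embedding-video.jsonl") as f:
    for line in f:
        record = json.loads(line)
        if record["status"] == "SUCCESS":
            index = record["segmentMetadata"]["segmentIndex"]
            segments.append((index, record["embedding"]))

print("Loaded", len(segments), "segment embeddings")
```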

## File limitations for Nova Embeddings


Synchronous operations accept both S3 inputs and inline base64-encoded content. Asynchronous operations accept only S3 inputs.

When generating embeddings asynchronously, ensure that your segmentation settings produce an acceptable number of segments. Text embeddings cannot exceed 1,900 segments; audio and video embeddings cannot exceed 1,434 segments.
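
For example, with the default 5-second segment duration, a 2-hour video produces 7,200 / 5 = 1,440 segments, which exceeds the 1,434-segment limit; raising `durationSeconds` to 6 (7,200 / 6 = 1,200 segments) keeps the job within bounds.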


**Synchronous Input size limits**  

|  File Type  |  Size Limit  | 
| --- | --- | 
|  (Inline) All file types  |  25 MB  | 
|  (S3) Text  |  1 MB; 50,000 characters  | 
|  (S3) Image  |  50 MB  | 
|  (S3) Video  |  30 seconds; 100 MB  | 
|  (S3) Audio  |  30 seconds; 100 MB  | 

**Note**  
The 25 MB restriction for inline files applies **after** Base64 encoding, which inflates file size by about 33%.


**Asynchronous Input size limits**  

|  File Type  |  Size Limit  | 
| --- | --- | 
|  (S3) Text  |  634 MB  | 
|  (S3) Image  |  50 MB  | 
|  (S3) Video  |  2 GB; 2 hours  | 
|  (S3) Audio  |  1 GB; 2 hours  | 


**Input file types**  

|  Modality  |  File types  | 
| --- | --- | 
|  Image Formats  |  PNG, JPEG, WEBP, GIF  | 
|  Audio Formats  |  MP3, WAV, OGG  | 
|  Video Formats  |  MP4, MOV, MKV, WEBM, FLV, MPEG, MPG, WMV, 3GP  | 