Monitor bedrock-runtime inference using CloudWatch metrics

The Amazon Bedrock bedrock-runtime.region.amazonaws.com endpoint publishes metrics to Amazon CloudWatch under the AWS/Bedrock namespace. Use these metrics to monitor invocation volume, latency, token consumption, error rates, and model invocation logging delivery.

If your application calls inference through bedrock-mantle.region.api.aws, see Monitor bedrock-mantle inference using CloudWatch metrics instead.

Amazon Bedrock runtime metrics

The following table describes runtime metrics provided by Amazon Bedrock.

Metric name Unit Description
Invocations

SampleCount

Number of successful requests to the Converse, ConverseStream, InvokeModel, and InvokeModelWithResponseStream API operations.

InvocationLatency

MilliSeconds

The time from when a request is sent to when the last token is received.

To distinguish latency increases caused by service-side throughput changes from increases caused by longer model responses, see Diagnose InvocationLatency increases using output tokens per second (OTPS).

InvocationClientErrors

SampleCount

Number of invocations that result in client-side errors.

InvocationServerErrors

SampleCount

Number of invocations that result in AWS server-side errors.

InvocationThrottles

SampleCount

Number of invocations that the system throttled. Throttled requests and other invocation errors don't count as either Invocations or Errors. The number of throttles you see depends on your SDK retry settings. For more information, see Retry behavior in the AWS SDKs and Tools Reference Guide.

InputTokenCount

SampleCount

Number of tokens in the input.

LegacyModelInvocations

SampleCount

Number of invocations using Legacy models.

OutputTokenCount

SampleCount

Number of tokens in the output.

OutputImageCount

SampleCount

Number of images in the output (only applicable for image generation models).

TimeToFirstToken

MilliSeconds

Time from when a request is sent to when the first token is received, for the ConverseStream and InvokeModelWithResponseStream streaming API operations.

EstimatedTPMQuotaUsage

SampleCount

Estimated Tokens Per Minute (TPM) quota consumption across the Converse, ConverseStream, InvokeModel, and InvokeModelWithResponseStream API operations. This metric is an approximation and does not reflect the reservation-based token consumption that drives throttling decisions. Throttling is based on the upfront reservation of input tokens plus max_tokens (see How tokens are counted in Amazon Bedrock), which may differ from this estimate. Do not use this metric as the sole indicator for quota use or capacity planning.

CacheReadInputTokens

SampleCount

Number of input tokens read from the prompt cache. These tokens are charged at a reduced rate and don't count toward your TPM quota.

CacheWriteInputTokens

SampleCount

Number of input tokens written to the prompt cache. These tokens count toward your TPM quota.
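The count metrics in the table above can be combined into derived rates. The following is a minimal sketch, not an official formula. It assumes that Invocations, InvocationClientErrors, InvocationServerErrors, and InvocationThrottles are disjoint, so their sum approximates the number of attempted requests. The output-tokens-per-second estimate is a simple approximation and may differ from the method described in the linked OTPS topic. The function names are hypothetical.

```python
def invocation_rates(invocations, client_errors, server_errors, throttles):
    """Return each outcome's share of all attempted requests in a period.

    Inputs are Sum statistics for the same period and ModelId.
    Assumption: the four metrics do not overlap, so attempts is their sum.
    """
    attempts = invocations + client_errors + server_errors + throttles
    if attempts == 0:
        # No traffic in the period, so every share is zero.
        return {"success": 0.0, "client_error": 0.0,
                "server_error": 0.0, "throttle": 0.0}
    return {
        "success": invocations / attempts,
        "client_error": client_errors / attempts,
        "server_error": server_errors / attempts,
        "throttle": throttles / attempts,
    }


def output_tokens_per_second(output_token_sum, latency_ms_sum):
    """Rough OTPS: summed OutputTokenCount over summed InvocationLatency.

    Returns None when there is no latency data to divide by.
    """
    if latency_ms_sum <= 0:
        return None
    return output_token_sum / (latency_ms_sum / 1000.0)


if __name__ == "__main__":
    print(invocation_rates(90, 4, 1, 5))
    print(output_tokens_per_second(5000, 20000))
```

If InvocationLatency rises while this estimate stays roughly flat, longer responses are the more likely cause. A falling estimate points toward a throughput change.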

There are also metrics for Amazon Bedrock Guardrails and Amazon Bedrock Agents.
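The EstimatedTPMQuotaUsage entry above notes that throttling is based on an upfront reservation of input tokens plus max_tokens. The sketch below illustrates only that one sentence. It ignores prompt caching and any model-specific rules, the quota value is hypothetical, and the linked topic on how tokens are counted remains the authoritative description.

```python
def reserved_tokens(input_tokens, max_tokens):
    """Tokens reserved up front for one request: input tokens plus max_tokens."""
    return input_tokens + max_tokens


def max_requests_per_minute(tpm_quota, input_tokens, max_tokens):
    """Upper bound on identical requests per minute under a TPM quota.

    Assumption: every request reserves input_tokens + max_tokens and
    nothing else counts against the quota.
    """
    per_request = reserved_tokens(input_tokens, max_tokens)
    if per_request <= 0:
        raise ValueError("a request must reserve at least one token")
    return tpm_quota // per_request


if __name__ == "__main__":
    quota = 200_000  # hypothetical TPM quota, not a real service limit
    print(max_requests_per_minute(quota, 1500, 4096))  # 35
    print(max_requests_per_minute(quota, 1500, 500))   # 100
```

Under these assumptions, lowering max_tokens raises the number of requests that fit in the quota even when actual output length is unchanged, which is one reason the estimate metric and observed throttling can diverge.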

Model invocation logging CloudWatch metrics

For each log delivery attempt, whether it succeeds or fails, Amazon Bedrock emits the following Amazon CloudWatch metrics under the AWS/Bedrock namespace and the Across all model IDs dimension:

  • ModelInvocationLogsCloudWatchDeliverySuccess

  • ModelInvocationLogsCloudWatchDeliveryFailure

  • ModelInvocationLogsS3DeliverySuccess

  • ModelInvocationLogsS3DeliveryFailure

  • ModelInvocationLargeDataS3DeliverySuccess

  • ModelInvocationLargeDataS3DeliveryFailure
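Because the metrics listed above come in Success and Failure pairs, a failure ratio per destination is straightforward to derive once you have the Sum of each metric for a period. The following is a minimal sketch. The destination labels are an interpretation of the metric names, and the function name and input shape are hypothetical.

```python
# Metric name prefixes from the list above, keyed by an illustrative label.
DESTINATIONS = {
    "CloudWatch Logs": "ModelInvocationLogsCloudWatchDelivery",
    "S3": "ModelInvocationLogsS3Delivery",
    "S3 (large data)": "ModelInvocationLargeDataS3Delivery",
}


def delivery_failure_ratios(sums):
    """Map each destination to failures / (successes + failures).

    `sums` maps a metric name to its Sum over the period. A missing
    metric is treated as zero. The ratio is None when no delivery
    attempts were recorded for that destination.
    """
    ratios = {}
    for label, prefix in DESTINATIONS.items():
        succeeded = sums.get(prefix + "Success", 0)
        failed = sums.get(prefix + "Failure", 0)
        total = succeeded + failed
        ratios[label] = failed / total if total else None
    return ratios


if __name__ == "__main__":
    example = {
        "ModelInvocationLogsS3DeliverySuccess": 98,
        "ModelInvocationLogsS3DeliveryFailure": 2,
    }
    print(delivery_failure_ratios(example))
```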

To retrieve metrics for your Amazon Bedrock operations, you specify the following information:

  • The metric dimension. A dimension is a set of name-value pairs that you use to identify a metric. Amazon Bedrock supports the following dimensions:

    • ModelId – all metrics

    • ModelId + ImageSize + BucketedStepSize – OutputImageCount

  • The metric name, such as InvocationClientErrors.

You can get metrics for Amazon Bedrock with the AWS Management Console, the AWS CLI, or the CloudWatch API. You can use the CloudWatch API through one of the AWS Software Development Kits (SDKs) or the CloudWatch API tools.
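As an illustration of the SDK route, the sketch below uses the AWS SDK for Python (Boto3) and the CloudWatch GetMetricData operation to request one metric by name and ModelId dimension. It assumes Boto3 is installed and credentials are configured. The model ID is a placeholder, the helper names are hypothetical, and result pagination is not handled.

```python
from datetime import datetime, timedelta, timezone


def build_query(metric_name, model_id, stat="Sum", period=300):
    """Build one GetMetricData query for an AWS/Bedrock metric."""
    return {
        "Id": "m1",  # query IDs must start with a lowercase letter
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Bedrock",
                "MetricName": metric_name,
                "Dimensions": [{"Name": "ModelId", "Value": model_id}],
            },
            "Period": period,  # seconds per data point
            "Stat": stat,
        },
        "ReturnData": True,
    }


def fetch_metric(metric_name, model_id, hours=1, region_name=None):
    """Return (timestamp, value) pairs for the last `hours` hours."""
    import boto3  # imported here so build_query works without the SDK

    client = boto3.client("cloudwatch", region_name=region_name)
    end = datetime.now(timezone.utc)
    response = client.get_metric_data(
        MetricDataQueries=[build_query(metric_name, model_id)],
        StartTime=end - timedelta(hours=hours),
        EndTime=end,
    )
    result = response["MetricDataResults"][0]
    return list(zip(result["Timestamps"], result["Values"]))


if __name__ == "__main__":
    # "example-model-id" is a placeholder; substitute a model ID you invoke.
    for timestamp, value in fetch_metric("InvocationClientErrors", "example-model-id"):
        print(timestamp.isoformat(), value)
```

For latency metrics such as InvocationLatency, a statistic like Average or a percentile is usually more informative than Sum.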

To view Amazon Bedrock metrics in the CloudWatch console, choose Metrics in the navigation pane, choose All metrics, and then search for the model ID.

You must have the appropriate CloudWatch permissions to monitor Amazon Bedrock with CloudWatch. For more information, see Authentication and Access Control for Amazon CloudWatch in the Amazon CloudWatch User Guide.