Monitor bedrock-mantle inference using CloudWatch metrics
The Amazon Bedrock OpenAI-compatible endpoint
(bedrock-mantle.region.api.aws) publishes customer-facing
metrics to Amazon CloudWatch under the AWS/BedrockMantle namespace. Use these metrics to
monitor inference volume, token consumption, and error rates across your AWS account, projects,
and models.
If your application calls inference through
bedrock-runtime.region.amazonaws.com using Converse,
ConverseStream, InvokeModel, or InvokeModelWithResponseStream, see Amazon Bedrock runtime metrics instead.
bedrock-mantle metrics cover only inference issued through the Responses API, Chat
Completions API, and Anthropic Messages API on the bedrock-mantle endpoint.
bedrock-mantle publishes metrics at four levels of granularity. Each level uses a
different combination of CloudWatch dimensions, described in Dimensions.
All metrics are scoped to your AWS account.
Inference metrics
| Metric name | Unit | Description |
|---|---|---|
| Inferences | Count | Total number of completed inference requests across the Responses, Chat Completions, and Messages APIs. Published at Account, Project, Model, and Project+Model levels. |
| InferenceClientErrors | Count | Number of inference requests that failed with a client-side (4xx) error. Published at Account, Project, Model, and Project+Model levels. |
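As an illustration of how the two inference metrics can be combined, the following minimal sketch builds a CloudWatch metric math query for a client error percentage. The helper name `error_rate_queries` and the query IDs are hypothetical. The sketch also assumes that `Inferences` is the right denominator; the table above does not state whether failed requests are counted in `Inferences`, so verify this against your own data before relying on the ratio.

```python
NAMESPACE = "AWS/BedrockMantle"


def error_rate_queries(dimensions=None, period=300):
    """Build MetricDataQueries for a client error percentage.

    `dimensions` is a list of {"Name": ..., "Value": ...} dicts; an empty
    list selects the Account level, which has no additional dimensions.
    """
    dims = list(dimensions or [])

    def metric_query(query_id, metric_name):
        # Hidden input series: summed per period, not returned directly.
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": NAMESPACE,
                    "MetricName": metric_name,
                    "Dimensions": dims,
                },
                "Period": period,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return [
        metric_query("inferences", "Inferences"),
        metric_query("client_errors", "InferenceClientErrors"),
        {
            # Assumption: Inferences is the total request count.
            "Id": "client_error_pct",
            "Expression": "100 * client_errors / inferences",
            "Label": "Client error rate (%)",
            "ReturnData": True,
        },
    ]
```

The returned list can be passed as the `MetricDataQueries` argument of the boto3 CloudWatch client's `get_metric_data` call, together with `StartTime` and `EndTime`.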
Token metrics
| Metric name | Unit | Description |
|---|---|---|
| TotalInputTokens | Count | Aggregate billable input tokens processed within the publish interval. Published at Account, Project, and Model levels. Use for sums and rate calculations. |
| TotalOutputTokens | Count | Aggregate billable output tokens generated within the publish interval. Published at Account, Project, and Model levels. |
| InputTokens | Count | Per-inference billable input tokens. Each completed inference emits one datum. Published at the Project+Model level only. Use this when you need percentile statistics (p50/p90/p99) of per-request token counts. |
| OutputTokens | Count | Per-inference billable output tokens. Each completed inference emits one datum. Published at the Project+Model level only. |
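The percentile use case for the per-inference token metrics could be queried roughly as follows. This is a sketch, not a reference implementation: the helper name `token_percentile_request` and its defaults (a one-hour window, 5-minute periods) are illustrative choices, and the project ID in the usage note is a placeholder. The function only assembles keyword arguments for the CloudWatch `GetMetricStatistics` API.

```python
from datetime import datetime, timedelta, timezone

PER_INFERENCE_METRICS = ("InputTokens", "OutputTokens")


def token_percentile_request(project, model, metric_name="InputTokens",
                             hours=1, period=300):
    """Build GetMetricStatistics arguments for p50/p90/p99 token counts."""
    if metric_name not in PER_INFERENCE_METRICS:
        # Only the per-inference metrics emit one datum per request,
        # so only they are meaningful inputs for percentiles.
        raise ValueError(f"{metric_name} is not a per-inference token metric")
    end = datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/BedrockMantle",
        "MetricName": metric_name,
        # Per-inference token metrics exist at the Project+Model level only,
        # so both dimensions are required.
        "Dimensions": [
            {"Name": "Project", "Value": project},
            {"Name": "Model", "Value": model},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,
        "ExtendedStatistics": ["p50", "p90", "p99"],
    }
```

With boto3 installed, the result can be unpacked into a call such as `boto3.client("cloudwatch").get_metric_statistics(**token_percentile_request("my-project", "anthropic.claude-opus-4-7"))`, where `my-project` stands in for your own project ID.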
Dimensions
| Dimension name | Values | Applies to |
|---|---|---|
| Project | The project ID associated with the inference request. | Project, Project+Model |
| Model | The model identifier (for example, anthropic.claude-opus-4-7). | Model, Project+Model |
Account-level metrics are scoped to your AWS account and have no additional dimensions. Each level of granularity emits independently, so a single inference contributes to all four levels when both the project and the model can be resolved.
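The relationship between the four granularity levels, the two dimensions, and the metrics published at each level can be summarized in a short sketch. The function names below are hypothetical helpers; the `PUBLISHED_LEVELS` mapping is transcribed from the metric tables above.

```python
ALL_LEVELS = {"Account", "Project", "Model", "Project+Model"}

# Which granularity levels each metric is published at, per the tables above.
PUBLISHED_LEVELS = {
    "Inferences": ALL_LEVELS,
    "InferenceClientErrors": ALL_LEVELS,
    "TotalInputTokens": {"Account", "Project", "Model"},
    "TotalOutputTokens": {"Account", "Project", "Model"},
    "InputTokens": {"Project+Model"},
    "OutputTokens": {"Project+Model"},
}


def mantle_dimensions(project=None, model=None):
    """Return the CloudWatch dimension list for the requested level."""
    dims = []
    if project is not None:
        dims.append({"Name": "Project", "Value": project})
    if model is not None:
        dims.append({"Name": "Model", "Value": model})
    return dims  # An empty list selects the Account level.


def granularity_level(project=None, model=None):
    """Name the level implied by which dimensions are supplied."""
    if project is not None and model is not None:
        return "Project+Model"
    if project is not None:
        return "Project"
    if model is not None:
        return "Model"
    return "Account"


def is_published(metric_name, project=None, model=None):
    """Check whether a metric exists at the level implied by the arguments."""
    return granularity_level(project, model) in PUBLISHED_LEVELS[metric_name]
```

For example, `is_published("InputTokens", project="my-project")` returns `False`, because the per-inference token metrics need both dimensions, whereas `is_published("Inferences")` returns `True` at the Account level.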
Choosing the right granularity level
- Account level – overall usage, error rate, and aggregate token volume. Suitable for high-level dashboards and account-wide alarms. Not suitable for cost analysis, because pricing varies by model.
- Project level – per-project rollups for chargeback and team-level dashboards.
- Model level – per-model usage and error rates, suitable for migrating dashboards built against the existing bedrock-runtime ModelId dimension.
- Project+Model level – primary level for cost analysis, and percentile latency and token analysis. Use this when you need both project attribution and model attribution on the same datum.
Differences from bedrock-runtime metrics
- Separate namespace. bedrock-mantle metrics publish to AWS/BedrockMantle. Existing dashboards and alarms built against AWS/Bedrock will not pick up bedrock-mantle traffic.
- Naming. bedrock-mantle uses Inferences rather than Invocations, TotalInputTokens and TotalOutputTokens rather than InputTokenCount and OutputTokenCount, and InferenceClientErrors rather than InvocationClientErrors.
- Project dimension. bedrock-mantle metrics carry a Project dimension that bedrock-runtime metrics do not, enabling per-project cost attribution.
- Cross-Region inference. bedrock-mantle is in-region only. Metrics are emitted in the Region that handled the request and are not aggregated across Regions the way cross-Region inference (CRIS) traffic is on bedrock-runtime.
- Latency metrics. InvocationLatency and TimeToFirstToken equivalents are not yet published by bedrock-mantle.
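When migrating dashboard or alarm definitions, the naming differences listed above can be captured in a small lookup. The following is a sketch with a hypothetical helper, `to_mantle_metric`. It assumes that a bedrock-runtime ModelId dimension can be renamed to Model with its value unchanged, which this page does not confirm; check the model identifiers that actually appear in your AWS/BedrockMantle metrics.

```python
MANTLE_NAMESPACE = "AWS/BedrockMantle"

# bedrock-runtime metric name -> bedrock-mantle metric name.
METRIC_RENAMES = {
    "Invocations": "Inferences",
    "InputTokenCount": "TotalInputTokens",
    "OutputTokenCount": "TotalOutputTokens",
    "InvocationClientErrors": "InferenceClientErrors",
}


def to_mantle_metric(metric_name, dimensions=None):
    """Translate a bedrock-runtime metric reference to bedrock-mantle.

    Returns (namespace, metric_name, dimensions), or None when no
    equivalent is listed (for example, InvocationLatency).
    """
    if metric_name not in METRIC_RENAMES:
        return None
    renamed = [
        {
            # Assumption: ModelId maps to Model and the value carries over.
            "Name": "Model" if d["Name"] == "ModelId" else d["Name"],
            "Value": d["Value"],
        }
        for d in (dimensions or [])
    ]
    return MANTLE_NAMESPACE, METRIC_RENAMES[metric_name], renamed
```

A `None` result signals a widget or alarm that has no bedrock-mantle counterpart yet and needs to be dropped or handled separately.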
Viewing bedrock-mantle metrics
To view bedrock-mantle metrics in the CloudWatch console:
- Open the CloudWatch console.
- In the navigation pane, choose Metrics, All metrics.
- Choose the AWS/BedrockMantle namespace.
- Select a granularity level by choosing the dimension set that matches your query (for example, Project, Model for project-and-model breakdowns).
You must have appropriate CloudWatch permissions to read bedrock-mantle metrics. For
more information, see Authentication
and access control for Amazon CloudWatch in the Amazon CloudWatch User Guide.
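The same browsing can be approximated programmatically. The sketch below assumes boto3 is installed and that the caller has permission to call `ListMetrics`; the function names are hypothetical. It lists the metrics in the namespace and groups the metric names by the granularity level implied by each metric's dimension set.

```python
from collections import defaultdict

LEVEL_BY_DIMENSIONS = {
    frozenset(): "Account",
    frozenset({"Project"}): "Project",
    frozenset({"Model"}): "Model",
    frozenset({"Project", "Model"}): "Project+Model",
}


def level_of(metric):
    """Map one ListMetrics entry to its granularity level."""
    names = frozenset(d["Name"] for d in metric.get("Dimensions", []))
    return LEVEL_BY_DIMENSIONS.get(names, "Other")


def group_by_level(metrics):
    """Group metric names by granularity level."""
    grouped = defaultdict(set)
    for metric in metrics:
        grouped[level_of(metric)].add(metric["MetricName"])
    return {level: sorted(names) for level, names in grouped.items()}


def list_mantle_metrics(region_name):
    """Fetch all metrics in the AWS/BedrockMantle namespace for one Region."""
    import boto3  # Imported lazily so the helpers above work without it.

    client = boto3.client("cloudwatch", region_name=region_name)
    paginator = client.get_paginator("list_metrics")
    metrics = []
    for page in paginator.paginate(Namespace="AWS/BedrockMantle"):
        metrics.extend(page["Metrics"])
    return metrics
```

Calling `group_by_level(list_mantle_metrics("us-east-1"))` would then return a dictionary keyed by level, where the Region name is only an example.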