

# Capacity, Limits, and Cost Optimization


Amazon Bedrock offers flexible capacity options to match your workload requirements and budget. Understanding the differences between the on-demand tiers (Flex, Standard, Priority), the Reserved tier, batch processing, and cross-region inference helps you optimize both performance and cost.

# Service tiers for optimizing performance and cost
Reserved, Standard, Priority, and Flex tiers

Amazon Bedrock offers four service tiers for model inference: Reserved, Priority, Standard, and Flex. With service tiers, you can optimize for availability, cost, and performance.

## Reserved Tier


The Reserved tier lets you reserve prioritized compute capacity for mission-critical applications that cannot tolerate downtime. You have the flexibility to allocate different input and output tokens-per-minute capacities to match the exact requirements of your workload and control cost. When your application needs more tokens-per-minute capacity than you reserved, the service automatically overflows to the Standard tier, ensuring uninterrupted operations. The Reserved tier targets 99.5% uptime for model responses. Customers can reserve capacity for a 1-month or 3-month duration, pay a fixed price per 1K tokens-per-minute, and are billed monthly.

To get access to the Reserved tier, please contact your AWS account team.

**Note**  
Billing continues until you delete the Reserved Tier reservation with the help of your AWS account manager.

## Priority Tier


The Priority tier delivers the fastest response times at a price premium over standard on-demand pricing. It is best suited for mission-critical, customer-facing business workflows that do not warrant a 24/7 capacity reservation. The Priority tier does not require a prior reservation. Set the optional "service_tier" parameter to "priority" to get request-level prioritization. Priority tier requests are prioritized over Standard and Flex tier requests.

## Standard Tier


The Standard tier provides consistent performance for everyday AI tasks such as content generation, text analysis, and routine document processing. By default, all inference requests are routed to the Standard tier when the "service_tier" parameter is omitted. You can also set the optional "service_tier" parameter to "default" to have your inference request served by the Standard tier.

## Flex Tier


For workloads that can tolerate longer processing times, the Flex tier offers cost-effective processing at a pricing discount. This helps you optimize cost for workloads such as model evaluations, content summarization, and agentic workflows. Set the optional "service_tier" parameter to "flex" to have your inference request served by the Flex tier and receive the pricing discount.

## Using the service tier capability


To use the service tier capability, set the optional "service_tier" parameter to "reserved", "priority", "default", or "flex" when calling the Amazon Bedrock runtime API.

```
"service_tier" : "reserved | priority | default | flex"
```
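
For example, the following minimal sketch (Python with boto3) sets the parameter on an `InvokeModel` call. It assumes the field is passed in the request payload exactly as in the fragment above, and it uses a Claude Haiku 4.5 inference profile ID from the tables below as a placeholder; check the runtime API reference for the exact placement for the model and API operation that you use.

```
import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

# Anthropic Messages API request body with the optional service tier field added
# as shown in the fragment above (assumption: the payload accepts this field).
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 256,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Summarize this release note: ..."}]}
    ],
    "service_tier": "flex",  # "reserved" | "priority" | "default" | "flex"
}

response = bedrock_runtime.invoke_model(
    modelId="us.anthropic.claude-haiku-4-5-20251001-v1:0",  # placeholder model ID
    body=json.dumps(body),
)
print(json.loads(response["body"].read()))
```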

Your on-demand quota for a model is shared across the "priority", "default", and "flex" service tiers. Your "reserved" tier capacity reservation is separate from your on-demand quota. The service tier configuration for a served request is visible in the API response and in AWS CloudTrail events. You can also view service tier metrics in Amazon CloudWatch under the ModelId, ServiceTier, and ResolvedServiceTier dimensions, where ResolvedServiceTier shows the actual tier that served your requests.
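
To see which tier actually served your traffic, you can query those CloudWatch dimensions. The following is a rough sketch that assumes the `AWS/Bedrock` namespace and the `Invocations` metric; the dimension values are illustrative.

```
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

# Sum the invocations that resolved to the Flex tier over the last 24 hours.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="Invocations",
    Dimensions=[
        {"Name": "ModelId", "Value": "us.anthropic.claude-haiku-4-5-20251001-v1:0"},
        {"Name": "ResolvedServiceTier", "Value": "flex"},
    ],
    StartTime=datetime.now(timezone.utc) - timedelta(days=1),
    EndTime=datetime.now(timezone.utc),
    Period=3600,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```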

For more information about pricing, visit the [pricing page](https://aws.amazon.com/bedrock/pricing/).

Models and regions supported by the Reserved service tier:


| Provider | Model | Model IDs | Regions |
| --- | --- | --- | --- |
| Anthropic | Claude Sonnet 4.6 | global.anthropic.claude-sonnet-4-6 us.anthropic.claude-sonnet-4-6 eu.anthropic.claude-sonnet-4-6 | ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-southeast-1 ap-southeast-2 ap-south-1 ap-southeast-3 ap-south-2 ap-southeast-4 ca-central-1 eu-west-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2 me-south-1 ap-southeast-7 af-south-1 me-central-1 ap-southeast-5 mx-central-1 il-central-1 ap-east-2 ca-west-1 |
| Anthropic | Claude Opus 4.6 | global.anthropic.claude-opus-4-6-v1 us.anthropic.claude-opus-4-6-v1 eu.anthropic.claude-opus-4-6-v1 | af-south-1 ap-east-2 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 ca-central-1 ca-west-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 il-central-1 me-central-1 me-south-1 mx-central-1 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2 |
| Anthropic | Claude Sonnet 4.5 | global.anthropic.claude-sonnet-4-5-20250929-v1:0 us.anthropic.claude-sonnet-4-5-20250929-v1:0 eu.anthropic.claude-sonnet-4-5-20250929-v1:0 us-gov.anthropic.claude-sonnet-4-5-20250929-v1:0 | ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-southeast-1 ap-southeast-2 ap-south-1 ap-southeast-3 ap-south-2 ap-southeast-4 ca-central-1 eu-west-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2 us-gov-west-1 |
| Anthropic | Claude Opus 4.5 | global.anthropic.claude-opus-4-5-20251101-v1:0 us.anthropic.claude-opus-4-5-20251101-v1:0 eu.anthropic.claude-opus-4-5-20251101-v1:0 | ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-southeast-1 ap-southeast-2 ap-south-1 ap-southeast-3 ap-south-2 ap-southeast-4 ca-central-1 eu-west-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2 |
| Anthropic | Claude Haiku 4.5 | global.anthropic.claude-haiku-4-5-20251001-v1:0 us.anthropic.claude-haiku-4-5-20251001-v1:0 eu.anthropic.claude-haiku-4-5-20251001-v1:0 | ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-southeast-1 ap-southeast-2 ap-south-1 ap-southeast-3 ap-south-2 ap-southeast-4 ca-central-1 eu-west-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2 |

**Note**  
The 1M-token context length for Claude Sonnet 4.5 is not supported by the Reserved tier.

Models and regions supported by Priority and Flex service tiers:


| Provider | Model | Model ID | Regions |
| --- | --- | --- | --- |
| OpenAI | gpt-oss-120b | openai.gpt-oss-120b-1:0 | us-east-1 us-east-2 us-west-2 ap-northeast-1 ap-south-1 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 |
| OpenAI | gpt-oss-20b | openai.gpt-oss-20b-1:0 | us-east-1 us-east-2 us-west-2 ap-northeast-1 ap-south-1 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 |
| OpenAI | GPT OSS Safeguard 20B | openai.gpt-oss-safeguard-20b | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| OpenAI | GPT OSS Safeguard 120B | openai.gpt-oss-safeguard-120b | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Qwen | Qwen3 235B A22B 2507 | qwen.qwen3-235b-a22b-2507-v1:0 | us-east-2 us-west-2 ap-northeast-1 ap-south-1 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-2 |
| Qwen | Qwen3 Coder 480B A35B Instruct | qwen.qwen3-coder-480b-a35b-v1:0 | us-east-2 us-west-2 ap-northeast-1 ap-south-1 ap-southeast-3 eu-north-1 eu-west-2 |
| Qwen | Qwen3-Coder-30B-A3B-Instruct | qwen.qwen3-coder-30b-a3b-v1:0 | us-east-1 us-east-2 us-west-2 ap-northeast-1 ap-south-1 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 |
| Qwen | Qwen3 32B (dense) | qwen.qwen3-32b-v1:0 | us-east-1 us-east-2 us-west-2 ap-northeast-1 ap-south-1 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 |
| Qwen | Qwen3 Next 80B A3B | qwen.qwen3-next-80b-a3b | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Qwen | Qwen3 VL 235B A22B | qwen.qwen3-vl-235b-a22b | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| DeepSeek | DeepSeek-V3.1 | deepseek.v3-v1:0 | us-east-2 us-west-2 ap-northeast-1 ap-south-1 ap-southeast-3 eu-north-1 eu-west-2 |
| Amazon | Nova Premier | amazon.nova-premier-v1:0 | us-east-1\* us-east-2\* us-west-2\* |
| Amazon | Nova Pro | amazon.nova-pro-v1:0 | us-east-1 us-east-2\* us-west-1\* us-west-2\* ap-east-2\* ap-northeast-1\* ap-northeast-2\* ap-south-1\* ap-southeast-1\* ap-southeast-2 ap-southeast-3 ap-southeast-4\* ap-southeast-5\* ap-southeast-7\* eu-central-1\* eu-north-1\* eu-south-1\* eu-south-2\* eu-west-1\* eu-west-2 eu-west-3\* il-central-1\* me-central-1 |
| Amazon | Nova 2 Lite | amazon.nova-2-lite-v1:0 | ap-east-2 ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 ca-central-1 ca-west-1 eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 il-central-1 me-central-1 us-east-1 us-east-2 us-west-1 us-west-2 |
| Amazon | Nova 2 Pro Preview | amazon.nova-2-pro-preview-20251202-v1:0 | ap-east-2 ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 ca-central-1 ca-west-1 eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 il-central-1 me-central-1 us-east-1 us-east-2 us-west-1 us-west-2 |
| Amazon | Nova Lite 2 Omni | amazon.nova-2-lite-omni-v1 | ap-east-2 ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 ca-central-1 ca-west-1 eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 il-central-1 me-central-1 us-east-1 us-east-2 us-west-1 us-west-2 |
| Google | Gemma 3 4B | google.gemma-3-4b-it | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Google | Gemma 3 12B | google.gemma-3-12b-it | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Google | Gemma 3 27B | google.gemma-3-27b-it | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Minimax AI | Minimax M2 | minimax.minimax-m2 | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Mistral | Magistral Small 1.2 | mistral.magistral-small-2509 | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Mistral | Voxtral Mini 1.0 | mistral.voxtral-mini-3b-2507 | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Mistral | Voxtral Small 1.0 | mistral.voxtral-small-24b-2507 | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Mistral | Ministral 3B 3.0 | mistral.ministral-3-3b-instruct | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Mistral | Ministral 8B 3.0 | mistral.ministral-3-8b-instruct | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Mistral | Ministral 14B 3.0 | mistral.ministral-3-14b-instruct | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Mistral | Mistral Large 3 | mistral.mistral-large-3-675b-instruct | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Kimi AI | Kimi K2 Thinking | moonshot.kimi-k2-thinking | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Nvidia | NVIDIA Nemotron Nano 2 | nvidia.nemotron-nano-9b-v2 | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |
| Nvidia | NVIDIA Nemotron Nano 2 VL | nvidia.nemotron-nano-12b-v2 | ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-2 |

\* Model inference may be served using multiple regions.

To control access to service tiers, see [Control access to service tiers](security_iam_id-based-policy-examples-agent.md#security_iam_id-based-policy-examples-service-tiers).

## Capacity Options



| Capacity Type | Use Case | Key Characteristics | 
| --- | --- | --- | 
| On-Demand: Flex | Sporadic, low-volume workloads |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/capacity-limits-cost-optimization.html)  | 
| On-Demand: Standard | Regular production workloads |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/capacity-limits-cost-optimization.html)  | 
| On-Demand: Priority | High-priority, latency-sensitive apps |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/capacity-limits-cost-optimization.html)  | 
| Reserved Tier | Consistent, high-volume workloads |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/capacity-limits-cost-optimization.html)  | 
| Batch | Large-scale, non-time-sensitive processing |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/capacity-limits-cost-optimization.html)  | 
| Cross-Region Inference | High availability, traffic bursting |  [\[See the AWS documentation website for more details\]](http://docs.aws.amazon.com/bedrock/latest/userguide/capacity-limits-cost-optimization.html)  | 

## Limits & Quotas


### On-Demand Limits (by tier)



| Tier | RPM Range | TPM Range | Throttling Risk | 
| --- | --- | --- | --- | 
| Flex | 10-100 | 5K-50K | High | 
| Standard | 100-500 | 50K-150K | Medium | 
| Priority | 500-1000 | 150K-300K | Low | 
+ Burst capacity: Available across all tiers for short spikes
+ Soft limits: Increasable via service quota requests
+ Model-specific: Actual limits vary by foundation model

### Reserved Tier Limits

+ Minimum commitment: 1 model unit
+ Maximum units: Account and region-specific
+ Input/output token limits: Based on purchased units
+ No RPM throttling within purchased capacity

### Batch Processing Limits

+ Job size: Up to 10,000 records per batch
+ File size: Maximum 200 MB input file
+ Processing time: 24-hour completion window
+ Concurrent jobs: Region-specific quotas

### Cross-Region Inference

+ Inherits on-demand tier limits per region
+ No additional quota overhead
+ Automatic routing (no manual limit management)

## Cost Optimization


### Decision Framework



| Scenario | Recommended Option | Why | 
| --- | --- | --- | 
| Development/testing | Flex | Lowest cost, acceptable for non-production | 
| Standard production | Standard | Best cost-performance balance | 
| Critical user-facing apps | Priority | Reliability and performance over cost | 
| Steady high-volume load | Reserved Tier | 30-50% savings with commitment | 
| Bulk data processing | Batch | 50% discount, non-urgent workloads | 
| Mission-critical uptime | Cross-Region Inference | Availability > cost | 

### Optimization Strategies


**Choose the Right On-Demand Tier**
+ Start with Standard for most workloads
+ Downgrade to Flex for dev/test environments
+ Upgrade to Priority only when throttling impacts users
+ Monitor CloudWatch throttle metrics to inform decisions

**Transition to Reserved Tier**
+ Consider it when steady baseline traffic accounts for roughly 40% or more of your on-demand costs
+ Calculate the break-even point: compare monthly on-demand cost against the Reserved commitment (see the sketch after this list)
+ Start with a 1-month commitment
+ Reserved tier can work alongside any on-demand tier
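
As a back-of-the-envelope check, the break-even comparison can be scripted. The following sketch uses placeholder prices and volumes, not published Amazon Bedrock pricing; substitute the figures from the pricing page and your own traffic data.

```
# Rough break-even check for moving steady baseline traffic to the Reserved tier.
# All prices and volumes below are illustrative placeholders.

on_demand_price_per_1k_tokens = 0.003   # USD, blended input/output rate (placeholder)
monthly_tokens = 2_000_000_000          # tokens per month of steady baseline traffic

reserved_price_per_1k_tpm = 50.0        # USD per 1K tokens-per-minute per month (placeholder)
reserved_tpm_needed = 50_000            # tokens-per-minute capacity you would reserve

on_demand_monthly_cost = monthly_tokens / 1_000 * on_demand_price_per_1k_tokens
reserved_monthly_cost = reserved_tpm_needed / 1_000 * reserved_price_per_1k_tpm

print(f"On-demand: ${on_demand_monthly_cost:,.2f}/month")
print(f"Reserved:  ${reserved_monthly_cost:,.2f}/month")
print("Reserved tier pays off" if reserved_monthly_cost < on_demand_monthly_cost
      else "Stay on on-demand for now")
```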

**Leverage Batch for**
+ Training data generation
+ Content moderation backlogs
+ Report generation
+ Data enrichment pipelines

**Combine Approaches**
+ Reserved tier for baseline traffic
+ Standard on-demand for moderate bursts
+ Priority on-demand for critical peak periods
+ Batch for offline processing
+ Cross-region for failover only

**Cost Monitoring**
+ Compare tier costs: Flex < Standard < Priority
+ Track tokens per request (optimize prompts)
+ Use CloudWatch metrics for utilization and throttling
+ Set billing alarms for unexpected spikes (see the sketch after this list)
+ Review reserved tier utilization monthly
+ Evaluate tier upgrades only when throttling occurs
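
For the billing alarm bullet above, a minimal sketch follows. It assumes billing alerts are enabled for the account (which publishes the account-wide `EstimatedCharges` metric to CloudWatch in us-east-1); the threshold and SNS topic ARN are placeholders.

```
import boto3

# Billing metrics are published to CloudWatch in us-east-1 once billing alerts
# are enabled for the account. EstimatedCharges is account-wide, not Bedrock-only.
cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

cloudwatch.put_metric_alarm(
    AlarmName="monthly-spend-guardrail",
    Namespace="AWS/Billing",
    MetricName="EstimatedCharges",
    Dimensions=[{"Name": "Currency", "Value": "USD"}],
    Statistic="Maximum",
    Period=21600,               # evaluate every 6 hours
    EvaluationPeriods=1,
    Threshold=500.0,            # placeholder monthly spend threshold in USD
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:billing-alerts"],  # placeholder topic
)
```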

# Process multiple prompts with batch inference
Batch inference

With batch inference, you can submit multiple prompts and generate responses asynchronously. You can format your input data using either the `InvokeModel` or the `Converse` API format. Batch inference helps you process a large number of requests efficiently by submitting a single job and having the responses written to an Amazon S3 bucket. After defining model inputs in files that you create, you upload the files to an S3 bucket. You then submit a batch inference request that specifies the S3 bucket. After the job is complete, you can retrieve the output files from S3. You can use batch inference to improve the performance of model inference on large datasets.

**Note**  
Batch inference isn't supported for provisioned models.

See the following resources for general information about batch inference:
+ To see pricing for batch inference, see [Amazon Bedrock pricing](https://aws.amazon.com/bedrock/pricing/).
+ To see quotas for batch inference, see [Amazon Bedrock endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html) in the AWS General Reference.
+ To receive notifications when batch inference jobs complete or change state instead of polling, see [Monitor Amazon Bedrock job state changes using Amazon EventBridge](monitoring-eventbridge.md).
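
End to end, the flow looks roughly like the following sketch: stage a JSONL file of records in S3, submit the job with `CreateModelInvocationJob`, and poll its status. The bucket names, role ARN, and job name are placeholders.

```
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Submit the batch inference job. The input S3 location must already contain
# your JSONL record file(s); results are written to the output location.
response = bedrock.create_model_invocation_job(
    jobName="transcript-summaries-batch-01",                    # placeholder
    roleArn="arn:aws:iam::111122223333:role/BedrockBatchRole",  # placeholder service role
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    inputDataConfig={"s3InputDataConfig": {"s3Uri": "s3://amzn-s3-demo-input-bucket/"}},
    outputDataConfig={"s3OutputDataConfig": {"s3Uri": "s3://amzn-s3-demo-output-bucket/"}},
)

# Check on the job later; terminal states include Completed, Failed, and Stopped.
job = bedrock.get_model_invocation_job(jobIdentifier=response["jobArn"])
print(job["status"])
```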

**Topics**
+ [Supported Regions and models for batch inference](batch-inference-supported.md)
+ [Prerequisites for batch inference](batch-inference-prereq.md)
+ [Create a batch inference job](batch-inference-create.md)
+ [Monitor batch inference jobs](batch-inference-monitor.md)
+ [Stop a batch inference job](batch-inference-stop.md)
+ [View the results of a batch inference job](batch-inference-results.md)
+ [Code example for batch inference](batch-inference-example.md)
+ [Submit a batch of prompts with the OpenAI Batch API](inference-openai-batch.md)

# Supported Regions and models for batch inference
Supported Regions and models

The following list provides links to general information about Regional and model support in Amazon Bedrock:
+ For a list of Region codes and endpoints supported in Amazon Bedrock, see [Amazon Bedrock endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bedrock_region).
+ For a list of Amazon Bedrock model IDs to use when calling Amazon Bedrock API operations, see [Supported foundation models in Amazon Bedrock](models-supported.md).
+ For a list of Amazon Bedrock inference profile IDs to use when calling Amazon Bedrock API operations, see [Supported cross-Region inference profiles](inference-profiles-support.md#inference-profiles-support-system).

Batch inference can be used with different types of models. The following list describes support for different types of Amazon Bedrock models:
+ **Single-region model support** – Lists regions that support sending inference requests to a foundation model in one AWS Region. For a full table of models available across Amazon Bedrock, see [Supported foundation models in Amazon Bedrock](models-supported.md).
+ **Cross-region inference profile support** – Lists regions that support using a cross-region inference profile, which supports sending inference requests to a foundation model in multiple AWS Regions within a geographical area. An inference profile has a prefix preceding the model ID that indicates its geographical area (for example, `us.` or `apac.`). For more information about available inference profiles across Amazon Bedrock, see [Supported Regions and models for inference profiles](inference-profiles-support.md).
+ **Custom model support** – Lists regions that support sending inference requests to a customized model. For more information about model customization, see [Customize your model to improve its performance for your use case](custom-models.md).

The following table summarizes support for batch inference:


| Provider | Model | Model ID | Single-region model support | Cross-region inference profile support | Custom model support | 
| --- | --- | --- | --- | --- | --- | 
| Amazon | Amazon Nova Multimodal Embeddings | amazon.nova-2-multimodal-embeddings-v1:0 |  us-east-1  |  | N/A | 
| Amazon | Nova 2 Lite | amazon.nova-2-lite-v1:0 | N/A |  ap-east-2 ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 ca-central-1 ca-west-1 eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 il-central-1 me-central-1 us-east-1 us-east-2 us-west-1 us-west-2  | N/A | 
| Amazon | Nova Lite | amazon.nova-lite-v1:0 |  me-central-1 us-east-1 us-gov-west-1  |  ap-east-2 ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 ca-central-1 ca-west-1 eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3 il-central-1 me-central-1 us-east-1 us-east-2 us-west-1 us-west-2  | N/A | 
| Amazon | Nova Micro | amazon.nova-micro-v1:0 |  us-east-1 us-gov-west-1  |  ap-east-2 ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-5 ap-southeast-7 eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3 il-central-1 me-central-1 us-east-1 us-east-2 us-west-2  | N/A | 
| Amazon | Nova Premier | amazon.nova-premier-v1:0 | N/A |  us-east-1 us-east-2 us-west-2  | N/A | 
| Amazon | Nova Pro | amazon.nova-pro-v1:0 |  ap-southeast-3 me-central-1 us-east-1 us-gov-west-1  |  ap-east-2 ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3 il-central-1 me-central-1 us-east-1 us-east-2 us-west-1 us-west-2  | N/A | 
| Amazon | Titan Multimodal Embeddings G1 | amazon.titan-embed-image-v1 |  ap-south-1 ap-southeast-2 ca-central-1 eu-central-1 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-west-2  |  |  us-east-1 us-west-2  | 
| Amazon | Titan Text Embeddings V2 | amazon.titan-embed-text-v2:0 |  ap-northeast-1 ap-northeast-2 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-2 sa-east-1 us-east-1 us-west-2  |  | N/A | 
| Anthropic | Claude 3 Haiku | anthropic.claude-3-haiku-20240307-v1:0 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 ca-central-1 eu-central-1 eu-central-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-west-2  | N/A | N/A | 
| Anthropic | Claude 3 Opus | anthropic.claude-3-opus-20240229-v1:0 |  us-west-2  |  us-east-1  | N/A | 
| Anthropic | Claude 3 Sonnet | anthropic.claude-3-sonnet-20240229-v1:0 |  ap-northeast-2 ap-south-1 ap-southeast-2 ca-central-1 eu-central-1 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-west-2  |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 eu-central-1 eu-west-1 eu-west-3 us-east-1 us-west-2  | N/A | 
| Anthropic | Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022-v1:0 |  us-west-2  |  us-east-1  | N/A | 
| Anthropic | Claude 3.5 Sonnet | anthropic.claude-3-5-sonnet-20240620-v1:0 |  ap-northeast-1 ap-northeast-2 ap-southeast-1 eu-central-1 us-east-1 us-east-2 us-west-2  |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2 eu-central-1 eu-west-1 eu-west-3 us-east-1 us-west-2  | N/A | 
| Anthropic | Claude 3.5 Sonnet v2 | anthropic.claude-3-5-sonnet-20241022-v2:0 |  us-west-2  |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 us-east-1 us-east-2 us-west-2  | N/A | 
| Anthropic | Claude 3.7 Sonnet | anthropic.claude-3-7-sonnet-20250219-v1:0 | N/A |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 eu-central-1 eu-north-1 eu-west-1 eu-west-3 us-east-1 us-east-2 us-west-2  | N/A | 
| Anthropic | Claude Haiku 4.5 | anthropic.claude-haiku-4-5-20251001-v1:0 | N/A |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2  | N/A | 
| Anthropic | Claude Opus 4.5 | anthropic.claude-opus-4-5-20251101-v1:0 | N/A |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ca-central-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2  | N/A | 
| Anthropic | Claude Opus 4.6 | anthropic.claude-opus-4-6-v1 | N/A |  af-south-1 ap-east-2 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 ca-central-1 ca-west-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 il-central-1 me-central-1 me-south-1 mx-central-1 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2  | N/A | 
| Anthropic | Claude Sonnet 4 | anthropic.claude-sonnet-4-20250514-v1:0 | N/A |  ap-east-2 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3 il-central-1 me-central-1 us-east-1 us-east-2 us-west-1 us-west-2  | N/A | 
| Anthropic | Claude Sonnet 4.5 | anthropic.claude-sonnet-4-5-20250929-v1:0 | N/A |  af-south-1 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ca-central-1 ca-west-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 me-south-1 mx-central-1 sa-east-1 us-east-1 us-east-2 us-gov-east-1 us-gov-west-1 us-west-1 us-west-2  | N/A | 
| Anthropic | Claude Sonnet 4.6 | anthropic.claude-sonnet-4-6 |  eu-west-2  |  af-south-1 ap-east-2 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5 ap-southeast-7 ca-central-1 ca-west-1 eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3 il-central-1 me-central-1 me-south-1 mx-central-1 sa-east-1 us-east-1 us-east-2 us-west-1 us-west-2  | N/A | 
| DeepSeek | DeepSeek V3.2 | deepseek.v3.2 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-north-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| DeepSeek | DeepSeek-V3.1 | deepseek.v3-v1:0 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-north-1 eu-west-2 us-east-2 us-west-2  |  | N/A | 
| Google | Gemma 3 12B IT | google.gemma-3-12b-it |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Google | Gemma 3 27B PT | google.gemma-3-27b-it |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Google | Gemma 3 4B IT | google.gemma-3-4b-it |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Meta | Llama 3.1 405B Instruct | meta.llama3-1-405b-instruct-v1:0 |  us-west-2  |  | N/A | 
| Meta | Llama 3.1 70B Instruct | meta.llama3-1-70b-instruct-v1:0 |  us-west-2  |  us-east-1 us-west-2  | N/A | 
| Meta | Llama 3.1 8B Instruct | meta.llama3-1-8b-instruct-v1:0 |  us-west-2  |  us-east-1 us-west-2  | N/A | 
| Meta | Llama 3.2 11B Instruct | meta.llama3-2-11b-instruct-v1:0 |  |  us-east-1 us-west-2  | N/A | 
| Meta | Llama 3.2 1B Instruct | meta.llama3-2-1b-instruct-v1:0 |  |  eu-central-1 eu-west-1 eu-west-3 us-east-1 us-west-2  | N/A | 
| Meta | Llama 3.2 3B Instruct | meta.llama3-2-3b-instruct-v1:0 |  |  eu-central-1 eu-west-1 eu-west-3 us-east-1 us-west-2  | N/A | 
| Meta | Llama 3.2 90B Instruct | meta.llama3-2-90b-instruct-v1:0 |  |  us-east-1 us-west-2  | N/A | 
| Meta | Llama 3.3 70B Instruct | meta.llama3-3-70b-instruct-v1:0 |  us-east-2  |  us-east-1 us-east-2 us-west-2  | N/A | 
| Meta | Llama 4 Maverick 17B Instruct | meta.llama4-maverick-17b-instruct-v1:0 |  |  us-east-1 us-east-2 us-west-1 us-west-2  | N/A | 
| Meta | Llama 4 Scout 17B Instruct | meta.llama4-scout-17b-instruct-v1:0 |  |  us-east-1 us-east-2 us-west-1 us-west-2  | N/A | 
| MiniMax | MiniMax M2 | minimax.minimax-m2 |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| MiniMax | MiniMax M2.1 | minimax.minimax-m2.1 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Mistral AI | Devstral 2 123B | mistral.devstral-2-123b |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Mistral AI | Magistral Small 2509 | mistral.magistral-small-2509 |  ap-northeast-1 ap-south-1 ap-southeast-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Mistral AI | Ministral 14B 3.0 | mistral.ministral-3-14b-instruct |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Mistral AI | Ministral 3 8B | mistral.ministral-3-8b-instruct |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Mistral AI | Ministral 3B | mistral.ministral-3-3b-instruct |  ap-northeast-1 ap-south-1 ap-southeast-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Mistral AI | Mistral Large (24.07) | mistral.mistral-large-2407-v1:0 |  us-west-2  | N/A | N/A | 
| Mistral AI | Mistral Large 3 | mistral.mistral-large-3-675b-instruct |  ap-northeast-1 ap-south-1 ap-southeast-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Mistral AI | Mistral Small (24.02) | mistral.mistral-small-2402-v1:0 |  us-east-1  | N/A | N/A | 
| Mistral AI | Voxtral Mini 3B 2507 | mistral.voxtral-mini-3b-2507 |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Mistral AI | Voxtral Small 24B 2507 | mistral.voxtral-small-24b-2507 |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Moonshot AI | Kimi K2 Thinking | moonshot.kimi-k2-thinking |  ap-northeast-1 ap-south-1 ap-southeast-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Moonshot AI | Kimi K2.5 | moonshotai.kimi-k2.5 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-north-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| NVIDIA | NVIDIA Nemotron Nano 12B v2 VL BF16 | nvidia.nemotron-nano-12b-v2 |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| NVIDIA | NVIDIA Nemotron Nano 9B v2 | nvidia.nemotron-nano-9b-v2 |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| NVIDIA | Nemotron Nano 3 30B | nvidia.nemotron-nano-3-30b |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| OpenAI | GPT OSS Safeguard 120B | openai.gpt-oss-safeguard-120b |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| OpenAI | GPT OSS Safeguard 20B | openai.gpt-oss-safeguard-20b |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| OpenAI | gpt-oss-120b | openai.gpt-oss-120b-1:0 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-gov-west-1 us-west-2  | N/A | N/A | 
| OpenAI | gpt-oss-20b | openai.gpt-oss-20b-1:0 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-gov-west-1 us-west-2  | N/A | N/A | 
| Qwen | Qwen3 235B A22B 2507 | qwen.qwen3-235b-a22b-2507-v1:0 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-2 us-east-2 us-west-2  | N/A | N/A | 
| Qwen | Qwen3 32B (dense) | qwen.qwen3-32b-v1:0 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Qwen | Qwen3 Coder 480B A35B Instruct | qwen.qwen3-coder-480b-a35b-v1:0 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-north-1 eu-west-2 us-east-2 us-west-2  | N/A | N/A | 
| Qwen | Qwen3 Coder Next | qwen.qwen3-coder-next |  ap-southeast-2 eu-west-2 us-east-1  | N/A | N/A | 
| Qwen | Qwen3 Next 80B A3B | qwen.qwen3-next-80b-a3b |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Qwen | Qwen3 VL 235B A22B | qwen.qwen3-vl-235b-a22b |  ap-northeast-1 ap-south-1 ap-southeast-2 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Qwen | Qwen3-Coder-30B-A3B-Instruct | qwen.qwen3-coder-30b-a3b-v1:0 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Z.AI | GLM 4.7 | zai.glm-4.7 |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-north-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 
| Z.AI | GLM 4.7 Flash | zai.glm-4.7-flash |  ap-northeast-1 ap-south-1 ap-southeast-2 ap-southeast-3 eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-2 sa-east-1 us-east-1 us-east-2 us-west-2  | N/A | N/A | 

# Prerequisites for batch inference
Prerequisites

To perform batch inference, you must fulfill the following prerequisites:

1. Prepare your dataset and upload it to an Amazon S3 bucket.

1. Create an S3 bucket for your output data.

1. Set up batch inference-related permissions for the relevant IAM identities.

1. (Optional) Set up a VPC to protect the data in your S3 bucket while carrying out batch inference. You can skip this step if you don't need to use a VPC.

To learn how to fulfill these prerequisites, navigate through the following topics:

**Topics**
+ [Format and upload your batch inference data](batch-inference-data.md)
+ [Required permissions for batch inference](batch-inference-permissions.md)
+ [Protect batch inference jobs using a VPC](batch-vpc.md)

# Format and upload your batch inference data
Set up data

You must add your batch inference data to an S3 location that you specify when submitting a model invocation job. The S3 location must contain the following items:
+ At least one JSONL file that defines the model inputs. A JSONL file contains rows of JSON objects. Your JSONL file must end in the extension .jsonl and be in the following format:

  ```
  { "recordId" : "alphanumeric string", "modelInput" : {JSON body} }
  ...
  ```

  Each line contains a JSON object with a `recordId` field and a `modelInput` field. The format of the `modelInput` JSON object depends on the model invocation type that you choose when you [create the batch inference job](batch-inference-create.md). If you use the `InvokeModel` type (default), the format must match the `body` field for the model that you use in the `InvokeModel` request (see [Inference request parameters and response fields for foundation models](model-parameters.md)). If you use the `Converse` type, the format must match the request body of the [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) API.
**Note**  
If you omit the `recordId` field, Amazon Bedrock adds it in the output.
The order of records in the output JSONL file is not guaranteed to match the order of records in the input JSONL file.
You specify the model that you want to use when you create the [batch inference job](batch-inference-create.md).
+ (If your input content contains an Amazon S3 location) Some models allow you to define the content of the input as an S3 location. See [Example video input for Amazon Nova](#batch-inference-data-ex-s3).
**Warning**  
When using S3 URIs in your prompts, all resources must be in the same S3 bucket and folder. The `InputDataConfig` parameter must specify the folder path containing all linked resources (such as videos or images), not just an individual `.jsonl` file. Note that S3 paths are case-sensitive, so ensure your URIs match the exact folder structure.

Ensure that your inputs conform to the batch inference quotas. You can search for the following quotas at [Amazon Bedrock service quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock):
+ **Minimum number of records per batch inference job** – The minimum number of records (JSON objects) across JSONL files in the job.
+ **Records per input file per batch inference job** – The maximum number of records (JSON objects) in a single JSONL file in the job.
+ **Records per batch inference job** – The maximum number of records (JSON objects) across JSONL files in the job.
+ **Batch inference input file size** – The maximum size of a single file in the job.
+ **Batch inference job size** – The maximum cumulative size of all input files.

To better understand how to set up your batch inference inputs, see the following examples:

## Example text input for Anthropic Claude 3 Haiku


If you plan to run batch inference using the [Messages API](model-parameters-anthropic-claude-messages.md) format for the Anthropic Claude 3 Haiku model, you might provide a JSONL file containing the following JSON object as one of the lines:

```
{
    "recordId": "CALL0000001", 
    "modelInput": {
        "anthropic_version": "bedrock-2023-05-31", 
        "max_tokens": 1024,
        "messages": [ 
            { 
                "role": "user", 
                "content": [
                    {
                        "type": "text", 
                        "text": "Summarize the following call transcript: ..." 
                    } 
                ]
            }
        ]
    }
}
```
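
To produce a file of such records and stage it for the job, a minimal sketch might look like the following. The bucket name and prompts are placeholders, and a real job needs enough records to satisfy the minimum-records quota.

```
import json

import boto3

# Build one JSON object per line, each with a recordId and a modelInput body in
# the same Messages API format as the example above.
transcripts = ["Summarize the following call transcript: ..."]  # placeholder inputs
records = [
    {
        "recordId": f"CALL{i:07d}",
        "modelInput": {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [
                {"role": "user", "content": [{"type": "text", "text": text}]}
            ],
        },
    }
    for i, text in enumerate(transcripts, start=1)
]

with open("input.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")

# Upload the JSONL file to the S3 location you will reference in InputDataConfig.
boto3.client("s3").upload_file("input.jsonl", "amzn-s3-demo-input-bucket", "input.jsonl")
```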

## Example video input for Amazon Nova


If you plan to run batch inference on video inputs using the Amazon Nova Lite or Amazon Nova Pro models, you have the option of defining the video in bytes or as an S3 location in the JSONL file. For example, you might have an S3 bucket whose path is `s3://batch-inference-input-bucket` and contains the following files:

```
s3://batch-inference-input-bucket/
├── videos/
│   ├── video1.mp4
│   ├── video2.mp4
│   ├── ...
│   └── video50.mp4
└── input.jsonl
```

A sample record from the `input.jsonl` file would be the following:

```
{
    "recordId": "RECORD01",
    "modelInput": {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "text": "You are an expert in recipe videos. Describe this video in less than 200 words following these guidelines: ..."
                    },
                    {
                        "video": {
                            "format": "mp4",
                            "source": {
                                "s3Location": {
                                    "uri": "s3://batch-inference-input-bucket/videos/video1.mp4",
                                    "bucketOwner": "111122223333"
                                }
                            }
                        }
                    }
                ]
            }
        ]
    }
}
```

When you create the batch inference job, you must specify the folder path `s3://batch-inference-input-bucket` in your `InputDataConfig` parameter. Batch inference will process the `input.jsonl` file at this location, along with any referenced resources (such as the video files in the `videos` subfolder).

The following resources provide more information about submitting video inputs for batch inference:
+ To learn how to validate Amazon S3 URIs in an input request, see the [Amazon S3 URL Parsing blog](https://aws.amazon.com/blogs/devops/s3-uri-parsing-is-now-available-in-aws-sdk-for-java-2-x/).
+ For more information on how to set up invocation records for video understanding with Nova, see [Amazon Nova vision prompting guidelines](https://docs.aws.amazon.com/nova/latest/userguide/prompting-vision-prompting.html).

## Example Converse input


If you set the model invocation type to `Converse` when creating the batch inference job, the `modelInput` field must use the Converse API request format. The following example shows a JSONL record for a Converse batch inference job:

```
{
    "recordId": "CALL0000001",
    "modelInput": {
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "text": "Summarize the following call transcript: ..."
                    }
                ]
            }
        ],
        "inferenceConfig": {
            "maxTokens": 1024
        }
    }
}
```

For the full list of fields supported in the Converse request body, see [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) in the API reference.

The following topic describes how to set up S3 access and batch inference permissions for an identity to be able to carry out batch inference.

# Required permissions for batch inference
Permissions

To carry out batch inference, you must set up permissions for the following IAM identities:
+ The IAM identity that will create and manage batch inference jobs.
+ The batch inference [service role](security-iam-sr.md) that Amazon Bedrock assumes to perform actions on your behalf.

To learn how to set up permissions for each identity, navigate through the following topics:

**Topics**
+ [Required permissions for an IAM identity to submit and manage batch inference jobs](#batch-inference-permissions-user)
+ [Required permissions for a service role to carry out batch inference](#batch-inference-permissions-service)

## Required permissions for an IAM identity to submit and manage batch inference jobs


For an IAM identity to use this feature, you must configure it with the necessary permissions. To do so, do one of the following:
+ To allow an identity to carry out all Amazon Bedrock actions, attach the [AmazonBedrockFullAccess](security-iam-awsmanpol.md#security-iam-awsmanpol-AmazonBedrockFullAccess) policy to the identity. If you do this, you can skip this topic. This option is less secure.
+ As a security best practice, you should grant only the necessary actions to an identity. This topic describes the permissions that you need for this feature.

To restrict permissions to only actions that are used for batch inference, attach the following identity-based policy to an IAM identity:

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "BatchInference",
            "Effect": "Allow",
            "Action": [  
                "bedrock:ListFoundationModels",
                "bedrock:GetFoundationModel",
                "bedrock:ListInferenceProfiles",
                "bedrock:GetInferenceProfile",
                "bedrock:ListCustomModels",
                "bedrock:GetCustomModel",
                "bedrock:TagResource", 
                "bedrock:UntagResource", 
                "bedrock:ListTagsForResource",
                "bedrock:CreateModelInvocationJob",
                "bedrock:GetModelInvocationJob",
                "bedrock:ListModelInvocationJobs",
                "bedrock:StopModelInvocationJob"
            ],
            "Resource": "*"
        }
    ]   
}
```

------

To further restrict permissions, you can omit actions, or you can specify resources and condition keys by which to filter permissions. For more information about actions, resources, and condition keys, see the following topics in the *Service Authorization Reference*:
+ [Actions defined by Amazon Bedrock](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-actions-as-permissions) – Learn about actions, the resource types that you can scope them to in the `Resource` field, and the condition keys that you can filter permissions on in the `Condition` field.
+ [Resource types defined by Amazon Bedrock](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-resources-for-iam-policies) – Learn about the resource types in Amazon Bedrock.
+ [Condition keys for Amazon Bedrock](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-policy-keys) – Learn about the condition keys in Amazon Bedrock.

The following policy is an example that scopes down permissions for batch inference to only allow a user with the account ID `123456789012` to create batch inference jobs in the `us-west-2` Region, using the Anthropic Claude 3 Haiku model:

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "CreateBatchInferenceJob",
            "Effect": "Allow",
            "Action": [
                "bedrock:CreateModelInvocationJob"
            ],
            "Resource": [
                "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
                "arn:aws:bedrock:us-west-2:123456789012:model-invocation-job/*"
            ]
        }
    ]
}
```

------

## Required permissions for a service role to carry out batch inference


Batch inference is carried out by a [service role](security-iam-sr.md) that Amazon Bedrock assumes to perform actions on your behalf. You can create a service role in the following ways:
+ Let Amazon Bedrock automatically create a service role with the necessary permissions for you by using the AWS Management Console. You can select this option when you create a batch inference job.
+ Create a custom service role for Amazon Bedrock by using AWS Identity and Access Management and attach the necessary permissions. When you submit the batch inference job, you then specify this role. For more information about creating a custom service role for batch inference, see [Create a custom service role for batch inference](batch-iam-sr.md). For more general information about creating service roles, see [Create a role to delegate permissions to an AWS service](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-service.html) in the IAM User Guide.

**Important**  
If the S3 bucket in which you [uploaded your data for batch inference](batch-inference-data.md) is in a different AWS account, you must configure an S3 bucket policy to allow the service role access to the data. You must manually configure this policy even if you use the console to automatically create a service role. To learn how to configure an S3 bucket policy for Amazon Bedrock resources, see [Attach a bucket policy to an Amazon S3 bucket to allow another account to access it](s3-bucket-access.md#s3-bucket-access-cross-account).
Foundation models in Amazon Bedrock are AWS-managed resources that cannot be used with IAM policy conditions requiring customer ownership. These models are owned and operated by AWS, and cannot be owned by individual customers. Any IAM policy condition that checks for customer-owned resources (such as conditions using resource tags, organization ID, or other ownership attributes) will fail when applied to foundation models, potentially blocking legitimate access to these services.  
For example, if your policy includes an `aws:ResourceOrgID` condition like this:  

  ```
  {
    "Condition": {
      "StringEqualsIgnoreCase": {
        "aws:ResourceOrgID": ["o-xxxxxxxx"]
      }
    }
  }
  ```
Your batch inference job will fail with `AccessDeniedException`. Remove the `aws:ResourceOrgID` condition or create separate policy statements for foundation models.

# Protect batch inference jobs using a VPC
[Optional] Set up a VPC

When you run a batch inference job, the job accesses your Amazon S3 bucket to download the input data and to write the output data. To control access to your data, we recommend that you use a virtual private cloud (VPC) with [Amazon VPC](https://docs.aws.amazon.com/vpc/latest/userguide/what-is-amazon-vpc.html). You can further protect your data by configuring your VPC so that your data isn't available over the internet and instead creating a VPC interface endpoint with [AWS PrivateLink](https://docs.aws.amazon.com/vpc/latest/privatelink/what-is-privatelink.html) to establish a private connection to your data. For more information about how Amazon VPC and AWS PrivateLink integrate with Amazon Bedrock, see [Protect your data using Amazon VPC and AWS PrivateLink](usingVPC.md).

Carry out the following steps to configure and use a VPC for the input prompts and output model responses for your batch inference jobs.

**Topics**
+ [Set up VPC to protect your data during batch inference](#batch-vpc-setup)
+ [Attach VPC permissions to a batch inference role](#batch-vpc-role)
+ [Add the VPC configuration when submitting a batch inference job](#batch-vpc-config)

## Set up VPC to protect your data during batch inference


To set up a VPC, follow the steps at [Set up a VPC](usingVPC.md#create-vpc). You can further secure your VPC by setting up an S3 VPC endpoint and using resource-based IAM policies to restrict access to the S3 bucket containing your batch inference data by following the steps at [(Example) Restrict data access to your Amazon S3 data using VPC](vpc-s3.md).

## Attach VPC permissions to a batch inference role


After you finish setting up your VPC, attach the following permissions to your [batch inference service role](batch-iam-sr.md) to allow it to access the VPC. Modify this policy to allow access to only the VPC resources that your job needs. Replace the *subnet-ids* and *security-group-id* with the values from your VPC.

------
#### [ JSON ]


```
{
    "Version":"2012-10-17",		 	 	 
    "Statement": [
        {
            "Sid": "1",
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeVpcs",
                "ec2:DescribeDhcpOptions",
                "ec2:DescribeSubnets",
                "ec2:DescribeSecurityGroups"
            ],
            "Resource": [
                "*"
            ]
        },
        {
            "Sid": "2",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterface"
            ],
            "Resource": [
                "arn:aws:ec2:us-east-1:123456789012:network-interface/*",
                "arn:aws:ec2:us-east-1:123456789012:subnet/${{subnet-id}}",
                "arn:aws:ec2:us-east-1:123456789012:security-group/${{security-group-id}}"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestTag/BedrockManaged": [
                        "true"
                    ]
                },
                "ArnEquals": {
                    "aws:RequestTag/BedrockModelInvocationJobArn": [
                        "arn:aws:bedrock:us-east-1:123456789012:model-invocation-job/*"
                    ]
                }
            }
        },
        {
            "Sid": "3",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateNetworkInterfacePermission",
                "ec2:DeleteNetworkInterface",
                "ec2:DeleteNetworkInterfacePermission"
            ],
            "Resource": [
                "*"
            ],
            "Condition": {
                "StringEquals": {
                    "ec2:Subnet": [
                        "arn:aws:ec2:us-east-1:123456789012:subnet/${{subnet-id}}"
                    ]
                },
                "ArnEquals": {
                    "ec2:ResourceTag/BedrockModelInvocationJobArn": [
                        "arn:aws:bedrock:us-east-1:123456789012:model-invocation-job/*"
                    ]
                }
            }
        },
        {
            "Sid": "4",
            "Effect": "Allow",
            "Action": [
                "ec2:CreateTags"
            ],
            "Resource": "arn:aws:ec2:us-east-1:123456789012:network-interface/*",
            "Condition": {
                "StringEquals": {
                    "ec2:CreateAction": [
                        "CreateNetworkInterface"
                    ]
                },
                "ForAllValues:StringEquals": {
                    "aws:TagKeys": [
                        "BedrockManaged",
                        "BedrockModelInvocationJobArn"
                    ]
                }
            }
        }
    ]
}
```

------

## Add the VPC configuration when submitting a batch inference job


After you configure the VPC and the required roles and permissions as described in the previous sections, you can create a batch inference job that uses this VPC.

**Note**  
Currently, when creating a batch inference job, you can only use a VPC through the API.

When you specify the VPC subnets and security groups for a job, Amazon Bedrock creates *elastic network interfaces* (ENIs) that are associated with your security groups in one of the subnets. ENIs allow the Amazon Bedrock job to connect to resources in your VPC. For information about ENIs, see [Elastic Network Interfaces](https://docs.aws.amazon.com/vpc/latest/userguide/VPC_ElasticNetworkInterfaces.html) in the *Amazon VPC User Guide*. Amazon Bedrock tags ENIs that it creates with `BedrockManaged` and `BedrockModelInvocationJobArn` tags.

We recommend that you provide at least one subnet in each Availability Zone.

You can use security groups to establish rules for controlling Amazon Bedrock access to your VPC resources.

When you submit a [CreateModelInvocationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateModelInvocationJob.html) request, you can include a `VpcConfig` as a request parameter to specify the VPC subnets and security groups to use, as in the following example.

```
"vpcConfig": { 
    "securityGroupIds": [
        "sg-0123456789abcdef0"
    ],
    "subnets": [
        "subnet-0123456789abcdef0",
        "subnet-0123456789abcdef1",
        "subnet-0123456789abcdef2"
    ]
}
```

# Create a batch inference job
Create a job

After you've set up an Amazon S3 bucket with files for running model inference, you can create a batch inference job. Before you begin, check that you set up the files in accordance with the instructions described in [Format and upload your batch inference data](batch-inference-data.md).

**Note**  
To submit a batch inference job using a VPC, you must use the API. Select the API tab to learn how to include the VPC configuration.

To learn how to create a batch inference job, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

**To create a batch inference job**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. From the left navigation pane, select **Batch inference**.

1. In the **Batch inference jobs** section, choose **Create job**.

1. In the **Job details** section, give the batch inference job a **Job name** and select a model to use for the batch inference job by choosing **Select model**.

1. In the **Model invocation type** section, choose the API format for your input data. Choose **InvokeModel** if your input data uses model-specific request formats, or choose **Converse** if your input data uses the Converse API format. The default is **InvokeModel**.

1. In the **Input data** section, choose **Browse S3** and select an S3 location for your batch inference job. Batch inference processes all JSONL and accompanying content files at that S3 location, whether the location is an S3 folder or a single JSONL file.
**Note**  
If the input data is in an S3 bucket that belongs to a different account from the one from which you're submitting the job, you must use the API to submit the batch inference job. To learn how to do this, select the API tab above.

1. In the **Output data** section, choose **Browse S3** and select an S3 location to store the output files from your batch inference job. By default, the output data will be encrypted by an AWS managed key. To choose a custom KMS key, select **Customize encryption settings (advanced)** and choose a key. For more information about encryption of Amazon Bedrock resources and setting up a custom KMS key see [Data encryption](data-encryption.md).
**Note**  
If you plan to write the output data to an S3 bucket that belongs to a different account from the one from which you're submitting the job, you must use the API to submit the batch inference job. To learn how to do this, select the API tab above.

1. In the **Service access** section, select one of the following options:
   + **Use an existing service role** – Select a service role from the drop-down list. For more information on setting up a custom role with the appropriate permissions, see [Required permissions for batch inference](batch-inference-permissions.md).
   + **Create and use a new service role** – Enter a name for the service role.

1. (Optional) To associate tags with the batch inference job, expand the **Tags** section and add a key and optional value for each tag. For more information, see [Tagging Amazon Bedrock resources](tagging.md).

1. Choose **Create batch inference job**.

------
#### [ API ]

To create a batch inference job, send a [CreateModelInvocationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateModelInvocationJob.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp).

The following fields are required:



| Field | Use case | 
| --- | --- | 
| jobName | To specify a name for the job. | 
| roleArn | To specify the Amazon Resource Name (ARN) of the service role with permissions to create and manage the job. For more information, see [Create a custom service role for batch inference](batch-iam-sr.md). | 
| modelId | To specify the ID or ARN of the model to use in inference. | 
| inputDataConfig | To specify the S3 location containing the input data. Batch inference processes all JSONL and accompanying content files at that S3 location, whether the location is an S3 folder or a single JSONL file. For more information, see [Format and upload your batch inference data](batch-inference-data.md). | 
| outputDataConfig | To specify the S3 location to write the model responses to. | 

The following fields are optional:



| Field | Use case | 
| --- | --- | 
| modelInvocationType | To specify the API format of the input data. Set to Converse to use the Converse API format, or InvokeModel (default) to use model-specific request formats. For more information about the Converse request format, see [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html). | 
| timeoutDurationInHours | To specify the duration in hours after which the job will time out. | 
| tags | To specify any tags to associate with the job. For more information, see [Tagging Amazon Bedrock resources](tagging.md). | 
| vpcConfig | To specify the VPC configuration to use to protect your data during the job. For more information, see [Protect batch inference jobs using a VPC](batch-vpc.md). | 
| clientRequestToken | To ensure the API request completes only once. For more information, see [Ensuring idempotency](https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html). | 

The response returns a `jobArn` that you can use to refer to the job when carrying out other batch inference-related API calls.
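As a rough sketch, and assuming the optional fields above map directly to SDK parameters of the same names, a boto3 request that uses the Converse format, a timeout, and tags might look like the following. All ARNs, bucket names, and IDs are placeholders.

```
import boto3

bedrock = boto3.client(service_name="bedrock")

# Placeholder ARNs, bucket names, and IDs; replace them with your own values.
response = bedrock.create_model_invocation_job(
    jobName="my-converse-batch-job",
    roleArn="arn:aws:iam::123456789012:role/MyBatchInferenceRole",
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    inputDataConfig={
        "s3InputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket-input/abc.jsonl"}
    },
    outputDataConfig={
        "s3OutputDataConfig": {"s3Uri": "s3://amzn-s3-demo-bucket-output/"}
    },
    # Optional fields described in the table above
    modelInvocationType="Converse",     # input records use the Converse API format
    timeoutDurationInHours=72,          # time out the job after 72 hours
    tags=[{"key": "project", "value": "demo"}]
)
print(response["jobArn"])
```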

------

# Monitor batch inference jobs
Monitor jobs

Apart from the configurations you set for a batch inference job, you can also monitor its progress by seeing its status. For more information about the possible statuses for a job, see the `status` field in [ModelInvocationJobSummary](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ModelInvocationJobSummary.html).

To track a job's progress, you can use the progress counters that the [GetModelInvocationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetModelInvocationJob.html) and [ListModelInvocationJobs](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListModelInvocationJobs.html) API operations return. These counters show the total number of input records and how many the service has processed. You can monitor completion without checking Amazon S3 output buckets. Alternatively, you can find these numbers in the `manifest.json.out` file in the Amazon S3 bucket that contains the output files. For more information, see [View the results of a batch inference job](batch-inference-results.md). To learn how to download an S3 object, see [Downloading objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/download-objects.html).

**Tip**  
Instead of polling for job status, you can use Amazon EventBridge to receive automatic notifications when a batch inference job completes or changes state. For more information, see [Monitor Amazon Bedrock job state changes using Amazon EventBridge](monitoring-eventbridge.md).

To learn how to view details about batch inference jobs, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

**To view information about batch inference jobs**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. From the left navigation pane, select **Batch inference**.

1. In the **Batch inference jobs** section, choose a job.

1. On the job details page, you can view information about the job's configuration and monitor its progress by viewing its **Status**.

------
#### [ API ]

To get information about a batch inference job, send a [GetModelInvocationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetModelInvocationJob.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) and provide the ID or ARN of the job in the `jobIdentifier` field.

To list information about multiple batch inference jobs, send a [ListModelInvocationJobs](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListModelInvocationJobs.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp). You can specify the following optional parameters:



| Field | Short description | 
| --- | --- | 
| maxResults | The maximum number of results to return in a response. | 
| nextToken | If there are more results than the number you specified in the maxResults field, the response returns a nextToken value. To see the next batch of results, send the nextToken value in another request. | 

The response for `GetModelInvocationJob` and `ListModelInvocationJobs` includes a `modelInvocationType` field that indicates whether the job uses the `InvokeModel` or `Converse` API format.

The response also includes the following fields that you can use to track the progress of a running job:
+ `totalRecordCount` – The total number of records submitted to the batch inference job.
+ `processedRecordCount` – The number of records processed so far, which includes both successes and errors.
+ `successRecordCount` – The number of records successfully processed so far.
+ `errorRecordCount` – The number of records that have caused errors during processing.

To calculate the percentage of progress for a running job, divide `processedRecordCount` by `totalRecordCount`. The counters return `0` when you submit a job but processing has not yet started. While a job is in progress, the counters might be delayed by up to 1 minute.
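For example, assuming the counters are returned as top-level fields of the `GetModelInvocationJob` response as described above, a short boto3 sketch for calculating progress could look like this (the job ARN is a placeholder):

```
import boto3

bedrock = boto3.client(service_name="bedrock")

# Placeholder job ARN; use the jobArn returned when you created the job.
job = bedrock.get_model_invocation_job(
    jobIdentifier="arn:aws:bedrock:us-east-1:123456789012:model-invocation-job/abcd1234"
)

total = job.get("totalRecordCount", 0)
processed = job.get("processedRecordCount", 0)

# Counters are 0 before processing starts, so guard against division by zero.
progress = (processed / total * 100) if total else 0
print(f"{job['status']}: {processed}/{total} records processed ({progress:.1f}%)")
```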

To list all the tags for a job, send a [ListTagsForResource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListTagsForResource.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) and include the Amazon Resource Name (ARN) of the job.

------

# Stop a batch inference job
Stop a job

To learn how to stop an ongoing batch inference job, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

**To stop a batch inference job**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. From the left navigation pane, select **Batch inference**.

1. Select a job to go to the job details page or select the option button next to a job.

1. Choose **Stop job**.

1. Review the message and choose **Stop job** to confirm.
**Note**  
You're charged for tokens that have already been processed.

------
#### [ API ]

To stop a batch inference job, send a [StopModelInvocationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_StopModelInvocationJob.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) and provide the ID or ARN of the job in the `jobIdentifier` field.

If the job was successfully stopped, you receive an HTTP 200 response.

------

# View the results of a batch inference job
View the results of a job

After a batch inference job is `Completed`, you can extract the results of the batch inference job from the files in the Amazon S3 bucket that you specified during creation of the job. To learn how to download an S3 object, see [Downloading objects](https://docs.aws.amazon.com/AmazonS3/latest/userguide/download-objects.html). The S3 bucket contains the following files:

1. Amazon Bedrock generates an output JSONL file for each input JSONL file. The output files contain outputs from the model for each input in the following format. An `error` object replaces the `modelOutput` field in any line where there was an error in inference. The format of the `modelOutput` JSON object depends on the model invocation type. For `InvokeModel` jobs, the format matches the `body` field in the `InvokeModel` response (see [Inference request parameters and response fields for foundation models](model-parameters.md)). For `Converse` jobs, the format matches the response body of the [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) API.

   ```
   { "recordId" : "string", "modelInput": {JSON body}, "modelOutput": {JSON body} }
   ```

   The following example shows a possible output file.

   ```
   { "recordId" : "3223593EFGH", "modelInput" : {"inputText": "Roses are red, violets are"}, "modelOutput" : {"inputTextTokenCount": 8, "results": [{"tokenCount": 3, "outputText": "blue\n", "completionReason": "FINISH"}]}}
   { "recordId" : "1223213ABCD", "modelInput" : {"inputText": "Hello world"}, "error" : {"errorCode" : 400, "errorMessage" : "bad request" }}
   ```

1. A `manifest.json.out` file containing a summary of the batch inference job.

   ```
   {
       "totalRecordCount" : number, 
       "processedRecordCount" : number,
       "successRecordCount": number,
       "errorRecordCount": number,
       "inputTokenCount": number,
       "outputTokenCount" : number
   }
   ```

   The fields are described below:
   + `totalRecordCount` – The total number of records submitted to the batch inference job.
   + `processedRecordCount` – The number of records processed, which includes both successes and errors.
   + `successRecordCount` – The number of records successfully processed.
   + `errorRecordCount` – The number of records that caused errors during processing.
   + `inputTokenCount` – The total number of input tokens submitted to the batch inference job.
   + `outputTokenCount` – The total number of output tokens generated by the batch inference job.
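For example, a short boto3 sketch that reads the summary could look like the following. The bucket name and object key are placeholders; the actual key depends on the output S3 location you chose and the folder that Amazon Bedrock creates for the job.

```
import json
import boto3

s3 = boto3.client("s3")

# Placeholder bucket and key; adjust to match your output S3 location and job folder.
obj = s3.get_object(
    Bucket="amzn-s3-demo-bucket-output",
    Key="abcd1234/manifest.json.out"
)
manifest = json.loads(obj["Body"].read())

print("Processed:", manifest["processedRecordCount"], "of", manifest["totalRecordCount"])
print("Errors:", manifest["errorRecordCount"])
print("Tokens in/out:", manifest["inputTokenCount"], "/", manifest["outputTokenCount"])
```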

# Code example for batch inference
Code example

The code example in this chapter shows how to create a batch inference job, view information about it, and stop it. This example uses the `InvokeModel` API format. For information about using the `Converse` API format, see [Format and upload your batch inference data](batch-inference-data.md).

Select a language to see a code example for it:

------
#### [ Python ]

Create a JSONL file named *abc.jsonl* that contains a JSON object for each record. The file must contain at least the minimum number of records (see the **Minimum number of records per batch inference job for ${Model}** value in [Quotas for Amazon Bedrock](quotas.md)). In this example, you'll use the Anthropic Claude 3 Haiku model. The following example shows the first input JSON object in the file:

```
{
    "recordId": "CALL0000001", 
    "modelInput": {
        "anthropic_version": "bedrock-2023-05-31", 
        "max_tokens": 1024,
        "messages": [ 
            { 
                "role": "user", 
                "content": [
                    {
                        "type": "text", 
                        "text": "Summarize the following call transcript: ..." 
                    } 
                ]
            }
        ]
    }
}
... 
# Add records until you hit the minimum
```

Create an S3 bucket called *amzn-s3-demo-bucket-input* and upload the file to it. Then create an S3 bucket called *amzn-s3-demo-bucket-output* to write your output files to. Run the following code snippet to submit a job and get the *jobArn* from the response:

```
import boto3

bedrock = boto3.client(service_name="bedrock")

inputDataConfig=({
    "s3InputDataConfig": {
        "s3Uri": "s3://amzn-s3-demo-bucket-input/abc.jsonl"
    }
})

outputDataConfig=({
    "s3OutputDataConfig": {
        "s3Uri": "s3://amzn-s3-demo-bucket-output/"
    }
})

response=bedrock.create_model_invocation_job(
    roleArn="arn:aws:iam::123456789012:role/MyBatchInferenceRole",
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    jobName="my-batch-job",
    inputDataConfig=inputDataConfig,
    outputDataConfig=outputDataConfig
)

jobArn = response.get('jobArn')
```

Return the `status` of the job.

```
bedrock.get_model_invocation_job(jobIdentifier=jobArn)['status']
```

List batch inference jobs whose status is *Failed*.

```
bedrock.list_model_invocation_jobs(
    maxResults=10,
    statusEquals="Failed",
    sortOrder="Descending"
)
```

Stop the job that you started.

```
bedrock.stop_model_invocation_job(jobIdentifier=jobArn)
```

------

# Submit a batch of prompts with the OpenAI Batch API
Use the OpenAI Batch API

You can run a batch inference job using the [OpenAI Create batch API](https://platform.openai.com/docs/api-reference/batch) with Amazon Bedrock OpenAI models.

You can call the OpenAI Create batch API in the following ways:
+ Make an HTTP request with an Amazon Bedrock Runtime endpoint.
+ Use an OpenAI SDK request with an Amazon Bedrock Runtime endpoint.

Select a topic to learn more:

**Topics**
+ [Supported models and Regions for the OpenAI batch API](#inference-openai-batch-supported)
+ [Prerequisites to use the OpenAI batch API](#inference-openai-batch-prereq)
+ [Create an OpenAI batch job](#inference-openai-batch-create)
+ [Retrieve an OpenAI batch job](#inference-openai-batch-retrieve)
+ [List OpenAI batch jobs](#inference-openai-batch-list)
+ [Cancel an OpenAI batch job](#inference-openai-batch-cancel)

## Supported models and Regions for the OpenAI batch API
Supported models and Regions

You can use the OpenAI Create batch API with all OpenAI models supported in Amazon Bedrock and in the AWS Regions that support these models. For more information about supported models and regions, see [Supported foundation models in Amazon Bedrock](models-supported.md).

## Prerequisites to use the OpenAI batch API
Prerequisites

To see prerequisites for using the OpenAI batch API operations, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK ]
+ **Authentication** – The OpenAI SDK only supports authentication with an Amazon Bedrock API key. Generate an Amazon Bedrock API key to authenticate your request. To learn about Amazon Bedrock API keys and how to generate them, see the API keys section in the Build chapter.
+ **Endpoint** – Find the endpoint that corresponds to the AWS Region to use in [Amazon Bedrock Runtime endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-rt). If you use an AWS SDK, you might only need to specify the region code and not the whole endpoint when you set up the client.
+ **Model access** – Request access to an Amazon Bedrock model that supports this feature. For more information, see [Manage model access using SDK and CLI](model-access.md#model-access-modify).
+ **Install an OpenAI SDK** – For more information, see [Libraries](https://platform.openai.com/docs/libraries) in the OpenAI documentation.
+ **Batch JSONL file uploaded to S3** – Follow the steps at [Prepare your batch file](https://platform.openai.com/docs/guides/batch#1-prepare-your-batch-file) in the OpenAI documentation to prepare your batch file with the correct format. Then upload it to an Amazon S3 bucket.
+ **IAM permissions** – Make sure that you have the following IAM identities with the proper permissions:
  + An IAM identity that you authenticate with can carry out batch inference-related API operations. For more information, see [Required permissions for an IAM identity to submit and manage batch inference jobs](batch-inference-permissions.md).
  + The batch inference service role that you use can assume your identity, invoke the OpenAI model that you use, and has access to your batch JSONL file in S3. For more information, see [Service roles](security-iam-sr.md).

------
#### [ HTTP request ]
+ **Authentication** – You can authenticate with either your AWS credentials or with an Amazon Bedrock API key.

  Set up your AWS credentials or generate an Amazon Bedrock API key to authenticate your request.
  + To learn about setting up your AWS credentials, see [Programmatic access with AWS security credentials](https://docs.aws.amazon.com/IAM/latest/UserGuide/security-creds-programmatic-access.html).
  + To learn about Amazon Bedrock API keys and how to generate them, see the API keys section in the Build chapter.
+ **Endpoint** – Find the endpoint that corresponds to the AWS Region to use in [Amazon Bedrock Runtime endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-rt). If you use an AWS SDK, you might only need to specify the region code and not the whole endpoint when you set up the client.
+ **Model access** – Request access to an Amazon Bedrock model that supports this feature. For more information, see [Manage model access using SDK and CLI](model-access.md#model-access-modify).
+ **Batch JSONL file uploaded to S3** – Follow the steps at [Prepare your batch file](https://platform.openai.com/docs/guides/batch#1-prepare-your-batch-file) in the OpenAI documentation to prepare your batch file with the correct format. Then upload it to an Amazon S3 bucket.
+ **IAM permissions** – Make sure that you have the following IAM identities with the proper permissions:
  + An IAM identity that you authenticate with can carry out batch inference-related API operations. For more information, see [Required permissions for an IAM identity to submit and manage batch inference jobs](batch-inference-permissions.md).
  + The batch inference service role that you use can assume your identity, invoke the OpenAI model that you use, and has access to your batch JSONL file in S3. For more information, see [Service roles](security-iam-sr.md).

------

## Create an OpenAI batch job
Create a batch job

For details about the OpenAI Create batch API, refer to the following resources in the OpenAI documentation:
+ [Create batch](https://platform.openai.com/docs/api-reference/batch/create) – Details both the request and response.
+ [The request output object](https://platform.openai.com/docs/api-reference/batch/request-output) – Details the fields of the generated output from the batch job. Refer to this documentation when interpreting the results in your S3 bucket.

**Form the request**  
When forming the batch inference request, note the following Amazon Bedrock-specific fields and values:

**Request headers**
+ X-Amzn-Bedrock-RoleArn (required) – The Amazon Resource Name (ARN) of the batch inference service role. For more information, see [Create a custom service role for batch inference](batch-iam-sr.md).
+ X-Amzn-Bedrock-ModelId (required) – The ID of the foundation model to use in inference. For more information, see [Supported foundation models in Amazon Bedrock](models-supported.md).
+ X-Amzn-Bedrock-OutputEncryptionKeyId (optional) – The ID of a KMS key that you want to use to encrypt the output S3 files. For more information, see [Specifying server-side encryption with AWS KMS (SSE-KMS)](https://docs.aws.amazon.com/AmazonS3/latest/userguide/specifying-kms-encryption.html).
+ X-Amzn-Bedrock-Tags (optional) – A dictionary of keys and values that indicate tags to attach to the output. For more information, see [Tagging Amazon Bedrock resources](tagging.md).

**Request body parameters:**
+ endpoint – Must be `/v1/chat/completions`.
+ input_file_id – Specify the S3 URI of your batch JSONL file.

**Find the generated results**  
The creation response includes a batch ID. The results and error logging of the batch inference job are written to the S3 folder containing the input file. The results will be in a folder with the same name as the batch ID, as in the following folder structure:

```
---- {batch_input_folder}
        |---- {batch_input}.jsonl
        |---- {batch_id}
	           |---- {batch_input}.jsonl.out
	           |---- {batch_input}.jsonl.err
```
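For example, a short boto3 sketch that lists the generated result files could look like the following. The bucket name, input folder, and batch ID are placeholders.

```
import boto3

s3 = boto3.client("s3")

# Placeholder bucket, folder, and batch ID; replace them with your own values.
bucket = "amzn-s3-demo-bucket"
prefix = "batch-input-folder/batch_abc123/"   # folder named after the batch ID

response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
for item in response.get("Contents", []):
    # Expect files such as {batch_input}.jsonl.out and {batch_input}.jsonl.err
    print(item["Key"], item["Size"])
```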

To see examples of using the OpenAI Create batch API with different methods, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

To create a batch job with the OpenAI SDK, do the following:

1. Import the OpenAI SDK and set up the client with the following fields:
   + `base_url` – Prefix the Amazon Bedrock Runtime endpoint to `/openai/v1`, as in the following format:

     ```
     https://${bedrock-runtime-endpoint}/openai/v1
     ```
   + `api_key` – Specify an Amazon Bedrock API key.
   + `default_headers` – If you need to include any headers, you can include them as key-value pairs in this object. You can alternatively specify headers in the `extra_headers` when making a specific API call.

1. Use the [batches.create()](https://platform.openai.com/docs/api-reference/batch/create) method with the client.

Before running the following example, replace the placeholders in the following fields:
+ api_key – Replace *$AWS_BEARER_TOKEN_BEDROCK* with your actual API key.
+ X-Amzn-Bedrock-RoleArn – Replace *arn:aws:iam::123456789012:role/BatchServiceRole* with the actual batch inference service role you set up.
+ input_file_id – Replace *s3://amzn-s3-demo-bucket/openai-input.jsonl* with the actual S3 URI to which you uploaded your batch JSONL file.

The example calls the OpenAI Create batch job API in `us-west-2` and includes one piece of metadata.

```
from openai import OpenAI

client = OpenAI(
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1", 
    api_key="$AWS_BEARER_TOKEN_BEDROCK", # Replace with actual API key
    default_headers={
        "X-Amzn-Bedrock-RoleArn": "arn:aws:iam::123456789012:role/BatchServiceRole" # Replace with actual service role ARN
    }
)

job = client.batches.create(
    input_file_id="s3://amzn-s3-demo-bucket/openai-input.jsonl", # Replace with actual S3 URI
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        "description": "test input"
    },
    extra_headers={
        "X-Amzn-Bedrock-ModelId": "openai.gpt-oss-20b-1:0",
    }
)
print(job)
```

------
#### [ HTTP request ]

To create a batch job with a direct HTTP request, do the following:

1. Use the POST method and specify the URL by prefixing the Amazon Bedrock Runtime endpoint to `/openai/v1/batches`, as in the following format:

   ```
   https://${bedrock-runtime-endpoint}/openai/v1/batches
   ```

1. Specify your AWS credentials or an Amazon Bedrock API key in the `Authorization` header.

Before running the following example, first replace the placeholders in the following fields:
+ Authorization – Replace *$AWS_BEARER_TOKEN_BEDROCK* with your actual API key.
+ X-Amzn-Bedrock-RoleArn – Replace *arn:aws:iam::123456789012:role/BatchServiceRole* with the actual batch inference service role you set up.
+ input_file_id – Replace *s3://amzn-s3-demo-bucket/openai-input.jsonl* with the actual S3 URI to which you uploaded your batch JSONL file.

The following example calls the OpenAI Create batch API in `us-west-2` and includes one piece of metadata:

```
curl -X POST 'https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1/batches' \
    -H 'Authorization: Bearer $AWS_BEARER_TOKEN_BEDROCK' \
    -H 'Content-Type: application/json' \
    -H 'X-Amzn-Bedrock-ModelId: openai.gpt-oss-20b-1:0' \
    -H 'X-Amzn-Bedrock-RoleArn: arn:aws:iam::123456789012:role/BatchServiceRole' \
    -d '{
    "input_file_id": "s3://amzn-s3-demo-bucket/openai-input.jsonl",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h",
    "metadata": {"description": "test input"}
}'
```

------

## Retrieve an OpenAI batch job
Retrieve a batch job

For details about the OpenAI Retrieve batch API request and response, refer to [Retrieve batch](https://platform.openai.com/docs/api-reference/batch/retrieve).

When you make the request, you specify the ID of the batch job for which to get information. The response returns information about a batch job, including the output and error file names that you can look up in your S3 buckets.

To see examples of using the OpenAI Retrieve batch API with different methods, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

To retrieve a batch job with the OpenAI SDK, do the following:

1. Import the OpenAI SDK and set up the client with the following fields:
   + `base_url` – Prefix the Amazon Bedrock Runtime endpoint to `/openai/v1`, as in the following format:

     ```
     https://${bedrock-runtime-endpoint}/openai/v1
     ```
   + `api_key` – Specify an Amazon Bedrock API key.
   + `default_headers` – If you need to include any headers, you can include them as key-value pairs in this object. You can alternatively specify headers in the `extra_headers` when making a specific API call.

1. Use the [batches.retrieve()](https://platform.openai.com/docs/api-reference/batch/retrieve) method with the client and specify the ID of the batch for which to retrieve information.

Before running the following example, replace the placeholders in the following fields:
+ api_key – Replace *$AWS_BEARER_TOKEN_BEDROCK* with your actual API key.
+ batch_id – Replace *batch_abc123* with the actual ID of your batch job.

The example calls the OpenAI Retrieve batch job API in `us-west-2` on a batch job whose ID is *batch_abc123*.

```
from openai import OpenAI

client = OpenAI(
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1", 
    api_key="$AWS_BEARER_TOKEN_BEDROCK" # Replace with actual API key
)

job = client.batches.retrieve(batch_id="batch_abc123") # Replace with actual ID

print(job)
```
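If you want to wait for the job to finish, you can poll the same method until the batch reaches a terminal status. The following sketch assumes the standard OpenAI Batch API status values and uses the same placeholder API key and batch ID as above.

```
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1",
    api_key="$AWS_BEARER_TOKEN_BEDROCK"  # Replace with actual API key
)

# Poll until the batch reaches a terminal status.
terminal_statuses = {"completed", "failed", "expired", "cancelled"}
job = client.batches.retrieve(batch_id="batch_abc123")  # Replace with actual ID
while job.status not in terminal_statuses:
    time.sleep(60)
    job = client.batches.retrieve(batch_id="batch_abc123")

print(job.status)
```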

------
#### [ HTTP request ]

To retrieve a batch job with a direct HTTP request, do the following:

1. Use the GET method and specify the URL by prefixing the Amazon Bedrock Runtime endpoint to `/openai/v1/batches/${batch_id}`, as in the following format:

   ```
   https://${bedrock-runtime-endpoint}/openai/v1/batches/batch_abc123
   ```

1. Specify your AWS credentials or an Amazon Bedrock API key in the `Authorization` header.

Before running the following example, first replace the placeholders in the following fields:
+ Authorization – Replace *$AWS_BEARER_TOKEN_BEDROCK* with your actual API key.
+ batch_abc123 – In the path, replace this value with the actual ID of your batch job.

The following example calls the OpenAI Retrieve batch API in `us-west-2` on a batch job whose ID is *batch_abc123*.

```
curl -X GET 'https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1/batches/batch_abc123' \
    -H 'Authorization: Bearer $AWS_BEARER_TOKEN_BEDROCK'
```

------

## List OpenAI batch jobs
List batch jobs

For details about the OpenAI List batches API request and response, refer to [List batches](https://platform.openai.com/docs/api-reference/batch/list). The response returns an array of information about your batch jobs.

When you make the request, you can include query parameters to filter the results. The response returns information about a batch job, including the output and error file names that you can look up in your S3 buckets.

To see examples of using the OpenAI List batches API with different methods, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

To list batch jobs with the OpenAI SDK, do the following:

1. Import the OpenAI SDK and set up the client with the following fields:
   + `base_url` – Prefix the Amazon Bedrock Runtime endpoint to `/openai/v1`, as in the following format:

     ```
     https://${bedrock-runtime-endpoint}/openai/v1
     ```
   + `api_key` – Specify an Amazon Bedrock API key.
   + `default_headers` – If you need to include any headers, you can include them as key-value pairs in this object. You can alternatively specify headers in the `extra_headers` when making a specific API call.

1. Use the [batches.list()](https://platform.openai.com/docs/api-reference/batch/list) method with the client. You can include any of the optional parameters.

Before running the following example, replace the placeholders in the following fields:
+ api_key – Replace *$AWS_BEARER_TOKEN_BEDROCK* with your actual API key.

The example calls the OpenAI List batch jobs API in `us-west-2` and specifies a limit of 2 results to return.

```
from openai import OpenAI

client = OpenAI(
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1", 
    api_key="$AWS_BEARER_TOKEN_BEDROCK" # Replace with actual API key
)

job = client.batches.list(limit=2)

print(job)
```

------
#### [ HTTP request ]

To list batch jobs with a direct HTTP request, do the following:

1. Use the GET method and specify the URL by prefixing the Amazon Bedrock Runtime endpoint to `/openai/v1/batches`, as in the following format:

   ```
   https://${bedrock-runtime-endpoint}/openai/v1/batches
   ```

   You can include any of the optional query parameters.

1. Specify your AWS credentials or an Amazon Bedrock API key in the `Authorization` header.

Before running the following example, first replace the placeholders in the following fields:
+ Authorization – Replace *$AWS_BEARER_TOKEN_BEDROCK* with your actual API key.

The following example calls the OpenAI List batches API in `us-west-2` and specifies a limit of 2 results to return.

```
curl -X GET 'https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1/batches?limit=2' \
    -H 'Authorization: Bearer $AWS_BEARER_TOKEN_BEDROCK'
```

------

## Cancel an OpenAI batch job
Cancel a batch job

For details about the OpenAI Cancel batch API request and response, refer to [Cancel batch](https://platform.openai.com/docs/api-reference/batch/cancel). The response returns information about the cancelled batch job.

When you make the request, you specify the ID of the batch job that you want to cancel.

To see examples of using the OpenAI Cancel batch API with different methods, choose the tab for your preferred method, and then follow the steps:

------
#### [ OpenAI SDK (Python) ]

To cancel a batch job with the OpenAI SDK, do the following:

1. Import the OpenAI SDK and set up the client with the following fields:
   + `base_url` – Prefix the Amazon Bedrock Runtime endpoint to `/openai/v1`, as in the following format:

     ```
     https://${bedrock-runtime-endpoint}/openai/v1
     ```
   + `api_key` – Specify an Amazon Bedrock API key.
   + `default_headers` – If you need to include any headers, you can include them as key-value pairs in this object. You can alternatively specify headers in the `extra_headers` when making a specific API call.

1. Use the [batches.cancel()](https://platform.openai.com/docs/api-reference/batch/cancel) method with the client and specify the ID of the batch job that you want to cancel.

Before running the following example, replace the placeholders in the following fields:
+ api_key – Replace *$AWS_BEARER_TOKEN_BEDROCK* with your actual API key.
+ batch_id – Replace *batch_abc123* with the actual ID of your batch job.

The example calls the OpenAI Cancel batch job API in `us-west-2` on a batch job whose ID is *batch_abc123*.

```
from openai import OpenAI

client = OpenAI(
    base_url="https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1", 
    api_key="$AWS_BEARER_TOKEN_BEDROCK" # Replace with actual API key
)

job = client.batches.cancel(batch_id="batch_abc123") # Replace with actual ID

print(job)
```

------
#### [ HTTP request ]

To cancel a batch job with a direct HTTP request, do the following:

1. Use the POST method and specify the URL by prefixing the Amazon Bedrock Runtime endpoint to `/openai/v1/batches/${batch_id}/cancel`, as in the following format:

   ```
   https://${bedrock-runtime-endpoint}/openai/v1/batches/batch_abc123/cancel
   ```

1. Specify your AWS credentials or an Amazon Bedrock API key in the `Authorization` header.

Before running the following example, first replace the placeholders in the following fields:
+ Authorization – Replace *$AWS_BEARER_TOKEN_BEDROCK* with your actual API key.
+ batch_abc123 – In the path, replace this value with the actual ID of your batch job.

The following example calls the OpenAI Cancel batch API in `us-west-2` on a batch job whose ID is *batch_abc123*.

```
curl -X POST 'https://bedrock-runtime.us-west-2.amazonaws.com/openai/v1/batches/batch_abc123/cancel' \
    -H 'Authorization: Bearer $AWS_BEARER_TOKEN_BEDROCK'
```

------

# Increase throughput with cross-Region inference
Cross-Region inference

With cross-Region inference, you can choose either a cross-Region inference profile tied to a specific geography (such as US or EU), or you can choose a global inference profile. When you choose an inference profile tied to a specific geography, Amazon Bedrock automatically selects the optimal commercial AWS Region within that geography to process your inference request. With global inference profiles, Amazon Bedrock automatically selects the optimal commercial AWS Region to process the request, which optimizes available resources and increases model throughput.

Both types of cross-Region inference work through [inference profiles](inference-profiles.md), which define a foundation model (FM) and the AWS Regions to which requests can be routed. When running model inference in on-demand mode, your requests might be restricted by service quotas or during peak usage times. Cross-Region inference enables you to seamlessly manage unplanned traffic bursts by utilizing compute across different AWS Regions.

You can also increase throughput for a model by purchasing [Provisioned Throughput](prov-throughput.md). Inference profiles currently don't support Provisioned Throughput.

To see the Regions and models with which you can use inference profiles to run cross-Region inference, refer to [Supported Regions and models for inference profiles](inference-profiles-support.md).

**Topics**
+ [Choosing between Geographic and Global cross-Region inference](#cross-region-inference-comparison)
+ [General considerations](#cross-region-inference-general-considerations)
+ [Geographic cross-Region inference](geographic-cross-region-inference.md)
+ [Global cross-Region inference](global-cross-region-inference.md)

## Choosing between Geographic and Global cross-Region inference


Amazon Bedrock provides two types of cross-Region inference profiles, each designed for different use cases and compliance requirements:


| Feature | Geographic Cross-Region Inference | Global Cross-Region Inference | Recommendation | 
| --- | --- | --- | --- | 
| Data residency | Within geographic boundaries (US, EU, APAC, etc.) | Any supported AWS commercial Region worldwide | Choose Geographic for compliance requirements | 
| Throughput | Higher than single-region | Highest available | Choose Global for maximum performance | 
| Cost | Standard pricing | Approximately 10% savings | Choose Global for cost optimization | 
| SCP requirements | Allow all destination Regions in profile | Allow "aws:RequestedRegion": "unspecified" | Configure based on your organizational policies | 
| Best suited for | Organizations with data residency regulations | Organizations prioritizing cost and performance | Assess your compliance and performance needs | 

Choose Geographic cross-Region inference when you have data residency requirements and need to ensure data processing remains within specific geographic boundaries. Choose Global cross-Region inference when you want maximum throughput and cost savings without geographic restrictions.

## General considerations


Note the following information about cross-Region inference:
+ There's no additional routing cost for using cross-Region inference. The price is calculated based on the Region from which you call an inference profile. For information about pricing, see [Amazon Bedrock pricing](https://aws.amazon.com/bedrock/pricing/).
+ Cross-Region inference can route requests to AWS Regions that are not manually enabled in your AWS account. Manual Region enablement is not required for cross-Region inference to function.
+ All data transmitted during cross-Region operations remains on the AWS network and does not traverse the public internet. Data is encrypted in transit between AWS Regions.
+ All cross-Region inference requests are logged in CloudTrail in your source Region. Look for the `additionalEventData.inferenceRegion` field to identify where requests were processed.
+ AWS services powered by Amazon Bedrock might also use cross-Region inference (CRIS). See the service-specific documentation for more details.

# Geographic cross-Region inference


Geographic cross-Region inference keeps data processing within specified geographic boundaries (US, EU, APAC, etc.) while providing higher throughput than single-region inference. This option is ideal for organizations with data residency requirements and compliance regulations.

## Geographic cross-Region inference considerations


Note the following information about Geographic cross-Region inference:
+ Cross-Region inference requests to an inference profile tied to a geography (e.g. US, EU and APAC) are kept within the AWS Regions that are part of the geography where the data originally resides. For example, a request made within the US is kept within the AWS Regions in the US. Although the data remains stored only in the source Region, your input prompts and output results might move outside of your source Region during cross-Region inference. All data will be transmitted encrypted across Amazon's secure network.
+ To see the default quotas for cross-Region throughput when using inference profiles tied to a geography (such as US, EU, and APAC), refer to the **Cross-region model inference requests per minute for ${Model}** and **Cross-region model inference tokens per minute for ${Model}** values in [Amazon Bedrock service quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock) in the *AWS General Reference*.

## IAM policy requirements for Geographic cross-Region inference


To allow an IAM user or role to invoke a Geographic cross-Region inference profile, you need to allow access to the following resources:

1. The geography-specific cross-Region inference profile (these profiles have geographic prefixes such as `us`, `eu`, `apac`)

1. The foundation model in the source Region

1. The foundation model in all destination Regions listed in the geographic profile

The following example policy grants the required permissions to use the Claude Sonnet 4.5 foundation model with a Geographic cross-Region inference profile for the US, where the source Region is `us-east-1` and the destination Regions are `us-east-1`, `us-east-2`, and `us-west-2`:

```
{
    "Version": "2012-10-17"		 	 	 ,
    "Statement": [
        {
            "Sid": "GrantGeoCrisInferenceProfileAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:us-east-1:<ACCOUNT_ID>:inference-profile/us.anthropic.claude-sonnet-4-5-20250929-v1:0"
            ]
        },
        {
            "Sid": "GrantGeoCrisModelAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0",
                "arn:aws:bedrock:us-east-2::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0",
                "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-sonnet-4-5-20250929-v1:0"
            ],
            "Condition": {
                "StringEquals": {
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:us-east-1:<ACCOUNT_ID>:inference-profile/us.anthropic.claude-sonnet-4-5-20250929-v1:0"
                }
            }
        }
    ]
}
```

The first statement grants `bedrock:InvokeModel` API access to the Geographic cross-Region inference profile for requests originating from the requesting Region. The second statement grants `bedrock:InvokeModel` API access to the foundation model in both the requesting Region and all destination Regions listed in the inference profile.

## Service Control Policy requirements for Geographic cross-Region inference


Many organizations implement Regional access controls through Service Control Policies in AWS Organizations for security and compliance. If your organization's security policy uses SCPs to block unused Regions, you must ensure that your Region-specific SCP conditions allow access to all destination Regions listed in the Geographic cross-Region inference profile for your source Region.

For Geographic cross-Region inference, you need to understand the relationship between your source Region (where you make the API call) and the destination Regions (where requests can be routed). Check the inference profile documentation to identify all destination Regions for your source Region, then ensure your SCPs allow access to all those destination Regions.

For example, if you're calling from us-east-1 (source Region) using the US Anthropic Claude Sonnet 4.5 Geographic profile, requests can be routed to us-east-1, us-east-2, and us-west-2 (destination Regions). If an SCP restricts access to only us-east-1, cross-Region inference will fail when trying to route to us-east-2 or us-west-2. Therefore, you need to allow all three destination regions in your SCP, regardless of which Region you're calling from.

When configuring SCPs for Region exclusion, remember that blocking any destination Region in the inference profile will prevent cross-Region inference from functioning properly, even if your source Region remains accessible. For SCP requirements for Global cross-Region inference, see [Service Control Policy requirements for Global cross-Region inference](global-cross-region-inference.md#global-cris-scp-setup).

To improve security, consider using the `bedrock:InferenceProfileArn` condition to limit access to specific inference profiles. This allows you to grant access to the required Regions while restricting which inference profiles can be used.
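As a simplified sketch only (real SCPs typically also exempt global services and administrative roles, and may use `NotAction` patterns), an SCP that denies model invocation outside the destination Regions of the US profile might look like the following:

```
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyInvocationOutsideUsProfileDestinations",
            "Effect": "Deny",
            "Action": "bedrock:InvokeModel",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": ["us-east-1", "us-east-2", "us-west-2"]
                }
            }
        }
    ]
}
```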

## Use Geographic cross-Region inference


To use Geographic cross-Region inference, you include an [inference profile](inference-profiles.md) when running model inference in the following ways:
+ **On-demand model inference** – Specify the ID of the inference profile as the `modelId` when sending an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html), [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html), [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html), or [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) request. An inference profile defines one or more Regions to which it can route inference requests originating from your source Region. Cross-Region inference increases throughput and performance by dynamically routing model invocation requests across the Regions defined in the inference profile; routing takes into account user traffic, demand, and resource utilization. For more information, see [Submit prompts and generate responses with model inference](inference.md).
+ **Batch inference** – Submit requests asynchronously with batch inference by specifying the ID of the inference profile as the `modelId` when sending a [CreateModelInvocationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateModelInvocationJob.html) request. Using an inference profile lets you utilize compute across multiple AWS Regions and achieve faster processing times for your batch jobs. After the job is complete, you can retrieve the output files from the Amazon S3 bucket in the source Region.
+ **Agents** – Specify the ID of the inference profile in the `foundationModel` field in a [https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateAgent.html](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateAgent.html) request. For more information, see [Create and configure agent manually](agents-create.md).
+ **Knowledge base response generation** – You can use cross-Region inference when generating a response after querying a knowledge base. For more information, see [Test your knowledge base with queries and responses](knowledge-base-test.md).
+ **Model evaluation** – You can submit an inference profile as a model to evaluate when submitting a model evaluation job. For more information, see [Evaluate the performance of Amazon Bedrock resources](evaluation.md).
+ **Prompt management** – You can use cross-Region inference when generating a response for a prompt you created in Prompt management. For more information, see [Construct and store reusable prompts with Prompt management in Amazon Bedrock](prompt-management.md)
+ **Prompt flows** – You can use cross-Region inference when generating a response for a prompt you define inline in a prompt node in a prompt flow. For more information, see [Build an end-to-end generative AI workflow with Amazon Bedrock Flows](flows.md).

To learn how to use an inference profile to send model invocation requests across Regions, see [Use an inference profile in model invocation](inference-profiles-use.md).
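For example, a minimal boto3 sketch that sends a `Converse` request through the US inference profile from `us-east-1` could look like the following. The Region and inference profile ID are placeholders; use a profile supported in your source Region.

```
import boto3

# Placeholder Region and inference profile ID; use values supported in your source Region.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="us.anthropic.claude-sonnet-4-5-20250929-v1:0",  # cross-Region inference profile ID
    messages=[
        {"role": "user", "content": [{"text": "Summarize cross-Region inference in one sentence."}]}
    ]
)
print(response["output"]["message"]["content"][0]["text"])
```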

To learn more about cross-Region inference, see [Getting started with cross-Region inference in Amazon Bedrock](https://aws.amazon.com/blogs/machine-learning/getting-started-with-cross-region-inference-in-amazon-bedrock/).

For detailed information about global cross-Region inference, including IAM setup and service quota management, see [Global cross-Region inference](global-cross-region-inference.md).

# Global cross-Region inference


Global cross-Region inference extends cross-Region inference beyond geographic boundaries, enabling the routing of inference requests to supported commercial AWS Regions worldwide, optimizing available resources and enabling higher model throughput.

## Benefits of global cross-Region inference


Global cross-Region inference for Anthropic's Claude Sonnet 4.5 delivers multiple advantages over traditional geographic cross-Region inference profiles:
+ **Enhanced throughput during peak demand** – Global cross-Region inference provides improved resilience during periods of peak demand by automatically routing requests to AWS Regions with available capacity. This dynamic routing happens seamlessly without additional configuration or intervention from developers. Unlike traditional approaches that might require complex client-side load balancing between AWS Regions, global cross-Region inference handles traffic spikes automatically. This is particularly important for business-critical applications where downtime or degraded performance can have significant financial or reputational impacts.
+ **Cost-efficiency** – Global cross-Region inference for Anthropic's Claude Sonnet 4.5 offers approximately 10% savings on both input and output token pricing compared to geographic cross-Region inference. The price is calculated based on the AWS Region from which the request is made (source AWS Region). This means organizations can benefit from improved resilience with even lower costs. This pricing model makes global cross-Region inference a cost-effective solution for organizations looking to optimize their generative AI deployments. By improving resource utilization and enabling higher throughput without additional costs, it helps organizations maximize the value of their investment in Amazon Bedrock.
+ **Streamlined monitoring** – When using global cross-Region inference, CloudWatch and CloudTrail continue to record log entries in your source AWS Region, simplifying observability and management. Even though your requests are processed across different AWS Regions worldwide, you maintain a centralized view of your application's performance and usage patterns through your familiar AWS monitoring tools.
+ **On-demand quota flexibility** – With global cross-Region inference, your workloads are no longer limited by individual Regional capacity. Instead of being restricted to the capacity available in a specific AWS Region, your requests can be dynamically routed across the AWS global infrastructure. This provides access to a much larger pool of resources, making it less complicated to handle high-volume workloads and sudden traffic spikes.

## Global cross-Region inference considerations


Note the following information about Global cross-Region inference:
+ Global cross-Region inference profiles provide higher throughput than inference profiles tied to a particular geography, which in turn provide higher throughput than single-Region inference.
+ To see the default quotas for cross-Region throughput when using Global inference profiles, refer to the **Global cross-Region model inference requests per minute** and **Global cross-Region model inference tokens per minute** values for your model in [Amazon Bedrock service quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock) in the *AWS General Reference*.

  You can request, view, and manage quotas for the Global Cross-Region Inference Profile from the [Service Quotas console](https://console.aws.amazon.com/servicequotas/home/services/bedrock/quotas) or by using AWS CLI commands in your **source region**.

## IAM policy requirements for global cross-Region inference


To enable global cross-Region inference for your users, you must apply a three-part IAM policy to the IAM role that makes the inference requests. The following example IAM policy provides granular control. Replace `<REQUESTING REGION>` with the AWS Region you are operating in, `<ACCOUNT>` with your AWS account ID, and `<MODEL NAME>` with the model portion of the global inference profile ID.

```
{
    "Version": "2012-10-17"		 	 	 ,
    "Statement": [
        {
            "Sid": "GrantGlobalCrisInferenceProfileRegionAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "<REQUESTING REGION>"
                }
            }
        },
        {
            "Sid": "GrantGlobalCrisInferenceProfileInRegionModelAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:<REQUESTING REGION>::foundation-model/<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "<REQUESTING REGION>",
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
                }
            }
        },
        {
            "Sid": "GrantGlobalCrisInferenceProfileGlobalModelAccess",
            "Effect": "Allow",
            "Action": "bedrock:InvokeModel",
            "Resource": [
                "arn:aws:bedrock:::foundation-model/<MODEL NAME>"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:RequestedRegion": "unspecified",
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:<REQUESTING REGION>:<ACCOUNT>:inference-profile/global.<MODEL NAME>"
                }
            }
        }
    ]
}
```

The first part of the policy grants access to the Regional inference profile in your requesting AWS Region. The second part provides access to the Regional FM resource. The third part grants access to the global FM resource, which enables the cross-Region routing capability.

When implementing these policies, make sure all three resource Amazon Resource Names (ARNs) are included in your IAM statements:
+ The Regional inference profile ARN follows the pattern `arn:aws:bedrock:REGION:ACCOUNT:inference-profile/global.MODEL-NAME`. This is used to give access to the global inference profile in the source AWS Region.
+ The Regional FM uses `arn:aws:bedrock:REGION::foundation-model/MODEL-NAME`. This is used to give access to the FM in the source AWS Region.
+ The global FM requires `arn:aws:bedrock:::foundation-model/MODEL-NAME`. This is used to give access to the FM in different global AWS Regions.

The global FM ARN has no AWS Region or account specified, which is intentional and required for the cross-Region functionality.
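If you construct these policies programmatically, the following minimal Python sketch illustrates how the three ARN patterns differ. The Region, account ID, and model name values are placeholders for illustration only.

```
# Placeholder values for illustration; substitute your own Region, account ID, and model name.
requesting_region = "us-east-1"
account_id = "111122223333"
model_name = "anthropic.claude-sonnet-4-5-20250929-v1:0"

# Regional inference profile ARN (access to the global inference profile in the source Region).
regional_profile_arn = (
    f"arn:aws:bedrock:{requesting_region}:{account_id}:inference-profile/global.{model_name}"
)

# Regional foundation model ARN (access to the FM in the source Region).
regional_fm_arn = f"arn:aws:bedrock:{requesting_region}::foundation-model/{model_name}"

# Global foundation model ARN (no Region or account; enables cross-Region routing).
global_fm_arn = f"arn:aws:bedrock:::foundation-model/{model_name}"

print(regional_profile_arn)
print(regional_fm_arn)
print(global_fm_arn)
```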

### Disable global cross-Region inference


You can choose from two primary approaches to deny access to global CRIS for specific IAM roles, each with different use cases and implications:
+ **Remove an IAM policy** – The first method involves removing one or more of the three required IAM policies from the role's permissions. Because global CRIS requires all three policies to function, removing any one of them results in denied access.
+ **Implement a deny policy** – The second approach is to implement an explicit deny policy that specifically targets global CRIS inference profiles. This method provides clear documentation of your security intent and ensures that even if someone accidentally adds the required allow policies later, the explicit deny takes precedence. The deny policy should use a `StringEquals` condition matching the pattern `"aws:RequestedRegion": "unspecified"`, paired with an `ArnLike` condition on `bedrock:InferenceProfileArn` that matches inference profiles with the `global` prefix.

When implementing deny policies, it's crucial to understand that global CRIS changes how the `aws:RequestedRegion` field behaves. Traditional AWS Region-based deny policies that use `StringEquals` conditions with specific AWS Region names, such as `"aws:RequestedRegion": "us-west-2"`, will not work as expected with global CRIS because the service sets this field to `unspecified` rather than the actual destination AWS Region. However, as mentioned earlier, a condition matching `"aws:RequestedRegion": "unspecified"` will result in the deny effect.
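The following is a minimal sketch of the second approach, using boto3 to attach an explicit deny policy as an inline policy on a role. The role name and policy name are hypothetical placeholders; the conditions mirror the pattern described above.

```
import json
import boto3

iam = boto3.client("iam")

# Explicit deny for any request served through a global cross-Region inference profile.
deny_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyGlobalCrossRegionInference",
            "Effect": "Deny",
            "Action": "bedrock:InvokeModel*",
            "Resource": "*",
            "Condition": {
                "StringEquals": {"aws:RequestedRegion": "unspecified"},
                "ArnLike": {
                    "bedrock:InferenceProfileArn": "arn:aws:bedrock:*:*:inference-profile/global.*"
                }
            }
        }
    ]
}

# Attach the deny policy as an inline policy on the role (names are examples only).
iam.put_role_policy(
    RoleName="ExampleBedrockRole",
    PolicyName="DenyGlobalCris",
    PolicyDocument=json.dumps(deny_policy),
)
```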

## Service Control Policy requirements for Global cross-Region inference


For Global cross-Region inference, if your organization's security policy uses SCPs to block unused Regions, you must update your region-specific SCP conditions to allow access with `"aws:RequestedRegion": "unspecified"`. This condition is specific to Amazon Bedrock Global cross-Region inference and ensures that requests can be routed to all supported AWS commercial Regions.

The following example SCP blocks all AWS API calls outside of approved Regions while allowing Amazon Bedrock Global cross-Region inference calls that use `"unspecified"` as the Region for global routing:

```
{
    "Version": "2012-10-17"		 	 	 ,
    "Statement": [
        {
            "Sid": "DenyAllOutsideApprovedRegions",
            "Effect": "Deny",
            "Action": "*",
            "Resource": "*",
            "Condition": {
                "StringNotEquals": {
                    "aws:RequestedRegion": [
                        "us-east-1",
                        "us-east-2",
                        "us-west-2",
                        "unspecified"
                    ]
                }
            }
        }
    ]
}
```

### Disable global cross-Region inference


Organizations with data residency or compliance requirements should assess whether Global cross-Region inference fits their compliance framework, because requests may be processed in any supported AWS commercial Region. To explicitly disable Global cross-Region inference, include the following statement in an SCP:

```
{
    "Effect": "Deny",
    "Action": "bedrock:*",
    "Resource": "*",
    "Condition": {
        "StringEquals": {
            "aws:RequestedRegion": "unspecified"
        },
        "ArnLike": {
            "bedrock:InferenceProfileArn": "arn:aws:bedrock:*:*:inference-profile/global.*"
        }
    }
}
```

This SCP explicitly denies Global cross-Region inference because the `"aws:RequestedRegion"` is `"unspecified"` and the `"ArnLike"` condition targets inference profiles with the `global` prefix in the ARN.

### AWS Control Tower implementation


Manually editing SCPs managed by AWS Control Tower is strongly discouraged as it can cause drift. Instead, use the mechanisms provided by Control Tower to manage these exceptions. The core principles involve either extending existing region-deny controls or enabling regions and then applying a custom, conditional blocking policy.

For detailed, step-by-step guidance on implementing cross-Region inference with Control Tower, see the blog post [Enable Amazon Bedrock cross-Region inference in multi-account environments](https://aws.amazon.com/blogs/machine-learning/enable-amazon-bedrock-cross-region-inference-in-multi-account-environments/). This covers extending existing Region deny SCPs, enabling denied Regions with custom SCPs, and using Customizations for AWS Control Tower (CfCT) to deploy custom SCPs as infrastructure as code.

## Request limit increases for global cross-Region inference


When using global CRIS inference profiles, you can call global CRIS from more than 20 supported source AWS Regions. Because the quota is a global limit, requests to view, manage, or increase quotas for global cross-Region inference profiles must be made through the Service Quotas console or the AWS Command Line Interface (AWS CLI) in your source AWS Region.

Complete the following steps to request a limit increase:

1. Sign in to the Service Quotas console in your AWS account.

1. In the navigation pane, choose **AWS services**.

1. From the list of services, find and choose **Amazon Bedrock**.

1. In the list of quotas for Amazon Bedrock, use the search filter to find the specific global CRIS quotas. For example:
   + Global cross-Region model inference tokens per minute for Anthropic Claude Sonnet 4.5 V1

1. Select the quota you want to increase.

1. Choose **Request increase at account level**.

1. Enter your desired new quota value.

1. Choose **Request** to submit your request.
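If you prefer to script these steps, the following boto3 sketch finds a global CRIS quota by name and submits an increase request through Service Quotas. The quota name filter and desired value shown are examples only.

```
import boto3

# Use the Service Quotas API in your source Region.
sq = boto3.client("service-quotas", region_name="us-east-1")

# Find the global CRIS quotas for Amazon Bedrock (example filter on the quota name).
paginator = sq.get_paginator("list_service_quotas")
matches = [
    quota
    for page in paginator.paginate(ServiceCode="bedrock")
    for quota in page["Quotas"]
    if "Global cross-Region model inference tokens per minute" in quota["QuotaName"]
]

for quota in matches:
    print(f"{quota['QuotaName']}: current value {quota['Value']}")

# Submit an increase request for the first match (the desired value is an example).
if matches:
    sq.request_service_quota_increase(
        ServiceCode="bedrock",
        QuotaCode=matches[0]["QuotaCode"],
        DesiredValue=2000000,
    )
```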

When calculating your required quota increase, remember to account for the burndown rate, which is the rate at which input and output tokens are converted into token quota usage for the throttling system. The following models have a **5x burndown rate for output tokens (1 output token consumes 5 tokens from your quotas)**:
+ Anthropic Claude Opus 4
+ Anthropic Claude Sonnet 4.5
+ Anthropic Claude Sonnet 4
+ Anthropic Claude 3.7 Sonnet

For all other models, the burndown rate is **1:1** (1 output token consumes 1 token from your quota). For input tokens, the token-to-quota ratio is always 1:1. The total number of tokens counted against your quota for a request is calculated as follows:

`Input token count + Cache write input tokens + (Output token count x Burndown rate)`
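For example, a request to one of the 5x-burndown models that uses 1,000 input tokens, 200 cache write input tokens, and 500 output tokens counts as 1,000 + 200 + (500 x 5) = 3,700 tokens against your tokens-per-minute quota. The following minimal Python sketch performs the same calculation:

```
def quota_tokens(input_tokens, cache_write_tokens, output_tokens, burndown_rate=1):
    """Return the number of tokens counted against the tokens-per-minute quota for one request."""
    return input_tokens + cache_write_tokens + (output_tokens * burndown_rate)

# Example: a model with a 5x output-token burndown rate.
print(quota_tokens(input_tokens=1000, cache_write_tokens=200, output_tokens=500, burndown_rate=5))  # 3700
```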

## Use Global cross-Region inference


To use global cross-Region inference with Anthropic's Claude Sonnet 4.5, developers must complete the following key steps:
+ **Use the global inference profile ID** – When making API calls to Amazon Bedrock, specify the global inference profile ID for Anthropic's Claude Sonnet 4.5 (`global.anthropic.claude-sonnet-4-5-20250929-v1:0`) instead of an AWS Region-specific model ID.
+ **Configure IAM permissions** – Grant appropriate IAM permissions to access the inference profile and FMs in potential destination AWS Regions.

Global cross-Region inference is supported for:
+ On-demand model inference
+ Batch inference
+ Agents
+ Model evaluation
+ Prompt management
+ Prompt flows


## Implement global cross-Region inference


Implementing global cross-Region inference with Anthropic's Claude Sonnet 4.5 is straightforward, requiring only a few changes to your existing application code. The following is an example of how to update your code in Python:

```
import boto3

# Create an Amazon Bedrock Runtime client in your source Region.
bedrock = boto3.client('bedrock-runtime', region_name='us-east-1')

# Use the global inference profile ID instead of a Region-specific model ID.
model_id = "global.anthropic.claude-sonnet-4-5-20250929-v1:0"

response = bedrock.converse(
    messages=[{"role": "user", "content": [{"text": "Explain cloud computing in 2 sentences."}]}],
    modelId=model_id,
)

print("Response:", response['output']['message']['content'][0]['text'])
print("Token usage:", response['usage'])
print("Total tokens:", response['usage']['totalTokens'])
```

# Set up a model invocation resource using inference profiles
Inference profiles

*Inference profiles* are a resource in Amazon Bedrock that define a model and one or more Regions to which the inference profile can route model invocation requests. You can use inference profiles for the following tasks:
+ **Track usage metrics** – Set up CloudWatch logs and submit model invocation requests with an application inference profile to collect usage metrics for model invocation. You can examine these metrics when you view information about the inference profile and use them to inform your decisions. For more information about how to set up CloudWatch logs, see [Monitor model invocation using CloudWatch Logs and Amazon S3](model-invocation-logging.md).
+ **Use tags to monitor costs** – Attach tags to an application inference profile to track costs when you submit on-demand model invocation requests. For more information on how to use tags for cost allocation, see [Organizing and tracking costs using AWS cost allocation tags](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/cost-alloc-tags.html) in the AWS Billing user guide.
+ **Cross-Region inference** – Increase your throughput by using an inference profile that includes multiple AWS Regions. The inference profile will distribute model invocation requests across these Regions to increase throughput and performance. For more information about cross-Region inference, see [Increase throughput with cross-Region inference](cross-region-inference.md).

Amazon Bedrock offers the following types of inference profiles:
+ **Cross-Region (system-defined) inference profiles** – Inference profiles that are predefined in Amazon Bedrock and include multiple Regions to which requests for a model can be routed.
+ **Application inference profiles** – Inference profiles that a user creates to track costs and model usage. You can create an inference profile that routes model invocation requests to one Region or to multiple Regions:
  + To create an inference profile that tracks costs and usage for a model in one Region, specify the foundation model in the Region to which you want the inference profile to route requests.
  + To create an inference profile that tracks costs and usage for a model across multiple Regions, specify the cross-Region (system-defined) inference profile that defines the model and Regions to which you want the inference profile to route requests, as shown in the sketch after this list.
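The following is a minimal boto3 sketch of creating an application inference profile that copies from a cross-Region (system-defined) inference profile and attaches a cost-allocation tag. The profile name, tag, account ID, and source inference profile ARN are placeholders for illustration.

```
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

# Create an application inference profile that copies from a cross-Region
# (system-defined) inference profile and tag it for cost tracking.
response = bedrock.create_inference_profile(
    inferenceProfileName="claims-app-profile",
    description="Tracks usage and cost for the claims application",
    modelSource={
        "copyFrom": "arn:aws:bedrock:us-east-1:111122223333:inference-profile/us.anthropic.claude-3-5-sonnet-20240620-v1:0"
    },
    tags=[{"key": "project", "value": "claims"}],
)

print(response["inferenceProfileArn"])
```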

You can use inference profiles with the following features to route requests to multiple Regions and to track usage and cost for invocation requests made with these features:
+ Model inference – Use an inference profile when running model invocation by choosing an inference profile in a playground in the Amazon Bedrock console, or by specifying the ARN of the inference profile when calling the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html), [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html), [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html), and [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) operations. For more information, see [Submit prompts and generate responses with model inference](inference.md).
+ Knowledge base vector embedding and response generation – Use an inference profile when generating a response after querying a knowledge base or when parsing non-textual information in a data source. For more information, see [Test your knowledge base with queries and responses](knowledge-base-test.md) and [Parsing options for your data source](kb-advanced-parsing.md).
+ Model evaluation – You can submit an inference profile as a model to evaluate when submitting a model evaluation job. For more information, see [Evaluate the performance of Amazon Bedrock resources](evaluation.md).
+ Prompt management – You can use an inference profile when generating a response for a prompt you created in Prompt management. For more information, see [Construct and store reusable prompts with Prompt management in Amazon Bedrock](prompt-management.md).
+ Flows – You can use an inference profile when generating a response for a prompt you define inline in a prompt node in a flow. For more information, see [Build an end-to-end generative AI workflow with Amazon Bedrock Flows](flows.md).

The price for using an inference profile is calculated based on the price of the model in the Region from which you call the inference profile. For information about pricing, see [Amazon Bedrock pricing](https://aws.amazon.com/bedrock/pricing/).

For more details about the throughput that a cross-Region inference profile can offer, see [Increase throughput with cross-Region inference](cross-region-inference.md).

**Topics**
+ [Supported Regions and models for inference profiles](inference-profiles-support.md)
+ [Prerequisites for inference profiles](inference-profiles-prereq.md)
+ [Create an application inference profile](inference-profiles-create.md)
+ [Modify the tags for an application inference profile](inference-profiles-modify.md)
+ [View information about an inference profile](inference-profiles-view.md)
+ [Use an inference profile in model invocation](inference-profiles-use.md)
+ [Delete an application inference profile](inference-profiles-delete.md)

# Supported Regions and models for inference profiles
Supported Regions and models

For a list of Region codes and endpoints supported in Amazon Bedrock, see [Amazon Bedrock endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#bedrock_region). This topic describes predefined inference profiles that you can use and the Regions and models that support application inference profiles.

**Topics**
+ [Supported cross-Region inference profiles](#inference-profiles-support-system)
+ [Supported Regions and models for application inference profiles](#inference-profiles-support-user)

## Supported cross-Region inference profiles


You can carry out [cross-Region inference](cross-region-inference.md) with cross-Region (system-defined) inference profiles. Cross-Region inference allows you to seamlessly manage unplanned traffic bursts by utilizing compute across different AWS Regions. With cross-Region inference, you can distribute traffic across multiple AWS Regions.

Cross-Region (system-defined) inference profiles are named after the model that they support and are defined by the Regions to which they can route requests. To understand how a cross-Region inference profile handles your requests, review the following definitions:
+ **Source Region** – The Region from which you make the API request that specifies the inference profile.
+ **Destination Region** – A Region to which the Amazon Bedrock service can route the request from your source Region.

When you invoke a cross-Region inference profile in Amazon Bedrock, your request originates from a source Region and is automatically routed to one of the destination Regions defined in that profile, optimizing for performance. The destination Regions for Global cross-Region inference profiles include all commercial Regions.

**Note**  
The destination Regions in a cross-Region inference profile can include *opt-in Regions*, which are Regions that you must explicitly enable at the AWS account or organization level. To learn more, see [Enable or disable AWS Regions in your account](https://docs.aws.amazon.com/accounts/latest/reference/manage-acct-regions.html). When using a cross-Region inference profile, your inference request can be routed to any of the destination Regions in the profile, even if you did not opt in to those Regions in your account.

Service Control Policies (SCPs) and AWS Identity and Access Management (IAM) policies work together to control where cross-Region inference is allowed. Using SCPs, you can control which Regions Amazon Bedrock can use for inference, and using IAM policies, you can define which users or roles have permission to run inference. If any destination Region in a cross-Region inference profile is blocked in your SCPs, the request will fail even if other Regions remain allowed. To ensure efficient operation with cross-Region inference, update your SCPs and IAM policies to allow all required Amazon Bedrock inference actions (for example, `bedrock:InvokeModel*` or `bedrock:CreateModelInvocationJob`) in all destination Regions included in your chosen inference profile. To learn more, see [Enable Amazon Bedrock cross-Region inference in multi-account environments](https://aws.amazon.com/blogs/machine-learning/enable-amazon-bedrock-cross-region-inference-in-multi-account-environments/).

**Note**  
Some inference profiles route to different destination Regions depending on the source Region from which you call it. For example, if you call `us.anthropic.claude-3-haiku-20240307-v1:0` from US East (Ohio), it can route requests to `us-east-1`, `us-east-2`, or `us-west-2`, but if you call it from US West (Oregon), it can route requests to only `us-east-1` and `us-west-2`.

To check the source and destination Regions for an inference profile, you can do one of the following:
+ Expand the corresponding section in the [list of supported cross-region inference profiles](#inference-profiles-support).
+ Send a [GetInferenceProfile](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetInferenceProfile.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) from a source Region and specify the Amazon Resource Name (ARN) or ID of the inference profile in the `inferenceProfileIdentifier` field. The `models` field in the response maps to a list of model ARNs, in which you can identify each destination Region.
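For example, the following boto3 sketch retrieves an inference profile (the profile ID shown is one example) and lists each destination Region from the model ARNs in the response:

```
import boto3

# Use the Amazon Bedrock control plane client in your source Region.
bedrock = boto3.client("bedrock", region_name="us-east-1")

profile = bedrock.get_inference_profile(
    inferenceProfileIdentifier="us.anthropic.claude-3-5-haiku-20241022-v1:0"
)

# Each entry in "models" is a foundation model ARN; the Region segment of the
# ARN identifies a destination Region for this inference profile.
for model in profile["models"]:
    arn = model["modelArn"]
    print(arn.split(":")[3], arn)
```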

**Note**  
The destination Regions for a Global cross-Region inference profile can change over time as AWS adds more commercial Regions where your requests can be processed. However, if an inference profile is tied to a geography (such as US, EU, or APAC), its destination Region list does not change. AWS might create new inference profiles that incorporate new Regions; you can update your systems to use them by changing the IDs in your setup to the new ones.  
Global cross-Region inference profiles are available only for select models. The following sections list the supported models and the source Regions from which you can call each Global inference profile. The destination Regions for Global inference profiles include all commercial AWS Regions.

Expand one of the following sections to see information about a cross-Region inference profile, the source Regions from which it can be called, and the destination Regions to which it can route requests.

### GLOBAL Amazon Nova 2 Lite


To call the GLOBAL Amazon Nova 2 Lite inference profile, specify the following inference profile ID in one of the source Regions:

```
global.amazon.nova-2-lite-v1:0
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-east-2 |  Commercial AWS Regions ap-east-2  | 
| ap-northeast-1 |  Commercial AWS Regions ap-northeast-1  | 
| ap-northeast-2 |  Commercial AWS Regions ap-northeast-2  | 
| ap-south-1 |  Commercial AWS Regions ap-south-1  | 
| ap-southeast-1 |  Commercial AWS Regions ap-southeast-1  | 
| ap-southeast-2 |  Commercial AWS Regions ap-southeast-2  | 
| ap-southeast-3 |  Commercial AWS Regions ap-southeast-3  | 
| ap-southeast-4 |  Commercial AWS Regions ap-southeast-4  | 
| ap-southeast-5 |  Commercial AWS Regions ap-southeast-5  | 
| ap-southeast-7 |  Commercial AWS Regions ap-southeast-7  | 
| ca-central-1 |  Commercial AWS Regions ca-central-1  | 
| ca-west-1 |  Commercial AWS Regions ca-west-1  | 
| eu-central-1 |  Commercial AWS Regions eu-central-1  | 
| eu-north-1 |  Commercial AWS Regions eu-north-1  | 
| eu-south-1 |  Commercial AWS Regions eu-south-1  | 
| eu-south-2 |  Commercial AWS Regions eu-south-2  | 
| eu-west-1 |  Commercial AWS Regions eu-west-1  | 
| eu-west-2 |  Commercial AWS Regions eu-west-2  | 
| eu-west-3 |  Commercial AWS Regions eu-west-3  | 
| il-central-1 |  Commercial AWS Regions il-central-1  | 
| me-central-1 |  Commercial AWS Regions me-central-1  | 
| us-east-1 |  Commercial AWS Regions us-east-1  | 
| us-east-2 |  Commercial AWS Regions us-east-2  | 
| us-west-1 |  Commercial AWS Regions us-west-1  | 
| us-west-2 |  Commercial AWS Regions us-west-2  | 

### GLOBAL Anthropic Claude Opus 4.5


To call the GLOBAL Anthropic Claude Opus 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
global.anthropic.claude-opus-4-5-20251101-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| af-south-1 |  Commercial AWS Regions af-south-1  | 
| ap-east-2 |  Commercial AWS Regions ap-east-2  | 
| ap-northeast-1 |  Commercial AWS Regions ap-northeast-1  | 
| ap-northeast-2 |  Commercial AWS Regions ap-northeast-2  | 
| ap-northeast-3 |  Commercial AWS Regions ap-northeast-3  | 
| ap-south-1 |  Commercial AWS Regions ap-south-1  | 
| ap-south-2 |  Commercial AWS Regions ap-south-2  | 
| ap-southeast-1 |  Commercial AWS Regions ap-southeast-1  | 
| ap-southeast-2 |  Commercial AWS Regions ap-southeast-2  | 
| ap-southeast-3 |  Commercial AWS Regions ap-southeast-3  | 
| ap-southeast-4 |  Commercial AWS Regions ap-southeast-4  | 
| ap-southeast-5 |  Commercial AWS Regions ap-southeast-5  | 
| ap-southeast-7 |  Commercial AWS Regions ap-southeast-7  | 
| ca-central-1 |  Commercial AWS Regions ca-central-1  | 
| ca-west-1 |  Commercial AWS Regions ca-west-1  | 
| eu-central-1 |  Commercial AWS Regions eu-central-1  | 
| eu-central-2 |  Commercial AWS Regions eu-central-2  | 
| eu-north-1 |  Commercial AWS Regions eu-north-1  | 
| eu-south-1 |  Commercial AWS Regions eu-south-1  | 
| eu-south-2 |  Commercial AWS Regions eu-south-2  | 
| eu-west-1 |  Commercial AWS Regions eu-west-1  | 
| eu-west-2 |  Commercial AWS Regions eu-west-2  | 
| eu-west-3 |  Commercial AWS Regions eu-west-3  | 
| il-central-1 |  Commercial AWS Regions il-central-1  | 
| me-central-1 |  Commercial AWS Regions me-central-1  | 
| me-south-1 |  Commercial AWS Regions me-south-1  | 
| mx-central-1 |  Commercial AWS Regions mx-central-1  | 
| sa-east-1 |  Commercial AWS Regions sa-east-1  | 
| us-east-1 |  Commercial AWS Regions us-east-1  | 
| us-east-2 |  Commercial AWS Regions us-east-2  | 
| us-west-1 |  Commercial AWS Regions us-west-1  | 
| us-west-2 |  Commercial AWS Regions us-west-2  | 

### GLOBAL TwelveLabs Pegasus v1.2


To call the GLOBAL TwelveLabs Pegasus v1.2 inference profile, specify the following inference profile ID in one of the source Regions:

```
global.twelvelabs.pegasus-1-2-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-pegasus.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| af-south-1 |  Commercial AWS Regions af-south-1  | 
| ap-east-2 |  Commercial AWS Regions ap-east-2  | 
| ap-northeast-1 |  Commercial AWS Regions ap-northeast-1  | 
| ap-northeast-2 |  Commercial AWS Regions ap-northeast-2  | 
| ap-northeast-3 |  Commercial AWS Regions ap-northeast-3  | 
| ap-south-1 |  Commercial AWS Regions ap-south-1  | 
| ap-south-2 |  Commercial AWS Regions ap-south-2  | 
| ap-southeast-1 |  Commercial AWS Regions ap-southeast-1  | 
| ap-southeast-2 |  Commercial AWS Regions ap-southeast-2  | 
| ap-southeast-3 |  Commercial AWS Regions ap-southeast-3  | 
| ap-southeast-4 |  Commercial AWS Regions ap-southeast-4  | 
| ap-southeast-5 |  Commercial AWS Regions ap-southeast-5  | 
| ap-southeast-7 |  Commercial AWS Regions ap-southeast-7  | 
| ca-central-1 |  Commercial AWS Regions ca-central-1  | 
| ca-west-1 |  Commercial AWS Regions ca-west-1  | 
| eu-central-1 |  Commercial AWS Regions eu-central-1  | 
| eu-central-2 |  Commercial AWS Regions eu-central-2  | 
| eu-north-1 |  Commercial AWS Regions eu-north-1  | 
| eu-south-1 |  Commercial AWS Regions eu-south-1  | 
| eu-south-2 |  Commercial AWS Regions eu-south-2  | 
| eu-west-1 |  Commercial AWS Regions eu-west-1  | 
| eu-west-2 |  Commercial AWS Regions eu-west-2  | 
| eu-west-3 |  Commercial AWS Regions eu-west-3  | 
| il-central-1 |  Commercial AWS Regions il-central-1  | 
| me-central-1 |  Commercial AWS Regions me-central-1  | 
| me-south-1 |  Commercial AWS Regions me-south-1  | 
| mx-central-1 |  Commercial AWS Regions mx-central-1  | 
| sa-east-1 |  Commercial AWS Regions sa-east-1  | 
| us-east-1 |  Commercial AWS Regions us-east-1  | 
| us-east-2 |  Commercial AWS Regions us-east-2  | 
| us-west-1 |  Commercial AWS Regions us-west-1  | 
| us-west-2 |  Commercial AWS Regions us-west-2  | 

### Global Anthropic Claude Haiku 4.5


To call the Global Anthropic Claude Haiku 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
global.anthropic.claude-haiku-4-5-20251001-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| af-south-1 |  Commercial AWS Regions af-south-1  | 
| ap-east-2 |  Commercial AWS Regions ap-east-2  | 
| ap-northeast-1 |  Commercial AWS Regions ap-northeast-1  | 
| ap-northeast-2 |  Commercial AWS Regions ap-northeast-2  | 
| ap-northeast-3 |  Commercial AWS Regions ap-northeast-3  | 
| ap-south-1 |  Commercial AWS Regions ap-south-1  | 
| ap-south-2 |  Commercial AWS Regions ap-south-2  | 
| ap-southeast-1 |  Commercial AWS Regions ap-southeast-1  | 
| ap-southeast-2 |  Commercial AWS Regions ap-southeast-2  | 
| ap-southeast-3 |  Commercial AWS Regions ap-southeast-3  | 
| ap-southeast-4 |  Commercial AWS Regions ap-southeast-4  | 
| ap-southeast-5 |  Commercial AWS Regions ap-southeast-5  | 
| ap-southeast-7 |  Commercial AWS Regions ap-southeast-7  | 
| ca-central-1 |  Commercial AWS Regions ca-central-1  | 
| ca-west-1 |  Commercial AWS Regions ca-west-1  | 
| eu-central-1 |  Commercial AWS Regions eu-central-1  | 
| eu-central-2 |  Commercial AWS Regions eu-central-2  | 
| eu-north-1 |  Commercial AWS Regions eu-north-1  | 
| eu-south-1 |  Commercial AWS Regions eu-south-1  | 
| eu-south-2 |  Commercial AWS Regions eu-south-2  | 
| eu-west-1 |  Commercial AWS Regions eu-west-1  | 
| eu-west-2 |  Commercial AWS Regions eu-west-2  | 
| eu-west-3 |  Commercial AWS Regions eu-west-3  | 
| il-central-1 |  Commercial AWS Regions il-central-1  | 
| me-central-1 |  Commercial AWS Regions me-central-1  | 
| me-south-1 |  Commercial AWS Regions me-south-1  | 
| mx-central-1 |  Commercial AWS Regions mx-central-1  | 
| sa-east-1 |  Commercial AWS Regions sa-east-1  | 
| us-east-1 |  Commercial AWS Regions us-east-1  | 
| us-east-2 |  Commercial AWS Regions us-east-2  | 
| us-west-1 |  Commercial AWS Regions us-west-1  | 
| us-west-2 |  Commercial AWS Regions us-west-2  | 

### Global Anthropic Claude Opus 4.6


To call the Global Anthropic Claude Opus 4.6 inference profile, specify the following inference profile ID in one of the source Regions:

```
global.anthropic.claude-opus-4-6-v1
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| af-south-1 |  Commercial AWS Regions af-south-1  | 
| ap-east-2 |  Commercial AWS Regions ap-east-2  | 
| ap-northeast-1 |  Commercial AWS Regions ap-northeast-1  | 
| ap-northeast-2 |  Commercial AWS Regions ap-northeast-2  | 
| ap-northeast-3 |  Commercial AWS Regions ap-northeast-3  | 
| ap-south-1 |  Commercial AWS Regions ap-south-1  | 
| ap-south-2 |  Commercial AWS Regions ap-south-2  | 
| ap-southeast-1 |  Commercial AWS Regions ap-southeast-1  | 
| ap-southeast-2 |  Commercial AWS Regions ap-southeast-2  | 
| ap-southeast-3 |  Commercial AWS Regions ap-southeast-3  | 
| ap-southeast-4 |  Commercial AWS Regions ap-southeast-4  | 
| ap-southeast-5 |  Commercial AWS Regions ap-southeast-5  | 
| ap-southeast-7 |  Commercial AWS Regions ap-southeast-7  | 
| ca-central-1 |  Commercial AWS Regions ca-central-1  | 
| ca-west-1 |  Commercial AWS Regions ca-west-1  | 
| eu-central-1 |  Commercial AWS Regions eu-central-1  | 
| eu-central-2 |  Commercial AWS Regions eu-central-2  | 
| eu-north-1 |  Commercial AWS Regions eu-north-1  | 
| eu-south-1 |  Commercial AWS Regions eu-south-1  | 
| eu-south-2 |  Commercial AWS Regions eu-south-2  | 
| eu-west-1 |  Commercial AWS Regions eu-west-1  | 
| eu-west-2 |  Commercial AWS Regions eu-west-2  | 
| eu-west-3 |  Commercial AWS Regions eu-west-3  | 
| il-central-1 |  Commercial AWS Regions il-central-1  | 
| me-central-1 |  Commercial AWS Regions me-central-1  | 
| me-south-1 |  Commercial AWS Regions me-south-1  | 
| mx-central-1 |  Commercial AWS Regions mx-central-1  | 
| sa-east-1 |  Commercial AWS Regions sa-east-1  | 
| us-east-1 |  Commercial AWS Regions us-east-1  | 
| us-east-2 |  Commercial AWS Regions us-east-2  | 
| us-west-1 |  Commercial AWS Regions us-west-1  | 
| us-west-2 |  Commercial AWS Regions us-west-2  | 

### Global Anthropic Claude Sonnet 4.6


To call the Global Anthropic Claude Sonnet 4.6 inference profile, specify the following inference profile ID in one of the source Regions:

```
global.anthropic.claude-sonnet-4-6
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| af-south-1 |  Commercial AWS Regions af-south-1  | 
| ap-east-2 |  Commercial AWS Regions ap-east-2  | 
| ap-northeast-1 |  Commercial AWS Regions ap-northeast-1  | 
| ap-northeast-2 |  Commercial AWS Regions ap-northeast-2  | 
| ap-northeast-3 |  Commercial AWS Regions ap-northeast-3  | 
| ap-south-1 |  Commercial AWS Regions ap-south-1  | 
| ap-south-2 |  Commercial AWS Regions ap-south-2  | 
| ap-southeast-1 |  Commercial AWS Regions ap-southeast-1  | 
| ap-southeast-2 |  Commercial AWS Regions ap-southeast-2  | 
| ap-southeast-3 |  Commercial AWS Regions ap-southeast-3  | 
| ap-southeast-4 |  Commercial AWS Regions ap-southeast-4  | 
| ap-southeast-5 |  Commercial AWS Regions ap-southeast-5  | 
| ap-southeast-7 |  Commercial AWS Regions ap-southeast-7  | 
| ca-central-1 |  Commercial AWS Regions ca-central-1  | 
| ca-west-1 |  Commercial AWS Regions ca-west-1  | 
| eu-central-1 |  Commercial AWS Regions eu-central-1  | 
| eu-central-2 |  Commercial AWS Regions eu-central-2  | 
| eu-north-1 |  Commercial AWS Regions eu-north-1  | 
| eu-south-1 |  Commercial AWS Regions eu-south-1  | 
| eu-south-2 |  Commercial AWS Regions eu-south-2  | 
| eu-west-1 |  Commercial AWS Regions eu-west-1  | 
| eu-west-2 |  Commercial AWS Regions eu-west-2  | 
| eu-west-3 |  Commercial AWS Regions eu-west-3  | 
| il-central-1 |  Commercial AWS Regions il-central-1  | 
| me-central-1 |  Commercial AWS Regions me-central-1  | 
| me-south-1 |  Commercial AWS Regions me-south-1  | 
| mx-central-1 |  Commercial AWS Regions mx-central-1  | 
| sa-east-1 |  Commercial AWS Regions sa-east-1  | 
| us-east-1 |  Commercial AWS Regions us-east-1  | 
| us-east-2 |  Commercial AWS Regions us-east-2  | 
| us-west-1 |  Commercial AWS Regions us-west-1  | 
| us-west-2 |  Commercial AWS Regions us-west-2  | 

### Global Claude Sonnet 4


To call the Global Claude Sonnet 4 inference profile, specify the following inference profile ID in one of the source Regions:

```
global.anthropic.claude-sonnet-4-20250514-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  Commercial AWS Regions ap-northeast-1  | 
| eu-west-1 |  Commercial AWS Regions eu-west-1  | 
| us-east-1 |  Commercial AWS Regions us-east-1  | 
| us-east-2 |  Commercial AWS Regions us-east-2  | 
| us-west-2 |  Commercial AWS Regions us-west-2  | 

### Global Claude Sonnet 4.5


To call the Global Claude Sonnet 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
global.anthropic.claude-sonnet-4-5-20250929-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| af-south-1 |  Commercial AWS Regions af-south-1  | 
| ap-east-2 |  Commercial AWS Regions ap-east-2  | 
| ap-northeast-1 |  Commercial AWS Regions ap-northeast-1  | 
| ap-northeast-2 |  Commercial AWS Regions ap-northeast-2  | 
| ap-northeast-3 |  Commercial AWS Regions ap-northeast-3  | 
| ap-south-1 |  Commercial AWS Regions ap-south-1  | 
| ap-south-2 |  Commercial AWS Regions ap-south-2  | 
| ap-southeast-1 |  Commercial AWS Regions ap-southeast-1  | 
| ap-southeast-2 |  Commercial AWS Regions ap-southeast-2  | 
| ap-southeast-3 |  Commercial AWS Regions ap-southeast-3  | 
| ap-southeast-4 |  Commercial AWS Regions ap-southeast-4  | 
| ap-southeast-5 |  Commercial AWS Regions ap-southeast-5  | 
| ap-southeast-7 |  Commercial AWS Regions ap-southeast-7  | 
| ca-central-1 |  Commercial AWS Regions ca-central-1  | 
| ca-west-1 |  Commercial AWS Regions ca-west-1  | 
| eu-central-1 |  Commercial AWS Regions eu-central-1  | 
| eu-central-2 |  Commercial AWS Regions eu-central-2  | 
| eu-north-1 |  Commercial AWS Regions eu-north-1  | 
| eu-south-1 |  Commercial AWS Regions eu-south-1  | 
| eu-south-2 |  Commercial AWS Regions eu-south-2  | 
| eu-west-1 |  Commercial AWS Regions eu-west-1  | 
| eu-west-2 |  Commercial AWS Regions eu-west-2  | 
| eu-west-3 |  Commercial AWS Regions eu-west-3  | 
| il-central-1 |  Commercial AWS Regions il-central-1  | 
| me-central-1 |  Commercial AWS Regions me-central-1  | 
| me-south-1 |  Commercial AWS Regions me-south-1  | 
| mx-central-1 |  Commercial AWS Regions mx-central-1  | 
| sa-east-1 |  Commercial AWS Regions sa-east-1  | 
| us-east-1 |  Commercial AWS Regions us-east-1  | 
| us-east-2 |  Commercial AWS Regions us-east-2  | 
| us-west-1 |  Commercial AWS Regions us-west-1  | 
| us-west-2 |  Commercial AWS Regions us-west-2  | 

### Global Cohere Embed v4


To call the Global Cohere Embed v4 inference profile, specify the following inference profile ID in one of the source Regions:

```
global.cohere.embed-v4:0
```

For more information about inference parameters for this model, see [Link](model-parameters-embed.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  Commercial AWS Regions ap-northeast-1  | 
| ap-northeast-2 |  Commercial AWS Regions ap-northeast-2  | 
| ap-northeast-3 |  Commercial AWS Regions ap-northeast-3  | 
| ap-south-1 |  Commercial AWS Regions ap-south-1  | 
| ap-south-2 |  Commercial AWS Regions ap-south-2  | 
| ap-southeast-1 |  Commercial AWS Regions ap-southeast-1  | 
| ap-southeast-2 |  Commercial AWS Regions ap-southeast-2  | 
| ap-southeast-3 |  Commercial AWS Regions ap-southeast-3  | 
| ap-southeast-4 |  Commercial AWS Regions ap-southeast-4  | 
| ca-central-1 |  Commercial AWS Regions ca-central-1  | 
| eu-central-1 |  Commercial AWS Regions eu-central-1  | 
| eu-central-2 |  Commercial AWS Regions eu-central-2  | 
| eu-north-1 |  Commercial AWS Regions eu-north-1  | 
| eu-south-1 |  Commercial AWS Regions eu-south-1  | 
| eu-south-2 |  Commercial AWS Regions eu-south-2  | 
| eu-west-1 |  Commercial AWS Regions eu-west-1  | 
| eu-west-2 |  Commercial AWS Regions eu-west-2  | 
| eu-west-3 |  Commercial AWS Regions eu-west-3  | 
| sa-east-1 |  Commercial AWS Regions sa-east-1  | 
| us-east-1 |  Commercial AWS Regions us-east-1  | 
| us-east-2 |  Commercial AWS Regions us-east-2  | 
| us-west-1 |  Commercial AWS Regions us-west-1  | 
| us-west-2 |  Commercial AWS Regions us-west-2  | 

### US Amazon Nova 2 Lite


To call the US Amazon Nova 2 Lite inference profile, specify the following inference profile ID in one of the source Regions:

```
us.amazon.nova-2-lite-v1:0
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ca-central-1 |  ca-central-1 us-east-1 us-east-2 us-west-2  | 
| ca-west-1 |  ca-west-1 us-east-1 us-east-2 us-west-2  | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Anthropic Claude 3 Haiku


To call the US Anthropic Claude 3 Haiku inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-3-haiku-20240307-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-west-2  | 

### US Anthropic Claude 3 Opus


To call the US Anthropic Claude 3 Opus inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-3-opus-20240229-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-west-2  | 
| us-west-2 |  us-east-1 us-west-2  | 

### US Anthropic Claude 3 Sonnet


To call the US Anthropic Claude 3 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-3-sonnet-20240229-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-west-2  | 
| us-west-2 |  us-east-1 us-west-2  | 

### US Anthropic Claude 3.5 Haiku


To call the US Anthropic Claude 3.5 Haiku inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-3-5-haiku-20241022-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Anthropic Claude 3.5 Sonnet


To call the US Anthropic Claude 3.5 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-3-5-sonnet-20240620-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-west-2  | 
| us-east-2 |  us-east-1 us-west-2  | 
| us-west-2 |  us-east-1 us-west-2  | 

### US Anthropic Claude 3.5 Sonnet v2


To call the US Anthropic Claude 3.5 Sonnet v2 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-3-5-sonnet-20241022-v2:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Anthropic Claude 3.7 Sonnet


To call the US Anthropic Claude 3.7 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-3-7-sonnet-20250219-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Anthropic Claude Haiku 4.5


To call the US Anthropic Claude Haiku 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-haiku-4-5-20251001-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ca-central-1 |  ca-central-1 us-east-1 us-east-2 us-west-2  | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Anthropic Claude Opus 4.5


To call the US Anthropic Claude Opus 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-opus-4-5-20251101-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ca-central-1 |  ca-central-1 us-east-1 us-east-2 us-west-2  | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Anthropic Claude Opus 4.6


To call the US Anthropic Claude Opus 4.6 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-opus-4-6-v1
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ca-central-1 |  ca-central-1 us-east-1 us-east-2 us-west-2  | 
| ca-west-1 |  ca-west-1 us-east-1 us-east-2 us-west-2  | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Anthropic Claude Sonnet 4.5


To call the US Anthropic Claude Sonnet 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-sonnet-4-5-20250929-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ca-central-1 |  ca-central-1 us-east-1 us-east-2 us-west-2  | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Anthropic Claude Sonnet 4.6


To call the US Anthropic Claude Sonnet 4.6 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-sonnet-4-6
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ca-central-1 |  ca-central-1 us-east-1 us-east-2 us-west-2  | 
| ca-west-1 |  ca-west-1 us-east-1 us-east-2 us-west-2  | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Claude Opus 4


To call the US Claude Opus 4 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-opus-4-20250514-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Claude Opus 4.1


To call the US Claude Opus 4.1 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-opus-4-1-20250805-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Claude Sonnet 4


To call the US Claude Sonnet 4 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.anthropic.claude-sonnet-4-20250514-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Cohere Embed v4


To call the US Cohere Embed v4 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.cohere.embed-v4:0
```

For more information about inference parameters for this model, see [Link](model-parameters-embed.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US DeepSeek-R1


To call the US DeepSeek-R1 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.deepseek.r1-v1:0
```

For more information about inference parameters for this model, see [Link](https://www.deepseek.com/).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Llama 4 Maverick 17B Instruct


To call the US Llama 4 Maverick 17B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
us.meta.llama4-maverick-17b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Llama 4 Scout 17B Instruct


To call the US Llama 4 Scout 17B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
us.meta.llama4-scout-17b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Meta Llama 3.1 70B Instruct


To call the US Meta Llama 3.1 70B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
us.meta.llama3-1-70b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Meta Llama 3.1 8B Instruct


To call the US Meta Llama 3.1 8B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
us.meta.llama3-1-8b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Meta Llama 3.1 405B Instruct


To call the US Meta Llama 3.1 405B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
us.meta.llama3-1-405b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 

### US Meta Llama 3.2 11B Instruct


To call the US Meta Llama 3.2 11B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
us.meta.llama3-2-11b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-west-2  | 

### US Meta Llama 3.2 1B Instruct


To call the US Meta Llama 3.2 1B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
us.meta.llama3-2-1b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-west-2  | 

### US Meta Llama 3.2 3B Instruct


To call the US Meta Llama 3.2 3B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
us.meta.llama3-2-3b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-west-2  | 

### US Meta Llama 3.2 90B Instruct


To call the US Meta Llama 3.2 90B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
us.meta.llama3-2-90b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-west-2  | 

### US Meta Llama 3.3 70B Instruct


To call the US Meta Llama 3.3 70B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
us.meta.llama3-3-70b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

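If you prefer a model's native request format over the Converse API, the profile ID also works with InvokeModel. The sketch below assumes the Meta Llama text-generation body described in the Meta inference parameters page (prompt, max_gen_len, temperature); treat the field names as illustrative and confirm them against that page.

```python
import json

import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-2")

# Native Meta Llama request body (assumed fields; see the Meta inference
# parameters page for the authoritative schema).
body = {
    "prompt": "Explain what a cross-Region inference profile is.",
    "max_gen_len": 256,
    "temperature": 0.5,
}

response = client.invoke_model(
    modelId="us.meta.llama3-3-70b-instruct-v1:0",  # profile ID in place of a model ID
    body=json.dumps(body),
)

print(json.loads(response["body"].read())["generation"])
```
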
### US Mistral Pixtral Large 25.02


To call the US Mistral Pixtral Large 25.02 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.mistral.pixtral-large-2502-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-mistral.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Nova Lite


To call the US Nova Lite inference profile, specify the following inference profile ID in one of the source Regions:

```
us.amazon.nova-lite-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Nova Micro


To call the US Nova Micro inference profile, specify the following inference profile ID in one of the source Regions:

```
us.amazon.nova-micro-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Nova Premier


To call the US Nova Premier inference profile, specify the following inference profile ID in one of the source Regions:

```
us.amazon.nova-premier-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Nova Pro


To call the US Nova Pro inference profile, specify the following inference profile ID in one of the source Regions:

```
us.amazon.nova-pro-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Pegasus v1.2


To call the US Pegasus v1.2 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.twelvelabs.pegasus-1-2-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-pegasus.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Conservative Upscale


To call the US Stable Image Conservative Upscale inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-conservative-upscale-v1:0
```

For more information about inference parameters for this model, see [Link](stable-image-services.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Control Sketch


To call the US Stable Image Control Sketch inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-image-control-sketch-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-stability-diffusion.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Control Structure


To call the US Stable Image Control Structure inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-image-control-structure-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-stability-diffusion.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Creative Upscale


To call the US Stable Image Creative Upscale inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-creative-upscale-v1:0
```

For more information about inference parameters for this model, see [Link](stable-image-services.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Erase Object


To call the US Stable Image Erase Object inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-image-erase-object-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-stability-diffusion.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Fast Upscale


To call the US Stable Image Fast Upscale inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-fast-upscale-v1:0
```

For more information about inference parameters for this model, see [Link](stable-image-services.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Inpaint


To call the US Stable Image Inpaint inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-image-inpaint-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-stability-diffusion.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Outpaint


To call the US Stable Image Outpaint inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-outpaint-v1:0
```

For more information about inference parameters for this model, see [Link](stable-image-services.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Remove Background


To call the US Stable Image Remove Background inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-image-remove-background-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-stability-diffusion.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Search and Recolor


To call the US Stable Image Search and Recolor inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-image-search-recolor-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-stability-diffusion.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Search and Replace


To call the US Stable Image Search and Replace inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-image-search-replace-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-stability-diffusion.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Style Guide


To call the US Stable Image Style Guide inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-image-style-guide-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-stability-diffusion.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Stable Image Style Transfer


To call the US Stable Image Style Transfer inference profile, specify the following inference profile ID in one of the source Regions:

```
us.stability.stable-style-transfer-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-stability-diffusion.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US TwelveLabs Marengo Embed 3.0


To call the US TwelveLabs Marengo Embed 3.0 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.twelvelabs.marengo-embed-3-0-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-marengo.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 

### US TwelveLabs Marengo Embed v2.7


To call the US TwelveLabs Marengo Embed v2.7 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.twelvelabs.marengo-embed-2-7-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-marengo.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 

### US Writer Palmyra X4


To call the US Writer Palmyra X4 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.writer.palmyra-x4-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-writer-palmyra.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

### US Writer Palmyra X5


To call the US Writer Palmyra X5 inference profile, specify the following inference profile ID in one of the source Regions:

```
us.writer.palmyra-x5-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-writer-palmyra.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-east-1 |  us-east-1 us-east-2 us-west-2  | 
| us-east-2 |  us-east-1 us-east-2 us-west-2  | 
| us-west-1 |  us-east-1 us-east-2 us-west-1 us-west-2  | 
| us-west-2 |  us-east-1 us-east-2 us-west-2  | 

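Rather than reading these tables, you can also inspect a profile programmatically with the Amazon Bedrock control-plane API. A minimal sketch, assuming the GetInferenceProfile and ListInferenceProfiles operations and the response shapes exposed by Boto3:

```python
import boto3

# Control-plane client (bedrock, not bedrock-runtime); query it from a source Region.
bedrock = boto3.client("bedrock", region_name="us-east-1")

# Describe a single profile, including the model ARNs it can route requests to.
profile = bedrock.get_inference_profile(
    inferenceProfileIdentifier="us.writer.palmyra-x5-v1:0"
)
for model in profile["models"]:
    print(model["modelArn"])  # one ARN per destination Region

# Enumerate every system-defined profile callable from this source Region.
kwargs = {"typeEquals": "SYSTEM_DEFINED"}
while True:
    page = bedrock.list_inference_profiles(**kwargs)
    for summary in page["inferenceProfileSummaries"]:
        print(summary["inferenceProfileId"])
    if "nextToken" not in page:
        break
    kwargs["nextToken"] = page["nextToken"]
```
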
### US-GOV Claude 3 Haiku


To call the US-GOV Claude 3 Haiku inference profile, specify the following inference profile ID in one of the source Regions:

```
us-gov.anthropic.claude-3-haiku-20240307-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-gov-east-1 |  us-gov-east-1 us-gov-west-1  | 

### US-GOV Claude 3.5 Sonnet


To call the US-GOV Claude 3.5 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
us-gov.anthropic.claude-3-5-sonnet-20240620-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-gov-east-1 |  us-gov-east-1 us-gov-west-1  | 

### US-GOV Claude 3.7 Sonnet


To call the US-GOV Claude 3.7 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
us-gov.anthropic.claude-3-7-sonnet-20250219-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-gov-east-1 |  us-gov-east-1 us-gov-west-1  | 

### US-GOV Claude Sonnet 4.5


To call the US-GOV Claude Sonnet 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
us-gov.anthropic.claude-sonnet-4-5-20250929-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| us-gov-east-1 |  us-gov-west-1  | 
| us-gov-west-1 |  us-gov-west-1  | 

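The us-gov profiles follow the same calling pattern, but the client must run in one of the AWS GovCloud (US) source Regions listed above. The sketch below uses InvokeModel with the Anthropic Messages request body (assumed fields; see the Claude inference parameters page for the authoritative schema); the Region and prompt are illustrative.

```python
import json

import boto3

# The client Region must be one of the listed GovCloud source Regions.
client = boto3.client("bedrock-runtime", region_name="us-gov-west-1")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "messages": [
        {"role": "user", "content": [{"type": "text", "text": "Hello, Claude."}]}
    ],
}

response = client.invoke_model(
    modelId="us-gov.anthropic.claude-sonnet-4-5-20250929-v1:0",
    body=json.dumps(body),
)

print(json.loads(response["body"].read())["content"][0]["text"])
```
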
### APAC Anthropic Claude 3 Haiku


To call the APAC Anthropic Claude 3 Haiku inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.anthropic.claude-3-haiku-20240307-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-south-1 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-1 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-2 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 

### APAC Anthropic Claude 3 Sonnet


To call the APAC Anthropic Claude 3 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.anthropic.claude-3-sonnet-20240229-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-south-1 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-1 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-2 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 

### APAC Anthropic Claude 3.5 Sonnet


To call the APAC Anthropic Claude 3.5 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.anthropic.claude-3-5-sonnet-20240620-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-south-1 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-1 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-2 |  ap-northeast-1 ap-northeast-2 ap-south-1 ap-southeast-1 ap-southeast-2  | 

### APAC Anthropic Claude 3.5 Sonnet v2


To call the APAC Anthropic Claude 3.5 Sonnet v2 inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.anthropic.claude-3-5-sonnet-20241022-v2:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-northeast-3 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-south-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-south-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 

### APAC Anthropic Claude 3.7 Sonnet


To call the APAC Anthropic Claude 3.7 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.anthropic.claude-3-7-sonnet-20250219-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2  | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2  | 
| ap-northeast-3 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2  | 
| ap-south-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2  | 
| ap-south-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2  | 

### APAC Claude Sonnet 4


To call the APAC Claude Sonnet 4 inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.anthropic.claude-sonnet-4-20250514-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-east-2 |  ap-east-2 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4  | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 
| ap-northeast-3 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 
| ap-south-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 
| ap-south-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 
| ap-southeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 
| ap-southeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 
| ap-southeast-3 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4  | 
| ap-southeast-4 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 
| ap-southeast-5 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5  | 
| ap-southeast-7 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-7  | 
| me-central-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 me-central-1  | 

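Streaming also works with inference profiles. A minimal sketch using the ConverseStream API from an APAC source Region (ap-northeast-1 here, chosen for illustration):

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="ap-northeast-1")

# Stream tokens as they are generated; the profile ID is used as the model ID.
stream = client.converse_stream(
    modelId="apac.anthropic.claude-sonnet-4-20250514-v1:0",
    messages=[{"role": "user", "content": [{"text": "Write a haiku about AWS Regions."}]}],
    inferenceConfig={"maxTokens": 200},
)

for event in stream["stream"]:
    if "contentBlockDelta" in event:
        print(event["contentBlockDelta"]["delta"]["text"], end="", flush=True)
print()
```
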
### APAC Nova Lite


To call the APAC Nova Lite inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.amazon.nova-lite-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-east-2 |  ap-east-2 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4  | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-south-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-3 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4  | 
| ap-southeast-4 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 
| ap-southeast-5 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5  | 
| ap-southeast-7 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-7  | 
| me-central-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 me-central-1  | 

### APAC Nova Micro


To call the APAC Nova Micro inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.amazon.nova-micro-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-east-2 |  ap-east-2 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4  | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-south-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-3 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4  | 
| ap-southeast-5 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5  | 
| ap-southeast-7 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-7  | 
| me-central-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 me-central-1  | 

### APAC Nova Pro


To call the APAC Nova Pro inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.amazon.nova-pro-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-east-2 |  ap-east-2 ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4  | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-south-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-southeast-1 ap-southeast-2  | 
| ap-southeast-3 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4  | 
| ap-southeast-4 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 
| ap-southeast-5 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-5  | 
| ap-southeast-7 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 ap-southeast-7  | 
| me-central-1 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4 me-central-1  | 

### APAC Pegasus v1.2


To call the APAC Pegasus v1.2 inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.twelvelabs.pegasus-1-2-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-pegasus.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-4  | 

### APAC TwelveLabs Marengo Embed v2.7


To call the APAC TwelveLabs Marengo Embed v2.7 inference profile, specify the following inference profile ID in one of the source Regions:

```
apac.twelvelabs.marengo-embed-2-7-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-marengo.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-2 |  ap-northeast-1 ap-northeast-2 ap-northeast-3 ap-south-1 ap-south-2 ap-southeast-1 ap-southeast-2 ap-southeast-3 ap-southeast-4  | 

### AU Anthropic Claude Sonnet 4.5


To call the AU Anthropic Claude Sonnet 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
au.anthropic.claude-sonnet-4-5-20250929-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-southeast-2 |  ap-southeast-2 ap-southeast-4  | 
| ap-southeast-4 |  ap-southeast-2 ap-southeast-4  | 

### AU Anthropic Claude Haiku 4.5


To call the AU Anthropic Claude Haiku 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
au.anthropic.claude-haiku-4-5-20251001-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-southeast-2 |  ap-southeast-2 ap-southeast-4  | 
| ap-southeast-4 |  ap-southeast-2 ap-southeast-4  | 

### AU Anthropic Claude Opus 4.6


To call the AU Anthropic Claude Opus 4.6 inference profile, specify the following inference profile ID in one of the source Regions:

```
au.anthropic.claude-opus-4-6-v1
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-southeast-2 |  ap-southeast-2 ap-southeast-4  | 
| ap-southeast-4 |  ap-southeast-2 ap-southeast-4  | 

### AU Anthropic Claude Sonnet 4.6


To call the AU Anthropic Claude Sonnet 4.6 inference profile, specify the following inference profile ID in one of the source Regions:

```
au.anthropic.claude-sonnet-4-6
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-southeast-2 |  ap-southeast-2 ap-southeast-4  | 
| ap-southeast-4 |  ap-southeast-2 ap-southeast-4  | 

### CA Nova Lite


To call the CA Nova Lite inference profile, specify the following inference profile ID in one of the source Regions:

```
ca.amazon.nova-lite-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ca-central-1 |  ca-central-1 ca-west-1  | 
| ca-west-1 |  ca-central-1 ca-west-1  | 

### EU Amazon Nova 2 Lite


To call the EU Amazon Nova 2 Lite inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.amazon.nova-2-lite-v1:0
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 

### EU Anthropic Claude 3 Haiku


To call the EU Anthropic Claude 3 Haiku inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.anthropic.claude-3-haiku-20240307-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-west-1 eu-west-3  | 

### EU Anthropic Claude 3 Sonnet


To call the EU Anthropic Claude 3 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.anthropic.claude-3-sonnet-20240229-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-west-1 eu-west-3  | 

### EU Anthropic Claude 3.5 Sonnet


To call the EU Anthropic Claude 3.5 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.anthropic.claude-3-5-sonnet-20240620-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-west-1 eu-west-3  | 

### EU Anthropic Claude 3.7 Sonnet


To call the EU Anthropic Claude 3.7 Sonnet inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.anthropic.claude-3-7-sonnet-20250219-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 

### EU Anthropic Claude Haiku 4.5


To call the EU Anthropic Claude Haiku 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.anthropic.claude-haiku-4-5-20251001-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-central-2 |  eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 

### EU Anthropic Claude Opus 4.5


To call the EU Anthropic Claude Opus 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.anthropic.claude-opus-4-5-20251101-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-central-2 |  eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 

### EU Anthropic Claude Opus 4.6


To call the EU Anthropic Claude Opus 4.6 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.anthropic.claude-opus-4-6-v1
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-central-2 |  eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 

### EU Anthropic Claude Sonnet 4.5


To call the EU Anthropic Claude Sonnet 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.anthropic.claude-sonnet-4-5-20250929-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-central-2 |  eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 

### EU Anthropic Claude Sonnet 4.6


To call the EU Anthropic Claude Sonnet 4.6 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.anthropic.claude-sonnet-4-6
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-central-2 |  eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 

### EU Claude Sonnet 4


To call the EU Claude Sonnet 4 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.anthropic.claude-sonnet-4-20250514-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| il-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3 il-central-1  | 

### EU Cohere Embed v4


To call the EU Cohere Embed v4 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.cohere.embed-v4:0
```

For more information about inference parameters for this model, see [Link](model-parameters-embed.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 

### EU Meta Llama 3.2 1B Instruct


To call the EU Meta Llama 3.2 1B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.meta.llama3-2-1b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-west-1 eu-west-3  | 

### EU Meta Llama 3.2 3B Instruct


To call the EU Meta Llama 3.2 3B Instruct inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.meta.llama3-2-3b-instruct-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-meta.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-west-1 eu-west-3  | 

### EU Mistral Pixtral Large 25.02


To call the EU Mistral Pixtral Large 25.02 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.mistral.pixtral-large-2502-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-mistral.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 

### EU Nova Lite


To call the EU Nova Lite inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.amazon.nova-lite-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| il-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-3 il-central-1  | 

### EU Nova Micro


To call the EU Nova Micro inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.amazon.nova-micro-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| il-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-3 il-central-1  | 

### EU Nova Pro


To call the EU Nova Pro inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.amazon.nova-pro-v1:0
```

For more information about inference parameters for this model, see [Link](https://docs.aws.amazon.com/nova/latest/userguide/getting-started-schema.html).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-west-1 eu-west-3  | 
| il-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-west-1 eu-west-3 il-central-1  | 

### EU TwelveLabs Marengo Embed 3.0


To call the EU TwelveLabs Marengo Embed 3.0 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.twelvelabs.marengo-embed-3-0-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-marengo.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 

### EU TwelveLabs Marengo Embed v2.7


To call the EU TwelveLabs Marengo Embed v2.7 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.twelvelabs.marengo-embed-2-7-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-marengo.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 

### EU TwelveLabs Pegasus v1.2


To call the EU TwelveLabs Pegasus v1.2 inference profile, specify the following inference profile ID in one of the source Regions:

```
eu.twelvelabs.pegasus-1-2-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-pegasus.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| eu-central-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-central-2 |  eu-central-1 eu-central-2 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-north-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-south-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-1 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 
| eu-west-2 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-2 eu-west-3  | 
| eu-west-3 |  eu-central-1 eu-north-1 eu-south-1 eu-south-2 eu-west-1 eu-west-3  | 

### JP Amazon Nova 2 Lite


To call the JP Amazon Nova 2 Lite inference profile, specify the following inference profile ID in one of the source Regions:

```
jp.amazon.nova-2-lite-v1:0
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-3  | 

### JP Anthropic Claude Haiku 4.5


To call the JP Anthropic Claude Haiku 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
jp.anthropic.claude-haiku-4-5-20251001-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-3  | 
| ap-northeast-3 |  ap-northeast-1 ap-northeast-3  | 

### JP Anthropic Claude Sonnet 4.5


To call the JP Anthropic Claude Sonnet 4.5 inference profile, specify the following inference profile ID in one of the source Regions:

```
jp.anthropic.claude-sonnet-4-5-20250929-v1:0
```

For more information about inference parameters for this model, see [Link](model-parameters-claude.md).

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-3  | 
| ap-northeast-3 |  ap-northeast-1 ap-northeast-3  | 

### JP Anthropic Claude Sonnet 4.6


To call the JP Anthropic Claude Sonnet 4.6 inference profile, specify the following inference profile ID in one of the source Regions:

```
jp.anthropic.claude-sonnet-4-6
```

The following table shows the source Regions from which you can call the inference profile and the destination Regions to which the requests can be routed:


| Source Regions | Destination Regions | 
| --- | --- | 
| ap-northeast-1 |  ap-northeast-1 ap-northeast-3  | 
| ap-northeast-3 |  ap-northeast-1 ap-northeast-3  | 

## Supported Regions and models for application inference profiles


Application inference profiles can be created for all models in the following AWS Regions:
+ ap-northeast-1
+ ap-northeast-2
+ ap-south-1
+ ap-southeast-1
+ ap-southeast-2
+ ca-central-1
+ eu-central-1
+ eu-west-1
+ eu-west-2
+ eu-west-3
+ sa-east-1
+ us-east-1
+ us-east-2
+ us-gov-east-1
+ us-west-2

Application inference profiles can be created from all models and inference profiles supported in Amazon Bedrock. For more information about models supported in Amazon Bedrock, see [Supported foundation models in Amazon Bedrock](models-supported.md).

# Prerequisites for inference profiles
Prerequisites

Before you can use an inference profile, check that you've fulfilled the following prerequisites:
+ Your role has access to the inference profile API actions. If your role has the [AmazonBedrockFullAccess](security-iam-awsmanpol.md#security-iam-awsmanpol-AmazonBedrockFullAccess) AWS-managed policy attached, you can skip this step. Otherwise, do the following:

  1. Follow the steps at [Creating IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) and create the following policy, which allows a role to do inference profile-related actions and run model inference using all foundation models and inference profiles.

------
#### [ JSON ]

****  

     ```
     {
         "Version":"2012-10-17",		 	 	 
         "Statement": [
             {
                 "Effect": "Allow",
                 "Action": [
                     "bedrock:InvokeModel*",
                     "bedrock:CreateInferenceProfile"
                 ],
                 "Resource": [
                     "arn:aws:bedrock:*::foundation-model/*",
                     "arn:aws:bedrock:*:*:inference-profile/*",
                     "arn:aws:bedrock:*:*:application-inference-profile/*"
                 ]
             },
             {
                 "Effect": "Allow",
                 "Action": [
                     "bedrock:GetInferenceProfile",
                     "bedrock:ListInferenceProfiles",
                     "bedrock:DeleteInferenceProfile",
                     "bedrock:TagResource",
                     "bedrock:UntagResource",
                     "bedrock:ListTagsForResource"
                 ],
                 "Resource": [
                     "arn:aws:bedrock:*:*:inference-profile/*",
                     "arn:aws:bedrock:*:*:application-inference-profile/*"
                 ]
             }
         ]
     }
     ```

------

     (Optional) You can restrict the role's access in the following ways:
     + To restrict the API actions that the role can make, modify the list in the `Action` field to contain only the [API operations](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-actions-as-permissions) that you want to allow access to.
     + To restrict the role's access to specific inference profiles, modify the `Resource` list to contain only the [inference profiles](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-resources-for-iam-policies) and foundation models that you want to allow access to. System-defined inference profiles begin with `inference-profile` and application inference profiles begin with `application-inference-profile`.
**Important**  
When you specify an inference profile in the `Resource` field in the first statement, you must also specify the foundation model in each Region associated with it.
     + To restrict user access such that they can invoke a foundation model only through an inference profile, add a `Condition` field and use the `aws:InferenceProfileArn` [condition key](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-policy-keys). Specify the inference profile that you want to filter access on. This condition can be included in a statement that scopes to the `foundation-model` resources.
     + For example, you can attach the following policy to a role to allow it to invoke the Anthropic Claude 3 Haiku model only through the US Anthropic Claude 3 Haiku inference profile in the account *111122223333* in us-west-2:

------
#### [ JSON ]

****  

       ```
       {
           "Version":"2012-10-17",		 	 	 
           "Statement": [
               {
                   "Effect": "Allow",
                   "Action": [
                       "bedrock:InvokeModel*"
                   ],
                   "Resource": [
                       "arn:aws:bedrock:us-west-2:111122223333:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
                   ]
               },
               {
                   "Effect": "Allow",
                   "Action": [
                       "bedrock:InvokeModel*"
                   ],
                   "Resource": [
                       "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-haiku-20240307-v1:0",
                       "arn:aws:bedrock:us-west-2::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"
                   ],
                   "Condition": {
                       "StringLike": {
                           "bedrock:InferenceProfileArn": "arn:aws:bedrock:us-west-2:111122223333:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
                       }
                   }
               }
           ]
       }
       ```

------
     + For example, you can attach the following policy to a role to allow it to invoke the Anthropic Claude Sonnet 4 model only through the Global Claude Sonnet 4 inference profile in the account 111122223333 in us-east-2 (US East (Ohio)).

------
#### [ JSON ]

****  

       ```
       {
           "Version":"2012-10-17",		 	 	 
           "Statement": [
               {
                   "Effect": "Allow",
                   "Action": [
                       "bedrock:InvokeModel*"
                   ],
                   "Resource": [
                       "arn:aws:bedrock:us-east-2:111122223333:inference-profile/global.anthropic.claude-sonnet-4-20250514-v1:0"
                   ]
               },
               {
                   "Effect": "Allow",
                   "Action": [
                       "bedrock:InvokeModel*"
                   ],
                   "Resource": [
                       "arn:aws:bedrock:us-east-2::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0",
                       "arn:aws:bedrock:::foundation-model/anthropic.claude-sonnet-4-20250514-v1:0"
                   ],
                   "Condition": {
                       "StringLike": {
                           "bedrock:InferenceProfileArn": "arn:aws:bedrock:us-east-2:111122223333:inference-profile/global.anthropic.claude-sonnet-4-20250514-v1:0"
                       }
                   }
               }
           ]
       }
       ```

------
     + You can also restrict the use of the Global Claude Sonnet 4 inference profile by adding an explicit Deny with a `StringEquals` condition that checks whether the request context key `aws:RequestedRegion` equals `unspecified`. Because an explicit Deny overrides any Allow, this statement blocks Global routing of inference requests.

       ```
       {
           "Effect": "Deny",
           "Action": [
               "bedrock:InvokeModel*"
           ],
           "Resource": "*",
           "Condition": {
               "StringEquals": {
                   "aws:RequestedRegion": "unspecified"
               }
           }
       },
       ```

  1. Follow the steps at [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html) to attach the policy to a role to grant the role permissions to view and use all the inference profiles.
+ You've requested access to the model defined in the inference profile that you want to use, in the Region from which you want to call the inference profile.

# Create an application inference profile


You can create an application inference profile with one or more Regions to track usage and costs when invoking a model.
+ To create an application inference profile for one Region, specify a foundation model. Usage and costs for requests made to that Region with that model will be tracked.
+ To create an application inference profile for multiple Regions, specify a cross Region (system-defined) inference profile. The inference profile will route requests to the Regions defined in the cross Region (system-defined) inference profile that you choose. Usage and costs for requests made to the Regions in the inference profile will be tracked.

Currently, you can only create an application inference profile using the Amazon Bedrock API.

To create an inference profile, send a [CreateInferenceProfile](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateInferenceProfile.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp).

The following fields are required:


****  

| Field | Use case | 
| --- | --- | 
| inferenceProfileName | To specify a name for the inference profile. | 
| modelSource | To specify the foundation model or cross Region (system-defined) inference profile that defines the model and Regions for which you want to track costs and usage. | 

The following fields are optional:


****  

| Field | Use case | 
| --- | --- | 
| description | To provide a description for the inference profile. | 
| tags | To attach tags to the inference profile. For more information, see [Tagging Amazon Bedrock resources](tagging.md) and [Organizing and tracking costs using AWS cost allocation tags](https://docs.aws.amazon.com//awsaccountbilling/latest/aboutv2/cost-alloc-tags.html). | 
| clientRequestToken | To ensure the API request completes only once. For more information, see [Ensuring idempotency](https://docs.aws.amazon.com/ec2/latest/devguide/ec2-api-idempotency.html). | 

The response returns an `inferenceProfileArn` that can be used in other inference profile-related actions and that can be used with model invocation and Amazon Bedrock resources.
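
For example, the following AWS SDK for Python (Boto3) snippet sketches a [CreateInferenceProfile](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateInferenceProfile.html) request. The profile name, description, tag, Region, account ID, and source ARN are placeholder values for illustration only.

```
import boto3

# Sketch: create an application inference profile that copies from a
# cross-Region (system-defined) inference profile. Replace the placeholder
# name, Region, account ID, and ARN with your own values.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_inference_profile(
    inferenceProfileName="claims-dept-claude-profile",
    description="Tracks usage and cost for the claims department",
    modelSource={
        # ARN of a foundation model or of a cross-Region inference profile
        "copyFrom": "arn:aws:bedrock:us-east-1:111122223333:inference-profile/us.anthropic.claude-3-haiku-20240307-v1:0"
    },
    tags=[{"key": "department", "value": "claims"}],
)

print(response["inferenceProfileArn"])
```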

# Modify the tags for an application inference profile


After you create an application inference profile, you can still manage tags through the Amazon Bedrock API by submitting a [TagResource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_TagResource.html) or [UntagResource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_UntagResource.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) and specifying the ARN of the application inference profile in the `resourceArn` field. To learn more about tagging, see [Tagging Amazon Bedrock resources](tagging.md).
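
For example, the following AWS SDK for Python (Boto3) sketch adds a tag to an application inference profile and then removes it. The ARN and tag values are placeholders.

```
import boto3

# Sketch: manage tags on an existing application inference profile.
# The ARN and tag values below are placeholders.
bedrock = boto3.client("bedrock", region_name="us-east-1")
profile_arn = "arn:aws:bedrock:us-east-1:111122223333:application-inference-profile/EXAMPLE_ID"

bedrock.tag_resource(
    resourceARN=profile_arn,
    tags=[{"key": "cost-center", "value": "12345"}],
)

bedrock.untag_resource(
    resourceARN=profile_arn,
    tagKeys=["cost-center"],
)
```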

# View information about an inference profile


You can view information about cross Region inference profiles or application inference profiles that you've created. To learn how to view information about an inference profile, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

**To view information about a cross Region (system-defined) inference profile**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. Select **Cross-Region inference** from the left navigation pane. Then, in the **Cross-Region inference** section, choose an inference profile.

1. View the details of the inference profile in the **Inference profile details** section and the Regions that it encompasses in the **Models** section.

**Note**  
You can't view application inference profiles in the Amazon Bedrock console.

------
#### [ API ]

To get information about an inference profile, send a [GetInferenceProfile](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetInferenceProfile.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) and specify the Amazon Resource Name (ARN) or ID of the inference profile in the `inferenceProfileIdentifier` field.

To list information about the inference profiles that you can use, send a [ListInferenceProfiles](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListInferenceProfiles.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp). You can specify the following optional parameters:


****  

| Field | Short description | 
| --- | --- | 
| maxResults | The maximum number of results to return in a response. | 
| nextToken | If there are more results than the number you specified in the maxResults field, the response returns a nextToken value. To see the next batch of results, send the nextToken value in another request. | 
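
For example, the following AWS SDK for Python (Boto3) sketch retrieves one inference profile and then pages through all of them with `maxResults` and `nextToken`. The profile ID is a placeholder.

```
import boto3

# Sketch: get one inference profile and list all inference profiles,
# following nextToken to page through the results.
bedrock = boto3.client("bedrock", region_name="us-east-1")

profile = bedrock.get_inference_profile(
    inferenceProfileIdentifier="us.anthropic.claude-3-haiku-20240307-v1:0"
)
print(profile["inferenceProfileArn"], profile["status"])

kwargs = {"maxResults": 10}
while True:
    page = bedrock.list_inference_profiles(**kwargs)
    for summary in page["inferenceProfileSummaries"]:
        print(summary["inferenceProfileId"], summary["type"])
    if "nextToken" in page:
        kwargs["nextToken"] = page["nextToken"]
    else:
        break
```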

------

# Use an inference profile in model invocation


You can use a cross Region inference profile in place of a foundation model to route requests to multiple Regions. To track costs and usage for a model in one or multiple Regions, you can use an application inference profile. To learn how to use an inference profile when running model inference, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

To use an inference profile with a feature that supports it, do the following:

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. Navigate to the page for the feature that you want to use an inference profile for. For example, select **Chat / Text playground** from the left navigation pane.

1. Choose **Select model** and then choose the model. For example, choose **Amazon** and then **Nova Premier**.

1. Under **Inference**, select **Inference profiles** from the dropdown menu.

1. Select the inference profile to use (for example, **US Nova Premier**) and then choose **Apply**.

------
#### [ API ]

You can use an inference profile when running inference from any Region that is included in it with the following API operations:
+ [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html) – To use an inference profile in model invocation, follow the steps at [Submit a single prompt with InvokeModel](inference-invoke.md) and specify the Amazon Resource Name (ARN) of the inference profile in the `modelId` field. For an example, see [Use an inference profile in model invocation](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html#API_runtime_InvokeModel_Example_5).
+ [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) or [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) – To use an inference profile in model invocation with the Converse API, follow the steps at [Carry out a conversation with the Converse API operations](conversation-inference.md) and specify the ARN of the inference profile in the `modelId` field. For an example, see [Use an inference profile in a conversation](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html#API_runtime_Converse_Example_5).
+ [RetrieveAndGenerate](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_RetrieveAndGenerate.html) – To use an inference profile when generating responses from the results of querying a knowledge base, follow the steps in the API tab in [Test your knowledge base with queries and responses](knowledge-base-test.md) and specify the ARN of the inference profile in the `modelArn` field. For more information, see [Use an inference profile to generate a response](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent-runtime_RetrieveAndGenerate.html#API_agent-runtime_RetrieveAndGenerate_Example_3).
+ [CreateEvaluationJob](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateEvaluationJob.html) – To submit an inference profile for model evaluation, follow the steps in the API tab in [Starting an automatic model evaluation job in Amazon Bedrock](model-evaluation-jobs-management-create.md) and specify the ARN of the inference profile in the `modelIdentifier` field.
+ [CreatePrompt](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreatePrompt.html) – To use an inference profile when generating a response for a prompt you create in Prompt management, follow the steps in the API tab in [Create a prompt using Prompt management](prompt-management-create.md) and specify the ARN of the inference profile in the `modelId` field.
+ [CreateFlow](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateFlow.html) – To use an inference profile when generating a response for an inline prompt that you define within a prompt node in a flow, follow the steps in the API tab in [Create and design a flow in Amazon Bedrock](flows-create.md). In defining the [prompt node](flows-nodes.md#flows-nodes-prompt), specify the ARN of the inference profile in the `modelId` field.
+ [CreateDataSource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateDataSource.html) – To use an inference profile when parsing non-textual information in a data source, follow the steps in the API section in [Parsing options for your data source](kb-advanced-parsing.md) and specify the ARN of the inference profile in the `modelArn` field.

**Note**  
If you're using a cross-Region (system-defined) inference profile, you can use either the ARN or the ID of the inference profile.
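
For example, the following AWS SDK for Python (Boto3) sketch calls the Converse operation from the list above through a cross-Region inference profile by passing the profile ID as the `modelId`. The profile ID, Region, and prompt are placeholders.

```
import boto3

# Sketch: invoke a model through a cross-Region inference profile by passing
# the inference profile ID (or ARN) as the modelId in Converse.
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock_runtime.converse(
    modelId="us.anthropic.claude-3-haiku-20240307-v1:0",
    messages=[
        {"role": "user", "content": [{"text": "Summarize the benefits of cross-Region inference."}]}
    ],
    inferenceConfig={"maxTokens": 512, "temperature": 0.5},
)

print(response["output"]["message"]["content"][0]["text"])
```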

------

# Delete an application inference profile


If you no longer need an application inference profile, you can delete it. You can only delete inference profiles through the Amazon Bedrock API.

To delete an inference profile, send a [DeleteInferenceProfile](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_DeleteInferenceProfile.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) and specify the Amazon Resource Name (ARN) or ID of the inference profile to delete in the `inferenceProfileIdentifier` field.
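
For example, the following AWS SDK for Python (Boto3) sketch deletes an application inference profile by its ARN. The ARN is a placeholder.

```
import boto3

# Sketch: delete an application inference profile. The ARN is a placeholder.
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.delete_inference_profile(
    inferenceProfileIdentifier="arn:aws:bedrock:us-east-1:111122223333:application-inference-profile/EXAMPLE_ID"
)
```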

# Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock
Provisioned Throughput

**Throughput** refers to the number and rate of inputs and outputs that a model processes and returns. You can purchase **Provisioned Throughput** to provision a higher level of throughput for a model at a fixed cost. If you customized a model, you must purchase Provisioned Throughput to be able to use it.

You're billed hourly for a Provisioned Throughput that you purchase. For detailed information about pricing, see [Amazon Bedrock Pricing](https://aws.amazon.com/bedrock/pricing). The price per hour depends on the following factors:

1. The model that you choose (for custom models, pricing is the same as the base model that it was customized from).

1. The number of Model Units (MUs) that you specify for the Provisioned Throughput. An MU delivers a specific throughput level for the specified model. The throughput level of an MU specifies the following:
   + The number of input tokens that an MU can process across all requests within a span of one minute. 
   + The number of output tokens that an MU can generate across all requests within a span of one minute.
**Note**  
For more information about what an MU specifies, pricing per MU, and to request limit increases, contact your AWS account manager.

1. The duration of time you commit to keeping the Provisioned Throughput. The longer the commitment duration, the more discounted the hourly price becomes. You can choose between the following levels of commitment:
   + No commitment – You can delete the Provisioned Throughput at any time.
   + 1 month – You can't delete the Provisioned Throughput until the one month commitment term is over.
   + 6 months – You can't delete the Provisioned Throughput until the six month commitment term is over.
**Note**  
Billing continues until you delete the Provisioned Throughput.

The following steps outline the process of setting up and using Provisioned Throughput.

1. Determine the number of MUs you wish to purchase for a Provisioned Throughput and the amount of time for which you want to commit to using the Provisioned Throughput.

1. Purchase Provisioned Throughput for a base or custom model.

1. After the provisioned model is created, you can use it to [run model inference](inference.md).

**Topics**
+ [Supported Region and models for Provisioned Throughput](prov-thru-supported.md)
+ [Prerequisites for Provisioned Throughput](prov-thru-prereq.md)
+ [Purchase a Provisioned Throughput for an Amazon Bedrock model](prov-thru-purchase.md)
+ [View information about a Provisioned Throughput](prov-thru-info.md)
+ [Modify a Provisioned Throughput](prov-thru-edit.md)
+ [Use a Provisioned Throughput with an Amazon Bedrock resource](prov-thru-use.md)
+ [Delete a Provisioned Throughput or cancel auto renew](prov-thru-delete.md)
+ [Code examples for Provisioned Throughput](prov-thru-code-examples.md)

# Supported Region and models for Provisioned Throughput
Supported Regions and models

If you purchase Provisioned Throughput through the Amazon Bedrock API, use the model ID listed in the following table. For most models, this is a contextual variant of the base Amazon Bedrock foundation model ID that specifies a context length (for example, `amazon.nova-lite-v1:0:24k`).

**Note**  
Provisioned Throughput is supported in AWS GovCloud (US-West) only for custom models with a no-commitment purchase. Use the ID of a custom model when purchasing Provisioned Throughput for it.

The following table shows the models for which you can purchase Provisioned Throughput, the model ID to use when purchasing Provisioned Throughput, and the AWS Regions in which you can purchase Provisioned Throughput for the model.


| Provider | Model | Model ID | Single-region model support | 
| --- | --- | --- | --- | 
| Amazon | Nova 2 Lite | amazon.nova-2-lite-v1:0:256k |  us-east-1  | 
| Amazon | Nova Canvas | amazon.nova-canvas-v1:0 |  us-east-1  | 
| Amazon | Nova Lite | amazon.nova-lite-v1:0:24k |  us-east-1  | 
| Amazon | Nova Lite | amazon.nova-lite-v1:0:300k |  us-east-1  | 
| Amazon | Nova Micro | amazon.nova-micro-v1:0:128k |  us-east-1  | 
| Amazon | Nova Micro | amazon.nova-micro-v1:0:24k |  us-east-1  | 
| Amazon | Nova Pro | amazon.nova-pro-v1:0:24k |  us-east-1  | 
| Amazon | Nova Pro | amazon.nova-pro-v1:0:300k |  us-east-1  | 
| Amazon | Titan Embeddings G1 - Text | amazon.titan-embed-text-v1:2:8k |  us-east-1 us-west-2  | 
| Amazon | Titan Image Generator G1 v2 | amazon.titan-image-generator-v2:0 |  us-east-1 us-west-2  | 
| Amazon | Titan Multimodal Embeddings G1 | amazon.titan-embed-image-v1:0 |  ap-south-1 ap-southeast-2 ca-central-1 eu-central-1 eu-west-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-west-2  | 
| Anthropic | Claude | anthropic.claude-v2:0:100k |  us-east-1 us-west-2  | 
| Anthropic | Claude | anthropic.claude-v2:0:18k |  us-east-1 us-west-2  | 
| Anthropic | Claude | anthropic.claude-v2:1:18k |  eu-central-1 us-east-1 us-west-2  | 
| Anthropic | Claude | anthropic.claude-v2:1:200k |  eu-central-1 us-east-1 us-west-2  | 
| Anthropic | Claude 3 Haiku | anthropic.claude-3-haiku-20240307-v1:0:200k |  ap-southeast-2 eu-west-3 us-east-1 us-west-2  | 
| Anthropic | Claude 3 Haiku | anthropic.claude-3-haiku-20240307-v1:0:48k |  ap-south-1 ap-southeast-2 eu-west-1 eu-west-3 us-east-1 us-west-2  | 
| Anthropic | Claude 3 Sonnet | anthropic.claude-3-sonnet-20240229-v1:0:200k |  ap-southeast-2 eu-west-1 eu-west-3 us-east-1 us-west-2  | 
| Anthropic | Claude 3 Sonnet | anthropic.claude-3-sonnet-20240229-v1:0:28k |  ap-south-1 ap-southeast-2 eu-west-1 eu-west-3 us-east-1 us-west-2  | 
| Anthropic | Claude 3.5 Sonnet | anthropic.claude-3-5-sonnet-20240620-v1:0:18k |  us-west-2  | 
| Anthropic | Claude 3.5 Sonnet | anthropic.claude-3-5-sonnet-20240620-v1:0:200k |  us-west-2  | 
| Anthropic | Claude 3.5 Sonnet | anthropic.claude-3-5-sonnet-20240620-v1:0:51k |  us-west-2  | 
| Anthropic | Claude 3.5 Sonnet v2 | anthropic.claude-3-5-sonnet-20241022-v2:0:18k |  us-west-2  | 
| Anthropic | Claude 3.5 Sonnet v2 | anthropic.claude-3-5-sonnet-20241022-v2:0:200k |  us-west-2  | 
| Anthropic | Claude 3.5 Sonnet v2 | anthropic.claude-3-5-sonnet-20241022-v2:0:51k |  us-west-2  | 
| Anthropic | Claude Instant | anthropic.claude-instant-v1:2:100k |  us-east-1 us-west-2  | 
| Cohere | Embed English | cohere.embed-english-v3:0:512 |  ca-central-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-west-2  | 
| Cohere | Embed Multilingual | cohere.embed-multilingual-v3:0:512 |  ca-central-1 eu-west-2 eu-west-3 sa-east-1 us-east-1 us-west-2  | 
| Meta | Llama 3.1 70B Instruct | meta.llama3-1-70b-instruct-v1:0:128k |  us-west-2  | 
| Meta | Llama 3.1 8B Instruct | meta.llama3-1-8b-instruct-v1:0:128k |  us-west-2  | 
| Meta | Llama 3.2 11B Instruct | meta.llama3-2-11b-instruct-v1:0:128k |  us-west-2  | 
| Meta | Llama 3.2 1B Instruct | meta.llama3-2-1b-instruct-v1:0:128k |  us-west-2  | 
| Meta | Llama 3.2 3B Instruct | meta.llama3-2-3b-instruct-v1:0:128k |  us-west-2  | 
| Meta | Llama 3.2 90B Instruct | meta.llama3-2-90b-instruct-v1:0:128k |  us-west-2  | 

**Note**  
The following models don't support no-commitment purchases for the base model:  
Titan Image Generator G1 V1
Titan Image Generator G1 V2

# Prerequisites for Provisioned Throughput
Prerequisites

Before you can purchase and manage Provisioned Throughput, you need to fulfill the following prerequisites:

1. [Request access to the model or models](model-access.md) that you want to purchase Provisioned Throughput for. After access has been granted, you can purchase Provisioned Throughput for the base model and any models customized from it.

1. Ensure that your IAM role has access to the Provisioned Throughput API actions. If your role has the [AmazonBedrockFullAccess](security-iam-awsmanpol.md#security-iam-awsmanpol-AmazonBedrockFullAccess) AWS-managed policy attached, you can skip this step. Otherwise, do the following:

   1. Follow the steps at [Creating IAM policies](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_create.html) and create the following policy, which allows a role to create a Provisioned Throughput for all foundation and custom models.

------
#### [ JSON ]

****  

      ```
      {
          "Version":"2012-10-17",		 	 	 
          "Statement": [
              {
                  "Sid": "PermissionsForProvisionedThroughput",
                  "Effect": "Allow",
                  "Action": [
                      "bedrock:GetFoundationModel",
                      "bedrock:ListFoundationModels",
                      "bedrock:GetCustomModel",
                      "bedrock:ListCustomModels",
                      "bedrock:InvokeModel",
                      "bedrock:InvokeModelWithResponseStream",
                      "bedrock:ListTagsForResource",
                      "bedrock:UntagResource",
                      "bedrock:TagResource",
                      "bedrock:CreateProvisionedModelThroughput",
                      "bedrock:GetProvisionedModelThroughput",
                      "bedrock:ListProvisionedModelThroughputs",
                      "bedrock:UpdateProvisionedModelThroughput",
                      "bedrock:DeleteProvisionedModelThroughput"
                  ],
                  "Resource": "*"
              }
          ]
      }
      ```

------
**Note**  
If you're using Provisioned Throughput with cross-Region inference, you may need additional permissions. See [Increase throughput with cross-Region inference](cross-region-inference.md) to learn more.

      (Optional) You can restrict the role's access in the following ways:
      + To restrict the API actions that the role can make, modify the list in the `Action` field to contain only the [API operations](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-actions-as-permissions) that you want to allow access to.
      + After creating a provisioned model, you can restrict the role's ability to perform an API request with the provisioned model by modifying the `Resource` list to contain only the [provisioned models](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-resources-for-iam-policies) that you want to allow access to. For an example, see [Allow users to invoke a provisioned model](security_iam_id-based-policy-examples.md#security_iam_id-based-policy-examples-perform-actions-pt).
      + To restrict a role's ability to create provisioned models from specific foundation or custom models, modify the `Resource` list to contain only the [foundation and custom models](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html#amazonbedrock-resources-for-iam-policies) that you want to allow access to.

   1. Follow the steps at [Adding and removing IAM identity permissions](https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies_manage-attach-detach.html) to attach the policy to a role to grant the role permissions.

1. If you're purchasing Provisioned Throughput for a custom model that's encrypted with a customer-managed AWS KMS key, your IAM role must have permissions to decrypt the key. You can use the template at [Understand how to create a customer managed key and how to attach a key policy to it](encryption-custom-job.md#encryption-key-policy). For minimal permissions, you can use only the *Permissions for custom model users* policy statement.

# Purchase a Provisioned Throughput for an Amazon Bedrock model
Purchase a Provisioned Throughput

Amazon Bedrock offers two types of Provisioned Throughput: by Tokens and by Model Units. Refer to the following instructions for the type of Provisioned Throughput that you wish to purchase.

To learn more about the differences between the two types of Provisioned Throughput, see [Increase model invocation capacity with Provisioned Throughput in Amazon Bedrock](prov-throughput.md).

## Provisioned Throughput by Model Units


When you purchase a Provisioned Throughput by Model Units for a model, you specify the level of commitment for it and the number of model units (MUs) to allot. For MU quotas, see [Amazon Bedrock endpoints and quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html) in the AWS General Reference. Before you can purchase a Provisioned Throughput (with commitment or no commitment), you must first visit the [AWS support center](https://console.aws.amazon.com/support/home#/case/create?issueType=service-limit-increase) to request MUs for your account, which you can then distribute across your Provisioned Throughputs. After your request has been granted, you can purchase a Provisioned Throughput.

**Note**  
After you purchase the Provisioned Throughput, if it's associated with a custom model, you can change the model by specifying one of the following options:  
The base model from which the custom model was customized
Another custom model that was customized from the same base model as the custom model
You can only change the associated model for Provisioned Throughputs associated with a custom model.

To learn how to purchase Provisioned Throughput for a model, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. Select **Provisioned Throughput** from the left navigation pane.

1. In the **Provisioned Throughput** section, choose **Purchase Provisioned Throughput**.

1. For the **Provisioned Throughput details** section, do the following:

   1. In the **Provisioned Throughput name** field, enter a name for the Provisioned Throughput.

   1. Under **Select model**, select a base model provider or a custom model category. Then select the model for which to provision throughput.
**Note**  
To see the base models for which you can purchase Provisioned Throughput without commitment, see [Supported Region and models for Provisioned Throughput](prov-thru-supported.md).  
In the AWS GovCloud (US) Region, you can only purchase Provisioned Throughput for custom models with no commitment.

   1. (Optional) To associate tags with your Provisioned Throughput, expand the **Tags** section and choose **Add new tag**. For more information, see [Tagging Amazon Bedrock resources](tagging.md).

1. For **Provisioning mode**, select **By Model Units**.

1. For the **Commitment term & model units** section, do the following:

   1. In the **Select commitment term** section, select the amount of time for which you want to commit to using the Provisioned Throughput.

   1. In the **Model units** field, enter the desired number of model units (MUs). If you are provisioning a model with commitment, you must first visit the [AWS support center](https://console.aws.amazon.com/support/home#/case/create?issueType=service-limit-increase) to request an increase in the number of MUs that you can purchase.

1. Choose **Purchase Provisioned Throughput**.

1. Review the note that appears and acknowledge the commitment duration and price by selecting the checkbox. Then choose **Confirm purchase**.

1. The console displays the **Provisioned Throughput** overview page. The **Status** of the Provisioned Throughput in the Provisioned Throughput table becomes **Creating**. When the Provisioned Throughput is finished being created, the **Status** becomes **In service**. If creation fails, the **Status** becomes **Failed**.

------
#### [ API ]

To purchase a Provisioned Throughput, send a [CreateProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateProvisionedModelThroughput.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp).

To learn more about the contents of the request body and the parameters you need to supply to create a Provisioned Throughput by Model Units, see [CreateProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateProvisionedModelThroughput.html) in the *Amazon Bedrock API Reference*.

**Note**  
To see the base models for which you can purchase Provisioned Throughput without commitment, see [Supported Region and models for Provisioned Throughput](prov-thru-supported.md).  
In the AWS GovCloud (US) Region, you can only purchase Provisioned Throughput for custom models with no commitment.

The response returns a `provisionedModelArn` that you can use as a `modelId` in [model inference](inference.md). To check when the Provisioned Throughput is ready for use, send a [GetProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetProvisionedModelThroughput.html) request and check that the status is `InService`. If creation fails, the status will be `Failed`, and the [GetProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetProvisionedModelThroughput.html) response will contain a `failureMessage`.
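
For example, the following AWS SDK for Python (Boto3) sketch purchases a no-commitment Provisioned Throughput by Model Units and then checks its status. The name and model ID are illustrative.

```
import boto3

# Sketch: purchase a no-commitment Provisioned Throughput by Model Units for a
# foundation model, then check its status. The name and model ID are placeholders.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_provisioned_model_throughput(
    provisionedModelName="my-provisioned-nova-lite",
    modelId="amazon.nova-lite-v1:0:24k",
    modelUnits=1,
    # Omit commitmentDuration for no commitment; otherwise specify
    # "OneMonth" or "SixMonths".
)
provisioned_model_arn = response["provisionedModelArn"]

status = bedrock.get_provisioned_model_throughput(
    provisionedModelId=provisioned_model_arn
)["status"]
print(status)  # Creating, InService, or Failed
```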

[See code examples](prov-thru-code-examples.md)

------

# View information about a Provisioned Throughput


To learn how to view information about a Provisioned Throughput that you've purchased, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

**To view information about a Provisioned Throughput**

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. Select **Provisioned Throughput** from the left navigation pane.

1. From the **Provisioned Throughput** section, select a Provisioned Throughput.

1. View the details for the Provisioned Throughput in the **Provisioned Throughput overview** section and the tags associated with your Provisioned Throughput in the **Tags** section.

------
#### [ API ]

To retrieve information about a specific Provisioned Throughput, send a [GetProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetProvisionedModelThroughput.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp). Specify either the name of the Provisioned Throughput or its ARN as the `provisionedModelId`.

To list information about all the Provisioned Throughputs in an account, send a [ListProvisionedModelThroughputs](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListProvisionedModelThroughputs.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp). To control the number of results that are returned, you can specify the following optional parameters:


****  

| Field | Short description | 
| --- | --- | 
| maxResults | The maximum number of results to return in a response. | 
| nextToken | If there are more results than the number you specified in the maxResults field, the response returns a nextToken value. To see the next batch of results, send the nextToken value in another request. | 

For other optional parameters that you can specify to sort and filter the results, see [ListProvisionedModelThroughputs](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListProvisionedModelThroughputs.html).

To list all the tags for a Provisioned Throughput, send a [ListTagsForResource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListTagsForResource.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) and include the Amazon Resource Name (ARN) of the Provisioned Throughput.
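
For example, the following AWS SDK for Python (Boto3) sketch retrieves one Provisioned Throughput, lists all Provisioned Throughputs in the account, and lists the tags on one. The Provisioned Throughput name is a placeholder.

```
import boto3

# Sketch: view details, listings, and tags for Provisioned Throughputs.
# The Provisioned Throughput name is a placeholder.
bedrock = boto3.client("bedrock", region_name="us-east-1")

details = bedrock.get_provisioned_model_throughput(
    provisionedModelId="my-provisioned-nova-lite"
)
print(details["status"], details["modelUnits"])

listing = bedrock.list_provisioned_model_throughputs(maxResults=10)
for summary in listing["provisionedModelSummaries"]:
    print(summary["provisionedModelName"], summary["status"])

tags = bedrock.list_tags_for_resource(resourceARN=details["provisionedModelArn"])
print(tags["tags"])
```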

[See code examples](prov-thru-code-examples.md)

------

# Modify a Provisioned Throughput


The aspects of a Provisioned Throughput you can edit after purchase depend on the provisioning mode. For Provisioned Throughputs by Model Units, you can edit only the name and tags of your Provisioned Throughput, and the model if it's a custom model.

With Provisioned Throughputs by Tokens, you have more options, including modifying the number of input and output tokens per minute for your Provisioned Throughput.

Refer to the following sections to learn more about editing the type of Provisioned Throughput you want to modify.

## Modify a Provisioned Throughput by Model Units


You can edit the name or tags of an existing Provisioned Throughput.

The following restrictions apply to changing the model that the Provisioned Throughput is associated with:
+ You can't change the model for a Provisioned Throughput associated with a base model.
+ If the Provisioned Throughput is associated with a custom model, you can change the association to the base model that it's customized from, or to another custom model that was derived from the same base model. 

While a Provisioned Throughput is updating, you can run inference using the Provisioned Throughput without disrupting the ongoing traffic from your end customers. If you changed the model that the Provisioned Throughput is associated with, you might receive output from the old model until the update is fully deployed.

To learn how to edit a Provisioned Throughput, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. Select **Provisioned Throughput** from the left navigation pane.

1. From the **Provisioned Throughput** section, select a Provisioned Throughput.

1. Choose **Edit**. You can edit the following fields:
   + **Provisioned Throughput name** – Change the name of the Provisioned Throughput.
   + **Select model** – If the Provisioned Throughput is associated with a custom model, you can change the associated model.

1. You can edit the tags associated with your Provisioned Throughput in the **Tags** section. For more information, see [Tagging Amazon Bedrock resources](tagging.md).

1. To save your changes, choose **Save edits**.

1. The console displays the **Provisioned Throughput** overview page. The **Status** of the Provisioned Throughput in the Provisioned Throughput table becomes **Updating**. When the Provisioned Throughput is finished being updated, the **Status** becomes **In service**. If the update fails, the **Status** becomes **Failed**.

------
#### [ API ]

To edit a Provisioned Throughput, send an [UpdateProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_UpdateProvisionedModelThroughput.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp).

To learn more about the request body and the parameters you need to supply, see [UpdateProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_UpdateProvisionedModelThroughput.html) in the *Amazon Bedrock API Reference*.

If the action is successful, the response returns an HTTP 200 status response. To check when the Provisioned Throughput is ready for use, send a [GetProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetProvisionedModelThroughput.html) request and check that the status is `InService`. You can't update or delete a Provisioned Throughput while its status is `Updating`. If the update fails, its status will be `Failed`, and the [GetProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetProvisionedModelThroughput.html) response will contain a `failureMessage`.

To add tags to a Provisioned Throughput, send a [TagResource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_TagResource.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) and include the Amazon Resource Name (ARN) of the Provisioned Throughput. The request body contains a `tags` field, which is an object containing a key-value pair that you specify for each tag.

To remove tags from a Provisioned Throughput, send an [UntagResource](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_UntagResource.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) and include the Amazon Resource Name (ARN) of the Provisioned Throughput. The `tagKeys` request parameter is a list containing the keys for the tags that you want to remove.
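
For example, the following AWS SDK for Python (Boto3) sketch renames a Provisioned Throughput. The names are placeholders, and the commented-out `desiredModelId` line shows where you would specify a different custom model derived from the same base model.

```
import boto3

# Sketch: rename a Provisioned Throughput. For one backed by a custom model,
# desiredModelId can point to another model customized from the same base model.
# The names and ARN are placeholders.
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.update_provisioned_model_throughput(
    provisionedModelId="my-provisioned-nova-lite",
    desiredProvisionedModelName="my-provisioned-nova-lite-v2",
    # desiredModelId="arn:aws:bedrock:us-east-1:111122223333:custom-model/EXAMPLE",
)
```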

[See code examples](prov-thru-code-examples.md)

------

# Use a Provisioned Throughput with an Amazon Bedrock resource
Use a Provisioned Throughput

After you purchase a Provisioned Throughput, you can use it with the following features:
+ **Model inference** – You can test the Provisioned Throughput in an Amazon Bedrock console playground. When you're ready to deploy the Provisioned Throughput, set up your application to invoke the provisioned model. Choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

**To use a Provisioned Throughput in the Amazon Bedrock console playground**

  1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

  1. From the left navigation pane, select **Chat**, **Text**, or **Image** under **Playgrounds**, depending on your use case.

  1. Choose **Select model**.

  1. In the **1. Category** column, select a provider or custom model category. Then, in the **2. Model** column, select the model that your Provisioned Throughput is associated with.

  1. In the **3. Throughput** column, select your Provisioned Throughput.

  1. Choose **Apply**.

  To learn how to use the Amazon Bedrock playgrounds, see [Generate responses in the console using playgrounds](playgrounds.md).

------
#### [ API ]

  To run inference using a Provisioned Throughput, send an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html), [InvokeModelWithResponseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModelWithResponseStream.html), [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html), or [ConverseStream](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_ConverseStream.html) request with an [Amazon Bedrock runtime endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-rt). Specify the provisioned model ARN as the `modelId` parameter. To see requirements for the request body for different models, see [Inference request parameters and response fields for foundation models](model-parameters.md).

  [See code examples](prov-thru-code-examples.md)

------
+ **Associate a Provisioned Throughput with an agent alias** – You can associate a Provisioned Throughput when you [create](agents-deploy.md) or [update](agents-alias-edit.md) an agent alias. In the Amazon Bedrock console, you choose the Provisioned Throughput when setting up or editing the alias. In the Amazon Bedrock API, you specify the `provisionedThroughput` in the `routingConfiguration` when you send a [CreateAgentAlias](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_CreateAgentAlias.html) or [UpdateAgentAlias](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_agent_UpdateAgentAlias.html) request.

# Delete a Provisioned Throughput or cancel auto renew


Your Provisioned Throughput will automatically renew at the end of each commitment term, maintaining your current input and output token configuration.

If you don't want to keep your Provisioned Throughput, you can delete it or, for Provisioned Throughput by Tokens, cancel auto renew to prevent it from renewing when the current term ends.

## Deleting a Provisioned Throughput


When you delete a Provisioned Throughput, you'll no longer be able to invoke the model at the throughput level that you purchased it for. If you delete a Provisioned Throughput associated with a custom model, the custom model isn't deleted. To learn how to delete a custom model, see [Delete a custom model](model-customization-delete.md).

**Note**  
You can't delete a Provisioned Throughput by Model Units with commitment before the commitment term is complete.

To learn how to delete a Provisioned Throughput, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. Select **Provisioned Throughput** from the left navigation pane.

1. From the **Provisioned Throughput** section, select a Provisioned Throughput.

1. Choose **Delete** from the **Actions** dropdown menu.

1. The console displays a modal form to warn you that deletion is permanent. Choose **Confirm** to proceed.

1. The Provisioned Throughput is immediately deleted.

------
#### [ API ]

To delete a Provisioned Throughput, send a [DeleteProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_DeleteProvisionedModelThroughput.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp). Specify either the name of the Provisioned Throughput or its ARN as the `provisionedModelId`. If deletion is successful, the response returns an HTTP 200 status code.
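
For example, the following AWS SDK for Python (Boto3) sketch deletes a Provisioned Throughput by name. The name is a placeholder.

```
import boto3

# Sketch: delete a Provisioned Throughput. The name is a placeholder.
bedrock = boto3.client("bedrock", region_name="us-east-1")

bedrock.delete_provisioned_model_throughput(
    provisionedModelId="my-provisioned-nova-lite"
)
```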

[See code examples](prov-thru-code-examples.md)

------

## Canceling auto renew for a Provisioned Throughput


For Provisioned Throughput by Tokens, you can cancel auto renew at any point before your commitment term ends to prevent a Provisioned Throughput from automatically renewing.

If you cancel auto renew, your Provisioned Throughput will remain in service until the end of your commitment term. You will still be charged the full provision fee for your current term, whether you run inference or not.

After you cancel auto renew for a Provisioned Throughput, you can't make any further modifications to your Provisioned Throughput for the remainder of the commitment term.

**Note**  
Auto renew cannot be re-enabled once cancelled. If you need Provisioned Throughput after your current term expires, you will need to purchase a new Provisioned Throughput.

To learn how to cancel auto renew for a Provisioned Throughput by Tokens, choose the tab for your preferred method, and then follow the steps:

------
#### [ Console ]

1. Sign in to the AWS Management Console with an IAM identity that has permissions to use the Amazon Bedrock console. Then, open the Amazon Bedrock console at [https://console.aws.amazon.com/bedrock](https://console.aws.amazon.com/bedrock).

1. Select **Provisioned Throughput** from the left navigation pane.

1. From the **Provisioned Throughput** section, select a Provisioned Throughput.

1. Choose **Cancel auto renew** from the **Actions** dropdown menu.

1. The console displays a modal form to warn you that this action cannot be undone. Choose **Confirm** to proceed.

1. The Provisioned Throughput will remain active until the end of the current commitment term, after which it will be automatically deleted.

------
#### [ API ]

To cancel auto renew for a Provisioned Throughput, send an [UpdateProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_UpdateProvisionedModelThroughput.html) request with an [Amazon Bedrock control plane endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-cp) with the `disableAutoRenew` parameter set to `true`. The Provisioned Throughput will remain active until the end of the current commitment term.

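As a rough illustration, the following Boto3 snippet sends that request. It assumes the SDK exposes the `disableAutoRenew` parameter under the same name that the API uses; check the API reference for the exact parameter shape in your SDK version.

```
import boto3

bedrock = boto3.client(service_name="bedrock")

# Turn off auto renew for an existing Provisioned Throughput by Tokens.
# "MyPT" is a placeholder; you can also pass the Provisioned Throughput ARN.
bedrock.update_provisioned_model_throughput(
    provisionedModelId="MyPT",
    disableAutoRenew=True,
)
```
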
[See code examples](prov-thru-code-examples.md)

------

# Code examples for Provisioned Throughput
Code examples

The following code examples demonstrate how to create a Provisioned Throughput and how to manage and invoke it, using the AWS CLI and the Python SDK. You can create a Provisioned Throughput from a foundation model or from a model that you've already customized. Before you get started, carry out the following prerequisites:

**Prerequisites**

The following examples use the Amazon Nova Lite model, whose model ID is `amazon.nova-lite-v1:0:24k`. If you haven't already, request access to Amazon Nova Lite by following the steps at [Manage model access using SDK and CLI](model-access.md#model-access-modify).

If you want to purchase Provisioned Throughput for a different foundation model or a custom model, you'll have to do the following:

1. Find the model's ID (for foundation models), name (for custom models), or ARN (for either) by doing one of the following:
   + If you're purchasing a Provisioned Throughput for a foundation model, find the ID or Amazon Resource Name (ARN) of a model that supports provisioning in one of the following ways:
     + Look up the value in the table.
     + Send a [ListFoundationModels](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListFoundationModels.html) request and specify the `byInferenceType` value as `PROVISIONED` to see a list of models that support provisioning. Find the value in the `modelId` or `modelArn` field.
   + If you're purchasing a Provisioned Throughput for a custom model, find the name or Amazon Resource Name (ARN) of the model that you customized in one of the following ways:
     + In the Amazon Bedrock console, choose **Custom models** from the left navigation pane. Find the name of your customized model in the **Models** list or select it and find the **Model ARN** in the **Model details**.
     + Send a [ListCustomModels](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_ListCustomModels.html) request and find the `modelName` or `modelArn` value of your custom model in the response.

1. Modify the `body` of the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) request in the examples below to match the format of the body of the model by finding it in [Inference request parameters and response fields for foundation models](model-parameters.md).

Choose the tab for your preferred method, and then follow the steps:

------
#### [ AWS CLI ]

1. Send a [CreateProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateProvisionedModelThroughput.html) request to create a no-commitment Provisioned Throughput called *MyPT*, by running the following command in a terminal:

   ```
   aws bedrock create-provisioned-model-throughput \
      --model-units 1 \
      --provisioned-model-name MyPT \
      --model-id amazon.nova-lite-v1:0:24k
   ```

1. The response returns a `provisioned-model-arn`. Allow some time for the creation to complete. To check its status, send a [GetProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_GetProvisionedModelThroughput.html) request and provide the name or ARN of the provisioned model as the `provisioned-model-id`, by running the following command:

   ```
   aws bedrock get-provisioned-model-throughput \
       --provisioned-model-id ${provisioned-model-arn}
   ```

1. Run inference with your provisioned model by sending an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) request. Provide the ARN of the provisioned model that was returned in the `CreateProvisionedModelThroughput` response, as the `model-id`. The output is written to a file named *output.txt* in your current folder.

   ```
   aws bedrock-runtime invoke-model \
       --model-id ${provisioned-model-arn} \
       --body '{
                   "messages": [{
                       "role": "user",
                       "content": [{
                           "text": "Hello"
                       }]
                   }],
                   "inferenceConfig": {
                       "temperature":0.7
                   }
               }' \
       --cli-binary-format raw-in-base64-out \
       output.txt
   ```

1. Send a [DeleteProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_DeleteProvisionedModelThroughput.html) request to delete the Provisioned Throughput using the following command. You'll no longer be charged for the Provisioned Throughput.

   ```
   aws bedrock delete-provisioned-model-throughput \
       --provisioned-model-id MyPT
   ```

------
#### [ Python (Boto) ]

The following code snippets walk you through creating a Provisioned Throughput, getting information about it, and invoking it.

1. To create a no-commitment Provisioned Throughput called *MyPT* and assign the ARN of the Provisioned Throughput to a variable called *provisioned_model_arn*, send the following [CreateProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateProvisionedModelThroughput.html) request:

   ```
   import boto3 
   
   provisioned_model_name = 'MyPT'
   
   bedrock = boto3.client(service_name='bedrock')
   response = bedrock.create_provisioned_model_throughput(
       modelUnits=1,
       provisionedModelName=provisioned_model_name, 
       modelId='amazon.nova-lite-v1:0:24k' 
   )
                           
   provisioned_model_arn = response['provisionedModelArn']
   ```

1. Allow some time for the creation to complete. You can check its status with the following code snippet. You can provide either the name of the Provisioned Throughput or the ARN returned from the [CreateProvisionedModelThroughput](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_CreateProvisionedModelThroughput.html) response as the `provisionedModelId`.

   ```
   bedrock.get_provisioned_model_throughput(provisionedModelId=provisioned_model_name)
   ```

1. Run inference with your provisioned model by running the following code, providing the ARN of the provisioned model as the `modelId`. Note that the request body and response parsing in this example follow the Amazon Titan Text format; adapt them to the model you're using, as described in the prerequisites.

   ```
   import json
   import logging
   import boto3
   
   from botocore.exceptions import ClientError
   
   
   class ImageError(Exception):
       "Custom exception for errors returned by the model"
   
       def __init__(self, message):
           self.message = message
   
   
   logger = logging.getLogger(__name__)
   logging.basicConfig(level=logging.INFO)
   
   
   def generate_text(model_id, body):
       """
       Generate text using your provisioned custom model.
       Args:
           model_id (str): The model ID to use.
           body (str) : The request body to use.
       Returns:
           response (json): The response from the model.
       """
   
       logger.info(
           "Generating text with your provisioned custom model %s", model_id)
   
       brt = boto3.client(service_name='bedrock-runtime')
   
       accept = "application/json"
       content_type = "application/json"
   
       response = brt.invoke_model(
           body=body, modelId=model_id, accept=accept, contentType=content_type
       )
       response_body = json.loads(response.get("body").read())
   
       finish_reason = response_body.get("error")
   
       if finish_reason is not None:
           raise ImageError(f"Text generation error. Error is {finish_reason}")
   
       logger.info(
           "Successfully generated text with provisioned custom model %s", model_id)
   
       return response_body
   
   
   def main():
       """
       Entrypoint for example.
       """
       try:
           logging.basicConfig(level=logging.INFO,
                               format="%(levelname)s: %(message)s")
   
           model_id = "provisioned-model-arn"  # Replace with the ARN of your Provisioned Throughput.
   
           body = json.dumps({
               "inputText": "what is AWS?"
           })
   
           response_body = generate_text(model_id, body)
           print(f"Input token count: {response_body['inputTextTokenCount']}")
   
           for result in response_body['results']:
               print(f"Token count: {result['tokenCount']}")
               print(f"Output text: {result['outputText']}")
               print(f"Completion reason: {result['completionReason']}")
   
       except ClientError as err:
           message = err.response["Error"]["Message"]
           logger.error("A client error occurred: %s", message)
           print("A client error occurred: " +
                 format(message))
       except ImageError as err:
           logger.error(err.message)
           print(err.message)
   
       else:
           print(
               f"Finished generating text with your provisioned custom model {model_id}.")
   
   
   if __name__ == "__main__":
       main()
   ```

1. Delete the Provisioned Throughput with the following code snippet. You'll no longer be charged for the Provisioned Throughput.

   ```
   bedrock.delete_provisioned_model_throughput(provisionedModelId=provisioned_model_name)
   ```

------

# Quotas for Amazon Bedrock
Quotas

Your AWS account has default quotas, formerly referred to as limits, for Amazon Bedrock. To view service quotas for Amazon Bedrock, do one of the following:
+ Follow the steps at [Viewing service quotas](https://docs.aws.amazon.com/servicequotas/latest/userguide/gs-request-quota.html) and select **Amazon Bedrock** as the service.
+ Refer to the [Amazon Bedrock service quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock) in the AWS General Reference.

Model inference in Amazon Bedrock is controlled by quotas on token usage. Some models deplete these token quotas at a higher rate than others (the burndown rate). For more information about these rates and how to optimize your token usage, see [How tokens are counted in Amazon Bedrock](quotas-token-burndown.md).

To maintain the performance of the service and to ensure appropriate usage of Amazon Bedrock, the default quotas assigned to an account might be updated depending on regional factors, payment history, fraudulent usage, and/or approval of a [quota increase request](quotas-increase.md).

**Topics**
+ [How tokens are counted in Amazon Bedrock](quotas-token-burndown.md)
+ [Monitor your token usage by counting tokens before running inference](count-tokens.md)
+ [Request an increase for Amazon Bedrock quotas](quotas-increase.md)

# How tokens are counted in Amazon Bedrock
How tokens are counted

When you run model inference, there are quotas on the number of tokens that can be processed depending on which Amazon Bedrock model you use. Review the following terminology related to token quotas:


| Term | Definition | 
| --- | --- | 
| InputTokenCount | The CloudWatch Amazon Bedrock runtime metric that represents the number of tokens in a request provided as input to the model. | 
| OutputTokenCount | The CloudWatch Amazon Bedrock runtime metric that represents the number of tokens generated by the model in response to a request. | 
| CacheReadInputTokens | The CloudWatch Amazon Bedrock runtime metric that represents the number of input tokens that were successfully retrieved from a cache instead of being reprocessed by the model. This value is 0 if you don't use [prompt caching](prompt-caching.md). | 
| CacheWriteInputTokens | The CloudWatch Amazon Bedrock runtime metric that represents the number of input tokens that were successfully written into the cache. This value is 0 if you don't use [prompt caching](prompt-caching.md). | 
| Tokens per minute (TPM) | A quota set by AWS at the model level on the number of tokens (including both input and output) that you can use in one minute. | 
| Tokens per day (TPD) | A quota set by AWS at the model level on the number of tokens (including both input and output) that you can use in one day. By default, this value is TPM x 24 x 60. However, new AWS accounts have reduced quotas. | 
| Requests per minute (RPM) | A quota set by AWS at the model level on the number of requests that you can send in one minute. | 
| max_tokens | A parameter you provide in your request to set the maximum number of output tokens that the model can generate. | 
| Burndown rate | The rate at which input and output tokens are converted into token quota usage for the throttling system. | 

The burndown rate for Anthropic Claude models version 3.7 and later is **5x for output tokens** (1 output token consumes 5 tokens from your quotas).

For all other models, the burndown rate is **1:1** (1 output token consumes 1 token from your quota).

**Topics**
+ [Understanding token quota management](#quotas-token-burndown-management)
+ [Understanding the impact of the max_tokens parameter](#quotas-token-burndown-max-tokens)
+ [Optimizing the max_tokens parameter](#quotas-token-burndown-max-tokens-optimize)

## Understanding token quota management


When you make a request, tokens are deducted from your TPM and TPD quotas. Calculations occur at the following stages:
+ **At the start of the request** – Assuming that you haven't exceeded your RPM quota, the following sum is deducted from your quotas. The request is throttled if you exceed a quota.

  ```
  Total input tokens + max_tokens
  ```
+ **During processing** – The quota consumed by the request is periodically adjusted to account for the actual number of output tokens generated.
+ **At the end of the request** – The total number of tokens consumed by the request will be calculated as follows and any unused tokens are replenished to your quota:

  ```
  InputTokenCount + CacheWriteInputTokens + (OutputTokenCount x burndown rate)
  ```

  If you don't use [prompt caching](prompt-caching.md), `CacheWriteInputTokens` will be 0. `CacheReadInputTokens` don't contribute to this calculation.

**Note**  
You're only billed for your actual token usage.  
For example, if you use Anthropic Claude Sonnet 4 and send a request containing 1,000 input tokens and it generates a response equivalent to 100 tokens:  
**1,500 tokens** (1,000 + 100 x 5) will be depleted from your TPM and TPD quotas.  
You'll only be billed for **1,100 tokens**.

## Understanding the impact of the max_tokens parameter


The `max_tokens` value is deducted from your quota at the beginning of each request. If you’re hitting TPM quotas earlier than expected, try reducing `max_tokens` to better approximate the size of your completions.

The following scenarios show how quota deductions work for completed requests to a model with a 5x burndown rate for output tokens:

### Scenario 1: High max_tokens value


Assume the following parameters:
+ **InputTokenCount:** 3,000
+ **CacheReadInputTokens:** 4,000
+ **CacheWriteInputTokens:** 1,000
+ **OutputTokenCount:** 1,000
+ **max_tokens:** 32,000

The following quota deductions take place:
+ **Initial deduction when request is made:** 40,000 (= 3,000 + 4,000 + 1,000 + 32,000)
+ **Final adjusted deduction after response is generated:** 9,000 (= 3,000 + 1,000 + 1,000 x 5)

In this scenario, the `max_tokens` parameter was set too high, so the initial deduction is far larger than what the request actually consumed. This reduces request concurrency, throughput, and quota utilization, because the TPM quota is reached quickly.

### Scenario 2: Optimized max_tokens value


Assume the following parameters:
+ **InputTokenCount:** 3,000
+ **CacheReadInputTokens:** 4,000
+ **CacheWriteInputTokens:** 1,000
+ **OutputTokenCount:** 1,000
+ **max_tokens:** 1,250

The following quota deductions take place:
+ **Initial deduction when request is made:** 9,250 (= 3,000 + 4,000 + 1,000 + 1,250)
+ **Final adjusted deduction after response is generated:** 9,000 (= 3,000 + 1,000 + 1,000 x 5)

In this scenario, the `max_tokens` parameter was optimized, because the initial deduction is only slightly higher than the final adjusted deduction. This helped increase the request concurrency, throughput, and quota utilization.
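
To make the arithmetic concrete, the following Python sketch reproduces the deductions in both scenarios from the formulas in this section. The function and variable names are illustrative only.

```
def initial_deduction(input_tokens, cache_read, cache_write, max_tokens):
    """Quota deducted at the start of the request: total input tokens + max_tokens."""
    return input_tokens + cache_read + cache_write + max_tokens


def final_deduction(input_tokens, cache_write, output_tokens, burndown_rate=5):
    """Quota consumed once the response completes; unused tokens are replenished."""
    return input_tokens + cache_write + output_tokens * burndown_rate


# Scenario 1: max_tokens set too high
print(initial_deduction(3000, 4000, 1000, 32000))  # 40000
print(final_deduction(3000, 1000, 1000))           # 9000

# Scenario 2: optimized max_tokens
print(initial_deduction(3000, 4000, 1000, 1250))   # 9250
print(final_deduction(3000, 1000, 1000))           # 9000
```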

## Optimizing the max_tokens parameter


By optimizing the `max_tokens` parameter, you can efficiently utilize your allocated quota capacity. To help inform your decision about this parameter, you can use Amazon CloudWatch, which automatically collects metrics from AWS services, including token usage data in Amazon Bedrock.

Tokens are recorded in the `InputTokenCount` and `OutputTokenCount` runtime metrics (for more metrics, see [Amazon Bedrock runtime metrics](monitoring.md#runtime-cloudwatch-metrics)). You can view these metrics in the console as described in the following steps, or retrieve them programmatically with the sketch that follows the steps.

To use CloudWatch monitoring to inform your decision of the `max_tokens` parameter, do the following in the AWS Management Console:

1. Sign in to the Amazon CloudWatch console at [https://console.aws.amazon.com/cloudwatch](https://console.aws.amazon.com/cloudwatch).

1. From the left navigation pane, select **Dashboards**.

1. Select the **Automatic dashboards** tab.

1. Select **Bedrock**.

1. In the **Token Counts by Model** dashboard, select the expand icon.

1. Select a time duration and range parameters for the metrics to account for peak usage.

1. From the dropdown menu labeled **Sum**, you can choose different metrics to observe your token usage. Examine these metrics to guide your decision on setting your `max_tokens` value.
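
If you prefer to retrieve these metrics programmatically, the following Boto3 sketch queries CloudWatch for them. It assumes the metrics are published under the `AWS/Bedrock` namespace with a `ModelId` dimension; the model ID and time window are placeholders.

```
from datetime import datetime, timedelta, timezone

import boto3

cloudwatch = boto3.client("cloudwatch")

# Look at the last 7 days of output token usage for a single model.
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Bedrock",
    MetricName="OutputTokenCount",
    Dimensions=[{"Name": "ModelId", "Value": "anthropic.claude-3-7-sonnet-20250219-v1:0"}],
    StartTime=start,
    EndTime=end,
    Period=3600,  # one datapoint per hour
    Statistics=["Average", "Maximum"],
)

# The hourly Maximum is the largest single observation in that hour, which
# can suggest an upper bound to consider when setting max_tokens.
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Average"], point["Maximum"])
```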

# Monitor your token usage by counting tokens before running inference
Count tokens to monitor usage and cost

When you run model inference, the number of tokens that you send in the input contributes to the cost of the request and towards the quota of tokens that you can use per minute and per day. The [CountTokens](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_CountTokens.html) API helps you estimate token usage before sending requests to foundation models by returning the token count that would be used if the same input were sent to the model in an inference request.

**Note**  
Using the [CountTokens](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_CountTokens.html) API doesn't incur charges.

Token counting is model-specific because different models use different tokenization strategies. The token count returned by this operation will match the token count that would be charged if the same input were sent to the model to run inference.

You can use the `CountTokens` API to do the following:
+ Estimate costs before sending inference requests.
+ Optimize prompts to fit within token limits.
+ Plan for token usage in your applications.

**Topics**
+ [Supported models and Regions for token counting](#count-tokens-supported)
+ [Count tokens in a request](#count-tokens-use)
+ [Try an example](#count-tokens-example)

## Supported models and Regions for token counting


The following table shows foundation model support for token counting:


| Provider | Model | Model ID | Single-region model support | 
| --- | --- | --- | --- | 
| Anthropic | Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022-v1:0 |  us-west-2  | 
| Anthropic | Claude 3.5 Sonnet | anthropic.claude-3-5-sonnet-20240620-v1:0 |  ap-northeast-1 ap-southeast-1 eu-central-1 eu-central-2 us-east-1 us-west-2  | 
| Anthropic | Claude 3.5 Sonnet v2 | anthropic.claude-3-5-sonnet-20241022-v2:0 |  ap-southeast-2 us-west-2  | 
| Anthropic | Claude 3.7 Sonnet | anthropic.claude-3-7-sonnet-20250219-v1:0 |  eu-west-2  | 
| Anthropic | Claude Opus 4 | anthropic.claude-opus-4-20250514-v1:0 |  | 
| Anthropic | Claude Sonnet 4 | anthropic.claude-sonnet-4-20250514-v1:0 |  | 

## Count tokens in a request


To count the number of input tokens in an inference request, send a [CountTokens](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_CountTokens.html) request with an [Amazon Bedrock runtime endpoint](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#br-rt). Specify the model in the header and the input to count tokens for in the `body` field. The value of the `body` field depends on whether you're counting input tokens for an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) or [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) request:
+ For an `InvokeModel` request, the format of the `body` is a string representing a JSON object whose format depends on the model that you specify.
+ For a `Converse` request, the format of the `body` is a JSON object specifying the `messages` and `system` prompts included in the conversation.

## Try an example
Try an example

The examples in this section let you count tokens for an `InvokeModel` and a `Converse` request with Anthropic Claude 3.5 Haiku.

**Prerequisites**
+ You've downloaded AWS SDK for Python (Boto3) and your configuration is set up such that your credentials and default AWS Region are automatically recognized.
+ Your IAM identity has permissions for the following actions (for more information, see [Action, resources, and condition keys for Amazon Bedrock](https://docs.aws.amazon.com/service-authorization/latest/reference/list_amazonbedrock.html)):
  + bedrock:CountTokens – Allows the usage of `CountTokens`.
  + bedrock:InvokeModel – Allows the usage of `InvokeModel` and `Converse`. Should be scoped to *arn:${Partition}:bedrock:${Region}::foundation-model/anthropic.claude-3-5-haiku-20241022-v1:0*, at minimum.

To try out counting tokens for an [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) request, run the following Python code:

```
import boto3
import json

bedrock_runtime = boto3.client("bedrock-runtime")

input_to_count = json.dumps({
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 500,
    "messages": [
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
})

response = bedrock_runtime.count_tokens(
    modelId="anthropic.claude-3-5-haiku-20241022-v1:0",
    input={
        "invokeModel": {
            "body": input_to_count
        }
    }
)

print(response["inputTokens"])
```

To try out counting tokens for a [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) request, run the following Python code:

```
import boto3
import json 

bedrock_runtime = boto3.client("bedrock-runtime")

input_to_count = {
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "text": "What is the capital of France?"
                }
            ]
        },
        {
            "role": "assistant",
            "content": [
                {
                    "text": "The capital of France is Paris."
                }
            ]
        },
        {
            "role": "user",
            "content": [
                {
                    "text": "What is its population?"
                }
            ]
        }
    ],
    "system": [
        {
            "text": "You're an expert in geography."
        }
    ]
}

response = bedrock_runtime.count_tokens(
    modelId="anthropic.claude-3-5-haiku-20241022-v1:0",
    input={
        "converse": input_to_count
    }
)

print(response["inputTokens"])
```

# Request an increase for Amazon Bedrock quotas


The steps for requesting a quota increase for your account depend on the value in the **Adjustable** column in the quotas table in [Amazon Bedrock service quotas](https://docs.aws.amazon.com/general/latest/gr/bedrock.html#limits_bedrock):
+ If a quota is marked as **Yes**, you can adjust it by following the steps at [Requesting a Quota Increase](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html) in the Service Quotas User Guide.
+ For any model, you can request an increase for the following quotas together:
  + Cross-Region InvokeModel tokens per minute for *${model}*
  + Cross-Region InvokeModel requests per minute for *${model}*
  + On-demand InvokeModel tokens per minute for *${model}*
  + On-demand InvokeModel requests per minute for *${model}*
  + Model invocation max tokens per day for *${model}*

  To request an increase for any combination of these quotas, request an increase for the **Cross-Region InvokeModel tokens per minute for *${model}*** quota by following the steps at [Requesting a Quota Increase](https://docs.aws.amazon.com/servicequotas/latest/userguide/request-quota-increase.html) in the Service Quotas User Guide. After you do so, the support team will reach out and offer you the option of also increasing the other four quotas.
**Note**  
Due to overwhelming demand, priority will be given to customers who generate traffic that consumes their existing quota allocation. Your request might be denied if you don't meet this condition.

# Prompt caching for faster model inference
Prompt caching

Prompt caching is an optional feature that you can use with supported models on Amazon Bedrock to reduce inference response latency and input token costs. By adding portions of your context to a cache, the model can use the cache to skip recomputation of those inputs, allowing Amazon Bedrock to share the compute savings with you and lower your response latencies.

Prompt caching can help when you have workloads with long and repeated contexts that are frequently reused for multiple queries. For example, if you have a chatbot where users can upload documents and ask questions about them, it can be time consuming for the model to process the document every time the user provides input. With prompt caching, you can cache the document so that future queries containing the document don't need to reprocess it.

When using prompt caching, you're charged at a reduced rate for tokens read from cache. Depending on the model, tokens written to cache may be charged at a rate that is higher than that of uncached input tokens. Any tokens not read from or written to cache are charged at the standard input token rate for that model. For more information, see the [Amazon Bedrock pricing page](https://aws.amazon.com/bedrock/pricing/).

## How it works


If you opt to use prompt caching, Amazon Bedrock creates a cache composed of *cache checkpoints*. These are markers that define the contiguous subsection of your prompt that you wish to cache (often referred to as a prompt prefix). These prompt prefixes should be static between requests; alterations to the prompt prefix in subsequent requests will result in cache misses.

Cache checkpoints have a minimum and maximum number of tokens, dependent on the specific model you're using. You can only create a cache checkpoint if your total prompt prefix meets the minimum number of tokens. For example, the Anthropic Claude 3.7 Sonnet model requires at least 1,024 tokens per cache checkpoint. That means that your first cache checkpoint can be defined after 1,024 tokens and your second cache checkpoint can be defined after 2,048 tokens. If you try to add a cache checkpoint before meeting the minimum number of tokens, your inference will still succeed, but your prefix will not be cached.

The cache has a Time To Live (TTL), which resets with each successful cache hit. During this period, the context in the cache is preserved. If no cache hits occur within the TTL window, your cache expires. Most models support a 5-minute TTL, while Claude Opus 4.5, Claude Haiku 4.5, and Claude Sonnet 4.5 also support an extended 1-hour TTL option.

You can use prompt caching anytime you get model inference in Amazon Bedrock for supported models. Prompt caching is supported by the following Amazon Bedrock features:

**Converse and ConverseStream APIs**  
You can carry on a conversation with a model where you specify cache checkpoints in your prompts.

**InvokeModel and InvokeModelWithResponseStream APIs**  
You can submit single prompt requests in which you enable prompt caching and specify your cache checkpoints.

**Prompt Caching with Cross-region Inference**  
Prompt caching can be used in conjunction with cross-region inference. Cross-region inference automatically selects the optimal AWS Region within your geography to serve your inference request, thereby maximizing available resources and model availability. At times of high demand, these optimizations may lead to increased cache writes.

**Amazon Bedrock Prompt management**  
When you [create](prompt-management-create.md) or [modify](prompt-management-modify.md) a prompt, you can choose to enable prompt caching. Depending on the model, you can cache system prompts, system instructions, and messages (user and assistant). You can also choose to disable prompt caching.

The APIs provide you with the most flexibility and granular control over the prompt cache. You can set an individual cache checkpoint within your prompts. You can add to the cache by creating more cache checkpoints, up to the maximum number of cache checkpoints allowed for the specific model. For more information, see [Supported models, Regions, and limits](#prompt-caching-models).

## Supported models, Regions, and limits


The following table lists the supported models along with their token minimums, maximum number of cache checkpoints, and fields that allow cache checkpoints.


| Model name | Model ID | Release Type | Minimum number of tokens per cache checkpoint | Maximum number of cache checkpoints per request | Supported TTL | Fields that accept prompt cache checkpoints | 
| --- | --- | --- | --- | --- | --- | --- | 
| Claude Opus 4.5 | anthropic.claude-opus-4-5-20251101-v1:0 | Generally Available | 4,096 | 4 | 5 minutes, 1 hour | `system`, `messages`, and `tools` | 
| Claude Opus 4.1 | anthropic.claude-opus-4-1-20250805-v1:0 | Generally Available | 1,024 | 4 | 5 minutes | `system`, `messages`, and `tools` | 
| Claude Opus 4 | anthropic.claude-opus-4-20250514-v1:0 | Generally Available | 1,024 | 4 | 5 minutes | `system`, `messages`, and `tools` | 
| Claude Sonnet 4.5 | anthropic.claude-sonnet-4-5-20250929-v1:0 | Generally Available | 1,024 | 4 | 5 minutes, 1 hour | `system`, `messages`, and `tools` | 
| Claude Haiku 4.5 | anthropic.claude-haiku-4-5-20251001-v1:0 | Generally Available | 4,096 | 4 | 5 minutes, 1 hour | `system`, `messages`, and `tools` | 
| Claude Sonnet 4 | anthropic.claude-sonnet-4-20250514-v1:0 | Generally Available | 1,024 | 4 | 5 minutes | `system`, `messages`, and `tools` | 
| Claude 3.7 Sonnet | anthropic.claude-3-7-sonnet-20250219-v1:0 | Generally Available | 1,024 | 4 | 5 minutes | `system`, `messages`, and `tools` | 
| Claude 3.5 Haiku | anthropic.claude-3-5-haiku-20241022-v1:0 | Generally Available | 2,048 | 4 | 5 minutes | `system`, `messages`, and `tools` | 
| Claude 3.5 Sonnet v2 | anthropic.claude-3-5-sonnet-20241022-v2:0 | Preview | 1,024 | 4 | 5 minutes | `system`, `messages`, and `tools` | 
| Amazon Nova Micro | amazon.nova-micro-v1:0 | Generally available | 1K1 | 4 | 5 minutes | `system` and `messages` | 
| Amazon Nova Lite | amazon.nova-lite-v1:0 | Generally available | 1K1 | 4 | 5 minutes | `system` and `messages`2 | 
| Amazon Nova Pro | amazon.nova-pro-v1:0 | Generally available | 1K1 | 4 | 5 minutes | `system` and `messages`2 | 
| Amazon Nova Premier | amazon.nova-premier-v1:0 | Generally available | 1K1 | 4 | 5 minutes | `system` and `messages`2 | 
| Amazon Nova 2 Lite | amazon.nova-2-lite-v1:0 | Generally available | 1K1 | 4 | 5 minutes | `system` and `messages`2 | 

1: The Amazon Nova models support a maximum number of 20K tokens for prompt caching.

2: Prompt caching is primarily for text prompts.

To use the 1-hour TTL option with supported models (Claude Opus 4.5, Claude Haiku 4.5, and Claude Sonnet 4.5), specify the `ttl` field in your cache checkpoint. In the Converse API, add `"ttl": "1h"` to your `cachePoint` object. In the InvokeModel API for Claude models, add `"ttl": "1h"` to your `cache_control` object. If no `ttl` value is provided, the default 5-minute caching behavior applies. The 1-hour TTL is useful for longer-running sessions or batch processing scenarios where you want to maintain the cache across extended periods.

Amazon Nova offers automatic prompt caching for all text prompts, including `User` and `System` messages. This mechanism can provide latency benefits when prompts begin with repetitive parts, even without explicit configuration. However, to unlock cost savings and ensure more consistent performance benefits, we recommend opting in to **Explicit Prompt Caching**.

## Simplified Cache Management for Claude Models


For Claude models, Amazon Bedrock offers a simplified approach to cache management that reduces the complexity of manually placing cache checkpoints. Instead of requiring you to specify exact cache checkpoint locations, you can use automatic cache management with a single breakpoint at the end of your static content.

When you enable simplified cache management, the system automatically checks for cache hits at previous content block boundaries, looking back up to approximately 20 content blocks from your specified breakpoint. This allows the model to find the longest matching prefix from your cache without requiring you to predict the optimal checkpoint locations. To use this, place a single cache checkpoint at the end of your static content, before any dynamic or variable content. The system will automatically find the best cache match.

For more granular control, you can still use multiple cache checkpoints (up to 4 for Claude models) to specify exact cache boundaries. Use multiple cache checkpoints if you're caching sections that change at different frequencies or want more control over exactly what gets cached. A sketch of the simplified, single-checkpoint approach follows the note below.

**Important**  
The automatic prefix checking only looks back approximately 20 content blocks from your cache checkpoint. If your static content extends beyond this range, consider using multiple cache checkpoints or restructuring your prompt to place the most frequently reused content within this range.
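
The following Boto3 sketch shows the simplified approach with the Converse API: a single cache checkpoint placed after the static document text and before the dynamic user question. The model ID and document text are placeholders, and the static portion must meet the model's minimum token count for a checkpoint to be created.

```
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Placeholder: a long, static document that stays identical across requests.
static_document = "..."  # must meet the model's minimum token count to be cached

response = bedrock_runtime.converse(
    modelId="anthropic.claude-3-7-sonnet-20250219-v1:0",
    messages=[
        {
            "role": "user",
            "content": [
                {"text": static_document},
                {"cachePoint": {"type": "default"}},  # single breakpoint at the end of static content
                {"text": "Summarize the key risks described in this document."},  # dynamic part
            ],
        }
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```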

## How to effectively use prompt caching


If you have prompts that are used at a regular cadence (for example, system prompts that are reused more frequently than every 5 minutes), continue to use the 5-minute cache, since it is refreshed at no additional charge.

The 1-hour cache is best used in the following scenarios:
+ When you have prompts that are likely used less frequently than 5 minutes, but more frequently than every hour. For example, when an agentic side-agent will take longer than 5 minutes, or when storing a long chat conversation with a user and you generally expect that user may not respond in the next 5 minutes.
+ When latency is important and your follow-up prompts may be sent beyond 5 minutes.
+ When you want to improve your rate limit utilization, since cache hits are not deducted against your rate limit.

You can use both 1-hour and 5-minute cache controls in the same request, but with an important constraint: cache entries with a longer TTL must appear before those with a shorter TTL (that is, a 1-hour cache entry must appear before any 5-minute cache entries).
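
The following Boto3 sketch illustrates that ordering with the Converse API: a 1-hour checkpoint on the system prompt followed by a default 5-minute checkpoint in the conversation. The model ID and prompt text are placeholders, and the `ttl` field assumes a model and SDK version that support the 1-hour option.

```
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

response = bedrock_runtime.converse(
    modelId="anthropic.claude-sonnet-4-5-20250929-v1:0",
    system=[
        {"text": "Long, rarely changing instructions go here..."},
        {"cachePoint": {"type": "default", "ttl": "1h"}},  # longer TTL appears first
    ],
    messages=[
        {
            "role": "user",
            "content": [
                {"text": "Conversation context that is reused within a few minutes..."},
                {"cachePoint": {"type": "default"}},  # default 5-minute TTL
                {"text": "What should I do next?"},
            ],
        }
    ],
)
print(response["output"]["message"]["content"][0]["text"])
```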

## Getting started


The following sections show you a brief overview of how to use the prompt caching feature for each method of interacting with models through Amazon Bedrock.

### Converse API


The [Converse](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_Converse.html) API provides advanced and flexible options for implementing prompt caching in multi-turn conversations. For more information about the prompt requirements for each model, see the preceding section [Supported models, Regions, and limits](#prompt-caching-models).

**Example request**

The following examples show a cache checkpoint set in the `messages`, `system`, or `tools` fields of a request to the Converse API. You can place checkpoints in any of these locations for a given request. For example, if sending a request to the Claude 3.5 Sonnet v2 model, you could place two cache checkpoints in `messages`, one cache checkpoint in `system`, and one in `tools`. For more detailed information and examples of structuring and sending Converse API requests, see [Carry out a conversation with the Converse API operations](conversation-inference.md).

Specify the desired `ttl` value as shown below. If no `ttl` value is specified, the default 5-minute caching behavior applies.

```
"cachePoint" : {
    "type": "default",
    "ttl" : "5m | 1h"
}
```

------
#### [ messages checkpoints ]

In this example, the `image` field provides an image to the model, and the `text` field asks the model to analyze the image. As long as the number of tokens preceding the `cachePoint` in the `content` object meets the minimum token count for the model, a cache checkpoint is created.

```
...
"messages": [
   {
        "role": "user",
        "content": [
            {
                "image": {
                    "bytes": "asfb14tscve..."
                }
            },
            {
                "text": "What's in this image?"
            },
            {
                "cachePoint": {
                    "type": "default"
                }
            }
      ]
  }
]
...
```

------
#### [ system checkpoints ]

In this example, you provide your system prompt in the `text` field. Additionally, you can add a `cachePoint` field to cache the system prompt.

```
...
  "system": [ 
    {
        "text": "You are an app that creates play lists for a radio station that plays rock and pop music. Only return song names and the artist. "
    },
    {
        "cachePoint": {
            "type": "default"
        }
    }
  ],
...
```

------
#### [ tools checkpoints ]

In this example, you provide your tool definition in the `toolSpec` field. (Alternatively, you can call a tool that you’ve previously defined. For more information, see [Use a tool to complete an Amazon Bedrock model response](tool-use.md).) Afterward, you can add a `cachePoint` field to cache the tool.

```
...
toolConfig={
    "tools": [
        {
            "toolSpec": {
                "name": "top_song",
                "description": "Get the most popular song played on a radio station.",
                "inputSchema": {
                    "json": {
                        "type": "object",
                        "properties": {
                            "sign": {
                                "type": "string",
                                "description": "The call sign for the radio station for which you want the most popular song. Example calls signs are WZPZ and WKRP."
                            }
                        },
                        "required": [
                            "sign"
                        ]
                    }
                }
            }
        },
        {
                "cachePoint": {
                    "type": "default"
                }
        }
    ]
}
...
```

------

The model response from the Converse API includes three new fields that are specific to prompt caching. The `CacheReadInputTokens` and `CacheWriteInputTokens` values tell you how many tokens were read from the cache and how many tokens were written to the cache as a result of your request. The `CacheDetails` value tells you the TTL that was used for the tokens written to the cache. You're charged for these cached tokens by Amazon Bedrock at a rate that's lower than the cost of full model inference.
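
As a minimal sketch, the cache-related values can be read from the `usage` object of a Converse response as follows. The lowercase field names (`cacheReadInputTokens`, `cacheWriteInputTokens`) are assumptions based on the Converse API's response shape; `get()` is used so the code tolerates responses that omit them.

```
def print_cache_usage(response):
    """Print token usage from a Converse API response dictionary."""
    usage = response["usage"]
    print("Input tokens:      ", usage["inputTokens"])
    print("Output tokens:     ", usage["outputTokens"])
    # Cache fields are only meaningful when prompt caching is in effect.
    print("Cache read tokens: ", usage.get("cacheReadInputTokens", 0))
    print("Cache write tokens:", usage.get("cacheWriteInputTokens", 0))
```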

### InvokeModel API


Prompt caching is enabled by default when you call the [InvokeModel](https://docs.aws.amazon.com/bedrock/latest/APIReference/API_runtime_InvokeModel.html) API. You can set cache checkpoints at any point in your request body, similar to the previous example for the Converse API.

------
#### [ Anthropic Claude ]

The following example shows how to structure the body of your InvokeModel request for the Anthropic Claude 3.5 Sonnet v2 model. Note that the exact format and fields of the body for InvokeModel requests may vary depending on the model you choose. To see the format and content of the request and response bodies for different models, see [Inference request parameters and response fields for foundation models](model-parameters.md).

Specify the desired `ttl` value as shown below. If no `ttl` value is specified, the default 5-minute caching behavior applies.

```
"cache_control" : {
    "type": "ephemeral",
    "ttl" : "5m | 1h"
}
```

```
body={
        "anthropic_version": "bedrock-2023-05-31",
        "system":"Reply concisely",
        "messages": [
            {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe the best way to learn programming."
                },
                {
                    "type": "text",
                    "text": "Add additional context here for the prompt that meets the minimum token requirement for your chosen model.",
                    "cache_control": {
                        "type": "ephemeral"
                    }
                }
            ]
            }
        ],
        "max_tokens": 2048,
        "temperature": 0.5,
        "top_p": 0.8,
        "stop_sequences": [
            "stop"
        ],
        "top_k": 250
}
```

------
#### [ Amazon Nova ]

The following example shows how to structure the body of your InvokeModel request for the Amazon Nova model. Note that the exact format and fields of the body for InvokeModel requests may vary depending on the model you choose. To see the format and content of the request and response bodies for different models, see [Inference request parameters and response fields for foundation models](model-parameters.md).

```
{
    "system": [{
        "text": "Reply Concisely"
    }],
    "messages": [{
        "role": "user",
        "content": [{
            "text": "Describe the best way to learn programming"
        },
        {
            "text": "Add additional context here for the prompt that meets the minimum token requirement for your chosen model.",
            "cachePoint": {
                "type": "default"
            }
        }]
    }],
    "inferenceConfig": {
        "maxTokens": 300,
        "topP": 0.1,
        "topK": 20,
        "temperature": 0.3
    }
}
```

------

For more information about sending an InvokeModel request, see [Submit a single prompt with InvokeModel](inference-invoke.md).

### Playground


In a chat playground in the Amazon Bedrock console, you can turn on the prompt caching option, and Amazon Bedrock automatically creates cache checkpoints for you.

Follow the instructions in [Generate responses in the console using playgrounds](playgrounds.md) to get started with prompting in an Amazon Bedrock playground. For supported models, prompt caching is automatically turned on in the playground. However, if it’s not, then do the following to turn on prompt caching:

1. In the left side panel, open the **Configurations** menu.

1. Turn on the **Prompt caching** toggle.

1. Run your prompts.

After your combined input and model responses reach the minimum required number of tokens for a checkpoint (which varies by model), Amazon Bedrock automatically creates the first cache checkpoint for you. As you continue chatting, each time you reach the minimum number of tokens again, a new checkpoint is created, up to the maximum number of checkpoints allowed for the model. You can view your cache checkpoints at any time by choosing **View cache checkpoints** next to the **Prompt caching** toggle, as shown in the following screenshot.

![\[UI toggle for prompt caching in an Amazon Bedrock text playground.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/prompt-caching/bedrock-prompt-caching-ui-toggle.png)


You can view how many tokens are being read from and written to the cache due to each interaction with the model by viewing the **Caching metrics** pop-up (![\[The metrics icon shown in model responses when prompt caching is enabled.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/prompt-caching/bedrock-prompt-caching-metrics-icon.png)) in the playground responses.

![\[Caching metrics box that shows the number of tokens read from and written to the cache.\]](http://docs.aws.amazon.com/bedrock/latest/userguide/images/prompt-caching/bedrock-prompt-caching-metrics.png)


If you turn off the prompt caching toggle while in the middle of a conversation, you can continue chatting with the model.