Rate limits and quotas
Claude Platform on AWS assigns Tier 1 rate limits when you subscribe. Anthropic manages rate limits directly, not through AWS quota systems.
Default limits
Claude Platform on AWS uses Anthropic’s standard tier schedule, identical to the first-party Claude API. Tier 1 limits apply per workspace. Limits are pooled by model family (for example, one combined limit covers Claude Opus 4.7, 4.6, 4.5, and earlier Opus models; Sonnet models share a separate combined limit; Haiku models share another).
For the current Tier 1 values for requests per minute (RPM), input tokens per minute (ITPM), and output tokens per minute (OTPM), and for higher-tier thresholds, see Rate limits.
Rate limit headers
Every response includes headers that report your current rate limit status. Key headers:
- anthropic-ratelimit-requests-limit: Maximum requests per minute
- anthropic-ratelimit-requests-remaining: Requests remaining in the current window
- anthropic-ratelimit-requests-reset: Time when the request limit resets (RFC 3339)
- anthropic-ratelimit-tokens-limit: Maximum combined tokens (input + output) per minute
- anthropic-ratelimit-tokens-remaining: Combined tokens remaining in the current window
- anthropic-ratelimit-tokens-reset: Time when the combined token limit resets (RFC 3339)
- anthropic-ratelimit-input-tokens-limit/-remaining/-reset: Input-token-specific headers
- anthropic-ratelimit-output-tokens-limit/-remaining/-reset: Output-token-specific headers
- retry-after: On a 429 response, the number of seconds to wait before retrying
See Response headers.
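The headers listed above can be collected into a single structure for logging or client-side throttling. The following is a minimal sketch, not an official client: the function name parse_rate_limit_headers and the shape of the returned dictionary are hypothetical, and it assumes the headers are already available as a plain string-to-string mapping from whichever HTTP library you use.

```python
from datetime import datetime
from typing import Mapping

PREFIX = "anthropic-ratelimit-"
BUCKETS = ("requests", "tokens", "input-tokens", "output-tokens")


def _parse_reset(value: str) -> datetime:
    # RFC 3339 timestamps may end in "Z"; older Python versions of
    # datetime.fromisoformat() only accept an explicit offset such as +00:00.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_rate_limit_headers(headers: Mapping[str, str]) -> dict:
    """Group rate limit headers by bucket (hypothetical helper).

    Header names are matched case-insensitively. Buckets with no
    headers present are omitted from the result.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    status: dict = {}
    for bucket in BUCKETS:
        entry: dict = {}
        limit = lowered.get(f"{PREFIX}{bucket}-limit")
        remaining = lowered.get(f"{PREFIX}{bucket}-remaining")
        reset = lowered.get(f"{PREFIX}{bucket}-reset")
        if limit is not None:
            entry["limit"] = int(limit)
        if remaining is not None:
            entry["remaining"] = int(remaining)
        if reset is not None:
            entry["reset"] = _parse_reset(reset)
        if entry:
            status[bucket] = entry
    # retry-after is described above as a number of seconds on 429 responses.
    retry_after = lowered.get("retry-after")
    if retry_after is not None:
        status["retry_after_seconds"] = float(retry_after)
    return status
```

A caller might check status["requests"]["remaining"] after each response and slow down before the limit is reached, rather than waiting for a 429.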
Requesting higher limits
Unlike the first-party Claude API, Claude Platform on AWS does not advance tiers automatically. To request higher limits, contact your Anthropic account representative with your workspace ID and desired throughput. For tier thresholds and other details, see Rate limits.
Rate limit errors
When you exceed a rate limit, the API returns HTTP 429 with the error type rate_limit_error. The retry-after header indicates how many seconds to wait before retrying. Implement exponential backoff with jitter in your retry logic.
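One way to combine exponential backoff, jitter, and the retry-after header is sketched below. This is an illustration under stated assumptions, not the behavior of any Anthropic SDK: backoff_delay and call_with_retries are hypothetical names, the base and cap values are arbitrary defaults, and send stands in for whatever function issues your request and returns a (status code, headers, body) tuple. The jitter strategy is the "full jitter" variant, which picks a random delay between zero and the exponential ceiling.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], str]  # (status code, headers, body)


def backoff_delay(attempt: int, retry_after: Optional[float] = None,
                  base: float = 1.0, cap: float = 60.0,
                  rng: Callable[[], float] = random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Full jitter: uniform in [0, min(cap, base * 2**attempt)).
    jittered = rng() * min(cap, base * (2 ** attempt))
    # Never wait less than the server requested via retry-after.
    if retry_after is not None:
        return max(retry_after, jittered)
    return jittered


def call_with_retries(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns a non-429 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for attempt in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            break
        # Header names are case-insensitive, so normalize before lookup.
        raw = {k.lower(): v for k, v in headers.items()}.get("retry-after")
        retry_after = float(raw) if raw is not None else None
        sleep(backoff_delay(attempt, retry_after))
        response = send()
    # May still be a 429 if every attempt was rate limited.
    return response
```

Passing sleep and rng as parameters keeps the helpers easy to test without real delays. Taking the larger of the jittered delay and retry-after means the client respects the server's hint while still spreading retries from concurrent workers.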