To mitigate misuse and manage capacity, we limit how much an organization can use the Claude API.
Usage Tier | Credit Purchase | Max Usage per Month |
---|---|---|
Tier 1 | $5 | $100 |
Tier 2 | $40 | $500 |
Tier 3 | $200 | $1,000 |
Tier 4 | $400 | $5,000 |
Monthly Invoicing | N/A | N/A |
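For budgeting, the monthly caps in the tier table can be encoded as data. The sketch below is a hypothetical helper (`remaining_monthly_budget` is not part of any Anthropic SDK). It assumes Max Usage per Month is a spend cap in USD and treats the N/A for Monthly Invoicing as "no fixed cap", which is an interpretation rather than a documented rule.

```python
from typing import Dict, Optional

# Max Usage per Month from the tier table, in USD.
# Assumption: Monthly Invoicing (N/A) is modeled as having no fixed cap.
MONTHLY_CAP_USD: Dict[str, Optional[int]] = {
    "Tier 1": 100,
    "Tier 2": 500,
    "Tier 3": 1_000,
    "Tier 4": 5_000,
    "Monthly Invoicing": None,
}


def remaining_monthly_budget(tier: str, spent_this_month: float) -> Optional[float]:
    """Return how much of the tier's monthly cap is left, or None if uncapped."""
    cap = MONTHLY_CAP_USD[tier]
    if cap is None:
        return None
    # Never report a negative budget once the cap has been reached.
    return max(cap - spent_this_month, 0.0)
```

For example, an organization in Tier 2 that has used $120 this month has $380 of headroom under this model.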
If you exceed a rate limit, the API returns a 429 error along with a `retry-after` header indicating how long to wait.
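A client that receives a rate-limit error can honor `retry-after` before retrying. The sketch below is a hypothetical helper, not an Anthropic SDK function. It assumes the header carries a number of seconds, and when the header is missing or unparsable it falls back to capped exponential backoff with full jitter; that fallback is a common convention chosen here as an assumption, not documented API behavior.

```python
import random
from typing import Mapping


def seconds_to_wait(headers: Mapping[str, str], attempt: int,
                    base: float = 1.0, cap: float = 60.0) -> float:
    """Pick a delay before retrying a rate-limited request.

    Prefers the server's retry-after value (seconds). Otherwise uses
    capped exponential backoff with full jitter based on `attempt`.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {k.lower(): v for k, v in headers.items()}
    value = lowered.get("retry-after")
    if value is not None:
        try:
            return max(float(value), 0.0)
        except ValueError:
            pass  # Not a number of seconds; fall through to backoff.
    # Full jitter: uniform delay in [0, min(cap, base * 2**attempt)].
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

Retrying before the `retry-after` interval has elapsed fails anyway, so the backoff branch only matters when the header is absent.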
Input tokens per minute (ITPM) rate limits are estimated at the beginning of each request, and the estimate is adjusted during the request to reflect the actual number of input tokens used.
The final adjustment counts `input_tokens` and `cache_creation_input_tokens` towards ITPM rate limits, while `cache_read_input_tokens` do not count (though they are still billed).
For some models, marked with † in the model rate limit table, `cache_read_input_tokens` are counted towards ITPM rate limits.
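The final ITPM accounting for a single request can be expressed as a small function. This sketch assumes the usage fields are available as a plain dict keyed by the field names above; `itpm_tokens` and its `counts_cache_reads` flag are hypothetical names, with the flag standing in for whether the model's limit is one of those marked †.

```python
from typing import Mapping


def itpm_tokens(usage: Mapping[str, int], counts_cache_reads: bool = False) -> int:
    """Input tokens that count towards the ITPM limit for one request."""
    # Uncached input and cache writes always count towards ITPM.
    counted = usage.get("input_tokens", 0) + usage.get("cache_creation_input_tokens", 0)
    # Cache reads count only for models whose limit includes them.
    if counts_cache_reads:
        counted += usage.get("cache_read_input_tokens", 0)
    return counted
```

With 1,200 input tokens, 300 cache-creation tokens and 5,000 cache-read tokens, the request counts as 1,500 ITPM tokens normally and 6,500 for a † model; all 6,500 are still billed either way.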
Output tokens per minute (OTPM) rate limits are estimated from `max_tokens` at the beginning of each request, and the estimate is adjusted at the end of the request to reflect the actual number of output tokens used.
If you're hitting OTPM limits earlier than expected, try reducing `max_tokens` to better approximate the size of your completions.
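The effect of `max_tokens` on OTPM can be illustrated with simple arithmetic. This sketch uses a deliberately simplified model, which is an assumption and not the documented enforcement algorithm: each in-flight request holds its full `max_tokens` estimate until it finishes, and the per-minute budget starts full with no refill. Both function names are hypothetical.

```python
def requests_in_flight(otpm_limit: int, max_tokens: int) -> int:
    """Requests that can start at once if each is initially charged max_tokens.

    Simplified model: full budget, no refill, no other traffic.
    """
    return otpm_limit // max_tokens


def end_of_request_adjustment(max_tokens: int, output_tokens: int) -> int:
    """Tokens returned to the OTPM estimate once actual output is known."""
    return max_tokens - output_tokens
```

With an 8,000 OTPM limit, `max_tokens=4096` lets only one request start, while `max_tokens=1000` lets eight start; if the 4,096-token request actually produces 350 tokens, 3,746 tokens of the estimate are released at the end.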
Rate limits are applied separately to each model, so you can use different models simultaneously, each up to its own limits.
You can check your current rate limits and behavior in the Anthropic Console.
When using the `context-1m-2025-08-07` beta header with Claude Sonnet 4, separate rate limits apply. See Long context rate limits below.
Model | Maximum requests per minute (RPM) | Maximum input tokens per minute (ITPM) | Maximum output tokens per minute (OTPM) |
---|---|---|---|
Claude Opus 4.x* | 50 | 30,000 | 8,000 |
Claude Sonnet 4 | 50 | 30,000 | 8,000 |
Claude Sonnet 3.7 | 50 | 20,000 | 8,000 |
Claude Sonnet 3.5 2024-10-22 (deprecated) | 50 | 40,000† | 8,000 |
Claude Sonnet 3.5 2024-06-20 (deprecated) | 50 | 40,000† | 8,000 |
Claude Haiku 3.5 | 50 | 50,000† | 10,000 |
Claude Opus 3 (deprecated) | 50 | 20,000† | 4,000 |
Claude Sonnet 3 | 50 | 40,000† | 8,000 |
Claude Haiku 3 | 50 | 50,000† | 10,000 |
† Limits marked with † count `cache_read_input_tokens` towards ITPM usage.
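Because limits apply per model, a client-side throttle can keep one set of buckets per model. The sketch below is a hypothetical client-side limiter, not a description of how the API enforces limits. It uses a continuously refilling token bucket (a common rate-limiting technique, chosen here as an assumption) with capacities copied from two rows of the table, charges `max_tokens` against OTPM up front, and never refunds unused output tokens, which a fuller version would do.

```python
import time
from typing import Callable, Dict, List

# (RPM, ITPM, OTPM) copied from two rows of the table above.
LIMITS = {
    "Claude Sonnet 4": (50, 30_000, 8_000),
    "Claude Haiku 3.5": (50, 50_000, 10_000),
}


class TokenBucket:
    """Holds up to `per_minute` units and refills at per_minute/60 per second."""

    def __init__(self, per_minute: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = per_minute
        self.rate = per_minute / 60.0
        self.clock = clock
        self.tokens = float(per_minute)
        self.updated = clock()

    def refill(self) -> None:
        now = self.clock()
        # Add tokens for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now


class ModelLimiter:
    """RPM, ITPM and OTPM buckets for a single model."""

    def __init__(self, rpm: int, itpm: int, otpm: int,
                 clock: Callable[[], float] = time.monotonic):
        self.buckets: List[TokenBucket] = [
            TokenBucket(rpm, clock), TokenBucket(itpm, clock), TokenBucket(otpm, clock)
        ]

    def try_acquire(self, input_tokens: int, max_tokens: int) -> bool:
        """Debit one request, its input tokens and its max_tokens if all fit."""
        costs = [1, input_tokens, max_tokens]
        for bucket in self.buckets:
            bucket.refill()
        # Check every bucket before debiting any, so a refusal leaves no partial charge.
        if not all(cost <= b.tokens for b, cost in zip(self.buckets, costs)):
            return False
        for b, cost in zip(self.buckets, costs):
            b.tokens -= cost
        return True


# One independent limiter per model, mirroring per-model limits.
limiters: Dict[str, ModelLimiter] = {
    name: ModelLimiter(*limits) for name, limits in LIMITS.items()
}
```

Passing a fake `clock` makes the refill behavior easy to test without sleeping.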
Message Batches API limits:
Maximum requests per minute (RPM) | Maximum batch requests in processing queue | Maximum batch requests per batch |
---|---|---|
50 | 100,000 | 100,000 |
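When submitting large jobs, requests can be split so that no batch exceeds the per-batch limit in the batch table, and a simple check can guard the processing-queue limit. Both helpers are hypothetical and only do local bookkeeping; they do not call the Message Batches API, and the caller is assumed to track how many requests are already queued.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

MAX_REQUESTS_PER_BATCH = 100_000  # Maximum batch requests per batch
MAX_QUEUED_REQUESTS = 100_000     # Maximum batch requests in processing queue


def split_into_batches(requests: Sequence[T],
                       size: int = MAX_REQUESTS_PER_BATCH) -> Iterator[List[T]]:
    """Yield consecutive chunks no larger than the per-batch request limit."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(requests), size):
        yield list(requests[start:start + size])


def fits_in_queue(already_queued: int, batch_size: int,
                  queue_limit: int = MAX_QUEUED_REQUESTS) -> bool:
    """True if adding this batch keeps the processing queue within its limit."""
    return already_queued + batch_size <= queue_limit
```

For 250,001 requests this yields batches of 100,000, 100,000 and 50,001; the queue check then decides how many of them can be submitted before earlier ones finish.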
Maximum input tokens per minute (ITPM) | Maximum output tokens per minute (OTPM) |
---|---|
500,000 | 100,000 |
Header | Description |
---|---|
retry-after | The number of seconds to wait until you can retry the request. Earlier retries will fail. |
anthropic-ratelimit-requests-limit | The maximum number of requests allowed within any rate limit period. |
anthropic-ratelimit-requests-remaining | The number of requests remaining before being rate limited. |
anthropic-ratelimit-requests-reset | The time when the request rate limit will be fully replenished, provided in RFC 3339 format. |
anthropic-ratelimit-tokens-limit | The maximum number of tokens allowed within any rate limit period. |
anthropic-ratelimit-tokens-remaining | The number of tokens remaining (rounded to the nearest thousand) before being rate limited. |
anthropic-ratelimit-tokens-reset | The time when the token rate limit will be fully replenished, provided in RFC 3339 format. |
anthropic-ratelimit-input-tokens-limit | The maximum number of input tokens allowed within any rate limit period. |
anthropic-ratelimit-input-tokens-remaining | The number of input tokens remaining (rounded to the nearest thousand) before being rate limited. |
anthropic-ratelimit-input-tokens-reset | The time when the input token rate limit will be fully replenished, provided in RFC 3339 format. |
anthropic-ratelimit-output-tokens-limit | The maximum number of output tokens allowed within any rate limit period. |
anthropic-ratelimit-output-tokens-remaining | The number of output tokens remaining (rounded to the nearest thousand) before being rate limited. |
anthropic-ratelimit-output-tokens-reset | The time when the output token rate limit will be fully replenished, provided in RFC 3339 format. |
anthropic-priority-input-tokens-limit | The maximum number of Priority Tier input tokens allowed within any rate limit period. (Priority Tier only) |
anthropic-priority-input-tokens-remaining | The number of Priority Tier input tokens remaining (rounded to the nearest thousand) before being rate limited. (Priority Tier only) |
anthropic-priority-input-tokens-reset | The time when the Priority Tier input token rate limit will be fully replenished, provided in RFC 3339 format. (Priority Tier only) |
anthropic-priority-output-tokens-limit | The maximum number of Priority Tier output tokens allowed within any rate limit period. (Priority Tier only) |
anthropic-priority-output-tokens-remaining | The number of Priority Tier output tokens remaining (rounded to the nearest thousand) before being rate limited. (Priority Tier only) |
anthropic-priority-output-tokens-reset | The time when the Priority Tier output token rate limit will be fully replenished, provided in RFC 3339 format. (Priority Tier only) |
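The limit, remaining and reset headers share a common naming pattern, so one function can read any group of them. This sketch assumes the headers arrive as a string mapping, as most HTTP clients expose them; `read_limit` and `LimitStatus` are hypothetical names. It rewrites a trailing `Z` as `+00:00` because `datetime.fromisoformat` accepts `Z` only from Python 3.11, and older versions also reject fractional seconds that are not 3 or 6 digits.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping, Optional


@dataclass
class LimitStatus:
    limit: Optional[int]
    remaining: Optional[int]
    reset: Optional[datetime]


def _parse_rfc3339(value: str) -> datetime:
    # Normalize "Z" to an explicit UTC offset for older Python versions.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


def read_limit(headers: Mapping[str, str], prefix: str) -> LimitStatus:
    """Read <prefix>-limit, -remaining and -reset, e.g. prefix="anthropic-ratelimit-requests"."""
    h = {k.lower(): v for k, v in headers.items()}
    limit = h.get(f"{prefix}-limit")
    remaining = h.get(f"{prefix}-remaining")
    reset = h.get(f"{prefix}-reset")
    return LimitStatus(
        limit=int(limit) if limit is not None else None,
        remaining=int(remaining) if remaining is not None else None,
        reset=_parse_rfc3339(reset) if reset is not None else None,
    )
```

Missing headers come back as `None`, which is useful because the Priority Tier headers are only present for Priority Tier traffic.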
The `anthropic-ratelimit-tokens-*` headers display the values for the most restrictive limit currently in effect. For instance, if you have exceeded the Workspace per-minute token limit, the headers contain the Workspace per-minute token rate limit values. If Workspace limits do not apply, the headers return the total tokens remaining, where total is the sum of input and output tokens. This way, you can always see the most relevant constraint on your current API usage.
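Since `anthropic-ratelimit-tokens-*` reports the most restrictive limit in effect, a client can consult just that group before sending its next request. In this hypothetical sketch, `needed_tokens` is the caller's own estimate (for example, input tokens plus `max_tokens`); because the remaining count is rounded to the nearest thousand, a real client might add a safety margin on top.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def seconds_until_tokens_reset(headers: Mapping[str, str], needed_tokens: int,
                               now: Optional[datetime] = None) -> float:
    """Return 0 if the most restrictive token limit has room for the next
    request, otherwise the seconds until anthropic-ratelimit-tokens-reset."""
    h = {k.lower(): v for k, v in headers.items()}
    remaining = h.get("anthropic-ratelimit-tokens-remaining")
    reset = h.get("anthropic-ratelimit-tokens-reset")
    # Without both headers there is nothing to wait on, so proceed immediately.
    if remaining is None or reset is None or int(remaining) >= needed_tokens:
        return 0.0
    if reset.endswith("Z"):
        reset = reset[:-1] + "+00:00"
    if now is None:
        now = datetime.now(timezone.utc)
    return max((datetime.fromisoformat(reset) - now).total_seconds(), 0.0)
```

If 2,000 tokens remain and the next request needs about 5,000, the client waits until the reset time instead of sending a request that would be rejected.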