Rate limits

Each API key has a requests-per-minute (RPM) limit, enforced as a fixed window. New keys default to 600 RPM. Requests over the limit are rejected with HTTP status 429 and this body:

{
  "error": {
    "message": "Rate limit exceeded. Slow down or contact us to raise your limit.",
    "type": "rate_limit_error",
    "code": "rate_limit_exceeded"
  }
}
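
To make "fixed window" concrete, here is a minimal client-side sketch in Python of a counter that resets once per window. It is an illustration only, not the server's implementation: the class name FixedWindowCounter and its parameters are hypothetical, and the server's actual window boundaries are not documented here, so a local counter like this can drift from what the server sees.

```python
import time


class FixedWindowCounter:
    """Client-side sketch of a fixed-window request counter (hypothetical helper)."""

    def __init__(self, limit=600, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def try_acquire(self):
        """Return True if one more request fits in the current window."""
        now = self.clock()
        if now - self.window_start >= self.window_seconds:
            # The window has elapsed: start a new one and reset the count.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False
```

Calling try_acquire() before each request and pausing when it returns False keeps a single client under its budget most of the time. You should still handle 429s, since the local window may not line up with the server's.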

Handling 429s

Back off and retry. Most OpenAI SDKs retry 429s automatically with exponential backoff; if you’re calling the API directly, wait between retries and double the delay each time (e.g. 1s, 2s, 4s).
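
If you are calling the API directly, the retry loop might look like the Python sketch below. It is a sketch under assumptions: call_with_backoff and its parameters are hypothetical names, and send stands in for whatever function performs your HTTP request and returns a (status, body) pair.

```python
import time


def call_with_backoff(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call send() and retry on HTTP 429, doubling the delay each time.

    `send` is a hypothetical zero-argument callable returning (status, body).
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        status, body = send()
        # Return on any non-429 response, or when retries are exhausted.
        if status != 429 or attempt == max_retries:
            return status, body
        sleep(delay)  # 1s, 2s, 4s with the defaults
        delay *= 2
```

With the defaults this makes up to four attempts and waits 1s, 2s, then 4s between them. If many clients share one key, adding a small random jitter to each delay helps keep them from retrying in lockstep.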

Raising your limit

Need more throughput? Reach out and we’ll raise your key’s RPM. During beta we run at limited capacity, so under heavy load your requests may occasionally be queued even when you are within your limit.

Capacity vs. limits

Two different things can slow a request:

  • Rate limit (429) — you’re sending requests faster than your key allows. Back off.
  • Cold model (503 model_cold) — no GPU is warm for that model yet. Retry shortly; the first request after idle spins one up.
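
A client can tell the two cases apart before deciding how to retry. The Python sketch below assumes that the 503 response uses the same error body shape as the 429 shown earlier, with "model_cold" in the code field; that shape is a guess, and classify_slowdown is a hypothetical helper name.

```python
def classify_slowdown(status, body):
    """Return 'rate_limit', 'cold_model', or None for a parsed response.

    Assumes (not confirmed by these docs) that a 503 carries
    {"error": {"code": "model_cold"}} like the 429 body shape.
    """
    error = body.get("error") if isinstance(body, dict) else None
    code = error.get("code") if isinstance(error, dict) else None
    if status == 429:
        return "rate_limit"  # sending faster than the key allows: back off
    if status == 503 and code == "model_cold":
        return "cold_model"  # no warm GPU yet: retry shortly
    return None
```

A caller might slow its send rate on "rate_limit", and on "cold_model" simply retry after a short pause, since the first request after idle is what spins a GPU up.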