Rate Limiting

Control request rates per IP, user, or team.

Enable rate limiting

ratelimit:
  enabled: true
  default:
    requests_per_minute: 60
    requests_per_hour: 1000
    burst: 10
  per_ip: true
  per_user: true

How it works

Rate limiting uses a token bucket algorithm. Each key (IP, user, or team) gets a bucket that refills at the configured rate. The burst setting controls how many requests can fire in quick succession before rate limiting kicks in.
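To make the refill and burst behavior concrete, here is a minimal token bucket sketch in Python. It is an illustration only, not Stockyard's implementation: the class name, the injectable clock, and the mapping of burst to bucket capacity and requests_per_minute to refill rate are all assumptions.

```python
import time


class TokenBucket:
    """Minimal token bucket: holds up to `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum tokens held (assumed to match `burst`)
        self.tokens = float(capacity)   # start full so an initial burst is allowed
        self.now = now                  # injectable clock, useful for testing
        self.updated = now()

    def allow(self):
        """Consume one token and return True if one is available, else return False."""
        current = self.now()
        elapsed = current - self.updated
        self.updated = current
        # Refill for the elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Values taken from the sample config: 60 requests per minute, burst of 10.
bucket = TokenBucket(rate=60 / 60, capacity=10)
```

With these values, ten back-to-back calls to bucket.allow() succeed, the eleventh is refused, and one more request becomes available for each second that passes.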

When a request is rate-limited, Stockyard returns HTTP 429 with a Retry-After header telling the client how many seconds to wait before retrying.
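A client can honor the 429 response by waiting before it retries. The sketch below shows one way to do that in Python. The helper name call_with_retry and the send callable, which returns a (status, headers, body) tuple, are hypothetical. The sketch also assumes Retry-After carries a number of seconds.

```python
import time


def call_with_retry(send, max_attempts=5, default_wait=1.0, sleep=time.sleep):
    """Call `send()` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable returning (status, headers, body),
    where `headers` is a dict of response headers.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        # Return on success, on any non-429 error, or when out of attempts.
        if status != 429 or attempt == max_attempts:
            return status, headers, body
        # Assumption: Retry-After is a number of seconds, e.g. "Retry-After: 15".
        try:
            wait = float(headers.get("Retry-After", default_wait))
        except ValueError:
            wait = default_wait
        sleep(max(wait, 0.0))
```

Passing sleep as a parameter keeps the helper easy to test without real delays. A production client would typically also cap the total wait time.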

Per-IP limiting

When per_ip: true, each unique IP address gets its own rate limit. This is the simplest mode and works well for public-facing APIs.

Per-user limiting

When per_user: true, rate limits apply per authenticated user (identified by their API key). This requires auth to be enabled. Different users can have different limits set via the team API.

Per-team limiting

Team isolation includes per-team rate limits. Set custom RPM (requests per minute) and TPM (tokens per minute) limits for each team:

curl -X PUT http://localhost:4200/api/teams/backend \
  -H "Content-Type: application/json" \
  -d '{"rpm_limit": 100, "tpm_limit": 100000}'
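For scripts that manage team limits, the same request can be built with Python's standard library. This is a sketch under assumptions: the helper name build_team_limit_request is hypothetical, the JSON Content-Type header is an assumption about what the endpoint expects, and any authentication the endpoint may require is not shown.

```python
import json
import urllib.request


def build_team_limit_request(base_url, team, rpm_limit, tpm_limit):
    """Build (but do not send) the PUT request shown in the curl example."""
    body = json.dumps({"rpm_limit": rpm_limit, "tpm_limit": tpm_limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/teams/{team}",
        data=body,
        method="PUT",
        # Assumption: the endpoint accepts a JSON request body.
        headers={"Content-Type": "application/json"},
    )


req = build_team_limit_request("http://localhost:4200", "backend", 100, 100000)
# To send it against a running instance: urllib.request.urlopen(req)
```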

Response headers

Rate-limited responses include these headers:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1711900800
Retry-After: 15
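
A client can read these headers to decide how to pace its requests. The Python sketch below parses them into a small record. The names RateLimitInfo and parse_rate_limit_headers are hypothetical. The sketch assumes X-RateLimit-Reset is a Unix timestamp in seconds and Retry-After is a number of seconds. Neither format is stated here.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: int                  # X-RateLimit-Limit
    remaining: int              # X-RateLimit-Remaining
    reset_epoch: int            # X-RateLimit-Reset, assumed to be Unix seconds
    retry_after: Optional[int]  # Retry-After in seconds; None when absent


def parse_rate_limit_headers(headers: Mapping[str, str]) -> RateLimitInfo:
    """Parse the rate limit headers from a plain mapping of header names to values."""
    retry_after = headers.get("Retry-After")
    return RateLimitInfo(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_epoch=int(headers["X-RateLimit-Reset"]),
        retry_after=int(retry_after) if retry_after is not None else None,
    )


info = parse_rate_limit_headers({
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "42",
    "X-RateLimit-Reset": "1711900800",
    "Retry-After": "15",
})
```

Lookups on a plain dict are case-sensitive, so header names may need normalizing first if the HTTP library in use does not do this already.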

Exemptions

Admin requests authenticated with STOCKYARD_ADMIN_KEY are exempt from rate limiting. This ensures management API calls always work even when rate limits are active.