Rate limiting / quota enforcement
When to reach for this
Reach for this when…
- Public API with tiered plans (free, paid)
- Anti-abuse on write-heavy endpoints
- Per-tenant fairness in multi-tenant systems
- Protection against volumetric attacks
Not really this pattern when…
- Internal service-to-service where every caller is trusted
- Workloads with predictable fixed capacity (batch jobs)
Good vs bad answer
Interviewer probe
“How do you rate-limit your public API?”
Weak answer
"1000 requests per minute per IP."
Strong answer
"Token bucket per API key — burst 100, sustained 10/s for free, 10× that for paid. Redis INCR+PEXPIRE backend. Edge CDN does a per-IP volumetric layer at 1000/min as DDoS safety-net. Per-endpoint buckets for expensive calls (/export separate from /search). Response: 429 + Retry-After + RateLimit-Limit/-Remaining/-Reset headers so clients back off gracefully. Fail-open if Redis is unreachable — we don't DoS ourselves. Hot-key problem (celebrity API keys) is on my roadmap via local-counter fallback."
Why it wins: Names algorithm, identity, storage, response contract, fail-open, and the hot-key risk.
Cheat sheet
- Default: token bucket. Burst B + sustained R.
- Storage: Redis INCR + EXPIRE per (identity, bucket).
- Identity: API key > user id > IP. Layer them.
- 429 + Retry-After + RateLimit-* headers. Always.
- Expensive endpoints: separate bucket.
- Fail-open on limiter outage.
- Edge (volumetric) → gateway (per-key) → app (per-user / per-path).
Core concept
Rate limiting answers two questions: "how fast can this client go?" (capacity) and "is this client abusing us?" (abuse). The three design choices are algorithm, identity axis, and storage.
Algorithm choice (pick by burst tolerance):
- Token bucket: bucket of B tokens refilled at R/s; each request takes one. Allows bursts up to B, sustains R/s. The default.
- Leaky bucket: smooths to constant R/s. Fine when you want deterministic output rate.
- Sliding window counter: ~O(1) storage, close-to-accurate, avoids the fixed-window edge doubling problem.
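- Fixed window: simple, but with a limit of B per window a client can land B requests at the end of one window and B more at the start of the next, so up to 2B pass in a short span across the boundary. Only for rough guardrails.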
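The boundary behaviour of the last two algorithms is easier to see with numbers. Below is a minimal in-memory sketch, not a production limiter: the class names, the `burst` helper, and the limit of 10 per 1-second window are all assumptions made for illustration. The sliding window counter keeps only two counts and weights the previous window by how much of it still overlaps the last `window` seconds.

```python
class FixedWindow:
    """At most `limit` requests per aligned window of `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.idx = 0      # index of the window currently being counted
        self.count = 0

    def allow(self, now: float) -> bool:
        idx = int(now // self.window)
        if idx != self.idx:
            # New window: the counter resets and forgets the previous burst.
            self.idx, self.count = idx, 0
        if self.count >= self.limit:
            return False
        self.count += 1
        return True


class SlidingWindowCounter:
    """Two counters; the previous window is weighted by its remaining overlap."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self.idx = 0
        self.current = 0
        self.previous = 0

    def allow(self, now: float) -> bool:
        idx = int(now // self.window)
        if idx != self.idx:
            # Carry the old count over only if the windows are adjacent.
            self.previous = self.current if idx == self.idx + 1 else 0
            self.current = 0
            self.idx = idx
        elapsed = (now - idx * self.window) / self.window
        estimate = self.previous * (1.0 - elapsed) + self.current
        if estimate + 1 > self.limit:
            return False
        self.current += 1
        return True


def burst(limiter, now: float, n: int) -> int:
    """Send n requests at the same instant; return how many were admitted."""
    return sum(limiter.allow(now) for _ in range(n))


# Two bursts of 10 straddling the boundary at t = 1.0 (limit 10 per second).
fixed = FixedWindow(limit=10, window=1.0)
fixed_total = burst(fixed, 0.75, 10) + burst(fixed, 1.25, 10)        # 20

sliding = SlidingWindowCounter(limit=10, window=1.0)
sliding_total = burst(sliding, 0.75, 10) + burst(sliding, 1.25, 10)  # 12
```

The fixed window admits all 20 requests within half a second; the sliding window counter admits 12, because at t = 1.25 it still counts 75% of the previous window's 10 requests.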
Identity axis (layer them):
- Per-IP at the edge (volumetric defence; hurts corporate NATs so use as a safety net only).
- Per-API-key at the gateway (authenticated APIs).
- Per-user-id at the app (authenticated users inside a shared key).
- Per-path at the app (expensive endpoints get their own bucket).
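One way to picture the layering is as a list of bucket keys derived from each request, one per layer that applies. The sketch below is illustrative only: the `Request` fields, the key formats, and the `EXPENSIVE_PATHS` set are hypothetical names, not part of any particular gateway.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical set of endpoints that get their own bucket.
EXPENSIVE_PATHS = {"/export"}


@dataclass(frozen=True)
class Request:
    ip: str
    path: str
    api_key: Optional[str] = None
    user_id: Optional[str] = None


def bucket_keys(req: Request) -> List[str]:
    """Return one limiter key per layer that applies to this request."""
    keys = [f"ip:{req.ip}"]                  # edge: volumetric safety net
    if req.api_key is not None:
        keys.append(f"key:{req.api_key}")    # gateway: per-API-key quota
    if req.user_id is not None:
        keys.append(f"user:{req.user_id}")   # app: user inside a shared key
    if req.path in EXPENSIVE_PATHS:
        # app: expensive endpoint, scoped to the most specific identity known
        principal = req.user_id or req.api_key or req.ip
        keys.append(f"path:{req.path}:{principal}")
    return keys


export_keys = bucket_keys(
    Request(ip="203.0.113.7", path="/export", api_key="k1", user_id="u9")
)
anonymous_keys = bucket_keys(Request(ip="203.0.113.7", path="/search"))
```

A request is admitted only if every key in the list is under its limit, which is what lets the per-IP layer stay loose while the per-key and per-path layers do the precise work.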
Distributed enforcement: a single global limit needs a shared counter. Default: Redis INCR + EXPIRE per (identity, bucket). One round-trip per request; scales to 100k+ req/s per Redis shard. For relaxed limits, local counters with periodic sync tolerate brief over-limit windows.
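A minimal sketch of the INCR + EXPIRE counter, under some assumptions: the key layout `rl:{identity}:{bucket}:{window}` and the `allow` function are made up for illustration, and `InMemoryStore` is a stand-in so the example runs without a server. With redis-py, a `redis.Redis` client exposes `incr(name)` and `expire(name, time)` and could be passed in its place. Note that a bare counter like this behaves as a fixed-window limiter.

```python
class InMemoryStore:
    """Stand-in with the two calls used below; not a real Redis client."""

    def __init__(self) -> None:
        self.values = {}
        self.ttls = {}

    def incr(self, name: str) -> int:
        self.values[name] = self.values.get(name, 0) + 1
        return self.values[name]

    def expire(self, name: str, seconds: int) -> bool:
        self.ttls[name] = seconds  # recorded only; this stub never evicts
        return True


def allow(store, identity: str, bucket: str, limit: int,
          window_s: int, now: float) -> bool:
    """Count one request against (identity, bucket) in the current window."""
    window_idx = int(now // window_s)
    key = f"rl:{identity}:{bucket}:{window_idx}"
    count = store.incr(key)
    if count == 1:
        # First hit in this window: set a TTL so old keys clean themselves up.
        # INCR then EXPIRE is two commands; a crash in between leaves a key
        # without a TTL, so a pipeline or script is the safer form.
        store.expire(key, window_s * 2)
    return count <= limit


store = InMemoryStore()
first_window = [allow(store, "key-1", "search", 3, 60, now=120.0) for _ in range(5)]
next_window = allow(store, "key-1", "search", 3, 60, now=180.0)
```

Here `first_window` is three admits followed by two rejects, and the counter starts over in the next 60-second window because the window index is part of the key.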
Response contract (non-optional): 429 + Retry-After + RateLimit-* headers so clients back off gracefully.
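As a sketch of that contract, the helper below builds the status code and headers for a rejected request. The function names are hypothetical, and it assumes `RateLimit-Reset` carries the number of seconds until the bucket has capacity again; some APIs send an epoch timestamp in a similarly named header, so check what your clients expect.

```python
import math
from typing import Dict, Tuple


def rate_limit_headers(limit: int, remaining: int, reset_after_s: float) -> Dict[str, str]:
    """Headers for any response, so clients can see their budget shrinking."""
    return {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),
        # Round up so clients never retry a fraction of a second too early.
        "RateLimit-Reset": str(max(0, math.ceil(reset_after_s))),
    }


def reject(limit: int, reset_after_s: float) -> Tuple[int, Dict[str, str]]:
    """Build a 429 response: status code plus back-off headers."""
    headers = rate_limit_headers(limit, 0, reset_after_s)
    # Retry-After takes a delay in whole seconds.
    headers["Retry-After"] = headers["RateLimit-Reset"]
    return 429, headers


status, headers = reject(limit=100, reset_after_s=12.3)
```

Emitting the `RateLimit-*` headers on successful responses as well lets well-behaved clients slow down before they ever see a 429.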
Canonical examples
- API gateway quotas (Stripe, GitHub)
- Login / signup flood control
- Email send rate caps
- SMS verification code throttles
- Public chatbot APIs
Variants
Per-key token bucket in Redis
INCR + EXPIRE per (api_key, bucket) with one Redis round-trip per request.
The default for authenticated APIs. Globally consistent, ~1 ms tax per request, scales to millions of keys.
Pros
- Globally consistent
- Cheap per request
- Any broker
Cons
- Redis on the hot path
- Hot keys skew shards
Choose this variant when
- Public authenticated APIs
- Need cross-gateway consistency
Local counter with periodic sync
Each gateway has its own bucket; syncs to shared store every N seconds.
Zero per-request external call. Accepts brief over-limit windows. For DDoS-scale limits, not quotas.
Pros
- Zero round-trip
- Survives limiter outage
Cons
- Over-limit by (gateway_count × local_bucket) worst case
- Not for precise billing quotas
Choose this variant when
- Very high QPS edge
- Approximation acceptable
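A rough sketch of this variant, to make the over-limit window concrete. Everything here is an assumption for illustration: `SharedStore` stands in for the shared counter, `LocalSyncLimiter` is a made-up name, and the sketch covers a single quota window with no reset logic.

```python
class SharedStore:
    """Hypothetical shared counter; stands in for Redis or similar."""

    def __init__(self) -> None:
        self.total = 0

    def add_and_get(self, n: int) -> int:
        self.total += n
        return self.total


class LocalSyncLimiter:
    """Admit from local state; reconcile with the shared store every N seconds."""

    def __init__(self, store: SharedStore, global_limit: int, sync_interval_s: float) -> None:
        self.store = store
        self.global_limit = global_limit
        self.sync_interval_s = sync_interval_s
        self.pending = 0        # admitted locally since the last sync
        self.known_global = 0   # global total as of the last sync
        self.last_sync = 0.0

    def allow(self, now: float) -> bool:
        if now - self.last_sync >= self.sync_interval_s:
            self.sync(now)
        if self.known_global + self.pending >= self.global_limit:
            return False
        self.pending += 1       # no external call on the request path
        return True

    def sync(self, now: float) -> None:
        # Push local admits and learn what the other gateways have admitted.
        self.known_global = self.store.add_and_get(self.pending)
        self.pending = 0
        self.last_sync = now


store = SharedStore()
gw_a = LocalSyncLimiter(store, global_limit=10, sync_interval_s=5.0)
gw_b = LocalSyncLimiter(store, global_limit=10, sync_interval_s=5.0)

# Before the first sync, each gateway believes it has the whole budget.
admitted_a = sum(gw_a.allow(1.0) for _ in range(15))   # 10
admitted_b = sum(gw_b.allow(1.0) for _ in range(15))   # 10
# After the sync interval, both learn the global total and start rejecting.
after_sync = (gw_a.allow(6.0), gw_b.allow(6.0))        # (False, False)
```

Two gateways admit 20 requests against a global limit of 10 before the first sync, which is the gateway_count × local_bucket worst case. Giving each gateway a smaller local budget shrinks that window at the cost of rejecting some traffic early.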
Layered (edge + gateway + app)
CDN per-IP + gateway per-key + app per-user-and-path.
Defence in depth. Each layer catches a different abuse class; failure of one doesn't expose the system.
Pros
- Multiple abuse classes covered
- Failure tolerance
Cons
- Operational complexity
- Coordinate limits to avoid false positives
Choose this variant when
- Production public APIs
- Previously abused systems
Failure modes
Corporate NATs hide thousands behind one IP. Rate-limiting the IP denies service to the office. Combine with API key or user id on authenticated paths.
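Edge-doubling: with a fixed window of limit B, a client sends B requests just before the window boundary and B more just after it, getting 2B through in a short span. Use a sliding window or token bucket instead.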
Clients back off blindly — too aggressively or too gently. Always emit Retry-After and RateLimit-* headers.
Limiter dies → everything fails. Fail-open for DDoS limits, or fall back to local counters. Never DoS yourself because the quota check failed.
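The fail-open and local-fallback policy can be sketched as a thin wrapper around the shared check. The names below (`LimiterUnavailable`, `check_with_fallback`, and the stub check functions) are hypothetical; with redis-py the caught error would more likely be `redis.exceptions.ConnectionError`.

```python
from typing import Callable


class LimiterUnavailable(Exception):
    """Hypothetical error raised when the shared limiter cannot be reached."""


def check_with_fallback(
    shared_check: Callable[[str], bool],
    local_check: Callable[[str], bool],
    key: str,
    fail_open: bool,
) -> bool:
    """Return True if the request may proceed."""
    try:
        return shared_check(key)
    except LimiterUnavailable:
        if fail_open:
            # Volumetric limits: let traffic through rather than DoS ourselves.
            return True
        # Quota-style limits: enforce conservatively from local state.
        return local_check(key)


def broken_shared_check(key: str) -> bool:
    raise LimiterUnavailable("shared limiter unreachable")


def strict_local_check(key: str) -> bool:
    return False  # stub: pretend the local budget is already spent


volumetric = check_with_fallback(broken_shared_check, strict_local_check, "ip:203.0.113.7", fail_open=True)
quota = check_with_fallback(broken_shared_check, strict_local_check, "key:k1", fail_open=False)
```

With the shared limiter down, the volumetric check admits the request while the quota check defers to the local counter, so the outage degrades precision instead of availability.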
Drills
Token bucket in 30 seconds.
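Bucket holds up to B tokens and refills at R/s. Each request removes one token; reject if the bucket is empty. This allows bursts up to B, then a sustained R/s. State is two numbers per identity (tokens, last_refill_time), updated together on each request. In Redis, the simple form is to INCR a counter with an expiry and compare it to the limit, which is a fixed-window approximation; an exact token bucket reads and writes both numbers in one atomic step, for example in a Lua script.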
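The two-number state can be written down in a few lines. This is a single-process sketch with an injected clock so the behaviour is deterministic; the `TokenBucket` class and its parameter names are assumptions, and the burst of 100 with 10/s refill is borrowed from the free-tier numbers in the strong answer.

```python
class TokenBucket:
    """Bucket of up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = capacity      # start full so the first burst is allowed
        self.last_refill = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Lazy refill: credit tokens for the time elapsed since the last call.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = max(self.last_refill, now)
        if self.tokens < cost:
            return False
        self.tokens -= cost
        return True


bucket = TokenBucket(capacity=100, refill_rate=10)
burst_admitted = sum(bucket.allow(0.0) for _ in range(120))      # 100: the burst B
one_second_later = sum(bucket.allow(1.0) for _ in range(20))     # 10: one second of refill
after_long_idle = sum(bucket.allow(100.0) for _ in range(150))   # 100: refill caps at B
```

No background timer is needed: the refill is computed on demand from `last_refill`, which is why two numbers per identity are enough.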
Your limiter Redis is down. What happens?
Fail-open for volumetric DDoS limits — downstream capacity absorbs. Fail to local counters for quota-style limits (each gateway enforces conservatively until Redis recovers). Never hard-fail the whole API because the quota check failed.