Pattern

Rate limiting / quota enforcement

When to reach for this

Reach for this when…

Public API with tiered plans (free, paid)
Anti-abuse on write-heavy endpoints
Per-tenant fairness in multi-tenant systems
Protection against volumetric attacks

Not really this pattern when…

Internal service-to-service where every caller is trusted
Workloads with predictable fixed capacity (batch jobs)

Good vs bad answer

Interviewer probe

“How do you rate-limit your public API?”

Weak answer

"1000 requests per minute per IP."

Strong answer

"Token bucket per API key: burst 100, sustained 10/s for free, 10× that for paid. State lives in Redis and is updated atomically per request (a Lua script over tokens and last-refill time; plain INCR + PEXPIRE would only give a fixed-window counter). Edge CDN does a per-IP volumetric layer at 1000/min as a DDoS safety net. Per-endpoint buckets for expensive calls (/export separate from /search). Response: 429 + Retry-After + RateLimit-Limit/-Remaining/-Reset headers so clients back off gracefully. Fail-open if Redis is unreachable, so we don't DoS ourselves. Hot-key problem (celebrity API keys) is on my roadmap via local-counter fallback."

Why it wins: Names algorithm, identity, storage, response contract, fail-open, and the hot-key risk.

Cheat sheet

•Default: token bucket. Burst B + sustained R.
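•Storage: Redis, keyed per (identity, bucket). INCR + EXPIRE gives a windowed counter; a true token bucket needs an atomic update of (tokens, last_refill_time).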
•Identity: API key > user id > IP. Layer them.
•429 + Retry-After + RateLimit-* headers. Always.
•Expensive endpoints: separate bucket.
•Fail-open on limiter outage.
•Edge (volumetric) → gateway (per-key) → app (per-user / per-path).

Core concept

Rate limiting answers two questions: "how fast can this client go?" (capacity) and "is this client abusing us?" (abuse). The three design choices are algorithm, identity axis, and storage.

Algorithm choice (pick by burst tolerance):

Token bucket: bucket of B tokens refilled at R/s; each request takes one. Allows bursts up to B, sustains R/s. The default.
Leaky bucket: smooths to constant R/s. Fine when you want deterministic output rate.
Sliding window counter: ~O(1) storage, close-to-accurate, avoids the fixed-window edge doubling problem.
Fixed window: simple, but a client can get up to 2× the limit in a short span that straddles a window boundary. Only for rough guardrails.
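To make the default concrete, here is a minimal in-memory sketch of a token bucket with capacity B and refill rate R. The class name `TokenBucket`, the `cost` parameter, and the injectable `now` argument are illustrative assumptions, not part of the original text; a production limiter would keep this state in a shared store.

```python
import time
from typing import Optional


class TokenBucket:
    """Token bucket: holds up to `capacity` tokens, refilled at `refill_rate` per second."""

    def __init__(self, capacity: float, refill_rate: float, now: Optional[float] = None) -> None:
        self.capacity = capacity          # B: maximum burst size
        self.refill_rate = refill_rate    # R: tokens added per second
        self.tokens = capacity            # start full so an idle client can burst
        self.last_refill = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None, cost: float = 1.0) -> bool:
        """Refill lazily from elapsed time, then spend `cost` tokens if available."""
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Refilling lazily on each call means no background timer is needed; the only state is the pair (tokens, last_refill).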

Identity axis (layer them):

Per-IP at the edge (volumetric defence; hurts corporate NATs so use as a safety net only).
Per-API-key at the gateway (authenticated APIs).
Per-user-id at the app (authenticated users inside a shared key).
Per-path at the app (expensive endpoints get their own bucket).
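One way to picture the layering is as a list of bucket keys derived from each request, one per layer. The sketch below is an assumption-laden illustration: `RequestIdentity`, `bucket_keys`, the key formats, and the `/export` default are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class RequestIdentity:
    ip: str
    path: str
    api_key: Optional[str] = None
    user_id: Optional[str] = None


def bucket_keys(
    req: RequestIdentity,
    expensive_paths: FrozenSet[str] = frozenset({"/export"}),
) -> List[Tuple[str, str]]:
    """Return (layer, bucket key) pairs to check, outermost layer first."""
    keys = [("edge", f"ip:{req.ip}")]                  # volumetric safety net
    if req.api_key:
        keys.append(("gateway", f"key:{req.api_key}"))
        if req.user_id:                                # users sharing one API key
            keys.append(("app", f"key:{req.api_key}:user:{req.user_id}"))
    if req.path in expensive_paths:                    # expensive endpoint gets its own bucket
        owner = req.api_key or req.ip
        keys.append(("app", f"path:{req.path}:{owner}"))
    return keys
```

Each key would then be checked against its own limit; an anonymous request only ever hits the per-IP bucket.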

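Distributed enforcement: a single global limit needs a shared counter. Default: Redis INCR + EXPIRE per (identity, bucket), which implements a windowed counter; a true token bucket instead stores (tokens, last_refill_time) and updates both atomically, for example in a Lua script. One round-trip per request; scales to 100k+ req/s per Redis shard. For relaxed limits, local counters with periodic sync tolerate brief over-limit windows.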
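The shared-counter idea can be sketched against any store that exposes `incr` and `expire` in the shape of the redis-py client methods of the same names. Everything else here is an assumption for illustration: the `allow_request` function, the `rl:` key layout, and the `InMemoryStore` stand-in that lets the sketch run without a Redis server. Note that INCR plus a TTL counts requests per window, so this is a windowed counter rather than a token bucket.

```python
from typing import Dict, Protocol


class CounterStore(Protocol):
    def incr(self, name: str) -> int: ...
    def expire(self, name: str, time: int) -> bool: ...


class InMemoryStore:
    """Stand-in for a shared store; records TTLs instead of enforcing them."""

    def __init__(self) -> None:
        self.values: Dict[str, int] = {}
        self.ttls: Dict[str, int] = {}

    def incr(self, name: str) -> int:
        self.values[name] = self.values.get(name, 0) + 1
        return self.values[name]

    def expire(self, name: str, time: int) -> bool:
        self.ttls[name] = time
        return True


def allow_request(store: CounterStore, identity: str, bucket: str,
                  limit: int, window_s: int, now: float) -> bool:
    """Count this request in the current window and compare to the limit."""
    window = int(now // window_s)
    key = f"rl:{identity}:{bucket}:{window}"   # window id in the key, so old keys just age out
    count = store.incr(key)
    if count == 1:
        store.expire(key, window_s * 2)        # TTL set once, when the key is created
    return count <= limit
```

Because the window id is part of the key, a crash between the increment and the expiry call leaves at worst a stale key rather than a permanently locked-out client.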

Response contract (non-optional): 429 + Retry-After + RateLimit-* headers so clients back off gracefully.
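As a rough illustration of that contract, the helper below builds the status code and headers for a limiter decision. The function name and parameters are hypothetical. The three `RateLimit-*` names follow the form used in the text (taken from earlier IETF drafts), and reporting the reset as seconds remaining is an assumption; check the version of the specification your clients expect.

```python
import math
from typing import Dict, Tuple


def rate_limit_response(allowed: bool, limit: int, remaining: int,
                        reset_after_s: float) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) describing the limiter decision to the client."""
    reset = str(math.ceil(reset_after_s))      # whole seconds, rounded up
    headers = {
        "RateLimit-Limit": str(limit),
        "RateLimit-Remaining": str(max(0, remaining)),
        "RateLimit-Reset": reset,
    }
    if allowed:
        return 200, headers                    # informational headers on success too
    headers["Retry-After"] = reset             # tells the client how long to back off
    return 429, headers
```

Sending the informational headers on successful responses as well lets well-behaved clients slow down before they ever see a 429.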

Canonical examples

→API gateway quotas (Stripe, GitHub)
→Login / signup flood control
→Email send rate caps
→SMS verification code throttles
→Public chatbot APIs

Variants

Per-key token bucket in Redis

One atomic Redis operation per request per (api_key, bucket): INCR + EXPIRE for a windowed counter, or a Lua script over (tokens, last_refill_time) for a true token bucket.

The default for authenticated APIs. Globally consistent, ~1 ms tax per request, scales to millions of keys.

Pros

+Globally consistent
+Cheap per request
+Works with any shared store that offers atomic increments

Cons

−Redis on the hot path
−Hot keys skew shards

Choose this variant when

Public authenticated APIs
Need cross-gateway consistency

Local counter with periodic sync

Each gateway has its own bucket; syncs to shared store every N seconds.

Zero per-request external call. Accepts brief over-limit windows. For DDoS-scale limits, not quotas.

Pros

+Zero round-trip
+Survives limiter outage

Cons

−Worst case admits up to (gateway_count × local_bucket) before a sync corrects it
−Not for precise billing quotas

Choose this variant when

Very high QPS edge
Approximation acceptable
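The over-limit behaviour of this variant is easier to see in a small simulation. The sketch below makes several simplifying assumptions: `LocalSyncedLimiter` is a hypothetical name, a plain dict stands in for the shared store, and the counter never resets (no window), so it only shows the sync mechanics.

```python
from typing import Dict


class LocalSyncedLimiter:
    """Per-gateway counter that pushes its delta to a shared store every N seconds."""

    def __init__(self, shared: Dict[str, int], key: str, limit: int,
                 sync_interval_s: float, now: float = 0.0) -> None:
        self.shared = shared                  # dict standing in for the shared store
        self.key = key
        self.limit = limit
        self.sync_interval_s = sync_interval_s
        self.pending = 0                      # local requests not yet pushed
        self.known_global = shared.get(key, 0)
        self.last_sync = now

    def _sync(self, now: float) -> None:
        """Push the local delta, then read back the global total."""
        self.shared[self.key] = self.shared.get(self.key, 0) + self.pending
        self.pending = 0
        self.known_global = self.shared[self.key]
        self.last_sync = now

    def allow(self, now: float) -> bool:
        if now - self.last_sync >= self.sync_interval_s:
            self._sync(now)
        # Decision uses only local knowledge: no external call on the request path.
        if self.known_global + self.pending >= self.limit:
            return False
        self.pending += 1
        return True
```

With two gateways and a limit of 10, each can admit 10 requests before the first sync, so 20 get through in total; after syncing, both see the global count and start rejecting.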

Layered (edge + gateway + app)

CDN per-IP + gateway per-key + app per-user-and-path.

Defence in depth. Each layer catches a different abuse class; failure of one doesn't expose the system.

Pros

+Multiple abuse classes covered
+Failure tolerance

Cons

−Operational complexity
−Coordinate limits to avoid false positives

Choose this variant when

Production public APIs
Previously abused systems
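A layered setup can be thought of as an ordered chain of checks where the first rejection wins. The helper below is a hypothetical sketch; `first_rejecting_layer` and the `(name, check)` pair shape are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Layer = Tuple[str, Callable[[], bool]]


def first_rejecting_layer(layers: List[Layer]) -> Optional[str]:
    """Run checks outermost first; return the name of the layer that rejects, or None."""
    for name, check in layers:
        if not check():
            return name        # stop here: inner layers are never consulted
    return None
```

Returning the layer name makes it possible to log which limit fired, which helps when coordinating limits across layers to avoid false positives. One caveat of this simple chain is that outer layers still count a request that an inner layer later rejects.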

Failure modes

Per-IP only

Corporate NATs hide thousands behind one IP. Rate-limiting the IP denies service to the office. Combine with API key or user id on authenticated paths.

Fixed-window counter

Edge-doubling: a client can get up to 2× the limit in a short span by bursting just before and just after the window boundary. Use a sliding window or token bucket instead.
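The boundary effect is easy to reproduce. The sketch below compares a fixed-window counter with one common sliding-window approximation, which weights the previous window's count by how much of it the rolling window still covers. Class names, the limit of 10 per 60 s, and the request timestamps are all assumptions chosen for the demonstration.

```python
from typing import Dict


class FixedWindowCounter:
    """Counts requests per aligned window; the count resets at each boundary."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.counts: Dict[int, int] = {}

    def allow(self, now: float) -> bool:
        window = int(now // self.window_s)
        if self.counts.get(window, 0) >= self.limit:
            return False
        self.counts[window] = self.counts.get(window, 0) + 1
        return True


class SlidingWindowCounter(FixedWindowCounter):
    """Estimates the rolling count from the current and previous window counts."""

    def allow(self, now: float) -> bool:
        window = int(now // self.window_s)
        # Fraction of the previous window still inside the rolling window.
        overlap = 1.0 - (now % self.window_s) / self.window_s
        estimated = self.counts.get(window - 1, 0) * overlap + self.counts.get(window, 0)
        if estimated >= self.limit:
            return False
        self.counts[window] = self.counts.get(window, 0) + 1
        return True


def admitted_across_boundary(limiter: FixedWindowCounter, limit: int = 10) -> int:
    """Send `limit` requests just before and `limit` just after the 60 s boundary."""
    times = [59.9] * limit + [60.1] * limit
    return sum(limiter.allow(t) for t in times)
```

With a limit of 10 per 60 s, the fixed window admits all 20 requests sent within 0.2 s of the boundary, while the sliding approximation admits 11. This demo keeps every window's count for simplicity; a real implementation only needs the current and previous ones.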

No Retry-After

Clients back off blindly — too aggressively or too gently. Always emit Retry-After and RateLimit-* headers.

Limiter on hot path with no fallback (Advanced)

Limiter dies → everything fails. Fail-open for DDoS limits, or fall back to local counters. Never DoS yourself because the quota check failed.
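A fallback wrapper along these lines is one way to keep a limiter outage from taking the API down. The function name, the callable-based interface, and the default exception types are assumptions for this sketch; with a real Redis client you would pass that client's own connection and timeout error classes via `outage_errors`.

```python
import logging
from typing import Callable, Optional, Tuple, Type

logger = logging.getLogger("ratelimit")


def allow_with_fallback(
    shared_check: Callable[[], bool],
    local_check: Optional[Callable[[], bool]] = None,
    outage_errors: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> bool:
    """Ask the shared limiter; on outage use the local limiter, else fail open."""
    try:
        return shared_check()
    except outage_errors as exc:
        logger.warning("shared rate limiter unavailable: %s", exc)
        if local_check is not None:
            return local_check()   # quota-style limits: degrade to local enforcement
        return True                # volumetric limits: fail open
```

Only outage-type errors trigger the fallback; a normal "over limit" answer from the shared limiter is still honoured.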

Drills

Token bucket in 30 seconds.

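Bucket holds up to B tokens and refills at R/s. Each request removes 1; reject if empty. Allows bursts up to B, then sustained R/s. State is two numbers per identity (tokens, last_refill_time). In Redis, update both atomically, for example in a Lua script; a bare INCR with expiry compared against a limit is a fixed-window counter, not a token bucket.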

Your limiter Redis is down. What happens?

Fail-open for volumetric DDoS limits; downstream capacity absorbs the excess. Fall back to local counters for quota-style limits (each gateway enforces conservatively until Redis recovers). Never hard-fail the whole API because the quota check failed.
