intermediate · security

Abuse prevention & rate limiting

Token bucket, sliding window, per-user vs per-IP, bot detection.

Rate limits are the only thing between your free tier and a botnet. A system without them is not a product — it's a target.

Read this if your last attempt…

You didn't mention rate limits in your API design
You'd limit per-IP and call it done (goodbye, corporate NAT)
You can't explain token bucket vs sliding window
You don't know how distributed rate limiting stays consistent

The concept

Rate limiting answers two distinct questions: "how fast can a client go?" (capacity protection) and "is this client abusing us?" (fraud/abuse). The algorithm, the identity axis, the response contract, and the failure mode all matter — each is a design decision, not a default.

Four design axes

1. Algorithm — how you count.
- Token bucket — refills at R tokens/sec, capped at B. Allows bursts up to B then sustains R. Default choice. O(1) storage (one counter + timestamp).
- Leaky bucket — requests queue and drain at a constant R/s. Smooths bursts into constant output. Adds latency: good for outbound shaping to a 3rd party, bad for inbound user requests.
- Fixed window counter — count per calendar minute. Edge-doubling problem: client hits B in the last second of minute 1 and B in the first of minute 2 → 2B in 2 seconds. Avoid unless you accept the 2× burst.
- Sliding window log — exact count via timestamp list. O(requests-in-window) memory; rarely the right tradeoff at scale.
- Sliding window counter — weighted blend of two fixed windows. ~95% accurate, O(1). Good compromise.
- GCRA (Generic Cell Rate Algorithm) — leaky-bucket (as a meter) semantics in O(1) with a single timestamp and no queue. Implemented by the redis-cell Redis module, among others; suited to high-QPS production limiters.

Architecture diagram · Token bucket

Bucket refills at R tokens/sec, capped at B. Request consumes 1 token or is rejected.
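The refill-and-consume rule above can be sketched in a few lines. This is a minimal in-memory illustration, not a production limiter; the class name `TokenBucket` and the explicit `nowMs` clock parameter are assumptions made so the behaviour is deterministic.

```typescript
// In-memory token bucket: capacity B, refill rate R tokens/sec.
class TokenBucket {
  private tokens: number;
  private lastRefillMs: number;
  private readonly capacity: number;     // B: maximum burst
  private readonly refillPerSec: number; // R: sustained rate

  constructor(capacity: number, refillPerSec: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.tokens = capacity; // a new bucket starts full
    this.lastRefillMs = nowMs;
  }

  // Refills for the elapsed time, then consumes one token if one is available.
  tryConsume(nowMs: number): boolean {
    const elapsedMs = Math.max(0, nowMs - this.lastRefillMs);
    this.tokens = Math.min(this.capacity, this.tokens + (elapsedMs / 1000) * this.refillPerSec);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// Usage: B = 3, R = 1 token/sec.
const bucket = new TokenBucket(3, 1, 0);
const burst = [bucket.tryConsume(0), bucket.tryConsume(0), bucket.tryConsume(0), bucket.tryConsume(0)];
// burst is [true, true, true, false]: three tokens, then a rejection
const afterOneSecond = bucket.tryConsume(1000); // true: one token has refilled
```

The state is exactly the two numbers the text mentions: a token count and a timestamp.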

Algorithm choice — pick by burst tolerance and storage cost.

Algorithm	Bursts	Accuracy	Cost
Token bucket	Yes (up to B)	Good	O(1) — two numbers per key
Leaky bucket	No — smooths to R/s	Exact rate	O(1) + optional queue
Fixed window	Up to 2B at edges	Poor	O(1) — one counter
Sliding log	Yes	Exact	O(requests in window)
Sliding counter	Yes	Very good	O(1) — two counters

How interviewers grade this

You name the algorithm (token bucket default) and the burst + sustained rate.
You list the identity axes you limit on (IP, API key, user, path).
You distinguish local (per-gateway) from global (cross-gateway) limiting.
You specify the response (429 with Retry-After; standardised headers like RateLimit-*).
You size the limiter storage (Redis ops/s, memory).

Variants

Per-key token bucket in Redis

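One Redis round-trip per request, keyed by (identity, bucket).

The default for authenticated APIs. The simplest form is a counter: key = (api_key, time window), INCR returns the new count, EXPIRE clears the key when the window ends, and you compare the count to the limit. Strictly, that is a fixed-window counter; true token-bucket semantics (refill, then consume) need a small Lua script, which is still one round-trip. In Redis Cluster, put the identity in a hash-tag such as {api_key} so all of an identity's keys land on the same shard.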

Pros

+Globally consistent
+Cheap per request (~1 ms Redis round-trip)
+Scales to millions of keys

Cons

−Redis is a dependency on the hot path
−Slight request-latency tax
−Hot keys (celebrity users) skew shards

Choose this variant when

Public APIs
Need cross-gateway consistency
Redis already on the stack

Local limit with periodic sync

Each gateway keeps its own counter; sync to shared store every N seconds.

Eliminates Redis round-trip per request. Accepts small over-limit windows (the sync interval). Fine for DDoS protection; unsafe for precise quotas.
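A minimal sketch of the local-counter idea, under simplifying assumptions: the shared store is a plain `Map` standing in for Redis, window reset is omitted, and the names `LocalSyncLimiter` and `SharedCounts` are hypothetical. It shows where the over-limit window comes from.

```typescript
// Stand-in for the shared store (Redis in the text); hypothetical, for illustration only.
type SharedCounts = Map<string, number>;

class LocalSyncLimiter {
  private pending = new Map<string, number>(); // requests admitted since the last sync
  private lastGlobal = new Map<string, number>(); // global counts as of the last sync
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Hot path: no external call; the decision uses possibly stale global state.
  allow(key: string): boolean {
    const seen = (this.lastGlobal.get(key) ?? 0) + (this.pending.get(key) ?? 0);
    if (seen >= this.limit) return false;
    this.pending.set(key, (this.pending.get(key) ?? 0) + 1);
    return true;
  }

  // Called every N seconds: push local deltas, then pull the merged totals.
  sync(shared: SharedCounts): void {
    this.pending.forEach((delta, key) => {
      shared.set(key, (shared.get(key) ?? 0) + delta);
    });
    this.pending.clear();
    this.lastGlobal = new Map(shared);
  }
}

const shared: SharedCounts = new Map();
const gatewayA = new LocalSyncLimiter(10);
const gatewayB = new LocalSyncLimiter(10);
let admitted = 0;
for (let i = 0; i < 10; i++) {
  if (gatewayA.allow("key-1")) admitted++;
  if (gatewayB.allow("key-1")) admitted++;
}
// admitted is 20: before any sync, each gateway only sees its own 10
gatewayA.sync(shared);
gatewayB.sync(shared);
gatewayA.sync(shared); // A picks up B's delta on its next sync
const afterSync = gatewayA.allow("key-1"); // false: the global count is now known to be over the limit
```

With two gateways and a limit of 10, the worst case before the first sync is 20 admitted requests, which is the (gateway count × local bucket) bound listed under Cons.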

The regional variation: each region has its own bucket, usage streams to a central reconciler, and global quotas are enforced at billing time instead of on the hot path.

Pros

+Zero per-request external call
+Survives Redis outage
+Low latency tax

Cons

−Over-limit by up to (gateway count × local bucket) in worst case
−Not suitable for precise billing quotas

Choose this variant when

High-QPS edge
Approximation is acceptable
Need to survive limiter outage

Layered (edge + app + backend)

IP-level at CDN/edge, API-key at gateway, user-id at app.

Defence in depth. Each layer catches a different abuse class: CDN catches volumetric attacks, gateway catches API-key theft, app catches authenticated abuse.

A limit at only one layer is brittle. CDN alone misses authenticated abuse; gateway alone is bypassed by volumetric floods. Layer all three so failure at any single point leaves the rest enforcing.
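The layering can be sketched as a chain of checks, each keyed on a different identity axis. This is an illustration only: in a real deployment the three layers run in different systems (CDN, gateway, app), and the limits, names, and simple counters here are assumptions, with window reset omitted.

```typescript
interface ApiRequest {
  ip: string;
  apiKey: string;
  userId: string;
}

interface LimitLayer {
  label: string;
  keyOf: (r: ApiRequest) => string;
  limit: number; // max requests per window (window reset omitted in this sketch)
  counts: Map<string, number>;
}

const layers: LimitLayer[] = [
  { label: "cdn (per-IP)", keyOf: (r) => r.ip, limit: 1000, counts: new Map() },
  { label: "gateway (per-key)", keyOf: (r) => r.apiKey, limit: 100, counts: new Map() },
  { label: "app (per-user)", keyOf: (r) => r.userId, limit: 20, counts: new Map() },
];

// Returns the label of the first layer that rejects, or null if every layer admits.
function admit(r: ApiRequest): string | null {
  for (const layer of layers) {
    const key = layer.keyOf(r);
    const used = layer.counts.get(key) ?? 0;
    if (used >= layer.limit) return layer.label;
    layer.counts.set(key, used + 1);
  }
  return null;
}

// One heavy user behind an office NAT sends 21 requests.
const heavy: ApiRequest = { ip: "203.0.113.7", apiKey: "k1", userId: "u1" };
let rejectedBy: string | null = null;
for (let i = 0; i < 21; i++) rejectedBy = admit(heavy);
// rejectedBy is "app (per-user)": the 21st request trips the innermost limit

// A colleague on the same IP and API key is unaffected.
const colleague = admit({ ip: "203.0.113.7", apiKey: "k1", userId: "u2" }); // null (admitted)
```

The per-user layer stops the heavy user without punishing everyone else behind the same IP, which a per-IP limit alone could not do.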

Pros

+Multiple abuse classes covered
+Failure of one layer doesn't expose the system

Cons

−Operational complexity
−Must coordinate limits to avoid false positives

Choose this variant when

Production public APIs
Systems that have been abused once already

Worked example

Design: rate limits for a public REST API.

Layers:

1. CDN: per-IP, 1000 req/min. Catches volumetric abuse before it hits origin. Cloudflare / CloudFront rules.
2. API gateway: per-API-key, token bucket, burst=100, sustain=10/s (free tier); burst=1000, sustain=100/s (paid). Redis-backed, with refill and consume done atomically in a Lua script that also sets PEXPIRE.
3. Per-endpoint overrides: POST /search is 10× cheaper internally than POST /export; separate buckets.

Response contract:

429 Too Many Requests with Retry-After header.
RateLimit-Limit / RateLimit-Remaining / RateLimit-Reset headers on every response (IETF draft, not yet a standard).

Sizing:

10k customers, peak 1k active, 100 req/s each → 100k req/s at peak.
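Redis: 100k INCR/s. Plain INCRs at that rate fit on a single c5.xlarge Redis; a Lua token-bucket script costs more CPU per call, so measure it and shard by key if one primary saturates.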
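The sizing arithmetic, written out. The request figures come from the text; the per-key memory figure of about 200 bytes is an assumption for illustration and should be measured on a real deployment.

```typescript
// Throughput: figures from the worked example.
const activeCustomers = 1000; // peak concurrently active customers
const reqPerSecEach = 100; // peak rate per active customer
const peakReqPerSec = activeCustomers * reqPerSecEach; // 100000

// One limiter call per request, so Redis sees the same rate.
const redisCallsPerSec = peakReqPerSec;

// Memory: ASSUMED ~200 bytes per bucket key (small hash plus key overhead).
const totalCustomers = 10000;
const assumedBytesPerKey = 200;
const memoryBytes = totalCustomers * assumedBytesPerKey; // 2000000, about 2 MB
```

Under these assumptions the limiter is bound by operations per second, not memory.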

Edge cases:

Corporate NAT: don't limit only on IP. Always combine with API key / cookie where possible.
Unauthenticated (anonymous) traffic: with no user id available, fall back to IP + cookie hash.
Burst-friendly clients: size burst at 10× sustained for UX friendliness.

Good vs bad answer

Interviewer probe

“How do you rate-limit your API?”

Weak answer

"1000 requests per minute per IP."

Strong answer

"Token bucket per API key — burst 100, sustained 10/s for free tier; 10× that paid. Redis-backed with INCR+PEXPIRE. CDN does a per-IP volumetric layer at 1000/min to absorb blunt attacks before they hit origin. Response is 429 + Retry-After + RateLimit-* headers so clients can back off gracefully. Per-endpoint buckets for expensive operations — /export doesn't share a bucket with /search."

Why it wins: Names algorithm, identity axis, layering, storage, response contract, and per-endpoint scaling.

Interview playbook

2-3 minutes whenever a public API, abuse protection, or quota is introduced; ~1 minute in deep-dive on expensive endpoints or distributed consistency.

When it comes up

Any public API — the interviewer will ask "what stops abuse?"
When you're designing a free tier or quota system
During capacity discussions — a rogue client can DoS you
When authentication or API keys come up
When an expensive endpoint appears in the design (search, export, heavy compute)

Order of reveal

1. State the two problems rate limits solve. "Capacity protection (a rogue client can't overwhelm us) and abuse/fraud (one identity can't farm quota). Different axes, sometimes the same tool."
2. Pick the algorithm with a reason. "Token bucket by default: burst B, sustained R. Allows legitimate bursts without punishing the user; simpler to explain than sliding windows."
3. Name the identity axis, not just the IP. "Primary axis is API key for authenticated traffic. Per-user inside a shared key if abuse is per-user. IP only as a CDN-level volumetric safety net."
4. Layer the limits. "Three layers: CDN does per-IP volumetric, gateway does per-API-key, app does per-user and per-expensive-endpoint. Each catches a different abuse class."
5. Commit to a storage design. "Redis, one Lua script call per (key, bucket) that refills, consumes, and sets PEXPIRE. One round-trip per request, ~1 ms added latency. Hash-tagged so an identity's keys share a shard."
6. Specify the response contract. "429 Too Many Requests, Retry-After header, and RateLimit-Limit / Remaining / Reset headers on every response. Clients can back off gracefully."
7. Name the failure mode. "When the limiter is down, fail open for DDoS-style limits and fail local for quota-style limits. Never hard-fail the whole API because the counter is unreachable."

Signature phrases

“Token bucket is the default; justify anything else” — Prevents fixed-window-edge-doubling mistakes.
“Per-IP only denies service to corporate NATs” — Catches the single most common junior answer.
“Layered limits: CDN, gateway, app” — Defence in depth in one phrase.
“Fail open, not closed” — Prevents turning a limiter outage into a full API outage.
“Expensive endpoints get their own bucket” — Shows you've costed work, not just request count.
“429 with Retry-After and RateLimit-* headers” — The exact response contract clients need.

Likely follow-ups

“Walk me through an exact token bucket implementation in Redis. What commands, what edge cases?”

Data structure per identity: A hash bucket:{api_key} with two fields:

tokens — current token count (float, supports partial refill).
last_refill — millisecond timestamp of the last refill.

Request flow (atomic via Lua script to avoid races):

lua

-- KEYS[1] = bucket key
-- ARGV    = { now_ms, refill_rate (tokens per ms), burst }
local now   = tonumber(ARGV[1])
local rate  = tonumber(ARGV[2])   -- tokens per ms
local burst = tonumber(ARGV[3])

local data = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or burst   -- first request: start full
local last   = tonumber(data[2]) or now

-- Refill based on elapsed time, capped at burst.
-- max(0, ...) guards against a gateway whose clock runs behind `last`.
tokens = math.min(burst, tokens + math.max(0, now - last) * rate)

local allowed = 0
if tokens >= 1 then
  tokens = tokens - 1
  allowed = 1
end

-- Never move the stored timestamp backwards.
redis.call('HSET', KEYS[1], 'tokens', tokens, 'last_refill', math.max(now, last))
-- TTL = 2x the time to refill an empty bucket, refreshed on allow AND deny,
-- so a key never expires (and resets to full) before it would have refilled.
redis.call('PEXPIRE', KEYS[1], math.ceil(2 * burst / rate))
return allowed   -- 1 = allow, 0 = deny

Why a Lua script: two-command sequences (GET then SET) race under concurrency — two gateways could both see 1 token and both allow. The script is atomic inside Redis.

Edge cases:

1. First request from an identity: key doesn't exist; initialise to full burst. Handled by or burst / or now.
2. Long idle period: without TTL, Redis keeps the key forever. PEXPIRE resets on every access; idle keys evict automatically.
3. Clock skew between gateways: use redis.call('TIME') inside the script instead of client-sent time. Cleaner, at the cost of one extra command.
4. Hot key: one celebrity API key could overload the shard hosting its bucket. Mitigate with a client-side local pre-check (skip the Redis call if locally we know the bucket is very empty) plus sharding the bucket across bucket:{api_key}:{shard_n}.

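Per-request cost: ~1 ms (Redis RTT within the region) plus script execution time. Redis runs scripts on one thread, so execution time caps throughput: at ~50 μs per call a primary tops out near 20K calls/s, and sustaining 100K QPS on one primary needs about 10 μs per call. Measure the script, and shard buckets across primaries if it is slower.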

“What happens when Redis goes down? What does the system do?”

Three strategies depending on what the limit protects:

1. DDoS-style volumetric limits: FAIL OPEN.

If the limiter is unavailable, allow the request.
Log the gap; alert on sustained fail-open.
Rationale: a DoS attack on the limiter (or a Redis cluster incident) should not itself cause a full API outage. Fail-open trades some worst-case overage for resilience.
Downstream capacity absorbs it; overload alarms fire if there is actually a flood.

2. Quota / billing limits: FAIL LOCAL.

Each gateway falls back to an in-memory bucket with a conservative limit (say 1/N of the global, where N is the gateway count).
Still enforces some limit; accepts over-limit by a factor during the outage.
Logs the identities that were throttled locally — post-incident, reconcile billing.
Rationale: a 15-minute outage where a single customer consumes 10× their quota is a billing issue, not a business risk.

3. Hard, must-enforce limits (financial, compliance): FAIL CLOSED.

Reject requests. Return 503 Service Unavailable.
Only for flows where "serve no request" is genuinely better than "serve a request unlimited".
Rare in practice. Most limits should not be in this category.

The anti-pattern to avoid: fail-closed by default. A single-point-of-failure limiter on the hot path is an outage waiting to happen. Default to fail-open, escalate fail-local only where quota matters, fail-closed almost never.
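The three failure modes can be expressed as one small policy wrapper. This is a synchronous sketch for clarity (a real Redis call would be async); `decide`, `FailureMode`, and the convention that `remoteCheck` throws when the limiter is unreachable are all assumptions of the example.

```typescript
type FailureMode = "open" | "local" | "closed";

interface Decision {
  allowed: boolean;
  degraded: boolean; // true when the shared limiter could not be consulted
  status?: number; // HTTP status to return on a fail-closed rejection
}

function decide(
  mode: FailureMode,
  remoteCheck: () => boolean, // throws when the shared limiter is unreachable
  localCheck: () => boolean, // conservative in-memory fallback
): Decision {
  try {
    return { allowed: remoteCheck(), degraded: false };
  } catch {
    if (mode === "open") return { allowed: true, degraded: true };
    if (mode === "local") return { allowed: localCheck(), degraded: true };
    return { allowed: false, degraded: true, status: 503 }; // "closed"
  }
}

// Simulated outage: the remote check always throws.
const redisDown = (): boolean => {
  throw new Error("limiter unreachable");
};

const volumetric = decide("open", redisDown, () => false); // allowed, degraded
const quota = decide("local", redisDown, () => false); // local bucket says no
const compliance = decide("closed", redisDown, () => true); // rejected with 503
```

The `degraded` flag is what would feed the degraded-mode metric and the fail-open alert.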

Operational tooling for fail-open:

Kill switch — feature flag to force fail-open without redeploying. Useful when Redis is slow but not down.
Circuit breaker on Redis calls — if Redis p99 spikes, short-circuit and fail-open proactively instead of letting every request pay the timeout.
Degraded-mode metric — dashboard shows "limiter is in degraded mode" so incident response knows.

“A user hits their rate limit. What should they see, and what headers does your response carry?”

Status code: 429 Too Many Requests (RFC 6585).

Body (JSON):

json

{
  "error": "rate_limited",
  "message": "API rate limit exceeded. Retry after 12 seconds.",
  "limit": 100,
  "remaining": 0,
  "reset_at": "2026-04-23T14:35:00Z",
  "docs_url": "https://docs.example.com/api/rate-limits"
}

Headers (from the IETF draft draft-ietf-httpapi-ratelimit-headers; not yet a standard, but increasingly supported):

RateLimit-Limit: 100 — the policy limit.
RateLimit-Remaining: 0 — how many requests they have left in the current window.
RateLimit-Reset: 12 — seconds until the limit resets (the draft uses delta-seconds, not a Unix timestamp, so clock skew doesn't matter).
Retry-After: 12 — how long to wait before retrying (seconds or HTTP-date). Required — this is how well-behaved clients back off.

On every response (not just 429): Emit RateLimit-Limit, RateLimit-Remaining, and RateLimit-Reset. Clients can proactively throttle themselves before hitting the wall. This reduces 429 volume and makes abuse detection clearer (abusive clients ignore these headers).

What NOT to do:

Silent drops. Closing the connection without a 429 leaves the client guessing between network failure and rate limit.
403 Forbidden. Implies permanent denial; clients won't retry. Use 429.
200 with error in body. Breaks clients that rely on HTTP status for retry logic.
Leaving backoff to the client. The server knows when capacity frees up, so state the wait in Retry-After instead of relying on each client's exponential-backoff math.

Abuse-detection signals hidden here:

Clients that ignore Retry-After and hammer anyway are candidates for aggressive throttling or IP-level blocks.
Clients that back off exactly as instructed are good citizens; whitelist candidates for higher limits.

“How do you handle rate limits for a globally distributed API with gateways in 10 regions?”

The core tension: global consistency (a single quota across regions) vs per-request latency (cross-region Redis call adds 50-150 ms).

Three strategies, pick by use case:

1. Regional limits, no global sync. Each region enforces its own bucket (e.g., 100K req/min per region per key). Users stay in their region (geo-DNS). Simple, low-latency, per-request cost ~1 ms local Redis.

Overage: worst case, a user with cross-region requests gets N× their intended limit (where N = number of regions). Usually acceptable.
Use case: most public APIs. Customers rarely cross regions; the overage is small.

2. Global limit with local leases. Each region gets a share of the global quota — e.g. the us-east gateway is leased 60% of the 100K/min limit. Regions negotiate lease sizes periodically based on demand (every 30 s). Requests check the local lease without cross-region round-trip.

Overage: bounded to the lease granularity. Adjustment lag means a surging region is briefly under-provisioned before the next negotiation.
Use case: workloads with high imbalance (big customer concentrated in one region). Closer to global accuracy without per-request latency.
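One way to size the leases is in proportion to each region's recent demand. The function below is a sketch of that single negotiation step; `allocateLeases`, the region names, and the proportional rule itself are assumptions, since the text does not prescribe a specific allocation policy.

```typescript
// Split a global limit into regional leases, proportional to recent demand.
function allocateLeases(globalLimit: number, demand: Map<string, number>): Map<string, number> {
  let totalDemand = 0;
  demand.forEach((d) => {
    totalDemand += d;
  });

  const leases = new Map<string, number>();
  demand.forEach((d, region) => {
    // Integer arithmetic, floored, so the leases never sum above the global limit.
    // With no demand anywhere, fall back to an even split.
    const lease =
      totalDemand > 0
        ? Math.floor((globalLimit * d) / totalDemand)
        : Math.floor(globalLimit / demand.size);
    leases.set(region, lease);
  });
  return leases;
}

// Recent demand (requests in the last negotiation interval), per region.
const recentDemand = new Map<string, number>([
  ["us-east", 60000],
  ["eu-west", 30000],
  ["ap-south", 10000],
]);
const leases = allocateLeases(100000, recentDemand);
// leases.get("us-east") is 60000, i.e. 60% of the 100K/min limit as in the text
```

Each region then enforces its lease with a local bucket, and the function is re-run at every negotiation interval.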

3. Strict global via cross-region Redis (or a consensus store). Every request hits a single authoritative limiter, possibly replicated for availability but coordinating through Raft or similar.

Overage: none (strict).
Cost: per-request RTT includes cross-region hop. ~100 ms added latency for cross-continent.
Use case: hard compliance / billing quotas where accuracy matters more than latency. Rarely the right answer.

Practical recommendation for most APIs: per-region limits with a global cap on paid tiers enforced asynchronously (at billing time, not at request time). This gives low latency for the hot path and accurate billing for customers, at the cost of accepting small intra-second overages.

The interview takeaway: don't propose strict global limits reflexively. Name the overage tolerance and match the strategy. "Per-region, global billing reconciliation" is the senior answer.

“Token bucket vs leaky bucket vs sliding window — when would you actually pick each?”

Token bucket — the default.

Allows bursts up to B then sustained at R.
Matches typical user behavior (occasional bursts) without punishing legitimate spikes.
O(1) storage, trivial to implement in Redis.
Pick when: most APIs. Unless you have a specific reason, this.

Leaky bucket — smoothing, not limiting.

Queue at input, constant drain rate.
Bursts get delayed, not rejected — adds latency.
Pick when: you have a hard downstream rate limit (e.g., you call a 3rd-party API that accepts max 10 req/s) and want to smooth your outbound calls rather than drop them. It's an outbound shaping tool, not an inbound rejection tool.
Don't pick when: your users prefer 429s to silent 5-second waits. Queueing violates user expectations.
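As an outbound shaper, a leaky bucket assigns each call the next free send slot instead of rejecting it. A minimal sketch follows; the class name `OutboundShaper` and the bounded queue delay (`maxQueueDelayMs`) are assumptions added so the queue cannot grow without limit.

```typescript
// Leaky-bucket shaper: each call is delayed to the next free slot at a constant rate.
class OutboundShaper {
  private nextFreeMs = 0;
  private readonly intervalMs: number;
  private readonly maxQueueDelayMs: number;

  constructor(ratePerSec: number, maxQueueDelayMs: number) {
    this.intervalMs = 1000 / ratePerSec;
    this.maxQueueDelayMs = maxQueueDelayMs;
  }

  // Returns how long the caller should wait before sending, or null if the queue is too deep.
  schedule(nowMs: number): number | null {
    const sendAt = Math.max(nowMs, this.nextFreeMs);
    const delay = sendAt - nowMs;
    if (delay > this.maxQueueDelayMs) return null; // shed load instead of waiting indefinitely
    this.nextFreeMs = sendAt + this.intervalMs;
    return delay;
  }
}

// Third party accepts 10 req/s; four calls arrive together at t = 0.
const shaper = new OutboundShaper(10, 250);
const delays = [shaper.schedule(0), shaper.schedule(0), shaper.schedule(0), shaper.schedule(0)];
// delays is [0, 100, 200, null]: calls are spaced 100 ms apart, and the fourth
// would wait 300 ms, which exceeds the 250 ms bound
```

The burst is converted into latency, which is why this suits outbound calls and not inbound user requests.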

Fixed window counter — avoid.

Count requests per calendar minute.
The edge-doubling problem: a client can do B in the last second of minute 1 and B in the first of minute 2 → 2B in 2 seconds.
Pick when: you explicitly accept the 2× burst and want O(1) simplicity. Rare.
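The edge-doubling problem is easy to reproduce. The sketch below is a plain fixed-window counter keyed by calendar minute; `makeFixedWindow` is a hypothetical helper name and the timestamps are chosen to straddle a minute boundary.

```typescript
// Fixed-window counter keyed by calendar minute.
function makeFixedWindow(limit: number): (nowMs: number) => boolean {
  const counts = new Map<number, number>();
  return (nowMs: number): boolean => {
    const windowId = Math.floor(nowMs / 60000);
    const used = counts.get(windowId) ?? 0;
    if (used >= limit) return false;
    counts.set(windowId, used + 1);
    return true;
  };
}

const allowFixed = makeFixedWindow(100); // B = 100 per minute
let acceptedAcrossBoundary = 0;
// 100 requests in the last second of minute 1 ...
for (let i = 0; i < 100; i++) if (allowFixed(59500)) acceptedAcrossBoundary++;
// ... and 100 more in the first second of minute 2.
for (let i = 0; i < 100; i++) if (allowFixed(60500)) acceptedAcrossBoundary++;
// acceptedAcrossBoundary is 200: 2B admitted within about one second
```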

Sliding window log — accurate but expensive.

Store every request timestamp, count those in the last N seconds.
O(requests-in-window) memory; O(log n) or O(n) per check.
Pick when: you need exact accuracy for small N (tiny per-second bucket) and memory isn't a concern. Rarely the right tradeoff at scale.
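A minimal sketch of the log approach, assuming one in-memory array per identity (in Redis this would typically be a sorted set). The class name is hypothetical.

```typescript
// Sliding-window log: exact count, at the cost of one stored timestamp per request.
class SlidingWindowLog {
  private timestamps: number[] = [];
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(nowMs: number): boolean {
    // Drop timestamps that have left the window (the oldest are at the front).
    while (this.timestamps.length > 0 && this.timestamps[0] <= nowMs - this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(nowMs);
    return true;
  }
}

// Limit of 2 requests per rolling second.
const log = new SlidingWindowLog(2, 1000);
const decisions = [log.allow(0), log.allow(500), log.allow(900), log.allow(1000), log.allow(1400)];
// decisions is [true, true, false, true, false]
```

Memory grows with the limit, which is the cost the text warns about.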

Sliding window counter — the compromise.

Two adjacent fixed windows, weighted by position in current window.
~95% accurate, O(1) storage.
Pick when: you want fixed-window simplicity without the edge-doubling problem. Good for analytics-style limits where a small error is acceptable.
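The weighted blend can be sketched as follows: the previous window's count is scaled by how much of it still overlaps the sliding window, then added to the current count. The class name is hypothetical, and the estimate assumes requests were spread evenly across the previous window.

```typescript
class SlidingWindowCounter {
  private currentWindow = 0;
  private currentCount = 0;
  private previousCount = 0;
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(nowMs: number): boolean {
    const windowId = Math.floor(nowMs / this.windowMs);
    if (windowId !== this.currentWindow) {
      // Roll forward; if more than one window has passed, the previous window is empty.
      this.previousCount = windowId === this.currentWindow + 1 ? this.currentCount : 0;
      this.currentCount = 0;
      this.currentWindow = windowId;
    }
    const elapsedFraction = (nowMs % this.windowMs) / this.windowMs;
    // Weight the previous window by the share of it still inside the sliding window.
    const estimate = this.previousCount * (1 - elapsedFraction) + this.currentCount;
    if (estimate >= this.limit) return false;
    this.currentCount += 1;
    return true;
  }
}

// Same boundary attack as the fixed-window case: 100 per minute.
const sliding = new SlidingWindowCounter(100, 60000);
let beforeBoundary = 0;
let afterBoundary = 0;
for (let i = 0; i < 100; i++) if (sliding.allow(59500)) beforeBoundary++;
for (let i = 0; i < 100; i++) if (sliding.allow(60500)) afterBoundary++;
// beforeBoundary is 100, afterBoundary is 1: the boundary burst is almost entirely blocked
```

Only two counters are stored per key, and the edge burst that a fixed window admits is mostly rejected.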

GCRA (Generic Cell Rate Algorithm) — the sophisticated option.

Mathematically equivalent to leaky bucket, but O(1) with a single timestamp (no queue).
Implemented by the redis-cell Redis module, among others; precise and cheap.
Pick when: you want leaky-bucket semantics without queueing cost. High-QPS production limiters.
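A minimal GCRA sketch, assuming the common formulation with a stored "theoretical arrival time" (TAT). The class name `Gcra` and the choice of burst tolerance as (burst − 1) emission intervals are assumptions of this example.

```typescript
// GCRA: the only stored state is the theoretical arrival time (TAT) of the next request.
class Gcra {
  private tatMs = 0;
  private readonly emissionIntervalMs: number; // time "cost" of one request at the sustained rate
  private readonly toleranceMs: number; // how far ahead of schedule a client may run

  constructor(ratePerSec: number, burst: number) {
    this.emissionIntervalMs = 1000 / ratePerSec;
    this.toleranceMs = this.emissionIntervalMs * (burst - 1);
  }

  // Returns 0 if the request conforms, otherwise the number of ms to wait before retrying.
  check(nowMs: number): number {
    const tat = Math.max(this.tatMs, nowMs);
    const aheadMs = tat - nowMs;
    if (aheadMs > this.toleranceMs) return aheadMs - this.toleranceMs;
    this.tatMs = tat + this.emissionIntervalMs;
    return 0;
  }
}

// 1 request/sec sustained, burst of 3.
const gcra = new Gcra(1, 3);
const results = [gcra.check(0), gcra.check(0), gcra.check(0), gcra.check(0)];
// results is [0, 0, 0, 1000]: three conforming requests, then "retry in 1000 ms"
const later = gcra.check(1000); // 0: conforming again after one emission interval
```

The rejection value doubles as a Retry-After hint, which is one reason this formulation is convenient.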

The interview default: token bucket. If asked why not X, name the specific tradeoff (edge-doubling for fixed-window, added latency for leaky-bucket, memory cost for sliding-log).

Code examples

Redis Lua: atomic token-bucket check (fill + consume in one round-trip)

lua

-- KEYS[1] = bucket key (e.g. "rl:api-key:abc123")
-- ARGV[1] = capacity (B)          ARGV[2] = refill rate (R tokens/sec)
-- ARGV[3] = now (ms)              ARGV[4] = cost (usually 1)
-- Returns: { allowed (0/1), tokens_remaining, retry_after_ms }
local capacity = tonumber(ARGV[1])
local rate     = tonumber(ARGV[2])
local now      = tonumber(ARGV[3])
local cost     = tonumber(ARGV[4])

local data = redis.call('HMGET', KEYS[1], 'tokens', 'ts')
local tokens = tonumber(data[1]) or capacity
local last   = tonumber(data[2]) or now

-- Refill since last check, capped at capacity.
local elapsed = math.max(0, now - last)
tokens = math.min(capacity, tokens + (elapsed * rate / 1000.0))

local allowed = 0
local retry_after = 0
if tokens >= cost then
  tokens = tokens - cost
  allowed = 1
else
  retry_after = math.ceil((cost - tokens) * 1000.0 / rate)
end

redis.call('HSET', KEYS[1], 'tokens', tokens, 'ts', now)  -- HSET takes multiple fields; HMSET is deprecated since Redis 4.0
-- Key expires after 2× fill time — limits memory for idle keys.
redis.call('PEXPIRE', KEYS[1], math.ceil(2 * capacity * 1000 / rate))

return { allowed, math.floor(tokens), retry_after }

Express middleware: 429 + Retry-After + RateLimit-* headers

typescript

import type { Request, Response, NextFunction } from 'express';
import Redis from 'ioredis';

const redis = new Redis(process.env.REDIS_URL!);
// Lua source of the token-bucket script above: paste it in or read it from a file.
// EVAL resends the source on every call; ioredis's defineCommand() caches it via EVALSHA.
const bucketScript = '';

export function rateLimit(opts: { capacity: number; refillPerSec: number }) {
  return async (req: Request, res: Response, next: NextFunction) => {
    const key = `rl:${req.header('X-API-Key') ?? req.ip}`;
    const now = Date.now();
    try {
      // EVALSHA expects a SHA1 digest, not the script text, so use EVAL here.
      const [allowed, remaining, retryMs] = (await redis.eval(
        bucketScript, 1, key,
        opts.capacity, opts.refillPerSec, now, 1,
      )) as [number, number, number];

      // Emit headers on EVERY response so clients can self-throttle.
      res.setHeader('RateLimit-Limit', opts.capacity);
      res.setHeader('RateLimit-Remaining', remaining);
      res.setHeader(
        'RateLimit-Reset',
        Math.ceil((opts.capacity - remaining) / opts.refillPerSec),
      );

      if (allowed === 1) return next();

      const retrySec = Math.ceil(retryMs / 1000);
      res.setHeader('Retry-After', retrySec);
      return res.status(429).json({
        error: 'rate_limited',
        message: `Rate limit exceeded. Retry after ${retrySec}s.`,
        limit: opts.capacity,
        remaining,
        retry_after: retrySec,
      });
    } catch (err) {
      // Fail OPEN — limiter outage must not take down the API.
      // req.log only exists with a logging middleware such as pino-http; console works everywhere.
      console.warn('rate-limit: redis unavailable, failing open', err);
      return next();
    }
  };
}

Common mistakes

Per-IP only

Corporate NATs hide thousands of users behind one IP. Rate-limiting the IP denies service to everyone in the office. Use API key or user id for authenticated paths; IP only as a CDN-level safety net.

Fixed window counter

Attacker times bursts across the window boundary → 2× the intended limit in 2 seconds. Prefer sliding window or token bucket.

No Retry-After header

Clients back-off blind — either too aggressively (bad UX) or too gently (sustained over-limit). Emit Retry-After and the RateLimit-* headers on every response.

Limiter on the critical path without fallback (Advanced)

Limiter dies → all requests fail. Fallback: fail open (serve but log), or switch to local-counter mode until the limiter recovers. Never hard-fail the whole API because the quota check failed.

Practice drills

Explain token bucket in 30 seconds.

Bucket holds up to B tokens. Refills at R tokens/second. Every request removes one token; if the bucket is empty, reject. This allows bursts up to B then sustained at R/s. One counter + one timestamp per identity.

Your rate limiter is a single Redis. What happens when it goes down?

Two options: (a) fail-open — let requests through while the limiter is unavailable, log the gap, let downstream capacity absorb it; (b) fail to local counters — each gateway falls back to in-memory limiting with a conservative (lower) limit until Redis recovers. Almost always fail-open for DDoS-scale limits and fail-local for quota-style limits. Never fail-closed on the critical path without a very good reason.

Interviewer: "should limits be per API key or per user?"

Depends on who the API key represents. If one key per end-user (mobile app embedding a per-user token), they are the same. If one key per developer app used by many users, limit per key (the app) and separately per user where identifiable. The rule: limit at whatever axis matches the cost. Abuse by one user inside a shared key is a per-user problem; abuse of the key itself is a per-key problem.

Cheat sheet

•Default algorithm: token bucket. Burst B + sustained R.
•Default storage: Redis, one round-trip per request. Lua script for a token bucket; INCR + EXPIRE alone gives only a fixed-window counter.
•Limit on identity, not just IP. API key > user id > IP.
•Layer: CDN (volumetric) → gateway (per-key) → app (per-user, per-path).
•Response: 429 + Retry-After + RateLimit-* headers.
•Fail-open on limiter outage — don't DoS yourself.
•Expensive endpoints get their own bucket.

