Design: Design a Rate Limiter

Endpoints

Add the operations your service exposes. Method, path, and status codes make your API much easier to review.


Applies to all endpoints

Policies that aren't specific to a single endpoint — auth, rate limits, versioning, and other notes.

Authentication

Rate limiting

Versioning

Notes (idempotency, redirect choice, error shape, anything else)

Diagram

Draw the components and how traffic flows between them, so that the components and connections tell the core story. Notes below the canvas are optional but recommended; the Walkthrough panel is the primary place to narrate flows.

Components

Click palette → add

Drag edge dot → connect

Double-click node/edge → rename

Shift+drag or box-select → multi

Click any component on the left to add it here.

Drag the edge of a node to connect it to another.

Double-click any arrow to label it.

Notes (optional): flow narration, trade-offs, chosen alternatives

Request walkthrough

Trace each core requirement as an ordered sequence of hops through your diagram. Use component names from your canvas for the From / To columns.

1

Given (subject, route), return allow / deny + retry-after metadata.

From | To | Action / payload

1.

2.
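As one possible reading of requirement 1, the check can be sketched as a fixed-window counter that returns the allow / deny result together with retry-after metadata. This is a minimal in-process sketch, not a recommended design: the class name `FixedWindowLimiter`, the `Decision` field types, and the choice of a fixed-window algorithm are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Decision:
    allowed: bool
    remaining: int
    reset_at: float     # epoch seconds at which the current window ends
    retry_after: float  # seconds to wait before retrying; 0.0 when allowed


class FixedWindowLimiter:
    """Hypothetical in-process fixed-window counter (not thread-safe)."""

    def __init__(self, limit: int, window_s: int):
        self.limit = limit
        self.window_s = window_s
        self.counts = {}  # (subject, route, window_start) -> requests seen

    def check(self, subject: str, route: str, now: float) -> Decision:
        # Align the window to a multiple of window_s so all callers agree on it.
        window_start = math.floor(now / self.window_s) * self.window_s
        reset_at = window_start + self.window_s
        key = (subject, route, window_start)
        used = self.counts.get(key, 0)
        if used >= self.limit:
            # Denied: the caller can retry once the window rolls over.
            return Decision(False, 0, reset_at, reset_at - now)
        self.counts[key] = used + 1
        return Decision(True, self.limit - used - 1, reset_at, 0.0)
```

In a walkthrough, the hop that performs this check would sit between the component that receives the request and the component that holds the counter state.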

2

Limits are configurable per subject (user / API key / IP) and per route.

From | To | Action / payload

1.

2.

3

Support multiple windows simultaneously (e.g. 10/sec AND 100/min).

From | To | Action / payload

1.

2.
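To make the "10/sec AND 100/min" requirement concrete, the sketch below admits a request only when every configured window has room, and increments counters only in that case. It assumes fixed windows and a plain dictionary for state; the function name `check_all` and the extra `window_s` component in the counter key (added so windows of different lengths never collide) are illustrative assumptions.

```python
import math

# Hypothetical policy: 10 requests per second AND 100 per minute.
WINDOWS = [(10, 1), (100, 60)]  # (limit, window_seconds)


def check_all(counts: dict, subject: str, route: str, now: float, windows=WINDOWS):
    """Return (allowed, retry_after_seconds) across all windows."""
    keys = []
    denied = False
    retry_after = 0.0
    for limit, window_s in windows:
        window_start = math.floor(now / window_s) * window_s
        key = (subject, route, window_s, window_start)
        keys.append(key)
        if counts.get(key, 0) >= limit:
            denied = True
            # Wait for the slowest exhausted window to roll over.
            retry_after = max(retry_after, window_start + window_s - now)
    if denied:
        return False, retry_after
    # Only count the request once every window has admitted it.
    for key in keys:
        counts[key] = counts.get(key, 0) + 1
    return True, 0.0
```

Against a shared store, the read and increment steps would need to be atomic (for example, performed in a single server-side script), which this in-process sketch does not address.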

Storage schema

For each entity, declare how it's stored. Sharding key is the interesting one — pick the access pattern it optimises for.

Subject

The thing being rate-limited (user, API key, or IP).

In-memory / derived

Storage type

Primary key

Sharding / partition key

Critical fields

Notes (indexes, TTL, access pattern)

Policy

Rule binding: (subject_type, route) → (limit, window, algorithm).

In-memory / derived

Storage type

Primary key

Sharding / partition key

Critical fields

Notes (indexes, TTL, access pattern)
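The rule binding above can be pictured as a small lookup table keyed by (subject_type, route). The sketch below is one hedged way to model it; the routes, limits, algorithm names, and the `"*"` wildcard fallback are all invented for illustration and are not part of the exercise.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    limit: int
    window_s: int
    algorithm: str


# Hypothetical rule table: (subject_type, route) -> list of policies.
# A list allows several windows to apply to the same binding.
POLICIES = {
    ("api_key", "/search"): [
        Policy(10, 1, "fixed_window"),
        Policy(100, 60, "fixed_window"),
    ],
    ("ip", "*"): [Policy(1000, 60, "fixed_window")],
}


def resolve(subject_type: str, route: str) -> list:
    """Exact match first, then the per-type wildcard, else no limits."""
    exact = POLICIES.get((subject_type, route))
    if exact:
        return exact
    return POLICIES.get((subject_type, "*"), [])
```

Because this table is small and read on every request, it is a natural candidate for caching in memory on each node, with the durable copy stored elsewhere.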

Bucket

Per-subject per-window counter state, key = (subject, route, window_start).

In-memory / derived

Storage type

Primary key

Sharding / partition key

Critical fields

Notes (indexes, TTL, access pattern)
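For the Bucket entity, the key and its expiry can both be derived from the request time. The sketch below assumes fixed windows aligned to multiples of the window length; the `rl:` prefix, the string layout, and the inclusion of the window length in the key are illustrative assumptions rather than a prescribed schema.

```python
import math


def bucket_key(subject: str, route: str, window_s: int, now: float) -> tuple:
    """Return (key, ttl_seconds) for the fixed window containing `now`."""
    window_start = math.floor(now / window_s) * window_s
    key = f"rl:{subject}:{route}:{window_s}:{window_start}"
    # The counter is useless once its window ends, so it can expire then.
    ttl = window_start + window_s - now
    return key, ttl
```

Setting a TTL like this keeps the counter store from growing without bound, since each bucket disappears on its own after its window closes.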

Decision

Result of one check call: { allowed, remaining, reset_at, retry_after }.

In-memory / derived

Storage type

Primary key

Sharding / partition key

Critical fields

Notes (indexes, TTL, access pattern)

Component choices

Pick one per row and give a one-line reason. These are the concrete technology decisions your diagram implies.

Topology

Where the decision is made: at the edge or in a shared service.

State Storage

Where per-subject state lives (e.g. rate-limit counters).

Algorithm

The core decision algorithm for this problem.

Load Balancer

How traffic is distributed to your app servers.

Your diagram

No components drawn yet — edit the diagram before answering.

Iterate on your design — don't start over.

Each scenario below probes a specific weakness in a typical HLD. Reference components from your diagram by name, describe what breaks and at what load, then name the minimum change that fixes it. Strong answers identify the precise failure mode — not just "scale it up".

1

A celebrity tweets a link and 100K req/sec all hit the same rate-limit key for that tweet. Your shard for that key melts. What do you do?

Probes: sharding / partitioning

Your answer

2

You deploy in 3 regions, each with its own Redis. The customer says "100 requests per minute global". What is your honest answer?

Probes: consistency tradeoffs

Your answer
