Design a Webhook Delivery Service

Endpoints

Add the operations your service exposes. Method, path, and status codes make your API much easier to review.


Applies to all endpoints

Policies that aren't specific to a single endpoint — auth, rate limits, versioning, and other notes.

Authentication

Rate limiting

Versioning

Notes (idempotency, redirect choice, error shape, anything else)

Diagram

Draw the components and how traffic flows between them, so that the components and connections tell the core story. Notes below the canvas are optional but recommended; the Request walkthrough is the primary place to narrate flows.


Notes (optional): flow narration, trade-offs, chosen alternatives

Request walkthrough

Trace each core requirement as an ordered sequence of hops through your diagram. Use component names from your canvas for the From / To columns.

1

Deliver internal events to subscribed customer HTTP endpoints with at-least-once durability — every accepted event lands on every active subscription, or terminates as dead-letter.
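A minimal sketch of the fan-out step, assuming hypothetical in-memory `Subscription` and `DeliveryJob` records. At-least-once delivery means the same job can be produced or sent more than once (for example, if fan-out crashes after persisting jobs but before acknowledging the event), so each job carries a stable idempotency key built from `(event_id, subscription_id)` that lets the service and receivers drop duplicates. Event-type filtering is left out to keep the example focused.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subscription:
    id: str
    customer_id: str
    url: str
    active: bool


@dataclass(frozen=True)
class DeliveryJob:
    event_id: str
    subscription_id: str
    url: str

    @property
    def idempotency_key(self) -> str:
        # Hypothetical key format: identical across retries and re-runs of fan-out,
        # so duplicates produced by at-least-once processing can be detected.
        return f"{self.event_id}:{self.subscription_id}"


def fan_out(event_id: str, subscriptions: list[Subscription]) -> list[DeliveryJob]:
    """Create one delivery job per active subscription for an accepted event.

    The caller is assumed to persist these jobs durably *before* acknowledging
    the event; that ordering is what makes delivery at-least-once.
    """
    return [DeliveryJob(event_id, s.id, s.url) for s in subscriptions if s.active]
```

Persisting jobs keyed by `idempotency_key` makes a re-run of `fan_out` after a crash harmless: the second run writes the same keys rather than adding new rows.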

From | To | Action / payload

1.

2.

2

Retry failed deliveries with exponential backoff and jitter; up to 12 attempts spread across 24 hours before dead-lettering.
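One way to meet "12 attempts across 24 hours", sketched under stated assumptions: there are 11 gaps between 12 attempts, each double the previous, with the base chosen so the un-jittered gaps sum to exactly 24 hours (86,400 / (2^11 - 1) ≈ 42.2 s, which makes the last gap about 12 hours). The ±20% multiplicative jitter is an assumption; "full jitter" (a uniform draw between 0 and the nominal delay) spreads retries more widely but roughly halves the expected total window.

```python
import random
from typing import Optional

MAX_ATTEMPTS = 12                 # from the requirement
RETRY_WINDOW_S = 24 * 60 * 60     # from the requirement: spread across 24 hours
JITTER = 0.2                      # assumption: +/-20% multiplicative jitter

GAPS = MAX_ATTEMPTS - 1
# Choose the base so that BASE_S * (2^0 + 2^1 + ... + 2^(GAPS-1)) == RETRY_WINDOW_S.
BASE_S = RETRY_WINDOW_S / (2 ** GAPS - 1)


def retry_delay(failed_attempt: int, rng: random.Random) -> Optional[float]:
    """Seconds to wait after attempt number `failed_attempt` (1-based) fails.

    Returns None once the last allowed attempt has failed, meaning the
    delivery should be dead-lettered instead of rescheduled.
    """
    if failed_attempt >= MAX_ATTEMPTS:
        return None
    nominal = BASE_S * 2 ** (failed_attempt - 1)
    # Jitter de-synchronises retries from many deliveries that failed together.
    return nominal * rng.uniform(1 - JITTER, 1 + JITTER)
```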

From | To | Action / payload

1.

2.

3

Sign every webhook with HMAC-SHA256 over (timestamp + body) using a per-subscription secret; receivers reject any request whose timestamp is more than 5 minutes old (replay protection).
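A minimal sketch using Python's standard `hmac` and `hashlib` modules. The signed-payload layout (`"<timestamp>."` followed by the raw body bytes) is an assumption; any unambiguous encoding of timestamp plus body works as long as sender and receiver agree. How the timestamp and signature travel with the request (for example, in headers) is not shown.

```python
import hashlib
import hmac

TOLERANCE_S = 5 * 60  # from the requirement: reject timestamps older than 5 minutes


def sign(secret: bytes, timestamp: int, body: bytes) -> str:
    # Assumed layout: "<timestamp>." + raw body, so the timestamp is covered by the MAC.
    signed_payload = f"{timestamp}.".encode() + body
    return hmac.new(secret, signed_payload, hashlib.sha256).hexdigest()


def verify(secret: bytes, timestamp: int, body: bytes, signature: str, now: int) -> bool:
    """Receiver-side check: fresh timestamp and matching signature."""
    # Replay protection. Also rejecting timestamps too far in the future
    # (clock skew) is an added assumption beyond the stated requirement.
    if abs(now - timestamp) > TOLERANCE_S:
        return False
    expected = sign(secret, timestamp, body)
    # Constant-time comparison; encoding to bytes tolerates non-ASCII input.
    return hmac.compare_digest(expected.encode(), signature.encode())
```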

From | To | Action / payload

1.

2.

4

Per-subscription event-type filter (e.g. charge.* matches charge.succeeded, charge.failed).
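A minimal sketch of segment-based matching, assuming dot-separated event types, that a trailing `*` matches one or more remaining segments (so `charge.*` would also match a hypothetical `charge.dispute.created`), and that a bare `*` matches everything. Comparing whole segments rather than raw string prefixes keeps `charge.*` from matching `chargeback.created`.

```python
def matches(pattern: str, event_type: str) -> bool:
    """True if a single subscription filter pattern accepts `event_type`."""
    if pattern == "*":
        return True
    p_parts = pattern.split(".")
    e_parts = event_type.split(".")
    if p_parts[-1] == "*":
        prefix = p_parts[:-1]
        # Require at least one segment after the prefix, and compare whole segments.
        return len(e_parts) > len(prefix) and e_parts[: len(prefix)] == prefix
    return p_parts == e_parts


def subscription_wants(filters: list[str], event_type: str) -> bool:
    """A subscription receives the event if any of its filters match."""
    return any(matches(p, event_type) for p in filters)
```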

From | To | Action / payload

1.

2.

5

Customer dashboard with full delivery history (each attempt: status, response code, latency, request body) and one-click replay for any event from the last 30 days.

From | To | Action / payload

1.

2.

6

Per-customer fairness — one slow endpoint must not delay deliveries for other customers.
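One way to enforce this, sketched under stated assumptions: keep a queue per customer, serve customers round-robin, and cap how many deliveries each customer can have in flight at once. A slow endpoint then saturates only its own cap and gets skipped, while free workers keep serving everyone else. The class and method names are hypothetical, and the sketch is single-threaded; a real dispatcher would need locking or a single scheduling loop.

```python
from collections import deque
from typing import Any, Optional


class FairDispatcher:
    """Round-robin across customers, with a cap on in-flight deliveries per customer."""

    def __init__(self, max_in_flight_per_customer: int) -> None:
        self.cap = max_in_flight_per_customer
        self.queues: dict[str, deque] = {}   # customer_id -> pending jobs
        self.in_flight: dict[str, int] = {}  # customer_id -> deliveries being sent now
        self.ring: deque = deque()           # customers that currently have pending jobs

    def enqueue(self, customer_id: str, job: Any) -> None:
        q = self.queues.setdefault(customer_id, deque())
        if not q:  # customer (re)joins the rotation
            self.ring.append(customer_id)
        q.append(job)

    def next_job(self) -> Optional[tuple[str, Any]]:
        """Return (customer_id, job) for a free worker, or None if every
        customer with pending work is already at its in-flight cap."""
        for _ in range(len(self.ring)):
            cid = self.ring.popleft()
            if self.in_flight.get(cid, 0) >= self.cap:
                self.ring.append(cid)  # saturated (e.g. slow endpoint): skip its turn
                continue
            q = self.queues[cid]
            job = q.popleft()
            self.in_flight[cid] = self.in_flight.get(cid, 0) + 1
            if q:
                self.ring.append(cid)  # still has work: back of the rotation
            return cid, job
        return None

    def complete(self, customer_id: str) -> None:
        """Call when a delivery finishes, whether it succeeded or failed."""
        self.in_flight[customer_id] -= 1
```

For example, with a cap of 1, a `slow` customer holding its single slot does not stop a `fast` customer's second job from going out once the first one completes.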

From | To | Action / payload

1.

2.

Storage schema

For each entity, declare how it's stored. Sharding key is the interesting one — pick the access pattern it optimises for.

Subscription

A customer-registered endpoint: customer_id, URL, event-type filter, secret, active flag, retry policy override.

In-memory / derived

Storage type

Primary key

Sharding / partition key

Critical fields

Notes (indexes, TTL, access pattern)

Event

An internal event with id, event_type, body, created_at, retention_until. Produced once, fanned out to N subscriptions.

In-memory / derived

Storage type

Primary key

Sharding / partition key

Critical fields

Notes (indexes, TTL, access pattern)

DeliveryAttempt

Per-(subscription × event × attempt) row: attempt_seq, status, response_code, latency_ms, last_error, next_retry_at.

In-memory / derived

Storage type

Primary key

Sharding / partition key

Critical fields

Notes (indexes, TTL, access pattern)

DeadLetterRecord

A delivery that exhausted retries — surfaces in dashboard, replayable manually within 30 days.

In-memory / derived

Storage type

Primary key

Sharding / partition key

Critical fields

Notes (indexes, TTL, access pattern)

Component choices

Pick one per row and give a one-line reason. These are the concrete technology decisions your diagram implies.

Load Balancer

How traffic is distributed to your app servers.

API Gateway / Proxy

The proxy / gateway tier that owns auth, routing, and per-tenant policy before traffic reaches the app.

Queue / Stream

Async work buffer for writes/fan-out.

Cache

Where hot reads are served from.

Database

Primary durable store for entities.

Worker / Dispatcher Pool

The async worker tier that drains the queue and does the slow work (delivery, fan-out, embedding).

Topology

Where the decision is made: at the edge or in a shared service.


Iterate on your design — don't start over.

Each scenario below probes a specific weakness in a typical HLD. Reference components from your diagram by name, describe what breaks and at what load, then name the minimum change that fixes it. Strong answers identify the precise failure mode — not just "scale it up".

1

A customer with 10K dead-lettered events from yesterday clicks "replay all" on the dashboard. What happens?

Probes: abuse rate limiting

Your answer
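One common mitigation is to meter replays through a per-customer token bucket, so a 10K "replay all" drains at a bounded rate instead of hitting the dispatcher (and the customer's own endpoint) as a single burst. A minimal sketch, assuming hypothetical limits of 10 replays/s with a burst of 50; the real limits, and whether replays share the live-traffic budget, are design choices this page leaves open.

```python
class TokenBucket:
    """Token bucket: refills at `rate_per_s`, holds at most `burst` tokens."""

    def __init__(self, rate_per_s: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate_per_s
        self.capacity = float(burst)
        self.tokens = float(burst)  # start full so small replays go out immediately
        self.last = now

    def try_acquire(self, now: float) -> bool:
        """Take one token if available. `now` (seconds) is passed in for testability."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

With those assumed numbers, the first 50 replays go out immediately and the remaining 9,950 drain at 10/s, about 995 s (roughly 17 minutes). If replayed jobs also pass through the per-customer delivery queue, they compete with that customer's own live traffic rather than with other customers'.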

2

A customer's endpoint goes from 100ms responses to 30s responses for 10 minutes, then back to normal. Walk me through their queue depth and dispatch rate minute-by-minute.

Probes: failure mode analysis

Your answer
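A back-of-envelope model can make the minute-by-minute walk-through concrete. The sketch assumes (hypothetically) 300 events/min for this endpoint and a per-endpoint concurrency cap of 10. Throughput is capped at concurrency / latency: 10 / 0.1 s = 100 deliveries/s normally, but 10 / 30 s ≈ 0.33/s (20/min) while the endpoint is slow. It deliberately ignores delivery timeouts and retries; with a timeout shorter than 30 s, those slow requests would instead fail and be rescheduled by the backoff policy.

```python
def simulate(minutes: int, arrivals_per_min: float, concurrency: int, latency_s):
    """Fluid model of one endpoint's backlog.

    Returns one (minute, dispatched, depth_at_end_of_minute) tuple per minute.
    Timeouts and retries are ignored; only the queueing effect of latency is shown.
    """
    depth = 0.0
    rows = []
    for m in range(minutes):
        # Per-minute throughput ceiling: `concurrency` requests, each taking latency_s(m).
        capacity = concurrency * 60 / latency_s(m)
        offered = depth + arrivals_per_min
        dispatched = min(offered, capacity)
        depth = offered - dispatched
        rows.append((m, dispatched, depth))
    return rows


def latency(m: int) -> float:
    # Assumption: minutes 1-10 are the slow window (30 s); otherwise 100 ms.
    return 30.0 if 1 <= m <= 10 else 0.1


for m, dispatched, depth in simulate(13, arrivals_per_min=300, concurrency=10, latency_s=latency):
    print(f"min {m:2d}: dispatched {dispatched:6.0f}, queue depth {depth:6.0f}")
```

Under these assumptions the backlog grows by 280/min to 2,800 by the end of the slow window, then clears within the first normal minute, since the endpoint can absorb about 6,000/min. The per-endpoint concurrency cap is what confines this backlog to one customer instead of letting it tie up shared workers.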
