Add the operations your service exposes. Method, path, and status codes make your API much easier to review.
Policies that aren't specific to a single endpoint — auth, rate limits, versioning, and other notes.
Diagram
Draw the components and how traffic flows between them. Notes below the canvas are optional — the Walkthrough panel is the primary place to narrate flows.
Request walkthrough
Trace each core requirement as an ordered sequence of hops through your diagram. Use component names from your canvas for the From / To columns.
Accept notification requests from upstream services via API or event bus.
Deliver notifications via push (APNs/FCM), SMS (Twilio), and email (SES/SendGrid).
Respect user notification preferences (channel opt-in/out, quiet hours, frequency caps).
De-duplicate: the same notification to the same user should not be sent twice.
Support priority levels (critical: OTP/security, high: transactional, low: marketing).
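The preference and de-duplication requirements above can be sketched as a single gate that runs before a notification is handed to a channel. This is a minimal illustration, not a prescribed design: the `Gate` class, the `prefs` dictionary shape, the choice of user + type + payload id as the de-duplication key, and the rule that critical notifications bypass quiet hours are all assumptions made for the example. The in-memory set stands in for a shared store with expiry.

```python
import hashlib
from datetime import time


def dedup_key(user_id: str, notification_type: str, payload_id: str) -> str:
    """Build a stable key: the same user, type, and payload id always hash the same."""
    raw = f"{user_id}|{notification_type}|{payload_id}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True if `now` is inside [start, end), including windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


class Gate:
    """Hypothetical pre-send check: channel opt-in, quiet hours, then de-duplication."""

    def __init__(self) -> None:
        # Stand-in for a shared key store with a TTL (assumption for the sketch).
        self._seen: set[str] = set()

    def should_send(self, user_id, notification_type, payload_id,
                    priority, channel, prefs, now) -> bool:
        # Respect channel opt-in/out.
        if channel not in prefs["opted_in"]:
            return False
        # Assumption: only non-critical traffic is held back by quiet hours.
        if priority != "critical" and in_quiet_hours(
                now, prefs["quiet_start"], prefs["quiet_end"]):
            return False
        # Record the key only once the notification is actually going out.
        key = dedup_key(user_id, notification_type, payload_id)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

Checking preferences before recording the key means a notification suppressed by quiet hours is not marked as sent, so it can still go out later.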
Storage schema
For each entity, declare how it's stored. The sharding key is the decision that matters most: choose it to match the access pattern you most need to optimise.
An inbound request: recipient, type, channel hint, priority, template, payload.
Per-user channel opt-ins, quiet hours, frequency caps, device tokens.
A parameterized message template (subject, body, deep link) per notification type × channel.
Tracks each notification: status (queued → sent → delivered/failed/bounced), timestamps, retry count.
Component choices
Pick one per row and give a one-line reason. These are the concrete technology decisions your diagram implies.
How traffic is distributed to your app servers.
Async work buffer for writes/fan-out.
Primary durable store for entities.
The async worker tier that drains the queue and does the slow work (delivery, fan-out, embedding).
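To make the queue and worker rows concrete, here is a minimal in-process sketch of a worker draining a buffer in priority order. It is a stand-in only: a real design would use whichever queue technology you pick above, possibly with one queue per priority. `PriorityBuffer`, `drain`, and the numeric ranking of the three priority levels are assumptions for the example.

```python
import heapq
import itertools

# Lower rank drains first; the numbers themselves are an assumption.
PRIORITY_RANK = {"critical": 0, "high": 1, "low": 2}


class PriorityBuffer:
    """In-memory stand-in for the async work buffer."""

    def __init__(self) -> None:
        self._heap: list = []
        # Monotonic counter keeps FIFO order among jobs of equal priority.
        self._counter = itertools.count()

    def put(self, priority: str, job) -> None:
        heapq.heappush(self._heap, (PRIORITY_RANK[priority], next(self._counter), job))

    def get(self):
        """Return the next job, or None when the buffer is empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


def drain(buffer: PriorityBuffer, handler) -> int:
    """Worker loop: hand each job to `handler` until the buffer is empty."""
    done = 0
    while (job := buffer.get()) is not None:
        handler(job)
        done += 1
    return done
```

With this ordering, an OTP enqueued behind a marketing batch is still delivered first, which is the property the priority requirement asks for.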
Iterate on your design — don't start over.
Each scenario below probes a specific weakness in a typical HLD. Reference components from your diagram by name, describe what breaks and at what load, then name the minimum change that fixes it. Strong answers identify the precise failure mode — not just "scale it up".
APNs (Apple Push Notification Service) is down for 10 minutes. What happens to the millions of queued push notifications?
Probes: failure mode analysis
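One possible direction for this scenario, offered as a sketch rather than a model answer: keep push jobs in the queue while the provider is unhealthy, retry with capped exponential backoff and jitter so workers do not all reconnect at once, and drop jobs whose time-to-live has passed (a ten-minute-old OTP is usually useless). The function names, the default base and cap, and the TTL policy are all assumptions.

```python
import random


def next_retry_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                     rng=random.random) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def decide(now: float, created_at: float, ttl_seconds: float,
           attempt: int, provider_healthy: bool) -> tuple:
    """Return (action, delay_seconds) for one queued push job.

    Times are plain seconds (for example, Unix timestamps).
    """
    # Expired jobs are dropped regardless of provider health.
    if now - created_at >= ttl_seconds:
        return ("drop", 0.0)
    if provider_healthy:
        return ("send", 0.0)
    # Provider is down: leave the job queued and try again later.
    return ("retry", next_retry_delay(attempt))
```

For example, `decide(now=100.0, created_at=0.0, ttl_seconds=60.0, attempt=0, provider_healthy=False)` returns `("drop", 0.0)` because the job has outlived its TTL.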