Add the operations your service exposes. Method, path, and status codes make your API much easier to review.
List the policies that aren't specific to a single endpoint: auth, rate limits, versioning, and other notes.
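As a rough illustration of the level of detail that helps a reviewer, the operations could be listed as structured records. This is only a sketch: the endpoint paths, status codes, and the `Operation` and `describe` names are hypothetical and not part of the exercise.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Operation:
    method: str
    path: str
    status_codes: Tuple[int, ...]
    summary: str


# Hypothetical endpoints, for illustration only.
OPERATIONS = [
    Operation("POST", "/chats", (201, 400, 401), "Create a 1:1 or group chat"),
    Operation("POST", "/chats/{chat_id}/messages", (202, 400, 401, 404, 429), "Send a message"),
    Operation("GET", "/chats/{chat_id}/messages", (200, 401, 404), "Fetch message history"),
]


def describe(op: Operation) -> str:
    """Render one operation as 'METHOD path -> codes' for a quick review."""
    codes = ", ".join(str(code) for code in op.status_codes)
    return f"{op.method} {op.path} -> {codes}"
```

Calling `describe(OPERATIONS[0])` yields a one-line summary that a reviewer can scan without opening the full specification.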
Diagram
Draw the components and how traffic flows between them. Notes below the canvas are optional — the Walkthrough panel is the primary place to narrate flows.
Request walkthrough
Trace each core requirement as an ordered sequence of hops through your diagram. Use component names from your canvas for the From / To columns.
Users can send and receive text messages in 1:1 conversations.
Users can create and participate in group chats (up to 100 members).
Messages sent while a user is offline are delivered when they reconnect (up to 30 days).
Users can send and receive media (images, files) in messages.
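One way to picture a walkthrough for the first requirement is as an ordered list of hops. The sketch below assumes hypothetical component names (`Load Balancer`, `Chat Server`, `Message Store`); substitute the names on your own canvas. The `is_well_formed` helper is also an assumption, added only to show what "ordered sequence" means here.

```python
from typing import List, NamedTuple


class Hop(NamedTuple):
    step: int
    source: str  # the From column
    target: str  # the To column
    action: str


# Hypothetical component names; replace them with the ones on your canvas.
SEND_1TO1 = [
    Hop(1, "Client A", "Load Balancer", "send message over a persistent connection"),
    Hop(2, "Load Balancer", "Chat Server", "route to the server holding A's connection"),
    Hop(3, "Chat Server", "Message Store", "persist the message"),
    Hop(4, "Chat Server", "Client B", "push to B if B is online"),
]


def is_well_formed(hops: List[Hop]) -> bool:
    """Check that steps are numbered 1..n and each hop starts at a component already reached."""
    seen = set()
    for index, hop in enumerate(hops, start=1):
        if hop.step != index:
            return False
        if seen and hop.source not in seen:
            return False
        seen.update((hop.source, hop.target))
    return True
```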
Storage schema
For each entity, declare how it's stored. The sharding key is the most consequential choice: pick the key that suits the access pattern you want to optimise.
A registered user with one or more connected devices.
A conversation — either 1:1 or group — with participant list and metadata.
A text or media payload sent within a chat, with sender, timestamp, and delivery status.
A specific device/session for a user (phone, tablet, desktop). A user may have multiple active clients.
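To make the sharding-key choice concrete, here is a minimal sketch that assumes messages are sharded by `chat_id`, so that a chat's history lives on one shard. The field names, the shard count, and the `shard_for` helper are assumptions for illustration; sharding by user or by time would optimise different access patterns.

```python
import hashlib
from dataclasses import dataclass

NUM_SHARDS = 16  # assumed shard count, not taken from the exercise


@dataclass
class Message:
    message_id: str
    chat_id: str  # assumed sharding key: keeps one chat's messages together
    sender_id: str
    sent_at_ms: int
    body: str
    delivery_status: str = "sent"


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a key to a shard with a stable hash, so every process agrees on placement."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

With this choice, reading a chat's recent messages touches a single shard, while listing everything one user sent across chats would need a scatter-gather query or a secondary index.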
Component choices
Pick one per row and give a one-line reason. These are the concrete technology decisions your diagram implies.
How traffic is distributed to your app servers.
Where hot reads are served from.
Async work buffer for writes/fan-out.
Primary durable store for entities.
Where per-subject state lives (e.g. rate-limit counters).
The async worker tier that drains the queue and does the slow work (delivery, fan-out, embedding).
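The relationship between the queue and the worker tier can be sketched with an in-process queue. This is a toy model under stated assumptions: the job shape, the `members_by_chat` lookup, and the per-user `inboxes` are hypothetical stand-ins for whatever queue and store your diagram names.

```python
import queue
from collections import defaultdict

MAX_GROUP_MEMBERS = 100  # group size limit from the requirements


def fan_out(job: dict, members_by_chat: dict, inboxes: dict) -> int:
    """Copy one message into every recipient's inbox, skipping the sender."""
    members = members_by_chat[job["chat_id"]]
    if len(members) > MAX_GROUP_MEMBERS:
        raise ValueError("group exceeds the member limit")
    delivered = 0
    for user_id in members:
        if user_id == job["sender_id"]:
            continue
        inboxes[user_id].append(job["body"])
        delivered += 1
    return delivered


def drain(jobs: queue.Queue, members_by_chat: dict, inboxes: dict) -> int:
    """Process jobs until the queue is empty; return the number of deliveries made."""
    total = 0
    while True:
        try:
            job = jobs.get_nowait()
        except queue.Empty:
            return total
        total += fan_out(job, members_by_chat, inboxes)
        jobs.task_done()
```

Because fan-out cost grows with group size, the 100-member cap bounds the work done per job, which is one reason to push it behind a queue rather than do it on the request path.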
Iterate on your design — don't start over.
Each scenario below probes a specific weakness in a typical HLD. Reference components from your diagram by name, describe what breaks and at what load, then name the minimum change that fixes it. Strong answers identify the precise failure mode — not just "scale it up".
A chat server holding 100K connections crashes. Walk me through what happens to those 100K users.
Probes: failure mode analysis
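One failure mode worth naming in this scenario is a reconnect storm: all 100K clients retry at once and overload the surviving servers. A common mitigation is exponential backoff with full jitter on the client. The sketch below is an assumption about how a client might compute its wait; the `base` and `cap` values are illustrative, not recommendations.

```python
import random


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0, rng=random) -> float:
    """Exponential backoff with full jitter: wait a random time in [0, min(cap, base * 2**attempt)]."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, ceiling)
```

Spreading retries over a growing random window turns one spike of 100K simultaneous connection attempts into a flatter ramp that the load balancer and remaining chat servers are more likely to absorb.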
If you deploy chat servers in US, EU, and Asia, how do you handle a message from a US user to an EU user?
Probes: consistency tradeoffs
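One possible answer, among several, is to give each user a home region and relay cross-region messages between regional chat servers. The sketch below only models the routing decision; the `HOME_REGION` directory, the component naming scheme, and the `route` function are hypothetical, and it does not address ordering or replication lag, which is where the consistency tradeoffs arise.

```python
# Hypothetical user-to-home-region directory, for illustration only.
HOME_REGION = {"alice": "us", "bob": "eu", "carol": "us"}


def route(sender: str, recipient: str, directory: dict) -> list:
    """Return the ordered list of components a message passes through."""
    src = directory[sender]
    dst = directory[recipient]
    if src == dst:
        return [f"chat-{src}"]
    # Cross-region: hand off from the sender's region to the recipient's region.
    return [f"chat-{src}", f"relay-{src}-to-{dst}", f"chat-{dst}"]
```

For example, `route("alice", "bob", HOME_REGION)` passes through a US chat server, a relay, and an EU chat server, while a message between two US users stays in one region.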