Learn
New here? Start with one of the entry tracks. Already practising? The recommended path at the top closes whatever your last debrief flagged. Want a specific concept? Search, or jump straight to Fundamentals — the 16 skills every checkpoint maps to.
Walk into the round with the five most-tested patterns and the failure-mode framework rehearsed.
Best for: Mid-senior engineers 7–14 days before a round
Start here
Each starter is an opinionated reading path. Open one and work top to bottom — the writing assumes nothing about prior context.
First time
Build the core vocabulary every senior engineer has.
Crunch time
The 90-minute crash course that plugs the most common gaps.
Data modelling
For engineers whose designs are strong on compute, weak on storage.
Recognise fast
Learn the six system shapes interviewers expect you to name on sight.
Fundamentals
Every checkpoint in a debrief maps to one of these. Open the one your last attempt flagged, or browse top to bottom — the dot indicates difficulty (core / intermediate / advanced).
Capacity estimation (back-of-envelope)
25 min. Every downstream decision (cache size, shard count, replica count) collapses onto one question: what are the numbers? Candidates who skip this are designing vibes, not systems.
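A minimal sketch of the kind of back-of-envelope arithmetic this lesson refers to. Every input below (user count, requests per user, payload size, write fraction, peak factor) is a hypothetical figure chosen for illustration, not a number from the course.

```python
SECONDS_PER_DAY = 86_400


def estimate(daily_active_users, requests_per_user_per_day,
             bytes_per_write, write_fraction, peak_factor=3.0):
    """Turn product-level assumptions into QPS and storage figures."""
    total_requests = daily_active_users * requests_per_user_per_day
    avg_qps = total_requests / SECONDS_PER_DAY
    # peak_factor is an assumed ratio of peak traffic to average traffic.
    peak_qps = avg_qps * peak_factor
    writes_per_day = total_requests * write_fraction
    storage_gb_per_day = writes_per_day * bytes_per_write / 1e9
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "storage_gb_per_day": storage_gb_per_day,
    }


# Hypothetical inputs: 10M DAU, 20 requests each, 10% writes of 1 KB.
numbers = estimate(10_000_000, 20, bytes_per_write=1_000, write_fraction=0.1)
print(numbers)
```

With these assumed inputs the average load is a little over 2,300 requests per second and new data arrives at about 20 GB per day, which is already enough to argue about shard and replica counts.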
Numbers to know
15 min. If your design decisions aren't backed by numbers, they're opinions. Knowing that a single Redis instance handles on the order of 100K ops/sec or that a cross-region RTT can be around 60ms isn't trivia; it's what separates "we'll add a cache" from "we need 3 Redis instances because our hot path is 200K reads/sec."
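The "3 Redis instances" conclusion can be reproduced with one line of arithmetic. The sketch below assumes a 70% target utilisation per instance, which is an illustrative headroom figure and not something the lesson states; the function name is hypothetical.

```python
import math


def instances_needed(peak_ops_per_sec, ops_per_instance, target_utilisation=0.7):
    """Number of instances required to serve the peak with some headroom."""
    # Only a fraction of each instance's rated throughput is treated as usable.
    usable_ops = ops_per_instance * target_utilisation
    return math.ceil(peak_ops_per_sec / usable_ops)


# 200K reads/sec against instances rated at roughly 100K ops/sec each.
print(instances_needed(200_000, 100_000))  # -> 3
```

Without headroom the same inputs give 2, so the answer depends on stating the utilisation target out loud.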
High-level architecture
40 min. The HLD is the diagram every subsequent question is asked against. Clear boundaries and explicit dataflow beat clever components every time. Most candidates over-draw; seniors underdraw and label.
Async messaging & queues
30 min. Queues decouple producers from consumers, but the delivery semantics come with sharp edges. Exactly-once delivery is a lie; at-least-once delivery plus idempotent consumers is the truth.
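A minimal sketch of an idempotent consumer under at-least-once delivery. The message shape (an `id` and an `amount`) and the in-memory set of processed IDs are assumptions for illustration; a real system would record the ID and apply the side effect in the same transaction.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if the queue redelivers it."""

    def __init__(self):
        self.processed_ids = set()  # stand-in for a durable dedupe table
        self.balance = 0

    def handle(self, message):
        msg_id = message["id"]
        if msg_id in self.processed_ids:
            return False  # duplicate delivery: acknowledge and skip
        self.balance += message["amount"]
        self.processed_ids.add(msg_id)
        return True


consumer = IdempotentConsumer()
# The broker redelivers message "m1" after a lost acknowledgement.
for msg in [{"id": "m1", "amount": 50}, {"id": "m1", "amount": 50},
            {"id": "m2", "amount": 25}]:
    consumer.handle(msg)
print(consumer.balance)  # -> 75
```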
Networking fundamentals
20 min. You can't design distributed systems without understanding how bytes travel from client to server and back. DNS, TCP, TLS, HTTP: these aren't trivia. They're the latency budget, the failure modes, and the protocol choices that underpin every design decision you make.
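To make "latency budget" concrete, here is a rough sketch that counts round trips for the first request on a fresh connection. It assumes one round trip for the TCP handshake, one for a full TLS 1.3 handshake (two for TLS 1.2), and one for the HTTP exchange; the 60 ms RTT and the function name are illustrative assumptions.

```python
def first_request_latency_ms(rtt_ms, dns_ms=0.0, tls_round_trips=1, server_ms=0.0):
    """Approximate time to first response on a cold connection."""
    tcp_handshake = rtt_ms                    # SYN / SYN-ACK before data can flow
    tls_handshake = tls_round_trips * rtt_ms  # 1 for TLS 1.3, 2 for TLS 1.2
    http_exchange = rtt_ms + server_ms        # request out, response back
    return dns_ms + tcp_handshake + tls_handshake + http_exchange


print(first_request_latency_ms(60))                     # -> 180.0
print(first_request_latency_ms(60, tls_round_trips=2))  # -> 240.0
```

The model ignores DNS caching, connection reuse, and session resumption, all of which exist precisely to avoid paying these round trips on every request.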
Data model design
35 min. "We'll put it in Postgres" is not a data model. The data model is entities, keys, relationships, cardinalities, and the access patterns each one has to serve, and it locks in every trade-off you will chase for the rest of the design.
Storage choice justification
40 min. Picking a database is a first-principles decision, not a default. "We use Postgres" is a cultural statement; "the access pattern is point-lookup at 100K QPS with eventual consistency, so we use DynamoDB" is a design.
Sharding & partitioning
20 min. The partition key is the single most consequential decision in a distributed data design. Pick it wrong and no amount of hardware saves you: the hot shard stays hot, the rebalance never finishes, and the team spends a quarter migrating.
Caching strategy
30 min. "We'll add a cache" is where weak designs die. Interviewers ask: which one, caching what exactly, with what TTL, invalidated how, behind which API boundary? If you can't answer all five, the cache box in your diagram is decoration.
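A minimal cache-aside sketch that makes three of those answers explicit: what is cached (the result of a per-key loader), the TTL, and how entries are invalidated. The class name, the `load_from_db` callback, and the in-process dictionary are assumptions for illustration; a shared cache such as Redis would sit behind the same interface.

```python
import time


class CacheAside:
    """Read-through on miss, expire by TTL, invalidate explicitly on write."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # fresh hit: the database is not touched
        value = self._load(key)  # miss or expired: go to the source of truth
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Call this after writing the key to the database.
        self._entries.pop(key, None)


cache = CacheAside(lambda key: f"profile for {key}", ttl_seconds=30)
print(cache.get("user:1"))
```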
Load balancing & traffic routing
25 min. L4 vs L7 is not a trivia question; it's about whether the LB can make decisions based on the request content. L4 is dumb and fast; L7 is smart and expensive. Most prompts want L7 at the edge and L4 between services.
Consistent hashing
30 min. Simple hash(key) % N breaks the moment N changes: nearly every key remaps, every cache goes cold, and every shard rebalances. Consistent hashing moves only about 1/N of the keys when a node joins or leaves. It's the algorithm behind many production cache clusters and distributed databases.
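The difference can be measured directly. The sketch below compares modulo placement with a small hash ring when a fifth node joins four existing ones. The ring uses 100 virtual nodes per server and SHA-256 as the hash, both of which are illustrative choices; node names and key names are hypothetical.

```python
import bisect
import hashlib


def stable_hash(value):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node owns many points on the ring to even out its share.
        self._ring = sorted(
            (stable_hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First point clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect_right(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(before, after, keys):
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
old_ring, new_ring = HashRing(list("abcd")), HashRing(list("abcde"))
print("modulo:", moved_fraction(lambda k: stable_hash(k) % 4,
                                lambda k: stable_hash(k) % 5, keys))
print("ring:  ", moved_fraction(old_ring.node_for, new_ring.node_for, keys))
```

Going from 4 to 5 nodes, modulo placement should remap roughly four keys in five, while the ring should remap roughly one in five, and every remapped key lands on the new node.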
Failure mode analysis
40 min. Systems rarely fail in the ways you planned for; they fail in the ways you failed to think about. Failure-mode analysis is structured paranoia, and interviewers grade on whether you can produce it on demand.
Replication & durability
30 min. Replication is how you survive a node death; durability is how you survive a bad deploy. Candidates confuse the two and end up with a design that's highly available but cheerfully corrupt.
Observability & operations
25 min. You cannot operate what you cannot see; you cannot page on what you cannot measure. Candidates who design beautiful systems with no metrics, no logs, and no alerts are designing systems their on-call team will hate.
Patterns
Once the fundamentals click, pattern recognition is what separates a strong design round from a struggle. Six shapes cover most of what you'll meet in interviews.
Read-heavy
80 min. Reads dominate writes by 10:1 or more. Every layer exists to keep the primary out of the hot path.
Write-heavy
90 min. Writes arrive faster than any single node can persist them synchronously. The design is about absorbing, spreading, and deferring them.
Classic request-response
60 min. The boring default. Synchronous HTTP with a cache. Works for 70% of APIs and you should say so.
Long-running tasks
80 min. Accept work with a 202 Accepted response and a job_id, process asynchronously, and let clients track progress via poll, push, or webhook.
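A framework-free sketch of the accept-then-poll flow. The `JobService` class, the `/jobs/{id}` status URL, and the in-memory job table are all hypothetical; the methods return `(status_code, body)` pairs in place of real HTTP responses.

```python
import uuid


class JobService:
    def __init__(self):
        self.jobs = {}  # job_id -> {"status", "payload", "result"}

    def submit(self, payload):
        """Accept the work immediately; do not run it on the request path."""
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {"status": "queued", "payload": payload, "result": None}
        return 202, {"job_id": job_id, "status_url": f"/jobs/{job_id}"}

    def run_next(self, work):
        """What a background worker would do: pick one queued job and run it."""
        for job_id, job in self.jobs.items():
            if job["status"] == "queued":
                job["status"] = "running"
                job["result"] = work(job["payload"])
                job["status"] = "done"
                return job_id
        return None

    def status(self, job_id):
        """Polling endpoint for clients that hold a job_id."""
        job = self.jobs.get(job_id)
        if job is None:
            return 404, {"error": "unknown job"}
        return 200, {"status": job["status"], "result": job["result"]}


service = JobService()
code, body = service.submit({"n": 4})
service.run_next(lambda payload: payload["n"] ** 2)
print(code, service.status(body["job_id"]))
```

Push and webhook delivery would replace the `status` polling call with a notification sent when the job reaches "done".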
Producer-consumer / work queue
35 min. Decouple rate of production from rate of consumption with a durable queue and autoscaled workers.
Event-driven / saga
70 min. Coordinate multi-service workflows via compensating transactions instead of distributed locks: choreography for simple flows, orchestration for everything else.
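A minimal sketch of the orchestration variant: a coordinator runs each step in order and, on failure, runs the compensations for the steps that already completed, in reverse. The step names and the `run_saga` helper are hypothetical, and a real orchestrator would persist its progress so it can resume after a crash.

```python
def run_saga(steps):
    """steps: list of (name, action, compensation) triples of callables."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception as exc:
            # Undo finished steps in reverse order; the failed step itself
            # is assumed to have had no effect.
            for _done_name, undo in reversed(completed):
                undo()
            return {"status": "rolled_back", "failed_step": name, "error": str(exc)}
        completed.append((name, compensate))
    return {"status": "committed"}


log = []


def charge_card():
    raise RuntimeError("card declined")


result = run_saga([
    ("reserve_stock", lambda: log.append("reserve"), lambda: log.append("release")),
    ("charge_card", charge_card, lambda: log.append("refund")),
    ("ship_order", lambda: log.append("ship"), lambda: log.append("cancel shipment")),
])
print(result["status"], log)  # -> rolled_back ['reserve', 'release']
```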
Real-time delivery
80 min. Poll, long-poll, SSE, or WebSocket: the choice is about update frequency, direction, and how many persistent connections you can afford.
Fan-out: on write vs on read
75 min. Where does the work live: at write time (push to every follower's inbox) or at read time (gather from each followed user)? Both break at the extremes; hybrids win.
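One way to picture a hybrid is a follower-count cutoff: authors at or below the cutoff push posts into follower inboxes at write time, and authors above it are pulled at read time. The sketch below assumes that rule; the `HybridFeed` class and the `push_limit` parameter are hypothetical, and it ignores backfilling posts for users who follow an author later.

```python
from collections import defaultdict


class HybridFeed:
    def __init__(self, push_limit):
        self.push_limit = push_limit        # max followers for fan-out on write
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.inbox = defaultdict(list)      # user -> items pushed at write time
        self.outbox = defaultdict(list)     # author -> everything they posted
        self._seq = 0                       # logical timestamp

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author, text):
        self._seq += 1
        item = (self._seq, author, text)
        self.outbox[author].append(item)
        if len(self.followers[author]) <= self.push_limit:
            for user in self.followers[author]:  # fan-out on write
                self.inbox[user].append(item)

    def timeline(self, user):
        items = set(self.inbox[user])
        for author in self.following[user]:
            if len(self.followers[author]) > self.push_limit:
                items.update(self.outbox[author])  # fan-out on read
        return [text for _, _, text in sorted(items, reverse=True)]


feed = HybridFeed(push_limit=2)
for fan in ("alice", "bob", "carol"):
    feed.follow(fan, "celebrity")
feed.follow("alice", "bob")
feed.post("bob", "lunch photo")
feed.post("celebrity", "tour dates")
print(feed.timeline("alice"))  # -> ['tour dates', 'lunch photo']
```

Using a set in `timeline` removes duplicates when an author crosses the cutoff after some posts were already pushed.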
Edge caching / CDN-first
70 min. The cheapest request is the one that never hits your origin. Push static and near-static content to the edge and let the CDN absorb 80–99% of reads.
Multi-region active-passive / active-active
75 min. Geographic distribution for latency, DR, and compliance. Active-passive is operationally sane; active-active is a conflict-resolution project.
Search over content
5 min. Inverted index + ranking service. The hard part isn't indexing; it's relevance and keeping the index fresh.
Geospatial / proximity lookup
5 min. Geohash / S2 / H3 for "nearby X" queries. Straight lat/lng on a B-tree degrades quickly as rows grow, because the index can narrow only one dimension at a time.
Large file upload & blob handling
5 min. Chunked + resumable uploads direct to blob storage. Signed URLs. The app server never touches the bytes.
Content recommendation
5 min. Feature store + candidate generation + ranking. Offline training, online serving. Separate the cheap retrieval from the expensive scoring.
Rate limiting / quota enforcement
5 min. Token bucket + distributed counter. The token bucket math is trivial; the distribution is the hard part.
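The "trivial" half looks roughly like the single-process sketch below: tokens refill at a fixed rate up to a capacity, and a request is allowed only if a token is available. The class and parameter names are illustrative, and the distributed half (sharing the bucket state across many servers) is deliberately left out.

```python
import time


class TokenBucket:
    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost=1.0):
        now = self._clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self._tokens = min(self.capacity,
                           self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


bucket = TokenBucket(rate_per_sec=5, capacity=10)
print(sum(bucket.allow() for _ in range(20)), "of 20 burst requests allowed")
```

The capacity bounds the burst size and the rate bounds the sustained throughput, which are the two numbers a quota policy has to name.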
Reading paths
Each path is an opinionated sequence of lessons and patterns. Browse the catalogue below — or pick the recommended one above if you have practice data.
Turn a vague prompt into a designable problem, sketch the right high-level shape, and defend the API contract.
For: Junior → mid-level, first system design round coming up
Walk into the round with the five most-tested patterns and the failure-mode framework rehearsed.
For: Mid-senior engineers 7–14 days before a round
Pick the right store, the right partition key, and the right indexes for a given prompt, with defensible reasoning.
For: Engineers whose feedback often cites "data model unclear" or "why that database?"
Recognise which of six system shapes a prompt maps to within the first two minutes, and narrate the v1 → v2 → v3 scaling path cold.
For: Engineers who learn by shape, not by topic list
Name a failure mode for each component, a mitigation for each, and an availability target + topology that matches.
For: Engineers whose feedback cites "hand-wavy on failure" or "no DR story"
Defend every design choice with a specific trade-off: not "it's faster" but "we trade X for Y at our scale".
For: Senior engineers gunning for staff calibration
Frame a prompt, bound its scope, and draft a defensible API contract in under 10 minutes.
For: Engineers who run out of time before reaching the architecture
Sound like an engineer who has shipped systems at scale, not one who has read about them.
For: Strong seniors pursuing staff-level rounds
Design a push-based real-time system end-to-end: protocol, fan-out strategy, presence, back-pressure, and reconnection semantics.
For: Engineers prepping for chat / feed / collab-doc prompts
Name the right store, the right partition key, the right indexes, and the right replication mode, and defend each.
For: Engineers prepping for data-heavy prompts (search, analytics, feeds, storage products)