Learn

The library, in the order you should read it.

New here? Start with one of the entry tracks. Already practising? The recommended path at the top closes whatever your last debrief flagged. Want a specific concept? Search, or jump straight to Fundamentals: the 19 skills every checkpoint maps to.

Featured path this week

Interviewing next week

Walk into the round with the five most-tested patterns and the failure-mode framework rehearsed.

Best for: Mid-senior engineers 7–14 days before a round

6 stops·~90 min

Start here
Fundamentals
Patterns
Reading paths

Start here

Pick the entry point that matches where you are.

Each starter is an opinionated reading path. Open one and work top to bottom — the writing assumes nothing about prior context.

First time

New to system design

Build the core vocabulary every senior engineer has.

6 stops·~110 min

Crunch time

Interviewing next week

The 90-minute crash course that plugs the most common gaps.

6 stops·~90 min

Data modelling

Weak on data modelling

For engineers whose designs are strong on compute, weak on storage.

5 stops·~85 min

Recognise fast

Patterns first

Learn the six system shapes interviewers expect you to name on sight.

7 stops·~120 min

Fundamentals

The 19 skills an interviewer grades against.

Every checkpoint in a debrief maps to one of these. Open the one your last attempt flagged, or browse top to bottom — the dot indicates difficulty (core / intermediate / advanced).

Requirements & scope (1) · Capacity & estimation (2) · Architecture (3) · API design (1) · Data & storage (3) · Scalability (3) · Reliability (3) · Trade-offs (1) · Performance (1) · Security & abuse (1)

Requirements & scope

1

Requirements & scope framing
30m
The first five minutes decide the next forty. Candidates who skip scoping design the wrong system brilliantly — and still fail. The ones who nail it look senior before they've drawn a single box.

Capacity & estimation

2

Capacity estimation (back-of-envelope)
25m
Every downstream decision — cache size, shard count, replica count — collapses onto one question: what are the numbers? Candidates who skip this are designing vibes, not systems.
Numbers to know
15m
If your design decisions aren't backed by numbers, they're opinions. Knowing that Redis handles 100K ops/sec or that a cross-region RTT is 60ms isn't trivia; it's what separates "we'll add a cache" from "we need 3 Redis instances because our hot path is 200K reads/sec."
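The "3 Redis instances" figure above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the 100K ops/sec per-instance figure comes from the text, while the 30% headroom default and the function name `instances_needed` are assumptions made for illustration.

```python
import math


def instances_needed(peak_ops_per_sec: float,
                     ops_per_instance: float,
                     headroom: float = 0.3) -> int:
    """Instances required so each one stays at or below (1 - headroom) of its capacity.

    headroom=0.3 is an assumed safety margin, not a number from the lesson.
    """
    usable_per_instance = ops_per_instance * (1.0 - headroom)
    return math.ceil(peak_ops_per_sec / usable_per_instance)


# 200K reads/sec on the hot path, ~100K ops/sec per Redis instance.
with_headroom = instances_needed(200_000, 100_000)
without_headroom = instances_needed(200_000, 100_000, headroom=0.0)
print(with_headroom, without_headroom)
```

With no headroom the raw division gives 2 instances; leaving a margin for spikes and node loss is what pushes the answer to 3.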

Architecture

3

High-level architecture
40m
The HLD is the diagram every subsequent question is asked against. Clear boundaries and explicit dataflow beat clever components every time. Most candidates over-draw; seniors under-draw and label.
Async messaging & queues
30m
Queues decouple producers from consumers — but the delivery semantics come with sharp edges. Exactly-once is a lie; at-least-once + idempotent consumers is the truth.
Networking fundamentals
20m
You can't design distributed systems without understanding how bytes travel from client to server and back. DNS, TCP, TLS, HTTP — these aren't trivia. They're the latency budget, the failure modes, and the protocol choices that underpin every design decision you make.
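For the async messaging entry above, "at-least-once + idempotent consumers" can be sketched as a consumer that remembers which message IDs it has already applied. Everything here is hypothetical: the message shape (`id`, `amount`), the class name, and the in-memory set, which in a real system would need to be a durable store updated atomically with the side effect.

```python
class IdempotentConsumer:
    """Applies each message at most once, even if the queue redelivers it."""

    def __init__(self) -> None:
        self.seen_ids: set[str] = set()  # assumption: stands in for a durable dedup store
        self.balance = 0

    def handle(self, message: dict) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        msg_id = message["id"]
        if msg_id in self.seen_ids:
            return False  # redelivery: acknowledge without repeating the side effect
        self.balance += message["amount"]
        self.seen_ids.add(msg_id)
        return True


consumer = IdempotentConsumer()
deposit = {"id": "msg-1", "amount": 50}
first = consumer.handle(deposit)
second = consumer.handle(deposit)  # the broker delivers the same message again
print(first, second, consumer.balance)
```

The queue is free to deliver twice; the consumer makes the second delivery a no-op, which is what gives the appearance of exactly-once processing.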

API design

1

API contract design
35m
The API is the contract every client writes code against. Vague endpoints here metastasize into ambiguity everywhere else in the design. Interviewers use API design to separate candidates who have shipped from candidates who have read blog posts.

Data & storage

3

Data model design
35m
"We'll put it in Postgres" is not a data model. The data model is entities, keys, relationships, cardinalities, and the access patterns each one has to serve — and it locks in every trade-off you will chase for the rest of the design.
Storage choice justification
40m
Picking a database is a first-principles decision, not a default. "We use Postgres" is a cultural statement; "the access pattern is point-lookup at 100K QPS with eventual consistency, so we use DynamoDB" is a design.
Sharding & partitioning
20m
The partition key is the single most consequential decision in a distributed data design. Pick it wrong and no amount of horsepower recovers you — the hot shard stays hot, the rebalance never finishes, and the team spends a quarter migrating.

Scalability

3

Caching strategy
30m
"We'll add a cache" is where weak designs die. Interviewers ask: which one, caching what exactly, with what TTL, invalidated how, behind which API boundary? If you can't answer all five, the cache line in your diagram is decoration.
Load balancing & traffic routing
25m
L4 vs L7 is not a trivia question — it's about whether the LB can make decisions based on the request content. One is dumb and fast; the other is smart and expensive. Most prompts want L7 at the edge and L4 between services.
Consistent hashing
30m
Simple hash(key) % N breaks the moment N changes: nearly every key remaps, so caches go cold and shards rebalance. Consistent hashing moves only about 1/N of the keys on average when a node joins or leaves. It's the algorithm behind many production cache clusters and distributed databases.
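The difference between the two schemes is easy to measure. The sketch below is illustrative only: the node names, the key set, the choice of MD5 as a stable hash, and the 100 virtual nodes per server are all assumptions. It counts how many keys change owner when a fifth node joins a four-node cluster.

```python
import bisect
import hashlib


def stable_hash(value: str) -> int:
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_owner(key: str, nodes: list[str]) -> str:
    return nodes[stable_hash(key) % len(nodes)]


class Ring:
    """Consistent-hash ring with virtual nodes; a key belongs to the next point clockwise."""

    def __init__(self, nodes: list[str], vnodes: int = 100) -> None:
        self._points = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [point for point, _ in self._points]

    def owner(self, key: str) -> str:
        idx = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._points)
        return self._points[idx][1]


def moved_fraction(owner_before, owner_after, keys) -> float:
    return sum(1 for k in keys if owner_before(k) != owner_after(k)) / len(keys)


keys = [f"user:{i}" for i in range(10_000)]
old_nodes = [f"cache-{i}" for i in range(4)]
new_nodes = old_nodes + ["cache-4"]

modulo_moved = moved_fraction(
    lambda k: modulo_owner(k, old_nodes), lambda k: modulo_owner(k, new_nodes), keys
)
old_ring, new_ring = Ring(old_nodes), Ring(new_nodes)
ring_moved = moved_fraction(old_ring.owner, new_ring.owner, keys)

print(f"modulo: {modulo_moved:.0%} of keys moved, ring: {ring_moved:.0%} of keys moved")
```

Under these assumptions the modulo scheme should remap roughly four keys in five, while the ring remaps roughly one in five, and every key that moves on the ring lands on the new node.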

Reliability

3

Failure mode analysis
40m
Systems don't fail because you didn't think they could. They fail in the ways you failed to think about. Failure-mode analysis is structured paranoia, and interviewers grade on whether you can produce it on demand.
Replication & durability
30m
Replication is how you survive a node death; durability is how you survive a bad deploy. Candidates confuse the two and end up with a design that's highly available but cheerfully corrupt.
Observability & operations
25m
You cannot operate what you cannot see; you cannot page on what you cannot measure. Candidates who design beautiful systems with no metrics, no logs, and no alerts are designing systems their on-call team will hate.

Trade-offs

1

Consistency trade-offs
35m
CAP is not a trivia question. It's the trade-off that every distributed system lives under, and getting it wrong is how you end up with "strong consistency" backed by a single node — or "eventual consistency" on data that absolutely cannot be eventually wrong.

Performance

1

Latency budgeting
20m
A budget you don't compute is a budget you'll blow. Every synchronous hop costs milliseconds you don't get back — and tail latency isn't the average plus a bit, it's a different animal.
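Both halves of that claim can be checked with simple arithmetic. This is a rough sketch: the hop names and millisecond figures are invented for illustration, and the tail calculation assumes the parallel calls are independent, which real systems only approximate.

```python
def budget_remaining(budget_ms: float, hops_ms: dict[str, float]) -> float:
    """Milliseconds left after summing every synchronous hop on the request path."""
    return budget_ms - sum(hops_ms.values())


def p_any_slow(fanout: int, per_call_slow_probability: float = 0.01) -> float:
    """Chance that at least one of `fanout` parallel calls lands in its own slow tail.

    per_call_slow_probability=0.01 means "slower than that call's p99".
    Assumes the calls are independent.
    """
    return 1.0 - (1.0 - per_call_slow_probability) ** fanout


# Hypothetical hop costs for a 100 ms end-to-end budget.
hops = {
    "edge_tls": 20,
    "api_gateway": 5,
    "auth_service": 15,
    "cache_lookup": 2,
    "database_query": 40,
}
left = budget_remaining(100, hops)
tail_single = p_any_slow(1)
tail_fanout = p_any_slow(100)
print(left, round(tail_single, 3), round(tail_fanout, 3))
```

A request that fans out to 100 backends hits at least one p99-slow call about 63% of the time under these assumptions, which is why tail latency has to be budgeted separately from the average.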

Security & abuse

1

Abuse prevention & rate limiting
25m
Rate limits are the only thing between your free tier and a botnet. A system without them is not a product — it's a target.

Patterns

System shapes you should recognise on sight.

Once the fundamentals click, pattern recognition is what separates a strong design round from a struggle. Six shapes cover most of what you'll meet in interviews.

Workload shape

· How traffic is distributed between reads and writes drives every other decision.

Read-heavy

Reads dominate writes by 10:1 or more. Every layer exists to keep the primary out of the hot path.

Write-heavy

Writes arrive faster than any single node can persist them synchronously. The design is about absorbing, spreading, and deferring them.

Classic request-response

The boring default. Synchronous HTTP with a cache. Works for 70% of APIs and you should say so.

Execution shape

· Synchronous, asynchronous, real-time — how work flows through the system.

Long-running tasks

Accept work with 202 + job_id, process asynchronously, and let clients track progress via poll, push, or webhook.
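The 202 + job_id contract can be sketched without any web framework. The class below is a hypothetical in-memory stand-in: the `/jobs/{job_id}` URL, the status names, and the upper-casing "work" are assumptions for illustration, and only the polling variant of progress tracking is shown.

```python
import uuid


class JobService:
    """In-memory stand-in for an API that accepts work now and finishes it later."""

    def __init__(self) -> None:
        self._jobs: dict[str, dict] = {}

    def submit(self, payload: str) -> tuple[int, dict]:
        """Accept the work and return immediately: 202 plus a handle to poll."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"status": "queued", "payload": payload, "result": None}
        return 202, {"job_id": job_id, "status_url": f"/jobs/{job_id}"}

    def run_next(self) -> None:
        """What a background worker would do: take one queued job and complete it."""
        for job in self._jobs.values():
            if job["status"] == "queued":
                job["status"] = "running"
                job["result"] = job["payload"].upper()  # placeholder for the slow work
                job["status"] = "done"
                return

    def status(self, job_id: str) -> tuple[int, dict]:
        """What the client polls until the job reports done."""
        job = self._jobs.get(job_id)
        if job is None:
            return 404, {"error": "unknown job"}
        return 200, {"job_id": job_id, "status": job["status"], "result": job["result"]}


service = JobService()
code, body = service.submit("hello")
before = service.status(body["job_id"])
service.run_next()
after = service.status(body["job_id"])
print(code, before[1]["status"], after[1]["status"], after[1]["result"])
```

The request handler never blocks on the work itself; the client holds only the job_id and checks back.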

Producer-consumer / work queue

Decouple rate of production from rate of consumption with a durable queue and autoscaled workers.

Event-driven / saga

Coordinate multi-service workflows via compensating transactions instead of distributed locks — choreography for simple flows, orchestration for everything else.

Real-time delivery

Poll, long-poll, SSE, or WebSocket — the choice is about update frequency, direction, and how many persistent connections you can afford.

Data movement & fan-out

· One-to-many delivery, geographic distribution, and the trade-offs that come with them.

Fan-out: on write vs on read

Where does the work live — at write time (push to every follower's inbox) or at read time (gather from each followed user)? Both break at the extremes; hybrids win.

Edge caching / CDN-first

The cheapest request is the one that never hits your origin. Push static and near-static content to the edge and let the CDN absorb 80–99% of reads.

Multi-region active-passive / active-active

Geographic distribution for latency, DR, and compliance. Active-passive is operationally sane; active-active is a conflict-resolution project.

Specialised shapes

· Problem-specific patterns you'll recognise on sight once you've seen them.

Search over content

Inverted index + ranking service. The hard part isn't indexing — it's relevance and keeping the index fresh.

Geospatial / proximity lookup

Geohash / S2 / H3 for "nearby X" queries. Straight lat/lng on a B-tree degrades quickly as the table grows, because the index can narrow the search on only one dimension at a time.

Large file upload & blob handling

Chunked + resumable uploads direct to blob storage. Signed URLs. App server never touches the bytes.

Content recommendation

Feature store + candidate generation + ranking. Offline training, online serving. Separate the cheap retrieval from the expensive scoring.

Rate limiting / quota enforcement

Token bucket + distributed counter. The token bucket math is trivial; the distribution is the hard part.
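The "trivial" half looks roughly like this. It is a single-process sketch only: the class and parameter names are illustrative, the clock is injectable so the behaviour can be checked deterministically, and the distributed counter that the entry calls the hard part is deliberately left out.

```python
import time


class TokenBucket:
    """Single-process token bucket: refills continuously at `rate_per_sec`, bursts up to `capacity`."""

    def __init__(self, rate_per_sec: float, capacity: float, clock=time.monotonic) -> None:
        self.rate = rate_per_sec
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if enough are available."""
        now = self._clock()
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


# Fake clock so the example does not depend on wall time.
fake_now = [0.0]
bucket = TokenBucket(rate_per_sec=2, capacity=3, clock=lambda: fake_now[0])
burst = [bucket.allow() for _ in range(4)]       # three allowed, fourth rejected
fake_now[0] = 1.0                                # one second later: two tokens refilled
after_refill = [bucket.allow() for _ in range(3)]
print(burst, after_refill)
```

Running this per app server gives each server its own limit; enforcing one shared limit across servers is where the distributed counter comes in.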

Reliability & infra

· The disciplines that turn "it works" into "it stays working".

High Availability

Redundancy + graceful degradation + operational discipline. You don't buy 99.99% — you earn it.

Leader election / consensus

Raft / Paxos via etcd / ZooKeeper. When exactly-one-of-N must do the thing, use a consensus service — don't roll your own.

Reading paths

Full curricula for specific gaps.

Each path is an opinionated sequence of lessons and patterns. Browse the catalogue below — or pick the recommended one above if you have practice data.

6 stops·~110 min

New to system design

Turn a vague prompt into a designable problem, sketch the right high-level shape, and defend the API contract.

For: Junior → mid-level, first system design round coming up

6 stops·~90 min

Interviewing next week

Walk into the round with the five most-tested patterns and the failure-mode framework rehearsed.

For: Mid-senior engineers 7–14 days before a round

5 stops·~85 min

Weak on data modelling

Pick the right store, the right partition key, and the right indexes for a given prompt — with defensible reasoning.

For: Engineers whose feedback often cites "data model unclear" or "why that database?"

7 stops·~120 min

Patterns first

Recognise which of six system shapes a prompt maps to within the first two minutes, and narrate the v1 → v2 → v3 scaling path cold.

For: Engineers who learn by shape, not by topic list

6 stops·~95 min

Weak on reliability

Name a failure mode for each component, a mitigation for each, and an availability target + topology that matches.

For: Engineers whose feedback cites "hand-wavy on failure" or "no DR story"

5 stops·~80 min

Weak on trade-offs

Defend every design choice with a specific trade-off — not "it's faster" but "we trade X for Y at our scale".

For: Senior engineers gunning for staff calibration

5 stops·~75 min

API & requirements cleanup

Frame a prompt, bound its scope, and draft a defensible API contract in under 10 minutes.

For: Engineers who run out of time before reaching the architecture

6 stops·~110 min

Senior → staff calibration

Sound like an engineer who has shipped systems at scale, not one who has read about them.

For: Strong seniors pursuing staff-level rounds

6 stops·~100 min

Real-time systems

Design a push-based real-time system end-to-end: protocol, fan-out strategy, presence, back-pressure, and reconnection semantics.

For: Engineers prepping for chat / feed / collab-doc prompts

7 stops·~110 min

Data-intensive systems

Name the right store, the right partition key, the right indexes, and the right replication mode — and defend each.

For: Engineers prepping for data-heavy prompts (search, analytics, feeds, storage products)