High-level architecture

Component boundaries, request flow, sync vs async paths.

The HLD is the diagram every subsequent question is asked against. Clear boundaries and explicit dataflow beat clever components every time. Most candidates over-draw; seniors under-draw and label.

Read this if your last attempt…

Your whiteboard ended up as a mess of boxes with no visible request path
You got the feedback "the design was hard to follow"
You jumped to components (Kafka! Redis!) before drawing the request shape
You couldn't answer "walk me through a single request end-to-end"
You drew 15 boxes because you thought more = senior

The concept

A high-level architecture diagram has exactly one job: make the request flow legible. If someone can look at your drawing and narrate "the client calls X, which does Y, which persists in Z", the HLD is working. Every decoration that doesn't serve that goal is noise.

The signal a strong HLD sends is not "I know the names of many technologies." It's "I can explain what happens in this system from the moment a user taps a button to the moment their screen updates."

Architecture diagram· The canonical request path

Every HLD is a version of this: client → edge → app tier → data tier, with async side-channels for anything that doesn't have to block the response.

What to draw vs what to leave as an aside.

Category	Always draw	Sometimes draw	Mention verbally
Entry	LB / edge	WAF / DDoS if asked	Cert termination
Compute	Stateless app tier	Service breakdown if >3 services	Framework / language
Primary data store	Yes	Read replicas if read-heavy	Replication lag budget
Cache	If on hot path	Multi-tier (edge + app) if relevant	Eviction policy
Queue / stream	If async work exists	DLQ / retry topology	Visibility timeout tuning
Search index	If search matters	CDC pipeline to it	Rebuild procedure
Object storage	If media is involved	Lifecycle rules (Glacier)	CDN on top
Auth	No — off to the side	Login flow arrow	JWT / session tactics
Observability	No	Only if prompt demands	Metrics / logs / traces

How interviewers grade this

You draw fewer than 10 boxes on the main diagram. More than that and the interviewer has stopped following.
Every edge has a verb (SELECT, enqueue, publish, stream) — not just an arrow.
You label the protocol on at least one edge (HTTP, gRPC, AMQP, WebSocket).
Async paths are visually distinct from sync paths (dashed, via a queue).
Stateful components (DB, cache, queue) are drawn with replicas or clusters, not as single boxes.
You name the bottleneck before the interviewer asks where it is.
Read and write paths are split when they genuinely diverge, not preemptively.
You name what you are NOT drawing — control plane, auth, observability as verbal asides.
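Several of these grading criteria are mechanical enough to check in code. The following is a minimal sketch, assuming a hypothetical `Diagram` model (the `Box`, `Edge`, and `lintDiagram` names are illustrative and do not come from any real tool). It covers only the box count, edge verbs, protocol labels, and the "async goes via a queue" rule.

```typescript
// Hypothetical model of a whiteboard diagram; all names here are illustrative.
type Box = { name: string; kind: "edge" | "compute" | "store" | "cache" | "queue" };
type Edge = {
  from: string;
  to: string;
  verb: string;      // e.g. "SELECT", "publish"
  protocol?: string; // e.g. "HTTP", "gRPC"
  async: boolean;    // drawn dashed when true
};
type Diagram = { boxes: Box[]; edges: Edge[] };

function lintDiagram(d: Diagram): string[] {
  const problems: string[] = [];
  if (d.boxes.length >= 10) problems.push("10 or more boxes on the main diagram");
  for (const e of d.edges) {
    if (e.verb.trim() === "") problems.push(`edge ${e.from} -> ${e.to} has no verb`);
  }
  if (!d.edges.some((e) => e.protocol !== undefined)) {
    problems.push("no edge names a protocol");
  }
  // Async paths should run through a queue, never directly box-to-box.
  const queues = new Set(d.boxes.filter((b) => b.kind === "queue").map((b) => b.name));
  for (const e of d.edges) {
    if (e.async && !queues.has(e.from) && !queues.has(e.to)) {
      problems.push(`async edge ${e.from} -> ${e.to} bypasses a queue`);
    }
  }
  return problems;
}

const urlShortener: Diagram = {
  boxes: [
    { name: "lb", kind: "edge" },
    { name: "app", kind: "compute" },
    { name: "redis", kind: "cache" },
    { name: "postgres", kind: "store" },
  ],
  edges: [
    { from: "lb", to: "app", verb: "route", protocol: "HTTP", async: false },
    { from: "app", to: "redis", verb: "GET", async: false },
    { from: "app", to: "postgres", verb: "SELECT", async: false },
  ],
};

console.log(lintDiagram(urlShortener)); // an empty list means no rule was violated
```

The remaining criteria (naming the bottleneck, splitting read and write paths only when they diverge) are judgment calls and are deliberately left out of the sketch.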

Variants

Monolith + read replicas

One app service, one primary DB, 2–3 read replicas, optional cache.

The honest answer for most prompts under ~10K QPS. Boring, correct, and always the baseline against which fancier designs get compared.

The shape:

LB → N app replicas (stateless) → primary DB
Read replicas for scaling reads (lag < 1 s)
Redis in front for hot reads
Single deploy unit; single on-call rotation

Why it wins for 80% of prompts:

Cheapest to build and operate
No distributed transactions, no eventual consistency
Failure modes are few and understood
Easy to reason about latency and capacity

When it breaks:

Write QPS exceeds what a single primary handles (~20k sustained)
Teams exceed ~30 engineers sharing one codebase
Workloads diverge — one path is 100× the QPS of another

Pros

+ Fastest to reason about
+ One deploy unit, one DB, one team
+ Strong consistency by default
+ Ops cost is minimal

Cons

− Single deploy blast radius
− Vertical scaling ceiling on the DB
− Hard to scale teams past ~30 engineers

Choose this variant when

< 10K QPS combined
Single domain / tight coupling
Small team (< 30 engineers)
Most system design prompts as the v1

Domain-split services

Services drawn along domain boundaries (feed, profile, messaging) with their own stores.

Architecture diagram· Domain-split services with their own stores

Each service owns its data. Cross-service reads/writes go over RPC or async events. Boundaries follow domains, not technologies.

The right answer when the prompt has clearly separable workloads (e.g., feed reads and messaging writes have nothing in common). Each service owns its data; cross-service calls are explicit.

The shape:

LB → API gateway → N services (each with its own datastore)
Services communicate via gRPC or async events
Each service: own deploy, own dashboards, own on-call

Service boundaries that work:

Feed service (read-heavy) separate from posting service (write-heavy)
User service (identity, profile) separate from everything else
Payment service (compliance) separate from product service
Media service (CPU/bandwidth) separate from API service (low-resource)

Traps:

Splitting too early (< 30 engineers, single domain)
Splitting along technology rather than domain (a "Redis service" is not a domain)
No answer for cross-service transactions (reach for saga / outbox)
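For the cross-service transaction trap, a saga replaces one distributed transaction with a sequence of local steps, each paired with a compensating action. The sketch below is a minimal in-process orchestrator under that assumption. `SagaStep`, `runSaga`, and the order-flow step names are hypothetical, and a real saga would persist its progress and call remote services.

```typescript
// Hypothetical saga orchestrator: each step has a forward action and an undo.
type SagaStep = { name: string; run: () => boolean; compensate: () => void };

function runSaga(steps: SagaStep[]): { ok: boolean; log: string[] } {
  const log: string[] = [];
  const done: SagaStep[] = [];
  for (const step of steps) {
    if (step.run()) {
      log.push(`done:${step.name}`);
      done.push(step);
      continue;
    }
    log.push(`failed:${step.name}`);
    // Undo the steps that already completed, most recent first.
    for (const prev of done.slice().reverse()) {
      prev.compensate();
      log.push(`undone:${prev.name}`);
    }
    return { ok: false, log };
  }
  return { ok: true, log };
}

// Illustrative order flow: stock is reserved, then the payment step fails.
let stock = 5;
const outcome = runSaga([
  { name: "reserve-stock", run: () => { stock -= 1; return true; }, compensate: () => { stock += 1; } },
  { name: "charge-card", run: () => false, compensate: () => {} },
  { name: "ship", run: () => true, compensate: () => {} },
]);

console.log(outcome.ok, stock, outcome.log);
```

The point to make on the whiteboard is that each service only ever commits locally; consistency across services comes from the compensations, not from a shared transaction.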

Pros

+ Independent scaling per workload
+ Team ownership maps to service boundary
+ Blast radius bounded by service
+ Technology diversity where it matters

Cons

− Cross-service transactions are a design problem (sagas, outbox)
− More moving parts in ops (deploys, dashboards, on-call)
− Easy to over-split too early
− Distributed tracing becomes mandatory

Choose this variant when

Clearly separable domains
> 30 engineers
Different scaling profiles per workload
Different technology needs per domain

CQRS / separate read path

Writes go to the system of record; a derived read model is materialised into a denormalised store.

Architecture diagram· CQRS — write path and read path diverge

Writes go to the source of truth (normalised). A projector materialises the read model into a denormalised store optimised for the access pattern.

Use when read load dwarfs writes and the read shape differs sharply from the write shape (e.g., timelines, feeds, search). The read path can be rebuilt from the write log at any time.

The shape:

Write path: client → app → primary DB (normalised)
Projector: CDC stream → read models (cache, search index, denormalised KV)
Read path: client → app → read model (fast, pre-joined)

Example — an Instagram-style feed:

Write: user posts → Postgres (source of truth)
Projector: CDC → fan out to follower inbox in Redis
Read: user loads feed → single Redis LRANGE per user
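The write, project, and read steps can be sketched in memory. This is an illustrative model only: an array stands in for Postgres, a map of lists stands in for Redis inboxes, and `FeedProjector` with its methods is a hypothetical name. It shows the read side lagging the write side until the projector runs, and the read model being rebuilt from the source of truth.

```typescript
type Post = { id: number; authorId: string };

class FeedProjector {
  private posts: Post[] = [];                    // stands in for the normalised source of truth
  private inboxes = new Map<string, number[]>(); // stands in for per-user Redis lists
  private projected = 0;                         // how many posts have been fanned out so far
  private followers: Map<string, string[]>;

  constructor(followers: Map<string, string[]>) {
    this.followers = followers;
  }

  // Write path: append to the source of truth only.
  write(post: Post): void {
    this.posts.push(post);
  }

  // Projector: fan out posts that have not been projected yet.
  project(): void {
    for (const post of this.posts.slice(this.projected)) {
      for (const follower of this.followers.get(post.authorId) ?? []) {
        const inbox = this.inboxes.get(follower) ?? [];
        inbox.unshift(post.id); // newest first, similar in spirit to LPUSH
        this.inboxes.set(follower, inbox);
      }
    }
    this.projected = this.posts.length;
  }

  // Read path: a single range read with no joins, similar in spirit to LRANGE.
  readFeed(userId: string, limit: number): number[] {
    return (this.inboxes.get(userId) ?? []).slice(0, limit);
  }

  // The read model is disposable: drop it and replay the source of truth.
  rebuild(): void {
    this.inboxes.clear();
    this.projected = 0;
    this.project();
  }
}

const feeds = new FeedProjector(new Map<string, string[]>([["alice", ["bob", "carol"]]]));
feeds.write({ id: 1, authorId: "alice" });
const stale = feeds.readFeed("bob", 10); // projector has not run yet, so the feed is empty
feeds.project();
feeds.write({ id: 2, authorId: "alice" });
feeds.project();
const fresh = feeds.readFeed("bob", 10);
console.log(stale, fresh);
```

The gap between `stale` and `fresh` is the eventual consistency that the "When NOT CQRS" list warns about.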

When NOT CQRS:

Read:write ratio < 10:1 — operational cost not worth it
Small team — the projector pipeline is real ongoing work
Strict read-your-writes consistency — eventual consistency between sides breaks UX

Pros

+ Read and write paths scale independently
+ Read model is tailored to the access pattern (fast)
+ Rebuildable from the log — no schema-migration pain
+ New read models are cheap to add (new projector)

Cons

− Operational cost of the projector pipeline is real
− Eventual consistency between write and read sides
− Overkill when read:write < ~10:1
− Requires streaming infrastructure

Choose this variant when

Read:write > 10:1
Read shape ≠ write shape
Can tolerate seconds of staleness on reads
Multiple read consumers want different shapes

Event-driven / async-first

Kafka (or equivalent) is the backbone; services publish events and consume independently.

Architecture diagram· Event-driven backbone — many consumers, one log

Producers publish facts to Kafka once. Each consumer reads independently with its own offset. Add new consumers without touching producers.

When multiple services need to react to the same facts without tight coupling, an event log becomes the architecture's spine.

The shape:

Sync path: client → app → primary DB (write) → publish event to Kafka
Kafka retention: 7–30 days, partitioned by entity id
Consumers: search indexer, analytics pipeline, notification service, audit logger — each with independent offsets
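The "one log, independent offsets" idea fits in a few lines. The sketch below is an in-memory stand-in, not the Kafka client API: `AppendLog`, `poll`, and `seek` are hypothetical names chosen to mirror the concepts, and there are no partitions or retention.

```typescript
class AppendLog<T> {
  private events: T[] = [];
  private offsets = new Map<string, number>(); // one offset per consumer

  publish(event: T): void {
    this.events.push(event);
  }

  // Each consumer advances only its own offset.
  poll(consumer: string, max: number): T[] {
    const start = this.offsets.get(consumer) ?? 0;
    const batch = this.events.slice(start, start + max);
    this.offsets.set(consumer, start + batch.length);
    return batch;
  }

  // Replay: rewind one consumer without touching the others or the producer.
  seek(consumer: string, offset: number): void {
    this.offsets.set(consumer, offset);
  }
}

const votes = new AppendLog<string>();
votes.publish("vote:1");
votes.publish("vote:2");
votes.publish("vote:3");

const searchFirst = votes.poll("search-indexer", 2); // slow consumer takes two events
const analyticsAll = votes.poll("analytics", 10);    // a different consumer sees all three
votes.seek("search-indexer", 0);                     // recover from a consumer bug by replaying
const searchReplay = votes.poll("search-indexer", 10);
console.log(searchFirst, analyticsAll, searchReplay);
```

Adding a new consumer is just a new key in the offsets map, which is why producers never need to change.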

Why it wins:

New consumers added without touching producers
Replay from the log recovers from consumer bugs
Decouples write latency from downstream work
Audit trail is built-in (the log itself)

When it hurts:

Small systems drown in ops cost (Kafka is not free)
Eventual consistency is the default everywhere
Debugging cross-service issues requires distributed tracing
Event schema evolution needs real discipline

Pros

+ Add consumers without touching producers
+ Replay for recovery and new features
+ Audit trail from the log
+ Decoupled write latency

Cons

− Kafka ops / cost is real
− Eventual consistency everywhere
− Debugging needs distributed tracing
− Schema evolution requires discipline

Choose this variant when

Many independent consumers of the same facts
Audit/replay requirements
Fan-out is a core pattern
Team has streaming infra expertise

Edge-heavy / CDN-first

CDN/edge workers handle most traffic; origin servers only serve the uncacheable tail.

Architecture diagram· Edge-heavy — most traffic served from CDN, origin is small

CDN absorbs cacheable reads at the edge. Edge workers personalise where needed. Origin only sees cache misses + writes — typically <5% of total traffic.

When the workload is dominated by reads of cacheable content (media sites, e-commerce catalogs, documentation, blogs, content APIs), serving from the edge can cut latency from roughly 200 ms to 20 ms and reduce cost by up to about 10×.

The shape:

Cloudflare / CloudFront / Fastly at the edge, cache aggressively
Edge workers (Cloudflare Workers, Lambda@Edge) for personalisation
Origin tier is small and serves only cache misses + writes
Static assets on S3 / GCS with lifecycle rules

Signals to reach for it:

Geographically distributed users
Content is public or semi-public (cacheable without user-specific data)
Latency budget < 100 ms p95 globally
Origin QPS would be prohibitive without cache

Constraints:

Cache invalidation is the eternal problem
Personalised content complicates things — edge workers or hybrid cache keys
Cost model is different (CDN egress vs origin compute)
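One way to picture a "hybrid cache key" is a key built from the path, a normalised query string, and a coarse audience segment rather than the user id. The sketch below assumes a hypothetical `EdgeRequest` shape and an illustrative list of tracking parameters; real CDNs expose their own cache-key configuration, which this does not model.

```typescript
// Hypothetical request shape; real edge runtimes expose different objects.
type EdgeRequest = {
  path: string;
  query: Record<string, string>;
  country: string;
  loggedIn: boolean;
};

// Illustrative list: parameters that change the URL but not the response.
const TRACKING_PARAMS = ["utm_source", "utm_medium", "utm_campaign"];

function cacheKey(req: EdgeRequest): string {
  const kept = Object.keys(req.query)
    .filter((k) => TRACKING_PARAMS.indexOf(k) === -1)
    .sort() // parameter order must not fragment the cache
    .map((k) => `${k}=${req.query[k]}`)
    .join("&");
  // Vary on a coarse segment, never on the user id, so entries stay shareable.
  const segment = req.loggedIn ? "member" : "anon";
  return `${req.path}?${kept}|${req.country}|${segment}`;
}

const keyA = cacheKey({ path: "/p/42", query: { b: "2", a: "1", utm_source: "mail" }, country: "DE", loggedIn: false });
const keyB = cacheKey({ path: "/p/42", query: { a: "1", b: "2" }, country: "DE", loggedIn: false });
console.log(keyA, keyA === keyB);
```

Every dimension added to the key multiplies the number of cache entries, so the segment list is itself a hit-rate decision.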

Pros

+ Sub-50 ms latency globally
+ 10× cost reduction on static-heavy workloads
+ Origin tier is small and simple
+ DDoS protection is native to CDN

Cons

− Cache invalidation is hard
− Personalised content needs edge workers or hybrid caching
− Debugging edge issues is harder (less observability)
− Lock-in to CDN provider features

Choose this variant when

Media / content / e-commerce catalogs
Geographically distributed users
Read-heavy with cacheable content
Latency budget < 100 ms p95 globally

Worked example

Prompt: Design a Reddit-style link-sharing app.

Step 1 — establish the request shape

Dominant read path: GET /feed (personalised, per-subreddit, paginated). ~100k QPS at peak.
Dominant write path: POST /submit (~500/s), POST /vote (~20k/s).
Non-functional: p95 < 200 ms on feed, writes durable, 3-region deploy eventually.

Step 2 — draw the edge → compute → state skeleton

Edge: Cloudflare CDN for static (images, CSS, JS) + ALB for dynamic
Compute: stateless app tier behind ALB (auto-scaled, 20–100 instances)
State:

- Primary: Postgres (submissions, votes, comments) — source of truth, ACID
- Cache: Redis (feed cache, vote counts, hot comment threads)
- Media: S3 for uploaded images/videos, CloudFront CDN in front
- Search: Elasticsearch (projected from Postgres via CDC) for /search

Step 3 — add async side-channels

Vote written → publish event to Kafka

- Consumer 1: update denormalised score on submission row (async)
- Consumer 2: update hot-feed ranking cache (async)
- Consumer 3: analytics pipeline (async)

Submit written → publish event

- Consumer 1: index for search (async)
- Consumer 2: notify subscribed users (async)
- Consumer 3: analytics

Step 4 — label every edge

Client → Cloudflare: 10 ms (global anycast)
Cloudflare → ALB: 20 ms (regional)
ALB → app: 2 ms (intra-AZ)
App → Redis: 1 ms (cache hit, 95% hit rate on feeds)
App → Postgres: 5 ms (cache miss path)
App → Kafka: 2 ms (fire-and-forget publish)

Total sync budget: about 33 ms on the cache-hit read path and under 40 ms even when a request also touches Postgres or Kafka, well inside the 200 ms p95 target.
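The budget is easier to defend when it is summed per path rather than across the whole diagram. The sketch below uses the per-edge figures from step 4; the edge-name strings and the `pathLatency` helper are illustrative, and the write path is assumed to enter through the same ingress hops as reads.

```typescript
// Per-edge latencies from step 4, in milliseconds.
const edgeLatencyMs: Record<string, number> = {
  "client->cdn": 10,
  "cdn->alb": 20,
  "alb->app": 2,
  "app->redis": 1,
  "app->postgres": 5,
  "app->kafka": 2,
};

function pathLatency(path: string[]): number {
  let total = 0;
  for (const hop of path) {
    const ms = edgeLatencyMs[hop];
    if (ms === undefined) throw new Error(`unknown edge: ${hop}`);
    total += ms;
  }
  return total;
}

const ingress = ["client->cdn", "cdn->alb", "alb->app"];
const readHit = pathLatency([...ingress, "app->redis"]);                   // 33
const readMiss = pathLatency([...ingress, "app->redis", "app->postgres"]); // 38
const writePath = pathLatency([...ingress, "app->postgres", "app->kafka"]); // 39

console.log({ readHit, readMiss, writePath });
```

All three paths leave more than 160 ms of headroom against the 200 ms p95 target, which is the margin available for tail latency and retries.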

Step 5 — read vs write path

Read path: Client → CDN → ALB → App → Redis (hit) / else Postgres read replicas
Write path: Client → ALB → App → Postgres primary → Kafka publish
Separation lets reads scale horizontally via replicas; writes remain bounded by the primary

Step 6 — what I'm NOT drawing

Auth service (users carry a signed JWT; app validates locally)
Observability stack (Prometheus, Grafana, Jaeger, PagerDuty — all present, off diagram)
Deploy pipeline, feature flags, secrets manager (control plane)
Admin tools for moderation (separate low-QPS service)

Step 7 — named bottleneck

"The pressure point is the feed cache hit rate. At 95% we serve 100k QPS with 5k/s hitting Postgres read replicas — manageable. Drop to 85% and Postgres gets 15k/s reads, which starts fighting with vote writes. The fix would be either more Postgres read replicas (short term) or pre-computing hot subreddit feeds into Redis with a write-time fan-out (longer term). The next scaling move after that is sharding Postgres by subreddit, which we'd do at ~10M active users."
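The arithmetic behind the bottleneck statement is a one-liner worth being able to reproduce on the spot. In the sketch below, `originQps` and `replicasNeeded` are illustrative helpers, and the 5,000 QPS per-replica capacity is an assumed figure for the example, not a number from the prompt.

```typescript
// Traffic that misses the cache and reaches the database.
function originQps(totalQps: number, hitRate: number): number {
  return Math.round(totalQps * (1 - hitRate));
}

// Replicas needed to absorb that traffic at an assumed per-replica capacity.
function replicasNeeded(qps: number, perReplicaQps: number): number {
  return Math.ceil(qps / perReplicaQps);
}

const ASSUMED_REPLICA_QPS = 5_000; // hypothetical capacity, for illustration only

const at95 = originQps(100_000, 0.95); // 5000
const at85 = originQps(100_000, 0.85); // 15000

console.log(at95, replicasNeeded(at95, ASSUMED_REPLICA_QPS)); // 5000 1
console.log(at85, replicasNeeded(at85, ASSUMED_REPLICA_QPS)); // 15000 3
```

A ten-point drop in hit rate triples the database load, which is why the hit rate, not raw QPS, is the number to watch.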

That is the whole HLD in three minutes with clear dataflow, named bottleneck, and a scaling path.

Good vs bad answer

Interviewer probe

“Can you draw the architecture for a URL shortener?”

Weak answer

"Sure — we'll have a load balancer, a cluster of app servers, Postgres, Redis, Kafka for async events, Elasticsearch for search, S3 for storage, Cassandra for high write throughput, and probably a GraphQL layer on top." (Proceeds to draw 15 boxes with arrows going everywhere.)

Strong answer

"Four boxes to start. Client → LB → app tier → Postgres. One box off to the side: Redis, sitting next to the app tier, for the hot-read path (GET /:code). That's it for v1.

Let me walk a request through: client hits /abc → LB routes to any app instance → app checks Redis → on miss, SELECT long_url FROM urls WHERE short_code = 'abc' from Postgres → caches the result → 302 redirect. Writes are the reverse: INSERT INTO urls in Postgres, no cache involvement, return the short code.

This design handles ~20K reads/sec with one Redis at a 95% hit rate, which leaves ~1K origin QPS on Postgres. The pressure point is the Redis hit rate.

I'd only add boxes when we hit a specific limit:

Multi-region: when latency SLO forces geo-distribution, Cloudflare Workers at the edge with Redis replicas per region.
Kafka: only when we need async analytics or click-stream processing for separate consumers.
Sharding: at 500M+ short codes when Postgres becomes a bottleneck.

Not in v1."

Why it wins: The strong answer has a visible request path, a stated cache hit rate, a concrete QPS number, a named pressure point, and explicit triggers for when to add complexity. The weak answer is a box buffet with no dataflow and no reasoning for why each piece exists.
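The request walk in the strong answer is a cache-aside read. A minimal in-memory sketch follows, with maps standing in for Redis and Postgres; `UrlShortener`, `resolve`, and the `dbReads` counter are hypothetical names used only to make the flow observable.

```typescript
type Redirect = { status: 302; location: string } | { status: 404 };

class UrlShortener {
  private db = new Map<string, string>();    // stands in for Postgres
  private cache = new Map<string, string>(); // stands in for Redis
  dbReads = 0;                               // counts cache misses that reached the DB

  // Write path: insert into the database only, no cache involvement.
  create(code: string, longUrl: string): void {
    this.db.set(code, longUrl);
  }

  // Read path: cache first, database on miss, then populate the cache.
  resolve(code: string): Redirect {
    const cached = this.cache.get(code);
    if (cached !== undefined) return { status: 302, location: cached };

    this.dbReads += 1;
    const longUrl = this.db.get(code);
    if (longUrl === undefined) return { status: 404 };

    this.cache.set(code, longUrl);
    return { status: 302, location: longUrl };
  }
}

const shortener = new UrlShortener();
shortener.create("abc", "https://example.com/long");
const firstHit = shortener.resolve("abc");  // miss: goes to the database
const secondHit = shortener.resolve("abc"); // hit: served from the cache
console.log(firstHit, secondHit, shortener.dbReads);
```

The sketch does not cache 404s, so repeated lookups of unknown codes always reach the database; whether to cache negative results is a reasonable follow-up to raise.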

Interview playbook · 10–15 min, usually the longest single block of the interview

When it comes up

After capacity estimation, when the interviewer says "sketch the architecture"
When you need to orient the interviewer before deep-diving a specific component
When the interviewer asks "walk me through a single request end-to-end"
When asked "where are the bottlenecks?"
When the conversation has drifted and you need to reset the shared mental model

Order of reveal

1. State the request shape. "Two dominant paths: read (GET /X at Y QPS) and write (POST /X at Z QPS). Latency budget is W. Let me draw the shape that serves those."
2. Draw the 3-layer skeleton. "Edge → compute → state. Client hits the LB, LB fans to stateless app tier, app tier reads/writes state. Three arrows, three layers."
3. Place the primary store. "Primary is Postgres (ACID matters for the write path). It's the source of truth."
4. Add the cache on the hot path. "Redis sits in front of Postgres on the read path. 95% hit rate target. The read goes to Redis first, Postgres on a miss, then the result is populated into Redis."
5. Name async side-channels. "Async work — analytics, search indexing, notifications — goes via Kafka. Dashed lines. Fire-and-forget from the write path."
6. Label the edges. "Each edge gets a verb and a protocol. Client → LB over HTTPS. App → Redis via TCP GET. App → Kafka via producer.send()."
7. Call out what you are NOT drawing. "Auth service is off-diagram — JWTs validated in-process. Observability, deploy pipeline, feature flags — control plane, mentioned not drawn."
8. Name the bottleneck proactively. "The pressure point is the cache hit rate. If it drops below 90%, Postgres is on the hook and that is where I'd scale next — more read replicas, then sharding."

Signature phrases

“Edge → compute → state” — Sharp skeleton that sets the shared mental model fast.
“Every edge has a verb” — Forces dataflow over decoration.
“Async is dashed, via a queue” — Visual convention that shows seniority.
“The pressure point is X” — Pre-empts the bottleneck question and demonstrates failure thinking.
“What I am NOT drawing” — Signals discipline instead of omission.
“Stateless compute, state is where decisions live” — Core architecture mantra that frames every subsequent discussion.

Likely follow-ups

“Your design has four services. Why not one monolith?”

Fair question — usually the right answer IS a monolith. Services here are justified because:

1. Distinct scaling profiles. The feed service runs at 100k QPS; the posting service at 500 QPS. Putting them in one binary means overprovisioning compute for the low-traffic path or starving the high-traffic one.
2. Blast radius. A bug in the comment service should not take down the feed read path. Separate processes = separate failures.
3. Team boundaries. Each service has a clear owning team with independent deploy cadence.

But I would start with a monolith if any of those were absent. Premature service split is the more common mistake — the operational cost (dashboards, on-call, distributed tracing, deploys) is real and only pays back when the scaling / blast-radius / team arguments are concrete.

If the interviewer pushes back further, I would collapse to a monolith for the v1 and split only when the specific pain appears.

“Why is auth off the main diagram?”

Because it is not on the critical path of every request. The pattern:

1. Once per session: client hits POST /auth/login → auth service verifies credentials → returns a signed JWT (or opaque session id).
2. Every subsequent request: client sends the token in Authorization: Bearer. App server verifies the signature locally (JWT) or hits a local session cache (Redis). No network call to auth service.
3. Token refresh: periodic background call to auth service to get a new token. Off the critical path.

Drawing auth as a middleman on every request is a common mistake — it implies the auth service is on the hot path, which would be disastrous for latency and availability. The correct architecture decouples auth verification from auth issuance.

The exception is fine-grained authorization (per-resource ACLs that can't fit in a token) — that may need a local cache with a short TTL or an explicit policy service call, but usually cached aggressively.

“You said "Redis cache at 95% hit rate." What happens if Redis goes down?”

Graceful degradation, not total failure. The pattern:

1. App catches Redis connection errors and falls through to Postgres directly.
2. Circuit breaker on the Redis client — after N failures, stop trying Redis for ~30 seconds and go straight to Postgres.
3. Postgres now eats 100% of read traffic (20k QPS instead of 1k). This is where read replicas matter — at least 3 replicas sized to handle full load without the cache.
4. Alert fires: "Redis unavailable, serving from primary." On-call engineer paged.
5. Recovery: Redis comes back, app re-connects, hit rate climbs over a few minutes.

What NOT to do:

Fail the request if Redis is down (terrible — cache should be optional)
Retry Redis indefinitely (cascading failure as app threads block)
Size Postgres only for the cache-hit steady state (instead, size it for the full load minus what you're confident the cache will still catch)

This is the "cache is an optimisation, not a dependency" discipline — essential for any production system.

“Walk me through a single write end-to-end.”

Take "user submits a vote" on the Reddit-style system:

1. Client → Cloudflare (10 ms) — HTTPS POST /v1/votes
2. Cloudflare → ALB (20 ms, anycast to region) — regional ingress
3. ALB → app instance (2 ms) — round-robin, stateless
4. App validates JWT locally (< 1 ms, no network)
5. App validates request body (is vote ±1, are user/post IDs valid)
6. App writes to Postgres primary: INSERT INTO votes (user_id, post_id, value) ON CONFLICT (user_id, post_id) DO UPDATE ... — 5 ms, idempotent
7. App publishes vote.created event to Kafka — 2 ms, fire-and-forget
8. App returns 201 to client — total p50 ~40 ms, p99 ~150 ms

Async consumers (not on the critical path):

Consumer A updates denormalised score on the post (eventual consistency, < 1 s lag)
Consumer B updates hot-feed ranking cache in Redis
Consumer C writes to analytics data warehouse

What if the Kafka publish fails?

Use the outbox pattern: write the event to a vote_events table in the same Postgres transaction as the vote. A separate CDC process publishes the event to Kafka at-least-once. Never lose events even if Kafka is down.

This end-to-end walk shows the interviewer I understand sync vs async, the idempotency story, the failure mode, and the latency budget.

Code examples

typescript · Outbox pattern — atomic "write + publish" without distributed transactions

// Write business data AND the event in the SAME DB transaction.
// A separate relay process (polling, as outlined at the bottom, or CDC) publishes events to Kafka at-least-once.

await db.transaction(async (tx) => {
  // 1. Business write (source of truth)
  await tx.votes.insert({
    user_id: userId,
    post_id: postId,
    value: voteValue,
  });

  // 2. Event written in the SAME transaction
  await tx.outbox.insert({
    event_type: 'vote.created',
    payload: JSON.stringify({ userId, postId, voteValue }),
    created_at: new Date(),
  });
});

// A separate process reads outbox rows and publishes to Kafka:
//   SELECT * FROM outbox WHERE published_at IS NULL ORDER BY id LIMIT 100;
//   for row in batch: kafka.produce(row.payload)
//   UPDATE outbox SET published_at = now() WHERE id IN (...);
//
// Guarantees: event is published iff the business write committed.
// At-least-once delivery; consumers must be idempotent.
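The relay half of the pattern, which the comments above only outline, can be sketched in memory. This is an illustration under stated assumptions: `OutboxRow`, `relayOnce`, and `makeIdempotentConsumer` are hypothetical names, an array stands in for the outbox table, and a callback stands in for the Kafka producer.

```typescript
type OutboxRow = { id: number; payload: string; publishedAt: number | null };

// One relay pass: publish unpublished rows in id order, marking each only after success.
function relayOnce(outbox: OutboxRow[], publish: (row: OutboxRow) => void, batchSize: number): number {
  const batch = outbox
    .filter((row) => row.publishedAt === null)
    .sort((a, b) => a.id - b.id)
    .slice(0, batchSize);
  let sent = 0;
  for (const row of batch) {
    publish(row);                 // if this throws, the row stays unpublished and is retried later
    row.publishedAt = Date.now(); // a crash between these two lines would cause a duplicate publish
    sent += 1;
  }
  return sent;
}

// Consumers deduplicate by event id because delivery is at-least-once.
function makeIdempotentConsumer(apply: (payload: string) => void) {
  const seen = new Set<number>();
  return (eventId: number, payload: string): boolean => {
    if (seen.has(eventId)) return false;
    seen.add(eventId);
    apply(payload);
    return true;
  };
}

const outboxRows: OutboxRow[] = [
  { id: 1, payload: "vote:a", publishedAt: null },
  { id: 2, payload: "vote:b", publishedAt: null },
];
const applied: string[] = [];
const consume = makeIdempotentConsumer((payload) => { applied.push(payload); });

let publishCalls = 0;
const flakyPublish = (row: OutboxRow): void => {
  publishCalls += 1;
  if (publishCalls === 2) throw new Error("broker unavailable"); // simulated outage
  consume(row.id, row.payload);
};

try {
  relayOnce(outboxRows, flakyPublish, 100); // row 1 is sent, row 2 fails
} catch (err) {
  // The next pass picks up whatever is still unpublished.
}
relayOnce(outboxRows, flakyPublish, 100);   // row 2 is sent on retry
console.log(applied);
```

Because rows are marked only after a successful publish, an outage delays events but does not drop them; the consumer-side deduplication absorbs any duplicates.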

typescript · Circuit breaker for graceful cache degradation

class RedisCircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold = 5;
  private readonly cooldownMs = 30_000;

  async get(key: string): Promise<string | null> {
    if (this.isOpen()) return null;  // skip Redis, go to origin

    try {
      const result = await redis.get(key);
      this.failures = 0;  // reset on success
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) {
        this.openedAt = Date.now();
        logger.warn('Redis circuit opened; degrading to origin');
      }
      return null;  // treat as cache miss
    }
  }

  private isOpen(): boolean {
    if (this.openedAt === null) return false;
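    if (Date.now() - this.openedAt > this.cooldownMs) {
      // Cooldown elapsed: close the circuit fully and let requests try Redis again.
      // (A stricter half-open breaker would reopen on the first failed probe.)
      this.openedAt = null;
      this.failures = 0;
      return false;
    }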
    return true;
  }
}

yaml · HLD as a single-file service topology (pseudo-IaC)

# Reddit-style system — top-level topology
edge:
  cdn: cloudflare
  lb: aws-alb
  waf: cloudflare-waf

compute:
  app:
    type: stateless
    replicas: 20-100 (auto-scaled)
    runtime: node.js
  workers:
    type: stateless
    replicas: 10
    consumes: kafka.topics.[vote.created, post.created]

state:
  primary_db:
    type: postgres
    replicas: 1 primary + 3 read replicas
    size: db.r5.4xlarge
  cache:
    type: redis
    mode: cluster
    nodes: 6 (3 primary + 3 replica)
  stream:
    type: kafka
    brokers: 3
    retention: 7d
    topics: [vote.created, post.created, user.created]
  media:
    type: s3
    bucket: reddit-media
    cdn: cloudfront
  search:
    type: elasticsearch
    nodes: 3
    source: cdc-from-postgres

# Flows
sync_read: client -> cdn -> alb -> app -> redis (hit) / postgres-replica (miss)
sync_write: client -> alb -> app -> postgres-primary -> kafka (outbox)
async_fanout: kafka -> workers -> [denormalised-tables, search, analytics]

Common mistakes

Drawing components instead of dataflow

A diagram of 10 boxes without arrows tells the interviewer nothing. The whole point is the flow — draw the arrows first, label them with the verb and the protocol, then let the boxes emerge.

Too many boxes too early

You don't need Kafka, Elasticsearch, and 5 microservices in the v1 design. Start with the simplest thing that handles the stated requirements, then justify every addition with a concrete trigger ("at X QPS we'd add Y").

Hiding async paths inside sync boxes

If email/indexing/fan-out happens asynchronously, it must be visually distinct on the diagram. Candidates who pretend every side-effect is sync end up designing impossible latency budgets.

Drawing the control plane on the data-plane diagram · Advanced

Auth server, deploy system, feature-flag service, secrets manager — all important, all cluttering. Mention them verbally; draw them only if the prompt is specifically about control-plane concerns.

No visible scaling axis · Advanced

A senior HLD shows where the system scales (app tier → horizontal, DB → replicas or shards, cache → cluster) and where it doesn't. If every component is a single box, you've drawn a prototype, not an architecture.

Service boundaries along technology, not domain · Advanced

A "Redis service" or "Kafka service" is not a service boundary — technology is horizontal. Services split along domain boundaries: feed, profile, payment, messaging. Check every service against "what product capability does this own?"

Practice drills

You've drawn the HLD for a chat app. The interviewer asks "where does the message go first?". What's your answer?

Walk the request end-to-end:

1. Client (WebSocket) → WebSocket-terminating LB (sticky by connection)
2. LB → connection-handler service (holds the WS connection open)
3. Connection-handler publishes to a Kafka topic messages.published partitioned by channel_id
4. Fan-out workers consume from Kafka:

- Worker A: writes to message DB (Postgres or Cassandra), the durable store for history queries
- Worker B: looks up channel members, pushes the message to their active connections (via Redis pub/sub cross-cluster)

5. Recipient clients receive the message over their existing WS connection

Key insight: the message hits durable storage IN PARALLEL with fan-out, not before it. Receiver latency is not gated on disk writes.

If they push on durability: "The Kafka log is the system of record. Even if the DB write fails, the message is in Kafka with 7-day retention — we can replay it. The DB is a materialised read model for history queries."

If they push on ordering: "Kafka partitioning by channel_id gives per-channel ordering. Within a channel, messages are delivered in publish order."
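The ordering claim rests on one property: the same key always maps to the same partition. The sketch below demonstrates that with a simple string hash, which is a stand-in and not the hash Kafka's default partitioner uses. `partitionFor`, `publishAll`, and `ChatMessage` are illustrative names.

```typescript
type ChatMessage = { channelId: string; seq: number };

// Deterministic key-to-partition mapping (simple stand-in hash).
function partitionFor(key: string, partitions: number): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0; // keep an unsigned 32-bit value
  }
  return hash % partitions;
}

// Append each message to the partition chosen by its channel id.
function publishAll(messages: ChatMessage[], partitions: number): ChatMessage[][] {
  const log: ChatMessage[][] = [];
  for (let p = 0; p < partitions; p++) log.push([]);
  for (const m of messages) {
    log[partitionFor(m.channelId, partitions)].push(m);
  }
  return log;
}

const chatLog = publishAll(
  [
    { channelId: "general", seq: 1 },
    { channelId: "random", seq: 1 },
    { channelId: "general", seq: 2 },
    { channelId: "random", seq: 2 },
    { channelId: "general", seq: 3 },
  ],
  8,
);

// Every "general" message sits in one partition, in publish order.
const generalPartition = chatLog[partitionFor("general", 8)];
console.log(generalPartition.filter((m) => m.channelId === "general").map((m) => m.seq));
```

Ordering holds only within a channel; two different channels may or may not share a partition, and nothing is guaranteed about their relative order.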

Your interviewer says "the diagram has 6 boxes but no labels on any arrow. What do they do?" What went wrong?

You drew components without dataflow. Fix in 60 seconds:

1. Label each arrow with the verb: GET, SELECT, publish, enqueue, fan-out.
2. Label each arrow with the protocol: HTTP, gRPC, AMQP, WebSocket, TCP.
3. Label at least one arrow with a latency ballpark: "1 ms (intra-AZ)", "30 ms (cross-region)".
4. Distinguish sync from async: solid for sync, dashed for async via a queue.

The interviewer grades understanding of the flow, not knowledge of component names. The boxes are just anchors for the arrows. An HLD with labelled arrows and unlabeled boxes is better than the reverse.

You want to add a Kafka cluster to your v1 HLD. The interviewer asks "why?". What's the shape of a good justification?

Good justification has three parts:

1. Concrete trigger. "At X QPS / Y fan-out consumers / Z durability requirement, the write path would have to fan out synchronously and exceed our 100 ms latency budget."

2. Alternative considered. "Simpler alternatives: direct RPC calls from the writer to each consumer (breaks when a consumer is down), or a single Redis pub/sub (no persistence, no replay). Kafka gives us decoupling, retention, and replay that those don't."

3. Ops acknowledgement. "Kafka is not free — we need a 3-broker cluster, schema registry, monitoring, DLQ strategy. I'd add it at <specific trigger>, not day 1."

The anti-pattern: adding Kafka because it sounds senior. If you can't name the trigger and the alternative, strip it out. "We start without Kafka; we add it when consumer X appears" is the right v1.

Your HLD has 12 services. The interviewer frowns. How do you respond?

Acknowledge and consolidate.

"You're right — 12 services is too many for this prompt. Let me collapse. The real domain boundaries here are:

1. Identity (auth, profile, session)
2. Content (posts, comments, media)
3. Feed (timeline, ranking)
4. Notifications (async fan-out)

That's 4 services. Everything else I drew (search-indexer, analytics-ingestor, media-transcoder) are workers that consume from Kafka, not services with their own APIs. They don't need their own deploy, dashboards, or on-call.

For the v1, I'd even collapse further — Content + Feed could be one service initially, split when the scaling profiles diverge."

The lesson: splitting is a tool, not a virtue. The right number of services is the smallest number that maps cleanly to team boundaries and scaling profiles. Everything else is workers, libraries, or modules inside one service.

Interviewer: "How would this HLD change if we added multi-region requirements?"

Three layers change:

1. Edge layer:

CDN remains (already global)
LB becomes geo-DNS (Route 53 / Cloudflare) routing users to their nearest region
Each region has its own regional LB + app tier

2. Compute layer:

Replicated per region — identical stateless app tier in each
Session state in Redis per region (with cross-region replication for failover)

3. State layer — the hard part:

Primary DB: either single-region primary with cross-region read replicas (if writes tolerate ~100 ms cross-region latency) OR multi-primary with conflict resolution (Cassandra, Spanner, DynamoDB Global Tables)
Cache: independent per region (eventual consistency between regions is usually fine)
Object store: S3 cross-region replication or native multi-region bucket
Kafka / event stream: MirrorMaker 2 replicating topics between regions

Trade-offs to name:

Active-active is 2× the cost but gives latency + availability wins
Cross-region write consistency is the hard problem — either go to a multi-primary store or accept write-region pinning
Data residency rules (GDPR — EU data stays in EU) constrain which regions can hold which data

When to propose multi-region: when the prompt demands sub-100 ms p95 globally, or when regional outage resilience is stated. Don't volunteer multi-region to look senior — it is a 3× complexity multiplier.

Cheat sheet

• Shape: edge → compute → state. Every HLD starts here.
• Draw < 10 boxes on the main diagram. More = lost interviewer.
• Every edge: verb + protocol (e.g., "SELECT over TCP", "publish via AMQP").
• Dashed lines for async, always through a queue, never direct.
• Stateless compute = trivially replicated. State = where hard decisions live.
• Name the bottleneck before asked. "The cache hit rate is the pressure point."
• Justify every extra component with a QPS / storage / team threshold.
• Separate control plane from data plane; mention control plane verbally.
• Auth is off-diagram (JWT validated locally). Draw login flow, not every request.
• Name what you are NOT drawing to signal discipline, not omission.
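
Three of these rules are mechanical enough to check in code: the box budget, labelled edges, and async edges going through a queue. The sketch below treats a diagram as plain data and lints it; the `HldBox`, `HldEdge`, and `HldDiagram` types are hypothetical and exist only to make the rules concrete.

```typescript
interface HldBox {
  id: string;
  kind: "client" | "edge" | "compute" | "store" | "queue";
}

interface HldEdge {
  from: string;
  to: string;
  verb: string;     // e.g. "SELECT", "publish"
  protocol: string; // e.g. "TCP", "AMQP"
  async: boolean;   // drawn as a dashed line
}

interface HldDiagram {
  boxes: HldBox[];
  edges: HldEdge[];
}

// Returns a list of human-readable rule violations (empty when clean).
function lintDiagram(d: HldDiagram): string[] {
  const problems: string[] = [];
  if (d.boxes.length >= 10) {
    problems.push(`too many boxes: ${d.boxes.length} (keep it under 10)`);
  }
  const kindOf = new Map(d.boxes.map((b) => [b.id, b.kind] as const));
  for (const e of d.edges) {
    const label = `${e.from} -> ${e.to}`;
    if (e.verb.trim() === "" || e.protocol.trim() === "") {
      problems.push(`${label}: missing verb or protocol`);
    }
    // An async edge must start or end at a queue, never go direct.
    if (e.async && kindOf.get(e.from) !== "queue" && kindOf.get(e.to) !== "queue") {
      problems.push(`${label}: async edge bypasses a queue`);
    }
  }
  return problems;
}
```

Nobody lints a whiteboard, but running the same three checks in your head before saying "that's the design" catches the most common legibility problems.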

Practice this skill

These problems exercise High-level architecture. Try one now to apply what you just learned.

URL shortener, rate limiter, chat system

Edge-heavy / CDN-first

CDN/edge workers handle most traffic; origin servers only serve the uncacheable tail.

Architecture diagram· Edge-heavy — most traffic served from CDN, origin is small

CDN absorbs cacheable reads at the edge. Edge workers personalise where needed. Origin only sees cache misses + writes — typically <5% of total traffic.

The shape:

Cloudflare / CloudFront / Fastly at the edge, caching aggressively
Edge workers (Cloudflare Workers, Lambda@Edge) for personalisation
Origin tier is small and serves only cache misses + writes
Static assets on S3 / GCS with lifecycle rules

Signals to reach for it:

Geographically distributed users
Content is public or semi-public (cacheable without user-specific data)
Latency budget < 100 ms p95 globally
Origin QPS would be prohibitive without cache
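
The "origin QPS" signal is easy to quantify. The helper below is a back-of-envelope sketch; the function name and the example numbers are assumptions, and it treats every write as uncacheable.

```typescript
// Estimate the load that reaches the origin tier.
// totalQps:      all client requests per second
// writeFraction: share of requests that are writes (always hit origin)
// cacheHitRatio: share of reads answered at the edge
function originQps(totalQps: number, writeFraction: number, cacheHitRatio: number): number {
  const writes = totalQps * writeFraction;
  const reads = totalQps - writes;
  const cacheMisses = reads * (1 - cacheHitRatio);
  return writes + cacheMisses;
}
```

For example, `originQps(100_000, 0.01, 0.97)` comes to about 3,970 QPS: 1,000 writes plus 2,970 read misses, roughly 4% of total traffic. With no cache the origin would have to absorb all 100,000.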

Constraints:

Cache invalidation is the eternal problem
Personalised content complicates things — edge workers or hybrid cache keys
Cost model is different (CDN egress vs origin compute)
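
A hybrid cache key is one way to keep personalised pages cacheable: key on a coarse audience segment instead of the individual user. The sketch below is illustrative; the `EdgeRequest` shape and the choice of locale plus login state as the segment are assumptions, not a CDN API.

```typescript
// Hypothetical view of an incoming request at the edge.
interface EdgeRequest {
  path: string;
  locale: string;    // e.g. "en-GB"
  loggedIn: boolean;
  userId?: string;   // deliberately NOT part of the cache key
}

// Build a cache key from the path plus a coarse segment. All users in
// the same segment share one cached copy; truly per-user fragments
// would be filled in separately (edge worker or client-side fetch).
function cacheKey(req: EdgeRequest): string {
  const audience = req.loggedIn ? "member" : "anon";
  return `${req.path}|${req.locale.toLowerCase()}|${audience}`;
}
```

The trade-off is cardinality: each dimension added to the key multiplies the number of cached variants and lowers the hit rate.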

Pros

+ Sub-50 ms latency globally on cache hits
+ Up to ~10× cost reduction on static-heavy workloads
+ Origin tier is small and simple
+ DDoS protection is native to CDN

Cons

− Cache invalidation is hard
− Personalised content needs edge workers or hybrid caching
− Debugging edge issues is harder (less observability)
− Lock-in to CDN provider features

Choose this variant when

Media / content / e-commerce catalogs
Geographically distributed users
Read-heavy with cacheable content
Latency budget < 100 ms p95 globally

// Write business data AND the event in the SAME DB transaction.
// A separate CDC process publishes events to Kafka at-least-once.
await db.transaction(async (tx) => {
  // 1. Business write (source of truth)
  await tx.votes.insert({
    user_id: userId,
    post_id: postId,
    value: voteValue,
  });

  // 2. Event written in the SAME transaction
  await tx.outbox.insert({
    event_type: 'vote.created',
    payload: JSON.stringify({ userId, postId, voteValue }),
    created_at: new Date(),
  });
});

// A separate process reads outbox rows and publishes to Kafka:
//   SELECT * FROM outbox WHERE published_at IS NULL ORDER BY id LIMIT 100;
//   for row in batch: kafka.produce(row.payload)
//   UPDATE outbox SET published_at = now() WHERE id IN (...);
//
// Guarantees: event is published iff the business write committed.
// At-least-once delivery; consumers must be idempotent.
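
Because the outbox relay delivers at least once, a consumer may see the same event twice. The sketch below shows one way to make a consumer idempotent by remembering processed event ids. The `OutboxEvent` shape and `IdempotentConsumer` class are hypothetical, and the in-memory `Set` stands in for a durable dedup table that a real consumer would need.

```typescript
// Assumed event shape: `id` is the outbox row id, unique per event.
interface OutboxEvent {
  id: number;
  eventType: string;
  payload: string;
}

class IdempotentConsumer {
  // Stand-in for a durable "processed events" table.
  private readonly seen = new Set<number>();
  private readonly apply: (event: OutboxEvent) => void;

  constructor(apply: (event: OutboxEvent) => void) {
    this.apply = apply;
  }

  // Returns true if the event was applied, false if it was a duplicate.
  handle(event: OutboxEvent): boolean {
    if (this.seen.has(event.id)) return false; // redelivery: skip side effects
    this.apply(event);
    this.seen.add(event.id); // recorded only after apply succeeds
    return true;
  }
}
```

In a real system the side effect and the dedup record would be committed in one transaction; otherwise a crash between the two reintroduces duplicates.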

// Assumes `redis` (a client whose get(key) returns Promise<string | null>)
// and `logger` are already in scope; both are placeholders for your own clients.
class RedisCircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold = 5;
  private readonly cooldownMs = 30_000;

  async get(key: string): Promise<string | null> {
    if (this.isOpen()) return null; // skip Redis, go to origin

    try {
      const result = await redis.get(key);
      this.failures = 0; // reset on success
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) {
        this.openedAt = Date.now();
        logger.warn('Redis circuit opened; degrading to origin');
      }
      return null; // treat as cache miss
    }
  }

  private isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (Date.now() - this.openedAt > this.cooldownMs) {
      // Half-open: let requests through again, but keep the failure count one
      // short of the threshold so a single failed probe re-opens the circuit.
      this.openedAt = null;
      this.failures = this.threshold - 1;
      return false;
    }
    return true;
  }
}

# Reddit-style system: top-level topology
edge:
  cdn: cloudflare
  lb: aws-alb
  waf: cloudflare-waf

compute:
  app:
    type: stateless
    replicas: 20-100 (auto-scaled)
    runtime: node.js
  workers:
    type: stateless
    replicas: 10
    consumes: kafka.topics.[vote.created, post.created]

state:
  primary_db:
    type: postgres
    replicas: 1 primary + 3 read replicas
    size: db.r5.4xlarge
  cache:
    type: redis
    mode: cluster
    nodes: 6 (3 primary + 3 replica)
  stream:
    type: kafka
    brokers: 3
    retention: 7d
    topics: [vote.created, post.created, user.created]
  media:
    type: s3
    bucket: reddit-media
    cdn: cloudfront
  search:
    type: elasticsearch
    nodes: 3
    source: cdc-from-postgres

# Flows
sync_read: client -> cdn -> alb -> app -> redis (hit) / postgres-replica (miss)
sync_write: client -> alb -> app -> postgres-primary -> kafka (outbox)
async_fanout: kafka -> workers -> [denormalised-tables, search, analytics]