
Capacity estimation (back-of-envelope)

Deriving QPS, storage, bandwidth, memory from user-facing numbers.

Every downstream decision — cache size, shard count, replica count — collapses onto one question: what are the numbers? Candidates who skip this are designing vibes, not systems.

Read this if your last attempt…

Your reviewer said "where did that number come from?"
You jumped from the prompt straight to architecture in under a minute
You confidently sized a Redis cluster without knowing the working set
You said "use DynamoDB for scale" without naming what scale
Your design had a single box labelled "DB" with no number attached

The concept

Capacity estimation turns a vague user count into concrete numbers: QPS, storage, bandwidth, memory. You have about three minutes in an interview to do it, and whatever numbers you produce will anchor every decision that follows.

What senior candidates do isn't flawless arithmetic. It's that their architecture is traceable back to a handful of inputs they said out loud.

Architecture diagram · The capacity estimation pipeline

DAU feeds every number downstream: QPS → replicas, storage → shards, hot set → cache size, bandwidth → CDN tier.

Peak-to-average multipliers you should default to unless the prompt says otherwise.

Workload type	Peak / avg	Why	Sizing knob
Consumer social (feed, DM)	3–5×	Diurnal + event-driven spikes	Cache + CDN capacity
E-commerce checkout	5–10×	Flash sales, festive campaigns	Queue + async inventory
Enterprise SaaS	2–3×	Business hours concentration	Scheduled autoscaling
IoT / telemetry	1.5–2×	Uniform device distribution	Partition count
Ads / RTB	1.2×	Bounded by upstream traffic	Colocate with inventory
Ride-hailing (driver pings)	2–3×	Rush-hour + weather spikes	Geospatial shard fan-out
Video streaming	3–4×	Prime time + live events	CDN + origin shield
News / virality	50–100×	Hacker-News / TikTok moments	CDN + edge cache + graceful shed

Numbers are defaults for conversation; adjust if the prompt gives you real data.
Always state peak factor out loud — it is a decision, not a constant.
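
As a rough sketch, the table can be kept as a lookup so the peak factor is always an explicit, named input. The dictionary keys and the `peak_qps_range` helper are illustrative names, not part of any library; the ranges are the defaults from the table.

```python
# Default peak-to-average ranges (low, high), copied from the table above.
# Keys are hypothetical labels chosen for this sketch.
PEAK_FACTORS = {
    "consumer_social": (3.0, 5.0),
    "ecommerce_checkout": (5.0, 10.0),
    "enterprise_saas": (2.0, 3.0),
    "iot_telemetry": (1.5, 2.0),
    "ads_rtb": (1.2, 1.2),
    "ride_hailing": (2.0, 3.0),
    "video_streaming": (3.0, 4.0),
    "news_virality": (50.0, 100.0),
}


def peak_qps_range(avg_qps: float, workload: str) -> tuple[float, float]:
    """Return (low, high) peak QPS for a workload using the default multipliers."""
    low, high = PEAK_FACTORS[workload]
    return avg_qps * low, avg_qps * high


if __name__ == "__main__":
    low, high = peak_qps_range(4_000, "consumer_social")
    print(f"avg 4,000 QPS -> peak {low:,.0f} to {high:,.0f} QPS")
```

Returning a range rather than a single number keeps the "it is a decision, not a constant" point visible: you still have to pick a value and say why.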

How interviewers grade this

You state the DAU, an action rate per user, and a read:write ratio — explicitly, within the first 90 seconds.
You convert daily totals to per-second QPS (÷ 86,400) and distinguish average from peak with a named factor.
Your storage math uses row size × row count × retention — not "a lot of data".
Your cache sizing uses a hot-set heuristic (80/20) on top of working-set size, and you quote a cache hit rate you can defend.
Your QPS number directly drives replica count, shard count, or cache tier later in the design.
You name one sizing threshold where your design would change shape — "if DAU crosses X, we would need Y".

Variants

QPS-first sizing (read-heavy services)

Start from peak read QPS; cache, replicas, and regions fall out.

Best when the prompt is dominated by reads — feeds, product catalogs, search, redirects.

Order of operations

1. DAU × reads/user/day ÷ 86,400 → avg read QPS
2. × peak factor (3–5×) → peak read QPS
3. × (1 − cache hit rate) → origin QPS
4. Origin QPS ÷ per-replica capacity → replica count
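
The four steps above can be sketched as one function. This is a minimal illustration, not a sizing tool: `replicas_for_reads` and its parameter names are made up for the example, and the per-replica capacity is an assumption you must state yourself.

```python
import math

SECONDS_PER_DAY = 86_400


def replicas_for_reads(dau, reads_per_user_per_day, peak_factor,
                       cache_hit_rate, per_replica_qps):
    """Walk the QPS-first pipeline: avg QPS -> peak QPS -> origin QPS -> replicas."""
    avg_qps = dau * reads_per_user_per_day / SECONDS_PER_DAY
    peak_qps = avg_qps * peak_factor
    origin_qps = peak_qps * (1 - cache_hit_rate)
    # Round up, and never report fewer than one replica.
    replicas = max(1, math.ceil(origin_qps / per_replica_qps))
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "origin_qps": origin_qps,
        "replicas": replicas,
    }


if __name__ == "__main__":
    # Assumed inputs: 30M DAU, 10 reads/user/day, 4x peak, 95% hit rate,
    # ~2K QPS per replica.
    print(replicas_for_reads(30_000_000, 10, 4, 0.95, 2_000))
```

Re-running it with `cache_hit_rate=0.0` shows how much of the replica count is really a cache assumption in disguise.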

Anchoring numbers

Single Postgres primary: ~5K QPS of mixed OLTP, ~20K of simple key-lookup
Single Redis instance: ~100K GET ops/sec
Single Nginx: ~50K RPS of static + short-lived connections

Stating these up front earns credibility. If your numbers land well inside these envelopes, the interviewer rarely pushes on "is this too much for one node?"

Pros

+ Matches the shape of 80% of interview prompts
+ Drives cache-sizing and replica-count conversations naturally
+ Easy to trace every architecture box back to a number

Cons

− Buries storage growth; you still have to do that separately
− Writes sometimes get under-sized; re-check after the main pass

Choose this variant when

Social feed, e-commerce product pages, URL shorteners, news sites
Write QPS is <10% of read QPS and obviously small

Storage-first sizing (durable-data services)

Start from storage growth; sharding threshold and retention drive everything.

Best when the prompt is dominated by durable state — chat history, metrics, logs, financial ledgers.

Order of operations

1. Events/day × payload size = bytes/day
2. × retention (years) × 365 = total bytes
3. ÷ per-node capacity → shard count
4. Peak writes ÷ per-shard write QPS → shard count (take the larger)
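
A small sketch of the same steps, taking the larger of the storage-driven and write-driven shard counts. The function name and parameters are hypothetical; node capacity and per-shard write QPS are assumptions to state out loud.

```python
import math


def shard_count(events_per_day, payload_bytes, retention_years,
                node_capacity_bytes, peak_writes_per_sec, per_shard_write_qps):
    """Return (shards_by_storage, shards_by_writes, shards_needed)."""
    total_bytes = events_per_day * payload_bytes * 365 * retention_years
    by_storage = math.ceil(total_bytes / node_capacity_bytes)
    by_writes = math.ceil(peak_writes_per_sec / per_shard_write_qps)
    # Whichever constraint is tighter decides the shard count.
    return by_storage, by_writes, max(1, by_storage, by_writes)


if __name__ == "__main__":
    # Assumed inputs: 1B events/day, 200 B each, 2 years retention,
    # 2 TB usable per node, 35K peak writes/sec, 10K writes/sec per shard.
    print(shard_count(1_000_000_000, 200, 2, 2 * 10**12, 35_000, 10_000))
```

The sketch ignores compression, TTL, and replication factor, so treat its output as an upper-level estimate to be adjusted, not a final node count.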

Anchoring numbers

Single SSD-backed node: 2–10 TB usable after indexes and overhead
Cassandra/Dynamo per-node sweet spot: 1–3 TB
Kafka broker: 10–20 TB with retention

When storage crosses ~5 TB for an OLTP workload or QPS crosses ~2K writes/sec, name sharding explicitly and pick a shard key aligned to the dominant access pattern.

Pros

+ Surfaces the "do we need sharding?" decision early
+ Right default for metrics, logging, messaging, ledgers

Cons

− Can overstate node count if you forget compression and TTL
− Under-sizes CPU for hot-shard scenarios; revisit with QPS

Choose this variant when

Chat, notifications, metrics ingestion, financial events
Retention is long (months to years) and every event must be stored

Write-heavy sizing

Writes dominate; buffer them, batch them, and separate the write path.

Best when writes > 10% of total traffic or approach 1K writes/sec at peak.

Order of operations

1. Write events/sec × payload = write bandwidth
2. Decide: durable synchronously vs async via queue?
3. If sync: pick a store that handles peak writes (Cassandra LSM, DynamoDB, sharded Postgres)
4. If async: size the queue for peak × buffer duration

Anchoring numbers

Kafka partition: 10K events/sec write, 100K read
Cassandra node: 10–50K writes/sec
Sharded Postgres shard: 2–5K writes/sec

Distinguish write amplification: one user action often produces multiple downstream writes (audit log, search index, analytics event). Multiply by the fan-out before sizing.
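
To make the fan-out multiplication concrete, here is a hedged sketch that applies it before computing write bandwidth and queue buffer size. `size_write_path` is an illustrative helper; the 10K writes/sec per Kafka partition figure is the anchoring number from this section, not a guarantee.

```python
import math

KAFKA_PARTITION_WRITES_PER_SEC = 10_000  # anchoring number from the text


def size_write_path(user_actions_per_sec, fan_out, payload_bytes, buffer_seconds):
    """Size the write path at peak, after applying write amplification."""
    writes_per_sec = user_actions_per_sec * fan_out
    bandwidth = writes_per_sec * payload_bytes
    return {
        "writes_per_sec": writes_per_sec,
        "write_bandwidth_bytes_per_sec": bandwidth,
        # Async path: the queue must hold peak traffic for the buffer duration.
        "queue_buffer_bytes": bandwidth * buffer_seconds,
        "kafka_partitions": math.ceil(writes_per_sec / KAFKA_PARTITION_WRITES_PER_SEC),
    }


if __name__ == "__main__":
    # Assumed inputs: 3K user actions/sec at peak, 5 downstream writes each,
    # 1 KB payloads, 10 minutes of buffering.
    print(size_write_path(3_000, 5, 1_000, 600))
```

With these assumed inputs, 3K user actions/sec becomes 15K writes/sec, which is the number the store or queue actually has to absorb.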

Pros

+ Right mental model for IoT, telemetry, chat, payments
+ Forces you to name queueing vs direct-write trade-off

Cons

− Harder to map to interview defaults (Postgres + Redis)
− Adds ops complexity the interviewer may push back on

Choose this variant when

Write QPS > 1K sustained, or > 5K at peak
Fan-out per user action > 3 downstream writes

Burst / flash-sale sizing

Size for the minute-long spike, not the daily average.

Best when traffic is dominated by scheduled or unscheduled spikes — flash sales, concert tickets, sports scores, viral moments.

Order of operations

1. Pick the worst-case minute, not the day
2. Expected spike users × actions/user/minute ÷ 60 → peak per-second QPS
3. Multiply by 2× safety factor (humans retry when things are slow)
4. Decide: absorb with autoscaling, queue, or precomputation?
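
A minimal sketch of steps 2 and 3, plus one extra helper that is an assumption of this example rather than something stated in the steps: it estimates how many requests pile up while new capacity is still booting, which is the number a queue or admission controller would have to handle.

```python
def burst_peak_qps(spike_users, actions_per_user_per_minute, safety_factor=2.0):
    """Peak per-second QPS for the worst-case minute, with a retry safety factor."""
    return spike_users * actions_per_user_per_minute / 60 * safety_factor


def backlog_during_boot(peak_qps, provisioned_qps, boot_seconds=90):
    """Requests that arrive above provisioned capacity while new machines boot.

    Illustrative assumption: the spike arrives at full rate immediately and
    no extra capacity exists until boot_seconds have elapsed.
    """
    return max(0.0, peak_qps - provisioned_qps) * boot_seconds


if __name__ == "__main__":
    peak = burst_peak_qps(600_000, 3)            # assumed spike: 600K users, 3 actions/min
    print(peak, backlog_during_boot(peak, 20_000, boot_seconds=60))
```

If the backlog is larger than what you are willing to queue or shed, autoscaling alone is not the answer for that spike.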

Anchoring numbers

Flash sale: 100× baseline QPS for 5–15 minutes
World-Cup goal: 50× spike on sports apps for 30 seconds
Concert on-sale: 20–50× spike, sustained 10 minutes

Reject pure autoscaling for multi-second spikes — new machines take 30–90 seconds to boot. Pre-warm, queue, or shed load.

Pros

+ Prevents the "we autoscale" hand-wave from surviving challenge
+ Maps directly to queue sizing and admission control

Cons

− Overstates baseline cost if you forget to separate spike from steady

Choose this variant when

Prompt mentions scheduled events, sales, or viral content
p99.9 matters (payments, voting, booking)

Back-of-envelope-only (time-pressured rough pass)

Three numbers, sixty seconds — just enough to earn the right to keep talking.

Use this when the interviewer explicitly says "don't go deep on numbers, let's see the architecture first." The goal is to establish the scale tier, not to engineer a final answer.

The three numbers

1. Order-of-magnitude QPS: 100 / 1K / 10K / 100K / 1M. Just the bucket.
2. Order-of-magnitude storage: GB / TB / PB.
3. Read:write ratio: 1:1, 10:1, 100:1, 1000:1.

State them in ~20 seconds, then move on to architecture. Revisit for numerical precision only when a decision hinges on it (sharding threshold, cache capacity). Interviewers respect candidates who know when to stop computing.
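
If it helps to see the bucketing mechanically, here is a small sketch that snaps precise numbers to the three coarse buckets. The helper names are invented for this example, and rounding to the nearest power of ten on a log scale is one reasonable convention, not the only one.

```python
import math


def qps_bucket(qps: float) -> int:
    """Snap QPS to the nearest power of ten (on a log scale)."""
    if qps <= 0:
        raise ValueError("qps must be positive")
    return 10 ** round(math.log10(qps))


def storage_bucket(num_bytes: float) -> str:
    """Name the storage tier: GB, TB, or PB (decimal units)."""
    if num_bytes < 10**12:
        return "GB"
    if num_bytes < 10**15:
        return "TB"
    return "PB"


def read_write_bucket(ratio: float) -> str:
    """Snap a read:write ratio to 1:1, 10:1, 100:1, or 1000:1."""
    exponent = min(3, max(0, round(math.log10(ratio))))
    return f"{10 ** exponent}:1"


if __name__ == "__main__":
    print(qps_bucket(3_800), storage_bucket(3 * 10**12), read_write_bucket(100))
```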

Pros

+ Keeps time on your side in a 45-minute round
+ Signals you know which decisions need precision and which do not

Cons

− Will not survive a "push on the numbers" probe; have the full pass ready

Choose this variant when

Interviewer explicitly de-emphasizes math
Round is < 45 minutes and you need time for deep dive

Worked example

Scenario: Design a URL shortener. 100M URLs/month, consumer read-heavy.

Step 1 — inputs stated out loud

DAU: ~33M (1/3 of an assumed 100M MAU)
New URLs per day: 100M / 30 = ~3.3M writes/day
Reads per URL: 100:1 lifetime ratio, peaks early then tails off
Reads per day: 3.3M × 100 = ~330M reads/day (steady-state active link set)
Retention: 5 years active, indefinite archive

Step 2 — QPS

Writes: 3.3M / 86,400 = ~40 writes/sec avg, × 5 peak = ~200 writes/sec
Reads: 330M / 86,400 = ~3,800 reads/sec avg, × 5 peak = ~20,000 reads/sec
Cache target hit rate: 95% (redirect payload is tiny and heavily reused)
Origin read QPS: 20,000 × 5% = 1,000 reads/sec

Step 3 — storage

Row size: ~500 B (short code, long URL, user id, created_at, TTL, click count)
Rows over 5 years: 100M × 12 × 5 = 6 billion rows
Total storage: 6B × 500 B = 3 TB
Fits comfortably on one primary plus replicas; sharding NOT required yet

Step 4 — cache

Hot set (80/20): 20% × 6B rows = 1.2B rows
But redirect payload we actually cache is ~200 B (short → long mapping only)
Cache memory: 1.2B × 200 B = 240 GB
One Redis cluster with 3×128 GB shards handles this with headroom

Step 5 — bandwidth

Redirect response: ~500 B
20,000 reads/sec × 500 B = 10 MB/sec origin egress
Trivially fits a single region; no CDN strictly required but nice to have for RTT

Step 6 — sanity check and future-cast

Total design: 1 Postgres primary + 2 read replicas, 1 Redis cluster (3 shards), 1 LB tier
Bottleneck: cache hit rate. If it drops to 80%, origin QPS jumps to 4,000 — still under replica capacity but uncomfortable
At 10× scale (30M/day writes, 200K read peak): shard Postgres by short-code prefix, 3–5 shards; Redis cluster goes to 6 shards; introduce CDN for geographic RTT
At 100× scale: DynamoDB or Cassandra replaces Postgres; CDN absorbs 80% of reads before they hit origin

The whole pass took three minutes. Every architecture choice downstream is now defensible against "why that shape?"
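
For anyone who wants to check the arithmetic, the sketch below recomputes the worked example from its stated inputs. The function and key names are illustrative. Because it does not round at each step, the outputs land slightly below the rounded figures in the text (for example, about 19,300 peak reads/sec rather than 20,000).

```python
SECONDS_PER_DAY = 86_400


def url_shortener_estimate(cache_hit_rate: float = 0.95) -> dict:
    """Recompute the URL shortener numbers from the inputs stated in Step 1."""
    urls_per_month = 100_000_000   # from the prompt
    reads_per_url = 100            # 100:1 lifetime read:write ratio
    peak_factor = 5
    row_bytes = 500                # full row in the primary store
    cached_entry_bytes = 200       # short -> long mapping only
    response_bytes = 500
    retention_years = 5

    writes_per_day = urls_per_month / 30
    reads_per_day = writes_per_day * reads_per_url
    peak_write_qps = writes_per_day / SECONDS_PER_DAY * peak_factor
    peak_read_qps = reads_per_day / SECONDS_PER_DAY * peak_factor

    total_rows = urls_per_month * 12 * retention_years
    hot_rows = total_rows // 5     # 80/20 heuristic: cache the hottest 20%

    return {
        "peak_write_qps": peak_write_qps,
        "peak_read_qps": peak_read_qps,
        "origin_read_qps": peak_read_qps * (1 - cache_hit_rate),
        "total_rows": total_rows,
        "storage_bytes": total_rows * row_bytes,
        "cache_bytes": hot_rows * cached_entry_bytes,
        "egress_bytes_per_sec": peak_read_qps * response_bytes,
    }


if __name__ == "__main__":
    for hit_rate in (0.95, 0.80):
        est = url_shortener_estimate(hit_rate)
        print(hit_rate, round(est["origin_read_qps"]))
```

Calling it with a lower hit rate is a quick way to rehearse the "what if the cache degrades?" follow-up.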

Good vs bad answer

Interviewer probe

“What QPS are we designing for?”

Weak answer

"A lot — millions of users. So we'll definitely need sharding, probably a cache cluster, maybe Kafka for writes. Let's use DynamoDB to be safe on scale."

Strong answer

"Assume 100M MAU, one-third DAU, so ~33M. At 10 actions per user per day that's 330M requests/day, ~3.8K RPS average, ~15K peak at a 4× factor. Writes are 10% of that, so ~380 RPS average and ~1.5K RPS at peak. One Postgres primary handles 1.5K writes/sec, though that is close to the ~2K writes/sec point where I'd shard; reads go through Redis targeting a 95% hit rate, so origin read QPS is ~700 (5% of ~13.5K peak reads). That fits two read replicas. No sharding yet; we'd revisit once peak writes approach ~2K/sec, which is well before 10× DAU."

Why it wins: The strong answer ends with a concrete sizing decision the rest of the interview can be tested against. The weak answer sized nothing; it named technologies without naming a number.

Interview playbook · 2–3 min (front-loaded, before architecture)

When it comes up

Within the first 5 minutes of any design round — as soon as scope is agreed
After the interviewer asks "how many users?" or "what scale?"
Before you draw your first architecture box
Whenever you reach for a technology (cache, queue, shard) and need to justify the shape

Order of reveal

1. Anchor on DAU. "Let's assume ~X million DAU, pulling from MAU if given, otherwise applying a 1/3 ratio for consumer products."
2. Name actions per user per day. "A typical user does Y actions per day (feeds refreshed, messages sent, searches), so that gives us X·Y daily requests."
3. Convert to per-second + peak. "Dividing by 86,400 seconds gives us average QPS; applying a 4× peak factor gives us design QPS of Z."
4. State read:write ratio. "This is a read-heavy / write-heavy / balanced system at ~N:1. That means reads dominate sizing; we should expect to cache aggressively."
5. Compute storage + cache. "Row size times rows per day times retention gives us total storage. Applying 80/20 to get the hot set gives us cache memory."
6. Name the sharding threshold. "We're under the single-primary threshold today. If QPS crosses ~2K writes/sec or storage crosses ~5 TB, we'd shard by <key>."
7. Pause and invite challenge. "Does this scale assumption match what you have in mind, or should I adjust it?"

Signature phrases

“Let's anchor on numbers before architecture” — Signals you know capacity drives design shape, not the reverse.
“I'll apply a 4× peak factor for consumer traffic” — Names the multiplier explicitly rather than hand-waving "scale".
“One primary handles ~5K OLTP QPS; we're well under that” — Anchors against a known single-node ceiling to justify not sharding.
“The 80/20 hot set fits in ~240 GB of cache” — Shows you can translate storage into cache sizing mechanically.
“At 10× DAU, we'd shard by user_id” — Future-casts a threshold instead of pretending scale is infinite.

Likely follow-ups

“What if traffic is 10× what you assumed?”

Step through the cascade without hand-waving:

Cache tier first: hot set grows 10×; add shards or tier to a second-level cache
Read replicas next: if cache hit rate stays put, origin read QPS grows 10× — replicate
Then primary: if write QPS crosses ~2K/sec, shard by the dominant access-path key
Then region: if bandwidth crosses ~1 Gbps regional egress, add a second region

Name each threshold. "Replication breaks at ~2K writes/sec sustained" is what senior sounds like.
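
The cascade can be sketched as a sequence of threshold checks evaluated in the order given above. Everything here is illustrative: the function name, the tier labels, and the use of total replica capacity as a single number are assumptions; the ~2K writes/sec and ~1 Gbps limits are the thresholds quoted in this answer.

```python
def scaling_cascade(hot_set_bytes, cache_capacity_bytes, origin_read_qps,
                    replica_capacity_qps, write_qps, egress_bits_per_sec,
                    growth=10, write_limit_qps=2_000,
                    egress_limit_bps=1_000_000_000):
    """Return, in cascade order, the tiers that need work after `growth`x traffic."""
    needs_work = []
    if hot_set_bytes * growth > cache_capacity_bytes:
        needs_work.append("cache")            # add shards or a second-level cache
    if origin_read_qps * growth > replica_capacity_qps:
        needs_work.append("read_replicas")    # assumes hit rate stays put
    if write_qps * growth >= write_limit_qps:
        needs_work.append("primary")          # shard by the dominant key
    if egress_bits_per_sec * growth >= egress_limit_bps:
        needs_work.append("region")           # add a second region
    return needs_work


if __name__ == "__main__":
    # Assumed starting point: 240 GB hot set in a 384 GB cache, 1K origin reads/sec
    # against 4K QPS of replica capacity, 200 writes/sec, 80 Mbps egress.
    print(scaling_cascade(240 * 10**9, 384 * 10**9, 1_000, 4_000, 200, 80_000_000))
```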

“Why did you pick a 4× peak factor?”

Because consumer traffic is diurnal plus event-driven. Empirically, peak-to-average ratios land 3–5× for social products (measured at Twitter, Meta, LinkedIn). For flash-sale / scheduled-event systems it's 10×+. I'd tighten the number if the prompt mentions a specific event (World Cup, ticket drop) or loosen it for enterprise workloads concentrated in business hours.

“How did you get the 95% cache hit rate?”

It's a defensible default for read-heavy systems with temporal locality. Twitter timeline hits ~95%, CDN static assets hit 98%+, URL shorteners often hit 99%+ because the same links trend. I'd revisit if the access pattern were uniform (fresh IDs, no repeats) — there 30–50% is more realistic, and the cache story changes.

“Your storage is 3 TB — why not shard?”

A single Postgres primary with read replicas comfortably serves 5–10 TB with modern NVMe SSDs. Sharding adds cross-shard join pain, rebalancing ops, and client-library complexity. The rule I use: shard when a single primary can no longer meet either storage or write-QPS SLAs. We're well under both. I'd shard when write QPS > ~2K/sec sustained or data > ~5–10 TB with strict p99 latency requirements.

Code examples

The 5-line capacity template you keep in your head

DAU × actions_per_user_per_day / 86,400        → avg QPS
avg QPS × peak_factor (3–5×)                   → peak QPS
peak QPS × (1 − cache_hit_rate)                → origin QPS
rows_per_day × row_size × retention_days       → total storage
0.2 × total storage                            → hot cache working set

Common mistakes

Averaging everything and ignoring peak

Real traffic is spiky. A system that "handles 4K QPS on average" dies at the first spike. Multiply avg by 3–5× for consumer workloads and state the factor out loud. Signature fix: always write avg and peak side-by-side.

Assuming every read hits the DB

Without a cache hit ratio, your replica count balloons. Even a naive cache-aside with 80% hit rate cuts origin load 5×. Bake in a hit rate — and defend it against a "what if the cache is cold?" follow-up.

Steady-state storage for a 5-year capacity plan

A 200 GB "current" database grows to 3–10 TB over a product horizon. Always extrapolate. It changes whether you need sharding. Show the math: rows/day × row size × retention.

Inventing unrealistic user counts (Advanced)

Claiming "1 billion DAU" for a niche B2B product is a tell. Anchor to a peer company (Instagram ≈ 2B MAU, Spotify ≈ 600M MAU, Uber ≈ 150M MAU) or state you are using the prompt's hint.

Conflating MAU and DAU

MAU is a vanity metric; DAU drives load. Default ratios: consumer social ~1/3, messaging ~1/2, SaaS ~1/5. State the ratio so the interviewer can push back.
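
A tiny sketch of the conversion, using the default ratios above. The dictionary keys and `dau_from_mau` are hypothetical names for this example; passing the product type explicitly forces you to name which ratio you are applying.

```python
# Default DAU/MAU ratios from the text; keys are illustrative labels.
DAU_PER_MAU = {
    "consumer_social": 1 / 3,
    "messaging": 1 / 2,
    "saas": 1 / 5,
}


def dau_from_mau(mau: float, product_type: str) -> float:
    """Convert MAU to an assumed DAU using the default ratio for the product type."""
    return mau * DAU_PER_MAU[product_type]


if __name__ == "__main__":
    print(round(dau_from_mau(100_000_000, "consumer_social")))
```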

Forgetting write amplification (Advanced)

One user-visible action ("post a tweet") can produce 5–20 downstream writes: fan-out, search index, analytics, audit log, push notification. Multiply before sizing the write path.

Practice drills

A video platform says "1B users worldwide". What DAU do you use for capacity math?

Take 1B MAU and apply the MAU→DAU ratio. For consumer video (YouTube, TikTok-style), that's ~1/3, so ~330M DAU. If you can't remember the ratio, state your assumption: "I'll use 1/3 DAU/MAU for consumer video; tell me if that's off." Avoid designing for 1B DAU unless the prompt explicitly says so.

You estimate 10K QPS read and 200 QPS write. The interviewer asks: do you need sharding?

Probably not. A single Postgres primary handles 200 writes/sec trivially; replicas + a cache absorb 10K reads. Justify: "200 writes/sec sustained × 5 years is ~32B rows, which stays under ~5 TB only if rows average ~150 B, so I'd confirm row size; cache absorbs 95% of reads so origin is ~500 QPS. Single primary + 2 read replicas + cache is enough. I'd reach for sharding when writes exceed ~2K QPS or data exceeds ~5–10 TB."

Interviewer: "What if DAU goes 10×?" What changes in your design first?

Cascade in order:

1. Cache / CDN bill first: hot set grows 10×; add shards or a second-level cache.
2. Read replicas next: if cache hit rate holds, origin reads grow 10×; replicate.
3. Writes last: they become the bottleneck when they cross primary capacity (~2K QPS sustained), at which point shard by the dominant key.

Writes usually have the most headroom because they're a small fraction of traffic. The right answer names the order, not just the endpoint.

Your peak read QPS is 50,000 and your cache hit rate is 90%. How many read replicas do you need?

Origin read QPS = 50,000 × 10% = 5,000 QPS. A Postgres read replica can serve ~2–5K mixed OLTP QPS depending on query complexity, so you need 1–3 replicas to carry the load: 3 at ~2K QPS each (5,000 ÷ 2,000 = 2.5, rounded up), 1 at ~5K. State your per-replica capacity assumption explicitly ("I'm assuming ~2K QPS per replica for this workload") and add at least one extra replica for N+1 availability.

You're designing a ride-sharing dispatcher. Drivers ping location every 4 seconds while online. 1M concurrent drivers. What's your write QPS?

Write QPS = 1M / 4 = 250K writes/sec. That's huge. A single OLTP primary cannot do this; neither can a naive sharded Postgres. The honest answer: "I'd either batch pings so each write carries several of them (4 pings per write lowers write operations 4× to ~62K/sec and still meets freshness) or stream through Kafka and store in Cassandra / a time-series DB. OLTP is the wrong tool here." Sizing correctly here is the whole interview.

Cheat sheet

• 1 day = 86,400 seconds. Memorise it. Round to 100K for speed.
• Peak:avg ratios: consumer 3–5×, enterprise 2–3×, flash-sale 5–10×, virality 50–100×.
• 1 KB × 1 million rows = 1 GB. 1 KB × 1 billion = 1 TB. Keep this mental unit.
• 80/20 rule: 20% of entities drive 80% of traffic. Size hot cache on 20%.
• Default cache hit rate: 95% for read-heavy with temporal locality; 50% for uniform.
• Single-primary ceiling: ~5K OLTP QPS, ~5–10 TB durable; shard above.
• MAU → DAU: consumer social 1/3, messaging 1/2, SaaS 1/5.
• If your QPS < 10K and storage < 5 TB, you likely don't need sharding. Say so.
• Always state DAU, actions/user/day, and read:write ratio. These three anchor everything.
• Every box on the diagram should be traceable to one of your numbers.
