Capacity estimation (back-of-envelope)
Deriving QPS, storage, bandwidth, memory from user-facing numbers.
Every downstream decision — cache size, shard count, replica count — collapses onto one question: what are the numbers? Candidates who skip this are designing vibes, not systems.
Read this if your last attempt…
- Your reviewer said "where did that number come from?"
- You jumped from the prompt straight to architecture in under a minute
- You confidently sized a Redis cluster without knowing the working set
- You said "use DynamoDB for scale" without naming what scale
- Your design had a single box labelled "DB" with no number attached
The concept
Capacity estimation turns a vague user count into concrete numbers: QPS, storage, bandwidth, memory. You have about three minutes in an interview to do it, and whatever numbers you produce will anchor every decision that follows.
What distinguishes senior candidates isn't flawless arithmetic; it's that their architecture traces back to a handful of inputs they said out loud.
DAU feeds every number downstream: QPS → replicas, storage → shards, hot set → cache size, bandwidth → CDN tier.
Peak-to-average multipliers you should default to unless the prompt says otherwise.
| Workload type | Peak / avg | Why | Sizing knob |
|---|---|---|---|
| Consumer social (feed, DM) | 3–5× | Diurnal + event-driven spikes | Cache + CDN capacity |
| E-commerce checkout | 5–10× | Flash sales, festive campaigns | Queue + async inventory |
| Enterprise SaaS | 2–3× | Business hours concentration | Scheduled autoscaling |
| IoT / telemetry | 1.5–2× | Uniform device distribution | Partition count |
| Ads / RTB | 1.2× | Bounded by upstream traffic | Colocate with inventory |
| Ride-hailing (driver pings) | 2–3× | Rush-hour + weather spikes | Geospatial shard fan-out |
| Video streaming | 3–4× | Prime time + live events | CDN + origin shield |
| News / virality | 50–100× | Hacker-News / TikTok moments | CDN + edge cache + graceful shed |
- Numbers are defaults for conversation; adjust if the prompt gives you real data.
- Always state peak factor out loud — it is a decision, not a constant.
How interviewers grade this
- You state the DAU, an action rate per user, and a read:write ratio — explicitly, within the first 90 seconds.
- You convert daily totals to per-second QPS (÷ 86,400) and distinguish average from peak with a named factor.
- Your storage math uses row size × row count × retention — not "a lot of data".
- Your cache sizing uses a hot-set heuristic (80/20) on top of working-set size, and you quote a cache hit rate you can defend.
- Your QPS number directly drives replica count, shard count, or cache tier later in the design.
- You name one sizing threshold where your design would change shape — "if DAU crosses X, we would need Y".
Variants
QPS-first sizing (read-heavy services)
Start from peak read QPS; cache, replicas, and regions fall out.
Best when the prompt is dominated by reads — feeds, product catalogs, search, redirects.
Order of operations
1. DAU × reads/user/day ÷ 86,400 → avg read QPS
2. × peak factor (3–5×) → peak read QPS
3. × (1 − cache hit rate) → origin QPS
4. Origin QPS ÷ per-replica capacity → replica count
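The four steps above can be sketched as one small function. This is a minimal illustration rather than a sizing tool: the name `read_path_sizing`, its parameters, and the sample inputs (33M DAU, 10 reads per user per day, 2,000 QPS per replica) are assumptions picked only to exercise the arithmetic.

```python
import math

SECONDS_PER_DAY = 86_400


def read_path_sizing(dau, reads_per_user_per_day, peak_factor,
                     cache_hit_rate, per_replica_qps):
    """Walk the four QPS-first steps and return every intermediate number."""
    # Step 1: daily reads spread evenly over the day -> average read QPS
    avg_qps = dau * reads_per_user_per_day / SECONDS_PER_DAY
    # Step 2: apply the stated peak factor -> peak read QPS
    peak_qps = avg_qps * peak_factor
    # Step 3: only cache misses reach the database -> origin QPS
    origin_qps = peak_qps * (1 - cache_hit_rate)
    # Step 4: round up, and never report fewer than one replica
    replicas = max(1, math.ceil(origin_qps / per_replica_qps))
    return {
        "avg_qps": avg_qps,
        "peak_qps": peak_qps,
        "origin_qps": origin_qps,
        "replicas": replicas,
    }


# Assumed inputs: 33M DAU, 10 reads/user/day, 5x peak, 95% hit rate,
# and a deliberately conservative 2,000 QPS per replica.
sizing = read_path_sizing(33_000_000, 10, 5, 0.95, 2_000)
for name, value in sizing.items():
    print(f"{name}: {value:,.0f}")
```

Returning the intermediate values, not just the replica count, mirrors the habit of saying each number out loud so the interviewer can challenge any single step.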
Anchoring numbers
- Single Postgres primary: ~5K QPS of mixed OLTP, ~20K of simple key-lookup
- Single Redis instance: ~100K GET ops/sec
- Single Nginx: ~50K RPS of static + short-lived connections
Stating these up front earns credibility. If your numbers land well inside these envelopes, the interviewer rarely pushes on "is this too much for one node?"
Pros
- Matches the shape of 80% of interview prompts
- Drives cache-sizing and replica-count conversations naturally
- Easy to trace every architecture box back to a number
Cons
- Buries storage growth; you still have to do that separately
- Writes sometimes get under-sized; re-check after the main pass
Choose this variant when
- Social feed, e-commerce product pages, URL shorteners, news sites
- Write QPS is <10% of read QPS and obviously small
Storage-first sizing (durable-data services)
Start from storage growth; sharding threshold and retention drive everything.
Best when the prompt is dominated by durable state — chat history, metrics, logs, financial ledgers.
Order of operations
1. Events/day × payload size = bytes/day
2. × retention (years) × 365 = total bytes
3. ÷ per-node capacity → shard count
4. Peak writes ÷ per-shard write QPS → shard count (take the larger)
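A minimal sketch of the storage-first pass, assuming a hypothetical chat workload: 1 billion events/day, 1,000-byte payloads, 2-year retention, 2 TB per node, and 10K writes/sec per shard. The function name `shard_count` and every input are illustrative assumptions, not figures from a real system.

```python
import math

TB = 10**12  # decimal terabyte: 1 KB x 1 billion rows = 1 TB


def shard_count(events_per_day, payload_bytes, retention_years,
                node_capacity_bytes, peak_write_qps, per_shard_write_qps):
    """Return total bytes plus the shard count implied by storage and by writes."""
    # Steps 1-2: bytes per day, then total bytes over the retention window
    bytes_per_day = events_per_day * payload_bytes
    total_bytes = bytes_per_day * 365 * retention_years
    # Step 3: shards needed just to hold the data
    shards_for_storage = math.ceil(total_bytes / node_capacity_bytes)
    # Step 4: shards needed just to absorb peak writes
    shards_for_writes = math.ceil(peak_write_qps / per_shard_write_qps)
    # The binding constraint is whichever count is larger
    shards = max(shards_for_storage, shards_for_writes)
    return total_bytes, shards_for_storage, shards_for_writes, shards


# Assumed inputs (hypothetical chat workload, see lead-in)
total_bytes, by_storage, by_writes, shards = shard_count(
    events_per_day=1_000_000_000, payload_bytes=1_000, retention_years=2,
    node_capacity_bytes=2 * TB, peak_write_qps=35_000,
    per_shard_write_qps=10_000)
print(f"total: {total_bytes / TB:.0f} TB")
print(f"shards by storage: {by_storage}, by writes: {by_writes}, use: {shards}")
```

With these inputs storage, not write throughput, is the binding constraint by a wide margin, which is the situation this variant is meant to surface early.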
Anchoring numbers
- Single SSD-backed node: 2–10 TB usable after indexes and overhead
- Cassandra/Dynamo per-node sweet spot: 1–3 TB
- Kafka broker: 10–20 TB with retention
When storage crosses ~5 TB for an OLTP workload or QPS crosses ~2K writes/sec, name sharding explicitly and pick a shard key aligned to the dominant access pattern.
Pros
- Surfaces the "do we need sharding?" decision early
- Right default for metrics, logging, messaging, ledgers
Cons
- Can overstate node count if you forget compression and TTL
- Under-sizes CPU for hot-shard scenarios; revisit with QPS
Choose this variant when
- Chat, notifications, metrics ingestion, financial events
- Retention is long (months to years) and every event must be stored
Write-heavy sizing
Writes dominate; buffer them, batch them, and separate the write path.
Best when writes > 10% of total traffic or approach 1K writes/sec at peak.
Order of operations
1. Write events/sec × payload = write bandwidth
2. Decide: durable synchronously vs async via queue?
3. If sync, pick a store that handles peak writes (Cassandra LSM, DynamoDB, sharded Postgres)
4. If async, size the queue for peak × buffer duration
Anchoring numbers
- Kafka partition: 10K events/sec write, 100K read
- Cassandra node: 10–50K writes/sec
- Sharded Postgres shard: 2–5K writes/sec
Distinguish write amplification: one user action often produces multiple downstream writes (audit log, search index, analytics event). Multiply by the fan-out before sizing.
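The write-path arithmetic, including the fan-out multiplier, can be sketched as follows. The function name `write_path_sizing` and the inputs (2,000 user actions/sec at peak, fan-out of 4, 1,000-byte payloads, a 10-minute buffer) are assumptions for illustration; the 10K events/sec per partition figure is the anchoring number quoted in this section.

```python
import math


def write_path_sizing(actions_per_sec, fan_out, payload_bytes,
                      buffer_seconds, per_partition_writes_per_sec):
    """Size the async write path after applying write amplification."""
    # One user action becomes `fan_out` downstream writes
    writes_per_sec = actions_per_sec * fan_out
    # Step 1: write events/sec x payload = write bandwidth (bytes/sec)
    write_bandwidth = writes_per_sec * payload_bytes
    # Step 4: the queue must hold peak traffic for the whole buffer window
    queue_bytes = write_bandwidth * buffer_seconds
    # Partitions needed to absorb the amplified write rate
    partitions = math.ceil(writes_per_sec / per_partition_writes_per_sec)
    return writes_per_sec, write_bandwidth, queue_bytes, partitions


# Assumed inputs: 2,000 actions/sec, 4 downstream writes each,
# 1,000 B payload, 600 s buffer, 10K writes/sec per partition.
writes, bandwidth, queue_bytes, partitions = write_path_sizing(
    2_000, 4, 1_000, 600, 10_000)
print(f"{writes:,} writes/sec, {bandwidth / 1e6:.0f} MB/sec")
print(f"queue buffer: {queue_bytes / 1e9:.1f} GB, partitions: {partitions}")
```

Applying the fan-out first matters: sizing on the 2,000 user actions alone would understate every later number by 4×.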
Pros
- Right mental model for IoT, telemetry, chat, payments
- Forces you to name queueing vs direct-write trade-off
Cons
- Harder to map to interview defaults (Postgres + Redis)
- Adds ops complexity the interviewer may push back on
Choose this variant when
- Write QPS > 1K sustained, or > 5K at peak
- Fan-out per user action > 3 downstream writes
Burst / flash-sale sizing
Size for the minute-long spike, not the daily average.
Best when traffic is dominated by scheduled or unscheduled spikes — flash sales, concert tickets, sports scores, viral moments.
Order of operations
1. Pick the worst-case minute, not the day
2. Expected spike users × actions/user/minute / 60 → peak per-second QPS
3. Multiply by 2× safety factor (humans retry when things are slow)
4. Decide: absorb with autoscaling, queue, or precomputation?
Anchoring numbers
- Flash sale: 100× baseline QPS for 5–15 minutes
- World-Cup goal: 50× spike on sports apps for 30 seconds
- Concert on-sale: 20–50× spike, sustained 10 minutes
Reject pure autoscaling for multi-second spikes — new machines take 30–90 seconds to boot. Pre-warm, queue, or shed load.
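A rough sketch of why the boot delay matters, assuming a hypothetical sale with 600,000 users in the worst minute, 3 actions per user per minute, 20,000 QPS already provisioned, and a 60-second scale-up time (inside the 30–90 second range above). The function name `burst_sizing` and all inputs are illustrative.

```python
def burst_sizing(spike_users, actions_per_user_per_minute, safety_factor,
                 provisioned_qps, scale_up_seconds):
    """Return peak QPS, design QPS, and the backlog built while scaling up."""
    # Step 2: worst-case minute converted to per-second QPS
    peak_qps = spike_users * actions_per_user_per_minute / 60
    # Step 3: retries inflate real demand, so apply the safety factor
    design_qps = peak_qps * safety_factor
    # Requests arriving above current capacity until new machines are ready
    excess_qps = max(0, design_qps - provisioned_qps)
    backlog_requests = excess_qps * scale_up_seconds
    return peak_qps, design_qps, backlog_requests


# Assumed inputs (hypothetical flash sale, see lead-in)
peak_qps, design_qps, backlog = burst_sizing(600_000, 3, 2, 20_000, 60)
print(f"peak {peak_qps:,.0f} QPS, design {design_qps:,.0f} QPS")
print(f"backlog before new capacity arrives: {backlog:,.0f} requests")
```

The backlog figure is the number that has to be queued, shed, or avoided by pre-warming; it is what makes the step 4 decision concrete.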
Pros
- Prevents the "we autoscale" hand-wave from surviving challenge
- Maps directly to queue sizing and admission control
Cons
- Overstates baseline cost if you forget to separate spike from steady
Choose this variant when
- Prompt mentions scheduled events, sales, or viral content
- p99.9 matters (payments, voting, booking)
Back-of-envelope-only (time-pressured rough pass)
Three numbers, sixty seconds — just enough to earn the right to keep talking.
Use this when the interviewer explicitly says "don't go deep on numbers, let's see the architecture first." The goal is to establish the scale tier, not to engineer a final answer.
The three numbers
1. Order-of-magnitude QPS: 100 / 1K / 10K / 100K / 1M. Just the bucket.
2. Order-of-magnitude storage: GB / TB / PB.
3. Read:write ratio: 1:1, 10:1, 100:1, 1000:1.
State them in ~20 seconds, then move on to architecture. Revisit for numerical precision only when a decision hinges on it (sharding threshold, cache capacity). Interviewers respect candidates who know when to stop computing.
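One way to picture "just the bucket" is to snap each estimate to the nearest power of ten. The sketch below does that in log space; the helper names (`nearest_power_of_ten`, `storage_unit`, `rough_pass`) and the sample inputs are assumptions for illustration, and rounding in log space is only one reasonable convention.

```python
import math


def nearest_power_of_ten(value, min_exp, max_exp):
    """Snap a positive value to the closest power of ten, clamped to a range."""
    exponent = math.floor(math.log10(value) + 0.5)
    return 10 ** min(max(exponent, min_exp), max_exp)


def storage_unit(total_bytes):
    """Name the storage tier: GB, TB, or PB (decimal units)."""
    if total_bytes >= 10**15:
        return "PB"
    if total_bytes >= 10**12:
        return "TB"
    return "GB"


def rough_pass(peak_read_qps, peak_write_qps, total_bytes):
    """Return the three bucketed numbers: QPS, storage tier, read:write ratio."""
    qps_bucket = nearest_power_of_ten(peak_read_qps + peak_write_qps, 2, 6)
    ratio_bucket = nearest_power_of_ten(peak_read_qps / peak_write_qps, 0, 3)
    return qps_bucket, storage_unit(total_bytes), f"{ratio_bucket}:1"


# Assumed inputs: 20K peak reads/sec, 200 peak writes/sec, 3 TB stored
buckets = rough_pass(20_000, 200, 3 * 10**12)
print(buckets)
```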
Pros
- Keeps time on your side in a 45-minute round
- Signals you know which decisions need precision and which do not
Cons
- Will not survive a "push on the numbers" probe; have the full pass ready
Choose this variant when
- Interviewer explicitly de-emphasizes math
- Round is < 45 minutes and you need time for deep dive
Worked example
Scenario: Design a URL shortener. 100M URLs/month, consumer read-heavy.
Step 1 — inputs stated out loud
- DAU: ~33M (assuming ~100M MAU at a 1/3 DAU/MAU ratio)
- New URLs per day: 100M / 30 = ~3.3M writes/day
- Reads per URL: 100:1 lifetime ratio, peaks early then tails off
- Reads per day: 3.3M × 100 = ~330M reads/day (steady-state active link set)
- Retention: 5 years active, indefinite archive
Step 2 — QPS
- Writes: 3.3M / 86,400 = ~40 writes/sec avg, × 5 peak = ~200 writes/sec
- Reads: 330M / 86,400 = ~3,800 reads/sec avg, × 5 peak = ~20,000 reads/sec
- Cache target hit rate: 95% (redirect payload is tiny and heavily reused)
- Origin read QPS: 20,000 × 5% = 1,000 reads/sec
Step 3 — storage
- Row size: ~500 B (short code, long URL, user id, created_at, TTL, click count)
- Rows over 5 years: 100M × 12 × 5 = 6 billion rows
- Total storage: 6B × 500 B = 3 TB
- Fits comfortably on one primary plus replicas; sharding NOT required yet
Step 4 — cache
- Hot set (80/20): 20% × 6B rows = 1.2B rows
- But redirect payload we actually cache is ~200 B (short → long mapping only)
- Cache memory: 1.2B × 200 B = 240 GB
- One Redis cluster with 3×128 GB shards handles this with headroom
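Steps 3 and 4 can be reproduced in a few lines. This is a sketch using the numbers stated above; the function name `storage_and_cache` is made up for the example, decimal units are assumed (1 GB = 10^9 bytes), and Redis per-key memory overhead is deliberately ignored.

```python
import math

GB = 10**9
TB = 10**12


def storage_and_cache(total_rows, row_bytes, hot_fraction,
                      cached_bytes_per_row, shard_bytes):
    """Return total storage, cache memory for the hot set, and minimum shards."""
    # Step 3: row size x row count
    total_storage = total_rows * row_bytes
    # Step 4: the hot set is a fraction of rows, cached at a smaller payload
    hot_rows = total_rows * hot_fraction
    cache_bytes = hot_rows * cached_bytes_per_row
    min_shards = math.ceil(cache_bytes / shard_bytes)
    return total_storage, cache_bytes, min_shards


rows = 100_000_000 * 12 * 5  # 100M URLs/month for 5 years
total_storage, cache_bytes, min_shards = storage_and_cache(
    rows, 500, 0.2, 200, 128 * GB)
utilisation = cache_bytes / (3 * 128 * GB)  # the 3-shard layout from the text
print(f"storage: {total_storage / TB:.0f} TB")
print(f"cache: {cache_bytes / GB:.0f} GB, minimum shards: {min_shards}")
print(f"3-shard utilisation: {utilisation:.1%}")
```

Two 128 GB shards is the bare minimum for the raw payload; the three-shard layout leaves room for the overhead this sketch leaves out.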
Step 5 — bandwidth
- Redirect response: ~500 B
- 20,000 reads/sec × 500 B = 10 MB/sec (~80 Mbps) of egress at peak, cache hits included
- Trivially fits a single region; no CDN strictly required but nice to have for RTT
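Bandwidth estimates often go wrong at the bytes-to-bits conversion, so a tiny sketch may help. The function name `egress` is illustrative; the inputs are the peak read rate and response size from this step.

```python
def egress(peak_qps, response_bytes):
    """Return egress in bytes/sec and in megabits/sec (decimal units)."""
    bytes_per_sec = peak_qps * response_bytes
    # Network links are quoted in bits, so multiply by 8
    megabits_per_sec = bytes_per_sec * 8 / 1_000_000
    return bytes_per_sec, megabits_per_sec


bytes_per_sec, mbps = egress(20_000, 500)
print(f"{bytes_per_sec / 1e6:.0f} MB/sec = {mbps:.0f} Mbps")
```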
Step 6 — sanity check and future-cast
- Total design: 1 Postgres primary + 2 read replicas, 1 Redis cluster (3 shards), 1 LB tier
- Bottleneck: cache hit rate. If it drops to 80%, origin QPS jumps to 4,000 — still under replica capacity but uncomfortable
- At 10× scale (30M/day writes, 200K read peak): shard Postgres by short-code prefix, 3–5 shards; Redis cluster goes to 6 shards; introduce CDN for geographic RTT
- At 100× scale: DynamoDB or Cassandra replaces Postgres; CDN absorbs 80% of reads before they hit origin
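The sanity check above is a sensitivity analysis: hold peak reads fixed and vary the hit rate, then repeat at 10× scale. A minimal sketch, with the hypothetical helper `origin_qps` taking the hit rate as a whole-number percentage to keep the arithmetic exact:

```python
def origin_qps(peak_read_qps, hit_rate_pct):
    """Reads that miss the cache and reach the database."""
    return peak_read_qps * (100 - hit_rate_pct) / 100


BASE_PEAK_READS = 20_000  # peak read QPS from Step 2

for scale in (1, 10):
    peak = BASE_PEAK_READS * scale
    for hit_rate_pct in (95, 90, 80):
        print(f"{scale:>2}x scale, {hit_rate_pct}% hit rate: "
              f"{origin_qps(peak, hit_rate_pct):,.0f} origin QPS")
```

A 15-point drop in hit rate quadruples origin load, which is why the hit rate, not raw traffic, is named as the bottleneck.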
The whole pass took three minutes. Every architecture choice downstream is now defensible against "why that shape?"
Good vs bad answer
Interviewer probe
“What QPS are we designing for?”
Weak answer
"A lot — millions of users. So we'll definitely need sharding, probably a cache cluster, maybe Kafka for writes. Let's use DynamoDB to be safe on scale."
Strong answer
"Assume 100M MAU, one-third DAU, so ~33M. At 10 actions per user per day that's 330M requests/day, ~3.8K RPS average, ~15K peak at a 4× factor. Writes are 10% of that, so ~400 RPS average and ~1.5K RPS at peak. One Postgres primary handles 1.5K writes/sec, though that is close to the ~2K writes/sec mark where I'd shard; reads go through Redis targeting a 95% hit rate, so origin read QPS is ~700. That fits two read replicas. No sharding yet, but we'd revisit as soon as sustained writes approach 2K/sec, well before 10× DAU."
Why it wins: The strong answer ends with a concrete sizing decision the rest of the interview can be tested against. The weak answer sized nothing; it named technologies without naming a number.
When it comes up
- Within the first 5 minutes of any design round — as soon as scope is agreed
- After the interviewer asks "how many users?" or "what scale?"
- Before you draw your first architecture box
- Whenever you reach for a technology (cache, queue, shard) and need to justify the shape
Order of reveal
1. Anchor on DAU. "Let's assume ~X million DAU, pulling from MAU if given, otherwise applying a 1/3 ratio for consumer products."
2. Name actions per user per day. "A typical user does Y actions per day (feeds refreshed, messages sent, searches), so that gives us X·Y daily requests."
3. Convert to per-second + peak. "Dividing by 86,400 seconds gives us average QPS; applying a 4× peak factor gives us design QPS of Z."
4. State read:write ratio. "This is a read-heavy / write-heavy / balanced system at ~N:1. That means reads dominate sizing; we should expect to cache aggressively."
5. Compute storage + cache. "Row size times rows per day times retention gives us total storage. Applying 80/20 to get the hot set gives us cache memory."
6. Name the sharding threshold. "We're under the single-primary threshold today. If QPS crosses ~2K writes/sec or storage crosses ~5 TB, we'd shard by <key>."
7. Pause and invite challenge. "Does this scale assumption match what you have in mind, or should I adjust it?"
Signature phrases
- “Let's anchor on numbers before architecture” — Signals you know capacity drives design shape, not the reverse.
- “I'll apply a 4× peak factor for consumer traffic” — Names the multiplier explicitly rather than hand-waving "scale".
- “One primary handles ~5K OLTP QPS; we're well under that” — Anchors against a known single-node ceiling to justify not sharding.
- “The 80/20 hot set fits in ~240 GB of cache” — Shows you can translate storage into cache sizing mechanically.
- “At 10× DAU, we'd shard by user_id” — Future-casts a threshold instead of pretending scale is infinite.
Likely follow-ups
“What if traffic is 10× what you assumed?”
Step through the cascade without hand-waving:
- Cache tier first: hot set grows 10×; add shards or tier to a second-level cache
- Read replicas next: if cache hit rate stays put, origin read QPS grows 10× — replicate
- Then primary: if write QPS crosses ~2K/sec, shard by the dominant access-path key
- Then region: if bandwidth crosses ~1 Gbps regional egress, add a second region
Name each threshold. "Replication breaks at ~2K writes/sec sustained" is what senior sounds like.
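Naming thresholds is easier when they are written down as explicit constants. The sketch below checks a design against the limits quoted in this guide (~2K writes/sec, ~5 TB, ~1 Gbps regional egress). The function name `triggered_changes` is hypothetical, and it assumes, simplistically, that writes, storage, and egress all grow in proportion to traffic.

```python
# Thresholds quoted in this guide; treat them as conversation defaults
WRITE_QPS_LIMIT = 2_000
STORAGE_TB_LIMIT = 5
EGRESS_GBPS_LIMIT = 1


def triggered_changes(peak_write_qps, storage_tb, egress_gbps, scale=1):
    """List the design changes forced at a given traffic multiple."""
    changes = []
    # Assumption: every input scales linearly with traffic
    if peak_write_qps * scale >= WRITE_QPS_LIMIT:
        changes.append("shard the primary (write QPS)")
    if storage_tb * scale >= STORAGE_TB_LIMIT:
        changes.append("shard the primary (storage)")
    if egress_gbps * scale >= EGRESS_GBPS_LIMIT:
        changes.append("add a second region (egress)")
    return changes


# Baseline from the URL-shortener example: 200 writes/sec, 3 TB, 0.08 Gbps
for scale in (1, 10, 100):
    print(f"{scale}x:", triggered_changes(200, 3, 0.08, scale) or "no change")
```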
“Why did you pick a 4× peak factor?”
Because consumer traffic is diurnal plus event-driven. Empirically, peak-to-average ratios land 3–5× for social products (measured at Twitter, Meta, LinkedIn). For flash-sale / scheduled-event systems it's 10×+. I'd tighten the number if the prompt mentions a specific event (World Cup, ticket drop) or loosen it for enterprise workloads concentrated in business hours.
“How did you get the 95% cache hit rate?”
It's a defensible default for read-heavy systems with temporal locality. Twitter timeline hits ~95%, CDN static assets hit 98%+, URL shorteners often hit 99%+ because the same links trend. I'd revisit if the access pattern were uniform (fresh IDs, no repeats) — there 30–50% is more realistic, and the cache story changes.
“Your storage is 3 TB — why not shard?”
A single Postgres primary with read replicas comfortably serves 5–10 TB with modern NVMe SSDs. Sharding adds cross-shard join pain, rebalancing ops, and client-library complexity. The rule I use: shard when a single primary can no longer meet either storage or write-QPS SLAs. We're well under both. I'd shard when write QPS > ~2K/sec sustained or data > ~5–10 TB with strict p99 latency requirements.
Code examples
DAU × actions_per_user_per_day / 86,400 → avg QPS
avg QPS × peak_factor (3–5×) → peak QPS
peak QPS × (1 − cache_hit_rate) → origin QPS
rows_per_day × row_size × retention_days → total storage
0.2 × total storage → hot cache working set
Common mistakes
Real traffic is spiky. A system that "handles 4K QPS on average" dies at the first spike. Multiply avg by 3–5× for consumer workloads and state the factor out loud. Signature fix: always write avg and peak side-by-side.
Without a cache hit ratio, your replica count balloons. Even a naive cache-aside with 80% hit rate cuts origin load 5×. Bake in a hit rate — and defend it against a "what if the cache is cold?" follow-up.
A 200 GB "current" database grows to 3–10 TB over a product horizon. Always extrapolate. It changes whether you need sharding. Show the math: rows/day × row size × retention.
Claiming "1 billion DAU" for a niche B2B product is a tell. Anchor to a peer company (Instagram ≈ 2B MAU, Spotify ≈ 600M MAU, Uber ≈ 150M MAU) or state you are using the prompt's hint.
MAU is a vanity metric; DAU drives load. Default ratios: consumer social ~1/3, messaging ~1/2, SaaS ~1/5. State the ratio so the interviewer can push back.
One user-visible action ("post a tweet") can produce 5–20 downstream writes: fan-out, search index, analytics, audit log, push notification. Multiply before sizing the write path.
Practice drills
A video platform says "1B users worldwide". What DAU do you use for capacity math?
Take 1B MAU and apply the MAU→DAU ratio. For consumer video (YouTube, TikTok-style), that's ~1/3 — so 300M DAU. If you can't remember the ratio, state your assumption: "I'll use 1/3 DAU/MAU for consumer video; tell me if that's off." Avoid designing for 1B DAU unless the prompt explicitly says so.
You estimate 10K QPS read and 200 QPS write. The interviewer asks: do you need sharding?
Probably not. A single Postgres primary handles 200 writes/sec trivially; replicas + a cache absorb 10K reads. Justify: "200 writes/sec sustained for 5 years is ~31.5B rows, which stays under ~5 TB only if rows average ~150 B or less, so I'd confirm the row size; cache absorbs 95% of reads so origin is ~500 QPS. Single primary + 2 read replicas + cache is enough. I'd reach for sharding when writes exceed ~2K QPS or data exceeds ~5–10 TB."
Interviewer: "What if DAU goes 10×?" What changes in your design first?
Cascade in order:
1. Cache / CDN bill first: hot set grows 10×; add shards or a second-level cache.
2. Read replicas next: if cache hit rate holds, origin reads grow 10×; replicate.
3. Writes last: they become the bottleneck when they cross primary capacity (~2K QPS sustained), at which point shard by the dominant key.
Writes usually have the most headroom because they're a small fraction of traffic. The right answer names the order, not just the endpoint.
Your peak read QPS is 50,000 and your cache hit rate is 90%. How many read replicas do you need?
Origin read QPS = 50,000 × 10% = 5,000 QPS. A Postgres read replica can serve ~2–5K mixed OLTP QPS depending on query complexity. So 2–3 replicas with headroom. State your per-replica capacity assumption explicitly — "I'm assuming ~2K QPS per replica for this workload" — and plan for at least one extra replica for N+1 availability.
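Because the answer hinges on the per-replica capacity assumption, it can help to sweep that assumption rather than commit to one value. A minimal sketch; the function name `replicas_needed` and the choice of one spare replica for N+1 are illustrative assumptions.

```python
import math


def replicas_needed(peak_read_qps, hit_rate_pct, per_replica_qps, spare=1):
    """Replicas to serve cache misses, plus spares for N+1 availability."""
    origin_qps = peak_read_qps * (100 - hit_rate_pct) / 100
    return math.ceil(origin_qps / per_replica_qps) + spare


# Sweep the assumed per-replica capacity across the 2-5K QPS range
sweep = {capacity: replicas_needed(50_000, 90, capacity)
         for capacity in (2_000, 3_000, 5_000)}
for capacity, count in sweep.items():
    print(f"{capacity:,} QPS per replica -> {count} replicas (incl. 1 spare)")
```

The spread between the low and high assumption is a factor of two, which is why stating the per-replica number matters more than the final count.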
You're designing a ride-sharing dispatcher. Drivers ping location every 4 seconds while online. 1M concurrent drivers. What's your write QPS?
Write QPS = 1M / 4 = 250K writes/sec. That's huge. A single OLTP primary cannot do this; neither can a naive sharded Postgres. The honest answer: "I'd either batch pings into 1-second buckets (lowers QPS 4× to 60K and still meets freshness) or stream through Kafka and store in Cassandra / time-series DB. OLTP is the wrong tool here." Sizing correctly here is the whole interview.
Cheat sheet
- 1 day = 86,400 seconds. Memorise it. Round to 100K for speed.
- Peak:avg: consumer 3–5×, enterprise 2–3×, flash-sale 5–10×, virality 50–100×.
- 1 KB × 1 million rows = 1 GB. 1 KB × 1 billion = 1 TB. Keep this mental unit.
- 80/20 rule: 20% of entities drive 80% of traffic. Size hot cache on 20%.
- Default cache hit rate: 95% for read-heavy with temporal locality; 50% for uniform.
- Single-primary ceiling: ~5K OLTP QPS, ~5–10 TB durable; shard above.
- MAU → DAU: consumer social 1/3, messaging 1/2, SaaS 1/5.
- If your QPS < 10K and storage < 5 TB, you likely don't need sharding. Say so.
- Always state: DAU, actions/user/day, read:write ratio. These three anchor everything.
- Every box on the diagram should be traceable to one of your numbers.