coreestimation

Numbers to know

Latency, throughput, and capacity numbers for sizing designs, not vibes.

If your design decisions aren't backed by numbers, they're opinions. Knowing that Redis handles 100K ops/sec or that a cross-region RTT is 60ms isn't trivia — it's what separates "we'll add a cache" from "we need 3 Redis instances because our hot-path is 200K reads/sec."

Read this if your last attempt…

You proposed sharding at 1M rows (Postgres handles billions with proper indexes)
You can't estimate QPS from DAU in your head
You don't know how much a single Postgres or Redis instance can handle
Your capacity math is "a lot" instead of a number

The concept

Modern hardware is far more powerful than most candidates assume. A single well-tuned Postgres instance handles 10K-50K transactions per second. A single Redis instance handles 100K-300K operations per second. If your system does 1K QPS and you're proposing sharding, you're adding massive complexity for no reason.

The numbers you need to memorise fall into three buckets: latency (how long each hop takes), capacity (how much one instance can handle), and formulas (how to estimate from user counts).

System capacity per instance — the numbers that prevent over-engineering.

System	Capacity per instance	Scale signal
Redis	100K-300K ops/sec, up to ~25 GB RAM	Memory full, or QPS > 300K
Postgres / MySQL	10K-50K TPS, up to ~10 TB storage	Write QPS > 10K or storage > 10 TB
Cassandra	10K-50K writes/sec per node, linear scaling	Add nodes for more throughput
Kafka	100K-1M msgs/sec per broker, 50 TB storage	Throughput > 800K or partition count > 200K
Elasticsearch	1K-10K searches/sec per node	Search latency > SLA or index > 1 TB
App server	1K-10K req/sec per instance	CPU > 70% or response latency > SLA
Nginx / Envoy	50K-100K req/sec as reverse proxy	Connections near 100K per instance
S3	5,500 GETs/sec and 3,500 PUTs/sec per prefix	Prefix partitioning for higher throughput

How interviewers grade this

You estimate QPS from DAU fluently — a number, not "a lot".
You know single-instance capacity (Redis: 100K ops, Postgres: 10K+ TPS) and size accordingly.
You do capacity math when making scaling decisions, not by vibes.
You know latency numbers and use them in your budget.
You push back on premature sharding by citing actual instance limits.

Variants

Latency numbers

The time each hop takes — from nanoseconds to hundreds of milliseconds.

These are the numbers that define your latency budget. Every serial hop in your design adds this much latency:

L1 cache reference: ~1 ns
Main memory reference: ~100 ns
SSD random read (4KB): ~100 microseconds
Redis / Memcached GET (same AZ): ~0.5-1 ms
Simple indexed DB query (same AZ): ~2-10 ms
Cross-AZ round trip (same region): ~1-2 ms
Cross-region (same continent, e.g. US-East to US-West): ~30-60 ms
Cross-continent (US to Europe): ~80-120 ms
CDN edge hit: ~5-20 ms
DNS cold lookup: ~50-200 ms
TCP + TLS 1.3 handshake: ~30-60 ms

The critical insight: the gap between in-memory (nanoseconds) and cross-continent (100ms) is six orders of magnitude. That's why caching works so well — you're replacing a 10ms DB call with a 0.5ms Redis call, a 20x improvement. And it's why global deployments with regional data are essential for low-latency apps — cross-continent latency is physics, not engineering.

Choose this variant when

Every design — these numbers should be muscle memory

Quick estimation formulas

The five formulas that size every system.

1QPS from DAU: DAU x actions_per_user / 86,400. Peak is 2-3x average for consumer apps, up to 10x for flash-sale or event-driven spikes.

1Storage per year: rows_per_day x avg_row_size x 365. Don't forget to multiply by retention. "1 GB/day" sounds small until you multiply by 5 years = 1.8 TB.

1Bandwidth: QPS x avg_response_size. This tells you your CDN/egress bill and whether you need compression.

1Cache size: working_set_fraction x total_data x avg_object_size. Working set is typically 20% of data for most apps (the 80/20 rule).

1Server count: peak_QPS / QPS_per_server + 30% headroom. Always size for peak, not average.

The senior move: compute these in the first 3 minutes of a design round, then use them to drive every infrastructure decision that follows. "We need 3 Redis instances because our working set is 60 GB" beats "we'll use Redis" every time.

Choose this variant when

Capacity estimation section of every design round
Before proposing any infrastructure component

Common over-engineering mistakes

Things candidates propose too early because they don't know the numbers.

Sharding at 1M rows: Postgres handles billions of rows with proper indexes. You don't need sharding until storage exceeds ~10 TB or write QPS exceeds ~10K. Sharding at 1M rows is adding distributed systems complexity for zero benefit.

Cache for 100 req/sec: a single Postgres instance handles this in its sleep. You cache when read QPS exceeds DB capacity or latency exceeds your SLA, not as a default.

Kafka for 100 events/sec: a simple database table with a consumer polling every second handles this. Kafka shines at >10K events/sec with multiple consumer groups and replay requirements.

Microservices for a new product: a single monolith on one server handles most early-stage products. Split when you have team-scale problems, not load problems.

The rule: if you can't justify the complexity with a number (QPS, storage, latency), you're over-engineering. Interviewers at staff+ level actively look for this signal — knowing when NOT to add infrastructure is as important as knowing when to add it.

Choose this variant when

When you hear yourself saying "let's add X just in case"

Worked example

Scenario: size the infrastructure for a URL shortener with 100M MAU.

Step 1 — QPS:

DAU = 100M / 3 = ~33M (consumer app DAU/MAU ratio ~1/3).
Write: 1 new URL per user per day = 33M / 86,400 = ~380 writes/sec avg. Peak 3x = ~1,200.
Read: 100:1 read:write ratio. 38,000 reads/sec avg. Peak = ~115,000.

Step 2 — Storage (5 years):

33M URLs/day x 500 bytes x 365 x 5 = ~30 TB.
Exceeds single Postgres (~10 TB). Sharding needed for storage. Partition by short_code hash.

Step 3 — Cache:

115K reads/sec at peak. Single Redis handles this (100-300K ops/sec).
Hot set: 20% of 30B total URLs x 200 bytes = ~1.2 TB. Too big for one Redis.
But 80/20 on the hot set: 20% of 1.2TB = ~240 GB cache gets ~96% hit rate.
240 GB / 25 GB per Redis = ~10 Redis instances. Redis Cluster.

Step 4 — Servers:

115K reads/sec peak. Each app server does ~5K req/sec.
115K / 5K = 23 servers. Add 30% = ~30 servers.

Result: 30 app servers, 10 Redis shards, 3+ Postgres shards. Every number is justified.

Good vs bad answer

Interviewer probe

“How many servers do you need?”

Weak answer

"A lot — we'll have millions of users. Let's use auto-scaling and it'll figure it out."

Strong answer

"33M DAU at 100:1 read/write gives us ~115K reads/sec peak. Each app server handles about 5K req/sec, so I need ~23 servers plus 30% headroom — call it 30. Cache: 240 GB working set across 10 Redis shards. Storage: 30 TB over 5 years, so 3+ Postgres shards partitioned by short_code hash. The bottleneck is cache hit rate — at 96%, origin DB sees only ~4,600 QPS peak, which 3 shards handle easily."

Why it wins: Every infrastructure decision has a number behind it. The interviewer can challenge any number and get a defended answer.

Interview playbook3–5 min during capacity estimation; referenced throughout the round

When it comes up

During capacity estimation — the first 3–5 minutes of most design rounds
When proposing infrastructure — every component needs a number
When the interviewer challenges "do you actually need this?"
When sizing cache, DB instances, or server fleet
When pushing back on over-engineering (sharding, microservices, Kafka)

Order of reveal

1
State the base unit. "1 day = 86,400 seconds. Base number for every QPS estimate."
2
Compute QPS from DAU. "DAU × actions/user / 86,400 = avg QPS. Peak is 2-3× for consumer, 10× for flash sales."
3
State storage with retention. "Rows/day × size × retention_years. Without retention, the number is meaningless."
4
Size each component from single-instance capacity. "Redis: 100k-300k ops/s. Postgres: 10k-50k TPS. App: 1k-10k req/s. Divide peak by per-instance to size fleet."
5
Add 30% headroom. "Always size for peak, not average, and add headroom for growth and failure margin."
6
Call out cache hit rate as the lever. "At X% hit rate, origin sees only (1-X)× the QPS. This is where the biggest scaling wins live."
7
Push back on premature scaling. "At Y QPS / Z TB, we do NOT need sharding/Kafka/microservices yet. Here's the threshold at which we would."

Signature phrases

“1 day = 86,400 seconds”

“Don't shard until 10 TB or 10k write QPS”

“Peak is 2-3× average”

“Working set is ~20% of data”

“Modern Postgres handles billions of rows”

“Every number has a justification”

“1 day = 86,400 seconds” — The single most-used number in capacity estimation.
“Don't shard until 10 TB or 10k write QPS” — Specific threshold that pushes back on premature complexity.
“Peak is 2-3× average” — Prevents under-sizing for the worst hour of the day.
“Working set is ~20% of data” — Heuristic for sizing cache without guessing.
“Modern Postgres handles billions of rows” — Counter to the reflexive "scales poorly" narrative.
“Every number has a justification” — Frames your sizing as defendable, not vibes.

Likely follow-ups

?“Your system is at 1k write QPS. Do you need to shard the database?”Reveal

No. A single well-tuned Postgres instance handles 10k–50k writes/sec. At 1k QPS you're using 2–10% of a single instance.

When sharding IS justified:

1Sustained write QPS > 20k/s on a single hot table
2Dataset size > 10 TB (backup, recovery, and dump times become prohibitive)
3Access-pattern divergence — different tables want different sharding strategies

At 1k QPS, the right optimisations are:

Indexes tailored to access patterns (the biggest lever)
Read replicas for read scaling
Connection pooling (PgBouncer)
Partitioning by time (for large but low-QPS tables)

Sharding at 1k QPS adds massive operational complexity (cross-shard queries, rebalancing, ops surface) for zero performance benefit. It is a common over-engineering trap at the senior level — pushing back on it shows maturity.

?“How do you estimate cache size for a system?”Reveal

Working set × object size, with a hit-rate target.

Step 1 — estimate working set. The 80/20 rule: about 20% of data handles 80% of traffic. For really skewed distributions (celebrity content, viral posts), 5% handles 95%. For uniform distributions (random-access KV), working set ≈ all data.

Step 2 — compute cache memory.

text

working_set_fraction × total_items × avg_object_size

Example: 1B URLs × 20% working set × 500 bytes = 100 GB cache.

Step 3 — pick instance count. Single Redis maxes at ~25 GB RAM (larger needs Redis Cluster). 100 GB / 25 GB = 4 Redis shards.

Step 4 — validate the hit rate target. Typically design for 90–95% hit rate. At 95% hit rate, origin sees 5% of traffic. If origin capacity at 5% is comfortable, the cache is sized right.

Step 5 — plan for cold start. On full cache flush, origin must handle 100% of traffic temporarily. Either (a) warm the cache with a replay job, or (b) size origin for full load.

?“A cross-region call takes 60 ms. What does that mean for your design budget?”Reveal

Every serial cross-region hop costs you 60 ms off the latency budget.

Implications:

1Parallelise what you can. If request needs data from 3 regions, do all 3 calls in parallel — budget is max(60, 60, 60) = 60 ms, not 180 ms.
2Cache aggressively in the caller region — eliminate the round trip entirely.
3Co-locate request path data. If users in EU always need EU data, serve them from EU; don't ping US-East on every request.
4Use async for non-critical cross-region work. Replication, backup, analytics — send cross-region through a queue, don't block the request.

Hard facts:

US-East ↔ US-West: ~60 ms
US-East ↔ EU-West: ~80 ms
US-East ↔ AP-Southeast: ~150 ms
Speed of light in fiber: ~200,000 km/s — the floor no optimisation breaks

At a 100 ms p95 budget, you have zero room for even one cross-region hop. Regional deployments with local data are the only architecture that works.

Common mistakes

Using 2010-era numbers

Modern hardware is powerful. A single Postgres on an m5.xlarge handles far more than candidates expect. Don't propose sharding for 10K QPS. Do the math with current instance sizes.

Estimating storage without retention

"1 KB per row x 1M rows/day = 1 GB/day." For how long? 1 year = 365 GB. 5 years = 1.8 TB. Retention determines whether you need sharding. Always multiply by time horizon.

Sizing for average, not peak

Peak is 2-3x average for consumer apps, 5-10x for event-driven spikes. Size infrastructure for peak. Your SLA doesn't say "available on average".

Forgetting 86,400

1 day = 86,400 seconds. This is the single most-used number in capacity estimation. Memorize it. 100M events/day / 86,400 = ~1,157 events/sec.

Practice drills

Twitter has 300M MAU, 50M DAU, average 20 tweet reads per session. What's the read QPS?Reveal

50M DAU x 20 reads / 86,400 = ~11,574 QPS average. Peak (3x) = ~35K QPS. A single Postgres handles this with indexes, but you want read replicas + cache for latency. This is a caching problem, NOT a sharding problem.

Each tweet is 1KB. 500M tweets/day. How much storage per year?Reveal

500M x 1KB = 500 GB/day. x 365 = ~180 TB/year. Way beyond a single Postgres (~10 TB). Sharding is necessary for storage, not QPS. Partition by user_id. Archive old tweets to cold storage to keep the hot dataset at ~10-20 TB.

You need a Redis cache for 1B short URLs at 100K reads/sec. How many Redis instances?Reveal

QPS: 100K reads/sec — one Redis handles this. But memory: 1B x ~200 bytes = ~200 GB. Single Redis maxes at ~25 GB. So 200/25 = 8 Redis shards for memory, even though one handles the QPS. The bottleneck is memory, not throughput. Use Redis Cluster, shard by short URL key.

Cheat sheet

•1 day = 86,400 seconds. Memorize it.
•QPS from DAU: DAU x actions/user / 86,400. Peak = 2-3x average.
•Redis: 100K-300K ops/sec. Postgres: 10K-50K TPS.
•Cross-AZ: 1-2ms. Cross-region: 60ms. Cross-continent: 100ms+.
•Don't shard until storage > 10 TB or write QPS > 10K.
•Don't cache until read QPS > DB capacity or latency > SLA.
•Working set for cache: ~20% of data (80/20 rule).
•1 KB x 1M rows = 1 GB. The base unit for storage math.
•Server count: peak_QPS / QPS_per_server + 30% headroom.

9% complete

Current

Read this if

Step 1 of 11

The concept

Jump to next

System

Capacity per instance

Scale signal

Redis

100K-300K ops/sec, up to ~25 GB RAM

Memory full, or QPS > 300K

Postgres / MySQL

10K-50K TPS, up to ~10 TB storage

Write QPS > 10K or storage > 10 TB

Cassandra

10K-50K writes/sec per node, linear scaling

Add nodes for more throughput

Kafka

100K-1M msgs/sec per broker, 50 TB storage

Throughput > 800K or partition count > 200K

Elasticsearch

1K-10K searches/sec per node

Search latency > SLA or index > 1 TB

App server

1K-10K req/sec per instance

CPU > 70% or response latency > SLA

Nginx / Envoy

50K-100K req/sec as reverse proxy

Connections near 100K per instance

5,500 GETs/sec and 3,500 PUTs/sec per prefix

Prefix partitioning for higher throughput