Numbers to know
Latency, throughput, and capacity numbers for sizing designs, not vibes.
If your design decisions aren't backed by numbers, they're opinions. Knowing that Redis handles 100K ops/sec or that a cross-region RTT is 60ms isn't trivia — it's what separates "we'll add a cache" from "we need 3 Redis instances because our hot-path is 200K reads/sec."
Read this if your last attempt…
- You proposed sharding at 1M rows (Postgres handles billions with proper indexes)
- You can't estimate QPS from DAU in your head
- You don't know how much a single Postgres or Redis instance can handle
- Your capacity math is "a lot" instead of a number
The concept
Modern hardware is far more powerful than most candidates assume. A single well-tuned Postgres instance handles 10K-50K transactions per second. A single Redis instance handles 100K-300K operations per second. If your system does 1K QPS and you're proposing sharding, you're adding massive complexity for no reason.
The numbers you need to memorise fall into three buckets: latency (how long each hop takes), capacity (how much one instance can handle), and formulas (how to estimate from user counts).
System capacity per instance — the numbers that prevent over-engineering.
| System | Capacity per instance | Scale signal |
|---|---|---|
| Redis | 100K-300K ops/sec, up to ~25 GB RAM | Memory full, or QPS > 300K |
| Postgres / MySQL | 10K-50K TPS, up to ~10 TB storage | Write QPS > 10K or storage > 10 TB |
| Cassandra | 10K-50K writes/sec per node, linear scaling | Add nodes for more throughput |
| Kafka | 100K-1M msgs/sec per broker, 50 TB storage | Throughput > 800K or partition count > 200K |
| Elasticsearch | 1K-10K searches/sec per node | Search latency > SLA or index > 1 TB |
| App server | 1K-10K req/sec per instance | CPU > 70% or response latency > SLA |
| Nginx / Envoy | 50K-100K req/sec as reverse proxy | Connections near 100K per instance |
| S3 | 5,500 GETs/sec and 3,500 PUTs/sec per prefix | Prefix partitioning for higher throughput |
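A minimal sketch of how the table turns into an instance count, assuming you size against the conservative (low-end) throughput figure in each row. The `CAPACITY_PER_INSTANCE` dict and the `instances_needed` helper are illustrative names, not part of any library.

```python
import math

# Conservative per-instance throughput (ops/sec): the low end of each range above.
CAPACITY_PER_INSTANCE = {
    "redis": 100_000,
    "postgres": 10_000,
    "app_server": 1_000,
}


def instances_needed(system: str, peak_ops_per_sec: float) -> int:
    """Instances required to serve a peak load at the conservative ceiling."""
    return max(1, math.ceil(peak_ops_per_sec / CAPACITY_PER_INSTANCE[system]))


# 1K QPS on Postgres fits on a single instance: no sharding required.
print(instances_needed("postgres", 1_000))
# 200K ops/sec on Redis needs more than one instance for throughput alone.
print(instances_needed("redis", 200_000))
```

Sizing against the low end is deliberately pessimistic; a measured benchmark for your own workload should replace these defaults.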
How interviewers grade this
- You estimate QPS from DAU fluently — a number, not "a lot".
- You know single-instance capacity (Redis: 100K ops, Postgres: 10K+ TPS) and size accordingly.
- You make scaling decisions with capacity math, not by vibes.
- You know latency numbers and use them in your budget.
- You push back on premature sharding by citing actual instance limits.
Variants
Latency numbers
The time each hop takes — from nanoseconds to hundreds of milliseconds.
These are the numbers that define your latency budget. Every serial hop in your design adds this much latency:
- L1 cache reference: ~1 ns
- Main memory reference: ~100 ns
- SSD random read (4KB): ~100 microseconds
- Redis / Memcached GET (same AZ): ~0.5-1 ms
- Simple indexed DB query (same AZ): ~2-10 ms
- Cross-AZ round trip (same region): ~1-2 ms
- Cross-region (same continent, e.g. US-East to US-West): ~30-60 ms
- Cross-continent (US to Europe): ~80-120 ms
- CDN edge hit: ~5-20 ms
- DNS cold lookup: ~50-200 ms
- TCP + TLS 1.3 handshake: ~30-60 ms
The critical insight: the gap between a main-memory reference (~100 ns) and a cross-continent round trip (~100 ms) is six orders of magnitude. That's why caching works so well: you're replacing a 10 ms DB call with a 0.5 ms Redis call, a 20x improvement. And it's why global deployments with regional data are essential for low-latency apps: cross-continent latency is physics, not engineering.
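To see how serial hops eat a latency budget, here is a small sketch that sums per-hop costs. The values in `HOP_MS` are midpoints of the ranges listed above, picked for illustration; the path names are hypothetical.

```python
# Approximate per-hop costs in milliseconds (midpoints of the ranges above).
HOP_MS = {
    "redis_get": 0.75,
    "db_query": 6.0,
    "cross_az": 1.5,
    "cross_region": 45.0,
    "cross_continent": 100.0,
}


def serial_latency_ms(hops):
    """Serial hops add up: the total is the sum of every hop on the path."""
    return sum(HOP_MS[hop] for hop in hops)


cache_hit = serial_latency_ms(["cross_az", "redis_get"])
cache_miss = serial_latency_ms(["cross_az", "redis_get", "db_query"])
remote_read = serial_latency_ms(["cross_continent", "db_query"])

print(f"cache hit:   {cache_hit:.2f} ms")
print(f"cache miss:  {cache_miss:.2f} ms")
print(f"remote read: {remote_read:.2f} ms")
```

The remote read is dominated by the network hop, which is why moving the data closer matters more than tuning the query.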
Choose this variant when
- Every design — these numbers should be muscle memory
Quick estimation formulas
The five formulas that size every system.
- QPS from DAU: DAU x actions_per_user / 86,400. Peak is 2-3x average for consumer apps, up to 10x for flash-sale or event-driven spikes.
- Storage per year: rows_per_day x avg_row_size x 365. Don't forget to multiply by retention. "1 GB/day" sounds small until you multiply by 5 years = 1.8 TB.
- Bandwidth: QPS x avg_response_size. This tells you your CDN/egress bill and whether you need compression.
- Cache size: working_set_fraction x total_items x avg_object_size. Working set is typically 20% of data for most apps (the 80/20 rule).
- Server count: (peak_QPS / QPS_per_server) x 1.3, i.e. 30% headroom. Always size for peak, not average.
The senior move: compute these in the first 3 minutes of a design round, then use them to drive every infrastructure decision that follows. "We need 3 Redis instances because our working set is 60 GB" beats "we'll use Redis" every time.
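The five formulas can be sketched as plain functions. The function and parameter names below are illustrative, and the default 3x peak factor and 30% headroom are the consumer-app rules of thumb from the list above, not universal constants.

```python
import math

SECONDS_PER_DAY = 86_400


def avg_qps(dau, actions_per_user):
    """Average QPS from daily active users."""
    return dau * actions_per_user / SECONDS_PER_DAY


def peak_qps(average, peak_factor=3.0):
    """Peak QPS; 2-3x for consumer apps, up to 10x for spikes."""
    return average * peak_factor


def storage_bytes(rows_per_day, avg_row_bytes, retention_years):
    """Total storage over the retention window."""
    return rows_per_day * avg_row_bytes * 365 * retention_years


def bandwidth_bytes_per_sec(qps, avg_response_bytes):
    """Egress bandwidth at a given QPS."""
    return qps * avg_response_bytes


def cache_bytes(working_set_fraction, total_items, avg_object_bytes):
    """Memory needed to hold the working set."""
    return working_set_fraction * total_items * avg_object_bytes


def server_count(peak, qps_per_server, headroom=0.30):
    """Servers needed at peak, rounded up after adding headroom."""
    return math.ceil(peak / qps_per_server * (1 + headroom))


# "1 GB/day" (1M rows x 1 KB) kept for 5 years:
print(storage_bytes(1_000_000, 1_000, 5) / 1e12, "TB")
# 115K peak QPS at 5K req/sec per server:
print(server_count(115_000, 5_000), "servers")
```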
Choose this variant when
- Capacity estimation section of every design round
- Before proposing any infrastructure component
Common over-engineering mistakes
Things candidates propose too early because they don't know the numbers.
Sharding at 1M rows: Postgres handles billions of rows with proper indexes. You don't need sharding until storage exceeds ~10 TB or write QPS exceeds ~10K. Sharding at 1M rows is adding distributed systems complexity for zero benefit.
Cache for 100 req/sec: a single Postgres instance handles this in its sleep. You cache when read QPS exceeds DB capacity or latency exceeds your SLA, not as a default.
Kafka for 100 events/sec: a simple database table with a consumer polling every second handles this. Kafka shines at >10K events/sec with multiple consumer groups and replay requirements.
Microservices for a new product: a single monolith on one server handles most early-stage products. Split when you have team-scale problems, not load problems.
The rule: if you can't justify the complexity with a number (QPS, storage, latency), you're over-engineering. Interviewers at staff+ level actively look for this signal — knowing when NOT to add infrastructure is as important as knowing when to add it.
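One way to make the rule concrete is to encode each threshold as a check that has to pass before the component goes on the whiteboard. This is a rough sketch using the thresholds quoted above; the function names are hypothetical, and `needs_kafka` looks only at throughput, ignoring the consumer-group and replay requirements that also matter.

```python
def needs_sharding(storage_tb, write_qps):
    """Shard only past ~10 TB of storage or ~10K writes/sec."""
    return storage_tb > 10 or write_qps > 10_000


def needs_cache(read_qps, db_capacity_qps, db_latency_ms, latency_sla_ms):
    """Cache only when reads exceed DB capacity or latency misses the SLA."""
    return read_qps > db_capacity_qps or db_latency_ms > latency_sla_ms


def needs_kafka(events_per_sec):
    """Throughput signal only: Kafka earns its keep above ~10K events/sec."""
    return events_per_sec > 10_000


# A 1 GB table at 1K writes/sec does not justify sharding.
print(needs_sharding(storage_tb=0.001, write_qps=1_000))
# 100 req/sec against a DB that serves 10K within the SLA does not justify a cache.
print(needs_cache(read_qps=100, db_capacity_qps=10_000,
                  db_latency_ms=5, latency_sla_ms=50))
```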
Choose this variant when
- You hear yourself saying "let's add X just in case"
Worked example
Scenario: size the infrastructure for a URL shortener with 100M MAU.
Step 1 — QPS:
- DAU = 100M / 3 = ~33M (consumer app DAU/MAU ratio ~1/3).
- Write: 1 new URL per user per day = 33M / 86,400 = ~380 writes/sec avg. Peak 3x = ~1,200.
- Read: 100:1 read:write ratio. 38,000 reads/sec avg. Peak = ~115,000.
Step 2 — Storage (5 years):
- 33M URLs/day x 500 bytes x 365 x 5 = ~30 TB.
- Exceeds single Postgres (~10 TB). Sharding needed for storage. Partition by short_code hash.
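Steps 1 and 2 can be checked with a few lines of arithmetic. This sketch uses the rounded 33M DAU figure from Step 1 and the 3x peak factor; variable names are illustrative.

```python
SECONDS_PER_DAY = 86_400

dau = 33_000_000            # ~100M MAU / 3, rounded as in Step 1
urls_per_user_per_day = 1
read_write_ratio = 100
peak_factor = 3
row_bytes = 500
retention_years = 5

avg_writes = dau * urls_per_user_per_day / SECONDS_PER_DAY
peak_writes = avg_writes * peak_factor
avg_reads = avg_writes * read_write_ratio
peak_reads = avg_reads * peak_factor

storage_tb = dau * urls_per_user_per_day * row_bytes * 365 * retention_years / 1e12

print(f"writes/sec: avg {avg_writes:,.0f}, peak {peak_writes:,.0f}")
print(f"reads/sec:  avg {avg_reads:,.0f}, peak {peak_reads:,.0f}")
print(f"storage over {retention_years} years: {storage_tb:.1f} TB")
```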
Step 3 — Cache:
- 115K reads/sec at peak. Single Redis handles this (100-300K ops/sec).
- Hot set: 20% of ~60B total URLs (33M/day x 365 x 5) x 200 bytes = ~2.4 TB. Too big for one Redis.
- But 80/20 on the hot set: 20% of 2.4 TB = ~480 GB of cache. Under a strict 80/20 split applied twice, that absorbs 80% x 80% = ~64% of reads. Treat it as a floor: the more skewed the traffic, the higher the hit rate.
- 480 GB / 25 GB per Redis = ~20 Redis instances. Redis Cluster.
Step 4 — Servers:
- 115K reads/sec peak. Each app server does ~5K req/sec.
- 115K / 5K = 23 servers. Add 30% = ~30 servers.
Result: 30 app servers, ~20 Redis shards, 3+ Postgres shards. Every number is justified.
Good vs bad answer
Interviewer probe
“How many servers do you need?”
Weak answer
"A lot — we'll have millions of users. Let's use auto-scaling and it'll figure it out."
Strong answer
"33M DAU at 100:1 read/write gives us ~115K reads/sec peak. Each app server handles about 5K req/sec, so I need ~23 servers plus 30% headroom, call it 30. Cache: ~480 GB working set across ~20 Redis shards. Storage: 30 TB over 5 years, so 3+ Postgres shards partitioned by short_code hash. The bottleneck is cache hit rate: at a conservative 64%, the origin DB sees ~41K reads/sec peak, about 14K per shard across 3 shards, which sits inside the 10K-50K range of a single Postgres instance."
Why it wins: Every infrastructure decision has a number behind it. The interviewer can challenge any number and get a defended answer.
When it comes up
- During capacity estimation — the first 3–5 minutes of most design rounds
- When proposing infrastructure — every component needs a number
- When the interviewer challenges "do you actually need this?"
- When sizing cache, DB instances, or server fleet
- When pushing back on over-engineering (sharding, microservices, Kafka)
Order of reveal
1. State the base unit. "1 day = 86,400 seconds. Base number for every QPS estimate."
2. Compute QPS from DAU. "DAU × actions/user / 86,400 = avg QPS. Peak is 2-3× for consumer, 10× for flash sales."
3. State storage with retention. "Rows/day × size × retention_years. Without retention, the number is meaningless."
4. Size each component from single-instance capacity. "Redis: 100k-300k ops/s. Postgres: 10k-50k TPS. App: 1k-10k req/s. Divide peak by per-instance to size fleet."
5. Add 30% headroom. "Always size for peak, not average, and add headroom for growth and failure margin."
6. Call out cache hit rate as the lever. "At X% hit rate, origin sees only (1-X)× the QPS. This is where the biggest scaling wins live."
7. Push back on premature scaling. "At Y QPS / Z TB, we do NOT need sharding/Kafka/microservices yet. Here's the threshold at which we would."
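The hit-rate lever is easy to quantify. A minimal sketch, assuming a hypothetical 100K peak QPS and a helper named `origin_qps`:

```python
def origin_qps(peak_qps, hit_rate):
    """QPS that falls through the cache to the origin at a given hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return peak_qps * (1.0 - hit_rate)


# Each step up in hit rate cuts origin load sharply.
for rate in (0.80, 0.90, 0.95, 0.99):
    print(f"hit rate {rate:.0%}: origin sees {origin_qps(100_000, rate):,.0f} QPS")
```

Going from 90% to 95% halves the load on the origin, which is why hit rate is usually worth more attention than adding database capacity.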
Signature phrases
- “1 day = 86,400 seconds” — The single most-used number in capacity estimation.
- “Don't shard until 10 TB or 10k write QPS” — Specific threshold that pushes back on premature complexity.
- “Peak is 2-3× average” — Prevents under-sizing for the worst hour of the day.
- “Working set is ~20% of data” — Heuristic for sizing cache without guessing.
- “Modern Postgres handles billions of rows” — Counter to the reflexive "scales poorly" narrative.
- “Every number has a justification” — Frames your sizing as defendable, not vibes.
Likely follow-ups
“Your system is at 1k write QPS. Do you need to shard the database?”
No. A single well-tuned Postgres instance handles 10k–50k writes/sec. At 1k QPS you're using 2–10% of a single instance.
When sharding IS justified:
1. Sustained write QPS above ~10k/s on a single hot table
2. Dataset size > 10 TB (backup, recovery, and dump times become prohibitive)
3. Access-pattern divergence: different tables want different sharding strategies
At 1k QPS, the right optimisations are:
- Indexes tailored to access patterns (the biggest lever)
- Read replicas for read scaling
- Connection pooling (PgBouncer)
- Partitioning by time (for large but low-QPS tables)
Sharding at 1k QPS adds massive operational complexity (cross-shard queries, rebalancing, ops surface) for zero performance benefit. It is a common over-engineering trap at the senior level — pushing back on it shows maturity.
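The "2-10% of a single instance" figure is just load divided by capacity at both ends of the range. A quick sketch, with `utilization_range` as an illustrative helper name:

```python
def utilization_range(load_qps, capacity_low_qps, capacity_high_qps):
    """Share of one instance in use, as (best case, worst case) fractions."""
    return load_qps / capacity_high_qps, load_qps / capacity_low_qps


best, worst = utilization_range(1_000, 10_000, 50_000)
print(f"{best:.0%} to {worst:.0%} of a single instance")
```

Quoting the worst-case figure in the interview is the safer habit: even at the low end of Postgres capacity, 1k QPS leaves 90% of the instance idle.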
“How do you estimate cache size for a system?”
Working set × object size, with a hit-rate target.
Step 1 — estimate working set. The 80/20 rule: about 20% of data handles 80% of traffic. For really skewed distributions (celebrity content, viral posts), 5% handles 95%. For uniform distributions (random-access KV), working set ≈ all data.
Step 2 — compute cache memory.
working_set_fraction × total_items × avg_object_size
Example: 1B URLs × 20% working set × 500 bytes = 100 GB cache.
Step 3 — pick instance count. Single Redis maxes at ~25 GB RAM (larger needs Redis Cluster). 100 GB / 25 GB = 4 Redis shards.
Step 4 — validate the hit rate target. Typically design for 90–95% hit rate. At 95% hit rate, origin sees 5% of traffic. If origin capacity at 5% is comfortable, the cache is sized right.
Step 5 — plan for cold start. On full cache flush, origin must handle 100% of traffic temporarily. Either (a) warm the cache with a replay job, or (b) size origin for full load.
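Steps 2 through 5 can be folded into one sizing helper. This is a sketch under stated assumptions: `size_cache` is a hypothetical function, the 25 GB per-instance limit and 95% hit-rate target come from the steps above, and the 100K peak QPS in the example call is made up for illustration.

```python
def size_cache(total_items, working_set_fraction, avg_object_bytes,
               peak_qps, hit_rate=0.95, redis_ram_bytes=25 * 10**9):
    """Return cache memory, shard count, and origin load for a cache plan."""
    working_set_items = round(total_items * working_set_fraction)
    cache_bytes = working_set_items * avg_object_bytes
    # Integer ceiling division avoids float rounding at exact multiples.
    shards = -(-cache_bytes // redis_ram_bytes)
    return {
        "cache_gb": cache_bytes / 10**9,
        "redis_shards": shards,
        # Steady state: only misses reach the origin.
        "origin_qps_steady": peak_qps * (1 - hit_rate),
        # Cold start after a full flush: the origin takes everything.
        "origin_qps_cold_start": peak_qps,
    }


plan = size_cache(total_items=1_000_000_000, working_set_fraction=0.20,
                  avg_object_bytes=500, peak_qps=100_000)
for key, value in plan.items():
    print(f"{key}: {value:,.0f}")
```

Comparing `origin_qps_steady` with `origin_qps_cold_start` shows the size of the gap a warm-up job has to cover.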
“A cross-region call takes 60 ms. What does that mean for your design budget?”
Every serial cross-region hop costs you 60 ms off the latency budget.
Implications:
1. Parallelise what you can. If a request needs data from 3 regions, do all 3 calls in parallel: the budget is max(60, 60, 60) = 60 ms, not 180 ms.
2. Cache aggressively in the caller region to eliminate the round trip entirely.
3. Co-locate request path data. If users in EU always need EU data, serve them from EU; don't ping US-East on every request.
4. Use async for non-critical cross-region work. Replication, backup, analytics: send cross-region through a queue, don't block the request.
Hard facts:
- US-East ↔ US-West: ~60 ms
- US-East ↔ EU-West: ~80 ms
- US-East ↔ AP-Southeast: ~150 ms
- Speed of light in fiber: ~200,000 km/s — the floor no optimisation breaks
At a 100 ms p95 budget, a single 60 ms cross-region hop consumes more than half of it, and two serial hops blow it outright. Serving each region from local data is the reliable way to stay inside the budget.
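The serial-versus-parallel arithmetic and the fiber floor can be sketched in a few lines. The helper names are illustrative, and the 4,000 km path length is an assumed round figure for a long same-continent link, not a measured route.

```python
FIBER_KM_PER_SEC = 200_000  # approximate speed of light in fiber


def serial_ms(hops_ms):
    """Serial calls: latencies add."""
    return sum(hops_ms)


def parallel_ms(hops_ms):
    """Parallel calls: the slowest one sets the latency."""
    return max(hops_ms)


def min_rtt_ms(distance_km):
    """Physical floor for a round trip over a fiber path of this length."""
    return 2 * distance_km / FIBER_KM_PER_SEC * 1_000


calls = [60, 60, 60]
budget_ms = 100

print("serial:", serial_ms(calls), "ms")
print("parallel:", parallel_ms(calls), "ms")
print("left after one hop:", budget_ms - parallel_ms(calls), "ms")
print(f"fiber floor for 4,000 km: {min_rtt_ms(4_000):.0f} ms")
```

Real round trips sit above the floor because routes are not straight lines and every router adds delay.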
Common mistakes
Modern hardware is powerful. A single Postgres on an m5.xlarge handles far more than candidates expect. Don't propose sharding for 10K QPS. Do the math with current instance sizes.
"1 KB per row x 1M rows/day = 1 GB/day." For how long? 1 year = 365 GB. 5 years = 1.8 TB. Retention determines whether you need sharding. Always multiply by time horizon.
Peak is 2-3x average for consumer apps, 5-10x for event-driven spikes. Size infrastructure for peak. Your SLA doesn't say "available on average".
1 day = 86,400 seconds. This is the single most-used number in capacity estimation. Memorize it. 100M events/day / 86,400 = ~1,157 events/sec.
Practice drills
Twitter has 300M MAU, 50M DAU, average 20 tweet reads per session. What's the read QPS?
Assuming one session per user per day: 50M DAU x 20 reads / 86,400 = ~11,574 QPS average. Peak (3x) = ~35K QPS. A single Postgres handles this with indexes, but you want read replicas + cache for latency. This is a caching problem, NOT a sharding problem.
Each tweet is 1KB. 500M tweets/day. How much storage per year?
500M x 1KB = 500 GB/day. x 365 = ~180 TB/year. Way beyond a single Postgres (~10 TB). Sharding is necessary for storage, not QPS. Partition by user_id. Archive old tweets to cold storage to keep the hot dataset at ~10-20 TB.
You need a Redis cache for 1B short URLs at 100K reads/sec. How many Redis instances?
QPS: 100K reads/sec — one Redis handles this. But memory: 1B x ~200 bytes = ~200 GB. Single Redis maxes at ~25 GB. So 200/25 = 8 Redis shards for memory, even though one handles the QPS. The bottleneck is memory, not throughput. Use Redis Cluster, shard by short URL key.
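The last drill generalises: the shard count is the larger of the memory-driven and throughput-driven counts. A sketch with hypothetical helper names, using the ~25 GB and 100K ops/sec per-instance figures from this page as defaults:

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def redis_shards(items, bytes_per_item, peak_qps,
                 ram_bytes_per_instance=25 * 10**9,
                 qps_per_instance=100_000):
    """Return (shards needed, shards for memory, shards for throughput)."""
    by_memory = ceil_div(items * bytes_per_item, ram_bytes_per_instance)
    by_qps = ceil_div(peak_qps, qps_per_instance)
    return max(by_memory, by_qps), by_memory, by_qps


total, by_memory, by_qps = redis_shards(1_000_000_000, 200, 100_000)
print(f"memory needs {by_memory}, throughput needs {by_qps}, provision {total}")
```

Reporting both counts makes the bottleneck explicit, which is the point of the drill.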
Cheat sheet
- 1 day = 86,400 seconds. Memorize it.
- QPS from DAU: DAU x actions/user / 86,400. Peak = 2-3x average.
- Redis: 100K-300K ops/sec. Postgres: 10K-50K TPS.
- Cross-AZ: 1-2 ms. Cross-region (same continent): 30-60 ms. Cross-continent: 80-120 ms.
- Don't shard until storage > 10 TB or write QPS > 10K.
- Don't cache until read QPS > DB capacity or latency > SLA.
- Working set for cache: ~20% of data (80/20 rule).
- 1 KB x 1M rows = 1 GB. The base unit for storage math.
- Server count: (peak_QPS / QPS_per_server) x 1.3, i.e. 30% headroom.