
Caching strategy

What to cache, TTLs, invalidation, write-through vs write-back, stampede.

"We'll add a cache" is where weak designs die. Interviewers ask: which one, caching what exactly, with what TTL, invalidated how, behind which API boundary? If you can't answer all five, the cache line in your diagram is decoration.

Read this if your last attempt…

You said "we'll use Redis" and got asked how invalidation works
Your design has three caches and you can't explain how they interact
You were surprised by a cache stampede question
You never set a TTL — just "a cache"
You couldn't answer "what happens when the cache is down?"

The concept

A cache is a deliberate trade: you give up some consistency to buy latency and cost savings. Every cache decision is really four decisions at once: where in the stack it sits, what it stores, how fresh the data is, and what happens when it fails.

Saying "add Redis" without naming those four things is a red flag. The senior move is to name the pattern explicitly and defend the TTL against a concrete staleness tolerance.

Architecture diagram· The five-layer cache stack

A request passes through up to five caches before reaching origin. Senior designs name each layer explicitly.

Quick triage across the six patterns.

Pattern	Staleness	Write latency	Failure blast radius	Best for
Cache-aside	TTL-bounded	Unchanged	Transparent fallback to DB	General read workloads
Read-through (lib)	TTL-bounded	Unchanged	Same as cache-aside	Large codebase consistency
Write-through	Near-zero	+1 hop	Writes fail when cache is down	Read-your-write
Write-behind	Near-zero	Fastest	Data loss on cache crash	Low-value high-volume writes
Refresh-ahead	Near-zero	Unchanged	Same as cache-aside	Known hot keys
Negative caching	TTL-bounded	Unchanged	Same as cache-aside	Public key lookups

Most real systems compose two or three of these — cache-aside as the baseline, negative caching on public lookups, refresh-ahead on the known hot keys.

How interviewers grade this

You name a specific cache technology (Redis, Memcached, CDN, in-process LRU) and justify the choice against the access pattern.
You choose a caching pattern explicitly — cache-aside vs write-through vs write-behind — and say why.
You set a TTL and explain why that TTL is acceptable for this use case.
You have a cache invalidation story for writes that mutate cached data.
You address at least one pathological case: cold cache, hot key, stampede, stale read.
You size the cache — working set, hit rate, shard count — instead of saying "we'll cache it".

Variants

Cache-aside (lazy loading)

App owns the cache; populate on miss.

Architecture diagram· Cache-aside (read-through)

The app owns the cache. On miss, fetch from DB, populate cache, return. Most common pattern by a mile.

The default pattern. The app tries the cache first; on miss, reads from DB, writes back to cache, returns. Simple, widely deployed, and the one every senior engineer reaches for first.

Invalidation: explicit DEL on write, TTL as safety net. Some teams rely on TTL alone — fine if staleness bound is acceptable.

Watch out for: the cold-start stampede (see pitfalls) and the "write to DB, forget to invalidate" bug. In a microservices setup you also need to make sure that every service with a write path that touches cacheable data either DELs the key or publishes an invalidation.
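A minimal sketch of the read and write paths described above, assuming an in-memory `TTLCache` as a stand-in for Redis and a plain dict as the origin. The names `TTLCache`, `read_user`, and `write_user` are hypothetical and only illustrate the shape of the pattern.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for Redis: get/set with expiry, plus delete."""

    def __init__(self, clock=time.monotonic):
        self._data = {}          # key -> (value, expires_at)
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key, value, ttl_sec):
        self._data[key] = (value, self._clock() + ttl_sec)

    def delete(self, key):
        self._data.pop(key, None)


db = {"user:1": "Ada"}           # hypothetical origin store
cache = TTLCache()


def read_user(key, ttl_sec=60):
    value = cache.get(key)
    if value is None:            # miss: fetch from origin, then populate
        value = db.get(key)
        if value is not None:
            cache.set(key, value, ttl_sec)
    return value


def write_user(key, value):
    db[key] = value              # 1) write origin first
    cache.delete(key)            # 2) explicit DEL; the TTL is the safety net
```

A write that skips `write_user` and touches `db` directly reproduces the "forgot to invalidate" bug: readers keep seeing the cached value until the TTL fires.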

Pros

+Simple; no coupling between cache and DB
+App tolerates cache outage (falls through to DB)
+Works with any cache + DB combo

Cons

−Stale on DB write if invalidation is missed
−First request after expiry is slow
−Stampede risk on hot key expiration

Choose this variant when

Read-heavy workloads where freshness can tolerate TTL-level staleness
You control both the app code and the DB
You want to ship today, not tomorrow

Read-through (library-owned)

A cache library owns the miss path; app just calls get(key).

A close cousin of cache-aside where the cache client handles the "miss → fetch from origin → populate → return" dance for you. Apps just call cache.get(key, loader) and the library does the rest. Semantically identical to cache-aside but cleaner to implement consistently across a large codebase.

When it's worth it: medium-to-large services where multiple teams touch the cache. Centralizing the miss path in a library prevents divergent bugs (someone forgets to populate, someone forgets to coalesce requests).

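Examples in the wild: Caffeine's LoadingCache (Java) and AWS DAX for DynamoDB. With Redis (for example on Amazon ElastiCache) the loader has to live in a client-side wrapper, because Redis Lua scripts cannot call an external origin.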
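To make the `cache.get(key, loader)` shape concrete, here is a small sketch of a library-owned miss path. `ReadThroughCache` is a hypothetical class, not the API of any of the libraries named above; it keeps hit and miss counters to show where shared metrics would live.

```python
import time
from typing import Any, Callable, Dict, Tuple


class ReadThroughCache:
    """The library owns the miss path; callers only supply a loader."""

    def __init__(self, ttl_sec: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_sec
        self._clock = clock
        self._data: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key: str, loader: Callable[[], Any]) -> Any:
        entry = self._data.get(key)
        if entry is not None and self._clock() < entry[1]:
            self.hits += 1
            return entry[0]
        # Miss: fetch from origin, populate, return. Every caller gets the
        # same behaviour, so nobody can forget the populate step.
        self.misses += 1
        value = loader()
        self._data[key] = (value, self._clock() + self._ttl)
        return value
```

A call site then reduces to something like `cache.get("user:1", lambda: fetch_user(1))`, where `fetch_user` is whatever origin call the app already has.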

Pros

+Consistent miss-path behaviour across all callers
+Library can bake in single-flight, metrics, and stampede protection
+App code is simpler

Cons

−You inherit the library's opinions
−Harder to reason about if the library is a black box

Choose this variant when

Multiple teams share the cache and divergent miss paths are a real risk
You want metrics and stampede protection for free

Write-through

Writes hit cache + DB synchronously.

Architecture diagram· Write-through

Writes go to cache AND DB synchronously. Cache is always consistent with DB.

Every write goes to cache and DB in the same logical transaction. The cache is always consistent with the DB — at the cost of write latency (you pay both hops).

Use when: you need near-zero staleness on freshly-written keys and writes are a small fraction of total traffic. User profile updates and settings pages are the canonical fit.

Watch out for: if the cache is down, writes fail unless you fall back to DB-only and hope to repopulate. Also beware caching writes that will never be read — a one-time settings change pollutes cache memory for nothing.
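The fallback decision above can be sketched as follows. This is an illustration under simplifying assumptions: the DB is a dict, `DictCache` is a hypothetical cache client with an `available` switch to simulate an outage, and `write_through` is a made-up helper name.

```python
class CacheDown(Exception):
    """Raised by the stand-in cache when it is unavailable."""


class DictCache:
    def __init__(self):
        self.data = {}
        self.available = True    # flip to False to simulate an outage

    def set(self, key, value):
        if not self.available:
            raise CacheDown()
        self.data[key] = value

    def get(self, key):
        if not self.available:
            raise CacheDown()
        return self.data.get(key)


def write_through(db, cache, key, value):
    """Write DB and cache synchronously; return True if the cache was updated."""
    db[key] = value              # durable write first
    try:
        cache.set(key, value)    # second hop, paid on every write
        return True
    except CacheDown:
        # DB-only fallback. The cache may still hold an older value for this
        # key when it comes back, so the caller should schedule a DEL or a
        # repopulate; a TTL bounds the damage if that step is missed.
        return False
```

Returning a flag instead of raising is one design choice among several; a stricter variant would fail the write when the cache hop fails, which is the behaviour the "Cons" list below warns about.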

Pros

+Cache is never stale relative to DB
+No "forgot to invalidate" class of bug
+Reads after a write are always hot

Cons

−Every write pays two hops
−Cache outage fails writes (unless you fall back carefully)
−Caches cold-written data even if it'll never be read

Choose this variant when

User profile or settings pages where the user immediately reads what they wrote
Write volume is small enough that caching every write is cheap

Write-behind (write-back)

Write to cache; flush to DB async.

Architecture diagram· Write-behind

Write to cache; cache flushes to DB async. Great for high write throughput but risks loss on crash.

Writes return after hitting the cache. A background job batches and flushes to DB. Massively improves write latency and lets you coalesce bursts — at the cost of durability risk and operational complexity.

Reality check: most teams that think they want this actually want a proper async queue + consumer. Write-behind is a fancy way to say "my cache is my write buffer" and it's a rough failure mode — if the cache node dies before flush, the writes are gone.
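A rough sketch of the buffering and coalescing behaviour, assuming a single-threaded caller and a `flush_fn` callback that writes a batch to the DB. `WriteBehindBuffer` is a hypothetical name; a real deployment would also need a timer-driven flush and a durability story.

```python
class WriteBehindBuffer:
    """Accept writes in memory and flush them to the DB in batches."""

    def __init__(self, flush_fn, max_pending=1000):
        self._pending = {}            # key -> latest value (coalesces bursts)
        self._flush_fn = flush_fn     # e.g. a bulk upsert into the DB
        self._max_pending = max_pending

    def write(self, key, value):
        self._pending[key] = value    # returns without touching the DB
        if len(self._pending) >= self._max_pending:
            self.flush()

    def flush(self):
        """Push the pending batch to the DB; return how many keys were sent."""
        if not self._pending:
            return 0
        batch, self._pending = self._pending, {}
        try:
            self._flush_fn(batch)
        except Exception:
            # Keep the batch so a later flush can retry; newer writes win.
            batch.update(self._pending)
            self._pending = batch
            raise
        return len(batch)
```

Everything in `_pending` lives only in process memory, which is exactly the loss window the reality check describes: if the process dies before `flush`, those writes are gone.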

Pros

+Very low write latency
+Natural batching / coalescing of bursts
+Can smooth over brief DB outages

Cons

−Data loss if cache node dies before flush
−Complex to reason about (what's durable? when?)
−Hard to rebuild correctly after an outage

Choose this variant when

Metrics, counters, analytics — low-value writes where occasional loss is fine
Known bursty workloads where DB can't absorb spikes

Refresh-ahead

Proactively refresh keys before TTL expiry.

For hot keys with predictable access, refresh the cache entry before TTL expires so users never see a miss. Usually implemented as a background job that samples keys with high access frequency and re-fetches from origin.

Trade-off: wasted work if a refreshed key isn't read again. Only worth it for the top-N hot keys.
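One way to picture the background job, as a sketch rather than a production design: the cache counts reads per key, and a periodic call to `refresh_hot_keys` re-fetches only the top-N keys that are within a margin of expiry. The class name, the `refresh_margin_sec` parameter, and the use of a simple counter for access frequency are all assumptions.

```python
import time
from collections import Counter


class RefreshAheadCache:
    def __init__(self, loader, ttl_sec, refresh_margin_sec, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_sec
        self._margin = refresh_margin_sec
        self._clock = clock
        self._data = {}               # key -> (value, expires_at)
        self._reads = Counter()       # crude access-frequency tracking

    def _store(self, key):
        value = self._loader(key)
        self._data[key] = (value, self._clock() + self._ttl)
        return value

    def get(self, key):
        self._reads[key] += 1
        entry = self._data.get(key)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]
        return self._store(key)       # ordinary miss path

    def refresh_hot_keys(self, top_n):
        """Run from one background job: re-fetch hot keys close to expiry."""
        now = self._clock()
        refreshed = []
        for key, _count in self._reads.most_common(top_n):
            entry = self._data.get(key)
            if entry is not None and entry[1] - now <= self._margin:
                self._store(key)
                refreshed.append(key)
        return refreshed
```

Because only the background job calls `refresh_hot_keys`, the refresh is a singleton per key, and keys outside the top N simply fall back to the normal miss path.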

Pros

+Eliminates post-expiry latency spikes
+No stampede — background refresh is singleton
+User-perceived freshness improves

Cons

−Wasted refreshes on keys that go cold
−Needs access-frequency tracking
−Extra moving part to operate

Choose this variant when

Top-N celebrity profiles on a social app
Hot product pages during a flash sale
Global config or feature flags read on every request

Negative caching

Cache "not found" too — so misses do not re-hit origin.

When a key genuinely does not exist in origin (deleted user, invalid short code, 404), you still want to cache that fact for a short TTL. Otherwise every lookup of a non-existent key becomes a full origin miss and an attacker can hammer origin just by generating random keys.

How: store a sentinel value (null, a tombstone, or a typed "not-found" marker) with a short TTL (seconds to a minute). On next lookup, the app sees the sentinel and returns 404 without touching origin.

Watch out for: legitimate keys that briefly looked absent (race between write and read). Keep the negative TTL short enough that a newly-created key becomes visible quickly.
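The sentinel approach can be sketched like this, assuming an `origin_get` callback that returns `None` for absent keys. The class name and the default TTLs are illustrative, and the sketch does not bound memory, which a real implementation would need to do.

```python
import time

_NOT_FOUND = object()   # sentinel stored in place of a missing value


class NegativeCachingLookup:
    def __init__(self, origin_get, ttl_sec=600, negative_ttl_sec=30,
                 clock=time.monotonic):
        self._origin_get = origin_get        # returns None when the key is absent
        self._ttl = ttl_sec
        self._negative_ttl = negative_ttl_sec
        self._clock = clock
        self._data = {}                      # key -> (value or sentinel, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and now < entry[1]:
            # A cached sentinel answers "not found" without touching origin.
            return None if entry[0] is _NOT_FOUND else entry[0]
        value = self._origin_get(key)
        if value is None:
            self._data[key] = (_NOT_FOUND, now + self._negative_ttl)
        else:
            self._data[key] = (value, now + self._ttl)
        return value
```

The short negative TTL is what limits how long a newly created key stays hidden behind a cached "not found".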

Pros

+Shields origin from lookup storms on non-existent keys
+Simple to add to any cache-aside implementation

Cons

−Briefly hides newly-created keys
−Cardinality grows with attack surface — needs bounded memory

Choose this variant when

Public key-lookup endpoints (URL shorteners, short-link expanders)
Where enumeration attacks are likely

Worked example

Scenario: Caching the feed timeline for a social app. 100M DAU, 10 timeline fetches/user/day.

Step 1 — sizing

Reads: 100M × 10 / 86,400 ≈ 12K reads/sec avg, × 4 peak = ~50K reads/sec
Payload: 30 posts × 1 KB each = ~30 KB per timeline
Working set: 100M users × 30 KB hot timeline = ~3 TB total if we cached all
Hit rate target: 95% (most users refresh their OWN timeline repeatedly)
Cache memory: 80/20 hot set = ~600 GB
Shards: 600 GB / 25 GB usable per Redis shard = ~24 shards
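The sizing arithmetic above can be checked with a few lines. The inputs are the numbers from the scenario; the variable names and the use of decimal units (1 KB = 1,000 bytes, 1 GB = 10^9 bytes) are assumptions made for round figures.

```python
import math

dau = 100_000_000                 # daily active users
fetches_per_user = 10             # timeline fetches per user per day
peak_factor = 4                   # peak-to-average ratio used in the estimate

avg_reads_per_sec = dau * fetches_per_user / 86_400
peak_reads_per_sec = avg_reads_per_sec * peak_factor

timeline_bytes = 30 * 1_000       # 30 posts x ~1 KB each
full_set_bytes = dau * timeline_bytes          # if every timeline were cached
hot_set_bytes = full_set_bytes * 20 // 100     # 80/20 rule: cache the hot 20%

usable_bytes_per_shard = 25 * 10**9
shards = math.ceil(hot_set_bytes / usable_bytes_per_shard)

print(f"avg reads/sec:  {avg_reads_per_sec:,.0f}")
print(f"peak reads/sec: {peak_reads_per_sec:,.0f}")
print(f"full set: {full_set_bytes / 1e12:.1f} TB, hot set: {hot_set_bytes / 1e9:.0f} GB")
print(f"shards: {shards}")
```

The exact figures (about 11.6K average and 46K peak reads/sec) round to the 12K and 50K quoted in the list.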

Step 2 — pattern choice

Pattern: cache-aside with a 30-second TTL
Why: feeds tolerate 30s staleness; write path (new post) already pushes invalidations
Invalidation: on new post, publish to Kafka topic; each region's cache subscribes and DELs affected users' timelines

Step 3 — stampede + hot key

Stampede: single-flight per user — one in-flight timeline-build per user key, others wait up to 500ms for result
Hot key: not usually a problem for per-user timelines, but celebrity feeds see hot reads. In-proc LRU on each app server catches 90% of reads for the top 10K users

Step 4 — failure modes

Cache down: circuit-breaker to in-proc LRU only; rate-limit origin to 20% of normal; return "timeline unavailable, try again" for tail users (graceful degradation)
Stale read: acceptable up to TTL; writer pushes invalidation so in worst case users see a 30s delay on seeing a new post

Step 5 — write path

New post → insert into DB → publish FeedInvalidation event → Kafka consumer walks the poster's followers → DEL each follower's cached timeline
Fan-out cost: ~200 avg followers × 100M daily posts ≈ 20B invalidations/day ≈ 230K invalidations/sec averaged over the day (roughly 4× that, ~925K/sec, at peak if the read peak factor also applies to posts). Size Kafka + consumer for this.
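A quick check of the fan-out estimate. Dividing the daily total evenly over 86,400 seconds gives an average rate; the peak figure below reuses the 4× factor from step 1, which is an assumption about the write path rather than something the scenario states.

```python
followers_avg = 200
posts_per_day = 100_000_000
peak_factor = 4                       # assumed: same peak ratio as reads

invalidations_per_day = followers_avg * posts_per_day
avg_per_sec = invalidations_per_day / 86_400
peak_per_sec = avg_per_sec * peak_factor

print(f"per day:  {invalidations_per_day:,}")
print(f"avg/sec:  {avg_per_sec:,.0f}")
print(f"peak/sec: {peak_per_sec:,.0f}")
```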

This whole thing takes 4 minutes in an interview. Every decision is traceable to a number.

Good vs bad answer

Interviewer probe

“How are you caching the URL shortener's read path?”

Weak answer

"Redis in front of Postgres — we cache the short code → long URL mapping. Hit rate will be high because redirects are concentrated."

Strong answer

"Cache-aside, two-tier. In-process LRU on each app server holds the top 100k hot keys — that absorbs viral spikes. Redis behind that for the long tail, 10-minute TTL. On miss we go to Postgres, populate both tiers, return. Invalidation: explicit DEL on delete + TTL as safety net. For stampede on popular expiring keys I use request coalescing — only one in-flight fetch per key. Negative caching on unknown codes with a 60s TTL to shield origin from enumeration. Hot key on one Redis shard is the residual risk; in-proc LRU handles it because the same shard would get all the traffic anyway."

Why it wins: Names the pattern, the TTL, the invalidation, the stampede protection, the hot-key mitigation, and the negative-caching defence — all six of the things interviewers probe.

Interview playbook· 5–7 min across the read path

When it comes up

Any read-heavy system (almost all of them)
After you name QPS in the capacity pass — interviewer asks how you will handle it
When read load exceeds what a single primary can serve and you need to offload reads
Whenever latency SLO is < 50ms p99 on a DB-backed endpoint
When the prompt includes "hot content" or "trending" or "viral"

Order of reveal

1. Name the pattern. "Cache-aside as the default: simple, tolerates cache outages, works everywhere."
2. Name the technology. "Redis for server-side key-value cache; in-process LRU above it for the hottest 1% of keys; CDN at the edge for anything publicly cacheable."
3. Size it. "Working set is ~600 GB, so ~24 Redis shards at 25 GB usable each."
4. Set the TTL. "30-second TTL: feeds tolerate this staleness, and it bounds the invalidation-miss blast radius."
5. Wire invalidation. "Writes publish an invalidation event; each region's cache consumer DELs affected keys. TTL is the safety net."
6. Handle stampede. "Single-flight per key: one in-flight origin fetch per key, others wait. Bounded to 500ms."
7. Handle hot keys. "In-proc LRU above Redis absorbs hot reads. For truly viral keys, publish to CDN edge so 1000s of POPs share load."
8. Handle cache failure. "Circuit-break on Redis errors, fall through to in-proc LRU only, rate-limit origin to prevent cascade."

Signature phrases

“Cache-aside as the default, with a stampede defence” — Names both the pattern and its most common failure mode.
“TTL is my staleness budget — I can defend 30 seconds” — Shows TTL is a consequence of product requirement, not a guess.
“In-proc LRU above Redis catches the top 1% of keys” — Layered caching prevents hot shards without adding ops complexity.
“Single-flight per key to prevent stampede” — Names the specific technique, not just "we'll prevent stampede".
“Negative caching with a 60-second TTL” — Shows awareness of enumeration attacks on public key lookups.
“On cache outage, circuit-break to origin with rate limits” — Demonstrates failure-mode thinking the interviewer rarely has to probe for.

Likely follow-ups

“What if Redis goes down?”

Three-stage response:

1. Circuit-break immediately: the app detects Redis errors via health-check and stops attempting Redis calls for 30s.
2. Fall through to in-proc LRU: still serves the top 1% of traffic with zero network calls.
3. Rate-limit origin: cap origin QPS at something it can sustain (say 20% of normal cached throughput). Return graceful degradation responses for excess traffic.

The key point: you never let a Redis outage cascade into an origin meltdown. The fallback path has to be explicitly rate-limited.
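The explicit rate limit on the fallback path could be a token bucket in front of origin. The sketch below is one common way to do it, not something the design above prescribes; `TokenBucket`, `rate_per_sec`, and `burst` are hypothetical names.

```python
import time


class TokenBucket:
    """Allow at most `rate_per_sec` origin calls on average, with a small burst."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._rate = float(rate_per_sec)
        self._capacity = float(burst)
        self._tokens = float(burst)
        self._clock = clock
        self._last = clock()

    def try_acquire(self):
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(self._capacity,
                           self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False      # caller should return a degraded response instead
```

A request that fails `try_acquire()` gets the graceful-degradation response rather than an origin call, which is what keeps the cache outage from cascading.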

“How do you prevent cache stampede on a hot key expiring?”

Two options, use one or both:

Single-flight (request coalescing): library-level. When a key misses, the first request fetches from origin; subsequent requests for the same key wait for that in-flight fetch to return. Bounds the stampede to exactly one origin call per key regardless of how many concurrent misses.
Probabilistic early expiration: before TTL, each read has a small and growing probability of triggering a refresh. By the time TTL actually hits, the key has almost certainly been refreshed already. XFetch algorithm is the reference implementation.

I'd use single-flight as the default because it needs no tuning; add probabilistic expiration for the top-N hot keys.
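For the probabilistic option, the XFetch-style check is commonly written as "refresh if now − δ·β·ln(u) ≥ expiry", where δ is how long a recompute takes, β ≥ 1 makes refreshes more eager, and u is uniform in (0, 1]. The sketch below follows that formulation; the function name and the injectable `rng` and `now` parameters are assumptions added for testability.

```python
import math
import random
import time


def should_refresh_early(expires_at, recompute_sec, beta=1.0, now=None,
                         rng=random.random):
    """Return True if this read should refresh the key before its TTL expires."""
    now = time.time() if now is None else now
    u = 1.0 - rng()                       # rng() is in [0, 1), so u is in (0, 1]
    # log(u) <= 0, so subtracting it pushes "now" forward by a random amount.
    # The closer the key is to expiry, the more likely the shifted time
    # crosses expires_at and triggers a refresh.
    return now - recompute_sec * beta * math.log(u) >= expires_at
```

On a cache hit, a reader would call this with the entry's expiry and last measured recompute time, and refresh in the background when it returns True.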

“How do you invalidate the cache when the underlying data changes?”

Two strategies, usually combined:

Explicit DEL on write: every write path that mutates cacheable data deletes the corresponding cache keys before returning. This is the most correct approach but hard to enforce across large codebases.
TTL as safety net: even if DEL is missed, the entry expires in seconds-to-minutes.

For cross-service invalidation, publish an invalidation event on Kafka/pub-sub; each service subscribes and DELs its own affected keys. For cross-region, publish to a global topic; each region's consumer DELs its local cache. Consistency is eventual and bounded by topic lag — usually < 1 second.

“Your cache hit rate dropped from 95% to 70%. What do you investigate?”

Walk the list:

1. Working set grew: did a new feature blow up cardinality? (e.g. per-device keys instead of per-user)
2. TTL shortened: someone lowered a TTL to fix a staleness bug and didn't notice the hit-rate impact.
3. Invalidation floods: an upstream change is DELing more keys than expected.
4. Memory pressure / evictions: Redis is evicting because working set exceeds capacity; check evicted_keys metric.
5. Cold start: recent cache restart; warming up.

The instrument to check: keyspace_hits / (keyspace_hits + keyspace_misses) in Redis, plus application-level hit/miss metrics. If evictions are non-zero, you're over capacity — add shards or shrink the working set.
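The hit-rate formula above is easy to wrap in a helper. This sketch takes a plain dict shaped like the stats section of Redis INFO (with redis-py that dict would typically come from `client.info("stats")`); the function name `cache_health` and the returned keys are hypothetical.

```python
def cache_health(stats: dict) -> dict:
    """Summarise hit rate and eviction pressure from Redis INFO stats."""
    hits = int(stats.get("keyspace_hits", 0))
    misses = int(stats.get("keyspace_misses", 0))
    lookups = hits + misses
    return {
        "hit_rate": hits / lookups if lookups else 0.0,
        # evicted_keys is a counter since server start, so in practice you
        # would alert on its rate of change rather than the raw value.
        "evicting": int(stats.get("evicted_keys", 0)) > 0,
    }
```

Both counters are cumulative, so comparing two samples taken a minute apart gives a more useful current hit rate than the lifetime ratio.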

Code examples

Python · Cache-aside with single-flight stampede protection

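import threading

import redis

# Assumes a reachable Redis instance; adjust host/port for your setup.
cache = redis.Redis()

_inflight = {}                      # key -> Event set when the leader finishes
_inflight_lock = threading.Lock()   # guards _inflight only; never held during I/O

def get_with_singleflight(key, loader, ttl_sec=60, wait_sec=0.5):
    value = cache.get(key)
    if value is not None:
        return value
    # stampede defence: one in-flight loader per key
    with _inflight_lock:
        event = _inflight.get(key)
        is_leader = event is None
        if is_leader:
            event = threading.Event()
            _inflight[key] = event
    if is_leader:
        try:
            value = loader()                     # origin fetch, outside the lock
            cache.set(key, value, ex=ttl_sec)
            return value
        finally:
            with _inflight_lock:
                _inflight.pop(key, None)
            event.set()                          # wake the followers
    # follower: wait for the leader, then re-read the cache
    event.wait(timeout=wait_sec)
    return cache.get(key)   # may be None if the leader failed or was too slow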

HTTP · Caching headers that actually do work

Cache-Control: public, max-age=86400, s-maxage=300, stale-while-revalidate=60
ETag: "v42-abc123"
Vary: Accept-Encoding, Accept-Language

# public          — CDN/edge may cache
# max-age=86400   — browser keeps for 1 day
# s-maxage=300    — shared caches (CDN) keep for 5 min
# stale-while-revalidate=60  — serve stale up to 60s while refetching in bg
# ETag            — conditional revalidation (304 Not Modified on match)
# Vary            — split cache by header

TypeScript · Circuit breaker fallback when cache is down

// On Redis errors, short-circuit for 30s and serve from in-proc LRU + rate-limited origin.
// Assumes `redis`, `lru`, `originLimiter`, and `db` are clients created elsewhere in the app.
import CircuitBreaker from 'opossum';

const breaker = new CircuitBreaker(
  (key: string) => redis.get(key),
  {
    timeout: 50,           // ms before a Redis call is considered failed
    errorThresholdPercentage: 25,
    resetTimeout: 30_000,  // half-open after 30s
  },
);

breaker.fallback(async (key: string) => {
  const local = lru.get(key);          // 1) in-proc LRU
  if (local) return local;
  if (!originLimiter.tryAcquire()) {   // 2) rate-limit origin
    throw new Error('degraded: origin saturated');
  }
  const value = await db.fetch(key);
  lru.set(key, value);                 // populate LRU only; skip Redis while open
  return value;
});

export const getWithCache = (key: string) => breaker.fire(key);

Common mistakes

Cache stampede on hot-key expiration

A hot key expires and thousands of requests miss simultaneously, hammering the DB. Fix: request coalescing (single-flight), so only one fetch per key is in flight and the others wait. Or probabilistic early expiration: each read refreshes the key ahead of its TTL with a probability that grows as expiry approaches.

Architecture diagram· Single-flight stampede defence

Many concurrent misses for the same key collapse to ONE origin fetch. Followers wait on the in-flight result instead of piling on origin.

No invalidation story

You SET on create, but when the underlying row changes you don't DEL. Result: users read stale data forever. Fix: make every write path either DEL or SET the cache. For multi-region setups, publish invalidations via pub/sub.

Hot-key imbalance

A viral link is on one Redis shard; that shard hits 100% CPU while the rest are idle. Fix stack: in-process LRU (catches most before it reaches Redis) → replica fanout (read from any replica of the shard) → CDN edge (absorbs 80%+ before origin sees it).

Architecture diagram· Hot-key mitigation stack — defence in depth

A viral key would hammer one Redis shard. Layer caches above it: CDN absorbs public reads, in-proc LRU absorbs per-server reads, Redis replicas spread load. Origin only sees what nothing else caught.

Caching mutable aggregates without a staleness plan (Advanced)

Caching "like count" or "unread messages" without naming how stale is acceptable is a tell. These change per-user and per-second; the cache either has to be per-user (cardinality explodes) or tolerated-stale (interviewer asks about it).

Caching secrets or user-specific data on a shared CDN (Advanced)

A response containing user-specific data (balance, personalized feed) cached at the edge will be served to the wrong user. Fix: Cache-Control: private on the origin response, or explicit per-user cache keys. When in doubt, do not cache it at the edge.

Cache warming absent from the deploy playbook (Advanced)

A fresh deploy starts cold; the first burst of traffic hammers origin. Fix: pre-warm during deployment by replaying a sample of requests; or roll deploys gradually so each batch warms before the next joins; or use sticky sessions to concentrate warm-up on a subset of users first.

Practice drills

You add caching to an existing service. The DB load doesn't drop. What do you check first?

Hit rate. A cache with low hit rate is worse than no cache (you're paying the lookup penalty). Common causes: (1) keys are high-cardinality so the working set doesn't fit; (2) TTL is too short; (3) you're caching per-user when you could cache per-entity; (4) the cache stays cold because app restarts keep wiping it. Instrument hit/miss at the app, then at Redis. Expect 90%+ hit rate on redirect-shaped workloads; anything below 70% is a design smell.

Cache-aside works great, except when the DB is slow. Then what?

Cache misses queue up behind the slow DB, the app runs out of worker threads, and effectively the whole system stops. This is the cascading failure mode that destroys naive cache-aside in production. Fixes: (1) request coalescing per key so one slow fetch doesn't multiply; (2) circuit breaker on DB so the app short-circuits to a stale-cache-is-OK path; (3) serve stale-on-error from cache — return the last known value with a header indicating it, rather than a 500.
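Fix (3), serve stale-on-error, can be sketched as a cache that keeps expired entries around and falls back to them when the loader fails. `StaleOnErrorCache` is a hypothetical name, and returning an `is_stale` flag stands in for the response header mentioned above.

```python
import time


class StaleOnErrorCache:
    def __init__(self, loader, ttl_sec, clock=time.monotonic):
        self._loader = loader
        self._ttl = ttl_sec
        self._clock = clock
        self._data = {}      # key -> (value, expires_at); expired entries are kept

    def get(self, key):
        """Return (value, is_stale)."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and now < entry[1]:
            return entry[0], False
        try:
            value = self._loader(key)          # may raise when the DB is slow or down
        except Exception:
            if entry is not None:
                return entry[0], True          # last known value, flagged as stale
            raise                              # nothing cached: surface the error
        self._data[key] = (value, now + self._ttl)
        return value, False
```

In practice the loader would be wrapped in a timeout or circuit breaker so that a slow DB fails fast instead of tying up worker threads.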

You need per-user personalized data served with <50ms p99. Cache at the CDN, at Redis, or in-process?

Probably Redis + in-proc LRU, not the CDN. Reasons:

CDN is wrong: per-user data must not land on a shared edge cache (Cache-Control: private keeps it out), and even with per-user cache keys the hit rate is low because each user has their own entry.
Redis: good for cross-server sharing, but each lookup is still a 1–2 ms network hop that comes out of the p99 budget on top of TCP and app processing.
In-proc LRU: ~100 ns, but each server has its own copy — with sticky sessions (or replicating the top-N to every server) you can get 80%+ hit rate for free.

The answer: Redis as the source of truth, in-proc LRU on each app server with a short TTL (5–10 s) and sticky routing.

A write to the DB succeeded but the DEL to the cache failed. What guarantees do you have?

None automatically — your cache is stale until the TTL fires. Options to make this safer: (1) retry the DEL with backoff in the write path (acceptable if cache write failures are rare); (2) publish an invalidation event to Kafka/pub-sub, which is durable and retried by the consumer; (3) write-through — make the cache write part of the same logical transaction (more complex, higher write latency). Most teams pick option 2 because it's cheap and eventually correct.

Your cache memory is 80% full and Redis is starting to evict. What do you do?

Four options in order of complexity:

1. Increase capacity: add shards or bigger instances. Cheapest, solves the symptom.
2. Shrink keys/values: compress JSON values, shorten key names, drop embedded duplicates. Often halves memory with no hit-rate impact.
3. Lower TTLs: if your staleness budget allows, shorter TTLs mean faster turnover and lower steady-state memory.
4. Cache less: only cache truly hot keys; let cold keys go directly to origin. Needs hit-rate telemetry to pick the threshold.

I'd start with #2 (always worth trying) + #1 (buys time), then consider #3/#4 based on cost.

Cheat sheet

•Four cache decisions: where, what, how fresh, how it fails.
•Cache-aside is the default. Justify anything else.
•TTL = your staleness budget. Name it, defend it.
•Stampede: single-flight or probabilistic early expiration.
•Hot keys: in-proc LRU → replica fanout → CDN. Stack them.
•Every write path that mutates cached data: DEL or SET the entry, and make sure invalidation is wired.
•Cache outages must fail open for non-critical paths; circuit-break before they kill origin.
•Negative-cache public lookups with short TTLs to shield origin from enumeration.
•Size it: hot working set ÷ per-shard capacity = shard count, checked against the target hit rate. ~25 GB usable per Redis shard.
•Cache-Control: private for user-specific responses; never let PII hit a shared edge cache.
