Caching strategy
What to cache, TTLs, invalidation, write-through vs write-back, stampede.
"We'll add a cache" is where weak designs die. Interviewers ask: which one, caching what exactly, with what TTL, invalidated how, behind which API boundary? If you can't answer all five, the cache line in your diagram is decoration.
Read this if your last attempt…
- You said "we'll use Redis" and got asked how invalidation works
- Your design has three caches and you can't explain how they interact
- You were surprised by a cache stampede question
- You never set a TTL — just "a cache"
- You couldn't answer "what happens when the cache is down?"
The concept
A cache is a deliberate trade: you give up some consistency to buy latency and cost savings. Every cache decision is really four decisions at once: where in the stack it sits, what it stores, how fresh the data is, and what happens when it fails.
Saying "add Redis" without naming those four things is a red flag. The senior move is to name the pattern explicitly and defend the TTL against a concrete staleness tolerance.
A request passes through up to five caches before reaching origin. Senior designs name each layer explicitly.
Quick triage across the six patterns.
| Pattern | Staleness | Write latency | Failure blast radius | Best for |
|---|---|---|---|---|
| Cache-aside | TTL-bounded | Unchanged | Transparent fallback to DB | General read workloads |
| Read-through (lib) | TTL-bounded | Unchanged | Same as cache-aside | Large codebase consistency |
| Write-through | Near-zero | +1 hop | Writes fail with cache | Read-your-write |
| Write-behind | Near-zero | Fastest | Data loss on cache crash | Low-value high-volume writes |
| Refresh-ahead | Near-zero | Unchanged | Same as cache-aside | Known hot keys |
| Negative caching | TTL-bounded | Unchanged | Same as cache-aside | Public key lookups |
- Most real systems compose two or three of these — cache-aside as the baseline, negative caching on public lookups, refresh-ahead on the known hot keys.
How interviewers grade this
- You name a specific cache technology (Redis, Memcached, CDN, in-process LRU) and justify the choice against the access pattern.
- You choose a caching pattern explicitly — cache-aside vs write-through vs write-behind — and say why.
- You set a TTL and explain why that TTL is acceptable for this use case.
- You have a cache invalidation story for writes that mutate cached data.
- You address at least one pathological case: cold cache, hot key, stampede, stale read.
- You size the cache — working set, hit rate, shard count — instead of saying "we'll cache it".
Variants
Cache-aside (lazy loading)
App owns the cache; populate on miss.
The app owns the cache. On miss, fetch from DB, populate cache, return. Most common pattern by a mile.
The default pattern. The app tries the cache first; on miss, reads from DB, writes back to cache, returns. Simple, widely deployed, and the one every senior engineer reaches for first.
Invalidation: explicit DEL on write, TTL as safety net. Some teams rely on TTL alone — fine if staleness bound is acceptable.
Watch out for: the cold-start stampede (see pitfalls) and the "write to DB, forget to invalidate" bug. In a microservices setup you also need to make sure that every service with a write path that touches cacheable data either DELs the key or publishes an invalidation.
Pros
- Simple; no coupling between cache and DB
- App tolerates cache outage (falls through to DB)
- Works with any cache + DB combo
Cons
- Stale on DB write if invalidation is missed
- First request after expiry is slow
- Stampede risk on hot key expiration
Choose this variant when
- Read-heavy workloads where freshness can tolerate TTL-level staleness
- You control both the app code and the DB
- You want to ship today, not tomorrow
Read-through (library-owned)
A cache library owns the miss path; app just calls get(key).
A close cousin of cache-aside where the cache client handles the "miss → fetch from origin → populate → return" dance for you. Apps just call cache.get(key, loader) and the library does the rest. Semantically identical to cache-aside but cleaner to implement consistently across a large codebase.
When it's worth it: medium-to-large services where multiple teams touch the cache. Centralizing the miss path in a library prevents divergent bugs (someone forgets to populate, someone forgets to coalesce requests).
Examples in the wild: Caffeine's LoadingCache (Java) and AWS DAX for DynamoDB.
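A rough sketch of what "the library owns the miss path" means in practice. `ReadThroughCache` is a hypothetical class written for this illustration, not the API of any of the libraries named above; callers only ever see `get(key, loader)`, and the hit/miss counters show how metrics can be baked in once.

```python
import time


class ReadThroughCache:
    """Library-owned miss path: callers only call get(key, loader)."""

    def __init__(self, ttl_sec, clock=time.monotonic):
        self._ttl_sec = ttl_sec
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)
        self.hits = 0
        self.misses = 0

    def get(self, key, loader):
        entry = self._data.get(key)
        if entry is not None and self._clock() < entry[1]:
            self.hits += 1
            return entry[0]
        # Miss: fetch from origin, populate, return. Every caller gets
        # the same behaviour because this code lives in one place.
        self.misses += 1
        value = loader(key)
        self._data[key] = (value, self._clock() + self._ttl_sec)
        return value
```

Request coalescing would slot into the miss branch of `get`, which is the main argument for centralizing it.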
Pros
- Consistent miss-path behaviour across all callers
- Library can bake in single-flight, metrics, and stampede protection
- App code is simpler
Cons
- You inherit the library's opinions
- Harder to reason about if the library is a black box
Choose this variant when
- Multiple teams share the cache and divergent miss paths are a real risk
- You want metrics and stampede protection for free
Write-through
Writes hit cache + DB synchronously.
Writes go to cache AND DB synchronously. Cache is always consistent with DB.
Every write goes to cache and DB in the same logical transaction. The cache is always consistent with the DB — at the cost of write latency (you pay both hops).
Use when: you need near-zero staleness on freshly-written keys and writes are a small fraction of total traffic. User profile updates and settings pages are the canonical fit.
Watch out for: if the cache is down, writes fail unless you fall back to DB-only and hope to repopulate. Also beware caching writes that will never be read — a one-time settings change pollutes cache memory for nothing.
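A minimal sketch of the write-through path, with plain dicts standing in for the cache and the database. `WriteThroughStore` is a hypothetical name, and writing the DB before the cache is an assumption of this sketch (it means a failed DB write never leaves the cache ahead of the DB).

```python
class WriteThroughStore:
    """Every write updates the DB and the cache before it is acknowledged."""

    def __init__(self, cache, db):
        self.cache = cache  # dict standing in for Redis
        self.db = db        # dict standing in for the database

    def write(self, key, value):
        self.db[key] = value     # hop 1: durable write
        self.cache[key] = value  # hop 2: cache update, still synchronous

    def read(self, key):
        if key in self.cache:
            return self.cache[key]  # reads after a write are always hot
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value
        return value
```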
Pros
- Cache is never stale relative to DB
- No "forgot to invalidate" class of bug
- Reads after a write are always hot
Cons
- Every write pays two hops
- Cache outage fails writes (unless you fall back carefully)
- Caches cold-written data even if it'll never be read
Choose this variant when
- User profile or settings pages where the user immediately reads what they wrote
- Write volume is small enough that caching every write is cheap
Write-behind (write-back)
Write to cache; flush to DB async.
Write to cache; cache flushes to DB async. Great for high write throughput but risks loss on crash.
Writes return after hitting the cache. A background job batches and flushes to DB. Massively improves write latency and lets you coalesce bursts — at the cost of durability risk and operational complexity.
Reality check: most teams that think they want this actually want a proper async queue + consumer. Write-behind is a fancy way to say "my cache is my write buffer" and it's a rough failure mode — if the cache node dies before flush, the writes are gone.
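To make the durability trade-off concrete, here is a small sketch of a write-behind buffer. `WriteBehindBuffer` is hypothetical, and `flush` is called by hand here where a real system would run it from a background job.

```python
class WriteBehindBuffer:
    """Writes land in the cache; a later flush pushes them to the DB."""

    def __init__(self, db):
        self.db = db          # dict standing in for the database
        self.cache = {}
        self.dirty = {}       # key -> latest unflushed value (coalesces repeats)
        self.db_writes = 0

    def write(self, key, value):
        self.cache[key] = value
        self.dirty[key] = value  # returns without touching the DB

    def flush(self):
        """Flush one batch; returns how many keys were written to the DB."""
        batch, self.dirty = self.dirty, {}
        for key, value in batch.items():
            self.db[key] = value
            self.db_writes += 1
        return len(batch)
```

Anything still sitting in `dirty` when the process dies is lost, which is exactly the failure mode described above.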
Pros
- Very low write latency
- Natural batching / coalescing of bursts
- Can smooth over brief DB outages
Cons
- Data loss if cache node dies before flush
- Complex to reason about (what's durable? when?)
- Hard to rebuild correctly after an outage
Choose this variant when
- Metrics, counters, analytics — low-value writes where occasional loss is fine
- Known bursty workloads where DB can't absorb spikes
Refresh-ahead
Proactively refresh keys before TTL expiry.
For hot keys with predictable access, refresh the cache entry before TTL expires so users never see a miss. Usually implemented as a background job that samples keys with high access frequency and re-fetches from origin.
Trade-off: wasted work if a refreshed key isn't read again. Only worth it for the top-N hot keys.
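One way to picture the background refresh, as a sketch under simple assumptions: `RefreshAheadCache` is a hypothetical class, the list of hot keys is supplied by the caller (access-frequency tracking is out of scope here), and `refresh_due` is the body a scheduler would call periodically.

```python
import time


class RefreshAheadCache:
    def __init__(self, loader, ttl_sec, refresh_window_sec, clock=time.monotonic):
        self._loader = loader
        self._ttl_sec = ttl_sec
        self._refresh_window_sec = refresh_window_sec
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def _load(self, key):
        value = self._loader(key)
        self._data[key] = (value, self._clock() + self._ttl_sec)
        return value

    def get(self, key):
        entry = self._data.get(key)
        if entry is not None and self._clock() < entry[1]:
            return entry[0]
        return self._load(key)  # ordinary miss path

    def refresh_due(self, hot_keys):
        """Re-fetch hot keys whose remaining TTL is inside the refresh window."""
        now = self._clock()
        refreshed = []
        for key in hot_keys:
            entry = self._data.get(key)
            if entry is not None and entry[1] - now <= self._refresh_window_sec:
                self._load(key)
                refreshed.append(key)
        return refreshed
```

With a 30-second TTL and a 5-second window, a hot key is re-fetched once it has 5 seconds or less left, so readers keep hitting a live entry.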
Pros
- Eliminates post-expiry latency spikes
- No stampede; background refresh is singleton
- User-perceived freshness improves
Cons
- Wasted refreshes on keys that go cold
- Needs access-frequency tracking
- Extra moving part to operate
Choose this variant when
- Top-N celebrity profiles on a social app
- Hot product pages during a flash sale
- Global config or feature flags read on every request
Negative caching
Cache "not found" too — so misses do not re-hit origin.
When a key genuinely does not exist in origin (deleted user, invalid short code, 404), you still want to cache that fact for a short TTL. Otherwise every lookup of a non-existent key becomes a full origin miss and an attacker can hammer origin just by generating random keys.
How: store a sentinel value (null, a tombstone, or a typed "not-found" marker) with a short TTL (seconds to a minute). On next lookup, the app sees the sentinel and returns 404 without touching origin.
Watch out for: legitimate keys that briefly looked absent (race between write and read). Keep the negative TTL short enough that a newly-created key becomes visible quickly.
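The sentinel approach can be sketched like this. `NegativeCachingLookup` and `_NOT_FOUND` are hypothetical names, a dict stands in for origin, and the 600 s / 60 s TTLs are placeholder values rather than recommendations.

```python
import time

_NOT_FOUND = object()  # sentinel meaning "origin confirmed this key does not exist"


class NegativeCachingLookup:
    def __init__(self, origin, ttl_sec=600, negative_ttl_sec=60, clock=time.monotonic):
        self.origin = origin  # dict standing in for the database
        self.origin_reads = 0
        self._ttl_sec = ttl_sec
        self._negative_ttl_sec = negative_ttl_sec
        self._clock = clock
        self._cache = {}  # key -> (value or _NOT_FOUND, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now < entry[1]:
            # A cached sentinel answers "404" without touching origin.
            return None if entry[0] is _NOT_FOUND else entry[0]
        self.origin_reads += 1
        value = self.origin.get(key)
        if value is None:
            self._cache[key] = (_NOT_FOUND, now + self._negative_ttl_sec)
            return None
        self._cache[key] = (value, now + self._ttl_sec)
        return value
```

The short negative TTL is what bounds how long a newly created key can stay hidden behind a cached "not found".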
Pros
- Shields origin from lookup storms on non-existent keys
- Simple to add to any cache-aside implementation
Cons
- Briefly hides newly-created keys
- Cardinality grows with attack surface; needs bounded memory
Choose this variant when
- Public key-lookup endpoints (URL shorteners, short-link expanders)
- Where enumeration attacks are likely
Worked example
Scenario: Caching the feed timeline for a social app. 100M DAU, 10 timeline fetches/user/day.
Step 1 — sizing
- Reads: 100M × 10 / 86,400 ≈ 12K reads/sec avg, × 4 peak = ~50K reads/sec
- Payload: 30 posts × 1 KB each = ~30 KB per timeline
- Working set: 100M users × 30 KB hot timeline = ~3 TB total if we cached all
- Hit rate target: 95% (most users refresh their OWN timeline repeatedly)
- Cache memory: 80/20 hot set = ~600 GB
- Shards: 600 GB / 25 GB usable per Redis shard = ~24 shards
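The sizing arithmetic above, written out so each number is traceable. All inputs come from the scenario; the variable names are illustrative, and decimal units (1 KB = 1,000 bytes) are assumed. Note that the exact peak comes out near 46K reads/sec, which the text rounds up to ~50K.

```python
SECONDS_PER_DAY = 86_400

DAU = 100_000_000
FETCHES_PER_USER_PER_DAY = 10
PEAK_FACTOR = 4
TIMELINE_BYTES = 30 * 1_000            # 30 posts x 1 KB
HOT_PERCENT = 20                       # 80/20 rule: cache the hottest 20%
USABLE_BYTES_PER_SHARD = 25 * 10**9    # 25 GB usable per Redis shard

avg_reads_per_sec = DAU * FETCHES_PER_USER_PER_DAY / SECONDS_PER_DAY
peak_reads_per_sec = avg_reads_per_sec * PEAK_FACTOR

full_working_set_bytes = DAU * TIMELINE_BYTES                  # ~3 TB
hot_set_bytes = full_working_set_bytes * HOT_PERCENT // 100    # ~600 GB

# Integer ceiling division so a partial shard still counts as a shard.
shard_count = -(-hot_set_bytes // USABLE_BYTES_PER_SHARD)

print(f"avg reads/sec:  {avg_reads_per_sec:,.0f}")
print(f"peak reads/sec: {peak_reads_per_sec:,.0f}")
print(f"hot set:        {hot_set_bytes / 10**9:,.0f} GB")
print(f"shards:         {shard_count}")
```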
Step 2 — pattern choice
- Pattern: cache-aside with a 30-second TTL
- Why: feeds tolerate 30s staleness; write path (new post) already pushes invalidations
- Invalidation: on new post, publish to Kafka topic; each region's cache subscribes and DELs affected users' timelines
Step 3 — stampede + hot key
- Stampede: single-flight per user — one in-flight timeline-build per user key, others wait up to 500ms for result
- Hot key: not usually a problem for per-user timelines, but celebrity feeds see hot reads. In-proc LRU on each app server catches 90% of reads for the top 10K users
Step 4 — failure modes
- Cache down: circuit-breaker to in-proc LRU only; rate-limit origin to 20% of normal; return "timeline unavailable, try again" for tail users (graceful degradation)
- Stale read: acceptable up to TTL; writer pushes invalidation so in worst case users see a 30s delay on seeing a new post
Step 5 — write path
- New post → insert into DB → publish FeedInvalidation event → Kafka consumer walks the poster's followers → DEL each follower's cached timeline
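- Fan-out cost: ~200 avg followers × 100M daily posts ≈ 20B invalidations/day ≈ 230K invalidations/sec on average, and roughly 4× that (~900K/sec) at peak if the write path sees the same peak factor as reads. Size Kafka + consumers for the peak.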
This whole thing takes 4 minutes in an interview. Every decision is traceable to a number.
Good vs bad answer
Interviewer probe
“How are you caching the URL shortener's read path?”
Weak answer
"Redis in front of Postgres — we cache the short code → long URL mapping. Hit rate will be high because redirects are concentrated."
Strong answer
"Cache-aside, two-tier. In-process LRU on each app server holds the top 100k hot keys, which absorbs viral spikes. Redis behind that for the long tail, 10-minute TTL. On miss we go to Postgres, populate both tiers, return. Invalidation: explicit DEL on delete + TTL as safety net. For stampede on popular expiring keys I use request coalescing: only one in-flight fetch per key. Negative caching on unknown codes with a 60s TTL to shield origin from enumeration. Hot key on one Redis shard is the residual risk; the in-proc LRU handles it, because reads for that key are served on each app server before they ever reach the shard."
Why it wins: Names the pattern, the TTL, the invalidation, the stampede protection, the hot-key mitigation, and the negative-caching defence — all six of the things interviewers probe.
When it comes up
- Any read-heavy system (almost all of them)
- After you name QPS in the capacity pass — interviewer asks how you will handle it
- When storage > single-primary capacity and you need to offload reads
- Whenever latency SLO is < 50ms p99 on a DB-backed endpoint
- When the prompt includes "hot content" or "trending" or "viral"
Order of reveal
1. Name the pattern. "Cache-aside as the default: simple, tolerates cache outages, works everywhere."
2. Name the technology. "Redis for server-side key-value cache; in-process LRU above it for the hottest 1% of keys; CDN at the edge for anything publicly cacheable."
3. Size it. "Working set is ~600 GB, so ~24 Redis shards at 25 GB usable each."
4. Set the TTL. "30-second TTL: feeds tolerate this staleness, and it bounds the invalidation-miss blast radius."
5. Wire invalidation. "Writes publish an invalidation event; each region's cache consumer DELs affected keys. TTL is the safety net."
6. Handle stampede. "Single-flight per key: one in-flight origin fetch per key, others wait. Bounded to 500ms."
7. Handle hot keys. "In-proc LRU above Redis absorbs hot reads. For truly viral keys, publish to CDN edge so 1000s of POPs share load."
8. Handle cache failure. "Circuit-break on Redis errors, fall through to in-proc LRU only, rate-limit origin to prevent cascade."
Signature phrases
- “Cache-aside as the default, with a stampede defence” — Names both the pattern and its most common failure mode.
- “TTL is my staleness budget — I can defend 30 seconds” — Shows TTL is a consequence of product requirement, not a guess.
- “In-proc LRU above Redis catches the top 1% of keys” — Layered caching prevents hot shards without adding ops complexity.
- “Single-flight per key to prevent stampede” — Names the specific technique, not just "we'll prevent stampede".
- “Negative caching with a 60-second TTL” — Shows awareness of enumeration attacks on public key lookups.
- “On cache outage, circuit-break to origin with rate limits” — Demonstrates failure-mode thinking the interviewer rarely has to probe for.
Likely follow-ups
“What if Redis goes down?”
Three-stage response:
1. Circuit-break immediately: the app detects Redis errors via health-check and stops attempting Redis calls for 30s.
2. Fall through to in-proc LRU: it still serves the top 1% of traffic with zero network calls.
3. Rate-limit origin: cap origin QPS at something it can sustain (say 20% of normal cached throughput). Return graceful degradation responses for excess traffic.
The key point: you never let a Redis outage cascade into an origin meltdown. The fallback path has to be explicitly rate-limited.
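The "explicitly rate-limited" fallback can be as simple as a token bucket in front of origin. The sketch below is one common way to do it, not the only one; `TokenBucket` and its parameters are hypothetical names chosen for this example.

```python
import time


class TokenBucket:
    """Allow up to rate_per_sec origin calls per second, with a small burst."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self._rate = rate_per_sec
        self._capacity = burst
        self._tokens = float(burst)
        self._clock = clock
        self._last = clock()

    def try_acquire(self):
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self._tokens = min(self._capacity, self._tokens + (now - self._last) * self._rate)
        self._last = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False  # caller should return a degraded response instead
```

When `try_acquire()` returns False, the request gets the graceful-degradation response instead of being forwarded to origin.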
“How do you prevent cache stampede on a hot key expiring?”
Two options, use one or both:
- Single-flight (request coalescing): library-level. When a key misses, the first request fetches from origin; subsequent requests for the same key wait for that in-flight fetch to return. Bounds the stampede to exactly one origin call per key regardless of how many concurrent misses.
- Probabilistic early expiration: before TTL, each read has a small and growing probability of triggering a refresh. By the time TTL actually hits, the key has almost certainly been refreshed already. XFetch algorithm is the reference implementation.
I'd use single-flight as the default because it needs no tuning; add probabilistic expiration for the top-N hot keys.
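For probabilistic early expiration, the XFetch decision rule is usually written as "refresh if now − delta × beta × ln(rand) ≥ expiry", where delta is how long the last recompute took and beta (default 1.0) tunes how eagerly to refresh. A small sketch of that check follows; the function and parameter names are illustrative.

```python
import math
import random
import time


def should_refresh_early(expires_at, recompute_sec, beta=1.0, now=None, rng=random.random):
    """XFetch-style check: True means this reader should refresh the key now.

    expires_at    -- absolute expiry time of the cached entry (seconds)
    recompute_sec -- how long the last origin fetch took (delta)
    beta          -- > 1 refreshes earlier, < 1 refreshes later
    """
    now = time.time() if now is None else now
    r = max(rng(), 1e-12)  # rng() may return 0.0; avoid log(0)
    # log(r) <= 0, so the subtracted term pushes "now" forward by a random
    # amount that tends to be larger when recomputation is expensive.
    return now - recompute_sec * beta * math.log(r) >= expires_at
```

Each reader runs the check on a cache hit; the one that draws True refetches and resets the entry while everyone else keeps serving the cached value.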
“How do you invalidate the cache when the underlying data changes?”
Two strategies, usually combined:
- Explicit DEL on write: every write path that mutates cacheable data deletes the corresponding cache keys before returning. This is the most correct approach but hard to enforce across large codebases.
- TTL as safety net: even if DEL is missed, the entry expires in seconds-to-minutes.
For cross-service invalidation, publish an invalidation event on Kafka/pub-sub; each service subscribes and DELs its own affected keys. For cross-region, publish to a global topic; each region's consumer DELs its local cache. Consistency is eventual and bounded by topic lag — usually < 1 second.
“Your cache hit rate dropped from 95% to 70%. What do you investigate?”
Walk the list:
1. Working set grew: did a new feature blow up cardinality (e.g. per-device keys instead of per-user)?
2. TTL shortened: someone lowered a TTL to fix a staleness bug and didn't notice the hit-rate impact.
3. Invalidation floods: an upstream change is DELing more keys than expected.
4. Memory pressure / evictions: Redis is evicting because working set exceeds capacity; check the evicted_keys metric.
5. Cold start: recent cache restart; still warming up.
The instrument to check: keyspace_hits / (keyspace_hits + keyspace_misses) in Redis, plus application-level hit/miss metrics. If evictions are non-zero, you're over capacity — add shards or shrink the working set.
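The hit-rate formula above can be computed from the counters in the stats section of Redis INFO. The helper below takes a plain dict so it can be tried without a server; with redis-py the dict would typically come from `client.info("stats")`. The function names and the 95% target are illustrative assumptions.

```python
def hit_rate(stats):
    """keyspace_hits / (keyspace_hits + keyspace_misses), or 0.0 with no traffic."""
    hits = int(stats["keyspace_hits"])
    misses = int(stats["keyspace_misses"])
    total = hits + misses
    return hits / total if total else 0.0


def triage(stats, target=0.95):
    """Return a short list of findings from one stats snapshot."""
    findings = []
    rate = hit_rate(stats)
    if rate < target:
        findings.append(f"hit rate {rate:.0%} is below target {target:.0%}")
    if int(stats.get("evicted_keys", 0)) > 0:
        findings.append("evictions are non-zero: over capacity")
    return findings
```

These counters are cumulative since the server started, so in practice you would diff two snapshots to get the hit rate for a recent window.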
Code examples
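import threading

import redis  # redis-py; the connection settings below are placeholders

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)

_inflight = {}  # key -> threading.Event for the fetch currently in flight
_inflight_lock = threading.Lock()


def get_with_singleflight(key, loader, ttl_sec=60):
    value = cache.get(key)
    if value is not None:
        return value

    # stampede defence: one in-flight loader per key
    with _inflight_lock:
        event = _inflight.get(key)
        is_leader = event is None
        if is_leader:
            event = threading.Event()
            _inflight[key] = event

    if is_leader:
        try:
            value = loader()  # origin fetch, done outside the global lock
            cache.set(key, value, ex=ttl_sec)
            return value
        finally:
            with _inflight_lock:
                _inflight.pop(key, None)
            event.set()  # wake the followers

    # follower: wait (bounded) for the leader, then re-read the cache
    event.wait(timeout=0.5)
    return cache.get(key)
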
Cache-Control: public, max-age=86400, s-maxage=300, stale-while-revalidate=60
ETag: "v42-abc123"
Vary: Accept-Encoding, Accept-Language
# public: CDN/edge may cache
# max-age=86400: browser keeps for 1 day
# s-maxage=300: shared caches (CDN) keep for 5 min
# stale-while-revalidate=60: serve stale up to 60s while refetching in bg
# ETag: conditional revalidation (304 Not Modified on match)
# Vary: split cache by header

// On Redis errors, short-circuit for 30s and serve from in-proc LRU + rate-limited origin.
import CircuitBreaker from 'opossum';

// Assumed to be defined elsewhere: redis (client), lru (in-proc cache),
// originLimiter (rate limiter), db (origin data access).
const breaker = new CircuitBreaker(
  (key: string) => redis.get(key),
  {
    timeout: 50, // ms before a Redis call is considered failed
    errorThresholdPercentage: 25,
    resetTimeout: 30_000, // half-open after 30s
  },
);

breaker.fallback(async (key: string) => {
  const local = lru.get(key); // 1) in-proc LRU
  if (local) return local;
  if (!originLimiter.tryAcquire()) { // 2) rate-limit origin
    throw new Error('degraded: origin saturated');
  }
  const value = await db.fetch(key);
  lru.set(key, value); // populate LRU only; skip Redis while open
  return value;
});

export const getWithCache = (key: string) => breaker.fire(key);

Common mistakes
A hot key expires and thousands of requests miss simultaneously, hammering the DB. Fix: request coalescing (single-flight) — only one in-flight fetch per key, others wait. Or probabilistic early expiration: refresh the key N seconds before TTL based on age × usage.
Many concurrent misses for the same key collapse to ONE origin fetch. Followers wait on the in-flight result instead of piling on origin.
You SET on create, but when the underlying row changes you don't DEL. Result: users read stale data forever. Fix: make every write path either DEL or SET the cache. For multi-region setups, publish invalidations via pub/sub.
A viral link is on one Redis shard; that shard hits 100% CPU while the rest are idle. Fix stack: in-process LRU (catches most before it reaches Redis) → replica fanout (read from any replica of the shard) → CDN edge (absorbs 80%+ before origin sees it).
A viral key would hammer one Redis shard. Layer caches above it: CDN absorbs public reads, in-proc LRU absorbs per-server reads, Redis replicas spread load. Origin only sees what nothing else caught.
Caching "like count" or "unread messages" without naming how stale is acceptable is a tell. These change per-user and per-second; the cache either has to be per-user (cardinality explodes) or tolerated-stale (interviewer asks about it).
A response containing user-specific data (balance, personalized feed) cached at the edge will be served to the wrong user. Fix: Cache-Control: private on the origin response, or explicit per-user cache keys. When in doubt, do not cache it at the edge.
A fresh deploy starts cold; the first burst of traffic hammers origin. Fix: pre-warm during deployment by replaying a sample of requests; or roll deploys gradually so each batch warms before the next joins; or use sticky sessions to concentrate warm-up on a subset of users first.
Practice drills
You add caching to an existing service. The DB load doesn't drop. What do you check first?
Hit rate. A cache with low hit rate is worse than no cache (you're paying the lookup penalty). Common causes: (1) keys are high-cardinality so the working set doesn't fit; (2) TTL is too short; (3) you're caching per-user when you could cache per-entity; (4) the cache keeps going cold because restarts wipe it. Instrument hit/miss at the app, then at Redis. Expect 90%+ hit rate on redirect-shaped workloads; anything below 70% is a design smell.
Cache-aside works great, except when the DB is slow. Then what?
Cache misses queue up behind the slow DB, the app runs out of worker threads, and effectively the whole system stops. This is the cascading failure mode that destroys naive cache-aside in production. Fixes: (1) request coalescing per key so one slow fetch doesn't multiply; (2) circuit breaker on DB so the app short-circuits to a stale-cache-is-OK path; (3) serve stale-on-error from cache — return the last known value with a header indicating it, rather than a 500.
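Fix (3), serving stale on error, might look like the sketch below. `StaleOnErrorCache` is a hypothetical class; the key idea is that expired entries are kept around as a fallback instead of being deleted, and the caller is told when the value is stale so it can set a response header.

```python
import time


class StaleOnErrorCache:
    def __init__(self, loader, ttl_sec, clock=time.monotonic):
        self._loader = loader
        self._ttl_sec = ttl_sec
        self._clock = clock
        self._data = {}  # key -> (value, fresh_until); expired entries stay as fallback

    def get(self, key):
        """Return (value, is_stale)."""
        now = self._clock()
        entry = self._data.get(key)
        if entry is not None and now < entry[1]:
            return entry[0], False
        try:
            value = self._loader(key)
        except Exception:
            if entry is not None:
                return entry[0], True  # origin failed: serve the last known value
            raise  # nothing cached, so the error has to surface
        self._data[key] = (value, now + self._ttl_sec)
        return value, False
```

A real deployment would also bound how old a stale value may be; that limit is left out here to keep the sketch short.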
You need per-user personalized data served with <50ms p99. Cache at the CDN, at Redis, or in-process?
Probably Redis + in-proc LRU, not the CDN. Reasons:
- CDN is wrong: per-user data must not land on a shared edge cache without Cache-Control: private, and even then the hit rate is low because each user has their own entry.
- Redis: good for cross-server sharing but still a 1–2 ms network hop per lookup, which eats into the 50 ms p99 budget once you count tail latency, TCP overhead, and app processing.
- In-proc LRU: ~100 ns, but each server has its own copy — with sticky sessions (or replicating the top-N to every server) you can get 80%+ hit rate for free.
The answer: Redis as the source of truth, in-proc LRU on each app server with a short TTL (5–10 s) and sticky routing.
A write to the DB succeeded but the DEL to the cache failed. What guarantees do you have?
None automatically — your cache is stale until the TTL fires. Options to make this safer: (1) retry the DEL with backoff in the write path (acceptable if cache write failures are rare); (2) publish an invalidation event to Kafka/pub-sub, which is durable and retried by the consumer; (3) write-through — make the cache write part of the same logical transaction (more complex, higher write latency). Most teams pick option 2 because it's cheap and eventually correct.
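Options (1) and (2) combine naturally: retry the DEL a few times, then hand the key to a durable queue if it still fails. A sketch under those assumptions follows; `delete_fn` and `enqueue_fn` are hypothetical hooks for the cache client and the event publisher.

```python
import time


def delete_with_retry(delete_fn, key, attempts=3, base_delay_sec=0.05, sleep=time.sleep):
    """Try to invalidate key; return True on success, False if every attempt failed."""
    for attempt in range(attempts):
        try:
            delete_fn(key)
            return True
        except Exception:
            if attempt < attempts - 1:
                sleep(base_delay_sec * (2 ** attempt))  # exponential backoff
    return False


def invalidate(delete_fn, key, enqueue_fn, **retry_kwargs):
    """Retry the DEL first; fall back to a durable invalidation event."""
    if delete_with_retry(delete_fn, key, **retry_kwargs):
        return "deleted"
    enqueue_fn(key)  # e.g. publish to an invalidation topic for a consumer to retry
    return "queued"
```

Until either path succeeds, the TTL remains the only bound on staleness.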
Your cache memory is 80% full and Redis is starting to evict. What do you do?
Four options in order of complexity:
1. Increase capacity: add shards or bigger instances. Cheapest, solves the symptom.
2. Shrink keys/values: compress JSON values, shorten key names, drop embedded duplicates. Often halves memory with no hit-rate impact.
3. Lower TTLs: if your staleness budget allows, shorter TTLs mean faster turnover and lower steady-state memory.
4. Cache less: only cache truly hot keys; let cold keys go directly to origin. Needs hit-rate telemetry to pick the threshold.
I'd start with #2 (always worth trying) + #1 (buys time), then consider #3/#4 based on cost.
Cheat sheet
- Four cache decisions: where, what, how fresh, how it fails.
- Cache-aside is the default. Justify anything else.
- TTL = your staleness budget. Name it, defend it.
- Stampede: single-flight or probabilistic early expiration.
- Hot keys: in-proc LRU → replica fanout → CDN. Stack them.
- Every write path: DEL or SET the cache, and make sure invalidation is wired.
- Cache outages must fail open for non-critical paths; circuit-break before they kill origin.
- Negative-cache public lookups with short TTLs to shield origin from enumeration.
- Size it: working set × hot fraction ÷ per-shard capacity = shard count. ~25 GB usable per Redis shard.
- Cache-Control: private for user-specific responses; never let PII hit a shared edge cache.