Edge caching / CDN-first
When to reach for this
Reach for this when…
- Public, shareable content (product pages, articles, media)
- Static assets (JS, CSS, images, video manifests)
- Read:write ratio > 100:1 on shared content
- Global user base — edge latency matters
- Origin cost or capacity is a concern
Not really this pattern when…
- Every response is personalised per-user with no shared shell
- Content changes every request (real-time feeds, live scores)
- Compliance prohibits off-origin caching (PII, health records)
- Write-heavy workload where reads are incidental
Cheat sheet
- Cache key = URL + selected query params + Vary headers. Strip tracking params.
- Hashed static assets: Cache-Control: public, max-age=31536000, immutable.
- HTML/API: Cache-Control: public, s-maxage=60, stale-while-revalidate=300.
- Private/personalised: Cache-Control: private, no-store.
- Never Vary on Cookie for public content; it kills hit rate.
- Origin shield collapses miss storms: 200 PoP misses → 1 origin request.
- Surrogate key purge: tag responses, purge by tag. One call, all URLs.
- Versioned URLs (content hash) = self-invalidating. No purge needed.
- Shell-and-slot: CDN serves cached shell, browser fetches personalised slots.
- SWR = user never waits for origin. Stale served instantly, fresh fetched async.
- Origin offload math: hit_rate × traffic = origin savings. CDN pays for itself.
- Edge compute: sub-10 ms for auth, A/B, geo-routing. Keep heavy logic at origin.
Core concept
CDN as architecture, not addon
A CDN is a geographically distributed cache that sits between your users and your origin servers. Every PoP (Point of Presence) stores copies of responses keyed by URL and selected headers. When a request arrives, the nearest PoP checks its local cache: HIT returns instantly (sub-10 ms); MISS forwards to origin, caches the response, then returns it. The goal is simple — maximise the hit rate so origin handles only writes and genuine cache misses.
Client hits nearest PoP. Cache HIT serves instantly; cache MISS falls through to origin, which queries DB and returns through the PoP for caching.
Pull vs push CDN
Pull CDN (the default model): origin serves normally; the CDN fetches and caches on first miss. Zero upfront work — you point DNS at the CDN and set Cache-Control headers. The cost is cold-start latency: the first request to each PoP for each cache key hits origin. For a long-tail catalogue (millions of product pages), most PoPs will have most pages cold at any given time.
Push CDN: you upload assets to the CDN's storage before users request them. No cold-start miss, but you manage the upload pipeline. Ideal for large, known assets — video segments, software binaries, deploy bundles. Most production setups use pull for HTML/API and push for heavy media.
Cache-key discipline — the single most consequential design choice
The cache key determines what gets stored and what gets a separate copy. Default key: HTTP method + full URL. Add Vary: Accept-Language and you get one copy per language. Add Vary: Cookie and you get one copy per user session — hit rate collapses to near zero.
Rules of cache-key hygiene:
1. Strip tracking query parameters (utm_source, fbclid, gclid). They shatter the cache with no semantic difference.
2. Normalise query-param order so ?color=red&size=M and ?size=M&color=red map to one entry.
3. Never Vary on Cookie for public content. If you must personalise, use the shell-and-slot pattern.
4. Monitor cache-key cardinality. If your CDN stores 50M distinct keys and most are hit only once before eviction, the key is too fine-grained.
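Rules 1 and 2 can be sketched as a small normalisation function. This is a minimal illustration rather than any CDN's actual behaviour: the function name `normalise_cache_key`, the deny-list contents, and the key layout are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

# Hypothetical deny-list; a real deployment would tune it or use an allow-list.
TRACKING_PARAMS = {"fbclid", "gclid", "msclkid"}
TRACKING_PREFIXES = ("utm_",)


def normalise_cache_key(method: str, url: str) -> str:
    """Build a cache key from method + URL, dropping tracking params and sorting the rest."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if name not in TRACKING_PARAMS and not name.startswith(TRACKING_PREFIXES)
    ]
    kept.sort()  # ?color=red&size=M and ?size=M&color=red collapse to one entry
    query = urlencode(kept)
    key = f"{method.upper()} {parts.scheme}://{parts.netloc.lower()}{parts.path}"
    return f"{key}?{query}" if query else key
```

With this sketch, a request carrying `utm_source` or `fbclid` maps to the same key as the clean URL, so it no longer occupies its own cache slot.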
TTL strategies
| Content type | TTL | Header |
|---|---|---|
| Hashed static asset (app.a1b2c3.js) | 1 year | Cache-Control: public, max-age=31536000, immutable |
| HTML page shell | 60 s + SWR 300 s | Cache-Control: public, max-age=60, stale-while-revalidate=300 |
| Public API GET | 5–60 s | Cache-Control: public, max-age=30 |
| Personalised response | 0 | Cache-Control: private, no-store |
Stale-while-revalidate (SWR) is the most powerful TTL directive for user experience. After max-age expires, the PoP serves the stale copy instantly while fetching a fresh copy from origin in the background. The user never waits for origin; the cache self-heals within seconds. SWR is the right default for any content that tolerates a few seconds of staleness.
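One way an origin might encode the TTL table is a single policy function that picks the Cache-Control value per response. The sketch below assumes a hypothetical URL layout (hashed filenames, an `/api/` prefix) and a `personalised` flag supplied by the application; none of these come from a specific framework.

```python
import re

# Assumed naming scheme: a hex content hash before the extension, e.g. app.a1b2c3.js
HASHED_ASSET = re.compile(r"\.[0-9a-f]{6,}\.(?:js|css|png|jpg|svg|woff2)$")


def cache_control_for(path: str, personalised: bool = False) -> str:
    """Return the Cache-Control value from the TTL table for a given response."""
    if personalised:
        return "private, no-store"
    if HASHED_ASSET.search(path):
        return "public, max-age=31536000, immutable"
    if path.startswith("/api/"):
        return "public, max-age=30"
    # Everything else is treated as an HTML page shell.
    return "public, max-age=60, stale-while-revalidate=300"
```

Keeping the policy in one place makes a header regression easier to spot than headers scattered across handlers.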
Invalidation at the edge
Tight TTLs self-heal, but sometimes you need immediate invalidation — a price correction, a security patch, a legal takedown. Three mechanisms:
1. Purge by URL: call the CDN API with the exact URL. Simple but does not scale to thousands of pages.
2. Surrogate keys (tags): tag responses with logical groups (product-42, category-electronics). Purge by tag invalidates every URL carrying that tag across all PoPs. Fastly and Cloudflare support this natively; CloudFront requires Lambda@Edge workarounds.
3. Versioned URLs: change the URL itself (app.v2.js or app.[hash].js). Old URL expires naturally; new URL is fetched fresh. Best for deploy-time assets.
Propagation time matters. Fastly purges in <150 ms globally. CloudFront purges in 1–5 minutes. Know your CDN's SLA.
Origin offload math
The business case for CDN is arithmetic: if your hit rate is 95% and you serve 100K req/s, only 5K req/s reach origin. Without the CDN, origin needs 20× the capacity. At $0.01/10K requests on the CDN vs $0.10/10K on origin compute, the CDN pays for itself many times over. In an interview, stating the hit-rate × traffic = origin savings formula signals you think about CDN quantitatively, not just "it makes things faster."
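The arithmetic above can be written out as a short calculation. The function names are illustrative, and the per-10K prices are simply the figures quoted in the paragraph, not real provider pricing.

```python
def origin_rps(total_rps: float, hit_rate: float) -> float:
    """Requests per second that still reach origin at a given CDN hit rate."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return total_rps * (1.0 - hit_rate)


def capacity_multiplier(hit_rate: float) -> float:
    """How many times more origin capacity is needed without the CDN."""
    return 1.0 / (1.0 - hit_rate)


def cost_per_second(total_rps: float, hit_rate: float,
                    cdn_per_10k: float = 0.01, origin_per_10k: float = 0.10) -> float:
    """CDN request cost for all traffic plus origin cost for the misses."""
    cdn_cost = total_rps / 10_000 * cdn_per_10k
    origin_cost = origin_rps(total_rps, hit_rate) / 10_000 * origin_per_10k
    return cdn_cost + origin_cost


def cost_per_second_without_cdn(total_rps: float, origin_per_10k: float = 0.10) -> float:
    """Origin cost when every request reaches origin."""
    return total_rps / 10_000 * origin_per_10k
```

For 100K req/s at a 95% hit rate this gives about 5K req/s at origin, a 20× capacity multiplier, and roughly $0.15 per second with the CDN against $1.00 per second without it.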
Shell-and-slot for personalised content
The hardest CDN problem is pages that are 90% shared (product info, layout, nav) and 10% personalised (cart count, recommendations, logged-in username). Caching the whole page per user destroys hit rate. Serving everything from origin wastes the CDN.
Shell-and-slot splits the page: the CDN serves a cached HTML shell with placeholder divs. Client-side JavaScript fetches the personalised slots from origin API calls (which are Cache-Control: private). The shell gets a 95%+ hit rate; the slots are tiny JSON payloads that origin can serve cheaply. This pattern is the bridge between CDN-first and personalisation — and it cross-references directly with the read-heavy pattern's cache-aside strategy.
Edge compute — blurring CDN and app tier
Cloudflare Workers, Fastly Compute@Edge, Vercel Edge Functions, and AWS Lambda@Edge run application code at the PoP. Use cases: A/B test bucketing, auth token validation, geo-routing, HTML assembly from cached fragments. The PoP is no longer a dumb cache — it is a lightweight application server. The trade-off: per-request compute cost at the edge is higher than at origin, and cold-start latency varies by provider. Use edge compute for logic that must be geographically close to the user; keep heavy processing at origin.
Canonical examples
- E-commerce product catalogue pages
- News / blog articles
- Streaming video manifests + segments
- Software downloads, JS bundles
- Public API responses with short TTLs
Variants
No CDN — origin direct
Every request hits origin. Simplest topology but highest origin load and worst global latency.
Every request reaches origin. Origin bears full read load. Latency equals client-to-origin round trip.
The no-CDN topology is where every system starts. Clients resolve your domain to your origin server's IP (or load balancer). Every GET, every static asset, every API call travels from the user's browser across the internet to your data centre and back. If your users are in Sydney and your origin is in us-east-1, every request pays ~200 ms of network latency before your server even starts processing.
Why it works at small scale. For a few thousand daily users, all in one geography, the simplicity is unbeatable. No cache invalidation bugs, no stale content surprises, no CDN bill. Your origin server is the single source of truth for every byte — debugging is trivial because there is exactly one place to look.
Why it breaks. Three forces push you off this topology:
1. Latency. Global users experience round-trip times of 150–300 ms to a single-region origin. Multiply by the number of sequential requests to render a page (HTML + CSS + JS + API calls) and the page load exceeds 2 seconds, the threshold where conversion drops measurably.
2. Origin load. Every static asset request (logo.png, bundle.js, fonts) hits your application servers. These bytes are identical for every user. Serving them from origin is pure waste: CPU cycles, bandwidth, and connection slots consumed for content that never changes.
3. Cost. Bandwidth from cloud providers is expensive ($0.05–0.09/GB on AWS). CDN bandwidth is 3–10× cheaper because CDNs negotiate peering agreements at scale. At 100 TB/month, the savings fund the entire CDN bill.
When to stay here. Internal tools, B2B dashboards with <1,000 users in one region, pre-launch MVPs. The moment you have public traffic or global users, add a CDN.
Pros
- Zero CDN configuration or cost
- No cache staleness: every response is fresh
- Simplest debugging: one source of truth
Cons
- Highest latency for distant users
- Origin bears 100% of read load
- Expensive bandwidth at scale
Choose this variant when
- Internal tool with <1K users in one region
- Pre-launch MVP with no public traffic
- All responses are private / personalised
Pull CDN
CDN fetches from origin on cache miss and serves subsequent requests from edge.
Client hits PoP. On miss, PoP fetches from origin, caches, and returns. Subsequent requests served from edge.
The pull CDN is the default production topology. You point your domain's DNS (CNAME) at the CDN provider. The CDN's PoPs intercept every request. On a cache HIT, the PoP returns the response in <10 ms. On a MISS, the PoP forwards the request to your origin, caches the response according to Cache-Control headers, and returns it to the client.
Setup is minimal. Configure DNS, set Cache-Control headers on your origin responses, and you are live. No asset upload pipeline, no deploy hooks — the CDN populates itself lazily. This is why pull CDN is the right default.
The cold-start problem. Every PoP starts cold. The first request for a given URL at a given PoP always hits origin. With 200+ PoPs worldwide and a long-tail catalogue (10M product pages), most PoP-URL combinations are cold at any point in time. The effective hit rate depends on request concentration: a popular homepage might hit 99% across all PoPs, while a niche product page might be cached in only 3 PoPs.
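The per-PoP nature of cold starts is easy to see in a toy model. The `PullCdn` class below is a hypothetical in-memory simulation, assuming no TTLs or eviction; it only illustrates that every PoP pays its own first miss for each URL.

```python
from collections import defaultdict


class PullCdn:
    """Toy pull CDN: each PoP has its own cache and fills it lazily from origin."""

    def __init__(self, origin):
        self.origin = origin                # callable: url -> body
        self.pops = defaultdict(dict)       # PoP name -> {url: body}
        self.origin_requests = 0

    def get(self, pop: str, url: str) -> tuple[str, str]:
        cache = self.pops[pop]
        if url in cache:
            return "HIT", cache[url]
        # Cold for this PoP, even if another PoP already holds the URL.
        self.origin_requests += 1
        body = self.origin(url)
        cache[url] = body
        return "MISS", body
```

Requesting the same URL through two different PoPs produces two origin requests, which is why long-tail content can stay cold across most of the network.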
Cache-Control is the API contract. Your origin controls caching behaviour entirely through HTTP headers. This is elegant — the CDN is a standards-compliant HTTP cache. But it means a misconfigured header (Cache-Control: private on a public page, or no Cache-Control at all) silently breaks caching with no error. Monitor your CDN's HIT/MISS ratio per path — a sudden drop means a header regression.
Vary header traps. Adding Vary: Accept-Encoding is safe (gzip vs brotli, 2-3 variants). Adding Vary: Accept-Language creates one cache entry per language. Adding Vary: Cookie creates one entry per user session — cache is useless. The rule: Vary only on headers that produce meaningfully different responses, and keep the cardinality low.
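The cardinality effect of Vary can be sketched by building the secondary cache key by hand. This is a simplified model: the helper names are made up for the example, and real caches may normalise header values before keying on them.

```python
def variant_key(url: str, vary: str, request_headers: dict[str, str]) -> tuple:
    """Secondary cache key: the URL plus the request's value for each header named in Vary."""
    lowered = {name.lower(): value for name, value in request_headers.items()}
    names = sorted(h.strip().lower() for h in vary.split(",") if h.strip())
    return (url,) + tuple((name, lowered.get(name, "")) for name in names)


def count_variants(url: str, vary: str, requests: list[dict[str, str]]) -> int:
    """Number of distinct cache entries a set of requests would create."""
    return len({variant_key(url, vary, headers) for headers in requests})
```

For 100 users split across two languages, each with a distinct session cookie, `Vary: Accept-Language` yields 2 entries while `Vary: Cookie` yields 100.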
Cost model. CDN pricing is per-request + per-GB egress, both cheaper than origin. The origin savings from a 90% hit rate typically exceed the CDN cost by 3–5×. Monitor origin offload percentage — it is the single metric that justifies the CDN investment.
Pros
- Minimal setup: DNS + headers
- Origin offload of 80–95% for cacheable content
- Automatic global distribution
Cons
- Cold-start misses on each PoP
- Long-tail content may never warm in most PoPs
- Misconfigured headers silently kill hit rate
Choose this variant when
- Public-facing web application with cacheable content
- Global user base needing low latency
- Team does not want to manage asset upload pipelines
CDN + origin shield
A shield PoP sits between edge PoPs and origin, collapsing concurrent misses into one origin request.
Multiple PoPs converge on a single shield PoP. Shield absorbs concurrent misses so origin sees at most one request per cache key per TTL.
Origin shield adds a second caching tier between edge PoPs and your origin. When an edge PoP has a cache miss, instead of going directly to origin, it routes to the designated shield PoP. If the shield has the content, it serves it — edge PoP caches it, done. If the shield also misses, it fetches from origin, caches, and serves all waiting edge PoPs.
Why shield matters: the miss storm. Without a shield, a TTL expiry on a popular URL triggers simultaneous misses from 200+ PoPs, all hitting origin at the same instant. This is a miss storm (also called a cache stampede at the edge). Origin sees a 200× traffic spike for a few seconds every TTL cycle. With a shield, all those PoP misses converge on one shield PoP, which sends exactly one request to origin. The storm is collapsed.
Shield placement. Choose a shield PoP geographically close to your origin — ideally in the same cloud region. This minimises shield-to-origin latency. Most CDN providers let you configure this: Fastly has explicit shield PoP selection, CloudFront has Origin Shield as a checkbox with region selection, Cloudflare uses tiered caching (automatic).
Request coalescing. Advanced CDN configurations combine shield with request coalescing (also called request collapsing). When multiple requests for the same cache key arrive at a PoP within a short window, the PoP sends only one request upstream and fans out the response to all waiting clients. Fastly and Varnish support this natively. Combined with shield, this means even within a single PoP, concurrent misses produce one origin request.
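Request coalescing is sometimes called single-flight. A minimal asyncio sketch follows, assuming an async `fetch_origin` callable and ignoring TTLs and error caching; the class name `CoalescingCache` is hypothetical and this is not how any particular CDN implements it.

```python
import asyncio


class CoalescingCache:
    """Concurrent misses for one key share a single upstream fetch."""

    def __init__(self, fetch_origin):
        self._fetch_origin = fetch_origin            # async callable: key -> value
        self._cache: dict[str, str] = {}
        self._in_flight: dict[str, asyncio.Task] = {}

    async def get(self, key: str) -> str:
        if key in self._cache:
            return self._cache[key]
        task = self._in_flight.get(key)
        if task is None:
            # First miss for this key: start the one upstream request.
            task = asyncio.create_task(self._fill(key))
            self._in_flight[key] = task
        return await task                            # later misses wait on the same task

    async def _fill(self, key: str) -> str:
        try:
            value = await self._fetch_origin(key)
            self._cache[key] = value
            return value
        finally:
            self._in_flight.pop(key, None)


async def demo(concurrent: int = 50) -> int:
    """Fire `concurrent` simultaneous misses and return how many reached origin."""
    calls = 0

    async def slow_origin(key: str) -> str:
        nonlocal calls
        calls += 1
        await asyncio.sleep(0.01)                    # simulated origin latency
        return f"body:{key}"

    cache = CoalescingCache(slow_origin)
    await asyncio.gather(*(cache.get("/product/42") for _ in range(concurrent)))
    return calls
```

Running `asyncio.run(demo(50))` in this sketch reports a single origin call for 50 concurrent misses.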
Cost of an extra hop. Shield adds latency on a cold miss: client → edge PoP → shield PoP → origin instead of client → edge PoP → origin. The extra hop is typically 5–30 ms depending on shield placement. For a cold miss that is already paying 50–200 ms of origin processing, this is negligible. The origin protection is worth far more than the latency cost.
When to enable shield. Any time your origin is capacity-constrained or your content has sharp TTL boundaries that cause synchronised expiry. Shield is nearly free (small per-request surcharge) and dramatically smooths origin load.
Pros
- Collapses miss storms to single origin request
- Dramatically smooths origin load
- Shield warms content for all edge PoPs
Cons
- Extra hop adds 5–30 ms on cold miss
- Shield PoP is a concentration point: a shield failure, though rare, affects every edge PoP routed through it
- Small additional CDN cost per request
Choose this variant when
- Origin is capacity-constrained or expensive
- Content has synchronised TTL expiry
- High PoP count with long-tail content
Edge compute
Run application code at the PoP — the CDN becomes a lightweight app server.
PoP runs application logic (Workers, Compute@Edge). Only API / data calls reach origin. Static + dynamic edge responses.
Edge compute platforms — Cloudflare Workers, Fastly Compute@Edge, Vercel Edge Functions, AWS Lambda@Edge, Deno Deploy — run your code at the CDN's PoPs. The PoP is no longer a passive cache; it executes JavaScript/TypeScript/Wasm in a sandboxed runtime, with access to edge KV stores, and returns computed responses.
Use cases that justify edge compute:
1. A/B test bucketing. Hash the user ID at the edge, select a variant, serve the corresponding cached page. No origin round-trip, no client-side flicker.
2. Auth token validation. Verify a JWT at the edge and reject unauthenticated requests before they reach origin. Reduces origin attack surface.
3. Geo-routing. Inspect the client's country/region (available from CDN headers) and route to the nearest origin or serve region-specific content.
4. HTML assembly. Fetch a cached HTML shell and inject personalised fragments from edge KV or a short origin call. This is the programmatic version of shell-and-slot.
5. API gateway logic. Rate limiting, request transformation, header injection: lightweight middleware that does not need origin.
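The first use case, A/B bucketing by hashing the user ID, can be sketched in a few lines. It is written in Python for illustration, although an edge runtime would typically run JavaScript or Wasm; the function name and the choice of SHA-256 are assumptions for the example.

```python
import hashlib


def ab_bucket(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a user to a variant; no state or origin call needed."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(variants)
    return variants[index]
```

Because the bucket depends only on the experiment name and user ID, every PoP assigns the same user to the same variant, and the number of distinct cached pages stays equal to the number of variants.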
Trade-offs to name in an interview:
- Per-request compute cost. Edge compute is billed per invocation + CPU time. At millions of requests per second, this adds up. A cached static response costs $0.01 per 10K requests ($1.00 per million); an edge function invocation adds $0.50–1.00 per million on top.
- Limited runtime. Workers have 128 MB memory and 30 s CPU time (often less). No filesystem, no long-running processes. This is a constraint, not a bug — it keeps the edge fast.
- Cold start. V8 isolates (Cloudflare) start in <5 ms. Lambda@Edge containers start in 50–200 ms. Choose your provider based on your cold-start tolerance.
- Debugging. Logs are distributed across 200+ PoPs. You need centralised log aggregation (Logpush, Datadog) to debug production issues.
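To put a number on the per-invocation cost, here is a back-of-envelope calculation using the price range quoted above. It assumes a 30-day month and ignores CPU-time charges, so treat the result as a lower bound rather than a quote.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def monthly_invocation_cost(rps: float, usd_per_million: float) -> float:
    """Invocation-only cost of running an edge function on every request."""
    return rps * SECONDS_PER_MONTH / 1_000_000 * usd_per_million
```

At 1,000 req/s and $0.50 per million this comes to about $1,296 per month; at 1,000,000 req/s it is about $1.3M per month, which is why heavy or unnecessary logic is better kept off the hot path.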
When edge compute wins over traditional CDN. When you need sub-20 ms response times for logic that varies per request but does not need origin data. When origin round-trip time is the bottleneck and the logic is simple enough to run in a constrained runtime. When you want to move auth, A/B, or geo-routing out of your application servers entirely.
Pros
- Sub-10 ms responses for computed content
- Offloads auth, A/B, geo-routing from origin
- Global deployment with zero infra management
Cons
- Per-invocation cost adds up at high volume
- Constrained runtime (memory, CPU, no FS)
- Distributed debugging across 200+ PoPs
Choose this variant when
- Need sub-20 ms for request-dependent responses
- Auth or A/B routing is a latency bottleneck
- Team can work within constrained edge runtimes
Scaling path
No CDN — origin direct
Ship the feature. Every request hits origin directly.
All reads and static assets served by origin servers. Works for small internal apps and pre-launch MVPs. Latency equals client-to-origin RTT; origin bears 100% of traffic.
What triggers the next iteration
- Origin handles all static asset requests — wasted compute
- Global users experience 150–300 ms RTT
- Bandwidth costs scale linearly with traffic
Pull CDN
Offload cacheable reads to edge PoPs. Origin handles only misses.
DNS points to CDN. PoPs cache responses on first miss. Hit rate 80–95% for well-keyed content. Origin load drops by an order of magnitude. Cold-start misses still hit origin for each PoP.
What triggers the next iteration
- Cold-start miss storms on TTL expiry across 200+ PoPs
- Long-tail content rarely cached in most PoPs
- Cache-key misconfiguration silently kills hit rate
CDN + origin shield
Collapse miss storms. Origin sees at most one request per cache key per TTL.
A shield PoP between edge and origin absorbs concurrent misses. Combined with request coalescing, origin load is nearly flat regardless of PoP count. Shield placement should be in the same region as origin.
What triggers the next iteration
- Edge still serves stale content during SWR window
- Personalised content cannot be cached at edge
- Shield adds an extra hop on cold misses
Edge compute
Run application logic at the PoP. Origin handles only data writes and heavy computation.
Edge workers handle auth, A/B bucketing, geo-routing, and HTML assembly. Origin becomes a pure API server. Static content is cached; dynamic content is computed at the edge with sub-10 ms latency.
What triggers the next iteration
- Per-invocation cost at millions of RPS
- Constrained runtime limits complex logic
- Debugging distributed across 200+ PoPs
Deep dives
Cache-key discipline — the foundation of hit rate
Good key: URL + Accept-Language (few variants, high hit rate). Bad key: URL + session cookie (per-user, near 0% hit rate).
The cache key is the single most consequential decision in CDN architecture. It determines how many distinct copies the CDN stores and, inversely, how often a request can be served from cache. A bad cache key turns a CDN into an expensive proxy.
Default key: method + URL. A GET to /product/42?color=red stores one entry. A GET to /product/42?color=red&utm_source=google stores a different entry — identical content, separate cache slot. Multiply by every tracking parameter every ad platform appends, and your hit rate collapses.
Rule 1: Strip tracking parameters. Configure the CDN to remove utm_*, fbclid, gclid, msclkid, and similar parameters before computing the cache key. Fastly, Cloudflare, and CloudFront all support query-string whitelisting or blacklisting. Whitelist is safer — explicitly name the parameters that affect the response (color, size, page) and ignore everything else.
Rule 2: Normalise query-param order. ?color=red&size=M and ?size=M&color=red should map to the same cache key. Some CDNs do this automatically; others require edge logic to sort parameters before caching.
Rule 3: Use Vary sparingly. Vary: Accept-Encoding is fine (2-3 variants: gzip, brotli, identity). Vary: Accept-Language is acceptable if you serve genuinely different content per language (5–20 variants). Vary: Cookie is almost always wrong for public content — it creates one cache entry per session, destroying hit rate.
Rule 4: Monitor cardinality. If your CDN reports 50M distinct cache keys but only 2M distinct pages, you have a key explosion problem. The ratio of keys to logical resources should be close to 1:1 (times the number of Vary dimensions).
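The cardinality check in Rule 4 reduces to one ratio. The sketch below assumes you can obtain a distinct-key count from CDN analytics and a page count from your own catalogue; the function name is illustrative.

```python
def key_explosion_factor(distinct_keys: int, logical_resources: int,
                         vary_variants: int = 1) -> float:
    """Ratio of observed cache keys to the number you would expect; about 1.0 is healthy."""
    expected = logical_resources * vary_variants
    return distinct_keys / expected
```

For the figures above (50M keys, 2M pages, no Vary dimensions) the factor is 25, meaning each page is stored under roughly 25 different keys.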
Cache key for API responses. API GETs can be cached if the response depends only on the URL and a few headers. Example: /api/products?category=shoes with Vary: Accept-Language caches per language. But /api/feed with Authorization header must use Cache-Control: private — the CDN must never serve one user's feed to another.
Interview signal. When a candidate says "put it behind a CDN," ask "what is the cache key?" If they can articulate the key design, strip rules, and cardinality concern, they are operating at a senior level.
TTL strategies — immutable, short-lived, and stale-while-revalidate
Static hashed assets: 1yr immutable. API responses: 30s TTL. HTML: stale-while-revalidate flow.
TTL (Time To Live) controls how long a cached response is considered fresh. Setting the right TTL per content type is the second most impactful CDN decision after cache key design.
Immutable assets (1 year). Any asset whose URL contains a content hash — app.a1b2c3.js, style.d4e5f6.css — is immutable by definition. If the content changes, the hash changes, and the URL changes. Set Cache-Control: public, max-age=31536000, immutable. The CDN will never revalidate this URL. This is the gold standard for static assets and the reason modern build tools (Webpack, Vite, esbuild) put hashes in filenames.
HTML pages and API responses (short TTL + SWR). These change unpredictably. A 60-second max-age means a price update takes at most 60 seconds to propagate. Adding stale-while-revalidate=300 means the PoP serves stale content instantly for up to 5 minutes while fetching a fresh copy in the background. The user never waits for origin; the cache self-heals within one revalidation cycle.
The SWR flow in detail:
1. Request arrives. max-age has expired but SWR window is still open.
2. PoP serves the stale response immediately (0 ms added latency).
3. PoP sends an asynchronous revalidation request to origin.
4. Origin returns fresh response. PoP replaces the cached entry.
5. Next request gets the fresh response.
This means staleness is bounded by max-age plus the SWR window (60 s + 300 s = 360 s in the example above), not by max-age alone, and the user experience is always fast. SWR is the single best directive for balancing freshness and performance.
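The SWR flow comes down to a three-way decision on the age of the cached entry. The sketch below follows the stale-while-revalidate semantics of RFC 5861; the function name and return labels are made up for illustration.

```python
def swr_decision(age_s: float, max_age_s: float, swr_s: float) -> str:
    """Classify a cached entry by age under max-age + stale-while-revalidate."""
    if age_s < max_age_s:
        return "fresh"               # serve from cache, no origin contact
    if age_s < max_age_s + swr_s:
        return "stale-revalidate"    # serve stale now, refresh in the background
    return "expired"                 # too old: the request must wait for origin
```

With max-age=60 and stale-while-revalidate=300, an entry can be served without waiting on origin until it is 360 seconds old.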
Private / personalised content (no cache). Any response that depends on the user's identity — dashboard data, account settings, cart contents — must be Cache-Control: private, no-store. The CDN must never cache it. Use the shell-and-slot pattern to cache the surrounding page while keeping personalised slots private.
s-maxage vs max-age. s-maxage sets the TTL for shared caches (CDN) independently from max-age (browser cache). Example: Cache-Control: public, max-age=0, s-maxage=60 tells the browser to always revalidate but lets the CDN serve cached content for 60 seconds. This is useful when you want the CDN to absorb traffic but the browser to always check freshness.
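How a shared cache and a browser read the same header differently can be sketched with a small parser. This is a simplified reading of the HTTP caching rules that handles only the directives discussed here; the function names are assumptions for the example.

```python
from typing import Optional


def parse_cache_control(header: str) -> dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: value or None}."""
    directives: dict[str, Optional[str]] = {}
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        directives[name.strip().lower()] = value.strip().strip('"') if value else None
    return directives


def freshness_lifetime(header: str, shared_cache: bool) -> int:
    """Seconds a response may be served from cache; 0 means do not reuse it."""
    d = parse_cache_control(header)
    if "no-store" in d:
        return 0
    if shared_cache:
        if "private" in d:
            return 0                      # a CDN must not store private responses
        if d.get("s-maxage") is not None:
            return int(d["s-maxage"])     # s-maxage overrides max-age for shared caches
    if d.get("max-age") is not None:
        return int(d["max-age"])
    return 0
```

For `public, max-age=0, s-maxage=60` the CDN gets 60 seconds of freshness while the browser gets none, matching the example in the paragraph above.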
Conditional requests (ETag / Last-Modified). When a cached response expires, the PoP can send a conditional request to origin: If-None-Match: "etag-value". If the content hasn't changed, origin returns 304 Not Modified (no body), saving bandwidth. ETags are cheap insurance against unnecessary full responses.
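The origin side of a conditional request might look like the sketch below. It assumes strong ETags derived from a content hash and ignores weak validators and the `*` wildcard; the function names are hypothetical.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Strong ETag derived from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_response(body: bytes, if_none_match: Optional[str] = None) -> tuple[int, dict, bytes]:
    """Return (status, headers, body); 304 with no body when the validator still matches."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if etag in candidates:
            return 304, headers, b""
    return 200, headers, body
```

The PoP stores the ETag from the first 200 response and replays it in If-None-Match once the entry expires; an unchanged body costs only headers on the wire.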
Invalidation at the edge — purge, tags, and versioned URLs
Origin writes DB, then calls CDN purge API with surrogate keys. All PoPs invalidate matching entries. Alternative: deploy new hashed URL.
Tight TTLs self-heal, but sometimes you need immediate invalidation: a price correction, a security vulnerability in a JS bundle, a legal takedown. CDN invalidation is the escape hatch — use it rarely, but know it cold.
Purge by URL. Call the CDN's purge API with the exact URL: POST /purge { "url": "https://shop.com/product/42" }. Every PoP that has this URL cached invalidates it. The next request triggers a miss, fetches from origin, and re-caches. Simple, but does not scale — if you update 10,000 products, you need 10,000 purge calls.
Surrogate keys (cache tags). When origin serves a response, it attaches a Surrogate-Key header: Surrogate-Key: product-42 category-electronics homepage-featured. The CDN indexes the cached response under all those keys. To invalidate everything related to product 42, you purge by tag: POST /purge { "key": "product-42" }. One call invalidates the product page, the category listing, and the homepage feature slot: every URL that carried that tag. Fastly supports this natively with <150 ms global propagation. Cloudflare offers the same mechanism through its Cache-Tag header. CloudFront has limited tag support and relies on Lambda@Edge workarounds.
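Conceptually, purge-by-tag is a reverse index from tag to URLs. The `TaggedCache` class below is a toy single-node model, not a CDN API; it only shows why one purge call can invalidate many URLs.

```python
from collections import defaultdict


class TaggedCache:
    """Toy cache that indexes entries by the tags in their Surrogate-Key header."""

    def __init__(self):
        self.entries: dict[str, str] = {}                      # url -> body
        self.by_tag: dict[str, set[str]] = defaultdict(set)    # tag -> urls

    def store(self, url: str, body: str, surrogate_key_header: str) -> None:
        self.entries[url] = body
        for tag in surrogate_key_header.split():               # tags are space-separated
            self.by_tag[tag].add(url)

    def purge_tag(self, tag: str) -> int:
        """Drop every cached URL carrying the tag; return how many were removed."""
        urls = self.by_tag.pop(tag, set())
        return sum(1 for url in urls if self.entries.pop(url, None) is not None)
```

Purging `product-42` removes the product page and every listing page that embedded it, while pages tagged only with other products stay cached.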
Versioned URLs. Instead of invalidating, change the URL itself. app.v1.js becomes app.v2.js. The old URL expires naturally per its TTL; the new URL is fetched fresh. This is the standard approach for deploy-time assets (JS, CSS, images). Build tools generate content-hashed filenames automatically. No purge API call needed — the deploy itself is the invalidation.
Soft purge vs hard purge. A hard purge removes the cached entry immediately; the next request is a cold miss. A soft purge marks the entry as stale; the next request triggers a background revalidation (SWR-like behaviour). Soft purge avoids miss storms on popular URLs — users get stale content for a few seconds instead of waiting for origin. Use soft purge as the default; hard purge for security-critical invalidation.
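The difference between the two purge modes can be modelled with a stale flag on each entry. In this sketch the refresh after a soft purge runs inline for brevity, whereas a real PoP would do it in the background; the class and status labels are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    body: str
    stale: bool = False


class PurgeableCache:
    def __init__(self, fetch_origin):
        self.fetch_origin = fetch_origin      # callable: url -> body
        self.entries: dict[str, Entry] = {}

    def get(self, url: str) -> tuple[str, str]:
        entry = self.entries.get(url)
        if entry is None:
            body = self.fetch_origin(url)     # cold miss: the client waits for origin
            self.entries[url] = Entry(body)
            return "MISS", body
        if entry.stale:
            stale_body = entry.body
            # Serve the old copy; the refresh is done inline here only to keep the sketch short.
            self.entries[url] = Entry(self.fetch_origin(url))
            return "STALE", stale_body
        return "HIT", entry.body

    def hard_purge(self, url: str) -> None:
        self.entries.pop(url, None)           # next request is a cold miss

    def soft_purge(self, url: str) -> None:
        if url in self.entries:
            self.entries[url].stale = True    # next request is served stale, then refreshed
```

After a soft purge the first request still gets an immediate (stale) answer, while after a hard purge it pays a full origin round trip.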
Propagation time matters. Fastly propagates purges in <150 ms globally. CloudFront takes 1–5 minutes. Akamai varies by configuration. If your use case requires <1 s invalidation (security patches, price corrections on high-traffic pages), choose a CDN with fast purge propagation — it is a capability differentiator, not just a speed claim.
Interview tip. When discussing invalidation, name the trade-off: purge is fast but operationally risky (purge the wrong tag and origin gets hammered). Versioned URLs are safe but require a deploy. Tight TTL + SWR is the low-risk default; purge is the emergency lever.
Shell-and-slot — caching pages with personalised fragments
CDN serves cached HTML shell (nav, layout, product info). Browser JS fetches personalised slot (cart count, recs) from origin API.
The hardest CDN problem is a page that is 90% shared and 10% personalised. An e-commerce product page has the same product title, images, description, and price for every user — but the cart count, recently-viewed recommendations, and "Hello, Alice" greeting are per-user. Caching the whole page per user destroys hit rate. Serving the whole page from origin wastes the CDN.
Shell-and-slot architecture:
1. The shell is the static/shared portion of the page: HTML layout, navigation, product information, footer. It is served by the CDN with a standard TTL (60 s + SWR 300 s). Hit rate: 95%+.
2. The slots are the personalised fragments: cart count, user greeting, recommendation carousel, wishlist indicator. They are fetched by client-side JavaScript as small JSON API calls after the shell loads. These API calls use Cache-Control: private, no-store, so the CDN never caches them.
3. The browser assembles the final page by injecting slot content into placeholder divs in the shell. The user sees the shell instantly (CDN speed), then slots appear within 100–300 ms as API calls complete.
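The assembly step is the same whether it runs in the browser or in an edge worker: find each placeholder and inject the slot value. Below is a minimal Python sketch of that step, assuming a hypothetical `data-slot` attribute as the placeholder convention and plain-text slot values.

```python
import html
import re

# Matches an empty placeholder element such as <span data-slot="cart-count"></span>
SLOT = re.compile(r'(<(\w+) data-slot="([\w-]+)">)(</\2>)')


def fill_slots(shell: str, slots: dict[str, str]) -> str:
    """Inject personalised slot values into the cached shell, escaping them as text."""
    def replace(match: re.Match) -> str:
        content = html.escape(slots.get(match.group(3), ""))
        return f"{match.group(1)}{content}{match.group(4)}"
    return SLOT.sub(replace, shell)
```

The shell string is identical for every user and therefore cacheable; only the small `slots` dictionary is per-user.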
Why this works. The shell is the heavy part — HTML, CSS references, product images, structured data. Serving it from edge saves 80–95% of bandwidth and latency. The slots are tiny (a few hundred bytes of JSON each) and cheap for origin to serve. Total origin load: a handful of small API calls per page view instead of rendering the entire page.
Server-side vs client-side slot filling. Edge compute enables a server-side variant: the edge worker fetches the cached shell, fetches slot data from origin API, assembles the HTML, and returns a complete page. The user gets a fully rendered page from the edge with no client-side JS required. This improves perceived performance and SEO (no layout shift from slot injection). The trade-off is edge compute cost per request.
Cross-reference: read-heavy pattern. Shell-and-slot is the CDN-side complement to the read-heavy pattern's cache-aside strategy. In read-heavy, the application cache (Redis/Memcached) absorbs database reads. In shell-and-slot, the CDN absorbs HTTP reads. A fully optimised read path uses both: CDN serves the shell, browser fetches slots, slots hit the application cache before touching the database. That gives three layers (CDN, application cache, database), each absorbing a different class of traffic.
Pitfall: layout shift. If slots change the page layout when they load (e.g., a recommendation carousel pushes content down), the user experiences Cumulative Layout Shift (CLS), which hurts Core Web Vitals scores. Reserve space in the shell for each slot (fixed-height placeholder) to prevent shift.
Origin shield — collapsing miss storms
Multiple edge PoPs miss simultaneously. Without shield, all hit origin. With shield, one PoP absorbs the miss storm.
When a popular URL's TTL expires, every PoP that had it cached simultaneously discovers the entry is stale. Without protection, 200+ PoPs send concurrent requests to origin. Origin sees a traffic spike of 200× baseline for a few seconds every TTL cycle. This is a miss storm — the edge equivalent of a cache stampede.
How origin shield works. A single PoP is designated as the shield for a given origin. All edge PoPs route their misses to the shield instead of directly to origin. The shield acts as a second-tier cache:
1. Edge PoP A has a miss for /product/42.
2. Edge PoP A sends the request to Shield PoP (US-Central).
3. Shield PoP checks its own cache. If HIT, returns to PoP A.
4. If shield also misses, shield sends exactly one request to origin.
5. While shield waits for origin, PoP B and PoP C also send misses for /product/42 to shield.
6. Shield coalesces all waiting requests. Origin processes one request.
7. Shield caches the response and fans it out to PoP A, B, C simultaneously.
Result: origin sees one request instead of 200+. The miss storm is collapsed.
Request coalescing at the shield. This is the key mechanism. When multiple requests for the same cache key arrive at the shield while an origin fetch for that key is already in flight, the shield holds them in a queue, sends one request upstream, and broadcasts the response to all waiters. Fastly calls this "request collapsing" and Varnish calls it "request coalescing"; both support it natively. Without coalescing, even a shield would forward concurrent misses, just from one PoP instead of 200.
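The coalescing behaviour described above can be sketched as an in-memory map of in-flight fetches. This is a minimal illustration, not how any particular CDN implements it; the `Coalescer` class and the fake origin fetcher are hypothetical names introduced for this example.

```typescript
// Minimal request coalescer: concurrent gets for one key share one upstream fetch.
type Fetcher<T> = (key: string) => Promise<T>;

class Coalescer<T> {
  // Requests currently in flight, keyed by cache key.
  private inflight = new Map<string, Promise<T>>();
  private fetchUpstream: Fetcher<T>;

  constructor(fetchUpstream: Fetcher<T>) {
    this.fetchUpstream = fetchUpstream;
  }

  get(key: string): Promise<T> {
    const pending = this.inflight.get(key);
    if (pending) return pending; // join the fetch that is already running

    const request = this.fetchUpstream(key);
    // Clear the slot on success or failure so later requests fetch again.
    const clear = () => { this.inflight.delete(key); };
    request.then(clear, clear);
    this.inflight.set(key, request);
    return request;
  }
}

// Usage: three concurrent misses for the same key produce one origin call.
let originCalls = 0;
const shield = new Coalescer<string>(async (key) => {
  originCalls += 1; // stands in for a real origin request
  return `body for ${key}`;
});
const waiters = [
  shield.get('/product/42'),
  shield.get('/product/42'),
  shield.get('/product/42'),
];
```

All three callers hold the same promise, so the response is fanned out to every waiter once the single upstream fetch settles. A real shield would also write the response into its cache before clearing the in-flight entry.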
Shield placement strategy. Place the shield PoP in the same cloud region as your origin. Shield-to-origin latency should be <5 ms (same-region) rather than 30–100 ms (cross-continent). This minimises the time window during which other PoPs' misses can pile up.
Multi-origin shield. If you have origins in multiple regions (us-east, eu-west, ap-south), configure separate shields per origin region. Each shield serves the PoPs geographically closest to its origin. This is Cloudflare's "tiered caching" model — a hierarchy of PoPs with configurable topology.
Cost-benefit. Shield adds ~$0.001 per 10K requests and 5–30 ms per cold miss. In exchange, it can reduce origin request volume by 50–90% for content with synchronised TTL expiry. For origins with hard capacity limits (legacy on-prem, rate-limited APIs), shield is not optional — it is a reliability requirement.
Edge compute — when the CDN becomes the app tier
PoP runs application logic (Workers, Compute@Edge). Only API / data calls reach origin. Static + dynamic edge responses.
Edge compute platforms blur the line between CDN and application server. A Cloudflare Worker or Fastly Compute function runs your code at the PoP, with sub-millisecond cold start (V8 isolates) or <50 ms (Wasm), serving computed responses without an origin round-trip.
Architecture shift. In a traditional CDN setup, the PoP is a dumb cache: it stores responses keyed by URL and replays them. With edge compute, the PoP is a programmable node: it receives a request, runs your handler function, optionally reads from edge KV storage, optionally calls origin for data, and returns a computed response. The PoP is no longer pass-through — it is an application tier.
Where edge compute adds value:
- Auth at the edge. Validate JWTs without an origin round-trip. Reject unauthenticated requests at the PoP. This reduces origin load and attack surface simultaneously. Edge KV stores the JWKS (JSON Web Key Set) for validation.
- A/B testing without flicker. Traditional client-side A/B testing shows the default variant, then flickers to the test variant after JS loads. Edge compute selects the variant before HTML is returned — no flicker, no layout shift, no client-side SDK.
- Geo-personalisation. The CDN injects geo headers (country, city, region). Edge compute uses these to serve region-specific pricing, language, or compliance banners without origin involvement.
- HTML assembly from fragments. The edge worker fetches a cached HTML shell and multiple cached JSON fragments, assembles a complete page, and returns it. Each fragment has its own TTL and cache key. This is ESI (Edge Side Includes) done right — programmatic, testable, debuggable.
- API gateway functions. Rate limiting, request validation, header transformation, CORS handling — lightweight middleware that runs at the edge instead of at origin.
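As an illustration of the A/B item in the list above, variant selection at the edge can be a deterministic hash of a visitor identifier, so the same visitor always lands in the same bucket without any stored state. This is a sketch under assumptions: `pickVariant`, the visitor id, and the experiment name are hypothetical, and the hash is an FNV-1a variant computed over UTF-16 code units rather than bytes.

```typescript
// FNV-1a style 32-bit hash: deterministic and cheap enough for an edge CPU budget.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash;
}

// Assign a visitor to a variant; treatmentPercent is the share (0..100) sent to treatment.
function pickVariant(
  visitorId: string,
  experiment: string,
  treatmentPercent: number,
): 'control' | 'treatment' {
  const bucket = fnv1a(`${experiment}:${visitorId}`) % 100; // 0..99
  return bucket < treatmentPercent ? 'treatment' : 'control';
}

// Usage: the same visitor gets the same answer on every request.
const first = pickVariant('visitor-123', 'new-checkout', 50);
const second = pickVariant('visitor-123', 'new-checkout', 50);
```

If the variant changes the HTML, it also has to be part of the cache key, otherwise one variant's page would be served to visitors in the other bucket.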
Constraints to name in an interview:
1. Memory limit. Workers: 128 MB. Lambda@Edge: 128–512 MB. You cannot load a machine learning model or process large files.
2. CPU time. Workers: 10–50 ms CPU time per request (plan-dependent). Long computations must happen at origin.
3. No persistent connections. Edge runtimes are request-scoped. You cannot maintain WebSocket connections or database connection pools. Each request opens connections fresh (or uses connection pooling services like Hyperdrive).
4. Distributed state. Edge KV stores are eventually consistent with ~60 s propagation. If you need strong consistency, you need origin.
Cost model. Workers: $0.50 per million requests + $12.50 per million ms CPU time. At 100M requests/month, that is $50 in invocations alone — cheap for auth routing, expensive for heavy computation. The decision framework: if the logic is <5 ms CPU and benefits from geographic proximity, run it at the edge. Otherwise, keep it at origin.
Case studies
Netflix Open Connect — the ultimate push CDN
Netflix built its own CDN (Open Connect) because no commercial CDN could handle 15%+ of global internet traffic during peak hours. Open Connect Appliances (OCAs) are custom servers deployed inside ISP networks worldwide — over 17,000 servers in 6,000+ ISP sites across 175+ countries.
Architecture. Netflix's control plane (running on AWS) determines which video segments are popular in each region. During off-peak hours, it pre-positions (pushes) video content to OCAs in the relevant ISPs. When a user presses play, the Netflix client receives a manifest pointing to the nearest OCA. The OCA serves the video segments directly — the traffic never leaves the ISP's network.
Cache key design. Video files are chunked into ~5-second segments, each encoded at multiple bitrates. The cache key is effectively segment_id + bitrate. Content hashes ensure byte-identical segments across all OCAs. There are no Vary headers — the manifest selects the right bitrate URL.
Numbers. Open Connect serves 100+ Tbps of peak throughput globally. OCAs use custom FreeBSD-based software optimised for sequential disk reads. A single OCA can saturate 100 Gbps of network throughput. Cache hit rate exceeds 95% because the push model pre-warms content based on predictive popularity models.
Invalidation. Content that is removed (licensing expiry) is deleted from OCAs via the control plane. No purge API needed — the manifest simply stops listing the segments, and OCAs garbage-collect unreferenced content.
Takeaway
At Netflix scale, the CDN is not a service you buy — it is infrastructure you build. But the principles (push popular content, cache key by content hash, pre-warm based on prediction) apply at every scale.
Shopify Storefront — edge caching for millions of stores
Shopify serves millions of online stores, each with its own domain, theme, and product catalogue. The storefront is the public-facing read path — product pages, collection pages, the homepage — and it must be fast globally for every store.
Architecture. Shopify routes all storefront traffic through Cloudflare's CDN. Each store's pages are cached at the edge with a cache key of host + path + selected query params (page, sort, filter). The Vary header includes Accept-Language for stores with multiple languages. Tracking parameters are stripped at the edge.
TTL strategy. Product pages use a short TTL (30–60 s) with stale-while-revalidate (300 s). When a merchant updates a product (price, title, image), Shopify's backend publishes a webhook that triggers a targeted cache purge using Cloudflare's Cache-Tag API. The product page, any collection page featuring that product, and the store's sitemap are all invalidated via surrogate keys.
Shell-and-slot. Shopify's storefront uses a variant of shell-and-slot. The page HTML (shell) is cached and contains the product info, images, and layout. Dynamic elements (cart count, recently viewed, personalised recommendations) are loaded via client-side JavaScript fetching the Storefront API (Cache-Control: private).
Numbers. Shopify reports that edge caching offloads 80–90% of storefront read traffic from origin. During Black Friday / Cyber Monday, edge caching is the primary defence against traffic spikes — origin capacity is sized for misses only, not total traffic. Peak traffic exceeds 1M requests/second across all stores, with edge serving >800K/s from cache.
Lesson for interviews. Shopify's approach is replicable: pull CDN + surrogate key purge + shell-and-slot + short TTL with SWR. You do not need Netflix-scale custom infrastructure. A well-configured commercial CDN with disciplined cache keys and invalidation covers most e-commerce use cases.
Takeaway
Commercial CDN + surrogate key purge + shell-and-slot + SWR is the replicable playbook for high-traffic e-commerce — no custom CDN needed.
Vercel / Next.js ISR — CDN-native incremental static regeneration
Vercel's Incremental Static Regeneration (ISR) integrates CDN caching directly into the application framework. Instead of choosing between static site generation (all pages built at deploy time) and server-side rendering (every page built on request), ISR generates pages on demand and caches them at the edge with automatic revalidation.
How ISR works.
1. A page is requested for the first time. The edge has no cached version.
2. The edge worker calls the Next.js server to render the page (server-side rendering for this one request).
3. The rendered HTML is cached at the edge with a TTL (the revalidate interval, e.g., 60 seconds).
4. Subsequent requests within the TTL are served from edge cache instantly.
5. After the TTL expires, the next request triggers a background revalidation: the edge serves the stale page to the user while asynchronously calling the server to render a fresh version.
6. The fresh version replaces the cached entry. Next request gets fresh content.
This is stale-while-revalidate implemented at the framework level, not just the HTTP header level. The framework controls the render, the cache key, and the revalidation trigger.
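The six steps above reduce to a three-way decision per request. The sketch below models only that decision; it is not Next.js or Vercel code, and the `CacheEntry` shape and `isrDecision` name are assumptions made for illustration.

```typescript
// A cached page and the time it was rendered (milliseconds since epoch).
type CacheEntry = { html: string; renderedAt: number };

type Decision =
  | 'render-and-cache'            // steps 1-3: nothing cached yet
  | 'serve-fresh'                 // step 4: within the revalidate interval
  | 'serve-stale-and-revalidate'; // steps 5-6: stale now, re-render in background

function isrDecision(
  entry: CacheEntry | undefined,
  nowMs: number,
  revalidateSeconds: number,
): Decision {
  if (!entry) return 'render-and-cache';
  const ageSeconds = (nowMs - entry.renderedAt) / 1000;
  return ageSeconds < revalidateSeconds ? 'serve-fresh' : 'serve-stale-and-revalidate';
}

// Usage with a 60-second revalidate interval.
const page: CacheEntry = { html: '<h1>Product 42</h1>', renderedAt: 0 };
const atTenSeconds = isrDecision(page, 10_000, 60);
const atNinetySeconds = isrDecision(page, 90_000, 60);
```

Only the first branch makes the user wait for a render; the stale branch answers immediately and refreshes the entry afterwards.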
On-demand ISR. Next.js also supports on-demand revalidation: when a CMS webhook fires (content updated), your API route calls res.revalidate('/product/42'), which triggers an immediate re-render and cache update. This is equivalent to surrogate key purge but integrated into the framework's build pipeline.
Cache key. ISR pages are keyed by URL path + locale. Vercel's edge network handles the rest: stripping query params, normalising paths, managing per-locale variants.
Numbers. Vercel reports sub-50 ms TTFB for ISR-cached pages (edge response). Background revalidation adds no latency to the user request. For a catalogue of 100K products, ISR means you deploy instantly (no build step for 100K pages) and each page is generated on first visit, then cached.
Trade-off. ISR couples your caching strategy to the Next.js framework. If you are not on Next.js (or Nuxt, SvelteKit, Astro with similar features), you implement the same pattern manually: render on miss, cache at edge with SWR headers, purge on content change.
Takeaway
ISR is stale-while-revalidate implemented at the framework level — the pattern is CDN-native rendering with automatic revalidation, replicable in any stack with SWR headers and purge-on-write.
Decision levers
Pull vs push CDN
Pull (default): origin serves; CDN caches on miss. Zero setup; cold-start misses cost one origin request. Push: pre-upload assets before users request them — no cold start but requires an upload pipeline. Use push for large known assets (video segments, software binaries); pull for everything else.
Cache-key design
Strip tracking params (utm_*, fbclid) so ad links do not shatter the cache. Normalise query-param order. Add Vary only when the response actually differs — never Vary: Cookie for public content. Monitor key cardinality: keys/pages ratio should be close to 1:1 per Vary dimension.
TTL strategy
Hashed assets: 1 year immutable. HTML shells: 30–60 s + SWR 300 s. API responses: 5–60 s. Private content: no-store. Use s-maxage to set CDN TTL independently from browser TTL. SWR is the default directive for any content that tolerates seconds of staleness.
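To make the split between CDN TTL and browser TTL concrete, the sketch below computes the freshness lifetime each kind of cache would derive from a Cache-Control value. It is deliberately simplified: it ignores Expires, Age, and heuristic freshness, and `freshnessSeconds` is a hypothetical helper name.

```typescript
type CacheKind = 'shared' | 'browser';

// Freshness lifetime in seconds that a cache of the given kind derives from Cache-Control.
function freshnessSeconds(cacheControl: string, kind: CacheKind): number {
  const directives = new Map<string, string>();
  for (const part of cacheControl.split(',')) {
    const [name, value = ''] = part.trim().toLowerCase().split('=');
    if (name) directives.set(name, value);
  }
  if (directives.has('no-store')) return 0;                         // nobody may store it
  if (kind === 'shared' && directives.has('private')) return 0;     // CDN must not store it
  if (kind === 'shared' && directives.has('s-maxage')) {
    return Number(directives.get('s-maxage'));                      // s-maxage wins for the CDN
  }
  return directives.has('max-age') ? Number(directives.get('max-age')) : 0;
}

// Usage: the CDN keeps the page for 60 s while the browser always revalidates.
const header = 'public, s-maxage=60, max-age=0';
const cdnTtl = freshnessSeconds(header, 'shared');
const browserTtl = freshnessSeconds(header, 'browser');
```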
Invalidation mechanism
Surrogate key purge for content updates (one API call invalidates all tagged URLs). Versioned URLs for deploy-time assets (content hash in filename). Soft purge as default to avoid miss storms on popular URLs. Hard purge for security-critical invalidation only.
Origin shield
Enable shield when origin is capacity-constrained, content has synchronised TTL expiry, or you have many PoPs with long-tail content. Place shield in the same region as origin. Cost is minimal (~$0.001/10K requests); origin protection is substantial.
Failure modes
Tracking parameters, excessive Vary headers, or unnormalised query strings create millions of cache entries with single-digit hit counts. Fix: strip tracking params, whitelist meaningful query params, normalise order, monitor cardinality.
Adding Vary: Cookie to public pages creates one cache entry per user session. Hit rate drops to ~0%. Fix: never Vary on Cookie for shared content. Use shell-and-slot for personalisation.
Popular URL expires simultaneously across 200+ PoPs. All hit origin at once. Fix: origin shield + request coalescing. Also jitter TTLs slightly (± random seconds) to desynchronise expiry.
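The TTL jitter mentioned in the fix above can be a few lines at origin. This is a sketch, assuming a uniform spread around the base TTL; `jitteredTtl` is a hypothetical helper, and the random source is injectable so the behaviour can be checked deterministically.

```typescript
// Return a TTL spread uniformly within +/- spreadSeconds of the base, never below 1 second.
function jitteredTtl(
  baseSeconds: number,
  spreadSeconds: number,
  random: () => number = Math.random,
): number {
  const offset = (random() * 2 - 1) * spreadSeconds; // uniform in [-spread, +spread)
  return Math.max(1, Math.round(baseSeconds + offset));
}

// Usage: each response gets a slightly different s-maxage, so expiry is desynchronised.
const headerValue = `public, s-maxage=${jitteredTtl(60, 5)}, stale-while-revalidate=300`;
```

Jitter only helps when different responses or different PoPs receive different values, so it complements origin shield rather than replacing it.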
Content updated in DB but CDN still serves old version until TTL expires. Fix: trigger surrogate key purge on write. Or use tight TTL + SWR so staleness is bounded to seconds.
Mass purge (e.g., purge all products) causes 50K simultaneous misses from all PoPs. Origin buckles under the refill load. Fix: soft purge (serve stale while revalidating), rate-limit purge calls, or stagger purges over time.
Origin does not set Cache-Control on responses. CDN applies its own default (which varies by provider — some cache, some do not). Fix: always set explicit Cache-Control headers. Treat missing headers as a bug, not a CDN config issue.
Lambda@Edge cold starts add 50–200 ms on first invocation. Users hitting cold PoPs experience degraded latency. Fix: use V8 isolate-based platforms (Cloudflare Workers, Deno Deploy) with <5 ms cold start, or pre-warm Lambda functions.
Decision table
CDN topology comparison
| Dimension | No CDN | Pull CDN | CDN + Shield | Edge Compute |
|---|---|---|---|---|
| Latency (global) | 150–300 ms | 5–50 ms (HIT) | 5–50 ms (HIT) | <10 ms |
| Origin load | 100% | 5–20% | 1–5% | <1% (API only) |
| Cold-start misses | N/A | Per PoP per key | Per shield per key | Per PoP per key |
| Setup complexity | None | DNS + headers | DNS + headers + shield config | Code + deploy pipeline |
| Miss storm risk | N/A | High | Low | Medium |
| Personalisation | Server-side | Shell-and-slot | Shell-and-slot | Edge assembly |
| Cost | High (origin) | Low (CDN cheap) | Low + shield fee | Medium (per-invoke) |
- Pull CDN is the right default for any public-facing application.
- Add shield when origin protection matters or traffic is bursty.
- Edge compute is justified when sub-20 ms computed responses add measurable business value.
Worked example
Worked example: E-commerce product pages with CDN
Prompt: "Design the read path for an e-commerce site serving 50K product pages to 10M daily visitors globally."
Step 1: Identify cacheable content
Product pages are the core read path. Each page contains:
- Shared content (cacheable): product title, description, images, price, reviews summary, related products, SEO metadata, breadcrumbs, navigation.
- Personalised content (not cacheable): cart count, recently viewed items, user greeting, wishlist indicator, personalised recommendations.
The shared content is identical for every visitor. The personalised content varies per user. This immediately points to the shell-and-slot pattern.
Step 2: CDN topology
Deploy a pull CDN (Cloudflare, Fastly, or CloudFront) with origin shield enabled. Origin runs in us-east-1 with shield PoP in the same region.
Client hits nearest PoP. Cache HIT serves instantly; cache MISS falls through to origin, which queries DB and returns through the PoP for caching.
Step 3: Cache-key design
Cache key: URL path + selected query params (color, size, page) + Accept-Language.
Strip: utm_source, utm_medium, utm_campaign, fbclid, gclid, ref, affiliate. Normalise: sort query params alphabetically. Do not Vary on: Cookie, Authorization, User-Agent.
Expected cardinality: 50K products × 3 languages × 2 encoding variants = 300K cache entries. Well within CDN capacity.
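The key rules in this step can be sketched as a single normalisation function. It is an illustration, not a CDN API: `cacheKey` and `ALLOWED_PARAMS` are hypothetical names, the `|lang=` suffix is an arbitrary key format, reducing Accept-Language to its primary language tag is an assumption, and the `en` fallback is a placeholder default.

```typescript
// Only these query params change the response; everything else (utm_*, fbclid, ...) is dropped.
const ALLOWED_PARAMS = new Set(['color', 'size', 'page']);

function cacheKey(rawUrl: string, acceptLanguage: string): string {
  const url = new URL(rawUrl);

  // Keep whitelisted params only, then sort so param order cannot split the cache.
  const kept: [string, string][] = [];
  url.searchParams.forEach((value, name) => {
    if (ALLOWED_PARAMS.has(name)) kept.push([name, value]);
  });
  kept.sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  const query = kept
    .map(([k, v]) => `${encodeURIComponent(k)}=${encodeURIComponent(v)}`)
    .join('&');

  // Reduce Accept-Language to the primary language of the first preference.
  const firstPreference = acceptLanguage.split(',')[0].split(';')[0].trim().toLowerCase();
  const lang = firstPreference.split('-')[0] || 'en';

  return `${url.pathname}${query ? '?' + query : ''}|lang=${lang}`;
}

// Usage: an ad link and a clean link map to the same cache entry.
const fromAd = cacheKey('https://shop.example/product/42?utm_source=x&size=m&color=red', 'en-GB,en;q=0.9');
const clean = cacheKey('https://shop.example/product/42?color=red&size=m&fbclid=abc', 'en-US');
```

Using a whitelist rather than a blocklist means a new tracking parameter cannot fragment the cache before someone notices it.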
Step 4: TTL strategy
| Content | TTL | Header |
|---|---|---|
| Product page HTML (shell) | 60 s + SWR 300 s | Cache-Control: public, s-maxage=60, stale-while-revalidate=300 |
| Product images | 1 year | Cache-Control: public, max-age=31536000, immutable (content-hashed URLs) |
| JS/CSS bundles | 1 year | Cache-Control: public, max-age=31536000, immutable (content-hashed URLs) |
| Personalised slots (cart, recs) | 0 | Cache-Control: private, no-store |
| API: /api/product/:id | 30 s | Cache-Control: public, s-maxage=30, stale-while-revalidate=120 |
Step 5: Invalidation
When a merchant updates a product (price change, new image, description edit):
1. Backend writes to DB.
2. Backend calls CDN purge API with surrogate key: product-42.
3. CDN invalidates all URLs tagged with product-42: the product page, any category page featuring this product, the homepage if this product is featured.
4. Next request triggers a miss, origin renders fresh content, CDN caches it.
For deploy-time assets (JS, CSS), use versioned URLs (content hash in filename). No purge needed — the deploy changes the URL.
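The tag-to-URL mapping behind surrogate key purge can be modelled in memory to see why one call covers every affected page. This is a toy model of what the CDN tracks internally, not a provider API; `TagIndex` and its methods are hypothetical names.

```typescript
// Toy model of the CDN's surrogate-key index: tag -> set of cached URLs carrying that tag.
class TagIndex {
  private urlsByTag = new Map<string, Set<string>>();

  // Record the tags a cached response carried (what a Surrogate-Key header would list).
  tag(url: string, tags: string[]): void {
    for (const t of tags) {
      let urls = this.urlsByTag.get(t);
      if (!urls) {
        urls = new Set<string>();
        this.urlsByTag.set(t, urls);
      }
      urls.add(url);
    }
  }

  // Purge by tag: return every URL that must be invalidated, in a stable order.
  purge(tag: string): string[] {
    const urls = this.urlsByTag.get(tag);
    return urls ? Array.from(urls).sort() : [];
  }
}

// Usage: product 42 appears on its own page, a category page, and the homepage.
const index = new TagIndex();
index.tag('/product/42', ['product-42', 'category-shoes']);
index.tag('/category/shoes', ['category-shoes', 'product-42', 'product-7']);
index.tag('/', ['product-42']);
index.tag('/product/7', ['product-7', 'category-shoes']);
const invalidated = index.purge('product-42');
```

Origin's job is to emit the right tags when it renders each page; the purge call itself never needs to know which URLs exist.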
Step 6: Shell-and-slot for personalisation
The product page HTML is the shell, cached at the CDN. It contains an empty placeholder div for each personalised slot.
The browser fetches personalised slots from origin API calls:
- GET /api/me/cart-count → returns { count: 3 }
- GET /api/me/recommendations?context=product-42 → returns [ ... ]
These API calls use Cache-Control: private, no-store. Origin serves them from application cache (Redis) → DB. Total origin load: ~2 small JSON calls per page view instead of rendering the full page.
Step 7: Origin offload math
- Total traffic: 10M daily visitors × 5 pages/visit = 50M page views/day ≈ 580 req/s average, 3K req/s peak.
- CDN hit rate (shell): 95% (short TTL + popular products).
- Origin sees: 5% × 580 = 29 req/s average for page renders, plus personalisation API calls (~1,160 req/s for slots).
- Without CDN: origin handles 580 req/s of full page renders. With CDN: 29 req/s renders + 1,160 req/s tiny JSON calls.
- Origin compute savings: ~80%. CDN cost: ~$200/month. Origin savings: ~$2,000/month in compute. Net benefit: 10× ROI.
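The arithmetic in this step can be reproduced with a small function. The inputs are the figures from the bullets above (50M page views/day, 95% hit rate, 2 slot calls per view); `originLoad` is a hypothetical helper. Unrounded, the average is about 579 req/s and the slot traffic about 1,157 req/s; the bullets round to 580 and 1,160.

```typescript
// Average origin load for a shell-and-slot page behind a CDN.
function originLoad(dailyPageViews: number, hitRate: number, slotCallsPerView: number) {
  const avgReqPerSec = dailyPageViews / 86400; // seconds per day
  return {
    avgReqPerSec,
    // Only shell misses reach origin as full page renders.
    renderReqPerSec: avgReqPerSec * (1 - hitRate),
    // Personalised slots bypass the CDN cache, so every view produces these calls.
    slotReqPerSec: avgReqPerSec * slotCallsPerView,
  };
}

// Usage with the worked-example numbers.
const load = originLoad(50000000, 0.95, 2);
const summary = {
  average: Math.round(load.avgReqPerSec),
  renders: Math.round(load.renderReqPerSec),
  slots: Math.round(load.slotReqPerSec),
};
```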
Step 8: Edge compute (optional)
For maximum performance, add an edge worker that:
1. Serves the cached shell.
2. Validates the user's JWT at the edge.
3. Fetches personalised slot data from origin API.
4. Assembles the complete HTML at the edge.
5. Returns a fully rendered page in <50 ms.
This eliminates client-side JS for slot injection, improves SEO (complete HTML for crawlers), and eliminates layout shift. Trade-off: per-request edge compute cost (~$0.50/million).
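The assembly step can be sketched as a pure string transformation over the shell. The `data-slot` placeholder markup and the `assemble` function are assumptions made for this example, and the slot HTML is assumed to be already escaped and trusted.

```typescript
// Fill each empty <div data-slot="name"></div> in the shell with that slot's HTML.
function assemble(shell: string, slots: Record<string, string>): string {
  return shell.replace(
    /<div data-slot="([a-z0-9-]+)"><\/div>/g,
    (placeholder: string, name: string) => {
      const html = slots[name];
      // Leave the placeholder untouched if no data arrived, so client-side JS can still fill it.
      return html === undefined ? placeholder : `<div data-slot="${name}">${html}</div>`;
    },
  );
}

// Usage: cart count arrived from origin, recommendations did not.
const shellHtml = '<h1>Shoes</h1><div data-slot="cart-count"></div><div data-slot="recs"></div>';
const pageHtml = assemble(shellHtml, { 'cart-count': '3' });
```

Falling back to the untouched placeholder keeps the page usable when a slot call is slow or fails, at the cost of reintroducing client-side filling for that slot.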
Interview playbook
When it comes up
- Prompt mentions "millions of reads" on public/shared content
- Global user base with latency requirements
- Static assets or near-static pages dominate the read path
- "How would you handle traffic spikes / Black Friday?"
- Cost optimisation discussion around bandwidth or compute
Order of reveal
1. Identify cacheable content. Separate shared from personalised. Shared content (product info, static assets) goes to CDN. Personalised content (cart, recs) stays at origin. This is the shell-and-slot split.
2. Name the CDN topology. Pull CDN with origin shield. DNS points to CDN, PoPs cache on miss, shield collapses miss storms.
3. Define the cache key. URL + selected query params + Accept-Language. Strip tracking params. Never Vary on Cookie for public content. Monitor cardinality.
4. Set TTL per content type. Hashed assets: 1yr immutable. HTML shell: 60s + SWR 300s. API responses: 30s. Personalised: private, no-store.
5. Describe invalidation. Surrogate key purge on content write. Versioned URLs for deploy assets. Soft purge as default, hard purge for security.
6. Do the origin offload math. At a 95% hit rate on 50M daily page views, origin sees only the 5% that miss: 2.5M requests. In the worked example the CDN costs roughly a tenth of the origin compute it replaces.
7. Mention edge compute if relevant. For sub-20 ms personalised responses, edge workers can assemble shell + slots at the PoP. Trade-off is per-invocation cost.
Signature phrases
- “The cache key is the single most consequential CDN decision.” — Shows you understand that CDN effectiveness is determined by key design, not just "putting it behind a CDN."
- “Strip tracking params or your hit rate collapses.” — Specific, actionable, and names a real production problem most candidates miss.
- “Origin shield collapses miss storms.” — Shows awareness of the PoP → shield → origin hierarchy and why it matters for origin protection.
- “Shell-and-slot: cache the shared 90%, fetch the personal 10%.” — Names the specific pattern for mixing cached and personalised content.
- “SWR means the user never waits for origin.” — Demonstrates understanding of stale-while-revalidate as the key UX optimisation.
- “Hit rate times traffic equals origin savings — that is the business case.” — Quantitative thinking about CDN ROI, not just "it makes things faster."
Likely follow-ups
“What happens when you need to invalidate a cached page immediately?”
Use surrogate key purge: tag the response with logical keys (product-42, category-shoes) and purge by tag. One API call invalidates all URLs carrying that tag across all PoPs. Fastly propagates in <150 ms. For deploy assets, use versioned URLs — the deploy itself is the invalidation. If the CDN purge is slow (CloudFront: 1–5 min), use tight TTL + SWR as the primary mechanism and reserve purge for emergencies.
“How do you handle personalised content on a CDN-cached page?”
Shell-and-slot pattern. The CDN serves the cached HTML shell (product info, layout, nav). Client-side JS fetches personalised slots (cart count, recommendations) from origin API with Cache-Control: private. Alternatively, an edge worker assembles shell + slots server-side for a fully rendered page without client-side injection.
“What is your cache key for product pages?”
URL path + whitelisted query params (color, size, page) + Vary: Accept-Language. Strip all tracking params (utm_*, fbclid, gclid). Normalise query-param order. Expected cardinality: products × languages × encodings — should be manageable. Monitor the key count to catch explosions.
“How do you prevent a cache stampede at the edge?”
Origin shield + request coalescing. Shield collapses miss storms from 200+ PoPs to one origin request. Request coalescing within each PoP holds concurrent requests for the same key and fans out the response. Combined, origin sees flat traffic regardless of PoP count or TTL synchronisation.
“What is the cost model for a CDN?”
CDN cost = per-request fee + per-GB egress. Typically $0.01/10K requests and $0.01–0.05/GB. Origin savings = (1 - miss_rate) × origin_compute_cost. At 95% hit rate and 50M daily requests, origin handles 2.5M instead of 50M — 20× reduction. CDN cost is usually 3–5× less than the origin compute it replaces.
“When would you NOT use a CDN?”
When every response is fully personalised per user with no shared shell (e.g., a user dashboard with no public content). When compliance requires that data never leaves your origin network (PII, HIPAA-covered data). When the content changes every request (live scores, real-time feeds) — though even here, a 1-second TTL can absorb burst traffic.
Code snippets
// Build Cache-Control header based on content type
function cacheControl(type: 'immutable' | 'page' | 'api' | 'private'): string {
  switch (type) {
    case 'immutable':
      // Hashed static assets: cache forever
      return 'public, max-age=31536000, immutable';
    case 'page':
      // HTML shell: short TTL + stale-while-revalidate
      return 'public, s-maxage=60, stale-while-revalidate=300';
    case 'api':
      // Public API response: short TTL
      return 'public, s-maxage=30, stale-while-revalidate=120';
    case 'private':
      // Personalised content: never cache at CDN
      return 'private, no-store';
  }
}

// Usage in an Express/Next.js handler:
// res.setHeader('Cache-Control', cacheControl('page'));
// res.setHeader('Surrogate-Key', 'product-42 category-shoes');

// Purge CDN cache by surrogate key (Fastly-style API)
async function purgeByTag(tags: string[]): Promise<void> {
  const resp = await fetch('https://api.fastly.com/service/SVC_ID/purge', {
    method: 'POST',
    headers: {
      'Fastly-Key': process.env.FASTLY_API_KEY!,
      'Content-Type': 'application/json',
      'Fastly-Soft-Purge': '1', // soft purge = serve stale while revalidating
    },
    body: JSON.stringify({ surrogate_keys: tags }),
  });
  if (!resp.ok) throw new Error(`Purge failed: ${resp.status}`);
}

// On product update:
// await db.updateProduct(42, { price: 29.99 });
// await purgeByTag(['product-42', 'category-shoes']);

// Express middleware that sets SWR headers based on route
import { Request, Response, NextFunction } from 'express';

const SWR_CONFIG: Record<string, { sMaxAge: number; swr: number }> = {
  '/product/:id': { sMaxAge: 60, swr: 300 },
  '/api/products': { sMaxAge: 30, swr: 120 },
  '/': { sMaxAge: 120, swr: 600 },
};

function swrMiddleware(route: string) {
  const config = SWR_CONFIG[route];
  return (_req: Request, res: Response, next: NextFunction) => {
    if (config) {
      res.setHeader(
        'Cache-Control',
        `public, s-maxage=${config.sMaxAge}, stale-while-revalidate=${config.swr}`
      );
    }
    next();
  };
}

// Cloudflare Worker: validate JWT, serve cached shell, inject user context
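// Assumptions for this sketch: the Env bindings exist and verifyJWT is a helper defined elsewhere.
interface Env {
  JWT_SECRET: string;
  ORIGIN_URL: string;
}
declare function verifyJWT(token: string, secret: string): Promise<unknown>;

export default {
  async fetch(request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // 1. Auth at the edge (user is available for personalisation or rejection logic)
    const token = request.headers.get('Authorization')?.replace('Bearer ', '');
    const user = token ? await verifyJWT(token, env.JWT_SECRET) : null;

    // 2. Fetch cached shell from edge cache (or origin on miss).
    //    The Cache API needs a full URL or Request as the key, not a bare path.
    const url = new URL(request.url);
    const cacheKey = new Request(url.origin + url.pathname, { method: 'GET' });
    const cache = caches.default;
    let shell = await cache.match(cacheKey);
    if (!shell) {
      const originResp = await fetch(env.ORIGIN_URL + url.pathname);
      if (!originResp.ok) return originResp; // do not cache error responses
      // Cache the shell for 60s
      const headers = new Headers(originResp.headers);
      headers.set('Cache-Control', 'public, s-maxage=60');
      shell = new Response(originResp.body, { status: originResp.status, headers });
      // Write to the cache in the background so the response is not delayed
      ctx.waitUntil(cache.put(cacheKey, shell.clone()));
    }

    // 3. Return shell (browser JS fetches personalised slots)
    return shell;
  },
};

# AWS CloudFront distribution with Origin Shield enabled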
# Place shield in the same region as your origin for minimal latency
Resources:
  CDNDistribution:
    Type: AWS::CloudFront::Distribution
    Properties:
      DistributionConfig:
        Origins:
          - Id: AppOrigin
            DomainName: origin.example.com
            OriginShield:
              Enabled: true
              OriginShieldRegion: us-east-1 # Same region as origin
            CustomOriginConfig:
              OriginProtocolPolicy: https-only
        DefaultCacheBehavior:
          TargetOriginId: AppOrigin # required: must match an origin Id above
          ViewerProtocolPolicy: redirect-to-https
          CachePolicyId: !Ref CachePolicy
          OriginRequestPolicyId: !Ref OriginRequestPolicy # resource defined elsewhere (not shown)
        Enabled: true
  CachePolicy:
    Type: AWS::CloudFront::CachePolicy
    Properties:
      CachePolicyConfig:
        Name: ProductPageCache
        DefaultTTL: 60
        MaxTTL: 86400
        MinTTL: 0
        ParametersInCacheKeyAndForwardedToOrigin:
          EnableAcceptEncodingGzip: true
          EnableAcceptEncodingBrotli: true
          CookiesConfig:
            CookieBehavior: none # required; keeps cookies out of the cache key
          HeadersConfig:
            HeaderBehavior: whitelist # required; language is part of the key
            Headers:
              - Accept-Language
          QueryStringsConfig:
            QueryStringBehavior: whitelist
            QueryStrings:
              - color
              - size
              - page
Drills
A product page has a 95% CDN hit rate with 1M daily requests. How many requests reach origin?
5% × 1,000,000 = 50,000 requests/day reach origin. At 86,400 seconds/day, that is ~0.6 req/s average. With origin shield, concurrent misses are collapsed further.
Your CDN reports 10M cache keys for 500K product pages. What is wrong?
Key cardinality is 20× the page count. Likely causes: tracking query params (utm_*, fbclid) not stripped, unnormalised query-param order, or excessive Vary headers. Fix: whitelist meaningful params, strip the rest, normalise order.
A merchant updates a product price. How long until all users see the new price?
With surrogate key purge: <150 ms (Fastly) to 5 min (CloudFront) for purge propagation, plus one origin fetch on next miss. With TTL-only (60s + SWR 300s): worst case roughly 360 seconds, because once the 60 s s-maxage expires the CDN may keep serving the stale copy for up to a further 300 s while revalidation runs. Purge makes it near-instant.
Why is Vary: Cookie dangerous for public pages?
It creates one cache entry per unique Cookie header value. Since every user has a unique session cookie, the CDN stores a separate copy per user. Hit rate drops to ~0% because no two users share a cache entry. Use shell-and-slot instead.
What is the difference between max-age and s-maxage?
max-age sets TTL for all caches (browser + CDN). s-maxage sets TTL for shared caches (CDN) only, overriding max-age for the CDN. Use s-maxage=60, max-age=0 to let the CDN cache for 60s while the browser always revalidates.
When should you use edge compute vs traditional CDN caching?
Use edge compute when you need computed responses near the user: auth validation, A/B bucketing, geo-routing, HTML assembly. Use traditional caching when responses are static or semi-static with standard TTLs. Edge compute adds per-invocation cost; caching is essentially free after the first miss.