Pattern

Content recommendation

When to reach for this

Reach for this when…

"Recommended for you" feeds
Personalised home screens
E-commerce cross-sell / up-sell
"People you may know" / friend suggestions
Music / video autoplay

Not really this pattern when…

Small catalogue (<1000 items) — just popularity-sort
Pure chronological feed (no personalisation)
Regulatory prohibition on personalisation

Good vs bad answer

Interviewer probe

“Design a "Recommended for you" feed.”

Weak answer

"Train a neural network on click data and use it to recommend."

Strong answer

"Two-stage. Candidate generation blends channels: two-tower embedding ANN (top 500), collaborative filtering (top 200), trending + fresh items (top 100), so roughly 800 candidates before dedupe. Ranker is a GBDT scoring each candidate with features from a feature store: user 7-day activity, item age, item CTR, context (time, device). The feature store (Feast or in-house) serves the same features to training jobs and to the online ranker, so there is no train/serve skew. Top 20 returned, with 10% epsilon-greedy swap-ins for exploration. Training data = logged impressions + clicks; retrain daily. Cold-start user: fall back to popular-in-segment. Cold-start item: content-based ANN from item metadata. Latency budget: 50 ms p99, enforced by capping candidate count, batching feature fetches, and keeping caches warm."

Why it wins: it names both stages explicitly, unifies training and serving features through a feature store, and covers exploration, cold-start, and a latency budget.

Cheat sheet

• Two stages: candidate gen (cheap, many) + ranker (expensive, few).
• Always multi-channel candidate gen. No single source.
• Feature store: same code for train + serve.
• Start with GBDT. NN later.
• Always explore. 5–10% random.
• Retrain on a cadence matching drift.
• Cold-start plan for users AND items.

Core concept

The two-stage pattern dominates: candidate generation narrows millions of items to ~1000 quickly; ranking scores those with an expensive model and picks top K.
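A minimal sketch of the funnel in plain Python, assuming in-memory item embeddings and a caller-supplied rank_score function (both hypothetical). Exact brute-force dot product stands in for the ANN index here; a production system would query FAISS, ScaNN or a vector database instead.

```python
import heapq
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Cheap similarity used by the retrieval stage."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec: Vector, item_vecs: Dict[str, Vector], n: int) -> List[str]:
    """Stage 1: exact top-n items by dot product (a stand-in for an ANN index)."""
    return heapq.nlargest(n, item_vecs, key=lambda item: dot(user_vec, item_vecs[item]))


def recommend(
    user_vec: Vector,
    item_vecs: Dict[str, Vector],
    rank_score: Callable[[str], float],
    n_candidates: int = 1000,
    k: int = 20,
) -> List[str]:
    """Stage 2: run the expensive scorer only on the retrieved candidates."""
    candidates = retrieve(user_vec, item_vecs, n_candidates)
    return sorted(candidates, key=rank_score, reverse=True)[:k]


# Toy usage: hypothetical embeddings and ranker scores.
items = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
ranker = {"a": 0.2, "b": 0.9, "c": 0.5}.get
print(recommend([1.0, 0.0], items, ranker, n_candidates=2, k=1))  # ['b']
```

Item c has the second-best ranker score but is never scored when n_candidates=2, because stage 1 did not retrieve it: the ranker can only reorder what retrieval hands it, which is why recall in candidate generation gets its own channels and tuning.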

Candidate generation techniques (cheap retrieval):

Collaborative filtering / matrix factorisation — users who liked X also liked Y. Batch-computed; often via ALS or two-tower embeddings.
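Content-based / embedding retrieval: embed items (from metadata or content) and users in one vector space, then query an ANN index (FAISS, ScaNN, Pinecone) for nearest neighbours. A well-tuned index can return results in milliseconds even over 100M+ items.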
Heuristic channels — "trending now", "new in your city", "friends of friends". Simple SQL/Redis aggregations. Always include a few.
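Two of the cheap channels, sketched under the assumption that user histories and recent events fit in memory (a real system would run these as batch jobs or SQL/Redis aggregations). The CF channel uses plain item-item co-occurrence counts, a deliberately simpler technique swapped in for ALS or two-tower training; the trending channel is a windowed count.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def co_occurrence(histories: Dict[str, Set[str]]) -> Dict[str, Counter]:
    """Batch job: count how often each pair of items was liked by the same user."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_liked(counts: Dict[str, Counter], seed: str, n: int) -> List[str]:
    """CF channel: the items most often co-liked with the seed item."""
    return [item for item, _ in counts.get(seed, Counter()).most_common(n)]


def trending(events: Iterable[Tuple[float, str]], now: float, window_s: float, n: int) -> List[str]:
    """Heuristic channel: most-interacted items in the last window_s seconds."""
    recent = Counter(item for ts, item in events if now - window_s <= ts <= now)
    return [item for item, _ in recent.most_common(n)]
```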

Ranking (expensive scoring): a model (GBDT, DeepFM, two-tower) scores each candidate against rich features. Features come from a feature store (Feast, Tecton, in-house): user features (demographics, recent activity), item features (title, category, age, CTR), context features (time of day, device). Training labels come from implicit feedback (clicks, watch time).

The feedback loop: every served request generates logs (impressions, clicks, plays). Those logs are the training data for tomorrow's model. Retrain daily / weekly. Watch for feedback-loop pathologies: the model recommends what it's already recommending and stops exploring.
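A sketch of turning logs into labelled rows, assuming hypothetical impression and click logs that share a request_id. Only items that were actually shown become rows; anything the old model never surfaced is invisible to the next one, which is the selection bias behind the feedback-loop pathologies.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Impression:
    request_id: str  # hypothetical ID shared by the impression and click logs
    user_id: str
    item_id: str
    position: int    # slot on screen, kept so position bias can be modelled


def build_training_rows(
    impressions: List[Impression],
    clicks: Set[Tuple[str, str]],  # (request_id, item_id) pairs from the click log
) -> List[Dict[str, object]]:
    """Label every logged impression: 1 if it was clicked, 0 if shown but ignored."""
    rows = []
    for imp in impressions:
        rows.append({
            "user_id": imp.user_id,
            "item_id": imp.item_id,
            "position": imp.position,
            "label": int((imp.request_id, imp.item_id) in clicks),
        })
    return rows
```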

Exploration matters: pure greedy ranking → filter bubble. Inject a small fraction of recommendations the model is uncertain about or has rarely shown (epsilon-greedy, Thompson sampling) to keep the model learning.
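A minimal epsilon-greedy sketch, assuming a ranked list from the ranker and a separate pool of exploration candidates (for example fresh or rarely shown items; how the pool is built is an assumption). Each slot is independently swapped for a random pool item with probability epsilon.

```python
import random
from typing import List, Optional, Sequence


def epsilon_greedy_slate(
    ranked: Sequence[str],
    explore_pool: Sequence[str],
    k: int,
    epsilon: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Take the ranker's top k, then swap each slot for an exploration item with probability epsilon."""
    rng = rng or random.Random()
    slate = list(ranked[:k])
    # Never swap in something that is already on the slate.
    pool = [item for item in explore_pool if item not in slate]
    for i in range(len(slate)):
        if pool and rng.random() < epsilon:
            slate[i] = pool.pop(rng.randrange(len(pool)))
    return slate
```

In practice it helps to log which slots were exploratory, so their clicks can be analysed separately from the greedy slots.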

Canonical examples

→ YouTube watch-next
→ Spotify Discover Weekly
→ Amazon "customers also bought"
→ Instagram / TikTok feed
→ Netflix home row

Decision levers

Candidate-gen source mix

Never one source. Always: CF (or two-tower embeddings) + content-based ANN + trending + a small fresh-items channel. Weighted blend keeps the slate diverse.
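A sketch of a quota-based blend, one simple way (assumed here, not prescribed by the pattern) to implement the weighted mix: each channel gets a share of the candidate budget proportional to its weight, and duplicates across channels are skipped. To keep it short, unused quota from a channel that runs dry is not backfilled.

```python
from typing import Dict, List


def blend_channels(channels: Dict[str, List[str]], weights: Dict[str, float], total: int) -> List[str]:
    """Fill a candidate set from several channels with quotas proportional to their weights.

    Channels are visited in dict order; items already taken from an earlier
    channel are skipped, so overlapping channels do not crowd out the others.
    """
    weight_sum = sum(weights[name] for name in channels)
    seen: set = set()
    blended: List[str] = []
    for name, items in channels.items():
        quota = round(total * weights[name] / weight_sum)
        taken = 0
        for item in items:
            if taken >= quota:
                break
            if item not in seen:
                seen.add(item)
                blended.append(item)
                taken += 1
    return blended[:total]
```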

Offline vs online features

Batch features (user 30-day activity): computed daily, served from feature store. Real-time features (last 5 clicks): streamed via Kafka → feature store's online layer. Skew between offline training features and online serving features is one of the most common causes of "model works in training, fails in prod".
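A toy model of the online layer, assuming a hypothetical in-process store: batch features are overwritten by the daily job, while a stream consumer appends each click to a bounded per-user buffer. The Kafka consumer itself is not shown; on_click is what it would call per event.

```python
from collections import deque
from typing import Deque, Dict


class OnlineUserFeatures:
    """Hypothetical online layer: daily batch features plus a rolling buffer of recent clicks."""

    def __init__(self, max_recent: int = 5):
        self.batch: Dict[str, Dict[str, float]] = {}  # overwritten by the daily batch job
        self.recent: Dict[str, Deque[str]] = {}       # appended to per streamed event
        self.max_recent = max_recent

    def load_batch(self, user_id: str, features: Dict[str, float]) -> None:
        self.batch[user_id] = dict(features)

    def on_click(self, user_id: str, item_id: str) -> None:
        # deque(maxlen=...) silently drops the oldest click once the buffer is full.
        buf = self.recent.setdefault(user_id, deque(maxlen=self.max_recent))
        buf.append(item_id)

    def get(self, user_id: str) -> Dict[str, object]:
        """Merge both layers into the feature dict the ranker sees at request time."""
        features: Dict[str, object] = dict(self.batch.get(user_id, {}))
        features["last_clicks"] = list(self.recent.get(user_id, ()))
        return features
```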

Ranking model

Start with GBDT (LightGBM / XGBoost) — fast, interpretable, strong baseline. Graduate to NN (DeepFM, two-tower) when GBDT plateaus and you have the infra. Don't start with deep learning.

Exploration strategy

Epsilon-greedy (simple: give up a small fraction of slots to random items) or Thompson sampling (principled: a bandit approach that samples from each item's posterior CTR). 5–10% exploration is typical. Without it, coverage collapses.
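A Beta-Bernoulli Thompson sampling sketch, assuming per-item click and impression counts are available. Each request draws a plausible CTR from every item's posterior and ranks by the draw, so items with little data occasionally win a slot and get explored without a fixed epsilon.

```python
import random
from typing import Dict, List, Optional, Tuple


def thompson_slate(
    stats: Dict[str, Tuple[int, int]],  # item -> (clicks, impressions)
    k: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Rank items by a CTR sampled from each item's Beta posterior (uniform Beta(1, 1) prior)."""
    rng = rng or random.Random()
    samples = {}
    for item, (clicks, impressions) in stats.items():
        # Few impressions -> wide posterior -> occasional high draws -> the item gets explored.
        samples[item] = rng.betavariate(clicks + 1, impressions - clicks + 1)
    return sorted(samples, key=samples.get, reverse=True)[:k]


def update(stats: Dict[str, Tuple[int, int]], item: str, clicked: bool) -> None:
    """Fold one observed impression back into the item's counts."""
    clicks, impressions = stats.get(item, (0, 0))
    stats[item] = (clicks + int(clicked), impressions + 1)
```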

Failure modes

Train/serve feature skew (Advanced)

Training features use DB joins; serving features use cached values. They diverge; offline metrics lie. Fix: one codepath generates features for both train and serve (feature store).
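A minimal sketch of the one-codepath fix, assuming both the training job and the server can read the same raw event log (the Event schema is hypothetical). The as_of cutoff is what keeps it honest: training calls the function with each impression's timestamp, serving with the current time, so neither path sees data the other could not. Conceptually this is the role a feature store plays: one registered definition, materialised for training and served online.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    user_id: str
    item_id: str
    ts: float      # seconds since epoch
    clicked: bool


def user_features(events: List[Event], user_id: str, as_of: float) -> Dict[str, float]:
    """The single definition of the user features, shared by training and serving.

    Only events in the 7 days strictly before as_of count, so a training row
    sees exactly what the online ranker would have seen at request time.
    """
    week = 7 * 24 * 3600
    recent = [e for e in events if e.user_id == user_id and as_of - week <= e.ts < as_of]
    clicks = sum(e.clicked for e in recent)
    return {
        "impressions_7d": float(len(recent)),
        "clicks_7d": float(clicks),
        "ctr_7d": clicks / len(recent) if recent else 0.0,
    }
```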

Feedback loop collapse (Advanced)

The model recommends what it already recommends; users can only click what is shown; those clicks reinforce the model. Coverage shrinks. Fix: exploration.
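Collapse tends to show up first in coverage. A minimal sketch of one common tracking metric, catalogue coverage (assumed here; the pattern does not name a specific metric), computed over a day of served slates; a steadily falling value is the warning sign.

```python
from typing import Iterable, List


def catalogue_coverage(slates: Iterable[List[str]], catalogue_size: int) -> float:
    """Fraction of the catalogue that appeared in at least one served slate."""
    shown = set()
    for slate in slates:
        shown.update(slate)
    return len(shown) / catalogue_size if catalogue_size else 0.0
```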

Cold-start

New user / new item has no history. Fix: content-based fallback for items; popularity-in-segment for users; shift progressively toward personalised results as data accrues.
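A sketch of the user-side fallback with a progressive hand-over, assuming a personalised list and a popular-in-segment list are both available. The ramp parameter is a hypothetical tuning knob: below it, slots are split in proportion to how much history the user has; at or above it, the slate is fully personalised. Item cold-start is not shown.

```python
from typing import List


def cold_start_blend(
    personal: List[str],
    popular_in_segment: List[str],
    n_interactions: int,
    k: int,
    ramp: int = 20,
) -> List[str]:
    """Shift slots from segment popularity to personalised results as history accrues."""
    n_personal = min(k, round(k * min(n_interactions, ramp) / ramp))
    slate: List[str] = []
    # Personalised picks first, then segment-popular items fill the rest (deduplicated).
    for item in personal[:n_personal] + popular_in_segment:
        if item not in slate:
            slate.append(item)
        if len(slate) == k:
            break
    return slate
```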

Stale models

Model trained 6 months ago. User preferences drifted; item catalogue changed. Retrain on a schedule (daily for fast-moving catalogues).

P99 latency blown by ranker (Advanced)

An expensive NN scoring N candidates, each needing K feature lookups, multiplies latency quickly. Budget: ~50 ms for the whole rec call. Cap candidate count; batch feature fetches; warm caches.
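A toy illustration of why capping and batching matter, assuming a hypothetical feature store client in which each call costs one network round trip. Fetching per candidate costs N round trips; one multi-key request (the same idea as Redis MGET) costs one.

```python
from typing import Dict, List, Sequence


class FakeFeatureStore:
    """Hypothetical stand-in for an online feature store that counts network round trips."""

    def __init__(self, rows: Dict[str, Dict[str, float]]):
        self.rows = rows
        self.round_trips = 0

    def get(self, item_id: str) -> Dict[str, float]:
        self.round_trips += 1  # one request per key
        return self.rows.get(item_id, {})

    def multi_get(self, item_ids: Sequence[str]) -> List[Dict[str, float]]:
        self.round_trips += 1  # one request carries every key
        return [self.rows.get(i, {}) for i in item_ids]


def fetch_for_ranking(store: FakeFeatureStore, candidates: Sequence[str], max_candidates: int) -> List[Dict[str, float]]:
    """Cap the candidate list first, then fetch all features in one batched call."""
    capped = list(candidates[:max_candidates])
    return store.multi_get(capped)


store = FakeFeatureStore({f"i{n}": {"ctr": n / 1000} for n in range(1000)})
features = fetch_for_ranking(store, [f"i{n}" for n in range(1000)], max_candidates=300)
print(len(features), store.round_trips)  # 300 1
```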

Drills

Why not just one big model?

Latency. Scoring a deep model against 100M candidates is impossible in 50 ms. Candidate gen is cheap retrieval (ANN, SQL) — narrows to thousands. Ranking is expensive scoring — runs on those thousands. Two stages = each tuned for its job.

Offline AUC up, online CTR flat. Diagnose.

Top suspects: (1) train/serve feature skew — offline features computed differently than online; (2) selection bias — training on logged impressions means the model learned what the old system showed, not what users actually want; (3) exploration rate too low — no new signal reaching the model; (4) business metric mismatch — AUC ≠ CTR. Investigate in that order.
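Suspect (1) is usually the cheapest to rule out. A minimal sketch, assuming the server logs the feature values it actually used at request time (not every stack does) so they can be compared with the training pipeline's values for the same entities.

```python
import math
from typing import Dict


def skew_report(
    offline: Dict[str, Dict[str, float]],  # entity_id -> features as the training pipeline computed them
    online: Dict[str, Dict[str, float]],   # entity_id -> features as logged by the server at request time
    rel_tol: float = 1e-6,
) -> Dict[str, float]:
    """Per-feature fraction of shared entities whose offline and online values disagree."""
    shared = offline.keys() & online.keys()
    mismatches: Dict[str, int] = {}
    for entity in shared:
        for name, off_value in offline[entity].items():
            on_value = online[entity].get(name)
            # A feature missing online counts as a mismatch too.
            bad = on_value is None or not math.isclose(off_value, on_value, rel_tol=rel_tol)
            mismatches[name] = mismatches.get(name, 0) + int(bad)
    return {name: count / len(shared) for name, count in mismatches.items()}
```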
