Fan-out: on write vs on read
When to reach for this
Reach for this when…
- Social graph with asymmetric follow counts (power-law follower distribution)
- Newsfeed / activity stream / home timeline design
- One-to-many notification delivery at scale
- Any design where a single write must be visible to M other users
- "Design Twitter" / "Design Instagram feed" / "Design LinkedIn feed"
Not really this pattern when…
- Bounded small groups only (team chat, <50 members) — just materialise the group inbox directly
- No follow relationship — user pulls their own data, not other users' data
- Symmetric messaging (1:1 DMs) — that's a queue, not a fan-out
- Pub/sub where consumers are services, not users — that's producer-consumer
Good vs bad answer
Interviewer probe
“How would you design a Twitter-like home timeline?”
Weak answer
"I'd use a database to store tweets and when a user wants their feed, I'd query all the tweets from people they follow, sorted by time. For scale I'd add caching. If there are a lot of followers I'd use a message queue to handle the writes asynchronously."
This answer doesn't identify the core tradeoff (push vs pull), doesn't mention the celebrity problem, proposes no concrete data structures, and conflates "caching" with architectural decisions. The candidate hasn't shown they understand why the naive approach breaks.
Strong answer
"This is a fan-out problem — the question is whether to pay distribution cost at write time or read time. Pure push (fan-out on write into Redis ZSET inboxes) gives O(1) reads but dies on celebrities — a Bieber tweet would trigger 100M inbox writes. Pure pull gives O(1) writes but dies on power users following thousands of accounts. The production answer is a hybrid with a ~1M-follower threshold: push below, pull above, merge at read time.
Write path: post service → check follower count → if below threshold, fan-out worker pipelines ZADD into follower inboxes (bounded to 800 via ZREMRANGEBYRANK); if above, store in author timeline only. Read path: ZREVRANGE the inbox, pull-fetch from each celebrity the user follows (~10-15), merge-K with a min-heap, pass candidates through a ranking service, return ranked page."
This answer names the pattern, identifies the structural tradeoff, cites the celebrity problem with real numbers, proposes concrete data structures (Redis ZSET), and sketches both paths of the hybrid. It demonstrates principal-engineer-level reasoning.
Why it wins: The strong answer identifies the structural tradeoff, proposes the hybrid with a principled threshold, and sketches concrete data structures — showing the candidate can reason about cost distributions, not just draw boxes.
Cheat sheet
- •Push: O(1) read, O(followers) write. Pull: O(following) read, O(1) write.
- •Hybrid with celebrity threshold (~1M followers) is the production answer.
- •Inbox = Redis ZSET, bounded to 800 entries via ZADD + trim.
- •Merge-K at read: min-heap of inbox + K celebrity pulls, O(N log K).
- •Spread fan-out: chunk follower list, drain over 30-60s for mid-tier celebs.
- •Follow → backfill last 10 posts. Unfollow → lazy filter + nightly sweep.
- •Ranking pipeline: candidates → features → ML score → diversity filter.
- •Inactive user inboxes: 7-day TTL, reconstruct on next login.
- •Multi-region: regional fan-out workers + regional inbox Redis clusters.
- •Dynamic threshold with hysteresis prevents oscillation (promote at 1M, demote at 800K).
- •Fan-out workers are idempotent — ZADD same member+score is a no-op, safe to retry.
- •Celebrity timelines are cache-hot (millions pull them) — >99% hit rate.
Core concept
Every social feed faces the same fundamental question: when a user publishes a post, who pays the distribution cost — the writer or the reader?
Fan-out on write (push) pays at write time. The moment a user publishes, a fleet of fan-out workers copies the post identifier into every follower's inbox — typically a Redis sorted set keyed by user id, scored by timestamp. From the reader's perspective, the feed is already materialised: open the app, read your inbox, done. The beauty of push is that reads are O(1). The horror is that writes are O(followers). For an average user with 300 followers, that's 300 ZADD operations — trivial. For a celebrity with 100 million followers, it's 100 million ZADD operations from a single post event. At Twitter's scale of ~500 million tweets per day, a handful of celebrity posts can generate billions of inbox writes and bury the fan-out queue for minutes.
Fan-out on read (pull) pays at read time. Posts are stored exactly once in the author's own timeline. When a reader opens their feed, the feed service fetches the last N posts from each account the reader follows, merges them with a min-heap (O(N log K) where K is the number of followed accounts), and returns the ranked result. Writes are O(1) — beautiful. But reads now scale with the reader's follow count. A power user following 5,000 accounts triggers 5,000 timeline lookups per feed refresh. Even with aggressive caching, that's an untenable p99 latency for a product where users pull-to-refresh every few seconds.
Neither shape survives the real world alone. The follower distribution on every social platform follows a power law: the vast majority of users have hundreds of followers, a small minority have millions. This isn't a bug in the data — it's the fundamental structure of attention networks. And it means the system must handle both extremes simultaneously.
The canonical production architecture: push for normal users, pull for celebrities, merge at read.
The hybrid approach — the only architecture that actually ships at scale — partitions users by follower count. Below a threshold (typically around 1 million followers), posts are fanned out on write into follower inboxes. Above the threshold, posts stay in the celebrity's own timeline and are pulled at read time. At feed-read time, the service merges the reader's pre-materialised inbox with a small number of celebrity timeline pulls (most users follow fewer than 20 celebrities), ranks the combined result, and returns the feed.
This is not a clever optimisation. It is the recognition that push and pull are two regimes of the same cost function, and the follower-count threshold is the phase boundary between them. Twitter's engineering team arrived at this architecture around 2012-2013 after years of pure-push pain. Instagram uses a similar hybrid with heavier algorithmic ranking. LinkedIn's activity feed applies the same principle but with a different twist: connections are bounded (average ~500) so the push path dominates, while company and influencer follows use the pull path.
The follow graph is asymmetric by design. Users follow celebrities; celebrities don't follow back. This asymmetry means the write amplification of push is concentrated on a tiny fraction of authors, while the read amplification of pull is concentrated on a tiny fraction of readers (the power users who follow thousands). The hybrid exploits this by routing each post through the cheaper path based on its author's follower count.
The numbers that matter: Twitter processes roughly 500 million tweets per day across 200+ million daily active users. The fan-out workers process millions of inbox writes per second. A single Bieber post (100M+ followers) would, under pure push, generate more inbox writes than all other tweets in that second combined. The hybrid avoids this entirely — Bieber's posts live in his timeline and are pulled by the ~20% of users who follow him when they open their feed.
Understanding this pattern is non-negotiable for any feed design interview. The interviewer isn't testing whether you know the word "fan-out" — they're testing whether you can reason about the cost distribution of a write that must be visible to an arbitrary number of readers, and whether you can identify the structural property of the data (power-law follower distribution) that makes a hybrid the only viable answer.
Canonical examples
- →Twitter / X home timeline
- →Instagram feed
- →LinkedIn activity feed
- →Activity notifications (mentions, likes, retweets)
- →Push notification fan-out to mobile devices
- →GitHub activity / notification feed
Variants
Fan-out on write (push)
Copy the post id into every follower's inbox at write time.
When a user posts, the fan-out worker copies the post id into every follower inbox. Reads are O(1).
Fan-out on write is conceptually the simplest feed architecture: when a user publishes a post, a background worker iterates over their follower list and inserts the post identifier into each follower's inbox. The inbox is almost always a Redis sorted set (ZSET) keyed by the follower's user id, with the post timestamp (or a snowflake id) as the score. The ZADD command is O(log N) per member where N is the inbox size, and since we bound inboxes to ~800 entries, each write is effectively O(1).
The write path looks like this: the post service persists the post to the posts database, then enqueues a fan-out task. The fan-out worker reads the author's follower list (paginated, from a follow-graph service), pipelines batches of ZADD commands to the inbox Redis cluster, and after each batch calls ZREMRANGEBYRANK to trim the inbox to its maximum size. A Redis pipeline of 50 ZADDs completes in under 1ms on a local cluster, so fanning out to 10,000 followers takes ~200ms — fast enough to feel real-time.
When a user posts, the fan-out worker copies the post id into every follower inbox. Reads are O(1).
This is essentially what Twitter used before 2012. Every tweet triggered a fan-out to every follower's inbox. For the median user with ~200 followers, this was perfect — 200 writes per tweet, amortised across a fleet of fan-out workers, with sub-second delivery. Feed reads were a single ZREVRANGE call returning the top 50 post ids, followed by a multi-get to hydrate the post objects. Total read latency: ~5ms.
The model breaks catastrophically on celebrity posts. When Justin Bieber tweets to 100M+ followers, the fan-out queue receives a task that expands to 100 million ZADD operations. Even at 100K writes/second per worker, that's 1,000 seconds of fan-out time from a single tweet. During that window, the fan-out queue backs up, other users' tweets are delayed, and followers see stale feeds. This is the "celebrity post storm" — the defining failure mode of pure push.
There's a subtler problem too: wasted storage. In any social network, a large fraction of accounts are inactive — they signed up, followed some people, and never came back. Pure push writes into their inboxes anyway. At Twitter's scale, inactive-user inbox writes consumed a significant fraction of Redis memory for content no one would ever read.
Despite these limitations, pure push remains the right answer for bounded fan-out scenarios: team activity feeds (Slack channels, GitHub repo watchers), notification delivery to device tokens (where the fan-out is from event to registered devices), and any system where the maximum follower count is architecturally bounded.
Pros
- +O(1) reads — the feed is pre-materialised in the inbox
- +Read latency is predictable and low (~5ms for a ZREVRANGE)
- +Ranking and filtering can be baked in at write time
- +Simple read path — single data source, no merge logic
Cons
- −Celebrity posts are catastrophic — O(followers) writes per post
- −Write amplification = followers × posts per day
- −Inactive followers waste storage and write bandwidth
- −Fan-out queue backlog delays all users during celebrity storms
Choose this variant when
- Most users have <100K followers (bounded fan-out)
- Reads vastly outnumber writes (typical social: 100:1 read:write)
- You control the follow graph shape (e.g., team-based, not social)
- The system has no celebrity / power-law problem
Fan-out on read (pull)
Store posts once in the author's timeline; gather and merge at read time.
Posts live in author timelines. At read time the feed service gathers, merges, and ranks.
Fan-out on read inverts the cost structure: writes are O(1) — the author appends a post to their own timeline — and reads pay the aggregation cost. When a reader opens their feed, the feed service looks up the reader's follow list, fetches the last N posts from each followed account's timeline, merges them into a single sorted stream, and returns the top page.
The merge operation is a classic K-way merge. You have K sorted streams (one per followed account), each producing posts in reverse-chronological order. A min-heap of size K lets you extract the global top-N in O(N log K) time. In practice, K is the number of accounts the reader follows — for a typical user following 200 accounts, this means a heap of 200 entries and ~50 extractMin operations to fill a page.
Posts live in author timelines. At read time the feed service gathers, merges, and ranks.
This was Facebook's early approach (before the shift to algorithmic ranking): at news feed render time, pull the latest posts from each friend's wall, merge, and display. For the median user with ~150 friends, this was manageable. But the p99 user follows 2,000+ accounts, and the feed service must issue 2,000 parallel timeline reads, wait for the slowest one, then merge. Even with aggressive caching (each timeline cached in memcached with a 60-second TTL), the tail latency was painful.
The deeper problem is cache efficiency. In a push model, the inbox is a single hot key per user — extremely cache-friendly. In a pull model, each followed account's timeline is a separate cache entry, and the hit rate depends on how many other readers are also following that account. Celebrity timelines are hot (millions of readers pull them) and stay cached. But the long tail of low-follower accounts has poor cache hit rates, and a cache miss means a database read for a timeline that might be read once and evicted.
Pull does have genuine advantages. Writes are trivially fast — one append, no fan-out queue, no worker fleet. There's no wasted work on inactive readers. And when you add algorithmic ranking (which requires scoring each candidate against the reader's interest model), pull naturally produces the candidate set you need to rank. Push pre-materialises a chronological inbox that you then have to re-rank anyway, which partly defeats the purpose.
Pull is the right answer when the fan-in is low (users follow <100 accounts), when celebrity content dominates the feed, or in read-light workloads where most users check their feed infrequently. It's also the simpler starting point — no fan-out infrastructure, no inbox storage, just timelines and a merge service.
Pros
- +O(1) writes — no fan-out queue, no worker fleet
- +No wasted work on inactive readers
- +Naturally produces candidate set for algorithmic ranking
- +No inbox storage amplification
Cons
- −Read latency scales with fan-in (number of followed accounts)
- −Cache hit rate per-followed-account varies wildly
- −K-way merge adds CPU cost at read time
- −Tail latency from slowest timeline read dominates p99
Choose this variant when
- Users follow <100 accounts on average
- Celebrity / one-to-many broadcast dominates the content mix
- Algorithmic ranking is already required (pull provides candidates naturally)
- Write volume is high relative to reads (rare in social, common in logging)
Hybrid (push + pull threshold)
Push for normal users, pull for celebrities, merge at read time.
The hybrid is not a compromise — it is the only architecture that survives production at scale. The insight is that push and pull are two regimes of the same cost function, and the follower-count threshold is the phase boundary.
Set a threshold — say, 1 million followers. Below it, the post triggers fan-out on write: the fan-out worker pushes the post id into every follower's inbox. Above it, the post is stored only in the celebrity's own timeline; no fan-out occurs. At feed-read time, the service reads the user's inbox (pre-materialised by push) and also pulls the latest posts from each celebrity the user follows. A typical user follows 5-15 celebrities, so this pull adds at most 15 timeline reads to an otherwise O(1) inbox read. The merged result is passed to the ranking service.
The threshold is not a magic number — it's a cost boundary. Below it, the per-post fan-out cost (N ZADD operations) is cheaper than the aggregate read-time pull cost across all readers. Above it, the fan-out cost exceeds the read-time cost. The exact value depends on the ratio of post frequency to feed-read frequency, the Redis write throughput, and the celebrity's posting cadence. Twitter's engineering team settled on a threshold in the low millions of followers, tuned dynamically: users crossing the threshold during viral moments are reclassified within minutes by a follower-count monitoring service.
This is what Twitter, Instagram, and LinkedIn all do. Twitter's Raffi Krikorian described this architecture at QCon: the fan-out service checks the author's follower count before deciding the write path. Instagram extends it with heavier algorithmic ranking — the pull path feeds celebrity content into a ML-based scoring pipeline that weighs recency, engagement prediction, and interest-graph affinity. LinkedIn's variant exploits the fact that connections are bounded (~500 average) so the push path handles connections, while company and influencer follows use the pull path.
The complexity cost is real: two code paths, two storage layers, a merge step at read time, and a monitoring system for threshold crossings. But the alternative — pure push that melts during a Bieber tweet, or pure pull that can't serve p99 latency for power users — doesn't ship.
Pros
- +Celebrities don't destroy the write path — no 100M-write storms
- +Normal users still get O(1) reads for most of their feed content
- +Bounded number of celebrity pulls at read time (typically <20)
- +Proven at scale by Twitter, Instagram, LinkedIn
Cons
- −Two code paths with different failure modes
- −Threshold tuning is ongoing (static breaks during viral moments)
- −Merge + rank at read time adds latency and complexity
- −Monitoring for threshold crossings adds operational overhead
Choose this variant when
- Any social platform at scale — this is the canonical senior answer
- Follower distribution follows a power law (it always does)
- The product demands sub-100ms feed reads AND celebrity support
Activity stream (non-social fan-out)
System-event fan-out where the source set is bounded — push always works.
Not every fan-out problem has a celebrity problem. Activity streams — GitHub's notification feed, Slack's channel activity, JIRA's project activity log — differ from social feeds in one structural way: the source set is bounded by system design, not by user behaviour.
On GitHub, a repository might have 10,000 watchers. When someone pushes a commit, the activity service fans out a notification to those 10,000 watchers' activity feeds. The fan-out is bounded because repository watchers are bounded (and self-selected). There is no "celebrity repository" with 100 million watchers that breaks the push path. Slack channels have a hard member limit. JIRA projects have bounded team sizes.
In these systems, pure fan-out on write is the correct answer and the hybrid complexity is unnecessary. The architecture is straightforward: event source → message queue → fan-out worker → per-user activity inbox (often a bounded list in Redis or a DynamoDB partition). The fan-out is measured in thousands, not millions, and completes in milliseconds.
The key difference from social fan-out is that the architect controls the maximum fan-out by controlling the maximum group size. If you can enforce that no single source fans out to more than N recipients, and N is bounded by product rules (not by user growth), pure push is simpler, cheaper, and more predictable than any hybrid.
Scaling path
V1: Single database, pull at read time
Ship a working feed with zero infrastructure beyond the application database.
Posts and follows tables, feed built at read time with a join.
Store posts in a posts table, follows in a follows table. When a user requests their feed, run a join: SELECT p.* FROM posts p JOIN follows f ON p.author_id = f.followed_id WHERE f.follower_id = ? ORDER BY p.created_at DESC LIMIT 50. This is pure pull — no fan-out, no inbox, no workers.
This works for the first 10K users. The join is fast when the follows table fits in memory and the posts table is indexed on (author_id, created_at). Read latency is 10-50ms depending on the number of followed accounts.
What triggers the next iteration
- The join becomes expensive as the follows table grows — each feed read scans O(following) index entries
- No caching layer — every feed read hits the database
- Cannot support real-time delivery — users must refresh to see new posts
- Ranking requires sorting the merged result on every read
V2: Materialised inbox with Redis ZSET (push)
Eliminate the read-time join by pre-materialising each user's feed into a Redis sorted set.
Fan-out worker writes into Redis ZSET inboxes; feed reads are O(1).
Add a fan-out worker: when a user posts, iterate their follower list and ZADD the post id (scored by timestamp) into each follower's inbox ZSET. Feed reads become a single ZREVRANGE call — O(1) regardless of follow count.
Trim inboxes to 800 entries with ZREMRANGEBYRANK after each ZADD batch. Older content is still available via the author's timeline (fallback for deep scroll). This handles millions of users with sub-10ms feed reads.
What triggers the next iteration
- Celebrity posts generate millions of ZADD operations, backing up the fan-out queue
- Inactive users waste Redis memory — their inboxes are written but never read
- Fan-out latency for high-follower authors can exceed 60 seconds
- No ranking beyond chronological order
V3: Hybrid fan-out + ranking service
Survive celebrity posts and add algorithmic ranking.
Push for normal users, pull for celebrities, ranking service merges and scores.
Introduce a follower-count threshold (~1M). Below it: push (fan-out on write into inboxes). Above it: skip fan-out, store in author timeline only. At read time: merge the pre-materialised inbox with pull-fetched celebrity timelines, then pass candidates through a ranking service that scores by recency, engagement prediction, and affinity.
The ranking service receives ~100-200 candidates (50 from inbox + 10-15 celebrity timeline pulls × 10 posts each), enriches them with features from the feature store, runs an ML scoring model, applies diversity filtering, and returns a ranked page of 50 items. Total read latency: 30-80ms.
What triggers the next iteration
- Single-region deployment means cross-region read latency for global users
- Ranking model cold-start for new users with sparse interaction history
- Fan-out still takes seconds for mid-tier celebrities (100K-1M followers)
- Follow/unfollow churn creates stale entries in inboxes
V4: Multi-region fan-out with regional inboxes
Serve global users with <50ms feed reads from the nearest region.
Posts originate in one region and are replicated to regional fan-out workers with regional inboxes.
Partition followers by region. When a post arrives, the region router splits the follower list into regional chunks and dispatches fan-out tasks to regional fan-out workers. Each region maintains its own inbox Redis cluster. Feed reads are served entirely from the local region — no cross-region calls for the inbox read path.
Celebrity timeline pulls still cross regions (the celebrity's timeline lives in their home region), but these are cached aggressively in each consuming region with a 30-second TTL. Cross-region replication lag for celebrity content is acceptable because feed freshness expectations are lower for celebrity posts (users don't expect to see a Bieber tweet within 1 second).
What triggers the next iteration
- Cross-region celebrity timeline cache invalidation adds complexity
- Follow-graph partitioning must track user region changes
- Regional inbox divergence during network partitions
- Operational cost of managing Redis clusters in 3+ regions
Deep dives
Choosing and managing the celebrity threshold
The celebrity threshold is the single most important tuning parameter in a hybrid fan-out system. Set it too low and you pull too many authors at read time, inflating feed latency. Set it too high and celebrity posts storm the fan-out queue. The right value depends on four variables: the average fan-out worker throughput (ZADD/sec), the SLA for fan-out completion time, the p99 number of celebrities a user follows, and the acceptable read-time pull latency.
A practical starting point is 1 million followers. At this level, a single post triggers up to 1M ZADD operations. With a fan-out worker fleet processing 500K writes/second aggregate, that's a 2-second fan-out — acceptable. Above 1M, the fan-out time grows linearly and starts competing with other posts in the queue.
Four tiers of users and their fan-out strategy. The boundary is never arbitrary — it is an engineering partition.
Static thresholds break during viral moments. A user with 900K followers posts something that goes viral; within hours they cross 2M followers. If the threshold check happens only at post time, the system fans out to 900K inboxes for the viral post (fine) but then fans out to 2M for the next post (still fine but getting expensive). The real problem is the reverse: a user oscillates around the threshold, and some posts are pushed while others aren't, creating inconsistent feed behaviour for followers.
The solution is a dynamic threshold with hysteresis. The follower-count monitoring service checks counts every 5 minutes. When a user crosses from below to above the threshold, they're reclassified as a "celebrity" with a cooldown period (e.g., 1 hour) before they can be reclassified back down. This prevents oscillation.
What happens when a 100M-follower celebrity posts into a pure-push system.
The celebrity storm diagram shows what happens without a threshold: a 100M-follower post generates 100M fan-out tasks, saturates the queue, and delays all other fan-out for minutes. With the threshold, this post simply goes into the author's timeline — zero fan-out tasks, zero queue impact.
Inbox storage: Redis ZSET mechanics and memory budget
The inbox is the core data structure of fan-out on write. Each user's inbox is a Redis sorted set (ZSET) where members are post ids and scores are timestamps (or snowflake ids for ordering). The ZSET provides three critical operations:
- 1ZADD — insert a post id with a score. O(log N) where N is the set size. With bounded inboxes (N ≤ 800), this is effectively O(1).
- 2ZREVRANGE — read the top K entries by descending score. O(K + log N). For a feed page of 50 items: sub-millisecond.
- 3ZREMRANGEBYRANK — trim the set to keep only the top N entries. Called after ZADD to enforce the size bound.
Fan-out workers ZADD into bounded sorted sets; feed reads via ZREVRANGE; deep scroll falls back to author timelines.
The fan-out worker pipelines these operations for efficiency. A typical batch: ZADD inbox:{user_id} {timestamp} {post_id} for each of 50 follower inboxes in a single Redis pipeline, followed by ZREMRANGEBYRANK inbox:{user_id} 0 -(max_size+1) to trim. The pipeline completes in under 1ms for 50 operations.
Memory budget: each ZSET entry costs ~80 bytes (64-byte member + 8-byte score + 8-byte overhead). An inbox of 800 entries costs ~64KB. For 200M users: 200M × 64KB = 12.8TB. This is the primary cost driver and the reason inboxes must be bounded. Without the bound, memory grows linearly with post volume × follower count, which is unsustainable.
Deep scroll (page 2+) works by continuing the ZREVRANGE with an offset. Beyond the inbox's 800 entries, the feed service falls back to pulling from author timelines — switching from push to pull for historical content. This is acceptable because deep scroll is rare (<5% of feed reads go past page 1) and latency expectations are relaxed.
TTL is an additional lever: set a 7-day TTL on inbox keys so that users who haven't opened the app in a week don't consume memory. When they return, the feed service detects the empty/expired inbox and does a full pull-based reconstruction, then resumes normal push delivery.
Merge-K streams at read time
In the hybrid model, feed reads merge two data sources: the pre-materialised inbox (from push) and K celebrity timelines (from pull). The merge is a K-way merge of sorted streams, where K = 1 (inbox) + number of celebrities the user follows.
The algorithm uses a min-heap of size K+1. Initialise the heap with the most recent entry from each stream. To produce the next feed item, extract the max from the heap, then insert the next entry from that item's source stream. Repeat until you have a full page (typically 50 items).
The reader's inbox plus K celebrity timelines are merged via a min-heap, then ranked and filtered.
Time complexity: O(N log K) where N is the page size and K is the number of streams. For a typical user following 10 celebrities: O(50 × log 11) ≈ 170 comparisons — trivial.
The expensive part isn't the merge — it's fetching the celebrity timelines. Each timeline read is a ZREVRANGE against a Redis key or a database query. With 10 celebrity pulls, and each taking 2-5ms (cached) or 20-50ms (cache miss), the total pull latency is 2-50ms depending on cache hit rates. Celebrity timelines are hot (millions of readers access them) so cache hit rates exceed 99% in practice.
Caching the merged result is tempting but tricky. The merged feed is personalised (different users follow different celebrities), so you can't share it across users. You can cache it per-user with a short TTL (30-60 seconds) to handle rapid pull-to-refresh. The cache key includes the user id and a version number that increments on each inbox write, ensuring the user sees new posts within one refresh cycle.
For the ranking pipeline, the merge step produces a candidate set of ~100-200 items (50 from inbox + 10-15 × 10 from celebrity timelines). This candidate set is then scored and re-ranked by the ranking model, which may reorder items based on engagement prediction, recency decay, and interest-graph affinity. The merge-K step is therefore a pre-filter, not the final ordering.
Spread fan-out: taming mid-tier celebrities
The hybrid threshold handles mega-celebrities (>1M followers) by routing them to pull. But what about mid-tier celebrities — users with 100K to 1M followers who are still on the push path? A single post from a 500K-follower user generates 500K ZADD operations. If ten such users post simultaneously, that's 5M fan-out tasks hitting the queue at once.
Spread fan-out addresses this by chunking the follower list and scheduling chunks over time. Instead of enqueuing a single "fan out to 500K followers" task, the fan-out scheduler splits the follower list into chunks of 50K and schedules them with staggered delays: chunk 1 at t=0, chunk 2 at t=10s, chunk 3 at t=20s, and so on. The fan-out workers drain at a steady rate, never spiking above their provisioned throughput.
Chunk the follower list and drain at a steady rate to avoid thundering-herd writes.
The trade-off is fan-out latency. A 500K-follower post now takes ~100 seconds to reach all followers instead of ~10 seconds. For most users, this is invisible — they don't notice whether a post appears in their feed 5 seconds or 50 seconds after publication. The only users who might notice are the author's most engaged followers, who are typically in the first chunk (sorted by recent activity).
Implementation details: the fan-out scheduler stores chunk metadata in a delayed message queue (e.g., SQS with delay seconds, or a Redis sorted set used as a delay queue). Each chunk message contains the post id, the follower list offset, and the chunk size. Workers process chunks idempotently — if a chunk is retried, the ZADD operations are naturally idempotent (same member + same score = no-op).
Rate limiting fan-out workers is the complementary control. Each worker has a configurable writes-per-second limit (e.g., 50K ZADD/sec). If the queue depth exceeds a threshold, the scheduler automatically increases the inter-chunk delay for new posts, providing backpressure without dropping fan-out tasks. The system self-regulates: during quiet periods, fan-out is near-instant; during peak load, it spreads gracefully.
Handling follow/unfollow churn
The follow graph is not static. Users follow and unfollow accounts constantly — Twitter sees millions of follow/unfollow events per day. Each event has implications for the fan-out system that are easy to overlook in a design interview.
On follow: When user A follows user B, A's inbox should contain B's recent posts. Two approaches: (1) Backfill — a background worker fetches B's last N posts and ZADDs them into A's inbox. This provides immediate gratification but adds write load proportional to the follow rate. (2) Lazy — don't backfill; A will start seeing B's posts on the next fan-out. This is simpler but means A's feed looks sparse right after following someone. Most production systems use backfill with a small N (e.g., 10 posts).
On follow: backfill inbox. On unfollow: lazy filter at read, nightly sweep cleans stale entries.
On unfollow: When user A unfollows user B, B's posts should disappear from A's feed. Two approaches: (1) Eager deletion — scan A's inbox and remove all post ids authored by B. This requires a secondary index (post_id → author_id) and is expensive for large inboxes. (2) Lazy filter — leave B's posts in A's inbox but filter them out at read time by checking the follow graph. This is cheaper but means stale entries consume inbox space.
The production answer is lazy filter + nightly sweep. At read time, the feed service checks each candidate post's author against the reader's current follow set (cached) and filters out unfollowed authors. A nightly batch job scans inboxes and removes entries from unfollowed authors, reclaiming space. The lazy filter ensures correctness immediately; the nightly sweep ensures space efficiency eventually.
Edge case: A follows B, B posts, A unfollows B, A re-follows B. With lazy filter, B's posts are filtered out during the unfollow window and reappear on re-follow (because the filter checks current follow state). With eager deletion, the posts are gone and would need backfill on re-follow. The lazy approach handles this naturally.
Another subtlety: if B is above the celebrity threshold, follow/unfollow doesn't affect A's inbox at all (B's posts were never pushed). The follow graph update only affects the pull list used at read time — the feed service checks which celebrities to pull from based on A's current follow set.
The ranking pipeline: from candidates to feed
A modern feed is not chronological — it is ranked. The ranking pipeline transforms a raw candidate set into a personalised, engaging feed. In the hybrid fan-out model, the pipeline sits between the merge step and the final response.
Candidate generation from inbox + celebrity pull feeds into feature enrichment, ML scoring, and diversity filtering.
Stage 1: Candidate generation. The merge-K step produces 100-200 candidates from the inbox (push) and celebrity timelines (pull). This is the initial candidate pool.
Stage 2: Feature enrichment. Each candidate is annotated with features from the feature store: post age, author engagement rate, media type, number of likes/retweets in the first 5 minutes (early engagement signal), the reader's historical interaction rate with this author, topic embeddings, and recency decay. The feature store is typically a low-latency key-value store (Redis or a purpose-built feature service) pre-computed by offline pipelines.
Stage 3: Scoring. A lightweight ML model (typically a gradient-boosted tree or a small neural network) scores each candidate. The model predicts P(engagement) — the probability the reader will like, comment, or share the post. Training data comes from historical engagement logs. Inference must complete in <10ms for the full candidate set, which constrains model complexity.
Stage 4: Diversity filtering. A ranked list of the top 50 by score goes through a diversity filter that ensures no single author dominates the feed (e.g., max 3 posts per author in a single page), mixes content types (text, image, video), and injects exploration candidates (posts from authors the user hasn't interacted with recently) to avoid filter bubbles.
Stage 5: Materialisation. The final ranked page is cached per-user with a 60-second TTL. Subsequent feed reads within the TTL window return the cached result. On pull-to-refresh, the cache is invalidated and the pipeline runs again.
The ranking pipeline adds 20-50ms to feed read latency (on top of the ~10ms for inbox read + merge). The total feed read latency budget is typically 50-100ms. Keeping the pipeline within budget requires aggressive pre-computation (features in the feature store, not computed at serving time), model simplicity (inference must be <10ms), and parallelism (feature fetch and model inference can overlap for different candidates).
Case studies
The canonical push-pull hybrid
Twitter's home timeline is the most-cited example of hybrid fan-out in system design interviews, and for good reason — it's the system that forced the industry to invent the pattern.
In Twitter's early years (2006-2012), the timeline was pure fan-out on write. When a user tweeted, a fleet of fan-out workers wrote the tweet id into every follower's inbox stored in an in-memory timeline cache. Raffi Krikorian, then VP of Engineering, described this architecture in his 2012 QCon talk: the system processed ~400K tweets per minute (pre-bot era), each triggering an average fan-out of ~300 inbox writes. For the median user, this worked beautifully — sub-second delivery, O(1) reads.
The system started breaking around 2010 as celebrity accounts grew. Lady Gaga, Justin Bieber, and Barack Obama each had tens of millions of followers. A single Bieber tweet generated 30M+ inbox writes. The fan-out queue would back up for minutes, delaying timeline delivery for all users — not just Bieber's followers. The team called this the "celebrity problem" and it became the defining engineering challenge of the era.
The solution was the hybrid: users above a follower-count threshold (publicly acknowledged to be in the millions) were reclassified as "pull" accounts. Their tweets went into their own timeline (stored in Manhattan, Twitter's internal distributed database) but were NOT fanned out to follower inboxes. At feed-read time, the timeline service merged the reader's inbox with pull-fetched tweets from the celebrities they followed.
By 2013, Twitter processed ~500M tweets per day across 200M+ daily active users. The fan-out workers processed millions of inbox writes per second. The hybrid reduced the fan-out queue's peak load by ~40% (since celebrity tweets no longer generated fan-out tasks), and feed read latency stayed under 100ms for the p99.
Key technical details: inboxes were stored in Redis (later migrated to Manhattan for durability), bounded to ~800 entries per user, with ZADD + trim for bounded insertion. The feed read path merged the inbox with up to 20 celebrity timeline pulls, then passed candidates through a lightweight ranking layer.
Takeaway
Twitter proved that pure push breaks on power-law follower distributions and that the hybrid with a celebrity threshold is the production-viable architecture for social feeds.
Hybrid fan-out with ML-heavy ranking
Instagram's feed evolution illustrates how the fan-out pattern interacts with algorithmic ranking. When Instagram launched (2010), the feed was purely chronological — posts from followed accounts sorted by time. The underlying system used fan-out on write: when a user posted a photo, a background worker pushed the post id into each follower's inbox. With Instagram's early user base, this worked.
As Instagram grew (acquired by Facebook in 2012, 1B+ users by 2018), two pressures emerged. First, the celebrity problem — accounts like Selena Gomez (400M+ followers) made pure push untenable. Second, the shift from chronological to algorithmic ranking (announced in 2016) meant the feed was no longer just "your inbox sorted by time" — it was "a ranked selection of candidates scored by an ML model."
Instagram's solution mirrors Twitter's hybrid but with heavier emphasis on the ranking pipeline. The push path handles normal users (sub-threshold follower count), materialising inbox entries in a Redis-like cache. The pull path handles celebrity and brand accounts. At read time, the merge step produces a candidate set of ~200-500 items, which feeds into a multi-stage ranking pipeline:
- 1A lightweight first-pass model filters the 500 candidates down to ~150 based on coarse features (recency, author affinity, content type).
- 2A heavyweight second-pass model scores the 150 candidates using deep features (image embeddings, engagement prediction, interest graph similarity).
- 3A diversity and freshness pass ensures the final 50-item page mixes content types and doesn't over-index on a single author.
The key insight from Instagram's architecture is that fan-out and ranking are not independent concerns — they interact. Push pre-materialises a chronological inbox, but algorithmic ranking ignores the chronological order anyway. The push path's value is not in providing a sorted feed — it's in providing a fast candidate set. Without push, the ranking pipeline would need to pull from hundreds of timelines to generate candidates, adding unacceptable latency.
Instagram also applies different fan-out strategies for different content types: Feed posts use the full hybrid pipeline, while Stories use a simpler push-only path (Stories have a 24-hour TTL and a "ring" UI that naturally limits the number of visible items, reducing the ranking burden).
Takeaway
Instagram shows that fan-out and ranking are complementary: push provides fast candidate generation, ranking provides personalisation. Different content types (Feed vs Stories) can use different fan-out strategies.
Activity feed with bounded fan-out
LinkedIn's activity feed is an interesting variant of the fan-out pattern because it mixes structurally different content types — posts, job changes, endorsements, work anniversaries, article shares — each with different fan-out characteristics.
LinkedIn's social graph differs from Twitter's in a critical way: connections are symmetric (mutual) and bounded. The average LinkedIn user has ~500 connections, and the platform caps connections at 30,000. This means the push path for connection-sourced content has a natural upper bound — no post ever fans out to more than 30K inboxes. For connections, pure fan-out on write is viable and efficient.
However, LinkedIn also supports "following" (asymmetric, unbounded) for companies, influencers, and thought leaders. A company like Google has millions of followers. An influencer like Richard Branson has 19M+ followers. For these entities, LinkedIn uses the pull path — company and influencer posts are stored in the entity's timeline and pulled at read time.
The activity feed ranking pipeline handles the heterogeneity of content types. Each content type has different engagement signals: a job change notification has no "like" count but has a "congratulate" action; an article share has reading time; a post has likes, comments, and shares. The ranking model is trained on a unified engagement prediction that normalises across content types.
LinkedIn's engineering blog describes a three-tier architecture: (1) a "standardisation" layer that normalises activities from different sources into a common schema, (2) a fan-out layer that routes activities through push or pull based on the source entity type, and (3) a ranking layer that scores and diversifies the merged candidate set.
A notable operational detail: LinkedIn's feed has a "relevance vs recency" tension that's more acute than Twitter's. A connection's job change from 3 days ago is still highly relevant (you should congratulate them), while a post from 3 hours ago may not be (it's already been seen by others). The ranking model explicitly models "time-relevance decay" per content type, with job changes decaying slowly (days) and posts decaying quickly (hours).
Takeaway
LinkedIn demonstrates that bounded fan-out (symmetric connections) makes pure push viable for the primary social graph, while unbounded follows (companies, influencers) require the pull path — a natural partition that simplifies the hybrid.
Decision levers
Celebrity threshold value
The threshold determines which users get push vs pull treatment. Start at ~1M followers and tune based on fan-out queue latency. Lower the threshold if the queue backs up; raise it if read-time pull latency increases. Use hysteresis (e.g., promote at 1M, demote at 800K) to prevent oscillation. Monitor threshold crossings — a viral user crossing from normal to celebrity mid-post needs graceful handling.
Inbox size bound
Bounded inboxes (typically 800 entries) control Redis memory usage. The bound determines how far back a user can scroll in their pre-materialised feed before the system falls back to pull-based deep scroll. 800 entries × ~80 bytes = 64KB per user. For 200M users: ~12.8TB. Increasing the bound improves deep-scroll UX but linearly increases memory cost. Decreasing it saves memory but triggers more fallback reads.
Spread fan-out vs instant fan-out
For mid-tier celebrities (100K-1M followers), choose between instant fan-out (faster delivery, risk of queue saturation) and spread fan-out (chunked over 30-60 seconds, steady queue utilisation). Spread is safer but adds latency. The choice depends on the product's freshness SLA: if users expect sub-second delivery, instant with rate-limited workers; if 30-second delay is acceptable, spread.
Ranking model complexity
The ranking model sits on the read path and must score 100-200 candidates in <20ms. Gradient-boosted trees (XGBoost/LightGBM) inference in <5ms for 200 items. Small neural nets add expressiveness but need GPU inference or heavy optimisation. The ranking budget is the difference between the feed read SLA (e.g., 100ms) and the merge + data fetch time (~50ms). Overshoot the budget and feed reads lag; undershoot and engagement drops.
Multi-region vs single-region
Single-region is simpler but adds cross-region latency for global users (100-200ms for US-to-Asia). Multi-region with regional inboxes eliminates this but requires cross-region fan-out routing, regional Redis clusters, and celebrity timeline replication. The break-even point is typically ~30% of DAU outside the primary region. Below that, CDN-cached feed responses may suffice; above it, regional inboxes are worth the operational cost.
Failure modes
Without a celebrity threshold, a 100M-follower post generates 100M fan-out tasks. The queue saturates, workers fall behind, and ALL users' feeds go stale — not just the celebrity's followers. Fix: push-pull threshold. The celebrity's post goes to their timeline only; followers pull it at read time. Zero fan-out tasks, zero queue impact.
In pure pull, a user following 5,000 accounts triggers 5,000 timeline reads per feed refresh. Even with 99% cache hit rate, that's 50 cache misses at 20ms each = 1 second of tail latency. Fix: hybrid with push for normal-follower-count authors. The inbox handles 95% of the content; pull handles only the small number of celebrities.
Without bounded inboxes, memory grows linearly with post volume × follower count. At Twitter scale, unbounded inboxes would consume petabytes. Fix: ZREMRANGEBYRANK to trim inboxes to 800 entries. Deep scroll falls back to author timeline reads. Add TTL (7 days) to expire inactive users' inboxes.
A user hovering around the celebrity threshold (e.g., 990K-1.01M followers) gets reclassified on every check. Some posts fan out, others don't, creating inconsistent delivery. Fix: hysteresis — promote at 1M, demote at 800K, with a 1-hour cooldown before reclassification. The gap prevents rapid oscillation.
When a user unfollows an account, their inbox still contains that account's posts. Without cleanup, the feed shows content from unfollowed users. Fix: lazy filter at read time (check each candidate's author against current follow set) ensures immediate correctness. Nightly sweep removes stale entries for memory reclamation.
If the fan-out queue (SQS, Kafka, etc.) goes down, no new posts reach inboxes. Feeds go stale within minutes. Fix: replicated queue with at-least-once delivery. Fan-out workers are idempotent (ZADD with same member+score is a no-op). Dead-letter queue for poison messages. Circuit breaker on the fan-out path with fallback to pull for all users during outage.
Decision table
Fan-out approach comparison
| Approach | Best for | Write cost | Read cost | Complexity |
|---|---|---|---|---|
| Push (fan-out on write) | Normal users (<100K followers) | O(followers) per post | O(1) — read inbox | Low — single path |
| Pull (fan-out on read) | Celebrity-heavy or low-follow-count | O(1) — append to timeline | O(following) — merge K streams | Low — single path |
| Hybrid (push + pull) | Any large-scale social platform | O(followers) below threshold; O(1) above | O(1) inbox + O(K) celebrity pull | High — two paths + merge + threshold monitoring |
| Activity stream (bounded push) | System events, team feeds | O(group size) — bounded | O(1) — read inbox | Low — push only, no threshold needed |
- K = number of celebrities the reader follows (typically <20)
- Threshold is typically ~1M followers but should be tuned dynamically
Worked example
Worked example: Design a Twitter-like home timeline
Step 1: Recognise the pattern. The prompt says "home timeline" or "news feed" — this is fan-out. The key question is: when a user posts, how does the post reach their followers' feeds? Immediately say: "This is a fan-out problem. The core tradeoff is push vs pull, and at scale we'll need a hybrid."
The canonical production architecture: push for normal users, pull for celebrities, merge at read.
Step 2: Estimate the volumes. Twitter-scale numbers: 200M DAU, 500M tweets/day, average user follows ~200 accounts, median follower count ~300, top 0.1% of users have >1M followers. Feed reads: assume each DAU opens the app 5x/day = 1B feed reads/day ≈ 12K reads/sec. Tweet writes: 500M/day ≈ 6K writes/sec. Read:write ratio is ~2000:1 — this is extremely read-heavy, which favours push (pay at write time, amortise over many reads).
Step 3: Choose the hybrid. Pure push dies on celebrities: Bieber (100M followers) × 1 tweet = 100M inbox writes. Pure pull dies on power users: a user following 5K accounts triggers 5K timeline reads per feed refresh. The hybrid pushes for users below ~1M followers and pulls for users above. State this explicitly — it shows you understand the structural reason for the hybrid.
Step 4: Sketch the write path. User posts → post service persists to posts DB → enqueue fan-out task → fan-out worker checks author's follower count. If below threshold: paginate through follower list, pipeline ZADD operations to each follower's inbox (Redis ZSET, bounded to 800 entries). If above threshold: store in author's timeline only, no fan-out. The fan-out worker also stores the post in the author's own timeline (always, regardless of threshold).
Step 5: Sketch the read path. User opens app → GET /feed → feed read service reads user's inbox (ZREVRANGE, top 50 post ids) → also pull-fetches last 10 posts from each celebrity the user follows (typically 5-15 celebrities) → merge-K with min-heap → pass ~150 candidates to ranking service → ranking scores by engagement prediction, recency, affinity → diversity filter (max 3 per author, mix content types) → return ranked page of 50 → cache per-user with 60s TTL.
Step 6: Address the celebrity problem. The interviewer will ask: "What happens when Justin Bieber tweets?" Answer: "Bieber is above the celebrity threshold, so his tweet triggers zero fan-out. It goes into his timeline only. When a Bieber follower opens their feed, the feed read service pulls Bieber's latest tweets as part of the merge step. Since most users follow <20 celebrities, this adds at most 20 timeline reads — well within our latency budget."
Step 7: Address ranking. "We don't return a chronological feed — we rank. The inbox gives us ~50 candidates from push. Celebrity pulls give us ~100-150 more. We enrich each candidate with features from the feature store (engagement rate, recency, author affinity) and score with a lightweight ML model. The ranking adds ~20ms to the read path."
Step 8: Address follow/unfollow churn. "On follow: a backfill worker injects the new followee's last 10 posts into the follower's inbox. On unfollow: we filter at read time (check each candidate's author against the current follow set) and run a nightly sweep to clean stale inbox entries. This avoids expensive eager deletion while maintaining correctness."
This walkthrough hits every signal an interviewer looks for: pattern recognition, volume estimation, principled tradeoff reasoning, concrete data structure choices (Redis ZSET, min-heap merge), failure mode awareness (celebrity storm), and operational concerns (churn handling, caching, ranking).
Interview playbook
When it comes up
- "Design a news feed" or "Design Twitter home timeline"
- "Design an activity feed" or "Design Instagram feed"
- "How would you deliver a post to all followers?"
- Any mention of asymmetric follow graphs at scale
- "Design a notification system" (bounded fan-out variant)
Order of reveal
- 11. Name the pattern. This is a fan-out problem — the core question is whether to pay the distribution cost at write time (push) or read time (pull).
- 22. Frame the tradeoff. Push gives O(1) reads but O(followers) writes. Pull gives O(1) writes but O(following) reads. The follower distribution is power-law, which means neither alone works.
- 33. Propose the hybrid. The production answer is a hybrid: push for users below a follower threshold (~1M), pull for celebrities above it. Merge at read time.
- 44. Sketch the write path. Post → fan-out worker → check threshold → push into Redis ZSET inboxes (bounded to 800) or store in author timeline only.
- 55. Sketch the read path. Read inbox + pull celebrity timelines → merge-K with min-heap → rank with ML model → diversity filter → return page.
- 66. Address failure modes. Celebrity storm is handled by the threshold. Mid-tier surge is handled by spread fan-out (chunked over time). Stale follows are handled by lazy filter + nightly sweep.
- 77. Discuss scaling levers. Multi-region inboxes for global latency. Ranking model complexity vs latency budget. Dynamic threshold tuning. Inbox TTL for inactive users.
Signature phrases
- “Push and pull are two regimes of the same cost function; the follower threshold is the phase boundary.” — Shows you understand the structural reason for the hybrid, not just the implementation.
- “The celebrity problem isn't an edge case — it's the defining constraint of feed architecture at scale.” — Reframes what sounds like a corner case as the central design driver.
- “Inbox bounded to 800 entries via ZADD + trim — old content falls back to author timelines.” — Demonstrates concrete Redis mechanics and the bounded-storage invariant.
- “Spread fan-out: chunk the follower list and drain at a steady rate to avoid thundering-herd writes.” — Shows awareness of the mid-tier celebrity problem beyond just the threshold.
- “Lazy filter on unfollow, nightly sweep for space reclamation — correctness now, efficiency later.” — Demonstrates pragmatic engineering: immediate consistency via cheap filtering, deferred cleanup via batch.
- “The merge step is O(N log K) where K is the number of celebrity pulls — typically under 20.” — Precise complexity analysis shows you've thought about the read-path cost.
Likely follow-ups
?“What if a normal user goes viral and suddenly has 5M followers?”Reveal
The follower-count monitoring service detects the threshold crossing within 5 minutes. Future posts route through the pull path (no fan-out). In-flight fan-out for the current post continues — it was below threshold when posted. To handle the surge, spread fan-out chunks the follower list and drains at a controlled rate, preventing queue saturation. The user is reclassified with a hysteresis cooldown to prevent oscillation if their follower count fluctuates around the threshold.
?“How do you handle unfollow?”Reveal
Lazy filter at read time: when assembling the feed, check each candidate post's author against the reader's current follow set (cached in a bloom filter or set lookup). Posts from unfollowed authors are filtered out before ranking. A nightly batch sweep scans inboxes and removes entries from authors the user no longer follows, reclaiming Redis memory. This avoids expensive real-time deletion while ensuring immediate correctness.
?“How does ranking work in this system?”Reveal
The merge step produces 100-200 candidates (inbox + celebrity pulls). Each candidate is enriched with features from the feature store: post recency, author engagement rate, reader-author interaction history, content type, early engagement signals (likes in first 5 minutes). A lightweight ML model (gradient-boosted tree or small neural net) scores each candidate by P(engagement). A diversity filter ensures no author dominates and content types are mixed. Total ranking latency: <20ms for the full candidate set.
?“How would you handle multi-region deployment?”Reveal
Partition followers by region. When a post arrives, the region router dispatches fan-out tasks to regional workers. Each region has its own Redis inbox cluster, so feed reads are local. Celebrity timeline pulls may cross regions — cache them in each consuming region with a 30-second TTL. The follow graph is globally consistent (replicated) but fan-out is regional. Cross-region replication lag for celebrity content is acceptable because users don't expect sub-second freshness for celebrity posts.
?“Celebrity handling — why not just use pull for everyone?”Reveal
Pure pull means every feed read triggers O(following) timeline lookups. For a user following 500 accounts, that's 500 reads per feed refresh. Even with caching, the p99 latency is unacceptable — a single cache miss on one of 500 timelines adds 20-50ms. Push pre-materialises the feed for the common case (normal-follower-count authors), reducing feed reads to O(1) for ~95% of the content. The hybrid pays the push cost for the 95% where it's cheap and uses pull only for the 5% where push would be catastrophically expensive.
?“Real-time vs batch ranking — which one?”Reveal
Real-time ranking at feed-read time for the primary feed. The candidate set is small enough (100-200 items) that scoring completes in <20ms. Batch ranking (pre-computing ranked feeds offline) doesn't work because the candidate set changes with every new post and follow/unfollow event — the precomputed feed would be stale by the time it's read. However, some features used by the ranking model ARE batch-computed: author engagement rates, topic embeddings, and user interest profiles are updated by offline pipelines every few hours and served from the feature store.
Code snippets
import redis
from typing import List
r = redis.Redis(host='inbox-redis', port=6379, decode_responses=True)
INBOX_MAX_SIZE = 800
BATCH_SIZE = 50
def fanout_to_followers(
post_id: str,
timestamp: float,
follower_ids: List[str],
) -> int:
"""Push post_id into each follower's inbox ZSET.
Returns the number of inboxes written."""
written = 0
for i in range(0, len(follower_ids), BATCH_SIZE):
batch = follower_ids[i : i + BATCH_SIZE]
pipe = r.pipeline(transaction=False)
for fid in batch:
key = f"inbox:{fid}"
pipe.zadd(key, {post_id: timestamp})
# Trim to bounded size — keep only top INBOX_MAX_SIZE
pipe.zremrangebyrank(key, 0, -(INBOX_MAX_SIZE + 1))
pipe.execute()
written += len(batch)
return writtenimport heapq
from typing import List, Tuple, Iterator
def merge_k_streams(
streams: List[Iterator[Tuple[float, str]]],
page_size: int = 50,
) -> List[str]:
"""Merge K sorted streams (score desc) into a single ranked page.
Each stream yields (score, post_id) in descending order."""
# Max-heap via negated scores
heap: list[Tuple[float, int, str]] = []
for idx, stream in enumerate(streams):
item = next(stream, None)
if item:
score, post_id = item
heapq.heappush(heap, (-score, idx, post_id))
result: List[str] = []
while heap and len(result) < page_size:
neg_score, idx, post_id = heapq.heappop(heap)
result.append(post_id)
item = next(streams[idx], None)
if item:
score, pid = item
heapq.heappush(heap, (-score, idx, pid))
return resultimport redis
from typing import List, Set
r = redis.Redis(host='inbox-redis', port=6379, decode_responses=True)
CELEB_THRESHOLD = 1_000_000
def read_hybrid_feed(
user_id: str,
celebrity_ids: List[str],
page_size: int = 50,
) -> List[str]:
"""Read feed by merging push inbox + pull from celebrities."""
# 1. Read pre-materialised inbox (push path)
inbox_posts = r.zrevrange(
f"inbox:{user_id}", 0, page_size - 1, withscores=True
)
# 2. Pull celebrity timelines (pull path)
celeb_posts = []
pipe = r.pipeline(transaction=False)
for cid in celebrity_ids:
pipe.zrevrange(f"timeline:{cid}", 0, 9, withscores=True)
celeb_results = pipe.execute()
for result in celeb_results:
celeb_posts.extend(result)
# 3. Merge and sort by score (timestamp) descending
all_posts = list(inbox_posts) + celeb_posts
all_posts.sort(key=lambda x: float(x[1]), reverse=True)
# 4. Deduplicate and take top page_size
seen: Set[str] = set()
feed: List[str] = []
for post_id, _ in all_posts:
if post_id not in seen:
seen.add(post_id)
feed.append(post_id)
if len(feed) >= page_size:
break
return feedimport redis
from typing import Optional
r = redis.Redis(host='graph-redis', port=6379, decode_responses=True)
CELEB_THRESHOLD = 1_000_000
HYSTERESIS_LOWER = 800_000 # Don't reclassify until well below
def check_celebrity_status(
user_id: str,
current_status: Optional[str] = None,
) -> str:
"""Determine if a user should be on the push or pull path.
Uses hysteresis to prevent oscillation near the boundary."""
follower_count = int(r.get(f"follower_count:{user_id}") or 0)
if current_status == 'celebrity':
# Only reclassify back to normal if well below threshold
if follower_count < HYSTERESIS_LOWER:
return 'normal'
return 'celebrity'
else:
# Classify as celebrity if above threshold
if follower_count >= CELEB_THRESHOLD:
return 'celebrity'
return 'normal'-- KEYS[1] = inbox key (e.g., "inbox:user123")
-- ARGV[1] = post_id (member)
-- ARGV[2] = timestamp (score)
-- ARGV[3] = max inbox size (e.g., 800)
--
-- Atomically add the post and trim the inbox to bounded size.
-- Returns 1 if the post was new, 0 if it already existed.
local added = redis.call('ZADD', KEYS[1], ARGV[2], ARGV[1])
local size = redis.call('ZCARD', KEYS[1])
local max_size = tonumber(ARGV[3])
if size > max_size then
redis.call('ZREMRANGEBYRANK', KEYS[1], 0, size - max_size - 1)
end
return addedDrills
Why can't you use pure fan-out on write for a Twitter-scale system?Reveal
Because the follower distribution is power-law. A celebrity with 100M followers generates 100M inbox writes per post, saturating the fan-out queue and delaying delivery for ALL users. The write amplification (followers × posts/day) is unsustainable for the top 0.1% of users, even though it works perfectly for the median user with ~300 followers.
What data structure should you use for the inbox and why?Reveal
Redis sorted set (ZSET). Members are post ids, scores are timestamps. ZADD is O(log N) for insertion. ZREVRANGE returns the top-K by recency in O(K + log N). ZREMRANGEBYRANK trims to bounded size atomically. The ZSET provides insertion, retrieval, and trimming in a single data structure with sub-millisecond latency.
How does the merge-K step work at read time in the hybrid model?Reveal
Initialise a max-heap with the most recent entry from each source (inbox + K celebrity timelines). Extract the max, add it to the result, and push the next entry from that source. Repeat for page_size items. Time complexity: O(N log K) where N = page size and K = number of sources. For typical values (N=50, K=15), this is ~200 comparisons — trivial.
What happens when a normal user goes viral and crosses the celebrity threshold?Reveal
The follower-count monitoring service detects the crossing within minutes and reclassifies the user. Future posts route through the pull path (no fan-out). In-flight fan-out for the current post continues — it was below threshold when posted. Hysteresis prevents oscillation: promote at 1M, demote at 800K, with a cooldown period.
Why use lazy filter + nightly sweep for unfollows instead of eager deletion?Reveal
Eager deletion requires scanning the entire inbox to find posts by the unfollowed author — O(N) per unfollow, expensive at scale. Lazy filter checks the follow graph at read time (cached, O(1) per candidate) and hides unfollowed content immediately. The nightly sweep batch-processes stale entries for memory reclamation. This trades temporary space overhead for correctness + low write-path cost.
How would you handle fan-out in a multi-region deployment?Reveal
Partition followers by region in the follow graph. When a post arrives, the region router dispatches fan-out tasks to regional workers, each writing to regional inbox Redis clusters. Feed reads are served from the local region. Celebrity timeline pulls may cross regions — cache them locally with a 30-second TTL. This ensures feed reads never incur cross-region latency while keeping celebrity content reasonably fresh.