Pattern

Search over content

When to reach for this

Reach for this when…

Full-text search over a corpus (millions+ of docs)
Faceted search on e-commerce
Log / metric exploration
"Search for..." in the prompt — always this

Not really this pattern when…

Lookup by exact key (that's a primary index)
Dataset < 10k rows (SQL with a btree is fine)
Structured-only queries on well-known columns (SQL wins)

Good vs bad answer

Interviewer probe

“Design product search for 10M products.”

Weak answer

"SQL with ILIKE '%term%'."

Strong answer

"Elasticsearch with BM25 base relevance. Postgres is source of truth; Debezium CDC publishes changes to Kafka; an indexer consumer upserts into ES. Fields: standard + English stemming on name/description; keyword (un-analysed) on categories and SKUs for exact filter; numeric on price and rating for sort + range. Query: multi_match with field boosts, function_score for popularity + recency, filters as must-clauses that don't affect the score. Rebuild: alias pattern — new index products-v4, bulk reindex from DB, atomic alias flip, keep v3 for rollback. Could use Meilisearch if we don't need ES's aggregation power — simpler ops."

Why it wins: Names engine, pipeline, analyzer strategy, BM25 + boosts, rebuild pattern, and the alternative for smaller scale.

Cheat sheet

•DB = truth. Search index = derived view.
•CDC / outbox publishes changes async.
•<10M docs: Postgres FTS often enough.
•>10M or facets: Elasticsearch / OpenSearch.
•Alias pattern for atomic reindex + rollback.
•BM25 base + function_score business signals.
•LIKE is not search.

Core concept

Search engines (Elasticsearch, OpenSearch, Meilisearch, Typesense, Vespa) are built on an inverted index: for every term, a posting list of (doc_id, positions, frequency). A query looks up posting lists, intersects/unions doc_ids, scores with BM25, sorts, returns top K.

Three separable layers:

1Ingestion — DB is truth; CDC or outbox publishes changes to the indexer, which updates the search engine. Async; accept a few seconds of search lag.
2Index — sharded by doc_id (or by tenant for multi-tenant). Replicated for HA and read throughput. Field-level analysis (language, stemming, stop words) — different fields use different analyzers.
3Query / re-rank — coordinator fans out to shards, merges top-K, applies business signals (popularity, recency, personalisation) via function_score or a separate re-ranker.

Relevance (BM25 + signals): BM25 is the base score. Layer business signals on top: popularity boost (log1p(sales)), recency decay, personalisation. Never replace BM25 — boost it.

Rebuild-ability is non-optional. Analyzer change, schema change, bug — you'll rebuild. Use the alias pattern: products alias points at products-v4; rebuild v5 in background; flip alias atomically; keep v4 for rollback.

Canonical examples

→Product search
→Wiki / documentation search
→Log exploration UI (Kibana)
→Feed search / hashtag search
→Support-ticket search

Decision levers

Engine choice

Postgres FTS up to ~10M docs and modest QPS. Meilisearch / Typesense for great DX at moderate scale. Elasticsearch / OpenSearch for billions + facets + custom scoring. Vespa when search IS the product and you need ML re-rank.

Ingestion pipeline

Never dual-write. Outbox or CDC (Debezium). App writes one DB transaction; CDC publishes to Kafka; indexer consumer upserts to search engine. Idempotent.

Analyzer per field

Free-text (name, description): standard + language stemming. Identifiers (category, SKU): keyword — un-analysed, exact match. Numeric: numeric fields for sort + filter. Wrong analyzer = zero matches.

Relevance strategy

BM25 base. function_score multiplies by popularity + recency. A/B test changes on CTR and conversion. Offline: ML re-rank on top of the top-K, for pros.

Failure modes

Search as source of truth

Elasticsearch has crashed and corrupted. Treat it as a derived view; never the only copy of the data. Rebuild from DB is the recovery path.

Synchronous indexing on write

Couples write latency to indexer availability. Make it async via CDC/outbox; accept a few seconds of search lag.

LIKE at scale

Full table scan per query. Dies at hundreds of thousands of rows. Step up to Postgres FTS, then a real search engine.

No rebuild storyAdvanced

Schema / analyzer change with no alias + rebuild pattern = downtime. Plan alias pattern from day 1.

Query-time personalisation everywhereAdvanced

Per-user re-ranking of every result blows cache and latency. Boost per-user at the top-K level, not at scan time.

Drills

Explain an inverted index in 30 seconds.Reveal

For every term, store a posting list of (doc_id, term_frequency, positions). A search looks up each term's list, intersects (AND) or unions (OR) doc_ids, scores with BM25, sorts, returns top K. The index is expensive to build, cheap to query — opposite of a btree.

You changed the analyzer. How to deploy?Reveal

Alias pattern. Create products-v4 with new analyzer. Bulk reindex from the source DB (or ES reindex API). Verify results. Flip products alias from v3 to v4 atomically. Keep v3 for 24h for rollback. Delete v3 after confidence.

11% complete

Current

When to reach for this

Step 1 of 9

Good vs bad answer

Jump to next

All patterns

Pattern

Search over content

When to reach for this

Reach for this when…

Full-text search over a corpus (millions+ of docs)
Faceted search on e-commerce
Log / metric exploration
"Search for..." in the prompt — always this

Not really this pattern when…

Lookup by exact key (that's a primary index)
Dataset < 10k rows (SQL with a btree is fine)
Structured-only queries on well-known columns (SQL wins)

Good vs bad answer

Interviewer probe

“Design product search for 10M products.”

Weak answer

"SQL with ILIKE '%term%'."

Strong answer

Why it wins: Names engine, pipeline, analyzer strategy, BM25 + boosts, rebuild pattern, and the alternative for smaller scale.

Cheat sheet

•DB = truth. Search index = derived view.
•CDC / outbox publishes changes async.
•<10M docs: Postgres FTS often enough.
•>10M or facets: Elasticsearch / OpenSearch.
•Alias pattern for atomic reindex + rollback.
•BM25 base + function_score business signals.
•LIKE is not search.

Core concept

Three separable layers:

1Ingestion — DB is truth; CDC or outbox publishes changes to the indexer, which updates the search engine. Async; accept a few seconds of search lag.
2Index — sharded by doc_id (or by tenant for multi-tenant). Replicated for HA and read throughput. Field-level analysis (language, stemming, stop words) — different fields use different analyzers.
3Query / re-rank — coordinator fans out to shards, merges top-K, applies business signals (popularity, recency, personalisation) via function_score or a separate re-ranker.

Relevance (BM25 + signals): BM25 is the base score. Layer business signals on top: popularity boost (log1p(sales)), recency decay, personalisation. Never replace BM25 — boost it.

Canonical examples

→Product search
→Wiki / documentation search
→Log exploration UI (Kibana)
→Feed search / hashtag search
→Support-ticket search

Decision levers

Engine choice

Ingestion pipeline

Never dual-write. Outbox or CDC (Debezium). App writes one DB transaction; CDC publishes to Kafka; indexer consumer upserts to search engine. Idempotent.

Analyzer per field

Relevance strategy

BM25 base. function_score multiplies by popularity + recency. A/B test changes on CTR and conversion. Offline: ML re-rank on top of the top-K, for pros.

Failure modes

Search as source of truth

Elasticsearch has crashed and corrupted. Treat it as a derived view; never the only copy of the data. Rebuild from DB is the recovery path.

Synchronous indexing on write

Couples write latency to indexer availability. Make it async via CDC/outbox; accept a few seconds of search lag.

LIKE at scale

Full table scan per query. Dies at hundreds of thousands of rows. Step up to Postgres FTS, then a real search engine.

No rebuild storyAdvanced

Schema / analyzer change with no alias + rebuild pattern = downtime. Plan alias pattern from day 1.

Query-time personalisation everywhereAdvanced

Per-user re-ranking of every result blows cache and latency. Boost per-user at the top-K level, not at scan time.

Drills

Explain an inverted index in 30 seconds.Reveal

You changed the analyzer. How to deploy?Reveal

11% complete

Current

When to reach for this

Step 1 of 9

Good vs bad answer

Jump to next