Geospatial / proximity lookup
When to reach for this
Reach for this when…
- "Find N nearest" queries
- Bounded-radius search ("within 2 km")
- Real-time location tracking + dispatch
- Map-based product UX
Not really this pattern when…
- Locations are fixed and small in number (< 1000) — memory + linear scan is fine
- Lookup is by region id, not coordinates (that's a plain index)
Good vs bad answer
Interviewer probe
“Find all drivers within 2 km of a pickup.”
Weak answer
"WHERE lat BETWEEN ? AND ? AND lng BETWEEN ? AND ?."
Strong answer
"Full scan at any real scale. Use a spatial index. Primary: PostGIS with a 2dsphere GiST index — ST_DWithin returns matches in milliseconds. Hot path: Redis GEO sharded by region, sub-ms GEORADIUS for the dispatch loop. Driver pings update both (Postgres as truth, Redis as fast path). Redis dies → falls back to PostGIS with higher latency. At planet scale (Uber territory) I'd move to H3 cells with custom per-cell sharded in-memory stores."
Why it wins: Rejects BETWEEN, names the right index, layers truth + hot tier, and calibrates to scale.
Cheat sheet
- •Never BETWEEN on lat/lng.
- •Default: PostGIS 2dsphere.
- •Hot path: Redis GEO for sub-ms.
- •Planet-scale: S2 or H3 custom.
- •Geohash: query 9 neighbours + filter actual distance.
- •Shard by region when one index outgrows a node.
Core concept
A spatial index answers "find all points within radius R of (lat, lng)" efficiently. Every modern option beats two B-trees intersected.
The main systems:
- PostGIS (Postgres + GiST/SP-GiST): R-tree-based. Supports points, polygons, complex shapes. Default when you're on Postgres.
- MongoDB 2dsphere: simple API, JSON-native. Fine for most apps.
- Redis GEO: geohash-backed sorted set; sub-ms GEORADIUS. Perfect for hot dispatch tier.
- Elasticsearch geo_point: combine geo with full-text search and aggregations.
- S2 / H3: cell-based libraries (Google / Uber). Planet-scale, uniform cells. Custom infrastructure.
Geohash edge-cell problem: two points near a cell boundary can have totally different prefixes despite being close. Always query the target cell + 8 neighbours, then filter by actual Haversine distance.
Architecture at scale (ride-sharing):
- Truth store: PostGIS for durability.
- Hot tier: Redis GEO sharded by region, rebuilt from truth on failure.
- Dispatch reads from Redis; falls back to PostGIS on Redis outage (slower but works).
Canonical examples
- →Uber / Lyft driver matching
- →DoorDash restaurant discovery
- →Yelp / Google Maps nearby search
- →Pokemon Go / AR proximity
- →ATM / store locator
Decision levers
Primary store
PostGIS if you're on Postgres. MongoDB 2dsphere if Mongo. For pure location workloads at high scale, a Redis-first design with a durable truth store.
Hot-path cache
Redis GEO for sub-ms. Size: 1M points ≈ 200 MB in a Redis instance. Shard by region to keep each shard small.
Query shape
Radius vs bounding box vs kNN. Radius + kNN most common in dispatch. Bounding box for map viewport queries.
Scale tier
< 10M points globally: single PostGIS index. 10–100M: sharded PostGIS + Redis GEO. 100M+ with ride-dispatch QPS: S2/H3 custom (Uber territory).
Failure modes
Two B-trees intersected. Full scan once rows > 10k. Use a spatial index always.
Two points 50m apart on opposite sides of a cell have different prefixes. Query 9 cells + filter by actual distance.
100M points in one index is painful. Shard by region / country / S2 cell-level.
PostGIS geometry uses planar math (wrong for lat/lng at distance); geography uses geodesic (right). Mixing silently breaks.
Drills
Why does BETWEEN fail?Reveal
It's two independent B-tree lookups intersected. With N points in a latitude band and M in a longitude band, you fetch N+M and intersect — linear in bands, not in the answer size. A spatial index walks only cells that overlap the query region.
kNN on a geospatial index — how?Reveal
Priority-queue walk of nearby cells. Start at the target cell; expand to neighbours; keep a running top-K heap; stop when no undiscovered cell can contain a point closer than the current K-th. PostGIS: ORDER BY location <-> target LIMIT K uses the spatial index natively.