
Consistency trade-offs

Strong vs eventual vs causal, quorum, read-your-writes.

CAP is not a trivia question. It's the trade-off that every distributed system lives under, and getting it wrong is how you end up with "strong consistency" backed by a single node — or "eventual consistency" on data that absolutely cannot be eventually wrong.

Read this if your last attempt…

You said "strong consistency" for the whole system without justifying it per data class
You can't explain the difference between linearizable, sequential, causal, and eventual
You've heard of CAP but haven't heard of PACELC
You default to "eventual consistency" for everything because "it's more scalable"

The concept

CAP vs PACELC

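The CAP theorem says that during a network partition (which will happen in any distributed system), you choose between Consistency (all nodes see the same data) and Availability (every request gets a response). Partition tolerance is mandatory because networks fail, so the real choice is between C and A during partitions. During normal operation, you can have both. PACELC extends this: if there is a Partition, choose Availability or Consistency; Else, in normal operation, choose Latency or Consistency. The second half matters more day to day because it applies to every request, not only to the rare partition.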

Architecture diagram · The consistency spectrum

Not a binary choice. Each data class picks its level on this spectrum — strong costs latency, eventual costs correctness guarantees.

Consistency levels you'll actually cite.

Level	Guarantee	Cost	Typical use
Linearizable (strong)	Every read sees the latest committed write	Needs consensus; high latency	Uniqueness checks, financial balances, inventory
Sequential	All nodes agree on operation order	Similar to linearizable; needs global ordering	Audit logs, state machines, distributed locks
Causal	Causally related operations appear in order	Vector clocks; moderate overhead	Comments (reply after parent), collaborative docs
Read-your-writes	A client sees its own writes immediately	Session affinity or sticky reads	Dashboards, edit-then-view flows, profile updates
Monotonic reads	A client never sees time go backwards	Session routing / version tracking	Timelines, comment threads, notifications
Eventual	Replicas converge given no new writes	Cheapest — no coordination	Counts, metrics, recommendations, search indexes

Variants

Linearizable (strong consistency)

Every read returns the most recent committed write, as if there's a single copy of the data.

Architecture diagram · Quorum math: W + R > N ⇒ linearizable reads

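N=3, W=2, R=2. Any write quorum (2 nodes) and any read quorum (2 nodes) must overlap in at least one node, because 2 + 2 > 3, and that node has the latest write. Overlap is a necessary condition, not the whole proof: a leaderless store must also return the newest version it sees and repair stale replicas before answering, and a read that races an in-flight write can still disagree with a later read. That is why consensus-based systems are the usual choice when the guarantee must be strict.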
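The overlap claim can be checked by brute force for small clusters. The sketch below enumerates every possible write quorum and read quorum and confirms whether they always share a node. The helper names `subsets` and `quorumsAlwaysOverlap` are illustrative, not part of any library.

```typescript
// Enumerate every subset of `size` node indices out of n nodes.
function subsets(n: number, size: number): number[][] {
  const out: number[][] = [];
  const walk = (start: number, picked: number[]): void => {
    if (picked.length === size) {
      out.push([...picked]);
      return;
    }
    for (let i = start; i < n; i++) {
      picked.push(i);
      walk(i + 1, picked);
      picked.pop();
    }
  };
  walk(0, []);
  return out;
}

// True when every possible write quorum shares at least one node
// with every possible read quorum.
function quorumsAlwaysOverlap(n: number, w: number, r: number): boolean {
  for (const ws of subsets(n, w)) {
    for (const rs of subsets(n, r)) {
      const shared = ws.some((node) => rs.indexOf(node) !== -1);
      if (!shared) return false;
    }
  }
  return true;
}
```

For N=3 this agrees with the inequality: W=2, R=2 always overlaps, while W=2, R=1 (sum equal to N) does not.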

How it works: Single-leader with synchronous replication to a quorum, or a consensus protocol like Raft or Paxos. Every write blocks until a majority of replicas acknowledge.

Real-world systems: Google Spanner (TrueTime + Paxos), CockroachDB, etcd/ZooKeeper (for metadata/coordination).

Cost: Every write has the latency of the slowest quorum member. For a 3-replica cluster with one cross-AZ replica, that's ~2-5ms extra per write. For cross-region: 50-200ms per write — often prohibitive.

When it's worth it:

Uniqueness constraints (username, short-URL alias, ticket seat)
Financial transactions (account balance, transfer)
Distributed locks / leader election
Anything where reading stale data causes a business-level bug (double-booking, double-spending)

When it's NOT worth it: Anything read-heavy where staleness is tolerable. Forcing linearizable reads on a social media timeline kills throughput for zero user benefit.

Choose this variant when

Uniqueness checks
Financial data
Inventory / booking
Coordination metadata

Causal consistency

Causally related operations appear in order; concurrent operations may appear in any order.

Architecture diagram · Three multi-region consistency shapes

Regional primary (most common) vs Spanner-style (globally linearizable) vs multi-leader (eventual). Picking wrong costs either write latency or correctness.

The insight: most operations aren't concurrent with each other — they have causal relationships. A reply to a comment is causally dependent on the comment. Causal consistency guarantees you'll never see the reply without the comment.

How it works: Track dependencies with vector clocks or logical timestamps. When operation B depends on A, B carries A's timestamp, and no replica serves B until it has also processed A.
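A minimal sketch of that rule, assuming each operation lists the ids of the operations it depends on (a simplification of vector clocks). The `Op` shape and the `CausalReplica` class are hypothetical names for illustration. The replica buffers an operation until everything it depends on has been applied.

```typescript
interface Op {
  id: string;
  deps: string[]; // ids of operations this one causally depends on
  payload: string;
}

class CausalReplica {
  private applied = new Set<string>();
  private pending: Op[] = [];
  readonly visible: string[] = []; // payloads in the order they became visible

  receive(op: Op): void {
    if (this.applied.has(op.id)) return; // ignore redelivery of an applied op
    this.pending.push(op);
    this.drain();
  }

  // Apply any buffered op whose dependencies are all applied. Repeat until
  // nothing is ready, because applying one op can unblock others.
  private drain(): void {
    for (;;) {
      const ready = this.pending.find((op) =>
        op.deps.every((d) => this.applied.has(d)),
      );
      if (ready === undefined) return;
      this.pending = this.pending.filter((op) => op !== ready);
      this.applied.add(ready.id);
      this.visible.push(ready.payload);
    }
  }
}
```

If a reply arrives before its parent comment, it stays in `pending` and only becomes visible once the parent has been applied.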

Why it matters: Causal consistency gives you much of the benefit of strong consistency with much less coordination overhead. You don't need global ordering — just ordering of related operations.

Real-world systems: MongoDB (causal consistency sessions), COPS (academic). Cassandra's lightweight transactions are sometimes listed here, but they run Paxos and give linearizable compare-and-set rather than causal ordering.

Example: In a collaborative document, if User A types "Hello" and User B replies "Hi", causal consistency ensures every user sees "Hello" before "Hi". But two independent edits to different paragraphs can appear in either order.

Choose this variant when

Comment threads (reply after parent)
Collaborative editing
Chat message ordering within a conversation
Any case where "this happened because of that" matters

Read-your-writes

A client always sees its own writes immediately; other clients may see a delay.

Architecture diagram · Read-your-writes via LSN token

Write returns an LSN; client stores it in session; subsequent reads pass the LSN; replica waits until it has applied up to that LSN before answering.

The most common "good enough" consistency level. Users expect that after they click "Save", they see their changes. They don't expect other users to see the change instantly.

Implementation options:

1. Session affinity: route the user's reads to the same replica they wrote to. Simple but brittle (what if that replica fails?).
2. Read-after-write token: the write returns a logical timestamp; the client passes it on the next read; the replica waits until it's caught up to that timestamp before responding.
3. Read from leader: after a write, read from the leader for N seconds, then fall back to replicas. Blunt but effective.
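The third option is small enough to sketch. The code below is an illustration only: `SessionState`, `recordWrite`, `pickReadTarget`, and the 5-second window are assumptions, and the window should be tuned to be longer than your observed replica lag.

```typescript
type ReadTarget = "leader" | "replica";

interface SessionState {
  lastWriteAtMs: number | null; // time of this session's most recent write
}

// Hypothetical window; pick something longer than typical replica lag.
const LEADER_READ_WINDOW_MS = 5_000;

function recordWrite(session: SessionState, nowMs: number): void {
  session.lastWriteAtMs = nowMs;
}

// Reads go to the leader while the session's last write is still "fresh",
// and to a replica otherwise.
function pickReadTarget(session: SessionState, nowMs: number): ReadTarget {
  if (session.lastWriteAtMs === null) return "replica";
  const sinceWrite = nowMs - session.lastWriteAtMs;
  return sinceWrite < LEADER_READ_WINDOW_MS ? "leader" : "replica";
}
```

Only sessions that have written recently add load to the leader; every other read still goes to replicas.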

Where most candidates go wrong: They promise "strong consistency" when all they actually need is read-your-writes. The difference is huge: strong consistency requires every read from every client to see the latest write, which needs consensus. Read-your-writes only requires your own session to be consistent, which just needs routing.

Cost: Nearly free — just route correctly. No consensus needed.

Choose this variant when

User dashboards
Profile edits
Settings changes
Any "I just saved, why don't I see it?" scenario

Eventual consistency

All replicas converge to the same state given no new writes — but there's no guarantee on when.

Architecture diagram · Concurrent writes: five resolution strategies

Two replicas accept conflicting writes to the same field. Each strategy picks a different tradeoff between simplicity, correctness, and data preservation.

Default for read-heavy, scale-out systems. Async replication means writes are fast (ack from one node) and replicas catch up in the background. The convergence window is typically milliseconds to low seconds.

The catch: conflict resolution. If two replicas accept conflicting writes concurrently (e.g., two users edit the same field), you need a strategy:

Last-writer-wins (LWW): use timestamps; latest write wins. Simple but loses data silently.
Merge / CRDT: automatically merge concurrent updates (e.g., sets can union, counters can add). No data loss but not always possible for arbitrary data.
Application-level resolution: surface the conflict to the user (like Git merge conflicts).
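To make the difference between the first two strategies concrete, here is a minimal sketch of a last-writer-wins merge next to a set-union merge. The `Stamped` shape, the replica-id tie-break, and both function names are assumptions for illustration.

```typescript
interface Stamped<T> {
  value: T;
  ts: number;      // writer's timestamp in ms
  replica: string; // id of the replica that accepted the write
}

// Last-writer-wins: keep the version with the larger timestamp.
// Ties are broken by replica id so that every replica picks the same winner.
function mergeLww<T>(a: Stamped<T>, b: Stamped<T>): Stamped<T> {
  if (a.ts !== b.ts) return a.ts > b.ts ? a : b;
  return a.replica >= b.replica ? a : b;
}

// Grow-only set merge: the union keeps every element from both sides.
function mergeUnion<T>(a: ReadonlySet<T>, b: ReadonlySet<T>): Set<T> {
  const out = new Set<T>(a);
  b.forEach((item) => out.add(item));
  return out;
}
```

With LWW, the losing write is dropped by the merge itself and nothing records that it existed. With the union, both concurrent additions survive.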

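Real-world systems: DynamoDB (default reads), Cassandra (at low consistency levels), DNS. S3 was long the textbook example, but it has provided strong read-after-write consistency since December 2020.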

When it's appropriate:

Click counts, view counts, analytics
Recommendation scores
Search index updates
Session data that's regenerable
Anything where "slightly stale" is invisible to the user

When it's NOT appropriate: Anything where the divergence window creates a business bug (double-sell, double-book, lost money).

Choose this variant when

Counters and analytics
Search indexes
Recommendations
CDN cache content
Any data where millisecond staleness is invisible

Worked example

Scenario: designing an e-commerce platform. The interviewer asks about consistency.

Data class analysis:

Product inventory → Linearizable. Overselling is a business-critical bug. Write quorum W=2 of N=3 on decrement; read quorum R=2 on stock check. Or single-leader with sync replication.

User cart → Read-your-writes. User adds item, expects to see it. No other user reads this cart. Route reads to the same replica via session affinity or read-after-write token.

Product reviews → Eventual. A review appearing 5 seconds late is invisible. Async replication to read replicas; replicas serve the read-heavy review page.

Product catalog (name, price, description) → Eventual with short window. Price changes propagate within seconds but don't need consensus. Cache with 30s TTL.

Order confirmation → Linearizable. The order record must be durable and consistent before sending the confirmation email. Write to leader + sync replicate to standby before returning 200.

What the interviewer hears: "I'm not using one consistency level for the whole system. Inventory gets linearizable because overselling costs money. Cart gets read-your-writes because only the owner reads it. Reviews are eventual because seconds of staleness are invisible. I can implement this with a single Postgres cluster: SELECT FOR UPDATE on inventory, session-pinned reads for cart, read-replica serving for reviews."

Quorum math for inventory: N=3 replicas, W=2, R=2. W+R=4 > N=3, so reads always overlap with the latest write. Writes are slower by ~2ms (latency of second replica within AZ), but inventory accuracy is worth it.
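The data-class analysis above can be written down as a small lookup table, which is often how it ends up in a real codebase. This is a sketch under assumptions: the `policy` map, the `readTargetFor` function, and the three routing labels are illustrative names, not an API from any framework.

```typescript
type Level = "linearizable" | "read-your-writes" | "eventual";
type DataClass = "inventory" | "cart" | "reviews" | "catalog" | "orderConfirmation";

// One consistency level per data class, each with its business reason.
const policy: Record<DataClass, { level: Level; reason: string }> = {
  inventory: { level: "linearizable", reason: "overselling is a business-critical bug" },
  cart: { level: "read-your-writes", reason: "only the owner reads it" },
  reviews: { level: "eventual", reason: "a few seconds of staleness is invisible" },
  catalog: { level: "eventual", reason: "price changes propagate within seconds" },
  orderConfirmation: { level: "linearizable", reason: "record must be durable before confirming" },
};

// Map the level to where reads for that data class should be sent.
function readTargetFor(dataClass: DataClass): "leader" | "session-pinned replica" | "any replica" {
  switch (policy[dataClass].level) {
    case "linearizable":
      return "leader";
    case "read-your-writes":
      return "session-pinned replica";
    case "eventual":
      return "any replica";
  }
}
```

Keeping the reason next to the level makes the trade-off reviewable: anyone changing a level has to change the justification too.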

Good vs bad answer

Interviewer probe

“What consistency model does your system use?”

Weak answer

"Strong consistency — we use Postgres which is ACID compliant."

Strong answer

"It depends on the data class. Inventory is linearizable — I use SELECT FOR UPDATE on the stock count because overselling costs real money. The user's cart is read-your-writes — I route their reads to the replica they last wrote to via a session token, which is nearly free. Product reviews are eventually consistent — async replication to read replicas is fine because a 2-second delay on a review is invisible. I'm not paying the consensus overhead for data that doesn't need it."

Why it wins: Names three different consistency levels for three data classes, justifies each with a business reason, costs out the trade-off, and doesn't over-engineer.

Interview playbook

2 minutes when classifying data classes; 2-3 minutes per deep-dive when consistency becomes the crux of a design decision.

When it comes up

Whenever a database is introduced in the HLD
During multi-region design — CAP becomes acute
When replication lag, stale reads, or read-your-writes come up
Whenever "ACID", "strong consistency", or "eventual consistency" enters the conversation
In any deep-dive on inventory, booking, balance, or uniqueness

Order of reveal

1. Reject one-size-fits-all. "Consistency isn't a system-wide switch; it's a per-data-class decision. Different pieces of data need different guarantees."
2. Classify the data. "Three classes here: must-be-exact (inventory, balance, uniqueness), must-see-my-own-writes (dashboards, profiles), and stale-is-fine (counts, analytics, search results)."
3. Assign the minimum viable level per class. "Linearizable for inventory, because overselling costs money. Read-your-writes for the user's own dashboard: route reads with session affinity. Eventual for review counts, since they'll converge in seconds."
4. State the cost of each choice. "Linearizable writes pay quorum latency: ~2 ms intra-AZ, 50-200 ms cross-region. Read-your-writes is nearly free. Eventual is the cheapest at every layer."
5. Use PACELC, not just CAP. "CAP is about partitions, which are rare. PACELC includes normal-operation latency vs consistency, which affects every request. Most systems are PA/EL or PC/EC."
6. Prove consistency with quorum math if quorum is used. "N=3 replicas, W=2, R=2. W+R=4 > N=3, so reads overlap with the latest write. Linearizable by construction."

Signature phrases

“Consistency is per data class, not per system” — The single most important reframe; separates seniors instantly.
“PACELC, not just CAP” — Signals depth beyond the interview cliché.
“Read-your-writes is cheap; strong consistency is expensive” — Prevents over-engineering.
“W + R > N ⇒ linearizable reads” — Concrete math, not vibes.
“Eventual is fine when staleness is invisible” — Justifies the choice with a user-centric test.
“"ACID" is local; CAP is distributed” — Catches the common conflation.

Likely follow-ups

“Your system is globally distributed. Give me the full consistency story across regions.”

The core tension: cross-region latency is 50-200 ms. Synchronous cross-region replication turns every strongly-consistent write into a multi-region round-trip. Most systems can't pay this.

Three common shapes:

1. Regional primary, global async replicas (most common).

Each user is pinned to their home region. Writes go to the regional primary.
Async replicas in other regions for DR and global read traffic.
Consistency: strong within region, eventual across regions.
Cost: cross-region reads from replicas may be seconds behind. Failover to another region has RPO > 0.
When: geographically partitioned users (US users mostly write US data, EU users mostly write EU data).

2. Spanner-style globally consistent.

Shards distributed across regions; each shard's replicas sit in 3+ regions; commit requires a majority.
Consistency: linearizable globally.
Cost: write latency ~100-150 ms (cross-region RTT for quorum). Expensive engineering (TrueTime, GPS clocks).
When: regulatory requirement for strong global consistency (financial exchanges, compliance-heavy). Rare.

3. Multi-leader with conflict resolution.

Every region accepts all writes; async replication between regions; conflicts resolved by LWW, CRDTs, or app-level merge.
Consistency: eventual with convergence.
Cost: conflicts are a real problem; LWW loses data silently, CRDTs limit data types.
When: high write rate with clear convergence semantics (collaborative editing, shopping carts). Cassandra, DynamoDB global tables, Riak.

Per-data-class across regions:

Global balance / inventory: Spanner-style or regional-primary with cross-region confirmation.
User profile: regional primary, eventual cross-region.
Analytics: eventual everywhere, async streaming to a central warehouse.
Session state: regional (no cross-region consistency needed if session is tied to the user's home region).

The honest interview answer: "Global strong consistency is expensive. I'd partition by user region and accept eventual for cross-region reads. For truly global strong data (like unique usernames), I'd either use a dedicated Spanner-backed service or accept a single-region bottleneck for that one check."

“What is read-your-writes consistency and how do you implement it without requiring strong consistency?”

Definition: a client always sees its own writes, even if other clients might not yet. Weaker than linearizable (which requires every client to see the write). Stronger than eventual.

Why it matters: most user-facing "consistency" complaints are really RYW violations. A user saves a setting, reloads, sees the old value — they're confused. But whether another user sees the new setting immediately is irrelevant.

Three implementation strategies:

1. Session affinity (sticky reads).

Route all reads for a user session to the same replica they wrote to.
Cheap, simple, works for session lifetime.
Breaks if the replica fails mid-session (fallback to leader).

2. Read-after-write token / LSN tracking.

Write returns a logical timestamp (Postgres LSN, Cassandra write time, MongoDB's ClusterTime).
Client stores the token in the session.
Subsequent reads pass the token; the replica either:

- Waits until it has applied up to that LSN, OR
- Rejects the read (redirect to another replica or the leader).

More robust than session affinity; handles replica failover gracefully.

3. Leader reads for N seconds post-write.

After a write, route reads to the leader for a bounded time (say 5 seconds — longer than typical replica lag).
After the window, reads go to replicas.
Blunt but works for most apps. Simple to implement if you already track "time since last write" per session.

Cost comparison:

Linearizable reads: quorum read on every request. Full quorum latency for every read.
Read-your-writes: session routing or token check. ~0 extra latency; uses existing replication.

Typical production shape: token-based RYW for the user's own sessions, eventual reads for other users' views of the same data. This is what MongoDB's causal consistency sessions implement; Postgres supports it via pg_current_wal_lsn() + pg_last_wal_replay_lsn() on replicas.

The interview move: when someone says "we need strong consistency," ask "strong for whom?" Usually they want RYW.

“You have a 3-node cluster with W=1, R=1. A user writes to node A, then reads from node B. What do they see?”

Probably not their write. W=1 means only node A has the write committed locally. R=1 means we return the value from whatever node the read hits — which could be B, which hasn't received the replication yet.

W + R = 2, N = 3. W + R ≤ N, so reads are NOT guaranteed to see the latest write. This is the eventual consistency case.
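The stale read is easy to reproduce in memory. The sketch below assumes a toy `StorageNode` class and two helpers, `writeTo` and `readFrom` (all hypothetical names), and models replication lag by only writing to the nodes that acknowledged.

```typescript
interface Versioned {
  value: string;
  version: number;
}

class StorageNode {
  private store = new Map<string, Versioned>();

  put(key: string, v: Versioned): void {
    const current = this.store.get(key);
    if (current === undefined || v.version > current.version) this.store.set(key, v);
  }

  get(key: string): Versioned | undefined {
    return this.store.get(key);
  }
}

// Write to exactly the nodes that acknowledge (the write quorum).
// Replication to the remaining nodes is assumed to lag behind.
function writeTo(nodes: StorageNode[], key: string, v: Versioned): void {
  for (const node of nodes) node.put(key, v);
}

// Read from the given nodes (the read quorum) and keep the newest version seen.
function readFrom(nodes: StorageNode[], key: string): Versioned | undefined {
  let best: Versioned | undefined;
  for (const node of nodes) {
    const got = node.get(key);
    if (got !== undefined && (best === undefined || got.version > best.version)) best = got;
  }
  return best;
}
```

With W=1 on node A and R=1 on node B, `readFrom` returns `undefined`. Reading from all three nodes, or writing to two and reading from two, always finds the write.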

The user experience: they save, they refresh, they see stale data. Confusing and feels like a bug.

Options to fix, in order of cost:

1. Increase R to 3 (read from all, take latest).

Now W + R = 4 > 3. Linearizable reads.
Cost: reads need responses from all nodes; latency = max of N nodes.
Good for small clusters where N=3 is typical.

2. Increase W to 2 (write quorum).

N=3, W=2, R=1. W + R = 3 = N. Still not guaranteed (the inequality must be strict: >, not ≥).
With R=1, the read is fresh only if it happens to hit one of the two nodes that acknowledged the write; if it hits the third node, it is stale.
Usually combine W=2, R=2 (sum = 4 > 3) for safety.

3. Route the read to node A (session affinity / RYW).

Now the read hits the same node as the write; immediate visibility.
Only the writing user sees the write immediately; others still eventual.
Cheapest option for the common case.

4. Use read-after-write tokens.

Write returns the LSN. Subsequent reads pass the LSN. Replica waits to apply or redirects.
Robust to replica failover.

The interview insight: the "correct" answer depends on what guarantee you need. For dashboards: option 3 (RYW). For uniqueness checks: option 1 or 2 (linearizable). For social-media timelines: accept option 0 (eventual) and move on.

“Two users edit the same document field simultaneously in an eventually consistent system. What happens and how do you resolve it?”

What happens by default: both writes are accepted by whatever replica they hit. Eventually they propagate. When the replicas meet, they see two conflicting versions.

Resolution strategies (pick one or combine):

1. Last-writer-wins (LWW) by timestamp.

Each write carries a timestamp; the one with the later time wins.
Simple, but loses data silently. If User A writes at t=10:00:00.500 and User B writes at t=10:00:00.501, A's change vanishes with no warning.
Requires synchronised clocks — cross-region clock skew breaks this.
When: data where "last edit wins" is semantically correct (user profile bio, where both users see "that's my edit now"). Not for collaborative work or counters.

2. CRDTs (Conflict-free Replicated Data Types).

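Data types with mathematical merge rules. Counters (G-Counter, PN-Counter) keep one count per replica and merge by per-replica maximum, so concurrent increments add up. Sets (G-Set, OR-Set) merge by union. Registers either keep the latest value (LWW-Register) or keep all concurrent values (MV-Register).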
No data loss, deterministic merge.
Limitation: works only for supported types. You can't CRDT-merge arbitrary application objects; you must design data structures that decompose into CRDT primitives.
When: counters (likes, views), shopping carts (union of items), collaborative text (Y.js, Automerge).

3. Vector clocks + application-level merge.

Each replica tracks per-node version vectors. Concurrent writes are detected explicitly.
When conflict detected, return both values to the application. The app (or user) resolves.
When: Git-like workflows, version-controlled documents, complex business objects where only a domain expert can merge.

4. Pessimistic locking.

Before writing, acquire a distributed lock on the resource.
Forces serialization; no conflicts possible.
Cost: latency of the lock round-trip, availability risk if the lock service fails.
When: narrow, high-value operations where you can't afford conflict (financial transfers, seat reservations).

5. Operational transformation (OT).

Used by Google Docs, Office Online. Operations are transformed against concurrent operations so they compose correctly.
Complex to implement; being displaced by CRDTs in newer systems.
When: real-time collaborative editing where latency matters.

Interview answer: "It depends on the data. Counters → CRDT (PN-Counter). User profile field → LWW with caveat documented. Collaborative doc content → CRDT (Y.js-style) or OT. Anything financial → pessimistic lock or strong consistency; I wouldn't use eventual for that at all."
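Strategy 3 depends on detecting that two writes were concurrent rather than ordered. A minimal sketch of that comparison, assuming a vector clock is a map from replica id to counter (`VectorClock`, `Ordering`, and `compareClocks` are illustrative names):

```typescript
type VectorClock = Record<string, number>;
type Ordering = "equal" | "before" | "after" | "concurrent";

// Compare two vector clocks entry by entry. A missing entry counts as 0.
function compareClocks(a: VectorClock, b: VectorClock): Ordering {
  let aAhead = false;
  let bAhead = false;
  const ids = Object.keys(a).concat(Object.keys(b));
  for (const id of ids) {
    const av = a[id] ?? 0;
    const bv = b[id] ?? 0;
    if (av > bv) aAhead = true;
    if (bv > av) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent"; // neither saw the other: a conflict
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}
```

Only the `"concurrent"` result needs a merge. `"before"` and `"after"` mean one write already includes the other, so the newer one can simply replace the older.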

“Why would you ever choose eventual over strong? Isn't strong always safer?”

Strong is never "free." It costs latency on every write and scalability on every read.

The concrete tradeoff:

	Strong	Eventual
Write latency	Quorum round-trip (2-100+ ms)	Local node ack (sub-ms)
Write availability	Blocks if quorum unreachable	Accepts writes from any live node
Read throughput	Quorum read rate	Per-replica read rate × replica count
Complexity	Consensus protocol (Raft, Paxos)	Async replication
Cross-region cost	50-200 ms per write	Negligible

For a social media feed viewed 1M times/sec:

Strong: 1M reads/sec ÷ quorum rate. At 10K reads/sec per quorum group, that takes 100 quorum groups, each made of several nodes. Expensive.
Eventual: 1M reads/sec ÷ per-replica rate. At 50K reads/sec per replica: need 20 replicas. Each replica serves independently. Much cheaper, linearly scalable.
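The arithmetic behind those two lines, as a small sketch. The per-unit rates are the illustrative figures from the text, not benchmarks, and `unitsNeeded` is a hypothetical helper name.

```typescript
// Number of serving units needed to absorb a read load, rounded up.
function unitsNeeded(totalReadsPerSec: number, readsPerSecPerUnit: number): number {
  return Math.ceil(totalReadsPerSec / readsPerSecPerUnit);
}

const feedReadsPerSec = 1_000_000;

// Strong: each unit is a quorum group handling 10K quorum reads/sec.
const quorumGroups = unitsNeeded(feedReadsPerSec, 10_000);

// Eventual: each unit is a single replica handling 50K reads/sec.
const replicasNeeded = unitsNeeded(feedReadsPerSec, 50_000);
```

Under these assumptions the strong path needs 100 quorum groups and the eventual path needs 20 replicas. A quorum group is itself several nodes, so the gap in machine count is wider than the 5x ratio suggests.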

When eventual is genuinely safer:

Partition tolerance. Strong requires majority available. During a network partition, a strong system refuses writes (unavailable). An eventual system keeps accepting writes (available). For a content feed, unavailable is worse than stale.
Availability under node failure. Strong needs quorum; lose half the nodes, lose the service. Eventual degrades gracefully.
Geographic distribution. Strong across continents is ~150 ms per write. Users won't wait. Eventual with regional primaries lets you serve from close by.

The test for "is eventual OK?":

1. Is the divergence window shorter than human perception (typically under 1 s)? If yes, eventual is invisible.
2. Does a stale read cause a business bug (double-sell, double-spend, incorrect balance)? If yes, eventual is not safe. Use strong.
3. Is the write rate high enough that strong consistency would create a single-node bottleneck? If yes, eventual is necessary for scale.

The interview mental model: "Strong is the default for data where mistakes cost money. Eventual is the default for data where mistakes are invisible. Pick per class."

Code examples

TypeScript: Read-your-writes via LSN token (MongoDB causal sessions, Postgres LSN style)

// Db, Doc, and session are application types (not shown); db.execute(sql, params)
// is assumed to resolve to { rows: any[] }.

// --- Writer returns the logical timestamp (LSN / clusterTime / etag) ---
async function writeAndGetToken(db: Db, doc: Doc): Promise<string> {
  await db.execute('INSERT INTO items (...) VALUES (...)', doc);
  // Read the LSN after the INSERT has committed (autocommit assumed). An LSN
  // captured inside the INSERT statement can precede the commit record.
  const result = await db.execute('SELECT pg_current_wal_lsn()::text AS lsn');
  return result.rows[0].lsn; // e.g. "0/1A4F3B8"
}

// --- Client stores the token in session cookie / localStorage ---
session.set('min_read_lsn', await writeAndGetToken(db, doc));

// --- Subsequent read makes sure the replica has caught up ---
async function readWithRyw<T>(db: Db, sql: string, lsn: string): Promise<T[]> {
  // On a Postgres read replica:
  //   1. Compare replica's applied LSN with the required LSN.
  //   2. Either wait (bounded) or redirect to leader.
  const deadline = Date.now() + 500; // Option A: poll for up to 500 ms
  for (;;) {
    const status = await db.execute(
      'SELECT pg_last_wal_replay_lsn() >= $1::pg_lsn AS caught_up', [lsn]);
    if (status.rows[0].caught_up) break;
    if (Date.now() >= deadline) {
      // Option B: give up and let the router redirect to the leader.
      throw new Error('replica is behind the required LSN');
    }
    await new Promise((resolve) => setTimeout(resolve, 20));
  }
  const result = await db.execute(sql);
  return result.rows as T[];
}

TypeScript: Quorum read math, Dynamo/Cassandra-style (W + R > N)

// Replica and scheduleReadRepair are application-level helpers (not shown).
async function quorumRead<T>(
  replicas: Replica[], // N replicas
  key: string,
  R: number,           // read quorum size
): Promise<T> {
  // Send the read to all replicas and collect every reply. For simplicity this
  // waits for all N; a production version would return after the first R.
  const results = await Promise.all(
    replicas.map((r) => r.get(key).catch((e) => ({ err: e }))),
  );
  const ok = results.filter((r): r is { value: T; ts: number } => 'value' in r);
  if (ok.length < R) throw new Error('quorum not reached');

  // Latest-wins by timestamp; optionally trigger read-repair on stale replicas.
  const winner = ok.reduce((a, b) => (a.ts > b.ts ? a : b));
  scheduleReadRepair(replicas, key, winner);
  return winner.value;
}

// Guarantee: if W + R > N, any read quorum overlaps with the write quorum
// in at least one replica, so the latest completed write is always in the read set.
// Example: N=3, W=2, R=2 ⇒ W+R=4 > 3 ⇒ every read sees the latest completed write.

TypeScript: CRDT counter (PN-Counter), merge-safe without coordination

// Each replica tracks its own increments and decrements separately.
// Value = sum(increments) - sum(decrements). Merging is a pairwise max.
class PNCounter {
  // replicaId → count contributed by that replica.
  private P = new Map<string, number>(); // positive
  private N = new Map<string, number>(); // negative

  constructor(private readonly me: string) {}

  increment(by = 1) { this.P.set(this.me, (this.P.get(this.me) ?? 0) + by); }
  decrement(by = 1) { this.N.set(this.me, (this.N.get(this.me) ?? 0) + by); }

  value(): number {
    const sum = (m: Map<string, number>) =>
      [...m.values()].reduce((a, b) => a + b, 0);
    return sum(this.P) - sum(this.N);
  }

  // Merge with another replica's state. Idempotent, commutative, associative.
  // Concurrent increments are never lost — unlike LWW on a single integer.
  merge(other: PNCounter) {
    for (const [k, v] of other.P) this.P.set(k, Math.max(this.P.get(k) ?? 0, v));
    for (const [k, v] of other.N) this.N.set(k, Math.max(this.N.get(k) ?? 0, v));
  }
}

Common mistakes

Promising "strong consistency" everywhere

Announcing "we use strong consistency" when most reads don't need it costs latency on every request and forces single-leader shapes that won't scale. Name the data that actually needs it.

Eventual consistency on data that can't be wrong

Account balance, ticket inventory, alias uniqueness — these need stronger guarantees. "It'll converge" is not acceptable when the divergence window is a double-spend.

Confusing "ACID" with "distributed consistency"

A single Postgres instance is ACID-compliant, but that says nothing about what happens with replicas, multi-region, or multiple services. ACID is local; CAP/PACELC is about the distributed system. Don't conflate them.

Saying "we use eventual consistency because it's faster" without naming the risk

Eventual consistency IS faster — but the interviewer wants to hear: "the risk is a stale read during the convergence window, which for this data class is acceptable because [specific business reason]." Naming the risk is the signal.

Not knowing quorum math (Advanced)

If you say "we use quorum" but can't state N, W, R and prove W+R>N, the claim is empty. For N=3: W=2, R=2 → linearizable. W=1, R=1 → eventual. Know the numbers.

Practice drills

Interviewer: "Is your system CP or AP?" How do you answer?

Don't answer with one letter. Say: "It depends on the data class. Inventory is CP — during a partition, we reject writes rather than risk overselling. Product catalog is AP — during a partition, we serve stale data from replicas because a 30-second-old price is better than an error page. The system isn't uniformly CP or AP." Then mention PACELC for extra credit.

You have a 3-node cluster with W=1, R=1. A user writes to node A and immediately reads from node B. Do they see their write?

Not necessarily. W=1 means only node A has the write. R=1 means we read from one node, which might be B, which hasn't replicated yet. W+R=2 which is ≤ N=3, so reads are NOT guaranteed to see the latest write. For read-your-writes, either: (1) set R=3 (read from all, take latest), (2) route reads to the same node they wrote to, or (3) use a read-after-write token.

Two users simultaneously edit the same document field in an eventually consistent system. What happens?

It depends on the conflict-resolution strategy. LWW: whichever write has the later timestamp wins; the other is silently dropped. This is fine for "last edit wins" fields (user profile bio) but terrible for counters (two increments → only one counted). CRDTs: for supported types (counters, sets, registers), merges automatically with no data loss. For arbitrary data: surface the conflict to the application layer, like Git does.

Why would you ever choose eventual consistency over strong? Isn't strong always better?

Strong consistency costs latency and throughput. For N=3 with sync replication: every write is as slow as the slowest quorum member (2-5ms within AZ, 50-200ms cross-region). Read throughput is capped at quorum read rate. Eventual consistency: writes ack from one node (sub-ms), reads from any replica (read capacity grows roughly linearly with replica count). For a social media feed viewed 1M times/sec, the choice between 1ms reads from any replica vs 5ms reads from quorum majority is the difference between 50 servers and 250 servers.

Cheat sheet

• Name the consistency level per data class, not per system.
• PACELC > CAP: Partition → A or C; Else → Latency or Consistency.
• Read-your-writes ≠ strong; it's much cheaper and usually what users actually need.
• Quorum: W+R>N → linearizable reads. Prove it with numbers.
• Eventual + idempotent reconciliation > strong + slow for most read-heavy data.
• Conflict resolution: LWW (lossy but simple) vs CRDTs (safe but limited types) vs app-level merge.
• Strong consistency tax: every write waits for quorum ack. Within AZ: ~2ms. Cross-region: 50-200ms.
• "We use Postgres so it's consistent" conflates ACID (local) with distributed consistency.
