intermediatearchitecture

Async messaging & queues

Decoupling, delivery semantics, ordering, DLQs, Kafka vs SQS vs RMQ.

Queues decouple producers from consumers — but the delivery semantics come with sharp edges. Exactly-once is a lie; at-least-once + idempotent consumers is the truth.

Read this if your last attempt…

You said "we'll put a queue in front of it" without naming the semantics
You assumed exactly-once delivery was a real thing
You haven't thought about what happens to messages that fail N times in a row
You can't name the difference between a queue and a log

The concept

Async messaging is the mechanism that lets you say "the response doesn't have to wait for this". The producer commits the work to a broker and returns to the client; the consumer runs at its own pace.

The three things to name when you propose a queue

Delivery semantics — at-most-once, at-least-once, or "effectively exactly-once". At-least-once is the pragmatic default; it requires idempotent consumers. Exactly-once within a closed system (Kafka transactions, FoundationDB) exists but is expensive and doesn't cross system boundaries.
Ordering guarantees — per-partition / per-key ordering vs global ordering. Global ordering doesn't scale; key-based partitioning preserves order for a user/entity while still scaling horizontally.
Failure handling — visibility timeouts, max-receives, and a Dead Letter Queue (DLQ) for messages that fail N times. Without a DLQ, a poison message blocks the partition forever.

Architecture diagram· Producer → queue → consumer, with DLQ

The queue decouples lifetimes: producer commits the work, consumer processes on its own schedule. DLQ catches what can't be processed after N attempts.

Queue vs log — pick the right shape for the workload.

Queue (SQS, RabbitMQ)	Log (Kafka, Kinesis)
Consumption model	One consumer per message	Many independent consumer groups
Message lifetime	Deleted on ack	Retained for N days regardless
Replay	Not possible — it's gone	Trivial — rewind offset
Ordering	Per-queue or none	Per-partition (key-based)
Best for	Work distribution, jobs	Event streams, CDC, analytics fan-out
Scaling limit	Throughput per queue	Partitions × per-partition throughput

How interviewers grade this

You name delivery semantics explicitly: "at-least-once with idempotent consumers".
You state the ordering guarantee and what partitions it (per-user, per-account, per-aggregate).
You have a DLQ in the picture with a max-receive count.
You distinguish queue (work distribution, one consumer per message) from log (event fan-out, multiple independent readers).
You size the queue: expected depth at peak, retention window, message size, resulting GB.

Variants

Work queue (competing consumers)

Producers enqueue jobs; a pool of workers competes to consume.

Architecture diagram· Queue vs log: competing consumers vs independent consumer groups

Queue (top): one worker per message, message deleted on ack. Log (bottom): every group reads the full stream at its own offset, retained for days.

The canonical async job pattern. Each job is processed exactly once by one worker (assuming ack-on-success). Throughput scales with worker count. Ordering is not preserved — accept that or use a keyed queue.

Pros

+Simple operational model
+Linear horizontal scaling
+Back-pressure is visible (queue depth)

Cons

−No replay
−No fan-out — one consumer per job
−Poison messages need a DLQ

Choose this variant when

Background jobs
Email sending, image processing
Loose-coupling between services

Event log (Kafka-style)

Producers publish; multiple independent consumer groups read the same stream.

Architecture diagram· Partition by key: ordering within a key, parallelism across keys

Messages with the same key hash to the same partition; consumers see per-key ordering. Different keys process in parallel. Global ordering is sacrificed for scale.

The right shape for event-driven architectures. Each event is durably written to a partition and retained for days. Different consumers (search index, analytics, notifications) each maintain their own offset and replay from any point.

Pros

+Replay + time-travel debugging
+Fan-out is free — add a new consumer group
+Per-partition ordering

Cons

−Heavier to operate (broker cluster, ZK/KRaft)
−Per-partition ordering doesn't give global order
−Rebalancing pauses consumers briefly

Choose this variant when

Multiple consumers of the same events
CDC fan-out
Analytics/audit pipeline alongside ops

Pub/sub (SNS, Google Pub/Sub)

Fire-and-forget broadcast; subscribers get a copy each.

Architecture diagram· Three delivery semantics — what each guarantees and what fails

At-most-once drops; at-least-once duplicates; "effectively exactly-once" requires transactions + idempotency. Exactly-once across system boundaries is a lie.

Lightweight fan-out without the retention and operational cost of a log. No replay, but each subscriber (often another queue) gets its own copy. Good for "notify many things when X happens".

Pros

+Dead simple fan-out
+Managed-service friendly
+Decouples publisher count from subscriber count

Cons

−No replay
−Best-effort ordering
−Per-message cost adds up at high fan-out

Choose this variant when

Notifications, webhooks
Low-retention fan-out
Cross-service "something happened" events

Saga (distributed transaction via compensations)

Chain local transactions across services; compensate in reverse on failure.

Architecture diagram· Saga pattern: local transactions chained with compensations

A distributed transaction implemented as a sequence of local txns. If any step fails, the preceding steps run their compensations in reverse order.

When one user action spans multiple services that each own their own data (order service, payment service, inventory service, shipping service), you cannot use a single ACID transaction. The saga pattern replaces the distributed lock with a sequence of local transactions plus compensations that undo prior steps if a later step fails.

Two orchestration styles:

Choreography — each service listens for events and publishes the next event. No central coordinator. Simpler to deploy but harder to reason about the full flow ("which services participate?").
Orchestration — a dedicated orchestrator (state machine, often in something like Temporal / Step Functions / AWS SWF) calls each service in order and triggers compensations on failure. Easier to observe; one place owns the flow.

Key design rules:

1Every forward step has a compensation. "Cancel shipment" undoes "schedule shipment". "Refund" undoes "charge".
2Compensations must be idempotent — they may be retried.
3Not every step can be perfectly compensated (you can refund money but you can't un-send an email). Order steps so the hardest-to-undo step runs last.
4Saga state must be durable — if the orchestrator dies mid-flow, it resumes. Temporal, Step Functions, and DB-backed state machines give you this.

What the interviewer looks for: you distinguish saga from 2PC, you name choreography vs orchestration with a reason, you order steps by reversibility, and you name the durable state store.

Worked example

Scenario: User uploads a video. The system has to (a) transcode, (b) extract thumbnails, (c) run content moderation, (d) notify followers.

Bad shape: upload → sync fan-out to all four services. p95 is now the slowest service — fails if any one is slow or down.

Good shape:

1Upload endpoint writes the raw file to object storage and inserts a row in videos with status=uploaded.
2Uploader publishes one event video.uploaded to a log (Kafka topic, partitioned by user_id).
3Four consumer groups read this topic independently:

- transcoder — reads, transcodes, updates DB, publishes video.transcoded. - thumbnailer — parallel with transcoding. - moderator — parallel; if flagged, publishes video.flagged and sets status. - notifier — listens to video.transcoded + video.cleared (composite gate) and fans out to followers.

Delivery: at-least-once; every consumer is idempotent (keyed by video_id + stage). DLQ per consumer group, max-receives = 5. If a message ends up in DLQ, an alert fires and a human investigates.

Ordering: per-partition (user_id). Two uploads from the same user are processed in order; two users in parallel. Global ordering is not needed here.

Retention: 7 days on the topic — enough to replay a consumer group after a bug fix without losing data.

Good vs bad answer

Interviewer probe

“You want to send emails asynchronously. What does that look like?”

Weak answer

"We put a queue in front of the email service. The producer fires a message, the consumer sends the email. Done."

Strong answer

"Work queue (SQS). Producer inserts a row in an outbox table in the same transaction as the business write, then publishes the message (outbox pattern — guarantees the event is published iff the business change committed). Consumer is idempotent, keyed on (user_id, template_id, send_dedupe_key) — retries won't send duplicates. Visibility timeout: 30s. Max receives: 3. DLQ gets the poison messages. At 1k/s expected, one queue with ~10 workers is plenty. If the provider is down, the queue absorbs up to N minutes of messages; we alert when depth > 10k."

Why it wins: The strong answer names the outbox pattern (how events correctly bridge DB and queue), idempotency key design, operational knobs (visibility, retries, DLQ), sizing, and the alert threshold. The weak answer omits all of it.

Interview playbook2-3 minutes when introducing async flows; ~1 minute per deep-dive whenever delivery semantics, ordering, or failure recovery is probed.

When it comes up

When a slow or unreliable downstream would drag synchronous p99 beyond the budget
When multiple consumers need the same event (search index, notifications, analytics)
When the interviewer asks "what if this service is down?" about a dependency
Any email, notification, video processing, or background-job flow
When CDC, event-driven architecture, or saga comes up in deep-dive

Order of reveal

1
Name queue vs log upfront. "Queue for work distribution — one consumer per message. Log for event fan-out and replay — multiple independent consumer groups. Different shapes for different problems."
2
State delivery semantics explicitly. "At-least-once delivery with idempotent consumers. Exactly-once across network boundaries is a myth; at-least-once + idempotency is the production pattern."
3
Name the partition key for ordering. "Partitioned by user_id — per-user ordering preserved, global ordering sacrificed. Two users can process in parallel; two events for the same user are ordered."
4
Bridge DB and queue with outbox. "Outbox pattern: insert event into an 'outbox' table in the same DB transaction as the business write, then a relay publishes from the outbox. No dual-write race."
5
Specify operational knobs. "Visibility timeout 30s, max-receives 3, then DLQ. Alert when DLQ depth > 0 or queue depth > 10K."
6
Size the queue. "Peak depth × message size × retention = storage. At 1K/sec with 2 KB messages and 7-day retention, that's ~1.2 TB — Kafka SSD territory."

Signature phrases

“Exactly-once is a lie; at-least-once + idempotent is the truth”

“Queue for work, log for events”

“Per-partition ordering, not global”

“Outbox or CDC — never dual-write”

“Every consumer has a DLQ and a page on DLQ depth”

“Async is not free — consumer lag is on the dashboard”

“Exactly-once is a lie; at-least-once + idempotent is the truth” — The single most important semantics fact; candidates who claim exactly-once get flagged.
“Queue for work, log for events” — Distinguishes the two shapes in one phrase.
“Per-partition ordering, not global” — Shows you know how scale and ordering trade.
“Outbox or CDC — never dual-write” — The correct pattern for bridging DB and queue.
“Every consumer has a DLQ and a page on DLQ depth” — Operational maturity in one line.
“Async is not free — consumer lag is on the dashboard” — Pushes back on the naive "async means infinite scale" view.

Likely follow-ups

?“Explain the outbox pattern and why it beats a naive transaction.publish() pattern.”Reveal

The naive dual-write:

text

BEGIN;
INSERT INTO orders ...;   -- DB commit
COMMIT;
kafka.publish(event);      -- Network publish

Three failure modes:

1DB commits, network to Kafka fails → event lost, business state drifts.
2Kafka publish succeeds, DB commit fails → phantom event, consumers act on a non-existent order.
3Process crashes between the two → either of the above, silently.

The outbox fix:

text

BEGIN;
INSERT INTO orders ...;
INSERT INTO outbox (event_type, payload, aggregate_id) VALUES (...);
COMMIT;

A separate relay process reads from outbox (poll or CDC), publishes to Kafka, marks rows as sent.

Why this works:

Atomicity with business change — either both the order AND the outbox row commit, or neither.
At-least-once publish — if the relay crashes mid-publish, it retries. Consumers must be idempotent (keyed by event_id).
No distributed transaction — the relay handles async publish; the DB transaction stays local.

Implementation options:

1Polling relay — SELECT FROM outbox WHERE sent=false. Simple but ~100-500ms lag.
2Debezium / CDC — tail the DB change log directly to Kafka. Near-realtime (~10 ms lag), but adds operational complexity.
3Transactional outbox service — dedicated service per aggregate. Middle ground.

Tradeoff: one extra table, a relay process, ~seconds of publish latency. In exchange you get atomicity between DB and queue that no dual-write scheme provides.

?“A message fails repeatedly and lands in the DLQ. Walk me through the runbook.”Reveal

Step 1 — alert fires on DLQ depth > 0. Page the on-call, not a silent dashboard.

Step 2 — inspect. What kind of failure?

Transient downstream outage (DB was slow, API was rate-limited) → replay after the outage clears.
Poison data (malformed payload, missing required field) → consumer or producer bug.
Schema drift (producer emitted v2 event, consumer only knows v1) → deployment ordering bug.
Business-rule rejection (event references a deleted aggregate) → business logic needs a branch.

Step 3 — fix and replay. Most brokers support moving messages from DLQ back to the main queue (SQS has ReceiveMessage+DeleteMessage, Kafka needs tooling). After the fix ships, replay.

Step 4 — if it was a consumer bug, consider backfill. Messages may have silently succeeded with the wrong result before hitting DLQ. Check your idempotency logs for those IDs; re-emit if needed.

Step 5 — tune. Was max-receives too low (caused false DLQ during transient blip)? Too high (poison messages thrashed for too long before surfacing)? 3-5 is typical; tune per-consumer.

Step 6 — prevent. Add a schema registry (if drift), add a validation step in the producer (if poison data), add a circuit breaker around the downstream (if outage cascades).

Metrics to have: DLQ depth (alert), DLQ enqueue rate (trend), per-consumer failure rate (by error class). Without the last one you can't tell transient from poison.

?“Your partition key is user_id. A celebrity user generates 10× the message rate. What happens and how do you fix it?”Reveal

What happens: that one partition becomes hot. Its consumer is saturated while others idle. Consumer lag on that partition grows — the celebrity's events are seconds or minutes behind while others are real-time. Eventually you hit partition retention limits and lose data, or the consumer OOMs from lag.

Fixes, in order of pain:

1Scale the consumer for the hot partition. Beef up that specific consumer (bigger instance). Buys time but doesn't fix the structural issue. One partition still has one consumer in Kafka.

1Sub-partition the hot key. If per-user ordering isn't required for all event types, emit with key user_id + event_id_hash % 10 to spread across 10 partitions. Fast fix, loses ordering for that user.

1Two-level partitioning. Hash user_id into a wider space first. Celebrity users map to multiple partitions by construction. Ordering is preserved per (user, sub-partition).

1Dedicated consumer group for hot users. Detect hot users, route their events to a separate topic with more partitions and more consumers.

1Fix the producer. Often the hot key is a bug — the producer is retrying, amplifying, or fan-out-on-write-ing unnecessarily. Audit before scaling the consumer.

The ordering tradeoff to name explicitly: per-user ordering matters for some event types (like "user blocked another user, then sent message" — order matters). For others (like "user viewed post X") ordering doesn't. Classify events and sub-partition only the ones that tolerate reordering.

?“Kafka or SQS for this? How do you choose?”Reveal

Rule of thumb: SQS when you want a queue; Kafka when you want a log.

Pick SQS when:

One consumer group; work distribution pattern (jobs, emails, image processing).
Deleting on ack is the natural lifecycle — once done, gone.
You don't need replay.
You want a managed service with near-zero operational overhead.
Throughput is under ~30K msg/sec (SQS standard) or ~3K msg/sec with FIFO.
Message lifetime is minutes to hours, not days.

Pick Kafka when:

Multiple independent consumers of the same events (CDC fan-out, analytics alongside ops, search indexing).
You need replay (bug fix in a consumer, backfill a new consumer).
Throughput is >50K msg/sec.
You want per-partition ordering at scale.
Retention is measured in days or weeks — the stream is the source of truth.
You're willing to operate brokers, ZK/KRaft, schema registry.

Common mistake: using Kafka for a simple work queue. You pay 10× the operational cost for replay and fan-out you don't need. SQS + Lambda is often the right answer.

The other common mistake: using SQS where Kafka's multi-consumer replay actually matters. You end up building "replay from S3" infrastructure that Kafka gives you for free.

Newer middle ground: Google Pub/Sub, AWS Kinesis Data Streams. Managed log-shaped services without full Kafka operational weight.

?“How do you guarantee end-to-end ordering when a message flows through 3 services?”Reveal

You usually don't — and you shouldn't try to. End-to-end ordering across services means a global distributed ordering, which collapses to single-partition throughput.

Better framing: where does ordering actually matter?

Between two events for the same aggregate (same user, same order, same account) — yes, preserve this.
Between events across aggregates — no, these are independent.

How to preserve per-aggregate ordering across services:

1Partition every topic by the same key. If service A emits order.created partitioned by order_id and service B emits order.updated partitioned by order_id, then a single consumer that groups by order_id sees them in order (assuming at-least-once + idempotency).

1Propagate the partition key through the pipeline. Service A consumes from topic X (keyed by user_id), processes, emits to topic Y — also keyed by user_id. The downstream sees per-user ordering maintained.

1Use a sequence number per aggregate. If strict causal ordering is required, include a monotonic sequence per aggregate. Consumers buffer and wait for gaps or detect out-of-order delivery. Expensive; reserve for high-stakes flows (financial ledger, audit).

The trap: "serializer" service. Some teams funnel all events through a single ordering service to guarantee ordering. This destroys scale and creates a SPOF. Unless ordering crosses aggregate boundaries (which is usually a design smell), don't do this.

The right interview answer: "Per-aggregate ordering, not global. Every service partitions by the aggregate id. Global ordering is a different problem and usually indicates the boundary is wrong."

Code examples

typescriptIdempotent consumer — the at-least-once safety net

// At-least-once delivery guarantees you WILL see duplicates.
// The consumer must make processing idempotent by deduping on a
// stable key derived from the event, not just the delivery id.
async function handleOrderEvent(msg: Message) {
  const event = JSON.parse(msg.body) as OrderEvent;
  const idempotencyKey = `${event.order_id}:${event.type}`;

  const inserted = await db.execute(
    `INSERT INTO processed_events (key, received_at)
     VALUES ($1, now())
     ON CONFLICT (key) DO NOTHING
     RETURNING key`,
    [idempotencyKey],
  );

  if (inserted.rows.length === 0) {
    // Duplicate — already processed. Ack and move on.
    await msg.ack();
    return;
  }

  try {
    await applyBusinessLogic(event);   // safe to run exactly once now
    await msg.ack();
  } catch (err) {
    // Rollback dedup row so a retry can process it.
    await db.execute('DELETE FROM processed_events WHERE key = $1', [idempotencyKey]);
    throw err;                         // broker will redeliver
  }
}

typescriptOutbox pattern — publish iff the DB commit succeeds

// Problem: publishing to Kafka after the business commit can fail
// (process crashes in between), leading to lost events. Publishing
// BEFORE the commit can lead to ghost events (committed to Kafka,
// rolled back in DB). Neither is acceptable.
//
// Solution: write the event to an `outbox` table in the SAME
// transaction as the business write. A separate relay polls the
// outbox and publishes. At-least-once by construction.

async function createOrder(cmd: CreateOrderCmd) {
  await db.transaction(async (tx) => {
    const order = await tx.insert('orders', { ...cmd, status: 'pending' });
    await tx.insert('outbox', {
      aggregate_id: order.id,
      topic: 'orders.created',
      payload: JSON.stringify(order),
      created_at: new Date(),
    });
  });
  // commit -> both order and outbox row are durable
}

// Relay (separate process)
setInterval(async () => {
  const batch = await db.query(
    'SELECT * FROM outbox WHERE published_at IS NULL ORDER BY id LIMIT 100 FOR UPDATE SKIP LOCKED');
  for (const row of batch) {
    await kafka.send({ topic: row.topic, key: row.aggregate_id, value: row.payload });
    await db.query('UPDATE outbox SET published_at = now() WHERE id = $1', [row.id]);
  }
}, 100);

typescriptExponential backoff with DLQ after N failures

const MAX_RECEIVES = 5;

async function processWithBackoff(msg: Message) {
  const receives = msg.attributes.ApproximateReceiveCount ?? 1;
  try {
    await processMessage(msg);
    await msg.ack();
  } catch (err) {
    if (receives >= MAX_RECEIVES) {
      // Poison pill — stop retrying, send to DLQ for human triage.
      await dlq.send({ body: msg.body, error: String(err), receives });
      await msg.ack();
      return;
    }
    // Requeue with exponential visibility timeout: 1s, 2s, 4s, 8s, 16s.
    const visibilityMs = 1000 * Math.pow(2, receives - 1);
    await msg.changeVisibilityTimeout(visibilityMs);
    throw err; // broker will redeliver after the timeout
  }
}

Common mistakes

Assuming exactly-once delivery

Exactly-once across a network is impossible in the general case. Even "Kafka transactions" are exactly-once within Kafka, not end-to-end. Design for at-least-once + idempotent consumers. The candidate who asserts exactly-once is flagged.

No DLQ

A single poison message can block a partition or a consumer forever. Every consumer must have a DLQ and a max-receive count. Messages in the DLQ should page a human, not silently accumulate.

Dual-write (DB + queue without outbox)

Writing to the DB and publishing to a queue in two separate operations creates a race: the DB succeeds, the publish fails, the queue never gets the event. Fix with the outbox pattern (publish via CDC from an outbox table) or a transactional queue (a message-broker-in-your-DB).

No idempotency key

At-least-once means duplicates happen. Consumers must dedupe using a key — either the business identity (user_id + action_id) or a producer-provided correlation id. Without this, retries cause double-sends, double-charges, double-notifications.

Using a global-ordered queue at scaleAdvanced

Global ordering across all messages collapses to a single-partition bottleneck. Pick an order-preserving partition key (user_id, account_id, aggregate_id) and accept that messages across keys are not ordered with respect to each other.

Practice drills

Explain the outbox pattern in one paragraph.Reveal

In the same DB transaction as the business write, insert a row into an outbox table describing the event. A separate process (CDC tail, or a polling worker) reads the outbox and publishes to the broker, marking rows as sent. This makes the DB write and the event publish atomic from the outside world's point of view: you never publish without committing, and you never commit without eventually publishing. The cost is seconds of publish latency and one extra table — cheap compared to the correctness it buys.

Interviewer: "a message ends up in the DLQ. What's your runbook?"Reveal

Step 1: inspect — is it a poison-data bug (bad payload) or a transient downstream failure that outlasted the retry window? Step 2: if transient, replay the message back onto the main queue (most brokers support this directly). Step 3: if poison, fix the consumer or sanitise the payload, then replay. Step 4: alert tuning — DLQ depth > 0 should page; DLQ growth rate > X should escalate.

Your partition key is user_id. A celebrity user triggers 10× the rate of everyone else. What happens and how do you fix it?Reveal

One partition becomes hot: its consumer is pegged while others idle. Lag on that partition grows; the celebrity's messages are seconds behind. Fixes: (1) sub-partition within the hot key (user_id + sequence%N, if order-within-user is not needed); (2) use two-level keying (user_id hashed into a wider space); (3) dedicated consumer for the hot partition with more resources. The "right" fix depends on whether ordering within that user matters.

Cheat sheet

•Queue = one consumer per message (work). Log = many independent consumers + replay (events).
•Default semantics: at-least-once + idempotent consumer. Never "exactly-once".
•Always: visibility timeout + max-receives + DLQ + alert on DLQ depth.
•Ordering: per-partition (key-based), not global. Name the partition key.
•DB + queue together: outbox pattern. Never dual-write.
•Size the queue: peak depth × message size × retention = storage.
•Latency budget: async ≠ free; p99 consumer lag should be on your dashboard.

Practice this skill

These problems exercise Async messaging & queues. Try one now to apply what you just learned.

chat system notification service

7% complete

Current

Read this if

Step 1 of 14

The concept

Jump to next

Queue (SQS, RabbitMQ)

Log (Kafka, Kinesis)

Consumption model

One consumer per message

Many independent consumer groups

Message lifetime

Deleted on ack

Retained for N days regardless

Replay

Not possible — it's gone

Trivial — rewind offset

Ordering

Per-queue or none

Per-partition (key-based)

Best for

Work distribution, jobs

Event streams, CDC, analytics fan-out

Scaling limit

Throughput per queue

Partitions × per-partition throughput

Saga (distributed transaction via compensations)

Chain local transactions across services; compensate in reverse on failure.

Architecture diagram· Saga pattern: local transactions chained with compensations

A distributed transaction implemented as a sequence of local txns. If any step fails, the preceding steps run their compensations in reverse order.

Two orchestration styles:

Choreography — each service listens for events and publishes the next event. No central coordinator. Simpler to deploy but harder to reason about the full flow ("which services participate?").
Orchestration — a dedicated orchestrator (state machine, often in something like Temporal / Step Functions / AWS SWF) calls each service in order and triggers compensations on failure. Easier to observe; one place owns the flow.

Key design rules:

1Every forward step has a compensation. "Cancel shipment" undoes "schedule shipment". "Refund" undoes "charge".
2Compensations must be idempotent — they may be retried.
3Not every step can be perfectly compensated (you can refund money but you can't un-send an email). Order steps so the hardest-to-undo step runs last.
4Saga state must be durable — if the orchestrator dies mid-flow, it resumes. Temporal, Step Functions, and DB-backed state machines give you this.

What the interviewer looks for: you distinguish saga from 2PC, you name choreography vs orchestration with a reason, you order steps by reversibility, and you name the durable state store.

// At-least-once delivery guarantees you WILL see duplicates. // The consumer must make processing idempotent by deduping on a // stable key derived from the event, not just the delivery id. async function handleOrderEvent(msg: Message) { const event = JSON.parse(msg.body) as OrderEvent; const idempotencyKey = `${event.order_id}:${event.type}`; const inserted = await db.execute( `INSERT INTO processed_events (key, received_at) VALUES ($1, now()) ON CONFLICT (key) DO NOTHING RETURNING key`, [idempotencyKey], ); if (inserted.rows.length === 0) { // Duplicate — already processed. Ack and move on. await msg.ack(); return; } try { await applyBusinessLogic(event); // safe to run exactly once now await msg.ack(); } catch (err) { // Rollback dedup row so a retry can process it. await db.execute('DELETE FROM processed_events WHERE key = $1', [idempotencyKey]); throw err; // broker will redeliver } }

// Problem: publishing to Kafka after the business commit can fail // (process crashes in between), leading to lost events. Publishing // BEFORE the commit can lead to ghost events (committed to Kafka, // rolled back in DB). Neither is acceptable. // // Solution: write the event to an `outbox` table in the SAME // transaction as the business write. A separate relay polls the // outbox and publishes. At-least-once by construction. async function createOrder(cmd: CreateOrderCmd) { await db.transaction(async (tx) => { const order = await tx.insert('orders', { ...cmd, status: 'pending' }); await tx.insert('outbox', { aggregate_id: order.id, topic: 'orders.created', payload: JSON.stringify(order), created_at: new Date(), }); }); // commit -> both order and outbox row are durable } // Relay (separate process) setInterval(async () => { const batch = await db.query( 'SELECT * FROM outbox WHERE published_at IS NULL ORDER BY id LIMIT 100 FOR UPDATE SKIP LOCKED'); for (const row of batch) { await kafka.send({ topic: row.topic, key: row.aggregate_id, value: row.payload }); await db.query('UPDATE outbox SET published_at = now() WHERE id = $1', [row.id]); } }, 100);