API contract design

Resource modelling, idempotency, pagination, error semantics.

The API is the contract every client writes code against. Vague endpoints here metastasize into ambiguity everywhere else in the design. Interviewers use API design to separate candidates who have shipped from candidates who have read blog posts.

Read this if your last attempt…

Your reviewer marked down "idempotency" or "pagination"
You defaulted to REST without asking what the client looks like
You put everything in the query string, including data that belongs in a POST body
You have no error envelope — just "return an error"
You said "version it later" and never named how

The concept

An API is a contract, and contracts have to be explicit. A strong API design is four layered choices.

Pick a style: REST, gRPC, GraphQL, or async (events).
Define a resource model: what the nouns are, what their identifiers look like, how they nest.
Lock the semantics: idempotency, pagination, error envelope, versioning.
Address the non-functional surface: auth, rate limiting, payload limits, deprecation policy.

Architecture diagram· The four API styles and where they fit

REST for public browser-friendly APIs, gRPC for internal mesh, GraphQL for varied client views, async for long-running work.

Picking the API style — the 80% rule.

Style	Sweet spot	Weakness	Auth norm	Caching	Typical interview use
REST/JSON	Public APIs, browsers, broad client support	Overfetching, chatty composite reads	Bearer / API key	HTTP Cache-Control	Default unless prompt says otherwise
gRPC	Internal service mesh, low latency	Browser needs proxy, harder to debug	mTLS + JWT	Client-side	Internal mesh, streaming RPC
GraphQL	Client-controlled aggregation, multi-client	Hard to HTTP-cache; N+1 traps	Bearer	App-layer (Apollo)	Mobile + web sharing backend
Async (queue)	Long-running jobs, fan-out	Needs status/callback contract	Bearer	N/A	Exports, email, notifications
SSE / WebSocket	Real-time server push	Persistent connection scale	Bearer on upgrade	N/A	Live chat, live feeds

Most real systems compose 2–3 styles: REST for public, gRPC internally, async for long work, SSE for push.
Do not introduce a style without naming why — every style adds operational surface.

How interviewers grade this

You pick an API style (REST / gRPC / GraphQL / async) and justify against client type + payload shape.
Resource names are plural nouns with hierarchical IDs. No verbs in paths.
Write endpoints have an idempotency story (Idempotency-Key header, natural idempotent verb, or deduplication).
List endpoints specify pagination strategy (offset, cursor, or token).
Error responses follow a consistent envelope with a stable code field clients can switch on.
Status codes match semantics — 201 for create, 409 for conflict, 422 for validation, 429 for rate limit.
You name a versioning and deprecation policy up front, not "we'll figure it out".

Variants

REST / JSON over HTTP

Nouns-and-verbs design for public APIs and browser clients.

The default style. Resources are nouns in the path; HTTP methods are the verbs.

Conventions that matter:

Plural resources: /users, not /user
Nested collections up to two levels: /users/{id}/orders
Status codes used as contract: 200/201/204 on success, 4xx on client error, 5xx on server error
ETags for optimistic concurrency: If-Match: "etag" on updates
Content-Type: application/json everywhere
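
The ETag convention in the list above can be sketched as a conditional update. This is a minimal in-memory illustration, not a framework API: `Store`, `etag_for`, and the choice of 428 for a missing header are assumptions made for the example. 412 Precondition Failed is the standard status for an If-Match that no longer matches.

```python
import hashlib
import json
from typing import Optional


def etag_for(resource: dict) -> str:
    """Derive an ETag from the resource's canonical JSON form."""
    canonical = json.dumps(resource, sort_keys=True).encode()
    return '"' + hashlib.sha256(canonical).hexdigest()[:16] + '"'


class Store:
    """Hypothetical in-memory resource store keyed by id."""

    def __init__(self) -> None:
        self.items = {}

    def update(self, item_id: str, if_match: Optional[str], new_value: dict) -> tuple[int, dict]:
        """Apply the update only if the caller saw the current version."""
        current = self.items.get(item_id)
        if current is None:
            return 404, {}
        if if_match is None:
            return 428, {}  # assumption: require If-Match on every update
        if if_match != etag_for(current):
            return 412, {}  # someone else changed the resource first
        self.items[item_id] = new_value
        return 200, {"ETag": etag_for(new_value)}
```

A client that reads a resource, keeps its ETag, and sends it back in If-Match gets a 412 if another writer got there first, and can re-read and retry instead of silently overwriting.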

Sweet spot: public APIs consumed by browsers and third parties. Debuggable with curl, cacheable with Cache-Control, and every language has a client.

Weakness: overfetching (you get the whole resource even if you only need one field) and chatty for composite reads (multiple GETs to assemble one view). GraphQL exists because of these two.

Pros

+Widely supported, debuggable, cacheable
+Clear verb-method mapping is self-documenting
+Browser-friendly; works with Cache-Control and CDNs

Cons

−Overfetching — no field selection by default
−Chatty for composite reads
−Versioning drift over time (unless policy is disciplined)

Choose this variant when

Public or semi-public APIs
Diverse third-party client ecosystem
Standard CRUD patterns over well-defined resources

gRPC / Protobuf

Typed, streaming, low-latency RPC for internal service meshes.

gRPC is RPC that takes schema seriously. Protobuf defines messages and services; code generators produce strongly-typed clients in every major language; HTTP/2 multiplexes many calls over one connection.

Why it wins for internal mesh:

Typed contract — breaking changes caught at compile time
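Binary encoding is compact: Protobuf payloads are typically several times smaller than the equivalent JSON, depending on the data
Bi-directional streaming (great for chat, telemetry, live updates)
Deadlines and cancellation propagate across the call chain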

When it hurts:

Browsers need grpc-web (a proxy) — not native
Harder to debug without specialized tooling
Schema evolution discipline required (do not renumber fields)

In an interview, reach for gRPC when the prompt involves a service mesh (dozens of internal services), real-time streaming (stock ticks, live chat), or very low latency (< 10 ms budget).

Pros

+Typed contracts catch breaking changes at compile time
+Efficient binary wire format
+HTTP/2 multiplexing + streaming

Cons

−Not browser-native (needs proxy)
−Harder to debug with off-the-shelf tools
−Schema evolution requires discipline

Choose this variant when

Internal service-to-service APIs
Real-time streaming requirements
Latency budget < 10 ms per hop

GraphQL

Client-controlled field selection and aggregation over a typed schema.

GraphQL lets the client ask for exactly the fields it needs, aggregated across multiple underlying resources, in one round trip.

Why it shines for multi-client backends:

iOS, Android, and web can each ask for a different slice of the same data
No overfetching — the client names the fields
Schema is typed end-to-end
Single endpoint (POST /graphql); the schema evolves by adding fields and deprecating old ones rather than by versioning URLs

The traps:

Caching is hard — HTTP caches key on URL, but GraphQL uses POST with a body. You end up needing an application cache.
N+1 resolver problem — naive server implementations hit the DB once per item in a list. Use DataLoader or similar batching.
Query depth attacks — a hostile client can write a deeply nested query that explodes server work. Limit depth and complexity.
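
A depth limit of the kind mentioned in the last trap can be sketched as follows. The query is represented as plain nested dicts (a hypothetical, already-parsed selection set), not the AST of any real GraphQL library, and the default limit of 5 is an arbitrary assumption.

```python
def selection_depth(selection: dict) -> int:
    """Depth of a selection set given as nested dicts; leaf fields map to {}."""
    if not selection:
        return 0
    return 1 + max(selection_depth(child) for child in selection.values())


def check_depth(selection: dict, max_depth: int = 5) -> None:
    """Reject a query that nests deeper than max_depth before any resolver runs."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
```

Production servers usually combine a depth limit with a complexity score per field, since a shallow query over large lists can be just as expensive as a deep one.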

In an interview, reach for GraphQL when the prompt explicitly mentions multiple client types with different data needs.

Pros

+Client controls field selection; no overfetching
+Single endpoint reduces surface area
+Strong typing end-to-end

Cons

−HTTP caching does not apply; needs app-layer cache
−N+1 resolvers hurt performance without DataLoader
−Query complexity must be bounded against DoS

Choose this variant when

Multiple clients with divergent data needs
Aggregation across several underlying services
Schema evolution and typed clients are important

Async (queue / webhook / event-driven)

Return 202 Accepted and deliver the result later via callback or polling.

For work that cannot complete inside a synchronous request — exports, ML inference, image processing, large fan-outs — accept the request, enqueue it, return a job handle, and let the client poll or receive a webhook.

The contract:

POST /exports → 202 Accepted with { job_id, status: "queued" }
GET /exports/{job_id} → { status: "running"|"done"|"failed", result?: {...} }
Optional webhook callback on completion

What you must specify:

Timeout / TTL for stale jobs
Retry policy (client-driven via poll, or server-driven via DLQ)
Result retention (where does the export live, for how long, under what URL)

The common mistake: using async for things that should be sync (adds latency for no reason) or sync for things that should be async (60-second HTTP requests fail in cruel ways).
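
The submit-then-poll contract above can be sketched with an in-memory registry. `ExportJobs`, its method names, and the `job_not_found` code are hypothetical; a real system would put a queue and a durable store behind the same interface.

```python
import uuid


class ExportJobs:
    """Hypothetical job registry implementing the 202 + poll contract."""

    def __init__(self) -> None:
        self.jobs = {}

    def submit(self, params: dict) -> tuple[int, dict]:
        """POST /exports: record the job and return 202 with a handle."""
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {"status": "queued", "params": params}
        return 202, {"job_id": job_id, "status": "queued"}

    def mark(self, job_id: str, status: str, result=None) -> None:
        """Called by a worker as the job moves to running, done, or failed."""
        self.jobs[job_id]["status"] = status
        if result is not None:
            self.jobs[job_id]["result"] = result

    def get(self, job_id: str) -> tuple[int, dict]:
        """GET /exports/{job_id}: report status, plus the result once done."""
        job = self.jobs.get(job_id)
        if job is None:
            return 404, {"error": {"code": "job_not_found", "message": "Unknown job_id"}}
        body = {"status": job["status"]}
        if job["status"] == "done":
            body["result"] = job["result"]
        return 200, body
```

The sketch leaves out the TTL, retry policy, and result retention that the list above asks you to specify.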

Pros

+Decouples request latency from work duration
+Natural fit for fan-out and bulk operations
+Failures are retryable without client complexity

Cons

−Client needs to handle polling or webhook callbacks
−Status contract must be explicit (job_id, status field, result shape)
−Extra operational complexity (queue, DLQ, workers)

Choose this variant when

Work takes > a few seconds (exports, ML, transcoding)
Fan-out delivery (notifications to many recipients)
Client cannot block on result (mobile apps going to background)

Server-Sent Events / WebSockets / long-poll

Persistent connection for real-time server push.

When the server needs to push updates to the client as they happen — live scores, chat messages, stock ticks — RPC-style request/response is the wrong shape.

Three options:

Server-Sent Events (SSE): one-way server→client over HTTP; simple, auto-reconnecting, firewall-friendly. Great default for push.
WebSockets: full-duplex; use when client also sends frequent messages (chat, collaborative editing).
Long-polling: fallback for networks where SSE/WS do not work; each "poll" request holds open until a message arrives.

What you design:

Connection handshake (auth on the upgrade request)
Heartbeat / keepalive (detect stale connections within 30 s)
Backpressure (how to handle a slow client)
Scale-out (how connections land on a specific server; how to broadcast across servers — Redis pub/sub is the default)
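
For the SSE option, the wire format is simple enough to sketch directly. The helper name `format_sse` and the `keepalive` comment text are assumptions; the framing itself (`id:`, `event:`, `data:` lines ending in a blank line, and comment lines starting with a colon) follows the Server-Sent Events format.

```python
from typing import Optional

# A comment line is ignored by clients but keeps the connection visibly alive.
HEARTBEAT = ": keepalive\n\n"


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Events message."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads need one data: line per line of payload.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    return "\n".join(lines) + "\n\n"
```

Sending the `id` field lets a reconnecting client report the last message it saw via the Last-Event-ID header, so the server can resume from that point.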

Pros

+Real-time push without polling overhead
+Low latency for server-initiated messages
+Natural fit for chat, live feeds, collaboration

Cons

−Persistent connections consume server memory
−Harder to scale horizontally (connection affinity)
−Requires keepalive / reconnect logic on client

Choose this variant when

Server initiates messages (notifications, live scores)
Bidirectional low-latency (chat, collaboration)
Polling would be too frequent or too laggy

Worked example

Scenario: Design the POST endpoint to create a short URL.

Resource shape

Resource: a URL (plural: /urls)
Version: v1 (path-versioned)
ID scheme: short_code (7 chars, base62, server-generated unless client provides custom_alias)

Endpoint definition

POST /v1/urls
Request body: { long_url, custom_alias?, expires_at? }
Headers: Authorization: Bearer <token>, Idempotency-Key: <uuid>
Response 201 Created: { id, short_code, short_url, long_url, created_at, expires_at, owner_id }
Response 409 Conflict: { error: { code: "alias_taken", message, details: { alias } } }
Response 422 Unprocessable: { error: { code: "invalid_url", message, details: { field, reason } } }
Response 429 Too Many Requests: { error: { code: "rate_limited", message } } with Retry-After header

Idempotency contract

Client sends Idempotency-Key header with a UUID they generate
Server stores key → { status, response_body } for 24h in Redis
On retry with same key: return stored response, do not re-create
On retry with same key but different body: return 422 { code: "idempotency_body_mismatch" }
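
The four rules above can be sketched in a few lines. `IdempotencyStore` is a hypothetical in-memory stand-in for the Redis store, and `create` is whatever function actually creates the resource; neither name comes from a real library.

```python
import hashlib
import json
import time


class IdempotencyStore:
    """In-memory stand-in for the key -> response store (Redis in the text)."""

    def __init__(self, ttl_seconds: float = 24 * 3600) -> None:
        self.ttl = ttl_seconds
        self.records = {}  # key -> (stored_at, body_hash, status, response)

    @staticmethod
    def _fingerprint(body: dict) -> str:
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def handle(self, key: str, body: dict, create, now=None) -> tuple[int, dict]:
        """Run create(body) at most once per key within the TTL."""
        now = time.time() if now is None else now
        fingerprint = self._fingerprint(body)
        record = self.records.get(key)
        if record is not None and now - record[0] < self.ttl:
            _, stored_fingerprint, status, response = record
            if stored_fingerprint != fingerprint:
                return 422, {"error": {
                    "code": "idempotency_body_mismatch",
                    "message": "Idempotency-Key already used with a different request",
                }}
            return status, response  # replay the stored response, no second create
        status, response = create(body)
        self.records[key] = (now, fingerprint, status, response)
        return status, response
```

The sketch ignores two requests with the same key arriving concurrently; a real implementation would need an atomic "claim this key" step before doing the work.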

Pagination on the list endpoint

GET /v1/urls?after=<cursor>&limit=50
Response: { data: [...], next_cursor: "...", has_more: true }
Cursor is opaque base64 of (created_at, id) — stable under inserts
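
As a rough illustration of the opaque cursor, the sketch below encodes (created_at, id) as unpadded base64 JSON and pages over an in-memory list, newest first. The function names and the `t` / `id` key names inside the cursor are assumptions; a real endpoint would push the same comparison into an indexed query.

```python
import base64
import json
from typing import Optional


def encode_cursor(created_at: int, item_id: str) -> str:
    """Pack the sort key into an opaque, URL-safe token."""
    raw = json.dumps({"t": created_at, "id": item_id}, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def decode_cursor(cursor: str) -> tuple[int, str]:
    """Reverse encode_cursor, restoring the stripped base64 padding."""
    padded = cursor + "=" * (-len(cursor) % 4)
    obj = json.loads(base64.urlsafe_b64decode(padded))
    return obj["t"], obj["id"]


def list_page(rows: list, after: Optional[str] = None, limit: int = 50) -> dict:
    """Return one page of rows ordered by (created_at, id) descending."""
    ordered = sorted(rows, key=lambda r: (r["created_at"], r["id"]), reverse=True)
    if after is not None:
        boundary = decode_cursor(after)
        # Keep only rows strictly older than the last row the client saw.
        ordered = [r for r in ordered if (r["created_at"], r["id"]) < boundary]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(page[-1]["created_at"], page[-1]["id"]) if has_more else None
    return {"data": page, "next_cursor": next_cursor, "has_more": has_more}
```

Because the cursor names a position in the sort order and not a row count, a row inserted at the head of the list between two requests does not shift the next page.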

Rate limiting

100 creates/min per API key; requests over the limit get HTTP 429 with a Retry-After header (e.g. Retry-After: 45)
Headers on every response: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset

Versioning policy

v1 supported for 12 months after v2 ships
Deprecation header set for the last 6 months: Deprecation: true, Sunset: Sun, 01 Nov 2026 00:00:00 GMT
Clients still on v1 at sunset receive 410 Gone
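
The lifecycle above could be enforced by a small piece of middleware logic along these lines. The concrete dates are assumptions chosen to match the example Sunset value (sunset on 1 November 2026, deprecation window opening six months earlier), and the literal `true` for Deprecation follows the text; check the current Deprecation header specification for the exact value syntax before relying on it.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Assumed dates for illustration only.
SUNSET = datetime(2026, 11, 1, tzinfo=timezone.utc)
DEPRECATION_START = datetime(2026, 5, 1, tzinfo=timezone.utc)


def v1_lifecycle(now: datetime) -> tuple[bool, dict]:
    """Return (gone, extra_headers) for a v1 request arriving at `now` (UTC)."""
    if now >= SUNSET:
        return True, {}  # caller responds 410 Gone
    headers = {}
    if now >= DEPRECATION_START:
        headers["Deprecation"] = "true"
        # HTTP-date format, e.g. "Sun, 01 Nov 2026 00:00:00 GMT"
        headers["Sunset"] = format_datetime(SUNSET, usegmt=True)
    return False, headers
```

Generating the Sunset value from a datetime rather than typing it by hand avoids mismatched weekday names in the header.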

Auth

API keys for server-to-server; rotate every 90 days; revoke immediately on compromise
OAuth2 bearer tokens for user-facing clients

That is a complete POST /urls design in under three minutes. Every axis a senior API reviewer probes is covered.

Good vs bad answer

Interviewer probe

“Walk me through the POST endpoint to create a short URL.”

Weak answer

"POST /createUrl with { long_url: '...' } in the body, returns { short_url: '...' }."

Strong answer

"POST /v1/urls — plural resource, versioned path. Body is { long_url, custom_alias?, expires_at? }. Returns 201 Created with the full resource { id, short_code, short_url, long_url, created_at, expires_at, owner_id }.

Idempotent via an Idempotency-Key header — we store key→response in Redis for 24h, so retries return the original resource. On custom alias conflict we return 409 Conflict with { error: { code: 'alias_taken', details: { alias } } }. Validation failures return 422 with structured details.

Rate limited per API key at 100/min; 429 with Retry-After. Versioning: path-based, 12-month support window after v2 ships, Deprecation/Sunset headers for the last 6 months. Auth: Bearer token for users, API key for servers."

Why it wins: Versioning, correct status codes, full resource returned, idempotency with storage contract, structured error envelope, rate limiting with headers, deprecation policy — every axis a senior API reviewer probes.

Interview playbook · 5–10 min, typically in the deep-dive phase

When it comes up

Immediately after HLD, when the interviewer says "walk me through the API"
When the prompt explicitly asks for a service contract
Any deep-dive on a write path (idempotency becomes the focus)
Any deep-dive on a list endpoint (pagination becomes the focus)
When the interviewer probes on "what if the client retries?"

Order of reveal

1. Pick the style. "REST for public / gRPC for internal / GraphQL for multi-client / async for long-running. I'll use X because Y."
2. Name the resources. "Resources are: users, orders, items. Plural nouns, IDs are UUIDv7 (opaque and time-ordered)."
3. Define the endpoint shape. "POST /v1/<resource>, request body is <fields>, response is 201 with the full resource."
4. Declare idempotency. "Idempotency-Key header; server stores key→response for 24h in Redis; retries return the cached response."
5. Declare pagination. "Cursor-based: ?after=<cursor>&limit=50; cursor is opaque base64 of the sort key. No offset pagination."
6. Error envelope + status codes. "All errors return { error: { code, message, details? } }. 201 on create, 409 on conflict, 422 on validation, 429 on rate limit."
7. Versioning + deprecation. "Path versioning; v1 supported for 12 months post-v2, Deprecation/Sunset headers for the last 6."
8. Auth + rate limiting. "Bearer tokens for users, API keys for servers. Rate limits on identity not IP. 429 with Retry-After."

Signature phrases

“Nouns in paths, methods are the verbs” — Distills REST design into one correction-proof rule.
“Idempotency-Key header, stored for 24 hours” — Names Stripe's convention the interviewer is thinking of.
“Cursor pagination, not offset” — Shows you have dealt with production APIs at scale.
“Stable error envelope with a code field” — Signals you have been on the client side of bad APIs.
“Path versioning with a 12-month deprecation window” — Demonstrates API lifecycle maturity.
“Rate limit on identity, not IP” — Shows you know shared-NAT gotchas.

Likely follow-ups

“Why not offset pagination?”

Two reasons:

1. Inserts between pages cause duplicates or skipped items. A user paging through a list that grows while they scroll sees items repeat or go missing.
2. Deep offsets are O(offset) on the DB: OFFSET 100000 LIMIT 20 makes the DB scan and discard 100K rows. Query time grows linearly with offset.

Cursor pagination has neither problem. The cursor encodes "where you were" (usually a base64 of the last item's (sort_key, id)) and the DB seeks directly via an index lookup: WHERE (created_at, id) < (cursor_time, cursor_id) ORDER BY created_at DESC, id DESC LIMIT 20. With an index on (created_at, id), the cost is the same for every page, however deep.

Only use offset for bounded admin UIs where the dataset is small.

“The client sends the same Idempotency-Key with a different request body. What do you return?”

422 Unprocessable Entity with { error: { code: "idempotency_body_mismatch", message: "Idempotency-Key already used with a different request" } }.

Stripe likewise rejects a reused key whose parameters differ, and it is the correct defensive stance. Returning the cached response would be wrong (it does not match what the client is now asking for); creating a new resource would violate idempotency. The server has to reject.

The implementation: when storing the idempotency record, also store a hash of the request body. On retry, compare hashes before returning the cached response.

“How do you version the API without breaking existing clients?”

Path-versioning with a strict lifecycle policy:

1. Publish v2 alongside v1; both serve traffic.
2. Set `Deprecation: true` and `Sunset: <date>` headers on v1 responses starting 6 months before sunset.
3. Publish a migration guide that maps v1→v2 changes.
4. At sunset, v1 returns 410 Gone with a pointer to the migration guide.
5. Minimum support window: 12 months after v2 ships.

For smaller changes (adding a field), use additive evolution — never remove or rename a field in the same major version; just add new fields and document them. Clients ignoring unknown fields is a standard convention.

“What happens when the rate limit is hit?”

429 Too Many Requests with three things:

Retry-After header: seconds the client should wait (or an HTTP date)
Body: { error: { code: "rate_limited", message: "Request rate exceeded. Retry after 45 seconds." } }
X-RateLimit headers on every response (success and failure): X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset

The Retry-After header lets polite clients back off automatically. The X-RateLimit headers let clients self-throttle (slow down when remaining is low) without hitting 429 at all. Public APIs such as GitHub's and Twitter's expose limit, remaining, and reset headers in this style.

Under the hood I use token-bucket per identity (API key or user ID), not IP — shared NAT makes IP limiting punish innocent users.
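
A token bucket keyed by identity might look like the sketch below. `TokenBucket` and `RateLimiter` are hypothetical names, the 100-per-60-seconds default mirrors the worked example, and X-RateLimit-Reset is left out to keep the block short.

```python
import math
import time
from typing import Optional


class TokenBucket:
    """Bucket that refills continuously at capacity / per_seconds tokens per second."""

    def __init__(self, capacity: int, per_seconds: float, now: float) -> None:
        self.capacity = capacity
        self.per_seconds = per_seconds
        self.tokens = float(capacity)
        self.updated = now

    def take(self, now: float) -> tuple[bool, dict]:
        """Try to spend one token; return (allowed, response headers)."""
        elapsed = max(0.0, now - self.updated)
        refill = elapsed * self.capacity / self.per_seconds
        self.tokens = min(float(self.capacity), self.tokens + refill)
        self.updated = now
        allowed = self.tokens >= 1.0
        if allowed:
            self.tokens -= 1.0
        headers = {
            "X-RateLimit-Limit": str(self.capacity),
            "X-RateLimit-Remaining": str(int(self.tokens)),
        }
        if not allowed:
            # Seconds until one whole token has been refilled.
            wait = (1.0 - self.tokens) * self.per_seconds / self.capacity
            headers["Retry-After"] = str(math.ceil(wait))
        return allowed, headers


class RateLimiter:
    """One bucket per identity (API key or user ID), never per IP."""

    def __init__(self, capacity: int = 100, per_seconds: float = 60.0) -> None:
        self.capacity = capacity
        self.per_seconds = per_seconds
        self.buckets = {}

    def check(self, identity: str, now: Optional[float] = None) -> tuple[int, dict]:
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(identity)
        if bucket is None:
            bucket = self.buckets[identity] = TokenBucket(self.capacity, self.per_seconds, now)
        allowed, headers = bucket.take(now)
        return (200 if allowed else 429), headers
```

In a multi-instance deployment the bucket state would live in a shared store so that every server sees the same count for a given identity.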

Code examples

http · Idempotency via client-supplied key (Stripe convention)

POST /v1/charges HTTP/1.1
Idempotency-Key: a1b2c3-d4e5-f6-7890
Content-Type: application/json
Authorization: Bearer sk_test_...

{
  "amount": 2000,
  "currency": "usd",
  "source": "tok_visa"
}

# On first call: creates charge, stores key → response
# On retry: returns the stored response, does NOT create a second charge
# On retry with different body: 422 { code: "idempotency_body_mismatch" }

http · Cursor-based pagination (stable under inserts)

GET /v1/urls?after=eyJ0IjoxNjg5MDAwMDAwLCJpZCI6Ijg3NjUifQ&limit=50

HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1715000000

{
  "data": [ /* ... 50 items ... */ ],
  "next_cursor": "eyJ0IjoxNjg5MDAwMDQyLCJpZCI6IjkwMDAifQ",
  "has_more": true
}

json · Stable error envelope

HTTP/1.1 409 Conflict
Content-Type: application/json

{
  "error": {
    "code": "alias_taken",
    "message": "The alias 'cool-link' is already in use.",
    "details": {
      "alias": "cool-link",
      "conflicting_id": "url_abc123"
    },
    "request_id": "req_01HF9ZPXYZ"
  }
}

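protobuf · gRPC service definition (typed, versionable)

syntax = "proto3";
package urls.v1;

import "google/protobuf/timestamp.proto";

service UrlService {
  rpc CreateUrl(CreateUrlRequest) returns (Url);
  rpc GetUrl(GetUrlRequest) returns (Url);
  rpc ListUrls(ListUrlsRequest) returns (ListUrlsResponse);
}

message CreateUrlRequest {
  string long_url = 1;
  optional string custom_alias = 2;
  optional google.protobuf.Timestamp expires_at = 3;
}

// The three messages below are illustrative; their fields mirror the REST examples.
message GetUrlRequest {
  string short_code = 1;
}

message ListUrlsRequest {
  string after = 1;  // opaque cursor
  int32 limit = 2;
}

message ListUrlsResponse {
  repeated Url data = 1;
  string next_cursor = 2;
  bool has_more = 3;
}

message Url {
  string id = 1;
  string short_code = 2;
  string long_url = 3;
  google.protobuf.Timestamp created_at = 4;
  optional google.protobuf.Timestamp expires_at = 5;
  string owner_id = 6;
}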

Common mistakes

Verbs in paths

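POST /createUser is the tell that you've shipped RPC, not REST. Resources are nouns. The verb is the HTTP method. gRPC is the deliberate exception: method names such as CreateUser are verb-plus-noun by convention, but they should still map onto a small set of standard operations (Create, Get, List, Update, Delete) over well-defined resources.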

No idempotency on writes

Every network client retries. Without an idempotency contract, retries become duplicates. Either use natural idempotent verbs (PUT with client-supplied id), or require an Idempotency-Key header and dedupe server-side. Payments systems have been taken down by missing this.

Offset pagination on a mutable list (Advanced)

Offset pagination breaks when inserts happen between pages — users see duplicates or miss items. Also O(offset) on the DB, which gets painful past offset 10k. Use cursor/token pagination. Reserve offset for admin UIs where the dataset is small and bounded.

Ad-hoc error shapes

Two endpoints return errors as { error: "..." } and { message: "..." }. Clients now have to parse both. Lock a single envelope: { error: { code, message, details? } }. code must be a stable enum clients can switch on.
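
One way to keep the envelope from drifting is to route every error through a single helper. The sketch below is an assumption about how that might look: `error_response` and `STATUS_BY_CODE` are hypothetical names, and the code-to-status table simply reuses the codes from the worked example.

```python
from typing import Optional

# Stable, documented error codes and the HTTP status each one maps to.
STATUS_BY_CODE = {
    "alias_taken": 409,
    "invalid_url": 422,
    "idempotency_body_mismatch": 422,
    "rate_limited": 429,
}


def error_response(code: str, message: str, details: Optional[dict] = None,
                   request_id: Optional[str] = None) -> tuple[int, dict]:
    """Build (status, body) in the single { error: { code, message, details? } } shape."""
    if code not in STATUS_BY_CODE:
        # Unknown codes are a programming error, not something to invent at runtime.
        raise KeyError(f"unregistered error code: {code}")
    error = {"code": code, "message": message}
    if details:
        error["details"] = details
    if request_id:
        error["request_id"] = request_id
    return STATUS_BY_CODE[code], {"error": error}
```

Registering codes in one table also gives you the list to publish in the API documentation.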

Wrong status codes

Returning 200 with { error: "..." } instead of 4xx. Clients now parse every response body to detect errors, and retry logic breaks because they retry 200s. Match the class: 2xx = success, 4xx = client error, 5xx = server error.

No versioning policy (Advanced)

"We'll version it later" means you never will. Put /v1/ in paths from day one, commit to a 12-month support window after /v2/ ships, and use Deprecation/Sunset headers for the last 6 months. The discipline matters more than the mechanism.

Practice drills

A client retries a POST and you created the resource twice. What's your fix?

Server-side dedupe on an idempotency key. Two common shapes:

1. Natural idempotency: the resource has a unique id the client provides (e.g., client_message_id on a chat app). The server rejects the duplicate.
2. Idempotency-Key header: require every POST to include a client-generated UUID; server stores key → response for 24h; on repeat, return the stored response.

Edge case: client retries same key with a different body — return 422 { code: "idempotency_body_mismatch" }. Do not silently return the cached response; do not create a new resource.

Operational: keep the idempotency store small (TTL) and monitor for key-reuse patterns (could indicate a buggy client).

Your GraphQL endpoint is slow because one resolver hits the DB 100 times per request. What's happening?

N+1 resolver problem. The client asked for a list of 100 orders, each with its customer. Your naive resolver fires one query for customer per order. 1 query for the list + 100 queries for the customers = 101 queries.

Fix: DataLoader pattern. Batch and dedupe fetches within a single request. The customer resolver puts customer IDs on a queue; a tick later, DataLoader fires one SELECT * FROM customers WHERE id IN (...) that resolves all 100 in one round trip.

This is a required pattern for GraphQL at production scale. Most GraphQL server ecosystems (Apollo Server in JavaScript, graphql-java, and others) support DataLoader or an equivalent batching layer.
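
The batching idea can be sketched with asyncio. `BatchLoader` is a minimal imitation written for this example, not the API of the actual DataLoader library, and `fetch_customers` stands in for the single `WHERE id IN (...)` query.

```python
import asyncio


class BatchLoader:
    """Collects keys requested in the same event-loop tick into one batch call."""

    def __init__(self, batch_fn) -> None:
        self.batch_fn = batch_fn  # async: list of keys -> dict of key -> value
        self.pending = {}         # key -> Future, deduplicated by key
        self._task = None

    def load(self, key):
        loop = asyncio.get_running_loop()
        if key not in self.pending:
            self.pending[key] = loop.create_future()
        if self._task is None:
            # The task only starts once the current coroutine yields,
            # so every load() made before then joins the same batch.
            self._task = loop.create_task(self._dispatch())
        return self.pending[key]

    async def _dispatch(self) -> None:
        batch, self.pending, self._task = self.pending, {}, None
        results = await self.batch_fn(list(batch))
        for key, future in batch.items():
            future.set_result(results.get(key))


async def demo():
    queries = []

    async def fetch_customers(ids):
        queries.append(ids)  # stands in for one SELECT ... WHERE id IN (...)
        return {i: {"id": i, "name": f"customer-{i}"} for i in ids}

    loader = BatchLoader(fetch_customers)
    orders = [{"id": n, "customer_id": n % 3} for n in range(100)]
    customers = await asyncio.gather(*(loader.load(o["customer_id"]) for o in orders))
    return queries, customers
```

Running `asyncio.run(demo())` resolves all 100 orders with a single call to `fetch_customers`, because the 100 `load()` calls happen before the event loop gets a chance to run the dispatch task.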

Your API has been running for 2 years; a major breaking change is needed. How do you ship it?

Parallel versions with a deprecation schedule.

1. Ship /v2 alongside /v1. Both serve traffic. v2 has the breaking changes; v1 is frozen.
2. Publish migration guide mapping v1 → v2 endpoint-by-endpoint.
3. Instrument v1 usage to know who's still on it. Reach out to the top 10 clients.
4. T-6 months: start setting Deprecation: true and Sunset: <date> headers on v1 responses.
5. T-2 months: email all remaining v1 clients with the cutover date.
6. T=0: v1 returns 410 Gone with a pointer to the migration guide.
7. T+3 months: remove v1 code after usage has been zero for a quarter.

Skipping these steps is how APIs get a reputation for breaking. The discipline matters more than the versioning scheme.

When would you choose gRPC over REST for a new internal service?

When all of the following are true:

You own both sides (no third-party consumers)
Latency budget is tight (< 10 ms per hop) — binary encoding + HTTP/2 multiplexing matters
Strong typing helps — the team values compile-time contract checking over curl-ability
Streaming is needed — bidirectional streams for chat, telemetry, live updates

Do not choose gRPC when:

Browsers are a direct client (needs grpc-web proxy)
The API is public / third-party (REST is expected)
Debugging with curl / postman matters more than wire efficiency
Your team has zero gRPC experience and no plans to invest

Default for internal mesh is often "REST for now, gRPC when latency becomes the bottleneck." There's no shame in starting with REST.

The client says "your API randomly returns 500 for no reason." What do you investigate?

Walk the list:

1. Is the status mapping correct? If the API returns 500 for what's actually a 4xx (validation error), that's the bug. Check that client errors map to 4xx classes.
2. Is there an unhandled exception path? Any unexpected throw should map to 500 but with a structured body. If the body is the default HTML error page, the service is crashing before the middleware catches it.
3. Timeout / retry interaction: if the client times out at 10s and the server at 30s, the client gives up before the server responds. Most libraries raise a timeout error, and a proxy in between may return 504; either tends to get reported as "a 500".
4. Rate limit leaking through as 500: if the rate limiter is misconfigured, it might panic rather than return 429.
5. Downstream dependency: if a dependency is down, requests through your API will 500. Check the dependency's health.

The correct response to the user: "Give me a request_id from a 500 response", then search logs. Stable error envelopes with request_id make this tractable; without them, you are guessing.

Cheat sheet

•Nouns in paths. Methods are the verbs.
•Idempotency-Key header for POSTs, natural idempotency for PUTs.
•Cursor pagination > offset pagination for any large or mutable list; keep offset for small, bounded admin views.
•Stable error envelope: { error: { code, message, details? } }.
•Version in the path (/v1/...) — easy to deprecate, easy to route.
•Return the full resource on create; do not force a follow-up GET.
•Rate limit on identity, not on IP. Return X-RateLimit headers on every response.
•201 on create, 204 on delete, 409 on conflict, 422 on validation, 429 on rate limit.
•Deprecation policy: 12 months support after new version; Deprecation/Sunset headers for last 6.
•Declare limits: max payload size (10 MB, rejected with 413) and max header size (e.g. 8 KB, rejected with 431).

