API contract design
Resource modelling, idempotency, pagination, error semantics.
The API is the contract every client writes code against. Vague endpoints here metastasize into ambiguity everywhere else in the design. Interviewers use API design to separate candidates who have shipped from candidates who have read blog posts.
Read this if your last attempt…
- Your reviewer marked down "idempotency" or "pagination"
- You defaulted to REST without asking what the client looks like
- You put everything in the query string, including data that belongs in a POST body
- You have no error envelope — just "return an error"
- You said "version it later" and never named how
The concept
An API is a contract, and contracts have to be explicit. A strong API design is four layered choices.
- Pick a style: REST, gRPC, GraphQL, or async (events).
- Define a resource model: what the nouns are, what their identifiers look like, how they nest.
- Lock the semantics: idempotency, pagination, error envelope, versioning.
- Address the non-functional surface: auth, rate limiting, payload limits, deprecation policy.
REST for public browser-friendly APIs, gRPC for internal mesh, GraphQL for varied client views, async for long-running work.
Picking the API style — the 80% rule.
| Style | Sweet spot | Weakness | Auth norm | Caching | Typical interview use |
|---|---|---|---|---|---|
| REST/JSON | Public APIs, browsers, broad client support | Overfetching, chatty composite reads | Bearer / API key | HTTP Cache-Control | Default unless prompt says otherwise |
| gRPC | Internal service mesh, low latency | Browser needs proxy, harder to debug | mTLS + JWT | Client-side | Internal mesh, streaming RPC |
| GraphQL | Client-controlled aggregation, multi-client | Hard to HTTP-cache; N+1 traps | Bearer | App-layer (Apollo) | Mobile + web sharing backend |
| Async (queue) | Long-running jobs, fan-out | Needs status/callback contract | Bearer | N/A | Exports, email, notifications |
| SSE / WebSocket | Real-time server push | Persistent connection scale | Bearer on upgrade | N/A | Live chat, live feeds |
- Most real systems compose 2–3 styles: REST for public, gRPC internally, async for long work, SSE for push.
- Do not introduce a style without naming why — every style adds operational surface.
How interviewers grade this
- You pick an API style (REST / gRPC / GraphQL / async) and justify against client type + payload shape.
- Resource names are plural nouns with hierarchical IDs. No verbs in paths.
- Write endpoints have an idempotency story (Idempotency-Key header, natural idempotent verb, or deduplication).
- List endpoints specify pagination strategy (offset, cursor, or token).
- Error responses follow a consistent envelope with a stable code field clients can switch on.
- Status codes match semantics — 201 for create, 409 for conflict, 422 for validation, 429 for rate limit.
- You name a versioning and deprecation policy up front, not "we'll figure it out".
Variants
REST / JSON over HTTP
Nouns-and-verbs design for public APIs and browser clients.
The default style. Resources are nouns in the path; HTTP methods are the verbs.
Conventions that matter:
- Plural resources: /users, not /user
- Nested collections up to two levels: /users/{id}/orders
- Status codes used as contract: 200/201/204 on success, 4xx on client error, 5xx on server error
- ETags for optimistic concurrency: If-Match: "etag" on updates
- Content-Type: application/json everywhere
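The If-Match convention in the list above can be sketched as follows. This is a minimal illustration, not a prescribed implementation: the function names `compute_etag` and `conditional_update` are hypothetical, the ETag is assumed to be a truncated SHA-256 of the canonical JSON, and returning 428 when the header is missing is one possible policy.

```python
import hashlib
import json
from typing import Optional, Tuple


def compute_etag(resource: dict) -> str:
    """Derive an ETag from the canonical JSON form of the resource."""
    canonical = json.dumps(resource, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return f'"{digest}"'


def conditional_update(
    current: dict, if_match: Optional[str], changes: dict
) -> Tuple[int, dict]:
    """Apply `changes` only when the client's If-Match equals the current ETag."""
    if if_match is None:
        # 428 Precondition Required: the client sent no If-Match header.
        return 428, current
    if if_match != compute_etag(current):
        # 412 Precondition Failed: the resource changed since the client read it.
        return 412, current
    return 200, {**current, **changes}
```

A client that read the resource, got its ETag, and then lost a race with another writer receives 412 and has to re-read before retrying, which is the point of optimistic concurrency.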
Sweet spot: public APIs consumed by browsers and third parties. Debuggable with curl, cacheable with Cache-Control, and every language has a client.
Weakness: overfetching (you get the whole resource even if you only need one field) and chatty for composite reads (multiple GETs to assemble one view). GraphQL exists because of these two.
Pros
- Widely supported, debuggable, cacheable
- Clear verb-method mapping is self-documenting
- Browser-friendly; works with Cache-Control and CDNs
Cons
- Overfetching: no field selection by default
- Chatty for composite reads
- Versioning drift over time (unless policy is disciplined)
Choose this variant when
- Public or semi-public APIs
- Diverse third-party client ecosystem
- Standard CRUD patterns over well-defined resources
gRPC / Protobuf
Typed, streaming, low-latency RPC for internal service meshes.
gRPC is RPC that takes schema seriously. Protobuf defines messages and services; code generators produce strongly-typed clients in every major language; HTTP/2 multiplexes many calls over one connection.
Why it wins for internal mesh:
- Typed contract — breaking changes caught at compile time
- Binary encoding is typically several times smaller than the equivalent JSON (the exact ratio depends on the payload)
- Bi-directional streaming (great for chat, telemetry, live updates)
- Deadlines and cancellation propagate down the call chain
When it hurts:
- Browsers need grpc-web (a proxy) — not native
- Harder to debug without specialized tooling
- Schema evolution discipline required (do not renumber fields)
In an interview, reach for gRPC when the prompt involves a service mesh (dozens of internal services), real-time streaming (stock ticks, live chat), or very low latency (< 10 ms budget).
Pros
- Typed contracts catch breaking changes at compile time
- Efficient binary wire format
- HTTP/2 multiplexing + streaming
Cons
- Not browser-native (needs proxy)
- Harder to debug with off-the-shelf tools
- Schema evolution requires discipline
Choose this variant when
- Internal service-to-service APIs
- Real-time streaming requirements
- Latency budget < 10 ms per hop
GraphQL
Client-controlled field selection and aggregation over a typed schema.
GraphQL lets the client ask for exactly the fields it needs, aggregated across multiple underlying resources, in one round trip.
Why it shines for multi-client backends:
- iOS, Android, and web can each ask for a different slice of the same data
- No overfetching — the client names the fields
- Schema is typed end-to-end
- Single endpoint (POST /graphql), which is easier to version
The traps:
- Caching is hard — HTTP caches key on URL, but GraphQL uses POST with a body. You end up needing an application cache.
- N+1 resolver problem — naive server implementations hit the DB once per item in a list. Use DataLoader or similar batching.
- Query depth attacks — a hostile client can write a deeply nested query that explodes server work. Limit depth and complexity.
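The depth limit in the last trap can be sketched without a GraphQL library. The example below assumes the query has already been parsed into a nested dict of selected fields (a hypothetical representation; a real server would walk the parsed query AST instead), and the names `selection_depth` and `enforce_max_depth` are illustrative.

```python
def selection_depth(selection: dict) -> int:
    """Depth of a nested field selection; a leaf field maps to an empty dict."""
    if not selection:
        return 0
    return 1 + max(selection_depth(child) for child in selection.values())


def enforce_max_depth(selection: dict, max_depth: int = 5) -> int:
    """Reject a query before execution if it nests deeper than `max_depth`."""
    depth = selection_depth(selection)
    if depth > max_depth:
        raise ValueError(f"query depth {depth} exceeds limit {max_depth}")
    return depth


# Hypothetical parsed form of: { user { orders { items { name } } } }
query = {"user": {"orders": {"items": {"name": {}}}}}
```

Depth alone does not bound cost (a shallow query can still request huge lists), so a production limit would usually combine it with a complexity score.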
In an interview, reach for GraphQL when the prompt explicitly mentions multiple client types with different data needs.
Pros
- Client controls field selection; no overfetching
- Single endpoint reduces surface area
- Strong typing end-to-end
Cons
- HTTP caching does not apply; needs app-layer cache
- N+1 resolvers hurt performance without DataLoader
- Query complexity must be bounded against DoS
Choose this variant when
- Multiple clients with divergent data needs
- Aggregation across several underlying services
- Schema evolution and typed clients are important
Async (queue / webhook / event-driven)
Return 202 Accepted and deliver the result later via callback or polling.
For work that cannot complete inside a synchronous request — exports, ML inference, image processing, large fan-outs — accept the request, enqueue it, return a job handle, and let the client poll or receive a webhook.
The contract:
- POST /exports → 202 Accepted with { job_id, status: "queued" }
- GET /exports/{job_id} → { status: "queued"|"running"|"done"|"failed", result?: {...} }
- Optional webhook callback on completion
What you must specify:
- Timeout / TTL for stale jobs
- Retry policy (client-driven via poll, or server-driven via DLQ)
- Result retention (where does the export live, for how long, under what URL)
The common mistake: using async for things that should be sync (adds latency for no reason) or sync for things that should be async (60-second HTTP requests fail in cruel ways).
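A minimal in-memory sketch of the job-handle contract, assuming a hypothetical `JobStore` class in place of a real queue and worker pool. The `job_not_found` error code is also an assumption; the status values follow the contract described above.

```python
import uuid
from typing import Any, Dict, Tuple


class JobStore:
    """Tracks async jobs; a worker would call mark() as the job progresses."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}

    def submit(self, payload: dict) -> Tuple[int, dict]:
        """POST /exports: enqueue the work and return a handle immediately."""
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {"status": "queued", "payload": payload, "result": None}
        return 202, {"job_id": job_id, "status": "queued"}

    def mark(self, job_id: str, status: str, result: Any = None) -> None:
        """Called by the worker to move the job to running, done, or failed."""
        if status not in ("running", "done", "failed"):
            raise ValueError(f"unknown status: {status}")
        job = self._jobs[job_id]
        job["status"] = status
        job["result"] = result

    def get(self, job_id: str) -> Tuple[int, dict]:
        """GET /exports/{job_id}: report status, plus the result once done."""
        job = self._jobs.get(job_id)
        if job is None:
            return 404, {"error": {"code": "job_not_found", "message": "Unknown job_id."}}
        body = {"job_id": job_id, "status": job["status"]}
        if job["status"] == "done":
            body["result"] = job["result"]
        return 200, body
```

TTL for stale jobs, retries, and result retention are left out here; they are the parts of the contract the interviewer will ask about next.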
Pros
- Decouples request latency from work duration
- Natural fit for fan-out and bulk operations
- Failures are retryable without client complexity
Cons
- Client needs to handle polling or webhook callbacks
- Status contract must be explicit (job_id, status field, result shape)
- Extra operational complexity (queue, DLQ, workers)
Choose this variant when
- Work takes > a few seconds (exports, ML, transcoding)
- Fan-out delivery (notifications to many recipients)
- Client cannot block on result (mobile apps going to background)
Server-Sent Events / WebSockets / long-poll
Persistent connection for real-time server push.
When the server needs to push updates to the client as they happen — live scores, chat messages, stock ticks — RPC-style request/response is the wrong shape.
Three options:
- Server-Sent Events (SSE): one-way server→client over HTTP; simple, auto-reconnecting, firewall-friendly. Great default for push.
- WebSockets: full-duplex; use when client also sends frequent messages (chat, collaborative editing).
- Long-polling: fallback for networks where SSE/WS do not work; each "poll" request holds open until a message arrives.
What you design:
- Connection handshake (auth on the upgrade request)
- Heartbeat / keepalive (detect stale connections within 30 s)
- Backpressure (how to handle a slow client)
- Scale-out (how connections land on a specific server; how to broadcast across servers — Redis pub/sub is the default)
Pros
- Real-time push without polling overhead
- Low latency for server-initiated messages
- Natural fit for chat, live feeds, collaboration
Cons
- Persistent connections consume server memory
- Harder to scale horizontally (connection affinity)
- Requires keepalive / reconnect logic on client
Choose this variant when
- Server initiates messages (notifications, live scores)
- Bidirectional low-latency (chat, collaboration)
- Polling would be too frequent or too laggy
Worked example
Scenario: Design the POST endpoint to create a short URL.
Resource shape
- Resource: a URL (plural: /urls)
- Version: v1 (path-versioned)
- ID scheme: short_code (7 chars, base62, server-generated unless client provides custom_alias)
Endpoint definition
- POST /v1/urls
- Request body: { long_url, custom_alias?, expires_at? }
- Headers: Authorization: Bearer <token>, Idempotency-Key: <uuid>
- Response 201 Created: { id, short_code, short_url, long_url, created_at, expires_at, owner_id }
- Response 409 Conflict: { error: { code: "alias_taken", message, details: { alias } } }
- Response 422 Unprocessable: { error: { code: "invalid_url", message, details: { field, reason } } }
- Response 429 Too Many Requests: { error: { code: "rate_limited", message } } with Retry-After header
Idempotency contract
- Client sends Idempotency-Key header with a UUID they generate
- Server stores key → { status, response_body } for 24h in Redis
- On retry with same key: return stored response, do not re-create
- On retry with same key but different body: return 422 { code: "idempotency_body_mismatch" }
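The four rules above can be sketched as a small store. This is an illustration under stated assumptions: an in-memory dict stands in for Redis, `IdempotencyStore` and its `handle` method are hypothetical names, the body fingerprint is a SHA-256 of the canonical JSON, and concurrent in-flight requests with the same key are not handled.

```python
import hashlib
import json
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]


class IdempotencyStore:
    def __init__(self, ttl_seconds: float = 24 * 3600, clock: Callable[[], float] = time.time):
        self._ttl = ttl_seconds
        self._clock = clock
        self._records: Dict[str, dict] = {}

    @staticmethod
    def _fingerprint(body: dict) -> str:
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def handle(self, key: str, body: dict, create: Callable[[dict], Response]) -> Response:
        now = self._clock()
        record = self._records.get(key)
        if record is not None and record["expires_at"] <= now:
            record = None  # expired keys behave as if never seen
        fingerprint = self._fingerprint(body)
        if record is not None:
            if record["fingerprint"] != fingerprint:
                return 422, {"error": {
                    "code": "idempotency_body_mismatch",
                    "message": "Idempotency-Key already used with a different request",
                }}
            return record["status"], record["response"]  # replay, do not re-create
        status, response = create(body)
        self._records[key] = {
            "fingerprint": fingerprint,
            "status": status,
            "response": response,
            "expires_at": now + self._ttl,
        }
        return status, response
```

Storing the fingerprint alongside the response is what makes the mismatch check possible; storing only key → response would force the server to replay blindly.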
Pagination on the list endpoint
- GET /v1/urls?after=<cursor>&limit=50
- Response: { data: [...], next_cursor: "...", has_more: true }
- Cursor is opaque base64 of (created_at, id) — stable under inserts
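One way to build such a cursor, assuming it wraps a small JSON object with keys `t` (created_at as a Unix timestamp) and `id`, encoded as unpadded URL-safe base64. The key names and the helper names `encode_cursor` and `decode_cursor` are assumptions for illustration.

```python
import base64
import json
from typing import Tuple


def encode_cursor(created_at: int, item_id: str) -> str:
    """Pack the sort key of the last item on the page into an opaque token."""
    raw = json.dumps({"t": created_at, "id": item_id}, separators=(",", ":"))
    return base64.urlsafe_b64encode(raw.encode("utf-8")).decode("ascii").rstrip("=")


def decode_cursor(cursor: str) -> Tuple[int, str]:
    """Recover (created_at, id); restores the base64 padding stripped on encode."""
    padded = cursor + "=" * (-len(cursor) % 4)
    data = json.loads(base64.urlsafe_b64decode(padded))
    return data["t"], data["id"]
```

Clients should treat the token as opaque. Keeping the encoding server-side means the sort key can change later without breaking anyone.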
Rate limiting
- 100 creates/min per API key; requests over the limit get HTTP 429 with a Retry-After header (for example, Retry-After: 45)
- Headers on every response: X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
Versioning policy
- v1 supported for 12 months after v2 ships
- Deprecation header set for the last 6 months: Deprecation: true, Sunset: Sun, 01 Nov 2026 00:00:00 GMT
- Clients still on v1 at sunset receive 410 Gone
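The lifecycle above reduces to a small decision made on every v1 response. A sketch, assuming a hypothetical helper `v1_lifecycle` that is handed the deprecation start and sunset timestamps as timezone-aware UTC datetimes; `email.utils.format_datetime` from the standard library produces the HTTP date format.

```python
from datetime import datetime
from email.utils import format_datetime
from typing import Dict, Tuple


def v1_lifecycle(
    now: datetime, deprecation_start: datetime, sunset: datetime
) -> Tuple[int, Dict[str, str]]:
    """Return (status, extra headers) for a v1 request received at `now`.

    All three datetimes are assumed to be timezone-aware and in UTC.
    """
    if now >= sunset:
        return 410, {}  # Gone: v1 is past its sunset date
    headers: Dict[str, str] = {}
    if now >= deprecation_start:
        headers["Deprecation"] = "true"
        headers["Sunset"] = format_datetime(sunset, usegmt=True)
    return 200, headers
```

Before the deprecation window the function adds nothing, so v1 responses are unchanged until the announced date.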
Auth
- API keys for server-to-server; rotate every 90 days; revoke immediately on compromise
- OAuth2 bearer tokens for user-facing clients
That is a complete POST /urls design in under three minutes. Every axis a senior API reviewer probes is covered.
Good vs bad answer
Interviewer probe
“Walk me through the POST endpoint to create a short URL.”
Weak answer
"POST /createUrl with { long_url: '...' } in the body, returns { short_url: '...' }."
Strong answer
"POST /v1/urls — plural resource, versioned path. Body is { long_url, custom_alias?, expires_at? }. Returns 201 Created with the full resource { id, short_code, short_url, long_url, created_at, expires_at, owner_id }.
Idempotent via an Idempotency-Key header — we store key→response in Redis for 24h, so retries return the original resource. On custom alias conflict we return 409 Conflict with { error: { code: 'alias_taken', details: { alias } } }. Validation failures return 422 with structured details.
Rate limited per API key at 100/min; 429 with Retry-After. Versioning: path-based, 12-month support window after v2 ships, Deprecation/Sunset headers for the last 6 months. Auth: Bearer token for users, API key for servers."
Why it wins: Versioning, correct status codes, full resource returned, idempotency with storage contract, structured error envelope, rate limiting with headers, deprecation policy — every axis a senior API reviewer probes.
When it comes up
- Immediately after HLD, when the interviewer says "walk me through the API"
- When the prompt explicitly asks for a service contract
- Any deep-dive on a write path (idempotency becomes the focus)
- Any deep-dive on a list endpoint (pagination becomes the focus)
- When the interviewer probes on "what if the client retries?"
Order of reveal
1. Pick the style. "REST for public / gRPC for internal / GraphQL for multi-client / async for long-running. I'll use X because Y."
2. Name the resources. "Resources are: users, orders, items. Plural nouns, IDs are UUIDv7 (time-ordered; clients treat them as opaque)."
3. Define the endpoint shape. "POST /v1/<resource>, request body is <fields>, response is 201 with the full resource."
4. Declare idempotency. "Idempotency-Key header; server stores key→response for 24h in Redis; retries return the cached response."
5. Declare pagination. "Cursor-based: ?after=<cursor>&limit=50; cursor is opaque base64 of the sort key. No offset pagination."
6. Error envelope + status codes. "All errors return { error: { code, message, details? } }. 201 on create, 409 on conflict, 422 on validation, 429 on rate limit."
7. Versioning + deprecation. "Path versioning; v1 supported for 12 months post-v2, Deprecation/Sunset headers for the last 6."
8. Auth + rate limiting. "Bearer tokens for users, API keys for servers. Rate limits on identity not IP. 429 with Retry-After."
Signature phrases
- “Nouns in paths, methods are the verbs” — Distills REST design into one correction-proof rule.
- “Idempotency-Key header, stored for 24 hours” — Names Stripe's convention the interviewer is thinking of.
- “Cursor pagination, not offset” — Shows you have dealt with production APIs at scale.
- “Stable error envelope with a code field” — Signals you have been on the client side of bad APIs.
- “Path versioning with a 12-month deprecation window” — Demonstrates API lifecycle maturity.
- “Rate limit on identity, not IP” — Shows you know shared-NAT gotchas.
Likely follow-ups
“Why not offset pagination?”
Two reasons:
1. Inserts between pages cause duplicates or skipped items. A user paging through a list that grows while they scroll sees items flicker in and out of the feed.
2. Deep offsets are O(offset) on the DB: OFFSET 100000 LIMIT 20 makes the DB scan and discard 100K rows. Query time grows linearly with offset.
Cursor pagination has neither problem. The cursor encodes "where you were" (usually a base64 of the last item's (sort_key, id)) and the DB seeks directly via an index lookup: WHERE (created_at, id) < (cursor_time, cursor_id) ORDER BY created_at DESC, id DESC LIMIT 20. With an index on (created_at, id), the cost of fetching a page stays the same no matter how deep the page is.
Only use offset for bounded admin UIs where the dataset is small.
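A runnable sketch of the keyset query using the standard-library `sqlite3` module. The table `urls(id, created_at)` and the function `fetch_page` are assumptions for illustration, and the tuple comparison is written in its expanded form (`created_at < ? OR (created_at = ? AND id < ?)`) so it does not depend on row-value support in the database.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[int, int]  # (created_at, id)


def fetch_page(
    conn: sqlite3.Connection, after: Optional[Row] = None, limit: int = 20
) -> Tuple[List[Row], Optional[Row]]:
    """Return one page, newest first, plus the cursor for the next page (or None)."""
    if after is None:
        sql = "SELECT created_at, id FROM urls ORDER BY created_at DESC, id DESC LIMIT ?"
        params: tuple = (limit + 1,)
    else:
        sql = (
            "SELECT created_at, id FROM urls "
            "WHERE created_at < ? OR (created_at = ? AND id < ?) "
            "ORDER BY created_at DESC, id DESC LIMIT ?"
        )
        params = (after[0], after[0], after[1], limit + 1)
    # Fetch one extra row to learn whether another page exists.
    rows = conn.execute(sql, params).fetchall()
    page = rows[:limit]
    next_cursor = page[-1] if len(rows) > limit else None
    return page, next_cursor
```

Because each page starts strictly after the last row the client saw, a row inserted at the head of the list between requests cannot shift later pages, which is the duplicate/skip problem offset pagination has.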
“The client sends the same Idempotency-Key with a different request body. What do you return?”
422 Unprocessable Entity with { error: { code: "idempotency_body_mismatch", message: "Idempotency-Key already used with a different request" } }.
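Rejecting the request is the correct defensive stance, and it is how Stripe's API treats a reused key with different parameters; the 422 status for this case follows the IETF Idempotency-Key header draft. Returning the cached response would be wrong (it does not match what the client is now asking for); creating a new resource would violate idempotency. The server has to reject.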
The implementation: when storing the idempotency record, also store a hash of the request body. On retry, compare hashes before returning the cached response.
“How do you version the API without breaking existing clients?”
Path-versioning with a strict lifecycle policy:
1. Publish v2 alongside v1; both serve traffic.
2. Set `Deprecation: true` and `Sunset: <date>` headers on v1 responses starting 6 months before sunset.
3. Publish a migration guide that maps v1→v2 changes.
4. At sunset, v1 returns 410 Gone with a pointer to the migration guide.
5. Minimum support window: 12 months after v2 ships.
For smaller changes (adding a field), use additive evolution — never remove or rename a field in the same major version; just add new fields and document them. Clients ignoring unknown fields is a standard convention.
“What happens when the rate limit is hit?”
429 Too Many Requests with three things:
- Retry-After header: seconds the client should wait (or an HTTP date)
- Body: { error: { code: "rate_limited", message: "Request rate exceeded. Retry after 45 seconds." } }
- X-RateLimit headers on every response (success and failure): X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset
The Retry-After header lets polite clients back off automatically. The X-RateLimit headers let clients self-throttle (slow down when remaining is low) without hitting 429 at all. Large public APIs such as GitHub's and Twitter's expose rate-limit headers of this kind, though the exact header names vary by provider.
Under the hood I use token-bucket per identity (API key or user ID), not IP — shared NAT makes IP limiting punish innocent users.
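A compact token-bucket sketch keyed by identity. The class names `TokenBucket` and `RateLimiter`, the injectable clock, and the in-process dict of buckets are assumptions for illustration; a multi-node deployment would keep bucket state in a shared store.

```python
import math
import time
from typing import Callable, Dict, Tuple


class TokenBucket:
    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def try_acquire(self) -> Tuple[bool, int]:
        """Return (allowed, retry_after_seconds)."""
        now = self._clock()
        elapsed = now - self._last
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_second)
        self._last = now
        if self._tokens >= 1:
            self._tokens -= 1
            return True, 0
        # Seconds until one whole token is available, rounded up for the header.
        return False, math.ceil((1 - self._tokens) / self.refill_per_second)


class RateLimiter:
    """One bucket per identity (API key or user ID), never per IP."""

    def __init__(self, capacity: int, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self._args = (capacity, refill_per_second, clock)
        self._buckets: Dict[str, TokenBucket] = {}

    def check(self, identity: str) -> Tuple[int, Dict[str, str]]:
        bucket = self._buckets.setdefault(identity, TokenBucket(*self._args))
        allowed, retry_after = bucket.try_acquire()
        if allowed:
            return 200, {}
        return 429, {"Retry-After": str(retry_after)}
```

For a limit of 100 requests per minute, the bucket would be built with `capacity=100` and `refill_per_second=100 / 60`.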
Code examples
POST /v1/charges HTTP/1.1
Idempotency-Key: a1b2c3-d4e5-f6-7890
Content-Type: application/json
Authorization: Bearer sk_test_...

{
  "amount": 2000,
  "currency": "usd",
  "source": "tok_visa"
}

# On first call: creates charge, stores key → response
# On retry: returns the stored response, does NOT create a second charge
# On retry with different body: 422 { code: "idempotency_body_mismatch" }

GET /v1/urls?after=eyJ0IjoxNjg5MDAwMDAwLCJpZCI6Ijg3NjUifQ&limit=50
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 847
X-RateLimit-Reset: 1715000000

{
  "data": [ /* ... 50 items ... */ ],
  "next_cursor": "eyJ0IjoxNjg5MDAwMDQyLCJpZCI6IjkwMDAifQ",
  "has_more": true
}

HTTP/1.1 409 Conflict
Content-Type: application/json

{
  "error": {
    "code": "alias_taken",
    "message": "The alias 'cool-link' is already in use.",
    "details": {
      "alias": "cool-link",
      "conflicting_id": "url_abc123"
    },
    "request_id": "req_01HF9ZPXYZ"
  }
}

syntax = "proto3";
package urls.v1;

// Required for the google.protobuf.Timestamp fields below.
import "google/protobuf/timestamp.proto";

service UrlService {
  rpc CreateUrl(CreateUrlRequest) returns (Url);
  rpc GetUrl(GetUrlRequest) returns (Url);
  rpc ListUrls(ListUrlsRequest) returns (ListUrlsResponse);
}

// GetUrlRequest, ListUrlsRequest, and ListUrlsResponse are not shown here.

message CreateUrlRequest {
  string long_url = 1;
  optional string custom_alias = 2;
  optional google.protobuf.Timestamp expires_at = 3;
}

message Url {
  string id = 1;
  string short_code = 2;
  string long_url = 3;
  google.protobuf.Timestamp created_at = 4;
  optional google.protobuf.Timestamp expires_at = 5;
  string owner_id = 6;
}

Common mistakes
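POST /createUser is the tell that you've shipped RPC, not REST. Resources are nouns. The verb is the HTTP method. gRPC is the one place where verb-first names belong: methods are named like CreateUser, but they should still map onto a small, standard set of operations per resource rather than a growing list of one-off verbs.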
Every network client retries. Without an idempotency contract, retries become duplicates. Either use natural idempotent verbs (PUT with client-supplied id), or require an Idempotency-Key header and dedupe server-side. In a payment system, a missing idempotency contract means double charges.
Offset pagination breaks when inserts happen between pages — users see duplicates or miss items. Also O(offset) on the DB, which gets painful past offset 10k. Use cursor/token pagination. Reserve offset for admin UIs where the dataset is small and bounded.
Two endpoints return errors as { error: "..." } and { message: "..." }. Clients now have to parse both. Lock a single envelope: { error: { code, message, details? } }. code must be a stable enum clients can switch on.
Returning 200 with { error: "..." } instead of 4xx. Clients now parse every response body to detect errors, and retry logic breaks because status-based retry policies see the 200 as a success. Match the class: 2xx = success, 4xx = client error, 5xx = server error.
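A single helper that every endpoint calls is one way to keep the envelope and the status class consistent. The sketch below assumes a hypothetical `ErrorCode` enum and `error_response` function; the codes and statuses are the ones used in the worked example.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class ErrorCode(str, Enum):
    """Stable codes that clients can switch on."""
    ALIAS_TAKEN = "alias_taken"
    INVALID_URL = "invalid_url"
    RATE_LIMITED = "rate_limited"


# Each code maps to exactly one HTTP status, so the two never drift apart.
STATUS_BY_CODE: Dict[ErrorCode, int] = {
    ErrorCode.ALIAS_TAKEN: 409,
    ErrorCode.INVALID_URL: 422,
    ErrorCode.RATE_LIMITED: 429,
}


def error_response(
    code: ErrorCode, message: str, details: Optional[dict] = None
) -> Tuple[int, dict]:
    """Build (status, body) in the shape { error: { code, message, details? } }."""
    error: dict = {"code": code.value, "message": message}
    if details:
        error["details"] = details
    return STATUS_BY_CODE[code], {"error": error}
```

Because the status comes from the code, an endpoint cannot accidentally return 200 with an error body or invent a second envelope shape.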
"We'll version it later" means you never will. Put /v1/ in paths from day one, commit to a 12-month support window after /v2/ ships, and use Deprecation/Sunset headers for the last 6 months. The discipline matters more than the mechanism.
Practice drills
A client retries a POST and you created the resource twice. What's your fix?
Server-side dedupe on an idempotency key. Two common shapes:
1. Natural idempotency: the resource has a unique id the client provides (e.g., client_message_id on a chat app). The server rejects a duplicate key.
2. Idempotency-Key header: require every POST to include a client-generated UUID; the server stores key → response for 24h; on repeat, it returns the stored response.
Edge case: client retries same key with a different body — return 422 { code: "idempotency_body_mismatch" }. Do not silently return the cached response; do not create a new resource.
Operational: keep the idempotency store small (TTL) and monitor for key-reuse patterns (could indicate a buggy client).
Your GraphQL endpoint is slow because one resolver hits the DB 100 times per request. What's happening?
N+1 resolver problem. The client asked for a list of 100 orders, each with its customer. Your naive resolver fires one query for customer per order. 1 query for the list + 100 queries for the customers = 101 queries.
Fix: DataLoader pattern. Batch and dedupe fetches within a single request. The customer resolver puts customer IDs on a queue; a tick later, DataLoader fires one SELECT * FROM customers WHERE id IN (...) that resolves all 100 in one round trip.
This is a required pattern for GraphQL at production scale. DataLoader began as a JavaScript library, and most GraphQL server stacks (Apollo Server among them) support it or an equivalent batching layer.
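The batching idea can be sketched with `asyncio` alone. This is not the DataLoader library's API: `BatchLoader` is a hypothetical class that collects keys requested during one event-loop tick, calls a batch function once, and resolves every waiting future from the result.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


class BatchLoader:
    def __init__(self, batch_fn: Callable[[List[Any]], Awaitable[Dict[Any, Any]]]):
        self._batch_fn = batch_fn  # takes a list of keys, returns {key: value}
        self._pending: Dict[Any, asyncio.Future] = {}
        self._task = None

    def load(self, key: Any) -> asyncio.Future:
        """Queue `key`; duplicate keys in the same tick share one future."""
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        if self._task is None:
            # The task starts only after the current synchronous code yields,
            # so every load() call made in this tick joins the same batch.
            self._task = loop.create_task(self._dispatch())
        return self._pending[key]

    async def _dispatch(self) -> None:
        pending, self._pending = self._pending, {}
        self._task = None
        try:
            found = await self._batch_fn(list(pending))
        except Exception as exc:
            for future in pending.values():
                future.set_exception(exc)
            return
        for key, future in pending.items():
            future.set_result(found.get(key))
```

A customer resolver would call `loader.load(order.customer_id)` for each order; with 100 orders resolved in the same tick, the batch function runs once with the deduplicated IDs instead of 100 times.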
Your API has been running for 2 years; a major breaking change is needed. How do you ship it?
Parallel versions with a deprecation schedule.
1. Ship /v2 alongside /v1. Both serve traffic. v2 has the breaking changes; v1 is frozen.
2. Publish migration guide mapping v1 → v2 endpoint-by-endpoint.
3. Instrument v1 usage to know who's still on it. Reach out to the top 10 clients.
4. T-6 months: start setting Deprecation: true and Sunset: <date> headers on v1 responses.
5. T-2 months: email all remaining v1 clients with the cutover date.
6. T=0: v1 returns 410 Gone with a pointer to the migration guide.
7. T+3 months: remove v1 code after usage has been zero for a quarter.
Skipping these steps is how APIs get a reputation for breaking. The discipline matters more than the versioning scheme.
When would you choose gRPC over REST for a new internal service?
When all of the following are true:
- You own both sides (no third-party consumers)
- Latency budget is tight (< 10 ms per hop) — binary encoding + HTTP/2 multiplexing matters
- Strong typing helps — the team values compile-time contract checking over curl-ability
- Streaming is needed — bidirectional streams for chat, telemetry, live updates
Do not choose gRPC when:
- Browsers are a direct client (needs grpc-web proxy)
- The API is public / third-party (REST is expected)
- Debugging with curl / postman matters more than wire efficiency
- Your team has zero gRPC experience and no plans to invest
Default for internal mesh is often "REST for now, gRPC when latency becomes the bottleneck." There's no shame in starting with REST.
The client says "your API randomly returns 500 for no reason." What do you investigate?
Walk the list:
1. Is the error mapping correct? If the API returns 500 for what's actually a 4xx (validation error), that's the bug. Check that client errors map to 4xx classes.
2. Is there an unhandled exception path? Any unexpected throw should map to 500 but with a structured body. If the body is the default HTML error page, the service is crashing before the middleware catches it.
3. Timeout / retry interaction: if the client times out at 10s and the server at 30s, the client sees "no response", which proxies and client libraries often surface as a 5xx.
4. Rate limit leaking through as 500: if the rate limiter is misconfigured, it might panic rather than return 429.
5. Downstream dependency: if a dependency is down, requests through your API will 500. Check the health of that dependency.
The correct response to the user: "Give me a request_id from a 500 response", then search logs. Stable error envelopes with request_id make this tractable; without them, you are guessing.
Cheat sheet
- Nouns in paths. Methods are the verbs.
- Idempotency-Key header for POSTs, natural idempotency for PUTs.
- Cursor pagination over offset pagination; reserve offset for small, bounded admin lists.
- Stable error envelope: { error: { code, message, details? } }.
- Version in the path (/v1/...): easy to deprecate, easy to route.
- Return the full resource on create; do not force a follow-up GET.
- Rate limit on identity, not on IP. Return X-RateLimit headers on every response.
- 201 on create, 204 on delete, 409 on conflict, 422 on validation, 429 on rate limit.
- Deprecation policy: 12 months support after new version; Deprecation/Sunset headers for last 6.
- Cap payload size (10 MB) and header size (1 KB); reject oversized bodies with 413 and oversized headers with 431.