Why API Design Is a Force Multiplier
An API is a contract. Once published and consumed by clients you don’t control, breaking that contract costs real money — forced migrations, outage windows, angry partners. A well-designed API, by contrast, is self-documenting, evolvable, and reduces the surface area of integration bugs.
This is why API design decisions get escalated to staff engineers. The code you write to implement an endpoint takes hours. The API shape you define will constrain your system for years.
REST: Constraints That Matter
REST (Representational State Transfer) is not just “JSON over HTTP.” Roy Fielding’s dissertation defines six architectural constraints. Only two are commonly violated in practice:
- Stateless — each request contains all information needed to process it. No server-side session. This is what makes REST horizontally scalable.
- Uniform Interface — standardised interaction via resources, HTTP methods, and hypermedia (HATEOAS). This is what makes REST discoverable.
The other four (client-server, cacheable, layered system, and the optional code-on-demand) are generally satisfied by any web API.
Richardson Maturity Model
A practical way to assess how “RESTful” an API really is:
Level 0: HTTP as a tunnel
POST /api
{"action": "getUserOrders", "userId": 42}
Level 1: Resources (URLs represent nouns)
POST /users/42/orders
POST /users/42/orders/cancel ← still using verbs
Level 2: HTTP Verbs + Status Codes used correctly
GET /users/42/orders → 200
POST /users/42/orders → 201
DELETE /users/42/orders/99 → 204
GET /users/42/orders/999999 → 404
Level 3: HATEOAS (Hypermedia Controls)
GET /users/42/orders/99
{
"orderId": 99,
"status": "shipped",
"_links": {
"self": { "href": "/users/42/orders/99" },
"cancel": { "href": "/users/42/orders/99/cancel", "method": "POST" },
"track": { "href": "/shipments/TRK-abc123" }
}
}
Most production APIs operate at Level 2. Level 3 is rare but powerful for APIs consumed by generic clients (like HAL browsers). The links eliminate the need for clients to construct URLs — they follow links like a web browser.
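To make the "follow links instead of constructing URLs" idea concrete, here is a minimal client-side sketch. The `Link` and `OrderResource` types and the `resolveLink` helper are hypothetical names chosen for illustration; the response shape is assumed to match the Level 3 example above.

```typescript
// Hypothetical types for a HAL-style response like the Level 3 example.
interface Link {
  href: string;
  method?: string; // assumed to default to GET when the server omits it
}

interface OrderResource {
  orderId: number;
  status: string;
  _links: Record<string, Link | undefined>;
}

// Resolve a link relation to a concrete request, or null if the server
// did not advertise it (meaning the action is not available right now).
function resolveLink(
  resource: OrderResource,
  rel: string
): { method: string; url: string } | null {
  const link = resource._links[rel];
  if (!link) return null;
  return { method: link.method ?? "GET", url: link.href };
}

const order: OrderResource = {
  orderId: 99,
  status: "shipped",
  _links: {
    self: { href: "/users/42/orders/99" },
    track: { href: "/shipments/TRK-abc123" },
  },
};

const track = resolveLink(order, "track");   // a GET to the advertised shipment URL
const cancel = resolveLink(order, "cancel"); // null: the server did not offer it
```

The client never builds a URL from IDs; it only checks which relations the server offered, so the server can move resources or withdraw actions without a client release.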
Resource Naming Best Practices
# Use plural nouns for collections
GET /users ← collection
GET /users/42 ← single resource
POST /users ← create in collection
# Hierarchy for owned resources
GET /users/42/orders ← user's orders
GET /users/42/orders/99 ← specific order
# Avoid deep nesting (> 2 levels gets unwieldy)
# Bad:
GET /companies/1/departments/2/teams/3/members/4
# Better: flatten with query params for context
GET /team-members/4?teamId=3
# Actions that don't map to CRUD: use sub-resource verbs sparingly
POST /orders/99/cancel ← acceptable for state transitions
POST /payments/capture ← acceptable for operations
# Query parameters for filtering, sorting, projection
GET /orders?status=shipped&sort=-createdAt&fields=id,status,total
Naming conventions: kebab-case for URLs (/user-profiles), camelCase for JSON fields (userId, createdAt). Be consistent: inconsistent naming is one of the most common API usability complaints.
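The filtering, sorting, and projection conventions above can be parsed on the server roughly as follows. This is a sketch under assumptions: `ListQuery` and `parseListQuery` are hypothetical names, and the rule that a leading "-" means descending is inferred from the sort=-createdAt example.

```typescript
interface ListQuery {
  filters: Record<string, string>;
  sort: { field: string; direction: "asc" | "desc" }[];
  fields: string[];
}

// Parse a query string: `sort` is a comma-separated list where a leading
// "-" means descending, `fields` is a comma-separated projection, and
// every other parameter is treated as an equality filter.
function parseListQuery(queryString: string): ListQuery {
  const params = new URLSearchParams(queryString);
  const result: ListQuery = { filters: {}, sort: [], fields: [] };

  params.forEach((value, key) => {
    if (key === "sort") {
      for (const part of value.split(",").filter(Boolean)) {
        const desc = part.charAt(0) === "-";
        result.sort.push({
          field: desc ? part.slice(1) : part,
          direction: desc ? "desc" : "asc",
        });
      }
    } else if (key === "fields") {
      result.fields = value.split(",").filter(Boolean);
    } else {
      result.filters[key] = value;
    }
  });
  return result;
}

const parsed = parseListQuery("status=shipped&sort=-createdAt&fields=id,status,total");
```

A real implementation would also validate field names against an allow-list before they reach a database query.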
HTTP Methods: Semantics and Idempotency
Understanding idempotency is critical — it defines retry safety and how clients recover from network failures.
| Method | Semantics | Idempotent | Safe | Cacheable |
|---|---|---|---|---|
| GET | Read resource | ✅ | ✅ | ✅ |
| HEAD | Read headers only | ✅ | ✅ | ✅ |
| OPTIONS | Read capabilities | ✅ | ✅ | ❌ |
| PUT | Replace resource entirely | ✅ | ❌ | ❌ |
| PATCH | Partial update | ❌ (usually) | ❌ | ❌ |
| DELETE | Remove resource | ✅ | ❌ | ❌ |
| POST | Create / non-idempotent action | ❌ | ❌ | ❌ |
Idempotent means calling it N times has the same effect as calling it once. DELETE /orders/99 called twice: second call returns 404 but the system state is identical — order 99 is deleted.
PATCH idempotency caveat: PATCH /counter {"increment": 1} is not idempotent. PATCH /counter {"value": 5} is. Design PATCH bodies to be declarative (set-to-value), not imperative (apply-operation), to achieve idempotency.
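The difference between the two PATCH styles shows up as soon as a request is retried. The sketch below models the counter as a plain object; `applyIncrement` and `applySet` are hypothetical stand-ins for the two request bodies above.

```typescript
interface Counter {
  value: number;
}

// Imperative patch ("apply this operation"): repeating it changes the outcome.
function applyIncrement(counter: Counter, increment: number): Counter {
  return { value: counter.value + increment };
}

// Declarative patch ("set to this value"): repeating it is harmless.
function applySet(_counter: Counter, value: number): Counter {
  return { value };
}

const initial: Counter = { value: 4 };

// The client sends the request, loses the response, and retries.
const incrementedOnce = applyIncrement(initial, 1);
const incrementedTwice = applyIncrement(incrementedOnce, 1);

const setOnce = applySet(initial, 5);
const setTwice = applySet(setOnce, 5);
```

After the retry the imperative version has drifted by one, while the declarative version ends in the same state as a single call.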
HTTP Status Codes: A Precise Taxonomy
2xx — Success
200 OK → GET, PUT, PATCH response with body
201 Created → POST that created a resource; include Location header
202 Accepted → async operation started; polling or webhook to follow
204 No Content → DELETE, PUT/PATCH with no response body needed
3xx — Redirection
301 Moved Permanently → URL changed forever; update bookmarks
302 Found → temporary redirect
304 Not Modified → GET with If-None-Match/If-Modified-Since; use cached copy
4xx — Client Error
400 Bad Request → malformed syntax, validation failure
401 Unauthorized → not authenticated (misleading name: means "unauthenticated")
403 Forbidden → authenticated but not authorised
404 Not Found → resource doesn't exist
409 Conflict → state conflict (e.g., duplicate creation, optimistic lock fail)
410 Gone → resource existed and was permanently deleted
422 Unprocessable → syntactically valid but semantically invalid
429 Too Many Requests → rate limited
5xx — Server Error
500 Internal Server Error → catch-all; don't expose stack traces
502 Bad Gateway → upstream service returned invalid response
503 Service Unavailable → server is overloaded or down; include Retry-After
504 Gateway Timeout → upstream service timed out
Common mistakes:
- Returning 200 with {"success": false, "error": "not found"}: clients can’t handle errors programmatically without parsing the body
- Using 404 for “no results”: an empty collection [] with 200 is correct
- Using 401 when you mean 403: the difference matters for auth debugging
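One way to avoid these mistakes is to map domain outcomes to status codes in a single place. The following is a sketch, not a prescribed design: the `Outcome` union and `toHttp` function are hypothetical names.

```typescript
type Outcome<T> =
  | { kind: "ok"; data: T }
  | { kind: "unauthenticated" }
  | { kind: "forbidden" }
  | { kind: "notFound" };

interface HttpResult {
  status: number;
  body: unknown;
}

// Translate a domain outcome into a status code and body.
function toHttp<T>(outcome: Outcome<T>): HttpResult {
  switch (outcome.kind) {
    case "ok":
      // An empty collection is still a successful result: 200 with [].
      return { status: 200, body: outcome.data };
    case "unauthenticated":
      return { status: 401, body: { title: "Authentication required" } };
    case "forbidden":
      return { status: 403, body: { title: "Not allowed" } };
    case "notFound":
      return { status: 404, body: { title: "Resource not found" } };
  }
}

const emptySearch = toHttp<string[]>({ kind: "ok", data: [] });
const missingOrder = toHttp<never>({ kind: "notFound" });
```

Because the mapping is exhaustive over the union, adding a new outcome forces a decision about its status code at compile time.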
Versioning Strategies
APIs need to evolve. How you version determines how painful evolution is.
Strategy 1: URL Path Versioning
GET /v1/users/42
GET /v2/users/42
Pros: Obvious, easy to route at gateway/load balancer level, easy to document separately, easy to deprecate (just redirect old prefix).
Cons: Not “pure REST” (the URL represents a resource, not a version of a resource). Two URLs for the same logical resource. Clients hardcode versions.
Used by: Stripe (/v1), Twilio (a date in the path), and many other public APIs.
Strategy 2: Header Versioning
GET /users/42
API-Version: 2024-01-15
Pros: Clean URLs, easier to support fine-grained versioning (date-based like Stripe).
Cons: Can’t test in browser URL bar, harder to route at the gateway layer, version not visible in logs unless explicitly extracted.
Used by: Stripe (date-based versions like 2023-10-16).
Strategy 3: Content Negotiation (Accept Header)
GET /users/42
Accept: application/vnd.example.v2+json
Pros: Technically “correct” per HTTP spec. Can serve different representations of same resource.
Cons: Verbose, unfamiliar to most developers, poor tooling support.
Used by: GitHub API (partially).
Recommendation
Use URL path versioning for public APIs (/v1/, /v2/). Use date-based header versioning for APIs where clients pin to a specific date (Stripe’s model — excellent for backwards compatibility without explosive version proliferation).
Never version at the field level in the same endpoint — it creates combinatorial complexity.
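A gateway or router can support both of the recommended strategies with a small resolution step. This is a sketch under assumptions: the `api-version` header name follows the Strategy 2 example, and the fallback to version "1" is an arbitrary choice for illustration.

```typescript
interface VersionedRequest {
  path: string;
  headers: Record<string, string | undefined>; // header names assumed lower-cased
}

const DEFAULT_VERSION = "1"; // assumption: unversioned requests get v1

// Prefer an explicit /vN/ path prefix; fall back to the api-version header.
function resolveVersion(req: VersionedRequest): { version: string; path: string } {
  const match = /^\/v(\d+)(\/.*)$/.exec(req.path);
  if (match) {
    // Strip the prefix so downstream handlers see the unversioned path.
    return { version: match[1], path: match[2] };
  }
  return { version: req.headers["api-version"] ?? DEFAULT_VERSION, path: req.path };
}

const fromPath = resolveVersion({ path: "/v2/users/42", headers: {} });
const fromHeader = resolveVersion({
  path: "/users/42",
  headers: { "api-version": "2024-01-15" },
});
const unversioned = resolveVersion({ path: "/users/42", headers: {} });
```

In practice an API would usually pick one strategy rather than accept both; combining them here only shows how each one is detected.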
Pagination Patterns
Offset Pagination
// Request
GET /orders?offset=100&limit=25
// Response
{
"data": [...],
"pagination": {
"total": 1543,
"offset": 100,
"limit": 25,
"hasMore": true
}
}
Pros: Random access (jump to page 40), easy to implement with SQL LIMIT/OFFSET.
Cons:
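- Inconsistent results during writes: if a record is inserted ahead of your current offset while you’re paginating, every later row shifts down by one, so the next page repeats the last item of the previous page (the “page shift” problem). A delete shifts rows the other way and silently skips an item.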
- Performance degrades: OFFSET 10000 LIMIT 25 requires the database to scan and discard 10,000 rows, an O(n) cost in the offset.
-- This is slow at large offsets
SELECT * FROM orders ORDER BY created_at DESC LIMIT 25 OFFSET 10000;
-- Requires full sort + skip of 10,000 rows
Use when: Admin dashboards, analytics UIs where users want “go to page 15” and the dataset is small to medium (<100k rows).
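The page shift problem is easy to reproduce with an in-memory list. In this sketch the array stands in for a newest-first query result, and `fetchPage` is a hypothetical equivalent of LIMIT/OFFSET.

```typescript
// Newest-first list of order IDs, standing in for ORDER BY created_at DESC.
let orderIds: number[] = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1];

// Equivalent of LIMIT :limit OFFSET :offset.
function fetchPage(offset: number, limit: number): number[] {
  return orderIds.slice(offset, offset + limit);
}

const page1 = fetchPage(0, 3);

// A new order arrives between the two requests and lands at the front.
orderIds = [11, ...orderIds];

const page2 = fetchPage(3, 3);

// IDs that the client has now seen twice.
const duplicated = page2.filter((id) => page1.indexOf(id) !== -1);
```

Here page1 ends with order 8 and page2 starts with it again, because the insert pushed every row down by one position.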
Cursor Pagination (Keyset Pagination)
// Request
GET /orders?cursor=eyJpZCI6MTAwMH0&limit=25
// Response
{
"data": [...],
"pagination": {
"nextCursor": "eyJpZCI6OTc1fQ", // base64({"id":975})
"hasMore": true
}
}
-- Efficient: uses index, no full scan
SELECT * FROM orders
WHERE id < 1000 -- cursor decoded
ORDER BY id DESC
LIMIT 25;
-- Uses B-tree index on id → O(log n) + O(limit)
Pros:
- O(log n) database cost regardless of page depth
- Stable results: inserts/deletes don’t shift pages
- Works for infinite scroll / feed UIs
Cons:
- No random access — can’t jump to “page 40”
- Cursor is opaque to clients
- Sorting by multiple fields requires a compound cursor (e.g., {"createdAt": "2024-01-15T10:00:00Z", "id": 42})
Use when: Feeds, timelines, large datasets, infinite scroll, any API where you can’t predict access patterns.
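The cursor in the request above is just an encoded sort key. A minimal sketch of the encode/decode step and the keyset filter follows; the function names are hypothetical, unpadded base64url is assumed as the encoding, and an in-memory array replaces the SQL query.

```typescript
interface Cursor {
  id: number;
}

// Encode as unpadded base64url so the cursor is URL-safe and opaque.
function encodeCursor(cursor: Cursor): string {
  return btoa(JSON.stringify(cursor))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

function decodeCursor(token: string): Cursor {
  const base64 = token.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const parsed = JSON.parse(atob(padded));
  if (typeof parsed.id !== "number") throw new Error("Invalid cursor");
  return { id: parsed.id };
}

// In-memory stand-in for: WHERE id < :cursor ORDER BY id DESC LIMIT :limit
function listOrders(allIds: number[], limit: number, token?: string) {
  const before = token ? decodeCursor(token).id : Infinity;
  const rows = allIds.filter((id) => id < before).sort((a, b) => b - a);
  const data = rows.slice(0, limit);
  const hasMore = rows.length > limit;
  // The next cursor is the sort key of the last row on this page.
  const nextCursor = hasMore ? encodeCursor({ id: data[data.length - 1] }) : null;
  return { data, pagination: { nextCursor, hasMore } };
}
```

Since clients should treat the cursor as opaque, the server stays free to change its contents later, for example to a compound key, without breaking them. Always validate a decoded cursor, since it is client-supplied input.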
Time-Based Pagination
// Request: get events between timestamps
GET /events?since=2024-01-01T00:00:00Z&until=2024-01-02T00:00:00Z&limit=100
// Useful for: audit logs, analytics, webhook replay
Use when: Data is naturally time-ordered and consumers want to poll for new data or replay a window (audit logs, webhook replay).
Error Response Schema: RFC 7807 Problem Details
Ad-hoc error schemas are a plague. RFC 7807 (since superseded by RFC 9457, which keeps the same core fields) defines a standard:
HTTP/1.1 422 Unprocessable Entity
Content-Type: application/problem+json
{
"type": "https://api.example.com/errors/validation-error",
"title": "Validation Error",
"status": 422,
"detail": "The request body contains invalid fields.",
"instance": "/orders/create#2024-01-15T10:30:00Z",
"errors": [
{
"field": "items[0].quantity",
"code": "MUST_BE_POSITIVE",
"message": "Quantity must be greater than 0"
},
{
"field": "shippingAddress.postalCode",
"code": "INVALID_FORMAT",
"message": "Postal code must match pattern ^[0-9]{6}$"
}
],
"traceId": "abc123def456"
}
Key fields:
- type: URI uniquely identifying the error type (machine-readable, links to docs)
- title: human-readable summary of the error type
- status: HTTP status code (redundant with HTTP status but useful for middleware)
- detail: human-readable explanation specific to this occurrence
- instance: URI identifying this specific occurrence (useful for support)
- traceId: distributed trace ID for log correlation
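The fields above map naturally onto a shared type so that every endpoint emits the same shape. This is a sketch: `ProblemDetails`, `validationProblem`, and `toResponseParts` are hypothetical names, and `errors` and `traceId` are treated as extension members as in the example response.

```typescript
interface FieldError {
  field: string;
  code: string;
  message: string;
}

// Core members plus the two extension members used in the example above.
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
  errors?: FieldError[];
  traceId?: string;
}

// Hypothetical helper: the type URI and wording are placeholders.
function validationProblem(errors: FieldError[], traceId: string): ProblemDetails {
  return {
    type: "https://api.example.com/errors/validation-error",
    title: "Validation Error",
    status: 422,
    detail: `The request body contains ${errors.length} invalid field(s).`,
    errors,
    traceId,
  };
}

// Serialize with the media type that signals a problem document.
function toResponseParts(problem: ProblemDetails) {
  return {
    status: problem.status,
    headers: { "Content-Type": "application/problem+json" },
    body: JSON.stringify(problem),
  };
}

const problemResponse = toResponseParts(
  validationProblem(
    [{ field: "items[0].quantity", code: "MUST_BE_POSITIVE", message: "Quantity must be greater than 0" }],
    "abc123def456"
  )
);
```

Keeping the HTTP status and the `status` member derived from the same value avoids the two drifting apart.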
Idempotency Keys
POST is not idempotent. Network failures between sending a POST and receiving a response leave the client unable to know if the action was executed. Did the payment go through? Did the order get created?
Solution: Client-supplied idempotency keys.
// Client generates a unique key per logical operation
const idempotencyKey = crypto.randomUUID();
// Attaches it to the request
await fetch('/payments', {
method: 'POST',
headers: {
'Idempotency-Key': idempotencyKey,
'Content-Type': 'application/json'
},
body: JSON.stringify({ amount: 5000, currency: 'USD' })
});
// If network times out, client retries with SAME key
// Server deduplicates: if key seen, return cached response
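// Server-side implementation (sketch: assumes an ioredis client and a
// processPayment helper that you supply)
import Redis from 'ioredis';

const redis = new Redis();
declare function processPayment(input: unknown): Promise<unknown>;

async function createPayment(req: Request): Promise<Response> {
  const idempotencyKey = req.headers.get('Idempotency-Key');
  const cacheKey = idempotencyKey ? `idempotency:${idempotencyKey}` : null;
  if (cacheKey) {
    const cached = await redis.get(cacheKey);
    if (cached) {
      // Replay the stored response with its original status code
      const { status, body } = JSON.parse(cached);
      return new Response(body, {
        status,
        headers: { 'Idempotent-Replayed': 'true' }
      });
    }
  }
  // Execute the actual operation (req.body is a stream, so parse it first)
  const payment = await processPayment(await req.json());
  const responseBody = JSON.stringify(payment);
  if (cacheKey) {
    // Cache for 24 hours: long enough for any reasonable retry window
    const entry = JSON.stringify({ status: 201, body: responseBody });
    await redis.set(cacheKey, entry, 'EX', 86400);
  }
  return new Response(responseBody, { status: 201 });
}
// Caveat: two concurrent requests with the same key can both miss the cache.
// Production code should claim the key atomically (e.g. SET with NX) first.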
Storage consideration: Idempotency keys need to be stored for the retry window (typically 24h–7 days). At Stripe’s scale (millions of API calls/day), this is a significant Redis footprint. Use a short TTL and document it clearly.
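Scope: Keys should be scoped per API key / tenant, not globally, e.g. idempotency:{apiKeyHash}:{clientKey}. Otherwise two tenants that happen to send the same key would see each other’s cached responses.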
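The deduplication, TTL, and scoping rules can be exercised without Redis. The sketch below is an in-memory illustration only: `IdempotencyStore` is a hypothetical class, `tenantId` stands in for the API key hash, and the clock is passed in so expiry is easy to test.

```typescript
interface StoredResponse {
  status: number;
  body: string;
  expiresAt: number;
}

class IdempotencyStore {
  private readonly entries = new Map<string, StoredResponse>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Run `operation` at most once per (tenant, key) within the TTL.
  execute(
    tenantId: string,
    clientKey: string,
    now: number,
    operation: () => { status: number; body: string }
  ): { status: number; body: string; replayed: boolean } {
    const scopedKey = `idempotency:${tenantId}:${clientKey}`;
    const existing = this.entries.get(scopedKey);
    if (existing && existing.expiresAt > now) {
      return { status: existing.status, body: existing.body, replayed: true };
    }
    const fresh = operation();
    this.entries.set(scopedKey, { ...fresh, expiresAt: now + this.ttlMs });
    return { ...fresh, replayed: false };
  }
}

let charges = 0;
const DAY_MS = 24 * 60 * 60 * 1000;
const store = new IdempotencyStore(DAY_MS);
const charge = () => {
  charges += 1;
  return { status: 201, body: JSON.stringify({ paymentId: charges }) };
};

const firstAttempt = store.execute("tenant-a", "key-1", 0, charge);
const retryAttempt = store.execute("tenant-a", "key-1", 1000, charge);  // replayed, no second charge
const otherTenant = store.execute("tenant-b", "key-1", 1000, charge);   // same client key, separate scope
const afterExpiry = store.execute("tenant-a", "key-1", DAY_MS + 1, charge); // TTL elapsed, runs again
```

The last call is the reason to document the TTL: a retry that arrives after expiry is treated as a brand-new operation.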
GraphQL vs REST: Real Tradeoffs
GraphQL is not strictly better than REST. The tradeoffs are real.
# GraphQL: client specifies exactly what it needs
query {
user(id: "42") {
name
email
orders(last: 5) {
id
total
status
}
}
}
REST equivalent requires:
GET /users/42 → name, email (+ 20 other fields you don't need)
GET /users/42/orders → all orders paginated
| Concern | REST | GraphQL |
|---|---|---|
| Over-fetching | Common (fixed response shape) | Eliminated (client selects fields) |
| Under-fetching | Common (N+1 requests) | Eliminated (single query) |
| Caching | Simple (HTTP cache by URL) | Hard (POST /graphql, same URL) |
| Schema evolution | Via versioning strategy | Additive schema evolution |
| File uploads | Simple multipart | Complex (no standard) |
| Rate limiting | Per-endpoint | Hard (query cost unknown) |
| Error handling | HTTP status codes | Usually 200, errors in body |
| Tooling | Mature | Growing (GraphiQL, Apollo) |
| Learning curve | Low | Medium-High |
| N+1 problem | No (resolved at endpoint) | Yes (requires DataLoader) |
Use GraphQL when:
- Building a BFF (Backend for Frontend) serving mobile + web with different data needs
- Internal APIs consumed by teams you control
- High data-access flexibility needed (exploratory dashboards)
Use REST when:
- Public API (caching, simplicity, broad tooling support)
- Simple CRUD with predictable access patterns
- File upload/download heavy
- Team unfamiliar with GraphQL and DataLoader patterns
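The N+1 row in the table is the one that most often surprises teams adopting GraphQL. The sketch below counts simulated queries for a per-user lookup versus a batched one; it is a synchronous illustration of the idea behind DataLoader, not the DataLoader API, and all names are hypothetical.

```typescript
interface OrderRow {
  id: number;
  userId: number;
}

const ordersTable: OrderRow[] = [
  { id: 1, userId: 1 },
  { id: 2, userId: 1 },
  { id: 3, userId: 2 },
  { id: 4, userId: 3 },
];
let queryCount = 0;

// One "query" per user: what a naive per-field resolver ends up doing.
function ordersForUser(userId: number): OrderRow[] {
  queryCount += 1;
  return ordersTable.filter((o) => o.userId === userId);
}

// One "query" for many users: the shape a batch function has.
// Results are returned in the same order as the requested IDs.
function ordersForUsers(userIds: number[]): OrderRow[][] {
  queryCount += 1;
  const wanted = ordersTable.filter((o) => userIds.indexOf(o.userId) !== -1);
  return userIds.map((id) => wanted.filter((o) => o.userId === id));
}

const userIds = [1, 2, 3];

queryCount = 0;
const naive = userIds.map((id) => ordersForUser(id));
const naiveQueries = queryCount; // grows with the number of users

queryCount = 0;
const batched = ordersForUsers(userIds);
const batchedQueries = queryCount; // stays at one
```

Both approaches return the same data; only the number of round trips to the data store differs.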
gRPC and Protocol Buffers
gRPC uses Protocol Buffers (protobuf) for efficient binary serialization and HTTP/2 for transport. It is widely used for internal service-to-service communication at companies such as Google, Netflix, and Uber.
// order_service.proto
syntax = "proto3";
package orders.v1;
service OrderService {
rpc GetOrder (GetOrderRequest) returns (Order);
rpc CreateOrder (CreateOrderRequest) returns (Order);
rpc ListOrders (ListOrdersRequest) returns (stream Order); // server streaming
rpc BulkCreateOrders (stream CreateOrderRequest) returns (BulkResult); // client streaming
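}
// Request/response messages (GetOrderRequest, CreateOrderRequest,
// ListOrdersRequest, BulkResult) are omitted here for brevity.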
message Order {
string order_id = 1;
string user_id = 2;
repeated LineItem items = 3;
OrderStatus status = 4;
int64 created_at_ms = 5;
}
message LineItem {
string product_id = 1;
int32 quantity = 2;
int64 price_cents = 3;
}
enum OrderStatus {
ORDER_STATUS_UNSPECIFIED = 0;
ORDER_STATUS_PENDING = 1;
ORDER_STATUS_CONFIRMED = 2;
ORDER_STATUS_SHIPPED = 3;
}
Serialization comparison:
JSON payload: {"orderId":"ord_abc123","userId":"usr_42","status":"CONFIRMED"}
JSON bytes: ~63 bytes
Protobuf same data:
Binary bytes: ~22 bytes (about 65% smaller)
Parse time: often several times faster than JSON, depending on language and payload
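The size comparison can be checked by hand from the protobuf wire format: each field costs a tag, and strings add a length prefix plus their UTF-8 bytes. The sketch below tallies the three fields from the payload above using the field numbers in the Order message; the helper names are hypothetical and no protobuf library is involved.

```typescript
// Bytes needed to encode a non-negative integer as a protobuf varint.
function varintSize(n: number): number {
  let size = 1;
  while (n >= 128) {
    n = Math.floor(n / 128);
    size += 1;
  }
  return size;
}

// Field numbers 1-15 fit in a one-byte tag, which covers this message.
const TAG_SIZE = 1;

// Length-delimited string field: tag + length varint + UTF-8 bytes.
function stringFieldSize(value: string): number {
  const bytes = new TextEncoder().encode(value).length;
  return TAG_SIZE + varintSize(bytes) + bytes;
}

// Enum field: tag + varint of the enum number.
function enumFieldSize(enumNumber: number): number {
  return TAG_SIZE + varintSize(enumNumber);
}

const protobufBytes =
  stringFieldSize("ord_abc123") + // order_id = 1
  stringFieldSize("usr_42") +     // user_id = 2
  enumFieldSize(2);               // status = 4, ORDER_STATUS_CONFIRMED = 2

const jsonBytes = new TextEncoder().encode(
  JSON.stringify({ orderId: "ord_abc123", userId: "usr_42", status: "CONFIRMED" })
).length;

const savings = 1 - protobufBytes / jsonBytes;
```

With these inputs the tally is 22 bytes of protobuf against 63 bytes of JSON. Most of the saving comes from replacing field names with one-byte tags, so the ratio depends heavily on how much of a payload is keys versus values.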
gRPC vs REST:
| Concern | REST + JSON | gRPC + Protobuf |
|---|---|---|
| Payload size | Larger | Smaller (binary encoding, no field names) |
| Parse speed | Slower | Faster (5-10x is often quoted; varies by language) |
| Schema | Optional (OpenAPI) | Required (strongly typed) |
| Browser support | Native | Requires grpc-web proxy |
| Streaming | SSE / WebSocket | Native bi-directional |
| Code generation | Optional | First-class (all languages) |
| Human readable | ✅ | ❌ (binary) |
| Load balancing | HTTP/1.1 compatible | Requires L7 load balancer |
Use gRPC for: Internal microservice communication, real-time streaming, mobile apps needing high throughput on metered connections.
API Gateway Pattern
An API Gateway is the single entry point for all client requests:
Mobile App  ──┐
Web App     ──┼──► API Gateway ──► Auth Service
Partner API ──┘         │
                        ├──► User Service
                        ├──► Order Service
                        └──► Payment Service
The gateway handles cross-cutting concerns:
- Authentication: verify JWT, OAuth tokens — services trust gateway
- Rate limiting: per-client, per-endpoint
- Request routing: path → service mapping
- Load balancing: across service instances
- SSL termination
- Request/response transformation: e.g., REST → gRPC translation
- Observability: access logs, traces, metrics
- Caching: edge caching for GET responses
Popular choices: Kong (open-source, plugin ecosystem), AWS API Gateway (managed, deep AWS integration), Nginx + Lua, Envoy (used internally by many companies as sidecar + gateway).
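The request-routing concern is the core of any gateway and fits in a few lines. This sketch uses a hypothetical route table whose service names mirror the diagram above; real gateways such as Kong or Envoy express the same mapping in configuration rather than code.

```typescript
interface Route {
  prefix: string;
  service: string;
}

// Hypothetical route table: path prefix -> upstream service name.
const routes: Route[] = [
  { prefix: "/auth", service: "auth-service" },
  { prefix: "/users", service: "user-service" },
  { prefix: "/orders", service: "order-service" },
  { prefix: "/payments", service: "payment-service" },
];

// Longest-prefix match on whole path segments, so /orders-archive
// does not accidentally match the /orders route.
function routeFor(path: string): string | null {
  let best: Route | null = null;
  for (const route of routes) {
    const matches =
      path === route.prefix || path.indexOf(route.prefix + "/") === 0;
    if (matches && (best === null || route.prefix.length > best.prefix.length)) {
      best = route;
    }
  }
  return best ? best.service : null;
}

const orderTarget = routeFor("/orders/99");
const unknownTarget = routeFor("/orders-archive");
```

Returning null for an unmatched path lets the gateway answer with a 404 itself instead of forwarding the request to an arbitrary service.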
Backward Compatibility
Never break existing clients. The rules:
// SAFE: additive changes
// Adding new optional fields is backward compatible
{
"orderId": "123",
"status": "shipped",
"trackingUrl": "https://..." // NEW: clients that don't know about this field ignore it
}
// SAFE: new optional request parameter
GET /orders?includeArchived=true // clients that don't send it get old behaviour
// BREAKING: removing fields
// Clients reading "status" will break if you remove it
// BREAKING: changing field type
// "amount": 5000 → "amount": "50.00" // integer to string breaks clients
// BREAKING: changing enum values
// "status": "in_transit" → "status": "shipped" // renames break clients
// BREAKING: changing URL structure
// /v1/users/42/orders → /v1/orders?userId=42
Tolerant Reader pattern: Clients should ignore unknown fields in responses. Most JSON parsers do this by default; strict, schema-bound deserializers need to be configured to allow it. This is what makes additive changes on the server side safe.
Postel’s Law: “Be conservative in what you send, be liberal in what you accept.” Accept extra fields gracefully; never emit undocumented fields in stable APIs.
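On the client side, a tolerant reader picks out only the fields it depends on. The sketch below is one possible shape: `readOrder` and `OrderView` are hypothetical names, and mapping an unrecognised status to "unknown" is an assumption about how a client might choose to degrade.

```typescript
type KnownStatus = "pending" | "confirmed" | "shipped" | "cancelled";

interface OrderView {
  orderId: string;
  status: KnownStatus | "unknown";
}

const KNOWN_STATUSES: KnownStatus[] = ["pending", "confirmed", "shipped", "cancelled"];

// Read only the fields this client depends on. Extra fields are ignored,
// and an unrecognised enum value degrades to "unknown" instead of throwing.
function readOrder(json: string): OrderView {
  const raw = JSON.parse(json);
  if (typeof raw.orderId !== "string") {
    throw new Error("orderId is required");
  }
  const status =
    KNOWN_STATUSES.indexOf(raw.status) !== -1 ? (raw.status as KnownStatus) : "unknown";
  return { orderId: raw.orderId, status };
}

// A response with a field this client has never heard of.
const withNewField = readOrder(
  '{"orderId":"123","status":"shipped","trackingUrl":"https://example.com/t/1"}'
);

// A response with a status added after this client shipped.
const withNewStatus = readOrder('{"orderId":"124","status":"in_transit"}');
```

Handling unknown enum values this way softens the impact of a new status value, though the server should still treat renaming existing values as a breaking change.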
Deprecation Strategy
- Announce deprecation in changelog, developer portal, and response headers:
Deprecation: Sat, 01 Jun 2024 00:00:00 GMT
Sunset: Wed, 01 Jan 2025 00:00:00 GMT
Link: <https://api.example.com/docs/migration/v2>; rel="successor-version"
- Monitor usage: track calls to deprecated endpoints by API key. Reach out directly to heavy users of soon-to-be-removed endpoints.
- Minimum deprecation window: 6 months for public APIs, 12 months for enterprise/partner APIs.
- Sunset ≠ delete immediately: return 410 Gone with a helpful migration message before actually decommissioning infrastructure.
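Generating the announcement headers from real dates avoids hand-typed weekday mistakes. This is a sketch: `deprecationHeaders` is a hypothetical helper, and it emits the HTTP-date form used in this section, so check the current header specifications before relying on the exact Deprecation format.

```typescript
// Build the three announcement headers from dates and a migration guide URL.
function deprecationHeaders(
  deprecatedAt: Date,
  sunsetAt: Date,
  migrationGuideUrl: string
): Record<string, string> {
  if (sunsetAt.getTime() <= deprecatedAt.getTime()) {
    throw new Error("Sunset must come after the deprecation date");
  }
  return {
    // toUTCString() produces the HTTP-date layout, weekday included.
    Deprecation: deprecatedAt.toUTCString(),
    Sunset: sunsetAt.toUTCString(),
    Link: `<${migrationGuideUrl}>; rel="successor-version"`,
  };
}

const sunsetHeaders = deprecationHeaders(
  new Date(Date.UTC(2024, 5, 1)), // 1 June 2024 (months are zero-based)
  new Date(Date.UTC(2025, 0, 1)), // 1 January 2025
  "https://api.example.com/docs/migration/v2"
);
```

The guard on the date order catches a misconfigured schedule before it reaches clients.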
OpenAPI Specification
OpenAPI (formerly Swagger) is the de-facto standard for REST API documentation:
# openapi.yaml
openapi: 3.1.0
info:
  title: Order Service API
  version: 1.0.0
paths:
  /orders:
    get:
      summary: List orders
      parameters:
        - name: cursor
          in: query
          schema: { type: string }
        - name: limit
          in: query
          schema: { type: integer, default: 25, maximum: 100 }
      responses:
        '200':
          description: Order list
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/OrderList'
        '401':
          $ref: '#/components/responses/Unauthorized'
components:
  # OrderList and responses.Unauthorized are referenced above but omitted here for brevity
  schemas:
    Order:
      type: object
      required: [orderId, status, total]
      properties:
        orderId:
          type: string
          example: "ord_abc123"
        status:
          type: string
          enum: [pending, confirmed, shipped, cancelled]
        total:
          type: integer
          description: Amount in smallest currency unit (cents)
          example: 9999
Benefits: auto-generate SDKs (openapi-generator), mock servers (Prism), documentation (Redoc, Swagger UI), contract testing (Schemathesis).
Rate Limiting Headers
HTTP/1.1 200 OK
X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1705312800 # Unix timestamp when limit resets
Retry-After: 3600 # Seconds until retry (on 429)
GitHub’s convention (widely adopted):
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4987
X-RateLimit-Used: 13
X-RateLimit-Reset: 1705312800
X-RateLimit-Resource: core
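Clients can use these headers to decide how long to back off. The sketch below is one possible policy rather than a standard: `retryDelayMs` is a hypothetical helper, header names are assumed to be lower-cased, and Retry-After is assumed to carry seconds (it may also be an HTTP-date, which this sketch does not handle).

```typescript
// Header names assumed lower-cased, as most HTTP clients expose them.
type HeaderMap = Record<string, string | undefined>;

// How long to wait before the next request, in milliseconds.
// Prefers Retry-After (seconds); falls back to X-RateLimit-Reset (Unix seconds).
function retryDelayMs(statusCode: number, headers: HeaderMap, nowMs: number): number {
  const remaining = Number(headers["x-ratelimit-remaining"] ?? "1");
  if (statusCode !== 429 && remaining > 0) return 0;

  const retryAfter = Number(headers["retry-after"]);
  if (Number.isFinite(retryAfter) && retryAfter >= 0) return retryAfter * 1000;

  const resetAt = Number(headers["x-ratelimit-reset"]);
  if (Number.isFinite(resetAt)) return Math.max(0, resetAt * 1000 - nowMs);

  return 1000; // assumption: conservative default when the server gives no hint
}

const noWait = retryDelayMs(200, { "x-ratelimit-remaining": "999" }, 0);
const fromRetryAfter = retryDelayMs(429, { "retry-after": "3600" }, 0);
const fromReset = retryDelayMs(429, { "x-ratelimit-reset": "1705312800" }, 1705312700000);
```

Passing the current time in as a parameter keeps the function deterministic, which makes the backoff logic straightforward to unit test.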
Interview Checklist
REST Fundamentals
- What are the 6 REST constraints? Which two are most commonly violated?
- Explain the Richardson Maturity Model with examples
- What is the difference between 401 Unauthorized and 403 Forbidden?
- When should you return 202 Accepted vs 201 Created?
Idempotency and Safety
- Which HTTP methods are idempotent? Which are safe?
- Why is PATCH not always idempotent? How do you make it idempotent?
- How do idempotency keys work? How would you implement them in Redis?
- A client posts a payment but gets a network timeout. What should it do?
Versioning and Evolution
- Compare URL versioning vs header versioning — tradeoffs?
- What is a breaking change? Give 5 examples
- What is the Tolerant Reader pattern?
- How would you deprecate and sunset a widely-used API endpoint?
Pagination
- Why is offset pagination slow at large offsets?
- How does cursor/keyset pagination work? What SQL does it use?
- When would you choose offset over cursor pagination?
Protocol Comparison
- REST vs GraphQL: when does each win? What is the N+1 problem in GraphQL?
- Why use gRPC for internal services? What are the limitations?
- What does a protobuf wire format gain over JSON?
Architecture
- What does an API gateway do? What cross-cutting concerns does it handle?
- Design the API for a ride-sharing app (drivers, riders, trips, payments)
- How would you version an API that serves 10,000 partner integrations?