CAP Theorem

The theorem

In a distributed system, you can guarantee at most two of the following three properties:

  • Consistency (C): Every read returns the most recent write, or an error. All nodes see the same data at the same time.
  • Availability (A): Every request receives a response (not necessarily the most recent data). The system is always operational.
  • Partition Tolerance (P): The system continues to operate even if messages between nodes are lost or delayed (network partition).

Conjectured by Eric Brewer in 2000 and proven by Gilbert and Lynch in 2002.


Why you always need P

A network partition occurs when nodes in a distributed system can’t communicate with each other. Partitions will happen in any real distributed system — network cables fail, switches go down, datacenters lose connectivity.

If you drop P (partition tolerance), you’re essentially building a single-node system. You can’t have a distributed system without partition tolerance.

The real choice is: when a partition occurs, do you sacrifice C or A?

This reframes CAP: it’s not three options to pick two from — it’s a choice you make under partition.


CP: Consistency over Availability

When a partition occurs, the system refuses requests rather than returning potentially stale data.

Normal operation:         During partition:
Node A ←→ Node B          Node A ✗ Node B
  ↓          ↓              ↓
write      read           "Error: cannot reach quorum"
propagates fresh          (refuse rather than return stale)

Examples: HBase, Zookeeper, etcd, most relational databases with synchronous replication.

Use when: Correctness is critical. Financial transactions, inventory management, leader election. Returning wrong data is worse than returning an error.
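A minimal sketch of the CP behavior described above: a coordinator that replicates writes synchronously and refuses them outright when a partition leaves it without a majority quorum. The `CPStore` class and `QuorumError` are illustrative names, not any real database's API.

```python
class QuorumError(Exception):
    """Raised when a write cannot be safely replicated."""
    pass

class CPStore:
    def __init__(self, nodes):
        self.nodes = nodes                   # node_id -> reachable (bool)
        self.data = {n: {} for n in nodes}   # per-node key/value copies

    def write(self, key, value):
        reachable = [n for n, up in self.nodes.items() if up]
        quorum = len(self.nodes) // 2 + 1
        if len(reachable) < quorum:
            # Refuse rather than accept a write we cannot replicate safely.
            raise QuorumError("cannot reach quorum")
        for n in reachable:                  # synchronous replication
            self.data[n][key] = value

store = CPStore({"A": True, "B": True, "C": True})
store.write("balance", 100)   # succeeds: all 3 nodes reachable

store.nodes["B"] = False      # partition isolates B and C
store.nodes["C"] = False
try:
    store.write("balance", 50)
except QuorumError as e:
    print("write refused:", e)
```

The key design choice: the system would rather return an error than accept a write it cannot replicate to a majority, because a minority-side write could later conflict with writes on the other side of the partition.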


AP: Availability over Consistency

When a partition occurs, nodes keep accepting reads and writes. After the partition heals, nodes reconcile — eventual consistency.

Normal operation:    During partition:    After healing:
Node A ←→ Node B    Node A  |  Node B    Node A ←→ Node B
  ↓           ↓     writes  |  writes    conflict resolution
                    diverge |  diverge   (last-write-wins, merge, etc.)

Examples: Cassandra, DynamoDB, CouchDB, DNS.

Use when: Availability matters more than immediate consistency. Social media feeds, product catalogs, DNS (users will accept a slightly stale IP over an error), shopping carts (Amazon’s Dynamo paper was motivated by exactly this case).
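The diagram above can be sketched in code: two replicas that keep accepting writes during a partition, then reconcile with last-write-wins when it heals. The `Replica` class and its merge rule are illustrative, not any specific database's API.

```python
class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, value, ts):
        self.data[key] = (ts, value)   # always accepted, even when partitioned

    def read(self, key):
        return self.data[key][1]

    def merge(self, other):
        # Last-write-wins: for each key, keep the entry with the newer
        # timestamp. Simple, but clock skew can silently drop writes.
        for key, (ts, value) in other.data.items():
            if key not in self.data or ts > self.data[key][0]:
                self.data[key] = (ts, value)

a, b = Replica(), Replica()
a.write("cart", ["book"], ts=1)          # partition: writes diverge
b.write("cart", ["book", "lamp"], ts=2)

a.merge(b)                               # partition heals: reconcile
b.merge(a)
print(a.read("cart"))                    # ['book', 'lamp'] on both replicas
```

Note that both writes succeeded during the partition — no user saw an error — and the cost is deferred to merge time, where the older write is discarded.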


Eventual consistency

AP systems don’t give up on consistency — they just delay it. After a partition heals, nodes exchange updates and converge to the same state. This is called eventual consistency.

How long is “eventually”? It depends on the system: usually milliseconds in a well-designed system, but seconds or longer during a real network issue.

The “eventually” part causes real problems:

  • User posts a tweet. Reads from a replica that hasn’t caught up. Tweet appears to vanish.
  • Two users edit the same document offline. Both saves succeed. Now there’s a conflict.
  • A counter is incremented on two nodes simultaneously. One increment is lost.

Conflict resolution strategies:

  • Last-write-wins: Use timestamps; the newer write survives. Simple, but clock skew can cause data loss.
  • Version vectors (vector clocks): Track causality. Can detect conflicts precisely, then surface them for application-level resolution (Dynamo’s approach).
  • CRDTs (Conflict-free Replicated Data Types): Data structures designed to be merged automatically. Counters, sets, registers. No conflicts by design.
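The counter problem from the bullet list above is exactly what a G-counter (grow-only counter), the simplest CRDT, solves: each node increments only its own slot, and merge takes the per-slot maximum, so concurrent increments are never lost. A sketch of the standard construction:

```python
class GCounter:
    def __init__(self, node_id, node_ids):
        self.node_id = node_id
        self.counts = {n: 0 for n in node_ids}  # one slot per node

    def increment(self):
        self.counts[self.node_id] += 1          # only touch our own slot

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Per-slot max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for n, c in other.counts.items():
            self.counts[n] = max(self.counts[n], c)

nodes = ["A", "B"]
a, b = GCounter("A", nodes), GCounter("B", nodes)
a.increment()        # concurrent increments during a partition
b.increment()
a.merge(b)           # partition heals
b.merge(a)
print(a.value(), b.value())   # 2 2 -- neither increment was lost
```

Contrast with last-write-wins on the raw counter value, where one of the two increments would be overwritten.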

PACELC: the extension

CAP only describes behavior during partitions. PACELC extends it:

If there is a Partition (P), choose between Availability (A) and Consistency (C); Else (E), choose between Latency (L) and Consistency (C).

Even without a partition, there’s a trade-off: synchronous replication (strong consistency) adds latency. Asynchronous replication (lower latency) means replicas may lag.

System                     | Partition | Else
MySQL (sync replication)   | CP        | High latency
MySQL (async replication)  | CP        | Low latency, possible stale reads
Cassandra                  | AP        | Low latency
Spanner                    | CP        | Low latency (but requires atomic clocks)

What this means in system design interviews

When you choose a database or describe data replication, you’re implicitly making a CAP choice. Interviewers want to see that you understand the trade-off, not that you memorized the theorem.

The right framing: “What’s the cost of inconsistency? What’s the cost of unavailability?”

  • Payment processing → CP. A double charge or missed payment is worse than an error.
  • User profile reads → AP. Serving a 5-second-stale profile is fine. Refusing to serve anything is not.
  • Leader election → CP. Two nodes thinking they’re the leader causes chaos.
  • DNS → AP. A stale IP is tolerable. DNS going down for everyone is not.

Don’t memorize “use Cassandra for AP and PostgreSQL for CP.” Understand the trade-off, then justify your choice based on the business requirement.