Distributed Systems
A principle stating that a distributed data store can provide at most two of three guarantees: Consistency, Availability, and Partition tolerance.
The CAP theorem, formulated by Eric Brewer and later proved by Seth Gilbert and Nancy Lynch, states that any networked shared-data system can guarantee at most two of three properties simultaneously: Consistency (every read sees the most recent write), Availability (every request receives a non-error response), and Partition tolerance (the system continues to operate despite network partitions between nodes).
In practice, partitions are unavoidable in any real network — packets get dropped, links go down, switches fail. So the choice is really between consistency and availability when a partition occurs. CP systems (like HBase, MongoDB in default mode, etcd) refuse to serve potentially stale data, returning errors during partitions. AP systems (like Cassandra, DynamoDB, CouchDB) keep accepting reads and writes on both sides of the partition, accepting temporary inconsistency.
The modern view is that CAP is a useful starting point but oversimplifies. The PACELC theorem extends it: even when there is no Partition (E for "else"), you must trade Latency against Consistency. Real systems make these tradeoffs at the per-request level via tunable consistency.
Use CAP as a thinking tool when designing any distributed data store. Be explicit about which guarantee you give up during partitions.
CAP is not a strict either/or. Real systems offer tunable consistency, partial availability, and complex behaviors during partitions that do not fit the clean trichotomy.
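As a concrete illustration of tunable consistency, the sketch below shows how a Cassandra-style client might map a per-request consistency level to the number of replica acknowledgements it waits for. The names and the replication factor of 3 are assumptions for the example, not a real driver API.

    # Hypothetical per-request consistency levels (illustrative, not a real driver).
    REPLICATION_FACTOR = 3

    def required_acks(level: str) -> int:
        """Number of replica acknowledgements needed before a request succeeds."""
        if level == "ONE":       # fast, may see or leave stale data (leans AP)
            return 1
        if level == "QUORUM":    # majority of replicas (balances C and A)
            return REPLICATION_FACTOR // 2 + 1
        if level == "ALL":       # every replica (leans CP, lowest availability)
            return REPLICATION_FACTOR
        raise ValueError(f"unknown consistency level: {level}")

    # With R + W > N, any read quorum overlaps the most recent write quorum,
    # so QUORUM reads after QUORUM writes observe the latest value.
    assert required_acks("QUORUM") + required_acks("QUORUM") > REPLICATION_FACTOR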
A consistency model where, given enough time and no new updates, all replicas of a piece of data will converge to the same value.
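One common way replicas reach that convergence is a last-write-wins register. The sketch below is illustrative only: it assumes usable timestamps and ignores clock skew and tie-breaking, which real systems must handle.

    from dataclasses import dataclass

    @dataclass
    class LWWRegister:
        value: str = ""
        timestamp: float = 0.0

        def write(self, value: str, timestamp: float) -> None:
            # Keep whichever write carries the newer timestamp.
            if timestamp > self.timestamp:
                self.value, self.timestamp = value, timestamp

        def merge(self, other: "LWWRegister") -> None:
            # Merging replays the other replica's latest write, so replicas
            # that exchange state in any order converge to the same value.
            self.write(other.value, other.timestamp)

    # Two replicas see writes in different orders but converge after merging.
    a, b = LWWRegister(), LWWRegister()
    a.write("v1", timestamp=1.0)
    b.write("v2", timestamp=2.0)
    a.merge(b); b.merge(a)
    assert a == b == LWWRegister("v2", 2.0)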
The process by which a group of distributed nodes agree on a single value or sequence of values, even in the presence of failures.
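The toy sketch below shows only the core idea of a majority of nodes settling on a single value. It is not a fault-tolerant protocol like Paxos or Raft, which must also handle message loss, retries, competing proposers, and leader changes.

    from collections import Counter

    def decide(proposals: list[str], n_nodes: int) -> str | None:
        """Return a value once a strict majority of nodes proposed it, else None."""
        if not proposals:
            return None
        value, votes = Counter(proposals).most_common(1)[0]
        return value if votes > n_nodes // 2 else None

    # Three of five nodes propose "blue", so the cluster can safely decide "blue".
    assert decide(["blue", "blue", "green", "blue", "red"], n_nodes=5) == "blue"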
Maintaining multiple copies of the same data across different nodes for fault tolerance, read scalability, and lower latency.
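A minimal in-memory sketch of primary/replica fan-out, with hypothetical classes and no networking: the primary applies each write locally, then forwards it to every copy, after which any copy can serve reads.

    class Replica:
        def __init__(self):
            self.data = {}

        def apply(self, key, value):
            self.data[key] = value

    class Primary(Replica):
        def __init__(self, replicas):
            super().__init__()
            self.replicas = replicas

        def write(self, key, value):
            self.apply(key, value)           # apply locally first
            for replica in self.replicas:    # then fan out to every copy
                replica.apply(key, value)

    # Any copy can now answer reads, which is where read scalability and
    # lower latency for nearby clients come from.
    followers = [Replica(), Replica()]
    primary = Primary(followers)
    primary.write("user:42", "Ada")
    assert all(r.data["user:42"] == "Ada" for r in followers)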
A protocol by which a group of nodes selects one node as the coordinator or "leader" responsible for a given task.
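A deliberately simple sketch in the spirit of the bully algorithm, assuming every node shares the same view of which nodes are alive (real protocols also need failure detection and messaging): the live node with the highest ID becomes leader.

    def elect_leader(live_node_ids: set[int]) -> int:
        """Every node that observes the same membership picks the same leader."""
        if not live_node_ids:
            raise RuntimeError("no live nodes to elect from")
        return max(live_node_ids)

    # Node 5 fails; the survivors deterministically agree that node 4 coordinates.
    assert elect_leader({1, 2, 4}) == 4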
The minimum number of nodes that must agree on an operation for it to be considered successful in a distributed system.
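The most common choice is a strict majority, floor(N/2) + 1, because any two majorities of the same cluster must overlap in at least one node; a small arithmetic sketch:

    def majority_quorum(n_nodes: int) -> int:
        # Any two majorities of the same cluster share at least one node.
        return n_nodes // 2 + 1

    def is_committed(acks: int, n_nodes: int) -> bool:
        """An operation succeeds once a majority of nodes has acknowledged it."""
        return acks >= majority_quorum(n_nodes)

    # A 5-node cluster needs 3 acknowledgements and so tolerates 2 node failures.
    assert majority_quorum(5) == 3
    assert is_committed(3, 5) and not is_committed(2, 5)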
Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.
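A minimal hash-based assignment sketch; real systems often prefer consistent hashing or range partitioning so that changing the shard count moves less data.

    import hashlib

    def shard_for(key: str, n_shards: int) -> int:
        """The same key always maps to the same shard."""
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % n_shards

    # Keys spread across four shards; each machine stores and serves only its share.
    for key in ("user:1", "user:2", "user:3"):
        print(key, "->", "shard", shard_for(key, n_shards=4))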