Distributed Systems
Quorum: The minimum number of nodes that must agree on an operation for it to be considered successful in a distributed system.
A quorum is the minimum subset of nodes whose agreement is required for a decision in a distributed system. For a cluster of N nodes, the typical quorum is ⌊N/2⌋ + 1 — a strict majority. With this rule, any two quorums must overlap in at least one node, which prevents two conflicting decisions from being committed simultaneously.
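The majority rule and its overlap guarantee can be sketched in a few lines; the function name `quorum_size` is illustrative, not from any particular library.

```python
def quorum_size(n: int) -> int:
    """Strict majority of an n-node cluster: floor(n/2) + 1."""
    return n // 2 + 1

# Any two majority quorums of the same cluster must share at least one
# node: 2 * (floor(n/2) + 1) > n, so two quorums cannot be disjoint,
# and two conflicting decisions cannot both be committed.
for n in range(1, 8):
    q = quorum_size(n)
    assert 2 * q > n  # overlap guarantee holds for every cluster size
    print(f"N={n}: quorum={q}")
```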
In replicated databases, quorums tune the consistency-availability tradeoff. With N replicas, a write quorum W, and a read quorum R, the system is strongly consistent when W + R > N, because every read quorum must overlap the latest write quorum. Cassandra and other Dynamo-style stores such as Riak expose W and R as per-request settings, letting applications choose between fast but eventually consistent (W=1, R=1) and slower but strongly consistent (e.g., W=R=2 with N=3).
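The W + R > N condition is easy to check mechanically; a minimal sketch, with parameter names following the text rather than any specific datastore API:

```python
def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """True if every read quorum must intersect every write quorum,
    i.e. reads are guaranteed to see the most recent committed write."""
    return w + r > n

# N=3 replicas:
print(is_strongly_consistent(3, 2, 2))  # True  (quorum reads and writes)
print(is_strongly_consistent(3, 1, 1))  # False (fast, eventually consistent)
```

Note that W=1, R=N (or W=N, R=1) also satisfies the condition; the usual quorum/quorum setting simply balances read and write latency.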
Quorum systems degrade gracefully: as long as a quorum is reachable, the system continues to function. Below quorum, the system loses availability for writes (or reads, depending on configuration) in order to preserve safety.
Use quorums in every replicated state machine, voting protocol, and consensus implementation. Tune W and R explicitly when using tunable-consistency datastores.
Larger quorums improve fault tolerance but increase latency, since more nodes must respond. The classic gotcha: a 2-node cluster has a quorum of 2, so losing either node costs availability — size the cluster for the fault tolerance you actually need.
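The sizing gotcha follows directly from the majority formula; a small illustrative check (the helper name is hypothetical):

```python
def failures_tolerated(n: int) -> int:
    """Node failures an n-node cluster survives while a majority
    quorum (floor(n/2) + 1) remains reachable."""
    return n - (n // 2 + 1)

print(failures_tolerated(2))  # 0 — losing either node halts the cluster
print(failures_tolerated(3))  # 1
print(failures_tolerated(4))  # 1 — the fourth node buys no extra tolerance
print(failures_tolerated(5))  # 2
```

This is why clusters are typically sized at odd numbers: an even-sized cluster tolerates no more failures than the odd size just below it.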
Consensus: The process by which a group of distributed nodes agree on a single value or sequence of values, even in the presence of failures.
Replication: Maintaining multiple copies of the same data across different nodes for fault tolerance, read scalability, and lower latency.
CAP theorem: A principle stating that a distributed data store can provide at most two of: consistency, availability, and partition tolerance.
Leader election: A protocol by which a group of nodes selects one node as the coordinator or "leader" responsible for a given task.
Eventual consistency: A consistency model in which, given enough time and no new updates, all replicas of a piece of data converge to the same value.
Sharding: Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.