Distributed Systems
A hashing technique that minimizes the amount of data that needs to be moved when nodes are added to or removed from a distributed system.
Consistent hashing maps both data keys and nodes onto the same hash ring (a circular space, typically 0 to 2^32 − 1). Each key is assigned to the first node encountered while walking clockwise from the key's hash. When a new node is added, it takes over only the keys that fall between it and its predecessor — roughly 1/N of all keys — instead of forcing a global reshuffle.
Naive hashing (key % N) requires moving almost every key when N changes, because the modulus changes for every key. Consistent hashing keeps the disruption local. Virtual nodes — assigning each physical node many positions on the ring — improve load balancing and reduce variance.
Consistent hashing is the foundation of Amazon Dynamo, Apache Cassandra, Memcached client-side sharding, Akamai's CDN routing, and many distributed caches and load balancers.
Use consistent hashing whenever you shard data across a dynamic set of nodes — caches, key-value stores, and stateful load balancers.
Consistent hashing does not handle range queries well, requires virtual nodes for even distribution, and still suffers from hot keys if a single key receives outsized traffic.
Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.
A component that distributes incoming network traffic across multiple backend servers to maximize throughput, minimize response time, and avoid overload.
A principle stating that a distributed data store can provide at most two of: Consistency, Availability, and Partition tolerance.
A consistency model where, given enough time and no new updates, all replicas of a piece of data will converge to the same value.
The process by which a group of distributed nodes agree on a single value or sequence of values, even in the presence of failures.
A protocol by which a group of nodes selects one node as the coordinator or "leader" responsible for a given task.