Distributed Systems

Consistent Hashing

A hashing technique that minimizes the amount of data that needs to be moved when nodes are added to or removed from a distributed system.

In depth

Consistent hashing maps both data keys and nodes onto the same hash ring (a circular space, typically 0 to 2^32 − 1). Each key is assigned to the first node encountered while walking clockwise from the key's hash. When a new node is added, it takes over only the keys that fall between it and its predecessor — roughly 1/N of all keys — instead of forcing a global reshuffle.

Naive hashing (key % N) requires moving almost every key when N changes, because the modulus changes for every key. Consistent hashing keeps the disruption local. Virtual nodes — assigning each physical node many positions on the ring — improve load balancing and reduce variance.

Consistent hashing is the foundation of Amazon Dynamo, Apache Cassandra, Memcached client-side sharding, Akamai's CDN routing, and many distributed caches and load balancers.

When to use

Use consistent hashing whenever you shard data across a dynamic set of nodes — caches, key-value stores, and stateful load balancers.

Tradeoffs

Consistent hashing does not handle range queries well, requires virtual nodes for even distribution, and still suffers from hot keys if a single key receives outsized traffic.

Sharding

Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.

Load Balancer

A component that distributes incoming network traffic across multiple backend servers to maximize throughput, minimize response time, and avoid overload.

CAP Theorem

A principle stating that a distributed data store can provide at most two of: Consistency, Availability, and Partition tolerance.

Eventual Consistency

A consistency model where, given enough time and no new updates, all replicas of a piece of data will converge to the same value.

Consensus

The process by which a group of distributed nodes agree on a single value or sequence of values, even in the presence of failures.

Leader Election

A protocol by which a group of nodes selects one node as the coordinator or "leader" responsible for a given task.

MediumInfrastructure

Distributed Systems

Consistent Hashing

A hashing technique that minimizes the amount of data that needs to be moved when nodes are added to or removed from a distributed system.

In depth

Consistent hashing is the foundation of Amazon Dynamo, Apache Cassandra, Memcached client-side sharding, Akamai's CDN routing, and many distributed caches and load balancers.

When to use

Use consistent hashing whenever you shard data across a dynamic set of nodes — caches, key-value stores, and stateful load balancers.

Tradeoffs

Consistent hashing does not handle range queries well, requires virtual nodes for even distribution, and still suffers from hot keys if a single key receives outsized traffic.

Consistent Hashing

In depth

When to use

Tradeoffs

Related terms

Sharding

Load Balancer

CAP Theorem

Eventual Consistency

Consensus

Leader Election

Practice this concept

Design a Distributed Counter

Design a Web Cache

Design a Global Content Distribution Network

Design a Cloud Storage Gateway

Consistent Hashing

In depth

When to use

Tradeoffs

Related terms

Sharding

Load Balancer

CAP Theorem

Eventual Consistency

Consensus

Leader Election

Practice this concept

Design a Distributed Counter

Design a Web Cache

Design a Global Content Distribution Network

Design a Cloud Storage Gateway