Distributed Systems
Also known as: Distributed Consensus. (Raft and Paxos, often mentioned alongside it, are specific algorithms that implement consensus, not synonyms for it.)
The process by which a group of distributed nodes agree on a single value or sequence of values, even in the presence of failures.
Consensus is one of the fundamental problems in distributed systems. A consensus protocol allows a group of nodes to agree on a value such that all non-faulty nodes decide on the same value (agreement), the decided value was actually proposed by some node (validity), and every non-faulty node eventually decides (termination). The two most famous algorithms are Paxos (Lamport, published 1998) and Raft (Ongaro & Ousterhout, 2014). Raft was explicitly designed to be easier to understand than Paxos and is the dominant choice in modern systems.
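To make this concrete, here is a toy Go sketch of the Raft-flavored happy path: a leader appends an entry to its log, replicates it to followers, and treats it as committed once a majority holds it. All names here are illustrative, not from any real implementation; a real protocol also handles term checks, log matching, retries, and leader changes.

```go
package main

import "fmt"

// entry is a single replicated log record (simplified Raft-style).
type entry struct {
	term  int
	value string
}

// node holds one replica's log.
type node struct {
	id  int
	log []entry
}

// replicate mimics a leader appending an entry and waiting for a majority.
// This models only the failure-free path: every follower acknowledges.
func replicate(leader *node, followers []*node, e entry) bool {
	leader.log = append(leader.log, e)
	acks := 1 // the leader counts toward the quorum
	for _, f := range followers {
		f.log = append(f.log, e) // assume the replication RPC succeeds
		acks++
	}
	total := len(followers) + 1
	return acks >= total/2+1 // commit once a majority holds the entry
}

func main() {
	leader := &node{id: 0}
	followers := []*node{{id: 1}, {id: 2}}
	committed := replicate(leader, followers, entry{term: 1, value: "x=5"})
	fmt.Println("committed:", committed) // true: 3 of 3 acks meets the quorum of 2
}
```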
Consensus underpins much of modern infrastructure: leader election in Kubernetes (etcd), service discovery in Consul, distributed locking in ZooKeeper, transaction commit in Spanner, and replicated state machines in nearly every modern database. Whenever you need a single source of truth across multiple nodes, consensus is the building block.
Consensus requires a majority (quorum) of nodes to agree. With three nodes, you need at least two; with five, at least three. This is why consensus clusters are sized as odd numbers: a five-node cluster tolerates two failures, while a four-node cluster still tolerates only one, so the extra even-numbered node raises the quorum without adding resilience.
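The arithmetic follows directly from the majority rule; a trivial sketch (not from the source) makes the odd-versus-even point visible:

```go
package main

import "fmt"

// quorum is the smallest majority of an n-node cluster.
func quorum(n int) int { return n/2 + 1 }

// tolerable is how many node failures an n-node cluster survives
// while still being able to form a quorum.
func tolerable(n int) int { return (n - 1) / 2 }

func main() {
	for _, n := range []int{3, 4, 5} {
		fmt.Printf("n=%d quorum=%d tolerates=%d\n", n, quorum(n), tolerable(n))
	}
	// Output:
	// n=3 quorum=2 tolerates=1
	// n=4 quorum=3 tolerates=1
	// n=5 quorum=3 tolerates=2
}
```

Note that four nodes need a quorum of three but still tolerate only one failure, the same as three nodes.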
Use consensus whenever multiple nodes must agree on a critical fact: who the leader is, what the next log entry is, or whether a transaction committed. Consensus is expensive, so keep its scope small.
Consensus is slow (multiple network round-trips), unavailable when a quorum is not reachable, and complex to implement correctly. Reuse battle-tested implementations (etcd, ZooKeeper) rather than rolling your own.
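As a sketch of what reusing a battle-tested implementation looks like, the Go client for etcd ships a concurrency package with lease-backed leader election. The endpoint and key prefix below are assumptions for illustration; adjust them to your cluster:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	// Connect to a local etcd node (endpoint is an assumption).
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A session is a lease-backed handle: if this process dies,
	// its candidacy is withdrawn automatically when the lease expires.
	session, err := concurrency.NewSession(cli)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Campaign blocks until this node wins the election under the
	// given key prefix (the prefix "/my-service/leader" is hypothetical).
	election := concurrency.NewElection(session, "/my-service/leader")
	if err := election.Campaign(context.Background(), "node-1"); err != nil {
		log.Fatal(err)
	}
	log.Println("this node is now the leader")
}
```

The hard parts (quorum, log replication, failover) live inside etcd; the application only consumes the result of consensus.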
Related terms:
Leader election: A protocol by which a group of nodes selects one node as the coordinator or "leader" responsible for a given task.
Quorum: The minimum number of nodes that must agree on an operation for it to be considered successful in a distributed system.
Replication: Maintaining multiple copies of the same data across different nodes for fault tolerance, read scalability, and lower latency.
CAP theorem: A principle stating that a distributed data store can provide at most two of: consistency, availability, and partition tolerance.
Eventual consistency: A consistency model where, given enough time and no new updates, all replicas of a piece of data converge to the same value.
Sharding (partitioning): Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.