
SystemCity

AI-powered system design tutor. Learn architecture, ace interviews, build real systems.



Distributed Systems

Leader Election

A protocol by which a group of nodes selects one node as the coordinator or "leader" responsible for a given task.

In depth

Leader election is the mechanism by which a distributed system designates one node as the leader for some scope of work. The leader typically handles writes, coordinates state changes, or schedules work, while followers replicate its state or stand by. Examples: the single primary in a MongoDB replica set, the Raft leader in etcd, the active NameNode in HDFS, the active controller in a Kafka cluster.

Election is usually built on consensus: nodes propose themselves, a quorum agrees, and the winner holds the role for a bounded "term" or "lease". When the leader fails (detected by missed heartbeats), a new election kicks off and another node takes over. Lease-based election with fencing tokens guards against the dreaded split-brain scenario: even if a deposed node still believes it is the leader, downstream systems can use the token to reject its stale writes.
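The cycle above (propose, win a lease for a bounded term, take over when the lease lapses) can be sketched in miniature. The `LeaseStore` below is a hypothetical in-memory stand-in for a coordination service such as etcd or ZooKeeper, not a real API: it grants the lease to at most one node at a time and mints a new, monotonically increasing fencing token for each term.

```python
import threading
import time

class LeaseStore:
    """Toy stand-in for a coordination service (e.g. etcd/ZooKeeper).

    At most one node holds the lease at any moment; every new term
    gets a fresh, monotonically increasing fencing token.
    """

    def __init__(self, lease_ttl=1.0):
        self.lease_ttl = lease_ttl
        self.holder = None       # node id of the current leader, if any
        self.expires_at = 0.0    # when the current lease lapses
        self.fencing_token = 0   # bumped on every new term
        self.lock = threading.Lock()

    def try_acquire(self, node_id, now=None):
        """Attempt to become (or remain) leader.

        Returns the fencing token for the term, or None if another
        node holds an unexpired lease.
        """
        now = time.monotonic() if now is None else now
        with self.lock:
            if self.holder not in (None, node_id) and now < self.expires_at:
                return None  # someone else's lease is still live
            if self.holder != node_id or now >= self.expires_at:
                self.fencing_token += 1  # new term -> new token
            self.holder = node_id
            self.expires_at = now + self.lease_ttl
            return self.fencing_token


store = LeaseStore(lease_ttl=1.0)
t0 = time.monotonic()

token_a = store.try_acquire("node-a", now=t0)        # node-a wins the election
token_b = store.try_acquire("node-b", now=t0 + 0.5)  # lease still live: rejected
token_c = store.try_acquire("node-b", now=t0 + 2.0)  # lease expired: node-b takes over

print(token_a, token_b, token_c)  # 1 None 2
```

In a real system the leader would renew the lease via heartbeats well before `expires_at`, and a missed renewal is exactly the "missed heartbeats" failure detection described above.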

Leader election simplifies many distributed problems by reducing them to single-node problems: only one node accepts writes, so you avoid concurrent-write conflicts entirely.
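The single-writer guarantee survives failovers only if deposed leaders are fenced out. A hedged sketch of the receiving side (the `FencedStore` name is illustrative, not a real API): the store remembers the highest fencing token it has seen and rejects any write carrying an older one, so a delayed write from an old leader cannot corrupt state.

```python
class FencedStore:
    """Downstream resource that only honors the most recent leader.

    Writes must carry the fencing token of the leader's term; a token
    older than the highest one seen so far marks a deposed leader.
    """

    def __init__(self):
        self.highest_token = 0
        self.value = None

    def write(self, token, value):
        if token < self.highest_token:
            return False  # stale leader: reject the write
        self.highest_token = token
        self.value = value
        return True


store = FencedStore()
assert store.write(1, "from-old-leader")   # term 1: accepted
assert store.write(2, "from-new-leader")   # term 2: accepted
assert not store.write(1, "late-write")    # delayed write from term 1: rejected
print(store.value)  # from-new-leader
```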

When to use

Use leader election whenever exactly one node must coordinate: choosing the primary in a replicated database, running scheduled jobs exactly once, cluster managers, distributed lock services.

Tradeoffs

A single leader is a bottleneck (all writes go through it) and an availability concern (if an election takes seconds, the system is unavailable for writes during that window). Multi-leader and leaderless designs accept extra complexity in exchange for higher availability.

Related terms

Consensus

The process by which a group of distributed nodes agree on a single value or sequence of values, even in the presence of failures.

Quorum

The minimum number of nodes that must agree on an operation for it to be considered successful in a distributed system.
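For the common majority quorum, that minimum is ⌊n/2⌋ + 1, chosen so that any two quorums overlap in at least one node; this is an illustrative check, not taken from the glossary itself:

```python
def majority_quorum(n):
    """Smallest majority of n nodes; any two majorities must overlap."""
    return n // 2 + 1

for n in (1, 3, 5, 7):
    q = majority_quorum(n)
    # Two quorums of size q drawn from n nodes share at least 2q - n >= 1 node,
    # which is what lets a new leader learn everything a past quorum decided.
    assert 2 * q - n >= 1
    print(n, "nodes ->", q, "needed")
```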

Replication

Maintaining multiple copies of the same data across different nodes for fault tolerance, read scalability, and lower latency.

CAP Theorem

A principle stating that a distributed data store cannot simultaneously guarantee Consistency, Availability, and Partition tolerance: during a network partition, it must sacrifice either consistency or availability.

Eventual Consistency

A consistency model where, given enough time and no new updates, all replicas of a piece of data will converge to the same value.

Sharding

Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.

Practice this concept

Design a Blockchain Based System (Medium, Infrastructure)

Design a Distributed Locking System (Medium, Infrastructure)

Design a Distributed File System (Hard, Infrastructure)

Design a Distributed Messaging System (Hard, Messaging)