Distributed Systems
Sharding
Also known as: Partitioning, Horizontal Partitioning
Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.
Sharding (also called horizontal partitioning) divides a large dataset into smaller pieces called shards, each held on a separate node. Where replication makes copies of the same data, sharding distributes different data across nodes. Together they let a system scale beyond the capacity and throughput of any single machine.
The shard key, the field used to decide which shard a row belongs to, is the most important design decision in any sharded system. Hash-based sharding (hash the key, mod by the shard count) gives even distribution but makes range queries expensive, since a range scan must fan out to every shard. Range-based sharding keeps adjacent keys on the same shard for fast range scans but risks hot shards if the workload is skewed. Geographic sharding places data in the region closest to the users who access it, cutting latency at the cost of uneven shard sizes and awkward cross-region queries.
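The two routing strategies above can be sketched in a few lines. This is an illustrative sketch, not a production router; the key format, shard count, and range boundaries are made-up assumptions:

```python
import hashlib

def hash_shard(key: str, num_shards: int) -> int:
    """Hash-based sharding: hash the key, mod by shard count.
    Even distribution, but a range query must hit every shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_shards

def range_shard(key: str, boundaries: list[str]) -> int:
    """Range-based sharding: keys <= boundaries[i] go to shard i,
    so adjacent keys land together and range scans stay local."""
    for i, upper in enumerate(boundaries):
        if key <= upper:
            return i
    return len(boundaries)  # last shard holds keys above the top boundary

# Hypothetical user-ID keys routed across 4 shards:
print(hash_shard("user:1001", 4))
print(range_shard("user:1001", ["user:2500", "user:5000", "user:7500"]))
```

Note the trade-off baked into the code: `hash_shard` scatters neighboring keys deliberately, while `range_shard` depends on well-chosen boundaries to avoid a hot shard.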
Resharding (changing the number of shards) is one of the hardest operational problems in distributed systems. Consistent hashing reduces the data movement required when shards are added or removed, which is why it underpins systems like Cassandra and DynamoDB.
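Why consistent hashing helps with resharding can be shown with a minimal ring. This sketch omits virtual nodes (which real systems like Cassandra use to smooth the distribution); node and key names are illustrative:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring. Each node owns the arc of the
    hash space below its position, so adding a node only takes keys
    from one neighbor instead of reshuffling everything."""

    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        bisect.insort(self._ring, (self._hash(node), node))

    def lookup(self, key: str) -> str:
        # First node clockwise from the key's position (wrapping around).
        idx = bisect.bisect(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["node-a", "node-b", "node-c"])
before = {k: ring.lookup(k) for k in (f"key{i}" for i in range(1000))}
ring.add("node-d")
after = {k: ring.lookup(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
# Only the keys on node-d's new arc move (about 1/N of them on average);
# naive hash-mod-N would move roughly three quarters of the keys here.
print(f"{moved} of 1000 keys moved, all to the new node")
```

Every key that moves must move to the newly added node; keys on the rest of the ring are untouched, which is the property that makes adding or removing shards cheap.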
Shard when a single node cannot hold the dataset, cannot serve the write throughput, or cannot meet latency requirements. Shard reluctantly — operational cost is high.
Sharding makes cross-shard transactions slow and complex (many systems forbid them outright), makes joins across shards expensive, complicates resharding, and often forces denormalization. A wrong shard key bakes in years of pain.
Horizontal Scaling: Adding more machines to a system to handle increased load, as opposed to making a single machine more powerful.
Consistent Hashing: A hashing technique that minimizes the amount of data that needs to be moved when nodes are added to or removed from a distributed system.
Replication: Maintaining multiple copies of the same data across different nodes for fault tolerance, read scalability, and lower latency.
CAP Theorem: A principle stating that a distributed data store can provide at most two of: Consistency, Availability, and Partition tolerance.
Eventual Consistency: A consistency model where, given enough time and no new updates, all replicas of a piece of data will converge to the same value.
Consensus: The process by which a group of distributed nodes agree on a single value or sequence of values, even in the presence of failures.