Scalability & Performance
Horizontal Scaling
Also known as: Scale Out, Scaling Out
Adding more machines to a system to handle increased load, as opposed to making a single machine more powerful.
Horizontal scaling — also called scaling out — adds more nodes (servers, containers, pods) to a system. Each new node takes a share of the traffic, typically routed by a load balancer. The capacity of the system grows roughly linearly with the number of nodes.
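The routing step can be sketched as a minimal round-robin balancer. This is an illustrative in-process sketch with hypothetical node names; real load balancers (nginx, HAProxy, cloud LBs) add health checks, weights, and connection draining.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Cycle through backend nodes so each takes an equal share of requests."""

    def __init__(self, nodes):
        self._cycle = cycle(list(nodes))

    def pick(self):
        # Each call returns the next node in rotation.
        return next(self._cycle)

# Hypothetical node names, for illustration only.
lb = RoundRobinBalancer(["node-a", "node-b", "node-c"])
picks = [lb.pick() for _ in range(6)]
print(picks)  # each node serves every third request
```

With three nodes, six consecutive requests land on each node exactly twice, which is the "roughly linear" capacity growth described above.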
Horizontal scaling is the dominant approach for modern web-scale systems: commodity hardware is cheap, cloud providers make provisioning near-instant, and there is no single-machine hardware ceiling, since you can keep adding nodes (though coordination overhead means capacity gains are rarely perfectly linear). Stateless services, where any node can serve any request, scale horizontally almost trivially. Stateful services (databases, caches with node affinity) scale out through sharding, replication, or partitioning.
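For stateful services, the core of sharding is a deterministic key-to-shard mapping that every node agrees on. A minimal sketch, using a stable hash (Python's built-in `hash()` is salted per process, so it is unsuitable here):

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard deterministically via a stable hash."""
    digest = hashlib.sha256(key.encode()).digest()
    # Take the first 8 bytes as an integer, then reduce modulo shard count.
    return int.from_bytes(digest[:8], "big") % num_shards

# Every node computes the same answer for the same key.
shard = shard_for("user:1042", 4)
print(shard)
```

Note the design trade-off: naive modulo hashing remaps almost every key when `num_shards` changes, so production systems typically use consistent hashing or a fixed set of virtual shards reassigned among nodes.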
Vertical scaling, buying a bigger box, is the alternative. It is simpler and avoids distributed-systems complexity, but it hits hardware ceilings, has costly upgrade paths, and offers no fault tolerance: if the one machine fails, the whole system is down.
Choose horizontal scaling when traffic exceeds what a single beefy machine can handle, when you need fault tolerance (one node can die without bringing down the system), or when you want elastic capacity that grows and shrinks with demand.
Horizontal scaling pushes complexity into your application: you must handle distributed state, eventual consistency, partial failures, and idempotency. Stateful workloads (databases) are much harder to scale out than stateless ones.
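Idempotency is the piece most often retrofitted: with partial failures, clients retry, and a retried request must not repeat its side effect. A sketch of the idempotency-key pattern, with a hypothetical `PaymentService` and an in-memory key store:

```python
class PaymentService:
    """Retried requests carrying the same idempotency key return the
    stored result instead of re-executing the side effect."""

    def __init__(self):
        self._seen = {}  # idempotency_key -> cached result

    def charge(self, idempotency_key: str, amount: int) -> str:
        if idempotency_key in self._seen:
            return self._seen[idempotency_key]  # replay: no double charge
        result = f"charged {amount}"            # stand-in for the real side effect
        self._seen[idempotency_key] = result
        return result

svc = PaymentService()
first = svc.charge("key-123", 500)
retry = svc.charge("key-123", 500)  # client retry after a timeout
assert first == retry
```

In a real horizontally scaled deployment the key store must itself be shared (e.g., a database), because the retry may land on a different node than the original request.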
Related terms:
Load balancer: A component that distributes incoming network traffic across multiple backend servers to maximize throughput, minimize response time, and avoid overloading any single server.
Sharding: Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.
Replication: Maintaining multiple copies of the same data across different nodes for fault tolerance, read scalability, and lower latency.
Caching: Storing copies of frequently accessed data in fast memory so that subsequent requests can be served without recomputing or refetching.
CDN (content delivery network): A globally distributed network of edge servers that cache static content close to end users to minimize latency and origin load.
Vertical scaling: Increasing the capacity of a single machine (more CPU, memory, or disk) to handle more load.
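The caching pattern described above can be sketched as a tiny in-process cache with time-to-live expiry (an illustrative single-node version; shared caches such as Redis apply the same idea per key):

```python
import time

class TTLCache:
    """Tiny in-memory cache: entries expire ttl seconds after being set."""

    def __init__(self, ttl: float):
        self._ttl = ttl
        self._store = {}  # key -> (value, expires_at)

    def set(self, key, value):
        self._store[key] = (value, time.monotonic() + self._ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazily evict the stale entry
            return None
        return value

cache = TTLCache(ttl=0.05)
cache.set("user:1", {"name": "Ada"})
assert cache.get("user:1") == {"name": "Ada"}  # fresh hit
time.sleep(0.06)
assert cache.get("user:1") is None             # expired
```

One caveat for scaled-out systems: a per-node cache like this can serve different (stale) answers from different nodes, which is one of the consistency costs mentioned above.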