SystemCity

AI-powered system design tutor. Learn architecture, ace interviews, build real systems.

© 2026 SystemCity. All rights reserved.


Scalability & Performance

Horizontal Scaling

Also known as: Scale Out, Scaling Out

Adding more machines to a system to handle increased load, as opposed to making a single machine more powerful.

In depth

Horizontal scaling — also called scaling out — adds more nodes (servers, containers, pods) to a system. Each new node takes a share of the traffic, typically routed by a load balancer. The capacity of the system grows roughly linearly with the number of nodes.
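The routing step described above can be sketched as a toy round-robin balancer. This is a minimal illustration, not a real load-balancer API; the node names are invented:

```python
from itertools import cycle

class RoundRobinBalancer:
    """Toy load balancer: hand each incoming request to the next node in rotation."""

    def __init__(self, nodes):
        self._next_node = cycle(nodes)

    def route(self):
        # With stateless backends, any node can take any request,
        # so a simple rotation spreads load evenly.
        return next(self._next_node)

lb = RoundRobinBalancer(["node-a", "node-b", "node-c"])
assignments = [lb.route() for _ in range(6)]
# each node receives exactly two of the six requests
```

Real balancers add health checks and weighting, but the core idea is the same: capacity grows by adding entries to the rotation.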

Horizontal scaling is the dominant approach for modern web-scale systems because commodity hardware is cheap, cloud providers make provisioning near-instant, and there is no hard hardware ceiling: you can always add another node, though coordination overhead eventually erodes the linear gains. Stateless services, where any node can serve any request, scale out with little effort. Stateful services (databases, caches with node affinity) scale horizontally through sharding, replication, or partitioning.
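The most common way stateful systems scale out is hash-based sharding. A minimal sketch, with a made-up key format and shard count:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Deterministically map a key to one of num_shards partitions."""
    # Use a stable hash (not Python's per-process randomized hash())
    # so every node agrees on where each key lives.
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest, "big") % num_shards

shard = shard_for("user:42", 4)   # always the same shard for this key
```

Naive modulo routing like this remaps most keys whenever the shard count changes; consistent hashing is the usual fix for that.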

Vertical scaling — buying a bigger box — is the alternative. Vertical scaling is simpler and avoids distributed-systems complexity, but hits hardware ceilings, has costly upgrade paths, and provides no fault tolerance.

When to use

Choose horizontal scaling when traffic exceeds what a single beefy machine can handle, when you need fault tolerance (one node can die without bringing down the system), or when you want elastic capacity that grows and shrinks with demand.
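Back-of-the-envelope sizing for the "traffic exceeds one machine" case might look like the following; the throughput and headroom numbers are invented for illustration:

```python
import math

def nodes_needed(peak_rps: float, per_node_rps: float, headroom: float = 0.3) -> int:
    """Nodes required to absorb peak traffic while keeping spare headroom
    for spikes and for losing a node."""
    usable_per_node = per_node_rps * (1 - headroom)
    return math.ceil(peak_rps / usable_per_node)

# 50,000 req/s at peak, 4,000 req/s per node, 30% headroom:
# 50,000 / 2,800 = 17.9 -> 18 nodes
count = nodes_needed(50_000, 4_000)
```

The same function answers the elastic-capacity question in reverse: as `peak_rps` falls off-peak, the fleet can shrink accordingly.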

Tradeoffs

Horizontal scaling pushes complexity into your application: you must handle distributed state, eventual consistency, partial failures, and idempotency. Stateful workloads (databases) are much harder to scale out than stateless ones.
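One concrete slice of that complexity is idempotency: with retries across many nodes, the same request can arrive twice. A minimal sketch of deduplication by idempotency key; the in-memory dict is a stand-in for a shared store (a database or cache) that every node can see:

```python
class ChargeService:
    """Toy service that deduplicates retried requests by idempotency key."""

    def __init__(self):
        # In a real deployment this must be shared storage visible to
        # every node, not per-process memory.
        self._results = {}

    def charge(self, idempotency_key: str, amount: int) -> dict:
        if idempotency_key in self._results:
            # Retry of a request we already processed: replay the result.
            return self._results[idempotency_key]
        result = {"charged": amount}   # stand-in for the real side effect
        self._results[idempotency_key] = result
        return result

svc = ChargeService()
first = svc.charge("req-123", 50)
retry = svc.charge("req-123", 50)   # client retried; no double charge
```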

Related terms

Load Balancer

A component that distributes incoming network traffic across multiple backend servers to maximize throughput, minimize response time, and avoid overload.

Sharding

Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.

Replication

Maintaining multiple copies of the same data across different nodes for fault tolerance, read scalability, and lower latency.
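The fault-tolerance benefit can be sketched as a read that falls through a list of replicas. The replica callables here are simulated stubs, not a real client API:

```python
def read_with_failover(key, replicas):
    """Try each replica in turn; one dead node does not fail the read."""
    last_error = None
    for replica in replicas:
        try:
            return replica(key)
        except ConnectionError as err:
            last_error = err       # node down: fall through to the next copy
    raise RuntimeError(f"all {len(replicas)} replicas failed") from last_error

# Simulated replicas: the primary is unreachable, a secondary has the data.
def dead_primary(key):
    raise ConnectionError("primary unreachable")

def healthy_secondary(key):
    return f"value-of-{key}"

value = read_with_failover("user:7", [dead_primary, healthy_secondary])
```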

Caching

Storing copies of frequently accessed data in fast memory so that subsequent requests can be served without recomputing or refetching.
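A common realization of this is the cache-aside pattern. A minimal sketch; the dict cache and `db_fetch` stub are stand-ins for, say, Redis and a real database:

```python
def cache_aside_get(key, cache, db_fetch):
    """Cache-aside read: try the cache first, fall back to the database,
    then populate the cache for subsequent requests."""
    if key in cache:
        return cache[key]          # hit: no database round trip
    value = db_fetch(key)          # miss: go to the source of truth
    cache[key] = value
    return value

db_calls = []
def db_fetch(key):
    db_calls.append(key)           # track how often we hit the "database"
    return key.upper()

cache = {}
first = cache_aside_get("profile:1", cache, db_fetch)
second = cache_aside_get("profile:1", cache, db_fetch)  # served from cache
```

A production version also needs an eviction and invalidation policy (e.g. TTLs), which this sketch omits.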

CDN (Content Delivery Network)

A globally distributed network of edge servers that cache static content close to end users to minimize latency and origin load.

Vertical Scaling

Increasing the capacity of a single machine — more CPU, memory, or disk — to handle more load.