
Scalability & Performance

Vertical Scaling

Also known as: Scale Up, Scaling Up

Increasing the capacity of a single machine — more CPU, memory, or disk — to handle more load.

In depth

Vertical scaling means making one server bigger. You replace a 4-core, 16 GB instance with a 32-core, 256 GB one. The application code typically does not change; the same monolithic process simply has more resources available.
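
As a concrete sketch of what "making one server bigger" looks like in practice, here is a minimal resize on AWS EC2 using boto3. The instance ID and target instance type are placeholders; note that the instance must be stopped before its type can change, which is where the downtime mentioned under Tradeoffs comes from.

    # Sketch: vertically scaling an EC2 instance with boto3.
    # The instance ID and target type below are placeholders.
    import boto3

    ec2 = boto3.client("ec2")
    instance_id = "i-0123456789abcdef0"  # placeholder

    # The instance must be stopped before its type can be changed,
    # which is why vertical scaling often implies downtime.
    ec2.stop_instances(InstanceIds=[instance_id])
    ec2.get_waiter("instance_stopped").wait(InstanceIds=[instance_id])

    # Swap the small instance type for a much larger one.
    ec2.modify_instance_attribute(
        InstanceId=instance_id,
        InstanceType={"Value": "r5.8xlarge"},  # placeholder target size
    )

    ec2.start_instances(InstanceIds=[instance_id])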

Vertical scaling is appealing because it avoids distributed-systems complexity. There is no need to shard data, coordinate replicas, or handle partial failures. It is often the right first move when a system is small, the engineering team is small, and reliability requirements are modest.

The downside is a hard ceiling. Even the largest cloud instances top out at a few terabytes of RAM and a few hundred cores. Vertical scaling also provides no fault tolerance — if the one big machine dies, the entire system goes down. And the cost curve is non-linear: doubling instance size often more than doubles the price.
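
A quick back-of-envelope calculation shows the shape of that cost curve; the per-hour prices below are invented purely for illustration, not real quotes.

    # Illustrative only: hypothetical on-demand prices, not real quotes.
    sizes = {
        # vCPUs: price per hour (made-up numbers)
        4: 0.20,
        32: 2.10,
        128: 11.50,
    }
    for vcpus, price in sizes.items():
        # Cost per vCPU-hour rises as the machine gets bigger.
        print(f"{vcpus:>4} vCPUs: ${price:.2f}/h -> ${price / vcpus:.4f} per vCPU-hour")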

When to use

Use vertical scaling for early-stage systems, single-node databases (within reason), and any workload that genuinely fits on one machine. Many companies can postpone horizontal scaling far longer than they expect by aggressively scaling vertically first.

Tradeoffs

A hardware ceiling, no fault tolerance, high cost at the upper end, and, in many setups, downtime during upgrades. Eventually, every successful system outgrows vertical scaling.

Related terms

Horizontal Scaling

Adding more machines to a system to handle increased load, as opposed to making a single machine more powerful.

Sharding

Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.
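
The core routing step is mapping a key to a shard, as in this sketch; the shard count and hash choice are arbitrary illustrations.

    # Sketch: routing a key to one of N shards by hashing.
    import hashlib

    NUM_SHARDS = 8  # illustrative shard count

    def shard_for(key: str) -> int:
        # Use a stable hash (unlike Python's built-in hash(), which is
        # randomized per process) so every node agrees on the mapping.
        digest = hashlib.md5(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % NUM_SHARDS

    print(shard_for("user:42"))  # the same key always lands on the same shard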

Caching

Storing copies of frequently accessed data in fast memory so that subsequent requests can be served without recomputing or refetching.
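
A common concrete form of this is the cache-aside pattern, sketched below with a plain dict standing in for a real cache and a hypothetical fetch_from_db function standing in for the slow source of truth.

    # Sketch: cache-aside reads. A real system would use Redis or
    # Memcached and add TTLs and invalidation.
    cache: dict[str, str] = {}

    def fetch_from_db(key: str) -> str:
        return f"value-for-{key}"  # placeholder for a real query

    def get(key: str) -> str:
        if key in cache:            # cache hit: skip the expensive fetch
            return cache[key]
        value = fetch_from_db(key)  # cache miss: fetch, then populate
        cache[key] = value
        return value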

CDN (Content Delivery Network)

A globally distributed network of edge servers that cache static content close to end users to minimize latency and origin load.

Load Balancer

A component that distributes incoming network traffic across multiple backend servers to maximize throughput, minimize response time, and avoid overload.
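
The simplest distribution policy is round-robin, sketched here over a fixed pool of placeholder backend addresses; real load balancers add health checks, weighting, and connection draining.

    # Sketch: round-robin selection over a fixed backend pool.
    import itertools

    backends = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]  # placeholders
    rotation = itertools.cycle(backends)

    def pick_backend() -> str:
        return next(rotation)  # each call advances to the next server

    for _ in range(4):
        print(pick_backend())  # 1, 2, 3, then back to 1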

Latency

The time delay between a request being sent and a response being received — typically measured in milliseconds.
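
In code, latency is usually measured by timestamping around the call with a monotonic clock, as in this sketch; the do_request function is a hypothetical placeholder for a real network call.

    # Sketch: measuring request latency with a monotonic clock.
    import time

    def do_request() -> None:
        time.sleep(0.05)  # simulate ~50 ms of work

    start = time.perf_counter()
    do_request()
    elapsed_ms = (time.perf_counter() - start) * 1000
    print(f"latency: {elapsed_ms:.1f} ms")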