SystemCity
© 2026 SystemCity. All rights reserved.


Scalability & Performance

Load Balancer

A component that distributes incoming network traffic across multiple backend servers to maximize throughput, minimize response time, and avoid overload.

In depth

A load balancer sits in front of a pool of backend servers and routes each incoming request to one of them according to an algorithm. Common algorithms include round-robin, least-connections, weighted round-robin, IP hash (for sticky sessions), and consistent hashing.
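As a minimal sketch of two of these selection policies (backend names are invented for illustration), round-robin can be expressed as a simple cycle and least-connections as a pick over per-backend counters:

```python
import itertools

# Hypothetical backend pool; the host names are illustrative only.
backends = ["app1:8080", "app2:8080", "app3:8080"]

# Round-robin: hand out backends in a fixed rotation.
rr = itertools.cycle(backends)

def pick_round_robin() -> str:
    return next(rr)

# Least-connections: track open connections per backend and route
# each new request to the currently least-loaded one.
open_conns = {b: 0 for b in backends}

def pick_least_connections() -> str:
    backend = min(open_conns, key=open_conns.get)
    open_conns[backend] += 1  # the caller decrements when the connection closes
    return backend
```

Round-robin is stateless and fair under uniform request cost; least-connections adapts when some requests are much slower than others, at the price of tracking connection state.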

Load balancers operate at different OSI layers. Layer 4 (transport) load balancers route based on IP and port — fast but unaware of HTTP semantics. Layer 7 (application) load balancers like NGINX, HAProxy, and AWS ALB inspect HTTP headers, paths, and cookies, allowing path-based routing, header-based routing, and request-level features like rate limiting, TLS termination, and request rewriting.
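Path-based routing is only possible at layer 7, because the balancer must parse the HTTP request line. A toy routing table (the paths and pool names below are assumptions, not from any real config) that picks the most specific matching prefix, much like NGINX `location` blocks do, might look like:

```python
# Toy layer-7 routing table: map URL-path prefixes to backend pools.
routes = [
    ("/api/", ["api1:9000", "api2:9000"]),
    ("/static/", ["cdn-origin:8081"]),
    ("/", ["web1:8080", "web2:8080"]),
]

def route(path: str) -> list[str]:
    # Choose the longest (most specific) prefix that matches the path,
    # analogous to how an L7 proxy selects among location rules.
    candidates = [r for r in routes if path.startswith(r[0])]
    prefix, pool = max(candidates, key=lambda r: len(r[0]))
    return pool
```

An L4 balancer cannot do this: it sees only a TCP connection to an IP and port, never the `/api/` in the request.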

A properly designed load balancer enables horizontal scaling, performs health checks to remove unhealthy nodes from rotation, and provides a single virtual IP / DNS name that hides the topology of the backend.
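The health-check behavior can be sketched as a pool of flags consulted on every pick; `mark_health` here stands in for a real periodic probe (e.g. an HTTP `GET /healthz`, typically requiring several consecutive failures before ejecting a node):

```python
import itertools

# Hypothetical pool; True means the backend is in rotation.
pool = {"app1:8080": True, "app2:8080": True, "app3:8080": True}

def mark_health(backend: str, healthy: bool) -> None:
    # A real balancer updates this from active probes or passive
    # error rates, rather than an explicit call like this.
    pool[backend] = healthy

def healthy_backends() -> list[str]:
    return [b for b, ok in pool.items() if ok]

_counter = itertools.count()

def pick() -> str:
    # Round-robin over only the currently healthy backends.
    live = healthy_backends()
    return live[next(_counter) % len(live)]
```

Clients keep hitting the same virtual address; the shrinking and growing of the live pool behind it is invisible to them.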

When to use

Any time you have more than one backend server handling the same traffic. The first scaling step in almost every architecture is to put a load balancer in front of two or more app servers.

Tradeoffs

A single load balancer is itself a single point of failure, so production systems run them in active/active or active/passive HA pairs. Sticky sessions improve cache locality but reduce routing flexibility and can create hot servers.
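The sticky-session hot-server risk can be seen in an IP-hash sketch: the same client IP always lands on the same backend, so if many clients happen to hash to one server (or sit behind one NAT address), that server takes disproportionate load. The backend names are illustrative.

```python
import hashlib

backends = ["app1:8080", "app2:8080", "app3:8080"]

def sticky_pick(client_ip: str) -> str:
    # Hash the client IP and map it onto the pool. Deterministic:
    # one IP -> one backend, which is what makes the session "sticky"
    # and also what can concentrate load on a single hot server.
    digest = hashlib.sha256(client_ip.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

Note also that this naive modulo mapping reshuffles most clients whenever the pool size changes; consistent hashing (see Related terms) exists precisely to limit that reshuffling.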

Related terms

Horizontal Scaling

Adding more machines to a system to handle increased load, as opposed to making a single machine more powerful.

Reverse Proxy

A server that sits in front of one or more backend servers and forwards client requests to them, often handling TLS, caching, compression, and load balancing.

Consistent Hashing

A hashing technique that minimizes the amount of data that needs to be moved when nodes are added to or removed from a distributed system.

Caching

Storing copies of frequently accessed data in fast memory so that subsequent requests can be served without recomputing or refetching.

CDN (Content Delivery Network)

A globally distributed network of edge servers that cache static content close to end users to minimize latency and origin load.

Vertical Scaling

Increasing the capacity of a single machine — more CPU, memory, or disk — to handle more load.