Scalability & Performance
Throughput is the number of operations a system can handle per unit of time, often measured in requests per second (RPS) or queries per second (QPS).
Throughput measures the rate at which a system processes work. Common units include requests per second, transactions per second, messages per second, or bytes per second. Throughput and latency are related but distinct: a system can have high throughput and high latency simultaneously (think of a long assembly line that produces a finished product every second but takes an hour for any single item to traverse).
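The assembly-line picture can be made concrete with Little's Law, which says that the average number of items in a system equals throughput multiplied by latency. The sketch below is only an illustration: the function name `items_in_flight` and the variable names are assumptions, and the numbers are the ones from the analogy above.

```python
def items_in_flight(throughput_per_s: float, latency_s: float) -> float:
    """Little's Law: average items in the system = throughput * latency."""
    return throughput_per_s * latency_s


# The assembly line from the text: one finished item per second,
# one hour for any single item to traverse the line.
line_throughput = 1.0      # items per second
line_latency = 60 * 60     # seconds per item

print(items_in_flight(line_throughput, line_latency))  # 3600.0
```

High throughput with high latency simply means many items are in progress at once: here, 3,600 items sit on the line at any moment.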
Throughput is bounded by the slowest stage in the pipeline. A web service might be CPU-bound, network-bound, database-bound, or I/O-bound, and the bottleneck shifts as you scale. Profiling and load testing are essential for identifying which resource is the current bottleneck.
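A minimal sketch of the "slowest stage" rule, assuming each stage can be summarized by a single capacity figure. The stage names and numbers are invented for illustration; real figures would come from profiling and load tests.

```python
# Hypothetical per-stage capacities in requests per second.
stage_capacity_rps = {
    "load_balancer": 50_000,
    "app_server": 8_000,
    "database": 3_000,
    "disk_io": 12_000,
}


def bottleneck(stages: dict[str, int]) -> tuple[str, int]:
    """Return the slowest stage and its capacity, which caps end-to-end throughput."""
    name = min(stages, key=stages.get)
    return name, stages[name]


print(bottleneck(stage_capacity_rps))  # ('database', 3000)

# Scale the database and the bottleneck moves to the next-slowest stage.
scaled = {**stage_capacity_rps, "database": 9_000}
print(bottleneck(scaled))  # ('app_server', 8000)
```

Tripling database capacity raises end-to-end throughput only to 8,000 RPS, because the application tier becomes the new limit.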
For system design interviews, you should be able to do back-of-envelope calculations: a single modern server can typically handle a few thousand to tens of thousands of requests per second for simple workloads, and far fewer for complex ones.
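One way such an estimate might look in code is sketched below. Every input is an assumption chosen for illustration: the daily request count, the peak-to-average factor, the per-server capacity, and the `headroom` fraction (how much of a server's capacity you are willing to use) are not taken from any real system.

```python
import math

SECONDS_PER_DAY = 86_400


def servers_needed(daily_requests: float, peak_factor: float,
                   per_server_rps: float, headroom: float = 0.7) -> int:
    """Rough server count: peak load divided by usable per-server capacity."""
    avg_rps = daily_requests / SECONDS_PER_DAY
    peak_rps = avg_rps * peak_factor           # traffic is not evenly spread
    usable_rps = per_server_rps * headroom     # leave spare capacity for spikes
    return math.ceil(peak_rps / usable_rps)


# 1 billion requests/day, peak 3x the average, 5,000 RPS per server.
print(servers_needed(1_000_000_000, 3, 5_000))  # 10
```

The intermediate values are worth saying aloud in an interview: about 11,600 RPS on average, about 34,700 RPS at peak, and 3,500 usable RPS per server.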
Throughput is the primary scaling metric for batch systems, message queues, ingestion pipelines, and write-heavy databases.
Maximizing throughput often hurts latency: batching and pipelining trade individual response time for aggregate work done. Choose the point on this trade-off that suits your workload.
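The batching trade-off can be seen in a toy cost model, assuming each batch pays a fixed overhead plus a per-item cost and that an item's result is available only when its whole batch finishes. The function name and the default costs (10 ms overhead, 1 ms per item) are hypothetical, and the model ignores the time spent waiting for a batch to fill.

```python
def batch_stats(batch_size: int, overhead_ms: float = 10.0,
                per_item_ms: float = 1.0) -> tuple[float, float]:
    """Return (throughput in items/s, per-item latency in ms) for one batch size."""
    batch_time_ms = overhead_ms + batch_size * per_item_ms
    throughput_per_s = batch_size / batch_time_ms * 1000
    return throughput_per_s, batch_time_ms


for size in (1, 10, 100):
    throughput, latency = batch_stats(size)
    print(f"batch={size:>3}  throughput={throughput:7.1f}/s  latency={latency:6.1f} ms")
```

Under these assumptions, larger batches amortize the fixed overhead, so throughput rises toward the per-item limit of 1,000 items per second while every item waits longer for its batch to complete.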
Latency is the time delay between a request being sent and a response being received, typically measured in milliseconds.
A load balancer is a component that distributes incoming network traffic across multiple backend servers to maximize throughput, minimize response time, and avoid overloading any single server.
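Round robin is one of the simplest distribution strategies and serves as a minimal sketch of the idea. The class name `RoundRobinBalancer` and the backend names are hypothetical; production balancers also track health, weights, and connection counts, none of which is modeled here.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(backends)

    def pick(self) -> str:
        """Return the backend that should receive the next request."""
        return next(self._rotation)


lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([lb.pick() for _ in range(5)])
# ['app-1', 'app-2', 'app-3', 'app-1', 'app-2']
```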
Caching means storing copies of frequently accessed data in fast memory so that subsequent requests can be served without recomputing or refetching.
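As a small in-process illustration, Python's standard `functools.lru_cache` keeps recent results in memory and evicts the least recently used entry once `maxsize` is reached. The function `load_profile` and the `calls` counter are hypothetical stand-ins for an expensive lookup such as a database query.

```python
from functools import lru_cache

calls = 0  # counts how often the "expensive" work actually runs


@lru_cache(maxsize=128)
def load_profile(user_id: int) -> str:
    """Pretend to fetch a profile; only cache misses reach this body."""
    global calls
    calls += 1
    return f"profile-for-user-{user_id}"


load_profile(1)   # miss: does the work
load_profile(1)   # hit: served from the cache
load_profile(2)   # miss: different key

print(calls)                      # 2
print(load_profile.cache_info())  # hits=1, misses=2, maxsize=128, currsize=2
```

The same pattern applies to shared caches in front of a database, with the added concerns of invalidation and staleness that a per-process cache like this one does not address.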
A content delivery network (CDN) is a globally distributed network of edge servers that cache static content close to end users to minimize latency and origin load.
Horizontal scaling (scaling out) means adding more machines to a system to handle increased load, as opposed to making a single machine more powerful.
Vertical scaling (scaling up) means increasing the capacity of a single machine (more CPU, memory, or disk) to handle more load.