Scalability & Performance
Also known as: Throttling
A control mechanism that caps the number of requests a client can make in a given time window to protect a service from abuse and overload.
Rate limiting restricts how many requests a single client (identified by user ID, API key, IP address, or any combination) can make within a sliding or fixed time window. It is the primary defense against abusive clients, runaway scripts, and accidental request storms.
Common algorithms include the token bucket (tokens regenerate at a fixed rate and each request consumes one, allowing short bursts up to the bucket's capacity), the leaky bucket (requests queue and drain at a fixed rate), the fixed window (count requests per window, e.g. per minute, and reset the count at each window boundary), and the sliding window (a smoother variant that avoids the doubled burst a client can get by straddling a fixed-window boundary).
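A minimal in-process token bucket sketch in Python; the `TokenBucket` name and `allow` method are illustrative, not taken from any particular library:

```python
import time

class TokenBucket:
    """Illustrative token bucket: `rate` tokens regenerate per second,
    up to `capacity`; each request consumes one token."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate            # tokens added per second
        self.capacity = capacity   # maximum burst size
        self.tokens = capacity     # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Regenerate tokens for the elapsed interval, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Usage: allow a steady 5 requests/second with bursts of up to 10.
bucket = TokenBucket(rate=5, capacity=10)
if not bucket.allow():
    print("429 Too Many Requests")
```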
In a distributed system, rate-limiting state must be shared across all nodes. Redis is a popular backend because its single-threaded execution model makes commands such as INCR and EXPIRE atomic. Some systems push rate limiting to the edge (CDN, API gateway) so that abusive traffic is rejected before it ever reaches the origin.
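A sketch of a shared fixed-window counter on Redis using the redis-py client; the key scheme and default limits are assumptions, and production systems often prefer a Lua script or a sliding-window variant:

```python
import time
import redis  # assumes the redis-py client is installed

r = redis.Redis()  # assumes a reachable Redis instance

def allow(client_id: str, limit: int = 100, window: int = 60) -> bool:
    """Fixed-window counter: at most `limit` requests per `window` seconds.
    The pipeline wraps INCR and EXPIRE in a MULTI/EXEC transaction, so the
    pair executes atomically on the server."""
    # Hypothetical key scheme: one counter per client per window number.
    key = f"rl:{client_id}:{int(time.time()) // window}"
    pipe = r.pipeline()
    pipe.incr(key)
    pipe.expire(key, window)  # cleanup; the key changes each window anyway
    count, _ = pipe.execute()
    return count <= limit
```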
Apply rate limiting to every public API, every login endpoint, every expensive operation, and any endpoint that triggers downstream side effects (email, SMS, payment).
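One way to apply limits uniformly across such endpoints is a decorator that wraps each handler; this is a hypothetical sketch assuming the `TokenBucket` class from the earlier example is in scope:

```python
from functools import wraps

def rate_limited(rate: float, capacity: float):
    """Attach an independent TokenBucket (defined above) to a handler."""
    bucket = TokenBucket(rate=rate, capacity=capacity)

    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            if not bucket.allow():
                # Reject instead of calling the handler at all.
                return {"status": 429, "error": "rate limit exceeded"}
            return handler(*args, **kwargs)
        return wrapper
    return decorator

# Login triggers expensive checks and lockout logic, so it gets a tight limit.
@rate_limited(rate=1, capacity=5)
def login(username: str, password: str):
    ...
```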
Aggressive rate limiting frustrates legitimate users; lax rate limiting leaves you exposed. Distributed rate limiting requires coordination, adding latency. Per-user limits can be circumvented by clients using many accounts or IPs.
Related concepts:
API gateway: a single entry point that routes external requests to internal services, handling concerns like authentication, rate limiting, and request transformation in one place.
Circuit breaker: a pattern that stops calls to a failing downstream service for a cool-off period to prevent cascading failures and give the service time to recover.
Idempotency: a property of operations such that performing them multiple times has the same effect as performing them once — essential for safe retries.
Caching: storing copies of frequently accessed data in fast memory so that subsequent requests can be served without recomputing or refetching.
CDN (content delivery network): a globally distributed network of edge servers that cache static content close to end users to minimize latency and origin load.
Load balancer: a component that distributes incoming network traffic across multiple backend servers to maximize throughput, minimize response time, and avoid overload.