SystemCity
AI-powered system design tutor. Learn architecture, ace interviews, build real systems.

© 2026 SystemCity. All rights reserved.


Scalability & Performance

Rate Limiting

Also known as: Throttling

A control mechanism that caps the number of requests a client can make in a given time window to protect a service from abuse and overload.

In depth

Rate limiting restricts how many requests a single client (identified by user ID, API key, IP address, or any combination) can make within a sliding or fixed time window. It is the primary defense against abusive clients, runaway scripts, and accidental request storms.

Common algorithms include:

  • Token bucket: tokens regenerate at a fixed rate and each request consumes one; unused tokens accumulate, allowing short bursts.
  • Leaky bucket: requests queue and drain at a fixed rate, smoothing traffic.
  • Fixed window: count requests per window (e.g. per minute) and reset at window boundaries; simple, but a burst straddling a boundary can briefly admit up to twice the limit.
  • Sliding window: a smoother variant that weights the previous window's count to avoid those edge bursts.
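The token bucket can be sketched in a few lines. This is an illustrative single-process version (class and parameter names are hypothetical, not part of any particular library): the bucket refills continuously based on elapsed time, and a request is allowed only if a whole token is available.

```python
import time

class TokenBucket:
    """Token bucket: `rate` tokens regenerate per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)      # start full, so bursts are allowed
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=5, capacity=10)
results = [bucket.allow() for _ in range(12)]
# The first 10 back-to-back calls drain the initial burst capacity;
# later calls succeed only as tokens regenerate (5 per second here).
```

Setting `capacity` above `rate` is what distinguishes this from a strict per-second cap: clients can burst up to `capacity` requests, then settle to the sustained `rate`.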

In a distributed system, rate limiting state must be shared across all nodes. Redis is a popular backend because of its single-threaded atomic operations like INCR and EXPIRE. Some systems push rate limiting to the edge (CDN, API gateway) so that abusive traffic is rejected before it reaches origin.
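The fixed-window-over-Redis pattern described above can be sketched as follows. To keep the example self-contained, a plain dict stands in for the shared Redis instance (the key scheme and method names are illustrative assumptions); the comments show the Redis operations each step corresponds to.

```python
import time

class FixedWindowLimiter:
    """Fixed-window counter keyed by (client, window).

    A dict stands in for shared storage here; against real Redis the
    increment would be an atomic INCR on the per-window key, with EXPIRE
    set on first touch so stale window keys are garbage-collected.
    """

    def __init__(self, limit: int, window_seconds: int):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # stand-in for a shared Redis instance

    def allow(self, client_id, now=None) -> bool:
        now = time.time() if now is None else now
        window_id = int(now) // self.window
        key = f"rl:{client_id}:{window_id}"
        # Redis equivalent:
        #   count = INCR key
        #   if count == 1: EXPIRE key window_seconds
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.limit

limiter = FixedWindowLimiter(limit=3, window_seconds=60)
decisions = [limiter.allow("user42", now=100.0) for _ in range(5)]
# first 3 requests in the window pass; the 4th and 5th are rejected
```

Because INCR and EXPIRE execute atomically on Redis's single thread, every application node sees the same count without any locking on the application side.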

When to use

Apply rate limiting to every public API, every login endpoint, every expensive operation, and any endpoint that triggers downstream side effects (email, SMS, payment).

Tradeoffs

Aggressive rate limiting frustrates legitimate users; lax rate limiting leaves you exposed. Distributed rate limiting requires coordination, adding latency. Per-user limits can be circumvented by clients using many accounts or IPs.

Related terms

API Gateway

A single entry point that routes external requests to internal services, handling concerns like authentication, rate limiting, and request transformation in one place.

Circuit Breaker

A pattern that stops calls to a failing downstream service for a cool-off period to prevent cascading failures and give the service time to recover.

Idempotency

A property of operations such that performing them multiple times has the same effect as performing them once — essential for safe retries.

Caching

Storing copies of frequently accessed data in fast memory so that subsequent requests can be served without recomputing or refetching.

CDN (Content Delivery Network)

A globally distributed network of edge servers that cache static content close to end users to minimize latency and origin load.

Load Balancer

A component that distributes incoming network traffic across multiple backend servers to maximize throughput, minimize response time, and avoid overload.

Practice this concept

Medium · Infrastructure

Design an API Rate Limiter