SystemCity
© 2026 SystemCity. All rights reserved.


Reliability & Resilience

Circuit Breaker

A pattern that stops calls to a failing downstream service for a cool-off period to prevent cascading failures and give the service time to recover.

In depth

A circuit breaker wraps calls to a downstream service. While the breaker is closed, calls pass through normally. When failures exceed a threshold (e.g., 50% of calls in the last 10 seconds), the breaker "trips" into the open state, and subsequent calls fail immediately without contacting the downstream service. After a cool-off period, the breaker enters a half-open state and lets a limited number of calls through to test recovery. If they succeed, the breaker closes and normal traffic resumes; if they fail, the breaker reopens and the cool-off period starts again.
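The closed, open, and half-open states described above can be sketched as a small state machine. This is a minimal, single-threaded illustration and not the implementation of any particular library: the names `CircuitBreaker`, `CircuitOpenError`, `failure_rate`, `window_s`, `min_calls`, and `cool_off_s` are all hypothetical, the defaults are arbitrary, and the half-open state is simplified to a single trial call.

```python
import time
from collections import deque


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without contacting the downstream."""


class CircuitBreaker:
    # All parameter names and defaults are illustrative assumptions.
    def __init__(self, failure_rate=0.5, window_s=10.0, min_calls=4,
                 cool_off_s=5.0, clock=time.monotonic):
        self.failure_rate = failure_rate  # trip when the failure ratio exceeds this
        self.window_s = window_s          # sliding window length in seconds
        self.min_calls = min_calls        # do not judge on too few samples
        self.cool_off_s = cool_off_s      # time spent open before a trial call
        self.clock = clock                # injectable so tests can control time
        self.state = "closed"
        self.opened_at = 0.0
        self.results = deque()            # (timestamp, succeeded) of recent calls

    def call(self, fn, *args, **kwargs):
        now = self.clock()
        if self.state == "open":
            if now - self.opened_at < self.cool_off_s:
                raise CircuitOpenError("failing fast; downstream not contacted")
            self.state = "half_open"      # cool-off elapsed: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record(now, succeeded=False)
            raise
        self._record(now, succeeded=True)
        return result

    def _record(self, now, succeeded):
        if self.state == "half_open":
            # Simplification: one trial call decides the next state.
            if succeeded:
                self.state = "closed"
                self.results.clear()
            else:
                self._trip(now)
            return
        self.results.append((now, succeeded))
        # Drop results that have fallen out of the sliding window.
        while self.results and now - self.results[0][0] > self.window_s:
            self.results.popleft()
        failures = sum(1 for _, ok in self.results if not ok)
        if (len(self.results) >= self.min_calls
                and failures / len(self.results) > self.failure_rate):
            self._trip(now)

    def _trip(self, now):
        self.state = "open"
        self.opened_at = now
        self.results.clear()
```

A caller would wrap each downstream request, for example `breaker.call(fetch_profile, user_id)` with a hypothetical `fetch_profile` function, and catch `CircuitOpenError` to return a fallback response. A production breaker would also need thread safety and a configurable number of half-open trial calls, which this sketch leaves out.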

The pattern prevents two failure modes. First, cascading failure: a slow downstream chokes the upstream service's thread pool, which then chokes its upstream, until the whole system is gridlocked. Second, retry storms: every client retrying a failing service drives load even higher and prevents recovery.

Libraries such as Netflix Hystrix (now in maintenance mode), Resilience4j, and Polly implement circuit breakers inside the application, while service meshes such as Istio and Linkerd apply them at the sidecar proxy, making circuit breaking a configuration concern rather than application code.

When to use

Wrap every cross-service call with a circuit breaker, especially in microservice architectures where one slow service can drag down many others.

Tradeoffs

Circuit breakers add behavior that must be tested and tuned. Fallback responses may confuse users. A breaker that trips too eagerly rejects calls to a healthy service (false positives); one that trips too late lets failures cascade, which defeats the purpose.

Related terms

Retry & Backoff

A reliability pattern that re-attempts failed operations after progressively longer delays, optionally with jitter, to ride out transient failures.

Rate Limiting

A control mechanism that caps the number of requests a client can make in a given time window to protect a service from abuse and overload.

Graceful Degradation

Designing a system so that when a component fails, the rest of the system continues to operate with reduced functionality rather than failing completely.

Idempotency

A property of operations such that performing them multiple times has the same effect as performing them once — essential for safe retries.

SLA, SLO, SLI

Service Level Indicator (the metric), Service Level Objective (the target), and Service Level Agreement (the contract with consequences).