SystemCity
© 2026 SystemCity. All rights reserved.

Reliability & Resilience

Graceful Degradation

Designing a system so that when a component fails, the rest of the system continues to operate with reduced functionality rather than failing completely.

In depth

Graceful degradation is the practice of designing services to fail partially rather than completely. When a recommendation service is down, the homepage still loads — just without the personalized carousel. When the comments service is slow, the article still renders — comments show a "loading…" placeholder or are hidden entirely. The user sees a worse experience but not a broken one.
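A minimal sketch of the homepage case, assuming hypothetical fetch_article and fetch_recommendations functions that stand in for real service clients. The helper call_with_fallback is also an illustrative name, not a library API. The critical call is allowed to fail loudly, while the optional call gets a timeout and an empty fallback.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool for optional dependency calls (size is an arbitrary choice).
_pool = ThreadPoolExecutor(max_workers=4)


def call_with_fallback(fn, timeout_s, fallback):
    """Run fn with a deadline; return (value, degraded).

    On timeout or any error, return the fallback and mark the result degraded.
    """
    future = _pool.submit(fn)
    try:
        return future.result(timeout=timeout_s), False
    except FutureTimeout:
        future.cancel()  # best effort; a call that is already running keeps running
        return fallback, True
    except Exception:
        return fallback, True


def fetch_article():
    # Hypothetical critical dependency.
    return {"title": "Hello", "body": "..."}


def fetch_recommendations():
    # Hypothetical optional dependency, simulated as being down.
    raise ConnectionError("recommendation service is down")


def render_homepage():
    article = fetch_article()  # critical path: errors propagate to the caller
    recs, degraded = call_with_fallback(
        fetch_recommendations, timeout_s=0.2, fallback=[]
    )
    return {"article": article, "recommendations": recs, "degraded": degraded}
```

Returning a degraded flag alongside the value lets the caller hide the carousel or emit a metric, instead of guessing whether an empty list means "no recommendations" or "service unavailable".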

Implementing graceful degradation typically combines several techniques: timeouts on every dependency, circuit breakers that trip on excess failures, fallback values (cached, default, or empty), feature flags that can disable expensive features under load, and prioritization of critical paths over nice-to-haves.
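Two of the techniques above, feature flags and prioritization, can be sketched together as a small load-shedding policy. Everything here is an assumption made for illustration: the DegradationPolicy class, the priority numbers, and the 0.7 and 0.9 load thresholds. A real system would drive the flags from its own load signal and configuration store.

```python
from dataclasses import dataclass, field


@dataclass
class DegradationPolicy:
    # Feature name -> priority. 0 is critical; higher numbers are shed first.
    priorities: dict
    disabled: set = field(default_factory=set)

    def shed_load(self, load):
        """Recompute disabled features for a load level between 0.0 and 1.0.

        Thresholds are illustrative: at 0.7 drop priority 2 and above,
        at 0.9 also drop priority 1. Priority 0 is never disabled.
        """
        if load >= 0.9:
            cutoff = 1
        elif load >= 0.7:
            cutoff = 2
        else:
            cutoff = None  # normal operation: everything stays on
        self.disabled = {
            name
            for name, priority in self.priorities.items()
            if cutoff is not None and priority >= cutoff
        }

    def enabled(self, name):
        return name not in self.disabled


policy = DegradationPolicy(
    priorities={"checkout": 0, "search": 1, "recommendations": 2}
)
policy.shed_load(0.75)
# At this load only the lowest-priority feature is switched off.
```

Request handlers would then check policy.enabled("recommendations") before doing the expensive work, and return the fallback when the feature is off.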

The alternative, in which a single dependency outage takes down the whole product, is almost always worse for users and worse for the business. Designing for partial failure is what separates systems that merely work from systems that are robust.

When to use

Build graceful degradation into every system that has independent components. The richer the product surface, the more critical it becomes.

Tradeoffs

Graceful degradation requires more code paths, more fallback content, more testing, and more thought during design. It is easy to skip and very expensive to retrofit.

Related terms

Circuit Breaker

A pattern that stops calls to a failing downstream service for a cool-off period to prevent cascading failures and give the service time to recover.
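One way to sketch this pattern is a small wrapper that counts consecutive failures and rejects calls during the cool-off. The class name, the default threshold, and the cool-off length are assumptions for illustration. Production libraries usually add richer half-open handling and thread safety.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, cooldown_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock  # injectable so the cool-off can be tested without sleeping
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("circuit open; skipping downstream call")
            # Cool-off elapsed: fall through and allow one trial call (half-open).
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Callers that want graceful degradation would catch CircuitOpenError and serve a fallback rather than surfacing the error to the user.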

Retry & Backoff

A reliability pattern that re-attempts failed operations after progressively longer delays, optionally with jitter, to ride out transient failures.
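A minimal sketch of retry with exponential backoff and jitter. The function name, the default delays, and the choice of "full jitter" (sleeping a random fraction of the capped delay) are assumptions for illustration. In practice only errors known to be transient should be retried; this sketch retries on any exception to stay short.

```python
import random
import time


def retry_with_backoff(
    fn,
    max_attempts=4,
    base_delay_s=0.1,
    max_delay_s=2.0,
    sleep=time.sleep,
    rng=random.random,
):
    """Call fn until it succeeds or max_attempts is reached.

    The delay ceiling doubles after each failure (capped at max_delay_s),
    and the actual sleep is a random fraction of that ceiling.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            ceiling = min(max_delay_s, base_delay_s * 2 ** attempt)
            sleep(ceiling * rng())
```

The sleep and rng parameters are injectable so the schedule can be checked in tests without real waiting. Jitter matters because many clients retrying on the same schedule can hit a recovering service at the same instant.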

Idempotency

A property of operations such that performing them multiple times has the same effect as performing them once — essential for safe retries.
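A common way to make a non-idempotent operation safe to retry is an idempotency key supplied by the caller. The sketch below assumes a hypothetical in-memory PaymentService; a real service would keep the key-to-result mapping in durable storage and handle concurrent requests with the same key.

```python
class PaymentService:
    def __init__(self):
        self._results = {}  # idempotency key -> receipt already returned
        self.charges = []   # side effects actually performed

    def charge(self, idempotency_key, account, amount_cents):
        # A repeated key means this is a retry: return the stored receipt
        # instead of charging the account again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append((account, amount_cents))
        receipt = {
            "charge_id": len(self.charges),
            "account": account,
            "amount_cents": amount_cents,
        }
        self._results[idempotency_key] = receipt
        return receipt


service = PaymentService()
first = service.charge("order-42", "acct-1", 1999)
retry = service.charge("order-42", "acct-1", 1999)  # e.g. after a client timeout
# Both calls return the same receipt, and only one charge was recorded.
```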

SLA, SLO, SLI

Service Level Indicator (the metric), Service Level Objective (the target), and Service Level Agreement (the contract with consequences).
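The relationship between the indicator and the objective can be shown with a small calculation. The function names and the request counts are made up for illustration, and the SLI here is assumed to be the fraction of successful requests.

```python
def availability_sli(good_requests, total_requests):
    """SLI: the measured fraction of requests that succeeded."""
    return good_requests / total_requests


def error_budget_remaining(good_requests, total_requests, slo):
    """Fraction of the error budget still unspent for this window.

    The SLO (for example 0.999) allows (1 - slo) * total_requests failures.
    A negative result means the objective has been missed.
    """
    allowed_failures = (1 - slo) * total_requests
    actual_failures = total_requests - good_requests
    return 1 - actual_failures / allowed_failures


sli = availability_sli(999_500, 1_000_000)
budget = error_budget_remaining(999_500, 1_000_000, slo=0.999)
# With 500 failures against roughly 1,000 allowed, about half the budget is left.
```

The SLA is the contractual layer on top of this: it states what happens, such as service credits, if the measured SLI falls below an agreed level.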