Data & Storage
Intentionally duplicating data across tables to avoid expensive joins and improve read performance, at the cost of write complexity.
Denormalization is the deliberate addition of redundant data to a schema to make reads faster. In a fully normalized schema, every fact is stored in exactly one place; reads must join multiple tables to assemble the final answer. In a denormalized schema, common read shapes are pre-joined and stored together so a single query returns everything.
Denormalization is the dominant pattern in NoSQL document stores like MongoDB and DynamoDB, where joins are expensive or unsupported. It is also common in analytical schemas (star and snowflake) and in any high-traffic read path where joins have become a bottleneck.
The cost is write amplification. When a denormalized fact changes (say, a user's display name that was copied into every comment they posted), every copy must be updated: synchronously, asynchronously, or by accepting eventual consistency in the meantime. Denormalization moves complexity from the read path to the write path.
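The display-name example above can be sketched in a few lines. This is a minimal illustration with made-up data and helper names (`rename_user`, `users`, `comments_denormalized` are all hypothetical), not any particular database's API; it shows the single-lookup read a denormalized shape buys and the fan-out write it costs.

```python
# Normalized: the display name lives only on the user record, so reading
# a comment with its author's name requires a second lookup (a "join").
users = {"u1": {"display_name": "Ada"}}
comments_normalized = [{"id": "c1", "user_id": "u1", "text": "Nice post"}]

def read_comment_normalized(comment):
    # Two lookups to assemble one read.
    return {**comment, "display_name": users[comment["user_id"]]["display_name"]}

# Denormalized: the name is copied into each comment, so one lookup
# returns everything the read needs.
comments_denormalized = [
    {"id": "c1", "user_id": "u1", "display_name": "Ada", "text": "Nice post"},
    {"id": "c2", "user_id": "u1", "display_name": "Ada", "text": "Agreed"},
]

def rename_user(user_id, new_name):
    # Write amplification: one logical change fans out to every copy.
    users[user_id]["display_name"] = new_name
    for comment in comments_denormalized:
        if comment["user_id"] == user_id:
            comment["display_name"] = new_name
```

If the loop in `rename_user` misses a table that also holds a copy, the schema drifts into inconsistency, which is exactly the risk the entry below describes.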
Denormalize when reads dominate writes by orders of magnitude, when joins are too slow, when using a NoSQL store that does not join efficiently, or when designing for a specific access pattern at scale.
Denormalization complicates updates, can lead to inconsistency if any update path is missed, and grows storage. Reversing the decision is hard once consumers depend on the duplicated shape.
A choice between relational databases with strict schemas and ACID guarantees and non-relational databases optimized for scale, flexibility, or specialized workloads.
Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.
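One common way to assign rows to shards is to hash a shard key and take it modulo the shard count. The sketch below is an illustration of that idea only (the names `shard_for`, `put`, `get`, and the in-memory `shards` list are hypothetical); real systems typically use consistent hashing or range partitioning so that adding a shard does not remap most keys.

```python
import hashlib

NUM_SHARDS = 4
# Stand-in for four separate machines, each holding a subset of the data.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    # A stable hash of the shard key picks the shard, spreading both
    # data and load across the cluster.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key: str, value) -> None:
    shards[shard_for(key)][key] = value

def get(key: str):
    # Each read touches only the one shard that owns the key.
    return shards[shard_for(key)].get(key)
```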
A consistency model where, given enough time and no new updates, all replicas of a piece of data will converge to the same value.
Atomicity, Consistency, Isolation, Durability — the four properties that traditional database transactions guarantee.
A data structure (typically a B-tree or hash table) that lets a database find rows matching a query without scanning the entire table.
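The scan-versus-index difference can be shown with an in-memory analogue. This sketch uses a hash map rather than a B-tree, and the row data and function names are invented for illustration; the point is that the index answers a lookup without touching every row, at the cost of extra work on each write to keep it current.

```python
from collections import defaultdict

rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "grace@example.com"},
]

def find_by_email_scan(email):
    # Without an index: O(n), examine every row in the table.
    return [r for r in rows if r["email"] == email]

# Hash-style index on the email column: built once, then maintained
# on every insert/update/delete.
email_index = defaultdict(list)
for r in rows:
    email_index[r["email"]].append(r)

def find_by_email_indexed(email):
    # With the index: a single lookup instead of a full scan.
    return email_index.get(email, [])
```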
A storage architecture that manages data as objects (file + metadata + ID) in a flat namespace, optimized for huge amounts of unstructured data.