Data & Storage
Intentionally duplicating data across tables to avoid expensive joins and improve read performance, at the cost of write complexity.
Denormalization is the deliberate addition of redundant data to a schema to make reads faster. In a fully normalized schema, every fact is stored in exactly one place; reads must join multiple tables to assemble the final answer. In a denormalized schema, common read shapes are pre-joined and stored together so a single query returns everything.
Denormalization is the dominant pattern in NoSQL document stores like MongoDB and DynamoDB, where joins are expensive or unsupported. It is also common in analytical schemas (star and snowflake) and in any high-traffic read path where joins have become a bottleneck.
The cost is write amplification. When a denormalized fact changes (say, a user's display name that was copied into every comment they posted), every copy must be updated: synchronously, asynchronously, or by accepting eventual consistency in the meantime. Denormalization moves complexity from the read path to the write path.
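The display-name example above can be sketched in a few lines. This is a minimal illustration with made-up data and helper names (`rename_user`, `users`, `comments_denormalized` are all hypothetical), not any particular database's API; it shows the single-lookup read a denormalized shape buys and the fan-out write it costs.

```python
# Normalized: the display name lives only on the user record, so reading
# a comment with its author's name requires a second lookup (a "join").
users = {"u1": {"display_name": "Ada"}}
comments_normalized = [{"id": "c1", "user_id": "u1", "text": "Nice post"}]

def read_comment_normalized(comment):
    # Two lookups to assemble one read.
    return {**comment, "display_name": users[comment["user_id"]]["display_name"]}

# Denormalized: the name is copied into each comment, so one lookup
# returns everything the read needs.
comments_denormalized = [
    {"id": "c1", "user_id": "u1", "display_name": "Ada", "text": "Nice post"},
    {"id": "c2", "user_id": "u1", "display_name": "Ada", "text": "Agreed"},
]

def rename_user(user_id, new_name):
    # Write amplification: one logical change fans out to every copy.
    users[user_id]["display_name"] = new_name
    for comment in comments_denormalized:
        if comment["user_id"] == user_id:
            comment["display_name"] = new_name
```

If the loop in `rename_user` misses a table that also holds a copy, the schema drifts into inconsistency, which is exactly the risk the entry below describes.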
Denormalize when reads dominate writes by orders of magnitude, when joins are too slow, when using a NoSQL store that does not join efficiently, or when designing for a specific access pattern at scale.
Denormalization complicates updates, can lead to inconsistency if any update path is missed, and grows storage. Reversing the decision is hard once consumers depend on the duplicated shape.
A choice between relational databases with strict schemas and ACID guarantees and non-relational databases optimized for scale, flexibility, or specialized workloads.
Splitting a large dataset across multiple machines so that each shard holds a subset of the data and handles a subset of the load.
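One common way to assign rows to shards is to hash a shard key and take it modulo the shard count. The sketch below is an illustration of that idea only (the names `shard_for`, `put`, `get`, and the in-memory `shards` list are hypothetical); real systems typically use consistent hashing or range partitioning so that adding a shard does not remap most keys.

```python
import hashlib

NUM_SHARDS = 4
# Stand-in for four separate machines, each holding a subset of the data.
shards = [dict() for _ in range(NUM_SHARDS)]

def shard_for(key: str) -> int:
    # A stable hash of the shard key picks the shard, spreading both
    # data and load across the cluster.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

def put(key: str, value) -> None:
    shards[shard_for(key)][key] = value

def get(key: str):
    # Each read touches only the one shard that owns the key.
    return shards[shard_for(key)].get(key)
```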
A consistency model where, given enough time and no new updates, all replicas of a piece of data will converge to the same value.
Atomicity, Consistency, Isolation, Durability — the four properties that traditional database transactions guarantee.
A data structure (typically a B-tree or hash table) that lets a database find rows matching a query without scanning the entire table.
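The scan-versus-index difference can be shown with an in-memory analogue. This sketch uses a hash map rather than a B-tree, and the row data and function names are invented for illustration; the point is that the index answers a lookup without touching every row, at the cost of extra work on each write to keep it current.

```python
from collections import defaultdict

rows = [
    {"id": 1, "email": "ada@example.com"},
    {"id": 2, "email": "grace@example.com"},
]

def find_by_email_scan(email):
    # Without an index: O(n), examine every row in the table.
    return [r for r in rows if r["email"] == email]

# Hash-style index on the email column: built once, then maintained
# on every insert/update/delete.
email_index = defaultdict(list)
for r in rows:
    email_index[r["email"]].append(r)

def find_by_email_indexed(email):
    # With the index: a single lookup instead of a full scan.
    return email_index.get(email, [])
```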
A storage architecture that manages data as objects (file + metadata + ID) in a flat namespace, optimized for huge amounts of unstructured data.