Data & Storage
Also known as: Blob Storage, S3
A storage architecture that manages data as objects (file + metadata + ID) in a flat namespace, optimized for huge amounts of unstructured data.
Object storage stores data as objects — blobs of bytes plus arbitrary metadata, identified by a unique key — in a flat namespace called a bucket. There are no directories in the traditional sense; the slash-separated structure of keys is a naming convention, not a real hierarchy. Object stores are accessed over HTTP via REST APIs (most famously Amazon S3) and are the storage substrate of modern cloud infrastructure.
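The point that "folders" are only a naming convention can be illustrated with a toy in-memory bucket. This is a minimal sketch, not any provider's API: the `Bucket` class and its `put` and `list` methods are hypothetical names, and the prefix-plus-delimiter listing only imitates the general behavior that real object stores expose for folder-like browsing.

```python
class Bucket:
    """Toy in-memory bucket: a flat mapping from key to (bytes, metadata)."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data, metadata=None):
        # Keys are opaque strings; "/" has no special meaning on write.
        self._objects[key] = (bytes(data), dict(metadata or {}))

    def list(self, prefix="", delimiter=None):
        """Return (keys, common_prefixes) under `prefix`.

        With a delimiter, keys that contain it after the prefix are rolled
        up into a "common prefix", which is what looks like a folder.
        """
        keys, common = [], set()
        for key in sorted(self._objects):
            if not key.startswith(prefix):
                continue
            rest = key[len(prefix):]
            if delimiter and delimiter in rest:
                common.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
            else:
                keys.append(key)
        return keys, sorted(common)


bucket = Bucket()
bucket.put("photos/2024/cat.jpg", b"...", {"content-type": "image/jpeg"})
bucket.put("photos/2024/dog.jpg", b"...")
bucket.put("photos/readme.txt", b"hello")
bucket.put("logs/app.log", b"")

top_keys, top_prefixes = bucket.list(delimiter="/")
photo_keys, photo_prefixes = bucket.list(prefix="photos/", delimiter="/")
print(top_prefixes)    # ['logs/', 'photos/']
print(photo_keys)      # ['photos/readme.txt']
print(photo_prefixes)  # ['photos/2024/']
```

Nothing in the store knows about directories: the apparent hierarchy is computed at listing time from string prefixes, which is why "renaming a folder" in an object store typically means copying every object under the old prefix.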
Object stores are designed for massive scale, high durability (S3 advertises 11 nines, i.e. 99.999999999%), and low cost per GB. They sacrifice features that file systems and block storage offer: there is no in-place modification (each write replaces the whole object), so random writes within an object are not possible, although most stores do allow byte-range reads. Consistency was also historically traded off for availability (S3 became strongly consistent only in December 2020).
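The consequence of whole-object writes is easiest to see when emulating an append. The sketch below uses a hypothetical in-memory `ObjectStore` and a hypothetical `append` helper (neither is a real client library); it assumes the only write primitive is "replace the object under this key".

```python
class ObjectStore:
    """Toy store whose only write primitive replaces the whole object."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)  # overwrites any previous content

    def get(self, key):
        return self._objects[key]


def append(store, key, extra):
    """Emulate an append as read + concatenate + full rewrite.

    Returns the number of bytes that had to be uploaded.
    """
    try:
        current = store.get(key)
    except KeyError:
        current = b""
    store.put(key, current + extra)
    return len(current) + len(extra)


store = ObjectStore()
uploaded = [append(store, "logs/app.log", line)
            for line in (b"a\n", b"bb\n", b"ccc\n")]
print(uploaded)                         # [2, 5, 9]
print(sum(uploaded))                    # 16 bytes uploaded
print(len(store.get("logs/app.log")))   # 9 bytes stored
```

Each "append" re-uploads everything written so far, so the total upload grows quadratically with the number of appends. This is one reason log pipelines usually write many immutable objects rather than growing a single one.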
Typical use cases: static assets for websites, user uploads (images, videos, documents), data lake storage for analytics, backups, and the storage tier underneath modern data warehouses (Snowflake, BigQuery) and lakehouse table formats (Apache Iceberg, Delta Lake).
Use object storage for any large, immutable, unstructured data: media files, log archives, backups, ML training data, and the static assets behind a CDN.
Object stores are not suitable for low-latency random updates, transactional workloads, or anything requiring POSIX semantics. Per-request cost can dominate for tiny-object, high-traffic workloads.
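How per-request cost can dominate is a matter of simple arithmetic. The sketch below compares the same 100 GB stored as tiny objects versus large ones. All prices are illustrative placeholders chosen for round numbers, not any provider's actual rates, and the `monthly_cost` function is a hypothetical helper that ignores data transfer and other charges.

```python
def monthly_cost(object_size_bytes, objects_stored, puts, gets,
                 storage_per_gb=0.02,    # assumed $/GB-month (placeholder)
                 put_per_1000=0.005,     # assumed $ per 1,000 writes (placeholder)
                 get_per_1000=0.0004):   # assumed $ per 1,000 reads (placeholder)
    """Return (storage_cost, request_cost) in dollars for one month."""
    gigabytes = object_size_bytes * objects_stored / 1e9
    storage = gigabytes * storage_per_gb
    requests = puts / 1000 * put_per_1000 + gets / 1000 * get_per_1000
    return storage, requests


# 100 GB as 100 million 1 KB objects, each written once and read ten times.
tiny = monthly_cost(1_000, 100_000_000, puts=100_000_000, gets=1_000_000_000)

# The same 100 GB as 1,000 objects of 100 MB, with the same access pattern.
large = monthly_cost(100_000_000, 1_000, puts=1_000, gets=10_000)

print("tiny objects:  storage $%.2f, requests $%.2f" % tiny)
print("large objects: storage $%.2f, requests $%.2f" % large)
```

Under these assumed prices the storage bill is identical in both cases, while the request bill for the tiny-object workload is several hundred times larger than its storage bill. Batching small records into larger objects is the usual mitigation.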
A globally distributed network of edge servers that cache static content close to end users to minimize latency and origin load.
A choice between relational databases with strict schemas and ACID guarantees and non-relational databases optimized for scale, flexibility, or specialized workloads.
Atomicity, Consistency, Isolation, Durability — the four properties that traditional database transactions guarantee.
A data structure (typically a B-tree or hash table) that lets a database find rows matching a query without scanning the entire table.
Intentionally duplicating data across tables to avoid expensive joins and improve read performance, at the cost of write complexity.
A database optimized for storing and querying timestamped data points — ideal for metrics, sensor data, financial ticks, and events.