Data & Storage
Also known as: Indexing, B-tree Index
A data structure (typically a B-tree or hash table) that lets a database find rows matching a query without scanning the entire table.
A database index is a separate data structure that maps the values of one or more columns to the physical location of the matching rows. Without an index, a query that filters on a column must scan every row in the table, which is O(N). With an index, a lookup is typically O(log N) for a B-tree or O(1) on average for a hash index.
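The three lookup costs can be illustrated with a small in-memory sketch. This is not how a database engine stores data: the `rows` list, the dictionary standing in for a hash index, and the sorted list standing in for a B-tree are all assumptions made for illustration.

```python
import bisect

# Hypothetical table: (id, email) rows kept in insertion order.
rows = [(i, f"user{i}@example.com") for i in range(1000)]

def full_scan(email):
    """No index: compare every row until one matches, O(N)."""
    for pos, (_, row_email) in enumerate(rows):
        if row_email == email:
            return pos
    return None

# Hash-style index: email -> row position, O(1) on average.
hash_index = {email: pos for pos, (_, email) in enumerate(rows)}

# Sorted list standing in for a B-tree: binary search is O(log N).
sorted_index = sorted((email, pos) for pos, (_, email) in enumerate(rows))
sorted_keys = [email for email, _ in sorted_index]

def sorted_lookup(email):
    """Binary-search the sorted keys, then return the stored row position."""
    i = bisect.bisect_left(sorted_keys, email)
    if i < len(sorted_keys) and sorted_keys[i] == email:
        return sorted_index[i][1]
    return None

target = "user742@example.com"
print(full_scan(target), hash_index.get(target), sorted_lookup(target))
```

All three approaches return the same row position; they differ only in how many comparisons it takes to get there.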
Most relational databases use B-tree indexes by default because they efficiently support equality queries, range queries, and sorted output. Hash indexes can be faster for exact-match lookups but cannot serve range queries. Specialized indexes include GIN and GiST (PostgreSQL, for full-text search and geospatial data), bitmap indexes (analytics warehouses), and inverted indexes (Elasticsearch, for text search).
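Why an ordered index can answer range queries while a hash index cannot is easier to see in a toy example. The sketch below assumes a hypothetical `ages` column and uses a sorted list as a stand-in for a B-tree; real engines use on-disk tree pages rather than Python lists.

```python
import bisect

# Hypothetical column values, indexed by row position.
ages = [34, 19, 52, 27, 41, 19, 63, 30]

# Ordered index: (value, row position) pairs kept sorted by value.
ordered = sorted((age, pos) for pos, age in enumerate(ages))
keys = [age for age, _ in ordered]

def range_lookup(low, high):
    """Row positions with low <= age <= high, located by two binary searches."""
    start = bisect.bisect_left(keys, low)
    end = bisect.bisect_right(keys, high)
    return [pos for _, pos in ordered[start:end]]

# Hash-style index: value -> list of row positions. Keys have no useful order,
# so a range query would have to inspect every key.
hash_index = {}
for pos, age in enumerate(ages):
    hash_index.setdefault(age, []).append(pos)

print(range_lookup(25, 40))  # rows whose age falls in [25, 40], in age order
print(hash_index[19])        # exact match works well with the hash index
```

Because matching entries sit next to each other in the ordered index, the same structure also yields rows already sorted by the indexed column.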
Indexes are not free. Every index adds disk space and slows down writes (each insert, update, or delete must update every relevant index). The right number of indexes is enough to make important queries fast and no more — a common production performance issue is a table with too many redundant indexes.
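The write cost can be sketched with a toy table that counts how many index entries each insert touches. The `Table` class and its `index_writes` counter are hypothetical names for this illustration; a real engine also pays for page splits, logging, and cache pressure, which this sketch ignores.

```python
class Table:
    """Toy table where every insert must also update each index."""

    def __init__(self, indexed_columns):
        self.rows = []
        # One dictionary per indexed column: value -> list of row positions.
        self.indexes = {col: {} for col in indexed_columns}
        self.index_writes = 0

    def insert(self, row):
        pos = len(self.rows)
        self.rows.append(row)
        # The extra work grows with the number of indexes on the table.
        for col, index in self.indexes.items():
            index.setdefault(row[col], []).append(pos)
            self.index_writes += 1

narrow = Table(["email"])
wide = Table(["email", "name", "city", "signup_day"])

for i in range(100):
    row = {
        "email": f"user{i}@example.com",
        "name": f"User {i}",
        "city": "Lisbon" if i % 2 else "Oslo",
        "signup_day": i % 7,
    }
    narrow.insert(row)
    wide.insert(row)

print(narrow.index_writes, wide.index_writes)
```

The same 100 inserts cost four times as many index updates on the table with four indexes, which is the overhead a redundant index adds without speeding up any query.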
Add an index for any column you frequently filter, join, or sort by — but only after measuring. Drop indexes that no query uses.
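One way to measure is to compare the query plan before and after creating the index. The sketch below uses SQLite through Python's standard `sqlite3` module as a convenient stand-in; the `users` table, the `idx_users_email` index, and the `query_plan` helper are hypothetical, and other databases expose similar information through their own `EXPLAIN` commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (email, name) VALUES (?, ?)",
    [(f"user{i}@example.com", f"User {i}") for i in range(1000)],
)

def query_plan(sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

sql = "SELECT * FROM users WHERE email = ?"
args = ("user42@example.com",)

plan_before = query_plan(sql, args)  # expected to report a full table scan
conn.execute("CREATE INDEX idx_users_email ON users (email)")
plan_after = query_plan(sql, args)   # expected to report a search using the index

print(plan_before)
print(plan_after)
```

The exact wording of the plan text varies between SQLite versions, so it is safer to look for the index name in the output than to match the whole string.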
Indexes trade write throughput and disk space for read speed. A composite index generally helps only when the query filters on its leading column or columns. Over-indexing is a real anti-pattern in production databases.
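The leading-column behavior of composite indexes can be checked with a short SQLite experiment. The `people` table, the `idx_people_name` index, and the `plan` helper are hypothetical names, and the result reflects SQLite without collected statistics; some engines can occasionally use a skip scan to work around a missing leading column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, city TEXT)"
)
# Composite index: last_name is the leading column.
conn.execute("CREATE INDEX idx_people_name ON people (last_name, first_name)")

def plan(sql):
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

leading = plan("SELECT * FROM people WHERE last_name = 'Ng'")
both = plan("SELECT * FROM people WHERE last_name = 'Ng' AND first_name = 'Ada'")
trailing_only = plan("SELECT * FROM people WHERE first_name = 'Ada'")

print(leading)        # expected to use idx_people_name
print(both)           # expected to use idx_people_name
print(trailing_only)  # expected to fall back to scanning the table
```

If queries regularly filter on `first_name` alone, a separate index on that column, or a composite index with the columns in the other order, would be the usual options to evaluate.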
A choice between relational databases with strict schemas and ACID guarantees and non-relational databases optimized for scale, flexibility, or specialized workloads.
Intentionally duplicating data across tables to avoid expensive joins and improve read performance, at the cost of write complexity.
Atomicity, Consistency, Isolation, Durability — the four properties that traditional database transactions guarantee.
A storage architecture that manages data as objects (file + metadata + ID) in a flat namespace, optimized for huge amounts of unstructured data.
A database optimized for storing and querying timestamped data points — ideal for metrics, sensor data, financial ticks, and events.