Relational databases (SQL)
Data is stored in tables with rows and columns. Relationships between tables are expressed via foreign keys. Queries use SQL.
-- Users and their orders — normalized across two tables
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.id
ACID guarantees:
- Atomicity: a transaction either fully succeeds or fully rolls back — no partial writes
- Consistency: the database always moves from one valid state to another
- Isolation: concurrent transactions don’t see each other’s intermediate state
- Durability: committed data survives crashes
When to use SQL:
- Structured data with well-defined schema
- Complex queries and reporting
- Transactional integrity is required (payments, inventory, bookings)
- Relationships between entities matter
- You need JOINs
Popular options: PostgreSQL (default choice), MySQL, SQLite (embedded).
NoSQL databases
“Not Only SQL” — a broad category of databases that don’t use the relational model. They trade some ACID guarantees for scalability, flexibility, or performance in specific access patterns.
Document stores
Store data as JSON-like documents. Each document is self-contained.
{
"_id": "user_123",
"name": "Deveshwar",
"orders": [
{ "id": "ord_1", "total": 499, "items": ["item_a", "item_b"] }
]
}
Good for: Nested/variable data, content management, user profiles, any data that reads as a unit. Examples: MongoDB, Firestore.
Key-value stores
Simplest possible model: key → value. Extremely fast, especially in-memory.
Good for: Sessions, caching, feature flags, configuration. Examples: Redis, DynamoDB (also supports more).
Wide-column stores
Rows with flexible columns, optimized for write-heavy, append-mostly workloads.
Good for: Time-series data, event logs, analytics at massive scale. Examples: Cassandra, HBase.
Graph databases
Nodes and edges. Optimized for traversing relationships.
Good for: Social networks, recommendation engines, fraud detection. Examples: Neo4j.
Indexing
Without an index, a query scans every row in the table — O(n). With an index, it jumps directly to matching rows — O(log n) with a B-tree index.
-- Without index: full table scan
SELECT * FROM users WHERE email = 'dev@example.com'; -- scans all rows
-- Create index
CREATE INDEX idx_users_email ON users(email);
-- Now uses the index: O(log n)
SELECT * FROM users WHERE email = 'dev@example.com';
B-tree indexes (the default in PostgreSQL, MySQL):
- Good for equality (
=) and range queries (>,<,BETWEEN) - Ordered — also accelerates
ORDER BY - Most useful for high-cardinality columns (many distinct values)
Hash indexes:
- Equality only — no ranges
- O(1) lookup instead of O(log n)
- Rare in practice (B-tree is usually fast enough and more versatile)
Composite indexes: Index on multiple columns.
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Efficient for: WHERE user_id = 123 AND created_at > '2026-01-01'
-- Follows leftmost prefix rule — user_id alone also uses this index
The downside of indexes: Every write must update the index. Heavily-indexed tables have slower writes. Don’t index everything — index the columns in your WHERE and JOIN conditions.
Replication
Running a single database server is a single point of failure and a read bottleneck. Replication copies data to additional servers.
Leader-follower (master-replica)
Writes → [ Leader ] → propagate to followers
Reads → [ Follower 1 ] [ Follower 2 ] [ Follower 3 ]
All writes go to the leader. Reads can go to any follower. Followers may lag behind the leader by milliseconds to seconds (replication lag).
Pros: Simple. Read scaling (more followers = more read capacity). Follower can be promoted if leader fails.
Cons: Writes don’t scale (all go to one leader). Replication lag means followers may return stale data. Failover is manual or needs extra tooling.
Multi-leader (active-active)
Multiple leaders accept writes. Changes propagate between leaders.
Pros: Write scaling. Survives single leader failure.
Cons: Write conflicts — same record updated on two leaders simultaneously. Conflict resolution is application-specific and hard to get right. Avoid unless you have a clear use case (multi-datacenter active-active).
Leaderless (quorum)
Write to N nodes; a write succeeds when W nodes acknowledge it. Read from R nodes; return the value if a quorum agrees.
With N=3, W=2, R=2: you can tolerate one node failure for both reads and writes, and R+W > N ensures you always read at least one node with the latest write.
Examples: Cassandra, DynamoDB. Used for high availability + partition tolerance.
Sharding (horizontal partitioning)
When a single database can’t hold all the data, split it across multiple database servers (shards).
Range-based sharding: Users A-M on shard 1, N-Z on shard 2. Simple, but hotspots if the data isn’t uniform.
Hash-based sharding: shard = hash(user_id) % N. Even distribution, but range queries must hit all shards.
Directory-based sharding: A lookup service maps keys to shards. Flexible (add shards without rehashing), but the directory is a bottleneck.
The downside: Cross-shard JOINs are expensive or impossible. You design your data model around the shard key. Schema migrations become much harder.
SQL vs NoSQL: the decision
| Choose SQL when | Choose NoSQL when |
|---|---|
| Data has clear relationships | Access patterns are known and simple |
| Transactions matter | Schema is flexible or varies per record |
| Complex queries and reporting | Need to scale writes horizontally (NoSQL shards naturally) |
| Team knows SQL | Specific access pattern maps to a NoSQL model |
| You’re unsure (default) | Massive scale with uniform access pattern |
Default to PostgreSQL. Switch to NoSQL when you have a specific, concrete reason — not because “it scales better.” Modern PostgreSQL with proper indexing and read replicas handles most workloads comfortably.