What is scalability?
A system is scalable if it can handle increased load without a redesign. Load might mean more users, more requests per second, more data stored, or larger payloads.
Scalability is not the same as performance. A fast system that falls over at 10× traffic is not scalable. A slow system that handles 10× traffic gracefully is scalable (just also slow).
The goal is to design systems where adding resources translates predictably into added capacity.
Vertical scaling (scale up)
Add more power to the existing machine: more CPU cores, more RAM, faster disks.
Before:            After:
[ Server 8GB ]     [ Server 64GB ]
Pros:
- Simple — no application changes required
- No distributed systems complexity
- Consistent (single node = no network partitions, no sync issues)
Cons:
- Hard limit — you can only buy so much CPU/RAM
- Single point of failure
- Expensive beyond a certain point (non-linear cost curve)
- Upgrades typically require downtime
When to use: Early-stage products, databases (where horizontal scaling is hard), stateful services where distribution is painful.
Horizontal scaling (scale out)
Add more machines, distribute the load across them.
Before:            After:
[ Server ]              [ Load Balancer ]
                       ↓        ↓        ↓
                  [ Server ] [ Server ] [ Server ]
Pros:
- Theoretically unlimited scale
- No single point of failure (if designed right)
- Can use commodity hardware
Cons:
- Distributed systems complexity: consistency, partitioning, coordination
- State management becomes hard — which server holds the user’s session?
- Network latency between nodes adds up
When to use: Stateless services (API servers, web servers), read-heavy workloads, anything where requests are independent.
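The distribution step above is what a load balancer does. A minimal sketch of the simplest strategy, round-robin (the server names are hypothetical placeholders):

```python
from itertools import cycle

# Hypothetical backend pool; cycle() loops over it forever.
servers = cycle(["server-1", "server-2", "server-3"])

def route(request_id: str) -> str:
    """Round-robin: each incoming request goes to the next server in turn."""
    return next(servers)

# Three consecutive requests land on three different servers.
assigned = [route(f"req-{i}") for i in range(3)]
print(assigned)  # ['server-1', 'server-2', 'server-3']
```

Real load balancers also do health checks and weighted or least-connections routing, but the core idea is this rotation.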
Stateless vs stateful
This is the key to horizontal scaling. A stateless server treats every request independently — it doesn’t remember anything between requests. Any server can handle any request. This makes horizontal scaling trivially achievable.
A stateful server remembers things between requests (sessions, open connections). If user A’s session lives on server 1 and the load balancer sends their next request to server 2, the session is missing and the request fails (the user appears logged out).
Solutions for stateful services:
- Sticky sessions: load balancer always routes the same user to the same server. Works, but defeats much of the scaling benefit.
- Externalize state: move session data to Redis or a database. All servers share the state store. This is the standard approach.
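The externalized-state approach can be sketched in a few lines. A plain dict stands in for Redis here; with redis-py the pattern (shared get/set keyed by user) would be the same:

```python
# Shared session store — in production this would be Redis or a database,
# reachable by every API server. A dict stands in for it here.
session_store = {}

def handle_request(server: str, user: str) -> str:
    """Any server can serve any user, because state lives in the shared store."""
    session = session_store.setdefault(user, {"server_history": []})
    session["server_history"].append(server)
    return f"user={user} served by {server}, seen {len(session['server_history'])} time(s)"

handle_request("server-1", "alice")
result = handle_request("server-2", "alice")  # different server, same session
print(result)  # user=alice served by server-2, seen 2 time(s)
```

Because no server keeps the session locally, the load balancer is free to route each request anywhere.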
The stateless/stateful split
A well-designed system separates concerns:
[ Client ]
↓
[ Load Balancer ]
↓
[ API Servers — STATELESS, many of them ]
↓
[ Database / Cache — STATEFUL, scaled separately ]
The application tier is stateless and scales horizontally. The persistence tier is stateful and scaled using database-specific techniques (replication, sharding).
Latency vs throughput
Two key metrics:
- Latency: time to handle one request (ms). “How fast is each operation?”
- Throughput: requests handled per unit time (req/s). “How much total work can the system do?”
They’re related but not the same. A system can have low latency and low throughput (fast but not parallelized), or high throughput and high latency (slow per request but handles many concurrently).
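Little's law ties the two metrics together: throughput ≈ concurrency / latency. A quick check of the two cases above (the concrete latencies and concurrency levels are illustrative assumptions):

```python
def throughput(concurrency: int, latency_s: float) -> float:
    """Little's law: requests completed per second = in-flight requests / time per request."""
    return concurrency / latency_s

# Fast but serial: 50 ms per request, one at a time.
print(throughput(1, 0.050))     # 20.0 req/s — low latency, low throughput

# Slow but parallel: 200 ms per request, 1000 in flight.
print(throughput(1000, 0.200))  # 5000.0 req/s — high latency, high throughput
```

This is why adding servers (more concurrency) raises throughput without touching per-request latency.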
| Goal | Optimization |
|---|---|
| Lower latency | Caching, faster algorithms, geographic distribution |
| Higher throughput | Horizontal scaling, async processing, parallelism |
Estimating scale
Before designing, anchor on numbers. Common rough figures for interviews:
| Metric | Rough value |
|---|---|
| Read QPS a single server handles | ~10,000 req/s |
| Write QPS a single database handles | ~1,000 writes/s |
| 1 day of seconds | ~86,400 |
| 1 year of seconds | ~31.5 million |
| Average tweet size | ~300 bytes |
| 1 GB | 10^9 bytes |
Example estimate: Twitter with 100M daily active users, each user seeing 100 tweets/day.
- Total tweet views: 10^8 × 100 = 10^10 per day
- Average QPS: 10^10 / 86,400 ≈ 116,000 reads/sec
- Peak (assume 3× average): ~350,000 reads/sec
- Conclusion: you definitely need horizontal scaling, and caching is critical
Getting the order of magnitude right matters more than the exact number.
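The estimate can be checked with a few lines of arithmetic (the 3× peak factor is the same assumption used above):

```python
dau = 100_000_000          # 100M daily active users = 10^8
views_per_user = 100       # tweets seen per user per day
seconds_per_day = 86_400

total_views = dau * views_per_user        # 10^10 views/day
avg_qps = total_views / seconds_per_day   # average read rate
peak_qps = 3 * avg_qps                    # assumed peak factor

print(round(avg_qps))   # ≈ 115,741 reads/sec
print(round(peak_qps))  # ≈ 347,222 reads/sec
```

Both numbers round to the same order of magnitude as the hand estimate, which is all that matters at this stage.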