A URL shortener is a deceptively simple product that hides serious distributed systems challenges: collision-free ID generation at scale, sub-10ms redirect latency globally, and analytics pipelines that don’t slow down the hot path. Interviewers love it because it’s scoped enough to finish in 45 minutes but deep enough to separate strong candidates.
Step 1 — Clarify requirements
Never jump into design. Start by driving a requirements conversation. Here’s how a strong candidate runs it.
Functional requirements
- Shorten URL: Given a long URL, return a short URL (e.g., https://bck.ly/aB3xZ2).
- Redirect: Given a short code, redirect the user to the original URL.
- Custom aliases: Users can optionally choose their short code (e.g., /my-blog).
- Expiry: URLs can have a TTL, defaulting to permanent.
- Analytics: Track click counts, referrer, country, device type per short URL.
- User accounts: Users can manage (list, delete, update) their own URLs.
Non-functional requirements
- Scale: 100 million URLs in the system. Write: ~200 new URLs/sec. Read: ~20,000 redirects/sec (100:1 read:write ratio).
- Latency: p99 redirect latency < 10ms (this is the critical path — must be cached).
- Availability: 99.99% uptime. Redirects must work even if the write path is degraded.
- Short code length: Codes should be as short as possible while supporting 100M+ URLs.
- Security: No ability to enumerate valid short codes by incrementing a counter.
What we are NOT building
- A full link-management SaaS with teams, folders, SAML SSO, etc. (out of scope for the interview).
Step 2 — Capacity estimation
Always do back-of-envelope math before drawing any boxes. It drives every storage, caching, and server sizing decision.
QPS (Queries per second)
Write (URL creation):
100M URLs / (365 days × 86,400 sec) ≈ 3 URLs/sec average
With burst (marketing campaigns): ~200 URLs/sec peak
Read (redirects):
100:1 read:write ratio → 200 URLs/sec × 100 = 20,000 redirects/sec peak
Storage estimation
Per URL record:
shortCode: 7 bytes (base62, 7 chars)
originalUrl: 512 bytes (average long URL)
userId: 8 bytes
createdAt: 8 bytes
expiresAt: 8 bytes
clickCount: 8 bytes
─────────────────────
Total: ~551 bytes per row
100M rows × 551 bytes ≈ 55 GB for URL table (fits on a single large RDS instance)
Analytics events (raw click logs):
20,000 clicks/sec × 86,400 sec/day ≈ 1.7 billion events/day
Per event: ~200 bytes
Daily storage: 1.7B × 200 bytes ≈ 340 GB/day
→ Analytics needs a separate time-series/columnar store (ClickHouse)
Bandwidth
Reads: 20,000 req/sec × 512 bytes/response ≈ 10 MB/s outbound
Writes: 200 req/sec × 512 bytes/request ≈ 100 KB/s inbound
→ Bandwidth is not the bottleneck. Latency is.
Key insight from estimation: The URL table is tiny (55 GB) — it easily fits in memory as a Redis cache. Analytics storage is massive and must be decoupled entirely.
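The estimates above can be reproduced as checkable arithmetic. A small sketch using the same assumptions stated in this section (100M URLs, 551 bytes per row, 200 bytes per click event):

```typescript
// Back-of-envelope numbers from the estimation above, as runnable arithmetic.
const TOTAL_URLS = 100_000_000;
const SECONDS_PER_YEAR = 365 * 86_400;          // 31,536,000
const BYTES_PER_ROW = 7 + 512 + 8 + 8 + 8 + 8;  // 551 (shortCode + URL + four 8-byte fields)

const avgWriteQps = TOTAL_URLS / SECONDS_PER_YEAR;        // ≈ 3.2 URLs/sec
const peakReadQps = 200 * 100;                            // 100:1 read:write → 20,000/sec
const urlTableGb = (TOTAL_URLS * BYTES_PER_ROW) / 1e9;    // ≈ 55.1 GB
const analyticsGbPerDay = (20_000 * 86_400 * 200) / 1e9;  // ≈ 345.6 GB/day

console.log({ avgWriteQps, peakReadQps, urlTableGb, analyticsGbPerDay });
```

Being able to regenerate these numbers on demand is handy when an interviewer changes an assumption (e.g. "what if reads are 1000:1?").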
Step 3 — API design
POST /api/v1/urls — Create short URL
Request:
{
"originalUrl": "https://example.com/very/long/path?with=query&params=here",
"customAlias": "my-blog", // optional
"expiresAt": "2025-12-31T00:00:00Z" // optional, null = never
}
Response 201 Created:
{
"shortUrl": "https://bck.ly/aB3xZ2",
"shortCode": "aB3xZ2",
"originalUrl": "https://example.com/...",
"expiresAt": null,
"createdAt": "2024-01-15T10:30:00Z"
}
GET /{shortCode} — Redirect
- Returns 301 Moved Permanently or 302 Found.
301 vs 302 — this is a common interview question:
| | 301 Permanent | 302 Temporary |
|---|---|---|
| Browser caches redirect? | Yes | No |
| Subsequent requests hit your server? | No (browser goes direct) | Yes |
| Analytics accuracy | Poor (cached clicks not tracked) | Accurate |
| Server load | Lower (browser caches) | Higher |
| Use when | URL never changes, max performance | Need click analytics |
Recommendation: Use 302 if analytics matter. Use 301 if you only care about redirect performance and don’t need accurate click counts.
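As a minimal illustration of this decision point (not part of the original spec; buildRedirect is a hypothetical helper):

```typescript
// Pick the redirect status based on whether click analytics are required.
function buildRedirect(originalUrl: string, trackClicks: boolean) {
  return {
    // 302 keeps every click hitting the server, so it can be counted;
    // 301 lets browsers cache the hop and skip the server entirely.
    status: trackClicks ? 302 : 301,
    headers: { Location: originalUrl },
  };
}
```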
GET /api/v1/urls/{shortCode}/stats — Analytics
{
"shortCode": "aB3xZ2",
"totalClicks": 42891,
"clicksLast24h": 1204,
"topCountries": [{"country": "US", "clicks": 18200}, ...],
"topReferrers": [{"referrer": "twitter.com", "clicks": 9400}, ...]
}
DELETE /api/v1/urls/{shortCode} — Soft-delete
Returns 204 No Content. Marks the record as deleted; does not physically remove it (audit trail).
Step 4 — Database schema
CREATE TABLE urls (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
short_code VARCHAR(16) NOT NULL,
original_url TEXT NOT NULL,
user_id BIGINT UNSIGNED,
created_at DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
expires_at DATETIME(3),
click_count BIGINT NOT NULL DEFAULT 0,
is_deleted BOOLEAN NOT NULL DEFAULT FALSE,
PRIMARY KEY (id),
UNIQUE KEY uq_short_code (short_code),
KEY idx_user_id (user_id),
KEY idx_expires_at (expires_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
CREATE TABLE users (
id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
email VARCHAR(255) NOT NULL,
password_hash VARCHAR(255) NOT NULL,
created_at DATETIME(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
rate_limit_tier ENUM('free','pro','enterprise') NOT NULL DEFAULT 'free',
PRIMARY KEY (id),
UNIQUE KEY uq_email (email)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Why TEXT for original_url? URLs can exceed VARCHAR(255). TEXT is stored off-page so the row stays compact. We never filter by original_url directly, so no index needed on it.
Why BIGINT for id? We use this as the seed for base62 encoding (see next section). MySQL’s AUTO_INCREMENT BIGINT gives us ~9.2 × 10¹⁸ unique IDs — effectively unlimited.
Step 5 — Short code generation strategies
This is the core technical challenge. Interviewers want to hear you walk through multiple approaches and their tradeoffs.
Approach 1: Hash the original URL (MD5 / SHA256 truncation)
shortCode = base62(MD5(originalUrl))[0:7]
Problem — hash collision: Two different URLs can produce the same 7-character prefix.
- MD5 produces 128 bits, but keeping only 7 base62 chars reduces that to ~42 bits. By the birthday problem, the chance of at least one collision reaches ~50% at around 2 million URLs; at our 100M target, collisions are essentially guaranteed.
Mitigation: If collision detected, append a salt (MD5(url + salt)) and retry. But now you need a uniqueness check on every insert — a round-trip to the DB on the write path. Not elegant.
Verdict: ❌ Reject. The collision handling complexity isn’t worth it.
Approach 2: Base62 encoding of AUTO_INCREMENT ID
shortCode = base62(autoIncrementId)
Base62 alphabet: 0-9a-zA-Z (62 characters)
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
function toBase62(id: number): string {
if (id === 0) return ALPHABET[0];
let result = '';
while (id > 0) {
result = ALPHABET[id % 62] + result;
id = Math.floor(id / 62);
}
return result;
}
// ID 1 → "1" (1 char)
// ID 100M → "6LAze" (5 chars)
// ID ~9.7B → "aB3xZ2" (6 chars)
// ID 62⁷ − 1 ≈ 3.52T → "zzzzzzz" (7 chars)
How many IDs can 7 chars hold? 62⁷ = 3.5 trillion. Way more than enough for 100M URLs.
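The encoder is easy to invert, which is useful for tests and for decoding a short code back to its numeric ID (e.g. for a primary-key lookup). A sketch, repeating toBase62 for self-containment:

```typescript
const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

function toBase62(id: number): string {
  if (id === 0) return ALPHABET[0];
  let result = '';
  while (id > 0) {
    result = ALPHABET[id % 62] + result;
    id = Math.floor(id / 62);
  }
  return result;
}

// Inverse of toBase62: decode a short code back to its numeric ID.
function fromBase62(code: string): number {
  let id = 0;
  for (const ch of code) {
    const digit = ALPHABET.indexOf(ch);
    if (digit === -1) throw new Error(`Invalid base62 character: ${ch}`);
    id = id * 62 + digit;
  }
  return id;
}
```

The pair round-trips for any ID below 2^53 (JavaScript's safe-integer limit).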
Problem — predictability: IDs are sequential. An attacker can enumerate bck.ly/1, bck.ly/2, bck.ly/3 and discover all URLs. Bad for private links.
Mitigation: XOR the ID with a secret key before encoding, or use a format-preserving encryption (FPE) scheme. Adds complexity.
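The XOR option can be sketched in a few lines. SECRET_MASK here is a made-up constant for illustration; a real deployment would load it from configuration, and FPE remains the more robust choice:

```typescript
const SECRET_MASK = 0x5deece66da1ce5b7n; // hypothetical 63-bit secret

// XOR is its own inverse, so masking and unmasking are the same operation:
// encode with toBase62(maskId(id)), decode with maskId(fromBase62(code)).
function maskId(id: bigint): bigint {
  return id ^ SECRET_MASK;
}
```

Caveat: a fixed XOR mask flips the same bit positions in every ID, so consecutive IDs still yield codes differing in predictable low bits. It raises the bar against casual enumeration but is not encryption.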
Another problem — single point of failure: A single MySQL AUTO_INCREMENT is a bottleneck for distributed writes. If the primary DB is down, no new URLs can be created.
Verdict: ✅ Good for simple single-region deployments. Works for the interview if you acknowledge the enumeration risk.
Approach 3: Random code + uniqueness check
import { randomInt } from 'crypto';

function generateCode(length = 7): string {
  let code = '';
  for (let i = 0; i < length; i++) {
    // crypto.randomInt rather than Math.random: codes must be unpredictable
    // for the "no enumeration" requirement to hold
    code += ALPHABET[randomInt(ALPHABET.length)];
  }
  return code;
}
// Insert with: INSERT IGNORE ... then check rows affected
// If 0 rows affected, collision → retry
Problem: At 100M URLs stored (only ~0.003% of the 62⁷ ≈ 3.5T keyspace), the collision probability per attempt is ~0.003%. Negligible in practice, but every write still requires a DB round-trip to confirm uniqueness, and in the worst case multiple retries.
Verdict: ✅ Simple, no enumeration risk, good for small-to-medium scale.
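The collision arithmetic is worth having at your fingertips; a one-liner makes it concrete (illustrative helper, not part of the design):

```typescript
// With k codes already issued from a 62^7 keyspace, a fresh random
// 7-char code collides with probability k / 62^7.
const KEYSPACE = Math.pow(62, 7); // ≈ 3.52 trillion

function collisionProbPerAttempt(existingCodes: number): number {
  return existingCodes / KEYSPACE;
}

console.log((collisionProbPerAttempt(100_000_000) * 100).toFixed(4) + '%');
```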
Approach 4: Distributed ID generator (Snowflake-style) — PREFERRED
A Snowflake ID is a 64-bit integer composed of:
| 41 bits timestamp (ms since epoch) | 10 bits worker ID | 12 bits sequence |
- 41 bits of timestamp: covers ~69 years from a custom epoch.
- 10 bits worker ID: supports 1024 worker nodes.
- 12 bits sequence: 4096 IDs per millisecond per worker, i.e. ~4M IDs/sec per worker (over 4B/sec across all 1024 workers).
class SnowflakeGenerator {
private readonly epoch = 1704067200000n; // 2024-01-01
private sequence = 0n;
private lastTimestamp = -1n;
constructor(private workerId: bigint) {
if (workerId < 0n || workerId > 1023n) {
throw new Error('Worker ID must be 0-1023');
}
}
nextId(): bigint {
let timestamp = BigInt(Date.now());
if (timestamp === this.lastTimestamp) {
this.sequence = (this.sequence + 1n) & 4095n;
if (this.sequence === 0n) {
// Sequence overflow: wait for next millisecond
while (timestamp <= this.lastTimestamp) {
timestamp = BigInt(Date.now());
}
}
} else {
this.sequence = 0n;
}
this.lastTimestamp = timestamp;
return ((timestamp - this.epoch) << 22n)
| (this.workerId << 12n)
| this.sequence;
}
}
// Then: shortCode = base62-encode the ID. Use a BigInt-aware encoder here;
// Number(snowflakeId) silently loses precision above 2^53, and these IDs use 63 bits.
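For debugging it helps to invert the bit layout and recover when and where an ID was minted. A sketch mirroring the SnowflakeGenerator above:

```typescript
const EPOCH = 1704067200000n; // 2024-01-01, same custom epoch as the generator

// Split a Snowflake ID back into its three fields.
function decomposeSnowflake(id: bigint) {
  return {
    timestampMs: (id >> 22n) + EPOCH, // 41-bit timestamp, shifted past worker+sequence
    workerId: (id >> 12n) & 1023n,    // 10 bits
    sequence: id & 4095n,             // 12 bits
  };
}
```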
Advantages:
- No DB round-trip to generate the ID.
- IDs are k-sorted (roughly time-ordered), which is great for DB index locality.
- Each worker node is independent — no coordination needed.
Verdict: ✅✅ Best approach for large-scale distributed systems. Use this.
Step 6 — Redirect flow with CDN + Cache
The redirect is the absolute hottest path. At 20,000 req/sec globally with p99 < 10ms, you need multiple caching layers.
User Browser
│
▼
CDN (CloudFront / Fastly)
- Caches 302 responses with Cache-Control: max-age=300
- 90%+ cache hit rate for popular short codes
│ (cache miss)
▼
Load Balancer (NGINX / AWS ALB)
│
▼
Redirect Service (stateless Node.js / Go)
│
├──► Redis Cache (shortCode → originalUrl)
│ TTL: 24 hours, LRU eviction
│ Hit rate: 95%+ (Zipfian distribution — top 5% codes = 95% traffic)
│
└──► MySQL Read Replica (cache miss only)
Updates Redis on miss (cache-aside)
Redis cache key design
Key: url:{shortCode}
Value: originalUrl (plain string)
TTL: 86400 seconds (24 hours)
On redirect service startup, pre-warm cache with top 10,000 most-clicked URLs.
Redirect service logic
async function handleRedirect(shortCode: string): Promise<string | null> {
// 1. Check Redis
const cached = await redis.get(`url:${shortCode}`);
if (cached) {
// Fire-and-forget analytics event
void publishClickEvent(shortCode);
return cached;
}
// 2. Cache miss — hit read replica
const row = await db.queryOne(
'SELECT original_url, expires_at, is_deleted FROM urls WHERE short_code = ?',
[shortCode]
);
if (!row || row.is_deleted) return null;
if (row.expires_at && new Date(row.expires_at) < new Date()) return null;
// 3. Populate cache
await redis.setex(`url:${shortCode}`, 86400, row.original_url);
void publishClickEvent(shortCode);
return row.original_url;
}
Response headers for 302:
HTTP/1.1 302 Found
Location: https://example.com/original/url
Cache-Control: public, max-age=300
X-Request-ID: req_abc123
Note: the Cache-Control here matches the CDN policy above (5-minute shared cache). A no-store header would defeat the CDN layer entirely; the short TTL bounds how long a deleted or changed URL keeps redirecting, and clicks served from the CDN can still be counted by shipping edge access logs into the analytics pipeline.
Step 7 — Database choice
MySQL (chosen) vs Cassandra
| Factor | MySQL | Cassandra |
|---|---|---|
| Data model | Relational, strong ACID | Wide-column, eventual consistency |
| Read pattern | Point lookup by shortCode (PK index) | Point lookup by partition key |
| Write pattern | Single-row inserts, low QPS | High-throughput, multi-region writes |
| URL table size | 55 GB — fits on one machine | Overkill for this size |
| Consistency | Strong (important: you don’t want stale redirects after deletion) | Tunable, but eventual by default |
| Operations | Simpler, well-understood | Complex to operate |
Verdict: MySQL with read replicas is the right choice for a URL shortener at 100M URLs. The dataset is small, reads are simple point lookups, and strong consistency matters (expired/deleted URLs must redirect correctly immediately).
Use Cassandra only if you’re building at Twitter/Bitly scale (billions of URLs, multi-region active-active writes).
MySQL scaling for reads
MySQL Primary (writes only: INSERT, UPDATE click_count)
│
├──► Read Replica 1 (redirects)
├──► Read Replica 2 (redirects)
└──► Read Replica 3 (analytics queries)
With Redis absorbing 95%+ of reads, the replicas handle <1,000 req/sec — trivially manageable.
Step 8 — Custom aliases
Custom aliases require a different insertion path:
async function createShortUrl(req: CreateUrlRequest): Promise<ShortUrl> {
let shortCode: string;
if (req.customAlias) {
// Validate: alphanumeric, 3-16 chars, not a reserved word
if (!isValidAlias(req.customAlias)) throw new BadRequestError('Invalid alias');
// Check availability (advisory only; two concurrent requests can both pass
// this check, so the UNIQUE KEY on short_code is the real guard: catch
// duplicate-key errors from the INSERT below as well)
const exists = await db.queryOne(
'SELECT id FROM urls WHERE short_code = ?', [req.customAlias]
);
if (exists) throw new ConflictError('Alias already taken');
shortCode = req.customAlias;
} else {
// Generate Snowflake ID and encode
const id = snowflake.nextId();
shortCode = toBase62(Number(id));
}
await db.execute(
`INSERT INTO urls (short_code, original_url, user_id, expires_at)
VALUES (?, ?, ?, ?)`,
[shortCode, req.originalUrl, req.userId, req.expiresAt]
);
return { shortCode, shortUrl: `https://bck.ly/${shortCode}`, ... };
}
Reserved words to block as aliases: api, admin, login, signup, dashboard, health, etc. Maintain a blocklist table.
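One possible shape for the isValidAlias check referenced in the code above. The exact length, charset, and reserved-word rules are product decisions, and this blocklist is illustrative:

```typescript
const RESERVED = new Set(['api', 'admin', 'login', 'signup', 'dashboard', 'health']);

// 3-16 chars, alphanumeric plus hyphen, no leading/trailing hyphen,
// and not a reserved route name.
function isValidAlias(alias: string): boolean {
  if (!/^[0-9a-zA-Z][0-9a-zA-Z-]{1,14}[0-9a-zA-Z]$/.test(alias)) return false;
  return !RESERVED.has(alias.toLowerCase());
}
```

In production the reserved set would live in the blocklist table so it can grow without a deploy.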
Step 9 — Analytics
The naive approach (and why it fails)
Don’t do this:
-- WRONG: synchronous UPDATE on every redirect
UPDATE urls SET click_count = click_count + 1 WHERE short_code = ?;
At 20,000 redirects/sec, this creates a write hotspot on the urls table. Every redirect becomes a write — you’ve just made your hottest read path into a write path. The DB will melt.
Async analytics pipeline
Redirect Service
│
│ publishes event (fire-and-forget, non-blocking)
▼
Kafka Topic: "click-events"
│
├──► Analytics Consumer (Flink / Kafka Streams)
│ │
│ ├──► ClickHouse (raw click storage, time-series analytics)
│ │ Partitioned by (short_code, date)
│ │
│ └──► Redis INCR (real-time counter update)
│ Key: clicks:{shortCode}:{YYYYMMDD}
│
└──► Aggregation Consumer
Rolls up daily/weekly/monthly counts
Writes to MySQL urls.click_count (async, not on hot path)
ClickHouse schema for raw clicks
CREATE TABLE click_events (
short_code String,
clicked_at DateTime,
ip_hash String, -- hashed for privacy
country LowCardinality(String),
referrer String,
device_type LowCardinality(String), -- mobile/desktop/tablet
user_agent String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(clicked_at)
ORDER BY (short_code, clicked_at);
ClickHouse can ingest billions of rows and answer aggregate queries (COUNT, GROUP BY country) in milliseconds. It’s purpose-built for this.
Real-time click counter in Redis
// In the analytics consumer
async function processClickEvent(event: ClickEvent): Promise<void> {
const dateKey = event.clickedAt.toISOString().slice(0, 10); // YYYY-MM-DD
await redis.incr(`clicks:${event.shortCode}:${dateKey}`);
await redis.expire(`clicks:${event.shortCode}:${dateKey}`, 7 * 86400); // 7-day TTL
}
// Serving the stats API
async function getStats(shortCode: string): Promise<Stats> {
const today = new Date().toISOString().slice(0, 10);
const clicksToday = await redis.get(`clicks:${shortCode}:${today}`);
// Total clicks from ClickHouse for historical data
...
}
Step 10 — Rate limiting
Per-user URL creation limits prevent abuse and protect the DB from write spikes.
Free tier: 10 URL creations / hour
Pro tier: 1,000 URL creations / hour
Enterprise: Unlimited (with fair-use monitoring)
Implementation: Token bucket in Redis per user per tier.
async function checkRateLimit(userId: string, tier: 'free' | 'pro'): Promise<void> {
const limit = tier === 'free' ? 10 : 1000;
const windowSec = 3600; // 1 hour
const key = `ratelimit:create:${userId}`;
const current = await redis.incr(key);
if (current === 1) {
await redis.expire(key, windowSec);
}
if (current > limit) {
throw new RateLimitError(`Limit: ${limit} URLs/hour. Resets in ${await redis.ttl(key)}s`);
}
}
For anonymous users, rate limit by IP address. Use a sliding window (ZSET in Redis) instead of fixed window to avoid burst at window boundary.
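The sliding-window idea can be sketched in-memory; the Redis version replaces the timestamp array with a ZSET (ZADD per request, ZREMRANGEBYSCORE to evict entries older than the window, ZCARD to count). This class is an illustrative sketch, not production code:

```typescript
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>(); // key → request timestamps (ms)

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, nowMs: number): boolean {
    const windowStart = nowMs - this.windowMs;
    // Drop timestamps outside the window (ZREMRANGEBYSCORE analog)
    const recent = (this.hits.get(key) ?? []).filter((t) => t > windowStart);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over the limit for this window
    }
    recent.push(nowMs); // record this request (ZADD analog)
    this.hits.set(key, recent);
    return true;
  }
}
```

Unlike the fixed window, a burst just before a boundary cannot be immediately followed by another full burst just after it.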
Step 11 — Complete architecture diagram
┌─────────────────────────────────────────────┐
│ CLIENTS │
└──────────────┬───────────────┬──────────────┘
│ │
Redirect Write/Read
(GET /{code}) (POST /shorten)
│ │
┌──────────────▼───────────────▼──────────────┐
│ CDN (CloudFront) │
│ Caches popular 302 redirects (TTL 5min) │
└──────────────┬───────────────┬──────────────┘
│ │
┌──────────────▼───────────────▼──────────────┐
│ Load Balancer (ALB) │
└──────┬──────────────────────┬───────────────┘
│ │
┌────────────▼──────┐ ┌──────────▼────────────┐
│ Redirect Service │ │ URL Write Service │
│ (Node.js, 10x) │ │ (Node.js, 3x) │
└────────┬──────────┘ └──────────┬────────────┘
│ │
┌────────────▼──────────┐ ┌────────────▼───────────┐
│ Redis Cluster │ │ Snowflake ID Service │
│ url:{code}→URL │ │ (worker per instance) │
│ clicks:{code}:date │ └────────────────────────┘
│ ratelimit:{user} │ │
└────────────┬──────────┘ ┌────────────▼───────────┐
│ │ MySQL Primary │
│ │ (writes) │
┌────────────▼──────────┐ └────────────┬───────────┘
│ MySQL Read Replicas │ │ replication
│ (cache miss reads) │◄───────────────┘
└───────────────────────┘
┌───────────────────────────────────────────────────────┐
│ ANALYTICS PIPELINE │
│ │
│ Redirect Service ──► Kafka ──► Flink Consumer │
│ │ │
│ ┌─────────▼─────────┐ │
│ │ ClickHouse │ │
│ │ (raw click events) │ │
│ └───────────────────┬┘ │
│ │ │
│ Stats API ◄─────────┘ │
└───────────────────────────────────────────────────────┘
Step 12 — Scaling deep dives
Scaling the redirect path to 200K+ req/sec
- CDN layer: Popular short codes hit CDN edge nodes, never reach your origin. Target 90%+ CDN hit rate.
- Redis cluster: Shard by shortCode hash. A 3-node Redis cluster handles 500K+ ops/sec easily.
- Stateless redirect services: Horizontally scalable. Add instances freely; they only talk to Redis and don’t share state.
- Read replicas: MySQL replicas handle the <1% of requests that are cache misses.
Handling URL expiry
Expired URLs must return 410 Gone (or an expiry page), not silently keep redirecting.
- Check at read time: The redirect service checks expires_at on every cache miss. Correct, but entries already sitting in Redis bypass the check.
- Proactive cache invalidation: A background cron job (runs every minute) queries SELECT short_code FROM urls WHERE expires_at < NOW() AND expires_at > NOW() - INTERVAL 2 MINUTE and deletes those cache keys from Redis.
- Redis TTL as expiry: When populating the cache, set TTL = min(24h, expires_at - now()). The cache key naturally expires at the right time.
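The "Redis TTL as expiry" option amounts to one line of TTL math when populating the cache (a sketch; cacheTtlSec is a hypothetical helper):

```typescript
const DEFAULT_TTL_SEC = 86_400; // 24 hours

// Cap the cache TTL at the URL's remaining lifetime so the key
// disappears from Redis exactly when the URL expires.
function cacheTtlSec(expiresAt: Date | null, now: Date): number {
  if (!expiresAt) return DEFAULT_TTL_SEC;
  const remainingSec = Math.floor((expiresAt.getTime() - now.getTime()) / 1000);
  return Math.max(0, Math.min(DEFAULT_TTL_SEC, remainingSec));
}
```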
Abuse prevention
- Safe browsing check: Before shortening, check the destination URL against Google Safe Browsing API. Reject known malware/phishing URLs.
- Blocklist: Maintain a table of blocked domains. Before inserting, check the original_url domain against the blocklist.
- NSFW detection: For image/video destinations, run async content classification.
- Rate limiting: Already covered. IP-based limits for anonymous users.
Private URLs
Add a visibility column: ENUM('public', 'private', 'password_protected').
- Private: Only the owner can access stats. The redirect itself still works (you don’t want to expose that the URL exists).
- Password-protected: On redirect, serve an intermediate page that requires a password. Store bcrypt(password) in the urls table.
Common follow-up questions
Q: What if the same long URL is shortened multiple times?
Deduplicate with a hash column: store url_hash = SHA-256(original_url) in a CHAR(64) column with a UNIQUE KEY on it (the URL itself is TEXT and too long to index directly). Return the existing short code if the URL already exists. Trade-off: users may expect a fresh short code for tracking purposes, so make deduplication opt-in or per-user.
Q: How do you handle 301 cache poisoning?
If you’ve issued a 301 and later need to change or delete the URL, browsers have cached the old redirect indefinitely. This is why 302 is generally safer for a URL shortener service — you maintain control.
Q: How would you implement QR code generation?
After shortening, generate a QR code image client-side (using a library like qrcode.js) or server-side on demand. Cache the generated PNG in S3 at key qr/{shortCode}.png and serve via CDN. Generate lazily on first request.
Q: Multi-region deployment?
Write path stays in primary region (single primary MySQL). Read path deploys redirect services globally, each backed by a local Redis replica. MySQL replication (async) feeds regional replicas. Accept <1 second eventual consistency — a brand-new URL might not be available at edge for up to 1 second after creation. Use synchronous replication to 1 additional region if you need stronger guarantees.