System Design · Topic 14 of 16

Design: URL Shortener


A URL shortener is a deceptively simple product that hides serious distributed systems challenges: collision-free ID generation at scale, sub-10ms redirect latency globally, and analytics pipelines that don’t slow down the hot path. Interviewers love it because it’s scoped enough to finish in 45 minutes but deep enough to separate strong candidates.


Step 1 — Clarify requirements

Never jump into design. Start by driving a requirements conversation. Here’s how a strong candidate runs it.

Functional requirements

  • Shorten URL: Given a long URL, return a short URL (e.g., https://bck.ly/aB3xZ2).
  • Redirect: Given a short code, redirect the user to the original URL.
  • Custom aliases: Users can optionally choose their short code (e.g., /my-blog).
  • Expiry: URLs can have a TTL, defaulting to permanent.
  • Analytics: Track click counts, referrer, country, device type per short URL.
  • User accounts: Users can manage (list, delete, update) their own URLs.

Non-functional requirements

  • Scale: 100 million URLs in the system. Write: ~200 new URLs/sec. Read: ~20,000 redirects/sec (100:1 read:write ratio).
  • Latency: p99 redirect latency < 10ms (this is the critical path — must be cached).
  • Availability: 99.99% uptime. Redirects must work even if the write path is degraded.
  • Short code length: Codes should be as short as possible while supporting 100M+ URLs.
  • Security: No ability to enumerate valid short codes by incrementing a counter.

What we are NOT building

  • A full link-management SaaS with teams, folders, SAML SSO, etc. (out of scope for the interview).

Step 2 — Capacity estimation

Always do back-of-envelope math before drawing any boxes. It drives every storage, caching, and server sizing decision.

QPS (Queries per second)

Write (URL creation):
  100M URLs / (365 days × 86,400 sec) ≈ 3 URLs/sec average
  With burst (marketing campaigns): ~200 URLs/sec peak

Read (redirects):
  100:1 read:write ratio → 200 URLs/sec × 100 = 20,000 redirects/sec peak

Storage estimation

Per URL record:
  shortCode:    7 bytes   (base62, 7 chars)
  originalUrl:  512 bytes (average long URL)
  userId:       8 bytes
  createdAt:    8 bytes
  expiresAt:    8 bytes
  clickCount:   8 bytes
  ─────────────────────
  Total:        ~551 bytes per row

100M rows × 551 bytes ≈ 55 GB for URL table (fits on a single large RDS instance)

Analytics events (raw click logs):
  20,000 clicks/sec × 86,400 sec/day ≈ 1.7 billion events/day
  Per event: ~200 bytes
  Daily storage: 1.7B × 200 bytes ≈ 340 GB/day
  → Analytics needs a separate time-series/columnar store (ClickHouse)

Bandwidth

Reads:  20,000 req/sec × 512 bytes/response ≈ 10 MB/s outbound
Writes: 200 req/sec × 512 bytes/request ≈ 100 KB/s inbound
→ Bandwidth is not the bottleneck. Latency is.

Key insight from estimation: The URL table is tiny (55 GB) — it easily fits in memory as a Redis cache. Analytics storage is massive and must be decoupled entirely.
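These estimates are easy to sanity-check. A throwaway TypeScript snippet, using only the numbers already given in this section, makes the arithmetic and rounding explicit:

```typescript
// Back-of-envelope figures from Steps 1-2, as executable arithmetic.
const TOTAL_URLS = 100_000_000;

const writeAvgPerSec = TOTAL_URLS / (365 * 86_400);        // ≈ 3.2 creations/sec
const readPeakPerSec = 200 * 100;                          // 100:1 ratio → 20,000/sec

const bytesPerRow = 7 + 512 + 8 + 8 + 8 + 8;               // 551 bytes
const urlTableGB = (TOTAL_URLS * bytesPerRow) / 1e9;       // ≈ 55 GB

const clickEventsPerDay = 20_000 * 86_400;                 // ≈ 1.73 billion
const analyticsGBPerDay = (clickEventsPerDay * 200) / 1e9; // ≈ 346 GB (the ~340 GB/day above)
```

Treat these as orders of magnitude: the analytics figure assumes peak read rate sustained all day, which overestimates daily volume, but that is the safe direction for sizing storage.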


Step 3 — API design

POST /api/v1/urls — Create short URL

Request:

{
  "originalUrl": "https://example.com/very/long/path?with=query&params=here",
  "customAlias": "my-blog",      // optional
  "expiresAt": "2025-12-31T00:00:00Z"  // optional, null = never
}

Response 201 Created:

{
  "shortUrl": "https://bck.ly/aB3xZ2",
  "shortCode": "aB3xZ2",
  "originalUrl": "https://example.com/...",
  "expiresAt": null,
  "createdAt": "2024-01-15T10:30:00Z"
}

GET /{shortCode} — Redirect

  • Returns 301 Moved Permanently or 302 Found.

301 vs 302 — this is a common interview question:

                                      301 Permanent                     302 Temporary
Browser caches redirect?              Yes                               No
Subsequent requests hit your server?  No (browser goes direct)          Yes
Analytics accuracy                    Poor (cached clicks not tracked)  Accurate
Server load                           Lower (browser caches)            Higher
Use when                              URL never changes, max perf       Need click analytics

Recommendation: Use 302 if analytics matter. Use 301 if you only care about redirect performance and don’t need accurate click counts.

GET /api/v1/urls/{shortCode}/stats — Analytics

{
  "shortCode": "aB3xZ2",
  "totalClicks": 42891,
  "clicksLast24h": 1204,
  "topCountries": [{"country": "US", "clicks": 18200}, ...],
  "topReferrers": [{"referrer": "twitter.com", "clicks": 9400}, ...]
}

DELETE /api/v1/urls/{shortCode} — Soft-delete

Returns 204 No Content. Marks the record as deleted; does not physically remove it (audit trail).


Step 4 — Database schema

CREATE TABLE urls (
  id            BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT,
  short_code    VARCHAR(16)      NOT NULL,
  original_url  TEXT             NOT NULL,
  user_id       BIGINT UNSIGNED,
  created_at    DATETIME(3)      NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  expires_at    DATETIME(3),
  click_count   BIGINT           NOT NULL DEFAULT 0,
  is_deleted    BOOLEAN          NOT NULL DEFAULT FALSE,
  PRIMARY KEY (id),
  UNIQUE KEY uq_short_code (short_code),
  KEY idx_user_id (user_id),
  KEY idx_expires_at (expires_at)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

CREATE TABLE users (
  id            BIGINT UNSIGNED  NOT NULL AUTO_INCREMENT,
  email         VARCHAR(255)     NOT NULL,
  password_hash VARCHAR(255)     NOT NULL,
  created_at    DATETIME(3)      NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
  rate_limit_tier ENUM('free','pro','enterprise') NOT NULL DEFAULT 'free',
  PRIMARY KEY (id),
  UNIQUE KEY uq_email (email)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Why TEXT for original_url? URLs can exceed VARCHAR(255). TEXT is stored off-page so the row stays compact. We never filter by original_url directly, so no index needed on it.

Why BIGINT for id? We use this as the seed for base62 encoding (see next section). MySQL’s AUTO_INCREMENT BIGINT gives us ~9.2 × 10¹⁸ unique IDs — effectively unlimited.


Step 5 — Short code generation strategies

This is the core technical challenge. Interviewers want to hear you walk through multiple approaches and their tradeoffs.

Approach 1: Hash the original URL (MD5 / SHA256 truncation)

shortCode = base62(MD5(originalUrl))[0:7]

Problem — hash collision: Two different URLs can produce the same 7-character prefix.

  • MD5 produces 128 bits, but 7 base62 characters keep only ~42 of them. By the birthday bound, the chance of at least one collision is already ~0.5% at 200K URLs and crosses 50% near 2M, which is unacceptable when uniqueness must be guaranteed.

Mitigation: If collision detected, append a salt (MD5(url + salt)) and retry. But now you need a uniqueness check on every insert — a round-trip to the DB on the write path. Not elegant.

Verdict: ❌ Reject. The collision handling complexity isn’t worth it.
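For completeness, the rejected approach looks roughly like this (a sketch; `hashToCode` and the salt convention are illustrative names, not a reference implementation). The salt parameter is exactly the retry hook that forces a uniqueness check on every insert:

```typescript
import { createHash } from 'crypto';

const BASE62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Truncate an MD5 digest to 7 base62 characters (~42 of the 128 bits survive).
function hashToCode(url: string, salt = ''): string {
  const hex = createHash('md5').update(url + salt).digest('hex');
  let n = BigInt('0x' + hex);
  let code = '';
  for (let i = 0; i < 7; i++) {
    code += BASE62[Number(n % 62n)];
    n /= 62n;
  }
  return code;
}

// On a duplicate-key error you would retry with hashToCode(url, String(attempt)),
// which is precisely the extra DB round-trip the verdict above objects to.
```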

Approach 2: Base62 encoding of AUTO_INCREMENT ID

shortCode = base62(autoIncrementId)

Base62 alphabet: 0-9a-zA-Z (62 characters)

const ALPHABET = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

function toBase62(id: number): string {
  if (id === 0) return ALPHABET[0];
  let result = '';
  while (id > 0) {
    result = ALPHABET[id % 62] + result;
    id = Math.floor(id / 62);
  }
  return result;
}

// ID 1        → "1"       (1 char)
// ID 100M     → "6LAze"   (5 chars)
// ID ~9.7B    → "aB3xZ2"  (6 chars)
// ID 62⁷ − 1  → "zzzzzzz" (7 chars, ≈ 3.5 trillion)

How many IDs can 7 chars hold? 62⁷ = 3.5 trillion. Way more than enough for 100M URLs.

Problem — predictability: IDs are sequential. An attacker can enumerate bck.ly/1, bck.ly/2, bck.ly/3 and discover all URLs. Bad for private links.

Mitigation: XOR the ID with a secret key before encoding, or use a format-preserving encryption (FPE) scheme. Adds complexity.
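The XOR idea is tiny to sketch (the key below is a made-up example; a real deployment would load it from secret storage). XOR with a fixed key is a bijection, so uniqueness survives, and it is its own inverse, so decoding is the same operation. Note it only scrambles bits: adjacent IDs still differ in predictable positions, which is why FPE is mentioned for stronger guarantees:

```typescript
// Hypothetical secret key (example only; never hard-code real keys).
const SECRET_KEY = 0x2f9a3c51e7bn;

// XOR is a bijection on the ID space: distinct IDs stay distinct,
// and applying it twice returns the original ID.
function obfuscateId(id: bigint): bigint {
  return id ^ SECRET_KEY;
}
```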

Another problem — single point of failure: A single MySQL AUTO_INCREMENT is a bottleneck for distributed writes. If the primary DB is down, no new URLs can be created.

Verdict: ✅ Good for simple single-region deployments. Works for the interview if you acknowledge the enumeration risk.

Approach 3: Random code + uniqueness check

import { randomInt } from 'crypto'; // not Math.random — codes must be unpredictable

function generateCode(length = 7): string {
  let code = '';
  for (let i = 0; i < length; i++) {
    code += ALPHABET[randomInt(ALPHABET.length)];
  }
  return code;
}
// Insert with: INSERT IGNORE ... then check rows affected
// If 0 rows affected, collision → retry

Problem: Not collisions: 100M stored URLs occupy only ~0.003% of the 62⁷ keyspace, so the per-attempt collision probability is ~0.003% and retries are rare. The real cost is that every write needs a DB round-trip to confirm uniqueness, occasionally more than one.

Verdict: ✅ Simple, no enumeration risk, good for small-to-medium scale.

Approach 4: Distributed ID generator (Snowflake-style) — PREFERRED

A Snowflake ID is a 64-bit integer composed of:

| 41 bits timestamp (ms since epoch) | 10 bits worker ID | 12 bits sequence |

  • 41 bits of timestamp: covers ~69 years from a custom epoch.
  • 10 bits worker ID: supports 1024 worker nodes.
  • 12 bits sequence: 4096 IDs per millisecond per worker ≈ 4M IDs/sec per worker.

class SnowflakeGenerator {
  private readonly epoch = 1704067200000n; // 2024-01-01
  private sequence = 0n;
  private lastTimestamp = -1n;

  constructor(private workerId: bigint) {
    if (workerId < 0n || workerId > 1023n) {
      throw new Error('Worker ID must be 0-1023');
    }
  }

  nextId(): bigint {
    let timestamp = BigInt(Date.now());
    if (timestamp === this.lastTimestamp) {
      this.sequence = (this.sequence + 1n) & 4095n;
      if (this.sequence === 0n) {
        // Sequence overflow: wait for next millisecond
        while (timestamp <= this.lastTimestamp) {
          timestamp = BigInt(Date.now());
        }
      }
    } else {
      // NB: a production implementation also guards against the clock moving
      // backwards (timestamp < lastTimestamp) by refusing to issue IDs
      this.sequence = 0n;
    }
    this.lastTimestamp = timestamp;
    return ((timestamp - this.epoch) << 22n)
      | (this.workerId << 12n)
      | this.sequence;
  }
}

// Then: shortCode = toBase62(Number(snowflakeId))
// Caveat: Number() is only exact up to 2^53 − 1, and Snowflake IDs use up to
// 63 bits, so production code needs a BigInt-aware toBase62.
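Since a Snowflake ID can exceed `Number.MAX_SAFE_INTEGER` (2^53 − 1), converting it with `Number()` silently corrupts large IDs. A BigInt-aware variant of `toBase62` (a sketch mirroring the function above) avoids this:

```typescript
const BASE62 = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';

// Same algorithm as toBase62 above, but exact for full 64-bit IDs.
function toBase62BigInt(id: bigint): string {
  if (id === 0n) return BASE62[0];
  let result = '';
  while (id > 0n) {
    result = BASE62[Number(id % 62n)] + result;
    id /= 62n;
  }
  return result;
}

// Usage: shortCode = toBase62BigInt(snowflake.nextId()) — no precision loss.
```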

Advantages:

  • No DB round-trip to generate the ID.
  • IDs are k-sorted (roughly time-ordered), which is great for DB index locality.
  • Each worker node is independent — no coordination needed.

Verdict: ✅✅ Best approach for large-scale distributed systems. Use this.
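A useful property of the layout: an ID decodes back into its parts with shifts and masks, which is handy for debugging (same field widths as above; the epoch is assumed to match the generator's 2024-01-01):

```typescript
const EPOCH = 1704067200000n; // 2024-01-01, matching the generator above

// Split a Snowflake ID back into its three fields.
function decodeSnowflake(id: bigint) {
  return {
    timestampMs: (id >> 22n) + EPOCH, // wall-clock ms when the ID was issued
    workerId: (id >> 12n) & 1023n,    // 10-bit worker field
    sequence: id & 4095n,             // 12-bit sequence field
  };
}
```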


Step 6 — Redirect flow with CDN + Cache

The redirect is the absolute hottest path. At 20,000 req/sec globally with p99 < 10ms, you need multiple caching layers.

User Browser


  CDN (CloudFront / Fastly)
  - Caches 302 responses with Cache-Control: max-age=300
  - 90%+ cache hit rate for popular short codes
    │ (cache miss)

  Load Balancer (NGINX / AWS ALB)


  Redirect Service (stateless Node.js / Go)

    ├──► Redis Cache (shortCode → originalUrl)
    │    TTL: 24 hours, LRU eviction
    │    Hit rate: 95%+ (Zipfian distribution — top 5% codes = 95% traffic)

    └──► MySQL Read Replica (cache miss only)
         Updates Redis on miss (cache-aside)

Redis cache key design

Key:   url:{shortCode}
Value: originalUrl (plain string)
TTL:   86400 seconds (24 hours)

On redirect service startup, pre-warm cache with top 10,000 most-clicked URLs.

Redirect service logic

async function handleRedirect(shortCode: string): Promise<string | null> {
  // 1. Check Redis
  const cached = await redis.get(`url:${shortCode}`);
  if (cached) {
    // Fire-and-forget analytics event
    void publishClickEvent(shortCode);
    return cached;
  }

  // 2. Cache miss — hit read replica
  const row = await db.queryOne(
    'SELECT original_url, expires_at, is_deleted FROM urls WHERE short_code = ?',
    [shortCode]
  );

  if (!row || row.is_deleted) return null;
  if (row.expires_at && new Date(row.expires_at) < new Date()) return null;

  // 3. Populate cache
  await redis.setex(`url:${shortCode}`, 86400, row.original_url);

  void publishClickEvent(shortCode);
  return row.original_url;
}

Response headers for 302 (note: Cache-Control must allow caching, or the CDN layer above can never cache the redirect; no-store would defeat it):

HTTP/1.1 302 Found
Location: https://example.com/original/url
Cache-Control: public, max-age=300
X-Request-ID: req_abc123

Step 7 — Database choice

MySQL (chosen) vs Cassandra

Factor           MySQL                                        Cassandra
Data model       Relational, strong ACID                      Wide-column, eventual consistency
Read pattern     Point lookup by shortCode (PK index)         Point lookup by partition key
Write pattern    Single-row inserts, low QPS                  High-throughput, multi-region writes
URL table size   55 GB — fits on one machine                  Overkill for this size
Consistency      Strong (no stale redirects after deletion)   Tunable, but eventual by default
Operations       Simpler, well-understood                     Complex to operate

Verdict: MySQL with read replicas is the right choice for a URL shortener at 100M URLs. The dataset is small, reads are simple point lookups, and strong consistency matters (expired/deleted URLs must redirect correctly immediately).

Use Cassandra only if you’re building at Twitter/Bitly scale (billions of URLs, multi-region active-active writes).

MySQL scaling for reads

MySQL Primary (writes only: INSERT, UPDATE click_count)

    ├──► Read Replica 1 (redirects)
    ├──► Read Replica 2 (redirects)
    └──► Read Replica 3 (analytics queries)

With Redis absorbing 95%+ of reads, the replicas handle <1,000 req/sec — trivially manageable.


Step 8 — Custom aliases

Custom aliases require a different insertion path:

async function createShortUrl(req: CreateUrlRequest): Promise<ShortUrl> {
  let shortCode: string;

  if (req.customAlias) {
    // Validate: alphanumeric (plus - and _), 3-16 chars, not a reserved word
    if (!isValidAlias(req.customAlias)) throw new BadRequestError('Invalid alias');

    // Fast-fail availability check for a friendly error. On its own this is
    // racy: two concurrent requests can both see the alias as free.
    const exists = await db.queryOne(
      'SELECT id FROM urls WHERE short_code = ?', [req.customAlias]
    );
    if (exists) throw new ConflictError('Alias already taken');

    shortCode = req.customAlias;
  } else {
    // Generate Snowflake ID and encode (use a BigInt-aware toBase62 in
    // production; Number() loses precision above 2^53)
    const id = snowflake.nextId();
    shortCode = toBase62(Number(id));
  }

  try {
    await db.execute(
      `INSERT INTO urls (short_code, original_url, user_id, expires_at)
       VALUES (?, ?, ?, ?)`,
      [shortCode, req.originalUrl, req.userId, req.expiresAt]
    );
  } catch (err) {
    // The UNIQUE KEY on short_code is the real arbiter of the race above:
    // map a duplicate-key error to the same conflict the pre-check reports
    if (isDuplicateKeyError(err)) throw new ConflictError('Alias already taken');
    throw err;
  }

  return { shortCode, shortUrl: `https://bck.ly/${shortCode}`, ... };
}

Reserved words to block as aliases: api, admin, login, signup, dashboard, health, etc. Maintain a blocklist table.
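One possible shape for the `isValidAlias` check used above (the allowed character set and the inline blocklist are assumptions; the blocklist table mentioned above would replace the hard-coded set):

```typescript
// Hard-coded stand-in for the blocklist table.
const RESERVED = new Set(['api', 'admin', 'login', 'signup', 'dashboard', 'health']);

// 3-16 chars; letters, digits, hyphen, underscore (hyphen allows aliases like "my-blog")
const ALIAS_PATTERN = /^[0-9A-Za-z_-]{3,16}$/;

function isValidAlias(alias: string): boolean {
  return ALIAS_PATTERN.test(alias) && !RESERVED.has(alias.toLowerCase());
}
```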


Step 9 — Analytics

The naive approach (and why it fails)

Don’t do this:

-- WRONG: synchronous UPDATE on every redirect
UPDATE urls SET click_count = click_count + 1 WHERE short_code = ?;

At 20,000 redirects/sec, this creates a write hotspot on the urls table. Every redirect becomes a write — you’ve just made your hottest read path into a write path. The DB will melt.

Async analytics pipeline

Redirect Service

    │ publishes event (fire-and-forget, non-blocking)

Kafka Topic: "click-events"

    ├──► Analytics Consumer (Flink / Kafka Streams)
    │        │
    │        ├──► ClickHouse (raw click storage, time-series analytics)
    │        │    Partitioned by (short_code, date)
    │        │
    │        └──► Redis INCR (real-time counter update)
    │             Key: clicks:{shortCode}:{YYYYMMDD}

    └──► Aggregation Consumer
             Rolls up daily/weekly/monthly counts
             Writes to MySQL urls.click_count (async, not on hot path)

ClickHouse schema for raw clicks

CREATE TABLE click_events (
    short_code    String,
    clicked_at    DateTime,
    ip_hash       String,       -- hashed for privacy
    country       LowCardinality(String),
    referrer      String,
    device_type   LowCardinality(String),  -- mobile/desktop/tablet
    user_agent    String
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(clicked_at)
ORDER BY (short_code, clicked_at);

ClickHouse can ingest billions of rows and answer aggregate queries (COUNT, GROUP BY country) in milliseconds. It’s purpose-built for this.

Real-time click counter in Redis

// In the analytics consumer
async function processClickEvent(event: ClickEvent): Promise<void> {
  const dateKey = event.clickedAt.toISOString().slice(0, 10); // YYYY-MM-DD
  await redis.incr(`clicks:${event.shortCode}:${dateKey}`);
  await redis.expire(`clicks:${event.shortCode}:${dateKey}`, 7 * 86400); // 7-day TTL
}

// Serving the stats API
async function getStats(shortCode: string): Promise<Stats> {
  const today = new Date().toISOString().slice(0, 10);
  const clicksToday = await redis.get(`clicks:${shortCode}:${today}`);
  // Total clicks from ClickHouse for historical data
  ...
}

Step 10 — Rate limiting

Per-user URL creation limits prevent abuse and protect the DB from write spikes.

Free tier:     10 URL creations / hour
Pro tier:      1,000 URL creations / hour
Enterprise:    Unlimited (with fair-use monitoring)

Implementation: a per-user fixed-window counter in Redis (below). A token bucket or sliding window handles bursts at window boundaries better; see the note after the code.

async function checkRateLimit(userId: string, tier: 'free' | 'pro'): Promise<void> {
  const limit = tier === 'free' ? 10 : 1000;
  const windowSec = 3600; // 1 hour
  const key = `ratelimit:create:${userId}`;

  const current = await redis.incr(key);
  if (current === 1) {
    // First hit in the window: attach the TTL. (If the process dies between
    // INCR and EXPIRE the key leaks with no TTL; an atomic Lua script or
    // SET ... EX NX closes that gap.)
    await redis.expire(key, windowSec);
  }
  if (current > limit) {
    throw new RateLimitError(`Limit: ${limit} URLs/hour. Resets in ${await redis.ttl(key)}s`);
  }
}

For anonymous users, rate limit by IP address. Use a sliding window (ZSET in Redis) instead of fixed window to avoid burst at window boundary.
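The sliding-window idea is easy to sketch in-process; the Redis version keeps the same logic with a ZSET (ZADD the hit timestamp, ZREMRANGEBYSCORE to drop entries older than the window, ZCARD to count what remains). A minimal in-memory sketch:

```typescript
// In-memory sliding-window limiter; `now` is injectable for testability.
class SlidingWindowLimiter {
  private hits = new Map<string, number[]>();

  constructor(private limit: number, private windowMs: number) {}

  allow(key: string, now = Date.now()): boolean {
    const cutoff = now - this.windowMs;
    // Keep only hits inside the window (ZREMRANGEBYSCORE in the Redis version)
    const recent = (this.hits.get(key) ?? []).filter(t => t > cutoff);
    if (recent.length >= this.limit) {
      this.hits.set(key, recent);
      return false; // over limit for this window
    }
    recent.push(now);
    this.hits.set(key, recent);
    return true;
  }
}
```

Unlike the fixed window, a burst straddling a window boundary cannot get 2× the limit through, because the window slides with each request.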


Step 11 — Complete architecture diagram

                    ┌─────────────────────────────────────────────┐
                    │                  CLIENTS                     │
                    └──────────────┬───────────────┬──────────────┘
                                   │               │
                            Redirect             Write/Read
                            (GET /{code})        (POST /shorten)
                                   │               │
                    ┌──────────────▼───────────────▼──────────────┐
                    │              CDN (CloudFront)                 │
                    │   Caches popular 302 redirects (TTL 5min)    │
                    └──────────────┬───────────────┬──────────────┘
                                   │               │
                    ┌──────────────▼───────────────▼──────────────┐
                    │           Load Balancer (ALB)                │
                    └──────┬──────────────────────┬───────────────┘
                           │                      │
              ┌────────────▼──────┐    ┌──────────▼────────────┐
              │  Redirect Service  │    │    URL Write Service   │
              │  (Node.js, 10x)    │    │    (Node.js, 3x)      │
              └────────┬──────────┘    └──────────┬────────────┘
                       │                           │
          ┌────────────▼──────────┐   ┌────────────▼───────────┐
          │   Redis Cluster       │   │  Snowflake ID Service  │
          │   url:{code}→URL      │   │  (worker per instance) │
          │   clicks:{code}:date  │   └────────────────────────┘
          │   ratelimit:{user}    │                │
          └────────────┬──────────┘   ┌────────────▼───────────┐
                       │              │   MySQL Primary          │
                       │              │   (writes)               │
          ┌────────────▼──────────┐   └────────────┬───────────┘
          │   MySQL Read Replicas  │                │ replication
          │   (cache miss reads)   │◄───────────────┘
          └───────────────────────┘

          ┌───────────────────────────────────────────────────────┐
          │                 ANALYTICS PIPELINE                      │
          │                                                         │
          │  Redirect Service ──► Kafka ──► Flink Consumer         │
          │                                    │                    │
          │                          ┌─────────▼─────────┐        │
          │                          │    ClickHouse       │        │
          │                          │  (raw click events) │        │
          │                          └───────────────────┬┘        │
          │                                              │          │
          │                          Stats API ◄─────────┘         │
          └───────────────────────────────────────────────────────┘

Step 12 — Scaling deep dives

Scaling the redirect path to 200K+ req/sec

  1. CDN layer: Popular short codes hit CDN edge nodes, never reach your origin. Target 90%+ CDN hit rate.
  2. Redis cluster: Shard by shortCode hash. A 3-node Redis cluster handles 500K+ ops/sec easily.
  3. Stateless redirect services: Horizontally scalable. Add instances freely — they only talk to Redis and don’t share state.
  4. Read replicas: MySQL replicas handle the <1% of requests that are cache misses.

Handling URL expiry

Expired URLs must return 410 Gone (or an expired-link page), not silently keep redirecting.

  • Check at read time: The redirect service checks expires_at whenever it reads from MySQL, but a cached entry skips that check — so this is only correct if the cache TTL is bounded by the expiry (next point).
  • Redis TTL as expiry: When populating the cache, set TTL = min(24h, expires_at - now()). The cache key then expires at exactly the right time.
  • Proactive cache invalidation: As a backstop, a background cron job (runs every minute) queries SELECT short_code FROM urls WHERE expires_at < NOW() AND expires_at > NOW() - INTERVAL 2 MINUTE and deletes those cache keys from Redis.
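The "Redis TTL as expiry" option needs one line of care: cap the TTL at 24 hours, and floor it at zero for rows that are already expired (a sketch):

```typescript
// TTL to use when populating url:{shortCode} in Redis.
function cacheTtlSeconds(expiresAt: Date | null, now: Date = new Date()): number {
  const MAX_TTL = 86_400; // 24h default cache lifetime
  if (!expiresAt) return MAX_TTL; // permanent URL: plain 24h TTL
  const remaining = Math.floor((expiresAt.getTime() - now.getTime()) / 1000);
  return Math.max(0, Math.min(MAX_TTL, remaining));
}
```

A result of 0 means "don't cache at all": the row is already expired and should return 410.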

Abuse prevention

  • Safe browsing check: Before shortening, check the destination URL against Google Safe Browsing API. Reject known malware/phishing URLs.
  • Blocklist: Maintain a table of blocked domains. Before inserting, check original_url domain against blocklist.
  • NSFW detection: For image/video destinations, run async content classification.
  • Rate limiting: Already covered. IP-based limits for anonymous users.

Private URLs

Add a visibility column: ENUM('public', 'private', 'password_protected').

  • Private: Only the owner can access stats. The redirect itself still works (you don’t want to expose that the URL exists).
  • Password-protected: On redirect, serve an intermediate page that requires a password. Store bcrypt(password) in the URLs table.

Common follow-up questions

Q: What if the same long URL is shortened multiple times?

Deduplicate with a hash column: store url_hash = SHA2(original_url, 256) on each row and add UNIQUE KEY uq_url_hash (url_hash) — original_url is TEXT, so it can't carry a unique index directly (MySQL 8.0.13+ could also use a functional index). Return the existing short code if the URL already exists. Trade-off: a user may expect a fresh short code for tracking purposes, so make deduplication opt-in or scope the uniqueness per user.

Q: How do you handle 301 cache poisoning?

If you’ve issued a 301 and later need to change or delete the URL, browsers have cached the old redirect indefinitely. This is why 302 is generally safer for a URL shortener service — you maintain control.

Q: How would you implement QR code generation?

After shortening, generate a QR code image client-side (using a library like qrcode.js) or server-side on demand. Cache the generated PNG in S3 at key qr/{shortCode}.png and serve via CDN. Generate lazily on first request.

Q: Multi-region deployment?

Write path stays in primary region (single primary MySQL). Read path deploys redirect services globally, each backed by a local Redis replica. MySQL replication (async) feeds regional replicas. Accept <1 second eventual consistency — a brand-new URL might not be available at edge for up to 1 second after creation. Use synchronous replication to 1 additional region if you need stronger guarantees.