Union Find (DSU) — ZERO TO GRIND

Union Find (also called Disjoint Set Union, DSU) answers one question with near-constant time: “Are these two nodes currently in the same connected component?” With path compression and union by rank, each find and union runs in amortized O(α(n)) — the inverse Ackermann function. For any input size you’ll ever encounter, α(n) ≤ 4. Treat it as O(1).

The Problem It Solves

You have n nodes and edges arriving one at a time. After each edge arrives, you need to answer: “Are node u and node v connected — directly or through any chain of edges?”

This is dynamic connectivity: the graph is growing, and you need live answers after each addition.

Why Not BFS/DFS?

Run a BFS or DFS per query: O(V + E) per query. With Q queries and E edges, total cost is O(Q × (V + E)). For 10^5 queries and 10^5 edges that’s 10^10 operations — nowhere near fast enough.

Union Find answers every query in O(α(n)) regardless of graph size.

The Core Idea: Parent Array

Each element points to a parent. A root points to itself. Elements in the same component share the same root.

n = 5, initial state (each node is its own component):
parent = [0, 1, 2, 3, 4]   ← index i has parent i

After union(0, 1):
parent = [1, 1, 2, 3, 4]   ← 0's root is now 1

After union(2, 3):
parent = [1, 1, 3, 3, 4]   ← 2's root is now 3

After union(1, 3):
parent = [1, 3, 3, 3, 4]   ← 0 and 2 are now in the same component

find(x) follows parent pointers until reaching a root (where parent[root] === root). union(x, y) finds both roots and attaches one under the other.

Find with Path Compression

Without optimisation, chains form naturally. find on a chain of length n is O(n) — no better than a linked list traversal.

Path compression: on the way back up from the root, point every visited node directly to the root. This flattens the tree so that all subsequent find calls on those nodes are O(1).

function find(parent: number[], x: number): number {
  if (parent[x] !== x) {
    parent[x] = find(parent, parent[x]); // recursively compress
  }
  return parent[x];
}

Before path compression — chain: 4 → 3 → 2 → 1 → 1

After find(4) — flat: 4 → 1, 3 → 1, 2 → 1

All nodes visited during the traversal now point directly to the root. Future find calls on any of them are O(1).

Union by Rank

Without rank control, union attaches one root under the other arbitrarily. Adversarial input can create a chain of height n even with path compression applied lazily.

Union by rank always attaches the shorter tree under the taller one. Height is therefore bounded at O(log n) — before path compression makes it even flatter.

function union(parent: number[], rank: number[], x: number, y: number): boolean {
  const rx = find(parent, x);
  const ry = find(parent, y);
  if (rx === ry) return false; // already in the same component

  if (rank[rx] < rank[ry])      parent[rx] = ry;
  else if (rank[rx] > rank[ry]) parent[ry] = rx;
  else {
    parent[ry] = rx;
    rank[rx]++; // only increment when ranks are equal
  }
  return true; // a union actually happened
}

rank is an upper bound on tree height, not the exact height. Once path compression starts flattening trees, actual height drops below the rank — but the rank bound is still correct for ordering decisions.

Combined: O(α(n)) Amortised Per Operation

With both path compression and union by rank applied together:

Each operation costs O(α(n)) amortised
α is the inverse Ackermann function: α(2^65536) = 4
For any realistic input — including graphs with 10^9 nodes — α(n) ≤ 4
For all practical purposes: treat DSU operations as O(1)

The formal proof (Tarjan, 1975) uses a potential function argument and is not interview material. For interviews, state: “With path compression and union by rank, each operation is effectively O(1) amortised.”

Full UnionFind Class

class UnionFind {
  private parent: number[];
  private rank: number[];
  private _count: number; // number of disjoint components

  constructor(n: number) {
    this.parent = Array.from({ length: n }, (_, i) => i);
    this.rank = new Array(n).fill(0);
    this._count = n;
  }

  find(x: number): number {
    if (this.parent[x] !== x) {
      this.parent[x] = this.find(this.parent[x]); // path compression
    }
    return this.parent[x];
  }

  union(x: number, y: number): boolean {
    const rx = this.find(x);
    const ry = this.find(y);
    if (rx === ry) return false; // same component, no-op

    if (this.rank[rx] < this.rank[ry])      this.parent[rx] = ry;
    else if (this.rank[rx] > this.rank[ry]) this.parent[ry] = rx;
    else {
      this.parent[ry] = rx;
      this.rank[rx]++;
    }
    this._count--;
    return true;
  }

  connected(x: number, y: number): boolean {
    return this.find(x) === this.find(y);
  }

  get count(): number {
    return this._count;
  }
}
// All operations: O(α(n)) amortised ≈ O(1)   Space: O(n)

Note: count decrements by 1 with each successful union (when two previously separate components merge). This makes counting connected components free — no post-processing needed.

Number of Islands — LC 200 (DSU Approach)

The canonical approach uses BFS/DFS in O(m×n). DSU produces the same complexity but showcases applying Union Find to a grid. Cells become nodes; adjacency becomes edges.

function numIslands(grid: string[][]): number {
  const rows = grid.length;
  const cols = grid[0].length;

  // Flatten the grid: cell (r, c) → index r*cols+c
  const uf = new UnionFind(rows * cols);
  let waterCells = 0;

  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      if (grid[r][c] === '0') {
        waterCells++;
        continue;
      }
      const idx = r * cols + c;
      // Union with right and bottom neighbours (avoid double-counting)
      if (r + 1 < rows && grid[r + 1][c] === '1') uf.union(idx, (r + 1) * cols + c);
      if (c + 1 < cols && grid[r][c + 1] === '1') uf.union(idx, r * cols + (c + 1));
    }
  }

  // DSU initialises ALL cells (including water) as components.
  // Subtract water cells to get island count.
  return uf.count - waterCells;
}
// Time: O(m × n × α(m×n)) ≈ O(m×n)   Space: O(m×n)

We only union rightward and downward to avoid processing each edge twice (union is idempotent but the double work is unnecessary).

Redundant Connection — LC 684

Given n nodes and n edges forming a graph with exactly one cycle, find the edge that creates the cycle. Return the last such edge in input order (there is exactly one answer).

Insight: process edges one by one. An edge (u, v) creates a cycle if and only if u and v are already in the same component when the edge is encountered.

function findRedundantConnection(edges: number[][]): number[] {
  // Nodes are 1-indexed per the problem statement
  const uf = new UnionFind(edges.length + 1);

  for (const [u, v] of edges) {
    if (!uf.union(u, v)) {
      // union returned false → already connected → this edge is redundant
      return [u, v];
    }
  }

  return []; // never reached for valid input
}
// Time: O(n × α(n)) ≈ O(n)   Space: O(n)

This is a three-line solution once UnionFind is defined. The entire problem reduces to: process edges, return the first one whose endpoints were already connected.

Accounts Merge — LC 721

Each account is [name, email1, email2, ...]. Two accounts belong to the same person if they share at least one email. Merge all accounts for the same person; return each merged account with emails sorted.

Strategy: treat every unique email as a DSU node. For each account, union all its emails together. Then group emails by their root representative.

function accountsMerge(accounts: string[][]): string[][] {
  const emailToId = new Map<string, number>();
  const emailToName = new Map<string, string>();
  let id = 0;

  // Assign a unique integer ID to every email seen
  for (const account of accounts) {
    const name = account[0];
    for (let i = 1; i < account.length; i++) {
      const email = account[i];
      if (!emailToId.has(email)) {
        emailToId.set(email, id++);
        emailToName.set(email, name);
      }
    }
  }

  const uf = new UnionFind(id);

  // Union all emails within the same account
  for (const account of accounts) {
    const firstId = emailToId.get(account[1])!;
    for (let i = 2; i < account.length; i++) {
      uf.union(firstId, emailToId.get(account[i])!);
    }
  }

  // Group emails by their root component representative
  const rootToEmails = new Map<number, string[]>();
  for (const [email, eid] of emailToId) {
    const root = uf.find(eid);
    if (!rootToEmails.has(root)) rootToEmails.set(root, []);
    rootToEmails.get(root)!.push(email);
  }

  // Build result: sort emails alphabetically within each group, prepend name
  const result: string[][] = [];
  for (const emails of rootToEmails.values()) {
    emails.sort();
    // All emails in the group share the same owner name
    result.push([emailToName.get(emails[0])!, ...emails]);
  }

  return result;
}
// Time: O(N × α(N)) where N = total emails across all accounts
// Space: O(N)

The subtlety: email IDs are assigned globally across all accounts. Two accounts that share an email will give that email the same ID on the second encounter (the has check), ensuring they get unioned correctly.

Number of Connected Components — LC 323

Given n nodes and a list of undirected edges, return the number of connected components.

function countComponents(n: number, edges: number[][]): number {
  const uf = new UnionFind(n);
  for (const [u, v] of edges) uf.union(u, v);
  return uf.count;
}
// Time: O(E × α(n)) ≈ O(E)   Space: O(n)

Three lines with the UnionFind class. This is the canonical DSU warm-up problem — if an interviewer asks it, they want to see DSU, not BFS.

Connection to Kruskal’s MST Algorithm

Kruskal’s Minimum Spanning Tree algorithm is built entirely on DSU:

Sort all edges by weight ascending.
Process edges in order: if the two endpoints are not connected, add the edge to the MST and union them.
If they are connected, skip the edge (it would create a cycle).
Stop when n-1 edges have been added (MST is complete).

function kruskalMST(n: number, edges: [number, number, number][]): number {
  // edges[i] = [u, v, weight]
  edges.sort((a, b) => a[2] - b[2]); // sort by weight

  const uf = new UnionFind(n);
  let totalWeight = 0;
  let edgesAdded = 0;

  for (const [u, v, w] of edges) {
    if (uf.union(u, v)) {
      totalWeight += w;
      edgesAdded++;
      if (edgesAdded === n - 1) break; // MST is complete
    }
  }

  return edgesAdded === n - 1 ? totalWeight : -1; // -1 = graph is disconnected
}
// Time: O(E log E) for sorting — DSU operations are negligible
// Space: O(n) for DSU, O(E) for edges

The DSU union replaces what would otherwise be a cycle-detection BFS in every iteration. Without DSU, Kruskal’s is O(E × (V + E)) — impractical for dense graphs.

Union Find vs BFS/DFS for Connectivity

Criteria	Union Find (DSU)	BFS / DFS
Single connectivity query	O(α(n)) ≈ O(1)	O(V + E)
Dynamic edge additions	O(α(n)) per edge	Full re-traversal needed
Dynamic edge deletions	Not supported	Supported with care
Count connected components	O(1) — tracked in `union`	O(V + E)
Find actual path between nodes	Not directly	Yes — record parent array
Detect cycle (undirected)	O(1) — same component check	O(V + E) DFS with colours
Topological order / SCC	Not applicable	DFS-based (Kahn’s, Tarjan’s)
Memory	O(n)	O(V + E)
Implementation lines	~25 lines	~15 lines

Choose Union Find when: you need fast connectivity queries, edges arrive dynamically, you want component counts, or you’re doing Kruskal’s MST. Choose BFS/DFS when: you need the actual path, traversal order matters, edges can be deleted, or you need SCC / topological sort.

Common Pitfalls

Missing path compression in find — without it, find degrades to O(n) on chains. The single line parent[x] = find(parent, parent[x]) is the entire optimisation. Missing it in an interview is a serious mistake.
Forgetting union by rank — without rank, repeated unions on a skewed graph create chains. Always maintain a rank or size array.
Off-by-one on component count — initialise count = n, decrement by 1 in each successful union. Getting this wrong produces an off-by-one answer on every connectivity problem.
Applying DSU to directed graphs — DSU models undirected connectivity. For directed graphs, connected-component questions require Kosaraju’s or Tarjan’s SCC algorithm.
Comparing node indices instead of roots — connected(x, y) must call find(x) === find(y), not parent[x] === parent[y]. Comparing raw parents only checks immediate parents, not ultimate roots.
Forgetting 1-indexed nodes — LC 684 and many graph problems use 1-indexed nodes. Construct new UnionFind(edges.length + 1) or new UnionFind(n + 1) and leave index 0 unused, rather than shifting every node index by -1 throughout the solution.