AI Engineering · Topic 3 of 8

Vector Databases

Why relational DBs cannot do this

Similarity search asks: “Find me the 10 vectors most similar to this query vector.” A naïve implementation computes cosine similarity (or Euclidean distance) against every stored vector. With 10 million 1,536-dimensional vectors, that is 10M × 1,536 ≈ 15 billion floating-point operations per query — unacceptably slow.
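To make the cost concrete, here is a minimal sketch of that brute-force scan in NumPy (toy data and dimensions; the function name and sizes are illustrative, not from any particular library):

```python
import numpy as np

def top_k_cosine(query: np.ndarray, vectors: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k stored vectors most similar to `query`.

    Brute force: one dot product per stored vector — O(n · d) work per
    query, which is the ~15 billion-op cost described above at 10M × 1,536.
    """
    # Normalise so that a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q                       # shape (n,): cosine similarity per vector
    return np.argsort(-sims)[:k]       # highest-similarity indices first

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 64))       # 1,000 toy vectors, 64-dimensional
query = db[42] + 0.01 * rng.normal(size=64)   # near-duplicate of vector 42
print(top_k_cosine(query, db, k=3)[0])        # -> 42
```

Everything an ANN index does is about avoiding that full `v @ q` pass.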

SQL has no efficient operator for this. A WHERE cosine_similarity(embedding, ?) > 0.8 clause would do a full table scan regardless of any traditional B-tree index, because B-trees index scalar ordering, not geometric proximity in high-dimensional space.

You need a specialised index structure built for Approximate Nearest Neighbour (ANN) search.

ANN algorithms

HNSW — Hierarchical Navigable Small World

The dominant algorithm in production today (used by pgvector, Qdrant, Weaviate). HNSW builds a multilayer graph:

  • Top layers have few nodes and long-range edges — fast coarse navigation.
  • Bottom layer has all nodes and short-range edges — precise local search.

At query time, the algorithm enters at the top layer and descends: it runs a greedy search within each layer, then uses the closest node found as the entry point for the layer below. Result: roughly O(log n) query time with ~95–99% recall.

Tradeoffs: High recall, fast queries, but memory-intensive — the graph lives in RAM. Index build time is also O(n log n) with significant constants. Two key parameters:

  • m — connections per node (higher = better recall, more RAM)
  • ef_construction — search width during build (higher = better recall, slower build)

IVF — Inverted File Index

Divides the vector space into nlist Voronoi cells via k-means clustering. Vectors are stored in their nearest centroid’s bucket.

At query time: compute distance to all centroids (cheap), pick the top nprobe cells, scan only those buckets. This trades recall for dramatically lower memory — the index is mostly the centroid list + bucket assignments.
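That build-then-probe flow can be sketched end to end in NumPy. This is a toy IVF (a few hand-rolled k-means iterations, Euclidean distance, illustrative sizes), not a production implementation like FAISS's:

```python
import numpy as np

def build_ivf(vectors, nlist=8, iters=10, seed=0):
    """Toy IVF build: k-means centroids + a bucket of vector ids per cell."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), nlist, replace=False)].copy()
    for _ in range(iters):                      # a few Lloyd iterations
        assign = np.argmin(
            np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1)
        for c in range(nlist):
            members = vectors[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Re-assign with the final centroids so each vector's bucket really
    # is its nearest centroid.
    assign = np.argmin(
        np.linalg.norm(vectors[:, None] - centroids[None], axis=2), axis=1)
    buckets = {c: np.flatnonzero(assign == c) for c in range(nlist)}
    return centroids, buckets

def ivf_search(query, vectors, centroids, buckets, nprobe=2, k=3):
    """Scan only the `nprobe` cells whose centroids are nearest the query."""
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    cand = np.concatenate([buckets[c] for c in order])
    dists = np.linalg.norm(vectors[cand] - query, axis=1)
    return cand[np.argsort(dists)[:k]]

rng = np.random.default_rng(1)
db = rng.normal(size=(500, 16))
centroids, buckets = build_ivf(db)
hits = ivf_search(db[7], db, centroids, buckets)  # query with a stored vector
print(hits[0])                                    # -> 7 (its own cell is probed)
```

Raising nprobe scans more cells: recall goes up, query cost goes up — the same dial every IVF deployment tunes.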

Tradeoffs: Fast builds, low RAM, but recall drops if the true nearest vector lands in a cell that isn't searched. Works well at very large scale (billions of vectors) when combined with quantisation (IVF-PQ).

LSH — Locality Sensitive Hashing

Randomly projects vectors into hash buckets where similar vectors collide with high probability. Query: hash the query, check the same bucket(s).
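A common LSH family for cosine similarity is random-hyperplane hashing: each random plane contributes one bit of the bucket key. A minimal sketch (the function and sizes are illustrative):

```python
import numpy as np

def lsh_bucket(vec, planes):
    """Hash a vector to a bit-string bucket via random hyperplanes.

    Each plane contributes one bit: which side of the plane the vector
    falls on. Vectors at a small angle agree on most planes, so similar
    vectors collide in the same bucket with high probability.
    """
    bits = (planes @ vec) >= 0
    return "".join("1" if b else "0" for b in bits)

rng = np.random.default_rng(0)
planes = rng.normal(size=(8, 64))      # 8 planes -> 256 possible buckets
v = rng.normal(size=64)

# Positive scaling preserves direction, so the bucket is identical.
print(lsh_bucket(v, planes) == lsh_bucket(2.5 * v, planes))  # -> True
```

A slightly perturbed vector will usually land in the same bucket too, but not always — which is exactly why LSH's recall lags the other two algorithms.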

Tradeoffs: Very fast build and query, but lowest recall of the three. Rarely used in production today — HNSW dominates for accuracy-sensitive workloads.

Algorithm comparison

Algorithm | Recall | Query speed | Build time | Memory | Best use
HNSW      | ★★★★★  | ★★★★★       | ★★★        | High   | Default choice
IVF       | ★★★★   | ★★★★        | ★★★★       | Low    | Billion-scale
LSH       | ★★★    | ★★★★★       | ★★★★★      | Low    | Approximate only

Vector database systems

System   | Notes                                                                   | Hosted?
pgvector | Postgres extension; HNSW and IVF indexes; SQL-native metadata filtering | Self-hosted or Supabase/Neon
Qdrant   | Purpose-built, Rust-based, excellent filtered search performance        | Cloud + self-hosted
Pinecone | Fully managed, simple API, serverless tier                              | Cloud only
Weaviate | Multi-modal, built-in BM25 hybrid search                                | Cloud + self-hosted
Milvus   | Massive scale (100B+ vectors), GPU acceleration                         | Self-hosted / Zilliz Cloud
Chroma   | Lightweight, great for local dev and prototyping                        | Embedded / self-hosted

Choosing in an interview: Start with pgvector if you already run Postgres — one less system to operate. Graduate to Qdrant or Pinecone when you need better filtered-search performance at scale, or when your vectors outgrow what Postgres can handle in RAM.

Filtered search

Pure vector search returns “semantically similar” results, but you almost always have constraints: document.tenant_id = 'acme', published_at > 2024-01-01, language = 'en'. Pre-filtering (apply the filter first, ANN-search only the matching subset) and post-filtering (ANN-search everything, discard non-matching hits) each have a cost: pre-filtering can't reuse an index built over the full collection, and post-filtering can return far fewer than k results when most neighbours fail the filter.
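The failure mode is easy to demonstrate with toy data. This sketch uses exact brute-force search in place of a real ANN index, with a made-up tenant_id column as the metadata filter:

```python
import numpy as np

rng = np.random.default_rng(0)
db = rng.normal(size=(1000, 32))
tenant = rng.integers(0, 10, size=1000)   # toy metadata: tenant_id 0..9
query = rng.normal(size=32)

def top_k(idx, k):
    """Exact k-NN by Euclidean distance over the vectors at indices `idx`."""
    d = np.linalg.norm(db[idx] - query, axis=1)
    return idx[np.argsort(d)[:k]]

everything = np.arange(len(db))

# Post-filtering: search all vectors, then drop rows from other tenants.
# Only ~1/10 of the top-10 belongs to tenant 3, so we come up short of k.
post = [i for i in top_k(everything, k=10) if tenant[i] == 3]

# Pre-filtering: restrict to tenant 3 first, then search that subset.
# Always yields k results, but the full-collection index can't be used.
pre = top_k(everything[tenant == 3], k=10)

print(len(post), len(pre))   # post comes up short of 10; pre returns all 10
```

Filtered HNSW avoids both costs by evaluating the predicate during graph traversal itself.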

Qdrant and Weaviate implement filtered HNSW — the graph traversal skips nodes that fail the filter predicate, maintaining recall without post-filtering overhead. This is a key differentiator from pgvector, which post-filters by default.

When to use BM25 instead

BM25 (the algorithm behind Elasticsearch and most full-text search engines) wins when:

  • Users type exact product names, error codes, IDs, or proper nouns — semantic similarity actively hurts here
  • Your corpus is small (< 50k docs) and latency budget is very tight
  • You need explainability — BM25 term matching is auditable; ANN scores are opaque
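The first bullet is easy to see concretely. A minimal BM25 scorer (standard formula; the documents and query are made up for illustration) ranks the exact token match first — something pure embedding similarity can miss:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each doc (a list of tokens) against the query with BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter()                     # document frequency per term
    for d in docs:
        df.update(set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        s = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            s += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(s)
    return scores

docs = [
    "error code E1234 in payment service".split(),
    "payment flow overview and architecture".split(),
    "retry semantics for transient errors".split(),
]
scores = bm25_scores("E1234".split(), docs)
print(scores.index(max(scores)))   # -> 0: the exact-token match wins
```

Every term's contribution here is inspectable, which is the auditability argument in the last bullet.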

The production answer is usually hybrid search: run both BM25 and vector ANN in parallel, then merge the result lists using Reciprocal Rank Fusion (RRF) or a learned reranker. Weaviate and Qdrant have built-in hybrid modes. This is also the retrieval layer of a production RAG system.
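RRF itself is a few lines: each document's fused score is the sum of 1/(k + rank) over the lists it appears in, with k conventionally 60. A sketch with made-up document ids:

```python
def rrf_merge(ranked_lists, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits   = ["doc_a", "doc_c", "doc_b"]   # lexical ranking
vector_hits = ["doc_b", "doc_a", "doc_d"]   # semantic ranking
print(rrf_merge([bm25_hits, vector_hits]))
# -> ['doc_a', 'doc_b', 'doc_c', 'doc_d']
```

Note that RRF only needs ranks, not scores, so it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.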