Why relational DBs cannot do this
Similarity search asks: “Find me the 10 vectors most similar to this query vector.” A naïve implementation computes cosine similarity (or Euclidean distance) against every stored vector. With 10 million 1,536-dimensional vectors, that is 10M × 1,536 ≈ 15 billion floating-point operations per query — unacceptably slow.
SQL has no efficient operator for this. A `WHERE cosine_similarity(embedding, ?) > 0.8` clause would do a full table scan regardless of any traditional B-tree index, because B-trees index scalar ordering, not geometric proximity in high-dimensional space.
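To make the cost concrete, here is a minimal pure-Python sketch of the naïve scan over a toy three-dimensional corpus (hypothetical data; real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    # dot(a, b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Toy 3-dimensional corpus standing in for millions of embeddings.
corpus = {
    "doc1": (1.0, 0.0, 0.0),
    "doc2": (0.9, 0.1, 0.0),
    "doc3": (0.0, 1.0, 0.0),
}
query = (1.0, 0.05, 0.0)

# The naïve scan: score every stored vector against the query — O(n * d).
ranked = sorted(corpus, key=lambda d: cosine_similarity(corpus[d], query), reverse=True)
print(ranked)
```

Every query touches every vector; that linear scan is exactly what an ANN index avoids.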
You need a specialised index structure: Approximate Nearest Neighbour (ANN).
ANN algorithms
HNSW — Hierarchical Navigable Small World
The dominant algorithm in production today (used by pgvector, Qdrant, Weaviate). HNSW builds a multilayer graph:
- Top layers have few nodes and long-range edges — fast coarse navigation.
- Bottom layer has all nodes and short-range edges — precise local search.
At query time, the algorithm enters at the top layer and descends: in each layer it runs a greedy search toward the query, then drops to the layer below. Result: O(log n) query time with ~95–99% recall.
Tradeoffs: High recall, fast queries, but memory-intensive — the graph lives in RAM. Index build time is also O(n log n) with significant constants. Two key parameters:
- `m` — connections per node (higher = better recall, more RAM)
- `ef_construction` — search width during build (higher = better recall, slower build)
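The layer-by-layer greedy descent can be illustrated with a toy two-layer graph over 1-D points (purely illustrative — a real HNSW index has many probabilistically assigned layers and high-dimensional vectors):

```python
import math

def greedy_search(graph, points, entry, query):
    """Greedy walk: move to the neighbour closest to the query until stuck."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], query),
                   default=current)
        if math.dist(points[best], query) < math.dist(points[current], query):
            current = best
        else:
            return current

# Toy index over eight 1-D points (hypothetical data).
points = {i: (float(i),) for i in range(8)}
top_layer = {0: [4], 4: [0, 7], 7: [4]}  # sparse nodes, long-range edges
bottom_layer = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}

query = (5.2,)
entry = greedy_search(top_layer, points, 0, query)       # coarse navigation
nearest = greedy_search(bottom_layer, points, entry, query)  # local refinement
print(nearest)
```

The top layer hops across the space cheaply; the bottom layer, which contains every node, finishes the search locally instead of scanning everything.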
IVF — Inverted File Index
Divides the vector space into nlist Voronoi cells via k-means clustering. Vectors are stored in their nearest centroid’s bucket.
At query time: compute distance to all centroids (cheap), pick the top nprobe cells, scan only those buckets. This trades recall for dramatically lower memory — the index is mostly the centroid list + bucket assignments.
Tradeoffs: Fast builds, low RAM, but recall drops if the true nearest vector lands in a cell that was not probed. Works well at very large scale (billions of vectors) when combined with quantisation (IVF-PQ).
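A minimal sketch of the IVF build-and-probe flow, with hand-picked centroids standing in for a k-means result (all data hypothetical):

```python
import math

def nearest(centroids, v):
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], v))

# Pretend k-means already produced these three cell centroids.
centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
vectors = [(0.5, 0.2), (9.8, 0.1), (0.1, 9.9), (10.2, 0.4), (0.3, 0.1)]

# Build: bucket each vector under its nearest centroid.
buckets = {i: [] for i in range(len(centroids))}
for idx, v in enumerate(vectors):
    buckets[nearest(centroids, v)].append(idx)

# Query: rank cells by centroid distance, scan only the top nprobe buckets.
query, nprobe = (9.5, 0.5), 1
cells = sorted(range(len(centroids)),
               key=lambda i: math.dist(centroids[i], query))[:nprobe]
candidates = [idx for c in cells for idx in buckets[c]]
best = min(candidates, key=lambda idx: math.dist(vectors[idx], query))
print(best)
```

With nprobe = 1 only one bucket is scanned; raising nprobe trades query speed back for recall.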
LSH — Locality Sensitive Hashing
Randomly projects vectors into hash buckets where similar vectors collide with high probability. Query: hash the query, check the same bucket(s).
Tradeoffs: Very fast build and query, but lowest recall of the three. Rarely used in production today — HNSW dominates for accuracy-sensitive workloads.
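The random-hyperplane variant of LSH for cosine similarity can be sketched in a few lines — each hyperplane contributes one hash bit, and similar vectors agree on most bits (toy vectors, illustrative only):

```python
import random

random.seed(0)
DIM, BITS = 4, 8

# Random hyperplanes: similar vectors land on the same side of most planes.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def lsh_hash(v):
    # One bit per hyperplane: which side of the plane the vector falls on.
    return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in planes)

a = [1.0, 0.9, 0.1, 0.0]
b = [0.9, 1.0, 0.0, 0.1]    # nearly parallel to a
c = [-1.0, 0.2, 0.9, -0.8]  # points in a very different direction

# Bit agreement between hashes tracks angular similarity.
same_ab = sum(x == y for x, y in zip(lsh_hash(a), lsh_hash(b)))
same_ac = sum(x == y for x, y in zip(lsh_hash(a), lsh_hash(c)))
print(same_ab, same_ac)
```

Vectors whose hashes share a bucket are candidate neighbours; the guarantee is only probabilistic, which is why recall trails HNSW and IVF.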
Algorithm comparison
| Algorithm | Recall | Query speed | Build time | Memory | Best use |
|---|---|---|---|---|---|
| HNSW | ★★★★★ | ★★★★★ | ★★★ | High | Default choice |
| IVF | ★★★★ | ★★★★ | ★★★★ | Low | Billion-scale |
| LSH | ★★★ | ★★★★★ | ★★★★★ | Low | Speed over recall |
Popular vector database systems
| System | Notes | Hosted? |
|---|---|---|
| pgvector | Postgres extension; HNSW and IVF indexes; SQL-native metadata filtering | Self-hosted or Supabase/Neon |
| Qdrant | Purpose-built, Rust-based, excellent filtered search performance | Cloud + self-hosted |
| Pinecone | Fully managed, simple API, serverless tier | Cloud only |
| Weaviate | Multi-modal, built-in BM25 hybrid search | Cloud + self-hosted |
| Milvus | Massive scale (100B+ vectors), GPU acceleration | Self-hosted / Zilliz Cloud |
| Chroma | Lightweight, great for local dev and prototyping | Embedded / self-hosted |
Choosing in an interview: Start with pgvector if you already run Postgres — one less system to operate. Graduate to Qdrant or Pinecone when you need better filtered-search performance at scale, or when your vectors outgrow what Postgres can handle in RAM.
Metadata filtering + vector search
Pure vector search returns “semantically similar” results, but you almost always have constraints: document.tenant_id = 'acme', published_at > 2024-01-01, language = 'en'. Both naïve strategies have costs: pre-filtering (filter first, then ANN-search the subset) can bypass the index and degrade performance, while post-filtering (ANN-search everything, then discard non-matching hits) can return far fewer than k results, degrading recall.
Qdrant and Weaviate implement filtered HNSW — the graph traversal skips nodes that fail the filter predicate, maintaining recall without post-filtering overhead. This is a key differentiator from pgvector, which post-filters by default.
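A toy sketch of why post-filtering hurts recall — with exact search standing in for ANN, asking for the top k and then filtering can leave far fewer than k results, while filtering first preserves k (hypothetical data):

```python
import math

# Toy corpus: (vector, tenant) pairs; exact search stands in for ANN here.
docs = [((float(i),), "acme" if i % 3 == 0 else "other") for i in range(12)]

def top_k(candidates, query, k):
    return sorted(candidates, key=lambda i: math.dist(docs[i][0], query))[:k]

query, k = (0.0,), 3

# Post-filtering: search everything, then drop non-matching tenants.
post = [i for i in top_k(range(len(docs)), query, k) if docs[i][1] == "acme"]

# Pre-filtering: restrict to the tenant first, then search.
pre = top_k([i for i in range(len(docs)) if docs[i][1] == "acme"], query, k)

print(post, pre)
```

When the filter is selective, most of the top-k hits get discarded after the fact — filtered HNSW avoids this by applying the predicate during graph traversal.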
When to use BM25 instead
BM25 (the algorithm behind Elasticsearch and most full-text search engines) wins when:
- Users type exact product names, error codes, IDs, or proper nouns — semantic similarity actively hurts here
- Your corpus is small (< 50k docs) and latency budget is very tight
- You need explainability — BM25 term matching is auditable; ANN scores are opaque
The production answer is usually hybrid search: run both BM25 and vector ANN in parallel, then merge the result lists using Reciprocal Rank Fusion (RRF) or a learned reranker. Weaviate and Qdrant have built-in hybrid modes. This is also the retrieval layer of a production RAG system.
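RRF itself is only a few lines — each document scores the sum of 1/(k + rank) across the result lists it appears in, with k = 60 by convention (toy document IDs):

```python
def rrf(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["d3", "d1", "d7"]    # lexical ranking (toy data)
vector_hits = ["d1", "d5", "d3"]  # semantic ranking (toy data)

fused = rrf([bm25_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists rise to the top without needing to reconcile BM25 scores and cosine similarities on a common scale — that incomparability of raw scores is precisely why rank-based fusion is popular.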