Vector Database

Also known as: vector DB, vector store, ANN database, embedding database

TL;DR

A vector database is a database whose primary index is an approximate-nearest-neighbor structure over high-dimensional vectors. The system substrate for production dense retrieval — it wraps an ANN algorithm (HNSW, IVF, PQ) with persistence, replication, metadata filtering, and incremental updates.

A vector database is a database whose primary index is an approximate-nearest-neighbor structure over high-dimensional vectors. The defining query is give me the K vectors closest, by cosine or L2, to this query vector; metadata filtering, persistence, replication, and write throughput are scaffolding around the ANN core. The category exists because embedding -based retrieval and RAG made an ANN index the load-bearing operator in production systems whose previous database was a row store.

What the system actually does

The read path is three stages: an ANN walk (usually HNSW , sometimes IVF with product-quantization compression), a metadata filter against scalar columns (tenant_id, published_at), then a score-sort to top-K. The write path is upsert plus incremental index update — a new HNSW node, an IVF assignment, a tombstone for a delete. ANN indexes were designed as build-once in-memory structures; turning them into a database means concurrent inserts that don’t block queries, crash safety mid-build, persisted graph structure, and quantized codes kept in sync with full-precision originals.

The current landscape

The vector-DB family

FAISS — library, not database. Use directly when you control the lifecycle.
pgvector — Postgres extension. Right answer under ~10M vectors: ACID, joins to row data.
LanceDB — columnar Lance format, append-only, Rust. Incremental indexing without rebuilds.
Qdrant — Rust, strongest payload-filtering ergonomics. OSS default when filters are load-bearing.
Weaviate — Go, multi-tenant, hybrid search built in. Polished product surface.
Pinecone — managed, opinionated. The easiest answer when ops capacity is the constraint.
Milvus — heaviest, GPU support, the original heavyweight at billion-scale.
Vespa — battle-tested at Yahoo and Spotify scale. Sparse and dense in one engine.

Where the choice actually matters

Under 10M vectors with simple filters, pgvector with HNSW collapses the stack to one system. Between 10M and 1B vectors with tight latency, Qdrant or LanceDB — filter ergonomics vs append-only ingestion. Hybrid sparse+dense with faceted filtering points to Weaviate or Vespa. Managed with no ops capacity is Pinecone. GPU-accelerated billion-scale points to Milvus or FAISS-on-GPU with an infra team owning deployment.

A vector DB is an ANN library plus the boring stuff that makes the ANN library survive production — persistence, concurrent writes, metadata-filtered queries, and update-without-rebuild.

Two patterns. Log-structured / append-only: writes land as new fragments with their own indexes, queries fan out and merge, background compaction merges fragments. LanceDB’s manifest points at the current fragment set; updates never rewrite existing data.

In-place insertion with tombstones: HNSW supports per-vector insertion natively. Deletes are tombstone bits the search ignores; the graph is rebuilt periodically to reclaim space. IVF is less forgiving — when the corpus distribution shifts, centroids stop reflecting it, and K-means retraining is the only fix. HNSW handles continuous ingestion gracefully; IVF needs reindex windows.

Three workloads. Heavily lexical retrieval: queries dominated by exact tokens — model numbers, identifiers, error codes — BM25 wins. Heavily structured retrieval: when the dominant filter is WHERE a = X AND b > Y and embedding similarity is a tiebreaker, a row store with a B-tree plus a cheap reranker beats the vector path. Small corpora: under ~100K vectors, brute-force is faster than the ANN walk plus database overhead.

Where it fits in the retrieval stack

The vector database is the substrate for dense retrieval — the first-stage candidate generator in retrieve-then-rerank. It also backs a semantic cache , where the index is over past queries rather than corpus documents. Mixed with a lexical inverted index it becomes hybrid search , the production default for anything that isn’t pure paraphrase.

Go further

FAISS is a library — what makes a vector DB different?

A vector DB adds persistence, concurrent writes, metadata filtering ('where customer_id = X'), incremental indexing, and replication on top of a raw ANN algorithm. FAISS provides the algorithm; LanceDB, Qdrant, Pinecone, Weaviate, and Milvus wrap that algorithm into operational systems with HTTP/gRPC surfaces, auth, and multi-tenancy.

FAISS LanceDB

Do I need a dedicated vector DB or can I use Postgres + pgvector?

Under ~10M vectors and modest QPS, pgvector with HNSW is the right answer — one operational system, ACID guarantees, no synchronization between vector and row data. Above ~10M vectors, dedicated vector DBs win on memory layout, ANN tuning, and update throughput, and the cost of running a second system stops feeling silly.

HNSW

What's the most-overlooked feature when comparing vector DBs?

Metadata-filtered ANN. Naive 'retrieve then filter' wrecks recall when the filter is selective — ask for top-100, get 5 that pass the filter, and you have top-5 in expectation rather than top-100. Production DBs need pre-filtered ANN that prunes disqualified candidates during the ANN walk. Qdrant and Weaviate handle this well; many older systems do not.

Hybrid search

← All concepts

The best AI teams build with ZeroEntropy models

Book Demo View docs