Topic · 21 concepts

Search & Retrieval

How systems find relevant documents in the first place.

First-pass retrieval is the wide-net stage of any production search system — the algorithms, indexes, and tradeoffs that surface a few hundred candidates per query out of millions. Classical lexical retrieval (BM25), modern dense embeddings, and the hybrids that combine them all live here. Get this stage wrong and the rest of the pipeline can't recover; get it right and a reranker downstream can finish the job. The concepts below cover what each technique actually does, where each one breaks, and why production systems almost always layer multiple methods rather than picking one.

Approximate Nearest Neighbor (ANN)

ANN algorithms — HNSW, IVF, ScaNN — find the closest vectors to a query without scanning all of them. They give up a small slice of recall in exchange for orders-of-magnitude speedup.
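
The recall given up by an ANN index is usually measured against exact search. The sketch below shows one common way to compute recall@k; the function name and the sample ID lists are illustrative assumptions, not part of any particular library.

```python
def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the ANN index also returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k

# Hypothetical results for one query: exact search vs. an ANN index.
exact_ids = [7, 2, 9, 4, 1]
approx_ids = [7, 9, 2, 5, 1]

recall = recall_at_k(approx_ids, exact_ids, k=5)  # 4 of the 5 true neighbors found
```

Averaging this value over a sample of queries gives the recall figure that is traded against latency when tuning an index.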

BM25

BM25 is a classical lexical retrieval algorithm that scores documents by how well their term frequencies match a query, with corrections for document length and rare-term importance.
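
A minimal sketch of the scoring function, assuming whitespace-tokenized input, the common defaults `k1=1.2` and `b=0.75`, and the Lucene-style IDF variant that stays positive. Production engines differ in tokenization and in small details of the formula.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms get a larger IDF weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # k1 saturates term frequency; b controls length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "the cat sat on the mat".split(),
    "dogs and cats are common pets".split(),
    "the stock market fell sharply today".split(),
]
scores = bm25_scores("cat mat".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Note that the second document scores zero even though it mentions "cats": without stemming, lexical matching is exact, which is one of the gaps dense retrieval is meant to cover.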

Chunking

Chunking is the process of splitting long documents into smaller passages that fit cleanly inside an embedding model's context window — and that align with semantic boundaries so each chunk is independently retrievable.
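
One simple way to respect semantic boundaries is to pack whole sentences into a chunk until a size budget is reached. The sketch below assumes a word count as a rough proxy for the embedding model's token limit and a naive punctuation-based sentence splitter; both are simplifications.

```python
import re

def chunk_text(text, max_words=40):
    """Greedily pack whole sentences into chunks of at most max_words words."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

text = (
    "BM25 is lexical. Dense retrieval uses embeddings. "
    "Hybrid search combines both. Rerankers refine the top results."
)
chunks = chunk_text(text, max_words=8)
```

A sentence longer than the budget still becomes its own oversized chunk here; a real chunker would need a fallback split for that case.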

Dense Retrieval

Dense retrieval finds documents by comparing their embeddings to a query embedding with cosine similarity or dot product, usually served from an approximate-nearest-neighbor index.
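
The comparison itself is small enough to sketch directly. The example below assumes hand-made 3-dimensional vectors in place of real embeddings and uses exact brute-force search in place of an ANN index; the document IDs are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dense_search(query_vec, doc_vecs, k=2):
    """Exact top-k by cosine similarity (an ANN index would approximate this)."""
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
        reverse=True,
    )
    return [(doc_id, score) for score, doc_id in scored[:k]]

# Toy stand-ins for embeddings produced by a real model.
doc_vecs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "api-reference": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.3, 0.0]
top = dense_search(query_vec, doc_vecs, k=2)
```

If the embeddings are L2-normalized ahead of time, cosine similarity and dot product produce the same ranking, which is why many indexes store normalized vectors.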

FAISS

FAISS (Facebook AI Similarity Search) is a C++ library, with Python bindings, for efficient similarity search and clustering of dense vectors. It implements exact flat search alongside the canonical ANN index types (IVF, HNSW, PQ, and combinations of them), with CPU and GPU backends.

First-Pass Retrieval

First-pass retrieval is the initial wide-net stage of a production search pipeline that surfaces a few hundred candidate documents per query out of millions. It optimizes for recall and speed; precision-at-the-top is left to a reranker downstream.

Grounded Generation

Grounded generation is the pattern of forcing an LLM's output to be derivable from a supplied set of retrieved sources, with citations attached. It is the standard defense against hallucination in RAG pipelines.

HNSW

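HNSW (Hierarchical Navigable Small World) is the dominant graph-based ANN algorithm. It builds a multi-layer proximity graph and answers a query with a greedy walk on each layer, from the sparse top layer down to the dense bottom layer, which gives roughly logarithmic search time in practice.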

Hybrid Search

Hybrid search combines lexical retrieval (BM25) with dense retrieval (embeddings) into one ranked candidate set. Each method catches what the other misses, so the union has higher recall than either method alone.

Inverted Index

An inverted index maps each term to the list of documents (and positions) where it appears. The classical data structure behind keyword search — sub-millisecond lookups over billions of documents and the substrate every BM25 implementation builds on.
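
The data structure can be sketched with nested dictionaries. This is a toy in-memory version, assuming lowercase whitespace tokenization; real engines compress the posting lists and store them on disk.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to {doc_id: [positions where the term occurs]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def search_all_terms(index, query):
    """Return the sorted doc ids that contain every query term (boolean AND)."""
    postings = [set(index.get(term, {})) for term in query.lower().split()]
    return sorted(set.intersection(*postings)) if postings else []

docs = {
    1: "fast vector search",
    2: "keyword search with an inverted index",
    3: "vector index compression",
}
index = build_inverted_index(docs)
hits = search_all_terms(index, "vector index")
```

Only the posting lists for the query terms are touched, so lookup cost depends on how common those terms are rather than on the size of the whole corpus.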

IVF Clustering

IVF (Inverted File Index) is the cluster-based ANN algorithm: K-means partitions the corpus into cells (typically a few thousand), each query is matched to its nearest centroids, and only the vectors in those cells are searched exhaustively.
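
The build and search steps can be sketched in plain Python. This is a toy version under stated assumptions: squared Euclidean distance, K-means seeded with the first `k` points so the result is deterministic, and a probe count called `nprobe` (a name borrowed from common usage, not a required API).

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    return min(range(len(centroids)), key=lambda c: dist2(vec, centroids[c]))

def kmeans(points, k, iters=10):
    """Plain Lloyd's K-means, seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for p in points:
            cells[nearest(p, centroids)].append(p)
        for c, members in enumerate(cells):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def build_ivf(points, k):
    """Return the centroids and, per cell, the indices of its member points."""
    centroids = kmeans(points, k)
    cells = [[] for _ in range(k)]
    for idx, p in enumerate(points):
        cells[nearest(p, centroids)].append(idx)
    return centroids, cells

def ivf_search(query, points, centroids, cells, nprobe=1, top_k=1):
    # Probe only the nprobe cells whose centroids are closest to the query.
    order = sorted(range(len(centroids)), key=lambda c: dist2(query, centroids[c]))
    candidates = [i for c in order[:nprobe] for i in cells[c]]
    # Exhaustive search, but restricted to the probed cells.
    return sorted(candidates, key=lambda i: dist2(query, points[i]))[:top_k]

points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
centroids, cells = build_ivf(points, k=2)
result = ivf_search((9, 9), points, centroids, cells, nprobe=1, top_k=1)
```

Raising `nprobe` scans more cells, which recovers neighbors that sit just across a cell boundary at the cost of more distance computations.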

Parent-Document Retrieval

Parent-document retrieval splits the index granularity from the context granularity: embed and retrieve over small chunks for precision, but return the larger parent document to the LLM. Fixes the chunk-boundary problem in RAG.
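
A sketch of the idea, assuming fixed-size word windows as chunks and plain term overlap as a stand-in for embedding similarity. The document IDs and texts are hypothetical.

```python
def split_into_chunks(parents, size=6):
    """Split each parent into word windows, remembering the parent id."""
    chunks = []
    for parent_id, text in parents.items():
        words = text.split()
        for start in range(0, len(words), size):
            chunks.append((parent_id, " ".join(words[start:start + size])))
    return chunks

def overlap(query, chunk):
    # Stand-in for embedding similarity: number of shared lowercase terms.
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve_parents(query, parents, chunks, k=1):
    """Rank small chunks, then return the full parent documents they belong to."""
    ranked = sorted(chunks, key=lambda c: overlap(query, c[1]), reverse=True)
    parent_ids = []
    for parent_id, _ in ranked:
        if parent_id not in parent_ids:  # dedupe while keeping rank order
            parent_ids.append(parent_id)
    return [parents[p] for p in parent_ids[:k]]

parents = {
    "handbook": "Employees accrue vacation monthly. Unused vacation days "
                "roll over for one year. Contact HR for exceptions.",
    "security": "Rotate API keys every ninety days. Store secrets in the "
                "vault. Never commit credentials to git.",
}
chunks = split_into_chunks(parents)
context = retrieve_parents("vacation days roll over", parents, chunks, k=1)
```

The best-matching chunk starts mid-sentence ("days roll over for one year."), but the LLM receives the whole handbook entry, so the surrounding context is not lost.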

Product Quantization

Product quantization (PQ) compresses a vector by splitting it into M sub-vectors and quantizing each independently against a small codebook learned via K-means.
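
The encode and decode steps can be sketched as follows. The example assumes a vector dimension divisible by M, squared Euclidean distance, and K-means seeded with the first `k` sub-vectors for determinism; the tiny 4-dimensional dataset is made up for illustration.

```python
def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    return min(range(len(centroids)), key=lambda c: dist2(vec, centroids[c]))

def kmeans(points, k, iters=10):
    """Plain Lloyd's K-means, seeded with the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        cells = [[] for _ in range(k)]
        for p in points:
            cells[nearest(p, centroids)].append(p)
        for c, members in enumerate(cells):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def split(vec, m):
    """Cut a vector into m equal-length sub-vectors."""
    step = len(vec) // m
    return [vec[i * step:(i + 1) * step] for i in range(m)]

def train_pq(vectors, m, k):
    # One independent codebook of k centroids per sub-vector position.
    return [kmeans([split(v, m)[j] for v in vectors], k) for j in range(m)]

def encode(vec, codebooks):
    """Replace each sub-vector with the index of its nearest centroid."""
    return [nearest(sub, cb) for sub, cb in zip(split(vec, len(codebooks)), codebooks)]

def decode(code, codebooks):
    """Approximate reconstruction: concatenate the selected centroids."""
    return [x for c, cb in zip(code, codebooks) for x in cb[c]]

vectors = [[0, 0, 1, 1], [4, 4, 9, 9], [0, 2, 9, 7], [4, 6, 1, 3]]
codebooks = train_pq(vectors, m=2, k=2)
codes = [encode(v, codebooks) for v in vectors]
approx = decode(codes[2], codebooks)
```

Each vector is stored as M small integers instead of its full set of floats. The reconstruction is lossy: `[0, 2, 9, 7]` decodes to the centroid values rather than to the original.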

Query Expansion

Query expansion augments the original query with synonyms, paraphrases, or hypothetical answers before retrieval. It is a classical IR technique that LLMs revived in forms such as HyDE. Sometimes it is a clean win; sometimes it introduces drift that hurts more than it helps.

Query Rewriting

Query rewriting transforms a user's raw query into one or more reformulated versions tuned for retrieval — expanding abbreviations, decomposing multi-part questions, or fixing the syntax expected by an underlying search API.

RAG (Retrieval-Augmented Generation)

RAG is the pattern of retrieving relevant documents and feeding them into an LLM as context, so the LLM can answer with grounded, citeable information instead of guessing from its training data.
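
The "feeding them into an LLM as context" step usually amounts to prompt assembly. The sketch below shows one possible layout with numbered sources so the answer can cite them; the prompt wording and function name are illustrative assumptions, and the call to an actual LLM is left out.

```python
def build_rag_prompt(question, passages):
    """Assemble a prompt that asks for an answer grounded in numbered sources."""
    sources = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Passages would normally come from the retrieval stage.
passages = [
    "BM25 is a lexical ranking function.",
    "RRF merges ranked lists.",
]
prompt = build_rag_prompt("What is BM25?", passages)
```

The numbered markers give the model something concrete to cite, which makes it possible to check each claim in the answer against its source.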

Reciprocal Rank Fusion

Reciprocal rank fusion (RRF) is the boring, nearly parameter-free way to merge multiple ranked lists into one. Each document's fused score is the sum of $1/(k + \text{rank})$ over the lists that contain it, with $k = 60$ as the conventional default. It is the default fusion method in many production hybrid-search stacks.
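
The formula translates almost line for line into code. A minimal sketch, assuming 1-based ranks and hypothetical document IDs from a lexical and a dense retriever:

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists by summing 1 / (k + rank) for each document."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc_a", "doc_b", "doc_c"]
dense_hits = ["doc_c", "doc_a", "doc_d"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

Only ranks are used, never raw scores, so BM25 scores and cosine similarities do not need to be normalized onto a common scale before fusing.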

Semantic Search

Semantic search is the umbrella term for retrieval that goes beyond surface keyword matching to capture meaning — most often via dense embeddings, but also via learned-sparse models, query rewriting, and reranking.

Sparse Retrieval

Sparse retrieval is the family of methods that represent queries and documents as high-dimensional sparse vectors over a vocabulary — including BM25 and modern learned-sparse models like SPLADE and uniCOIL.
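
Because almost every entry in such a vector is zero, it is natural to store only the non-zero weights. The sketch below represents each vector as a term-to-weight dictionary and scores by dot product; the weights are made up, where a real system would get them from BM25 statistics or a learned-sparse model.

```python
def sparse_dot(query_vec, doc_vec):
    """Dot product of two sparse vectors stored as {term: weight} dicts."""
    # Iterate over the smaller vector; only shared terms contribute.
    if len(query_vec) > len(doc_vec):
        query_vec, doc_vec = doc_vec, query_vec
    return sum(w * doc_vec[t] for t, w in query_vec.items() if t in doc_vec)

query = {"reranker": 1.0, "latency": 0.5}
docs = {
    "doc_1": {"reranker": 2.0, "model": 1.0},
    "doc_2": {"latency": 2.0, "benchmark": 1.0, "reranker": 0.5},
}
ranking = sorted(docs, key=lambda d: sparse_dot(query, docs[d]), reverse=True)
```

Since only shared terms contribute to the score, the same computation can be served from an inverted index by walking the posting lists of the query's non-zero terms.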

SPLADE

SPLADE (SParse Lexical AnD Expansion) is a learned sparse retrieval model: a transformer produces a sparse term-weight vector over the BERT vocabulary for each query and document, scored by dot product on an inverted index.

TF-IDF

TF-IDF weighs a term by how often it appears in a document (term frequency) times how rare it is across the corpus (inverse document frequency).
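
A minimal sketch of one common variant, assuming length-normalized term frequency and an unsmoothed logarithmic IDF. Many other weighting schemes go by the same name.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    weights = []
    for d in docs:
        tf = Counter(d)
        weights.append(
            {t: (tf[t] / len(d)) * math.log(n_docs / df[t]) for t in tf}
        )
    return weights

docs = [
    "the cat sat".split(),
    "the dog sat".split(),
    "the cat ran".split(),
]
weights = tf_idf(docs)
```

A term that appears in every document ("the") gets weight zero, while a term unique to one document ("dog") gets the highest weight in that document.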