Topic · 9 concepts

Rerankers

The second stage that puts the right answer at the top.

First-pass retrieval is fast and approximate; rerankers are slower and more precise. A reranker, typically a cross-encoder, takes a small candidate set from first-pass retrieval and reorders it by actual relevance, paying close attention to each (query, document) pair. The concepts below cover the architectural variants (cross-encoder vs bi-encoder, pointwise vs pairwise vs listwise), the production properties that matter (calibration, instruction-following, confidence), and the tradeoffs that make rerankers indispensable in any RAG pipeline aiming for high-quality answers.

Cascade Rerankers

A cascade reranker stacks multiple rerankers from cheap-and-fast to expensive-and-accurate, with each stage filtering candidates before passing a smaller set to the next.
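
The staging described above can be sketched in a few lines. This is a minimal illustration, not a production design: `cascade_rerank`, `cheap_overlap`, and `costly_phrase` are hypothetical names, and both scorers are toy stand-ins for real models (for example a small reranker followed by a large one).

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score


def cascade_rerank(
    query: str,
    candidates: Sequence[str],
    stages: Sequence[Tuple[Scorer, int]],
) -> List[str]:
    """Run candidates through (scorer, keep_top_k) stages, cheapest first."""
    current = list(candidates)
    for scorer, keep in stages:
        ranked = sorted(current, key=lambda doc: scorer(query, doc), reverse=True)
        current = ranked[:keep]  # only the survivors reach the next, costlier stage
    return current


def cheap_overlap(query: str, doc: str) -> float:
    """Toy cheap stage: number of words shared by query and document."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def costly_phrase(query: str, doc: str) -> float:
    """Toy 'expensive' stage: word overlap plus a bonus for an exact phrase match."""
    bonus = 2.0 if query.lower() in doc.lower() else 0.0
    return cheap_overlap(query, doc) + bonus


docs = [
    "how to reset a password",
    "password policy for admins",
    "reset your router",
    "cooking pasta at home",
]
# Stage 1 keeps 3 of 4 candidates; stage 2 scores only those 3 and keeps 1.
top = cascade_rerank("reset a password", docs, [(cheap_overlap, 3), (costly_phrase, 1)])
```

The expensive scorer runs on three documents instead of four here; with thousands of candidates and a real model in the last stage, that reduction is where the savings come from.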

ColBERT

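A late-interaction retrieval architecture: encode each token of the query and the document into its own vector, then score the pair with MaxSim (for each query token, take its maximum similarity over all document tokens, and sum those maxima). It sits between the bi-encoder (one vector per text, fast) and the cross-encoder (full attention, accurate but slow).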
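
As a rough illustration of the MaxSim scoring step, here is a sketch in plain Python. The 2-d token vectors are made up for the example; a real system would take them from a trained token-level encoder, and `maxsim_score` is a hypothetical helper name.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Sum over query tokens of the best dot product against any document token."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical unit-length token embeddings, for illustration only.
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a_tokens = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]  # has an exact match per query token
doc_b_tokens = [[0.6, 0.8]]                           # one token, partial match to both

score_a = maxsim_score(query_tokens, doc_a_tokens)
score_b = maxsim_score(query_tokens, doc_b_tokens)
```

Document A scores higher because each query token finds its own best-matching document token, which a single pooled vector per text could not express.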

Cross-Encoder

A cross-encoder takes a (query, document) pair as a single joint input and produces one relevance score. It captures token-level interactions between query and document — much more accurate than embedding them separately, at higher cost per pair.

Instruction-Following Reranker

An instruction-following reranker accepts an explicit instruction or context alongside the (query, document) pair, and reranks accordingly. Lets you inject business rules, user preferences, or domain context per call without retraining.

Listwise Reranking

Listwise reranking processes the entire candidate list as a single input and produces a permutation, rather than scoring each (query, document) pair independently. More expressive but more expensive — typically powered by an LLM.
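
One way to picture the listwise contract is a function that receives the whole list and returns a permutation of indices. The sketch below assumes that interface; `ListwiseModel` and `toy_model` are hypothetical, and `toy_model` is only a stand-in for the LLM call.

```python
from typing import Callable, List, Sequence

# Hypothetical interface: the model sees the query and the whole list at once
# and returns candidate indices ordered from most to least relevant.
ListwiseModel = Callable[[str, Sequence[str]], List[int]]


def listwise_rerank(query: str, candidates: Sequence[str], model: ListwiseModel) -> List[str]:
    order = model(query, candidates)
    # Guard against malformed output, e.g. a model dropping or repeating an item.
    if sorted(order) != list(range(len(candidates))):
        raise ValueError(f"not a permutation of the candidates: {order}")
    return [candidates[i] for i in order]


def toy_model(query: str, candidates: Sequence[str]) -> List[int]:
    """Stand-in for an LLM call: orders indices by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(
        range(len(candidates)),
        key=lambda i: -len(q & set(candidates[i].lower().split())),
    )


docs = ["shipping times", "refund policy for orders", "how to request a refund"]
ranked = listwise_rerank("request a refund", docs, toy_model)
```

The permutation check matters in practice because a generative model is not guaranteed to return every candidate exactly once.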

Pairwise Reranker

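A pairwise reranker scores by comparing two candidate documents head-to-head: `model(query, doc_A, doc_B) → which is more relevant`. More accurate than pointwise (it needs only relative judgments, so no calibration is required, and transitivity across comparisons can be exploited to build the full ranking), but comparing every pair costs $O(N^2)$ model calls at inference.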
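
To make the quadratic cost concrete, here is a sketch that compares every unordered pair once and ranks by win count. Round-robin aggregation is just one simple choice among several, and `toy_prefer` is a hypothetical stand-in for the pairwise model.

```python
from itertools import combinations
from typing import Callable, List, Sequence

# Hypothetical comparator: returns True when doc_a is at least as relevant as doc_b.
Prefer = Callable[[str, str, str], bool]


def pairwise_rerank(query: str, candidates: Sequence[str], prefer_a: Prefer) -> List[str]:
    """Round-robin: compare every unordered pair once, then rank by number of wins."""
    wins = [0] * len(candidates)
    for i, j in combinations(range(len(candidates)), 2):  # N*(N-1)/2 comparisons
        if prefer_a(query, candidates[i], candidates[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    order = sorted(range(len(candidates)), key=lambda k: wins[k], reverse=True)
    return [candidates[k] for k in order]


def toy_prefer(query: str, doc_a: str, doc_b: str) -> bool:
    """Stand-in for a pairwise model: prefers the doc sharing more query words."""
    q = set(query.lower().split())
    return len(q & set(doc_a.lower().split())) >= len(q & set(doc_b.lower().split()))


docs = [
    "gpu driver install",
    "install the nvidia gpu driver on linux",
    "linux kernel news",
    "nvidia earnings",
]
ranked = pairwise_rerank("nvidia gpu driver linux", docs, toy_prefer)
n_comparisons = len(docs) * (len(docs) - 1) // 2  # 6 model calls for 4 candidates
```

Four candidates need 6 comparisons; 100 candidates would need 4,950, which is why pairwise scoring is usually applied only to a short list.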

Pointwise Scoring

Pointwise scoring evaluates each (query, document) pair independently, producing one relevance score per pair. The dominant pattern for cross-encoder rerankers because it's simple, parallelizable, and produces absolute scores that can be calibrated.
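
A minimal sketch of the pointwise pattern follows. `toy_score` is a hypothetical stand-in for a cross-encoder; the point is only that each pair is scored without reference to any other candidate.

```python
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score


def pointwise_rerank(
    query: str, candidates: Sequence[str], score: Scorer
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair on its own, then sort by score."""
    # Each call is independent, so a real system could batch or parallelize them.
    scored = [(doc, score(query, doc)) for doc in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: fraction of query words found in the document."""
    q_words = query.lower().split()
    d_words = set(doc.lower().split())
    return sum(w in d_words for w in q_words) / len(q_words)


docs = [
    "update billing address",
    "how to cancel a subscription",
    "cancel my subscription today",
]
ranked = pointwise_rerank("cancel my subscription", docs, toy_score)
```

Adding or removing a candidate never changes another candidate's score, which is what makes N candidates cost exactly N model calls.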

Reranker

A reranker is a second-stage retrieval model that takes a candidate set from first-pass retrieval and reorders it by relevance. It's how production search systems get high precision without paying full LLM cost on every query.

Score Calibration (Rerankers)

A calibrated reranker outputs scores whose absolute value is meaningful: 0.8 means roughly an 80% likelihood of relevance, consistently across queries and domains, so you can threshold and filter reliably. Most rerankers are *not* calibrated.
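
One way to check this property on labeled data is a reliability table: bucket the scores, then compare the mean score in each bucket with the fraction of documents that were actually relevant. The sketch below uses made-up scores and labels, and `reliability_table` and `filter_by_threshold` are hypothetical helper names.

```python
from typing import Dict, List, Sequence, Tuple


def reliability_table(
    scores: Sequence[float], labels: Sequence[int], n_bins: int = 5
) -> List[Tuple[float, float, int]]:
    """Per non-empty bin: (mean predicted score, observed fraction relevant, count)."""
    bins: Dict[int, List[Tuple[float, int]]] = {}
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # clamp s == 1.0 into the last bin
        bins.setdefault(idx, []).append((s, y))
    table = []
    for idx in sorted(bins):
        items = bins[idx]
        mean_score = sum(s for s, _ in items) / len(items)
        frac_relevant = sum(y for _, y in items) / len(items)
        table.append((mean_score, frac_relevant, len(items)))
    return table


def filter_by_threshold(
    docs: Sequence[str], scores: Sequence[float], threshold: float
) -> List[str]:
    """Keep documents whose score clears a fixed cutoff."""
    return [d for d, s in zip(docs, scores) if s >= threshold]


# Made-up data: 5 docs scored 0.8 (4 truly relevant), 5 scored 0.2 (1 truly relevant).
scores = [0.8] * 5 + [0.2] * 5
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
docs = [f"doc-{i}" for i in range(10)]

table = reliability_table(scores, labels)
kept = filter_by_threshold(docs, scores, threshold=0.5)
```

In this toy data the mean score and the observed relevance rate agree in both bins, which is what calibration looks like. When they diverge, a fixed threshold such as 0.5 stops meaning the same thing from one query set to the next.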