Topic · 21 concepts

Training Methodology

How modern retrieval models get their relevance signal.

The supervision signal a retrieval model is trained on determines what it can learn. The concepts below cover how modern rerankers and embeddings get their relevance targets — pairwise preferences from frontier-LLM ensembles, Thurstone fits that recover continuous Elo-style scores, and the distillation pipelines that compress a giant teacher into a fast specialized student. This is the methodology family behind zerank-1, zerank-2, and zembed-1, and the same shape generalizes to any narrow task where pairwise judgments are cheap and absolute scores are noisy.

Catastrophic Forgetting

When fine-tuning a pre-trained model on a new task erases capabilities the base model originally had. The classical neural-network failure mode that dominates fine-tuning practice — and the reason LoRA, mixed-data training, and rehearsal exist.
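As a rough illustration of rehearsal (mixed-data training), the sketch below builds each fine-tuning batch from mostly new-task examples plus a fixed share of replayed examples from the original data. The function name `mixed_batch`, the `replay_fraction` default of 0.25, and the toy data layout are assumptions for illustration, not a recommended recipe.

```python
import random

def mixed_batch(new_data, replay_data, batch_size, replay_fraction=0.25, rng=None):
    """Sample one batch that mixes new-task and replayed (old-task) examples.

    Hypothetical helper: both pools must contain at least as many items as
    are drawn from them, because sampling is without replacement.
    """
    if rng is None:
        rng = random.Random()
    n_replay = round(batch_size * replay_fraction)
    n_new = batch_size - n_replay
    batch = rng.sample(replay_data, n_replay) + rng.sample(new_data, n_new)
    rng.shuffle(batch)  # avoid a fixed old-then-new ordering inside the batch
    return batch

new_examples = [("new", i) for i in range(100)]
old_examples = [("old", i) for i in range(100)]
batch = mixed_batch(new_examples, old_examples, batch_size=8, rng=random.Random(0))
```

With these settings, two of the eight examples in every batch come from the original data, so the model keeps seeing the old distribution while it learns the new one.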
Constitutional AI

Constitutional AI replaces human pairwise preference labels with a written constitution — a list of natural-language rules — and uses an LLM to critique and revise its own outputs against those rules.
DPO (Direct Preference Optimization)

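DPO is an alternative to RLHF that optimizes the LLM directly on pairwise preferences, with no separate reward model and no reinforcement learning loop. It rests on a closed-form relation between the optimal policy and the reward under a KL constraint, which turns preference learning into a classification-style loss. Simpler, more stable, and the default alignment recipe in 2026.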
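A minimal sketch of the per-pair DPO loss, assuming the inputs are summed token log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model. The function name and the `beta=0.1` default are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for a single (chosen, rejected) preference pair."""
    # How much more the policy favors each response than the reference does.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(x)) computed as softplus(-x) in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

# A policy identical to the reference gives log(2); favoring the chosen
# response more than the reference does pushes the loss below that.
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In a real training loop the same expression would be averaged over a batch and differentiated with respect to the policy's parameters only.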
Elo Score

Elo is a continuous skill rating recovered from pairwise win/loss outcomes — originally for chess, now repurposed in retrieval to convert pairwise document preferences into pointwise relevance scores.
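The classic Elo update can be sketched as follows, with documents standing in for chess players. The 400-point scale and the K-factor of 32 are chess conventions used here as assumptions; a retrieval pipeline may well pick different constants or fit all scores jointly instead of updating them one game at a time.

```python
def expected_score(rating_a, rating_b):
    """Probability that A beats B under the logistic Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, a_won, k=32.0):
    """Return updated (rating_a, rating_b) after one pairwise comparison."""
    outcome = 1.0 if a_won else 0.0
    delta = k * (outcome - expected_score(rating_a, rating_b))
    # The update is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta

# Two documents start level; document A is judged more relevant once.
doc_a, doc_b = elo_update(1500.0, 1500.0, a_won=True)
```

After enough comparisons, the ratings can be read as pointwise relevance scores: sorting documents by rating recovers a ranking consistent with the pairwise judgments.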
Ensemble Learning

Combining the predictions of multiple models — bagging, boosting, stacking — to get a single output more accurate than any individual member.
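The simplest form of combination is plain voting or score averaging, sketched below. This is deliberately not bagging, boosting, or stacking, which also change how the member models are trained or how their outputs are weighted; the function names and toy inputs are assumptions for illustration.

```python
from collections import Counter
from statistics import mean

def majority_vote(labels):
    """Return the label predicted by the most ensemble members."""
    return Counter(labels).most_common(1)[0][0]

def average_scores(per_model_scores):
    """Average scores item by item; one inner list of scores per model."""
    return [mean(item_scores) for item_scores in zip(*per_model_scores)]

# Three judges label one (query, document) pair.
label = majority_vote(["relevant", "irrelevant", "relevant"])

# Two models each score the same two documents.
combined = average_scores([[0.2, 0.8], [0.4, 0.6]])
```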
Entropy Regularization

Adding an entropy bonus to a training objective to keep the model's output distribution from collapsing onto a few outputs too early. Used in RL algorithms such as PPO, SAC, and A3C to encourage exploration.
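A minimal sketch of the idea, assuming a discrete output distribution given as a list of probabilities. The coefficient name `entropy_coef` and its default value are illustrative assumptions.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def regularized_loss(base_loss, probs, entropy_coef=0.01):
    """Subtract an entropy bonus so spread-out distributions are rewarded."""
    return base_loss - entropy_coef * entropy(probs)

# Same base loss, but the uniform policy is penalized less than the
# fully collapsed one.
uniform = regularized_loss(1.0, [0.5, 0.5], entropy_coef=0.1)
collapsed = regularized_loss(1.0, [1.0, 0.0], entropy_coef=0.1)
```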
Fine-Tuning

Fine-tuning is the process of further training a pre-trained model on task-specific or domain-specific data. It's how a generalist becomes a specialist.
Information Bottleneck

The information bottleneck principle frames learning as a compression problem: find a representation T of input X that throws away every bit of X that is not informative about the target Y. Formally, maximize I(T; Y) while minimizing I(X; T).
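The two goals are commonly combined into a single Lagrangian with a trade-off multiplier. The symbol β and the encoder notation p(t | x) follow the usual textbook convention and are assumptions here, since the text above does not name them:

```latex
\min_{p(t \mid x)} \; \mathcal{L}_{\mathrm{IB}} \;=\; I(X; T) \;-\; \beta \, I(T; Y), \qquad \beta > 0
```

A large β favors keeping information about Y; a small β favors compressing X more aggressively.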
Instruction Tuning

Instruction tuning is fine-tuning a pre-trained language model on (instruction, response) pairs so it learns to follow directions. The step that turns 'GPT-base' into 'GPT-instruct'.
Knowledge Distillation

Training a small (student) model to mimic the outputs of a larger (teacher) model — getting most of the teacher's quality at a fraction of the cost. The basis of essentially every production deployment of small specialized models.
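One common way to make a student mimic a teacher is the soft-label loss with a temperature, sketched below for classification-style logits. This is the classic formulation rather than a claim about any particular production pipeline; the function names and the temperature default are assumptions, and the sketch does not guard against student probabilities that underflow to zero.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax with the usual max-subtraction trick."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return kl * temperature ** 2

matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

When the teacher emits a scalar score instead of a distribution, as a reranker does, the same idea reduces to regressing the student's score onto the teacher's.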
Learning-Rate Scheduler

A learning-rate scheduler is the function that changes the learning rate over training. Linear warmup followed by cosine decay is the modern default; WSD (warmup-stable-decay) is the 2024 successor. Picking the schedule is as load-bearing as picking the peak LR.
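Linear warmup followed by cosine decay can be sketched as a pure function of the step index. The parameter names and the choice to reach the peak on the last warmup step are assumptions; libraries differ slightly in how they count steps.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Learning rate at a given step (0-indexed)."""
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the final warmup step.
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    # Cosine goes from 1 to -1, so the factor goes from 1 down to 0.
    cosine_factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine_factor

schedule = [warmup_cosine_lr(s, total_steps=100, warmup_steps=10, peak_lr=1.0)
            for s in range(101)]
```

A WSD schedule would differ only in the middle: the rate stays at `peak_lr` for most of training and decays in a short final phase.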
LoRA and Parameter-Efficient Fine-Tuning (PEFT)

LoRA injects small low-rank adapter matrices into a frozen base model and trains only those, typically around 1% of the parameters or less. Results often match full fine-tuning on narrow tasks at a fraction of the memory and storage cost.
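To make the low-rank idea concrete, the sketch below applies a frozen weight matrix `W` plus a scaled adapter `B @ A` to an input vector, using plain Python lists. The helper names, the `alpha` default, and the 4096-wide layer in the size calculation are assumptions for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, alpha=16.0):
    """Compute W x + (alpha / r) * B (A x); only A and B would be trained."""
    rank = len(A)                      # A is (r x d_in), B is (d_out x r)
    base = matvec(W, x)                # frozen pre-trained path
    update = matvec(B, matvec(A, x))   # low-rank adapter path
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

def lora_param_fraction(d_in, d_out, rank):
    """Trainable adapter parameters as a fraction of the full matrix."""
    return rank * (d_in + d_out) / (d_in * d_out)

# B starts at zero in the usual initialization, so the adapter is a no-op
# before training and the output equals the frozen layer's output.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[0.0], [0.0]]
y = lora_forward(W, A, B, [2.0, 3.0])

fraction = lora_param_fraction(4096, 4096, rank=8)
```

For that single 4096 by 4096 matrix at rank 8, the adapter holds about 0.4% as many parameters as the matrix it modifies.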
Pairwise Preference

Pairwise preference is the supervision signal where, for a query and two candidate documents, an annotator (or LLM) picks which one is more relevant.
PPO (Proximal Policy Optimization)

A policy-gradient algorithm that keeps each update close to the previous policy by clipping the importance-sampling ratio. Introduced by Schulman et al. (OpenAI, 2017), it is the standard RL optimizer for RLHF and the algorithm GPT-3.5/4 and Llama-2 were aligned with.
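The clipped surrogate objective for a single action can be sketched as below. The function name and the `clip_eps=0.2` default are assumptions; a full PPO implementation also adds a value loss and usually an entropy term, which are left out here.

```python
import math

def ppo_clip_objective(new_logp, old_logp, advantage, clip_eps=0.2):
    """Clipped surrogate objective for one action (to be maximized)."""
    # Importance-sampling ratio between the new and the old policy.
    ratio = math.exp(new_logp - old_logp)
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Taking the minimum makes the objective a pessimistic bound: the policy
    # gets no extra credit for moving the ratio outside the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)

unchanged = ppo_clip_objective(-1.0, -1.0, advantage=2.0)
capped = ppo_clip_objective(math.log(2.0), 0.0, advantage=1.0)
uncapped_penalty = ppo_clip_objective(math.log(2.0), 0.0, advantage=-1.0)
```

In training, this quantity is averaged over a batch of sampled actions and negated to form a loss.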
Process Reward Model

A process reward model (PRM) scores each intermediate step of a reasoning chain, not just the final answer. It's the supervision signal that powers post-o1 reasoning models — credit assignment along the trajectory, not only at the end.
Reward Modeling

Training a model that predicts a scalar quality or preference score for an LLM's output. The backbone of RLHF — the reward model is what the LLM optimizes against.
RLHF (Reinforcement Learning from Human Feedback)

RLHF is the classical alignment recipe: train a reward model from human pairwise preferences, then fine-tune the language model with PPO to maximize that reward.
Supervised Fine-Tuning (SFT)

SFT is plain supervised learning applied to a pre-trained language model: given (input, target) pairs, train the model to produce the target. The umbrella term for any fine-tuning that's not preference-based — distinct from RLHF and DPO.
Synthetic Data Generation

Using a frontier LLM to generate training data for a smaller specialized model. The dominant data-creation method in 2026 — every modern open-weight instruct model and most production-tuned rerankers train on synthetic data, including zerank-2.
Thurstone Model

A statistical model from 1927 that converts pairwise comparisons into continuous quality scores. Foundational to chess Elo ratings, food preference studies, and modern reranker training via the zELO methodology.
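A minimal sketch of a Thurstone-style fit, assuming the simplest variant of the model (Case V: equal, uncorrelated noise for every item). It uses the classical closed-form estimate in which each item's score is the average inverse normal CDF of its win rates. The `wins` matrix layout, the smoothing constant `eps`, and the function name are assumptions, and this is not presented as the fitting procedure zELO itself uses.

```python
from statistics import NormalDist

def thurstone_scores(wins, eps=0.5):
    """Estimate one score per item from pairwise win counts.

    wins[i][j] is the number of times item i was preferred over item j.
    """
    inverse_cdf = NormalDist().inv_cdf
    n = len(wins)
    scores = []
    for i in range(n):
        z_values = []
        for j in range(n):
            total = wins[i][j] + wins[j][i]
            if i == j or total == 0:
                continue  # skip self-comparisons and pairs never compared
            # Additive smoothing keeps the win rate strictly inside (0, 1),
            # where the inverse CDF is finite.
            win_rate = (wins[i][j] + eps) / (total + 2 * eps)
            z_values.append(inverse_cdf(win_rate))
        scores.append(sum(z_values) / len(z_values) if z_values else 0.0)
    return scores

# Three documents, ten pairwise votes per pair.
wins = [
    [0, 8, 9],
    [2, 0, 7],
    [1, 3, 0],
]
scores = thurstone_scores(wins)
```

Only differences between scores are meaningful; the scale and zero point are arbitrary. When some pairs were never compared, the averaging above is only a heuristic, and an iterative maximum-likelihood fit is the more principled choice.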
zELO

ZeroEntropy's training methodology for rerankers and embeddings. Frontier LLMs vote pairwise on document relevance; a Thurstone fit recovers continuous Elo-style scores; the scores become regression targets for a small specialized model.