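[Figure: how a cross-encoder scores a pair. The query ("what is RAG?") and the document ("RAG combines...") are tokenized and concatenated between [CLS] and [SEP] tokens, every token attends to every other, and a linear head maps the [CLS] hidden state to a relevance score (0.78 in the example).]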
Add Accuracy, Not Latency

A reranker reorders your search candidates so the actual answer beats the lookalike sitting next to it. zerank-2 tops every public reranker leaderboard at 2–3× the speed of competing rerankers and a fraction of the cost of an LLM doing the same job.

Models
zerank-2
Our flagship reranker
SOTA performance · 4B · instruction-following · multilingual · open weights
zerank-2-nano
Latency-tuned, on the API
0.6B · lowest P50 · closed weights
zerank-2-small
Mid-size · private beta
1.7B · closed weights · contact us
Quantized inference
4–5× throughput at frontier accuracy
custom kernels · much lower P50 · same NDCG
The Problem

Embeddings find related docs. Rerankers find relevant ones.

First-stage retrieval — BM25, dense embeddings, hybrids — surfaces a few hundred candidates per query. Without a reranker, your top-1 is whatever happened to be closest in vector space. Usually that's an answer-shaped distractor sitting next to an actual answer.
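The two-stage flow described here can be sketched in a few lines. This is a minimal illustration, not ZeroEntropy's implementation: `first_stage` uses shared-word counts as a stand-in for BM25 or embeddings, and `toy_relevance` is a hypothetical placeholder for a real cross-encoder score.

```python
import re
from typing import Callable, List, Tuple


def words(text: str) -> set:
    """Lowercased word set; punctuation is dropped."""
    return set(re.findall(r"\w+", text.lower()))


def first_stage(query: str, corpus: List[str], n: int) -> List[str]:
    """Cheap candidate retrieval by shared-word count (stand-in for BM25 or embeddings)."""
    q = words(query)
    return sorted(corpus, key=lambda doc: len(q & words(doc)), reverse=True)[:n]


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    k: int,
) -> List[Tuple[float, str]]:
    """Score every (query, candidate) pair with the expensive model and keep the top k."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def toy_relevance(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder; a real reranker reads both texts jointly."""
    return 1.0 if "retrieval" in words(doc) else 0.0


corpus = [
    "A rag is what a cheap cotton cleaning cloth is called",  # answer-shaped distractor
    "RAG combines retrieval with generation",                 # actual answer
    "Bananas are yellow",
]
query = "what is RAG?"

candidates = first_stage(query, corpus, n=2)
top = rerank(query, candidates, toy_relevance, k=1)

print("first-stage top-1:", candidates[0])
print("reranked top-1:   ", top[0][1])
```

The first stage puts the distractor on top because it shares more surface words with the query; the second pass only has to score the short candidate list, which is what makes a heavier model affordable there.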

0.78
NDCG@10

zerank-2 leads the reranker leaderboard across 29 evaluation datasets and three independent LLM judges.

149.7 ms
P50 latency

Per rerank call on 12 KB documents — 2–3× faster than Jina and Cohere at superior quality.

12–17×
faster than LLMs

Compared to a frontier listwise LLM reranker like GPT-5-mini, at a fraction of the cost and higher NDCG@10.

Benchmark

NDCG@10 average across 29 datasets

See evals
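For readers new to the metric, here is a minimal sketch of NDCG@10. The page does not say which gain variant the evals use, so this assumes linear gain with a log2 rank discount and computes the ideal ordering from the same judged list; the relevance labels are made up for illustration.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k items of a ranked list."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10):
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal_dcg = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents were judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# Hypothetical graded labels (higher = more relevant), in ranked order.
first_stage_order = [0, 2, 0, 3, 1]  # best document buried at rank 4
reranked_order = [3, 2, 1, 0, 0]     # best documents first

print(round(ndcg_at_k(first_stage_order), 3))
print(round(ndcg_at_k(reranked_order), 3))
```

A leaderboard figure such as an average across datasets would then be the mean of per-query scores within each dataset, averaged again across datasets.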
What Teams Are Saying

ZeroEntropy gave us state-of-the-art clinical accuracy across millions of medical research papers — both for simple retrieval and for our Deep Research use case via the MCP server.

Clinical AI, Vera Health

We replaced our reranker with ZeroEntropy and saw an immediate jump in answer quality on our customer support corpus. The latency was the part we expected to lose; it actually got better.

Engineering, Assembled

Memory recall accuracy went up meaningfully across our agent benchmarks once we wired zerank-2 into the retrieval path.

Engineering, Mem0
Specs & Performance

zerank-2: The World's Best Reranker

Cross-encoder reranker, calibrated, multilingual, instruction-following. The numbers below are from the zerank-2 launch evals and the latency assessment under Poisson production load.

Specs
Parameters: 4B (flagship) · 1.7B (small) · 0.6B (nano)
Architecture: Cross-encoder · open weights on the 4B
Context window: 32K tokens
Languages: 100+ · near-English parity on major ones · code-switch robust
Outputs: Calibrated 0–1 relevance score + per-call confidence statistic
Instructions: Native; append context, abbreviation tables, business rules per call
Pricing: $0.025 / 1M tokens, 50% under every other commercial reranker

Performance
NDCG@10 (29 datasets, 3 LLM judges): 0.7625, #1 across public reranker leaderboards
P50 latency · 12 KB docs: 149.7 ms, 2–3× faster than Cohere & Jina at higher quality
Latency tail · Poisson load: 2.7% over 500 ms · 0.9% over 1 s · 0% over 3 s · zero failures
vs Cohere rerank-3.5 · >500 ms: 2.7% vs 14.3%
vs Jina reranker m0 · >500 ms: 2.7% vs 70.8%
vs frontier listwise LLMs: 12–17× faster at higher NDCG@10 and a fraction of the cost
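To make the latency rows concrete, the sketch below shows one way P50 and tail fractions like "% over 500 ms" could be computed from raw per-call latencies, plus how request times for a Poisson load can be drawn. The function names, thresholds, and sample values are illustrative assumptions, not ZeroEntropy's benchmark harness.

```python
import random
import statistics


def latency_summary(latencies_ms, thresholds_ms=(500, 1000, 3000)):
    """Return the median latency and the fraction of calls slower than each threshold."""
    n = len(latencies_ms)
    tail = {t: sum(1 for x in latencies_ms if x > t) / n for t in thresholds_ms}
    return {"p50_ms": statistics.median(latencies_ms), "tail": tail}


def poisson_arrivals(rate_per_s, n_requests, seed=0):
    """Arrival times of a Poisson process: gaps between requests are exponential."""
    rng = random.Random(seed)
    now, times = 0.0, []
    for _ in range(n_requests):
        now += rng.expovariate(rate_per_s)
        times.append(now)
    return times


# Made-up latencies, purely to exercise the functions.
sample_ms = [120.0, 150.0, 180.0, 620.0]
print(latency_summary(sample_ms))
print(poisson_arrivals(rate_per_s=5.0, n_requests=3))
```

Poisson arrivals matter because they produce bursts; a fixed-interval load generator would hide the queueing that shows up in the tail.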
Methodology

zELO — train on the easier question

Ask two careful raters “is this document relevant?” and you'll get two different answers — relevance is fuzzy. Ask them “which of these two is more relevant?” and they'll agree. zELO trains exclusively on the easy question, then converts those head-to-head wins into continuous relevance scores. Same math that ranks chess players.

[Figure: The Same Math Behind Chess Elo. Pointwise panel (Q: "how relevant is this, 0 to 1?"): the same (q, d) gets a different score on every call, for example 0.32, 0.71, 0.48. Pairwise panel (Q: "which answers it better, A or B?"): the same (q, A, B) gets the same answer, B, on every call. Pairwise outcomes (B beats A, B beats C, A beats C, B beats D, C beats D) go through a Thurstone fit to produce one continuous relevance score per document, and the reranker is trained on those scores.]
01

Pointwise scoring is noisy

Ask a rater to score (query, document) on a 0–1 scale and you get a different number every time — different raters disagree, and the same rater either drifts or discretizes hard. The label noise is the ceiling on every reranker trained against it.

02

Pairwise comparisons are stable

Ask the same rater 'given this query, is document A or B more relevant?' and the answer barely moves across raters or across calls. The information density is much higher per judgment, and the disagreement is much lower.

03

Recover scores via Thurstone — like chess Elo

Many pairwise outcomes (A beats B, B beats C, A beats C, …) feed a Thurstone fit — the same statistical idea behind chess Elo. Out comes one continuous relevance score per document, calibrated against every comparison we've seen. Those scores are the SFT target.

04

LLMs as raters, calibrated and mixed (zerank-2)

Frontier LLMs (Claude, GPT, Gemini) are the raters — more consistent than humans and able to work at scale. zerank-2 extends this with a per-rater calibration: we fit a (μ, κ) Beta distribution to each model and iteratively mix them, so each rater's judgment is weighted by how reliable it has actually been.
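Step 03 can be sketched as a small maximum-likelihood fit. This is an illustration under stated assumptions, not the production zELO code: the win probability P = ½(1 + erf(Δ)) follows the formula shown in the paper's pipeline diagram, while the plain gradient ascent, the small ridge penalty (which keeps undefeated documents finite), and the zero-mean centering are choices made here for simplicity. The outcomes are the five comparisons from the illustration in this section.

```python
import math


def win_prob(delta):
    """Probability that the first item beats the second, given their score gap."""
    return 0.5 * (1.0 + math.erf(delta))


def fit_thurstone(n_items, outcomes, steps=2000, lr=0.05, l2=0.01):
    """Fit one score per item from (winner, loser) index pairs by gradient ascent."""
    scores = [0.0] * n_items
    for _ in range(steps):
        # Gradient of the ridge penalty -(l2 / 2) * sum(s^2).
        grad = [-l2 * s for s in scores]
        for winner, loser in outcomes:
            delta = scores[winner] - scores[loser]
            p = max(win_prob(delta), 1e-12)
            # Derivative of log win_prob(delta) with respect to delta.
            g = math.exp(-delta * delta) / (math.sqrt(math.pi) * p)
            grad[winner] += g
            grad[loser] -= g
        scores = [s + lr * g for s, g in zip(scores, grad)]
        mean = sum(scores) / n_items
        scores = [s - mean for s in scores]  # only score differences are identifiable
    return scores


names = "ABCD"
# (winner, loser): B beats A, B beats C, A beats C, B beats D, C beats D.
outcomes = [(1, 0), (1, 2), (0, 2), (1, 3), (2, 3)]
scores = fit_thurstone(len(names), outcomes)

for name, score in sorted(zip(names, scores), key=lambda pair: -pair[1]):
    print(name, round(score, 3))
```

Note that A and C each win exactly one comparison, yet the fit can still separate them because it accounts for who each one beat and lost to, which is the advantage over simply counting wins.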

Paper

Read the work behind the model

zELO: ELO-inspired Training Method for Rerankers and Embedding Models

Pipitone, Houir Alami, Avadhanam, Kaminskyi, Khoo

We introduce a novel training methodology named zELO, which optimizes retrieval performance via the analysis that ranking tasks are statically equivalent to a Thurstone model. Trained end-to-end from unannotated queries and documents in less than 10,000 H100-hours, zerank-1 achieves the highest retrieval scores across finance, legal, code, and STEM — outperforming closed-source proprietary rerankers on NDCG@10 and Recall.

arXiv abstract
[Figure: Pairwise Preferences → Continuous Relevance Scores]
Step 1 · Ground truth: frontier LLMs (Claude, GPT, Gemini, each with chain-of-thought) judge one random pair (q, dᵢ, dⱼ) per query and are ensembled into pᵢⱼ. 112K queries · 112K gold pairs · expensive and slow, but gold.
Step 2 · Distill pairwise: a pairwise SLM reranker R'_pair, initialized from Qwen3-4B, is trained with ℒ = BCE(pᵢⱼ, p'ᵢⱼ). About 1000× faster, with near-ensemble accuracy at SLM speed.
Step 3 · zELO fit: inference runs over a sparse graph of pairs built as the union of k/2 random cycles (k = 4, diameter 2 in the illustration, about 0.4% of all pairs). A Thurstone fit with P = ½(1 + erf(Eloᵢ − Eloⱼ)) yields fitted Elos per (q, d).
Step 4 · Distill pointwise: a pointwise reranker (Qwen3-4B → zerank-1) is trained on about 5M (q, d, Elo) targets with ℒ = (R(q, d) − Elo)² and scores each (q, d) in a single forward pass.
112K LLM-ensemble inferences → 5M MSE targets · no human annotations.
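The paper's pipeline figure mentions comparing only a sparse set of pairs, built as "k/2 random cycles, unioned". A minimal sketch of that sampling idea follows; the function name, the seed handling, and the candidate count in the usage lines are assumptions made here, and the paper may construct its graph differently.

```python
import random


def sample_pair_graph(n_docs, k=4, seed=0):
    """Union of k/2 random cycles through all documents, as a set of (i, j) with i < j."""
    rng = random.Random(seed)
    edges = set()
    for _ in range(k // 2):
        order = list(range(n_docs))
        rng.shuffle(order)
        # Connect each document to the next one in the shuffled order, wrapping around.
        for a, b in zip(order, order[1:] + order[:1]):
            edges.add((min(a, b), max(a, b)))
    return edges


# Hypothetical candidate count for one query.
n_docs = 1000
edges = sample_pair_graph(n_docs, k=4)
all_pairs = n_docs * (n_docs - 1) // 2
print(len(edges), "pairs to compare out of", all_pairs)
print("fraction:", len(edges) / all_pairs)
```

Each cycle keeps the graph connected and gives every document two comparisons, so the union has at most n·k/2 edges, which is at most k/(n − 1) of all pairs. Connectivity is what lets a fit place every document on one common scale without comparing everything to everything.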
Ship Models That Work

Integrate ZeroEntropy models in minutes. Production-ready, latency-optimized, available everywhere.

AWS · Hugging Face · Azure
Partner Providers

Access all models through a single, latency-optimized API, or through our partner providers.

# Create an API Key at https://dashboard.zeroentropy.dev

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

response = zclient.models.rerank(
    model="zerank-2",
    query="What is Retrieval Augmented Generation?",
    documents=[
        "RAG combines retrieval with generation...",
    ],
)

for doc in response.results:
    print(doc)
API
ZeroEntropy API

Start building in minutes with Python and TypeScript SDKs.

VPC
ZeroEntropy VPC

Deploy in your own cloud with dedicated infrastructure. Available on AWS Marketplace and Azure.

Enterprise
Enterprise and Model Licensing

Custom deployments, dedicated capacity, model licensing, model fine-tuning, and SLAs. Talk to us.

Enterprise-Ready

From security to scale, ZeroEntropy is built for the demands of production-ready AI.

Compliance portal
SOC2 Type II

Audited controls for data security, availability, and confidentiality — verified annually.

HIPAA Compliant

BAA-ready infrastructure with encryption at rest and in transit for protected health data.

GDPR Compliant

Full data residency controls, right-to-deletion, and DPAs for EU customers.

CCPA Compliant

Consumer data rights honored with full transparency on collection, use, and deletion.

ZeroEntropy
The best AI teams build with ZeroEntropy models