Add Accuracy, Not Latency
Reorders your search candidates so the actual answer beats the lookalike sitting next to it. zerank-2 tops every public reranker leaderboard at 2–3× the speed and a fraction of the cost of an LLM doing the same job.
Embeddings find related docs. Rerankers find relevant ones.
First-stage retrieval — BM25, dense embeddings, hybrids — surfaces a few hundred candidates per query. Without a reranker, your top-1 is whatever happened to be closest in vector space. Usually that's an answer-shaped distractor sitting next to an actual answer.
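The two-stage shape can be sketched as follows. This is a minimal illustration, not ZeroEntropy's API: the cosine retrieval and the stand-in scoring function are placeholders for a real vector index and a real cross-encoder.

```python
import math

def cosine(a, b):
    # Cosine similarity between two dense vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_then_rerank(query_vec, doc_vecs, docs, score_fn, k=100):
    # Stage 1: cheap nearest-neighbor search over the whole corpus.
    candidates = sorted(range(len(docs)),
                        key=lambda i: -cosine(query_vec, doc_vecs[i]))[:k]
    # Stage 2: an expensive scorer rescores only the candidates,
    # so the true answer can overtake the answer-shaped distractor.
    return sorted(candidates, key=lambda i: -score_fn(docs[i]))

# Toy corpus: the distractor is closest in vector space,
# but the reranker's relevance score puts the answer first.
docs = ["distractor", "answer", "off-topic"]
vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
relevance = {"distractor": 0.2, "answer": 0.9, "off-topic": 0.1}
print(retrieve_then_rerank([1.0, 0.0], vecs, docs, relevance.get, k=2))  # [1, 0]
```

The point of the pattern: the first stage bounds cost by shrinking the candidate set, and the second stage spends its accuracy budget only where it matters.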
zerank-2 leads the reranker leaderboard across 29 evaluation datasets and three independent LLM judges.
Per rerank call on 12 KB documents — 2–3× faster than Jina and Cohere at superior quality.
Compared to a frontier listwise LLM reranker like GPT-5-mini, at a fraction of the cost and higher NDCG@10.
“ZeroEntropy gave us state-of-the-art clinical accuracy across millions of medical research papers — both for simple retrieval and for our Deep Research use case via the MCP server.”
“We replaced our reranker with ZeroEntropy and saw an immediate jump in answer quality on our customer support corpus. The latency was the part we expected to lose; it actually got better.”
“Memory recall accuracy went up meaningfully across our agent benchmarks once we wired zerank-2 into the retrieval path.”
zerank-2: The World's Best Reranker
Cross-encoder reranker, calibrated, multilingual, instruction-following. The numbers below are from the zerank-2 launch evals and the latency assessment under Poisson production load.
- Parameters: 4B (flagship) · 1.7B small · 0.6B nano
- Architecture: cross-encoder · open weights on the 4B
- Context window: 32K tokens
- Languages: 100+ · near-English parity on major ones · code-switch robust
- Outputs: calibrated 0–1 relevance score + per-call confidence statistic
- Instructions: native — append context, abbreviation tables, business rules per call
- Pricing: $0.025 / 1M tokens — 50% under every other commercial reranker
- NDCG@10 (29 datasets, 3 LLM judges): 0.7625 — #1 across public reranker leaderboards
- P50 latency (12 KB docs): 149.7 ms — 2–3× faster than Cohere & Jina at higher quality
- Latency tail (Poisson load): 2.7% over 500 ms · 0.9% over 1 s · 0% over 3 s · zero failures
- vs Cohere rerank-3.5 (>500 ms tail): 2.7% vs 14.3%
- vs Jina reranker m0 (>500 ms tail): 2.7% vs 70.8%
- vs frontier listwise LLMs: 12–17× faster at higher NDCG@10 and a fraction of the cost
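Because instruction-following is native, domain context can travel with each call. A minimal sketch of building such a query string; the abbreviation table, the business rule, and the "Instructions:" phrasing are all hypothetical, shown only to illustrate attaching per-call context.

```python
# Hypothetical per-call context for an instruction-following reranker:
# an abbreviation table plus one business rule, prepended to the query.
abbreviations = {"MI": "myocardial infarction", "HTN": "hypertension"}
rule = "Prefer peer-reviewed clinical guidelines over news coverage."

table = "; ".join(f"'{k}' means {v}" for k, v in abbreviations.items())
query = f"Instructions: {table}. {rule}\nQuery: latest MI treatment protocols"
print(query)
```

The assembled string is then passed as the query in a normal rerank call, as in the quickstart below.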
zELO — train on the easier question
Ask two careful raters “is this document relevant?” and you'll get two different answers — relevance is fuzzy. Ask them “which of these two is more relevant?” and they'll agree. zELO trains exclusively on the easy question, then converts those head-to-head wins into continuous relevance scores. Same math that ranks chess players.
Pointwise scoring is noisy
Ask a rater to score (query, document) on a 0–1 scale and you get a different number every time — different raters disagree, and the same rater either drifts or discretizes hard. The label noise is the ceiling on every reranker trained against it.
Pairwise comparisons are stable
Ask the same rater “given this query, is document A or B more relevant?” and the answer barely moves across raters or across calls. The information density is much higher per judgment, and the disagreement is much lower.
Recover scores via Thurstone — like chess Elo
Many pairwise outcomes (A beats B, B beats C, A beats C, …) feed a Thurstone fit — the same statistical idea behind chess Elo. Out comes one continuous relevance score per document, calibrated against every comparison we've seen. Those scores are the SFT target.
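The conversion from head-to-head wins to continuous scores can be sketched with Elo-style updates. This is an illustrative approximation, not the zELO training pipeline: it uses a logistic (Bradley–Terry) link where a Thurstone fit proper uses a probit link, and real training operates at far larger scale.

```python
import math
import random

def fit_scores(comparisons, n_docs, k=0.05, epochs=200, seed=0):
    # Fit one latent relevance score per document from pairwise
    # outcomes, the way Elo fits one rating per chess player.
    rng = random.Random(seed)
    scores = [0.0] * n_docs
    pairs = list(comparisons)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for winner, loser in pairs:
            # Model's current probability that the winner wins.
            p = 1.0 / (1.0 + math.exp(scores[loser] - scores[winner]))
            # Nudge both scores toward the observed outcome.
            scores[winner] += k * (1.0 - p)
            scores[loser] -= k * (1.0 - p)
    return scores

# Head-to-head wins over three documents: 0 beats 1, 1 beats 2, 0 beats 2.
wins = [(0, 1), (1, 2), (0, 2)] * 20
scores = fit_scores(wins, n_docs=3)
print(sorted(range(3), key=lambda i: -scores[i]))  # [0, 1, 2]
```

The recovered per-document scores, not the raw pairwise labels, are what the reranker is then trained to regress.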
LLMs as raters, calibrated and mixed (zerank-2)
Frontier LLMs (Claude, GPT, Gemini) are the raters — more consistent than humans and able to work at scale. zerank-2 extends this with a per-rater calibration: we fit a (μ, κ) Beta distribution to each model and iteratively mix them, so each rater's judgment is weighted by how reliable it has actually been.
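The reliability weighting can be illustrated with a precision-weighted average. A sketch under stated assumptions: the rater names and numbers are made up, the single-pass weighting stands in for the iterative mixing, and only the core idea survives — a rater whose fitted Beta has a larger concentration κ has been more consistent, so its vote counts for more.

```python
def mix_raters(judgments, rater_fits):
    # Precision-weighted mixture of per-rater relevance judgments.
    # rater_fits maps rater -> (mu, kappa) from a Beta fit to that
    # rater's historical scores; larger kappa = tighter distribution
    # = more reliable rater. Illustrative only.
    num = sum(rater_fits[r][1] * score for r, score in judgments.items())
    den = sum(rater_fits[r][1] for r in judgments)
    return num / den

# Hypothetical fits: one rater has been four times as consistent.
fits = {"rater_a": (0.52, 40.0), "rater_b": (0.48, 10.0)}
scores = {"rater_a": 0.9, "rater_b": 0.3}
print(round(mix_raters(scores, fits), 2))  # 0.78
```

The mixed score lands much closer to the consistent rater's judgment than a plain average (0.6) would.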
Read the work behind the model
zELO: ELO-inspired Training Method for Rerankers and Embedding Models
Pipitone, Houir Alami, Avadhanam, Kaminskyi, Khoo
We introduce a novel training methodology named zELO, which optimizes retrieval performance via the analysis that ranking tasks are statistically equivalent to a Thurstone model. Trained end-to-end from unannotated queries and documents in less than 10,000 H100-hours, zerank-1 achieves the highest retrieval scores across finance, legal, code, and STEM — outperforming closed-source proprietary rerankers on NDCG@10 and Recall.
Integrate ZeroEntropy models in minutes. Production-ready, latency-optimized, available everywhere.
# Create an API Key at https://dashboard.zeroentropy.dev
from zeroentropy import ZeroEntropy
zclient = ZeroEntropy()
response = zclient.models.rerank(
    model="zerank-2",
    query="What is Retrieval Augmented Generation?",
    documents=[
        "RAG combines retrieval with generation...",
    ],
)
for doc in response.results:
    print(doc)

Deploy in your own cloud with dedicated infrastructure. Available on AWS Marketplace and Azure.
From security to scale, ZeroEntropy is built for the demands of production-ready AI

SOC2 Type II
Audited controls for data security, availability, and confidentiality — verified annually.

HIPAA Compliant
BAA-ready infrastructure with encryption at rest and in transit for protected health data.

GDPR Compliant
Full data residency controls, right-to-deletion, and DPA agreements for EU customers.

CCPA Compliant
Consumer data rights honored with full transparency on collection, use, and deletion.
