Versus

Same eval set, same judges, no hand-picked splits. The numbers below are computed live from /evals/.

ZeroEntropy vs OpenAI

Per-dataset NDCG@10 (chart: blue = ZE, tan = OpenAI)
  • NDCG@10 lead: +5.0 pts (0.721 vs 0.671)
  • Recall@100 lead: +5.9 pts (0.790 vs 0.731)
  • Datasets won: 30 / 34 (ZE > flagship at NDCG@10)
  • Flagship model: openai-v3-large (no first-party reranker)

OpenAI's text-embedding-3-large is the default many production stacks reach for, and zembed-1 outperforms it by 5-7 points of Recall@100 across most verticals. OpenAI ships no first-party reranker, so the full pipeline gap is wider than the embedding gap alone.
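For readers unfamiliar with the two headline metrics, here is a minimal sketch of how NDCG@10, Recall@100, and a "lead in points" could be computed for one query. It is not the code behind /evals/: the function names are hypothetical, and the linear gain (rather than the exponential 2^rel - 1 variant) and the "any gain above zero counts as relevant" rule are assumptions.

```python
import math

def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k for one query. `relevance` maps doc id -> graded gain (absent = 0).

    Assumption: linear gain with a log2 rank discount.
    """
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids[:k]]
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))
    # Ideal ordering: the k largest gains, best first.
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0

def recall_at_k(ranked_ids, relevance, k=100):
    """Fraction of relevant docs (gain > 0) that appear in the top k."""
    relevant = {doc_id for doc_id, gain in relevance.items() if gain > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)

def lead_in_points(score_a, score_b):
    """Difference between two 0-1 scores, expressed in points (x100)."""
    return round((score_a - score_b) * 100, 1)
```

With the averages reported above, `lead_in_points(0.721, 0.671)` gives 5.0 and `lead_in_points(0.790, 0.731)` gives 5.9, matching the stated leads.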

Per-vertical Δ NDCG@10 (pts), sorted ZE-best → ZE-worst (negative = OpenAI wins, positive = ZE wins)
  • Specialized: +12.6
  • Finance: +7.7
  • Instruction Following: +7.0
  • Medical: +6.3
  • Manufacturing: +5.4
  • Multilingual: +4.4
  • Legal: +4.3
  • QA & Knowledge: +1.6
  • Science: +1.6
Where the gap closes
  • Common compatibility — OpenAI embeddings are wired into more vector DBs, frameworks, and downstream tooling than anything else on this list, so the integration cost of staying put is effectively zero.
  • General-purpose embeddings beyond retrieval — text-embedding-3-large is broadly suitable for classification and clustering, where zembed-1 was specifically tuned for retrieval and may underweight other downstream tasks.
Where ZE wins
  • Recall@100 — 5-7 point gap across most verticals on the eval set.
  • The full retrieval pipeline — OpenAI ships no first-party reranker, so embed-only vs zembed-1 + zerank-2 is a single-stage-vs-two-stage comparison.
  • Per-query cost at production scale — zembed-1 is materially cheaper per million tokens at every tier.
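The single-stage vs two-stage distinction in the list above can be sketched as follows. This is an illustration only: `embed` and `rerank_score` are hypothetical callables standing in for an embedding model and a reranker, not the zembed-1 or zerank-2 APIs, and the cutoffs of 100 and 10 are assumed defaults chosen to mirror the Recall@100 and NDCG@10 metrics.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0

def retrieve_then_rerank(query, docs, embed, rerank_score=None,
                         first_stage_k=100, final_k=10):
    """Stage 1: rank all docs by embedding similarity and keep the top candidates.
    Stage 2 (optional): re-order only those candidates with a reranker.

    With rerank_score=None this is the embed-only, single-stage pipeline.
    """
    q_vec = embed(query)
    by_similarity = sorted(docs, key=lambda d: cosine(q_vec, embed(d)), reverse=True)
    candidates = by_similarity[:first_stage_k]
    if rerank_score is None:
        return candidates[:final_k]
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:final_k]
```

The reranker can only reorder what the first stage returns, which is why first-stage Recall@100 bounds what the second stage can recover.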
How we measure

No cherry-picking. No hand-tuned splits.

28 datasets

Heterogeneous coverage — legal, finance, medical, multilingual, instruction-following, long-context. Every model evaluated on the same set.

3 LLM judges

Gemini-3-Flash, GPT-5-nano, Grok-4-fast. Inter-judge agreement (κ) ≥ 0.7 across the suite — see /concepts/eval-set-quality/ for the discipline.
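The page does not say which κ statistic is used. One common choice for three judges is the mean of the pairwise Cohen's κ values, sketched below under that assumption. The function names and the input layout (a dict of judge name to a list of labels, aligned by item) are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges labelling the same items in the same order."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both judges pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used a single identical label throughout
    return (observed - expected) / (1 - expected)

def mean_pairwise_kappa(judgments):
    """Average Cohen's kappa over every pair of judges."""
    pairs = list(combinations(judgments.values(), 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)
```

Under this reading, a suite would pass the stated bar when `mean_pairwise_kappa(...)` is at least 0.7.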

Paired bootstrap

Per-query deltas, not averaged independent samples. 95% CI on every reported number; statistical significance never asserted on n < 30.
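A paired bootstrap of this kind can be sketched as below. It resamples per-query deltas (model A minus model B on the same query) rather than resampling each model's scores independently. The percentile interval, the resample count, and the function name are assumptions; the n < 30 guard mirrors the rule stated above.

```python
import random

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=10_000,
                        alpha=0.05, min_n=30, seed=0):
    """Percentile bootstrap CI for the mean per-query delta (A - B).

    Returns (mean_delta, ci_low, ci_high, significant), where `significant`
    is None when there are fewer than `min_n` queries.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired: one entry per query for each model")
    deltas = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(deltas)
    rng = random.Random(seed)
    means = []
    for _ in range(n_resamples):
        # Resample whole queries with replacement, keeping each pair intact.
        sample = rng.choices(deltas, k=n)
        means.append(sum(sample) / n)
    means.sort()
    ci_low = means[int((alpha / 2) * n_resamples)]
    ci_high = means[int((1 - alpha / 2) * n_resamples) - 1]
    # Only call a result significant when the CI excludes zero and n is large enough.
    significant = None if n < min_n else (ci_low > 0 or ci_high < 0)
    return sum(deltas) / n, ci_low, ci_high, significant
```

Pairing removes per-query difficulty from the comparison: a hard query lowers both models' scores, but only the difference between them enters the interval.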
