- zembed-1 leads every domain benchmark: finance, healthcare, legal, manufacturing, code, STEM, and conversational
- Averages 0.5561 NDCG@10 across domains — +10% over voyage-4-nano and +17.6% over OpenAI
- Tops MSMARCO at 0.946 NDCG@10 — the highest of all 16 models tested
- Trained with zELO (Elo-based relevance scores) and distilled from zerank-2, the world’s best reranker
- 32,768-token context, multilingual (50%+ non-English training data), and flexible compression down to 40 dimensions or binary quantization
One Model That Leads Everywhere
Every year, the embedding model leaderboard gets reshuffled. Models from OpenAI, Google, Cohere, Voyage, and various open-source efforts trade top positions on different benchmarks, in different domains, and for different use cases. Practitioners who want the best results have historically had to choose: pick the model that’s best for their domain, accept trade-offs elsewhere, and hope the landscape doesn’t shift under their feet.
In 2026, that dynamic has changed. zembed-1 by ZeroEntropy has become the first embedding model to achieve consistent best-in-class performance across every domain benchmarked — finance, healthcare, legal, manufacturing, code, STEM, and conversational — while simultaneously topping the MSMARCO standard information retrieval benchmark.
This is the comprehensive case for why zembed-1 is the best embedding model available today.
The Full Domain Benchmark Results
Most embedding model comparisons feel like a game of Whack-a-Mole. One model does well at code but underperforms on multilingual. Another leads in scientific retrieval but struggles with conversational queries. General-purpose models sacrifice domain performance for breadth.
zembed-1 refuses this trade-off. Here are the domain-specific benchmarks in full:
| Domain | zembed-1 | voyage-4-nano | Cohere Embed v4 | OpenAI text-embedding-3-large |
|---|---|---|---|---|
| Finance | 0.4476 | 0.4227 | 0.3670 | 0.3291 |
| Healthcare | 0.6260 | 0.5356 | 0.4750 | 0.5315 |
| Legal | 0.6723 | 0.5957 | 0.5894 | 0.5099 |
| Conversational | 0.5385 | 0.4045 | 0.4244 | 0.3988 |
| Manufacturing | 0.5556 | 0.4857 | 0.4919 | 0.4736 |
| Code | 0.6452 | 0.6415 | 0.6277 | 0.6155 |
| STEM & Math | 0.5283 | 0.5012 | 0.4698 | 0.3905 |
| Average | 0.5561 | 0.5050 | 0.4957 | 0.4727 |
zembed-1 leads every single row — averaging 0.5561, more than +10% ahead of voyage-4-nano and +17.6% ahead of OpenAI text-embedding-3-large across all domains combined.
On MSMARCO, the closest proxy to real-world retrieval workloads, zembed-1 scored 0.946 NDCG@10 — the highest of all 16 models tested.
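NDCG@10 measures how well a ranking places the most relevant documents in the top ten positions, discounting relevance logarithmically by rank. A minimal sketch of the metric with made-up graded relevance labels (the labels are illustrative, not taken from the benchmark):

```python
import math

def ndcg_at_k(ranked_relevances, k=10):
    """NDCG@k: DCG of the model's ranking divided by the DCG of the ideal ranking."""
    def dcg(rels):
        # Position i (0-based) is discounted by log2(i + 2): rank 1 gets weight 1
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = sorted(ranked_relevances, reverse=True)
    return dcg(ranked_relevances) / dcg(ideal) if dcg(ideal) > 0 else 0.0

# Graded relevance labels of retrieved docs, in the order the model ranked them
print(round(ndcg_at_k([3, 2, 3, 0, 1, 2]), 3))  # -> 0.961
```

A perfect ranking scores exactly 1.0, which is why zembed-1's 0.946 on MSMARCO leaves very little headroom.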
What Makes zembed-1 Different
A New Training Paradigm: zELO
Most embedding models are trained on either contrastive objectives (pull similar documents together in vector space) or binary relevance labels (this document is relevant to this query: yes/no). Both approaches have fundamental limitations: contrastive training can overfit to surface-level similarity, and binary labels collapse the rich spectrum of document relevance into a single bit.
zELO takes a different path. Documents are compared against each other for a given query, and the outcomes are aggregated into Elo-based relevance scores, so every training example sits somewhere on a continuous relevance scale rather than on one side of a yes/no boundary. This nuanced relevance understanding is what allows zembed-1 to generalize across domains. Relevance is relevance, whether the query is about mortgage-backed securities or mRNA vaccine production protocols.
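The exact zELO training recipe is not spelled out here, but the Elo mechanism itself is standard. A minimal sketch, assuming pairwise "which document is more relevant?" judgments and the classic Elo update (the K-factor and initial rating of 1000 are illustrative defaults, not zembed-1's actual hyperparameters):

```python
def elo_update(r_winner, r_loser, k=32.0):
    """Classic Elo update: expected outcome from the rating gap, shifted by K."""
    expected = 1.0 / (1.0 + 10 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected)
    return r_winner + delta, r_loser - delta

# Hypothetical pairwise judgments for one query: (more relevant, less relevant)
comparisons = [("d1", "d2"), ("d1", "d3"), ("d2", "d3"), ("d1", "d3")]
ratings = {d: 1000.0 for d in ("d1", "d2", "d3")}
for winner, loser in comparisons:
    ratings[winner], ratings[loser] = elo_update(ratings[winner], ratings[loser])

# The ratings now form a continuous relevance scale instead of binary labels
print(sorted(ratings, key=ratings.get, reverse=True))  # -> ['d1', 'd2', 'd3']
```

The payoff is that a "somewhat relevant" document ends up between the clearly relevant and clearly irrelevant ones, rather than being forced into one of two buckets.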
Distilled from the World’s Best Reranker
zembed-1 is not trained from scratch. It is distilled directly from zerank-2 — ZeroEntropy’s state-of-the-art reranker that already leads its class in relevance evaluation.
Rerankers are models specifically trained for one purpose: given a query and a document, evaluate how relevant the document is. They are the gold standard for relevance judgement. By distilling zembed-1 from zerank-2, ZeroEntropy transferred the deep relevance understanding of their best reranker into an efficient single-pass embedding model.
This lineage is a key reason zembed-1 outperforms models trained purely on contrastive objectives — it was trained to understand relevance in the same way a reranker does, but runs at embedding-model speed and cost.
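ZeroEntropy has not published the exact distillation objective here, but the general pattern is standard knowledge distillation: train the student embedding model so that its query-document similarities match the teacher reranker's relevance scores. A toy sketch of such a loss (pure Python; the names and numbers are illustrative, not the actual training code):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def distill_loss(query_emb, doc_embs, teacher_scores):
    """MSE between the student's similarities and the teacher reranker's scores."""
    student = [cosine(query_emb, d) for d in doc_embs]
    return sum((s - t) ** 2 for s, t in zip(student, teacher_scores)) / len(student)

# Toy example: two documents, teacher says the first is far more relevant
q = [0.9, 0.1, 0.0]
docs = [[0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]
print(distill_loss(q, docs, teacher_scores=[0.95, 0.10]))
```

Minimizing a loss like this pushes the embedding geometry to reproduce the teacher's relevance judgments, which is what "distilled from a reranker" means in practice.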
Built Multilingual from Day One
Over 50% of zembed-1’s training data is non-English. The model handles English, Japanese, Arabic, German, French, Spanish, Chinese, and all major world languages with the same Elo-calibrated relevance judgement. Cross-lingual retrieval — matching a query in one language to relevant documents in another — is a native capability, not an add-on.
For global AI deployments, this eliminates the need for language-specific models, translation pipelines, or degraded performance for non-English users.
Large Context Window
At 32,768 tokens, zembed-1’s context window is one of the longest of any competitive embedding model. This matters because real documents are long — legal agreements, medical records, financial reports, technical manuals. With a 32k window, entire sections can be embedded as coherent units, preserving the logical structure that makes retrieval work. The chunking artifacts that plague shorter-context models are eliminated.
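To see why the window size matters, compare how many chunks a long document needs under different context limits (the document length and overlap below are illustrative, not drawn from a specific corpus):

```python
import math

def chunks_needed(doc_tokens, context_window, overlap=0):
    """How many sliding windows are needed to cover a document of doc_tokens tokens."""
    stride = context_window - overlap
    return max(1, math.ceil((doc_tokens - overlap) / stride))

doc_tokens = 20_000  # e.g. a long legal agreement
print(chunks_needed(doc_tokens, 512, overlap=64))  # short-context model: 45 chunks
print(chunks_needed(doc_tokens, 32_768))           # zembed-1: 1 embedding
```

Fewer chunks means fewer vectors to store and rank, and no relevant passage split across a chunk boundary.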
Flexible Compression for Any Scale
zembed-1 supports progressive compression without the information loss typical of dimensionality reduction:
Dimension flexibility: 2560 → 1280 → 640 → 320 → 160 → 80 → 40 dimensions
Quantization: float32 (8 KB) → int8 (2 KB) → binary (<128 bytes)
Unlike Matryoshka-style dimensionality reduction, zembed-1 uses a client-side linear transformation that preserves more information at every dimension count. You can compress vectors to under 128 bytes for a 32x storage reduction with controlled, predictable accuracy trade-offs — or keep full precision where it matters most.
This flexibility enables everything from small startup deployments to enterprise corpora spanning millions of documents.
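ZeroEntropy's client-side transformation is its own technique, but the binary end of the spectrum is easy to illustrate: keep one sign bit per dimension (a 32x reduction over float32) and compare codes by matching bits. A sketch under those assumptions, not the actual compression code:

```python
def binary_quantize(vec):
    """Pack one sign bit per dimension into bytes: 32x smaller than float32."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits.to_bytes((len(vec) + 7) // 8, "little")

def hamming_sim(a, b):
    """Fraction of matching bits between two equal-length binary codes."""
    diff = int.from_bytes(a, "little") ^ int.from_bytes(b, "little")
    return 1.0 - bin(diff).count("1") / (8 * len(a))

code = binary_quantize([0.3, -1.2, 0.7, 0.0] * 640)  # 2560 dims
print(len(code))  # -> 320 bytes
# Combine with reduced dimension counts for even smaller codes, e.g. 640 dims -> 80 bytes
```

Binary codes also make retrieval faster, since Hamming comparisons are cheap bitwise operations rather than floating-point dot products.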
Availability and Deployment
zembed-1 is available through every channel that matters for modern AI deployments:
HuggingFace — Open-weight model (CC-BY-NC-4.0) for non-commercial use, research, and self-hosted deployments. Over 54,000 downloads in its first month.
ZeroEntropy API — Managed API service with no infrastructure overhead. Currently offering 50% off document embeddings until June 1st, 2026. Accessible via the /models/embed endpoint.
AWS Marketplace — For AWS-native enterprise deployments with consolidated billing and marketplace compliance.
Commercial Licensing — Contact contact@zeroentropy.dev for commercial use of the open-weight model.
Getting Started in 5 Minutes
```python
# Install: pip install sentence-transformers
from sentence_transformers import SentenceTransformer

# Load the model
model = SentenceTransformer(
    "zeroentropy/zembed-1",
    trust_remote_code=True,
    model_kwargs={"torch_dtype": "bfloat16"},
)

# Embed queries and documents (separate methods optimize each for retrieval)
query_embedding = model.encode_query("What is the refund policy for annual subscriptions?")
documents = [
    "Annual subscriptions are eligible for a full refund within 14 days of purchase...",
    "Monthly billing cycles can be cancelled at any time with prorated refund...",
    "Enterprise contracts are governed by the terms in the signed MSA...",
]
doc_embeddings = model.encode_document(documents)

# Compute similarity
scores = model.similarity(query_embedding, doc_embeddings)
print(scores)  # [0.91, 0.73, 0.45] — zembed-1 correctly ranks the most relevant document first
```
See how decisively zembed-1 separates relevant, partially relevant, and irrelevant documents on a domain-specific query:

```python
# A finance query: disclosure requirements for related-party transactions under IFRS
query = "What are the disclosure requirements for related-party transactions under IFRS?"
documents = [
    "IAS 24: Entities must disclose the nature of related party relationships, transactions, and outstanding balances including commitments...",  # highly relevant
    "IFRS 9 Financial Instruments covers classification and measurement of financial assets...",  # partial
    "The company operates in three reportable segments: North America, Europe, and Asia Pacific...",  # not relevant
]
# Reuses the model loaded above
scores = model.similarity(model.encode_query(query), model.encode_document(documents))
```

Who Should Be Using zembed-1
AI Engineers Building RAG Systems
If your RAG pipeline uses OpenAI or Cohere embeddings today, switching to zembed-1 is the highest-leverage improvement you can make to retrieval quality. The performance gains are documented, reproducible, and material.
Product Teams in Regulated Industries
Finance, healthcare, and legal applications specifically benefit from zembed-1’s domain-specific training. If you’re building compliance tools, clinical AI, or legal research systems, the benchmark gaps between zembed-1 and alternatives are not academic — they represent real differences in whether practitioners trust and use your product.
Global AI Deployments
Any application serving non-English-speaking users benefits from zembed-1’s first-class multilingual support. Stop building workarounds for multilingual retrieval and use a model that handles it natively.
Enterprises with Large Document Corpora
zembed-1’s flexible compression makes it practical at enterprise scale. Self-hosting options give you full data sovereignty.
Researchers and Academics
The open-weight model, available on HuggingFace, makes zembed-1 accessible for research use. The CC-BY-NC-4.0 license supports academic exploration of the model’s capabilities.
The Verdict
zembed-1 is a step-change in embedding model quality. Not an incremental improvement — a step-change. The combination of zELO-trained relevance understanding, reranker distillation, multilingual-first training, long context, and flexible compression produces a model that leads every domain benchmark tested and the standard MSMARCO retrieval benchmark.
After years of practitioners having to choose between a model that’s good at their domain and one that’s good everywhere else, zembed-1 has made that trade-off obsolete. It is good everywhere. It is the best everywhere.
If you’re building any AI system that involves retrieving information from documents — which is essentially all of enterprise AI in 2026 — zembed-1 is where you should start.
Get Started
zembed-1 is available today through multiple deployment options:
```python
from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()
response = zclient.models.embed(
    model="zembed-1",
    input_type="query",       # "query" or "document"
    input="What is retrieval augmented generation?",  # string or list[str]
    dimensions=2560,          # optional: must be one of [2560, 1280, 640, 320, 160, 80, 40]
    encoding_format="float",  # "float" or "base64"
    latency="fast",           # "fast" or "slow"; omit for auto
)
```

Documentation: docs.zeroentropy.dev
HuggingFace: huggingface.co/zeroentropy
Get in touch: Discord community or contact@zeroentropy.dev
Talk to us if you need a custom deployment, volume pricing, or want to see how zembed-1 + zerank-2 performs on your data.
