[Diagram: a query ("what is RAG?") and a document ("RAG combines...") are each passed through an encoder; the elementwise product of the two embeddings is summed into a single dot-product similarity score (0.94 in the example).]

First-Pass Retrieval, state-of-the-art recall

zembed-1 is the current #1 embedding model on graded retrieval benchmarks. 4B parameters, up to 2560 dimensions, multilingual, and instruction-aware. Because your embedding search's ceiling is your entire pipeline's ceiling.

The Problem

Dense Recall is a ceiling

Embeddings are fast and cheap: they are how you find a hundred plausible candidates out of ten million documents in milliseconds. But Recall@100 is the silent ceiling on every RAG pipeline. Everything downstream (LLM calls, rerankers, your agents) can only sort what the embedding surfaces, and most models simply leave relevant documents on the floor.
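To make that first pass concrete, here is a minimal NumPy sketch (illustrative only, not the ZeroEntropy API): dense retrieval scores every document by the dot product of its embedding with the query embedding and keeps the top 100, and Recall@100 is simply the fraction of the truly relevant documents that survive that cut.

import numpy as np

def top_k_candidates(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 100) -> np.ndarray:
    """First-pass dense retrieval: score every document by dot product
    with the query embedding and return the indices of the top-k."""
    scores = doc_vecs @ query_vec            # one score per document
    return np.argsort(-scores)[:k]           # highest-scoring k documents

def recall_at_k(retrieved: np.ndarray, relevant: set[int]) -> float:
    """Fraction of truly relevant documents that the first pass surfaced."""
    return len(relevant & set(retrieved.tolist())) / max(len(relevant), 1)

# Toy example with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 256))        # 10k documents, 256-dim embeddings
query = rng.normal(size=256)
candidates = top_k_candidates(query, docs, k=100)
print(recall_at_k(candidates, relevant={3, 42, 977}))
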

0.715
NDCG@10 (avg)

Across 28 datasets and 3 LLM judges. Ahead of voyage-4 (0.712) and harrier-27b (0.706).

0.771
Recall@100

Highest of any embedding model we test: +2.0 pts over voyage-4, +2.2 over harrier-27b.

~9×
faster than voyage-4

P50 ~280 ms vs ~2500 ms at 2 QPS / 2560-dim / 512-token inputs. Fast mode goes lower.

Benchmark

[Chart: NDCG@10 average across 28 datasets. See evals for the full breakdown.]
What Teams Are Saying

Better recall on our long-tail queries was the entire reason to switch. The reranker downstream got a better candidate set on every query, and our metrics moved.

Clinical AI, Vera Health

Multilingual recall held up across our European and Asian markets where the previous embedding fell off a cliff.

Search, Sendbird
Specs & Performance

zembed-1: The World's Best Text Embedding Model

Bi-encoder embedding model, distilled directly from zerank-2 so the relevance signal you fit downstream is the same one the model was trained on. Tunable at inference: dimensions and quantization both swap on the fly, no retraining.

Specs
Parameters: 4B · open weights
Context window: 32K tokens
Languages: 100+ · >50% non-English training data · cross-lingual parity
Dimensions: 40 → 2560, truncatable at inference (no retraining; not Matryoshka)
Quantization: float32 · int8 (4× smaller) · binary (32× smaller) · pick per call (see the sketch after this table)
Storage example: 8 KB at full · 256 dims @ int8 → 256 B · under 128 B at the smallest end
Training: distilled from zerank-2 (zELO-derived continuous Elo targets)
API: input_type: query|document · latency: fast|slow per call
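A minimal sketch of what inference-time truncation and per-call quantization mean for storage, in plain NumPy. The symmetric int8 scaling and sign-bit binary packing below are generic stand-ins rather than the API's actual quantization scheme; the byte counts are just what NumPy reports for each representation.

import numpy as np

def truncate(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep only the first `dims` components, then re-normalize."""
    v = vec[:dims]
    return v / np.linalg.norm(v)

def to_int8(vec: np.ndarray) -> np.ndarray:
    """Symmetric int8 quantization: 4x smaller than float32."""
    scale = np.max(np.abs(vec)) / 127.0
    return np.round(vec / scale).astype(np.int8)

def to_binary(vec: np.ndarray) -> np.ndarray:
    """Sign-bit binarization packed 8 dims per byte: 32x smaller than float32."""
    return np.packbits(vec > 0)

full = np.random.default_rng(0).normal(size=2560).astype(np.float32)
print(full.nbytes)                              # 2560 dims x 4 B = 10_240 B at float32
print(to_int8(truncate(full, 256)).nbytes)      # 256 dims x 1 B = 256 B
print(to_binary(truncate(full, 1024)).nbytes)   # 1024 dims / 8 = 128 B
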
Performance
NDCG@10 (28 datasets, 3 LLM judges): 0.715 · #1 among all embedding models we tested
Recall@100: 0.771 · +2.0 over voyage-4, +2.2 over harrier-27b
vs the field on Recall@100: up to +7% over OpenAI Large, Qwen3-4B, BGE-M3, Gemini, Cohere v4, Voyage-4-nano
P50 latency (2 QPS · 2560-d · 512 tokens): ~280 ms (fast) vs ~2500 ms for voyage-4 · ~9× faster
Vertical strength: largest gains on finance, healthcare, and legal · domain-vocabulary-heavy verticals
Methodology

Trained on similarity, not boundaries.

Most embedding models train against binary relevant/not-relevant labels. zembed-1 is trained on continuous relevance scores derived from pairwise LLM preferences — the same signal behind zerank-2. This is why it does disproportionately well on graded evaluations where binary-trained competitors plateau.
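To ground what "continuous relevance scores" means here, the toy fit below turns pairwise (winner, loser) judgments into latent scores using a Thurstone Case V likelihood, where P(winner beats loser) = Φ(s_winner − s_loser). It is a simplified stand-in for the zELO pipeline mentioned above, not ZeroEntropy's actual training code.

import math
import numpy as np

def thurstone_fit(pairs, n_docs, lr=0.05, steps=1000, l2=0.01):
    """Fit latent relevance scores from pairwise preferences.
    Thurstone Case V: P(winner beats loser) = Phi(s_winner - s_loser).
    Maximum likelihood by gradient ascent, with a small L2 penalty for stability."""
    s = np.zeros(n_docs)
    for _ in range(steps):
        grad = np.zeros(n_docs)
        for winner, loser in pairs:
            d = s[winner] - s[loser]
            pdf = math.exp(-0.5 * d * d) / math.sqrt(2 * math.pi)   # normal pdf at d
            cdf = 0.5 * (1 + math.erf(d / math.sqrt(2)))            # normal cdf at d
            g = pdf / max(cdf, 1e-9)                                 # gradient of log-likelihood
            grad[winner] += g
            grad[loser] -= g
        s += lr * (grad - l2 * s)
        s -= s.mean()                    # scores are only defined up to a constant shift
    return s

# Toy judgments: an LLM judge prefers doc 0 over 1 and 2, and doc 1 over 2.
pairs = [(0, 1), (0, 2), (1, 2), (0, 2)]
print(thurstone_fit(pairs, n_docs=3))    # graded scores ordering doc 0 > doc 1 > doc 2
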

01

Continuous relevance scores

Pairwise LLM preferences are converted into absolute Elo-style scores via Thurstone fitting (as in the sketch above): a graded signal, not a binary one.

02

Broad-domain training

Legal, medical, financial, code, multilingual, and technical corpora — chosen so the model generalizes to private enterprise data, not just the public benchmark.

03

Flexible dimensions

Output 1024 / 1536 / 2560-dim vectors. 1536 is the production sweet spot; 2560 for last-mile accuracy; 1024 for index-cost-sensitive consumer search.

04

Query/document asymmetry

An `input_type` parameter (`query` vs `document`) embeds the same string at slightly different points in space — reflecting the inherent asymmetry of search.
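For illustration, a call shape under stated assumptions: the `embed` method name, the argument names other than `input_type`, and the response handling below are hypothetical rather than confirmed SDK surface; only the `input_type` values (`query` vs `document`) and the `fast|slow` latency switch come from the spec above.

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

# Hypothetical embed call, for illustration only: `input_type` is the documented
# knob that embeds the same string at slightly different points in space
# depending on whether it is the query or the document.
query_emb = zclient.models.embed(
    model="zembed-1",
    input="What is Retrieval Augmented Generation?",
    input_type="query",
    latency="fast",
)
doc_emb = zclient.models.embed(
    model="zembed-1",
    input="RAG combines retrieval with generation...",
    input_type="document",
    latency="fast",
)
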

Ship Models That Work

Integrate ZeroEntropy models in minutes. Production-ready, latency-optimized, available everywhere.

AWS · Hugging Face · Azure
Partner Providers

Access all models through a single, latency-optimized API, or through our partner providers.

# Create an API Key at https://dashboard.zeroentropy.dev

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

response = zclient.models.rerank(
    model="zerank-2",
    query="What is Retrieval Augmented Generation?",
    documents=[
        "RAG combines retrieval with generation...",
    ],
)

for doc in response.results:
    print(doc)
API
ZeroEntropy API

Start building in minutes with Python and TypeScript SDKs.

VPC
ZeroEntropy VPC

Deploy in your own cloud with dedicated infrastructure. Available on AWS Marketplace and Azure.

Enterprise
Enterprise and Model Licensing

Custom deployments, dedicated capacity, model licensing, model fine-tuning, and SLAs. Talk to us.

Enterprise-Ready

From security to scale, ZeroEntropy is built for the demands of production-ready AI.

SOC2 Type II

Audited controls for data security, availability, and confidentiality — verified annually.

HIPAA Compliant

BAA-ready infrastructure with encryption at rest and in transit for protected health data.

GDPR Compliant

Full data residency controls, right-to-deletion, and DPA agreements for EU customers.

CCPA Compliant

Consumer data rights honored with full transparency on collection, use, and deletion.

ZeroEntropy
The best AI teams build with ZeroEntropy models
Follow us on
GitHub · Twitter · Slack · LinkedIn · Discord