[Diagram: a query ("what is RAG?") and a document ("RAG combines...") are each passed through an encoder; the elementwise product of the two embeddings is summed into a single dot-product similarity score (0.94 in the example).]

First-Pass Retrieval, state-of-the-art recall

zembed-1 is the current #1 embedding model on graded retrieval benchmarks. 4B parameters, up to 2560 dimensions, multilingual, and instruction-aware. Because your embedding search's ceiling is your entire pipeline's ceiling.

The Problem

Dense Recall is a ceiling

Embeddings are fast and cheap: they are how you find a hundred plausible candidates out of ten million documents in milliseconds. But Recall@100 is the silent ceiling on every RAG pipeline. Everything downstream (LLM calls, rerankers, your agents) can only sort what the embedding surfaces, and most models simply leave relevant documents on the floor.
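To make that first pass concrete, here is a minimal NumPy sketch (illustrative only, not the ZeroEntropy API): dense retrieval scores every document by the dot product of its embedding with the query embedding and keeps the top 100, and Recall@100 is simply the fraction of the truly relevant documents that survive that cut.

import numpy as np

def top_k_candidates(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 100) -> np.ndarray:
    """First-pass dense retrieval: score every document by dot product
    with the query embedding and return the indices of the top-k."""
    scores = doc_vecs @ query_vec            # one score per document
    return np.argsort(-scores)[:k]           # highest-scoring k documents

def recall_at_k(retrieved: np.ndarray, relevant: set[int]) -> float:
    """Fraction of truly relevant documents that the first pass surfaced."""
    return len(relevant & set(retrieved.tolist())) / max(len(relevant), 1)

# Toy example with random vectors standing in for real embeddings.
rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 256))        # 10k documents, 256-dim embeddings
query = rng.normal(size=256)
candidates = top_k_candidates(query, docs, k=100)
print(recall_at_k(candidates, relevant={3, 42, 977}))
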

0.715
NDCG@10 (avg)

Across 28 datasets and 3 LLM judges. Ahead of voyage-4 (0.712) and harrier-27b (0.706).

0.771
Recall@100

Highest of any embedding model we test: +2.0 pts over voyage-4, +2.2 over harrier-27b.

~9×
faster than voyage-4

P50 ~280 ms vs ~2500 ms at 2 QPS / 2560-dim / 512-token inputs. Fast mode goes lower.

Benchmark

[Chart: NDCG@10 average across 28 datasets. See evals for the full breakdown.]
What Teams Are Saying

Better recall on our long-tail queries was the entire reason to switch. The reranker downstream got a better candidate set on every query, and our metrics moved.

Clinical AI, Vera Health

Multilingual recall held up across our European and Asian markets where the previous embedding fell off a cliff.

Search, Sendbird
Specs & Performance

zembed-1: The World's Best Text Embedding Model

Bi-encoder embedding model, distilled directly from zerank-2 so the relevance signal you fit downstream is the same one the model was trained on. Tunable at inference: dimensions and quantization both swap on the fly, no retraining.

Specs
Parameters: 4B · open weights
Context window: 32K tokens
Languages: 100+ · >50% non-English training data · cross-lingual parity
Dimensions: 40 → 2560, truncatable at inference (no retraining; not Matryoshka)
Quantization: float32 · int8 (4× smaller) · binary (32× smaller) · pick per call (see the sketch after this table)
Storage example: 8 KB at full · 256 dims @ int8 → 256 B · under 128 B at the smallest end
Training: distilled from zerank-2 (zELO-derived continuous Elo targets)
API: input_type: query|document · latency: fast|slow per call
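A minimal sketch of what inference-time truncation and per-call quantization mean for storage, in plain NumPy. The symmetric int8 scaling and sign-bit binary packing below are generic stand-ins rather than the API's actual quantization scheme; the byte counts are just what NumPy reports for each representation.

import numpy as np

def truncate(vec: np.ndarray, dims: int) -> np.ndarray:
    """Keep only the first `dims` components, then re-normalize."""
    v = vec[:dims]
    return v / np.linalg.norm(v)

def to_int8(vec: np.ndarray) -> np.ndarray:
    """Symmetric int8 quantization: 4x smaller than float32."""
    scale = np.max(np.abs(vec)) / 127.0
    return np.round(vec / scale).astype(np.int8)

def to_binary(vec: np.ndarray) -> np.ndarray:
    """Sign-bit binarization packed 8 dims per byte: 32x smaller than float32."""
    return np.packbits(vec > 0)

full = np.random.default_rng(0).normal(size=2560).astype(np.float32)
print(full.nbytes)                              # 2560 dims x 4 B = 10_240 B at float32
print(to_int8(truncate(full, 256)).nbytes)      # 256 dims x 1 B = 256 B
print(to_binary(truncate(full, 1024)).nbytes)   # 1024 dims / 8 = 128 B
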
Performance
NDCG@10 (28 datasets, 3 LLM judges): 0.715 · #1 among all embedding models we tested
Recall@100: 0.771 · +2.0 over voyage-4, +2.2 over harrier-27b
vs the field on Recall@100: up to +7% over OpenAI Large, Qwen3-4B, BGE-M3, Gemini, Cohere v4, Voyage-4-nano
P50 latency (2 QPS · 2560-d · 512 tokens): ~280 ms (fast) vs ~2500 ms for voyage-4 · ~9× faster
Vertical strength: largest gains on finance, healthcare, and legal · domain-vocabulary-heavy verticals
Methodology

Trained on similarity, not boundaries.

Most embedding models train against binary relevant/not-relevant labels. zembed-1 is trained on continuous relevance scores derived from pairwise LLM preferences — the same signal behind zerank-2. This is why it does disproportionately well on graded evaluations where binary-trained competitors plateau.
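To ground what "continuous relevance scores" means here, the toy fit below turns pairwise (winner, loser) judgments into latent scores using a Thurstone Case V likelihood, where P(winner beats loser) = Φ(s_winner − s_loser). It is a simplified stand-in for the zELO pipeline mentioned above, not ZeroEntropy's actual training code.

import math
import numpy as np

def thurstone_fit(pairs, n_docs, lr=0.05, steps=1000, l2=0.01):
    """Fit latent relevance scores from pairwise preferences.
    Thurstone Case V: P(winner beats loser) = Phi(s_winner - s_loser).
    Maximum likelihood by gradient ascent, with a small L2 penalty for stability."""
    s = np.zeros(n_docs)
    for _ in range(steps):
        grad = np.zeros(n_docs)
        for winner, loser in pairs:
            d = s[winner] - s[loser]
            pdf = math.exp(-0.5 * d * d) / math.sqrt(2 * math.pi)   # normal pdf at d
            cdf = 0.5 * (1 + math.erf(d / math.sqrt(2)))            # normal cdf at d
            g = pdf / max(cdf, 1e-9)                                 # gradient of log-likelihood
            grad[winner] += g
            grad[loser] -= g
        s += lr * (grad - l2 * s)
        s -= s.mean()                    # scores are only defined up to a constant shift
    return s

# Toy judgments: an LLM judge prefers doc 0 over 1 and 2, and doc 1 over 2.
pairs = [(0, 1), (0, 2), (1, 2), (0, 2)]
print(thurstone_fit(pairs, n_docs=3))    # graded scores ordering doc 0 > doc 1 > doc 2
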

01

Continuous relevance scores

Pairwise LLM preferences are converted into absolute Elo-style scores via Thurstone fitting (as in the sketch above): a graded signal, not a binary one.

02

Broad-domain training

Legal, medical, financial, code, multilingual, and technical corpora — chosen so the model generalizes to private enterprise data, not just the public benchmark.

03

Flexible dimensions

Output 1024 / 1536 / 2560-dim vectors. 1536 is the production sweet spot; 2560 for last-mile accuracy; 1024 for index-cost-sensitive consumer search.

04

Query/document asymmetry

An `input_type` parameter (`query` vs `document`) embeds the same string at slightly different points in space — reflecting the inherent asymmetry of search.
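For illustration, a call shape under stated assumptions: the `embed` method name, the argument names other than `input_type`, and the response handling below are hypothetical rather than confirmed SDK surface; only the `input_type` values (`query` vs `document`) and the `fast|slow` latency switch come from the spec above.

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

# Hypothetical embed call, for illustration only: `input_type` is the documented
# knob that embeds the same string at slightly different points in space
# depending on whether it is the query or the document.
query_emb = zclient.models.embed(
    model="zembed-1",
    input="What is Retrieval Augmented Generation?",
    input_type="query",
    latency="fast",
)
doc_emb = zclient.models.embed(
    model="zembed-1",
    input="RAG combines retrieval with generation...",
    input_type="document",
    latency="fast",
)
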

Ship Models That Work

Integrate ZeroEntropy models in minutes. Production-ready, latency-optimized, available everywhere.

AWS · Hugging Face · Azure
Partner Providers

Access all models through a single, latency-optimized API, or through our partner providers.

# Create an API Key at https://dashboard.zeroentropy.dev

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

response = zclient.models.rerank(
    model="zerank-2",
    query="What is Retrieval Augmented Generation?",
    documents=[
        "RAG combines retrieval with generation...",
    ],
)

for doc in response.results:
    print(doc)
API
ZeroEntropy API

Start building in minutes with Python and TypeScript SDKs.

VPC
ZeroEntropy VPC

Deploy in your own cloud with dedicated infrastructure. Available on AWS Marketplace and Azure.

Enterprise
Enterprise and Model Licensing

Custom deployments, dedicated capacity, model licensing, model fine-tuning, and SLAs. Talk to us.

Enterprise-Ready

From security to scale, ZeroEntropy is built for the demands of production-ready AI.

SOC2 Type II

Audited controls for data security, availability, and confidentiality — verified annually.

HIPAA Compliant

BAA-ready infrastructure with encryption at rest and in transit for protected health data.

GDPR Compliant

Full data residency controls, right-to-deletion, and DPA agreements for EU customers.

CCPA Compliant

Consumer data rights honored with full transparency on collection, use, and deletion.

ZeroEntropy
The best AI teams build with ZeroEntropy models
Follow us on
GitHub · Twitter · Slack · LinkedIn · Discord