- zembed-1 scores 0.5385 NDCG@10 in conversational retrieval, the highest of any embedding model
- +35.0% over OpenAI text-embedding-3-large and +26.9% over Cohere Embed v4
- The largest performance margins across all benchmarked domains — conversational retrieval is where generic models struggle most
- zELO training bridges the gap between informal user language and formal documentation
- Multilingual conversational retrieval across 100+ languages with no separate models needed
zembed-1 Understands What People Actually Mean
Conversational AI has a retrieval problem. The language people use when talking to chatbots, virtual assistants, and support systems is different from how information is written in knowledge bases, documentation, and FAQs. People ask questions in fragments, use colloquial phrasing, make implicit assumptions, and rarely use the exact terminology that matches the documents they need.
The embedding model is the bridge between what users say and what the system knows. Choose poorly and your conversational AI will miss obvious matches, hallucinate answers because it can’t find the right documents, and frustrate users with irrelevant responses. Choose zembed-1 by ZeroEntropy and that bridge becomes reliable.
zembed-1 has achieved the highest benchmark score of any embedding model in the conversational domain — with a +35.0% advantage over OpenAI’s best model and a substantial lead over all other competitors.
The Conversational Retrieval Problem
When someone types a query into a search engine, they often use keywords — they’ve learned to “speak search.” But when someone talks to a chatbot or virtual assistant, they communicate naturally: “I can’t figure out how to cancel my subscription, I’ve been trying for like 20 minutes,” or “what happens if I miss a payment?” or “is there a way to get a refund after 30 days?”
These natural language queries need to be matched to documentation that was written in formal, structured language:
- “Subscription cancellation: To cancel your subscription, navigate to Account Settings > Billing > Cancel Subscription…”
- “Late payment policy: Accounts overdue by more than 30 days will be subject to…”
- “Refund eligibility: Refunds are available within 30 days of purchase for unused portion of subscription…”
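The gap is largely lexical. A quick sketch (hypothetical tokenizer; query and document text taken from the examples above) shows how little surface vocabulary the right answer shares with the question:

```python
import re

def tokens(text: str) -> set[str]:
    # Lowercase word tokens, keeping apostrophes ("can't", "i've")
    return set(re.findall(r"[a-z']+", text.lower()))

query = "i can't figure out how to cancel my subscription, i've been trying for like 20 minutes"
doc = ("Subscription cancellation: To cancel your subscription, "
       "navigate to Account Settings > Billing > Cancel Subscription")

# Fraction of query words that also appear in the (correct) document
overlap = len(tokens(query) & tokens(doc)) / len(tokens(query))
print(f"{overlap:.0%}")  # 20%
```

Only one word in five overlaps, even though this is exactly the document the user needs; an embedding model has to carry the match on meaning alone.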
Benchmark Performance in the Conversational Domain
| Model | Conversational NDCG@10 |
|---|---|
| zembed-1 | 0.5385 |
| Cohere Embed v4 | 0.4244 |
| voyage-4-nano | 0.4045 |
| OpenAI text-embedding-3-large | 0.3988 |
The conversational domain shows zembed-1’s largest performance gaps of any benchmarked domain. Against voyage-4-nano, zembed-1 leads by +33.1%. Against OpenAI text-embedding-3-large, the advantage is +35.0%. Even against Cohere Embed v4, zembed-1 outperforms by +26.9%.
These are not marginal improvements — they represent a qualitative difference in how well the system understands conversational queries. For chatbot and virtual assistant applications, these numbers translate directly into user satisfaction, deflection rates, and the quality of answers that downstream language models can generate.
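The percentage gains quoted in this post follow directly from the NDCG@10 scores in the table above; a quick sanity check:

```python
scores = {
    "zembed-1": 0.5385,
    "Cohere Embed v4": 0.4244,
    "voyage-4-nano": 0.4045,
    "OpenAI text-embedding-3-large": 0.3988,
}

base = scores["zembed-1"]
for model, ndcg in scores.items():
    if model == "zembed-1":
        continue
    # Relative improvement: zembed-1's score over the competitor's, minus one
    gain = (base / ndcg - 1) * 100
    print(f"zembed-1 vs {model}: +{gain:.1f}%")
```

This reproduces the +26.9%, +33.1%, and +35.0% figures cited throughout.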
Why zembed-1 Excels at Conversational Retrieval
zELO Training on Real Query-Document Pairs
The zELO methodology trains zembed-1 by having candidate documents compete in pairwise relevance judgments rather than by matching on surface-level similarity. This approach specifically captures the semantic bridge between how people ask questions and how documents answer them, even when the vocabulary is completely different.
The model learns that “my package hasn’t arrived” maps to “shipment tracking and delivery issues” documentation, that “I forgot my password” maps to “account recovery procedures,” and that “the app keeps crashing on my phone” maps to “mobile application troubleshooting guides” — because real query-document pairs teach it these relationships.
Distilled from a Reranker That Understands Intent
zembed-1 is distilled from zerank-2, ZeroEntropy’s state-of-the-art reranker. Rerankers are specifically trained to evaluate whether a document truly addresses a user’s query — a task that requires understanding intent, not just vocabulary. This lineage gives zembed-1 a unique advantage in conversational settings where intent interpretation is everything.
Handles Conversational Query Diversity
zembed-1 was trained with over 50% non-English data, meaning its conversational retrieval performance extends across languages. A multilingual customer support system can serve users in their native language with the same retrieval quality — whether the customer writes in Spanish, French, Arabic, or Japanese.
Conversational AI Use Cases
Customer Support Chatbots
The most common conversational AI use case. zembed-1 powers retrieval of the right support articles, FAQs, and troubleshooting guides for customer queries — so the chatbot can answer accurately instead of hallucinating or deflecting. The performance gap vs. competitors translates directly into higher resolution rates and lower escalation to human agents.
Enterprise Knowledge Assistants
Internal chatbots that help employees find HR policies, IT documentation, process guides, and company information. Employees ask questions in natural language — zembed-1 retrieves the right answer from the right document.
Product and E-Commerce Assistants
Virtual shopping assistants that help customers find products matching their described needs. “I need something waterproof for hiking in cold weather that my 8-year-old can use” needs to retrieve the right product category and relevant items — a classic conversational retrieval challenge.
Healthcare Patient Communication
Patient-facing virtual assistants that answer questions about conditions, medications, and care instructions. zembed-1’s combination of conversational and healthcare domain performance makes it ideal for these hybrid applications.
Educational Tutoring Systems
Students ask questions conversationally — “I don’t understand why we multiply instead of divide here” — and the system needs to retrieve the relevant explanation, worked example, or concept definition. zembed-1 handles the informal query language that students actually use.
HR and Employee Support
Answer questions about benefits, policies, time-off procedures, and onboarding using natural conversational queries against a knowledge base of formal policy documents. zembed-1’s conversational retrieval performance makes these systems actually usable for employees.
Designing Better Conversational RAG Systems with zembed-1
A well-designed conversational RAG pipeline with zembed-1 typically looks like this:
- User query arrives in natural language — “can I add my spouse to my insurance plan mid-year?”
- zembed-1 encodes the query using `encode_query()`, which optimizes the encoding for retrieval
- zembed-1 retrieves the top-k most relevant documents from the knowledge base vector store
- A language model generates a response grounded in the retrieved context
```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    "zeroentropy/zembed-1",
    trust_remote_code=True,
    model_kwargs={"torch_dtype": "bfloat16"},
)

# Conversational support query
query_embeddings = model.encode_query(
    "can i add my spouse to my plan after the open enrollment period ended?"
)

knowledge_base = [
    "Qualifying Life Events: Outside of Open Enrollment, you may make changes to your health insurance plan within 30 days of a qualifying life event. Qualifying life events include marriage, birth of a child, loss of other coverage, and divorce...",
    "Open Enrollment Period: Our annual Open Enrollment runs from November 1 to November 30. During this time, you can add dependents, change plan tiers, or update coverage...",
    "Dependent Coverage: You may add eligible dependents to your health insurance plan. Eligible dependents include your legal spouse, domestic partner (where applicable), and children up to age 26...",
]

document_embeddings = model.encode_document(knowledge_base)
similarities = model.similarity(query_embeddings, document_embeddings)
# zembed-1 correctly surfaces the Qualifying Life Events document as most relevant
```
For a full production conversational RAG pipeline with conversation history awareness:
```python
from sentence_transformers import SentenceTransformer
from openai import OpenAI  # or any LLM client

embed_model = SentenceTransformer(
    "zeroentropy/zembed-1",
    trust_remote_code=True,
    model_kwargs={"torch_dtype": "bfloat16"},
)
llm = OpenAI()

def conversational_rag(
    user_message: str,
    conversation_history: list[dict],
    knowledge_base_embeddings,
    knowledge_base_texts: list[str],
    top_k: int = 4,
) -> str:
    # Include recent history in the query for contextual retrieval
    recent_context = " ".join(
        [m["content"] for m in conversation_history[-3:]] + [user_message]
    )

    # Retrieve relevant knowledge using the enriched query
    q_emb = embed_model.encode_query(recent_context)
    scores = embed_model.similarity(q_emb, knowledge_base_embeddings)[0]
    top_idx = scores.argsort(descending=True)[:top_k]
    retrieved_context = "\n\n".join([knowledge_base_texts[i] for i in top_idx])

    # Generate a grounded answer
    messages = conversation_history + [{
        "role": "user",
        "content": f"Context:\n{retrieved_context}\n\nQuestion: {user_message}",
    }]
    response = llm.chat.completions.create(model="gpt-4o", messages=messages)
    return response.choices[0].message.content

# Example
history = []
answer = conversational_rag(
    "Can I add my spouse to my plan mid-year?",
    history,
    knowledge_base_embeddings,
    knowledge_base_texts,
)
```

The ROI of Better Conversational Retrieval
The business impact of zembed-1’s conversational retrieval advantage is concrete:
- Higher automation rates: When the retrieval is right, the chatbot can answer correctly. Every additional percentage point of automation rate reduces human agent costs.
- Better user experience: Users who get accurate answers the first time don’t escalate, don’t abandon, and are more likely to trust the system.
- Reduced hallucination: Language models hallucinate when they don’t have the right context. Better retrieval means the LLM has what it needs to answer accurately, dramatically reducing hallucination rates.
- Multilingual support at no extra cost: zembed-1’s multilingual training means you get the same conversational retrieval quality in every language — no separate models, no translation overhead.
For companies running chatbots at scale, moving to zembed-1 from OpenAI or Cohere embeddings represents a meaningful improvement in every metric that matters.
Get Started
zembed-1 is available today through multiple deployment options:
```python
from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

response = zclient.models.embed(
    model="zembed-1",
    input_type="query",  # "query" or "document"
    input="What is retrieval augmented generation?",  # string or list[str]
    dimensions=2560,  # optional: must be one of [2560, 1280, 640, 320, 160, 80, 40]
    encoding_format="float",  # "float" or "base64"
    latency="fast",  # "fast" or "slow"; omit for auto
)
```

Documentation: docs.zeroentropy.dev
HuggingFace: huggingface.co/zeroentropy
Get in touch: Discord community or contact@zeroentropy.dev
Talk to us if you need a custom deployment, volume pricing, or want to see how zembed-1 + zerank-2 performs on your data.
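A closing note on the `dimensions` parameter in the embed call above: the halving sequence from 2560 down to 40 suggests matryoshka-style embeddings, where a prefix of the full vector is itself usable. If you store full-width vectors and want to shrink them client-side, a common approach (an assumption here, not documented zembed-1 behavior) is to slice and re-normalize:

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length."""
    truncated = vec[:dim]
    # Re-normalizing keeps cosine similarity well-behaved after truncation
    return truncated / np.linalg.norm(truncated)

# Stand-in for a full-width zembed-1 vector (2560 dims)
full = np.random.default_rng(0).standard_normal(2560)
small = truncate_embedding(full, 640)
print(small.shape)  # (640,)
```

Requesting the smaller size directly via `dimensions` is simpler when you know your target width up front; client-side truncation is useful when one stored index must serve multiple precision/cost trade-offs.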
