FAQ
Common questions about ZeroEntropy's models, pricing, deployment, and how we fit into a production retrieval stack. For deeper technical references, see the concepts catalog; for symptom-shaped runbooks, see the playbooks.
What is ZeroEntropy?
ZeroEntropy trains specialized small models — rerankers, embeddings, and custom models — for production AI systems. The thesis is that the long-term shape of production AI is a constellation of fine-tuned specialists wrapped around frontier LLMs, not one giant LLM doing everything.
What's a reranker, and why would I use one?
A reranker is a second-stage retrieval model that reorders a candidate set from first-pass retrieval by relevance. It is how production search systems get high precision at the top of the result list without paying full LLM cost on every query — the standard pattern is BM25 or dense retrieval feeding the top 50–200 candidates into a cross-encoder reranker.
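The two-stage pattern described above can be sketched in a few lines. This is a minimal illustration, not ZeroEntropy's API: `lexical_score` is a toy stand-in for BM25 or dense retrieval, `toy_rerank_score` is a hypothetical placeholder for a real cross-encoder call, and the corpus and parameter names (`fetch_k`, `top_n`) are invented for the example.

```python
from collections import Counter
from typing import Callable, List, Tuple


def lexical_score(query: str, doc: str) -> float:
    """Toy first-pass scorer: total occurrences of query terms in the document.
    Stand-in for BM25 or dense retrieval (assumption, not a real BM25)."""
    doc_terms = Counter(doc.lower().split())
    return float(sum(doc_terms[t] for t in set(query.lower().split())))


def toy_rerank_score(query: str, doc: str) -> float:
    """Hypothetical placeholder for a cross-encoder: fraction of distinct
    query terms that appear in the document. A real reranker would score
    the (query, document) pair jointly with a model."""
    q_terms = set(query.lower().split())
    d_terms = set(doc.lower().split())
    return len(q_terms & d_terms) / len(q_terms) if q_terms else 0.0


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    rerank_score: Callable[[str, str], float],
    fetch_k: int = 100,
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    # Stage 1: cheap scoring over the whole corpus, keep the top fetch_k.
    candidates = sorted(
        corpus, key=lambda d: lexical_score(query, d), reverse=True
    )[:fetch_k]
    # Stage 2: expensive scoring only on the candidate set, then reorder.
    scored = [(d, rerank_score(query, d)) for d in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


corpus = [
    "reranker reranker reranker glossary",
    "reranker latency benchmarks",
    "pricing page",
]
results = retrieve_then_rerank(
    "reranker latency", corpus, toy_rerank_score, fetch_k=2, top_n=2
)
```

In this toy run the first stage ranks the glossary document highest because it repeats one query term, and the second stage moves the document that covers both query terms to the top. That reordering is the job the reranker does in a real pipeline.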
What is the difference between an embedding and a reranker?
An embedding is a fixed-size vector that lets you compare two pieces of text by cosine similarity. A reranker is a much heavier model that takes a (query, document) pair as joint input and produces one relevance score. Embeddings are cheap and cacheable for indexing; rerankers are precise but cost more per pair — so production systems use embeddings (or BM25) to fetch and rerankers to order.
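To make the comparison concrete, here is a small sketch of cosine similarity over precomputed vectors. The two-dimensional vectors and document IDs are made up for illustration; real embeddings have many more dimensions and would come from an embedding model.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


# Document vectors are computed once at indexing time and cached.
# These values are illustrative placeholders, not model output.
index: Dict[str, List[float]] = {
    "doc_a": [2.0, 0.0],
    "doc_b": [0.0, 3.0],
}

# At query time only the query needs embedding; each comparison is cheap.
query_vec = [1.0, 0.0]
similarities = {
    doc_id: cosine_similarity(query_vec, vec) for doc_id, vec in index.items()
}
```

Because each document vector is independent of the query, the index can be built ahead of time. A reranker cannot be cached this way, since its score depends on the query and the document together.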
Which models does ZeroEntropy offer?
Our current production models are zerank-2 (the reranker) and zembed-1 (the embedding model). Both are available via the API; details on benchmarks, latency, and pricing are on the rerankers and embeddings pages.
How does ZeroEntropy compare to OpenAI, Cohere, or Voyage?
We train smaller, faster, more accurate models for the specific tasks (reranking and embedding) that production retrieval pipelines depend on. Frontier LLM providers sell generality; we sell specialization. On standard public benchmarks like MTEB and BEIR, zerank-2 and zembed-1 are competitive with or ahead of the generalist alternatives at a fraction of the latency and cost, and our regrading methodology often widens the gap on harder evals.
What is the typical latency on a rerank or embedding request?
A rerank call on a batch of 100 documents typically completes in the tens of milliseconds at P50 and in under 100 ms at P95. An embedding call returns in a few milliseconds. Exact numbers, with the workload assumptions stated, are on the rerankers and embeddings pages.
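If you want to check P50 and P95 against your own workload, one simple approach is the nearest-rank percentile over latencies you have measured yourself. The sketch below assumes you already have a list of per-request timings in milliseconds; the sample values are invented for illustration and are not ZeroEntropy measurements.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Illustrative placeholder latencies in milliseconds (not real measurements).
latencies_ms = [12, 15, 18, 20, 22, 25, 30, 40, 60, 95]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
```

With only ten samples the P95 is simply the slowest request, so collect a few hundred measurements or more before comparing against published figures.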
How is pricing structured?
Pricing is per token, with separate input and output rates for the reranker and a single rate for the embedder. The full table is on the pricing page. Enterprise contracts include committed volume, SLAs, and private deployment options; contact us if your usage profile is non-standard.
Is there a free tier?
Yes — every account starts with a free tier sufficient to evaluate the API end-to-end on your own data. See pricing for the current credit allocation and limits.
What languages do the models support?
zerank-2 and zembed-1 are trained on a multilingual corpus that covers the major world languages — English, Spanish, French, German, Portuguese, Italian, Russian, Mandarin, Japanese, Korean, Arabic, Hindi, and others. Cross-lingual retrieval (querying in one language, retrieving in another) is supported. See cross-lingual retrieval.
How do I get started?
Create an account on docs.zeroentropy.dev, generate an API key, and call the rerank or embed endpoint. Most users have a working integration running within an hour.
Can I self-host or deploy ZeroEntropy models in my own VPC?
Yes, on enterprise plans. The deployment options are a hosted API (what most customers use), a single-tenant deployment in a region of your choice, and on-premise or VPC deployment for regulated industries. See enterprise for the full menu.
How does ZeroEntropy handle data privacy and compliance?
Customer data is encrypted in transit and at rest; we do not train on customer data by default. SOC 2 Type II is in place; HIPAA-eligible workloads are available on enterprise plans. The trust center has the full posture, certifications, and DPA template; the privacy policy covers what data we collect and how we use it.
Can ZeroEntropy train a custom reranker or embedder on my data?
Yes, that is what our custom models offering is for. Our zELO methodology uses frontier LLMs to generate graded relevance labels on your corpus, then trains a specialized small model that beats the generalist on your specific domain. The typical custom-model project ships a deployed model in 2–4 weeks.
Should I fine-tune my own model instead?
For most retrieval and reranking workloads, no — the engineering cost of doing it well (label curation, hard-negative mining, hyperparameter sweeps, eval discipline, ongoing retraining) eats the apparent margin. Use our hosted models for the standard cases; reach for custom models when your domain is genuinely different (legal, medical, technical with rare terms) and the volume justifies the lift. The honest comparison is on a fixed quality target and total cost-of-ownership, not list-price-per-token.
Where do I find deeper technical references?
The concepts catalog has 250+ articles covering everything from BM25 and embeddings to RLHF, FlashAttention, and eval set quality. The playbooks are symptom-shaped — "RAG is returning the right docs but wrong answers" — and walk through the diagnostic stack. The blog covers product launches, research notes, and customer stories.
