Question 1

What is ZeroEntropy?

Accepted Answer

ZeroEntropy trains specialized small models — rerankers, embeddings, and custom models — for production AI systems. The thesis is that the long-term shape of production AI is a constellation of fine-tuned specialists wrapped around frontier LLMs, not one giant LLM doing everything.

Question 2

What's a reranker, and why would I use one?

Accepted Answer

A reranker is a second-stage retrieval model that reorders a candidate set from first-pass retrieval by relevance. It is how production search systems get high precision at the top of the result list without paying full LLM cost on every query — the standard pattern is BM25 or dense retrieval feeding the top 50–200 candidates into a cross-encoder reranker.
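As a rough illustration of that two-stage pattern, here is a minimal Python sketch. It assumes a simplified BM25 scorer for the first pass and a hypothetical `toy_reranker` function standing in for a real cross-encoder; none of these names come from the ZeroEntropy API, and in production the second-stage scorer would be a call to an actual reranking model.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; real systems use a proper analyzer.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with a simplified BM25."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(toks) for toks in tokenized) / len(tokenized)
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in term_freq:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = term_freq[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * term_freq[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_reranker(query, doc):
    """Hypothetical stand-in for a cross-encoder: fraction of query terms found in the doc."""
    query_terms = set(tokenize(query))
    return len(query_terms & set(tokenize(doc))) / len(query_terms)


def retrieve_then_rerank(query, docs, rerank_fn, first_pass_k=100, final_k=10):
    """Stage 1: cheap scoring over all docs. Stage 2: expensive scoring over the survivors."""
    scores = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:first_pass_k]
    reranked = sorted(candidates, key=lambda i: rerank_fn(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:final_k]]
```

The point of the split is that `rerank_fn` runs only on `first_pass_k` candidates rather than on the whole corpus, which is what keeps the expensive model affordable per query.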

Question 3

What is the difference between an embedding and a reranker?

Accepted Answer

An embedding is a fixed-size vector that lets you compare two pieces of text by cosine similarity. A reranker is a much heavier model that takes a (query, document) pair as joint input and produces one relevance score. Embeddings are cheap and cacheable for indexing; rerankers are precise but cost more per pair — so production systems use embeddings (or BM25) to fetch and rerankers to order.
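To make the embedding side concrete, the sketch below compares vectors by cosine similarity. The three-dimensional vectors are made-up toy values, not output from zembed-1 or any real model; real embeddings typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors standing in for precomputed, cached document embeddings.
doc_vectors = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.0, 0.0]

# Rank documents by similarity to the query; no model call is needed at this point.
ranked = sorted(
    doc_vectors,
    key=lambda name: cosine_similarity(query_vector, doc_vectors[name]),
    reverse=True,
)
```

Because each document vector is computed once and reused for every query, this step stays cheap. A reranker cannot be cached the same way, since its score depends on the query and the document together.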

Question 4

Which models does ZeroEntropy offer?

Accepted Answer

Our current production models are zerank-2 (the reranker) and zembed-1 (the embedding model). Both are available via the API; details on benchmarks, latency, and pricing are on the rerankers and embeddings pages.

Question 5

How does ZeroEntropy compare to OpenAI, Cohere, or Voyage?

Accepted Answer

We train smaller, faster, more accurate models for the specific tasks (reranking and embedding) that production retrieval pipelines depend on. Frontier LLM providers sell generality; we sell specialization. On standard public benchmarks like MTEB and BEIR, zerank-2 and zembed-1 are competitive with or ahead of the generalist alternatives at a fraction of the latency and cost, and our regrading methodology often widens the gap on harder evals.

Question 6

What is the typical latency on a rerank or embedding request?

Accepted Answer

A rerank call on a batch of 100 documents typically completes in the tens of milliseconds at P50 and in under 100 ms at P95. An embedding call returns in a few milliseconds. Exact numbers, with the workload assumptions stated, are on the rerankers and embeddings pages.
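If you want to check P50 and P95 on your own workload, a small timing harness is enough. The sketch below uses the nearest-rank percentile definition and takes any zero-argument callable; `measure_latencies_ms` and `percentile` are illustrative helper names, and the callable you pass in (for example, a wrapper around your rerank request) is up to you.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("percentile() needs at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def measure_latencies_ms(call, runs=50):
    """Time repeated invocations of `call` and return wall-clock latencies in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies
```

A typical use would be `latencies = measure_latencies_ms(my_rerank_call)` followed by `percentile(latencies, 50)` and `percentile(latencies, 95)`. Measured this way, the numbers include network round-trip time from your client, so they will usually be higher than server-side figures.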

Question 7

How is pricing structured?

Accepted Answer

Per-token, with separate input and output rates for the reranker and a single rate for the embedder. The full table is on the pricing page. Enterprise contracts include committed volume, SLAs, and private deployment options — contact us if your usage profile is non-standard.

Question 8

Is there a free tier?

Accepted Answer

Yes — every account starts with a free tier sufficient to evaluate the API end-to-end on your own data. See pricing for the current credit allocation and limits.

Question 9

What languages do the models support?

Accepted Answer

zerank-2 and zembed-1 are trained on a multilingual corpus that covers the major world languages — English, Spanish, French, German, Portuguese, Italian, Russian, Mandarin, Japanese, Korean, Arabic, Hindi, and others. Cross-lingual retrieval (querying in one language, retrieving in another) is supported. See cross-lingual retrieval.

Question 10

How do I get started?

Accepted Answer

Create an account on docs.zeroentropy.dev, generate an API key, and call the rerank or embed endpoint. Most users have a working integration running within an hour.

Question 11

Can I self-host or deploy ZeroEntropy models in my own VPC?

Accepted Answer

Yes, on enterprise plans. The standard deployment options are a hosted API (used by most customers), a single-tenant deployment in a region of your choice, and on-premise or VPC deployment for regulated industries. The Enterprise page has the full menu.

Question 12

How does ZeroEntropy handle data privacy and compliance?

Accepted Answer

Data is encrypted in transit and at rest, and we do not train on customer data by default. SOC 2 Type II is in place; HIPAA-eligible workloads are available on enterprise plans. The trust center has the full posture, certifications, and DPA template; the privacy policy covers what data we collect and how we use it.

Question 13

Can ZeroEntropy train a custom reranker or embedder on my data?

Accepted Answer

Yes, that is what our custom models offering is for. Our zELO methodology uses frontier LLMs to generate graded relevance labels on your corpus, then trains a specialized small model that beats the generalist on your specific domain. The typical custom-model project ships a deployed model in 2–4 weeks.

Question 14

Should I fine-tune my own model instead?

Accepted Answer

For most retrieval and reranking workloads, no — the engineering cost of doing it well (label curation, hard-negative mining, hyperparameter sweeps, eval discipline, ongoing retraining) eats the apparent margin. Use our hosted models for the standard cases; reach for custom models when your domain is genuinely different (legal, medical, technical with rare terms) and the volume justifies the lift. The honest comparison is on a fixed quality target and total cost-of-ownership, not list-price-per-token.

Question 15

Where do I find deeper technical references?

Accepted Answer

The concepts catalog has 250+ articles covering everything from BM25 and embeddings to RLHF, FlashAttention, and eval set quality. The playbooks are symptom-shaped — "RAG is returning the right docs but wrong answers" — and walk through the diagnostic stack. The blog covers product launches, research notes, and customer stories.
