[Hero graphic: three custom-trained models in action. zegen (~1.7B, API-shape-tuned) rewrites an enterprise API agent's "find Q3 finance docs from EU" into the call q="Q3" type=fin region=EU sort=date. zgrep (~0.6B, zero-keyword search) resolves the codebase-recall query "where the &*$! do we hit redis on first load" to cache/preheat.ts:42 prewarm() {}. zcompress (~1.7B, long agent traces) compresses an agentic coding harness's tool-call trace from 6 turns to 3, keeping the retry.ts edit and the timeout trace plus a summary.]

Specialized Small Models for Specialized Core Tasks

Specialized small models, one per task, for each of the moving parts inside your production AI stack — query rewriting tuned to your enterprise API's exact shape, context compression for long-running agentic coding harnesses, semantic grep that finds the function whose name you forgot. Cheaper, better, and much faster than steering an LLM into the same job.

Custom models inherit the same compliant platform — see our Trust Center.

Models
Query rewriting
Fixes agent ↔ API recall gaps
enterprise APIs · agent tool stacks

Context compression
Tames long agent traces
agentic coding · tool-call traces · memory

Domain fine-tunes
Custom zerank-2 / zembed-1
legal · clinical · scientific · code

Semantic grep
Find the function whose name you forgot
codebase recall · agentic coding · zero-keyword
Infrastructure & Compliance
Need on-prem, SOC 2, or BAA paperwork? Both our public and custom models can run inside your VPC, on-prem, or air-gapped — see Enterprise for the infrastructure side.
See Enterprise
The Problem

Generalist LLMs are the wrong tool for narrow repeated tasks.

An LLM has to relearn the task from instructions on every call. It's huge, slow, and expensive at production traffic. A specialized model — small, with the behaviour baked into the weights — runs circles around it on the narrow thing it was trained to do, often by two or even three orders of magnitude on cost and latency.

0.5–4B
parameters

Where most of our specialized models land. Cheap to host, fast to call, easy to deploy in a VPC.

~100×
cheaper than LLM

Typical cost gap between a custom-trained model and a frontier LLM doing the same narrow task.

1–2 wks
training time

White-glove — hand us an eval set and task definition. We make the data and ship the model.

What Teams Are Saying

We had an enterprise customer whose agent was hitting an awkward search API and losing recall to mismatched query syntax they simply couldn't prompt out of their LLM. We trained a small rewrite model on (intent, observed-API-recall) pairs; recall on the agent's downstream task went up substantially and latency halved.

ZeroEntropy team · Custom training, internal note
Methodology

zELO — preferences, not labels

The same statistical idea behind zerank-2 generalizes. For any task where ground truth is 'which of these candidates is better for the goal,' we collect pairwise preferences from frontier-LLM ensembles, recover continuous scores via Thurstone fit, and fine-tune a small specialized model against those scores. Transfers cleanly to query rewriting, context compression, and domain-specific retrieval.

[Diagram: the same math behind chess Elo. Pointwise asks "how relevant is this — 0 to 1?" and gives the same (q, d) a different score on every call (0.32, 0.71, 0.48). Pairwise asks "which answers it better — A or B?" and gives the same (q, A, B) the same answer every call. Aggregating many pairwise outcomes (A vs B, B vs C, A vs C, …) through a Thurstone fit recovers a continuous relevance score per document, and the reranker is trained on these.]
01

Pairwise preferences > absolute labels

Asking 'which of these two is better for the task?' is far more stable than asking 'how good is this on a 0–1 scale?'. Lower noise, higher signal, scales to LLM ensembles.

02

Thurstone-fit continuous targets

Many pairwise outcomes feed a Thurstone fit (the same math behind chess Elo) that recovers a continuous score per item — the supervision signal we train against.
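As a sketch of the idea (illustrative only: the item names, learning rate, and plain gradient ascent are assumptions, not ZeroEntropy's implementation), a Thurstone fit recovers one score per item by maximizing the likelihood of the observed pairwise wins under P(i beats j) = ½(1 + erf(sᵢ − sⱼ)):

```python
import math

def thurstone_fit(items, wins, iters=2000, lr=0.05):
    """Recover one continuous score per item from pairwise outcomes.

    wins: list of (winner, loser) pairs. Model: P(i beats j) =
    0.5 * (1 + erf(s_i - s_j)); fit by gradient ascent on log-likelihood.
    """
    s = {x: 0.0 for x in items}
    for _ in range(iters):
        grad = {x: 0.0 for x in items}
        for w, l in wins:
            d = s[w] - s[l]
            p = max(0.5 * (1.0 + math.erf(d)), 1e-9)
            # d/dd of 0.5*(1 + erf(d)) is exp(-d^2)/sqrt(pi)
            g = math.exp(-d * d) / (math.sqrt(math.pi) * p)
            grad[w] += g  # raising the winner's score raises the likelihood
            grad[l] -= g
        for x in items:
            s[x] += lr * grad[x]
    mean = sum(s.values()) / len(s)
    return {x: v - mean for x, v in s.items()}  # center: only gaps matter

# toy comparison graph: A beats B twice, B beats C twice, A beats C once
scores = thurstone_fit(
    ["A", "B", "C"],
    [("A", "B"), ("A", "B"), ("B", "C"), ("B", "C"), ("A", "C")],
)
```

With only win/loss records and no absolute labels, the fit still pins down relative strengths, which is the supervision signal being described.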

03

Distill into a small specialized model

Fine-tune a 1–7B model with MSE against the recovered scores. Open weights available; managed inference for low-latency deployments.
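The training objective itself is ordinary regression against the recovered scores. A toy sketch, where a two-weight linear "student" and made-up feature vectors stand in for a 1–7B reranker and real (q, d) inputs:

```python
def fit_student(features, elo_targets, lr=0.1, epochs=500):
    """SGD on MSE between the student's score and the Elo target."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        for x, elo in zip(features, elo_targets):
            pred = sum(wi * xi for wi, xi in zip(w, x))
            err = pred - elo  # dMSE/dpred = 2 * err
            w = [wi - lr * 2.0 * err * xi for wi, xi in zip(w, x)]
    return w

feats = [[1.0, 0.2], [1.0, 0.8], [1.0, 0.5]]  # hypothetical (q, d) features
targets = [0.30, 0.86, 0.58]                   # Thurstone-recovered scores
w = fit_student(feats, targets)
preds = [sum(wi * xi for wi, xi in zip(w, x)) for x in feats]
```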

04

Per-rater calibration (zerank-2 evolution)

Each LLM rater gets a (μ, κ) Beta calibration that we iteratively mix, weighting each rater's judgment by how reliable it has been. Same trick generalizes to custom-task training.
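The mixing rule isn't spelled out here, so the following is one plausible reading rather than the actual zerank-2 calibration: model each rater's reliability as a Beta with mean μ and concentration κ, update it as the rater agrees or disagrees with the consensus, and weight judgments by the current reliability mean.

```python
def beta_params(mu, kappa):
    """(mu, kappa) parametrization of a Beta: mean mu, concentration kappa."""
    return mu * kappa, (1.0 - mu) * kappa

def update_reliability(mu, kappa, agreed_with_consensus):
    """Conjugate update after one comparison: agreement with the
    ensemble consensus counts as a success for this rater."""
    a, b = beta_params(mu, kappa)
    if agreed_with_consensus:
        a += 1.0
    else:
        b += 1.0
    return a / (a + b), a + b

def mix_raters(probs, mus):
    """Ensemble P(A beats B): each rater's judged probability
    weighted by its current reliability mean."""
    return sum(p * m for p, m in zip(probs, mus)) / sum(mus)

# a rater that starts uninformative (mu=0.5, kappa=2) and agrees once
mu, kappa = update_reliability(0.5, 2.0, agreed_with_consensus=True)
```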

[Diagram: pairwise preferences → continuous relevance scores, in four steps.
Step 1 · Ground truth: frontier LLMs (Claude, GPT, Gemini, each with chain-of-thought) judge one random pair (q, dᵢ, dⱼ); the ensemble average gives pᵢⱼ. 112K queries, 112K gold pairs: expensive, slow, gold.
Step 2 · Distill pairwise: a pairwise SLM reranker (Qwen3-4B init, ~1000× faster) is trained with ℒ = BCE(pᵢⱼ, p′ᵢⱼ), reaching near-ensemble accuracy at SLM speed.
Step 3 · zELO fit: a sparse comparison graph over documents (k/2 random cycles unioned, ~0.4% of all pairs) feeds a Thurstone fit, P = ½(1 + erf(Δ)) with Δ ∝ Eloᵢ − Eloⱼ, yielding a fitted Elo per (q, d).
Step 4 · Distill pointwise: a pointwise reranker is trained with ℒ = (R(q, d) − Elo)² on ~5M (q, d, Elo) targets and ships as zerank-1, scoring each (q, d) in a single forward pass. 112K LLM-ensemble inferences become 5M MSE targets, with no human annotations.]
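Step 2's loss can be written out directly; only the BCE form comes from the pipeline above, and the probabilities below are illustrative:

```python
import math

def bce_distill_loss(teacher_p, student_p, eps=1e-12):
    """BCE between the frontier ensemble's soft preference p_ij and the
    student pairwise reranker's p'_ij."""
    s = min(max(student_p, eps), 1.0 - eps)
    return -(teacher_p * math.log(s) + (1.0 - teacher_p) * math.log(1.0 - s))

# a student that already matches the teacher's preference pays less
agree = bce_distill_loss(0.9, 0.9)
unsure = bce_distill_loss(0.9, 0.5)
```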
Ship Models That Work

Integrate ZeroEntropy models in minutes. Production-ready, latency-optimized, available everywhere.

AWS · Hugging Face · Azure
Partner Providers

Access all models through a single, latency-optimized API, or through our partner providers.

# Create an API Key at https://dashboard.zeroentropy.dev

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

# Rerank candidate documents against the query with zerank-2
response = zclient.models.rerank(
    model="zerank-2",
    query="What is Retrieval Augmented Generation?",
    documents=[
        "RAG combines retrieval with generation...",
    ],
)

# Iterate over the reranked results
for doc in response.results:
    print(doc)
API
ZeroEntropy API

Start building in minutes with Python and TypeScript SDKs.

VPC
ZeroEntropy VPC

Deploy in your own cloud with dedicated infrastructure. Available on AWS Marketplace and Azure.

Enterprise
Enterprise and Model Licensing

Custom deployments, dedicated capacity, model licensing, model fine-tuning, and SLAs. Talk to us.

Enterprise-Ready

From security to scale, ZeroEntropy is built for the demands of production-ready AI.

SOC 2 Type II

Audited controls for data security, availability, and confidentiality — verified annually.

HIPAA Compliant

BAA-ready infrastructure with encryption at rest and in transit for protected health data.

GDPR Compliant

Full data residency controls, right-to-deletion, and DPAs for EU customers.

CCPA Compliant

Consumer data rights honored with full transparency on collection, use, and deletion.

ZeroEntropy
The best AI teams build with ZeroEntropy models
Follow us on
GitHub · Twitter · Slack · LinkedIn · Discord