Specialized Small Models for Specialized Core Tasks
Specialized small models, one per task, for each of the moving parts inside your production AI stack — query rewriting tuned to your enterprise API's exact shape, context compression for long-running agentic coding harnesses, semantic grep that finds the function whose name you forgot. Cheaper, better, and much faster than steering an LLM into the same job.
Custom models inherit the same compliant platform — see our Trust Center.
Generalist LLMs are the wrong tool for narrow repeated tasks.
An LLM has to relearn the task from instructions on every call. It's huge, slow, and expensive at production traffic volumes. A specialized model — small, with the behaviour baked into the weights — runs circles around it on the narrow task it was trained for, often by two or even three orders of magnitude on cost and latency.
Where most of our specialized models land. Cheap to host, fast to call, easy to deploy in a VPC.
Typical cost gap between a custom-trained model and a frontier LLM doing the same narrow task.
White-glove — hand us an eval set and task definition. We make the data and ship the model.
“We had an enterprise customer whose agent was hitting an awkward search API and losing recall to mismatched query syntax they simply couldn't prompt out of their LLM. We trained a small rewrite model on (intent, observed-API-recall) pairs; recall on the agent's downstream task went up substantially and latency halved.”
zELO — preferences, not labels
The same statistical idea behind zerank-2 generalizes. For any task where ground truth is 'which of these candidates is better for the goal,' we collect pairwise preferences from frontier-LLM ensembles, recover continuous scores via Thurstone fit, and fine-tune a small specialized model against those scores. Transfers cleanly to query rewriting, context compression, and domain-specific retrieval.
Pairwise preferences > absolute labels
Asking 'which of these two is better for the task?' is far more stable than asking 'how good is this on a 0–1 scale?'. Lower noise, higher signal, scales to LLM ensembles.
Thurstone-fit continuous targets
Many pairwise outcomes feed a Thurstone fit (the same math behind chess Elo) that recovers a continuous score per item — the supervision signal we train against.
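To make the idea concrete, here is a minimal sketch of a Thurstone (Case V) fit: given a pile of pairwise outcomes, it recovers one continuous score per item by gradient ascent on the log-likelihood, with P(i beats j) = Φ(s_i − s_j). The function name and hyperparameters are illustrative, not ZeroEntropy's production implementation.

```python
import math
import numpy as np

def thurstone_fit(n_items, pairs, iters=500, lr=0.1):
    """Recover a continuous score per item from pairwise outcomes.

    pairs: list of (winner, loser) index tuples.
    Thurstone Case V models P(i beats j) = Phi(s_i - s_j); we fit
    scores by gradient ascent on the log-likelihood.
    """
    phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2)))   # normal CDF
    pdf = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
    s = np.zeros(n_items)
    for _ in range(iters):
        grad = np.zeros(n_items)
        for w, l in pairs:
            d = s[w] - s[l]
            g = pdf(d) / max(phi(d), 1e-9)   # d/ds_w of log Phi(s_w - s_l)
            grad[w] += g
            grad[l] -= g
        s += lr * grad / len(pairs)
        s -= s.mean()                        # scores identifiable only up to a shift
    return s

# Toy run: item 0 usually beats 1, item 1 usually beats 2.
scores = thurstone_fit(3, [(0, 1)] * 8 + [(1, 0)] * 2 + [(1, 2)] * 8 + [(2, 1)] * 2)
```

The recovered ordering (item 0 above item 1 above item 2) is the continuous supervision signal that gets distilled into the small model.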
Distill into a small specialized model
Fine-tune a 1–7B model with MSE against the recovered scores. Open weights available; managed inference for low-latency deployments.
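The distillation step is plain regression against the recovered scores. The sketch below shows the shape of that training loop with a toy linear model in place of the 1–7B transformer; all names and data here are illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))      # stand-in for (query, candidate) features
true_w = rng.normal(size=8)
targets = X @ true_w              # stand-in for Thurstone-fit zELO scores

# Gradient descent on mean squared error against the continuous targets --
# the same objective a regression head on a small LM would be trained with.
w = np.zeros(8)
for _ in range(200):
    residual = X @ w - targets
    grad = 2 * X.T @ residual / len(X)   # d/dw of MSE
    w -= 0.05 * grad

mse = float(np.mean((X @ w - targets) ** 2))
```

Training on continuous scores rather than binary preferences means every pairwise judgment contributes graded signal, which is what lets a small model match ensemble-level rankings.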
Per-rater calibration (zerank-2 evolution)
Each LLM rater gets a (μ, κ) Beta calibration that we iteratively mix, weighting each rater's judgment by how reliable it has been. Same trick generalizes to custom-task training.
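One plausible reading of per-rater calibration, sketched below: each rater keeps a Beta distribution over "agrees with the consensus," parameterized by mean μ and concentration κ, updated conjugately as its judgments are checked, and its vote is weighted by that reliability. This is an illustrative construction, not zerank-2's exact scheme.

```python
import numpy as np

def beta_update(mu, kappa, agreed):
    """One conjugate update of a (mu, kappa)-parameterized Beta reliability."""
    alpha, beta = mu * kappa, (1 - mu) * kappa
    if agreed:
        alpha += 1          # rater matched the consensus
    else:
        beta += 1           # rater contradicted the consensus
    kappa = alpha + beta
    return alpha / kappa, kappa

def weighted_vote(votes, reliabilities):
    """Mix raters' pairwise votes (+1 / -1) by their reliability weights."""
    return float(np.sign(np.dot(np.asarray(reliabilities), votes)))

# A reliable rater can outvote two unreliable ones.
decision = weighted_vote([1, -1, -1], [0.9, 0.3, 0.3])
mu, kappa = beta_update(0.5, 2.0, agreed=True)
```

The effect is that noisy or systematically biased raters in the ensemble are gradually down-weighted instead of polluting the training targets.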
Integrate ZeroEntropy models in minutes. Production-ready, latency-optimized, available everywhere.
# Create an API Key at https://dashboard.zeroentropy.dev
from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()

response = zclient.models.rerank(
    model="zerank-2",
    query="What is Retrieval Augmented Generation?",
    documents=[
        "RAG combines retrieval with generation...",
    ],
)
for doc in response.results:
    print(doc)

Deploy in your own cloud with dedicated infrastructure. Available on AWS Marketplace and Azure.
From security to scale, ZeroEntropy is built for the demands of production-ready AI.

SOC2 Type II
Audited controls for data security, availability, and confidentiality — verified annually.

HIPAA Compliant
BAA-ready infrastructure with encryption at rest and in transit for protected health data.

GDPR Compliant
Full data residency controls, right-to-deletion, and DPA agreements for EU customers.

CCPA Compliant
Consumer data rights honored with full transparency on collection, use, and deletion.
