Topic · 12 concepts

Agents

When LLMs become decision-makers in a loop.

Agentic systems use an LLM to decide which actions to take, observe their results, and continue. The concepts below cover the building blocks — tool use, function calling, planning, trajectory pruning — and the operational patterns that make agent loops reliable enough to ship. Specialized small models increasingly handle the narrow tasks inside agents (tool selection, argument extraction, sub-task decomposition) while the LLM does the open-ended reasoning.

Agent

An agent is an LLM placed in a perception/decision/action loop — it reads context, picks an action (often a tool call), observes the result, and iterates until the goal is met.

Agent Guardrails

Agent guardrails are the input/output filters, tool-call validators, and allow-lists that bound what an agent can do and say. They follow a defense-in-depth approach: layered checks at the prompt boundary and the tool boundary.
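
As a rough illustration of layered checks, the sketch below pairs a prompt-boundary filter with a tool-boundary validator. The tool names, the blocked phrase, and the validation rules are all hypothetical; a production system would use far richer checks.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToolCall:
    name: str
    args: dict[str, Any]


# Hypothetical allow-list: tool name -> validator for that tool's arguments.
ALLOWED_TOOLS: dict[str, Callable[[dict[str, Any]], bool]] = {
    "search": lambda a: isinstance(a.get("query"), str) and len(a["query"]) <= 200,
    "read_file": lambda a: isinstance(a.get("path"), str) and ".." not in a["path"],
}

BLOCKED_PHRASES = ("ignore previous instructions",)  # toy input filter


def check_input(text: str) -> bool:
    """Prompt-boundary check: reject text containing a blocked phrase."""
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKED_PHRASES)


def check_tool_call(call: ToolCall) -> bool:
    """Tool-boundary check: the tool must be allow-listed and its args must validate."""
    validator = ALLOWED_TOOLS.get(call.name)
    return validator is not None and validator(call.args)
```

Keeping the two checks separate means a request that slips past the input filter can still be stopped when it tries to call a tool that is not on the list.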

Agent Loop

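The agent loop is the execution scaffold that wraps an LLM into an agent: perceive → think → act → observe → repeat. Each pass through the loop adds one step to the agent's trajectory, which makes the loop the basic unit that trajectories are built from.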
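
A minimal sketch of such a loop, assuming the LLM is replaced by a stub `policy` function and that tools are plain Python callables. The names `run_agent`, `toy_policy`, and `toy_tools` are illustrative, not part of any library.

```python
from typing import Callable


def run_agent(goal: str,
              policy: Callable[[list[str]], tuple[str, str]],
              tools: dict[str, Callable[[str], str]],
              max_steps: int = 5) -> list[str]:
    """Run perceive -> think -> act -> observe until the policy returns 'finish'."""
    trajectory = [f"goal: {goal}"]
    for _ in range(max_steps):
        action, arg = policy(trajectory)           # think: choose the next action
        if action == "finish":
            trajectory.append(f"answer: {arg}")
            break
        observation = tools[action](arg)           # act: execute the chosen tool
        trajectory.append(f"{action}({arg}) -> {observation}")  # observe: record result
    return trajectory


# Stub policy standing in for an LLM: look something up once, then finish.
def toy_policy(trajectory: list[str]) -> tuple[str, str]:
    if len(trajectory) == 1:
        return "lookup", "capital of France"
    return "finish", trajectory[-1].split(" -> ")[-1]


toy_tools = {"lookup": lambda q: {"capital of France": "Paris"}.get(q, "unknown")}
result = run_agent("Find the capital of France", toy_policy, toy_tools)
```

The `max_steps` cap is one simple way to keep a loop from running indefinitely when the policy never decides to finish.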

Agent Memory

Agent memory is how an agent persists information across turns and sessions. Short-term memory lives in the context window; long-term memory lives in an external store (vector DB, structured records, files).
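
The split between the two kinds of memory can be sketched as follows. This is a toy: a bounded deque stands in for the context window, a plain list stands in for the external store, and word overlap stands in for vector similarity. The class name `AgentMemory` is hypothetical.

```python
from collections import deque


class AgentMemory:
    def __init__(self, window: int = 3) -> None:
        # Short-term: only the most recent `window` items stay "in context".
        self.short_term: deque[str] = deque(maxlen=window)
        # Long-term: everything is kept; a real system would use an external store.
        self.long_term: list[str] = []

    def remember(self, text: str) -> None:
        self.short_term.append(text)
        self.long_term.append(text)

    def recall(self, query: str, k: int = 1) -> list[str]:
        """Return the k long-term entries sharing the most words with the query."""
        query_words = set(query.lower().split())
        ranked = sorted(self.long_term,
                        key=lambda t: len(query_words & set(t.lower().split())),
                        reverse=True)
        return ranked[:k]
```

Older items fall out of `short_term` automatically but remain retrievable through `recall`, which mirrors how an agent pulls long-term facts back into its context when they become relevant.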

Agent Orchestration

Agent orchestration is the routing layer that decides which agent or model handles each step. The dominant patterns are workflow orchestration (a deterministic graph of agents) and autonomous orchestration (a supervisor delegating to sub-agents).
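
The two patterns can be contrasted in a few lines. In this sketch the agents are stubbed as plain functions, and the supervisor's choice is passed in as a `pick` callable where a real system would make an LLM call; all names are hypothetical.

```python
from typing import Callable

Step = Callable[[str], str]


def classify(text: str) -> str:
    """Deterministic routing rule used by the workflow."""
    return "billing" if "invoice" in text.lower() else "general"


# Hypothetical agents, stubbed as plain functions.
AGENTS: dict[str, Step] = {
    "billing": lambda t: f"[billing agent] handling: {t}",
    "general": lambda t: f"[general agent] handling: {t}",
}


def workflow(text: str) -> str:
    """Workflow orchestration: a fixed graph, classify -> matching agent."""
    return AGENTS[classify(text)](text)


def supervisor(text: str, pick: Callable[[str, list[str]], str]) -> str:
    """Autonomous orchestration: a supervisor (normally an LLM) picks the sub-agent."""
    choice = pick(text, sorted(AGENTS))
    return AGENTS[choice](text)
```

The workflow version is easier to test because its routing is fixed in code, while the supervisor version trades that predictability for flexibility.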

Agentic RAG

Agentic RAG is RAG where the model decides what to retrieve, reformulates queries, and iterates — instead of a single pre-baked query going to the index.
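
A minimal sketch of the retrieve, reformulate, retry cycle. It assumes a toy in-memory corpus with substring matching in place of a real index, and a stub `toy_reformulate` in place of the model's query rewriting; every name here is illustrative.

```python
from typing import Callable, Optional

CORPUS = {  # toy index: doc id -> text
    "d1": "zerank reranks candidate documents by relevance",
    "d2": "embeddings map text to vectors for retrieval",
}


def search(query: str) -> list[str]:
    """Toy retriever: return docs containing every query word as a substring."""
    words = query.lower().split()
    return [text for text in CORPUS.values() if all(w in text for w in words)]


def agentic_retrieve(question: str,
                     reformulate: Callable[[str, int], Optional[str]],
                     max_rounds: int = 3) -> list[str]:
    """Retrieve; if nothing comes back, let the model rewrite the query and retry."""
    query = question
    for round_no in range(max_rounds):
        hits = search(query)
        if hits:
            return hits
        next_query = reformulate(question, round_no)  # normally an LLM call
        if next_query is None:
            break
        query = next_query
    return []


# Stub reformulator standing in for the model's query rewriting.
def toy_reformulate(question: str, round_no: int) -> Optional[str]:
    rewrites = ["reranker relevance", "relevance"]
    return rewrites[round_no] if round_no < len(rewrites) else None
```

Calling `agentic_retrieve("how do I sort results?", toy_reformulate)` fails on the original wording and on the first rewrite, then succeeds on the second rewrite, which is the behavior a single pre-baked query cannot provide.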

Function Calling

Function calling is the structured-API mechanism that providers (OpenAI, Anthropic, Google) expose for tool use: you give the model a JSON schema describing each function, and the model responds with a typed call request the runtime can execute.
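
To make the schema-in, typed-call-out exchange concrete, here is a provider-neutral sketch. The schema envelope and the shape of the call request are illustrative assumptions, since each provider wraps them slightly differently, and `get_weather` is a stub.

```python
import json

# A JSON-schema-style function description. The exact envelope differs per
# provider; this shape is illustrative only.
GET_WEATHER_SCHEMA = {
    "name": "get_weather",
    "description": "Get the current temperature for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> str:
    return f"22C in {city}"  # stub; a real tool would call a weather API


FUNCTIONS = {"get_weather": (GET_WEATHER_SCHEMA, get_weather)}


def execute_call(raw_call: str) -> str:
    """Parse a model-emitted call request, check required args, and run it."""
    call = json.loads(raw_call)
    schema, fn = FUNCTIONS[call["name"]]
    args = call.get("arguments", {})
    missing = [k for k in schema["parameters"]["required"] if k not in args]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return fn(**args)


# What a model's call request might look like once serialized (assumed shape).
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
```

The runtime, not the model, executes the function; the model only produces the structured request that `execute_call` validates and dispatches.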

MCP (Model Context Protocol)

MCP is Anthropic's open standard for connecting LLMs to tools and data sources. An MCP server exposes a catalog of tools, resources, and prompts; any MCP-aware client can use them.
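
The catalog idea can be sketched with a toy in-process handler that answers two JSON-RPC 2.0 requests, one listing tools and one calling a tool. This is not an MCP SDK: the method names and field names are meant to mirror the protocol but should be treated as assumptions and checked against the specification, and the `add` tool is hypothetical.

```python
import json

# Toy tool catalog. Field names (name, description, inputSchema) are intended
# to mirror the MCP tool listing; verify against the spec before relying on them.
TOOLS = {
    "add": {
        "description": "Add two integers.",
        "inputSchema": {
            "type": "object",
            "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}},
            "required": ["a", "b"],
        },
        "handler": lambda args: str(args["a"] + args["b"]),
    }
}


def handle(request_json: str) -> str:
    """Answer a JSON-RPC 2.0 request for 'tools/list' or 'tools/call'."""
    req = json.loads(request_json)
    if req["method"] == "tools/list":
        result = {"tools": [{"name": name,
                             "description": tool["description"],
                             "inputSchema": tool["inputSchema"]}
                            for name, tool in TOOLS.items()]}
    elif req["method"] == "tools/call":
        tool = TOOLS[req["params"]["name"]]
        text = tool["handler"](req["params"]["arguments"])
        result = {"content": [{"type": "text", "text": text}]}
    else:
        # -32601 is the standard JSON-RPC "method not found" error code.
        return json.dumps({"jsonrpc": "2.0", "id": req["id"],
                           "error": {"code": -32601, "message": "Method not found"}})
    return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
```

Because the client discovers tools through the listing rather than hard-coding them, the same client code can work against any server that exposes a catalog in this shape.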

Multi-Agent Systems

Multi-agent systems use multiple specialized agents — different roles, tools, or models — coordinating to solve a task. Patterns range from a coordinator dispatching to specialists to debate setups where agents argue toward a better answer.
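
The coordinator-and-specialists pattern might look like the sketch below. The three roles are hypothetical and stubbed as string-transforming functions; in a real system each would wrap its own model, prompt, and tools.

```python
from typing import Callable

Specialist = Callable[[str], str]

# Hypothetical specialists, each standing in for a separately configured agent.
SPECIALISTS: dict[str, Specialist] = {
    "researcher": lambda task: f"notes on {task}",
    "writer": lambda notes: f"draft based on {notes}",
    "reviewer": lambda draft: f"approved: {draft}",
}


def coordinator(task: str, roles: list[str]) -> list[tuple[str, str]]:
    """Dispatch to each specialist in turn, feeding each output to the next role."""
    transcript: list[tuple[str, str]] = []
    payload = task
    for role in roles:
        payload = SPECIALISTS[role](payload)
        transcript.append((role, payload))
    return transcript
```

Returning the full transcript rather than only the final output keeps each specialist's contribution inspectable, which helps when debugging which role introduced an error.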

Planning and Decomposition

Planning and decomposition is the agent pattern of breaking a complex goal into ordered sub-tasks and executing them, instead of trying to one-shot the whole thing.
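
One way to picture ordered sub-tasks is as a dependency graph executed in topological order. The plan below is hand-written and hypothetical; in practice an LLM planner would produce it, and each step would do real work instead of appending to a log.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: sub-task -> the sub-tasks it depends on.
PLAN = {
    "gather_requirements": set(),
    "draft_schema": {"gather_requirements"},
    "write_migration": {"draft_schema"},
    "write_tests": {"draft_schema"},
    "deploy": {"write_migration", "write_tests"},
}


def execute_plan(plan: dict[str, set[str]]) -> list[str]:
    """Run sub-tasks in dependency order and return the execution log."""
    log = []
    for task in TopologicalSorter(plan).static_order():
        log.append(f"done: {task}")  # stand-in for actually executing the sub-task
    return log


log = execute_plan(PLAN)
```

Sub-tasks with no dependency between them, such as `write_migration` and `write_tests` here, may run in either order, which is also where a real executor could parallelize.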

Reflection and Critique

Reflection is the agent self-evaluation pattern: produce an answer, evaluate it against the goal or known criteria, refine if needed. It catches errors that one-shot generation misses, at the cost of extra tokens and latency.
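
The produce, evaluate, refine cycle can be sketched as a bounded loop. Here `toy_generate` and `toy_critique` are stubs standing in for two LLM calls, and the convention that an empty critique means "accepted" is an assumption of this sketch.

```python
from typing import Callable


def reflect_and_refine(task: str,
                       generate: Callable[[str, str], str],
                       critique: Callable[[str], str],
                       max_rounds: int = 3) -> tuple[str, int]:
    """Generate, critique, and regenerate until the critique is empty or rounds run out."""
    feedback = ""
    answer = ""
    for round_no in range(1, max_rounds + 1):
        answer = generate(task, feedback)
        feedback = critique(answer)
        if not feedback:  # empty critique means the answer passed
            return answer, round_no
    return answer, max_rounds


# Stubs standing in for two LLM calls.
def toy_generate(task: str, feedback: str) -> str:
    if "show work" in feedback:
        return "2 + 2 = 4, so the answer is 4."
    return "The answer is 4."


def toy_critique(answer: str) -> str:
    return "" if "2 + 2" in answer else "show work"
```

The returned round count makes the cost visible: every extra round is one more generation and one more critique, which is the token and latency overhead the pattern trades for accuracy.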

Tool Use

Tool use is the pattern where an LLM emits a structured request to call an external function — a search API, a code runner, a database query — and the runtime executes it and returns the result.