Topic · 16 concepts

Embeddings

The dense-vector layer of modern retrieval.

Embedding models compress a piece of text into a fixed-size vector whose position encodes meaning. Two queries about the same thing land near each other in the space; two unrelated queries land far apart. That spatial property is the foundation of dense retrieval, semantic search, and most modern RAG. The concepts below cover how embeddings are produced, how to compare them (cosine similarity, dot product, magnitudes), and the inference-time levers — dimension truncation, quantization, cross-lingual support — that determine whether your billion-document index costs cents or thousands of dollars per month.
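
To make the cost levers concrete, here is a rough storage calculation for a billion-vector index. It is a sketch under stated assumptions: one vector per document, 2048 dimensions as an illustrative full size, and raw vector bytes only (no ANN index overhead, metadata, or replication). The function name `index_size_bytes` is hypothetical.

```python
def index_size_bytes(num_vectors, dims, bits_per_dim):
    """Raw vector storage only; ignores index overhead and metadata."""
    return num_vectors * dims * bits_per_dim // 8


NUM_DOCS = 1_000_000_000  # the "billion-document index" from the text

# (label, dimensions, bits per dimension); the sizes are illustrative assumptions
configs = [
    ("float32, 2048 dims", 2048, 32),
    ("float32, 512 dims", 512, 32),
    ("int8, 2048 dims", 2048, 8),
    ("binary, 2048 dims", 2048, 1),
]

for label, dims, bits in configs:
    gigabytes = index_size_bytes(NUM_DOCS, dims, bits) / 1e9
    print(f"{label:>20}: {gigabytes:,.0f} GB")
# float32, 2048 dims: 8,192 GB
# float32, 512 dims:  2,048 GB
# int8, 2048 dims:    2,048 GB
# binary, 2048 dims:  256 GB
```

Truncation and quantization multiply: a truncated and quantized index shrinks by the product of both factors.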

2-norm (Euclidean Length)

The 2-norm of a vector is its Euclidean length — the square root of the sum of squared components. Normalizing a vector to 2-norm = 1 makes it a unit vector.
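
A minimal sketch of the definition, using only the standard library. The helper names `l2_norm` and `normalize` are illustrative, not from any particular library.

```python
import math


def l2_norm(vec):
    """Square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in vec))


def normalize(vec):
    """Scale a vector so its 2-norm is 1 (a unit vector)."""
    norm = l2_norm(vec)
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]


v = [3.0, 4.0]
print(l2_norm(v))  # 5.0
unit = normalize(v)  # points in the same direction as v, with length 1
```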

Bi-Encoder

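A bi-encoder embeds the query and the document separately into vectors, then compares them with a dot product or cosine similarity. Because document vectors can be computed once and cached, query time costs only one encoder pass plus a vector search. This is the standard architecture for single-vector dense retrieval.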
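
The offline/online split can be sketched as follows. `toy_encode` is a hypothetical stand-in for a trained encoder (it is just a normalized bag-of-words over a tiny vocabulary), so only the control flow is meaningful here, not the quality of the vectors.

```python
import math

documents = [
    "how to reset your account password",
    "quarterly revenue report for finance",
    "shipping times for international orders",
]

# Toy vocabulary built from the corpus; a real encoder is a neural network.
vocab = {tok: i for i, tok in enumerate(sorted({t for d in documents for t in d.split()}))}


def toy_encode(text):
    """Hypothetical encoder: L2-normalized bag-of-words. Unknown tokens are ignored."""
    vec = [0.0] * len(vocab)
    for token in text.lower().split():
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Offline: embed every document once and cache the vectors.
doc_vectors = [toy_encode(d) for d in documents]


def search(query, top_k=2):
    # Online: one encoder pass for the query, then cheap dot products.
    q = toy_encode(query)
    scored = sorted(((dot(q, dv), doc) for dv, doc in zip(doc_vectors, documents)), reverse=True)
    return scored[:top_k]


results = search("reset password")
```

The query never has to be processed together with a document, which is what makes the document side cacheable.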

Contrastive Learning

The training paradigm behind almost every modern embedding model. Pull positive pairs (query, relevant document) close in vector space; push negatives far apart.

Cosine Similarity

Cosine similarity is the cosine of the angle between two vectors — equivalently, their dot product divided by the product of their magnitudes. It's the standard way to compare embedding vectors for relevance.
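
The formula translates directly into code. This is a plain-Python sketch; the function name is illustrative.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two magnitudes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    return dot / (norm_a * norm_b)


orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])        # 0.0
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])         # -1.0
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # about 1.0
```

When both vectors are already unit length, the denominator is 1 and cosine similarity reduces to a plain dot product.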

Cross-Lingual Retrieval

Cross-lingual retrieval is finding documents in one language that answer a query in another. A multilingual embedding model maps text from every language it supports into the same vector space, so a French query can retrieve English documents; a multilingual reranker can then score those cross-language query-document pairs directly.

Curse of Dimensionality

In high-dimensional spaces, distance and similarity behave counterintuitively — random points become nearly equidistant, volume concentrates near the surface of any region, and naive nearest-neighbor search loses much of its discriminative power.
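
A small simulation makes the "nearly equidistant" effect visible. The setup is an assumption chosen for illustration: uniform random points in the unit cube, and "relative contrast" defined here as (farthest distance minus nearest distance) divided by nearest distance.

```python
import math
import random


def relative_contrast(dims, num_points=200, seed=0):
    """(max - min) / min distance from one random query to random points in [0, 1]^dims."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dims)]
    distances = []
    for _ in range(num_points):
        point = [rng.random() for _ in range(dims)]
        distances.append(math.dist(query, point))
    return (max(distances) - min(distances)) / min(distances)


# The contrast shrinks as dimensionality grows: the nearest and farthest
# neighbors end up at almost the same distance from the query.
contrast_by_dim = {d: relative_contrast(d) for d in (2, 32, 1024)}
for dims, contrast in contrast_by_dim.items():
    print(f"{dims:>5} dims: relative contrast {contrast:.2f}")
```

Trained embeddings are not uniformly random, so real indexes behave better than this worst case, but the trend is the reason similarity scores need to be learned rather than assumed.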

Embedding

An embedding is a fixed-size vector representation of a piece of text (or an image, audio clip, etc.) that places semantically similar inputs near each other in a high-dimensional space. It is the basis of dense retrieval, semantic search, and most modern RAG.

Embedding Quantization

Quantization compresses each dimension of an embedding from 32-bit floats down to smaller representations — typically int8 (4× smaller) or single-bit binary (32× smaller) — to shrink index size and speed up similarity search.
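
Both schemes can be sketched in a few lines. The specific choices below are assumptions for illustration: symmetric per-vector scaling for int8, and sign bits packed eight per byte for binary. Production systems often calibrate scales per dimension over a sample of the corpus instead.

```python
def quantize_int8(vec):
    """Symmetric scalar quantization: map floats to integer codes in [-127, 127]."""
    max_abs = max(abs(x) for x in vec)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vec]
    return codes, scale


def dequantize_int8(codes, scale):
    return [c * scale for c in codes]


def quantize_binary(vec):
    """Keep only the sign of each dimension, packed 8 dimensions per byte."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    pad = (-len(vec)) % 8  # pad the last byte with zero bits
    bits <<= pad
    return bits.to_bytes((len(vec) + pad) // 8, "big")


def hamming_distance(a, b):
    """Number of differing bits; the usual comparison for binary codes."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


vec = [0.5, -0.25, 0.125, -1.0, 0.75, 0.0, -0.5, 0.25]
codes, scale = quantize_int8(vec)
restored = dequantize_int8(codes, scale)
packed = quantize_binary(vec)  # 8 float32 values (32 bytes) become 1 byte
```

The int8 round trip loses at most half a quantization step per dimension; the binary code keeps only direction information, which is why it is commonly paired with a rescoring pass on higher-precision vectors.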

Hard-Negative Mining

The training-data technique that makes embedders competitive: source negatives that look similar to the positive but are not relevant to the query.
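
One common way to source such negatives is to score candidates with an existing model and keep the highest-scoring documents that are not labeled positive. The sketch below assumes that approach; the function name, the toy vectors, and the document ids are all hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def mine_hard_negatives(query_vec, candidates, positive_ids, num_negatives=2):
    """candidates maps doc_id -> vector. Returns the top-scoring non-positive ids."""
    scored = [
        (dot(query_vec, vec), doc_id)
        for doc_id, vec in candidates.items()
        if doc_id not in positive_ids
    ]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:num_negatives]]


query_vec = [1.0, 0.0]
candidates = {
    "relevant": [0.9, 0.1],
    "near_miss": [0.8, 0.3],
    "related_but_wrong": [0.7, 0.2],
    "off_topic": [0.0, 1.0],
}

hard_negatives = mine_hard_negatives(query_vec, candidates, positive_ids={"relevant"})
print(hard_negatives)  # ['near_miss', 'related_but_wrong']
```

The off-topic document is skipped because it is too easy to teach the model anything. A caveat with this approach is that the highest-scoring unlabeled documents can be unlabeled positives, so mined negatives usually need some filtering.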

In-Batch Negatives

The simplest way to scale contrastive training: treat every other example in the same batch as a negative for the current positive pair. Free supervision, no extra forward passes. This is why embedder training cares about batch size: a batch of B pairs gives each query B-1 negatives.
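
The bookkeeping can be sketched as a similarity matrix whose diagonal holds the positives. The vectors below are toy values standing in for encoder outputs, and the helper names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def similarity_matrix(query_vecs, doc_vecs):
    """Entry [i][j] scores query i against document j, reusing already-computed vectors."""
    return [[dot(q, d) for d in doc_vecs] for q in query_vecs]


def in_batch_targets(batch_size):
    # Row i's positive sits in column i; every other column in that row is a negative.
    return list(range(batch_size))


def negatives_per_batch(batch_size):
    return batch_size * (batch_size - 1)


# query_vecs[i] is paired with doc_vecs[i]
query_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
doc_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.2, 0.8]]

sims = similarity_matrix(query_vecs, doc_vecs)
targets = in_batch_targets(len(query_vecs))
predicted = [row.index(max(row)) for row in sims]

print(negatives_per_batch(3))     # 6
print(negatives_per_batch(1024))  # 1047552
```

Encoding 2B texts yields B x B scores, so the number of negatives grows quadratically with batch size while the encoder cost grows only linearly.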

InfoNCE Loss

InfoNCE is the contrastive loss objective behind almost every modern embedder. For each positive pair, softmax-normalize the similarities of the positive and its N negatives, then treat the result as an (N+1)-way classification problem whose correct class is the positive.
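
For a single query, the loss is the negative log of the softmax probability assigned to the positive. The sketch below assumes similarities are divided by a temperature before the softmax, which is a common convention; the default value 0.05 is an illustrative assumption, not something the text specifies.

```python
import math


def info_nce(pos_sim, neg_sims, temperature=0.05):
    """Cross-entropy over (positive, negatives) with the positive as the correct class."""
    logits = [pos_sim / temperature] + [s / temperature for s in neg_sims]
    # Log-sum-exp with the max subtracted for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[0]


# If the positive scores no better than 3 negatives, the model is guessing
# among 4 classes and the loss equals log(4).
uninformed = info_nce(0.5, [0.5, 0.5, 0.5])

# A well-separated positive drives the loss toward zero.
confident = info_nce(0.9, [0.1, 0.0])
confused = info_nce(0.2, [0.1, 0.0])
```

Lower temperatures sharpen the softmax, so small similarity gaps between the positive and a hard negative produce larger gradients.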

Johnson-Lindenstrauss Lemma

A 1984 result that says you can reduce a set of high-dimensional vectors to a much lower dimension via random projection while approximately preserving pairwise distances. It gives the mathematical intuition for why embeddings tolerate aggressive dimension reduction, although the lemma itself covers random projections rather than simple truncation.
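
The effect is easy to check empirically. This sketch uses a Gaussian random matrix scaled by 1/sqrt(k), one standard construction for the lemma; the sizes (512 to 128 dimensions, 15 points) and the seed are arbitrary choices for illustration.

```python
import math
import random


def random_projection_matrix(in_dims, out_dims, rng):
    """Gaussian entries scaled so squared lengths are preserved in expectation."""
    scale = 1.0 / math.sqrt(out_dims)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dims)] for _ in range(out_dims)]


def project(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


rng = random.Random(0)
IN_DIMS, OUT_DIMS, NUM_POINTS = 512, 128, 15

points = [[rng.gauss(0.0, 1.0) for _ in range(IN_DIMS)] for _ in range(NUM_POINTS)]
matrix = random_projection_matrix(IN_DIMS, OUT_DIMS, rng)
projected = [project(matrix, p) for p in points]

# Ratio of projected distance to original distance for every pair of points.
ratios = [
    math.dist(projected[i], projected[j]) / math.dist(points[i], points[j])
    for i in range(NUM_POINTS)
    for j in range(i + 1, NUM_POINTS)
]
worst_distortion = max(abs(r - 1.0) for r in ratios)
print(f"worst relative distortion over {len(ratios)} pairs: {worst_distortion:.3f}")
```

The projection matrix never looks at the data, which is the surprising part: distances survive a 4x reduction without any training.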

Matryoshka Representation Learning (MRL)

Matryoshka representation learning trains an embedding model so that *prefixes* of its output vector are themselves valid embeddings — letting you truncate from 2048 to 1024 to 512 dimensions at inference time without retraining.
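
At inference time, truncation is just taking a prefix. The sketch below also re-normalizes the prefix, on the assumption that downstream comparison uses cosine or dot product on unit vectors; the function name and the 8-dimensional toy vector are illustrative. It is only meaningful for models actually trained with MRL.

```python
import math


def truncate_embedding(vec, dims):
    """Keep the first `dims` components and rescale the prefix to unit length."""
    if not 0 < dims <= len(vec):
        raise ValueError("dims must be between 1 and len(vec)")
    prefix = vec[:dims]
    norm = math.sqrt(sum(x * x for x in prefix))
    if norm == 0.0:
        raise ValueError("prefix is the zero vector")
    return [x / norm for x in prefix]


# Toy stand-in for a model output; a real vector would have e.g. 2048 dimensions.
full = [0.5, -0.5, 0.5, -0.5, 0.1, 0.0, -0.1, 0.0]
short = truncate_embedding(full, 4)  # first 4 dimensions, rescaled to unit length
```

Queries and documents must be truncated to the same length before they are compared.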

Multimodal Embeddings

An embedding space shared across modalities — text, image, audio, video — so a query in one modality retrieves content in another. CLIP-style contrastive training is the dominant recipe. Doing it well is far harder than doing it at all.

Multiple Negatives Ranking Loss

MNRL is a contrastive ranking loss that scores a query against one positive and many negatives, then trains the positive to score highest. Popularized by sentence-transformers, it's the workhorse loss for fine-tuning bi-encoders on labeled pairs.

Orthogonality Concentration

In high dimensions, two random vectors are almost always nearly orthogonal: their cosine similarity concentrates sharply around 0. This is why untrained embeddings produce noise-level similarity scores, and why training has to actively work against the geometry.
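
The concentration can be measured directly by sampling random unit vectors. This is a sketch with arbitrary choices of sample size and seed; directions are drawn by normalizing Gaussian vectors, which gives a uniform distribution over the sphere.

```python
import math
import random


def random_unit_vector(dims, rng):
    vec = [rng.gauss(0.0, 1.0) for _ in range(dims)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def mean_abs_cosine(dims, num_pairs=200, seed=0):
    """Average |cosine similarity| between pairs of random unit vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_pairs):
        a = random_unit_vector(dims, rng)
        b = random_unit_vector(dims, rng)
        total += abs(sum(x * y for x, y in zip(a, b)))  # dot product of unit vectors
    return total / num_pairs


mean_abs_by_dim = {d: mean_abs_cosine(d) for d in (2, 64, 1024)}
for dims, value in mean_abs_by_dim.items():
    print(f"{dims:>5} dims: mean |cosine| = {value:.3f}")
```

The typical magnitude falls roughly as 1/sqrt(d), so at embedding-scale dimensions an untrained pair scores close to zero regardless of content.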