Embeddings
The dense-vector layer of modern retrieval.
Embedding models compress a piece of text into a fixed-size vector whose position encodes meaning. Two queries about the same thing land near each other in the space; two unrelated queries land far apart. That spatial property is the foundation of dense retrieval, semantic search, and most modern RAG. The concepts below cover how embeddings are produced, how to compare them (cosine similarity, dot product, magnitudes), and the inference-time choices (dimension truncation, quantization, cross-lingual support) that largely determine what a billion-document index costs to store and search.
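As a rough illustration of how truncation and quantization change storage, the sketch below estimates the raw size of the vectors alone. The helper name `index_size_bytes`, the one-billion-vector corpus, and the 1024-dimension figure are illustrative assumptions, and real indexes add overhead (graph links, IDs, metadata) that this ignores.

```python
def index_size_bytes(num_vectors: int, dims: int, bits_per_dim: int) -> int:
    """Raw storage for the vectors alone (no index overhead, IDs, or metadata)."""
    total_bits = num_vectors * dims * bits_per_dim
    return (total_bits + 7) // 8  # round up to whole bytes


N = 1_000_000_000  # hypothetical corpus size
D = 1024           # hypothetical embedding dimension

float32_full = index_size_bytes(N, D, 32)       # full precision, full width
int8_full = index_size_bytes(N, D, 8)           # int8 quantization
binary_full = index_size_bytes(N, D, 1)         # one bit per dimension
float32_half = index_size_bytes(N, D // 2, 32)  # truncated to half the dimensions

for label, size in [
    ("float32, 1024 dims", float32_full),
    ("float32, 512 dims", float32_half),
    ("int8, 1024 dims", int8_full),
    ("binary, 1024 dims", binary_full),
]:
    print(f"{label:<20} {size / 1e12:.3f} TB")
```

Under these assumptions the float32 index is about 4.1 TB, and the two levers multiply: halving the dimensions and switching to int8 together would cut the raw vectors by 8×.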
- 2-norm (Euclidean Length)
The 2-norm of a vector is its Euclidean length — the square root of the sum of squared components. Normalizing a vector to 2-norm = 1 makes it a unit vector.
- Bi-Encoder
A bi-encoder embeds the query and the document separately into vectors, then compares them with a dot product or cosine similarity. Because document vectors can be computed once and cached, it is fast at query time and is the basis of most dense retrieval systems.
- Contrastive Learning
The training paradigm behind almost every modern embedding model. Pull positive pairs (query, relevant document) close in vector space; push negatives far apart.
- Cosine Similarity
Cosine similarity is the cosine of the angle between two vectors — equivalently, their dot product divided by the product of their magnitudes. It's the standard way to compare embedding vectors for relevance.
- Cross-Lingual Retrieval
Cross-lingual retrieval is finding documents in one language that answer a query in another. A multilingual embedding model maps text from many languages into the same vector space, so a French query can retrieve English documents; a multilingual reranker can then score the cross-language query-document pairs directly.
- Curse of Dimensionality
In high-dimensional spaces, distance and similarity behave counterintuitively — random points become nearly equidistant, volume concentrates near the surface of any region, and naive nearest-neighbor search loses much of its discriminative power.
- Embedding
An embedding is a fixed-size vector representation of a piece of text (or an image, audio clip, etc.) that places semantically similar inputs near each other in a high-dimensional space. The basis of dense retrieval, semantic search, and most modern RAG.
- Embedding Quantization
Quantization compresses each dimension of an embedding from 32-bit floats down to smaller representations — typically int8 (4× smaller) or single-bit binary (32× smaller) — to shrink index size and speed up similarity search.
- Hard-Negative Mining
The training-data technique that makes embedders competitive: source negatives that look similar to the positive but are not relevant.
- In-Batch Negatives
The simplest way to scale contrastive training: treat every other example in the same batch as a negative for the current positive pair. Free supervision, no extra forward passes. The reason embedder training cares about batch size.
- InfoNCE Loss
InfoNCE is the contrastive loss objective behind almost every modern embedder. For each positive pair, softmax-normalize the similarities of the positive and its N negatives, then treat the result as (N+1)-way classification with the positive as the correct class.
- Johnson-Lindenstrauss Lemma
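A 1984 result that says any set of n points in a high-dimensional space can be mapped by random projection into roughly O(log n / ε²) dimensions while preserving pairwise distances to within a factor of 1 ± ε. It is the mathematical intuition for why aggressive dimensionality reduction of embeddings can work, although the lemma covers random projections; keeping only a prefix of the vector relies on how the model was trained.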
- Matryoshka Representation Learning (MRL)
Matryoshka representation learning trains an embedding model so that *prefixes* of its output vector are themselves valid embeddings — letting you truncate from 2048 to 1024 to 512 dimensions at inference time without retraining.
- Multimodal Embeddings
An embedding space shared across modalities — text, image, audio, video — so a query in one modality retrieves content in another. CLIP-style contrastive training is the dominant recipe. Doing it well is far harder than doing it at all.
- Multiple Negatives Ranking Loss
MNRL is a contrastive ranking loss that scores a query against one positive and many negatives, then trains the positive to score highest. Popularized by sentence-transformers, it's the workhorse loss for fine-tuning bi-encoders on labeled pairs.
- Orthogonality Concentration
In high dimensions, two random vectors are almost always nearly orthogonal — their cosine similarity concentrates sharply around 0. The reason untrained embeddings give noise and why training has to actively fight the geometry.
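The 2-norm, cosine similarity, and orthogonality concentration entries above can be tied together in a small experiment. This is a minimal standard-library sketch, not a production implementation: the helper names, the Gaussian sampling, the seed, and the choice of 200 pairs per dimension are all assumptions made for illustration.

```python
import math
import random


def l2_norm(v):
    """Euclidean length: square root of the sum of squared components."""
    return math.sqrt(sum(x * x for x in v))


def normalize(v):
    """Scale v to unit length (2-norm = 1)."""
    n = l2_norm(v)
    return [x / n for x in v]


def cosine_similarity(a, b):
    """Dot product divided by the product of the two magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (l2_norm(a) * l2_norm(b))


def mean_abs_cosine(dims, pairs, rng):
    """Average |cosine| over random Gaussian vector pairs of a given dimension."""
    total = 0.0
    for _ in range(pairs):
        a = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        b = [rng.gauss(0.0, 1.0) for _ in range(dims)]
        total += abs(cosine_similarity(a, b))
    return total / pairs


rng = random.Random(0)  # fixed seed so runs are repeatable
results = {d: mean_abs_cosine(d, 200, rng) for d in (2, 32, 1024)}
for d, m in results.items():
    print(f"dims={d:>5}  mean |cos| = {m:.3f}")
```

The average absolute cosine shrinks as the dimension grows, roughly in proportion to 1/√d, which is the concentration effect described above. For unit vectors produced by `normalize`, cosine similarity reduces to a plain dot product.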
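The InfoNCE and in-batch negatives entries above combine into one short computation. The sketch below is a plain-Python illustration under stated assumptions: the function name `info_nce_in_batch` is hypothetical, the vectors are assumed to be unit-normalized already, and the temperature of 0.05 is an arbitrary example value rather than a recommended setting.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_in_batch(queries, docs, temperature=0.05):
    """Mean InfoNCE loss where docs[i] is the positive for queries[i]
    and every other docs[j] in the batch serves as a negative."""
    losses = []
    for i, q in enumerate(queries):
        # Similarity of this query to every document in the batch.
        logits = [dot(q, d) / temperature for d in docs]
        # Numerically stable log-sum-exp over the positive and the negatives.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the positive (index i) as the correct class.
        losses.append(log_sum_exp - logits[i])
    return sum(losses) / len(losses)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]    # each positive matches its query
swapped = [[0.0, 1.0], [1.0, 0.0]]    # each positive matches the other query
collapsed = [[1.0, 0.0], [1.0, 0.0]]  # all documents identical

loss_aligned = info_nce_in_batch(queries, aligned)
loss_swapped = info_nce_in_batch(queries, swapped)
loss_collapsed = info_nce_in_batch(queries, collapsed)
print(loss_aligned, loss_swapped, loss_collapsed)
```

The loss is near zero when each query is closest to its own positive and large when it is closer to another example's document. When every document is identical the softmax cannot tell them apart, so the loss equals the log of the batch size. A larger batch supplies more negatives per positive, which is why batch size matters for this style of training.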
