Rerankers
The second stage that puts the right answer at the top.
First-pass retrieval is fast and approximate; rerankers are slower and more precise. A reranker, typically a cross-encoder, takes a small candidate set from the first-pass retriever and reorders it by actual relevance, examining each (query, document) pair closely. The concepts below cover the architectural variants (cross-encoder vs bi-encoder, pointwise vs pairwise vs listwise), the production properties that matter (calibration, instruction-following, confidence), and the tradeoffs that make rerankers a standard component of RAG pipelines aiming for high-quality answers.
- Cascade Rerankers
A cascade reranker stacks multiple rerankers from cheap-and-fast to expensive-and-accurate, with each stage filtering candidates before passing a smaller set to the next.
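A minimal sketch of a cascade, assuming each stage is a `(scorer, keep_k)` pair. `cascade_rerank`, `cheap_overlap`, and `expensive_stub` are hypothetical names, and the two toy scorers only stand in for, say, a small fast model and a cross-encoder. The point is structural: the expensive stage only ever scores what the cheap stage let through.

```python
from typing import Callable, List, Sequence, Tuple

# A stage pairs a scorer(query, doc) -> float with how many candidates survive it.
Stage = Tuple[Callable[[str, str], float], int]


def cascade_rerank(query: str, candidates: Sequence[str], stages: List[Stage]) -> List[str]:
    """Apply stages cheapest-first; each stage re-scores only the previous stage's survivors."""
    current = list(candidates)
    for scorer, keep_k in stages:
        scored = [(scorer(query, doc), doc) for doc in current]
        # Highest score first; Python's sort is stable, so ties keep their input order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        current = [doc for _, doc in scored[:keep_k]]
    return current


def cheap_overlap(query: str, doc: str) -> float:
    # Hypothetical cheap stage: number of distinct query words that appear in the doc.
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def expensive_stub(query: str, doc: str) -> float:
    # Hypothetical stand-in for an expensive model: share of doc words that are query words.
    q = set(query.lower().split())
    words = doc.lower().split()
    if not words:
        return 0.0
    return sum(w in q for w in words) / len(words)


docs = ["cats purr", "dogs bark loudly at night", "cats and dogs", "the stock market fell"]
print(cascade_rerank("cats dogs", docs, [(cheap_overlap, 3), (expensive_stub, 2)]))
# → ['cats and dogs', 'cats purr']
```

Here the expensive scorer runs on 3 documents instead of 4; at production scale the same shape means the costly model sees a short list rather than the full candidate pool.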
- ColBERT
A late-interaction retrieval architecture: encode each token of the query and document into its own vector, then score a pair with MaxSim, summing over query tokens each token's maximum similarity to any document token. Sits between bi-encoder (one vector per text, fast) and cross-encoder (full attention, accurate but slow).
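A minimal pure-Python sketch of MaxSim scoring, assuming the per-token vectors already exist (in ColBERT they come from the encoder and are normalized, so the dot product acts as cosine similarity). `maxsim_score` and the toy vectors are illustrative, not ColBERT's own code.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """Late interaction: each query token takes its best-matching document token;
    the pair's score is the sum of those per-token maxima."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)


# Toy, hand-picked 2-d token vectors; a real model emits one vector per token.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(maxsim_score(query, doc))  # 0.9 + 0.8, about 1.7
```

Because document token vectors don't depend on the query, they can be computed and indexed offline; only the cheap MaxSim step runs per query, which is where the speed advantage over a cross-encoder comes from.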
- Cross-Encoder
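A cross-encoder takes a (query, document) pair as a single joint input and produces one relevance score. Because attention runs across both texts at once, it captures token-level interactions between query and document and is typically much more accurate than embedding them separately. The cost is a full model pass per pair, and document representations cannot be precomputed.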
- Instruction-Following Reranker
An instruction-following reranker accepts an explicit instruction or context alongside the (query, document) pair, and reranks accordingly. Lets you inject business rules, user preferences, or domain context per call without retraining.
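A sketch of how the instruction rides along with each pair at inference time, assuming a plain-text template. `build_reranker_input`, `build_batch`, and the `Instruction:/Query:/Document:` layout are hypothetical, since each instruction-following reranker defines its own expected input format. Changing the behavior means changing a string, not retraining.

```python
from typing import List


def build_reranker_input(instruction: str, query: str, document: str) -> str:
    # Hypothetical template; check the specific model's documentation for its real format.
    return f"Instruction: {instruction}\nQuery: {query}\nDocument: {document}"


def build_batch(instruction: str, query: str, documents: List[str]) -> List[str]:
    # The same per-call instruction is attached to every (query, document) pair.
    return [build_reranker_input(instruction, query, doc) for doc in documents]


docs = ["2019 pricing page", "2024 pricing page"]
query = "what does the pro plan cost?"
# Same query and candidates, two different business rules injected at call time.
recent_first = build_batch("Prefer documents published in the last 12 months.", query, docs)
official_only = build_batch("Only official documentation counts as relevant.", query, docs)
print(recent_first[1])
```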
- Listwise Reranking
Listwise reranking processes the entire candidate list as a single input and produces a permutation, rather than scoring each (query, document) pair independently. More expressive but more expensive — typically powered by an LLM.
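A common way to apply listwise reranking when the candidate list is longer than the model's context is a sliding window that moves from the bottom of the list to the top, reordering each window in turn. This sketch assumes a hypothetical `rank_window(query, docs)` callable that returns a permutation of indices (in practice, an LLM prompted with the numbered candidates); `toy_rank_window` just sorts by keyword overlap so the example runs.

```python
from typing import Callable, List

# Hypothetical listwise model: given a window of docs, return their indices best-first.
RankFn = Callable[[str, List[str]], List[int]]


def sliding_window_rerank(query: str, docs: List[str], rank_window: RankFn,
                          window: int = 4, step: int = 2) -> List[str]:
    """Reorder overlapping windows from the end of the list toward the front,
    so a strong document near the bottom can move up one window at a time."""
    if not 0 < step <= window:
        raise ValueError("need 0 < step <= window so every position is covered")
    ranked = list(docs)
    end = len(ranked)
    while True:
        start = max(0, end - window)
        chunk = ranked[start:end]
        perm = rank_window(query, chunk)  # one model call per window
        ranked[start:end] = [chunk[i] for i in perm]
        if start == 0:
            break
        end -= step
    return ranked


def toy_rank_window(query: str, chunk: List[str]) -> List[int]:
    # Stand-in for the LLM: rank by how many query words each doc contains (stable on ties).
    q = set(query.split())
    return sorted(range(len(chunk)), key=lambda i: -len(q & set(chunk[i].split())))


docs = ["a", "b", "c", "d", "e", "best match"]
print(sliding_window_rerank("best match", docs, toy_rank_window))
# → ['best match', 'a', 'b', 'c', 'd', 'e']
```

The overlap (`window - step`) is what lets a document climb across window boundaries; a single back-to-front pass mainly refines the top of the list, which is the part a RAG pipeline usually keeps.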
- Pairwise Reranker
A reranker that scores by comparing two candidate documents head-to-head: `model(query, doc_A, doc_B) → which is more relevant`. Relative judgments avoid the need for calibrated absolute scores and are often more accurate than pointwise scoring, but comparing every pair costs $O(N^2)$ model calls at inference.
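A minimal sketch of pairwise reranking with all-pairs win counting, one simple way to turn head-to-head judgments into a ranking. `prefer(query, doc_a, doc_b)` is a hypothetical stand-in for the pairwise model, and `toy_prefer` compares keyword overlap and logs each call so the quadratic cost is visible.

```python
from itertools import combinations
from typing import Callable, List

# Hypothetical pairwise model: True if doc_a is at least as relevant to the query as doc_b.
PreferFn = Callable[[str, str, str], bool]


def pairwise_rerank(query: str, docs: List[str], prefer: PreferFn) -> List[str]:
    """Compare every unordered pair once and rank documents by number of wins."""
    wins = [0] * len(docs)
    for i, j in combinations(range(len(docs)), 2):  # N * (N - 1) / 2 model calls
        if prefer(query, docs[i], docs[j]):
            wins[i] += 1
        else:
            wins[j] += 1
    order = sorted(range(len(docs)), key=lambda i: -wins[i])
    return [docs[i] for i in order]


call_log: List[tuple] = []


def toy_prefer(query: str, doc_a: str, doc_b: str) -> bool:
    # Stand-in judgment: the doc sharing more words with the query wins; ties go to doc_a.
    call_log.append((doc_a, doc_b))
    q = set(query.split())
    return len(q & set(doc_a.split())) >= len(q & set(doc_b.split()))


docs = ["green pear", "red apple pie", "red car", "apple"]
print(pairwise_rerank("red apple", docs, toy_prefer))
# → ['red apple pie', 'red car', 'apple', 'green pear']
print(len(call_log))  # → 6, i.e. 4 * 3 / 2
```

Win counting tolerates intransitive judgments (A beats B, B beats C, C beats A) because it never assumes a consistent order. Using `prefer` as the comparator in a sort instead cuts the cost to $O(N \log N)$ calls, but then the result can depend on input order when judgments are inconsistent.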
- Pointwise Scoring
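Pointwise scoring evaluates each (query, document) pair independently, producing one relevance score per pair. It is the dominant pattern for cross-encoder rerankers because it's simple, parallelizable, and yields scores that can be sorted directly. Those scores order candidates within a query but are not necessarily calibrated across queries (see Score Calibration).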
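A minimal sketch of pointwise reranking: every (query, document) pair is scored on its own, so the calls can run concurrently and the results are simply sorted. `score` is a hypothetical stand-in for a cross-encoder forward pass, and the thread pool here stands in for the batching a real deployment would more likely do on a GPU.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

# Hypothetical pointwise scorer: one relevance score per (query, doc) pair.
ScoreFn = Callable[[str, str], float]


def pointwise_rerank(query: str, docs: List[str], score: ScoreFn,
                     max_workers: int = 4) -> List[Tuple[str, float]]:
    """Score each pair independently, then sort by score, highest first."""
    # No pair depends on another, so scoring parallelizes trivially;
    # pool.map returns results in the same order as docs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(lambda doc: score(query, doc), docs))
    return sorted(zip(docs, scores), key=lambda pair: pair[1], reverse=True)


def toy_score(query: str, doc: str) -> float:
    # Stand-in for a cross-encoder: fraction of query words present in the doc.
    q = set(query.split())
    return len(q & set(doc.split())) / len(q)


docs = ["vector databases", "full text search engines vector", "cooking recipes"]
for doc, s in pointwise_rerank("vector search", docs, toy_score):
    print(f"{s:.2f}  {doc}")
```

The scores only order candidates within this one query; nothing guarantees that 0.5 here means the same thing for a different query.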
- Reranker
A reranker is a second-stage retrieval model that takes a candidate set from first-pass retrieval and reorders it by relevance. It's how production search systems get high precision without running an expensive model over the whole corpus: the cheap first stage narrows the field, and the costly model scores only the short list.
- Score Calibration (Rerankers)
A calibrated reranker outputs scores whose absolute value is meaningful: 0.8 means roughly an 80% chance the document is relevant, consistently across queries and domains, so you can threshold and filter reliably. Most rerankers are *not* calibrated.
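One standard post-hoc fix is Platt scaling: fit $p(\text{relevant} \mid s) = \sigma(a s + b)$ to raw reranker scores $s$ on a held-out set of labeled (query, document) pairs from the target domain, then threshold the mapped probabilities. This is a generic calibration technique rather than something rerankers ship with; `fit_platt` and the scores and labels below are made-up illustrations.

```python
import math
from typing import List, Tuple


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_platt(scores: List[float], labels: List[int],
              lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit p(relevant | s) = sigmoid(a * s + b) by batch gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # gradient of log loss w.r.t. the logit
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Made-up raw reranker logits on held-out pairs, with relevance labels (1 = relevant).
held_out_scores = [-3.0, -2.0, -1.0, 0.5, 1.0, 2.0, 3.0, 4.0]
held_out_labels = [0, 0, 0, 1, 0, 1, 1, 1]
a, b = fit_platt(held_out_scores, held_out_labels)

# At query time: map new raw scores to probabilities and keep only likely-relevant docs.
new_scores = {"doc_x": 3.5, "doc_y": -0.5, "doc_z": 1.5}
keep = [d for d, s in new_scores.items() if sigmoid(a * s + b) >= 0.5]
print(round(a, 2), round(b, 2), keep)
```

Platt scaling only fixes the mapping for the distribution it was fit on; a new domain or query mix needs fresh labeled data, and isotonic regression is a common alternative when the score-to-probability curve isn't sigmoid-shaped.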
