Topic · 21 concepts

Evaluation

How to measure retrieval quality and trust the numbers.

Retrieval quality lives or dies on what you measure. The concepts below cover the metrics that drive most reranker and embedding leaderboards (NDCG@K, Recall@K, MRR) and the public benchmarks that report them, most prominently MTEB. Beyond the mechanics, the harder skill is knowing which metric to optimize for your downstream use case, which benchmarks generalize, and where leaderboard numbers diverge from production performance. Retrieval systems are often tuned against the wrong metric for years; the fix is usually one chart away.
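
To make the three metrics named above concrete, here is a minimal Python sketch of how each is commonly computed for a single query. It is an illustration under stated assumptions, not the implementation used by any particular leaderboard: the function names, document IDs, and relevance grades are hypothetical, and the NDCG variant shown uses linear gain with a log2 discount (other formulations, such as exponential gain, give different numbers).

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def reciprocal_rank(ranked_ids, relevant_ids):
    """1 / rank of the first relevant result, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """MRR: mean of reciprocal_rank over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)


def ndcg_at_k(ranked_ids, gains, k):
    """NDCG@k with linear gain and log2 discount (one common variant).

    `gains` maps doc_id -> graded relevance; unlisted documents count as 0.
    """
    dcg = sum(
        gains.get(doc_id, 0.0) / math.log2(i + 2)
        for i, doc_id in enumerate(ranked_ids[:k])
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical query: system ranking plus graded relevance judgments.
ranked = ["d3", "d1", "d7", "d2"]
gains = {"d1": 3, "d2": 1, "d5": 2}
relevant = set(gains)

recall = recall_at_k(ranked, relevant, k=3)
rr = reciprocal_rank(ranked, relevant)
ndcg = ndcg_at_k(ranked, gains, k=3)
```

On this toy query the three metrics disagree noticeably: Recall@3 counts only how many relevant documents made the cut, the reciprocal rank looks only at the first relevant hit, and NDCG@3 weighs both position and grade. That disagreement is why the choice of metric matters for a given use case.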
