Latency Benchmark: Cohere rerank 3.5 vs. ZeroEntropy zerank-1

Jul 22, 2025

Speed is the secret ingredient that makes great AI feel instant

What is a reranker and why you need one

A reranker is a cross-encoder neural model that takes a short list of candidate documents from a fast first-stage search (BM25, vector search, or hybrid) and rescores them with full query–document context. This second-pass step dramatically boosts precision in your top-k results, so your LLM or user sees the most relevant snippets first.
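
The two-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the zerank-1 or Cohere API: `toy_score` and `rerank` are hypothetical names, and the term-overlap scorer is only a stand-in for a real cross-encoder so that the example runs on its own.

```python
from typing import Callable, List, Tuple


def toy_score(query: str, document: str) -> float:
    """Hypothetical stand-in for a cross-encoder.

    Returns the fraction of query terms that also appear in the document.
    A real reranker would score the (query, document) pair with a neural model.
    """
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    return len(query_terms & doc_terms) / len(query_terms)


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float] = toy_score,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rescore first-stage candidates and return the top_k by score."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    # sort() is stable, so tied documents keep their first-stage order
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Candidates as they might come back from BM25 or vector search
candidates = [
    "Shipping times for international orders",
    "How to reset your account password",
    "Password requirements and account security tips",
]
top = rerank("reset account password", candidates, top_k=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

In a production pipeline, `score_fn` would be replaced by a call to the reranking model, and only the surviving top-k documents would be passed on to the LLM or shown to the user.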

Diagram illustrating the reranking pipeline

Benchmark results

Model                  NDCG@10   Latency (12 KB)     Latency (150 KB)
Cohere rerank 3.5      0.7091    171.5 ms ± 106.8    459.2 ms ± 87.9
ZeroEntropy zerank-1   0.7683    149.7 ms ± 53.1     314.4 ms ± 94.6
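
For readers unfamiliar with the accuracy metric, here is a minimal sketch of how NDCG@10 is commonly computed. It assumes the linear-gain form of DCG and takes the ideal ordering from the same list of relevance labels; the post does not say which variant or which dataset produced the numbers above, and the function names are illustrative.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the first k results (linear gain)."""
    # rank is 0-based, so the discount for the first result is log2(2) = 1
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ordering divided by the DCG of the ideal ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels in the order a reranker returned the documents
perfect = ndcg_at_k([3, 2, 1, 0])   # already in ideal order
swapped = ndcg_at_k([0, 1])         # the only relevant document is ranked second
print(perfect, round(swapped, 4))
```

A score of 1.0 means the most relevant documents are ranked first; pushing a relevant document down the list lowers the score.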

zerank-1 is:

  • ~13 % lower mean latency than Cohere rerank 3.5 on small (12 KB) payloads (149.7 ms vs 171.5 ms)
  • ~32 % lower mean latency on large (150 KB) payloads (314.4 ms vs 459.2 ms)

All while delivering the higher NDCG@10 (0.7683 vs 0.7091).
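
The relative figures above can be reproduced from the mean latencies in the results table. The short calculation below measures the reduction relative to the Cohere baseline; the helper name is illustrative.

```python
def latency_reduction_pct(baseline_ms: float, candidate_ms: float) -> float:
    """Percentage by which candidate latency is lower than the baseline."""
    return (baseline_ms - candidate_ms) / baseline_ms * 100


# Mean latencies from the results table (Cohere rerank 3.5 vs zerank-1)
small = latency_reduction_pct(171.5, 149.7)   # 12 KB payloads
large = latency_reduction_pct(459.2, 314.4)   # 150 KB payloads

print(f"12 KB:  {small:.1f} % lower latency")   # 12.7 %
print(f"150 KB: {large:.1f} % lower latency")   # 31.5 %
```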

Why speed matters

Whether you’re powering an enterprise search portal or a conversational voice agent, every millisecond counts. Here are a few examples:

  • RAG apps: Users expect sub-second results. Slow reranking means cold leads and frustrated employees.
  • Voice AI agents: Jitter in your pipeline breaks the illusion of a human-like dialogue. Quick reranking keeps the conversation flowing.
  • E-commerce search bars: Users rarely look past the top ~10 results, so those results need to be very accurate, and every wasted millisecond raises the risk that they churn.

When to use a reranker

Tight LLM contexts

Surface the few most relevant documents so your prompt stays under token limits.
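
As a rough sketch of that idea, the snippet below keeps the highest-ranked documents that fit within a token budget. It assumes the input is already sorted by reranker score, and it counts tokens by whitespace-separated words, which is a crude approximation; a real pipeline would use the tokenizer of its LLM. All names are hypothetical.

```python
from typing import List, Tuple


def approx_tokens(text: str) -> int:
    """Crude assumption: one token per whitespace-separated word."""
    return len(text.split())


def select_within_budget(
    ranked_docs: List[Tuple[str, float]], max_tokens: int
) -> List[str]:
    """Pick documents in rank order until the token budget is used up.

    ranked_docs must already be sorted by descending reranker score.
    """
    selected: List[str] = []
    used = 0
    for doc, _score in ranked_docs:
        cost = approx_tokens(doc)
        if used + cost > max_tokens:
            continue  # too long for the remaining budget; a shorter one may still fit
        selected.append(doc)
        used += cost
    return selected


ranked = [("a b c", 0.9), ("d e f g h", 0.8), ("i j", 0.5)]
context = select_within_budget(ranked, max_tokens=5)
print(context)  # ['a b c', 'i j']
```

Skipping an oversized document rather than stopping at it is a design choice: it fills the budget more fully, at the cost of occasionally including a lower-ranked document ahead of a higher-ranked one that did not fit.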

Precision-critical workflows

Legal search, medical Q&A or compliance use cases where every bit of relevance matters.

Cost-sensitive scale

Lower inference time means lower compute bills at 100 M+ monthly calls.

Try zerank-1 today

Experience sub-200 ms reranking on small payloads with top-tier accuracy.

Give your search, agent or RAG pipeline the speed boost it needs.
