zerank-2 delivers consistent, low-latency performance under realistic production conditions. In our testing, 97.3% of requests completed under 500ms with zero failures. This document presents our latency measurements and explains how to properly benchmark reranker performance.
## Why Proper Latency Testing Matters
When evaluating reranker latency, it’s critical that your testing reflects actual production usage patterns. Real user traffic doesn’t arrive at uniform intervals. It comes in bursts and clusters. Testing with sequential requests or artificial patterns will give you misleading results that don’t predict real-world performance.
Our tests use Poisson arrival patterns because they model the random, bursty nature of production traffic. This approach reveals how systems behave under realistic load conditions, including queueing effects and concurrent request handling.
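Poisson arrivals are straightforward to generate: inter-arrival gaps in a Poisson process are exponentially distributed with mean 1/rate, so sampling those gaps directly reproduces bursty, irregularly spaced traffic. A minimal sketch (the function name and seed are our own, not from the benchmark harness):

```python
import random

def poisson_arrival_times(rate_rps: float, duration_s: float,
                          seed: int = 0) -> list[float]:
    """Generate request send times (in seconds) for a Poisson process.

    Gaps between consecutive arrivals are drawn from an exponential
    distribution with mean 1/rate_rps, which yields the random
    clustering characteristic of production traffic.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_rps)  # exponential gap, mean 1/rate
        if t >= duration_s:
            return times
        times.append(t)
```

Note that requests cluster: some gaps are near zero (back-to-back requests), others are several mean intervals long, which is exactly what uniform spacing fails to capture.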
## Testing Methodology
All tests were conducted using:
- Poisson arrival patterns at 1-10 requests/second
- 60-second test duration
- 50 documents per request
- Payload size ≤2KB per document
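A load test along these lines can be sketched with `asyncio`: schedule send times from exponential inter-arrival gaps, fire each request concurrently at its scheduled time, and record per-request wall-clock latency. Here `fake_rerank` is a placeholder standing in for a real reranker API client, and the short 2-second demo window keeps the sketch fast to run:

```python
import asyncio
import random
import time

async def fake_rerank(query: str, docs: list[str]) -> None:
    """Placeholder for a real reranker call; swap in your API client."""
    await asyncio.sleep(0.05)

async def run_load_test(rate_rps: float, duration_s: float,
                        docs: list[str]) -> list[float]:
    """Send requests at Poisson-distributed times; return per-request latencies."""
    rng = random.Random(42)
    latencies: list[float] = []

    async def request_at(offset: float, start: float) -> None:
        # Wait until this request's scheduled send time, then fire.
        await asyncio.sleep(max(0.0, start + offset - time.perf_counter()))
        t0 = time.perf_counter()
        await fake_rerank("example query", docs)
        latencies.append(time.perf_counter() - t0)

    # Exponential gaps give Poisson-distributed send times.
    offsets, t = [], 0.0
    while (t := t + rng.expovariate(rate_rps)) < duration_s:
        offsets.append(t)

    start = time.perf_counter()
    await asyncio.gather(*(request_at(o, start) for o in offsets))
    return latencies

# Demo: 5 req/s over a short 2-second window, 50 small documents per request.
docs = ["example document"] * 50
lats = asyncio.run(run_load_test(5.0, 2.0, docs))
```

The key property of this harness is that requests overlap in flight: because each one is launched at its scheduled time regardless of whether earlier requests have completed, queueing and concurrency effects show up in the measured latencies, which a sequential request loop would hide.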
## Performance Results
### zerank-2 Latency Distribution
| Latency Threshold | Requests Exceeding Threshold |
|---|---|
| >75ms | 100.0% |
| >100ms | 100.0% |
| >150ms | 50.5% |
| >200ms | 21.2% |
| >250ms | 11.3% |
| >500ms | 2.7% |
| >750ms | 1.4% |
| >1s | 0.9% |
| >3s | 0.0% |
| >5s | 0.0% |
| >10s | 0.0% |
| >30s | 0.0% |
| Failed | 0.0% |
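An exceedance table like the one above falls out of the raw latency samples in a few lines. The sample values below are illustrative only, not our benchmark data:

```python
def exceedance_table(latencies_ms: list[float],
                     thresholds_ms: list[float]) -> dict[float, float]:
    """Percentage of requests whose latency exceeds each threshold."""
    n = len(latencies_ms)
    return {th: 100.0 * sum(l > th for l in latencies_ms) / n
            for th in thresholds_ms}

# Illustrative sample latencies in milliseconds (not benchmark data):
sample = [120, 180, 90, 260, 140, 510, 95, 160, 210, 130]
table = exceedance_table(sample, [100, 150, 200, 250, 500])
```

Because each row counts all requests above its threshold, the percentages are cumulative and must decrease monotonically as the threshold rises, which is a quick sanity check on any reported distribution.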
### Comparative Performance
| Threshold | zerank-2 | Cohere rerank-3.5 | Jina reranker m0 | Voyage rerank-2.5 |
|---|---|---|---|---|
| >150ms | 50.5% | 34.3% | 100.0% | 80.5% |
| >500ms | 2.7% | 14.3% | 70.8% | 10.9% |
| >1s | 0.9% | 11.6% | 57.4% | 9.7% |
| >10s | 0.0% | 6.4% | 55.7% | 9.2% |
| Failed | 0.0% | 0.0% | 55.7% | 9.2% |
## Key Metrics
- Zero failures across all test conditions
- 97.3% of requests completed under 500ms
- 99.1% of requests completed under 1 second
- 100% of requests completed under 3 seconds
zerank-2 maintains consistent performance across the entire latency distribution, with no requests exceeding 3 seconds.
