ZeroEntropy's zerank-1 vs. Jina AI's jina-reranker-m0

Aug 7, 2025
TL;DR

ZeroEntropy’s zerank-1 outperforms Jina AI’s jina-reranker-m0 across the board: ~+4% higher NDCG@10, up to ~12x faster latency, and 2x cheaper pricing—all on text-only reranking workloads. Jina’s model wins if you need true multimodal (image + text) reranking.

What Is a Reranker and Why You Might Need One

A reranker is a cross-encoder neural network that rescores and reorders an initial set of candidate documents based on query–document relevance. By processing each query–document pair together, it picks up subtle semantic signals that keyword or bi-encoder methods miss. Rerankers slot in after your first-stage search, whether BM25, vector search, or hybrid, to maximize precision in your top k results. Learn more in our guide to rerankers and why they matter: What Is a Reranker and Do I Need One?
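The two-stage flow described above can be sketched in a few lines of Python. Both `first_stage_search` and `cross_encoder_score` are hypothetical stand-ins (naive keyword overlap) for a real BM25/vector index and a real cross-encoder model; the point is only where the reranker slots into the pipeline.

```python
def first_stage_search(query: str, corpus: list[str], k: int = 100) -> list[str]:
    # Cheap, recall-oriented stage: naive keyword overlap as a stand-in
    # for BM25, vector search, or a hybrid of the two.
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(d.lower().split())), d) for d in corpus]
    return [d for s, d in sorted(scored, key=lambda x: -x[0])[:k]]

def cross_encoder_score(query: str, document: str) -> float:
    # Placeholder: a real reranker scores the (query, document) PAIR jointly,
    # which is what lets it catch signals a bi-encoder misses.
    return float(len(set(query.lower().split()) & set(document.lower().split())))

def rerank(query: str, candidates: list[str], top_k: int = 10) -> list[str]:
    # Precision-oriented stage: rescore every candidate, keep the top k.
    ranked = sorted(candidates, key=lambda d: cross_encoder_score(query, d), reverse=True)
    return ranked[:top_k]

corpus = ["apple pie recipe", "banana bread loaf", "apple tart dessert"]
candidates = first_stage_search("apple recipe", corpus)
print(rerank("apple recipe", candidates, top_k=1))
```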

TL;DR: Jina AI’s vs. ZeroEntropy’s latest rerankers

| Model | NDCG@10 | Latency (12 KB) | Latency (75 KB) | Price |
|---|---|---|---|---|
| jina-reranker-m0 | 0.7279 | 547.14 ± 66.84 ms | 1990.37 ± 115.91 ms | $0.050/1M tokens |
| zerank-1 | 0.7683 | 149.7 ± 53.1 ms | 156.4 ± 94.6 ms | $0.025/1M tokens |
| Ratio | ~+4% | ~3.7x faster | ~12x faster | 2x cheaper |

You can read a more thorough benchmark of zerank-1 and its open-source counterpart zerank-1-small here.

Breakdown of the comparison

Accuracy

Normalized Discounted Cumulative Gain at cutoff 10 (NDCG@10) evaluates ranking quality by rewarding highly relevant documents in early positions. It combines a relevance score (e.g. graded 0–3) with a logarithmic discount on rank, then normalizes against the ideal ordering. Values range from 0 (poor) to 1 (perfect).
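The definition above can be made concrete in a few lines of Python; the graded relevance values (0–3) in the example are made up for illustration.

```python
import math

def dcg_at_k(relevances, k=10):
    # Discounted cumulative gain: graded relevance, discounted by
    # log2 of the (1-indexed) rank position.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k=10):
    # Normalize against the ideal (descending-relevance) ordering,
    # so a perfect ranking scores exactly 1.0.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Graded relevance of documents in the order the reranker returned them.
ranked = [3, 2, 3, 0, 1, 2]
print(round(ndcg_at_k(ranked), 4))  # → 0.9608
```

Note that swapping the relevance-3 document at position 3 up to position 2 would push the score toward 1.0, which is exactly the top-of-list sensitivity discussed below.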

Because NDCG@10 applies a steep logarithmic discount to top‐ranked items and then normalizes against the perfect ordering, even a single highly relevant document slipping from position 1 to 2 can slash its contribution and send your overall score tumbling. Errors compound across the top ten slots, so maintaining near-perfect ordering on diverse datasets makes squeezing out every fraction of a percent extremely challenging.

Latency

Latency measurements show that ZeroEntropy’s zerank-1 processes a 12 KB payload in under 150 ms on average—about 3.7x faster than Jina’s m0—and sustains response times below 315 ms even for 150 KB inputs. These improvements stem from optimizations in our inference engine that minimize overhead in cross-encoder scoring and make real-time reranking at scale practical for large payloads.

| Payload size | jina-reranker-m0 latency | zerank-1 latency | Ratio |
|---|---|---|---|
| 12 KB | 547.14 ± 66.84 ms | 149.7 ± 53.1 ms | ZeroEntropy ~3.7x faster |
| 75 KB | 1990.37 ± 115.91 ms | 156.4 ± 94.6 ms | ZeroEntropy ~12x faster |

Price

A reranker request consumes bytes based on the number of documents and the total length of the input. The formula is:

Total bytes = 150
+ len(query.encode("utf-8"))
+ len(document.encode("utf-8"))

This is calculated per document, so the query is counted once for each document you pass in.

For example, if you send a request with 10 documents, the total usage is:

10 × (150 + len(query.encode("utf-8")))
+ sum of len(document_i.encode("utf-8")) for i in 1…10
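The formula above translates directly into code. This is a minimal sketch of the billing arithmetic (the helper name and example strings are our own, not part of any SDK):

```python
def reranker_request_bytes(query: str, documents: list[str]) -> int:
    # Per the formula above: each document incurs a 150-byte overhead,
    # plus the query's UTF-8 bytes (counted once per document),
    # plus that document's own UTF-8 bytes.
    per_doc_overhead = 150
    query_bytes = len(query.encode("utf-8"))
    return sum(
        per_doc_overhead + query_bytes + len(doc.encode("utf-8"))
        for doc in documents
    )

docs = ["short doc", "a somewhat longer document"]
print(reranker_request_bytes("what is a reranker?", docs))  # → 373
```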

Our pricing is simple and transparent. We charge $0.025/1M tokens.

Jina AI’s pricing is calculated in the same fashion; however, they charge $0.050/1M tokens, which is twice the cost.
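To make the price gap concrete, here is a back-of-the-envelope cost comparison. The workload size is an invented example, and treating billed usage as a flat token count is a simplifying assumption for illustration:

```python
# Published per-token prices for each model.
ZERANK_1_PRICE = 0.025 / 1_000_000   # $ per token
JINA_M0_PRICE = 0.050 / 1_000_000    # $ per token

def cost_usd(tokens: int, price_per_token: float) -> float:
    return tokens * price_per_token

# Illustrative workload: 1,000 requests of ~50k tokens each.
tokens = 1_000 * 50_000
print(f"zerank-1:         ${cost_usd(tokens, ZERANK_1_PRICE):.2f}")
print(f"jina-reranker-m0: ${cost_usd(tokens, JINA_M0_PRICE):.2f}")
```

At any volume, the same workload costs exactly half as much on zerank-1.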

What the Models Target

jina-reranker-m0

Purpose: Multilingual + multimodal reranking for visually rich documents (pages, figures, tables, infographics) and code-search tasks

Inputs: Query + document images or text blocks in up to 29 languages

Use cases: Visual document search, long-form multimodal text reranking

zerank-1

Purpose: High-precision text-only reranking to boost any first-stage retrieval (BM25, vector search)

Inputs: Query + candidate text documents

Use cases: Enterprise search, RAG pipelines, Voice AI, customer-facing search improvements

Which to Choose?

Pick jina-reranker-m0 if you need true multimodal reranking (images + text)

Pick zerank-1 if:

  • Your use case is text-only and you need maximum top-k precision
  • You prefer an API with low latency and cheap token-based pricing
  • You require enterprise SLA or on-prem support
