My AskAI Improves Support Agent Latency and Accuracy with ZeroEntropy

Sep 4, 2025

TL;DR

My AskAI replaced its existing reranker with ZeroEntropy’s zerank‑1 across production traffic. Results: faster responses at scale, a measurable lift in answer quality, and lower cost. After an A/B rollout in production showed statistically significant gains, My AskAI migrated 100 percent of rerank requests to ZeroEntropy.

“We ran an A/B test in production, and after only a few days we saw a statistically significant accuracy bump. Along with the cost and latency improvements, this was a no-brainer decision.”

— Alex Rainey, CTO, My AskAI

Company

My AskAI provides AI customer‑support agents that integrate with tools like Zendesk, Intercom, Gorgias, and Freshdesk. The product resolves, on average, 75% of all customer support tickets and can also gracefully escalate to human agents. They offer enterprise-grade security with full GDPR compliance. On top of that, they’re one of the most cost-effective solutions on the market, charging just $0.10 per support ticket handled.

Problem

My AskAI’s existing reranker introduced latency variance and tail latency spikes, limiting how many candidate chunks they could safely score per query. The team wanted to push throughput and improve answer quality without raising costs.

Constraints

Production traffic measured in tens of thousands of queries per day
Latency budgets for live support workflows
Need for straightforward integration and predictable scaling behavior

Approach

My AskAI ran an A/B test in production: existing reranker vs ZeroEntropy zerank‑1. The experiment measured latency distributions, error rates, and internal success metrics such as the “I don’t know” rate.
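The article does not say which statistical test was used. As one common option for comparing a rate such as the “I don’t know” rate between two arms, here is a minimal sketch of a two-proportion z-test. The function name and all counts are hypothetical and for illustration only.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two_sided_p) for H0: both arms share the same underlying rate."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled rate under the null hypothesis that the arms do not differ.
    pooled = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts (not from the article): "I don't know" answers per arm.
control_idk, control_total = 1200, 10000  # existing reranker
variant_idk, variant_total = 1080, 10000  # zerank-1

z, p = two_proportion_z_test(control_idk, control_total, variant_idk, variant_total)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A negative z here means the variant arm has the lower rate. With real traffic, the sample size and stopping rule would normally be fixed before looking at the result.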

Integration

ZeroEntropy is a drop‑in cross‑encoder reranker that sits after first‑stage retrieval. Migration involved swapping the rerank call in My AskAI’s retrieval pipeline.

from zeroentropy import ZeroEntropy

# Create the API client (credentials are configured outside this snippet).
zclient = ZeroEntropy()

# Rerank the first-stage candidates against the user query with zerank-1.
response = zclient.models.rerank(
    model="zerank-1",
    query="What's the cancellation policy for my booking?",
    documents=[
        "Reservations are fully refundable if canceled at least 24h before the check-in date.",
        "Cancellation policies vary depending on the type of reservation.",
        "Flexible Rate bookings may be canceled up to 6pm on the day prior to arrival",
    ],
)
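To illustrate where a reranker sits after first-stage retrieval, here is a minimal, self-contained sketch. The names rerank_top_k and toy_overlap_scorer are hypothetical, and the word-overlap scorer is only a stand-in so the example runs offline. In a real pipeline the scoring step would be the zerank‑1 call shown above, not this toy function.

```python
from typing import Callable, List, Sequence, Tuple


def rerank_top_k(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, Sequence[str]], List[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Score first-stage candidates and keep the k highest-scoring ones."""
    scores = score_fn(query, candidates)
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def toy_overlap_scorer(query: str, documents: Sequence[str]) -> List[float]:
    """Hypothetical stand-in scorer: fraction of query words present in each document."""
    query_words = set(query.lower().split())
    return [
        len(query_words & set(doc.lower().split())) / len(query_words)
        for doc in documents
    ]


# Candidates as they might come back from first-stage retrieval (made-up text).
candidates = [
    "Our office is closed on public holidays.",
    "The cancellation policy allows refunds up to 24h before check-in.",
    "Cancellation fees may apply to late changes.",
]
top = rerank_top_k("cancellation policy refunds", candidates, toy_overlap_scorer, k=2)
for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

Keeping the scorer behind a plain function argument is one way to make a provider swap a one-line change, which matches the "swap the rerank call" migration described above.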

Scalability expectation

For a fixed model and hardware profile, rerank latency grows roughly linearly, O(N), with the number of candidate documents. This informed My AskAI’s plan to increase the candidate cap from 50 to 100 while watching p95 and p99.
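If latency is roughly linear in the candidate count, a straight-line fit over a few measured points gives a rough projection for a larger cap. The sketch below assumes that linear model holds, and every measurement in it is hypothetical rather than My AskAI data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical measurements: candidate count vs observed p95 latency in ms.
candidate_counts = [10, 20, 30, 40, 50]
p95_latency_ms = [70.0, 110.0, 150.0, 190.0, 230.0]

intercept, slope = fit_line(candidate_counts, p95_latency_ms)
projected_p95_at_100 = intercept + slope * 100
print(f"fixed overhead ~{intercept:.0f} ms, ~{slope:.1f} ms per candidate")
print(f"projected p95 at 100 candidates: {projected_p95_at_100:.0f} ms")
```

A projection like this is only a planning aid. Tail latency can depart from the linear trend, which is why the cap increase was paired with monitoring of p95 and p99.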

Results

Latency in production

Measured over 113,878 requests:

Metric	Latency
p50	173 ms
p90	240 ms
p99	352 ms
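For readers who want to produce the same kind of summary from their own request logs, here is a minimal sketch using the nearest-rank percentile definition. The article does not state which percentile method was used, and the sample latencies below are synthetic.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in ms (100 values, 101..200), for illustration only.
latencies_ms = [float(v) for v in range(101, 201)]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 90, 99)}
print(summary)
```

Other definitions interpolate between neighboring samples, so small differences between tools are expected on the same data.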

Quality

“Our key metric is AI resolution and AI CSAT,” says Alex Rainey, CTO of My AskAI, “both of these were ~3% higher (absolute change). These may seem small, but we have a highly optimized AI support agent system, so gains like this are rare and usually come with a significant latency or cost impact.”

Cost

My AskAI saw a 25% cost reduction compared to its prior provider. ZeroEntropy rerank pricing is $0.025 per million tokens.
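As a rough way to turn a per-million-token price into a budget figure, the sketch below multiplies out query volume, candidates per query, and tokens per candidate. It assumes the quoted 0.025 is in US dollars and that billing is proportional to document tokens only. The traffic and token counts are hypothetical, and actual billing rules may differ.

```python
# Price quoted in the article; the USD unit is an assumption.
PRICE_PER_MILLION_TOKENS_USD = 0.025


def rerank_cost_usd(queries: int, docs_per_query: int, tokens_per_doc: int) -> float:
    """Estimate rerank spend from total tokens scored (simplified billing model)."""
    total_tokens = queries * docs_per_query * tokens_per_doc
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


# Hypothetical workload: 50,000 queries/day, 100 candidates each, 300 tokens per candidate.
daily_cost = rerank_cost_usd(queries=50_000, docs_per_query=100, tokens_per_doc=300)
print(f"estimated daily rerank cost: ${daily_cost:.2f}")
```

Because the estimate scales linearly with docs_per_query, doubling the candidate cap doubles the projected spend under this model.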

Decision

My AskAI moved all rerank requests to ZeroEntropy.

“After running an A/B test in production, after only a few days we saw a statistically significant result. Along with the cost and latency improvements, this was a no-brainer decision.”

— Alex Rainey, Co‑founder, My AskAI

Why ZeroEntropy

Speed and tail control

Consistent p50–p99 improvements made it possible to increase candidate set size without breaching SLOs. By reranking more documents, the support agent received richer context and gave users more accurate responses.

Accuracy

Cross‑encoder scoring trained with zELO pairwise ranking delivered a measurable lift, with a simple swap of an API call.

Cost efficiency

zerank-1’s competitive pricing significantly lowered My AskAI’s cost, even while doubling the number of tokens reranked.

Roadmap fit

Instruction‑following reranking and customer‑specific finetuning are planned.

Takeaways for technical leaders

About ZeroEntropy

ZeroEntropy provides rerankers, embeddings, and an end‑to‑end retrieval engine. The zerank‑1 reranker is available via API, through our partner Baseten, and soon in the AWS Marketplace.

Get Started

Try ZeroEntropy's zerank‑1 reranker in your own retrieval pipeline.

Contact founders@zeroentropy.dev for enterprise terms.
