My AskAI Improves Support Agent Latency and Accuracy with ZeroEntropy

Sep 4, 2025 · GitHub Twitter Slack LinkedIn Discord
My AskAI Improves Support Agent Latency and Accuracy with ZeroEntropy
TL;DR

My AskAI replaced its existing reranker with ZeroEntropy’s zerank‑1 across production traffic. Results: faster responses at scale, a measurable lift in answer quality, and lower cost. After an A/B rollout in production with strong significance, My AskAI migrated 100 percent of rerank requests to ZeroEntropy.

“We ran an A/B test in production, and after only a few days we saw a statistically significant accuracy bump. Along with the cost and latency improvements, this was a no-brainer decision.”

— Alex Rainey, CTO, My AskAI

Company

My AskAI provides AI customer‑support agents that integrate with tools like Zendesk, Intercom, Gorgias, and Freshdesk. The product resolves, on average, 75% of all customer support tickets and can also gracefully escalate to human agents. They offer enterprise-grade security with full GDPR compliance. On top of that, they’re one of the most cost effective solutions in the market, charging just $0.10 per support ticket handled.

My AskAI product screenshot

Problem

My AskAI’s existing reranker introduced latency variance and tail latency spikes, limiting how many candidate chunks they could safely score per query. The team wanted to push throughput and improve answer quality without raising costs.

Constraints
  • Production traffic measured in tens of thousands of queries per day
  • Latency budgets for live support workflows
  • Need for straightforward integration and predictable scaling behavior

Approach

My AskAI ran an A/B in production: existing reranker vs ZeroEntropy zerank‑1. The experiment measured latency distributions, error rates, and internal success metrics such as the “I don’t know” rate.

Integration

ZeroEntropy is a drop‑in cross‑encoder reranker that sits after first‑stage retrieval. Migration involved swapping the rerank call in My AskAI’s retrieval pipeline.

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()
response = zclient.models.rerank(
    model="zerank-1",
    query="What's the cancellation policy for my booking?",
    documents=[
        "Reservations are fully refundable if canceled at least 24h before the check-in date. ",
        "Cancellation policies vary depending on the type of reservation.",
        "Flexible Rate bookings may be canceled up to 6pm on the day prior to arrival ",
    ],
)

Scalability expectation

For a fixed model and hardware profile, rerank latency grows roughly O(N) with the number of candidate documents. This informed My AskAI’s plan to increase the candidate cap from 50 to 100 while watching p95 and p99.

Results

Latency in production

Over 113,878 requests

MetricLatency
p50173 ms
p90240 ms
p99352 ms

Quality

“Our key metric is AI resolution and AI CSAT” says Alex Rainey, CTO of My AskAI, “both of these were ~3% higher (absolute change). These may seem small, but we have a highly optimized AI support agent system, so gains like this are rare and usually come with a significant latency or cost impact.”

Cost

A 25% cost reduction compared to My AskAI’s prior provider; ZeroEntropy rerank pricing is 0.025 per million tokens.

Decision

My AskAI moved all rerank requests to ZeroEntropy.

“After running an A/B test in production, after only a few days we saw a statistically significant result. Along with the cost and latency improvements, this was a no-brainer decision.”

— Alex Rainey, Co‑founder, MyAskAI

Why ZeroEntropy

Speed and tail control

Consistent p50–p99 improvements made it possible to increase candidate set size without breaching SLOs. By reranking more documents, users got richer context and more accurate AI responses.

Accuracy

Cross‑encoder scoring trained with zELO pairwise ranking delivered a measurable lift, with a simple swap of an API call.

Cost efficiency

zerank-1’s competitive pricing significantly lowered MyAskAI’s cost, even while doubling the number tokens reranked.

Roadmap fit

Instruction‑following reranking and customer‑specific finetuning are planned.

Takeaways for technical leaders

About ZeroEntropy

ZeroEntropy provides rerankers, embeddings, and an end‑to‑end retrieval engine. The zerank‑1 reranker is available via API, through our partner Baseten, and soon in the AWS Marketplace.

Get Started

Try ZeroEntropy's zerank‑1 reranker in your own retrieval pipeline.

Related Blogs

Catch all the latest releases and updates from ZeroEntropy.

ZeroEntropy
The best AI teams retrieve with ZeroEntropy
Follow us on
GitHubTwitterSlackLinkedInDiscord