This tutorial shows you how to build a high-quality RAG system using Mastra’s framework with ZeroEntropy’s specialized reranker. You’ll learn how to combine fast vector search with accurate reranking to deliver relevant results without the cost and latency penalties of LLM-based reranking.
## Why This Stack?
Mastra provides a complete RAG framework with vector store abstractions, metadata filtering, and flexible retrieval patterns. ZeroEntropy offers purpose-built reranking models that outperform LLM-based approaches while being 10x cheaper and faster.
This combination gives you:
- Sub-second retrieval even with large document collections
- Better relevance than basic vector similarity
- Production-ready cost structure ($2-5 per 1k queries vs $20-100 for LLM reranking)
- Clean abstractions that let you swap vector stores without rewriting code
## Prerequisites

```bash
npm install @mastra/core @mastra/rag @mastra/pg @ai-sdk/openai ai zeroentropy
```
You’ll need:
- A PostgreSQL database with pgvector extension
- OpenAI API key for embeddings
- ZeroEntropy API access (sign up at dashboard.zeroentropy.dev)
Environment setup:

```bash
POSTGRES_CONNECTION_STRING=postgresql://user:pass@localhost:5432/rag_db
OPENAI_API_KEY=sk-...
ZEROENTROPY_API_KEY=ze-... # Sign up at zeroentropy.ai
```
## Step 1: Create the ZeroEntropy Reranker
First, implement the RelevanceScoreProvider interface for ZeroEntropy:
```typescript
import type { RelevanceScoreProvider } from '@mastra/core/relevance';
import ZeroEntropy from 'zeroentropy';

export class ZeroEntropyRelevanceScorer implements RelevanceScoreProvider {
  private client: ZeroEntropy;
  private model: string;

  constructor(model?: string, apiKey?: string) {
    this.client = new ZeroEntropy({
      apiKey: apiKey || process.env.ZEROENTROPY_API_KEY || '',
    });
    this.model = model || 'zerank-1';
  }

  // Score a single query-document pair; ZeroEntropy returns relevance on a 0-1 scale.
  async getRelevanceScore(query: string, text: string): Promise<number> {
    const response = await this.client.models.rerank({
      query,
      documents: [text],
      model: this.model,
      top_n: 1,
    });
    return response.results[0]?.relevance_score ?? 0;
  }
}
```
How this works:

- **Query-document scoring**: the `getRelevanceScore` method is called once for each query-document pair during reranking.
- **API call**: each pair is sent to ZeroEntropy's API using the `zerank-1` model.
- **Score return**: the API returns a relevance score on a 0-1 scale indicating how well the text answers the query.
- **Fallback**: the method falls back to `0` if no score is returned.
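Because the scorer is just an object with a `getRelevanceScore` method, you can exercise the surrounding rerank logic locally without hitting the API. A minimal sketch: `rerankByScore` and the keyword-overlap `stubScorer` below are illustrative helpers, not part of Mastra or ZeroEntropy.

```typescript
// Minimal shape of the interface the scorer implements (illustrative copy).
interface RelevanceScoreProvider {
  getRelevanceScore(query: string, text: string): Promise<number>;
}

// Generic rerank step: score every candidate against the query, sort best-first.
// This mirrors what the reranker does with the provider during retrieval.
async function rerankByScore(
  provider: RelevanceScoreProvider,
  query: string,
  candidates: string[],
): Promise<{ text: string; score: number }[]> {
  const scored = await Promise.all(
    candidates.map(async (text) => ({
      text,
      score: await provider.getRelevanceScore(query, text),
    })),
  );
  return scored.sort((a, b) => b.score - a.score);
}

// Stub scorer for local testing: fraction of query words found in the text.
// Swap in `new ZeroEntropyRelevanceScorer()` for real relevance scores.
const stubScorer: RelevanceScoreProvider = {
  async getRelevanceScore(query, text) {
    const words = query.toLowerCase().split(/\s+/);
    const hits = words.filter((w) => text.toLowerCase().includes(w)).length;
    return hits / words.length;
  },
};
```

Swapping `stubScorer` for the real scorer changes only the quality of the scores, not the shape of the pipeline.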
## Step 2: Complete Working Example
Here’s the full implementation, with ZeroEntropy replacing GPT-4o-mini as the reranker (GPT-4o-mini remains the agent’s generation model):
```typescript
import { openai } from '@ai-sdk/openai';
import { Mastra } from '@mastra/core/mastra';
import { Agent } from '@mastra/core/agent';
import { PgVector } from '@mastra/pg';
import { MDocument, createVectorQueryTool } from '@mastra/rag';
import { embedMany } from 'ai';
import { ZeroEntropyRelevanceScorer } from './zeroentropy-scorer';

// Create vector query tool with ZeroEntropy reranker
const vectorQueryTool = createVectorQueryTool({
  vectorStoreName: 'pgVector',
  indexName: 'embeddings',
  model: openai.embedding('text-embedding-3-small'),
  reranker: {
    provider: new ZeroEntropyRelevanceScorer('zerank-1'),
  },
});

// Create RAG agent
export const ragAgent = new Agent({
  id: 'rag-agent',
  name: 'RAG Agent',
  instructions: `You are a helpful assistant that answers questions based on the provided context. Keep your answers concise and relevant.
Important: When asked to answer a question, please base your answer only on the context provided in the tool.
If the context doesn't contain enough information to fully answer the question, please state that explicitly.`,
  model: openai('gpt-4o-mini'),
  tools: {
    vectorQueryTool,
  },
});

// Initialize Mastra with PgVector
const pgVector = new PgVector({
  connectionString: process.env.POSTGRES_CONNECTION_STRING!,
});
```
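The excerpt ends after the store is created. What remains is registering the agent and store with Mastra and ingesting documents into the index the tool queries. The sketch below continues the same file; the `chunk`, `createIndex`, and `upsert` signatures are assumptions based on Mastra's documented APIs, so verify them against your installed version.

```typescript
// Register the agent and vector store; the 'pgVector' key must match
// the vectorStoreName passed to createVectorQueryTool above.
export const mastra = new Mastra({
  agents: { ragAgent },
  vectors: { pgVector },
});

// Ingest a document: chunk it, embed the chunks, and upsert them into
// the 'embeddings' index the query tool reads from. Signatures assumed
// from Mastra's docs; adjust to your version.
export async function ingest(text: string) {
  const doc = MDocument.fromText(text);
  const chunks = await doc.chunk({ strategy: 'recursive', size: 512, overlap: 50 });

  const { embeddings } = await embedMany({
    model: openai.embedding('text-embedding-3-small'),
    values: chunks.map((chunk) => chunk.text),
  });

  // text-embedding-3-small produces 1536-dimensional vectors.
  await pgVector.createIndex({ indexName: 'embeddings', dimension: 1536 });
  await pgVector.upsert({
    indexName: 'embeddings',
    vectors: embeddings,
    metadata: chunks.map((chunk) => ({ text: chunk.text })),
  });
}
```

Storing each chunk's text in the metadata is what lets the query tool hand readable context back to the agent after retrieval.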
