rag-engine

Agentic RAG Framework for Node.js
Zero dependencies. Auto-retries. Full decision trace.

npm install rag-engine

v0.3.0 · 0 dependencies · MIT License · Node 18+

Agentic, Not Basic

Basic RAG returns garbage when retrieval fails. rag-engine retries with better queries, rewrites using document vocabulary, and admits when it doesn't know.

Zero Dependencies

Core runs with just Node.js. No axios, no langchain, no bloat. Native fetch() for API calls, in-memory vector store with cosine similarity.

Full Transparency

Every query returns a decision trace showing exactly what the agent did — search, evaluate, rewrite, synthesize. Debug your RAG in seconds.

Getting Started

Overview

rag-engine is an Agentic RAG (Retrieval-Augmented Generation) framework for Node.js. Unlike basic RAG which blindly returns whatever chunks the vector search finds, rag-engine adds an intelligent agent loop that evaluates whether the retrieved chunks actually answer the question — and retries with better queries when they don't.

  • 5-line quickstart — import, create, ingest, query, done
  • Zero runtime dependencies for the core package
  • Agentic loop with relevance judge, query rewriting, and honest give-up
  • Full decision trace on every query for debugging
  • TypeScript-first with 20+ exported interfaces
  • CLI for quick prototyping
  • Auto-detects LLM provider from environment variables

Installation

npm install rag-engine

Requires Node.js 18+ (for native fetch). Set your OpenAI API key:

export OPENAI_API_KEY=sk-...

Or create a .env file in your project root:

OPENAI_API_KEY=sk-your-key-here

Quick Start (5 Lines)

import { RagEngine } from 'rag-engine'

const rag = await RagEngine.create()           // auto-detects OPENAI_API_KEY
await rag.ingest('./docs')                     // loads, chunks, embeds, stores
const result = await rag.query('How does auth work?')
console.log(result.answer)                     // answer with citations
console.log(result.sources)                    // relevant chunks with scores
console.log(result.trace)                      // full agent decision log
console.log(result.metrics)                    // timing and LLM call stats

CLI Usage

# Ingest files
npx rag-engine ingest ./docs
npx rag-engine ingest ./src --glob "**/*.ts"

# Query (auto-ingests ./docs if present)
npx rag-engine query "How does authentication work?"

# Show index stats
npx rag-engine stats

The CLI reads .env files automatically. Note: the CLI uses the in-memory store, so documents are re-ingested on every run.

How It Works

The Agentic Pipeline

User Question
  ↓
1. RETRIEVAL — vector search → top-K chunks
  ↓
2. RELEVANCE JUDGE — LLM evaluates: "Do these chunks answer the question?" and returns a score (0–1) + decision
     Score ≥ 0.5   → SYNTHESIZE → answer with citations
     Score 0.2–0.5 → REWRITE    → better query, retry
     Score < 0.2   → GIVE UP    → honest "I don't know"
  ↓
RESPONSE — { answer, sources, trace, metrics }

Basic RAG retrieves chunks and sends them directly to an LLM. If the chunks are bad, the LLM confidently generates a wrong answer. Agentic RAG adds a judge step — and retries with better queries when retrieval fails.

Agent Decisions

Decision    | When                       | What Happens
SYNTHESIZE  | Score ≥ 0.5                | Chunks are good — generate answer with citations
REWRITE     | Score 0.2–0.5              | Chunks are related but off-topic — rewrite query using document vocabulary, retry
BROADEN     | < 3 results                | Too few results — broaden the query, retry
GIVE_UP     | Score < 0.2 or max retries | Honestly say "I don't know"

Decision Trace

Every query returns a full trace array showing exactly what the agent did:

result.trace = [
  { action: "search", query: "How does auth work?", resultsCount: 5, attempt: 1 },
  { action: "evaluate", score: 0.42, decision: "rewrite",
    reasoning: "Chunks discuss authorization roles, not authentication flow" },
  { action: "rewrite", newQuery: "user login JWT token session management" },
  { action: "search", query: "user login JWT...", resultsCount: 8, attempt: 2 },
  { action: "evaluate", score: 0.89, decision: "synthesize",
    reasoning: "Chunks cover login flow, JWT creation, and session handling" },
  { action: "synthesize", attempt: 2 }
]

When your RAG gives a bad answer, check result.trace and instantly see where it went wrong.
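
As a sketch (reusing the result from the Quick Start above), you can filter the trace for the judge's evaluations and any query rewrites:

for (const entry of result.trace) {
  if (entry.action === 'evaluate') {
    console.log(`score ${entry.score} → ${entry.decision}: ${entry.reasoning}`)
  } else if (entry.action === 'rewrite') {
    console.log(`query rewritten to: ${entry.newQuery}`)
  }
}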

Agent Loop Pseudocode

async function agentLoop(query, options) {
  const trace = []
  let currentQuery = query

  for (let attempt = 1; attempt <= options.maxRetries; attempt++) {
    const chunks = await retriever.search(currentQuery, { topK: 10 })
    trace.push({ action: 'search', query: currentQuery, resultsCount: chunks.length, attempt })

    const judgment = await llm.chatJSON({
      system: 'Score 0-1, decide: synthesize|rewrite|broaden|give_up',
      user: `Question: ${currentQuery}\nChunks:\n${formatChunks(chunks)}`
    })
    trace.push({ action: 'evaluate', score: judgment.score, decision: judgment.decision, reasoning: judgment.reasoning })

    if (judgment.decision === 'synthesize') {
      const answer = await llm.chat({
        system: 'Answer using ONLY the provided chunks. Cite sources.',
        user: `Question: ${query}\nChunks:\n${formatChunks(chunks)}`
      })
      trace.push({ action: 'synthesize', attempt })
      return { answer, sources: chunks, trace }
    }

    if (judgment.decision === 'rewrite' || judgment.decision === 'broaden') {
      currentQuery = judgment.rewrittenQuery   // judge proposes a better (or broader) query
      trace.push({ action: judgment.decision, newQuery: currentQuery })
      continue
    }

    if (judgment.decision === 'give_up') {
      trace.push({ action: 'give_up' })
      return { answer: "I don't know.", sources: [], trace }
    }
  }
  return { answer: 'Could not find a confident answer.', sources: [], trace }
}

API Reference

RagEngine

declare class RagEngine {
  static create(config?: RagConfig): Promise<RagEngine>
  ingest(pathOrText: string, options?: IngestOptions): Promise<{ chunksAdded: number; filesProcessed: number }>
  query(question: string): Promise<QueryResult>
  stats(): { chunks: number }
  clear(): void
}

create(config?) — Creates engine, auto-detects provider from env. Verifies embeddings before returning.

ingest(pathOrText, options?) — Ingests a directory, file, or raw text string.

query(question) — Runs the agentic loop. Returns answer + sources + trace + metrics.
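
For illustration, all three input forms go through the same method (the single-file path and raw-text string below are made up; the glob option matches the codebase example later on this page):

await rag.ingest('./docs')                               // directory
await rag.ingest('./docs/auth.md')                       // single file (illustrative path)
await rag.ingest('JWT tokens expire after 24 hours.')    // raw text string
await rag.ingest('./src', { glob: '**/*.ts' })           // directory filtered by glob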

RagConfig

interface RagConfig {
  llm?: LLMConfig | LLMProvider         // default: OpenAI gpt-4o-mini
  embeddings?: EmbeddingsConfig          // default: text-embedding-3-small
  store?: VectorStore                    // default: in-memory
  chunker?: ChunkerConfig | string       // default: sliding-window 512 tokens
  agent?: AgentConfig                    // default: 3 retries, 0.5 threshold
  retrieval?: RetrievalConfig            // default: topK 10
}

interface AgentConfig {
  maxRetries?: number           // default: 3
  relevanceThreshold?: number   // default: 0.5
  systemPrompt?: string         // prepended to synthesis prompt
}
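
A minimal sketch of a stricter setup using these options (the retrieval topK field name is assumed from the default noted above):

import { RagEngine } from 'rag-engine'

const rag = await RagEngine.create({
  agent: { maxRetries: 2, relevanceThreshold: 0.7 },   // give up sooner, demand more relevant chunks
  retrieval: { topK: 5 },                              // assumed field name; fewer chunks per search
})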

QueryResult

interface QueryResult {
  answer: string           // generated answer with citations
  sources: ScoredChunk[]   // chunks used, sorted by relevance
  trace: TraceEntry[]      // full agent decision log
  metrics: QueryMetrics    // timing and usage stats
}

interface QueryMetrics {
  totalTimeMs: number
  retrievalTimeMs: number
  llmCalls: number
  tokensUsed: number
}
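
For example, given a query result, the metrics make per-query latency and usage easy to log:

const result = await rag.query('How does auth work?')
const { totalTimeMs, retrievalTimeMs, llmCalls, tokensUsed } = result.metrics
console.log(`answered in ${totalTimeMs} ms (retrieval ${retrievalTimeMs} ms, ${llmCalls} LLM calls, ${tokensUsed} tokens)`)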

TraceEntry

interface TraceEntry {
  action: 'search' | 'evaluate' | 'rewrite' | 'broaden' | 'synthesize' | 'give_up'
  timestamp: number
  query?: string
  resultsCount?: number
  attempt?: number
  score?: number
  decision?: string
  reasoning?: string
  newQuery?: string
}

LLMProvider / EmbeddingsProvider

interface LLMProvider {
  chat(messages: ChatMessage[], options?: LLMCallOptions): Promise<string>
  chatJSON<T>(messages: ChatMessage[], options?: LLMCallOptions): Promise<T>
}

interface EmbeddingsProvider {
  embed(texts: string[]): Promise<number[][]>
  embedQuery(text: string): Promise<number[]>
}

Implement these interfaces to bring your own LLM or embeddings provider.
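
As a rough sketch (assuming these interfaces are exported from the package and ChatMessage is a { role, content } pair), a custom provider backed by a hypothetical local HTTP endpoint could look like this:

import { RagEngine } from 'rag-engine'
import type { LLMProvider, ChatMessage, LLMCallOptions } from 'rag-engine'

// Hypothetical endpoint — replace with whatever service you are wrapping.
const ENDPOINT = 'http://localhost:8080/v1/chat'

class MyLLM implements LLMProvider {
  async chat(messages: ChatMessage[], options?: LLMCallOptions): Promise<string> {
    const res = await fetch(ENDPOINT, {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify({ messages }),
    })
    const data = await res.json()
    return data.text            // assumes the endpoint returns { text: string }
  }

  async chatJSON<T>(messages: ChatMessage[], options?: LLMCallOptions): Promise<T> {
    // Reuse chat() and parse the model's JSON output.
    return JSON.parse(await this.chat(messages, options)) as T
  }
}

const rag = await RagEngine.create({ llm: new MyLLM() })   // RagConfig.llm accepts an LLMProvider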

VectorStore

interface VectorStore {
  add(chunks: Chunk[]): Promise<void>
  search(embedding: number[], topK: number): Promise<ScoredChunk[]>
  count(): number
  clear(): void
}
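
And as a sketch of this interface in action — roughly what the built-in in-memory store does — here is a brute-force cosine-similarity store. It assumes Chunk carries an embedding array and ScoredChunk is a chunk plus a score; check the exported types.

import { RagEngine } from 'rag-engine'
import type { VectorStore, Chunk, ScoredChunk } from 'rag-engine'

class ArrayStore implements VectorStore {
  private chunks: Chunk[] = []

  async add(chunks: Chunk[]): Promise<void> {
    this.chunks.push(...chunks)
  }

  async search(embedding: number[], topK: number): Promise<ScoredChunk[]> {
    // Brute-force scan: score every chunk by cosine similarity, keep the top K.
    return this.chunks
      .map(chunk => ({ ...chunk, score: cosine(embedding, chunk.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK)
  }

  count(): number { return this.chunks.length }
  clear(): void { this.chunks = [] }
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB) || 1)
}

const rag = await RagEngine.create({ store: new ArrayStore() })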

Examples

Quickstart (5 Lines)

import { RagEngine } from 'rag-engine'

const rag = await RagEngine.create()
await rag.ingest('./docs')
const result = await rag.query('How does auth work?')
console.log(result.answer)

Chat with Your Codebase

import { RagEngine } from 'rag-engine'

const rag = await RagEngine.create()
await rag.ingest('./src', { glob: '**/*.{ts,js}' })

const result = await rag.query('Where is user authentication handled?')
console.log(result.answer)
console.log(result.sources)

Free Local RAG with Ollama

100% free, 100% local. No API keys, no data leaves your machine.

import { RagEngine } from 'rag-engine'

const rag = await RagEngine.create({
  llm: { provider: 'ollama', model: 'llama3' },
  embeddings: { provider: 'ollama', model: 'nomic-embed-text' },
})

await rag.ingest('./docs')
const result = await rag.query('Explain the retry logic')

Express.js API

import express from 'express'
import { RagEngine } from 'rag-engine'

const app = express()
const rag = await RagEngine.create()
await rag.ingest('./docs')

app.use(express.json())
app.post('/ask', async (req, res) => {
  const result = await rag.query(req.body.question)
  res.json(result)
})
app.listen(3000, () => console.log('RAG API on :3000'))

Next.js API Route

// app/api/chat/route.ts
import { RagEngine } from 'rag-engine'
import { NextRequest, NextResponse } from 'next/server'

let rag: RagEngine

async function getRag() {
  if (!rag) {
    rag = await RagEngine.create()
    await rag.ingest('./docs')
  }
  return rag
}

export async function POST(req: NextRequest) {
  const { question } = await req.json()
  const engine = await getRag()
  const result = await engine.query(question)
  return NextResponse.json(result)
}

Customer Support Bot

import { RagEngine } from 'rag-engine'

const rag = await RagEngine.create({
  agent: {
    systemPrompt: `You are a helpful customer support agent.
      Always be polite. If you don't know, suggest contacting
      support@company.com.`,
    maxRetries: 2,
  },
})

await rag.ingest('./knowledge-base')
const result = await rag.query('How do I reset my password?')
console.log(result.answer)

Providers

Provider  | LLM Models            | Embeddings             | Cost      | Setup
OpenAI    | gpt-4o, gpt-4o-mini   | text-embedding-3-small | Paid      | OPENAI_API_KEY
Anthropic | claude-sonnet, haiku  | — (use OpenAI)         | Paid      | ANTHROPIC_API_KEY
Ollama    | llama3, mistral, phi3 | nomic-embed-text       | FREE      | Local install
Google    | gemini-2.0-flash      | text-embedding-004     | Free tier | GOOGLE_API_KEY

Auto-detection: if no provider is configured, rag-engine looks for OPENAI_API_KEY in the environment.
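
For example, since Anthropic offers no embeddings endpoint (per the table above), a mixed setup would pair a Claude model with OpenAI embeddings — a sketch, with provider strings assumed to follow the same pattern as the Ollama example:

import { RagEngine } from 'rag-engine'

const rag = await RagEngine.create({
  llm: { provider: 'anthropic', model: 'claude-sonnet' },              // needs ANTHROPIC_API_KEY
  embeddings: { provider: 'openai', model: 'text-embedding-3-small' }, // needs OPENAI_API_KEY
})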

Architecture

Module Map

src/
  core/engine.ts       RagEngine class — wires everything together
  core/agent.ts        Agentic loop (retrieve → judge → decide → retry/answer)
  llm/openai.ts        OpenAI LLM + embeddings via native fetch()
  llm/prompts.ts       All agent prompts (judge, synthesizer)
  stores/memory.ts     In-memory vector store (Map + cosine similarity)
  ingest/loader.ts     File/directory loader with glob support
  ingest/chunkers/     Sliding-window chunker (sentence-aware)

Design Decisions

Decision      | Choice         | Why
Runtime deps  | Zero           | Differentiator vs LangChain (200+ deps)
HTTP          | Native fetch() | Node 18+ built-in
Default store | In-memory      | Works instantly, zero setup
Default LLM   | Auto-detect    | Zero config needed
Build         | tsup           | Fast ESM output with .d.ts
Tests         | vitest         | Fast native TypeScript

Roadmap

Phase 1 — MVP (v0.1.0–v0.3.0) ✓

RagEngine.create() with auto-detect, agentic loop, in-memory store, OpenAI provider, sliding-window chunker, CLI, full TypeScript types. 4 bugs found and fixed through audit.

Phase 2 — CLI + Providers

Full CLI (init, serve), Ollama/Anthropic/Gemini providers, markdown + code chunkers, hybrid retrieval (vector + BM25).

Phase 3 — Evaluation + Plugins

Built-in evaluation, plugin system, web-search fallback, SQLite store, example projects.

Phase 4 — External Stores + Polish

Pinecone, Chroma, Qdrant stores, streaming responses, composable pipeline builder, full docs and tutorials.