Agentic, Not Basic
Basic RAG returns garbage when retrieval fails. rag-engine retries with better queries, rewrites using document vocabulary, and admits when it doesn't know.
Zero Dependencies
Core runs with just Node.js. No axios, no langchain, no bloat. Native fetch() for API calls, in-memory vector store with cosine similarity.
Full Transparency
Every query returns a decision trace showing exactly what the agent did — search, evaluate, rewrite, synthesize. Debug your RAG in seconds.
Getting Started
Overview
rag-engine is an Agentic RAG (Retrieval-Augmented Generation) framework for Node.js. Unlike basic RAG which blindly returns whatever chunks the vector search finds, rag-engine adds an intelligent agent loop that evaluates whether the retrieved chunks actually answer the question — and retries with better queries when they don't.
- 5-line quickstart — import, create, ingest, query, done
- Zero runtime dependencies for the core package
- Agentic loop with relevance judge, query rewriting, and honest give-up
- Full decision trace on every query for debugging
- TypeScript-first with 20+ exported interfaces
- CLI for quick prototyping
- Auto-detects LLM provider from environment variables
Installation
npm install rag-engine
Requires Node.js 18+ (for native fetch). Set your OpenAI API key:
export OPENAI_API_KEY=sk-...
Or create a .env file in your project root:
OPENAI_API_KEY=sk-your-key-here
Quick Start (5 Lines)
import { RagEngine } from 'rag-engine'
const rag = await RagEngine.create() // auto-detects OPENAI_API_KEY
await rag.ingest('./docs') // loads, chunks, embeds, stores
const result = await rag.query('How does auth work?')
console.log(result.answer) // answer with citations
console.log(result.sources) // relevant chunks with scores
console.log(result.trace) // full agent decision log
console.log(result.metrics) // timing and LLM call stats
CLI Usage
# Ingest files
npx rag-engine ingest ./docs
npx rag-engine ingest ./src --glob "**/*.ts"
# Query (auto-ingests ./docs if present)
npx rag-engine query "How does authentication work?"
# Show index stats
npx rag-engine stats
The CLI reads .env files automatically. Note: the CLI uses the in-memory store, so documents are re-ingested on every run.
How It Works
The Agentic Pipeline
1. Vector search retrieves the top-K chunks for the question.
2. An LLM judge evaluates: "Do these chunks answer the question?" and returns a score (0-1) plus a decision.
3. On synthesize, the agent generates an answer from the chunks; on rewrite or broaden, it adjusts the query and searches again; on give_up, it answers honestly that it doesn't know.
4. The result comes back as { answer, sources, trace, metrics }.
Basic RAG retrieves chunks and sends them directly to an LLM. If the chunks are bad, the LLM confidently generates a wrong answer. Agentic RAG adds a judge step — and retries with better queries when retrieval fails.
Agent Decisions
| Decision | When | What Happens |
|---|---|---|
| SYNTHESIZE | Score ≥ 0.5 | Chunks are good — generate answer with citations |
| REWRITE | 0.2 ≤ score < 0.5 | Chunks are related but off-topic — rewrite query using document vocabulary, retry |
| BROADEN | Fewer than 3 results | Too few results — broaden the query, retry |
| GIVE_UP | Score < 0.2 or max retries reached | Honestly say "I don't know" |
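The 0.5 cutoff and the retry budget appear to map to relevanceThreshold and maxRetries in AgentConfig (see the API Reference below). A minimal sketch of tuning them, assuming that mapping; the 0.2 give-up floor is not exposed as an option here:

import { RagEngine } from 'rag-engine'

// Stricter judge, smaller retry budget.
const rag = await RagEngine.create({
  agent: {
    relevanceThreshold: 0.7, // require a higher score before synthesizing
    maxRetries: 2,           // rewrite/broaden at most twice before giving up
  },
})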
Decision Trace
Every query returns a full trace array showing exactly what the agent did:
result.trace = [
{ action: "search", query: "How does auth work?", resultsCount: 5, attempt: 1 },
{ action: "evaluate", score: 0.42, decision: "rewrite",
reasoning: "Chunks discuss authorization roles, not authentication flow" },
{ action: "rewrite", newQuery: "user login JWT token session management" },
{ action: "search", query: "user login JWT...", resultsCount: 8, attempt: 2 },
{ action: "evaluate", score: 0.89, decision: "synthesize",
reasoning: "Chunks cover login flow, JWT creation, and session handling" },
{ action: "synthesize", attempt: 2 }
]
When your RAG gives a bad answer, check result.trace and instantly see where it went wrong.
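For example, a quick debugging pass over the trace, using only fields from the TraceEntry interface documented below:

// One line per agent step: action, score (if any), and the judge's reasoning or the query used.
for (const entry of result.trace) {
  const detail = entry.reasoning ?? entry.newQuery ?? entry.query ?? ''
  console.log(`#${entry.attempt ?? '-'} ${entry.action}`, entry.score ?? '', detail)
}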
Agent Loop Pseudocode
async function agentLoop(query, options) {
  const trace = [] // trace.push(...) calls omitted for brevity
  let currentQuery = query

  for (let attempt = 1; attempt <= options.maxRetries; attempt++) {
    const chunks = await retriever.search(currentQuery, { topK: 10 })

    // Judge step: score the chunks and pick a decision (matches the LLMProvider.chatJSON signature).
    const judgment = await llm.chatJSON([
      { role: 'system', content: 'Score 0-1, decide: synthesize|rewrite|broaden|give_up' },
      { role: 'user', content: `Question: ${currentQuery}\nChunks:\n${formatChunks(chunks)}` },
    ])

    if (judgment.decision === 'synthesize') {
      const answer = await llm.chat([
        { role: 'system', content: 'Answer using ONLY the provided chunks. Cite sources.' },
        { role: 'user', content: `Question: ${query}\nChunks:\n${formatChunks(chunks)}` },
      ])
      return { answer, sources: chunks, trace }
    }

    if (judgment.decision === 'rewrite' || judgment.decision === 'broaden') {
      currentQuery = judgment.rewrittenQuery // rewritten (or broadened) query proposed by the judge
      continue
    }

    if (judgment.decision === 'give_up') {
      return { answer: "I don't know.", sources: [], trace }
    }
  }

  return { answer: 'Could not find a confident answer.', sources: [], trace }
}
API Reference
RagEngine
declare class RagEngine {
static create(config?: RagConfig): Promise<RagEngine>
ingest(pathOrText: string, options?: IngestOptions): Promise<{ chunksAdded: number; filesProcessed: number }>
query(question: string): Promise<QueryResult>
stats(): { chunks: number }
clear(): void
}
create(config?) — Creates engine, auto-detects provider from env. Verifies embeddings before returning.
ingest(pathOrText, options?) — Ingests a directory, file, or raw text string.
query(question) — Runs the agentic loop. Returns answer + sources + trace + metrics.
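A short sketch exercising these methods, using only the signatures shown above (the raw-text string is illustrative):

const rag = await RagEngine.create()

// Ingest a directory, then a raw text string; both are chunked, embedded, and stored.
const { chunksAdded, filesProcessed } = await rag.ingest('./docs')
console.log(`${chunksAdded} chunks from ${filesProcessed} files`)
await rag.ingest('Refunds are processed within 5 business days.')

console.log(rag.stats().chunks) // total chunks currently in the store
rag.clear()                     // empty the store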
RagConfig
interface RagConfig {
llm?: LLMConfig | LLMProvider // default: OpenAI gpt-4o-mini
embeddings?: EmbeddingsConfig // default: text-embedding-3-small
store?: VectorStore // default: in-memory
chunker?: ChunkerConfig | string // default: sliding-window 512 tokens
agent?: AgentConfig // default: 3 retries, 0.5 threshold
retrieval?: RetrievalConfig // default: topK 10
}
interface AgentConfig {
maxRetries?: number // default: 3
relevanceThreshold?: number // default: 0.5
systemPrompt?: string // prepended to synthesis prompt
}
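A sketch combining several of these options. The 'sliding-window' string and the topK field name are assumptions inferred from the defaults listed above, not confirmed option values:

import { RagEngine } from 'rag-engine'

const rag = await RagEngine.create({
  llm: { provider: 'openai', model: 'gpt-4o-mini' },  // explicit instead of auto-detect
  chunker: 'sliding-window',                          // assumed string form of the default chunker
  retrieval: { topK: 5 },                             // assumed field name, based on the "topK 10" default
  agent: { maxRetries: 2, relevanceThreshold: 0.6 },
})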
QueryResult
interface QueryResult {
answer: string // generated answer with citations
sources: ScoredChunk[] // chunks used, sorted by relevance
trace: TraceEntry[] // full agent decision log
metrics: QueryMetrics // timing and usage stats
}
interface QueryMetrics {
totalTimeMs: number
retrievalTimeMs: number
llmCalls: number
tokensUsed: number
}
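For example, reading a result's metrics after a query (all fields as declared above):

const result = await rag.query('How does auth work?')

console.log(`${result.sources.length} chunks used`)
console.log(`retrieval ${result.metrics.retrievalTimeMs}ms of ${result.metrics.totalTimeMs}ms total`)
console.log(`${result.metrics.llmCalls} LLM calls, ${result.metrics.tokensUsed} tokens`)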
TraceEntry
interface TraceEntry {
action: 'search' | 'evaluate' | 'rewrite' | 'broaden' | 'synthesize' | 'give_up'
timestamp: number
query?: string
resultsCount?: number
attempt?: number
score?: number
decision?: string
reasoning?: string
newQuery?: string
}
LLMProvider / EmbeddingsProvider
interface LLMProvider {
chat(messages: ChatMessage[], options?: LLMCallOptions): Promise<string>
chatJSON<T>(messages: ChatMessage[], options?: LLMCallOptions): Promise<T>
}
interface EmbeddingsProvider {
embed(texts: string[]): Promise<number[][]>
embedQuery(text: string): Promise<number[]>
}
Implement these interfaces to bring your own LLM or embeddings provider.
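As a rough sketch, a custom LLMProvider that calls an OpenAI-compatible endpoint via native fetch(). It assumes these interfaces are exported from 'rag-engine', that ChatMessage is a { role, content } pair, and that the endpoint follows the OpenAI chat completions response format:

import type { LLMProvider, ChatMessage, LLMCallOptions } from 'rag-engine'

class CompatibleLLM implements LLMProvider {
  constructor(private baseUrl: string, private apiKey: string, private model: string) {}

  // Send the messages to the endpoint and return the text of the first choice.
  async chat(messages: ChatMessage[], options?: LLMCallOptions): Promise<string> {
    const res = await fetch(`${this.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${this.apiKey}` },
      body: JSON.stringify({ model: this.model, messages }),
    })
    const data = await res.json()
    return data.choices[0].message.content
  }

  // Same call, but the caller expects parseable JSON back.
  async chatJSON<T>(messages: ChatMessage[], options?: LLMCallOptions): Promise<T> {
    return JSON.parse(await this.chat(messages, options)) as T
  }
}

// const rag = await RagEngine.create({ llm: new CompatibleLLM('https://api.example.com/v1', key, 'my-model') })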
VectorStore
interface VectorStore {
add(chunks: Chunk[]): Promise<void>
search(embedding: number[], topK: number): Promise<ScoredChunk[]>
count(): number
clear(): void
}
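As an illustration, a minimal custom store using cosine similarity, mirroring the default in-memory store described under Architecture. The Chunk and ScoredChunk shapes (an embedding field on each chunk, a score added to results) are assumptions, not the library's actual types:

import type { VectorStore, Chunk, ScoredChunk } from 'rag-engine'

class ArrayStore implements VectorStore {
  private chunks: Chunk[] = []

  async add(chunks: Chunk[]): Promise<void> {
    this.chunks.push(...chunks)
  }

  // Rank all stored chunks by cosine similarity to the query embedding.
  async search(embedding: number[], topK: number): Promise<ScoredChunk[]> {
    return this.chunks
      .map((chunk) => ({ ...chunk, score: cosine(embedding, chunk.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK)
  }

  count(): number { return this.chunks.length }
  clear(): void { this.chunks = [] }
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1)
}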
Examples
Quickstart (5 Lines)
import { RagEngine } from 'rag-engine'
const rag = await RagEngine.create()
await rag.ingest('./docs')
const result = await rag.query('How does auth work?')
console.log(result.answer)
Chat with Your Codebase
import { RagEngine } from 'rag-engine'
const rag = await RagEngine.create()
await rag.ingest('./src', { glob: '**/*.{ts,js}' })
const result = await rag.query('Where is user authentication handled?')
console.log(result.answer)
console.log(result.sources)
Free Local RAG with Ollama
100% free, 100% local. No API keys, no data leaves your machine.
import { RagEngine } from 'rag-engine'
const rag = await RagEngine.create({
llm: { provider: 'ollama', model: 'llama3' },
embeddings: { provider: 'ollama', model: 'nomic-embed-text' },
})
await rag.ingest('./docs')
const result = await rag.query('Explain the retry logic')
Express.js API
import express from 'express'
import { RagEngine } from 'rag-engine'
const app = express()
const rag = await RagEngine.create()
await rag.ingest('./docs')
app.use(express.json())
app.post('/ask', async (req, res) => {
const result = await rag.query(req.body.question)
res.json(result)
})
app.listen(3000, () => console.log('RAG API on :3000'))
Next.js API Route
// app/api/chat/route.ts
import { RagEngine } from 'rag-engine'
import { NextRequest, NextResponse } from 'next/server'
let rag: RagEngine | undefined
async function getRag() {
if (!rag) {
rag = await RagEngine.create()
await rag.ingest('./docs')
}
return rag
}
export async function POST(req: NextRequest) {
const { question } = await req.json()
const engine = await getRag()
const result = await engine.query(question)
return NextResponse.json(result)
}
Customer Support Bot
import { RagEngine } from 'rag-engine'
const rag = await RagEngine.create({
agent: {
systemPrompt: `You are a helpful customer support agent.
Always be polite. If you don't know, suggest contacting
support@company.com.`,
maxRetries: 2,
},
})
await rag.ingest('./knowledge-base')
const result = await rag.query('How do I reset my password?')
console.log(result.answer)
Providers
| Provider | LLM Models | Embeddings | Cost | Setup |
|---|---|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini | text-embedding-3-small | Paid | OPENAI_API_KEY |
| Anthropic | claude-sonnet, haiku | — (use OpenAI) | Paid | ANTHROPIC_API_KEY |
| Ollama | llama3, mistral, phi3 | nomic-embed-text | FREE | Local install |
| Google | gemini-2.0-flash | text-embedding-004 | Free tier | GOOGLE_API_KEY |
Auto-detection: if no provider is configured, rag-engine checks the environment for OPENAI_API_KEY.
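To use another provider, configure it explicitly, following the same config shape as the Ollama example above. The 'anthropic' provider string and model names are assumptions; the mixed embeddings provider reflects the table's note that Anthropic has no embeddings model:

const rag = await RagEngine.create({
  llm: { provider: 'anthropic', model: 'claude-sonnet' },              // assumed provider string; model name per the table
  embeddings: { provider: 'openai', model: 'text-embedding-3-small' }, // Anthropic row: "— (use OpenAI)"
})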
Architecture
Module Map
src/
core/engine.ts RagEngine class — wires everything together
core/agent.ts Agentic loop (retrieve → judge → decide → retry/answer)
llm/openai.ts OpenAI LLM + embeddings via native fetch()
llm/prompts.ts All agent prompts (judge, synthesizer)
stores/memory.ts In-memory vector store (Map + cosine similarity)
ingest/loader.ts File/directory loader with glob support
ingest/chunkers/ Sliding-window chunker (sentence-aware)
Design Decisions
| Decision | Choice | Why |
|---|---|---|
| Runtime deps | Zero | Differentiator vs LangChain (200+ deps) |
| HTTP | Native fetch() | Node 18+ built-in |
| Default store | In-memory | Works instantly, zero setup |
| Default LLM | Auto-detect | Zero config needed |
| Build | tsup | Fast ESM output with .d.ts |
| Tests | vitest | Fast native TypeScript |
Roadmap
Phase 1 — MVP (v0.1.0–v0.3.0) ✓
RagEngine.create() with auto-detect, agentic loop, in-memory store, OpenAI provider, sliding-window chunker, CLI, full TypeScript types. 4 bugs found and fixed through audit.
Phase 2 — CLI + Providers
Full CLI (init, serve), Ollama/Anthropic/Gemini providers, markdown + code chunkers, hybrid retrieval (vector + BM25).
Phase 3 — Evaluation + Plugins
Built-in evaluation, plugin system, web-search fallback, SQLite store, example projects.
Phase 4 — External Stores + Polish
Pinecone, Chroma, Qdrant stores, streaming responses, composable pipeline builder, full docs and tutorials.