Agentic, Not Basic
Basic RAG returns garbage when retrieval fails. rag-engine retries with better queries, rewrites using document vocabulary, and admits when it doesn't know.
Zero Dependencies
Core runs with just Node.js. No axios, no langchain, no bloat. Native fetch() for API calls, in-memory vector store with cosine similarity.
Full Transparency
Every query returns a decision trace showing exactly what the agent did — search, evaluate, rewrite, synthesize. Debug your RAG in seconds.
Getting Started
Overview
rag-engine is an Agentic RAG (Retrieval-Augmented Generation) framework for Node.js. Unlike basic RAG which blindly returns whatever chunks the vector search finds, rag-engine adds an intelligent agent loop that evaluates whether the retrieved chunks actually answer the question — and retries with better queries when they don't.
- 5-line quickstart — import, create, ingest, query, done
- Zero runtime dependencies for the core package
- Agentic loop with relevance judge, query rewriting, and honest give-up
- Full decision trace on every query for debugging
- TypeScript-first with 20+ exported interfaces
- CLI for quick prototyping
- Auto-detects LLM provider from environment variables
Installation
npm install rag-engine
Requires Node.js 18+ (for native fetch). Set your OpenAI API key:
export OPENAI_API_KEY=sk-...
Or create a .env file in your project root:
OPENAI_API_KEY=sk-your-key-here
Quick Start (5 Lines)
import { RagEngine } from 'rag-engine'
const rag = await RagEngine.create() // auto-detects OPENAI_API_KEY
await rag.ingest('./docs') // loads, chunks, embeds, stores
const result = await rag.query('How does auth work?')
console.log(result.answer) // answer with citations
console.log(result.sources) // relevant chunks with scores
console.log(result.trace) // full agent decision log
console.log(result.metrics) // timing and LLM call stats
CLI Usage
# Ingest files
npx rag-engine ingest ./docs
npx rag-engine ingest ./src --glob "**/*.ts"
# Query (auto-ingests ./docs if present)
npx rag-engine query "How does authentication work?"
# Show index stats
npx rag-engine stats
The CLI reads .env files automatically. Note: the CLI uses the in-memory store, so documents are re-ingested on every run.
How It Works
The Agentic Pipeline
1. Vector search retrieves the top-K chunks for the question.
2. An LLM judge evaluates: "Do these chunks answer the question?" and returns a score (0-1) plus a decision.
3. On synthesize, the agent generates an answer from the chunks; on rewrite or broaden, it adjusts the query and searches again; on give_up, it answers honestly that it doesn't know.
4. The result comes back as { answer, sources, trace, metrics }.
Basic RAG retrieves chunks and sends them directly to an LLM. If the chunks are bad, the LLM confidently generates a wrong answer. Agentic RAG adds a judge step — and retries with better queries when retrieval fails.
Agent Decisions
| Decision | When | What Happens |
|---|---|---|
| SYNTHESIZE | Score ≥ 0.5 | Chunks are good — generate answer with citations |
| REWRITE | 0.2 ≤ score < 0.5 | Chunks are related but off-topic — rewrite query using document vocabulary, retry |
| BROADEN | Fewer than 3 results | Too few results — broaden the query, retry |
| GIVE_UP | Score < 0.2 or max retries reached | Honestly say "I don't know" |
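The 0.5 cutoff and the retry budget appear to map to relevanceThreshold and maxRetries in AgentConfig (see the API Reference below). A minimal sketch of tuning them, assuming that mapping; the 0.2 give-up floor is not exposed as an option here:

import { RagEngine } from 'rag-engine'

// Stricter judge, smaller retry budget.
const rag = await RagEngine.create({
  agent: {
    relevanceThreshold: 0.7, // require a higher score before synthesizing
    maxRetries: 2,           // rewrite/broaden at most twice before giving up
  },
})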
Decision Trace
Every query returns a full trace array showing exactly what the agent did:
result.trace = [
{ action: "search", query: "How does auth work?", resultsCount: 5, attempt: 1 },
{ action: "evaluate", score: 0.42, decision: "rewrite",
reasoning: "Chunks discuss authorization roles, not authentication flow" },
{ action: "rewrite", newQuery: "user login JWT token session management" },
{ action: "search", query: "user login JWT...", resultsCount: 8, attempt: 2 },
{ action: "evaluate", score: 0.89, decision: "synthesize",
reasoning: "Chunks cover login flow, JWT creation, and session handling" },
{ action: "synthesize", attempt: 2 }
]
When your RAG gives a bad answer, check result.trace and instantly see where it went wrong.
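For example, a quick debugging pass over the trace, using only fields from the TraceEntry interface documented below:

// One line per agent step: action, score (if any), and the judge's reasoning or the query used.
for (const entry of result.trace) {
  const detail = entry.reasoning ?? entry.newQuery ?? entry.query ?? ''
  console.log(`#${entry.attempt ?? '-'} ${entry.action}`, entry.score ?? '', detail)
}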
Agent Loop Pseudocode
async function agentLoop(query, options) {
  const trace = [] // trace.push(...) calls omitted for brevity
  let currentQuery = query

  for (let attempt = 1; attempt <= options.maxRetries; attempt++) {
    const chunks = await retriever.search(currentQuery, { topK: 10 })

    // Judge step: score the chunks and pick a decision (matches the LLMProvider.chatJSON signature).
    const judgment = await llm.chatJSON([
      { role: 'system', content: 'Score 0-1, decide: synthesize|rewrite|broaden|give_up' },
      { role: 'user', content: `Question: ${currentQuery}\nChunks:\n${formatChunks(chunks)}` },
    ])

    if (judgment.decision === 'synthesize') {
      const answer = await llm.chat([
        { role: 'system', content: 'Answer using ONLY the provided chunks. Cite sources.' },
        { role: 'user', content: `Question: ${query}\nChunks:\n${formatChunks(chunks)}` },
      ])
      return { answer, sources: chunks, trace }
    }

    if (judgment.decision === 'rewrite' || judgment.decision === 'broaden') {
      currentQuery = judgment.rewrittenQuery // rewritten (or broadened) query proposed by the judge
      continue
    }

    if (judgment.decision === 'give_up') {
      return { answer: "I don't know.", sources: [], trace }
    }
  }

  return { answer: 'Could not find a confident answer.', sources: [], trace }
}
API Reference
RagEngine
declare class RagEngine {
static create(config?: RagConfig): Promise<RagEngine>
ingest(pathOrText: string, options?: IngestOptions): Promise<{ chunksAdded: number; filesProcessed: number }>
query(question: string): Promise<QueryResult>
stats(): { chunks: number }
clear(): void
}
create(config?) — Creates engine, auto-detects provider from env. Verifies embeddings before returning.
ingest(pathOrText, options?) — Ingests a directory, file, or raw text string.
query(question) — Runs the agentic loop. Returns answer + sources + trace + metrics.
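A short sketch exercising these methods, using only the signatures shown above (the raw-text string is illustrative):

const rag = await RagEngine.create()

// Ingest a directory, then a raw text string; both are chunked, embedded, and stored.
const { chunksAdded, filesProcessed } = await rag.ingest('./docs')
console.log(`${chunksAdded} chunks from ${filesProcessed} files`)
await rag.ingest('Refunds are processed within 5 business days.')

console.log(rag.stats().chunks) // total chunks currently in the store
rag.clear()                     // empty the store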
RagConfig
interface RagConfig {
llm?: LLMConfig | LLMProvider // default: OpenAI gpt-4o-mini
embeddings?: EmbeddingsConfig // default: text-embedding-3-small
store?: VectorStore // default: in-memory
chunker?: ChunkerConfig | string // default: sliding-window 512 tokens
agent?: AgentConfig // default: 3 retries, 0.5 threshold
retrieval?: RetrievalConfig // default: topK 10
}
interface AgentConfig {
maxRetries?: number // default: 3
relevanceThreshold?: number // default: 0.5
systemPrompt?: string // prepended to synthesis prompt
}
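A sketch combining several of these options. The 'sliding-window' string and the topK field name are assumptions inferred from the defaults listed above, not confirmed option values:

import { RagEngine } from 'rag-engine'

const rag = await RagEngine.create({
  llm: { provider: 'openai', model: 'gpt-4o-mini' },  // explicit instead of auto-detect
  chunker: 'sliding-window',                          // assumed string form of the default chunker
  retrieval: { topK: 5 },                             // assumed field name, based on the "topK 10" default
  agent: { maxRetries: 2, relevanceThreshold: 0.6 },
})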
QueryResult
interface QueryResult {
answer: string // generated answer with citations
sources: ScoredChunk[] // chunks used, sorted by relevance
trace: TraceEntry[] // full agent decision log
metrics: QueryMetrics // timing and usage stats
}
interface QueryMetrics {
totalTimeMs: number
retrievalTimeMs: number
llmCalls: number
tokensUsed: number
}
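For example, reading a result's metrics after a query (all fields as declared above):

const result = await rag.query('How does auth work?')

console.log(`${result.sources.length} chunks used`)
console.log(`retrieval ${result.metrics.retrievalTimeMs}ms of ${result.metrics.totalTimeMs}ms total`)
console.log(`${result.metrics.llmCalls} LLM calls, ${result.metrics.tokensUsed} tokens`)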
TraceEntry
interface TraceEntry {
action: 'search' | 'evaluate' | 'rewrite' | 'broaden' | 'synthesize' | 'give_up'
timestamp: number
query?: string
resultsCount?: number
attempt?: number
score?: number
decision?: string
reasoning?: string
newQuery?: string
}
LLMProvider / EmbeddingsProvider
interface LLMProvider {
chat(messages: ChatMessage[], options?: LLMCallOptions): Promise<string>
chatJSON<T>(messages: ChatMessage[], options?: LLMCallOptions): Promise<T>
}
interface EmbeddingsProvider {
embed(texts: string[]): Promise<number[][]>
embedQuery(text: string): Promise<number[]>
}
Implement these interfaces to bring your own LLM or embeddings provider.
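As a rough sketch, a custom LLMProvider that calls an OpenAI-compatible endpoint via native fetch(). It assumes these interfaces are exported from 'rag-engine', that ChatMessage is a { role, content } pair, and that the endpoint follows the OpenAI chat completions response format:

import type { LLMProvider, ChatMessage, LLMCallOptions } from 'rag-engine'

class CompatibleLLM implements LLMProvider {
  constructor(private baseUrl: string, private apiKey: string, private model: string) {}

  // Send the messages to the endpoint and return the text of the first choice.
  async chat(messages: ChatMessage[], options?: LLMCallOptions): Promise<string> {
    const res = await fetch(`${this.baseUrl}/chat/completions`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', Authorization: `Bearer ${this.apiKey}` },
      body: JSON.stringify({ model: this.model, messages }),
    })
    const data = await res.json()
    return data.choices[0].message.content
  }

  // Same call, but the caller expects parseable JSON back.
  async chatJSON<T>(messages: ChatMessage[], options?: LLMCallOptions): Promise<T> {
    return JSON.parse(await this.chat(messages, options)) as T
  }
}

// const rag = await RagEngine.create({ llm: new CompatibleLLM('https://api.example.com/v1', key, 'my-model') })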
VectorStore
interface VectorStore {
add(chunks: Chunk[]): Promise<void>
search(embedding: number[], topK: number): Promise<ScoredChunk[]>
count(): number
clear(): void
}
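As an illustration, a minimal custom store using cosine similarity, mirroring the default in-memory store described under Architecture. The Chunk and ScoredChunk shapes (an embedding field on each chunk, a score added to results) are assumptions, not the library's actual types:

import type { VectorStore, Chunk, ScoredChunk } from 'rag-engine'

class ArrayStore implements VectorStore {
  private chunks: Chunk[] = []

  async add(chunks: Chunk[]): Promise<void> {
    this.chunks.push(...chunks)
  }

  // Rank all stored chunks by cosine similarity to the query embedding.
  async search(embedding: number[], topK: number): Promise<ScoredChunk[]> {
    return this.chunks
      .map((chunk) => ({ ...chunk, score: cosine(embedding, chunk.embedding) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK)
  }

  count(): number { return this.chunks.length }
  clear(): void { this.chunks = [] }
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i] }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1)
}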
Examples
Quickstart (5 Lines)
import { RagEngine } from 'rag-engine'
const rag = await RagEngine.create()
await rag.ingest('./docs')
const result = await rag.query('How does auth work?')
console.log(result.answer)
Chat with Your Codebase
import { RagEngine } from 'rag-engine'
const rag = await RagEngine.create()
await rag.ingest('./src', { glob: '**/*.{ts,js}' })
const result = await rag.query('Where is user authentication handled?')
console.log(result.answer)
console.log(result.sources)
Free Local RAG with Ollama
100% free, 100% local. No API keys, no data leaves your machine.
import { RagEngine } from 'rag-engine'
const rag = await RagEngine.create({
llm: { provider: 'ollama', model: 'llama3' },
embeddings: { provider: 'ollama', model: 'nomic-embed-text' },
})
await rag.ingest('./docs')
const result = await rag.query('Explain the retry logic')
Express.js API
import express from 'express'
import { RagEngine } from 'rag-engine'
const app = express()
const rag = await RagEngine.create()
await rag.ingest('./docs')
app.use(express.json())
app.post('/ask', async (req, res) => {
const result = await rag.query(req.body.question)
res.json(result)
})
app.listen(3000, () => console.log('RAG API on :3000'))
Next.js API Route
// app/api/chat/route.ts
import { RagEngine } from 'rag-engine'
import { NextRequest, NextResponse } from 'next/server'
let rag: RagEngine | undefined
async function getRag() {
if (!rag) {
rag = await RagEngine.create()
await rag.ingest('./docs')
}
return rag
}
export async function POST(req: NextRequest) {
const { question } = await req.json()
const engine = await getRag()
const result = await engine.query(question)
return NextResponse.json(result)
}
Customer Support Bot
import { RagEngine } from 'rag-engine'
const rag = await RagEngine.create({
agent: {
systemPrompt: `You are a helpful customer support agent.
Always be polite. If you don't know, suggest contacting
support@company.com.`,
maxRetries: 2,
},
})
await rag.ingest('./knowledge-base')
const result = await rag.query('How do I reset my password?')
console.log(result.answer)
Providers
| Provider | LLM Models | Embeddings | Cost | Setup |
|---|---|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini | text-embedding-3-small | Paid | OPENAI_API_KEY |
| Anthropic | claude-sonnet, haiku | — (use OpenAI) | Paid | ANTHROPIC_API_KEY |
| Ollama | llama3, mistral, phi3 | nomic-embed-text | FREE | Local install |
| Google | gemini-2.0-flash | text-embedding-004 | Free tier | GOOGLE_API_KEY |
Auto-detection: if no provider is configured, rag-engine checks the environment for OPENAI_API_KEY.
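To use another provider, configure it explicitly, following the same config shape as the Ollama example above. The 'anthropic' provider string and model names are assumptions; the mixed embeddings provider reflects the table's note that Anthropic has no embeddings model:

const rag = await RagEngine.create({
  llm: { provider: 'anthropic', model: 'claude-sonnet' },              // assumed provider string; model name per the table
  embeddings: { provider: 'openai', model: 'text-embedding-3-small' }, // Anthropic row: "— (use OpenAI)"
})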
Architecture
Module Map
src/
core/engine.ts RagEngine class — wires everything together
core/agent.ts Agentic loop (retrieve → judge → decide → retry/answer)
llm/openai.ts OpenAI LLM + embeddings via native fetch()
llm/prompts.ts All agent prompts (judge, synthesizer)
stores/memory.ts In-memory vector store (Map + cosine similarity)
ingest/loader.ts File/directory loader with glob support
ingest/chunkers/ Sliding-window chunker (sentence-aware)
Design Decisions
| Decision | Choice | Why |
|---|---|---|
| Runtime deps | Zero | Differentiator vs LangChain (200+ deps) |
| HTTP | Native fetch() | Node 18+ built-in |
| Default store | In-memory | Works instantly, zero setup |
| Default LLM | Auto-detect | Zero config needed |
| Build | tsup | Fast ESM output with .d.ts |
| Tests | vitest | Fast native TypeScript |
Roadmap
Phase 1 — MVP (v0.1.0–v0.3.0) ✓
RagEngine.create() with auto-detect, agentic loop, in-memory store, OpenAI provider, sliding-window chunker, CLI, full TypeScript types. 4 bugs found and fixed through audit.
Phase 2 — CLI + Providers
Full CLI (init, serve), Ollama/Anthropic/Gemini providers, markdown + code chunkers, hybrid retrieval (vector + BM25).
Phase 3 — Evaluation + Plugins
Built-in evaluation, plugin system, web-search fallback, SQLite store, example projects.
Phase 4 — External Stores + Polish
Pinecone, Chroma, Qdrant stores, streaming responses, composable pipeline builder, full docs and tutorials.