Knowledge-RAG v3.0: The Local RAG MCP Server That Finally Just Works

Knowledge-RAG v3.0 eliminates Docker/Ollama setup, adds hybrid search with cross-encoder reranking, and auto-indexes your docs—making private RAG in Claude Code a one-command install.

What It Does

Knowledge-RAG is a Model Context Protocol (MCP) server that brings local, private Retrieval-Augmented Generation to Claude Code. Unlike cloud RAG services, which risk leaking your data, or local setups that require Docker and Ollama, Knowledge-RAG installs with `pip install knowledge-rag` and runs entirely on your machine. It turns your notes, documentation, PDFs, and other files into instantly searchable context for Claude, with zero external dependencies.

Version 3.0 is a major overhaul that replaces the Ollama backend with FastEmbed and ONNX Runtime, making the system faster, more reliable, and completely serverless. The update introduces 12 MCP tools, hybrid search with cross-encoder reranking, markdown-aware chunking, and real-time file monitoring.

Why It Works Better

The magic of v3.0 comes from three key improvements:

1. No More External Servers
FastEmbed replaces Ollama entirely. Embeddings and reranking now run in-process via ONNX Runtime—no server to start, no port conflicts, no separate process management. The embedding model (BAAI/bge-small-en-v1.5, ~50MB) downloads automatically on first run and caches locally.

2. Hybrid Search + Cross-Encoder Reranking
Knowledge-RAG combines:

  • BM25 search for keyword matching (with 54 security-term synonym expansions—"sqli" automatically includes "sql injection")
  • Vector search for semantic similarity
  • Cross-encoder reranking (Xenova/ms-marco-MiniLM-L-6-v2, ~25MB) that jointly scores query-document pairs for precision
  • Maximal Marginal Relevance (lambda=0.7) to reduce redundant results and promote diverse sources
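The synonym expansion on the BM25 leg can be pictured with a toy table. This is an illustrative sketch only: the names `SYNONYMS` and `expand_query` are hypothetical, not the server's actual API, and the real table carries 54 security-term entries.

```python
# Toy stand-in for the server's synonym table (illustrative names only;
# the real server ships 54 security-term expansions).
SYNONYMS = {
    "sqli": ["sql injection"],
    "xss": ["cross-site scripting"],
    "mitm": ["man-in-the-middle"],
}

def expand_query(query: str) -> list[str]:
    """Expand each query token with its known synonyms before BM25 scoring."""
    terms = []
    for token in query.lower().split():
        terms.append(token)
        terms.extend(SYNONYMS.get(token, []))
    return terms

print(expand_query("sqli mitigation"))
# → ['sqli', 'sql injection', 'mitigation']
```

The expanded term list then feeds into ordinary BM25 scoring, so a query for "sqli" also matches documents that only ever say "sql injection".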

This pipeline dramatically improves results for ambiguous queries where simple vector similarity fails.
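The MMR step at the end of that pipeline is easy to sketch in pure Python. This is a minimal greedy implementation of the standard algorithm, not the project's actual code; the toy similarity numbers are invented for the demo.

```python
def mmr(query_sims, doc_sims, lam=0.7, k=3):
    """Greedy Maximal Marginal Relevance: trade off relevance to the query
    against redundancy with already-selected documents (lambda=0.7 as above)."""
    selected = []
    remaining = list(range(len(query_sims)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lam * query_sims[i] - (1 - lam) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates (similarity 0.95); doc 2 is distinct.
query_sims = [0.9, 0.85, 0.6, 0.4]
doc_sims = [
    [1.0, 0.95, 0.2, 0.1],
    [0.95, 1.0, 0.2, 0.15],
    [0.2, 0.2, 1.0, 0.15],
    [0.1, 0.15, 0.15, 1.0],
]
print(mmr(query_sims, doc_sims))
# → [0, 2, 1]: the near-duplicate of doc 0 is demoted below the diverse doc 2
```

With pure relevance ranking the two near-duplicates would occupy the top two slots; the redundancy penalty is what pushes a diverse source up.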

3. Smart Document Processing
Markdown files are now split by ## and ### header boundaries instead of fixed windows, creating semantically coherent chunks. The system supports 9 formats (DOCX, Excel, PowerPoint, CSV, PDF, etc.) with format-specific extraction—DOCX headings become markdown structure, Excel sheets become text tables.
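Header-boundary chunking can be sketched in a few lines. This is an illustrative sketch under the article's description, not the server's actual chunker, and `chunk_markdown` is a hypothetical name.

```python
import re

def chunk_markdown(text: str) -> list[str]:
    """Split markdown at ## / ### headings so each chunk is one coherent
    section (illustrative sketch, not the server's actual chunker)."""
    chunks, current = [], []
    for line in text.splitlines():
        # Start a new chunk whenever a level-2 or level-3 heading begins.
        if re.match(r"^#{2,3} ", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return chunks

doc = "# Guide\nIntro.\n## Setup\nSteps.\n### Linux\nDetails.\n## Usage\nRun it."
print(chunk_markdown(doc))
# → ['# Guide\nIntro.', '## Setup\nSteps.', '### Linux\nDetails.', '## Usage\nRun it.']
```

Each chunk keeps its heading attached, so retrieved context arrives labeled with the section it came from instead of being an arbitrary fixed-size window.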

How To Install & Use It

Installation

# Install from PyPI
pip install knowledge-rag

# Restart Claude Code — that's it

After installation, Claude Code automatically detects the MCP server. The first startup downloads models and embeds your documents (stored in ~/knowledge_rag/documents/ by default).

Configuration

Add to your Claude Code configuration:

{
  "mcpServers": {
    "knowledge-rag": {
      "command": "knowledge-rag",
      "args": ["--documents", "/path/to/your/docs"]
    }
  }
}

Or use the default documents directory:

# Just copy files here
cp your-file.pdf ~/knowledge_rag/documents/

The system monitors this directory via watchdog with 5-second debounce—add, modify, or delete files, and it auto-reindexes.
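The debounce behavior can be sketched with a plain `threading.Timer`, independent of watchdog itself. This is a minimal illustration of the idea, not the server's implementation; the 5-second delay is shortened here so the demo finishes quickly.

```python
import threading
import time

class Debouncer:
    """Coalesce a burst of file events into a single action, as in the
    5-second reindex debounce described above (illustrative sketch)."""

    def __init__(self, action, delay=5.0):
        self.action = action
        self.delay = delay
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a new event restarts the countdown
            self._timer = threading.Timer(self.delay, self.action)
            self._timer.start()

# Ten rapid "file changed" events collapse into exactly one reindex call.
calls = []
debouncer = Debouncer(lambda: calls.append("reindex"), delay=0.05)
for _ in range(10):
    debouncer.trigger()
time.sleep(0.2)
print(calls)
# → ['reindex']
```

In the real server, watchdog's file-system events would play the role of `trigger()`, and the action would be the reindex pass.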

Querying from Claude Code

Once running, use natural language in Claude Code:

"Search our internal docs for the API rate limiting policy"
"Find the deployment checklist from last quarter's post-mortem"
"What does our security policy say about password rotation?"

Knowledge-RAG provides the relevant document chunks as context, and Claude incorporates them into its responses.

When To Use It

Knowledge-RAG shines when you need Claude to reference:

  • Internal documentation and wikis
  • Security policies and compliance docs
  • API specifications and architecture diagrams
  • Post-mortems and runbooks
  • Team notes and meeting summaries
  • Research papers and technical PDFs

Because everything runs locally, it's ideal for proprietary codebases, sensitive internal processes, or any documentation you wouldn't send to a cloud API.

Upgrade Notes

If upgrading from v2.x:

git pull origin main
source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip install -r requirements.txt
# Restart Claude Code

The first startup after upgrading triggers a "nuclear rebuild"—all documents are re-embedded with the new 384-dimension model. This takes longer initially but results in faster queries afterward.

The Bottom Line

Knowledge-RAG v3.0 delivers what local RAG promised but rarely achieved: one-command installation, zero configuration headaches, and retrieval precision that actually finds what you need. By eliminating Docker/Ollama complexity and adding sophisticated reranking, it makes private document search in Claude Code not just possible but practical.

AI Analysis

Claude Code users should install Knowledge-RAG today if they work with any private documentation. The v3.0 update fundamentally changes the calculus—what was previously a 15-minute Docker/Ollama setup is now `pip install`. This isn't just incremental improvement; it's a different category of tool.

**Immediate action:** Run `pip install knowledge-rag`, restart Claude Code, and drop your most-referenced documentation into `~/knowledge_rag/documents/`. The auto-indexing means you'll have searchable context in under a minute. Start with your team's internal wiki, API docs, or security policies—these are exactly the documents Claude needs but can't access otherwise.

**Workflow change:** Instead of manually copying documentation snippets into prompts, use natural language queries like "What's our deployment checklist?" or "Find the error handling guidelines." Knowledge-RAG will retrieve the relevant sections, and Claude will incorporate them. This is particularly powerful for onboarding new team members or working in unfamiliar parts of the codebase where you need to reference existing patterns and policies.
Original source: github.com
