Knowledge-RAG v3.0: The Local RAG MCP Server That Finally Just Works
What It Does
Knowledge-RAG is a Model Context Protocol (MCP) server that brings local, private Retrieval-Augmented Generation to Claude Code. Unlike cloud RAG services, which send your data off-machine, or local stacks that require Docker and Ollama, Knowledge-RAG installs with a single `pip install knowledge-rag` and runs entirely on your machine. It turns your notes, documentation, PDFs, and other files into instantly searchable context for Claude, with no external services to run.
Version 3.0 is a major overhaul that replaces the Ollama backend with FastEmbed and ONNX Runtime, making the system faster, more reliable, and completely serverless. The update introduces 12 MCP tools, hybrid search with cross-encoder reranking, markdown-aware chunking, and real-time file monitoring.
Why It Works Better
The magic of v3.0 comes from three key improvements:
1. No More External Servers
FastEmbed replaces Ollama entirely. Embeddings and reranking now run in-process via ONNX Runtime: no server to start, no port conflicts, no separate process to manage. The embedding model (BAAI/bge-small-en-v1.5, ~50MB) downloads automatically on first run and is cached locally.
2. Hybrid Search + Cross-Encoder Reranking
Knowledge-RAG combines:
- BM25 keyword search, with 54 security-term synonym expansions ("sqli" automatically expands to include "sql injection")
- Vector search for semantic similarity
- Cross-encoder reranking (Xenova/ms-marco-MiniLM-L-6-v2, ~25MB) that jointly scores query-document pairs for precision
- Maximal Marginal Relevance (lambda=0.7) to reduce redundant results and promote diverse sources
This pipeline dramatically improves results for ambiguous queries where simple vector similarity fails.
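The final MMR stage can be sketched as follows. This is an illustrative implementation of Maximal Marginal Relevance, not Knowledge-RAG's actual code; the function and data shapes here are hypothetical:

```python
# Sketch of Maximal Marginal Relevance (MMR) selection, assuming each
# candidate arrives as (doc_id, relevance_score, embedding_vector).
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mmr(candidates, k=5, lam=0.7):
    """Greedily pick k results, trading relevance against redundancy."""
    selected = []
    pool = list(candidates)
    while pool and len(selected) < k:
        def score(c):
            # Redundancy = similarity to the closest already-selected result
            redundancy = max(
                (cosine(c[2], s[2]) for s in selected), default=0.0
            )
            # lam weights relevance; (1 - lam) penalizes redundancy
            return lam * c[1] - (1 - lam) * redundancy
        best = max(pool, key=score)
        selected.append(best)
        pool.remove(best)
    return [doc_id for doc_id, _, _ in selected]
```

With lambda at 0.7, as cited above, a highly relevant chunk that nearly duplicates an already-selected one loses out to a less similar chunk from a different source.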
3. Smart Document Processing
Markdown files are now split at `##` and `###` header boundaries instead of fixed-size windows, producing semantically coherent chunks. The system supports 9 formats (DOCX, Excel, PowerPoint, CSV, PDF, and more) with format-specific extraction: DOCX headings become markdown structure, Excel sheets become text tables.
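Header-aware splitting can be sketched with a simple line scan. This is a minimal illustration, not the shipped chunker, which presumably also handles edge cases like headers inside code fences and very small sections:

```python
import re

def split_markdown(text):
    """Split markdown into chunks at ## / ### header boundaries.
    Sketch only: each chunk starts at a level-2/3 header and runs
    until the next one, keeping header and body together."""
    chunks = []
    current = []
    for line in text.splitlines():
        # Start a new chunk whenever a ## or ### header begins
        if re.match(r"^#{2,3}\s", line) and current:
            chunks.append("\n".join(current).strip())
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current).strip())
    return [c for c in chunks if c]
```

Because each chunk is a complete header-plus-body unit, the retriever returns whole sections rather than windows that cut a topic in half.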
How To Install & Use It
Installation
```bash
# Install from PyPI
pip install knowledge-rag

# Restart Claude Code - that's it
```
After installation, Claude Code automatically detects the MCP server. The first startup downloads models and embeds your documents (stored in ~/knowledge_rag/documents/ by default).
Configuration
Add to your Claude Code configuration:
```json
{
  "mcpServers": {
    "knowledge-rag": {
      "command": "knowledge-rag",
      "args": ["--documents", "/path/to/your/docs"]
    }
  }
}
```
Or use the default documents directory:
```bash
# Just copy files here
cp your-file.pdf ~/knowledge_rag/documents/
```
The system monitors this directory via watchdog with a 5-second debounce: add, modify, or delete files, and it reindexes automatically.
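The debounce behavior can be sketched with a timer that resets on every event, so a burst of file changes triggers only one reindex. This is a simplified illustration; the actual watcher presumably wires something like this into watchdog's event handlers:

```python
import threading

class Debouncer:
    """Coalesce bursts of events into one callback after a quiet period."""

    def __init__(self, delay, callback):
        self.delay = delay
        self.callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self):
        with self._lock:
            # Each new event cancels the pending timer and restarts the clock,
            # so the callback fires only after `delay` seconds of quiet.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.callback)
            self._timer.start()
```

Copying fifty PDFs into the directory therefore produces one reindex pass five seconds after the last file lands, not fifty separate passes.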
Querying from Claude Code
Once running, use natural language in Claude Code:
"Search our internal docs for the API rate limiting policy"
"Find the deployment checklist from last quarter's post-mortem"
"What does our security policy say about password rotation?"
Knowledge-RAG provides the relevant document chunks as context, and Claude incorporates them into its responses.
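For a sense of what that handoff looks like, an MCP tool result wraps retrieved text in a content array. The snippet below is illustrative: the score annotation, filename, and excerpt are hypothetical, but the outer shape follows the MCP tool-result format:

```json
{
  "content": [
    {
      "type": "text",
      "text": "[score 0.92] security-policy.md > Password Rotation:\nPasswords must be rotated every 90 days..."
    }
  ]
}
```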
When To Use It
Knowledge-RAG shines when you need Claude to reference:
- Internal documentation and wikis
- Security policies and compliance docs
- API specifications and architecture diagrams
- Post-mortems and runbooks
- Team notes and meeting summaries
- Research papers and technical PDFs
Because everything runs locally, it's ideal for proprietary codebases, sensitive internal processes, or any documentation you wouldn't send to a cloud API.
Upgrade Notes
If upgrading a v2.x source install:
```bash
git pull origin main
source venv/bin/activate  # or .\venv\Scripts\activate on Windows
pip install -r requirements.txt
# Restart Claude Code
```
The first startup after upgrading triggers a "nuclear rebuild": every document is re-embedded with the new 384-dimensional model. This makes the first run slower but yields faster queries afterward.
The Bottom Line
Knowledge-RAG v3.0 delivers what local RAG promised but rarely achieved: one-command installation, zero configuration headaches, and retrieval precision that actually finds what you need. By eliminating Docker/Ollama complexity and adding sophisticated reranking, it makes private document search in Claude Code not just possible but practical.