pgvector
13 articles about pgvector in AI news
Expose pgvector as an MCP Server: From Hardcoded RAG to Reusable Tool Server
Wrap pgvector search in FastMCP to create a reusable MCP server. Any MCP-capable LLM client, including Claude Code, can then query your vector database without hardcoded integrations.
We Cut Embedding Storage Costs by ~90% — Replacing S3 with PostgreSQL
A team cut embedding storage costs by ~90% by moving its embeddings from S3 into PostgreSQL with pgvector. The switch enabled efficient vector search and on-demand retrieval for RAG and recommender systems, with no reported loss in performance.
Agent Harnessing: The Infrastructure That Makes AI Agents Work
A detailed technical guide argues that the model is not the hard part of building AI agents. Instead, a six-component harness (context management, memory, tools, control flow, verification, and coordination) is what separates production-grade agents from those that fail silently.
RAG vs Fine-Tuning: A Practical Guide for Choosing the Right LLM
The article offers a clear, decision-oriented comparison of Retrieval-Augmented Generation (RAG) and fine-tuning for customizing LLMs in production, helping practitioners choose an approach based on data freshness, cost, and how much control they need over outputs.
A Practical Framework for Moving Enterprise RAG from POC to Production
The article presents a detailed, production-ready framework for building an enterprise RAG system, covering architecture, security, and deployment. It provides a concrete path for companies to move beyond experimental prototypes.
How I Built a Production RAG Pipeline for Fintech at 1M+ Daily Transactions
A technical case study from a fintech ML engineer outlines the end-to-end design of a production Retrieval-Augmented Generation pipeline that handles more than a million transactions per day. It offers a rare real-world blueprint for building reliable, high-volume AI systems.
A Go Developer's Journey to Demystify AI and Build a RAG System
A Go developer recounts his journey from viewing AI as an intimidating 'monster' to building a working RAG system, offering a practical, ground-level perspective on implementation. The story matters because it reflects the ongoing democratization of advanced AI techniques beyond research labs.
DevFix MCP Server: Stop Your AI Assistant from Using Outdated Stack Overflow Answers
A new MCP server provides Claude Code with version-aware, community-verified solutions to coding problems, replacing unreliable web searches.
Modern RAG in 2026: A Production-First Breakdown of the Evolving Stack
A technical guide outlines the critical components of a modern Retrieval-Augmented Generation (RAG) system for 2026, focusing on production-ready stages such as ingestion, parsing, retrieval, and reranking. This matters because RAG is the dominant method for grounding enterprise LLMs in private data.
Add Vector Memory to Claude Code: The claude-memory-mcp Server Solves CLAUDE.md's 200-Line Limit
Install this open-source MCP server to give Claude Code persistent, searchable memory across projects. It surfaces only relevant context, solving CLAUDE.md's scaling problems.
How to Run 60 Code Experiments Overnight with Claude Code's Autoresearch Skill
A developer open-sourced a Claude Code skill that autonomously runs experiments on your codebase, built on the premise that learning what doesn't work is as valuable as learning what does.
Google Launches Gemini Embedding 2: A New Multimodal Foundation for AI
Google has launched Gemini Embedding 2, a second-generation multimodal embedding model. The release, together with the removal of API rate limits, gives developers a more powerful and accessible tool for building AI applications that understand text, images, and other data types.
Beyond MMR: A Parameter-Free AI Approach to Curate Diverse, Relevant Product Recommendations
New research tackles the NP-hard problem of balancing similarity and diversity in vector retrieval. For luxury retail, this means AI can generate more serendipitous, engaging, and commercially effective product recommendations and search results without manual tuning.
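For context, the baseline this research moves beyond is MMR (Maximal Marginal Relevance). Because exact diverse-subset selection is NP-hard, MMR picks items greedily: each step takes the candidate with the best trade-off between similarity to the query and redundancy with items already chosen. The minimal pure-Python sketch below shows classic greedy MMR only. It is not the parameter-free method from the research, and the function names and the `lam` trade-off weight are illustrative assumptions. Having to hand-tune `lam` is exactly the manual tuning a parameter-free approach aims to remove.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def mmr(query: Sequence[float], candidates: List[Sequence[float]],
        k: int, lam: float = 0.5) -> List[int]:
    """Classic greedy MMR (illustrative sketch, not the article's method).

    Returns indices of up to k candidates. Each step picks the candidate that
    maximizes lam * sim(candidate, query) - (1 - lam) * max sim(candidate, selected).
    """
    relevance = [cosine(query, c) for c in candidates]
    selected: List[int] = []
    remaining = list(range(len(candidates)))

    while remaining and len(selected) < k:
        def score(i: int) -> float:
            # Redundancy: highest similarity to any already-selected item (0 if none yet).
            redundancy = max((cosine(candidates[i], candidates[j]) for j in selected),
                             default=0.0)
            return lam * relevance[i] - (1 - lam) * redundancy

        best = max(remaining, key=score)  # ties go to the earliest index
        selected.append(best)
        remaining.remove(best)

    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Candidates 0 and 1 are exact duplicates; candidate 2 is less similar but distinct.
    items = [[1.0, 0.1], [1.0, 0.1], [1.0, -0.5]]
    print(mmr(query, items, k=2, lam=0.5))  # diversity-aware selection
    print(mmr(query, items, k=2, lam=1.0))  # pure similarity ranking
```

With `lam=0.5`, the duplicate second item is skipped in favor of the more distinct third one (`[0, 2]`); with `lam=1.0` the redundancy term vanishes and the result collapses to plain similarity order (`[0, 1]`).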